EA - Exploring Metaculus’s AI Track Record by Peter Scoblic

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Podcast artwork

Categories:

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Exploring Metaculus’s AI Track Record, published by Peter Scoblic on May 1, 2023 on The Effective Altruism Forum.By Peter Mühlbacher, Research Scientist at Metaculus, and Peter Scoblic, Director of Nuclear Risk at MetaculusMetaculus is a forecasting platform where an active community of thousands of forecasters regularly make probabilistic predictions on topics of interest ranging from scientific progress to geopolitics. Forecasts are aggregated into a time-weighted median, the “Community Prediction”, as well as the more sophisticated “Metaculus Prediction”, which weights forecasts based on past performance and extremises in order to compensate for systematic human cognitive biases. Although we feature questions on a wide range of topics, Metaculus focuses on issues of artificial intelligence, biosecurity, climate change and nuclear risk.In this post, we report the results of a recent analysis we conducted exploring the performance of all AI-related forecasts on the Metaculus platform, including an investigation of the factors that enhance or degrade accuracy.Most significantly, in this analysis we found that both the Community and Metaculus Predictions robustly outperform naïve baselines. The recent claim that performance on binary questions is “near chance” requires sampling on only a small subset of the forecasting questions we have posed or on the questionable proposition that a Brier score of 0.207 is akin to a coin flip. What’s more, forecasters performed better on continuous questions, as measured by the continuous ranked probability score (CRPS). In sum, both the Community Prediction and the Metaculus Prediction—on both binary and continuous questions—provide a clear and useful insight into the future of artificial intelligence, despite not being “perfect”.Summary FindingsWe reviewed Metaculus’s resolved binary questions (“What is the probability that X will happen?”) and resolved continuous questions (“What will be the value of X?”) that were related to the future of artificial intelligence. For the purpose of this analysis, we defined AI-related questions as those which belonged to one or more of the following categories: “Computer Science: AI and Machine Learning”; “Computing: Artificial Intelligence”; “Computing: AI”; and “Series: Forecasting AI Progress.” This gave us: 64 resolved binary questions (with 10,497 forecasts by 2,052 users) and 88 resolved continuous questions (with 13,683 predictions by 1,114 users). Our review of these forecasts found:Both the community and Metaculus predictions robustly outperform naïve baselines.Analysis showing that the community prediction’s Brier score on binary questions is 0.237 relies on sampling only a small subset of our AI-related questions.Our analysis of all binary AI-related questions finds that the score is actually 0.207 (a point a recent analysis agrees with), which is significantly better than “chance”.Forecasters performed better on continuous questions than binary ones.Top-Line ResultsThis chart details the performance of both the Community and Metaculus predictions on binary and continuous questions. Please note that, for all scores, lower is better and that Brier scores, which range from 0 to 1 (where 0 represents oracular omniscience and 1 represents complete anticipatory failure) are roughly comparable to continuous ranked probability scores (CRPS) given the way we conducted our analysis. (For more on scoring methodology, see below.)Brier (binary questions)CRPS (continuous questions)Community Prediction0.2070.096Metaculus Prediction0.1820.103baseline prediction0.250.172Results for Binary QuestionsWe can use Brier scores to measure the quality of a forecast on binary questions. Given that a Brier score is the mean squared error of a forecast, the following things are true:If you alread...

Visit the podcast's native language site