EA - Exploring Metaculus's AI Track Record by Peter Scoblic
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Exploring Metaculus's AI Track Record, published by Peter Scoblic on May 1, 2023 on The Effective Altruism Forum.

By Peter Mühlbacher, Research Scientist at Metaculus, and Peter Scoblic, Director of Nuclear Risk at Metaculus

Metaculus is a forecasting platform where an active community of thousands of forecasters regularly makes probabilistic predictions on topics of interest ranging from scientific progress to geopolitics. Forecasts are aggregated into a time-weighted median, the "Community Prediction", as well as the more sophisticated "Metaculus Prediction", which weights forecasts based on past performance and extremises in order to compensate for systematic human cognitive biases. Although we feature questions on a wide range of topics, Metaculus focuses on issues of artificial intelligence, biosecurity, climate change and nuclear risk.

In this post, we report the results of a recent analysis we conducted exploring the performance of all AI-related forecasts on the Metaculus platform, including an investigation of the factors that enhance or degrade accuracy.

Most significantly, in this analysis we found that both the Community and Metaculus Predictions robustly outperform naïve baselines. The recent claim that performance on binary questions is "near chance" requires either sampling only a small subset of the forecasting questions we have posed or accepting the questionable proposition that a Brier score of 0.207 is akin to a coin flip. What's more, forecasters performed better on continuous questions, as measured by the continuous ranked probability score (CRPS).
In sum, both the Community Prediction and the Metaculus Prediction, on both binary and continuous questions, provide a clear and useful insight into the future of artificial intelligence, despite not being "perfect".

Summary Findings

We reviewed Metaculus's resolved binary questions ("What is the probability that X will happen?") and resolved continuous questions ("What will be the value of X?") that were related to the future of artificial intelligence. For the purpose of this analysis, we defined AI-related questions as those which belonged to one or more of the following categories: "Computer Science: AI and Machine Learning"; "Computing: Artificial Intelligence"; "Computing: AI"; and "Series: Forecasting AI Progress." This gave us: 64 resolved binary questions (with 10,497 forecasts by 2,052 users) and 88 resolved continuous questions (with 13,683 predictions by 1,114 users).

Our review of these forecasts found:

Both the Community and Metaculus Predictions robustly outperform naïve baselines.

Analysis showing that the Community Prediction's Brier score on binary questions is 0.237 relies on sampling only a small subset of our AI-related questions. Our analysis of all binary AI-related questions finds that the score is actually 0.207 (a point a recent analysis agrees with), which is significantly better than "chance".

Forecasters performed better on continuous questions than binary ones.

Top-Line Results

This chart details the performance of both the Community and Metaculus Predictions on binary and continuous questions. Please note that, for all scores, lower is better and that Brier scores, which range from 0 to 1 (where 0 represents oracular omniscience and 1 represents complete anticipatory failure), are roughly comparable to continuous ranked probability scores (CRPS) given the way we conducted our analysis.
(For more on scoring methodology, see below.)

                       Brier (binary questions)   CRPS (continuous questions)
Community Prediction   0.207                      0.096
Metaculus Prediction   0.182                      0.103
Baseline prediction    0.25                       0.172

Results for Binary Questions

We can use Brier scores to measure the quality of a forecast on binary questions. Given that a Brier score is the mean squared error of a forecast, the following things are true:

If you alread...
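
To make the binary baseline concrete, here is a minimal sketch of a Brier score computed as the mean squared error between forecast probabilities and 0/1 outcomes. It is an illustration only, not Metaculus's scoring code: the function name `brier_score` and the outcome list are hypothetical. It shows why a forecaster who always answers 50% scores exactly 0.25, the baseline in the table above, whatever the outcomes turn out to be.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes."""
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must be non-empty and equal in length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical resolutions for five binary questions (1 = yes, 0 = no).
outcomes = [1, 0, 0, 1, 0]

# Always answering 50%: every squared error is 0.5 ** 2 = 0.25.
coin_flip = brier_score([0.5] * len(outcomes), outcomes)

# Full confidence in the correct answer every time gives the best score, 0.
perfect = brier_score([1.0, 0.0, 0.0, 1.0, 0.0], outcomes)

# Full confidence in the wrong answer every time gives the worst score, 1.
worst = brier_score([0.0, 1.0, 1.0, 0.0, 1.0], outcomes)

print(coin_flip, perfect, worst)
```

On this scale, a score of 0.207 or 0.182 sits between the always-50% baseline and a perfect forecast, and lower is better.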
