2025 Scale AI. All rights reserved.
Humanity's Last Exam
Challenging LLMs at the frontier of human knowledge
Last updated: April 30, 2025
Performance Comparison
Rank  Accuracy (%)    Calib Err (%)
1     21.64 ± 1.61    72
1     20.32 ± 1.58    34
1     19.20 ± 1.54    39
2     18.16 ± 1.51    71
2     18.08 ± 1.51    57
2     17.80 ± 1.50    70
7     14.28 ± 1.37    59
7     12.08 ± 1.28    80
8     10.96 ± 1.22    82
8     10.72 ± 1.21    73
11     8.12 ± 1.07    82
11     8.04 ± 1.07    80
11     7.96 ± 1.06    83
11     7.76 ± 1.05    75
11     6.68 ± 0.98    74
11     6.56 ± 0.97    82
11     5.68 ± 0.91    83
15     5.52 ± 0.90    76
15     5.44 ± 0.89    85
15     5.40 ± 0.89    89
17     4.60 ± 0.82    88
17     4.52 ± 0.81    77
18     4.40 ± 0.80    80
18     4.08 ± 0.78    84
21     3.64 ± 0.73    82
24     2.72 ± 0.64    89
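
The ± margins listed above are numerically consistent with a normal-approximation 95% interval on a binomial proportion. The sketch below reproduces a few of them under two assumptions that the page itself does not state: that each score is an accuracy percentage, and that the evaluation set has 2,500 questions (every score above is a multiple of 0.04, which fits that size). The function name `binomial_margin` and its parameters are illustrative, not part of any published evaluation code.

```python
import math


def binomial_margin(accuracy_pct, n_questions=2500, z=1.96):
    """Half-width, in percentage points, of a normal-approximation interval.

    n_questions=2500 and z=1.96 (95% level) are assumptions inferred from
    the listed numbers, not values stated by the leaderboard.
    """
    p = accuracy_pct / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / n_questions)
    return 100.0 * z * standard_error


# Compare against three of the listed entries.
for accuracy in (21.64, 20.32, 2.72):
    print(f"{accuracy:.2f} ± {binomial_margin(accuracy):.2f}")
# prints:
# 21.64 ± 1.61
# 20.32 ± 1.58
# 2.72 ± 0.64
```

If the assumptions hold, two entries whose intervals overlap substantially (for example 18.16 ± 1.51 and 18.08 ± 1.51) are not clearly separated by this measure, which may be why several entries share a rank.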