
Humanity's Last Exam

Challenging LLMs at the frontier of human knowledge

Last updated: April 30, 2025

Performance Comparison

Rank  Accuracy (%)    Calibration error (%)
1     21.64 ± 1.61    72
1     20.32 ± 1.58    34
1     19.20 ± 1.54    39
2     18.16 ± 1.51    71
2     18.08 ± 1.51    57
2     17.80 ± 1.50    70
7     14.28 ± 1.37    59
7     12.08 ± 1.28    80
8     10.96 ± 1.22    82
8     10.72 ± 1.21    73
11    8.12 ± 1.07     82
11    8.04 ± 1.07     80
11    7.96 ± 1.06     83
11    7.76 ± 1.05     75
11    6.68 ± 0.98     74
11    6.56 ± 0.97     82
11    5.68 ± 0.91     83
15    5.52 ± 0.90     76
15    5.44 ± 0.89     85
15    5.40 ± 0.89     89
17    4.60 ± 0.82     88
17    4.52 ± 0.81     77
18    4.40 ± 0.80     80
18    4.08 ± 0.78     84
21    3.64 ± 0.73     82
24    2.72 ± 0.64     89
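The ± figures in the table appear consistent with a normal-approximation (Wald) 95% interval on a binomial proportion over roughly 2,500 questions. That question count and the choice z = 1.96 are assumptions inferred from the numbers, not stated on this page. The calibration column is sketched below as a binned root-mean-square calibration error, one common definition; the equal-width binning and the bin count (`num_bins`) are likewise assumptions, and the leaderboard's exact procedure may differ. The function names are hypothetical.

```python
import math


def binomial_ci_halfwidth(accuracy_pct: float, n: int, z: float = 1.96) -> float:
    """Half-width, in percentage points, of a Wald interval for a proportion.

    Assumption: the leaderboard's "±" is a normal-approximation interval
    with z = 1.96 (about 95%) over n graded questions.
    """
    p = accuracy_pct / 100.0
    return z * math.sqrt(p * (1.0 - p) / n) * 100.0


def rms_calibration_error(confidences, correct, num_bins: int = 10) -> float:
    """Binned RMS calibration error, returned in percent.

    confidences: model-stated confidences in [0, 1].
    correct: booleans, True where the answer was graded correct.
    Assumption: equal-width bins; each bin's squared gap between mean
    confidence and accuracy is weighted by the bin's share of samples.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need equal-length, non-empty inputs")
    n = len(confidences)
    bins = [[] for _ in range(num_bins)]
    for c, y in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(c * num_bins), num_bins - 1)
        bins[idx].append((c, y))
    total = 0.0
    for b in bins:
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, y in b if y) / len(b)
        total += (len(b) / n) * (mean_conf - acc) ** 2
    return math.sqrt(total) * 100.0


if __name__ == "__main__":
    ASSUMED_N = 2500  # assumption: inferred from the ± values, not stated on the page
    for acc in (21.64, 10.96, 2.72):
        # Prints 1.61, 1.22 and 0.64, matching the corresponding table rows.
        print(f"{acc:.2f} ± {binomial_ci_halfwidth(acc, ASSUMED_N):.2f}")
```

Under these assumptions the interval widths reproduce the table's ± values to two decimals. A calibration error near 0 would mean stated confidence tracks actual accuracy, while values in the 70s and 80s, as in most rows above, indicate answers given with far more confidence than their accuracy supports.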