GLUE Benchmark

Video generation benchmarks: https://github.com/Vchitect/VBench, AIGCBench

https://github.com/NickRiccardi/two-word-test

https://codeforces.com/blog/entry/133874

https://www.swebench.com/

https://linzhiqiu.github.io/papers/naturalbench/

https://github.com/Baiqi-Li/NaturalBench

trackingAI.org

https://llm-stats.com/

https://lmarena.ai/?leaderboard

https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/oQdzQvKHw8JyXbN87

Data Agent Benchmark

https://artificialanalysis.ai/

LiveCodeBench and SciCode

Aider polyglot github

https://livebench.ai https://github.com/LiveBench/LiveBench

https://openrouter.ai/rankings

https://arcprize.org/leaderboard

![[Pasted image 20250410103636.png]]

EQ-Bench - Longform Creative Writing: paper ![EQ-Bench](https://eqbench.com/images/eqbench3-judge-comparison.png)

https://llmbenchmark.kili-technology.com/

Judge Comparison