Repositories list
35 repositories
opencompass (Public): OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) over 100+ datasets.
VLMEvalKit (Public): Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
CompassVerifier (Public)
MMBench-GUI (Public): Official repo of "MMBench-GUI: Hierarchical Multi-Platform Evaluation Framework for GUI Agents". It can be used to evaluate a GUI agent in a hierarchical manner across multiple platforms, including Windows, Linux, macOS, iOS, Android, and Web.
Creation-MMBench (Public)
CompassJudger (Public)
SAGA (Public)
RaML (Public)
BotChat (Public)
Ada-LEval (Public)
MathBench (Public)
MMBench (Public)
ProSA (Public)
ANAH (Public)
GTA (Public)
GPassK (Public): [ACL 2025] Are Your LLMs Capable of Stable Reasoning?
oc_doc_website (Public)
GAOKAO-Eval (Public)
CriticEval (Public)
lagent-cibench (Public)
hinode (Public)
storage (Public)
CompassBench (Public)
CIBench (Public)
.github (Public)
DevEval (Public)
CodeBench (Public)
T-Eval (Public)
human-eval (Public)
OpenFinData (Public)