dmxbd
  • Chatbot Arena
    Chatbot Arena is an anonymous large-model competition platform operated by LMSYS Org, an organization formed by institutions including the University of California, Berkeley. It uses blind testing: users chat with anonymized models and then vote on the responses. An Elo rating mechanism turns those votes into model scores and rankings. The platform covers 190+ models from around the world across general dialogue, mathematical reasoning, programming, and other capabilities, and it is widely regarded in the industry as a fair and authoritative tool for evaluating model performance.
  • Open LLM Leaderboard
    Open LLM Leaderboard is an evaluation platform for open-source large language models (LLMs) launched by Hugging Face. It compares the capabilities of different models by running them through a unified set of benchmarks. The rankings draw on multiple high-quality datasets (such as MMLU-Pro, GPQA, and BBH) to assess performance on knowledge, reasoning, mathematics, and instruction-following tasks, with the goal of a comprehensive and fair evaluation.
  • SuperCLUE Rating List
    SuperCLUE (Chinese General Large Model Evaluation Benchmark) is a comprehensive benchmark launched by the CLUE team to evaluate the understanding and reasoning capabilities of general-purpose Chinese large models in complex scenarios. It extends CLUE (The Chinese Language Understanding Evaluation) for the era of general artificial intelligence. Since its official release on May 9, 2023, it has become an authoritative comprehensive benchmark for general-purpose large models in China.
  • CompassRank Reviews
    CompassRank is a large-model evaluation platform launched by the Shanghai Artificial Intelligence Laboratory. It covers seven core areas (language, reasoning, knowledge, code, mathematics, instruction following, and agents), subdivided into more than ten specific tasks, to make the evaluation results accurate and comprehensive. CompassRank uses bilingual benchmarks in Chinese and English and combines them with a circular evaluation strategy to keep the evaluation objective and fair.
  • Creation-MMBench
    Creation-MMBench is a multimodal creativity evaluation benchmark for real-world scenarios, described by its authors as the world's first of its kind. It was released by Zhejiang University together with teams from the Shanghai AI Laboratory, Tongji University, Nanjing University, East China Normal University, Jiao Tong University, the Chinese University of Hong Kong, and others. It covers four major task categories and 51 fine-grained tasks, and it uses 765 high-difficulty test cases and a dual evaluation system to keep the evaluation fair and consistent, providing a comprehensive assessment of the "visual creative intelligence" of MLLMs.
  • aicpb.com product list
    The "AI Product List aicpb.com" ranking is led by AI Product List and published regularly by 30+ media outlets in the AI field. It is also the AI product ranking most often cited by authoritative media and experts, including China Newsweek, China Foundation News, Tencent Technology, Snowball, South China Morning Post, and Phoenix Technology. The list is issued once a month. In addition to the website lists (Global, China) and application lists (Global, China), there are overseas-expansion lists, growth and slowdown lists, review lists, mobile phone lists, and others.
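
The Chatbot Arena entry above says that votes from blind comparisons are turned into scores with an Elo mechanism. The following is a minimal sketch of a classic Elo update applied to pairwise votes, not the platform's actual implementation. The function names, the starting rating of 1000, the K-factor of 32, and the vote format are all assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Tuple

# One vote: (model_a, model_b, outcome_a), where outcome_a is
# 1.0 if model_a won, 0.0 if model_b won, and 0.5 for a tie.
Vote = Tuple[str, str, float]


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, outcome_a: float,
               k: float = 32.0) -> Tuple[float, float]:
    """Return the new (rating_a, rating_b) after a single comparison."""
    exp_a = expected_score(rating_a, rating_b)
    delta = k * (outcome_a - exp_a)
    # Whatever A gains, B loses, so the rating total is conserved.
    return rating_a + delta, rating_b - delta


def rank_models(votes: Iterable[Vote], initial: float = 1000.0,
                k: float = 32.0) -> List[Tuple[str, float]]:
    """Process votes in order and return models sorted by rating, best first."""
    ratings: Dict[str, float] = {}
    for model_a, model_b, outcome_a in votes:
        ra = ratings.setdefault(model_a, initial)  # assumed starting rating
        rb = ratings.setdefault(model_b, initial)
        ratings[model_a], ratings[model_b] = update_elo(ra, rb, outcome_a, k)
    return sorted(ratings.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical votes between three placeholder models.
    sample_votes = [
        ("model-x", "model-y", 1.0),
        ("model-y", "model-z", 0.5),
        ("model-x", "model-z", 1.0),
    ]
    for name, rating in rank_models(sample_votes):
        print(f"{name}: {rating:.1f}")
```

Because each update depends on the current ratings, the result of this sequential scheme depends on the order in which votes arrive. A production leaderboard may therefore fit all votes jointly or average over shuffled orders.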
The all-time list