Artificial Analysis

3 months ago

Artificial Analysis positions itself as the world’s most cited independent AI model evaluator. The site’s core product is the Artificial Analysis Intelligence Index v3.0, a composite score that ranks frontier language models by running the same ten benchmarks on every model on the evaluator’s own GPU cluster. The benchmarks (MMLU-Pro, GPQA Diamond, Humanity’s Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard and τ²-Bench Telecom) cover reasoning, knowledge, coding, agentic terminal use, long-context retrieval, instruction following and competition math. Each model is tested on identical hardware with identical prompts, and no vendor-supplied numbers are accepted, which supports the site’s claim to neutrality. Results are published as sortable, filterable tables that show an overall Intelligence Index plus separate Coding and Agentic sub-indexes, letting users see at a glance which model is strongest in a given domain.

Beyond raw intelligence, the platform continuously measures real-world API performance: output tokens per second, time to first token (including reasoning models’ “thinking” time), end-to-end latency for a 500-token response, and the USD cost of running the full benchmark suite. Price curves and token-efficiency plots are provided so buyers can balance capability against budget. The data set is released under CC BY 4.0, encouraging journalists, researchers and enterprises to cite or embed the findings. In short, Artificial Analysis is a one-stop, vendor-agnostic dashboard that turns opaque vendor claims into transparent, reproducible numbers, updated whenever a new model appears.
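The two kinds of numbers the review describes, a composite score over the ten benchmarks and throughput measured from token timestamps, can be sketched in a few lines. This is an illustrative sketch only: the equal weighting, the `intelligence_index` function name and the 0–100 score scale are assumptions, not Artificial Analysis’s published v3.0 methodology.

```python
from statistics import mean

# The ten benchmarks named in the review.
BENCHMARKS = [
    "MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam", "LiveCodeBench",
    "SciCode", "AIME 2025", "IFBench", "AA-LCR",
    "Terminal-Bench Hard", "τ²-Bench Telecom",
]

def intelligence_index(scores: dict[str, float]) -> float:
    """Composite score as an equal-weight mean of the ten benchmark
    scores (an assumed weighting, for illustration only)."""
    missing = [b for b in BENCHMARKS if b not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    return mean(scores[b] for b in BENCHMARKS)

def output_tokens_per_second(output_tokens: int,
                             first_token_s: float,
                             last_token_s: float) -> float:
    """Throughput over the generation phase, i.e. after the first
    token arrives; time before first_token_s is the TTFT, which for
    reasoning models includes 'thinking' time."""
    return output_tokens / (last_token_s - first_token_s)
```

For example, a model scoring 50 on every benchmark gets an index of 50.0 under this assumed equal weighting, and a 500-token response whose first token arrives at t = 1.0 s and last at t = 11.0 s streams at 50 tokens per second.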

Copyright © 2025 CogAINav.com. All rights reserved.