Artificial Analysis

3 months ago

Artificial Analysis positions itself as the world’s most cited independent AI model evaluator. The site’s core product is the Artificial Analysis Intelligence Index v3.0, a composite score that ranks frontier language models by running the same ten benchmarks on every model on the evaluator’s own GPU cluster. The benchmarks (MMLU-Pro, GPQA Diamond, Humanity’s Last Exam, LiveCodeBench, SciCode, AIME 2025, IFBench, AA-LCR, Terminal-Bench Hard and τ²-Bench Telecom) cover reasoning, knowledge, coding, agentic terminal use, long-context retrieval, instruction following and competition math. Each model is tested on identical hardware with identical prompts, and no vendor-supplied numbers are accepted, which supports the site’s claim to neutrality. Results are published as sortable, filterable tables that show an overall Intelligence Index plus separate Coding and Agentic sub-indexes, letting users see at a glance which model is strongest in a given domain.

Beyond raw intelligence, the platform continuously measures real-world API performance: output tokens per second, time to first token (including reasoning models’ “thinking” time), end-to-end latency for a 500-token response, and the USD cost of running the full benchmark suite. Price curves and token-efficiency plots are provided so buyers can balance capability against budget. The data set is released under CC BY 4.0, encouraging journalists, researchers and enterprises to cite or embed the findings. In short, Artificial Analysis is a one-stop, vendor-agnostic dashboard that turns opaque vendor claims into transparent, reproducible numbers, updated whenever a new model appears.
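The two kinds of numbers the review describes, a composite score over the ten benchmarks and throughput measured from token timestamps, can be sketched in a few lines. This is an illustrative sketch only: the equal weighting, the `intelligence_index` function name and the 0–100 score scale are assumptions, not Artificial Analysis’s published v3.0 methodology.

```python
from statistics import mean

# The ten benchmarks named in the review.
BENCHMARKS = [
    "MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam", "LiveCodeBench",
    "SciCode", "AIME 2025", "IFBench", "AA-LCR",
    "Terminal-Bench Hard", "τ²-Bench Telecom",
]

def intelligence_index(scores: dict[str, float]) -> float:
    """Composite score as an equal-weight mean of the ten benchmark
    scores (an assumed weighting, for illustration only)."""
    missing = [b for b in BENCHMARKS if b not in scores]
    if missing:
        raise ValueError(f"missing benchmark scores: {missing}")
    return mean(scores[b] for b in BENCHMARKS)

def output_tokens_per_second(output_tokens: int,
                             first_token_s: float,
                             last_token_s: float) -> float:
    """Throughput over the generation phase, i.e. after the first
    token arrives; time before first_token_s is the TTFT, which for
    reasoning models includes 'thinking' time."""
    return output_tokens / (last_token_s - first_token_s)
```

For example, a model scoring 50 on every benchmark gets an index of 50.0 under this assumed equal weighting, and a 500-token response whose first token arrives at t = 1.0 s and last at t = 11.0 s streams at 50 tokens per second.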

Copyright © 2025 CogAINav.com. All rights reserved.