Exported from 10 run(s) (50 norms).

Column definitions: Score is the average text similarity to the reference solution (0-100%), measured via normalized Levenshtein distance. Net Correctness is the Net Correctness Index, (correct - incorrect) / total × 100. Calibration is how often the model makes the right decision between answering and abstaining. Cost is the total cost in USD for all API calls of the model.

| Rank | Model | Score | Net Correctness | Calibration | Cost |
|---|---|---|---|---|---|
| 1 | Claude Opus 4.5 (Anthropic) | 61.92% | -14.0 (✓21 ✗28 ○1) | 44.0% | $0.2763 (0.14x) |
| 2 | Gemini 3 Pro Preview (Google) | 61.57% | -26.0 (✓18 ✗31 ○1) | 38.0% | $1.6950 (0.83x) |
| 3 | Gemini 3 Flash Preview (Google) | 60.23% | -24.0 (✓19 ✗31 ○0) | 38.0% | $0.0215 (0.01x) |
| 4 | Gemini 2.5 Pro (Google) | 51.39% | -30.0 (✓17 ✗32 ○1) | 36.0% | $0.7644 (0.38x) |
| 5 | GPT-5 (OpenAI) | 46.34% | -12.0 (✓8 ✗14 ○28) | 70.0% | $2.0327 (1.00x) |
| 6 | Mistral Large 2512 (Mistral AI) | 46.03% | -16.0 (✓13 ✗21 ○16) | 56.0% | $0.0123 (0.01x) |
| 7 | GPT-5.1 (OpenAI) | 45.94% | 16.0 (✓10 ✗2 ○38) | 94.0% | $0.6596 (0.32x) |
| 8 | o3 (OpenAI) | 45.28% | -26.0 (✓7 ✗20 ○23) | 58.0% | $1.3632 (0.67x) |
| 9 | Claude Opus 4.1 (Anthropic) | 45.18% | -28.0 (✓9 ✗23 ○18) | 54.0% | $0.7597 (0.37x) |
| 10 | Claude Opus 4 (Anthropic) | 45.08% | -28.0 (✓8 ✗22 ○20) | 56.0% | $0.7636 (0.38x) |
| 11 | GPT-5.2 Thinking (OpenAI) | 43.17% | 18.0 (✓10 ✗1 ○39) | 98.0% | $0.7397 (0.36x) |
| 12 | Llama 4 Maverick (Meta Llama) | 42.68% | -64.0 (✓9 ✗41 ○0) | 18.0% | $0.0079 (0.00x) |
| 13 | GPT-5.2 Chat (OpenAI) | 40.72% | 6.0 (✓10 ✗7 ○33) | 86.0% | $0.3319 (0.16x) |
| 14 | Grok 4 (xAI) | 37.23% | -46.0 (✓10 ✗33 ○7) | 34.0% | $1.5813 (0.78x) |
| 15 | Kimi K2 Thinking (Moonshot AI) | 35.58% | -56.0 (✓8 ✗36 ○6) | 28.0% | $0.1544 (0.08x) |
| 16 | DeepSeek-V3.2 (DeepSeek) | 35.11% | -38.0 (✓6 ✗25 ○19) | 48.0% | $0.0069 (0.00x) |
| 17 | GPT-4.1 (OpenAI) | 34.28% | -50.0 (✓6 ✗31 ○13) | 38.0% | $0.0518 (0.03x) |
| 18 | Gemini 2.5 Flash (Google) | 33.65% | -58.0 (✓9 ✗38 ○3) | 24.0% | $0.0112 (0.01x) |
| 19 | Qwen3 Max (Qwen) | 26.71% | -50.0 (✓5 ✗30 ○15) | 40.0% | $0.0413 (0.02x) |
| 20 | Grok 4.1 Fast (xAI) | 21.61% | -88.0 (✓0 ✗44 ○6) | 12.0% | $0.0334 (0.02x) |
| Total (20 models) | | | | | $11.3078 (Σ $19.3460) |
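
The Score and Net Correctness definitions that accompany the table can be sketched in Python. This is a minimal illustration, not the leaderboard's actual implementation. The function names are hypothetical. Reading ✓/✗/○ as correct/incorrect/abstained and taking their sum as the total are assumptions, though they are consistent with the 50 norms per model. Dividing the edit distance by the longer string's length is also an assumption, since the source says only "normalized Levenshtein distance".

```python
def net_correctness(correct: int, incorrect: int, abstained: int) -> float:
    """Net Correctness Index: (correct - incorrect) / total * 100."""
    # Assumption: total = correct + incorrect + abstained.
    total = correct + incorrect + abstained
    if total == 0:
        raise ValueError("at least one evaluated item is required")
    # Multiply before dividing to avoid floating-point noise such as -14.000000000000002.
    return (correct - incorrect) * 100 / total


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via two-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (0 if ch_a == ch_b else 1)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def similarity_percent(answer: str, reference: str) -> float:
    """Text similarity in percent. Assumption: normalized by the longer string's length."""
    longest = max(len(answer), len(reference))
    if longest == 0:
        return 100.0  # two empty strings are treated as identical
    return (1 - levenshtein(answer, reference) / longest) * 100


# Counts taken from the Claude Opus 4.5 row: 21 correct, 28 incorrect, 1 abstained.
print(net_correctness(21, 28, 1))  # -14.0
```

Applied to the GPT-5.2 Thinking row (10 correct, 1 incorrect, 39 abstained), the same function gives 18.0, matching the table. This shows how a model that abstains often can reach a positive index despite a lower similarity score.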