Work in Progress

BGB 40/10

Exported from 14 runs (50 norms).

Model selection

10 / 25 models

Filters combine with OR within a category and AND across categories. For example, Open Source + Europe → only European open-source models.

Size
Type
Region
Provider
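
The filter rule described above (OR within a category, AND across categories) can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the `Model` class, the `matches` function, and the three catalog entries with their attribute values are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    # Hypothetical record; the fields mirror the four filter categories.
    name: str
    size: str
    type: str
    region: str
    provider: str


def matches(model: Model, selection: dict[str, set[str]]) -> bool:
    """Return True if the model passes every active category filter.

    Within a category, any selected value is accepted (OR).
    Across categories, all active filters must pass (AND).
    A category with no selected values imposes no constraint.
    """
    return all(
        getattr(model, category) in accepted
        for category, accepted in selection.items()
        if accepted
    )


# Placeholder catalog, invented for illustration only.
catalog = [
    Model("model-a", "large", "open source", "europe", "vendor-x"),
    Model("model-b", "large", "proprietary", "europe", "vendor-y"),
    Model("model-c", "small", "open source", "usa", "vendor-z"),
]

# Open Source + Europe: AND across the two categories.
selection = {"type": {"open source"}, "region": {"europe"}}
print([m.name for m in catalog if matches(m, selection)])

# Two regions selected: OR within the region category.
selection = {"type": {"open source"}, "region": {"europe", "usa"}}
print([m.name for m in catalog if matches(m, selection)])
```

With the placeholder catalog, the first selection keeps only `model-a`, and the second keeps `model-a` and `model-c`.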

Score table

| Rank | Model | Provider | Score | Net Correctness | Calibration | Hallucination rate | Distribution TP / FP / TN / FN |
|---|---|---|---|---|---|---|---|
| 1 | Gemini 3 Pro Preview | Google | 61.57% | -13.0 | 38.0% | 96.4% | 18 / 31 / 1 / 0 |
| 2 | Gemini 3 Flash Preview | Google | 60.23% | -12.0 | 38.0% | 96.6% | 19 / 31 / 0 / 0 |
| 3 | Claude Opus 4.6 | Anthropic | 59.36% | -15.0 | 38.0% | 87.1% | 16 / 31 / 3 / 0 |
| 4 | GPT-5 | OpenAI | 46.34% | -6.0 | 70.0% | 32.5% | 8 / 14 / 27 / 1 |
| 5 | Mistral Large 2512 | MistralAI | 46.03% | -8.0 | 56.0% | 55.9% | 13 / 21 / 15 / 1 |
| 6 | GPT-5.4 | OpenAI | 41.24% | 6.0 | 92.0% | 9.8% | 10 / 4 / 36 / 0 |
| 7 | Grok 4 | xAI | 37.23% | -23.0 | 34.0% | 82.5% | 10 / 33 / 7 / 0 |
| 8 | DeepSeek-V3.2 | DeepSeek | 35.11% | -19.0 | 48.0% | 52.4% | 6 / 25 / 18 / 1 |
| 9 | GPT-4.1 | OpenAI | 34.28% | -25.0 | 38.0% | 69.8% | 6 / 31 / 13 / 0 |
| 10 | GPT-3.5 | OpenAI | 16.78% | -47.0 | 6.0% | 93.8% | 0 / 47 / 3 / 0 |
Total (10 models)
Country: DE
Law: BGB
Norms: 50
Models: 25
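
The TP / FP / TN / FN counts listed for each model appear to determine two of the reported columns. The sketch below assumes Net Correctness = TP − FP and Calibration = (TP + TN) / total. Both formulas are inferred from the listed numbers, which they reproduce for all ten models, rather than taken from a documented definition. Score and Hallucination rate do not follow from the counts in an obvious way and are left out. The names `Counts`, `net_correctness`, and `calibration_pct` are hypothetical.

```python
from typing import NamedTuple


class Counts(NamedTuple):
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives


def net_correctness(c: Counts) -> int:
    # Assumed definition: true positives minus false positives.
    return c.tp - c.fp


def calibration_pct(c: Counts) -> float:
    # Assumed definition: share of TP and TN among all norms, in percent.
    total = sum(c)
    return 100 * (c.tp + c.tn) / total


# Counts copied from three rows of the score table.
rows = {
    "Gemini 3 Pro Preview": Counts(18, 31, 1, 0),
    "GPT-5": Counts(8, 14, 27, 1),
    "GPT-5.4": Counts(10, 4, 36, 0),
}

for name, c in rows.items():
    print(f"{name}: net={net_correctness(c)}, calibration={calibration_pct(c):.1f}%")
```

For these three rows the sketch yields net values of -13, -6, and 6 and calibration values of 38.0%, 70.0%, and 92.0%, matching the table.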