Google’s newest open-model family brings a giant leap in reasoning, coding, and especially autonomous tool use. Two flagship sizes are live on AI·Collab with full AGENTIC tools: memory, knowledge, chat history, and more.
Gemma 4 31B and Gemma 4 26B A4B (instruction-tuned / “thinking” variants) are available on AI·Collab with native function calling enabled — the same AGENTIC experience you know from GPT-5, Claude 4.5+, and Gemini 3. Built on research from Gemini 3, Gemma 4 is Google DeepMind’s push for intelligence per parameter: a dense 31B flagship and a mixture-of-experts 26B model that activates only a fraction of weights per step — so you get frontier-level answers without always paying for a full dense run.
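A back-of-envelope note on the naming: if “A4B” follows the common mixture-of-experts convention of stating active parameters (an assumption on our part, not something the release spells out), the 26B model routes each token through roughly 4B of its 26B weights, and per-token compute scales with that active count:

```python
# Back-of-envelope sketch. Assumption: "A4B" means ~4B parameters active
# per token out of ~26B total (a common MoE naming convention, not
# confirmed here); per-token compute scales roughly with the active count.
total_params = 26e9   # assumed total parameter count
active_params = 4e9   # assumed active parameters per token
print(f"Active fraction per token: {active_params / total_params:.0%}")  # ~15%
```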
In the picker, look for Google: Gemma 4 31B and Google: Gemma 4 26B A4B. Model IDs include google/gemma-4-31b-it and google/gemma-4-26b-a4b-it — always check the live catalog for exact labels and credits.
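If you prefer to script against your workspace instead of using the picker, here is a minimal sketch assuming an OpenAI-compatible chat endpoint (the kind OpenWebUI-based deployments typically expose). The base URL and API key are placeholders, and the model ID should be checked against the live catalog:

```python
# Minimal sketch: calling Gemma 4 31B through an assumed OpenAI-compatible
# endpoint. Base URL and key are placeholders; verify both, plus the model
# ID, against your AI·Collab account and the live catalog.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-aicollab-host/api/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                        # placeholder key
)

response = client.chat.completions.create(
    model="google/gemma-4-31b-it",  # verify the exact ID in the catalog
    messages=[{"role": "user", "content": "In two sentences, what changed in Gemma 4?"}],
)
print(response.choices[0].message.content)
```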
On τ2-bench (agentic tool use, retail scenario published by Google), Gemma 4 jumps from single-digit baselines for Gemma 3 to roughly 85–86% for the 26B and 31B instruction models — a step change in how reliably the model can follow multi-step workflows. In AI·Collab, AGENTIC mode means OpenWebUI injects our native tool suite: memories, knowledge bases, chat history, notes, and structured actions — so the model can decide when to recall, search, or organize instead of you clicking everything by hand.
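To make the tool-use claim concrete, here is a hedged sketch of one native function-calling round trip. The `search_knowledge` tool below is hypothetical, defined only for illustration; in AGENTIC mode the platform registers its own memory, knowledge, and history tools server-side, so you would not normally define them yourself:

```python
# Hedged sketch of native function calling with the 26B A4B model.
# "search_knowledge" is a hypothetical tool invented for this example;
# AGENTIC mode supplies the real tool suite for you.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://your-aicollab-host/api/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",                        # placeholder key
)

tools = [{
    "type": "function",
    "function": {
        "name": "search_knowledge",  # hypothetical tool name
        "description": "Search the user's knowledge base for relevant passages.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

response = client.chat.completions.create(
    model="google/gemma-4-26b-a4b-it",  # verify the exact ID in the catalog
    messages=[{"role": "user", "content": "What did my notes say about Q3 pricing?"}],
    tools=tools,
)

# The model decides whether a tool call is warranted. If it is, the
# arguments arrive as a JSON string for your code to parse and execute.
msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    print(call.function.name, json.loads(call.function.arguments))
else:
    print(msg.content)
```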
Highlights from Google DeepMind’s published Gemma 4 comparisons versus Gemma 3 27B — rounded for readability. Full methodology and additional benchmarks are in the official model card.
| Benchmark | Gemma 4 31B IT | Gemma 4 26B A4B IT | Gemma 3 27B IT |
|---|---|---|---|
| Arena AI (text, Elo score) | 1452 | 1441 | 1365 |
| MMMLU (multilingual Q&A) | 85.2% | 82.6% | 67.6% |
| MMMU Pro (multimodal reasoning) | 76.9% | 73.8% | 49.7% |
| AIME 2026 (mathematics) | 89.2% | 88.3% | 20.8% |
| LiveCodeBench v6 (coding) | 80.0% | 77.1% | 29.1% |
| GPQA Diamond (science) | 84.3% | 82.3% | 42.4% |
| τ2-bench — agentic tool use (retail) | 86.4% | 85.5% | 6.6% |
Source: Google DeepMind Gemma 4 overview and model documentation (figures as published; scenarios and dates may be updated by Google).
For architecture detail, license terms, or local run options, see the independent references from Google: the official Gemma 4 overview, model card, and documentation.
Benchmarks and capabilities describe Google’s published evaluations; your results depend on prompt, settings, and task. Pricing and availability are always defined by the in-product model catalog.
Related reading:
- Discover how frontier AI models can now manage your memories, search knowledge bases, and access chat history, without you asking.
- A practical guide to selecting and using AI models effectively with AI·Collab.
- Credits explained: real cost data for GPT-5.4, Claude Opus, Gemini 3.1 Pro, Perplexity Sonar Pro and more. See how far 3,000 or 15,000 credits take you.

Get started today. Access models from OpenAI, Google, Anthropic, Grok and more.
GDPR compliant · Zero data retention · Cancel anytime