Foundation Models

2026-02-09

Quick index of official model cards and system cards for major closed (frontier) and open-weights model families. Each entry includes the standard BibTeX citation following our BibTeX guide. Models are grouped by provider, and each model generation has its own entry. When citing a foundation model in a paper (e.g., as a baseline or backbone), use the canonical citation from this page; see the paper writing guide for more details.
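To use an entry, copy it into your bibliography file and cite it by its key. A minimal LaTeX sketch (the file name refs.bib and the surrounding sentence are illustrative, not prescribed by the paper writing guide):

\documentclass{article}
\begin{document}
We use GPT-4 \cite{openai2023gpt4} as a baseline and
fine-tune Llama 3.1 \cite{grattafiori2024llama}.
\bibliographystyle{plain}
\bibliography{refs}  % refs.bib holds entries copied from this page
\end{document}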

OpenAI

GPT-3 — Language Models are Few-Shot Learners

05/2020 · OpenAI · brown2020language

175B-parameter autoregressive LM demonstrating strong few-shot performance across NLP tasks. Seminal because it revealed emergent in-context learning from scale, shifting the NLP paradigm from fine-tuning to prompting

Page Paper
@inproceedings{brown2020language,
  title     = {{Language Models are Few-Shot Learners}},
  author    = {Brown, T. and Mann, B. and Ryder, N. and Subbiah, M. and Kaplan, J. and Dhariwal, P. and others},
  booktitle = {{Advances in Neural Information Processing Systems (NeurIPS)}},
  year      = {2020}
}

InstructGPT — Training Language Models to Follow Instructions with Human Feedback

03/2022 · OpenAI · ouyang2022training

Introduced RLHF to align GPT-3 with human intent, forming the basis for ChatGPT. Seminal because it established the instruction-following alignment paradigm adopted by virtually every subsequent chat model

Page Paper
@inproceedings{ouyang2022training,
  title     = {{Training Language Models to Follow Instructions with Human Feedback}},
  author    = {Ouyang, L. and Wu, J. and Jiang, X. and Almeida, D. and Wainwright, C. and Mishkin, P. and others},
  booktitle = {{Advances in Neural Information Processing Systems (NeurIPS)}},
  year      = {2022}
}

GPT-4 — GPT-4 Technical Report

03/2023 · OpenAI · openai2023gpt4

Multimodal large language model achieving human-level performance on professional and academic benchmarks. Seminal because it defined the frontier multimodal LLM benchmark and catalyzed mainstream enterprise and developer adoption of LLMs

Page Paper
@misc{openai2023gpt4,
  title         = {{GPT-4 Technical Report}},
  author        = {OpenAI},
  year          = {2023},
  eprint        = {2303.08774},
  archivePrefix = {arXiv}
}

GPT-4o — GPT-4o System Card

08/2024 · OpenAI · openai2024gpt4o

Omnimodal model natively processing text, audio, image, and video; also covers GPT-4o mini

Page Paper
@misc{openai2024gpt4o,
  title         = {{GPT-4o System Card}},
  author        = {OpenAI},
  year          = {2024},
  eprint        = {2410.21276},
  archivePrefix = {arXiv}
}

o1 — OpenAI o1 System Card

12/2024 · OpenAI · openai2024o1

Large reasoning model trained with large-scale reinforcement learning to perform an extended chain of thought before responding

Page Paper
@misc{openai2024o1,
  title         = {{OpenAI o1 System Card}},
  author        = {OpenAI},
  year          = {2024},
  eprint        = {2412.16720},
  archivePrefix = {arXiv}
}

o3-mini — OpenAI o3-mini System Card

01/2025 · OpenAI · openai2025o3mini

Smaller reasoning model with adjustable reasoning effort; no arXiv paper

Page Paper
@misc{openai2025o3mini,
  title = {{OpenAI o3-mini System Card}},
  author = {OpenAI},
  year  = {2025},
  url   = {https://openai.com/index/o3-mini-system-card/},
  note  = {Accessed February 9, 2026}
}

GPT-4.5 — GPT-4.5 System Card

02/2025 · OpenAI · openai2025gpt45

Largest pre-trained model focused on broad knowledge and reduced hallucinations; research preview emphasizing unsupervised learning scale

Page Paper
@misc{openai2025gpt45,
  title  = {{GPT-4.5 System Card}},
  author = {OpenAI},
  year   = {2025},
  url    = {https://openai.com/index/gpt-4-5-system-card/},
  note   = {Accessed February 9, 2026}
}

o3 — OpenAI o3 and o4-mini System Card

04/2025 · OpenAI · openai2025o3

Most powerful reasoning model with full tool use (browsing, code, images); first system card under Preparedness Framework v2

Page Paper
@misc{openai2025o3,
  title  = {{OpenAI o3 and o4-mini System Card}},
  author = {OpenAI},
  year   = {2025},
  url    = {https://openai.com/index/o3-o4-mini-system-card/},
  note   = {Accessed February 9, 2026}
}

o4-mini — OpenAI o3 and o4-mini System Card

04/2025 · OpenAI · openai2025o4mini

Cost-efficient reasoning model excelling at math and coding; achieves 99.5% on AIME 2025 with tool use

Page Paper
@misc{openai2025o4mini,
  title  = {{OpenAI o3 and o4-mini System Card}},
  author = {OpenAI},
  year   = {2025},
  url    = {https://openai.com/index/o3-o4-mini-system-card/},
  note   = {Accessed February 9, 2026}
}

GPT-5 — GPT-5 System Card

08/2025 · OpenAI · openai2025gpt5

Unified system with a real-time router dispatching across fast (main) and deep-reasoning (thinking) sub-models, replacing GPT-4o and o3; significant reduction in hallucinations and sycophancy

Page Paper
@misc{openai2025gpt5,
  title  = {{GPT-5 System Card}},
  author = {OpenAI},
  year   = {2025},
  url    = {https://openai.com/index/gpt-5-system-card/},
  note   = {Accessed February 9, 2026}
}

GPT-5.2 — GPT-5.2 System Card

12/2025 · OpenAI · openai2025gpt52

Most capable model for professional knowledge work; achieves 100% on AIME 2025 competition math

Page
@misc{openai2025gpt52,
  title  = {{GPT-5.2 System Card}},
  author = {OpenAI},
  year   = {2025},
  url    = {https://openai.com/index/introducing-gpt-5-2/},
  note   = {Accessed February 9, 2026}
}

Meta

LLaMA — LLaMA: Open and Efficient Foundation Language Models

02/2023 · Meta · touvron2023llama

First open-weights LLM family (7B--65B) competitive with much larger proprietary models. Seminal because it kicked off the open-weights movement, enabling the entire ecosystem of community fine-tuning and open LLM research

Page HF Paper
@misc{touvron2023llama,
  title         = {{LLaMA: Open and Efficient Foundation Language Models}},
  author        = {Touvron, H. and Lavril, T. and Izacard, G. and Martinet, X. and Lachaux, M. and Lacroix, T. and others},
  year          = {2023},
  eprint        = {2302.13971},
  archivePrefix = {arXiv}
}

Llama 2 — Llama 2: Open Foundation and Fine-Tuned Chat Models

07/2023 · Meta · touvron2023llama2

Open-weights 7B--70B models with RLHF chat variants, widely adopted for fine-tuning

Page HF Paper
@misc{touvron2023llama2,
  title         = {{Llama 2: Open Foundation and Fine-Tuned Chat Models}},
  author        = {Touvron, H. and Martin, L. and Stone, K. and Albert, P. and Almahairi, A. and Babaei, Y. and others},
  year          = {2023},
  eprint        = {2307.09288},
  archivePrefix = {arXiv}
}

Llama 3.1 — The Llama 3 Herd of Models

07/2024 · Meta · grattafiori2024llama

Canonical paper for the Llama 3 family (8B, 70B, 405B); covers pre-training, post-training, multimodal, and safety

Page HF Paper
@misc{grattafiori2024llama,
  title         = {{The Llama 3 Herd of Models}},
  author        = {Grattafiori, A. and Dubey, A. and Jauhri, A. and Pandey, A. and Kadian, A. and Al-Dahle, A. and others},
  year          = {2024},
  eprint        = {2407.21783},
  archivePrefix = {arXiv}
}

Llama 3.2 — Llama 3.2 Model Card

09/2024 · Meta · meta2024llama32

Lightweight text models (1B, 3B) and multimodal vision-language models (11B, 90B); no standalone paper, cite the Herd paper or model card

Page HF
@misc{meta2024llama32,
  title  = {{Llama 3.2: Revolutionizing Edge AI and Vision with Open, Customizable Models}},
  author = {{AI@Meta}},
  year   = {2024},
  url    = {https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices/},
  note   = {Accessed February 9, 2026}
}

Llama 3.3 — Llama 3.3 Model Card

12/2024 · Meta · meta2024llama33

70B instruction-tuned model matching Llama 3.1 405B quality at lower cost; no standalone paper

Page HF
@misc{meta2024llama33,
  title  = {{Llama 3.3 Model Card}},
  author = {{AI@Meta}},
  year   = {2024},
  url    = {https://github.com/meta-llama/llama-models/blob/main/models/llama3_3/MODEL_CARD.md},
  note   = {Accessed February 9, 2026}
}

Llama 4 — Llama 4 Model Card

04/2025 · Meta · meta2025llama4

Natively multimodal MoE models (Scout 17Bx16E, Maverick 17Bx128E); no technical paper published

Page HF
@misc{meta2025llama4,
  title  = {{Llama 4 Model Card}},
  author = {{AI@Meta}},
  year   = {2025},
  url    = {https://github.com/meta-llama/llama-models/blob/main/models/llama4/MODEL_CARD.md},
  note   = {Accessed February 9, 2026}
}

Anthropic

Claude 3 — The Claude 3 Model Family: Opus, Sonnet, Haiku

03/2024 · Anthropic · anthropic2024claude3

Three-tier multimodal model family achieving state-of-the-art on GPQA, MMLU, and MMMU

Page Paper
@misc{anthropic2024claude3,
  title  = {{The Claude 3 Model Family: Opus, Sonnet, Haiku}},
  author = {Anthropic},
  year   = {2024},
  url    = {https://www.anthropic.com/news/claude-3-family},
  note   = {Accessed February 9, 2026}
}

Claude 3.5 — The Claude 3.5 Model Family Addendum

10/2024 · Anthropic · anthropic2024claude35

Updated Claude 3 model card with Claude 3.5 Sonnet and Haiku evaluations

Page Paper
@misc{anthropic2024claude35,
  title  = {{The Claude 3.5 Model Family Addendum}},
  author = {Anthropic},
  year   = {2024},
  url    = {https://www.anthropic.com/news/claude-3-5-sonnet},
  note   = {Accessed February 9, 2026}
}

Claude 3.7 Sonnet — Claude 3.7 Sonnet System Card

02/2025 · Anthropic · anthropic2025claude37

First hybrid reasoning model from Anthropic with configurable extended thinking (up to 128K tokens); visible chain-of-thought and dual-mode operation

Page Paper
@misc{anthropic2025claude37,
  title  = {{Claude 3.7 Sonnet System Card}},
  author = {Anthropic},
  year   = {2025},
  url    = {https://www.anthropic.com/claude-3-7-sonnet-system-card},
  note   = {Accessed February 9, 2026}
}

Claude Opus 4 — Claude 4 System Card

05/2025 · Anthropic · anthropic2025opus4

Most powerful Anthropic model capable of autonomous multi-hour workflows; deployed under AI Safety Level 3 Standard

Page Paper
@misc{anthropic2025opus4,
  title  = {{Claude 4 System Card}},
  author = {Anthropic},
  year   = {2025},
  url    = {https://www.anthropic.com/claude-4-system-card},
  note   = {Accessed February 9, 2026}
}

Claude Sonnet 4 — Claude 4 System Card

05/2025 · Anthropic · anthropic2025sonnet4

General-purpose successor to Sonnet 3.7 with improved coding and hybrid thinking; deployed under AI Safety Level 2 Standard

Page Paper
@misc{anthropic2025sonnet4,
  title  = {{Claude 4 System Card}},
  author = {Anthropic},
  year   = {2025},
  url    = {https://www.anthropic.com/claude-4-system-card},
  note   = {Accessed February 9, 2026}
}

Claude Opus 4.5 — Claude Opus 4.5 System Card

11/2025 · Anthropic · anthropic2025opus45

State-of-the-art for coding, agents, and computer use; strong at real-world software engineering, deep research, and agentic workflows

Page Paper
@misc{anthropic2025opus45,
  title  = {{Claude Opus 4.5 System Card}},
  author = {Anthropic},
  year   = {2025},
  url    = {https://www.anthropic.com/claude-opus-4-5-system-card},
  note   = {Accessed February 9, 2026}
}

Google / DeepMind

PaLM 2 — PaLM 2 Technical Report

05/2023 · Google · anil2023palm

Compute-optimal multilingual model powering Bard/Gemini with improved reasoning and coding

Page Paper
@misc{anil2023palm,
  title         = {{PaLM 2 Technical Report}},
  author        = {Anil, R. and Dai, A. and Firat, O. and Johnson, M. and Lepikhin, D. and Passos, A. and others},
  year          = {2023},
  eprint        = {2305.10403},
  archivePrefix = {arXiv}
}

Gemini 1.0 — Gemini: A Family of Highly Capable Multimodal Models

12/2023 · Google DeepMind · geminiteam2023gemini

First natively multimodal frontier model family (Ultra, Pro, Nano). Seminal because it pioneered training multimodality from scratch rather than bolting vision onto a text model, setting the direction for the field

Page Paper
@misc{geminiteam2023gemini,
  title         = {{Gemini: A Family of Highly Capable Multimodal Models}},
  author        = {{Gemini Team} and Anil, R. and Borgeaud, S. and Alayrac, J. and Yu, J. and Soricut, R. and others},
  year          = {2023},
  eprint        = {2312.11805},
  archivePrefix = {arXiv}
}

Gemini 1.5 — Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context

02/2024 · Google DeepMind · geminiteam2024gemini15

Long-context MoE model supporting up to 10M tokens with near-perfect recall

Page Paper
@misc{geminiteam2024gemini15,
  title         = {{Gemini 1.5: Unlocking Multimodal Understanding Across Millions of Tokens of Context}},
  author        = {{Gemini Team} and Reid, M. and Savinov, N. and Teplyashin, D. and Lepikhin, D. and Lillicrap, T. and others},
  year          = {2024},
  eprint        = {2403.05530},
  archivePrefix = {arXiv}
}

Gemini 2.0 — Gemini 2.0 Blog

12/2024 · Google DeepMind · google2024gemini2

Agentic multimodal model with native tool use and multimodal output; no standalone technical report

Page
@misc{google2024gemini2,
  title  = {{Gemini 2.0: Our New AI Model for the Agentic Era}},
  author = {{Google DeepMind}},
  year   = {2024},
  url    = {https://blog.google/technology/google-deepmind/google-gemini-ai-update-december-2024/},
  note   = {Accessed February 9, 2026}
}

Gemma — Gemma: Open Models Based on Gemini Research and Technology

02/2024 · Google DeepMind · gemmateam2024gemma

Open-weights 2B/7B models derived from Gemini research

Page HF Paper
@misc{gemmateam2024gemma,
  title         = {{Gemma: Open Models Based on Gemini Research and Technology}},
  author        = {{Gemma Team} and Mesnard, T. and Hardin, C. and Dadashi, R. and Bhupatiraju, S. and Pathak, S. and others},
  year          = {2024},
  eprint        = {2403.08295},
  archivePrefix = {arXiv}
}

Gemma 2 — Gemma 2: Improving Open Language Models at a Practical Size

06/2024 · Google DeepMind · gemmateam2024gemma2

Knowledge-distilled 2B/9B/27B models with improved efficiency

Page HF Paper
@misc{gemmateam2024gemma2,
  title         = {{Gemma 2: Improving Open Language Models at a Practical Size}},
  author        = {{Gemma Team} and Riviere, M. and Pathak, S. and Sessa, P. and Hardin, C. and Bhupatiraju, S. and others},
  year          = {2024},
  eprint        = {2408.00118},
  archivePrefix = {arXiv}
}

Gemma 3 — Gemma 3 Technical Report

03/2025 · Google DeepMind · gemmateam2025gemma3

Multimodal 1B--27B models with 128K context, hybrid attention, and vision understanding

Page HF Paper
@misc{gemmateam2025gemma3,
  title         = {{Gemma 3 Technical Report}},
  author        = {{Gemma Team} and Kamath, A. and Ferret, J. and Pathak, S. and Vieillard, N. and Ramé, A. and others},
  year          = {2025},
  eprint        = {2503.19786},
  archivePrefix = {arXiv}
}

Gemini 2.5 Pro — Gemini 2.5 Technical Report

06/2025 · Google DeepMind · geminiteam2025gemini25

State-of-the-art thinking model with sparse MoE architecture, 1M token context, and native multimodal support; excels at coding, reasoning, and complex multi-source problems

Page Paper
@misc{geminiteam2025gemini25,
  title         = {{Gemini 2.5 Technical Report}},
  author        = {{Gemini Team} and others},
  year          = {2025},
  eprint        = {2507.06261},
  archivePrefix = {arXiv}
}

Gemini 2.5 Flash — Gemini 2.5 Flash Model Card

06/2025 · Google DeepMind · geminiteam2025gemini25flash

Hybrid reasoning model with controllable thinking budget for cost-efficient deployment; part of the 2.5 family

Page
@misc{geminiteam2025gemini25flash,
  title  = {{Gemini 2.5 Flash Model Card}},
  author = {{Google DeepMind}},
  year   = {2025},
  url    = {https://blog.google/products/gemini/gemini-2-5-model-family-expands/},
  note   = {Accessed February 9, 2026}
}

DeepSeek

DeepSeek-V2 — DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model

05/2024 · DeepSeek · deepseek2024v2

236B MoE model (21B active) with Multi-head Latent Attention for efficient inference

Page HF Paper
@misc{deepseek2024v2,
  title         = {{DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model}},
  author        = {{DeepSeek-AI} and Liu, A. and Feng, B. and Wang, B. and Wang, B. and Liu, B. and others},
  year          = {2024},
  eprint        = {2405.04434},
  archivePrefix = {arXiv}
}

DeepSeek-Coder-V2 — DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence

06/2024 · DeepSeek · zhu2024deepseek

Code-specialized MoE model competitive with GPT-4 Turbo on coding benchmarks

Page HF Paper
@misc{zhu2024deepseek,
  title         = {{DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence}},
  author        = {Zhu, Q. and Guo, D. and Shao, Z. and Yang, D. and Wang, P. and Xu, R. and others},
  year          = {2024},
  eprint        = {2406.11931},
  archivePrefix = {arXiv}
}

DeepSeek-V3 — DeepSeek-V3 Technical Report

12/2024 · DeepSeek · deepseek2024v3

671B MoE model (37B active) trained on 14.8T tokens for only 2.8M H800 GPU hours, rivaling frontier closed models

Page HF Paper
@misc{deepseek2024v3,
  title         = {{DeepSeek-V3 Technical Report}},
  author        = {{DeepSeek-AI} and Liu, A. and Feng, B. and Xue, B. and Wang, B. and Wu, B. and others},
  year          = {2024},
  eprint        = {2412.19437},
  archivePrefix = {arXiv}
}

DeepSeek-R1 — DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

01/2025 · DeepSeek · deepseek2025r1

Reasoning model trained with pure RL, matching OpenAI o1 on math and code benchmarks. Seminal because it showed reinforcement learning alone can elicit chain-of-thought reasoning, opening the open-source reasoning-model paradigm

Page HF Paper
@misc{deepseek2025r1,
  title         = {{DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning}},
  author        = {{DeepSeek-AI} and Guo, D. and Yang, D. and Zhang, H. and Song, J. and Zhang, R. and others},
  year          = {2025},
  eprint        = {2501.12948},
  archivePrefix = {arXiv}
}

Mistral AI

Mistral 7B — Mistral 7B

10/2023 · Mistral AI · jiang2023mistral

Compact 7B model outperforming Llama 2 13B on all benchmarks using sliding window attention and GQA. Seminal because it proved small, well-engineered models could rival much larger ones and launched the European open-weights ecosystem

Page HF Paper
@misc{jiang2023mistral,
  title         = {{Mistral 7B}},
  author        = {Jiang, A. and Sablayrolles, A. and Mensch, A. and Bamford, C. and Chaplot, D. and de las Casas, D. and others},
  year          = {2023},
  eprint        = {2310.06825},
  archivePrefix = {arXiv}
}

Mixtral 8x7B — Mixtral of Experts

01/2024 · Mistral AI · jiang2024mixtral

Sparse MoE with 8 experts (12.9B active of 46.7B total), matching or beating GPT-3.5 and Llama 2 70B. Seminal because it brought sparse Mixture-of-Experts into the open-weights mainstream, making MoE the go-to efficiency architecture

Page HF Paper
@misc{jiang2024mixtral,
  title         = {{Mixtral of Experts}},
  author        = {Jiang, A. and Sablayrolles, A. and Roux, A. and Mensch, A. and Savary, B. and Bamford, C. and others},
  year          = {2024},
  eprint        = {2401.04088},
  archivePrefix = {arXiv}
}

Mixtral 8x22B — Mixtral 8x22B Model Card

04/2024 · Mistral AI · mistral2024mixtral8x22b

Scaled MoE model (39B active of 141B total); no standalone paper, cite Mixtral paper or model card

Page HF
@misc{mistral2024mixtral8x22b,
  title  = {{Cheaper, Better, Faster, Stronger -- Mixtral 8x22B}},
  author = {{Mistral AI}},
  year   = {2024},
  url    = {https://mistral.ai/news/mixtral-8x22b/},
  note   = {Accessed February 9, 2026}
}

Mistral Large 2 — Mistral Large 2 Blog

07/2024 · Mistral AI · mistral2024large2

123B dense model with 128K context, strong on code and multilingual tasks; no technical paper

Page
@misc{mistral2024large2,
  title  = {{Mistral Large 2}},
  author = {{Mistral AI}},
  year   = {2024},
  url    = {https://mistral.ai/news/mistral-large-2407/},
  note   = {Accessed February 9, 2026}
}

Mistral Small 3 — Mistral Small 3 Blog

01/2025 · Mistral AI · mistral2025small3

Efficient 24B model balancing latency and quality for edge/on-device use; no technical paper

Page HF
@misc{mistral2025small3,
  title  = {{Mistral Small 3}},
  author = {{Mistral AI}},
  year   = {2025},
  url    = {https://mistral.ai/news/mistral-small-3/},
  note   = {Accessed February 9, 2026}
}

Alibaba / Qwen

Qwen — Qwen Technical Report

09/2023 · Alibaba · bai2023qwen

First generation of Qwen models (1.8B--72B) with strong multilingual and tool-use capabilities

Page HF Paper
@misc{bai2023qwen,
  title         = {{Qwen Technical Report}},
  author        = {Bai, J. and Bai, S. and Chu, Y. and Cui, Z. and Dang, K. and Deng, X. and others},
  year          = {2023},
  eprint        = {2309.16609},
  archivePrefix = {arXiv}
}

Qwen2 — Qwen2 Technical Report

06/2024 · Alibaba · yang2024qwen2

Second generation (0.5B--72B) with GQA and expanded multilingual support

Page HF Paper
@misc{yang2024qwen2,
  title         = {{Qwen2 Technical Report}},
  author        = {Yang, A. and Yang, B. and Hui, B. and Zheng, B. and Yu, B. and Zhou, C. and others},
  year          = {2024},
  eprint        = {2407.10671},
  archivePrefix = {arXiv}
}

Qwen2.5 — Qwen2.5 Technical Report

12/2024 · Alibaba · qwen2024qwen25

Flagship open-weights family (0.5B--72B + MoE), trained on 18T tokens, competitive with Llama 3.1 405B at 72B scale

Page HF Paper
@misc{qwen2024qwen25,
  title         = {{Qwen2.5 Technical Report}},
  author        = {{Qwen Team} and Yang, A. and Yang, B. and Hui, B. and Zheng, B. and Yu, B. and others},
  year          = {2024},
  eprint        = {2412.15115},
  archivePrefix = {arXiv}
}

Qwen2.5-Coder — Qwen2.5-Coder Technical Report

09/2024 · Alibaba · hui2024qwen25coder

Code-specialized series trained on 5.5T code tokens, matching GPT-4o on coding tasks

Page HF Paper
@misc{hui2024qwen25coder,
  title         = {{Qwen2.5-Coder Technical Report}},
  author        = {Hui, B. and Yang, J. and Cui, Z. and Yang, J. and Liu, D. and Zhang, L. and others},
  year          = {2024},
  eprint        = {2409.12186},
  archivePrefix = {arXiv}
}

QwQ — QwQ: Reflect Deeply on the Boundaries of the Unknown

11/2024 · Alibaba · qwen2024qwq

32B reasoning model derived from Qwen2.5 with extended chain-of-thought; no standalone paper

Page HF
@misc{qwen2024qwq,
  title  = {{QwQ: Reflect Deeply on the Boundaries of the Unknown}},
  author = {{Qwen Team}},
  year   = {2024},
  url    = {https://qwenlm.github.io/blog/qwq-32b-preview/},
  note   = {Accessed February 9, 2026}
}

Microsoft

Phi-1 — Textbooks Are All You Need

06/2023 · Microsoft Research · gunasekar2023textbooks

1.3B code model trained on synthetic "textbook-quality" data, achieving strong coding performance. Seminal because it proved data quality can substitute for scale, pioneering the synthetic-data paradigm that influenced nearly every small-model effort since

Page HF Paper
@misc{gunasekar2023textbooks,
  title         = {{Textbooks Are All You Need}},
  author        = {Gunasekar, S. and Zhang, Y. and Aneja, J. and Mendes, C. and Del Giorno, A. and Gopi, S. and others},
  year          = {2023},
  eprint        = {2306.11644},
  archivePrefix = {arXiv}
}

Phi-1.5 — Textbooks Are All You Need II: phi-1.5 Technical Report

09/2023 · Microsoft Research · li2023textbooks

1.3B model extending synthetic data approach to commonsense reasoning

Page HF Paper
@misc{li2023textbooks,
  title         = {{Textbooks Are All You Need II: phi-1.5 Technical Report}},
  author        = {Li, Y. and Bubeck, S. and Eldan, R. and Del Giorno, A. and Gunasekar, S. and Lee, Y. and others},
  year          = {2023},
  eprint        = {2309.05463},
  archivePrefix = {arXiv}
}

Phi-3 — Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

04/2024 · Microsoft Research · abdin2024phi3

3.8B model rivaling Mixtral 8x7B using curated data and innovative training recipes

Page HF Paper
@misc{abdin2024phi3,
  title         = {{Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone}},
  author        = {Abdin, M. and Jacobs, S. and Awan, A. and Aneja, J. and Awadallah, A. and Awadalla, H. and others},
  year          = {2024},
  eprint        = {2404.14219},
  archivePrefix = {arXiv}
}

Phi-4 — Phi-4 Technical Report

12/2024 · Microsoft Research · abdin2024phi4

14B model surpassing its GPT-4 teacher on STEM via strategic synthetic data throughout training

Page HF Paper
@misc{abdin2024phi4,
  title         = {{Phi-4 Technical Report}},
  author        = {Abdin, M. and Aneja, J. and Behl, H. and Bubeck, S. and Eldan, R. and Gunasekar, S. and others},
  year          = {2024},
  eprint        = {2412.08905},
  archivePrefix = {arXiv}
}

Cohere

Command R — Command R Model Card

03/2024 · Cohere · cohere2024commandr

RAG-optimized 35B model with 128K context and strong tool-use; no technical paper

Page HF
@misc{cohere2024commandr,
  title  = {{Command R: Retrieval-Augmented Generation at Scale}},
  author = {Cohere},
  year   = {2024},
  url    = {https://cohere.com/blog/command-r},
  note   = {Accessed February 9, 2026}
}

Aya 23 — Aya 23: Open Weight Releases to Further Multilingual Progress

05/2024 · Cohere For AI · aryabumi2024aya

Open-weights 8B/35B multilingual models covering 23 languages

Page HF Paper
@misc{aryabumi2024aya,
  title         = {{Aya 23: Open Weight Releases to Further Multilingual Progress}},
  author        = {Aryabumi, V. and Dang, J. and Talupuru, D. and Dash, S. and Cairuz, D. and Lin, H. and others},
  year          = {2024},
  eprint        = {2405.15032},
  archivePrefix = {arXiv}
}

Aya Expanse — Aya Expanse: Connecting Our World

12/2024 · Cohere For AI · dang2024aya

8B/32B multilingual models covering 23 languages, improving substantially on Aya 23

Page HF Paper
@misc{dang2024aya,
  title         = {{Aya Expanse: Connecting Our World}},
  author        = {Dang, J. and Aryabumi, V. and Talupuru, D. and Dash, S. and Cairuz, D. and Lin, H. and others},
  year          = {2024},
  eprint        = {2412.04261},
  archivePrefix = {arXiv}
}

Command A — Command A Model Card

03/2025 · Cohere · cohere2025commanda

111B parameter model with 256K context for complex agentic tasks; supports 23 languages and replaces Command R+

Page HF
@misc{cohere2025commanda,
  title  = {{Command A}},
  author = {Cohere},
  year   = {2025},
  url    = {https://cohere.com/blog/command-a},
  note   = {Accessed February 9, 2026}
}

AI21 Labs

Jamba — Jamba: A Hybrid Transformer-Mamba Language Model

03/2024 · AI21 Labs · lieber2024jamba

Novel hybrid architecture combining Transformer and Mamba (SSM) layers with MoE. Seminal because it was the first production hybrid Transformer-SSM model, opening a new architectural design axis beyond pure Transformers

Page HF Paper
@misc{lieber2024jamba,
  title         = {{Jamba: A Hybrid Transformer-Mamba Language Model}},
  author        = {Lieber, O. and Lenz, B. and Bata, H. and Cohen, G. and Osin, J. and Dalmedigos, I. and others},
  year          = {2024},
  eprint        = {2403.19887},
  archivePrefix = {arXiv}
}

Jamba 1.5 — Jamba 1.5: Hybrid Transformer-Mamba Models at Scale

08/2024 · AI21 Labs · team2024jamba

Scaled hybrid SSM-Transformer models (Mini 12B active, Large 94B active) with 256K context

Page HF Paper
@misc{team2024jamba,
  title         = {{Jamba 1.5: Hybrid Transformer-Mamba Models at Scale}},
  author        = {{Jamba Team} and Bata, H. and Cohen, G. and Daoulas, I. and Dalmedigos, I. and Gera, A. and others},
  year          = {2024},
  eprint        = {2408.12570},
  archivePrefix = {arXiv}
}

xAI

Grok-1 — Grok-1 Model Card

03/2024 · xAI · xai2024grok1

314B MoE model open-sourced under Apache 2.0; no technical paper published

Page HF
@misc{xai2024grok1,
  title  = {{Grok-1}},
  author = {{xAI}},
  year   = {2024},
  url    = {https://x.ai/blog/grok/model-card},
  note   = {Accessed February 9, 2026}
}

Grok-2 — Grok-2 Blog

08/2024 · xAI · xai2024grok2

Frontier-class model with strong reasoning and vision capabilities; no technical paper

Page
@misc{xai2024grok2,
  title  = {{Grok-2}},
  author = {{xAI}},
  year   = {2024},
  url    = {https://x.ai/blog/grok-2},
  note   = {Accessed February 9, 2026}
}

Grok-3 — Grok-3 Blog

02/2025 · xAI · xai2025grok3

Trained on 200K H100 GPUs (10x Grok-2 compute) with reasoning modes (Think, Big Brain, DeepSearch); outperforms GPT-4o and Gemini 2 Pro on AIME and GPQA

Page
@misc{xai2025grok3,
  title  = {{Grok-3}},
  author = {{xAI}},
  year   = {2025},
  url    = {https://x.ai/news/grok-3},
  note   = {Accessed February 9, 2026}
}

Grok-4 — Grok-4 Model Card

07/2025 · xAI · xai2025grok4

Advanced reasoning model with native tool use and real-time search; 128K context with deep domain knowledge across finance, healthcare, law, and science

Page Paper
@misc{xai2025grok4,
  title  = {{Grok-4 Model Card}},
  author = {{xAI}},
  year   = {2025},
  url    = {https://x.ai/news/grok-4},
  note   = {Accessed February 9, 2026}
}

01.AI

Yi — Yi: Open Foundation Models by 01.AI

03/2024 · 01.AI · young2024yi

Bilingual (English/Chinese) 6B/34B models trained on 3T tokens with strong reasoning

Page HF Paper
@misc{young2024yi,
  title         = {{Yi: Open Foundation Models by 01.AI}},
  author        = {Young, A. and Chen, B. and Li, C. and Huang, C. and Zhang, G. and Zhang, G. and others},
  year          = {2024},
  eprint        = {2403.04652},
  archivePrefix = {arXiv}
}

Technology Innovation Institute (TII)

Falcon — The Falcon Series of Open Language Models

11/2023 · TII · almazrouei2023falcon

Open-weights 7B/40B/180B models trained on curated web data (RefinedWeb)

Page HF Paper
@misc{almazrouei2023falcon,
  title         = {{The Falcon Series of Open Language Models}},
  author        = {Almazrouei, E. and Alobeidli, H. and Alshamsi, A. and Cappelli, A. and Cojocaru, R. and others},
  year          = {2023},
  eprint        = {2311.16867},
  archivePrefix = {arXiv}
}

Falcon 2 — Falcon 2: An 11 Billion Parameter Large Language Model

07/2024 · TII · malartic2024falcon2

11B model with vision variant, competitive with much larger open models

Page HF Paper
@misc{malartic2024falcon2,
  title         = {{Falcon 2: An 11 Billion Parameter Large Language Model}},
  author        = {Malartic, Q. and Chowdhury, N. and Cojocaru, R. and Farooq, M. and others},
  year          = {2024},
  eprint        = {2407.14885},
  archivePrefix = {arXiv}
}

Stability AI

Stable LM 2 — Stable LM 2 1.6B Technical Report

02/2024 · Stability AI · bellagente2024stable

Efficient 1.6B model competitive with larger models on downstream tasks

Page HF Paper
@misc{bellagente2024stable,
  title         = {{Stable LM 2 1.6B Technical Report}},
  author        = {Bellagente, M. and Tow, J. and Mahan, D. and Phang, J. and others},
  year          = {2024},
  eprint        = {2402.17834},
  archivePrefix = {arXiv}
}

NVIDIA

Nemotron-4 — Nemotron-4 340B Technical Report

06/2024 · NVIDIA · adler2024nemotron

340B model with a synthetic data generation pipeline for alignment; released under a license permitting use of its outputs to train other models

Page HF Paper
@misc{adler2024nemotron,
  title         = {{Nemotron-4 340B Technical Report}},
  author        = {Adler, B. and Agarwal, N. and Aithal, A. and Anh, D. and Bhatt, P. and Choi, J. and others},
  year          = {2024},
  eprint        = {2406.11704},
  archivePrefix = {arXiv}
}

Databricks

DBRX — DBRX Blog

03/2024 · Databricks · databricks2024dbrx

Fine-grained 132B MoE model (36B active) outperforming Llama 2 70B and Mixtral; no arXiv paper

Page HF
@misc{databricks2024dbrx,
  title  = {{Introducing DBRX: A New State-of-the-Art Open LLM}},
  author = {Databricks},
  year   = {2024},
  url    = {https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm},
  note   = {Accessed February 9, 2026}
}

Amazon

Nova Pro — Amazon Nova Pro Model Card

12/2024 · Amazon · amazon2024novapro

Highly capable multimodal model for text, image, and video with strong accuracy-speed-cost balance; available through Amazon Bedrock

Page Paper
@misc{amazon2024novapro,
  title  = {{Amazon Nova: A New Generation of Foundation Models}},
  author = {Amazon},
  year   = {2024},
  url    = {https://aws.amazon.com/ai/generative-ai/nova/},
  note   = {Accessed February 9, 2026}
}

Nova Lite — Amazon Nova Lite Model Card

12/2024 · Amazon · amazon2024novalite

Low-cost multimodal model optimized for fast processing of image, video, and text inputs

Page Paper
@misc{amazon2024novalite,
  title  = {{Amazon Nova: A New Generation of Foundation Models}},
  author = {Amazon},
  year   = {2024},
  url    = {https://aws.amazon.com/ai/generative-ai/nova/},
  note   = {Accessed February 9, 2026}
}