# Model Collection

Model overview and selection guidance.

## TL;DR
- This page is a foundational index of LLMs. It helps you quickly build a mental map of which models have appeared and roughly which category each belongs to, covering GPT-5.1 / GPT-4.1 / o1, Claude 4.5, Gemini 3, Llama 3.1, Grok-2, and more.
- When choosing a model, don't look only at parameter counts or leaderboards: latency, cost, context length, tool support (e.g. Tool Calling), and your own evaluation results matter more.
- In real projects, a "capability model + fast model" pairing is common: use a stronger model for complex planning/reasoning, and a faster, cheaper model for routine steps and batch processing (see the routing sketch below).
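Here is a minimal sketch of that pairing, assuming a hypothetical `call_model` helper and placeholder model names and task labels rather than any specific vendor API:

```python
# A minimal routing sketch for the "capability model + fast model" pairing.
# Model names, task labels, and call_model() are illustrative placeholders,
# not any specific vendor's API.

CAPABILITY_MODEL = "capability-model"   # stronger model for planning/reasoning
FAST_MODEL = "fast-model"               # cheaper, lower-latency model for routine steps

HARD_TASKS = {"planning", "multi_step_reasoning", "long_document_analysis"}


def pick_model(task_type: str) -> str:
    """Route complex planning/reasoning to the capability model, everything else to the fast one."""
    return CAPABILITY_MODEL if task_type in HARD_TASKS else FAST_MODEL


def call_model(model: str, prompt: str) -> str:
    """Placeholder for your provider's chat/completions call."""
    raise NotImplementedError


def run(task_type: str, prompt: str) -> str:
    return call_model(pick_model(task_type), prompt)
```

The routing rule can start this crude; refine it once your evaluation results show where the fast model actually falls short.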
## 2024-2025 Quick Reference
| Vendor | Capability tier (multimodal/tools) | Fast/cheap tier | Notes |
|---|---|---|---|
| OpenAI | ChatGPT 5.1 (GPT-5 series) / GPT-4.1 | GPT-4o mini | Strong vision/tools; the mini tier suits batch processing and automation |
| OpenAI (reasoning) | o1 / o1-mini | — | Stronger long-chain reasoning/planning, at higher cost |
| Anthropic | Claude 4.5 Sonnet | Claude 3.5/4.5 Haiku | Strong on long documents/tables, high safety |
| Google | Gemini 3 Pro | Gemini 3 Flash / Flash-Lite | 1M-token context, multimodal/tools; the Flash tier is fast |
| Meta (open weights) | Llama 3/3.1 70B | Llama 3.1 8B | Well suited to self-hosted deployment, rich ecosystem |
| Mistral | Mistral Large | Mistral Small | Cost-effective, good multilingual performance |
| xAI | Grok-2 | Grok-2 mini | An option for scenarios sensitive to freshness/web access |
Selection advice: follow "get it working with a small model → raise quality with a larger model → run eval regression". For multimodal/screenshot tasks, prefer ChatGPT 5.1 / GPT-4o / Gemini 3; for long documents/tables, prefer Claude 4.5 or Gemini 3 Pro.
Last updated: 2025-02
## Reader's Guide
This list leans toward a historical walkthrough of foundational and notable models; it does not aim to cover every recent model or every version. Treat it as:
- A lookup aid: when a paper or blog mentions a model, quickly locate its origin and era
- A selection aid: build a candidate set before you run your evaluation
If you are doing engineering model selection, answer at least these questions:
- Is the task closer to chat, coding, reasoning, or RAG?
- Do you need long context, and how long (context length)?
- Do you need Tool Calling? Do you need structured output (JSON schema)?
- Can you run a small evaluation set for regression? 10-50 items is enough to get started (see the sketch after this list).
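For the last point, here is a minimal sketch of such a regression set, assuming a hypothetical `call_model` helper and a naive substring grader; the cases and the grading rule are placeholders for your own data and checks:

```python
# A minimal eval-regression sketch: 10-50 cases is enough to get started.
# call_model() and the substring check are placeholders for your own client and grader.

from dataclasses import dataclass


@dataclass
class Case:
    prompt: str
    expected_substring: str  # naive pass/fail rule; swap in a real grader as needed


CASES = [
    Case("Summarize this ticket in one sentence: ...", "refund"),
    Case("Extract the invoice total as a number: ...", "42.00"),
    # ...grow this toward 10-50 cases that mirror your real tasks
]


def call_model(model: str, prompt: str) -> str:
    """Placeholder for your provider's API call."""
    raise NotImplementedError


def run_eval(model: str) -> float:
    """Return the pass rate of `model` on the regression set."""
    passed = sum(
        case.expected_substring in call_model(model, case.prompt) for case in CASES
    )
    return passed / len(CASES)


# Compare candidates before and after a model or prompt change, e.g.:
# print(run_eval("fast-model"), run_eval("capability-model"))
```

Re-running this before and after a model or prompt change turns the "small model first, large model for quality" advice above into a measurable comparison.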
Data adapted from Papers with Code and Zhao et al. (2023).
## Models
| Model | Release Date | Description |
|---|---|---|
| BERT | 2018 | Bidirectional Encoder Representations from Transformers |
| GPT | 2018 | Improving Language Understanding by Generative Pre-Training |
| RoBERTa | 2019 | A Robustly Optimized BERT Pretraining Approach |
| GPT-2 | 2019 | Language Models are Unsupervised Multitask Learners |
| T5 | 2019 | Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer |
| BART | 2019 | Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension |
| ALBERT | 2019 | A Lite BERT for Self-supervised Learning of Language Representations |
| XLNet | 2019 | Generalized Autoregressive Pretraining for Language Understanding |
| CTRL | 2019 | CTRL: A Conditional Transformer Language Model for Controllable Generation |
| ERNIE | 2019 | ERNIE: Enhanced Representation through Knowledge Integration |
| GShard | 2020 | GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding |
| GPT-3 | 2020 | Language Models are Few-Shot Learners |
| LaMDA | 2021 | LaMDA: Language Models for Dialog Applications |
| PanGu-α | 2021 | PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation |
| mT5 | 2021 | mT5: A massively multilingual pre-trained text-to-text transformer |
| CPM-2 | 2021 | CPM-2: Large-scale Cost-effective Pre-trained Language Models |
| T0 | 2021 | Multitask Prompted Training Enables Zero-Shot Task Generalization |
| HyperCLOVA | 2021 | What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers |
| Codex | 2021 | Evaluating Large Language Models Trained on Code |
| ERNIE 3.0 | 2021 | ERNIE 3.0: Large-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| Jurassic-1 | 2021 | Jurassic-1: Technical Details and Evaluation |
| FLAN | 2021 | Finetuned Language Models Are Zero-Shot Learners |
| MT-NLG | 2021 | Using DeepSpeed and Megatron to Train Megatron-Turing NLG 530B, A Large-Scale Generative Language Model |
| Yuan 1.0 | 2021 | Yuan 1.0: Large-Scale Pre-trained Language Model in Zero-Shot and Few-Shot Learning |
| WebGPT | 2021 | WebGPT: Browser-assisted question-answering with human feedback |
| Gopher | 2021 | Scaling Language Models: Methods, Analysis & Insights from Training Gopher |
| ERNIE 3.0 Titan | 2021 | ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation |
| GLaM | 2021 | GLaM: Efficient Scaling of Language Models with Mixture-of-Experts |
| InstructGPT | 2022 | Training language models to follow instructions with human feedback |
| GPT-NeoX-20B | 2022 | GPT-NeoX-20B: An Open-Source Autoregressive Language Model |
| AlphaCode | 2022 | Competition-Level Code Generation with AlphaCode |
| CodeGen | 2022 | CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis |
| Chinchilla | 2022 | Shows that, for a given compute budget, the best performance is achieved not by the largest models but by smaller models trained on more data. |
| Tk-Instruct | 2022 | Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks |
| UL2 | 2022 | UL2: Unifying Language Learning Paradigms |
| PaLM | 2022 | PaLM: Scaling Language Modeling with Pathways |
| OPT | 2022 | OPT: Open Pre-trained Transformer Language Models |
| BLOOM | 2022 | BLOOM: A 176B-Parameter Open-Access Multilingual Language Model |
| GLM-130B | 2022 | GLM-130B: An Open Bilingual Pre-trained Model |
| AlexaTM | 2022 | AlexaTM 20B: Few-Shot Learning Using a Large-Scale Multilingual Seq2Seq Model |
| Flan-T5 | 2022 | Scaling Instruction-Finetuned Language Models |
| Sparrow | 2022 | Improving alignment of dialogue agents via targeted human judgements |
| U-PaLM | 2022 | Transcending Scaling Laws with 0.1% Extra Compute |
| mT0 | 2022 | Crosslingual Generalization through Multitask Finetuning |
| Galactica | 2022 | Galactica: A Large Language Model for Science |
| OPT-IML | 2022 | OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization |
| LLaMA | 2023 | LLaMA: Open and Efficient Foundation Language Models |
| GPT-4 | 2023 | GPT-4 Technical Report |
| PanGu-Σ | 2023 | PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing |
| BloombergGPT | 2023 | BloombergGPT: A Large Language Model for Finance |
| PaLM 2 | 2023 | A Language Model that has better multilingual and reasoning capabilities and is more compute-efficient than its predecessor PaLM. |
| Claude 2 | 2023 | Anthropic's second-generation assistant, with improved writing/code quality and safety |
| Llama 2 | 2023 | Open-weight chat models (7B-70B), widely used for self-hosted deployment |
| Mixtral 8x7B | 2023 | Open-weight sparse Mixture-of-Experts model with strong cost-effectiveness |
| Gemini 1.0 | 2023 | Google's multimodal model (Ultra/Pro/Nano), the first release in the Gemini series |
| Claude 3 (Opus/Sonnet/Haiku) | 2024 | New-generation multimodal models, strong at long-document/table extraction and safety |
| Gemini 1.5 Pro | 2024 | Long context up to 1M+ tokens, multimodal |
| Gemini 1.5 Flash | 2024 | Cheap, fast multimodal model suited to batch work and real-time interaction |
| Mistral Large | 2024 | Multilingual large model with function calling and long context |
| Grok-1.5 | 2024 | xAI's long-context model with an emphasis on freshness |
| GPT-4o | 2024 | OpenAI's omni-modal flagship, faster than GPT-4, supports voice/images/video |
| GPT-4o mini | 2024 | Low-cost small model with strong tool-use ability |
| Llama 3 | 2024 | 8B/70B open weights, strong English and multilingual performance |
| Llama 3.1 | 2024 | 8B/70B/405B, upgraded reasoning and 128k context |
| Claude 3.5 Sonnet | 2024 | Flagship Claude 3.5 model, strong at code and tool calling |
| Claude 3.5 Haiku | 2024 | Lightweight, fast variant that keeps strong safety and multimodal ability |
| o1 | 2024 | OpenAI reasoning model geared toward chain-of-thought and planning |
| o1-mini | 2024 | Cheaper, faster variant of o1 |
| GPT-4.1 | 2025 | Text/image model with a 1M-token context and upgraded reasoning and tool calling |
| Grok-2 | 2024 | xAI's next-generation model, with improved code and web-connected Q&A |
| Gemini 2.0 Flash (Exp) | 2024 | Gemini 2.0 preview aimed at real-time/tool scenarios |
| Gemini 2.0 Pro (Exp) | 2024 | Gemini 2.0 preview flagship, multimodal with long context |
| DeepSeek-R1 | 2025 | Open-weight model focused on reasoning efficiency, with long context and strong math/code |
| Claude 4.5 Sonnet | 2025 | Anthropic's 2025 flagship, with long context (200K, up to 1M in beta) and stronger code and table retrieval |
| ChatGPT 5.1 (GPT-5 series) | 2025 | OpenAI's latest product, oriented toward the Responses API, with controllable reasoning depth and multimodality |
| Gemini 3 Pro | 2025 | Google's ultra-long-context flagship (1,048,576 tokens), multimodal with an optimized tool chain |
| Gemini 3 Flash / Flash-Lite | 2025 | Fast, low-cost multimodal models suited to in-product real-time interaction and batch processing |