From 20b1dcf95a71494a61fb07e24de486793e14d1f8 Mon Sep 17 00:00:00 2001
From: azurechen97 <36040190+azurechen97@users.noreply.github.com>
Date: Sat, 3 May 2025 17:21:02 +0800
Subject: [PATCH] Update 4.mdx

Fixed wrong position of the sentence "This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:"
---
 chapters/en/chapter1/4.mdx | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/chapters/en/chapter1/4.mdx b/chapters/en/chapter1/4.mdx
index 3870b541f..438180b61 100644
--- a/chapters/en/chapter1/4.mdx
+++ b/chapters/en/chapter1/4.mdx
@@ -35,7 +35,6 @@ The [Transformer architecture](https://arxiv.org/abs/1706.03762) was introduced
 - **May 2020**, [GPT-3](https://huggingface.co/papers/2005.14165), an even bigger version of GPT-2 that is able to perform well on a variety of tasks without the need for fine-tuning (called _zero-shot learning_)
 
 - **January 2022**: [InstructGPT](https://huggingface.co/papers/2203.02155), a version of GPT-3 that was trained to follow instructions better
-This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:
 
 - **January 2023**: [Llama](https://huggingface.co/papers/2302.13971), a large language model that is able to generate text in a variety of languages.
 
@@ -45,6 +44,8 @@ This list is far from comprehensive, and is just meant to highlight a few of the
 
 - **November 2024**: [SmolLM2](https://huggingface.co/papers/2502.02737), a state-of-the-art small language model (135 million to 1.7 billion parameters) that achieves impressive performance despite its compact size, and unlocking new possibilities for mobile and edge devices.
 
+This list is far from comprehensive, and is just meant to highlight a few of the different kinds of Transformer models. Broadly, they can be grouped into three categories:
+
 - GPT-like (also called _auto-regressive_ Transformer models)
 - BERT-like (also called _auto-encoding_ Transformer models)
 - T5-like (also called _sequence-to-sequence_ Transformer models)