
Commit bee4a42

LLM example path re-structure (release 2.4) (#3080)
* LLM example files restructure
* update
* update path in docs
* symlink
* cherry-pick the typo fix (#3083)
* fix path in quant script

Co-authored-by: WeizhuoZhang-intel <[email protected]>
1 parent f3b57ef commit bee4a42


64 files changed (+370, -323 lines)

README.md

Lines changed: 2 additions & 2 deletions
@@ -5,14 +5,14 @@ Intel® Extension for PyTorch\*

 </div>

-**CPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.4.0%2Bcpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm) <br>
+**CPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=cpu&version=v2.4.0%2Bcpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/llm) <br>
 **GPU** [💻main branch](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🌱Quick Start](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/tutorials/getting_started.html)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[📖Documentations](https://intel.github.io/intel-extension-for-pytorch/xpu/latest/)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[🏃Installation](https://intel.github.io/intel-extension-for-pytorch/index.html#installation?platform=gpu)&nbsp;&nbsp;&nbsp;|&nbsp;&nbsp;&nbsp;[💻LLM Example](https://github.com/intel/intel-extension-for-pytorch/tree/xpu-main/examples/gpu/inference/python/llm)<br>

 Intel® Extension for PyTorch\* extends PyTorch\* with up-to-date features and optimizations for an extra performance boost on Intel hardware. Optimizations take advantage of Intel® Advanced Vector Extensions 512 (Intel® AVX-512) Vector Neural Network Instructions (VNNI) and Intel® Advanced Matrix Extensions (Intel® AMX) on Intel CPUs as well as Intel X<sup>e</sup> Matrix Extensions (XMX) AI engines on Intel discrete GPUs. Moreover, Intel® Extension for PyTorch* provides easy GPU acceleration for Intel discrete GPUs through the PyTorch* xpu device.

 ## ipex.llm - Large Language Models (LLMs) Optimization

-In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [**LLM optimizations**](./examples/cpu/inference/python/llm) for details.
+In the current technological landscape, Generative AI (GenAI) workloads and models have gained widespread attention and popularity. Large Language Models (LLMs) have emerged as the dominant models driving these GenAI applications. Starting from 2.1.0, specific optimizations for certain LLM models are introduced in the Intel® Extension for PyTorch\*. Check [**LLM optimizations**](./examples/cpu/llm) for details.

 ### Optimized Model List
docs/tutorials/examples.md

Lines changed: 1 addition & 1 deletion
@@ -240,7 +240,7 @@ generate results for the input prompt.
 [//]: # (marker_llm_optimize_woq)
 [//]: # (marker_llm_optimize_woq)

-**Note:** Please check [LLM Best Known Practice Page](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm)
+**Note:** Please check [LLM Best Known Practice Page](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/llm)
 for detailed environment setup and LLM workload running instructions.

 ## C++

docs/tutorials/features/int8_recipe_tuning_api.md

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@ Users need to provide a fp32 model and some parameters required for tuning. The
 Please refer to [static_quant example](../../../examples/cpu/features/int8_recipe_tuning/imagenet_autotune.py).

 - Smooth Quantization
-  Please refer to [llm sq example](../../../examples/cpu/inference/python/llm/single_instance/run_generation.py).
+  Please refer to [LLM SmoothQuant example](../../../examples/cpu/llm/inference/single_instance/run_generation.py).

 ## Smooth Quantization Autotune
 ### Algorithm: Auto-tuning of $\alpha$.
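
To make the tuning flow described in this file concrete, here is a minimal editorial sketch of driving the autotune API with a toy FP32 model. The synthetic dataset, the stand-in `eval_func`, and the keyword values are illustrative assumptions; the exact argument set should be confirmed against the API documentation this page links to.

```python
import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex
from torch.utils.data import DataLoader, TensorDataset

# Toy FP32 model and synthetic calibration data, for illustration only.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4)).eval()
calib_loader = DataLoader(TensorDataset(torch.randn(64, 16)), batch_size=8)

def eval_func(m):
    # Stand-in metric; a real flow returns the validation accuracy of the
    # candidate quantized model.
    with torch.no_grad():
        return float(-m(torch.randn(32, 16)).abs().mean())

# Assumed keyword arguments, modeled on the documented autotune interface.
tuned_model = ipex.quantization.autotune(
    model,
    calib_loader,
    eval_func=eval_func,
    sampling_sizes=[100],                   # calibration sample counts to try
    accuracy_criterion={"relative": 0.01},  # tolerated relative accuracy drop
    tuning_time=0,                          # 0 means no tuning time limit
)
```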

docs/tutorials/features/sq_recipe_tuning_api.md

Lines changed: 3 additions & 2 deletions
@@ -1,7 +1,8 @@
 Smooth Quant Recipe Tuning API (Prototype)
 =============================================

-Smooth Quantization is a popular method to improve the accuracy of int8 quantization. The [autotune API](../api_doc.html#ipex.quantization.autotune) allows automatic global alpha tuning, and automatic layer-by-layer alpha tuning provided by Intel® Neural Compressor for the best INT8 accuracy.
+Smooth Quantization is a popular method to improve the accuracy of int8 quantization.
+The [autotune API](../api_doc.html#ipex.quantization.autotune) allows automatic global alpha tuning, and automatic layer-by-layer alpha tuning provided by Intel® Neural Compressor for the best INT8 accuracy.

 SmoothQuant will introduce alpha to calculate the ratio of input and weight updates to reduce quantization error. SmoothQuant arguments are as below:

@@ -15,6 +16,6 @@ SmoothQuant will introduce alpha to calculate the ratio of input and weight upda
 | shared_criterion | "mean" | ["min", "mean", "max"] | criterion for the input LayerNorm op of a transformer block |
 | enable_blockwise_loss | False | [True, False] | whether to enable block-wise auto-tuning |

-For LLM examples, please refer to [example](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/inference/python/llm).
+Please refer to the [LLM examples](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/llm) for complete examples.

 **Note**: When defining dataloaders for calibration, please follow INC's dataloader [format](https://github.com/intel/neural-compressor/blob/master/docs/source/dataloader.md).
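
As background on the $\alpha$ being tuned here: SmoothQuant rescales each activation channel before quantization, migrating quantization difficulty from activations to weights. The standard formulation from the SmoothQuant paper (Xiao et al.), added editorially for reference:

```latex
% Per-channel smoothing factor, balanced by \alpha:
s_j = \frac{\max\bigl(\lvert X_j \rvert\bigr)^{\alpha}}{\max\bigl(\lvert W_j \rvert\bigr)^{1-\alpha}}

% The matmul stays mathematically equivalent after smoothing:
Y = X W = \bigl(X\,\mathrm{diag}(s)^{-1}\bigr)\bigl(\mathrm{diag}(s)\,W\bigr)
```

A larger $\alpha$ shifts more of the quantization burden onto the weights; the auto-tuning described above searches this trade-off globally or layer by layer.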

docs/tutorials/getting_started.md

Lines changed: 1 addition & 1 deletion
@@ -157,4 +157,4 @@ with torch.inference_mode(), torch.cpu.amp.autocast(enabled=amp_enabled):
     print(gen_text, total_new_tokens, flush=True)
 ```

-More LLM examples, including usage of low precision data types are available in the [LLM Examples](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm) section.
+More LLM examples, including usage of low precision data types, are available in the [LLM Examples](https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/llm) section.
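
For orientation, the surrounding example in getting_started.md follows roughly this shape; the model ID and generation settings below are editorial stand-ins, not the file's exact contents.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM, AutoTokenizer

# Editorial stand-in checkpoint; the tutorial may use a different model.
model_id = "facebook/opt-125m"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16
).eval()

amp_enabled = True
model = ipex.llm.optimize(model, dtype=torch.bfloat16)

inputs = tokenizer("Once upon a time,", return_tensors="pt")

with torch.inference_mode(), torch.cpu.amp.autocast(enabled=amp_enabled):
    tokens = model.generate(**inputs, max_new_tokens=32)
    total_new_tokens = tokens.shape[-1] - inputs["input_ids"].shape[-1]
    gen_text = tokenizer.batch_decode(tokens, skip_special_tokens=True)
    print(gen_text, total_new_tokens, flush=True)
```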

docs/tutorials/installation.md

Lines changed: 1 addition & 1 deletion
@@ -5,4 +5,4 @@ Select your preferences and follow the installation instructions provided on the

 After successful installation, refer to the [Quick Start](getting_started.md) and [Examples](examples.md) sections to start using the extension in your code.

-**NOTE:** For detailed instructions on installing and setting up the environment for Large Language Models (LLM), as well as example scripts, refer to the [LLM best practices](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/inference/python/llm).
+**NOTE:** For detailed instructions on installing and setting up the environment for Large Language Models (LLM), as well as example scripts, refer to the [LLM best practices](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/llm).

docs/tutorials/llm.rst

Lines changed: 2 additions & 2 deletions
@@ -13,7 +13,7 @@ These LLM-specific optimizations can be automatically applied with a single fron

    llm/llm_optimize

-`ipex.llm` Optimized Model List
+`ipex.llm` Optimized Model List for Inference
 -------------------------------

 Verified for single instance mode

@@ -30,7 +30,7 @@ Verified for distributed inference mode via DeepSpeed

 *Note*: The above verified models (including other models in the same model family, like "codellama/CodeLlama-7b-hf" from the LLAMA family) are well supported with all optimizations like indirect access KV cache, fused ROPE, and prepacked TPP Linear (fp32/bf16). Work is in progress to better support the models in the tables with various data types. In addition, more models will be optimized in the future.

-Please check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/inference/python/llm>`_ for instructions to install/setup environment and example scripts.
+Please check `LLM best known practice <https://github.com/intel/intel-extension-for-pytorch/tree/main/examples/cpu/llm>`_ for instructions to install/setup environment and example scripts.

 Module Level Optimization API for customized LLM (Prototype)
 ------------------------------------------------------------

docs/tutorials/llm/llm_optimize.md

Lines changed: 10 additions & 5 deletions
@@ -1,15 +1,20 @@
-Transformers Optimization Frontend API
+LLM Optimizations Frontend API
 ======================================

-The new API function, `ipex.llm.optimize`, is designed to optimize transformer-based models within frontend Python modules, with a particular focus on Large Language Models (LLMs). It provides optimizations for both model-wise and content-generation-wise. You just need to invoke the `ipex.llm.optimize` function instead of the `ipex.optimize` function to apply all optimizations transparently.
+The new API function, `ipex.llm.optimize`, is designed to optimize transformer-based models within frontend Python modules, with a particular focus on Large Language Models (LLMs).
+It provides both model-wise and content-generation-wise optimizations.
+You just need to invoke the `ipex.llm.optimize` function instead of the `ipex.optimize` function to apply all optimizations transparently.

-This API currently works for inference workloads. Support for training is undergoing. Currently, this API supports certain models. Supported model list can be found at [Overview](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html#ipexllm-optimized-model-list).
+This API currently works for inference workloads.
+Currently, this API supports certain models. The supported model list can be found at [this page](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/llm.html#ipexllm-optimized-model-list-for-inference).
+For LLM fine-tuning, please check the [LLM fine-tuning tutorial](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/llm/fine-tuning).

 API documentation is available at [API Docs page](https://intel.github.io/intel-extension-for-pytorch/cpu/latest/tutorials/api_doc.html#ipex.llm.optimize).

 ## Pseudocode of Common Usage Scenarios

-The following sections show pseudocode snippets to invoke Intel® Extension for PyTorch\* APIs to work with LLM models. Complete examples can be found at [the Example directory](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/inference/python/llm).
+The following sections show pseudocode snippets to invoke Intel® Extension for PyTorch\* APIs to work with LLM models.
+Complete examples can be found at [the Example directory](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/llm/inference).

 ### FP32/BF16
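
As a concrete instance of the FP32/BF16 path this heading introduces, a minimal editorial sketch follows; the checkpoint and the `inplace`/autocast choices are illustrative, and the authoritative snippets live in the Example directory linked above.

```python
import torch
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any model from the optimized list should work.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.bfloat16
).eval()

# Swap ipex.optimize for ipex.llm.optimize to get the LLM-specific fusions.
model = ipex.llm.optimize(model, dtype=torch.bfloat16, inplace=True)

with torch.inference_mode(), torch.cpu.amp.autocast(enabled=True):
    input_ids = torch.randint(0, 1000, (1, 32))
    output = model.generate(input_ids, max_new_tokens=8)
```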

@@ -98,7 +103,7 @@ model = ipex.llm.optimize(model, quantization_config=qconfig, low_precision_chec

 Distributed inference can be performed with `DeepSpeed`. Based on original Intel® Extension for PyTorch\* scripts, the following code changes are required.

-Check [LLM distributed inference examples](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/inference/python/llm/distributed) for complete codes.
+Check [LLM distributed inference examples](https://github.com/intel/intel-extension-for-pytorch/tree/v2.4.0%2Bcpu/examples/cpu/llm/inference/distributed) for complete codes.

 ``` python
 import torch
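
The hunk ends at the opening of a Python snippet that continues in the source file. As a rough, hedged sketch of the DeepSpeed flow that snippet belongs to (the checkpoint, the `mp_size` wiring, and the call pattern are editorial assumptions, not the file's exact contents):

```python
import os

import torch
import deepspeed
import intel_extension_for_pytorch as ipex
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; launch with a DeepSpeed/MPI-aware runner so that
# WORLD_SIZE is set for each rank.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-125m", torch_dtype=torch.bfloat16
).eval()
world_size = int(os.environ.get("WORLD_SIZE", "1"))

# Shard the model across ranks, then apply the IPEX LLM optimizations to the
# wrapped module.
model = deepspeed.init_inference(model, mp_size=world_size, dtype=torch.bfloat16)
model = ipex.llm.optimize(model.module, dtype=torch.bfloat16, inplace=True)
```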

examples/cpu/inference/python/llm/tools/env_activate.sh

Lines changed: 0 additions & 27 deletions
This file was deleted.

examples/cpu/inference/python/llm/tools/get_libstdcpp_lib.sh

Lines changed: 0 additions & 1 deletion
This file was deleted.

examples/cpu/inference/python/llm/Dockerfile renamed to examples/cpu/llm/Dockerfile

Lines changed: 3 additions & 4 deletions
@@ -39,7 +39,7 @@ ENV PATH=/root/.local/bin:${PATH}
 FROM base AS dev
 ARG COMPILE
 COPY . ./intel-extension-for-pytorch
-RUN cd intel-extension-for-pytorch/examples/cpu/inference/python/llm && \
+RUN cd intel-extension-for-pytorch/examples/cpu/llm && \
     export CC=gcc && export CXX=g++ && \
     if [ -z ${COMPILE} ]; then bash tools/env_setup.sh 6; else bash tools/env_setup.sh 2; fi && \
     unset CC && unset CXX
@@ -53,7 +53,7 @@ RUN apt update && \
     apt clean && \
     rm -rf /var/lib/apt/lists/* && \
     if [ -f /etc/apt/apt.conf.d/proxy.conf ]; then rm /etc/apt/apt.conf.d/proxy.conf; fi
-COPY --from=dev /root/intel-extension-for-pytorch/examples/cpu/inference/python/llm ./llm
+COPY --from=dev /root/intel-extension-for-pytorch/examples/cpu/llm ./llm
 COPY --from=dev /root/intel-extension-for-pytorch/tools/get_libstdcpp_lib.sh ./llm/tools
 RUN cd /usr/lib/x86_64-linux-gnu/ && ln -s libtcmalloc.so.4 libtcmalloc.so && cd && \
     echo "echo \"**Note:** For better performance, please consider to launch workloads with command 'ipexrun'.\"" >> ./.bashrc && \
@@ -62,8 +62,7 @@ RUN cd /usr/lib/x86_64-linux-gnu/ && ln -s libtcmalloc.so.4 libtcmalloc.so && cd
     python -m pip cache purge && \
     mv ./oneCCL_release /opt/oneCCL && \
     chown -R root:root /opt/oneCCL && \
-    sed -i "s|ONECCL_PATH=.*|ONECCL_PATH=/opt/oneCCL|" ./tools/env_activate.sh && \
-    LN=$(grep "Conda environment is not available." -n ./tools/env_activate.sh | cut -d ":" -f 1) && sed -i "${LN}s|.*| export LD_PRELOAD=\${LD_PRELOAD}:/usr/lib/x86_64-linux-gnu/libtcmalloc.so:/usr/local/lib/libiomp5.so|" ./tools/env_activate.sh
+    sed -i "s|ONECCL_PATH=.*|ONECCL_PATH=/opt/oneCCL|" ./tools/env_activate.sh
 ARG PORT_SSH=22
 RUN mkdir /var/run/sshd && \
     sed -i "s/#Port.*/Port ${PORT_SSH}/" /etc/ssh/sshd_config && \

examples/cpu/llm/README.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
+# 1. LLM Optimization Overview
+
+`ipex.llm` provides dedicated optimization for running Large Language Models (LLM) faster, including technical points like paged attention, ROPE fusion, etc.
+A set of data types is supported for various scenarios, including FP32, BF16, Smooth Quantization INT8, and Weight Only Quantization INT8/INT4 (prototype).
+
+<br>
+
+# 2. Environment Setup
+
+There are several environment setup methodologies provided. You can choose any of them according to your usage scenario. The Docker-based ones are recommended.
+
+## 2.1 [RECOMMENDED] Docker-based environment setup with pre-built wheels
+
+```bash
+# Get the Intel® Extension for PyTorch\* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout v2.4.0+cpu
+git submodule sync
+git submodule update --init --recursive
+
+# Build an image with the provided Dockerfile by installing from Intel® Extension for PyTorch\* prebuilt wheel files
+# To have a custom ssh server port for multi-node runs, please add --build-arg PORT_SSH=<CUSTOM_PORT>, e.g. 2345; otherwise the default SSH port 22 is used
+DOCKER_BUILDKIT=1 docker build -f examples/cpu/llm/Dockerfile --build-arg PORT_SSH=2345 -t ipex-llm:2.4.0 .
+
+# Run the container with the command below
+docker run --rm -it --privileged -v /dev/shm:/dev/shm ipex-llm:2.4.0 bash
+
+# When the command prompt shows inside the docker container, enter the llm examples directory
+cd llm
+
+# Activate environment variables
+# set the bash script argument to "inference" or "fine-tuning" for different usages
+source ./tools/env_activate.sh [inference|fine-tuning]
+```
+
+## 2.2 Conda-based environment setup with pre-built wheels
+
+```bash
+# Get the Intel® Extension for PyTorch\* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout v2.4.0+cpu
+git submodule sync
+git submodule update --init --recursive
+
+# GCC 12.3 is required. Installation can be taken care of by the environment configuration script.
+# Create a conda environment
+conda create -n llm python=3.10 -y
+conda activate llm
+
+# Setup the environment with the provided script
+cd examples/cpu/llm
+bash ./tools/env_setup.sh 7
+
+# Activate environment variables
+# set the bash script argument to "inference" or "fine-tuning" for different usages
+source ./tools/env_activate.sh [inference|fine-tuning]
+```
+
+## 2.3 Docker-based environment setup with compilation from source
+
+```bash
+# Get the Intel® Extension for PyTorch\* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout v2.4.0+cpu
+git submodule sync
+git submodule update --init --recursive
+
+# Build an image with the provided Dockerfile by compiling Intel® Extension for PyTorch\* from source
+# To have a custom ssh server port for multi-node runs, please add --build-arg PORT_SSH=<CUSTOM_PORT>, e.g. 2345; otherwise the default SSH port 22 is used
+docker build -f examples/cpu/llm/Dockerfile --build-arg COMPILE=ON --build-arg PORT_SSH=2345 -t ipex-llm:2.4.0 .
+
+# Run the container with the command below
+docker run --rm -it --privileged -v /dev/shm:/dev/shm ipex-llm:2.4.0 bash
+
+# When the command prompt shows inside the docker container, enter the llm examples directory
+cd llm
+
+# Activate environment variables
+# set the bash script argument to "inference" or "fine-tuning" for different usages
+source ./tools/env_activate.sh [inference|fine-tuning]
+```
+
+## 2.4 Conda-based environment setup with compilation from source
+
+```bash
+# Get the Intel® Extension for PyTorch\* source code
+git clone https://github.com/intel/intel-extension-for-pytorch.git
+cd intel-extension-for-pytorch
+git checkout v2.4.0+cpu
+git submodule sync
+git submodule update --init --recursive
+
+# GCC 12.3 is required. Installation can be taken care of by the environment configuration script.
+# Create a conda environment
+conda create -n llm python=3.10 -y
+conda activate llm
+
+# Setup the environment with the provided script
+cd examples/cpu/llm
+bash ./tools/env_setup.sh
+
+# Activate environment variables
+# set the bash script argument to "inference" or "fine-tuning" for different usages
+source ./tools/env_activate.sh [inference|fine-tuning]
+```
+
+<br>
+
+*Note*: In the `env_activate.sh` script, a `prompt.json` file is downloaded, which provides prompt samples with pre-defined input token lengths for benchmarking.
+For benchmarking **Llama-3 models**, users need to download a specific `prompt.json` file, overwriting the original one.
+
+```bash
+wget -O prompt.json https://intel-extension-for-pytorch.s3.amazonaws.com/miscellaneous/llm/prompt-3.json
+```
+
+The original `prompt.json` file can be restored from the repository if needed.
+
+```bash
+wget https://intel-extension-for-pytorch.s3.amazonaws.com/miscellaneous/llm/prompt.json
+```
+
+<br>
+
+# 3. How To Run LLM with ipex.llm
+
+Inference and fine-tuning are supported in respective directories.
+
+For inference example scripts, visit the [inference](./inference/) directory.
+
+For fine-tuning example scripts, visit the [fine-tuning](./fine-tuning/) directory.