Fine-tuning OpenAI's Models for JSON Output Formatting

This project demonstrates how to fine-tune one of OpenAI's key models to achieve JSON output formatting for generating fake identity data. By leveraging fine-tuning, we can get better steerability, shorter prompts, and therefore, reduced costs.

Detailed Article on This Project - A comprehensive guide on this project, its motivation, and methodology.

Project Description

Often, in the development stages, there's a need to generate structured data to seed our databases, populate dashboards, etc. This project specifically focuses on generating Twitter-like user profiles in a structured format.

With the fine-tuned model, the aim is to reduce the number of tokens used in a prompt without compromising on the quality of the response. This project shows you how to:

Prepare synthetic training data
Format the data according to OpenAI's guidelines
Fine-tune the model using the prepared data
Test the fine-tuned model

Installation and Setup

Clone the GitHub repository.
Install required packages:
```
pip install -U -r requirements.txt
```
Include your OpenAI API key in your environment variables:
```
export OPENAI_API_KEY="sk-XXXXX"
```

Usage

Follow the instructions in the article to generate the training data, fine-tune the model, and test it.

Resources

Detailed Article on This Project - A comprehensive guide on this project, its motivation, and methodology.
Langchain - a popular library for language processing
Native Function Calling Demo

Project Files and Their Descriptions

requirements.txt
- Purpose: Lists all the required Python packages and libraries for this project.
prepare_data.py
- Purpose: Contains scripts to generate synthetic training data for model fine-tuning.
transform_data.py
- Purpose: Formats the synthetic data according to OpenAI's guidelines.
openai_formatting.py
- Purpose: Validates the data formatting according to OpenAI's guidelines. Counts tokens. Source.
finetuning.py
- Purpose: Contains scripts and instructions to fine-tune the OpenAI model with the prepared data.
run_model.py
- Purpose: Allows users to test the fine-tuned model by generating JSON formatted data.
training_examples.json
- Purpose: Output of prepare_data.py so you don't have to pay for generating synthetic data again.

Connect

For more insights, updates, and discussions, connect with me:

🐦 Twitter @horosin_
📧 Subscribe to my newsletter for regular tips and insights.
🌐 LinkedIn

License

This project is open-sourced under the MIT License. The exemption is openai_formatting.py, which is proprietary to OpenAI.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Fine-tuning OpenAI's Models for JSON Output Formatting

Project Description

Installation and Setup

Usage

Resources

Project Files and Their Descriptions

Connect

License

About

Uh oh!

Releases

Packages

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
README.md		README.md
finetuning-cover.png		finetuning-cover.png
finetuning.py		finetuning.py
openai_formatting.py		openai_formatting.py
prepare_data.py		prepare_data.py
requirements.txt		requirements.txt
run_model.py		run_model.py
training_examples.json		training_examples.json
transform_data.py		transform_data.py

horosin/open-finetuning

Folders and files

Latest commit

History

Repository files navigation

Fine-tuning OpenAI's Models for JSON Output Formatting

Project Description

Installation and Setup

Usage

Resources

Project Files and Their Descriptions

Connect

License

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Languages

Packages