Skip to content

This project demonstrates how to fine-tune one of OpenAI's key models to achieve JSON output formatting for generating fake identity data. By leveraging fine-tuning, we can get better steerability, shorter prompts, and therefore, reduced costs.

Notifications You must be signed in to change notification settings

horosin/open-finetuning

Repository files navigation

cover image of the repo saying Fine-tuning OpenAI's Models for JSON Output Formatting practival example with python

Fine-tuning OpenAI's Models for JSON Output Formatting

This project demonstrates how to fine-tune one of OpenAI's key models to achieve JSON output formatting for generating fake identity data. By leveraging fine-tuning, we can get better steerability, shorter prompts, and therefore, reduced costs.

Detailed Article on This Project - A comprehensive guide on this project, its motivation, and methodology.

Project Description

Often, in the development stages, there's a need to generate structured data to seed our databases, populate dashboards, etc. This project specifically focuses on generating Twitter-like user profiles in a structured format.

With the fine-tuned model, the aim is to reduce the number of tokens used in a prompt without compromising on the quality of the response. This project shows you how to:

  • Prepare synthetic training data
  • Format the data according to OpenAI's guidelines
  • Fine-tune the model using the prepared data
  • Test the fine-tuned model

Installation and Setup

  1. Clone the GitHub repository.

  2. Install required packages:

    pip install -U -r requirements.txt
  3. Include your OpenAI API key in your environment variables:

    export OPENAI_API_KEY="sk-XXXXX"

Usage

Follow the instructions in the article to generate the training data, fine-tune the model, and test it.

Resources

Project Files and Their Descriptions

  1. requirements.txt

    • Purpose: Lists all the required Python packages and libraries for this project.
  2. prepare_data.py

    • Purpose: Contains scripts to generate synthetic training data for model fine-tuning.
  3. transform_data.py

    • Purpose: Formats the synthetic data according to OpenAI's guidelines.
  4. openai_formatting.py

    • Purpose: Validates the data formatting according to OpenAI's guidelines. Counts tokens. Source.
  5. finetuning.py

    • Purpose: Contains scripts and instructions to fine-tune the OpenAI model with the prepared data.
  6. run_model.py

    • Purpose: Allows users to test the fine-tuned model by generating JSON formatted data.
  7. training_examples.json

    • Purpose: Output of prepare_data.py so you don't have to pay for generating synthetic data again.

Connect

For more insights, updates, and discussions, connect with me:

License

This project is open-sourced under the MIT License. The exemption is openai_formatting.py, which is proprietary to OpenAI.

About

This project demonstrates how to fine-tune one of OpenAI's key models to achieve JSON output formatting for generating fake identity data. By leveraging fine-tuning, we can get better steerability, shorter prompts, and therefore, reduced costs.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages