Refactor for extraction docs #8465

Merged: 21 commits, Aug 5, 2023

Conversation

fpingham
Collaborator

Refactor for the extraction use case documentation

@dosubot (bot) added the `🤖:docs` label (changes to documentation and examples, like .md, .rst, .ipynb files; changes to the docs/ folder) on Jul 29, 2023
Then, in addition to the output format instructions, the prompt should also contain the data you would like to extract information from.
## Overview

In LangChain we provide a few useful abstractions that help you leverage OpenAI [function calling](https://openai.com/blog/function-calling-and-other-api-updates) (`-0613` models) and which go a long way to avoid model hallucination. This is in general the best go-to to extract structured data from text using langchain. However there's other two options that you should also consider:
Collaborator

I believe the latest gpt-4 and gpt-3.5 models all support function calling, so this doesn't need to be `-0613` specifically anymore.

Collaborator

@rlancemartin left a comment

(Updated into more granular comments below.)

@rlancemartin
Collaborator

One higher-level strategy point: @baskaryan do you think we should support the option to spin up a Colab based on any use case doc? IMO would certainly be cool, but I don't have a sense for the cost. At least, it seems then all the use-case docs need to be Jupyter notebooks?

@rlancemartin
Collaborator

> One higher-level strategy point: @baskaryan do you think we should support the option to spin up a Colab based on any use case doc? IMO would certainly be cool, but I don't have a sense for the cost. At least, it seems then all the use-case docs need to be Jupyter notebooks?

Confirmed from discussion that we should do this.

But we can take up the work to convert to notebooks later.

@@ -4,6 +4,8 @@ sidebar_position: 2

# Extraction

## Use case
Collaborator

Maybe something more explicit like:

Getting structured output from LLM generation is hard.

For example, suppose you need the model output formatted as JSON or in some other specified schema.

Two primary approaches have emerged for this:
* Functions
* Output parsing
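
To make the output-parsing approach concrete, here is a minimal sketch that uses only the Python standard library rather than LangChain's parser classes. The prompt wording, the `build_prompt` and `parse_people` helper names, and the hard-coded `model_reply` string are all assumptions made for illustration; no model is called.

```python
import json

# Assumed format instructions; the real docs may word these differently.
FORMAT_INSTRUCTIONS = (
    "Respond with only a JSON array of objects, each with the keys "
    '"name" (string) and "height" (integer).'
)


def build_prompt(passage: str) -> str:
    """Combine the format instructions with the text to extract from."""
    return f"{FORMAT_INSTRUCTIONS}\n\nPassage:\n{passage}"


def parse_people(reply: str) -> list:
    """Parse a model reply and reject anything that lacks the requested shape."""
    # Models often wrap JSON in prose, so keep only the outermost array.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON array found in reply")
    people = json.loads(reply[start : end + 1])
    for person in people:
        if not isinstance(person, dict):
            raise ValueError("each entry must be a JSON object")
        if not isinstance(person.get("name"), str):
            raise ValueError("missing or non-string 'name'")
        if not isinstance(person.get("height"), int):
            raise ValueError("missing or non-integer 'height'")
    return people


# Hypothetical reply text standing in for a real model response.
model_reply = 'Sure! [{"name": "Alex", "height": 5}, {"name": "Claudia", "height": 6}]'
people = parse_people(model_reply)
```

The functions approach differs mainly in where the schema lives: it is passed to the model as a function definition instead of being described in the prompt text, so less of this defensive parsing is needed.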

Then, in addition to the output format instructions, the prompt should also contain the data you would like to extract information from.
## Overview

In LangChain we provide a few useful abstractions that help you leverage OpenAI [function calling](https://openai.com/blog/function-calling-and-other-api-updates) (`-0613` models) and which go a long way to avoid model hallucination. This is in general the best go-to to extract structured data from text using langchain. However there's other two options that you should also consider:
Collaborator

Here we might offer a simple schematic to explain the differences:

[image: schematic]

All this said, let's see how easy it is to quickly and accurately extract structured data with LangChain.

## Example #1: using a JSON schema
Collaborator

Maybe we re-title this to Quickstart: OpenAI functions are probably the quickest way to get started.

[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},
{'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]
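
The output above is consistent with a JSON schema along the following lines. This is a sketch, not the schema from the PR: the `required` list, the `to_function_spec` and `conforms` helpers, and the function name `information_extraction` are assumptions, and the wrapper only mirrors the general shape of an OpenAI function definition (a `name`, a `description`, and JSON Schema `parameters`).

```python
# Assumed schema; field names and types are inferred from the output shown above.
schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
    "required": ["name", "height"],
}


def to_function_spec(entity_schema: dict) -> dict:
    """Wrap an entity schema in a function definition that returns a list of entities."""
    return {
        "name": "information_extraction",  # hypothetical function name
        "description": "Extract the relevant entities from the passage.",
        "parameters": {
            "type": "object",
            "properties": {
                "info": {
                    "type": "array",
                    "items": {"type": "object", **entity_schema},
                }
            },
            "required": ["info"],
        },
    }


# Only the two primitive types used by the schema above are mapped.
PYTHON_TYPES = {"string": str, "integer": int}


def conforms(record: dict, entity_schema: dict) -> bool:
    """Check required keys and primitive types for one extracted record."""
    props = entity_schema["properties"]
    if any(key not in record for key in entity_schema.get("required", [])):
        return False
    return all(
        key in props and isinstance(value, PYTHON_TYPES[props[key]["type"]])
        for key, value in record.items()
    )


extracted = [
    {"name": "Alex", "height": 5, "hair_color": "blonde"},
    {"name": "Claudia", "height": 6, "hair_color": "brunette"},
]
spec = to_function_spec(schema)
all_valid = all(conforms(record, schema) for record in extracted)
```

Wrapping the entity schema in an array under a single key lets one function call return several entities, which matches the list-of-dicts output shown above.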

## Example #2: using a Pydantic schema
Collaborator

Maybe we re-title this as Functions

1/ Provide the Pydantic example

2/ Going Deeper

  • Link to OAI functions page
  • Can link to other functions pages as they are created
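
For point 1/, a typed schema could look roughly like the sketch below. It uses a standard-library `dataclass` as a dependency-free stand-in for a Pydantic model, so it illustrates the idea (declared field types, optional fields, validation on construction) rather than Pydantic's actual API. The `Person` class, its fields, and the `raw` records are assumptions loosely based on the Example #1 output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    height: int
    hair_color: Optional[str] = None  # optional: the passage may not mention it

    def __post_init__(self) -> None:
        # Dataclasses do not validate types on their own, so check them here.
        if not isinstance(self.name, str):
            raise TypeError("name must be a string")
        if not isinstance(self.height, int):
            raise TypeError("height must be an integer")
        if self.hair_color is not None and not isinstance(self.hair_color, str):
            raise TypeError("hair_color must be a string or None")


# Illustrative records; the second one omits the optional field.
raw = [
    {"name": "Alex", "height": 5, "hair_color": "blonde"},
    {"name": "Claudia", "height": 6},
]
people = [Person(**item) for item in raw]
```

With Pydantic, the same declaration would subclass `BaseModel`, and the type checks written by hand in `__post_init__` would be handled by the library.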

For a deep dive on extraction, we recommend checking out [`kor`](https://eyurtsev.github.io/kor/),
a library that uses the existing LangChain chain and OutputParser abstractions
but deep dives on allowing extraction of more complicated schemas.
As we have seen, by leveraging OpenAI `-0613` models we can extract structured data from unstructured documents with minimal hallucination. For more detail on how to use these chains and different tips and tricks you can use, please check out the [docs](docs/extras/modules/chains/additional/extraction.ipynb) for the extraction chain.
Collaborator

Maybe we add a final section on Parsing

1/ Include the JSON example.

2/ Going Deeper

@baskaryan
Collaborator

think we're missing an image extraction_trace_function_2.png @fpingham @rlancemartin

@rlancemartin
Collaborator

> think we're missing an image extraction_trace_function_2.png @fpingham @rlancemartin

Added! Also scrubbed the notebook; it should build now.

@rlancemartin merged commit ef5bc1f into master on Aug 5, 2023
@rlancemartin deleted the francisco/extraction_docs_refactor branch on August 5, 2023 17:09
4 participants