Refactor for extraction docs #8465
Conversation
Then, in addition to the output format instructions, the prompt should also contain the data you would like to extract information from.

## Overview

In LangChain we provide a few useful abstractions that help you leverage OpenAI [function calling](https://openai.com/blog/function-calling-and-other-api-updates) (`-0613` models) and which go a long way to avoid model hallucination. This is in general the best go-to for extracting structured data from text with LangChain. However, there are two other options that you should also consider:
I believe the latest gpt-4 and gpt-3.5 models come with functions; they don't need to be -0613 specifically anymore.
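For instance, something like the following should work with a current model alias (a rough sketch; the function schema and model name here are illustrative, not taken from this PR):

```python
# Sketch: binding an extraction-style function to a current chat model.
# The `-0613` suffix isn't required; an alias like "gpt-3.5-turbo" resolves
# to a functions-capable snapshot.
from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage

# Illustrative function schema (not the one used in the docs).
functions = [
    {
        "name": "information_extraction",
        "description": "Extracts people mentioned in the passage.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "height": {"type": "integer"},
            },
            "required": ["name"],
        },
    }
]

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
msg = llm.predict_messages(
    [HumanMessage(content="Alex is 5 feet tall.")], functions=functions
)
print(msg.additional_kwargs.get("function_call"))
```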
(Updated into more granular comments below.)
One higher-level strategy point: @baskaryan do you think we should support the option to spin up a Colab based on any use-case doc? IMO it would certainly be cool, but I don't have a sense for the cost. At least, it seems all the use-case docs would then need to be Jupyter notebooks?
Confirmed from discussion that we should do this. But we can take up the work to convert to notebooks later.
@@ -4,6 +4,8 @@ sidebar_position: 2

# Extraction

## Use case
Maybe something more explicit like:

Getting structured output from LLM generation is hard.
For example, suppose you need the model output formatted as JSON or in some other specified schema.
Two primary approaches have emerged for this:
* Functions
* Output parsing
Here we might offer a simple schematic to explain the differences:
All this said, let's see how easy it is to quickly and accurately extract structured data with LangChain.
## Example #1: using a JSON schema |
Maybe we re-title this to Quickstart: OAI functions are probably the quickest way to get started. A rough sketch of what that could look like is below.
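For instance (a sketch in the spirit of the existing extraction docs; the schema and model choice are illustrative):

```python
# Sketch: extraction via OpenAI function calling with a JSON-style schema.
from langchain.chains import create_extraction_chain
from langchain.chat_models import ChatOpenAI

# Describe the fields we want pulled out of the text.
schema = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
    "required": ["name", "height"],
}

llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
chain = create_extraction_chain(schema, llm)

text = (
    "Alex is 5 feet tall. Claudia is one foot taller than Alex "
    "and jumps higher than him. Claudia is a brunette and Alex is blonde."
)
chain.run(text)  # returns a list of dicts like the one quoted below
```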
[{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},
 {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]
## Example #2: using a Pydantic schema |
Maybe we re-title this as Functions
1/ Provide the Pydantic example (a rough sketch of what that could look like is below)
2/ Going Deeper
- Link to OAI functions page
- Can link to other functions pages as they are created
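Roughly along these lines (the `Person` class and its fields are illustrative):

```python
# Sketch: the same extraction, but driven by a Pydantic schema.
from typing import Optional

from langchain.chains import create_extraction_chain_pydantic
from langchain.chat_models import ChatOpenAI
from pydantic import BaseModel


class Person(BaseModel):
    """Illustrative schema for the entities we want extracted."""

    name: str
    height: int
    hair_color: Optional[str] = None


llm = ChatOpenAI(temperature=0, model="gpt-3.5-turbo")
chain = create_extraction_chain_pydantic(pydantic_schema=Person, llm=llm)
chain.run("Alex is 5 feet tall and has blonde hair.")
# -> e.g. [Person(name='Alex', height=5, hair_color='blonde')]
```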
For a deep dive on extraction, we recommend checking out [`kor`](https://eyurtsev.github.io/kor/),
a library that uses the existing LangChain chain and OutputParser abstractions
but goes deeper into extracting more complicated schemas.

As we have seen, by leveraging OpenAI `-0613` models we can extract structured data from unstructured documents with minimal hallucination. For more detail on how to use these chains, and for various tips and tricks, please check out the [docs](docs/extras/modules/chains/additional/extraction.ipynb) for the extraction chain.
Maybe we add a final section on Parsing
1/ Include the JSON example (a rough sketch of the parsing approach is below)
2/ Going Deeper
- Link to broader set of parser docs
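Something like this could work for that section (a rough sketch: format instructions in the prompt plus an output parser, no function calling; the names are illustrative):

```python
# Sketch: extraction via prompt-based format instructions and an output parser.
from langchain.chat_models import ChatOpenAI
from langchain.output_parsers import PydanticOutputParser
from langchain.prompts import PromptTemplate
from pydantic import BaseModel


class Person(BaseModel):
    """Illustrative target schema for the parsed output."""

    name: str
    height: int


parser = PydanticOutputParser(pydantic_object=Person)

# The parser supplies the format instructions; the prompt also carries the data.
prompt = PromptTemplate(
    template="Extract information about the person mentioned.\n"
    "{format_instructions}\n{input}",
    input_variables=["input"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

llm = ChatOpenAI(temperature=0)
raw_output = llm.predict(prompt.format(input="Alex is 5 feet tall."))
parser.parse(raw_output)  # -> e.g. Person(name='Alex', height=5)
```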
think we're missing an image
added! also scrubbed the ntbk; should build!
Refactor for the extraction use case documentation