Refactor for extraction docs #8465

fpingham · 2023-07-29T19:41:36Z

Refactor for the extraction use case documentation

…e chain nb

vercel · 2023-07-29T19:41:40Z

The latest updates on your projects. Learn more about Vercel for Git ↗︎

Name	Status	Preview	Comments	Updated (UTC)
langchain	✅ Ready (Inspect)	Visit Preview	💬 Add feedback	Aug 5, 2023 0:20am

baskaryan · 2023-07-31T03:09:20Z

docs/extras/use_cases/extraction/index.mdx

-Then, in addition to the output format instructions, the prompt should also contain the data you would like to extract information from.
+## Overview
+
+In LangChain we provide a few useful abstractions that help you leverage OpenAI [function calling](https://openai.com/blog/function-calling-and-other-api-updates) (`-0613` models) and which go a long way to avoid model hallucination. This is in general the best go-to to extract structured data from text using langchain. However there's other two options that you should also consider:


I believe the latest gpt-4 and gpt-3.5 models come with functions, so they don't need to be `-0613` specifically anymore.

rlancemartin

(Updated into more granular comments below.)

rlancemartin · 2023-07-31T03:52:40Z

One higher-level strategy point: @baskaryan do you think we should support the option to spin up a Colab based on any use case doc? IMO would certainly be cool, but I don't have a sense for the cost. At least, it seems then all the use-case docs need to be Jupyter notebooks?

rlancemartin · 2023-07-31T18:37:12Z

> One higher-level strategy point: @baskaryan do you think we should support the option to spin up a Colab based on any use case doc? IMO would certainly be cool, but I don't have a sense for the cost. At least, it seems then all the use-case docs need to be Jupyter notebooks?

Confirmed from discussion that we should do this.

But we can take up the work to convert to notebooks later.

rlancemartin · 2023-07-31T18:38:22Z

docs/extras/use_cases/extraction/index.mdx

@@ -4,6 +4,8 @@ sidebar_position: 2

 # Extraction

+## Use case


Maybe something more explicit like:

Getting structured output from LLM generation is hard. For example, suppose you need the model output formatted as JSON or in some other specified schema. Two primary approaches have emerged for this:

* Functions
* Output parsing
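To make the distinction in this suggestion concrete, here is a minimal sketch of how the two approaches differ on the consuming side. It does not call LangChain or OpenAI: both model replies are made-up strings, and the helper names `parse_function_arguments` and `parse_embedded_json` are hypothetical. The assumption is that a function-calling model returns its arguments as a JSON string, while a plain completion embeds the requested JSON in free text.

```python
import json

# Made-up model replies, hard-coded so the sketch runs without an API key.
# Functions: the model returns the arguments of a declared function as a JSON string.
function_call_arguments = '{"name": "Alex", "height": 5, "hair_color": "blonde"}'
# Output parsing: the model answers in prose and embeds the JSON the prompt asked for.
free_text_reply = (
    "Here is the result:\n"
    '{"name": "Claudia", "height": 6, "hair_color": "brunette"}\n'
    "Let me know if you need more."
)


def parse_function_arguments(arguments: str) -> dict:
    """Functions approach: the payload is already JSON, so decode it directly."""
    return json.loads(arguments)


def parse_embedded_json(reply: str) -> dict:
    """Output-parsing approach: find the outermost {...} span in the text, then decode it."""
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the model reply")
    return json.loads(reply[start : end + 1])


alex = parse_function_arguments(function_call_arguments)
claudia = parse_embedded_json(free_text_reply)
print(alex["name"], claudia["name"])
```

The output-parsing path has to search the text and can fail when the model ignores the format instructions, which is one reason the functions path tends to be the simpler starting point.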

rlancemartin · 2023-07-31T18:39:25Z

docs/extras/use_cases/extraction/index.mdx

-Then, in addition to the output format instructions, the prompt should also contain the data you would like to extract information from.
+## Overview
+
+In LangChain we provide a few useful abstractions that help you leverage OpenAI [function calling](https://openai.com/blog/function-calling-and-other-api-updates) (`-0613` models) and which go a long way to avoid model hallucination. This is in general the best go-to to extract structured data from text using langchain. However there's other two options that you should also consider:


Here we might offer a simple schematic to explain the differences:

rlancemartin · 2023-07-31T18:47:58Z

docs/extras/use_cases/extraction/index.mdx

+
+All this said, let's see how easy it is to quickly and accurately extract structured data with LangChain.
+
+## Example #1: using a JSON schema


Maybe we re-title this to Quickstart: OAI functions are probably the quickest way to get started.
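For a quickstart along these lines, a small sketch of the JSON-schema idea may help readers see what "structured" means before any chain is involved. The field names are taken from the example output in the docs under review; the `required` list, the `PERSON_SCHEMA` and `PYTHON_TYPES` names, and the `conforms` helper are assumptions for illustration only, and nothing here calls LangChain or OpenAI.

```python
# Hypothetical schema: field names come from the example output in the docs under review;
# the "required" list is an assumption.
PERSON_SCHEMA = {
    "properties": {
        "name": {"type": "string"},
        "height": {"type": "integer"},
        "hair_color": {"type": "string"},
    },
    "required": ["name", "height"],
}

# Maps JSON Schema type names to Python types (only the two used here).
PYTHON_TYPES = {"string": str, "integer": int}


def conforms(record: dict, schema: dict) -> bool:
    """Return True if the record has every required field, no unknown fields, and matching types."""
    properties = schema["properties"]
    if any(field not in record for field in schema.get("required", [])):
        return False
    for field, value in record.items():
        if field not in properties:
            return False
        if not isinstance(value, PYTHON_TYPES[properties[field]["type"]]):
            return False
    return True


# Example output quoted in the docs under review.
extracted = [
    {"name": "Alex", "height": 5, "hair_color": "blonde"},
    {"name": "Claudia", "height": 6, "hair_color": "brunette"},
]
print(all(conforms(person, PERSON_SCHEMA) for person in extracted))
```

A check like this is only a sanity test on the extracted records; it says nothing about whether the values are faithful to the source text.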

rlancemartin · 2023-07-31T18:50:12Z

docs/extras/use_cases/extraction/index.mdx

+    [{'name': 'Alex', 'height': 5, 'hair_color': 'blonde'},
+     {'name': 'Claudia', 'height': 6, 'hair_color': 'brunette'}]
+
+## Example #2: using a Pydantic schema


Maybe we re-title this as Functions

1/ Provide the Pydantic example

2/ Going Deeper

Link to OAI functions page

Can link to other functions pages as they are created

rlancemartin · 2023-07-31T18:52:08Z

docs/extras/use_cases/extraction/index.mdx

-For a deep dive on extraction, we recommend checking out [`kor`](https://eyurtsev.github.io/kor/),
-a library that uses the existing LangChain chain and OutputParser abstractions
-but deep dives on allowing extraction of more complicated schemas.
+As we have seen, by leveraging OpenAI `-0613` models we can extract structured data from unstructured documents with minimal hallucination. For more detail on how to use these chains and different tips and tricks you can use, please check out the [docs](docs/extras/modules/chains/additional/extraction.ipynb) for the extraction chain.


Maybe we add a final section on Parsing

1/ Include the JSON example.

2/ Going Deeper

Link to broader set of parser docs

baskaryan · 2023-08-04T22:07:36Z

think we're missing an image extraction_trace_function_2.png @fpingham @rlancemartin

rlancemartin · 2023-08-04T22:40:24Z

> think we're missing an image extraction_trace_function_2.png @fpingham @rlancemartin

added! also scrubbed the ntbk; should build!

fpingham added 6 commits July 26, 2023 00:15

improved extraction use case docs by including a few examples from the chain nb

147e2ba

small fix to extraction use case docs

d34297c

improved extraction docs

f989374

added reference to output parsers in extraction docs

35ac631

added output parsers link in extraction docs

a825d55

improved structure of extraction use case docs

7103d29

dosubot bot added the 🤖:docs label Jul 29, 2023

Merge branch 'master' into francisco/extraction_docs_refactor

1fa0105

vercel bot deployed to Preview – langchain July 30, 2023 00:49 View deployment

baskaryan reviewed Jul 31, 2023


rlancemartin reviewed Jul 31, 2023


fpingham and others added 4 commits August 1, 2023 12:11

second iteration of extraction use case docs

328609a

Add notebook

f8180c7

Rm older docs

1e2bf62

Small fix

f96bc26

vercel bot had a problem deploying to Preview – langchain August 1, 2023 17:05 Failure

Add JSONFormer

a9689ea

vercel bot had a problem deploying to Preview – langchain August 1, 2023 17:23 Failure

Add colab link

4bae159

vercel bot had a problem deploying to Preview – langchain August 2, 2023 18:34 Failure

solved a few typos

74fc631

vercel bot had a problem deploying to Preview – langchain August 3, 2023 13:42 Failure

fixed image links for extraction docs

f3f05e9

vercel bot deployed to Preview – langchain August 3, 2023 16:02 View deployment

Update extraction

70d6cda

vercel bot had a problem deploying to Preview – langchain August 3, 2023 23:28 Failure

Minor updates

d5f4c38

vercel bot had a problem deploying to Preview – langchain August 3, 2023 23:39 Failure

Move to main dir

ae63a13

vercel bot had a problem deploying to Preview – langchain August 4, 2023 00:13 Failure

rlancemartin added 2 commits August 4, 2023 15:36

fmt

7cdcaac

fix img

ba8dbea

vercel bot deployed to Preview – langchain August 4, 2023 22:55 View deployment

fmt

c650eb3

vercel bot deployed to Preview – langchain August 5, 2023 00:20 View deployment

rlancemartin merged commit ef5bc1f into master Aug 5, 2023

rlancemartin deleted the francisco/extraction_docs_refactor branch August 5, 2023 17:09
