Guidelines
Kor is a wrapper around LLMs to help with information extraction.
Kor is best used with LLMs that do NOT natively support function calling.
If you’re working with a chat model that does support native function calling, please read through this guide first (https://python.langchain.com/v0.2/docs/how_to/tool_calling/).
The quality of the results depends on a lot of factors.
Here are a few things to experiment with to improve quality:
Add more examples. Diverse examples can help, including examples where nothing should be extracted.
Improve the descriptions of the attributes.
If working with multi-paragraph text, specify an input_formatter of "triple_quotes" when creating the chain.
Try a better model.
Break the schema into a few smaller schemas, run separate extractions, and merge the results.
If possible, flatten the schema and use a CSV encoding instead of a JSON encoding.
Add verification/correction steps (ask an LLM to correct or verify the results of the extraction).
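Two of the suggestions above, splitting the schema and adding a verification step, can be sketched without committing to any particular model. This is a minimal illustration, not Kor's API: the `Extractor` type, `run_and_merge`, `unsupported_fields`, and the toy extractors are hypothetical names, and each extractor stands in for a separate extraction chain built from one small schema. The verification shown is a cheap verbatim check rather than an LLM call; an LLM-based verifier could be swapped in at the same point.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for one extraction chain: takes text, returns fields.
Extractor = Callable[[str], Dict[str, str]]


def run_and_merge(text: str, extractors: List[Extractor]) -> Dict[str, str]:
    """Run one extractor per small schema and merge the results into one record."""
    merged: Dict[str, str] = {}
    for extract in extractors:
        for key, value in extract(text).items():
            # Keep the first non-empty value seen for each field.
            if value and key not in merged:
                merged[key] = value
    return merged


def unsupported_fields(text: str, record: Dict[str, str]) -> List[str]:
    """Cheap verification: list fields whose value is not found verbatim in the text."""
    lowered = text.lower()
    return [key for key, value in record.items() if value.lower() not in lowered]


# Toy extractors so the sketch runs without an LLM.
def extract_name(text: str) -> Dict[str, str]:
    return {"name": "Alice"} if "Alice" in text else {}


def extract_city(text: str) -> Dict[str, str]:
    # Deliberately returns a wrong value to exercise the verification step.
    return {"city": "Berlin"}


text = "Alice moved to Paris last year."
record = run_and_merge(text, [extract_name, extract_city])
flagged = unsupported_fields(text, record)
print(record)
print(flagged)
```

Fields that end up in `flagged` are candidates for a correction pass or for manual review; a verbatim check will also flag values the model legitimately normalized, so treat it as a filter and not as a verdict.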
Keep in mind! 😶‍🌫️
If you’re extracting information from a single structured source (e.g., LinkedIn), using an LLM is not a good idea; traditional web scraping will be much cheaper and more reliable.
If perfect quality is needed, then even with all the hacks above, you’ll need to plan on having a human in the loop, because even the best LLMs make mistakes on complex extraction tasks.
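To illustrate the point about structured sources: when every page shares the same markup, a deterministic parser is usually enough. The sketch below uses only Python's standard library. The HTML snippet, the `ProfileParser` class, and the CSS class names are made up for illustration and do not reflect any real site's markup.

```python
from html.parser import HTMLParser
from typing import Dict, Optional


class ProfileParser(HTMLParser):
    """Collect the text of elements whose class attribute names a known field."""

    # Hypothetical mapping from CSS class to output field name.
    FIELDS = {"profile-name": "name", "profile-title": "title"}

    def __init__(self) -> None:
        super().__init__()
        self.record: Dict[str, str] = {}
        self._current: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Remember which field (if any) the element we just entered maps to.
        css_class = dict(attrs).get("class") or ""
        self._current = self.FIELDS.get(css_class)

    def handle_data(self, data):
        # Store the first non-blank text found inside a mapped element.
        if self._current and data.strip():
            self.record[self._current] = data.strip()
            self._current = None

    def handle_endtag(self, tag):
        self._current = None


page = (
    '<div><h1 class="profile-name">Alice Doe</h1>'
    '<p class="profile-title">Engineer</p></div>'
)
parser = ProfileParser()
parser.feed(page)
print(parser.record)
```

No model call is involved, so the result is repeatable and costs nothing per page; the trade-off is that the mapping has to be updated whenever the markup changes.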