Guidelines#

Kor is a wrapper around LLMs to help with information extraction.

The quality of the results depends on a lot of factors.

Here are a few things to experiment with to improve quality:

  • Add more examples. Diverse examples can help, including examples where nothing should be extracted (the schema sketch after this list includes such a negative example).

  • Improve the descriptions of the attributes.

  • If working with multi-paragraph text, specify an input_formatter of "triple_quotes" when creating the chain.

  • Try a better model (e.g., text-davinci-003, gpt-4).

  • Break the schema into a few smaller schemas, run separate extractions, and merge the results (see the split-and-merge sketch below).

  • If possible, flatten the schema and use a CSV encoding instead of a JSON encoding.

  • Add verification/correction steps (ask an LLM to correct or verify the results of the extraction), as in the verification sketch below.
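
Putting several of the tips above together, here is a minimal sketch of a flat Kor schema with attribute descriptions, diverse examples (including a negative one), the "triple_quotes" input formatter, and the CSV encoder. The model choice, the schema, and the sample texts are placeholders, and the exact invocation call varies across Kor/LangChain versions.

```python
from langchain.chat_models import ChatOpenAI

from kor import Object, Text, create_extraction_chain

llm = ChatOpenAI(model_name="gpt-4", temperature=0)  # placeholder model choice

person = Object(
    id="person",
    description="Personal information about a person mentioned in the text.",
    attributes=[
        Text(id="first_name", description="The person's first name."),
        Text(id="last_name", description="The person's last name."),
    ],
    examples=[
        # Diverse positive examples.
        ("Alice Johnson moved to Berlin in 2020.",
         [{"first_name": "Alice", "last_name": "Johnson"}]),
        ("Bob and Carol Smith hosted the meetup.",
         [{"first_name": "Bob", "last_name": "Smith"},
          {"first_name": "Carol", "last_name": "Smith"}]),
        # A negative example: nothing should be extracted from this text.
        ("The weather was unusually warm for October.", []),
    ],
    many=True,
)

chain = create_extraction_chain(
    llm,
    person,
    encoder_or_encoder_class="csv",   # flat schema, so a CSV encoding works
    input_formatter="triple_quotes",  # wrap multi-paragraph input in triple quotes
)

# The invocation below follows older Kor examples; newer versions may differ.
result = chain.predict_and_parse(text="Dana Lee presented the quarterly results.")
print(result["data"])
```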

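For the split-and-merge tip, the sketch below runs two smaller schemas over the same text and combines the outputs into one dictionary. The two schemas, the sample text, and the dict-update merge strategy are illustrative assumptions rather than anything built into Kor.

```python
from langchain.chat_models import ChatOpenAI

from kor import Object, Text, create_extraction_chain

llm = ChatOpenAI(model_name="gpt-4", temperature=0)

contact_info = Object(
    id="contact_info",
    description="Contact details mentioned in the text.",
    attributes=[
        Text(id="email", description="An email address."),
        Text(id="phone", description="A phone number."),
    ],
    many=True,
)

employment = Object(
    id="employment",
    description="Employment details mentioned in the text.",
    attributes=[
        Text(id="employer", description="The name of the employer."),
        Text(id="job_title", description="The person's job title."),
    ],
    many=True,
)

text = "Reach Dana at dana@example.com. She is a data engineer at Acme Corp."

# Run one extraction per schema and merge the results by top-level key.
merged = {}
for schema in (contact_info, employment):
    chain = create_extraction_chain(llm, schema)
    merged.update(chain.predict_and_parse(text=text)["data"])

print(merged)  # e.g. {"contact_info": [...], "employment": [...]}
```
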
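For the verification/correction tip, one option is a follow-up call that asks the LLM to check the extracted data against the source text. The prompt wording, the JSON round-trip, and the helper name are assumptions; this is a sketch, not a Kor feature.

```python
import json

from langchain.chat_models import ChatOpenAI

llm = ChatOpenAI(model_name="gpt-4", temperature=0)

def verify_extraction(text: str, extraction: dict) -> str:
    """Ask the LLM to verify and correct extracted data against the source text."""
    prompt = (
        "Below is a passage of text followed by data extracted from it as JSON.\n"
        "Remove any values that do not appear in the passage, fix any values that\n"
        "were copied incorrectly, and respond with the corrected JSON only.\n\n"
        f"Passage:\n{text}\n\nExtracted JSON:\n{json.dumps(extraction)}"
    )
    # `predict` takes a plain string; the exact call depends on the LangChain version.
    return llm.predict(prompt)

# Usage: pass the original input text and the `data` returned by the extraction
# chain, then parse the model's response, e.g. with json.loads.
```
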
Keep in mind! 😶‍🌫️#

  • If you’re extracting information from a single structured source (e.g., LinkedIn), using an LLM is not a good idea; traditional web scraping will be much cheaper and more reliable.

  • If perfect quality is needed, then even with all the hacks above, you’ll need to plan on having a human in the loop, as even the best LLMs will make mistakes on complex extraction tasks.