Schema serialization#

A Kor schema can be serialized and deserialzed to JSON. This lets you store the schema outside of the code.

ATTENTION This only works with pydantic v1 at the moment.

from kor.nodes import Object, Text, Number

Serialization#

To serialize a schema just call the json() method on the schema

schema = Object(
    id="personal_info",
    description="Personal information about a given person.",
    attributes=[
        Text(
            id="first_name",
            description="The first name of the person",
            examples=[("John Smith went to the store", "John")],
        ),
        Text(
            id="last_name",
            description="The last name of the person",
            examples=[("John Smith went to the store", "Smith")],
        ),
        Number(
            id="age",
            description="The age of the person in years.",
            examples=[("23 years old", "23"), ("I turned three on sunday", "3")],
        ),
    ],
    examples=[
        (
            "John Smith was 23 years old. He was very tall. He knew Jane Doe. She was 5 years old.",
            [
                {"first_name": "John", "last_name": "Smith", "age": 23},
                {"first_name": "Jane", "last_name": "Doe", "age": 5},
            ],
        )
    ],
    many=True,
)

print(schema.json())
{"id": "personal_info", "description": "Personal information about a given person.", "many": true, "attributes": [{"id": "first_name", "description": "The first name of the person", "many": false, "examples": [["John Smith went to the store", "John"]], "$type": "Text"}, {"id": "last_name", "description": "The last name of the person", "many": false, "examples": [["John Smith went to the store", "Smith"]], "$type": "Text"}, {"id": "age", "description": "The age of the person in years.", "many": false, "examples": [["23 years old", "23"], ["I turned three on sunday", "3"]], "$type": "Number"}], "examples": [["John Smith was 23 years old. He was very tall. He knew Jane Doe. She was 5 years old.", [{"first_name": "John", "last_name": "Smith", "age": 23}, {"first_name": "Jane", "last_name": "Doe", "age": 5}]]]}

Deserialization#

Kor lets you define the schema in JSON. The structure of the JSON matches the struture of the Object type.

The following attribute types must be annotated with a type descrimintator ($type):

  • Number

  • Text

  • Bool

  • Selection

json = """
{
    "id": "personal_info",
    "description": "Personal information about a given person.",
    "attributes": [
        {
            "$type": "Text",
            "id": "first_name",
            "description": "The first name of the person",
            "examples": [["John Smith went to the store", "John"]]
        },
        {
            "$type": "Text",
            "id": "last_name",
            "description": "The last name of the person",
            "examples": [["John Smith went to the store", "Smith"]]
        },
        {
            "$type": "Number",
            "id": "age",
            "description": "The age of the person in years.",
            "examples": [["23 years old", "23"], ["I turned three on sunday", "3"]]
        }
    ],
    "examples": [
        [
            "John Smith was 23 years old. He was very tall. He knew Jane Doe. She was 5 years old.",
            [
                {"first_name": "John", "last_name": "Smith", "age": 23},
                {"first_name": "Jane", "last_name": "Doe", "age": 5}
            ]
        ]
    ],
    "many": true
}
"""

To deserialize a schema from JSON simply call the parse_raw() method.

schema = Object.parse_raw(json)
from kor.extraction import create_extraction_chain
from langchain.chat_models import ChatOpenAI
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
    max_tokens=2000,
    model_kwargs={"frequency_penalty": 0, "presence_penalty": 0, "top_p": 1.0},
)
chain = create_extraction_chain(llm, schema)
chain.run("Eugene was 18 years old a long time ago.")["data"]
{'personal_info': [{'first_name': 'Eugene', 'last_name': '', 'age': '18'}]}
chain = create_extraction_chain(llm, schema)
print(
    chain.run(
        "My name is Bob Alice and my phone number is (123)-444-9999. I found my true love one"
        " on a blue sunday. Her number was (333)1232832. Her name was Moana Sunrise and she was 10 years old."
    )["data"]
)
{'personal_info': [{'first_name': 'Bob', 'last_name': 'Alice', 'age': ''}]}