Nested Objects and Lists

Kor attempts to make it easy to capture more complex structure during extraction.

ATTENTION: At the moment, using either nested objects or nested lists requires the JSON encoder.

from kor.extraction import create_extraction_chain
from kor.nodes import Object, Text, Number
from langchain.chat_models import ChatOpenAI
from langchain.llms import OpenAI
llm = ChatOpenAI(
    model_name="gpt-3.5-turbo",
    temperature=0,
    max_tokens=2000,
)

Nested Objects

Here, we’ll introduce an Address object that will be nested inside the main schema.

from_address = Object(
    id="from_address",
    description="Person moved away from this address",
    attributes=[
        Text(id="street"),
        Text(id="city"),
        Text(id="state"),
        Text(id="zipcode"),
        Text(id="country", description="A country in the world; e.g., France."),
    ],
    examples=[
        (
            "100 Main St, Boston, MA, 23232, USA",
            {
                "street": "100 Main St",
                "city": "Boston",
                "state": "MA",
                "zipcode": "23232",
                "country": "USA",
            },
        )
    ],
)

to_address = from_address.replace(
    id="to_address", description="Address to which the person is moving"
)

schema = Object(
    id="information",
    attributes=[
        Text(
            id="person_name",
            description="The full name of the person or partial name",
            examples=[("John Smith was here", "John Smith")],
        ),
        from_address,
        to_address,
    ],
    many=True,
)

JSON encoding

To use nested objects, we have to switch to the JSON encoder, at least for now.

Anecdotally, CSV encoding seems to produce more robust extraction results, so JSON encoding may perform worse even though it’s more flexible.
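To illustrate why nesting pushes us toward JSON, here is a minimal sketch using only the Python standard library. It does not show how Kor's encoders are implemented; the `flatten` helper and the dotted column names are assumptions made up for this illustration. A nested record serializes to JSON unchanged, while a flat, tabular format needs the nesting squeezed into column names first.

```python
import csv
import io
import json

# A nested record shaped like the extraction results on this page.
record = {
    "person_name": "Alice Doe",
    "from_address": {"city": "New York"},
    "to_address": {"city": "Boston", "state": "MA"},
}

# JSON represents the nesting directly and round-trips without loss.
as_json = json.dumps(record)
assert json.loads(as_json) == record


def flatten(d, prefix=""):
    """Hypothetical helper: collapse nested dicts into dotted column names."""
    flat = {}
    for key, value in d.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


# A CSV row is flat, so the record has to be flattened before it can be written.
flat = flatten(record)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(flat))
writer.writeheader()
writer.writerow(flat)
csv_text = buffer.getvalue()
print(csv_text)
```

Flattening works for a single nested object, but it has no natural answer for a field that holds a list of objects, which is one reason a tabular encoding is less flexible here.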

chain = create_extraction_chain(
    llm, schema, encoder_or_encoder_class="json", input_formatter=None
)
chain.run(
    "Alice Doe moved from New York to Boston, MA while Bob Smith did the opposite."
)["data"]
{'information': [{'person_name': 'Alice Doe',
   'from_address': {'city': 'New York'},
   'to_address': {'city': 'Boston', 'state': 'MA'}},
  {'person_name': 'Bob Smith',
   'from_address': {'city': 'Boston', 'state': 'MA'},
   'to_address': {'city': 'New York'}}]}
chain.run(
    "Alice Doe and Bob Smith moved from New York to Boston. Andrew was 12 years"
    " old. He also moved to Boston. So did Joana and Paul. Betty did the opposite."
)["data"]
{'information': [{'person_name': 'Alice Doe',
   'from_address': {'city': 'New York'},
   'to_address': {'city': 'Boston'}},
  {'person_name': 'Bob Smith',
   'from_address': {'city': 'New York'},
   'to_address': {'city': 'Boston'}},
  {'person_name': 'Andrew', 'to_address': {'city': 'Boston'}},
  {'person_name': 'Joana', 'to_address': {'city': 'Boston'}},
  {'person_name': 'Paul', 'to_address': {'city': 'Boston'}},
  {'person_name': 'Betty',
   'from_address': {'city': 'Boston'},
   'to_address': {'city': 'New York'}}]}
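
Note that fields the model could not fill are simply absent from the output (for example, Andrew has no from_address), so downstream code should not assume every key is present. The following is a minimal sketch of reading the result defensively; the data is the second output above pasted in as a literal, and `city_of` is a hypothetical helper, not part of Kor.

```python
# Second result from above, copied in as a literal so the sketch runs without an LLM.
data = {
    "information": [
        {"person_name": "Alice Doe", "from_address": {"city": "New York"}, "to_address": {"city": "Boston"}},
        {"person_name": "Bob Smith", "from_address": {"city": "New York"}, "to_address": {"city": "Boston"}},
        {"person_name": "Andrew", "to_address": {"city": "Boston"}},
        {"person_name": "Joana", "to_address": {"city": "Boston"}},
        {"person_name": "Paul", "to_address": {"city": "Boston"}},
        {"person_name": "Betty", "from_address": {"city": "Boston"}, "to_address": {"city": "New York"}},
    ]
}


def city_of(record, key):
    """Hypothetical helper: city of the nested address under `key`, or None if missing."""
    return record.get(key, {}).get("city")


# One (name, origin city, destination city) tuple per extracted person.
moves = [
    (r.get("person_name"), city_of(r, "from_address"), city_of(r, "to_address"))
    for r in data["information"]
]

for name, origin, destination in moves:
    print(f"{name}: {origin or 'unknown'} -> {destination or 'unknown'}")
```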

Nested Lists

Let’s reuse the schema from above, but mark each address object with many=True so that it can hold a list of addresses.

from_address = Object(
    id="from_address",
    description="Person moved away from this address",
    attributes=[
        Text(id="street"),
        Text(id="city"),
        Text(id="state"),
        Text(id="zipcode"),
        Text(id="country", description="A country in the world; e.g., France."),
    ],
    examples=[
        (
            "100 Main St, Boston, MA, 23232, USA",
            {
                "street": "100 Main St",
                "city": "Boston",
                "state": "MA",
                "zipcode": "23232",
                "country": "USA",
            },
        )
    ],
    many=True,  # <-- PLEASE NOTE THIS CHANGE
)

to_address = from_address.replace(
    id="to_address", description="Address to which the person is moving"
)

schema = Object(
    id="information",
    attributes=[
        Text(
            id="person_name",
            description="The full name of the person or partial name",
            examples=[("John Smith was here", "John Smith")],
        ),
        from_address,
        to_address,
    ],
    many=True,
)
chain = create_extraction_chain(llm, schema, encoder_or_encoder_class="json")
chain.run(
    "Alice Doe and Bob Smith moved from New York to Boston. Bob later moved to LA."
)["data"]
{'information': [{'person_name': 'Alice Doe',
   'from_address': [{'city': 'New York'}],
   'to_address': [{'city': 'Boston'}]},
  {'person_name': 'Bob Smith',
   'from_address': [{'city': 'New York'}],
   'to_address': [{'city': 'Boston'}, {'city': 'LA'}]}]}
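
With many=True on the address objects, each address field comes back as a list of dictionaries rather than a single dictionary. As a minimal sketch of consuming that shape, the snippet below collects every destination city per person; the data is the output above pasted in as a literal, and `destinations` is a hypothetical helper, not part of Kor.

```python
# Result from above, copied in as a literal so the sketch runs without an LLM.
data = {
    "information": [
        {
            "person_name": "Alice Doe",
            "from_address": [{"city": "New York"}],
            "to_address": [{"city": "Boston"}],
        },
        {
            "person_name": "Bob Smith",
            "from_address": [{"city": "New York"}],
            "to_address": [{"city": "Boston"}, {"city": "LA"}],
        },
    ]
}


def destinations(record):
    """Hypothetical helper: every city in the record's to_address list, in order."""
    return [address.get("city") for address in record.get("to_address", [])]


destinations_by_person = {
    r.get("person_name"): destinations(r) for r in data["information"]
}
print(destinations_by_person)
# {'Alice Doe': ['Boston'], 'Bob Smith': ['Boston', 'LA']}
```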