Untyped Objects 🤷
You can define a schema from examples alone, without declaring any type information. Result quality may not suffer much, provided you add enough examples to compensate for the missing schema information.
from kor.extraction import create_extraction_chain
from kor.nodes import Object
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model_name="gpt-4o",
    temperature=0,
)

# No attributes are declared; the single example alone conveys the structure.
schema = Object(
    id="information",
    attributes=[],
    examples=[
        (
            "John Smith moved to Boston from New York. Billy moved to LA.",
            [
                {
                    "person_name": "John Smith",
                    "from_address": {"city": "New York"},
                    "to_address": {"city": "Boston"},
                },
                {"person_name": "Billy", "to_address": {"city": "LA"}},
            ],
        )
    ],
)

chain = create_extraction_chain(llm, schema, encoder_or_encoder_class="json")
chain.invoke(
    "Alice Doe and Bob Smith moved from New York to Boston. Andrew was 12 years"
    " old. He also moved to Boston. So did Joana and Paul. Betty did the opposite."
)["data"]
{'information': [{'person_name': 'Alice Doe',
   'from_address': {'city': 'New York'},
   'to_address': {'city': 'Boston'}},
  {'person_name': 'Bob Smith',
   'from_address': {'city': 'New York'},
   'to_address': {'city': 'Boston'}},
  {'person_name': 'Andrew', 'to_address': {'city': 'Boston'}},
  {'person_name': 'Joana', 'to_address': {'city': 'Boston'}},
  {'person_name': 'Paul', 'to_address': {'city': 'Boston'}},
  {'person_name': 'Betty',
   'from_address': {'city': 'Boston'},
   'to_address': {'city': 'New York'}}]}
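Since the schema declares no attributes, nothing on the schema side checks that the model keeps to the keys used in the examples. One way to spot stray keys is sketched below in plain Python. The helpers `key_paths` and `unexpected_keys` are hypothetical names and not part of Kor, and the `extracted` record with an `age` field is made up for illustration; it is not output from the chain above.

```python
from typing import Any, Set


def key_paths(value: Any, prefix: str = "") -> Set[str]:
    """Collect dotted key paths from nested dicts and lists."""
    paths: Set[str] = set()
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            paths.add(path)
            paths |= key_paths(child, path)
    elif isinstance(value, list):
        # Lists do not add a path segment; their items share the parent path.
        for item in value:
            paths |= key_paths(item, prefix)
    return paths


def unexpected_keys(records: Any, allowed: Set[str]) -> Set[str]:
    """Return key paths present in `records` but absent from `allowed`."""
    return key_paths(records) - allowed


# Key paths taken from the example output given to the schema.
example_output = [
    {
        "person_name": "John Smith",
        "from_address": {"city": "New York"},
        "to_address": {"city": "Boston"},
    },
    {"person_name": "Billy", "to_address": {"city": "LA"}},
]
allowed = key_paths(example_output)

# Hypothetical extraction result containing a key no example used.
extracted = [{"person_name": "Andrew", "age": 12, "to_address": {"city": "Boston"}}]
print(unexpected_keys(extracted, allowed))  # {'age'}
```

A check like this only compares key names; it says nothing about value types, which is the information an untyped schema leaves out.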
print(chain.get_prompts()[0].format_prompt("[user_input]").to_string())
Your goal is to extract structured information from the user's input that matches the form described below. When extracting information please make sure it matches the type information exactly. Do not add any attributes that do not appear in the schema shown below.
```TypeScript
information: { //
}
```
Please output the extracted information in JSON format. Do not output anything except for the extracted information. Do not add any clarifying information. Do not add any fields that are not in the schema. If the text contains attributes that do not appear in the schema, please ignore them. All output must be in JSON format and follow the schema specified above. Wrap the JSON in <json> tags.
Input: John Smith moved to Boston from New York. Billy moved to LA.
Output: <json>{"information": [{"person_name": "John Smith", "from_address": {"city": "New York"}, "to_address": {"city": "Boston"}}, {"person_name": "Billy", "to_address": {"city": "LA"}}]}</json>
Input: [user_input]
Output:
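The printed prompt shows that the TypeScript block is empty, so the example pair is the only place where the expected structure appears. The sketch below reproduces that Input/Output pair with the standard library to make the layout explicit. It is an illustration only, not Kor's actual implementation, and `render_example` is a hypothetical helper name.

```python
import json
from typing import Any, List


def render_example(object_id: str, text: str, records: List[Any]) -> str:
    """Format one example the way it appears in the printed prompt."""
    # Wrap the records under the object id, then serialize to JSON.
    payload = json.dumps({object_id: records})
    return f"Input: {text}\nOutput: <json>{payload}</json>"


rendered = render_example(
    "information",
    "John Smith moved to Boston from New York. Billy moved to LA.",
    [
        {
            "person_name": "John Smith",
            "from_address": {"city": "New York"},
            "to_address": {"city": "Boston"},
        },
        {"person_name": "Billy", "to_address": {"city": "LA"}},
    ],
)
print(rendered)
```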