{ "cells": [ { "cell_type": "markdown", "id": "4b3a0584-b52c-4873-abb8-8382e13ff5c0", "metadata": {}, "source": [ "# Schema serialization\n", "\n", "A Kor schema can be serialized to JSON and deserialized from it, which lets you store the schema outside of the code.\n", "\n", "**ATTENTION** This only works with pydantic v1 at the moment." ] }, { "cell_type": "code", "execution_count": 1, "id": "0b4597b2-2a43-4491-8830-bf9f79428074", "metadata": { "nbsphinx": "hidden", "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import sys\n", "\n", "sys.path.insert(0, \"../../\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "47a11a37", "metadata": { "tags": [] }, "outputs": [], "source": [ "from kor.nodes import Object, Text, Number" ] }, { "cell_type": "markdown", "id": "ac403159", "metadata": {}, "source": [ "## Serialization\n", "\n", "To serialize a schema, call the `json()` method on it." ] }, { "cell_type": "code", "execution_count": 3, "id": "67cb9713", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{\"id\": \"personal_info\", \"description\": \"Personal information about a given person.\", \"many\": true, \"attributes\": [{\"id\": \"first_name\", \"description\": \"The first name of the person\", \"many\": false, \"examples\": [[\"John Smith went to the store\", \"John\"]], \"$type\": \"Text\"}, {\"id\": \"last_name\", \"description\": \"The last name of the person\", \"many\": false, \"examples\": [[\"John Smith went to the store\", \"Smith\"]], \"$type\": \"Text\"}, {\"id\": \"age\", \"description\": \"The age of the person in years.\", \"many\": false, \"examples\": [[\"23 years old\", \"23\"], [\"I turned three on sunday\", \"3\"]], \"$type\": \"Number\"}], \"examples\": [[\"John Smith was 23 years old. He was very tall. He knew Jane Doe. 
She was 5 years old.\", [{\"first_name\": \"John\", \"last_name\": \"Smith\", \"age\": 23}, {\"first_name\": \"Jane\", \"last_name\": \"Doe\", \"age\": 5}]]]}\n" ] } ], "source": [ "schema = Object(\n", " id=\"personal_info\",\n", " description=\"Personal information about a given person.\",\n", " attributes=[\n", " Text(\n", " id=\"first_name\",\n", " description=\"The first name of the person\",\n", " examples=[(\"John Smith went to the store\", \"John\")],\n", " ),\n", " Text(\n", " id=\"last_name\",\n", " description=\"The last name of the person\",\n", " examples=[(\"John Smith went to the store\", \"Smith\")],\n", " ),\n", " Number(\n", " id=\"age\",\n", " description=\"The age of the person in years.\",\n", " examples=[(\"23 years old\", \"23\"), (\"I turned three on sunday\", \"3\")],\n", " ),\n", " ],\n", " examples=[\n", " (\n", " \"John Smith was 23 years old. He was very tall. He knew Jane Doe. She was 5 years old.\",\n", " [\n", " {\"first_name\": \"John\", \"last_name\": \"Smith\", \"age\": 23},\n", " {\"first_name\": \"Jane\", \"last_name\": \"Doe\", \"age\": 5},\n", " ],\n", " )\n", " ],\n", " many=True,\n", ")\n", "\n", "print(schema.json())" ] }, { "cell_type": "markdown", "id": "11712477", "metadata": {}, "source": [ "## Deserialization\n", "\n", "Kor lets you define the schema in JSON. 
The structure of the JSON matches the structure of the `Object` type.\n", "\n", "The following attribute types must be annotated with a type discriminator (`$type`):\n", "\n", "- Number\n", "- Text\n", "- Bool\n", "- Selection" ] }, { "cell_type": "code", "execution_count": 4, "id": "3bd33817", "metadata": { "tags": [] }, "outputs": [], "source": [ "json = \"\"\"\n", "{\n", " \"id\": \"personal_info\",\n", " \"description\": \"Personal information about a given person.\",\n", " \"attributes\": [\n", " {\n", " \"$type\": \"Text\",\n", " \"id\": \"first_name\",\n", " \"description\": \"The first name of the person\",\n", " \"examples\": [[\"John Smith went to the store\", \"John\"]]\n", " },\n", " {\n", " \"$type\": \"Text\",\n", " \"id\": \"last_name\",\n", " \"description\": \"The last name of the person\",\n", " \"examples\": [[\"John Smith went to the store\", \"Smith\"]]\n", " },\n", " {\n", " \"$type\": \"Number\",\n", " \"id\": \"age\",\n", " \"description\": \"The age of the person in years.\",\n", " \"examples\": [[\"23 years old\", \"23\"], [\"I turned three on sunday\", \"3\"]]\n", " }\n", " ],\n", " \"examples\": [\n", " [\n", " \"John Smith was 23 years old. He was very tall. He knew Jane Doe. She was 5 years old.\",\n", " [\n", " {\"first_name\": \"John\", \"last_name\": \"Smith\", \"age\": 23},\n", " {\"first_name\": \"Jane\", \"last_name\": \"Doe\", \"age\": 5}\n", " ]\n", " ]\n", " ],\n", " \"many\": true\n", "}\n", "\"\"\"" ] }, { "cell_type": "markdown", "id": "3581b713", "metadata": {}, "source": [ "To deserialize a schema from JSON, call the `parse_raw()` method." 
] }, { "cell_type": "code", "execution_count": 5, "id": "6088c98a", "metadata": { "tags": [] }, "outputs": [], "source": [ "schema = Object.parse_raw(json)" ] }, { "cell_type": "code", "execution_count": 6, "id": "718c66a7-6186-4ed8-87e9-5ed28e3f209e", "metadata": { "tags": [] }, "outputs": [], "source": [ "from kor.extraction import create_extraction_chain\n", "from langchain.chat_models import ChatOpenAI" ] }, { "cell_type": "code", "execution_count": 7, "id": "9bc98f35-ea5f-4b74-a32e-a300a22c0c89", "metadata": { "tags": [] }, "outputs": [], "source": [ "llm = ChatOpenAI(\n", " model_name=\"gpt-3.5-turbo\",\n", " temperature=0,\n", " max_tokens=2000,\n", " model_kwargs={\"frequency_penalty\": 0, \"presence_penalty\": 0, \"top_p\": 1.0},\n", ")" ] }, { "cell_type": "code", "execution_count": 8, "id": "54a199a5-24b4-442c-8907-1449e437a880", "metadata": { "tags": [] }, "outputs": [], "source": [ "chain = create_extraction_chain(llm, schema)" ] }, { "cell_type": "code", "execution_count": 9, "id": "193e257b-df01-45ec-af77-076d2070533b", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'personal_info': [{'first_name': 'Eugene', 'last_name': '', 'age': '18'}]}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"Eugene was 18 years old a long time ago.\")[\"data\"]" ] }, { "cell_type": "code", "execution_count": 10, "id": "c8295f36-f986-4db2-97bc-ef2e6cdbcc87", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "{'personal_info': [{'first_name': 'Bob', 'last_name': 'Alice', 'age': ''}]}\n" ] } ], "source": [ "chain = create_extraction_chain(llm, schema)\n", "print(\n", " chain.run(\n", " \"My name is Bob Alice and my phone number is (123)-444-9999. I found my true love one\"\n", " \" on a blue sunday. Her number was (333)1232832. 
Her name was Moana Sunrise and she was 10 years old.\"\n", " )[\"data\"]\n", ")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }