{ "cells": [ { "cell_type": "markdown", "id": "4b3a0584-b52c-4873-abb8-8382e13ff5c0", "metadata": {}, "source": [ "# Natural Language Based APIs\n", "\n", "Being able to understand the content of text can help in tasks other than information extraction.\n", "\n", "Here, we'll see how extracting information from text can help with powering a natural language based assistant \n", "that has different skills." ] }, { "cell_type": "code", "execution_count": 1, "id": "0b4597b2-2a43-4491-8830-bf9f79428074", "metadata": { "nbsphinx": "hidden", "tags": [ "remove-cell" ] }, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2\n", "\n", "import sys\n", "\n", "sys.path.insert(0, \"../../\")" ] }, { "cell_type": "code", "execution_count": 2, "id": "c719e4fc-3ccf-4633-a787-b2fe0d1eac65", "metadata": { "tags": [] }, "outputs": [], "source": [ "from langchain.chat_models import ChatOpenAI\n", "from kor import create_extraction_chain, Object, Text, Number" ] }, { "cell_type": "code", "execution_count": 3, "id": "f1313c02-d415-4ce6-bff0-3df537cc06c2", "metadata": { "tags": [] }, "outputs": [], "source": [ "llm = ChatOpenAI(\n", " model_name=\"gpt-3.5-turbo\",\n", " temperature=0,\n", " max_tokens=2000,\n", ")" ] }, { "cell_type": "markdown", "id": "bdd0f1af-55d4-4ab2-8932-b52a86450cdf", "metadata": {}, "source": [ "## Control Music" ] }, { "cell_type": "markdown", "id": "c6ce4726-db5b-49ee-abf7-780fd707be5f", "metadata": {}, "source": [ "Here's a hypotehtical API for controlling music." ] }, { "cell_type": "code", "execution_count": 4, "id": "f61c94db-c05d-43ba-9ffc-b58552c715c3", "metadata": { "tags": [] }, "outputs": [], "source": [ "schema = Object(\n", " id=\"player\",\n", " description=(\n", " \"User is controlling a music player to select songs, pause or start them or play\"\n", " \" music by a particular artist.\"\n", " ),\n", " attributes=[\n", " Text(\n", " id=\"song\",\n", " description=\"User wants to play this song\",\n", " examples=[],\n", " many=True,\n", " ),\n", " Text(\n", " id=\"album\",\n", " description=\"User wants to play this album\",\n", " examples=[],\n", " many=True,\n", " ),\n", " Text(\n", " id=\"artist\",\n", " description=\"Music by the given artist\",\n", " examples=[(\"Songs by paul simon\", \"paul simon\")],\n", " many=True,\n", " ),\n", " Text(\n", " id=\"action\",\n", " description=\"Action to take one of: `play`, `stop`, `next`, `previous`.\",\n", " examples=[\n", " (\"Please stop the music\", \"stop\"),\n", " (\"play something\", \"play\"),\n", " (\"play a song\", \"play\"),\n", " (\"next song\", \"next\"),\n", " ],\n", " ),\n", " ],\n", " many=False,\n", ")" ] }, { "cell_type": "markdown", "id": "49029ef7-c084-46f2-9791-6fb731252a9f", "metadata": {}, "source": [ "**ATTENTION** Use the JSON encoder here rather than the default CSV encoder as it supports nested lists" ] }, { "cell_type": "code", "execution_count": 5, "id": "ff9ad27f-7a81-4123-8d0b-1e14802df67e", "metadata": { "tags": [] }, "outputs": [], "source": [ "chain = create_extraction_chain(llm, schema, encoder_or_encoder_class=\"json\")" ] }, { "cell_type": "markdown", "id": "e0b612d2-6893-4881-ad97-e09425511010", "metadata": {}, "source": [ "## Music Player" ] }, { "cell_type": "code", "execution_count": 6, "id": "760baa5f-9368-4b5a-abc0-6ac65c34b7a7", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'player': {'action': 'stop'}}" ] }, "execution_count": 6, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"stop the music now\")[\"data\"]" ] }, { "cell_type": "code", "execution_count": 7, "id": "462303c0-e83a-4e39-86cd-cab6875b40ef", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'player': {'action': 'play'}}" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"i want to hear a song\")[\"data\"]" ] }, { "cell_type": "code", "execution_count": 8, "id": "02c7f1e5-1c8d-4e9f-82e6-c37a41d6de14", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'player': {'album': ['the lion king soundtrack']}}" ] }, "execution_count": 8, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"can you play the lion king soundtrack\")[\"data\"]" ] }, { "cell_type": "code", "execution_count": 9, "id": "7a6d918c-53fe-426b-b37e-eec2abb8a704", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'data': {'player': {'artist': ['paul simon', 'led zeppelin', 'the doors']}},\n", " 'raw': '{\"player\": {\"artist\": [\"paul simon\", \"led zeppelin\", \"the doors\"]}}',\n", " 'errors': [],\n", " 'validated_data': {}}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"play songs by paul simon and led zeppelin and the doors\")" ] }, { "cell_type": "code", "execution_count": 10, "id": "b18acf0a-d99e-48de-ace5-fb01bded5a41", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'player': {'action': 'previous'}}" ] }, "execution_count": 10, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"could you play the previous song again?\")[\"data\"]" ] }, { "cell_type": "markdown", "id": "4a90e50e-aa1b-45c6-920b-4589b424e561", "metadata": {}, "source": [ "## Ticket ordering\n", "\n", "Here's an imaginary API for searching and buying tickets" ] }, { "cell_type": "code", "execution_count": 11, "id": "c50b080b-7179-4bbe-b234-83ce59e2d215", "metadata": { "tags": [] }, "outputs": [], "source": [ "schema = Object(\n", " id=\"action\",\n", " description=\"User is looking for sports tickets\",\n", " attributes=[\n", " Text(\n", " id=\"sport\",\n", " description=\"which sports do you want to buy tickets for?\",\n", " examples=[\n", " (\n", " \"I want to buy tickets to basketball and football games\",\n", " [\"basketball\", \"footbal\"],\n", " )\n", " ],\n", " ),\n", " Text(\n", " id=\"location\",\n", " description=\"where would you like to watch the game?\",\n", " examples=[\n", " (\"in boston\", \"boston\"),\n", " (\"in france or italy\", [\"france\", \"italy\"]),\n", " ],\n", " ),\n", " Object(\n", " id=\"price_range\",\n", " description=\"how much do you want to spend?\",\n", " attributes=[],\n", " examples=[\n", " (\"no more than $100\", {\"price_max\": \"100\", \"currency\": \"$\"}),\n", " (\n", " \"between 50 and 100 dollars\",\n", " {\"price_max\": \"100\", \"price_min\": \"50\", \"currency\": \"$\"},\n", " ),\n", " ],\n", " ),\n", " ],\n", ")" ] }, { "cell_type": "code", "execution_count": 12, "id": "404e4f1a-d316-41f2-ab94-040e22001fc4", "metadata": { "tags": [] }, "outputs": [], "source": [ "chain = create_extraction_chain(llm, schema, encoder_or_encoder_class=\"json\")" ] }, { "cell_type": "code", "execution_count": 13, "id": "73c31ace-32dd-4a33-ae39-475db6934f6d", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'action': {'sport': 'baseball',\n", " 'location': 'LA area',\n", " 'price_range': {'price_max': '100', 'currency': '$'}}}" ] }, "execution_count": 13, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\"I want to buy tickets for a baseball game in LA area under $100\")[\"data\"]" ] }, { "cell_type": "code", "execution_count": 14, "id": "78e3b3af-bfa8-4503-854a-b83a7f8f49e6", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'action': {'sport': 'celtics',\n", " 'location': 'boston',\n", " 'price_range': {'price_min': '20', 'price_max': '40', 'currency': '$'}}}" ] }, "execution_count": 14, "metadata": {}, "output_type": "execute_result" } ], "source": [ "chain.run(\n", " \"I want to see a celtics game in boston somewhere between 20 and 40 dollars per ticket\"\n", ")[\"data\"]" ] }, { "cell_type": "markdown", "id": "20ddd000-df00-4014-9433-fba85181ba46", "metadata": {}, "source": [ "## Company Search\n", "\n", "**ATTENTION** This is a demo that shows how to build a complex schema to run a company search that matches different criteria.\n", "\n", "However, using this format for issuing database queries (e.g., by translating the JSON into SQL) will only works well for simple queries. \n", "\n", "There's a better way to leverage LLMs to issue database queries, and support for that may be added to the package in the future." ] }, { "cell_type": "code", "execution_count": 15, "id": "2b0bcf09-a3ae-4a8a-9ce3-f86834ce6ca2", "metadata": { "tags": [] }, "outputs": [], "source": [ "company_name = Text(\n", " id=\"company_name\",\n", " description=\"what is the name of the company you want to find\",\n", " many=True,\n", " examples=[\n", " (\"Apple inc\", \"Apple inc\"),\n", " (\"largest 10 banks in the world\", \"\"),\n", " (\"microsoft and apple\", \"microsoft,apple\"),\n", " ],\n", ")\n", "\n", "industry_name = Text(\n", " id=\"industry_name\",\n", " description=\"what is the name of the company's industry\",\n", " many=True,\n", " examples=[\n", " (\"companies in the steel manufacturing industry\", \"steel manufacturing\"),\n", " (\"large banks\", \"banking\"),\n", " (\"military companies\", \"defense\"),\n", " (\"chinese companies\", \"\"),\n", " (\"companies that cell cigars\", \"cigars\"),\n", " ],\n", ")\n", "\n", "geography_name = Text(\n", " id=\"geography_name\",\n", " description=\"where is the company based?\",\n", " examples=[\n", " (\"chinese companies\", \"china\"),\n", " (\"companies based in france\", \"france\"),\n", " (\"LaMaple was based in france, italy\", [\"france\", \"italy\"]),\n", " (\"italy\", \"\"),\n", " ],\n", ")\n", "\n", "foundation_date = Text(\n", " id=\"foundation_date\",\n", " description=\"Foundation date of the company\",\n", " examples=[(\"companies founded in 2023\", \"2023\")],\n", ")\n", "\n", "attribute_filter = Object(\n", " id=\"attribute_filter\",\n", " many=True,\n", " description=(\n", " \"Filter by a value of an attribute using a binary expression. Specify the\"\n", " \" attribute's name, an operator (>, <, =, !=, >=, <=, in, not in) and a value.\"\n", " ),\n", " attributes=[],\n", " examples=[\n", " (\n", " \"Companies with revenue > 100\",\n", " {\n", " \"attribute\": \"revenue\",\n", " \"op\": \">\",\n", " \"value\": \"100\",\n", " },\n", " ),\n", " (\n", " \"number of employees between 50 and 1000\",\n", " {\"attribute\": \"employees\", \"op\": \"in\", \"value\": [\"50\", \"1000\"]},\n", " ),\n", " (\n", " \"blue or green color\",\n", " {\n", " \"attribute\": \"color\",\n", " \"op\": \"in\",\n", " \"value\": [\"blue\", \"green\"],\n", " },\n", " ),\n", " (\n", " \"companies that do not sell in california\",\n", " {\n", " \"attribute\": \"geography-sales\",\n", " \"op\": \"not in\",\n", " \"value\": \"california\",\n", " },\n", " ),\n", " ],\n", ")\n", "\n", "sales_geography = Text(\n", " id=\"geography_sales\",\n", " description=\"where is the company doing sales? Please use a single country name.\",\n", " many=True,\n", " examples=[\n", " (\"companies with sales in france\", \"france\"),\n", " (\"companies that sell their products in germany\", \"germany\"),\n", " (\"france, italy\", \"\"),\n", " ],\n", ")\n", "\n", "attribute_selection_block = Text(\n", " id=\"attribute_selection\",\n", " description=\"Asking to see the value of one or more attributes\",\n", " many=True,\n", " examples=[\n", " (\"What is the revenue of tech companies?\", \"revenue\"),\n", " (\"market cap of apple?\", \"market cap\"),\n", " (\"number of employees of largest company\", \"number of employees\"),\n", " (\"what are the revenue and market cap of apple\", [\"revenue\", \"market cap\"]),\n", " (\n", " \"share price and number of shares of indian companies\",\n", " [\"share price\", \"number of shares\"],\n", " ),\n", " ],\n", ")\n", "\n", "sort_by_attribute_block = Object(\n", " id=\"sort_block\",\n", " description=(\n", " \"Use to request to sort the results by a particular attribute. \"\n", " \"Can specify the direction\"\n", " ),\n", " attributes=[\n", " Text(id=\"direction\", description=\"The direction of the sort\"),\n", " Text(id=\"attribute\", description=\"The sort attribute\"),\n", " ],\n", " examples=[\n", " (\n", " \"Largest by market-cap tech companies\",\n", " {\"direction\": \"descending\", \"attribute\": \"market-cap\"},\n", " ),\n", " (\n", " \"sort by companies with smallest revenue \",\n", " {\"direction\": \"ascending\", \"attribute\": \"revenue\"},\n", " ),\n", " ],\n", ")\n", "\n", "schema = Object(\n", " id=\"search_for_companies\",\n", " description=\"Search for companies matching the following criteria.\",\n", " attributes=[\n", " company_name,\n", " geography_name,\n", " foundation_date,\n", " industry_name,\n", " sales_geography,\n", " attribute_filter,\n", " attribute_selection_block,\n", " sort_by_attribute_block,\n", " ],\n", ")" ] }, { "cell_type": "markdown", "id": "ee6725a8-246b-4163-a657-5f3eddbf5d2b", "metadata": {}, "source": [ "**ATTENTION** Some of the queries below fail. One common reason is that more examples could be useful to show the model how to group objects together. Pay attention to failures!" ] }, { "cell_type": "code", "execution_count": 16, "id": "7b389d20-ae6b-4764-9209-3cd3c2f0a715", "metadata": { "tags": [] }, "outputs": [], "source": [ "chain = create_extraction_chain(llm, schema, encoder_or_encoder_class=\"json\")" ] }, { "cell_type": "markdown", "id": "3c59de74-1cb6-4644-9bd9-526669dd7fa4", "metadata": {}, "source": [ "Confirm that we're not getting **false** positives" ] }, { "cell_type": "code", "execution_count": 17, "id": "b203ac4a-4f9f-45c6-a509-39b9a6cfd98f", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{}" ] }, "execution_count": 17, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = (\n", " \"Today Alice MacDonald is turning sixty days old. She had blue eyes. \"\n", " \"Bob is turning 10 years old. His eyes were bright red.\"\n", ")\n", "chain.run(text)[\"data\"]" ] }, { "cell_type": "code", "execution_count": 18, "id": "398377bf-5d30-4b4c-b637-e9af969d16a4", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'search_for_companies': {'attribute_filter': [{'attribute': 'market cap',\n", " 'op': '>',\n", " 'value': '1 million'},\n", " {'attribute': 'employees', 'op': 'in', 'value': ['20', '50']}],\n", " 'attribute_selection': ['revenue', 'eps']}}" ] }, "execution_count": 18, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = (\n", " \"revenue, eps of indian companies that have market cap of over 1 million, and\"\n", " \" and between 20-50 employees\"\n", ")\n", "chain.run(text)[\"data\"]" ] }, { "cell_type": "code", "execution_count": 19, "id": "2a620246-4c85-4256-8f58-0acbcc9455a3", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'search_for_companies': {'attribute_filter': [{'attribute': 'building_color',\n", " 'op': 'in',\n", " 'value': ['red', 'blue']}]}}" ] }, "execution_count": 19, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = \"companies that own red and blue buildings\"\n", "chain.run(text)[\"data\"]" ] }, { "cell_type": "code", "execution_count": 20, "id": "4745517e-507e-4d1a-97e0-d143fa34cea2", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'search_for_companies': {'geography_name': 'germany',\n", " 'sort_block': {'direction': 'descending',\n", " 'attribute': 'number of employees'},\n", " 'attribute_selection': ['revenue']}}" ] }, "execution_count": 20, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = \"revenue of largest german companies sorted by number of employees\"\n", "chain.run(text)[\"data\"]" ] }, { "cell_type": "code", "execution_count": 25, "id": "b206407f-57e0-4212-8e75-970cb49b52e5", "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "{'search_for_companies': {'attribute_filter': [{'attribute': 'market cap',\n", " 'op': '>',\n", " 'value': '1000000'},\n", " {'attribute': 'building color', 'op': 'in', 'value': ['red', 'blue']}],\n", " 'attribute_selection': ['revenue', 'eps']}}" ] }, "execution_count": 25, "metadata": {}, "output_type": "execute_result" } ], "source": [ "text = (\n", " \"revenue, eps of indian companies that have market cap of over 1 million, \"\n", " \"that own red and blue buildings\"\n", ")\n", "chain.run(text)[\"data\"]" ] }, { "cell_type": "code", "execution_count": 22, "id": "a1025f99-eb0a-4d96-923e-35f36e4ac6b2", "metadata": { "tags": [] }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Your goal is to extract structured information from the user's input that matches the form described below. When extracting information please make sure it matches the type information exactly. Do not add any attributes that do not appear in the schema shown below.\n", "\n", "```TypeScript\n", "\n", "search_for_companies: { // Search for companies matching the following criteria.\n", " company_name: Array // what is the name of the company you want to find\n", " geography_name: string // where is the company based?\n", " foundation_date: string // Foundation date of the company\n", " industry_name: Array // what is the name of the company's industry\n", " geography_sales: Array // where is the company doing sales? Please use a single country name.\n", " attribute_filter: Array<{ // Filter by a value of an attribute using a binary expression. Specify the attribute's name, an operator (>, <, =, !=, >=, <=, in, not in) and a value.\n", " }>\n", " attribute_selection: Array // Asking to see the value of one or more attributes\n", " sort_block: { // Use to request to sort the results by a particular attribute. Can specify the direction\n", " direction: string // The direction of the sort\n", " attribute: string // The sort attribute\n", " }\n", "}\n", "```\n", "\n", "\n", "Please output the extracted information in JSON format. Do not output anything except for the extracted information. Do not add any clarifying information. Do not add any fields that are not in the schema. If the text contains attributes that do not appear in the schema, please ignore them. All output must be in JSON format and follow the schema specified above. Wrap the JSON in tags.\n", "\n", "\n", "\n", "Input: Apple inc\n", "Output: {\"search_for_companies\": {\"company_name\": [\"Apple inc\"]}}\n", "Input: largest 10 banks in the world\n", "Output: {}\n", "Input: microsoft and apple\n", "Output: {\"search_for_companies\": {\"company_name\": [\"microsoft,apple\"]}}\n", "Input: chinese companies\n", "Output: {\"search_for_companies\": {\"geography_name\": \"china\"}}\n", "Input: companies based in france\n", "Output: {\"search_for_companies\": {\"geography_name\": \"france\"}}\n", "Input: LaMaple was based in france, italy\n", "Output: {\"search_for_companies\": {\"geography_name\": [\"france\", \"italy\"]}}\n", "Input: italy\n", "Output: {}\n", "Input: companies founded in 2023\n", "Output: {\"search_for_companies\": {\"foundation_date\": \"2023\"}}\n", "Input: companies in the steel manufacturing industry\n", "Output: {\"search_for_companies\": {\"industry_name\": [\"steel manufacturing\"]}}\n", "Input: large banks\n", "Output: {\"search_for_companies\": {\"industry_name\": [\"banking\"]}}\n", "Input: military companies\n", "Output: {\"search_for_companies\": {\"industry_name\": [\"defense\"]}}\n", "Input: chinese companies\n", "Output: {}\n", "Input: companies that cell cigars\n", "Output: {\"search_for_companies\": {\"industry_name\": [\"cigars\"]}}\n", "Input: companies with sales in france\n", "Output: {\"search_for_companies\": {\"geography_sales\": [\"france\"]}}\n", "Input: companies that sell their products in germany\n", "Output: {\"search_for_companies\": {\"geography_sales\": [\"germany\"]}}\n", "Input: france, italy\n", "Output: {}\n", "Input: Companies with revenue > 100\n", "Output: {\"search_for_companies\": {\"attribute_filter\": [{\"attribute\": \"revenue\", \"op\": \">\", \"value\": \"100\"}]}}\n", "Input: number of employees between 50 and 1000\n", "Output: {\"search_for_companies\": {\"attribute_filter\": [{\"attribute\": \"employees\", \"op\": \"in\", \"value\": [\"50\", \"1000\"]}]}}\n", "Input: blue or green color\n", "Output: {\"search_for_companies\": {\"attribute_filter\": [{\"attribute\": \"color\", \"op\": \"in\", \"value\": [\"blue\", \"green\"]}]}}\n", "Input: companies that do not sell in california\n", "Output: {\"search_for_companies\": {\"attribute_filter\": [{\"attribute\": \"geography-sales\", \"op\": \"not in\", \"value\": \"california\"}]}}\n", "Input: What is the revenue of tech companies?\n", "Output: {\"search_for_companies\": {\"attribute_selection\": [\"revenue\"]}}\n", "Input: market cap of apple?\n", "Output: {\"search_for_companies\": {\"attribute_selection\": [\"market cap\"]}}\n", "Input: number of employees of largest company\n", "Output: {\"search_for_companies\": {\"attribute_selection\": [\"number of employees\"]}}\n", "Input: what are the revenue and market cap of apple\n", "Output: {\"search_for_companies\": {\"attribute_selection\": [\"revenue\", \"market cap\"]}}\n", "Input: share price and number of shares of indian companies\n", "Output: {\"search_for_companies\": {\"attribute_selection\": [\"share price\", \"number of shares\"]}}\n", "Input: Largest by market-cap tech companies\n", "Output: {\"search_for_companies\": {\"sort_block\": {\"direction\": \"descending\", \"attribute\": \"market-cap\"}}}\n", "Input: sort by companies with smallest revenue \n", "Output: {\"search_for_companies\": {\"sort_block\": {\"direction\": \"ascending\", \"attribute\": \"revenue\"}}}\n", "Input: [user_input]\n", "Output:\n" ] } ], "source": [ "print(chain.prompt.format_prompt(\"[user_input]\").to_string())" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.10.12" } }, "nbformat": 4, "nbformat_minor": 5 }