kor.encoders package
Submodules
kor.encoders.csv_data module
Module that contains Kor flavored encoders/decoders for CSV data.
The code will eventually need to support some form of nested objects, either via JSON-encoded column values or by breaking nested attributes down into additional columns (likely both).
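The two options mentioned above can be sketched with the standard library alone. This is only an illustration of the idea, not Kor's implementation: the helper names `flatten` and `to_csv`, the dotted column naming, and the sample record are all assumptions made for this example.

```python
import csv
import io
import json
from typing import Any, Dict, List


def flatten(record: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Break nested attributes into additional dotted columns (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{column}."))
        else:
            flat[column] = value
    return flat


def to_csv(rows: List[Dict[str, Any]]) -> str:
    """Write rows as CSV; the header is taken from the first row's keys."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


record = {"name": "Ada", "address": {"city": "London", "zip": "N1"}}

# Option 1: keep one column and JSON-encode the nested value inside it.
json_column_csv = to_csv(
    [{"name": record["name"], "address": json.dumps(record["address"])}]
)

# Option 2: one column per nested attribute.
flattened_csv = to_csv([flatten(record)])
```

The JSON-column variant keeps the header stable but needs a second decoding step per cell. The flattened variant stays readable as plain CSV, but the header grows with the depth of the schema.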
- class kor.encoders.csv_data.CSVEncoder(node: kor.nodes.AbstractSchemaNode, use_tags: bool = False)
Bases:
kor.encoders.typedefs.SchemaBasedEncoder
CSV encoder.
kor.encoders.encode module
- kor.encoders.encode.encode_examples(examples: Sequence[Tuple[str, str]], encoder: kor.encoders.typedefs.Encoder, input_formatter: Union[Literal['text_prefix'], Literal['triple_quotes'], None, Callable[[str], str]] = None) -> List[Tuple[str, str]]
Encode the output using the given encoder.
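As a rough illustration of what encoding examples involves, the sketch below formats each input and encodes each output. It is a stand-in written for this page, not the library function: `encode_examples_sketch` and its parameters `encode_output` and `format_input` are hypothetical names, and it accepts arbitrary Python objects as outputs, which is an assumption.

```python
import json
from typing import Any, Callable, List, Optional, Sequence, Tuple


def encode_examples_sketch(
    examples: Sequence[Tuple[str, Any]],
    encode_output: Callable[[Any], str],
    format_input: Optional[Callable[[str], str]] = None,
) -> List[Tuple[str, str]]:
    """Hypothetical stand-in: format each input and encode each output."""
    encoded = []
    for text, output in examples:
        # Leave the input untouched when no formatter is supplied.
        formatted = format_input(text) if format_input is not None else text
        encoded.append((formatted, encode_output(output)))
    return encoded


pairs = encode_examples_sketch(
    [("Alice is 30.", {"name": "Alice", "age": 30})],
    encode_output=json.dumps,  # stands in for an encoder's encode method
    format_input=lambda text: f"TEXT: {text}",
)
```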
- kor.encoders.encode.format_text(text: str, input_formatter: Union[Literal['text_prefix'], Literal['triple_quotes'], None, Callable[[str], str]] = None) -> str
An encoder for the input text.
- Parameters
text – the text to encode
input_formatter – the formatter to use for the input:
* None: no formatting; use for single sentences or single paragraphs
* triple_quotes: surround the input with """; use for long text
* text_prefix: same as triple_quotes, but with a `TEXT: ` prefix
* Callable: a user-provided function
- Returns
The formatted text, or the original text unchanged when no formatter is applied.
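The formatter options described above could behave roughly as follows. This is a minimal sketch under stated assumptions: `format_text_sketch` is a hypothetical function, and the exact strings it produces (newlines around the text, the casing of the prefix) are guesses for illustration, not the library's guaranteed output.

```python
from typing import Callable, Union

InputFormatter = Union[str, Callable[[str], str], None]


def format_text_sketch(text: str, input_formatter: InputFormatter = None) -> str:
    """Hypothetical re-implementation of the documented formatter options."""
    if input_formatter is None:
        # No formatting: suitable for single sentences or paragraphs.
        return text
    if callable(input_formatter):
        # User-provided function.
        return input_formatter(text)
    if input_formatter == "triple_quotes":
        return f'"""\n{text}\n"""'
    if input_formatter == "text_prefix":
        return f'TEXT: """\n{text}\n"""'
    raise ValueError(f"Unknown input formatter: {input_formatter!r}")


plain = format_text_sketch("A short sentence.")
quoted = format_text_sketch("A long document ...", "triple_quotes")
custom = format_text_sketch("shout", str.upper)
```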
- kor.encoders.encode.initialize_encoder(encoder_or_encoder_class: Union[Type[kor.encoders.typedefs.Encoder], kor.encoders.typedefs.Encoder, str], schema: kor.nodes.AbstractSchemaNode, **kwargs: Any) -> kor.encoders.typedefs.Encoder
Flexible way to initialize an encoder, used only for top level API.
- Parameters
encoder_or_encoder_class – Either an encoder instance, an encoder class or a string representing the encoder class.
schema – The schema to use for the encoder.
**kwargs – Keyword arguments to pass to the encoder class.
- Returns
An encoder instance
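Accepting an instance, a class, or a string in one argument is a common dispatch pattern. The sketch below shows one way it might work. `EchoEncoder`, `REGISTRY`, and `initialize_encoder_sketch` are hypothetical names invented for this example; the strings Kor accepts and the way it treats extra keyword arguments may differ.

```python
from typing import Any, Dict, Type, Union


class EchoEncoder:
    """Hypothetical encoder used only for this illustration."""

    def __init__(self, schema: Any = None, **kwargs: Any) -> None:
        self.schema = schema
        self.options = kwargs


# Hypothetical name-to-class registry.
REGISTRY: Dict[str, Type[EchoEncoder]] = {"echo": EchoEncoder}


def initialize_encoder_sketch(
    encoder_or_class: Union[str, Type[EchoEncoder], EchoEncoder],
    schema: Any,
    **kwargs: Any,
) -> EchoEncoder:
    """Return an encoder instance from a name, a class, or an instance."""
    if isinstance(encoder_or_class, str):
        try:
            encoder_class = REGISTRY[encoder_or_class.lower()]
        except KeyError:
            raise ValueError(f"Unknown encoder name: {encoder_or_class!r}")
        return encoder_class(schema, **kwargs)
    if isinstance(encoder_or_class, type):
        return encoder_or_class(schema, **kwargs)
    if kwargs:
        # Assumption: keyword arguments make no sense for a ready-made instance.
        raise ValueError("kwargs can only be used with a class or a name")
    return encoder_or_class


by_name = initialize_encoder_sketch("echo", schema="my-schema", use_tags=True)
by_class = initialize_encoder_sketch(EchoEncoder, schema="my-schema")
existing = EchoEncoder("my-schema")
by_instance = initialize_encoder_sketch(existing, schema="my-schema")
```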
kor.encoders.json_data module
JSON encoder and decoder.
- class kor.encoders.json_data.JSONEncoder(use_tags: bool = True, ensure_ascii: bool = False)
Bases:
kor.encoders.typedefs.Encoder
JSON encoder and decoder.
By default, the encoder wraps the JSON output in <json> tags. The tags help identify the JSON content within the LLM response and extract it.
The usage of <json> tags is similar to the usage of ```JSON and ``` marks.
Examples
    from kor import JSONEncoder

    json_encoder = JSONEncoder(use_tags=True)
    data = {"name": "Café"}
    json_encoder.encode(data)
    # '<json>{"name": "Café"}</json>'

    json_encoder = JSONEncoder(use_tags=True, ensure_ascii=True)
    data = {"name": "Café"}
    json_encoder.encode(data)
    # '<json>{"name": "Caf\u00e9"}</json>'
- decode(text: str) -> Any
Decode the text as JSON.
If the encoder is using tags, the <json> content is identified within the text and then is decoded.
- Parameters
text – the text to be decoded
- Returns
The decoded JSON data.
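Locating the tagged content and decoding it can be sketched with `re` and `json`. This is an illustration of the idea rather than the library's code: `decode_tagged_json` and `JSON_TAG_PATTERN` are hypothetical names, and raising `ValueError` when no tag is found is an assumption about error handling.

```python
import json
import re
from typing import Any

# Non-greedy match so that only the first <json>...</json> block is captured.
JSON_TAG_PATTERN = re.compile(r"<json>(.*?)</json>", re.DOTALL)


def decode_tagged_json(text: str) -> Any:
    """Extract the first <json> block from the text and parse it."""
    match = JSON_TAG_PATTERN.search(text)
    if match is None:
        raise ValueError("No <json>...</json> block found in the text")
    return json.loads(match.group(1))


response = 'Sure, here it is: <json>{"name": "Café"}</json> Anything else?'
decoded = decode_tagged_json(response)
```

Wrapping the payload in tags lets the decoder ignore any chatter the model adds before or after the JSON.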
kor.encoders.typedefs module
Type-definitions for encoders.
This file only contains the interface for encoders.
Added a pre-built format instruction segment.
May remove it at some point later or modify it if we discover that there are many ways of phrasing the format instructions.
- class kor.encoders.typedefs.Encoder
Bases:
abc.ABC
Abstract interface for an encoder.
The encoder is responsible for encoding and decoding the Output portion of examples provided to the LLM.
It must implement a method called get_instruction_segment that contains instructions for the LLM on how to format its output.
- class kor.encoders.typedefs.SchemaBasedEncoder(node: kor.nodes.AbstractSchemaNode, **kwargs: Any)
Bases:
kor.encoders.typedefs.Encoder, abc.ABC
Abstract interface for an encoder that has the data schema.
Inherit from this encoder if the encoder needs to know the schema of the data that’s being encoded.
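The relationship between the two abstract interfaces can be sketched as below. All class names here are hypothetical stand-ins, and the schema is simplified to a plain list of field names instead of an `AbstractSchemaNode`. The toy `PipeEncoder` exists only to show why an encoder might need the schema.

```python
import abc
from typing import Any, Dict, List


class EncoderSketch(abc.ABC):
    """Stand-in for the abstract encoder interface."""

    @abc.abstractmethod
    def encode(self, data: Any) -> str:
        """Encode the output portion of an example."""

    @abc.abstractmethod
    def decode(self, text: str) -> Any:
        """Decode text produced in this encoder's format."""

    @abc.abstractmethod
    def get_instruction_segment(self) -> str:
        """Tell the LLM how to format its output."""


class SchemaBasedEncoderSketch(EncoderSketch, abc.ABC):
    """Stand-in for an encoder that needs to know the data schema."""

    def __init__(self, node: List[str], **kwargs: Any) -> None:
        # Assumption: the schema is reduced to an ordered list of field names.
        self.node = node


class PipeEncoder(SchemaBasedEncoderSketch):
    """Toy encoder: joins field values with '|' in schema order."""

    def encode(self, data: Dict[str, Any]) -> str:
        return "|".join(str(data[field]) for field in self.node)

    def decode(self, text: str) -> Dict[str, str]:
        return dict(zip(self.node, text.split("|")))

    def get_instruction_segment(self) -> str:
        fields = ", ".join(self.node)
        return f"Output the values of {fields} separated by '|'."


encoder = PipeEncoder(["name", "age"])
encoded = encoder.encode({"name": "Ada", "age": 36})
decoded = encoder.decode(encoded)
```

Without the schema, the toy decoder could not map positional values back to field names.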
kor.encoders.utils module
kor.encoders.xml module
- class kor.encoders.xml.XMLEncoder
Bases:
kor.encoders.typedefs.Encoder
Experimental XML encoder to encode and decode data.
Warning
This encoder is not recommended for usage, at least not without further benchmarking for your use-case.
The decoder re-interprets all data types as lists, which makes validating and using parser results more involved. It’s unclear whether the encoder offers any advantages over other encoders (e.g., JSON or CSV).
The encoder would encode the following dictionary
{
    "color": ["red", "blue"],
    "height": ["6.1"],
    "width": ["3"],
}
As:
<color>red</color><height>6.1</height><width>3</width><color>blue</color>
A tag may be repeated multiple times to represent multiple list elements.
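The repeated-tag representation, and the reason every decoded value ends up as a list, can be sketched as follows. The function names are hypothetical and this is not the library's implementation: the sketch does no XML escaping and emits tags grouped by key, so its output order differs from the example above.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Matches <tag>value</tag>; the backreference requires matching open/close tags.
TAG_PATTERN = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def encode_repeated_tags(data: Dict[str, List[str]]) -> str:
    """Emit one <tag>value</tag> element per list item."""
    return "".join(
        f"<{tag}>{value}</{tag}>" for tag, values in data.items() for value in values
    )


def decode_repeated_tags(text: str) -> Dict[str, List[str]]:
    """Collect values per tag; every value becomes a list, even single ones."""
    result: Dict[str, List[str]] = defaultdict(list)
    for tag, value in TAG_PATTERN.findall(text):
        result[tag].append(value)
    return dict(result)


data = {"color": ["red", "blue"], "height": ["6.1"], "width": ["3"]}
encoded = encode_repeated_tags(data)
decoded = decode_repeated_tags(
    "<color>red</color><height>6.1</height><width>3</width><color>blue</color>"
)
```

Because a decoder of this kind cannot tell a single value from a one-element list, callers have to unwrap and validate the results themselves.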
Module contents
Declare public interface for encoders.
An encoder follows the Encoder interface.
It can encode, decode and contains instructions about the encoding format for an LLM.
- class kor.encoders.CSVEncoder(node: kor.nodes.AbstractSchemaNode, use_tags: bool = False)
Bases:
kor.encoders.typedefs.SchemaBasedEncoder
CSV encoder.
- class kor.encoders.Encoder
Bases:
abc.ABC
Abstract interface for an encoder.
The encoder is responsible for encoding and decoding the Output portion of examples provided to the LLM.
It must implement a method called get_instruction_segment that contains instructions for the LLM on how to format its output.
- class kor.encoders.JSONEncoder(use_tags: bool = True, ensure_ascii: bool = False)
Bases:
kor.encoders.typedefs.Encoder
JSON encoder and decoder.
By default, the encoder wraps the JSON output in <json> tags. The tags help identify the JSON content within the LLM response and extract it.
The usage of <json> tags is similar to the usage of ```JSON and ``` marks.
Examples
    from kor import JSONEncoder

    json_encoder = JSONEncoder(use_tags=True)
    data = {"name": "Café"}
    json_encoder.encode(data)
    # '<json>{"name": "Café"}</json>'

    json_encoder = JSONEncoder(use_tags=True, ensure_ascii=True)
    data = {"name": "Café"}
    json_encoder.encode(data)
    # '<json>{"name": "Caf\u00e9"}</json>'
- decode(text: str) -> Any
Decode the text as JSON.
If the encoder is using tags, the <json> content is identified within the text and then is decoded.
- Parameters
text – the text to be decoded
- Returns
The decoded JSON data.
- class kor.encoders.SchemaBasedEncoder(node: kor.nodes.AbstractSchemaNode, **kwargs: Any)
Bases:
kor.encoders.typedefs.Encoder, abc.ABC
Abstract interface for an encoder that has the data schema.
Inherit from this encoder if the encoder needs to know the schema of the data that’s being encoded.
- class kor.encoders.XMLEncoder
Bases:
kor.encoders.typedefs.Encoder
Experimental XML encoder to encode and decode data.
Warning
This encoder is not recommended for usage, at least not without further benchmarking for your use-case.
The decoder re-interprets all data types as lists, which makes validating and using parser results more involved. It’s unclear whether the encoder offers any advantages over other encoders (e.g., JSON or CSV).
The encoder would encode the following dictionary
{
    "color": ["red", "blue"],
    "height": ["6.1"],
    "width": ["3"],
}
As:
<color>red</color><height>6.1</height><width>3</width><color>blue</color>
A tag may be repeated multiple times to represent multiple list elements.
- kor.encoders.encode_examples(examples: Sequence[Tuple[str, str]], encoder: kor.encoders.typedefs.Encoder, input_formatter: Union[Literal['text_prefix'], Literal['triple_quotes'], None, Callable[[str], str]] = None) -> List[Tuple[str, str]]
Encode the output using the given encoder.
- kor.encoders.initialize_encoder(encoder_or_encoder_class: Union[Type[kor.encoders.typedefs.Encoder], kor.encoders.typedefs.Encoder, str], schema: kor.nodes.AbstractSchemaNode, **kwargs: Any) -> kor.encoders.typedefs.Encoder
Flexible way to initialize an encoder, used only for top level API.
- Parameters
encoder_or_encoder_class – Either an encoder instance, an encoder class or a string representing the encoder class.
schema – The schema to use for the encoder.
**kwargs – Keyword arguments to pass to the encoder class.
- Returns
An encoder instance