kor.documents package
Submodules
kor.documents.html module
Load and chunk HTML documents, with optional preprocessing to clean the HTML.
- class kor.documents.html.MarkdownifyHTMLProcessor(tags_to_remove: Tuple[str, ...] = ('svg', 'img', 'script', 'style'))
Bases: kor.documents.typedefs.AbstractDocumentProcessor
A preprocessor to clean HTML and convert to markdown using markdownify.
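To make the role of `tags_to_remove` concrete, here is a minimal standard-library sketch of the cleaning step only. It is not kor's implementation: `TagStripper` and `strip_tags` are hypothetical names introduced for illustration, and the markdown conversion that markdownify performs afterwards is not reproduced here.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

# Void elements have no closing tag, so removing one never starts a skipped region.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class TagStripper(HTMLParser):
    """Hypothetical helper: re-emits HTML while dropping selected elements."""

    def __init__(self, tags_to_remove: Tuple[str, ...]) -> None:
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.tags_to_remove = set(tags_to_remove)
        self.parts: List[str] = []
        self._skip_tag: Optional[str] = None  # element currently being dropped
        self._depth = 0  # nesting depth of that same element

    def handle_starttag(self, tag, attrs):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._depth += 1
            return
        if tag in self.tags_to_remove:
            if tag not in VOID_TAGS:
                self._skip_tag, self._depth = tag, 1
            return
        self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <img/> or <path/>.
        if self._skip_tag is None and tag not in self.tags_to_remove:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self._skip_tag is not None:
            if tag == self._skip_tag:
                self._depth -= 1
                if self._depth == 0:
                    self._skip_tag = None
            return
        if tag not in self.tags_to_remove:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self._skip_tag is None:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self._skip_tag is None:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self._skip_tag is None:
            self.parts.append(f"&#{name};")


def strip_tags(
    html: str,
    tags_to_remove: Tuple[str, ...] = ("svg", "img", "script", "style"),
) -> str:
    """Return html with the listed elements and their contents removed."""
    parser = TagStripper(tags_to_remove)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


sample = (
    '<div><script>alert(1)</script><p>Hello <b>world</b></p>'
    '<img src="a.png"><style>p{}</style></div>'
)
cleaned = strip_tags(sample)
print(cleaned)
```

The default tuple mirrors the signature documented above. Dropping `svg`, `img`, `script`, and `style` before conversion presumably keeps non-textual or non-content markup out of the resulting markdown.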