PyMuPDF
Documentation
- Class name:
LLMPyMuPDFReader
- Category:
SALT/Language Toolkit/Readers
- Output node:
False
The LLMPyMuPDFReader node reads a PDF file and converts it into the llama_index Document format, using the PyMuPDF library to extract the text. When the metadata input is enabled, metadata is extracted along with the text, so the result can be passed directly to other llama_index-based nodes for analysis or processing.
Input types
Required
path
- The file path of the PDF document to read. The node raises a FileNotFoundError if no file exists at this path.
- Comfy dtype:
STRING
- Python dtype:
str
metadata
- A boolean flag that controls whether metadata is extracted from the PDF document along with its text. Defaults to True.
- Comfy dtype:
COMBO[BOOLEAN]
- Python dtype:
bool
Optional
extra_info
- A JSON-formatted string of extra information that is parsed and passed to the reader together with the file path. Defaults to an empty JSON object ({}).
- Comfy dtype:
STRING
- Python dtype:
str
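The extra_info string is expected to hold a JSON object. The node passes it to a read_extra_info helper whose implementation is not shown on this page, so the sketch below is only one plausible way such a helper could behave, not the toolkit's actual code. The name parse_extra_info and the fallback to an empty dict are assumptions made for illustration.

```python
import json
from typing import Any, Dict


def parse_extra_info(raw: str) -> Dict[str, Any]:
    """Hypothetical stand-in for the toolkit's read_extra_info helper.

    Assumption: blank input, invalid JSON, or JSON that is not an
    object all yield an empty dict instead of raising.
    """
    if not raw or not raw.strip():
        return {}
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return {}
    # Only a JSON object maps cleanly onto a metadata dict.
    return parsed if isinstance(parsed, dict) else {}


if __name__ == "__main__":
    print(parse_extra_info('{"project": "demo", "year": 2024}'))
    print(parse_extra_info("not json"))
```

Whether invalid input should be silently ignored or reported as an error is a design choice; the real helper may well raise instead.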
Output types
documents
- The documents loaded from the PDF file, containing the extracted text and, when metadata is enabled, the associated metadata, ready for use with other llama_index-based nodes.
- Comfy dtype:
DOCUMENT
- Python dtype:
Document
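To make the effect of the metadata flag concrete, the sketch below mimics how per-page metadata might be assembled. It uses plain dictionaries instead of llama_index Document objects so that it runs without any dependencies. The function name build_page_metadata and the keys total_pages, file_path, and source are assumptions about the upstream reader's behavior; check the source linked in the node's docstring before relying on them.

```python
from typing import Any, Dict, List, Optional


def build_page_metadata(
    file_path: str,
    page_count: int,
    metadata: bool,
    extra_info: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Return one metadata dict per page (illustrative only)."""
    base = dict(extra_info or {})  # copy so the caller's dict is untouched
    if not metadata:
        # Flag off: every page carries only the user-supplied extra_info.
        return [dict(base) for _ in range(page_count)]
    # Flag on (assumed keys): add document-level fields, then a per-page label.
    base["total_pages"] = page_count
    base["file_path"] = file_path
    return [dict(base, source=str(n)) for n in range(1, page_count + 1)]


if __name__ == "__main__":
    for page_meta in build_page_metadata("report.pdf", 2, True, {"project": "demo"}):
        print(page_meta)
```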
Usage tips
- Infra type:
CPU
- Common nodes: unknown
Source code
class LLMPyMuPDFReader(PyMuPDFReader):
    """
    @NOTE: Reads PDF files into a llama_index Document using Pymu
    @Source: https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/readers/llama-index-readers-file/llama_index/readers/file/pymu_pdf/base.py
    @Documentation: https://docs.llamaindex.ai/en/latest/api_reference/readers/file/#llama_index.readers.file.PyMuPDFReader
    """
    def __init__(self):
        super().__init__()

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "path": ("STRING", {"default": ""}),
                # Combo widget offering False/True, defaulting to True
                "metadata": ([False, True], {"default": True}),
            },
            "optional": {
                # JSON string; parsed by read_extra_info() in execute()
                "extra_info": ("STRING", {"multiline": True, "dynamicPrompts": False, "default": "{}"}),
            }
        }

    RETURN_TYPES = ("DOCUMENT", )
    RETURN_NAMES = ("documents",)
    FUNCTION = "execute"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Readers"

    def execute(self, path:str, metadata:bool, extra_info:str):
        # Note: the return value of get_full_path() is not used here
        get_full_path(1, path)
        if not os.path.exists(path):
            raise FileNotFoundError(f"No file available at: {path}")
        path = Path(path)
        extra_info = read_extra_info(extra_info)
        # Delegates to load_data(), inherited from PyMuPDFReader
        data = self.load_data(path, metadata, extra_info)
        # ComfyUI expects node outputs as a tuple
        return (data, )
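For readers who want to reproduce the node's behavior outside ComfyUI, the following is a minimal sketch of the same flow: validate the path, then delegate to llama_index's PyMuPDFReader. The function name load_pdf_documents is hypothetical, and it checks the path with Path.is_file() instead of the node's os.path.exists(), which is a slightly stricter variation. It assumes the llama-index-readers-file package and PyMuPDF are installed.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


def load_pdf_documents(
    path: str,
    metadata: bool = True,
    extra_info: Optional[Dict[str, Any]] = None,
) -> List[Any]:
    """Validate the path, then delegate to llama_index's PyMuPDFReader."""
    pdf_path = Path(path)
    if not pdf_path.is_file():
        raise FileNotFoundError(f"No file available at: {pdf_path}")

    # Imported lazily so the path check works even without the package installed.
    from llama_index.readers.file import PyMuPDFReader

    reader = PyMuPDFReader()
    # Same positional order as the node's call: path, metadata, extra_info.
    return reader.load_data(pdf_path, metadata, extra_info or {})
```

A call such as load_pdf_documents("report.pdf", extra_info={"project": "demo"}), where report.pdf is a placeholder file name, would return whatever the reader's load_data produces for that file.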