∞ Chat Engine

Documentation

  • Class name: LLMChatBot
  • Category: SALT/Language Toolkit/Querying
  • Output node: False

The LLMChatBot node facilitates interactive chat sessions with large language models (LLMs). It builds a chat engine over the supplied documents (or a placeholder document when none are given), generates a response to the user's prompt within the provided context, and maintains a token-bounded chat history across invocations so that follow-up prompts keep their conversational context.

Input types

Required

  • llm_model
    • A dictionary specifying the large language model to be used for the chat session. It bundles the model instance with configuration such as the model name, maximum token budget, and embedding model, all of which drive response generation (see the sketch after this list).
    • Comfy dtype: LLM_MODEL
    • Python dtype: Dict[str, Any]
  • llm_context
    • The LLM context used when building the chat engine over the documents; it configures the service environment and influences how responses are generated by the model.
    • Comfy dtype: LLM_CONTEXT
    • Python dtype: Dict[str, Any]
  • prompt
    • The user's input message to which the chatbot will respond. This is a key component in initiating the dialogue flow.
    • Comfy dtype: STRING
    • Python dtype: str
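
The exact contents of the LLM_MODEL dictionary are produced by the toolkit's loader nodes. A minimal sketch of the fields this node actually reads (llm, llm_name, max_tokens) is shown below; the field names are taken from the source code further down, while the concrete values are only illustrative assumptions.

llm_model = {
    "llm": None,                  # llama_index LLM instance (placeholder here); accessed as llm_model["llm"]
    "llm_name": "gpt-3.5-turbo",  # used to look up a tiktoken encoding
    "max_tokens": 4096,           # token budget before chat history is pruned
    "embed_model": None,          # embedding model, if the loader provides one
}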

Optional

  • reset_engine
    • A flag to reset the chat engine, clearing any existing context or history to start a fresh conversation.
    • Comfy dtype: BOOLEAN
    • Python dtype: bool
  • user_nickname
    • An optional nickname for the user, used to label the user's messages in the chat transcript.
    • Comfy dtype: STRING
    • Python dtype: str
  • system_nickname
    • An optional nickname for the system or chatbot, used to label its messages in the chat transcript.
    • Comfy dtype: STRING
    • Python dtype: str
  • char_per_token
    • The assumed average number of characters per token, used by the fallback tokenizer to estimate token counts when no tiktoken encoding is available for the model (see the sketch after this list).
    • Comfy dtype: INT
    • Python dtype: int
  • documents
    • An optional list of documents that can be used to provide additional context or information for generating responses. Each document is structured to include text and potentially additional metadata.
    • Comfy dtype: DOCUMENT
    • Python dtype: List[Dict[str, Any]]
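
When no tiktoken encoding can be resolved for the configured model, the node falls back to a character-based tokenizer parameterised by char_per_token. The MockTokenizer itself ships with the toolkit; the function below is only a hedged approximation of that estimate, not the toolkit's implementation.

def estimate_tokens(text: str, char_per_token: int = 4) -> int:
    # Approximate the token count by assuming char_per_token characters per token.
    return max(1, len(text) // char_per_token)

# With the default of 4 characters per token, a 100-character message
# counts as roughly 25 tokens against the max_tokens budget.
print(estimate_tokens("a" * 100))  # 25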

Output types

  • chat_history
    • Comfy dtype: STRING
    • A formatted transcript of the chat session, containing both user and system messages, that preserves context for the conversation.
    • Python dtype: str
  • response
    • Comfy dtype: STRING
    • The response generated by the chat engine for the user's prompt.
    • Python dtype: str
  • chat_token_count
    • Comfy dtype: INT
    • The cumulative number of tokens in the retained chat history, useful for monitoring how close the session is to the model's max_tokens budget.
    • Python dtype: int

Usage tips

  • Infra type: GPU
  • Common nodes: unknown
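
Outside of a graph, the node can also be exercised directly by calling its chat method. The sketch below is illustrative only: the llm_model and llm_context objects are assumed to come from the toolkit's loader and context nodes, while the call signature and returned tuple follow the source code below.

bot = LLMChatBot()
chat_history, response, chat_token_count = bot.chat(
    llm_model=llm_model,      # assumed: e.g. the dictionary sketched under Input types
    llm_context=llm_context,  # assumed: produced by a context node
    prompt="Summarize the attached document in two sentences.",
    user_nickname="User",
    system_nickname="Assistant",
)

# chat_history is a formatted transcript of the form:
# [User]: Summarize the attached document in two sentences.
#
# [Assistant]: <model reply>
print(response)
print(chat_token_count)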

Source code

class LLMChatBot:
    def __init__(self):
        self.chat_history = []
        self.history = []
        self.token_map = {} 

    @classmethod
    def INPUT_TYPES(cls):
        return {
            "required": {
                "llm_model": ("LLM_MODEL", ),
                "llm_context": ("LLM_CONTEXT", ),
                "prompt": ("STRING", {"multiline": True, "dynamicPrompt": False}),
            },
            "optional": {
                "reset_engine": ("BOOLEAN", {"default": False}),
                "user_nickname": ("STRING", {"default": "User"}),
                "system_nickname": ("STRING", {"default": "Assistant"}),
                "char_per_token": ("INT", {"min": 1, "default": 4}),
                "documents": ("DOCUMENT", )
            }
        }

    RETURN_TYPES = ("STRING", "STRING", "INT")
    RETURN_NAMES = ("chat_history", "response", "chat_token_count")

    FUNCTION = "chat"
    CATEGORY = f"{MENU_NAME}/{SUB_MENU_NAME}/Querying"

    def chat(self, llm_model: Dict[str, Any], llm_context: Any, prompt: str, reset_engine: bool = False, user_nickname: str = "User", system_nickname: str = "Assistant", char_per_token: int = 4, documents: List[Document] = None) -> Tuple[str, str, int]:

        if reset_engine:
            self.chat_history.clear()
            self.history.clear()
            self.token_map.clear()

        max_tokens = llm_model.get("max_tokens", 4096)
        try:
            # Prefer an exact tiktoken encoding for the configured model
            tokenizer = tiktoken.encoding_for_model(llm_model.get('llm_name', 'gpt-3.5-turbo'))
        except Exception:
            # Fall back to a character-based tokenizer when no encoding is available
            tokenizer = MockTokenizer(max_tokens, char_per_token=char_per_token)

        if not self.chat_history:
            system_prompt = getattr(llm_model['llm'], "system_prompt", None)
            if system_prompt not in (None, ""):
                initial_msg = ChatMessage(role=MessageRole.SYSTEM, content=system_prompt)
                self.chat_history.append(initial_msg)
                self.token_map[0] = tokenizer.encode(system_prompt)

        # Tokenize each message once and count the tokens currently in the history
        cumulative_token_count = 0
        for index, message in enumerate(self.chat_history):
            if index not in self.token_map:
                self.token_map[index] = tokenizer.encode(message.content)
            cumulative_token_count += len(self.token_map[index])

        # Prune messages from the history if over max_tokens
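        # Strategy: while the history exceeds max_tokens, trim the oldest message
        # one token at a time; once a message is down to a single token, remove it
        # entirely and shift the remaining token_map keys so they stay aligned with
        # the positions in self.chat_history.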
        index = 0
        while cumulative_token_count > max_tokens and index < len(self.chat_history):
            tokens = self.token_map[index]
            token_count = len(tokens)
            if token_count > 1:
                tokens.pop(0)
                self.chat_history[index].content = tokenizer.decode(tokens)
                cumulative_token_count -= 1
            else:
                cumulative_token_count -= token_count
                self.chat_history.pop(index)
                self.token_map.pop(index)
                for old_index in list(self.token_map.keys()):
                    if old_index > index:
                        self.token_map[old_index - 1] = self.token_map.pop(old_index)
                continue
            index += 1

        history_string = ""
        reply_string = ""

        # Rebuild the transcript of prior turns. Each entry in self.history is a
        # dict whose keys are the user nickname, the system nickname and
        # "timestamp" (in insertion order), so unpacking the dict yields those keys.
        for history in self.history:
            user, assistant, timestamp = history
            history_string += f"""[{user_nickname}]: {history[user]}

[{system_nickname}]: {history[assistant]}

"""
        # Spoof a placeholder document when none are supplied so the index and
        # chat engine can still be built
        input_documents = [Document(text="null", extra_info={})]
        if documents is not None:
            input_documents = documents

        index = VectorStoreIndex.from_documents(
            input_documents, 
            service_context=llm_context,
            transformations=[SentenceSplitter(chunk_size=1024, chunk_overlap=20)]
        )
        chat_engine = index.as_chat_engine(chat_mode="best")

        response = chat_engine.chat(prompt, chat_history=self.chat_history)

        response_dict = {
            user_nickname: prompt, 
            system_nickname: response.response,
            "timestamp": str(time.time())
        }

        user_cm = ChatMessage(role=MessageRole.USER, content=prompt)
        system_cm = ChatMessage(role=MessageRole.ASSISTANT, content=response.response)
        self.chat_history.append(user_cm)
        self.chat_history.append(system_cm)

        self.history.append(response_dict)

        reply_string = response.response

        history_string += f"""[{user_nickname}]: {prompt}

[{system_nickname}]: {response.response}"""

        return (history_string, reply_string, cumulative_token_count)