Create Your Own Custom ChatGPT-like Chatbot in Under 5 Minutes: Step-by-Step Guide Using Your Own Data and No OpenAI API

March 16, 2023

Building a Custom Chatbot Using Python and Your Company Documents

In today's digital landscape, businesses are constantly seeking ways to enhance customer service and engagement. One effective approach is creating a chatbot that can quickly and accurately address customer queries. This article provides a step-by-step guide on building a custom chatbot using Python and powerful AI tools, utilizing your own company documents.

The Process

Step 01: Prepare Your Training Data

The first step is to gather the documents the chatbot will answer from, such as product manuals, FAQs, and other customer-facing resources. Place them in a folder named 'data' next to your script, in a format the document loader used in Step 09 can read; plain text files are the safest choice. These documents are not used to retrain the model. They are indexed and passed to the model as context at query time.
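Before moving on, it can help to confirm what is actually in the folder. The following is a minimal sketch using only the standard library; the helper name summarize_data_folder is hypothetical and not part of any library used in this guide. It looks at top-level files only.

```python
from collections import Counter
from pathlib import Path


def summarize_data_folder(folder: str = "data") -> Counter:
    """Count the top-level files in `folder` by lowercase extension."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder is reported as "no files" rather than an error.
        return Counter()
    return Counter(
        p.suffix.lower() or "<none>"  # files without an extension
        for p in root.iterdir()
        if p.is_file()
    )


if __name__ == "__main__":
    for ext, count in sorted(summarize_data_folder().items()):
        print(f"{ext}: {count}")
```

An unexpected extension in the output is a hint that a file may need converting before it can be loaded.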

Step 02: Install the Required Libraries

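To build the chatbot, you'll need several Python libraries for natural language processing and machine learning, including PyTorch, which the code imports directly. The code in this guide uses the older llama_index interface, with classes such as GPTListIndex and ServiceContext imported from the top-level package. Later releases reorganized the package, so if the imports in Step 03 fail, install an older release (the 0.6 series is a reasonable target). Use the following commands to install the libraries with pip:

$ pip install llama_index
$ pip install transformers
$ pip install langchain
$ pip install torch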
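Because library interfaces change between releases, it can be useful to record which versions were actually installed. This is a small sketch built on the standard library's importlib.metadata; the function name installed_versions is made up for this example.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["llama-index", "transformers", "langchain", "torch"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```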

Step 03: Import Libraries and Modules

To start the chatbot creation process, import the necessary libraries and modules. These imports will be used throughout the process. Use the following code snippet to import the required libraries:

import torch
from langchain.llms.base import LLM
from llama_index import SimpleDirectoryReader, GPTListIndex, PromptHelper
from llama_index import LLMPredictor, ServiceContext, QuestionAnswerPrompt
from transformers import pipeline
from typing import Optional, List, Mapping, Any

Step 04: Define Prompt Variables

Next, define the parameters that control how prompts are sized. These variables set the maximum input size the model accepts (in tokens), the number of tokens reserved for the output, and the maximum overlap between adjacent chunks. Use the following code snippet to define these variables:

max_input_size = 2048
num_output = 256
max_chunk_overlap = 20
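To see how these numbers interact, here is a rough worked example of the token budget. It is a simplification of the bookkeeping a prompt helper performs, not the library's actual code, and the 40-token figure for the prompt template is an assumed value chosen for illustration.

```python
max_input_size = 2048
num_output = 256


def context_budget(max_input_size: int, num_output: int, prompt_tokens: int) -> int:
    """Tokens left for document context after reserving output and template space."""
    budget = max_input_size - num_output - prompt_tokens
    if budget <= 0:
        raise ValueError("No room left for context; lower num_output or shorten the prompt.")
    return budget


# Assumption: the question-answer template and question take about 40 tokens.
print(context_budget(max_input_size, num_output, prompt_tokens=40))  # 1752
```

With these settings, roughly 1,750 tokens of document text can accompany each question.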

Step 05: Define and Use the Prompt Helper

The PromptHelper class assists in handling prompts and chunking long documents. Initialize the prompt helper by passing the previously defined prompt variables. Use the following code snippet to create the prompt helper:

prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
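The idea of chunking with overlap can be sketched independently of the library. The function below is an illustrative stand-in, not PromptHelper's implementation: it splits a token list into fixed-size windows where each window repeats the last few tokens of the previous one, so that a sentence cut at a boundary still appears whole in at least one chunk.

```python
from typing import List, Sequence


def chunk_with_overlap(tokens: Sequence[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split `tokens` into windows of `chunk_size` that share `overlap` tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")
    step = chunk_size - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # the last window already reaches the end of the input
    return chunks


words = "a b c d e f g h i j".split()
for chunk in chunk_with_overlap(words, chunk_size=4, overlap=1):
    print(chunk)
```

Each printed chunk starts with the final token of the chunk before it.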

Step 06: Create a Custom Language Model (LLM)

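To generate responses, download and load a pre-trained language model. The following code defines a custom LLM class that wraps the "facebook/opt-iml-max-30b" model from Hugging Face in a text-generation pipeline. The pipeline is placed on "cuda:0", so a CUDA-capable GPU is required, and at bfloat16 precision (2 bytes per parameter) the 30-billion-parameter weights alone take roughly 60 GB of memory. If your hardware is smaller, substitute a smaller checkpoint from the same family, such as "facebook/opt-iml-max-1.3b":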

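class CustomLLM(LLM):
    # Hugging Face model ID; swap in a smaller checkpoint if GPU memory is limited.
    model_name = "facebook/opt-iml-max-30b"
    # Load the model once, when the class is defined, onto the first CUDA GPU in bfloat16.
    pipeline = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype": torch.bfloat16})

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        # The pipeline returns the prompt followed by the completion.
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # Return only the newly generated text.
        return response[prompt_length:]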

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
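The slicing in _call is easier to follow with a stand-in for the model. By default, a text-generation pipeline returns the prompt followed by the completion, so the prompt has to be cut off the front. The sketch below uses a hypothetical fake_pipeline function in place of the real model so it runs without a GPU or a download.

```python
from typing import Dict, List


def fake_pipeline(prompt: str, max_new_tokens: int = 256) -> List[Dict[str, str]]:
    """Stand-in for a text-generation pipeline: echoes the prompt, then a completion."""
    return [{"generated_text": prompt + " Paris."}]


def complete(prompt: str) -> str:
    """Mirror the _call logic: run the pipeline and strip the echoed prompt."""
    generated = fake_pipeline(prompt, max_new_tokens=256)[0]["generated_text"]
    return generated[len(prompt):]


print(repr(complete("The capital of France is")))  # ' Paris.'
```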

Step 07: Initialize the Language Model and Service Context

Once the custom LLM is defined, initialize it and create a service context. The service context encapsulates the necessary components for the chatbot, including the LLM and prompt helper. The following code initializes the LLM and service context:

llm_predictor = LLMPredictor(llm=CustomLLM())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)

Step 08: Define the Question-Answer Prompt Template

To structure the interaction with the chatbot, define a template for the question-answer prompt. This template includes placeholders for the context information and the user's question. The code snippet below illustrates the template definition:

QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)

QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
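To preview what the model will actually receive, the template string can be filled in by hand with str.format. The library performs this substitution itself at query time; the context sentence and question below are invented sample values.

```python
QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)

# Sample values, made up for this preview.
filled = QA_PROMPT_TMPL.format(
    context_str="Our support line is open from 9am to 5pm on weekdays.",
    query_str="When is support open?",
)
print(filled)
```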

Step 09: Load Training Data

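To make the chatbot knowledgeable about your company, first load your company documents into memory. SimpleDirectoryReader reads the files in the given directory and returns them as a list of document objects, which the next step turns into an index. The code snippet below loads the data from the 'data' folder: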

documents = SimpleDirectoryReader('./data').load_data()

Step 10: Generate the Index

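Once the documents are loaded, generate an index using the GPTListIndex class. A list index splits the documents into chunks and stores them as a sequential list. At query time it passes over the chunks in order to build an answer, which is simple and works well for a small document set but becomes slow as the collection grows. Here's the code to generate the index: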

index = GPTListIndex.from_documents(documents, service_context=service_context)
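As a mental model only, a list index can be pictured as an ordered list of chunks that is walked from start to end, with the answer carried along and updated at each step. The toy below is not llama_index's implementation: ToyListIndex and keyword_answer are hypothetical names, and a crude keyword match stands in for the language model call.

```python
from typing import Callable, List, Optional

AnswerFn = Callable[[str, str, Optional[str]], Optional[str]]


class ToyListIndex:
    """Keeps text chunks in order and visits every one of them on each query."""

    def __init__(self, chunks: List[str]) -> None:
        self.chunks = list(chunks)

    def query(self, question: str, answer_fn: AnswerFn) -> Optional[str]:
        answer: Optional[str] = None
        for chunk in self.chunks:
            # Each step sees the question, the current chunk, and the answer so far.
            answer = answer_fn(question, chunk, answer)
        return answer


def keyword_answer(question: str, chunk: str, previous: Optional[str]) -> Optional[str]:
    """Stand-in for an LLM call: keep the first chunk sharing a word with the question."""
    if previous is not None:
        return previous
    shared = set(question.lower().split()) & set(chunk.lower().split())
    return chunk if shared else None


index = ToyListIndex(["Returns are accepted within 30 days.", "Shipping takes 5 days."])
print(index.query("how long does shipping take", keyword_answer))
```

Because every chunk is visited, the cost of a query grows with the number of chunks.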

Step 11: Save and Load the Index

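To avoid re-indexing the documents every time the chatbot is restarted, you can persist the index to disk and load it later. Pass the service context again when loading so that the reloaded index keeps using the custom LLM instead of the library's default model. Here's how you can save and load the index:

from llama_index import StorageContext, load_index_from_storage

index.storage_context.persist(persist_dir='./storage')
storage_context = StorageContext.from_defaults(persist_dir='./storage')
index = load_index_from_storage(storage_context, service_context=service_context)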
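In practice, saving and loading are often wrapped in a "load if present, otherwise build and save" helper so that the expensive step runs only once. The sketch below shows that pattern with a plain dictionary and JSON standing in for the index; load_or_build is a hypothetical helper, and the real index would use the library's own save and load calls instead.

```python
import json
from pathlib import Path
from typing import Callable


def load_or_build(path: str, build: Callable[[], dict]) -> dict:
    """Return the cached object at `path`, building and saving it on first use."""
    cache = Path(path)
    if cache.exists():
        return json.loads(cache.read_text(encoding="utf-8"))
    data = build()  # the expensive step, run only when no cache exists
    cache.write_text(json.dumps(data), encoding="utf-8")
    return data
```

The first call writes the cache file; later calls read it back without invoking build again.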

Step 12: Query the Chatbot and Get a Response

Finally, interact with the chatbot by querying it with user input. The chatbot will process the query and provide a response based on the indexed company documents. The prompt template from Step 08 is supplied when the query engine is created, and the query call itself takes only the question. The code snippet below demonstrates querying the chatbot and printing the response:

query_engine = index.as_query_engine(text_qa_template=QA_PROMPT)
response = query_engine.query("Hello, what is your function?")
print(response)
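A single query can be extended into a simple conversation loop. The sketch below keeps the loop separate from the model by accepting any function that maps a question to an answer; the name chat and the "exit"/"quit" convention are choices made for this example.

```python
from typing import Callable, Iterable, List


def chat(ask: Callable[[str], str], questions: Iterable[str]) -> List[str]:
    """Answer each question in turn until the input ends or the user types exit/quit."""
    answers: List[str] = []
    for question in questions:
        question = question.strip()
        if not question:
            continue  # ignore blank lines
        if question.lower() in {"exit", "quit"}:
            break
        answers.append(ask(question))
    return answers


# Demonstration with a stand-in answer function instead of the real model.
print(chat(lambda q: f"You asked: {q}", ["What is your function?", "quit"]))
```

With the objects from this guide, ask could be something like lambda q: str(query_engine.query(q)), and questions could come from lines typed at the terminal.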

Conclusion

Congratulations! You have created a chatbot that answers user queries from your own company documents, without calling the OpenAI API. The quality of its answers depends on the model you load and on how well your documents cover the questions being asked, so test it on real customer queries before relying on it for customer service.

Remember, the provided code serves as a starting point, and you can customize and expand it according to your specific requirements. Building a chatbot is an iterative process, so feel free to experiment, improve functionality based on user feedback, and incorporate additional data.

Happy coding!

Nexus.