Create Your Own Custom ChatGPT-like Chatbot in Under 5 Minutes: Step-by-Step Guide Using Your Own Data and No OpenAI API

Building a Custom Chatbot Using Python and Your Company Documents
In today's digital landscape, businesses are constantly seeking ways to enhance customer service and engagement. One effective approach is a chatbot that can quickly and accurately address customer queries. This article provides a step-by-step guide to building a custom chatbot in Python with llama_index, LangChain, and Hugging Face Transformers, using your own company documents as the knowledge source.
The Process
Step 01: Prepare Your Training Data
The first step is to gather the documents you want the chatbot to draw on. These may include product manuals, FAQs, and other resources relevant to your customers. Place them in a folder named 'data' next to your script, in a format the loader can read; plain-text files such as .txt or Markdown are the simplest choice.
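Before moving on, it can help to confirm what is actually sitting in the folder. The snippet below is a minimal sketch using only the standard library; the function name list_text_documents and the set of extensions are assumptions made for illustration, not part of any library used in this guide.

```python
from pathlib import Path
from typing import List

# Hypothetical set of plain-text extensions; adjust to the formats you use.
TEXT_EXTENSIONS = {".txt", ".md"}

def list_text_documents(folder: str = "data") -> List[str]:
    """Return the sorted names of plain-text files directly inside `folder`."""
    root = Path(folder)
    if not root.is_dir():
        # A missing folder simply yields an empty list.
        return []
    return sorted(
        p.name for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in TEXT_EXTENSIONS
    )

print(list_text_documents("data"))
```

An empty list here usually means the folder is missing, misnamed, or holds only formats outside the extension set.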
Step 02: Install the Required Libraries
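To build the chatbot, you'll need Python libraries for indexing documents and running language models. Install them with pip; torch is included because the code imports it directly. The code in this guide uses early llama_index class names such as GPTListIndex, LLMPredictor, and ServiceContext, so it may not run unchanged on current releases of the library:
$ pip install llama_index
$ pip install transformers
$ pip install langchain
$ pip install torch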
Step 03: Import Libraries and Modules
Start by importing the libraries and modules used throughout the rest of the guide:
import torch
from langchain.llms.base import LLM
from llama_index import SimpleDirectoryReader, GPTListIndex, PromptHelper
from llama_index import LLMPredictor, ServiceContext, QuestionAnswerPrompt
from transformers import pipeline
from typing import Optional, List, Mapping, Any
Step 04: Define Prompt Variables
Next, define the size limits that control how prompts are assembled. These variables set the maximum input size, the number of tokens reserved for the output, and the maximum overlap between consecutive chunks, all measured in tokens. Use the following code snippet to define these variables:
max_input_size = 2048
num_output = 256
max_chunk_overlap = 20
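To see what these limits imply, here is a small worked example of the token budget for a single call to the model. The prompt_overhead figure is a made-up assumption standing in for the tokens used by the prompt template and the question; the real value depends on your template and tokenizer.

```python
max_input_size = 2048   # total tokens the model accepts per call
num_output = 256        # tokens reserved for the generated answer
max_chunk_overlap = 20  # tokens shared between consecutive chunks

# Hypothetical figure: tokens used by the prompt template plus the question.
prompt_overhead = 50

# Tokens left over for document context in one call.
context_budget = max_input_size - num_output - prompt_overhead
print(context_budget)  # 1742
```

Raising num_output leaves less room for context, so longer answers come at the cost of less source material per call.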
Step 05: Define and Use the Prompt Helper
The PromptHelper class assists in handling prompts and chunking long documents. Initialize the prompt helper by passing the previously defined prompt variables. Use the following code snippet to create the prompt helper:
prompt_helper = PromptHelper(max_input_size, num_output, max_chunk_overlap)
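To make the idea of chunk overlap concrete, the sketch below splits a token list into fixed-size chunks that share a few tokens at each boundary. It is a simplified illustration, not PromptHelper's actual implementation; the function name chunk_with_overlap is hypothetical, and it splits on whitespace-separated words rather than model tokens.

```python
from typing import List

def chunk_with_overlap(tokens: List[str], chunk_size: int, overlap: int) -> List[List[str]]:
    """Split `tokens` into chunks of up to `chunk_size`, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks

words = "a b c d e f g h i j".split()
chunks = chunk_with_overlap(words, chunk_size=4, overlap=1)
for chunk in chunks:
    print(chunk)
```

With an overlap of 1, the last word of each chunk reappears as the first word of the next, so a sentence that straddles a boundary is not cut off from its context.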
Step 06: Create a Custom Language Model (LLM)
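To generate responses, download and load a pre-trained language model. The following code defines a custom LLM class that wraps the "facebook/opt-iml-max-30b" model from Hugging Face. A 30-billion-parameter model stored in bfloat16 needs roughly 60 GB of GPU memory for its weights alone, so substitute a smaller checkpoint if your hardware cannot hold it: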
class CustomLLM(LLM):
    model_name = "facebook/opt-iml-max-30b"
    # Load the model once, when the class is defined, onto the first GPU
    pipeline = pipeline("text-generation", model=model_name, device="cuda:0", model_kwargs={"torch_dtype": torch.bfloat16})

    def _call(self, prompt: str, stop: Optional[List[str]] = None) -> str:
        prompt_length = len(prompt)
        response = self.pipeline(prompt, max_new_tokens=num_output)[0]["generated_text"]
        # The generated text starts with the prompt; return only the new part
        return response[prompt_length:]

    @property
    def _identifying_params(self) -> Mapping[str, Any]:
        return {"name_of_model": self.model_name}

    @property
    def _llm_type(self) -> str:
        return "custom"
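The slice in _call deserves a closer look: a text-generation pipeline returns the prompt followed by the completion, so the prompt has to be cut off the front. The sketch below shows the same logic with a stand-in generator so it runs without a GPU; fake_text_generation is a hypothetical stub that only mimics the shape of the pipeline's output.

```python
from typing import Dict, List

def fake_text_generation(prompt: str) -> List[Dict[str, str]]:
    """Hypothetical stand-in for a text-generation pipeline: echoes the prompt, then a completion."""
    return [{"generated_text": prompt + " Paris."}]

def completion_only(prompt: str) -> str:
    """Return just the newly generated text by slicing off the echoed prompt."""
    full_text = fake_text_generation(prompt)[0]["generated_text"]
    return full_text[len(prompt):]

answer = completion_only("The capital of France is")
print(repr(answer))  # ' Paris.'
```

The Hugging Face text-generation pipeline also accepts return_full_text=False, which should make the manual slice unnecessary if you prefer that route.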
Step 07: Initialize the Language Model and Service Context
Once the custom LLM is defined, initialize it and create a service context. The service context encapsulates the necessary components for the chatbot, including the LLM and prompt helper. The following code initializes the LLM and service context:
llm_predictor = LLMPredictor(llm=CustomLLM())
service_context = ServiceContext.from_defaults(llm_predictor=llm_predictor, prompt_helper=prompt_helper)
Step 08: Define the Question-Answer Prompt Template
To structure the interaction with the chatbot, define a template for the question-answer prompt. This template includes placeholders for the context information and the user's question. The code snippet below illustrates the template definition:
QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)
QA_PROMPT = QuestionAnswerPrompt(QA_PROMPT_TMPL)
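To see what the model will actually receive, you can fill the template by hand with Python's str.format. The context and question below are invented for illustration; in the real pipeline the library fills both placeholders for you.

```python
QA_PROMPT_TMPL = (
    "We have provided context information below. \n"
    "---------------------\n"
    "{context_str}"
    "\n---------------------\n"
    "Given this information, please answer the question: {query_str}\n"
)

# Hypothetical context and question, for illustration only.
filled = QA_PROMPT_TMPL.format(
    context_str="Our support desk is open Monday to Friday.",
    query_str="When is support available?",
)
print(filled)
```

Printing the filled prompt is a quick way to check wording and spacing before spending GPU time on it.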
Step 09: Load Training Data
To make the chatbot knowledgeable about your company, load your company documents into the chatbot's index. The code snippet below demonstrates loading data from a specified directory:
documents = SimpleDirectoryReader('./data').load_data()
Step 10: Generate the Index
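Once the documents are loaded, generate an index using the GPTListIndex class. A list index stores the document chunks as a simple sequence and hands them to the LLM at query time, which suits small document sets; for larger collections, an embedding-based index may retrieve relevant chunks more selectively. Here's the code to generate the index: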
index = GPTListIndex.from_documents(documents, service_context=service_context)
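As a rough mental model, a query over a list index can be pictured as a walk through every chunk in order, with the model refining its answer at each stop. The sketch below is conceptual only and is not llama_index's implementation; answer_over_list and counting_llm are hypothetical names, and the stub model just counts how often it is called.

```python
from typing import Callable, List

def answer_over_list(chunks: List[str], question: str, llm: Callable[[str], str]) -> str:
    """Visit every chunk in order, asking the LLM to refine the running answer."""
    answer = ""
    for chunk in chunks:
        prompt = (
            f"Context: {chunk}\n"
            f"Existing answer: {answer}\n"
            f"Question: {question}\n"
        )
        answer = llm(prompt)
    return answer

calls: List[str] = []

def counting_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model: records the prompt, returns a canned answer.
    calls.append(prompt)
    return f"answer after {len(calls)} chunk(s)"

result = answer_over_list(["chunk one", "chunk two", "chunk three"], "What is this?", counting_llm)
print(result)  # answer after 3 chunk(s)
```

The point to take away is that the number of model calls grows with the number of chunks, which is why this style of index is best kept for small document sets.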
Step 11: Save and Load the Index
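To avoid re-indexing the documents every time the chatbot is restarted, you can save the index to disk and load it later. In llama_index releases that provide as_query_engine (used in the final step), the index is persisted through its storage context rather than the older save_to_disk and load_from_disk methods. Pass the service context when loading so the index keeps using the custom LLM instead of the library default. The directory name './storage' is only an example:
from llama_index import StorageContext, load_index_from_storage
index.storage_context.persist(persist_dir='./storage')
index = load_index_from_storage(StorageContext.from_defaults(persist_dir='./storage'), service_context=service_context)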
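The save-then-load idea is easiest to use as a "load if present, otherwise build" pattern. Here is a minimal sketch of that pattern with a plain JSON file standing in for the index; load_or_build and the build callback are hypothetical names, and the dictionary is a placeholder for whatever your index library actually stores.

```python
import json
from pathlib import Path
from typing import Callable, Dict

def load_or_build(path: str, build: Callable[[], Dict]) -> Dict:
    """Return cached data from `path` if it exists; otherwise build it and cache it."""
    cache = Path(path)
    if cache.exists():
        # Cache hit: skip the expensive build step.
        return json.loads(cache.read_text(encoding="utf-8"))
    data = build()
    cache.write_text(json.dumps(data), encoding="utf-8")
    return data
```

Calling load_or_build twice with the same path runs the build function only once; delete the file whenever the source documents change so the cache is rebuilt.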
Step 12: Query the Chatbot and Get a Response
Finally, interact with the chatbot by querying it with user input. The chatbot will process the query and provide a response based on the indexed company documents. The code snippet below demonstrates querying the chatbot and printing the response:
# The prompt template is supplied when the query engine is built, not per query
query_engine = index.as_query_engine(text_qa_template=QA_PROMPT)
response = query_engine.query("Hello, what is your function?")
print(response)
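A single query is enough to test the setup, but a chatbot usually answers questions in a loop. The sketch below shows one way to structure that loop so it can be tested without a model; chat and echo_engine are hypothetical names, and echo_engine is a stub standing in for the real query engine.

```python
from typing import Callable, Iterable, List

def chat(questions: Iterable[str], ask: Callable[[str], str]) -> List[str]:
    """Answer each question in turn, stopping at 'quit' or 'exit' and skipping blank input."""
    answers = []
    for question in questions:
        text = question.strip()
        if text.lower() in {"quit", "exit"}:
            break
        if not text:
            continue
        answers.append(ask(text))
    return answers

def echo_engine(question: str) -> str:
    # Hypothetical stand-in; a real bot would call the query engine here.
    return f"You asked: {question}"

transcript = chat(["What is your function?", "", "quit", "ignored"], echo_engine)
print(transcript)  # ['You asked: What is your function?']
```

To connect this to the index built earlier, you could pass something like lambda q: str(query_engine.query(q)) as the ask argument and feed the loop from user input.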
Conclusion
Congratulations! You have built a chatbot that answers user queries from your own company documents, running on a locally hosted model rather than the OpenAI API.
Remember, the provided code serves as a starting point, and you can customize and expand it according to your specific requirements. Building a chatbot is an iterative process, so feel free to experiment, improve functionality based on user feedback, and incorporate additional data.
Happy coding!