Chatbots are one of the common solutions used for automation. They are used in many areas, such as fintech, e-commerce, and government. Most chatbots use natural language processing methods, and in many cases machine learning methods as well. This article is an introduction to chatbots and can be helpful for anyone getting started with them. In the first section, we show the differences between types of chatbots. The second section gives an overview of the available tools. In the third section, the recommended tools are used to propose and explain a pipeline for building an intelligent chatbot. Finally, we give a few recommendations and show the upcoming trends in chatbot development.

Chatbot taxonomy

Chatbots can be divided into groups according to criteria such as complexity, usage, or privacy. The most popular taxonomy divides chatbots into three groups:

  • rule-based chatbots,
  • retrieval-based chatbots,
  • generative-based chatbots.

Rule-based chatbots

Rule-based chatbots rely on a list of questions and corresponding answers. This can be a loose list of questions or a simple scenario in which the user is asked questions one by one until the chatbot gets all the information needed to return a valuable response. An example is a flight booking chatbot that asks about the departure date, the destination, and the departure location. After the first response, the chatbot can ask for more details to narrow down the best fit, such as the number of stops or the time of departure or arrival. This approach is simple: we don't need any machine learning methods, and in most cases no natural language processing is needed either. We can rely on regular expressions or, in more advanced cases, on string distance metrics, which give surprisingly good results in many cases.
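The idea can be illustrated with a minimal sketch that uses only the Python standard library. The rules, the FAQ entries, the `reply` function, and the 0.75 threshold are all hypothetical and chosen only for illustration. The similarity ratio from `difflib.SequenceMatcher` stands in here for a string distance metric; other metrics, such as the Levenshtein distance, could be used in the same place.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rules for a flight-booking bot: pattern -> answer template.
RULES = [
    (re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
     "Hello! Where would you like to fly?"),
    (re.compile(r"\bfrom\s+(\w+)\s+to\s+(\w+)\b", re.IGNORECASE),
     "Searching flights from {0} to {1}."),
]

# Hypothetical FAQ used as a fallback when no regular expression matches.
FAQ = {
    "what is the baggage allowance": "One cabin bag and one checked bag.",
    "how can i cancel my booking": "You can cancel in the 'My trips' section.",
}


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio in [0, 1] computed by difflib."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def reply(message: str, threshold: float = 0.75) -> str:
    # 1. Try the regular-expression rules first.
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    # 2. Fall back to the most similar FAQ question, which tolerates typos.
    normalized = message.strip().rstrip("?!.")
    question = max(FAQ, key=lambda q: similarity(normalized, q))
    if similarity(normalized, question) >= threshold:
        return FAQ[question]
    return "Sorry, I did not understand that."


print(reply("I want to fly from Krakow to Berlin"))
print(reply("How can I cancell my boking?"))
```

The second call contains two typos, but the message is still close enough to a known question to be answered. In a real bot the threshold would have to be tuned on actual user messages.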

 

Figure 1. Word2vec example in three-dimensional space

Retrieval-based chatbots

Retrieval-based chatbots rely on machine learning and word vectorization. Words are vectorized because machine learning methods operate on numbers. Several methods can be used for vectorization. The result places each word in a feature space, where it is represented by a vector that usually has many dimensions. In SpaCy, the vector for the most popular model has a size of 384. It can be reduced to a two- or three-dimensional space with t-SNE or similar methods. An example of a few words in a three-dimensional space is given in figure 1. Once words are vectors, we can easily apply mathematical operations to them. The vectors can then be used to train a model. In the case of retrieval-based chatbots, we train an intent prediction model. This means that we are able to discover the user's intent and, based on that, build conversation scenarios. When it comes to machine learning methods, no single method always gives the best results, so we should try several and compare them.
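To show what mathematical operations on word vectors can look like, here is a small sketch with made-up three-dimensional vectors. The values are invented for illustration and do not come from SpaCy or any trained model, and the helper names `cosine` and `nearest` are hypothetical. Real vectors have hundreds of dimensions, but the operations are the same.

```python
import math

# Made-up three-dimensional vectors, for illustration only.
VECTORS = {
    "king": [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man": [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "flight": [0.0, 0.5, 0.5],
}


def cosine(a, b):
    """Cosine similarity: 1.0 means both vectors point in the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(vector, exclude=()):
    """Return the word whose vector is most similar to the given vector."""
    candidates = (w for w in VECTORS if w not in exclude)
    return max(candidates, key=lambda w: cosine(VECTORS[w], vector))


# Vector arithmetic: king - man + woman
target = [k - m + w for k, m, w in
          zip(VECTORS["king"], VECTORS["man"], VECTORS["woman"])]
print(nearest(target, exclude={"king", "man", "woman"}))  # prints "queen"
```

With these toy values the result lands on "queen" by construction. With vectors from a trained model the nearest word is found the same way, but the match is only approximate.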

Generative-based chatbots

Generative-based chatbots are the most complex of the three approaches. They are more generic, but training requires a much bigger data set. This is a limitation for many companies, which cannot afford to collect a large enough set of training examples. Generative-based chatbots use deep learning methods for training. If done properly, they give impressive results.

Many companies, especially in fintech, are highly regulated or otherwise not allowed to share information outside the company. In such a case, we cannot use tools like Dialogflow or any other tool that we cannot set up locally. From this perspective, chatbots can be divided into two groups:

  • online,
  • on-premise.

In other words, this divides chatbots into those that rely only on open-source tools that can be deployed locally and those that can be based on any kind of tool. Chatbots can also be divided by role or domain. There are two simple types:

  • superbots,
  • domain-driven chatbots.

Few superbots are available to everyone. Google Assistant and Siri are good examples of superbots that cover many domains. Most companies focus on domain-driven chatbots that cover just one domain.

Available tools for chatbots

Tools for chatbots can be divided into three types:

  • NLU/NLP related tools that can help to understand the text,
  • libraries and frameworks for building chatbots,
  • API or online solutions that allow using tools that cannot be deployed locally.

The first group includes classic libraries such as NLTK and CoreNLP. Both are well known to anyone who has worked with NLP before. Currently, tools like SpaCy, TextBlob, and Gensim are more popular. SpaCy in particular is trending and has become the new first choice for NLP tasks.

The second group consists of libraries that can be used to build chatbots directly, such as Rasa. Rasa consists of a few subprojects that, combined, make it possible to build a retrieval-based chatbot with less effort. It is open source and provides a few predefined pipelines, which makes it easier to train a model for intent prediction.

The last group is probably the biggest one and consists of all online tools that make it possible to do basic NLP/NLU analysis, use more advanced methods, and train a chatbot. The most popular tools are delivered by Google, Microsoft, Facebook, and Amazon. Some analyse sentiment, while others make it possible to build chatbot scenarios or to build new skills for an already existing bot like Alexa. Dialogflow is a solution very similar to Rasa, but it is not open source. We can build scenarios similar to those in Rasa and feed it with examples for intent recognition. This can be done through a convenient web interface.

Building intelligent chatbots

Currently, most chatbots are retrieval-based. The goal here is to recognize which intent from the built scenarios the user's message expresses. Intent recognition is just a supervised classifier that distinguishes between many classes (intents). The training phase needs a labelled data set with intents and example user inputs. The data goes through the pipeline, and the final result is a model that is able to predict the intent. The topic of the NLP pipeline will be covered in a separate article. Finally, we get results as given in figure 2.

Figure 2. Intent recognition example
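To make the idea of intent recognition as a supervised classifier concrete, here is a toy sketch in plain Python. It is not the Rasa pipeline: it swaps in a hand-written multinomial Naive Bayes classifier over bag-of-words counts, and the training examples, the intent names, and the `train` and `predict` helpers are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical labelled data set: (example user input, intent).
TRAINING_DATA = [
    ("hello there", "greet"),
    ("hi good morning", "greet"),
    ("hey hello", "greet"),
    ("book a flight to berlin", "book_flight"),
    ("i need a flight ticket", "book_flight"),
    ("find me a flight for tomorrow", "book_flight"),
    ("bye see you later", "goodbye"),
    ("goodbye and thanks", "goodbye"),
]


def tokenize(text):
    return text.lower().split()


def train(examples):
    """Count word occurrences per intent (multinomial Naive Bayes)."""
    word_counts = defaultdict(Counter)
    intent_counts = Counter()
    for text, intent in examples:
        intent_counts[intent] += 1
        word_counts[intent].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, intent_counts, vocabulary


def predict(text, model):
    """Return (intent, probability) pairs, most likely intent first."""
    word_counts, intent_counts, vocabulary = model
    total_examples = sum(intent_counts.values())
    log_scores = {}
    for intent, n_examples in intent_counts.items():
        score = math.log(n_examples / total_examples)  # prior
        total_words = sum(word_counts[intent].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen during training
            # Laplace (add-one) smoothing avoids zero probabilities.
            count = word_counts[intent][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        log_scores[intent] = score
    # Normalise the log scores into probabilities that sum to one.
    best = max(log_scores.values())
    exp_scores = {i: math.exp(s - best) for i, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return sorted(((i, e / norm) for i, e in exp_scores.items()),
                  key=lambda pair: pair[1], reverse=True)


model = train(TRAINING_DATA)
for intent, probability in predict("i want a flight", model):
    print(f"{intent}: {probability:.2f}")
```

The output is a ranked list of intents with confidence values, similar in spirit to the result shown in figure 2. With so few examples the probabilities are not meaningful; a real chatbot needs many examples per intent and usually a stronger model.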

A separate part of the training is to train the scenarios, where each intent is followed by several utterances, that is, the chatbot responses. Usually, there are many scenarios that can be complementary or even overlap one another. To manage this properly, a state machine is usually used, in which the chatbot changes its state after each action is taken.

Figure 3. State machine used for Rasa chatbot conversation

The action is invoked by the intent. What makes the chatbot intelligent is that it can give a different answer to the same intent or question, depending on the current state. In other words, the chatbot remembers what it did earlier. An example is shown in figure 3.
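The following sketch shows this behaviour with a hand-written state machine. It does not reproduce how Rasa manages conversations internally; the states, intents, responses, and the `Conversation` class are all hypothetical and only illustrate how the same intent can lead to different answers.

```python
# Hypothetical transition table for a flight-booking conversation:
# (current state, intent) -> (next state, chatbot response).
TRANSITIONS = {
    ("start", "greet"): ("start", "Hello! How can I help you?"),
    ("start", "book_flight"): ("ask_destination", "Where would you like to fly?"),
    ("ask_destination", "inform"): ("ask_date", "When would you like to depart?"),
    ("ask_date", "inform"): ("confirm", "Shall I book this flight?"),
    ("confirm", "affirm"): ("start", "Your flight is booked."),
    ("confirm", "deny"): ("start", "Booking cancelled."),
}


class Conversation:
    """Keeps the current state, so the bot remembers what it did earlier."""

    def __init__(self):
        self.state = "start"

    def handle(self, intent):
        key = (self.state, intent)
        if key not in TRANSITIONS:
            # Unexpected intent: stay in the current state.
            return "Sorry, I did not expect that."
        self.state, response = TRANSITIONS[key]
        return response


conversation = Conversation()
for intent in ["book_flight", "inform", "inform", "affirm"]:
    print(f"{intent:12} -> {conversation.handle(intent)}")
```

The intent `inform` appears twice in a row, but the response differs because the state changed in between: first the bot asks for the date, then it asks for confirmation.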

Current trends

For a few years now, deep learning methods have been used more and more often, including for chatbots. New neural network architectures have been introduced that can have an important impact on chatbot development. One such example is the attention network. Most papers at the NIPS and ACL conferences are not directly about chatbots, but rather about NLP methods that can potentially be used for chatbots. One popular topic recently discussed at such conferences is sentiment analysis, so we can expect improvements in this area for chatbots as well.

Upcoming trainings

We are proud to announce our upcoming online trainings on the Safari platform. There are two trainings about chatbots, each scheduled for three hours and each consisting of three exercises. The first one is on building intelligent chatbots in Python and will be held on March 7th. In the second one, we go through a sentiment analysis pipeline and show how to build your own method of sentiment analysis.



 


Karol Przystalski is CTO and founder of Codete. He obtained a Ph.D in Computer Science from the Institute of Fundamental Technological Research, Polish Academy of Sciences, and was a research assistant at Jagiellonian University in Cracow. His role at Codete is focused on leading and mentoring teams.