Applying an n-gram-like model, a specific architecture for NLP-related problems, surprisingly seems to be a good starting point for modeling user behaviour. As described in the previous survey of methods for data prefetching, some researchers achieved an accuracy of up to 60%, using it on raw HTTP server log data. In this article we would like to propose an extended n-gram model, adapted to the problem of predicting users' actions.

Adapting n-grams to modeling the behaviour

N-gram-like architectures are mostly used in the field of NLP and operate on sequences of words, letters or even whole sentences, depending on the chosen level of language modeling. A simple adaptation of such models to the problem of behaviour modeling could take both the HTTP method and the URL that the request was sent to. Using this approach, we are able to convert the server logs into sequences of actions without much effort.


As an example, consider a fragment of a NASA web server log. The sequence of actions performed by the user with the hostname “” looks as follows:

  1. GET /history/apollo/as-201/as-201-info.html
  2. GET /history/apollo/as-201/sounds/
  3. GET /icons/blank.xbm
  4. GET /icons/menu.xbm
  5. GET /history/apollo/as-201/
  6. GET /icons/text.xbm
  7. GET /icons/image.xbm

Assuming that we use a 3-gram model, there would be 5 different subsequences of actions:

  1. GET /history/apollo/as-201/as-201-info.html
    GET /history/apollo/as-201/sounds/
    GET /icons/blank.xbm
  2. GET /history/apollo/as-201/sounds/
    GET /icons/blank.xbm
    GET /icons/menu.xbm
  3. GET /icons/blank.xbm
    GET /icons/menu.xbm
    GET /history/apollo/as-201/
  4. GET /icons/menu.xbm
    GET /history/apollo/as-201/
    GET /icons/text.xbm
  5. GET /history/apollo/as-201/
    GET /icons/text.xbm
    GET /icons/image.xbm
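The sliding-window extraction above can be sketched in a few lines of Python, using the action sequence from the log:

```python
def ngrams(actions, n):
    """Slide a window of length n over the action sequence."""
    return [actions[i:i + n] for i in range(len(actions) - n + 1)]

actions = [
    "GET /history/apollo/as-201/as-201-info.html",
    "GET /history/apollo/as-201/sounds/",
    "GET /icons/blank.xbm",
    "GET /icons/menu.xbm",
    "GET /history/apollo/as-201/",
    "GET /icons/text.xbm",
    "GET /icons/image.xbm",
]

# 7 actions and n = 3 give the 5 subsequences listed above.
for gram in ngrams(actions, 3):
    print(gram)
```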

REST API structure

Nowadays, in the era of single-page applications, the majority of websites are built on top of some API. Typically, a REST HTTP API is used, with a well-structured data format like JSON or XML. This makes it possible to create different client applications, each one using the same backend for retrieving and storing the data. Such an architecture allows the business logic to be enclosed in one service, or even in a bunch of microservices, with a whole ecosystem of web and mobile applications built on top of these services.

A REST API delivers a collection of endpoints to the end users, which allow them to retrieve different resources or perform some actions on them. By an endpoint we mean an HTTP method together with a, possibly parameterized, URL. A well-designed REST API uses HTTP methods according to their semantics. From the perspective of a typical application, the following ones seem to be important:

  • GET – to retrieve some resource
  • HEAD – same as GET, but only the headers are returned
  • POST – to create a new resource
  • PUT – to update existing resource
  • PATCH – a partial update of a resource
  • DELETE – to remove an existing resource

Since we would like to predict users' actions and execute them before they are even requested, we should focus on two HTTP methods only: GET and HEAD, which should never change the state of the application data. It could be dangerous to create, change or remove resources automatically. Let's imagine that we recognized a typical behaviour of a particular user who has always ordered a product after checking its reviews. It might have happened a couple of times before, but placing an order automatically the next time a similar sequence occurs is obviously not a good idea. Therefore, we will assume that only “safe” methods can be performed without a direct request coming from the user.

Generally speaking, an HTTP request is built from the request line, the headers and, optionally, a body. The request line contains a method, a path pointing to the resource, and the HTTP version supported by the client application. For the methods that we will consider, the body will never appear. In the case of REST APIs, the path often looks like the following (a hypothetical example):

GET /user/{user_id}/order/{order_id}/product/{product_id}

In this example, there are three parameters that should be filled in by the application before sending the actual request. Assuming that we operate on the API using some structured data format, we can perform a similar process for the response as well. JSON, probably the most frequently used such format, is a good example:

Response path   Values
id              [123]
name            [“Dummy resource”]
tags.[]         [“foo”, “bar”]

In order to retrieve all the values returned in the response, we flatten the document and store it as a key-value map. The key is a path determining how to obtain the value from the document, while the value is actually a collection of values found under this path.
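The flattening can be sketched in Python (a minimal version, assuming the `.[]` suffix marks array elements, as in the table above):

```python
import json

def flatten(doc, prefix=""):
    """Flatten a JSON document into a map of paths to lists of values."""
    result = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            for k, v in flatten(value, path).items():
                result.setdefault(k, []).extend(v)
    elif isinstance(doc, list):
        for item in doc:
            for k, v in flatten(item, prefix + ".[]").items():
                result.setdefault(k, []).extend(v)
    else:
        result[prefix] = [doc]
    return result

response = json.loads('{"id": 123, "name": "Dummy resource", "tags": ["foo", "bar"]}')
print(flatten(response))
# {'id': [123], 'name': ['Dummy resource'], 'tags.[]': ['foo', 'bar']}
```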

These parameters, for requests, and values, for responses, will be called tokens, and the process of retrieving them is called tokenization.

A proposal of n-gram architecture extension

Existing solutions try to predict the whole URLs that a particular user may request in the near future. In our opinion, this is not a perfect solution in the case of a well-structured application built on top of the REST architecture. Every single endpoint defines a specific action performed on some kind of resource, and it seems reasonable to assume that some sequences of actions may be repeated many times, but with changed token values. That's why we propose to predict the flow of endpoints used one by one first, and only then try to fill in the tokens. Such an approach may help us to generalize and find the patterns of actions, not the patterns of actions performed on particular resources.

Mapping the URLs to endpoints is quite easy, as most frameworks deliver a possibility to do that. We will perform n-gram modeling on endpoints, not on URLs. However, to prepare a request, we still need to fill in the request tokens somehow. Here we assume that there can be four different situations:

  • The endpoint does not have any tokens to be filled.
  • Some tokens may be filled with the values coming from the requests or responses of previous actions.
  • Values of tokens may be filled based on external knowledge (i.e. directly by the user).
  • In order to fill in the values of the tokens, some extra processing of the possessed data may be done by the user – there is indeed a relation, but it is not a direct one.

We will focus on the first three cases only. In order to find an indirect connection between two actions, we would have to discover the algorithm used to create the token values.
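The mapping of raw requests onto endpoints, together with the extraction of request tokens, can be sketched with a simple regex-based matcher (the endpoint patterns below are illustrative, taken from the e-commerce example later in this article):

```python
import re

def compile_endpoint(method, pattern):
    """Turn an endpoint like 'GET /product/{product_id}' into a matcher."""
    parts = re.split(r"(\{\w+\})", pattern)
    regex = "".join(
        f"(?P<{p[1:-1]}>[^/?&]+)" if p.startswith("{") else re.escape(p)
        for p in parts
    )
    return method, re.compile("^" + regex + "$"), pattern

ENDPOINTS = [
    compile_endpoint("GET", "/user"),
    compile_endpoint("GET", "/product/{product_id}"),
    compile_endpoint("GET", "/product?category={category_id}"),
]

def match_request(method, url):
    """Return (endpoint pattern, token map) for a raw request, or None."""
    for ep_method, regex, pattern in ENDPOINTS:
        if ep_method == method:
            m = regex.match(url)
            if m:
                return pattern, m.groupdict()
    return None

print(match_request("GET", "/product/1"))
# ('/product/{product_id}', {'product_id': '1'})
```

In a real application one would rather reuse the routing tables of the web framework instead of a hand-rolled matcher, but the result is the same: an endpoint identifier plus a map of request tokens.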

User model creation

In the learning phase, we collect each request together with the matching endpoint and the response. All these elements form a session of the user. Whenever a new entry appears in the session and the length of the session is greater than or equal to n, we update the parameters of the model. As mentioned before, we perform standard n-gram modeling on endpoints, but along with finding the relations between actions. The last request in each n-gram window is tokenized, and its tokens are then compared with the token values in the preceding requests and responses. Matching pairs of tokens are saved together with the n-gram statistics. This covers the second case, when there is a dependency between the received data and the sent request. Independently, we collect the statistics of the token values for the last endpoint in the n-gram window. Such statistics are helpful in the third case – when an external source of knowledge is used.
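A minimal sketch of this update step could look as follows (all the structure and field names here are our own assumptions: each session entry is assumed to carry its endpoint, its request tokens, and its flattened response tokens):

```python
from collections import defaultdict

N = 3  # n-gram order

# (endpoint_1, ..., endpoint_{n-1}) -> {next_endpoint: count}
ngram_counts = defaultdict(lambda: defaultdict(int))
# (context, endpoint, token_name) -> {(window_offset, response_path): count}
relations = defaultdict(lambda: defaultdict(int))
# (context, endpoint, token_name) -> {value: count}
value_stats = defaultdict(lambda: defaultdict(int))

def update_model(session):
    """Update the model parameters with the latest n entries of a session."""
    if len(session) < N:
        return
    window = session[-N:]
    context = tuple(entry["endpoint"] for entry in window[:-1])
    last = window[-1]
    ngram_counts[context][last["endpoint"]] += 1
    # Compare the tokens of the last request with the token values in the
    # preceding requests and responses of the window.
    for name, value in last["request_tokens"].items():
        key = (context, last["endpoint"], name)
        value_stats[key][value] += 1
        for offset, entry in enumerate(window[:-1]):
            for src_path, src_values in entry["response_tokens"].items():
                if value in src_values:
                    relations[key][(offset, src_path)] += 1
```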

Model evaluation

The prediction process starts after receiving an HTTP request, which is appended to the session. The last n - 1 entries are then used to find the most probable endpoint that will be used next. This is done with standard n-gram prediction. In the next step we try to create a request using the selected endpoint. At first we try to do that using the collected relations to the preceding actions. Please note that the current request has not generated a response yet, so if there is a relation between this response and the predicted endpoint, it cannot be used.

If it is not possible to fill in the tokens using these relations, the statistics of token values in this particular window are taken into account. Here we use a mixed strategy – we select the most frequent values which appeared in the user's previous interactions, but only if they exceed a threshold, which has to be adapted somehow. If the request can be generated using the chosen token values, it is processed in parallel. If using the relations is possible only when the current response is available, the processing is done after the response has been generated.

Finally, if a prediction was made and some subrequests were processed, we collect all the responses and send them back as one HTTP response, with some vendor content type. The user receives multiple responses, caches them, and can load some resources without even sending an HTTP request to the application server.
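The n-gram prediction plus the statistics-based token filling could be sketched as below (a deliberately simplified version: the relation-based filling is omitted, the data structures and the fixed threshold are our own assumptions):

```python
def predict_next(session, ngram_counts, value_stats, n=3, threshold=2):
    """Predict the most probable next endpoint and try to fill in its tokens."""
    if len(session) < n - 1:
        return None
    context = tuple(entry["endpoint"] for entry in session[-(n - 1):])
    candidates = ngram_counts.get(context)
    if not candidates:
        return None
    # Standard n-gram prediction: pick the most frequent continuation.
    endpoint = max(candidates, key=candidates.get)
    # Fill tokens from the per-window value statistics; a value is used
    # only if its count reaches the threshold (which has to be adapted).
    tokens = {}
    for (ctx, ep, name), counts in value_stats.items():
        if ctx == context and ep == endpoint:
            value, count = max(counts.items(), key=lambda kv: kv[1])
            if count >= threshold:
                tokens[name] = value
    return endpoint, tokens

# Illustrative statistics collected during the learning phase.
ngram_counts = {("GET /user", "GET /order"): {"GET /product/{product_id}": 3}}
value_stats = {
    (("GET /user", "GET /order"), "GET /product/{product_id}", "product_id"):
        {"1": 2, "2": 1},
}
session = [{"endpoint": "GET /user"}, {"endpoint": "GET /order"}]
print(predict_next(session, ngram_counts, value_stats))
# ('GET /product/{product_id}', {'product_id': '1'})
```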


A simple e-commerce site built on top of a simple REST API would probably deliver the following endpoints (intentionally, only the ones we will use in this example are listed):

  • GET /user – gets the details of the currently logged in user
  • GET /order – lists the past orders of the user
  • GET /product/{product_id} – gets the details of the product with given id
  • GET /product?category={category_id} – lists the products from the category with given id

The table shows an exemplary interaction of a particular user:

Request Response
GET /user
GET /order
GET /product/1
GET /product/2
GET /product/3
GET /product?category=A

As shown, the user gets their personal information, lists all past orders, calls for the details of some products and, finally, lists all the products from a selected category. There are also a couple of things which are interesting with regard to our previous assumptions:

  • the first action that user performs directly after calling for their personal details is to get the list of all past orders
  • after receiving a list of all past orders, user asks for the details of all products that appeared in any order
  • at the end user wants to get all the products which belong to some specific category, but its identifier does not appear in the preceding actions

All these observations actually cover the three cases mentioned before. The first one can be handled without filling in any tokens, the second one needs to take tokens from a previous response, and the last one can be fulfilled using the statistics of the user's interactions. For obvious reasons, these observations are not enough to assume that our predictions are perfect yet; however, as the training dataset grows, we could expect the accuracy to improve.
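To make the second observation concrete: the relation we would learn connects the product identifiers returned by GET /order with the product_id token of the subsequent GET /product/{product_id} requests. A hypothetical flattened GET /order response (the paths and values are illustrative, not taken from a real API) and the matching check could look like this:

```python
# Hypothetical flattened response of GET /order.
order_response_tokens = {
    "orders.[].id": [17, 18],
    "orders.[].items.[].product_id": ["1", "2", "3"],
}

# The product_id tokens of the follow-up requests GET /product/1,
# GET /product/2 and GET /product/3 ...
requested_ids = ["1", "2", "3"]

# ... all appear under a single response path, so the relation between
# that path and the product_id token would be saved with the n-gram
# statistics and reused to prefetch product details next time.
all_matched = all(
    pid in order_response_tokens["orders.[].items.[].product_id"]
    for pid in requested_ids
)
print(all_matched)  # True
```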


Software Engineer

I am a big fan of AI and applying machine learning methods in real-life problems, with an experience in web development and databases. Currently, I'm involved in Big Data projects as well as in internal research at Codete.