At Codete we combine various technologies to maximize the efficiency of the software solutions we create. The start of each project is preceded by discussions on the architecture and technology stack that best fit the project requirements. These discussions are held by our tech experts, who come from different technological backgrounds, so the proposed solutions are always tailor-made and diversified.

World Tweeting Tendencies

Overview

We are going to describe and use an architecture for efficiently processing and presenting data in real time, which we designed and implemented for one of our clients. As an example use case for this article, we have created a web application that uses Twitter[1] as a real-time data provider and Google Maps[2] as a presentation layer. We used Twitter’s geolocation feature to dynamically visualize the incoming tweets on the map and observe world tweeting tendencies in real time.
In this part of the article we will describe the architecture and some implementation details of the application.

Architecture

The main goal of the application is to process the incoming real-time data from Twitter and efficiently deliver it to web browsers, where it is visualized on the map. To handle this smoothly, an appropriate technology stack had to be adopted. Each significant part of the designed architecture is described below.

Spark

When it comes to fast processing of huge amounts of data, Apache Spark[3] is one of the most popular answers nowadays. It is an advanced, highly efficient execution framework for batch computation, machine learning and real-time stream processing. Additionally, Spark offers a ready-made integration with the Twitter Streaming API, so accessing the data published on Twitter in real time takes very little effort.

Scala

As the implementation language we could choose Scala, Java or Python, because these are the languages supported by Spark Streaming. And we chose… Scala!
Compared with Python, Scala is generally faster and fully supports Spark Streaming features, while Python support for them is still evolving[4]. Moreover, Spark itself is built in Scala, so there are no intermediate wrappers like those needed for communication between the Python environment and the JVM.
Compared with Java, on the other hand, Scala is much more concise and is a functional programming language, which is usually very convenient for data processing.

Play Framework

Play[5] is a Scala web framework. It is well supported, developer-friendly and scales well because it is built on top of Akka. These features made Play an attractive match for our needs.

WebSockets

Communication between the frontend and the backend is based on WebSockets. This approach fits our use case well: we need to push a few updates per second to the browser. A WebSocket reuses a single open connection for bidirectional communication. This means that the overhead and latency inherent in alternatives such as AJAX-based HTTP polling are significantly reduced.
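To give a feel for the difference, here is a back-of-the-envelope estimate of the per-message header overhead. The figures for HTTP header size and update rate are assumptions chosen for illustration, not measurements from the application; the WebSocket frame header size follows RFC 6455 for an unmasked server-to-client frame with a payload between 126 and 65535 bytes. The object and member names are hypothetical.

```scala
object PushOverheadEstimate {
  // Assumption: about 700 bytes of HTTP request and response headers per poll.
  val httpHeaderBytesPerPoll: Int = 700

  // RFC 6455: an unmasked server-to-client frame carrying 126 to 65535 payload
  // bytes has a 4-byte header (2 base bytes + 2 bytes of extended length).
  val webSocketHeaderBytesPerFrame: Int = 4

  // Assumption: "a few updates per second" is taken as 5.
  val updatesPerSecond: Int = 5

  // Header bytes spent in one minute for a given per-message header size.
  def overheadBytesPerMinute(headerBytesPerMessage: Int): Int =
    headerBytesPerMessage * updatesPerSecond * 60

  val pollingOverhead: Int = overheadBytesPerMinute(httpHeaderBytesPerPoll)
  val webSocketOverhead: Int = overheadBytesPerMinute(webSocketHeaderBytesPerFrame)
}
```

Under these assumptions, polling spends 210000 header bytes per minute per user, against 1200 for WebSocket frames. The exact ratio depends on the real header sizes, and the estimate ignores the extra latency of opening a new request for each poll.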

Akka

For message dispatching and asynchronous communication with the web app users we used Akka[6] actors. They are an intuitive option for WebSocket-based communication in Scala.
Since we chose Play as the web framework, we can use Akka features out of the box.

Frontend

The frontend of the application is very simple. We use HTML (within Play’s ScalaTemplates[7]), CSS and JavaScript. To visualize tweets on a map, we use the Google Maps API.

The application

Now that we have described the concept and the architecture, let us go into the details. The diagram below reflects the implementation entities and will help us describe how the application is built.

Prerequisites

Three prerequisites must be met before running the application:

  • Install the Play Framework
  • Create a Twitter App
  • Enable Google Maps JavaScript API

To create a Twitter App, visit https://apps.twitter.com/. Then make sure that you have noted down the following Twitter App properties:

  • Consumer Key
  • Consumer Secret
  • Access Token
  • Access Token Secret

Enter these values in the appropriate place in Application.scala.
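As a rough sketch of this step, the four values could be handed to Twitter4J (the client library used by Spark's Twitter integration) through its standard twitter4j.oauth.* system properties. The TwitterCredentials object and its configure method are hypothetical names introduced for illustration; the actual Application.scala may organize this differently.

```scala
// Hypothetical helper, not taken from the application's source code.
object TwitterCredentials {

  // Publishes the four Twitter App properties as the system properties
  // that Twitter4J reads when no explicit authorization object is supplied.
  def configure(consumerKey: String,
                consumerSecret: String,
                accessToken: String,
                accessTokenSecret: String): Unit = {
    System.setProperty("twitter4j.oauth.consumerKey", consumerKey)
    System.setProperty("twitter4j.oauth.consumerSecret", consumerSecret)
    System.setProperty("twitter4j.oauth.accessToken", accessToken)
    System.setProperty("twitter4j.oauth.accessTokenSecret", accessTokenSecret)
  }
}
```

Such a call would have to run before the Twitter stream is created. In a real deployment it is usually safer to read the secrets from the environment or from configuration than to hard-code them in the source.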

Enable the Google Maps JavaScript API in the Google Developer Console. Then create an API key in the Credentials tab and replace the placeholder in the index.scala.html file with this key.

Implementation details

Here, we will describe how the application works internally. You may find the source code of the application here.

First, a few seconds after application startup, the system sends a message to the SparkActor to initialize it and start data gathering.

This creates the application’s Spark StreamingContext, registers a job to be performed on the incoming tweets and starts data gathering. For each incoming batch of tweets, we filter out those that have no geolocation information attached and send the remaining tweets as a MessagePackage to the MapActor.
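The filtering step can be illustrated with a minimal, Spark-free sketch. All types below (GeoPoint, Tweet, MapMessage and the shape of MessagePackage) are simplified stand-ins invented for this example; the real application works on Twitter4J status objects inside a Spark stream, and its MessagePackage may look different.

```scala
// Simplified stand-ins for the real types; every name here is hypothetical.
final case class GeoPoint(latitude: Double, longitude: Double)
final case class Tweet(text: String, location: Option[GeoPoint])
final case class MapMessage(text: String, latitude: Double, longitude: Double)
final case class MessagePackage(messages: Seq[MapMessage])

object TweetFilter {
  // Keeps only the tweets that carry a geolocation and wraps them
  // into a single package, ready to be sent to the dispatcher.
  def toPackage(batch: Seq[Tweet]): MessagePackage =
    MessagePackage(batch.collect {
      case Tweet(text, Some(GeoPoint(lat, lon))) => MapMessage(text, lat, lon)
    })
}
```

For example, a batch containing one geotagged tweet and one tweet without a location yields a package with a single message. Using collect with a pattern match does the filtering and the conversion in one pass.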

The MapActor dispatches messages to all currently connected users. When a user first connects to the application, they are assigned a UID, which is stored in the Play Framework’s session scope (which is itself backed by cookies). Along with the UID, a corresponding UserActor instance is created for that user. The UserActor is responsible for communicating with the user over the WebSocket connection. Right after creation, each UserActor registers itself with the MapActor as a message receiver.
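The subscribe-and-dispatch flow described above can be modeled in a few lines of plain Scala. This is a deliberately simplified, single-threaded sketch without Akka: Dispatcher plays the role of the MapActor and UserChannel the role of a UserActor, and both names, like newUid, are hypothetical. In the real application these are actors exchanging messages, and the UserActor writes to a WebSocket instead of an in-memory buffer.

```scala
import java.util.UUID
import scala.collection.mutable

object DispatchSketch {
  final case class MessagePackage(messages: Seq[String])

  // Generates a fresh identifier for a newly connected user.
  def newUid(): String = UUID.randomUUID().toString

  // Stands in for a UserActor: owns the outbound channel of one user.
  final class UserChannel(val uid: String) {
    private val buffer = mutable.ArrayBuffer.empty[MessagePackage]
    def push(pkg: MessagePackage): Unit = buffer += pkg
    def received: Seq[MessagePackage] = buffer.toList
  }

  // Stands in for the MapActor: tracks subscribers and fans packages out.
  final class Dispatcher {
    private val subscribers = mutable.Map.empty[String, UserChannel]

    def subscribe(user: UserChannel): Unit = subscribers.update(user.uid, user)

    def unsubscribe(uid: String): Unit = {
      subscribers.remove(uid)
      ()
    }

    // Delivers the package to every currently subscribed user.
    def dispatch(pkg: MessagePackage): Unit =
      subscribers.values.foreach(_.push(pkg))
  }
}
```

A user who subscribes receives every package dispatched afterwards, and stops receiving them once unsubscribed. With actors the same bookkeeping needs no locks, because each actor processes one message at a time.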

The frontend part of the application is located in the views and assets packages. We will not go through this part of the code, as it is rather straightforward.

Summary

In this part of the article, we described the use case we aim to cover and the architecture we used to achieve it, and briefly went through the main implementation points of the application.
In the next part of the article we are going to focus on more eye-catching things. Namely, we are going to play with the app, see how it works and draw some conclusions.

Stay tuned!

References:

  1. https://twitter.com/
  2. https://developers.google.com/maps/
  3. http://spark.apache.org/
  4. https://www.linkedin.com/pulse/why-i-choose-scala-apache-spark-project-lan-jiang
  5. https://playframework.com/
  6. http://akka.io/
  7. https://www.playframework.com/documentation/2.5.x/ScalaTemplates
