Codete Blog

Datafication: Concept, Definition & Examples


11/08/2022 | 12 min read

Piotr Wawryka, Dawid Pacholczyk

The world we live in is highly dependent on data. Data centers all over the world are storing ever-increasing amounts of data on their massive servers. So, why are we debating datafication now?

Let's take a closer look at the essence of datafication, its implications for our current online habits, and its potential impact on the importance of data security.


Table of contents:

  1. Enter the data pool: what is datafication?
  2. Datafication for business
  3. Datafication examples
  4. Conclusion 

The concept of data

Before we get into the definition of datafication, let's first take a minute to explain the concept of data itself. In computing, data is defined as information that has been translated into a form that is efficient to move or process. In today's computers, that form is binary and digital.
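To make "binary digital form" concrete, here is a minimal Python sketch that translates a short piece of information (a text string) into bytes, then into the ones and zeros a computer stores, and back again. UTF-8 is assumed as the encoding, and the helper names `to_bits` and `from_bits` are illustrative, not taken from any library.

```python
def to_bits(text: str, encoding: str = "utf-8") -> str:
    """Translate text into bytes, then show each byte as 8 binary digits."""
    raw = text.encode(encoding)  # information -> bytes (UTF-8 is an assumption)
    return " ".join(f"{byte:08b}" for byte in raw)  # bytes -> ones and zeros


def from_bits(bits: str, encoding: str = "utf-8") -> str:
    """Reverse the process: parse each 8-digit group back into a byte, then decode."""
    raw = bytes(int(group, 2) for group in bits.split())
    return raw.decode(encoding)


print(to_bits("Hi"))             # 01001000 01101001
print(from_bits(to_bits("Hi")))  # Hi
```

Note that in UTF-8 a non-ASCII character such as "ą" takes two bytes, so it becomes two 8-digit groups.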

A piece of information can be generated by any measurable action taken by anyone using almost anything tech-related. So, you generate data when you use your email, pay with a credit card, or unlock a personal device. Your children generate data as well when they complete another level of their favorite game, check their social media feeds, or walk into a brand store with a smartphone in their pocket. Your boss generates a solid portion of data as he moves around a smart office packed with sensors, or when his car's registration plate triggers the automatic garage doors. Your phone is also streaming data: it constantly updates your location, adds it to photos, and, if you give it permission, lets other devices know where you are (learn more about beacons).

Sunshine and rain generate data as well, once sensors measure them. So do your smart home devices. A tram stopped at a red light. A water heater in your basement. Even your dog can generate data if a store or a dog park installs a scanner that can read its chip number (and connect it to your customer profile). And data is undoubtedly recorded as you both pass a security camera at a nearby ATM, and we're not talking about the visuals only.

This leads to two important observations. 

  1. To begin with, data is an extremely abstract concept that does not exist in nature. We create it by recording and processing various actions as they occur; you can think of collecting it as “snapping a photo”. A photo captures a single parameter of reality and freezes it forever. So, to observe any change, you have to take a solid handful of photos and compare them, focusing on a chosen factor. But some cameras have great lenses and true colors, while others shoot only black-and-white, high-contrast frames. This is the difference in the amount of detail collected by simple sensors vs. complex devices (such as a smartphone).
  2. Secondly, the possibilities for processing data are limited only by the capabilities of the devices assigned to the task and by our own creativity. As a matter of fact, some devices collect far more data than expected, resulting in complex big data clusters alongside a small portion of "traditional", sorted datasets. Think of them as the “backgrounds” of your picture collection, which can also be compared based on a variety of factors.
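The photo metaphor translates naturally into code: each reading freezes one parameter at one moment, and change only becomes visible once several readings are compared. The sketch below assumes a hypothetical temperature sensor; `Snapshot` and `changes` are illustrative names, not part of any library.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    """One frozen "photo": a single parameter captured at a single moment."""
    taken_at: datetime
    value: float


def changes(snapshots: list[Snapshot]) -> list[float]:
    """Order snapshots by time and return the difference between each consecutive pair."""
    ordered = sorted(snapshots, key=lambda s: s.taken_at)
    return [later.value - earlier.value for earlier, later in zip(ordered, ordered[1:])]


# Hypothetical readings from a simple temperature sensor (one parameter only),
# deliberately listed out of order to show that they are sorted by time first.
readings = [
    Snapshot(datetime(2022, 8, 11, 16, 0), 19.0),
    Snapshot(datetime(2022, 8, 11, 8, 0), 20.0),
    Snapshot(datetime(2022, 8, 11, 12, 0), 21.5),
]
print(changes(readings))  # [1.5, -2.5]
```

A richer device (the "better camera") would simply record more fields per snapshot, such as humidity or location, which multiplies the factors you can compare.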

Those massive clusters (measured in terabytes, petabytes, and exabytes) have yet to be analyzed (and then classified) using advanced machine learning algorithms distributed across a network of computers, all while new records keep arriving in real time.

Data science, which combines math, programming, domain knowledge, scientific methods, algorithms, processes, and systems to make working with data easier, promises to aid in managing such massive amounts of data.

Datafication in data science

The concept of datafication was popularized by Mayer-Schoenberger and Cukier's (2013) early descriptions of "big data", and is now used by researchers to describe how digital interactions are being turned into records that can be collected, processed, and, finally, sold.

Keep in mind that datafication is an ongoing process: it involves converting as many aspects of our lives as possible into computerized data in order to enable real-time tracking and predictive analysis. Because the process is continuous, information is collected, processed, and stored automatically in dedicated data infrastructure, which is mostly owned by corporations or governments.

Datafication is already assisting society by monitoring weather and seismic activity, improving health care, detecting fraud schemes, and tracking students' progress. And, as the number of records grows, more and more businesses are looking for new ways to turn even more aspects of human life into a continuous source of data, with a particular emphasis on social interactions.

The primary reason for this shift is that once habits and routines are converted to data, they can be monitored, analyzed, improved, and monetized. This gives businesses the opportunity to translate human behavior into practical knowledge that can influence customer actions and help adjust core business strategy. It also lets social organizations quickly identify those in need. In general, the more types of value data can generate, the more valuable it is.

What matters most is that businesses can collect a large amount of data, store it, and decide how to use it later, even if they don't use it right now. As a result, businesses can now begin collecting data on previously untraceable processes. Once that data is processed, they can become data-driven (for example, reducing the risk of introducing new products or services to the market).

Datafication vs. Digitization 

According to the original Big Data article (2013) by Mayer-Schoenberger and Cukier, 

“datafication is not the same as digitization, which takes analog content—books, films, photographs—and converts it into digital information, a sequence of ones and zeros that computers can read. Datafication is a far broader activity: taking all aspects of life and turning them into data format [...] Once we datafy things, we can transform their purpose and turn the information into new forms of value.”

So, in fact, datafication is more about the process of collecting, storing, and managing customer data from real-world actions, while digitization is the process of converting chosen media into computer-ready format. 

Going back to the picture metaphor, digitization uploads it to the server, whereas datafication provides a set of analytical tools to measure its changes over a chosen period of time. 

The controversy over datafication

There have been heated debates about how corporations and governments use datafication in certain areas to discriminate against people, especially those from lower-income or minority groups.

Aside from that, here is a list of the most frequently mentioned datafication issues:

  • Data can be accessed by anyone. The more data we collect, the more precise the information we can dig up on an individual. This is already used by law enforcement, journalists, and some companies to run background checks on a specific person, connecting them with a specific place (and time), actions, and even ideology. Sadly, the same data can be analyzed by a hacker or spammer to commit identity theft or other forms of cybercrime.
  • Data is used to monitor every activity within its reach. Massive datasets are stored (and updated daily) in multi-storey server rooms owned by tech giants (Facebook, Apple, Microsoft, Google, Amazon, Baidu, Alibaba, Xiaomi, and so on), forcing datafication on their users. The collected data is then used to personalize paid ads within the giant's apps and platforms, and the level of interference is usually regulated by law. Sadly, in some regions, governments have adopted similar monitoring methods too. In others, the law is trying to shield individual autonomy from the dangers of continuous data collection (e.g., by implementing the GDPR).
  • Data is a commodity. Platforms are a new kind of multi-sided datafication market, and its currency is data. To produce it, tech giants bring together platform users who create data, data buyers (like advertisers and data brokers) who are willing to exchange it for real money, and service providers who profit from the release, sale, and internal use of data. Unlike typical goods, datasets can be not only stolen and resold, but also used to commit cybercrimes against the users whose records are gathered in a compromised set.
  • Data is collected globally. Data surveillance is not limited to a region or a language. In other words, platform owners are now able to store information about every person on the planet who has access to the Internet. This is especially important in times of rising cybercrime, which tends to be more effective against smaller platforms.

GDPR and other anti-datafication measures, such as platform opt-outs, may affect future data collection. Because children are so heavily involved in social media, some argue that this technological trend is one of the most pressing social issues of our time.

Datafication for business

The moral issue: everyone is the same

Data collection has advanced rapidly over the last decade, and we can witness it daily: computational power has grown dramatically, AI has made major advances, and cloud computing offers vast storage capacity. We must now learn to handle data responsibly. Keep in mind that the information you store represents a piece of how today's society is organized, and the way you use it may affect that society in real time.

As a matter of fact, social science research is already focusing on this matter. 

You can create a new trend or habit, or influence a whole generation, with your tailored media content. But you can also leave someone isolated, in debt, or hostile towards followers of other brands or ideas, or mold people into a collective, labeled "segment", so that they become perfect copies of each other (‘wine mom’ or ‘insta travel girl’, anyone?).

"Dark data" is a Gartner term for “information assets organizations collect, process, and store during regular business activities but generally fail to use for other purposes (for example, analytics, business relationships, and direct monetization)”. This "dark matter" of data in fact makes up the majority of records gathered by companies: it uses space and resources, and it poses a risk in the event of data theft.

Take this section with a grain of salt, but keep in mind that the majority of core business decisions are now supported by some pre-analyzed data, which is typically sorted into several clusters that do not favor diversity. This could have far-reaching consequences for some.

To avoid this scenario (and handle dark data as effectively as possible), it is best to combine datasets so that we have access to the most comprehensive and up-to-date data stream depicting our users. Remember to run the results through new algorithms on a regular basis to discover new, emergent categories.
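Combining datasets can be sketched as merging records from several sources into one profile per user, letting the most recent value of each field win. This is a minimal illustration under stated assumptions: every record carries hypothetical `user_id` and `updated_at` fields, and missing or `None` values never overwrite known ones. Regularly re-running analytical algorithms on the merged result is not shown.

```python
from datetime import datetime
from typing import Any

Record = dict[str, Any]


def merge_sources(*sources: list[Record]) -> dict[str, Record]:
    """Merge records from several sources into one up-to-date profile per user.

    Assumes each record has a "user_id" and an "updated_at" timestamp.
    Records are applied oldest-first, so newer values overwrite older ones,
    while fields a newer record lacks (or sets to None) keep their older value.
    """
    all_records = sorted(
        (record for source in sources for record in source),
        key=lambda record: record["updated_at"],
    )
    profiles: dict[str, Record] = {}
    for record in all_records:
        profile = profiles.setdefault(record["user_id"], {})
        profile.update({k: v for k, v in record.items() if v is not None})
    return profiles


# Hypothetical CRM export and app activity log describing the same user.
crm = [{"user_id": "u1", "email": "u1@example.com", "city": "Kraków",
        "updated_at": datetime(2022, 1, 10)}]
app = [{"user_id": "u1", "city": "Lublin", "email": None,
        "updated_at": datetime(2022, 8, 1)}]

profile = merge_sources(crm, app)["u1"]
print(profile["city"], profile["email"])  # Lublin u1@example.com
```

Last-write-wins per field is only one possible policy; it keeps the profile both comprehensive (older fields survive) and current (newer values replace stale ones).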

And if you don’t know how to do this, book a free consultation session with Codete.

Datafication examples 

As mentioned before, data can be gathered practically at every point of contact between technology and our everyday life. For example, you can store: numbers, text, images, routes, audio and mobile data, IP addresses; but also clicks, scrolls, interaction times, logins and passwords, acquisition paths and device activity logs. 
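Clicks, scrolls, and interaction times become useful once they are stored as structured records rather than loose logs. Below is a minimal sketch of such a record and one derived metric; the `Event` fields are a hypothetical schema, credentials are deliberately left out, and measuring interaction time as the gap between a user's first and last event is a simplifying assumption.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """One datafied interaction (hypothetical schema)."""
    user_id: str
    kind: str          # e.g. "login", "click", "scroll"
    timestamp: float   # seconds since the Unix epoch
    target: str = ""   # e.g. a page or button identifier


def interaction_time(events: list[Event]) -> dict[str, float]:
    """Seconds between each user's first and last recorded event."""
    first: dict[str, float] = {}
    last: dict[str, float] = {}
    for event in events:
        uid = event.user_id
        first[uid] = min(first.get(uid, event.timestamp), event.timestamp)
        last[uid] = max(last.get(uid, event.timestamp), event.timestamp)
    return {uid: last[uid] - first[uid] for uid in first}


events = [
    Event("u1", "login", 100.0),
    Event("u1", "click", 130.0, "pricing"),
    Event("u1", "scroll", 160.0, "pricing"),
    Event("u2", "login", 200.0),
]
print(interaction_time(events))  # {'u1': 60.0, 'u2': 0.0}
print(Counter(event.kind for event in events))  # how often each kind of interaction occurs
```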

The best-known sectors that use datafication are:

  • social platforms (e.g., Facebook, Instagram, LinkedIn, and TikTok), which invite users to move their offline relationships online, stay active, and share as much social data as possible, especially profile updates, reactions, and preferences. The data is mainly used for paid ad profiling.
  • internet streaming platforms (e.g., YouTube, Netflix, HBO, and Disney), which supplement traditional television with blockbuster content. The main goal here is to make binge-watching addictive again. The data is used to plan customized, influential media content and recommendations.
  • banking, which provides a secure network for using money online. The data is used to assess clients' credit scores (“trustworthiness”) and to find the best balance between risk and profit when lending money. This way, banks can identify risk-taking profiles and conduct statistical analysis. In other words, datafication replaces sampling techniques, constantly updating the outcome with monitored data.
  • Human Resources/journalism: publicly available data can be used to verify a person's background. Also, data collected within the company can be used to assess employee productivity and, based on a chosen set of factors, the chance of a raise. It can also be used to extend, or even replace, personality tests.
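The banking example's point that monitoring replaces one-off sampling can be illustrated with an estimate that is updated with every new record instead of being computed once from a sample. This is a deliberately simplified sketch: the "on-time repayment rate" metric and the `RunningRate` class are hypothetical, and real credit scoring relies on far richer, regulated models.

```python
class RunningRate:
    """Share of positive outcomes, recomputed as each new record arrives."""

    def __init__(self) -> None:
        self.count = 0      # records seen so far
        self.positives = 0  # records with a positive outcome

    def update(self, repaid_on_time: bool) -> float:
        """Add one monitored record and return the updated rate."""
        self.count += 1
        self.positives += int(repaid_on_time)
        return self.positives / self.count


rate = RunningRate()
history = [rate.update(outcome) for outcome in [True, True, False, True]]
print(history)  # [1.0, 1.0, 0.6666666666666666, 0.75]
```

A sample-based estimate stays frozen at whatever value it had when the sample was taken; here the outcome keeps moving as monitored data flows in.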

As a matter of fact, every company that uses e-mail, owns a website, has a marketing or logistics department, or monitors its production chain is already collecting data points that can, and for the best possible results should, be used, expanded, and updated. Interestingly, the volume of business-related data is so large that, in 2018, the amount of data generated by commerce had already exceeded that generated by the datafication of human life.

Datafication - Conclusion

We live in remarkable times. As the industrial age ended, computers and easy Internet access revolutionized how we live today. Almost everyone has an Internet-connected computer and generates data. Also, the number of devices that create data is constantly growing.

Corporations are the main beneficiaries here, but in several regions, governments profit from constant surveillance as well. Even assuming the data itself isn't the problem, we should keep asking whether datafication can be made fairer towards individual users. Who should control access to datasets, and how can we spot breaches? How can the "right to be forgotten" be extended to the many devices that collect dark data about us? Should we store our data online if it can't be deleted?

Although the concept of datafication may scare some of us, properly handled datasets (through legal regulation, security measures, and work ethics) could bring more industries into a world of less aggressive ads and more customer-friendly services, since each experience could be improved thanks to thousands of collected records (rather than decades on the market). It would also be a world in which brand size and name are no longer the deciding factor when choosing a provider.

Datafication hasn't fully arrived yet, but if you don't want to be left behind, check your databases now. Consider the types and quantities of data you currently store (is there any dark data?), as well as the level of security provided by its sources. Is everything connected? Is everything utilized? Is it properly stored?
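The "is everything utilized?" question can be turned into a simple inventory check: list what each dataset collects, list which fields your reports, analytics, or other processes actually read, and flag the rest as candidate dark data. A minimal sketch under that assumption; the dataset and field names are hypothetical, and in practice the list of used fields would have to come from a source such as query logs or a data catalog.

```python
def find_dark_fields(collected: dict[str, set[str]], used: set[str]) -> dict[str, list[str]]:
    """Return, per dataset, the collected fields that no known process reads.

    `used` holds qualified names in the form "dataset.field".
    """
    dark: dict[str, list[str]] = {}
    for dataset, fields in collected.items():
        unused = sorted(f for f in fields if f"{dataset}.{f}" not in used)
        if unused:
            dark[dataset] = unused
    return dark


# Hypothetical inventory of what is stored vs. what is actually queried.
collected = {
    "web_logs": {"ip_address", "page", "scroll_depth"},
    "crm": {"email", "birth_date"},
}
used = {"web_logs.page", "crm.email"}
print(find_dark_fields(collected, used))
# {'web_logs': ['ip_address', 'scroll_depth'], 'crm': ['birth_date']}
```

Fields flagged this way are candidates to put to use, secure, or delete, which also shrinks what could leak in a data breach.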

If you have any concerns, share them with our R&D department, and we will send you a detailed response within 48 hours (no strings attached).


Piotr Wawryka

Piotr has over 5 years of commercial experience writing Python applications. He has been a software developer and data scientist at Codete since 2017 and is a Ph.D. student at AGH University of Science and Technology. His main field of interest is neural networks and their practical applications. He gives talks at meetups and international conferences.


Dawid Pacholczyk

Consulting Manager at Codete with over 15 years of experience in the IT sector and a strong technical background. Seasoned in working with multinational companies. Ph.D. student and lecturer at Polish-Japanese Academy of IT, focused on software architecture, software development and management.
