Codete Blog

Machine Learning and Artificial Intelligence in Genomics: Applications and Predictions

Karol Przystalski

26/03/2020

Artificial intelligence and machine learning have disrupted practically every sector, and healthcare is no exception. The industry has long been a strong adopter of innovative technologies, and a growing number of researchers are now turning their attention to advances in artificial intelligence.

One such area is genomics, where machine learning plays an increasingly important role.

Genomics is a branch of molecular biology that studies genomes, that is, the complete sets of genes within particular organisms.

In this article, we take a closer look at the applications of machine learning in genomics to help you understand both the current and emerging trends within this exciting and innovative field. Read on to find out what genomics is all about, what the current applications of machine learning are, and which applications we can expect to see in the future.


Table of contents:

  1. What is genomics?
  2. Artificial intelligence in genomics – an overview
  3. Machine learning applications in genomics today
  4. Future applications of machine learning in genomics


What is genomics?

Genomics is an interdisciplinary field of biology that concentrates on studying the structure, function, mapping, and editing of genomes. A genome is a complete set of DNA of an organism; it includes all of the genes. 

We can divide genomics into several subsets: regulatory genomics, structural genomics, and functional genomics.

  • Regulatory genomics – the study of genomic features and the ways gene expression is regulated. Machine learning applications in this area include classifying gene expression, predicting the binding of transcription factors and RNA-binding proteins, and predicting promoters and enhancers.
  • Functional genomics – in this area, researchers attempt to describe gene functions and interactions. Machine learning can potentially help in classifying mutations by their effect on functional activity and in classifying subcellular localization.
  • Structural genomics – this is where researchers explore the characterization of genome structures. Machine learning can help to classify protein structures and to predict the secondary and tertiary structure of proteins.

In the commercial sector, the genomics industry consists of products and services. Experts predict that products will grow to dominate the market thanks to the recurrent use of instruments for genomic research and the increasing number of research programs carried out by both governments and private organizations.

What are the most typical genomics services? Think about next-generation genome sequencing, biomarker translations, consumer genomics, and many others.

The global genomics industry is expected to reach $27.6 billion by 2025. Within it, the genetic testing market is expected to be worth over $22 billion by 2024.

Today, genomics is a powerful field for innovation that draws on technologies such as deep learning, computer vision, and natural language processing. According to AngelList, there are 170 genomics startups around the world, with an average valuation of $5.4 million.

Artificial intelligence in genomics – an overview

Artificial intelligence and its subsets, such as machine learning or deep learning, offer incredible value to the genomics industry. For example, by linking deep learning with computer vision techniques, researchers can analyze the growing amount of genomics imagery data. Machine learning models are capable of solving computer vision tasks such as semantic segmentation, image classification, and image retrieval.

By combining machine learning with natural language processing techniques, it's possible to analyze the large body of genomics-related text found in publicly available research papers. That way, researchers can solve problems such as relation extraction, information retrieval, or named entity recognition. Machine learning is well suited to these tasks because natural language processing is such an active area of research at the moment.

Machine learning applications in genomics today

1. Gene editing

Gene editing refers to a selection of methods for making alterations to DNA at the cellular or organism level. One of the recent advances in the field is CRISPR, a gene-editing technology that offers a faster and cheaper way of carrying out such projects. However, to use CRISPR, researchers first need to select the right target sequence. This process can be very challenging, as it often involves unpredictable outcomes. Machine learning offers a glimmer of hope: the technology might significantly reduce the cost, time, and effort it takes to identify the right target sequence.

For example, a London-based company called Desktop Genetics works at the intersection of artificial intelligence and CRISPR. The company loads experimental or reference data to Google Cloud and then formats and processes it before it's moved to bioinformatics teams. With the help of this data, researchers can analyze and design CRISPR experiments or train new models. Machine learning will impact CRISPR even more as new techniques are discovered and implemented.
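To make the target selection step more concrete, here is a minimal Python sketch of what enumerating candidate target sequences might look like. It assumes the commonly used SpCas9 enzyme, which needs a 20-nucleotide target followed directly by an NGG motif (the PAM). The function names and the GC-content feature are illustrative assumptions, not the method of any company mentioned here. A real pipeline would also scan the reverse strand and pass richer features to a trained model.

```python
import re
from typing import List, Tuple


def find_candidate_targets(dna: str, guide_len: int = 20) -> List[Tuple[int, str, str]]:
    """Return (start, target, pam) for every NGG PAM found on the given strand.

    Only the strand passed in is scanned; the reverse complement is ignored
    to keep the sketch short.
    """
    dna = dna.upper()
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    return [(m.start(), m.group(1), m.group(2)) for m in pattern.finditer(dna)]


def gc_content(seq: str) -> float:
    """Fraction of G and C bases, a simple feature a scoring model could use."""
    return sum(base in "GC" for base in seq) / len(seq)


# Hypothetical input sequence, made up for illustration only.
example = "ACGTACGTACGTACGTACGTTGGAAAA"
for start, target, pam in find_candidate_targets(example):
    print(start, target, pam, gc_content(target))
```

In this sketch the machine learning part is deliberately left out: the candidates and their features are what a model would rank, for example by predicted on-target efficiency.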


2. Genome sequencing

Another area where machine learning is causing disruption is genome sequencing, which has recently attracted interest in medical diagnostics. Modern DNA sequencing techniques allow researchers to sequence an entire human genome in one day. By contrast, the classic sequencing technology needed more than a decade when the human genome was sequenced for the first time. Talk about innovation!

Companies like Deep Genomics are now operating on the market and using machine learning to help researchers interpret genetic variation. In particular, development teams design algorithms based on patterns identified in large genetic data sets. These patterns are then translated to computer models that help researchers to interpret how genetic variation affects critical cellular processes like metabolism, cell growth, or DNA repair. Disruption to the normal functioning of these processes can potentially cause diseases like cancer. That's why using machine learning in genomics research is so important.
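As a rough illustration of how a genetic variant can be presented to a model, the sketch below one-hot encodes a DNA sequence and applies a single-nucleotide change. This is a common input representation in the field, not a description of Deep Genomics' actual pipeline. The function names and the example sequence are assumptions made for this example.

```python
from typing import List

BASES = "ACGT"


def one_hot(seq: str) -> List[List[int]]:
    """Encode a DNA string as rows of four indicators in A, C, G, T order.

    Any other character (for example N) becomes an all-zero row.
    """
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def apply_snv(seq: str, pos: int, alt: str) -> str:
    """Return seq with the base at 0-based position pos replaced by alt."""
    if len(alt) != 1 or alt not in BASES:
        raise ValueError("alt must be a single base: A, C, G or T")
    if not 0 <= pos < len(seq):
        raise IndexError("pos is outside the sequence")
    return seq[:pos] + alt + seq[pos + 1:]


# Hypothetical reference sequence and variant, for illustration only.
reference = "ACGTAC"
variant = apply_snv(reference, 2, "T")

# Rows that differ between the two encodings mark the variant position.
changed = [
    i
    for i, (ref_row, var_row) in enumerate(zip(one_hot(reference), one_hot(variant)))
    if ref_row != var_row
]
print(variant, changed)
```

A model trained on such encodings can then be run on both the reference and the variant sequence, and the difference between the two predictions serves as an estimate of the variant's effect.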


3. Clinical workflows

This area of the medical industry stands to benefit a lot from machine learning as well. Here's an example scenario: take a look at any healthcare system, and you'll find a platform that includes patient data. However, it's common to find gaps in the patient data and in its availability to different members of the healthcare team. Machine learning can help to increase the efficiency of the clinical workflow process.

Intel has released an Analytics Toolkit that combines machine learning capabilities with clinical workflow processes. Intel partnered with the Transformation Lab at Intermountain Healthcare in Salt Lake City, Utah, to efficiently integrate genomics into the institution's breast cancer treatments and patient care.

This partnership allowed for the development of an algorithm that measures factors such as the patient's level of risk for developing different types of cancer. The company was able to develop a workflow that enables sharing data easily and making the most of the available patient data.


4. Consumer genomics products

Genetic testing and consumer genomics are becoming an increasingly important market for innovation. The anticipated expansion of these markets is powered by growing public awareness of how genetic tests can be used to determine the likelihood of developing a particular disease. Companies such as 23andMe are becoming household names among consumers.

For example, 23andMe offers a Genetic Weight report that uses machine learning to combine data from 600,000 research participants. The report delivers insights into how factors such as age and genotype impact one's weight.

Future applications of machine learning in genomics

1. Pharmacogenomics

Pharmacogenomics is an emerging field within precision medicine that examines the role of genomics in an individual's response to particular drugs. The area is still relatively new but developing quickly, and researchers are already experimenting with machine learning techniques. For example, we have seen studies where machine learning models were applied to determine a stable dose of a particular drug in renal transplant patients.

In the future, researchers will be using machine learning models to better understand the individual response to particular treatments and, as a result, create more personalized treatments.


2. Genetic screening of newborns

Some experts believe that newborn genetic screening might become standard practice during the next decade. The idea is to collect data at birth and then integrate it into the individual's EHR (electronic health record) profile. Another facet of this trend is making noninvasive screening available to women during pregnancy. Such screening would be geared toward identifying particular conditions such as Down syndrome.

The Newborn Screening Center at the National Taiwan University Hospital used machine learning to increase the accuracy of its web-based newborn screening system, with a focus on metabolic defects. A study showed that machine learning significantly reduced the number of false positives for various diseases.
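To show what "fewer false positives" means in practice, the following sketch compares the false positive rate of two sets of screening predictions. All labels and predictions are made-up toy values, not figures from the study mentioned above. The function names are assumptions for this example.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (true positives, false positives, true negatives, false negatives)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1
        elif pred == 1 and truth == 0:
            fp += 1
        elif pred == 0 and truth == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of healthy cases that were wrongly flagged as positive."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred)
    negatives = fp + tn
    return fp / negatives if negatives else 0.0


# Toy data: 1 = condition present, 0 = healthy. Invented for illustration.
y_true = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]
baseline_pred = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # hypothetical rule-based screen
model_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]     # hypothetical ML-based screen

print(false_positive_rate(y_true, baseline_pred))
print(false_positive_rate(y_true, model_pred))
```

In this toy case both screens catch the one affected newborn, but the second flags fewer healthy ones, which is the kind of improvement that spares families unnecessary follow-up tests.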


3. Agriculture

Let's not forget that genomics is also relevant to food production. Experts imagine that in the future machine learning will help farmers improve soil quality and crop yield. The California-based startup PathoGn combined genomics and machine learning to create diagnostic tools for predicting and preventing diseases in crops. Today the startup is called Trace Genomics and focuses more on soil health.

We can also easily imagine using genetic data to predict the health of crops so that farmers can better forecast and optimize yields. On a large scale, such innovations could lead to significant global improvements in crops and help address world problems such as hunger.

Machine learning in genomics - conclusion

The combination of artificial intelligence technologies such as machine learning and genomics can potentially solve several significant problems we are facing today. With powerful machine learning algorithms, genomics researchers will be able to deliver better results faster, at lower cost - making their outcomes available to more people in the future. 

If you're looking for more insights about machine learning in healthcare, be sure to keep a close eye on our blog, where we publish insights from our team of experts who work with various healthcare companies on building innovative products. And if you're looking for a technology partner, don't hesitate to contact us!

Karol Przystalski

CTO at Codete. In 2015, he received his Ph.D. from the Institute of Fundamental Technological Research of the Polish Academy of Sciences. His area of expertise is artificial intelligence.
