Background

Indication-specific knowledge spaces

A new paradigm in translational, computational biomedicine

Fraunhofer has a long-standing record in organizing and curating scientific data and knowledge (that is, assessing and improving their quality) across entire indication areas. In the course of AETIONOMY, a collaborative project funded by the Innovative Medicines Initiative (IMI), we organized the available data and knowledge about Alzheimer's disease and Parkinson's disease. This took many years of work. The resulting "knowledge base", however, allowed us to run computational biology experiments and apply AI approaches in an unprecedented fashion.

Maintaining such a knowledge base and keeping it up to date is a further challenge. To update our knowledge bases regularly, we work with major institutions in scientific information provisioning, namely the "Informationszentrum Lebenswissenschaften (ZBMED)" in Germany and the European Bioinformatics Institute (EBI).

Key elements of "knowledge bases" are:

  • Shared, formalized semantics for the indication area. Shared semantics means that we analyse the names other scientists assign to relevant things, such as "genes" or "drugs" that may act against SARS-CoV-2. We organize "name spaces" for the things that matter in the context of SARS-CoV-2 and thereby bring order into the world of knowledge about coronaviruses.
    • On the Fraunhofer side, we have addressed the issue of "shared semantics" for Coronavirus SARS-CoV-2 and the pandemic it causes (COVID-19) by generating the "COVID-19 terminology".
    • This terminology will be published and shared with the scientific community once we have tested its performance in document retrieval and information extraction tasks.
  • Knowledge-based models of the virus and its interaction with the host. Knowledge-based modelling means that we systematically capture and formalize knowledge about the virus and its pathophysiology and represent it in a graph model. This forms a so-called "knowledge graph" that we can use for many purposes, including applications of "artificial intelligence".
    • On the Fraunhofer side, we have generated the world's biggest cause-and-effect model representing knowledge about SARS-CoV-2. In a massive campaign, we have extracted and curated the most comprehensive disease map on COVID-19 worldwide.
    • Previous work by our group has come into focus quite unexpectedly: we have worked on heme biology before (see Humayun, F., Domingo-Fernandez, D., George, A. A. P., Hopp, M. T., Syllwasschy, B. F., Detzel, M. S., Hofmann-Apitius, M. & Imhof, D. (2020). A computational approach for mapping heme biology in the context of hemolytic disorders. Frontiers in Bioengineering and Biotechnology, 8) and have generated a highly curated model of heme metabolism under normal and disease conditions. This work is now highly relevant for COVID-19, as the virus interferes with heme metabolism, which may explain some of the severe clinical phenotypes.
  • The knowledge base comprises Drug and Compound Spaces for COVID-19 and relevant host pathways. By systematically collecting chemical information that has been linked to COVID-19 targets (and to host proteins that are recruited by the virus), we extend the knowledge-based model of SARS-CoV-2 with dedicated chemical information. This is of utmost importance for all attempts to identify new drugs active against the virus.
    • We have developed a method to integrate knowledge graphs representing biological and clinical aspects with chemical databases. More precisely, we can link our COVID-19 knowledge graph with relevant entries in ChEBI, one of the best reference databases on bioactive compounds.
    • We have developed text mining approaches to identify publications that contain “drug-target-information” in the SARS-CoV-2 context. We extract the chemical information and, if target information is available, we extract that as well. The chemistry and its association with biological targets (usually proteins) are then integrated into our knowledge graph.
    • The COVID-19 knowledge graph, which represents all pathophysiological processes of the SARS-CoV-2 virus, integrated with the chemical information on bioactive compounds that bind to virus and host proteins, forms the COVID-19 PHARMACOME.
  • The COVID-19 PHARMACOME
    • The COVID-19 Pharmacome integrates models of virus biology and models of virus-host interactions. Enriching these models with chemical information about drugs and drug-like molecules that bind to relevant virus and host proteins allows us to represent the known pharmacology around SARS-CoV-2.
      • The COVID-19 Pharmacome will be complemented with additional information on chemical descriptors (that “encode” features of chemical compounds and drugs) and on the “relatedness” of chemical compounds. In brief, this additional annotation establishes a “chemical neighbourhood” within the COVID-19 Pharmacome.
      • Fraunhofer has already developed modern AI-based algorithms that integrate entire Pharmacomes with experimental data. In the future, this will allow us to predict re-purposing candidates from the outcome of experiments done, for example, in cell culture.
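
The idea of linking a cause-and-effect knowledge graph with compound-target information can be illustrated with a minimal sketch. This is not Fraunhofer's implementation: it assumes that the graph is stored as simple (subject, relation, object) triples, and every entity name, relation label, and function name below is a hypothetical placeholder rather than a curated fact.

```python
from collections import defaultdict

# Cause-and-effect statements as (subject, relation, object) triples.
# All names are illustrative placeholders, not curated knowledge.
BIOLOGY_TRIPLES = [
    ("virus_protein_A", "binds", "host_protein_X"),
    ("host_protein_X", "increases", "process_P"),
    ("process_P", "increases", "phenotype_Q"),
]

# Compound-target links, e.g. as they might come out of text mining.
CHEMISTRY_TRIPLES = [
    ("compound_1", "inhibits", "host_protein_X"),
    ("compound_2", "inhibits", "virus_protein_A"),
    ("compound_3", "inhibits", "unrelated_protein_Z"),
]


def build_graph(triples):
    """Index triples by subject: node -> list of (relation, object)."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return graph


def downstream(graph, start):
    """Return all nodes reachable from start by following edges."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


def candidate_compounds(biology, chemistry, phenotype):
    """Compounds whose target lies upstream of the given phenotype."""
    graph = build_graph(biology)
    hits = []
    for compound, relation, target in chemistry:
        if phenotype in downstream(graph, target):
            hits.append((compound, relation, target))
    return sorted(hits)


if __name__ == "__main__":
    for hit in candidate_compounds(
        BIOLOGY_TRIPLES, CHEMISTRY_TRIPLES, "phenotype_Q"
    ):
        print(hit)
```

In this toy example, only the compounds whose targets sit on a causal path to the phenotype of interest are returned. A real pharmacome would additionally carry provenance, identifiers from reference databases such as ChEBI, and typed relations.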

The scientific information dilemma

It takes an average scientist about one hour to read an average-sized publication. It may take significantly longer if you really want to understand the data underlying the paper: that can take an entire day, depending on the amount of data. If you start to critically evaluate the experimental workflow or even consider reproducing some of the in-silico methods used in the paper, you can easily invest several days of work in just one publication.

We have estimated that an average scientist who reads two papers per working day can read about 500 papers in a year. 

At present, the biomedical community produces more than 3,000 publications per day. Even if a coronavirus researcher focuses only on the most relevant publications in their core area of expertise, it is hard to keep up with the pace of publishing.
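
The size of this gap can be made explicit with a short back-of-the-envelope calculation. The reading rate and the publication rate are taken from the text; the 250 working days per year and the assumption that publications appear on every calendar day are ours and serve only as a rough illustration.

```python
# Figures from the text
PAPERS_READ_PER_WORKING_DAY = 2
PUBLICATIONS_PER_DAY = 3000        # stated as a lower bound

# Assumptions made for this illustration
WORKING_DAYS_PER_YEAR = 250        # consistent with "about 500 papers in a year"
CALENDAR_DAYS_PER_YEAR = 365       # publications assumed to appear every day

papers_read_per_year = PAPERS_READ_PER_WORKING_DAY * WORKING_DAYS_PER_YEAR
papers_published_per_year = PUBLICATIONS_PER_DAY * CALENDAR_DAYS_PER_YEAR
share_read = papers_read_per_year / papers_published_per_year

print(f"Read per year:      {papers_read_per_year}")
print(f"Published per year: {papers_published_per_year}")
print(f"Share read:         {share_read:.3%}")
```

Under these assumptions, a single scientist can read roughly 500 of more than one million publications per year, which is well below one tenth of a percent of the total output.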

If we want to look into other domains of expertise (e.g. somebody with a background in biological pathways wants to understand which compounds would be potential candidates for inhibiting the viral replicase), we need to rapidly find and extract relevant information, whatever “relevant” may mean to the scientist.

Automated systems that “read scientific publications for us” therefore play a vital role in science. Now that we face an almost complete information overload in the COVID-19 field, we need these automated systems more than ever.

This is where Fraunhofer technology helps. Directly. Immediately.