Chemical Information Space

Through a close collaboration between chemoinformaticians at Fraunhofer SCAI and at IME-ScreeningPort in Hamburg, the team developed a series of tools to link the Chemical Information Space to the COVID19 literature-based network. Literature on COVID19 usually identifies and reports compounds through their activities in specific assays. These assays can be biochemical (binding and/or functional) in nature, when the effect on a single protein is measured. Alternatively, they can measure cellular properties (such as viability, toxicity or simply morphological parameters), giving a richer but more complex picture of what a molecule (a candidate compound or an already approved drug) can do to a living cell or tissue under viral attack.

The realm of chemistry involved in this interaction has its own actors (molecules), with their own labels/names and features (e.g. structural data, structural class annotations, physicochemical properties). The relevant information is spread over multiple sources, each of which can be queried. Together, these chemical information sources form the Chemical Space relevant for COVID19 research. Large chemical data resources such as EBI-ChEMBL, PubChem, PDB, BRENDA or patent collections, as well as interaction and pathway databases, can be linked to the highly curated COVID19 network of causal relationships.

Every interaction found in the COVID19 network is annotated with a large amount of information coming from different sources of the Chemical Space. This annotation effort aims to answer several questions about the two object classes of the COVID19 network, nodes and edges:

  1. Can we identify a chemical entity (CE) that could modulate, or has already been found to modulate, the interactions we are studying?
  2. If yes, do we have access to it? Is it commercially available? Is it patented? Is it stable? Is it soluble? etc.
  3. If not, can we develop a new chemical entity (NCE) using the chemical and biological information of neighbouring nodes, hypothesizing an indirect modulation?

As you see, these questions relate to the presence of chemical space annotation within protein/gene nodes of the network. Enriching nodes with related modulating CEs extracted from external databases allows more connections to be built and surfaced on request.
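The enrichment step described above can be illustrated with a minimal sketch. It is not the project's implementation: the node identifiers, the record layout and the function names `enrich_nodes` and `shared_modulators` are all hypothetical, and the external database export is replaced by a small in-memory list.

```python
from collections import defaultdict

# Hypothetical network: protein/gene nodes keyed by an illustrative identifier.
nodes = {
    "PROT_A": {"label": "protein A"},
    "PROT_B": {"label": "protein B"},
    "PROT_C": {"label": "protein C"},
}

# Hypothetical records as they might be exported from an external database:
# (chemical entity, target node, reported activity type).
external_records = [
    ("CE_1", "PROT_A", "inhibitor"),
    ("CE_2", "PROT_A", "binder"),
    ("CE_1", "PROT_B", "inhibitor"),
    ("CE_3", "PROT_X", "inhibitor"),  # target is not in the network, so it is ignored
]


def enrich_nodes(nodes, records):
    """Return a copy of the nodes with a 'modulators' list attached to each one."""
    by_target = defaultdict(list)
    for ce, target, activity in records:
        by_target[target].append({"ce": ce, "activity": activity})
    return {
        node_id: {**attrs, "modulators": by_target.get(node_id, [])}
        for node_id, attrs in nodes.items()
    }


def shared_modulators(enriched, node_a, node_b):
    """Chemical entities annotated on both nodes, i.e. a candidate new connection."""
    ces_a = {m["ce"] for m in enriched[node_a]["modulators"]}
    ces_b = {m["ce"] for m in enriched[node_b]["modulators"]}
    return ces_a & ces_b


enriched = enrich_nodes(nodes, external_records)
print(shared_modulators(enriched, "PROT_A", "PROT_B"))
```

In this toy data, the two annotated proteins become linked through the chemical entity they share, which is the kind of additional connection the enrichment is meant to surface.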

Through BiK>Mi (Biomedical Knowledge Miner), we can identify complex pathway patterns within very complex knowledge graphs of causal and correlative relationships. These relationships have been extracted from the primary literature and comprise many interactions between chemical and biological entities, with referential links to many further data resources.
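To make the idea of a pathway pattern concrete, here is a small sketch that is independent of BiK>Mi and does not reflect its interface. It assumes the graph is available as (source, relation, target) triples and treats a pattern as a fixed sequence of relation types; the triples, relation names and the function `find_paths` are illustrative assumptions only.

```python
# Hypothetical causal/correlative triples: (source, relation, target).
edges = [
    ("CE_1", "inhibits", "PROT_A"),
    ("PROT_A", "activates", "PROT_B"),
    ("PROT_B", "correlates_with", "PROCESS_P"),
    ("PROT_A", "correlates_with", "PROCESS_Q"),
]


def find_paths(edges, start, pattern):
    """Return all paths from `start` whose relation sequence equals `pattern`."""
    paths = [[start]]
    for relation in pattern:
        # Extend every partial path by each edge that leaves its last node
        # with the relation type required at this step of the pattern.
        paths = [
            path + [tgt]
            for path in paths
            for src, rel, tgt in edges
            if src == path[-1] and rel == relation
        ]
    return paths


paths = find_paths(edges, "CE_1", ["inhibits", "activates", "correlates_with"])
print(paths)
```

A real knowledge graph would need a proper graph store and a richer pattern language; the sketch only shows how a chain of typed relationships connects a chemical entity to a downstream biological entity.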

One of the first worldwide reactions to the COVID19 pandemic has been the quest for a fast-track set of candidate molecules with a proven safety profile, if not already approved for other treatments, that could be found active against the SARS-CoV-2 virus. This quest is the typical use case for repurposing chemical libraries, which collect known drugs (either already generic or newly approved) and all compounds that have successfully passed at least a Phase I clinical trial, and make them available for screening.

Among them, the oldest is the FDA-approved collection (ca. 2k compounds), which many commercial vendors offer, together with other well-known collections such as ENZO and LOPAC.

In 2018, IME-ScreeningPort bought the most comprehensive collection available at that time, developed by the Broad Institute in Boston. It samples 6k+ compounds from approved, Phase I-III and withdrawn drugs from all over the world and covers almost every structural class and known mechanism of action. Recently, the Scripps Institute in San Diego, together with the Gates Foundation, assembled an even bigger collection (about 12k compounds) called ReFRAME, which will certainly be used at present as a source of candidate actives. While the Broad Institute collection is public in terms of both structural information and chemical QC analytics, ReFRAME is not yet freely accessible and is less transparent, but it seems to have ca. 50% overlap with the Broad Institute collection.

These collections are naturally dynamic, with more compounds being added continuously and some removed (e.g. those whose profile can be covered by more selective and advanced molecules as they come up). We will follow up on all these resources.

Through cheminformatics tools developed within the Fraunhofer SCAI and IME-ScreeningPort collaboration, we are now able to offer annotations from both collections, as they offer several indices for cross-referencing as well as primary target information. Using structural information (such as SMILES and/or InChI notation) gave us the opportunity to develop a growing chemistry resource, which extends the annotations provided by the assemblers of the repurposing collections with labels and links from structural databases (e.g. PDB, UniProtKB), from biochemical and systems biology oriented repositories (e.g. BRENDA, KEGG, PathwayCommons, Reactome), from information about genomes and genomic variation (e.g. Ensembl, ClinVar, HGNC), from interaction databases (e.g. BioGRID, IntAct, miRTarBase), from collections of genes and variants associated with human diseases (e.g. DisGeNET) and from drug-related databases (e.g. DrugBank, SIDER).
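As a rough illustration of cross-referencing by structure, the sketch below merges annotations from two collections on a shared structural key and estimates their overlap. It rests on several assumptions: the keys are placeholders standing in for a canonical structural identifier (for example one derived from SMILES or InChI with a cheminformatics toolkit), the annotation fields are invented, and `normalize`, `merge_collections` and `overlap_fraction` are hypothetical helper names, not part of the tools described here.

```python
def normalize(key):
    """Bring a structural key into one canonical textual form."""
    return key.strip().upper()


# Hypothetical collections: structural key -> annotations from that source.
collection_a = {
    "key-0001": {"name": "compound 1", "target": "PROT_A"},
    "KEY-0002": {"name": "compound 2", "target": "PROT_B"},
}
collection_b = {
    " KEY-0002 ": {"phase": "approved"},
    "KEY-0003": {"phase": "phase II"},
}


def merge_collections(*collections):
    """Combine the annotations of all collections per normalized structural key."""
    merged = {}
    for collection in collections:
        for key, annotations in collection.items():
            merged.setdefault(normalize(key), {}).update(annotations)
    return merged


def overlap_fraction(a, b):
    """Shared keys as a fraction of the smaller collection (0.0 if one is empty)."""
    keys_a = {normalize(k) for k in a}
    keys_b = {normalize(k) for k in b}
    smaller = min(len(keys_a), len(keys_b))
    return len(keys_a & keys_b) / smaller if smaller else 0.0


merged = merge_collections(collection_a, collection_b)
print(merged["KEY-0002"])
print(overlap_fraction(collection_a, collection_b))
```

Measuring the overlap against the smaller collection is one choice among several; a different denominator (for instance the union of both key sets) would give a different figure for the same data.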

Using repurposing collections will allow us to easily validate any hypothesis generated by users on the COVID19 network in terms of possible mechanism of action or efficacy in inhibiting a viral target.