Web database of hypothetical natural products

Creation of a web database of hypothetical peptidic secondary metabolites

RSF #20-74-00032

Grant PI:
Alexey Gurevich

Olga Kunyavskaya


Peptidic secondary metabolites (PSMs) are a rich source of medically valuable substances such as antibiotics, antimycotics, vitamins, immunosuppressants, and others (Dias et al., Metabolites 2012). Moreover, many PSMs are used in agriculture, cosmetology, and industry (Martínez-Núñez and López y López, Sustainable Chemical Processes 2016; Baltz, Industrial Microbiology & Biotechnology 2017). The study of PSMs and the search for new compounds of this class are therefore promising research directions. Despite some similarities between PSMs and classical peptides, identifying PSMs is often much more difficult because of their small size (up to 1-2 kDa), low concentrations, and structural peculiarities (in particular, cyclic structures, non-standard amino acids, and numerous post-translational modifications).

Metabologenomic methods, which jointly analyze the genomes of PSM-producing organisms and mass spectrometry measurements of the compounds these organisms synthesize, are the most promising approach to discovering new PSMs (Goering et al., ACS Central Science 2016; Cao et al., Cell Systems 2019; and others). The main obstacles to using these approaches are the low speed of current computational metabologenomic methods and the lack of matching genomic data for PSM researchers who perform only mass spectrometric analysis of their samples.

In this project, we will create a database of hypothetical PSMs predicted by high-performance analysis of tens of thousands of publicly available microbial genomes. The algorithms and data structures developed in the project will rapidly compare large volumes of mass spectrometry data (hundreds of thousands to millions of spectra) against this database, identify the most probable matching predicted compounds, and analyze them in detail, in particular by annotating their possible biological activities and potential targets. As a result, PSM researchers who have only mass spectrometry data will be able to run a fast metabologenomic analysis of their samples: they select in the database the species, genera, or families of organisms expected in the studied samples, and their experimental mass spectra are then matched against the precomputed genome-based predictions. Such a significant expansion of the tools available to the scientific community for discovering and analyzing PSMs is expected to accelerate the detection of pharmacologically and industrially valuable natural compounds.
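One way such a database lookup could begin is with a fast precursor-mass filter restricted to the user-selected taxa. The sketch below is a minimal illustration under stated assumptions, not the project's implementation: it assumes each hypothetical PSM carries a predicted monoisotopic mass and a source genus, and that each spectrum's precursor m/z has already been converted to a neutral mass. The names `HypotheticalPSM`, `build_index`, and `candidates_for`, as well as the compounds and masses in the usage example, are hypothetical. A real search would follow this step with scoring of the MS/MS spectra against predicted fragmentation.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class HypotheticalPSM:
    """A genome-predicted compound (hypothetical, illustrative fields only)."""
    name: str
    genus: str
    mass: float  # predicted monoisotopic neutral mass, Da


def build_index(db, genera):
    """Keep only compounds from the selected genera and sort them by mass."""
    selected = sorted((c for c in db if c.genus in genera), key=lambda c: c.mass)
    # Parallel list of masses so bisect can search it directly.
    return [c.mass for c in selected], selected


def candidates_for(index, precursor_mass, ppm=10.0):
    """Return indexed compounds whose mass is within +/- ppm of precursor_mass."""
    masses, compounds = index
    tol = precursor_mass * ppm / 1e6
    lo = bisect_left(masses, precursor_mass - tol)
    hi = bisect_right(masses, precursor_mass + tol)
    return compounds[lo:hi]


# Usage with made-up compounds: the user selects only Streptomyces.
db = [
    HypotheticalPSM("psm_A", "Streptomyces", 1001.500),
    HypotheticalPSM("psm_B", "Streptomyces", 1001.503),
    HypotheticalPSM("psm_C", "Bacillus", 1001.501),
    HypotheticalPSM("psm_D", "Streptomyces", 1200.000),
]
index = build_index(db, {"Streptomyces"})
hits = candidates_for(index, 1001.501, ppm=5.0)
print([c.name for c in hits])  # ['psm_A', 'psm_B']
```

Sorting the filtered database once and answering each query by binary search costs O(log n + k) per spectrum instead of a scan over the whole database, which is the kind of saving that matters when hundreds of thousands of spectra are searched.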

Software:
NRPminer (in collaboration with UCSD and CMU): GitHub project page and GNPS web service (login required)
HypoNPAtlas/seq2ripp (in collaboration with CMU): GitHub project page and web service (alpha version)

Publications:
Gurevich A.A., Kunyavskaya O.A. NPvis: interactive visualizer for MS/MS fragmentation of natural products. Proceedings of the 28th International Scientific Conference for Undergraduate and Graduate Students and Young Scientists “Lomonosov”, 2021.
Behsaz et al. Integrating Genomics and Metabolomics for Scalable Non-Ribosomal Peptide Discovery. Nature Communications, 2021. In press.
Lee et al. HypoNPAtlas: an atlas of hypothetical natural products for mass spectrometry database search. In preparation.

Conference presentations:
A poster on NPvis was presented at the Lomonosov-2021 conference (19-23 April 2021, online).