Web database of hypothetical natural products

Creation of a web database of hypothetical peptidic secondary metabolites

RSF #20-74-00032

Grant PI:
Alexey Gurevich

Olga Kunyavskaya (master's student at HSE University, St. Petersburg; participated from 04.2021 to 06.2021)
Alexandra Sadovskaya (master's student at SPbU; participated from 10.2021 to 05.2022)

Peptidic secondary metabolites (PSMs) are a rich source of medically valuable substances such as antibiotics, antimycotics, vitamins, immunosuppressants, and others (Dias et al., Metabolites 2012). Moreover, many PSMs are used in agriculture, cosmetology, and industry (Martínez-Núñez and López y López, Sustainable Chemical Processes 2016; Baltz, Industrial Microbiology & Biotechnology 2017). As a result, the study of PSMs and the search for new compounds of this class are promising directions for many researchers. Despite some similarities between PSMs and classical peptides, identifying PSMs is often much more difficult because of their small size (up to 1-2 kDa), low concentrations, and the peculiarities of their biochemical structure (in particular, cyclic structure, non-standard amino acids, and numerous post-translational modifications).

The most promising direction in the search for new PSMs is metabologenomic methods, which jointly analyze the genomes of PSM-producing organisms and mass spectrometric measurements of the compounds they synthesize (Goering et al., ACS Central Science 2016; Cao et al., Cell Systems 2019; and others). The main obstacles to using these approaches are the low speed of current computational metabologenomic methods and the lack of appropriate genomic data among PSM researchers who perform only mass spectrometric analysis of their samples.

In the proposed project, we will create a database of hypothetical PSMs derived from a high-performance analysis of tens of thousands of publicly available microorganism genomes. The developed algorithms and data structures will make it possible to quickly compare large volumes of mass spectrometric data (hundreds of thousands to millions of spectra) with this database, and to identify and analyze in detail the most probable predicted compounds, in particular by annotating their possible biological effects and potential targets. Thus, a wide range of PSM researchers who have only mass spectrometric data will be able to carry out a fast metabologenomic analysis of their samples: they select in the database the species, genera, or families of organisms expected in the studied samples and match the experimental mass spectra against the precomputed genomic data. Such a significant expansion of the tools available to the scientific community for the search and analysis of PSMs should accelerate the detection of pharmacologically and industrially valuable compounds of natural origin.
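
The lookup described above (restrict the database to the expected taxa, then match experimental spectra against the precomputed entries) can be illustrated with a minimal Python sketch. It is not the project's implementation: the `HypotheticalPSM` record, the `build_index` and `candidates` functions, the 0.02 Da tolerance, and the example entries are all hypothetical, and the sketch covers only a precursor-mass prefilter, whereas a real search would also score MS/MS fragmentation.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class HypotheticalPSM:
    """One predicted compound (hypothetical record layout)."""
    name: str
    mass: float  # monoisotopic mass in Da
    genus: str   # genus of the genome the prediction came from


def build_index(psms, genera):
    """Keep only entries from the selected genera and sort them by mass."""
    selected = sorted((p for p in psms if p.genus in genera),
                      key=lambda p: p.mass)
    masses = [p.mass for p in selected]
    return masses, selected


def candidates(index, precursor_mass, tol_da=0.02):
    """Return entries whose mass lies within tol_da of the precursor mass.

    Binary search over the sorted masses keeps each query logarithmic in
    the database size, which matters when millions of spectra are searched.
    """
    masses, selected = index
    lo = bisect_left(masses, precursor_mass - tol_da)
    hi = bisect_right(masses, precursor_mass + tol_da)
    return selected[lo:hi]


# Made-up entries, for illustration only.
database = [
    HypotheticalPSM("psm_a", 1000.50, "Streptomyces"),
    HypotheticalPSM("psm_b", 1000.51, "Bacillus"),
    HypotheticalPSM("psm_c", 1200.75, "Streptomyces"),
]
index = build_index(database, {"Streptomyces"})
hits = candidates(index, 1000.505)
```

Sorting once per taxon selection and answering each spectrum by binary search is one simple way to keep the per-spectrum cost low; the indexing scheme actually used by the project is not specified here.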

NRPminer (in collaboration with UCSD and CMU): GitHub project page & GNPS web service (you need to log in first!)
HypoNPAtlas/seq2ripp (in collaboration with CMU): GitHub project page & web service (alpha version)

Gurevich, Kunyavskaya. NPvis: interactive visualizer for MS/MS fragmentation of natural products. 2021 (proceedings of the 28th International Scientific Conference for Undergraduate and Graduate Students and Young Scientists “Lomonosov”)
Behsaz et al. Integrating Genomics and Metabolomics for Scalable Non-Ribosomal Peptide Discovery. Nature Communications, 2021
Sadovskaya, Gurevich. Computational prediction of biological activities of peptidic natural products. BMC Bioinformatics, 2021 (Selected abstracts of Bioinformatics: from Algorithms to Applications 2021 conference)
Kunyavskaya, Mikheenko, Gurevich. NPvis: an Interactive Visualizer of Peptidic Natural Product–MS/MS Matches. Metabolites, 2022
Lee et al. HypoNPAtlas: an atlas of hypothetical natural products for mass spectrometry database search. Submitted, 2022
Sadovskaya, Gurevich. SMMole: pipeline for searching biological properties of secondary metabolites based on their molecular structures. In preparation, 2022

Conference presentations:
An NPvis poster was presented at the Lomonosov-2021 conference (19–23 April 2021, Moscow/online)
A poster about the bioactivity prediction pipeline (future SMMole) was presented at the BiATA-2021 conference (12–15 July 2021, St. Petersburg/online)
A talk on SMMole was presented at the XXVIII Symposium on Bioinformatics and Computer-Aided Drug Discovery (24–26 May 2022, Moscow/online)
Another talk related to SMMole will be presented at the 13th International Multiconference on “Bioinformatics of Genome Regulation and Structure/Systems Biology” (04–08 July 2022, Novosibirsk/online)