In silico identification of plant secondary metabolites

Development of computational methods for mass spectrometry-based identification of plant secondary metabolites

RFBR #20-04-01096

Grant PI:
Alexey Gurevich

Azat Tagirdzhanov
Alla Mikheenko
Andrey Prjibelski


Plant specialized metabolism attracts a wide range of researchers, from plant biologists and ecologists to pharmacologists and biomedical scientists. In particular, plant-derived medicines and dietary supplements are in common use today (antibiotics, immunosuppressants, vitamins, and many others). Modern mass spectrometry instruments can rapidly scan thousands of metabolites, producing huge amounts of high-resolution data. Although these data are a gold mine for future discoveries, their interpretation remains a bottleneck and requires appropriate computational methods.

In the proposed project, we will develop the first high-throughput software for in silico (computational) identification of plant specialized metabolites via database search of tandem mass spectra. The research team will extend recently published computational methods (VarQuest, see Gurevich et al., Nature Microbiology, 2018; Dereplicator+, see Mohimani et al., Nature Communications, 2018) from bacterial metabolites to plant metabolites. The software will be tested on real-world datasets of plant specialized metabolites comprising millions of mass spectra obtained from hundreds of species. The reported in silico identifications will be passed to our international scientific partners for experimental validation, including isolation and structural identification of the most interesting molecules annotated as previously unknown.

All proposed computational tools will be created in close cooperation between qualified scientific software developers and their experienced end users, and will be made publicly available both as command-line utilities for batch searches and as convenient web services. We therefore expect them to be in high demand among other groups working on plant specialized metabolism and to serve as a basis for many future studies. Moreover, as part of the project, the software will be applied to large volumes of publicly available data comprising millions of plant-derived mass spectra. This experiment may lead to the identification of previously unknown biologically active compounds and reveal new facts about the evolution of chemical diversity within the plant kingdom.

Developed software:

Cao et al. MolDiscovery: Learning Mass Spectrometry Fragmentation of Small Molecules. bioRxiv 2020.11.28.401943 (preprint)
Tagirdzhanov et al. VarQuest+: modification-tolerant database search of secondary metabolites mass spectra. BMC Bioinformatics 2020, 21(Suppl 20): O1 (BiATA-2020 conference abstract)