VarDiscovery

Description

VarDiscovery (previously named VarQuest+) is a database search tool that identifies novel variants of a wide range of known SMs, including polyketides, alkaloids, flavonoids, saponins, and many others. Algorithmic and software innovations make VarDiscovery much more efficient in running time and memory consumption than existing comparable tools. This efficiency made it possible to implement a modification-tolerant search mode, which is more challenging than a regular database search. The first prototype (VarQuest+) was built on top of Dereplicator+, while the current version (VarDiscovery) uses the more accurate and efficient molDiscovery scoring. The tool will be released as part of the NPDtools package.
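
To illustrate why a modification-tolerant search is harder than a regular one, here is a minimal sketch of the candidate-selection step only. It is not the VarDiscovery algorithm: the `Compound` class, both function names, the tolerance values, and the toy masses are hypothetical and chosen purely for illustration. A regular search keeps only database compounds whose mass matches the precursor mass of the spectrum, while a modification-tolerant search must also consider every compound within a much wider mass window, so far more candidates have to be scored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    """Hypothetical database entry: a name and a mass in Da."""
    name: str
    mass: float


def standard_candidates(precursor_mass, database, tol=0.02):
    """Regular search: keep compounds whose mass matches within tol."""
    return [c for c in database if abs(c.mass - precursor_mass) <= tol]


def modification_tolerant_candidates(precursor_mass, database,
                                     max_shift=300.0, tol=0.02):
    """Modification-tolerant search: keep compounds whose mass differs
    from the precursor by more than tol but at most max_shift, together
    with the mass shift that a modification would have to explain."""
    hits = []
    for c in database:
        shift = precursor_mass - c.mass
        if tol < abs(shift) <= max_shift:
            hits.append((c, shift))
    return hits


# Toy database with made-up masses (assumption, not real compounds).
database = [
    Compound("compound_A", 300.0),
    Compound("compound_B", 450.0),
    Compound("compound_C", 900.0),
]

precursor = 314.0  # hypothetical precursor mass of a query spectrum
exact_hits = standard_candidates(precursor, database)
variant_hits = modification_tolerant_candidates(precursor, database)
```

In this toy case the regular search finds no candidates, while the modification-tolerant search proposes two of the three compounds as possible unmodified forms of the measured molecule. On a real database the candidate set grows in the same way, which is why the efficiency of scoring matters so much in this mode.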

Preliminary results

We benchmarked VarQuest+ on a dataset of Korean medicinal plants (2.5 million mass spectra collected from 337 samples). The standard search of the KNApSAcK database (51,179 plant SMs) identified 349 compounds. The VarQuest+ modification-tolerant search identified 4,253 SMs, an order of magnitude more than Dereplicator+. With the same search parameters, VarQuest+ runs twenty times faster than Dereplicator+ and is four times more memory efficient.

Additional information

The tool is being developed in collaboration with Carnegie Mellon University (PA, USA).

Preliminary results of the project (VarQuest+) were presented at the BiATA-2020 conference. You may watch the talk and read the abstract, published in BMC Bioinformatics, volume 21, article number 567 (2020).

Note: this is an ongoing project, so stay tuned! If you would like to try the pre-release version of VarQuest+ or be notified about the first public release, please write to .

This work is funded by RFBR, project number 20-04-01096.