molDiscovery is a mass spectral database search method that improves both the efficiency and the accuracy of small molecule identification by (i) using an efficient algorithm to generate mass spectrometry fragmentations, and (ii) learning a probabilistic model to match small molecules with their mass spectra. A search of over six million spectra from the Global Natural Products Social Molecular Networking (GNPS) infrastructure shows that our probabilistic model can identify nearly twice as many small molecules as the previous method. molDiscovery can identify a wide range of chemical compounds from various environments, including plant, bacterial, and fungal datasets as well as human blood and gut samples.

How to run

You can try molDiscovery online at the GNPS website (registration is required, but it is quick and simple). A direct link to the molDiscovery workflow is here (you must be logged in to access it). See the documentation for details.

We also provide a command-line version as part of the NPDtools package (version 2.6.0-beta). Installation and running instructions for v.2.6.0 are given in the online manual (also available inside the package). The package is released under the Apache 2.0 License and includes sample mass spectra for a peptide (Surugamide) and a polyketide (Chalcomycin).

Download NPDtools v.2.6.0-beta binaries (for 64-bit Linux or macOS)

Additional information

The tool is developed in collaboration with Carnegie Mellon University (PA, USA).

The paper on molDiscovery has been submitted and is currently under review. In the meantime, you may read and cite its preprint.


For questions, suggestions, or bug reports, please write to .

This work is funded by RFBR, project number 20-04-01096.