hicSPAdes

Description

hicSPAdes is a tool aimed at improving assembly quality using genome-wide chromatin interaction data such as Hi-C. hicSPAdes includes a binning module for extracting metagenome-assembled genomes from a given assembly, a scaffolding module for boosting assembly contiguity, and a misassembly detection module.

Availability

hicSPAdes will soon be available as part of the SPAdes package. For now, you can download, build, and try a pre-release version of the SPAdes package that includes hicSPAdes-binner, the binning module of hicSPAdes.

Support

If you have a problem running hicSPAdes, you can look for a similar issue on our GitHub repository, create a new one, or write to us via e-mail: .

Binning module manual

hicSPAdes-binner is a tool aimed at extracting metagenome-assembled genomes from a metagenomic assembly using a Hi-C sequencing library.

Installation

    tar -xzf hicSPAdes-binner-0.1.tar.gz
    cd hicSPAdes-binner-0.1
    mkdir build && cd build && cmake ../src
    make hicspades-binner

To run hicSPAdes-binner, return to the hicSPAdes-binner-0.1/ directory and execute build/bin/hicspades-binner

Input

The tool takes three mandatory arguments: an assembly graph file in GFA format (with scaffolds included as path lines), a Hi-C library description file in YAML format, and an output directory name.

Synopsis: hicspades-binner <graph (in GFA)> <dataset description (in YAML)> <output directory> [OPTION...]

The options are:

-t, --threads <int> Number of threads to use

-e, --enzymes <string> Comma-separated list of restriction enzyme recognition sites

--tmp-dir <dir name> Scratch directory to use

--min-ctg-len <int> Minimum contig length for binning

--path-links-thr <int> Minimum total number of links between contigs

--edge-links-thr <int> Minimum number of links between long edges

-h, --help Print help message
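
As an illustration of the synopsis and options above, here is a minimal Python sketch that assembles a command line for the binner. The file names, the thread count, the GATC recognition site, and the minimum contig length are placeholder values chosen for the example, not recommended settings, and build_binner_command is a hypothetical helper rather than part of hicSPAdes.

```python
from typing import List, Optional


def build_binner_command(
    graph: str,
    dataset: str,
    output_dir: str,
    binary: str = "build/bin/hicspades-binner",
    threads: Optional[int] = None,
    enzymes: Optional[List[str]] = None,
    min_ctg_len: Optional[int] = None,
) -> List[str]:
    """Assemble an argument list: three positional arguments, then options."""
    cmd = [binary, graph, dataset, output_dir]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if enzymes:
        # --enzymes expects one comma-separated string of recognition sites
        cmd += ["--enzymes", ",".join(enzymes)]
    if min_ctg_len is not None:
        cmd += ["--min-ctg-len", str(min_ctg_len)]
    return cmd


if __name__ == "__main__":
    # All values below are hypothetical and only show the shape of a call.
    cmd = build_binner_command(
        "assembly_graph_with_scaffolds.gfa",
        "hic_dataset.yaml",
        "binning_out",
        threads=16,
        enzymes=["GATC"],
        min_ctg_len=1000,
    )
    print(" ".join(cmd))
    # To launch the tool, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list rather than a single string avoids shell quoting problems when paths contain spaces.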

Specifying input data with YAML data set file

hicSPAdes-binner currently supports a single Hi-C library described in a YAML file. For example, if your Hi-C library is split into two pairs of files

    lib_hic_left_1.fastq
    lib_hic_right_1.fastq
    lib_hic_left_2.fastq
    lib_hic_right_2.fastq

the YAML file should look like this:

    [
      {
        orientation: "fr",
        type: "hic",
        right reads: [
          "/FULL_PATH_TO_DATASET/lib_hic_right_1.fastq",
          "/FULL_PATH_TO_DATASET/lib_hic_right_2.fastq" 
        ],
        left reads: [
          "/FULL_PATH_TO_DATASET/lib_hic_left_1.fastq",
          "/FULL_PATH_TO_DATASET/lib_hic_left_2.fastq" 
        ]
      }
    ]
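
When a library is split across many file pairs, the description file can be generated rather than written by hand. The following is a minimal Python sketch that reproduces the layout of the example above. The helper name hic_dataset_yaml is hypothetical, the requirement that left and right lists have equal length is an assumption about paired files, and paths are assumed to contain no double quotes.

```python
from typing import Sequence


def hic_dataset_yaml(left_reads: Sequence[str], right_reads: Sequence[str]) -> str:
    """Render a single-library Hi-C description in the layout shown above."""
    if len(left_reads) != len(right_reads):
        # Assumption: every left-read file has a matching right-read file.
        raise ValueError("left and right read lists must have the same length")

    def path_list(paths: Sequence[str]) -> str:
        # One quoted path per line, comma-separated, indented as in the example
        return ",\n".join(f'      "{p}"' for p in paths)

    return (
        "[\n"
        "  {\n"
        '    orientation: "fr",\n'
        '    type: "hic",\n'
        "    right reads: [\n"
        f"{path_list(right_reads)}\n"
        "    ],\n"
        "    left reads: [\n"
        f"{path_list(left_reads)}\n"
        "    ]\n"
        "  }\n"
        "]\n"
    )


if __name__ == "__main__":
    # Placeholder paths taken from the example above.
    prefix = "/FULL_PATH_TO_DATASET"
    left = [f"{prefix}/lib_hic_left_{i}.fastq" for i in (1, 2)]
    right = [f"{prefix}/lib_hic_right_{i}.fastq" for i in (1, 2)]
    print(hic_dataset_yaml(left, right), end="")
```

The returned string can be written to a file and passed as the second positional argument of hicspades-binner.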

Output

hicSPAdes-binner stores all output files in <output_dir>, which is set by the user.

  • <output_dir>/clustering.mcl contains the resulting scaffold clustering in MCL format
  • <output_dir>/clustering.tsv contains the resulting scaffold clustering in TSV format
  • <output_dir>/basic_stats.tsv contains various per-cluster statistics
  • <output_dir>/contact_map.tsv contains hicSPAdes scores between input scaffolds, as well as other scaffold statistics
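
For downstream processing, the clustering can be loaded with a few lines of Python. The sketch below assumes that clustering.mcl follows the usual MCL label output, with one cluster per line and whitespace-separated scaffold names. This layout is an assumption and should be checked against an actual output file. The function names and the scaffold names in the demo are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_mcl_clusters(lines: Iterable[str]) -> List[List[str]]:
    """Read clusters, assuming one cluster per line with whitespace-separated names."""
    clusters = []
    for line in lines:
        members = line.split()
        if members:  # skip blank lines
            clusters.append(members)
    return clusters


def scaffold_to_cluster(clusters: List[List[str]]) -> Dict[str, int]:
    """Map each scaffold name to the zero-based index of its cluster."""
    return {name: idx for idx, members in enumerate(clusters) for name in members}


if __name__ == "__main__":
    # In-memory stand-in for open("<output_dir>/clustering.mcl")
    demo = ["scaffold_1\tscaffold_7\n", "scaffold_2\n"]
    clusters = parse_mcl_clusters(demo)
    for idx, members in enumerate(clusters):
        print(f"bin {idx}: {len(members)} scaffolds")
```

With a real run, pass an open file handle for <output_dir>/clustering.mcl instead of the in-memory list.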