Description
hicSPAdes is a tool aimed at improving assembly quality using genome-wide chromatin interaction data such as Hi-C. hicSPAdes includes a binning module for extracting metagenome-assembled genomes from a given assembly, a scaffolding module for boosting assembly contiguity, and a misassembly detection module.
Availability
hicSPAdes will soon be available as part of the SPAdes package. For now, one can download, build, and try a pre-release version of the SPAdes package that includes hicSPAdes-binner, the binning module of hicSPAdes.
Support
If you have a problem running hicSPAdes, you can look for a similar issue on our GitHub repository, create a new one, or write to us via e-mail: .
Binning module manual
hicSPAdes-binner is a tool aimed at extracting metagenome-assembled genomes from a metagenomic assembly using a Hi-C sequencing library.
Installation
tar -xzf hicSPAdes-binner-0.1.tar.gz
cd hicSPAdes-binner-0.1
mkdir build && cd build && cmake ../src
make hicspades-binner
Now, to run hicSPAdes-binner, move to the folder hicSPAdes-binner-0.1/ and execute build/bin/hicspades-binner
Input
The tool takes three mandatory arguments: an assembly graph file in GFA format (with scaffolds included as path lines), a Hi-C library description file in YAML format, and an output directory name.
Synopsis: hicspades-binner <graph (in GFA)> <dataset description (in YAML)> <output directory> [OPTION...]
The options are:
-t, --threads <int>
Number of threads to use
-e, --enzymes <string>
Comma-separated string of restriction enzyme recognition sites
--tmp-dir <dir name>
Scratch directory to use
--min-ctg-len <int>
Minimum contig length for binning
--path-links-thr <int>
Minimum total number of links between contigs
--edge-links-thr <int>
Minimum number of links between long edges
-h, --help
Print help message
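The synopsis and options above can be turned into a scripted invocation. The following is a minimal Python sketch, not part of hicSPAdes itself: the helper names `build_binner_command` and `run_binner`, the input file names, and the enzyme sites in the demo are hypothetical placeholders. Only the binary path and the flags listed above are taken from this manual.

```python
import subprocess
from typing import List, Optional


def build_binner_command(
    graph_gfa: str,
    dataset_yaml: str,
    output_dir: str,
    threads: Optional[int] = None,
    enzymes: Optional[List[str]] = None,
    min_ctg_len: Optional[int] = None,
    binary: str = "build/bin/hicspades-binner",
) -> List[str]:
    """Assemble the argument list in the order given by the synopsis.

    Optional flags are added only when a value is supplied, so the
    tool's own defaults apply otherwise.
    """
    cmd = [binary, graph_gfa, dataset_yaml, output_dir]
    if threads is not None:
        cmd += ["--threads", str(threads)]
    if enzymes:
        # --enzymes expects one comma-separated string of recognition sites.
        cmd += ["--enzymes", ",".join(enzymes)]
    if min_ctg_len is not None:
        cmd += ["--min-ctg-len", str(min_ctg_len)]
    return cmd


def run_binner(cmd: List[str]) -> None:
    """Run the assembled command and raise if it exits with an error."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical file names and enzyme sites; replace with your own.
    command = build_binner_command(
        "assembly_graph.gfa",
        "hic_dataset.yaml",
        "binning_out",
        threads=8,
        enzymes=["GATC", "AAGCTT"],
    )
    # Only print the command here; call run_binner(command) to execute it.
    print(" ".join(command))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when paths contain spaces.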
Specifying input data with a YAML dataset file
hicSPAdes-binner currently supports a single Hi-C library described in a YAML file. For example, if your Hi-C library is split into two pairs of files
lib_hic_left_1.fastq
lib_hic_right_1.fastq
lib_hic_left_2.fastq
lib_hic_right_2.fastq
the YAML file should look like this:
[
  {
    orientation: "fr",
    type: "hic",
    right reads: [
      "/FULL_PATH_TO_DATASET/lib_hic_right_1.fastq",
      "/FULL_PATH_TO_DATASET/lib_hic_right_2.fastq"
    ],
    left reads: [
      "/FULL_PATH_TO_DATASET/lib_hic_left_1.fastq",
      "/FULL_PATH_TO_DATASET/lib_hic_left_2.fastq"
    ]
  }
]
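For libraries split across many files, the description file can be generated instead of written by hand. The sketch below is one possible way to do this in Python and rests on two assumptions: the left and right file lists are given in matching order with equal length, and the dataset parser accepts JSON-style output with quoted keys (JSON is a subset of YAML 1.2, but this has not been checked against hicSPAdes-binner). The function name `hic_dataset_yaml` and the output file name are hypothetical.

```python
import json
import os
from typing import List


def hic_dataset_yaml(left_reads: List[str], right_reads: List[str]) -> str:
    """Return a single-library Hi-C dataset description as text.

    Paths are converted to absolute paths, matching the full paths
    used in the example above.
    """
    if len(left_reads) != len(right_reads):
        # Assumption: every left file has exactly one right mate file.
        raise ValueError("left and right read lists must have equal length")
    library = {
        "orientation": "fr",
        "type": "hic",
        "right reads": [os.path.abspath(p) for p in right_reads],
        "left reads": [os.path.abspath(p) for p in left_reads],
    }
    # The file holds a list with one library, since only a single
    # Hi-C library is currently supported.
    return json.dumps([library], indent=2) + "\n"


if __name__ == "__main__":
    text = hic_dataset_yaml(
        ["lib_hic_left_1.fastq", "lib_hic_left_2.fastq"],
        ["lib_hic_right_1.fastq", "lib_hic_right_2.fastq"],
    )
    # Print instead of writing, so the demo has no side effects.
    # To save it: open("hic_dataset.yaml", "w").write(text)
    print(text)
```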
Output
hicSPAdes-binner stores all output files in <output_dir>, which is set by the user.
<output_dir>/clustering.mcl contains the resulting scaffold clustering in MCL format
<output_dir>/clustering.tsv contains the resulting scaffold clustering in TSV format
<output_dir>/basic_stats.tsv contains various per-cluster statistics
<output_dir>/contact_map.tsv contains hicSPAdes scores between input scaffolds, as well as other scaffold statistics
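In a pipeline it can be useful to confirm that a run produced all four files before passing them on. The following Python sketch only checks for the file names listed in this section; the helper name `missing_outputs` is hypothetical, and the sketch makes no assumption about the contents or column layout of the files.

```python
import os
import tempfile
from typing import List

# Output file names as listed in this manual.
EXPECTED_OUTPUTS = (
    "clustering.mcl",
    "clustering.tsv",
    "basic_stats.tsv",
    "contact_map.tsv",
)


def missing_outputs(output_dir: str) -> List[str]:
    """Return the expected output files that are absent from output_dir."""
    return [
        name
        for name in EXPECTED_OUTPUTS
        if not os.path.isfile(os.path.join(output_dir, name))
    ]


if __name__ == "__main__":
    # Demonstration on a temporary directory holding two of the four files.
    with tempfile.TemporaryDirectory() as out_dir:
        for name in ("clustering.mcl", "clustering.tsv"):
            open(os.path.join(out_dir, name), "w").close()
        print(missing_outputs(out_dir))
```

An empty list means every expected file is present.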