Illumina has recently introduced TruSeq Synthetic Long-Read (TSLR) technology, which produces virtual long reads (up to 10 kb in length) from barcoded pools of short reads and promises to reduce sequencing costs compared with Single Molecule Real-Time (SMRT) technology. TSLR is based on fragmenting genomic DNA into large segments (~10 kb long) and then forming random pools of the resulting segments (each pool contains ~300 segments). The segments in each pool are clonally amplified, sheared, tagged with a barcode that is unique to the pool, and sequenced as standard Illumina short reads. All short reads originating from the same barcoded pool are assembled together, yielding a set of long contiguous sequences (contigs).
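To make the data flow concrete, the following is a minimal Python simulation of the pooling and barcoding steps, under strong simplifying assumptions: segments are drawn at uniformly random positions of a single random genome, clonal amplification and sequencing errors are ignored, and shearing is modeled as tiling each segment with fixed-length overlapping reads. The function names (`make_pools`, `shear_pool`, `group_by_barcode`), the barcode labels `BC0000`, `BC0001`, ..., the read length of 150 bp, and the step of 100 bp are hypothetical illustration choices, not part of the TSLR protocol. The key point it shows is that reads arrive mixed, and the barcode alone lets them be partitioned back into per-pool groups, each of which would then be handed to a short-read assembler (not shown).

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

Segment = Tuple[int, int]  # half-open genomic interval [start, end)


def make_pools(genome_len: int, n_pools: int, segments_per_pool: int = 300,
               segment_len: int = 10_000, seed: int = 0) -> Dict[str, List[Segment]]:
    """Draw ~10 kb segments at random positions and give each pool a barcode.

    Simplification: segments are sampled uniformly; real fragmentation is noisier.
    """
    rng = random.Random(seed)
    pools: Dict[str, List[Segment]] = {}
    for p in range(n_pools):
        barcode = f"BC{p:04d}"  # hypothetical barcode naming scheme
        starts = [rng.randrange(0, genome_len - segment_len + 1)
                  for _ in range(segments_per_pool)]
        pools[barcode] = [(s, s + segment_len) for s in starts]
    return pools


def shear_pool(barcode: str, segments: List[Segment], genome: str,
               read_len: int = 150, step: int = 100) -> List[Tuple[str, str]]:
    """Cut every segment of a pool into overlapping short reads tagged with the pool barcode.

    Assumption: fixed-length reads at a fixed step stand in for random shearing.
    """
    reads = []
    for start, end in segments:
        for pos in range(start, end - read_len + 1, step):
            reads.append((barcode, genome[pos:pos + read_len]))
    return reads


def group_by_barcode(reads: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Partition mixed sequencer output into per-pool read sets; each set is assembled separately."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for barcode, seq in reads:
        groups[barcode].append(seq)
    return dict(groups)


if __name__ == "__main__":
    rng = random.Random(1)
    genome = "".join(rng.choice("ACGT") for _ in range(200_000))
    pools = make_pools(len(genome), n_pools=3, segments_per_pool=5)
    reads = [r for bc, segs in pools.items() for r in shear_pool(bc, segs, genome)]
    random.Random(2).shuffle(reads)  # sequencer output is not ordered by pool
    groups = group_by_barcode(reads)
    for bc, seqs in sorted(groups.items()):
        print(bc, len(seqs), "reads")
```

With the toy parameters above, each 10 kb segment yields 99 reads, so every pool of 5 segments ends up with 495 reads. The small, randomly chosen set of segments per pool is what makes assembling each barcode group independently far simpler than assembling all reads at once.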
The unique sequencing pipeline of TSLR technology raises a number of computational challenges. This project is devoted to the development of algorithms for efficient analysis of TSLR data, including: 1) barcode assembly, 2) metagenome assembly from TSLRs, and 3) structural variation detection in the human genome using TSLRs.