lucas roguski

projects repository

May 7th, 2013

FQBZ is a simple application designed for compressing DNA sequencing data stored in FASTQ format.

The amount of genomic data produced recently by institutes has been growing rapidly, which makes efficient compression of genomic files a challenging and interesting task. FASTQ is one of the most popular text file formats for holding information about DNA sequence reads; each FASTQ record consists of a tag, a sequence and a quality field.
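To make the record layout concrete, here is a minimal Python sketch of a FASTQ reader. It is only an illustration and not part of FQBZ: the names `FastqRecord` and `read_fastq` are hypothetical, and it assumes the common four-line layout (tag line starting with `@`, sequence, separator line starting with `+`, quality), with no line-wrapped sequences.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    """One FASTQ record (hypothetical helper type, not from FQBZ)."""
    tag: str
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ text stream."""
    while True:
        tag = handle.readline().rstrip("\n")
        if not tag:
            return  # end of input (or a blank line), stop reading
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not tag.startswith("@") or not separator.startswith("+"):
            raise ValueError("malformed FASTQ record near tag: %r" % tag)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ: %r" % tag)
        # Store the tag without its leading '@' marker.
        yield FastqRecord(tag[1:], sequence, quality)


sample = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!#I\n"
records = list(read_fastq(io.StringIO(sample)))
```

Each quality character corresponds to one base of the sequence, which is why the sketch checks that both fields have the same length.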

FQBZ is a small and simple proof-of-concept application. It compresses FASTQ files by splitting the data into three separate streams, each of which is then compressed on its own with Bzip2 to exploit the local similarities within the data. The source files can be downloaded here.
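The stream-splitting idea can be sketched in a few lines of Python using the standard `bz2` module. This is not FQBZ's actual code or container format: the function names and the in-memory dictionary of compressed blobs are assumptions made for illustration, as is the choice of tag, sequence and quality as the three streams. The sketch also assumes four-line records and discards the `+` separator line, so a file that repeats the tag after `+` would not round-trip exactly.

```python
import bz2
from typing import Dict

STREAM_NAMES = ("tag", "sequence", "quality")


def compress_fastq(text: str) -> Dict[str, bytes]:
    """Split FASTQ text into three streams and bzip2 each one separately."""
    lines = text.splitlines()
    if not lines or len(lines) % 4 != 0:
        raise ValueError("expected a non-empty input of four-line records")
    streams = {
        "tag": lines[0::4],       # '@...' header lines
        "sequence": lines[1::4],  # base calls
        "quality": lines[3::4],   # quality strings (the '+' line is skipped)
    }
    return {
        name: bz2.compress("\n".join(values).encode("ascii"))
        for name, values in streams.items()
    }


def decompress_fastq(blobs: Dict[str, bytes]) -> str:
    """Rebuild FASTQ text by interleaving the three decompressed streams."""
    columns = [
        bz2.decompress(blobs[name]).decode("ascii").split("\n")
        for name in STREAM_NAMES
    ]
    out = []
    for tag, sequence, quality in zip(*columns):
        out.extend([tag, sequence, "+", quality])
    return "\n".join(out) + "\n"


sample = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!#I\n"
blobs = compress_fastq(sample)
restored = decompress_fastq(blobs)
```

Grouping like with like is the point of the design: tags tend to share long prefixes, sequences use a tiny alphabet, and quality strings have their own statistics, so a general-purpose compressor usually does better on each stream alone than on the interleaved file. On an input as small as the sample above the Bzip2 header overhead dominates, so any gain only shows on realistically sized files.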
