Flak 1.0 - Ultra-fast Fuzzy Whole-genome Alignment

 

Overview

FLAK (Fuzzy Logic Analysis of k-mers) is a software system that performs a fast, approximate whole-genome comparison of two DNA sequences and lets users apply fuzzy operations to a finished alignment in a visual and intuitive way. In contrast with existing genome alignment systems, which either rely on exact-matching suffix tree data structures or approximate matches using the BLAST-like seed-and-extend model, FLAK has a built-in, native mechanism for approximate sequence matching. The kernel of the FLAK system is an optimised fuzzy hash map that enables a genome to be searched in O(1) time on average. FLAK is written in Java and uses a 2-bit encoding to pack each 32-mer substring of a genome into a single 64-bit (8-byte) primitive. Representing DNA sequence information as bit vectors greatly reduces the memory footprint of the system and enables FLAK to scale from small prokaryotic genomes (<5 Mbp) to large mammalian chromosomes or genomes (>3 Gbp). Moreover, operating on these bit vectors with low-level binary shift operations further reduces the running time of genome alignment.
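To illustrate the 2-bit encoding described above, the following is a minimal sketch of how a k-mer of up to 32 bases can be packed into a single Java long and advanced along a genome with a shift. The class name KmerCodec, its method names and the base-to-bits mapping (A=00, C=01, G=10, T=11) are assumptions made for illustration and are not taken from the FLAK source code.

```java
// Illustrative sketch only: names and the A/C/G/T bit assignment are assumptions.
final class KmerCodec {

    // Map one unambiguous DNA base to its 2-bit code.
    static int code(char base) {
        switch (base) {
            case 'A': case 'a': return 0;
            case 'C': case 'c': return 1;
            case 'G': case 'g': return 2;
            case 'T': case 't': return 3;
            default:
                throw new IllegalArgumentException("Not an unambiguous DNA base: " + base);
        }
    }

    // Pack up to 32 bases into one 64-bit long, first base in the highest used bits.
    static long encode(CharSequence kmer) {
        if (kmer.length() > 32) {
            throw new IllegalArgumentException("A long holds at most 32 bases");
        }
        long bits = 0L;
        for (int i = 0; i < kmer.length(); i++) {
            bits = (bits << 2) | code(kmer.charAt(i));
        }
        return bits;
    }

    // Unpack the lowest 2*k bits back into a string of k bases.
    static String decode(long bits, int k) {
        char[] out = new char[k];
        for (int i = k - 1; i >= 0; i--) {
            out[i] = "ACGT".charAt((int) (bits & 3L));
            bits >>>= 2;
        }
        return new String(out);
    }

    // Slide the window one base to the right: drop the oldest base, append the next.
    static long slide(long bits, char next, int k) {
        long mask = (k == 32) ? -1L : (1L << (2 * k)) - 1;
        return ((bits << 2) | code(next)) & mask;
    }
}
```

With this mapping, KmerCodec.encode("ACGT") yields 27 (binary 00011011), and KmerCodec.slide(27L, 'A', 4) yields the same value as KmerCodec.encode("CGTA"). Sliding costs one shift, one OR and one AND per base, which is the kind of low-level operation the paragraph above refers to.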

 

 

Why Use FLAK?

Because FLAK asks a different kind of question. FLAK alignments are based on fuzzy logic and fuzzy sets and can therefore accommodate vagueness. In the context of biological sequence alignment, the vagueness in question is the degree of homology between two sequences. Because FLAK is designed around fuzzy sets and fuzzy logic, it is fundamentally different to all existing genome aligners: it works with degrees of fuzzy set membership and therefore with degrees of homology. All existing whole-genome aligners are based on bivalent Boolean logic and, at an abstract level, ask the question "Are the sequences homologous?". In contrast, FLAK is designed to answer the question "How homologous are the sequences?". Reformulating the alignment problem in this manner enables users of FLAK to perform analyses that are not possible using conventional models.
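The difference between the two questions can be sketched in a few lines of Java. The example below assumes that the degree of homology between two 2-bit packed k-mers is measured as the fraction of identical bases; this membership function, the class name FuzzyKmer and its methods are hypothetical and chosen for illustration, since this page does not state how FLAK computes membership.

```java
// Illustrative sketch only: the membership function is an assumption, not FLAK's.
final class FuzzyKmer {

    // Selects the low bit of every 2-bit base slot in a 64-bit word.
    private static final long LOW_BITS = 0x5555555555555555L;

    // Pack up to 32 bases, 2 bits each (A=0, C=1, G=2, T=3; assumed mapping).
    static long pack(String kmer) {
        long bits = 0L;
        for (int i = 0; i < kmer.length(); i++) {
            int c = "ACGT".indexOf(Character.toUpperCase(kmer.charAt(i)));
            if (c < 0) {
                throw new IllegalArgumentException("Not a DNA base: " + kmer.charAt(i));
            }
            bits = (bits << 2) | c;
        }
        return bits;
    }

    // Count the base positions at which two packed k-mers differ.
    static int mismatches(long a, long b) {
        long diff = a ^ b;                               // non-zero slot = differing base
        long perBase = (diff | (diff >>> 1)) & LOW_BITS; // fold each slot into its low bit
        return Long.bitCount(perBase);
    }

    // Boolean question: are the k-mers identical?
    static boolean crispMatch(long a, long b) {
        return a == b;
    }

    // Fuzzy question: how similar are they, as a membership degree in [0, 1]?
    static double membership(long a, long b, int k) {
        return (k - mismatches(a, b)) / (double) k;
    }
}
```

For the 8-mers ACGTACGT and ACGTACGA, crispMatch returns false, while membership returns 0.875 because seven of the eight bases agree. A user-defined threshold can then be applied to the membership value rather than to a yes/no answer.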

 

What You See is What You Get

FLAK is designed to be a simple, graphical, wizard-based system for configuring, executing and analysing a whole-genome alignment. From a usability perspective, the software allows for the what-you-see-is-what-you-get (WYSIWYG) viewing and extraction of alignment information using a toolbar of fuzzy options and filters. Users can employ a wizard to customise alignment parameters and may filter or modify the visualisation of an alignment using a range of GUI options.

 

Features

  • Ultra-Fast Whole-Genome Alignment: FLAK's running times outperform all existing whole-genome aligners across a range of genome sizes and types, including large repeat-rich sequences. FLAK can align the genomes of E. coli K12 MG1655 (4.7 Mbp) vs E. coli 536 (5.0 Mbp) in 2 seconds, H. sapiens Chromosome X (158.29 Mbp) vs M. musculus Chromosome X (169.27 Mbp) in 169 seconds and P. abelii Chromosome 1 (264.71 Mbp) vs H. sapiens Chromosome 1 (231.11 Mbp) in 231 seconds.
  • Low Memory Requirements: FLAK is designed to consume as little memory as possible and uses bit encoding and flyweights to reduce the memory footprint of a large-scale alignment. As a result, FLAK can compare full primate genomes (>3.2 Gbp) on a computer with 16 GB of RAM.
  • Native Approximate Sequence Alignment: FLAK supports native approximate k-mer matching and can identify approximate matches above a user-defined threshold. This allows FLAK to provide a more detailed analysis of a genome alignment and to identify putative homologous regions of genomes that existing approaches often miss.
  • Custom Alignment Seeds: FLAK allows an alignment to be parametrised with any type of seed of length ≤32. This includes both consecutive seeds and spaced seeds.
  • Fuzzy Operators and Hedges: FLAK enables the application of fuzzy operators to genome alignments. These include the basic fuzzy operators and a set of hedges to modify the fuzzy set of alignments. Once an alignment has been completed, these fuzzy logic operations can be applied to the alignment data in real time, with the system providing a visual representation of the modified data and also allowing the modified data to be saved in numeric or visual forms.
  • Alignment Visualisation: FLAK provides users with a simple, graphical, wizard-based system for configuring, executing and analysing a whole-genome alignment. From a usability perspective, the software allows for WYSIWYG (What-You-See-Is-What-You-Get) viewing and extraction of alignment information using a toolbar of fuzzy options.
  • Alignment Filtering: FLAK supports the post hoc filtering of alignments based on a minimum alignment length criterion. Filtered alignments are visualised and output, enabling users to experiment visually with a revocable filtering operation to search for extended areas of homology between two genomes.
  • Custom Control of Reference Overlaps: By default, FLAK builds an alignment database from the set of contiguous non-overlapping 32-mers in a reference genome. To increase the sensitivity of comparison, users can specify a degree of overlap between the 32-mers extracted from the reference genome.
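As an illustration of the fuzzy operators and hedges mentioned in the feature list, the sketch below uses the classical Zadeh definitions: AND as minimum, OR as maximum, NOT as complement, the hedge "very" as squaring and the hedge "somewhat" as the square root. Whether FLAK uses exactly these definitions is an assumption, and the class FuzzyOps and its members are hypothetical names.

```java
import java.util.Arrays;
import java.util.function.DoubleUnaryOperator;

// Illustrative sketch only: classical Zadeh operators, assumed rather than confirmed for FLAK.
final class FuzzyOps {

    // Basic fuzzy operators on membership degrees in [0, 1].
    static double and(double a, double b) { return Math.min(a, b); }
    static double or(double a, double b)  { return Math.max(a, b); }
    static double not(double a)           { return 1.0 - a; }

    // Hedges: "very" concentrates a fuzzy set, "somewhat" dilates it.
    static final DoubleUnaryOperator VERY = m -> m * m;
    static final DoubleUnaryOperator SOMEWHAT = Math::sqrt;

    // Apply a hedge to every membership value, leaving the input array unchanged.
    static double[] apply(double[] memberships, DoubleUnaryOperator hedge) {
        return Arrays.stream(memberships).map(hedge).toArray();
    }
}
```

Applying FuzzyOps.VERY to the memberships {0.25, 1.0} gives {0.0625, 1.0}: exact matches are unaffected while weak matches are pushed towards zero. Because apply returns a new array, the original alignment scores are preserved and the operation can be undone by discarding the result.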

 

What FLAK Doesn't Do...Yet

FLAK is designed to rapidly align two genomes and has been tested for robustness for this purpose only. The following activities and usage are currently not supported by FLAK, but will be available in future releases:
  • Process protein sequences.
  • Compare sequence reads against a genome.
  • Compare contigs against a draft genome.

 

Papers & Citing

If you are using FLAK in an academic environment, please cite the following publication:

Healy, J., 2016. FLAK: Ultra-Fast Fuzzy Whole Genome Alignment. In 10th International Conference on Practical Applications of Computational Biology and Bioinformatics (pp. 123-131). Springer International Publishing.

@inproceedings{healy2016flak,
     title={FLAK: Ultra-Fast Fuzzy Whole Genome Alignment},
     author={Healy, John},
     booktitle={10th International Conference on Practical Applications of Computational Biology \& Bioinformatics},
     pages={123--131},
     year={2016},
     organization={Springer}
}

 

Availability

FLAK is free and can be downloaded from here.

 

 

© 2016.  FLAK (Fuzzy Logic Analysis of k-mers) Version 1.0     