Download FLAK 1.0  
Support FLAK! 
Native Approximate Sequence AlignmentFLAK provides native support for approximate string matching. This capability arises from the exploitation of a fuzzy hash map to store a reference genome, where approximately matching kmer sequences are grouped together into fuzzy sets, enabling an average O(1) time complexity for insertion, deletion and search operations. In classical set theory, membership of a crisp set A of X is defined as a function ƒ_{A}(x), called the characteristic function of A:
Membership of a classical set is therefore predicated on a boolean characteristic function, requiring an exact match to align two subsequences. However, a fuzzy set A of X can be defined by a function μ_{A}(x), called the membership function of A, which spans the continuum of real numbers in the interval 0 . . . 1:
Using this approach, approximately matching kmer subsequences, 32mers in the FLAK system, can be grouped together and accessed rapidly in a hash map. The Fuzzy Membership Function μ_{A}(x) in FLAKThe membership function μ_{A}(x) is implemented using an optimised variation of the Levenshtein distance algorithm. For a query sequence S and a reference sequence T, both of size k, the membership value returned by the algorithm is μ_{A}(x) = 1 − levenshtein(S, T)/ k . Although, in common with other approximate stringmatching algorithms based on dynamic programming, the Levenshtein distance has a space and time complexity of O(n^{2}), for each 32mer search the algorithm is only applied to the fuzzy sets in the matching bucket of a hash map. A table of the possible Levenshtein distances for a 32mer and their corresponding fuzzy set membership values is shown below.
β Cutoff ThresholdFuzzy hash maps implement the μ_{A}(x) membership function with a minimal membership constraint, called a β cutoff threshold, that excludes weak sequences from a fuzzy set. Consequently, each matching kmer in a fuzzy set has a membership value μ_{A}(x)  β < μ_{A}(x) < 1. The β cutoff threshold controls the specificity of an alignment and is chosen by the user of FLAK before comparing two genomes. The β threshold is also used by FLAK to terminate the execution of the Levenshtein distance algorithm when the threshold cannot possibly be reached. This contributes significantly to the overall speed of the FLAK.
