The amount of memory used by FLAK is displayed in the bottom right corner of the main window:
As FLAK is written in Java, it executes within a Java Virtual Machine (JVM). Unlike fully compiled programmes, which can access physical memory (RAM) directly, Java applications are allocated space on the JVM heap. The amount of heap space available depends on the overall memory of your computer, but by default it is constrained within the following bounds:
- Initial Heap Size: The larger of 1/64th of the physical memory (RAM) and a JVM-defined minimum.
- Maximum Heap Size: The smaller of 1/4th of the physical memory (RAM) and 1GB.
A computer with 4GB of memory will provide FLAK with sufficient memory to compare any bacterial genome, or chromosomes of up to 250 million bases.
For very large genomes, such as the 3.2-billion-base genome of H. sapiens, you will need to provide FLAK with additional memory using the following steps:
- Open a command prompt and change to the flak-1.0 installation directory.
- Launch a JVM with additional heap space. The following example provides FLAK with 14GB of heap space.
java -Xmx14G -XX:+UseConcMarkSweepGC -cp ./flak.jar ie.gmit.flak.Flak
The -XX:+UseConcMarkSweepGC argument tells the JVM to use the Concurrent Mark Sweep (CMS) garbage collector. Using CMS results in shorter pauses for memory reclamation and reduces the overall memory overhead of FLAK. CMS was deprecated in JDK 9 and removed in JDK 14, so omit this option when running FLAK on JDK 14 or later. Advanced users with a full Java Development Kit (JDK) installation can also use the jconsole tool to monitor memory.
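The heap figures shown in the status bar and in jconsole can also be read programmatically from any JVM. The sketch below is a hypothetical helper, not FLAK's own code: `HeapMonitor` and its methods are illustrative names, while `Runtime` and `ManagementFactory.getMemoryMXBean()` are standard Java APIs (the latter is the platform bean that jconsole reads over JMX).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

// Hypothetical helper (not part of FLAK): reports JVM heap usage in megabytes.
class HeapMonitor {
    private static final long MB = 1024L * 1024L;

    /** Heap currently in use: memory the JVM has reserved minus what is still free. */
    static long usedMb() {
        Runtime rt = Runtime.getRuntime();
        return (rt.totalMemory() - rt.freeMemory()) / MB;
    }

    /** Upper bound on the heap, i.e. the -Xmx value or the JVM's default maximum. */
    static long maxMb() {
        return Runtime.getRuntime().maxMemory() / MB;
    }

    /** The same heap figures read through the platform MemoryMXBean that jconsole queries. */
    static MemoryUsage heapUsage() {
        return ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
    }

    public static void main(String[] args) {
        System.out.printf("Heap: %d MB used of %d MB max%n", usedMb(), maxMb());
        MemoryUsage u = heapUsage();
        System.out.printf("JMX: committed=%d MB, used=%d MB%n", u.getCommitted() / MB, u.getUsed() / MB);
    }
}
```

Launched with the -Xmx14G option shown above, `maxMb()` should report a value close to 14GB; without it, the JVM's default maximum applies.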
Manual Memory Release
A known issue is the high memory consumption that results from rapidly zooming in on an alignment plot using the magnifying glass button. If FLAK is using an excessive amount of memory after applying the zoom tools, you can free up memory directly (by invoking the garbage collector) using the following button on the toolbar:
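In Java, "invoking the garbage collector" normally means calling `System.gc()`. The sketch below is a hypothetical illustration of what such a toolbar action might do, not FLAK's implementation; `MemoryRelease` and `releaseMemory` are made-up names. Note that `System.gc()` is only a request, which the JVM is free to ignore (for example when started with -XX:+DisableExplicitGC).

```java
// Hypothetical sketch (not FLAK's code) of a "free memory" action.
class MemoryRelease {
    private static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    /** Requests a garbage collection and returns the bytes reclaimed (0 if usage did not drop). */
    static long releaseMemory() {
        long before = usedBytes();
        System.gc(); // a hint to the JVM, not a guarantee
        long after = usedBytes();
        return Math.max(0L, before - after);
    }

    public static void main(String[] args) {
        byte[][] garbage = new byte[64][];
        for (int i = 0; i < garbage.length; i++) {
            garbage[i] = new byte[1 << 20]; // roughly 64 MB in total
        }
        garbage = null; // drop the only reference so the arrays become unreachable
        System.out.printf("Reclaimed about %d MB%n", releaseMemory() / (1024 * 1024));
    }
}
```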
Reducing Memory Overhead
The amount of memory consumed by FLAK is determined primarily by the size of the query and reference genomes and the degree of homology between them. A comparison of highly dissimilar sequences will generate a small number of alignments and will consequently consume little memory. For large, highly homologous or repeat-rich sequences, the memory consumed by FLAK can be reduced by tuning the following parameters:
- Do Not Specify Overlapping k-mers in the Reference Genome
By default, FLAK indexes a reference genome from its underlying set of non-overlapping 32-mers, giving a space complexity of O(G / k), where G is the size of the reference genome and k is the k-mer size (fixed at 32 in FLAK). Using the Alignment Wizard, you can choose to overlap reference 32-mers to increase the sensitivity of an alignment. This increases the space complexity to O(G / (k - i)), where i is the degree of overlap selected in the range [0...31]. For example, a genome of 1Mbps with an overlap of 0 will result in (1,000,000 / (32 - 0)) - (32 - 1) = 31,219 k-mers being stored in the reference database (a fuzzy hash map). Applying an overlap of 16 bases will roughly double the number of k-mers to 62,469, while an overlap of 31 (a full tiling of the genome) will generate 999,969 k-mers. While a high degree of overlap will increase the sensitivity of a genome alignment, it will also increase memory consumption. For large genomes or chromosomes (>50Mbps), use the default overlap setting of 0.
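The arithmetic of the worked example can be reproduced with a few lines of code. This sketch only evaluates the formula stated in the text, stored k-mers = G / (k - i) - (k - 1); it is not taken from FLAK's source, and `KmerEstimate`/`estimate` are hypothetical names. For a full tiling (i = 31) the formula equals G - k + 1, the exact number of 32-mer positions; for smaller overlaps it is a close approximation when G is much larger than k.

```java
// Evaluates the k-mer count formula from the worked example (not FLAK's source):
// stored k-mers = G / (k - i) - (k - 1), with k fixed at 32 and overlap i in [0, 31].
class KmerEstimate {
    static final int K = 32;

    static long estimate(long genomeLength, int overlap) {
        if (overlap < 0 || overlap >= K) {
            throw new IllegalArgumentException("overlap must be in [0, " + (K - 1) + "]");
        }
        long step = K - overlap;              // distance between successive k-mer start positions
        return genomeLength / step - (K - 1); // formula used in the example
    }

    public static void main(String[] args) {
        for (int overlap : new int[] {0, 16, 31}) {
            System.out.printf("overlap %2d -> %d k-mers%n", overlap, estimate(1_000_000L, overlap));
        }
    }
}
```

For a 1Mbps genome this yields 31,219, 62,469 and 999,969 k-mers for overlaps of 0, 16 and 31, matching the figures above.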
- Do Not Use the Default Minimum Match Length
The minimum match length is the smallest aligned sequence size that will be reported by FLAK and is specified in the Alignment Constraints section of the Alignment Wizard. As FLAK operates on 32-mer chunks of the query and reference genomes, specifying the default value of 32 will return all 32-mer matches between the two sequences. While this level of sensitivity may well be desirable, and will not impose any significant computational burden for sequences of <50Mbps, it will generate a large number of matches. For large genomes or chromosomes, increasing the minimum match length is the single most effective way to lower the memory consumption of FLAK. A minimum match length of at least 64 is recommended for sequences >50Mbps.
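To see why a larger minimum match length prunes so many results, consider how 32-mer chunk matches might combine into longer matches. The sketch below assumes (this is not described in the text) that adjacent chunk hits on the same diagonal are merged into one match and that only merged matches of at least `minMatchLength` bases are kept. `MinMatchFilter` and `Match` are hypothetical names, and the input is assumed to be sorted by query position with at most one hit per query position.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not FLAK's implementation): contiguous 32-mer hits on the same
// diagonal are merged into one match; matches shorter than minMatchLength are dropped.
class MinMatchFilter {
    static final class Match {
        final long queryStart, refStart;
        final int length;

        Match(long queryStart, long refStart, int length) {
            this.queryStart = queryStart;
            this.refStart = refStart;
            this.length = length;
        }
    }

    /** hits must be sorted by queryStart; each hit is one exact 32-mer match. */
    static List<Match> filter(List<Match> hits, int minMatchLength) {
        List<Match> result = new ArrayList<>();
        Match current = null;
        for (Match hit : hits) {
            boolean extendsCurrent = current != null
                    && hit.queryStart == current.queryStart + current.length
                    && hit.refStart == current.refStart + current.length; // same diagonal, adjacent
            if (extendsCurrent) {
                current = new Match(current.queryStart, current.refStart, current.length + hit.length);
            } else {
                if (current != null && current.length >= minMatchLength) result.add(current);
                current = hit;
            }
        }
        if (current != null && current.length >= minMatchLength) result.add(current);
        return result;
    }
}
```

With a minimum of 32 every isolated chunk hit survives, whereas a minimum of 64 keeps only runs of at least two adjacent chunks, which is why raising this value removes so many spurious matches.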
- Apply the Repeat Filter
The fuzzy hash map utilised by FLAK enables the rapid identification of repetitive sequences, including both high-fidelity and inexact repeats. For repeat-rich genomes or sequences with a length >50Mbps, selecting the Filter Repeats option in the Alignment Constraints section of the Alignment Wizard reduces the number of candidate matches, resulting in a commensurate reduction in memory consumption.
In addition, applying a repeat filter will greatly reduce the running time of an alignment against a repeat-rich sequence.
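The effect of a repeat filter on candidate matches can be illustrated with a simpler technique than FLAK's fuzzy hash map: counting exact k-mer occurrences in the reference and discarding query k-mers that occur too often. This is a swapped-in illustration, not FLAK's method, and it cannot detect inexact repeats. `RepeatFilter`, `candidateSeeds` and the `maxCopies` threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified stand-in for a repeat filter: exact k-mer frequency counting only.
class RepeatFilter {
    /** Counts every (overlapping) k-mer in the reference. */
    static Map<String, Integer> countKmers(String reference, int k) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i + k <= reference.length(); i++) {
            counts.merge(reference.substring(i, i + k), 1, Integer::sum);
        }
        return counts;
    }

    /** Start positions of non-overlapping query k-mers found at most maxCopies times in the reference. */
    static List<Integer> candidateSeeds(String query, Map<String, Integer> refCounts, int k, int maxCopies) {
        List<Integer> seeds = new ArrayList<>();
        for (int i = 0; i + k <= query.length(); i += k) {
            int copies = refCounts.getOrDefault(query.substring(i, i + k), 0);
            if (copies > 0 && copies <= maxCopies) {
                seeds.add(i); // keep only k-mers that are present but not repetitive
            }
        }
        return seeds;
    }
}
```

Every seed discarded here is a candidate match that never has to be stored or extended, which is where both the memory and the running-time savings come from.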
- Cluster Alignments
FLAK uses a clustering algorithm, based on a patience sort, to identify and discard alignments longer than the minimum match length that do not lie on the main diagonal(s) of the alignment. Selecting this option in the Alignment Constraints section of the Alignment Wizard will reduce the number of reported alignments and enable a crisper and faster rendering of the synteny plot.
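Patience sorting is the classic way to find a longest increasing subsequence (LIS) in O(n log n) time. The sketch below applies the textbook patience-sort LIS to alignment anchors: ordered by query position, it keeps the longest chain whose reference positions also increase (a main diagonal) and discards the rest. FLAK's exact clustering rules are not described here, so treat this as an illustration of the technique; `DiagonalClusterer` and the `{queryStart, refStart}` anchor layout are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Textbook patience-sort LIS over alignment anchors (illustrative, not FLAK's source).
class DiagonalClusterer {
    /** Each anchor is {queryStart, refStart}. Returns the anchors on the longest increasing chain. */
    static List<long[]> mainDiagonal(List<long[]> anchors) {
        List<long[]> sorted = new ArrayList<>(anchors);
        sorted.sort(Comparator.comparingLong((long[] a) -> a[0])); // order by query position
        int n = sorted.size();
        int[] pileTops = new int[n]; // index of the anchor on top of each pile
        int[] previous = new int[n]; // back-pointer to the top of the preceding pile
        int piles = 0;
        for (int i = 0; i < n; i++) {
            long ref = sorted.get(i)[1];
            // Binary search for the leftmost pile whose top refStart is >= ref.
            int lo = 0, hi = piles;
            while (lo < hi) {
                int mid = (lo + hi) >>> 1;
                if (sorted.get(pileTops[mid])[1] < ref) lo = mid + 1; else hi = mid;
            }
            previous[i] = lo > 0 ? pileTops[lo - 1] : -1;
            pileTops[lo] = i;
            if (lo == piles) piles++;
        }
        // Walk back from the top of the last pile to recover the chain.
        List<long[]> chain = new ArrayList<>();
        for (int i = piles > 0 ? pileTops[piles - 1] : -1; i >= 0; i = previous[i]) {
            chain.add(sorted.get(i));
        }
        Collections.reverse(chain);
        return chain;
    }
}
```

For anchors (0,0), (10,10), (20,500), (30,30) and (40,40), the off-diagonal anchor (20,500) is discarded and the remaining four form the main diagonal.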
- Filter Alignments in the Plot Window
At the default zoom level, the synteny plot of a comparison can be obscured if a large number of alignments are detected. FLAK enables the post facto filtering of alignments to improve the clarity of the visualisation.
Using the Filter Alignments tool at the bottom of the plotter window, the set of alignments can be filtered on alignment length.
After the filter is applied, the visualisation of the genome comparison will be updated. If the filtered-out alignments are no longer needed, selecting the Delete Alignments checkbox will delete them.
Depending on the length constraint applied, deleting alignments will reduce memory consumption and also enable FLAK to render the alignment visualisation more quickly.
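The difference between hiding and deleting filtered alignments can be sketched as follows. This is a hypothetical model, not FLAK's API: it assumes the filter keeps alignments at least `minLength` bases long and represents each alignment only by its length for brevity. Hidden alignments still occupy the heap; deleted ones are removed from the backing list and become eligible for garbage collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical model of a plot-window length filter (names are illustrative, not FLAK's API).
class AlignmentLengthFilter {
    /** Alignments held in memory, represented here only by their length in bases. */
    private final List<Integer> alignmentLengths;

    AlignmentLengthFilter(List<Integer> alignmentLengths) {
        this.alignmentLengths = new ArrayList<>(alignmentLengths);
    }

    /**
     * Returns the alignments to draw (length >= minLength). If deleteFiltered is true, the
     * filtered-out alignments are also removed from memory rather than merely hidden.
     */
    List<Integer> apply(int minLength, boolean deleteFiltered) {
        List<Integer> visible = alignmentLengths.stream()
                .filter(len -> len >= minLength)
                .collect(Collectors.toList());
        if (deleteFiltered) {
            alignmentLengths.removeIf(len -> len < minLength);
        }
        return visible;
    }

    int storedCount() {
        return alignmentLengths.size();
    }
}
```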