Step 1: Launch the Application
If you have not yet unpacked the downloaded FLAK compressed file, do so now by double-clicking on the file (flak-1.0.zip / flak-1.0.tar.gz) and extracting the folder flak-1.0 to your home directory. You should see the following files and folders:
- flak.jar: The FLAK application file.
- conf: A folder containing configuration information.
- docs: A folder containing the documentation for FLAK.
- demo: A folder containing a sample query and reference sequence for this quick-start.
Start the application by double-clicking on the file flak.jar.
If the application fails to launch, it is probably for one of the following reasons:
- You do not have execute permissions on the file. Apply execute permission to the file flak.jar (how this is done will depend on your operating system).
- You do not have a Java Virtual Machine (JVM) installed on your computer. Download and install the latest JVM version from https://www.java.com/en/download/manual.jsp.
Mac (OS X) users may be presented with a security dialogue after double-clicking on the file flak.jar.
As the application was not downloaded from the App Store, you will need to manually enable flak.jar by opening your Security and Privacy settings and selecting the "Open Anyway" button.
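The launch prerequisites described in this step (the unpacked files, execute permission on flak.jar and an installed JVM) can be checked with a short script before double-clicking. The following is a minimal sketch and not part of FLAK: the function name check_install is hypothetical, and the folder location assumes you extracted flak-1.0 to your home directory as described above.

```python
import os
import shutil
from pathlib import Path

# Names taken from the file listing in this step.
EXPECTED = ["flak.jar", "conf", "docs", "demo"]


def check_install(folder):
    """Return a list of human-readable problems found in the FLAK folder."""
    folder = Path(folder)
    problems = []
    for name in EXPECTED:
        if not (folder / name).exists():
            problems.append(f"missing: {name}")
    jar = folder / "flak.jar"
    # Execute permission is only meaningful on Unix-like systems.
    if jar.exists() and os.name == "posix" and not os.access(jar, os.X_OK):
        problems.append("flak.jar is not executable")
    # shutil.which returns None when no 'java' launcher is on the PATH.
    if shutil.which("java") is None:
        problems.append("no 'java' command found on PATH")
    return problems


if __name__ == "__main__":
    # Assumed location: flak-1.0 extracted to the home directory.
    for problem in check_install(Path.home() / "flak-1.0"):
        print(problem)
```

If double-clicking still does nothing, running java -jar flak.jar from a terminal inside the flak-1.0 folder is the standard way to start an executable JAR and will usually print the underlying error.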
Step 2: Select a Query Sequence
When the application starts up, you will be presented with a wizard to guide you through the alignment process. The first screen prompts you to select a query sequence in FASTA format. Select the file mycoplasma-genitalium.fasta from the demo directory.
Make sure to check both the "Use Forward Strand" and "Use Reverse Complement" options. Click on the "Next" button to proceed.
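To make the two strand options concrete, the sketch below reads a FASTA record and derives its reverse complement, which is the sequence that the "Use Reverse Complement" option refers to. It is an illustration only and says nothing about how FLAK parses files internally; the function names and the toy record are assumptions.

```python
import io

# Complement table for unambiguous DNA bases; other characters (e.g. N) pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reverse_complement(seq):
    """Complement every base, then reverse the result."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    # Toy record for illustration; substitute open("demo/mycoplasma-genitalium.fasta").
    demo = io.StringIO(">toy\nACGTT\nGGA\n")
    for name, seq in read_fasta(demo):
        print(name, seq, reverse_complement(seq))
```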
Step 3: Select a Reference Sequence
The next wizard screen prompts you to select a reference sequence in FASTA format. Select the file mycoplasma-pneumoniae.fasta from the demo directory.
By default, the reference genome is decomposed into a set of non-overlapping 32-mer sequences, which are loaded into a fuzzy hash map. The degree of overlap between adjacent 32-mers can be controlled using the slider, which permits values from 0 to 31. A value of 31 shifts each 32-mer by a single base, producing a tiling analogous to a de Bruijn graph. For this example, leave the overlap value set to zero. Click on the "Next" button to proceed.
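The effect of the overlap slider can be sketched as follows. This is an illustrative decomposition, not FLAK's implementation: the function name tile_kmers is hypothetical, and dropping a trailing fragment shorter than 32 bases is an assumption.

```python
def tile_kmers(sequence, k=32, overlap=0):
    """Yield (offset, k-mer) pairs; adjacent k-mers share `overlap` bases."""
    if not 0 <= overlap < k:
        raise ValueError("overlap must be between 0 and k - 1")
    step = k - overlap  # overlap 0 -> step 32, overlap 31 -> step 1
    # Assumption: a trailing fragment shorter than k is ignored.
    for start in range(0, len(sequence) - k + 1, step):
        yield start, sequence[start:start + k]


if __name__ == "__main__":
    reference = "ACGT" * 25  # toy 100-base sequence
    print(len(list(tile_kmers(reference, overlap=0))), "k-mers with no overlap")
    print(len(list(tile_kmers(reference, overlap=31))), "k-mers with overlap 31")
```

With an overlap of zero the number of 32-mers is roughly the sequence length divided by 32, whereas an overlap of 31 produces nearly one 32-mer per base, so larger overlap values increase the amount of data that has to be held in memory.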
Step 4: Select a Fuzzy Seed and β-Cutoff Threshold
The wizard will now prompt you to select a fuzzy seed and a β-cutoff threshold. FLAK operates on 32-mer substrings of sequences and performs an approximate string comparison for 32-mers that match a seed pattern. A hash ('#') character is used to specify the indices in a seed that require an exact match. Indices with a '-' equate to "may care" positions. FLAK is pre-configured with a set of well-known consecutive and spaced seeds. The β-cutoff threshold is a fuzzy value used to control the specificity of a seed. Higher values should be used for homologous sequences.
Select the seed called "11/18 PatternHunter. Ma et al. (2002)" and a β-cutoff threshold of 0.70. Click on the "Next" button to proceed.
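The interplay between a spaced seed and the β-cutoff threshold can be illustrated with a small sketch. The seed string below is the published 11/18 PatternHunter pattern written in the '#'/'-' notation described above. Everything else is an assumption made for illustration: the seed is applied at the start of each 32-mer, and the fraction of identical positions stands in for FLAK's fuzzy similarity score, which is not specified here.

```python
# 11/18 PatternHunter seed (Ma et al., 2002): weight 11, length 18.
PATTERNHUNTER = "###-#--#-#--##-###"


def seed_hit(a, b, seed=PATTERNHUNTER):
    """True when a and b agree at every '#' index of the seed."""
    return all(x == y for x, y, s in zip(a, b, seed) if s == "#")


def identity(a, b):
    """Fraction of identical positions (a stand-in for FLAK's fuzzy score)."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fuzzy_match(a, b, beta=0.70, seed=PATTERNHUNTER):
    """Accept a pair only if the seed hits and the score reaches the cutoff."""
    return seed_hit(a, b, seed) and identity(a, b) >= beta


if __name__ == "__main__":
    query = "ACGT" * 8                   # toy 32-mer
    ref = query[:3] + "A" + query[4:]    # mismatch at index 3, a '-' position
    print(seed_hit(query, ref), identity(query, ref), fuzzy_match(query, ref))
```

In this sketch a mismatch under a '-' position leaves the seed hit intact and only lowers the score, while a mismatch under a '#' position rejects the pair outright regardless of the threshold.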
Step 5: Add Alignment Constraints
The penultimate step in configuring an alignment is to specify the constraints to use. The minimum match length is the smallest aligned sequence size that will be reported by FLAK. The default value of 32 is fine for bacterial genomes. For chromosomes or larger genomes, a minimum match length of at least 64 should be used. Note that you will be able to filter out any "noise" induced by too small a value for the minimum match length after the alignment is complete.
FLAK uses a clustering algorithm, based on a patience sort, to identify and discard alignments greater than the minimum match length that do not form part of the main diagonal(s) alignment. If the clustering option is specified, the resultant alignment set and visualisation will be much clearer.
The fuzzy hash map utilised by FLAK enables the rapid identification of repetitive sequences, including both high fidelity and inexact repeats. For repeat-rich genomes or sequences longer than 50 Mbp, selecting the Filter Repeats option reduces the number of candidate matches. This results in a significant reduction in both memory consumption and the time required to align repeat-rich sequences.
Specify a minimum match length of 32, with no clustering or repeat filtering. Click on the "Next" button to proceed.
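The idea behind patience-sort clustering can be sketched with the classic longest-increasing-subsequence construction, which keeps the largest set of matches that rise together in both query and reference coordinates. This is not FLAK's clustering code: the function name main_diagonal is hypothetical, only a single forward-strand diagonal is handled, and FLAK's treatment of multiple diagonals and reverse complement matches is not covered.

```python
from bisect import bisect_left


def main_diagonal(matches):
    """Return the longest chain of (query_start, ref_start) pairs that
    increases strictly in both coordinates, using patience sorting."""
    # Ties on query_start are ordered by descending ref_start so that two
    # matches with the same query position can never share a chain.
    ordered = sorted(matches, key=lambda m: (m[0], -m[1]))
    tails = []      # tails[i]: index of the match ending the best chain of length i + 1
    tail_refs = []  # ref_start of each tail, kept sorted for bisect
    prev = [None] * len(ordered)
    for i, (_, ref) in enumerate(ordered):
        pile = bisect_left(tail_refs, ref)
        prev[i] = tails[pile - 1] if pile > 0 else None
        if pile == len(tails):
            tails.append(i)
            tail_refs.append(ref)
        else:
            tails[pile] = i
            tail_refs[pile] = ref
    # Walk the back-pointers from the end of the longest chain.
    chain = []
    i = tails[-1] if tails else None
    while i is not None:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]


if __name__ == "__main__":
    # Toy matches: two off-diagonal outliers among four collinear hits.
    hits = [(0, 0), (32, 32), (64, 500), (96, 96), (128, 128), (160, 10)]
    print(main_diagonal(hits))
```

Matches that are left out of the returned chain correspond to the off-diagonal alignments that the clustering option is described as discarding.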
Step 6: Start Alignment
The last wizard screen displays a summary of the selections and constraints that form the parameters to the genome alignment. Click on the "Start" button to begin the alignment.
The alignment of M. genitalium against M. pneumoniae produces a visualisation with the query sequence on the Y axis and the reference sequence on the X axis. The visualisation consists of a viewport composed of the following three components:
- A Toolbar at the top of the viewport, with options for printing, exporting, zooming, fuzzy hedges and fuzzy operations.
- The Plot Window in the centre of the viewport that displays a synteny plot of the query and reference sequences.
- Filtering Options at the bottom of the viewport that enable the visualisation to exclude alignments shorter than a user-defined sequence length.
Step 7: Use the Zoom Tools
Use the magnify button to zoom in on the alignment. Horizontal and vertical scrollbars will appear if the size of the plot window is too large to display in the viewport. Use the zoom out button to pan out from the synteny plot. You can add a grid to the plot view by selecting the grid button.
Step 8: Apply Fuzzy Operations
A unique feature of FLAK is the ability to apply fuzzy hedges and operations to an alignment. By default, the No Hedge button is selected. This displays the alignment data without any fuzzy modifications. Select the Apply Very Hedge button. This has the fuzzy meaning of "show very similar alignments" and will eliminate most of the "noise" (single 32-mer matches) from the visualisation. Select the Apply Extremely Hedge and then the Apply Very Very Hedge buttons. These will progressively constrain the alignment data to remove all but the strongest matches.
The effect of the Apply Fuzzy AND, Apply Fuzzy OR and Apply Fuzzy NOT buttons will depend on the type of alignment specified. The AND operation will change the visualisation to reflect the lowest scoring alignment in an alignment chain. The OR operation will alter the visualisation to reflect the highest scoring alignment in an alignment chain. The NOT operation will return the fuzzy inverse of an alignment score and is only applicable where a low β-cutoff threshold is specified.
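The hedges and operations described above follow the usual pattern of fuzzy logic, which the sketch below illustrates with the conventional textbook definitions: hedges raise a membership value in [0, 1] to a power, AND takes the minimum, OR takes the maximum and NOT subtracts from one. The exponents FLAK actually applies are not stated here, so the values 2, 3 and 4 are assumptions, as is the rule that an alignment stays visible while its hedged score remains at or above the β-cutoff.

```python
# Conventional hedge exponents (assumed; FLAK's own values are not documented here).
HEDGE_POWER = {"none": 1, "very": 2, "extremely": 3, "very very": 4}


def apply_hedge(score, hedge="none"):
    """Concentrate a fuzzy score in [0, 1]; stronger hedges shrink it further."""
    return score ** HEDGE_POWER[hedge]


def fuzzy_and(scores):
    """Lowest score in an alignment chain."""
    return min(scores)


def fuzzy_or(scores):
    """Highest score in an alignment chain."""
    return max(scores)


def fuzzy_not(score):
    """Fuzzy inverse of a score."""
    return 1.0 - score


def visible(score, hedge, beta):
    """Assumed display rule: keep alignments whose hedged score reaches beta."""
    return apply_hedge(score, hedge) >= beta


if __name__ == "__main__":
    chain = [0.80, 0.95, 0.70]  # toy scores for one alignment chain
    print(fuzzy_and(chain), fuzzy_or(chain))
    for hedge in HEDGE_POWER:
        print(hedge, visible(0.90, hedge, beta=0.70))
```

Because every score below 1.0 shrinks when raised to a higher power, each successive hedge in this sketch removes more of the weaker matches, which mirrors the progressive narrowing described above.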
In addition to the fuzzy operations, an alignment visualisation can be altered by selecting a larger β-cutoff threshold from the drop-down combo box on the toolbar and clicking on the beta button. This will constrain the visualisation to alignments at or above the selected threshold.
Step 9: Filter Alignment Data
The filter bar at the bottom of the viewport allows the post hoc modification of a sequence comparison by hiding alignments with a length below a user-specified threshold.
Change the value of the field "Filter by Minimum Length (L)" from 32 to 33 and then click on the filter button. You should notice that most of the alignments have disappeared from the synteny plot, i.e. most of the alignments were single 32-mer matches. The filter button will be coloured red, indicating that the visualisation you are looking at has been filtered. Change the value of the field "Filter by Minimum Length (L)" back to 32 and click on the filter button again to display the original set of alignments.
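The behaviour of the length filter can be pictured as a non-destructive view over the alignment set, which is why resetting the value to 32 brings every alignment back. The sketch below is illustrative only; the Alignment record and its field names are hypothetical and do not describe FLAK's internal data model.

```python
from typing import NamedTuple


class Alignment(NamedTuple):
    """Hypothetical record for one alignment."""
    query_start: int
    ref_start: int
    length: int


def filter_by_length(alignments, min_length=32):
    """Return the alignments of at least min_length; the input list is left untouched."""
    return [a for a in alignments if a.length >= min_length]


if __name__ == "__main__":
    # Toy data: two single 32-mer matches and two longer alignments.
    found = [
        Alignment(0, 0, 32),
        Alignment(100, 4000, 32),
        Alignment(200, 200, 64),
        Alignment(300, 300, 96),
    ]
    print(len(filter_by_length(found, 33)), "alignments at L = 33")
    print(len(filter_by_length(found, 32)), "alignments at L = 32")
```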
Step 10: Save Alignment Data
Click on the Save Alignment Data button on the toolbar. You will be presented with a dialogue that provides options for sorting the alignment set and specifying a file name. Click on the "Save Alignments" button. This will create a file called out.txt in the current working directory. Open the file in a text editor and examine the content. The file contains the start and end points of each alignment in the query and reference sequences, along with the length of each alignment. Reverse complement matches are denoted by three asterisks (***) at the end of a line.
Another unique feature of FLAK is that the alignments reported by the system are based on the What-You-See-Is-What-You-Get (WYSIWYG) principle, i.e. FLAK will output a set of alignments consistent with what is currently displayed in the viewport. Consequently, if you have applied a fuzzy hedge or filter, the set of saved alignments will match what is displayed in the plot window.
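If you want to post-process the saved file, a small script can separate forward and reverse complement matches. The exact column layout of out.txt is not described here, so the sketch below deliberately avoids assuming a column order: it collects whatever integers appear on each line and checks for the trailing three asterisks. The function names and the sample lines are hypothetical.

```python
import re


def parse_alignment_line(line):
    """Return (integers found on the line, True if it ends with '***')."""
    text = line.strip()
    reverse = text.endswith("***")
    numbers = [int(n) for n in re.findall(r"\d+", text)]
    return numbers, reverse


def read_alignments(path):
    """Parse every non-blank line of a saved alignment file."""
    with open(path, encoding="utf-8") as handle:
        return [parse_alignment_line(line) for line in handle if line.strip()]


if __name__ == "__main__":
    # Hypothetical sample lines; the real layout of out.txt may differ.
    for sample in ["100 131 2000 2031 32", "400 431 9000 9031 32 ***"]:
        print(parse_alignment_line(sample))
```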
Step 11: Save Alignment Image
Click on the "Save Synteny Plot" button on the toolbar. This action will prompt you with a dialogue for saving the synteny plot in PNG format. Zoom in on the synteny plot using the magnify button and then select the "Save Synteny Plot" option on the toolbar. The resultant saved image will be scaled up to the same size as the synteny plot in the viewport.