Visualization

All the techniques described above share a common need: results must be presented visually in a way that conveys either a clear verdict or the reason why the answer is complex and requires further resolution with other techniques. For example, sequence and genome alignments need to be combined with gene annotation and signature location, and the results of a fingerprinting technique such as MLVA or SNP typing need to show the unknown sample in registration with the closest reference standards. Tools suitable for academic analysis and publication may not be robust enough, or may lack the right features, for heavy-duty forensic application. Much work remains to be done to properly combine algorithms and tools into decision tools with appropriate user interfaces and visualization at each step along the way.
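As an illustration of showing an unknown sample "in registration" with its closest reference standards, the following minimal Python sketch ranks reference MLVA profiles by the number of loci at which they differ from the sample and prints them row by row under the sample. The locus names, strain names, repeat counts, and the choice of a simple mismatch count as the distance are all assumptions made for this example; they are not taken from any particular tool or database.

```python
from typing import Dict, List, Tuple

# A profile maps a locus name to its repeat count (hypothetical data model).
Profile = Dict[str, int]


def mlva_distance(sample: Profile, reference: Profile) -> int:
    """Count the shared loci at which the two profiles differ."""
    shared = sample.keys() & reference.keys()
    if not shared:
        raise ValueError("profiles share no loci")
    return sum(1 for locus in shared if sample[locus] != reference[locus])


def rank_references(sample: Profile,
                    references: Dict[str, Profile]) -> List[Tuple[str, int]]:
    """Return (name, distance) pairs, closest reference first."""
    scored = [(name, mlva_distance(sample, prof))
              for name, prof in references.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


def registration_table(sample: Profile,
                       references: Dict[str, Profile],
                       top: int = 2) -> str:
    """Lay out the sample and its closest references locus by locus.

    Cells that differ from the sample are flagged with '*';
    loci missing from a reference are shown as '?'.
    """
    loci = sorted(sample)
    lines = ["\t".join(["strain", "diff"] + loci)]
    lines.append("\t".join(["UNKNOWN", "-"] + [str(sample[l]) for l in loci]))
    for name, dist in rank_references(sample, references)[:top]:
        ref = references[name]
        cells = []
        for locus in loci:
            if locus not in ref:
                cells.append("?")
            elif ref[locus] == sample[locus]:
                cells.append(str(ref[locus]))
            else:
                cells.append(f"{ref[locus]}*")
        lines.append("\t".join([name, str(dist)] + cells))
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical sample and reference profiles.
    unknown = {"L1": 3, "L2": 7, "L3": 2}
    refs = {
        "refA": {"L1": 3, "L2": 7, "L3": 4},
        "refB": {"L1": 5, "L2": 6, "L3": 2},
        "refC": {"L1": 3, "L2": 7, "L3": 2},
    }
    print(registration_table(unknown, refs))
```

A real decision tool would likely weight loci by their mutation rates and report a confidence measure; the plain mismatch count is used here only to keep the layout idea visible.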

As with alignment algorithms, the development of new viewing and display tools often follows the goals of the research in question. In addition to an interpretable alignment, visualization and browsing tools need to incorporate extra analyses and features, such as database homologies and gene predictions from various sources. The ability to locate repetitive elements, alternate start and splice sites, protein binding sites, and other genomic features can help the biologist in these analyses.

Interactive features are also worth considering, such as the viewing resolution (a static graphic vs. the ability to zoom in and out) and real-time analysis capabilities (e.g., the ability to search specific regions for homologies). Other problems include: (1) how, or whether, to represent syntenic breakpoints (e.g., genome rearrangements), (2) how to display alignments from both strands, (3) how to display multiple alignments, (4) whether only one sequence should serve as the reference for the alignment(s), and (5) how to display contigs if an unfinished genome is used as one (or more) of the entries for the alignment.

A further problem lies with the input of data for the visualization programs, since most of these were developed to work on only one specific file format, tied to the details of the research in question. Much of the work to improve the fledgling field of whole-genome comparison involves the design of new alignment algorithms and the modification or implementation of existing algorithms. These programs have often been coupled to visualization tools that try to make a seamless transition from raw data to interpretable comparisons. There remains much room for improvement, both in long-sequence or whole-genome alignment (and multiple alignment) algorithms and in formatted or processed graphical output that a user can interpret and combine with other analyses.

A visualization tool called SynPlot [46] was developed with the DIALIGN alignment algorithm in mind. SynPlot displays multiple alignments and shows the gaps in each sequence, as well as the nature and positions of conserved regions (based on percent identity in a sliding window) for all sequences. SynPlot can also display the features (exons, introns, repeat elements, and CpG islands) of each sequence. As mentioned above, MUMmer has its own display tool. Not mentioned previously is PIPmaker, a graphical tool from Webb Miller's group for displaying the output of BlastZ and other large-scale alignments [47]. MGA has an option to output an alignment in XML format, which can be converted to HTML using the program mga2html. It is available, along with examples and instructions, on the MGA web site (http://bibiserv.techfak.uni-bielefeld.de/mga/). No visualization tool is available for Vmatch output, as further processing for a specific application is generally required.
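The idea of flagging conserved regions by percent identity in a sliding window can be sketched in a few lines of Python. This is a minimal illustration for two pre-aligned sequences, not SynPlot's actual implementation: the function names, the window size, the threshold, and the rule that a gap never counts as a match are all assumptions made for the example.

```python
from typing import List


def column_matches(a: str, b: str) -> List[int]:
    """Return 1 for each alignment column where both sequences carry
    the same non-gap residue, else 0. Gaps ('-') never count as matches
    (an assumption of this sketch)."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    return [1 if x == y and x != "-" else 0
            for x, y in zip(a.upper(), b.upper())]


def sliding_percent_identity(a: str, b: str, window: int) -> List[float]:
    """Percent identity for every window of `window` alignment columns.

    Element i of the result covers columns i .. i + window - 1.
    """
    matches = column_matches(a, b)
    if window <= 0 or window > len(matches):
        raise ValueError("window must be between 1 and the alignment length")
    total = sum(matches[:window])
    profile = [100.0 * total / window]
    # Slide the window one column at a time, updating the running total.
    for i in range(window, len(matches)):
        total += matches[i] - matches[i - window]
        profile.append(100.0 * total / window)
    return profile


def conserved_windows(profile: List[float], threshold: float) -> List[int]:
    """Start positions of windows whose identity meets the threshold."""
    return [i for i, pct in enumerate(profile) if pct >= threshold]


if __name__ == "__main__":
    # Toy pairwise alignment; '-' marks a gap.
    seq1 = "ACGT-ACGT"
    seq2 = "ACGTTACGA"
    profile = sliding_percent_identity(seq1, seq2, window=4)
    print(profile)
    print(conserved_windows(profile, threshold=80.0))
```

The resulting profile is what a plotting layer would draw as an identity curve along the alignment; extending the sketch to multiple alignments would require choosing a reference sequence or a column-wise consensus measure, which is one of the open display questions raised above.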
