2.1. IOGMA Welcome page
2.2. GenoAnnot Start Center
2.3. GenoAnnot Data Manager
3.1. Import a GenBank file using one of the two ways
3.2. Select GenBank file to import
3.3. Accept the "GBK" prefix and "Sequence" type for this GenBank genomic sequence
3.4. Select "Do nothing" to skip the map creation
3.5. The IMPORT_GENBANK Collection consists of 876 Objects, shown in the Objects panel
4.1. Run the Cut a sequence fragment task using the sequence SEQ_NC_001318_Bobur_B31 as input
4.2. Cut a fragment from base pairs 1 to 100,000
4.3. Add the results of this task to a new Collection
4.4. Properties editor, showing the Information tab
4.5. The Annotations tab shows information imported from the original GenBank file
4.6. Sequence tab shows nucleotide sequence of the CDS object or amino-acid sequence
4.7. Display features associated with the sequence SEQ_NC_001318_Bobur_B31_1_100000 on a map
4.8. Create a new map for the sequence SEQ_NC_001318_Bobur_B31_1_100000
4.9. Map of Borrelia burgdorferi fragment showing CDSs calculated on the six reading frames
4.10. Use this icon to display features associated with a sequence on a map
4.11. Navigation tools allow you to enlarge objects and move around the map
4.12. Draw the GC content and the GC skew in the genomic map
4.13. Use the Circular View to visualize the forward and reverse strands
4.14. Select the type of Object you look for: CDSs
4.15. Choose the criterion Is located on to constrain the search to a particular sequence
4.16. Select the fragment SEQ_NC_001318_Bobur_B31_1_100000 of the Borrelia burgdorferi sequence
4.17. The query has returned 100 results
4.18. Choose predicted CDSs matching the "hypothetical protein" annotation
4.19. Save the query results as a Collection of individual predicted CDSs
4.20. Save the Query structure as a Query object
4.21. The Query structure saved and displayed in the Queries list
4.22. The Advanced mode tab shows the query graphically
4.23. Select the GenoAnnot CDS class GA_FeatCds from the data model
4.24. Query constrained to find only predicted CDSs annotated with "hypothetical protein"
4.25. Make your query more specific with the association GA_IsLocatedOn
4.26. Complete the query by adding the GA_Sequence class
4.27. Constrain the query to a sequence length equal to 100,000 base pairs
4.28. Completed query, in the form Object-Association-Object
4.29. Create a new Collection of individual CDSs from the query result
4.30. Query structure saved as a new Query Collection
4.31. 51 CDSs putatively annotated as "hypothetical protein" in GenBank
4.32. Compute CDSs for the sequence fragment of Borrelia burgdorferi
4.33. Accept the "PRK" prefix for the computed CDSs
4.34. Define the Prokov CDS calculation parameters
4.35. Map the Prokov-calculated CDSs on the existing map
4.36. Compute coding curves
4.37. Define the Prokov curve calculation parameters
4.38. Select "Add to existing map" to display the coding curves
4.39. Map Viewer displays the coding curves in the six reading frames
4.40. Prokov-calculated CDSs selected in the Map Viewer are outlined in bold
4.41. Use the Change selected objects appearance tool to change the background color
4.42. Use the Transparency slider to make the fill color of CDSs transparent
4.43. Map Viewer showing a large number of superimposed GA_FeatCds objects
4.44. Distinguish between CDSs origins
4.45. Compute overlapping features with PRK_CDSs as first input target set
4.46. Input target set of the 'Compute overlapping features' task
4.47. Setting constraints for the Compute overlapping features task
4.48. Results of the Compute overlapping features task
4.49. Verify that the PRK_CDS_000022 is not annotated
4.50. Transfer annotations between homologous CDSs
4.51. Homologies computation parameters
4.52. Analyze the results of the annotation transfer
4.53. Well transferred annotations from the GenBank GBK_CDS_BB0020 to the PRK_CDS_000022 CDS
4.54. Retrieve the CDSs predicted by the Prokov method
4.55. Move to the PRK_CDS_000023 selected CDS
4.56. PRK_CDS_000023 highlighted on the Map Viewer
4.57. PRK_CDS_000024 highlighted on the Map Viewer
4.58. Popup shows superposition of the PRK_CDS_000024 and GBK_CDS_BB0021
4.59. Run BlastX on the User sequence SEQ_NC_001318_Bobur_B31_1_100000
4.60. Accept the "BLX" prefix for BlastX results
4.61. Define BlastX protein databank screening parameters
4.62. A new Collection is automatically created when BlastX has completed
4.63. Add BlastX similarity hits to the genome map
4.64. In the Map Viewer, BlastX hits are displayed as thick purple lines above the CDSs
4.65. BlastX similarity hit BLX_NSIMILARITY_000112 highlighted above the PRK_CDS_000023 object
4.66. BlastX BLX_NSIMILARITY_000112 result shows a QUEA_BACSU protein
4.67. BLX_NSIMILARITY_000113 hit matching the PRK_CDS_000024
4.68. Select the blue CDS GBK_CDS_BB0021 to display its properties
4.69. The GBK_CDS_BB0021 product annotation is the same as the BlastX results
4.70. Run BlastP on selected CDSs on PRK_CDS_000023 predicted by Prokov calculation
4.71. Select BlastP parameters
4.72. BlastP has found a hit for the input CDS
4.73. BlastP results for PRK_CDS_000023
You can use IOGMA® GenoAnnot to read, explore, and manipulate genomic data from private or public databases in nucleic FASTA, EMBL, GenBank, and GFF file formats. GenoAnnot reads genomic sequences directly. It then displays the information associated with the sequences in a series of viewers and maps. It allows you to search for and identify regions of biological interest (such as genes) in a raw DNA sequence. GenoAnnot allows you to examine nucleotide sequences (complete chromosomes and genomic fragments such as contigs) as well as the features associated with these sequences (CDS, rRNA, tRNA, terminators, and others) as recorded in the databases. In addition, GenoAnnot allows you to comprehensively analyze existing, new, or as-yet-unidentified genomic sequences using biocomputing methods, and then re-annotate or newly annotate those sequences directly.
1.1. GenoAnnot help
GenoAnnot has an expanded on-line help function. Look for:
• the Help link on the upper right of the Data Manager, which will open the entire help section;
• the help buttons located throughout the software, which will open the help section pertinent to their location.
1.2. GenoAnnot tutorial
This tutorial is intended to help you explore how to use GenoAnnot. It illustrates how to visualize (that is, map and explore) genomic data, and how to calculate genomic features directly from a DNA sequence. Genomic data generally has one of two origins: either it has already been reported in the literature and submitted to a public database like GenBank or EMBL, or it is experimental data in a raw DNA sequence format. To illustrate how to perform comparative analyses in GenoAnnot using reference data, this tutorial explores the genome sequence of Borrelia burgdorferi and focuses on a fragment of this genome sequence, which was published in 1997 by a group from the TIGR institute (Fraser et al. (1997), Nature 390(6660): 580-586).
1.3. What's new in IOGMA 3.8
Genostar has developed and integrated many exciting new features in Metabolic Pathway Builder 3.8. New methods for data analysis and improvements in reporting, sharing and data transfer features are only part of the picture. With MPB 3.8, you can now easily expand and manage your internal knowledge base.
Metabolic Pathway Builder / MicroB
With decreases in the cost of sequencing, the quantity of data being generated and analyzed is increasing rapidly. We’ve made it easier to centralize and manage your results in a traceable way. In addition to drawing on the wealth of data in MicroB for your comparative analyses, you can now connect, integrate and save your analyzed genomes in your dedicated MicroB, augmenting and consolidating your knowledge base. Your data is easy to access, and you can visualize, explore, share and export it anytime.
Reporting facilities
Metabolic Pathway Builder 3.8 has new click and copy features for your reports and publications. We’ve improved the copy features so that you can copy any kind of object or graphical image from your workspace and paste it directly into your favorite programs as text or in FASTA format.
Genome annotation
Blast databank manager. We have a new graphical Blast databank manager that enables you to easily add, delete and share Blast databanks.
Manual CDS creation. Take advantage of additional ways to manually create CDSs: from specified bounds, along a whole sequence, or from selected features.
Frameshift detection. GenoAnnot has a new frameshift detection task. Frameshifts can be either a result of mutation or a consequence of sequencing misreads. Some frameshifts generate artificial insertions and deletions of nucleotides. This can lead to errors in predicted protein sequences and compromise analysis of the genome sequence. With the new “Detect Frameshifts” task, you can detect putative frameshift locations in a sequence and visualize them on a map, improving the accuracy of the sequence annotations. The task is based on the comparison of the targeted sequence against a protein database.
Comparative analysis
We’ve added a new task in GenoAnnot that allows you to detect single-nucleotide polymorphisms and deletion-insertion polymorphisms (SNPs and DIPs). When polymorphisms occur in coding regions of DNA sequences, they may change the amino acid sequence of the protein that is produced. Such polymorphisms can be useful in understanding how species develop diseases or respond to drugs, pathogens or other agents. Polymorphisms that occur outside coding regions may have consequences for gene splicing, transcription factor binding or the sequence of non-coding RNA. The Compute SNPs and DIPs tool in GenoAnnot compares a sequence or contigs against a closely related reference sequence and detects single-nucleotide and deletion-insertion polymorphisms. The SNPs and DIPs are easily visualized in a table and can be explored graphically using the genomic viewer.
PathwayExplorer also includes a new task that enables you to compute homologies between organisms, taking replicons and plasmids into account.
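To make the distinction between SNPs and DIPs concrete, here is a minimal Python sketch. It is not the algorithm used by the Compute SNPs and DIPs task, which this document does not describe. It assumes the two sequences have already been aligned (gaps written as "-"), and the function name `find_polymorphisms` and its output format are hypothetical, chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical event format: (kind, reference position, reference bases, query bases)
Event = Tuple[str, int, str, str]


def find_polymorphisms(ref_aln: str, qry_aln: str, gap: str = "-") -> List[Event]:
    """Scan two rows of a pairwise alignment and report SNPs and DIPs.

    Positions are 1-based coordinates on the ungapped reference.
    For an insertion, the position is the last reference base before it.
    """
    if len(ref_aln) != len(qry_aln):
        raise ValueError("aligned sequences must have the same length")

    events: List[Event] = []
    ref_pos = 0  # number of reference bases consumed so far
    i, n = 0, len(ref_aln)
    while i < n:
        r, q = ref_aln[i], qry_aln[i]
        if r == gap and q == gap:
            i += 1  # column with gaps in both rows carries no information
        elif r == gap:
            # Bases present only in the query: an insertion (one DIP per gap run)
            j = i
            while j < n and ref_aln[j] == gap and qry_aln[j] != gap:
                j += 1
            events.append(("DIP_insertion", ref_pos, "", qry_aln[i:j]))
            i = j
        elif q == gap:
            # Bases present only in the reference: a deletion (one DIP per gap run)
            j = i
            while j < n and qry_aln[j] == gap and ref_aln[j] != gap:
                j += 1
            events.append(("DIP_deletion", ref_pos + 1, ref_aln[i:j], ""))
            ref_pos += j - i
            i = j
        else:
            ref_pos += 1
            if r.upper() != q.upper():
                events.append(("SNP", ref_pos, r, q))  # single-base substitution
            i += 1
    return events


if __name__ == "__main__":
    for event in find_polymorphisms("ACGT-ACGTAC", "ACCTGAC--AC"):
        print(event)
```

In this sketch a run of consecutive gap columns is reported as a single DIP, so a two-base deletion yields one event rather than two. A real tool would also have to produce the alignment itself and cope with ambiguous bases, which are left out here.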
Multiple alignment
We’ve integrated the MAFFT program into ProteoAnnot for high-speed computation of multiple alignments. This complements the ClustalW and Muscle programs already present.
Data transfer between modules
Data transfer between GenoAnnot and PathwayExplorer has been improved. You can now indicate the type and shape of your replicon.
Documentation
HOW-TO documentation is now available in the on-line help.
Installation
IOGMA provides a new, simple way to specify where your personal files will be stored.
Part 2. Starting IOGMA®
2.1. Welcome page
The first window to open when you launch IOGMA is the Welcome page. Here you can select a module to open. You can also launch IOGMA modules and services (viewers) via the menu in the top-left corner.
Fig. 2.1. IOGMA Welcome page
Click on GenoAnnot in the Welcome page to launch the module. After a few seconds, the Start Center appears with an empty Data Manager. For this tutorial, click Cancel (you can always click Cancel and open a Workspace later using the Data Manager).
The Data Manager in GenoAnnot allows you to organize, consult, modify, save and query data. More formally, the Data Manager allows you to manipulate Objects and Collections of Objects in the Workspace. The Workspace also contains the underlying data model and a set of tasks. The Data Manager consists of different Bars and Panels with Lists.
Start the tutorial by using one of the Data Import Tasks to load the entire genome DNA sequence of Borrelia burgdorferi, containing both the sequence and its annotations. To do this you will run the Import GenBank Task and select a GenBank file previously downloaded from NCBI (http://www.ncbi.nlm.nih.gov). The import task will extract the sequence and its features from the GenBank file, create corresponding Objects and gather them in a new Collection in your Workspace.
3.1. Import a GenBank genomic file
To import the GenBank genomic file into GenoAnnot, use the File menu in the Menu Bar, or the Tasks Panel:
• To load data from the File menu, select File | Import | Read GenBank file(s) from the Menu Bar.
• To run the task from the Tasks Panel, click on the Tasks Tab from the Tool Panel and open the Import_Export folder. Select the Import GenBank task and either right-click on the task and select Run from the menu that appears, or click on the corresponding icon in the Tool Panel.
Fig. 3.1. Import a GenBank file using one of the two ways
• Select the file Bburgdorferi_NC_001318.gbk.
Fig. 3.2. Select GenBank file to import
• An Import Parameters dialog box suggests a default prefix ("GBK") to assign to the Sequence or Contig object of a GenBank file. Keep this prefix and the "Sequence" sequence type selected by default. Click on OK to confirm.
Fig. 3.3. Accept the "GBK" prefix and "Sequence" type for this GenBank genomic sequence
• A progress bar provides feedback on the processing of the task.
• When the task is done, a new dialog appears, informing you about the Import Result. Occasionally the data file contains items that IOGMA cannot classify. These are ignored in the import process. Click OK to accept the import result and continue.
• The next dialog asks whether or not to create a map. We do not want to display the genomic map yet, so select the option "Do Nothing" and click OK.
Fig. 3.4. Select "Do nothing" to skip the map creation
• A last dialog appears, asking you to identify a new or an existing Collection in which the created objects can be organized. Create a new Collection by clicking on OK.