forked from ghyao/GAA
-
Notifications
You must be signed in to change notification settings - Fork 0
maria123701/GAA
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
NAME ==== GAA - Graph Accordance of Assemly VERSION ======= Version 1.0 LICENCE ======= Copyright 2011 - Guohui Yao - Washington University in St. Louis. gyao at wustl dot edu This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA. INTRODUCTION ============ No individual assembly algorithm addresses all the known limitations of assembling short length sequences. Overall reduced sequence contig length is the major problem that challenges the usage of these assemblies. GAA is designed to take advantages of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequence (NGS) assemblies. PREREQUISITES ============= Perl => 5.8.9 Sequence aligners: BLAT GSMapper INSTALLATION ==== The program is tested under LINUX systems. Below command could install the GAA package. export PERL5LIB=$PERL5LIB:path_to_the_gaa_package COMMAND LINE ============ Run gaa.pl without any parameter, it will print below usage on the screen. -.pl --target|-t <target.fa> : target consensus file, the contig names should be in the format of >Contig1.1 --query|-q <query.fa> : query consensus file, the contig names should be in the format of >Contig1.1 The following options are optional. --target-qual|-T <target.qual> : target quality file --query-qual|-Q <query.qual> : query quality file --target_gap|-g <target gap file> : e.g. gap.txt for Newbler --query_gap|-G <query gap file> : e.g. gap.txt for Newbler --match|-m <alignment.psl> : blat target.fa query.fa -fastMap query_vs_target.psl --pair_status_target|r <454PairStatus.txt> : generated from GSMapper --pair_status_query|R <454PairStatus.txt> : generated from GSMapper --ce <ce.psl> : ce status psl file --output-dir|-o <output_dir=pwd> : default is merged_assembly under current directory --tip-size|-p <tip_size=90> : tip tolerance --score-cutoff-lower|-l <score_cutoff_lower=30> : confident unique contig, if score < 30 --score-cutoff-upper|-u <score-cutoff_upper=100> : confident match, if score > 100 --contig-size-cutoff|-s <contig_size_cutoff=100> : keep unique contigs of length > 100 Using -m option would shortcut the alignment step if alignment.psl alread exists. EXAMPLES ======== $ ./gaa.pl -t newbler.fa -q cabog.fa Merge newbler.fa and cabog.fa, and output newbler.fa_n_cabog.fa as the merged assembly to: $ ./gaa.pl -t newbler.fa -q cabog.fa -T newbler.qual -Q cabog.qual Also generate a quality file (newbler.qual_n_cabog.qual) for the merged assembly. $ ./gaa.pl -t newbler.fa -q cabog.fa -t newbler.gap Generate the gap file (g.gap) for merged assembly. $ head target.fa >Contig0.1 AAGgAAGCAAATtAtAAAaCtAaaGCATTAAAGATATACACATAAAAATGAAGTACTAAA TTTATCAAGACAAAAaGCCCTGTCCAAATTGTACTATTGTCCAACTTGTACCACTTTACC >Contig0.2 CCCTAAATTGAGAAAAaTAACTTTAGGGAATATATCAATGTACAGTGCCCGCTTCGTTAT TCGGGAGAGAAATTCGAAAAAGTTAAAATAACATTCCGGAAATGTTAATTTTACCCTGCA >Contig0.3 GTATTGATCCAAAATCGGTGTAAATATTACCCTTTTTAGGTGTATTAGGTGTTAAATGTA TCCTTTTTCATGTTAATTTTACCCTTAATAAGGTGTAAAATTAACATTAAAAAATGTTGA TATATTTTTACACCTAAAAAGTGTTAAATTTCTGAGGAAAAATGTTAATCGCACCCTCTT $ head target.qual >Contig0.1 64 64 64 26 64 64 64 64 64 64 50 64 26 64 16 64 64 64 22 64 39 64 39 39 64 64 55 64 64 64 64 64 64 64 64 64 64 64 64 48 64 64 64 64 64 64 64 49 45 64 64 64 64 64 64 64 64 64 64 32 64 64 64 64 64 64 $ head gap.txt Contig0.1 20 Contig0.2 811 Contig0.3 932 Contig0.4 876 Contig0.5 1927 Contig0.6 1730 Contig0.7 20 Contig1.1 171 Contig1.2 402 Contig1.3 1382 $ head 454PairStatus.txt Template Status Distance Left Accno Left Pos Left Dir Right Accno Right Pos Right Dir Left Distance Right Distance FC3FKPP01BUVA8 TruePair 3967 Contig18.13 24188 - Contig18.13 20221 + FC3FKPP01BTJNG TruePair 3820 Contig23.11 44530 + Contig23.11 48350 - FC3FKPP01AFDGW TruePair 2498 Contig41.19 5103 - Contig41.19 2605 + FC3FKPP01CEJ64 TruePair 2498 Contig41.19 5103 - Contig41.19 2605 + FC3FKPP01E3K7I TruePair 3735 Contig117.3 98721 - Contig117.3 94986 + FC3FKPP01A7AS8 FalsePair - Contig129.5 2438 + Contig129.6 1986 - 276 1986 FC3FKPP01BJX6Q FalsePair - Contig133.3 28972 + Contig140.4 1428 - 5803 1428 FC3FKPP01AWH9A TruePair 2636 Contig40.10 122575 + Contig40.10 125211 - FC3FKPP01DCED4 MultiplyMapped - Contig18.9 1244 - Repeat CONTACT ======= Guohui Yao <gyao at wustl dot edu>
About
GAA is designed to take advantages of different assembly algorithms or sequencing platforms to improve the quality of next-generation sequence (NGS) assemblies.
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published