RealignerTargetCreator

h_asif Posts: 39 Member
edited December 2013 in Ask the GATK team

Dear all, can anybody help me with this error while running RealignerTargetCreator? Many reads in the run failed to pass the filters. Any suggestions as to why?

Run summary

INFO 12:07:39,628 HelpFormatter - -------------------------------------------------------------------------------- 
INFO 12:07:39,631 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.7-2-g6bda569, Compiled 2013/08/28 16:30:29 
INFO 12:07:39,631 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO 12:07:39,631 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO 12:07:39,635 HelpFormatter - Program Args: -T RealignerTargetCreator -R /home/sab/ref/human_hg19.fa -I /home/sab/pipeline/A_sorted.bam -o /home/sab/pipeline/A_sorted.IndelRealigner.intervals 
INFO 12:07:39,636 HelpFormatter - Date/Time: 2013/12/03 12:07:39 
INFO 12:07:39,636 HelpFormatter - -------------------------------------------------------------------------------- 
INFO 12:07:39,636 HelpFormatter - -------------------------------------------------------------------------------- 
INFO 12:07:39,697 GenomeAnalysisEngine - Strictness is SILENT 
INFO 12:07:39,789 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000 
INFO 12:07:39,798 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO 12:07:39,820 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02 
INFO 12:07:39,906 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files 
INFO 12:07:40,400 GenomeAnalysisEngine - Done preparing for traversal 
INFO 12:07:40,400 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] 
INFO 12:07:40,401 ProgressMeter - Location processed.sites runtime per.1M.sites completed total.runtime remaining 
INFO 12:08:10,406 ProgressMeter - chr1:85262337 8.53e+07 30.0 s 0.0 s 2.8% 18.2 m 17.7 m 
INFO 12:08:40,407 ProgressMeter - chr1:180404225 1.80e+08 60.0 s 0.0 s 5.8% 17.2 m 16.2 m 
INFO 12:09:10,409 ProgressMeter - chr2:26489345 2.76e+08 90.0 s 0.0 s 8.9% 16.8 m 15.3 m 
INFO 12:09:40,413 ProgressMeter - chr2:126159101 3.75e+08 120.0 s 0.0 s 12.1% 16.5 m 14.5 m 
INFO 12:10:10,415 ProgressMeter - chr2:226503317 4.76e+08 2.5 m 0.0 s 15.4% 16.3 m 13.8 m 
INFO 12:10:40,417 ProgressMeter - chr3:79985305 5.72e+08 3.0 m 0.0 s 18.5% 16.2 m 13.2 m 
INFO 12:11:10,418 ProgressMeter - chr3:178616501 6.71e+08 3.5 m 0.0 s 21.7% 16.1 m 12.6 m 
INFO 12:11:40,424 ProgressMeter - chr4:82438805 7.73e+08 4.0 m 0.0 s 25.0% 16.0 m 12.0 m 
INFO 12:12:10,426 ProgressMeter - chr4:190799281 8.81e+08 4.5 m 0.0 s 28.5% 15.8 m 11.3 m 
INFO 12:12:40,427 ProgressMeter - chr5:95948005 9.78e+08 5.0 m 0.0 s 31.6% 15.8 m 10.8 m 
INFO 12:13:10,429 ProgressMeter - chr6:6479181 1.07e+09 5.5 m 0.0 s 34.5% 15.9 m 10.4 m 
INFO 12:13:40,430 ProgressMeter - chr6:92259205 1.15e+09 6.0 m 0.0 s 37.3% 16.1 m 10.1 m 
INFO 12:14:10,432 ProgressMeter - chr7:12978229 1.25e+09 6.5 m 0.0 s 40.3% 16.1 m 9.6 m 
INFO 12:14:40,433 ProgressMeter - chr7:102060237 1.34e+09 7.0 m 0.0 s 43.1% 16.2 m 9.2 m 
INFO 12:15:10,435 ProgressMeter - chr8:37320269 1.43e+09 7.5 m 0.0 s 46.2% 16.2 m 8.7 m 
INFO 12:15:40,436 ProgressMeter - chr8:134665297 1.53e+09 8.0 m 0.0 s 49.3% 16.2 m 8.2 m 
INFO 12:16:10,437 ProgressMeter - chr9:94340989 1.63e+09 8.5 m 0.0 s 52.8% 16.1 m 7.6 m 
INFO 12:16:40,439 ProgressMeter - chr10:51925797 1.73e+09 9.0 m 0.0 s 56.0% 16.1 m 7.1 m 
INFO 12:17:10,440 ProgressMeter - chr11:10504845 1.83e+09 9.5 m 0.0 s 59.0% 16.1 m 6.6 m 
INFO 12:17:40,442 ProgressMeter - chr11:102575141 1.92e+09 10.0 m 0.0 s 62.0% 16.1 m 6.1 m 
INFO 12:18:10,443 ProgressMeter - chr12:60381241 2.01e+09 10.5 m 0.0 s 65.0% 16.2 m 5.7 m 
INFO 12:18:40,445 ProgressMeter - chr13:27611641 2.11e+09 11.0 m 0.0 s 68.2% 16.1 m 5.1 m 
INFO 12:19:10,446 ProgressMeter - chr14:15346101 2.20e+09 11.5 m 0.0 s 71.6% 16.1 m 4.6 m 
INFO 12:19:40,456 ProgressMeter - chr15:9565601 2.31e+09 12.0 m 0.0 s 74.8% 16.0 m 4.0 m 
INFO 12:20:10,457 ProgressMeter - chr16:5380553 2.42e+09 12.5 m 0.0 s 78.0% 16.0 m 3.5 m 
INFO 12:20:40,459 ProgressMeter - chr17:7484705 2.51e+09 13.0 m 0.0 s 81.0% 16.0 m 3.0 m 
INFO 12:21:10,460 ProgressMeter - chr18:12533677 2.59e+09 13.5 m 0.0 s 83.8% 16.1 m 2.6 m 
INFO 12:21:40,462 ProgressMeter - chr19:34306113 2.69e+09 14.0 m 0.0 s 87.0% 16.1 m 2.1 m 
INFO 12:22:10,463 ProgressMeter - chr20:55319685 2.77e+09 14.5 m 0.0 s 89.6% 16.2 m 100.0 s 
INFO 12:22:40,465 ProgressMeter - chr22:41605677 2.87e+09 15.0 m 0.0 s 92.8% 16.2 m 70.0 s 
INFO 12:23:10,466 ProgressMeter - chrX:84912589 2.97e+09 15.5 m 0.0 s 95.8% 16.2 m 40.0 s 
INFO 12:23:40,468 ProgressMeter - chrY:30850957 3.07e+09 16.0 m 0.0 s 99.1% 16.1 m 8.0 s 
INFO 12:23:47,504 ProgressMeter - done 3.10e+09 16.1 m 0.0 s 100.0% 16.1 m 0.0 s 
INFO 12:23:47,505 ProgressMeter - Total runtime 967.10 secs, 16.12 min, 0.27 hours 
INFO 12:23:47,591 MicroScheduler - 1390162 reads were filtered out during the traversal out of approximately 5682566 total reads (24.46%) 
INFO 12:23:47,591 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter 
INFO 12:23:47,592 MicroScheduler - -> 51548 reads (0.91% of total) failing BadMateFilter 
INFO 12:23:47,592 MicroScheduler - -> 352814 reads (6.21% of total) failing DuplicateReadFilter 
INFO 12:23:47,592 MicroScheduler - -> 0 reads (0.00% of total) failing FailsVendorQualityCheckFilter 
INFO 12:23:47,592 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter 
INFO 12:23:47,592 MicroScheduler - -> 0 reads (0.00% of total) failing MappingQualityUnavailableFilter 
INFO 12:23:47,592 MicroScheduler - -> 985800 reads (17.35% of total) failing MappingQualityZeroFilter 
INFO 12:23:47,593 MicroScheduler - -> 0 reads (0.00% of total) failing NotPrimaryAlignmentFilter 
INFO 12:23:47,593 MicroScheduler - -> 0 reads (0.00% of total) failing Platform454Filter 
INFO 12:23:47,593 MicroScheduler - -> 0 reads (0.00% of total) failing UnmappedReadFilter 
INFO 12:23:49,381 GATKRunReport - Uploaded run statistics report to AWS S3

Regards

Answers

  • Geraldine_VdAuwera Posts: 5,903 Administrator, GATK Developer

    Hi there,

    I take it you mean that you are concerned about the large numbers of reads that are failing to pass quality filters. It is normal to have some reads filtered out, but it is true that the number of reads filtered out due to mapping quality zero is quite large. You may need to troubleshoot your alignments. This is not something we can help you with; you should either ask the people responsible for support for the alignment software you used, or a general forum such as SeqAnswers.com. Good luck!

    Geraldine Van der Auwera, PhD

  • h_asif Posts: 39 Member

    Thank you for your reply, but my question was why the log shows "failing" next to filters that have a nonzero value, like the one you highlighted. If the filter failed, then why is there a number, as in "985800 reads (17.35% of total) failing MappingQualityZeroFilter"? Secondly, where can I read about these filters, e.g. BadCigarFilter?

  • Geraldine_VdAuwera Posts: 5,903 Administrator, GATK Developer

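    Each of those lines reports how many reads were removed by the named filter, so "985800 reads (17.35% of total) failing MappingQualityZeroFilter" means that 985,800 reads were excluded from the traversal by that filter. Reads fail to pass the MappingQualityZeroFilter when their mapping quality is zero, which typically means the aligner could not place them uniquely (for example, they map equally well to more than one location); reads that are not mapped at all are counted by UnmappedReadFilter instead. As I said, that is an alignment problem.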
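
To see how the per-filter lines relate to the total, here is a minimal sketch that sums the per-filter counts in a saved log and compares them with the total reported by MicroScheduler. It assumes the log format shown in this thread and that the "reads were filtered out" line comes before the per-filter lines; the function name `summarize_filters` is made up for illustration and is not part of GATK.

```shell
# Sum the per-filter counts in a GATK log and compare with the reported total.
# Assumption: lines look like
#   "... MicroScheduler - -> N reads (P% of total) failing FilterName"
# and are preceded by the "... N reads were filtered out ..." summary line.
summarize_filters() {
    awk '
        /reads were filtered out during the traversal/ {
            for (i = 1; i <= NF; i++) {
                if ($i == "MicroScheduler") reported = $(i + 2)  # reads filtered out
                if ($i == "approximately")  total    = $(i + 1)  # total reads seen
            }
        }
        / -> [0-9]+ reads .* failing / {
            for (i = 1; i <= NF; i++) if ($i == "->") { n = $(i + 1); break }
            sum += n
            # Print only filters that actually removed reads.
            if (n > 0 && total > 0)
                printf "%s\t%d\t%.2f%%\n", $NF, n, 100 * n / total
        }
        END { printf "sum_of_filters\t%d\nreported_total\t%d\n", sum, reported }
    ' "$1"
}

# Usage (hypothetical file name):
#   summarize_filters gatk_run.log
```

On the log posted above, the three nonzero filters (51548 + 352814 + 985800) add up to the 1390162 reads reported as filtered out, which is why a filter with a count next to it is working as intended rather than reporting an error.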

    You will find descriptions of what the filters do in our documentation here: http://www.broadinstitute.org/gatk/gatkdocs/ (click the Read Filters line).

    Geraldine Van der Auwera, PhD

  • h_asif Posts: 39 Member

    Can you help me with how to download dbSNP from UCSC in VCF format? I tried to run the base recalibration tool and got this error:

    ERROR /home/sab/database_vcf/dbsnp_137.hg19.vcf contigs = [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]
    ERROR reference contigs = [chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY, chrM]
    ERROR

  • Geraldine_VdAuwera Posts: 5,903 Administrator, GATK Developer

    We provide several dbsnp files (including in hg19 version) in our resource bundle, but I don't think we have dbsnp 138 in there. If you want that version specifically, you'll need to get it from UCSC.

    Geraldine Van der Auwera, PhD

  • h_asif Posts: 39 Member

    The problem is that I downloaded dbSNP from your resource bundle, but it has this contig order: [chrM, chr1, chr2, chr3, chr4, chr5, chr6, chr7, chr8, chr9, chr10, chr11, chr12, chr13, chr14, chr15, chr16, chr17, chr18, chr19, chr20, chr21, chr22, chrX, chrY]. I need the one with chrM at the end.

  • Geraldine_VdAuwera Posts: 5,903 Administrator, GATK Developer

    We don't have that, sorry. I believe there are utilities that make it possible to change the order of the contigs, but they are not part of our software. I recommend you ask on a more general forum like SeqAnswers.com.

    Geraldine Van der Auwera, PhD

  • h_asif Posts: 39 Member

    Thank you very much, Kurt.

    @Kurt said:

    To be fair, had you downloaded the file from UCSC, I'm pretty sure that it would have started with chrM anyway. In any case, since it looks like you are just trying to bump chrM to the bottom, a simple one-liner on the hg19 file you got from the resource bundle would do the trick in a couple of minutes.

    (grep "^#" in.vcf ; grep -v "^#" in.vcf | awk '$1!="chrM"' ; grep -v "^#" in.vcf | awk '$1=="chrM"') > out.vcf
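
A hedged generalization of the one-liner above, for cases where more than one contig is out of place: the sketch below reorders the records to follow the contig order of the reference's FASTA index (the `.fai` file that `samtools faidx` writes, whose first column is the contig name). The function names `vcf_contig_order` and `reorder_vcf_by_fai` are made up for illustration. It assumes records are already sorted by position within each contig, copies header lines unchanged, and drops records on contigs that are not in the `.fai`. If the header carries `##contig` lines, their order may also need to match the reference; this sketch does not touch them.

```shell
# List contigs in the order they first appear among a VCF's records.
vcf_contig_order() {
    awk -F'\t' '!/^#/ && !seen[$1]++ { print $1 }' "$1"
}

# Write a VCF to stdout with records grouped in the contig order of a .fai file.
# Assumptions: records are position-sorted within each contig; contigs missing
# from the .fai are dropped; header lines are copied as they are.
reorder_vcf_by_fai() {
    vcf=$1
    fai=$2
    # Header first, unchanged.
    awk '/^#/' "$vcf"
    # Then one pass over the records per reference contig, in reference order.
    cut -f1 "$fai" | while IFS= read -r contig; do
        # Appending "" forces a string comparison of the contig names.
        awk -F'\t' -v c="$contig" '!/^#/ && ($1 "") == (c "")' "$vcf"
    done
}

# Usage (hypothetical file names):
#   reorder_vcf_by_fai in.vcf human_hg19.fa.fai > out.vcf
#   vcf_contig_order out.vcf    # should now list chrM last for this reference
```

This reads the input once per contig, so it is slower than the one-liner on a file the size of dbSNP, but it does not hard-code which contig moves. Any index file built for the old VCF would no longer match the reordered file.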
