Did not get a summary of callable base counts

sarkar, Switzerland, Member

Dear All,

I am calling SNPs with UnifiedGenotyper in GATK version 2.5.2.
Command used:
java -Djava.io.tmpdir=TMP -jar /GenomeAnalysisTK/2.5.2/GenomeAnalysisTK.jar -R Ref.fa -T UnifiedGenotyper -I input1.bam -I input2.bam -I input3.bam -o output.raw.vcf -stand_call_conf 50 -stand_emit_conf 20 -out_mode EMIT_ALL_CONFIDENT_SITES

Usually I get a summary of callable base counts, like this:
INFO 00:23:29,795 UnifiedGenotyper - Visited bases 247249719
INFO 00:23:29,796 UnifiedGenotyper - Callable bases 219998386
INFO 00:23:29,796 UnifiedGenotyper - Confidently called bases 219936125
INFO 00:23:29,796 UnifiedGenotyper - % callable bases of all loci 88.978
INFO 00:23:29,797 UnifiedGenotyper - % confidently called bases of all loci 88.953
INFO 00:23:29,797 UnifiedGenotyper - % confidently called bases of callable loci 88.953
INFO 00:23:29,797 UnifiedGenotyper - Actual calls made 303126

But this time I did not get it. Can you tell me how I can calculate this from my output VCF file now?

Answers

  • Sheila, Broad Institute, Member, Broadie, Moderator admin

    @sarkar

    Hi,

    Were you using a different version before? If not, please post your log output here.

    Thanks,
    Sheila

  • sarkar, Switzerland, Member
    edited June 2014

    Hi Sheila,

    Here is the end of my log output:
    INFO 12:44:40,399 ProgressMeter - Contig5888:42669 7.20e+08 47.8 h 4.0 m 100.0% 47.9 h 42.0 s
    INFO 12:45:40,433 ProgressMeter - Contig29505:7501 7.20e+08 47.9 h 4.0 m 100.0% 47.9 h 16.0 s
    INFO 12:46:40,470 ProgressMeter - Contig29562:4601 7.20e+08 47.9 h 4.0 m 100.0% 47.9 h 0.0 s
    INFO 12:47:40,502 ProgressMeter - Contig29562:4601 7.20e+08 47.9 h 4.0 m 100.0% 47.9 h 0.0 s
    INFO 12:48:40,532 ProgressMeter - Contig29562:4601 7.20e+08 47.9 h 4.0 m 100.0% 47.9 h 0.0 s
    INFO 12:49:40,568 ProgressMeter - Contig29562:4601 7.20e+08 47.9 h 4.0 m 100.0% 47.9 h 0.0 s
    INFO 12:50:40,601 ProgressMeter - Contig29562:4601 7.20e+08 47.9 h 4.0 m 100.0% 47.9 h 0.0 s
    INFO 12:51:40,639 ProgressMeter - Contig29562:4601 7.20e+08 48.0 h 4.0 m 100.0% 48.0 h 0.0 s
    INFO 12:52:03,903 ProgressMeter - done 7.20e+08 48.0 h 4.0 m 100.0% 48.0 h 0.0 s
    INFO 12:52:03,904 ProgressMeter - Total runtime 172678.09 secs, 2877.97 min, 47.97 hours
    INFO 12:52:03,904 MicroScheduler - 127539808 reads were filtered out during traversal out of 518418874 total (24.60%)
    INFO 12:52:03,904 MicroScheduler - -> 34863772 reads (6.73% of total) failing BadMateFilter
    INFO 12:52:03,905 MicroScheduler - -> 46782373 reads (9.02% of total) failing DuplicateReadFilter
    INFO 12:52:03,905 MicroScheduler - -> 45893663 reads (8.85% of total) failing UnmappedReadFilter
    WARN 12:52:18,612 RestStorageService - Error Response: PUT '/GATK_Run_Reports/lShCmiFmrEv1e6OArOHTC5pVTD5XdA3i.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 417, Content-MD5: vGbSyidh0ORxUH/5S2LpEQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: bc66d2ca2761d0e471507ff94b62e911, Date: Fri, 13 Jun 2014 10:52:17 GMT, Authorization: AWS AKIAIMHBU7X642TCHQ2A:cJx0Gjda2YKa/e152dF1q5YYRyE=, User-Agent: JetS3t/0.8.1 (Linux/2.6.18-308.13.1.el5; amd64; en; JVM 1.6.0_20), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 0FB5E6B3BC94A20D, x-amz-id-2: uPdlmLgA/VV1N+l50nB0TFJfIZ9PZZV/iaNKnBkZ9sacrO/FYAcF4neOO/vT5/DP, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 13 Jun 2014 10:52:17 GMT, Connection: close, Server: AmazonS3]
    In another project a year ago, I used GenomeAnalysisTK-1.2-4 to call SNPs, and I remember receiving a summary of callable base counts at that time. Now I am using version 2.5.2, and I get the log above, although a proper VCF output is generated.

  • sarkar, Switzerland, Member

    Hi Sheila,

    Thanks very much for your answer. Can you tell me which version I should use, as I need this summary table?

    Or how can I calculate the actual calls made and the confidently called bases?
    As far as I can see, I cannot filter for the actual calls made with the tools you mentioned above?

  • Sheila, Broad Institute, Member, Broadie, Moderator admin

    @sarkar

    Hi,

    I am not sure why you are using version 2.5 and UnifiedGenotyper.

    The best way to get the information you want is to upgrade to GATK version 3.1 and use HaplotypeCaller, which is a more accurate variant caller.

    When using HaplotypeCaller, you can output a GVCF file, which contains all the information you want.

    Please read about Haplotype Caller here: http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_haplotypecaller_HaplotypeCaller.html

    -Sheila
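
For the original question of getting counts out of an existing VCF, a rough sketch is possible. The command in this thread used -out_mode EMIT_ALL_CONFIDENT_SITES, so output.raw.vcf should contain records for confidently called reference sites as well as variant sites. The Python below counts all site records, records with a non-reference ALT allele, and variant records at or above a QUAL threshold. The function names (open_vcf, summarize_vcf) and the count labels are hypothetical, and treating these counts as stand-ins for "Confidently called bases" and "Actual calls made" is an assumption, not GATK's own computation.

```python
import gzip
from collections import Counter


def open_vcf(path):
    """Open a plain-text or gzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def summarize_vcf(lines, min_qual=50.0):
    """Count site records in an iterable of VCF lines.

    min_qual mirrors -stand_call_conf 50 from the command in this thread.
    How these counts map onto GATK's old summary is an assumption.
    """
    counts = Counter()
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        alt, qual = fields[4], fields[5]  # VCF columns 5 (ALT) and 6 (QUAL)
        counts["records"] += 1
        if alt != ".":
            counts["variant_records"] += 1
            if qual != "." and float(qual) >= min_qual:
                counts["variant_records_min_qual"] += 1
    return counts


# Example usage (the file name is the one from the command in this thread):
# with open_vcf("output.raw.vcf") as handle:
#     print(dict(summarize_vcf(handle)))
```

The VCF alone does not say how many bases were visited, so percentages "of all loci" would need the reference length as well, for example by summing the second column of the .fai index for Ref.fa.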
