Uploading RunStats to AWS S3 takes a lot of time

Hi Team,
I am using the ClipReads walker from GATK 1.6-13 in my exome analysis project.
The program runs fine, but at the end it takes more than an hour to upload the run statistics to AWS S3.
I am thinking of using the -et NO_ET option to eliminate this extra upload time, but I also wanted to know why this is happening.
I have noticed this several times, i.e., with all of my exome samples.
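For reference, here is a minimal sketch of passing -et NO_ET from a shell wrapper. It assumes that, as in GATK releases of this era, -et NO_ET must be paired with a GATK key file passed via -K; check that requirement against the documentation for your version. GATK_JAR, GATK_KEY, the helper name run_gatk_no_et and the default paths are hypothetical placeholders, not values from this thread.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a GATK command with run reporting ("phone home") disabled.
# GATK_JAR and GATK_KEY are placeholder paths (assumptions), not taken from the post.
# Assumes -et NO_ET needs a key file supplied via -K, as in GATK 1.x/2.x-era releases.
# Set DRY_RUN=1 to print the assembled command instead of executing it.
run_gatk_no_et() {
  local jar="${GATK_JAR:-GenomeAnalysisTK.jar}"
  local key="${GATK_KEY:-my_gatk.key}"
  # Append the run-report options after the caller's own walker arguments.
  local cmd=(java -jar "$jar" "$@" -et NO_ET -K "$key")
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

With DRY_RUN=1 and the placeholder arguments `-T ClipReads -R ref.fa -I sample.bam -o clipped.bam` (the walker's own clipping options are omitted), it prints the full java command ending in `-et NO_ET -K my_gatk.key`, which makes it easy to confirm the flags before a long run.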

At first I thought it might be because of a slow internet connection.
But when I looked at the output log of the DepthOfCoverage analysis, which follows the ClipReads analysis in my pipeline, the run statistics upload takes well under a minute.

Hence, the delay in uploading run statistics to AWS S3 occurs only during the ClipReads run.

Example from a ClipReads run:

INFO 08:09:14,952 TraversalEngine - chr22:42271432 1.15e+08 75.3 m 39.3 s 92.6% 81.3 m 6.0 m
INFO 08:09:45,081 TraversalEngine - chr22:51043799 1.15e+08 75.8 m 39.4 s 92.9% 81.6 m 5.8 m
INFO 08:10:15,223 TraversalEngine - chrX:37669921 1.16e+08 76.3 m 39.3 s 94.1% 81.1 m 4.8 m
INFO 08:10:45,246 TraversalEngine - chrX:53656535 1.17e+08 76.8 m 39.3 s 94.6% 81.2 m 4.4 m
INFO 08:11:18,402 TraversalEngine - chrX:76888995 1.18e+08 77.3 m 39.4 s 95.4% 81.1 m 3.8 m
INFO 08:11:48,405 TraversalEngine - chrX:107018416 1.19e+08 77.8 m 39.4 s 96.3% 80.8 m 3.0 m
INFO 08:12:18,472 TraversalEngine - chrX:136113700 1.19e+08 78.3 m 39.4 s 97.3% 80.5 m 2.2 m
INFO 08:12:43,516 TraversalEngine - Total runtime 4725.58 secs, 78.76 min, 1.31 hours
INFO 08:12:43,576 TraversalEngine - 0 reads were filtered out during traversal out of 120260564 total (0.00%)
INFO 09:26:59,125 GATKRunReport - Uploaded run statistics report to AWS S3

The DepthOfCoverage run that follows the ClipReads run:

INFO 12:42:40,665 TraversalEngine - chrX:123202453 2.83e+08 3.3 h 41.5 s 99.0% 3.3 h 2.0 m
INFO 12:43:10,790 TraversalEngine - chrX:135278852 2.84e+08 3.3 h 41.5 s 99.2% 3.3 h 91.7 s
INFO 12:43:40,875 TraversalEngine - chrX:150868432 2.84e+08 3.3 h 41.5 s 99.5% 3.3 h 64.1 s
INFO 12:44:10,947 TraversalEngine - chrX:153706399 2.85e+08 3.3 h 41.5 s 99.7% 3.3 h 33.3 s
INFO 12:44:38,608 DepthOfCoverageWalker - Printing summary info
INFO 12:44:38,612 DepthOfCoverageWalker - Printing locus summary
INFO 12:44:38,661 TraversalEngine - Total runtime 11852.25 secs, 197.54 min, 3.29 hours
INFO 12:44:38,722 TraversalEngine - 124368441 reads were filtered out during traversal out of 376919175 total (33.00%)
INFO 12:44:38,722 TraversalEngine - -> 124368441 reads (33.00% of total) failing DuplicateReadFilter
INFO 12:44:39,745 GATKRunReport - Uploaded run statistics report to AWS S3
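To put numbers on the comparison between the two logs, the gap between the last message before the upload and the "Uploaded run statistics report" message can be measured directly. This is a minimal sketch that assumes log lines shaped like the ones pasted here ("INFO HH:MM:SS,mmm Component - message"); the function name upload_delay_seconds is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: print the seconds between the log line preceding each
# "Uploaded run statistics report" message and that message itself.
# Reads a GATK log on stdin; assumes lines look like "INFO HH:MM:SS,mmm ...".
upload_delay_seconds() {
  awk '
    # Convert "HH:MM:SS,mmm" to seconds since midnight.
    function to_secs(ts,   p) {
      split(ts, p, /[:,]/)
      return p[1] * 3600 + p[2] * 60 + p[3] + p[4] / 1000
    }
    $1 == "INFO" && $3 == "GATKRunReport" && /Uploaded run statistics/ {
      d = to_secs($2) - prev
      if (d < 0) d += 86400   # the log crossed midnight
      printf "%.3f\n", d
      next
    }
    # Remember the timestamp of every other INFO line.
    $1 == "INFO" { prev = to_secs($2) }
  '
}
```

Fed the last two ClipReads lines above, it prints 4455.549 (roughly 74 minutes); fed the last two DepthOfCoverage lines, it prints 1.023, which matches the difference described in the question.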

Answers

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Hi Shrey,

    We modified the phone-home feature to take a lower profile in later versions (i.e., starting with 2.0, if I recall correctly). Do you still see the same effect with a version more recent than 1.6?

  • rpauly (Member)

    Hi,
    I get a similar message after running the command below:
    java -jar GenomeAnalysisTK.jar -T SplitNCigarReads -R hg19.fa -I x.bam -o split.bam -rf ReassignOneMappingQuality -RMQF 255 -RMQT 60 -U ALLOW_N_CIGAR_READS

    I am pasting the last few lines here:

    INFO 13:11:54,133 MicroScheduler - 0 reads were filtered out during the traversal out of approximately 139042568 total reads (0.00%)
    INFO 13:11:54,134 MicroScheduler - -> 0 reads (0.00% of total) failing BadCigarFilter
    INFO 13:11:54,135 MicroScheduler - -> 0 reads (0.00% of total) failing MalformedReadFilter
    INFO 13:11:54,135 MicroScheduler - -> 0 reads (0.00% of total) failing ReassignOneMappingQualityFilter
    INFO 13:11:55,922 GATKRunReport - Uploaded run statistics report to AWS S3

    How do I obtain the AWS S3 report?
    I am using GATK version 3.4-46

    ~Thanks,
    Rini

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    @rpauly Please do not post the same question several times. This wastes our time and clutters the forum. Thank you.
