Uploading RunStats to AWS S3 takes a lot of time
I am using the ClipReads tool from GATK version 1.6-13 in my exome analysis project.
The program runs fine, but at the end it takes more than an hour to upload the run statistics (runStats) to AWS S3.
I am thinking of using the -et NO_ET option to eliminate this extra upload time, but I would also like to know why this is happening.
I have seen this repeatedly, i.e. for all of my exome samples.
At first I thought it might be due to a slow internet connection.
But when I looked at the output log of the DepthOfCoverage analysis, which follows the ClipReads analysis in my pipeline, the run stats upload took less than a minute.
Hence, the delay in uploading the run stats to AWS S3 occurs only during the ClipReads run.
Example of a ClipReads run:
INFO 08:09:14,952 TraversalEngine - chr22:42271432 1.15e+08 75.3 m 39.3 s 92.6% 81.3 m 6.0 m
INFO 08:09:45,081 TraversalEngine - chr22:51043799 1.15e+08 75.8 m 39.4 s 92.9% 81.6 m 5.8 m
INFO 08:10:15,223 TraversalEngine - chrX:37669921 1.16e+08 76.3 m 39.3 s 94.1% 81.1 m 4.8 m
INFO 08:10:45,246 TraversalEngine - chrX:53656535 1.17e+08 76.8 m 39.3 s 94.6% 81.2 m 4.4 m
INFO 08:11:18,402 TraversalEngine - chrX:76888995 1.18e+08 77.3 m 39.4 s 95.4% 81.1 m 3.8 m
INFO 08:11:48,405 TraversalEngine - chrX:107018416 1.19e+08 77.8 m 39.4 s 96.3% 80.8 m 3.0 m
INFO 08:12:18,472 TraversalEngine - chrX:136113700 1.19e+08 78.3 m 39.4 s 97.3% 80.5 m 2.2 m
INFO 08:12:43,516 TraversalEngine - Total runtime 4725.58 secs, 78.76 min, 1.31 hours
INFO 08:12:43,576 TraversalEngine - 0 reads were filtered out during traversal out of 120260564 total (0.00%)
INFO 09:26:59,125 GATKRunReport - Uploaded run statistics report to AWS S3
The DepthOfCoverage run that follows the ClipReads run:
INFO 12:42:40,665 TraversalEngine - chrX:123202453 2.83e+08 3.3 h 41.5 s 99.0% 3.3 h 2.0 m
INFO 12:43:10,790 TraversalEngine - chrX:135278852 2.84e+08 3.3 h 41.5 s 99.2% 3.3 h 91.7 s
INFO 12:43:40,875 TraversalEngine - chrX:150868432 2.84e+08 3.3 h 41.5 s 99.5% 3.3 h 64.1 s
INFO 12:44:10,947 TraversalEngine - chrX:153706399 2.85e+08 3.3 h 41.5 s 99.7% 3.3 h 33.3 s
INFO 12:44:38,608 DepthOfCoverageWalker - Printing summary info
INFO 12:44:38,612 DepthOfCoverageWalker - Printing locus summary
INFO 12:44:38,661 TraversalEngine - Total runtime 11852.25 secs, 197.54 min, 3.29 hours
INFO 12:44:38,722 TraversalEngine - 124368441 reads were filtered out during traversal out of 376919175 total (33.00%)
INFO 12:44:38,722 TraversalEngine - -> 124368441 reads (33.00% of total) failing DuplicateReadFilter
INFO 12:44:39,745 GATKRunReport - Uploaded run statistics report to AWS S3
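To put numbers on the two delays, here is a minimal Python sketch that measures the gap between the last log line before the upload message and the "Uploaded run statistics report" line itself. The names `LOG_LINE` and `upload_gap_seconds` are my own (hypothetical, not part of GATK), and the sketch assumes the log format shown above and that the run does not cross midnight, since the log lines carry no date.

```python
import re
from datetime import datetime

# Matches lines such as:
#   INFO 08:12:43,576 TraversalEngine - <message>
# A leading "**" (forum markup residue) is tolerated.
LOG_LINE = re.compile(r"^\**INFO\s+(\d{2}:\d{2}:\d{2},\d{3})\s+(\S+)\s+-\s+(.*)$")


def upload_gap_seconds(log_text):
    """Return the seconds between the GATKRunReport line and the log line
    just before it, or None if no such pair is found.

    Assumption: all timestamps fall on the same day.
    """
    previous = None
    for raw in log_text.splitlines():
        match = LOG_LINE.match(raw.strip())
        if not match:
            continue
        stamp = datetime.strptime(match.group(1), "%H:%M:%S,%f")
        if match.group(2) == "GATKRunReport" and previous is not None:
            return (stamp - previous).total_seconds()
        previous = stamp
    return None


# Tail of each log, copied from the post above.
clip_reads_tail = """\
INFO 08:12:43,516 TraversalEngine - Total runtime 4725.58 secs, 78.76 min, 1.31 hours
INFO 08:12:43,576 TraversalEngine - 0 reads were filtered out during traversal out of 120260564 total (0.00%)
INFO 09:26:59,125 GATKRunReport - Uploaded run statistics report to AWS S3
"""

depth_of_coverage_tail = """\
INFO 12:44:38,722 TraversalEngine - -> 124368441 reads (33.00% of total) failing DuplicateReadFilter
INFO 12:44:39,745 GATKRunReport - Uploaded run statistics report to AWS S3
"""

if __name__ == "__main__":
    print(f"ClipReads gap:       {upload_gap_seconds(clip_reads_tail):.1f} s")
    print(f"DepthOfCoverage gap: {upload_gap_seconds(depth_of_coverage_tail):.1f} s")
```

On the excerpts above this gives roughly 4455.5 s (about 74 minutes) for ClipReads and about 1.0 s for DepthOfCoverage. Note that the gap only says how long passed between the two log messages; it does not by itself show that the whole interval was spent on the upload rather than on other work at the end of the ClipReads run.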