Bug Bulletin: we have identified a bug that affects indexing when producing gzipped VCFs. This will be fixed in the upcoming 3.2 release; in the meantime, you will need to reindex gzipped VCFs using Tabix.
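
For example, a minimal reindexing sketch (assuming bgzip and tabix are installed; the file name is a placeholder):

# Tabix requires bgzip-style compression; if your .vcf.gz is plain gzip, recompress it first:
#   zcat my_calls.vcf.gz | bgzip > my_calls.recompressed.vcf.gz
tabix -f -p vcf my_calls.vcf.gz    # rebuilds my_calls.vcf.gz.tbi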

GATK UnifiedGenotyper stuck, creates bamschedule.*.tmp files, no info in the logs

emixaM

Hi guys,

I have Googled my problem, with no luck, so I am asking you directly.

I am currently testing an established pipeline on BAMs from a new source, so I am advancing step by step, and the last step, variant calling with UG, seems to be running into trouble.

My BAM went through, in order: Picard AddOrReplaceReadGroups, Picard MarkDuplicates, GATK RealignerTargetCreator, GATK IndelRealigner, Picard FixMateInformation, GATK BaseRecalibrator, and GATK PrintReads.
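
For readers following along, here is a condensed sketch of that pre-processing order (GATK 2.x and classic Picard syntax; file names, read-group values, and the known-sites resource are placeholders, not my exact commands):

# each step consumes the previous step's output
java -jar AddOrReplaceReadGroups.jar INPUT=raw.bam OUTPUT=rg.bam RGID=1 RGLB=lib1 RGPL=ILLUMINA RGPU=unit1 RGSM=sample1
java -jar MarkDuplicates.jar INPUT=rg.bam OUTPUT=marked.bam METRICS_FILE=dup_metrics.txt
java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R ref.fasta -I marked.bam -o realign.intervals
java -jar GenomeAnalysisTK.jar -T IndelRealigner -R ref.fasta -I marked.bam -targetIntervals realign.intervals -o realigned.bam
java -jar FixMateInformation.jar INPUT=realigned.bam OUTPUT=fixed.bam
java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R ref.fasta -I fixed.bam -knownSites dbsnp.vcf -o recal.grp
java -jar GenomeAnalysisTK.jar -T PrintReads -R ref.fasta -I fixed.bam -BQSR recal.grp -o recal.bam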

Having arrived at UG, this is my (stuck) output (I have redacted file names for privacy):

INFO  14:54:29,692 ArgumentTypeDescriptor - Dynamically determined type of /scratch/appli57_local_duplicates/reference/exome_target_intervals.bed to be BED 
INFO  14:54:29,748 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  14:54:29,748 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-11-g13c0244, Compiled 2012/09/29 06:03:05 
INFO  14:54:29,749 HelpFormatter - Copyright (c) 2010 The Broad Institute 
INFO  14:54:29,749 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk 
INFO  14:54:29,750 HelpFormatter - Program Args: -T UnifiedGenotyper -nt 6 -R /scratch/appli57_local_duplicates/reference/Homo_sapiens_assembly19.fasta -I /scratch/user/FILE.marked.realigned.fixed.recal.bam --dbsnp /scratch/appli57_local_duplicates/dbsnp/dbsnp_132.b37.vcf -L /scratch/appli57_local_duplicates/reference/exome_target_intervals.bed --metrics_file /scratch/user/FILE.snps.metrics -o /scratch/user/FILE.vcf 
INFO  14:54:29,750 HelpFormatter - Date/Time: 2013/03/20 14:54:29 
INFO  14:54:29,750 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  14:54:29,751 HelpFormatter - --------------------------------------------------------------------------------- 
INFO  14:54:29,783 ArgumentTypeDescriptor - Dynamically determined type of /scratch/appli57_local_duplicates/dbsnp/dbsnp_132.b37.vcf to be VCF 
INFO  14:54:29,799 GenomeAnalysisEngine - Strictness is SILENT 
INFO  14:54:29,906 SAMDataSource$SAMReaders - Initializing SAMRecords in serial 
INFO  14:54:29,943 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.04 
INFO  14:54:29,959 RMDTrackBuilder - Loading Tribble index from disk for file /scratch/appli57_local_duplicates/dbsnp/dbsnp_132.b37.vcf 
WARN  14:54:30,190 VCFStandardHeaderLines$Standards - Repairing standard header line for field AF because -- count types disagree; header has UNBOUNDED but standard is A -- descriptions disagree; header has 'Allele Frequency' but standard is 'Allele Frequency, for each ALT allele, in the same order as listed' 
INFO  14:54:32,484 MicroScheduler - Running the GATK in parallel mode with 6 concurrent threads

And it does not move from there. In my destination folder, a bamschedule.*.tmp file appears every 5 minutes or so, and in top the process appears to be running:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29468 valleem 19 0 20.6g 3.9g 10m S 111.5 4.2 28:45.46 java

Can you help me?

Answers

  • emixaM

    I will try that and will be back here ASAP. Thanks!

  • emixaM

    OK, I have updated and gotten past the 'MicroScheduler' hold. Unfortunately, a new problem has appeared: chrY was re-analyzed several times, lots of org.broadinstitute.sting.gatk.io.stubs.OutputStreamStub*.tmp and org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub*.tmp files were created, and an error arose:

    ...
    INFO  16:04:28,445 ProgressMeter -     17:16248001        2.64e+11   21.5 m        0.0 s     78.3%        27.5 m     6.0 m 
    INFO  16:04:58,462 ProgressMeter -     17:19317398        2.73e+11   22.0 m        0.0 s     78.5%        28.0 m     6.0 m 
    INFO  16:05:28,477 ProgressMeter -      19:1008846        2.80e+11   22.5 m        0.0 s     84.3%        26.7 m     4.2 m 
    INFO  16:05:58,493 ProgressMeter -      20:2639465        2.88e+11   23.0 m        0.0 s     90.0%        25.6 m     2.5 m 
    INFO  16:06:28,509 ProgressMeter -     20:23856862        2.95e+11   23.5 m        0.0 s     90.6%        26.0 m     2.4 m 
    INFO  16:06:58,524 ProgressMeter -     21:43782581        3.02e+11   24.0 m        0.0 s     93.1%        25.8 m   106.0 s 
    INFO  16:07:28,541 ProgressMeter -      X:14708922        3.08e+11   24.5 m        0.0 s     96.0%        25.5 m    61.0 s 
    INFO  16:07:58,557 ProgressMeter -       Y:1663127        3.15e+11   25.0 m        0.0 s     99.7%        25.1 m     4.0 s 
    INFO  16:08:28,573 ProgressMeter -      Y:59343477        3.20e+11   25.5 m        0.0 s    100.0%        25.5 m     0.0 s 
    INFO  16:08:58,589 ProgressMeter -      Y:59343477        3.26e+11   26.0 m        0.0 s    100.0%        26.0 m     0.0 s 
    INFO  16:09:28,606 ProgressMeter -      Y:59343477        3.32e+11   26.5 m        0.0 s    100.0%        26.5 m     0.0 s 
    INFO  16:09:58,623 ProgressMeter -      Y:59343477        3.38e+11   27.0 m        0.0 s    100.0%        27.0 m     0.0 s 
    INFO  16:10:28,638 ProgressMeter -      Y:59343477        3.45e+11   27.5 m        0.0 s    100.0%        27.5 m     0.0 s 
    INFO  16:10:58,653 ProgressMeter -      Y:59343477        3.52e+11   28.0 m        0.0 s    100.0%        28.0 m     0.0 s 
    INFO  16:11:28,669 ProgressMeter -      Y:59343477        3.59e+11   28.5 m        0.0 s    100.0%        28.5 m     0.0 s 
    INFO  16:11:58,686 ProgressMeter -      Y:59343477        3.65e+11   29.0 m        0.0 s    100.0%        29.0 m     0.0 s 
    INFO  16:12:28,702 ProgressMeter -      Y:59343477        3.71e+11   29.5 m        0.0 s    100.0%        29.5 m     0.0 s 
    INFO  16:12:58,717 ProgressMeter -      Y:59343477        3.76e+11   30.0 m        0.0 s    100.0%        30.0 m     0.0 s 
    INFO  16:13:28,755 ProgressMeter -      Y:59343477        3.80e+11   30.5 m        0.0 s    100.0%        30.5 m     0.0 s 
    INFO  16:13:58,771 ProgressMeter -      Y:59343477        3.85e+11   31.0 m        0.0 s    100.0%        31.0 m     0.0 s 
    INFO  16:14:28,786 ProgressMeter -      Y:59343477        3.89e+11   31.5 m        0.0 s    100.0%        31.5 m     0.0 s 
    INFO  16:14:58,802 ProgressMeter -      Y:59343477        3.93e+11   32.0 m        0.0 s    100.0%        32.0 m     0.0 s 
    INFO  16:15:28,817 ProgressMeter -      Y:59343477        3.98e+11   32.5 m        0.0 s    100.0%        32.5 m     0.0 s 
    INFO  16:15:58,832 ProgressMeter -      Y:59343477        4.02e+11   33.0 m        0.0 s    100.0%        33.0 m     0.0 s 
    INFO  16:16:28,848 ProgressMeter -      Y:59343477        4.04e+11   33.5 m        0.0 s    100.0%        33.5 m     0.0 s
    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.4-9-g532efad):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Unable to parse header with error: /scratch/user/folder/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub3747221466899600842.tmp (Too many open files), for input source: /scratch/user/folder/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub3747221466899600842.tmp
    ERROR ------------------------------------------------------------------------------------------
  • Geraldine_VdAuwera (admin)

    Ah, that is not a bug, it is a limitation of your file system. This can happen when running with multithreading if your tmp directory has a limit on the number of files that can be open at the same time. To get around this, you need to either use a different tmp directory with a higher limit, or reduce the number of threads you are using.
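
    For example, a minimal sketch (Linux shell; the tmp path, limit value, and thread count are placeholders):

    # check the per-process open-file limit in the shell that launches the job
    ulimit -n
    # raise it for this session, if the system allows it
    ulimit -n 4096
    # and/or point the JVM at a less restricted tmp directory and use fewer data threads
    java -Djava.io.tmpdir=/path/to/other/tmp -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 2 ...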

    Geraldine Van der Auwera, PhD

  • emixaM

    I understand, but the problem seems to arise only on the last chromosome (and the BAM is an extract of the full BAM containing only chromosome 21, to quickly test the pipeline), so I don't get why it gets stuck only there. I am an avid user of GATK UG with 6 threads on whole-exome BAMs of around 40 GB, and I have never hit the ulimit restriction... I will try with fewer threads; do you have a suggestion for a 'golden' thread number? Cheers!

  • Geraldine_VdAuwera (admin)

    There is a lot of variability in how multithreading is supported by different platforms and configurations, so it's difficult to give specific numbers. But we do provide some general recommendations here: http://www.broadinstitute.org/gatk/guide/article?id=1975

    One thing you might want to try in addition to that is to adjust downsampling. It's possible that some of your jobs are getting stuck in high-coverage regions.
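
    For example (a sketch only; the cap value is illustrative, not a recommendation):

    # limit the per-sample coverage used at each site so very deep pile-ups don't stall a thread
    java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta -I sample.bam -dcov 250 -o calls.vcf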

    Geraldine Van der Auwera, PhD

  • emixaM

    I have tried many different -nt and -nct settings from your general recommendations, with no success. I have also tried downsampling with -dcov, with no success either. It only seems to get stuck on chromosome Y, which is strange. Apparently, the new BAMs I am analyzing originate from SOLiD; do you think that could be the source of the problem (even though the whole recalibration went smoothly)?

  • Geraldine_VdAuwera (admin)

    We don't have a lot of experience with SOLiD data, so it's difficult to say. If it's just chrY causing trouble, it may be simpler to process it separately from the rest of the genome.
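
    For example, a sketch of splitting the run (assuming b37-style contig names, i.e. "Y" with no "chr" prefix; file names are placeholders):

    # everything except chrY in one run...
    java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta -I sample.bam -L targets.bed -XL Y -o main.vcf
    # ...then chrY on its own, single-threaded, to sidestep the tmp-file pile-up
    java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta -I sample.bam -L Y -o chrY.vcf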

    Geraldine Van der Auwera, PhD

  • emixaM

    It is strange: even with Illumina data it gets stuck at chrY. And it is weird because I only extracted chr21.

  • Geraldine_VdAuwera (admin)

    Oh -- you are running with -L to restrict processing to the extracted chromosome, right?

    Geraldine Van der Auwera, PhD

  • emixaM

    No, it is a BED file for the exome. Do you think it could cause trouble?

  • emixaM

    I ask because I ran my pipeline on the whole exome as usual, and it crashes at the same place.

  • Geraldine_VdAuwera (admin)

    Well, it's better in general to use an intervals file that covers only areas where you have data, i.e. if you extract a specific chromosome into a separate BAM, you don't want to run your analysis on the rest of the genome when you know there's no data there. But in your case it doesn't seem like that is a factor at all.

    At this point you should validate your BAM file to make sure there's nothing wrong with the file itself. You should also try to narrow down the issue to a specific interval and take a look at that interval in a genome viewer like IGV. If you post a screenshot we can see what the genomic context looks like. You can use -l DEBUG to get more specific info about the region where the run gets stuck.
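
    For example (a sketch; ValidateSamFile is the classic Picard tool, and the interval is only an illustration of zooming in near where the log stalls):

    # check the BAM for structural problems
    java -jar ValidateSamFile.jar INPUT=sample.bam MODE=SUMMARY
    # re-run just the suspect interval with verbose logging to see exactly where it stops
    java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta -I sample.bam -L Y:59300000-59373566 -l DEBUG -o test.vcf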

    Geraldine Van der Auwera, PhD

  • emixaM

    Alright, I will try this and get back to you ASAP. Thanks!

  • emixaM

    I ran it with the debug option (46 GB log file, wow!). I grepped for as many errors as I could: I found no stops during the TraverseLociNano phase (which makes up most of the log), then it moves on to a VariantContextWriterStorage step of around 500 lines (writing to the VCF file). Here are the last lines (starting with the last VariantContextWriterStorage):

    DEBUG 18:07:02,734 VariantContextWriterStorage - Merging VariantContextWriterStorage from /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub6756942301107541591.tmp into /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/out_call_variants/C345.TCGA-A3-3308-11A-01D-0966-08.6.vcf
    DEBUG 18:07:02,736 VariantContextWriterStorage - Merging VariantContextWriterStorage from /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub1387051437277523911.tmp into /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/out_call_variants/C345.TCGA-A3-3308-11A-01D-0966-08.6.vcf
    DEBUG 18:07:02,739 VariantContextWriterStorage - Merging VariantContextWriterStorage from /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub1833993627566488873.tmp into /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/out_call_variants/C345.TCGA-A3-3308-11A-01D-0966-08.6.vcf
    DEBUG 18:07:02,740 GATKRunReport - Aggregating data for run report
    DEBUG 18:07:03,876 GATKRunReport - Posting report of type STANDARD
    DEBUG 18:07:03,877 GATKRunReport - Generating GATK report to AWS S3 with key tzhxWsTWG3dU99NffwiVEjnellz7kROS.report.xml.gz
    DEBUG 18:07:04,165 DefaultHttpParams - Set parameter http.useragent = Jakarta Commons-HttpClient/3.1
    DEBUG 18:07:04,166 DefaultHttpParams - Set parameter http.protocol.version = HTTP/1.1
    DEBUG 18:07:04,167 DefaultHttpParams - Set parameter http.connection-manager.class = class org.apache.commons.httpclient.SimpleHttpConnectionManager
    DEBUG 18:07:04,167 DefaultHttpParams - Set parameter http.protocol.cookie-policy = default
    DEBUG 18:07:04,167 DefaultHttpParams - Set parameter http.protocol.element-charset = US-ASCII
    DEBUG 18:07:04,168 DefaultHttpParams - Set parameter http.protocol.content-charset = ISO-8859-1
    DEBUG 18:07:04,168 DefaultHttpParams - Set parameter http.method.retry-handler = org.apache.commons.httpclient.DefaultHttpMethodRetryHandler@f5a4153
    DEBUG 18:07:04,169 DefaultHttpParams - Set parameter http.dateparser.patterns = [EEE, dd MMM yyyy HH:mm:ss zzz, EEEE, dd-MMM-yy HH:mm:ss zzz, EEE MMM d HH:mm:ss yyyy, EEE, dd-MMM-yyyy HH:mm:ss z, EEE, dd-MMM-yyyy HH-mm-ss z, EEE, dd MMM yy HH:mm:ss z, EEE dd-MMM-yyyy HH:mm:ss z, EEE dd MMM yyyy HH:mm:ss z, EEE dd-MMM-yyyy HH-mm-ss z, EEE dd-MMM-yy HH:mm:ss z, EEE dd MMM yy HH:mm:ss z, EEE,dd-MMM-yy HH:mm:ss z, EEE,dd-MMM-yyyy HH:mm:ss z, EEE, dd-MM-yyyy HH:mm:ss z]
    DEBUG 18:07:04,169 Jets3tProperties - s3service.https-only=true
    DEBUG 18:07:04,169 Jets3tProperties - storage-service.internal-error-retry-max=5
    DEBUG 18:07:04,190 Jets3tProperties - httpclient.connection-timeout-ms=60000
    DEBUG 18:07:04,191 DefaultHttpParams - Set parameter http.connection.timeout = 60000
    DEBUG 18:07:04,191 Jets3tProperties - httpclient.socket-timeout-ms=60000
    DEBUG 18:07:04,191 DefaultHttpParams - Set parameter http.socket.timeout = 60000
    DEBUG 18:07:04,191 Jets3tProperties - httpclient.stale-checking-enabled=true
    DEBUG 18:07:04,191 DefaultHttpParams - Set parameter http.connection.stalecheck = true
    DEBUG 18:07:04,191 Jets3tProperties - httpclient.max-connections=20
    DEBUG 18:07:04,191 Jets3tProperties - httpclient.max-connections-per-host=0
    DEBUG 18:07:04,192 DefaultHttpParams - Set parameter http.connection-manager.max-per-host = {HostConfiguration[]=20}
    DEBUG 18:07:04,192 DefaultHttpParams - Set parameter http.connection-manager.max-total = 20
    DEBUG 18:07:04,192 DefaultHttpParams - Set parameter http.tcp.nodelay = true
    DEBUG 18:07:04,198 Jets3tProperties - httpclient.useragent=null
    DEBUG 18:07:04,219 RestUtils - Setting user agent string: JetS3t/0.8.1 (Linux/2.6.18-274.17.1.el5; amd64; en; JVM 1.6.0_20)
    DEBUG 18:07:04,219 DefaultHttpParams - Set parameter http.useragent = JetS3t/0.8.1 (Linux/2.6.18-274.17.1.el5; amd64; en; JVM 1.6.0_20)
    DEBUG 18:07:04,219 DefaultHttpParams - Set parameter http.protocol.version = HTTP/1.1
    DEBUG 18:07:04,220 DefaultHttpParams - Set parameter http.protocol.expect-continue = true
    DEBUG 18:07:04,220 Jets3tProperties - httpclient.retry-max=5
    DEBUG 18:07:04,220 DefaultHttpParams - Set parameter http.method.retry-handler = org.jets3t.service.utils.RestUtils$1@21635164
    DEBUG 18:07:04,220 Jets3tProperties - httpclient.connection-manager-timeout=0
    DEBUG 18:07:04,220 DefaultHttpParams - Set parameter http.connection-manager.timeout = 0
    DEBUG 18:07:04,222 HttpClient - Java version: 1.6.0_20
    DEBUG 18:07:04,222 HttpClient - Java vendor: Sun Microsystems Inc.
    DEBUG 18:07:04,222 HttpClient - Java class path: /appli57/GenomeAnalysisTK/GATK-2.4-9/GenomeAnalysisTK.jar
    DEBUG 18:07:04,222 HttpClient - Operating system name: Linux
    DEBUG 18:07:04,222 HttpClient - Operating system architecture: amd64
    DEBUG 18:07:04,222 HttpClient - Operating system version: 2.6.18-274.17.1.el5
    DEBUG 18:07:04,224 HttpClient - SUN 1.6: SUN (DSA key/parameter generation; DSA signing; SHA-1, MD5 digests; SecureRandom; X.509 certificates; JKS keystore; PKIX CertPathValidator; PKIX CertPathBuilder; LDAP, Collection CertStores, JavaPolicy Policy; JavaLoginConfig Configuration)
    DEBUG 18:07:04,224 HttpClient - SunRsaSign 1.7: Sun RSA signature provider
    DEBUG 18:07:04,224 HttpClient - SunJSSE 1.6: Sun JSSE provider(PKCS12, SunX509 key/trust factories, SSLv3, TLSv1)
    DEBUG 18:07:04,224 HttpClient - SunJCE 1.7: SunJCE Provider (implements RSA, DES, Triple DES, AES, Blowfish, ARCFOUR, RC2, PBE, Diffie-Hellman, HMAC)
    DEBUG 18:07:04,225 HttpClient - SunJGSS 1.0: Sun (Kerberos v5, SPNEGO)
    DEBUG 18:07:04,225 HttpClient - SunSASL 1.5: Sun SASL provider(implements client mechanisms for: DIGEST-MD5, GSSAPI, EXTERNAL, PLAIN, CRAM-MD5; server mechanisms for: DIGEST-MD5, GSSAPI, CRAM-MD5)
    DEBUG 18:07:04,225 HttpClient - XMLDSig 1.0: XMLDSig (DOM XMLSignatureFactory; DOM KeyInfoFactory)
    DEBUG 18:07:04,225 HttpClient - SunPCSC 1.6: Sun PC/SC provider
    DEBUG 18:07:04,226 Jets3tProperties - s3service.s3-endpoint=null
    DEBUG 18:07:04,227 Jets3tProperties - s3service.default-storage-class=null
    DEBUG 18:07:04,227 Jets3tProperties - httpclient.proxy-autodetect=true
    DEBUG 18:07:04,227 Jets3tProperties - s3service.s3-endpoint=s3.amazonaws.com
    DEBUG 18:07:04,231 PluginProxyUtil - About to attempt auto proxy detection under Java version:1.6.0_20-b20
    DEBUG 18:07:04,231 PluginProxyUtil - Sun Plugin 1.4+ proxy detection class not found, will try failover detection
    DEBUG 18:07:04,231 PluginProxyUtil - Using failover proxy detection...
    DEBUG 18:07:04,231 PluginProxyUtil - Plugin Proxy Config List Property:null
    DEBUG 18:07:04,232 PluginProxyUtil - No configured plugin proxy list
    DEBUG 18:07:04,232 Jets3tProperties - devpay.user-token=null
    DEBUG 18:07:04,232 Jets3tProperties - devpay.product-token=null
    DEBUG 18:07:04,232 Jets3tProperties - httpclient.requester-pays-buckets-enabled=false
    DEBUG 18:07:04,238 RestStorageService - Creating Object with key tzhxWsTWG3dU99NffwiVEjnellz7kROS.report.xml.gz in bucket GATK_Run_Reports
    DEBUG 18:07:04,238 RestStorageService - Uploading object data with Content-Length: 1276
    DEBUG 18:07:04,239 Jets3tProperties - httpclient.read-throttle=0
    DEBUG 18:07:04,239 Jets3tProperties - s3service.s3-endpoint=null
    DEBUG 18:07:04,239 Jets3tProperties - s3service.enable-storage-classes=true
    DEBUG 18:07:04,239 RestStorageService - Creating object bucketName=GATK_Run_Reports, objectKey=tzhxWsTWG3dU99NffwiVEjnellz7kROS.report.xml.gz, storageClass=null. Content-Type=application/octet-stream Including data? true Metadata: {Content-Length=1276, Content-MD5=NoE6c9O5A9QUvGRjuFKMBQ==, md5-hash=36813a73d3b903d414bc6463b8528c05, Content-Type=application/octet-stream} ACL: null
    DEBUG 18:07:04,240 Jets3tProperties - s3service.disable-dns-buckets=false
    DEBUG 18:07:04,240 Jets3tProperties - s3service.s3-endpoint=s3.amazonaws.com
    DEBUG 18:07:04,240 Jets3tProperties - s3service.s3-endpoint-virtual-path=
    DEBUG 18:07:04,241 Jets3tProperties - s3service.s3-endpoint-https-port=443
    DEBUG 18:07:04,241 RestStorageService - S3 URL: https://s3.amazonaws.com:443/GATK_Run_Reports/tzhxWsTWG3dU99NffwiVEjnellz7kROS.report.xml.gz
    DEBUG 18:07:04,264 RestStorageService - Performing PUT request for 'https://s3.amazonaws.com/GATK_Run_Reports/tzhxWsTWG3dU99NffwiVEjnellz7kROS.report.xml.gz', expecting response codes: [200,204]
    DEBUG 18:07:04,264 RestStorageService - Adding authorization for Access Key 'AKIAIMHBU7X642TCHQ2A'.
    DEBUG 18:07:04,264 Jets3tProperties - s3service.s3-endpoint=s3.amazonaws.com
    DEBUG 18:07:04,265 RestStorageService - Canonical string ('|' is a newline): PUT|NoE6c9O5A9QUvGRjuFKMBQ==|application/octet-stream|Tue, 26 Mar 2013 17:07:04 GMT|x-amz-meta-md5-hash:36813a73d3b903d414bc6463b8528c05|/GATK_Run_Reports/tzhxWsTWG3dU99NffwiVEjnellz7kROS.report.xml.gz
    DEBUG 18:07:04,272 MultiThreadedHttpConnectionManager - HttpConnectionManager.getConnection: config = HostConfiguration[host=https://s3.amazonaws.com], timeout = 0
    DEBUG 18:07:04,273 MultiThreadedHttpConnectionManager$ConnectionPool - Allocating new connection, hostConfig=HostConfiguration[host=https://s3.amazonaws.com]
    DEBUG 18:07:04,276 HttpConnection - Open connection to s3.amazonaws.com:443
    INFO 18:07:14,128 ProgressMeter - Y:59343477 4.04e+11 35.5 m 0.0 s 100.0% 35.5 m 0.0 s
    DEBUG 18:07:34,049 GATKRunReport - A problem occurred during GATK run reporting [*** everything is fine, but no report could be generated; please do not post this to the support forum ***]. Message is Run statistics report upload to AWS S3 timed-out

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.4-9-g532efad):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Unable to parse header with error: /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub1833993627566488873.tmp (Too many open files), for input source: /scratch/valleem/C345.TCGA-A3-3308-11A-01D-0966-08.6/org.broadinstitute.sting.gatk.io.stubs.VariantContextWriterStub1833993627566488873.tmp
    ERROR ------------------------------------------------------------------------------------------


    I hope this helps!

  • emixaM

    I just tried HaplotypeCaller, and it ran like a charm...

  • emixaM

    I tried UG on an older BAM that I had already called with UG from GATK 2.1; it got stuck with the same error:

    INFO 14:16:29,699 ProgressMeter - 6:29974767 7.45e+07 6.5 m 5.0 s 32.9% 19.8 m 13.3 m

    INFO 14:16:59,713 ProgressMeter - 6:108496030 8.24e+07 7.0 m 5.0 s 35.4% 19.8 m 12.8 m

    INFO 14:17:29,725 ProgressMeter - 6:109081595 8.24e+07 7.5 m 5.0 s 35.4% 21.2 m 13.7 m

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 2.4-9-g532efad):
    ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ERROR Please do not post this error to the GATK forum
    ERROR
    ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Couldn't read file /scratch/valleem/org.broadinstitute.sting.gatk.io.stubs.OutputStreamStub7788271059597680424.tmp because Unable to open input stream for file with exception /scratch/valleem/org.broadinstitute.sting.gatk.io.stubs.OutputStreamStub7788271059597680424.tmp (Too many open files)
    ERROR ------------------------------------------------------------------------------------------


  • emixaM

    Same old BAM, old GATK (2.1), worked flawlessly:

    INFO 14:32:13,597 TraversalEngine - 19:53454340 2.19e+08 13.0 m 3.6 s 89.1% 14.6 m 95.9 s

    INFO 14:32:43,602 TraversalEngine - 21:34166609 2.27e+08 13.5 m 3.6 s 92.7% 14.6 m 64.1 s

    INFO 14:33:13,638 TraversalEngine - X:49223876 2.37e+08 14.0 m 3.5 s 96.9% 14.5 m 27.0 s

    INFO 14:33:32,299 TraversalEngine - Total runtime 859.04 secs, 14.32 min, 0.24 hours

    INFO 14:33:32,299 TraversalEngine - 135161467 reads were filtered out during traversal out of 413260266 total (32.71%)

    INFO 14:33:32,299 TraversalEngine - -> 532329 reads (0.13% of total) failing BadMateFilter

    INFO 14:33:32,300 TraversalEngine - -> 113425586 reads (27.45% of total) failing DuplicateReadFilter

    INFO 14:33:32,300 TraversalEngine - -> 19454578 reads (4.71% of total) failing FailsVendorQualityCheckFilter

    INFO 14:33:32,300 TraversalEngine - -> 1748974 reads (0.42% of total) failing UnmappedReadFilter

    [~]$

  • Geraldine_VdAuwera (admin)

    Are you using the same command lines with the different versions? And is it still always chrY failing?

    Geraldine Van der Auwera, PhD

  • emixaM

    Exactly the same command line; only the path to the GATK executable changes.

    The old BAM failed sooner with the latest GATK because it is not a one-chromosome extract; it is a whole exome. So it failed at chr6.

  • Geraldine_VdAuwera (admin)

    Can you please post the command line you're using? Also, what different settings of -nt/-nct have you tried? Is there a lower number at which they don't fail, or does it always fail when multithreading?

    Geraldine Van der Auwera, PhD

  • emixaM

    The command is:

    java -Djava.io.tmpdir=/scratch/user/ -Xmx20g -jar /appli57/GenomeAnalysisTK/GATK_2.1-11/GenomeAnalysisTK.jar -T UnifiedGenotyper -nt 6 -R /scratch/appli57_local_duplicates/reference/Homo_sapiens_assembly19.fasta -I /data/file.marked.realigned.fixed.recal.bam --dbsnp /scratch/appli57_local_duplicates/dbsnp/dbsnp_132.b37.vcf -L /scratch/appli57_local_duplicates/reference/exome_target_intervals.bed --metrics_file /data/file.snps.metrics -o /data/test.vcf 2>&1

    And I am just changing the path to GATK from /appli57/GenomeAnalysisTK/GATK_2.1-11/GenomeAnalysisTK.jar to /appli57/GenomeAnalysisTK/GATK-2.4-9/GenomeAnalysisTK.jar

    And I tried a couple of multithreading options, from none up to everything on one node, following the post you pointed me to, and it never ran to completion.

  • Geraldine_VdAuwera (admin)

    I'm sorry, but there's not much I can do for you unless you can isolate a specific slice of data that UG is choking on. Otherwise I have to chalk it up to transient issues with your filesystem.
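
    For what it's worth, one way to carve out such a slice (a sketch using samtools; coordinates and file names are placeholders):

    # extract a small suspect region into its own BAM and index it
    samtools view -b whole.bam Y:59300000-59373566 > slice.bam
    samtools index slice.bam
    # then point UG at just that slice
    java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R ref.fasta -I slice.bam -L Y:59300000-59373566 -o slice.vcf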

    Geraldine Van der Auwera, PhD
