# warnings and errors during score calibration

Posts: 19 · Member
edited October 2012

Dear GATK team,

The command I used:

 java -Xmx4g -jar /usr/local/src/gatk/GenomeAnalysisTK-2.0-0-g4c0ffd4/GenomeAnalysisTK.jar -T BaseRecalibrator -I my_merged_lane1.bam -R reference_genome.fasta -knownSites my_snps.bed -o recal_data_lane1.grp


It seemed to work, but I got the following:

WARN  23:15:29,538 RestStorageService - Error Response: PUT '/GATK_Run_Reports/7wP8pzXrHgtLBjkDcJtGqpqLljupw7aN.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 343, Content-MD5: gmCP98zrgZqoNpLiyKC2+w==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 82608ff7cceb819aa83692e2c8a0b6fb, Date: Thu, 25 Oct 2012 21:15:28 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:rX//6IdcVn7cmu+vh3BM1OdRuG0=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-71.29.1.el6.x86_64; amd64; en; JVM 1.6.0_17), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 44927B6871494046, x-amz-id-2: 6PtxgxrhMcCLXlVmKviqkFWT+jHmrg/hOvEqtJ1Z160m9O7aoxTYVnNq/OSGMkg9, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 25 Oct 2012 20:43:55 GMT, Connection: close, Server: AmazonS3]
WARN  23:15:30,558 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately -1894 seconds. Retrying connection.
INFO  23:15:31,373 GATKRunReport - Uploaded run statistics report to AWS S3


And for -T PrintReads I got:

WARN  01:32:59,987 RestStorageService - Error Response: PUT '/GATK_Run_Reports/rDrSv8aayCQwBzzwAC3R4P1NNhU8eNtF.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 347, Content-MD5: /HRSKSCV6FIXe/03JWlMwQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: fc7452292095e852177bfd3725694cc1, Date: Thu, 25 Oct 2012 23:32:58 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:ye/pDLSwNPHgH2kNUHKWCAt3YQ4=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-71.29.1.el6.x86_64; amd64; en; JVM 1.6.0_17), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 11DE8B74B2E028F3, x-amz-id-2: /mV+Znu5Rq1j8tub42Y4CNz2lD5npQrtgFDM2OL5Tap3Whtt4rL4KOJLFtqhgNbA, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 25 Oct 2012 23:01:26 GMT, Connection: close, Server: AmazonS3]
WARN  01:33:00,975 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately -1893 seconds. Retrying connection.


I would be grateful if you could let me know whether I'm doing something wrong, and whether I should be worried about this.

Thanks a lot.

Hi there,

It looks like the phone home feature is causing problems (likely to happen if you were running without an internet connection). See this article for an explanation and solution:

Geraldine Van der Auwera, PhD

Posts: 19 · Member

Thanks a lot for the quick reply. I do have an internet connection, so I will try to understand why it happens.

I can use the output that was generated, right?

Thanks.

Yes, your output should be fine; it's just the report upload that didn't work. It may have been a transient issue on the Amazon cloud side. If it doesn't happen again, don't worry about it.
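The warning lines themselves record two timestamps: the Date the local machine stamped on the upload request and the Date in the S3 response. A minimal Python sketch (standard library only) that computes the gap between them; it assumes both headers were produced within a second or so of each other, `clock_skew_seconds` is a hypothetical helper name, and the 15-minute limit is Amazon's documented tolerance for signed S3 requests (the condition behind the RequestTimeTooSkewed message), not a GATK setting.

```python
from email.utils import parsedate_to_datetime

# Amazon S3 rejects signed requests whose Date header is more than
# 15 minutes (900 s) away from the server clock (RequestTimeTooSkewed).
S3_MAX_SKEW_SECONDS = 15 * 60


def clock_skew_seconds(request_date: str, response_date: str) -> int:
    """Return server time minus local time, in seconds, from two HTTP Date headers."""
    local = parsedate_to_datetime(request_date)    # Date header sent by the local machine
    server = parsedate_to_datetime(response_date)  # Date header returned by S3
    return int((server - local).total_seconds())


if __name__ == "__main__":
    # Date headers copied from the BaseRecalibrator warning in this thread.
    skew = clock_skew_seconds("Thu, 25 Oct 2012 21:15:28 GMT",
                              "Thu, 25 Oct 2012 20:43:55 GMT")
    print(skew)                              # -1893
    print(abs(skew) > S3_MAX_SKEW_SECONDS)   # True
```

For the BaseRecalibrator warning this gives -1893 seconds, close to the approximately -1894 seconds JetS3t reports and well past 15 minutes, which is consistent with the 403 response; after JetS3t adjusted its time offset, the retry succeeded (the INFO line following the first warning).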

Geraldine Van der Auwera, PhD

Posts: 19 · Member

Thanks!

Posts: 4 · Member

Hi Geraldine,

Can you tell me why GATK tries to upload a report to the Amazon cloud? Is it for debugging purposes?

Kind regards

Yes, and to collect general usage statistics. It helps us understand which tools people use most and what the most common error modes are. It can be turned off if it causes trouble, but we do prefer to have it on because the data is useful and helps us improve the software.

Geraldine Van der Auwera, PhD

Posts: 16 · Member

To be honest, I don't mind the phone-home feature, as I do believe it will help improve the tools. However, I am experiencing a problem with it (I think): more than half of my runs fail with the following error:

WARN  12:24:45,542 RestStorageService - Error Response: PUT '/GATK_Run_Reports/hGXmuykExcJeliySbqP0nxFdoFtbIC9d.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 402, Content-MD5: 40vJhBYx0ynIeLx3O/8BqQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: e34bc9841631d329c878bc773bff01a9, Date: Mon, 20 May 2013 04:24:43 GMT, Authorization: AWS AKIAIMHBU7X642TCHQ2A:xrZWTOIxf8j5dHmJtT61cUFMYvA=, User-Agent: JetS3t/0.8.1 (Linux/3.2.0-26-generic; amd64; en; JVM 1.6.0_24), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: A5080602D6BE61D6, x-amz-id-2: 0a31I06+3RWGOjEjMhojsioYHbyjK8u5jxykQqSrtjg+SmggliXl50HkZqKTr5Zh, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Mon, 20 May 2013 04:03:34 GMT, Connection: close, Server: AmazonS3]
WARN  12:24:46,516 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately -1271 seconds. Retrying connection.


As we are using a server cluster and I am not the administrator, I have no idea how to solve this problem locally. I guess that leaves me only the option of switching off the phone-home feature?

Hi @shingwan,

Normally, a failure of the phone-home connection shouldn't make the run itself fail. What you see there is just a warning about the connection failure; if your run failed, there should be more information in the console output about the actual problem that caused the failure.
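Following that advice, one quick check is to separate the phone-home messages in a failed run's console output from GATK's own error lines. A minimal Python sketch; the regular expressions assume the `WARN  HH:MM:SS,mmm Component - message` layout of the warnings and the `##### ERROR MESSAGE:` prefix quoted in this thread, and `triage` is a hypothetical helper, not part of GATK.

```python
import re

# Phone-home lines look like "WARN  HH:MM:SS,mmm RestStorageService - ..."
# or "INFO  HH:MM:SS,mmm GATKRunReport - ..." in the logs in this thread.
PHONE_HOME = re.compile(r"^(?:WARN|INFO)\s+\S+\s+(?:RestStorageService|GATKRunReport)\b")
# GATK error lines in this thread start with a run of '#' followed by ERROR.
GATK_ERROR = re.compile(r"^#+\s*ERROR")


def triage(lines):
    """Split console lines into phone-home messages and GATK error lines."""
    phone_home, errors = [], []
    for line in lines:
        if PHONE_HOME.match(line):
            phone_home.append(line)
        elif GATK_ERROR.match(line):
            errors.append(line)
    return phone_home, errors


if __name__ == "__main__":
    sample = [
        "WARN  12:24:46,516 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error.",
        "INFO  23:15:31,373 GATKRunReport - Uploaded run statistics report to AWS S3",
        "##### ERROR MESSAGE: Your input file has a malformed header: Count < 0 for fixed size VCF header field PL",
    ]
    phone_home, errors = triage(sample)
    print(len(phone_home), len(errors))  # 2 1
```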

Geraldine Van der Auwera, PhD

Posts: 16 · Member

@Geraldine_VdAuwera, thank you for the information. However, whenever my job fails, that is the only warning I see. Most of the time, a re-run solves the problem (that is, I just use the same parameters and execute the queue again). That confuses me. I am trying to reproduce the error and may be able to send the error log.

Posts: 2 · Member

@Geraldine_VdAuwera I have run into the same problem as discussed above. My output is fine, but when I try to find the concordance between the GATK and samtools results, I get an error:

##### ERROR MESSAGE: Your input file has a malformed header: Count < 0 for fixed size VCF header field PL

The error is in the attachment. The first picture is what I got when I did variant calling; the second is part of my SNP VCF file. I hope both help. Thank you for your help.

Hello,

What version of GATK are you using? This looks like an old bug.

If the issue is just with your header, you can simply re-header it.

-Sheila

Posts: 2 · Member
edited August 15

snp.samtools.vcf:

##fileformat=VCFv4.1
##samtoolsVersion=0.1.13 (r926:134)
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability that sample chromosomes are not all the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the site allele frequency of the first ALT allele">
##INFO=<ID=CI95,Number=2,Type=Float,Description="Equal-tail Bayesian credible interval of the site allele frequency at the 95% level">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR1003776
NR_002765       113     .       A       G       5.46    .       DP=7;AF1=0.4999;CI95=0.5,0.5;DP4=4,1,1,1;MQ=60;FQ=7.8;PV4=1,0.0056,1,0.44


snp.gatk.vcf:

##fileformat=VCFv4.1
##FILTER=<ID=FS,Description="FS > 30.0">
##FILTER=<ID=LowQual,Description="Low quality">
##FILTER=<ID=QD,Description="QD < 2.0">
##FILTER=<ID=SnpCluster,Description="SNPs found in clusters">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
##GATKCommandLine=<ID=HaplotypeCaller,Version=3.2-2-gec30cee,Date="Thu Aug 14 22:13:55 CST 2014",Epoch=1408025635745,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[split.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/mnt/ext2/chenjh/reference_gatk/refMrna.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_w


I am confused about why the PL field (ID=PL) has so many lines of details. What also confuses me is that the file has many ##contig lines:

##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##contig=<ID=NM_001013643,length=386>
##contig=<ID=NM_024058,length=1214>
##contig=<ID=NM_001037802,length=891>
##contig=<ID=NM_001080438,length=1023>
##contig=<ID=NR_029462,length=608>
##contig=<ID=NM_032102,length=4349>
##contig=<ID=NR_002765,length=1179>
##contig=<ID=NR_002179,length=166>
##contig=<ID=NR_024620,length=1262>
##contig=<ID=NR_026770,length=780>
##contig=<ID=NR_026865,length=2299>
##contig=<ID=NR_033971,length=1656>
##contig=<ID=NR_103559,length=727>
##contig=<ID=NR_024207,length=7119>
##contig=<ID=NR_104143,length=1440>
##contig=<ID=NR_036487,length=1144>
##contig=<ID=NR_037885,length=1084>
##contig=<ID=NR_046283,length=575>
##contig=<ID=NR_028393,length=7682>
##contig=<ID=NR_037665,length=3881>


Thank you for your help.

Hello,

The issue is that in snp.samtools.vcf the PL Number is -1, while in snp.gatk.vcf it is G, so there is a mismatch. That negative Number is what the "Count < 0 for fixed size VCF header field PL" error refers to.

You can just edit the PL Number in the snp.samtools.vcf header to G; I think that should fix this issue.
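Editing the header by hand works; for a scripted version, here is a minimal Python sketch that rewrites only the PL FORMAT header line and passes every other line through unchanged. It assumes the header line looks exactly like the one in snp.samtools.vcf above (`##FORMAT=<ID=PL,Number=-1,...`); `fix_pl_header` and the output filename in the usage comment are hypothetical.

```python
import re
import sys

# samtools 0.1.13 writes the PL header with Number=-1; GATK rejects a negative
# count ("Count < 0"), whereas Number=G means "one value per possible genotype".
PL_HEADER = re.compile(r"^(##FORMAT=<ID=PL,Number=)-1(,)")


def fix_pl_header(lines):
    """Yield VCF lines, rewriting only the PL FORMAT header to Number=G."""
    for line in lines:
        if line.startswith("##"):  # only meta-information lines are touched
            line = PL_HEADER.sub(r"\g<1>G\g<2>", line)
        yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python fix_pl_header.py snp.samtools.vcf > snp.samtools.fixed.vcf
    with open(sys.argv[1]) as vcf:
        sys.stdout.writelines(fix_pl_header(vcf))
```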

You can also refer to the VCF format specification here:
http://www.1000genomes.org/wiki/Analysis/Variant Call Format/vcf-variant-call-format-version-41
Specifically, look under "1. Meta-information lines" for an explanation of the Number entry.
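To see why PL cannot have a fixed count: with Number=G, the number of values depends on how many ALT alleles each record has. A short Python sketch (requires Python 3.8+ for `math.comb`); the `ploidy` parameter is an illustrative addition with a diploid default, and for ploidy 2 the function reproduces the (#ALT+1)*(#ALT+2)/2 formula from the samtools PL description quoted above.

```python
from math import comb


def genotype_count(n_alt: int, ploidy: int = 2) -> int:
    """Number of values in a Number=G field for a record with n_alt ALT alleles."""
    n_alleles = n_alt + 1  # REF plus each ALT allele
    # Unordered genotypes = multisets of size `ploidy` drawn from n_alleles.
    return comb(n_alleles + ploidy - 1, ploidy)


if __name__ == "__main__":
    for n_alt in range(1, 4):
        # Diploid: prints "1 3", "2 6", "3 10"
        print(n_alt, genotype_count(n_alt))
```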

I hope this helps!

-Sheila