
warnings and errors during score calibration

gilgi Posts: 19 Member
edited October 2012 in Ask the GATK team

Dear GATK team,

The command I used:

 java -Xmx4g -jar /usr/local/src/gatk/GenomeAnalysisTK-2.0-0-g4c0ffd4/GenomeAnalysisTK.jar -T BaseRecalibrator -I my_merged_lane1.bam -R reference_genome.fasta -knownSites my_snps.bed -o recal_data_lane1.grp

It seemed to work, but I got the following warnings:

WARN  23:15:29,538 RestStorageService - Error Response: PUT '/GATK_Run_Reports/7wP8pzXrHgtLBjkDcJtGqpqLljupw7aN.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 343, Content-MD5: gmCP98zrgZqoNpLiyKC2+w==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: 82608ff7cceb819aa83692e2c8a0b6fb, Date: Thu, 25 Oct 2012 21:15:28 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:rX//6IdcVn7cmu+vh3BM1OdRuG0=, User-Agent: JetS3t/0.8.1 (Linux/2.6.32-71.29.1.el6.x86_64; amd64; en; JVM 1.6.0_17), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 44927B6871494046, x-amz-id-2: 6PtxgxrhMcCLXlVmKviqkFWT+jHmrg/hOvEqtJ1Z160m9O7aoxTYVnNq/OSGMkg9, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 25 Oct 2012 20:43:55 GMT, Connection: close, Server: AmazonS3] 
WARN  23:15:30,558 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately -1894 seconds. Retrying connection. 
INFO  23:15:31,373 GATKRunReport - Uploaded run statistics report to AWS S3 

And for -T PrintReads I got:

WARN  01:32:59,987 RestStorageService - Error Response: PUT '/GATK_Run_Reports/rDrSv8aayCQwBzzwAC3R4P1NNhU8eNtF.report.xml.gz' -- 
ResponseCode: 403, ResponseStatus: Forbidden, Request Headers:
[Content-Length: 347, Content-MD5: /HRSKSCV6FIXe/03JWlMwQ==, Content-Type:
application/octet-stream, x-amz-meta-md5-hash:
fc7452292095e852177bfd3725694cc1, Date: Thu, 25 Oct 2012 23:32:58 GMT,
Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:ye/pDLSwNPHgH2kNUHKWCAt3YQ4=,
User-Agent: JetS3t/0.8.1 (Linux/2.6.32-71.29.1.el6.x86_64; amd64; en; JVM
1.6.0_17), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers:
[x-amz-request-id: 11DE8B74B2E028F3, x-amz-id-2:
/mV+Znu5Rq1j8tub42Y4CNz2lD5npQrtgFDM2OL5Tap3Whtt4rL4KOJLFtqhgNbA,
Content-Type: application/xml, Transfer-Encoding: chunked, Date: Thu, 25 Oct
2012 23:01:26 GMT, Connection: close, Server: AmazonS3] 
WARN  01:33:00,975 RestStorageService - Adjusted time offset in response to
RequestTimeTooSkewed error. Local machine and S3 server disagree on the time
by approximately -1893 seconds. Retrying connection. 

I would be grateful if you could let me know whether I'm doing something wrong, and whether I should be worried about this.

Thanks a lot.


Answers

  • Geraldine_VdAuwera Posts: 6,180 Administrator, GATK Developer admin

    Hi there,

    It looks like the phone home feature is causing problems (likely to happen if you were running without an internet connection). See this article for an explanation and solution:

    http://www.broadinstitute.org/gatk/guide/article?id=1250

    Geraldine Van der Auwera, PhD

  • gilgi Posts: 19 Member

    Thanks a lot for the quick reply. I do have an internet connection, so I will try to understand why it happens.

    I can still use the output that was generated, right?

    Thanks.

  • Geraldine_VdAuwera Posts: 6,180 Administrator, GATK Developer admin

    Yes, your output should be fine -- it's just the report upload that didn't work. It may just have been a transient issue on the Amazon cloud side. If it doesn't happen again, don't worry about it.

    Geraldine Van der Auwera, PhD

  • didiercroes Posts: 4 Member

    Hi Geraldine,

    Can you tell me why GATK tries to upload a report to the Amazon cloud? Is it for debugging purposes?

    Kind regards

  • Geraldine_VdAuwera Posts: 6,180 Administrator, GATK Developer admin

    Yes, and to collect general usage statistics. It helps us understand what tools people use the most and what are the most common error modes. It can be turned off if it causes trouble, but we do prefer to have it on because the data is useful and helps us improve the software.

    Geraldine Van der Auwera, PhD

  • shingwan Posts: 16 Member

    To be honest, I don't mind the phone home feature, as I do believe it will help to improve the tools. However, I am (I guess) experiencing a problem with it: more than half of my runs fail with the following warnings:

    WARN  12:24:45,542 RestStorageService - Error Response: PUT '/GATK_Run_Reports/hGXmuykExcJeliySbqP0nxFdoFtbIC9d.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 402, Content-MD5: 40vJhBYx0ynIeLx3O/8BqQ==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: e34bc9841631d329c878bc773bff01a9, Date: Mon, 20 May 2013 04:24:43 GMT, Authorization: AWS AKIAIMHBU7X642TCHQ2A:xrZWTOIxf8j5dHmJtT61cUFMYvA=, User-Agent: JetS3t/0.8.1 (Linux/3.2.0-26-generic; amd64; en; JVM 1.6.0_24), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: A5080602D6BE61D6, x-amz-id-2: 0a31I06+3RWGOjEjMhojsioYHbyjK8u5jxykQqSrtjg+SmggliXl50HkZqKTr5Zh, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Mon, 20 May 2013 04:03:34 GMT, Connection: close, Server: AmazonS3]
    WARN  12:24:46,516 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately -1271 seconds. Retrying connection.
    

    We are using a server cluster and I am not the administrator, so I really have no idea how to solve this problem locally. I guess that only leaves me with the option of switching off the phone home feature?
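The size of the clock disagreement can be checked from the warning itself, since the request and response each carry a Date header. The following is a minimal sketch in Python, not part of GATK or JetS3t; the function name `clock_skew_seconds` is made up for illustration, and the two timestamps are copied from the warning above.

```python
from email.utils import parsedate_to_datetime


def clock_skew_seconds(request_date: str, response_date: str) -> int:
    """Return server time minus local time, in whole seconds.

    request_date is the Date header sent by the local machine,
    response_date is the Date header returned by the S3 server.
    """
    local = parsedate_to_datetime(request_date)
    server = parsedate_to_datetime(response_date)
    return int((server - local).total_seconds())


# Date headers copied from the warning above.
skew = clock_skew_seconds(
    "Mon, 20 May 2013 04:24:43 GMT",  # request header (local clock)
    "Mon, 20 May 2013 04:03:34 GMT",  # response header (S3 clock)
)
print(skew)
```

A negative value means the local clock is ahead of the server. Here the result is within a couple of seconds of the -1271 seconds reported in the log, which suggests the cluster node's system clock is roughly 21 minutes fast. If that is the case, it would be something for the cluster administrator to correct rather than a GATK setting.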

  • Geraldine_VdAuwera Posts: 6,180 Administrator, GATK Developer admin

    Hi @shingwan,

    Normally the failure of the phone home connection shouldn't be making the run itself fail. What you see there is just a warning about the connection failure, but if your run failed there should be more information in the console output about what is the actual problem that caused the failure.

    Geraldine Van der Auwera, PhD

  • shingwan Posts: 16 Member

    @Geraldine_VdAuwera, thank you for the information. However, whenever my job fails, that is the only warning I see. Most of the time a re-run solves the problem (that is, I just use the same parameters and run the queue again), which confuses me. I am trying to reproduce the error so that I can send you the error log.

  • theomarker qingdao Posts: 2 Member
    edited August 14

    @Geraldine_VdAuwera I have run into the same problem as discussed above. My output is fine, but when I try to find the concordance between the GATK and samtools results, I get an error:

    ERROR MESSAGE: Your input file has a malformed header: Count < 0 for fixed size VCF header field PL

    The errors are shown in the attachments. The first picture is what I got when I did variant calling. The second is part of my SNP VCF file. I hope both are of some help. Thank you for your help.

    [Attachments: error1.png, error2.png]
  • Sheila Broad Institute Posts: 428 Member, GATK Developer, Broadie, Moderator admin

    @theomarker

    Hello,

    What version of GATK are you using? This looks like an old bug.

    If the issue is just with your header, you can simply re-header it.

    -Sheila

  • theomarker qingdao Posts: 2 Member
    edited August 15

    @Sheila many thanks for your quick reply, but I still need your help. My GATK version is the latest. By re-header, do you mean using Picard tools (AddOrReplaceReadGroups)? If so, I have tried that but it still doesn't work.

    Let me explain my problem in more detail. I followed the workflow "Calling variants in RNAseq" on the GATK website. After the "Split'N'Trim and reassign mapping qualities" step, I had a BAM file. I then used this BAM file to call variants with both GATK and samtools, which gave me two raw VCF files. When I try to find the concordance between these two VCF files, I get the error:

    ERROR MESSAGE: Your input file has a malformed header: Count < 0 for fixed size VCF header field PL

    The following are parts of my VCF files.

    snp.samtools.vcf:

    ##fileformat=VCFv4.1
    ##samtoolsVersion=0.1.13 (r926:134)
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
    ##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
    ##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
    ##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability that sample chromosomes are not all the same">
    ##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the site allele frequency of the first ALT allele">
    ##INFO=<ID=CI95,Number=2,Type=Float,Description="Equal-tail Bayesian credible interval of the site allele frequency at the 95% level">
    ##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
    ##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
    ##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
    ##FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR1003776
    NR_002765       113     .       A       G       5.46    .       DP=7;AF1=0.4999;CI95=0.5,0.5;DP4=4,1,1,1;MQ=60;FQ=7.8;PV4=1,0.0056,1,0.44    
    

    snp.gatk.vcf:

    ##fileformat=VCFv4.1
    ##FILTER=<ID=FS,Description="FS > 30.0">
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FILTER=<ID=QD,Description="QD < 2.0">
    ##FILTER=<ID=SnpCluster,Description="SNPs found in clusters">
    ##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##GATKCommandLine=<ID=HaplotypeCaller,Version=3.2-2-gec30cee,Date="Thu Aug 14 22:13:55 CST 2014",Epoch=1408025635745,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[split.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/mnt/ext2/chenjh/reference_gatk/refMrna.fa nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_w
    

    I am confused about why ID=PL has so many lines of details. What also confuses me is that the file has many ##contig lines:

    ##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
    ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
    ##contig=<ID=NM_001013643,length=386>
    ##contig=<ID=NM_024058,length=1214>
    ##contig=<ID=NM_001037802,length=891>
    ##contig=<ID=NM_001080438,length=1023>
    ##contig=<ID=NR_029462,length=608>
    ##contig=<ID=NM_032102,length=4349>
    ##contig=<ID=NR_002765,length=1179>
    ##contig=<ID=NR_002179,length=166>
    ##contig=<ID=NR_024620,length=1262>
    ##contig=<ID=NR_026770,length=780>
    ##contig=<ID=NR_026865,length=2299>
    ##contig=<ID=NR_033971,length=1656>
    ##contig=<ID=NR_103559,length=727>
    ##contig=<ID=NR_024207,length=7119>
    ##contig=<ID=NR_104143,length=1440>
    ##contig=<ID=NR_036487,length=1144>
    ##contig=<ID=NR_037885,length=1084>
    ##contig=<ID=NR_046283,length=575>
    ##contig=<ID=NR_028393,length=7682>
    ##contig=<ID=NR_037665,length=3881>
    

    Thank you for your help.

  • Sheila Broad Institute Posts: 428 Member, GATK Developer, Broadie, Moderator admin

    @theomarker

    Hello,

    The issue is that in snp.samtools.vcf the PL header line has Number=-1, while in snp.gatk.vcf it has Number=G, so there is a mismatch.

    You can just edit the PL Number in snp.samtools.vcf to G, and I think that should fix this issue.

    You can also refer here for the VCF format specification: http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41 Specifically, look under "1. Meta-information lines" for an idea of what the Number entry is.
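The header edit suggested above could be scripted along these lines. This is only a sketch in Python under stated assumptions: the function name `fix_pl_number` is hypothetical and not part of GATK or samtools, it touches nothing except the Number entry of the `##FORMAT=<ID=PL,...>` line, and the sample header lines are copied from snp.samtools.vcf as posted.

```python
import re

# Matches the Number=... entry inside a ##FORMAT=<ID=PL,...> header line.
_PL_NUMBER = re.compile(r"^(##FORMAT=<ID=PL,Number=)[^,]+")


def fix_pl_number(lines, new_number="G"):
    """Yield VCF lines with the PL FORMAT header's Number set to new_number.

    All other header lines and all data lines pass through unchanged.
    """
    for line in lines:
        if line.startswith("##FORMAT=<ID=PL,"):
            line = _PL_NUMBER.sub(lambda m: m.group(1) + new_number, line, count=1)
        yield line


# Two header lines copied from snp.samtools.vcf as posted above.
header = [
    '##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">\n',
    '##FORMAT=<ID=PL,Number=-1,Type=Integer,Description="List of Phred-scaled genotype likelihoods, number of values is (#ALT+1)*(#ALT+2)/2">\n',
]
fixed = list(fix_pl_number(header))
print(fixed[1], end="")
```

To apply it to a real file, the same generator can be fed an open file handle and its output written to a new file, which keeps the original VCF intact in case the edit needs to be checked or undone.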

    I hope this helps!

    -Sheila
