ContEst

Hello,

I would like to use ContEst, and I was wondering where I can get the populationSites.interval_list file.
I have the intervalListFromRTC.intervals file for my samples from RealignerTargetCreator, but that is not the population sites interval list, right?

Thank you.

Issue · Github
by Sheila

Issue Number: 746 · State: closed · Closed By: chandrans

Answers

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @alejandra
    Hi,

    Sorry for the delay. I am consulting with the team. We will get back to you soon.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @alejandra
    Hi,

    You can just use the popfile VCF as the list of sites to look at. :smile:
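
Sheila's suggestion works because the popfile VCF already lists the sites: each data record's CHROM and POS is one site. If you ever need those sites as a separate GATK-style interval file (one `contig:start-end` per line) instead of passing the VCF directly, a minimal Python sketch could extract them as below. This is an illustration, not a GATK utility; the function names are hypothetical, and it assumes a plain or gzip-compressed VCF with the standard tab-separated columns.

```python
import gzip
from typing import Iterable, List


def open_vcf(path: str):
    """Hypothetical helper (not part of GATK): open a plain or .gz VCF as text."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def vcf_sites_to_intervals(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: turn VCF data lines into 'contig:start-end' intervals.

    Each interval spans the REF allele, so a SNP becomes a single base
    written as 'contig:pos'.
    """
    intervals = []
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, the #CHROM header, and blank lines
        fields = line.rstrip("\r\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        end = pos + len(ref) - 1
        intervals.append(f"{chrom}:{pos}" if end == pos else f"{chrom}:{pos}-{end}")
    return intervals


def write_intervals(intervals: List[str], out_path: str) -> None:
    """Write one interval per line."""
    with open(out_path, "w") as out:
        out.write("\n".join(intervals) + "\n")
```

For example, `with open_vcf("b37_population_stratified_af_hapmap_3.3.vcf.gz") as vcf: write_intervals(vcf_sites_to_intervals(vcf), "popfile_sites.intervals")` would write one line per popfile record (the output filename here is hypothetical).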

    -Sheila

  • alejandra · Spain · Member

    OK, thanks. I'll have a go.

  • sespiritu · Toronto, ON, Canada · Member

    Hello,

    In GATK v3.4.0, I had been running ContEst externally (since it wasn't incorporated into GATK yet), and it ran smoothly. I've noticed that in GATK v3.5.0, the output VCFs of HaplotypeCaller and GenotypeGVCFs are both in VCFv4.2 format. When I run my pipeline with the external ContEst on these files, I get an error:

    ERROR MESSAGE: Your input file has a malformed header: VCFv4.2 is not a supported version
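
The version named in this message comes from the `##fileformat` line, which the VCF specification requires to be the first line of the file. Before handing a file to an older tool, you can check which version it declares. A minimal Python sketch; `vcf_version` and `vcf_version_of_file` are hypothetical helpers, not part of GATK or ContEst:

```python
import gzip
from typing import Optional

FILEFORMAT_PREFIX = "##fileformat="


def vcf_version(first_line: str) -> Optional[str]:
    """Return the declared version (e.g. 'VCFv4.2') from a VCF's first line,
    or None if the line is not a ##fileformat line."""
    if first_line.startswith(FILEFORMAT_PREFIX):
        return first_line[len(FILEFORMAT_PREFIX):].strip()
    return None


def vcf_version_of_file(path: str) -> Optional[str]:
    """Hypothetical helper: read only the first line of a plain or .gz VCF."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return vcf_version(handle.readline())
```

Going by the report above, `vcf_version_of_file("SNPs.vcf")` on a file written by GATK v3.5.0 would be expected to return `"VCFv4.2"`.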

    I also tried running ContEst through GATK itself, with the "java -jar GenomeAnalysisTK.jar -T ContEst" command, to see whether the issue is fixed there, but I get a different error:

    ERROR MESSAGE: Could not create module String because an exception of type NullPointerException occurred caused by exception null

    Thank you
    Shadrielle

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @sespiritu
    Hi Shadrielle,

    Can you please post the exact command you ran in GATK?

    Thanks,
    Sheila

  • sespiritu · Toronto, ON, Canada · Member

    @Sheila

    The GATK command:

    java -Xmx10g -jar GenomeAnalysisTK.jar -T ContEst -R hs37d5.fa -I --genotypes <SNPs.vcf> --popfile b37_population_stratified_af_hapmap_3.3.vcf.gz --lane_level_contamination LANE -isr INTERSECTION -o output.tsv

    A couple of notes: this run is on exome data, SNPs.vcf is the filtered output of the GenotypeGVCFs step, and the population file came with the ContEst package.

    Thank you

    Issue · Github
    by Sheila

    Issue Number: 810 · State: closed · Closed By: chandrans
  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @sespiritu
    Hi,

    Can you submit a bug report? Please be sure to include a copy of the popfile. Instructions are here.

    Thanks,
    Sheila

  • sespiritu · Toronto, ON, Canada · Member
    edited April 2016

    @Sheila

    I have uploaded the tar file: sespiritu_ContEst_bugs.tar.gz

    Thank you

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @sespiritu
    Hi,

    Thank you. I will have a look soon and let you know when I submit a bug report.

    -Sheila

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @sespiritu
    Hi,

    The error message you are getting seems to be caused by --lane_level_contamination LANE, because LANE is not a valid value for that option. Have a look at the options here. Once you fix that (set it to READGROUP instead of LANE), you will get another error message:

    ##### ERROR MESSAGE: The provided VCF file is malformed at approximately line number 4: The VCF specification does not allow for whitespace in the INFO field.

    It looks like a tab got converted to a space.
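
One way to locate records like the one GATK is complaining about is to scan the data lines for whitespace inside the INFO column (the eighth tab-separated field) and for lines with fewer than the eight fixed columns, which is what you would see if a tab earlier in the line had been replaced by a space. A minimal Python sketch; `find_info_whitespace` is a hypothetical helper, not a GATK tool, and it reports exact 1-based line numbers, which may differ slightly from the approximate number in GATK's message.

```python
import io
from typing import Iterable, List, Tuple

# The eight fixed VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO
FIXED_COLUMNS = 8
INFO_INDEX = 7


def find_info_whitespace(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Hypothetical helper (not part of GATK): flag malformed VCF data lines.

    Returns (1-based line number, detail) pairs for data lines that either
    have fewer than 8 tab-separated columns or contain whitespace in INFO.
    """
    problems = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < FIXED_COLUMNS:
            problems.append((lineno, "fewer than 8 tab-separated columns"))
        elif any(ch.isspace() for ch in fields[INFO_INDEX]):
            problems.append((lineno, fields[INFO_INDEX]))
    return problems


# Demo on a small inline VCF whose 4th line has a space inside INFO.
sample = io.StringIO(
    "##fileformat=VCFv4.1\n"
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
    "1\t100\t.\tA\tG\t.\tPASS\tAF=0.1\n"
    "1\t200\t.\tC\tT\t.\tPASS\tAF=0.2 EUR=0.3\n"
)
for lineno, detail in find_info_whitespace(sample):
    print(f"line {lineno}: {detail!r}")
```

To check a real file, pass an open handle, e.g. `with open("popfile.vcf") as handle: find_info_whitespace(handle)` (the filename is hypothetical); for a `.vcf.gz`, open it with `gzip.open(path, "rt")` instead.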

    Once you fix that, I think ContEst will run fine for you :smile:

    -Sheila

  • sespiritu · Toronto, ON, Canada · Member

    Ah yes, thanks for pointing that out. Everything's working well! Thank you for the help, Sheila.

  • achalneupane · Member
    edited June 2016

    Hi Sheila,
    I have finished running the ContEst tool, but I keep getting the error below. Could you please explain what is causing it? The output I got contains only the header line.

    INFO 11:24:50,318 TraversalEngine - -> 1071322 reads (2.93% of total) failing DuplicateReadFilter
    INFO 11:24:50,319 TraversalEngine - -> 9567 reads (0.03% of total) failing UnmappedReadFilter
    WARN 11:24:52,049 RestStorageService - Error Response: PUT '/GATK_Run_Reports/AOq57TYVv9XdotgrCknseGPfni4zIsNL.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 1319, Content-MD5: ti83pGkhq2oc8Jtp//UZsw==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: b62f37a46921ab6a1cf09b69fff519b3, Date: Fri, 03 Jun 2016 01:24:50 GMT, Authorization: AWS AKIAJXU7VIHBPDW4TDSQ:jZCevAW/CXHt5E6D7IGGGFrgC9g=, User-Agent: JetS3t/0.8.0 (Linux/2.6.32-573.26.1.el6.x86_64; amd64; en; JVM 1.6.0_30), Host: s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: CC2BE27E2DFEA122, x-amz-id-2: o/3oqAh4k8T/k1OPnRsNiWUL0DUc+opVIjIXfS+GLqmIGblITVM9YwF5yEcVXP1nwOE8fh5fBi4=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Fri, 03 Jun 2016 01:24:51 GMT, Connection: close, Server: AmazonS3]

  • Sheila · Broad Institute · Member, Broadie, Moderator, admin

    @achalneupane
    Hi,

    I have responded here :smiley:

    -Sheila
