interval output in DiagnoseTargets

moran The Broad Member

Hello Geraldine,

I ran DiagnoseTargets with the following command:
java -jar GenomeAnalysisTK.jar -T DiagnoseTargets -R Final.HMP.fasta -o diag.output.vcf -I bam.list -L interval.list

But for some reason, my output is only one line per chromosome. There are two output files: diag.output.vcf and diag.output.vcf.idx.

How come there is no file for interval outputs?

My interval names look like this: lcl|NC_011004.1-5739, is there a problem with this naming? (I hope not..)

Thanks again for all your help!
Moran.

Answers

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    What does the content of the vcf file look like? Can you post a few lines?

    Keep in mind also that your intervals have to be separated by one or more bases. If they are contiguous, the GATK engine will merge them and treat them as a single interval.
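
To make the merging behavior concrete, here is a minimal Python sketch of the general idea. It is an illustration only, not the GATK engine's actual code; the function name `merge_abutting` and the `chr1` coordinates are hypothetical. It assumes 1-based, inclusive coordinates and treats two intervals on the same contig as mergeable when the second starts at or before the base right after the first one ends.

```python
def merge_abutting(intervals):
    """Merge intervals on the same contig that overlap or touch.

    Each interval is (contig, start, end) with 1-based inclusive coordinates.
    This is a sketch of the idea, not GATK's implementation.
    """
    merged = []
    for contig, start, end in sorted(intervals):
        if merged and merged[-1][0] == contig and start <= merged[-1][2] + 1:
            # Touching or overlapping: extend the previous interval.
            prev = merged[-1]
            merged[-1] = (contig, prev[1], max(prev[2], end))
        else:
            merged.append((contig, start, end))
    return merged


# Hypothetical coordinates: the first pair touches, the second has a one-base gap.
touching = [("chr1", 1, 1000), ("chr1", 1001, 2000)]
separated = [("chr1", 1, 999), ("chr1", 1001, 2000)]

print(merge_abutting(touching))   # [('chr1', 1, 2000)]
print(merge_abutting(separated))  # [('chr1', 1, 999), ('chr1', 1001, 2000)]
```

Under this rule, a long run of back-to-back windows on one contig collapses into a single interval, which would match a report of one output line per chromosome.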

  • moran The Broad Member
    edited October 2013

    I'm not sure I understand the second issue. They do not overlap, but they are "touching" one another:

    lcl|NC_011004.1 5740001 5741000 + lcl|NC_011004.1-5741

    lcl|NC_011004.1 5741001 5742000 + lcl|NC_011004.1-5742

    Will these be combined?

    The vcf file looks like this:

    lcl|NC_007925.1 1 . G . NO_READS END=5513000;IDP=0.015 FT:IDP:LL:ZL NO_READS:9.467e-04:45324:5467462 NO_READS:2.877e-04:20821:5492162 NO_READS:1.616e-04:18517:5494483 NO_READS:3.162e-04:27253:5485725 NO_READS:2.137e-04:22558:5490441

    lcl|NC_007958.1 1 . A . NO_READS END=4892000;IDP=0.010 FT:IDP:LL:ZL NO_READS:6.504e-04:37026:4854693 NO_READS:2.933e-04:17978:4873976 NO_READS:9.608e-05:17124:4874842 NO_READS:2.418e-04:23277:4868683 NO_READS:1.372e-04:19921:4872061

    lcl|NC_007778.1 1 . T . NO_READS END=5331000;IDP=0.012 FT:IDP:LL:ZL NO_READS:5.959e-04:42570:5288300 NO_READS:3.669e-04:19353:5311556 NO_READS:1.043e-04:17931:5313069 NO_READS:9.829e-05:24064:5306936 NO_READS:2.127e-04:21243:5309757

    (it gets messed up in the preview, but there are 3 lines of text here)

    Thanks!

  • Geraldine_VdAuwera Cambridge, MA Member, Administrator, Broadie admin

    Yep, that's the issue -- because they are "touching", the GATK fuses them into a single interval. You have to skip a position between them. The -L interval list functionality was designed for passing exons/exome targets, which don't normally touch. It's not really meant for cutting up the genome into contiguous chunks.
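
One way to "skip a position" is to trim the last base off any window that touches the next one. The Python sketch below does that for records shaped like the five columns shown earlier in the thread (contig, start, end, strand, name). The function name `add_gaps` is hypothetical, and the sketch assumes 1-based inclusive coordinates; it only handles intervals that touch exactly, not ones that overlap. Note that the trimmed base at each boundary is no longer covered by any interval.

```python
def add_gaps(intervals):
    """Leave a one-base gap between intervals that touch on the same contig.

    Each interval is (contig, start, end, strand, name), 1-based inclusive.
    Only exact adjacency (next start == end + 1) is handled here.
    """
    ordered = sorted(intervals)
    result = []
    for i, (contig, start, end, strand, name) in enumerate(ordered):
        if i + 1 < len(ordered):
            next_contig, next_start = ordered[i + 1][0], ordered[i + 1][1]
            touches = next_contig == contig and next_start == end + 1
            if touches and end > start:
                end -= 1  # drop the last base so the two intervals no longer touch
        result.append((contig, start, end, strand, name))
    return result


# The two touching windows quoted in the thread.
windows = [
    ("lcl|NC_011004.1", 5740001, 5741000, "+", "lcl|NC_011004.1-5741"),
    ("lcl|NC_011004.1", 5741001, 5742000, "+", "lcl|NC_011004.1-5742"),
]

for record in add_gaps(windows):
    print("\t".join(str(field) for field in record))
```

With the two windows above, the first interval ends at 5740999 instead of 5741000 and the second is unchanged, so position 5741000 separates them.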
