a deletion with two names

Will_Gilks, University of Sussex, UK (Member)

Hi @bhandsaker

The Genome STRiP 2.0 deletion discovery and genotyping pipeline generated a record with two variant IDs, which is a bit confusing.

chr2L 4240777 chr2L.DEL716,chr2L.DEL717 G <DEL> . PAIRSPERSAMPLE AC=24;AF=0.054;AN=442;GSDEPTHRANKSUMPVALUE=NA;GSDMIN=1;GSOUTLEFT=0;GSOUTLIERS=0;GSOUTRIGHT=0;IMPRECISE;SVTYPE=DEL;set=FilteredInAll GT:CNF:FT:GP:GQ:PL 0/0:2.0916:PASS:-0.00,-5.59,-76.43:56:0,

I was wondering what caused this, and whether I'd lose much information if I just deleted the second name.

Note that I called each chromosome separately, then prefixed the variant IDs with the chromosome name.
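For anyone wanting to check how many records in a merged VCF ended up with more than one name, here is a minimal Python sketch. It is not part of Genome STRiP or GATK; the function name `records_with_multiple_ids` is made up for illustration. It assumes whitespace-delimited records as pasted above, and it splits the ID column on both commas (as seen in the record above) and semicolons (the separator the VCF specification uses for multiple IDs).

```python
import re

def records_with_multiple_ids(vcf_lines):
    """Return (chrom, pos, ids) for each record whose ID column holds several names."""
    hits = []
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue
        # Assumption: multiple IDs are joined by "," or ";".
        ids = [name for name in re.split(r"[;,]", fields[2]) if name]
        if len(ids) > 1:
            hits.append((fields[0], int(fields[1]), ids))
    return hits

# Shortened copy of the record quoted above (INFO truncated for brevity).
example = "chr2L 4240777 chr2L.DEL716,chr2L.DEL717 G <DEL> . PAIRSPERSAMPLE AC=24;AF=0.054;AN=442"
print(records_with_multiple_ids([example]))
```

Running it over the whole file would show whether this is a one-off or a recurring effect of the merge.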




Best Answer


  • Will_Gilks, University of Sussex, UK (Member)

    Thanks, good points. Using GATK, I concatenated the VCFs from each chromosome, then filtered by AN and AC. The END attributes are retained for normal variant records, just not for this example. In the original, unmerged VCF, the two records that later become merged have the same start position but different end positions.

    chr2L 4240777 chr2L.DEL716 G <DEL> . PAIRSPERSAMPLE CIEND=-19,20;CIPOS=-19,20;END=4241591;GSCOHERENCE=-3.0589275945276326;GSCOHFN=-1.0196425315092108;GSCOHPVALUE=0.4037;GSCOORDS=4240719,4240847,4241561,4241682;GSDEPTHCALLTHRESHOLD=0.7454414759596009;GSDEPTHNOBSSAMPLES

    chr2L 4240777 chr2L.DEL717 G <DEL> . PAIRSPERSAMPLE CIEND=-22,22;CIPOS=-22,22;END=4241688;GSCOHERENCE=-1.668324719820941;GSCOHFN=-0.8341623599104705;GSCOHPVALUE=0.5008;GSCOORDS=4240719,4240849,4241660,4241781;GSDEPTHCALLTHRESHOLD=0.7144326762895573;GSDEPTHNOBSSAMPLES=
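A rough way to spot other pairs like these in an unmerged VCF is to group records by chromosome and start position and compare their END values. The Python sketch below is only an illustration under stated assumptions: the helper names `info_end` and `same_start_different_end` are hypothetical, records are assumed to be whitespace-delimited with INFO in the eighth column, and the two test records are shortened copies of the ones quoted above.

```python
from collections import defaultdict

def info_end(info):
    """Return the END value from an INFO string, or None if it is absent."""
    for item in info.split(";"):
        if item.startswith("END="):
            return int(item[len("END="):])
    return None

def same_start_different_end(vcf_lines):
    """Map (chrom, pos) to [(id, end), ...] where records share a start but not an end."""
    by_start = defaultdict(list)
    for line in vcf_lines:
        # Skip blank lines and header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 8:
            continue
        key = (fields[0], int(fields[1]))
        by_start[key].append((fields[2], info_end(fields[7])))
    # Keep only start positions with more than one distinct END.
    return {k: v for k, v in by_start.items() if len({end for _, end in v}) > 1}

# Shortened copies of the two records above (INFO truncated for brevity).
records = [
    "chr2L 4240777 chr2L.DEL716 G <DEL> . PAIRSPERSAMPLE CIEND=-19,20;CIPOS=-19,20;END=4241591",
    "chr2L 4240777 chr2L.DEL717 G <DEL> . PAIRSPERSAMPLE CIEND=-22,22;CIPOS=-22,22;END=4241688",
]
print(same_start_different_end(records))
```

Any position it reports is a candidate for being collapsed into a single multi-ID record when the per-chromosome files are combined, which is where the END attribute would be at risk.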

    I suppose that this is biologically possible, though they seem to be fairly low frequency and not in LD. Also, GATK isn't really designed for CNV data, so it might be doing strange things. I shall investigate further with IGV.

    By the way, I've checked the general format of the Genome STRiP VCF with NCBI dbVar and they're happy with it. Also, for deletions with a high call rate, I'm finding it very difficult to find any errors, anomalies or artefacts, though singletons seem harder to judge.
