Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

Germline short variant discovery with GATK4, exome sequencing, single sample

freekfreek Member
edited June 2018 in Ask the GATK team

Dear GATK team,

I'm implementing "Germline short variant discovery" (we often do not have matched normal samples, otherwise I'd go for the somatic pipeline) with GATK4. We do exome sequencing and we generally look at only 1 or a couple of samples at a time.

Thus the following sentence in joint-discovery-gatk4-local.wdl has me a bit concerned:

## - Bare minimum 1 WGS sample or 30 Exome samples. Gene panels are not supported.

How hard is this requirement for 30 samples minimum and what step exactly requires it? What would you recommend if I just have a single BAM file from a single exome sequenced patient/sample, made using processing-for-variant-discovery-gatk4.wdl?

Highest regards,

Freek.

Best Answer

  • SkyWarriorSkyWarrior Turkey ✭✭✭
    Accepted Answer

    This part is only for the variant score recalibration. If you have single sample variant filtering step can be done via hard filtering. Once you accumulate more and more samples then you can start combining your gvcfs and joint genotype them for proper variant filtering.

    The only part you may need to modify is to disable gvcf for haplotypecaller since you have a single sample exome.

Answers

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭
    Accepted Answer

    This part is only for the variant score recalibration. If you have single sample variant filtering step can be done via hard filtering. Once you accumulate more and more samples then you can start combining your gvcfs and joint genotype them for proper variant filtering.

    The only part you may need to modify is to disable gvcf for haplotypecaller since you have a single sample exome.

  • student-tstudent-t SydneyMember
    edited June 2018

    DELETED

    Post edited by student-t on
  • freekfreek Member

    @SkyWarrior

    Thank you for your swift reply.

    If I understand correctly (see what-is-a-gvcf-and-how-is-it-different-from-a-regular-vcf), a GVCF contains info on all bases, not just the non-ref ones (but they are smartly grouped into genomic regions to reduce file size), a regular VCF only contains non-ref bases. A GVCF is needed to build a comprehensive variant quality score recalibration model, moreover, the GVCF needs to contain a lot of data to do this properly. If I would create a GVCF from a single sample it would not contain enough data to do proper variant quality score recalibration, consequently using GVCF is a waste of space and doing variant quality score recalibration is a waste of compute power. And thus you recommend me to make a regular VCF, meaning I can drop the following option:

    --emit-ref-confidence GVCF
    

    often used as:

    -ERC GVCF
    

    from HaplotypeCaller.

    Subsequently I can thus also skip IndelsVariantRecalibrator, SNPsVariantRecalibratorCreateModel, SNPsVariantRecalibrator and ApplyRecalibration (see joint-discovery-gatk4-local.wdl)

    Meaning I can stop after HardFilterAndMakeSitesOnlyVcf. The down side would be a non optimal estimation of the confidence of the called variants, but I have no other option. At least, until I do have a large amount of exome samples to group together.

    Highest regards,

    Freek.

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭

    Exactly my thoughts.

  • freekfreek Member
    edited June 2018

    Ok, thanx for the reply, I'm a bit further now and I have encountered some more things that are unclear to me and I hope you can help me.

    So far I have done GATK4 preprocessing on my fastq files (mapping, generating bqsr models per lane, do the correction, then merge samples into 1 BAM file per sample with correct read group information). I have used this wdl as my guide: PairedEndSingleSampleWf.gatk4.0.wdl. Said wdl also describes initial variant calling which I also implemented. I did implemented it as discussed above, I generated a VCF file, not a GVCF file: 1 per BAM file (and thus per sample).

    Now I'm switching to JointGenotypingWf.wdl to do variant calling on my WES data. This wdl file starts with ImportGVCFs, in which an "output_genomicsdb" is generated, as far as I can tell this is a tar-ed up directory with a lot GVCF files put together into some structure. On this file GenotypeGVCFs is run during which the GVCF (or a single sample GVCF) are genotyped, the description reads:

    Perform joint genotyping on a single-sample GVCF from HaplotypeCaller or a multi-sample GVCF from CombineGVCFs or GenomicsDBImport
    

    Now I'm guessing Genotyping here means adding information whether 0,1 or 2 alleles contain a variant to the (G)VCF? My questions is: Should I run the ImportGVCFs on my VCF or can I also input my VCF (not GVCF) into GenotypeGVCFs?

    After this I will do the HardFilterAndMakeSitesOnlyVcf steps which are straightforward steps and if I understand correctly it is just a quality cutoff followed by throwing out all sites that are not deviating from the reference genome... (although my VCF should already not have such sites, right?).

    And following this would normally be the VariantRecalibrator steps which I will skip for now.

    Highest regards,

    Freek.

  • SkyWarriorSkyWarrior TurkeyMember ✭✭✭

    With a single sample you don't need to import anything. Just genotype the only gvcf file you have. Once you start gathering multiples then you need to import or combine then genotype.

  • freekfreek Member

    Thanx for the rapid reply but the GenotypeGVCFs script just crashed:

    A USER ERROR has occurred: The list of input alleles must contain as an allele but that is not the case at position 856883; please use the Haplotype Caller with gVCF output to generate appropriate records.

    So I guess I do need to make a GVCF (using -ERC GVCF for HaplotypeCaller) as GenotypeGVCFs expects it...

    Anyway, for now, have a very nice weekend!

Sign In or Register to comment.