
Genotype refinement with incomplete "trios"

I am doing a WGS project on a family with seven siblings. We have data on the mother, but the father passed away many years ago. I tried splitting the variant-recalibrated VCF file and the PED file into "trios" with just the mother and a sibling (seven times), then running PhaseByTransmission on the combined VCF. The job completed successfully, but nothing appears phased (all "/" and no "|") in the output VCF.
I also ran ReadBackedPhasing separately on the variant-recalibrated VCF file. That job completed successfully as well, but again nothing appears phased (all "/", no "|", and no "PQ" scores assigned).
The ProduceBeagleInput walker (to use Beagle for genotype refinement) appears to support only unrelated individuals, and my set involves related individuals. Do you have any other suggestions for phasing incomplete "trios"?
Thanks in advance!


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Unfortunately, those are all the tools we have for phasing. You may want to examine more closely why the phasing didn't seem to work at all; that's a little worrying.

  • pradeepn, MGH (Member)

    According to the log files, the jobs completed without errors. The raw VCFs were generated by UnifiedGenotyper, and VQSR (for SNPs and indels) ran without issues. For our study, phasing is less critical, but I'm trying to understand why the log reported the job as "successful" when nothing was phased. I presumed it was because the trios I created weren't full trios (i.e., just mother and offspring). I tried this in parallel (splitting the PED file and subsetting the VCF for each "trio", running PhaseByTransmission, then combining) as well as in series (splitting the PED file for each "trio", then running PhaseByTransmission sequentially on the VCF for each "trio" pedigree). Neither method produced errors, and neither resulted in a phased VCF. ReadBackedPhasing on the recalibrated VCF also ran without errors, but I was surprised by the lack of phasing. Is this because of the missing father and the small sample size?
    Any other thoughts on why a job might be "successful" yet have no phasing...?

    Here's what I ran for PhaseByTransmission in parallel:

    1. Split each "trio":
    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    --variant vcf/PCAD_01.UG.hg19.recal.SNPs.indels.vcf \
    -sn <$mother IID> \
    -sn <$offspring IID> \
    -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf
    2. Ran PhaseByTransmission on each split VCF:
    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T PhaseByTransmission \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    -V vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf \
    -ped phenotypes/PCAD_01<trio_name>.ped \
    -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.pbt.<$trio_name>.vcf
    3. Combined "trios":
    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio1>.vcf \
    <$remaining_trios> \
    --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio7>.vcf \
    -o vcf/PCAD_01.UG.hg19.recal.SNPs.indels.pbt-trio_splits.vcf
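
The PED-splitting step mentioned above (one mother-plus-offspring pedigree per sibling) could be sketched roughly as follows. This is only an illustration, not the script actually used: the function name `split_ped_by_child` is hypothetical, and it assumes a standard six-column, whitespace-delimited PED file (family ID, individual ID, father ID, mother ID, sex, phenotype) in which a missing parent is coded as `0`.

```python
def split_ped_by_child(ped_lines):
    """Return {child_id: [mother_row, child_row]} for every child whose mother is listed.

    Hypothetical helper; assumes six-column PED rows with "0" for a missing parent.
    """
    # Parse non-empty, non-comment rows into column lists.
    rows = [line.split() for line in ped_lines
            if line.strip() and not line.startswith("#")]
    # Index rows by individual ID (column 2) so mothers can be looked up.
    by_id = {row[1]: row for row in rows}

    pedigrees = {}
    for row in rows:
        child_id, mother_id = row[1], row[3]
        # Keep only individuals whose mother is present in the same file.
        if mother_id != "0" and mother_id in by_id:
            pedigrees[child_id] = ["\t".join(by_id[mother_id]), "\t".join(row)]
    return pedigrees
```

Each value in the returned dictionary can be written to its own file and passed to `-ped` for the matching split VCF.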

    And here is what I ran for ReadBackedPhasing (--respectPhaseInInput didn't make a difference since I ran this independently of PhaseByTransmission):

    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T ReadBackedPhasing \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    -V vcf/PCAD_01.UG.hg19.recal.SNPs.indels.vcf \
    -I ../reduced_bams/reduced_bam.list \
    -o vcf/PCAD_01.UG.hg19.recal.SNPs.indels.rbp.vcf
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Not sure about PhaseByTransmission, but I'm pretty sure ReadBackedPhasing doesn't work on reduced BAMs, since all your reads get collapsed into consensus reads.

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    As for "job successful" vs. "didn't get the desired result", imagine you are reading a list of words looking for the word "yes". You can successfully read through the list of words, but if the word "yes" is not in it, you won't get the result you wanted. Here, not finding any information that can be phased is not considered a failure for the program. The operation did not fail, it's just that there was nothing for the program to operate on. Make sense?
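
Since a clean exit says nothing about whether any genotype was actually phased, one way to check an output VCF is to count the "|" and "/" separators in the GT field. The sketch below is a minimal illustration, not a GATK tool: the function name `count_phasing` is hypothetical, and it assumes an uncompressed, tab-delimited VCF with a FORMAT column and at least one sample column.

```python
def count_phasing(vcf_lines):
    """Count phased ("|") and unphased ("/") genotype calls across all samples.

    Hypothetical helper; assumes tab-delimited VCF records with FORMAT in column 9.
    """
    counts = {"phased": 0, "unphased": 0}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # no sample columns on this record
        keys = fields[8].split(":")
        if "GT" not in keys:
            continue  # record carries no genotype field
        gt_idx = keys.index("GT")
        for sample in fields[9:]:
            parts = sample.split(":")
            if gt_idx >= len(parts):
                continue
            gt = parts[gt_idx]
            if "|" in gt:
                counts["phased"] += 1
            elif "/" in gt:
                counts["unphased"] += 1  # includes missing calls such as "./."
            # haploid calls (no separator) are counted as neither
    return counts

# Example use (the path is a placeholder):
#     with open("output.vcf") as handle:
#         print(count_phasing(handle))
```

If `phased` comes back as zero, the tool ran but found nothing it could phase, which matches the "successful but unphased" outcome described in this thread.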
