Genotype refinement with incomplete "trios"

pradeepn (MGH, Member)

I am doing a WGS project on a family with seven siblings. We have data on the mother, but the father passed away many years ago. I tried splitting the variant-recalibrated VCF file and the PED file into "trios" consisting of just the mother and one sibling (seven times), then running PhaseByTransmission on the combined VCF. The job completed successfully, but nothing appears phased in the output VCF (all "/" and no "|"). I also ran ReadBackedPhasing separately on the variant-recalibrated VCF. That job also completed successfully, but again nothing appears phased (all "/", no "|", and no PQ scores assigned). The ProduceBeagleInput walker (to use Beagle for genotype refinement) appears to support only unrelated individuals, and my set involves related individuals. Do you have any other suggestions for phasing incomplete "trios"? Thanks in advance!
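Rather than eyeballing the output for "|", one way to confirm whether any phasing happened is to count genotype calls written with "|" versus "/". A minimal sketch, assuming an uncompressed VCF in which GT is the first FORMAT sub-field (the VCF specification puts GT first whenever it is present); the function name count_gt_phasing is hypothetical.

```bash
#!/usr/bin/env bash
# Count phased ("|") vs unphased ("/") genotype calls across all sample columns.
# Assumes an uncompressed VCF with GT as the first FORMAT sub-field.
# Missing calls written as ./. are counted as unphased; haploid calls are not counted.
count_gt_phasing() {
  awk -F '\t' '
    /^#/ { next }                              # skip meta-information and header lines
    {
      for (i = 10; i <= NF; i++) {             # sample columns start at column 10
        split($i, f, ":")                      # f[1] holds the GT value
        if (index(f[1], "|") > 0)      phased++
        else if (index(f[1], "/") > 0) unphased++
      }
    }
    END { printf "phased=%d unphased=%d\n", phased, unphased }
  ' "$1"
}

# Example: count_gt_phasing vcf/PCAD_01.UG.hg19.recal.SNPs.indels.rbp.vcf
```

If nothing was phased, as described above, this should report phased=0 for both the PhaseByTransmission and the ReadBackedPhasing outputs.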

Answers

  • Geraldine_VdAuwera (Administrator, GATK Developer)

    Unfortunately that's all the tools we have for phasing. You may want to examine more closely why the phasing didn't seem to work at all, that's a little worrying.
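One concrete thing to examine is whether every ID in the PED file has a matching sample column in the VCF being phased. Whether PhaseByTransmission skips a family when one of its PED members is absent from the VCF is an assumption here, not something confirmed in this thread; the sketch below only surfaces such mismatches. It reads PED columns 2 to 4 (individual, father, mother; "0" means a missing parent) and compares them with the VCF header. The function name missing_ped_samples is hypothetical, and the VCF is assumed to be uncompressed.

```bash
#!/usr/bin/env bash
# Print IDs referenced in a PED file (individual, father, mother) that are absent
# from the sample columns of a VCF. Parents coded "0" (missing) are skipped.
missing_ped_samples() {
  local vcf=$1 ped=$2
  comm -23 \
    <(awk '!/^#/ && NF >= 6 { print $2; if ($3 != "0") print $3; if ($4 != "0") print $4 }' "$ped" | sort -u) \
    <(grep -m1 '^#CHROM' "$vcf" | cut -f10- | tr '\t' '\n' | sort -u)
}

# Example: missing_ped_samples vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf phenotypes/PCAD_01.ped
```

For a mother/offspring split checked against the full PED, the father's ID (if the offspring rows reference one) and the six siblings not in that split would be listed.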

    Geraldine Van der Auwera, PhD

  • pradeepn (MGH, Member)

    According to the log files, the jobs completed successfully without errors. The raw VCFs were generated by UnifiedGenotyper, and VQSR (for SNPs and indels) ran without issues. For our study phasing is less critical, but I'm trying to understand why the log reports the job as "successful" when nothing is phased. I presumed it was because the trios I created weren't full trios (i.e., just mother and offspring). I tried this in parallel (splitting the PED file and subsetting the VCF for each "trio", running PhaseByTransmission, then combining) as well as in series (splitting the PED file for each "trio", then running PhaseByTransmission sequentially on the VCF for each "trio" pedigree). Neither method produced errors, but neither resulted in a phased VCF. ReadBackedPhasing on the recalibrated VCF also ran without errors, and I was surprised by the lack of phasing: is this because the father is missing and the sample size is small? Any other thoughts on why a job might be "successful" yet produce no phasing?


    Here's what I ran for PhaseByTransmission in parallel:

    1. Split each "trio":
      
    java \
    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    --variant vcf/PCAD_01.UG.hg19.recal.SNPs.indels.vcf \
    -sn <$mother IID> \
    -sn <$offspring IID> \
    -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf
      
    
    2. Ran PhaseByTransmission on each split VCF:
      
    java \
    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T PhaseByTransmission \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    -V vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf \
    -ped phenotypes/PCAD_01.ped \
    -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.pbt.<$trio_name>.vcf
      
    
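    3. Combined "trios":
      
    # Note: as written, these --variant inputs are the unphased SelectVariants splits;
    # the PhaseByTransmission outputs are named PCAD_01.UG.hg19.recal.SNPs.indels.pbt.<$trio_name>.vcf
    java \
    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \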
    -T CombineVariants \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio1>.vcf \
    <$remaining_trios> \
    --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio7>.vcf \
    -o vcf/PCAD_01.UG.hg19.recal.SNPs.indels.pbt-trio_splits.vcf
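The parallel workflow above can be scripted as a single loop. A minimal sketch: the jar, reference, and file paths and every GATK argument are copied from the commands in this post, while MOTHER_IID, CHILD1_IID, CHILD2_IID and the trio_<child> naming are hypothetical placeholders. By default it only prints the commands (RUN=echo); run it with RUN= (empty) to execute them. It passes CombineVariants the .pbt. files written by PhaseByTransmission.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Paths copied from the commands above.
GATK=../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar
REF=/seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta
IN=vcf/PCAD_01.UG.hg19.recal.SNPs.indels.vcf
PREFIX=vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels

# Hypothetical sample IDs: replace with the IIDs from phenotypes/PCAD_01.ped.
MOTHER=MOTHER_IID
OFFSPRING=(CHILD1_IID CHILD2_IID)

# Print commands by default; RUN= (set but empty) executes them instead.
RUN=${RUN-echo}

combine_args=()
for child in "${OFFSPRING[@]}"; do
  trio="trio_${child}"
  # Subset the recalibrated VCF to the mother and one offspring.
  $RUN java -jar "$GATK" -T SelectVariants -R "$REF" --variant "$IN" \
    -sn "$MOTHER" -sn "$child" -o "$PREFIX.$trio.vcf"
  # Phase the mother/offspring pair using the pedigree.
  $RUN java -jar "$GATK" -T PhaseByTransmission -R "$REF" \
    -V "$PREFIX.$trio.vcf" -ped phenotypes/PCAD_01.ped -o "$PREFIX.pbt.$trio.vcf"
  # Collect the phased output (note the .pbt. infix), not the unphased split.
  combine_args+=(--variant "$PREFIX.pbt.$trio.vcf")
done

# Merge the phased per-trio VCFs.
$RUN java -jar "$GATK" -T CombineVariants -R "$REF" "${combine_args[@]}" \
  -o vcf/PCAD_01.UG.hg19.recal.SNPs.indels.pbt-trio_splits.vcf
```

Using ${RUN-echo} rather than ${RUN:-echo} is deliberate: the colon form treats an explicitly empty RUN as unset, so the script would keep printing instead of executing.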
      
    

    And here is what I ran for ReadBackedPhasing (--respectPhaseInInput made no difference, since I ran this independently of PhaseByTransmission):

      
    java \
    -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T ReadBackedPhasing \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    -V vcf/PCAD_01.UG.hg19.recal.SNPs.indels.vcf \
    -I ../reduced_bams/reduced_bam.list \
    -o vcf/PCAD_01.UG.hg19.recal.SNPs.indels.rbp.vcf
      
    
  • Geraldine_VdAuwera (Administrator, GATK Developer)

    I'm not sure about PhaseByTransmission, but I'm pretty sure read-backed phasing doesn't work on reduced BAMs, since all your reads get collapsed into consensus reads.
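If the original, pre-ReduceReads BAMs are still available, ReadBackedPhasing could be rerun with a list of those BAMs in place of ../reduced_bams/reduced_bam.list. A minimal sketch that writes such a list (one BAM path per line, the same kind of .list file passed to -I above); the function name make_bam_list and the directory and output names in the example are hypothetical.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a .list file with one BAM path per line for every *.bam directly in a directory.
make_bam_list() {
  local bam_dir=$1 out=$2
  find "$bam_dir" -maxdepth 1 -name '*.bam' | sort > "$out"
  # Fail loudly instead of handing GATK an empty list.
  [ -s "$out" ] || { echo "no BAMs found in $bam_dir" >&2; return 1; }
}

# Example (hypothetical paths): make_bam_list ../bams ../bams/original_bam.list
```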


  • Geraldine_VdAuwera (Administrator, GATK Developer)

    As for "job successful" vs. "didn't get the desired result", imagine you are reading a list of words looking for the word "yes". You can successfully read through the list of words, but if the word "yes" is not in it, you won't get the result you wanted. Here, not finding any information that can be phased is not considered a failure for the program. The operation did not fail, it's just that there was nothing for the program to operate on. Make sense?
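The word-list analogy can be made concrete in a few lines: awk reads the whole list, finds no "yes", and still exits with status 0, much as a GATK walker can complete cleanly while producing no phased calls. The word list is made up for illustration.

```bash
#!/usr/bin/env bash
set -euo pipefail

words=$(mktemp)
printf 'no\nmaybe\nlater\n' > "$words"

# Count exact matches of "yes"; n + 0 prints 0 instead of an empty string when nothing matched.
count=$(awk '$0 == "yes" { n++ } END { print n + 0 }' "$words")
echo "awk exit status: $?, matches: $count"   # the scan completed; there was nothing to find

rm -f "$words"
```

By contrast, grep -c yes on the same file prints 0 but exits with status 1, because grep reports "no lines selected" through its exit code. Whether an empty result counts as a failure is a design choice each tool makes.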

