
# Genotype refinement with incomplete "trios"

MGHMember Posts: 11

I am doing a WGS project on a family with seven siblings. We have data on the mother, but the father passed away many years ago. I tried splitting the variant-recalibrated VCF file and the PED file into "trios" consisting of just the mother and one sibling (seven times), then running PhaseByTransmission on the combined VCF. The job completed successfully, but nothing appears phased in the output VCF (all "/" and no "|").
I also ran ReadBackedPhasing separately on the variant-recalibrated VCF. That job completed successfully as well, but again nothing appears phased (all "/", no "|", and no "PQ" scores assigned).
The ProduceBeagleInput walker (to use Beagle for genotype refinement) appears to support only unrelated individuals, and my set involves related individuals. Do you have any other suggestions for phasing incomplete "trios"?
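
One way to put a number on the "all "/" and no "|"" observation is to tally the genotype separators per sample rather than scanning the file by eye. The following is a minimal sketch, not part of GATK: the function name `count_phasing` is made up for illustration, and it assumes an uncompressed, tab-delimited VCF with a `GT` entry in the FORMAT column.

```python
import sys
from collections import Counter


def count_phasing(vcf_lines):
    """Tally phased ('|') and unphased ('/') GT calls for each sample.

    Hypothetical helper for illustration; assumes plain-text VCF lines.
    """
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information headers
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the tenth column of the header line.
            samples = fields[9:]
            counts = {name: Counter() for name in samples}
            continue
        fmt_keys = fields[8].split(":")
        if "GT" not in fmt_keys:
            continue  # record carries no genotype field
        gt_index = fmt_keys.index("GT")
        for name, call in zip(samples, fields[9:]):
            genotype = call.split(":")[gt_index]
            if "|" in genotype:
                counts[name]["phased"] += 1
            elif "/" in genotype:
                counts[name]["unphased"] += 1
            else:
                counts[name]["other"] += 1  # e.g. haploid or "." calls
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python count_phasing.py <path to an uncompressed VCF>
    with open(sys.argv[1]) as handle:
        for name, tally in count_phasing(handle).items():
            print(name, dict(tally))
```

If every sample reports zero under `phased`, the output really is unphased throughout, and the question becomes why the tool skipped every site rather than whether a few phased calls were missed.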


Unfortunately, those are all the tools we have for phasing. You may want to look more closely at why the phasing didn't seem to work at all; that's a little worrying.

Geraldine Van der Auwera, PhD

MGHMember Posts: 11

According to the log files, the jobs completed successfully without errors. The raw VCFs were generated by UnifiedGenotyper, and VQSR (for SNPs and indels) ran without issues. Phasing is less critical for our study, but I'm trying to understand why the log reported the job as "successful" when nothing was phased. I presumed it was because the trios I created weren't full trios (i.e., just mother and offspring). I tried this in parallel (splitting the PED file and subsetting the VCF for each "trio", running PhaseByTransmission, then combining) as well as in series (splitting the PED file for each "trio", then running PhaseByTransmission sequentially on the VCF with each "trio" pedigree). Both methods ran without errors, but neither produced a phased VCF. ReadBackedPhasing on the recalibrated VCF also ran without errors, but I was surprised by the lack of phasing there too. Is this because the father is missing and the sample size is small?
Any other thoughts on why a job might be "successful" yet have no phasing...?
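
For reference, the PED-splitting step described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `split_ped_by_offspring` and the sample IDs in the usage note are hypothetical, the input is assumed to be a standard six-column PED file (family ID, individual ID, paternal ID, maternal ID, sex, phenotype) with `0` marking a missing parent, and nothing here says whether PhaseByTransmission will accept the resulting mother-offspring pedigrees.

```python
def split_ped_by_offspring(ped_lines):
    """Group PED records into one small pedigree per offspring.

    Hypothetical helper: returns {offspring_id: [parent lines..., offspring line]},
    keeping only parents that actually have a record in the PED file.
    """
    records = {}
    for line in ped_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split()
        records[fields[1]] = fields  # key by individual ID

    groups = {}
    for individual_id, fields in records.items():
        father_id, mother_id = fields[2], fields[3]
        # "0" means the parent is unknown or missing.
        parents = [p for p in (father_id, mother_id) if p != "0" and p in records]
        if not parents:
            continue  # a founder (e.g. the mother) gets no pedigree of its own
        members = parents + [individual_id]
        groups[individual_id] = ["\t".join(records[m]) for m in members]
    return groups
```

With made-up IDs, a family of `MOM`, `SIB1` and `SIB2` would yield two groups keyed by `SIB1` and `SIB2`, each holding the mother's line followed by that sibling's line, with the paternal ID column left as `0`.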

Here's what I ran for PhaseByTransmission in parallel:

1. Split each "trio":

       java
       -T SelectVariants \
       -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
       -sn <$mother IID> \
       -sn <$offspring IID> \
       -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf

2. Ran PhaseByTransmission on each split VCF:

       java -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
       -T PhaseByTransmission \
       -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
       -V vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf \
       -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.pbt.<$trio_name>.vcf

3. Combined "trios":

       java -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
       -T CombineVariants \
       -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
       --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio1>.vcf \
       <$remaining_trios> \
       --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio7>.vcf \

And here is what I ran for ReadBackedPhasing (--respectPhaseInInput didn't make a difference, since I ran this independently of PhaseByTransmission):

  
    java
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    -I ../reduced_bams/reduced_bam.list \