The current GATK version is 3.7-0

Genotype refinement with incomplete "trios"

I am doing a WGS project on a family with seven siblings. We have data on the mother, but the father passed away many years ago. I tried splitting the variant-recalibrated VCF file and PED file into "trios" of just the mother and one sibling (seven times), then running PhaseByTransmission on the combined VCF. The job completed successfully, but nothing appears phased (all "/" and no "|") in the output VCF.
I also ran ReadBackedPhasing separately on the variant-recalibrated VCF file. That job completed successfully as well, but again nothing appears phased (all "/", no "|", and no "PQ" scores assigned).
The ProduceBeagleInput walker (to use Beagle for genotype refinement) appears to support only unrelated individuals, and my set involves related individuals. Do you have any other suggestions for phasing incomplete "trios"?
Thanks in advance!
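
A quick way to confirm the "all `/` and no `|`" observation across a whole file is to tally genotype separators per sample. The following is a minimal sketch, not part of GATK: the function name `count_gt_separators` and the two example records are made up for illustration, and it assumes a tab-delimited VCF whose FORMAT column contains a `GT` key.

```python
from collections import Counter


def count_gt_separators(vcf_lines):
    """Tally phased ('|') and unphased ('/') genotype calls per sample."""
    samples = []
    counts = {}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            samples = fields[9:]
            counts = {name: Counter() for name in samples}
            continue
        format_keys = fields[8].split(":")
        if "GT" not in format_keys:
            continue  # no genotype information on this record
        gt_index = format_keys.index("GT")
        for name, call in zip(samples, fields[9:]):
            parts = call.split(":")
            genotype = parts[gt_index] if gt_index < len(parts) else "."
            if "|" in genotype:
                counts[name]["phased"] += 1
            elif "/" in genotype:
                counts[name]["unphased"] += 1
            else:
                counts[name]["other"] += 1  # haploid or missing call
    return counts


# Hypothetical two-record VCF with a mother and one child.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tmother\tchild1",
    "1\t100\t.\tA\tG\t50\tPASS\t.\tGT:GQ\t0/1:99\t0|1:99",
    "1\t200\t.\tC\tT\t50\tPASS\t.\tGT:GQ\t0/0:99\t0/1:99",
]
result = count_gt_separators(example)
for sample, tally in result.items():
    print(sample, dict(tally))
```

For a real file, the same function can be fed an open file handle, e.g. `count_gt_separators(open(path))`, where `path` is whichever output VCF is being inspected.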


  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Unfortunately, those are all the tools we have for phasing. You may want to examine more closely why the phasing didn't seem to work at all; that's a little worrying.

  • pradeepn, MGH (Member)

    According to the log files, the job completed successfully without errors. Raw VCFs were generated by UnifiedGenotyper, and VQSR (for SNPs and indels) ran without issues. For our study, phasing is less critical, but I'm trying to understand why the log suggested that the job was "successful" when there's no phasing. I presumed it was because the trios I created weren't full trios (i.e., just mother and offspring). I tried this in parallel (splitting the PED file and subsetting the VCF for each "trio", running PhaseByTransmission, then combining) as well as in series (splitting the PED file for each "trio", then running PhaseByTransmission sequentially on the VCF for each "trio" pedigree). Both methods yielded no errors, but neither resulted in a phased VCF. Likewise, ReadBackedPhasing ran on the recalibrated VCF without errors, but I was surprised by the lack of phasing. Is this because of the missing father and the small sample size?
    Any other thoughts on why a job might be "successful" yet have no phasing...?

    Here's what I ran for PhaseByTransmission in parallel:

    1. Split each "trio":
    java -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T SelectVariants \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    --variant vcf/PCAD_01.UG.hg19.recal.SNPs.indels.vcf \
    -sn <$mother IID> \
    -sn <$offspring IID> \
    -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf
    2. Ran PhaseByTransmission on each split VCF:
    java -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T PhaseByTransmission \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    -V vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio_name>.vcf \
    -ped phenotypes/PCAD_01<trio_name>.ped \
    -o vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.pbt.<$trio_name>.vcf
    3. Combined "trios":
    java -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T CombineVariants \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio1>.vcf \
    <$remaining_trios> \
    --variant vcf/trio_splits/PCAD_01.UG.hg19.recal.SNPs.indels.<$trio7>.vcf \
    -o vcf/PCAD_01.UG.hg19.recal.SNPs.indels.pbt-trio_splits.vcf

    And here is what I ran for ReadBackedPhasing (--respectPhaseInInput made no difference, since I ran this independently of PhaseByTransmission):

    java -jar ../tools/GATK/GenomeAnalysisTK-nightly-2013-07-08-g05eadc3/GenomeAnalysisTK.jar \
    -T ReadBackedPhasing \
    -R /seq/references/Homo_sapiens_assembly19/v1/Homo_sapiens_assembly19.fasta \
    -V vcf/PCAD_01.UG.hg19.recal.SNPs.indels.vcf \
    -I ../reduced_bams/reduced_bam.list \
    -o vcf/PCAD_01.UG.hg19.recal.SNPs.indels.rbp.vcf
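
The per-pair workflow above (SelectVariants, then PhaseByTransmission, then CombineVariants) could be scripted so that each step's output name feeds the next step. The sketch below only builds argument lists using the flags that appear in the posted commands; it does not run anything. All file names, sample IDs, and the PED naming scheme are placeholders invented for illustration. One thing worth checking: in the posted "Combined trios" command, the `--variant` inputs lack the `.pbt.` infix that the PhaseByTransmission outputs carry, so they look like the unphased SelectVariants outputs. This sketch assumes the PhaseByTransmission outputs were the intended inputs.

```python
JAR = "GenomeAnalysisTK.jar"            # placeholder path to the GATK 3 jar
REF = "Homo_sapiens_assembly19.fasta"   # placeholder reference FASTA
INPUT_VCF = "cohort.recal.vcf"          # placeholder recalibrated VCF
OUT_PREFIX = "trio_splits/cohort.recal" # placeholder output prefix


def build_pair_commands(mother, offspring, ped_path):
    """Return (select_cmd, phase_cmd, pbt_vcf) for one mother-offspring pair."""
    name = f"{mother}_{offspring}"
    split_vcf = f"{OUT_PREFIX}.{name}.vcf"
    pbt_vcf = f"{OUT_PREFIX}.pbt.{name}.vcf"
    base = ["java", "-jar", JAR]
    select_cmd = base + [
        "-T", "SelectVariants", "-R", REF,
        "--variant", INPUT_VCF,
        "-sn", mother, "-sn", offspring,
        "-o", split_vcf,
    ]
    phase_cmd = base + [
        "-T", "PhaseByTransmission", "-R", REF,
        "-V", split_vcf,        # input is the SelectVariants output
        "-ped", ped_path,
        "-o", pbt_vcf,
    ]
    return select_cmd, phase_cmd, pbt_vcf


def build_combine_command(pbt_vcfs, output_vcf):
    """Combine the PhaseByTransmission outputs (not the unphased splits)."""
    cmd = ["java", "-jar", JAR, "-T", "CombineVariants", "-R", REF]
    for vcf in pbt_vcfs:
        cmd += ["--variant", vcf]
    return cmd + ["-o", output_vcf]


# Hypothetical sample IDs: one mother and seven siblings.
mother_id = "mother"
sibling_ids = [f"sib{i}" for i in range(1, 8)]

pair_commands = []
pbt_vcfs = []
for sib in sibling_ids:
    ped = f"phenotypes/{mother_id}_{sib}.ped"  # hypothetical PED naming
    select_cmd, phase_cmd, pbt_vcf = build_pair_commands(mother_id, sib, ped)
    pair_commands.append((select_cmd, phase_cmd))
    pbt_vcfs.append(pbt_vcf)

combine_cmd = build_combine_command(pbt_vcfs, "cohort.recal.pbt-trio_splits.vcf")
print(" ".join(combine_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it. Deriving the CombineVariants inputs from the same variable that names the PhaseByTransmission outputs makes it harder to combine the wrong set of files by accident.
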
  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    Not sure about PhaseByTransmission, but I'm pretty sure read-backed phasing doesn't work on reduced BAMs, since all your reads get collapsed into consensus reads.
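
To see why per-read information matters here, consider a toy model of read-backed phasing. This is not the ReadBackedPhasing algorithm, and the "collapsed" input is only a stand-in for losing per-read linkage, not a description of how reduced BAMs are actually stored. The idea it illustrates: phase between two heterozygous sites can only be inferred from reads that cover both sites and show which alleles travel together.

```python
from collections import Counter


def phase_from_reads(reads, site_a, site_b, min_support=2):
    """Toy phasing: reads are dicts mapping position -> observed base.

    Returns the two best-supported (allele_a, allele_b) pairs if they form
    two complementary haplotypes, otherwise None.
    """
    pairs = Counter()
    for read in reads:
        # Only reads spanning both sites carry linkage information.
        if site_a in read and site_b in read:
            pairs[(read[site_a], read[site_b])] += 1
    top = [pair for pair, n in pairs.most_common(2) if n >= min_support]
    if len(top) < 2:
        return None  # not enough linked evidence to phase
    (a1, b1), (a2, b2) = top
    if a1 == a2 or b1 == b2:
        return None  # the pairs do not describe two distinct haplotypes
    return top


# Individual reads spanning both heterozygous sites (positions 100 and 150).
raw_reads = [{100: "A", 150: "C"}] * 3 + [{100: "G", 150: "T"}] * 3

# Stand-in for collapsed data: alleles are still observed at each site,
# but no single record links an allele at 100 to an allele at 150.
collapsed = [{100: "A"}, {100: "G"}, {150: "C"}, {150: "T"}]

print(phase_from_reads(raw_reads, 100, 150))
print(phase_from_reads(collapsed, 100, 150))
```

In the toy model the second call has nothing to work with, which matches the symptom in this thread: the run finishes, but no genotype is phased.
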

  • Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)

    As for "job successful" vs. "didn't get the desired result", imagine you are reading a list of words looking for the word "yes". You can successfully read through the list of words, but if the word "yes" is not in it, you won't get the result you wanted. Here, not finding any information that can be phased is not considered a failure for the program. The operation did not fail, it's just that there was nothing for the program to operate on. Make sense?
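
The word-list analogy can be written out literally. In this minimal sketch (all names are made up for illustration), the job reports completion whether or not anything was found, so a caller who needs a non-empty result has to check for it explicitly.

```python
def run_job(words, target="yes"):
    """Scan the whole list; finishing the scan counts as success."""
    matches = [i for i, word in enumerate(words) if word == target]
    return {"completed": True, "matches": matches}


def require_matches(result):
    """Caller-side guard: treat an empty result as a problem."""
    if not result["matches"]:
        raise ValueError("job completed, but nothing matched")
    return result["matches"]


outcome = run_job(["no", "maybe", "later"])
print(outcome["completed"], outcome["matches"])

try:
    require_matches(outcome)
except ValueError as err:
    print("guard tripped:", err)
```

The same pattern applies to a phasing run: a clean log only says the tool got through its input, so a separate check on the output is needed if phased genotypes are required downstream.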
