To call variants with the GATK using pedigree information, you should base your workflow on the Best Practices recommendations -- the principles detailed there all apply to pedigree analysis.
But there is one crucial addition: you should make sure to pass a pedigree file (PED file) to all GATK walkers that you use in your workflow. Some will deliver better results if they see the pedigree data.
At the moment, two of the standard annotations are affected by pedigree:
In the specific case of trios, an additional GATK walker, PhaseByTransmission, should be used to obtain trio-aware genotypes as well as phase by descent.
The annotations mentioned above have been adapted for PED files starting with GATK v.1.6. If you already have VCF files generated by an older version of the GATK or have not passed a PED file while running the UnifiedGenotyper or VariantAnnotator, you should do the following:
Pass -G StandardAnnotation to VariantAnnotator. Make sure you pass your PED file to the VariantAnnotator as well!
The PED files used as input for these tools are based on PLINK pedigree files. The general description can be found here.
For these tools, the PED files must contain only the first 6 columns from the PLINK format PED file, and no alleles, like a FAM file in PLINK.
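As a rough illustration of the six-column layout described above, here is a minimal Python sketch that reads PED lines and rejects any row that does not have exactly six fields. The names `PedRecord` and `parse_ped_lines` are made up for this example and are not part of the GATK or PLINK. Splitting on any whitespace and skipping lines that start with `#` are assumptions of the sketch, not documented GATK behavior.

```python
from typing import Iterable, List, NamedTuple


class PedRecord(NamedTuple):
    """One row of a six-column PED file (PLINK FAM-style, no alleles)."""
    family_id: str
    individual_id: str
    paternal_id: str
    maternal_id: str
    sex: str
    phenotype: str


def parse_ped_lines(lines: Iterable[str]) -> List[PedRecord]:
    """Parse PED lines, raising ValueError on rows without exactly 6 columns."""
    records = []
    for line_number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Assumption of this sketch: blank lines and '#' comment lines are skipped.
        if not stripped or stripped.startswith("#"):
            continue
        fields = stripped.split()
        if len(fields) != 6:
            raise ValueError(
                f"line {line_number}: expected 6 columns, found {len(fields)}"
            )
        records.append(PedRecord(*fields))
    return records


# A trio written in the six-column layout (hypothetical sample IDs).
example_lines = [
    "FAM001\tS00002\t0\t0\t1\t1",
    "FAM001\tS00003\t0\t0\t2\t1",
    "FAM001\tS00001\tS00002\tS00003\t1\t2",
]
trio = parse_ped_lines(example_lines)
```

A full PLINK PED row with allele columns after the sixth field would raise a `ValueError` here, which matches the "first 6 columns only" requirement.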
Comments
I've been looking all over for how to add a PED file to my VariantAnnotator run. I don't see an explanation on the VariantAnnotator page or here. I've tried to use the -list function to look at possible annotations, but I don't see ped file as an option. How should I pass a ped file to VariantAnnotator to re-annotate variants from an old version GATK run that wasn't originally run with a ped file?
PED files are passed through an engine argument, they're not tool-specific: see http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_CommandLineGATK.html
Geraldine Van der Auwera, PhD
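To make the engine-argument point concrete, here is a small Python sketch that assembles a VariantAnnotator command line with the PED file attached. It only builds the argument list and does not run anything. The helper name `build_variant_annotator_command` and all file names are hypothetical. The sketch assumes the engine argument is spelled `-ped` and that `-R`, `-V` and `-o` are the reference, input and output flags in your GATK version, so check the CommandLineGATK and VariantAnnotator documentation for the release you are using.

```python
from typing import List


def build_variant_annotator_command(
    reference: str,
    input_vcf: str,
    output_vcf: str,
    ped_file: str,
    gatk_jar: str = "GenomeAnalysisTK.jar",  # hypothetical jar location
) -> List[str]:
    """Assemble (but do not run) a VariantAnnotator call that includes a PED file."""
    return [
        "java", "-jar", gatk_jar,
        "-T", "VariantAnnotator",
        "-R", reference,
        "-V", input_vcf,               # assumed input flag; may differ by version
        "-o", output_vcf,              # assumed output flag; may differ by version
        "-G", "StandardAnnotation",    # annotation group named in the article
        "-ped", ped_file,              # engine argument, not specific to this tool
    ]


command = build_variant_annotator_command(
    "reference.fasta", "old_calls.vcf", "reannotated.vcf", "family.ped"
)
print(" ".join(command))
```

Because `-ped` belongs to the engine, the same pair of arguments can be appended to any other tool's command line in the same way.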
Thank you for the very prompt response. The program is now running away happily (hopefully :).
Hi Geraldine, can you tell me what's wrong with my PED file? Here it is attached! It describes a family with a father (NA00001), mother (NA00002), son (NA00003) and daughter (NA00004); the son has ALS, for example.
# Phenotype (-9=missing; 0=missing; 1=unaffected; 2=affected)
FAM001 NA00001 0 0 1 1
Can I also add the cousin, like this?
FAM002 NA00005 0 0 2 1
Can I have only sisters and brothers in a PED file, without the mother and father? And a last question: is it better to provide the PED file at variant-calling time with UnifiedGenotyper?
Hi @alirezakj,
Actually I think PED files can only contain trios, so if you want to phase siblings you have to put them in as different families (though obviously with the same parents). So FAM1 would have Mom, Dad and Kid1, FAM2 would have Mom, Dad and Kid2, and so on. Not sure what to do about cousins though.
Some GATK tools use the PED files and some don't. The simplest is to pass your PED file to every tool; those that can use it will do so, and those that can't will just ignore it.
Geraldine Van der Auwera, PhD
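The "one family per child" layout suggested above could be generated along these lines. This is only a sketch of that suggestion: the function name `trio_families`, the `FAM` prefix and the sample IDs are invented for illustration, and whether the GATK accepts the same parent IDs repeated under several family IDs is not confirmed in this thread. Sex is coded 1 for male and 2 for female, and phenotype uses the codes quoted earlier in the thread (1=unaffected, 2=affected).

```python
from typing import List, Tuple


def trio_families(
    father: Tuple[str, str],                 # (individual ID, phenotype)
    mother: Tuple[str, str],                 # (individual ID, phenotype)
    children: List[Tuple[str, str, str]],    # (individual ID, sex, phenotype)
    family_prefix: str = "FAM",              # hypothetical naming scheme
) -> List[str]:
    """Emit one three-row family per child, repeating both parents each time."""
    father_id, father_phenotype = father
    mother_id, mother_phenotype = mother
    rows = []
    for index, (child_id, child_sex, child_phenotype) in enumerate(children, start=1):
        family_id = f"{family_prefix}{index}"
        # Parents are founders within each family, so their parent IDs are 0.
        rows.append((family_id, father_id, "0", "0", "1", father_phenotype))
        rows.append((family_id, mother_id, "0", "0", "2", mother_phenotype))
        rows.append((family_id, child_id, father_id, mother_id, child_sex, child_phenotype))
    return ["\t".join(row) for row in rows]


# Father F, mother M, unaffected son S, affected daughter D.
ped_lines = trio_families(("F", "1"), ("M", "1"), [("S", "1", "1"), ("D", "2", "2")])
print("\n".join(ped_lines))
```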
Thank you so much Geraldine, very helpful. Three more questions:
Thank you so much you are being so helpful.
From your explanation I understand that, for a family of four (father "F", mother "M", son "S", daughter "D" affected), the PED file should look like the following:
Am I right? Thanks
To your previous three questions:
Set 1 if you know they are unaffected, 0 if you don't know.
By default validation is set to STRICT. Some people choose to use SILENT for various reasons, for example if they are using a BAM file containing a large cohort of individuals, but they are only analyzing one family trio. If they used STRICT, the program would complain that all the other samples are lacking pedigree information. If you don't care about those other samples then this is a bother. These are the validation options:
This is typically a programmer's decision to allow different values to mean the same thing; in this case I don't know why they chose this. It doesn't really matter, just pick one and be sure to always use the same one in all your work to avoid confusion.
Geraldine Van der Auwera, PhD
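For the STRICT versus SILENT choice described above, a hedged Python sketch of adding the pedigree arguments to an existing command might look like this. The helper `with_pedigree_arguments` is hypothetical, and the sketch assumes the engine flags are spelled `-ped` and `-pedValidationType`; confirm the spelling against the CommandLineGATK documentation for your version.

```python
from typing import List, Sequence

# The two validation modes mentioned in the answer above.
VALIDATION_MODES = ("STRICT", "SILENT")


def with_pedigree_arguments(
    base_command: Sequence[str],
    ped_file: str,
    validation: str = "STRICT",
) -> List[str]:
    """Return a copy of base_command with the PED file and validation mode appended."""
    if validation not in VALIDATION_MODES:
        raise ValueError(f"validation must be one of {VALIDATION_MODES}, got {validation!r}")
    return list(base_command) + ["-ped", ped_file, "-pedValidationType", validation]


# Example: a cohort BAM where only one trio has pedigree information,
# so missing pedigree entries for the other samples should not stop the run.
base = ["java", "-jar", "GenomeAnalysisTK.jar", "-T", "UnifiedGenotyper"]
silent_command = with_pedigree_arguments(base, "trio.ped", validation="SILENT")
```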
Hi, I just created a ped file in order to use it for the variant calling. However I'm receiving the following error:
INFO 12:45:21,838 PedReader - Reading PED file /Data/samples.ped with missing fields: []
INFO 12:45:21,949 PedReader - Phenotype is other? false
I already saw some comments about this error, which might occur due to a malformed PED file. However, I could not find the problem with my PED file. It looks like this, separated by tabs.
FAM001 S00002 0 0 1 1
FAM001 S00003 0 0 2 1
FAM001 S00001 S00002 S00003 1 2
FAM002 ....
Is there any option to get a more detailed error message, or can someone tell me what's wrong with my file? I should say that I created it by hand, since there are only 8 trios in it.
Hmm, any chance that you have spaces instead of tabs somewhere in there?
Geraldine Van der Auwera, PhD
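One quick way to check a hand-written file for stray spaces is a few lines of Python. This is just a sketch: `find_lines_with_spaces` is a made-up helper, and it reports any line containing a space character without making claims about what the GATK's PED reader itself accepts as a separator.

```python
from typing import Iterable, List


def find_lines_with_spaces(lines: Iterable[str]) -> List[int]:
    """Return 1-based numbers of lines that contain at least one space character."""
    return [
        line_number
        for line_number, line in enumerate(lines, start=1)
        if " " in line.rstrip("\r\n")
    ]


# The second row has a space between the family and individual IDs.
sample_lines = [
    "FAM001\tS00002\t0\t0\t1\t1\n",
    "FAM001 S00003\t0\t0\t2\t1\n",
    "FAM001\tS00001\tS00002\tS00003\t1\t2\n",
]
suspicious = find_lines_with_spaces(sample_lines)
```

To check a real file, pass an open file handle instead of the list, for example `find_lines_with_spaces(open("samples.ped"))` with your own path.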
I could not find any spaces, only tabs. Just checked it again.
I also created a minimal PED file, containing only the first trio. However, this resulted in the same warning message. I also tried to experiment with the values, changed the separator to spaces, without success.
Hey MAX, that doesn't look like an error, it's just an info line. What is the output of the program?
Mauricio Carneiro, PhD http://www.broadinstitute.org/~carneiro/
Hi Carneiro,
Well, the output looks normal, but I wasn't sure if everything worked fine since I saw this warning message. Therefore I thought there might be a problem with my PED file.
problem solved then! great!
Mauricio Carneiro, PhD http://www.broadinstitute.org/~carneiro/