Hi to all
I have just started using GATK and I have a few questions about some tools and about the general workflow.
I have exome-seq data for 3 samples from a trio, and I need to detect rare or private variants that segregate with the disease.
Starting from the 3 aligned BAM files, I proceeded with the GATK pipeline (adding read group info, MarkDuplicates, realignment, BQSR, UnifiedGenotyper and VariantFiltration) and generated 3 VCF files.
Since I now have to use the PhaseByTransmission tool, should I merge the 3 VCF files?
Or would it have been better to merge the BAM files after adding the read group info and then proceed with the rest of the analysis?
Also, should I create my .ped file based on the read groups that I have assigned? (I visited http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml#ped, but I couldn't understand how a PED file is generated.)
Thanks!!!
Geraldine_VdAuwera
Hi there,
It would be better to call your three samples together. You don't have to merge the BAMs; you just pass the three files on the command line to UnifiedGenotyper, and it will produce a single VCF with the variants for all three samples.
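As a rough illustration of what passing the three files together looks like, here is a minimal Python sketch that only assembles the command line (it does not run GATK). It assumes a GATK 2.x/3.x-style invocation, where `-T` selects the tool, `-R` gives the reference, each BAM gets its own `-I`, and `-o` names the output. The jar path and all file names are placeholders, not values from this thread.

```python
import shlex


def unified_genotyper_cmd(reference, bams, out_vcf, jar="GenomeAnalysisTK.jar"):
    """Build an argument list that calls all BAMs together in one run.

    The jar path and every file name passed in are hypothetical placeholders.
    """
    cmd = ["java", "-jar", jar, "-T", "UnifiedGenotyper", "-R", reference]
    for bam in bams:
        # One -I per input BAM; the BAMs themselves are not merged.
        cmd += ["-I", bam]
    cmd += ["-o", out_vcf]
    return cmd


cmd = unified_genotyper_cmd(
    "reference.fasta",
    ["father.bam", "mother.bam", "child.bam"],
    "trio.vcf",
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed command can be pasted into a shell once the placeholder paths are replaced with your own files.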
Note that the PED files must contain only the first 6 columns from the PLINK format PED file, and no alleles, like a FAM file in PLINK. Use the sample names to identify individuals in the PED file.
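To make the six columns concrete, here is a small Python sketch that writes out a trio in that layout: family ID, individual ID, father ID, mother ID, sex, phenotype. It assumes the usual PLINK coding (sex 1 = male, 2 = female; phenotype 1 = unaffected, 2 = affected; 0 for an unknown parent). The family ID, the sample names, and the sex and affection status shown are made up for illustration; the individual IDs should be replaced with the sample names in your own VCF.

```python
def ped_line(family, individual, father, mother, sex, phenotype):
    """Return one 6-column PED record (no genotype/allele columns)."""
    fields = (family, individual, father, mother, sex, phenotype)
    return "\t".join(str(field) for field in fields)


# Hypothetical trio: IDs must match the sample names used in the VCF.
trio = [
    ("FAM1", "father", "0", "0", 1, 1),            # founder, male, unaffected
    ("FAM1", "mother", "0", "0", 2, 1),            # founder, female, unaffected
    ("FAM1", "child", "father", "mother", 1, 2),   # offspring, male, affected
]

ped_text = "\n".join(ped_line(*row) for row in trio) + "\n"
print(ped_text, end="")
```

Saving that text to a file gives something you can hand to PhaseByTransmission through its pedigree argument.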
Geraldine Van der Auwera, PhD
Answers
See also this discussion for details on the PED file:
http://gatkforums.broadinstitute.org/discussion/2031/wrong-number-of-fields-in-ped-files-in-phasebytransmission
Geraldine Van der Auwera, PhD
Ok thank you Geraldine!