The current GATK version is 3.7-0
Examples: Monday, today, last week, Mar 26, 3/26/04

Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

You can opt in to receive email notifications, for example when your questions get answered or when there are new announcements, by following the instructions given here.

☞ Did you remember to?

1. Search using the upper-right search box, e.g. using the error message.
3. Include tool and Java versions.
4. Tell us whether you are following GATK Best Practices.
5. Include relevant details, e.g. platform, DNA- or RNA-Seq, WES (+capture kit) or WGS (PCR-free or PCR+), paired- or single-end, read length, expected average coverage, somatic data, etc.
6. For tool errors, include the error stacktrace as well as the exact command.
7. For format issues, include the result of running ValidateSamFile for BAMs or ValidateVariants for VCFs.
8. For weird results, include an illustrative example, e.g. attach IGV screenshots according to Article#5484.
9. For a seeming variant that is uncalled, include results of following Article#1235.

☞ Formatting tip!

Wrap blocks of code, error messages and BAM/VCF snippets--especially content with hashes (#)--with lines with three backticks (  ) each to make a code block as demonstrated here.

GATK 3.7 is here! Be sure to read the Version Highlights and optionally the full Release Notes.

Pedigree Analysis

Member
edited July 2013

Hi,

I am trying to use PED files when I run UnifiedGeotyper, variantannotator and VariantRecalibrator (The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14)

if I don't add the PED file , I have some InbreedingCoeff

annotation=[SnpEff, AlleleBalance, BaseCounts, GCContent, HardyWeinberg, IndelType, AlleleBalanceBySample, MappingQualityZeroBySample]

1       14653   .       C       T       22.54   LowQual ABHet=0.842;ABHom=0.803;AC=4;AF=0.182;AN=22;BaseCounts=1,1362,0,261;BaseQRankSum=-4.292;DP=1075;Dels=0.00;FS=4.692;GC=58.42;HW=2.5;HaplotypeScore=0.0000;InbreedingCoeff=0.0530;MLEAC=3;MLEAF=0.136;MQ=6.41;MQ0=967;MQRankSum=-2.246;OND=0.164;QD=0.08;ReadPosRankSum=-0.008;SNPEFF_EFFECT=DOWNSTREAM;SNPEFF_FUNCTIONAL_CLASS=NONE;SNPEFF_GENE_BIOTYPE=processed_transcript;SNPEFF_GENE_NAME=DDX11L1;SNPEFF_IMPACT=MODIFIER;SNPEFF_TRANSCRIPT_ID=ENST00000456328;set=FilteredInAll      GT:AB:AD:DP:GQ:MQ0:PL


but when I add the PED file, I have nothing

annotation=[AlleleBalance, BaseCounts, GCContent, HardyWeinberg, IndelType, AlleleBalanceBySample, MappingQualityZeroBySample, InbreedingCoeff]
pedigree=[/scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped] pedigreeString=[] pedigreeValidationType=SILENT

1       14653   .       C       T       10.43   LowQual ABHom=0.792;AC=2;AF=0.091;AN=22;BaseCounts=1,1362,0,261;BaseQRankSum=-3.828;DP=1075;Dels=0.00;FS=2.373;GC=58.42;HW=10.2;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.045;MQ=6.20;MQ0=976;MQRankSum=-2.768;OND=0.190;QD=0.10;


and my PED file seems is bad format because I have this message

INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: []
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: []
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: []
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: []
INFO  17:02:17,770 PedReader - Phenotype is other? false
INFO  17:02:17,770 PedReader - Phenotype is other? false
INFO  17:02:17,770 PedReader - Phenotype is other? false
INFO  17:02:17,775 PedReader - Phenotype is other? false


But I have 6 columns and the name and tabulation between each column

2 MN932002 0 0 2 1
2 PNB32015 0 MN932002 1 1
4 CS934001 0 0 2 2
4 RMAO8004 0 CS934001 1 2
6 KH948045 0 0 2 2
6 AHAO8001 0 0 2 2
6 MHAO8003 0 0 2 2
A LSB32012 0 0 2 2
A SBB32010 0 LSB32012 2 2
B CMSB32011 0 0 2 2
C MKB32014 0 0 2 2
Z M88BBTPL 0 0 other -9

for the phenotype :

-9 missing
1 unaffected
2 affected

Consequently, I cannot run " VariantRecalibrator with -an InbreedingCoeff". Could you help me ?

Thanks,

Tiphaine

Post edited by Geraldine_VdAuwera on
Tagged:

• Member
edited July 2013

I don't understand if I have an error in PED file or not, I don't think so but I don't have InbreedingCoef when I had PED file

<br /> INFO 16:50:55,539 HelpFormatter - -------------------------------------------------------------------------------- <br /> INFO 16:50:55,541 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.6-4-g3e5ff60, Compiled 2013/06/24 14:48:56 <br /> INFO 16:50:55,541 HelpFormatter - Copyright (c) 2010 The Broad Institute <br /> INFO 16:50:55,541 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk <br /> INFO 16:50:55,544 HelpFormatter - Program Args: -nct 14 -R /scratch/cbrc/ref/broad_bundle_b37_2.2/human_g1k_v37.fasta -T HaplotypeCaller -stand_emit_conf 10 -L /scratch/cbrc/analysis/BHF-4/tmp/target-intervals.1026697[0].master01.xcat.cluster.intervals -I /scratch/cbrc/analysis/BHF-4/out/bam/PNB32_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/CMSB32_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/CS001_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/M88BL_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/MN932_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/LSB32_A.bam -I /scratch/cbrc/analysis/BostonHF-4/out/bam/SBB3_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/KH9_B.bam -I /scratch/cbrc/analysis/BostonHF-4/out/bam/MHAO_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/KH945_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/RMA04_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/AH1_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/CS001_C.bam -I /scratch/cbrc/analysis/HF-4/out/bam/MN02_B.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/C4001_B.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/M32014_A.bam --dbsnp /scratch/cbrc/ref/broad_bundle_b37_2.2/dbsnp_137.b37.vcf -dcov 200 -A QualByDepth -A Coverage -A VariantType -A ClippingRankSumTest -A DepthPerSampleHC -o /scratch/cbrc/analysis/BostonHF-4/out/vcf/initialvariants-0.vcf -A InbreedingCoeff -ped /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BHF.ped -pedValidationType SILENT -et NO_ET -K /scratch/cbrc/groupapps/GATK/2.6_4/key/tm483_medschl.cam.ac.uk.key <br /> INFO 16:50:55,545 HelpFormatter - Date/Time: 2013/07/12 16:50:55 <br /> INFO 16:50:55,545 HelpFormatter - -------------------------------------------------------------------------------- <br /> INFO 16:50:55,545 HelpFormatter - -------------------------------------------------------------------------------- <br /> INFO 16:50:55,571 ArgumentTypeDescriptor - Dynamically determined type of /scratch/cbrc/ref/broad_bundle_b37_2.2/dbsnp_137.b37.vcf to be VCF <br /> INFO 16:50:56,252 GenomeAnalysisEngine - Strictness is SILENT <br /> INFO 16:50:56,320 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 200 <br /> INFO 16:50:56,327 SAMDataSource$SAMReaders - Initializing SAMRecords in serial <br /> INFO 16:50:56,400 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07 <br /> INFO 16:50:56,415 HCMappingQualityFilter - Filtering out reads with MAPQ &lt; 20 <br /> INFO 16:50:56,424 RMDTrackBuilder - Loading Tribble index from disk for file /scratch/cbrc/ref/broad_bundle_b37_2.2/dbsnp_137.b37.vcf <br /> INFO 16:50:56,608 IntervalUtils - Processing 34876904 bp from intervals <br /> INFO 16:50:56,618 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BHF/PED/BHF.ped with missing fields: [] <br /> INFO 16:50:56,684 PedReader - Phenotype is other? false <br /> INFO 16:50:56,689 MicroScheduler - Running the GATK in parallel mode with 14 total threads, 14 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine <br /> INFO 16:50:56,729 GenomeAnalysisEngine - Preparing for traversal over 16 BAM files <br /> INFO 16:50:57,380 GenomeAnalysisEngine - Done preparing for traversal <br /> INFO 16:50:57,380 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] <br /> INFO 16:50:57,380 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining <br /> INFO 16:50:57,456 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units <br /> 

this is PED file
<br /> 2 MN93 0 0 2 2<br /> 2 PNB32 0 MN93 1 2<br /> 4 CS001 0 0 2 2<br /> 4 RMA04 0 CS001 1 2<br /> 6 KH945 0 0 2 2<br /> 6 AH1 0 0 2 2<br /> 6 MHAO8003 0 0 2 2<br /> A LSB32 0 0 2 2<br /> A SBB3 0 LSB32 2 2<br /> B CMSB32 0 0 2 2<br /> C MKB32014 0 0 2 2<br /> Z MBPL 0 0 other 0<br /> 

could you help me ? I have only 12 samples and only 11 affected samples ... is it a limit to run InbreedingCoeff ?

If I remember correctly, the PED file is used to calculate the InbreedingCoeff, so if your PED file is not loading correctly then it will prevent the annotation from being made properly. So you need to figure out what's wrong with your PED file. Have you checked that there are no extra trailing spaces or blank lines? We've seen that cause issues in the past.

• Member
edited July 2013

I have one extra space and I deleted that, but when I rerun with VariantAnnotator with option -A InbreedingCoeff -pedValidationType SILENT and PED file, I have the same error message. (I rerun VariantAnnotator and no HaplotypeCaller because VariantAnnotator run quicker than HaplotypeCaller and I suppose that VariantAnnotator updates this values).
I am not sure I understand your sentence "if your PED file is not loading correctly then it will prevent the annotation from being made properly ". Do you want to say that if my PED file is not loading correctly, your algorithm to define InbreedingCoeff does not take this PED file and define InbreedingCoeff as if we have unrelated samples?

• Member, Dev

I'm a little confused here - what's the error? I don't see any problems in that log

• Member

I don't know if this message is an error and If it is an error message, how do I check that my PED file is ok because this file has 6 columns and I follow the recommandations ...
I don't understand these lines

<br /> INFO 16:50:56,618 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BHF/PED/BHF.ped with missing fields: [] INFO 16:50:56,684 PedReader - Phenotype is other? false <br /> `

Tiphaine

• Member, Dev

The first line tells you that no fields are missing (that is, the fields listed inside the square brackets are missing)

The second line says that Phenotype is not encoded as "other" in the file (Is that 9 in PEDs? Can't remember)

These are both just status lines - which you can tell because they start with "INFO". I can't remember ever seeing GATK encounter an error without throwing an exception (ie, stopping execution and complaining loudly)

• Member

ok thanks

@pdexheimer said:
I can't remember ever seeing GATK encounter an error without throwing an exception (ie, stopping execution and complaining loudly)

Hah that's very true in general. However we do occasionally see some bugs where something fails silently. In this case I'm concerned that the PED file issue may be causing a silent failure because of this:

I don't have InbreedingCoef when I had PED file

If the otherwise same command line fails to yield the InbreedingCoeff annotation when the PED file is provided, but does yield the annotation when the PED file is not provided, then it sounds like we might have a silent failure. If so we may need some test files to reproduce the error locally. @Tiphaine, can you confirm that that's what you're seeing?

• Member, Dev

I thought the effect of the PED file was to remove non-founders from the annotation. In this case, that may be enough to drop it below 10 subjects and not emit the annotation

That's a fair point, I hadn't considered that. I'd be happier if we emitted a message to that effect in the log though.

• Member

in this case, I have 11 affected samples and one control data. And it doesn't explain why when I give a PED file, I don't have values of InbreedingCoeff but I have when I don't give a PED file . I dropped the -pedValidationType SILENT, I don't have more informations in log.

• Member, Dev

Yes, but affected vs control is not the same as founder vs non-founder. In your 12 samples, I see 3 that are not founders (i.e., that have parents present in the PED file) - PNB32, RMA04, and SBB3. That leaves 9 samples considered for InbreedingCoefficient, which is only calculated when you give it at least 10.

@Tiphaine, I think @pdexheimer is correct; your PED file is actually OK, and the annotation isn't being added because once the non-founders are removed, there are not enough individuals to calculate it.

• Member
edited July 2013

Ok, I can understand that but why with HaplotypeCaller (GATK 2.6.4) I obtain some values of InbreedingCoeff. isn't it the same algorithm ?
I think it will be good if we have also the number of non-founders and founders used for InbreedingCoeff in a log file.

• Member
edited July 2013

In doc, maybe it is better to write at least 10 founders and not 10 samples to define inbreedingCoeff.