Bug Bulletin: The GenomeLocPArser error in SplitNCigarReads has been fixed; if you encounter it, use the latest nightly build.

Pedigree Analysis

TiphaineTiphaine Posts: 53Member
edited July 2013 in Ask the GATK team

Hi,

I am trying to use PED files when I run UnifiedGeotyper, variantannotator and VariantRecalibrator (The Genome Analysis Toolkit (GATK) v2.3-9-ge5ebf34, Compiled 2013/01/11 22:43:14)

if I don't add the PED file , I have some InbreedingCoeff

annotation=[SnpEff, AlleleBalance, BaseCounts, GCContent, HardyWeinberg, IndelType, AlleleBalanceBySample, MappingQualityZeroBySample]

1       14653   .       C       T       22.54   LowQual ABHet=0.842;ABHom=0.803;AC=4;AF=0.182;AN=22;BaseCounts=1,1362,0,261;BaseQRankSum=-4.292;DP=1075;Dels=0.00;FS=4.692;GC=58.42;HW=2.5;HaplotypeScore=0.0000;InbreedingCoeff=0.0530;MLEAC=3;MLEAF=0.136;MQ=6.41;MQ0=967;MQRankSum=-2.246;OND=0.164;QD=0.08;ReadPosRankSum=-0.008;SNPEFF_EFFECT=DOWNSTREAM;SNPEFF_FUNCTIONAL_CLASS=NONE;SNPEFF_GENE_BIOTYPE=processed_transcript;SNPEFF_GENE_NAME=DDX11L1;SNPEFF_IMPACT=MODIFIER;SNPEFF_TRANSCRIPT_ID=ENST00000456328;set=FilteredInAll      GT:AB:AD:DP:GQ:MQ0:PL

but when I add the PED file, I have nothing

annotation=[AlleleBalance, BaseCounts, GCContent, HardyWeinberg, IndelType, AlleleBalanceBySample, MappingQualityZeroBySample, InbreedingCoeff] pedigree=[/scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped] pedigreeString=[] pedigreeValidationType=SILENT

1       14653   .       C       T       10.43   LowQual ABHom=0.792;AC=2;AF=0.091;AN=22;BaseCounts=1,1362,0,261;BaseQRankSum=-3.828;DP=1075;Dels=0.00;FS=2.373;GC=58.42;HW=10.2;HaplotypeScore=0.0000;MLEAC=1;MLEAF=0.045;MQ=6.20;MQ0=976;MQRankSum=-2.768;OND=0.190;QD=0.10;

and my PED file seems is bad format because I have this message

INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,687 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BostonHF.ped with missing fields: [] 
INFO  17:02:17,770 PedReader - Phenotype is other? false 
INFO  17:02:17,770 PedReader - Phenotype is other? false 
INFO  17:02:17,770 PedReader - Phenotype is other? false 
INFO  17:02:17,775 PedReader - Phenotype is other? false 

But I have 6 columns and the name and tabulation between each column

2 MN932002 0 0 2 1
2 PNB32015 0 MN932002 1 1
4 CS934001 0 0 2 2
4 RMAO8004 0 CS934001 1 2
6 KH948045 0 0 2 2
6 AHAO8001 0 0 2 2
6 MHAO8003 0 0 2 2
A LSB32012 0 0 2 2
A SBB32010 0 LSB32012 2 2
B CMSB32011 0 0 2 2
C MKB32014 0 0 2 2
Z M88BBTPL 0 0 other -9

for the phenotype :

-9 missing 1 unaffected 2 affected

Consequently, I cannot run " VariantRecalibrator with -an InbreedingCoeff". Could you help me ?

Thanks,

Tiphaine

Post edited by Geraldine_VdAuwera on
Tagged:

Answers

  • TiphaineTiphaine Posts: 53Member
    edited July 2013

    I don't understand if I have an error in PED file or not, I don't think so but I don't have InbreedingCoef when I had PED file

    INFO 16:50:55,539 HelpFormatter - -------------------------------------------------------------------------------- INFO 16:50:55,541 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.6-4-g3e5ff60, Compiled 2013/06/24 14:48:56 INFO 16:50:55,541 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 16:50:55,541 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 16:50:55,544 HelpFormatter - Program Args: -nct 14 -R /scratch/cbrc/ref/broad_bundle_b37_2.2/human_g1k_v37.fasta -T HaplotypeCaller -stand_emit_conf 10 -L /scratch/cbrc/analysis/BHF-4/tmp/target-intervals.1026697[0].master01.xcat.cluster.intervals -I /scratch/cbrc/analysis/BHF-4/out/bam/PNB32_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/CMSB32_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/CS001_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/M88BL_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/MN932_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/LSB32_A.bam -I /scratch/cbrc/analysis/BostonHF-4/out/bam/SBB3_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/KH9_B.bam -I /scratch/cbrc/analysis/BostonHF-4/out/bam/MHAO_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/KH945_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/RMA04_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/AH1_A.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/CS001_C.bam -I /scratch/cbrc/analysis/HF-4/out/bam/MN02_B.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/C4001_B.bam -I /scratch/cbrc/analysis/BHF-4/out/bam/M32014_A.bam --dbsnp /scratch/cbrc/ref/broad_bundle_b37_2.2/dbsnp_137.b37.vcf -dcov 200 -A QualByDepth -A Coverage -A VariantType -A ClippingRankSumTest -A DepthPerSampleHC -o /scratch/cbrc/analysis/BostonHF-4/out/vcf/initialvariants-0.vcf -A InbreedingCoeff -ped /scratch/cbrc/data/release_2012_Nov/BostonHF/PED/BHF.ped -pedValidationType SILENT -et NO_ET -K /scratch/cbrc/groupapps/GATK/2.6_4/key/tm483_medschl.cam.ac.uk.key INFO 16:50:55,545 HelpFormatter - Date/Time: 2013/07/12 16:50:55 INFO 16:50:55,545 HelpFormatter - -------------------------------------------------------------------------------- INFO 16:50:55,545 HelpFormatter - -------------------------------------------------------------------------------- INFO 16:50:55,571 ArgumentTypeDescriptor - Dynamically determined type of /scratch/cbrc/ref/broad_bundle_b37_2.2/dbsnp_137.b37.vcf to be VCF INFO 16:50:56,252 GenomeAnalysisEngine - Strictness is SILENT INFO 16:50:56,320 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 200 INFO 16:50:56,327 SAMDataSource$SAMReaders - Initializing SAMRecords in serial INFO 16:50:56,400 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.07 INFO 16:50:56,415 HCMappingQualityFilter - Filtering out reads with MAPQ < 20 INFO 16:50:56,424 RMDTrackBuilder - Loading Tribble index from disk for file /scratch/cbrc/ref/broad_bundle_b37_2.2/dbsnp_137.b37.vcf INFO 16:50:56,608 IntervalUtils - Processing 34876904 bp from intervals INFO 16:50:56,618 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BHF/PED/BHF.ped with missing fields: [] INFO 16:50:56,684 PedReader - Phenotype is other? false INFO 16:50:56,689 MicroScheduler - Running the GATK in parallel mode with 14 total threads, 14 CPU thread(s) for each of 1 data thread(s), of 16 processors available on this machine INFO 16:50:56,729 GenomeAnalysisEngine - Preparing for traversal over 16 BAM files INFO 16:50:57,380 GenomeAnalysisEngine - Done preparing for traversal INFO 16:50:57,380 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING] INFO 16:50:57,380 ProgressMeter - Location processed.active regions runtime per.1M.active regions completed total.runtime remaining INFO 16:50:57,456 HaplotypeCaller - Using global mismapping rate of 45 => -4.5 in log10 likelihood units

    this is PED file 2 MN93 0 0 2 2 2 PNB32 0 MN93 1 2 4 CS001 0 0 2 2 4 RMA04 0 CS001 1 2 6 KH945 0 0 2 2 6 AH1 0 0 2 2 6 MHAO8003 0 0 2 2 A LSB32 0 0 2 2 A SBB3 0 LSB32 2 2 B CMSB32 0 0 2 2 C MKB32014 0 0 2 2 Z MBPL 0 0 other 0

    could you help me ? I have only 12 samples and only 11 affected samples ... is it a limit to run InbreedingCoeff ?

    Post edited by Tiphaine on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    If I remember correctly, the PED file is used to calculate the InbreedingCoeff, so if your PED file is not loading correctly then it will prevent the annotation from being made properly. So you need to figure out what's wrong with your PED file. Have you checked that there are no extra trailing spaces or blank lines? We've seen that cause issues in the past.

    Geraldine Van der Auwera, PhD

  • TiphaineTiphaine Posts: 53Member
    edited July 2013

    I have one extra space and I deleted that, but when I rerun with VariantAnnotator with option -A InbreedingCoeff -pedValidationType SILENT and PED file, I have the same error message. (I rerun VariantAnnotator and no HaplotypeCaller because VariantAnnotator run quicker than HaplotypeCaller and I suppose that VariantAnnotator updates this values). I am not sure I understand your sentence "if your PED file is not loading correctly then it will prevent the annotation from being made properly ". Do you want to say that if my PED file is not loading correctly, your algorithm to define InbreedingCoeff does not take this PED file and define InbreedingCoeff as if we have unrelated samples?

    Post edited by Tiphaine on
  • pdexheimerpdexheimer Posts: 345Member, GSA Collaborator ✭✭✭

    I'm a little confused here - what's the error? I don't see any problems in that log

  • TiphaineTiphaine Posts: 53Member

    I don't know if this message is an error and If it is an error message, how do I check that my PED file is ok because this file has 6 columns and I follow the recommandations ... I don't understand these lines

    INFO 16:50:56,618 PedReader - Reading PED file /scratch/cbrc/data/release_2012_Nov/BHF/PED/BHF.ped with missing fields: [] INFO 16:50:56,684 PedReader - Phenotype is other? false

    Tiphaine

  • pdexheimerpdexheimer Posts: 345Member, GSA Collaborator ✭✭✭

    The first line tells you that no fields are missing (that is, the fields listed inside the square brackets are missing)

    The second line says that Phenotype is not encoded as "other" in the file (Is that 9 in PEDs? Can't remember)

    These are both just status lines - which you can tell because they start with "INFO". I can't remember ever seeing GATK encounter an error without throwing an exception (ie, stopping execution and complaining loudly)

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    @pdexheimer said: I can't remember ever seeing GATK encounter an error without throwing an exception (ie, stopping execution and complaining loudly)

    Hah that's very true in general. However we do occasionally see some bugs where something fails silently. In this case I'm concerned that the PED file issue may be causing a silent failure because of this:

    I don't have InbreedingCoef when I had PED file

    If the otherwise same command line fails to yield the InbreedingCoeff annotation when the PED file is provided, but does yield the annotation when the PED file is not provided, then it sounds like we might have a silent failure. If so we may need some test files to reproduce the error locally. @Tiphaine, can you confirm that that's what you're seeing?

    Geraldine Van der Auwera, PhD

  • pdexheimerpdexheimer Posts: 345Member, GSA Collaborator ✭✭✭

    I thought the effect of the PED file was to remove non-founders from the annotation. In this case, that may be enough to drop it below 10 subjects and not emit the annotation

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    That's a fair point, I hadn't considered that. I'd be happier if we emitted a message to that effect in the log though.

    Geraldine Van der Auwera, PhD

  • TiphaineTiphaine Posts: 53Member

    in this case, I have 11 affected samples and one control data. And it doesn't explain why when I give a PED file, I don't have values of InbreedingCoeff but I have when I don't give a PED file . I dropped the -pedValidationType SILENT, I don't have more informations in log.

  • pdexheimerpdexheimer Posts: 345Member, GSA Collaborator ✭✭✭

    Yes, but affected vs control is not the same as founder vs non-founder. In your 12 samples, I see 3 that are not founders (i.e., that have parents present in the PED file) - PNB32, RMA04, and SBB3. That leaves 9 samples considered for InbreedingCoefficient, which is only calculated when you give it at least 10.

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    @Tiphaine, I think @pdexheimer is correct; your PED file is actually OK, and the annotation isn't being added because once the non-founders are removed, there are not enough individuals to calculate it.

    Geraldine Van der Auwera, PhD

  • TiphaineTiphaine Posts: 53Member
    edited July 2013

    Ok, I can understand that but why with HaplotypeCaller (GATK 2.6.4) I obtain some values of InbreedingCoeff. isn't it the same algorithm ? I think it will be good if we have also the number of non-founders and founders used for InbreedingCoeff in a log file.

    Post edited by Tiphaine on
  • TiphaineTiphaine Posts: 53Member
    edited July 2013

    In doc, maybe it is better to write at least 10 founders and not 10 samples to define inbreedingCoeff.

    Post edited by Tiphaine on
  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Fair point, I'll make a note to update the doc.

    Geraldine Van der Auwera, PhD

  • TiphaineTiphaine Posts: 53Member

    Ok thanks but Do you have an explanation why HaplotypeCaller of GATK 2.6.4 can do that and not UnifiedGenotyper of GATK 2.3.9 from the same PED file?

  • Geraldine_VdAuweraGeraldine_VdAuwera Posts: 6,192Administrator, GATK Developer admin

    Hmm. What behavior do you get when you use the UG from the latest version? I need to know if this issue is specific to HaplotypeCaller or also affects UG in 2.6.

    Geraldine Van der Auwera, PhD

Sign In or Register to comment.