Picard FindMendelianViolations: "Malformed header" error when specifying output directory

StefanC (Austria, Member)
edited June 22 in Ask the GATK team

Hi all

I am relatively new to NGS analysis, and especially to GATK. I am currently analyzing a small set of exome-seq data from a small family (3 generations, 2 individuals per generation) and wanted to check for Mendelian errors using Picard FindMendelianViolations (plus filtering the variants for a minimum coverage of 30x to avoid false calls at sparsely covered intronic SNPs). The data was generated at BGI on a HiSeq X Ten and processed using GATK (as far as I can tell from the VCF header).

The FindMendelianViolations program works fine when using the command

java -jar /opt/picard/picard.jar FindMendelianViolations I=../../variant_files/vcf/combine.snp.vcf.gz PED=../../../0_pedigree/trio.ped OUTPUT=mendelian_trio.DP30b.txt MIN_DP=30

However, when I add an output folder, the tool first runs through the VCF but then stops with the error:
"Your input file has a malformed header: BUG: VCF header has duplicate sample names". The error appears only when I specify an output folder (which seems quite odd to me), and I could reproduce it several times. I could not figure out what exactly happens. The output folder remains empty, although it seems that the tool attempts to write a file named 1.vcf.

$ java -jar /opt/picard/picard.jar FindMendelianViolations I=../../variant_files/vcf_reheader/combine.snp.reheader-out.vcf.gz PED=../../../0_pedigree/trio_nospaces.ped OUTPUT=mendelian_trio.DP30-2.txt MIN_DP=30 VCF_DIR=vcf_violations30/
INFO    2019-06-21 20:30:14 FindMendelianViolations 

********** NOTE: Picard's command line syntax is changing.
**********
********** For more information, please see:
********** https://github.com/broadinstitute/picard/wiki/Command-Line-Syntax-Transition-For-Users-(Pre-Transition)
**********
********** The command line looks like this in the new syntax:
**********
**********    FindMendelianViolations -I ../../variant_files/vcf_reheader/combine.snp.reheader-out.vcf.gz -PED ../../../0_pedigree/trio_nospaces.ped -OUTPUT mendelian_trio.DP30-2.txt -MIN_DP 30 -VCF_DIR vcf_violations30/
**********


20:30:15.252 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/picard/picard.jar!/com/intel/gkl/native/libgkl_compression.so
[Fri Jun 21 20:30:15 CEST 2019] FindMendelianViolations INPUT=../../variant_files/vcf_reheader/combine.snp.reheader-out.vcf.gz TRIOS=../../../0_pedigree/trio_nospaces.ped OUTPUT=mendelian_trio.DP30-2.txt MIN_DP=30 VCF_DIR=vcf_violations30    MIN_GQ=30 MIN_HET_FRACTION=0.3 SKIP_CHROMS=[MT, chrM] MALE_CHROMS=[chrY, Y] FEMALE_CHROMS=[chrX, X] PSEUDO_AUTOSOMAL_REGIONS=[chrX:10000-2781479, X:10001-2649520, chrX:155701382-156030895, X:59034050-59373566] THREAD_COUNT=1 TAB_MODE=false VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 MAX_RECORDS_IN_RAM=500000 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
[Fri Jun 21 20:30:15 CEST 2019] Executing as [email protected] on Linux 4.15.0-51-generic amd64; OpenJDK 64-Bit Server VM 11.0.3+7-Ubuntu-1ubuntu218.04.1; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.20.2-SNAPSHOT
INFO    2019-06-21 20:30:15 FindMendelianViolations Loading and filtering trios.
WARNING 2019-06-21 20:30:15 FindMendelianViolations Removing trio due to the following missing samples in VCF: [0]
WARNING 2019-06-21 20:30:15 FindMendelianViolations Removing trio due to the following missing samples in VCF: [0]
WARNING 2019-06-21 20:30:15 FindMendelianViolations Removing trio due to the following missing samples in VCF: [0]
INFO    2019-06-21 20:30:16 FindMendelianViolations variants analyzed        10,000 records.  Elapsed time: 00:00:01s.  Time for last 10,000:    0s.  Last read position: chr1:62,594,480

[ ... omitted ... ]

INFO    2019-06-21 20:30:20 FindMendelianViolations variants analyzed       240,000 records.  Elapsed time: 00:00:05s.  Time for last 10,000:    0s.  Last read position: chr22:44,368,204
INFO    2019-06-21 20:30:20 FindMendelianViolations Writing family violation VCFs to /media/q005sc/WINDOWS/ngs_analysis/exome/2_analysis/recomb_TL/picard/vcf_violations30/
INFO    2019-06-21 20:30:20 FindMendelianViolations Writing 1 violation VCF to /media/q005sc/WINDOWS/ngs_analysis/exome/2_analysis/recomb_TL/picard/vcf_violations30/1.vcf
[Fri Jun 21 20:30:20 CEST 2019] picard.vcf.MendelianViolations.FindMendelianViolations done. Elapsed time: 0.09 minutes.
Runtime.totalMemory()=206569472
To get help, see http://broadinstitute.github.io/picard/index.html#GettingHelp
Exception in thread "main" htsjdk.tribble.TribbleException$InvalidHeader: Your input file has a malformed header: BUG: VCF header has duplicate sample names
    at htsjdk.variant.vcf.VCFHeader.<init>(VCFHeader.java:142)
    at picard.vcf.MendelianViolations.FindMendelianViolations.writeAllViolations(FindMendelianViolations.java:288)
    at picard.vcf.MendelianViolations.FindMendelianViolations.doWork(FindMendelianViolations.java:262)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:295)
    at picard.cmdline.PicardCommandLine.instanceMain(PicardCommandLine.java:103)
    at picard.cmdline.PicardCommandLine.main(PicardCommandLine.java:113)

However, the header seems fine to me (AXX to TXX are the six samples):

#CHROM  POS ID  REF ALT QUAL    FILTER  INFO    FORMAT  AXX01   EXX01   GXX01   NXX01   OXX01   TXX01
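
Since the exception names duplicate sample names, one quick independent check is to parse the `#CHROM` line directly. The following is a minimal Python sketch, not part of Picard or GATK; the function name `check_chrom_line` is made up for illustration. It assumes the header line is available as a string, verifies that the nine fixed columns are tab-separated, and lists any sample name that occurs more than once.

```python
from collections import Counter

# The nine fixed columns that precede the sample names in a VCF with genotypes.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO", "FORMAT"]


def check_chrom_line(line):
    """Return (is_tab_delimited, samples, duplicates) for a #CHROM header line."""
    fields = line.rstrip("\n").split("\t")
    # A space-separated line does not split into the fixed columns on tabs.
    is_tab_delimited = fields[:9] == FIXED_COLUMNS
    samples = fields[9:] if is_tab_delimited else []
    counts = Counter(samples)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    return is_tab_delimited, samples, duplicates


if __name__ == "__main__":
    # Sample names taken from the header line quoted in the post.
    header = "\t".join(FIXED_COLUMNS + ["AXX01", "EXX01", "GXX01", "NXX01", "OXX01", "TXX01"])
    ok, samples, dups = check_chrom_line(header)
    print("tab-delimited:", ok)
    print("samples:", samples)
    print("duplicates:", dups)
```

Note that this only inspects the input header. According to the stack trace, the exception is raised in the `VCFHeader` constructor called from `writeAllViolations`, so a clean result here says nothing about the header Picard builds for the per-family output VCF.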

The input PED file looks like this. It does not contain all samples because we are interested only in generations 2 and 3, but the error also appears when all samples are included in the PED file:

1   OXX01   0  0  1  1
1   NXX01   0  0  2  0
1   TXX01   OXX01  NXX01  1  1
1   EXX01   0  0  2  0
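
The three `Removing trio due to the following missing samples in VCF: [0]` warnings in the log line up with the three PED rows whose parent columns are `0`. A rough Python sketch of that relationship follows. It assumes whitespace-separated PED columns in the usual order (family, individual, father, mother, sex, phenotype); `missing_samples_per_row` is a hypothetical helper written for illustration and is not Picard's actual logic.

```python
def missing_samples_per_row(ped_text, vcf_samples):
    """For each PED row, list the IDs (child, father, mother) absent from the VCF."""
    vcf_samples = set(vcf_samples)
    report = []
    for raw in ped_text.splitlines():
        cols = raw.split()
        if len(cols) < 6:
            continue  # skip blank or incomplete rows
        _family, child, father, mother = cols[:4]
        # A set collapses repeated IDs, so two "0" parents are reported once.
        missing = sorted({s for s in (child, father, mother) if s not in vcf_samples})
        report.append((child, missing))
    return report


if __name__ == "__main__":
    # PED rows and sample names copied from the post.
    ped = """\
1   OXX01   0  0  1  1
1   NXX01   0  0  2  0
1   TXX01   OXX01  NXX01  1  1
1   EXX01   0  0  2  0
"""
    samples = ["AXX01", "EXX01", "GXX01", "NXX01", "OXX01", "TXX01"]
    for child, missing in missing_samples_per_row(ped, samples):
        print(child, "missing:", missing)
```

With the PED rows and sample names above, only the TXX01 row has all three IDs present in the VCF, which would account for exactly three warnings.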

The output of ValidateVariants is as follows (run from the Docker image):

[email protected]:/gatk# gatk --version
The Genome Analysis Toolkit (GATK) v4.1.2.0
HTSJDK Version: 2.19.0
Picard Version: 2.19.0
[email protected]:/gatk# gatk ValidateVariants --variant combine.snp.reheader-out.vcf.gz 
Using GATK jar /gatk/gatk-package-4.1.2.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /gatk/gatk-package-4.1.2.0-local.jar ValidateVariants --variant combine.snp.reheader-out.vcf.gz
15:16:38.147 INFO  NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/gatk-package-4.1.2.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
Jun 22, 2019 3:16:39 PM shaded.cloud_nio.com.google.auth.oauth2.ComputeEngineCredentials runningOnComputeEngine
INFO: Failed to detect whether we are running on Google Compute Engine.
15:16:39.895 INFO  ValidateVariants - ------------------------------------------------------------
15:16:39.896 INFO  ValidateVariants - The Genome Analysis Toolkit (GATK) v4.1.2.0
15:16:39.896 INFO  ValidateVariants - For support and documentation go to https://software.broadinstitute.org/gatk/
15:16:39.896 INFO  ValidateVariants - Executing as [email protected] on Linux v4.15.0-51-generic amd64
15:16:39.897 INFO  ValidateVariants - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_191-8u191-b12-0ubuntu0.16.04.1-b12
15:16:39.897 INFO  ValidateVariants - Start Date/Time: June 22, 2019 3:16:38 PM UTC
15:16:39.897 INFO  ValidateVariants - ------------------------------------------------------------
15:16:39.897 INFO  ValidateVariants - ------------------------------------------------------------
15:16:39.897 INFO  ValidateVariants - HTSJDK Version: 2.19.0
15:16:39.897 INFO  ValidateVariants - Picard Version: 2.19.0
15:16:39.897 INFO  ValidateVariants - HTSJDK Defaults.COMPRESSION_LEVEL : 2
15:16:39.898 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
15:16:39.898 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
15:16:39.898 INFO  ValidateVariants - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
15:16:39.898 INFO  ValidateVariants - Deflater: IntelDeflater
15:16:39.898 INFO  ValidateVariants - Inflater: IntelInflater
15:16:39.898 INFO  ValidateVariants - GCS max retries/reopens: 20
15:16:39.898 INFO  ValidateVariants - Requester pays: disabled
15:16:39.898 INFO  ValidateVariants - Initializing engine
15:16:40.150 INFO  FeatureManager - Using codec VCFCodec to read file file:///gatk/combine.snp.reheader-out.vcf.gz
15:16:40.268 INFO  ValidateVariants - Done initializing engine
15:16:40.269 INFO  ProgressMeter - Starting traversal
15:16:40.269 INFO  ProgressMeter -        Current Locus  Elapsed Minutes    Variants Processed  Variants/Minute
15:16:41.899 INFO  ProgressMeter -       chrX:142605437              0.0                245817        9048478.5
15:16:41.900 INFO  ProgressMeter - Traversal complete. Processed 245817 total variants in 0.0 minutes.
15:16:41.900 INFO  ValidateVariants - Shutting down engine

I was not able to tell from the output above whether my VCF is OK or not. No report file was written to the directory (executed in /gatk).

I would be very grateful for any help to figure out what is happening! Thank you very much!

Stefan


GitHub issue by bhanuGandham (issue number 1354, state: open)

Answers

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)
    edited June 24

    Hi @StefanC

    Take a look at this thread: https://gatkforums.broadinstitute.org/gatk/discussion/4277/error-message-your-input-file-has-a-malformed-header


  • StefanC (Austria, Member)

    Hi @bhanuGandham

    Thank you very much for your reply. I checked the link. It suggests that the header is separated by spaces instead of tabs. Unfortunately, this is not the case for my file: both the data columns and the header use tabs. See the output of cat -T below. Notepadqq also shows only tabs.

    $ cat -T combine.snp.reheader-out.vcf | grep "#CHROM" -A2
    #CHROM^IPOS^IID^IREF^IALT^IQUAL^IFILTER^IINFO^IFORMAT^IAXX01^IEXX01^IGXX01^INXX01^IOXX01^ITXX01
    chr1^I14653^I.^IC^IT^I925.34^IPASS^IAC=6;AF=0.5;AN=12;BaseQRankSum=0.234;ClippingRankSum=0;DP=287;ExcessHet=14.6052;FS=13.082;MLEAC=6;MLEAF=0.5;MQ=40.44;MQRankSum=-0.756;QD=3.24;ReadPosRankSum=-0.395;SOR=1.633^IGT:AD:DP:GQ:PL^I0/1:34,14:48:99:264,0,871^I0/1:46,6:52:48:48,0,1317^I0/1:39,8:47:99:114,0,1085^I0/1:36,11:47:99:192,0,991^I0/1:42,7:49:95:95,0,1194^I0/1:31,12:43:99:249,0,828
    chr1^I14677^I.^IG^IA^I291.15^IPASS^IAC=1;AF=0.083;AN=12;BaseQRankSum=-0.639;ClippingRankSum=0;DP=343;ExcessHet=3.0103;FS=4.993;MLEAC=1;MLEAF=0.083;MQ=71.77;MQRankSum=-1.597;QD=5.29;ReadPosRankSum=-0.544;SOR=1.376^IGT:AD:DP:GQ:PL^I0/1:38,17:55:99:324,0,1152^I0/0:54,3:57:91:0,91,1715^I0/0:58,0:58:99:0,120,1800^I0/0:64,0:64:99:0,120,1800^I0/0:60,2:62:99:0,119,1802^I0/0:45,0:45:99:0,120,1800
    

    Do you have any idea what else might be the problem?

    best
    Stefan

  • bhanuGandham (Cambridge MA; Member, Administrator, Broadie, Moderator)

    Hi @StefanC

    Can you please post the header for this file: combine.snp.reheader-out.vcf.gz

  • StefanC (Austria, Member)

    Hi @bhanuGandham

    sure. It is:


    ##fileformat=VCFv4.2 ##FILTER=<ID=PASS,Description="All filters passed"> ##ALT=<ID=NON_REF,Description="Represents any possible alternative allele at this location"> ##FILTER=<ID=LowQual,Description="Low quality"> ##FILTER=<ID=filter,Description="QD < 2.0 || FS > 60.0 || MQ <40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0"> ##FORMAT=<ID=AD,Number=R,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed"> ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)"> ##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality"> ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype"> ##FORMAT=<ID=MIN_DP,Number=1,Type=Integer,Description="Minimum DP observed within the GVCF block"> ##FORMAT=<ID=PGT,Number=1,Type=String,Description="Physical phasing haplotype information, describing how the alternate alleles are phased in relation to one another"> ##FORMAT=<ID=PID,Number=1,Type=String,Description="Physical phasing ID information, where each unique ID within a given sample (but not across samples) connects records within a phasing group"> ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification"> ##FORMAT=<ID=RGQ,Number=1,Type=Integer,Description="Unconditional reference genotype confidence, encoded as a phred quality -10*log10 p(genotype call is wrong)"> ##FORMAT=<ID=SB,Number=4,Type=Integer,Description="Per-sample component statistics which comprise the Fisher's Exact Test to detect strand bias."> ##GATKCommandLine.GenotypeGVCFs=<ID=GenotypeGVCFs,Version=3.7-0-gcfedb67,Date="Thu Dec 06 15:12:07 CST 2018",Epoch=1544080327970,CommandLineOptions="analysis_type=GenotypeGVCFs input_file=[] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 
reference_sequence=/ifswh1/BC_PUB/biosoft/pipeline/DNA/DNA_Human_WES/DNA_Human_WES_2016b/Database/hg19/fa/hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false variant=[(RodBindingCollection [(RodBinding name=variant source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/GXX01/callGVCF_GATK/GXX01.g.vcf.gz)]), (RodBindingCollection [(RodBinding name=variant2 source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/NXX01/callGVCF_GATK/NXX01.g.vcf.gz)]), (RodBindingCollection [(RodBinding name=variant3 
source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/EXX01/callGVCF_GATK/EXX01.g.vcf.gz)]), (RodBindingCollection [(RodBinding name=variant4 source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/OXX01/callGVCF_GATK/OXX01.g.vcf.gz)]), (RodBindingCollection [(RodBinding name=variant5 source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/AXX01/callGVCF_GATK/AXX01.g.vcf.gz)]), (RodBindingCollection [(RodBinding name=variant6 source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/TXX01/callGVCF_GATK/TXX01.g.vcf.gz)])] out=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/combine/callGVCF_GATK/combine.vcf.gz includeNonVariantSites=false uniquifySamples=false annotateNDA=false useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=10.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 annotation=[] group=[StandardAnnotation] dbsnp=(RodBinding name= source=UNBOUND) filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> ##GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.7-0-gcfedb67,Date="Thu Dec 06 08:41:27 CST 2018",Epoch=1544056887852,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/TXX01/TXX01.realign.recal.bam] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=[/ifswh1/BC_PUB/biosoft/pipeline/DNA/DNA_Human_WES/DNA_Human_WES_2016b/Database/hg19/bed/ExonCaptureRegion_hg19_Agilent_V6/CallVariantRegion/ex_region.sort.bed] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/ifswh1/BC_PUB/biosoft/pipeline/DNA/DNA_Human_WES/DNA_Human_WES_2016b/Database/hg19/fa/hg19.fasta nonDeterministicRandomSeed=false 
disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=LINEAR variant_index_parameter=128000 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[StrandBiasBySample] excludeAnnotation=[ChromosomeCounts, FisherStrand, StrandOddsRatio, QualByDepth] group=[StandardAnnotation, StandardHCAnnotation] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=GVCF bamOutput=null bamWriterType=CALLED_HAPLOTYPES emitDroppedReads=false disableOptimizations=false annotateNDA=false 
useNewAFCalculator=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 heterozygosity_stdev=0.01 standard_min_confidence_threshold_for_calling=-0.0 standard_min_confidence_threshold_for_emitting=30.0 max_alternate_alleles=6 max_genotype_count=1024 max_num_PL_values=100 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=true gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=false keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null 
bandPassSigma=null maxReadsInMemoryPerSample=30000 maxTotalReadsInMemory=10000000 maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> ##GATKCommandLine.SelectVariants=<ID=SelectVariants,Version=3.7-0-gcfedb67,Date="Thu Dec 06 15:55:08 CST 2018",Epoch=1544082908207,CommandLineOptions="analysis_type=SelectVariants input_file=[] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/ifswh1/BC_PUB/biosoft/pipeline/DNA/DNA_Human_WES/DNA_Human_WES_2016b/Database/hg19/fa/hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT 
allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/combine/callGVCF_GATK/combine.vcf.gz) discordance=(RodBinding name= source=UNBOUND) concordance=(RodBinding name= source=UNBOUND) out=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/combine/snp_GATK/combine.raw.snp.vcf.gz sample_name=[] sample_expressions=null sample_file=null exclude_sample_name=[] exclude_sample_file=[] exclude_sample_expressions=[] selectexpressions=[] invertselect=false excludeNonVariants=true excludeFiltered=false preserveAlleles=false removeUnusedAlternates=false restrictAllelesTo=ALL keepOriginalAC=false keepOriginalDP=false mendelianViolation=false invertMendelianViolation=false mendelianViolationQualThreshold=0.0 select_random_fraction=0.0 remove_fraction_genotypes=0.0 selectTypeToInclude=[SNP] selectTypeToExclude=[] keepIDs=null excludeIDs=null fullyDecode=false justRead=false maxIndelSize=2147483647 minIndelSize=0 maxFilteredGenotypes=2147483647 minFilteredGenotypes=0 maxFractionFilteredGenotypes=1.0 minFractionFilteredGenotypes=0.0 maxNOCALLnumber=2147483647 maxNOCALLfraction=1.0 setFilteredGtToNocall=false ALLOW_NONOVERLAPPING_COMMAND_LINE_SAMPLES=false forceValidOutput=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> ##GATKCommandLine.VariantFiltration=<ID=VariantFiltration,Version=3.7-0-gcfedb67,Date="Thu Dec 06 15:57:05 CST 2018",Epoch=1544083025455,CommandLineOptions="analysis_type=VariantFiltration input_file=[] showFullBamList=false read_buffer_size=null read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 
reference_sequence=/ifswh1/BC_PUB/biosoft/pipeline/DNA/DNA_Human_WES/DNA_Human_WES_2016b/Database/hg19/fa/hg19.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1000 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 static_quantized_quals=null round_down_quantized=false disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 secondsBetweenProgressUpdates=10 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 reference_window_stop=0 phone_home= gatk_key=null tag=NA logging_level=INFO log_to_file=null help=false version=false variant=(RodBinding name=variant source=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/combine/snp_GATK/combine.raw.snp.vcf.gz) mask=(RodBinding name= source=UNBOUND) out=/ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/combine/snp_GATK/combine.filtered_snp.vcf.gz filterExpression=[QD < 2.0 || FS > 60.0 || MQ <40.0 || MQRankSum < -12.5 || ReadPosRankSum < -8.0] filterName=[filter] genotypeFilterExpression=[] 
genotypeFilterName=[] clusterSize=3 clusterWindowSize=0 maskExtension=0 maskName=Mask filterNotInMask=false missingValuesInExpressionsShouldEvaluateAsFailing=false invalidatePreviousFilters=false invertFilterExpression=false invertGenotypeFilterExpression=false setFilteredGtToNocall=false filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false"> ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed"> ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed"> ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes"> ##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities"> ##INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases"> ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered"> ##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?"> ##INFO=<ID=END,Number=1,Type=Integer,Description="Stop position of the interval"> ##INFO=<ID=ExcessHet,Number=1,Type=Float,Description="Phred-scaled p-value for exact test of excess heterozygosity"> ##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias"> ##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes"> ##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation"> ##INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in 
the same order as listed">
##INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RAW_MQ,Number=1,Type=Float,Description="Raw data for RMS Mapping Quality">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">
##contig=<ID=chrM,length=16571,assembly=hg19>
##contig=<ID=chr1,length=249250621,assembly=hg19>
##contig=<ID=chr2,length=243199373,assembly=hg19>
##contig=<ID=chr3,length=198022430,assembly=hg19>
##contig=<ID=chr4,length=191154276,assembly=hg19>
##contig=<ID=chr5,length=180915260,assembly=hg19>
##contig=<ID=chr6,length=171115067,assembly=hg19>
##contig=<ID=chr7,length=159138663,assembly=hg19>
##contig=<ID=chr8,length=146364022,assembly=hg19>
##contig=<ID=chr9,length=141213431,assembly=hg19>
##contig=<ID=chr10,length=135534747,assembly=hg19>
##contig=<ID=chr11,length=135006516,assembly=hg19>
##contig=<ID=chr12,length=133851895,assembly=hg19>
##contig=<ID=chr13,length=115169878,assembly=hg19>
##contig=<ID=chr14,length=107349540,assembly=hg19>
##contig=<ID=chr15,length=102531392,assembly=hg19>
##contig=<ID=chr16,length=90354753,assembly=hg19>
##contig=<ID=chr17,length=81195210,assembly=hg19>
##contig=<ID=chr18,length=78077248,assembly=hg19>
##contig=<ID=chr19,length=59128983,assembly=hg19>
##contig=<ID=chr20,length=63025520,assembly=hg19>
##contig=<ID=chr21,length=48129895,assembly=hg19>
##contig=<ID=chr22,length=51304566,assembly=hg19>
##contig=<ID=chrX,length=155270560,assembly=hg19>
##contig=<ID=chrY,length=59373566,assembly=hg19>
##contig=<ID=chr1_gl000191_random,length=106433,assembly=hg19>
##contig=<ID=chr1_gl000192_random,length=547496,assembly=hg19>
##contig=<ID=chr4_ctg9_hap1,length=590426,assembly=hg19>
##contig=<ID=chr4_gl000193_random,length=189789,assembly=hg19>
##contig=<ID=chr4_gl000194_random,length=191469,assembly=hg19>
##contig=<ID=chr6_apd_hap1,length=4622290,assembly=hg19>
##contig=<ID=chr6_cox_hap2,length=4795371,assembly=hg19>
##contig=<ID=chr6_dbb_hap3,length=4610396,assembly=hg19>
##contig=<ID=chr6_mann_hap4,length=4683263,assembly=hg19>
##contig=<ID=chr6_mcf_hap5,length=4833398,assembly=hg19>
##contig=<ID=chr6_qbl_hap6,length=4611984,assembly=hg19>
##contig=<ID=chr6_ssto_hap7,length=4928567,assembly=hg19>
##contig=<ID=chr7_gl000195_random,length=182896,assembly=hg19>
##contig=<ID=chr8_gl000196_random,length=38914,assembly=hg19>
##contig=<ID=chr8_gl000197_random,length=37175,assembly=hg19>
##contig=<ID=chr9_gl000198_random,length=90085,assembly=hg19>
##contig=<ID=chr9_gl000199_random,length=169874,assembly=hg19>
##contig=<ID=chr9_gl000200_random,length=187035,assembly=hg19>
##contig=<ID=chr9_gl000201_random,length=36148,assembly=hg19>
##contig=<ID=chr11_gl000202_random,length=40103,assembly=hg19>
##contig=<ID=chr17_ctg5_hap1,length=1680828,assembly=hg19>
##contig=<ID=chr17_gl000203_random,length=37498,assembly=hg19>
##contig=<ID=chr17_gl000204_random,length=81310,assembly=hg19>
##contig=<ID=chr17_gl000205_random,length=174588,assembly=hg19>
##contig=<ID=chr17_gl000206_random,length=41001,assembly=hg19>
##contig=<ID=chr18_gl000207_random,length=4262,assembly=hg19>
##contig=<ID=chr19_gl000208_random,length=92689,assembly=hg19>
##contig=<ID=chr19_gl000209_random,length=159169,assembly=hg19>
##contig=<ID=chr21_gl000210_random,length=27682,assembly=hg19>
##contig=<ID=chrUn_gl000211,length=166566,assembly=hg19>
##contig=<ID=chrUn_gl000212,length=186858,assembly=hg19>
##contig=<ID=chrUn_gl000213,length=164239,assembly=hg19>
##contig=<ID=chrUn_gl000214,length=137718,assembly=hg19>
##contig=<ID=chrUn_gl000215,length=172545,assembly=hg19>
##contig=<ID=chrUn_gl000216,length=172294,assembly=hg19>
##contig=<ID=chrUn_gl000217,length=172149,assembly=hg19>
##contig=<ID=chrUn_gl000218,length=161147,assembly=hg19>
##contig=<ID=chrUn_gl000219,length=179198,assembly=hg19>
##contig=<ID=chrUn_gl000220,length=161802,assembly=hg19>
##contig=<ID=chrUn_gl000221,length=155397,assembly=hg19>
##contig=<ID=chrUn_gl000222,length=186861,assembly=hg19>
##contig=<ID=chrUn_gl000223,length=180455,assembly=hg19>
##contig=<ID=chrUn_gl000224,length=179693,assembly=hg19>
##contig=<ID=chrUn_gl000225,length=211173,assembly=hg19>
##contig=<ID=chrUn_gl000226,length=15008,assembly=hg19>
##contig=<ID=chrUn_gl000227,length=128374,assembly=hg19>
##contig=<ID=chrUn_gl000228,length=129120,assembly=hg19>
##contig=<ID=chrUn_gl000229,length=19913,assembly=hg19>
##contig=<ID=chrUn_gl000230,length=43691,assembly=hg19>
##contig=<ID=chrUn_gl000231,length=27386,assembly=hg19>
##contig=<ID=chrUn_gl000232,length=40652,assembly=hg19>
##contig=<ID=chrUn_gl000233,length=45941,assembly=hg19>
##contig=<ID=chrUn_gl000234,length=40531,assembly=hg19>
##contig=<ID=chrUn_gl000235,length=34474,assembly=hg19>
##contig=<ID=chrUn_gl000236,length=41934,assembly=hg19>
##contig=<ID=chrUn_gl000237,length=45867,assembly=hg19>
##contig=<ID=chrUn_gl000238,length=39939,assembly=hg19>
##contig=<ID=chrUn_gl000239,length=33824,assembly=hg19>
##contig=<ID=chrUn_gl000240,length=41933,assembly=hg19>
##contig=<ID=chrUn_gl000241,length=42152,assembly=hg19>
##contig=<ID=chrUn_gl000242,length=43523,assembly=hg19>
##contig=<ID=chrUn_gl000243,length=43341,assembly=hg19>
##contig=<ID=chrUn_gl000244,length=39929,assembly=hg19>
##contig=<ID=chrUn_gl000245,length=36651,assembly=hg19>
##contig=<ID=chrUn_gl000246,length=38154,assembly=hg19>
##contig=<ID=chrUn_gl000247,length=36422,assembly=hg19>
##contig=<ID=chrUn_gl000248,length=39786,assembly=hg19>
##contig=<ID=chrUn_gl000249,length=38502,assembly=hg19>
##reference=file:///ifswh1/BC_PUB/biosoft/pipeline/DNA/DNA_Human_WES/DNA_Human_WES_2016b/Database/hg19/fa/hg19.fasta
##source=SelectVariants
##bcftools_viewVersion=1.2+htslib-1.2.1
##bcftools_viewCommand=view -e ALT=="*" -f PASS -o /ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/combine/snp_GATK/anno/combine.filtered_snp.vcf.gz -O z /ifswh1/BC_COM_P1/F18FTSEUHT1383/HUMopcX/analysis/process/combine/snp_GATK/combine.filtered_snp.vcf.gz
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT AXX01 EXX01 GXX01 NXX01 OXX01 TXX01

    Sequencing and file generation were done at BGI.

    Best regards
    Stefan

  • bhanuGandham Cambridge MA Member, Administrator, Broadie, Moderator admin

    Hi @StefanC

    This looks like a bug in FindMendelianViolations. I have created an issue ticket for the dev team, and we are looking into it. You can follow the progress on this issue here: https://github.com/broadinstitute/picard/issues/1354
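
In the meantime, a quick way to rule out the input file as the cause is to check the `#CHROM` line of the header for repeated sample names. The Python sketch below is only an illustration and not part of Picard or GATK; the function names `sample_names` and `duplicated` are made up for this example. It assumes the line is tab-delimited as in a real VCF, and falls back to splitting on whitespace for the space-flattened copy pasted in this thread.

```python
from collections import Counter

# The #CHROM line as posted in this thread (spaces instead of tabs,
# because the forum flattened the header).
CHROM_LINE = (
    "#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT "
    "AXX01 EXX01 GXX01 NXX01 OXX01 TXX01"
)


def sample_names(chrom_line):
    """Return the sample columns of a VCF #CHROM header line.

    The first nine columns (#CHROM ... FORMAT) are fixed; everything
    after them is a sample name.
    """
    line = chrom_line.rstrip("\n")
    # Real VCF files are tab-delimited; the whitespace fallback is only
    # for text copied out of a forum post.
    fields = line.split("\t") if "\t" in line else line.split()
    return fields[9:]


def duplicated(names):
    """Return the sorted list of names that occur more than once."""
    return sorted(name for name, count in Counter(names).items() if count > 1)


names = sample_names(CHROM_LINE)
dups = duplicated(names)
print(f"{len(names)} samples, duplicates: {dups}")
```

For the header posted above, the six sample names are all distinct, which is consistent with the error coming from the tool rather than from the input VCF.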
