Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.
Attention:
We will be out of the office on November 11th and 13th 2019, due to the U.S. holiday(Veteran's day) and due to a team event(Nov 13th). We will return to monitoring the GATK forum on November 12th and 14th respectively. Thank you for your patience.

Same setting, different SNPs being called

ssandmannssandmann Münster, GermanyMember

Hey,

currently, I am working on 454 sequencing data. I am using GATK to call SNPs and indels, which works relatively well for the SNPs with the recommended filtering options. However, I was interested to raise sensitivity, which is why I started experimenting with the QD value. I restarted the whole GATK pipeline (including the genotyping) with the filtering option "QD<0.0" instead of "QD<2.0".
As I did not change any other parameters, I was just expecting to see more SNPs being called. However, I came across a list of SNPs, which did contain some new SNPs, but which was also lacking one of the previously called SNPs (it was not even present in the raw HaplotypeCaller output). How could this have happened?

This is an extract from the vcf file containing the SNP that was previously called:

4 106156187 rs17253672 C T 490.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.341;ClippingRankSum=1.022;DB;DP=161;FS=4.193;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.231;QD=3.05;ReadPosRankSum=1.347 GT:AD:DP:GQ:PL 0/1:10,12:22:99:519,0,262

I am very curious to hear whether anyone might have an explanation for my findings.
Many thanks in advance!

Best Answer

Answers

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hi there,

    Please clarify what you mean by

    restarted the whole GATK pipeline (including the genotyping) with the filtering option "QD<0.0" instead of "QD<2.0"

    How is your pipeline set up?

  • ssandmannssandmann Münster, GermanyMember

    We are using GATK 3.2.2. In our analysis of the 454 data we are sticking to the recommendet pipeline, so it is:

    1) Recalibration (using the BaseRecalibrator, considering dbSNP as regards the known sites; using PrintReads to create the .bam-files; using Picard tools to build an index for the bam-files)

    2) Genotyping (using the HaplotypeCaller with stand_call_conf 30.0 and stand_emit_conf 10.9; using SelectVariants to split vcf-file for SNPs and indels)

    3) Hard filtration (using VariantFiltration for the SNPs and Indels seperately with the recommendet filtering options, except for QD; using CombineVariants to merge the vcf-files)

    4) Annotation

    I restarted this whole pipeline for my samples, only changing the QD-threshold. However, the results from the second run lack one SNP that has previously been called. It was already lacking in the raw output after step two. I still have no idea how this could have happened.

    If you need any more information on our pipeline please let me know.

    Thanks for your help!
    Sarah

  • SheilaSheila Broad InstituteMember, Broadie admin

    @ssandmann‌

    Hi Sarah,

    So, you are saying that Haplotype Caller originally called a SNP, but when you reran Haplotype Caller a second time, the same SNP was not called? That should never be the case, unless maybe you switched versions. Did you use any intervals in one run but not in the other? Can you please post your commands for each of the steps you listed?

    Thanks,
    Sheila

  • ssandmannssandmann Münster, GermanyMember

    Hi Sheila,

    Yes, that is exactly what happened. I used intervals, but they were identical for both runs.
    I checked the bam-files I used as input for the second step and they are identical, so I do not suppose that there is any difference in the first step?!

    This is the command I used for the second step in the first run (with the subsequent QD<2.0 filtering option):

    GATKCommandLine=<ID=HaplotypeCaller,Version=3.2-2-gec30cee,Date="Wed Dec 03 13:36:21 CET 2014",Epoch=1417610181874,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[../validation_study/454/validation_study/gatk/bam/MDS006_T664.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=[../validation_study/454/validation_study/targetRegions/targetRegions.bed] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/home/share/Genomes/Homo_sapiens.GRCh37.67/gatk-3.2.2/Homo_sapiens.GRCh37.67.dna.chromosome.all.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=8 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false out=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN graphOutput=null bamOutput=null bam_compression=null disable_bam_indexing=null generate_md5=null simplifyBAM=null bamWriterType=CALLED_HAPLOTYPES dbsnp=(RodBinding name=dbsnp source=/home/share/Genomes/gatk-bundle-2.8/dbsnp_138.b37.excluding_sites_after_129.vcf) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[ClippingRankSumTest, DepthPerSampleHC] excludeAnnotation=[SpanningDeletions, TandemRepeatAnnotator] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=NONE annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=10.0 max_alternate_alleles=9 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=false kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingTails=false consensus=false GVCFGQBands=[5, 20, 60] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 minPruning=2 gcpHMM=10 includeUmappedReads=false useAllelesTrigger=false phredScaledGlobalReadMismappingRate=45 maxNumHaplotypesInPopulation=128 mergeVariantsViaLD=false pair_hmm_implementation=VECTOR_LOGLESS_CACHING keepRG=null justDetermineActiveRegions=false dontGenotype=false errorCorrectKmers=false debugGraphTransformations=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false allowCyclesInKmerGraphToGeneratePaths=false noFpga=false errorCorrectReads=false kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 pcr_indel_model=CONSERVATIVE activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">

    And this is the command for the second step in the second run (with the subsequent QD<0.0 filtering option):

    GATKCommandLine=<ID=HaplotypeCaller,Version=3.2-2-gec30cee,Date="Thu Dec 11 15:50:05 CET 2014",Epoch=1418309405034,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[../validation_study/454/validation_study/QD/gatk/bam/MDS006_T664.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] intervals=[../validation_study/454/validation_study/targetRegions/targetRegions.bed] excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=/home/share/Genomes/Homo_sapiens.GRCh37.67/gatk-3.2.2/Homo_sapiens.GRCh37.67.dna.chromosome.all.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=1500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false num_threads=1 num_cpu_threads_per_data_thread=8 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false out=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub no_cmdline_in_header=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub sites_only=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub bcf=org.broadinstitute.gatk.engine.io.stubs.VariantContextWriterStub likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN graphOutput=null bamOutput=null bam_compression=null disable_bam_indexing=null generate_md5=null simplifyBAM=null bamWriterType=CALLED_HAPLOTYPES dbsnp=(RodBinding name=dbsnp source=/home/share/Genomes/gatk-bundle-2.8/dbsnp_138.b37.excluding_sites_after_129.vcf) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[ClippingRankSumTest, DepthPerSampleHC] excludeAnnotation=[SpanningDeletions, TandemRepeatAnnotator] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=NONE annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=30.0 standard_min_confidence_threshold_for_emitting=10.0 max_alternate_alleles=9 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=EXACT_INDEPENDENT exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=false kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingTails=false consensus=false GVCFGQBands=[5, 20, 60] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 minPruning=2 gcpHMM=10 includeUmappedReads=false useAllelesTrigger=false phredScaledGlobalReadMismappingRate=45 maxNumHaplotypesInPopulation=128 mergeVariantsViaLD=false pair_hmm_implementation=VECTOR_LOGLESS_CACHING keepRG=null justDetermineActiveRegions=false dontGenotype=false errorCorrectKmers=false debugGraphTransformations=false dontUseSoftClippedBases=false captureAssemblyFailureBAM=false allowCyclesInKmerGraphToGeneratePaths=false noFpga=false errorCorrectReads=false kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 pcr_indel_model=CONSERVATIVE activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">

    I looked at the differences, and I could only find the date, the path to the file (the files definately are identical), and the EPOCH (but I do not know what this means).

    I also attached a screanshot from the IGV at the corresponding region. The files with the "_QD0" extension are those from the second run.

    If you like, I can also post the commands from the other steps.

    Sarah

    image

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Hey Sarah, in future, can you please post the command lines as you wrote them rather than the recap command output by the program? The latter contains all values for every applicable parameter after initial evaluation, which makes it very difficult to compare your command lines. Sometimes we ask to see the full thing but in a case like this we just need to see what parameters you specified yourself.

    (Epoch is just a unix timestamp, you can ignore that.)

    Are you sure your bam files are identical? Because they don't look quite the same in the screenshot. Though that could be an artifact of the visualization.

    One possible culprit could be the use of the -nct parameter, which introduces some non-deterministic behavior in downsampling. That can make borderline calls (that typically would get filtered out) appear or disappear.

    What does the call actually look like? Can you please post the VCF record?

  • ssandmannssandmann Münster, GermanyMember

    Hey,

    I created an example for you to reproduce the error:

    The region, saved as a bed-file, is:

    4 106155055 106158634

    In my script I am working with a loop, so there should not be any parameters that change. In the script I am working with a bam-file containing reads only covering the above mentioned interval (if you like I could send you the bam-file per email?!).

    The main script is the following:

    RESULTS_QD2=$RUN/minimal/summary_QD2_8.txt
    rm $RESULTS_QD2
    touch $RESULTS_QD2

    for i in {1..250}; do

    Variant calling using GATK:

    no realignment (not recommended), just copy bam-files

    $GATK_SCRIPTS/01_realign_454_IonTorrent.sh $RUN/minimal $RUN/minimal/gatk $SAMPLENAMES

    Base quality score recalibration

    $GATK_SCRIPTS/02_recalibrate.sh $RUN/minimal/gatk $SAMPLENAMES $THREADS

    Actual variant calling

    $GATK_SCRIPTS/03_genotyping.sh $RUN/minimal/gatk $SAMPLENAMES $TARGETS $THREADS

    grep -v "#" $RUN/minimal/gatk/intermRes/$SAMPLE_QD2.rawSNPs.vcf | grep "106156187" | wc -l >> $RESULTS_QD2
    done

    I am working with 8 threads (but I also started the script for only 1 thread; however, it is not ready yet). In the example $SAMPLENAMES just contains one sample. The number "106156187" I grep from the output-vcf corresponds to the position of the called SNP. It is the only SNP that is ever called in the region.

    The GATK-scripts are the following:

    01_realign_454_Iontorrent.sh:

    BAMDIRRAW=$1
    DIR=$2
    BAMDIR=$DIR/bam
    INTERMRES=$DIR/intermRes
    LOGS=$DIR/log

    mkdir -p $BAMDIR $INTERMRES $LOGS

    SAMPLENAMES=$3
    if [ -f $SAMPLENAMES ];
    then
    SAMPLES=$(cat $SAMPLENAMES)
    else
    SAMPLES=$SAMPLENAMES
    fi

    for SAMPLE in ${SAMPLES[*]} ; do

    echo "Processing sample ${SAMPLE}"

    cp ${BAMDIRRAW}/${SAMPLE}.bam ${BAMDIR}/${SAMPLE}_ra.bam
    cp ${BAMDIRRAW}/${SAMPLE}.bai ${BAMDIR}/${SAMPLE}_ra.bai
    cp ${BAMDIRRAW}/${SAMPLE}.bam.bai ${BAMDIR}/${SAMPLE}_ra.bai

    done

    02_recalibrate.sh:

    DIR=$1
    BAMDIR=$DIR/bam
    INTERMRES=$DIR/intermRes
    LOGS=$DIR/log

    mkdir -p $BAMDIR $INTERMRES $LOGS

    SAMPLENAMES=$2
    if [ -f $SAMPLENAMES ];
    then
    SAMPLES=$(cat $SAMPLENAMES)
    else
    SAMPLES=$SAMPLENAMES
    fi

    THREADS=$3

    GATK=/opt/GenomeAnalysisTK-3.2.2/GenomeAnalysisTK.jar
    PIC=/opt/picard-tools-1.118
    GENOME=/home/share/Genomes/Homo_sapiens.GRCh37.67/gatk-3.2.2/Homo_sapiens.GRCh37.67.dna.chromosome.all.fasta
    DBSNP=/home/share/Genomes/gatk-bundle-2.8/dbsnp_138.b37.vcf
    KNOWNINDELS1=/home/share/Genomes/gatk-bundle-2.8/Mills_and_1000G_gold_standard.indels.b37.vcf
    KNOWNINDELS2=/home/share/Genomes/gatk-bundle-2.8/1000G_phase1.indels.b37.vcf

    for SAMPLE in ${SAMPLES[*]} ; do

    echo "Processing sample ${SAMPLE}"

    java -Xmx32g -Djava.io.tmpdir=/mnt/fc6tb/tmp -jar $GATK \
    -T BaseRecalibrator \
    -I ${BAMDIR}/${SAMPLE}_ra.bam \
    -R $GENOME \
    --covariate ContextCovariate \
    --covariate CycleCovariate \
    --covariate QualityScoreCovariate \
    --covariate ReadGroupCovariate \
    -knownSites $DBSNP \
    -knownSites $KNOWNINDELS1 \
    -knownSites $KNOWNINDELS2 \
    -nct $THREADS \
    -o ${INTERMRES}/${SAMPLE}_RecalData.csv \
    2>&1 | tee $LOGS/02_BaseRecalibrator.txt

    java -Xmx32g -Djava.io.tmpdir=/mnt/fc6tb/tmp -jar $GATK \
    -T PrintReads \
    -I ${BAMDIR}/${SAMPLE}_ra.bam \
    -R $GENOME \
    -BQSR ${INTERMRES}/${SAMPLE}_RecalData.csv \
    -o ${BAMDIR}/${SAMPLE}.bam \
    2>&1 | tee $LOGS/02_PrintReads.txt

    java -jar $PIC/BuildBamIndex.jar VALIDATION_STRINGENCY="LENIENT"\
    INPUT=${BAMDIR}/${SAMPLE}.bam \
    OUTPUT=${BAMDIR}/${SAMPLE}.bai \
    TMP_DIR=/mnt/fc6tb/tmp \
    2>&1 | tee $LOGS/02_BuildBamIndex.txt

    done

    03_genotyping.sh:

    DIR=$1
    BAMDIR=$DIR/bam
    INTERMRES=$DIR/intermRes
    LOGS=$DIR/log

    mkdir -p $BAMDIR $INTERMRES

    SAMPLENAMES=$2
    if [ -f $SAMPLENAMES ];
    then
    SAMPLES=$(cat $SAMPLENAMES)
    else
    SAMPLES=$SAMPLENAMES
    fi

    TARGETREGION=$3
    THREADS=$4

    GATK=/opt/GenomeAnalysisTK-3.2.2/GenomeAnalysisTK.jar
    GENOME=/home/share/Genomes/Homo_sapiens.GRCh37.67/gatk-3.2.2/Homo_sapiens.GRCh37.67.dna.chromosome.all.fasta

    DBSNP=/home/share/Genomes/gatk-bundle-2.8/dbsnp_138.b37.vcf

    dbSNP-file contains no mutatinos, just known polymorphisms

    DBSNP=/home/share/Genomes/gatk-bundle-2.8/dbsnp_138.b37.excluding_sites_after_129.vcf
    DOWNSAMPLING=1500

    for SAMPLE in ${SAMPLES[*]} ; do
    echo "Processing sample ${SAMPLE}"

    java -Xmx32g -Djava.io.tmpdir=/mnt/fc6tb/tmp -jar $GATK \
    -T HaplotypeCaller \
    -R $GENOME \
    -stand_call_conf 30.0 \
    -stand_emit_conf 10.0 \
    --dbsnp $DBSNP \
    -L $TARGETREGION \
    --max_alternate_alleles 9 \
    --downsample_to_coverage $DOWNSAMPLING \
    -nct $THREADS \
    -I ${BAMDIR}/${SAMPLE}.bam \
    -o ${INTERMRES}/${SAMPLE}.rawMutations.vcf \
    2>&1 | tee $LOGS/03_HaplotypeCaller.txt

    done

    split vcf file for indels and SNPs to filter them seperately in the next step

    for SAMPLE in ${SAMPLES[*]} ; do
    echo "Processing sample ${SAMPLE}"

    java -Xmx2g -jar $GATK \
    -T SelectVariants \
    -R $GENOME \
    -selectType SNP \
    --variant ${INTERMRES}/${SAMPLE}.rawMutations.vcf \
    -o ${INTERMRES}/${SAMPLE}.rawSNPs.vcf \
    2>&1 | tee $LOGS/03_SelectVariantsSNPs.txt

    java -Xmx2g -jar $GATK \
    -T SelectVariants \
    -R $GENOME \
    -selectType INDEL \
    --variant ${INTERMRES}/${SAMPLE}.rawMutations.vcf \
    -o ${INTERMRES}/${SAMPLE}.rawIndels.vcf \
    2>&1 | tee $LOGS/03_SelectVariantsIndels.txt

    done

    The output-file *.rawSNPs.vcf contains a series of zeros, but also a few ones from time to time. So, obviously the SNP is sometimes called and sometimes not.

    By the way, I tried the same with both bam-files that were displayed im my IGV-screenshot and I get the same result in both cases.

    As we restrict our downsampling to 1500 and we've got a noticeable smaller number of reads covering the interval this should not be any problem, should it?!

    Sarah

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    Sarah, you need to isolate the point(s) in your pipeline where you see variability in outputs when you start with the exact same inputs, otherwise we cannot help you. Try running each step independently and checking for differences in outputs, then let us know at which step(s) you find reproducibility issues.

    In addition to that, we'll need you to post actual VCF records. Telling us that "The output-file *.rawSNPs.vcf contains a series of zeros, but also a few ones from time to time." is not helpful. We need to see the call metrics in order to understand what's going on.

  • ssandmannssandmann Münster, GermanyMember

    Reproduceable variability may be seen when I run the Haplotype Caller. I use exactly the same input, but I observe different output.
    I repeated the step 500 times using 8 threads. In 21 out of these 500 cases a SNP is called. In the remaining cases it is not.

    The vcf records for the called SNP do sometimes vary. The predominant record is the following:

    4 106156187 rs17253672 C T 37.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.464;ClippingRankSum=0.506;DB;DP=198;FS=2.057;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.350;QD=0.19;ReadPosRankSum=0.796 GT:AD:DP:GQ:PL 0/1:15,6:21:66:66,0,654

    Yet, I do also observe other records:

    4 106156187 rs17253672 C T 10.20 LowQual AC=1;AF=0.500;AN=2;BaseQRankSum=3.783;ClippingRankSum=0.212;DB;DP=204;FS=8.776;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.151;QD=0.05;ReadPosRankSum=2.174 GT:AD:DP:GQ:PL 0/1:18,7:25:38:38,0,977

    4 106156187 rs17253672 C T 12.05 LowQual AC=1;AF=0.500;AN=2;BaseQRankSum=3.708;ClippingRankSum=0.501;DB;DP=200;FS=1.824;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.301;QD=0.06;ReadPosRankSum=2.445 GT:AD:DP:GQ:PL 0/1:16,7:23:40:40,0,1040

    4 106156187 rs17253672 C T 42.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.708;ClippingRankSum=0.367;DB;DP=188;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.969;QD=0.23;ReadPosRankSum=1.630 GT:AD:DP:GQ:PL 0/1:16,7:23:71:71,0,788

    4 106156187 rs17253672 C T 117.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.592;ClippingRankSum=0.619;DB;DP=191;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=0.289;QD=0.62;ReadPosRankSum=1.197 GT:AD:DP:GQ:PL 0/1:11,8:19:99:146,0,385

    4 106156187 rs17253672 C T 17.84 LowQual AC=1;AF=0.500;AN=2;BaseQRankSum=3.747;ClippingRankSum=1.080;DB;DP=202;FS=2.016;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-1.016;QD=0.09;ReadPosRankSum=3.079 GT:AD:DP:GQ:PL 0/1:17,7:24:46:46,0,940

    4 106156187 rs17253672 C T 62.77 . AC=1;AF=0.500;AN=2;BaseQRankSum=3.566;ClippingRankSum=0.000;DB;DP=203;FS=4.506;MLEAC=1;MLEAF=0.500;MQ=60.00;MQ0=0;MQRankSum=-0.475;QD=0.31;ReadPosRankSum=1.805 GT:AD:DP:GQ:PL 0/1:13,7:20:91:91,0,599

    In the IGV 444 reads are counted at chr4:106156187. As downsampling is set to 1500, I do not think that this should be a problem.

    Please let me know if you need any more information.

Sign In or Register to comment.