variant calling with biological replicates

I am new to use GATK pipeline for SNP calling. I am currently working on four different populations (RNASEq data) with 6 clones. Each clone has 3 biological replicates. How do I combine the variant calling step for the replicates?
I went through the GATK documentation on cohorts and stuff, but I am not sure if I should continue this.
Does combining the resultant vcf file also a possibility?
please help!

Best Answers

Answers

  • tommycarstensentommycarstensen United KingdomMember ✭✭✭

    I am by no means an expert on this type of problem. But it makes more sense to me to merge your reads rather than merging after variant calling. You better wait for the answer from someone else other than a fellow user.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi,

    Can you give us some more background information? What organism are you working with? By clones, do you mean separate colonies (assuming you are working with bacteria)?
    Can you explain how the samples were prepared?

    Thanks,
    Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila:
    Hello Ms.Sheila,
    I work on Daphnia.
    By clones, I meant the genotypes.
    Daphnia from 4 different location were sampled and 6 genotypes from each location were selected using microsatellite markers. For each of the genotype, 3 biological replicates were considered. The cDNA was extracted and sequenced using Illumina. Then, the reads were assembled and mapped.
    Hope this helps!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi,

    Thanks. What exactly is your end goal? Do you want to compare the Daphnia from each location?

    -Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila:
    Hello,
    Yes, I want to compare the daphnids from each location, I am trying to understand the local adaptation

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    We need to understand your experimental design a bit better to answer your question.

    1. Regarding the clones / genotypes that you are using as biological replicates, do you expect them to be genetically identical? Or just very closely related?

    2. In your downstream analysis,what exactly do you want to compare? How do you plan to set up the analysis?

  • suda_ravindransuda_ravindran HamburgMember

    @Geraldine_VdAuwera

    1) Daphnia have the capabitiy to produce clones of themselves. Hence, our samples are genetically identical biological replicates.

    So for example:
    Lets say we have for populations: A,B,C and D.
    Each of the populations have 6 genotypes labelled from 1-6. (eg: A1,A2,A3,A4,A5,A6 and so on..)
    Each genotype has 3 biological replicates labelled as a,b,c. (eg: A1a, A1b, A1c, A2a, A2b, A2c and so on..)

    So A1a, A1b, A1c are gentically identical. Similarly for all other genotypes.

    2) With the read count data, we have already done a differential gene expression analysis and have a list of genes that are differentially expressed for each of the population (example: We have 100,150,125,145 genes that are differentially expressed in Population A,B,C and D respectively). Now we want to study the sequence level variation among the four populations. We would like to annotate them understand their function. Based on this functional interpretation, we would carry forward our study.

    Hope this helps!

    Suda

    Issue · Github
    by Sheila

    Issue Number
    94
    State
    closed
    Last Updated
    Assignee
    Array
    Milestone
    Array
    Closed By
    chandrans
  • suda_ravindransuda_ravindran HamburgMember

    Hi @Sheila :smile:

    Thank you very much!
    Suda

  • suda_ravindransuda_ravindran HamburgMember
    edited August 2015

    @Sheila
    Hello,
    After using the same SM tag in the AddorReplaceReadGroups step, I tried running the HaplotypeCaller using both methods individually viz.,
    1) producing a combined per-sample bam file where I combined them at the indel realignment step.
    2) I kept the bam files separate until variant calling.

    After variant calling, when I try to do the Variant filtration, I started getting errors like:
    Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    Interpreter - ![0,2]: 'FS > 30.0;' undefined variable FS

    I checked the vcf file for multiple-allelic sites as mentioned http://gatkforums.broadinstitute.org/discussion/2334/undefined-variable-variantfiltration

    I am not sure what is happening here in the REF field (find attached jpeg file):

    Please help!!

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    Can you post some actual records from the VCF? I cannot see many annotations in the image you posted. Also, which version of GATK are you using and what was the exact command you ran for VariantFiltration?

    Thanks,
    Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila

    Hi Sheila,

    I am using the latest version of GATK : -3.4-46

    The command I used for running the Variant Filtration is:

    java -jar GenomeAnalysisTK.jar -T VariantFiltration -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G111_new.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o output_files/indel_realignment/G/G111_filt_again_2.vcf

    Please find attached the same regions posted in the screenshot as a text file.

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila

    Hi Sheila,
    Please find attached the same regions posted in the screenshot as a text file.
    Suda

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    I think you did not attach the image.

    -Sheila

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Sorry I spoke to soon! I see the .txt file :smile: Thanks.

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    This is odd. The VCF looks fine. Can you check and make sure QD and FS are defined in the VCF header? Also, please try running ValidateVariants? https://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_gatk_tools_walkers_variantutils_ValidateVariants.php
    I tried running the same command you ran on some data I have, and it ran perfectly.

    -Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila:

    Hi Sheila,

    This is my header from the VCF file:

    fileformat=VCFv4.1

    FILTER=<ID=LowQual,Description="Low quality">

    FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">

    FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">

    FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">

    FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">

    FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">

    GATKCommandLine.HaplotypeCaller=<ID=HaplotypeCaller,Version=3.4-0-g7e26428,Date="Wed Aug 26 11:04:51 CEST 2015",Epoch=1440579891180,CommandLineOptions="analysis_type=HaplotypeCaller input_file=[output_files/indel_realignment/G/G111_reads.bam] showFullBamList=false read_buffer_size=null phone_home=AWS gatk_key=null tag=NA read_filter=[] disable_read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL interval_padding=0 reference_sequence=input_files/reference/allaugust.okay.tr.fasta nonDeterministicRandomSeed=false disableDithering=false maxRuntime=-1 maxRuntimeUnits=MINUTES downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=500 baq=OFF baqGapOpenPenalty=40.0 refactor_NDN_cigar_string=false fix_misencoded_quality_scores=false allow_potentially_misencoded_quality_scores=false useOriginalQualities=false defaultBaseQualities=-1 performanceLog=null BQSR=null quantize_quals=0 disable_indel_quals=false emit_original_quals=false preserve_qscores_less_than=6 globalQScorePrior=-1.0 validation_strictness=SILENT remove_program_records=false keep_program_records=false sample_rename_mapping_file=null unsafe=null disable_auto_index_creation_and_locking_when_reading_rods=false no_cmdline_in_header=false sites_only=false never_trim_vcf_format_field=false bcf=false bam_compression=null simplifyBAM=false disable_bam_indexing=false generate_md5=false num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 monitorThreadEfficiency=false num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false generateShadowBCF=false variant_index_type=DYNAMIC_SEEK variant_index_parameter=-1 logging_level=INFO log_to_file=null help=false version=false out=/media/mathilde/daten2/Suda/SNP_Calling/GATK/output_files/indel_realignment/G/G111.vcf likelihoodCalculationEngine=PairHMM heterogeneousKmerSizeResolution=COMBO_MIN dbsnp=(RodBinding name= source=UNBOUND) dontTrimActiveRegions=false maxDiscARExtension=25 maxGGAARExtension=300 paddingAroundIndels=150 paddingAroundSNPs=20 comp=[] annotation=[ClippingRankSumTest, DepthPerSampleHC] excludeAnnotation=[] debug=false useFilteredReadsForAnnotations=false emitRefConfidence=NONE bamOutput=null bamWriterType=CALLED_HAPLOTYPES disableOptimizations=false annotateNDA=false heterozygosity=0.001 indel_heterozygosity=1.25E-4 standard_min_confidence_threshold_for_calling=20.0 standard_min_confidence_threshold_for_emitting=20.0 max_alternate_alleles=6 input_prior=[] sample_ploidy=2 genotyping_mode=DISCOVERY alleles=(RodBinding name= source=UNBOUND) contamination_fraction_to_filter=0.0 contamination_fraction_per_sample_file=null p_nonref_model=null exactcallslog=null output_mode=EMIT_VARIANTS_ONLY allSitePLs=false gcpHMM=10 pair_hmm_implementation=VECTOR_LOGLESS_CACHING pair_hmm_sub_implementation=ENABLE_ALL always_load_vector_logless_PairHMM_lib=false phredScaledGlobalReadMismappingRate=45 noFpga=false sample_name=null kmerSize=[10, 25] dontIncreaseKmerSizesForCycles=false allowNonUniqueKmersInRef=false numPruningSamples=1 recoverDanglingHeads=false doNotRecoverDanglingBranches=false minDanglingBranchLength=4 consensus=false maxNumHaplotypesInPopulation=128 errorCorrectKmers=false minPruning=2 debugGraphTransformations=false allowCyclesInKmerGraphToGeneratePaths=false graphOutput=null kmerLengthForReadErrorCorrection=25 minObservationsForKmerToBeSolid=20 GVCFGQBands=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 70, 80, 90, 99] indelSizeToEliminateInRefModel=10 min_base_quality_score=10 includeUmappedReads=false useAllelesTrigger=false doNotRunPhysicalPhasing=true keepRG=null justDetermineActiveRegions=false dontGenotype=false dontUseSoftClippedBases=true captureAssemblyFailureBAM=false errorCorrectReads=false pcr_indel_model=CONSERVATIVE maxReadsInRegionPerSample=10000 minReadsPerAlignmentStart=10 mergeVariantsViaLD=false activityProfileOut=null activeRegionOut=null activeRegionIn=null activeRegionExtension=null forceActive=false activeRegionMaxSize=null bandPassSigma=null maxProbPropagationDistance=50 activeProbabilityThreshold=0.002 min_mapping_quality_score=20 filter_reads_with_N_cigar=false filter_mismatching_base_and_quals=false filter_bases_not_stored=false">

    INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">

    INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">

    INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">

    INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">

    INFO=<ID=ClippingRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref number of hard clipped bases">

    INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">

    INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">

    INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">

    INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">

    INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">

    INFO=<ID=MLEAC,Number=A,Type=Integer,Description="Maximum likelihood expectation (MLE) for the allele counts (not necessarily the same as the AC), for each ALT allele, in the same order as listed">

    INFO=<ID=MLEAF,Number=A,Type=Float,Description="Maximum likelihood expectation (MLE) for the allele frequency (not necessarily the same as the AF), for each ALT allele, in the same order as listed">

    INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">

    INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">

    INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">

    INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">

    INFO=<ID=SOR,Number=1,Type=Float,Description="Symmetric Odds Ratio of 2x2 contingency table to detect strand bias">

    This is what I obtained after running the ValidateVariants step:

    java -jar GenomeAnalysisTK.jar -T ValidateVariants -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G111.vcf

    INFO 16:41:52,908 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 16:41:52,910 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 16:41:52,911 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 16:41:52,911 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 16:41:52,915 HelpFormatter - Program Args: -T ValidateVariants -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G111.vcf
    INFO 16:41:52,921 HelpFormatter - Executing as [email protected] on Linux 3.13.0-61-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14.
    INFO 16:41:52,921 HelpFormatter - Date/Time: 2015/08/28 16:41:52
    INFO 16:41:52,921 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 16:41:52,921 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 16:41:52,984 GenomeAnalysisEngine - Strictness is SILENT
    INFO 16:41:53,852 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 16:42:01,746 GenomeAnalysisEngine - Preparing for traversal
    INFO 16:42:01,798 GenomeAnalysisEngine - Done preparing for traversal
    INFO 16:42:01,799 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 16:42:01,799 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 16:42:01,800 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    INFO 16:42:03,198 ValidateVariants - Reference allele is too long (146) at position abyss10353:109; skipping that record.
    INFO 16:42:03,683 ValidateVariants - Reference allele is too long (118) at position oasesvelvLoc2997d15643t2:151; skipping that record.
    INFO 16:42:04,896 ValidateVariants - Reference allele is too long (129) at position trinitytrinloc16123c0t2:23; skipping that record.
    INFO 16:42:04,982 ValidateVariants - Reference allele is too long (103) at position trinitytrinloc20144c2t5:2427; skipping that record.
    Successfully validated the input file. Checked 150907 records with no failures.
    INFO 16:42:05,561 ProgressMeter - done 168838.0 3.0 s 22.0 s 100.0% 3.0 s 0.0 s
    INFO 16:42:05,562 ProgressMeter - Total runtime 3.76 secs, 0.06 min, 0.00 hours
    INFO 16:42:07,083 GATKRunReport - Uploaded run statistics report to AWS S3

    Suda

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila:

    Just to make sure, I have done things right:

    1) In the AddorReplaceReadGroups step, I had the same sample name and different library name for each of my replicates (as you have mentioned previously)
    2) I then Sorted, indexed the bam files, then checked for duplicates and also ran the SplitNCigarReads step using the default paramters.
    3) Then, I ran the Indel Relaignment step.
    4) I went to VariantCalling step after this as I do not have snp db for the BQSR step using the command:

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R input_files/reference/allaugust.okay.tr.fasta -I output_files/indel_realignment/G/G111_reads.bam -dontUseSoftClippedBases -stand_call_conf 20.0 -stand_emit_conf 20.0 -o output_files/indel_realignment/G/G111_new.vcf -log output_files/indel_realignment/G/G111_report.log

    After this I did the VariantFiltration step as mentioned previously.

    Is this right?

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Thanks Suda. Everything looks fine. Can you try removing those sites where the reference alleles are too long and try running VariantFiltration again? I may need you to submit a bug report.

    Thanks,
    Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila
    Hi Sheila,

    I removed those variants that were troubling and rand the VariantFiltration step:

    This is what I got:

    java -jar GenomeAnalysisTK.jar -T VariantFiltration -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G111_removed.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o output_files/indel_realignment/G/G111_new_filt_removed.vcf

    INFO 17:12:34,468 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:12:34,471 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 17:12:34,471 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 17:12:34,471 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 17:12:34,475 HelpFormatter - Program Args: -T VariantFiltration -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G111_removed.vcf -window 35 -cluster 3 -filterName FS -filter FS > 30.0 -filterName QD -filter QD < 2.0 -o output_files/indel_realignment/G/G111_new_filt_removed.vcf
    INFO 17:12:34,481 HelpFormatter - Executing as [email protected] on Linux 3.13.0-61-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14.
    INFO 17:12:34,481 HelpFormatter - Date/Time: 2015/08/28 17:12:34
    INFO 17:12:34,482 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:12:34,482 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 17:12:34,562 GenomeAnalysisEngine - Strictness is SILENT
    INFO 17:12:35,392 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 17:12:43,986 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NumberFormatException: For input string: "280,77"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
    at java.lang.Double.valueOf(Double.java:504)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseQual(AbstractVCFCodec.java:511)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:317)
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:60)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:41)
    at htsjdk.tribble.AbstractFeatureCodec.decodeLoc(AbstractFeatureCodec.java:40)
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.readNextFeature(IndexFactory.java:502)
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.(IndexFactory.java:403)
    at htsjdk.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:312)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:401)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:287)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:224)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:147)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.(ReferenceOrderedDataSource.java:208)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.(ReferenceOrderedDataSource.java:88)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1047)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:828)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: For input string: "280,77"
    ERROR ------------------------------------------------------------------------------------------
  • suda_ravindransuda_ravindran HamburgMember

    @Sheila

    Hi Sheila,
    Any updates on how I could proceed on this?
    Thanks,
    Suda

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    It looks like there is a 280,77 that is not defined somewhere in your VCF. The format looks like an AD field. Can you check if for a 280,77 in your VCF? Please post the record here. How did you remove the offending records?

    Thanks,
    Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila

    Hi Shiela,

    I am not sure what 280,77 means.

    I removed the offending records manually (just deleted the variants from the file) and created the new file.

    Suda

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    Can you please try running ValidateVariants on your VCF without the offending records? If that runs with no errors, can you submit a bug report. Instructions are here: http://gatkforums.broadinstitute.org/discussion/1894/how-do-i-submit-a-detailed-bug-report

    Thanks,
    Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila

    Hi Sheila,

    Thanks, I just ran the ValidateVaraints and obtained the same error message: " For input string: 280,77.

    The command that I used:
    java -jar GenomeAnalysisTK.jar -T ValidateVariants -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G111_removed.vcf

    INFO 15:46:37,573 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:46:37,575 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 15:46:37,575 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 15:46:37,575 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 15:46:37,578 HelpFormatter - Program Args: -T ValidateVariants -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G111_removed.vcf
    INFO 15:46:37,583 HelpFormatter - Executing as [email protected] on Linux 3.13.0-61-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14.
    INFO 15:46:37,584 HelpFormatter - Date/Time: 2015/08/31 15:46:37
    INFO 15:46:37,584 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:46:37,584 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:46:37,635 GenomeAnalysisEngine - Strictness is SILENT
    INFO 15:46:38,336 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 15:46:46,936 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.NumberFormatException: For input string: "280,77"
    at sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
    at java.lang.Double.valueOf(Double.java:504)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseQual(AbstractVCFCodec.java:511)
    at htsjdk.variant.vcf.AbstractVCFCodec.parseVCFLine(AbstractVCFCodec.java:317)
    at htsjdk.variant.vcf.AbstractVCFCodec.decodeLine(AbstractVCFCodec.java:279)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:257)
    at htsjdk.variant.vcf.AbstractVCFCodec.decode(AbstractVCFCodec.java:60)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:79)
    at htsjdk.tribble.AsciiFeatureCodec.decode(AsciiFeatureCodec.java:41)
    at htsjdk.tribble.AbstractFeatureCodec.decodeLoc(AbstractFeatureCodec.java:40)
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.readNextFeature(IndexFactory.java:502)
    at htsjdk.tribble.index.IndexFactory$FeatureIterator.(IndexFactory.java:403)
    at htsjdk.tribble.index.IndexFactory.createDynamicIndex(IndexFactory.java:312)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createIndexInMemory(RMDTrackBuilder.java:401)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.loadIndex(RMDTrackBuilder.java:287)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.getFeatureSource(RMDTrackBuilder.java:224)
    at org.broadinstitute.gatk.utils.refdata.tracks.RMDTrackBuilder.createInstanceOfTrack(RMDTrackBuilder.java:147)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedQueryDataPool.(ReferenceOrderedDataSource.java:208)
    at org.broadinstitute.gatk.engine.datasources.rmd.ReferenceOrderedDataSource.(ReferenceOrderedDataSource.java:88)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.getReferenceOrderedDataSources(GenomeAnalysisEngine.java:1047)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.initializeDataSources(GenomeAnalysisEngine.java:828)
    at org.broadinstitute.gatk.engine.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:286)
    at org.broadinstitute.gatk.engine.CommandLineExecutable.execute(CommandLineExecutable.java:121)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:248)
    at org.broadinstitute.gatk.utils.commandline.CommandLineProgram.start(CommandLineProgram.java:155)
    at org.broadinstitute.gatk.engine.CommandLineGATK.main(CommandLineGATK.java:106)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 3.4-46-gbc02625):
    ERROR
    ERROR This might be a bug. Please check the documentation guide to see if this is a known problem.
    ERROR If not, please post the error message, with stack trace, to the GATK forum.
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: For input string: "280,77"
    ERROR ------------------------------------------------------------------------------------------

    Do I still submit a bug report?

    Thanks, Suda

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    Okay somehow the VCF must have gotten corrupted when you removed the sites. Can you try using -XL on the sites rather than removing them? https://www.broadinstitute.org/gatk/guide/tooldocs/org_broadinstitute_gatk_engine_CommandLineGATK.php#--excludeIntervals

    -Sheila

  • suda_ravindransuda_ravindran HamburgMember

    ****@Sheila
    Hi Sheila,
    I used the -XL option to remove those sites which had long REF position. I still keep getting this warning message:

    My command was: ****
    **java -jar GenomeAnalysisTK.jar -T VariantFiltration -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G112.vcf -window 35 -cluster 3 -filterName FS -filter "FS > 30.0" -filterName QD -filter "QD < 2.0" -o output_files/indel_realignment/G/G112_filt.vcf -XL abyssk40_j_668347:698 -XL abyssk64_j_517734:263 -XL abyssk76_j_468075:250 -XL abyss1756:1344 -XL abyss2055:1638 -XL abyss12232:82 -XL oasesvelvLoc3048t2:1925 -XL oasesvelvLoc3048t2:2170 -XL oasesvelvLoc4278t1:324 -XL oasesvelvLoc4t8884:3534 -XL oasesvelvLoc6t676:53 -XL oasesvelvLoc4723d10616t1:637 -XL oasesvelvLoc5598d10760t1:60 -XL oasesvelvLoc359t2:370 -XL oasesvelvLoc1228t2:927 -XL oasesvelvLoc3068t3:448 -XL oasesvelvLoc11581t1:146 -XL oasesvelvLoc12128t3:1376 -XL oasesvelvLoc1026d26235t1:3695 -XL oasesvelvLoc2282d27162t3:334 -XL oasesvelvLoc5370d28990t1:817 -XL oasesvelvLoc16759t1:2457 -XL oasesvelvLoc21253t2:237 -XL oasesvelvLoc23883t1:467 -XL oasesvelvLoc886d36038t3:4137 -XL oasesvelvLoc894d36043t1:74 -XL oasesvelvLoc1408d36364t2:2167 -XL oasesvelvLoc12188d39561t1:189 -XL oasesvelvLoc13807d47288t1:444 -XL soapsoap196131:106 -XL soapsoapd3723321453:85 -XL soapsoap362703:372 -XL soapsoap385976:970 -XL soapsoapd20450231149:98 -XL soapsoapd25964281441:72 -XL soapsoap385691:67 -XL soapsoap386007:2010 -XL soapsoap425053:1616 -XL trinitytrinloc10376c0t4:1143 -XL trinitytrinloc12960c0t2:1952 -XL trinitytrinloc18203c0t14:2436 -XL trinitytrinloc18717c0t1:1105 -XL trinitytrinloc19230c0t1:261 -XL trinitytrinloc19230c0t1:319 -XL trinitytrinloc19862c0t4:762 -XL trinitytrinloc20144c2t5:2427 -XL trinitytrinloc20146c0t4:2046 -XL trinitytrinloc21018c0t1:1235 -XL trinitytrinloc21314c0t2:698 -XL trinitytrinloc21409c1t3:1096 -XL trinitytrinloc22065c0t1:2308 -XL trinitytrinloc22067c0t1:900 -XL trinitytrinloc22331c0t1:444 -XL trinitytrinloc22510c0t13:1465 -XL trinitytrinloc22867c0t1:2162 -XL trinitytrinloc23124c0t14:1009 -XL trinitytrinloc23314c0t5:774 -XL trinitytrinloc23357c0t4:101 -XL trinitytrinloc23597c0t1:1431 -XL trinitytrinloc23632c0t11:171 -XL trinitytrinloc23694c0t1:394 -XL trinitytrinloc23969c1t1:1909 -XL trinitytrinloc23972c1t41:2308 -XL trinitytrinloc24346c0t3:2962 -XL trinitytrinloc24511c0t1:367 -XL trinitytrinloc24791c0t2:2249 -XL trinitytrinloc25114c0t38:3566 -XL trinitytrinloc25199c0t1:1077 -XL trinitytrinloc25306c1t3:2396 -XL trinitytrinloc25550c0t2:2283 -XL trinitytrinloc25716c1t13:1527 -XL trinitytrinloc25721c1t2:1454
    **

    INFO 15:39:57,354 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:39:57,356 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-46-gbc02625, Compiled 2015/07/09 17:38:12
    INFO 15:39:57,356 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 15:39:57,357 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 15:39:57,360 HelpFormatter - Program Args: -T VariantFiltration -R input_files/reference/allaugust.okay.tr.fasta -V output_files/indel_realignment/G/G112.vcf -window 35 -cluster 3 -filterName FS -filter FS > 30.0 -filterName QD -filter QD < 2.0 -o output_files/indel_realignment/G/G112_filt.vcf -XL abyssk40_j_668347:698 -XL abyssk64_j_517734:263 -XL abyssk76_j_468075:250 -XL abyss1756:1344 -XL abyss2055:1638 -XL abyss12232:82 -XL oasesvelvLoc3048t2:1925 -XL oasesvelvLoc3048t2:2170 -XL oasesvelvLoc4278t1:324 -XL oasesvelvLoc4t8884:3534 -XL oasesvelvLoc6t676:53 -XL oasesvelvLoc4723d10616t1:637 -XL oasesvelvLoc5598d10760t1:60 -XL oasesvelvLoc359t2:370 -XL oasesvelvLoc1228t2:927 -XL oasesvelvLoc3068t3:448 -XL oasesvelvLoc11581t1:146 -XL oasesvelvLoc12128t3:1376 -XL oasesvelvLoc1026d26235t1:3695 -XL oasesvelvLoc2282d27162t3:334 -XL oasesvelvLoc5370d28990t1:817 -XL oasesvelvLoc16759t1:2457 -XL oasesvelvLoc21253t2:237 -XL oasesvelvLoc23883t1:467 -XL oasesvelvLoc886d36038t3:4137 -XL oasesvelvLoc894d36043t1:74 -XL oasesvelvLoc1408d36364t2:2167 -XL oasesvelvLoc12188d39561t1:189 -XL oasesvelvLoc13807d47288t1:444 -XL soapsoap196131:106 -XL soapsoapd3723321453:85 -XL soapsoap362703:372 -XL soapsoap385976:970 -XL soapsoapd20450231149:98 -XL soapsoapd25964281441:72 -XL soapsoap385691:67 -XL soapsoap386007:2010 -XL soapsoap425053:1616 -XL trinitytrinloc10376c0t4:1143 -XL trinitytrinloc12960c0t2:1952 -XL trinitytrinloc18203c0t14:2436 -XL trinitytrinloc18717c0t1:1105 -XL trinitytrinloc19230c0t1:261 -XL trinitytrinloc19230c0t1:319 -XL trinitytrinloc19862c0t4:762 -XL trinitytrinloc20144c2t5:2427 -XL trinitytrinloc20146c0t4:2046 -XL trinitytrinloc21018c0t1:1235 -XL trinitytrinloc21314c0t2:698 -XL trinitytrinloc21409c1t3:1096 -XL trinitytrinloc22065c0t1:2308 -XL trinitytrinloc22067c0t1:900 -XL trinitytrinloc22331c0t1:444 -XL trinitytrinloc22510c0t13:1465 -XL trinitytrinloc22867c0t1:2162 -XL trinitytrinloc23124c0t14:1009 -XL trinitytrinloc23314c0t5:774 -XL trinitytrinloc23357c0t4:101 -XL trinitytrinloc23597c0t1:1431 -XL trinitytrinloc23632c0t11:171 -XL trinitytrinloc23694c0t1:394 -XL trinitytrinloc23969c1t1:1909 -XL trinitytrinloc23972c1t41:2308 -XL trinitytrinloc24346c0t3:2962 -XL trinitytrinloc24511c0t1:367 -XL trinitytrinloc24791c0t2:2249 -XL trinitytrinloc25114c0t38:3566 -XL trinitytrinloc25199c0t1:1077 -XL trinitytrinloc25306c1t3:2396 -XL trinitytrinloc25550c0t2:2283 -XL trinitytrinloc25716c1t13:1527 -XL trinitytrinloc25721c1t2:1454
    INFO 15:39:57,366 HelpFormatter - Executing as [email protected] on Linux 3.13.0-61-generic amd64; OpenJDK 64-Bit Server VM 1.7.0_79-b14.
    INFO 15:39:57,367 HelpFormatter - Date/Time: 2015/09/01 15:39:57
    INFO 15:39:57,367 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:39:57,367 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 15:39:57,445 GenomeAnalysisEngine - Strictness is SILENT
    INFO 15:39:58,286 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO 15:40:08,122 IntervalUtils - Initial include intervals span 39118833 loci; exclude intervals span 72 loci
    INFO 15:40:08,125 IntervalUtils - Excluding 72 loci from original intervals (0.00% reduction)
    INFO 15:40:08,126 IntervalUtils - Processing 39118761 bp from intervals
    INFO 15:40:08,467 GenomeAnalysisEngine - Preparing for traversal
    INFO 15:40:08,490 GenomeAnalysisEngine - Done preparing for traversal
    INFO 15:40:08,491 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 15:40:08,491 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 15:40:08,492 ProgressMeter - Location | sites | elapsed | sites | completed | runtime | runtime
    WARN 15:40:20,665 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    WARN 15:40:21,055 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    WARN 15:40:21,056 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    WARN 15:40:21,127 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    WARN 15:40:21,353 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    WARN 15:40:21,356 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    WARN 15:40:21,398 Interpreter - ![0,2]: 'QD < 2.0;' undefined variable QD
    INFO 15:40:22,542 ProgressMeter - done 156790.0 14.0 s 89.0 s 100.0% 14.0 s 0.0 s
    INFO 15:40:22,543 ProgressMeter - Total runtime 14.05 secs, 0.23 min, 0.00 hours
    INFO 15:40:23,874 GATKRunReport - Uploaded run statistics report to AWS S3

    Thanks,
    Suda

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    Oh. So you are getting an output VCF with no error message? You just get a WARN message? That is no reason to be concerned. The WARN messages may just be telling you that some of the sites in your file do not have the QD annotation or some of the sites may be homozygous reference. Have a look at this thread for more information: http://gatkforums.broadinstitute.org/discussion/2334/undefined-variable-variantfiltration

    -Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila

    Hi Sheila,

    Now I get only a warning message. I did have a look at the link you have mentioned. My question is:

    Why does GATK provide such long stretch of nucleotides in the Reference field (Before using the -XL option) ? Sequences of upto say (10bp) could be considered as indels. But can indels also be 100bp long as shown below? I don't understand.

    abyssk76_j_468075 250 . AAGGTTGTATATTTATTATACATTCAACATGAGTTTAGATTTGTATGAAGCTACTGTAACACCAATTTCCCTTTTATT A 112,73 . AC=1;AF=0.500;AN=2;BaseQRankSum=1.203;ClippingRankSum=-0.248;DP=27;FS=3.332;MLEAC=1;MLEAF=0.500;MQ=37.27;MQRankSum=-2.054;QD=4.18;ReadPosRankSum=-0.727;SOR=2.833 GT:AD:DP:GQ:PL 0/1:6,3:9:99:150,0,1995
    abyss1756 1344 . ACTGGCGAACACAAAACAAAGTTAAATTGAATTTGACAATATTTTTAAAAGTTATTAC A 631,73 . AC=1;AF=0.500;AN=2;BaseQRankSum=-1.745;ClippingRankSum=1.317;DP=40;FS=4.416;MLEAC=1;MLEAF=0.500;MQ=34.21;MQRankSum=-3.950;QD=15.79;ReadPosRankSum=-0.047;SOR=1.863 GT:AD:DP:GQ:PL 0/1:8,16:24:99:669,0,342
    abyss10353 109 . ATGTGGATTAGTCATGATGCCGGCAAGTTATTCAATAACACATCGAGGCAGTTTGTTTGTTTTCCAACTTTATTTAGCTTCTCCAGGTGGTATTTTTTTGGTAGGTAAATATTAACCTCGGCAGCATCGCCCGCCATTGCAAGTCT A 21,76 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.049;ClippingRankSum=0.832;DP=46;FS=1.768;MLEAC=1;MLEAF=0.500;MQ=26.68;MQRankSum=3.441;QD=0.47;ReadPosRankSum=-3.593;SOR=1.292 GT:AD:DP:GQ:PL 0/1:6,40:46:59:59,0,127
    oasesvelvLoc359t2 370 . AACCCTTTGAACGTTAGGTCTGATCATGAAGGTGTGAAATGTAATCAAATAATAATAATTATTACACAAAATGTCTTGCGTTTT A 2034,84 . AC=1;AF=0.500;AN=2;BaseQRankSum=-0.413;ClippingRankSum=1.310;DP=83;FS=1.788;MLEAC=1;MLEAF=0.500;MQ=35.98;MQRankSum=1.451;QD=24.52;ReadPosRankSum=-0.089;SOR=1.463 GT:AD:DP:GQ:PL 0/1:6,77:83:16:2072,0,16
    oasesvelvLoc11581t1 146 . AGTCATGATGGATCCAAGAGAAGAAGAAGGAGGTGTTGAT A 265,73 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.067;ClippingRankSum=0.762;DP=46;FS=6.532;MLEAC=1;MLEAF=0.500;MQ=38.01;MQRankSum=-0.762;QD=5.78;ReadPosRankSum=-0.747;SOR=4.174 GT:AD:DP:GQ:PL 0/1:2,7:9:49:303,0,49
    oasesvelvLoc23883t1 467 . ACCTAAGGTATTTTTCAAAATTTGTAATTCTTTTAACTGCGACTTGGTAAAATTGCATGTCAGAAAAATATGTTTGACACTTG A 1157,73 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.635;ClippingRankSum=-0.746;DP=52;FS=1.386;MLEAC=1;MLEAF=0.500;MQ=35.80;MQRankSum=-3.961;QD=22.26;ReadPosRankSum=-0.125;SOR=0.997 GT:AD:DP:GQ:PL 0/1:10,28:38:99:1195,0,2877
    oasesvelvLoc3975d37489t1 14 . AGTAAGGTTGAATTCAGGGGTATTATTTGTTCCGTTTCCGCCTTTTTTCAGCTGCCTGTCTTTAACTTTAATAAG A 48,73 . AC=1;AF=0.500;AN=2;BaseQRankSum=2.510;ClippingRankSum=-0.271;DP=45;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=37.00;MQRankSum=-2.329;QD=1.08;ReadPosRankSum=-2.899;SOR=0.223 GT:AD:DP:GQ:PL 0/1:40,5:45:86:86,0,1656
    soapsoap196131 106 . ATCTTCTTCATCCTCGTCTTCTTCATCCGCATCTACTTCG A 65,74 . AC=1;AF=0.500;AN=2;BaseQRankSum=0.727;ClippingRankSum=-0.727;DP=10;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=28.47;MQRankSum=0.727;QD=6.57;ReadPosRankSum=0.727;SOR=1.609 GT:AD:DP:GQ:PL 0/1:1,3:4:29:103,0,29

    Thanks,
    Suda

  • SheilaSheila Broad InstituteMember, Broadie, Moderator admin

    @suda_ravindran
    Hi Suda,

    Yes, it is possible for Haplotype Caller to call such a long deletion. You will have to check your bam and bamout files for more information about why exactly they were called.

    -Sheila

  • suda_ravindransuda_ravindran HamburgMember

    @Sheila

    Hi Sheila,

    Thanks. I will look into it

    Suda

  • JulsJuls Member
    edited April 2017

    @Sheila

    Dear Sheila,

    I have a followup question here.
    Wouldn't the simplest way forward to treat it as one sample from the beginning if possible (as long as the coverage does not slow everything down) - with the beginning I mean read preprocessing and bwa mapping - followed by GATK preprocessing and finally calling. So could I just treat all runs for a sample even if they come from different biological replicates as one sample from the start or is there something I did not consider?

    Thank you for your help!
    Julia

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    @Juls For performance reasons (like speed and memory requirements) it is better to do the early processing steps separately.

  • JulsJuls Member

    @Geraldine_VdAuwera
    Thank you very much! I have very very low coverage data so I don't think it would change much performance-wise.
    So it would be fine to throw everything together from the start if I don't mind the slight drop in performance?
    Thanks again!

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin

    If the coverage is very low then that's reasonable.

Sign In or Register to comment.