ReduceReads across multiple samples

annat Posts: 7 Member

I read in your recent slides on "Data Compression with Reduce Reads" that "Tumor and Normal samples (or any set of samples) get co-reduced, meaning that every variable region triggered by one sample will be forced in every sample."

I have data from 4 variant strains of an organism (recorded as the samples in the RG info) and 4 individuals for each strain (recorded as the libraries in the RG info). Currently I have a BAM file for each of the 16 libraries.

My coverage is quite high, so I would like to run ReduceReads. However, I want to preserve information across all of my samples at any site that is non-consensus in just one of them, because no SNP information is available for this organism and I don't want to lose any important data. Should I merge the BAM files for all samples before running ReduceReads with downsampling turned off? Or should I leave out ReduceReads altogether?

Thanks
Anna

Answers

  • annat Posts: 7 Member

    Hi Mauricio,
    Thanks for your quick reply. I would ideally like to keep the BAM files for the 16 libraries separate, but I'm afraid I'm not having much luck with the -nw option. Can you let me know where I am going wrong?

    1a) The input file contains tab-separated pairs of input and output BAM file full paths, one pair per line, and has the extension .map
    java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads
    -I bamForReduceReads.map -R data/abH97.fa -nw

    Error:

    Invalid command line: The GATK reads argument (-I, --input_file) supports only BAM files with the .bam
    extension and lists of BAM files with the .list extension, but the file bamForReduceReads.map has neither
    extension. Please ensure that your BAM file or list of BAM files is in the correct format, update the
    extension, and try again.

    1b) As above, but with the input file renamed to have the .list extension
    java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads
    -I bamForReduceReads.list -R data/abH97.fa -nw

    Error:

    Couldn't read file 1.realigned.bam 1.realigned.reduced.bam because java.io.FileNotFoundException:
    1.realigned.bam 1.realigned.reduced.bam (No such file or directory)

    2a) The input file contains the full paths to the BAM files, one per line; the output is a single BAM file
    java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads
    -I bamForReduceReads.list -R data/abH97.fa --nwayout -o reduced.bam

    Error:

    The same sample appears in multiple files, this input cannot be multiplexed using the BySampleSAMFileWriter,
    try NWaySAMFileWriter instead.

    2b) As above, but specifying nwayout differently, the way it is written in the IndelRealigner documentation
    java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads
    -I bamForReduceReads.list -R data/abH97.fa --nWayOut -o reduced.bam

    Error:

    Argument with name 'nWayOut' isn't defined.

    3) The input file contains the full paths to the BAM files, one per line. The output file contains the full paths to the output BAM files, one per line
    java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads
    -I bamForReduceReads.list -R data/abH97.fa -nw -o outBamForReduceReads.list

    Error:
    The same sample appears in multiple files, this input cannot be multiplexed using the BySampleSAMFileWriter,
    try NWaySAMFileWriter instead.

    Could you please let me know where I am going wrong? Has this functionality been added in v2.3-6?
    Thanks
    Anna
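
The errors in attempts 1a and 1b suggest that -I only accepts a .bam file or a .list file, and that each line of a .list file is read as a single BAM path. As a minimal sketch under that assumption, the Python below extracts the input column from a two-column map and writes it out one path per line. The function names `map_to_list` and `write_list_file` and the example file names are hypothetical, and this does not address how the output files are named.

```python
from pathlib import Path


def map_to_list(map_text: str) -> list[str]:
    """Return the input BAM path (first column) from each non-empty line."""
    inputs = []
    for line in map_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumed layout: input path, whitespace, output path.
        # Paths containing spaces would need a stricter split on tabs.
        inputs.append(line.split()[0])
    return inputs


def write_list_file(paths: list[str], list_path: str) -> None:
    """Write one BAM path per line; the file name should end in .list."""
    Path(list_path).write_text("\n".join(paths) + "\n")


example_map = (
    "1.realigned.bam\t1.realigned.reduced.bam\n"
    "2.realigned.bam\t2.realigned.reduced.bam\n"
)
bam_inputs = map_to_list(example_map)
print(bam_inputs)
```

Calling `write_list_file(bam_inputs, "bamForReduceReads.list")` would then produce a file with one input path per line.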

  • Geraldine_VdAuwera Posts: 5,821 Administrator, GATK Developer

    Hi Anna,

    The key info here is the error message telling you

    The same sample appears in multiple files

    If you want to use -nwayout, the sample names HAVE TO BE different. Co-reducing files with the same sample name will result in one big reduced file, because GATK doesn't differentiate between them.

    Geraldine Van der Auwera, PhD
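
To see in advance whether the same sample name appears in more than one input file, the SM field of the @RG header lines can be compared across files. The following is a minimal sketch, assuming the header text of each BAM has already been extracted as plain text (for example with `samtools view -H`). The function names and the example file and sample names are hypothetical.

```python
from collections import defaultdict


def samples_in_header(header_text: str) -> set[str]:
    """Collect SM values from the @RG lines of a SAM/BAM text header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        for field in line.split("\t")[1:]:
            if field.startswith("SM:"):
                samples.add(field[3:])
    return samples


def shared_samples(headers: dict[str, str]) -> dict[str, list[str]]:
    """Map each sample name found in more than one file to those files."""
    files_by_sample = defaultdict(list)
    for filename, header_text in headers.items():
        for sample in sorted(samples_in_header(header_text)):
            files_by_sample[sample].append(filename)
    return {s: f for s, f in files_by_sample.items() if len(f) > 1}


# Hypothetical headers: two libraries share the sample name "strainA".
headers = {
    "lib1.bam": "@HD\tVN:1.0\tSO:coordinate\n@RG\tID:rg1\tSM:strainA\tLB:lib1\n",
    "lib2.bam": "@HD\tVN:1.0\tSO:coordinate\n@RG\tID:rg2\tSM:strainA\tLB:lib2\n",
    "lib3.bam": "@HD\tVN:1.0\tSO:coordinate\n@RG\tID:rg3\tSM:strainB\tLB:lib3\n",
}
print(shared_samples(headers))
# prints {'strainA': ['lib1.bam', 'lib2.bam']}
```

An empty result means every sample name is confined to a single file, which is the condition described above for writing separate outputs.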

  • annat Posts: 7 Member

    Hi, I reheadered so that I now have 16 unique sample names, but am still having problems.

    java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads
    -I bamForReduceReads.list -R data/abH97.fa --nwayout -o reduced.bam

    INFO  11:23:10,296 HelpFormatter - -------------------------------------------------------------------------------- 
    INFO 11:23:10,300 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.3-5-g49ed93c, Compiled 2013/01/06 20:58:13
    INFO 11:23:10,300 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 11:23:10,300 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 11:23:10,305 HelpFormatter - Program Args: -T ReduceReads -I bamForReduceReads.list -R data/abH97.fa --nwayout -o reduced.bam
    INFO 11:23:10,306 HelpFormatter - Date/Time: 2013/01/11 11:23:10
    INFO 11:23:10,306 HelpFormatter - --------------------------------------------------------------------------------
    INFO 11:23:10,306 HelpFormatter - --------------------------------------------------------------------------------
    INFO 11:23:10,448 GenomeAnalysisEngine - Strictness is SILENT
    INFO 11:23:10,705 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO 11:23:10,717 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 11:23:10,864 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.14
    INFO 11:23:10,932 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 11:23:10,933 ProgressMeter - Location processed.reads runtime per.1M.reads completed total.runtime remaining
    INFO 11:23:11,131 ReadShardBalancer$1 - Loading BAM index data for next contig
    INFO 11:23:11,139 ReadShardBalancer$1 - Done loading BAM index data for next contig
    INFO 11:23:17,182 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.lang.NullPointerException
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.SingleSampleCompressor.closeVariantRegions(SingleSampleCompressor.java:83)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.closeVariantRegionsInAllSamples(MultiSampleCompressor.java:94)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.MultiSampleCompressor.addAlignment(MultiSampleCompressor.java:76)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReadsStash.compress(ReduceReadsStash.java:67)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:387)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:87)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:226)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:215)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:254)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:219)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:91)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:55)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:83)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:281)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:237)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:147)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 2.3-5-g49ed93c):
    ##### ERROR
    ##### ERROR Please visit the wiki to see if this is a known problem
    ##### ERROR If not, please post the error, with stack trace, to the GATK forum
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Code exception (see stack trace for error itself)
    ##### ERROR ------------------------------------------------------------------------------------------

  • tvatanen Posts: 3 Member

    I am trying to run ReduceReads in cancer mode with the -nw parameter. I get the following error, which is related to sort order:

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException: Alignments added out of order in SAMFileWriterImpl.addAlignment for /media/ephemeral0/testdata/work/tumor_bqsr_sorted2.reduced.bam. Sort order is coordinate. Offending records are at [17:39191] and [17:39174]
    at net.sf.samtools.SAMFileWriterImpl.assertPresorted(SAMFileWriterImpl.java:177)
    at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:167)
    at org.broadinstitute.sting.utils.sam.NWaySAMFileWriter.addAlignment(NWaySAMFileWriter.java:167)
    at org.broadinstitute.sting.utils.sam.BySampleSAMFileWriter.addAlignment(BySampleSAMFileWriter.java:68)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.outputRead(ReduceReads.java:708)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:478)
    at org.broadinstitute.sting.gatk.walkers.compression.reducereads.ReduceReads.reduce(ReduceReads.java:113)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:251)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano$TraverseReadsReduce.apply(TraverseReadsNano.java:240)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.executeSingleThreaded(NanoScheduler.java:279)
    at org.broadinstitute.sting.utils.nanoScheduler.NanoScheduler.execute(NanoScheduler.java:245)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:102)
    at org.broadinstitute.sting.gatk.traversals.TraverseReadsNano.traverse(TraverseReadsNano.java:56)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:108)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:311)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:245)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:152)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:91)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 2.6-4-g3e5ff60):
    ERROR
    ERROR Please check the documentation guide to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR MESSAGE: Alignments added out of order in SAMFileWriterImpl.addAlignment for /media/ephemeral0/testdata/work/tumor_bqsr_sorted2.reduced.bam. Sort order is coordinate. Offending records are at [17:39191] and [17:39174]
    ERROR ------------------------------------------------------------------------------------------

    I have tried sorting the tumor file with both samtools sort and Picard SortSam.jar, but the problem persists.
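
One way to rule out the input as the cause is to confirm independently that its records are in coordinate order. The sketch below is a rough illustration of that check, assuming the mapped records have already been reduced to (contig, position) pairs and that the contig order is taken from the @SQ header lines. The function name `first_out_of_order` and the example values are hypothetical, and unmapped reads are not handled.

```python
from typing import Iterable, Optional, Tuple

Record = Tuple[str, int]  # (contig name, 1-based start position)


def first_out_of_order(
    records: Iterable[Record], contig_order: list[str]
) -> Optional[Tuple[Record, Record]]:
    """Return the first adjacent pair that violates coordinate order, or None."""
    rank = {name: index for index, name in enumerate(contig_order)}
    previous = None
    previous_key = None
    for record in records:
        contig, position = record
        # Coordinate order compares contig rank first, then position.
        key = (rank[contig], position)
        if previous_key is not None and key < previous_key:
            return previous, record
        previous, previous_key = record, key
    return None


# Example mirroring the positions reported in the error message.
records = [("17", 39100), ("17", 39191), ("17", 39174)]
print(first_out_of_order(records, ["16", "17", "18"]))
# prints (('17', 39191), ('17', 39174))
```

If this kind of check returns None on the input file, the out-of-order records are being produced during writing rather than read from the input.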

  • Geraldine_VdAuwera Posts: 5,821 Administrator, GATK Developer

    Hi @tvatanen, could you please upload a snippet file to our FTP so that we can try to reproduce this error locally?

    Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

  • tvatanen Posts: 3 Member

    @Geraldine_VdAuwera said:
    Hi tvatanen, could you please upload a snippet file to our FTP so that we can try to reproduce this error locally?

    Instructions are here: http://www.broadinstitute.org/gatk/guide/article?id=1894

    Thanks for the quick reply! I have uploaded the bug report to your FTP site. The file name is bug_report_tvatanen.tar.gz, and it contains all the required information. Thank you!

  • Geraldine_VdAuwera Posts: 5,821 Administrator, GATK Developer

    Thanks for the test files. We'll have a look at them and let you know what we find in this thread.

  • Geraldine_VdAuwera Posts: 5,821 Administrator, GATK Developer
    edited July 2013

    Update: we have a fix for this issue. It is available in the latest nightly build and will be in the next official release.
