I read in your recent slides, "Data Compression with Reduce Reads", that "Tumor and Normal samples (or any set of samples) get co-reduced, meaning that every variable region triggered by one sample will be forced in every sample."
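To make the co-reduction rule concrete, here is a toy Python sketch. It is not how ReduceReads is implemented. It simply assumes each sample reports a set of variable positions (a simplification of variable regions) and shows that the union of those positions is kept at full resolution in every sample.

```python
# Toy illustration of co-reduction, NOT the GATK ReduceReads code.
# Each sample contributes the positions it considers variable; the union
# of all of them is forced to stay unreduced in every sample.


def co_reduced_regions(variable_sites):
    """Return the set of positions kept unreduced in all samples."""
    forced = set()
    for sites in variable_sites.values():
        forced |= sites
    return forced


def kept_per_sample(variable_sites):
    """Map each sample to the (shared) sorted list of forced positions."""
    forced = sorted(co_reduced_regions(variable_sites))
    # Every sample keeps the same positions, even where that sample
    # itself looks like the consensus.
    return {sample: list(forced) for sample in variable_sites}


print(kept_per_sample({"tumor": {101, 250}, "normal": {250, 777}}))
# {'tumor': [101, 250, 777], 'normal': [101, 250, 777]}
```

This is why co-reducing all samples together preserves a site that is non-consensus in only one of them.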
I have data from 4 variant strains of an organism (my samples in the RG info) and 4 individuals for each strain (my libraries in the RG info). Currently I have one BAM file for each of the 16 libraries.
Because my coverage is quite high, I want to run ReduceReads, but I also want to preserve information across all of my samples wherever a site is non-consensus in even one of them: there is no SNP information available for this organism and I don't want to lose any important data. Should I merge the BAM files for all samples before running ReduceReads with downsampling turned off, or should I just skip ReduceReads?
Thanks Anna
Carneiro
It depends on what you want your output to look like. If you want a reduced BAM for each sample, you need to give each sample a different sample name in its read group tag (if I understood correctly, you have 4 samples) and pass the hidden --nwayout (or -nw) argument to ReduceReads. It's hidden because it's not documented yet.
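Before trying -nw, it may help to check which sample names are shared between files. The Python sketch below assumes you already have each file's header as text (for example from `samtools view -H file.bam`); the function names are hypothetical. It reads the SM tag from every @RG line and reports any sample that occurs in more than one file.

```python
from collections import defaultdict


def samples_in_header(header_text):
    """Collect the SM values from the @RG lines of a SAM header."""
    samples = set()
    for line in header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs.
        for field in line.split("\t")[1:]:
            tag, _, value = field.partition(":")
            if tag == "SM":
                samples.add(value)
    return samples


def shared_samples(headers):
    """Given {file: header_text}, return {sample: [files]} for samples
    that appear in more than one file."""
    where = defaultdict(set)
    for path, header_text in headers.items():
        for sample in samples_in_header(header_text):
            where[sample].add(path)
    return {s: sorted(paths) for s, paths in where.items() if len(paths) > 1}
```

An empty result means each sample name lives in exactly one file, which is the situation -nw expects.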
If all you want is one BAM file with all your data, you can run them all together in a single ReduceReads run without -nw. Either provide 16 -I arguments, or create a text file with one line per BAM path and pass that file to a single -I argument. This gives you one co-reduced BAM file.
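A minimal Python sketch of building that text file. It assumes, based on the GATK 2.x error messages quoted in this thread, that the file must end in .list and that each whole line is read as exactly one BAM path; `write_bam_list` and the example paths are hypothetical.

```python
from pathlib import Path


def write_bam_list(bam_paths, list_path):
    """Write one BAM path per line to a file that ends in .list."""
    list_path = Path(list_path)
    # GATK 2.x accepts -I values ending in .bam or .list only.
    if list_path.suffix != ".list":
        raise ValueError(f"{list_path} must have the .list extension")
    lines = []
    for bam in map(str, bam_paths):
        # The whole line is taken as one path, so no extra
        # tab-separated columns (e.g. output names) are allowed.
        if "\t" in bam or not bam.endswith(".bam"):
            raise ValueError(f"not a plain .bam path: {bam!r}")
        lines.append(bam)
    list_path.write_text("".join(line + "\n" for line in lines))
    return list_path


# Hypothetical usage for 16 libraries:
# write_bam_list([f"/path/to/{i}.realigned.bam" for i in range(1, 17)],
#                "bamForReduceReads.list")
```

The resulting file is then passed to a single -I argument.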
Hi Mauricio, thanks for your quick reply. Ideally I would like to keep the BAM files for the 16 libraries separate, but I'm not having much luck with the -nw argument. Can you let me know where I am going wrong?
1a) The input file contains tab-separated pairs of full paths to the input and output BAM files, one pair per line, with the extension .map:
java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads -I bamForReduceReads.map -R data/abH97.fa -nw
Error:
Invalid command line: The GATK reads argument (-I, --input_file) supports only BAM files with the .bam extension and lists of BAM files with the .list extension, but the file bamForReduceReads.map has neither extension. Please ensure that your BAM file or list of BAM files is in the correct format, update the extension, and try again.

1b) As above, but with the input file renamed to the .list extension:
java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads -I bamForReduceReads.list -R data/abH97.fa -nw
Error:
Couldn't read file 1.realigned.bam 1.realigned.reduced.bam because java.io.FileNotFoundException: 1.realigned.bam 1.realigned.reduced.bam (No such file or directory)

2a) The input file contains the full path to each BAM file, one per line; the output is one BAM file:
java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads -I bamForReduceReads.list -R data/abH97.fa --nwayout -o reduced.bam
Error:
The same sample appears in multiple files, this input cannot be multiplexed using the BySampleSAMFileWriter, try NWaySAMFileWriter instead.

2b) As above, but specifying nwayout differently, as it appears in the IndelRealigner documentation:
java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads -I bamForReduceReads.list -R data/abH97.fa --nWayOut -o reduced.bam
Error:
Argument with name 'nWayOut' isn't defined.

3) The input file contains the full path to each BAM file, one per line; the output file contains the full path to each output BAM file, one per line:
java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads -I bamForReduceReads.list -R data/abH97.fa -nw -o outBamForReduceReads.list
Error:
The same sample appears in multiple files, this input cannot be multiplexed using the BySampleSAMFileWriter, try NWaySAMFileWriter instead.

Please let me know where I am going wrong. Has this functionality been added in v2.3-6?
Thanks, Anna
Hi Anna,
The key information here is the error message telling you that the same sample appears in multiple files.
If you want to use --nwayout, the sample names HAVE TO BE different. Co-reducing files that share a sample name will result in one big reduced file; GATK doesn't differentiate them.
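One way to give every file its own sample name is to edit the SM tag in each file's @RG header lines and write the header back, for example with `samtools reheader new_header.sam in.bam > out.bam`. The Python sketch below covers only the text edit; `set_sample_name` and the sample names in the test are hypothetical.

```python
def set_sample_name(header_text, new_sample):
    """Return a copy of a SAM header with SM set to new_sample on every
    @RG line (adding an SM field if a read group lacks one)."""
    out = []
    for line in header_text.splitlines():
        if line.startswith("@RG"):
            fields = line.split("\t")
            if any(f.startswith("SM:") for f in fields):
                fields = [f"SM:{new_sample}" if f.startswith("SM:") else f
                          for f in fields]
            else:
                fields.append(f"SM:{new_sample}")
            line = "\t".join(fields)
        out.append(line)
    return "\n".join(out) + "\n"
```

Because each read only points at a read group ID, and SM is an attribute of the read group in the header, changing SM in the header is enough to change which sample the reads belong to.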
Geraldine Van der Auwera, PhD
Hi, I reheadered the files so that I now have 16 unique sample names, but I am still having problems:
java -jar /software/additional/GenomeAnalysisTK-2.3-5-g49ed93c/GenomeAnalysisTK.jar -T ReduceReads -I bamForReduceReads.list -R data/abH97.fa --nwayout -o reduced.bam
Thanks Geraldine!