
Reduced Representation

rdubin, Albert Einstein College of Medicine, Member

Is it possible to use Genome STRiP (preprocess, discovery, genotyper) on reduced representation data? In this case, genomic DNA was digested with PacI, a rare cutter. For each of 16 samples, the digested DNA was run on a gel, and a specific size range was excised and purified. This gel-excised DNA was used for library construction and sequencing.

If it is possible to use Genome STRiP on such data, could you please tell me how to set -P input.genomeSize, -P input.genomeSizeMale, and -P input.genomeSizeFemale, and how to set the regions that we wish to examine? We selected over 3000 specific regions to examine, spread across all of the major chromosomes, and I know their total size. Should I use the sum of these regions for the input.genomeSize parameters?

When I provide these 3000 regions to discovery (using the -L parameter pointing to a file containing the regions), the module fails during MergeDiscoveryOutput, apparently because it runs out of memory (I imagine discovery cannot open 3000 VCF files at once). I also tried using -L with multiple regions during preprocessing, and this fails too.

Any assistance would be greatly appreciated.
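For reference, here is a minimal sketch of how the summed size of the regions could be computed. It assumes the regions are listed one per line as `chr:start-end` with 1-based inclusive coordinates, and it merges overlapping regions so that no base is counted twice. The function names are hypothetical and not part of Genome STRiP, and whether this sum is the appropriate value for the input.genomeSize parameters is exactly the open question above.

```python
from collections import defaultdict


def parse_interval(text):
    """Parse 'chr:start-end' (assumed 1-based, inclusive) into (chrom, start, end)."""
    chrom, _, span = text.strip().rpartition(":")
    start, _, end = span.partition("-")
    return chrom, int(start), int(end)


def total_covered_bases(intervals):
    """Sum the bases covered by the intervals, counting overlapping bases once."""
    by_chrom = defaultdict(list)
    for line in intervals:
        if line.strip():  # skip blank lines
            chrom, start, end = parse_interval(line)
            by_chrom[chrom].append((start, end))

    total = 0
    for spans in by_chrom.values():
        spans.sort()
        cur_start, cur_end = spans[0]
        for start, end in spans[1:]:
            if start <= cur_end + 1:
                # Overlapping or adjacent: extend the current merged span.
                cur_end = max(cur_end, end)
            else:
                total += cur_end - cur_start + 1
                cur_start, cur_end = start, end
        total += cur_end - cur_start + 1
    return total


# Made-up example regions, not the real 3000-region list.
example = ["chr1:100-200", "chr1:150-300", "chr2:1-50"]
print(total_covered_bases(example))
```

To run it on the real list, pass the lines of the regions file, for example `total_covered_bases(open(path))`, where `path` is the same file given to -L.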
