Notice:
If you happen to see a question you know the answer to, please do chime in and help your fellow community members. We encourage our fourm members to be more involved, jump in and help out your fellow researchers with their questions. GATK forum is a community forum and helping each other with using GATK tools and research is the cornerstone of our success as a genomics research community.We appreciate your help!

Test-drive the GATK tools and Best Practices pipelines on Terra


Check out this blog post to learn how you can get started with GATK and try out the pipelines in preconfigured workspaces (with a user-friendly interface!) without having to install anything.

About population indel calling

Hi,
Recently I have 2444 samples to perform population INDEL calling. The samples are all whole-exome sequecing with average depth ~45X. I called these 2444 samples simultaneously using 2444 reduced BAMs, and used 4 CPU threads. Here is my script:
**java -Xms2g -Xmx8g -jar /path/GenomeAnalysisTK-2.2-8-gec077cd/GenomeAnalysisTK.jar \
-R /path/hg19/ucsc.hg19.fasta \
-T UnifiedGenotyper -nct 4 \
-I /path/reduced.bam.list --dbsnp dbsnp_135.hg19.vcf \
-o chr7.raw.vcf -stand_call_conf 30.0 -dcov 600 \
-glm INDEL -A HomopolymerRun -A InbreedingCoeff -A IndelType -l INFO \
-L chr7_ex200.bed **

It is too slow. For example, there is 6096403 bp region in chr7_ex200.bed. It costs about 7days.
INFO 13:46:13,480 ProgressMeter - chr7:48559385 4.34e+06 48.3 h 11.1 h 28.6% 7.0 d 5.0 d
INFO 13:47:19,172 ProgressMeter - chr7:48619576 4.34e+06 48.4 h 11.1 h 28.6% 7.0 d 5.0 d
INFO 13:48:26,270 ProgressMeter - chr7:48655039 4.35e+06 48.4 h 11.1 h 28.7% 7.0 d 5.0 d
INFO 13:49:41,206 ProgressMeter - chr7:48684363 4.35e+06 48.4 h 11.1 h 28.7% 7.0 d 5.0 d
.....

How could I speed up my population indel calling? Could someone give me some suggestion?
Thanks in advance.

Best Answers

Answers

  • ebanksebanks Broad InstituteMember, Broadie, Dev ✭✭✭✭

    We are currently actively working on improving the run time of the indel calling capabilities of the Unified Genotyper. These improvements should be available in the next release of the GATK (version 2.4). In the meantime, you can run your jobs in parallel (either through multi-threading or "scatter-gather").

  • WangxbWangxb Member

    Thanks. Yes, I used multi-threading -nct 4. Maybe I should set nct to more.
    My colleague suggest that I merge 2444 bams to less than 1000 bams or less, using picard-tools-1.77/MergeSamFiles.jar (because of limitation of machine: at most open 1024 files simultaneously). That is, per 3 bams or more were merged to 1 bam. Then I set -L to smaller intervals, and run all jobs in parallel in different computer nodes.
    Does this method work well for indel calling?

    When using picard-tools-1.77/MergeSamFiles.jar to merge reduced BAMs, there are errors.

    Exception in thread "main" net.sf.samtools.SAMException: Exception when processing alignment for BAM index 816 2/2 4b aligned read.

    at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:120)
    at net.sf.samtools.SAMFileWriterImpl.addAlignment(SAMFileWriterImpl.java:168)
    at net.sf.picard.sam.MergeSamFiles.doWork(MergeSamFiles.java:149)
    at net.sf.picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:177)
    at net.sf.picard.sam.MergeSamFiles.main(MergeSamFiles.java:79)
    Caused by: net.sf.samtools.SAMException: Exception creating BAM index for record 816 2/2 4b aligned read.
    at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:94)
    at net.sf.samtools.BAMFileWriter.writeAlignment(BAMFileWriter.java:117)
    ... 4 more
    Caused by: net.sf.samtools.SAMFormatException: SAM validation error: ERROR: Read name 816, No real operator (M|I|D|N) in CIGAR
    at net.sf.samtools.SAMUtils.processValidationErrors(SAMUtils.java:448)
    at net.sf.samtools.BAMRecord.getCigar(BAMRecord.java:247)
    at net.sf.samtools.SAMRecord.getAlignmentEnd(SAMRecord.java:456)
    at net.sf.samtools.BAMIndexer$BAMIndexBuilder.processAlignment(BAMIndexer.java:258)
    at net.sf.samtools.BAMIndexer.processAlignment(BAMIndexer.java:92)
    ... 5 more

    Maybe picardtools cannot be applied to Reduced Bams that were generated by GATK ReduceReads. How can I deal with that ? Is there some tool to merge Reduced bams?

  • Geraldine_VdAuweraGeraldine_VdAuwera Cambridge, MAMember, Administrator, Broadie admin
    Accepted Answer

    Hi there,
    You can use PrintReads to merge the reduced BAMs.

  • WangxbWangxb Member

    @Wangxb said:
    Thanks. Yes, I used multi-threading -nct 4. Maybe I should set nct to more.
    My colleague suggest that I merge 2444 bams to less than 1000 bams or less. (because of limitation of machine: at most open 1024 files simultaneously). That is, per 3 bams or more were merged to 1 bam. Then I set -L to smaller intervals, and run all jobs in parallel in different computer nodes.
    Does this method work well for indel calling?

    @Geraldine_VdAuwera said:
    Hi there,
    You can use PrintReads to merge the reduced BAMs.

    I used PrintReads to merge per 10 reduced BAMs to 1 BAM. There are correct RG id and SM id in BAMs. Then I set -L to smaller intervals, and run all jobs in parallel in different computer nodes.
    Does this method work well for indel calling? Does this method affect the INDEL result for each sample?

Sign In or Register to comment.