Ask your questions about GenomeSTRiP here

Geraldine_VdAuwera, Cambridge, MA (Member, Administrator, Broadie)
edited January 2013 in GenomeSTRiP

The authors of GenomeSTRiP will respond to questions posed in this section. You can also comment on the articles in the "GenomeSTRiP Documentation" subcategory.

Please don't post your question as a new comment on this announcement; you can create a new question by using the big "Ask a Question" button on the left.

Comments

  • Hi,

    I have been able to install and run GenomeSTRiP on the test dataset without any trouble.

    With my own data, it runs and generates the .vcf file. However, when I run the genotyping step after the discovery step, without the alternate allele alignment step, it gives many errors ("Left read of read pair fails left read test") like the one below. Could you please let me know why I get this error and how I can fix it?

    #DBG: HWI-ST1335:42:D1D8BACXX:1:1102:11203:59394 161 scaffold_145 112988 0 100M scaffold_145 159961 47073 GAAGCTCCAGGGAGAGCTGAGAGCCCCTTGCAGGGCCTAAAGGGGCTCCAGGAGAGCTGCAGAGGGACTGGGGCCAAGGCATGGAGGGACAGGAGCCGGG CC[email protected]ABDDDD?:;?AABD<[email protected]@[email protected]@B X0:i:2 X1:i:6 XA:Z:scaffold_145,-132868,100M,3;scaffold_77,+1515879,100M,4;scaffold_77,-1228410,100M,4;scaffold_77,+1650080,100M,4;scaffold_77,+1574087,100M,4;scaffold_77,+1569108,100M,4;scaffold_77,+1525171,100M,4; MD:Z:94C0A0A3 RG:Z:322 XG:i:0 AM:i:0 NM:i:3 SM:i:0 XM:i:3 XO:i:0 XT:A:R
    #DBG: HWI-ST1335:42:D1D8BACXX:1:1102:11203:59394 81 scaffold_145 159961 37 100M scaffold_145 112988 -47073 AGCCTGGCACAGTGGGAGGTGTCCCTGCCATGGCAGGGCTGGCACTGGAGGGGCTCTGAGGTCCCTTCCCACCCAGCCATCCCGGGATTCTCAGTGCCAG BBBCDCCC:A<[email protected]@EEECFFFFHHHJIHGIJIIIHCHGDGGIED:[email protected] X0:i:1 X1:i:0 MD:Z:100 RG:Z:322 XG:i:0 AM:i:0 NM:i:0 SM:i:37 XM:i:0 XO:i:0 XT:A:U
    INFO 15:32:44,064 GATKRunReport - Aggregating data for run report
    [GC 349113K->15984K(1042048K), 0.0877790 secs]
    [Full GC 15984K->15697K(1042048K), 0.1454470 secs]

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    java.lang.IllegalArgumentException: Left read of read pair fails left read test: HWI-ST1335:42:D1D8BACXX:1:1102:11203:59394 161 scaffold_145 112988 0 100M scaffold_145 159961 47073GAAGCTCCAGGGAGAGCTGAGAGCCCCTTGCAGGGCCTAAAGGGGCTCCAGGAGAGCTGCAGAGGGACTGGGGCCAAGGCATGGAGGGACAGGAGCCGGG CC[email protected]ABDDDD?:;?AABD<[email protected]@[email protected]@BX0:i:2 X1:i:6 XA:Z:scaffold_145,-132868,100M,3;scaffold_77,+1515879,100M,4;scaffold_77,-1228410,100M,4;scaffold_77,+1650080,100M,4;scaffold_77,+1574087,100M,4;scaffold_77,+1569108,100M,4;scaffold_77,+1525171,100M,4; MD:Z:94C0A0A3 RG:Z:322 XG:i:0 AM:i:0 NM:i:3 SM:i:0 XM:i:3 XO:i:0 XT:A:R
    at org.broadinstitute.sv.util.ReadPair.create(ReadPair.java:135)
    at org.broadinstitute.sv.discovery.ReadPairRecordFilter.createReadPair(ReadPairRecordFilter.java:228)
    at org.broadinstitute.sv.discovery.ReadPairRecordFilter.generateReadPairs(ReadPairRecordFilter.java:149)
    at org.broadinstitute.sv.discovery.ReadPairRecordFilter.filterReadPairs(ReadPairRecordFilter.java:80)
    at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.finishReadPairSelection(DeletionDiscoveryAlgorithm.java:206)
    at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.runDiscovery(DeletionDiscoveryAlgorithm.java:156)
    at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:150)
    at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:43)
    at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:75)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:217)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:111)
    at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:110)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:239)
    at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:72)
    at org.broadinstitute.sv.main.SVDiscovery.main(SVDiscovery.java:21)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 1.0.5039M):
    ERROR
    ERROR Please visit to wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ERROR
    ERROR MESSAGE: Left read of read pair fails left read test: HWI-ST1335:42:D1D8BACXX:1:1102:11203:59394 161 scaffold_145 112988 0 100M scaffold_145 159961 47073 GAAGCTCCAGGGAGAGCTGAGAGCCCCTTGCAGGGCCTAAAGGGGCTCCAGGAGAGCTGCAGAGGGACTGGGGCCAAGGCATGGAGGGACAGGAGCCGGG CC[email protected]ABDDDD?:;?AABD<[email protected]@[email protected]@B X0:i:2 X1:i:6 XA:Z:scaffold_145,-132868,100M,3;scaffold_77,+1515879,100M,4;scaffold_77,-1228410,100M,4;scaffold_77,+1650080,100M,4;scaffold_77,+1574087,100M,4;scaffold_77,+1569108,100M,4;scaffold_77,+1525171,100M,4; MD:Z:94C0A0A3 RG:Z:322 XG:i:0 AM:i:0 NM:i:3 SM:i:0 XM:i:3 XO:i:0 XT:A:R

    ERROR ------------------------------------------------------------------------------------------
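
For readers following the #DBG records above: the second column of each record is the SAM FLAG field (161 and 81 for this pair). Below is a minimal sketch for decoding those flags into their standard SAM bit meanings. The function and label names are illustrative only. It does not reproduce GenomeSTRiP's internal "left read test", whose exact logic is not shown in this thread; it only makes the orientation and first/second-in-pair bits of the offending reads easier to inspect.

```python
# Standard SAM FLAG bits as defined in the SAM specification.
# The short labels are illustrative names chosen for this sketch.
SAM_FLAG_BITS = [
    (0x1, "paired"),
    (0x2, "proper_pair"),
    (0x4, "unmapped"),
    (0x8, "mate_unmapped"),
    (0x10, "reverse"),
    (0x20, "mate_reverse"),
    (0x40, "first_in_pair"),
    (0x80, "second_in_pair"),
    (0x100, "secondary"),
    (0x200, "qc_fail"),
    (0x400, "duplicate"),
    (0x800, "supplementary"),
]


def decode_sam_flag(flag):
    """Return the labels of all bits that are set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS if flag & bit]


if __name__ == "__main__":
    # The two flags from the #DBG records in the post above.
    for flag in (161, 81):
        print(flag, decode_sam_flag(flag))
```

Decoded this way, the record at position 112988 (flag 161, mapping quality 0) is the second read of the pair on the forward strand with its mate reversed, and the record at 159961 (flag 81) is the first read of the pair on the reverse strand.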

  • Hi, I'm trying to use GenomeSTRiP on some Daphnia data. I've tested the code successfully. I have two sets of alignment results, one from BWA and one from MOSAIK. GenomeSTRiP runs without errors on the BWA results. However, when I use the alignment results from MOSAIK, I get the following errors:

    ERROR ------------------------------------------------------------------------------------------
    ERROR stack trace

    @DDDDDH8AFF<;AC<F?FGGGEGGD?<FEHDGGGGDA6)[email protected](5'[email protected]:?A.7?>@C;;;6;.;=83;?5:5>>@(,;5::;@((443+::((+444:443>>>(+44+83&)+++4(4(+4(:(+3+(28(0 ZA:Z:<@;35;;;1;;><&;37;;;1;1S115M3I5M1D7M2D4M1I11M1S;77T11A20A1C0A9^G1G2G2^CC2T2C0C2G4G1> MD:Z:57T19T11A16C26G4G6G1 RG:Z:A28_ID NM:i:7}, {M00483:4:000000000-A0T2U:1:12:12111:14005 129 scaffold_15 846121 37 1S115M3I5M1D7M2D4M1I11M1S scaffold_15 846121 146 CTACATATGACAGTTCACTAAAGGGAAATAATGCACCGCCTATTGTTTTAGATCACGGTAACATTTCCATAATTAAGTCGGATTGTAATTCATGAATTCATCAATTTCTTAGTTTAGCCATCAAGTTCCTAAAGGTTAGTCTTGGATT +:?DDFFDFHHHHIGHCIIJJGJIDCCGHEIIIIGFHDGIIFDEFDFHIGGGEGEG<[email protected]>C(=A?BAC>>B3;@>;@CCC3>;:@>;[email protected]:[email protected]@>::@CD:(+>@CC>;[email protected]@:4++:(:((+42>((444:(: ZA:Z:<&;35;;;1;1S147M;57T19T11A16C26G4G6G1><@;37;;;1;;> MD:Z:77T11A20A1C0A9^G1G2G2^CC2T2C0C2G4G1 RG:Z:A28_ID NM:i:19} ]
    at org.broadinstitute.sv.discovery.ReadPairRecordFilter.generateReadPairs(ReadPairRecordFilter.java:217)
    at org.broadinstitute.sv.discovery.ReadPairRecordFilter.filterReadPairs(ReadPairRecordFilter.java:97)
    at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.finishReadPairSelection(DeletionDiscoveryAlgorithm.java:236)
    at org.broadinstitute.sv.discovery.DeletionDiscoveryAlgorithm.runDiscovery(DeletionDiscoveryAlgorithm.java:185)
    at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:174)
    at org.broadinstitute.sv.discovery.SVDiscoveryWalker.onTraversalDone(SVDiscoveryWalker.java:45)
    at org.broadinstitute.sting.gatk.executive.Accumulator$StandardAccumulator.finishTraversal(Accumulator.java:129)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:76)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:234)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:113)
    at org.broadinstitute.sv.main.SVCommandLine.execute(SVCommandLine.java:105)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:221)
    at org.broadinstitute.sv.main.SVCommandLine.main(SVCommandLine.java:67)
    at org.broadinstitute.sv.main.SVDiscovery.main(SVDiscovery.java:21)

    ERROR ------------------------------------------------------------------------------------------
    ERROR A GATK RUNTIME ERROR has occurred (version 1.0-6121-g40e3165):
    ERROR
    ERROR Please visit the wiki to see if this is a known problem
    ERROR If not, please post the error, with stack trace, to the GATK forum
    ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ERROR
    @DDDDDH8AFF<;AC<F?FGGGEGGD?<FEHDGGGGDA6)[email protected](5'[email protected]:?A.7?>@C;;;6;.;=83;?5:5>>@(,;5::;@((443+::((+444:443>>>(+44+83&)+++4(4(+4(:(+3+(28(0 ZA:Z:<@;35;;;1;;><&;37;;;1;1S115M3I5M1D7M2D4M1I11M1S;77T11A20A1C0A9^G1G2G2^CC2T2C0C2G4G1> MD:Z:57T19T11A16C26G4G6G1 RG:Z:A28_ID NM:i:7}, {M00483:4:000000000-A0T2U:1:12:12111:14005 129 scaffold_15 846121 37 1S115M3I5M1D7M2D4M1I11M1S scaffold_15 846121 146 CTACATATGACAGTTCACTAAAGGGAAATAATGCACCGCCTATTGTTTTAGATCACGGTAACATTTCCATAATTAAGTCGGATTGTAATTCATGAATTCATCAATTTCTTAGTTTAGCCATCAAGTTCCTAAAGGTTAGTCTTGGATT +:?DDFFDFHHHHIGHCIIJJGJIDCCGHEIIIIGFHDGIIFDEFDFHIGGGEGEG<[email protected]>C(=A?BAC>>B3;@>;@CCC3>;:@>;[email protected]:[email protected]@>::@CD:(+>@CC>;[email protected]@:4++:(:((+42>((444:(: ZA:Z:<&;35;;;1;1S147M;57T19T11A16C26G4G6G1><@;37;;;1;;> MD:Z:77T11A20A1C0A9^G1G2G2^CC2T2C0C2G4G1 RG:Z:A28_ID NM:i:19} ]
    ERROR ------------------------------------------------------------------------------------------
    at org.broadinstitute.sting.queue.util.ShellJob.run(ShellJob.scala:24)
    at org.broadinstitute.sting.queue.engine.shell.ShellJobRunner.start(ShellJobRunner.scala:54)
    at org.broadinstitute.sting.queue.engine.FunctionEdge.start(FunctionEdge.scala:56)
    at org.broadinstitute.sting.queue.engine.QGraph.runJobs(QGraph.scala:383)
    at org.broadinstitute.sting.queue.engine.QGraph.run(QGraph.scala:123)
    at org.broadinstitute.sting.queue.QCommandLine.execute(QCommandLine.scala:111)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:221)
    at org.broadinstitute.sting.queue.QCommandLine$.main(QCommandLine.scala:57)
    at org.broadinstitute.sting.queue.QCommandLine.main(QCommandLine.scala)
    

    I've used Picard's tool to remove the duplicates. Could you help me figure out the problem? Thanks!

  • bob_tw
    edited February 2013

    I have installed GenomeSTRiP successfully; discovery.sh and genotyping.sh ran without errors.
    But how do I start analyzing my own BAM files?
    I can't find any README or user manual.

  • bhandsaker (Member, Broadie, Moderator)

    One of the best sources right now is the presentation from the GATK Workshop in December, 2012, which can be found here:
    https://www.dropbox.com/sh/0puyz82ecswm4ig/V2fW--1ZFS

    You can also use the installtest as a guide, but be aware that the installtest uses certain options that are not recommended for full-scale production use. I believe these are noted in the workshop presentation.

  • jfarrell (Member)

    How large a batch of bam files would be recommended for 40x coverage with the latest version of GenomeSTRiP? Presently we would like to run this on about 65 bam files initially. Is the recommended best practice to run on all of them or split them into two smaller groups for discovery to keep the coverage below 2000x for a run?
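
The arithmetic behind this question can be sketched as follows. The 2000x ceiling is the figure quoted in the question itself and is not confirmed anywhere in this thread as an official limit; the function name is hypothetical. With 65 samples at 40x, a single batch totals 2600x, so staying under the quoted ceiling would mean at least two batches.

```python
import math


def split_into_batches(n_samples, coverage_per_sample, max_total_coverage):
    """Return batch sizes that keep each batch's summed coverage at or below the cap.

    max_total_coverage is an assumed cap (2000x in the question above),
    not a documented GenomeSTRiP limit.
    """
    per_batch = int(max_total_coverage // coverage_per_sample)
    if per_batch < 1:
        raise ValueError("a single sample already exceeds the coverage cap")
    n_batches = math.ceil(n_samples / per_batch)
    # Spread the samples as evenly as possible across the batches.
    base, extra = divmod(n_samples, n_batches)
    return [base + 1 if i < extra else base for i in range(n_batches)]


if __name__ == "__main__":
    sizes = split_into_batches(65, 40, 2000)
    print(sizes, [size * 40 for size in sizes])
```

For the numbers in the question this gives batches of 33 and 32 samples, or 1320x and 1280x of summed coverage.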

  • jl14, Wellcome Trust Sanger Institute (Member)

    Hi

    We are using GenomeStrip to call structural variants across a low-coverage case-control cohort of total size about 8000. Each of our BAM files contains data for ~100 samples for a single chromosome. On the basis of some other discussions in this forum, we are running the SV discovery and genotyping processes separately on each chromosome, but estimating insert size distributions from the whole genome.

    However, some of our sample-library pairs are problematic, and it will be difficult (with regards to memory resources) to edit the BAM files. Is it possible to run GenomeStrip - particularly the discovery and genotyping phases - on only some of these samples? Essentially, we are looking for a way to submit our BAM files in their current format along with a list of samples specifying whether they are to be included or discarded.

    Thanks in advance for any help.

  • bhandsaker (Member, Broadie, Moderator)

    There is an argument
    -excludeReadGroup file.list
    that you can use to exclude one or more read groups from different analysis steps. This can be applied during preprocessing, or during discovery or genotyping. Assuming each sample is in a separate read group (or set of read groups), you should be able to use this.

    The file must be named with a .list extension. There should be one read group ID (value of the @RG ID field) per line.
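
One way to build such a file from a list of samples to drop is sketched below. It assumes the BAM header text is already available (for example from `samtools view -H`) and that each @RG line carries an SM tag naming the sample. All function names are hypothetical helpers, not part of GenomeSTRiP.

```python
def read_groups_by_sample(sam_header_text):
    """Map each sample name (SM tag) to its read group IDs (ID tag) from @RG lines."""
    groups = {}
    for line in sam_header_text.splitlines():
        if not line.startswith("@RG"):
            continue
        # Header fields are tab-separated TAG:VALUE pairs after the record type.
        fields = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        if "ID" in fields:
            groups.setdefault(fields.get("SM", ""), []).append(fields["ID"])
    return groups


def exclusion_list(sam_header_text, samples_to_exclude):
    """Collect the read group IDs that belong to the samples being excluded."""
    groups = read_groups_by_sample(sam_header_text)
    ids = []
    for sample in samples_to_exclude:
        ids.extend(groups.get(sample, []))
    return ids


def write_list_file(path, read_group_ids):
    """Write one read group ID per line; the file name must end in .list."""
    if not path.endswith(".list"):
        raise ValueError("file must be named with a .list extension")
    with open(path, "w") as handle:
        for read_group_id in read_group_ids:
            handle.write(read_group_id + "\n")
```

The resulting file would then be passed as `-excludeReadGroup file.list`. Because the lookup goes through the SM tag, a sample spread over several read groups has all of them excluded together.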
