GenomeSTRiP not giving genotypes for input bam files

I am trying to run GenomeSTRiP on whole genome sequencing data. I am able to run the preprocessing and discovery queue scripts successfully: the metadata folder is filled and I get a discovery vcf file with detected deletion regions. However, when I run the genotyping queue script, I get no errors, but I do not get any genotype information. Instead, I get a vcf file that is missing the format column as well as the columns that should have genotype information for the submitted samples (it looks identical to the input vcf file that I got from the discovery step). Any idea why this is happening? The relevant part of my perl code is below.

my $svDir = "$curDir/svtoolkit";
my $classpath="$svDir/lib/SVToolkit.jar:$svDir/lib/gatk/GenomeAnalysisTK.jar:$svDir/lib/gatk/Queue.jar";
my $setenv = "export LD_LIBRARY_PATH=$svDir/bwa/:\${LD_LIBRARY_PATH}; export PATH=$svDir/bwa/:\${PATH}; export SV_DIR=$svDir";
my $cmd = "$setenv; java -cp $classpath -Xmx4g -jar $svDir/lib/SVToolkit.jar";
print "$cmd\n"; print $cmd;

my $tmpDir = "$curDir/tmp";
my $bamslist = "$curDir/IDSsmall.txt";
my $gtDir = "$curDir/GT";
my $cmd3 = "java -Xmx4g -cp $classpath org.broadinstitute.sting.queue.QCommandLine -cp $classpath -S $svDir/qscript/SVQScript.q -S $svDir/qscript/SVGenotyper.q -md $curDir/metadata -configFile $svDir/conf/genstrip_parameters.txt -tempDir $tmpDir -gatk $svDir/lib/gatk/GenomeAnalysisTK.jar -R $curDir/hs37d5.bwa58.fa -genomeMaskFile $svDir/mask/svmask.wholeGenome.bwa58.fa -runDirectory $gtDir -vcf $curDir/discovery/chr1.discovery.vcf -I $bamslist -O $gtDir/chr1.deletions.genotypes.vcf -jobQueue NGTjob -jobProject NGT -jobLogDir $gtDir/joblogs -parallelJobs 100 -run";
print "$cmd3\n";$cmd3;

Answers

  • bhandsakerbhandsaker Member, Broadie, Moderator admin

    Can you tell me which version you are running?

    It would help to see what output was produced.
    You can also do a run without -run to do a dry run to see what commands will be executed.
    You could also try a dry run without -parallelJobs in case there's a bug in the parallelization.
    How many records are in your input vcf?

  • srashkinsrashkin Member

    I'm running version 1.04 (build 1068). The input vcf file has ~1000 regions, and I am only looking at 30 individuals (I was trying to get this working on a subsample of individuals on one chromosome before attempting all individuals for the whole genome).

    The only commandline output I get is the expanded command I submitted (for the original command, one without -run, and one without the parallelization). I can post that if you think it will be helpful, though.

    It does look like when I removed the parallelization part that it seems to be working. So I guess something was going wrong there, though I'm not entirely sure what since the same code was working for a coworker.

  • skashinskashin Member ✭✭

    I tried to specify "-parallelJobs", and it works correctly for me. So it would be good to see the actual command generated by your perl script.
    Also, what batch system do you intend to use? I don't see either "-bsub" or "-qsub" in your command

Sign In or Register to comment.