Error message running Discovery: Input file is not sorted by start position

jfarrell Member ✭✭

During the very last job of 3000+ jobs submitted with the Discovery qscript, the following error occurred running the VariantFiltration tool.

ERROR MESSAGE: Input file is not sorted by start position.
ERROR We saw a record with a start of 1:56000020 after a record with a start of 1:56000025, for input source: /auto/nfs-archive/ifs/noreplica/project/genpro/archive/adsp/GenomeSTRiP/svtoolkit/adsp/run.Pilot37/adsp.del.sites.unfiltered.vcf

The error looks like it was generated during the creation of the tribble index.

When the 1000s of discovery vcf files were concatenated, these 2 calls ended up out of the correct order in adsp.del.sites.unfiltered.vcf.

1       56000025        DEL_P0056_1278 
1       56000020        DEL_P0057_1

The 56000025 deletion was the last one found in the P0056 vcf file, and the 56000020 deletion was the first one in the P0057 vcf file.

Does a sort step need to be added to the discovery qscript, or should these types of out-of-order calls not happen?
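To locate records like these before the final filtering job fails, a small check along the following lines could be run on the concatenated file. This is only a sketch, not part of Genome STRiP or GATK: the function name `find_out_of_order` is hypothetical, and it assumes the first two whitespace-separated fields of each non-header line are the contig and the start position.

```python
from typing import Iterable, Iterator, Tuple


def find_out_of_order(lines: Iterable[str]) -> Iterator[Tuple[str, int, int]]:
    """Yield (contig, previous_start, start) wherever a record starts
    before the record that precedes it on the same contig."""
    prev_contig = None
    prev_pos = -1
    for line in lines:
        # Skip blank lines and VCF header lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split(None, 2)
        contig, pos = fields[0], int(fields[1])
        if contig == prev_contig and pos < prev_pos:
            yield contig, prev_pos, pos
        prev_contig, prev_pos = contig, pos


# The two calls quoted above, in the order they appear in the merged file.
records = [
    "1\t56000025\tDEL_P0056_1278",
    "1\t56000020\tDEL_P0057_1",
]
problems = list(find_out_of_order(records))
print(problems)
```

For a real file, the same function can be fed an open file handle, for example `find_out_of_order(open("adsp.del.sites.unfiltered.vcf"))`.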

Answers

  • jfarrell Member ✭✭

    Thanks. I dropped one of the deletions, restarted the discovery and the last job to filter the vcf completed fine.

    Below are the parameters from the logs for those two vcf files.

    grep P0056 *.out
    SVDiscovery-16.out:INFO  16:03:29,736 HelpFormatter - Program Args: -T SVDiscoveryWalker -R /archive/not-replicated/project/genpro/archive/adsp/ref/Homo_sapiens_assembly19.fasta -O /auto/nfs-archive/ifs/noreplica/project/genpro/archive/adsp/GenomeSTRiP/svtoolkit/adsp/run.Pilot37/P0056.discovery.vcf -disableGATKTraversal true -md run.Pilot37/metadata -configFile conf/genstrip_parameters.txt -P pairs.uniquifyReadNames:true -runDirectory run.Pilot37 -genderMapFile adsp_gender.map -genomeMaskFile /usr3/bustaff/farrell/adsp/GenomeSTRiP/svtoolkit/genomeMaskFile/Homo_sapiens_assembly19.mask.101.fasta -partitionName P0056 -runFilePrefix P0056 -L 1:54990001-56110001 -searchLocus 1:55000001-56000000 -searchWindow 1:54990001-56110001 -searchMinimumSize 100 -searchMaximumSize 100000
    
    grep P0057 *.out
    SVDiscovery-19.out:INFO  16:03:05,107 HelpFormatter - Program Args: -T SVDiscoveryWalker -R /archive/not-replicated/project/genpro/archive/adsp/ref/Homo_sapiens_assembly19.fasta -O /auto/nfs-archive/ifs/noreplica/project/genpro/archive/adsp/GenomeSTRiP/svtoolkit/adsp/run.Pilot37/P0057.discovery.vcf -disableGATKTraversal true -md run.Pilot37/metadata -configFile conf/genstrip_parameters.txt -P pairs.uniquifyReadNames:true -runDirectory run.Pilot37 -genderMapFile adsp_gender.map -genomeMaskFile /usr3/bustaff/farrell/adsp/GenomeSTRiP/svtoolkit/genomeMaskFile/Homo_sapiens_assembly19.mask.101.fasta -partitionName P0057 -runFilePrefix P0057 -L 1:55990001-57110001 -searchLocus 1:56000001-57000000 -searchWindow 1:55990001-57110001 -searchMinimumSize 100 -searchMaximumSize 100000
    
  • bhandsaker Member, Broadie, Moderator admin

    Thanks. I will look into this. It is a bug. I think P0056 should not have emitted that call since the start coordinate is outside of the search locus for that partition.
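The point about the search locus can be checked by hand against the logged arguments. The sketch below is illustrative only and is not Genome STRiP code: the helper names `parse_interval` and `start_in_locus` are hypothetical, and it assumes the `-searchLocus` values are 1-based intervals with inclusive endpoints, as is usual for GATK-style intervals.

```python
from typing import Tuple


def parse_interval(interval: str) -> Tuple[str, int, int]:
    """Split 'contig:start-end' into (contig, start, end)."""
    contig, span = interval.rsplit(":", 1)
    start, end = span.split("-")
    return contig, int(start), int(end)


def start_in_locus(contig: str, start: int, locus: str) -> bool:
    """True if the start position falls inside the locus (inclusive ends)."""
    locus_contig, locus_start, locus_end = parse_interval(locus)
    return contig == locus_contig and locus_start <= start <= locus_end


# -searchLocus values taken from the two log lines above.
p0056_locus = "1:55000001-56000000"
p0057_locus = "1:56000001-57000000"

# DEL_P0056_1278 starts at 1:56000025, past the end of the P0056 search locus.
print(start_in_locus("1", 56000025, p0056_locus))  # False
# DEL_P0057_1 starts at 1:56000020, inside the P0057 search locus.
print(start_in_locus("1", 56000020, p0057_locus))  # True
```

Under these assumptions, the P0056 call starts 25 bases beyond the end of its partition's search locus, which is consistent with the two partitions producing calls that interleave when their files are concatenated.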

  • jfarrell Member ✭✭

    Hi Bob,

    Any luck with this bug? I will be running this on a larger set of BAM files and am checking whether there is a fix yet.

    John

  • bhandsaker Member, Broadie, Moderator admin

    Hi, John,

    I know it's been a while, but do you happen to have the full VCF record for DEL_P0056_1278 ?

    Bob

  • jfarrell Member ✭✭

    Bob,

    Here is the full vcf record....

    Thanks for looking into it,

    John

    P0064.genotypes.vcf:1 56000025 DEL_P0056_1278 T <DEL> . DEPTH;DEPTHPVAL;PAIRSPERSAMPLE CIEND=-31,32;CIPOS=-31,32;END=56000329;GSCOHERENCE=-2.3996147593919215;GSCOHFN=-1.1998073796959607;GSCOHPVALUE=0.3055;GSCOORDS=55999891,55999991,56000427,56000561;GSDEPTHCALLTHRESHOLD=0.9379948496312903;GSDEPTHNOBSSAMPLES=2;GSDEPTHNTOTALSAMPLES=2;GSDEPTHOBSSAMPLES=A-CUHS-CU001885-BL-COL-25032BL1,A-CUHS-CU001886-BL-COL-41219BL1;GSDEPTHPVALUE=0.057632;GSDEPTHPVALUECOUNTS=131,869335835,2992,16702246700;GSDEPTHRANKSUMPVALUE=NA;GSDEPTHRATIO=0.8411956505801403;GSDMAX=535;GSDMIN=1;GSDOPT=303;GSDSPAN=435;GSELENGTH=241;GSMEMBNPAIRS=2;GSMEMBNSAMPLES=2;GSMEMBOBSSAMPLES=A-CUHS-CU001886-BL-COL-41219BL1,A-CUHS-CU001885-BL-COL-25032BL1;GSMEMBPVALUE=0.296;GSMEMBSTATISTIC=37.780641222368345;GSNDEPTHCALLS=6;GSNPAIRS=2;GSNSAMPLES=2;GSOUTLEFT=0;GSOUTLIERS=0;GSOUTRIGHT=0;GSREADGROUPS=H0TGC.2,D27B5.6;[email protected],[email protected];GSSAMPLES=A-CUHS-CU001886-BL-COL-41219BL1,A-CUHS-CU001885-BL-COL-25032BL1;IMPRECISE;SVLEN=-303;SVTYPE=DEL GT:FT:GL:GL0:GQ 0/0:PASS:-0.00,-6.04,-70.27:-0.00,-3.47,-67.71:60 0/0:PASS:-0.00,-6.68,-73.14:-0.00,-4.11,-70.57:67 0/0:PASS:-0.00,-5.39,-65.92:-0.00,-2.82,-63.36:54 0/0:PASS:-0.00,-8.18,-90.60:-0.00,-5.62,-88.03:82 0/0:PASS:-0.00,-6.15,-69.43:-0.00,-3.58,-66.86:61 0/0:PASS:-0.00,-6.72,-74.18:-0.00,-4.15,-71.61:67 0/0:PASS:-0.00,-3.49,-37.02:-0.05,-0.97,-34.50:35 0/0:PASS:-0.00,-5.02,-91.83:-0.00,-2.45,-89.26:50 0/0:PASS:-0.00,-11.32,-116.86:-0.00,-8.75,-114.29:99 0/0:PASS:-0.00,-7.85,-85.21:-0.00,-5.28,-82.64:78 0/0:PASS:-0.00,-7.66,-77.88:-0.00,-5.09,-75.31:77 0/0:PASS:-0.01,-1.77,-20.76:-0.87,-0.06,-19.06:18 0/0:PASS:-0.00,-5.24,-53.96:-0.00,-2.68,-51.40:52 0/0:PASS:-0.00,-4.10,-46.42:-0.01,-1.54,-43.87:41 0/0:PASS:-0.00,-3.96,-44.74:-0.02,-1.41,-42.19:40 0/0:LowQual:-0.20,-0.43,-18.43:-2.35,-0.00,-18.01:2 0/0:PASS:-0.00,-2.58,-49.06:-0.30,-0.31,-46.79:26 0/0:PASS:-0.00,-3.38,-51.12:-0.06,-0.87,-48.62:34 0/0:PASS:-0.00,-8.94,-93.92:-0.00,-6.37,-91.35:89 
0/0:PASS:-0.00,-5.55,-64.64:-0.00,-2.98,-62.07:56 0/0:PASS:-0.00,-5.56,-78.55:-0.00,-3.00,-75.99:56 0/0:PASS:-0.00,-4.34,-48.75:-0.01,-1.78,-46.19:43 0/0:PASS:-0.00,-4.74,-53.78:-0.00,-2.18,-51.21:47 0/0:PASS:-0.00,-3.66,-39.65:-0.03,-1.12,-37.12:37 0/0:PASS:-0.00,-8.45,-88.81:-0.00,-5.88,-86.24:84 0/0:PASS:-0.00,-4.90,-53.98:-0.00,-2.33,-51.41:49 0/0:PASS:-0.00,-2.39,-27.33:-0.40,-0.22,-25.16:24 0/0:PASS:-0.00,-8.68,-93.43:-0.00,-6.11,-90.87:87 0/0:PASS:-0.00,-3.47,-39.31:-0.05,-0.95,-36.80:35 0/0:PASS:-0.00,-4.73,-54.49:-0.00,-2.17,-51.93:47 0/0:PASS:-0.02,-1.32,-18.56:-1.29,-0.02,-17.26:13 0/0:PASS:-0.00,-4.03,-47.65:-0.01,-1.48,-45.10:40 0/0:PASS:-0.00,-8.27,-92.68:-0.00,-5.70,-90.11:83 0/0:PASS:-0.00,-3.57,-48.05:-0.04,-1.05,-45.53:36 0/0:PASS:-0.00,-5.52,-63.63:-0.00,-2.95,-61.06:55 0/0:PASS:-0.00,-4.74,-56.17:-0.00,-2.18,-53.61:47 0/0:PASS:-0.00,-4.89,-65.92:-0.00,-2.33,-63.36:49

  • bhandsaker Member, Broadie, Moderator admin

    Thanks, John,

    We've got a bead on it and will try to get a fix out shortly. But just to make sure there isn't something more complicated going on, can you send DEL_P0057_1 too?

    Bob

  • jfarrell Member ✭✭

    Bob,

    Here are both from the discovery vcf output....

    P0057.discovery.vcf:1 56000020 DEL_P0057_1 T <DEL> . . CIEND=-8,8;CIPOS=-8,8;END=56000179;GSCOHERENCE=-2.0998168894192903;GSCOHFN=-1.0499084447096452;GSCOHPVALUE=0.3753;GSCOORDS=55999781,56000014,56000201,56000398;GSDEPTHCALLTHRESHOLD=1.2705903003608259;GSDEPTHNOBSSAMPLES=2;GSDEPTHNTOTALSAMPLES=2;GSDEPTHOBSSAMPLES=A-CUHS-CU001709-BL-COL-55227BL1,A-CUHS-CU001895-BL-COL-46258BL1;GSDEPTHPVALUE=0.301725;GSDEPTHPVALUECOUNTS=91,1030469182,1298,16541115087;GSDEPTHRANKSUMPVALUE=NA;GSDEPTHRATIO=1.1253729582259475;GSDMAX=286;GSDMIN=1;GSDOPT=157;GSDSPAN=186;GSMEMBNPAIRS=2;GSMEMBNSAMPLES=2;GSMEMBOBSSAMPLES=A-CUHS-CU001895-BL-COL-46258BL1,A-CUHS-CU001709-BL-COL-55227BL1;GSMEMBPVALUE=0.752;GSMEMBSTATISTIC=31.643142522149944;GSNDEPTHCALLS=29;GSNPAIRS=2;GSNSAMPLES=2;GSOUTLEFT=0;GSOUTLIERS=0;GSOUTRIGHT=0;GSREADGROUPS=H0PC9.2,C2EDJ.3;[email protected],[email protected];GSSAMPLES=A-CUHS-CU001895-BL-COL-46258BL1,A-CUHS-CU001709-BL-COL-55227BL1;IMPRECISE;SVLEN=-157;SVTYPE=DEL

    P0056.discovery.vcf:1 56000025 DEL_P0056_1278 T <DEL> . . CIEND=-31,32;CIPOS=-31,32;END=56000329;GSCOHERENCE=-2.3996147593919215;GSCOHFN=-1.1998073796959607;GSCOHPVALUE=0.3055;GSCOORDS=55999891,55999991,56000427,56000561;GSDEPTHCALLTHRESHOLD=0.9379948496312903;GSDEPTHNOBSSAMPLES=2;GSDEPTHNTOTALSAMPLES=2;GSDEPTHOBSSAMPLES=A-CUHS-CU001885-BL-COL-25032BL1,A-CUHS-CU001886-BL-COL-41219BL1;GSDEPTHPVALUE=0.057632;GSDEPTHPVALUECOUNTS=131,869335835,2992,16702246700;GSDEPTHRANKSUMPVALUE=NA;GSDEPTHRATIO=0.8411956505801403;GSDMAX=535;GSDMIN=1;GSDOPT=303;GSDSPAN=435;GSMEMBNPAIRS=2;GSMEMBNSAMPLES=2;GSMEMBOBSSAMPLES=A-CUHS-CU001886-BL-COL-41219BL1,A-CUHS-CU001885-BL-COL-25032BL1;GSMEMBPVALUE=0.296;GSMEMBSTATISTIC=37.780641222368345;GSNDEPTHCALLS=6;GSNPAIRS=2;GSNSAMPLES=2;GSOUTLEFT=0;GSOUTLIERS=0;GSOUTRIGHT=0;GSREADGROUPS=H0TGC.2,D27B5.6;[email protected],[email protected];GSSAMPLES=A-CUHS-CU001886-BL-COL-41219BL1,A-CUHS-CU001885-BL-COL-25032BL1;IMPRECISE;SVLEN=-303;SVTYPE=DEL

  • skashin Member ✭✭

    John,

    We fixed the problem and pushed out a release containing the fix: svtoolkit_1.04.1425.tar.gz.
    Please let us know if you have any more issues.

    Seva

  • jfarrell Member ✭✭

    Seva,

    This is just to confirm the bug fix worked and the new version is working fine. Thanks,

    John
