
"ERROR MESSAGE: No data found." from VariantRecalibrator

claybreshears · Hillsboro, OR · Posts: 1 · Member

This crops up when running with "-mode INDEL". Not sure why there is no data. (See attached log file with stack trace.)

All input files are non-empty (except the .R file). A similar execution using "-mode SNP" completes with no problems. Since I'm simply looking to get the scripting and flags correct, I've used a public data set. Could it be that I'm unlucky and chose something that has no indels from the reference, which is causing the error? Could there be a more graceful method of termination?

Attachment: NIST7035_TAAGGCGA_L001_R1_001.recalibrate.indel.log (8K)
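
One way to check whether the input call set contains any indels at all, before running with "-mode INDEL", is to count record types in the VCF. The following is a minimal sketch, not part of GATK: the function names and the file path in the usage note are hypothetical, and the rule used here (an indel is any record with an ALT allele whose length differs from REF) is a simplification of how GATK itself classifies variants.

```python
import gzip
from collections import Counter


def classify(ref, alt_field):
    """Roughly classify a VCF record from its REF and ALT columns."""
    # Ignore missing, spanning-deletion, and symbolic alleles such as <DEL>.
    alts = [a for a in alt_field.split(",")
            if a not in (".", "*") and not a.startswith("<")]
    if not alts:
        return "other"
    if any(len(a) != len(ref) for a in alts):
        return "indel"
    if len(ref) == 1:
        return "snp"
    return "other"


def count_variant_types(lines):
    """Count snp/indel/other records in an iterable of VCF text lines."""
    counts = Counter()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        counts[classify(fields[3], fields[4])] += 1
    return counts


def count_in_file(path):
    """Same count, reading a plain or gzip-compressed VCF from disk."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_variant_types(handle)
```

For example, `count_in_file("raw_variants.vcf")["indel"]` (hypothetical path) gives the number of indel records. A count of zero, or only a handful, would be consistent with the error above.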

Answers

  • Geraldine_VdAuwera · Posts: 6,176 · Administrator, GATK Developer

    It looks like you're using a pretty small dataset, so there might be no variants in your data that overlap with the model training resources. This happens often for indels if you're running on a small dataset. The solution is to use a bigger dataset -- unfortunately it's not possible to test VQSR on small datasets.

    We're looking at ways to improve how the program handles the issues stemming from having too few variants to work with, so hopefully future versions will be more graceful.

    Geraldine Van der Auwera, PhD

  • Irantzu · Posts: 2 · Member
    edited July 18

    Hi @Geraldine, one quick question. I'm running VariantRecalibrator, and it seems to run OK, but at the end I get the "##### ERROR MESSAGE: No data found." error. I think the command is OK, but is it possible to run VariantRecalibrator with 4000 variants and only ONE sample? I'm asking because I've read several comments about this issue and I'm not sure whether it is possible to run the analysis with only one sample...

    Thanks in advance

  • Geraldine_VdAuwera · Posts: 6,176 · Administrator, GATK Developer

    Hi @Irantzu,

    VQSR does not perform well (if at all) on a single sample. It can work with whole genome sequence, but if you're working with exome, there are just too few variants. Our recommendation for dealing with this is to get additional sample BAMs from the 1000 Genomes Project and add them to your callset (see this presentation for details).

    Geraldine Van der Auwera, PhD
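
The first answer attributes the error to having no variants that overlap the model training resources. A rough way to gauge that overlap for indels is sketched below. This is an illustration only, not how VariantRecalibrator does its matching: the function names are hypothetical, sites are compared by chromosome and position alone, and the sketch assumes both files use the same contig naming (for example "1" rather than "chr1") and the same indel representation.

```python
def indel_sites(lines):
    """Return the set of (chrom, pos) for indel records in VCF text lines."""
    sites = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Ignore missing, spanning-deletion, and symbolic alleles.
        alts = [a for a in alt.split(",")
                if a not in (".", "*") and not a.startswith("<")]
        if any(len(a) != len(ref) for a in alts):
            sites.add((chrom, int(pos)))
    return sites


def overlap_with_training(callset_lines, resource_lines):
    """Indel sites present in both the call set and the training resource."""
    return indel_sites(callset_lines) & indel_sites(resource_lines)
```

Passing open file handles for the call set and for a training resource VCF returns the shared sites. An empty or very small result suggests the dataset is too small for indel recalibration, as described in the answers above.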
