ERROR LocationAwareSeekableRODIterator during CombineGVCFs

Dear all,

I am working with 952 GVCF files and want to run GenotypeGVCFs. Because of the large number of individuals, I am first running CombineGVCFs on batches of 200 individuals, as recommended in the GATK documentation.

I get results for all batches except one. Here is the error I received:
"##### ERROR MESSAGE: LocationAwareSeekableRODIterator: track variant169 is out of coordinate order on contig 8:96167119-96167120 compared to 8:96167252"

The few topics that discuss ROD errors date from 3 years ago, so I do not know how to handle the problem.

If someone could give me some advice, that would be nice.
Best,
Vimel.

Answers

  • vimel (Paris, Member)

    Here are the command and the version I used, in case it helps:

    "java -Xmx25g -jar /home/files/software/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar \
    -T CombineGVCFs \
    -R /home/files/reference/human_g1k_v37.fasta \
    -L /home/files/reference/SureSelect-v4plusUTR_baits.interval_list \
    --interval_padding 200 \
    --variant /home/storage2/vcf/JL0447.vcf \
    [...]
    --variant /home/storage2/vcf/JL0731.vcf \
    -o TB201to400.g.vcf \
    &>> TB201to400.log".

  • Sheila (Broad Institute; Member, Broadie)

    @vimel
    Hi Vimel,

    Can you try deleting the VCF indices? GATK will create new ones (perhaps the .idx files got corrupted somehow). If that does not work, can you try running on 50 GVCFs at a time? This will help find the exact GVCF that is causing the error.

    -Sheila
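
The two suggestions above (delete the indices, then retry in batches of 50) could be scripted roughly as follows. This is only a sketch: the helper names `chunk`, `index_path` and `remove_indices` are made up for illustration, and it assumes each index sits next to its VCF as `<name>.vcf.idx`.

```python
from pathlib import Path


def chunk(items, size):
    """Split a list into consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def index_path(vcf_path):
    """Return the assumed location of the index: the VCF file name plus '.idx'."""
    p = Path(vcf_path)
    return p.with_name(p.name + ".idx")


def remove_indices(vcf_paths):
    """Delete every existing .idx file and return the paths that were removed."""
    removed = []
    for vcf in vcf_paths:
        idx = index_path(vcf)
        if idx.exists():
            idx.unlink()
            removed.append(idx)
    return removed
```

Each batch returned by `chunk(gvcfs, 50)` can then be passed to its own CombineGVCFs run as a set of `--variant` arguments; the batch that fails contains the problematic GVCF.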

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie admin)

    Also, did you parallelize the HaplotypeCaller runs? How were the GVCFs generated?

  • vimel (Paris, Member)

    Hi Sheila and Geraldine,

    I split the block of interest into sub-blocks of 50 and ran CombineGVCFs on each. The last sub-block gave the same error, so I ran GenotypeGVCFs on each individual, restricted to the region of interest (-L 8:96167119-96167252 with --interval_padding 200), to find the exact GVCF causing the error. But after "removing" that one from the project, I got a problem on chr12 for another individual.

    So, as you suggested, I deleted the VCF indices for these two individuals and ran CombineGVCFs on them. The error obtained is "##### ERROR We saw a record with a start of 12:131439284 after a record with a start of 12:131450797", so I guess the initial BAM files were not properly sorted. I tried to use SortVcf from Picard, but I received error messages (for individuals with or without the problem quoted above); perhaps this tool does not handle GVCFs.
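
A quick way to confirm which GVCFs are out of coordinate order, without running GATK on each one, might look like the sketch below. The function name `first_unsorted_record` is hypothetical. It assumes contig order comes from the `##contig` header lines, and that contigs missing from the header are ranked by first appearance.

```python
def first_unsorted_record(lines):
    """Return ((contig, pos), (contig, pos)) for the first pair of consecutive
    records that is out of order, or None if the records are sorted."""
    rank = {}      # contig name -> position in the expected contig order
    prev = None    # (sort key, (contig, pos)) of the previous record
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##contig=<"):
            # Header line of the form ##contig=<ID=8,length=...>
            for field in line[len("##contig=<"):].rstrip(">").split(","):
                if field.startswith("ID="):
                    rank.setdefault(field[3:], len(rank))
            continue
        if not line or line.startswith("#"):
            continue
        contig, pos = line.split("\t", 2)[:2]
        key = (rank.setdefault(contig, len(rank)), int(pos))
        if prev is not None and key < prev[0]:
            return (prev[1], (contig, int(pos)))
        prev = (key, (contig, int(pos)))
    return None
```

For an uncompressed file, `first_unsorted_record(open(path))` is enough; a gzipped GVCF can be read with `gzip.open(path, "rt")` instead. Only the first offending pair is reported, so a file may contain further unsorted records.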

    I think the best solution will be to process the BAM files directly (sort/index) and rerun HaplotypeCaller. For the moment, I have launched GenotypeGVCFs for the 50 individuals, this time with the full exome interval, and I will remove the unsorted individuals from the study.

    I am not the one who generated the VCF files, but in the VCF header I can see:
    "##GATKCommandLine=<ID=HaplotypeCaller [...] num_threads=1 num_cpu_threads_per_data_thread=1 num_io_threads=0 [...]", so I think HC runs were not parallelized.
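
With many files, the same header check could be automated instead of reading each header by eye. The sketch below is an assumption-laden illustration: `thread_settings` is a made-up helper, and it only looks for the three `key=value` options quoted above in a `##GATKCommandLine` line.

```python
import re


def thread_settings(header_line):
    """Extract the threading options from a ##GATKCommandLine header line
    and return them as a dict of integers (missing options are skipped)."""
    keys = ("num_threads", "num_cpu_threads_per_data_thread", "num_io_threads")
    found = {}
    for key in keys:
        # The look-behind keeps the key from matching inside a longer option name.
        match = re.search(r"(?<![A-Za-z_])" + key + r"=(\d+)", header_line)
        if match:
            found[key] = int(match.group(1))
    return found
```

A value above 1 for `num_threads` or `num_cpu_threads_per_data_thread` would suggest the HaplotypeCaller run was multi-threaded.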

    Thanks for your quick responses.

    Best,
    Vimel.

  • vimel (Paris, Member)

    Hi Sheila,

    Yes, the error was due to 2 individual GVCFs. CombineGVCFs worked well without these individuals, and I am now running GenotypeGVCFs. I will rerun HaplotypeCaller for these 2 corrupted GVCFs when I have some time.

    Thank you again.
    Best,
    Vimel.
