
How to use the -L value in HaplotypeCaller?

nayshool (Israel, Member)

I have exome sequencing data generated with the Agilent SureSelect v4 kit. I downloaded the target regions as BED files and converted them into this format:
chr1:762097-762270
chr1:861281-861490
chr1:865591-865791
chr1:866325-866498
chr1:871059-871244
chr1:874364-874774
...
I am trying to provide the BED file to HaplotypeCaller:
java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37.fasta -I result_sort_MarDup_AddRG_Indels_recal.bam -I exomes_from_1000G.list -o VCF_18_06_14 -stand_call_conf 30 -stand_emit_conf 10 -minPruning 3 -L regions.bed -nct 32

but I am getting an empty VCF file:
INFO 13:53:34,089 HaplotypeCaller - Ran local assembly on 0 active regions
INFO 13:53:34,102 ProgressMeter - done 0.00e+00 0.0 s 38.1 h 0.0% 0.0 s 0.0 s
INFO 13:53:34,102 ProgressMeter - Total runtime 0.14 secs, 0.00 min, 0.00 hours

If I do not use the -L option, I get a very large VCF file.

Should I provide a different format? How can I solve this?

many thanks

Omri

Answers

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @nayshool‌

    Hi Omri,

    Your intervals list has a .bed extension, but its contents are in the chr:start-stop style rather than BED format. Please rename it with a .intervals extension so the parser interprets it correctly; for example, regions.intervals is fine.

    -Sheila
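
As a hedged illustration of why the extension matters: a true BED file is tab-separated with a 0-based start, while the chr:start-stop style shown in the question is 1-based and inclusive. The sketch below converts BED lines into that style, assuming a standard three-column (or wider) BED input. The function name `bed_to_intervals` is hypothetical and not part of GATK.

```python
def bed_to_intervals(bed_lines):
    """Convert BED lines to 'contig:start-stop' interval strings.

    Assumes standard BED: tab-separated, 0-based start, end-exclusive.
    The output uses 1-based, inclusive coordinates.
    """
    intervals = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines and the usual BED header/comment lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        contig, start, end = line.split("\t")[:3]
        # Shift the 0-based start by one; the exclusive end equals
        # the 1-based inclusive stop, so it is kept unchanged.
        intervals.append(f"{contig}:{int(start) + 1}-{int(end)}")
    return intervals


if __name__ == "__main__":
    # Illustrative input only, not taken from the Agilent file.
    example = ["track name=example", "chr1\t762096\t762270"]
    for interval in bed_to_intervals(example):
        print(interval)
```

The resulting lines could then be written, one per line, to a file such as regions.intervals.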

  • nayshool (Israel, Member)

    Hi,

    thanks for the answer.

    Unfortunately, I changed the argument and renamed the file to ".intervals":

    java -jar GenomeAnalysisTK.jar -T HaplotypeCaller -R human_g1k_v37.fasta -I result_sort_MarDup_AddRG_Indels_recal.bam -I exomes_from_1000G.list -o VCF_18_06_14 -stand_call_conf 30 -stand_emit_conf 10 -minPruning 3 -L regions.intervals -nct 32

    but I'm getting the following error:
    Badly formed genome loc: Contig 'chr1' does not match any contig in the GATK sequence dictionary derived from the reference; are you sure you are using the correct reference fasta file?

    I have been using the same FASTA file throughout the Best Practices pipeline.
    Why am I getting this error?

    many thanks

    Omri
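
The error above says that 'chr1' is not among the contigs in the reference's sequence dictionary. The human_g1k_v37 reference names its chromosomes without a "chr" prefix (1, 2, ..., X, Y, MT), so UCSC-style names such as chr1 will not match. A minimal sketch for spotting such mismatches before running GATK follows, assuming the reference has a .fai index whose first tab-separated column is the contig name. The function name `contigs_missing_from_reference` is hypothetical.

```python
def contigs_missing_from_reference(intervals, fai_lines):
    """Return interval contigs that are absent from the reference index.

    intervals: strings such as 'chr1:762097-762270'.
    fai_lines: lines of a FASTA index (.fai); column 1 is the contig name.
    Contigs are reported once each, in first-seen order.
    """
    known = {line.split("\t", 1)[0].strip() for line in fai_lines if line.strip()}
    missing = []
    for interval in intervals:
        interval = interval.strip()
        if not interval:
            continue
        # Everything before the last colon is treated as the contig name.
        contig = interval.rsplit(":", 1)[0]
        if contig not in known and contig not in missing:
            missing.append(contig)
    return missing


if __name__ == "__main__":
    # Illustrative index lines in .fai layout; only the first column is used.
    fai = ["1\t249250621\t52\t60\t61", "2\t243199373\t253404903\t60\t61"]
    print(contigs_missing_from_reference(["chr1:762097-762270"], fai))
```

If the check reports names like chr1, renaming them to the reference's style (for example chr1 to 1) is one possible fix; note that the mitochondrial contig is named differently in the two conventions (chrM versus MT), so a plain prefix strip would not cover it.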
