The current GATK version is 3.3-0

#### Howdy, Stranger!

It looks like you're new here. If you want to get involved, click one of these buttons!

# RealignerTargetCreator gets a empty file, why?

Posts: 23Member
edited November 2012
java -jar GenomeAnalysisTK.jar -R exampleFASTA.fasta -I exampleBAM.bam -T RealignerTargetCreator -o exampleTarget.intervals

I used the example files, and I got no error report, but in the output, there is no data, that is to say. The exampleTarget.intervals is empty.

Why is this happening?

And when I add --known dbsnp_135.hg19.vcf( that is in the bundle), it will get an error

##### ERROR contig reference = chr1 / 100000.
Post edited by Geraldine_VdAuwera on
Tagged:

OK, I have good news -- you're not doing anything wrong. I checked the example file; the reason you're not getting any SNPs is that there aren't any in there. This example file was intended only to test the very simple walkers in the "first time GATK" tutorials; it's not appropriate for use with more advanced tools like the UnifiedGenotyper. I'm sorry that wasn't clear in the documentation, I'll try to correct that. To test the UG, you can use the file called NA12878.HiSeq.WGS.bwa.cleaned.recal.hg19.20.bam in the bundle. That contains a dataset for chromosome 20. You will get plenty of SNPs there.

Geraldine Van der Auwera, PhD

Regarding your first problem, it looks like exampleBAM.bam does not have any problem regions that need to be realigned...

For your second problem, that's because the example files are aligned to the b37 reference, but you're using the hg19 version of the dbsnp file. See the FAQs for why the b37/hg19 stuff is important.

Geraldine Van der Auwera, PhD

• Posts: 23Member
edited November 2012

Thanks very much.

Post edited by livefallfly on
• Posts: 23Member
edited November 2012

I have got another problem for calling SNPs. I can't get SNPs in the example.vcf.

java -jar ~/chenlong/GenomeAnalysisTK-2.1-5/GenomeAnalysisTK.jar -T UnifiedGenotyper -I exampleFASTA.bam -R exampleFASTA.fasta --dbsnp exampleSNP.vcf -o example.vcf

And it is all the same either with -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 or not.

And I can't get the result either with my own data.

Thanks.

Post edited by Geraldine_VdAuwera on

Can you tell me a little more about your problem? Specifically, I need to know what happens when you run the command. Is there no output file created, or is the file empty? Do you get any error messages?

By the way, we recently released version 2.2, so I recommend you upgrade -- the tools are significantly faster in the new version.

Geraldine Van der Auwera, PhD

• Posts: 23Member

Thanks for your answer. As for your questions, it creates the example.vcf file. And in the file it contains the annotation line. The final line in the example.vcf file is: #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT exampleBAM But there is no error messages report to me. And I will upgrade the GATK, and have a try again.

• Posts: 23Member
edited November 2012

java -jar ~/GenomeAnalysisTK-2.1-5/GenomeAnalysisTK.jar -T UnifiedGenotyper -I exampleBAM.bam -R exampleFASTA.fasta --dbsnp exampleSNP.vcf -o example.vcf

INFO 21:58:41,035 HelpFormatter - -------------------------------------------------------------------------------- INFO 21:58:41,036 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-5-gf3daab0, Compiled 2012/08/27 17:19:33 INFO 21:58:41,036 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 21:58:41,036 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 21:58:41,036 HelpFormatter - Program Args: -T UnifiedGenotyper -I exampleBAM.bam -R exampleFASTA.fasta --dbsnp exampleSNP.vcf -o example.vcf INFO 21:58:41,037 HelpFormatter - Date/Time: 2012/11/04 21:58:41 INFO 21:58:41,037 HelpFormatter - -------------------------------------------------------------------------------- INFO 21:58:41,037 HelpFormatter - -------------------------------------------------------------------------------- INFO 21:58:41,054 ArgumentTypeDescriptor - Dynamically determined type of exampleSNP.vcf to be VCF INFO 21:58:41,061 GenomeAnalysisEngine - Strictness is SILENT INFO 21:58:41,130 SAMDataSource$SAMReaders - Initializing SAMRecords in serial WARNING: BAM index file /home/livefallfly/chenlong/GenomeAnalysisTK-2.1-5/resources/exampleBAM.bam.bai is older than BAM /home/livefallfly/chenlong/GenomeAnalysisTK-2.1-5/resources/exampleBAM.bam INFO 21:58:41,140 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 INFO 21:58:41,148 RMDTrackBuilder - Loading Tribble index from disk for file exampleSNP.vcf WARN 21:58:41,234 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer INFO 21:58:41,581 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING] INFO 21:58:41,581 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining INFO 21:58:42,666 TraversalEngine - Total runtime 1.17 secs, 0.02 min, 0.00 hours INFO 21:58:42,667 TraversalEngine - 0 reads were filtered out during traversal out of 33 total (0.00%) INFO 21:58:44,838 GATKRunReport - Uploaded run statistics report to AWS S3 This is the screen output with no error. Post edited by livefallfly on • Posts: 23Member edited November 2012 I have tried the new version. But it is the same question. It created the example.vcf file with the annotation infomation. Below is the screen output INFO 14:25:36,452 HelpFormatter - -------------------------------------------------------------------------------- INFO 14:25:36,465 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.2-3-gde33222, Compiled 2012/11/03 01:04:17 INFO 14:25:36,466 HelpFormatter - Copyright (c) 2010 The Broad Institute INFO 14:25:36,466 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk INFO 14:25:36,468 HelpFormatter - Program Args: -T UnifiedGenotyper -I exampleBAM.bam --dbsnp ../../GenomeAnalysisTK-2.1-5/resources/exampleSNP.vcf -R exampleFASTA.fasta -o example.vcf -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 INFO 14:25:36,468 HelpFormatter - Date/Time: 2012/11/05 14:25:36 INFO 14:25:36,468 HelpFormatter - -------------------------------------------------------------------------------- INFO 14:25:36,468 HelpFormatter - -------------------------------------------------------------------------------- INFO 14:25:36,515 ArgumentTypeDescriptor - Dynamically determined type of ../../GenomeAnalysisTK-2.1-5/resources/exampleSNP.vcf to be VCF INFO 14:25:36,520 GenomeAnalysisEngine - Strictness is SILENT INFO 14:25:36,599 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE Target Coverage: 1000 INFO 14:25:36,603 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
WARNING: BAM index file /home/livefallfly/chenlong/GenomeAnalysisTK-2.2-3/resources/exampleBAM.bam.bai is older than BAM /home/livefallfly/chenlong/GenomeAnalysisTK-2.2-3/resources/exampleBAM.bam
INFO  14:25:36,616 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01 INFO 14:25:36,624 RMDTrackBuilder - Loading Tribble index from disk for file ../../GenomeAnalysisTK-2.1-5/resources/exampleSNP.vcf WARN 14:25:36,692 VCFStandardHeaderLines$Standards - Repairing standard header line for field GQ because -- type disagree; header has Float but standard is Integer
INFO  14:25:36,705 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  14:25:36,706 ProgressMeter -        Location processed.sites  runtime per.1M.sites completed total.runtime remaining
INFO  14:25:37,992 ProgressMeter -            done        1.00e+05    1.3 s       12.9 s    100.0%         1.3 s     0.0 s
INFO  14:25:37,992 ProgressMeter - Total runtime 1.29 secs, 0.02 min, 0.00 hours
INFO  14:25:37,993 MicroScheduler - 0 reads were filtered out during traversal out of 33 total (0.00%)
INFO  14:25:37,993 NSRuntimeProfile - Input   time:    0.3 s (20.96%)
INFO  14:25:37,993 NSRuntimeProfile - Map     time:    0.7 s (57.91%)
INFO  14:25:37,993 NSRuntimeProfile - Reduce  time:    0.0 s ( 1.99%)
INFO  14:25:37,993 NSRuntimeProfile - Outside time:    0.2 s (19.13%)
INFO  14:25:41,040 GATKRunReport - Uploaded run statistics report to AWS S3

And next is the example.vcf

##..
...
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=RPA,Number=.,Type=Integer,Description="Number of times tandem repeat unit is repeated, for each allele (including reference)">
##INFO=<ID=RU,Number=1,Type=String,Description="Tandem repeat unit (bases)">
##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
##INFO=<ID=STR,Number=0,Type=Flag,Description="Variant is a short tandem repeat">
##contig=<ID=chr1,length=100000>
##reference=file:///home/livefallfly/chenlong/GenomeAnalysisTK-2.2-3/resources/exampleFASTA.fasta
#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  exampleBAM.bam

And no SNP, I don't why.

Post edited by Geraldine_VdAuwera on
• Posts: 23Member

Thanks for your help. It works well now. And I have another question, IndelGenotyperV2 is not used again, so which replace the function? And makeIndelMask.py is in the GATK? where can I get it. if not ,which software can I get it?

It sounds like you're following a really old pipeline example. Most of the tools have changed -- for example, now the variant calling is done by either the UnifiedGenotyper or the HaplotypeCaller. Please read the Best Practices document, it will tell you what tools to use and in what order:

Geraldine Van der Auwera, PhD

• Posts: 23Member

It's very helpful. Thanks very much!

• Posts: 9Member

Hi, I'm trying to use the RealignerTargetCreator and I got empty Target.intervals files. I read that maybe it means that no regions need to be realigned. But then when I used the IndelRealigner tool with my empty Target.intervals files I got empty BAMs. I thought I should have gotten the same BAM (the one used as INPUT file) as there is nothing to realign? Why is it not the case?

Sounds like something is wrong with your data files. Have you validated your bam file? And if you try using PrintReads on a small interval, does it generate a small bam properly?

Geraldine Van der Auwera, PhD

• Posts: 9Member

Actually, if I do not use the option -L my_selected_scaff when I'm using PrintReads I got this error message : Badly formed genome loc: Parameters to GenomeLocParser are incorrect:The contig index 23924 is greater than the stored sequence count (1)

Actually I created a new bam only with the reads mapping on my_selected_scaff with this command: samtools view -h my_bam "gi|15616630|ref|NC_002528.1|" | samtools view -bS -> my_new_bam But I still have in the header a line for each scaff I had before. I runned then RealignerTargetCreator and PrintReads reads on this BAM.

If I use the option -L "gi|15616630|ref|NC_002528.1|" I do not have any error but I got at the end an empty bam.

Something is definitely wrong with your new bam file. You should re-create it from the original file using PrintReads with the -L option. Keep in mind that the -L option does not use quotes for contig names.

Geraldine Van der Auwera, PhD

• Posts: 9Member
edited February 2013

Then I tried without quotes but I got an error because the tool is looking for a contig only named gi. So I tried -L gi\|15616630\|ref\|NC_002528.1\| and I got the same as before : no error message but no result in my final BAM. I really don't understand what's happening here. Anyway, as I got an empty interval file for realignment, can I suppose that I do not need to realign and then I can keep working on my initial BAM?

Post edited by Geraldine_VdAuwera on

You would be better off simplifying the contig names, eg here change it to just NC_002528.1.

Is that a very small contig? If so it's possible that the RTC is not finding anything that needs to be realigned. I would recommend doing one more try with PrintReads on that contig and make sure that the data in the new file is the same as the old one. Then you can move on to the next processing step.

Geraldine Van der Auwera, PhD

• Posts: 9Member

I tried to change the scaffold name but when I use PrintReads to select the scaff I want to work on, it doesn't print anything in my BAM file but the header. But with this command : samtool view -h mybam.bam buch_scaff | samtools view -bS -> mybam_buchnera.bam with buch_scaff the name of the scaff for which I want to retrieve the reads mapping on it. The new name doesn't change anything about the realignment. My interval file is still empty. The length of the scaffold is 600 kb. Maybe it's not short enough to think that there is no need of a realignment step...?