FastaAlternateReferenceMaker error (AWS S3 issue?)

robinvanvelzen, Netherlands, Member
edited April 2015 in Ask the GATK team

I am following the protocol described at https://github.com/justin-lack/Drosophila-Genome-Nexus/blob/master/INSTRUCTIONS-ASSEMBLY to do a reference-based genome assembly. However, I am stuck at the step where I generate a new reference FASTA from the called SNPs and indels (also using GATK).

Basically, I get an uninformative error:

$ ~/progs_nobackup/GenomeAnalysisTK-3.3-0/GenomeAnalysisTK.jar -T FastaAlternateReferenceMaker -R stampy_PriWU20x00-PanWU01x14_asm01_SNPs_reference.fasta -V stampy_PriWU20x00-PanWU01x14_asm01_sorted_clean_dups_header_realign_round1_INDELS.vcf -o stampy_PriWU20x00-PanWU01x14_asm01_reference.fasta -log FastaAlternateReferenceMaker.log -l DEBUG 2>FastaAlternateReferenceMaker.out

$ tail FastaAlternateReferenceMaker.out
...... 2693, 2694, 2695, 2696, 2697, 2698, 2699, 2700, 2701, 2702, 2703, 2704, 2705, 2706, 2707, 2708, 2709, 2710, 2711, 2712, 2713, 2714, 2715, 2716, 2717, 2718, 2719, 2720, 2721, 2722, 2723, 2724, 2725, 2726, 2727, 2728, 2729, 2730, 2731, 2732]

ERROR ------------------------------------------------------------------------------------------

$ tail FastaAlternateReferenceMaker.log
DEBUG 13:50:23,168 GenomeLocParser - 2730 (511 bp)
DEBUG 13:50:23,170 GenomeLocParser - 2731 (503 bp)
DEBUG 13:50:23,171 GenomeLocParser - 2732 (501 bp)
INFO 13:50:23,331 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
DEBUG 13:50:23,396 RMDTrackBuilder - Loading Tribble index from disk for file stampy_PriWU20x00-PanWU01x14_asm01_sorted_clean_dups_header_realign_round1_INDELS.vcf
DEBUG 13:50:23,928 GATKRunReport - Aggregating data for run report
DEBUG 13:50:24,112 GATKRunReport - Posting report of type AWS
DEBUG 13:50:24,115 GATKRunReport - Generating GATK report to AWS S3 with key 5Wycn2kxjqUtmduOyM8DuuvO5jIqgeoY.report.xml.gz
INFO 13:50:26,091 GATKRunReport - Uploaded run statistics report to AWS S3
DEBUG 13:50:26,093 GATKRunReport - Uploaded to AWS: S3Object [key=5Wycn2kxjqUtmduOyM8DuuvO5jIqgeoY.report.xml.gz, bucket=, lastModified=Mon Apr 20 13:50:26 CEST 2015, dataInputStream=null, Metadata={ETag="ff4cc693fcef30a0a065a347e2be897a", Date=Mon Apr 20 13:50:26 CEST 2015, Content-Length=15055, id-2=vqeCWzphukT0okmJ6GbLI6jLCinLyRfrjhliDAEYrKJhDVE+5lFp9k9O0agWvxos, request-id=4BFE08DA3ADE4E15, Content-MD5=/0zGk/zvMKCgZaNH4r6Jeg==, Content-Type=application/octet-stream}]

The debug log suggests that the last step performed was uploading a report to AWS, Amazon's cloud computing service. I have no idea why this happens: I don't have an AWS account, and I want to run the analysis on the local server. Perhaps this is causing the error?

Any help or advice would be very much appreciated.

Thanks

Best Answer

Answers

  • robinvanvelzen, Netherlands, Member

    Dear Geraldine,

    Many thanks for clearing this up. I did not realise I had truncated the error. It reads: ##### ERROR MESSAGE: Input files variant and reference have incompatible contigs: No overlapping contigs found.

    I had missed the following statement in the protocol: "The original reference should have each chromosome renamed numerically in the order they are in in the reference fasta file (i.e., >1, >2, >3, etc.). This is required by the AlternateReferenceMaker module of GATK (I don't know why, but it automatically renames every chromosome in numerical order in the new reference that it creates)."

    I was able to fix the issue by running FastaAlternateReferenceMaker only once, on merged .vcf files.

    Best, Robin
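
For anyone hitting the same "No overlapping contigs found" message, a quick way to see the mismatch is to compare the contig names in the reference FASTA headers with the CHROM column of the VCF. The following is only a minimal sketch in Python, not how GATK performs its own check; the function names and the sample contig names (`1`, `2`, `chr2L`) are hypothetical and chosen purely for illustration.

```python
def fasta_contigs(lines):
    """Return contig names from FASTA header lines, in file order."""
    names = []
    for line in lines:
        if line.startswith(">"):
            # Take the first whitespace-delimited token after '>' as the name.
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def vcf_contigs(lines):
    """Return distinct CHROM values of VCF records, in order of first appearance."""
    seen = {}
    for line in lines:
        # Skip header lines ('##...' and '#CHROM...') and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def contig_overlap(fasta_lines, vcf_lines):
    """Split the VCF contig names into (shared with FASTA, missing from FASTA)."""
    ref_names = set(fasta_contigs(fasta_lines))
    var_names = vcf_contigs(vcf_lines)
    shared = [name for name in var_names if name in ref_names]
    missing = [name for name in var_names if name not in ref_names]
    return shared, missing


if __name__ == "__main__":
    # Hypothetical in-memory example: a numerically renamed reference
    # checked against a VCF that still uses the original contig name.
    fasta = [">1\n", "ACGT\n", ">2\n", "GGCC\n"]
    vcf = ["##fileformat=VCFv4.1\n", "#CHROM\tPOS\tID\tREF\tALT\n",
           "chr2L\t10\t.\tA\tG\n"]
    shared, missing = contig_overlap(fasta, vcf)
    print("shared:", shared)
    print("missing from reference:", missing)
```

To run it on real files, pass open file handles instead of lists, e.g. `contig_overlap(open("ref.fasta"), open("calls.vcf"))` with your own paths. An empty `shared` list corresponds to the situation the error message describes.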
