SelectVariants doesn't appear to work on Haploid VCFs

SteveL · Barcelona · Member

Hi all,

I am still working on how to process sex-chromosomes correctly, and as a result I have a VCF of haploid calls for a few hundred male X-chromosomes. I am trying to subset out by sample using SelectVariants, but it throws the following error:

##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
...
##### ERROR MESSAGE: All samples must be diploid

The relevant part of the command is the following:

java -Xmx8g -Djava.io.tmpdir=$TMPDIR -jar /apps/GATK/3.5/GenomeAnalysisTK.jar \
    -T SelectVariants \
    --sample_file Final385Samples.219males.list \
    --excludeNonVariants \
    --removeUnusedAlternates \
    -R /project/production/Indexes/samtools/hsapiens.hs37d5.fasta \
    -V 312Samples.ChrX.haploid.vcf.gz \
    -o 219samples.ChrX.haploid.vcf.gz

I also tried running with version 3.4, which doesn't seem to complain about ploidy; however, it is not happy with the "*" alleles.

Is there any simple workaround for this that will provide a correctly formatted output file, or am I doing something wrong?

Thanks, Steve

Issue · Github
by Sheila

Issue Number: 709
State: closed
Closed By: vdauwera

Answers

  • SteveL · Barcelona · Member

    Hi @Geraldine_VdAuwera,

    Thanks for your quick response, and sorry for my delay in getting back to you, but I had a couple of days in bed with man-flu. Your suspicions were correct: the offending argument appears to be "--removeUnusedAlternates". SelectVariants appears to work fine if I just omit it, and there is no problem with using "--excludeNonVariants". So this is not so inconvenient.

    In case it is still relevant, I had 11,772 "*" positions out of ~6M total.
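
    For anyone wanting to produce a similar count on their own data, here is a minimal sketch in Python. It is not part of GATK, and the function name `count_star_allele_sites` is made up for illustration. It assumes a standard tab-delimited VCF (plain or gzip/bgzip-compressed) in which the fifth column is ALT and "*" marks an allele missing because of an upstream deletion.

```python
import gzip


def count_star_allele_sites(vcf_path):
    """Return (sites_with_star_allele, total_sites) for a VCF file.

    Hypothetical helper: reads the file line by line and inspects
    only the ALT column (the fifth tab-separated field).
    """
    # bgzip output is a series of gzip members, which gzip.open can read.
    opener = gzip.open if vcf_path.endswith(".gz") else open
    star_sites = 0
    total_sites = 0
    with opener(vcf_path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta-information and the column header
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 5:
                continue  # skip blank or malformed lines
            total_sites += 1
            # ALT may list several comma-separated alleles, e.g. "C,*"
            if "*" in fields[4].split(","):
                star_sites += 1
    return star_sites, total_sites
```

    Calling `count_star_allele_sites("312Samples.ChrX.haploid.vcf.gz")` would return the two counts as a tuple. A site is counted once even if "*" is only one of several ALT alleles.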
