SelectVariants doesn't appear to work on Haploid VCFs

SteveL (Barcelona, Member)

Hi all,

I am still working on how to process sex-chromosomes correctly, and as a result I have a VCF of haploid calls for a few hundred male X-chromosomes. I am trying to subset out by sample using SelectVariants, but it throws the following error:

##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
##### ERROR MESSAGE: All samples must be diploid

The relevant part of the command is the following:

java -Xmx8g -Djava.io.tmpdir=$TMPDIR -jar /apps/GATK/3.5/GenomeAnalysisTK.jar \
    -T SelectVariants \
    --sample_file Final385Samples.219males.list \
    --excludeNonVariants \
    --removeUnusedAlternates \
    -R /project/production/Indexes/samtools/hsapiens.hs37d5.fasta \
    -V 312Samples.ChrX.haploid.vcf.gz \
    -o 219samples.ChrX.haploid.vcf.gz

I tried running with version 3.4, which doesn't complain about this issue; however, it is not happy with the "*" alleles.

Is there any simple workaround for this that will provide a correctly formatted output file, or am I doing something wrong?

Thanks, Steve


Best Answer


  • SteveL (Barcelona, Member)

    Hi @Geraldine_VdAuwera,

    Thanks for your quick response, and sorry for my delay in getting back to you, but I had a couple of days in bed with man-flu. Your suspicions were correct: the offending argument appears to be "--removeUnusedAlternates". It works fine if I just omit it, and there is no problem with using "--excludeNonVariants". So this is not so inconvenient.

    In case it is still relevant, I had 11,772 "*" positions out of ~6M total.

  • HovhannesSahakyan (Tartu, Estonia, Member)

    Hello guys,

    Did you find a solution for this issue?
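
For anyone landing on this thread later, the workaround reported above (dropping "--removeUnusedAlternates") can be sketched roughly as follows. This is only a sketch under the assumptions of the original post: the paths, file names and GATK 3.5 jar location are copied from the question, and the two function names are hypothetical helpers, not part of GATK. The first helper counts records that carry a "*" allele in the ALT column, which may be useful for checking how many such positions a file contains.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical; paths come from the original post.

# Count VCF records whose ALT column (column 5) contains the "*" allele.
# Accepts a plain or gzip/bgzip-compressed VCF as $1.
count_star_records() {
    gzip -cdf "$1" | awk -F'\t' '
        !/^#/ {
            n = split($5, alts, ",")
            for (i = 1; i <= n; i++)
                if (alts[i] == "*") { c++; break }
        }
        END { print c + 0 }'
}

# The original command with --removeUnusedAlternates omitted, as reported
# to work in this thread with GATK 3.5. Defined here but not run.
select_males_without_trimming() {
    java -Xmx8g -Djava.io.tmpdir="$TMPDIR" -jar /apps/GATK/3.5/GenomeAnalysisTK.jar \
        -T SelectVariants \
        --sample_file Final385Samples.219males.list \
        --excludeNonVariants \
        -R /project/production/Indexes/samtools/hsapiens.hs37d5.fasta \
        -V 312Samples.ChrX.haploid.vcf.gz \
        -o 219samples.ChrX.haploid.vcf.gz
}
```

Calling `count_star_records 312Samples.ChrX.haploid.vcf.gz` prints a single number. Note that with the flag omitted, the output may still list alternate alleles that none of the selected samples carry.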
