SelectVariants doesn't appear to work on Haploid VCFs

SteveL (Barcelona, Member)

Hi all,

I am still working on how to process sex-chromosomes correctly, and as a result I have a VCF of haploid calls for a few hundred male X-chromosomes. I am trying to subset out by sample using SelectVariants, but it throws the following error:

##### ERROR A GATK RUNTIME ERROR has occurred (version 3.5-0-g36282e4):
...
##### ERROR MESSAGE: All samples must be diploid

The relevant part of the command is the following:

java -Xmx8g -Djava.io.tmpdir=$TMPDIR -jar /apps/GATK/3.5/GenomeAnalysisTK.jar \
    -T SelectVariants \
    --sample_file Final385Samples.219males.list \
    --excludeNonVariants \
    --removeUnusedAlternates \
    -R /project/production/Indexes/samtools/hsapiens.hs37d5.fasta \
    -V 312Samples.ChrX.haploid.vcf.gz \
    -o 219samples.ChrX.haploid.vcf.gz

I also tried running version 3.4, which doesn't seem to complain about ploidy, but it is not happy with the "*" alleles.

Is there any simple workaround for this that will provide a correctly formatted output file, or am I doing something wrong?

Thanks, Steve

Issue · Github
by Sheila

Issue Number: 709
State: closed
Closed By: vdauwera

Answers

  • SteveL (Barcelona, Member)

    Hi @Geraldine_VdAuwera,

    Thanks for your quick response, and sorry for my delay in getting back to you, but I had a couple of days in bed with man-flu. Your suspicions were correct: the offending argument appears to be "--removeUnusedAlternates". SelectVariants appears to work fine if I just omit it, and there is no problem with using "--excludeNonVariants". So this is not so inconvenient.

    In case it is still relevant, I had 11,772 "*" positions out of ~6M total.
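
For anyone wanting to check how many "*" positions their own file has, a count like the one above can be obtained with a short shell helper. This is only a sketch: the function name `count_star_sites` is made up for illustration, and it assumes a "*" position means a record whose ALT column (the fifth tab-separated field) lists `*` as one of its alleles.

```shell
# Hypothetical helper (not part of GATK): count VCF records whose ALT
# column contains the "*" allele. Reads uncompressed VCF text on stdin.
count_star_sites() {
  awk -F'\t' '
    /^#/ { next }                        # skip meta and header lines
    {
      n = split($5, alts, ",")           # ALT may list several alleles
      for (i = 1; i <= n; i++)
        if (alts[i] == "*") { c++; break }   # count each record once
    }
    END { print c + 0 }                  # print 0 when nothing matched
  '
}
```

It could be run on the input file from the original command as `zcat 312Samples.ChrX.haploid.vcf.gz | count_star_sites`. Splitting the ALT field on commas, rather than searching the whole line for `*`, avoids counting records where the character only appears in another column.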
