CombineGVCFs input multiple files

Hello,

How can I specify multiple input files for CombineGVCFs without typing the name of each file separately (I have 432 input files)?

Using

java -jar GenomeAnalysisTK.jar \
-T CombineGVCFs \
-R /home/mruhsam/bluebells/Ref_files/refseq_gene236.fa \
--variant *.scc30_sec20.g.vcf \
-o cohort.g.vcf

gives me the error message

ERROR MESSAGE: Invalid argument value '12663.gatkHC.raw.snps.indels_scc30_sec20.g.vcf ' at position 6.
ERROR Invalid argument value '187-03.gatkHC.raw.snps.indels_scc30_sec20.g.vcf ' at position 7.
ERROR Invalid argument value '188-02.gatkHC.raw.snps.indels_scc30_sec20.g.vcf ' at position 8.
ERROR Invalid argument value '232-01.gatkHC.raw.snps.indels_scc30_sec20.g.vcf ' at position 9.

and so on through all 432 sample names. However, specifying a few files separately, each with its own '--variant', works. Surely there must be a way to 'batch' specify the input files?

Thank you

Markus
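
The errors above are most likely a consequence of shell wildcard expansion: the shell replaces *.scc30_sec20.g.vcf with all matching file names before GATK starts, so only the first file follows --variant and the rest arrive as bare arguments at positions 6, 7, 8 and so on. A minimal sketch of one workaround, which puts a --variant flag in front of every file, is below. The function name print_combine_cmd is hypothetical, and the function only prints the command so it can be inspected first.

```shell
# Print a CombineGVCFs command with one "--variant FILE" pair per gVCF in a directory.
# print_combine_cmd is a hypothetical helper name, not part of GATK.
print_combine_cmd() {
    dir=$1
    set --                              # start with an empty argument list
    for f in "$dir"/*.g.vcf; do
        [ -e "$f" ] || continue         # skip the literal pattern if nothing matches
        set -- "$@" --variant "$f"      # append one flag/file pair
    done
    # Only echo the command; paste it or drop the "echo" to run it for real.
    echo java -jar GenomeAnalysisTK.jar -T CombineGVCFs \
        -R /home/mruhsam/bluebells/Ref_files/refseq_gene236.fa \
        "$@" -o cohort.g.vcf
}
```

Calling print_combine_cmd . in the directory that holds the gVCF files prints the full command with all 432 --variant arguments.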

Answers

  • mruhsam Member

    Thanks very much!

  • Rosmaninho Member

    @lindenb said:

    How can I specify multiple input files for CombineGVCFs without typing the name of each file separately (I have 432 input files)?

    find DIR -name "*.g.vcf" > input.list
    

    and then

    java -jar GenomeAnalysisTK.jar -T CombineGVCFs (...) --variant input.list (...)

    Does this also work for GenomicsDBImport?

  • SkyWarrior Turkey Member

    Yes, but you need to use the file name extension .txt instead of .list; the tool seems to get confused by the .list extension. The parameter to use is --sample-name-map, and the file must be a tab-separated list with one sample name and one file name per line.
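
The tab-separated sample name / file name list described in the reply above can be generated rather than typed. The following is a minimal sketch, assuming the sample name is the part of each file name before the first dot (so 12663.gatkHC.raw.snps.indels_scc30_sec20.g.vcf gives 12663); that naming rule and the function name write_sample_map are assumptions for illustration. In GATK4, such a file is, as far as I know, passed to GenomicsDBImport with --sample-name-map.

```shell
# Write "sample<TAB>path" lines for every gVCF in a directory.
# write_sample_map is a hypothetical helper name; the sample-name rule is an assumption.
write_sample_map() {
    dir=$1
    out=$2
    : > "$out"                          # create or truncate the output file
    for f in "$dir"/*.g.vcf; do
        [ -e "$f" ] || continue         # skip the literal pattern if nothing matches
        base=${f##*/}                   # strip the directory part
        sample=${base%%.*}              # keep the text before the first dot
        printf '%s\t%s\n' "$sample" "$f" >> "$out"
    done
}
```

For example, write_sample_map /path/to/gvcfs samples.txt writes one line per file. It is worth checking that the derived names match the sample names you expect before starting the import.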
