
HaplotypeCaller warnings DepthPerSampleHC

Hi, I'm trying to do a multi-sample variant call using several BAM files with the following command:

/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/raw_variants.vcf

Using GATK jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar HaplotypeCaller -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24150_1#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24144_2#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24811_2#1.bam -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/mshef7_wt_vs_ips_raw_variants.vcf
10:26:29.719 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
10:26:29.935 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.935 INFO HaplotypeCaller - The Genome Analysis Toolkit (GATK) v4.0.1.0
10:26:29.935 INFO HaplotypeCaller - For support and documentation go to https://software.broadinstitute.org/gatk/
10:26:29.935 INFO HaplotypeCaller - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
10:26:29.936 INFO HaplotypeCaller - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
10:26:29.936 INFO HaplotypeCaller - Start Date/Time: 14 February 2018 10:26:29 GMT
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - ------------------------------------------------------------
10:26:29.936 INFO HaplotypeCaller - HTSJDK Version: 2.14.1
10:26:29.936 INFO HaplotypeCaller - Picard Version: 2.17.2
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.COMPRESSION_LEVEL : 1
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
10:26:29.937 INFO HaplotypeCaller - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
10:26:29.937 INFO HaplotypeCaller - Deflater: IntelDeflater
10:26:29.937 INFO HaplotypeCaller - Inflater: IntelInflater
10:26:29.937 INFO HaplotypeCaller - GCS max retries/reopens: 20
10:26:29.937 INFO HaplotypeCaller - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
10:26:29.937 INFO HaplotypeCaller - Initializing engine
10:26:30.520 INFO HaplotypeCaller - Done initializing engine
10:26:30.528 INFO HaplotypeCallerEngine - Disabling physical phasing, which is supported only for reference-model confidence output
10:26:31.119 INFO NativeLibraryLoader - Loading libgkl_utils.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_utils.so
10:26:31.154 INFO NativeLibraryLoader - Loading libgkl_pairhmm_omp.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_pairhmm_omp.so
10:26:31.259 WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
10:26:31.259 INFO IntelPairHmm - Available threads: 16
10:26:31.259 INFO IntelPairHmm - Requested threads: 4
10:26:31.259 INFO PairHMM - Using the OpenMP multi-threaded AVX-accelerated native PairHMM implementation
10:26:31.298 INFO ProgressMeter - Starting traversal
10:26:31.298 INFO ProgressMeter - Current Locus Elapsed Minutes Regions Processed Regions/Minute
10:26:33.832 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.865 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.880 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:33.911 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:34.733 WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null
10:26:41.497 INFO ProgressMeter - 1:15485 0.2 80 470.6

Despite some initial memory issues, the command above now runs when given a large amount of memory, although I do get lots of WARN DepthPerSampleHC messages. Is this normal?

Answers

  • SkyWarrior (Turkey; Member ✭✭✭)

    Pretty much normal. However, joint calling this way is not the suggested approach. It is better and more feasible (both computationally and time-wise) to call each sample in GVCF mode and then do joint genotyping using GenotypeGVCFs. This also makes it easier to add or remove samples to/from your call list later on.
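    The three-step workflow described above can be sketched roughly as follows (paths and sample names here are placeholders, and the flags are as in GATK 4.0.x — check `gatk <Tool> --help` for your version):

    ```shell
    # Step 1: call each sample separately in GVCF mode (repeat per BAM)
    gatk HaplotypeCaller \
        -R reference.fa \
        -I sample1.bam \
        -ERC GVCF \
        -O sample1.g.vcf

    # Step 2: consolidate the per-sample GVCFs into a GenomicsDB workspace
    # (one interval, e.g. a chromosome, per workspace)
    gatk GenomicsDBImport \
        -V sample1.g.vcf \
        -V sample2.g.vcf \
        --genomicsdb-workspace-path my_database \
        --intervals 20

    # Step 3: joint-genotype all samples in the workspace
    gatk GenotypeGVCFs \
        -R reference.fa \
        -V gendb://my_database \
        -O joint_calls.vcf
    ```

    Adding a sample later then only requires generating its GVCF and re-running the comparatively cheap import and genotyping steps, not re-calling every BAM.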

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @md1jale
    Hi,

    Also, have a look at this thread.

    -Sheila

  • md1jale (Member)

    Thank you both very much. I switched to using GenotypeGVCFs and I'm afraid I've run into some new errors.

    #

    Step 1: Create a GVCF for each BAM file

    Command:
    /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk HaplotypeCaller --native-pair-hmm-threads 16 -R /mnt/fastdata/md1jale/reference/hs37d5.fa -I /mnt/fastdata/md1jale/WGS_MShef7_iPS/24712_6#1.bam -ERC GVCF -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24712_6#1.bam.g.vcf

    #

    Step 2: Use GenomicsDBImport to create DB

    #
    #!/bin/bash
    #$ -V
    #$ -P rse
    #$ -q rse.q
    #$ -l h_vmem=32G
    #$ -l h_rt=96:00:00
    #$ -N GenomicsDBIimportwt_ips

    module add apps/java/jdk1.8.0_102/binary
    cd /fastdata/md1jale/WGS_MShef7_iPS/output
    /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk GenomicsDBImport -R /mnt/fastdata/md1jale/reference/hs37d5.fa --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24811_1_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24150_1_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24144_2_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24712_6_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24811_2_1.bam.g.vcf' --genomicsdb-workspace-path /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database --intervals 20 --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

    Log file:

    /var/spool/sge/sharc-node121/job_scripts/985719: line 12: module: command not found // not sure what this is
    13:54:35.663 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    13:54:36.312 INFO GenotypeGVCFs - ------------------------------------------------------------
    13:54:36.312 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.1.0
    13:54:36.312 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    13:54:36.313 INFO GenotypeGVCFs - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
    13:54:36.313 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
    13:54:36.313 INFO GenotypeGVCFs - Start Date/Time: 01 March 2018 13:54:35 GMT
    13:54:36.313 INFO GenotypeGVCFs - ------------------------------------------------------------
    13:54:36.313 INFO GenotypeGVCFs - ------------------------------------------------------------
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Version: 2.14.1
    13:54:36.314 INFO GenotypeGVCFs - Picard Version: 2.17.2
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    13:54:36.314 INFO GenotypeGVCFs - Deflater: IntelDeflater
    13:54:36.314 INFO GenotypeGVCFs - Inflater: IntelInflater
    13:54:36.314 INFO GenotypeGVCFs - GCS max retries/reopens: 20
    13:54:36.314 INFO GenotypeGVCFs - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    13:54:36.315 INFO GenotypeGVCFs - Initializing engine
    terminate called after throwing an instance of 'VariantQueryProcessorException'
    what(): VariantQueryProcessorException : Could not open array genomicsdb_array at workspace: /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database
    Using GATK jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar GenotypeGVCFs -R /mnt/fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls.vcf

    #

    GenomicsDBIimportwt_ips.o985520
    Tool returned:
    true

    #

    Step 3: Run GenotypeGVCFs

    #!/bin/bash
    #$ -V
    #$ -P rse
    #$ -q rse.q
    #$ -l h_vmem=32G
    #$ -l h_rt=96:00:00
    #$ -N GenotypeGVCFs

    module add apps/java/jdk1.8.0_102/binary
    cd /fastdata/md1jale/WGS_MShef7_iPS/output
    /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk GenotypeGVCFs -R /mnt/fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

    Log file:
    more GenotypeGVCFs.e985719
    /var/spool/sge/sharc-node121/job_scripts/985719: line 12: module: command not found

    13:54:35.663 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
    13:54:36.312 INFO GenotypeGVCFs - ------------------------------------------------------------
    13:54:36.312 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.1.0
    13:54:36.312 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    13:54:36.313 INFO GenotypeGVCFs - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
    13:54:36.313 INFO GenotypeGVCFs - Java runtime: Java HotSpot(TM) 64-Bit Server VM v1.8.0_102-b14
    13:54:36.313 INFO GenotypeGVCFs - Start Date/Time: 01 March 2018 13:54:35 GMT
    13:54:36.313 INFO GenotypeGVCFs - ------------------------------------------------------------
    13:54:36.313 INFO GenotypeGVCFs - ------------------------------------------------------------
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Version: 2.14.1
    13:54:36.314 INFO GenotypeGVCFs - Picard Version: 2.17.2
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    13:54:36.314 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    13:54:36.314 INFO GenotypeGVCFs - Deflater: IntelDeflater
    13:54:36.314 INFO GenotypeGVCFs - Inflater: IntelInflater
    13:54:36.314 INFO GenotypeGVCFs - GCS max retries/reopens: 20
    13:54:36.314 INFO GenotypeGVCFs - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    13:54:36.315 INFO GenotypeGVCFs - Initializing engine
    terminate called after throwing an instance of 'VariantQueryProcessorException'
    what(): VariantQueryProcessorException : Could not open array genomicsdb_array at workspace: /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database

    Using GATK jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar
    Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 -DGATK_STACKTRACE_ON_USER_EXCEPTION=true -jar /mnt/fastdata/md1jale/software/gatk-4.0.1.0/gatk-package-4.0.1.0-local.jar GenotypeGVCFs -R /mnt/fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database -O /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls.vcf

    #
  • md1jale (Member)

    Btw, the directory wt_mshef7_database/ contains:

    total 40K
    -rwx------ 1 md1jale md 0 Mar 1 13:02 __tiledb_workspace.tdb
    -rw-r--r-- 1 md1jale md 8.7K Mar 1 13:46 vidmap.json
    -rw-r--r-- 1 md1jale md 17K Mar 1 13:46 vcfheader.vcf
    drwx------ 3 md1jale md 4.0K Mar 1 13:46 genomicsdb_array
    -rw-r--r-- 1 md1jale md 541 Mar 1 13:46 callset.json

  • Sheila (Broad Institute; Member, Broadie, Moderator, admin)

    @md1jale
    Hi,

    I think your issue is the same one in this post. I am checking with the team, but since a few others have posted on this, I will try to get this moved up in priority. Please keep an eye on that thread, as I will update there with more information.

    -Sheila

  • md1jale (Member)

    Hi Sheila,

    Thanks for getting back - I had a look at the post earlier and it didn't seem to work. Is there a workaround/quick fix yet?

    Thank you very much.
    cheers,
    J

    Issue · GitHub (filed by shlee)
    Issue Number: 3006
    State: open
  • shlee (Cambridge; Member, Broadie, Moderator, admin)

    Hi @md1jale,

    I've posted to https://gatkforums.broadinstitute.org/gatk/discussion/11184/ the locations of WDL scripts that may enable you to move forward. You can try running the problematic steps via Docker container. It's fairly straightforward to set up. See https://software.broadinstitute.org/gatk/documentation/article?id=11090.
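    If plain Docker is available, the setup mentioned above amounts to something like the following (the image tag and mount path are illustrative; see the linked article for the authoritative steps):

    ```shell
    # Pull the GATK image for the release you need
    docker pull broadinstitute/gatk:4.0.2.0

    # Run a tool inside the container; -v bind-mounts a host directory so
    # the containerized GATK can see your data under the same path
    docker run -v /mnt/fastdata:/mnt/fastdata broadinstitute/gatk:4.0.2.0 \
        gatk GenotypeGVCFs --help
    ```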

  • md1jale (Member)

    Hi Shlee,

    Thank you for getting back. I've set up the GATK Docker image via Singularity, as per our IT team's instructions:

    $ export SINGULARITY_CACHEDIR="/fastdata/$USER/singularity-cachedir"
    $ mkdir -p $SINGULARITY_CACHEDIR
    $ singularity pull docker://broadinstitute/gatk:4.0.2.0 # create a Singularity container from a Docker container

    singularity run $SINGULARITY_CACHEDIR/gatk-4.0.2.0.simg
    gatk GenomicsDBImport -R /mnt/fastdata/md1jale/reference/hs37d5.fa --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24811_1_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24150_1_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24144_2_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24712_6_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24811_2_1.bam.g.vcf' --genomicsdb-workspace-path /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database --intervals 20

    (gatk) [email protected]:/fastdata/md1jale/WGS_MShef7_iPS/output$ gatk GenomicsDBImport -R /mnt/fastdata/md1jale/reference/hs37d5.fa --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24811_1_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24150_1_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24144_2_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24712_6_1.bam.g.vcf' --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/'24811_2_1.bam.g.vcf' --genomicsdb-workspace-path /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database --intervals 20
    Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
    Running:
    /gatk/build/install/gatk/bin/gatk GenomicsDBImport -R /mnt/fastdata/md1jale/reference/hs37d5.fa --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24811_1_1.bam.g.vcf --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24150_1_1.bam.g.vcf --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24144_2_1.bam.g.vcf --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24712_6_1.bam.g.vcf --variant /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24811_2_1.bam.g.vcf --genomicsdb-workspace-path /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database --intervals 20
    01:58:54.785 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/install/gatk/lib/gkl-0.8.5.jar!/com/intel/gkl/native/libgkl_compression.so
    01:58:55.186 INFO GenomicsDBImport - ------------------------------------------------------------
    01:58:55.186 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.2.0
    01:58:55.186 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    01:58:55.187 INFO GenomicsDBImport - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
    01:58:55.187 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
    01:58:55.187 INFO GenomicsDBImport - Start Date/Time: March 21, 2018 1:58:54 AM UTC
    01:58:55.187 INFO GenomicsDBImport - ------------------------------------------------------------
    01:58:55.187 INFO GenomicsDBImport - ------------------------------------------------------------
    01:58:55.188 INFO GenomicsDBImport - HTSJDK Version: 2.14.3
    01:58:55.188 INFO GenomicsDBImport - Picard Version: 2.17.2
    01:58:55.188 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    01:58:55.188 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    01:58:55.188 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    01:58:55.188 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    01:58:55.188 INFO GenomicsDBImport - Deflater: IntelDeflater
    01:58:55.188 INFO GenomicsDBImport - Inflater: IntelInflater
    01:58:55.188 INFO GenomicsDBImport - GCS max retries/reopens: 20
    01:58:55.188 INFO GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    01:58:55.189 INFO GenomicsDBImport - Initializing engine
    01:58:55.215 INFO GenomicsDBImport - Shutting down engine
    [March 21, 2018 1:58:55 AM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 0.01 minutes.
    Runtime.totalMemory()=2076049408
    htsjdk.tribble.TribbleException$MalformedFeatureFile: Unable to parse header with error: /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24811_1_1.bam.g.vcf, for input source: file:///mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24811_1_1.bam.g.vcf
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:262)
    at htsjdk.tribble.TribbleIndexedFeatureReader.(TribbleIndexedFeatureReader.java:101)
    at htsjdk.tribble.TribbleIndexedFeatureReader.(TribbleIndexedFeatureReader.java:126)
    at htsjdk.tribble.AbstractFeatureReader.getFeatureReader(AbstractFeatureReader.java:110)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getReaderFromPath(GenomicsDBImport.java:615)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.getHeaderFromPath(GenomicsDBImport.java:356)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.initializeHeaderAndSampleMappings(GenomicsDBImport.java:319)
    at org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport.onStartup(GenomicsDBImport.java:297)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:133)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:180)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:199)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:159)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:201)
    at org.broadinstitute.hellbender.Main.main(Main.java:287)
    Caused by: java.nio.file.NoSuchFileException: /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24811_1_1.bam.g.vcf
    at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
    at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
    at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214)
    at java.nio.file.Files.newByteChannel(Files.java:361)
    at java.nio.file.Files.newByteChannel(Files.java:407)
    at htsjdk.samtools.seekablestream.SeekablePathStream.(SeekablePathStream.java:41)
    at htsjdk.tribble.util.ParsingUtils.openInputStream(ParsingUtils.java:108)
    at htsjdk.tribble.TribbleIndexedFeatureReader.readHeader(TribbleIndexedFeatureReader.java:252)
    ... 13 more


    Not sure I understand the output.
    These *.g.vcf files are in my output/ directory:
    -rw-r--r-- 1 md1jale md 48G Feb 22 07:11 24811_1_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 47G Feb 23 09:30 24144_2_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 52G Feb 23 20:53 24150_1_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 51G Feb 24 12:59 24150_2_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 48G Feb 24 18:28 24152_2_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 41G Feb 25 07:44 24712_5_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 46G Feb 25 10:42 24152_1_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 40G Feb 25 11:19 24712_6_1.bam.g.vcf
    -rw-r--r-- 1 md1jale md 47G Feb 25 11:42 24811_2_1.bam.g.vcf

  • SkyWarrior (Turkey; Member ✭✭✭)

    Your Docker/Singularity instance cannot find the files.

    Caused by: java.nio.file.NoSuchFileException: /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/24811_1_1.bam.g.vcf
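    One quick way to confirm this, assuming the Singularity setup shown earlier in the thread: host paths are only visible inside the container if they are bind-mounted, and on some clusters a host path like /mnt/fastdata appears inside the container as /fastdata (or not at all). A sketch of both checks (the -B bind flag is standard Singularity; the image and paths follow the commands above):

    ```shell
    # Does the container see the path the failing command used?
    singularity exec $SINGULARITY_CACHEDIR/gatk-4.0.2.0.simg \
        ls /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/

    # If not, either bind the host directory explicitly...
    singularity exec -B /mnt/fastdata:/mnt/fastdata \
        $SINGULARITY_CACHEDIR/gatk-4.0.2.0.simg \
        ls /mnt/fastdata/md1jale/WGS_MShef7_iPS/output/

    # ...or pass GATK the path under which the data actually appears
    # inside the container (here /fastdata, as in the shell prompt).
    ```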

  • md1jale (Member)

    Thank you very much.

    I tried to run GenotypeGVCFs (Step 3) and it broke again with the same error. I'm just not sure whether the previous step (GenomicsDBImport, Step 2) ran properly - although, as you can see, Step 2 did produce a result:

    (gatk) [email protected]:/fastdata/md1jale/WGS_MShef7_iPS/output$ gatk GenotypeGVCFs -R /fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database -O /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
    Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
    Running:
    /gatk/build/install/gatk/bin/gatk GenotypeGVCFs -R /fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database -O /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls.vcf
    11:01:16.320 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/install/gatk/lib/gkl-0.8.5.jar!/com/intel/gkl/native/libgkl_compression.so
    11:01:16.909 INFO GenotypeGVCFs - ------------------------------------------------------------
    11:01:16.910 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.2.0
    11:01:16.910 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    11:01:16.910 INFO GenotypeGVCFs - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
    11:01:16.910 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
    11:01:16.910 INFO GenotypeGVCFs - Start Date/Time: March 21, 2018 11:01:16 AM UTC
    11:01:16.911 INFO GenotypeGVCFs - ------------------------------------------------------------
    11:01:16.911 INFO GenotypeGVCFs - ------------------------------------------------------------
    11:01:16.911 INFO GenotypeGVCFs - HTSJDK Version: 2.14.3
    11:01:16.911 INFO GenotypeGVCFs - Picard Version: 2.17.2
    11:01:16.911 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    11:01:16.911 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:01:16.911 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:01:16.911 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:01:16.912 INFO GenotypeGVCFs - Deflater: IntelDeflater
    11:01:16.912 INFO GenotypeGVCFs - Inflater: IntelInflater
    11:01:16.912 INFO GenotypeGVCFs - GCS max retries/reopens: 20
    11:01:16.912 INFO GenotypeGVCFs - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    11:01:16.912 INFO GenotypeGVCFs - Initializing engine
    terminate called after throwing an instance of 'VariantQueryProcessorException'
    what(): VariantQueryProcessorException : Could not open array genomicsdb_array at workspace: /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database

    (gatk) [email protected]:/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database$ ls -rlht
    total 40K
    -rwx------ 1 md1jale md 0 Mar 1 13:02 __tiledb_workspace.tdb
    -rw-r--r-- 1 md1jale md 8.7K Mar 1 13:46 vidmap.json
    -rw-r--r-- 1 md1jale md 17K Mar 1 13:46 vcfheader.vcf
    drwx------ 3 md1jale md 4.0K Mar 1 13:46 genomicsdb_array
    -rw-r--r-- 1 md1jale md 541 Mar 1 13:46 callset.json

    (gatk) [email protected]:/fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database$ du -h
    1.4G ./genomicsdb_array/__bbe3ebc7-cb00-457c-9a8c-c9d45564945347083416028928_1519909396420
    1.4G ./genomicsdb_array

    I'm running Step 2 again (in case the database was corrupted) and then Step 3 using the Docker container, to see if this makes a difference, but I rather doubt it. Any advice is welcome!

  • md1jale (Member)

    I can confirm that GATK4 under Docker/Singularity doesn't work either and produces the same error on repeating Step 2 and Step 3:

    [[email protected] output]$ singularity run $SINGULARITY_CACHEDIR/gatk-4.0.2.0.simg
    groups: cannot find name for group ID 20006
    (gatk) [email protected]:/fastdata/md1jale/WGS_MShef7_iPS/output$ gatk GenomicsDBImport -R /fastdata/md1jale/reference/hs37d5.fa --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24811_1_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24150_1_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24144_2_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24712_6_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24811_2_1.bam.g.vcf' --genomicsdb-workspace-path /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1 --intervals 20 --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
    Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
    Running:
    /gatk/build/install/gatk/bin/gatk GenomicsDBImport -R /fastdata/md1jale/reference/hs37d5.fa --variant /fastdata/md1jale/WGS_MShef7_iPS/output/24811_1_1.bam.g.vcf --variant /fastdata/md1jale/WGS_MShef7_iPS/output/24150_1_1.bam.g.vcf --variant /fastdata/md1jale/WGS_MShef7_iPS/output/24144_2_1.bam.g.vcf --variant /fastdata/md1jale/WGS_MShef7_iPS/output/24712_6_1.bam.g.vcf --variant /fastdata/md1jale/WGS_MShef7_iPS/output/24811_2_1.bam.g.vcf --genomicsdb-workspace-path /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1 --intervals 20
    11:14:01.537 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/install/gatk/lib/gkl-0.8.5.jar!/com/intel/gkl/native/libgkl_compression.so
    11:14:02.223 INFO GenomicsDBImport - ------------------------------------------------------------
    11:14:02.224 INFO GenomicsDBImport - The Genome Analysis Toolkit (GATK) v4.0.2.0
    11:14:02.224 INFO GenomicsDBImport - For support and documentation go to https://software.broadinstitute.org/gatk/
    11:14:02.225 INFO GenomicsDBImport - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
    11:14:02.225 INFO GenomicsDBImport - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
    11:14:02.225 INFO GenomicsDBImport - Start Date/Time: March 21, 2018 11:14:01 AM UTC
    11:14:02.225 INFO GenomicsDBImport - ------------------------------------------------------------
    11:14:02.225 INFO GenomicsDBImport - ------------------------------------------------------------
    11:14:02.226 INFO GenomicsDBImport - HTSJDK Version: 2.14.3
    11:14:02.226 INFO GenomicsDBImport - Picard Version: 2.17.2
    11:14:02.226 INFO GenomicsDBImport - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    11:14:02.226 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    11:14:02.226 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    11:14:02.226 INFO GenomicsDBImport - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    11:14:02.226 INFO GenomicsDBImport - Deflater: IntelDeflater
    11:14:02.226 INFO GenomicsDBImport - Inflater: IntelInflater
    11:14:02.226 INFO GenomicsDBImport - GCS max retries/reopens: 20
    11:14:02.227 INFO GenomicsDBImport - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    11:14:02.227 INFO GenomicsDBImport - Initializing engine
    11:14:32.496 INFO IntervalArgumentCollection - Processing 63025520 bp from intervals
    11:14:32.619 INFO GenomicsDBImport - Done initializing engine
    Created workspace /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1
    11:14:32.934 INFO GenomicsDBImport - Vid Map JSON file will be written to /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1/vidmap.json
    11:14:32.934 INFO GenomicsDBImport - Callset Map JSON file will be written to /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1/callset.json
    11:14:32.934 INFO GenomicsDBImport - Complete VCF Header will be written to /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1/vcfheader.vcf
    11:14:32.934 INFO GenomicsDBImport - Importing to array - /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1/genomicsdb_array
    11:14:32.971 INFO ProgressMeter - Starting traversal
    11:14:32.972 INFO ProgressMeter - Current Locus Elapsed Minutes Batches Processed Batches/Minute
    11:15:05.437 INFO GenomicsDBImport - Importing batch 1 with 5 samples
    11:48:08.400 INFO ProgressMeter - 20:1 33.6 1 0.0
    11:48:08.401 INFO GenomicsDBImport - Done importing batch 1/1
    11:48:08.401 INFO ProgressMeter - 20:1 33.6 1 0.0
    11:48:08.401 INFO ProgressMeter - Traversal complete. Processed 1 total batches in 33.6 minutes.
    11:48:08.401 INFO GenomicsDBImport - Import completed!
    11:48:08.468 INFO GenomicsDBImport - Shutting down engine
    [March 21, 2018 11:48:08 AM UTC] org.broadinstitute.hellbender.tools.genomicsdb.GenomicsDBImport done. Elapsed time: 34.12 minutes.
    Runtime.totalMemory()=8581025792
    Tool returned:
    true

    (gatk) [email protected]:/fastdata/md1jale/WGS_MShef7_iPS/output$ gatk GenotypeGVCFs -R /fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1 -O /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'
    Using GATK wrapper script /gatk/build/install/gatk/bin/gatk
    Running:
    /gatk/build/install/gatk/bin/gatk GenotypeGVCFs -R /fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1 -O /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls.vcf
    12:04:12.621 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/gatk/build/install/gatk/lib/gkl-0.8.5.jar!/com/intel/gkl/native/libgkl_compression.so
    12:04:12.833 INFO GenotypeGVCFs - ------------------------------------------------------------
    12:04:12.834 INFO GenotypeGVCFs - The Genome Analysis Toolkit (GATK) v4.0.2.0
    12:04:12.834 INFO GenotypeGVCFs - For support and documentation go to https://software.broadinstitute.org/gatk/
    12:04:12.834 INFO GenotypeGVCFs - Executing as [email protected] on Linux v3.10.0-693.11.6.el7.x86_64 amd64
    12:04:12.834 INFO GenotypeGVCFs - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_131-8u131-b11-2ubuntu1.16.04.3-b11
    12:04:12.835 INFO GenotypeGVCFs - Start Date/Time: March 21, 2018 12:04:12 PM UTC
    12:04:12.835 INFO GenotypeGVCFs - ------------------------------------------------------------
    12:04:12.835 INFO GenotypeGVCFs - ------------------------------------------------------------
    12:04:12.835 INFO GenotypeGVCFs - HTSJDK Version: 2.14.3
    12:04:12.835 INFO GenotypeGVCFs - Picard Version: 2.17.2
    12:04:12.835 INFO GenotypeGVCFs - HTSJDK Defaults.COMPRESSION_LEVEL : 1
    12:04:12.835 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
    12:04:12.836 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
    12:04:12.836 INFO GenotypeGVCFs - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
    12:04:12.836 INFO GenotypeGVCFs - Deflater: IntelDeflater
    12:04:12.836 INFO GenotypeGVCFs - Inflater: IntelInflater
    12:04:12.836 INFO GenotypeGVCFs - GCS max retries/reopens: 20
    12:04:12.836 INFO GenotypeGVCFs - Using google-cloud-java patch 6d11bef1c81f885c26b2b56c8616b7a705171e4f from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes
    12:04:12.836 INFO GenotypeGVCFs - Initializing engine
    terminate called after throwing an instance of 'VariantQueryProcessorException'
    what(): VariantQueryProcessorException : Could not open array genomicsdb_array at workspace: /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1

  • md1jale Member

    I changed the output location of the database (gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database/), which sits on a Lustre file system, to another directory, /data (which has a different architecture I'm not sure of).

    Apparently the Lustre architecture can cause problems for certain types of databases, e.g. SQLite databases.

    It is running and generating joint calls now.

  • shlee (Cambridge) Member, Broadie, Moderator admin

    Great to hear you solved your issue @md1jale.

  • md1jale Member
    edited March 21

    Hi Shlee,

    I'm back again. I'm a bit confused looking at the VCF output: I seem to have output only on chromosome 20. Does this have anything to do with the interval I specified?

    here's the head of my vcf file

    more mshef7_raw_variants_jointcalls.vcf

    ##fileformat=VCFv4.2
    
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  3894STDY7081741 3894STDY7081742 3894STDY7081743 3894STDY7081744 3894STDY7081745 3894STDY7191358
    20      61138   .       CT      C       71.97   .       AC=4;AF=0.333;AN=12;BaseQRankSum=-2.020e-01;ClippingRankSum=0.00;DP=185;ExcessHet=7.7512;FS=0.000;MLEAC=4;MLEAF=0.333;MQ=60.00;MQRankSum=0.00;QD=0.77;ReadPosRankSum=0.059;SOR=0.644    GT:AD:DP:GQ:PL  0/1:32,6:41:28:28,0,772 0/0:28,0:28:0:0,0,890   0/1:18,3:21:21:21,0,406 0/1:13,3:16:36:36,0,303 0/1:15,3:18:30:30,0,351 0/0:46,0:46:14:0,14,1687
    20      61795   .       G       T       3861.03 .       AC=8;AF=0.667;AN=12;BaseQRankSum=0.728;ClippingRankSum=0.00;DP=184;ExcessHet=6.1542;FS=6.598;MLEAC=8;MLEAF=0.667;MQ=60.00;MQRankSum=0.00;QD=21.33;ReadPosRankSum=0.408;SOR=1.182        GT:AD:DP:GQ:PL  0/1:17,13:30:99:453,0,637       0/1:12,15:27:99:577,0,447       0/1:24,18:42:99:584,0,894       1/1:0,19:19:57:813,57,0 1/1:0,24:24:72:977,72,0 0/1:25,14:39:99:497,0,956
    20      62731   .       C       A       3649.03 .       AC=8;AF=0.667;AN=12;BaseQRankSum=-2.070e-01;ClippingRankSum=0.00;DP=183;ExcessHet=6.1542;FS=0.000;MLEAC=8;MLEAF=0.667;MQ=60.00;MQRankSum=0.00;QD=19.94;ReadPosRankSum=-1.034e+00;SOR=0.725      GT:AD:DP:GQ:PL  0/1:17,23:40:99:877,0,600       0/1:21,13:34:99:406,0,791       0/1:26,19:45:99:629,0,989       1/1:0,18:18:54:767,54,0 1/1:0,10:10:30:408,30,0 0/1:19,17:36:99:602,0,722
    20      63799   .       C       T       4468.03 .       AC=8;AF=0.667;AN=12;BaseQRankSum=1.40;ClippingRankSum=0.00;DP=192;ExcessHet=6.1542;FS=0.543;MLEAC=8;MLEAF=0.667;MQ=60.00;MQRankSum=0.00;QD=23.27;ReadPosRankSum=0.139;SOR=0.621 GT:AD:DP:GQ:PL  0/1:21,12:33:99:437,0,813       0/1:18,13:31:99:471,0,682       0/1:13,27:40:99:1048,0,454      1/1:0,25:25:75:1049,75,0        1/1:0,22:22:66:923,66,0 0/1:25,16:41:99:580,0,998
    20      64223   .       A       AT      1358.02 .       AC=7;AF=0.583;AN=12;BaseQRankSum=0.814;ClippingRankSum=0.00;DP=146;ExcessHet=9.1645;FS=12.653;MLEAC=6;MLEAF=0.500;MQ=59.96;MQRankSum=0.00;QD=12.35;ReadPosRankSum=0.063;SOR=1.236       GT:AD:DP:GQ:PGT:PID:PL  0/1:8,13:21:99:.:.:228,0,135    0/1:9,9:23:99:.:.:192,0,180     0/1:12,11:23:99:.:.:201,0,212   1/1:1,11:14:17:.:.:282,17,0     0/1:2,8:10:20:0|1:64223_A_AT:152,0,20   0/1:9,17:31:99:.:.:345,0,161
    20      65288   .       G       T       3943.03 .       AC=8;AF=0.667;AN=12;BaseQRankSum=0.111;ClippingRankSum=0.00;DP=177;ExcessHet=6.1542;FS=4.473;MLEAC=8;MLEAF=0.667;MQ=60.00;MQRankSum=0.00;QD=22.40;ReadPosRankSum=1.20;SOR=0.445 GT:AD:DP:GQ:PL  0/1:17,16:33:99:568,0,607       0/1:17,13:30:99:478,0,675       0/1:19,20:39:99:752,0,635       1/1:0,18:18:54:743,54,0 1/1:0,16:16:48:704,48,0 0/1:20,20:40:99:738,0,774
    
  • SkyWarrior (Turkey) Member ✭✭✭

    Exactly. GenomicsDBImport imports only a single interval at a time. You need to script your way through all the major contigs you need variants from.

  • md1jale Member

    Hi SkyWarrior,
    To clarify: do I need to create a database for each chromosome using GenomicsDBImport and then call GenotypeGVCFs on each?
    e.g.
    #for chr1
    gatk GenomicsDBImport -R /fastdata/md1jale/reference/hs37d5.fa --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24811_1_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24150_1_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24144_2_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24712_6_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24811_2_1.bam.g.vcf' --genomicsdb-workspace-path /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1 --intervals 1 --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

    gatk GenotypeGVCFs -R /fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database1 -O /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls_chr1.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

    #for chr2
    gatk GenomicsDBImport -R /fastdata/md1jale/reference/hs37d5.fa --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24811_1_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24150_1_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24144_2_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24712_6_1.bam.g.vcf' --variant /fastdata/md1jale/WGS_MShef7_iPS/output/'24811_2_1.bam.g.vcf' --genomicsdb-workspace-path /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database2 --intervals 2 --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

    gatk GenotypeGVCFs -R /fastdata/md1jale/reference/hs37d5.fa -V gendb:///fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_database2 -O /fastdata/md1jale/WGS_MShef7_iPS/output/wt_mshef7_raw_variants_jointcalls_chr2.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

    Sorry, I've been struggling to find documentation on this.
    Also, where can I find the next step after joint calling? I'm presuming some sort of quality-control recalibration/filtering.

  • md1jale Member

    Btw, I need to call variants on all chromosomes, so I'm looking for a single step to make my life easier.

  • SkyWarrior (Turkey) Member ✭✭✭
    edited March 22

    This is exactly what you need to do with GenomicsDBImport. Another solution would be to use CombineGVCFs before GenotypeGVCFs. WDL is a nice solution for this issue, but I am using my own Perl script that runs GenomicsDBImport over all chromosomes.

    Also, you can use --sample-name-map samplenames.txt to feed all your variant files to GenomicsDBImport instead of passing --variant for each file separately.

    Just create a tab-separated file with sample_name and vcf_path on each line, and GenomicsDBImport will gather the samples from it.
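
    For illustration, such a map file could be generated as below. This is only a sketch: the pairing of the sample names (taken from the VCF header earlier in the thread) to g.vcf paths is a guess, so substitute your own names and paths.

```shell
# Build a sample-name map: one "sample_name<TAB>gvcf_path" pair per line.
# The name-to-file pairing below is illustrative only, not the real mapping.
printf '%s\t%s\n' \
  3894STDY7081741 /fastdata/md1jale/WGS_MShef7_iPS/output/24811_1_1.bam.g.vcf \
  3894STDY7081742 /fastdata/md1jale/WGS_MShef7_iPS/output/24150_1_1.bam.g.vcf \
  3894STDY7081743 /fastdata/md1jale/WGS_MShef7_iPS/output/24144_2_1.bam.g.vcf \
  > sn.txt
```

    GenomicsDBImport is then pointed at the file with --sample-name-map sn.txt.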

    Perl Script to import

    #!/usr/bin/perl
    use warnings;
    use strict;

    my $gatkcommand = "gatk --java-options '-Xmx12g -Djava.io.tmpdir=./tmp' GenomicsDBImport -R \$HG38 --sample-name-map sn.txt ";
    my $contigfile = 'contigs.list';
    open my $contig, $contigfile or die "Could not open $contigfile: $!";
    while (my $line = <$contig>) {
        chomp($line);
        my $gatkfinalcommand = $gatkcommand."--genomicsdb-workspace-path ".$line." -L ".$line;
        system($gatkfinalcommand);
    }

    Perl script to genotype
    #!/usr/bin/perl
    use warnings;
    use strict;

    my $gatkcommand = "gatk --java-options '-Xmx12g -Djava.io.tmpdir=./tmp' GenotypeGVCFs -R \$HG38 ";
    my $contigfile = 'contigs.list';
    open my $contig, $contigfile or die "Could not open $contigfile: $!";
    while (my $line = <$contig>) {
        chomp($line);
        my $gatkfinalcommand = $gatkcommand."--variant gendb://".$line." -O All_".$line.".vcf";
        system($gatkfinalcommand);
    }

    This is how it looks after the operation.

    Finally I combine all these vcf files into one vcf file.

    PS: Heck, I can even do this in parallel by separating the contigs into multiple groups. It accelerates things quite a bit.
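
    The per-contig scatter can also be sketched in bash. The dry-run version below only writes the commands to a file so they can be inspected or handed to GNU parallel or a cluster scheduler; REF, sn.txt, and the toy contig list are placeholder assumptions to replace with your own.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Dry-run sketch: emit one GenomicsDBImport command line per contig.
# REF, sn.txt, and the toy contig list below are placeholders.
REF=/fastdata/md1jale/reference/hs37d5.fa
printf '%s\n' 1 2 20 > contigs.list
: > import_cmds.sh
while read -r contig; do
  echo "gatk --java-options '-Xmx12g' GenomicsDBImport" \
       "-R $REF --sample-name-map sn.txt" \
       "--genomicsdb-workspace-path db_${contig} -L ${contig}" >> import_cmds.sh
done < contigs.list
```

    Each line of import_cmds.sh is then an independent job, so several contigs can be imported at once.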

  • md1jale Member
    edited March 22

    SkyWarrior, thank you very much. I had seen only half of your message, missing the script; that clarifies what I need. I'm indebted to you, SkyWarrior!

  • SkyWarrior (Turkey) Member ✭✭✭

    That is a current limitation of GenomicsDBImport, but I believe they are working on a fix for the single-interval issue.

  • md1jale Member

    The trouble here is:
    I have 9 samples, of which I need to do joint calling on the first 6 and then on all 9 (6 samples in one VCF file, 9 samples in the other), so I would have to create 2 separate databases as well.

  • SkyWarrior (Turkey) Member ✭✭✭
    edited March 22

    Actually, no. You can import all of them together, do joint genotyping once, and later extract just the first 6 into a separate VCF. You don't need to do joint genotyping on separate occasions.

  • md1jale Member

    That makes sense. The GenotypeGVCFs command just points to the database location (which has, let's say, the 9 samples). How can I make GenotypeGVCFs take a subset of samples?

    gatk GenotypeGVCFs -R /fastdata/md1jale/reference/hs37d5.fa -V gendb:///data/md1jale/mshef7_database1 -O /fastdata/md1jale/WGS_MShef7_iPS/output/mshef7_raw_variants_jointcalls_chr1.vcf --java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true'

  • SkyWarrior (Turkey) Member ✭✭✭

    No, you joint-genotype all of them; after genotyping, extract variants for only the first 6 samples. Use
    gatk SelectVariants -R $Ref -V variant.vcf -sn sample1 -sn sample2 -sn sample3 -sn sample4 -sn sample5 -sn sample6 -O first6samples.vcf

    This will select the joint-genotyped results of the first 6 samples for you.

  • md1jale Member
    edited March 22

    Thank you very much for your quick responses.

    Can you elaborate on the method you use to combine the VCF files into one VCF file? I'm considering using vcf-concat from vcftools.

    1) Also, how important is applying (quality score) recalibration [VariantRecalibrator, ApplyRecalibration] afterwards?
    2) Are there any standard filters you use on the combined VCF after (1), if (1) is necessary?

    Here is what I have used in the past (not sure about GATK4):
    java -Xmx4g -jar /usr/local/GenomeAnalysisTK-2.8-1/GenomeAnalysisTK.jar \
    -R /lab01/DataSets/bundle_GATK/hg19/ucsc.hg19.fasta \
    -T VariantFiltration \
    --variant OUTPUT.vcf \
    -o OUTPUT.recalibrated.filtered.vcf \
    --clusterWindowSize 10 \
    --filterExpression "MQ0 >= 4 && ((MQ0 / (1.0 * DP)) > 0.1)" \
    --filterName "HARD_TO_VALIDATE" \
    --filterExpression "DP < 5 " \
    --filterName "LowCoverage" \
    --filterExpression "QUAL < 30.0 " \
    --filterName "VeryLowQual" \
    --filterExpression "QUAL > 30.0 && QUAL < 50.0 " \
    --filterName "LowQual" \
    --filterExpression "QD < 1.5 " \
    --filterName "LowQD" \
    --filterExpression "SB > -10.0 " \
    --filterName "StrandBias"
    I would appreciate it if you could share these downstream steps (examples, even though they might be project-specific; it really would help someone like me who doesn't know much about standard parameters).

    Link: https://software.broadinstitute.org/gatk/best-practices/workflow?id=11145

    NB: my starting point for this project was BAM files generated by a well-known centre, so the generic assumption is that they have been aligned/QC'd well.

  • shlee (Cambridge) Member, Broadie, Moderator admin

    To combine VCFs @md1jale, please use CombineVariants.
    For VQSR details, see this article.
    For hard filtering thresholds, see this article and this tutorial.
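
    As a concrete starting point, a GATK4-style hard-filtering command might look like the sketch below. The thresholds are the commonly documented generic SNP starting values, not recommendations tuned to this project, and the file names are placeholders; the snippet only writes the command to a script for inspection.

```shell
# Sketch only: generic GATK4 hard filtering for SNPs. Thresholds and file
# names are placeholders to adapt, not validated recommendations.
cat > hard_filter_example.sh <<'EOF'
gatk VariantFiltration \
  -R /fastdata/md1jale/reference/hs37d5.fa \
  -V wt_mshef7_raw_variants_jointcalls.vcf \
  --filter-expression "QD < 2.0"  --filter-name "QD2" \
  --filter-expression "FS > 60.0" --filter-name "FS60" \
  --filter-expression "MQ < 40.0" --filter-name "MQ40" \
  -O wt_mshef7_jointcalls.hardfiltered.vcf
EOF
```

    Failing records are flagged with the filter name rather than removed, so downstream tools can still see them.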

  • md1jale Member
    edited March 23

    Hiya,
    I would have thought it is CatVariants as my files are generated per chromosome.
    I saw somewhere the option of MergeVcfs?
    Not sure which options under CombineVariants - merge/union?


    MergeVCF -seems to work here!

  • SkyWarrior (Turkey) Member ✭✭✭

    @shlee I believe CombineVariants is not part of GATK 4.0. MergeVcfs from Picard or GATK 4.0 is more suitable for this job.

  • shlee (Cambridge) Member, Broadie, Moderator admin

    @md1jale and @SkyWarrior,

    I saw:

    the method you use to combine vcf files into one vcf file

    and assumed combining callsets for different samples.

    Do you mean you want to concatenate VCFs from different genomic loci for the same samples, e.g. as would be produced by scattering the analysis over genomic loci? In this case, we recommend GatherVcfs or GatherVcfsCloud.
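
    For the scattered per-contig case, a gather step might look like the sketch below. The All_<contig>.vcf input names follow the convention of the Perl genotyping script earlier in the thread and are assumptions; inputs must be listed in reference-dictionary order. The snippet writes the command to a file rather than running it.

```shell
# Sketch: gather per-contig VCFs back into one genome-wide VCF.
# Input names follow the All_<contig>.vcf convention used earlier in the
# thread; their order must match the reference sequence dictionary.
cat > gather_example.sh <<'EOF'
gatk GatherVcfs \
  -I All_1.vcf \
  -I All_2.vcf \
  -I All_20.vcf \
  -O All_genome.vcf
EOF
```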

  • md1jale Member

    No I meant merging across chromosomes for the same samples.

    Also, for 9 whole-genome samples, would you recommend variant recalibration, hard filtering, or both? Is it mandatory after joint calling, or can I do without it?

    The reason I ask is that, unfortunately, I have a very limited amount of time to finish up this project.

  • shlee (Cambridge) Member, Broadie, Moderator admin

    @md1jale, our Best Practices recommend VQSR when possible, which nine human whole genome sequences would enable. We're not the folks you should be asking about steps being mandatory, especially given we know nothing about your research aims. It is for you to determine what your project requires. Of course, you are welcome to continue using our forum as a discussion arena.

  • SkyWarrior (Turkey) Member ✭✭✭

    Without some sort of variant filtering, your VCF file will not have PASS or filter expressions per variant, so your study will not be conclusive. From a single WGS you will get bazillions of 'variants' that are not worth pursuing. You need to filter them out.

  • SkyWarrior (Turkey) Member ✭✭✭

    @shlee said:
    @md1jale and @SkyWarrior,

    I saw:

    the method you use to combine vcf files into one vcf file

    and assumed combining callsets for different samples.

    Do you mean you want to concatenate VCFs from different genomic loci for the same samples, e.g. as would be produced by scattering the analysis over genomic loci? In this case, we recommend GatherVcfs or GatherVcfsCloud.

    This was good to know. But under certain circumstances, MergeVcfs from Picard also produces the same result.

  • shlee (Cambridge) Member, Broadie, Moderator admin

    Yes, MergeVcfs is the tool we use otherwise to gather VCFs as illustrated in this outdated WDL task.

  • niuguohao Member

    @SkyWarrior @md1jale @shlee
    Hello! I got the same warning message. I am using GATK 4.0.4.0:
    "WARN DepthPerSampleHC - Annotation will not be calculated, genotype is not called or alleleLikelihoodMap is null"

    My command is
    $GATK --java-options "-Xmx10240m -Djava.io.tmpdir=./" HaplotypeCaller -R $GENOME -I seq103-md.bam -O seq103.g.vcf -ERC GVCF

    Is there something wrong with my data? I didn't get this kind of warning when I used GATK 3.8.

  • Sheila (Broad Institute) Member, Broadie, Moderator admin

    @niuguohao
    Hi,

    I responded here.

    No need to post twice.

    -Sheila

  • wkyan Member

    When using HaplotypeCaller, I got this warning:
    WARN IntelPairHmm - Flush-to-zero (FTZ) is enabled when running PairHMM
    What does this warning mean?

  • shlee (Cambridge) Member, Broadie, Moderator admin

    Hi @wkyan,

    Flush-to-zero (FTZ) relates to rounding or return of zero for very small numbers, apparently referred to as denormals. This is enabled by default for the PairHMM mode of HaplotypeCaller and is discussed here. You can read more about it in this wikipedia article.
