GenotypeGVCFs hanging.

smilefreak (New Zealand; Member)
edited May 2015 in Ask the GATK team

Hi All,

I cannot perform joint genotyping from gVCFs with the GenotypeGVCFs command on my dataset of 352 mtDNA samples; the run hangs.

Has anyone run into this problem before, or does anyone have suggestions on how to get this working?

I have attached the standard output for a hanging run.

I can generate VCF files for each sample individually with no problems.

INFO  17:06:45,732 GenomeAnalysisEngine - Strictness is SILENT
INFO  17:06:45,867 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
INFO  17:06:47,721 MicroScheduler - Running the GATK in parallel mode with 4 total threads, 1 CPU thread(s) for each of 4 data thread(s), of 20 processors available on this machine
INFO  17:06:47,791 GenomeAnalysisEngine - Preparing for traversal
INFO  17:06:47,793 GenomeAnalysisEngine - Done preparing for traversal
INFO  17:06:47,794 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
INFO  17:06:47,794 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
INFO  17:06:47,795 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
INFO  17:06:48,436 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
INFO  17:07:17,800 ProgressMeter - gi|251831106|ref|NC_012920.1|:1         0.0    30.0 s      49.6 w        0.0%   15250.3 w   15250.3 w
Waiting for data... (interrupt to abort)

GATK VERSION: The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
java version "1.7.0_51"

Thanks for your time.
James Boocock

Answers

  • smilefreak (New Zealand; Member)
    edited May 2015

    I just ran an experiment: the hang only seems to start when the number of input files is >= 30.

    With fewer than 30 files it runs fast, as expected for a run that only covers mtDNA.

    Everything works fine on an MBP with java version "1.8.0_45". Could this be an issue with the version of Java?

  • smilefreak (New Zealand; Member)

    Sorry about the disjointed information, disregard the above comment.

    I made a mistake: it fails on both the MBP and the server. If I switch to another dataset containing lots of samples, it works as expected, so it looks like an issue with joint calling on my dataset specifically. Does anyone have any suggestions?

  • Geraldine_VdAuwera (Cambridge, MA; Member, Administrator, Broadie)

    Two things to try: does it always hang at the same position? If so, any sites with very large numbers of alleles there? And does it still hang if you disable multithreading?
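
One way to look for sites with very large numbers of alleles is to scan the per-sample gVCFs directly. The following is a minimal sketch, assuming uncompressed, tab-delimited gVCF text files; the function names `collect_alt_alleles` and `busiest_sites` and the `min_alleles` threshold are hypothetical and not part of GATK.

```python
from collections import defaultdict

NON_REF = "<NON_REF>"  # symbolic allele written in gVCF records


def collect_alt_alleles(vcf_lines, alleles=None):
    """Add the ALT alleles seen at each (chrom, pos) to `alleles` and return it."""
    if alleles is None:
        alleles = defaultdict(set)
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 5:
            continue  # not a complete VCF record
        chrom, pos, alt = fields[0], int(fields[1]), fields[4]
        for allele in alt.split(","):
            if allele not in (NON_REF, "."):
                alleles[(chrom, pos)].add(allele)
    return alleles


def busiest_sites(alleles, min_alleles=3):
    """Return (chrom, pos, n_alt) for sites with at least `min_alleles`
    distinct ALT alleles, largest count first."""
    hits = [(c, p, len(a)) for (c, p), a in alleles.items() if len(a) >= min_alleles]
    return sorted(hits, key=lambda h: (-h[2], h[0], h[1]))
```

Calling `collect_alt_alleles` once per open gVCF file, passing the same `alleles` mapping each time, accumulates alleles across samples; `busiest_sites` then lists the positions worth inspecting near where the run stalls.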

  • smilefreak (New Zealand; Member)
    edited May 2015

    According to the log, it hangs at position 1, 101, or 301, depending on how many samples I try to joint call. It still hangs after disabling multithreading. It works with 59 samples using 4 cores and with 29 samples using 2 cores. With no multithreading it starts to stall at 29 samples.

    INFO  15:35:41,819 HelpFormatter - --------------------------------------------------------------------------------
    INFO  15:35:41,825 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
    INFO  15:35:41,825 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO  15:35:41,826 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO  15:35:41,833 HelpFormatter - Program Args: -T GenotypeGVCFs -R /Volumes/BiochemXsan/staff_users/jamesboocock_temp/scratch/ancient_dna_pipeline/ref/rCRS.fa -o out.vcf --variant 1008.final.gvcf --variant 1032.final.gvcf --variant 1075.final.gvcf --variant 1076.final.gvcf --variant 1081.final.gvcf --variant 1113.final.gvcf --variant 1120.final.gvcf --variant 1129.final.gvcf --variant 1130.final.gvcf --variant 1132.final.gvcf --variant 1137.final.gvcf --variant 1140.final.gvcf --variant 1141.final.gvcf --variant 1144.final.gvcf --variant 1148.final.gvcf --variant 1152.final.gvcf --variant 1153.final.gvcf --variant 1160.final.gvcf --variant 1161.final.gvcf --variant 1162.final.gvcf --variant 1163.final.gvcf --variant 1168.final.gvcf --variant 1172.final.gvcf --variant 1175.final.gvcf --variant 1181.final.gvcf --variant 1184.final.gvcf --variant 1185.final.gvcf --variant 1187.final.gvcf --variant 1193.final.gvcf --variant 1194.final.gvcf --variant 1201.final.gvcf --variant 1202.final.gvcf --variant 1203.final.gvcf --variant 1204.final.gvcf --variant 1206.final.gvcf --variant 1211.final.gvcf --variant 1214.final.gvcf --variant 1215.final.gvcf --variant 1217.final.gvcf --variant 1221.final.gvcf --variant 1222.final.gvcf --variant 1223.final.gvcf --variant 1224.final.gvcf --variant 1225.final.gvcf --variant 1226.final.gvcf --variant 1228.final.gvcf --variant 1229.final.gvcf --variant 1240.final.gvcf --variant 1254.final.gvcf --variant 1257.final.gvcf --variant 1261.final.gvcf --variant 1264.final.gvcf --variant 1269.final.gvcf --variant 1270.final.gvcf --variant 1272.final.gvcf --variant 1273.final.gvcf --variant 1275.final.gvcf --variant 1277.final.gvcf --variant 1278.final.gvcf --variant 1280.final.gvcf --variant 1281.final.gvcf --variant 1284.final.gvcf --variant 1287.final.gvcf --variant 1288.final.gvcf --variant 1289.final.gvcf --variant 1292.final.gvcf --variant 1293.final.gvcf --variant 1294.final.gvcf --variant 1295.final.gvcf --variant 1296.final.gvcf 
--variant 1297.final.gvcf --variant 1298.final.gvcf --variant 1300.final.gvcf --variant 1307.final.gvcf --variant 1308.final.gvcf --variant 1309.final.gvcf --variant 1310.final.gvcf --variant 1311.final.gvcf --variant 1312.final.gvcf --variant 1313.final.gvcf --variant 1314.final.gvcf --variant 1315.final.gvcf --variant 1316.final.gvcf --variant 1317.final.gvcf --variant 1318.final.gvcf --variant 1319.final.gvcf --variant 1320.final.gvcf --variant 1321.final.gvcf --variant 1322.final.gvcf --variant 1323.final.gvcf --variant 1324.final.gvcf --variant 1325.final.gvcf --variant 1326.final.gvcf --variant 1327.final.gvcf --variant 1328.final.gvcf --variant 1329.final.gvcf --variant 1330.final.gvcf --variant 1331.final.gvcf --variant 1332.final.gvcf --variant 1333.final.gvcf
    INFO  15:35:41,861 HelpFormatter - Executing as jamesboocock@biochemcompute.otago.ac.nz on Linux 2.6.32-504.8.1.el6.x86_64 amd64; OpenJDK 64-Bit Server VM 1.7.0_75-mockbuild_2015_01_08_20_32-b00.
    INFO  15:35:41,862 HelpFormatter - Date/Time: 2015/05/25 15:35:41
    INFO  15:35:41,862 HelpFormatter - --------------------------------------------------------------------------------
    INFO  15:35:41,863 HelpFormatter - --------------------------------------------------------------------------------
    INFO  15:35:43,189 GenomeAnalysisEngine - Strictness is SILENT
    INFO  15:35:43,406 GenomeAnalysisEngine - Downsampling Settings: Method: BY_SAMPLE, Target Coverage: 1000
    INFO  15:35:45,122 GenomeAnalysisEngine - Preparing for traversal
    INFO  15:35:45,123 GenomeAnalysisEngine - Done preparing for traversal
    INFO  15:35:45,124 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO  15:35:45,139 ProgressMeter -                 | processed |    time |    per 1M |           |   total | remaining
    INFO  15:35:45,156 ProgressMeter -        Location |     sites | elapsed |     sites | completed | runtime |   runtime
    INFO  15:35:45,531 GenotypeGVCFs - Notice that the -ploidy parameter is ignored in GenotypeGVCFs tool as this is automatically determined by the input variant files
    INFO  15:36:15,166 ProgressMeter - gi|251831106|ref|NC_012920.1|:301         0.0    30.0 s      49.7 w        1.8%    27.6 m      27.1 m
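
To compare where different runs stall, the last position reported by the ProgressMeter can be pulled out of each saved log. This is a rough sketch that assumes the log lines look like the ones pasted above; the regular expression and the function name `last_progress_location` are illustrative assumptions, and the reported location is only the last progress update, not necessarily the exact site that causes the stall.

```python
import re

# Matches progress lines such as
# "INFO  15:36:15,166 ProgressMeter - gi|251831106|ref|NC_012920.1|:301   0.0 ..."
PROGRESS = re.compile(r"ProgressMeter\s+-\s+(\S+):(\d+)\s")


def last_progress_location(log_lines):
    """Return (contig, position) from the last ProgressMeter update, or None."""
    last = None
    for line in log_lines:
        match = PROGRESS.search(line)
        if match:
            last = (match.group(1), int(match.group(2)))
    return last
```

Running this over the logs from runs with different sample counts shows whether the stall position is stable or moves with the number of inputs.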
    
  • Sheila (Broad Institute; Member, Broadie, Moderator)

    @smilefreak
    Hi James,

    Are there any sites that have a large number of alleles? Have you tried running CombineGVCFs on batches of the gVCFs before running GenotypeGVCFs?

    -Sheila
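
The batching suggestion can be sketched as a small helper that splits the gVCF list into groups and builds one GATK 3 style CombineGVCFs command per group. This is only an illustration: the batch size of 50, the jar path, the output file names, and the function names are assumptions, and the commands are built as argument lists rather than executed.

```python
def chunked(items, size):
    """Split `items` into consecutive lists of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def combine_commands(gvcfs, reference, batch_size=50,
                     jar="GenomeAnalysisTK.jar", prefix="batch"):
    """Build one CombineGVCFs command (as an argv list) per batch of gVCFs."""
    commands = []
    for n, batch in enumerate(chunked(gvcfs, batch_size), start=1):
        cmd = ["java", "-jar", jar, "-T", "CombineGVCFs", "-R", reference]
        for path in batch:
            cmd += ["--variant", path]
        cmd += ["-o", f"{prefix}{n}.g.vcf"]  # hypothetical output name
        commands.append(cmd)
    return commands
```

Each command can then be run with, for example, `subprocess.run(cmd, check=True)`, and the resulting batch files passed to GenotypeGVCFs as its `--variant` inputs in place of the individual per-sample files.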

  • smilefreak (New Zealand; Member)
    edited May 2015

    Hi Geraldine, Sheila

    Thanks for that. I will change the output extension of these files; it seems to have always done the right thing with them on other datasets.

    We discovered yesterday that a mistake was made when sending the barcodes to the sequencing center, which led to multiple samples being represented in the same fastq files. My intuition is that this may have created a large number of haplotypes for the samples with incorrect barcodes, causing the downstream stalling during joint genotyping.

    Thanks for your help.
