biojiangke ✭✭

About

Username: biojiangke
Visits: 221
Roles: Member
Points: 56
Badges: 11

Comments

  • I recently ran into another case where this error pops up, and I thought it was worth posting here in case other users hit it. The intervals used for HaplotypeCaller and GenotypeGVCFs have to be EXACTLY the same. Surprisingly, th…
  • This is the gVCF format, which covers the non-variant regions, too. See this: https://gatkforums.broadinstitute.org/gatk/discussion/4017/what-is-a-gvcf-and-how-is-it-different-from-a-regular-vcf
  • @SkyWarrior Are you talking about this HaplotypeCaller option: --emit-ref-confidence / -ERC (ReferenceConfidenceMode)? I don't know whether the original gVCF was called with this option, but I'll give it a try. I think I have used this…
  • This was done using the DRAGEN germline pipeline: http://edicogenome.com/pipelines/dragen-germline-v2-pipeline-2/ Unfortunately, they would not share details about their variant caller, but I believe they used a certain version of GATK. Re-generating …
  • Mmm... interesting. I'm pretty sure it is a gVCF rather than a VCF, as the first few lines show: 1 1 . G . . END=369 GT:DP:GQ:MIN_DP:PL 0/0:65:99:54:0,120,1800 1 370 . G A, 389.77 . DP=61;MQ=60;MQRankSum=2.068;ReadPosRankS…
  • Running the validation on just one gVCF file, I got the following error message: A USER ERROR has occurred: A GVCF must cover the entire region. Found 54134140 loci with no VariantContext covering it. The first uncovered segment is:1:20000 Searche…
  • Thanks for the advice. I've been using CombineGVCFs for a while and it has been working well for us. Facing an ever-increasing amount of WGS data, I'd like to try GenomicsDBImport to gain some performance. I'll give ValidateVariants a try and report b…
  • Attached is the error log.
  • I'm running on a cluster node: 2x Intel(R) Xeon(R) CPU E5-2640 v3 @ 2.60GHz, 128 GB memory, CentOS 7.5, OpenJDK 1.8.0.181-3.b13.el7_5, Oracle JDK 8u181.
  • Thanks for the quick response. With -Xms48g -Xmx48g I still got the same error. Considering there are only six gVCFs of about 2 GB each and a relatively small slice of the genome (1 Mbp), I think 48 GB of memory should be enough?
  • Thanks for the very detailed explanation. It was really helpful! It's exciting to know the new development of GenomicsDB and I'll take a look at the new database structure.
  • Thanks for the informative responses. It makes sense to use consistent intervals for HaplotypeCaller and CombineGVCFs. But this requirement seems to make GATK less flexible. For example, we have accumulated a large collection of gVCF files from hun…
  • I'm a bit confused. HaplotypeCaller was run without any -L option, on the entire reference genome assembly. Do you mean I need to run them at the chromosome level, one chromosome at a time?
  • Some new discoveries on this: For the same interval and reference sequences, using a different set/list of gVCF files does not generate this error. The interval sorting error seems to be triggered by certain gVCF files but not by the reference seque…
  • Thanks for the quick response. But what would be the best way to screen these regions out of a large list of gVCFs? Or should I simply screen these regions in the reference assembly and exclude them?
  • gatk-package-4.0.3.0-local.jar
  • I did a little experimentation with different "--max-reads-per-alignment-start" values. For my specific case, only the default (50) gave 100% correct allele counts and genotype calls (based on pysam allele depth). Increasing this value to 10…
  • Hi, I'm working on some amplicon data. Naturally, it comes with very high coverage (up to ~8000X). I'm running the analyses with and without downsampling and found some interesting discrepancies. The workflow is: Picard add read groups, GATK …
  • sites.list (GATK format, but only 1 bp per interval): 2:121169112 2:35031950 2:42889592 2:4365346 Combine VCF step: gatk CombineGVCFs -R ref.fa -L sites.list -O Combined.vcf -V gVCF.list Joint call step: gatk GenotypeGVCFs -O GTed.vcf -R ref.fa -V Combined.vcf…
  • Sometimes this looks like an indel issue, but sometimes the variant is a SNP and the position is still off.
  • For example, I supply a position to GenotypeGVCFs with the -L option in this format: 2:42889589. I would expect genotypes at this location, but instead I got variant information at 2:42889592, which is 3 bp away from the expected locat…
  • Hi, I have a question about the behavior of the interval option in CombineGVCFs: I understand it can take the standard samtools/GATK format chr:start-end and BED format, but, as I found by trying, it can also take the format chr:pos. I would think GATK pro…
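Several of the comments above hinge on how the -L interval argument reads coordinates (the chr:pos question for CombineGVCFs and the position offset seen with GenotypeGVCFs). A minimal Python sketch, assuming GATK-style strings are 1-based and inclusive, that a bare chr:pos denotes a single base, and that BED lines follow the standard 0-based, half-open convention, normalizes all three forms to one representation so they can be compared. The `Interval` class and both parser names are hypothetical, not part of GATK.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    contig: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def parse_gatk_interval(text: str) -> Interval:
    """Parse 'chr:start-end' or 'chr:pos' (assumed to mean one base)."""
    text = text.strip()
    # rpartition keeps contig names that themselves contain ':' intact.
    contig, sep, coords = text.rpartition(":")
    if not sep or not contig:
        raise ValueError(f"expected chr:pos or chr:start-end, got {text!r}")
    coords = coords.replace(",", "")  # tolerate thousands separators
    if "-" in coords:
        start_s, end_s = coords.split("-", 1)
        start, end = int(start_s), int(end_s)
    else:
        start = end = int(coords)
    if start < 1 or end < start:
        raise ValueError(f"invalid coordinates in {text!r}")
    return Interval(contig, start, end)


def parse_bed_line(line: str) -> Interval:
    """Convert a BED line (0-based start, exclusive end) to 1-based inclusive."""
    fields = line.rstrip("\n").split("\t")
    contig, bed_start, bed_end = fields[0], int(fields[1]), int(fields[2])
    return Interval(contig, bed_start + 1, bed_end)
```

Under these assumptions, `2:42889589` and the BED line `2<TAB>42889588<TAB>42889589` describe the same single base. Mixing up the two conventions shifts a position by exactly 1 bp, so a coordinate-convention slip alone would not account for a 3 bp offset.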
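The "A GVCF must cover the entire region" error quoted above counts loci that no record spans. A simplified Python sketch of that bookkeeping lists the uncovered 1-based segments of a region. It assumes that a reference block's extent comes from its INFO `END=` value and that every other record spans the length of its REF allele. It illustrates the idea only and is not GATK's implementation; `record_span` and `uncovered_segments` are hypothetical names.

```python
def record_span(pos: int, ref: str, info: str) -> tuple:
    """Return the (start, end) span of one gVCF record, 1-based inclusive."""
    # Reference blocks carry END=...; other records cover their REF allele.
    for field in info.split(";"):
        if field.startswith("END="):
            return pos, int(field[len("END="):])
    return pos, pos + len(ref) - 1


def uncovered_segments(spans, region_start: int, region_end: int) -> list:
    """List (start, end) gaps in [region_start, region_end] that no span covers."""
    gaps = []
    next_needed = region_start  # first position not yet known to be covered
    for start, end in sorted(spans):
        if end < next_needed:
            continue  # span lies entirely before the uncovered frontier
        if start > region_end:
            break
        if start > next_needed:
            gaps.append((next_needed, min(start - 1, region_end)))
        next_needed = max(next_needed, end + 1)
        if next_needed > region_end:
            break
    if next_needed <= region_end:
        gaps.append((next_needed, region_end))
    return gaps
```

Applied to the two records quoted above (a reference block at 1:1 with `END=369` and a SNP at 1:370), a hypothetical region 1:1-1000 would be reported as uncovered from position 371 onward. Summing `end - start + 1` over the gaps gives a count comparable to the "loci with no VariantContext" figure in the message.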
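Because the intervals given to HaplotypeCaller and GenotypeGVCFs have to match EXACTLY, a quick comparison of the two interval lists before a long run can catch mismatches early. A minimal Python sketch, assuming both lists hold one GATK-style interval string per line; the helper names are hypothetical.

```python
def normalize_intervals(lines):
    """Strip whitespace, drop blank lines, and remove thousands separators."""
    return [line.strip().replace(",", "") for line in lines if line.strip()]


def compare_interval_lists(hc_lines, gg_lines):
    """Return (identical, only_in_hc, only_in_gg) for two interval lists."""
    hc = normalize_intervals(hc_lines)
    gg = normalize_intervals(gg_lines)
    hc_set, gg_set = set(hc), set(gg)
    only_hc = [iv for iv in hc if iv not in gg_set]
    only_gg = [iv for iv in gg if iv not in hc_set]
    # "Identical" is strict here: same intervals in the same order.
    return hc == gg, only_hc, only_gg
```

The comparison is purely textual, so equivalent intervals written in different formats (for example BED versus chr:start-end) would still be flagged as different.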
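For the amplicon downsampling question, a simplified picture of what a per-start cap such as --max-reads-per-alignment-start does is that reads are grouped by alignment start and each group is trimmed to at most N reads. This is a hypothetical Python sketch using random sampling with a fixed seed, not GATK's actual downsampler; reads are represented as plain (name, contig, start) tuples.

```python
import random
from collections import defaultdict


def downsample_by_start(reads, max_per_start, seed=0):
    """Keep at most max_per_start reads per (contig, start) position."""
    if max_per_start < 1:
        raise ValueError("max_per_start must be a positive integer")
    rng = random.Random(seed)  # fixed seed for reproducible results
    by_start = defaultdict(list)
    for read in reads:
        _, contig, start = read
        by_start[(contig, start)].append(read)
    kept = []
    for key in sorted(by_start):
        group = by_start[key]
        if len(group) > max_per_start:
            group = rng.sample(group, max_per_start)
        kept.extend(group)
    return kept
```

In amplicon data, many reads share the same start position, so a per-start cap can remove a much larger fraction of reads than it would in randomly sheared WGS libraries.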