Blue

About

Username
Blue
Joined
Visits
102
Last Active
Roles
Member
Points
33
Badges
6

Comments

  • Hi @Sheila I assumed my initial problem was caused by inputting VCFs instead of gVCFs; I fixed that at your suggestion (with --emitRefConfidence GVCF). There was a problem in CombineGVCFs (recognising the vcf.list), and I had typos in my Genotyp…
  • Hi again @Sheila I haven't submitted the bug report yet. I added --emitRefConfidence GVCF to fix the non-gvcf mistake. Now with Program Args: --analysis_type CombineGVCFs --reference_sequence ../../reference_sequences/dmel/v6.0/dm6.fa --variant r…
  • @Sheila Though note that with my server configuration I omit the java commands, below showing that 3.2 works (with GenomeAnalysisTK --help) but 3.4 doesn't. Note that I haven't used or installed 3.3. -bash-4.1$ module load jdk/1.7.0_51_openjdk -ba…
    in testing 3.4-0 Comment by Blue July 2015
  • Hi @Sheila -bash-4.1$ module load gatk/3.4-0 -bash-4.1$ java -jar GenomeAnalysisTK.jar Unable to access jarfile GenomeAnalysisTK.jar -bash-4.1$
    in testing 3.4-0 Comment by Blue July 2015
  • Hi Gera, yes, I'm running HC on all 222 bam files together. #!/bin/sh #$ -N HAPCALL #$ -pe openmp 20 #$ -S /bin/sh #$ -cwd #$ -j y #$ -q bioinf.q . /etc/profile.d/modules.sh module load gatk/3.2.2 jre/1.7.0_25 GenomeAnalysisTK -nct 20 --analysis_ty…
  • Solution using bash instead of GATK: keep only homozygous non-reference genotypes, and annotation lines. for i in *.UniGenotyper.SNP.vcf; do grep -E '1/1|#' "$i" > "${i//UniGenotyper.SNP.vcf/UG.SNP.11.vcf}"; done; # convert to table using gatk…
  • @Biocyberman Note that the -L option is for defining intervals. CallableLoci generates two files, one detailed (and extensive), one a summary. Both describe numerically "Which interval are not well covered", which I am assuming is useful…
  • Hi @Biocyberman I just happened to see your question, and thought I could help. Try the Diagnostics and Quality Control Tools and the Variant Evaluation and Manipulation Tools. These are for bam files.... GenomeAnalysisTK -T BaseCoverageDistribut…
  • Thanks for the response. I wasn't strictly intending to use the realignment intervals for indel/CNV detection, I was just wondering what they meant, and whether they could be useful for something. Would comparing the intervals between samples be in…
  • I was just wondering what you guys thought of my realignment intervals length distribution. This is 30Mb from a single diploid sample without prior indel position information. Approximately 60,000 events, i.e. one every fifty bases seems like a lo…
  • No problem at all with the late response. In practical terms, my sequenced sample consists of 200 diploid individuals that share the same father but all have different mothers. I'm only really interested in the genetic variation from the moth…
  • To rcholic or anyone else who stumbles in here, To split a vcf file by sample: Use vcf-subset, with the -c option: -c, --columns File or comma-separated list of columns to keep in the vcf file. If file, one column per row So if your s…
  • Thanks for the response. I'm not specifically looking for low-frequency variants. I'd like to increase the sensitivity of SNP-calling because: 1) There are many homozygous Mendelian errors that would be corrected if the presence of a second allele…
  • I will do, as soon as I've uninstalled java 1.8, or there's space on the cluster. I tried HaplotypeCaller a while ago: it seems to take two weeks for ten samples, but I guess its sensitivity is better than UnifiedGenotyper. I'm still wondering if se…
  • Thanks Sheila. Could you indicate where to alter the inbreeding coefficient filter? There is no mention of it on http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_genotyper_UnifiedGenotyper.html
  • I've just started getting the same error message (Removed too many insertions, header is now negative), having been using ReduceReads for several months on 2.4 and 2.6. I'm interested in the cause of the problem as well as the solution. The two ch…
  • Thanks for answering question 2, was probably just a typo, so apologies for the menial question. My expected average coverage is 35X per sample. I've been advised to exclude SNPs in the tails of the depth distribution as the calls are unreliable, w…
  • Having updated GATK and Java, my script was still failing at the GATK commands. Under the suspicion that this was caused by conflict with another Java-based program (probably an older version of Picard), I specified for program modules to be loaded …
  • Without being intentionally pedantic, has the supported Java version now changed from 1.6 to 1.7? I'm assuming that earlier versions of GATK use 1.6 and at some point the later ones started using 1.7. Although I'm primarily using GATK from a unive…
  • Hi Wim, Thanks for your response. I've been running GATK 2.4.9 and Java 1.6_24-b07. I'm specifying which version of Java to use, at the beginning of the shell script. The reason for this was that even if the default cluster version of Java changes …
  • PS Apologies for the weird font in the above message.
  • This QUAL distribution anomaly exists at a similar level in all chromosomes. I'm not performing variant recalibration because most of the variants are likely to be novel. In the UnifiedGenotyper, what metrics go into the calculation which generates…
  • Just to empirically concur, I've used some of the GATK tools successfully on Drosophila HiSeq whole-genome data. If using the Unified Genotyper, the expected heterozygosity option could be calibrated to e.g. 0.14 for D.melanogaster. I assume that t…
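
The bash solution quoted in the comments above (keep header lines plus homozygous non-reference genotypes) could be sketched a little more defensively as below. This is only a sketch under assumptions: `filter_hom_alt` is a hypothetical helper name, not part of GATK, and it swaps the original `grep -E '1/1|#'` for `awk` so that only the sample genotype columns (column 10 onwards) are checked rather than the whole line. It assumes tab-separated VCFs in which GT is the first FORMAT field, and it treats both `1/1` and phased `1|1` as homozygous non-reference.

```shell
# Hypothetical helper (not a GATK tool): print header lines and any record
# where at least one sample genotype is homozygous non-reference.
# Assumes GT is the first colon-separated field of each sample column.
filter_hom_alt() {
    # $1: path to the input VCF; the filtered VCF is written to stdout
    awk -F'\t' '
        /^#/ { print; next }                      # keep all header lines
        {
            for (i = 10; i <= NF; i++)            # sample columns start at 10
                if ($i ~ "^1[/|]1(:|$)") { print; next }
        }' "$1"
}

# Apply to every matching file in the current directory, mirroring the
# output naming used in the original comment.
for vcf in *.UniGenotyper.SNP.vcf; do
    [ -e "$vcf" ] || continue                     # glob matched nothing
    filter_hom_alt "$vcf" > "${vcf%UniGenotyper.SNP.vcf}UG.SNP.11.vcf"
done
```

Restricting the match to the genotype columns avoids keeping a record just because the string `1/1` or a `#` happens to appear in another field, which the plain `grep` would do.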