zzq

About

Username
zzq
Location
China
Joined
Visits
318
Last Active
Roles
Member
Points
33
Badges
7

Comments

  • Hi all, Thank you very much. I think it would be better to put these descriptions on this site: http://software.broadinstitute.org/software/genomestrip/node_ReferenceMetadata.html Best regards, Zhuqing
  • Hi @Tiffany_at_Broad Bob said in the email that the reliably alignable positions have been marked as 1 after running ComputeGenomeMask. But from these sites https://gatkforums.broadinstitute.org/gatk/discussion/1492/genome-mask-files https://gatkfo…
  • Hi everyone, Can anyone double-check and update this description https://gatkforums.broadinstitute.org/gatk/discussion/23254/broad-website-contact-form-two-things-want-to-confirm-with-you-before-running-genome-strip#latest? Thank you very muc…
  • Hi @Tiffany_at_Broad Thank you very much. Yes, I think he is right. Can anyone double-check this, as I cannot find any description of it at the following link, http://software.broadinstitute.org/software/genomestrip/node_Refere…
  • Dear @Sheila I think the following gets what I want. Thank you. -T SelectVariants \ -R ref.fa \ --selectTypeToExclude INDEL --selectTypeToExclude MIXED --selectTypeToExclude MNP --selectTypeToExclude SYMBOLIC \ --selectTypeToInclude NO_VARIATION…
  • Dear @Sheila Sorry for the late reply. With your recommended parameters, the output is empty. I just want to keep the sites with one allele or . in the ALT column. Best Zhuqing
  • Hi @Sheila Yes, I have tried --restrictAllelesTo BIALLELIC, but the non-variant sites (sites with . listed in the ALT column) are also excluded, and I want to keep them. Best Zhuqing
  • Hi @Geraldine_VdAuwera Thanks, but how can I filter these sites using SelectVariants? I hope you can help. Best Zhuqing
  • Dear @shlee Yes, the same problem reproduced with the latest GATK. The command java -Xmx100g /bin/GATK3.7/GenomeAnalysisTK.jar \ -T SelectVariants \ -R ref.fa \ --selectTypeToExclude INDEL --selectTypeToExclude MIXED --selectTypeToExcl…
  • Hi @Sheila Thank you. I have run with --selectTypeToExclude, but the results still report sites with * or multiple alleles listed in the ALT column (as follows). The GATK version is 3.5-0-g36282e4 NC_005044.2 1229 . A . …
  • Hi @Geraldine_VdAuwera , You mean that GATK assigns a genotype based on the PLs. If so, the genotypes match the likelihoods well, so I do not need to test whether the genotype assigned by GATK always has the lowest PL. Am I right? Many th…
  • Hi @Geraldine_VdAuwera , Many thanks, can CatVariants be used for gvcf? I will take some time to understand the WDL. Best
  • Hi @Sheila First, my programs have finished. I want to know how to run in parallel or with WDL; I find that multithreading (nt, nct) is not suitable for HaplotypeCaller in GVCF mode. I think I can generate a gvcf for each chromosome. But for those sca…
  • Hi @Geraldine_VdAuwera , Sorry, I did not express my thoughts well. I know HaplotypeCaller gives a Phred-scaled likelihood (PL) for each genotype. I want to know whether the genotypes are consistent with the PLs (if so, the genotype should have t…
  • Hi @Sheila Thanks. From the summary of ValidateSamFile, I think it is OK (as follows). But I found these samples are all used as outgroups; maybe they have many deeply divergent sites. Can this explain the slow processing? Best HISTOGRAM java…
  • Hi @Sheila @Geraldine_VdAuwera I found that combining gvcfs and genotyping the combined gvcf is time-consuming. I want to run it by chromosome. I know this will work, but I do not know at which step I should start running by chromosome. If I have w…
  • Hi @bhandsaker I found that the FDR for duplications is higher than for deletions. I do not know why this bias occurs; I just think the intensity values will be messy for duplications. And from your published paper (2504 individuals SV database), I also fo…
    in IRS TEST Comment by zzq June 2016
  • Hi @Geraldine_VdAuwera , You mean that the samples that finished successfully with Java 1.8 should also be rerun (call GVCF)? If the effects on filtering are weak, I will not rerun these samples because it would take me a long time. …
  • Hi @Geraldine_VdAuwera , Yes, I am sure. But the number of files in the tmp directory is so large (about 11000) that it is difficult for me to list it with ls. If Java 1.8 is OK, I suspect that it is difficult to load and write tmp files in this directory and I …
  • Hi @Sheila Thanks, I have generated gvcfs for many individuals successfully using the latest stable version with Java 1.8. Is there any problem with these gvcf files when using Java 1.8? Best
  • @bhandsaker Hi, recently I ran CNVDiscoveryPipeline successfully, but there are some redundant regions in my result. How should I remove the regions that overlap and get a suitable copy number for the merged regions? What is more, if I just…
  • Hi @bhandsaker , The version is version 2.00 (build 1650) and the following is the input VCF record for DEL_70. I hope this information can help you with my problem. 1 3081535 DEL_70 G <DEL> . COVERAGE;DEPTH;DEPTHPVAL CIEND=-68,…
  • Hi @bhandsaker I find it really difficult to work out this problem. I noticed that the region 1:3081468-3081735 is masked as 0 in the ref.svmask.fa file, while it is mostly masked as 1 in ref.gcmask.fa. I have run it again, but it just giv…
  • Hi @bhandsaker , Thanks for your help. You said I should look at the CN, CNQ, CNL and CNP fields to get the genotype information for each sample. I ran a test on the data deposited in the installtest directory. Yes, it did not give me a genotype like 0/0 0/1 1/…
  • Hi @Geraldine_VdAuwera @bhandsaker I came across the same error when I ran CNVDiscoveryPipeline.q on our server (one node with 1T memory, 64 processes) using the following commands. Can I run CNVDiscoveryPipeline.q without using LSF? If so, how can …
  • Hi @bhandsaker , I am sorry, I do not fully understand the CNVDiscoveryPipeline, because I cannot find an -o parameter like the one in SVDiscovery (SVDiscovery writes, through -o, a set of sites that can be used for the subsequent genotyping) in CN…
  • Dear all, @Geraldine_VdAuwera @Sheila I am running HaplotypeCaller to call genotypes for each confident site, including homozygous-reference sites, using the following command, java -Xmx100g -jar GenomeAnalysisTK-3.5.jar \ -R reference.fa \ -…
  • Hi @Sheila Thanks for your help. The record from the gvcf looks normal, as follows (Image) The original bam file is larger than 30G, so I just extracted the region 1:400-500, and the IGV screenshot is posted below. Many thanks.
  • Hi @Geraldine_VdAuwera @Sheila I want to bring up this post again. I just came across the same problem. My data are whole genome sequencing data (about 12X). My commands are as follows: java -Xmx100g -jar /home/share/bin/GenomeAnalysisTK-3…
  • Hi @Geraldine_VdAuwera , Many thanks, you said I can avoid this problem by having the GATK tools themselves emit gzipped files, but I got the same warnings from CombineGVCFs (GATK v3.5) when I used .g.vcf.gz / .g.vcf.gz.tbi inputs (generated by Ha…
  • Hi @Geraldine_VdAuwera Many thanks, but for CombineGVCFs, can I split the gvcf files into small batches, combine each batch, and then combine those combined gvcfs again to get the final result? Many thanks.
  • Hi @Geraldine_VdAuwera @Sheila I am getting the same warning (GATK v3.5). I do not know whether these warnings will have an effect on the result. I have more than 100 samples and run HaplotypeCaller for each sample with the --emitRefConfidence GVC…
  • Hi @Sheila, Sorry for the late reply. I found that the VCF file produced by the above commands has a PGT tag. I think this will be useful for me. I should replace the GT with the PGT and then pass it to beagle. Do you think this will give me a more a…
  • Hi @Sheila , The commands I ran are as follows, for each sample, java -Xmx50g -jar GenomeAnalysisTK-3.5.jar \ -R ref.fa \ -T HaplotypeCaller -nct 8 \ -I $d.sorted.uniqe.rg.dedup.realn.bam \ -o $d.g.vcf \ --ge…
  • Hi @delangel @Geraldine_VdAuwera , Very nice explanation for beginners. For me, I have a vcf obtained by GATK (version 3.5) HaplotypeCaller running in -ERC GVCF and then using CombineGVCFs and GenotypeGVCFs. You said that post-processing variant c…
  • Hi @bhandsaker, From what you showed above, if a CNVR detected using three samples contains three probes, you will pass to R a 2X9 matrix, which contains the ranks from the intensity data and the affected status determined by copy number, rather than 2X3 …
    in IRS TEST Comment by zzq January 2016
  • Hi @bhandsaker Yes, you are right, but the IRS test is an efficient way to validate the accuracy of the detected CNVRs globally using data from a different platform. I have never seen another way that can achieve this goal. According to you, I can get a matri…
    in IRS TEST Comment by zzq January 2016
  • Hi @bhandsaker , Many thanks for your explanation and example. But I am still puzzled. First, for a region, if there are N samples and K probes, I will get a matrix which contains NK values; why do the values run from 1 to NK rather than 1 to N? For me, I …
    in IRS TEST Comment by zzq January 2016
  • Hi @bhandsaker Can you help me with this question? Thanks.
    in IRS TEST Comment by zzq January 2016
  • Hi @bhandsaker, After looking at this (http://www.broadinstitute.org/software/genomestrip/org_broadinstitute_sv_qscript_CNVDiscoveryPipeline.html), I realize I should run two different discovery pipelines. Am I right? Thank you.
  • Hi @bhandsaker, Yes, I just ran the pipeline deposited in the installtest directory. How should I change the parameters to get duplications? I have succeeded with other tools, like CNVnator, breakdancer.., but only failed with GenomeSTRiP. Thank…
  • @Geraldine_VdAuwera I have checked for the right encoding and found that the quality values 1, 4 and 8 should not be here. I hope that you can point me to some tools to filter these bad reads from the bam. Thanks!
  • @Geraldine I have just got the distribution of quality scores from Picard QualityScoreDistribution. The results are as follows, QUALITY COUNT_OF_Q1 14 38 133 10937404036 114898437 195265338 1509361039 1416443…
  • @Geraldine_VdAuwera I am sorry for my bad internet. I hope someone can provide some good ideas for the errors caused by mapping quality scores or some tools can fix these problems. Thanks !
  • @Geraldine_VdAuwera Yes, I converted with the solid2fastq program in the bfast aligner. I also ran into the same problem when processing Illumina HiSeq data, both with the -fixMisencodedQual argument and without any argument related to the quality scores. For t…
  • Thanks for your reply and sorry for my messy explanation. I just mapped my single-end reads with bfast (0.7.0-a), then sorted and removed duplicates with Picard (commands are as above). For subsequent programs, I will use GATK to do indel realignment, b…
  • @ami @Geraldine_VdAuwera I mapped my reads with tophat2. I have filtered these reads with a simple Perl script: perl -e 'while(<>){chomp;@tmp=split/\t/;if($tmp[0]=~/^\@/ or $tmp[5] !~ /(\d+)N(\d+)D(\d+)N/){print "$_\n"}}' input.sam >…
  • Hi all: SplitNCigarReads fails on my data (mapped with tophat2). It gives me an error like this: Cannot split this read (might be an empty section between Ns, for example 1N1D1N): 10M628N2D203N90M Should I filter these reads mapped by this sty…
  • Hi all: Can I use tophat2 instead of STAR?
  • Thanks! I solved my problem. But I also want to ask another question. Can I just use the AD to calculate the major allele frequency and minor allele frequency which didn't filter bad reads? I noted that sometimes the sum of AD is smaller than DP(w…
    in genotype Comment by zzq April 2014
  • Thanks! I am so happy that you can help me. I should mention that my data are from pooled sequencing. Is HaplotypeCaller suitable for this?
    in genotype Comment by zzq April 2014
  • I want to do exactly this. Did you solve this problem? I hope you can help me too. I have a VCF file and a mapping file (BAM), and I want to know the genotype for each site in the VCF file, including the ref-hom sites. Thanks!
  • I am so sorry that I did not explain my question in detail. For example, I have three samples. I run mapping, duplicate removal, realignment and recalibration for each sample separately, and then use UnifiedGenotyper to combine these three samples. The output VCF file contains these sampl…
    in genotype Comment by zzq April 2014
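
One of the comments above filters tophat2 alignments with a Perl one-liner that keeps SAM header lines and drops any read whose CIGAR string contains an N-D-N run (such as 10M628N2D203N90M, which SplitNCigarReads refuses to split). A minimal Python sketch of the same filter follows. The function names keep_sam_line and filter_sam are hypothetical, and treating a line with fewer than six columns as having an empty CIGAR is an assumption made to mirror the Perl behaviour.

```python
import re

# An N-D-N run anywhere in the CIGAR string, e.g. "628N2D203N".
NDN_PATTERN = re.compile(r"\d+N\d+D\d+N")


def keep_sam_line(line):
    """Return True for header lines and for reads without an N-D-N run."""
    if line.startswith("@"):
        return True  # SAM header lines are always kept
    fields = line.rstrip("\n").split("\t")
    # Column 6 of a SAM record is the CIGAR string; a short line is
    # treated as having an empty CIGAR (assumption, mirrors the Perl).
    cigar = fields[5] if len(fields) > 5 else ""
    return NDN_PATTERN.search(cigar) is None


def filter_sam(lines):
    """Yield only the lines that pass keep_sam_line, in input order."""
    for line in lines:
        if keep_sam_line(line):
            yield line
```

To use it on a file, open the SAM file for reading, pass the file object to filter_sam, and write each yielded line to the output file. Whether dropping these reads is appropriate for a given analysis is a separate question that the sketch does not answer.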