Filtering FASTQ files

Sir,
I hope this mail finds you well. Is there any way to filter my FASTQ files using GATK? I can filter SAM, BAM and VCF files using GATK and Picard, and I have read all the tutorials, but they only deal with .sam/.bam/.fasta/.vcf files. Could you please give me some suggestions?
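The tutorials mentioned above cover SAM/BAM/FASTA/VCF inputs, so the sketch below is a tool-independent alternative rather than a GATK feature. It is a minimal awk filter that keeps reads at least `min_len` bases long whose mean base quality is at least `min_q`. It assumes 4-line FASTQ records with Phred+33 qualities. The function name `filter_fastq` and both default thresholds are hypothetical choices for illustration, not recommendations.

```shell
# Hypothetical helper: keep FASTQ records with length >= min_len and
# mean Phred quality >= min_q. Assumes 4-line records, Phred+33 encoding.
filter_fastq() {
  min_len=${1:-30}   # illustrative default, not a recommendation
  min_q=${2:-20}     # illustrative default, not a recommendation
  awk -v min_len="$min_len" -v min_q="$min_q" '
    BEGIN {
      # Map each printable character to its Phred+33 score ("!" = 0).
      for (i = 33; i <= 126; i++) phred[sprintf("%c", i)] = i - 33
    }
    { line[NR % 4] = $0 }            # buffer the current record
    NR % 4 == 0 {
      # line[1] = header, line[2] = sequence, line[3] = "+", line[0] = qualities
      seq = line[2]; qual = line[0]
      n = length(qual)
      if (length(seq) < min_len || n == 0) next
      sum = 0
      for (i = 1; i <= n; i++) sum += phred[substr(qual, i, 1)]
      if (sum / n >= min_q) print line[1] "\n" seq "\n" line[3] "\n" qual
    }
  '
}
```

Usage: `filter_fastq 30 20 < reads_1.fastq > reads_1.filtered.fastq`. For paired-end data, filtering the two mate files independently can drop a read from one file but keep its mate in the other, which breaks the pairing, so mates need to be filtered together.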

Answers

  • Sheila (Broad Institute, Moderator)

    @zillurbmb51
    Hi,

    Can you tell me a little bit more about what exactly you are trying to accomplish?

    Thanks,
    Sheila

  • zillurbmb51 (USA)

    Thank you very much for your kind reply. I downloaded paired-end FASTQ files from: http://www.ebi.ac.uk/ena/data/view/Taxon:5864&portal=read_experiment&subtree=true
    I then aligned them against the FASTA assembly with Bowtie2, sorted the BAM file and generated a pileup with SAMtools, and called variants with BCFtools. However, my Ts/Tv ratio was below 1 (perhaps because of background noise).
    I need to process the datasets with GATK to get a higher Ts/Tv (above 2) for further analysis, as some of my peers have obtained. I will also analyse other apicomplexan genomes.

    Best Regards
    Zillur
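To check the Ts/Tv figure independently of the caller, the ratio can be counted directly from the VCF. Uniformly random substitutions give a Ts/Tv of about 0.5 (4 of the 12 possible single-base changes are transitions), so a value below 1 is consistent with a call set dominated by noise. The minimal awk sketch below (the function name `tstv` is hypothetical) assumes an uncompressed VCF on stdin. It counts A<->G and C<->T changes as transitions and every other single-base change as a transversion, and it skips header lines, indels and multi-allelic sites. It applies no FILTER check, so what it reports depends on which records you feed it.

```shell
# Hypothetical helper: Ts/Tv of biallelic SNVs in an uncompressed VCF on stdin.
# Transitions: A<->G, C<->T. All other single-base changes: transversions.
tstv() {
  awk -F'\t' '
    /^#/ { next }                                       # skip header lines
    {
      ref = toupper($4); alt = toupper($5)
      if (length(ref) != 1 || length(alt) != 1) next   # SNVs only (drops "A,G")
      if (ref !~ /^[ACGT]$/ || alt !~ /^[ACGT]$/) next # skip N, "*", "." etc.
      pair = ref alt
      if (pair == "AG" || pair == "GA" || pair == "CT" || pair == "TC") ts++
      else if (ref != alt) tv++
    }
    END {
      if (tv > 0) printf "Ts=%d Tv=%d Ts/Tv=%.2f\n", ts, tv, ts / tv
      else printf "Ts=%d Tv=%d Ts/Tv=NA\n", ts, tv
    }
  '
}
```

For example, `bcftools view ATCC.flt.vcf | tstv`. The `bcftools stats` command also reports Ts/Tv (its TSTV lines), which can serve as a cross-check.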

  • Sheila (Broad Institute, Moderator)

    @zillurbmb51
    Hi Zillur,

    Have a look at the Best Practices, which guides you through the variant calling process. https://www.broadinstitute.org/gatk/guide/best-practices?bpm=DNAseq
    We cannot help you if you are not following our recommendations.

    -Sheila

  • zillurbmb51 (USA)

    Thank you very much for your kind suggestions. After indel realignment, I tried to run base recalibration but got the following error: "MESSAGE: Argument with name 'KnownSites' isn't defined."
    Could you give me some suggestions about this?

    $ java -Xmx96g -jar /disk1/zillurdata/Apecomplexan/GenomeAnalysisTK.jar -T BaseRecalibrator -R /disk1/zillurdata/Apecomplexan/B_microti/Reference/B_microti.fasta -I /disk1/zillurdata/Apecomplexan/B_microti/Strain_ATCC_30222/ATCC_new.sorted.bam -KnownSites /disk1/zillurdata/Apecomplexan/B_microti/Strain_ATCC_30222/Filter/ATCC.flt.vcf -o ATCC_recal_data.table

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.4-0-g7e26428):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: Argument with name 'KnownSites' isn't defined.
    ERROR ------------------------------------------------------------------------------------------
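GATK argument names are case-sensitive: this command passes `-KnownSites`, whereas the later command in this thread passes `-knownSites` and gets past argument parsing. The POSIX-shell sketch below (the function `build_bqsr_cmd` is hypothetical) only prints, without running, a BaseRecalibrator command with one `-knownSites` per VCF, using only the arguments that appear in this thread. It assumes paths contain no spaces.

```shell
# Hypothetical helper: print (dry run) a GATK 3.x BaseRecalibrator command,
# repeating -knownSites once per VCF. Arguments: JAR REF BAM OUT VCF...
# Assumes no spaces in paths; nothing is executed.
build_bqsr_cmd() {
  jar=$1 ref=$2 bam=$3 out=$4
  shift 4
  cmd="java -Xmx96g -jar $jar -T BaseRecalibrator -R $ref -I $bam"
  for vcf in "$@"; do
    cmd="$cmd -knownSites $vcf"   # note the lower-case "k"
  done
  printf '%s -o %s\n' "$cmd" "$out"
}
```

Printing rather than executing lets you inspect the argument list before starting a long run.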

  • zillurbmb51 (USA)

    Thank you very much for your kind reply. I am sorry for that stupid question. Now I am facing another problem.

    $ java -Xmx96g -jar /disk1/zillurdata/Apecomplexan/GenomeAnalysisTK.jar -T BaseRecalibrator -R /disk1/zillurdata/Apecomplexan/B_microti/Reference/B_microti.fasta -I ATCC_realigned.bam -knownSites ATCC_filtered_indels.vcf -knownSites ATCC_filtered_snps.vcf -knownSites ATCC_filtered_snps_2nd.vcf -knownSites ATCC_filtered_snps_3rd.vcf -o recal.table

    This command gave me an error report:

    ERROR MESSAGE: SAM/BAM/CRAM file [email protected]9b2753b is malformed: Unable to find color space information in SOLiD read. First observed at read with name = SRR901611.100140231 Unfortunately this .bam file can not be recalibrated without color space information because of potential reference bias.

    I tried to solve it but failed. You gave a link in your answer to a previous question, but that link is no longer available:

    http://www.broadinstitute.org/gatk/gatkdocs/org_broadinstitute_sting_gatk_walkers_bqsr_BaseRecalibrator.html

    Best Regards
    Zillur
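One possible cause worth checking (a hypothesis, not confirmed in this thread): reads downloaded from SRA as base-space FASTQ no longer carry SOLiD color-space data, while the read group's platform (PL) is set to SOLID, so GATK expects color-space information that is not there. In SAM, color-space reads carry a `CS:Z` tag. The awk sketch below (the function name `check_solid` is hypothetical) reads SAM text, prints the PL value of each `@RG` header line, and counts alignment records without a `CS:Z` tag.

```shell
# Hypothetical helper: report @RG platforms and count reads lacking CS:Z.
# Input: SAM text with header, e.g. from `samtools view -h file.bam`.
check_solid() {
  awk -F'\t' '
    /^@RG/ {
      id = "?"; pl = "?"
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^ID:/) id = substr($i, 4)
        if ($i ~ /^PL:/) pl = substr($i, 4)
      }
      print "read group " id ": PL=" pl
      next
    }
    /^@/ { next }                    # other header lines
    {
      total++
      has_cs = 0
      # optional tags start at column 12
      for (i = 12; i <= NF; i++) if ($i ~ /^CS:Z:/) { has_cs = 1; break }
      if (!has_cs) missing++
    }
    END { printf "%d of %d reads lack a CS:Z tag\n", missing, total }
  '
}
```

For a quick look at a large BAM: `samtools view -h ATCC_realigned.bam | head -n 100000 | check_solid`. If the reads lack the tag, the GATK 3.x BaseRecalibrator documentation lists a `--solid_nocall_strategy` argument (with values such as `LEAVE_READ_UNRECALIBRATED` and `PURGE_READ`) that changes how such reads are handled; confirm the exact name and values with `--help` on your GATK build before relying on it.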

  • Sheila (Broad Institute, Moderator)

    @zillurbmb51
    Hi Zillur,

    Can you post the entire stack trace with the error?

    Thanks,
    Sheila

  • zillurbmb51 (USA)

    Thank you very much for your reply. Here is the full traceback:

    $ java -Xmx96g -jar /disk1/zillurdata/Apecomplexan/GenomeAnalysisTK.jar -T BaseRecalibrator -R /disk1/zillurdata/Apecomplexan/B_microti/Reference/B_microti.fasta -I ATCC_realigned.bam -knownSites ATCC_filtered_indels.vcf -knownSites ATCC_filtered_snps.vcf -knownSites ATCC_filtered_snps_2nd.vcf -knownSites ATCC_filtered_snps_3rd.vcf -o recal.table
    INFO 10:05:00,687 HelpFormatter - --------------------------------------------------------------------------------
    INFO 10:05:00,689 HelpFormatter - The Genome Analysis Toolkit (GATK) v3.4-0-g7e26428, Compiled 2015/05/15 03:25:41
    INFO 10:05:00,689 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 10:05:00,690 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
    INFO 10:05:00,693 HelpFormatter - Program Args: -T BaseRecalibrator -R /disk1/zillurdata/Apecomplexan/B_microti/Reference/B_microti.fasta -I ATCC_realigned.bam -knownSites ATCC_filtered_indels.vcf -knownSites ATCC_filtered_snps.vcf -knownSites ATCC_filtered_snps_2nd.vcf -knownSites ATCC_filtered_snps_3rd.vcf -o recal.table
    INFO 10:05:00,702 HelpFormatter - Executing as [email protected] on Linux 2.6.18-404.el5 amd64; OpenJDK 64-Bit Server VM 1.7.0_79-mockbuild_2015_04_14_19_36-b00.
    INFO 10:05:00,702 HelpFormatter - Date/Time: 2015/08/17 10:05:00
    INFO 10:05:00,703 HelpFormatter - --------------------------------------------------------------------------------
    INFO 10:05:00,703 HelpFormatter - --------------------------------------------------------------------------------
    INFO 10:05:00,778 GenomeAnalysisEngine - Strictness is SILENT
    INFO 10:05:00,857 GenomeAnalysisEngine - Downsampling Settings: No downsampling
    INFO 10:05:00,866 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 10:05:00,891 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.02
    INFO 10:05:01,023 GenomeAnalysisEngine - Preparing for traversal over 1 BAM files
    INFO 10:05:01,027 GenomeAnalysisEngine - Done preparing for traversal
    INFO 10:05:01,028 ProgressMeter - [INITIALIZATION COMPLETE; STARTING PROCESSING]
    INFO 10:05:01,028 ProgressMeter - | processed | time | per 1M | | total | remaining
    INFO 10:05:01,028 ProgressMeter - Location | reads | elapsed | reads | completed | runtime | runtime
    INFO 10:05:01,065 BaseRecalibrator - The covariates being used here:
    INFO 10:05:01,065 BaseRecalibrator - ReadGroupCovariate
    INFO 10:05:01,065 BaseRecalibrator - QualityScoreCovariate
    INFO 10:05:01,065 BaseRecalibrator - ContextCovariate
    INFO 10:05:01,065 ContextCovariate - Context sizes: base substitution model 2, indel substitution model 3
    INFO 10:05:01,066 BaseRecalibrator - CycleCovariate
    INFO 10:05:01,068 ReadShardBalancer$1 - Loading BAM index data
    INFO 10:05:01,069 ReadShardBalancer$1 - Done loading BAM index data
    WARN 10:05:03,967 RestStorageService - Error Response: PUT '/TEkOhlOAIxGQa7IPJmRWubL5lPsdiEgm.report.xml.gz' -- ResponseCode: 403, ResponseStatus: Forbidden, Request Headers: [Content-Length: 1026, Content-MD5: /ywfGljvuGArLL5/r2bmfw==, Content-Type: application/octet-stream, x-amz-meta-md5-hash: ff2c1f1a58efb8602b2cbe7faf66e67f, Date: Mon, 17 Aug 2015 14:05:03 GMT, Authorization: AWS AKIAI22FBBJ37D5X62OQ:v2C9wDDEVQyPbcCTbEvdvdawtYw=, User-Agent: JetS3t/0.8.1 (Linux/2.6.18-404.el5; amd64; en; JVM 1.7.0_79), Host: broad.gsa.gatk.run.reports.s3.amazonaws.com, Expect: 100-continue], Response Headers: [x-amz-request-id: 0813C1048C9010B0, x-amz-id-2: ZaayGTU5PohcAHxBNBkYqk9SoLCJE1u+2d9rSYxl97SIiHQ/r+fNKep/ZtShgvd9grUJYGGkPYA=, Content-Type: application/xml, Transfer-Encoding: chunked, Date: Tue, 18 Aug 2015 01:52:36 GMT, Connection: close, Server: AmazonS3]
    WARN 10:05:04,281 RestStorageService - Adjusted time offset in response to RequestTimeTooSkewed error. Local machine and S3 server disagree on the time by approximately 42451 seconds. Retrying connection.
    INFO 10:05:04,665 GATKRunReport - Uploaded run statistics report to AWS S3

    ERROR ------------------------------------------------------------------------------------------
    ERROR A USER ERROR has occurred (version 3.4-0-g7e26428):
    ERROR
    ERROR This means that one or more arguments or inputs in your command are incorrect.
    ERROR The error message below tells you what is the problem.
    ERROR
    ERROR If the problem is an invalid argument, please check the online documentation guide
    ERROR (or rerun your command with --help) to view allowable command-line arguments for this tool.
    ERROR
    ERROR Visit our website and forum for extensive documentation and answers to
    ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ERROR
    ERROR Please do NOT post this error to the GATK forum unless you have really tried to fix it yourself.
    ERROR
    ERROR MESSAGE: SAM/BAM/CRAM file [email protected]37fcb8c is malformed: Unable to find color space information in SOLiD read. First observed at read with name = SRR901611.100140231 Unfortunately this .bam file can not be recalibrated without color space information because of potential reference bias.
    ERROR ------------------------------------------------------------------------------------------

  • zillurbmb51 (USA)

    Hi there,
    Is there any solution to my problem? I would really appreciate your help.