
gatk 4.1.0.0 CNNScoreVariants cannot reset read-limit

Hi,

I tried CNNScoreVariants and it ran fine with all default settings. But when I set --read-limit to 2048 instead of the default value of 128, the process failed. The following is the end of the journal file:

Sending:
[tool.sendAck()
]

nkm received

Traceback (most recent call last):
  File "", line 1, in
  File "/home/ying/anaconda2/envs/gatk/lib/python3.6/site-packages/vqsr_cnn/vqsr_cnn/inference.py", line 130, in score_and_write_batch
    [np.array(read_batch), np.array(annotation_batch)], batch_size=python_batch_size)
  File "/home/ying/anaconda2/envs/gatk/lib/python3.6/site-packages/keras/engine/training.py", line 1152, in predict
    x, _, _ = self._standardize_user_data(x)
  File "/home/ying/anaconda2/envs/gatk/lib/python3.6/site-packages/keras/engine/training.py", line 754, in _standardize_user_data
    exception_prefix='input')
  File "/home/ying/anaconda2/envs/gatk/lib/python3.6/site-packages/keras/engine/training_utils.py", line 136, in standardize_input_data
    str(data_shape))
ValueError: Error when checking input: expected read_tensor to have shape (128, 128, 15) but got array with shape (2048, 128, 15)
Received from stderr: [Traceback (most recent call last):
File "", line 1, in
]
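
To make the failure easier to discuss, here is a minimal sketch that reproduces the same kind of shape mismatch in isolation, using only NumPy. It assumes that the first dimension of read_tensor follows the read limit, which the error message suggests (128 expected, 2048 received). The helper `check_input_shape` is hypothetical and only mimics the kind of check that raised the ValueError; it is not the actual Keras or GATK code.

```python
import numpy as np

# Per-sample shape the error message says the model expects for read_tensor.
EXPECTED_SHAPE = (128, 128, 15)


def check_input_shape(batch, expected=EXPECTED_SHAPE):
    """Hypothetical helper: compare the per-sample shape of a batch
    (everything after the batch dimension) with the expected shape."""
    sample_shape = batch.shape[1:]
    if sample_shape != expected:
        raise ValueError(
            "expected read_tensor to have shape %s but got array with shape %s"
            % (expected, sample_shape)
        )
    return True


# A batch built with 128 reads per tensor passes the check.
default_batch = np.zeros((2, 128, 128, 15), dtype=np.float32)
default_ok = check_input_shape(default_batch)

# A batch built with 2048 reads per tensor is rejected.
big_batch = np.zeros((1, 2048, 128, 15), dtype=np.float32)
message = None
try:
    check_input_shape(big_batch)
except ValueError as err:
    message = str(err)
    print(message)
```

If this assumption holds, the model was built for a fixed number of reads per tensor, so changing the read limit at inference time alone changes the input array but not the shape the model accepts.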

I tried to change the read limit because my data are amplicon-based sequencing results with extremely high depth. I checked all the parameters and did not see any other one that relates to read depth.

Here is a sample of my job script:

gatk --java-options "-Djava.io.tmpdir=/mnt/scratch -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xms4G -Xmx60G" CNNScoreVariants \
  -I /cnn/IonXpress_056.9.hc.bam \
  -R /refGenome/gatk/hg19bundle/ucsc.hg19.fasta \
  -V /cnn/IonXpress_056.9.hc.raw.vcf \
  -O /cnn/IonXpress_056.9.hc.cnn.2048.vcf \
  -L /refGenome/PanelTargets/IAD174034_166_Designed.bed \
  --interval-padding 100 \
  --tensor-type read_tensor \
  --inference-batch-size 8 \
  --transfer-batch-size 32 \
  --read-limit 2048 \
  --enable-journal

Any suggestions?

Thanks,

Ying
