Meaning of -nt and -nct

hrbigelow (San Francisco, Member)
edited September 2013 in Ask the GATK team

Hi,

According to http://www.broadinstitute.org/gatk/guide/article?id=1975:

There are two options for multi-threading with the GATK, controlled by the arguments -nt and -nct, respectively, which can be combined:

-nt / --num_threads controls the number of data threads sent to the processor
-nct / --num_cpu_threads_per_data_thread controls the number of CPU threads allocated to each data thread

Setup:
RHEL5, 144 GB memory, 12 cores (Intel 2.8 GHz)

~/src/jre1.7.0_40/bin/java -Xmx64g -Xms32g -d64 -jar /apps/gau/GATK_versions/GATKLite-2.1/GenomeAnalysisTKLite.jar -nt 8 -nct 6 -L chr8:90000001-120000000 -rbs 10000000 -T UnifiedGenotyper -rf BadCigar -R /dev/shm/CEUref.hg19.fasta -glm BOTH -D /dev/shm/dbsnp_135.hg19.reordered.vcf -metrics test.metrics.txt -stand_call_conf 30.0 -stand_emit_conf 10.0 -dcov 1000 --max_alternate_alleles 10 -A AlleleBalance -A AlleleBalanceBySample -A BaseCounts -A BaseQualityRankSumTest -A DepthOfCoverage -A DepthPerAlleleBySample -A FisherStrand -A HaplotypeScore -A HardyWeinberg -A IndelType -A LowMQ -A MappingQualityRankSumTest -A MappingQualityZero -A MappingQualityZeroBySample -A MappingQualityZeroFraction -A QualByDepth -A ReadPosRankSumTest -A RMSMappingQuality -A SampleList -o chunk55.vcf -I ./AC2181ACXX_DS-124072_GAGTGG_L006_001.markdup.fixed.left.recal.rehead.bam -I ./AC2181ACXX_DS-124113_GTCCGC_L001_001.markdup.fixed.left.recal.rehead.bam -I ./AC2181ACXX_DS-124080_ATTCCT_L008_001.markdup.fixed.left.recal.rehead.bam -I ./AD23GUACXX_DS-124122_AGTCAA_L005_001.markdup.fixed.left.recal.rehead.bam...

(with 116 bam files)

/usr/sbin/lsof -p <java process ID> | wc -l
returns 728 open files. (This turns out to be the 116 BAM files, each opened 8 times.)
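For anyone who wants to check how often each BAM is opened rather than just counting lines, here is a minimal Python sketch that tallies handles per file from lsof output. It assumes the default lsof column layout, where the file path (NAME) is the last column. The sample rows, paths, PID and the function names are made up for illustration and do not come from the run above.

```python
import subprocess
from collections import Counter


def bam_open_counts(lsof_output):
    """Count how many lsof rows refer to each path ending in .bam.

    Assumes the path is the last whitespace-separated field of each row.
    """
    counts = Counter()
    for line in lsof_output.splitlines():
        fields = line.split()
        if fields and fields[-1].endswith(".bam"):
            counts[fields[-1]] += 1
    return counts


def lsof_for_pid(pid, lsof_path="/usr/sbin/lsof"):
    """Return the raw text printed by `lsof -p <pid>` (not called in this sketch)."""
    result = subprocess.run([lsof_path, "-p", str(pid)],
                            capture_output=True, text=True)
    return result.stdout


# Hypothetical lsof output, for illustration only.
SAMPLE = """\
COMMAND  PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
java    4242 user  mem    REG    8,1   123456  111 /apps/example/tool.jar
java    4242 user   10r   REG    8,1  9999999  222 /data/sample_a.bam
java    4242 user   11r   REG    8,1  9999999  222 /data/sample_a.bam
java    4242 user   12r   REG    8,1  8888888  333 /data/sample_b.bam
java    4242 user   13r   REG    8,1  8888888  333 /data/sample_b.bam
"""

for path, n in sorted(bam_open_counts(SAMPLE).items()):
    print(f"{path}: opened {n} times")
```

On a live process, `bam_open_counts(lsof_for_pid(pid))` would give the per-file tally, which separates BAM handles from JARs, the reference, and other open files.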

When run with the following settings, I see some strange messages:

-nt 12 -nct 1
INFO 16:57:41,908 SAMDataSource - Running in asynchronous I/O mode; number of threads = 11

-nt 12 -nct 2
INFO 16:58:46,246 SAMDataSource - Running in asynchronous I/O mode; number of threads = 10
INFO 16:58:46,763 MicroScheduler - Running the GATK in parallel mode with 2 concurrent threads

and so on, until:

-nt 12 -nct 11
INFO 17:00:07,673 SAMDataSource - Running in asynchronous I/O mode; number of threads = 1
INFO 17:00:08,288 MicroScheduler - Running the GATK in parallel mode with 11 concurrent threads

-nt 12 -nct 13
ERROR MESSAGE: Invalid thread allocation. User requested 12 threads in total, but the count of cpu threads (13) is higher than the total threads

If -nt is the 'number of data threads', then why does SAMDataSource report <nt> - <nct> as the number of 'threads'?

If -nct is the 'number of CPU threads per data thread', then the total number of CPU threads running should really be <nt> * <nct>. Instead, it seems to be <nt> - <nct>, which makes no sense according to the definitions.
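To make the mismatch concrete, here is a small Python sketch that tabulates the thread counts from the log lines quoted above against the two candidate formulas. It only restates the observations in this post; it is not a description of how GATK allocates threads internally, and the function names are my own.

```python
# Observations copied from the log lines quoted in the post (all with -nt 12).
# Maps the -nct value to the "number of threads" reported by SAMDataSource.
NT = 12
OBSERVED_IO_THREADS = {1: 11, 2: 10, 11: 1}


def expected_total_cpu_threads(nt, nct):
    """Total CPU threads if -nct really means 'CPU threads per data thread'."""
    return nt * nct


def matches_difference_pattern(nt, observed):
    """True if every reported I/O thread count equals nt - nct."""
    return all(io == nt - nct for nct, io in observed.items())


for nct, io_threads in sorted(OBSERVED_IO_THREADS.items()):
    print(f"-nt {NT} -nct {nct}: reported = {io_threads}, "
          f"nt - nct = {NT - nct}, "
          f"nt * nct = {expected_total_cpu_threads(NT, nct)}")

print("all observations equal nt - nct:",
      matches_difference_pattern(NT, OBSERVED_IO_THREADS))
```

Every logged value fits nt - nct, and none fits nt * nct, which is the discrepancy with the documented definitions.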

http://www.broadinstitute.org/gatk/guide/article?id=1975 states:

Memory considerations for multi-threading
Each data thread needs to be given the full amount of memory you’d normally give a single run. So if you’re running a tool that normally requires 2 Gb of memory to run, if you use -nt 4, the multithreaded run will use 8 Gb of memory
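Taken at face value, the quoted rule is a simple multiplication. The Python sketch below works through the documentation's own example and, as an assumption on my part, applies the same rule in reverse to the -Xmx64g and -nt 8 settings from the command above. The function names are hypothetical, and real memory use may not scale this cleanly.

```python
def estimated_memory_gb(single_run_gb, nt):
    """Memory for a multi-threaded run under the quoted rule:
    each data thread needs the full single-run amount."""
    if nt < 1:
        raise ValueError("nt must be at least 1")
    return single_run_gb * nt


def per_data_thread_budget_gb(heap_gb, nt):
    """Heap available per data thread if a fixed heap is shared evenly
    (an assumption; the documentation does not state this)."""
    if nt < 1:
        raise ValueError("nt must be at least 1")
    return heap_gb / nt


# The documentation's example: a 2 GB tool run with -nt 4.
print(estimated_memory_gb(2, 4))          # 8

# The command in this post: -Xmx64g with -nt 8.
print(per_data_thread_budget_gb(64, 8))   # 8.0
```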

In any case, all other software I'm familiar with has no notion of a 'data thread', and it seems unnecessary and wasteful -- one simply specifies the inputs, and chooses a number of CPU threads, and the program handles the rest, without reading the same input multiple times.

Thanks,

Henry
