Picard IlluminaBasecallsToSam showing inconsistent XT tags in different runs

I am using Picard 2.17.10 to generate unmapped BAMs from raw Illumina data.

My command lines for ExtractIlluminaBarcodes and IlluminaBasecallsToSam:
java -jar picard.jar ExtractIlluminaBarcodes BASECALLS_DIR=Data/Intensities/BaseCalls/ LANE=1 READ_STRUCTURE=76T6B76T BARCODE_FILE=sample.barcode METRICS_FILE=metrics1.out COMPRESS_OUTPUTS=true NUM_PROCESSORS=0
java -jar picard.jar IlluminaBasecallsToSam BASECALLS_DIR=Data/Intensities/BaseCalls/ LANE=1 READ_STRUCTURE=76T6B76T RUN_BARCODE=firstrun IGNORE_UNEXPECTED_BARCODES=true LIBRARY_PARAMS=lane1.params ADAPTERS_TO_CHECK=PAIRED_END MAX_READS_IN_RAM_PER_TILE=1000000 MAX_RECORDS_IN_RAM=5000000 FORCE_GC=false

I am getting different XT tags across repeated runs on the same data. For example, in one run I have:
180307_NB551391_0005_AHYMM2AFXX:3:21403:25151:1041 77 * 0 0 * * 0 0 CCACAAATGCCGGTTCCCTTCTACAGGCCCAGTCGCCAGCTCAGAGGACACTCGATCTCCTGAGATCGGAAGAGCA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEA RG:Z:18030.3 XT:i:63
180307_NB551391_0005_AHYMM2AFXX:3:21403:25151:1041 141 * 0 0 * * 0 0 CAGGAGATCGAGTGTCCTCTGAGCTGGCGACTGGGCCTGTAGAANNNANCCNGCATTNGTGGAGATCGNNAGNGCG AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE###E#EE#EEEEE#EEEEEEEEEE##EE#EAE RG:Z:18030.3 XT:i:63
In another run, I have:
180307_NB551391_0005_AHYMM2AFXX:3:21403:25151:1041 77 * 0 0 * * 0 0 CCACAAATGCCGGTTCCCTTCTACAGGCCCAGTCGCCAGCTCAGAGGACACTCGATCTCCTGAGATCGGAAGAGCA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEEEEEEEEEEEEA RG:Z:18030.3
180307_NB551391_0005_AHYMM2AFXX:3:21403:25151:1041 141 * 0 0 * * 0 0 CAGGAGATCGAGTGTCCTCTGAGCTGGCGACTGGGCCTGTAGAANNNANCCNGCATTNGTGGAGATCGNNAGNGCG AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE###E#EE#EEEEE#EEEEEEEEEE##EE#EAE RG:Z:18030.3

From manual inspection, the records with the XT tag appear to be correct.
Why didn't the second run produce the tag? Is there some randomness
in IlluminaBasecallsToSam?
Does a missing XT tag matter in downstream
analysis if I follow the best practices?
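
The manual inspection can be reproduced with a short sketch. It assumes the library uses TruSeq-style adapters, whose read 1 and read 2 sequences both start with AGATCGGAAGAGC, and that XT holds the 1-based position at which adapter sequence begins (the convention Picard documents for this tag). The function name `adapter_start` is illustrative, and the exact-match search is far simpler than Picard's own adapter matching, which tolerates mismatches.

```python
# Assumption: TruSeq-style adapters; this is the 13-base prefix shared by
# the read 1 and read 2 adapter sequences.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def adapter_start(read_bases, adapter_prefix=ADAPTER_PREFIX):
    """Return the 1-based start of the first exact adapter match, or None.

    Illustrative only: Picard's real matching allows mismatches and
    partial adapters, so it can tag reads that this check misses.
    """
    index = read_bases.find(adapter_prefix)
    return index + 1 if index >= 0 else None


# Read 1 of the pair quoted in the question.
read1 = (
    "CCACAAATGCCGGTTCCCTTCTACAGGCCCAGTCGCCAGCTCAGAGGACA"
    "CTCGATCTCCTGAGATCGGAAGAGCA"
)

print(adapter_start(read1))
```

For read 1 of the pair shown above this prints 63, which agrees with the XT:i:63 reported in the first run.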

Should I stop using IlluminaBasecallsToSam for adapter marking and use
MarkIlluminaAdapters instead? Will MarkIlluminaAdapters produce
consistent XT tags across runs?

Thank you very much for your time

Answers

  • Sheila, Broad Institute (Member, Broadie)

    @ymc
    Hi,

    I need to check with the team and get back to you.

    -Sheila

  • ymc (Member)

    Thanks for your follow up

  • yfarjoun, Broad Institute (Dev)

    IBCTS should not have any non-deterministic behavior. Could you please supply the log files from the two different runs?

    Also, does the second run have any XT fields, or is it missing all of them?

    Do any reads have an XT field in the second run that is missing in the first run?

    Finally:

    I do not think that many programs use the XT field, so in any case, you do not need to worry about this (though it would be good to understand the source of the inconsistency...)
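
The questions above (whether the second run has XT tags at all, and which reads gain or lose the tag) can be answered with a small comparison script. The following is a minimal sketch, assuming both outputs have been dumped to tab-separated SAM text (for example with `samtools view`) and that each record is uniquely identified by its read name plus FLAG. The names `xt_by_read`, `compare_xt`, and `sam_line` are hypothetical helpers, not part of Picard.

```python
def xt_by_read(sam_lines):
    """Map (read name, FLAG) -> XT value, or None when the tag is absent."""
    tags = {}
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        xt = None
        for opt in fields[11:]:  # optional fields start at column 12
            if opt.startswith("XT:i:"):
                xt = int(opt[5:])
        tags[(fields[0], int(fields[1]))] = xt
    return tags


def compare_xt(run_a, run_b):
    """Report records whose XT tag differs between two runs."""
    a, b = xt_by_read(run_a), xt_by_read(run_b)
    report = {"only_in_a": [], "only_in_b": [], "different_value": []}
    for key in sorted(a.keys() & b.keys()):
        if a[key] == b[key]:
            continue
        if b[key] is None:
            report["only_in_a"].append(key)
        elif a[key] is None:
            report["only_in_b"].append(key)
        else:
            report["different_value"].append(key)
    return report


def sam_line(name, flag, *tags):
    # Hypothetical helper: build a minimal unmapped SAM record for the demo.
    return "\t".join(
        [name, str(flag), "*", "0", "0", "*", "*", "0", "0", "ACGT", "EEEE", *tags]
    )


# Made-up records for illustration, not data from the thread.
run_a = [
    sam_line("read1", 77, "RG:Z:x", "XT:i:63"),
    sam_line("read1", 141, "RG:Z:x", "XT:i:63"),
    sam_line("read2", 77, "RG:Z:x"),
]
run_b = [
    sam_line("read1", 77, "RG:Z:x"),
    sam_line("read1", 141, "RG:Z:x", "XT:i:63"),
    sam_line("read2", 77, "RG:Z:x", "XT:i:70"),
]

report = compare_xt(run_a, run_b)
print(report)
```

The sketch keeps every record in memory, so for a lane with tens of millions of read pairs it is better applied to one tile or one read group at a time.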

  • ymc (Member)

    Thanks for your reply. The second run still has XT fields. Sometimes they match the first run, but sometimes the second run has an XT field where the first doesn't, and vice versa.

    I also noticed minor differences in my downstream BAMs. When I traced them back,
    the source of the discrepancy appears to be IBCTS.

    First run log omitting the part showing progress:

    • java -Xmx128g -Djava.io.tmpdir=/dev/shm/gatk-temp -jar /usr/bin/picard/picard-2.17.10.jar IlluminaBasecallsToSam BASECALLS_DIR=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/Data/Intensities/BaseCalls/ LANE=1 READ_STRUCTURE=76T6B76T RUN_BARCODE=180307_NB551391_0005_AHYMM2AFXX IGNORE_UNEXPECTED_BARCODES=true LIBRARY_PARAMS=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/lane1.params ADAPTERS_TO_CHECK=PAIRED_END MAX_READS_IN_RAM_PER_TILE=1000000 MAX_RECORDS_IN_RAM=5000000 FORCE_GC=false BARCODES_DIR=./barcodes_dir
      09:56:33.354 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/bin/picard/picard-2.17.10.jar!/com/intel/gkl/native/libgkl_compression.so
      [Wed Aug 08 09:56:33 HKT 2018] IlluminaBasecallsToSam BASECALLS_DIR=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/Data/Intensities/BaseCalls BARCODES_DIR=./barcodes_dir LANE=1 RUN_BARCODE=180307_NB551391_0005_AHYMM2AFXX READ_STRUCTURE=76T6B76T LIBRARY_PARAMS=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/lane1.params ADAPTERS_TO_CHECK=[INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM, PAIRED_END] FORCE_GC=false MAX_READS_IN_RAM_PER_TILE=1000000 IGNORE_UNEXPECTED_BARCODES=true MAX_RECORDS_IN_RAM=5000000 SEQUENCING_CENTER=BI PLATFORM=illumina NUM_PROCESSORS=0 APPLY_EAMSS_FILTER=true MINIMUM_QUALITY=2 INCLUDE_NON_PF_READS=true MOLECULAR_INDEX_TAG=RX MOLECULAR_INDEX_BASE_QUALITY_TAG=QX VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
      [Wed Aug 08 09:56:33 HKT 2018] Executing as [email protected] on Linux 3.10.0-862.3.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_171-b10; Deflater: Intel; Inflater: Intel; Picard version: 2.17.10-SNAPSHOT
      [Wed Aug 08 09:58:54 HKT 2018] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 2.35 minutes.
      Runtime.totalMemory()=55751737344

    Second run log omitting the part showing progress:

    • java -Xmx128g -Djava.io.tmpdir=/dev/shm/gatk-temp -jar /usr/bin/picard/picard-2.17.10.jar IlluminaBasecallsToSam BASECALLS_DIR=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/Data/Intensities/BaseCalls/ LANE=1 READ_STRUCTURE=76T6B76T RUN_BARCODE=180307_NB551391_0005_AHYMM2AFXX IGNORE_UNEXPECTED_BARCODES=true LIBRARY_PARAMS=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/lane1.params ADAPTERS_TO_CHECK=PAIRED_END MAX_READS_IN_RAM_PER_TILE=1000000 MAX_RECORDS_IN_RAM=5000000 FORCE_GC=false BARCODES_DIR=./barcodes_dir
      10:07:03.967 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/usr/bin/picard/picard-2.17.10.jar!/com/intel/gkl/native/libgkl_compression.so
      [Wed Aug 08 10:07:03 HKT 2018] IlluminaBasecallsToSam BASECALLS_DIR=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/Data/Intensities/BaseCalls BARCODES_DIR=./barcodes_dir LANE=1 RUN_BARCODE=180307_NB551391_0005_AHYMM2AFXX READ_STRUCTURE=76T6B76T LIBRARY_PARAMS=/data/seqcap1/180307_NB551391_0005_AHYMM2AFXX/lane1.params ADAPTERS_TO_CHECK=[INDEXED, DUAL_INDEXED, NEXTERA_V2, FLUIDIGM, PAIRED_END] FORCE_GC=false MAX_READS_IN_RAM_PER_TILE=1000000 IGNORE_UNEXPECTED_BARCODES=true MAX_RECORDS_IN_RAM=5000000 SEQUENCING_CENTER=BI PLATFORM=illumina NUM_PROCESSORS=0 APPLY_EAMSS_FILTER=true MINIMUM_QUALITY=2 INCLUDE_NON_PF_READS=true MOLECULAR_INDEX_TAG=RX MOLECULAR_INDEX_BASE_QUALITY_TAG=QX VERBOSITY=INFO QUIET=false VALIDATION_STRINGENCY=STRICT COMPRESSION_LEVEL=5 CREATE_INDEX=false CREATE_MD5_FILE=false GA4GH_CLIENT_SECRETS=client_secrets.json USE_JDK_DEFLATER=false USE_JDK_INFLATER=false
      [Wed Aug 08 10:07:03 HKT 2018] Executing as [email protected] on Linux 3.10.0-862.3.2.el7.x86_64 amd64; OpenJDK 64-Bit Server VM 1.8.0_171-b10; Deflater: Intel; Inflater: Intel; Picard version: 2.17.10-SNAPSHOT
      [Wed Aug 08 10:09:39 HKT 2018] picard.illumina.IlluminaBasecallsToSam done. Elapsed time: 2.59 minutes.
      Runtime.totalMemory()=56627298304

  • yfarjoun, Broad Institute (Dev)

    This is quite odd. I'm not sure I can help...

  • ymc (Member)

    Could you run it on a NextSeq 2x75 High/Mid dataset?
    The discrepancy is rare: it affected only 35 out of 48M read pairs in my case.

  • Sheila, Broad Institute (Member, Broadie)

    @ymc
    Hi,

    Considering that this is such a rare event, we can’t justify allocating resources to investigating it, but you could file a bug report in the Picard repository. Perhaps someone there will be interested in it.

    -Sheila
