IntegrationSiteMapper

Detect and summarize transgene integration

Category Public/Stable Tools

Overview

This tool was originally designed to inspect the results of Tag-PCR (an assay to measure transgene integration into the genome); however, in theory it could be used with any assay providing similar data. At a high level, it uses the terminal sequence(s) of your transgene to search BAM reads for those containing genome-transgene junctions. It determines the orientation of each integration event and can create a table of these locations. It can additionally reconstruct the genome/transgene sequences for each unique site (which can be tricky when considering orientation). Finally, it can optionally use these reconstructed sequences to design PCR validation primers (using Primer3) and BLAST those primers against the host genome as validation. It provides a very detailed output, but makes some assumptions and requires specific inputs:

The BAM must be queryName sorted, and the alignments for each read are inspected together. The tool assumes the genome is the reference for that organism. The reference may also contain the transgene and/or delivery vector, but this is not required and can be more complicated to maintain.
Only first-mate reads are considered; reverse reads are ignored by default (see --include-reverse-reads).
Each alignment is inspected (including cropped bases) for the presence of a short sequence expected to be at the insert/genome junction.
*IMPORTANT* The orientation of each hit is used to determine the orientation of the transgene in the genome. Each query sequence is associated with one end of the transgene
The hits are summarized based on the number of reads per genomic position (junction border)
If --primer3-path and --primer-pair-table are provided, the tool will iterate over each passing integration site, extract the upstream and downstream regions (+/- 1000bp), and design primer pairs that sit inside the transgene and flanking genomic region.
If --blastn-path and --blast-db-path are provided, the tool will BLAST putative primers against the reference database, and any primer with multiple hits will be flagged/discarded, along with non-full-length primers that have a perfect match at the 3' end.
To aid in inspecting the results, a GenBank file can also be created (--genbank-output), which has one record per insert region, with the transgene region highlighted. If primers were designed, these will also appear.
An optional table can be produced with summary stats for the run (see --metrics-table)
Putative hits can be filtered, see: --reads-to-output, --min-alignments, --min-fraction, and --min-mapq
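
The summarization and filtering steps listed above can be sketched roughly as follows. This is an illustration only, not the tool's implementation: the function summarize_sites, the (contig, position) representation of a junction, and the reading of --min-fraction as "reads at this site divided by all junction-supporting reads" are assumptions made for the example. The default thresholds mirror the documented defaults for --min-alignments (3) and --min-fraction (0.0).

```python
from collections import Counter


def summarize_sites(junction_hits, min_alignments=3, min_fraction=0.0):
    """Count hits per (contig, position) and keep sites passing both thresholds.

    junction_hits: iterable of (contig, position) tuples, one per supporting read.
    Returns a list of (contig, position, read_count, fraction), sorted by locus.
    """
    counts = Counter(junction_hits)
    total = sum(counts.values())
    passing = []
    for (contig, position), n in sorted(counts.items()):
        # Assumed definition: share of all junction-supporting reads at this site.
        fraction = n / total if total else 0.0
        if n >= min_alignments and fraction >= min_fraction:
            passing.append((contig, position, n, fraction))
    return passing


# Hypothetical junction hits: three candidate sites with 5, 2 and 3 reads.
hits = [("chr1", 1000)] * 5 + [("chr2", 250)] * 2 + [("chr3", 42)] * 3
for site in summarize_sites(hits):
    print(site)
```

With the default thresholds, the chr2 site (two reads) falls below the minimum of three alignments and is dropped; raising min_fraction would further restrict the list to the dominant sites.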

Simplest Usage:

  java -jar DISCVRseq.jar IntegrationSiteMapper \
     -R currentGenome.fasta \
     -b myBam.bam \
     --output-table output.txt \
     --insert-name piggybac

The file output.txt will contain a table summarizing the predicted integration sites.

Using More Advanced Features to Reconstruct Transgene/Genome Sequences and Design PCR primers:

  java -jar DISCVRseq.jar IntegrationSiteMapper \
     -R currentGenome.fasta \
     -b myBam.bam \
     --output-table output.txt \
     --primer-pair-table primer_summary.txt \
     --primer3-path /usr/bin/primer3_core \
     --genbank-output output.gb \
     --insert-name lentivirus

To generate the full tool output, IntegrationSiteMapper requires a file with a detailed description of the transgene and expected junction sites. Depending on your delivery system, this is likely specific to your plasmid/vector. IntegrationSiteMapper includes two built-in transgene schemes, and these can be output to a file as a reference:

  java -jar DISCVRseq.jar IntegrationSiteMapper \
     --write-default-descriptors outputFile.yml

which creates:

 name: Lentivirus
 junctions:
 - name: LV-3LTR
   searchStrings: [ AGTGTGGAAAATCTCTAGCA ]
   invertHitOrientation: false
 - name: LV-5LTR
   searchStrings: [ TGGAAGGGCTAATTCACTCC ]
   invertHitOrientation: true
 insertUpstreamRegion:
   name: LV-5LTR
   sequence: TGGAAGGGCTAATTCACTCCCAAAGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTAGCAGAACTACACACCAGGGCCAGGGGTCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCCAGATAAGGTAGAAGAGGCCAATAAAGGAGAGAACACCAGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTGTTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCACGTGGCCCGAGAGCTGCATCCGGAGTACTTCAAGAACTGCTGATATCGAGCTTGCTACAAGGGACTTTCCGCTGGGGACTTTCCAGGGAGGCGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATCCTGCATATAAGCAGCTGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCA
 insertDownstreamRegion:
   name: LV-3LTR
   sequence: TGGAAGGGCTAATTCACTCCCAACGAAGACAAGATATCCTTGATCTGTGGATCTACCACACACAAGGCTACTTCCCTGATTAGCAGAACTACACACCAGGGCCAGGGGTCAGATATCCACTGACCTTTGGATGGTGCTACAAGCTAGTACCAGTTGAGCCAGATAAGGTAGAAGAGGCCAATAAAGGAGAGAACACCAGCTTGTTACACCCTGTGAGCCTGCATGGGATGGATGACCCGGAGAGAGAAGTGTTAGAGTGGAGGTTTGACAGCCGCCTAGCATTTCATCACGTGGCCCGAGAGCTGCATCCGGAGTACTTCAAGAACTGCTGATATCGAGCTTGCTACAAGGGACTTTCCGCTGGGGACTTTCCAGGGAGGCGTGGCCTGGGCGGGACTGGGGAGTGGCGAGCCCTCAGATCCTGCATATAAGCAGCTGCTTTTTGCCTGTACTGGGTCTCTCTGGTTAGACCAGATCTGAGCCTGGGAGCTCTCTGGCTAACTAGGGAACCCACTGCTTAAGCCTCAATAAAGCTTGCCTTGAGTGCTTCAAGTAGTGTGTGCCCGTCTGTTGTGTGACTCTGGTAACTAGAGATCCCTCAGACCCTTTTAGTCAGTGTGGAAAATCTCTAGCA
 internalPrimers:
 - name: LV-3LTR-Outer
   sequence: GAGAGCTGCATCCGGAGTAC
 - name: LV-3LTR-Inner
   sequence: TAGTGTGTGCCCGTCTGTTG
 - name: LV-5LTR-Outer
   sequence: TCCTCTGGTTTCCCTTTCGC
 - name: LV-5LTR-Inner
   sequence: AAGCAGTGGGTTCCCTAGTT

 name: PiggyBac
 junctions:
 - name: PB-3TR
   searchStrings: [ GCAGACTATCTTTCTAGGGTTAA ]
   invertHitOrientation: false
 - name: PB-5TR
   searchStrings: [ ATGATTATCTTTCTAGGGTTAA ]
   invertHitOrientation: true
 insertUpstreamRegion:
   name: PB-5TR
   sequence: TTAACCCTAGAAAGATAATCATATTGTGACGTACGTTAAAGATAATCATGTGTAAAATTGACGCATGTGTTTTATCGGTCTGTATATCGAGGTTTATTTATTAATTTGAATAGATATTAAGTTTTATTATATTTACACTTACATACTAATAATAAATTCAACAAACAATTTATTTATGTTTATTTATTTATTAAAAAAAACAAAAACTCAAAATTTCTTCTATAAAGTAACAAAACTTTTATGAGGGACAGCCCCCCCCCAAAGCCCCCAGGGATGTAATTACGTCCCTCCCCCGCTAGGGGGCAGCAGCGAGCCGCCCGGGGCTCCGCTCCGGTCCGGCGCTCCCCCCGCATCCCCGAGCCGGCAGCGTGCGGGGACAGCCCGGGCACGGGGAAGGTGGCACGGGATCGCTTTCCTCTGAACGCTTCTCGCTGCTCTTTGAGCCTGCAGACACCTGGGGGGATA
 insertDownstreamRegion:
   name: PB-3TR
   sequence: CGTAAAAGATAATCATGCGTCATTTTGACTCACGCGGTCGTTATAGTTCAAAATCAGTGACACTTACCGCATTGACAAGCACGCCTCACGGGAGCTCCAAGCGGCGACTGAGATGTCCTAAATGCACAGCGACGGATTCGCGCTATTTAGAAAGAGAGAGCAATATTTCAAGAATGCATGCGTCAATTTTACGCAGACTATCTTTCTAGGGTTAA
 internalPrimers:
 - name: PB-3TR-Nested4
   sequence: CATTGACAAGCACGCCTCAC
 - name: PB-NTSR2-R2
   sequence: GCGACGGATTCGCGCTATTT
 - name: PB-3TR-Nested
   sequence: ATTTCAAGAATGCATGCGTCA
 - name: PB-5TR-New
   sequence: CACATGATTATCTTTAACGTACGTCAC
 - name: PB-RCR1
   sequence: GACCGATAAAACACATGCGTCA
 backboneSearchStrings: [ TCTAGCTGCATCAGGATCAT, TCAGGATCATATCGTCGGGT, TCTAGCTGCGTGTTCTGCAG, GTGTTCTGCAGCGTGTCGAG ]

Each block represents the definition of one transgene type, and many sections are optional (although required for certain functions). For the simplest usage (creating a list of integration sites), relatively little is needed. If you want the tool to reconstruct the flanking sequence and/or generate primers, a more complete definition is needed. Pay attention to the orientation of the sequences:

name: required. A name for this transgene type
junctions: required. This lists each junction type to inspect (typically the 5' and 3' junctions). Each junction contains the properties:

name: A name to identify it. Should be used consistently with insertUpstreamRegion and insertDownstreamRegion
searchStrings: These are sequences that mark the expected termination of the transgene. The orientation of the sequence is important. Assuming the transgene integrates in the forward orientation, the sequence representing the 3' (downstream) end of the transgene insert should be in the forward orientation. The sequence representing the 5' (upstream) end of the transgene should be reverse-complemented relative to the transgene sequence, but with invertHitOrientation=true. This is necessary to create the proper orientation when concatenating the transgene and genomic flanking region.
invertHitOrientation: optional. If true, a forward orientation match of the search string denotes an inverted integration event. This is generally what you want for the 5' junction of the transgene.

insertUpstreamRegion: If provided, when attempting to reconstruct the genome/transgene border, this sequence is concatenated to the flanking genomic sequence, like: (GenomicFlank)(insertUpstreamRegion). In the PiggyBac PB-5TR, the starting TTAA represents the expected integration border.
insertDownstreamRegion: If provided, when attempting to reconstruct the genome/transgene border, this sequence is concatenated to the flanking genomic sequence, like: (insertDownstreamRegion)(GenomicFlank). In the PiggyBac PB-3TR, the ending TTAA represents the expected integration border.
internalPrimers: An optional list of existing internal PCR primers for the transgene. If primer prediction is selected, these primers will be annotated in the resulting genbank output, which can be useful for selection of primer pairs.
backboneSearchStrings: An optional list of strings to use to identify non-integrated transgene (such as the source plasmid). If a given read contains any of these sequences, which could typically be short fragments representing the vector backbone, it will be flagged as such.
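
The orientation rules above can be checked against the built-in PiggyBac descriptor. The sketch below copies the terminal bases of the PB-5TR and PB-3TR regions from that descriptor and confirms that the 5' search string is the reverse complement of the start of insertUpstreamRegion, while the 3' search string matches the end of insertDownstreamRegion directly. The helper names and the short genomic flanks are hypothetical and exist only for illustration.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Terminal bases copied from the built-in PiggyBac descriptor.
PB_5TR_START = "TTAACCCTAGAAAGATAATCAT"    # first 22 bases of insertUpstreamRegion
PB_3TR_END = "GCAGACTATCTTTCTAGGGTTAA"     # last 23 bases of insertDownstreamRegion

PB_5TR_SEARCH = "ATGATTATCTTTCTAGGGTTAA"   # searchStrings entry for PB-5TR
PB_3TR_SEARCH = "GCAGACTATCTTTCTAGGGTTAA"  # searchStrings entry for PB-3TR

# 5' junction: the search string is reverse-complemented relative to the transgene.
print(reverse_complement(PB_5TR_START) == PB_5TR_SEARCH)
# 3' junction: the search string is in the forward orientation.
print(PB_3TR_END == PB_3TR_SEARCH)


def reconstruct_borders(upstream_flank, insert_upstream, insert_downstream, downstream_flank):
    """Concatenate in the documented order:
    (GenomicFlank)(insertUpstreamRegion) and (insertDownstreamRegion)(GenomicFlank)."""
    return upstream_flank + insert_upstream, insert_downstream + downstream_flank


# Hypothetical 8 bp genomic flanks, for illustration only.
five_prime, three_prime = reconstruct_borders("ACGTACGT", PB_5TR_START, PB_3TR_END, "TGCATGCA")
print(five_prime)
print(three_prime)
```

The same check is a quick way to sanity-test a custom descriptor written to follow this convention before running the tool. Note that the TTAA integration border sits at the start of the 5' region and at the end of the 3' region, directly adjacent to the genomic flank in each reconstructed sequence.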

Finally, IntegrationSiteMapper can be run using a custom transgene definition:

  java -jar DISCVRseq.jar IntegrationSiteMapper \
     -R currentGenome.fasta \
     -b myBam.bam \
     --output-table output.txt \
     --insert-definition outputFile.yml
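
A minimal custom definition could be drafted programmatically, as in the sketch below. It emits only the two keys documented as required (name and junctions), laid out like the built-in descriptors shown earlier. The function render_descriptor, the transgene name MyTransposon, and the junction names MT-3TR and MT-5TR are hypothetical; the search strings are simply reused from the PiggyBac example. The exact layout the tool accepts should be confirmed with --validate-descriptor-only.

```python
def render_descriptor(name, junctions):
    """Render a minimal descriptor with only the required 'name' and 'junctions' keys.

    junctions: list of (junction_name, [search_strings], invert_hit_orientation).
    """
    lines = [f"name: {name}", "junctions:"]
    for junction_name, search_strings, invert in junctions:
        lines.append(f"- name: {junction_name}")
        lines.append(f"  searchStrings: [ {', '.join(search_strings)} ]")
        lines.append(f"  invertHitOrientation: {'true' if invert else 'false'}")
    return "\n".join(lines) + "\n"


# Hypothetical transgene; the sequences are reused from the PiggyBac descriptor.
text = render_descriptor(
    "MyTransposon",
    [
        ("MT-3TR", ["GCAGACTATCTTTCTAGGGTTAA"], False),
        ("MT-5TR", ["ATGATTATCTTTCTAGGGTTAA"], True),
    ],
)
print(text)
```

The printed text can be saved to a file and passed to --insert-definition. A definition this small should be enough for the simplest usage (the table of integration sites); sequence reconstruction and primer design need the additional insertUpstreamRegion and insertDownstreamRegion sections described above.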

Additional Information

Genome/Reference Files

Please note that if this tool uses a reference genome, that FASTA must be indexed with samtools and must have a sequence dictionary created with Picard.
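
A quick pre-flight check for these companion files might look like the sketch below. It assumes the conventional naming: the samtools index sits next to the FASTA as <fasta>.fai, and the sequence dictionary uses the FASTA name with its extension replaced by .dict. It also assumes an uncompressed FASTA. The helper name missing_reference_companions is hypothetical.

```python
from pathlib import Path


def missing_reference_companions(fasta):
    """Return the expected index/dictionary paths that do not exist for this FASTA."""
    fasta = Path(fasta)
    expected = [
        Path(str(fasta) + ".fai"),   # FASTA index, as written by samtools faidx
        fasta.with_suffix(".dict"),  # sequence dictionary, as written by Picard
    ]
    return [str(p) for p in expected if not p.exists()]


missing = missing_reference_companions("currentGenome.fasta")
if missing:
    print("Missing reference companion files:", ", ".join(missing))
else:
    print("Reference index and sequence dictionary found.")
```

Running this before launching the tool surfaces a missing index or dictionary early, rather than as an error after the job has started.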

Read filters

This Read Filter is automatically applied to the data by the Engine before processing by IntegrationSiteMapper.

WellformedReadFilter

IntegrationSiteMapper specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list below the table.

Argument name(s)	Default value	Summary
Required Arguments
--reference -R	null	Reference sequence file
Optional Tool Arguments
--allow-zero-mapq	false	Allow alignments where MAPQ=0. Some aligners, like BWA-mem, report multi-mapping reads as MAPQ=0
--arguments_file	[]	read one or more arguments files and add them to the command line
--backbone-sequences -bs	[]	If provided, this tool will scan reads for the presence of any of these strings (perfect match-only, but also inspecting for reverse-complement). If found, the read will be counted as overlapping the backbone. This can be useful if the delivery system is a vector, and would allow detection of non-integrated vector
--bam -b	null	A BAM file with alignments to be inspected
--blast-db-path -bdb	null	In order for this tool to use BLAST to validate the primers by detecting alternate binding sites, the path to a BLAST DB compiled against this reference FASTA must be provided
--blast-threads -bt	null	If BLAST will be used, this value is passed to the -num_threads argument of blastn.
--blastn-path -bn	null	The path to the blastn executable. This is required for BLAST validation to be performed against putative primers. If blastn is in your $PATH, it will be picked up. Alternately, the environment variable BLASTN_PATH can be set, pointing to the blastn executable.
--cloud-index-prefetch-buffer -CIPB	-1	Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset.
--cloud-prefetch-buffer -CPB	40	Size of the cloud-only prefetch buffer (in MB; 0 to disable).
--disable-bam-index-caching -DBIC	false	If true, don't cache bam indexes, this will reduce memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified.
--disable-sequence-dictionary-validation	false	If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk!
--gcs-max-retries -gcs-retries	20	If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection
--gcs-project-for-requester-pays	""	Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.
--genbank-output -g	null	File to which a summary of integration sites, flanking sequence and optionally primers should be written in genbank format
--help -h	false	display the help message
--include-reverse-reads -ir	false	By default, reverse reads are skipped. If true, reverse reads will be included, which might be necessary for some chemistries.
--include-sa -sa	false	If provided, this tool will inspect reads for supplemental alignments (SA tag) and parse these as well
--insert-definition -id	[]	One or more files describing the insert and transgene/genome junctions. See --write-default-descriptors and --validate-descriptor-only
--insert-name	[]	The name of one of the built-in insert descriptors to use. See --write-default-descriptors for available types
--interval-merging-rule -imr	ALL	Interval merging rule for abutting intervals
--intervals -L	[]	One or more genomic intervals over which to operate
--metrics-table -mt	null	File to which a TSV of summary metrics should be written
--min-alignments -ma	3	The minimum number of alignments required at a position to report
--min-fraction -mf	0.0	Only sites with at least this fraction of total reads will be reported
--min-mapq -mmq	20	The minimum MAPQ to consider an alignment
--output-table -o	null	File to which TSV output should be written
--primer-pair-table -pt	null	File to which TSV summarizing potential primer pairs should be written
--primer3-path -p3	null	In order for this tool to design validation primers, the path to primer3 must be provided. If primer3 is in your $PATH, it will be picked up. Alternately, the environment variable PRIMER3_PATH can be set, pointing to the primer3 executable.
--reads-to-output -ro	0	If greater than zero, up to this many reads will be written as a FASTA file for each site. This can be useful to validate the junction border
--sites-only-vcf-output	false	If true, don't emit genotype fields when writing vcf file output.
--validate-descriptor-only	false	If provided, the tool will simply validate the insert definition files (see --insert-definition) and exit
--version	false	display the version number for this tool
--write-default-descriptors	null	If provided, the tool will write the YAML for the default descriptors to this file and exit. This can be useful as a guide for writing your own descriptors (which is likely needed).
Optional Common Arguments
--add-output-sam-program-record	true	If true, adds a PG tag to created SAM/BAM/CRAM files.
--add-output-vcf-command-line	true	If true, adds a command line header line to created VCF files.
--create-output-bam-index -OBI	true	If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.
--create-output-bam-md5 -OBM	false	If true, create an MD5 digest for any BAM/SAM/CRAM file created
--create-output-variant-index -OVI	true	If true, create a VCF index when writing a coordinate-sorted VCF file.
--create-output-variant-md5 -OVM	false	If true, create an MD5 digest for any VCF file created.
--disable-read-filter -DF	[]	Read filters to be disabled before analysis
--disable-tool-default-read-filters	false	Disable all tool default read filters (WARNING: many tools will not function correctly without their default read filters on)
--exclude-intervals -XL	[]	One or more genomic intervals to exclude from processing
--gatk-config-file	null	A configuration file to use with the GATK.
--input -I	[]	BAM/SAM/CRAM file containing reads
--interval-exclusion-padding -ixp	0	Amount of padding (in bp) to add to each interval you are excluding.
--interval-padding -ip	0	Amount of padding (in bp) to add to each interval you are including.
--interval-set-rule -isr	UNION	Set merging approach to use for combining interval inputs
--inverted-read-filter -XRF	[]	Inverted (with flipped acceptance/failure conditions) read filters applied before analysis (after regular read filters).
--lenient -LE	false	Lenient processing of VCF files
--max-variants-per-shard	0	If non-zero, partitions VCF output into shards, each containing up to the given number of records.
--QUIET	false	Whether to suppress job-summary info on System.err.
--read-filter -RF	[]	Read filters to be applied before analysis
--read-index	[]	Indices to use for the read inputs. If specified, an index must be provided for every read input and in the same order as the read inputs. If this argument is not specified, the path to the index for each input will be inferred automatically.
--read-validation-stringency -VS	SILENT	Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--seconds-between-progress-updates	10.0	Output traversal statistics every time this many seconds elapse
--sequence-dictionary	null	Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a .dict file.
--tmp-dir	null	Temp directory to use.
--use-jdk-deflater -jdk-deflater	false	Whether to use the JdkDeflater (as opposed to IntelDeflater)
--use-jdk-inflater -jdk-inflater	false	Whether to use the JdkInflater (as opposed to IntelInflater)
--verbosity	INFO	Control verbosity of logging.
Advanced Arguments
--showHidden	false	display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Inherited arguments above.

--add-output-sam-program-record / -add-output-sam-program-record

If true, adds a PG tag to created SAM/BAM/CRAM files.

boolean true

--add-output-vcf-command-line / -add-output-vcf-command-line

If true, adds a command line header line to created VCF files.

boolean true

--allow-zero-mapq / NA

Allow alignments where MAPQ=0. Some aligners, like BWA-mem, report multi-mapping reads as MAPQ=0

boolean false

--arguments_file / NA

read one or more arguments files and add them to the command line

List[File] []

--backbone-sequences / -bs

If provided, this tool will scan reads for the presence of any of these strings (perfect match-only, but also inspecting for reverse-complement). If found, the read will be counted as overlapping the backbone. This can be useful if the delivery system is a vector, and would allow detection of non-integrated vector

List[String] []

--bam / -b

A BAM file with alignments to be inspected

File null

--blast-db-path / -bdb

In order for this tool to use BLAST to validate the primers by detecting alternate binding sites, the path to a BLAST DB compiled against this reference FASTA must be provided

String null

--blast-threads / -bt

If BLAST will be used, this value is passed to the -num_threads argument of blastn.

Integer null

--blastn-path / -bn

The path to the blastn executable. This is required for BLAST validation to be performed against putative primers. If blastn is in your $PATH, it will be picked up. Alternately, the environment variable BLASTN_PATH can be set, pointing to the blastn executable.

String null

--cloud-index-prefetch-buffer / -CIPB

Size of the cloud-only prefetch buffer (in MB; 0 to disable). Defaults to cloudPrefetchBuffer if unset.

int -1 [ [ -∞ ∞ ] ]

--cloud-prefetch-buffer / -CPB

Size of the cloud-only prefetch buffer (in MB; 0 to disable).

int 40 [ [ -∞ ∞ ] ]

--create-output-bam-index / -OBI

If true, create a BAM/CRAM index when writing a coordinate-sorted BAM/CRAM file.

boolean true

--create-output-bam-md5 / -OBM

If true, create an MD5 digest for any BAM/SAM/CRAM file created

boolean false

--create-output-variant-index / -OVI

If true, create a VCF index when writing a coordinate-sorted VCF file.

boolean true

--create-output-variant-md5 / -OVM

If true, create an MD5 digest for any VCF file created.

boolean false

--disable-bam-index-caching / -DBIC

If true, don't cache bam indexes, this will reduce memory requirements but may harm performance if many intervals are specified. Caching is automatically disabled if there are no intervals specified.

boolean false

--disable-read-filter / -DF

Read filters to be disabled before analysis

List[String] []

--disable-sequence-dictionary-validation / -disable-sequence-dictionary-validation

If specified, do not check the sequence dictionaries from our inputs for compatibility. Use at your own risk!

boolean false

--disable-tool-default-read-filters / -disable-tool-default-read-filters

Disable all tool default read filters (WARNING: many tools will not function correctly without their default read filters on)

boolean false

--exclude-intervals / -XL

One or more genomic intervals to exclude from processing
Use this argument to exclude certain parts of the genome from the analysis (like -L, but the opposite). This argument can be specified multiple times. You can use samtools-style intervals either explicitly on the command line (e.g. -XL 1 or -XL 1:100-200) or by loading in a file containing a list of intervals (e.g. -XL myFile.intervals).

List[String] []

--gatk-config-file / NA

A configuration file to use with the GATK.

String null

--gcs-max-retries / -gcs-retries

If the GCS bucket channel errors out, how many times it will attempt to re-initiate the connection

int 20 [ [ -∞ ∞ ] ]

--gcs-project-for-requester-pays / NA

Project to bill when accessing "requester pays" buckets. If unset, these buckets cannot be accessed. User must have storage.buckets.get permission on the bucket being accessed.

String ""

--genbank-output / -g

File to which a summary of integration sites, flanking sequence and optionally primers should be written in genbank format

File null

--help / -h

display the help message

boolean false

--include-reverse-reads / -ir

By default, reverse reads are skipped. If true, reverse reads will be included, which might be necessary for some chemistries.

boolean false

--include-sa / -sa

If provided, this tool will inspect reads for supplemental alignments (SA tag) and parse these as well

boolean false

--input / -I

BAM/SAM/CRAM file containing reads

List[GATKPath] []

--insert-definition / -id

One or more files describing the insert and transgene/genome junctions. See --write-default-descriptors and --validate-descriptor-only

List[File] []

--insert-name / NA

The name of one of the built-in insert descriptors to use. See --write-default-descriptors for available types

List[String] []

--interval-exclusion-padding / -ixp

Amount of padding (in bp) to add to each interval you are excluding.
Use this to add padding to the intervals specified using -XL. For example, '-XL 1:100' with a padding value of 20 would turn into '-XL 1:80-120'. This is typically used to add padding around targets when analyzing exomes.

int 0 [ [ -∞ ∞ ] ]

--interval-merging-rule / -imr

Interval merging rule for abutting intervals
By default, the program merges abutting intervals (i.e. intervals that are directly side-by-side but do not actually overlap) into a single continuous interval. However you can change this behavior if you want them to be treated as separate intervals instead.

The --interval-merging-rule argument is an enumerated type (IntervalMergingRule), which can have one of the following values:

ALL
OVERLAPPING_ONLY

IntervalMergingRule ALL

--interval-padding / -ip

Amount of padding (in bp) to add to each interval you are including.
Use this to add padding to the intervals specified using -L. For example, '-L 1:100' with a padding value of 20 would turn into '-L 1:80-120'. This is typically used to add padding around targets when analyzing exomes.

int 0 [ [ -∞ ∞ ] ]

--interval-set-rule / -isr

Set merging approach to use for combining interval inputs
By default, the program will take the UNION of all intervals specified using -L and/or -XL. However, you can change this setting for -L, for example if you want to take the INTERSECTION of the sets instead. E.g. to perform the analysis only on chromosome 1 exomes, you could specify -L exomes.intervals -L 1 --interval-set-rule INTERSECTION. However, it is not possible to modify the merging approach for intervals passed using -XL (they will always be merged using UNION). Note that if you specify both -L and -XL, the -XL interval set will be subtracted from the -L interval set.

The --interval-set-rule argument is an enumerated type (IntervalSetRule), which can have one of the following values:

UNION
INTERSECTION

IntervalSetRule UNION

--intervals / -L

One or more genomic intervals over which to operate

List[String] []

--inverted-read-filter / -XRF

Inverted (with flipped acceptance/failure conditions) read filters applied before analysis (after regular read filters).

List[String] []

--lenient / -LE

Lenient processing of VCF files

boolean false

--max-variants-per-shard / NA

If non-zero, partitions VCF output into shards, each containing up to the given number of records.

int 0 [ [ 0 ∞ ] ]

--metrics-table / -mt

File to which a TSV of summary metrics should be written

File null

--min-alignments / -ma

The minimum number of alignments required at a position to report

int 3 [ [ -∞ ∞ ] ]

--min-fraction / -mf

Only sites with at least this fraction of total reads will be reported

double 0.0 [ [ -∞ ∞ ] ]

--min-mapq / -mmq

The minimum MAPQ to consider an alignment

int 20 [ [ -∞ ∞ ] ]

--output-table / -o

File to which TSV output should be written

File null

--primer-pair-table / -pt

File to which TSV summarizing potential primer pairs should be written

File null

--primer3-path / -p3

In order for this tool to design validation primers, the path to primer3 must be provided. If primer3 is in your $PATH, it will be picked up. Alternately, the environment variable PRIMER3_PATH can be set, pointing to the primer3 executable.

String null

--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean false

--read-filter / -RF

Read filters to be applied before analysis

List[String] []

--read-index / -read-index

Indices to use for the read inputs. If specified, an index must be provided for every read input and in the same order as the read inputs. If this argument is not specified, the path to the index for each input will be inferred automatically.

List[GATKPath] []

--read-validation-stringency / -VS

Validation stringency for all SAM/BAM/CRAM/SRA files read by this program. The default stringency value SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

The --read-validation-stringency argument is an enumerated type (ValidationStringency), which can have one of the following values:

STRICT
LENIENT
SILENT

ValidationStringency SILENT

--reads-to-output / -ro

If greater than zero, up to this many reads will be written as a FASTA file for each site. This can be useful to validate the junction border

int 0 [ [ -∞ ∞ ] ]

--reference / -R

Reference sequence file

R GATKPath null

--seconds-between-progress-updates / -seconds-between-progress-updates

Output traversal statistics every time this many seconds elapse

double 10.0 [ [ -∞ ∞ ] ]

--sequence-dictionary / -sequence-dictionary

Use the given sequence dictionary as the master/canonical sequence dictionary. Must be a .dict file.

GATKPath null

--showHidden / -showHidden

display hidden arguments

boolean false

--sites-only-vcf-output / NA

If true, don't emit genotype fields when writing vcf file output.

boolean false

--tmp-dir / NA

Temp directory to use.

GATKPath null

--use-jdk-deflater / -jdk-deflater

Whether to use the JdkDeflater (as opposed to IntelDeflater)

boolean false

--use-jdk-inflater / -jdk-inflater

Whether to use the JdkInflater (as opposed to IntelInflater)

boolean false

--validate-descriptor-only / NA

If provided, the tool will simply validate the insert definition files (see --insert-definition) and exit

boolean false

--verbosity / -verbosity

Control verbosity of logging.

The --verbosity argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel INFO

--version / NA

display the version number for this tool

boolean false

--write-default-descriptors / NA

If provided, the tool will write the YAML for the default descriptors to this file and exit. This can be useful as a guide for writing your own descriptors (which is likely needed).

File null

DISCVR-Seq version 1.3.87 built at 05-08-2025 05:02:45.