14.3 SageDIA
14.3.2 SageDIA parameters

Full parameter reference for sage-dia. Run sage-dia --help for the latest options.


Input / Output

Positional: <mzparquet>

Input spectrum files in mzParquet format. Accepts multiple files and glob patterns.

--mzbinary <FILES...>

Pre-computed quant files from a previous --quant-only run. Used for two-phase workflows.

--library <FILE> (default: "")

Spectral library file. Supported formats: .tsv, .predicted.tsv, .parquet, .sagelib (fast binary cache).

A .sagelib cache is automatically created next to the library on first use, speeding up subsequent loads.

--output <FILE> (default: r.tsv)

Output file path for the main results TSV.
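
As a minimal sketch of how the input/output options fit together, the following Python snippet assembles a sage-dia argument list from the flags documented above. The file names are hypothetical placeholders, and placing the options before the positional inputs is an assumption; check sage-dia --help for the accepted ordering.

```python
import shlex

def build_basic_command(mzparquet_files, library, output="r.tsv"):
    """Assemble a sage-dia argument list from the documented I/O options."""
    argv = ["sage-dia", "--library", library, "--output", output]
    argv.extend(mzparquet_files)  # positional <mzparquet> inputs
    return argv

# Hypothetical file names, for illustration only.
cmd = build_basic_command(
    ["run01.mzparquet", "run02.mzparquet"],
    library="library.tsv",
    output="results.tsv",
)
print(shlex.join(cmd))
```

The list form can be passed directly to subprocess.run, which avoids shell quoting problems with unusual file names.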


Search Parameters

--scan-radius <INT> (default: 6)

Number of cycles around each candidate peak to extract for feature computation.

--min-product-len <INT> (default: 3)

Minimum fragment ion length (number of amino acid residues).

--min-charge <INT> (default: 1)

Minimum fragment ion charge state.

--max-charge <INT> (default: 255)

Maximum fragment ion charge state. Set to 255 for no limit.

--top-n-frags <INT> (default: 12)

Maximum number of fragment ions per precursor to use for scoring.

--precursor-isotope <INT> (default: 2)

Number of precursor isotope channels (M0, M1, ...) to extract.

--mass-ppm-tol <FLOAT> (default: 0.0)

Fixed mass tolerance in ppm. When 0.0, tolerance is determined automatically from calibration.
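
To give a feel for what a fixed ppm tolerance means in absolute terms, the sketch below converts a ppm value into an m/z window using the standard definition of parts per million. How sage-dia applies the tolerance internally is not described here, so the symmetric window is an assumption, and ppm_window is a hypothetical helper name.

```python
def ppm_window(mz, ppm):
    """Return (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6  # tolerance in m/z units
    return mz - delta, mz + delta

# A 20 ppm tolerance at m/z 500 corresponds to +/- 0.01 m/z.
low, high = ppm_window(500.0, 20.0)
print(f"{low:.4f} .. {high:.4f}")
```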

--label <STRING> (default: "")

SILAC/label definition. Format: <AminoAcid>:<MassDelta>.

--custom-mod <STRING> (default: "")

Custom modification substitution. Format: <Mod1>:<Mass1>,<Mod2>:<Mass2>.
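
The two string formats above can be illustrated with a small parser. This is only a sketch of the documented syntax, not the parser sage-dia uses; the modification names ModA and ModB are hypothetical, and K:8.014199 is the usual mass delta for heavy lysine (13C6 15N2).

```python
def parse_label(spec):
    """Parse '<AminoAcid>:<MassDelta>' into (residue, mass_delta)."""
    residue, mass = spec.rsplit(":", 1)
    return residue, float(mass)

def parse_custom_mods(spec):
    """Parse '<Mod1>:<Mass1>,<Mod2>:<Mass2>' into {name: mass}."""
    mods = {}
    for item in spec.split(","):
        name, mass = item.rsplit(":", 1)  # split on the last colon
        mods[name.strip()] = float(mass)
    return mods

label = parse_label("K:8.014199")
mods = parse_custom_mods("ModA:15.9949,ModB:57.0215")  # hypothetical names
print(label, mods)
```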


Predictor / Scoring

--predictor <MODE> (default: auto)

Scoring model for discriminant analysis.

Value     Description
auto      Try both LDA and XGBoost, pick whichever gives more IDs (default)
lda       Linear Discriminant Analysis (faster)
xgboost   Gradient boosting (more accurate on complex datasets)

--xgboost-iterations <INT> (default: 5)

Number of XGBoost training iterations (only used when predictor is xgboost or auto).

--single-lda (default: false)

Use a single LDA optimization iteration instead of the default 10. Faster but may reduce sensitivity.

--peak-detection <MODE> (default: corr)

Peak candidate detection method.

Value      Description
corr       Inter-fragment correlation local maxima (default, DIA-NN-like)
sa         NNLS × spectral angle local maxima only
combined   Both SA and fragment-correlation maxima

--disable-r2-feature (default: false)

Remove Gaussian R² from discriminant scoring features. May help on very noisy data where R² penalizes real peaks.


Quality Filters

--min-points-per-peak <INT> (default: 3)

Minimum number of non-zero XIC data points for a peak to be reported.

--min-points-per-peak-calib <INT> (default: 1)

Minimum XIC data points during calibration (more permissive to retain calibrants).

--light-heavy-min-correlation <FLOAT> (default: 0.5)

Minimum Pearson correlation between light and heavy channel XICs for labeled searches.


Post-Search Boosting

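These features adjust the result set after the initial FDR calculation by applying extra quality criteria: the two boost options rescue additional identifications, and the one-hit-wonder filter removes weakly supported proteins.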

--gaussian-r2-boost (default: true)

Rescue precursors with excellent chromatographic peak shape (high Gaussian R²) even if their initial q-value exceeds 0.01.

Sub-parameter                   Default   Description
--gaussian-r2-boost-threshold   0.7       Minimum R² to qualify
--gaussian-r2-boost-qvalue      0.05      Maximum q-value to consider for rescue
--gaussian-r2-boost-min-xic     3         Minimum XIC data points required
--gaussian-r2-boost-min-frags   3         Minimum fragment ions at apex

Disable with: --gaussian-r2-boost false
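
One way to read the four sub-parameters together is as a conjunction of conditions on a precursor that narrowly missed the 0.01 q-value cutoff. The sketch below encodes that reading with the default values; the Precursor fields and the is_r2_rescued function are hypothetical names, and the actual rescue logic in sage-dia may combine the criteria differently.

```python
from dataclasses import dataclass

@dataclass
class Precursor:
    qvalue: float
    gaussian_r2: float
    xic_points: int
    frags_at_apex: int

def is_r2_rescued(p, r2_threshold=0.7, max_qvalue=0.05,
                  min_xic=3, min_frags=3, fdr=0.01):
    """True if p fails the FDR cutoff but meets every rescue criterion."""
    if p.qvalue <= fdr:
        return False  # already accepted, nothing to rescue
    return (p.qvalue <= max_qvalue
            and p.gaussian_r2 >= r2_threshold
            and p.xic_points >= min_xic
            and p.frags_at_apex >= min_frags)

print(is_r2_rescued(Precursor(0.03, 0.85, 5, 4)))  # meets all criteria
print(is_r2_rescued(Precursor(0.03, 0.50, 5, 4)))  # R² too low
```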

--protein-context-boost (default: true)

Relax FDR threshold for peptides belonging to proteins already confidently identified.

Sub-parameter                    Default   Description
--protein-context-qvalue         0.05      Relaxed q-value threshold for boosted peptides
--protein-context-conf-qvalue    0.01      Q-value threshold defining "confident" proteins
--protein-context-r2-threshold   0.3       R² threshold for initial confident protein identification

Disable with: --protein-context-boost false

--filter-one-hit-wonders (default: true)

Remove proteins supported by only a single low-quality peptide.

Sub-parameter            Default   Description
--one-hit-r2-threshold   0.5       Minimum R² for a one-hit wonder to survive
--one-hit-min-frags      3         Minimum fragments at apex for one-hit wonders

Disable with: --filter-one-hit-wonders false
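
The one-hit-wonder filter can be pictured as a per-protein check: a protein with exactly one supporting peptide is kept only if that peptide clears both thresholds. The following sketch assumes this interpretation; the dictionary keys and the function name are hypothetical, and the real filter may use additional information.

```python
from collections import defaultdict

def filter_one_hit_wonders(peptides, r2_threshold=0.5, min_frags=3):
    """Drop proteins backed by a single peptide that fails either threshold."""
    by_protein = defaultdict(list)
    for pep in peptides:
        by_protein[pep["protein"]].append(pep)

    kept = []
    for peps in by_protein.values():
        if len(peps) == 1:
            pep = peps[0]
            if pep["gaussian_r2"] < r2_threshold or pep["frags_at_apex"] < min_frags:
                continue  # single low-quality peptide: remove the protein
        kept.extend(peps)
    return kept

peptides = [
    {"protein": "P1", "gaussian_r2": 0.9, "frags_at_apex": 5},
    {"protein": "P1", "gaussian_r2": 0.2, "frags_at_apex": 2},
    {"protein": "P2", "gaussian_r2": 0.3, "frags_at_apex": 4},  # removed
    {"protein": "P3", "gaussian_r2": 0.8, "frags_at_apex": 3},  # kept
]
surviving = {p["protein"] for p in filter_one_hit_wonders(peptides)}
print(sorted(surviving))
```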


Match-Between-Runs (MBR)

MBR transfers confident identifications across runs to fill missing values, improving data completeness in multi-file experiments. MBR items are marked in the output and do not participate in FDR calculation.

--mbr (default: true)

Enable match-between-runs. Only effective when processing multiple files.

Disable with: --mbr false

Sub-parameter           Default   Description
--mbr-rt-window         1.0       RT window in minutes for MBR peak matching
--mbr-score-threshold   0.3       Score percentile cutoff (0.3 = keep top 70%)
--mbr-min-r2            0.7       Minimum Gaussian R² for MBR transfers
--mbr-min-frags         2         Minimum fragments at apex
--mbr-min-xic-points    3         Minimum XIC data points
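
The percentile semantics of --mbr-score-threshold can be illustrated with a short sketch: a value of 0.3 places the cutoff at the 30th percentile of the scores, so roughly the top 70% are kept. The nearest-rank percentile used here is an assumption, and mbr_score_cutoff is a hypothetical helper, not part of sage-dia.

```python
def mbr_score_cutoff(scores, percentile=0.3):
    """Return the score at the given percentile (nearest-rank, lower bound)."""
    if not scores:
        raise ValueError("scores must not be empty")
    ordered = sorted(scores)
    # Small epsilon guards against floating-point rounding of percentile * n.
    index = min(int(percentile * len(ordered) + 1e-9), len(ordered) - 1)
    return ordered[index]

scores = [float(s) for s in range(1, 11)]  # ten example scores, 1.0 .. 10.0
cutoff = mbr_score_cutoff(scores)
kept = [s for s in scores if s >= cutoff]
print(cutoff, len(kept))  # 7 of 10 scores survive
```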

Interference Removal

--remove-interference (default: false)

Remove precursors whose fragment ions overlap with a higher-scoring precursor in the same DIA window.

Sub-parameter                Default   Description
--interference-ppm           20.0      Mass tolerance for fragment overlap detection
--interference-min-overlap   0.5       Minimum fraction of overlapping fragments
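
A rough sketch of the overlap test implied by these two sub-parameters: count how many of a precursor's fragment m/z values fall within the ppm tolerance of any fragment of a higher-scoring precursor, and compare the fraction against the minimum overlap. The function name and the example m/z values are hypothetical, and the comparison details (for example, which precursor's fragment count is the denominator) are assumptions.

```python
def overlap_fraction(frags, other_frags, ppm=20.0):
    """Fraction of frags within `ppm` of any m/z in other_frags."""
    if not frags:
        return 0.0
    matched = 0
    for mz in frags:
        tol = mz * ppm / 1e6
        if any(abs(mz - other) <= tol for other in other_frags):
            matched += 1
    return matched / len(frags)

lower_scoring = [300.0, 400.0, 500.0, 600.0]
higher_scoring = [300.003, 400.02, 500.0, 700.0]
fraction = overlap_fraction(lower_scoring, higher_scoring)
interfered = fraction >= 0.5  # default --interference-min-overlap
print(fraction, interfered)
```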

RT Calibration

--rt-calibration-method <METHOD> (default: lowess)

Value    Description
lowess   Non-parametric LOWESS regression (default). May flatten at RT edges.
linear   Simple linear regression (DIA-NN style). More robust at extremes.
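
For intuition about the linear option, the sketch below fits an ordinary least-squares line mapping library RT to observed RT and uses it to predict where a precursor should elute. It is a generic illustration of linear RT calibration with made-up calibrant values, not the code sage-dia runs; a straight line extrapolates predictably at the gradient edges, which is the robustness the table refers to.

```python
def fit_linear_rt(library_rt, observed_rt):
    """Ordinary least-squares fit: observed = slope * library + intercept."""
    n = len(library_rt)
    mean_x = sum(library_rt) / n
    mean_y = sum(observed_rt) / n
    sxx = sum((x - mean_x) ** 2 for x in library_rt)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(library_rt, observed_rt))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Made-up calibrants: observed RT is library RT shifted by 2 minutes.
slope, intercept = fit_linear_rt([10.0, 20.0, 30.0, 40.0],
                                 [12.0, 22.0, 32.0, 42.0])
predicted = slope * 25.0 + intercept
print(slope, intercept, predicted)
```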

--calibration-correlation <MODE> (default: L)

Which channels to use for RT calibration in labeled experiments.

Value   Description
L       Light fragments only (default)
H       Heavy fragments only
LH      Both light and heavy fragments

Gaussian R² Settings

--gaussian-r2-smooth (default: true)

Smooth XICs before computing Gaussian R² fit.

--gaussian-r2-centroid (default: true)

Use intensity-weighted centroid (instead of apex) as the center for Gaussian fitting.
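
The difference between the apex and the intensity-weighted centroid can be seen on a small, asymmetric XIC. The sketch below uses made-up data points and hypothetical helper names; it only illustrates the two candidate centers, not the Gaussian fit itself.

```python
def apex_rt(rts, intensities):
    """RT of the most intense data point."""
    best = max(range(len(rts)), key=lambda i: intensities[i])
    return rts[best]

def centroid_rt(rts, intensities):
    """Intensity-weighted mean RT."""
    total = sum(intensities)
    return sum(rt * inten for rt, inten in zip(rts, intensities)) / total

rts = [10.0, 10.1, 10.2, 10.3]
intensities = [0.0, 100.0, 300.0, 200.0]
print(apex_rt(rts, intensities))                 # most intense point
print(round(centroid_rt(rts, intensities), 4))   # pulled toward the tail
```

On a skewed peak the centroid shifts toward the heavier side, which can give a Gaussian fit a more representative center than a single noisy apex point.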


XIC Output

--gen-xic (default: false)

Generate XIC trace files alongside main results. Produces both .xic.tsv and .xic.db (SQLite).

--xic-rt-tolerance <FLOAT> (default: 1.0)

Extra RT padding (in minutes) around each peak for XIC extraction.

--xic-max-qvalue <FLOAT> (default: 0.01)

Maximum q-value for precursors to include in XIC output.

--xic-only (default: false)

Extract XICs for all library precursors without running the search/scoring pipeline. Useful for benchmarking and debugging.


Library Conversion

Standalone utilities to convert spectral libraries between formats. The process exits after conversion.

--convert-lib <OUTPUT>

Convert the input --library to fast binary .sagelib format.

--convert-lib-parquet <OUTPUT>

Convert a TSV library to Parquet format.

--convert-sagelib-tsv <OUTPUT>

Convert a .sagelib binary library back to TSV.
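
As a sketch, the three conversion modes can be wrapped in one helper that builds the argument list. Only --convert-lib is documented as reading its input from --library; using --library as the input for the other two modes is an assumption, and the file names are hypothetical.

```python
# Documented conversion flags, keyed by target format.
CONVERSION_FLAGS = {
    "sagelib": "--convert-lib",
    "parquet": "--convert-lib-parquet",
    "tsv": "--convert-sagelib-tsv",
}

def build_convert_command(library, target, output):
    """Build a sage-dia argv that converts `library` to `target` format."""
    # Assumption: every mode takes its input via --library.
    return ["sage-dia", "--library", library, CONVERSION_FLAGS[target], output]

convert_cmd = build_convert_command("library.tsv", "sagelib", "library.sagelib")
print(" ".join(convert_cmd))
```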


Workflow Options

--quant-only (default: false)

Run search only (no aggregation/output). Used with --keep-quant-files for two-phase workflows.

--keep-quant-files (default: false)

Retain intermediate .quant files after search. Required for two-phase merge workflows.

--save-partial-results (default: false)

Write per-file partial result TSVs during multi-file search.

--no-search (default: false)

Skip the search phase entirely. Used when re-processing existing quant files.
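
The two-phase workflow mentioned under --mzbinary, --quant-only, --keep-quant-files, and --no-search might be scripted roughly as follows. This is a sketch under several assumptions: that default-false flags act as bare switches, that the merge phase combines --no-search with --mzbinary, and that the quant file paths (hypothetical here) are known to the caller. Verify each against sage-dia --help before relying on it.

```python
def phase1_commands(mzparquet_files, library):
    """One search-only command per input file, keeping the quant files."""
    return [
        ["sage-dia", "--library", library,
         "--quant-only", "--keep-quant-files", path]
        for path in mzparquet_files
    ]

def phase2_command(quant_files, library, output):
    """Merge step: skip the search and reuse pre-computed quant files."""
    # Assumption: --no-search and --mzbinary are used together here.
    return ["sage-dia", "--library", library, "--no-search",
            "--output", output, "--mzbinary", *quant_files]

searches = phase1_commands(["run01.mzparquet", "run02.mzparquet"], "library.tsv")
merge = phase2_command(["run01.quant", "run02.quant"],  # hypothetical paths
                       "library.tsv", "merged.tsv")
print(len(searches), merge)
```

Splitting the work this way lets the per-file searches run on separate machines before a single merge step produces the final TSV.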

--no-bounds (default: false)

Disable RT bounds restriction during search (search full RT range for every precursor).

--smoothing (default: false)

Apply XIC smoothing during extraction.

--common-frags (default: true)

Use common fragment ions across runs for consistent quantification.


Peak Quantification

--quant-peak-find (default: true)

Use peak detection to define quantification boundaries, excluding chromatographic shoulders.

--best-frag-xic-consecutive (default: true)

Count consecutive non-zero data points (rather than total) for the best fragment XIC metric.
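
The distinction between consecutive and total non-zero points is easy to show on a toy XIC. The helper names below are hypothetical and the intensities are made up; the sketch only contrasts the two counting schemes.

```python
def total_nonzero(xic):
    """Number of non-zero data points anywhere in the trace."""
    return sum(1 for value in xic if value > 0)

def longest_nonzero_run(xic):
    """Length of the longest unbroken run of non-zero data points."""
    best = run = 0
    for value in xic:
        run = run + 1 if value > 0 else 0
        best = max(best, run)
    return best

xic = [0, 5, 7, 0, 3, 4, 6, 0, 2]
print(total_nonzero(xic), longest_nonzero_run(xic))
```

A trace with many scattered points can have a high total count yet a short longest run, which is why the consecutive metric is the stricter indicator of a continuous elution profile.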

--median-rt (default: false)

Use median RT from multiple candidates instead of best-scoring peak.

Sub-parameter            Default   Description
--median-rt-candidates   3         Number of top candidates to consider
--median-rt-tolerance    0.3       RT tolerance for candidate clustering

Reporting

--generate-fragment-info (default: false)

Include per-fragment scoring details in the output.

--report-all-fragments (default: false)

Report all fragment ions (not just top-N used for scoring).

--report-all-decoys (default: false)

Include decoy hits in the output.

--no-scan (default: scan output on)

Omit MS2 scan numbers from the output. Scan numbers are reported by default.


Resource Management

-t, --threads <INT> (default: all CPUs)

Maximum number of threads to use.

--max-memory-gb <FLOAT> (default: available memory)

Maximum memory usage in GB.


General

-v, --verbose (default: false)

Enable detailed debug logging.

--tracked-precursors <FILE> (default: "")

File with a list of precursor sequences to track with detailed logging. One precursor per line.