SageDIA Parameters
Full parameter reference for sage-dia. Run sage-dia --help for the latest options.
Input / Output
Positional: <mzparquet>
Input spectrum files in mzParquet format. Accepts multiple files and glob patterns.
--mzbinary <FILES...>
Pre-computed quant files from a previous --quant-only run. Used for two-phase workflows.
--library <FILE> (default: "")
Spectral library file. Supported formats: .tsv, .predicted.tsv, .parquet, .sagelib (fast binary cache).
A .sagelib cache is automatically created next to the library on first use, speeding up subsequent loads.
--output <FILE> (default: r.tsv)
Output file path for the main results TSV.
Search Parameters
--scan-radius <INT> (default: 6)
Number of cycles around each candidate peak to extract for feature computation.
--min-product-len <INT> (default: 3)
Minimum fragment ion length (number of amino acid residues).
--min-charge <INT> (default: 1)
Minimum fragment ion charge state.
--max-charge <INT> (default: 255)
Maximum fragment ion charge state. The default of 255 means no limit.
--top-n-frags <INT> (default: 12)
Maximum number of fragment ions per precursor to use for scoring.
--precursor-isotope <INT> (default: 2)
Number of precursor isotope channels (M0, M1, ...) to extract.
--mass-ppm-tol <FLOAT> (default: 0.0)
Fixed mass tolerance in ppm. When 0.0, tolerance is determined automatically from calibration.
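As a reminder of what a ppm tolerance means in absolute terms, the sketch below converts a ppm value into an m/z window. The function name `ppm_window` is illustrative and not part of sage-dia, and applying the tolerance symmetrically around the target m/z is an assumption.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Return the (low, high) m/z bounds for a symmetric ppm tolerance."""
    delta = mz * ppm / 1e6  # absolute tolerance in m/z units
    return mz - delta, mz + delta


# A 10 ppm tolerance at m/z 500 is +/- 0.005 m/z.
low, high = ppm_window(500.0, 10.0)
print(f"{low:.4f} .. {high:.4f}")
```

Because the window scales with m/z, the same ppm value gives a wider absolute window for heavier fragments.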
--label <STRING> (default: "")
SILAC/label definition. Format: <AminoAcid>:<MassDelta>.
--custom-mod <STRING> (default: "")
Custom modification substitution. Format: <Mod1>:<Mass1>,<Mod2>:<Mass2>.
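The two format strings above share a `name:mass` shape. The sketch below parses such strings, for example to validate them before launching a search. The helper `parse_pairs` and the example modification names are hypothetical; which names sage-dia accepts, and whether --label takes more than one pair, is not specified here.

```python
def parse_pairs(spec: str) -> dict[str, float]:
    """Parse 'name:mass[,name:mass...]' into a {name: mass} mapping."""
    pairs = {}
    for item in spec.split(","):
        if not item.strip():
            continue  # tolerate an empty string (the documented default)
        name, _, mass = item.partition(":")
        if not name.strip() or not mass.strip():
            raise ValueError(f"expected <name>:<mass>, got {item!r}")
        pairs[name.strip()] = float(mass)
    return pairs


# Example values only: a heavy-lysine mass delta and two placeholder mod names.
label = parse_pairs("K:8.014199")
mods = parse_pairs("Oxidation:15.994915,Carbamidomethyl:57.021464")
print(label, mods)
```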
Predictor / Scoring
--predictor <MODE> (default: auto)
Scoring model for discriminant analysis.
| Value | Description |
|---|---|
| auto | Try both LDA and XGBoost, pick whichever gives more IDs (default) |
| lda | Linear Discriminant Analysis: faster |
| xgboost | Gradient boosting: more accurate on complex datasets |
--xgboost-iterations <INT> (default: 5)
Number of XGBoost training iterations (only used when predictor is xgboost or auto).
--single-lda (default: false)
Use a single LDA optimization iteration instead of the default 10. Faster but may reduce sensitivity.
--peak-detection <MODE> (default: corr)
Peak candidate detection method.
| Value | Description |
|---|---|
| corr | Inter-fragment correlation local maxima (default, DIA-NN-like) |
| sa | NNLS × spectral angle local maxima only |
| combined | Both SA and fragment-correlation maxima |
--disable-r2-feature (default: false)
Remove Gaussian R² from discriminant scoring features. May help on very noisy data where R² penalizes real peaks.
Quality Filters
--min-points-per-peak <INT> (default: 3)
Minimum number of non-zero XIC data points for a peak to be reported.
--min-points-per-peak-calib <INT> (default: 1)
Minimum XIC data points during calibration (more permissive to retain calibrants).
--light-heavy-min-correlation <FLOAT> (default: 0.5)
Minimum Pearson correlation between light and heavy channel XICs for labeled searches.
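To make the light/heavy threshold concrete, the sketch below computes the Pearson correlation between two XIC traces and compares it with the 0.5 default. The traces are made-up numbers, and returning 0.0 for a flat trace is a choice made for this example, not documented sage-dia behaviour.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length intensity traces."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a flat trace has no defined correlation
    return cov / math.sqrt(var_x * var_y)


light = [0.0, 2.0, 8.0, 10.0, 7.0, 1.0]   # illustrative light-channel XIC
heavy = [0.0, 1.0, 4.2, 5.0, 3.4, 0.6]    # illustrative heavy-channel XIC
r = pearson(light, heavy)
passes = r >= 0.5  # compare against --light-heavy-min-correlation
print(round(r, 3), passes)
```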
Post-Search Boosting
These features rescue further identifications after the initial FDR calculation by applying extra quality criteria.
--gaussian-r2-boost (default: true)
Rescue precursors with excellent chromatographic peak shape (high Gaussian R²) even if their initial q-value exceeds 0.01.
| Sub-parameter | Default | Description |
|---|---|---|
| --gaussian-r2-boost-threshold | 0.7 | Minimum R² to qualify |
| --gaussian-r2-boost-qvalue | 0.05 | Maximum q-value to consider for rescue |
| --gaussian-r2-boost-min-xic | 3 | Minimum XIC data points required |
| --gaussian-r2-boost-min-frags | 3 | Minimum fragment ions at apex |
Disable with: --gaussian-r2-boost false
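One way to read the Gaussian R² boost thresholds is as a single rescue test, sketched below. The `Precursor` class and function name are hypothetical, and the sketch assumes that all four criteria must hold together and that comparisons are inclusive; sage-dia's actual logic may differ.

```python
from dataclasses import dataclass


@dataclass
class Precursor:
    q_value: float
    gaussian_r2: float
    xic_points: int
    apex_frags: int


def rescued_by_r2_boost(p, threshold=0.7, max_q=0.05,
                        min_xic=3, min_frags=3, fdr=0.01):
    """True if a precursor that missed the FDR cutoff meets every boost criterion."""
    if p.q_value <= fdr:
        return False  # already accepted; nothing to rescue
    return (p.q_value <= max_q
            and p.gaussian_r2 >= threshold
            and p.xic_points >= min_xic
            and p.apex_frags >= min_frags)


print(rescued_by_r2_boost(Precursor(0.03, 0.85, 5, 4)))  # good peak shape, q within 0.05
print(rescued_by_r2_boost(Precursor(0.03, 0.60, 5, 4)))  # R² below 0.7
```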
--protein-context-boost (default: true)
Relax the FDR threshold for peptides that belong to proteins already confidently identified.
| Sub-parameter | Default | Description |
|---|---|---|
| --protein-context-qvalue | 0.05 | Relaxed q-value threshold for boosted peptides |
| --protein-context-conf-qvalue | 0.01 | Q-value threshold defining "confident" proteins |
| --protein-context-r2-threshold | 0.3 | R² threshold for initial confident protein identification |
Disable with: --protein-context-boost false
--filter-one-hit-wonders (default: true)
Remove proteins supported by only a single low-quality peptide.
| Sub-parameter | Default | Description |
|---|---|---|
| --one-hit-r2-threshold | 0.5 | Minimum R² for a one-hit wonder to survive |
| --one-hit-min-frags | 3 | Minimum fragments at apex for one-hit wonders |
Disable with: --filter-one-hit-wonders false
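The one-hit-wonder filter can be pictured as the grouping step sketched below. The tuple layout and function name are hypothetical, and treating "low-quality" as failing either threshold is an assumption made for this example.

```python
from collections import defaultdict


def filter_one_hit_wonders(peptides, r2_threshold=0.5, min_frags=3):
    """peptides: list of (protein, gaussian_r2, apex_frags) tuples."""
    by_protein = defaultdict(list)
    for pep in peptides:
        by_protein[pep[0]].append(pep)
    kept = []
    for group in by_protein.values():
        if len(group) == 1:
            _, r2, frags = group[0]
            if r2 < r2_threshold or frags < min_frags:
                continue  # single low-quality peptide: drop the protein
        kept.extend(group)
    return kept


hits = [
    ("P1", 0.9, 5), ("P1", 0.2, 2),  # two peptides: protein kept regardless
    ("P2", 0.8, 4),                  # one good peptide: kept
    ("P3", 0.3, 4),                  # one peptide with low R²: removed
]
print(filter_one_hit_wonders(hits))
```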
Match-Between-Runs (MBR)
MBR transfers confident identifications across runs to fill missing values, improving data completeness in multi-file experiments. MBR-transferred identifications are marked in the output and are excluded from the FDR calculation.
--mbr (default: true)
Enable match-between-runs. Only effective when processing multiple files.
Disable with: --mbr false
| Sub-parameter | Default | Description |
|---|---|---|
| --mbr-rt-window | 1.0 | RT window in minutes for MBR peak matching |
| --mbr-score-threshold | 0.3 | Score percentile cutoff (0.3 = keep top 70%) |
| --mbr-min-r2 | 0.7 | Minimum Gaussian R² for MBR transfers |
| --mbr-min-frags | 2 | Minimum fragments at apex |
| --mbr-min-xic-points | 3 | Minimum XIC data points |
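The percentile semantics of --mbr-score-threshold (0.3 keeps the top 70%) can be illustrated as follows. This is a sketch under assumptions: the helper name is hypothetical, no interpolation is used, and which set of scores sage-dia ranks (per run or across runs) is not stated here.

```python
def percentile_cutoff(scores, fraction):
    """Score value below which roughly `fraction` of the candidates fall."""
    ranked = sorted(scores)
    index = min(int(len(ranked) * fraction), len(ranked) - 1)
    return ranked[index]


scores = list(range(1, 11))            # ten illustrative candidate scores
cutoff = percentile_cutoff(scores, 0.3)
kept = [s for s in scores if s >= cutoff]
print(cutoff, len(kept) / len(scores))  # fraction of candidates retained
```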
Interference Removal
--remove-interference (default: false)
Remove precursors whose fragment ions overlap with a higher-scoring precursor in the same DIA window.
| Sub-parameter | Default | Description |
|---|---|---|
| --interference-ppm | 20.0 | Mass tolerance for fragment overlap detection |
| --interference-min-overlap | 0.5 | Minimum fraction of overlapping fragments |
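The sketch below shows how a fragment-overlap fraction could be computed with a ppm tolerance and compared against the 0.5 default. The m/z values and function names are invented for illustration; using the lower-scoring precursor's fragment count as the denominator, and an inclusive comparison, are assumptions.

```python
def within_ppm(mz, reference, ppm):
    """True if mz lies within `ppm` of reference."""
    return abs(mz - reference) <= reference * ppm / 1e6


def overlap_fraction(frags, other_frags, ppm=20.0):
    """Fraction of `frags` matching any m/z in `other_frags` within ppm."""
    if not frags:
        return 0.0
    hits = sum(any(within_ppm(mz, other, ppm) for other in other_frags)
               for mz in frags)
    return hits / len(frags)


low_scoring = [300.1500, 450.2300, 600.3100, 750.4000]
high_scoring = [300.1520, 450.2310, 610.0000, 900.0000]
fraction = overlap_fraction(low_scoring, high_scoring)
removed = fraction >= 0.5  # compare against --interference-min-overlap
print(fraction, removed)
```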
RT Calibration
--rt-calibration-method <METHOD> (default: lowess)
| Value | Description |
|---|---|
| lowess | Non-parametric LOWESS regression (default). May flatten at RT edges. |
| linear | Simple linear regression (DIA-NN style). More robust at extremes. |
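For intuition about the linear option, the sketch below fits an ordinary least-squares line that maps library RT to observed RT and then predicts beyond the calibrant range, which is where a straight line stays well defined. The data points and function name are illustrative, and the direction of the mapping is an assumption rather than a description of sage-dia's internals.

```python
def fit_linear(x, y):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


library_rt = [10.0, 20.0, 30.0, 40.0]   # illustrative calibrant library RTs
observed_rt = [6.0, 11.0, 16.0, 21.0]   # illustrative observed RTs (minutes)
slope, intercept = fit_linear(library_rt, observed_rt)

# Predict for a library RT outside the calibrant range.
predicted = slope * 50.0 + intercept
print(slope, intercept, predicted)
```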
--calibration-correlation <MODE> (default: L)
Which channels to use for RT calibration in labeled experiments.
| Value | Description |
|---|---|
| L | Light fragments only (default) |
| H | Heavy fragments only |
| LH | Both light and heavy fragments |
Gaussian R² Settings
--gaussian-r2-smooth (default: true)
Smooth XICs before computing Gaussian R² fit.
--gaussian-r2-centroid (default: true)
Use intensity-weighted centroid (instead of apex) as the center for Gaussian fitting.
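The difference between an apex-centred and a centroid-centred Gaussian can be seen on a tailing peak. The sketch below is not sage-dia's fitting routine: it uses a simple moment-based Gaussian (width from the intensity-weighted variance, height from the maximum intensity) purely to illustrate how the choice of centre enters an R² calculation. All names and data are hypothetical.

```python
import math


def apex(rt, intensity):
    """RT of the most intense point."""
    return rt[max(range(len(rt)), key=intensity.__getitem__)]


def centroid(rt, intensity):
    """Intensity-weighted mean RT."""
    return sum(t * i for t, i in zip(rt, intensity)) / sum(intensity)


def gaussian_r2(rt, intensity, center):
    """R² between the trace and a moment-based Gaussian placed at `center`."""
    total = sum(intensity)
    var = sum(i * (t - center) ** 2 for t, i in zip(rt, intensity)) / total
    mean = total / len(intensity)
    ss_tot = sum((i - mean) ** 2 for i in intensity)
    if var <= 0 or ss_tot == 0:
        return 0.0  # degenerate trace; no meaningful fit
    height = max(intensity)
    fitted = [height * math.exp(-0.5 * (t - center) ** 2 / var) for t in rt]
    ss_res = sum((i - f) ** 2 for i, f in zip(intensity, fitted))
    return 1.0 - ss_res / ss_tot


rt = [10.0, 10.1, 10.2, 10.3, 10.4, 10.5]
xic = [0.0, 20.0, 100.0, 60.0, 30.0, 10.0]   # tailing peak
apex_rt, centroid_rt = apex(rt, xic), centroid(rt, xic)
print(apex_rt, round(centroid_rt, 3))
print(gaussian_r2(rt, xic, apex_rt), gaussian_r2(rt, xic, centroid_rt))
```

On a tailing peak the centroid sits later than the apex, so the two centres generally give different R² values.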
XIC Output
--gen-xic (default: false)
Generate XIC trace files alongside main results. Produces both .xic.tsv and .xic.db (SQLite).
--xic-rt-tolerance <FLOAT> (default: 1.0)
Extra RT padding (in minutes) around each peak for XIC extraction.
--xic-max-qvalue <FLOAT> (default: 0.01)
Maximum q-value for precursors to include in XIC output.
--xic-only (default: false)
Extract XICs for all library precursors without running the search/scoring pipeline. Useful for benchmarking and debugging.
Library Conversion
Standalone utilities to convert spectral libraries between formats. The process exits after conversion.
--convert-lib <OUTPUT>
Convert the input --library to fast binary .sagelib format.
--convert-lib-parquet <OUTPUT>
Convert a TSV library to Parquet format.
--convert-sagelib-tsv <OUTPUT>
Convert a .sagelib binary library back to TSV.
Workflow Options
--quant-only (default: false)
Run the search only, skipping aggregation and output. Use with --keep-quant-files for two-phase workflows.
--keep-quant-files (default: false)
Retain intermediate .quant files after search. Required for two-phase merge workflows.
--save-partial-results (default: false)
Write per-file partial result TSVs during multi-file search.
--no-search (default: false)
Skip the search phase entirely. Used when re-processing existing quant files.
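The two-phase options (--quant-only, --keep-quant-files, --no-search, together with --mzbinary from Input / Output) might be combined as sketched below. This only assembles and prints command lines; it does not run sage-dia. All file names are placeholders, the .quant file names and locations are guesses, and writing the boolean options as bare switches is an assumption, so check sage-dia --help before relying on the exact syntax.

```python
import shlex

LIBRARY = "library.tsv"                       # placeholder path
RUNS = ["run1.mzparquet", "run2.mzparquet"]   # placeholder inputs
QUANT_FILES = ["run1.quant", "run2.quant"]    # placeholder names; actual names may differ

# Phase 1 (assumed): search each run separately and keep its quant file.
phase1 = [
    ["sage-dia", "--library", LIBRARY, "--quant-only", "--keep-quant-files", run]
    for run in RUNS
]

# Phase 2 (assumed): skip the search and aggregate the saved quant files.
phase2 = ["sage-dia", "--library", LIBRARY, "--no-search",
          "--mzbinary", *QUANT_FILES, "--output", "r.tsv"]

for argv in [*phase1, phase2]:
    print(shlex.join(argv))
```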
--no-bounds (default: false)
Disable RT bounds restriction during search (search full RT range for every precursor).
--smoothing (default: false)
Apply XIC smoothing during extraction.
--common-frags (default: true)
Use common fragment ions across runs for consistent quantification.
Peak Quantification
--quant-peak-find (default: true)
Use peak detection to define quantification boundaries, excluding chromatographic shoulders.
--best-frag-xic-consecutive (default: true)
Count consecutive non-zero data points (rather than total) for the best fragment XIC metric.
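The distinction between consecutive and total non-zero points is easy to see on a trace with gaps. The sketch below uses a made-up XIC and hypothetical helper names; it illustrates the two counting rules rather than sage-dia's own code.

```python
def total_nonzero(xic):
    """Number of non-zero points anywhere in the trace."""
    return sum(1 for value in xic if value > 0)


def longest_nonzero_run(xic):
    """Length of the longest unbroken stretch of non-zero points."""
    best = run = 0
    for value in xic:
        run = run + 1 if value > 0 else 0
        best = max(best, run)
    return best


xic = [0.0, 5.0, 9.0, 0.0, 7.0, 3.0, 2.0, 0.0, 1.0]
print(total_nonzero(xic), longest_nonzero_run(xic))
```

A trace with scattered signal can score well on the total count while having only a short consecutive run, which is why the consecutive metric is the stricter of the two.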
--median-rt (default: false)
Use the median RT of multiple candidates instead of the RT of the best-scoring peak.
| Sub-parameter | Default | Description |
|---|---|---|
| --median-rt-candidates | 3 | Number of top candidates to consider |
| --median-rt-tolerance | 0.3 | RT tolerance for candidate clustering |
Reporting
--generate-fragment-info (default: false)
Include per-fragment scoring details in the output.
--report-all-fragments (default: false)
Report all fragment ions (not just top-N used for scoring).
--report-all-decoys (default: false)
Include decoy hits in the output.
--no-scan (default: scan output on)
Omit MS2 scan numbers from the output. Scan numbers are reported by default.
Resource Management
-t, --threads <INT> (default: all CPUs)
Maximum number of threads to use.
--max-memory-gb <FLOAT> (default: available memory)
Maximum memory usage in GB.
General
-v, --verbose (default: false)
Enable detailed debug logging.
--tracked-precursors <FILE> (default: "")
File with a list of precursor sequences to track with detailed logging. One precursor per line.