14.3 SageDIA
14.3.1 SageDIA Command line how-to

sage-dia — DIA (Data-Independent Acquisition) proteomics search engine

Download

Download the latest sage-dia binary from the GitHub releases page.

Background

SageDIA searches DIA mass spectrometry data against a spectral library to identify and quantify peptides and proteins.

Automatic calibration — SageDIA automatically calibrates both mass (ppm) and retention time from the data. Users do not need to specify mass tolerance or RT alignment parameters. The auto-calibration uses iRT peptides from the spectral library and LOWESS regression. Override with --mass-ppm-tol or --rt-calibration-method only if needed for special cases.

Spectral library formats — SageDIA accepts spectral libraries in common formats:

  • .tsv — tab-separated (DIA-NN, Spectronaut export, etc.)
  • .predicted.tsv — predicted library TSV
  • .parquet — Parquet format

On first use, SageDIA automatically converts the library to a fast binary .sagelib cache file (saved next to the original). Subsequent runs load the .sagelib directly, which is significantly faster. You can also pre-convert with --convert-lib.

Input format — All input spectral files must be in mzParquet format. Convert from vendor formats using ThermoParquet (Thermo RAW files) or dotdconverter (Bruker timsTOF .d files).

Scoring — SageDIA uses LDA or XGBoost (or both in auto mode) for discriminant scoring, followed by target-decoy FDR control and protein inference. Match-between-runs (MBR) is enabled by default for multi-file experiments.

Synopsis

sage-dia [OPTIONS] <mzparquet files...>
sage-dia --mzbinary <quant files...> [OPTIONS]

Run sage-dia --help for full option listing.


Basic run

sage-dia input1.mzparquet input2.mzparquet --library spec_lib.tsv
sage-dia *.mzparquet --library spec_lib.parquet
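
Because every input must be mzParquet and the library must be in one of the supported formats, a small pre-flight check can catch mistakes before a long run starts. The following is a minimal shell sketch: the function name check_inputs is hypothetical and not part of sage-dia, and it inspects file extensions only (case-sensitive), not file contents.

```shell
# Hypothetical helper (not part of sage-dia): check extensions before a run.
# Usage: check_inputs LIBRARY FILE...
check_inputs() {
  lib=$1
  shift
  case "$lib" in
    *.tsv|*.parquet) ;;   # .predicted.tsv also matches *.tsv
    *) echo "unsupported library format: $lib" >&2; return 1 ;;
  esac
  if [ "$#" -eq 0 ]; then
    echo "no input files given" >&2
    return 1
  fi
  for f in "$@"; do
    case "$f" in
      *.mzparquet) ;;
      *) echo "not an mzParquet file: $f" >&2; return 1 ;;
    esac
  done
  return 0
}
```

It could be chained in front of a basic run, for example: check_inputs spec_lib.tsv *.mzparquet && sage-dia *.mzparquet --library spec_lib.tsv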

Run individual mzParquet files and merge (useful for parallel runs)

  1. Process each file individually to generate temporary quant files. Each mzParquet file produces 10 quant files:
sage-dia input.mzparquet --library spec_lib.tsv --quant-only --keep-quant-files
  2. Merge the quant files from step 1, specifying only the first of the 10 quant files for each run. Add --gen-xic to also generate XIC traces:
sage-dia --mzbinary r1.mzparquet_0.quant r2.mzparquet_0.quant --library spec_lib.tsv --output r.tsv
sage-dia --mzbinary r1.mzparquet_0.quant r2.mzparquet_0.quant --library spec_lib.tsv --output r.tsv --gen-xic
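
The two steps above can be sketched as a per-file loop followed by a single merge. This is a dry-run sketch that only prints the commands. The function names are hypothetical, file names are assumed to contain no spaces, and the quant file name <input>_0.quant is inferred from the merge example rather than from a documented naming rule, so check it against the files a real run produces.

```shell
# Hypothetical helpers (not part of sage-dia). They only print commands.

# Name of the first quant file for one input, assuming the
# <input>_0.quant pattern seen in the merge example.
first_quant() {
  printf '%s_0.quant\n' "$1"
}

# Step 1: one quant-only command per input file.
# Usage: print_quant_cmds LIBRARY FILE...
print_quant_cmds() {
  lib=$1
  shift
  for f in "$@"; do
    printf 'sage-dia %s --library %s --quant-only --keep-quant-files\n' "$f" "$lib"
  done
}

# Step 2: a single merge command over the first quant file of every input.
# Usage: print_merge_cmd LIBRARY OUTPUT FILE...
print_merge_cmd() {
  lib=$1
  out=$2
  shift 2
  quants=""
  for f in "$@"; do
    quants="$quants $(first_quant "$f")"
  done
  printf 'sage-dia --mzbinary%s --library %s --output %s\n' "$quants" "$lib" "$out"
}
```

The step 1 commands are independent of each other, so they can be dispatched to separate jobs or machines; the step 2 command runs once, after all of them have finished.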

SILAC labeled search

sage-dia *.mzparquet --library library.tsv --label K:8.0437 --output r_label.tsv

Generate XIC traces

sage-dia *.mzparquet --library library.tsv --gen-xic --output results.tsv

Disable all boosting (strict mode)

sage-dia *.mzparquet --library library.tsv \
  --gaussian-r2-boost false \
  --protein-context-boost false \
  --mbr false \
  --filter-one-hit-wonders false \
  --output r.strict.tsv

Resource-constrained environment

sage-dia *.mzparquet --library library.tsv -t 4 --max-memory-gb 16

Output files

The main output is a TSV file. Several derived files are generated alongside:

  • <output>.p0_01.tsv — results filtered to 1% FDR
  • <output>.xic.tsv — XIC traces (when --gen-xic is set)
  • <output>.xic.db — XIC SQLite database (when --gen-xic is set)
  • <output>.log — full run log

Output columns

Column              Description
precursor           Peptide sequence with modifications and charge
proteins            Mapped protein accessions
path                Source mzParquet filename
q value             Precursor-level q-value (NaN for MBR items)
protein q value     Protein-level q-value (NaN for MBR items)
discriminant score  Combined LDA/XGBoost score
RT                  Apex retention time (minutes)
RT start            Peak start retention time
RT end              Peak end retention time
intensity           Quantified precursor intensity
gaussian_fit_r2     Gaussian fit R² of the chromatographic peak
MBR                 0 = directly identified, 1 = transferred via MBR
scan                MS2 scan number at apex (unless --no-scan)
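
As an illustration of working with these columns, the sketch below keeps only directly identified precursors (MBR = 0) at or below a chosen q-value cutoff. It assumes the TSV header spells the columns exactly as listed above ("q value", "MBR"); the function name filter_precursors is hypothetical and not part of sage-dia. Rows whose q value is NaN are skipped explicitly, because awk implementations differ in how they convert that string to a number.

```shell
# Hypothetical helper (not part of sage-dia).
# Usage: filter_precursors RESULTS_TSV [QMAX]   (QMAX defaults to 0.01)
filter_precursors() {
  awk -F '\t' -v qmax="${2:-0.01}" '
    NR == 1 {
      # Map header names to column numbers so column order does not matter.
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("q value" in col) || !("MBR" in col)) {
        print "missing q value or MBR column" > "/dev/stderr"
        exit 1
      }
      print
      next
    }
    {
      q = $col["q value"]
      if (tolower(q) == "nan") next          # MBR items carry NaN q-values
      if ($col["MBR"] == 0 && q + 0 <= qmax + 0) print
    }
  ' "$1"
}
```

For example, filter_precursors results.tsv 0.05 > results.direct.tsv would write the header plus every directly identified precursor with a q-value of at most 0.05.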

See also

  • ThermoParquet — convert Thermo RAW files to mzParquet
  • dotdconverter — convert Bruker timsTOF .d files to mzParquet