SageDIA Command Line How-to
sage-dia — DIA (Data-Independent Acquisition) proteomics search engine
Download
Download the latest sage-dia binary from the GitHub releases page.
Background
SageDIA searches DIA mass spectrometry data against a spectral library to identify and quantify peptides and proteins.
Automatic calibration — SageDIA automatically calibrates both mass (ppm) and retention time from the data. Users do not need to specify mass tolerance or RT alignment parameters. The auto-calibration uses iRT peptides from the spectral library and LOWESS regression. Override with --mass-ppm-tol or --rt-calibration-method only if needed for special cases.
Spectral library formats — SageDIA accepts spectral libraries in common formats:
- .tsv: tab-separated (DIA-NN, Spectronaut export, etc.)
- .predicted.tsv: predicted library TSV
- .parquet: Parquet format
On first use, SageDIA automatically converts the library to a fast binary .sagelib cache file (saved next to the original). Subsequent runs load the .sagelib directly, which is significantly faster. You can also pre-convert with --convert-lib.
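To see whether a cache has already been written for a library, one option is to list the .sagelib files sitting next to it. The sketch below is only an illustration: list_sagelib_caches is a hypothetical helper, and because the exact cache file name is not stated here, it matches any *.sagelib file in the library's directory rather than guessing a name.

```shell
# Hypothetical helper: print every .sagelib file found in the same
# directory as the given spectral library. Prints nothing if none exist.
list_sagelib_caches() {
    lib_dir=$(dirname -- "$1")
    for f in "$lib_dir"/*.sagelib; do
        # The glob stays literal when nothing matches, so test each entry.
        [ -f "$f" ] && printf '%s\n' "$f"
    done
    return 0
}
```

For example, list_sagelib_caches spec_lib.tsv prints nothing before the first run and should print the cache path afterwards.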
Input format — All input spectral files must be in mzParquet format. Convert from vendor formats using:
- ThermoParquet: for Thermo RAW files
- dotdconverter: for Bruker timsTOF .d files
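Since only mzParquet input is accepted, it can help to verify the file list before starting a long run. The following is a minimal sketch, assuming inputs carry the .mzparquet extension used in the examples on this page; check_mzparquet_inputs is a hypothetical helper, not part of sage-dia.

```shell
# Hypothetical pre-flight check: succeed only if every argument exists
# and ends in .mzparquet (extension assumed from the examples here).
check_mzparquet_inputs() {
    status=0
    for f in "$@"; do
        case "$f" in
            *.mzparquet)
                [ -e "$f" ] || { echo "missing: $f" >&2; status=1; }
                ;;
            *)
                echo "not an .mzparquet file: $f" >&2
                status=1
                ;;
        esac
    done
    return "$status"
}
```

A run could then be guarded with check_mzparquet_inputs *.mzparquet && sage-dia *.mzparquet --library spec_lib.tsv.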
Scoring — SageDIA uses LDA or XGBoost (or both in auto mode) for discriminant scoring, followed by target-decoy FDR control and protein inference. Match-between-runs (MBR) is enabled by default for multi-file experiments.
Synopsis
sage-dia [OPTIONS] <mzparquet files...>
sage-dia --mzbinary <quant files...> [OPTIONS]
Run sage-dia --help for the full option listing.
Basic run
sage-dia input1.mzparquet input2.mzparquet --library spec_lib.tsv
sage-dia *.mzparquet --library spec_lib.parquet
Run individual mzParquet files and merge (useful for parallel runs)
1. Search each file individually before merging. This generates temporary quant files; each mzparquet produces 10 quant files:
sage-dia input.mzparquet --library spec_lib.tsv --quant-only --keep-quant-files
2. Merge the quant files from step 1 (specify only the first quant file of the 10 for each run):
sage-dia --mzbinary r1.mzparquet_0.quant r2.mzparquet_0.quant --library spec_lib.tsv --output r.tsv
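The two steps above can be scripted. The following is a rough sketch, not an official workflow: first_quant_files and run_split_search are hypothetical helper names, the SAGE_DIA variable is an assumption added so the commands can be printed instead of executed, and the quant file naming (the input path followed by _0.quant) is inferred from the example above. Only flags shown on this page are used.

```shell
# Map each mzParquet path to the first of its quant files (<input>_0.quant).
# Naming is inferred from the merge example; adjust if your files differ.
first_quant_files() {
    for f in "$@"; do
        printf '%s_0.quant\n' "$f"
    done
}

# Usage: run_split_search LIBRARY OUTPUT FILE...
# Set SAGE_DIA="echo sage-dia" for a dry run that only prints the commands.
run_split_search() {
    library=$1
    output=$2
    shift 2
    # Step 1: quantify each file on its own and keep its quant files.
    # Shown sequentially; each call could be sent to a separate job instead.
    for f in "$@"; do
        ${SAGE_DIA:-sage-dia} "$f" --library "$library" \
            --quant-only --keep-quant-files || return 1
    done
    # Step 2: merge, passing only the first quant file per input.
    # The unquoted substitution assumes paths without whitespace.
    ${SAGE_DIA:-sage-dia} --mzbinary $(first_quant_files "$@") \
        --library "$library" --output "$output"
}
```

With SAGE_DIA="echo sage-dia", calling run_split_search spec_lib.tsv r.tsv r1.mzparquet r2.mzparquet prints the two per-file commands followed by the merge command, which is a cheap way to review the plan before committing compute time.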
To also generate XIC traces during the merge, add --gen-xic:
sage-dia --mzbinary r1.mzparquet_0.quant r2.mzparquet_0.quant --library spec_lib.tsv --output r.tsv --gen-xic
SILAC labeled search
sage-dia *.mzparquet --library library.tsv --label K:8.0437 --output r_label.tsv
Generate XIC traces
sage-dia *.mzparquet --library library.tsv --gen-xic --output results.tsv
Disable all boosting (strict mode)
sage-dia *.mzparquet --library library.tsv \
--gaussian-r2-boost false \
--protein-context-boost false \
--mbr false \
--filter-one-hit-wonders false \
--output r.strict.tsv
Resource-constrained environment
sage-dia *.mzparquet --library library.tsv -t 4 --max-memory-gb 16
Output files
The main output is a TSV file. Several derived files are generated alongside:
- <output>.p0_01.tsv: results filtered to 1% FDR
- <output>.xic.tsv: XIC traces (when --gen-xic is set)
- <output>.xic.db: XIC SQLite database (when --gen-xic is set)
- <output>.log: full run log
Output columns
| Column | Description |
|---|---|
| precursor | Peptide sequence with modifications and charge |
| proteins | Mapped protein accessions |
| path | Source mzParquet filename |
| q value | Precursor-level q-value (NaN for MBR items) |
| protein q value | Protein-level q-value (NaN for MBR items) |
| discriminant score | Combined LDA/XGBoost score |
| RT | Apex retention time (minutes) |
| RT start | Peak start retention time |
| RT end | Peak end retention time |
| intensity | Quantified precursor intensity |
| gaussian_fit_r2 | Gaussian fit R² of the chromatographic peak |
| MBR | 0 = directly identified, 1 = transferred via MBR |
| scan | MS2 scan number at apex (unless --no-scan) |
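For a custom cutoff, the main output table can be filtered with standard tools. The following is a minimal sketch using awk, assuming the TSV header uses exactly the column names listed above (q value, MBR); filter_confident is a hypothetical helper, not a sage-dia feature. Rows transferred via MBR carry NaN q-values and are dropped.

```shell
# Usage: filter_confident RESULTS_TSV [CUTOFF]   (CUTOFF defaults to 0.01)
# Keeps the header plus directly identified rows (MBR = 0) whose
# precursor-level q-value is at or below the cutoff.
filter_confident() {
    awk -F '\t' -v cutoff="${2:-0.01}" '
        NR == 1 {
            # Locate columns by header name instead of fixed position.
            for (i = 1; i <= NF; i++) col[$i] = i
            if (!("q value" in col) || !("MBR" in col)) {
                print "expected columns not found in header" > "/dev/stderr"
                exit 1
            }
            q = col["q value"]; m = col["MBR"]
            print
            next
        }
        # Skip NaN explicitly so the numeric comparison is well defined.
        $m == 0 && $q != "NaN" && ($q + 0) <= (cutoff + 0)
    ' "$1"
}
```

For example, filter_confident r.tsv 0.05 > r.q05.tsv writes a table at a looser threshold than the 1% file generated by the run itself.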
See also
- ThermoParquet — convert Thermo RAW files to mzParquet
- dotdconverter — convert Bruker timsTOF .d files to mzParquet