Label-Free Quantification (LFQ) in SageDDA

This document describes the Label-Free Quantification (LFQ) module in SageDDA: its algorithms, configuration options, and usage.

Important: SageDDA (formerly known as SagePro) is developed by Chaparral Labs and is distinct from the open-source Sage project. Features described here are proprietary to SageDDA.

Overview

Label-Free Quantification (LFQ) is a mass spectrometry-based approach that quantifies proteins without the need for isotopic or chemical labeling. SageDDA's LFQ module extracts and integrates MS1 precursor ion intensities across multiple LC-MS runs, enabling relative protein quantification.

The LFQ workflow in SageDDA consists of:

Feature Map Construction - Building a searchable index of peptide precursors
MS1 Feature Tracing - Extracting ion chromatograms (XICs) for each peptide
Retention Time Alignment - Aligning chromatographic profiles across runs
Peak Detection & Scoring - Identifying optimal peak boundaries
Integration - Calculating peptide/protein abundances

Algorithm Details

Feature Map Construction

The feature map is built from high-confidence peptide-spectrum matches (PSMs) that satisfy both of the following:

Peptide-level FDR ≤ 1% (peptide_q <= 0.01)
Target sequences only (label == 1)

For each identified peptide, precursor ranges are generated for:

All charge states within the specified range (precursor_charge)
Multiple isotopes (M+0, M+1, M+2) - controlled by N_ISOTOPES = 3
Both target and decoy entries for FDR estimation (decoys are shifted in RT and mass; see Decoy Generation for LFQ)

The feature map is organized in retention time bins (16K entries per bin) with secondary indexing by mass for efficient lookup.
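
As a rough illustration of the precursor ranges described above, the sketch below enumerates the m/z value of every charge state and isotope for one peptide. The function name `precursor_mzs` and the example mass are hypothetical; the proton mass and the 13C/12C spacing are standard physical constants, and nothing here is taken from SageDDA's actual implementation.

```python
PROTON = 1.007276466         # proton mass in Da
ISOTOPE_SPACING = 1.0033548  # 13C - 12C mass difference in Da
N_ISOTOPES = 3               # M+0, M+1, M+2 (value from this document)

def precursor_mzs(neutral_mass, charges):
    """Return {(charge, isotope): m/z} for each charge state and isotope."""
    table = {}
    for z in charges:
        for iso in range(N_ISOTOPES):
            mass = neutral_mass + iso * ISOTOPE_SPACING
            table[(z, iso)] = (mass + z * PROTON) / z
    return table

# Hypothetical peptide with a neutral monoisotopic mass of 799.3599 Da,
# traced at charge states 2 to 4.
mzs = precursor_mzs(799.3599, charges=range(2, 5))
```

Each entry would then be expanded into a tolerance window and placed in the RT-binned index; that indexing step is not shown here.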

Isotopic Envelope Modeling

SageDDA calculates the theoretical isotopic distribution for each peptide based on its elemental composition:

Composition = Σ composition(residue) for each amino acid

The isotopic envelope is used to:

Search for M+0, M+1, M+2 isotopes in MS1 spectra
Calculate spectral angle similarity between observed and theoretical distributions
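
The composition sum can be sketched as follows. This is a minimal example under stated assumptions: the `RESIDUES` table covers only a handful of amino acids, modifications are ignored, and the sketch adds one water for the peptide termini, which is standard chemistry but is not spelled out in the formula above. Converting the composition into an isotopic envelope is a separate step that is not shown.

```python
from collections import Counter

# Elemental composition of a few amino acid residues (illustrative subset).
RESIDUES = {
    "G": {"C": 2, "H": 3, "N": 1, "O": 1},
    "A": {"C": 3, "H": 5, "N": 1, "O": 1},
    "P": {"C": 5, "H": 7, "N": 1, "O": 1},
    "T": {"C": 4, "H": 7, "N": 1, "O": 2},
    "I": {"C": 6, "H": 11, "N": 1, "O": 1},
    "D": {"C": 4, "H": 5, "N": 1, "O": 3},
    "E": {"C": 5, "H": 7, "N": 1, "O": 3},
}
WATER = {"H": 2, "O": 1}  # accounts for the free N- and C-termini

def composition(sequence):
    """Sum residue compositions and add one water for the termini."""
    total = Counter(WATER)
    for residue in sequence:
        total.update(RESIDUES[residue])  # Counter.update adds counts
    return dict(total)

formula = composition("PEPTIDE")  # C34 H53 N7 O15
```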

MS1 Feature Tracing

For each MS1 spectrum:

Retention Time Normalization: The spectrum RT is normalized and aligned using pre-calculated alignment factors
Mass Lookup: Binary search finds peptide precursors within the mass tolerance window
Ion Mobility Filtering (if applicable): Additional filtering based on ion mobility tolerance
Grid Population: Matched intensities are added to a discretized RT × file × isotope grid

The grid uses:

GRID_SIZE = 100 equally-spaced RT bins
RT_TOL = 0.005 (0.5% of total run length) as the search window
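
The mass lookup and grid population steps might look roughly like the sketch below. Several details are assumptions made for illustration: the ppm window is centered on the observed m/z, the grid for a peptide is assumed to span the expected RT plus or minus `RT_TOL` in normalized RT units, and the function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

GRID_SIZE = 100  # RT bins per trace (value from this document)
RT_TOL = 0.005   # search window as a fraction of run length (from this document)

def matching_precursors(sorted_mzs, observed_mz, ppm_tolerance=5.0):
    """Binary-search a sorted m/z list for entries within the ppm window."""
    delta = observed_mz * ppm_tolerance / 1e6
    lo = bisect_left(sorted_mzs, observed_mz - delta)
    hi = bisect_right(sorted_mzs, observed_mz + delta)
    return list(range(lo, hi))

def grid_bin(spectrum_rt, expected_rt):
    """Map a normalized spectrum RT to one of GRID_SIZE bins, or None.

    Assumption: the grid covers [expected_rt - RT_TOL, expected_rt + RT_TOL].
    """
    start = expected_rt - RT_TOL
    frac = (spectrum_rt - start) / (2 * RT_TOL)
    if frac < 0.0 or frac > 1.0:
        return None
    return min(int(frac * GRID_SIZE), GRID_SIZE - 1)

hits = matching_precursors([400.0, 500.0, 500.002, 500.01, 600.0], 500.001)
```

Binary search keeps each lookup logarithmic in the number of precursors in the RT bin, which matters because the lookup runs for every peak of every MS1 spectrum.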

Gaussian Smoothing

Raw intensity traces are smoothed using a Gaussian kernel:

Kernel width: K_WIDTH = 10 bins
Standard deviation: σ = 0.5

This reduces noise and improves peak detection accuracy.
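
A minimal sketch of such a smoothing pass is shown below. The document does not state how σ relates to bin units, so this example assumes the `K_WIDTH` kernel taps are spread evenly over [-1, 1] and that σ is expressed on that scale; bins outside the trace are treated as zero. Both choices are assumptions, not a description of SageDDA's code.

```python
import math

K_WIDTH = 10  # kernel width in bins (value from this document)
SIGMA = 0.5   # standard deviation (value from this document)

def gaussian_kernel(width=K_WIDTH, sigma=SIGMA):
    """Normalized Gaussian weights; taps assumed to span [-1, 1]."""
    half = (width - 1) / 2
    xs = [(i - half) / half for i in range(width)]
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(trace, kernel):
    """Convolve a trace with the kernel, treating out-of-range bins as zero."""
    half = len(kernel) // 2
    out = []
    for i in range(len(trace)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(trace):
                acc += weight * trace[j]
        out.append(acc)
    return out

kernel = gaussian_kernel()
spike = [0.0] * 30
spike[15] = 100.0
smoothed = smooth(spike, kernel)
```

Because the kernel is normalized, the total intensity of a peak away from the trace edges is preserved while its height is spread over neighboring bins.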

Spectral Angle Calculation

The normalized spectral angle measures similarity between observed and theoretical isotopic distributions:

spectral_angle = 1 - (2 × arccos(similarity)) / π

Where similarity is the cosine similarity (dot product normalized by magnitudes) between:

Observed isotope intensities
Theoretical isotopic envelope

Values range from 0 to 1, with 1 indicating perfect agreement.
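
The formula above translates directly into code. In this sketch the function name is hypothetical, and two details are assumptions: the cosine similarity is clamped to guard against floating-point overshoot, and an all-zero intensity vector is scored as 0.

```python
import math

def spectral_angle(observed, theoretical):
    """Normalized spectral angle: 1 - 2 * arccos(cosine similarity) / pi."""
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(
        sum(t * t for t in theoretical)
    )
    if norm == 0.0:
        return 0.0  # assumption: no signal means no agreement
    similarity = min(1.0, max(-1.0, dot / norm))  # guard against rounding
    return 1.0 - 2.0 * math.acos(similarity) / math.pi

# Observed M+0, M+1, M+2 intensities against a theoretical envelope
# (both vectors are made-up example values).
score = spectral_angle([600.0, 310.0, 95.0], [0.60, 0.30, 0.10])
```

Since intensities are non-negative, the cosine similarity lies in [0, 1], which is why the result also lies in [0, 1].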

Retention Time Alignment (Warping)

To compensate for chromatographic drift between runs:

Reference Selection: For each peptide, the LC-MS run with the most confident PSM serves as the reference
Correlation Optimization: For each run, find the time shift (within ±75 bins) that maximizes dot product correlation with the reference
Warp Application: Apply the calculated shifts to align all runs
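
The correlation step can be sketched as an exhaustive search over integer shifts. This is a simplified illustration, not SageDDA's implementation: the function name is hypothetical, bins shifted outside the reference are ignored, and ties are broken in favor of the smallest absolute shift (an assumption).

```python
def best_shift(reference, trace, max_shift=75):
    """Return the shift (in bins) that maximizes the dot product with reference."""
    best, best_score = 0, float("-inf")
    # Try small shifts first so that ties favor the least movement.
    for shift in sorted(range(-max_shift, max_shift + 1), key=abs):
        score = 0.0
        for i, value in enumerate(trace):
            j = i + shift
            if 0 <= j < len(reference):
                score += value * reference[j]
        if score > best_score:
            best, best_score = shift, score
    return best

# Reference peak centered at bin 50, the other run's peak at bin 40.
reference = [0.0] * 100
reference[49:52] = [1.0, 2.0, 1.0]
run = [0.0] * 100
run[39:42] = [1.0, 2.0, 1.0]
shift = best_shift(reference, run)  # the run must move +10 bins
```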

Peak Scoring Strategies

SageDDA supports four peak scoring strategies:

Strategy	Description	Formula
`RetentionTime`	Favors peaks near expected RT	`(1 - |rt - center| / center)^0.33`
`SpectralAngle`	Uses spectral angle directly	`spectral_angle`
`Intensity`	Favors intense peaks	`(intensity / max_intensity)^0.5`
`Hybrid` (default)	Combines all factors	`spectral_angle^3 × (1 - |rt - center| / center)^0.33 × (intensity / max_intensity)^0.5`
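
The formulas in the table can be written out as a small sketch. The function names are hypothetical, and clamping the RT term at zero is an assumption added so that a fractional power is never taken of a negative number.

```python
def rt_score(rt, center):
    """RetentionTime term: approaches 1 as rt approaches the expected center."""
    return max(0.0, 1.0 - abs(rt - center) / center) ** 0.33

def intensity_score(intensity, max_intensity):
    """Intensity term: square root of intensity relative to the maximum."""
    return (intensity / max_intensity) ** 0.5

def hybrid_score(spectral_angle, rt, center, intensity, max_intensity):
    """Hybrid: the spectral angle is cubed, so it dominates the product."""
    return (
        spectral_angle ** 3
        * rt_score(rt, center)
        * intensity_score(intensity, max_intensity)
    )

# A bin at the expected RT with a quarter of the maximum intensity
# and a spectral angle of 0.5 (made-up values).
score = hybrid_score(0.5, rt=50, center=50, intensity=25.0, max_intensity=100.0)
```

The exponents show how the factors are weighted: cubing the spectral angle penalizes poor isotopic agreement sharply, while the roots applied to the RT and intensity terms flatten their influence.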

Peak Integration

Once the optimal peak is identified:

Boundary Detection: Expand outward from the apex in both directions until the score drops below 50% of the apex score or the spectral angle falls below the spectral_angle threshold
Integration: Calculate the peptide abundance using the selected integration strategy:
- Sum: Sum intensities within peak boundaries
- Apex: Use only the apex intensity
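
Boundary detection and integration could be sketched as below. The function names and the example traces are hypothetical; the 50% cutoff, the spectral angle threshold, and the two integration strategies follow the description above.

```python
def peak_bounds(scores, angles, apex, min_angle=0.70, drop=0.5):
    """Expand left and right from the apex while both criteria still hold."""
    cutoff = scores[apex] * drop

    def acceptable(i):
        return scores[i] >= cutoff and angles[i] >= min_angle

    left = apex
    while left > 0 and acceptable(left - 1):
        left -= 1
    right = apex
    while right < len(scores) - 1 and acceptable(right + 1):
        right += 1
    return left, right

def integrate(intensities, left, right, apex, strategy="Sum"):
    """Sum: total intensity within the bounds. Apex: apex intensity only."""
    if strategy == "Sum":
        return sum(intensities[left:right + 1])
    if strategy == "Apex":
        return intensities[apex]
    raise ValueError(f"unknown integration strategy: {strategy}")

scores = [0.1, 0.3, 0.6, 1.0, 0.7, 0.4, 0.1]
angles = [0.9] * 7
intensities = [1.0, 3.0, 6.0, 10.0, 7.0, 4.0, 1.0]
left, right = peak_bounds(scores, angles, apex=3)
area = integrate(intensities, left, right, apex=3, strategy="Sum")
```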

Configuration

Enabling LFQ

Add the following to your parameter file:

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.70,
      "ppm_tolerance": 5.0,
      "combine_charge_states": true
    }
  }
}

LFQ Settings Reference

Parameter	Type	Default	Description
`peak_scoring`	String	`"Hybrid"`	Peak scoring strategy: `"Hybrid"`, `"RetentionTime"`, `"SpectralAngle"`, or `"Intensity"`
`integration`	String	`"Sum"`	Integration method: `"Sum"` (area under curve) or `"Apex"` (maximum intensity)
`spectral_angle`	Float	`0.70`	Minimum spectral angle threshold (0-1). Higher values require better isotopic pattern match
`ppm_tolerance`	Float	`5.0`	Mass tolerance in ppm for matching MS1 ions to theoretical precursor masses
`mobility_pct_tolerance`	Float	`1.0`	Ion mobility tolerance as percentage (for timsTOF data)
`combine_charge_states`	Boolean	`true`	If `true`, intensities from different charge states are combined. If `false`, charge states are quantified separately

Choosing Peak Scoring Strategy

Hybrid (recommended): Best for most experiments. Balances RT, spectral quality, and intensity.
RetentionTime: Use when chromatography is highly reproducible and RT is the most reliable feature.
SpectralAngle: Use when isotopic patterns are well-resolved and you want strict quality filtering.
Intensity: Use when you want to prioritize the most abundant signals.

Choosing Integration Strategy

Sum (recommended): Provides more robust quantification by integrating across the entire peak. More tolerant of peak shape variations.
Apex: Faster computation. Use when peaks are symmetric and well-resolved. May be more sensitive to noise.

Output

When LFQ is enabled, SageDDA generates:

lfq.tsv: Tab-separated file containing peptide-level quantification results
lfq.parquet (if --parquet flag is used): Parquet format for large-scale analysis

Output Columns

Column	Description
`peptide`	Peptide sequence with modifications
`proteins`	Associated protein accessions
`q_value`	Peptide-level FDR q-value
`spectral_angle`	Intensity-weighted average spectral angle
`<filename>_intensity`	Integrated MS1 intensity for each input file
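
For downstream analysis, the TSV output can be read with standard tooling. The sketch below assumes the column names listed in the table above and that each input file contributes one column ending in `_intensity`; the peptides, proteins, and numbers in the example are made up, and `load_lfq` is a hypothetical helper.

```python
import csv
import io

SUFFIX = "_intensity"

def load_lfq(handle, max_q=0.01):
    """Return (peptide, {file: intensity}) for rows passing the q-value cutoff."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["q_value"]) > max_q:
            continue
        intensities = {
            name[: -len(SUFFIX)]: float(value)
            for name, value in row.items()
            if name.endswith(SUFFIX)
        }
        rows.append((row["peptide"], intensities))
    return rows

# In-memory stand-in for lfq.tsv with two runs.
example = io.StringIO(
    "peptide\tproteins\tq_value\tspectral_angle\trun1_intensity\trun2_intensity\n"
    "PEPTIDEK\tP1\t0.001\t0.92\t1500.0\t1725.5\n"
    "SAMPLER\tP2\t0.200\t0.55\t80.0\t0.0\n"
)
quantified = load_lfq(example)
```

With a real file, the same function would be called on `open("lfq.tsv", newline="")`.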

Technical Details

Internal Constants

Constant	Value	Description
`RT_TOL`	0.005	RT tolerance as fraction of total run length (0.5%)
`K_WIDTH`	10	Gaussian kernel width in bins
`GRID_SIZE`	100	Number of RT bins for peak tracing
`N_ISOTOPES`	3	Number of isotopes to trace (M+0, M+1, M+2)

Decoy Generation for LFQ

Note: This is different from PSM-level decoy generation. For peptide/protein identification, SageDDA uses reversed peptide sequences (with the rev_ prefix) following the picked-peptide approach. The method described here is specifically for LFQ quantification FDR control.

For LFQ-specific FDR control, decoy XICs (extracted ion chromatograms) are generated by:

Shifting the retention time by -2 × RT_TOL
Adding a mass offset of +11.06 Da

These decoy XICs are scored using the same peak detection algorithm as targets. The resulting target and decoy peak scores are then used for q-value calculation using the standard target-decoy competition approach:

All peaks (target and decoy) are sorted by score
Q-values are calculated from the cumulative counts of decoy and target peaks scoring at or above each peak's score: q = decoy_count / target_count
Each LFQ peptide receives a q_value in the output

This enables FDR control at the quantification level - peptides with poor chromatographic evidence will have higher q-values. The RT shift and mass offset create "impossible" XIC locations that should not contain real peptide signals, providing a null distribution for scoring.
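
The decoy placement and q-value calculation can be sketched as follows. The function names are hypothetical. The final pass that makes q-values monotone (taking the minimum FDR at any equal or lower score threshold) is the conventional definition of a q-value and is an assumption here, since the text above only gives the ratio.

```python
RT_TOL = 0.005            # value from this document
DECOY_MASS_OFFSET = 11.06  # Da, value from this document

def decoy_coordinates(rt, mass):
    """Place the decoy XIC where no real signal for this peptide is expected."""
    return rt - 2 * RT_TOL, mass + DECOY_MASS_OFFSET

def q_values(peaks):
    """peaks: list of (score, is_decoy). Returns q-values in input order."""
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][0], reverse=True)
    targets = decoys = 0
    fdr = []
    for i in order:
        if peaks[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr.append(decoys / max(targets, 1))
    # Walk from the worst score upward, keeping the running minimum FDR.
    q = [0.0] * len(peaks)
    running = float("inf")
    for rank in range(len(order) - 1, -1, -1):
        running = min(running, fdr[rank])
        q[order[rank]] = running
    return q

peaks = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
qs = q_values(peaks)
```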

Ion Mobility Support

For timsTOF and other ion mobility data:

Additional filtering based on mobility_pct_tolerance
Ion mobility bounds are calculated as percentage tolerance around the observed mobility value
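
A percentage tolerance of this kind amounts to a symmetric window around the observed value. The sketch below is an assumption about how the bounds are formed, with a hypothetical function name and a made-up mobility value.

```python
def mobility_bounds(observed_mobility, pct_tolerance=1.0):
    """Return (lower, upper) bounds at +/- pct_tolerance percent."""
    delta = observed_mobility * pct_tolerance / 100.0
    return observed_mobility - delta, observed_mobility + delta

# Example with mobility_pct_tolerance = 1.5 and an observed mobility of 1.0.
lower, upper = mobility_bounds(1.0, pct_tolerance=1.5)
```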

Best Practices

Data Requirements

MS1 Spectra: Ensure your mzML files contain MS1 spectra. LFQ will warn if no MS1 spectra are found.
Sufficient Identifications: Only peptides that pass the 1% peptide-level FDR filter enter the feature map, so runs with few high-confidence PSMs yield few quantified peptides.
Chromatographic Quality: Reproducible chromatography improves quantification accuracy.

Parameter Tuning

ppm_tolerance:
- Start with 5.0 ppm for Orbitrap/TOF data
- Use 10-20 ppm for lower resolution instruments
spectral_angle:
- Default 0.70 works well for most cases
- Increase to 0.80+ for stricter quality filtering
- Decrease to 0.50-0.60 for complex samples or low-intensity peptides
combine_charge_states:
- Keep true for most experiments (recommended)
- Set to false if you need charge-state-specific quantification

Troubleshooting

Issue	Possible Cause	Solution
No LFQ output	Missing MS1 spectra	Check mzML files contain MS1 data
Low quantification rates	Strict spectral_angle	Lower spectral_angle threshold
High variability	Chromatographic drift	Ensure good RT alignment; check input data quality
Missing peptides	Low confidence PSMs (peptide_q above 0.01)	Verify PSM identification quality; only peptides at peptide-level FDR ≤ 1% are quantified

Example Configurations

Basic LFQ

{
  "database": {
    "fasta": "proteins.fasta"
  },
  "quant": {
    "lfq": true
  }
}

High-Stringency LFQ

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.85,
      "ppm_tolerance": 3.0,
      "combine_charge_states": true
    }
  }
}

Charge-State-Specific LFQ

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "combine_charge_states": false,
      "spectral_angle": 0.70,
      "ppm_tolerance": 5.0
    }
  }
}

timsTOF Ion Mobility LFQ

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.70,
      "ppm_tolerance": 5.0,
      "mobility_pct_tolerance": 1.5
    }
  }
}

References

The LFQ algorithm in SageDDA is inspired by and builds upon concepts from:

MaxQuant's MaxLFQ algorithm
Spectral angle similarity scoring for isotopic pattern matching
Correlation Optimized Warping (COW) for retention time alignment
