Label-Free Quantification (LFQ) in SageDDA
This document describes the Label-Free Quantification (LFQ) module in SageDDA: its algorithms, configuration options, and usage.
Important: SageDDA (formerly known as SagePro) is developed by Chaparral Labs and is distinct from the open-source Sage project. Features described here are proprietary to SageDDA.
Overview
Label-Free Quantification (LFQ) is a mass spectrometry-based approach that quantifies proteins without the need for isotopic or chemical labeling. SageDDA's LFQ module extracts and integrates MS1 precursor ion intensities across multiple LC-MS runs, enabling relative protein quantification.
The LFQ workflow in SageDDA consists of:
- Feature Map Construction: Building a searchable index of peptide precursors
- MS1 Feature Tracing: Extracting ion chromatograms (XICs) for each peptide
- Retention Time Alignment: Aligning chromatographic profiles across runs
- Peak Detection & Scoring: Identifying optimal peak boundaries
- Integration: Calculating peptide/protein abundances
Algorithm Details
Feature Map Construction
The feature map is built from high-confidence peptide spectrum matches (PSMs) with:
- Peptide-level FDR ≤ 1% (peptide_q <= 0.01)
- Target sequences only (label == 1)
For each identified peptide, precursor ranges are generated for:
- All charge states within the specified range (precursor_charge)
- Multiple isotopes (M+0, M+1, M+2), controlled by N_ISOTOPES = 3
- Both target and decoy (shifted RT) entries for FDR estimation
The feature map is organized in retention time bins (16K entries per bin) with secondary indexing by mass for efficient lookup.
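As a rough illustration of the precursor ranges described above, the following sketch enumerates an m/z window for every charge state and isotope of one peptide. The function name, the tuple layout, and the symmetric ppm window are assumptions made for this example and are not taken from SageDDA. The proton mass and the approximate 13C-12C spacing are standard physical constants.

```python
PROTON = 1.007276           # proton mass in Da
ISOTOPE_SPACING = 1.003355  # approximate 13C - 12C mass difference in Da
N_ISOTOPES = 3              # M+0, M+1, M+2


def precursor_windows(neutral_mass, charges, ppm_tolerance=5.0):
    """Return (charge, isotope, mz_lo, mz_hi) for every charge/isotope pair.

    Hypothetical helper: the name and return layout are illustrative only.
    """
    windows = []
    for z in charges:
        for k in range(N_ISOTOPES):
            # m/z of the k-th isotope peak at charge z
            mz = (neutral_mass + k * ISOTOPE_SPACING + z * PROTON) / z
            delta = mz * ppm_tolerance / 1e6
            windows.append((z, k, mz - delta, mz + delta))
    return windows


# Charge states 2 to 4 for a peptide with a neutral mass of 1000 Da.
windows = precursor_windows(1000.0, charges=range(2, 5))
```

Three charge states and three isotopes give nine windows per peptide. The decoy entries mentioned above are not modeled here.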
Isotopic Envelope Modeling
SageDDA calculates the theoretical isotopic distribution for each peptide based on its elemental composition:
    Composition = Σ composition(residue) for each amino acid

The isotopic envelope is used to:
- Search for M+0, M+1, M+2 isotopes in MS1 spectra
- Calculate spectral angle similarity between observed and theoretical distributions
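The composition sum can be sketched as follows. This is a minimal example for unmodified peptides, not SageDDA's implementation: the table holds the standard residue formulas (amino acid minus one water), and the extra H2O for the free termini is an assumption of this sketch, since the formula above only shows the residue sum. The names RESIDUES and peptide_composition are hypothetical.

```python
# Residue compositions as (C, H, N, O, S), i.e. the amino acid minus H2O.
RESIDUES = {
    "G": (2, 3, 1, 1, 0), "A": (3, 5, 1, 1, 0), "S": (3, 5, 1, 2, 0),
    "P": (5, 7, 1, 1, 0), "V": (5, 9, 1, 1, 0), "T": (4, 7, 1, 2, 0),
    "C": (3, 5, 1, 1, 1), "L": (6, 11, 1, 1, 0), "I": (6, 11, 1, 1, 0),
    "N": (4, 6, 2, 2, 0), "D": (4, 5, 1, 3, 0), "Q": (5, 8, 2, 2, 0),
    "K": (6, 12, 2, 1, 0), "E": (5, 7, 1, 3, 0), "M": (5, 9, 1, 1, 1),
    "H": (6, 7, 3, 1, 0), "F": (9, 9, 1, 1, 0), "R": (6, 12, 4, 1, 0),
    "Y": (9, 9, 1, 2, 0), "W": (11, 10, 2, 1, 0),
}
ELEMENTS = ("C", "H", "N", "O", "S")


def peptide_composition(sequence):
    """Sum residue compositions and add one H2O for the peptide termini."""
    totals = [0, 0, 0, 0, 0]
    for residue in sequence:
        for i, count in enumerate(RESIDUES[residue]):
            totals[i] += count
    totals[1] += 2  # two hydrogens from the terminal H2O
    totals[3] += 1  # one oxygen from the terminal H2O
    return dict(zip(ELEMENTS, totals))


composition = peptide_composition("GA")  # glycylalanine, C5H10N2O3
```

Modifications would add their own elemental deltas to the totals; that step is omitted here.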
MS1 Feature Tracing
For each MS1 spectrum:
- Retention Time Normalization: The spectrum RT is normalized and aligned using pre-calculated alignment factors
- Mass Lookup: Binary search finds peptide precursors within the mass tolerance window
- Ion Mobility Filtering (if applicable): Additional filtering based on ion mobility tolerance
- Grid Population: Matched intensities are added to a discretized RT × file × isotope grid
The grid uses:
- GRID_SIZE = 100 equally-spaced RT bins
- RT_TOL = 0.005 (0.5% of total run length) as the search window
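The mass lookup and grid placement steps can be illustrated with a small sketch. It assumes a flat, sorted list of precursor m/z values and an RT already normalized to the range 0 to 1; the function names and the choice to compute the ppm window around the observed m/z are assumptions of this example, not details confirmed by SageDDA.

```python
from bisect import bisect_left, bisect_right

GRID_SIZE = 100


def match_precursors(sorted_mz, observed_mz, ppm_tolerance=5.0):
    """Indices of entries in sorted_mz within ppm_tolerance of observed_mz."""
    delta = observed_mz * ppm_tolerance / 1e6
    lo = bisect_left(sorted_mz, observed_mz - delta)
    hi = bisect_right(sorted_mz, observed_mz + delta)
    return list(range(lo, hi))


def rt_bin(normalized_rt):
    """Map an RT normalized to [0, 1] onto one of GRID_SIZE bins."""
    idx = int(normalized_rt * GRID_SIZE)
    return min(max(idx, 0), GRID_SIZE - 1)  # clamp to a valid bin


# Hypothetical index of precursor m/z values, sorted ascending.
index = [400.2000, 500.2500, 500.2510, 650.3300]
hits = match_precursors(index, 500.2505)
```

Binary search keeps each lookup logarithmic in the number of precursors, which matters because it runs once per MS1 peak.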
Gaussian Smoothing
Raw intensity traces are smoothed using a Gaussian kernel:
- Kernel width: K_WIDTH = 10 bins
- Standard deviation: σ = 0.5
This reduces noise and improves peak detection accuracy.
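A minimal sketch of this kind of smoothing is shown below. The document does not state the units of σ, so the sketch assumes the kernel taps are spread evenly over the interval from -1 to 1; that spacing, the zero padding at the trace edges, and the function names are all assumptions of this example.

```python
import math

K_WIDTH = 10
SIGMA = 0.5


def gaussian_kernel(width=K_WIDTH, sigma=SIGMA):
    """Normalized Gaussian taps; tap positions span [-1, 1] (an assumption)."""
    xs = [-1.0 + 2.0 * i / (width - 1) for i in range(width)]
    weights = [math.exp(-0.5 * (x / sigma) ** 2) for x in xs]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(trace, kernel):
    """Convolve trace with kernel, treating out-of-range samples as zero.

    An even-width kernel has no center tap, so the output is offset by
    half a bin relative to the input.
    """
    half = len(kernel) // 2
    out = []
    for i in range(len(trace)):
        acc = 0.0
        for j, w in enumerate(kernel):
            k = i + j - half
            if 0 <= k < len(trace):
                acc += w * trace[k]
        out.append(acc)
    return out


# A single-bin spike is spread over neighboring bins.
raw = [0.0] * 20
raw[10] = 100.0
smoothed = smooth(raw, gaussian_kernel())
```

Because the kernel is normalized to sum to 1, the total intensity of a peak away from the trace edges is preserved while its apex is lowered.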
Spectral Angle Calculation
The normalized spectral angle measures similarity between observed and theoretical isotopic distributions:
    spectral_angle = 1 - (2 × arccos(similarity)) / π

where similarity is the cosine similarity (dot product normalized by magnitudes) between:
- Observed isotope intensities
- Theoretical isotopic envelope
Values range from 0 to 1, with 1 indicating perfect agreement.
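The formula translates directly into a few lines of code. The sketch below follows the definition above; returning 0 when either vector is all zeros and clamping the cosine against floating-point overshoot are choices made for this example.

```python
import math


def spectral_angle(observed, theoretical):
    """Normalized spectral angle between two isotope intensity vectors."""
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = math.sqrt(sum(o * o for o in observed)) * math.sqrt(
        sum(t * t for t in theoretical)
    )
    if norm == 0.0:
        return 0.0  # assumption: no signal is treated as no agreement
    # Clamp so rounding error cannot push acos outside its domain.
    similarity = max(-1.0, min(1.0, dot / norm))
    return 1.0 - 2.0 * math.acos(similarity) / math.pi


# Observed M+0, M+1, M+2 intensities against a theoretical envelope.
score = spectral_angle([980.0, 560.0, 190.0], [0.55, 0.31, 0.14])
```

Since intensities are non-negative, the cosine similarity lies between 0 and 1, which is why the score stays within the stated 0 to 1 range.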
Retention Time Alignment (Warping)
To compensate for chromatographic drift between runs:
- Reference Selection: The LC-MS run with the most confident PSM for each peptide serves as the reference
- Correlation Optimization: For each run, find the time shift (within ±75 bins) that maximizes dot product correlation with the reference
- Warp Application: Apply the calculated shifts to align all runs
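The shift search can be sketched as an exhaustive scan over candidate offsets. This is an illustration under simple assumptions (one trace per run, zero padding outside the grid, ties resolved toward the smallest shift); the function names are hypothetical and the actual warping in SageDDA may differ.

```python
def best_shift(reference, run, max_shift=75):
    """Shift s maximizing sum(reference[i] * run[i + s]) for |s| <= max_shift."""
    best_s, best_score = 0, float("-inf")
    # Try small shifts first so that ties favor the smallest correction.
    for s in sorted(range(-max_shift, max_shift + 1), key=abs):
        score = 0.0
        for i, ref_value in enumerate(reference):
            j = i + s
            if 0 <= j < len(run):
                score += ref_value * run[j]
        if score > best_score:
            best_s, best_score = s, score
    return best_s


def apply_shift(run, s):
    """Move run[i + s] to index i, padding with zeros at the edges."""
    return [run[i + s] if 0 <= i + s < len(run) else 0.0 for i in range(len(run))]


# Reference peak at bin 50; the same peak elutes 3 bins later in another run.
reference = [0.0] * 100
reference[49:52] = [1.0, 2.0, 1.0]
run = [0.0] * 100
run[52:55] = [1.0, 2.0, 1.0]

shift = best_shift(reference, run)
aligned = apply_shift(run, shift)
```

In this example the scan finds a shift of 3 bins, and applying it places the run's peak on top of the reference peak.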
Peak Scoring Strategies
SageDDA supports four peak scoring strategies:
| Strategy | Description | Formula |
|---|---|---|
| RetentionTime | Favors peaks near expected RT | (1 - abs(rt - center) / center)^0.33 |
| SpectralAngle | Uses spectral angle directly | spectral_angle |
| Intensity | Favors intense peaks | (intensity / max_intensity)^0.5 |
| Hybrid (default) | Combines all factors | sa³ × rt^0.33 × (int/max)^0.5 |
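The formulas in the table can be written out as plain functions. The sketch assumes that the rt^0.33 factor in the Hybrid formula refers to the RetentionTime score from the first row, and it clamps that term at zero so a peak far from the expected RT scores 0 instead of producing a negative base. Both points, and the function names, are assumptions of this example.

```python
def rt_score(rt, center):
    """RetentionTime strategy: highest when rt equals the expected center."""
    # Clamp at 0 (an assumption) so the fractional power stays real-valued.
    return max(0.0, 1.0 - abs(rt - center) / center) ** 0.33


def intensity_score(intensity, max_intensity):
    """Intensity strategy: square root of the relative intensity."""
    return (intensity / max_intensity) ** 0.5


def hybrid_score(sa, rt, center, intensity, max_intensity):
    """Hybrid strategy: sa^3 times the RT and intensity scores."""
    return sa ** 3 * rt_score(rt, center) * intensity_score(intensity, max_intensity)


# A candidate at the expected RT with a quarter of the maximum intensity.
score = hybrid_score(sa=0.9, rt=50.0, center=50.0, intensity=25.0, max_intensity=100.0)
```

The cube on the spectral angle penalizes poor isotopic agreement much more steeply than the fractional exponents penalize RT deviation or low intensity.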
Peak Integration
Once the optimal peak is identified:
- Boundary Detection: Expand from the apex until the score drops below 50% of the peak score or spectral angle falls below threshold
- Integration: Calculate the peptide abundance using the selected strategy:
  - Sum: Sum intensities within peak boundaries
  - Apex: Use only the apex intensity
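Boundary detection and integration can be sketched as follows, assuming per-bin score, spectral angle, and intensity lists for a single peptide in a single run. The function names, the inclusive boundary convention, and the exact stopping test are assumptions of this example.

```python
def peak_bounds(scores, angles, apex, min_angle=0.70, drop=0.5):
    """Expand from apex while score >= drop * apex score and angle >= min_angle."""
    threshold = scores[apex] * drop

    def acceptable(i):
        return scores[i] >= threshold and angles[i] >= min_angle

    left = apex
    while left > 0 and acceptable(left - 1):
        left -= 1
    right = apex
    while right < len(scores) - 1 and acceptable(right + 1):
        right += 1
    return left, right  # inclusive bin indices


def integrate(intensities, left, right, apex, strategy="Sum"):
    """Sum: total intensity inside the bounds. Apex: intensity at the apex."""
    if strategy == "Sum":
        return sum(intensities[left:right + 1])
    if strategy == "Apex":
        return intensities[apex]
    raise ValueError(f"unknown integration strategy: {strategy}")


scores = [0.1, 0.3, 0.6, 1.0, 0.7, 0.4, 0.1]
angles = [0.9] * 7
intensities = [1.0, 3.0, 6.0, 10.0, 7.0, 4.0, 1.0]

left, right = peak_bounds(scores, angles, apex=3)
area = integrate(intensities, left, right, apex=3, strategy="Sum")
height = integrate(intensities, left, right, apex=3, strategy="Apex")
```

With these made-up values the peak spans bins 2 to 4, so Sum integrates three bins while Apex reports only the center bin.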
Configuration
Enabling LFQ
Add the following to your parameter file:
    {
      "quant": {
        "lfq": true,
        "lfq_settings": {
          "peak_scoring": "Hybrid",
          "integration": "Sum",
          "spectral_angle": 0.70,
          "ppm_tolerance": 5.0,
          "combine_charge_states": true
        }
      }
    }

LFQ Settings Reference
| Parameter | Type | Default | Description |
|---|---|---|---|
| peak_scoring | String | "Hybrid" | Peak scoring strategy: "Hybrid", "RetentionTime", "SpectralAngle", or "Intensity" |
| integration | String | "Sum" | Integration method: "Sum" (area under curve) or "Apex" (maximum intensity) |
| spectral_angle | Float | 0.70 | Minimum spectral angle threshold (0-1). Higher values require better isotopic pattern match |
| ppm_tolerance | Float | 5.0 | Mass tolerance in ppm for matching MS1 ions to theoretical precursor masses |
| mobility_pct_tolerance | Float | 1.0 | Ion mobility tolerance as percentage (for timsTOF data) |
| combine_charge_states | Boolean | true | If true, intensities from different charge states are combined. If false, charge states are quantified separately |
Choosing Peak Scoring Strategy
- Hybrid (recommended): Best for most experiments. Balances RT, spectral quality, and intensity.
- RetentionTime: Use when chromatography is highly reproducible and RT is the most reliable feature.
- SpectralAngle: Use when isotopic patterns are well-resolved and you want strict quality filtering.
- Intensity: Use when you want to prioritize the most abundant signals.
Choosing Integration Strategy
- Sum (recommended): Provides more robust quantification by integrating across the entire peak. More tolerant of peak shape variations.
- Apex: Faster computation. Use when peaks are symmetric and well-resolved. May be more sensitive to noise.
Output
When LFQ is enabled, SageDDA generates:
- lfq.tsv: Tab-separated file containing peptide-level quantification results
- lfq.parquet (if the --parquet flag is used): Parquet format for large-scale analysis
Output Columns
| Column | Description |
|---|---|
| peptide | Peptide sequence with modifications |
| proteins | Associated protein accessions |
| q_value | Peptide-level FDR q-value |
| spectral_angle | Intensity-weighted average spectral angle |
| <filename>_intensity | Integrated MS1 intensity for each input file |
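For downstream analysis, the TSV output can be read with Python's standard csv module. The sketch below relies only on the column names listed in the table; the helper name, the 1% q-value cutoff, and the two example rows (including the run names and accessions) are made up for illustration and do not come from a real SageDDA run.

```python
import csv
import io

SUFFIX = "_intensity"


def load_lfq(handle, max_q=0.01):
    """Return (peptide, {run_name: intensity}) for rows with q_value <= max_q."""
    rows = []
    for row in csv.DictReader(handle, delimiter="\t"):
        if float(row["q_value"]) > max_q:
            continue
        intensities = {
            name[: -len(SUFFIX)]: float(value)
            for name, value in row.items()
            if name.endswith(SUFFIX)
        }
        rows.append((row["peptide"], intensities))
    return rows


# Invented example content in the documented column layout.
example = io.StringIO(
    "peptide\tproteins\tq_value\tspectral_angle\trun1_intensity\trun2_intensity\n"
    "PEPTIDEK\tP12345\t0.001\t0.93\t1500.0\t1720.5\n"
    "LOWCONFR\tP67890\t0.200\t0.55\t80.0\t0.0\n"
)
rows = load_lfq(example)
```

To read a real result file, pass an open file handle instead of the in-memory example, for instance open("lfq.tsv", newline="").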
Technical Details
Internal Constants
| Constant | Value | Description |
|---|---|---|
| RT_TOL | 0.005 | RT tolerance as fraction of total run length (0.5%) |
| K_WIDTH | 10 | Gaussian kernel width in bins |
| GRID_SIZE | 100 | Number of RT bins for peak tracing |
| N_ISOTOPES | 3 | Number of isotopes to trace (M+0, M+1, M+2) |
Decoy Generation for LFQ
Note: This is different from PSM-level decoy generation. For peptide/protein identification, SageDDA uses reversed peptide sequences (with the rev_ prefix) following the picked-peptide approach. The method described here applies only to FDR control at the LFQ quantification level.
For LFQ-specific FDR control, decoy XICs (extracted ion chromatograms) are generated by:
- Shifting the retention time by -2 × RT_TOL
- Adding a mass offset of +11.06 Da
These decoy XICs are scored using the same peak detection algorithm as targets. The resulting target and decoy peak scores are then used for q-value calculation using the standard target-decoy competition approach:
- All peaks (target and decoy) are sorted by score
- Q-values are calculated as: q = decoy_count / target_count
- Each LFQ peptide receives a q_value in the output
This enables FDR control at the quantification level: peptides with poor chromatographic evidence receive higher q-values. The RT shift and mass offset create "impossible" XIC locations that should not contain real peptide signals, providing a null distribution for scoring.
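The q-value steps can be sketched in a few lines. The input layout (score, is_decoy) and the function name are assumptions of this example. The final pass, which makes q-values non-decreasing as the score drops, is the usual convention in target-decoy analysis and is assumed here; the text above only specifies the decoy-to-target ratio.

```python
def q_values(peaks):
    """peaks: list of (score, is_decoy). Returns q-values in input order."""
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][0], reverse=True)
    q = [0.0] * len(peaks)
    targets = decoys = 0
    for i in order:  # walk from the best score to the worst
        if peaks[i][1]:
            decoys += 1
        else:
            targets += 1
        q[i] = decoys / max(targets, 1)  # q = decoy_count / target_count
    # Assumed convention: a peak's q-value is the lowest ratio seen at its
    # score or any worse score, so q never decreases as the score drops.
    running = float("inf")
    for i in reversed(order):
        running = min(running, q[i])
        q[i] = running
    return q


# Three target peaks and two decoy peaks with made-up scores.
peaks = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
q = q_values(peaks)
```

In this example the two best targets receive q = 0, while the target ranked below the first decoy receives q = 1/3.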
Ion Mobility Support
For timsTOF and other ion mobility data:
- Additional filtering based on mobility_pct_tolerance
- Ion mobility bounds are calculated as percentage tolerance around the observed mobility value
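The percentage window works out to simple arithmetic. The sketch below assumes a symmetric window around the observed mobility; the function names are hypothetical and the mobility values are arbitrary examples.

```python
def mobility_bounds(observed_mobility, mobility_pct_tolerance=1.0):
    """Lower and upper mobility bounds for a percentage tolerance."""
    delta = observed_mobility * mobility_pct_tolerance / 100.0
    return observed_mobility - delta, observed_mobility + delta


def within_mobility(value, observed_mobility, mobility_pct_tolerance=1.0):
    """True if value falls inside the tolerance window (bounds inclusive)."""
    lo, hi = mobility_bounds(observed_mobility, mobility_pct_tolerance)
    return lo <= value <= hi


# A 1.5% tolerance around an observed mobility of 1.0.
lo, hi = mobility_bounds(1.0, mobility_pct_tolerance=1.5)
```

With a 1.5% tolerance, an observed mobility of 1.0 accepts MS1 peaks with mobility between roughly 0.985 and 1.015.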
Best Practices
Data Requirements
- MS1 Spectra: Ensure your mzML files contain MS1 spectra. LFQ will warn if no MS1 spectra are found.
- Sufficient Identifications: LFQ works best with a reasonable number of high-confidence PSMs.
- Chromatographic Quality: Reproducible chromatography improves quantification accuracy.
Parameter Tuning
- ppm_tolerance:
  - Start with 5.0 ppm for Orbitrap/TOF data
  - Use 10-20 ppm for lower resolution instruments
- spectral_angle:
  - Default 0.70 works well for most cases
  - Increase to 0.80+ for stricter quality filtering
  - Decrease to 0.50-0.60 for complex samples or low-intensity peptides
- combine_charge_states:
  - Keep true for most experiments (recommended)
  - Set to false if you need charge-state-specific quantification
Troubleshooting
| Issue | Possible Cause | Solution |
|---|---|---|
| No LFQ output | Missing MS1 spectra | Check mzML files contain MS1 data |
| Low quantification rates | Strict spectral_angle | Lower spectral_angle threshold |
| High variability | Chromatographic drift | Ensure good RT alignment; check input data quality |
| Missing peptides | Low confidence PSMs | Verify PSM identification quality |
Example Configurations
Basic LFQ
    {
      "database": {
        "fasta": "proteins.fasta"
      },
      "quant": {
        "lfq": true
      }
    }

High-Stringency LFQ
    {
      "quant": {
        "lfq": true,
        "lfq_settings": {
          "peak_scoring": "Hybrid",
          "integration": "Sum",
          "spectral_angle": 0.85,
          "ppm_tolerance": 3.0,
          "combine_charge_states": true
        }
      }
    }

Charge-State-Specific LFQ
    {
      "quant": {
        "lfq": true,
        "lfq_settings": {
          "combine_charge_states": false,
          "spectral_angle": 0.70,
          "ppm_tolerance": 5.0
        }
      }
    }

timsTOF Ion Mobility LFQ
    {
      "quant": {
        "lfq": true,
        "lfq_settings": {
          "peak_scoring": "Hybrid",
          "integration": "Sum",
          "spectral_angle": 0.70,
          "ppm_tolerance": 5.0,
          "mobility_pct_tolerance": 1.5
        }
      }
    }

References
The LFQ algorithm in SageDDA is inspired by and builds upon concepts from:
- MaxQuant's MaxLFQ algorithm
- Spectral angle similarity scoring for isotopic pattern matching
- Correlation Optimized Warping (COW) for retention time alignment