15. Algorithm Reference
15.2 Label-Free Quantification (LFQ)

Label-Free Quantification (LFQ) in SageDDA

This document describes the Label-Free Quantification (LFQ) module in SageDDA: its algorithms, configuration options, and usage.

Important: SageDDA (formerly known as SagePro) is developed by Chaparral Labs and is distinct from the open-source Sage project. Features described here are proprietary to SageDDA.

Overview

Label-Free Quantification (LFQ) is a mass spectrometry-based approach that quantifies proteins without the need for isotopic or chemical labeling. SageDDA's LFQ module extracts and integrates MS1 precursor ion intensities across multiple LC-MS runs, enabling relative protein quantification.

The LFQ workflow in SageDDA consists of:

  1. Feature Map Construction - Building a searchable index of peptide precursors
  2. MS1 Feature Tracing - Extracting ion chromatograms (XICs) for each peptide
  3. Retention Time Alignment - Aligning chromatographic profiles across runs
  4. Peak Detection & Scoring - Identifying optimal peak boundaries
  5. Integration - Calculating peptide/protein abundances

Algorithm Details

Feature Map Construction

The feature map is built from high-confidence peptide spectrum matches (PSMs) with:

  • Peptide-level FDR ≤ 1% (peptide_q <= 0.01)
  • Target sequences only (label == 1)

For each identified peptide, precursor ranges are generated for:

  • All charge states within the specified range (precursor_charge)
  • Multiple isotopes (M+0, M+1, M+2) - controlled by N_ISOTOPES = 3
  • Both target and decoy entries (decoys are shifted in RT and mass) for FDR estimation

The feature map is organized in retention time bins (16K entries per bin) with secondary indexing by mass for efficient lookup.
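The layout above can be sketched in Python as follows. This is a minimal sketch under several assumptions: "16K" is read as 16,384 entries, each entry stores one theoretical mass that is matched with a symmetric ppm window (not a precomputed mass range), and RTs are normalized to 0-1. The names PrecursorEntry, build_feature_map and lookup are hypothetical and are not part of SageDDA's API.

```python
import bisect
from dataclasses import dataclass

BIN_SIZE = 16 * 1024  # assumption: "16K entries per bin" read as 16,384


@dataclass
class PrecursorEntry:
    rt: float      # expected retention time, normalized to 0-1 (assumption)
    mass: float    # theoretical mass for one charge state / isotope
    peptide: int   # index of the identified peptide


def build_feature_map(entries, bin_size=BIN_SIZE):
    """Sort entries by RT, cut them into fixed-size bins, then sort each bin by mass."""
    entries = sorted(entries, key=lambda e: e.rt)
    bins = []
    for start in range(0, len(entries), bin_size):
        chunk = sorted(entries[start:start + bin_size], key=lambda e: e.mass)
        rts = [e.rt for e in chunk]
        # Store the bin's RT span plus a parallel list of masses for bisect.
        bins.append((min(rts), max(rts), chunk, [e.mass for e in chunk]))
    return bins


def lookup(bins, rt, rt_tol, mass, ppm):
    """Entries within +/- ppm of `mass` whose expected RT lies within rt +/- rt_tol."""
    lo_mass = mass * (1 - ppm * 1e-6)
    hi_mass = mass * (1 + ppm * 1e-6)
    hits = []
    for bin_lo, bin_hi, chunk, masses in bins:
        if bin_hi < rt - rt_tol or bin_lo > rt + rt_tol:
            continue  # the whole bin lies outside the RT window
        # Binary search for the mass window inside this bin.
        i = bisect.bisect_left(masses, lo_mass)
        j = bisect.bisect_right(masses, hi_mass)
        hits.extend(e for e in chunk[i:j] if abs(e.rt - rt) <= rt_tol)
    return hits
```

Sorting by RT first means whole bins can be skipped with a cheap range check, and the per-bin mass sort is what makes the binary search possible.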

Isotopic Envelope Modeling

SageDDA calculates the theoretical isotopic distribution for each peptide based on its elemental composition:

Composition = Σ composition(residue) for each amino acid

The isotopic envelope is used to:

  1. Search for M+0, M+1, M+2 isotopes in MS1 spectra
  2. Calculate spectral angle similarity between observed and theoretical distributions
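To make the composition sum and the envelope concrete, here is a sketch that counts carbon and sulfur atoms per residue and derives approximate M+0, M+1 and M+2 abundances. It is an approximation, not SageDDA's implementation: only 13C, 33S and 34S are modeled (hydrogen, nitrogen and oxygen isotopes are ignored), the abundances are approximate natural values, only unmodified residues are listed, and RESIDUE_CS, peptide_cs and isotope_envelope are hypothetical names. The terminal water contains no carbon or sulfur, so leaving it out does not change this approximation.

```python
from math import comb

# Carbon and sulfur atoms per unmodified residue (assumption: C and S only).
RESIDUE_CS = {
    "G": (2, 0), "A": (3, 0), "S": (3, 0), "P": (5, 0), "V": (5, 0),
    "T": (4, 0), "C": (3, 1), "L": (6, 0), "I": (6, 0), "N": (4, 0),
    "D": (4, 0), "Q": (5, 0), "K": (6, 0), "E": (5, 0), "M": (5, 1),
    "H": (6, 0), "F": (9, 0), "R": (6, 0), "Y": (9, 0), "W": (11, 0),
}

# Approximate natural isotope abundances (assumption: other elements ignored).
P_C13 = 0.0107
P_S33, P_S34 = 0.0075, 0.0425


def peptide_cs(sequence):
    """Composition = sum of residue compositions (terminal water adds no C or S)."""
    carbon = sum(RESIDUE_CS[aa][0] for aa in sequence)
    sulfur = sum(RESIDUE_CS[aa][1] for aa in sequence)
    return carbon, sulfur


def isotope_envelope(carbon, sulfur, n_isotopes=3):
    """Approximate relative abundances of M+0 .. M+(n_isotopes-1), summing to 1."""
    # Carbon: binomial probability of k heavy (13C) atoms, each adding ~1 Da.
    carbon_dist = [comb(carbon, k) * P_C13**k * (1 - P_C13) ** (carbon - k)
                   for k in range(n_isotopes)]
    # Sulfur: convolve a per-atom shift distribution (+0, +1 via 33S, +2 via 34S).
    per_atom = [1 - P_S33 - P_S34, P_S33, P_S34]
    sulfur_dist = [1.0] + [0.0] * (n_isotopes - 1)
    for _ in range(sulfur):
        nxt = [0.0] * n_isotopes
        for i, a in enumerate(sulfur_dist):
            for j, b in enumerate(per_atom):
                if i + j < n_isotopes:
                    nxt[i + j] += a * b
        sulfur_dist = nxt
    # Carbon and sulfur shifts are independent, so their distributions convolve.
    env = [sum(carbon_dist[j] * sulfur_dist[i - j] for j in range(i + 1))
           for i in range(n_isotopes)]
    total = sum(env)
    return [x / total for x in env]
```

For example, peptide_cs("PEPTIDE") gives 34 carbons and no sulfur, and isotope_envelope(*peptide_cs("PEPTIDE")) puts most of the weight on M+0, as expected for a peptide of that size.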

MS1 Feature Tracing

For each MS1 spectrum:

  1. Retention Time Normalization: The spectrum RT is normalized and aligned using pre-calculated alignment factors
  2. Mass Lookup: Binary search finds peptide precursors within the mass tolerance window
  3. Ion Mobility Filtering (if applicable): Additional filtering based on ion mobility tolerance
  4. Grid Population: Matched intensities are added to a discretized RT × file × isotope grid

The grid uses:

  • GRID_SIZE = 100 equally-spaced RT bins
  • RT_TOL = 0.005 (0.5% of total run length) as the search window
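The grid bookkeeping can be sketched as follows, assuming the 100-bin grid spans expected_rt ± RT_TOL on the normalized RT axis and that intensities landing in the same cell are summed. The helpers new_grid, rt_bin and add_intensity are hypothetical, not SageDDA functions.

```python
GRID_SIZE = 100
RT_TOL = 0.005
N_ISOTOPES = 3


def new_grid(n_files):
    """grid[file][isotope][bin] holding summed MS1 intensities."""
    return [[[0.0] * GRID_SIZE for _ in range(N_ISOTOPES)] for _ in range(n_files)]


def rt_bin(spectrum_rt, expected_rt):
    """Map a normalized spectrum RT onto the grid spanning expected_rt +/- RT_TOL.

    Returns None when the spectrum falls outside that window.
    """
    frac = (spectrum_rt - (expected_rt - RT_TOL)) / (2 * RT_TOL)
    if not 0.0 <= frac < 1.0:
        return None
    # Guard against floating-point rounding right at the upper edge.
    return min(int(frac * GRID_SIZE), GRID_SIZE - 1)


def add_intensity(grid, file_idx, isotope, spectrum_rt, expected_rt, intensity):
    """Accumulate one matched MS1 peak into its RT x file x isotope cell."""
    b = rt_bin(spectrum_rt, expected_rt)
    if b is not None:
        grid[file_idx][isotope][b] += intensity
```

Under this assumption each bin covers 2 × RT_TOL / GRID_SIZE = 0.0001 of the run length, so a spectrum exactly at the expected RT lands in bin 50.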

Gaussian Smoothing

Raw intensity traces are smoothed using a Gaussian kernel:

  • Kernel width: K_WIDTH = 10 bins
  • Standard deviation: σ = 0.5

This reduces noise and improves peak detection accuracy.
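A minimal sketch of the smoothing step is shown below. It assumes σ is measured in grid bins and that the kernel is a symmetric window of K_WIDTH // 2 bins on each side of the center (11 taps); how SageDDA centers its 10-bin kernel is not stated here. Under that assumption almost all of the weight falls on the center bin and its immediate neighbors. The names gaussian_kernel and smooth are hypothetical.

```python
import math

K_WIDTH = 10
SIGMA = 0.5  # assumption: measured in grid bins


def gaussian_kernel(width=K_WIDTH, sigma=SIGMA):
    """Symmetric Gaussian weights for offsets -width//2 .. +width//2, summing to 1."""
    half = width // 2
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-half, half + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def smooth(trace, kernel=None):
    """Convolve a 1-D intensity trace with the kernel; bins past either end count as zero."""
    if kernel is None:
        kernel = gaussian_kernel()
    half = len(kernel) // 2
    out = []
    for i in range(len(trace)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + k - half
            if 0 <= j < len(trace):
                acc += w * trace[j]
        out.append(acc)
    return out
```

Normalizing the kernel to sum to 1 keeps total intensity unchanged for peaks away from the trace edges, so smoothing does not bias the integrated abundance.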

Spectral Angle Calculation

The normalized spectral angle measures similarity between observed and theoretical isotopic distributions:

spectral_angle = 1 - (2 × arccos(similarity)) / π

Where similarity is the cosine similarity (dot product normalized by magnitudes) between:

  • Observed isotope intensities
  • Theoretical isotopic envelope

Values range from 0 to 1, with 1 indicating perfect agreement.
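The formula translates directly into code. This sketch assumes both vectors hold non-negative M+0, M+1, M+2 intensities in the same order, and it treats an all-zero vector as no agreement (an assumption); spectral_angle is a hypothetical helper name.

```python
import math


def spectral_angle(observed, theoretical):
    """Normalized spectral angle (0-1) between two non-negative intensity vectors."""
    dot = sum(o * t for o, t in zip(observed, theoretical))
    norm = (math.sqrt(sum(o * o for o in observed))
            * math.sqrt(sum(t * t for t in theoretical)))
    if norm == 0.0:
        return 0.0  # assumption: an all-zero vector counts as no agreement
    # Cosine similarity, clamped so rounding error cannot push it outside [-1, 1].
    similarity = max(-1.0, min(1.0, dot / norm))
    return 1.0 - 2.0 * math.acos(similarity) / math.pi
```

Because only the direction of the vectors matters, [2, 1, 0.5] and [4, 2, 1] score (essentially) 1, while orthogonal patterns such as [1, 0, 0] and [0, 1, 0] score 0.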

Retention Time Alignment (Warping)

To compensate for chromatographic drift between runs:

  1. Reference Selection: The LC-MS run with the most confident PSM for each peptide serves as the reference
  2. Correlation Optimization: For each run, find the time shift (within ±75 bins) that maximizes dot product correlation with the reference
  3. Warp Application: Apply the calculated shifts to align all runs
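The shift search in step 2 can be sketched as follows, assuming each run is a single 1-D intensity trace on a shared bin grid, bins shifted past either end contribute nothing, and one global shift is applied per run. Reference selection (step 1) is not shown, and shifted_dot, best_shift and apply_shift are hypothetical names.

```python
MAX_SHIFT = 75  # the ±75-bin search range described above


def shifted_dot(reference, trace, shift):
    """Dot product of reference[i] with trace[i + shift], ignoring out-of-range bins."""
    total = 0.0
    for i, r in enumerate(reference):
        j = i + shift
        if 0 <= j < len(trace):
            total += r * trace[j]
    return total


def best_shift(reference, trace, max_shift=MAX_SHIFT):
    """Shift (in bins) maximizing the dot product; ties go to the smallest |shift|."""
    candidates = sorted(range(-max_shift, max_shift + 1), key=abs)
    return max(candidates, key=lambda s: shifted_dot(reference, trace, s))


def apply_shift(trace, shift):
    """Copy of trace where bin i takes the value at i + shift (missing bins become 0)."""
    n = len(trace)
    return [trace[i + shift] if 0 <= i + shift < n else 0.0 for i in range(n)]
```

For a run whose peak sits 10 bins later than in the reference, best_shift returns 10, and apply_shift with that value moves the peak back onto the reference position.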

Peak Scoring Strategies

SageDDA supports four peak scoring strategies:

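Strategy | Description | Formula
--- | --- | ---
RetentionTime | Favors peaks near the expected RT | (1 - abs(rt - center) / center)^0.33
SpectralAngle | Uses the spectral angle directly | spectral_angle
Intensity | Favors intense peaks | (intensity / max_intensity)^0.5
Hybrid (default) | Combines all three factors | spectral_angle³ × rt_score^0.33 × (intensity / max_intensity)^0.5

Here rt is the candidate peak position, center is the expected RT, and rt_score = 1 - abs(rt - center) / center, the same term used by the RetentionTime strategy.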
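The formulas in the table can be turned into a small scoring function. This sketch assumes per-bin spectral angles and intensities for one peptide trace, with bin indices standing in for rt, and it clamps the RT term at zero so fractional powers stay real for bins farther than center from the expected RT (the clamp is an assumption). The names peak_score and best_bin are hypothetical.

```python
def peak_score(strategy, sa, rt, center, intensity, max_intensity):
    """Score one candidate bin with the named peak_scoring strategy."""
    rt_term = max(0.0, 1.0 - abs(rt - center) / center)
    int_term = intensity / max_intensity if max_intensity > 0 else 0.0
    if strategy == "RetentionTime":
        return rt_term ** 0.33
    if strategy == "SpectralAngle":
        return sa
    if strategy == "Intensity":
        return int_term ** 0.5
    if strategy == "Hybrid":
        return sa ** 3 * rt_term ** 0.33 * int_term ** 0.5
    raise ValueError(f"unknown peak_scoring strategy: {strategy}")


def best_bin(strategy, angles, intensities, center):
    """Index of the highest-scoring bin in a single trace."""
    max_int = max(intensities)
    scores = [peak_score(strategy, sa, rt, center, x, max_int)
              for rt, (sa, x) in enumerate(zip(angles, intensities))]
    return max(range(len(scores)), key=scores.__getitem__)
```

The cube on the spectral angle makes Hybrid penalize poor isotope patterns steeply, while the 0.33 and 0.5 exponents soften the RT and intensity terms. For example, a smaller peak at the expected RT can outscore a larger peak near the edge of the window under Hybrid, whereas Intensity always picks the larger one.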

Peak Integration

Once the optimal peak is identified:

  1. Boundary Detection: Expand outward from the apex in both directions until the score drops below 50% of the apex score or the spectral angle falls below the spectral_angle threshold
  2. Integration: Compute the peak abundance with the selected integration method:
    • Sum: Sum intensities within peak boundaries
    • Apex: Use only the apex intensity
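The two steps above can be sketched as follows, assuming per-bin scores and spectral angles on the RT grid, that a neighboring bin is added only while both conditions still hold, and that boundaries are inclusive. The names peak_bounds and integrate are hypothetical. On an evenly spaced grid, summing bin intensities is proportional to the area under the curve.

```python
def peak_bounds(scores, angles, apex, min_angle, frac=0.5):
    """Inclusive (left, right) bins reached by expanding outward from the apex."""
    cutoff = frac * scores[apex]
    left = apex
    while left > 0 and scores[left - 1] >= cutoff and angles[left - 1] >= min_angle:
        left -= 1
    right = apex
    while (right < len(scores) - 1 and scores[right + 1] >= cutoff
           and angles[right + 1] >= min_angle):
        right += 1
    return left, right


def integrate(intensities, left, right, apex, method="Sum"):
    """Peak abundance: summed intensity within the bounds, or the apex intensity."""
    if method == "Sum":
        return sum(intensities[left:right + 1])
    if method == "Apex":
        return intensities[apex]
    raise ValueError(f"unknown integration method: {method}")
```

With scores [0.1, 0.4, 0.6, 1.0, 0.7, 0.45, 0.2] and the apex at bin 3, the 50% cutoff stops the expansion at bins 2 and 4; a spectral angle below the threshold at bin 4 would stop it at bin 3 instead.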

Configuration

Enabling LFQ

Add the following to your parameter file:

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.70,
      "ppm_tolerance": 5.0,
      "combine_charge_states": true
    }
  }
}

LFQ Settings Reference

Parameter | Type | Default | Description
--- | --- | --- | ---
peak_scoring | String | "Hybrid" | Peak scoring strategy: "Hybrid", "RetentionTime", "SpectralAngle", or "Intensity"
integration | String | "Sum" | Integration method: "Sum" (area under curve) or "Apex" (maximum intensity)
spectral_angle | Float | 0.70 | Minimum spectral angle threshold (0-1). Higher values require a better isotopic pattern match
ppm_tolerance | Float | 5.0 | Mass tolerance in ppm for matching MS1 ions to theoretical precursor masses
mobility_pct_tolerance | Float | 1.0 | Ion mobility tolerance as a percentage (for timsTOF data)
combine_charge_states | Boolean | true | If true, intensities from different charge states are combined; if false, charge states are quantified separately
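As a sanity check on ppm_tolerance, the absolute window scales with mass: at 1,000 Da, 5 ppm corresponds to ±0.005 Da, and at 2,000 Da, 10 ppm corresponds to ±0.02 Da. The helper below assumes a symmetric window; ppm_window is a hypothetical name.

```python
def ppm_window(mass, ppm):
    """Absolute (low, high) bounds for a symmetric +/- ppm tolerance around mass."""
    delta = mass * ppm / 1e6
    return mass - delta, mass + delta
```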

Choosing Peak Scoring Strategy

  • Hybrid (recommended): Best for most experiments. Balances RT, spectral quality, and intensity.
  • RetentionTime: Use when chromatography is highly reproducible and RT is the most reliable feature.
  • SpectralAngle: Use when isotopic patterns are well-resolved and you want strict quality filtering.
  • Intensity: Use when you want to prioritize the most abundant signals.

Choosing Integration Strategy

  • Sum (recommended): Provides more robust quantification by integrating across the entire peak. More tolerant of peak shape variations.
  • Apex: Faster computation. Use when peaks are symmetric and well-resolved. May be more sensitive to noise.

Output

When LFQ is enabled, SageDDA generates:

  • lfq.tsv: Tab-separated file containing peptide-level quantification results
  • lfq.parquet (if --parquet flag is used): Parquet format for large-scale analysis

Output Columns

Column | Description
--- | ---
peptide | Peptide sequence with modifications
proteins | Associated protein accessions
q_value | Peptide-level FDR q-value
spectral_angle | Intensity-weighted average spectral angle
<filename>_intensity | Integrated MS1 intensity for each input file
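A minimal sketch for pulling the per-file intensities out of lfq.tsv with Python's standard csv module follows. It relies only on the column names listed above; treating a blank intensity cell as NaN is an assumption, and read_lfq is a hypothetical helper.

```python
import csv


def read_lfq(lines):
    """Yield (peptide, {intensity_column: value}) for each row of lfq.tsv.

    `lines` is any iterable of text lines, such as an open file handle.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    intensity_cols = [c for c in reader.fieldnames if c.endswith("_intensity")]
    for row in reader:
        # Assumption: a blank cell means "not quantified" and becomes NaN.
        yield row["peptide"], {c: float(row[c] or "nan") for c in intensity_cols}


# Usage (the path is illustrative):
# with open("lfq.tsv", newline="") as fh:
#     for peptide, intensities in read_lfq(fh):
#         print(peptide, intensities)
```

Selecting columns by the _intensity suffix keeps the reader independent of how many input files were searched.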

Technical Details

Internal Constants

Constant | Value | Description
--- | --- | ---
RT_TOL | 0.005 | RT tolerance as a fraction of total run length (0.5%)
K_WIDTH | 10 | Gaussian kernel width in bins
GRID_SIZE | 100 | Number of RT bins for peak tracing
N_ISOTOPES | 3 | Number of isotopes to trace (M+0, M+1, M+2)

Decoy Generation for LFQ

Note: This is different from PSM-level decoy generation. For peptide/protein identification, SageDDA uses reversed peptide sequences (with the rev_ prefix) following the picked-peptide approach. The method described here is specifically for LFQ quantification FDR control.

For LFQ-specific FDR control, decoy XICs (extracted ion chromatograms) are generated by:

  • Shifting the retention time by -2 × RT_TOL
  • Adding a mass offset of +11.06 Da

These decoy XICs are scored using the same peak detection algorithm as targets. The resulting target and decoy peak scores are then used for q-value calculation using the standard target-decoy competition approach:

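  1. All peaks (target and decoy) are sorted by score, best first
  2. Walking down the sorted list, the FDR at each score threshold is estimated as decoy_count / target_count, counting only peaks that score at or above that threshold; each peak's q-value is the smallest FDR at which it would still be accepted
  3. Each LFQ peptide receives a q_value in the output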

This enables FDR control at the quantification level - peptides with poor chromatographic evidence will have higher q-values. The RT shift and mass offset create "impossible" XIC locations that should not contain real peptide signals, providing a null distribution for scoring.
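The decoy placement and the q-value steps can be sketched as below. The q-value part follows the standard target-decoy procedure (decoys / targets at each score threshold, then a running minimum so q-values never decrease as scores fall); whether SageDDA breaks score ties or applies the running minimum in exactly this way is an assumption, and decoy_location and q_values are hypothetical names.

```python
RT_TOL = 0.005
DECOY_MASS_OFFSET = 11.06  # Da


def decoy_location(rt, mass):
    """Decoy XIC position: RT shifted by -2 * RT_TOL, mass offset by +11.06 Da."""
    return rt - 2 * RT_TOL, mass + DECOY_MASS_OFFSET


def q_values(peaks):
    """Target-decoy q-values for a list of (score, is_decoy) peaks, in input order."""
    order = sorted(range(len(peaks)), key=lambda i: peaks[i][0], reverse=True)
    decoys = targets = 0
    fdr = [0.0] * len(peaks)
    for i in order:  # best score first
        if peaks[i][1]:
            decoys += 1
        else:
            targets += 1
        fdr[i] = decoys / max(targets, 1)
    # Running minimum from the worst score upward keeps q-values monotone.
    q = [0.0] * len(peaks)
    best = float("inf")
    for i in reversed(order):
        best = min(best, fdr[i])
        q[i] = best
    return q
```

In the list [(10, target), (9, target), (8, decoy), (7, target), (6, decoy)], the target scoring 7 gets q = 1/3: three targets and one decoy score at or above it, so accepting it means accepting a 1/3 estimated FDR.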

Ion Mobility Support

For timsTOF and other ion mobility data:

  • Additional filtering based on mobility_pct_tolerance
  • Ion mobility bounds are calculated as percentage tolerance around the observed mobility value
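A minimal sketch of the mobility check follows, assuming a symmetric window of ± mobility_pct_tolerance percent around the reference mobility value; mobility_bounds and within_mobility are hypothetical names.

```python
def mobility_bounds(mobility, pct_tolerance=1.0):
    """(low, high) bounds at +/- pct_tolerance percent of the reference mobility."""
    delta = mobility * pct_tolerance / 100.0
    return mobility - delta, mobility + delta


def within_mobility(observed, reference, pct_tolerance=1.0):
    """True if an MS1 peak's mobility falls inside the reference window."""
    lo, hi = mobility_bounds(reference, pct_tolerance)
    return lo <= observed <= hi
```

With the timsTOF example setting of 1.5, a reference mobility of 1.0 accepts peaks between 0.985 and 1.015.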

Best Practices

Data Requirements

  1. MS1 Spectra: Ensure your mzML files contain MS1 spectra. LFQ will warn if no MS1 spectra are found.
  2. Sufficient Identifications: Only peptides identified at ≤1% peptide-level FDR enter the feature map, so LFQ coverage grows with the number of high-confidence PSMs.
  3. Chromatographic Quality: Reproducible chromatography improves quantification accuracy.

Parameter Tuning

  1. ppm_tolerance:

    • Start with 5.0 ppm for Orbitrap/TOF data
    • Use 10-20 ppm for lower resolution instruments
  2. spectral_angle:

    • Default 0.70 works well for most cases
    • Increase to 0.80+ for stricter quality filtering
    • Decrease to 0.50-0.60 for complex samples or low-intensity peptides
  3. combine_charge_states:

    • Keep true for most experiments (recommended)
    • Set to false if you need charge-state-specific quantification

Troubleshooting

Issue | Possible Cause | Solution
--- | --- | ---
No LFQ output | Missing MS1 spectra | Check that the mzML files contain MS1 data
Low quantification rates | Strict spectral_angle | Lower the spectral_angle threshold
High variability | Chromatographic drift | Ensure good RT alignment; check input data quality
Missing peptides | Low-confidence PSMs | Verify PSM identification quality

Example Configurations

Basic LFQ

{
  "database": {
    "fasta": "proteins.fasta"
  },
  "quant": {
    "lfq": true
  }
}

High-Stringency LFQ

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.85,
      "ppm_tolerance": 3.0,
      "combine_charge_states": true
    }
  }
}

Charge-State-Specific LFQ

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "combine_charge_states": false,
      "spectral_angle": 0.70,
      "ppm_tolerance": 5.0
    }
  }
}

timsTOF Ion Mobility LFQ

{
  "quant": {
    "lfq": true,
    "lfq_settings": {
      "peak_scoring": "Hybrid",
      "integration": "Sum",
      "spectral_angle": 0.70,
      "ppm_tolerance": 5.0,
      "mobility_pct_tolerance": 1.5
    }
  }
}

References

The LFQ algorithm in SageDDA is inspired by and builds upon concepts from:

  • MaxQuant's MaxLFQ algorithm
  • Spectral angle similarity scoring for isotopic pattern matching
  • Correlation Optimized Warping (COW) for retention time alignment