Data-dependent vs. Data-independent Proteomic Analysis

Published: April 28, 2021

Natasha Beeton-Kempen, Ph.D.

In proteomics, one of the major aims is to compare samples of interest (such as healthy vs diseased tissue) to identify which proteins are differentially expressed and to quantify these differences. Mass spectrometry (MS) is one of the most popular methods used for such analyses.

There are currently two broad approaches to generating bottom-up or “shotgun” MS proteomic data: data-dependent acquisition (DDA) and data-independent acquisition (DIA).¹ In tandem MS (MS/MS), the DDA approach selects only certain peptide precursor ions detected during the first stage of MS for fragmentation during the second stage, whereas with the DIA approach, all peptide precursor ions detected during the first stage can be fragmented in the second.

As with data acquisition, data analysis can be performed using one of two main approaches:¹ the database search, which compares measured spectra with those in established databases, and the de novo search, in which the MS/MS spectra are first deconvolved into multiple “pseudo spectra” that are then compared to known spectra using database searches. DDA uses the first approach, whereas DIA makes use of the second or of mixed approaches.

Here, we will compare and contrast the DDA and DIA approaches in proteomic analysis, so that readers can gain a useful overview of where they are best applied and what their advantages and disadvantages are.

Data-dependent acquisition (DDA)

Characteristics

Only selected peptides are further fragmented during the second stage of tandem MS
The selected peptides are chosen from within a narrow mass-to-charge (m/z) range on the basis of their signal intensity
Typically, the precursors of highest abundance (called the “top N” precursors) are selected for further analysis
N is typically 10–15 peptides
MS/MS data acquisition occurs sequentially for each peptide
The resulting data are used to search one or more existing databases¹⁻⁵
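
The top N selection described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular instrument implements it: the `Precursor` type, the `select_top_n` function and the example m/z and intensity values are all hypothetical.

```python
from typing import NamedTuple


class Precursor(NamedTuple):
    """A precursor ion seen in the first-stage (survey) scan. Hypothetical type."""
    mz: float         # mass-to-charge ratio
    intensity: float  # signal intensity (arbitrary units)


def select_top_n(survey_scan, n=10):
    """Return the n most intense precursors, highest intensity first."""
    return sorted(survey_scan, key=lambda p: p.intensity, reverse=True)[:n]


# Made-up survey scan with five precursors.
survey_scan = [
    Precursor(400.2, 1.0e6),
    Precursor(512.7, 5.0e4),
    Precursor(633.3, 8.0e5),
    Precursor(745.9, 2.0e3),
    Precursor(820.4, 3.0e5),
]

selected = select_top_n(survey_scan, n=3)
print([p.mz for p in selected])  # [400.2, 633.3, 820.4]
```

With n set to 3, the two weakest precursors (m/z 512.7 and 745.9) are never fragmented, which is the mechanism behind the under-representation of low-abundance peptides.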

Pros

Simpler to set up and analyze
Lower demand on computational resources
Cheaper to run
Database-dependent algorithms used for DDA analysis are generally faster than de novo algorithms
DDA may be best for targeted analysis (where the target peptides are in an existing database) as it offers more sensitive quantification than DIA
Allows relative quantification of peptides between samples using various labeling approaches (e.g., SILAC or iTRAQ)¹⁻⁵

Cons

The MS instrument decides on the fly which are the top N precursors and then fragments them one after the other. This introduces a level of bias.
As a result, DDA datasets can contain “gaps” where peptides have been identified in only some of the samples. Although some tweaks have been introduced to mitigate this, it remains an issue.
Lower precision and reproducibility than DIA
Low-abundance peptides are under-represented¹⁻⁵
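
One way to see the “gaps” problem is to count how complete a peptide-by-sample table is. The sketch below assumes a simple dictionary layout in which `None` marks a peptide that was not identified in a sample; the `completeness` function, the peptide names and the intensities are hypothetical and chosen only for illustration.

```python
def completeness(matrix):
    """Summarize missing values in a peptide-by-sample table.

    matrix maps a peptide name to a list of intensities, one per sample,
    with None where the peptide was not identified in that sample.
    Returns (fraction of cells with a value, peptides seen in every sample).
    """
    total = sum(len(values) for values in matrix.values())
    observed = sum(1 for values in matrix.values() for v in values if v is not None)
    complete = [name for name, values in matrix.items()
                if all(v is not None for v in values)]
    return observed / total, complete


# Made-up intensities for three peptides across three samples.
matrix = {
    "PEPTIDEA": [1.2e5, 1.1e5, 1.3e5],  # abundant: picked in every run
    "PEPTIDEB": [4.0e4, None, 3.8e4],   # missed in one run
    "PEPTIDEC": [None, 2.0e3, None],    # low abundance: mostly missed
}

fraction, complete = completeness(matrix)
print(f"{fraction:.0%} of values present; complete peptides: {complete}")
```

Only peptides that appear in every sample can be compared across the whole cohort without some form of imputation, so a low completeness figure directly limits what can be quantified.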

Data-independent acquisition (DIA)

Characteristics

All peptides are fragmented and analyzed during the second stage of tandem MS
Tandem mass spectra are acquired either by fragmenting all ions that enter the mass spectrometer at a given time (called broadband DIA) or by sequentially focusing on a narrow m/z window of precursors and fragmenting all precursors detected within that window
MS/MS data acquisition occurs in parallel across peptides
The resulting MS² spectra are highly multiplexed¹⁻⁵
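
The windowed form of DIA can be illustrated as follows. This is a minimal sketch that assumes fixed-width, non-overlapping windows; the 400–1000 m/z range, the 25 m/z window width and the function names are assumptions made for the example, not values taken from the sources cited here.

```python
def make_windows(mz_start, mz_end, width):
    """Split [mz_start, mz_end) into consecutive isolation windows."""
    windows = []
    lower = mz_start
    while lower < mz_end:
        upper = min(lower + width, mz_end)
        windows.append((lower, upper))
        lower = upper
    return windows


def assign_to_windows(precursor_mzs, windows):
    """Group precursor m/z values by the window that contains them."""
    groups = {window: [] for window in windows}
    for mz in precursor_mzs:
        for lower, upper in windows:
            if lower <= mz < upper:
                groups[(lower, upper)].append(mz)
                break
    return groups


windows = make_windows(400.0, 1000.0, 25.0)
groups = assign_to_windows([400.2, 412.7, 633.3, 745.9, 820.4], windows)

# Every precursor in a window is fragmented together, whatever its intensity.
for window, mzs in groups.items():
    if mzs:
        print(window, mzs)
```

Here the precursors at m/z 400.2 and 412.7 fall in the same window, so their fragments end up in one shared MS² spectrum. That co-fragmentation is what makes the spectra multiplexed.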

Pros

Does not require prior knowledge of the protein composition of the sample
Less biased as all peptides are included in the analysis
Allows greater temporal resolution, which is an advantage for certain analyses (e.g., looking at changes in protein expression or post-translational modifications over time within the same tissue)
Can quantify proteins in complex mixtures over a large dynamic range, thereby overcoming the challenge of undersampling when using DDA
Offers higher precision and better reproducibility than DDA
Best approach for discovery proteomics as no assumptions are made (e.g., comparison of large sample cohorts to see differences in protein expression)
DIA data can be retrospectively analyzed with an improved algorithm to generate even better results¹⁻⁵

Cons

The amount of data generated is much larger, which can place a high demand on computational resources
Data analysis is challenging because of the multiplexed nature of the MS² spectra
The robust database-based search methods used for DDA cannot be applied directly
Further improvements are required in the tools and software used to deconvolute the complex spectra produced
De novo search algorithms used in DIA are usually iterative and may not always converge on the same answers
Fragment ions in MS² spectra cannot be directly traced back to their precursors, as they can potentially result from multiple precursor ions
Tends to be more expensive than DDA
In terms of quantification, DIA has lower sensitivity than DDA as the complete spectrum must be scanned, reducing the acquisition time per data point
De novo search algorithms are not as good at quantification as database search algorithms, which can also reduce quantification sensitivity
Algorithms need to control the false discovery rate among the identified peptides while also identifying as many of the real peptides as possible¹⁻⁵
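
The trade-off in the last point can be made concrete with a small example. The sketch below uses a target-decoy estimate, which is one commonly used way of estimating the false discovery rate (FDR) in proteomics; it is a technique brought in here for illustration and is not described in this article. The function name, the scores and the 30% threshold are all hypothetical.

```python
def score_cutoff_at_fdr(matches, max_fdr=0.01):
    """Find the most permissive score cutoff that keeps the estimated FDR low.

    matches is a list of (score, is_decoy) pairs, where a decoy is a match
    to a deliberately wrong sequence. The FDR above a cutoff is estimated
    as (number of decoys) / (number of targets) at or above that score.
    Returns None if no cutoff satisfies max_fdr.
    """
    ranked = sorted(matches, key=lambda m: m[0], reverse=True)
    targets = decoys = 0
    cutoff = None
    for score, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = score  # lower cutoff = more peptides accepted
    return cutoff


# Made-up match scores; True marks a decoy match.
matches = [
    (9.1, False), (8.7, False), (8.2, False), (7.9, False),
    (7.5, True), (7.1, False), (6.8, True), (6.0, False),
]

print(score_cutoff_at_fdr(matches, max_fdr=0.3))   # 7.1
print(score_cutoff_at_fdr(matches, max_fdr=0.01))  # 7.9
```

Tightening the threshold from 30% to 1% raises the cutoff from 7.1 to 7.9 and drops one target match, which shows the tension between controlling false discoveries and keeping as many real peptides as possible.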

Final thoughts

Some experts believe that, because of the continual improvements in algorithms and software for deconvoluting the complexity of DIA data, DDA and DIA will eventually merge into a single hybrid method. Indeed, this appears to already be happening, as a recent report still in publication discusses the development of a method called “data dependent-independent acquisition proteomics,” or “DDIA” for short. This method combines DDA and DIA in a single LC-MS/MS run and uses deep-learning tools for more streamlined data analysis.

Overall, because of its ease of setup and analysis, DDA is probably the best approach to use if you are new to tandem MS and/or discovery proteomics. On the other hand, DIA is the best approach if you are more experienced and you want an unbiased and deeper look at the proteome of your samples, particularly when these samples are from a little-studied organism (e.g., the water flea, a keystone species of aquatic habitats) or cell type (e.g., senescent cells).

References

Hu A, Noble WS, Wolf-Yadlin A. Technical advances in proteomics: new developments in data-independent acquisition. F1000Res. 2016;5(F1000 Faculty Rev):419. doi: 10.12688/f1000research.7042.1.
Kawashima Y, Watanabe E, Umeyama T, et al. Optimization of data-independent acquisition mass spectrometry for deep and highly sensitive proteomic analysis. Int J Mol Sci. 2019;20(23):E5932. doi: 10.3390/ijms20235932.
Bruderer R, Bernhardt OM, Gandhi T, et al. Optimization of experimental parameters in data-independent mass spectrometry significantly increases depth and reproducibility of results. Mol Cell Proteomics. 2017;16(12):2296–2309. doi: 10.1074/mcp.RA117.000314.