Create Tidy Long-Format Peak Table with Optional Annotations

Generates a comprehensive long-format peak table where each row represents one feature for one sample. Optionally includes CAMERA annotations for isotopes, adducts, and pseudospectrum groups if provided.

Usage

tidy_peaklist(x, xsAnnotate = NULL)

Arguments

x: An xcms::XCMSnExp or xcms::XcmsExperiment object containing peak detection results. Feature grouping (via xcms::groupChromPeaks()) is optional but recommended.
xsAnnotate: Optional. An xsAnnotate object from the CAMERA package containing peak annotations (isotopes, adducts, pseudospectrum groups). If NULL (default), annotation columns will not be included in the output.

Value

A tibble::tibble in long format with one row per feature per sample (or one row per peak per sample if features are not defined). The tibble contains the following columns with their data types:

feature_id: integer: Feature identifier (1, 2, 3, ...). Only present when features are defined via xcms::groupChromPeaks().
f_mzmed, f_mzmin, f_mzmax: numeric: Feature-level m/z statistics (median, minimum, maximum). Only present when features are defined.
f_rtmed, f_rtmin, f_rtmax: numeric: Feature-level retention time statistics in seconds (median, minimum, maximum). Only present when features are defined.
ms_level: integer: MS level (typically 1 for MS1 features). Only present when features are defined.
isotopes: character: Isotope annotations from CAMERA (e.g., ""[M]+"", ""[M+1]+""). Only present when xsAnnotate is provided.
adduct: character: Adduct annotations from CAMERA (e.g., ""[M+H]+"", ""[M+Na]+""). Only present when xsAnnotate is provided.
pcgroup: integer: Pseudospectrum group ID from CAMERA. Features with the same pcgroup are believed to originate from the same compound. Only present when xsAnnotate is provided.
feature_group: character: Feature group identifier from MsFeatures::groupFeatures() (e.g., "FG.001", "FG.002"). Only present if groupFeatures was applied.
mz, mzmin, mzmax: numeric: Peak m/z values (centroid, minimum, maximum).
rt, rtmin, rtmax: numeric: Peak retention time in seconds (apex, minimum, maximum).
into: numeric: Integrated peak intensity (integrated area of original (raw) peak).
intb: numeric: Baseline-corrected integrated peak intensity.
maxo: numeric: Maximum peak intensity (apex).
sn: numeric: Signal-to-noise ratio.
is_filled: logical: Whether the peak was gap-filled (TRUE) or originally detected (FALSE).
filepath: character: Full file path to the raw data file.
filename: character: Basename of the raw data file.
fromFile: numeric: Sample index (1-based) corresponding to the order in xcms::fileNames,MsExperiment-method.
peakidx: numeric: Internal peak index from xcms::chromPeaks().
Additional columns: Variable types: Any columns from Biobase::pData() or MsExperiment::sampleData() are included with their original data types.

Details

This function integrates data from multiple sources:

Chromatographic peaks from xcms::chromPeaks()
Feature definitions from xcms::featureDefinitions() (if available)
Feature intensities from xcms::featureValues() (if features defined)
CAMERA annotations from the xsAnnotate object (if provided)
Sample metadata from Biobase::pData() or MsExperiment::sampleData()

The function supports both xcms::XCMSnExp and xcms::XcmsExperiment objects, providing flexibility for modern XCMS workflows.

Feature grouping is optional: If xcms::groupChromPeaks() has not been run, the function will return peak-level data only, without feature-level aggregation.

Intensity matching: When multiple peaks from the same sample are assigned to a single feature, XCMS has a parameter (method = "maxint") that selects which peak intensity to use. This function filters peaks to match the intensity values that XCMS selected, ensuring consistency with XCMS's feature intensity matrix.

Missing values (NA) indicate that a feature was not detected in a particular sample.

Note

XCMS sometimes creates duplicate peaks; this function keeps only the first occurrence.
The function uses "maxint" method for feature values, meaning it takes the maximum intensity peak for each feature in each sample.
CAMERA annotations are optional. If xsAnnotate is NULL, the isotopes, adduct, and pcgroup columns will not be present in the output.
MsFeatures grouping is automatically detected. If MsFeatures::groupFeatures() was applied to the object, the feature_group column will be included in the output.
For XcmsExperiment objects, sample metadata is accessed via sampleData() instead of pData().
For large datasets (many features x many samples), the output can be memory-intensive. A warning is issued if the expected output exceeds 10 million rows.

Examples

# \donttest{
library(xcms)
#> Loading required package: BiocParallel
#> 
#> This is xcms version 4.10.0 
library(BiocParallel)

# Load example data
faahko_sub <- loadXcmsData("faahko_sub")

# Peak detection
cwp <- CentWaveParam(peakwidth = c(20, 80), noise = 5000)
xdata <- findChromPeaks(faahko_sub, param = cwp, BPPARAM = SerialParam())
#> Removed previously identified chromatographic peaks.
#> Detecting mass traces at 25 ppm ... 
#> OK
#> Detecting chromatographic peaks in 2075 regions of interest ...
#>  OK: 561 found.
#> Detecting mass traces at 25 ppm ... 
#> OK
#> Detecting chromatographic peaks in 2308 regions of interest ...
#>  OK: 741 found.
#> Detecting mass traces at 25 ppm ... 
#> OK
#> Detecting chromatographic peaks in 2081 regions of interest ...
#>  OK: 448 found.

# Peak grouping
pdp <- PeakDensityParam(sampleGroups = rep(1, length(fileNames(xdata))),
                        minFraction = 0.5)
xdata <- groupChromPeaks(xdata, param = pdp)

# Example 1: Without CAMERA annotations
peak_table_no_camera <- tidy_peaklist(xdata)
head(peak_table_no_camera)
#> # A tibble: 6 × 24
#>   sampleNames fromFile feature_id f_mzmed f_mzmin f_mzmax f_rtmed f_rtmin
#>   <chr>          <dbl>      <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 ko15.CDF           1          1    205     205     205    2788.   2785.
#> 2 ko15.CDF           1          2    206     206     206    2785.   2783.
#> 3 ko15.CDF           1          3    207.    207.    207.   2713.   2711.
#> 4 ko15.CDF           1          4    233     233     233    3025.   3024.
#> 5 ko15.CDF           1          5    233.    233.    233.   3016.   3015.
#> 6 ko15.CDF           1          6    240.    240.    240.   3677.   3675.
#> # ℹ 16 more variables: f_rtmax <dbl>, ms_level <int>, filepath <chr>,
#> #   filename <chr>, peakidx <dbl>, mz <dbl>, mzmin <dbl>, mzmax <dbl>,
#> #   rt <dbl>, rtmin <dbl>, rtmax <dbl>, into <dbl>, intb <dbl>, maxo <dbl>,
#> #   sn <dbl>, is_filled <lgl>

# Example 2: With CAMERA annotations
library(CAMERA)
library(commonMZ)

# Convert to xcmsSet for CAMERA (CAMERA requires the old xcmsSet class)
xset <- as(xdata, "xcmsSet")
#> Note: you might want to set/adjust the 'sampclass' of the returned xcmSet object before proceeding with the analysis.

# CAMERA annotation
xs <- xsAnnotate(xset, polarity = "positive")
xs <- groupFWHM(xs, perfwhm = 0.1, intval = "into", sigma = 6)
#> Start grouping after retention time.
#> Created 227 pseudospectra.
xs <- findIsotopes(xs, ppm = 10, mzabs = 0.01, intval = "into")
#> Generating peak matrix!
#> Run isotope peak annotation
#>  % finished: 10  20  30  40  50  60  70  80  90  100  
#> Found isotopes: 23 
xs <- groupCorr(xs, calcIso = FALSE, calcCiS = TRUE, calcCaS = FALSE,
                cor_eic_th = 0.7, pval = 1E-6)
#> Start grouping after correlation.
#> Generating EIC's .. 
#> Warning: Found NA peaks in selected sample.
#> 
#> Calculating peak correlations in 227 Groups... 
#>  % finished: 10  20  30  40  50  60  70  80  90  100  
#> 
#> Calculating graph cross linking in 227 Groups... 
#>  % finished: 10  20  30  40  50  60  70  80  90  100  
#> New number of ps-groups:  280 
#> xsAnnotate has now 280 groups, instead of 227 

# Get adduct/fragment rules from commonMZ
rules_pos <- commonMZ::MZ_CAMERA(mode = "pos", warn_clash = TRUE, clash_ppm = 5)
#> Warning: The following adducts/fragments seem to collide. 
#> # A tibble: 2 × 2
#>   first       second                             
#>   <chr>       <chr>                              
#> 1 [M+H-NH3]+  [M+NH4]+                           
#> 2 [M+H-C3H4]+ [M+H+(CH3)2CO-H2O]+ (acetone cond.)
#> 
#> 
#> Consider removing one of them. Example: 
#>  rules=rules[            !grepl("[M+NH4]+",rules[,"name"],fixed=TRUE)         ,]
rules_pos <- as.data.frame(rules_pos)

# Find adducts using the rules
xs <- findAdducts(xs, ppm = 10, mzabs = 0.01, multiplier = 4,
                  polarity = "positive", rules = rules_pos)
#> Generating peak matrix for peak annotation!
#> Polarity is set in xsAnnotate: positive 
#> Found and use user-defined ruleset!
#> Calculating possible adducts in 280 Groups... 
#>  % finished: 10  20  30  40  50  60  70  80  90  100  

# Create long-format peak table
peak_table <- tidy_peaklist(xdata, xs)
head(peak_table)
#> # A tibble: 6 × 27
#>   sampleNames fromFile feature_id f_mzmed f_mzmin f_mzmax f_rtmed f_rtmin
#>   <chr>          <dbl>      <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 ko15.CDF           1          1    205     205     205    2788.   2785.
#> 2 ko15.CDF           1          2    206     206     206    2785.   2783.
#> 3 ko15.CDF           1          3    207.    207.    207.   2713.   2711.
#> 4 ko15.CDF           1          4    233     233     233    3025.   3024.
#> 5 ko15.CDF           1          5    233.    233.    233.   3016.   3015.
#> 6 ko15.CDF           1          6    240.    240.    240.   3677.   3675.
#> # ℹ 19 more variables: f_rtmax <dbl>, ms_level <int>, isotopes <chr>,
#> #   adduct <chr>, pcgroup <int>, filepath <chr>, filename <chr>, peakidx <dbl>,
#> #   mz <dbl>, mzmin <dbl>, mzmax <dbl>, rt <dbl>, rtmin <dbl>, rtmax <dbl>,
#> #   into <dbl>, intb <dbl>, maxo <dbl>, sn <dbl>, is_filled <lgl>

# Explore the data
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MSnbase':
#> 
#>     combine
#> The following objects are masked from 'package:S4Vectors':
#> 
#>     first, intersect, rename, setdiff, setequal, union
#> The following object is masked from 'package:Biobase':
#> 
#>     combine
#> The following objects are masked from 'package:BiocGenerics':
#> 
#>     combine, intersect, setdiff, setequal, union
#> The following object is masked from 'package:generics':
#> 
#>     explain
#> The following objects are masked from 'package:xcms':
#> 
#>     collect, groups
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Count features per sample
peak_table %>%
  group_by(filename) %>%
  summarize(n_features = sum(!is.na(into)))
#> # A tibble: 3 × 2
#>   filename n_features
#>   <chr>         <int>
#> 1 ko15.CDF        292
#> 2 ko16.CDF        318
#> 3 ko18.CDF        303

# View adduct annotations
peak_table %>%
  filter(!is.na(adduct)) %>%
  select(feature_id, f_mzmed, f_rtmed, adduct, pcgroup)
#> # A tibble: 1,044 × 5
#>    feature_id f_mzmed f_rtmed adduct                                     pcgroup
#>         <int>   <dbl>   <dbl> <chr>                                        <int>
#>  1          1    205    2788. "[M+H-C3H9N-C2H4O2]+ 323.091"                   10
#>  2          2    206    2785. "[M+H-NH3-C3H4]+ 262.041 [M+H-C4H6O]+ 275…      36
#>  3          3    207.   2713. ""                                             164
#>  4          4    233    3025. ""                                             181
#>  5          5    233.   3016. ""                                             208
#>  6          6    240.   3677. ""                                             228
#>  7          7    241.   3670. ""                                              97
#>  8          8    242.   3674. ""                                             202
#>  9          9    244.   2829. ""                                             146
#> 10         10    249.   3664. "[M+H-C4H6-H2O]+ 320.158 [M+H-C4H6-NH3-H2…     111
#> # ℹ 1,034 more rows

# Example 3: With XcmsExperiment object
library(MsExperiment)
library(msdata)
#> 
#>  IMPORTANT: msdata will be deprecated and replaced by the new
#>  MsDataHub package -- see https://bioconductor.org/packages/MsDataHub.
#>  Please open an issue at https://github.com/rformassspectrometry/MsDataHub/issues
#>  to inform us of any data from msdata that would need to be moved to MsDataHub.
#> 
#> Attaching package: 'msdata'
#> The following object is masked from 'package:dplyr':
#> 
#>     ident

# Load data as XcmsExperiment
fls <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
xdata_exp <- readMsExperiment(spectraFiles = fls[1:2], BPPARAM = SerialParam())

# Peak detection and grouping
cwp <- CentWaveParam(peakwidth = c(5, 20), noise = 100)
xdata_exp <- findChromPeaks(xdata_exp, param = cwp, BPPARAM = SerialParam())
pdp <- PeakDensityParam(sampleGroups = rep(1, 2), minFraction = 0.5)
xdata_exp <- groupChromPeaks(xdata_exp, param = pdp)

# Create peak table without CAMERA
peak_table_exp <- tidy_peaklist(xdata_exp)
head(peak_table_exp)
#> # A tibble: 6 × 25
#>   sample_index spectraOrigin fromFile feature_id f_mzmed f_mzmin f_mzmax f_rtmed
#>          <int> <chr>            <dbl>      <int>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1            1 /home/runner…        1          1    105.    105.    105.    63.1
#> 2            1 /home/runner…        1          2    105.    105.    105.   158. 
#> 3            1 /home/runner…        1          3    105.    105.    105.   203. 
#> 4            1 /home/runner…        1          4    106.    106.    106.   181. 
#> 5            1 /home/runner…        1          5    106.    106.    106.   195. 
#> 6            1 /home/runner…        1          6    106.    106.    106.    30.8
#> # ℹ 17 more variables: f_rtmin <dbl>, f_rtmax <dbl>, ms_level <int>,
#> #   filepath <chr>, filename <chr>, peakidx <dbl>, mz <dbl>, mzmin <dbl>,
#> #   mzmax <dbl>, rt <dbl>, rtmin <dbl>, rtmax <dbl>, into <dbl>, intb <dbl>,
#> #   maxo <dbl>, sn <dbl>, is_filled <lgl>
# }