Skip to contents

Generates a comprehensive long-format peak table where each row represents one feature for one sample. Optionally includes CAMERA annotations for isotopes, adducts, and pseudospectrum groups if provided.

Usage

tidy_peaklist(x, xsAnnotate = NULL)

Arguments

x

An xcms::XCMSnExp or xcms::XcmsExperiment object containing peak detection results. Feature grouping (via xcms::groupChromPeaks()) is optional but recommended.

xsAnnotate

Optional. An xsAnnotate object from the CAMERA package containing peak annotations (isotopes, adducts, pseudospectrum groups). If NULL (default), annotation columns will not be included in the output.

Value

A tibble::tibble in long format with one row per feature per sample (or one row per peak per sample if features are not defined). The tibble contains the following columns with their data types:

feature_id

integer: Feature identifier (1, 2, 3, ...). Only present when features are defined via xcms::groupChromPeaks().

f_mzmed, f_mzmin, f_mzmax

numeric: Feature-level m/z statistics (median, minimum, maximum). Only present when features are defined.

f_rtmed, f_rtmin, f_rtmax

numeric: Feature-level retention time statistics in seconds (median, minimum, maximum). Only present when features are defined.

ms_level

integer: MS level (typically 1 for MS1 features). Only present when features are defined.

isotopes

character: Isotope annotations from CAMERA (e.g., ""[M]+"", ""[M+1]+""). Only present when xsAnnotate is provided.

adduct

character: Adduct annotations from CAMERA (e.g., ""[M+H]+"", ""[M+Na]+""). Only present when xsAnnotate is provided.

pcgroup

integer: Pseudospectrum group ID from CAMERA. Features with the same pcgroup are believed to originate from the same compound. Only present when xsAnnotate is provided.

feature_group

character: Feature group identifier from MsFeatures::groupFeatures() (e.g., "FG.001", "FG.002"). Only present if groupFeatures was applied.

mz, mzmin, mzmax

numeric: Peak m/z values (centroid, minimum, maximum).

rt, rtmin, rtmax

numeric: Peak retention time in seconds (apex, minimum, maximum).

into

numeric: Integrated peak intensity (integrated area of original (raw) peak).

intb

numeric: Baseline-corrected integrated peak intensity.

maxo

numeric: Maximum peak intensity (apex).

sn

numeric: Signal-to-noise ratio.

is_filled

logical: Whether the peak was gap-filled (TRUE) or originally detected (FALSE).

filepath

character: Full file path to the raw data file.

filename

character: Basename of the raw data file.

fromFile

numeric: Sample index (1-based) corresponding to the order in xcms::fileNames,MsExperiment-method.

peakidx

numeric: Internal peak index from xcms::chromPeaks().

Additional columns

Variable types: Any columns from Biobase::pData() or MsExperiment::sampleData() are included with their original data types.

Details

This function integrates data from multiple sources:

The function supports both xcms::XCMSnExp and xcms::XcmsExperiment objects, providing flexibility for modern XCMS workflows.

Feature grouping is optional: If xcms::groupChromPeaks() has not been run, the function will return peak-level data only, without feature-level aggregation.

Intensity matching: When multiple peaks from the same sample are assigned to a single feature, XCMS has a parameter (method = "maxint") that selects which peak intensity to use. This function filters peaks to match the intensity values that XCMS selected, ensuring consistency with XCMS's feature intensity matrix.

Missing values (NA) indicate that a feature was not detected in a particular sample.

Note

  • XCMS sometimes creates duplicate peaks; this function keeps only the first occurrence.

  • The function uses "maxint" method for feature values, meaning it takes the maximum intensity peak for each feature in each sample.

  • CAMERA annotations are optional. If xsAnnotate is NULL, the isotopes, adduct, and pcgroup columns will not be present in the output.

  • MsFeatures grouping is automatically detected. If MsFeatures::groupFeatures() was applied to the object, the feature_group column will be included in the output.

  • For XcmsExperiment objects, sample metadata is accessed via sampleData() instead of pData().

  • For large datasets (many features x many samples), the output can be memory-intensive. A warning is issued if the expected output exceeds 10 million rows.

Examples

# \donttest{
library(xcms)
#> Loading required package: BiocParallel
#> 
#> This is xcms version 4.8.0 
library(BiocParallel)

# Load example data
faahko_sub <- loadXcmsData("faahko_sub")

# Peak detection
cwp <- CentWaveParam(peakwidth = c(20, 80), noise = 5000)
xdata <- findChromPeaks(faahko_sub, param = cwp, BPPARAM = SerialParam())
#> Removed previously identified chromatographic peaks.
#> Detecting mass traces at 25 ppm ... 
#> OK
#> Detecting chromatographic peaks in 2075 regions of interest ...
#>  OK: 561 found.
#> Detecting mass traces at 25 ppm ... 
#> OK
#> Detecting chromatographic peaks in 2308 regions of interest ...
#>  OK: 741 found.
#> Detecting mass traces at 25 ppm ... 
#> OK
#> Detecting chromatographic peaks in 2081 regions of interest ...
#>  OK: 448 found.

# Peak grouping
pdp <- PeakDensityParam(sampleGroups = rep(1, length(fileNames(xdata))),
                        minFraction = 0.5)
xdata <- groupChromPeaks(xdata, param = pdp)

# Example 1: Without CAMERA annotations
peak_table_no_camera <- tidy_peaklist(xdata)
head(peak_table_no_camera)
#> # A tibble: 6 × 24
#>   sampleNames fromFile feature_id f_mzmed f_mzmin f_mzmax f_rtmed f_rtmin
#>   <chr>          <dbl>      <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 ko15.CDF           1          1    205     205     205    2788.   2785.
#> 2 ko15.CDF           1          2    206     206     206    2785.   2783.
#> 3 ko15.CDF           1          3    207.    207.    207.   2713.   2711.
#> 4 ko15.CDF           1          4    233     233     233    3025.   3024.
#> 5 ko15.CDF           1          5    233.    233.    233.   3016.   3015.
#> 6 ko15.CDF           1          6    240.    240.    240.   3677.   3675.
#> # ℹ 16 more variables: f_rtmax <dbl>, ms_level <int>, filepath <chr>,
#> #   filename <chr>, peakidx <dbl>, mz <dbl>, mzmin <dbl>, mzmax <dbl>,
#> #   rt <dbl>, rtmin <dbl>, rtmax <dbl>, into <dbl>, intb <dbl>, maxo <dbl>,
#> #   sn <dbl>, is_filled <lgl>

# Example 2: With CAMERA annotations
library(CAMERA)
library(commonMZ)

# Convert to xcmsSet for CAMERA (CAMERA requires the old xcmsSet class)
xset <- as(xdata, "xcmsSet")
#> Note: you might want to set/adjust the 'sampclass' of the returned xcmSet object before proceeding with the analysis.

# CAMERA annotation
xs <- xsAnnotate(xset, polarity = "positive")
xs <- groupFWHM(xs, perfwhm = 0.1, intval = "into", sigma = 6)
#> Start grouping after retention time.
#> Created 227 pseudospectra.
xs <- findIsotopes(xs, ppm = 10, mzabs = 0.01, intval = "into")
#> Generating peak matrix!
#> Run isotope peak annotation
#>  % finished: 10  20  30  40  50  60  70  80  90  100  
#> Found isotopes: 23 
xs <- groupCorr(xs, calcIso = FALSE, calcCiS = TRUE, calcCaS = FALSE,
                cor_eic_th = 0.7, pval = 1E-6)
#> Start grouping after correlation.
#> Generating EIC's .. 
#> Warning: Found NA peaks in selected sample.
#> 
#> Calculating peak correlations in 227 Groups... 
#>  % finished: 10  20  30  40  50  60  70  80  90  100  
#> 
#> Calculating graph cross linking in 227 Groups... 
#>  % finished: 10  20  30  40  50  60  70  80  90  100  
#> New number of ps-groups:  280 
#> xsAnnotate has now 280 groups, instead of 227 

# Get adduct/fragment rules from commonMZ
rules_pos <- commonMZ::MZ_CAMERA(mode = "pos", warn_clash = TRUE, clash_ppm = 5)
#> Warning: The following adducts/fragments seem to collide. 
#> # A tibble: 2 × 2
#>   first       second                             
#>   <chr>       <chr>                              
#> 1 [M+H-NH3]+  [M+NH4]+                           
#> 2 [M+H-C3H4]+ [M+H+(CH3)2CO-H2O]+ (acetone cond.)
#> 
#> 
#> Consider removing one of them. Example: 
#>  rules=rules[            !grepl("[M+NH4]+",rules[,"name"],fixed=TRUE)         ,]
rules_pos <- as.data.frame(rules_pos)

# Find adducts using the rules
xs <- findAdducts(xs, ppm = 10, mzabs = 0.01, multiplier = 4,
                  polarity = "positive", rules = rules_pos)
#> Generating peak matrix for peak annotation!
#> Polarity is set in xsAnnotate: positive 
#> Found and use user-defined ruleset!
#> Calculating possible adducts in 280 Groups... 
#>  % finished: 10  20  30  40  50  60  70  80  90  100  

# Create long-format peak table
peak_table <- tidy_peaklist(xdata, xs)
head(peak_table)
#> # A tibble: 6 × 27
#>   sampleNames fromFile feature_id f_mzmed f_mzmin f_mzmax f_rtmed f_rtmin
#>   <chr>          <dbl>      <int>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1 ko15.CDF           1          1    205     205     205    2788.   2785.
#> 2 ko15.CDF           1          2    206     206     206    2785.   2783.
#> 3 ko15.CDF           1          3    207.    207.    207.   2713.   2711.
#> 4 ko15.CDF           1          4    233     233     233    3025.   3024.
#> 5 ko15.CDF           1          5    233.    233.    233.   3016.   3015.
#> 6 ko15.CDF           1          6    240.    240.    240.   3677.   3675.
#> # ℹ 19 more variables: f_rtmax <dbl>, ms_level <int>, isotopes <chr>,
#> #   adduct <chr>, pcgroup <int>, filepath <chr>, filename <chr>, peakidx <dbl>,
#> #   mz <dbl>, mzmin <dbl>, mzmax <dbl>, rt <dbl>, rtmin <dbl>, rtmax <dbl>,
#> #   into <dbl>, intb <dbl>, maxo <dbl>, sn <dbl>, is_filled <lgl>

# Explore the data
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following object is masked from 'package:MSnbase':
#> 
#>     combine
#> The following objects are masked from 'package:S4Vectors':
#> 
#>     first, intersect, rename, setdiff, setequal, union
#> The following object is masked from 'package:Biobase':
#> 
#>     combine
#> The following objects are masked from 'package:BiocGenerics':
#> 
#>     combine, intersect, setdiff, setequal, union
#> The following object is masked from 'package:generics':
#> 
#>     explain
#> The following objects are masked from 'package:xcms':
#> 
#>     collect, groups
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

# Count features per sample
peak_table %>%
  group_by(filename) %>%
  summarize(n_features = sum(!is.na(into)))
#> # A tibble: 3 × 2
#>   filename n_features
#>   <chr>         <int>
#> 1 ko15.CDF        292
#> 2 ko16.CDF        318
#> 3 ko18.CDF        303

# View adduct annotations
peak_table %>%
  filter(!is.na(adduct)) %>%
  select(feature_id, f_mzmed, f_rtmed, adduct, pcgroup)
#> # A tibble: 1,044 × 5
#>    feature_id f_mzmed f_rtmed adduct                                     pcgroup
#>         <int>   <dbl>   <dbl> <chr>                                        <int>
#>  1          1    205    2788. "[M+H-C3H9N-C2H4O2]+ 323.091"                   10
#>  2          2    206    2785. "[M+H-NH3-C3H4]+ 262.041 [M+H-C4H6O]+ 275…      36
#>  3          3    207.   2713. ""                                             164
#>  4          4    233    3025. ""                                             181
#>  5          5    233.   3016. ""                                             208
#>  6          6    240.   3677. ""                                             228
#>  7          7    241.   3670. ""                                              97
#>  8          8    242.   3674. ""                                             202
#>  9          9    244.   2829. ""                                             146
#> 10         10    249.   3664. "[M+H-C4H6-H2O]+ 320.158 [M+H-C4H6-NH3-H2…     111
#> # ℹ 1,034 more rows

# Example 3: With XcmsExperiment object
library(MsExperiment)
library(msdata)
#> 
#> Attaching package: 'msdata'
#> The following object is masked from 'package:dplyr':
#> 
#>     ident

# Load data as XcmsExperiment
fls <- dir(system.file("sciex", package = "msdata"), full.names = TRUE)
xdata_exp <- readMsExperiment(spectraFiles = fls[1:2], BPPARAM = SerialParam())

# Peak detection and grouping
cwp <- CentWaveParam(peakwidth = c(5, 20), noise = 100)
xdata_exp <- findChromPeaks(xdata_exp, param = cwp, BPPARAM = SerialParam())
pdp <- PeakDensityParam(sampleGroups = rep(1, 2), minFraction = 0.5)
xdata_exp <- groupChromPeaks(xdata_exp, param = pdp)

# Create peak table without CAMERA
peak_table_exp <- tidy_peaklist(xdata_exp)
head(peak_table_exp)
#> # A tibble: 6 × 25
#>   sample_index spectraOrigin fromFile feature_id f_mzmed f_mzmin f_mzmax f_rtmed
#>          <int> <chr>            <dbl>      <int>   <dbl>   <dbl>   <dbl>   <dbl>
#> 1            1 /home/runner…        1          1    105.    105.    105.    63.1
#> 2            1 /home/runner…        1          2    105.    105.    105.   158. 
#> 3            1 /home/runner…        1          3    105.    105.    105.   203. 
#> 4            1 /home/runner…        1          4    106.    106.    106.   181. 
#> 5            1 /home/runner…        1          5    106.    106.    106.   195. 
#> 6            1 /home/runner…        1          6    106.    106.    106.    30.8
#> # ℹ 17 more variables: f_rtmin <dbl>, f_rtmax <dbl>, ms_level <int>,
#> #   filepath <chr>, filename <chr>, peakidx <dbl>, mz <dbl>, mzmin <dbl>,
#> #   mzmax <dbl>, rt <dbl>, rtmin <dbl>, rtmax <dbl>, into <dbl>, intb <dbl>,
#> #   maxo <dbl>, sn <dbl>, is_filled <lgl>
# }