Prepare files with your new configuration
- First install and run the demo and make sure everything is working.
- Make copies of docker-compose_demo.yml and settings_demo.env, and rename the copies to remove _demo from the filenames (giving docker-compose.yml and settings.env).
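If you prefer to script the copying, it can be sketched in Python as below. This is a minimal, hypothetical helper, not part of QC4Metabolomics. It assumes the demo files sit in the folder you pass in, and it refuses to overwrite an existing docker-compose.yml or settings.env.

```python
import shutil
from pathlib import Path

# The demo files shipped with QC4Metabolomics, as named in the step above.
DEMO_FILES = ["docker-compose_demo.yml", "settings_demo.env"]


def copy_without_demo_suffix(folder: Path) -> list[Path]:
    """Copy each demo file to the same name without '_demo'; the originals are kept."""
    created = []
    for name in DEMO_FILES:
        source = folder / name
        target = folder / name.replace("_demo", "")
        if target.exists():
            # Do not silently overwrite a configuration you have already edited.
            raise FileExistsError(f"{target} already exists")
        shutil.copyfile(source, target)
        created.append(target)
    return created
```

Run it as copy_without_demo_suffix(Path(".")) from the folder that holds the demo files.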
Choose which processes QC4Metabolomics should run and correct the paths
Open your newly created docker-compose.yml in a text editor and modify it to fit your system.
The system consists of different “machines” that take care of different
parts of QC4Metabolomics:
- mariadb (required): runs the database that keeps all the data.
- qc_process (required): This runs the background processing that extracts the data from the files.
- qc_shiny (required): This runs the web interface that shows you the data.
- ms_converter (optional): This is an automatic converter that takes the raw vendor data and converts it into mzML files.
- db-backup (optional): Automatic backup of the database.
First, replace every instance of settings_demo.env with settings.env in your new docker-compose.yml.
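This replacement is a plain search-and-replace, so a text editor is enough. For a scripted setup, a minimal sketch in Python could look like the hypothetical helper below. It returns the number of replacements so you can confirm that something actually changed.

```python
from pathlib import Path


def use_own_settings(compose_file: Path) -> int:
    """Point docker-compose.yml at settings.env instead of settings_demo.env.

    Returns how many occurrences were replaced.
    """
    text = compose_file.read_text()
    count = text.count("settings_demo.env")
    compose_file.write_text(text.replace("settings_demo.env", "settings.env"))
    return count
```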
Next you need to change several locations:
- under mariadb -> volumes, change source to the location where the database files will be kept.
- under qc_process -> volumes, change source to the location of your data (where your mzML files live).
- under qc_shiny -> volumes, change source to the location of your data (where your mzML files live).
- under ms_converter -> volumes, change source to the location of your data (where your mzML files live).
- under db-backup -> volumes, change source to the location where you want your database backups to be saved.
If you do not want to use ms_converter or db-backup, you can remove those sections entirely.
General settings
Now we can change the settings in settings.env to fit your specific needs.
The first setting is the time zone, which you can adjust so that analysis times are displayed in your local time.
General module settings
Next you will find that all modules have the following settings:
QC4METABOLOMICS_module_MODULENAME_enabled
QC4METABOLOMICS_module_MODULENAME_shiny_enabled
QC4METABOLOMICS_module_MODULENAME_schedule
QC4METABOLOMICS_module_MODULENAME_file_schedule
Normally you only need to change QC4METABOLOMICS_module_MODULENAME_enabled, which decides whether the module is loaded at all.
Some modules also have the setting QC4METABOLOMICS_module_MODULENAME_init_db_priority, which decides the order in which the module’s tables are created when the database is first set up. It should not be touched. QC4METABOLOMICS_module_MODULENAME_process_order controls the order in which background processing runs. You probably do not need to change these.
You can enable/disable parts of a module using these settings:
- QC4METABOLOMICS_module_MODULENAME_shiny_enabled: enables/disables the interface for the module.
- QC4METABOLOMICS_module_MODULENAME_schedule: enables/disables the scheduled background processing.
- QC4METABOLOMICS_module_MODULENAME_file_schedule: chooses whether this module will be scheduled to work on individual files.
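To get an overview of which modules and parts are switched on, you can read settings.env as KEY=VALUE lines and collect the four switches per module. The sketch below is a hypothetical helper with two assumptions: the switches use TRUE/FALSE-style values, and module names contain no underscores. The second assumption keeps keys such as ..._shiny_enabled from being mistaken for a separate module.

```python
import re

# Values treated as "on". Assumption: settings.env uses TRUE/FALSE-style values.
TRUTHY = {"true", "t", "1", "yes"}

# Captures the module name, e.g. "Files" from QC4METABOLOMICS_module_Files_enabled.
# Assumption: module names contain no underscores, so *_shiny_enabled is not matched.
ENABLED_KEY = re.compile(r"^QC4METABOLOMICS_module_([A-Za-z0-9]+)_enabled$")

SWITCHES = ("enabled", "shiny_enabled", "schedule", "file_schedule")


def read_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def module_switches(settings: dict[str, str]) -> dict[str, dict[str, bool]]:
    """Report the four on/off switches for every module that has an _enabled key."""
    modules = {}
    for key in settings:
        match = ENABLED_KEY.match(key)
        if match is None:
            continue
        prefix = f"QC4METABOLOMICS_module_{match.group(1)}_"
        # A switch that is missing from the file is reported as off.
        modules[match.group(1)] = {
            switch: settings.get(prefix + switch, "").lower() in TRUTHY
            for switch in SWITCHES
        }
    return modules
```

For example, module_switches(read_env(Path("settings.env").read_text())) returns one dictionary of switches per module.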
Specific module settings
Files
- QC4METABOLOMICS_module_Files_include_ext: When determining which files to include, choose only files matching these extensions (OR). Cannot be empty. Separate multiple extensions with ;.
- QC4METABOLOMICS_module_Files_include_path: To be included, the full path must contain this string (AND). Can be empty.
- QC4METABOLOMICS_module_Files_exclude_path: To be included, the full path must NOT contain this string (OR). Can be empty.
- QC4METABOLOMICS_module_Files_files_from_txt: Chooses whether files are found by traversing a folder or by reading a text file that lists their exact paths. We recommend a text file that is updated as your analysis progresses, because a deep folder structure with thousands of files becomes slow to traverse. Refer also to “Moving raw files automatically” for a helper script that moves files and creates this index file.
- QC4METABOLOMICS_module_Files_files_txt_path: The path to the text file that lists the paths to the data files. The paths in that file must be written as they are seen from inside the Docker container, not from your own computer. That means that if your files are in d:\folder\mzML_files and you mounted d:\folder\mzML_files to /data in the docker compose file, then the paths would look like: /data/project name/mzML files/project name_date_instrument_sequence no_mode_sample name.mzML. Refer also to “Moving raw files automatically” for a helper script that moves files and creates this index file.
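The include/exclude rules and the host-to-container path translation can be sketched as below. Both functions are hypothetical helpers, not QC4Metabolomics code. They rest on three assumptions: include_path and exclude_path may hold several ;-separated strings like include_ext, extensions match case-insensitively, and the data folder is mounted at /data as in the example.

```python
from pathlib import PurePosixPath, PureWindowsPath


def _split(value: str) -> list[str]:
    """Split a ;-separated setting, ignoring empty parts."""
    return [part for part in value.split(";") if part]


def is_included(path: str, include_ext: str, include_path: str = "", exclude_path: str = "") -> bool:
    """Apply the include_ext (OR), include_path (AND) and exclude_path (OR) rules."""
    extensions = _split(include_ext)
    if not extensions:
        raise ValueError("include_ext cannot be empty")
    # OR: the file must end with at least one extension (case-insensitive here).
    if not any(path.lower().endswith(ext.lower()) for ext in extensions):
        return False
    # AND: every include string must occur in the full path (no-op when empty).
    if not all(s in path for s in _split(include_path)):
        return False
    # OR: any exclude string in the full path rejects the file.
    return not any(s in path for s in _split(exclude_path))


def to_container_path(host_path: str, host_mount: str, container_mount: str = "/data") -> str:
    """Translate a Windows host path into the path seen inside the container."""
    relative = PureWindowsPath(host_path).relative_to(PureWindowsPath(host_mount))
    return str(PurePosixPath(container_mount, *relative.parts))
```

With the mount from the example, to_container_path(r"d:\folder\mzML_files\proj\mzML files\a.mzML", r"d:\folder\mzML_files") gives /data/proj/mzML files/a.mzML, which is the form the index file expects.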
FileInfo
- QC4METABOLOMICS_module_FileInfo_mask: This is very important. This is the pattern of your file names. The file name must contain instrument, project, mode (unless you use the workaround below) and sample_id. Other fields can be specified but are not used. Example: %project%_%date%_%instrument%_%batch_seq_nr%_%mode%_%sample_id%
- QC4METABOLOMICS_module_FileInfo_mode_from_other_field: If the mode is part of another “field” and not specified separately, this setting can be enabled to extract the mode from that field.
- QC4METABOLOMICS_module_FileInfo_mode_from_other_field_which: The name of the field the mode should be deduced from.
- QC4METABOLOMICS_module_FileInfo_mode_from_other_field_pos_trigger: The string in the field above that sets the mode to “pos”.
- QC4METABOLOMICS_module_FileInfo_mode_from_other_field_neg_trigger: The string in the field above that sets the mode to “neg”.
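To check a mask against your own file names before starting the system, you can turn the mask into a regular expression with one named group per %field%. The sketch below is an illustration under stated assumptions, not how QC4Metabolomics parses names internally. Fields match as little as possible, so underscores inside a field are only handled correctly in the last field. The "unknown" fallback when neither trigger is found is hypothetical.

```python
import re
from pathlib import PurePosixPath


def mask_to_regex(mask: str) -> re.Pattern:
    """Turn a mask such as %project%_%mode% into a regex with one named group per field."""
    # With a capture group, re.split alternates literal text and field names.
    pieces = re.split(r"%(\w+)%", mask)
    pattern = "".join(
        re.escape(piece) if i % 2 == 0 else f"(?P<{piece}>.+?)"
        for i, piece in enumerate(pieces)
    )
    return re.compile(f"^{pattern}$")


def parse_file_name(path, mask, mode_from=None, pos_trigger="pos", neg_trigger="neg"):
    """Split a file name into the mask's fields, optionally deriving the mode from another field."""
    stem = PurePosixPath(path).stem  # file name without folder and extension
    match = mask_to_regex(mask).match(stem)
    if match is None:
        raise ValueError(f"{stem!r} does not match mask {mask!r}")
    fields = match.groupdict()
    if mode_from is not None:
        source = fields[mode_from]
        if pos_trigger in source:
            fields["mode"] = "pos"
        elif neg_trigger in source:
            fields["mode"] = "neg"
        else:
            fields["mode"] = "unknown"  # assumption: neither trigger found
    return fields
```

For example, with mode_from="batch", pos_trigger="POS" and the mask %project%_%instrument%_%batch%_%sample_id%, a file named Proj1_QTOF1_B3-POS_S01.mzML gets the mode "pos".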
TrackCmp
- QC4METABOLOMICS_module_TrackCmp_ROI_ppm: How large a deviation from the target m/z is allowed when subsetting the raw data. This needs to be quite large, since the deviation at the tails of peaks is usually larger than expected. Only data inside this window is used. The centWave ppm setting then limits the selected peak appropriately.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_method: The selected peak picking method. Only centWave is currently supported.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_snthr: The signal-to-noise threshold for a peak to be picked.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_ppm: The ppm tolerance for peak detection.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_peakwidth: The allowed peak width range in seconds. Should be two numbers separated by a comma.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_scanrange: The scan range to consider. Can be NULL to analyze the whole file.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_prefilter: The prefilter settings for centWave. Refer to the centWave documentation for details. Should be two numbers separated by a comma. The first number says how many scans need to be above the intensity given by the second number for a peak to be picked.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_integrate: Integration method, 1 or 2. Refer to the centWave documentation.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_verbose_columns: Whether extra columns should be returned. Do not touch.
- QC4METABOLOMICS_module_TrackCmp_findPeaks_fitgauss: Whether a Gaussian curve should be fitted to the peak.
- QC4METABOLOMICS_module_TrackCmp_xcmsRaw_profparam: Profiling parameter for peak picking. Set to 0 to disable the use of a profile matrix. Do not touch.
- QC4METABOLOMICS_module_TrackCmp_std_match_ppm: The ppm tolerance for matching the defined compounds to track.
- QC4METABOLOMICS_module_TrackCmp_std_match_rt_tol: The retention time tolerance, in seconds, for matching the defined compounds to track.
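The ppm tolerances above are relative to the target m/z: a window of p ppm around m/z m spans m ± m·p/10⁶, so 50 ppm at m/z 200 is ±0.01. The sketch below shows that arithmetic and a combined m/z plus retention-time check in the spirit of std_match_ppm and std_match_rt_tol. The function names are hypothetical, and the check is a plain illustration of the tolerances, not TrackCmp's actual matching code.

```python
def ppm_window(mz: float, ppm: float) -> tuple[float, float]:
    """Lower and upper m/z bounds for a tolerance given in parts per million."""
    delta = mz * ppm / 1e6
    return mz - delta, mz + delta


def ppm_error(observed: float, target: float) -> float:
    """Absolute deviation of the observed m/z from the target, in ppm."""
    return abs(observed - target) / target * 1e6


def matches_standard(peak_mz, peak_rt, std_mz, std_rt, ppm, rt_tol):
    """True if a picked peak lies within both the ppm and the RT (seconds) tolerance."""
    return ppm_error(peak_mz, std_mz) <= ppm and abs(peak_rt - std_rt) <= rt_tol
```

This also shows why ROI_ppm has to be generous: it only sets the window of raw data handed to peak picking, while the tighter tolerances decide whether a picked peak counts as the tracked compound.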
Contaminants
- QC4METABOLOMICS_module_Contaminants_cont_list_type: “URL” or “local”. Only URL is currently supported.
- QC4METABOLOMICS_module_Contaminants_cont_list_loc_positive: The URL/path to the positive mode contaminant list. The file should be a tsv file (tab-delimited) containing the columns compound_ID, ion_ID, mode, mz, anno and origin.
- QC4METABOLOMICS_module_Contaminants_cont_list_loc_unknown: The URL/path to the contaminant list for files where the mode is unknown.
- QC4METABOLOMICS_module_Contaminants_cont_list_loc_negative: The URL/path to the negative mode contaminant list.
- QC4METABOLOMICS_module_Contaminants_EIC_ppm: The ppm tolerance for a mass peak to be included in the extracted ion chromatogram that the contaminant examination is based on. Should be wide, to include low-intensity peaks.
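If you maintain your own contaminant list, a quick check of its format can catch problems before the module reads it. The hypothetical helper below checks only what is stated above: the file is tab-delimited with the six listed columns. The extra check that mz parses as a number is an assumption about what a valid list contains.

```python
import csv
import io

# Columns required by the contaminant list description above.
REQUIRED_COLUMNS = ["compound_ID", "ion_ID", "mode", "mz", "anno", "origin"]


def read_contaminant_list(text: str) -> list[dict[str, str]]:
    """Parse a tab-delimited contaminant list and check its columns and m/z values."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    header = reader.fieldnames or []
    missing = [column for column in REQUIRED_COLUMNS if column not in header]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))
    rows = list(reader)
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        try:
            float(row["mz"])
        except (TypeError, ValueError):
            raise ValueError(f"line {line_no}: mz is not a number: {row['mz']!r}") from None
    return rows
```

For a list published at a URL, the text can be fetched first with urllib.request.urlopen(url).read().decode("utf-8").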
ICMeter
- QC4METABOLOMICS_module_ICMeter_user: The username for the ICMeter system.
- QC4METABOLOMICS_module_ICMeter_password: The password for the ICMeter system.
Warner
- QC4METABOLOMICS_module_Warner_email_from: The email address that will appear as the sender.
- QC4METABOLOMICS_module_Warner_email_to: The recipient of the warning emails.
- QC4METABOLOMICS_module_Warner_email_user: The username for the SMTP (outgoing) e-mail server. See “running the demo” for more information.
- QC4METABOLOMICS_module_Warner_email_password: The password for the SMTP (outgoing) e-mail server.
- QC4METABOLOMICS_module_Warner_email_host: The SMTP server host address.
- QC4METABOLOMICS_module_Warner_email_port: The SMTP server port number.
- QC4METABOLOMICS_module_Warner_email_use_ssl: Whether to use SSL encryption with the mail server.
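Before relying on the warnings, you can test the SMTP values from your own machine with Python's standard smtplib. The sketch below is a hypothetical test script, not the Warner module itself. It assumes the settings are available as a dictionary keyed by the names above, and that email_use_ssl uses TRUE/FALSE-style values.

```python
import smtplib
from email.message import EmailMessage

PREFIX = "QC4METABOLOMICS_module_Warner_"


def build_test_message(settings: dict[str, str]) -> EmailMessage:
    """Create a test e-mail using the sender and recipient from the Warner settings."""
    message = EmailMessage()
    message["From"] = settings[PREFIX + "email_from"]
    message["To"] = settings[PREFIX + "email_to"]
    message["Subject"] = "QC4Metabolomics test message"
    message.set_content("If you can read this, the SMTP settings work.")
    return message


def send_test_message(settings: dict[str, str]) -> None:
    """Log in to the configured SMTP server and send the test e-mail."""
    use_ssl = settings.get(PREFIX + "email_use_ssl", "").lower() in {"true", "1", "yes"}
    smtp_class = smtplib.SMTP_SSL if use_ssl else smtplib.SMTP
    host = settings[PREFIX + "email_host"]
    port = int(settings[PREFIX + "email_port"])
    with smtp_class(host, port, timeout=30) as server:
        # A server that expects STARTTLS instead of SSL needs server.starttls() before login.
        server.login(settings[PREFIX + "email_user"], settings[PREFIX + "email_password"])
        server.send_message(build_test_message(settings))
```

If send_test_message raises an authentication or connection error, the same values will most likely fail inside the container too.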
Settings for additional tools
MS convert
The docker container ms_converter reads a text file named raw_filelist.txt in the mounted folder, converts the listed files to mzML and writes the new paths to mzML_filelist.txt in the mounted folder. It runs every minute to check for new files that have not yet been converted.
- QC4METABOLOMICS_msconvert_args: The arguments that msconvert from ProteoWizard uses to convert the data. The default is --filter \"scanEvent 1\" --mzML --zlib --64, which keeps only the first scan event (what Waters calls functions), outputs mzML, compresses using zlib and saves the values with 64-bit precision. Quotes must be escaped with \.
- QC4METABOLOMICS_msconvert_outdir_prefix: The output folder, relative to the source files. The default, /../mzML, steps one folder up, creates a new mzML folder there and puts the converted files in it.
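The sketch below shows how the two settings combine for a single raw file. It is hypothetical: whether ms_converter assembles the call exactly like this is an assumption. It does use msconvert's real -o (output directory) option. DEFAULT_ARGS is the documented default with the \" escaping from settings.env already removed, and shlex.split keeps "scanEvent 1" together as one argument.

```python
import posixpath
import shlex

# The documented default, written without the \" escaping needed in settings.env.
DEFAULT_ARGS = '--filter "scanEvent 1" --mzML --zlib --64'
DEFAULT_PREFIX = "/../mzML"


def output_dir(source_file: str, prefix: str = DEFAULT_PREFIX) -> str:
    """Append the prefix to the source file's folder and resolve the '..'."""
    return posixpath.normpath(posixpath.dirname(source_file) + prefix)


def msconvert_command(source_file: str, args: str = DEFAULT_ARGS, prefix: str = DEFAULT_PREFIX) -> list[str]:
    """Build an msconvert argument list for one raw file."""
    return ["msconvert", source_file, *shlex.split(args), "-o", output_dir(source_file, prefix)]
```

For /data/proj/raw/s1.raw, the default prefix places the output in /data/proj/mzML, next to the raw folder rather than inside it.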
Internal settings for advanced users
These settings do not need to be changed, but can be.
- QC4METABOLOMICS_base: The data folder inside the Docker containers.
- MYSQL_ROOT_PASSWORD: The root database password.
- MYSQL_DATABASE: The name of the database.
- MYSQL_USER: The database user.
- MYSQL_PASSWORD: The database user’s password.
- MYSQL_HOST: The database host name. Needs to match the container name in docker-compose.yml.
- MYSQL_PORT: The database port number.
- MARIADB_AUTO_UPGRADE: Whether the database is upgraded automatically.
DB backup
This container automatically backs up the database at regular intervals.
- TIMEZONE: The time zone used to date the backups correctly. Use a TZ identifier from the official list (the IANA time zone database), for example Europe/Copenhagen.
- CONTAINER_NAME: The database container name. Needs to match the container name in docker-compose.yml.
- CONTAINER_ENABLE_MONITORING: The image’s documentation is unclear about this setting, so it should not be changed.
- BACKUP_JOB_CONCURRENCY: The number of concurrent backups (if used with more than one database).
- DEFAULT_CHECKSUM: Whether to create checksums.
- DEFAULT_BACKUP_INTERVAL: Minutes between backups. 1440 minutes = 24 h.
- DEFAULT_BACKUP_BEGIN: When to make the first backup. Refer to the image’s documentation for the accepted time formats.
- DEFAULT_CLEANUP_TIME: How long to keep old backups before they are deleted. The image’s documentation gives this value in minutes (for example, 10080 minutes = 7 days).
- DEFAULT_COMPRESSION: The compression method. Use GZ (gzip), BZ (bzip2), XZ (xz), ZSTD (Zstandard) or NONE (no compression).
- DB01_TYPE: The type of database.
- DB01_HOST: The database host name. Needs to match the container name in docker-compose.yml.
- DB01_NAME: The name of the database.
- DB01_USER: The database user.
- DB01_PASS: The database user’s password.
For more details refer to the image’s documentation.
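Several backup settings must agree with the database settings, so a mismatch there is an easy mistake. The hypothetical check below compares the two sets of values and validates the compression method and the backup interval. It assumes both sets are available as dictionaries. It also assumes the backup logs in with the same user as MYSQL_USER; if you use root for backups, drop that pair.

```python
VALID_COMPRESSION = {"GZ", "BZ", "XZ", "ZSTD", "NONE"}

# Backup setting -> database setting it should agree with.
# Assumption: the backup uses the same user as MYSQL_USER.
MUST_MATCH = {
    "CONTAINER_NAME": "MYSQL_HOST",
    "DB01_HOST": "MYSQL_HOST",
    "DB01_NAME": "MYSQL_DATABASE",
    "DB01_USER": "MYSQL_USER",
    "DB01_PASS": "MYSQL_PASSWORD",
}


def check_backup_settings(backup: dict[str, str], database: dict[str, str]) -> list[str]:
    """Return a list of problems; an empty list means nothing was found."""
    # Only key names are reported, so passwords never end up in the output.
    problems = [
        f"{backup_key} does not match {database_key}"
        for backup_key, database_key in MUST_MATCH.items()
        if backup.get(backup_key) != database.get(database_key)
    ]
    if backup.get("DEFAULT_COMPRESSION", "") not in VALID_COMPRESSION:
        problems.append(f"DEFAULT_COMPRESSION must be one of {sorted(VALID_COMPRESSION)}")
    interval = backup.get("DEFAULT_BACKUP_INTERVAL", "")
    if not interval.isdigit() or int(interval) == 0:
        problems.append("DEFAULT_BACKUP_INTERVAL must be a positive number of minutes")
    return problems
```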