msalign2
Copyright Notice
msalign2 is Copyright (C) 2007-2009 Magnus Palmblad
msalign2 is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License as
published by the Free Software Foundation; either version 2 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
You should have received a copy of the GNU General Public
License along with this program; if not, write to the Free Software Foundation,
Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
The additional scripts (wrap_mzXML.R, FeaturesPlot.sh, MSAlign.sh and all PHP scripts) are Copyright (C) 2008-2009 Ekaterina Nevedomskaya and Rico J. E. Derks and are available under the same GNU General Public License as msalign2 (see above).
About msalign2
msalign2 aligns LC-MS or CE-MS datasets using (reasonably) accurate mass measurements in complex samples sharing many compounds (metabolites, peptides etc.). Datasets can be aligned by matching masses across samples and fitting a curve to these matches. The curve represents the temporal relation between the chromatograms or electropherograms.
Figure 1: Example piecewise curve fitting of 2 BSA digest test runs.
Figure 2: Example of a CE-MS dataset before and after alignment.
Installation of the msalign2 web application:
The package consists of 4 separate programs/scripts:
- MSAlign.sh: bash script which starts everything and keeps track of which data files are running and which are already done.
- msalign2: actual program which does the alignment.
- FeaturesPlot.sh: bash script which uses gnuplot to create the alignment plots (PNGs).
- wrap_mzXML.R: R script which generates the new aligned mzXML files.
Requirements:
- Linux
- Webserver (e.g. Apache)
- PHP 5.2
- R 2.8.0
- Gnuplot 4.0 (for creating the alignment plots)
- Base64 library (for compiling msalign2, included)
- Random Access Minimal Parser (RAMP) library (for compiling msalign2, included).
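Before installing, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch, not part of the msalign2 package; the helper name missing_tools and the command names (gcc, gnuplot, R, php) are assumptions and may differ on your distribution.

```shell
#!/bin/bash
# Hypothetical helper (not shipped with msalign2): print the name of
# every command in the argument list that cannot be found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The command names below are assumptions; adjust them to your system.
missing=$(missing_tools gcc gnuplot R php)
if [ -n "$missing" ]; then
    echo "Missing tools:" $missing
else
    echo "All required tools found."
fi
```

This only checks that the commands exist; it does not verify the version numbers listed above.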
MSAlign.sh configuration:
The alignment script can be configured to use multiple CPUs. To do so, set the variable MAXJOBS to the number of CPUs you want to use.
Default: MAXJOBS=2
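To illustrate what a job limit such as MAXJOBS means in practice, here is a minimal sketch of how a bash script can cap the number of concurrent background jobs. It is not the actual MSAlign.sh code; the functions align_one and run_all and the file names are hypothetical.

```shell
#!/bin/bash
MAXJOBS=2

# Hypothetical stand-in for aligning a single data file.
align_one() {
    echo "aligned $1"
}

# Start jobs in the background in batches of at most MAXJOBS,
# waiting for each batch to finish before starting the next one.
run_all() {
    local f count=0
    for f in "$@"; do
        align_one "$f" &
        count=$((count + 1))
        if [ "$count" -ge "$MAXJOBS" ]; then
            wait
            count=0
        fi
    done
    wait  # wait for the last, possibly incomplete, batch
}

# File names are placeholders.
run_all a.mzXML b.mzXML c.mzXML
```

Batching is the simplest way to respect the limit; a real scheduler may instead start a new job as soon as any running job finishes.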
msalign2:
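Compiling msalign2 requires the base64 and RAMP libraries. The example command below also links against the gd and zlib libraries (-lgd, -lz), so these must be installed as well. msalign2 can be compiled as follows (this example uses gcc version 4.1.2):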
gcc -o msalign2 base64.c ramp.c msalign2.c -I. -lgd -lm -lz -std=gnu99
FeaturesPlot.sh
The FeaturesPlot.sh script needs gnuplot 4.0 to create the alignment plots.
wrap_mzXML.R:
The R script needs R version 2.8.0 or later with the XML package installed. The script uses the XML package to handle the mzXML files.
Webserver configuration:
The webserver needs write access to two directories, ./data and ./temp. The website can only search for your data files within ./data, so the webpage root directory should contain a directory called ./data which links to the place where your data files are located. The ./temp directory is where the MSAlign.sh script stores its temporary files.
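The directory layout described above could be created along the following lines. This is a sketch under stated assumptions: the helper setup_dirs is hypothetical, the demonstration uses throwaway directories rather than a real webpage root, and granting write access through group permissions is only one possible approach.

```shell
#!/bin/bash
# Hypothetical helper: setup_dirs WEBROOT DATADIR
# Creates WEBROOT/temp and makes WEBROOT/data a symlink to DATADIR.
setup_dirs() {
    mkdir -p "$1/temp" || return 1
    ln -sfn "$2" "$1/data" || return 1
    # Make both locations group-writable. This assumes the webserver
    # user belongs to the owning group; adapt it to your own setup.
    chmod g+w "$1/temp" "$2"
}

# Demonstration in throwaway directories (both paths are placeholders
# for your real webpage root and data location).
WEBROOT=$(mktemp -d)
DATADIR=$(mktemp -d)
setup_dirs "$WEBROOT" "$DATADIR"
ls -l "$WEBROOT"
```

On a real installation, replace the two temporary directories with the webpage root and the directory holding your mzXML files.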
PHP 5.2 configuration:
To execute programs and bash scripts from the webserver, the php.ini file must be modified. Either switch safe mode off (safe_mode = off), or leave safe mode on (safe_mode = on) and set the appropriate directories in safe_mode_exec_dir (i.e. the directory in which the webpage is running).
Installing and using standalone msalign2:
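Compiling msalign2 requires the base64 and RAMP libraries. The example command below also links against the gd and zlib libraries (-lgd, -lz), so these must be installed as well. msalign2 can be compiled as follows (this example uses gcc version 4.1.2):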
gcc -o msalign2 base64.c ramp.c msalign2.c -I. -lgd -lm -lz -std=gnu99
Usage: msalign2 -1 <datafile you want to align> -2<master datafile> -e<max. mass error in MS-only data (in ppm)> -l<typical standard deviation in LC retention time in LC-MS data> -f<number of features used for alignment (within +/-10% of this value)> -c<cost per breakpoint (general value is between 0.3 and 0.5)> -d<fitness (general value is 1)> -X<X scan> -Y<Y scan> -R<MS start scan>,<MS end scan>
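As an illustration of the usage line, the sketch below assembles a complete command and prints it without running it. The flag spacing follows the usage line literally. The helper build_cmd, the file names and all numeric values are hypothetical placeholders, except the cost per breakpoint (0.5) and fitness (1), which are the defaults mentioned in this document.

```shell
#!/bin/bash
# All values are illustrative placeholders, not recommendations.
MASS_ERR_PPM=10   # -e: max. mass error in ppm (assumed value)
RT_SD=100         # -l: standard deviation in retention time (assumed value)
FEATURES=500      # -f: number of features (assumed value)
COST=0.5          # -c: cost per breakpoint
FITNESS=1         # -d: fitness
START_SCAN=1      # -R: MS start scan (assumed value)
END_SCAN=3000     # -R: MS end scan (assumed value)
X_SCAN=3500       # -X: endpoint beyond the end scan (assumed value)
Y_SCAN=3500       # -Y: endpoint beyond the end scan (assumed value)

# Hypothetical helper: print the msalign2 command for one file pair.
# $1 = data file to align, $2 = master data file
build_cmd() {
    printf './msalign2 -1 %s -2%s -e%s -l%s -f%s -c%s -d%s -X%s -Y%s -R%s,%s\n' \
        "$1" "$2" "$MASS_ERR_PPM" "$RT_SD" "$FEATURES" "$COST" "$FITNESS" \
        "$X_SCAN" "$Y_SCAN" "$START_SCAN" "$END_SCAN"
}

# Print the command instead of executing it (dry run).
build_cmd sample.mzXML master.mzXML
```

Printing the command first makes it easy to check the parameters before starting an alignment.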
How to use:
The separate parameters are explained below. The focus is mainly on the web application, but the explanations apply to the standalone application as well.
- Before you can start your alignment your files need to be in the correct location. Make sure that the ./data directory is linking to the right location.
- First select a master file to which you want to align all other files. It is only possible to select one file as a master file.
- Next, select all files you want to align.
- Select the parameters you want to use. The default parameters should already give a proper alignment.
- Mass measurement error [ppm]: Set the mass error of your measurements in ppm.
- Standard deviation retention time [scans]: Set the standard deviation of the retention time of your measurements in number of scans.
- Number of features: Set the number of features you want to use for aligning. You need at least 100 features for alignment. The more complex your samples are, the more features you want to use.
- Start at scan number: From which scan number to start the alignment.
- End at scan number: Until which scan number to end the alignment.
- Endpoint X: The alignment starts at (0,0), but also needs an endpoint beyond the last scan; this is its X value. For example, add 500 to 1000 to the end scan.
- Endpoint Y: The alignment starts at (0,0), but also needs an endpoint beyond the last scan; this is its Y value. For example, add 500 to 1000 to the end scan.
- Costs per breakpoint: Set the costs per breakpoint. This determines whether a breakpoint is kept or discarded. Lower this value to keep more breakpoints. The default is 0.5 and should generally give a good alignment.
- Fitness: The fitness is the added fitness for the curve passing through a point. Increasing this value also increases the possibility that an outlier is used for alignment. Generally the default value 1 should give a good alignment.
- After starting the alignment, you can refresh the status page to monitor the progress. This page can also be bookmarked!
- After a file is aligned, a link appears which shows you how good or bad the alignment is (see the good and bad alignment examples).
- After the alignment is finished a button for removing the temporary files appears. Please use this button to remove the temporary files and be redirected to the homepage again.
- Your aligned mzXML files will be located in the same folder as your original files.
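The parameter guidance above can be turned into a quick sanity check before starting an alignment. The sketch below is not part of msalign2. The function check_params and the example numbers are hypothetical, and the rule that the start scan must lie below the end scan is an assumption. The minimum of 100 features and the endpoints beyond the end scan come from the descriptions above.

```shell
#!/bin/bash
# Hypothetical helper:
# check_params FEATURES START_SCAN END_SCAN ENDPOINT_X ENDPOINT_Y
# Prints "ok" when the values follow the guidance, otherwise prints a
# message and returns a non-zero status.
check_params() {
    [ "$1" -ge 100 ] || { echo "need at least 100 features"; return 1; }
    # Assumption: the start scan lies before the end scan.
    [ "$2" -lt "$3" ] || { echo "start scan must be below end scan"; return 1; }
    [ "$4" -gt "$3" ] && [ "$5" -gt "$3" ] || {
        echo "endpoints must lie beyond the end scan"; return 1; }
    echo "ok"
}

# Derive both endpoints by adding a margin of 500 to the end scan.
END_SCAN=3000   # placeholder value
MARGIN=500      # the text suggests adding 500 to 1000
check_params 500 1 "$END_SCAN" $((END_SCAN + MARGIN)) $((END_SCAN + MARGIN))
```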
For general inquiries, contact magnus.palmblad@gmail.com. For build and installation questions, contact r.j.e.derks@lumc.nl.