Unveiling metabolic remodeling in mucopolysaccharidosis type III through integrative metabolomics and pathway analysis

Background Metabolomics represent a valuable tool to recover biological information using body fluids and may help to characterize pathophysiological mechanisms of the studied disease. This approach has not been widely used to explore inherited metabolic diseases. This study investigates mucopolysaccharidosis type III (MPS III). A thorough and holistic understanding of metabolic remodeling in MPS III may allow the development, improvement and personalization of patient care. Methods We applied both targeted and untargeted metabolomics to urine samples obtained from a French cohort of 49 patients, consisting of 13 MPS IIIA, 16 MPS IIIB, 13 MPS IIIC, and 7 MPS IIID, along with 66 controls. The analytical strategy is based on ultra-high-performance liquid chromatography combined with ion mobility and high-resolution mass spectrometry. Twenty-four amino acids have been assessed using tandem mass spectrometry combined with liquid chromatography. Multivariate data modeling has been used for discriminant metabolite selection. Pathway analysis has been performed to retrieve metabolic pathways impairments. Results Data analysis revealed distinct biochemical profiles. These metabolic patterns, particularly those related to the amino acid metabolisms, allowed the different studied groups to be distinguished. Pathway analysis unveiled major amino acid pathways impairments in MPS III mainly arginine–proline metabolism and urea cycle metabolism. Conclusion This represents one of the first metabolomics-based investigations of MPS III. These results may shed light on MPS III pathophysiology and could help to set more targeted studies to infer the biomarkers of the affected pathways, which is crucial for rare conditions such as MPS III. Electronic supplementary material The online version of this article (10.1186/s12967-018-1625-1) contains supplementary material, which is available to authorized users.

. MRM transitions for each amino acid and its corresponding internal standard. Table S2. Instrumental settings for UHPLC-IM-MS analysis. Table S3. The normalized concentrations of free amino acids in urine samples of the studied groups. Table S4. Data for Venn diagram of the significant pathways retrieved from untargeted, targeted approaches and in silico systems biology. Figure S1. Illustration of the analysis sequence.  Figure S9. Area under the receiver operating characteristic (ROC) curves, comparing diagnostic performance of the most significant quantified Arginine to differentiate the different MPS III subtypes and Control samples. Figure S10. Pathways of arginine metabolism and its connections to urea cycle.
The following protocol was used for preparation of urine samples. The urine were first diluted using 50/50 v/v using deionized water. An aliquot of 40 μL of the sample was added to 10 μL of 10% sulfosalicylic acid in order to precipitate proteins. After mixing and centrifugation (10 000 x g for 2 min) the supernatant was mixed with 40 μL of borate buffer. Next, an aliquot of 10 μL the obtained solution was labeled with aTRAQ reagent solution (aTRAQ reagent Δ8), mixed and centrifuged. After 30 min of incubation at room temperature the labeling reaction was stopped by addition of 5 μL 1.2% hydroxylamine solution and the sample was incubated at room temperature for 15 min. In the next step, 32 μL of the internal standard solution was added to the sample. After mixing and centrifugation the sample was evaporated in a vacuum concentrator for 15 min in order to reduce volume to about 20 μL. Then the residue was diluted with 20 μL of water. Each determined amino acid had its corresponding internal standard (the same amino acid labeled with the aTRAQ reagent Δ0). Two non-proteinogenic amino acids (norleucine and norvaline) were used to evaluate the labeling efficiency and recovery. A calibration curve was constructed using five concentrations derived from the peak area ratio of each amino acid and the internal standard. A calibration curve has been used as follows: V; source temperature, 600 °C; ion source gas 1, 60 psig and ion source gas 2, 50 psig. The mass spectrometer operated in positive ionization mode with the following parameters: entrance potential, 10 V; declustering potential, 30 V and collision cell exit potential, 5 V. Collision energy of 30 eV was applied. The list of measured MRM transitions is presented in Table S1. Scheduled multiple reaction monitoring mode was used with nitrogen as a collision gas. A system suitability test was conducted before each batch of the samples (analysis of a standard mixture) to warm up the LC-MS/MS system and check the inter-day performance of the system. Data acquisition and processing were performed using the Analyst 1.5 software (Sciex, Framingham, MA, USA). both solvents contained 0.1% formic acid. The duration of column equilibration was adjusted to provide sufficient retention and chromatographic precision of early eluting species in subsequent analyses at the minimal expense of time. Sample analysis order has been randomized to avoid potential for confounding critical variables with analytical run order effects. Peak splitting and column overload was avoided by using small injection volumes (2 μL) for LC-IM-MS analysis.

Mass spectrometry
The U-HPLC system was coupled to a hybrid quadrupole orthogonal time-of-flight (TOF) mass spectrometer (SYNAPT G2 HDMS, Waters MS Technologies, Manchester, UK). The mass spectrometer was operated in positive electrospray ionization mode. A mass range of m/z 50−1200 was used in both modes. The sample cone voltage, extraction cone voltage, source temperature, desolvation temperature, desolvation gas flow and cone gas flow were optimized and were as follows respectively: 25V, 5V, 120°C, 500°C, 400 L/h, 50 L/h. Leucine enkephalin was used as the lock mass [M+H] + at m/z 556.2771. Sodium formate solution was used for external instrument calibration.

Ion mobility
Synapt G2 HDMS (Waters MS Technologies, Manchester, UK) was used in our study for Ion Mobility. It is equipped with a traveling wave "Triwave TM " geometry in which the ion mobility cell (IMS T-wave) is placed between two traveling wave ion guides (trap T-wave and transfer T-wave). After ionization in the source and transfer through the quadrupole, the ions arrive at the first traveling-wave ion guide that acts as an ion trap, namely "trap TWIG". In this region, the ions are accumulated before being released in packets and accelerated using the trap-bias voltage to the second cell "IMS-TWIG" for mobility separation. In the IMS-TWIG a traveling wave is continuously applied at a given wave height and velocity to enhance separation through the mobility cell, which is filled with a gas. In this study, the IMS drift gas flow (nitrogen) and the wave velocity settings were assessed and optimized. The helium cell gas flow, wave height, Trap Bias and IMS wave delay were set at 180 mL/min, 40 V, 45 V and 450 µs respectively. The TOF analyzer was operated in the V resolution mode with an average mass resolution of m/Δm 20,000 (full-width at half-maximum S-7 definition). Data acquisition of an ion mobility experiment consisted of 200 bins. CCS values, obtained in nitrogen, were experimentally determined using singly charged Poly-DL-alanine oligomers as the TWIM calibrant species for ESI+. CCS values were derived according to previously reported procedures 1 . The ion mobility resolution was ∼40 Ω/ΔΩ (fwhm). The N2 CCS values reported were determined at the apex of the ion-mobility peak. Detailed instrument settings are presented in Table S-2.

Raw data preprocessing
All LC-IM-MS raw data files data processing, peak detection and peak matching across samples using retention time (tR) correction and chromatographic alignment along with drift time and CCS calculation were performed using Progenesis QI (Waters MS Technologies, Manchester, UK) to yield a data matrix containing retention times, accurate masses, CCS and peak intensities. The preprocessing step resulted in an X-matrix where tR, CCS and m/z values were concatenated into ''tR_m/z_CCS'' features (in columns) present in each sample (in rows) with corresponding peak areas.

Quality Control
Aliquoted 10 μL of each urine sample are mixed together to generate a pooled quality control sample (QCs). QCs and solvent blank samples (mobile phase) were injected sequentially inbetween the urine samples. In addition, a dilution series of QC samples (6%, 12.5%, 25%, 50% and 100% original concentration) are used to assess the quality of the extracted features. An analysis sequence is presented in Figure. S-1. Indeed, feature extraction algorithms including automatic peak detection, grouping, and integration often yield a data matrix containing analytical system noise such as mobile phase chemical contaminants signals. Depending on the software and the used parameters for feature extraction, such noise can represent a significant portion of the total number of detected features. Therefore, this may mislead further data analysis such as transformation, normalization and data modeling. Simple noise filtering strategies such as the minimum fraction filter may remove infrequently observed signals within the considered distinct sample classes. In this study, we used a filter strategy in which the features intensity must be correlated to the matrix concentration in a series of diluted QC samples in order to be included in further analysis. Beyond its role as a system noise marker, the dilution series filter is very useful to assess the informative quality of the extracted features. It ensures that the observed signal of a given feature and its relative concentration in the sample are positively correlated. This approach is used to identify feature groups that are not correlated to the gradient of concentration generated by the dilutions series and therefore should not be considered as reliable features. Thus, feature groups with correlation coefficient of less than 0.7 were removed from the dataset. Furthermore, datasets are refined by removal of feature groups that do not meet threshold of peak area measurement precision prior to data analysis. This approach uses RSD values derived from repeated measurements of a pooled QC sample. The threshold was set to RSD<25%. Thus enhancing the biological interpretation of metabolomics data. The system stability assessment has been done using the Principal Component Analysis and assessing the clustering of the QC samples. Figures are shown in supporting information (Figures S-2) presenting the PCA score plot derived from the metabolomics analysis of all the urine samples and QCs replicates. In the PCA score plot, each point corresponds to a different sample or QC. The tightness of the clustering reveals the similarity of the samples (QCs), and thus the system's stability. The QCs are tightly clustered indicating a good instrumental stability over the metabolomics analysis. S-9

Data analysis and modeling
Support vector regression normalization method was applied using the MetNormalizer R package 2 before any data analysis, to remove the unwanted intra-and inter-batch measurements analytical variations. The effect of this normalization step on the raw data is shown in Figures S-2, S-3 and S-4 (Supporting information). Then normalized data matrix has been log-transformed and paretoscaled. All data analyzes and modeling were done using SIMCA 14.0 (MKS DAS, Umeå, Sweden). First, hierarchical cluster analysis has been applied to the data set to get an overview of the clustering trends of samples with similar profiles of variable intensity. Furthermore, multivariate data analysis and modeling is performed using Principal Component Analysis (PCA) as an unsupervised method. PCA was first applied to get an overview of the data and identify potential severe outliers which are defined as observations whose scores mapped outside the Hotelling's T2 ellipse (confidence interval = 0.95) in a cross-validated seven-component model. The DmodX was used to detect moderate outliers 3 . Orthogonal Partial Least-Squares-Discriminant Analysis (OPLS-DA) is used as a supervised method. To select the most relevant features, the training group has been repeatedly split into a training set and a test set. A permutation test (999 iterations) was performed to prevent the OPLS-DA over fitting of the model by comparing diagnostic statistic metrics of the generated model with those of randomly generated models. R2X is the cumulative modeled variation in X (X = features), R2Y is the cumulative modeled variation in Y (Y = sample groups), and Q2Y is the cumulative predicted variation in Y, based on the cross-validation. The range of these parameters is between 0 and 1, where 1 indicates a perfect fit. Furthermore, crossvalidated analysis of variance (CV-ANOVA) was systematically performed based on the crossvalidated model 4 . The Figure S-2 shows a PCA score plot of the raw data compared to normalized, log-transformed and pareto-scaled data which highlights the importance of these data pretreatment steps before data modeling. The X matrix was 100 x 854 variables. The Y matrix was 100 analyses × 3 groups. Characteristics and validation results from all OPLS-DA models are provided in supplemental material.

Feature selection and annotation
To select the most discriminant variables for the separation of groups, S-Plot was used. The S-plot combines the covariances and correlations between the X matrix and OPLS scores for a given model component. The covariance values give the magnitude of contribution of a variable while the correlation values reflect the effect and reliability of the variable for the model component scores. Variables with both very high correlation and covariance are important for the explanation power of the model. Furthermore, selection of discriminant variables was achieved using the VIP score procedures for each validated OPLS-DA model 5 . Putative annotation of detected features was performed using both accurate mass comparison using freely available metabolite databases HMDB, LipidBlast, KEGG, and Metlin. Furthermore, CCS values were also compared to the MetCCS database 6 .

Pathway and network analysis
In order to provide a broader understanding of metabolic changes in MPS I, we also explored the biochemical pathways using a network analysis approach using Mummichog (v.1.0.5) which allows pathway enrichment analyses. The idea behind this metabolic network prediction strategy assumes that metabolite concentration alterations are more likely to occur within a metabolic connected network rather than in a random fashion. This Mummichog python package highlights pathways that are significantly impacted in the studied groups. Significantly impacted biochemical pathways are those exhibiting an adjusted p-value <0.05. For this comparison, we focused on features that significantly changed (511 features with q-values = 0.05 and FDR = 5%).

S-10
Mummichog annotates metabolites based on accurate mass m/z (5 ppm mass error was used) and tests significant pathway enrichment within a reference metabolic network using a Fisher's exact test 7 . The matched candidates were then mapped to reference human metabolic networks from the KEGG, MetaCyc, Recon and Edinburgh Human Metabolic Network. The null distribution in pathway analysis was obtained from 1000 set of randomly permutated m/z lists draw from all features detected in the whole metabolomic dataset and modeled by Gamma distribution. To protect against incorrect pathway selection, redundant pathways or those enriched by less than two metabolites were excluded. MetaboAnalyst 8 has been used for Metabolite Set Enrichment Analysis using the amino acid concentration matrix.   Figure S1. Illustration of the analysis sequence order. Ten μL of each urine sample are mixed together to generate a pooled quality control sample (QCs). QCs and solvent blank samples (mobile phase) were injected sequentially in-between the urine samples. In addition, a dilution series of QC samples (6%, 12.5%, 25%, 50% and 100% of the original concentration) are used to assess the quality of the extracted features. A conditioning step is used to condition the column using ten QC injections. Sample injection order has been orthogonalized.    Sadenosylmethionine. aKG: a-ketoglutarate. The metabolites that were measured in the present study are shown in red boxes. Elevated metabolites in MPSI urine are highlighted in orange. For a more convenient highlight of the metabolites, each MPS III subtype is depicted in a single figure hereafter.