CAC-2018 Abstracts

PL-01 | PL-02 | PL-03 | PL-04 | PL-05 |

KL-01 | KL-02 | KL-03 | KL-04 | KL-05 | KL-06 |

YL-01 | YL-02 | YL-03 | YL-04 | YL-05 |

OA-01 | OA-02 |

OB-01 | OB-02 | OB-03 | OB-04 | OB-05 | OB-06 |

OC-01 | OC-02 | OC-03 | OC-04 |

OD-01 | OD-02 | OD-03 | OD-04 | OD-05 |

OE-01 | OE-02 | OE-03 | OE-04 | OE-05 |

OF-01 | OF-02 | OF-03 | OF-04 |

OG-01 | OG-02 | OG-03 | OG-04 |

OH-01 | OH-02 | OH-03 | OH-04 |

OI-01 | OI-02 | OI-03 |

OJ-01 | OJ-02 |

OK-01 | OK-02 | OK-03 | OK-04 |

OL-01 | OL-02 | OL-03 | OL-04 | OL-05 |

P-01 | P-02 | P-03 | P-04 | P-05 | P-06 | P-07 | P-08 | P-09 | P-10 |

P-11 | P-12 | P-13 | P-14 | P-15 | P-16 | P-17 | P-18 | P-19 | P-20 |

P-21 | P-22 | P-23 | P-24 | P-25 | P-26 | P-27 | P-28 | P-29 | P-30 |

P-31 | P-32 | P-33 | P-34 | P-35 | P-36 | P-37 | P-38 | P-39 | P-40 |

P-41 | P-42 | P-43 | P-44 | P-45 | P-46 | P-47 | P-48 | P-49 | P-50 |

P-51 | P-52 | P-53 | P-54 | P-55 | P-56 | P-57 | P-58 | P-59 | P-60 |

P-61 | P-62 | P-63 | P-64 | P-65 | P-66 | P-67 |

Invited Presentations:



Steven D. Brown1

1Brown Laboratory, Department of Chemistry and Biochemistry, University of Delaware, Newark, DE 19716 USA

While the analysis of chemical data has been around since the days of Gosset, it wasn’t until 1969 that the area caught the attention of chemists. This talk is a retrospective on the development of the field and on how Kowalski’s unusual approach to research in general, and to data analysis in particular, resonated with the rapid changes in computation and in chemistry. I will highlight innovations that Kowalski was a part of, and how those helped shape the field. I’ll also discuss how the principles that underlie his approach might be useful in the rapidly developing area of data science.

Acknowledgement: Work supported by the United States National Science Foundation, Grant 1506853.




Federico Marini1

1Dept. of Chemistry, University of Rome La Sapienza, Rome, Italy

Partial least squares discriminant analysis (PLS-DA) [1-3] is nowadays one of the most widely used, if not the most commonly used, classification techniques, especially in fields where the need to build discriminant models for multi- and megavariate data is most stringent, such as metabolomics (and systems biology in general) and food science. Yet it is often applied acritically, and not infrequently as a black box, trusting whatever (commercial and other) software implements, as if there were a single, univocal definition of the technique. However, already in the two-class case, and even more so in the multi-class configuration, many possible implementations of this method have been proposed, each with its advantages and limitations, so that whenever one speaks of PLS-DA sans phrase, it is hard to understand which technique is actually being referred to.
Starting from these considerations, the present communication aims at critically sketching an overview of the different PLS-DA implementations for the two- and multi-class cases, highlighting their salient features by means of real and simulated examples.
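As one concrete point of reference, the arguably most common two-class variant regresses a dummy-coded class matrix on X by PLS and assigns new samples to the class with the largest predicted response. The sketch below is one illustrative implementation (the SVD-based component extraction, the dummy coding and the argmax decision rule are all specific choices among the many variants the abstract alludes to):

```python
import numpy as np

def plsda_fit(X, y, n_comp=2):
    """PLS2 regression on a dummy-coded class matrix (one PLS-DA variant)."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # dummy coding
    x_mean, y_mean = X.mean(0), Y.mean(0)
    Xc, Yc = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_comp):
        # weight vector = dominant left singular vector of the cross-covariance
        w = np.linalg.svd(Xc.T @ Yc, full_matrices=False)[0][:, 0]
        t = Xc @ w
        p, q = Xc.T @ t / (t @ t), Yc.T @ t / (t @ t)
        Xc, Yc = Xc - np.outer(t, p), Yc - np.outer(t, q)  # deflation
        W.append(w); P.append(p); Q.append(q)
    W, P, Q = map(np.column_stack, (W, P, Q))
    B = W @ np.linalg.inv(P.T @ W) @ Q.T  # regression coefficients
    return dict(B=B, x_mean=x_mean, y_mean=y_mean, classes=classes)

def plsda_predict(model, X):
    """Assign each sample to the class with the largest predicted response."""
    Yhat = (X - model["x_mean"]) @ model["B"] + model["y_mean"]
    return model["classes"][np.argmax(Yhat, axis=1)]
```

Already here several published variants hide in the details: ±1 versus 0/1 coding of Y, argmax versus a probabilistic threshold on the predicted response, and PLS1 per class versus a single PLS2 model.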

[1] M. Sjöström, S. Wold, B. Söderström, PLS discriminant plots. In: E.S. Gelsema, L.N. Kanal (Eds.), Pattern Recognition in Practice II, Elsevier, Amsterdam, The Netherlands, 1986, pp. 461-470.
[2] L. Ståhle, S. Wold, Partial least squares analysis with cross-validation for the two-class problem: a Monte Carlo study, J. Chemometr. 1 (1987) 185-196.
[3] M. Barker, W. Rayens, Partial least squares for discrimination, J. Chemometr. 17 (2003) 166-173.



Peter de Boves Harrington1

1Ohio University Center for Intelligent Chemical Instrumentation, Department of Chemistry and Biochemistry, Clippinger Laboratories, Athens, OH 45701-2979 USA.

Many advances in modern society are due to embedded artificial intelligence. Some of these innovations are voice recognition in phones and personal assistants, image recognition for identifying faces in photos, and digital transcription of handwriting. These advances rely on deep neural networks, some of which are efficiently built with layers of restricted Boltzmann machines (RBMs). RBMs have many potential applications in analytical chemistry for finding features in data. They may be used to transform data and improve the performance of classification and calibration methods that are applied to the RBM outputs. Because the goal of the RBM layer is to furnish a nonlinear transform, using a cascading network structure allows both linear and nonlinear features to be made available to subsequent chemometric methods such as partial least squares and support vector machines. Some examples will demonstrate the improvement for quantifying fat and moisture content using near-infrared spectroscopy [1]. The improvement by RBMs of the classification of spectra from botanical materials, such as Cannabis or teas, for authentication and chemotyping will also be demonstrated.
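As a sketch of the idea, a Bernoulli RBM trained with one-step contrastive divergence (CD-1) fits in a few lines of NumPy; the hidden-unit probabilities then serve as nonlinear features for a downstream method such as PLS or an SVM. This is a didactic toy, not the cited work’s implementation: layer size, learning rate and epoch count are generic choices, and real NIR data would first be scaled to [0, 1].

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rbm_train(V, n_hidden=8, lr=0.05, epochs=50, seed=1):
    """Bernoulli RBM trained with CD-1; V holds rows of [0, 1]-scaled data."""
    rng = np.random.default_rng(seed)
    n, p = V.shape
    W = 0.01 * rng.standard_normal((p, n_hidden))
    b, c = np.zeros(p), np.zeros(n_hidden)
    for _ in range(epochs):
        ph = sigmoid(V @ W + c)                # hidden probabilities (positive phase)
        h = (rng.random(ph.shape) < ph).astype(float)
        pv = sigmoid(h @ W.T + b)              # reconstructed visible units
        ph2 = sigmoid(pv @ W + c)              # negative phase
        W += lr * (V.T @ ph - pv.T @ ph2) / n  # CD-1 gradient step
        b += lr * (V - pv).mean(0)
        c += lr * (ph - ph2).mean(0)
    return W, b, c

def rbm_transform(V, W, b, c):
    """Nonlinear feature expansion: hidden-unit activation probabilities."""
    return sigmoid(V @ W + c)
```

The transformed matrix returned by `rbm_transform` would then replace (or be concatenated with) the raw spectra as input to the PLS or SVM model, which is the cascading structure described above.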

[1] P.B. Harrington, Feature Expansion by a Continuous Restricted Boltzmann Machine for Near-Infrared Spectrometric Calibration, Analytica Chimica Acta. (2018) 1010 20-28 DOI: 10.1016/j.aca.2018.01.026.




J-M. Roger1

1ITAP, Irstea Montpellier Centre, BP 5095 34196 Montpellier cedex 5, France.

Orthogonal projections are algebraic tools that make it possible to separate the contributions of complementary vector subspaces, e.g. the horizontal 2D space of a map and the vertical 1D space of altitudes. With respect to a matrix X, orthogonal projections can act on its row space (of dimension up to the number of variables) or its column space (of dimension up to the number of samples).

Classical statistics commonly uses orthogonal projection to split the variance into complementary parts. To do so, the projection is performed in the column space. This operation is found implicitly in most multivariate linear methods, such as PCA, PLS, LDA, etc., and explicitly in the NIPALS algorithm.

When the variables are (more or less) independent, orthogonal projection in the row space has no evident interest; this is why this option has not been studied in the framework of classical statistics. When the variables are highly dependent, as is the case in spectrometry, the column space is structured. The useful information lies in a small subspace (compared to the whole space), as does the detrimental information. Hence, it becomes useful to clearly separate the useful subspace from the detrimental one. Calibration intends to identify the useful subspace, spanned by so-called latent variables, and to focus on it. Conversely, preprocessing is used to remove the detrimental information.

This talk will review the main methods using orthogonal projections in chemometrics. A brief theoretical introduction will recall the fundamentals, on the basis of examples taken from NIR spectrometry. A second part will focus on the use of row-space orthogonal projections to improve calibration robustness. A third part will illustrate the use of column-space orthogonal projections for variable selection. A fourth part will be devoted to discrimination by means of orthogonal projections.
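For the row-space case (the basis of EPO-style robustness corrections), the operation reduces to multiplying each spectrum by a projector onto the complement of the detrimental subspace. A minimal sketch, assuming the detrimental directions D are already known (e.g. estimated from difference spectra):

```python
import numpy as np

rng = np.random.default_rng(0)
p = 50                                       # number of wavelengths
D = rng.standard_normal((p, 2))              # detrimental directions (assumed known)
P_orth = np.eye(p) - D @ np.linalg.pinv(D)   # projector onto the orthogonal complement
X = rng.standard_normal((20, p))             # raw spectra, one per row
X_corr = X @ P_orth                          # corrected spectra
```

After the projection the spectra contain no component along D, so a calibration built on `X_corr` is, by construction, insensitive to those detrimental variations.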




Rasmus Bro1, Anne Bech Risum1

1Univ. Copenhagen, Denmark

In machine learning, a number of interesting tools have appeared in the last few years. Many of the associated concepts, such as deep learning and big data, are quite hyped, but many of the new tools are also truly innovative and provide new opportunities.
Rather than simply applying the new methods to our old problems, we are currently exploring how they can help us solve the problems that chemometric methods struggle with.

Specifically, we will show how we can solve some fundamental problems in regard to analysis of GC-MS data and use machine learning to automate and improve the analysis of such data.




Biao Huang1

1Department of Chemical and Materials Engineering, University of Alberta

Modern process industries are awash with large amounts of data. Extracting information and discovering knowledge for control system design and optimization from day-to-day routine process operating data is challenging. Numerous issues, such as dynamics, nonlinearity, high dimensionality, collinearity and missing measurements, must be considered during the information extraction process. This presentation will start with a discussion of the state of the art in dynamic data analytics for image processing in process systems engineering applications. It then focuses on two examples of analytic techniques for interface detection: one uses advanced filtering to process raw image data directly to detect the water-oil interface; the other fuses image data with other sensors to obtain reliable interface detection.

The objective of this work is to develop a new approach to improve the reliability of camera sensors through advanced dynamic data analytics, such as the Kalman smoother and the Markov random field filter. An experimental setup was designed to simulate the liquid interface and evaluate the performance of the proposed technique. Given the interface level obtained from processing the raw image data, it is also of interest to fuse it with other available interface measurements, since image data are not always reliable owing to occasional low image quality or the interface moving out of the camera's monitored area. The presentation continues with a discussion of multivariate statistical methods for the successful fusion of real industrial image data for interface detection.




Mari van Reenen3, Johan A Westerhuis1,3, Carolus J Reinecke3, J Hendrik Venter2

1Biosystems Data Analysis, Swammerdam Institute for Life Sciences, University of Amsterdam, Science Park 904, 1098 XH Amsterdam, The Netherlands

2Centre for Business Mathematics and Informatics, Faculty of Natural Sciences, North-West University (Potchefstroom Campus), Private Bag X6001, Potchefstroom, South Africa.

3Centre for Human Metabolomics, Faculty of Natural Sciences, North-West University (Potchefstroom Campus), Private Bag X6001, Potchefstroom, South Africa

We introduce an approach using minimum classification error rates as test statistics to find discriminatory variables. The thresholds resulting in the minimum error rates can be used to classify new subjects. This approach transforms error rates into p-values and is referred to as ERp. ERp can handle unequal and small group sizes, as well as account for the cost of misclassification. In metabolomics studies, many values below the detection limit (indicated by zeros in the data table) are often observed. We extended ERp (to XERp) to address two sources of zero-valued observations: (i) zeros reflecting the complete absence of a metabolite from a sample (true zeros); and (ii) zeros reflecting a measurement below the detection limit. XERp is able to identify variables that discriminate between two groups by simultaneously extracting information from the difference in the proportion of zeros and from shifts in the distributions of the non-zero observations. To demonstrate the utility of XERp, it is applied to GC-MS data from a metabolomics study on tuberculous meningitis in infants and children. We find that XERp is able to provide an informative shortlist of discriminatory variables, while attaining satisfactory classification accuracy for new subjects in a leave-one-out cross-validation context.
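The core of ERp can be sketched in a few lines: for each variable, scan all candidate thresholds, keep the minimum achievable two-group error rate, and convert it to a p-value by permuting the group labels. This is an illustrative reimplementation only; the published method additionally handles misclassification costs, and XERp adds the zero-handling described above.

```python
import numpy as np

def min_error_rate(x, y):
    """Smallest classification error achievable by thresholding variable x,
    trying both decision directions (y holds 0/1 group labels)."""
    xs = np.sort(x)
    cuts = np.concatenate(([xs[0] - 1], (xs[:-1] + xs[1:]) / 2))
    best = 1.0
    for c in cuts:
        pred = (x > c).astype(int)
        # error of the rule and of its flipped version
        best = min(best, np.mean(pred != y), np.mean(pred == y))
    return best

def erp_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for the minimum error rate (ERp-style statistic)."""
    rng = np.random.default_rng(seed)
    obs = min_error_rate(x, y)
    null = np.array([min_error_rate(x, rng.permutation(y))
                     for _ in range(n_perm)])
    return obs, (1 + np.sum(null <= obs)) / (n_perm + 1)
```

The threshold achieving the observed minimum is exactly the cut-off that would be carried forward to classify new subjects.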



Least dependent components analysis: Miracle or mirage

Mohsen Kompany-Zareh1,2, Chelsi Wicks1, Peter Wentzell1 

1Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, PO Box 15000, Halifax, NS  B3H 4R2 Canada.
2Chemistry Department, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, 45137-66731 Iran.

Different independent components analysis (ICA) techniques such as FastICA [1], mean field ICA (ICAMF) [2], mutual information least dependent components analysis (MILCA) [3] and ICA Joint Approximate Diagonalization of Eigen-matrices (ICA-JADE) [4], as four commonly applied blind source separation (BSS) methods, are critically compared using different data sets and for different purposes. Whitened data, with uncorrelated and orthogonal variables, are the input for most of the presented algorithms. The main goal is the minimization of dependence among variables.

The distance of the source vectors from Gaussianity is the criterion of independence in some methods; it is quantified by the kurtosis (fourth-order auto-cumulant) and is used in FastICA. Fourth-order cross-cumulants are another higher-order-statistics measure of dependency, related to the mutual information among variables, and are used in ICA-JADE. Mutual information itself is an information-theoretic quantity estimated from the joint entropy of the variables and is used in MILCA.
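Both ingredients are cheap to illustrate: after whitening, FastICA searches for directions maximizing the magnitude of the excess kurtosis. A small sketch of the two steps (generic code, not any particular published implementation):

```python
import numpy as np

def whiten(X):
    """Whiten the mixed signals (rows = observations): the output scores
    have identity covariance, leaving ICA a pure rotation search."""
    Xc = X - X.mean(0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U * np.sqrt(X.shape[0] - 1)

def excess_kurtosis(t):
    """Fourth-order auto-cumulant: zero for Gaussian signals, the
    non-Gaussianity measure used by FastICA."""
    tc = (t - t.mean()) / t.std()
    return np.mean(tc ** 4) - 3.0
```

A uniform source gives a clearly negative value (about -1.2) and a spiky super-Gaussian source a positive one, while any Gaussian direction scores near zero, which is why purely Gaussian components cannot be separated by this criterion.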

In one part of this presentation, the ability of the methods to resolve data into the actual source vectors is examined. In the case of the acoustic signal mixture data, almost all methods successfully resolved the mixture data into the real source vectors. ICA-JADE properly resolved the square-wave data into the six actual source vectors, although the resolved source vectors for the square waves were not as good as those estimated for the sound waves. In the case of simulated spectrochemical data and experimental fluorescence data, the resolved source vectors are completely different from the actual source vectors, because the actual spectral vectors are not necessarily the most independent and non-Gaussian resolved vectors.

When the objective is unsupervised clustering of NMR data from salmon and NIR data from ink, the results from ICA-JADE and MILCA are broadly similar, showing that the criteria applied by the two methods for estimating the source vectors lead to comparable results. No meaningful clustering is obtained from the application of FastICA to any of the data sets.

Resolved least dependent vectors do not always correspond to the actual source vectors, and finding the least dependent variables does not always lead to meaningful clustering of the data. Results depend on the criteria, the data and the method applied. There is no miracle in searching for the least dependent variables; in some cases it is just a mirage! ICA methods are tools, just like PCA, MCR-ALS and projection pursuit, and we should determine under which conditions they are helpful.

[1] A. Hyvärinen, J. Karhunen, E. Oja, Independent Component Analysis, Wiley, 2001.
[2] M. Jalali-Heravi, H. Parastar, H. Sereshti, J Chromatogr A 1217, 29 (2010) 4850-4861.
[3] H. Strogbauer, A. Kraskov, S. A. Astakhov, P. Grassberger, Phys Rev E 70 (2004) 066123.
[4] D.N. Rutledge, D. J.-R. Bouveresse, Trends in Anal Chem 50 (2013) 22-3.




Anna de Juan1

1Dept. of Chemical Engineering and Analytical Chemistry. Universitat de Barcelona. Diagonal, 645. 08028 Barcelona.

Data fusion encompasses all strategies oriented to understand the complexity of scientific problems by means of combining and interpreting diverse information inputs. Traditional data structures to deal with fusion included regular multisets (of row- and/or column-wise augmented data matrices) or regular multi-way structures. New scenarios call for structures and related algorithms able to work with missing blocks or to join information from two- and three-way data structures [1,2].

Likewise, fusion levels were traditionally classified as low (concatenating raw data structures), mid (concatenating compressed representations of the information) or high (combining the outputs of separate analyses on the different information blocks). This scheme ends up being too simple, and hybrid multilevel fusion strategies are often required.
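The low/mid distinction is easy to make concrete: low-level fusion concatenates the (block-scaled) raw matrices, while mid-level fusion concatenates compressed representations such as PCA scores. A minimal sketch with two hypothetical blocks measured on the same samples (block sizes, scaling and the number of components are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
X1 = rng.standard_normal((30, 100))   # block 1, e.g. spectra of 30 samples
X2 = rng.standard_normal((30, 40))    # block 2, e.g. MS data of the same samples

def autoscale(X):
    return (X - X.mean(0)) / X.std(0)

def pca_scores(X, k):
    Xc = X - X.mean(0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :k] * s[:k]

# Low-level: concatenate raw (scaled) blocks; the 1/sqrt(n_vars) block weight
# keeps a large block from dominating the joint model.
low = np.hstack([autoscale(X1) / np.sqrt(X1.shape[1]),
                 autoscale(X2) / np.sqrt(X2.shape[1])])

# Mid-level: concatenate compressed representations of each block.
mid = np.hstack([pca_scores(X1, 3), pca_scores(X2, 3)])
```

High-level fusion would instead combine the outputs (e.g. class predictions) of models built separately on X1 and X2; the hybrid multilevel strategies mentioned above mix these three schemes within one analysis.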

Real examples related to hyperspectral image analysis, process control and –omics studies will serve to illustrate the wide span of data structures and multilevel strategies that can be used nowadays to handle efficiently different kinds of information [1-3].


[1] M. Alier, R. Tauler, Chemom. Intell. Lab. Syst., 2013, 127, 17-28.

[2] A. de Juan, J. Chemom., (2018) doi:10.1002/cem.2985.

[3] V. Olmos, M. Marro, P. Loza-Alvarez, D. Raldúa, E. Prats, F. Padrós, B. Piña, R. Tauler, A. de Juan, J. Biophotonics, (2017) doi:10.1002/jbio.201700089.

The EU, through projects CHEMAGEB and ProPAT, and the Spanish government, through project CTQ2015-66254-C2-2-P, are acknowledged for financial support. Former and current researchers of UB, IDAEA-CSIC and ICFO are thanked for insightful scientific exchange.




Marcelo M. Sena1,2

1Chemistry Department, ICEx, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, MG, Brazil
2Instituto Nacional de Ciência e Tecnologia em Bioanalítica, 13083-970, Campinas, SP, Brazil

In recent years, ambient ionization sources have been applied to improve the ability of MS to perform fast and sensitive analysis of complex matrices. Together with spectrometer miniaturization, this is an important trend allowing the use of MS to develop direct and increasingly simple analytical methods. Some MS methods can even be comparable to methods based on vibrational spectroscopy in terms of simplicity. Seeking to eliminate the laborious and time-consuming sample pretreatment steps traditionally used in MS, paper spray ionization mass spectrometry (PS-MS) was proposed by the research group of G. Cooks in 2009 [1]. Liquid samples or extracts from solid samples are applied to a triangular piece of chromatography paper, followed by a very small volume (a few µL) of a suitable solvent. The paper is positioned in front of the MS inlet with the aid of a mobile platform, and the substrate is held by a metallic clip connected to the potential source of the instrument.

In this lecture, some qualitative and quantitative multivariate methods developed in our research group for the analysis of beverage and/or forensic samples will be presented. Qualitative applications involve screening methods for detecting counterfeit samples of beers [2], Scotch whiskies [3] and perfumes [4]. In the first application, 8 different brands of lager beers produced by 4 different breweries were discriminated according to their market prices. The 3 leading brands in the Brazilian market, which have been targets of fraud, were modeled as the higher-price class, while the 5 brands most used for counterfeiting according to police reports were modeled as the lower-price class. A PLS-DA model using full-scan MS was improved by variable selection with ordered predictors selection (OPS) [5], providing a 100% reliability rate. This model was interpreted by detecting the 15 variables with the most significant VIP scores, which were considered diagnostic ions for this type of beer counterfeiting. In another qualitative application, discriminant analysis (PLS-DA) and one-class modeling (SIMCA) were compared for differentiating authentic and counterfeit perfumes seized by the police. PLS-DA provided slightly better results, and the interpretation of its informative vectors allowed the detection of diagnostic ions for both sample classes, including compounds with allergenic properties related to counterfeit samples. Quantitative PLS models were also developed for quantifying blends of American and Korean ginsengs. Finally, data fusion PLS models for quantifying blends of Robusta and Arabica coffees were developed. ATR-FTIR, PS-MS and X-ray fluorescence spectra were merged, and models with good prediction ability were obtained with the aid of discrete variable selection methods (OPS and GA). The interpretation of these models revealed correlations between molecular and atomic compositions, characterizing the two coffee species.

[1] H. Wang, J. Liu, R.G. Cooks, Z. Ouyang, Angew. Chem. Int. Ed., 48, 1-5 (2009).
[2] H.V. Pereira et al., Anal. Chim. Acta, 940, 104–112 (2016).
[3] J.A.R. Teodoro et al., Food Chem., 237, 1058-1064 (2017).
[4] J.A.R. Teodoro et al., Anal. Methods, 9, 4979-4987 (2017).
[5] R.F. Teófilo, J.P.A. Martins, M.M.C. Ferreira, J. Chemom., 23, 32–48 (2009).

Acknowledgement: The Chemistry Department/ICEx – UFMG and INCT-Bio are thanked for supporting my participation in CAC-2018, and CNPq, CAPES and FAPEMIG for financial support.




Joaquim Jaumot, Meritxell Navarro-Reig, Miriam Pérez-Cova, Romà Tauler

Institute of Environmental Assessment and Water Research – IDAEA, Spanish Council for Scientific Research – CSIC, Jordi Girona 18-26, Barcelona E08034, Spain

The complexity of omics research has encouraged the development of new technologies able to deal with these challenging samples. In particular, the rise of new high-throughput techniques based on mass spectrometry should be highlighted. These techniques provide massive amounts of data that allow the extraction of both qualitative and quantitative information. The omics sciences benefit from these new technologies, as new biological insights can be retrieved from the vast amount of experimentally generated data. However, these large datasets require the application of chemometric data analysis tools to extract this hidden knowledge.

In this work, a chemometric workflow for the analysis of MS-based data will be introduced taking as an example two recently popularized techniques: comprehensive LCxLC-MS and MS imaging. The first step of this chemometric analysis is the raw data compression and arrangement. Special attention will be given to the selection of the most relevant mass traces using a region of interest (ROI) approach. Next, the resolution of both spectral and chromatographic or spatial contributions is performed using the MCR-ALS method. The last step consists of the exploration and/or classification of samples and detection of potential biomarkers using standard chemometric methods such as PCA, ASCA and PLS-DA. 
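The resolution step rests on the bilinear model D ≈ C Sᵀ, solved by alternating least squares under constraints. A stripped-down sketch is shown below; non-negativity is imposed here by simple clipping, a crude stand-in for the proper non-negative least-squares step used in real MCR-ALS implementations, and initialization would in practice come from purest-variable or EFA estimates rather than be supplied directly.

```python
import numpy as np

def mcr_als(D, C0, n_iter=100):
    """Alternate least-squares updates of concentration profiles C and
    spectra S so that D ≈ C @ S.T, clipping negatives as a rough constraint."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)
    return C, S
```

In the LCxLC-MS and imaging cases discussed in the abstract, the rows of D are the ROI-compressed mass traces at each elution time or pixel, so the resolved C holds chromatographic or spatial contributions and S the component spectra.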

The potential of this data analysis workflow will be shown using examples from the metabolomics field demonstrating the exceptional power of the combination of MS-based techniques and chemometric tools [1,2].

[1] M. Navarro-Reig, J. Jaumot, A. Baglai, G. Vivó-Truyols, P.J. Schoenmakers, R. Tauler, Anal. Chem., 89, 7675-7683 (2017)
[2] J. Jaumot, R. Tauler, Analyst, 140, 837-846 (2015)

Acknowledgements: We acknowledge support from both the Ministry of Economy, Industry and Competitiveness (Grant CTQ2017-82598-P) and the Catalan government (2017SGR753).




Roma Tauler
Institute of Environmental Assessment and Water Research (IDAEA-CSIC), Barcelona, 08034, Spain

Multivariate Curve Resolution (MCR) methods are applied to estimate the concentration and spectral profiles of the components present in unknown chemical mixture systems. MCR methods are based on the fulfillment of a bilinear model (a generalization of Beer’s law), which in general has rotation ambiguities: a range of feasible solutions can explain the experimental data equally well under the application of a set of constraints derived from the properties and previous knowledge of the system. Different methods have been proposed to characterize these rotation ambiguities and to estimate the area of feasible solutions (AFS). Among them, the MCR-BANDS method1,2 has been proposed to evaluate the extent of the rotation ambiguity associated with a particular MCR solution, calculating the maximum and minimum of an optimization function based on the relative contribution of every component of the system. In this presentation, the effect of the application of constraints on the evaluation of the extent of rotation ambiguity in MCR solutions is shown. Systems of different complexity and with different numbers of components are investigated by the MCR-BANDS method under different constraints, and their solutions are projected onto the AFS3. The effect of rotation ambiguities and constraints on the precision of MCR quantitative estimations is investigated.
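The rotation ambiguity itself is a two-line computation: if D = C Sᵀ, then for any invertible T the pair (C T, S T⁻ᵀ) reproduces D exactly. A numeric illustration (the matrix T below is an arbitrary example; MCR-BANDS explores which such transforms remain feasible once constraints such as non-negativity are imposed):

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.abs(rng.standard_normal((40, 2)))   # non-negative concentration profiles
S = np.abs(rng.standard_normal((60, 2)))   # non-negative spectra
D = C @ S.T                                # bilinear model (generalized Beer's law)

T = np.array([[1.0, 0.3],
              [0.2, 1.0]])                 # an arbitrary invertible transform
C2 = C @ T
S2 = S @ np.linalg.inv(T).T
# (C2, S2) fits D exactly as well as (C, S): the rotation ambiguity.
```

Constraints shrink the set of admissible T (here, S2 may pick up negative entries and thus be excluded by non-negativity), which is precisely how they reduce the extent of the ambiguity quantified by MCR-BANDS.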

1. R. Tauler, J. of Chemom., 2001, 15, 627-646
2. J. Jaumot and R. Tauler, Chemom. Intell. Lab. Syst., 2010, 103, 96-107
3. X. Zhang and R. Tauler, Chemom. Intell. Lab. Syst., 2015, 147, 47-57

Acknowledgement: grant CTQ2015-66254-C2-1-P, Ministerio de Economía y Competitividad, Spain




Hongmei Lu, Zhimin Zhang1,2, Hongchao Ji2, Pan Ma3

College of Chemistry and Chemical Engineering, Central South University, Changsha 410083, PR China

Untargeted metabolomics is rapidly becoming an important tool for studying complex biological samples. Modern analytical technologies afford comprehensive and quantitative investigation of a multitude of different metabolites, so typical metabolomics experiments produce large amounts of data. Handling such complex datasets is an important step that has a large impact on the extent and quality with which metabolites can be identified and quantified, and thus on the ultimate biological interpretation of the results. We present the key steps of metabolomic data acquisition and processing, focusing on our group’s work on this topic, particularly on methods for handling data from chromatography-mass spectrometry experiments.

Acknowledgement: The authors gratefully thank the National Natural Science Foundation of China for support of the projects (Grant Nos. 21375151, 21305163 and 21675174).




Philip K. Hopke,1,2 Mauro Masiol,1 Stefania Squizzato,1 Meng-Dawn Cheng3

1Department of Public Health Sciences, University of Rochester Medical Center, Rochester, NY 14642 USA
2Center for Air Resources Engineering and Science, Clarkson University, Potsdam, NY 13699
3Environmental Sciences Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831 USA

Conditional probability functions are used to identify source locations in air pollution studies. These methods provide conditional probabilities associated with high concentrations as a function of wind direction and speed (CBPF) or geographical location (PSCF). CBPF (conditional bivariate probability function) categorizes the probability of high concentrations being observed at a location by wind direction/speed. PSCF (potential source contribution function), a trajectory-ensemble method, allows the identification of the source regions most likely to be associated with high measured concentrations. The differences between conditional probability values for temporally different data sets highlight changes in emission rates or locations. This study presents and tests the application of differential probability functions to a 12-year set of air quality data (2005-2016) from Rochester, NY. Probability functions were computed over 4 periods that represent known changes in emissions that occurred at the local, state, and/or continental scale. Correlation analyses were also performed to identify pollutants undergoing similar changes in local and regional sources. Changes in local air pollutant concentrations were related to the shutdown of a large coal power plant (SO2) and to abatement measures applied to road and off-road traffic (primary pollutants). Concurrent effects of these changes in local emissions are also linked to decreases in nucleation-mode particle concentrations. Changes in regional source areas are related to the decrease in secondary aerosol and PM2.5 mass concentrations. The changes in the source areas of secondary sulfate, nitrate, and organic particulate matter differ from one another, while changes in the source areas of black carbon were highly correlated with those of PM2.5 mass.
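In essence, CBPF is a conditional probability computed per wind-direction/wind-speed cell. A compact sketch (the bin counts, speed range and threshold below are arbitrary illustration choices, not the study's settings):

```python
import numpy as np

def cbpf(conc, wd, ws, threshold, dir_bins=8, ws_bins=3, ws_max=10.0):
    """P(concentration > threshold) for each wind-direction / wind-speed cell.
    conc, wd (degrees) and ws are aligned 1-D arrays of observations."""
    d_idx = ((wd % 360) // (360.0 / dir_bins)).astype(int)
    s_idx = np.clip((ws / ws_max * ws_bins).astype(int), 0, ws_bins - 1)
    prob = np.full((dir_bins, ws_bins), np.nan)
    for i in range(dir_bins):
        for j in range(ws_bins):
            cell = (d_idx == i) & (s_idx == j)
            if cell.any():
                prob[i, j] = np.mean(conc[cell] > threshold)
    return prob
```

PSCF follows the same conditional-probability logic but bins back-trajectory endpoints on a geographic grid instead of local wind sectors; differencing the probability fields of two periods then highlights the emission changes discussed above.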

Acknowledgement: This work was supported by the New York State Energy Research and Development Authority (NYSERDA) under agreement #59802.




Beata Walczak, Zuzanna Małyjurek

Institute of Chemistry, University of Silesia, Szkolna 9, 40-006 Katowice, Poland

In many studies, particularly in metabolomics, the overall concentration of the studied samples is unknown. This influences the analysis of instrumental signals (e.g., NMR, LC-MS or GC-MS), which cannot be directly compared due to the ‘size effect’ caused by differences in the overall concentration of the samples studied. At the pre-processing step of data analysis, it is necessary to estimate and remove this effect through signal normalization, which is usually performed after signal de-noising, background removal and warping, and before scaling/transformation. Many different normalization methods are available for this purpose.

Recently, a new approach based on Compositional Data Analysis (CODA) has become popular [1]. It is based on the attractive idea of working with log ratios, thus eliminating the normalization step. CODA was tested in our previous simulation study and its performance was compared with Total Sum Normalization, Probabilistic Quotient Normalization and Pair-wise Log Ratios, which showed that the clr transformation should not be applied to identify biomarkers [2]. These conclusions do not coincide with those presented in [1]. As stated in [1], the discrepancy was probably caused by the limited number of variables considered in our study. We therefore undertook a new simulation study, working with data sets containing much larger numbers of variables. An additional motivation appeared when we realized that there was another question to be answered: it was observed that the Pair-wise Log Ratio (plr) method performs well [2], whereas CODA, which is based on a similar concept, does not. The present study aims to explain this contradiction.
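For reference, two of the classical normalizations compared in [2] and the clr transform from CODA can each be written in a line or two (plr, which works on all pairwise log ratios, is omitted for brevity):

```python
import numpy as np

def tsn(X):
    """Total Sum Normalization: divide each signal by its total intensity."""
    return X / X.sum(1, keepdims=True)

def pqn(X):
    """Probabilistic Quotient Normalization: after TSN, divide each signal by
    its median quotient against the median reference spectrum."""
    Xs = tsn(X)
    ref = np.median(Xs, 0)
    return Xs / np.median(Xs / ref, 1, keepdims=True)

def clr(X):
    """Centered log-ratio transform (CODA); requires strictly positive data."""
    L = np.log(X)
    return L - L.mean(1, keepdims=True)
```

All three remove a purely multiplicative per-sample ‘size effect’ exactly; the debate in the abstract concerns what each of them does to the variable-wise information subsequently used for biomarker identification.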

[1] A. Gardlo, A.K. Smilde, K. Hron, M. Hrda, R. Karlkova, D. Friedecky, T. Adam, Normalization techniques for PARAFAC modeling of urine metabolomic data, Metabolomics 12 (2016) 117.
[2] P. Filzmoser, B. Walczak, What can go wrong at the data normalization step for identification of biomarkers?, J. Chromatogr. A 1362 (2014) 194-205.

Acknowledgement: The authors acknowledge the financial support of project PL-RPA/ROOIBOS/05/2016, accomplished within the framework of the bilateral agreement co-financed by the National Research Foundation (NRF), South Africa, and the National Centre for Research and Development (NCBR), Poland.




Hai-Long Wu and Ru-Qin Yu

Institute of Chemometrics & Chemical Sensing Technology, State Key Laboratory of Chemo/Biosensing and Chemometrics, College of Chemistry and Chemical Engineering, Hunan University, Changsha 410082, China

Multiway calibration methods having “second-order or extended advantages” are gaining more and more attention in the analytical sciences. Combined with advanced analytical instruments capable of generating multidimensional arrays, e.g. HPLC-DAD, EEMs and LC-MS, they constitute a smart and green quantitative analysis strategy based on “mathematical separation” for complicated chemical systems. They enable direct and rapid quantification of the component(s) of analytical interest even in the presence of unknown interferences not included in the calibration samples, and make the final goal of analytical chemistry achievable without the aid of complicated separation procedures. These multiway calibration methods can be applied to different problems of qualitative and quantitative analysis in complicated chemical systems when combined with advanced analytical instruments.
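The second-order advantage rests on the trilinearity of such arrays: a stack of, say, EEM measurements follows X_k = A diag(c_k) Bᵀ, a decomposition that (unlike the bilinear case) is essentially unique. Constructing such an array is a one-liner; the dimensions and profiles below are synthetic placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, F = 20, 15, 6, 3                 # excitation x emission x samples, F components
A = np.abs(rng.standard_normal((I, F)))   # excitation profiles
B = np.abs(rng.standard_normal((J, F)))   # emission profiles
C = np.abs(rng.standard_normal((K, F)))   # relative concentrations per sample

# Trilinear (PARAFAC-type) model: X[i, j, k] = sum_f A[i, f] * B[j, f] * C[k, f]
X = np.einsum('if,jf,kf->ijk', A, B, C)
```

Because A and B are shared by all slabs, an alternating multilinear decomposition can recover the analyte’s contribution even when some slabs contain uncalibrated interferents, which is exactly the “second-order advantage” described above.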

The performances of these algorithms have been evaluated using simulated and real experimental datasets. They have been utilized for the simultaneous or direct determination of multiple components of analytical interest in fields such as the pharmaceutical, biomedical, environmental and food sciences. The main points for the successful application of these chemometric methods combined with advanced analytical instruments are also summarized. Some recent developments in the theory and applications of multiway calibration methodologies, including second- and third-order calibration, are also reviewed in detail.

Keywords: Higher-order calibration; Mathematical separation; Alternating multilinear decomposition; Multilinear model; Second-order advantage; Direct quantitative analysis; Complex system; Chemometrics

Acknowledgement: The authors gratefully acknowledge the National Natural Science Foundation of China (Grant Nos. 21575039 and 21775039 ) for financial support.


Oral Presentations:



Cliff Spiegelman1, Mary Frances Dorn2, Amit Moscovich3, Boaz Nadler4

1Texas A&M University
2Los Alamos National Laboratory
3Princeton University
4Weizmann Institute of Science
We propose a new semi-parametric approach to binary classification that exploits the modeling flexibility of sparse graphical models. Specifically, we assume that each class can be represented by a forest-structured graphical model. A key observation is that, under this assumption, the optimal classifier is linear in the logarithm of the one- and two-dimensional marginal densities. Our proposed procedure non-parametrically estimates the univariate and bivariate marginal densities, maps each sample to the logarithm of these estimated densities and constructs a linear SVM in the transformed space. We prove that when the forest-structure assumption holds, the risk of the resulting classifier converges to the optimal Bayes risk. Experiments with simulated and real data indicate that the resulting classifier is competitive with several popular methods across a range of applications.
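As an illustration, the pipeline described above can be sketched with off-the-shelf tools. This is a minimal sketch on synthetic Gaussian data: the forest edge set `pairs` and all variable names are hypothetical (the real procedure learns the forest structure from data), and class-conditional kernel density estimates stand in for the non-parametric marginal estimators.

```python
import numpy as np
from scipy.stats import gaussian_kde
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
# toy two-class data, 3 features
X0 = rng.normal(0.0, 1.0, size=(200, 3))
X1 = rng.normal(1.0, 1.0, size=(200, 3))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 200)

pairs = [(0, 1), (1, 2)]  # edges of an assumed forest structure

def log_marginals(train_by_class, X_eval):
    """Map samples to log 1-D and 2-D KDE marginal densities, per class."""
    feats = []
    for Xc in train_by_class:
        for j in range(X_eval.shape[1]):
            kde = gaussian_kde(Xc[:, j])
            feats.append(np.log(kde(X_eval[:, j]) + 1e-12))
        for a, b in pairs:
            kde = gaussian_kde(Xc[:, [a, b]].T)
            feats.append(np.log(kde(X_eval[:, [a, b]].T) + 1e-12))
    return np.column_stack(feats)

Z = log_marginals([X0, X1], X)
clf = LinearSVC(dual=False).fit(Z, y)   # linear SVM in log-density space
acc = clf.score(Z, y)
```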

Acknowledgement: This work was supported by a grant from the TAMU/Weizmann grant program started by Tina and Paul Gardner.




Marian Kraus1, Florian Gebert1, Carsten Pargmann1, Arne Walter1, Frank Duschek1

1German Aerospace Center, Institute of Technical Physics, Langer Grund, Lampoldshausen, 74239 Hardthausen, Germany

Chemical contamination of objects and areas, caused by accident or on purpose, is a common scenario. The immediate countermeasures depend on the class of risk and consequently on the characteristics of the substances. Standoff laser-based detection techniques provide this information without direct human contact with the hazardous materials. This presentation explains the data acquisition and classification procedure for laser-induced fluorescence spectra of several chemical agents. The substances are excited from a distance of 3.5 m by laser pulses at two UV wavelengths (266 and 355 nm) with less than 0.1 mJ and a repetition rate of 100 Hz. Each pair of simultaneously emitted laser pulses is temporally separated by an optical delay line [1]. Every measurement consists of a dataset of 100 spectra per wavelength containing the signal intensities in the spectral range between approximately 250 nm and 680 nm, recorded by a 32-channel photomultiplier tube array. Based on these datasets, three classification algorithms are trained which can distinguish the samples by their single spectra with an accuracy of over 95%. These predictive models, generated with decision trees, support vector machines and neural networks, can identify all agents (e.g. benzaldehyde, isoproturon and piperine) within the current set of substances [2].
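As a rough illustration of the classification step (not the authors' pipeline), the three model families named above can be trained on synthetic fluorescence-like spectra. The 32 channels and Gaussian emission bands below are made-up stand-ins for the real PMT-array data.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
channels = np.linspace(250, 680, 32)       # emission wavelengths, nm
centers = [350, 450, 550]                  # one band per mock "substance"

def spectra(center, n):
    """Noisy Gaussian emission band sampled on the 32 channels."""
    base = np.exp(-((channels - center) / 60.0) ** 2)
    return base + rng.normal(0, 0.05, size=(n, 32))

X = np.vstack([spectra(c, 100) for c in centers])
y = np.repeat([0, 1, 2], 100)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0, stratify=y)

models = {
    "tree": DecisionTreeClassifier(random_state=0),
    "svm": SVC(),
    "nn": MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                        random_state=0),
}
scores = {name: m.fit(Xtr, ytr).score(Xte, yte) for name, m in models.items()}
```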

Keywords: standoff identification of chemical agents, classification algorithms, laser induced fluorescence

[1] F. Gebert, L. Fellner, K. Grünewald, M. Kraus, C. Pargmann, A. Walter, F. Duschek, Standoff detection and classification of chemical and biological hazardous substances combining temporal and spectral laser induced fluorescence techniques, First Scientific International Conference on CBRNe (2017)
[2] M. Kraus, L. Fellner, F. Gebert, K. Grünewald, C. Pargmann, A. Walter, F. Duschek, Comparison of Classification Models for Spectral Data of Laser Induced Fluorescence, First Scientific International Conference on CBRNe (2017)




Jone Omar, Boleslaw Slowikowski, Ana Boix

European Commission, Directorate-General Joint Research Centre, Retieseweg 111, 2440 Geel, Belgium

New psychoactive substances (NPS), commonly seized by police, customs and other legal authorities, are not easily identifiable products; there is therefore a need to develop a fast and cheap analytical method that makes this routine straightforward and safe. In this work, three families of NPS (fentanyls, synthetic cannabinoids and cathinones) have been studied by means of Raman spectroscopy. The main idea is to identify these commonly intercepted NPS in order to create a fast identification model for customs laboratories. For this purpose, four different instruments, both benchtop and handheld, have been employed.

Two laser wavelengths (785 and 1064 nm) have been compared on a benchtop instrument in order to develop a model able to distinguish among the three main NPS families. The significant benefit of using the 1064 nm wavelength has been highlighted, and the developed model has been transferred to a handheld instrument, which provides customs laboratories with the benefit of a fast, non-destructive and clean measurement method. On the other hand, when a spectrum cannot be found in the library of a handheld device (which is often the case due to the fast development of NPS), this screening method still allows identifying whether the suspicious NPS is a fentanyl, a cathinone or a synthetic cannabinoid. This is of major relevance considering the health risk that some of these NPS pose.




Felipe Bachion de Santana1, Waldomiro Borges Neto2, Ronei J. Poppi1

1 Laboratory of Chemometrics in Analytical Chemistry, Institute of Chemistry, University of Campinas, 13084-971 Campinas, SP, Brazil
2 Laboratory of Chemometrics of Triângulo, Institute of Chemistry, Federal University of Uberlândia, 38408-100, Brazil.

Qualitative analysis is widely used in industry and in routine laboratories; however, many of the traditional methods for the detection of food adulteration and for quality control require elaborate sample preparation steps or an elevated number of adulterated samples in the training set. Spectroscopic techniques combined with one-class modeling can be used for this kind of problem, where fast and automatic responses are required, and present many applications in food authentication.

In this study we have investigated the use of a one-class random forest method, combining the random forest algorithm with the artificial generation of outliers. The outliers are generated in a uniform hyperspherical distribution of radius R and center c around the target class, using a d-dimensional Gaussian distribution [1,2].
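The Tax and Duin outlier generator [1] referenced above can be sketched in a few lines: directions are drawn from a d-dimensional Gaussian and radii from U^(1/d), which yields points uniformly distributed inside a hypersphere of radius R around center c. This is a minimal sketch; the target-class data and the random forest training on targets vs. generated outliers are omitted.

```python
import numpy as np

def uniform_ball(n, center, R, rng):
    """n points uniform inside a d-dimensional ball of radius R around center."""
    d = len(center)
    g = rng.normal(size=(n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)   # uniform directions
    r = R * rng.uniform(size=(n, 1)) ** (1.0 / d)   # radii uniform in d-D volume
    return center + r * g

rng = np.random.default_rng(0)
c = np.zeros(5)
out = uniform_ball(10000, c, R=2.0, rng=rng)
dist = np.linalg.norm(out - c, axis=1)
```

The generated points, labelled as the outlier class, together with the target-class samples then form a two-class training set for the random forest.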

The methodology was applied to verify the authenticity of evening primrose oil (EPO) based on FTIR-HATR and of ground nutmeg (GN) based on near infrared (NIR) spectroscopy. The algorithm showed superior performance relative to the partial least squares discriminant analysis (PLS-DA) method, with sensitivity and specificity values equal to 1.0 and 0.9988 for EPO and 0.9286 and 1.0 for GN, respectively. The model required no sample exclusion in the external validation set and was developed without any information about the adulterants in the training set. Therefore, the developed methodology can be employed in routine laboratories, regulatory agencies and industry for the quality analysis of EPO and GN.

[1] D.M.J. Tax, R.P.W. Duin, Uniform Object Generation for Optimizing One-class Classifiers, J. Mach. Learn. Res. 2, 155–173 (2001).
[2] L. Breiman, Random Forests, Mach. Learn., 45. 1, 5-32 (2001).

Acknowledgement: The authors thank CAPES for financial support.



Liangxiao Zhang1,3,5, Zhe Yuan1,2, Xinjing Dou1,2, Peiwu Li1,3,4,5

1Oil Crops Research Institute, Chinese Academy of Agricultural Sciences, Wuhan 430062, China
2Key Laboratory of Biology and Genetic Improvement of Oil Crops, Ministry of Agriculture, Wuhan 430062, China
3Laboratory of Quality and Safety Risk Assessment for Oilseed Products (Wuhan), Ministry of Agriculture, Wuhan 430062, China
4Key Laboratory of Detection for Mycotoxins, Ministry of Agriculture, Wuhan 430062, China
5Quality Inspection and Test Center for Oilseed Products, Ministry of Agriculture, Wuhan 430062, China

Edible vegetable oil is an important part of the daily diet with beneficial effects on human health, because it is a source of healthy unsaturated fatty acids and hundreds of micronutrients, including phenolic compounds, vitamin E and carotenes. However, according to the Database of Food Ingredient Fraud and Economically Motivated Adulteration, edible oils are reported as the most common target for adulteration in scholarly journals [1], making their authentication an important issue. Traditionally, chemometrics plays an important role in classifying authentic and adulterated edible oils by using two-class or multi-class classification methods [2]. Since the adulterants in edible oils are usually unknown, authentication generally requires a one-class classification technique [3].

In this study, Monte Carlo one-class partial least squares (MCOCPLS) was proposed and employed to build a one-class classification model for identifying the authenticity of virgin olive oil (VOO) from fatty acid profiles [4]. Monte Carlo sampling of variable subspaces was introduced to assess the performance of the one-class classifier. A validation set of 5000 adulterated oils simulated by the Monte Carlo method was employed to test the performance of the one-class classifier. The correct prediction rate of the best MCOCPLS model reached 99.10% for VOOs adulterated with 3% of other edible oils. In conclusion, the proposed MCOCPLS approach can effectively detect olive oils adulterated with as little as 3% of other vegetable oils.
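A minimal sketch of how such a Monte Carlo validation set of simulated adulterated samples can be built: random authentic profiles blended with random adulterant profiles at a fixed level (3%, as above). The Dirichlet "fatty acid profiles" and their dimensions are hypothetical; the real study uses measured compositions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voo, n_adult, p = 60, 40, 8       # authentic samples, adulterants, channels
voo = rng.dirichlet(np.full(p, 50.0), size=n_voo)     # mock authentic profiles
other = rng.dirichlet(np.full(p, 5.0), size=n_adult)  # mock adulterant oils

level = 0.03                        # 3% adulteration
i = rng.integers(0, n_voo, size=5000)
j = rng.integers(0, n_adult, size=5000)
simulated = (1 - level) * voo[i] + level * other[j]   # 5000 blended profiles
```

The `simulated` matrix then plays the role of the 5000-sample validation set presented to the one-class model.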

[1] C.M. Jeffrey, J. Spink, M. Lipp, J. Food Sci., 77, 118-126 (2012)
[2] M.P. Callao, I. Ruisánchez, Food Control, 86, 283-293 (2018)
[3] L.X. Zhang, X.R. Huang, P.W. Li, N. Wei, J. Jiang, X.X. Ding, Q. Zhang, Chemometr. Intell. Lab., 161, 147-150 (2017)
[4] L.X. Zhang, Z. Yuan, P.W. Li, X.F. Wang, J. Mao, Q. Zhang, C.D. Hu, Chemometr. Intell. Lab., 169, 94-99 (2017)

Acknowledgement: Financial support by the National Key Research and Development Project of China (2017YFC1601700), the National Major Project for Agro-product Quality & Safety Risk Assessment (GJFP2017001), and the earmarked fund for the China Agriculture Research System (CARS-12-10B) is gratefully acknowledged.




Carlos Alberto Teixeira1, Ronei Jesus Poppi1

1Institute of Chemistry, State University of Campinas, POB 6154, 13084-971 Campinas, SP, Brazil

Several pesticides are commonly used to protect crops. After application, pesticide residues can remain in food or be carried into the air, soil and groundwater. Several international organizations control the abusive use of pesticides, regulating their use and validating and applying analytical methodologies for their detection and quantification.

In this study we have built a one-class model, using DD-SIMCA (Data-Driven Soft Independent Modeling of Class Analogy), that can determine whether a mango is safe for human consumption based on its content of thiabendazole (TBZ). Although TBZ is known to have low toxicity to humans, exposure to higher levels may make it potentially carcinogenic; it is also toxic to fish, particularly in estuarine environments [2].

The data used to build the model were obtained by surface-enhanced Raman spectroscopy (SERS). All nanometric substrates were made in-house by depositing a gold colloidal solution onto common office paper. In order to obtain optimal results, NaSCN was used as internal standard, reducing the standard deviation between replicates by 10 percentage points. The model was established and validated using standard samples of TBZ. The threshold was set at 2 ppm, the maximum residue level allowed for TBZ in mangoes in Brazil [1]. The final model presented type I and type II errors of 10% and 6.6%, respectively. A standard TBZ solution was then applied to mangoes, recovered by extraction in water and analyzed by the proposed methodology. The one-class model was able to identify the fruits with TBZ levels above those allowed by law.
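As a simplified illustration of the one-class idea behind SIMCA-type modeling (not the exact DD-SIMCA procedure, which derives chi-squared acceptance limits from both score and orthogonal distances), a PCA model of the target class can accept or reject new samples by their orthogonal distance to the model subspace; all data below are synthetic.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# synthetic low-rank "target class" spectra: 100 samples, 20 channels
scores = rng.normal(size=(100, 3))
loads = rng.normal(size=(3, 20))
target = scores @ loads + rng.normal(0, 0.05, size=(100, 20))

pca = PCA(n_components=3).fit(target)

def orth_dist(X):
    """Distance of each row to the PCA subspace (orthogonal residual norm)."""
    recon = pca.inverse_transform(pca.transform(X))
    return np.linalg.norm(X - recon, axis=1)

cutoff = np.quantile(orth_dist(target), 0.95)      # empirical acceptance limit
alien = rng.normal(0, 1.0, size=(50, 20))          # clearly non-target samples
accepted_alien = orth_dist(alien) <= cutoff
accepted_target = orth_dist(target) <= cutoff
```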

[1] “Program for Analysis of Agrochemical Residues in Food: Report on Monitored Sampling Analysis in the Period from 2013 to 2015.” [Online]. Available:ório+PARA+2013-2015_VERSÃO-FINAL.pdf/494cd7c5-5408-4e6a-b0e5-5098cbf759f8. [Accessed: Feb-2018]. - translated from portuguese
[2] U.S. Environment Protection Agency. (2002). Thiabendazole and salts (R.E.D. Facts) [Online]. Available: [Accessed: Feb-2018]

Acknowledgement: The financial support by SpecLab, Instituto Brasileiro de Análises (IBRA), Empresa Brasileira de Pesquisa Agropecuária (Embrapa Project MP5 00), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (Capes) and Fundação de Amparo à Pesquisa do Estado de São Paulo (Fapesp) is acknowledged.




Fabrice Berrué, Camilo Martinez-Farina, Ian Burton

Aquatic and Crop Resource Development, National Research Council of Canada, Halifax, NS, Canada

One of the greatest challenges facing the functional food and Natural Health Products (NHPs) industries is sourcing high-quality functional ingredients for their finished products. Consumers increasingly demand full transparency for the products they consume regarding their quality, their source and how they are made. Unfortunately, the lack of ingredient standards, modernized analytical methodologies and industry oversight creates the potential for low quality and, in some cases, deliberate adulteration of ingredients. DNA barcoding has emerged as one tool, but its suitability for processed foods and functional ingredients has not been established. Due to its excellent quantitative properties, NMR spectroscopy is increasingly being used as an innovative solution to warrant the quality and safety of processed foods and manufactured functional ingredients. The NRC has been partnering with industry to develop alternative analytical methods that capture the complex chemical composition of raw materials and extracts in a "chemical barcode". Supported by statistical methodologies, a non-directed chemical approach to evaluating ingredient quality provides a key advantage: the ability to detect and quantitate, in the same analysis, both the expected bioactives and any potential adulterants that are presumed to be absent.

This presentation will introduce these concepts and show their application to a diverse range of extracts and foods (more than 200 ingredients) illustrating how quantitative NMR spectroscopy and chemometrics are being used to classify and improve the quality assurance of these products.




Claudia Beleites1,2, Fabian Tietz3, Sascha Rohn3, Andrea Krähmer1

1Julius-Kühn-Institut, Königin-Luise-Str. 19, Berlin, Germany
2Chemometric Consulting and Chemometrix GmbH, Södeler Weg 19, Wölfersheim, Germany
3Hamburg School of Food Science, Institute of Food Chemistry, University of Hamburg, Hamburg, Germany

Project CocoaChain studies cacao/cocoa quality along the processing chain. Biological systems, production processes and analytical procedures all produce deeply nested structures of influencing or confounding factors. Efficient experimental design then concentrates effort according to the (unknown) variance.

We compare fully nested, staggered and inverted nested designs [1], which distribute the degrees of freedom mainly at the lower levels, equally across levels, and concentrated at the upper levels, respectively. Two scenarios are considered: the analytical measurement capacity at the lowermost level being the bottleneck (e.g. chromatographic reference analyses for spectroscopic calibration), and limited primary samples: for CocoaChain, biological constraints do not allow proper incremental field sampling when studying the dynamics of small-scale fermentations. The fermentations are strongly spatially heterogeneous and bean sizes are large, so proper field sampling would need many, rather large increments. Yet, fermentations must not be unduly disturbed by exposure or by removing too much material.

The inverted nested design achieves the most efficient use of the total number of measurements. However, everything else being equal, estimation of the upper variance components is still more uncertain than that of the lowermost variance component. We recommend this strategy for situations with limited measurement capacity. Our project, on the other hand, has severely limited access to primary samples. Thus, we need to extract all possible information from the few available samples. In consequence, the uppermost level that can have more replicates should have them; below it, an inverted design may be used. The considered designs concern only the replicate numbers across the factor hierarchy and may be combined with other design criteria concerning e.g. the factor levels, i.e., which samples should have replicate measurements.
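The variance components such designs estimate can be illustrated with a balanced two-level nested simulation and the classical method-of-moments (ANOVA) estimators. This is only a sketch: the actual designs involve more hierarchy levels and deliberately unbalanced replicate allocations.

```python
import numpy as np

rng = np.random.default_rng(0)
a, n = 30, 5                                  # groups, replicates per group
sigma_between, sigma_within = 2.0, 1.0        # true standard deviations
group_effect = rng.normal(0.0, sigma_between, size=(a, 1))
y = group_effect + rng.normal(0.0, sigma_within, size=(a, n))

ms_within = np.mean(np.var(y, axis=1, ddof=1))     # E[MS_w] = sigma_w^2
ms_between = n * np.var(y.mean(axis=1), ddof=1)    # E[MS_b] = sigma_w^2 + n*sigma_b^2
var_within = ms_within
var_between = (ms_between - ms_within) / n
```

The relative uncertainty of `var_between` is governed by the number of groups, which is exactly the quantity the three designs trade off against the number of replicates per group.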

[1] Bainbridge T.R.: Staggered nested designs for estimating variance components, Industrial Quality Control (1965), 22, 1, 12-20.

Acknowledgement: Financial support of the project "CocoaChain" (IGF 169 EN/3) by the AIF (Arbeitskreis industrielle Forschung) and FEI (Forschungskreis der Ernährungsindustrie) is gratefully acknowledged.





Université de Lille, LASIR UMR CNRS 8516, Bât. C5, 59655 Villeneuve d'Ascq Cedex, France

Hyperspectral imaging has become an essential tool in many scientific domains for the exploration of complex and heterogeneous samples, providing simultaneously spectral and spatial information from the acquired data cube. Instruments are becoming ever faster, generating ever higher numbers of pixels, which is becoming a real problem for many chemometric tools. Even more importantly, current methods often impose a data model which implicitly defines the geometry of the data set. It is thus a good opportunity to explore new concepts of data analysis with new properties. Topology, a sub-field of pure mathematics, is the mathematical study of shape. Although topologists usually study abstract objects, they have recently developed what they call Topological Data Analysis (TDA) [1]. The idea is to use topology in order to visualize and explore high-dimensional and complex real-world data sets. This concept has been successfully used on topics such as gene expression profiling of breast tumors [2], viral evolution [3] and population activity in the visual cortex [4], but also on unexpected topics such as 22 years of voting behavior of the members of the US House of Representatives [5] and the characteristics of NBA basketball players via their performance statistics [5]. Lately, we have demonstrated that TDA can be used to discriminate single bacteria with Raman spectroscopy [6], and it came naturally to extend this exploration to spectroscopic imaging, which will be presented in this work.

The two main tasks of TDA are the measurement of shape and its representation. One fundamental idea of the concept is to consider a data set as a sample, or point cloud, taken from a manifold in some high-dimensional space. The sample data are used to construct simplices, generalizations of intervals, which are in turn glued together to form a kind of wireframe approximation of the manifold. This manifold and the wireframe represent the shape of the data set, i.e. the topological network. The purpose of this contribution is to demonstrate that TDA can be useful in the partitioning of hyperspectral data sets. Because K-means clustering is certainly one of the most widely used methods for the exploration of hyperspectral images, it is compared with TDA on a Raman data set [7].

[1] G. Carlsson, Bull. Amer. Math. Soc. 46 (2009) 255-308.
[2] M. Nicolau, A.J. Levine, G. Carlsson, Proc. Natl. Acad. Sci. U. S. A. 108 (2011) 7265-7270.
[3] J.M. Chan, G. Carlsson, R. Rabadan, Proc. Natl. Acad. Sci. U. S. A. 110 (46) (2013) 18566-18571.
[4] G. Singh, F. Memoli, T. Ishkhanov, G. Sapiro, G. Carlsson, D.L. Ringach, J. Vis. 8(8) (2008) 1-18.
[5] P.Y. Lum, G. Singh, A. Lehman, T. Ishkanov, M. Vejdemo-Johansson, M. Alagappan, J. Carlsson, G. Carlsson, Sci. Rep. 3 (2012) 1236.
[6] M. Offroy, L. Duponchel, Anal. Chim. Acta  910 (2016) 1-11.
[7] L. Duponchel, Anal. Chim. Acta 1000 (2018) 123-131.



Iterative Target Detection for Detection and Classification in Hyperspectral Images with an Application to a Landsat 8 Image of Lake Chelan, WA USA.

Neal B Gallagher1

1Eigenvector Research, Inc.

Classical least squares (CLS) is the tool of choice for detection and classification in hyperspectral images because target spectra are often known while reference values for each pixel are rarely available. Generalized least squares (GLS) is a weighted CLS model used to suppress clutter signal (interferences and noise) while enhancing minor target signal; GLS is also known as the matched filter. An iterative target detection approach exploits the synergy between GLS and the extended mixture model (extended least squares, ELS) to further improve discrimination. The approach is relevant for chemical imaging, medical imaging and remote sensing. An example is shown for a Landsat 8 image of Lake Chelan, WA, USA. GLS was used iteratively in a hierarchical approach to classification, and GLS combined with ELS was used to further split a single class that was otherwise difficult to classify.

The first task used GLS iteratively to create global classes in the image associated with Water, Green (orchards, vineyards and lawn), Bare Earth, three types of forest, Road, Buildings and Other (corresponding to no specific class). The second task used GLS/ELS to split Class Green, separating signal attributable to lawn (e.g., the municipal golf course) from cherry orchard and other agricultural land, including vineyards and other orchard types. Both objectives were complicated by the presence of a large number of pixels associated with water, forest, rough terrain, dry scrubland, homes, roads and buildings. The local GLS/ELS model showed good results that were verified in many cases using ground truth.
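The GLS (matched-filter) detector can be sketched as whitening by an estimated clutter covariance followed by projection onto the known target spectrum. All spectra, band counts and abundances below are synthetic stand-ins, not Landsat data.

```python
import numpy as np

rng = np.random.default_rng(0)
bands = 30
k = np.arange(bands)
s = np.exp(-((k - 12) / 4.0) ** 2)            # known target spectrum

# background "clutter" pixels with correlated noise
L = rng.normal(size=(bands, bands)) * 0.1
cov_true = L @ L.T + 0.01 * np.eye(bands)
bg = rng.multivariate_normal(np.zeros(bands), cov_true, size=500)

# matched filter: whiten by estimated clutter covariance, project onto s
Sigma_inv = np.linalg.inv(np.cov(bg, rowvar=False) + 1e-6 * np.eye(bands))
w = Sigma_inv @ s / np.sqrt(s @ Sigma_inv @ s)

target_pix = bg[:100] + 0.5 * s               # pixels containing the target
scores_t = target_pix @ w                     # detector output, target pixels
scores_bg = bg[100:] @ w                      # detector output, clutter only
```

Thresholding the detector output separates target-bearing pixels from clutter; the iterative/hierarchical use described above repeats this with updated class assignments.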




Ronan Dorrepaal and Aoife Gowen

UCD School of Biosystems and Food Engineering, UCD College of Engineering and Architecture, University College Dublin, Belfield, Dublin 4, Ireland

Hyperspectral imaging and mapping techniques collect a matrix of spectroscopic points over a desired area of a material surface. By implementing hyperspectral imaging, one can learn (i) the type of heterogeneity across a surface, (ii) the relative scale of that heterogeneity and (iii) the spatial location of that heterogeneity.

However, when one only wishes to know of the presence or absence of chemical species, criterion (ii) in particular is not required, while the collection of such large amounts of data can prove burdensome and time-consuming, as can be the case with Raman imaging. When implementing pixel-wise imaging, acquisition time, data storage capacity and data analysis times can be significant. In time-sensitive studies, long imaging times will affect the data collected, and that time sensitivity can be exacerbated by the electromagnetic radiation introduced as part of the imaging process.

This in silico study investigated algorithms which could be used within pixel-wise imaging instruments for targeted chemical searching. The targeted algorithms change the image acquisition pattern based on spectral data already collected, either searching for specific chemical features of interest or investigating general material surface heterogeneity where no previous chemical information is known or available.

The study was performed on Raman and IR hyperspectral images of biomaterials (polymers and bone cements). Performance metrics have been developed which measure algorithmic features such as PCA scatter-plot relative volume, processing speed and percentage of total pixels sampled. Preliminary results show sampling of ~60 to 90% of the surface heterogeneity with ~1 to 2% of the total pixels sampled.

Acknowledgement: Funding for this research was provided by the European Research Council (ERC) under the starting grant programme ERC-2013-StG call—Proposal No. 335508—BioWater. Polymer materials were provided by the Spinal Implant Design Research Group of the Department of Mechanical Engineering of the University of Birmingham, United Kingdom.




Willian Francisco Cordeiro Dantas1, Larissa Elizabeth Cordeiro Dantas2, Ronei Jesus Poppi1

1Institute of Chemistry, University of Campinas, P.O. Box 6154, 13083-970, Campinas, SP, Brazil 
2Technical-Scientific Police Superintendence from Sao Paulo State, SP, Brazil

In this work, we present a procedure using near infrared hyperspectral imaging and multivariate curve resolution with alternating least squares (MCR-ALS) for the analysis of cocaine and other common additives that were seized in the streets by the Police Department in Campinas area/Brazil. The most common street cocaine additives studied in this work are benzocaine, caffeine, dipyrone, levamisole, lidocaine and theophylline [1].

We demonstrated that the procedure developed herein is suitable for drug analysis, since it can identify and indicate the relative concentrations of the studied compounds. The technique uses the seized samples without pre-processing or sample destruction, and it requires only a minimal amount of sample and a reasonably short analysis time. Using an MCR-ALS augmented matrix, in which the doped samples and pure compounds were analyzed together, a threshold was calculated to indicate the presence or absence of a given compound in a sample [2].
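A bare-bones sketch of the MCR-ALS decomposition underlying this analysis: D ≈ C·S, with C the concentration profiles and S the pure spectra, alternating least squares with non-negativity imposed by clipping. The data, peak positions and the crude purest-pixel initialisation below are synthetic stand-ins; real MCR-ALS adds further constraints and convergence criteria.

```python
import numpy as np

rng = np.random.default_rng(0)
wl = np.arange(40)
# two synthetic pure spectra and random "pixel" concentrations
S_true = np.vstack([np.exp(-((wl - 10) / 3.0) ** 2),
                    np.exp(-((wl - 28) / 3.0) ** 2)])
C_true = rng.uniform(0.1, 1.0, size=(50, 2))
D = C_true @ S_true + rng.normal(0, 0.01, size=(50, 40))

# initialise with the pixels richest in each band ("purest pixel" style)
i1, i2 = int(np.argmax(D[:, 10])), int(np.argmax(D[:, 28]))
S = np.clip(D[[i1, i2]], 0, None)

for _ in range(100):
    # C-step: solve C @ S ~= D for C, then clip to non-negative
    C = np.clip(np.linalg.lstsq(S.T, D.T, rcond=None)[0].T, 0, None)
    # S-step: solve C @ S ~= D for S, then clip to non-negative
    S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0], 0, None)

resid = np.linalg.norm(D - C @ S) / np.linalg.norm(D)
```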

The combination of near infrared hyperspectral imaging with MCR-ALS provided promising results for the identification of additive compounds in seized street drugs. Benzocaine, caffeine, dipyrone, levamisole and lidocaine were conclusively identified in seized samples of cocaine or crack. Levamisole was the most commonly found compound, which is consistent with findings from the Criminalistics Institute.

In this field, researchers are focused on finding faster, more conclusive, and non-destructive tests, as new drug seizures are made every day and the reports must be promptly posted.

[1] C. A. F. de O. Penido, M. T. T. Pacheco, I. K. Lednev and L. Silveira, J. Raman Spectrosc., 2016, 47, 28–38.
[2] J. Jaumot, A. de Juan and R. Tauler, Chemom. Intell. Lab. Syst., 2015, 140, 1–12.

Acknowledgement: The authors would like to thank the Director of the Criminalistics Institute of Campinas and the Toxicology Laboratory of Campinas for providing the samples used to complete this work, as well as CAPES for financial support.




Puneet Mishra1, Azam Karami2, Alison Nordon1, Douglas N. Rutledge3, Jean-Michel Roger4

1WestCHEM, Department of Pure and Applied Chemistry and Centre for Process Analytics and Control Technology, University of Strathclyde, Glasgow, G1 1XL, United Kingdom
2Faculty of Physics, Shahid Bahonar University of Kerman, 7616-914111, Kerman, Iran
3UMR Ingénierie Procédés Aliments, AgroParisTech, INRA, Université Paris-Saclay, F-91300 Massy, France
4ITAP, Irstea, Montpellier SupAgro, University Montpellier, Montpellier, France

The present work provides a wavelength-specific shearlet-based [1] method for automatic noise reduction of HS images. Shearlets are an extension of wavelets; wavelets, being isotropic, are not capable of taking into account anisotropic features such as edges in images, whereas shearlets can. The method utilises the spectral correlation between wavelengths to distinguish between the noise levels present in different spectral planes. Depending on the noise level, the method adapts the use of the 2-D non-subsampled shearlet transform (NSST) coefficients obtained from each spectral plane to perform the spatial and spectral noise reduction. The method was compared with two commonly used pixel-based spectral de-noising techniques, Savitzky-Golay (SAVGOL) smoothing and median filtering. The methods were compared using simulated data, with Gaussian and a combination of Gaussian and spike noise added, and real HSI data of six commercial tea products. Noise reduction with the shearlet-based method resulted in a visual improvement in the quality of the noisy spectral planes and the spectra of both simulated and real HS images. The spectral correlation and peak signal-to-noise ratio were highest with the shearlet-based method compared to SAVGOL and median filtering. There was a clear improvement in the classification accuracy of the support vector machine (SVM) models for both the simulated and real HSI data that had been de-noised using the shearlet-based method. The method presented is a promising technique for automatic de-noising of close-range HS images, especially when the amount of noise present is high and occurs in consecutive wavelengths.
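The two baseline pixel-wise spectral de-noisers mentioned above are standard signal-processing tools. A toy per-spectrum illustration (synthetic spectrum, made-up noise levels; the shearlet transform itself is not sketched here):

```python
import numpy as np
from scipy.signal import savgol_filter, medfilt

rng = np.random.default_rng(0)
wl = np.linspace(0, 1, 200)
clean = np.exp(-((wl - 0.5) / 0.05) ** 2)          # one spectral band
noisy = clean + rng.normal(0, 0.05, size=wl.size)  # Gaussian noise
noisy[50] += 1.0                                   # one spike artefact

sg = savgol_filter(noisy, window_length=11, polyorder=2)  # smooths Gaussian noise
med = medfilt(noisy, kernel_size=5)                       # removes the spike

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))
```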

[1] G. Easley, D. Labate, W.-Q. Lim, Sparse directional image representations using the discrete shearlet transform, Appl. Comput. Harmon. Anal. 25 (2008) 25–46

Acknowledgement: This work has received funding from the European Union’s Horizon 2020 research and innovation programme named ’MODLIFE’ (Advancing Modelling for Process-Product Innovation, Optimization, Monitoring and Control in Life Science Industries) under the Marie Sklodowska-Curie grant agreement number 675251.




J-M. Roger1, G. Rabatel1, F. Marini2, B. Walczak3

1ITAP, Irstea Montpellier Centre, BP 5095 34196 Montpellier cedex 5, France.
2Department of Chemistry, University of Rome "La Sapienza", P.le Aldo Moro 5, I-00185 Rome, Italy
3Silesian University, 9 Szkolna Street, 40-006 Katowice, Poland

Keywords: SNV, normalisation, preprocessing

SNV is often used to correct for multiplicative and additive effects. It consists of subtracting from every spectrum its mean value and dividing it by its standard deviation. However, the compound of interest also has an effect on the spectrum. Consequently, because SNV processes all the variations indistinctly, it jeopardizes the interpretation of the resulting spectrum shapes and of the model coefficients. One solution is to weight the variables before the calculation of the mean and standard deviation used in the SNV normalization. In this study, we propose an original method to calculate these weights:

1.     For a given couple of spectra S1 and S2, the RANSAC algorithm [1] allows us to find the largest set of wavelengths λu that satisfy the same equation S2(λ) = a·S1(λ) + b.

2.     On the total set of spectra, every possible couple (Si, Sj) is considered, and the sets of wavelengths [λu]ij are computed according to step 1. Finally, a weight is assigned to every wavelength according to the number of times it has been selected for a couple (Si, Sj). This leads to a global wavelength weighting vector w.
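Per spectrum, the weighted SNV described above reduces to a weighted mean and weighted standard deviation. A minimal sketch (the uniform weight vector here is only for checking equivalence with classical SNV; the real w comes from the RANSAC-based wavelength voting):

```python
import numpy as np

def weighted_snv(X, w):
    """Row-wise SNV using non-negative wavelength weights w (sum > 0)."""
    w = np.asarray(w, float) / np.sum(w)
    mu = X @ w                                   # weighted mean per spectrum
    var = ((X - mu[:, None]) ** 2) @ w           # weighted variance per spectrum
    return (X - mu[:, None]) / np.sqrt(var)[:, None]

rng = np.random.default_rng(0)
# spectra with random multiplicative effects per row
X = rng.normal(5, 1, size=(4, 50)) * rng.uniform(0.5, 2, size=(4, 1))
snv = weighted_snv(X, np.ones(50))               # reduces to classical SNV
```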

The procedure has been tested on synthetic data and on a set of spectra extracted from hyperspectral images of apple leaves. This set included spectra of healthy leaves and spectra of leaves infected by apple scab. The spectra processed by standard and weighted SNV have been used in a PLS-DA model. The main result is that the model based on the weighted SNV gives more interpretable loadings and is more robust.


[1]   Fischler M.A. & Bolles R.C. (1981). Random Sample Consensus: A Paradigm for Model Fitting with Applications to Image Analysis and Automated Cartography.  Communications of the ACM, 24 (6),‎  pp. 381–395.




Jan Walach1, Peter Filzmoser1, Karel Hron2, Stepan Kouril2,3

1TU Wien, Austria
2Palacky University, Olomouc, Czech Republic
3University Hospital Olomouc, Czech Republic

Outlier analysis typically refers to an analysis of the observations (rows) of a data set, in which case a whole observation is marked as outlying. However, real-world data are more complicated, since often only a few data cells of a row are outlying. Furthermore, each observation generally has different deviating cells. An analysis of such outliers is called cell-wise outlier diagnostics, and it can provide more insight into the data.

Recently, different tools for cell-wise outlier diagnostics have been proposed in the statistics literature [1]. However, these approaches might fail to work appropriately if the data have a compositional structure and/or if the so-called "size-effect" is present. The size-effect refers to a situation with differing sample volumes or concentrations across observations. Under such circumstances, absolute information is not relevant, and only relative information, in terms of (log-)ratios between the variables, can lead to a proper understanding of the data.

Here, a cell-wise diagnostic tool is presented, based on the pairwise log-ratio approach [2]. A weighting function borrowed from the Tau estimator of scale, as well as several other weighting functions, is applied to each log-ratio, assigning each log-ratio a weight in [-1, 1], where values around 0 point to regular cells. To come back to the original dimensions of the data, several aggregation functions and filtering methods are considered.

From a statistical point of view, important features are those variables which differ between the groups (e.g. controls/patients). These identified biological outliers, with group-wise differences in certain variables, can be distinguished from unstructured technical cell-wise outliers. A tool based on the R package Shiny has been developed for visualizing the weights, and it will be demonstrated on simulated and real data sets.
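The per-log-ratio weighting can be sketched as follows. The clipped robust score used here is a simplified stand-in for the Tau-estimator weighting, and the plain mean over pairs stands in for the aggregation functions discussed above:

```python
import numpy as np

def cell_weights(X, c=3.0):
    """Cell-wise weights from pairwise log-ratios (simplified sketch).
    X: (n, p) strictly positive data. For each variable pair (a, b),
    robustly standardize log(X[:, a] / X[:, b]) via median/MAD, map it
    to [-1, 1] by clipping, and average over all pairs involving a."""
    n, p = X.shape
    W = np.zeros((n, p))
    for a in range(p):
        for b in range(p):
            if a == b:
                continue
            lr = np.log(X[:, a] / X[:, b])
            med = np.median(lr)
            mad = 1.4826 * np.median(np.abs(lr - med)) + 1e-12
            z = (lr - med) / mad
            W[:, a] += np.clip(z / c, -1.0, 1.0)
    return W / (p - 1)   # values near 0 indicate regular cells
```

Because the weights are built from relative information only, a constant multiplicative size effect on a row cancels out of every log-ratio.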

[1] Rousseeuw, P.J., Van den Bossche, W. (2017). Detecting deviating data cells. Technometrics, 1-11.
[2] Pawlowsky-Glahn, V., Egozcue, J.J., Tolosana-Delgado, R. (2015). Modeling and Analysis of Compositional Data. Chichester: Wiley.
[3] Walach, J., Filzmoser, P., Hron, K., Walczak, B., Najdekr, L. (2016). Robust biomarker identification in a two-class problem based on pairwise log-ratios. Manuscript submitted for publication.
[4] Walach, J., Filzmoser, P., Hron, K., Kouril, S. (2018). Cell-wise outlier diagnostics based on log-ratios. In preparation.

Acknowledgement: This research is supported by the Austrian Science Fund (FWF) and the Czech Science Foundation (GACR), project number I 1910-N26 (15-34613L).





Mourad Kharbach1,2, Mohammed Alaoui Mansouri2, Joeri Vercammen3, Abdelaziz Bouklouze2  and Yvan Vander Heyden1

1 Department of Analytical Chemistry, Applied Chemometrics and Molecular Modelling, Vrije Universiteit Brussel (VUB), Laarbeeklaan 103, B-1090 Brussels, Belgium
2 Biopharmaceutical and Toxicological Analysis Research Team, Laboratory of Pharmacology and Toxicology, Faculty of Medicine and Pharmacy, University Mohammed V, Rabat, Morocco
3 Interscience, Avenue J. E. Lenoir 2,1348 Louvain-la-Neuve, Belgium

The Argan forest is a region of great ecological, cultural and economic importance, which has supported the subsistence of part of the Moroccan population for centuries. Recognized for its advantageous nutritional composition, Argan oil is a leader in both the edible and cosmetic oil markets. Research on the authentication of Extra Virgin Argan Oil (EVAO) is very limited; currently, no document ensures its traceability.

In this study, Selected-Ion Flow-Tube Mass Spectrometry (SIFT-MS) was used to identify and quantify trace adulterants (olive oil, sunflower oil, and Argan oil of lower quality) at relatively low levels in EVAO. SIFT-MS is an analytical technique able to identify and quantify traces of volatile organic compounds. Three chemical ionization precursors (H3O+, NO+, and O2+) were used. This work investigated the effectiveness of SIFT-MS combined with partial least squares (PLS) and support vector machine regression (SVMR) to perform rapid screening of EVAO and to quantify trace adulterants. The variables of the SIFT-MS data that contain information for the aimed quantification were selected, whereas those encoding noise were eliminated, using the Variable Importance in Projection (VIP). The PLS and SVMR models were able to predict EVAO authenticity with high accuracy, and 1% of each adulterant could be quantified.

SIFT-MS data handled with the appropriate chemometric tools were found suitable for the rapid quality evaluation of the Moroccan Extra Virgin Argan Oil.




Nicola Cavallini1,2, Rasmus Bro2, Francesco Savorani3, Marina Cocchi1

1Dipartimento di Scienze Chimiche e Geologiche, Università di Modena e Reggio Emilia, Via Campi 103, 41125 Modena, Italia
2Department of Food Science, University of Copenhagen, Rolighedsvej 30 1958 Frederiksberg C, Denmark 
3Department of Applied Science and Technology, Politecnico di Torino, Corso Duca degli Abruzzi, 24 - 10129 Torino, Italia

Consumers’ preferences are traditionally assessed by directly interviewing small groups of people, but with the growth of the Internet and its web communities, mining online reviews has become an interesting approach for assessing product appreciation and reception [1], [2]. Huge amounts of user-generated data are available today in very different formats, such as numeric scores, logical scores (in the form of “like/dislike”), geotags, written descriptions, etc.

In this study, beer has been investigated both from the point of view of its chemistry and composition [3]–[5] and from the point of view of consumers’ preferences [6], [7].

The aim of our work is to investigate the gap between these two “worlds”: the “objective” world of analytical chemical profiling e.g. using spectroscopy, and the “subjective” world of user-generated reviews. We believe it is interesting to investigate how much synergy arises from combining these types of information.

A dataset of beer samples of different brewing styles is used as a benchmark. It is obtained by merging NMR, NIR and visible spectra measured on the samples with user-generated reviews mined from a website. We propose using text analysis methods to process the user-generated reviews and convert them into numeric form. Chemical features are extracted from the spectral data, and Partial Least Squares regression is used to investigate the links between the spectral and text data.

A preliminary screening using a reduced vocabulary basis and a simple single-word bag-of-words approach [8] gave promising results. For instance, beers produced by craft breweries were found to cluster in the principal component space, clearly separated from more “industrial” products.
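A minimal version of such a single-word bag-of-words step might look as follows (the vocabulary and reviews here are invented for illustration):

```python
import numpy as np

def bag_of_words(reviews, vocabulary):
    """Count occurrences of each vocabulary word in each review,
    yielding a reviews-by-words matrix ready for PLS against spectra."""
    index = {w: k for k, w in enumerate(vocabulary)}
    X = np.zeros((len(reviews), len(vocabulary)))
    for i, text in enumerate(reviews):
        for token in text.lower().split():
            k = index.get(token.strip('.,!?'))
            if k is not None:
                X[i, k] += 1
    return X

vocab = ['hoppy', 'malty', 'bitter']        # reduced vocabulary basis
reviews = ['Hoppy and bitter, very hoppy.', 'Malty sweetness.']
X = bag_of_words(reviews, vocab)
```

Each row of X is then a numeric representation of one review, which can be coupled to the corresponding spectral row in a two-block PLS model.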

[1] K. Christensen et al., “Mining online community data: The nature of ideas in online communities,” Food Qual. Prefer., vol. 62, pp. 246–256, Dec. 2017.
[2] G. D. Jacobsen, “Consumers, experts, and online product evaluations: Evidence from the brewing industry,” J. Public Econ., vol. 126, pp. 114–123, Jun. 2015.
[3] I. Duarte et al., “High-Resolution Nuclear Magnetic Resonance Spectroscopy and Multivariate Analysis for the Characterization of Beer,” J. Agric. Food Chem., vol. 50, no. 9, pp. 2475–2481, 2002.
[4] S. Rossi, V. Sileoni, G. Perretti, and O. Marconi, “Characterization of the volatile profiles of beer using headspace solid-phase microextraction and gas chromatography-mass spectrometry,” J. Sci. Food Agric., vol. 94, no. 5, pp. 919–928, Mar. 2014.
[5] F. A. Iñón, S. Garrigues, and M. de la Guardia, “Combination of mid- and near-infrared spectroscopy for the determination of the quality properties of beers,” Anal. Chim. Acta, vol. 571, no. 2, pp. 167–174, 2006.
[6] C. Gómez-Corona, H. B. Escalona-Buendía, M. García, S. Chollet, and D. Valentin, “Craft vs. industrial: Habits, attitudes and motivations towards beer consumption in Mexico,” Appetite, vol. 96, pp. 358–367, Jan. 2016.
[7] C. Gómez-Corona, M. Lelievre-Desmas, H. B. Escalona Buendía, S. Chollet, and D. Valentin, “Craft beer representation amongst men in two different cultures,” Food Qual. Prefer., vol. 53, pp. 19–28, 2016.
[8] R. E. Banchs, Text Mining with MATLAB®. New York, NY: Springer New York, 2013.



Paul J. Gemperline1, Mahsa Akbari Lakeh2, Elnaz Tavakkoli2, Hamid Abdollahi2

1Department of Chemistry, East Carolina University, Greenville, NC, 27858
2Department of Chemistry, Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan, Iran

Multivariate curve resolution (MCR) is a powerful chemometric tool that has been applied to many analytical problems. The introduction of trilinearity constraints and of reference values as known-value constraints in MCR problems can considerably reduce the extent of rotational ambiguity for all components; in practice, however, many systems exhibit non-ideal behavior. In such cases, strictly enforced, or “hard”, constraints can lead to MCR solutions whose estimated profiles have active constraints at the expense of an increase in the model lack of fit; forcing the estimated profiles to strictly obey the constraints may even generate solutions with a severe lack of fit to the curve resolution model. Soft constraints have been proposed to solve this problem [1, 2].

In this paper, we introduce “soft-trilinearity constraints”, which permit the peak profiles of given components to deviate slightly from an ideal trilinear response (e.g., small deviations in their shape and position in different samples). The advantages and disadvantages of this approach are compared to other methods like PARAFAC2. We show that the application of hard-trilinearity constraints can lead to solutions that are completely wrong, or can even preclude any solution at all. We also introduce “soft reference-value constraints” to investigate the effect of soft known-value constraints on the accuracy of MCR solutions. An illustration using soft known-value constraints is given for a batch reaction experiment, and the results are compared to the well-established partial least squares (PLS) regression method.
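Following the penalty formulation of [1, 2], a soft constraint can be thought of as an extra quadratic term in the least-squares objective; the exact weighting scheme used in this work is not specified here, so this is only a generic sketch:

```latex
\min_{C,S}\; \lVert D - C S^{\mathrm{T}} \rVert_F^2 \;+\; \lambda \sum_k \lVert c_k - \hat{c}_k \rVert_2^2
```

where \hat{c}_k denotes the profile closest to c_k that satisfies the constraint exactly (e.g., the trilinear-consistent or reference-value profile), and λ tunes the constraint strength: λ → ∞ recovers the hard constraint, while λ = 0 removes it.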

[1] P.J. Gemperline, E. Cash, Advantages of soft versus hard constraints in self-modeling curve resolution problems. Alternating least squares with penalty functions, Analytical Chemistry, 75 (2003) 4236-4243.

[2] S. Richards, R. Miller, P. Gemperline, Advantages of Soft versus Hard Constraints in Self-Modeling Curve Resolution Problems. Penalty Alternating Least Squares (P-ALS) Extension to Multi-way Problems, Applied Spectroscopy, 62 (2008) 197-206.




Henning Schröder1, Mathias Sawall1, Denise Meinhardt1,2, Klaus Neymeyr1,2

1University of Rostock, Ulmenstr. 69, 18057 Rostock, Germany
2Leibniz-Institute for Catalysis, Albert-Einstein-Str. 29a, 18059 Rostock, Germany

Structure elucidation for the chemical components of a reaction system can be significantly supported by spectroscopic measurements. If the spectroscopic data contains isolated signals or groups of partially separated peaks, then the identification of correlations between these peaks can help to determine the pure components by their functional groups.

We present a computational method which recovers, starting from a selected peak in a small frequency window, the spectral correlations to peaks or peak groups in the full frequency range. The resulting global spectrum reproduces the spectrum in the local frequency window or, at least, the contribution of the dominant component in the local window. A successive application of the procedure is used to extract multiple components. The method is called Peak Group Analysis (PGA). Its methodological background is a multivariate curve resolution method combined with the solution of a minimization problem with weighted soft constraints. The method is demonstrated on an experimental FT-IR data set on the formation of rhodium clusters in a hydroformylation process.

PGA is implemented in the freely available FACPACK toolbox and comes with a user-friendly MATLAB GUI.

[1] H. Schröder, M. Sawall, C. Kubis, A. Jürß, D. Selent, A. Brächer, A. Börner, R. Franke, K. Neymeyr, Comparative multivariate curve resolution study in the area of feasible solutions. Chemom. Intell. Lab. Syst., 163, pp 55-63, 2017.
[2] M. Sawall, C. Kubis, E. Barsch, D. Selent, A. Börner and K. Neymeyr, Peak group analysis for the extraction of pure component spectra. J. Iran. Chem. Soc., 13(2), pp 191-205, 2016.




Siewert Hugelier1, Dario Cevoli1, Raffaele Vitale1,2, Cyril Ruckebusch1

1Université de Lille, LASIR, F-59655 Villeneuve d’Ascq, Cedex, France.
2KU Leuven, Laboratory of Photochemistry and Spectroscopy, B-3001 Heverlee, Belgium.

Hyperspectral imaging (HSI) is a tool used to create a visual image of a chemical sample while simultaneously providing information about its chemical composition or properties. The basic unit is the chemical map, a grayscale image that carries the intensity information of a single wavelength; the goal is to gain an understanding of the structures and processes of the sample by investigating these images at the different wavelengths. When dealing with complex mixture samples, methods such as Multivariate Curve Resolution – Alternating Least Squares (MCR-ALS) [1] can be applied, and with the recent proposal to slightly adapt the MCR-ALS framework for HSI [2], image processing algorithms can be introduced as constraints during the analysis. This approach allows considerable additional flexibility in the resolution of the spectral problem by inputting information that links pixels together.
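For reference, the unconstrained core of MCR-ALS can be sketched in a few lines. Here non-negativity is imposed by naive clipping (a crude stand-in for proper constrained least squares), and a spatial constraint would act on the columns of C, reshaped to images, between the two updates:

```python
import numpy as np

def mcr_als(D, C0, n_iter=50):
    """Minimal MCR-ALS sketch: D (pixels x wavelengths) ~ C @ S.T,
    alternating least-squares updates of spectra S and maps C,
    with non-negativity enforced by clipping."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0.0, None)
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0.0, None)
    return C, S
```

In the HSI setting, each column of C is a vectorized chemical map, which is exactly where image-processing constraints (smoothing, segmentation, etc.) would be inserted in the adapted framework.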

In this work, we present a range of different spatial constraints and show their influence on the obtained distribution maps and spectral profiles of the components present in the mixture. In general, the true spectra can hardly ever be uniquely obtained from real experimental data, but the chemical maps can give a good indication of the correctness of these profiles; a picture says more than a thousand words, and this is no different for the spectral decomposition of chemical mixtures. On the other hand, when the solution is pushed in a direction in which the chemical maps are more attractive and visually pleasing, the spectral profiles do not remain unchanged. We rationalize this approach on simulated data and show results obtained on hyperspectral images covering Raman, near-infrared and infrared spectroscopy, in which spatial constraints prove to be advantageous in both decomposition directions.

[1] A. De Juan, R. Tauler, Crit. Rev. Anal. Chem., 36, 163–176 (2006)
[2] S. Hugelier, O. Devos, C. Ruckebusch, J. Chemom., 29, 557–561 (2015).



A Generalized Constraint for Achieving the Unique Solution in Self-Modeling Curve Resolution Methods Based on the Duality Principle

Hamid Abdollahi1, Robert Rajko2, Elnaz Tavakkoli1, Mahsa Akbari Lakeh1, Mahdiyeh Ghaffari1, Nematollah Omidikia1 and Saeed Khalili1

1 Department of Chemistry, Institute for Advanced Studies in Basic Sciences (IASBS), Gava Zang, Zanjan (Iran)

2 Department of Process Engineering, Faculty of Engineering, University of Szeged, P. O. Box 433, H-6701 Szeged (Hungary)


There is a natural duality between the row and column vector spaces of a bilinear data matrix under minimal constraints. Henry [1] first introduced the duality principle into chemometrics, and later Rajko [2] showed that this natural duality holds using only the non-negativity property of the data set. It is remarkable that this mathematical relation between the row and column spaces provides an efficient tool to transfer information from the considered space to the dual space. The duality concept is a general principle that can be formulated in a simple way.

Simply stated, the duality principle says that each data point corresponds to one specific directed hyper-plane in the dual space. This relation holds for any point in the considered space. Thus, a unique solution, as a point, fixes a unique directed hyper-plane as its dual subspace. Hence, the conditions for achieving a unique solution can be studied simply on the basis of this general principle. The necessary condition to obtain a unique solution is the definition of the directed hyper-plane in the dual space. According to this concept, there is therefore a generalized constraint for achieving the unique solution.

We have shown that imposing constraints such as trilinearity, known values and local rank can be interpreted based on the duality principle. Similarly, we have shown that other conditions which result in a unique solution, such as extracting the net analyte signal and the conditions of the resolution theorems, also fulfill the duality principle.



[1] R.C. Henry, Duality in multivariate receptor models, Chemometrics and Intelligent Laboratory Systems, 77 (2005) 59-63.

[2] R. Rajko, Natural duality in minimal constrained self modeling curve resolution, J. Chemom., 20 (2006) 164-169.




Xin Zhang1, Zhuoyong Zhang1

1Department of Chemistry, Capital Normal University, Beijing, 100048, China

Hyperspectral imaging can provide information about the chemical structure, composition, physical properties and concentration distribution of the analytes. The imaging instruments can be based on near-infrared, mid-infrared, Raman and terahertz spectroscopy, and the different techniques are complementary for the characterization of the samples. Given its advantages of being non-destructive, on-line and applicable in vivo, hyperspectral imaging has been applied to areas such as biology, the food industry, agriculture and the environment.

Detecting damaged pixels, removing baselines, spectral smoothing, variable selection and choosing the region of interest are necessary pretreatment steps to improve the qualitative and quantitative models in hyperspectral imaging analysis. Also, when MCR methods are applied, the constraints have to be chosen very carefully to correctly resolve the hyperspectral images, especially when data fusion is applied to data collected from different spectroscopies.

In our work, we have applied hyperspectral imaging based on Raman, near-infrared and terahertz spectroscopies to study plant metabolomics and damage to animal skin. Through different constraints, information was resolved on temporal trends, chemical reaction processes, changes in metabolic components, pure component distributions and structural damage of the skin.

Acknowledgement: This work was supported by Natural Science Foundation of China (21705112), Scientific Research Project of Beijing Educational Committee (KM201710028009) and Youth Innovative Research Team of Capital Normal University (009175301300).




Masoumeh Alinaghi1, Hamid Abdollahi1

1Department of Chemistry, Institute for Advanced Studies in Basic Sciences, P.O. Box 45195-1159, Zanjan, Iran

The objective of multivariate curve resolution (MCR) methods is to decompose a second-order bilinear data matrix into the product of two chemically meaningful matrices without any prior knowledge. In many cases, the obtained results are not unique: a whole range of feasible solutions fits the data matrix equally well.

Resolution theorems based on window information were proposed by Rolf Manne to establish the general conditions for unique resolution. Successful prediction of a possibly unique calculation of the profiles depends entirely on a reliable estimation of the local rank and selectivity information. Correct local rank detection, calculated by algorithms such as evolving factor analysis (EFA), relies on factors like the degree of overlap between the profiles and the noise level [1].

In this work, resolution theorems have been investigated in multi-set data analysis, which has the major benefit of not relying on the estimation of local rank information. The results can also be explained by the duality concept. Accordingly, the correspondence constraint is applied while simultaneous analysis is considered. Even simultaneous analysis without applying correspondence can by itself improve the accuracy of SMCR results, due to the increase in information, and in some cases can yield a unique answer based on data-based uniqueness. Additionally, applying correspondence can lead to unique resolution based on the resolution theorems. Therefore, multi-set analysis not only improves the results for well-designed data sets, but also provides the conditions for applying correspondence and reaching unique resolution. It should be noted that a multi-set analysis can be useless if it does not provide any further information about the studied system [2].

[1] M. Akbari Lakeh, R. Rajko, H. Abdollahi, Anal. Chem., 89(4), 2259-2266 (2017).
[2] M. Alinaghi, R. Rajkó, H. Abdollahi, Chemom. Intell. Lab. Syst., 153, 22-32 (2016).




Paul H. C. Eilers1
1Erasmus University Medical Centre, Rotterdam, the Netherlands

The traditional approach to multivariate curve resolution models a matrix as a sum of rank-one outer products. Each component of the sum is a product of, say, a time series and a spectrum. The physical background demands that each series and each spectrum be non-negative. Usually an alternating least squares (ALS) algorithm is used, updating a trial solution iteratively.  Constraints are introduced to force the factors to be positive.

The logarithm of an outer product is the sum of the logarithms of the factors. In my contribution, I explore the theoretical and practical aspects of modeling the logarithms of the factors. I call this a log-linear model, borrowing terminology from the statistical literature. An obvious advantage is that positive results are guaranteed. Also, it is easy to add penalties for smoothness and shape constraints, like unimodality.

In contrast to alternating least squares, one linearized system of equations can be set up to update all model parameters at the same time. The problem to solve is non-linear, so it remains challenging to set proper starting values. To study practical performance, I will analyze data sets from the literature and compare my results to ALS.
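For a single noise-free positive component, the log-linear idea even has a closed form: log D[i, j] = a[i] + b[j], which can be solved with row and column means of log D. The following sketch illustrates only this simplest case, not the full penalized multi-component model discussed above:

```python
import numpy as np

def loglinear_rank1(D):
    """Fit log D[i, j] ~ a[i] + b[j] by least squares (row/column
    means of log D) and return positive factors exp(a), exp(b)."""
    L = np.log(D)
    a = L.mean(axis=1)             # row effects (absorb the grand mean)
    b = L.mean(axis=0) - L.mean()  # centered column effects
    return np.exp(a), np.exp(b)    # positivity guaranteed by construction

c = np.array([1.0, 2.0, 4.0])      # e.g. a concentration time series
s = np.array([0.5, 1.0, 2.0])      # e.g. a spectrum
D = np.outer(c, s)
ch, sh = loglinear_rank1(D)        # np.outer(ch, sh) reproduces D
```

The factors are recovered only up to the usual exchange of a scale between the two modes, but their outer product reproduces D, and positivity never has to be enforced explicitly.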




C. Ruckebusch, S. Hugelier, T. Do, M. Sliwa

Université de Lille, LASIR, CNRS, F-59000 Lille, France

Successful use of multispectral time-resolved fluorescence imaging for multi-target imaging of stained cells has been demonstrated at the ensemble and single-molecule level [1,2]. On the one hand, it provides spatially resolved information on the fluorescence lifetime of a given fluorophore, a property influenced by its direct molecular and chemical environment. On the other hand, the technique allows spectrally resolved detection for multicolor analysis of multiple fluorescent labels and provides high-content imaging of complex samples. However, state-of-the-art data analysis for extracting the signatures of the different fluorophores relies on multi-exponential fitting or on pattern-matching algorithms based on characteristic fluorescence decays [2].

In this work, we propose to take a chemometric approach and investigate how multilinear resolution methods perform on the multiway data provided by advanced fluorescence imaging modalities (multi-way confocal, STORM, STED). We present results obtained on images of fluorescent organic nanoparticles, for which emission heterogeneity can be characterized both in the spatial and in the time-spectral mode. These results open the way to a better understanding of the photodynamics at the single-nanoparticle level. We also report preliminary results obtained on biological cells labelled with different fluorophores attached to different molecular targets, in situations where the fluorophores may have similar emission spectra.

[1] Bastiaens et al., Trends Cell Biol., 9 (1999) 48-52.

[2] Niehorster et al., Nat. Methods (2015) 3740.




Ivan Krylov1, Anastasia Drozdova2, Timur Labutin1

1Department of Chemistry, Lomonosov Moscow State University, Moscow, Russia
2Shirshov Institute of Oceanology, Russian Academy of Sciences, Moscow, Russia

Dissolved organic matter (DOM) plays an important role in the environment by supporting growth of marine biota and participating in flocculation of colloid clay particles in estuarine zones; it also indicates terrestrial processing of organic matter.

One of the most informative methods of DOM analysis is two-dimensional spectrofluorometry, characterized by high sensitivity, minimal sample preparation and a small sample volume required for analysis. It is usually not possible to discern separate emission lines corresponding to individual compounds in DOM fluorescence spectra, so instead, a small number of fluorophores with defined optical characteristics is described and correlated to allochthonous protein-like compounds or terrigenous humic substances.

Tensor rank decomposition methods (e.g. PARAFAC) are successfully employed to find the independent components (considered to correspond to fluorophores) comprising each individual excitation-emission matrix (EEM) in a dataset. There is still only fragmentary information on the PARAFAC components of EEMs of Arctic seawater. In this work, 150 samples of DOM from Arctic shelf seas and freshwater ponds of the Novaya Zemlya archipelago were collected during cruises in 2015-2018. EEMs were recorded over wide ranges of excitation (230–550 nm) and emission (240–650 nm) wavelengths, and different methods of signal normalization were applied. Due to hardware limitations, the data contained unfiltered scattering peaks, which had to be removed from the dataset to build an accurate PARAFAC model.

CORCONDIA and split-half analysis were used to choose an appropriate number of components. The selected components correlate well with the conventional HIX and BIX indices, confirming the adequacy of the model.
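Scatter removal of this kind can be sketched by masking the bands where emission ≈ excitation (first-order Rayleigh) or emission ≈ 2 × excitation (second order), letting PARAFAC treat the masked cells as missing values; the band width used below is an arbitrary assumption:

```python
import numpy as np

def remove_scatter(eem, ex, em, width=15.0):
    """Replace the first- and second-order Rayleigh scatter regions of an
    excitation-emission matrix (n_ex x n_em) with NaN, so that a PARAFAC
    implementation handling missing data can ignore them.
    width (in nm) is an assumption, not a recommended value."""
    E = eem.astype(float).copy()
    EX = np.asarray(ex)[:, None]          # (n_ex, 1) excitation grid
    EM = np.asarray(em)[None, :]          # (1, n_em) emission grid
    mask = (np.abs(EM - EX) < width) | (np.abs(EM - 2 * EX) < width)
    E[mask] = np.nan
    return E
```

Applying this to every EEM in the data cube removes the trilinearity-breaking scatter ridges before the PARAFAC fit.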

Acknowledgement: This work is supported by the Russian Foundation for Basic Research, grant No 16-35-60032 mol_a_dk.




Marina Cocchi1, Nicola Cavallini1,2, Caterina Durante3, Mario Li Vigni3, Rasmus Bro2

1Dipartimento di Scienze Chimiche e Geologiche, Università di Modena e Reggio Emilia, Via Campi 103, 41125 Modena, Italia
2Department of Food Science, University of Copenhagen, Rolighedsvej 30 1958 Frederiksberg C, Denmark 
3ChemSTAMP S.r.l., Via Campi 103, 41125 Modena, Italia

To enhance information extraction and visualization from data with weak structure, i.e. not showing evident groupings or trends (e.g. because the characteristic features vary progressively across sample categories, or because the number of samples is huge), we propose an approach based on the fusion of adjacency matrices. Adjacency matrices are square binary matrices in which an entry is "1" when the "adjacency condition" is fulfilled by the corresponding pair of samples, and zero otherwise. The approach consists of two steps, each based on the fusion of adjacency matrices: the first fuses information coming from different clustering techniques and/or similarity/distance criteria, while the second fuses information from different blocks of data, e.g. acquired by different analytical platforms.
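A sketch of the first fusion step, with two hypothetical adjacency conditions (k-nearest neighbours under Euclidean and under correlation distance) whose binary matrices are summed:

```python
import numpy as np

def knn_adjacency(D, k):
    """Binary adjacency matrix: A[i, j] = 1 if j is among the k nearest
    neighbours of i according to the distance matrix D."""
    A = np.zeros(D.shape, dtype=int)
    for i in range(len(D)):
        d = D[i].copy()
        d[i] = np.inf                     # never select the sample itself
        A[i, np.argsort(d)[:k]] = 1
    return A

def fuse(X, k=2):
    """Fuse adjacency matrices built from two distance criteria; each
    entry of F counts how many criteria declare a pair adjacent."""
    d_euc = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    d_cor = 1.0 - np.corrcoef(X)          # correlation distance between rows
    return knn_adjacency(d_euc, k) + knn_adjacency(d_cor, k)
```

The fused matrix F, being a similarity-like matrix, can then be explored with PCA, or its induced graph with density-based tools such as OPTICS.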

The proposed approach is intended as an unsupervised exploratory tool to better highlight grouping structure and improve its visualization, for which OPTICS and PCA can be used.

The overall idea is indeed to combine multiple weak sources of information to obtain better-performing models that benefit from this sort of cooperation, a concept similar to methods like Random Forest or weak-learning algorithms in the supervised context.

The proposed approach will be discussed in the data fusion framework and illustrated using different datasets as benchmarks.




Alessandra Biancolilloa, Federico Marinia, Jean-Michel Rogerb

aUniversity of Rome La Sapienza, Piazzale Aldo Moro 5, I-00185, Rome, Italy
bITAP, Irstea, Montpellier SupAgro, Univ Montpellier, Montpellier, France

Due to technological developments, it is becoming more and more common to handle multi-platform data, i.e., diverse data blocks collected by different instrumental techniques on the same set of samples. It has been widely demonstrated that this kind of data set should be handled by multi-block methodologies (which allow the simultaneous extraction of information from all the blocks) rather than analyzed separately in different individual models [1-2]. Although methods exist for handling several blocks globally, some applications require selecting a limited number of meaningful variables over the blocks; nevertheless, how to select variables in a multi-block framework has not been widely discussed yet. Consequently, the possibility of combining a multi-block regression method, Sequential and Orthogonalized Partial Least Squares (SO-PLS) [3], with the Covariance Selection (CovSel) approach [4] has been investigated, and a novel multi-block method, called SO-CovSel, has been developed. Among the several approaches available in the literature, SO-PLS and CovSel were chosen because they have, in general, been shown to provide good predictions and to be particularly suitable from the interpretation point of view.
Briefly, the algorithm of the novel methodology resembles that of SO-PLS, with PLS replaced by the feature selection operated by CovSel. The proposed method can be applied in either a regression or a discrimination framework, in analogy with SO-PLSR and SO-PLS-DA. In both contexts it has proved to give predictions comparable with other state-of-the-art methods, and it is particularly suitable for the interpretation of complex systems.

[1] I.E. Frank and B.R. Kowalski, Prediction of wine quality and geographic origin from chemical measurements by Partial Least-Squares regression modeling, Anal. Chim. Acta, 162, 241-251 (1984).
[2] T. Skov, A.H. Honoré, H.M. Jensen, T. Næs, S.B. Engelsen, Chemometrics in foodomics: Handling data structures from multiple analytical platforms, Trends Anal. Chem., 60, 71-79 (2014).
[3] T. Næs, O. Tomic, B.-H. Mevik, H. Martens, Path modelling by sequential PLS regression, J. Chemometr., 25, 28-40 (2011).
[4] J.M. Roger, B. Palagos, D. Bertrand, E. Fernandez-Ahumada, Chemometr. Intell. Lab. Syst., 106, 216-223 (2011).




Kévin Jacq1,2, Didier Coquin2, Bernard Fanget1, Yves Perrette1, Pierre Sabatier1, Fabien Arnaud1, Ruth Martinez-Lamas3,4, Maxime Debret3

1 University Grenoble Alpes, University Savoie Mont Blanc, CNRS, EDYTEM, 73000 Chambéry, France
2 University Grenoble Alpes, University Savoie Mont Blanc, Polytech Annecy-Chambéry, LISTIC, 74000 Annecy, France
3 Normandie Univ, UNIROUEN, UNICAEN, CNRS, M2C, 76000 Rouen, France
4 IFREMER, Laboratory Géodynamique et Enregistrement Sédimentaire (LGS), France

For solid environmental samples, spectroscopic properties can be analyzed, but their interpretation is difficult due to the lack of a common reference frame. In spectroscopic images, pixels are spatially referenced relative to each other, but in most cases each sensor has its own spatial resolution.

The sample used in this work is the first 30 cm of a sedimentary core from Lake Le Bourget (Western Alps), characterized by a stratified section corresponding to the last eutrophic conditions of the lake.

The aim of this work is to combine four images: (1) two hyperspectral images (9×15 cm²), VNIR (98 bands, pixel: 60 µm) and SWIR (144 bands, pixel: 189 µm), and (2) two fluorescence images (2×10 cm²; a sub-sample of the previous area) using excitation wavelengths of 266 nm and 355 nm (1024 bands each, pixel: 100 µm).

Each hyperspectral data set can be summarized by a structured grayscale image. From these, it is possible to calculate a micro-deformation model (digital image correlation) and to register the images to the same spatial dimensions.

Applying the ARSIS method [1], a pixel-level data fusion model is created to fuse all the spectra into a single spatial cube at the optimal resolution, using a spatial wavelet transform (decomposition into four images: an approximation and the vertical, horizontal and diagonal details). The new cube can be used as if it came from a new instrument.

The ARSIS method creates a model relating the wavelet coefficients across all the image resolutions used. This model can then be used to add spatial structure, computed with the wavelet transform, to the data with low spatial resolution.

[1] Ranchin, T., & Wald, L. (2000). Fusion of high spatial and spectral resolution images: the ARSIS concept and its implementation. Photogrammetric Engineering and Remote Sensing, 66(1), 49–61.




Astrid Maléchaux1, Rabia Korifi1, Sonda Laroussi1,2, Yveline Le Dréau1, Jacques Artaud1, Nathalie Dupuy1

1Aix Marseille Univ, Univ Avignon, CNRS, IRD, IMBE, Marseille, France
2Institut de l’Olivier, Unité Qualité et Technologie, Sfax, Tunisie

Technological progress has increased the speed and precision of chemical data collection and the efficiency of statistical analysis of large databases. This has been particularly useful in chemometrics, with the development of various modelling algorithms, one of the most popular being Partial Least Squares – Discriminant Analysis (PLS-DA) [1]. Many applications can be found, notably in food authenticity: for instance, the variety or region of origin of food products can be determined [2]. The discrimination is often based on data from one particular analytical method. However, there is a growing interest in combining complementary data from different methods, leading to the development of multiblock (MB) algorithms such as MB-PLS [3].

In this study, Mid-Infrared (MIR) spectroscopy and Gas Chromatography (GC) data are compared and combined to discriminate Tunisian virgin olive oil samples, collected during the 2011-2012 harvest, from the Chemlali (n=187), Chetoui (n=102) and Oueslati (n=45) varieties. MIR spectra were recorded from 4000 to 700 cm-1 and fatty acid compositions were obtained by GC after transmethylation [4]. PLS-DA models were developed on the MIR and GC data separately, and both datasets were then combined in a multiblock approach to assess its ability to improve prediction performance. GC data gives better classification rates than MIR data for all three varieties. MB-PLS improves the prediction results, especially for Oueslati samples, with 99% correct classification versus 96% and 94% for GC and MIR alone, respectively. The influence of the PLS-DA classification threshold is also discussed.
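A common way to set up the multiblock step (a generic sketch, not necessarily the exact preprocessing used in this study) is to autoscale each block and then scale it to equal total variance before concatenation, so that the wide MIR block does not swamp the narrow GC block:

```python
import numpy as np

def scale_block(X):
    """Autoscale a block, then divide by its Frobenius norm so that
    every block contributes the same total variance to the fused matrix."""
    Xc = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return Xc / np.linalg.norm(Xc)

rng = np.random.default_rng(0)
X_mir = rng.normal(size=(30, 200))   # stand-in for MIR spectra (hypothetical sizes)
X_gc = rng.normal(size=(30, 12))     # stand-in for fatty acid profiles

# Concatenated multiblock matrix, ready for a PLS-DA fit
X_mb = np.hstack([scale_block(X_mir), scale_block(X_gc)])
print(X_mb.shape)
```

The block weights recovered from the subsequent PLS fit then indicate how much each analytical method contributes to the discrimination.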

[1] M. Barker, W. Rayens, ‘Partial least squares for discrimination’, J. Chemom., 17, 3, 166–173 (2003)
[2] G. Downey, Advances in food authenticity testing. (2016)
[3] J. A. Westerhuis, P. M. J. Coenegracht, ‘Multivariate modelling of the pharmaceutical two-step process of wet granulation and tableting with multiblock partial least squares’, J. Chemom., 11, 5, 379–392 (1997)
[4] D. Ollivier, J. Artaud, C. Pinatel, J. P. Durbec, M. Guérère, ‘Triacylglycerol and fatty acid composition of French virgin olive oils’, J. Agric. Food Chem., 51, 5723-5731 (2003)

Acknowledgement: This work was financially supported by the French National Agency for Research (ANR) as part of the MedOOmics project, included in the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement number 618127 (ARIMNet2). Financial support was also obtained from the “PHC Utique” program of the Tunisian Ministry of Higher Education and Scientific Research and the French Ministry of Foreign Affairs, in the Committee for University Cooperation (CMCU) project number 11G1214.




Åsmund Rinnan1, José Camacho2, Rasmus Bro1

1Department of Food Science, University of Copenhagen, Denmark
2Department of Signal Theory, Networking and Communications, University of Granada, Spain

Interpretation of regression models, and of PLS in particular, continues to cause problems in many fields of applied science. For spectroscopic data, PLS has become the default method for regression analysis and is used daily in industry to predict attributes of interest, both as a quality assessment tool for the product and for process monitoring. This clearly shows that PLS can easily handle highly correlated variables and gives adequately good models. The willingness of scientists to interpret PLS models has also increased over the years, based on the regression coefficients and/or the loading weights. There has also been a tendency to interpret these models in the same way regardless of the pre-processing and validation performed. In this project we would like to highlight the pitfalls of trusting that what you see in these parameters is connected to true correlations and causal effects in the data. We will do this through a series of real examples, as well as on synthetic data.




S. Ruiz1, M.C. Ortiz2, L.A. Sarabia1, M.S. Sánchez1

1Departamento de Matemáticas y Computación, Facultad de Ciencias, Universidad de Burgos, Plaza Misael Bañuelos s/n, 09001 Burgos, Spain 
2Departamento de Química, Facultad de Ciencias, Universidad de Burgos, Plaza Misael Bañuelos s/n, 09001 Burgos, Spain

When fitting a PLS regression model between process variables (predictors, X) and quality characteristics (responses, Y), a reduction of dimensionality is achieved by projecting the input variables into a latent space, usually of much lower dimension. Moreover, with such a large number of predictors, latent variable models provide a way of visualizing the space of process variables through its projection onto the latent variables.

However, this projection and the subsequent 'loss' of dimensionality sometimes make it hard to mathematically invert the model, that is, to know which values of the process variables should be chosen to produce a product with a priori decided, desired or target characteristics, yt.

In a previous work [1], we proposed a computational approach to tackle the problem, providing a family of potential solutions, that is, a region inside the process variable space where the expected y (predicted by the PLS model) would be, in practice, equal to yt.

In the present work, we explore the variability around these values due to what we can call the residual space, that is, the space spanned by the discarded latent variables when fitting the PLS model which, as such, add variability to the input space without distorting the predicted quality characteristics. In the context of Quality by Design (QbD) and/or Process Analytical Technologies (PAT), we are trying to estimate the part of the design space due to the prediction model.

The approach is compared with the results of directly inverting the model, when possible, and its relation to the null space [2] is discussed.
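With hypothetical numbers, the role of the null space in model inversion can be illustrated for a linear prediction ŷ = x·b: the minimum-norm inverse gives one solution, and any move along a direction orthogonal to b changes the process settings without changing the predicted quality (all values below are made up for illustration):

```python
import numpy as np

# Hypothetical regression vector of a fitted latent-variable model
b = np.array([0.5, -1.0, 2.0])
y_t = 3.0                         # target quality characteristic

# Minimum-norm solution of x @ b = y_t
x0 = y_t * b / (b @ b)

# A direction orthogonal to b lies in the null space of the prediction:
# moving along it explores different process settings with the same yhat.
n = np.array([2.0, 1.0, 0.0])     # n @ b == 0
for t in (0.0, 1.0, 5.0):
    x = x0 + t * n
    print(x @ b)                  # prediction is unchanged
```

The region of candidate operating conditions discussed in the abstract is, in this picture, the affine set x0 + span(null directions), further bounded by the variability of the residual space.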

  1. S. Ruiz, M.C.Ortiz, L.A.Sarabia, M.S. Sánchez, A computational approach to partial least squares model inversion in the framework of the process analytical technology and quality by design initiatives, submitted July 2017.
  2. E. Tomba, P. Facco, F. Bezzo, M. Barolo, Latent variable modeling to assist the implementation of Quality-by-design paradigms in pharmaceutical development and manufacturing: A review, International Journal of Pharmaceutics 457 (2013) 283-297.

Acknowledgement: The work is developed in the context of project CTQ2014-53157-R financed by Spanish Ministerio de Economía y Competitividad, and project BU012P17 financed by Junta de Castilla y León, both with European FEDER funds.




Francis B. Lavoie1, Koji Muteki2, Ryan Gosselin1

1 Chemical and Biotechnological Engineering Department, Faculty of Engineering, Université de Sherbrooke, 2500 Boul. Université, Sherbrooke, QC, J1E 4H1, Canada
2 SpecTECH Group, Pfizer Worldwide Research and Development, 280 Shennecossett Rd, Groton, CT, 06340, USA

Partial Least Squares (PLS) regression is now considered an essential tool for working with datasets in which spectra (X) are used to predict concentrations (y) [1]. However, this regression methodology has major drawbacks. First, any predictive model calculated by PLS is constrained by its linear form; consequently, a strong nonlinear relation between X and y may be poorly explained. Second, PLS weights generally indicate that several X-variables are significant when working with spectral data. The interpretation of such weights may then be arduous, especially when attempting to associate spectral energy levels with theoretical molecular/atomic absorption or emission bands.

To mitigate these two important drawbacks, we developed a new Nonlinear (NL) PLS variant. Similarly to the original PLS algorithm, our new NL-PLS methodology only requires setting the number of PLS components implemented in the model. Moreover, it is based on a novel iterative process that differs substantially from the NL-PLS variants proposed in the past. To avoid model overfitting, our NL-PLS algorithm models the nonlinear relations with cubic Hermite splines under specific constraints [2]. Our algorithm also integrates the Powered-PLS (P-PLS) variant [3], which allows a predictive model to be simplified while maximizing its predictive performance.

From the analysis of NIR datasets, we demonstrate that our novel NL-PLS yields superior cross-validation performance compared to standard PLS and six common NL-PLS variants. Models from our proposed NL-PLS are also easier to interpret, as it typically identifies a very limited number of wavelengths as being relevant to the prediction of y, which in turn can easily be correlated with theoretical molecular absorption bands.

[1] M. Blanco, J. Coello, H. Iturriaga, et al., 'NIR calibration in non-linear systems: different PLS approaches and artificial neural networks', Chemometrics and Intelligent Laboratory Systems, 50, 1, 75-82 (2000)
[2] C. de Boor, A Practical Guide to Splines, Springer-Verlag, New York (1978)
[3] U. Indahl, 'A twist to partial least squares regression', Journal of Chemometrics, 19, 1, 32-44 (2005)




Erik Andries1, John H. Kalivas2, Anit Gurung2

1Department of Mathematics, Science and Engineering, Central New Mexico Community College 
2Department of Chemistry, Idaho State University 

The process of building a spectral calibration model typically requires a large set of samples, such that all sources of variance are included in future predictions. However, once a calibration model has been built, circumstances can cause the model to become invalid (e.g., instrumental drift, uncalibrated spectral features appearing in new samples later in time, or an unknown sample measured on an instrument other than the one the calibration model was built on). For these and other situations, the instrument must be recalibrated to accommodate the new secondary conditions.

In the absence of standardization or transfer samples, calibration updating (transfer and/or maintenance) has historically been implemented using a simple but effective technique: in addition to the primary samples, include a small number of labeled secondary samples and weight them. However, it would be ideal if we could use only secondary spectra with unknown reference measurements (semi-supervised calibration updating). Moreover, if the use-case scenario allows it, one could also incorporate an additional set of unlabeled samples: the samples from the test set (transductive calibration updating). We propose two approaches for transductive calibration updating in which all of the reference measurements for the secondary samples are considered to be unknown: one based upon null projections, and another based upon minimizing the domain scatter between primary and secondary samples.

Acknowledgement: Financial support for this research was provided by the National Science Foundation Chemical Measurement and Imaging Grant CHE-1506417.



Improvement of NIR calibration models and their value for animal food producers

Mark Schoot1,2, Christiaan Kapper1, Gijs van Kessel3, Jeroen Jansen

1Nutricontrol, P.O. Box 107 - 5460 AC Veghel - N.C.B. laan 52 - 5462 GE Veghel, The Netherlands
2Radboud University, Institute for Molecules and Materials, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
3Agrifirm Innovations Center, Agrifirm, Landgoedlaan 20, 7325 AW Apeldoorn, The Netherlands

Near infrared (NIR) spectroscopy has applications for process control if the performance of calibration models is adequate. We investigated potential improvements for making and updating NIR calibration models, as well as the (potential) value of NIR measurements for animal food (feed) producers. Results show that the method resulting in the best calibration model is highly dependent on the product. Important factors for optimizing calibration models for feed producers are the number of latent variables and the preprocessing strategy. Suitable methods for making and updating prediction models are cross-validation (to determine the number of latent variables) and a design of experiments (to determine a preprocessing strategy). Simulated NIR measurements on feed ingredients in feed factories show great potential for saving costs (4%) and improving end product stability. Better predictions improve these benefits even more, highlighting the importance of good calibration models.




Jan Stiedl1,2,3, Simon Green3, Thomas Chassé2, Karsten Rebner1

1Reutlingen University, Process Analysis & Technology, Alteburgstrasse 150, 72762 Reutlingen, Germany
2University of Tuebingen, Institute of Physical and Theoretical Chemistry, Auf der Morgenstelle 18, 72076 Tuebingen, Germany
3Robert Bosch GmbH, Automotive Electronics, Postfach 1342, 72703 Reutlingen, Germany

In research as well as in industry, copper is a very commonly used raw material. In most applications, the focus is on the surface and its condition. Due to its electrical properties, copper is used both in battery development and in the development of control units for automotive and non-automotive applications. To ensure the electrical properties and the adhesion of the contacting and packaging technology, the surface of the copper must not be covered by organic contaminants or by uncontrolled growth of oxide layers. To verify this, state-of-the-art surface analysis techniques such as XPS, Auger electron spectroscopy or TOF-SIMS are used. This is not sufficient for satisfactory statistics and coverage of production batches. For this purpose, a UV/Visible spectroscopy measurement system was developed, which can qualitatively and quantitatively characterize the surface with a Partial Least Squares regression model. Superimposed absorption and interference spectra are used to determine oxide layers and organic impurities. Partial Least Squares regression is used as a data evaluation tool to establish a regression between the UV/Visible spectra and film thickness measurements obtained from AES depth profiles. The accuracy of the regression is in the range of about 2.3 nm. Alternative, already known methods cannot be used in these cases, mostly due to the high roughness of the technical copper surfaces. The measuring system, with an integrating sphere, reacts only slightly to different types of surface roughness. Remaining roughness effects can be identified and filtered out using multivariate data analysis.




M.K. Nieuwoudt1-3,5, S.E. Holroyd4, C.M. McGoverin1,5,6, C.M. Triggs7, M.C. Simpson1-3,5-6

1The Photon Factory, The University of Auckland, 23 Symonds st., Auckland, 1142, New Zealand
2The MacDiarmid Institute for Advanced Materials and Nanotechnology, New Zealand
3School of Chemical Sciences, The University of Auckland, 23 Symonds st., Auckland, 1142, New Zealand
4Fonterra Research & Development Centre, Private Bag 11029, Palmerston North, New Zealand
5Dodd-Walls Centre for Photonics and Quantum Technologies, New Zealand
6Department of Physics, The University of Auckland, 23 Symonds st., Auckland, New Zealand
7Department of Statistics, The University of Auckland, 23 Symonds st., Auckland, 1142, New Zealand

Milk is a highly complex mixture of at least 10,000 different biomolecules and subject to considerable natural variations [1]. Rapid analysis of liquid milk for major components such as fat and protein for farmer pay-out is performed routinely in milk test laboratories worldwide using mid-IR spectroscopy. Optimisation of dedicated mid-IR milk spectrometers and refinement of PLS calibration models has enabled high accuracy of measurement for major components such as fat, protein and lactose. Recently there has been much interest in using mid-IR spectroscopy to also measure minor components such as individual fatty acids, and traits such as methane emission, titratable acidity and ketosis. Crucial for accurate measurement of these is standardization of individual spectrometers in the laboratory, to maintain intra-instrument and inter-instrument stability [2].

We examined the intra-instrument and inter-instrument variations between four FTIR spectrometers in a single milk test laboratory over a period of one year. Spectra of a liquid milk pilot sample were recorded from each instrument approximately every hour during each week, on a different pilot sample for each week. One-way ANOVA and ASCA analyses were performed on the spectra and also on 32 major and minor components and traits measured from these spectra.

The results from the ASCA analysis, as well as %RSD and %deviation from the mean of the median and interquartile ratios of the data provided valuable insight into the stability and repeatability of prediction for selected major and minor components, as well as the effectiveness of the laboratory standardization routine of the instruments.

[1] P. Jelen, in Conference Proceedings of 35th Biennial session of ICAR (EAAP, Rome, Italy), 2007.
[2] F. Dehareng, C. Grelet, S. Holroyd, H. Van den Bijgaart, S. Warnecke, J. A. Fernández Pierna, P. Broutin, P. Dardenne, Bulletin of the IDF, 490, 39-69 (2017).




Kimmo Sirén, Ulrich Fischer, Jochen Vestner

DLR Rheinpfalz, Institute for Viticulture and Oenology, Neustadt an der Weinstrasse, Germany

A workflow is introduced to find the most important regions of interest, or features, between multiple groups using statistical learning, exploiting the power of large sample sizes and a priori knowledge of group membership. The approach is implemented entirely in Python.
The key benefit of this workflow is to automate and speed up finding the features that differentiate the groups directly from the raw data. The performance of this approach was evaluated on three different published datasets, of which two are publicly available: wine [1] (39 samples), urine [2] (160 samples) and rice [3] (80 samples).
To obtain the necessary information, this machine learning approach utilizes tensor decomposition, feature selection and an extreme gradient boosting classifier. Based on the prediction metrics of each single feature, features above a certain threshold are ranked and selected for further downstream analysis. It is shown that the classical approach is often not sensitive enough to retrieve the most out of the data. This observation is validated by comparing the class separations obtained with this approach to published results.
In general, segmentation quality and the number of segments are found to be critical, but over-segmenting does not necessarily yield better results. The number of deconvoluted compounds is observed to reach a plateau. Peak picking, deconvolution, linear alignments and integration are problematic in any approach; the focal point of this approach is delaying the introduction of errors and artificial manipulation of the data.

[1] Vestner J. et al. Analytica Chimica Acta. 911, 42-58 (2016)
[2] Webb-Robertson, BJ., Kim, YM., Zink, E.M. et al. Metabolomics 10, 897-908 (2014)
[3] Hu, C. et al. Sci. Rep. 6, 20942  (2016)

Acknowledgement: This study was funded by the Horizon 2020 Programme of the European Commission within the Marie Skłodowska-Curie Innovative Training Network “MicroWine” (grant number 643063)




Gabriel Vivo-Truyols1,2

1Principal Scientist, Tecnometrix, Spain 
2Guest researcher, University of Amsterdam, The Netherlands

Data analysis methods applied to chromatographic data, including baseline correction, peak detection, alignment and peak tracking, calibration and/or classification, are a routine part of most modern analytical workflows. With the emergence of hyphenation (especially with high-resolution mass spectrometry) and two-dimensional methods (e.g. LCxLC), new challenges for data analysis are emerging. We are witnessing a boom in the amount of data to be processed, so we can start to talk about Big Data in analytical chemistry. Analysing these enormous and complex quantities of data becomes a tremendous challenge, especially because of the need to do it automatically. Traditionally, chromatographic data has been processed using the so-called frequentist approach. With this approach, we get only a final answer about the hypotheses we are testing, but we have no information about its probability of being true.

In contrast to the frequentist approach, Bayesian statistics offers a very interesting alternative, estimating the probabilities of the processes mentioned above. This way of thinking opens a new world of possibilities, especially in the area of automated massive data treatment. The chromatographer no longer has to "trust" the results of the data analysis, but instead decides among the different configurations that explain the data, based on the probability of each one.

We have applied this way of thinking to a broad range of situations. One example concerns toxicological screening, in which the probabilities of a list of compounds being present in a sample analysed with LC-MS are estimated. Using a Bayesian approach, it is easy to build up evidence about the presence/absence of a compound by taking into account adduct formation, isotope ratios, retention times and mass values, resulting in more accurate probability values. Another example is a Bayesian view of the well-known peak tracking methods, in which peaks of the same compound are recognized across different chromatographic conditions. Bayesian thinking approaches the problem in a probabilistic way, i.e. assigning probabilities for the assignment of peaks to the different candidate compounds.

The use of Bayesian statistics to deal with massive data treatment in chromatography constitutes a shift in the way we think about data analysis. Basically, we propose working with probabilities of hypotheses (and updating them as more information/data is taken into account), as opposed to delivering a single final answer to the user.
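The evidence-accumulation idea can be sketched as repeated Bayes updates on the odds scale; the likelihood ratios, prior and evidence names below are entirely hypothetical, chosen only to illustrate how independent pieces of evidence compound:

```python
def update(prior, lr):
    """Bayes update on the odds scale: posterior odds = prior odds * LR."""
    odds = prior / (1.0 - prior) * lr
    return odds / (1.0 + odds)

# Hypothetical likelihood ratios: how much more likely each observation is
# if the compound is present in the sample than if it is absent.
evidence = {"mass match": 50.0, "retention time": 4.0, "isotope ratio": 3.0}

p = 0.01  # prior probability that the compound is in the sample
for name, lr in evidence.items():
    p = update(p, lr)
    print(f"after {name}: P(present) = {p:.3f}")
```

Each new measurement (adducts, isotope ratios, retention times, masses) simply multiplies the odds, so evidence accumulates naturally and the user receives a probability rather than a yes/no verdict.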



Multivariate analysis applied to Multicolour Flow Cytometry to detect rare cell populations

Geert Postma1, Rita Folcarelli1, Selma van Staveren2,3, Gerjen Tinnevelt1,2, Lutgarde Buydens1, Leo Koenderman2, Jeroen Jansen1

1 Radboud University, Institute for Molecules and Materials, Analytical Chemistry, P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
2TI-COAST, Science Park 904, 1098 XH Amsterdam, The Netherlands
3Department of Respiratory Medicine, University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands

Minimal Residual Disease (MRD) indicates the persistence of rare aberrant cells in patients with neoplastic disease, such as Multiple Myeloma, despite the achievement of complete remission. The adverse prognosis makes the monitoring of MRD standard diagnostic care for evaluating treatment effectiveness and risk of relapse. Immunophenotyping by Multicolour Flow Cytometry (MFC) is preferred, among other techniques, for evaluating MRD because it is simple, fast, less expensive and potentially more sensitive than the other available methods. However, detection of MRD with conventional manual-gating-based MFC analysis can be quite challenging. Bivariate manual gating is highly subjective and resource-intensive, requiring expert technicians to identify rare cells among an overwhelming majority of normal cells. Moreover, identification of aberrant cells cannot rely on the expression of only one or two markers: combined and simultaneous analysis of multiple markers is needed. Here we describe a novel multivariate analysis method, ECLIPSE, which automatically eliminates normal cells from the analysis of patient samples, so that the resulting information-rich multivariate model is specifically aimed at identifying and describing cells related to the studied disease. We applied ECLIPSE to 8-colour multiparameter flow cytometry data from bone marrow samples of controls and patients treated for Multiple Myeloma. We show that the method, after eliminating the majority of normal cells present in the patients, was able to identify MRD in all the Multiple Myeloma patients in a fast and automated way. Our algorithm represents a promising unsupervised method, which allows analysis of multi-dimensional data and identification of rare disease-specific cell subsets for clinical and research applications.




Marianna Lucio1, Tanja V Maier1, Alesia Walker1, Dirk Haller2,3,  Janet Jansson4,5,6,7, Philippe Schmitt-Kopplin 1,8

1Research Unit Analytical BioGeoChemistry, Helmholtz Zentrum München – Deutsches Forschungszentrum für Gesundheit und Umwelt GmbH, Ingolstädter Landstraße 1, 85764 Neuherberg, Germany
2Chair of Nutrition and Immunology, Technische Universität München, Gregor-Mendel-Straße 2, 85354 Freising, Germany
3ZIEL Institute for Food and Health, Technische Universität München, Gregor-Mendel-Straße 2, 85354 Freising, Germany
4Biological Sciences Division, Pacific Northwest National laboratory 902 Battelle Boulevard P.O. Box 999, MSIN J4-18 Richland, WA 99352, USA
5DOE Joint Bioenergy Institute, Emeryville, CA 94608, USA
6Department of Plant and Microbial Biology, University of California, Berkeley 94720, USA
7Center for Permafrost (CENPERM), University of Copenhagen, Copenhagen 1017, Denmark
8 TUM Technische Universität München, Analytical Food Chemistry, Head of the Comprehensive Foodomics Platform, Alte Akademie 10, 85354 Freising, Germany

Metabolomics can be performed using a multitude of analytical instruments, each offering distinct sensitivities and frameworks for metabolite identification. The role of metabolomics is crucial and directly correlated with the information obtained from other omics fields. The holistic comprehension of biological systems increasingly demands the merging of different data sets to gain an understanding of their complexity. In particular, the integration of datasets from various approaches can reveal and explain intrinsic properties of complex systems. Interdisciplinarity therefore represents the best way to forge new insights and perspectives into biological systems.
Here, we show how this can be achieved via examples from different studies, in which we have generated multi-correlations between metabolite and microbial community data (sequencing OTUs). Examples will be given from actual studies involving food supplements and the effect of diets on the human gut microbiome (described with metabolomics and integrated with other omics). With a combinatory omics approach, we studied the strongest relations in the data, assessing their consistency and biological validity, between gut microbiome, metaproteome and metabolome. Moreover, we identified novel links between specific members of the microbiome, proteins and metabolites in the gut, determining key host-attributed processes that were also impacted by the crossover diet.

[1] Maier, T.V. et al. (2017): Impact of dietary resistant starch on the human gut microbiome, metaproteome, and metabolome. mBio 10.1128/mBio.01343-17  
[2] Bazanella M. Et al. (2017): Randomized controlled trial on the impact of early-life intervention with bifidobacteria on the healthy infant fecal microbiota and metabolome. AJCN 10.3945/ajcn.117.157529




Gerjen H. Tinnevelt1,2, Selma van Staveren2,3 ,  Kristiaan Wouters4, Bart Hilvering3, Rita Folcarelli1, Leo Koenderman3,  Lutgarde M.C. Buydens1, Jeroen J. Jansen1

1Radboud University, Institute for Molecules and Materials, (Analytical Chemistry/Chemometrics), P.O. Box 9010, 6500 GL Nijmegen, The Netherlands
2TI-COAST, Science Park 904, 1098 XH Amsterdam, The Netherlands
3Department of Respiratory Medicine and laboratory of translational immunology (LTI), University Medical Center Utrecht, Heidelberglaan 100, 3584CX, Utrecht, The Netherlands
4Dept. of Internal Medicine, Laboratory of Metabolism and Vascular Medicine, P.O. Box 616 (UNS50/14), 6200 MD Maastricht, The Netherlands

A typical multicolour flow cytometry sample may contain a large number of cells (>10,000), of which specific marker expressions are measured at the single-cell level. Individuals that exhibit an immune response may show both changes in marker expressions on individual cells and changes in the ratios of similar cells. Conventional analysis uses sequential binary selection of areas of interest, which is time-consuming and experience-based. Our new data analysis method, Discriminant Analysis of Multi-Aspect flow Cytometry (DAMACY) [1], uses a combination of Principal Component Analysis and Partial Least Squares to merge and quantitatively integrate all the relevant characteristics: marker co-expression, the specific cells on which these markers are expressed, the distribution of these cells within all samples, and the systematic change in this distribution upon changes in the homeostasis of the host, such as immune responses. The resulting model is statistically validated to optimize its information content and robustness.

We also show how DAMACY may be used to quantitatively integrate different multicolour flow cytometry tubes. Multiple tubes are needed because the number of markers per measurement is technologically limited. This is unfortunate, as many high-impact studies reveal that interrogation of more markers simultaneously leads to a more comprehensive view of the immune system. We show how data fusion of all tubes may find all the relevant cells in an activated immune state in obese versus lean people. The resulting model may not find the single best biomarker, but will show how very different cells may function together in the individual development of diabetes.

[1] Tinnevelt, G. H.; Kokla, M.; Hilvering, B.; Staveren, S.; Folcarelli, R.; Xue, L.; Bloem, A. C.; Koenderman, L.; Buydens, L. M.; Jansen, J. J., Novel data analysis method for multicolour flow cytometry links variability of multiple markers on single cells to a clinical phenotype. Scientific Reports 2017, 7, 5471.

Acknowledgements: This research received funding from the Netherlands Organization for Scientific Research (NWO) in the framework of the Technology Area COAST of the Fund New Chemical Innovations.




Raffaele Vitale1,2, Arno Bouwens2,3, Jochem Deen3, Siewert Hugelier1, Laurens D'Huys2, Kris Janssen2, Johan Hofkens2, Cyril Ruckebusch1

1Laboratoire de Spectrochimie Infrarouge et Raman - LASIR CNRS - UMR 8516, Université de Lille, 59000, Lille, France
2Molecular Imaging and Photonics Unit, Department of Chemistry, Katholieke Universiteit Leuven, Celestijnenlaan 200F, B-3001, Leuven, Belgium
3Laboratoire d'Optique Biomédicale, Institut de Microtechnique, Ecole Polytechnique Fédérale de Lausanne, CH-1015, Lausanne, Switzerland

In the last decade, single-molecule fluorescence microscopy has achieved enormous success and diffusion in many different research fields, especially in biology and medicine [1]. One of its most recent applications is the characterization of the composition of the gut microbiome for monitoring the development and progression of particular disorders such as Alzheimer’s disease [2, 3]. In this specific case, single DNA fragments isolated from distinct gut microbial species are imaged, and characteristic signal patterns – giving an idea of their nitrogenous base sequences – are extracted and compared to a database of reference traces for identification purposes. These traces are simply logical sequences of zeros and ones codifying at which sites the full genomes of the various target species should ideally produce an active fluorescence signal, based on the specific DNA labelling procedure used during the experimental stage. However, directly relating the aforementioned patterns to such dummy traces is unfeasible. One way of addressing this issue is to generate convoluted versions of the reference sequences and cross-correlate each one of them with the sample series. Alternatively, the precise positions of the various fluorescent labels attached to the imaged DNA fragments can be identified by e.g. sparse deconvolution approaches [4, 5], and the resulting localization subsequences matched to the reference binary traces. Is one of these two methodologies better than the other? What are their respective pros and cons? How sensitive are they to noise and experimental conditions? This work tries to answer these questions through a comprehensive comparison study.
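The first strategy can be sketched in a few lines (the trace, point-spread function, noise level and scoring are all made-up illustrations, not the authors' pipeline): convolve the binary reference to mimic the optics, then score candidate traces by their correlation with the measured pattern:

```python
import numpy as np

rng = np.random.default_rng(2)

# Made-up binary reference trace: 1 where labelling should fluoresce
reference = rng.integers(0, 2, size=64).astype(float)

# Simulated measurement: the reference blurred by a small point-spread
# function (the "convoluted version") plus detection noise
psf = np.array([0.25, 0.5, 0.25])
measured = np.convolve(reference, psf, mode="same")
measured += rng.normal(scale=0.05, size=measured.size)

# Score candidate traces by their correlation with the measured pattern;
# an unrelated random trace serves as a decoy species
decoy = rng.integers(0, 2, size=64).astype(float)
corr_ref = np.corrcoef(measured, reference)[0, 1]
corr_decoy = np.corrcoef(measured, decoy)[0, 1]
print(corr_ref, corr_decoy)
```

Shifting the candidate trace and taking the maximum over lags (a full cross-correlation) would additionally handle unknown alignment between fragment and genome.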

[1] W. E. Moerner, D. P. Fromm, Rev. Sci. Instrum., 74, 3597-3619 (2003)
[2] J. C. Clemente, L. K. Ursell, L. W. Parfrey, R. Knight, Cell, 148, 1258-1270 (2012)
[3] N. M. Vogt, R. L. Kerby, K. A. Dill-McFarland, S. J. Harding, A. P. Merluzzi, S. C. Johnson, C. M. Carlsson, S. Asthana, H. Zetterberg, K. Blennow, B. B. Bendlin, F. E. Rey, Sci. Rep., 7, 13537 (2017)
[4] J. de Rooi, P. Eilers, Anal. Chim. Acta, 705, 218-226 (2011)
[5] S. Hugelier, J. de Rooi, R. Bernex, S. Duwé, O. Devos, M. Sliwa, P. Dedecker, P. Eilers, C. Ruckebusch, Sci. Rep., 6, 21413 (2016)




Ricard Boqué1, Julieta Cavaglia1, Barbara Giussani2, Olga Busto1, Miquel Puxeu3, Montserrat Mestres1

1Universitat Rovira i Virgili. Dept. of Analytical Chemistry and Organic Chemistry. Campus Sescelades, 43007 Tarragona (Spain)
2Dipartimento di Scienza e Alta Tecnologia. Università degli Studi dell'Insubria. Via Valleggio, 9. 22100 Como (Italy)
3VITEC. The Technology Park for the Wine Industry. Carretera de Porrera, km.1, 43730 Falset (Spain)

A portable FTIR-ATR spectrometer was used to monitor small-scale must fermentations (microvinifications) that were intentionally deviated from normal fermentation conditions (NFC). Three common undesirable problems that may appear during this fermentation process were studied: growth of acetic acid bacteria (AAB), simultaneous alcoholic and malolactic fermentation (MLF), and yeast assimilable nitrogen (YAN) deficiency. Twenty fermentations operating in NFC were also monitored to compare their evolution to that of the AAB, MLF and YAN ones. To enable early detection of problematic fermentations, FTIR-ATR measurements were collected at different time points during the fermentation process. In addition, at these time points, relative density, sugars (glucose and fructose), acetic acid, malic acid and lactic acid contents were determined using traditional methods.

Different multivariate analysis strategies were applied to the spectral data to build multivariate control charts for early detection of fermentation defects. Principal Component Analysis (PCA) was applied to the NFC batches at each time point during the fermentation process, and the data from the deviated fermentations were subsequently projected onto the PCA models. Control charts based on the Hotelling T2 and Q statistics were built to detect abnormal deviations.
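The projection step can be sketched as follows (a minimal NumPy illustration of PCA-based T2 and Q statistics on simulated data; the component count and data are assumptions, not the study's actual models):

```python
import numpy as np

def fit_pca(X_noc, n_comp):
    """Fit a PCA model on mean-centered NOC data (rows = observations)."""
    mu = X_noc.mean(axis=0)
    U, S, Vt = np.linalg.svd(X_noc - mu, full_matrices=False)
    P = Vt[:n_comp].T                             # loadings
    lam = S[:n_comp] ** 2 / (len(X_noc) - 1)      # score variances
    return mu, P, lam

def t2_q(x, mu, P, lam):
    """Hotelling T2 and Q (squared prediction error) for one new observation."""
    xc = x - mu
    t = xc @ P                     # scores
    t2 = float(np.sum(t ** 2 / lam))
    resid = xc - t @ P.T           # part of x the model does not explain
    q = float(np.sum(resid ** 2))
    return t2, q

# Simulated NOC spectra; a deviated observation should show a much larger Q.
rng = np.random.default_rng(1)
X_noc = (rng.standard_normal((30, 50)) @ rng.standard_normal((50, 50))) * 0.1
mu, P, lam = fit_pca(X_noc, 3)
t2_ok, q_ok = t2_q(X_noc[0], mu, P, lam)
t2_bad, q_bad = t2_q(X_noc[0] + 5.0, mu, P, lam)   # simulated fermentation fault
```

An observation whose T2 or Q exceeds the control limits derived from the NOC batches would be flagged as a deviating fermentation.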

We found that deviations of the YAN batches could be successfully identified 4 days after the beginning of the fermentations, whereas deviations in the MLF and AAB batches were detected after 7 days. In conclusion, this methodology shows great potential as a fast and simple at-line tool for early detection of fermentation problems.

Acknowledgement: The financial support by the Spanish Ministry of Science and Technology (Project AGL2015-70106-R) is acknowledged.



M. Koeman1, J. Engel1, J. Jansen1 and L. Buydens1

1. Radboud University, Institute for Molecules and Materials (IMM)
Heyendaalseweg 135, 6525 AJ Nijmegen, The Netherlands

2. Biometris, Wageningen UR, Droevendaalsesteeg 1, 6708 PB Wageningen, The Netherlands

Identification of abnormal variables in a single sample, i.e. fault diagnosis, is a crucial step in statistical process control (SPC) to inform the researcher about the root cause of a fault [1]. Similarly, in (personalized) health care the aim is to identify abnormal patterns in e.g. metabolomics data of a single patient to diagnose a disease [2].

For fault identification in high-dimensional data, some form of feature reduction has to be applied, typically dimension reduction using Principal Component Analysis (PCA). Subsequently, contribution plots based on the Hotelling T2 and Q statistics are used to diagnose the fault. It is well known, however, that reliable identification of the variables primarily associated with the fault is hampered by the so-called smearing effect, which is a result of the dimension reduction step [3].
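The smearing effect mentioned above can be reproduced in a few lines (an illustrative NumPy sketch; the two-variable toy system is an assumption for demonstration only):

```python
import numpy as np

def q_contributions(x, mu, P):
    """Per-variable contributions to the Q statistic of one observation,
    given a PCA mean mu and loadings P fitted on normal data."""
    xc = x - mu
    resid = xc - (xc @ P) @ P.T     # residual after projection onto the model
    return resid ** 2               # each variable's share of Q

# Two highly correlated variables; a fault injected only in variable 0
# "smears" into a substantial contribution for variable 1 as well.
rng = np.random.default_rng(2)
z = rng.standard_normal(200)
X = np.column_stack([z, z + 0.05 * rng.standard_normal(200)])
mu = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
P = Vt[:1].T                        # one-component PCA model
x_fault = X[0].copy()
x_fault[0] += 3.0                   # fault in variable 0 only
contrib = q_contributions(x_fault, mu, P)
```

Because the single component lies along the correlation direction, roughly half the fault's Q contribution lands on the healthy variable, which is exactly the ambiguity that variable selection-based approaches try to avoid.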

Recently, several variable selection-based fault diagnosis approaches have been proposed, in which the abnormal variables correspond to the first selected variables [4, 5]. The application of these approaches to high-dimensional data does not require dimension reduction. This way, the smearing effect is circumvented, which should result in more reliable fault diagnosis. However, these approaches have their own limitations when it comes to fault diagnosis; for example, in the case of highly correlated abnormal variables it may not be guaranteed that all of them are selected. The aim of the present work is to compare methods based on dimension reduction to methods based on variable selection for fault diagnosis in high-dimensional data. Simulated and real (metabolomics) data sets are used to highlight the strengths and weaknesses of both approaches.

[1] MacGregor, John F., and Theodora Kourti. "Statistical process control of multivariate processes." Control Engineering Practice 3.3 (1995): 403-414.
[2] Engel, Jasper, et al. "Towards the disease biomarker in an individual patient using statistical health monitoring." PloS one 9.4 (2014): e92452.
[3] Van den Kerkhof, Pieter, et al. "Analysis of smearing-out in contribution plot based fault isolation for Statistical Process Control." Chemical Engineering Science 104 (2013): 285-293.
[4] Wang, Kaibo, and Wei Jiang. "High-dimensional process monitoring and fault isolation via variable selection." Journal of Quality Technology 41.3 (2009): 247.
[5] Zou, Changliang, and Peihua Qiu. "Multivariate statistical process control using LASSO." Journal of the American Statistical Association 104.488 (2009): 1586-1596.




R.R. de Oliveira1, C. Avila2, F. Mahdi2, F. Muller2, W. Sinclair3, A. de Juan1

1Dept. Chemical Engineering and Analytical Chemistry, Universitat de Barcelona, Diagonal, 645, 08028 Barcelona, Spain.
2School of Chemical and Process Engineering, University of Leeds, Leeds, United Kingdom.

Owing to the natural variability in process evolution, batches showing trajectories with uneven length and dynamics may still be accepted as operating under normal operating conditions (NOC). However, most Multivariate Statistical Process Control (MSPC) strategies for online monitoring of batch process evolution still require batch alignment to work adequately [1-4].

The current work proposes an “alignment-free” MSPC strategy for online monitoring of batch process evolution. To do so, data sets from several NOC batches are arranged in a column-wise augmented data matrix, with a common spectral mode and an extended concentration direction, designed to respect the individual variability of batch evolution. Then, a PCA decomposition is performed and all individual NOC batch trajectories are overlapped on a scatter score plot. From this information, a “super-trajectory” is derived that is representative of the variability of all individual trajectories. The “super-trajectory” is used to distribute all NOC observations into several local MSPC models covering the complete process evolution.

For real-time batch monitoring, every new observation is projected onto all local models and a set of residual statistic (Qstat) values is obtained. If all Qstat values are above the MSPC limits, the observation is off-specification and the process is not evolving adequately. For on-spec observations, the minimum Qstat value indicates the process stage according to the position in the “super-trajectory”, which can be an intermediate stage or the end-point of the process.
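A minimal sketch of this assignment logic, assuming PCA-based local models and hypothetical Q limits (not the authors' actual implementation):

```python
import numpy as np

def fit_local_model(X, n_comp=2):
    """One local PCA model (mean and loadings) for a stretch of NOC observations."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:n_comp].T

def q_value(x, model):
    """Squared residual (Q statistic) of one spectrum against a local model."""
    mu, P = model
    xc = x - mu
    resid = xc - (xc @ P) @ P.T
    return float(np.sum(resid ** 2))

def assign_stage(x, local_models, q_limits):
    """Index of the minimum-Q local model, or None when every Q exceeds its
    limit (the observation is off-specification)."""
    qs = np.array([q_value(x, m) for m in local_models])
    if np.all(qs > q_limits):
        return None
    return int(np.argmin(qs))

# Two simulated process stages with different mean spectra.
rng = np.random.default_rng(3)
stage0 = 0.05 * rng.standard_normal((40, 20))
stage1 = 2.0 + 0.05 * rng.standard_normal((40, 20))
models = [fit_local_model(stage0), fit_local_model(stage1)]
limits = np.array([1.0, 1.0])                      # assumed Q control limits
probe_ok = 2.0 + 0.05 * rng.standard_normal(20)    # resembles stage 1
probe_bad = np.full(20, 10.0)                      # resembles no stage
```

In the monitored process, the stage index returned for an on-spec observation would locate it along the "super-trajectory".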

The proposed strategy is applied to two NIR monitored lab-scale processes: a fluidized bed drying of pharmaceutical granules and an automated distillation process [2].

[1]       E.N.M. van Sprang, H.-J. Ramaker, J.A. Westerhuis, S.P. Gurden, A.K. Smilde, Chem. Eng. Sci. 57 (2002) 3979–3991.
[2]       R.R. de Oliveira, R.H.P. Pedroza, A.O. Sousa, K.M.G. Lima, A. de Juan, Anal. Chim. Acta 985 (2017) 41–53.
[3]       J.M. González-Martínez, R. Vitale, O.E. de Noord, A. Ferrer, Ind. Eng. Chem. Res. 53 (2014) 4339–4351.
[4]       H.-J. Ramaker, E.N.M. van Sprang, J.A. Westerhuis, A.K. Smilde, Anal. Chim. Acta 498 (2003) 133–153.
Acknowledgement: The research project receives funding from the European Community‘s Framework programme for Research and Innovation Horizon 2020 (2014-2020) under grant agreement number 637232.




Geert van Kollenburg1, Heleen Lanters1,2, Jeroen Jansen2

1 Radboud University, Netherlands, Department of Analytical Chemistry
2 University of Applied Sciences Utrecht, Netherlands

When there are many measured variables, it is not practical to model all the relationships between all the variables. By combining dimension reduction techniques with path modelling (i.e., regression analyses), we are able to incorporate substantive process analytical knowledge about industrial or natural processes into our statistical analyses. The resulting PLS path models allow us to model the relationships between various 'blocks' of a process, where in each block many separate variables may be measured. These models enable us to calculate the effects of each individual variable on the end-product, while retaining an easy-to-interpret process analytical model. Similarly, these models can aid in deciding on certain changes in a process to guarantee a required end-product quality.

The presented application entails a continuous industrial process in which we want to predict the substantive variation in yield per kilogram of a batch of costly catalyst. The raw data consisted of hourly measurements over a period of almost two years on two equivalent parallel ‘streets’, resulting in over 17000 measurements per variable per street. The variables were grouped according to the separate blocks in the process where they were measured. The causal ordering between the blocks was a priori specified, and multiple theoretically plausible models were compared based on predictive power and model fit. Since the process in each street and for each batch of catalyst is considered equivalent, we were able to cross-validate each model multiple times and compare regression effects and variable loadings across batches.




Werickson Fortunato de Carvalho Rocha1, Paulo Roque Martins Silva1, Luiz Henrique da Conceição Leal1, Valnei Smarçaro da Cunha1, David A. Sheen2

1National Institute of Metrology, Quality and Technology (INMETRO), 25250-020 Duque de Caxias, RJ, Brazil

2Chemical Sciences Division, National Institute of Standards and Technology (NIST), Gaithersburg, MD 20899, USA

Proficiency testing schemes (PT Schemes) for automotive emissions evaluate laboratories by measuring the concentrations of compounds in vehicle emissions and then comparing the relative performance of the laboratories, contributing to the harmonization of emission measurements in the community. According to ISO 13528:2015 [1] and ISO/IEC 17043 [2], there are several statistical tools to assess the results of analytical laboratories participating in proficiency testing. These tools, however, can only be used for univariate measurement results and have not been systematically extended to multivariate analyses.

In this study, we investigated the use of multivariate and univariate quality control metrics in the context of this PT Scheme. For this investigation, several vehicle emission parameters were measured, including CO, CO2, THC, NOx, NMHC, ETOH, NMHC-ETOH and total aldehydes (formaldehyde + acetaldehyde), as well as city and highway fuel economy under various performance cycles. Nineteen laboratories participated in this PT Scheme round.

The univariate quality control metric is used to evaluate the performance of each laboratory in a per-variable fashion (i.e. one score per variable for each laboratory). The multivariate quality control metric makes it easy to evaluate the performance of each laboratory when there are many variables: in this case there is only one result per laboratory, regardless of the number of variables (a laboratory-level scoring metric).
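The two kinds of metric might be sketched as follows (NumPy, on simulated data; the median/MAD scaling is a simplification, since ISO 13528 prescribes specific robust estimators, and the Mahalanobis-type laboratory score is one possible multivariate choice, not necessarily the one used in this study):

```python
import numpy as np

def univariate_z(results):
    """One robust z-score per variable per laboratory
    (rows = laboratories, columns = measured parameters)."""
    med = np.median(results, axis=0)
    mad = np.median(np.abs(results - med), axis=0) * 1.4826  # approx. std for normal data
    return (results - med) / mad

def multivariate_score(results):
    """One Mahalanobis-type score per laboratory, summarising all variables at once."""
    mu = results.mean(axis=0)
    icov = np.linalg.inv(np.cov(results, rowvar=False))
    d = results - mu
    return np.einsum('ij,jk,ik->i', d, icov, d)   # squared Mahalanobis distances

# Simulated round: 19 laboratories reporting 5 emission parameters.
rng = np.random.default_rng(7)
R = 50.0 + rng.standard_normal((19, 5))
z_scores = univariate_z(R)          # 19 x 5: one score per lab and variable
lab_scores = multivariate_score(R)  # length 19: one score per lab
```

The per-variable matrix supports the "6% unsatisfactory for certain measurements" view, while the single score per laboratory supports the overall comparison.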

The multivariate metric showed that all laboratories have overall satisfactory performance relative to each other, even though the univariate metrics showed that around 6% of laboratories have unsatisfactory results for certain measurements.


[1] ISO 13528: 2015 Statistical methods for use in proficiency testing by interlaboratory comparison

[2] ISO/IEC 17043: 2010 Conformity assessment – General requirements for proficiency testing

[3] D. A. Sheen, W.F.C. Rocha, K. A. Lippa, D. W. Bearden, Chemometr. Intell. Lab. Syst.,162, 10-20 (2017)


Acknowledgement: The authors thank the technical commission for emission laboratories accreditation of the Automotive Engineering Association (AEA Brazil) for participation in the PT Scheme, and General Motors Brazil, which provided the test item (vehicle).


Poster Presentations:



Anselmo E. de Oliveira1, Pedro A. de O. Morais1, Diego M. de Souza2, Beata E. Madari2

1Laboratory of Theoretical and Computational Chemistry, Instituto de Química, UFG, Goiânia, GO, Brazil 
2Embrapa Arroz e Feijão, Santo Antônio de Goiás, GO, Brazil

A new green analytical method to predict and classify soil texture is proposed using digital image processing of soil samples (image segmentation) and multivariate image analysis (MIA). In order to evaluate this methodology, digital images of 63 soil samples, sieved to <2 mm, were acquired. Clay and sand contents determined by the standard pipette method were used as reference values and, after image processing, particle contents in the measured size fractions were correlated to image data using PLS2 multivariate regression. 
The computer vision approach adopted for the recognition of soil textures based on soil images matched 100% of the classification predicted according to the standard method.
The new method is low-cost, environment-friendly, nondestructive, and faster than the standard method.

Acknowledgement: The financial support by Brazil's Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Embrapa, and Fundação de Amparo à Pesquisa do Estado de Goiás (FAPEG) is gratefully acknowledged.




Anselmo E. de Oliveira1, Pedro A. de O. Morais1, Diego M. de Souza2, Beata E. Madari2, Anderson da S. Soares3

1Laboratory of Theoretical and Computational Chemistry, Instituto de Química, UFG, Goiânia, GO, Brazil 
2Embrapa Arroz e Feijão, Santo Antônio de Goiás, GO, Brazil
3Institute of Informatics, UFG, Goiânia, GO, Brazil

Soil organic carbon (SOC) influences both soil physical and chemical properties. SOC contents are usually measured in two ways: elemental analysis or wet methods. The present work proposes a new SOC quantitation method using digital images of 177 soil samples acquired with a table scanner. Soil samples were collected from 3 regions of Brazil (north, west central, and northeast).
RGB, HSI, and grayscale sample image data were correlated to SOC contents from the reference Walkley-Black method using least-squares support vector machines (LS-SVM), partial least squares (PLS), and the successive projections algorithm combined with multivariate linear regression (SPA-MLR).
LS-SVM showed better performance than both the PLS and SPA-MLR methods for SOC quantitation, presenting r2 values (calibration and validation sets) higher than 0.9 and a residual prediction deviation (RPD) > 3.5.
The proposed method is a faster, simpler, and cleaner alternative to the reference method for the quantitation of SOC in soil samples.

Acknowledgement: The financial support by Brazil's Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Embrapa, and Fundação de Amparo à Pesquisa do Estado de Goiás (FAPEG) is gratefully acknowledged.




Rocío Ríos-Reina1, Raquel M Callejón1, Francesco Savorani2,3, Jose M Amigo3, Marina Cocchi4 
1Dpto. de Nutrición y Bromatología, Toxicología y Medicina Legal, Facultad de Farmacia, Universidad de Sevilla, C/P. García González n°2, E-41012 Sevilla, Spain
2Department of Applied Science and Technology (DISAT), Polytechnic University of Turin – Corso Duca degli Abruzzi 24, 10129 Torino (TO), Italy
3Quality & Technology, Department of Food Science, University of Copenhagen, Rolighedsvej 26, 1958 Frederiksberg C, Denmark
4Dipartimento di Scienze Chimiche e Geologiche, Università di Modena e Reggio Emilia, Via Campi 103, 41125 Modena, Italy

Spain is one of the major producers of high-quality wine vinegars, protected by three protected designations of origin (PDOs): “Vinagre de Jerez”, “Vinagre Condado de Huelva” and “Vinagre Montilla-Moriles”. Their high prices, due to their high quality and production costs, explain the need for an adequate quality control technique. Rapid, inexpensive and non-destructive methodologies based on non-targeted techniques, such as spectroscopies, are becoming popular in food authentication. Thus, to improve vinegar quality assessment, fusion of data blocks obtained from the same samples by different analytical techniques can be a good strategy, since the quantity and quality of sample knowledge are enhanced, allowing better characterization and classification than a single technique [1]–[3].

In this study, a multi-platform characterization and a model able to classify the Spanish PDO vinegars were developed. Sixty-six PDO wine vinegars were analyzed by four spectroscopic techniques: Fourier-transform mid-infrared spectroscopy (MIR), near-infrared spectroscopy (NIR), multidimensional fluorescence spectroscopy (EEM) and proton nuclear magnetic resonance (1H-NMR). First, separate principal component analysis and partial least squares models were built on the data from the individual techniques; then, all the instrumental signals were processed together using data fusion strategies in an attempt to improve the classification results. Data fusion improved wine vinegar classification, providing a more efficient differentiation than the models based on single methods and supporting the approach of combining these methods to achieve synergies for an optimized PDO differentiation. Among the single analytical methods, the classification results of the 1H-NMR models were especially promising.
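Two common fusion strategies of the kind alluded to above can be sketched as follows (illustrative NumPy code; block sizes, component counts and names are arbitrary assumptions, not this study's settings):

```python
import numpy as np

def autoscale(X):
    """Column-wise autoscaling so that no block dominates through sheer variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def low_level_fusion(blocks):
    """Concatenate autoscaled blocks measured on the same samples
    (e.g. MIR, NIR, unfolded EEM and 1H-NMR) into one fused matrix."""
    return np.hstack([autoscale(B) for B in blocks])

def mid_level_fusion(blocks, n_comp=3):
    """Concatenate per-block PCA scores instead of raw variables."""
    feats = []
    for B in blocks:
        Bc = autoscale(B)
        _, _, Vt = np.linalg.svd(Bc, full_matrices=False)
        feats.append(Bc @ Vt[:n_comp].T)
    return np.hstack(feats)

# Two simulated blocks for the same 66 samples.
rng = np.random.default_rng(11)
mir = rng.standard_normal((66, 40))
nir = rng.standard_normal((66, 25))
fused_low = low_level_fusion([mir, nir])      # 66 x 65 fused matrix
fused_mid = mid_level_fusion([mir, nir])      # 66 x 6 score matrix
```

Either fused matrix can then feed a single classification model in place of the per-technique models.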


[1]  M. Silvestri, L. Bertacchini, C. Durante, A. Marchetti, E. Salvatore, and M. Cocchi, “Application of data fusion techniques to direct geographical traceability indicators,” Anal. Chim. Acta, vol. 769, no. June 2012, pp. 1–9, 2013.
[2]  D. Lahat, T. Adali, and C. Jutten, “Multimodal Data Fusion: An Overview of Methods , Challenges , and Prospects,” Proc. IEEE, vol. 103, no. 9, pp. 1449–1477, 2015.
[3]  E. Borràs, J. Ferré, R. Boqué, M. Mestres, L. Aceña, and O. Busto, “Data fusion methodologies for food and beverage authentication and quality assessment - A review,” Anal. Chim. Acta, vol. 891, pp. 1–14, 2015.




F.A. Chiappini, M.R. Alcaráz, H.C. Goicoechea 

Laboratorio de Desarrollo Analítico y Quimiometría, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral-CONICET, Ciudad Universitaria, 3000 Santa Fe, Argentina

Fluorescence excitation-emission matrices (EEM) coupled to multi-way analysis have proved to be a powerful tool for the analysis of fluorophoric mixtures or complex systems for analytical purposes [1]. However, scattering phenomena, which are usually present in fluorescence landscapes, can significantly affect the performance of chemometric models. Thus, they must be removed or corrected, and several methodologies have been proposed to this end. In this work, a comparative analysis of different approaches for scattering removal is presented. The particular aspects and fundamentals of each method were analysed and, as a result of this study, an alternative methodology is proposed.
For the analysis, the effect of the width and height of the scatter signals was also studied. Two experimental systems were chosen considering the Stokes shifts between spectra, i.e., a mixture of fluoroquinolones with a large Stokes shift in the UV spectral range and a mixture of dyes with a very short Stokes shift in the Vis spectral range.

The EEMs were processed using different methodologies and algorithms, e.g. cleanscan [2], eemscat [3] and Gaussian-fit-based [4] corrections, which have been demonstrated to be efficient for scattering correction. However, interesting aspects were identified, allowing a characterisation in terms of robustness, versatility and complexity. In addition, a spectral analysis revealed that interpolation can considerably modify the attributes of the target signals, especially when noise or a high overlap between scattering and signal is present. Thus, the new methodology, which builds on the philosophy of the Gaussian-fit-based correction, was shown to be able to tackle drawbacks that cannot be overcome by interpolation-based approaches.
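For concreteness, an interpolation-based first-order Rayleigh correction, the family of approaches whose drawbacks are discussed above, might look like this (NumPy sketch on a synthetic EEM; the scatter width is an assumed parameter, and this is not any of the cited algorithms):

```python
import numpy as np

def remove_rayleigh(eem, ex, em, width=10.0):
    """Blank out first-order Rayleigh scatter (em close to ex) in an EEM of shape
    (len(ex), len(em)) and refill the gap by linear interpolation along emission."""
    out = eem.astype(float).copy()
    for i, x in enumerate(ex):
        mask = np.abs(em - x) < width           # diagonal region where em is near ex
        if mask.any():
            good = ~mask
            out[i, mask] = np.interp(em[mask], em[good], eem[i, good])
    return out

# Synthetic EEM: one fluorophore plus a Rayleigh ridge along em == ex.
ex = np.arange(250.0, 321.0, 10.0)
em = np.arange(260.0, 500.0, 2.0)
EX, EM = np.meshgrid(ex, em, indexing="ij")
signal = np.exp(-((EX - 290.0) / 15.0) ** 2) * np.exp(-((EM - 380.0) / 25.0) ** 2)
scatter = 5.0 * np.exp(-((EM - EX) / 3.0) ** 2)
eem = signal + scatter
clean = remove_rayleigh(eem, ex, em)
```

When the scatter ridge sits close to a target peak, the interpolated values replace part of the signal as well, which is precisely the distortion the Gaussian-fit-based philosophy tries to avoid.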

[1] K. Kumar, M. Tarai, A.K. Mishra, Trends in Analytical Chemistry 97, 216-243 (2017).
[2] R.G. Zepp, W.M. Sheldon, Mary Ann Moran, Marine Chemistry 89, 15– 36 (2004).
[3] M. Bahram, R. Bro, C. Stedmon, A. Afkhami, J. of Chemometrics 20, 99-105 (2006).
[4] P.H.C. Eilers and P.M. Kroonenberg, Chemometrics and Intelligent Laboratory Systems 130, 1–5 (2014).

Acknowledgement: The authors gratefully acknowledge UNL, CONICET and ANPCyT.




Silvana M. Azcarate1,4, Adriano de Araújo Gomes2, José M. Camiña1,4 and Héctor C. Goicoechea3,4

1 Facultad de Ciencias Exactas y Naturales, Universidad Nacional de La Pampa, and Instituto de Ciencias de la Tierra y Ambientales de La Pampa (INCITAP), Av. Uruguay 151 (6300) Santa Rosa, La Pampa, Argentina
2 Laboratório Paraense de Desenvolvimento Analitico e Quimiometria – LPDAQ. Faculdade de Química, Instituto de Ciências Exatas da Universidade Federal do Sul e Sudoeste do Pará. Folha 17, Quadra 04, Lote Especial, Nova Marabá, CEP: 68505080 Marabá, Pará, Brazil.
3Laboratorio de Desarrollo Analítico y Quimiometría, Facultad de Bioquímica y Ciencias Biológicas, Universidad Nacional del Litoral, Ciudad Universitaria, 3000 Santa Fe, Argentina
4 Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET), Godoy Cruz 2290 CABA (C1425FQB), Argentina.

Over the last years, in the multivariate classification setting, second-order classification methods have been applied out of the need to handle highly complex data structures [1]. Notwithstanding, their application has not been motivated by the need to increase the data order as a strategy to solve a classification problem [2].

In this work, the possibility of increasing the data order from two to three ways to achieve classification when first-order classification is not possible is demonstrated. To this end, first- and second-order fluorescence spectroscopy data sets - simulated and experimental - were acquired. Simulated two- and three-way data involving three target classes were generated. Then, experimental data were acquired from 80 mayonnaise samples to evaluate mayonnaise spoilage at five different storage times. First, samples were excited at 320 nm and emission spectra were recorded from 300 to 500 nm. Then, fluorescence excitation-emission data were recorded with excitation between 230 and 400 nm and emission from 300 to 500 nm. PLS-DA and N-PLS-DA were employed for the first- and second-order data, respectively. Model predictive ability was then evaluated through figures of merit, e.g. sensitivity, specificity, precision and accuracy. The results showed poor classification for the first-order data, with a high error rate of 78%, which was remarkably improved when second-order data were used.

Indeed, a significant improvement in the results and a positive impact on the analytical figures of merit were attained when second-order data were analyzed. This demonstrates that the way in which the data are generated has a significant effect on the classification.

[1] J.M. Amigo, F. Marini, Multiway Methods, in: F. Marini (Ed.), Data Handling in Science and Technology, E-Publishing Inc., Amsterdam, 2013, pp. 283–309.
[2] E. Salvatore, M. Bevilacqua, R. Bro, F. Marini, M. Cocchi, Compr. Anal. Chem. 60 (2013) 339-379.

Acknowledgement: Authors are grateful to UNL, UNLPam, CONICET and ANPCyT.





S.E. Cunliffe1,2, M.R. Baker2, O. Mihailova2, P.A. Martin1, P.J. Martin1

1The University of Manchester, School of Chemical Engineering and Analytical Science, UK, CH63 3JW
2Unilever Research and Development, Port Sunlight, UK, M13 9PL

Deviations from standard operating conditions during the continuous sulfation of ethoxylated alcohols result in compositional changes in the product due to varying conversion rates of the feedstock alcohol into anionic surfactant. Gas chromatography (GC) is currently used to quantify the levels of unreacted alcohols; however, the time-consuming and off-line nature of this measurement often means the results are not representative of the product over the sampling interval, eliminating the possibility of using the technique as a quality control measure.

NIR spectroscopy has been used as an alternative method to quantify the levels of unreacted feedstock in the product. Rapid in-line measurements and results suggest this to be a more suitable analytical method than gas chromatography in the industrial environment.

An offline calibration for the unreacted feedstock level, incorporating a range of processing temperatures for robustness, was generated. Gas chromatography-mass spectrometry was used to identify the impurities and generate accurate compositional data for the surfactant samples. NIR spectra for all samples were obtained using a transmission probe with a 2 mm path length, 32 scans and a resolution of 8 cm-1. The GC and NIR data were combined using the PLS toolbox in MATLAB to generate a prediction model for the unreacted feedstock. Validation of the model was carried out using the venetian blinds method and tested on unseen samples. Real-time in-line measurements of unreacted feedstock were obtained, and the results were compared to samples collected and measured using the traditional GC method.
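The venetian blinds splitting scheme used for validation can be sketched as follows (a generic NumPy illustration of the index pattern, not the MATLAB toolbox implementation used by the authors):

```python
import numpy as np

def venetian_blinds_splits(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs for venetian blinds cross-validation:
    split s takes every n_splits-th sample, starting at s, as the test set."""
    idx = np.arange(n_samples)
    for s in range(n_splits):
        test = idx[s::n_splits]
        train = np.setdiff1d(idx, test)
        yield train, test

# With 10 samples and 5 splits, the first test set is samples 0 and 5.
splits = list(venetian_blinds_splits(10, 5))
```

Because the test samples are interleaved through the (typically time- or concentration-ordered) calibration set, each split spans the full range of the data.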




Felipe Bachion de Santana1, André Marcelo de Souza2, Ronei J. Poppi1

1 Laboratory of Chemometrics in Analytical Chemistry, Institute of Chemistry, University of Campinas, 13084-971 Campinas, SP, Brazil
2 Brazilian Agricultural Research Corporation (Embrapa Soils), 22460-000 Rio de Janeiro, RJ, Brazil

Determinations of soil organic matter (SOM) are performed by hundreds of laboratories in Brazil, and the quality of these determinations is attested by proficiency assay programs for fertility laboratories. Most of these laboratories use a wet method based on the oxidation of the SOM by potassium dichromate. However, the standard methodology is time consuming, demands skilled labor and generates environmentally polluting residues.

In this sense, this study aims to investigate three major issues: (1) the performance of a new visible-near infrared spectrophotometer customised to perform soil fertility analysis, named SpecSoil-Scan®; (2) the performance of the random forest regression method [1] in predicting SOM using a model with more than 42,000 samples from several regions of Brazil; and (3) the performance in a proficiency assay. The dataset range is from 0 to 50 g/dm3. The methodology was evaluated through the analysis of 12 samples distributed annually by the proficiency assay coordinated by Embrapa Soils. The figures of merit of the model were 0.73, 0.74, 4.89 and 4.81 for R2cal, R2val, RMSEC and RMSEP, respectively, without exclusion of any sample in the validation set.

The random forest regression model correctly predicted 10 of the 12 samples with minimum penalties, reflecting the high capability of random forest to deal with complex, high-variability datasets while attenuating the influence of noise and outliers. These results validate the SpecSoil-Scan® instrument for SOM analysis. Therefore, this methodology is ready to be implemented for the analysis of SOM in Brazilian soils by visible-near infrared spectroscopy.
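Figures of merit of the kind reported above can be computed as follows (NumPy sketch on simulated data; here RPD is taken as the ratio of the reference-value standard deviation to the RMSE, and the data are not the study's):

```python
import numpy as np

def figures_of_merit(y_true, y_pred):
    """R2, RMSE and residual prediction deviation (RPD = SD of reference
    values divided by RMSE) for a set of predictions."""
    resid = y_true - y_pred
    rmse = float(np.sqrt(np.mean(resid ** 2)))
    r2 = 1.0 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    rpd = y_true.std(ddof=1) / rmse
    return r2, rmse, rpd

# Simulated SOM reference values (0-50 g/dm3 range) with noisy predictions.
rng = np.random.default_rng(5)
y_ref = np.linspace(0.0, 50.0, 40)
y_hat = y_ref + 2.0 * rng.standard_normal(40)
r2, rmse, rpd = figures_of_merit(y_ref, y_hat)
```

Computed separately on the calibration and validation sets, these give the R2cal/R2val and RMSEC/RMSEP pairs quoted in the abstract.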

[1] L. Breiman, Random Forests, Mach. Learn., 45. 1, 5-32 (2001)

Acknowledgement: The authors thank CAPES for financial support, Speclab - Brazil for providing the samples, and Embrapa project MP5 00.




Puneet Mishra1, Alison Nordon1 and Mohd Shahrimie Mohd Asaari2

1WestCHEM, Department of Pure and Applied Chemistry and Centre for Process Analytics and Control Technology, University of Strathclyde, Glasgow, G1 1XL, United Kingdom
2Vision Lab, Department of Physics, Campus Drie Eiken, University of Antwerp, Edegemsesteenweg 200-240, 2610, Antwerp, Belgium

The present work provides a scheme for fusing the spectral and spatial information present in hyperspectral images to improve classification modelling. The methodology was validated using a case study on classifying the geographical origin of green tea products originating from different parts of the world. The methodology involves selecting the best spectral plane from the hypercube, based on the peak signal-to-noise ratio (PSNR) and the structural similarity index measure (SSIM), for the analysis of spatial information. Unsharp masking is used to amplify the high-frequency components of the image, sharpening it and enhancing texture details. Textural information was then extracted from the sharpened spectral plane on the basis of the statistical properties of grey-level co-occurrence matrices (GLCM) using a moving-window operation. The textural properties were combined with the spectral information at three different levels of data fusion: raw data level, feature level and decision level. The raw data level involved concatenating the spectral and textural data before classification. The feature level involved feature compression with principal component analysis (PCA) in the spectral and textural domains before classification. The decision level involved a majority-voting scheme to enhance the final classification maps. All classification tasks were performed using support vector machine (SVM) classifiers. Use of the data fusion scheme to incorporate textural and spectral information during classification modelling gave improved classification accuracies, leading to enhanced classification maps. This illustrates the importance of fusing spectral and textural information during HSI data processing.
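The GLCM texture-extraction step can be sketched as follows (a minimal NumPy implementation for a single pixel offset; the moving-window operation, unsharp masking and quantization choices are omitted, and the toy images are assumptions):

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Normalized grey-level co-occurrence matrix for one pixel offset (dx, dy);
    image must already be quantized to integer grey levels 0..levels-1."""
    g = np.zeros((levels, levels))
    h, w = image.shape
    for y in range(h - dy):
        for x in range(w - dx):
            g[image[y, x], image[y + dy, x + dx]] += 1
    return g / g.sum()

def glcm_features(g):
    """A few classical texture features from a normalized GLCM."""
    i, j = np.indices(g.shape)
    contrast = np.sum(g * (i - j) ** 2)
    homogeneity = np.sum(g / (1.0 + (i - j) ** 2))
    energy = np.sum(g ** 2)
    return contrast, homogeneity, energy

# A smooth gradient should show lower contrast and higher homogeneity
# than a random-noise image of the same grey-level range.
rng = np.random.default_rng(9)
smooth = np.tile(np.arange(8), (8, 1))
noisy = rng.integers(0, 8, (8, 8))
c_s, h_s, _ = glcm_features(glcm(smooth, 8))
c_n, h_n, _ = glcm_features(glcm(noisy, 8))
```

Features like these, computed per window, form the textural block that is fused with the spectral block at the raw data, feature or decision level.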

Acknowledgement: This work has received funding from the European Union’s Horizon 2020 research and innovation programme named ’MODLIFE’ (Advancing Modelling for Process-Product Innovation, Optimization, Monitoring and Control in Life Science Industries) under the Marie Sklodowska-Curie grant agreement number 675251.




Jan Stiedl1,2,3, Simon Green3, Thomas Chassé2, Karsten Rebner1

1Reutlingen University, Process Analysis & Technology, Alteburgstrasse 150, 72762 Reutlingen, Germany
2University of Tuebingen, Institute of Physical and Theoretical Chemistry, Auf der Morgenstelle 18, 72076 Tuebingen, Germany
3Robert Bosch GmbH, Automotive Electronics, Postfach 1342, 72703 Reutlingen, Germany

In recent years, the packaging of electronic control units for automotive applications has undergone significant change. Formerly used conventional packaging methods, like aluminum boxes, are being replaced by direct packaging with epoxy mold compound. To ensure quality over the lifetime, the requirements for packaging robustness are increasing. One of the most demanding is the adhesion between the copper lead frame and the epoxy mold compound, as most failures occur due to delamination at this interface.

The research on methodology to avoid such failures has two major areas of interest. The first is the thickness of the oxide layer on the lead frame copper, and the second is the ratio of copper(I) oxide to copper(II) oxide. State-of-the-art measurement techniques, like depth profiling with Auger Electron Spectroscopy (AES) or X-ray Photoemission Spectroscopy, are able to detect these. However, due to requirements such as the ultra-high vacuum needed for these techniques, they are not suitable for use in the manufacturing area. For this reason, a UV-vis spectroscopy measurement system was developed. Using multivariate curve resolution with pure component spectra, the proportions of copper(I) and copper(II) oxide in the spectra measured at different oxide layer thicknesses were identified.

The determined oxidation of copper at 175 °C can be divided into two regimes. In the beginning of the oxidation process, copper(I) oxide grows to a thickness of 30 nm. In the subsequent oxidation process, copper(II) oxide grows on the surface. These results were also validated by AES depth profile measurements.




Yingyi Hao1, Yi Ran2, Yu Liang1, Yuan Liu1, Fanfan Xie1, Menglong Li1, Zhining Wen1*, Li He2*

1College of Chemistry, Sichuan University, Chengdu, China
2Biogas Appliance Quality Supervision and Inspection Center, Biogas Institute of Ministry of Agriculture, Chengdu, China
*Zhining Wen,
*Li He,

In modern medical research, the development of precision medicine has sometimes been hampered by the complexity of clinical samples and the limited number of samples available. Chemometrics has achieved great success in quantitative and qualitative analysis in chemistry and related areas such as environmental science, food science, and pharmaceutical analysis. Therefore, to assist clinical diagnosis, classify disease subtypes and predict disease prognosis, we have proposed a series of algorithms for quantitative and qualitative analysis based on the spectral and genomic data of clinical samples.

In quantitative analysis, we proposed a post-modified non-negative matrix factorization (NMF) algorithm for deconvoluting heterogeneous tissue samples. It estimates the gene expression profiles and the proportion of each cell type without any prior knowledge, which makes it well suited to clinical quantitative analysis because cell content can be estimated even when no reference samples are available. In qualitative analysis, we applied a variety of feature selection methods to diverse data sets. Variable combination population analysis (VCPA) was applied to select the feature points for identifying the Raman spectra of three subtypes of parotid neoplasms. We also proposed a functional-module partition strategy for predicting cancer prognosis by constructing networks of differentially expressed genes (DEGs) from cancer data sets. In addition, we successfully predicted the prognosis of AML patients based on the expression profiles of metabolic genes. All these results illustrate that chemometric methods can effectively assist rapid clinical diagnosis and predict cancer prognosis.
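The deconvolution step factorizes a non-negative mixed-expression matrix into cell-type signatures and per-sample proportions. The sketch below is an illustration of plain NMF with Lee-Seung multiplicative updates plus a simple renormalization of the mixing matrix, not the authors' post-modified algorithm; all data and names are synthetic.

```python
import numpy as np

def nmf(V, k, n_iter=1000, eps=1e-9, seed=0):
    """Basic NMF via Lee-Seung multiplicative updates: V ~= W @ H, all non-negative."""
    rng = np.random.default_rng(seed)
    W = rng.random((V.shape[0], k)) + eps
    H = rng.random((k, V.shape[1])) + eps
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update mixing matrix
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update signature matrix
    return W, H

# Simulated tissue data: 200 genes, 3 cell types, 30 heterogeneous samples.
rng = np.random.default_rng(1)
signatures = rng.gamma(2.0, 1.0, size=(200, 3))       # per-cell-type expression
proportions = rng.dirichlet(np.ones(3), size=30).T    # columns sum to 1
V = signatures @ proportions

W, H = nmf(V, k=3)
fractions = H / H.sum(axis=0, keepdims=True)          # estimated cell-type fractions
```

Renormalizing the columns of H turns the mixing coefficients into interpretable per-sample cell-type fractions, which is the quantity of clinical interest.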

[1] Junmei Xu, et al., Scientific Reports (2016), 6:28720.
[2] Yuan Liu, et al., Journal of Chemometrics (2017), e2929.
[3] Yongning Yang, et al., Chemometrics and Intelligent Laboratory Systems (2017), 170:102-108.
[4] Fanfan Xie, et al., Computational Biology and Chemistry (2017), 67:150-157.

Acknowledgement: This project was supported by grant from the National Natural Science Foundation of China (No. 21575094).




Astrid Maléchaux, Yveline Le Dréau, Pierre Vanloot, Nathalie Dupuy

Aix Marseille Univ, Univ Avignon, CNRS, IRD, IMBE, Marseille, France

Vibrational spectroscopy techniques such as mid-infrared (MIR), near-infrared (NIR) and Raman spectroscopy produce large quantities of data. However, when the aim is classification, much of these data may carry irrelevant information or be too noisy. Thus, variable selection or dimension reduction can be useful to improve prediction performance [1].

In this study, Partial Least Squares – Discriminant Analysis (PLS-DA) is applied to MIR, NIR and Raman spectral data of 132 French and Portuguese extra virgin olive oil samples from the 2016-2017 harvest, to discriminate between 12 varietal origins. The effects of three variable selection methods on the prediction performance are compared. The first uses the study of spectral variance to select the most influential variables prior to PLS-DA modelling [2]. The spectral variance of all data (InterSP) and that of each class (IntraSP) are calculated, and the wavelengths giving a ratio of InterSP to IntraSP higher than the critical value of the Fisher-Snedecor test are selected. In the second approach, PLS-DA calibration is conducted for each class on a randomly selected subset of samples, and only the variables with high beta-coefficient values are included in further modelling [3]. The third method uses a different strategy: dimension reduction is obtained by applying Principal Components Analysis (PCA) to the spectral data and then developing PLS-DA models on the PCA scores [4]. Preliminary results indicate that these variable selection methods can improve the prediction performance for several of the olive oil varieties.
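The first selection criterion can be sketched numerically. The degrees of freedom chosen for the critical value, and all variable names, are assumptions made for this illustration; it is one plausible reading of the InterSP/IntraSP rule, not the authors' exact implementation.

```python
import numpy as np
from scipy.stats import f as f_dist

def fisher_ratio_select(X, y, alpha=0.05):
    """Keep wavelengths whose all-data (inter) to within-class (intra)
    variance ratio exceeds the Fisher-Snedecor critical value."""
    classes = np.unique(y)
    n = X.shape[0]
    inter = X.var(axis=0, ddof=1)                                   # InterSP
    intra = np.mean([X[y == c].var(axis=0, ddof=1) for c in classes],
                    axis=0)                                          # IntraSP
    crit = f_dist.ppf(1 - alpha, n - 1, n - len(classes))           # assumed dof
    return np.where(inter / (intra + 1e-12) > crit)[0]

# Toy spectra: only the first 5 of 50 wavelengths carry class information.
rng = np.random.default_rng(0)
y = np.repeat([0, 1], 30)
X = rng.normal(0, 1, (60, 50))
X[:, :5] += 3.0 * (2 * y[:, None] - 1)   # class-dependent mean shift

kept = fisher_ratio_select(X, y)
```

Wavelengths with a class-dependent mean inflate the overall variance relative to the within-class variance, so their ratio clears the F critical value while pure-noise channels do not.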

[1] T. Mehmood, K. H. Liland, L. Snipen, S. Sæbø, ‘A review of variable selection methods in Partial Least Squares Regression’, Chemom. Intell. Lab. Syst., 118, 62–69 (2012)
[2] O. Galtier, O. Abbas, Y. Le Dréau, C. Rebufa, J. Kister, J. Artaud, N. Dupuy, ‘Comparison of PLS1-DA, PLS2-DA and SIMCA for classification by origin of crude petroleum oils by MIR and virgin olive oils by NIR for different spectral regions’, Vib. Spectrosc., 55, 1, 132–140 (2011)
[3] A. Garrido Frenich, D. Jouan-Rimbaud, D. L. Massart, S. Kuttatharmmakul, M. Martinez Galera, J. L. Martinez Vidal, ‘Wavelength selection method for multicomponent spectrophotometric determinations using partial least squares’, Analyst, 120, 2787-2792 (1995)
[4] K. Janné, J. Pettersen, N.-O. Lindberg, T. Lundstedt, ‘Hierarchical principal component analysis (PCA) and projection to latent structure (PLS) technique on spectroscopic data as a data pretreatment for calibration’, J. Chemometrics, 15, 203-213 (2001)

Acknowledgement: This work received funding from the French National Agency for Research (ANR) as part of the MedOOmics project, supported by the European Union’s Seventh Framework Programme for research, technological development and demonstration under grant agreement number 618127 (ARIMNet2).



Zhuoyong Zhang, Xin Zhang, Lijuan Huang

Department of Chemistry, Capital Normal University, Beijing 100048, China

Colorimetry based on AuNPs is a sensitive method for the detection of metal ions and metal oxyanions in aqueous solution. However, because it suffers from interference by co-existing species, this method is usually not suitable for multi-objective analysis in complex mixture systems. In the present work, we take advantage of the multivariate nature of chemometric methods and propose a sensitive, flexible and low-cost method for the qualitative and quantitative analysis of metal ions and metal oxyanions, based on the global ultraviolet-visible (UV-Vis) spectra of amino acid-gold nanoparticle (amino acid-AuNPs) sensors combined with chemometrics. Different amino acids (L-histidine, L-lysine, L-methionine, D-penicillamine), which can prevent the aggregation of the AuNPs in NaCl solution, were investigated to build multi-channel sensors responding to ion-induced AuNPs aggregation. The UV-Vis spectra of amino acid-AuNPs aggregation induced by Cd2+, Ba2+, Mn2+, Ni2+, Cu2+, Fe3+, Cr3+, Cr2O72-, Sn4+ and Pb2+ displayed different characteristics, and the ions were classified correctly using partial least squares-discriminant analysis (PLS-DA). Quantification of binary and ternary mixtures was also implemented: we simultaneously quantified the ions in binary and ternary mixture systems (Cr3+/Cr2O72-, Fe3+/Cd2+, Fe3+/Cr3+/Cr2O72-) using partial least squares (PLS). Data fusion methods can further improve the prediction accuracy of the PLS models built on multi-amino acid-AuNPs sensors and offer the possibility of analyzing metal ions and metal oxyanions in much more complex mixture systems.

Acknowledgment: This work was supported by the National Science Foundation of China (21705112), Scientific Research Project of Beijing Educational Committee (KM201710028009) and Youth Innovative Research Team of Capital Normal University (009175301300).




Xin Zhang1, Zhuoyong Zhang1, Roma Tauler2

1Department of Chemistry, Capital Normal University, Beijing, 100048, China
2Institute of Environmental Assessment and Water Diagnostic (IDEA-CSIC), Barcelona, 08043, Spain

Solutions obtained by the Multivariate Curve Resolution Alternating Least Squares (MCR-ALS) [1] and MCR-BANDS [2] methods are projected onto the area of feasible solutions (AFS) calculated by the FACPACK [3] method. In the presence of rotation ambiguities, MCR-ALS solutions (with optimal ALS fit) are not unique and depend on the initial estimates and applied constraints. In these cases, the MCR-BANDS method can be used to measure semi-quantitatively the extent of the rotation ambiguities associated with a particular solution. Chemical systems of different complexity are examined in detail and different scenarios are investigated. The relative positions of MCR-ALS solutions and of the MCR-BANDS maximum and minimum solutions on the AFS are shown and discussed in detail.
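A stripped-down MCR-ALS loop makes the ambiguity issue concrete: the alternating updates below drive the fit error down, but the factorization D ≈ C Sᵀ they converge to depends on the initial estimates, which is exactly why AFS and MCR-BANDS diagnostics are needed. This is a minimal sketch; clipping negative least-squares values to zero is a crude stand-in for a proper NNLS solver, and all data are synthetic.

```python
import numpy as np

def mcr_als(D, C0, n_iter=300):
    """Minimal MCR-ALS sketch with a non-negativity constraint on C and S."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)    # spectra
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)  # concentrations
    return C, S

# Two-component synthetic data: Gaussian concentration profiles and spectra.
t = np.linspace(0, 1, 60)[:, None]
w = np.linspace(0, 1, 80)[:, None]
C_true = np.hstack([np.exp(-((t - 0.35) / 0.10) ** 2),
                    np.exp(-((t - 0.60) / 0.12) ** 2)])
S_true = np.hstack([np.exp(-((w - 0.30) / 0.15) ** 2),
                    np.exp(-((w - 0.70) / 0.10) ** 2)])
D = C_true @ S_true.T

rng = np.random.default_rng(0)
C_est, S_est = mcr_als(D, C_true * rng.uniform(0.5, 1.5, C_true.shape))  # rough init
```

Different initial C0 matrices can reach the same (near-perfect) fit with rotated C and S pairs; only the product C Sᵀ is well determined.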

[1] R. Tauler, Multivariate curve resolution applied to second order data, Chemom. Intell. Lab. Syst. 30 (1995) 133-146.
[2] J. Jaumot, R. Tauler, MCR-BANDS: A user friendly MATLAB program for the evaluation of rotation ambiguities in Multivariate Curve Resolution, Chemometrics and Intelligent Laboratory Systems, 103 (2010) 96-107.
[3] M. Sawall, K. Neymeyr, A fast polygon inflation algorithm to compute the area of feasible solutions for threecomponent systems. II: Theoretical foundation, inverse polygon inflation, and FACPACK implementation, J. Chemom., 28 (2014) 633-644.

Acknowledgement: This work was supported by Natural Science Foundation of China (21705112), Scientific Research Project of Beijing Educational Committee (KM201710028009) and Youth Innovative Research Team of Capital Normal University (009175301300). Roma Tauler acknowledges the Ministerio de Economía y Competividad, Spain for the grant CTQ2015-66254-C2-1-P.




L. Rubio1, S. Sanllorente1, L.A. Sarabia2, M.C. Ortiz1

1Department of Chemistry, 2Department of Mathematics and Computation

Faculty of Sciences, University of Burgos, Plaza Misael Bañuelos s/n, 09001 Burgos (Spain)

Color is one of the most important organoleptic properties of foodstuffs, directly influencing food selection and commercial success. The food industry adds natural colors or synthetic dyes to compensate for color loss due to processing or storage, or to make foodstuffs with no inherent color more appealing to consumers. The European Union has established maximum levels [1] for some of these food additives depending on the food category.

The simultaneous determination of two food colorants, cochineal (E120) and erythrosine (E127), was achieved by means of excitation-emission fluorescence matrices and PARAFAC decomposition. Under the measurement conditions, the amount of cochineal present in the sample affected the fluorescence signal of erythrosine, since cochineal quenched the fluorescence of the other food additive; the signal of cochineal, however, was not affected by the presence of erythrosine. The quantification of erythrosine was made possible by regressing the slope of the erythrosine calibration line on the amount of cochineal, since these slopes differed depending on the amount of quencher. Using this procedure, the mean of the absolute values of the relative prediction errors was 5.86% (n = 10) for cochineal and 4.17% (n = 10) for erythrosine. In addition, both analytes were unequivocally identified.
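The slope-correction logic can be illustrated with synthetic numbers. A Stern-Volmer-type dependence of the erythrosine calibration slope on the cochineal level is assumed purely for this example; the functional form and all values used in the actual work may differ.

```python
import numpy as np

def true_slope(q):
    """Hypothetical erythrosine sensitivity at cochineal (quencher) level q."""
    return 100.0 / (1.0 + 0.08 * q)

# Calibration: slope of the erythrosine line measured at several cochineal levels.
q_cal = np.array([0.0, 5.0, 10.0, 20.0])
slopes = true_slope(q_cal)

# Regress "amount of cochineal" vs the calibration slope; under the
# Stern-Volmer assumption, 1/slope is linear in q.
coef = np.polyfit(q_cal, 1.0 / slopes, 1)

def predict_erythrosine(signal, q):
    """Quantify erythrosine after correcting the slope for the cochineal level."""
    slope_q = 1.0 / np.polyval(coef, q)   # slope expected at this quencher level
    return signal / slope_q

# A test sample with cochineal level 12 and erythrosine concentration 3.0:
pred = predict_erythrosine(true_slope(12.0) * 3.0, 12.0)
```

Because the quencher level is determined independently (here it is simply given), the quencher-corrected slope converts the measured signal back into a concentration.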

Pitted cherries in syrup were analyzed, with average recovery percentages of 122.45% for cochineal and 48.79% for erythrosine. Cochineal and erythrosine were found in those cherries at concentrations of 185.05 mg kg-1 and 10.76 mg kg-1, respectively, which were below the maximum levels [1] set for both compounds.


[1] Commission Regulation (EU) No 1129/2011 of 11 November 2011 amending Annex II to Regulation (EC) No 1333/2008 of the European Parliament and of the Council by establishing a Union list of food additives, Off. J. Eur. Union L295, 2011, pp. 1-177.

Acknowledgements: The authors wish to thank the financial support from Spanish MINECO and Junta de Castilla y León through projects (CTQ2014-53157-R) and BU012P17, respectively. Both co-financed with European FEDER funds.




S. Sanllorente1, L. Rubio1, M.C. Ortiz1, L.A. Sarabia2

1Department of Chemistry, 2Department of Mathematics and Computation

Faculty of Sciences, University of Burgos, Plaza Misael Bañuelos s/n, 09001 Burgos (Spain)

Light-emitting diodes (LEDs) are of interest in chemical analysis because they are robust, small and low-cost. There are many articles on analytical devices based on LEDs, but none has used an array of LEDs to obtain excitation-emission fluorescence matrices (EEM) for the identification and quantification of analytes.

Recently, the availability of LEDs emitting below 300 nm has extended their application range, allowing the determination of fluoroquinolones, whose presence in food is regulated.

This paper shows the results obtained in the determination of enrofloxacin by means of a portable low-cost fluorimeter (PTF) with four LEDs (265, 275, 280 and 295 nm) and a compact spectrometer BLACK-Comet-SR (StellarNet). At the same time, all the samples were measured on a PerkinElmer LS50B equipped with a xenon discharge lamp (MAF).

Three types of samples were considered: calibration, test and transference. PARAFAC models were built from the EEM of the calibration samples (for both the PTF and the MAF), and the test samples were then quantified. Using the transference samples and the procedure of ref. [1], the signal transfer from the PTF to the MAF instrument was carried out. Next, the EEM of the test samples recorded on the PTF were transferred, and the determination was performed with the calibration of the MAF instrument.
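Since ref. [1] describes the actual EEM transfer procedure, the sketch below only illustrates the general idea of standardization between two instruments with a generic direct-standardization step on ordinary (unfolded) spectra; the linear-distortion model and all matrices are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_trans, n_chan = 15, 10   # transfer samples measured on both instruments

# Hypothetical instruments: the slave response is a linear distortion
# of the master response.
distort = np.eye(n_chan) + 0.05 * rng.normal(size=(n_chan, n_chan))
master = rng.random((n_trans, n_chan))
slave = master @ distort

# Direct standardization: least-squares map from slave space to master space,
# estimated from the samples measured on both instruments.
F = np.linalg.lstsq(slave, master, rcond=None)[0]

# New test spectra recorded on the slave are transferred into the master
# space, where the master calibration can then be applied directly.
test_master = rng.random((5, n_chan))
transferred = (test_master @ distort) @ F
```

With more transfer samples than channels and an invertible distortion, the least-squares map recovers the distortion exactly, so transferred spectra match the master measurements.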

Three cases were considered: enrofloxacin alone and its binary mixtures with ciprofloxacin or with flumequine. The mean of the absolute values of the relative errors ranged from 1.1 to 8.8%, of the same order as those obtained with the MAF and PTF calibrations.


[1] J. Thygesen, F. van den Berg, Calibration transfer for excitation–emission fluorescence measurements. Analytica Chimica Acta 705 (2011) 81–87.

Acknowledgements: The authors wish to thank the financial support from Spanish MINECO and Junta de Castilla y León through projects (CTQ2014-53157-R) and BU012P17, respectively. Both co-financed with European FEDER funds.




Robson B. Godinho1,2, Mauricio C. Santos2, Carlos Alberto Teixeira1, Ronei J. Poppi1

1Institute of Chemistry, State University of Campinas, P.O.B 6154, Campinas, SP, 13083-970, Brazil
2Givaudan do Brasil Ltda, Av. Eng. Billings, 1729 Ed. 31, São Paulo, SP, 05321-010, Brazil

The increasing demands of regulatory and safety issues in perfumery, together with the trend toward ever more efficient and streamlined manufacturing processes, make fast and accurate control of raw materials critical [1]. For this purpose, methodologies combining Raman spectroscopy with chemometric procedures have been used to discriminate essential oils for the cosmetics industry [2]. The aim of the current work was to investigate the sensitivity of a portable instrument for the classification of fragrances and the discrimination of samples spiked with different levels of an allergenic compound, using the supervised classification method soft independent modeling of class analogy (SIMCA). For this study, 40 lots of six different classes of fragrances were produced and used as a training set. The validation set consisted of a fragrance contaminated with 0.3% to 5.0% (m/m) of cinnamyl alcohol. After appropriate spectral pre-processing and variable selection, it was possible to correctly classify all 40 lots of the training set and to discriminate the samples contaminated with the allergenic substance from the original fragrance. These results are encouraging for the application of this analytical approach to the quality control of inadvertently contaminated fragrances, particularly with allergenic substances within the sensitivity range of the investigated method.

[1] Klaschka, U. Risk management by labelling 26 fragrances?, Int. J. Hyg. Environ. Health. 2010, 213, 308–320.
[2] Jentzsch, P. V.; Ramos, L. A.; Ciobot, V. Handheld Raman Spectroscopy for the Distinction of Essential Oils Used in the Cosmetics Industry, Cosmetics 2015, 2, 162–176.




Mohsen Kompany-Zareh1,2, Wrya Farrokhi-Kurd2, Saeed Bagheri2, Peter Wentzell1

1Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, PO Box 15000, Halifax, NS B3H 4R2, Canada.  
2Department of Chemistry, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan,45137-66731, Iran.;

Successive selection of least dependent variables (SSLDV) is presented as a novel algorithm for the forward selection of variables with minimum mutual information, based on higher order statistics (HOS) rules [1]. The inputs to the successive selection program are the first variable, chosen by the user, and the desired number of variables.

SSLDV was utilized in unsupervised spectroscopic classification problems. It resulted in better clustering of different classes of ink samples when applying PCA, independent components analysis (ICA) [2] and projection pursuit (PP) [3] to NIR data. In the case of the NIR corn data [4], this approach gave better separation of the classes. A kurtosis limitation was applied before selection.

SSLDV was also applied in the UV-Vis spectroscopic calibration of a number of transition metal ions [5]. The RMSEP from the set of variables selected by SSLDV was about 0.14 mg.l-1, comparable to that of the successive projections algorithm (SPA). The technique was further used on the NIR corn data set for the quantification of moisture, oil, protein and starch; the RMSEP values for SPA and SSLDV were similar and acceptable. The result of applying SSLDV to calibration data sets depends on the nature of the data.
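The forward-selection idea can be sketched as follows, using a symmetric fourth-order cross-cumulant as the pairwise dependence measure. This is a guess at the spirit of SSLDV made only for illustration, not the published algorithm; the toy data are synthetic.

```python
import numpy as np

def cross_cum(x, y):
    """Fourth-order cross-cumulant of standardized x, y (zero under independence)."""
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    return np.mean(x * x * y * y) - 1.0 - 2.0 * np.mean(x * y) ** 2

def ssldv(X, first, k):
    """Forward selection: start from a user-chosen column, then repeatedly add
    the column least dependent (smallest worst-case |cross-cumulant|) on the
    columns already selected."""
    selected = [first]
    while len(selected) < k:
        candidates = [j for j in range(X.shape[1]) if j not in selected]
        best = min(candidates, key=lambda j: max(
            abs(cross_cum(X[:, j], X[:, s])) for s in selected))
        selected.append(best)
    return selected

# Toy data: column 1 nearly duplicates column 0; column 2 is independent.
rng = np.random.default_rng(0)
s1, s2 = rng.uniform(-1, 1, 5000), rng.uniform(-1, 1, 5000)
X = np.column_stack([s1, s1 + 0.05 * rng.normal(size=5000), s2])
```

Starting from column 0, the procedure skips the near-duplicate column 1 (large cross-cumulant magnitude) and picks the independent column 2 instead.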

[1]  Blaschke, T. and L. Wiskott, CuBICA IEEE Transactions on Signal Processing, 2004. 52(5):  1250-6.
[2]  Rutledge, D.N. and D.J.-R. Bouveresse, TrAC Trends in Anal Chem, 2013. 50:  22-32.
[3]  Hou, S. and P. Wentzell, Analytica chimica acta, 2011. 704(1-2):  1-15.
[4]  NIR of Corn Samples Data Set. 2005 Wed, June 1, 2005; Available from:
[5]  Araújo, M.C.U., et al., Chemom Intell Lab Syst, 2001. 57(2):  65-73.




Mohsen Kompany-Zareh1,2, Peter D. Wentzell2, Saeed Bagheri1, Chelsi Wicks2

1Chemistry Department, Institute for Advanced Studies in Basic Sciences (IASBS), Zanjan, Iran 
2Department of Chemistry, Dalhousie University, Halifax, NS, B3H 4J3 Canada

Independent components analysis (ICA) is a method for transforming data into components that are statistically as independent from each other as possible [1]. In ICA-JADE, the measure of choice for statistical independence is the set of fourth-order cross-cumulants, which are among the higher order statistical (HOS) parameters and are representative of mutual information [2]. Practical implementations of ICA include the rotation of whitened data to minimize a measure of dependence [3].

Here, we minimized the statistical dependence between selected PCA vectors using the particle swarm optimization (PSO) algorithm, a powerful global optimization technique. In each step, a pair of PCA scores is rotated by an angle defined in an orthogonal rotation matrix; the orthogonal rotation retains the whitened shape of the data. The statistical dependence at every step is calculated from the fourth-order cross-cumulants.

By optimizing the orthogonal rotation angles with PSO, the statistical dependence between the rotated PCA vectors comes close to zero, indicating the rotation of the PCA vectors into ICs. As expected, the magnitude of the mutual information diminishes as the statistical dependence between the ICs is minimized. This procedure was successfully applied to several data sets for clustering purposes. The scatter plots of the ICs show that minimizing the statistical dependence leads to better separation and clustering of the classes in the data.
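A one-pair version of this rotation is easy to reproduce. In the sketch below, a simple grid search over the rotation angle stands in for PSO, and a fourth-order cross-cumulant serves as the dependence measure; the sources and mixing matrix are synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
S = np.vstack([rng.uniform(-1, 1, n), rng.laplace(0, 1, n)])  # independent sources
A = np.array([[1.0, 0.6], [0.4, 1.0]])                         # mixing matrix
X = A @ S

# Whitening via PCA: rotate/scale X so its sample covariance is the identity.
X = X - X.mean(axis=1, keepdims=True)
U, sv, _ = np.linalg.svd(X @ X.T / n)
Z = np.diag(1.0 / np.sqrt(sv)) @ U.T @ X

def cross_cum(x, y):
    """Fourth-order cross-cumulant E[x^2 y^2] - E[x^2]E[y^2] - 2 E[xy]^2."""
    return (np.mean(x * x * y * y) - np.mean(x * x) * np.mean(y * y)
            - 2.0 * np.mean(x * y) ** 2)

# Grid search over the orthogonal rotation angle (a stand-in for PSO).
thetas = np.linspace(0.0, np.pi / 2, 181)
best = min(thetas, key=lambda t: cross_cum(
    np.cos(t) * Z[0] + np.sin(t) * Z[1],
    -np.sin(t) * Z[0] + np.cos(t) * Z[1]) ** 2)
R = np.array([[np.cos(best), np.sin(best)], [-np.sin(best), np.cos(best)]])
ICs = R @ Z                                                    # estimated ICs
```

Because rotation preserves the whitened covariance, the angle at which the cross-cumulant vanishes aligns the rotated components with the independent sources (up to sign and permutation).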

[1] D.N. Rutledge, D. J. R. Bouveresse, Trends in Analytical chemistry, 2013, 50, 22–32.
[2] T.-H. Le and M. Berthier, Advances in Information and Computer Security, 2010, 285-300.
[3] J. F. Cardoso, Journal of Machine Learning Research, 2003, 4, 1177-1203.




Chelsi C. Wicks1, Peter D. Wentzell1, Jez W.B. Braga2, Liz F. Soares2, Tereza C.M. Pastore3, Vera T.R. Coradin3, Fabrice Davrieux4

1Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, PO Box 15000, Halifax, NS  B3H 4R2  Canada
2Chemistry Institute, University of Brasilia, Brasília, 72910-000, Brasilia, DF, Brasil
3Forest Products Laboratory, Brazilian Forest Service, 70818-970, Brasilia, DF, Brasil
4French Agricultural Research Center for International Development, CIRAD-UMR Qualisud, F-34398, Montpellier Cedex 5, France

The analysis of multivariate chemical data is commonplace in fields ranging from metabolomics to forensic classification.  Many of these studies rely on exploratory visualization methods that represent the multidimensional data in spaces of lower dimensionality, such as hierarchical cluster analysis (HCA) or principal components analysis (PCA).  However, such methods rely on assumptions of independent measurement errors with uniform variance and can fail to reveal important information when these assumptions are violated, as they often are for chemical data.  This work demonstrates how two alternative methods, maximum likelihood principal components analysis (MLPCA) and projection pursuit analysis (PPA), can reveal chemical information hidden from more traditional techniques.  Experimental data to compare different methods consists of near-infrared (NIR) reflectance spectra from 108 samples of wood that are derived from four different species of Brazilian trees.  The measurement error characteristics of the spectra are examined and it is shown that, by incorporating measurement error information into the data analysis (through MLPCA) or using alternative projection criteria (PPA), samples can be separated by species.  These techniques are proposed as powerful tools for multivariate data analysis in chemistry.




Carolina S. Silva1*, Maria Fernanda Pimentel2, José Manuel Amigo1,3, Carmen Garcia-Ruiz4, Fernando Ortega-Ojeda4

1Department of Fundamental Chemistry, Federal University of Pernambuco, Recife, Brazil; 
2Department of Chemical Engineering, Federal University of Pernambuco, Recife, Brazil;
3Department of Food Science, University of Copenhagen, Copenhagen, Denmark
4Department of Analytical Chemistry, Physical Chemistry and Chemical Engineering and University Institute of Research in Police Sciences (IUICP), University of Alcalá, Alcalá de Henares (Madrid), Spain

Document dating is a major problem in forensic sciences. The variety of paper compositions, degradation mechanisms and storage conditions makes the study of the aging process a complex topic. The chemical composition of paper influences its degradation process; therefore, to estimate a document's age, the variability associated with its inorganic fillers should be attenuated. This work proposes a nondestructive methodology based on Fourier Transform Infrared Spectroscopy (FTIR) that employs different chemometric approaches to build a single model able to date naturally aged documents. FTIR-ATR spectra of documents from 15 different years (between 1985 and 2012) were acquired: five documents per year, containing five sheets each, were analyzed, with eight spectra per sheet recorded in the 4000-650 cm-1 spectral range at a resolution of 4 cm-1 and 32 scans per spectrum. PLS models were built employing Generalized Least Squares Weighting (GLSW) and Orthogonal Least Squares (OLS) filters to reduce the variability among samples from the same year. Afterwards, sparse PLS (sPLS), an extension of PLS that includes a variable selection step, was applied and its performance compared with that of the preprocessing filters. All the proposed techniques showed improvements over the initial PLS models (R2 = 0.75, RMSEP = 4.4 years), with the sPLS model (R2 = 0.89, RMSEP = 3.8 years) giving the best results. This demonstrates the potential of chemometric approaches applied to FTIR data for estimating the age of unknown documents.

Acknowledgement: the authors thank the Spanish General Commissary of Scientific Police (Documentoscopy section, Spain) for providing the analyzed documents, the funding agencies INCTAA (Processes nº.: CNPq 573894/2008-6; FAPESP 2008/57808-1), NUQAAPE –FACEPE (APQ-0346-1.06/14), Núcleo de Estudos em Química Forense – NEQUIFOR (CAPES AUXPE 3509/2014, Edital PROFORENSE 2014), CNPq, FACEPE and CAPES.




Larissa C. Richards1,2, Nicholas G. Davey2, Chris G. Gill1,2, Erik T. Krogh1,2

1Department of Chemistry, University of Victoria, 3800 Finnerty Road, Victoria, B.C., Canada, V8P 5C2
2Applied Environmental Research Labs, Department of Chemistry, Vancouver Island University, 900 Fifth Street, Nanaimo, B.C., Canada, V9R 5S5

Volatile and semi-volatile organic compounds (S/VOCs) are important atmospheric pollutants affecting both human and environmental health. These compounds are emitted from a wide variety of point and non-point sources, both natural and anthropogenic, and their atmospheric distributions can vary widely over time and space. Direct sampling mass spectrometry techniques such as membrane introduction mass spectrometry (MIMS) and proton-transfer reaction time-of-flight mass spectrometry (PTR-ToF-MS) can be used to continuously measure S/VOCs as unresolved mixtures. Both MIMS and PTR-ToF-MS can be operated in a moving vehicle, which produces temporally and spatially resolved mass spectral data. The recent completion of a mobile mass spectrometry lab allows chemical information to be mapped at the regional and neighbourhood scale.

We describe the application of chemometric techniques such as principal component analysis (PCA), to discriminate samples, and multivariate curve resolution-alternating least squares (MCR-ALS), to extract pure component contributions, from full-scan MIMS spectra of lab-constructed samples. PCA has also been applied to real-world mass spectral data collected from the MIMS and PTR-ToF-MS systems in a moving vehicle in order to discriminate air masses impacted by anthropogenic activities and biogenic processes. Since the mobile mass spectrometry lab allows flexibility in sampling locations, these techniques have also been extended to data collected from the ambient air inside a series of coffee roasting and beer brewing establishments during production cycles. This work has applications in environmental forensics and occupational exposure studies, including the source identification and apportionment of atmospheric pollutants.

Acknowledgement: This work has been supported by funding from the Natural Science and Engineering Research Council of Canada (RGPIN-2016-06454), the Canadian Foundation of Innovation (#32238), MITACS Accelerate, and the Fraser Basin Council's British Columbia Clean Air Research Fund.




Huimin Zhao, Yaofang Fan

Key Laboratory of Industrial Ecology and Environmental Engineering (Ministry of Education, China), School of Environmental Science and Technology, Dalian University of Technology, Linggong Road 2, Dalian 116024, PR China.

Nanozymes, nanomaterials with enzymatic characteristics, have received tremendous attention as an emerging research area because of their potentially low cost, enhanced mass transport, large surface-to-volume ratio, stability against denaturation and adjustable catalytic activity compared with natural enzymes [1]. Recently, several kinds of two-dimensional (2D) nanomaterials, such as carbon nanomaterials and metal-based nanomaterials, have also been shown to possess high intrinsic horseradish peroxidase-like catalytic activity [2]. However, the wide application of most of these materials is limited by complicated operation processes, time-consuming preparation, high prices and the perceived toxicity of heavy metals. Graphitic carbon nitride (g-C3N4), a carbon-based, 2D metal-free semiconductor nanomaterial, possesses a graphite-like structure but also some distinct properties, and has become a new research hotspot [3]. However, the unavoidable agglomeration of 2D g-C3N4 can lead to a drastic loss of active sites. Developing high-quality g-C3N4 as a nanozyme has therefore been an urgent task.

Hence, we successfully fabricated three-dimensional (3D) g-C3N4 with a huge surface area and higher peroxidase-like activity toward the TMB substrate in the presence of H2O2. Admittedly, enhancing the rather low surface catalytic activity of the unmodified nanomaterial remains a formidable challenge. The surface of g-C3N4 also offers a high density of π-π stacking, allowing it to provide a broadly suitable surface for conjugation with ssDNA, although the reaction mechanism has been little explored. Herein, we have investigated the surface chemistry of these reactions and report that ssDNA-modified 3D g-C3N4 exhibits significantly enhanced peroxidase-like activity for H2O2 detection, and we applied it successfully to OTC detection.


[1] X. Wang, Y. Hu, H. Wei, Inorg. Chem. Front., 3, 41-60 (2016)
[2] B.W. Liu, J.W. Liu, Nanoscale, 7, 13831-13835 (2015)
[3] W.J. Ong, L.L. Tan, Y.H. Ng, S.T. Yong, S.P. Chai, Chemical Reviews, 116, 7159-7329 (2016)

Acknowledgement: This work was financially supported by the National Natural Science Foundation of China (No. 21777012).




O. Devos1, J.P. Placial2, R. Métivier2, M. Sliwa1, C. Ruckebusch1

1Univ. Lille, CNRS, UMR 8516 - LASIR - Laboratoire de Spectrochimie et Raman, F-59000 Lille, France
2ENS Cachan, CNRS, UMR 8531 - PPSM - Laboratoire de Photophysique et Photochimie  Supramoléculaires et Macromoléculaires, F-94235 Cachan Cedex, France

Photochemical processes are chemical processes that occur in the excited state and are induced by light absorption. Their dynamics are usually investigated with UV-visible spectroscopy under continuous monochromatic irradiation. Dynamic studies aim at determining the photochromic parameters, the quantum yields of each reaction [1] and the spectra of the different species. Quantum yields can be obtained by fitting photochemical models (hard-modelling) to the data. However, the situation is more complicated for photochemical processes than for conventional kinetic processes: all species absorbing at the irradiation wavelength compete to absorb light, which leads to the introduction of a photokinetic factor F(t) and results in sets of nonlinear differential equations. These equations can only be solved numerically, using Euler or Runge-Kutta methods.
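For a simple A → B photoreaction, the nonlinear system mentioned above can be written down and integrated numerically with a Runge-Kutta scheme. All parameter values below are hypothetical and chosen only to make the sketch run; the CMTE system studied here is more complex.

```python
import numpy as np
from scipy.integrate import solve_ivp

phi, I0, path = 0.5, 1e-6, 1.0        # quantum yield, photon flux, path length (cm)
eps_A, eps_B = 1.0e4, 2.0e3           # absorptivities at the irradiation wavelength

def rhs(t, c):
    cA, cB = c
    A_tot = (eps_A * cA + eps_B * cB) * path               # total absorbance
    # Photokinetic factor F = (1 - 10^-A)/A, tending to ln(10) as A -> 0.
    F = (1 - 10.0 ** (-A_tot)) / A_tot if A_tot > 1e-12 else np.log(10.0)
    rate = phi * I0 * eps_A * path * cA * F                # light absorbed by A that reacts
    return [-rate, rate]

# Integrate with the default Runge-Kutta (RK45) solver of solve_ivp.
sol = solve_ivp(rhs, (0.0, 2000.0), [1e-4, 0.0], rtol=1e-8, atol=1e-12)
```

The photokinetic factor makes the rate depend nonlinearly on both concentrations through the total absorbance, which is why no closed-form solution exists and numerical integration is required.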

The incorporation of hard models as constraints in MCR-ALS (HS-MCR) [2] is known to drastically reduce rotational ambiguity and to allow the extraction of kinetic parameters in situations where complete knowledge of the system is not available (interfering species, instrumental artifacts, etc.). In this work, photochemical HS-MCR was applied to the study of CMTE (cis-1,2-dicyano-1,2-bis(2,4,5-trimethyl-3-thienyl)ethene) under continuous irradiation at different wavelengths (405 nm and 365 nm) in a multi-experiment approach. For irradiation at 405 nm the system is known [3]: CMTE can undergo photoreversible photoisomerization and photocyclization, leading to three possible forms, two open forms (cis/trans) and one closed-ring form. For irradiation at 365 nm, the formation of an unidentified product has to be considered, and no complete model can be postulated. Multiset photochemical HS-MCR allowed extracting as much information as possible (quantum yields, pure spectra, etc.) to describe the photodynamics of CMTE.

[1] M.H. Deniel, D. Lavabre, J.C. Micheau, Photokinetics under Continuous Irradiation, in: J.C. Crano, R.J. Guglielmetti (Eds.), Org. Photochromic Thermochromic Compd., Springer US, 2002: pp. 167–209.
[2] A. de Juan, M. Maeder, M. Martnez, R. Tauler, Application of a novel resolution approach combining soft- and hard-modelling features to investigate temperature-dependent kinetic processes, Anal. Chim. Acta. 442 (2001) 337–350.
[3] A. Spangenberg, J.A.P. Perez, A. Patra, J. Piard, A. Brosseau, R. Métivier, K. Nakatani, Probing photochromic properties by correlation of UV-visible and infra-red absorption spectroscopy: a case study with cis-1,2-dicyano-1,2-bis(2,4,5-trimethyl-3-thienyl)ethene, Photochem. Photobiol. Sci. 9 (2010) 188–193.



Sergio Borraz1, Joan Simó1,3, Ricard Boqué2, Silvia Sans1, Anna Gras1

1Dept. of Agri-Food Engineering and Biotechnology, Universitat Politècnica de Catalunya. Campus Baix Llobregat, Esteve Terrades 8, 08860 Castelldefels, Spain
2 Universitat Rovira i Virgili, Dept. of Analytical Chemistry and Organic Chemistry. Campus Sescelades, 43007 Tarragona, Spain
3Fundació Miquel Agustí, Campus Baix Llobregat, Esteve Terrades 8, 08860 Castelldefels, Spain

Nursery plant companies that handle huge volumes of plants face an important problem: avoiding varietal mixtures. Traditional identification methods are based on genetic analysis and are expensive and complex. Near-infrared spectroscopy (NIRS) has been used in several studies on species authentication [1] and could be a faster and cheaper alternative to traditional methods. The overall objective of this work is to assess the potential of NIRS to classify varieties of almond tree (Prunus dulcis) that are genetically very close and morphologically indistinguishable.

First, it was necessary to develop a sampling protocol and to select the most suitable plant material. Two almond trees of each of seven varieties of agricultural interest were used to obtain seventy-three leaf and wood samples. Each sample was dried and ground to obtain a homogeneous powder. Samples were scanned in reflectance mode using a TANGO FT-NIR spectrometer (Bruker, Billerica, MA).

Partial least squares discriminant analysis (PLS-DA) gave good classification rates in cross-validation. For the leaf samples, 85–100% correct classification was achieved depending on the variety, with errors ranging from 2.2% to 4.5% for the individual varieties. For the wood samples, the correct classification rate was between 72% and 100%, with errors ranging from 2.9% to 8.8%. The results of this preliminary study show the potential of NIR spectroscopy for the classification of almond tree varieties.

[1] P. Wang, Z. Yu, Species authentication and geographical origin discrimination of herbal medicines by near infrared spectroscopy: A review, Journal of Pharmaceutical Analysis (2015) 277–284.

Acknowledgment: The authors thank the financial support provided by DI-COF 2017




Silvia Sans1,2, Joan Ferré3, Ricard Boqué3, Joan Casals1,2, Sergio Borraz1, Joan Simó1,2

1Universitat Politècnica de Catalunya, Dept. of Agri-Food Engineering and Biotechnology, Campus Baix Llobregat, Esteve Terrades 8, 08860 Castelldefels, Spain
2Fundació Miquel Agustí, Campus Baix Llobregat, Esteve Terrades 8, 08860 Castelldefels, Spain
3Universitat Rovira i Virgili, Dept. of Analytical Chemistry and Organic Chemistry, 43007 Tarragona, Spain

‘Calçots’ are the immature floral stems of second-year resprouts of onions from the ‘Blanca Tardana de Lleida’ landrace. In their traditional area of cultivation, the European Union has established the Protected Geographical Indication (PGI) ‘Calçot de Valls’. Sensory analysis for quality control requires trained panelists using standardized working methods. This approach involves considerable effort and is not suitable for large numbers of samples. Near-infrared spectroscopy (NIRS) has been widely used for food evaluation and can be a faster and cheaper alternative for predicting organoleptic traits.

This work presents a first attempt to estimate the sweetness of cooked ‘calçots’, one of the key traits driving consumer acceptance [1], using NIRS and partial least squares (PLS) regression. For that, 85 samples were analyzed in two different sample preparation modes: puree, and the flour obtained after drying and grinding the puree. The goodness of fit of the models was assessed using the coefficient of determination for prediction (R2) and the root mean square error of prediction (RMSEP).

The results showed that NIRS can roughly estimate sweetness in cooked ‘calçots’. The predictive abilities of the PLS models from puree and flour were similar (R2 = 0.44 and 0.47; RMSEP = 0.85 and 0.83, respectively); puree therefore seems the better laboratory procedure, as it involves less work. Given these results, a class-separation approach could provide more accuracy. In conclusion, NIRS can be a useful tool for screening large collections of samples to discard accessions, reducing the number of samples to be analyzed by the trained panel.

[1] Simó, J., Romero del Castillo, R., & Casañas, F. (2012). Tools for breeding 'calçots' (Allium cepa L.), an expanding crop. African Journal of Biotechnology, 11(50), 11065–11073.

Acknowledgment: The authors thank the financial support provided by ACCIÓ (RD14-1-004) and FI-DGR 2015.




Cassiano L. S. Costa1, Daleska P. Ramos1, Elias G. Vieira1, Fernanda A. F. Almeida1, Juliana B. Silva1

1Radiopharmaceutical Production and Development Unit, Nuclear Technology Development Center (CDTN), Av. Antônio Carlos, 6627 Campus UFMG, Pampulha, Belo Horizonte, Brazil.

Carbon-11-labeled N-butan-2-yl-1-(2-chlorophenyl)-N-methylisoquinoline-3-carboxamide ([11C]PK11195) is used as a positron emission tomography (PET) radiopharmaceutical for imaging neuroinflammatory conditions in humans, such as glial neoplasms, multiple sclerosis, amyotrophic lateral sclerosis, and Parkinson’s disease [1]. Organic solvents (acetone, ethanol, acetonitrile and dimethyl sulfoxide) are used in the [11C]PK11195 production process and their residues may be present in the final product, where they can cause toxic effects if above acceptable limits. Thus, the determination of residual solvents is necessary to ensure patient safety and meet regulatory requirements [2]. In this study, chemometric tools were applied to develop an efficient and fast procedure for the determination of residual solvents in [11C]PK11195 by gas chromatography with flame ionization detection (GC-FID). A 2^(6-2) fractional factorial design (resolution IV) was used to select the significant factors among carrier gas flow (mL/min), time held at the initial temperature (min), temperature gradient (°C/min), injection split ratio, detector temperature (°C) and injector temperature (°C). A central composite design was then used to determine the optimum conditions for the two significant factors (injection split ratio and time held at the initial temperature). Two responses were monitored: analysis time and the resolution between the ethanol and acetonitrile peaks; a desirability function was constructed from these responses. The chemometric tools proved effective in achieving a satisfactory optimization of the chromatographic conditions. The analytical procedure was validated according to the INMETRO and ANVISA method validation guidelines [3,4]. Recovery ranged from 72.5% to 109.9% and precision from 2.4% to 11%. Thus, the developed GC-FID methodology can be applied routinely.
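The desirability step can be sketched as follows. This is a generic Derringer-type construction under assumed (hypothetical) acceptability limits, not the authors' actual settings: analysis time is to be minimized, peak resolution maximized, and the individual desirabilities are combined through their geometric mean.

```python
import math

def d_minimize(y, target, worst):
    # Desirability for a response to be minimized:
    # 1 at or below the target, 0 at or beyond the worst value, linear in between
    if y <= target:
        return 1.0
    if y >= worst:
        return 0.0
    return (worst - y) / (worst - target)

def d_maximize(y, worst, target):
    # Desirability for a response to be maximized (mirror image of d_minimize)
    if y >= target:
        return 1.0
    if y <= worst:
        return 0.0
    return (y - worst) / (target - worst)

def overall_desirability(ds):
    # Geometric mean of the individual desirabilities;
    # any fully undesirable response (d = 0) makes the overall value 0
    if any(d <= 0.0 for d in ds):
        return 0.0
    return math.exp(sum(math.log(d) for d in ds) / len(ds))
```

The geometric mean is the usual choice because it forces a compromise: a setting that is excellent on one response but unacceptable on the other scores zero overall.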

[1] P. Luoto, I. Laitinen, S. Suilamo, K. Någren, A. Roivainen, Mol. Imag. and Biology, 12, 435-442 (2010)
[2] M. Tankiewicz, J. Namieśnik, W. Sawicki, Trends in Analytical Chemistry, 80, 328-344 (2016)
[3] INMETRO. Orientação sobre validação de métodos analíticos. DOQ-CGCRE-008 Rev. 06 (2017). Accessed March 2018.
[4] ANVISA. Dispõe sobre a validação de métodos analíticos e dá outras providências. RDC n° 166 de 24 de julho de 2017. Accessed March 2018.

Acknowledgement: Radiopharmaceutical Production and Development Unit team; the following Brazilian institutions support: Nuclear Technology Development Center (CDTN), National Nuclear Energy Commission (CNEN), Research Support Foundation of the State of Minas Gerais (FAPEMIG), Brazilian Council for Scientific and Technological Development (CNPq) and Coordination for the Capacitation of Graduated Personnel (CAPES).




Cassiano L. S. Costa, Nicolly N. Santos, Juliana B. Silva, Soraya M. Z. M. D. Ferreira

Radiopharmaceuticals Service, Nuclear Technology Development Center (CDTN), Av. Antônio Carlos, 6627 Campus UFMG, Pampulha, Belo Horizonte, Brazil.

16α-[18F]fluoroestradiol ([18F]FES) is an imaging agent for positron emission tomography (PET). [18F]FES binds estrogen receptors and is used for breast cancer diagnosis [1,2]. As byproducts can be formed during the radiosynthesis of [18F]FES, the aim of this work was to develop a procedure to assess the radiochemical and chemical purity of [18F]FES using high-performance liquid chromatography with ultraviolet and radioactivity detectors (HPLC-UV-RA). The HPLC-UV-RA procedure was developed according to a multivariate experimental design [3]. A 2^3 full factorial design was used to investigate the influence of the initial concentration of acetonitrile in the mobile phase (MPi), the start time of the gradient elution (T0) and the final concentration of acetonitrile (MPf) on the resolution between [18F]FES and a reported radiochemical impurity [4]. The Pareto chart showed that none of the factors or their interactions were significant, meaning that, within the studied levels, no final optimisation of the variables was needed. The level of each factor was instead chosen from the full factorial design data (MPi = 30% v/v, T0 = 5 min and MPf = 90% v/v). Under these conditions, the best resolution obtained was 1.10, which meets the United States Pharmacopeia (USP) criterion (>1.0). The chemometric approach enabled a refinement of the factor levels using few experiments. As a continuation of this work, the developed procedure will be validated to demonstrate that it produces reliable results and is suitable for its intended purpose.
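As an illustration of the screening logic (a generic sketch, not the authors' code), a 2^3 full factorial design in coded units and the corresponding main-effect estimates can be generated as follows; the responses used below are hypothetical.

```python
from itertools import product

def full_factorial_2k(k):
    # Coded design matrix (-1/+1 levels) for a 2^k full factorial design:
    # every combination of low/high levels of the k factors
    return [list(run) for run in product([-1, 1], repeat=k)]

def main_effects(design, responses):
    # Main effect of factor j = mean response at the +1 level
    # minus mean response at the -1 level
    k = len(design[0])
    effects = []
    for j in range(k):
        hi = [r for run, r in zip(design, responses) if run[j] == 1]
        lo = [r for run, r in zip(design, responses) if run[j] == -1]
        effects.append(sum(hi) / len(hi) - sum(lo) / len(lo))
    return effects
```

Effects estimated this way are what a Pareto chart ranks; interaction effects are computed analogously from products of coded columns.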

[1] L. Sundarajan, H. Linden, J. Link, K. Krohn, D. Mankoff, Semin. Nucl. Med., 37, 470-476 (2007)
[2] G. Liao, S. Clark, E. Schubert, D. Mankoff, J. Nucl. Med., 57, 1269-1275 (2016)
[3] L. Candioti, M. Zan, M. Cámara, H. Goicoechea, Talanta, 124, 123–138 (2014)
[4] K. Knott, D. Gratz, S. Hubner, S. Juttler, C. Zankl, M. Muller, J Label Compd Radiopharm., 54, 749-753 (2011).

Acknowledgement: Radiopharmaceuticals Service team; the following Brazilian institutions support: Nuclear Technology Development Center (CDTN), National Nuclear Energy Commission (CNEN), Research Support Foundation of the State of Minas Gerais (FAPEMIG).




Neirivaldo Cavalcante da Silva1, Leandro de Moura França2, José Manuel Amigo3, Manel Bautista4, Maria Fernanda Pimentel1

1Department of Chemical Engineering, Federal University of Pernambuco, Recife - PE, Brazil
2Department of Pharmacy, Federal University of Pernambuco, Recife - PE, Brazil
3Department of Fundamental Chemistry, Federal University of Pernambuco, Recife - PE, Brazil
4Novartis Pharma AG, CH-4056 Basel, Switzerland

The aim of this work is to demonstrate the potential of heterogeneity curves [1] and the percentage of homogeneity (%H) [2] for assessing homogeneity in single-channel non-binary (SCNB) images (i.e., grayscale images). A more representative homogeneity index for SCNB images is obtained by including more realistic assumptions, namely that the objects are divisible and/or smaller than the pixel size, i.e., one pixel can contain a mixture of different objects. The proposed %H is an absolute homogeneity percentage that uses only the self-contained information of a single image, with no need for the additional modelling steps described in most works addressing this issue. The mean squares successive difference test (MSSDT) algorithm was applied to assess the blending end-point of simulated powder mixing batches based on the %H value, since its profile is expected to reach a steady state as the blending process reaches a stable stage of homogeneity. The results show that %H has high potential for industrial applications where distributional homogeneity is a critical process parameter for ensuring the quality of the final product.
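The steady-state detection idea behind the MSSDT can be sketched with the von Neumann ratio of the mean squared successive difference to the variance: values near 2 indicate a random (stable) series, while a drifting profile gives a ratio well below 2. This is a generic sketch of the test statistic, not the authors' implementation.

```python
def mssd_ratio(x):
    # von Neumann ratio: mean squared successive difference / sample variance.
    # ~2 for an uncorrelated (steady-state) series, <2 for a trending series.
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / (n - 1)
    mssd = sum((x[i + 1] - x[i]) ** 2 for i in range(n - 1)) / (n - 1)
    return mssd / var
```

Applied to a sliding window of %H values, the blending end-point would be declared once the ratio stops being significantly below 2.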


[1] P.M.C. Lacey, Developments in the theory of particle mixing, J. Appl. Chem. 4 (1954) 257–268. doi:10.1002/jctb.5010040504.
[2] L. de Moura França, J.M. Amigo, C. Cairós, M. Bautista, M.F. Pimentel, Evaluation and assessment of homogeneity in images. Part 1: Unique homogeneity percentage for binary images, Chemom. Intell. Lab. Syst. 171 (2017) 26–39. doi:10.1016/j.chemolab.2017.10.002.

Acknowledgement: CAPES and FACEPE for the financial support, CNPQ, NUQAAPE and INCTAA.




Jun-Li Xu, Da-Wen Sun, Aoife A. Gowen

School of Biosystems and Food Engineering, University College Dublin, Belfield, Dublin 4, Ireland.

This work proposes a method to characterize spatial variability in hyperspectral images using variograms coupled with principal component analysis (PCA). The method starts with the computation, for each wavelength image, of the mean semivariogram over three directions: row, column and diagonal. The semivariograms are arranged into a matrix of size λ×L (wavelengths × lags), to which PCA is applied. The main differences between wavelengths in describing spatial variability are thereby captured in the score values, and wavelengths with similar scores tend to share a similar spatial pattern. Projecting the PC scores onto the mean-centered dataset yields PC-variogram images that describe the spatial variance. To verify the effectiveness of the method, simulated data and a remote sensing image (380–2500 nm) were used. The simulated data confirmed that the major spatial pattern can be visualized in the PC1-variogram image with reduced noise. The first two PC-variogram images of the remote sensing hypercube, consisting of tree, soil, road and water, were extracted; 70% of the pixels were used for calibration and the remaining 30% as a test set. The accuracy of the discriminant analysis reached 93% for calibration and 95% for prediction, while conventional PCA applied to the raw hypercube gave 90% for calibration and 92% for prediction, demonstrating the advantage of incorporating spatial analysis into hyperspectral image analysis.
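For reference, the per-wavelength semivariogram computation can be sketched as below (here only along the row direction for a single band image; the full method described above averages the row, column and diagonal directions). This is an illustrative sketch, not the authors' code.

```python
import numpy as np

def semivariogram(band, max_lag):
    # Empirical semivariogram of a 2-D band image along the row direction:
    # gamma(h) = mean of 0.5 * (z(x) - z(x+h))^2 over all pixel pairs at lag h
    gammas = []
    for h in range(1, max_lag + 1):
        diffs = band[:, h:] - band[:, :-h]
        gammas.append(0.5 * np.mean(diffs ** 2))
    return np.array(gammas)
```

Stacking one such gamma vector per wavelength produces the λ×L matrix on which PCA is then run.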



NoiseGen - Measurement Error Simulation Software

Stephen Driscoll, Peter Wentzell

Trace Analysis Research Centre, Department of Chemistry, Dalhousie University, PO Box 15000, Halifax, NS B3H 4R2 Canada

The simulation of analytical measurement errors is important in the evaluation and development of data processing methods. Unfortunately, most approaches to simulating analytical signals in the literature rely on the simulation of iid normal noise (independent and identically distributed from a normal distribution). Although most analytical signals do exhibit some iid normal noise, it is rarely dominant and is almost always accompanied by heteroscedastic and correlated errors. NoiseGen, developed as a MATLAB toolbox [1], was created to make the simulation of realistic analytical measurement errors efficient and approachable. The NoiseGen algorithm revolves around rotating and scaling iid normal noise via the theoretical error covariance matrix (ECM) of the desired noise, making the simulation of any combination of errors possible when the ECM is known. This poster presents the main development points of the NoiseGen software as well as practical examples of the software in use.
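The core rotate-and-scale idea can be sketched in a few lines (NoiseGen itself is a MATLAB toolbox; this Python/NumPy fragment only illustrates the principle): iid normal draws are multiplied by a Cholesky factor of the target ECM, so the generated errors have approximately the desired covariance, including heteroscedastic and correlated structure.

```python
import numpy as np

def correlated_noise(ecm, n_samples, seed=None):
    # Rotate/scale iid normal noise with the Cholesky factor L of the
    # error covariance matrix (ECM): if cov(z) = I, then cov(z @ L.T) = ECM
    rng = np.random.default_rng(seed)
    L = np.linalg.cholesky(ecm)          # ECM must be positive definite
    z = rng.standard_normal((n_samples, ecm.shape[0]))
    return z @ L.T
```

Any error structure expressible as an ECM (correlated, heteroscedastic, or mixtures) can be simulated this way.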

[1] MATLAB, version R2016b, The MathWorks Inc., Natick, Massachusetts, 2016.




R.R. de Oliveira1, A. Gómez-Sánchez1, C. Avila2, F. Muller2, K. Pissaridi3, K. Krassa3, J. Ferré4, A. de Juan1

1Dept. Chemical Engineering and Analytical Chemistry, Universitat de Barcelona, Diagonal, 645, 08028 Barcelona, Spain.
2School of Chemical and Process Engineering, University of Leeds, Leeds, United Kingdom.
3Megara Resins Anastasios Fanis S.A., 38th Km New National Rd Athens-Corinth, Megara 19100, Greece.
4Department of Analytical Chemistry and Organic Chemistry, Universitat Rovira i Virgili, Tarragona, Spain.

Data fusion strategies for multivariate statistical process control (MSPC), combining diverse NIR-derived information or combining NIR information with other sensors, are explored through real process examples.

The first example is the real-time monitoring of a lab-scale distillation process, where temperature and NIR spectra are acquired simultaneously [1]. NIR data from several batches are first decomposed by principal component analysis (PCA) and multivariate curve resolution alternating least squares (MCR-ALS) [2]. Both methods compress the original data into model information and residuals (i.e., Hotelling’s T2 and residuals Q in PCA; concentration profiles of the distilled fractions, C, and residuals Q in MCR-ALS). The two kinds of NIR-based compressed information are combined with temperature to build data fusion models for on-line MSPC process control [1].

The second process is the monitoring of a lab-scale saturated polyester resin polymerization. Here, viscosity and acid value were the two quality parameters predicted by in-line NIR spectroscopy and related PLS models. In this case, a data fusion model is developed using the predicted PLS parameters and the Q and T2 values from NIR spectra. This fusion accounts for all possible information derived from the initial NIR measurement, related or not to the PLS predicted parameters.

In all examples, the NIR compressed information is separated into the part following the model (based on scores in PCA or concentration profiles in MCR) and the part outside it, i.e., the residuals. Fusion is defined in a wide sense, as connecting information from different sensors, or parameter-specific and general information from the same sensor.
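The PCA-based split into within-model (T2) and off-model (Q) information mentioned above can be sketched as follows; this is a generic MSPC computation, not the authors' code, and control limits are omitted.

```python
import numpy as np

def pca_t2_q(X, n_comp):
    # PCA-based MSPC statistics per sample:
    # Hotelling's T2 = distance within the model (scores weighted by variance),
    # Q = squared residual, i.e., distance off the model
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    T = Xc @ Vt[:n_comp].T                         # scores on retained PCs
    lam = (s[:n_comp] ** 2) / (X.shape[0] - 1)     # variance of each score
    t2 = np.sum(T ** 2 / lam, axis=1)
    E = Xc - T @ Vt[:n_comp]                       # reconstruction residuals
    q = np.sum(E ** 2, axis=1)
    return t2, q
```

In a fusion model, these T2 and Q values (or the MCR analogues C and Q) become input features alongside temperature or PLS-predicted quality parameters.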

[1]       R.R. de Oliveira, R.H.P. Pedroza, A.O. Sousa, K.M.G. Lima, A. de Juan, Anal. Chim. Acta 985 (2017) 41–53.
[2]       A. de Juan, J. Jaumot, R. Tauler, Anal. Methods 6 (2014) 4964.

Acknowledgement: The research project receives funding from the European Community‘s Framework program for Research and Innovation Horizon 2020 (2014-2020) under grant agreement number 637232



Camila Assis1, Ednilton M. Gama1, Clésia C. Nascentes1, Leandro S. Oliveira2, Marcelo M. Sena1

1Chemistry Department, ICEx, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, MG, Brazil
2Mechanical Engineering Department, ICEx, Universidade Federal de Minas Gerais, 31270-901 Belo Horizonte, MG, Brazil

Coffee is one of the most appreciated beverages in the world. Although the genus Coffea has many species, only two are economically important: Coffea arabica and Coffea canephora (robusta). In addition to the chemical differences between these two species, their difference in economic value is notable, with arabica coffees commanding 20-25% higher market prices. For this reason, coffee has been a target of fraud, mainly involving purported 100% arabica blends. It is therefore necessary to develop increasingly fast, efficient and reliable analytical methods to ensure the authenticity of coffee blends.

The objective of this work is to develop an analytical method to quantify robusta coffee in blends with arabica, using portable NIR and TXRF spectrometers. For the NIR analysis, blends were formulated at different concentrations of robusta in arabica (1-33%, in 1% steps), totaling 40 samples, which were roasted at 185 °C (light roast). For the TXRF analysis, the same samples were extracted with water at 90 °C, followed by filtration and centrifugation. A gallium solution was used as internal standard. PLS regression was applied, and two data fusion approaches (low- and mid-level) were used. Two variable selection methods were tested: genetic algorithm (GA) [1] and ordered predictors selection (OPS) [2].

The low-level data fusion model coupled with GA gave the best performance (RMSEP = 1.5%, r = 0.98, 5 LVs, nvar = 71). The variables selected from the NIR spectra correspond to vibrations of multiple coffee constituents. For TXRF, the following elements were selected: K, Ti, Mn, Cu, Zn, Br, Rb and Sr. Interpretation of this model revealed correlations between the molecular (NIR) and atomic (TXRF) compositions that characterize the two most important coffee species.

[1]         D. Broadhurst, R. Goodacre, A. Jones, J. J. Rowland, D. B. Kell, Anal. Chim. Acta, 348, 71–86 (1997).
[2]         R. F. Teófilo, J. P. A. Martins, M. M. C. Ferreira, J. Chemom., 23, 32–48 (2009).

Acknowledgement: CNPq, CAPES and FAPEMIG for financial support.



Mourad Kharbach1,2, Johan Viaene1, Abdelaziz Bouklouze2  and Yvan Vander Heyden1

1 Department of Analytical Chemistry, Applied Chemometrics and Molecular Modelling, Vrije Universiteit Brussel (VUB), Laarbeeklaan 103, B-1090 Brussels, Belgium
2 Biopharmaceutical and Toxicological Analysis Research Team. Laboratory of Pharmacology and Toxicology, Faculty of Medicine and Pharmacy, University Mohammed V- Rabat- Morocco

The profiling of phenolic compounds in food has been an area of intensive research because of their nutritional value and biological activities. Information about a food's origin is necessary to verify specifications and to guarantee quality, because food from different origins may have distinct qualities.

Two untargeted fingerprinting techniques (UPLC–ESI–TOF/MS and UPLC-DAD) were investigated to classify ninety-five Extra-Virgin Argan Oils (EVAO), with Protected Geographical Indication, originating from five Moroccan geographical regions (‘Ait-Baha’, ‘Agadir’, ‘Essaouira’, ‘Tiznit’ and ‘Taroudant’). First, the chromatographic data were pretreated prior to the application of classification tools. Principal component analysis (PCA) was carried out as an exploratory technique to visualize the five groups. Three multivariate pattern-recognition techniques, Partial Least Squares Discriminant Analysis (PLS-DA), Soft Independent Modeling of Class Analogy (SIMCA) and Support Vector Machines (SVM), were applied to the two sets of chromatographic-fingerprinting data. The classification models built were very accurate and sensitive, with high recognition and prediction abilities (~99% correct classification rate). The predictive ability of the models was tested with an external data set. Models based on either the UPLC-ESI-TOF/MS or the UPLC-DAD data classified the samples with high accuracy. Potential phenolic biomarkers responsible for the classification were identified and quantified using both techniques.

Metabolic profiling approaches based on the phenolic composition result in very useful data to trace the geographical origin of Moroccan Argan oils.




Marcin Zabadaj, Aleksandra Szuplewska, Michał Chudy, Patrycja Ciosek-Skibińska

The Chair of Medical Biotechnology, Faculty of Chemistry, Warsaw University of Technology, Noakowskiego 3, 00-664 Warsaw, Poland

Gustation is one of the most important physiological functions of mammalian organisms. Cells, especially taste receptor cells (TRCs), are widely recognized as biological functional elements for evaluating responses to basic taste stimuli. So far, two classes of cells are available for the fabrication of bioanalytical tools: natural taste receptor cells and bioengineered taste receptor cells. Bioengineered cells are functional cells specially treated to serve as effective and sensitive elements in bioanalytical procedures; they can respond to tastants or to specific compounds. Natural cells, on the other hand, are derived from biological taste systems and keep their natural morphologies and properties. One example of cells used as a taste-system model are chemosensory cells isolated from, e.g., the murine intestine.

In this study, we monitored the responses of STC-1 cells (derived from the murine gut; sensitive to bitter, sweet and umami tastes) to five basic tastants via chromatographic fingerprinting. Ion chromatography with conductometric detection, a well-established separation technique for the determination of a wide range of inorganic and organic ions, was applied to obtain chromatographic fingerprints of cell culture media after treating the cells with sodium chloride (salty), citric acid (sour), caffeine citrate (bitter), aspartame (sweet) and sodium glutamate (umami). The chromatographic fingerprints were classified using PLS-DA, showing good discrimination of the bitter, sweet and umami tastes against the controls, and poorer discrimination for the two other tastants, in good accordance with the natural taste-sensing abilities of the applied cells.

Acknowledgement: This work has been supported by National Science Centre, Poland, within a framework of OPUS project "(Bio)electrochemical imaging for cellular microphysiometry" UMO-2014/15/B/ST4/04807.




Rita El Hajj1, Wadih Skaff2, Nathalie Estephan1

1Faculty of Sciences, Holy Spirit University of Kaslik, BP 446 Jounieh, Lebanon 
2ESIAM-Université Saint-Joseph, Taanayel, Rue de Damas, BP 159 Zahlé, Lebanon

Honey production in Lebanon is predominantly mountain poly-floral honey, and orange blossom honey. 

The aim of this work is to authenticate and characterize Lebanese honey according to its botanical origin. Since several analytical methods are needed simultaneously for a reliable authentication of honey, such work is time-consuming and costly: it is based on sensory and pollen analysis as well as several physico-chemical methods, including at least measurements of the electrical conductivity and the sugar content. There is thus a need for methods that allow a rapid and reproducible authentication of the botanical origin of honey at low cost.

Infrared Spectroscopy is becoming one of the commonly used methods because of its reputation as a rapid and non-destructive powerful analytical tool requiring minimum sample preparation.

In this study, classical physico-chemical methods as well as near and mid infrared spectroscopy are used to authenticate and characterize Lebanese honey. In total, 100 honey samples were collected from different Lebanese regions, altitudes and floral characteristics.

Numerical data of the physico-chemical and spectroscopic analysis are treated by chemometric methods including: exploratory methods (PCA, ICA) and discriminant methods (FDA, PLS-DA).

The chemometric analysis shows a correlation between the physico-chemical and spectral data. According to the results obtained, honey samples are classified with high discrimination between the two categories of honey: floral and honeydew. PCA and ICA applied to the mid- and near-infrared data show a distribution of the honey samples as a function of their botanical origin.

Acknowledgement: CNRS - Lebanon for the financial support




Tim Offermans1, Jeroen Jansen1

1Radboud University, Nijmegen

‘Integrating Sensor Based Process Monitoring and Advanced Process Control’ (INSPEC) is an ISPT-managed project in which academic institutes and industrial chemical companies collaborate to provide dedicated process control and monitoring solutions for optimizing the energy and raw material use of chemical plants. As industrial processes have to deal with many exogenous sources of variability, they are operated within overly conservative control regimes. A more advanced use of available knowledge, process data and quality data will allow industries to operate in control regimes that lead to a more economical use of energy and raw materials. A key effort in this project is the development of chemometric routines to integrate Process Analytical Technology (PAT), such as on-line spectroscopic measurements, and process variable data, such as in-line temperature measurements, with Advanced Process Control (APC).

Acknowledgement: Corbion, DSM, FrieslandCampina, Huntsman, ISPT, Technical University Eindhoven




Denise Meinhardt1,2, Mathias Sawall1, Henning Schröder1, Klaus Neymeyr1,2

1University of Rostock, Ulmenstr. 69, 18057 Rostock, Germany
2LIKAT, Leibniz-Institute for Catalysis, Albert-Einstein-Str. 29A, 18059 Rostock, Germany

Preprocessing of NMR data by means of phase and baseline corrections is often a necessary step for successful chemometric analyses [1,2,3]. Usually, these two corrections are performed in separate steps. Here, we present SINC (simultaneous NMR signal correction), a new algorithm for the automated and simultaneous correction of the phase and baseline of NMR data [4]. Testing the algorithm on sample data yields improved outcomes and significant time savings compared to separate corrections.

The numerical algorithm is based on the minimization of a target function that takes chemical objectives into account. To solve the resulting nonlinear least squares problem, an extended Gauss-Newton method (including a quasi-Newton update formula) and a genetic algorithm are combined in order to find the global minimum. In addition, a modified Savitzky-Golay filter is used for baseline detection, and the baseline is subsequently corrected with a Whittaker smoother. The software is written in C; a MATLAB GUI is provided for data handling.
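The Whittaker smoother used in the baseline-correction step solves a penalized least squares problem, min_z ||y - z||^2 + lam * ||D z||^2, with D a d-th order difference matrix. A minimal dense-matrix sketch in Python follows (illustrative only; a C implementation such as the one described above would use sparse or banded solvers):

```python
import numpy as np

def whittaker_smooth(y, lam=1e4, d=2):
    # Whittaker smoother: minimize ||y - z||^2 + lam * ||D z||^2,
    # where D is the d-th order difference matrix.
    # The minimizer solves the linear system (I + lam * D'D) z = y.
    n = len(y)
    D = np.diff(np.eye(n), n=d, axis=0)
    z = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
    return z
```

Larger lam values give a stiffer (smoother) estimate; d = 2 penalizes curvature, which is the usual choice for baselines.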

[1] Q. Bao, J. Feng, L. Chen, F. Chen, Z. Liu, B. Jiang and C. Liu. A robust automatic phase correction method for signal dense spectra. J. Magn. Reson., 234, pp 82-89, 2013.
[2] J.C. Cobas, M.A. Bernstein, M. Martin-Pastor and P.G. Tahoces. A new general-purpose fully automatic baseline-correction procedure for 1D and 2D NMR data. J. Magn. Reson., 183, pp 145-151, 2006.
[3] H. de Brouwer. Evaluation of algorithms for automated phase correction of NMR spectra. J. Magn. Reson., 201(2):230-238, 2009.
[4] M. Sawall, E. von Harbou, A. Moog, R. Behrens, H. Schröder, J. Simoneau, E. Steimers and K. Neymeyr. Multi-objective optimization for an automated and simultaneous phase and baseline correction of NMR spectral data. Accepted for J. Magn. Reson., 2018.




Denise Meinhardt1,2, Mathias Sawall1, Christoph Kubis2, Detlef Selent2, Henning Schröder1, Armin Börner2, Klaus Neymeyr1,2

1University of Rostock, Ulmenstr. 69, 18057 Rostock, Germany
2LIKAT, Leibniz-Institute for Catalysis, Albert-Einstein-Str. 29A, 18059 Rostock, German

The analysis of spectroscopic data by multivariate curve resolution methods suffers from so-called rotational ambiguity: a unique solution cannot be expected, but a continuum of possible solutions exists. The area of feasible solutions (AFS) covers the ambiguity of the factorization. Often, additional information and chemical expertise are required to extract meaningful solutions.

In this contribution, we investigate and discuss ideas for designing experiments with the goal of reducing the intrinsic ambiguity of the factorization. The focus of these strategies is on the concentration factor. Additionally, we impose constraints to divide the frequencies and spectra into relevant and nonrelevant ones with respect to their impact on the rotational ambiguity. The investigations are carried out against the background of an AFS representation. Results are presented for several experimental data sets.

[1] Manne: On the resolution problem in hyphenated chromatography. Chemom. Intell. Lab. Syst. 27(1), pp 89-94, 1995.
[2] Sawall, Kubis, Selent, Börner, Neymeyr: A fast polygon inflation algorithm to compute the area of feasible solutions for three-component systems. I: Concepts and applications. J. Chemom. 27(5), pp 106-116, 2013.
[3] Rajkó, Abdollahi, Beyramysoltan, Omidikia: Definition and detection of data-based uniqueness in evaluating bilinear (two-way) chemical measurements. Anal. Chim. Acta, 855, pp 21-33, 2015.
[4] Sawall, Jürß, Neymeyr: FACPACK: A software for the computation of multi-component factorizations and the area of feasible solutions, Revision 1.3, 2015.




Ana Herrero-Langreo1,2, Gauri Ravindra Chemburkar1, Amalia G.M. Scanell2,3,4, Aoife Gowen1,2 

1UCD School of Biosystems and Food Engineering, University College of Dublin (UCD), Belfield, Dublin 4 Ireland
2UCD Institute of Food and Health
3UCD Center for Food Safety
4UCD School of Agriculture and Food Science

The application of hyperspectral imaging to the identification of microbiological samples is a growing field with promising practical applications and specific challenges. While traditional methods require highly specialized training and long time periods (up to 7 days), hyperspectral imaging can provide a quick, user-friendly and label-free tool for microbial assessment. The present work is a preliminary study on the identification of foodborne bacteria by combining different spectral modalities. Two strains of Bacillus and one of Cronobacter were grown on nutrient agar from six decimal dilutions using the spread-plate technique. Samples were imaged with VIS-vNIR (400–1000 nm) and NIR (880–1720 nm) push-broom hyperspectral systems. The spectral profiles of each bacterial genus were compared through principal component analysis. Partial least squares discriminant analysis for each spectral range was calibrated on the highest microbial concentration and applied to subsequent dilutions to predict microbial identity. Preliminary results show 99% correct pixel classification (CC) between Cronobacter and Bacillus on the calibration images, while prediction on the lower-dilution test images gave lower CC (65% down to 18%). Future work will assess the effect of spectral pre-treatments to correct for variations in agar thickness. Further experiments are planned to provide a validation set by replicating the experimental setup and including FT-IR microscopy imaging of specific colonies identified in the VIS-vNIR and NIR spectral images. This multimodal approach is being investigated to provide the knowledge base for a more robust and earlier-stage identification of foodborne bacteria.

Acknowledgement: The authors would like to thank the Multi-scale Hyperspectral Imaging for Enhanced Understanding and Control of Food Microbiology (HyperMicroMacro) project, funded by Science Foundation Ireland.




A. Bouklouze1, I. Barra1, M. Alaoui1,  Y. Cherrah1

1Laboratory of Pharmacology and Toxicology, Faculty of Medicine and Pharmacy, University Mohammed V- Rabat- Morocco.

In order to discriminate between the products of four known diesel suppliers in Morocco, a chemometric model was built using MIR-FTIR spectroscopy, a technique known for its ease of use, low cost of analysis and non-destructive character, requiring no sample preparation. Two chemometric tools were coupled to the spectral data. On the one hand, PCA was used to visualize the data and to check for outliers; with 4 PCs, 70% of the variability was explained and four perfectly separated groups were observed, each group representing the samples of one supplier. On the other hand, the actual discrimination model was developed with PLS-DA; after fixing the optimal number of latent variables at 3, each group of samples was discriminated from the others with near-perfect selectivity and sensitivity (almost 100%) and very low RMSEC and RMSEP errors, indicating the success of the model.




Mohammed Alaoui Mansouri1,2, Pierre-Yves Sacré1, Eric Ziemons1, Mourad Kharbach2,3,Yahia Cherrah2, Philippe Hubert1, Abdelaziz Bouklouze2, Roland Marini1

1University of Liege (ULiege), CIRM, VibraSanté Hub, Department of Pharmacy, Laboratory of Pharmaceutical Analytical Chemistry, Quartier Hôpital, Avenue Hippocrate 15, B36, B-4000 Liege, Belgium
2Laboratory of Pharmacology and Toxicology, Biopharmaceuticals and Toxicological Analysis Research Team, Faculty of Medicine and Pharmacy, Rabat, Morocco
3Department of analytical chemistry and pharmaceutical technology, CePhaR, Vrije Universiteit Brussel (VUB), Laarbeeklaan 103, 1090 Brussels, Belgium

The main goal of this work was to prove the ability of the combination of vibrational spectroscopy techniques and PLS-DA to differentiate between different polymorphic forms of fluconazole in pharmaceutical products. These are mostly manufactured from fluconazole polymorphic form II and form III. These crystalline forms may undergo polymorphic transitions during manufacturing or storage. Therefore, it is important to confirm whether the expected polymorphic form is still present.

FT-IR, FT-NIR and Raman spectroscopies were combined with PLS-DA to build robust classification models distinguishing form II, form III and the monohydrate form. The results show that the PLS-DA models classify the various fluconazole polymorphs with high efficiency, sensitivity and specificity. Finally, the selectivity of the PLS-DA models is proven by the analysis of two samples of itraconazole and miconazole, which belong to the same antifungal class as fluconazole and mimic potential contaminants. Based on the plots of Hotelling T² vs Q residuals, miconazole and itraconazole are clearly flagged as outliers and rejected.
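
The Hotelling T² vs Q-residual outlier screening used here to exclude the mimicked contaminants can be sketched from a PCA model (simulated spectra; the number of components and the constant offset used to mimic a foreign compound are illustrative assumptions, not the study's data):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 60))             # calibration spectra (simulated)
mu = X.mean(axis=0)
Xc = X - mu                               # mean-center

k = 3                                     # retained principal components
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
P = Vt[:k].T                              # loadings
lam = s[:k] ** 2 / (X.shape[0] - 1)       # score variances

def t2_q(x):
    """Hotelling T^2 and Q residual of one centered spectrum."""
    t = x @ P
    t2 = np.sum(t ** 2 / lam)             # distance inside the model plane
    r = x - t @ P.T                       # part not explained by the model
    return t2, r @ r

t2_in, q_in = t2_q(Xc[0])                 # a spectrum inside the model
t2_out, q_out = t2_q(Xc[0] + 5.0)         # hypothetical contaminant offset
```

A sample like miconazole, lying outside the fluconazole model space, shows up with a large Q (and/or T²) relative to the calibration limits and is rejected.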

[1] E. Ziémons, H. Bourichi, J. Mantanus, E. Rozet, P. Lebrun, E. Essassi, Y. Cherrah, A. Bouklouze, P. Hubert, Determination of binary polymorphic mixtures of fluconazole using near infrared spectroscopy and X-ray powder diffraction: A comparative study based on the pre-validation stage results, J. Pharm. Biomed. Anal. 55 (2011) 1208–1212. doi:10.1016/j.jpba.2011.02.019.

[2] N.L. Calvo, T.S. Kaufman, R.M. Maggio, Mebendazole crystal forms in tablet formulations. An ATR-FTIR/chemometrics approach to polymorph assignment, J. Pharm. Biomed. Anal. 122 (2016) 157–165. doi:10.1016/j.jpba.2016.01.035.



Rapid detection of smuggled non-compliant diesel in the Moroccan market using mid-infrared spectroscopy coupled to a PLS-DA model

Issam Barra1, Mohammed Alaoui Mansouri1,2, Yahia Cherrah1, Abdelaziz Bouklouze1

1Pharmaceutical and Toxicological Analysis Research Team. Laboratory of Pharmacology and Toxicology, Faculty of Medicine and Pharmacy, University Mohammed V- Rabat- Morocco.

2University of Liege (ULiege), CIRM, VibraSanté Hub, Department of Pharmacy, Laboratory of Pharmaceutical Analytical Chemistry, Quartier Hôpital, Avenue Hippocrate 15, B36, B-4000 Liege, Belgium

In order to detect smuggled diesel samples and discriminate them from authentic products, mid-infrared spectroscopy was chosen as a quick, non-destructive, inexpensive and very sensitive technique, especially when coupled with chemometric tools. After recording the infrared spectra of all diesel samples, the data were first explored by Principal Component Analysis (PCA) to visualize the relations between samples and to detect any outliers. Classification was then carried out with a Partial Least Squares Discriminant Analysis (PLS-DA) model. The PCA results show that the dissimilarities between the two diesel classes are easy to highlight thanks to the formation of two separable groups in the PCA score plot; just two components already explained more than 68% of the total variance (51.09% for PC1 and 16.3% for PC2).

The PLS-DA model allowed good discrimination between the two diesel classes with high specificity and sensitivity and low errors. The result is better than that obtained with PCA, even though the two latent variables explained less of the total variance, 62.15% (52.32% by LV1 and 9.83% by LV2).
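
Explained-variance percentages such as those quoted for the PCA model come from the squared singular values of the mean-centered data matrix; a minimal sketch (simulated spectra, illustrative sizes):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 50))                   # simulated IR spectra of diesel samples
Xc = X - X.mean(axis=0)                         # mean-center before PCA

_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = 100 * s**2 / np.sum(s**2)           # % variance per component

first_two = explained[:2].sum()                 # variance captured by PC1 + PC2
```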




André van den Doel1,2, Geert van Kollenburg1,2, Lutgarde Buydens1 and Jeroen Jansen1

1Institute of Molecules and Materials, Department of Analytical Chemistry, Radboud University, PO Box 9010 6500 GL Nijmegen, The Netherlands

2TI-COAST, Science Park 904, 1098 XH, Amsterdam, The Netherlands

The European Water Framework Directive sets goals for the ecological and chemical condition of surface water and ground water [1]. Rijkswaterstaat is part of the Dutch Ministry of Infrastructure and the Environment and is responsible for the management of the main waterway networks. At several measurement stations along the major rivers, the passing water is analysed with LC-MS or GC-MS. A limited number of target compounds is screened daily, but it is not feasible to monitor all (possible) contaminants on a daily basis. That is why we use GC-MS fingerprints to analyse the water quality and to identify emerging contaminants. Compared to the currently used univariate thresholds, our approach has the advantage of taking correlations and co-occurrences of compounds into account. This leads to a more holistic view of the water quality and better identification of the processes that influence it.



[1] Directive 2000/60/EC of the European Parliament and of the Council of 23 October 2000 establishing a framework for Community action in the field of water policy. Official Journal L327, 22/12/2000 p0001-0073 (2000).


Acknowledgement: This research received funding from the Netherlands Organisation for Scientific Research (NWO) in the framework of the Programmatic Technology Area PTA-COAST3 of the Fund New Chemical Innovations. This publication reflects only the author’s view and NWO is not liable for any use that may be made of the information contained herein.




Beatriz Carrasco1,  Alberto Gil1, José Ramón Quevedo2, Elena Montañés2, Guanghui Shen3, 4, Vincent Baeten4, Juan Antonio Fernández Pierna4

1Blendhub, Murcia, Spain.
2Artificial Intelligence Center, Universidad de Oviedo, Gijón, Spain
3China Agricultural University (CAU), China
4Walloon Agricultural Research Centre (CRA-W), Belgium

In this work, two different approaches are studied to characterize complex blends in food products: a Support Vector Machines (SVM)-based stacking approach and a local approach.

Stacking has become quite a common step in the machine learning field, making two or more simple approaches cooperate in order to obtain more accurate predictions. This work presents a stacking approach in which the output of one of the simpler approaches feeds the input of another, forming a stack. In particular, the proposal consists of using the output provided by a support vector machine classification approach as input to a support vector regression approach. Both support vector approaches have shown promising performance with high-dimensional and noisy data.
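
A minimal sketch of such an SVM-based stack, in which the class-membership probabilities of a support vector classifier are appended to the input of a support vector regression (scikit-learn; the simulated data and all parameters are illustrative assumptions, not the authors' configuration):

```python
import numpy as np
from sklearn.svm import SVC, SVR

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))                      # simulated blend spectra
y_class = (X[:, 0] > 0).astype(int)                # hypothetical blend family label
y_prop = X[:, 0] + 0.1 * rng.normal(size=60)       # hypothetical blend property

# level 1: SVM classifier on the raw input
clf = SVC(probability=True).fit(X, y_class)

# level 2: its class-membership probabilities are stacked onto the input
# of a support vector regression model
X_stacked = np.hstack([X, clf.predict_proba(X)])
reg = SVR().fit(X_stacked, y_prop)
pred = reg.predict(X_stacked)
```

In practice the level-1 outputs fed to the regressor would come from out-of-fold predictions to avoid leakage; the in-sample version above only illustrates the data flow.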

The local approach belongs to a group of methods based on selecting, from a large database, a set of samples spectrally similar to an unknown sample whose properties are to be predicted. Following this strategy, a specific local model is then developed for that sample using the previously selected "neighborhood" samples as the calibration set. This means that each sample is predicted with a different calibration equation. Here, a simple local method based on PCA scores is applied.
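
A PCA-score-based local model of this kind can be sketched as follows (simulated database; the number of scores, the neighborhood size and the local least-squares model are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
X_db = rng.normal(size=(500, 80))                # large spectral database
y_db = X_db[:, 10] + 0.1 * rng.normal(size=500)  # property to predict (simulated)

# PCA scores of the database define the similarity space
mu = X_db.mean(axis=0)
_, _, Vt = np.linalg.svd(X_db - mu, full_matrices=False)
P = Vt[:5].T                                     # first five loadings
T_db = (X_db - mu) @ P                           # database scores

def predict_local(x_new, k=30):
    """Select the k spectrally nearest database samples in PCA-score space
    and fit a local least-squares model on their scores."""
    t_new = (x_new - mu) @ P
    idx = np.argsort(np.sum((T_db - t_new) ** 2, axis=1))[:k]
    A = np.column_stack([np.ones(k), T_db[idx]])        # local design matrix
    coef, *_ = np.linalg.lstsq(A, y_db[idx], rcond=None)
    return np.concatenate([[1.0], t_new]) @ coef

y_hat = predict_local(X_db[0])
```

Each unknown thus gets its own calibration equation, rebuilt from its own neighborhood.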

Both approaches are applied for the characterization and prediction of complex blends used as additives in the food industry.




Mona Stefanakis1, Edwin Ostertag1, Anita Lorenz1, Marc Brecht1, Karsten Rebner1

1Process Analysis & Technology (PA&T), Reutlingen University, Alteburgstr. 150, 72762 Reutlingen, Germany

Glioblastoma multiforme is the most common and most malignant tumor of the human brain. The average survival time amounts to 15 months. For a good prognosis, a nearly complete resection and a well-defined classification of the tumor are necessary.

We investigate different human brain tumors with various spectroscopic methods in combination with multivariate methods (e.g. principal component analysis). The objective is a label-free demarcation between malignant and normal brain tissue and a classification following the malignancy schema of the World Health Organisation (WHO). The results are compared to the pathologist’s assessment.

A multimodal spectral imaging concept combining optical microscopy with multimodal molecular and elastic light scattering spectroscopy is presented. Laterally resolved spectroscopic information representing the chemical fingerprint is recorded together with the elastically scattered light linked to the morphology of the same area of the sample. Different spectroscopic techniques, from the UV and VIS range to the mid-infrared, are applied for label-free identification, characterization, demarcation and classification of brain tumor cross-sections with multivariate methods. The combination of the spectroscopic methods and the fusion of their data can help to eliminate the need for staining and to achieve higher diagnostic confidence in tumor classification.

Acknowledgement: The authors gratefully acknowledge financial support from the "Ministerium für Wissenschaft, Forschung und Kunst Baden-Württemberg" and the "Promotionskolleg IPMB" of the University of Tübingen and Reutlingen University. The authors especially thank the team of the Neurosurgery Department of Tübingen University Hospitals, headed by Professor Dr. Marcos Tatagiba, for supplying the brain tumor tissue samples and preparing the cross-sections.




S. Mas1, A. Torro2, N. Bec2, C. Larroque2, P. Martineau2, A. de Juan3, S. Marco1

1Signal and Information Processing for Sensing Systems, IBEC, Baldiri Reixac 4-8, 08028 Barcelona, Catalonia, Spain
2IRCM, Institut de Recherche en Cancérologie de Montpellier, INSERM U1194, UM, 208 Avenue des Apothicaires, Montpellier, F-34298, France.
3Chemometrics Group. Department of Chemical Engineering and Analytical Chemistry. Universitat de Barcelona. Av. Diagonal, 645. 08028 Barcelona, Catalonia, Spain

In this study, the combination of multivariate curve resolution-alternating least squares (MCR-ALS) and K-means clustering is proposed as a general strategy for the characterization of cancer tissues in matrix-assisted laser desorption ionization mass spectrometry imaging (MALDI-MSI).

In tissue-based cancer research, information on the spatial location of the cancer cell populations can be obtained from traditional histology. This study proposes, for the first time, the introduction of this information as a constraint in the MCR resolution of MSI data. The use of this local rank constraint helps to decrease or remove the inherent ambiguity of MCR. Moreover, this study aims at showing the advantages of combining a resolution method with histology-based local rank information to enhance the quality of the information obtained from the combination of the MCR-ALS and K-means methods [1-2].
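
The MCR-ALS core of this strategy can be sketched with a plain alternating-least-squares loop under a non-negativity constraint (simulated pixels × m/z data; the histology-based local rank constraint, which would additionally force selected concentration entries to zero in regions known to lack a component, is noted but not implemented here):

```python
import numpy as np

rng = np.random.default_rng(4)
# Simulated MSI data: D (pixels x m/z) = C (pixels x components) @ S.T + noise
C_true = rng.random((300, 2))
S_true = rng.random((100, 2))
D = C_true @ S_true.T + 0.01 * rng.normal(size=(300, 100))

# MCR-ALS with non-negativity imposed by clipping after each LS step;
# a local rank constraint would zero rows of C in histology-defined regions
S = rng.random((100, 2))                     # initial spectral estimates
for _ in range(100):
    C = np.clip(D @ np.linalg.pinv(S.T), 0, None)
    S = np.clip((np.linalg.pinv(C) @ D).T, 0, None)

resid = D - C @ S.T
lof = 100 * np.sqrt(np.sum(resid**2) / np.sum(D**2))   # lack of fit, %
```

The resolved concentration maps in `C` are what would subsequently be segmented by K-means.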

The great potential of this strategy for the characterization of cancer tissues in MALDI-MSI data is shown on a set of fifteen images corresponding to different tissues of experimental colorectal cancer.

[1] Piqueras, S., Duponchel, L., Tauler, R., & De Juan, A. (2011). Analytica Chimica Acta, 705(1-2), 182-192.
[2] Piqueras, S., Krafft, C., Beleites, C., Egodage, K., von Eggeling, F., Guntinas-Lichius, O., Popp, J., Tauler, R., de Juan, A. (2015). Analytica Chimica Acta, 881, 24-36.

This work is part of the BEST Postdoctoral Programme, funded by the European Commission under Horizon 2020’s Marie Skłodowska-Curie Actions COFUND scheme (Grant Agreement no. 712754) and by the Severo Ochoa programme of the Spanish Ministry of Science and Competitiveness (Grant SEV-2014-0425 (2015-2019)).




Kévin Jacq1,2, Yves Perrette1, Bernard Fanget1, Didier Coquin2, Pierre Sabatier1, Fabien Arnaud1, Ruth Martinez-Lamas3,4, Maxime Debret3

1 University Grenoble Alpes, University Savoie Mont Blanc, CNRS, EDYTEM, 73000 Chambéry, France
2 University Grenoble Alpes, University Savoie Mont Blanc, Polytech Annecy-Chambéry, LISTIC, 74000 Annecy, France
3 Normandie Univ, UNIROUEN, UNICAEN, CNRS, M2C, 76000 Rouen, France
4 IFREMER, Laboratory Géodynamique et Enregistrement Sédimentaire (LGS), France

Sedimentary cores are used, thanks to their physical, chemical and biological properties, to infer past climates and environments. Conventional sampling (at millimetre or centimetre scale) and routine analyses are destructive, non-spatially resolved methods that consume time and material. Hyperspectral imaging makes it possible to obtain micrometric spatial resolution at every point of the core.

We use two hyperspectral cameras: VNIR (spectral range 400–1000 nm, spatial resolution 60 μm) and SWIR (spectral range 1000–2500 nm, spatial resolution 189 μm). Usually each camera is used separately; the goal of this work is to show that combining the sensors increases prediction performance. A pixel-level data fusion based on the ARSIS method [1] is applied to create a single cube at the optimal resolution. This new cube can then be used with a standard PLSR method to develop a model for total organic carbon.

Three cores from lakes Le Bourget, Annecy and Geneva (Western Alps) have been tested (each approximately 60 cm long and 9 cm wide). In all cases, the results show better prediction performance than when the data are used separately. In the fused cube, the selected wavelengths correspond to those selected for each sensor. Although the analyses were performed on bulk samples (5 mm x 90 mm x 45 mm slices), the prediction model provides access to a map of the surface with micrometric resolution (the 60 μm pixel can be interpreted as relevant information).

[1] Ranchin, T., & Wald, L. (2000). Fusion of high spatial and spectral resolution images: the ARSIS concept and its implementation. Photogrammetric Engineering and Remote Sensing, 66(1), 49–61.




Cannon Giglio1, Steven D. Brown2

1Department of Chemistry, Dalhousie University, P.O. Box 15000,
Halifax, NS, B3H 4R2, Canada 

2Department of Chemistry and Biochemistry, University of Delaware,
163 The Green, Newark, DE 19716, USA

Variable selection is frequently used to remove uninformative variables in multivariate data such as spectra. The most commonly used variable selection methods, such as variable importance in projection (VIP), are based on the partial least squares (PLS) regression. These methods assume a valid PLS model, which is often not the case. A potential alternative is to use the elastic net (EN) regression, which can select variables automatically by using a mixed-norm penalization function. An EN regression can select groups of correlated variables, and can select either sparse or non-sparse sets of variables. However, when using the EN regression on spectra, there is often a tradeoff between variable selection quality and prediction performance. In the present work, the use of the elastic net to select variables, followed by conventional PLS regression on the selected variables (EN-PLS), has been investigated. Variable selection using EN-PLS was compared with that from elastic net regression, sparse partial least squares regression (SPLS), variable importance in projection (VIP), and from selectivity ratio (SR) selection on two datasets of visible/near-infrared spectra. The EN-PLS method was found to give similar prediction performance and interpretability when compared with SR, and improved performance when compared with the other variable selection methods. 

Acknowledgement: This work was supported by the United States National Science Foundation, grant number 1506853. CG gratefully acknowledges partial summer support from the University of Delaware Summer Scholars Program.




Willian Francisco Cordeiro Dantas1, Luis Gustavo Teixeira Alves Duarte1, Fabiano Severo Rodembusch2, Teresa Dib Zambon Atvars1 and Ronei Jesus Poppi1

1Institute of Chemistry, University of Campinas, P.O. Box 6154, 13083-970, Campinas, SP, Brazil 
2Grupo de Pesquisa em Fotoquímica Orgânica Aplicada, Institute of Chemistry, Federal University of Rio Grande do Sul, Bento Gonçalves Avenue 9500, Porto Alegre, Brazil

In this work, we present a comprehensive procedure using excitation-emission fluorescence matrices (EEM) and multivariate curve resolution with alternating least squares (MCR-ALS) [1] for the analysis of the deprotonation equilibria of 2-(2’-hydroxyphenyl)benzothiazole (HBT) [2] at different hydrogen ion concentrations. To vary the acidity, the Hammett acidity function and the pH scale were applied using several buffer solutions, allowing the identification of HBT structural variation in the electronic ground and excited states.

It was demonstrated that the presented methodology is also suitable for kinetic analysis of the deprotonation equilibrium in the excited state, since it can indicate the relative concentration profiles of HBT before and after ionization in S1. MCR-ALS was performed on an augmented matrix using the non-negativity, unimodality, closure and multiway constraints. Moreover, it was possible to estimate the concentrations of all species in both the electronic ground and excited states.

The combination of EEM analysis and MCR-ALS provided promising results for predicting acid strength in S1 and S0, which can lead to a simple strategy to analyze the enhancement of acid strength due to light absorption in organic aromatic dyes.

[1] J. Jaumot, A. de Juan and R. Tauler, Chemom. Intell. Lab. Syst., 2015, 140,1–12.
[2] J. K. Dey, S. K. Dogra, J. Photochem. Photobiol. A: Chem., 1992, 66, 15-31.

Acknowledgement: The authors would like to thank the CAPES, CNPq and FAPESP for financial support.




Ewa Sikorska, Katarzyna Włodarska

Faculty of Commodity Science, Poznań University of Economics and Business, al. Niepodległości 10, 61-875 Poznań, Poland

In the present study we investigate the fluorescence of juices obtained from different red fruits using excitation-emission spectroscopy and multivariate data analysis. Red fruits are rich in phytochemicals, including phenolic antioxidants [1]. The intense colour of these fruits is due to the presence of anthocyanins. The phenolic compounds present in these fruits play a vital role in disease prevention and in their health-promoting properties. Anthocyanins present a wide range of biological activities, including antioxidant, antimicrobial, anti-inflammatory and anti-carcinogenic activities. Some phenolic compounds are fluorescent. Thus, fluorescence spectroscopy, with its multidimensional character, high sensitivity and selectivity, may be well suited for studying properties of juices related to their antioxidant content and antioxidant capacity.

Juices obtained from red fruits, including strawberry, red raspberry, black currant and chokeberry, were the subject of this study. The commercial juices selected for study covered various categories of products available on the market. The total fluorescence spectra (excitation-emission matrices, EEM) were recorded for each juice. The tentative assignment of the specific emission bands was based on comparison with the fluorescence of the respective standards. The exploratory analysis was performed using Principal Component Analysis (PCA) and Parallel Factor Analysis (PARAFAC). The observed differences in the fluorescence characteristics of the juices were ascribed to different compositions and concentrations of phenolic compounds. This preliminary study shows the potential of using fluorescence for quality and authenticity assessment of red fruit juices.
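
PARAFAC decomposes the stack of EEMs (samples × excitation × emission) into trilinear factors, one triplet per fluorophore; a minimal alternating-least-squares sketch on simulated data (unconstrained, unlike typical EEM practice where non-negativity is imposed; all sizes are illustrative):

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Khatri-Rao product."""
    return np.einsum('ir,jr->ijr', A, B).reshape(-1, A.shape[1])

def parafac_als(T, rank, n_iter=100, seed=0):
    """Minimal three-way PARAFAC (CP) by alternating least squares."""
    rng = np.random.default_rng(seed)
    I, J, K = T.shape
    A, B, C = (rng.random((n, rank)) for n in (I, J, K))
    for _ in range(n_iter):
        # solve for one factor matrix at a time against the matching unfolding
        A = np.linalg.lstsq(khatri_rao(B, C), T.reshape(I, -1).T, rcond=None)[0].T
        B = np.linalg.lstsq(khatri_rao(A, C), T.transpose(1, 0, 2).reshape(J, -1).T, rcond=None)[0].T
        C = np.linalg.lstsq(khatri_rao(A, B), T.transpose(2, 0, 1).reshape(K, -1).T, rcond=None)[0].T
    return A, B, C

# Simulated EEM stack: 8 juices x 20 excitation x 30 emission, two fluorophores
rng = np.random.default_rng(6)
conc, ex, em = rng.random((8, 2)), rng.random((20, 2)), rng.random((30, 2))
eem = np.einsum('ir,jr,kr->ijk', conc, ex, em)

A, B, C = parafac_als(eem, rank=2)
recon = np.einsum('ir,jr,kr->ijk', A, B, C)
rel_err = np.linalg.norm(eem - recon) / np.linalg.norm(eem)
```

The columns of `B` and `C` play the role of the recovered excitation and emission spectra, and `A` the relative fluorophore contributions per sample.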

[1] G. A. Manganaris, V. Goulas, A. R. Vicente, L. A. Terry, J. Sci. Food Agric., 94, 825-833 (2014)

Acknowledgement: Grant 2016/23/B/NZ9/03591 from the National Science Centre, Poland, is gratefully acknowledged.




Ewa Sikorska, Katarzyna Włodarska
Faculty of Commodity Science, Poznań University of Economics and Business, al. Niepodległości 10, 61-875 Poznań, Poland

The aim of this study was to test the feasibility of NIR spectroscopy for developing calibration models to predict the chemical parameters related to the sweet and sour flavour of apple juices. Apple juice is one of the most consumed juices due to its pleasant flavour and beneficial health effects. The taste of a juice is an important driver of consumer acceptance. For apples, it was demonstrated that sweet and sour flavours are related to the soluble solids content (SSC), titratable acidity (TA), and even more closely to the ratio of SSC to TA (SSC/TA) [1]. This ratio is used as an index of the sensory acceptability of the fruit taste. The discrimination between juices with different SSC/TA ratios may be of interest from both manufacturers' and consumers' perspectives, replacing tedious sensory analysis with instrumental methods.

Near infrared (NIR) spectra were recorded for commercial apple juices. Partial least squares regression (PLS-R) was used to develop the model for determination of the SSC/TA ratio, and partial least squares discriminant analysis (PLS-DA) was used to discriminate between juices with low and high SSC/TA ratios. Various spectral pre-processing methods were used for model optimization. The optimal spectral variables were chosen using interval PLS (iPLS) and a jack-knife based method. For a limited number of samples, the PLS-R analysis was performed using NIR spectral data and scores for sweet and sour flavour obtained from sensory analysis. The present results show the potential of NIR spectroscopy for screening the important quality parameters of apple juices.

[1] N. Abu-Khalaf, B. S. Bennedsen, Int. Agrophys., 18, 203-211 (2004)

Acknowledgement: Grant 2016/23/B/NZ9/03591 from the National Science Centre, Poland, is gratefully acknowledged.




Elizabeth M. Mudge1,2, Susan J. Murch1, Paula N. Brown1,2 

1University of British Columbia, Kelowna, BC 
2BC Institute of Technology, Burnaby, BC

Flowers of Cannabis sativa L. are used medicinally for pain management, nausea, appetite regulation, anxiety, depression, and spasticity. The cannabinoids delta-9-tetrahydrocannabinol (THC) and cannabidiol (CBD) are the most studied phytochemicals in Cannabis, yet strains with similar THC/CBD profiles can exhibit significantly different pharmacological effects. There may be up to 30,000 unique phytochemical metabolites present in a single plant leaf, so chemometric approaches are necessary to fully elucidate the phytochemical diversity and variation of Cannabis strains. Targeted and untargeted metabolomics were therefore employed for strain characterization. Over thirty strains of Cannabis were obtained from licensed producers in Canada and subjected to cannabinoid analysis by HPLC-UV and untargeted metabolomics using 1H NMR. The strains were categorized into five different groups based on varying THC and CBD contents. Chemometric analysis showed significant overlap across the classes of strains, with separation in the first principal component (PC) primarily driven by the THCA and CBDA contents, while the second and third PCs were associated with untargeted metabolites. Unsupervised clustering identified several groups that were not dependent on the cannabinoid classes, where regions of highest variance indicated the presence of unique metabolite clusters. This demonstrates that metabolomic approaches to evaluating Cannabis strains may be better suited than other approaches to improve product quality and explore the pharmacological variation of medicinal Cannabis.




Jamie Finley1, Paula Brown1

1BC Institute of Technology, Burnaby, BC

The authenticity of botanical ingredients in marketed products has been the focus of widespread concern in the natural product and dietary supplement industry. Classically, plants are identified by physical examination of minimally processed biomass for diagnostic macroscopic and microscopic features. However, botanical ingredients in modern supply chains are so highly processed that diagnostic anatomical features have been destroyed or removed. The use of chemical profiling coupled with chemometric analyses shows promise as an approach for establishing the authenticity of botanical products. A multivariate moving window PCA (MWPCA) approach based on the PC score residuals has been used to evaluate the adulteration of American ginseng roots with leaves, and of goldenseal with common adulterants. The model computes standard residuals within windows defined on clean spectra and applies PCA to those residuals to determine whether they are within the PCA model limits. MWPCA detected leaf adulteration as low as 2% in ginseng roots, and 15% Rumex crispus in goldenseal roots. This represents a significant improvement over traditional regression models for the detection of low levels of contamination.
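
A moving-window PCA residual check of this general kind can be sketched as follows (simulated spectra; the window width, two-component local models, the injected adulterant feature and the 95th-percentile control limit are all illustrative assumptions, not the published model):

```python
import numpy as np

rng = np.random.default_rng(8)
clean = rng.normal(size=(30, 200))             # spectra of authentic material (simulated)
test_spec = clean[0].copy()
test_spec[120:140] += 3.0                      # hypothetical adulterant feature

win = 20
flags = []
for start in range(0, 200, win):
    W = clean[:, start:start + win]
    mu = W.mean(axis=0)
    Wc = W - mu
    _, _, Vt = np.linalg.svd(Wc, full_matrices=False)
    P = Vt[:2].T                               # local 2-PC model for this window
    # Q residual of the test spectrum against the local model
    x = test_spec[start:start + win] - mu
    t = x @ P
    q = x @ x - t @ t
    # 95th-percentile residual of the clean spectra as a crude control limit
    q_clean = ((Wc - Wc @ P @ P.T) ** 2).sum(axis=1)
    flags.append(bool(q > np.percentile(q_clean, 95)))
```

Windowing localizes the residual test, so a small adulterant band inflates the residual of its own window instead of being diluted across the full spectrum.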




Sofie B. Knudsen1,2, Nikoline J. Nielsen1, Johan Qvist2, Kim B. Andersen2, Jan H. Christensen1

1Analytical Chemistry Group, Section for Environmental Chemistry and Physics, Department of Plant and Environmental Sciences, University of Copenhagen, Thorvaldsensvej 40, 1871 Frederiksberg C, Denmark.
2Department of Liquid Formulation, Novozymes A/S, Krogshøjvej 36, 2880 Bagsvaerd, Denmark.

The physical stability of liquid enzyme products is one of the major quality parameters in enzyme production and, unfortunately, also one that may be influenced by even small variations in the dry matter background. A measure of steady-state physical stability can take weeks to obtain, and hence it can be a slow process to detect poor quality by e.g. visual evaluation of precipitate in the sample. This study aims to identify small-molecule endpoints (<1200 Da) that could be indicators of the physical stability of a product, and in that way indicate poor-quality products at an early stage. Ten different products were analyzed by LC-TOFMS in positive and negative mode to distinguish differences between samples. Furthermore, one product was exposed to forced degradation to distinguish between non-degraded and degraded products (good and poor quality). Both datasets, including facilitator samples, were processed by pixel-based chemometric analysis [1]. Different baseline removal, alignment and normalization procedures were tested, and finally variable selection was performed on a subset of relevant features. The different enzyme products could be discriminated by the pixel-based chemometric approach, as could non-degraded and degraded products.

[1] Furbo, S., et al. (2014). "Pixel-Based Analysis of Comprehensive Two-Dimensional Gas Chromatograms (Color Plots) of Petroleum: A Tutorial." Analytical Chemistry 86(15): 7160-7170.

Acknowledgement: Thanks to the Innovation Fund Denmark for financial support.




Amir Hossein Alinoori1,2

1Institute of Materials and Energy, Iranian Space Research Center, I.R. Iran
2Department of Analytical Chemistry, Faculty of Chemistry, University of Kashan, Kashan, I.R. Iran

The discrete Gabor transform combined with Gaussian apodization factor analysis (GAFA) has been developed as an enhanced algorithm for assessing the purity of hyperspectral data decomposed into the time-frequency domain by a short-time Fourier transform with a Gaussian window. In the GAFA method, submatrices in the time-frequency domain are extracted with a Gaussian apodization moving window, i.e. by weighting a fixed-size moving window with a Gaussian function. Each submatrix therefore mainly characterizes one spectrum, and by performing factor analysis on this Gaussian-weighted submatrix, the number of principal components for each evaluated spectrum is determined by singular value decomposition (SVD). This precise and quick determination of a rank map is successfully used to extract pure components from hyperspectral data in the time-frequency domain. An algorithm based on GAFA was applied to resolve different types of overlapped simulated and real complex hyperspectral data. The algorithm finds the spectra of the pure components with GAFA one by one, eliminating each obtained component in the time-frequency domain and searching for the next pure component spectrum until all components are determined. In the next step, the data are reconstructed from the time-frequency domain back to the time domain.
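
The rank-map idea — Gaussian-weighted moving-window submatrices whose significant singular values count the locally present components — can be sketched as follows (simulated two-component data in the plain time domain rather than the Gabor time-frequency domain; window size, Gaussian width and the singular-value threshold are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(9)
# Simulated data matrix: 300 time points x 40 spectral channels, two components
t = np.linspace(0, 1, 300)
c1 = np.exp(-((t - 0.35) / 0.05) ** 2)           # profile of component 1
c2 = np.exp(-((t - 0.45) / 0.05) ** 2)           # overlapping profile of component 2
S = rng.random((2, 40))                           # the two pure spectra
D = np.outer(c1, S[0]) + np.outer(c2, S[1]) + 0.001 * rng.normal(size=(300, 40))

# Gaussian apodization moving window: rows inside a fixed-size window are
# weighted with a Gaussian before the SVD-based local rank estimate
win, sigma = 31, 6.0
g = np.exp(-0.5 * ((np.arange(win) - win // 2) / sigma) ** 2)
s0 = np.linalg.svd(D, compute_uv=False)[0]        # global scale for the threshold

def local_rank(center, tol=1e-2):
    sub = D[center - win // 2:center + win // 2 + 1] * g[:, None]
    s = np.linalg.svd(sub, compute_uv=False)
    return int(np.sum(s > tol * s0))              # count significant singular values

rank_map = [local_rank(c) for c in range(win // 2, len(t) - win // 2)]
```

Sweeping the window along the time axis yields the rank map: 0 in baseline regions, 1 where a single component elutes, 2 where the components overlap.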




Amir Hossein Alinoori1, Saeed Masoum1

1Department of Analytical Chemistry, Faculty of Chemistry, University of Kashan, Kashan, I.R. Iran

A unique portable ion trap ion mobility spectrometer (IMS) was designed and fabricated. The design consists of a fast sampling and injection system (FSI) coupled with an ion trap IMS. The FSI design provides a fast and easily applied method for real-time injection of air samples accompanied by particles, with no additional sample preparation. The required analysis cycle time for each run is less than 15 s. The high sample load and sharp injection, combined with fast separation by programmed thermal desorption, decrease the peak widths and improve detection limits. In addition to the coupling of the FSI to the IMS detector and other hardware design considerations, the IMS cell was miniaturized to increase sensitivity and selectivity and to reduce the warm-up time. Data analysis was enhanced with adapted Gaussian apodization factor analysis (GAFA) as a multivariate curve resolution algorithm. This homemade customized instrument is an alternative to other time-consuming technologies for monitoring organic particles in air samples without sample preparation. GAFA has been developed as an enhanced algorithm to assess the purity of ion trap-IMS data. In the GAFA method, submatrices are extracted with a Gaussian apodization moving window, i.e. by weighting a fixed-size moving window with a Gaussian function. Each submatrix therefore mainly characterizes one spectrum, and by performing factor analysis on this Gaussian-weighted submatrix, the number of principal components for each evaluated spectrum is determined by singular value decomposition (SVD). This precise and quick determination of a rank map is successfully used to extract pure components from ion trap-IMS data.




Márcia Miguel Castro Ferreira, Maria Cristina Andreazza Costa, Pedro Oliveira Mariz Carvalho

Laboratory for Theoretical and Applied Chemometrics, Institute of Chemistry, University of Campinas, Campinas-SP 13083-970, Brazil

A series of 45 1,4-naphthoquinone derivatives, tested against human HL-60 leukemic cells and previously analyzed by two-dimensional structure-activity relationships [1], was submitted to a 4D-QSAR study, with the main objective of investigating the possible interactions with the receptor. Molecular dynamics (MD) calculations [2] were applied to the 3D optimized structures (Gaussian 3.0, DFT-B3LYP, 6-31G**) in order to obtain a conformational ensemble profile (CEP) for each compound. The CEPs were aligned, and the field descriptors, which are the electrostatic and van der Waals interaction energies (Coulomb and Lennard-Jones potentials), were calculated with the LQTAgrid module of the LQTA-QSAR program [3]. The selected descriptors were used to build the regression model using the PLS regression method.

The regression model (eq. 1) with 2 factors presented R2 = 0.90 and SEC = 0.23; Q2 = 0.88 and SEV = 0.26. For external validation (eight molecules in the test set), R2 = 0.75 and SEP = 0.25.

pIC50 = 0.27 C140 − 0.28 L143 + 0.38 C225 − 0.21 L133 − 0.40 C160                                 (eq. 1)

The positive Coulomb coefficients (C) indicate that the biological activity is favored by polar substituents in these regions, while the negative Lennard-Jones coefficients (L) indicate that the biological activity increases with bulky groups.
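The model-building step can be sketched with a minimal PLS1 (NIPALS) implementation; the descriptor matrix, true coefficients and component count below are invented stand-ins for the LQTAgrid field descriptors, not the authors' data:

```python
import numpy as np

def pls1_nipals(X, y, n_components):
    """Minimal PLS1 (NIPALS) for a single response; X and y are assumed
    mean-centred. Returns the regression coefficient vector b."""
    X = X.copy(); y = y.copy().astype(float)
    W, P, q = [], [], []
    for _ in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector
        t = X @ w                       # scores
        tt = t @ t
        p = X.T @ t / tt                # X loadings
        qa = y @ t / tt                 # y loading
        X -= np.outer(t, p)             # deflation
        y -= qa * t
        W.append(w); P.append(p); q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    return W @ np.linalg.inv(P.T @ W) @ q   # y_hat = Xc @ b

rng = np.random.default_rng(1)
X = rng.normal(size=(45, 20))                    # 45 compounds, 20 invented descriptors
b_true = np.zeros(20); b_true[[0, 3, 7]] = [0.5, -0.3, 0.8]
y = X @ b_true + 0.01 * rng.normal(size=45)      # pIC50-like response
Xc, yc = X - X.mean(0), y - y.mean()
b = pls1_nipals(Xc, yc, n_components=5)
r2 = 1 - np.sum((yc - Xc @ b) ** 2) / np.sum(yc ** 2)
```

For a new compound, the prediction is the centred descriptor row times `b` plus the training mean of the response.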

[1] Costa, M.C.A.; Ferreira, M.M.C; SAR QSAR Environ. Res. 2017, 28, 325-339.

[2] Lindahl, E.; Hess, B.; van der Spoel, D.;  Journal of Molecular Modeling 2001, 7, 306-317.

[3] Martins, J. P. A.; Barbosa, E. G.; Pasqualato, K. F. M.; Ferreira, M. M. C.; J.Chem. Inf. Model. 2009, 49, 1428.

Acknowledgements: CNPq, FAPESP.




Márcia Miguel Castro Ferreira1, Nelson Mituo Matsumoto2

1Laboratory for Theoretical and Applied Chemometrics, Institute of Chemistry, University of Campinas Campinas-SP 13084-971, P. O. Box 6154, Brazil
2PPPG Industrial do Brasil Tintas e Vernizes Ltda, Sumaré - SP - Brasil

In this study we propose a multivariate accelerated shelf-life approach for determining the shelf-life of tinting pastes and, consequently, of industrial coatings. The pigments used were red, yellow, blue, white and black. The tinting pastes were prepared and stored at 0 °C, 25 °C and 60 °C. The colorimetric parameters ΔL, Δa and Δb were measured in a spectrophotometer (X-Rite) every 15 days, for 460 days. According to the univariate approach, kinetic charts would be built and, by evaluating the reaction velocity profile, it would be possible to determine the reaction order and then convert the results from accelerated tests to actual market conditions. For this, a proportionality constant between different storage temperatures must be determined. The multivariate method, on the other hand, is based on principal component analysis (PCA), in which the scores of the time-related components, instead of the individual responses, are used for estimating the multivariate kinetic rate constants (km), the multivariate acceleration factor (αm) and the multivariate activation energy (AEm). Finally, the multivariate parameters from accelerated tests are converted to actual market conditions.

Shelf-life monitoring at room temperature was also performed to validate the method. The shelf-lives obtained for all colours by the PCA-based accelerated method were quite consistent with those obtained under standard conditions. The multivariate accelerated shelf-life test has proven its value by reducing the kinetic study to a single variable, whilst directly indicating the main parameters affecting product degradation.
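The PCA-scores route can be sketched as follows, assuming for simplicity zero-order (linear) colour degradation and invented rate constants; this illustrates the idea of replacing the three colorimetric responses by the scores of one time-related component, not the authors' actual kinetic model:

```python
import numpy as np

# Simulated colour degradation at 25 and 60 degrees C (made-up numbers)
t = np.arange(0, 460, 15, dtype=float)                # days, sampled every 15 days
direction = np.array([2.0, -1.0, 0.5])                # relative change of dL, da, db
k25, k60 = 0.004, 0.016                               # assumed degradation rates
X25 = np.outer(k25 * t, direction)
X60 = np.outer(k60 * t, direction)

# PCA on the pooled, mean-centred responses: the scores of the single
# time-related component replace the three colorimetric responses
X = np.vstack([X25, X60])
Xc = X - X.mean(0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
s25, s60 = X25 @ Vt[0], X60 @ Vt[0]

# Multivariate rate constants from the scores, acceleration factor,
# and Arrhenius activation energy between the two temperatures
km25 = np.polyfit(t, s25, 1)[0]
km60 = np.polyfit(t, s60, 1)[0]
alpha_m = km60 / km25                                  # multivariate acceleration factor
R, T1, T2 = 8.314, 298.15, 333.15
AE_m = R * np.log(alpha_m) / (1 / T1 - 1 / T2)         # J/mol, Arrhenius form
```

With these invented rates, the acceleration factor recovers k60/k25 = 4 regardless of the (sign-ambiguous) orientation of the principal component, since the same loading vector scales both series of scores.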


Acknowledgements: CNPq, FAPESP.




Zieger S. E.1, Mistlberger G.1, Troi L. and Klimant I.1

1Graz University of Technology, Institute of Analytical Chemistry and Food Chemistry (TU Graz, ACFC), Stremayrgasse 9, 8010 Graz, Austria

Early-stage identification of (harmful) algal blooms (HABs) has gained significance for monitoring systems over the years. HABs pose a serious threat to marine and human life, having an adverse effect on their surrounding ecosystem. Various biotoxins produced by algae and accumulated across the food web can lead to massive fish kills and human disorders.

Facing these problems, various approaches for in-situ classification and early-stage identification have been developed. Among them, pigment-based taxonomic classification of HABs is a promising principle for in-situ characterization of bloom compositions, although it is still underestimated in marine monitoring programs. To demonstrate the applicability and importance of this powerful approach for monitoring programs, we developed a miniaturized and low-cost multi-wavelength fluorometer for in-situ detection of relevant algal groups [1].
Based on chemotaxonomic principles, we characterized algae of eight different phytoplankton classes with a spectrofluorometer and by means of our fluorometer. Relevant marker pigments were investigated and significant spectral differences were extracted as key features. These key features were used within Fisher’s linear discriminant analysis for reliable differentiation at order level. A comprehensive investigation of the system performance on axenic algal cultures was conducted in terms of standard statistical measures and independent figures of merit. The focus throughout our studies lies on the reliable discrimination of cyanobacteria and dinoflagellates from co-occurring algae. Besides this, the robustness of the algorithm was assessed during growth and under abnormal light conditions. The separation capability of the linear discriminant analysis was further successfully examined in mixed algal suspensions.
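The discriminant step can be sketched in plain numpy for the two-class case; the six-wavelength fluorescence fingerprints below are invented stand-ins for the real marker-pigment features:

```python
import numpy as np

rng = np.random.default_rng(2)
# Invented 6-wavelength fluorescence fingerprints for two algal groups
mu_a = np.array([1.0, 0.8, 0.3, 0.2, 0.9, 0.4])     # "cyanobacteria-like"
mu_b = np.array([0.6, 0.9, 0.7, 0.5, 0.3, 0.6])     # "green-algae-like"
Xa = mu_a + 0.05 * rng.normal(size=(40, 6))
Xb = mu_b + 0.05 * rng.normal(size=(40, 6))

# Fisher's linear discriminant: w = Sw^{-1} (mean_a - mean_b)
Sw = np.cov(Xa, rowvar=False) + np.cov(Xb, rowvar=False)   # within-class scatter
w = np.linalg.solve(Sw, Xa.mean(0) - Xb.mean(0))
threshold = 0.5 * (Xa.mean(0) + Xb.mean(0)) @ w            # midpoint boundary

accuracy = ((Xa @ w > threshold).sum() + (Xb @ w <= threshold).sum()) / 80
```

The midpoint threshold assumes equal class priors; for more than two classes the same idea generalizes via the eigenvectors of the between- to within-class scatter ratio.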

[1] Zieger, S. E. et al. Compact and low-cost fluorescence based flow-through analyzer for in situ quantification and early-stage classification of toxic algal blooms. Environ. Sci. Technol. (submitted), (2018).




Peter Harrington, Zewei Chen

Center for Intelligent Chemical Instrumentation, Department of Chemistry and Biochemistry, Ohio University, Athens, OH 45701, USA

Authentication of Cannabis samples from different cultivations is increasingly important, since Cannabis has been used for medical purposes in the recent decade.  Two sample sets of botanical extracts were studied.  The first set, referred to as Cannabis, contained plant material from sativa, indica, and hybrids of the two species.  The second set contained extracts from the variety of Cannabis sativa with low tetrahydrocannabinol (THC) concentrations (<0.2%) and is referred to as hemp.  An ultraviolet microplate reader provides a cost-effective and high-throughput method for classifying the 15 Cannabis and 20 hemp extracts in this study.  Coupled with five multivariate classifiers, i.e. the fuzzy rule-building expert system (FuRES), super partial least squares-discriminant analysis (sPLS-DA), the support vector machine (SVM) and two tree-based support vector machines (SVMtreeG and SVMtreeH), good classification rates were achieved in the evaluation by bootstrapped Latin partitions.  For the Cannabis extracts, SVMtreeG yielded the best performance, with classification rates as high as 99.1% for the extract/solvent ratio of 1:10 and 97.1% for the ratio of 1:20.  The results from a matched-sample t-test indicate that the classification of the 15 Cannabis extracts can be affected by different dilution ratios.  For the hemp extracts, the SVM classifier performed best, with a classification rate of 97.4%.  These results demonstrate that an ultraviolet microplate reader coupled with multivariate classifiers can be used as a high-throughput and cost-effective approach to authenticate Cannabis and hemp extracts from different cultivations.




Cristina Malegori, Paola Malaspina, Paolo Oliveri, Paolo Giordani, Monica Casale 

Department of Pharmacy, University of Genoa, Viale Cembrano, 4 -14148, Genoa, Italy

Lichens are symbiotic associations between a fungal partner and a photosynthetic partner, which is most often either a green alga or a cyanobacterium. Lichens, as poikilohydric organisms, cannot actively regulate their water content, so they are completely dependent on the atmosphere for their water uptake. Thus, lichens are desiccation-tolerant organisms that can tolerate deep desiccation periods; as soon as water becomes available again, they recover their metabolic processes. Cyanolichens can achieve positive net photosynthesis when directly exposed to liquid water. The time necessary to fill the internal water storage depends on the thickness of the lichen thallus: thick lichens generally need a longer time, but they can also retain water for extended periods after hydration events. Numerous research works have dealt with the effects of thallus hydration on lichen metabolism, but only a limited number of them have investigated intrathalline water localization. In this work, we aimed to visualize the pattern of water distribution during wetting and drying cycles in cyanolichens. We analyzed thalli of four cyanolichen species characteristic of environments with high relative humidity, but with different thallus structures and thicknesses, using a near-infrared hyperspectral imaging (NIR-HSI) system operating in the 1000-2500 nm spectral region. Lichens were hydrated with deionized water and the desiccation process was followed by NIR-HSI until the complete dehydration of the thallus. The promising results show that NIR-HSI can be regarded as an effective analytical technique for mapping and chemically understanding the wetting and drying cycle in cyanolichens.




Paolo Oliveri1, Cristina Malegori1, Monica Casale1

1University of Genova – Department of Pharmacy (DIFAR) – Viale Cembrano, 4 – 16147 Genova – Italy

Tools for signal pre-processing include a wide number of mathematical transformations generally aimed at minimising unwanted effects (both random and systematic), with the result of improving data quality and, consequently, converting raw data to exploitable useful information.

Examples of common pre-processing corrections are the standard normal variate transform (SNV) – or row autoscaling – and derivatives of different orders, usually applied in combination with smoothing to counteract the enhancement of random noise, which is typically characterised by high-frequency slope variations.
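These two corrections can be sketched in a few lines; the `snv` and `savgol_derivative` helpers below are illustrative plain-numpy stand-ins (in practice, e.g. `scipy.signal.savgol_filter` would be used for the derivative):

```python
import numpy as np

def snv(X):
    """Standard normal variate: centre and scale each spectrum (row) individually."""
    return (X - X.mean(1, keepdims=True)) / X.std(1, keepdims=True)

def savgol_derivative(x, half_width=3, poly_order=2):
    """First derivative via a local polynomial fit (the Savitzky-Golay idea);
    edge points are left at zero in this simple sketch."""
    n = len(x)
    d = np.zeros(n)
    for i in range(half_width, n - half_width):
        idx = np.arange(i - half_width, i + half_width + 1)
        coeffs = np.polyfit(idx - i, x[idx], poly_order)
        d[i] = coeffs[-2]            # slope of the fitted polynomial at the centre
    return d

# A Gaussian band on a sloping baseline, with a multiplicative scatter effect
wl = np.linspace(0, 1, 200)
spectrum = 1.7 * (np.exp(-0.5 * ((wl - 0.5) / 0.05) ** 2) + 0.3 * wl + 0.2)
corrected = snv(spectrum[None, :])[0]        # removes offset and scaling
deriv = savgol_derivative(corrected)         # removes the residual constant slope
```

SNV leaves each spectrum with zero mean and unit standard deviation; the smoothed first derivative then changes sign at the band centre, which is one way the peak position survives the transform while the additive baseline does not.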

Along with the desired corrections, the application of mathematical transforms may produce undesired secondary effects. In particular, some transforms may introduce artefacts; others may confound interpretation of the final outcomes of the whole signal-processing chain – a risk that is often underestimated.

In the present work, desired and undesired effects of the most commonly applied signal transforms are reviewed. Moreover, their effects on the interpretation of the outcomes of common chemometric methods (such as principal component analysis – PCA) are critically described, and efficient strategies to overcome these hurdles are presented.

Acknowledgement: Financial support by the Italian Ministry of Education, Universities and Research (MIUR) is gratefully acknowledged – Research Project SIR 2014 “Advanced strategies in near infrared spectroscopy and multivariate data analysis for food safety and authentication”, RBSI14CJHJ (CUP: D32I15000150008).




Dario Cevoli1, Raffaele Vitale1,2, Siewert Hugelier1, Cyril Ruckebusch1

1Laboratoire de Spectrochimie Infrarouge et Raman - LASIR CNRS - UMR 8516, Université de Lille, 59000, Lille, France
2Molecular Imaging and Photonics Unit, Department of Chemistry, Katholieke Universiteit Leuven, Celestijnenlaan 200F, B-3001, Leuven, Belgium

MCR-ALS [1] has proven to be a valuable tool for resolving hyperspectral images (HSI). However, traditional MCR-ALS analysis of an HSI requires the unfolding of the hyperspectral data cube into a two-way array, which causes the loss of information on adjacency between pixels. In such cases, exploiting the spatial structure of HSI to improve the corresponding MCR-ALS solutions is unfeasible [2]. Several approaches have been developed to overcome this issue. The angle constraint (also known as the contrast enhancement constraint) proposed by Windig et al. [3, 4] takes advantage of the duality of the spectral and spatial domains to enhance the selectivity of one of the two. Alternatively, a refolding step can be added in the least squares loop of the MCR-ALS decomposition [2]. This allows image processing constraints such as sparseness [5], segmentation [2] or total variation [6, 7] to be employed for improving the visualisation of regions of interest with sharp edges.

The aim of this work is to compare the outcomes of MCR-ALS analysis of hyperspectral images when contrast enhancement and total variation constraints are applied. Three simulated datasets and a real one are investigated here, with the aim of exploring different experimental scenarios in which their effect is tested. The resolution of each dataset is discussed and the pros and cons of the two approaches are highlighted. Particular attention is given to how overlap among components (in both the spectral and spatial domains) can affect the possibility of achieving a physico-chemically meaningful MCR-ALS solution.
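For orientation, a bare-bones MCR-ALS loop on the unfolded bilinear model D ≈ C Sᵀ might look like the sketch below; non-negativity is imposed here by simple clipping rather than proper NNLS, and none of the spatial constraints discussed above are included:

```python
import numpy as np

def mcr_als(D, C0, n_iter=200):
    """Minimal MCR-ALS sketch for D ≈ C @ S.T with non-negativity on both
    factors (clipping after each unconstrained least squares step)."""
    C = C0.copy()
    for _ in range(n_iter):
        S = np.clip(np.linalg.lstsq(C, D, rcond=None)[0].T, 0, None)    # spectra
        C = np.clip(np.linalg.lstsq(S, D.T, rcond=None)[0].T, 0, None)  # profiles
    return C, S

# Two-component simulated data: Gaussian concentration profiles x random spectra
t = np.linspace(0, 1, 60)
C_true = np.column_stack([np.exp(-0.5 * ((t - 0.35) / 0.08) ** 2),
                          np.exp(-0.5 * ((t - 0.60) / 0.08) ** 2)])
rng = np.random.default_rng(3)
S_true = rng.random((40, 2))
D = C_true @ S_true.T

C_hat, S_hat = mcr_als(D, C0=C_true + 0.1 * rng.random(C_true.shape))
residual = np.linalg.norm(D - C_hat @ S_hat.T) / np.linalg.norm(D)
```

For an image, the rows of C are pixels of the unfolded cube; refolding C into the image grid inside the loop is what makes spatial constraints such as total variation applicable.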

[1] R. Tauler, A. Smilde, B. Kowalski, J. Chemometr., 9, 31-58 (1995)
[2] S. Hugelier, O. Devos, C. Ruckebusch, J. Chemometr., 29, 557-561 (2015)
[3] W. Windig, M. R. Keenan, Appl. Spectrosc., 65, 349-357 (2011)
[4] W. Windig, J. M. Shaver, M. R. Keenan, B. M. Wise, Chemometr. Intell. Lab., 117, 159-168 (2012)
[5] S. Hugelier, S. Piqueras, C. Bedia, A. de Juan, C. Ruckebusch, Anal. Chim. Acta, 1000, 100-108 (2018)
[6] L. Rudin, S. Osher, E. Fatemi, Physica D, 60, 259-268 (1992)
[7] S. Hugelier, R. Vitale, C. Ruckebusch, Appl. Spectrosc., 73, 420-431 (2018)




Hugelier Siewert1, Ahmad Mohamad1, Vitale R.1,2, Ruckebusch Cyril1

1Université de Lille, LASIR, F-59655 Villeneuve d’Ascq, Cedex, France.
2KU Leuven, Laboratory of Photochemistry and Spectroscopy, B-3001 Heverlee, Belgium.

SPIDER [1] is a super-resolution image deconvolution approach for high-density fluorescence microscopy images. It applies penalized least squares regression with an L0-norm penalty to reach a spatial resolution beyond the diffraction limit of light. In this framework, a parameter λ has to be tuned to balance the goodness of fit (LS term) against the number of emitters found (penalty term) in the final super-resolved image. As of writing, no automatic procedure for selecting this parameter is available, owing to the non-convex nature of the optimization; as a result, it is tuned manually via visual inspection.

In this work, we propose a heuristic approach for optimizing the trade-off parameter λ. We look for λ values corresponding to the minimum of the Sum of the Normalized Terms (SNT) in the penalized least squares loss function. This heuristic approach has been tested on simulations and provides excellent results. We validated it by investigating the robustness of the parameter selection in situations with low to high noise, at different emitter densities (ranging from 0.5 to 15 µm−2). The obtained value gives the best relative balance between a low reproduction error and the smallest number of emitters; one tries to find the sparsest solution that best reproduces the original signal. Furthermore, the approach has also been applied to experimental data sets (i.e. a HEK293-T cell, of which the mitochondria are labelled with the fluorescent protein DAKAP-Dronpa), offering an easy and quick decision criterion for the penalty parameter. Additionally, it reveals an interesting evolution of the penalty parameter with time – which can be related to fluorophore bleaching – that provides additional validation of the method. In all, the proposed method is a valuable criterion for the optimization of the penalty parameter and can be used for automatic selection in real experimental situations.
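The SNT idea can be illustrated on a toy one-dimensional L0-denoising problem, where the penalized solution reduces to hard thresholding, rather than the full SPIDER deconvolution; all sizes, amplitudes and noise levels below are invented:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x_true = np.zeros(n)
support = rng.choice(n, 25, replace=False)
x_true[support] = rng.uniform(2.0, 5.0, 25)      # sparse "emitters"
y = x_true + 0.2 * rng.normal(size=n)            # noisy observation

# For min ||y - x||^2 + lam * ||x||_0, the solution keeps y_i iff y_i^2 > lam
lambdas = np.logspace(-2, 2, 60)
ls_term = np.empty_like(lambdas)
pen_term = np.empty_like(lambdas)
for j, lam in enumerate(lambdas):
    x = np.where(y ** 2 > lam, y, 0.0)           # hard-threshold solution
    ls_term[j] = np.sum((y - x) ** 2)            # goodness-of-fit term
    pen_term[j] = np.count_nonzero(x)            # number of emitters found

def normalize(v):
    return (v - v.min()) / (v.max() - v.min())

snt = normalize(ls_term) + normalize(pen_term)   # Sum of the Normalized Terms
lam_star = lambdas[np.argmin(snt)]
x_star = np.where(y ** 2 > lam_star, y, 0.0)
```

The SNT is high when either term dominates and flat in between; its minimum selects a λ between the noise floor and the weakest true emitter, so the true support is recovered.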

[1] Hugelier et al., Sci. Rep., 6, 21413 (2016).




L. Fernandez1,2, S. Mas1,3, Maria Padilla2, A. Pardo2, S. Marco1,2 

1Signal and Information Processing for Sensing Systems, Institute for Bioengineering of Catalonia (IBEC), The Barcelona Institute of Science and Technology, Baldiri Reixac 10-12, 08028 Barcelona Spain.
2Department of Electronics and Biomedical Engineering, University de Barcelona, Martí i Franqués 1, 08028, Barcelona, Spain.
3Department of Chemical Engineering and Analytical Chemistry, University of Barcelona, Martí I Franqués 1, 08028, Barcelona, Spain.

The resolution of overlapping peak-shaped signals is a basic problem in analytical chemistry, since most instrumentation techniques provide peak-shaped signals [1]. In estimation theory, the Cramér-Rao inequality [2] provides a lower bound on the variance of unbiased estimators of parametric deterministic signals in the presence of noise. We consider that the empirical signal can be modeled by two Gaussian functions with unknown parameters A1,2 (the peak height), x1,2 (the position of the center of the peak) and σ1,2 (a measure of the peak width), plus noise assumed to be additive white Gaussian noise (AWGN). Other peak shapes are possible within the same formalism. Using this framework, we propose a method to determine the minimum distance between two peak-shaped signals so that their positions can be considered statistically different at a certain risk level. We compute the Cramér-Rao bound (CRB) for all the unknown parameters, but for the present work we are interested in the variance-covariance of the peak positions. Assuming these estimators are Gaussian distributed and taking as null hypothesis that both peaks have the same position, we can calculate the minimum distance needed to reject the null hypothesis at a certain risk level. In other words, this minimum distance is the minimum peak separation needed to resolve the presence of two distinct peaks in the empirical signal. We will show how this minimum separation depends on the relative peak intensities, the peak widths and the noise variance.
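Under this model, the CRB follows from the Fisher information of the six-parameter two-Gaussian signal in AWGN; the sketch below computes it with a numerical Jacobian and invented peak parameters, then applies the two-sided position test at roughly 5% risk:

```python
import numpy as np

def crb_positions(A, x0, s, noise_sd, x):
    """Cramér-Rao bound for two Gaussian peaks in AWGN. Returns the
    variance-covariance block of the two position estimates.
    Parameter vector: theta = (A1, x1, s1, A2, x2, s2)."""
    def model(theta):
        A1, x1, s1, A2, x2, s2 = theta
        return (A1 * np.exp(-0.5 * ((x - x1) / s1) ** 2)
                + A2 * np.exp(-0.5 * ((x - x2) / s2) ** 2))
    theta = np.array([A[0], x0[0], s[0], A[1], x0[1], s[1]], float)
    # Numerical Jacobian of the noiseless model w.r.t. the six parameters
    J = np.empty((len(x), 6))
    eps = 1e-6
    for k in range(6):
        tp, tm = theta.copy(), theta.copy()
        tp[k] += eps; tm[k] -= eps
        J[:, k] = (model(tp) - model(tm)) / (2 * eps)
    fim = J.T @ J / noise_sd ** 2           # Fisher information for AWGN
    cov = np.linalg.inv(fim)                # CRB = inverse Fisher information
    return cov[np.ix_([1, 4], [1, 4])]      # variance-covariance of x1, x2

x = np.linspace(0, 10, 400)
cov = crb_positions(A=(1.0, 0.8), x0=(4.5, 5.5), s=(0.4, 0.4),
                    noise_sd=0.02, x=x)
sd_diff = np.sqrt(cov[0, 0] + cov[1, 1] - 2 * cov[0, 1])
resolvable = abs(5.5 - 4.5) > 1.96 * sd_diff   # H0: x1 = x2, ~5 % risk
```

The minimum resolvable separation is then the distance at which the inequality becomes an equality; sweeping the peak spacing, heights and widths reproduces the dependence described in the abstract.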

[1] Danzer K (2007) Analytical chemistry: theoretical and metrological fundamentals. Springer.
[2] Kay SM (1993) Fundamentals of statistical signal processing. Prentice-Hall PTR.

Acknowledgement: This work was partially funded by the Spanish MINECO program, under grant TEC2014-59229-R. IBEC is a member of the CERCA Programme/Generalitat de Catalunya. CIBER-BBN is an initiative of the Spanish ISCIII. Additional financial support has been provided by the Institut de Bioenginyeria de Catalunya (IBEC).




Pál Péter Hanzelik1, Szilveszter Gergely2, Csaba Gáspár3,4, László Győry1

1Exploration & Production Applied Materials Technology & Labs, MOL Group, Október huszonharmadika utca 18., H-1117 Budapest, Hungary
2Department of Applied Biotechnology and Food Science, Budapest University of Technology and Economics, Szent Gellért tér 4., H-1111 Budapest, Hungary
3Department of Telecommunication and Media Informatics, Budapest University of Technology and Economics, Magyar tudósok körútja 2., H-1111 Budapest, Hungary
4Dmlab Ltd, Aradi utca 8-10, H-1062 Budapest, Hungary

Interest in the use of chemometric and data science methods for laboratory techniques has grown rapidly over the last 10 years, because they are cheaper and faster than traditional analytical methods of material testing.

This study uses 888 rock samples collected from the exploration and production sector of the oil industry. Based on the Fourier-transform infrared (FT-IR) spectra of these rock samples, solubility prediction models have been developed and investigated with nine methods, including both linear and non-linear ones. The programs, written by the authors, are based either on commercial applications or on open-source libraries. The investigation starts with spectral data pre-processing, carried out by standard normal variate (SNV) transformation, baseline correction and feature selection, creating the feature set for all machine learning (ML) applications. The accuracy of the predictions has been evaluated with mean squared error as a performance metric for each investigated method.
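The overall workflow (pre-processing, model fitting, mean-squared-error evaluation) can be sketched generically; the data below are synthetic, and ridge regression merely stands in for one of the nine methods, which the abstract does not name:

```python
import numpy as np

rng = np.random.default_rng(5)
# Synthetic FT-IR-like spectra: the response depends on two spectral bands,
# and each spectrum carries an additive per-sample offset (scatter-like effect)
n, p = 200, 120
base = rng.normal(size=(n, p))
X = base + rng.normal(size=(n, 1))
y = 3.0 * base[:, 30] - 2.0 * base[:, 80] + 0.1 * rng.normal(size=n)

# SNV pre-processing: centre and scale each spectrum (row) individually
Xp = (X - X.mean(1, keepdims=True)) / X.std(1, keepdims=True)

# Ridge regression (closed form) as one possible linear learner
train, test = np.arange(150), np.arange(150, 200)
lam = 1.0
b = np.linalg.solve(Xp[train].T @ Xp[train] + lam * np.eye(p),
                    Xp[train].T @ y[train])
mse = np.mean((y[test] - Xp[test] @ b) ** 2)   # performance metric
```

Comparing this test-set MSE across candidate learners, as the study does for its nine methods, is what makes the model choice data-driven.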

Comparison of the predicted values with the measured data for the test samples has shown that mineral solubility in acids can be predicted well, within the range of the uncertainties of real laboratory measurements; the approach can therefore be used to improve the response time of these investigations and to reduce risk in industrial applications. In cases where the unknown samples have out-of-range features, the limitations in the accuracy of the predictions become clear. This finding further emphasizes the need for database-building efforts, so that the real potential of big data and machine learning can be realized.

Acknowledgement: The foundation of this work was laid by our colleague, Imre Drávucz, whose contribution to building the database and introducing chemometric methods into our laboratory practice has been instrumental.



Fully automated PARAFAC2 based analysis of GC-MS data

Anne Bech Risum1, Rasmus Bro1

1Chemometrics and Analytical Technology, Univ. Copenhagen, Rolighedsvej 26, Frederiksberg, Denmark

Untargeted GC-MS often gives complex data. No specific chemical analytes are in focus a priori, and hence it is not possible to optimize the procedure as in classical analytical chemistry. It has been shown that the PARADISe software [1], based on PARAFAC2, is capable of separating co-eluting peaks and baselines, outperforming state-of-the-art analytical chemistry software both in the number of quantified compounds and in the reproducibility of the analysis.

However, analysing GC-MS datasets with the PARADISe software still involves several manual steps, which are labour-intensive and require a skilled tensor data analyst. We want to develop a fully automated “one button” solution, allowing non-chemometricians to easily utilize this advanced data analysis technique.

To achieve this goal, a number of subproblems have been identified and solved: 1) division of the chromatogram into intervals suitable for PARAFAC2 analysis has been automated using a brute-force approach involving many representations of each peak to find the optimal division; 2) the number of PARAFAC2 components is chosen automatically by an expert system, developed and extended from earlier work [2]; and 3) the PARAFAC2 elution profile components are classified as either attractive chemical peaks for quantification or not (e.g. baselines or half-peaks) by a deep convolutional neural network.
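Subproblem 1 can be illustrated with a far simpler stand-in than the brute-force search described above: cutting the total-ion chromatogram at interior low-signal valleys. The threshold and the synthetic peaks below are invented:

```python
import numpy as np

def split_intervals(tic, frac=0.05):
    """Cut a total-ion chromatogram at the midpoints of interior low-signal
    runs; a simple stand-in for PARADISe's interval-division step."""
    thresh = frac * tic.max()
    low = tic < thresh
    n = len(tic)
    cuts = [0]
    i = 0
    while i < n:
        if low[i]:
            j = i
            while j < n and low[j]:
                j += 1                   # extend the low-signal run
            if i > 0 and j < n:          # ignore leading/trailing low runs
                cuts.append((i + j) // 2)
            i = j
        else:
            i += 1
    cuts.append(n)
    return list(zip(cuts[:-1], cuts[1:]))

# Three well-separated peaks in a synthetic chromatogram
t = np.linspace(0, 1, 300)
tic = (np.exp(-0.5 * ((t - 0.2) / 0.03) ** 2)
       + np.exp(-0.5 * ((t - 0.5) / 0.03) ** 2)
       + np.exp(-0.5 * ((t - 0.8) / 0.03) ** 2))
intervals = split_intervals(tic)
```

Each resulting interval contains one peak region and can then be handed to a separate PARAFAC2 model, with the component-number and peak-classification steps applied per interval.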

Here we show the combined result of these efforts, in the first ever fully automated PARAFAC2 based GC-MS analysis.

[1] Johnsen, L. G., Skou, P. B., Khakimov, B., & Bro, R. Gas chromatography–mass spectrometry data processing made easy. Journal of Chromatography A, 1503, 57-64 (2017)
[2] Johnsen, L. G., Amigo, J. M., Skov, T., & Bro, R. Automated resolution of overlapping peaks in chromatographic data. Journal of Chemometrics, 28(2), 71-82 (2014)