Rome University, La Sapienza
Chemistry Department
Rome, Italy, Europe
Dr. Giovanni Visco, Oct 2007
Chemometrics, multivariate analysis, clustering, pattern recognition, exploratory data analysis ...
Degree course in: Sciences Applied to Cultural Heritage and to Diagnostics for its Conservation
Degree course in: Environmental Chemistry

What is meant by Chemometrics?

What lies behind the name Chemometrics (or Chemometry)?

  Every chemometrician answers this question differently. My purpose is not to write a new answer but to cite those given by others. Here you can see three lists.

  Returning to the preceding slide: I do not believe there is a degree course anywhere in the world that can cover the contents of this slide. Priorities must be set, keeping in mind the specialisation needed by Cultural Heritage (CC.HH.) or Environmental Chemistry students.

the indexes: from Brereton, from Hardy, from Massart
breaking away
Prof. James K. Hardy

  From the Hardy Research Group, Department of Chemistry, The University of Akron, Ohio, USA. A long list of slides (including those on chemometrics) written in simple language so that students can easily understand them. These slides, marked by correct language and few mistakes, soon became famous. The list is the first answer to our question.

Prof. D.L. Massart

  Taken from the latest edition of his book (in two volumes), considered the bible of chemometrics. D.L. Massart, B.G.M. Vandeginste, L.M.C. Buydens, S. De Jong, P.J. Lewi, J. Smeyers-Verbeke, Handbook of Chemometrics and Qualimetrics, Elsevier, 1997-1998

http://ull.chemistry.uakron.edu/chemometrics/
  1. Introduction
    • What is Chemometrics
    • Complex samples
    • Chemometric methods
    • What are data
    • Types of data
    • Obtaining meaningful data
  2. Basic statistics
    • Error, uncertainty and probability
    • Normal distribution
    • Large data sets
    • Smaller data sets
    • Univariate tools
    • Pooled statistics
  3. Simple ANOVA
    • Simple analysis of variance
    • Confidence intervals
    • The t test
    • Do two means differ?
    • The F test
  4. Rejection of data
    • Outliers
    • Huge errors
    • Dixon test
    • Grubbs test
  5. Experimental design
    • Experimental Design
    • Simple analysis of variance
    • Two way analysis of variance
    • Randomized blocks and ANOVA
    • Latin squares
    • Factorial design
  6. Simple modeling
    • Types of models
    • Linear regression
    • Goodness of fit
    • Correlation coefficient
    • Data transformations
    • Multiple linear regression
    • Non-linear regression
  7. Signal detection
    • Signals
    • Limit of detection
    • Precision
    • Optimization
    • Averaging
    • Integration
    • Filtering
    • Multiplex spectroscopy
    • Post collection
  8. Calibration
    • Constructing a calibration curve
    • Linear modeling
    • Linear models and uncertainty
    • Detection limit, sensitivity and linear range
    • Using the residuals
    • Standard addition
  9. Exploration
    • Complex samples
    • Leverage
    • Pattern recognition
    • Pre-processing
    • Translation & scaling
    • Autoscaling
    • Feature weighting
    • Eigenvectors
  10. Hierarchical Cluster Analysis
    • Distance and similarity
    • Clustering Methods
    • Dendrograms
    • Examples
    • So what's it good for?
  11. Principal Component Analysis
    • PCA
    • NIPALS
    • Varimax rotation
    • PCA of artifacts
    • Classification of whiskey
    • GC/MS data
    • Other examples
  12. Classification
    • Data sets
    • Similarity classification
    • Linear learning machine
    • K nearest neighbor
    • SIMCA
  13. Multivariate calibration
    • Principal component regression
    • Partial least squares regression
    • Regression examples
  14. Neural networks
    • Network components
    • Learning in neural networks
    • Backpropagation
    • Dynamic learning vector quantization
    • Self-organizing maps
Part A: ISBN: 0-444-89724-0
  1. Statistical Description of the Quality of Processes and Measurements
    • Introductory concepts about chemical data
    • Measurement of quality
    • Quality of processes and statistical process control
    • Quality of measurements in relation to quality of processes
    • Precision and bias of measurements
    • Some other types of error
    • Propagation of errors
    • Rounding and rounding errors
  2. The Normal Distribution
    • Population parameters and their estimators
    • Moments of a distribution: mean, variance, skewness
    • The normal distribution: description and notation
    • Tables for the standardized normal distribution
    • Standard errors
    • Confidence intervals for the mean
    • Small samples and the t-distribution
    • Normality tests: a graphical procedure
    • How to convert a non-normal distribution into a normal one
  3. An Introduction to Hypothesis Testing
    • Comparison of the mean with a given value
    • Null and alternative hypotheses
    • Using confidence intervals
    • Comparing a test value with a critical value
    • Presentation of results of a hypothesis test
    • Level of significance and type I error
    • Power and type II errors
    • Sample size
    • One- and two-sided tests
    • An alternative approach: interval hypotheses
  4. Some Important Hypothesis Tests
    • Comparison of two means
    • Multiple comparisons
    • β error and sample size
    • Comparison of variances
    • Outliers
    • Distribution tests
  5. Analysis of Variance
    • One-way analysis of variance
    • Assumptions
    • Fixed effect models: testing differences between means of columns
    • Random effect models: variance components
    • Two-way and multi-way ANOVA
    • Interaction
    • Incorporation of interaction in the residual
    • Experimental design and modelling
    • Blocking
    • Repeated testing by ANOVA
    • Nested ANOVA
  6. Control Charts
    • Quality control
    • Mean and range charts
    • Charts for attributes
    • Moving average and related charts
    • Further developments
  7. Straight Line Regression and Calibration
    • Introduction
    • Straight line regression
    • Correlation
    • References
  8. Vectors and Matrices
    • The data table as data matrix
    • Vectors
    • Matrices
  9. Multiple and Polynomial Regression
    • Introduction
    • Estimation of the regression parameters
    • Validation of the model
    • Confidence intervals
    • Multicollinearity
    • Ridge regression
    • Multicomponent analysis by multiple linear regression
    • Polynomial regression
    • Outliers
  10. Non-linear Regression
    • Introduction
    • Mechanistic modelling
    • Empirical modelling
  11. Robust Statistics
    • Methods based on the median
    • Biweight and winsorized mean
    • Iteratively reweighted least squares
    • Randomization tests
    • Monte Carlo methods
  12. Internal Method Validation
    • Definition and types of method validation
    • The golden rules of method validation
    • Types of internal method validation
    • Precision
    • Accuracy and bias
    • Linearity of calibration lines
    • Detection limit and related quantities
    • Sensitivity
    • Selectivity and interferences
  13. Method Validation by Interlaboratory Studies
    • Types of interlaboratory studies
    • Method-performance studies
    • Laboratory-performance studies
  14. Other Distributions
    • Introduction-probabilities
    • The binomial distribution
    • The hypergeometric distribution
    • The Poisson distribution
    • The negative exponential distribution and the Weibull distribution
    • Extreme value distributions
  15. The 2×2 Contingency Table
    • Statistical descriptors
    • Tests of hypothesis
  16. Principal Components
    • Latent variables
    • Score plots
    • Loading plots
    • Biplots
    • Applications in method validation
    • The singular value decomposition
    • The resolution of mixtures by evolving factor analysis and the HELP method
    • Principal component regression and multivariate calibration
    • Other latent variable methods
  17. Information Theory
    • Uncertainty and information
    • An application to thin layer chromatography
    • The information content of combined procedures
    • Inductive expert systems
    • Information theory in data analysis
  18. Fuzzy Methods
    • Conventional set theory and fuzzy set theory
    • Definitions and operations with fuzzy sets
    • Applications
  19. Process Modelling and Sampling
    • Introduction
    • Measurability and controllability
    • Estimators of system states
    • Models for process fluctuations
    • Measurability and measuring system
    • Choice of an optimal measuring system: cost considerations
    • Multivariate statistical process control
    • Sampling for spatial description
    • Sampling for global description
    • Sampling for prediction
    • Acceptance sampling
  20. An Introduction to Experimental Design
    • Definition and terminology
    • Aims of experimental design
    • The experimental factors
    • Selection of responses
    • Optimization strategies
    • Response functions: the model
    • An overview of simultaneous (factorial) designs
  21. Two-level Factorial Designs
    • Terminology: a pharmaceutical technology example
    • Direct estimation of effects
    • Yates' method of estimating effects
    • An example from analytical chemistry
    • Significance of the estimated effects: visual interpretation
    • Significance of the estimated effects: by using the standard deviation of the effects
    • Significance of the estimated effects: by ANOVA
    • Least squares modelling
    • Artefacts
  22. Fractional Factorial Designs
    • Need for fractional designs
    • Confounding: example of a half-fraction factorial design
    • Defining contrasts and generators
    • Resolution
    • Embedded full factorials
    • Selection of additional experiments
    • Screening designs
  23. Multi-level Designs
    • Linear and quadratic response surfaces
    • Quality criteria
    • Classical symmetrical designs
    • Non-symmetrical designs
    • Response surface methodology
    • Non-linear models
    • Latin square designs
  24. Mixture Designs
    • The sum constraint
    • The ternary diagram
    • Introduction to the Simplex design
    • Simplex lattice and centroid designs
    • Upper or lower bounds
    • Upper and lower bounds
    • Combining mixture and process variables
  25. Other Optimization Methods
    • Introduction
    • Sequential optimization methods
    • Steepest ascent methods
    • Multicriteria decision making
    • Taguchi methods
  26. Genetic Algorithms and Other Global Search Strategies
    • Introduction
    • Application scope
    • Principle of genetic algorithms
    • Configuration of genetic algorithms
    • Search behaviour of genetic algorithms
    • Hybridization of genetic algorithms
    • Example
    • Applications
    • Simulated annealing
    • Tabu search
Part B: ISBN: 0-444-82853-2
  1. Introduction to Vectors and Matrices
    • Vectors
    • Matrices and Operations on Matrices
    • Vector space
    • Geometrical properties of vectors
    • Matrices
    • Matrix product
    • Dimensions and rank
    • Eigenvectors and eigenvalues
    • Statistical interpretation of matrices
    • Geometrical interpretation of matrix products
  2. Cluster Analysis
    • Clusters
    • Measures of (dis)similarity
    • Clustering algorithms
  3. Analysis of Measurement Tables
    • Introduction
    • Principal components analysis
    • Geometrical interpretation
    • Preprocessing
    • Algorithms
    • Validation
    • Principal coordinates analysis
    • Non-linear principal components analysis
    • PCA and cluster analysis
  4. Analysis of Contingency Tables
    • Contingency table
    • Chi-square statistic
    • Closure
    • Weighted metric
    • Distance of chi-square
    • Correspondence factor analysis
    • Log-linear model
  5. Supervised Pattern Recognition
    • Supervised and unsupervised pattern recognition
    • Derivation of classification rules
    • Feature selection and reduction
    • Validation of classification rules
  6. Curve and Mixture Resolution by Factor Analysis and Related Techniques
    • Abstract and true factors
    • Full-rank methods
    • Evolutionary and local rank methods
    • Pure column (or row) techniques
    • Quantitative methods for factor analysis
    • Application of factor analysis for peak purity check in HPLC
    • Guidance for the selection of a factor analysis method
  7. Relations between Measurement Tables
    • Introduction
    • Procrustes analysis
    • Canonical correlation analysis
    • Multivariate least squares regression
    • Reduced rank regression
    • Partial least squares regression
    • Continuum regression methods
    • Concluding remarks
  8. Multivariate Calibration
    • Introduction
    • Calibration methods
    • Validation
    • Other aspects
    • New developments
  9. Quantitative Structure-Activity Relationships (QSAR)
    • Extrathermodynamic methods
    • Principal components models
    • Canonical variate models
    • Partial least squares models
    • Other approaches
  10. Analysis of Sensory Data
    • Introduction
    • Difference tests
    • Multidimensional scaling
    • The analysis of Quantitative Descriptive Analysis profile data
    • Comparison of two or more sensory data sets
    • Linking sensory data to instrumental data
    • Temporal aspects of perception
    • Product formulation
  11. Pharmacokinetic Models
    • Introduction
    • Compartmental analysis
    • Non-compartmental analysis
    • Compartment models versus non-compartmental analysis
    • Linearization of non-linear models
  12. Signal Processing
    • Signal domains
    • Types of signal processing
    • The Fourier transform
    • Convolution
    • Signal processing
    • Deconvolution by Fourier transform
    • Other transforms
  13. Kalman Filtering
    • Introduction
    • Recursive regression of a straight line
    • Recursive multicomponent analysis
    • System equations
    • The Kalman filter
    • Adaptive Kalman filtering
    • Applications
  14. Applications of Operations Research
    • An overview
    • Linear programming
    • Queueing problems
    • Discrete event simulation
    • A shortest path problem
  15. Artificial Intelligence: Expert and Knowledge Based Systems
    • Artificial intelligence and expert systems
    • Expert systems
    • Structure of expert systems
    • Knowledge representation
    • The inference engine
    • The interaction module
    • Tools
    • Developments of an expert system
    • Conclusion
  16. Artificial Neural Networks
    • Introduction
    • Historical overview
    • The basic unit - the neuron
    • The linear learning machine and the perceptron network
    • Multilayer feed forward (MLF) networks
    • Radial basis function networks
    • Kohonen networks
    • Adaptive resonance theory networks

Prof. Richard Brereton

  From Bristol Chemometrics, Bristol University, U.K. The new book Applied Chemometrics for Scientists, J. Wiley & Sons (2007), ISBN: 0470016868. Written by one of the fathers of chemometrics, it updates his previous (2003) book and adds many more applications.

ISBN: 0-470-01686-8
  1. Introduction
    • Development of Chemometrics
    • - Early Developments
    • - 1980s and the Borderlines between Other Disciplines
    • - 1990s and Problems of Intermediate Complexity
    • - Current Developments in Complex Problem Solving
    • Application Areas
    • How to Use this Book
    • Literature and Other Sources of Information
  2. Experimental Design
    • Why Design Experiments in Chemistry?
    • Degrees of Freedom and Sources of Error
    • Analysis of Variance and Interpretation of Errors
    • Matrices, Vectors and the Pseudoinverse
    • Design Matrices
    • Factorial Designs
    • - Extending the Number of Factors
    • - Extending the Number of Levels
    • An Example of a Factorial Design
    • Fractional Factorial Designs
    • Plackett-Burman and Taguchi Designs
    • The Application of a Plackett-Burman Design to the Screening of Factors Influencing a Chemical Reaction
    • Central Composite Designs
    • Mixture Designs
    • - Simplex Centroid Designs
    • - Simplex Lattice Designs
    • - Constrained Mixture Designs
    • A Four Component Mixture Design Used to Study Blending of Olive Oils
    • Simplex Optimization
    • Leverage and Confidence in Models
    • Designs for Multivariate Calibration
  3. Statistical Concepts
    • Statistics for Chemists
    • Errors
    • - Sampling Errors
    • - Sample Preparation Errors
    • - Instrumental Noise
    • - Sources of Error
    • Describing Data
    • - Descriptive Statistics
    • - Graphical Presentation
    • - Covariance and Correlation Coefficient
    • The Normal Distribution
    • - Error Distributions
    • - Normal Distribution Functions and Tables
    • - Applications
    • Is a Distribution Normal?
    • - Cumulative Frequency
    • - Kolmogorov-Smirnov Test
    • - Consequences
    • Hypothesis Tests
    • Comparison of Means: the t-Test
    • F-Test for Comparison of Variances
    • Confidence in Linear Regression
    • Linear Calibration
    • - Example
    • - Confidence of Prediction of Parameters
    • More about Confidence
    • - Confidence in the Mean
    • - Confidence in the Standard Deviation
    • Consequences of Outliers and How to Deal with Them
    • Detection of Outliers
    • - Normal Distributions
    • - Linear Regression
    • - Multivariate Calibration
    • Shewhart Charts
    • More about Control Charts
    • - Cusum Chart
    • - Range Chart
    • - Multivariate Statistical Process Control
  4. Sequential Methods
    • Sequential Data
    • Correlograms
    • - Auto-correlograms
    • - Cross-correlograms
    • - Multivariate Correlograms
    • Linear Smoothing Functions and Filters
    • Fourier Transforms
    • Maximum Entropy and Bayesian Methods
    • - Bayes' Theorem
    • - Maximum Entropy
    • - Maximum Entropy and Modelling
    • Fourier Filters
    • Peakshapes in Chromatography and Spectroscopy
    • - Principal Features
    • - Gaussians
    • - Lorentzians
    • - Asymmetric Peak Shapes
    • - Use of Peak Shape Information
    • Derivatives in Spectroscopy and Chromatography
    • Wavelets
  5. Pattern Recognition
    • Introduction
    • - Exploratory Data Analysis
    • - Unsupervised Pattern Recognition
    • - Supervised Pattern Recognition
    • Principal Components Analysis
    • - Basic Ideas
    • - Method
    • Graphical Representation of Scores and Loadings
    • - Case Study 1
    • - Case Study 2
    • - Scores Plots
    • - Loadings Plots
    • - Extensions
    • Comparing Multivariate Patterns
    • Preprocessing
    • Unsupervised Pattern Recognition: Cluster Analysis
    • Supervised Pattern Recognition
    • - Modelling the Training Set
    • - Test Sets, Cross-validation and the Bootstrap
    • - Applying the Model
    • Statistical Classification Techniques
    • - Univariate Classification
    • - Bivariate and Multivariate Discriminant Models
    • - SIMCA
    • - Statistical Output
    • K Nearest Neighbour Method
    • How Many Components Characterize a Dataset?
    • Multiway Pattern Recognition
    • - Tucker3 Models
    • - PARAFAC
    • - Unfolding
  6. Calibration
    • Univariate Calibration
    • - Classical Calibration
    • - Inverse Calibration
    • - Calibration Equations
    • - Including Extra Terms
    • - Graphs
    • Multivariate Calibration and the Spectroscopy of Mixtures
    • Multiple Linear Regression
    • Principal Components Regression
    • Partial Least Squares
    • How Good is the Calibration and What is the Most Appropriate Model?
    • - Autoprediction
    • - Cross-validation
    • - Test Sets
    • - Bootstrap
    • Multiway Calibration
    • - Unfolding
    • - Trilinear PLS1
    • - N-PLSM
  7. Coupled Chromatography
    • Preparing the Data
    • - Preprocessing
    • - Variable Selection
    • - Chemical Composition of Sequential Data
    • Univariate Purity Curves
    • Similarity Based Methods
    • - Similarity
    • - Correlation Coefficients
    • - Distance Measures
    • - OPA and SIMPLISMA
    • Evolving and Window Factor Analysis
    • - Expanding Windows
    • - Fixed Sized Windows
    • - Variations
    • Derivative Based Methods
    • Deconvolution of Evolutionary Signals
    • Noniterative Methods for Resolution
    • - Selectivity: Finding Pure Variables
    • - Multiple Linear Regression
    • - Principal Components Regression
    • - Partial Selectivity
    • Iterative Methods for Resolution
  8. Equilibria, Reactions and Process Analytics
    • The Study of Equilibria using Spectroscopy
    • Spectroscopic Monitoring of Reactions
    • - Mid Infrared Spectroscopy
    • - Near Infrared Spectroscopy
    • - UV/visible Spectroscopy
    • - Raman Spectroscopy
    • - Summary of Main Data Analysis Techniques
    • Kinetics and Multivariate Models for The Quantitative Study of Reactions
    • Developments in The Analysis of Reactions using On-Line Spectroscopy
    • - Constraints and Combining Information
    • - Data Merging
    • - Three-Way Analysis
    • The Process Analytical Technology Initiative
    • - Multivariate Tools for Design, Data Acquisition and Analysis
    • - Process Analysers
    • - Process Control Tools
    • - Continuous Improvement and Knowledge Management Tools
  9. Improving Yields and Processes Using Experimental Designs
    • Use of Statistical Designs for Improving the Performance of Synthetic Reactions
    • Screening for Factors that Influence the Performance of a Reaction
    • Optimizing the Process Variables
    • Handling Mixture Variables using Simplex Designs
    • - Simplex Centroid and Lattice Designs
    • - Constraints
    • More about Mixture Variables
    • - Ratios
    • - Minor Constituents
    • - Combining Mixture and Process Variables
    • - Models
  10. Biological and Medical Applications of Chemometrics
    • Introduction
    • - Genomics, Proteomics and Metabolomics
    • - Disease Diagnosis
    • - Chemical Taxonomy
    • Taxonomy
    • Discrimination
    • - Discriminant Function
    • - Combining Parameters
    • - Several Classes
    • - Limitations
    • Mahalanobis Distance
    • Bayesian Methods and Contingency Tables
    • Support Vector Machines
    • Discriminant Partial Least Squares
    • Micro-organisms
    • - Mid Infrared Spectroscopy
    • - Growth Curves
    • - Further Measurements
    • - Pyrolysis Mass Spectrometry
    • Medical Diagnosis using Spectroscopy
    • Metabolomics using Coupled Chromatography and Nuclear Magnetic Resonance
    • - Coupled Chromatography
    • - Nuclear Magnetic Resonance
  11. Biological Macromolecules
    • Sequence Alignment and Scoring Matches
    • Sequence Similarity
    • Tree Diagrams
    • - Diagrammatic Representations
    • - Dendrograms
    • - Evolutionary Theory and Cladistics
    • - Phylograms
    • Phylogenetic Trees
  12. Multivariate Image Analysis
    • Scaling Images
    • - Scaling Spectral Variables
    • - Scaling Spatial Variables
    • - Multiway Image Preprocessing
    • Filtering and Smoothing the Image
    • Principal Components for the Enhancement of Images
    • Regression of Images
    • Alternating Least Squares as Employed in Image Analysis
    • Multiway Methods In Image Analysis
  13. Food
    • Introduction
    • - Adulteration
    • - Ingredients
    • - Sensory Studies
    • - Product Quality
    • - Image Analysis
    • How to Determine the Origin of a Food Product using Chromatography
    • Near Infrared Spectroscopy
    • - Calibration
    • - Classification
    • - Exploratory Methods
    • Other Information
    • - Spectroscopies
    • - Chemical Composition
    • - Mass Spectrometry and Pyrolysis
    • Sensory Analysis: Linking Composition to Properties
    • - Sensory Panels
    • - Principal Components Analysis
    • - Advantages
    • Varimax Rotation
    • Calibrating Sensory Descriptors to Composition
will they catch him?