Statistically Based Techniques
Multivariate Outlier Detection Using Robust Statistics
Authors: K. J. Tvarlapati†, K. A. Hoo*, M. J. Piovoso‡, and R. Hajare±
† Department of Chemical Engineering, University of South Carolina, Columbia, SC
* Department of Chemical Engineering, Texas Tech University
‡ Graduate School, Penn State University, Malvern, PA
± ExxonMobil, Houston, TX
ABSTRACT
Robust multivariate methods for dealing with problems caused by outliers in the data are essential, especially when process data are used to validate mechanistic models, develop regression models, and support applications such as controller design and process monitoring. Gross outliers are easily detected by simple methods such as range checking; however, a multivariate outlier is very difficult to discern, and techniques that rely on data to generate empirical models may produce erroneous results.
In this work, a methodology to perform multivariate outlier replacement in the score space generated by Principal Component Analysis is proposed. The objective is to find an accurate estimate of the covariance matrix of the data so that a Principal Component Analysis model might be developed that can then be used for monitoring and fault detection and identification. The methodology uses the concept of winsorization to provide robust estimates of the mean (location) and the standard deviation (scale) iteratively, yielding a robust set of data. The paper develops the approach, discusses the concept of robust statistics and winsorization, and presents the procedures for robust multivariate outlier filtering. One simulated and two industrial examples are provided to demonstrate the approach.
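As a rough illustration of the winsorization idea mentioned in the abstract, the following minimal sketch iterates location and scale estimates on a single variable. It is not the authors' procedure, which operates in the PCA score space; the function name `winsorized_estimates`, the clipping constant `k`, and the median/MAD starting values are assumptions made for this example. No consistency correction is applied, so the resulting scale will tend to be somewhat smaller than the standard deviation of clean Gaussian data.

```python
import numpy as np

def winsorized_estimates(x, k=2.0, max_iter=50, tol=1e-8):
    """Iteratively winsorize x and re-estimate location and scale.

    Hypothetical helper for illustration only: values beyond
    loc +/- k*scale are pulled back to those limits, then the mean and
    standard deviation are recomputed until they stop changing.
    """
    x = np.asarray(x, dtype=float)
    # Robust starting values: median and scaled median absolute deviation.
    loc = np.median(x)
    scale = 1.4826 * np.median(np.abs(x - loc))
    clipped = x.copy()
    for _ in range(max_iter):
        # Winsorize: replace extreme values by the current limits.
        clipped = np.clip(x, loc - k * scale, loc + k * scale)
        new_loc = clipped.mean()
        new_scale = clipped.std(ddof=1)
        converged = abs(new_loc - loc) < tol and abs(new_scale - scale) < tol
        loc, scale = new_loc, new_scale
        if converged:
            break
    return loc, scale, clipped

# Nine well-behaved readings and one gross outlier (made-up numbers).
data = np.array([9.8, 10.1, 10.0, 9.9, 10.2, 10.0, 9.7, 10.3, 10.1, 50.0])
loc, scale, cleaned = winsorized_estimates(data)
raw_mean = data.mean()
```

In this toy data set the raw mean is dragged toward the outlier, while the winsorized location stays near the bulk of the readings and the outlier is replaced by the upper limit rather than discarded.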
Publication Information: Computers & Chemical Engineering, 26, pp 17-39, 2002
Corresponding Author: Karlene A. Hoo
Improvements in the Development of Statistical Models for Process Monitoring & Detection
Authors: Daguang Zheng, Karlene A. Hoo* and Michael J. Piovoso†
* Department of Chemical Engineering, Texas Tech University
† Graduate School, Penn State University, Malvern, PA
ABSTRACT
Producing a uniform product is important for several reasons, such as maintaining a competitive position, reducing the number of shutdowns and startups, and eliminating the sources of variability. Multivariate statistical methods can assist in the identification of process correlations and the development of process monitoring models [1, 2]. This work extends these concepts by demonstrating that the correlations and resulting monitoring models can be improved greatly by prefiltering the time signals with a median filter and applying a time-scale decomposition using a multiresolution wavelet function. After the data are filtered and decomposed, the multivariate statistical method of principal component analysis (PCA) is used to develop a process monitoring model. Data taken from a difficult-to-operate industrial process are used to demonstrate these ideas.
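The prefilter-then-model sequence described above can be sketched on synthetic data. This is only an illustrative outline under stated assumptions: the wavelet decomposition step is omitted, the helper names `median_filter` and `pca_model`, the window width of 5, and the simulated signals are all invented for this example, and none of it reproduces the industrial study.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(signal, width=5):
    """Sliding-window median (odd width); edges are padded by repetition."""
    half = width // 2
    padded = np.pad(np.asarray(signal, dtype=float), half, mode="edge")
    return np.median(sliding_window_view(padded, width), axis=-1)

def pca_model(X, n_components):
    """PCA on autoscaled data via the singular value decomposition."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    loadings = Vt[:n_components].T           # variables x components
    scores = Z @ loadings                    # samples x components
    explained = s[:n_components] ** 2 / np.sum(s ** 2)
    return scores, loadings, explained

# Synthetic example: three correlated signals driven by one latent trend.
rng = np.random.default_rng(0)
t = np.linspace(0, 10, 200)
latent = np.sin(t)
X = np.column_stack([latent, 2 * latent, -latent])
X = X + 0.05 * rng.standard_normal(X.shape)
X[50, 0] += 10.0                             # inject a single spike

# Prefilter each column, then fit a one-component PCA model.
X_filt = np.column_stack([median_filter(X[:, j]) for j in range(X.shape[1])])
scores, loadings, explained = pca_model(X_filt, n_components=1)
```

The median filter removes the isolated spike without smearing it into neighbouring samples, which is why it is a common choice ahead of a correlation-based model such as PCA.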
Publication Information: Presented at ASA Conference, Chicago, IL, 1997.
Corresponding Author: Karlene A. Hoo
The Use of Multivariate Statistics in Process Control
Authors: Daguang Zheng, Karlene A. Hoo* and Michael J. Piovoso†
* Department of Chemical Engineering, Texas Tech University
† Graduate School, Penn State University, Malvern, PA
ABSTRACT
33.1 Introduction
33.2 Multivariate Statistics
    Principal Component Analysis
    Principal Component Regression
    Partial Least Squares
33.3 Example Areas of Applications
    Data Analysis
    Batch Processes
    Inferential Control
    Binary Distillation
33.4 Summary
33.5 References
Publication Information: The Control Handbook, Chapter 33, CRC Press and IEEE Press.
Corresponding Author: Karlene A. Hoo
Process Data Chemometrics
Authors: Karlene A. Hoo*, Michael J. Piovoso†, and James P. Yuk‡
* Department of Chemical Engineering, Texas Tech University
† Graduate School, Penn State University, Malvern, PA
‡ Dupont Chemical Company, Wilmington, DE
ABSTRACT
Data rich but information poor is an excellent way to characterize most chemical processes today. The lack of data analysis tools and adequate fundamental and experimental models makes it difficult to pursue product quality and improved understanding of a process. One data analysis technique successfully applied in spectroscopy to reduce a large quantity of data into meaningful information is Chemometrics. Data, when properly interpreted by statistical data analysis tools and fundamental and heuristic models, yield meaningful information. In this paper we discuss the use of Chemometrics as a multivariate analyzer to provide a composite measurement of the state of a chemical process operation. An application of this analyzer on a Du Pont Plant is presented, and we introduce two measures to detect and identify important process shifts.
Publication Information: IEEE Transactions on Instrumentation and Measurement, 41:2, pp 262-268, 1992.
Corresponding Author: Karlene A. Hoo
Improved Process Understanding Using Multiway Principal Component Analysis
Authors: Karlene A. Hoo*, Kenneth S. Dahl†, and Michael J. Piovoso‡
* Department of Chemical Engineering, Texas Tech University
† Department of Chemical Engineering, University of South Carolina
‡ Dupont Chemical Company, Wilmington, DE
ABSTRACT
Producing a uniform polymer by batch processing is important for the following reasons: to improve the downstream processing performance, to enable material produced at one site to be used by another, and to remain competitive. Eliminating the sources of batch-to-batch variability and tightening the control of key variables are but two ways to accomplish these objectives. In this work, it is shown that multiway principal component analysis (MPCA) can be used to identify major sources of variability in the processing steps. The results show that the major source of batch-to-batch variability is due to reactor temperature variations arising from disturbances in the heating system and other heat-transfer limitations. Correlations between the variations in the processing steps and the final product properties are found, and recommendations to reduce the sources of variations are discussed.
Publication Information: Ind. & Eng. Chem. Res., Vol. 35, pp 138-146, 1996.
Corresponding Author: Karlene A. Hoo
Translating Third-Order Data Analysis Methods to Chemical Batch Processes
Authors: Karlene A. Hoo*, Kenneth S. Dahl†, and Michael J. Piovoso‡
* Department of Chemical Engineering, Texas Tech University
† Department of Chemical Engineering, University of South Carolina
‡ Dupont Chemical Company, Wilmington, DE
ABSTRACT
Measurements collected from batch processes naturally produce a third-order or three-dimensional data form. Such a structure also results when multiple samples are measured using hyphenated analysis techniques such as liquid chromatography-diode array detection. Analysis of third-order data by a method such as principal components analysis (PCA) is achieved by a non-unique rearrangement that produces a two-dimensional array. This explicitly and preferentially models only one of three possible orders. In contrast, methods such as parallel factor analysis (PARAFAC) apply a particular decomposition that accounts for all three orders. The results from either method should be related if data are to be interpreted reliably for applications such as on-line monitoring and control. This work compares and contrasts these two approaches from an applied point of view. To accomplish this objective, an exemplary method is selected from each type of analysis: parallel factor analysis (PARAFAC) and multiway principal components analysis (MPCA). These are employed to analyze industrial data taken from the manufacture of a condensation polymer in a batch reactor.
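The "non-unique rearrangement" of a three-way array into a two-dimensional one can be made concrete with a small sketch. The array layout (batches x variables x time) and the function names `unfold_batchwise` and `unfold_variablewise` are assumptions chosen for this illustration; the paper may order or unfold its data differently.

```python
import numpy as np

def unfold_batchwise(X3):
    """Unfold (batches, variables, time) into (batches, time*variables).

    Each row holds one whole batch; column k*J + j is variable j at time k.
    """
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I, K * J)

def unfold_variablewise(X3):
    """Unfold (batches, variables, time) into (batches*time, variables).

    Each row is one time sample of one batch; row i*K + k is batch i, time k.
    """
    I, J, K = X3.shape
    return X3.transpose(0, 2, 1).reshape(I * K, J)

# Toy array: 2 batches, 3 variables, 4 time samples.
X3 = np.arange(2 * 3 * 4).reshape(2, 3, 4)
Xb = unfold_batchwise(X3)      # shape (2, 12)
Xv = unfold_variablewise(X3)   # shape (8, 3)
```

Both matrices contain exactly the same numbers, but an ordinary PCA applied to `Xb` treats batches as the observations, whereas PCA on `Xv` treats individual time samples as the observations. This is the sense in which a two-dimensional method preferentially models one order of the data.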
Publication Information: Chemometrics and Intelligent Laboratory Systems, Vol. 46, pp 161-180, 1999.
Corresponding Author: Karlene A. Hoo