Practical Grey-box Process Identification, Springer (2006)

Advances in Industrial Control
Other titles published in this Series:
Digital Controller Implementation
and Fragility
Robert S.H. Istepanian and
James F. Whidborne (Eds.)
Optimisation of Industrial Processes
at Supervisory Level
Doris Sáez, Aldo Cipriano and
Andrzej W. Ordys
Robust Control of Diesel Ship Propulsion
Nikolaos Xiros
Hydraulic Servo-systems
Mohieddine Jelali and Andreas Kroll
Strategies for Feedback Linearisation
Freddy Garces, Victor M. Becerra,
Chandrasekhar Kambhampati and
Kevin Warwick
Robust Autonomous Guidance
Alberto Isidori, Lorenzo Marconi and
Andrea Serrani
Dynamic Modelling of Gas Turbines
Gennady G. Kulikov and Haydn A.
Thompson (Eds.)
Control of Fuel Cell Power Systems
Jay T. Pukrushpan, Anna G. Stefanopoulou
and Huei Peng
Fuzzy Logic, Identification and Predictive
Control
Jairo Espinosa, Joos Vandewalle and
Vincent Wertz
Optimal Real-time Control of Sewer
Networks
Magdalene Marinaki and Markos
Papageorgiou
Process Modelling for Control
Benoît Codrons
Computational Intelligence in Time Series
Forecasting
Ajoy K. Palit and Dobrivoje Popovic
Modelling and Control of mini-Flying
Machines
Pedro Castillo, Rogelio Lozano and
Alejandro Dzul
Rudder and Fin Ship Roll Stabilization
Tristan Perez
Hard Disk Drive Servo Systems (2nd
Edition)
Ben M. Chen, Tong H. Lee, Kemao Peng
and Venkatakrishnan Venkataramanan
Measurement, Control, and
Communication Using IEEE 1588
John Eidson
Piezoelectric Transducers for Vibration
Control and Damping
S.O. Reza Moheimani and Andrew J.
Fleming
Windup in Control
Peter Hippe
Manufacturing Systems Control Design
Stjepan Bogdan, Frank L. Lewis, Zdenko
Kovačić and José Mireles Jr.
Nonlinear H2/H∞ Constrained Feedback
Control
Murad Abu-Khalaf, Jie Huang and
Frank L. Lewis
Modern Supervisory and Optimal Control
Sandor A. Markon, Hajime Kita, Hiroshi
Kise and Thomas Bartz-Beielstein
Publication due July 2006
Wind Turbine Control Systems
Fernando D. Bianchi, Hernán De Battista
and Ricardo J. Mantz
Publication due August 2006
Soft Sensors for Monitoring and Control of
Industrial Processes
Luigi Fortuna, Salvatore Graziani,
Alessandro Rizzo and Maria Gabriella
Xibilia
Publication due August 2006
Practical PID Control
Antonio Visioli
Publication due November 2006
Magnetic Control of Tokamak Plasmas
Marco Ariola and Alfredo Pironti
Publication due May 2007
Torsten Bohlin
Practical Grey-box
Process Identification
Theory and Applications
With 186 Figures
Torsten Bohlin
Automatic Control, Signals, Sensors and Systems
Royal Institute of Technology (KTH)
SE-100 44 Stockholm
Sweden
British Library Cataloguing in Publication Data
Bohlin, Torsten, 1931-
Practical grey-box process identification : theory and
applications. - (Advances in industrial control)
1.Process control - Mathematical models 2.Process control -
Mathematical models - Case studies
I.Title
670.4’27
ISBN-13: 9781846284021
ISBN-10: 1846284023
Library of Congress Control Number: 2006925303
Advances in Industrial Control series ISSN 1430-9491
ISBN-10: 1-84628-402-3 e-ISBN 1-84628-403-1 Printed on acid-free paper
ISBN-13: 978-1-84628-402-1
© Springer-Verlag London Limited 2006

MATLAB® and Simulink® are registered trademarks of The MathWorks, Inc., 3 Apple Hill Drive, Natick,
MA 01760-2098, U.S.A.
Modelica® is a registered trademark of the “Modelica Association”.
Dymola™ is a trademark of Dynasim AB, Research Park Ideon, Lund 223 70, Sweden. www.Dynasim.se
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as
permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced,
stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers,
or in the case of reprographic reproduction in accordance with the terms of licences issued by the
Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to
the publishers.
The use of registered names, trademarks, etc. in this publication does not imply, even in the absence of a
specific statement, that such names are exempt from the relevant laws and regulations and therefore free
for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information
contained in this book and cannot accept any legal responsibility or liability for any errors or omissions
that may be made.
Printed in Germany
987654321
Springer Science+Business Media
springer.com
Advances in Industrial Control
Series Editors
Professor Michael J. Grimble, Professor of Industrial Systems and Director
Professor Michael A. Johnson, Professor (Emeritus) of Control Systems
and Deputy Director
Industrial Control Centre
Department of Electronic and Electrical Engineering
University of Strathclyde
Graham Hills Building
50 George Street
Glasgow G1 1QE
United Kingdom
Series Advisory Board
Professor E.F. Camacho
Escuela Superior de Ingenieros
Universidad de Sevilla
Camino de los Descubrimientos s/n
41092 Sevilla
Spain
Professor S. Engell
Lehrstuhl für Anlagensteuerungstechnik
Fachbereich Chemietechnik
Universität Dortmund
44221 Dortmund
Germany
Professor G. Goodwin
Department of Electrical and Computer Engineering
The University of Newcastle
Callaghan
NSW 2308
Australia
Professor T.J. Harris
Department of Chemical Engineering
Queen’s University
Kingston, Ontario
K7L 3N6
Canada
Professor T.H. Lee
Department of Electrical Engineering
National University of Singapore
4 Engineering Drive 3
Singapore 117576
Professor Emeritus O.P. Malik
Department of Electrical and Computer Engineering
University of Calgary
2500, University Drive, NW
Calgary
Alberta
T2N 1N4
Canada
Professor K.-F. Man
Electronic Engineering Department
City University of Hong Kong
Tat Chee Avenue
Kowloon
Hong Kong
Professor G. Olsson
Department of Industrial Electrical Engineering and Automation
Lund Institute of Technology
Box 118
S-221 00 Lund
Sweden
Professor A. Ray
Pennsylvania State University
Department of Mechanical Engineering
0329 Reber Building
University Park
PA 16802
USA

Professor D.E. Seborg
Chemical Engineering
3335 Engineering II
University of California Santa Barbara
Santa Barbara
CA 93106
USA
Doctor K.K. Tan
Department of Electrical Engineering
National University of Singapore
4 Engineering Drive 3
Singapore 117576
Professor Ikuo Yamamoto
Kyushu University Graduate School
Marine Technology Research and Development Program
MARITEC, Headquarters, JAMSTEC
2-15 Natsushima Yokosuka
Kanagawa 237-0061
Japan
To the KTH class of F53
Series Editors’ Foreword
The series Advances in Industrial Control aims to report and encourage technology
transfer in control engineering. The rapid development of control technology has
an impact on all areas of the control discipline. New theory, new controllers,
actuators, sensors, new industrial processes, computer methods, new applications,
new philosophies…, new challenges. Much of this development work resides in
industrial reports, feasibility study papers and the reports of advanced collaborative
projects. The series offers an opportunity for researchers to present an extended
exposition of such new work in all aspects of industrial control for wider and rapid
dissemination.

Experienced practitioners in the field of industrial control often say that about
70 – 80% of project time is spent on understanding and modelling a process,
developing a simulation and then testing, calibrating and validating the simulation.
Control design and investigations will then absorb the other 20 – 30% of the
project time; thus, it is perhaps a little surprising that there is so little published on
the formal procedures and tools for performing these developmental modelling
tasks compared with the provision of simulation software tools. There is a very
clear difference between these two types of activities: simulation tools usually
comprise libraries of numerical routines and a logical framework for their
interconnection, often based on graphical representations like block diagrams, whereas
working out the actual steps needed to arrive at a consistent model that replicates observed
physical process behaviour is a far more demanding objective. Such is the agenda
underlying the inspirational work of Torsten Bohlin reported in his new Advances
in Industrial Control monograph, Practical Grey-box Process Identification.
The starting point for this work lies in the task of providing models for a range
of industrial production processes including: Baker’s yeast production, steel rinsing
(the rinsing of moving steel strip in a rolling-mill process), continuous pulp
digestion, cement milling, an industrial recovery boiler process (pulp production
process unit) and cardboard manufacturing. The practical experience of producing
these models supplied the raw data for understanding and abstracting the steps
needed in a formal grey-box identification procedure. The project has been active
for over 15 years; over this period, the grey-box identification procedure was
formulated, tested, re-formulated and so on, until a generic procedure of wide
applicability finally emerged.
In parallel with this extraction of the fundamental grey-box identification
procedure has been the development of the Process Model Calibrator and Validator
software, the so-called MoCaVa software. This contains the tools that implement
the general steps of grey-box identification. Consequently it is based on an holistic
approach to process modelling that uses a graphical block-diagram representation
but incorporates routines like loss function minimisation for model fitting and
other statistical tools to allow testing of model hypotheses. The software has been
tested and validated through its use and development with an extensive and broadly
based group of individual processes, some of which are listed above.
This monograph captures three aspects of Torsten Bohlin’s work in this area.
Firstly, there is an introduction to the theory and fundamentals of grey-box
identification (Part I) that carefully defines white-box, black-box and grey-box
identification. From this emerge the requirements of a grey-box procedure and the
need for software to implement the steps. Secondly, there is the MoCaVa software
itself. This is available for free download from a Springer website whose location
is given in the book. Part II of the monograph is a tutorial introduction and user’s
guide to the use of the MoCaVa software. For added realism, the tutorial is based
on a drum boiler model. Finally the experience of the tutorial introduction is put to
good use with the two fully documented case studies given as Part III of the
monograph. Process engineers will be able to work at their own pace through the
model development for a rinsing process for steel strip in a rolling mill and the
prediction of quality in a cardboard manufacturing process. The value of the case
studies is two-fold since they provide a clear insight into the procedures of grey-
box identification and give in-depth practical experience of using the MoCaVa
software for industrial processes; both of these are clearly transferable skills.
The Advances in Industrial Control monograph series has often included
volumes on process modelling and system identification but it is believed that this
is only the second ever volume in the series on the generic steps in an holistic grey-
box identification procedure. The volume will be welcomed by industrial process
control engineers for its insights into the practical aspects of process model
identification. Academics and researchers will undoubtedly be inspired by the
more generic theoretical and procedural aspects that the volume contributes to the
science and practice of system identification.
M.J. Grimble and M.A. Johnson
Industrial Control Centre
Glasgow, Scotland, U.K.
Preface
Those who have tried the conventional approaches to making mathematical models
of industrial production processes have probably also experienced the limitations of
the available methods. They have either to build the models from first principles, or
else to apply one of the ‘black−box’ methods based on statistical estimation theory.
Both approaches work well under the circumstances for which they were designed,
and they have the advantage that there are well developed tools for facilitating the
work. Generally, the modelling tools (based on first principles) have their applications
to electrical, mechanical, and hydrodynamical systems, where much is known about
the principles governing such systems. In contrast, the statistical methods have their
applications in cases where little is known in advance, or when detailed knowledge is
irrelevant for the purpose of the modelling, typically for design of feedback control.
In modelling for the process industry, however, prior knowledge is typically partial,
the effects of unknown input (‘disturbances’) are not negligible, and it is desirable
to have reproducibility of the model, for instance for the monitoring of unmeasured
variables, for feed−forward control, or for long−range prediction of variables with
much delayed responses to control action. Conceivably, ‘grey−box’ identification,
which is a ‘hybrid’ of the two approaches, would help the situation by exploiting both
of the two available sources of information, namely i) such invariant prior knowledge
that may be available, and ii) response data from experiments. Thus, grey−box meth-
ods would have their applications, whenever there is some invariant prior knowledge
of the process and it would be a waste of information not to use it.
After the first session on grey−box identification at the 4th IFAC Symposium on
Adaptive Systems in Control and Signal Processing in 1992, and the first special issue
in Int. J. Adaptive Control and Signal Processing in 1994, the approach has now been
reasonably well accepted as a paradigm for how to address the practical problems in
modelling physical processes. There are now quite a number of publications, most
about special applications. (A Google search for “Grey box model” in 2005 gave 691
hits.)
However, the problems of designing tools for grey−box identification are many.
Mainly, prior knowledge of industrial processes is usually diversified and primarily
ill adapted to the purpose of the model making. It is in the nature of things that prior
knowledge is more or less precise, reliable, and relevant (it may even be false). This
raises a number of fundamental questions, in addition to the practical problems: How
can I make use of what I do know? How much of my prior knowledge is useful and
even correct, when used in the particular environment? What do I do about the un-
known disturbances I cannot get rid of? Are my experiment data sufficient and rele-
vant? How do I know when the model is good enough?
It was the desire to find some answers to these questions that initiated a long-range
project at the Automatic Control department of KTH. The present book is based on
the results of that project. It stands on three legs:
i) A theoretical investigation of the fundamentals of grey−box identification. It re-
vealed that sufficiently many theoretical principles were available in the literature for
answering the questions that needed to be answered. The compilation was published
in a book (Bohlin, 1991a), which ended with a number of procedures for doing grey−
box identification properly.
ii) A software tool MoCaVa (Process Model Calibrator & Validator) based on one of
the procedures (Bohlin and Isaksson, 2003).
iii) A number of case studies of grey−box identification of industrial processes. They
were carried out in order to see whether the theoretical procedure would also be a practical
one, and to test the software being developed in parallel. Most case studies have
been done by PhD students at the department under the supervision of the author. The
extent of the work was roughly one thesis per case.
This book will focus on the software and the case studies. Thus it will serve as a
manual to MoCaVa, as well as illustrating how to apply MoCaVa efficiently. Success
in grey-box identification, as in other design, will no doubt depend on the skill of the
craftsman using the tool, and I believe that skill is best gained by exercise, and that case
studies are a good introduction.
In addition, there is a ‘theory’ chapter with the purpose of describing the basic de-
liberations, derivations, and decisions behind MoCaVa and the way it is constructed.
The purpose is to provide additional information to anyone who wants to understand
more of its properties than revealed in the user’s manual. This may help the user to appraise
the strengths and weaknesses of the program, either in order to be able to do the
same with the models that come out of it, or even to develop MoCaVa further. (The
source code can be downloaded from Springer.) The focus is therefore on the applica-
bility of the theories for the purpose of MoCaVa, rather than on the theories them-
selves.
Still, the chapter involves some non-elementary mathematics, but mathematics
stripped of the painstaking exactness of strict mathematics. This too is motivated
by a kind of ‘grey-box thinking’, this time to try and bridge the notorious gap between
theory and practice. It would be futile trying to adhere to the code of strict mathematics
when dealing with problems that cannot be solved in that way, and, in addition, meant
to be understood by readers who are not used to strict mathematics. And, conversely,
it would be impractical to try and solve all problems of grey-box identification by relying
on intuition and reasoning alone, however clever. Therefore, the mathematics is
interpreted in intuitive terms, and necessary approximations motivated in the same
way, whenever the mathematical problems become unsurmountable, or an exact solution
would take prohibitively long for a computer to process. The following is one of
my favorite quotations: “The man thinks. The theory helps him to think, and to main-
tain his thinking consistent in complex situations” (Peterka).
The method presented in this book for building grey-box models of physical objects
has three kinds of support: a systematic procedure to follow, a software package
for doing it, and case studies for learning how to use it. Part I motivates and describes
the procedure and the MoCaVa software. Part II is a tutorial on the use of MoCaVa
based on simple examples. Part III contains two extensive case studies of full−scale
industrial processes.
How to Use this Book
Successful grey−box identification of industrial processes requires knowledge of two
kinds, i) how the process works, and ii) how the software works. Since the knowledge
is normally not resident within the same person, two must contribute. Call them “process
engineer” and “model designer”. The latter should preferably have taken a course
in ‘Process identification’.
Part I is for the “model designer”, who needs to understand how the MoCaVa soft-
ware operates, in order to appreciate its limitations − what it can and cannot do.
Part II is for both. It is a tutorial on running MoCaVa, written for anyone who actually
wants to build a grey-box model. It is also useful as an introduction to the case
studies, since it is based on two pervading simple examples.
Part III is also for both. It develops the case studies in some detail, highlighting the
contributions of the three ‘actors’ in the session, viz. the engineer, the model designer/
program operator, and the MoCaVa program. The technical details in Part III are probably
of interest only to those working in the relevant businesses (steel or paper & pulp),
but are still important as illustrations of the issues that must be considered in practical
grey−box process identification.
The style of parts II and III deviates somewhat from what is customary in text
books, namely to use sentences in passive form, free of an explicit subject. The idea
of the customary practice is that science and engineering statements should be valid
irrespective of the subject. Unfortunately, the custom is devastating for the understanding,
when describing processes where there are indeed several subjects involved.
“Who does what” becomes crucial. Therefore, part II is written more like a user’s
manual. In describing grey-box identification practice there are, logically, no fewer than
five ‘actors’ involved:
- The customer/engineer (providing the prior information about the physical process
  and appraising the result).
- The model designer/user of the program tools (often the same person as the customer,
  but not if he/she lacks sufficient knowledge of the physical process to be
  modelled).
- The computer and program (analyzing the evidence of the data).
- The author of this book (trying to reason with a reader).
- The reader of the book (trying to understand what the author tries to say).
In order to reduce the risk of confusion when describing a grey−box identification
session − a process that involves at least the first three actors − the following conven-
tion will be used in the book:
The contributions of the different actors are marked with symbols at the beginning
of the paragraph: one for the operator (doing key pressing and mouse clicking), one
for MoCaVa (computing and displaying the results), and one for the model builder
(watching the screen, deliberating, and occasionally calculating on paper). It will no
doubt help the reader who wants to follow the examples on a computer that the operator
symbol states explicitly what to do in each moment, and the MoCaVa symbol points to
the expected response. There are also paragraphs without an initiating symbol; they have
the ordinary rôle of the author talking to a reader.
Also as a convention, Courier fonts are used for code, as well as for variables
that appear in the code, and for names of submodels, files, and paths. Helvetica Narrow
is used for user communication windows and for labels that appear in screen images.
The book uses a number of special terms and concepts of relevance to process identification.
Some should be well-known or self-explanatory to model designers, but
probably not all. The “Glossary of Terms” contains short definitions, without
mathematics, and some with clarifying examples. The list serves the same purpose
as the ‘hypertext’ function in HTML documentation, although less conveniently.
The contents of Part II are also available in HTML format. This form has the well-known
advantage that explanations of some key concepts become available at a mouse
click, and only if needed. In Part II explanations appear either under the headers Help
or Hints, or else as references to sections in the appendix, which unavoidably means
either wading through text mass (that can possibly be skipped), or looking up the ap-
propriate sections in the appendix. In order to reduce the length of Part II the number
of printed screen images is also smaller than those in the HTML document.

MoCaVa is downloadable from www.springer.com/1-84628-402-3 together
with all material needed for running the case studies. (The package also contains
the HTML manual as well as on-line help facilities.) This offers a possibility to
get more direct experience of the model-design session. It would therefore be possible
to use Parts II and III as study material for a course in grey-box process identification.
Acknowledgements
The author is indebted to the following individuals who participated in the Grey−box
development project:
Stefan Graebe, who wrote the first C−version of the IdKit tool box, and later partici-
pated in the Continuous Casting case study.
James Sørlie, who investigated possible interfaces to other programs.
Bohao Liao, who investigated search methods.
Ning He, who investigated real−time evaluation of Likelihood.
Anders Hasselkvist, who wrote Predat.
Tomas Wenngren, who wrote the first GUI.
Germund Mathiasson and Jiri Uosukainen who wrote the first version of Validate.
Olle Ehrengren, who wrote the first version of Simulate.
Ping Fan, who did the Baker’s Yeast case study.
Björn Sohlberg, who did the first Steel Rinsing case study.
Jonas Funkquist, who did the Pulp Digester case study.
Oliver Havelange, who did the Cement Milling case study.
Jens Pettersson, who did the second Cardboard case study.
Ola Markusson, who did the EEG−signals case study.
Bengt Nilsson, who contributed process knowledge to the Cardboard case study.
Jan Erik Gustavsson, who contributed process knowledge to the Recovery Boiler case
study.
Alf Isaksson, who participated in the Pulp Refiner and Drive Train cases, and headed
the MoCaVa project between 1998 and 2001.
Linus Loquist, who designed the MoCaVa home page.
Contents

Part I Theory of Grey-box Process Identification
1 Prospects and Problems
1.1 Introduction
1.2 White, Black, and Grey Boxes
1.2.1 White-box Identification
1.2.2 Black-box Identification
1.2.3 Grey-box Identification
1.3 Basic Questions
1.3.1 Calibration
1.3.2 How to Specify a Model Set
1.4 ... and a Way to Get Answers
1.5 Tools for Grey-box Identification
1.5.1 Available Tools
1.5.2 Tools that Need to Be Developed
2 The MoCaVa Solution
2.1 The Model Set
2.1.1 Time Variables and Sampling
2.1.2 Process, Environment, and Data Interfaces
2.1.3 Multi-component Models
2.1.4 Expanding a Model Class
2.2 The Modelling Shell
2.2.1 Argument Relations and Attributes
2.2.2 Graphic Representations
2.3 Prior Knowledge
2.3.1 Hypotheses
2.3.2 Credibility Ranking
2.3.3 Model Classes with Inherent Conservation Law
2.3.4 Modelling ‘Actuators’
2.3.5 Modelling ‘Input Noise’
2.3.6 Standard I/O Interface Models
2.4 Fitting and Falsification
2.4.1 The Loss Function
2.4.2 Nesting and Fair Tests
2.4.3 Evaluating Loss and its Derivatives
2.4.4 Predictor
2.4.5 Equivalent Discrete-time Model
2.5 Performance Optimization
2.5.1 Controlling the Updating of Sensitivity Matrices
2.5.2 Exploiting the Sparsity of Sensitivity Matrices
2.5.3 Using Performance Optimization
2.6 Search Routine
2.7 Applicability
2.7.1 Applications
2.7.2 A Method for Grey-box Model Design
2.7.3 What is Expected from the User?
2.7.4 Limitations of MoCaVa
2.7.5 Diagnostic Tools
2.7.6 What Can Go Wrong?
Part II Tutorial on MoCaVa
3 Preparations
3.1 Getting Started
3.1.1 System Requirements
3.1.2 Downloading
3.1.3 Installation
3.1.4 Starting MoCaVa
3.1.5 The HTML User’s Manual
3.2 The ‘Raw’ Data File
3.3 Making a Data File for MoCaVa
4 Calibration
4.1 Creating a New Project
4.2 The User’s Guide and the Pilot Window
4.3 Specifying the Data Sample
4.3.1 The Time Range Window
4.4 Creating a Model Component
4.4.1 Handling the Component Library
4.4.2 Entering Component Statements
4.4.3 Classifying Arguments
4.4.4 Specifying I/O Interfaces
4.4.5 Specifying Argument Attributes
4.4.6 Specifying Implicit Attributes
4.4.7 Assigning Data
4.5 Specifying Model Class
4.6 Simulating
4.6.1 Setting the Origin of the Free Parameter Space
4.6.2 Selecting Variables to be Plotted
4.6.3 Appraising Model Class
4.7 Handling Data Input
4.8 Fitting a Tentative Model Structure
4.8.1 Search Parameters
4.8.2 Appraising the Search Result
4.9 Testing a Tentative Model Structure
4.9.1 Appraising a Tentative Model
4.9.2 Nesting
4.9.3 Interpreting the Test Results
4.10 Refining a Tentative Model Structure
4.11 Multiple Alternative Structures
4.12 Augmenting a Disturbance Model
4.13 Checking the Final Model
4.14 Terminals and ‘Stubs’
4.15 Copying Components
4.16 Effects of Incorrect Disturbance Structure
4.17 Exporting/Importing Parameters
4.18 Suspending and Exiting
4.18.1 The Score Table
4.19 Resuming a Suspended Session
4.20 Checking Integration Accuracy
5 Some Modelling Support
5.1 Modelling Feedback
5.1.1 The Model Class
5.1.2 User’s Functions and Library
5.2 Rescaling
5.3 Importing External Models
5.3.1 Using Dymola™ as Modelling Tool for MoCaVa
5.3.2 Detecting Over-parametrization
5.3.3 Assigning Variable Input to Imported Models
5.3.4 Selective Connection of Arguments to Dymola™ Models
Part III Case Studies
6 Case 1: Rinsing of the Steel Strip in a Rolling Mill
6.1 Background
6.2 Step 1: A Phenomenological Description
6.2.1 The Process Proper
6.2.2 The Measurement Gauges
6.2.3 The Input
6.3 Step 2: Variables and Causality
6.3.1 The Variables
6.3.2 Cause and Effect
6.3.3 Data Preparation
6.3.4 Relations to Measured Variables
6.4 Step 3: Modelling
6.4.1 Basic Mass Balances
6.4.2 Strip Input
6.5 Step 4: Calibration
6.6 Refining the Model Class
6.6.1 The Squeezer Rolls
6.6.2 The Entry Rolls
6.7 Continuing Calibration
6.8 Refining the Model Class Again
6.8.1 Ventilation
6.9 More Hypothetical Improvements
6.9.1 Effective Mixing Volumes
6.9.2 Avoiding the Pitfall of ‘Data Description’
6.10 Modelling Disturbances
6.10.1 Pickling
6.10.2 State Noise
6.11 Determining the Simplest Environment Model
6.11.1 Variable Input Acid Concentration
6.11.2 Unexplained Variation in Residual Acid Concentration
6.11.3 Checking for Possible Over-fitting
6.11.4 Appraising Roller Conditions
6.12 Conclusions from the Calibration Session
7 Case 2: Quality Prediction in a Cardboard Making Process
7.1 Background
7.2 Step 1: A Phenomenological Description
7.3 Data Preparation
7.4 Step 2: Variables and Causality
7.4.1 Relations to Measured Variables
7.5 Step 3: Modelling
7.5.1 The Bending Stiffness
7.5.2 The Paper Machine
7.5.3 The Pulp Feed
7.5.4 Control Input
7.5.5 The Pulp Mixing
7.5.6 Pulp Input
7.5.7 The Pulp Constituents
7.6 Step 4: Calibration
7.7 Expanding the Tentative Model Class
7.7.1 The Pulp Refining
7.7.2 The Mixing-tank Dynamics
7.7.3 The Machine Chests
7.7.4 Filtering the “Kappa” Input
7.8 Checking for Over-fitting: The SBE Rule
7.9 Ending a Calibration Session
7.9.1 ‘Black-box’ vs ‘White-box’ Extensions
7.9.2 Determination vs Randomness
7.10 Modelling Disturbances
7.11 Calibrating Models with Stochastic Input
7.11.1 Determination vs Randomness Revisited
7.11.2 A Local Minimum
7.12 Conclusions from the Calibration Session
Appendices
A Mathematics and Algorithms
A.1 The Model Classes
A.2 The Loss Derivatives
A.3 The ODE Solver
A.3.1 The Reference Trajectory
A.3.2 The State Deviation
A.3.3 The Equivalent Discrete-time Sensitivity Matrices
A.4 The Predictor
A.4.1 The Equivalent Discrete-time Model
A.5 Mixed Algebraic and Differential Equations
A.6 Performance Optimization
A.6.1 The SensitivityUpdateControl Function
A.6.2 Memoization
A.7 The Search Routine
A.8 Library Routines
A.8.1 Output Conversion
A.8.2 Input Interpolators
A.8.3 Input Filters
A.8.4 Disturbance Models
A.9 The Advanced Specification Window
B.2.1 Optimization for Speed
B.2.2 User’s Checkpoints
B.2.3 Internal Integration Interval
B.2.4 Debugging
Glossary
References
Index
1
Prospects and Problems
1.1 Introduction
The task of making a mathematical model of a physical object, such as an industrial
process, involves a diversity of problems. Some of these have traditionally been the
subject of theoretical research and software development. Such a problem is “System
identification”, typically defined as follows: “Given a parametric class of models, find
the member that fits given experiment data with the minimum loss according to a given
criterion” (Ljung, 1987). Now, the three “given” conditions concern anyone who intends
to apply the software, whether that is in the form of theory, method, or computer
program. Sometimes “given” means that prerequisites are built into the software,
sometimes that they are expected as input from the user of the software.
When one is faced with a given object instead, and possibly also with a given pur-
pose for the model, it is certainly not obvious how to get the answers to the questions
posed by identification software. It is therefore important that developers of such software
do what they can to facilitate the answering. It is not necessarily a desirable ambition
to make the software more automatic by demanding less from the user. He or she
is still responsible for the quality of the result, and any input that a user is able to pro-
vide, but is not asked for, may be a waste of information and reduce the quality of the
model. A better goal is therefore to make the software demand its input in a form that
the user can supply more easily.
Secondly, user input (both prior knowledge and experiment data) is often uncer-
tain, irrelevant, contradictory, or even false. A second goal for the software designer
is therefore to provide tools for appraising the user’s input. Admittedly, any software
must have something ‘given’, but it makes a difference whether the software wants
assumptions, taken for facts, or just hypotheses, that will be subject to tests. This motivates
the decision to base MoCaVa on the ‘grey−box’ approach.
The general and somewhat vague idea of grey box identification is that when one
is making a mathematical model of a physical object, there are two sources of information,
namely response data and prior knowledge. Grey−box identification methods
are those methods that can use both.
In practice, “prior knowledge” means different things. And generally, prior knowledge
is not easy to reconcile with the form of the models assumed by a particular identification
method. In fact, each method starts with assuming a model class, and each
model class requires its particular form of prior knowledge. What one can generally
do in order to take prior knowledge into account is to start with a versatile class of models,
for which there are general tools available for analysis and identification, and try
and adapt its freedom, its ‘design parameters’, i.e., the specifications one has to enter
into the identification program, to the prior knowledge. This means that the ‘grey−box
identification methods’ tend to be as many and as diversified as the conventional identification
methods, also starting with given classes of models. This makes it hard to
delimit grey−box identification from other identification and also to make a survey of
‘grey box identification methods’.
Nor is that the purpose of this chapter. Instead, the purpose is to survey the fundamentals
that the MoCaVa software is based on. A user of the program will conceivably benefit from
an understanding of the purposes of the operations performed by various routines in
the program. Generally, MoCaVa is constructed by specializing and codifying the general
concepts used in (Bohlin, 1991a) and following one of the procedures derived in
that book.
In addition, the chapter will briefly discuss the prospects and problems of developing
grey−box identification software further.
1.2 Black, White, and Grey Boxes
Commercially available tools for making mathematical models of dynamic processes
are of two kinds, with different demands on the user. On one hand there are Modelling
tools, generally associated with simulation software (e.g., Dymola,
http://www.dynasim.se/www/Publications.pdf), which require the user to provide a complete
specification of the equations governing the process, either expressed as statements written
in some modelling language, such as Modelica (Tiller, 2001), or by connecting components
from a library. This alternative may be supported by combining the modelling
tools with tools for parameter optimization (e.g., HQP, http://sourceforge.net/projects/hqp).
Call this “white−box” identification.
On the other hand there are “black−box” system identification tools (e.g., MATLAB
System Identification Toolbox), which require the user to accept one of the generic
model structures (e.g., linear) and then to determine which tools to use in the particular
case, and in what order, as well as the values of a number of design parameters
(order numbers, weighting factors, etc.). Finally, the user must interpret the resulting
model, which is expressed in a form that is not primarily adapted to the physical object.
Unless the model is to be used directly for design of feedback control, there is some
further translation to do.
Generally, the user has two sources of information on which to base the model making:
prior knowledge and experiment data. “White−box” identification uses mainly
one source and “black−box” identification the other. The strength of “white−box”
identification is that it will allow the user to exploit invariant prior knowledge. Its
weakness is its inability to cope with the unknown and with random effects in the object
and its environment. The latter is the strength of “black−box” identification based
on statistical methods, but also means that the reproducibility of its results may be in
doubt. In essence, “black−box” identification produces ‘data descriptions’, and re-
peating the experiment may well produce a much different model. This may or may
not be a problem, depending on what the model is to be used for.
The idea of “grey−box” identification is to use both sources, and thus to combine
the strengths of the two approaches in order to reduce the effects of their weaknesses.
When following Ljung’s definition of “System identification”, and regardless of
the ‘colour’ of the ‘box’, the designer of a model of a physical object must do two
things: i) specify a class of models, and ii) fit its free elements to data. Call this “Modelling”
and “Fitting”. A method with a darker shade of ‘grey’ uses less prior knowledge
to delimit the model class. Even if most available identification methods tend to be
more or less ‘grey’, the following notations allow a formal distinction between the generic
‘white’, ‘black’, and ‘grey box’ approaches to model design.
1.2.1 White−box Identification
Since both the model class definition and the fitting procedure are implemented as al-
gorithms they can be described formally as functions:
Model: $F(u_t, \theta) \rightarrow z(t \mid \theta)$ (1.1)
Fitting: $\min_{\theta} E[y_N, z_N(\theta)]$ (1.2)
The model designer specifies the model class F, which may contain a given number
of unknown parameters θ. Given a control sequence $u_t$ (where the subscript denotes
the input history from some initial time up to present time t), and the parameter vector
θ, a simulation program allows the computing of the model’s response z(t|θ) at any
time. Any unknown parameters θ are then estimated by applying an optimization program
minimizing the deviation between measured response data $y_N$ and those components
of the model’s output $z_N$ that correspond to the measured values. The deviation
is measured by a given loss function E. The latter is usually a sum of squared instantaneous
deviations, but various filtering schemes may be used to suppress particular
types of data contamination.
The following are some well−known obstacles to designing “white boxes” in practice:
• Unknown relations between some variables: Engineers often do not have the complete
mathematical knowledge of the object to be able to write a simulation model.
• Too many relations for convenience: When they do have the knowledge, the result
is often too complex a model to be possible to simulate with the ease required for
parameter fitting. Many physical phenomena are describable only by partial differential
equations. Simulation would then require supercomputers, and identification
an order of magnitude more. (Car and airplane designers could possibly afford
the luxury.)
• Unknown complexity: It falls solely on the designer to determine how much of the
known relations to include in the model.
• Sensitivity to low−frequency disturbances: Comparing output of deterministic
models with data in the presence of low−frequency disturbances generally gives
poor parameter estimates.
• Primitive validation: If one would try and use only literature values for parameters,
or make separate experiments to determine some of them, in order to avoid the
cumbersome calibration of a complex model and the usually expensive experimentation
on a large process, this makes it the more difficult to validate the model.
Remark 1.1. The sensitivity to disturbances can sometimes be reduced by clever
design of the loss function. This requires some prior information on the object’s envi-
ronment.
Example 1.1
Consider a cylindrical tank with cross−section area A filled with liquid of density ρ up
to a level z, under pressure p, and having a free outlet at the bottom with area a. The
tank is replenished with the volume flow f. According to Bernoulli’s law the variations
in the level will be governed by the following differential equation:
$$\frac{dz}{dt} = -a\sqrt{zg + p/\rho} + f/A \tag{1.3}$$
With u = (f, p) as varying control variables, the equation cannot be solved analytically,
but given values of θ = (A, a, ρ, g), an ODE solver will be able to produce a sequence
of values {z(kh|θ) | k = 1, …, N} of z sampled with interval h. Hence F is
defined as the ODE solver operating on an equation of some form like
der(z) = −a*sqrt(z*g + p/rho) + f/A
with given constant parameters a,A,g,rho and variable control input p,f.
With a recorded sequence of measurements $y_N = \{y(kh) \mid k = 1, \ldots, N\}$ of the
tank level z during an experiment with known, step−wise changing input sequences
$u_N$, it will be possible to set up and evaluate the loss function
$$E(u_N, \theta) = \sum_{k=1}^{N} [y(kh) - z(kh \mid \theta)]^2 \tag{1.4}$$
for any given value of θ. Applying an optimization program, it will then be possible
to minimize the loss function with respect to any combination of the parameters, and
in this way estimate the values of any unknowns among (A, a, ρ), but not the value of
gravity g.
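The procedure of Example 1.1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the routine used in MoCaVa: the explicit Euler integrator, the golden-section search, the function names, and all numerical values (tank area, inputs, true outlet area) are hypothetical choices made here, and the "measurements" are synthetic, noise-free data generated from the same model.

```python
import math

def simulate_level(theta, inputs, z0, h, substeps=20):
    """Sampled levels z(kh|theta), k = 1..N, for dz/dt = -a*sqrt(z*g + p/rho) + f/A."""
    A, a, rho, g = theta
    dt = h / substeps
    z, levels = z0, []
    for f, p in inputs:                      # inputs are held constant over each interval h
        for _ in range(substeps):            # explicit Euler steps (an assumed, simple solver)
            head = max(z * g + p / rho, 0.0)  # guard against a negative square-root argument
            z += dt * (-a * math.sqrt(head) + f / A)
        levels.append(z)
    return levels

def loss(theta, inputs, measured, z0, h):
    """Sum of squared deviations between measured and simulated levels (Equation 1.4)."""
    simulated = simulate_level(theta, inputs, z0, h)
    return sum((y - z) ** 2 for y, z in zip(measured, simulated))

def golden_section_min(func, lo, hi, iterations=60):
    """Minimize a unimodal scalar function on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    f1, f2 = func(x1), func(x2)
    for _ in range(iterations):
        if f1 < f2:                          # minimum lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - ratio * (hi - lo)
            f1 = func(x1)
        else:                                # minimum lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + ratio * (hi - lo)
            f2 = func(x2)
    return 0.5 * (lo + hi)

# Hypothetical experiment: step-wise changing flow f and pressure p, N = 60 samples.
A, RHO, G = 1.0, 1000.0, 9.81
A_OUTLET_TRUE = 0.05
Z0, H = 1.0, 1.0
inputs = [(0.2, 0.0)] * 20 + [(0.3, 0.0)] * 20 + [(0.3, 2000.0)] * 20
measured = simulate_level((A, A_OUTLET_TRUE, RHO, G), inputs, Z0, H)

# Estimate the single unknown a, with A, rho, and g taken as known.
a_hat = golden_section_min(
    lambda a: loss((A, a, RHO, G), inputs, measured, Z0, H), 0.0, 0.2)
print(f"estimated outlet area a = {a_hat:.6f}")
```

With several unknown parameters, the scalar search would have to be replaced by a multivariable optimizer; the structure (simulate, evaluate the loss, search) stays the same.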
1.2.2 Black−box Identification
Defining this case is somewhat more complicated, since the task usually involves de-
termining one or more integer ‘order’ numbers, the values of which determine the
number of parameters to be fitted (Ljung, 1987; Söderström and Stoica, 1989):
Model: $F_n(u_t, \omega_t, \theta_n) \rightarrow z(t \mid \theta_n)$ (1.5)
Predictor: $P_n(u_t, y_{t-m}, \theta_n) \rightarrow \hat{z}(t \mid m, \theta_n)$ (1.6)
Fitting: $Q_n = \min_{\theta_n} E[y_N, \hat{z}_N(m, \theta_n)]$ (1.7)
Test: $Q_{n-1} - Q_n < \chi^2$ (1.8)
The designer cannot change $F_n$, which is particular to the method, except by specifying
an order index n. The latter normally determines the number of unknown parameters
$\theta_n$. However, the model class accepts a second, random input signal $\omega_t$ (usually
‘white noise’) in order to model the effects of random disturbances. For given order
numbers the parameters $\theta_n$ are estimated by minimizing the deviation between response
data and m steps predicted output (usually one step) according to a given loss
function E. The difference between the model and the predictor is that the latter uses
previous, m steps delayed response data $y_{t-m}$ in addition to the control sequence $u_t$ for
computing the predicted responses. The predictor $P_n$ is uniquely determined by $F_n$.
However, exact and applicable predictors are known only for special classes $F_n$, and
this limits the versatility of black box identification programs. Unknown orders n are
usually determined by increasing the order stepwise, and stopping when the loss reduction
drops below a threshold $\chi^2$. A popular alternative is to use a loss function that
weights the increasing complexity associated with increasing n, which allows minimization
with respect to both integer parameters n and real parameters $\theta_n$ (Akaike,
1974). The model classes are most often linear, but nonlinear black−box model classes
are also used (Billings, 1980).
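The stepwise order determination and the complexity-weighted alternative can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the linear ARX model class, the synthetic second-order data, the normalization of the loss reduction as N·ln(Q_{n−1}/Q_n) so that it can be compared with a chi-square quantile, the threshold value, and the Gaussian form of the Akaike criterion.

```python
import math
import random

def solve_linear(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x

def arx_loss(y, u, order, start):
    """Q_n: least-squares sum of squared one-step prediction errors for ARX(order)."""
    rows = [[y[k - i] for i in range(1, order + 1)] +
            [u[k - i] for i in range(1, order + 1)] for k in range(start, len(y))]
    targets = y[start:]
    m = 2 * order                                    # number of parameters in theta_n
    gram = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    moment = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(m)]
    theta = solve_linear(gram, moment)               # normal equations
    return sum((t - sum(p * x for p, x in zip(theta, r))) ** 2
               for r, t in zip(rows, targets))

# Synthetic data from a hypothetical second-order system with white equation noise.
random.seed(1)
N, MAX_ORDER = 400, 5
u = [random.choice((-1.0, 1.0)) for _ in range(N)]
y = [0.0, 0.0]
for k in range(2, N):
    y.append(1.5 * y[k - 1] - 0.7 * y[k - 2] + u[k - 1] + 0.5 * u[k - 2]
             + random.gauss(0.0, 0.1))

# All orders are fitted on the same samples so that the losses are comparable.
losses = {n: arx_loss(y, u, n, MAX_ORDER) for n in range(1, MAX_ORDER + 1)}
n_eff = N - MAX_ORDER

# Stepwise rule: stop when the normalized loss reduction drops below the threshold.
THRESHOLD = 13.82    # about the 99.9 % point of chi-square with 2 degrees of freedom
chosen_order = MAX_ORDER
for n in range(2, MAX_ORDER + 1):
    reduction = n_eff * math.log(losses[n - 1] / losses[n])
    if reduction < THRESHOLD:
        chosen_order = n - 1
        break

# Alternative: a loss that penalizes the number of parameters (Akaike-type criterion).
aic = {n: n_eff * math.log(losses[n] / n_eff) + 2 * (2 * n) for n in losses}
aic_order = min(aic, key=aic.get)
print("stepwise order:", chosen_order, " AIC order:", aic_order)
```

The two rules need not agree; a complexity-weighted criterion of this kind is known to select a somewhat higher order than a strict threshold test in some data sets.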
The following are practical difficulties:
• Restricted and unfamiliar form: Many engineers do not feel comfortable with
models produced by black−box identification programs based on statistical methods.
Mainly, the model structure and parameters do not have a physical interpretation,
and this makes it difficult to compare the estimates with values from other
sources.
• Over−parametrization: The number of parameters increases rapidly with the number
of variables, and even more so when the model class is nonlinear. This leads
easily to ‘over−fitting’, with all sorts of numerical problems and poor accuracy.
• Poor reproducibility: What is produced is a ‘data description’. If this is also to be
an ‘object description’ the model class must contain a good model of the object.
If it does not, that is, if much of the variation in the data is caused by phenomena that are
not modelled well enough by $F_n$ as effects of known input $u_t$, the fitting procedure
tends to use the remaining free arguments $\omega_t$ and $\theta_n$ to reduce the deviations.
In other words, what cannot be modelled as response to control, will be modelled
as disturbance. In this way even a structurally wrong model may still predict well
at a short range. If the data sequence is long, the estimated parameter accuracy may
even be high. This means that one may well get a good model, with good short−range
predicting ability and a high theoretical accuracy, but when the identification
is repeated with a different data set, an equally ‘good’ but different model is obtained.
That will not necessarily mean that the object has changed between the experiments;
it may be a consequence of fitting a model with the wrong structure.
Generally, it will be difficult to get reproducibility with black−box models, unless
the dynamics of the whole object are invariant, including the properties of disturbances,
and the model structure is right.
The basic cause of the poor reproducibility of black boxes is that it is not possible to
enter the invariant and object−specific relations that are the basis of white−box models.
To gain the advantages of convenience and quick result, the model designer is in
fact willing to discard any result from previous research on the object of the modelling.
Remark 1.2. Adaptive control will conceivably be able to alleviate the effects of
poor reproducibility, and benefit from the good predictive ability of the model, but this
can be exploited only for feedback control of such variables that have online sensors.
Monitoring of unmeasured variables, as well as control with long delays will still be
hazardous.
Remark 1.3. Tulleken (1993) has suggested a way to force some prior knowledge
into black−box structures, thus making the models less ‘black’.
Example 1.2
With the same tank object as in Example 1.1, one could choose to ignore the findings
of Bernoulli and describe the process as a “black box”. A linear model is the most popular
first choice, but if one would suspect that the process is nonlinear, and also take
into account some rudimentary prior knowledge (that a hole at the bottom tends to re-
duce the level), the following heuristic form would also be conceivable:
dzdt = p
1
z + p
2
p + p
3
f − [p
4
z + p
5
p + p
6
f]
α
(1.9)
Practical Grey−box Process Identification
8
Incidentally, this form contains the ‘true’ process, Equation 1.3, with p

1
= p
2
=
p
6
= 0, p
3
= 1A, p
4
= a
2
g, p
5
= a
2
Ã, and α = ½. But normally, that is not the
case.
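As a short check of this claim, substituting the stated parameter values into Equation 1.9 and taking the outlet area a to be positive gives

$$\frac{dz}{dt} = \frac{1}{A}\,f - \left[a^2 g\,z + \frac{a^2}{\rho}\,p\right]^{1/2} = -a\sqrt{zg + p/\rho} + f/A$$

which is Equation 1.3.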
A more likely, and ‘blacker’ form, would be
dzdt = p
1
+ p
2
z + p
3
p + p
4
f + p
5
z

2
+ p
6
p
2
+ p
7
f
2
(1.10)
This will define a deterministic black box of second order $F_n(u_t, 0, \theta_n) \rightarrow z(t \mid \theta_n)$,
where n = 2, u = (f, p) and $\theta_2 = (p_1, \ldots, p_7)$. It can be processed as in Example 1.1.
If the parameters are many enough, if measurements are accurate, and if the experi-
ment is not subject to external or internal disturbance, the resulting model may even
perform almost as well as the white box.
If, however, the varying pressure p is not recorded, it might still be possible to use
the following form
$$\frac{dz}{dt} = p_1 + p_2 z + p_4 f + p_5 z^2 + p_7 f^2 + v \tag{1.11}$$
$$\frac{dv}{dt} = p_8\,\omega \tag{1.12}$$
where ω is ‘white noise’, and v is ‘Brownian motion’ to model the unknown term
$p_3 p + p_6 p^2$. Hence, $\theta_2 = (p_1, p_2, p_4, p_5, p_7, p_8)$.
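To make the role of the disturbance state v concrete, the following Python sketch simulates Equations 1.11 and 1.12 with an Euler-Maruyama scheme. It is a minimal illustration under assumptions introduced here: the white noise ω is taken to have unit intensity, so that v receives independent Gaussian increments with standard deviation p_8·sqrt(h) over each step h, and the function name and all parameter values are hypothetical.

```python
import math
import random

def simulate_with_drift(params, flows, z0, h, rng):
    """Euler-Maruyama simulation of Equations 1.11 and 1.12.

    params = (p1, p2, p4, p5, p7, p8); flows holds the input f for each step of length h.
    Returns the simulated levels z and the Brownian-motion disturbance v.
    """
    p1, p2, p4, p5, p7, p8 = params
    z, v = z0, 0.0
    levels, drifts = [], []
    for f in flows:
        # Deterministic part of Equation 1.11, driven by the current disturbance v.
        z += h * (p1 + p2 * z + p4 * f + p5 * z * z + p7 * f * f + v)
        # Equation 1.12: v accumulates white noise (assumed unit intensity).
        v += p8 * math.sqrt(h) * rng.gauss(0.0, 1.0)
        levels.append(z)
        drifts.append(v)
    return levels, drifts

# Hypothetical parameter values: a stable linear level response plus a slow random drift.
PARAMS = (0.0, -0.5, 1.0, 0.0, 0.0, 0.1)
levels, drifts = simulate_with_drift(PARAMS, [1.0] * 200, 0.0, 0.1, random.Random(0))
print(f"final level {levels[-1]:.3f}, final disturbance {drifts[-1]:.3f}")
```

Setting p_8 = 0 removes the disturbance and reduces the simulation to the deterministic form processed in Example 1.1; a nonzero p_8 lets v wander and absorb the effect of the unrecorded pressure.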
When models have unknown input it becomes necessary to find the one−step (or
m−step) predictor associated with it, in order to be able to minimize the sum of squares
of prediction errors. Exact predictors are known only for some classes of models. And
even if the model belongs to a class which does allow a predictor to be derived, derivation
is usually no simple task.
However, black−box identification programs have already done this for fairly general
classes of models that do allow exact derivation. One such class is the NARMAX
(for Nonlinear Auto Regressive Moving Average with eXternal input) discrete−time
model class (Billings, 1980)
$$y(\tau) + \sum_{\nu=1}^{n_z} \sum_{k=1}^{n_a} a^{\nu}_{k} P_{\nu}[y(\tau-k)] = \sum_{\nu=1}^{n_z} \sum_{k=1}^{n_b} b^{\nu}_{k} P_{\nu}[u(\tau-k)] + c_0 w(\tau) + \sum_{k=0}^{n_c} c_k w(\tau-k) \tag{1.13}$$
where τ is discrete time, and $P_{\nu}$ are known functions of u, for instance powers or Legendre
polynomials, and w are independent Gaussian random variables with zero
means and unit variances. The model has four order numbers, $n = (n_a, n_b, n_c, n_z)$,
where the first three are the orders of the dynamics of the system, and the fourth the
degree of nonlinearity. Hence, the more common linear ARMAX model has $n_z = 1$.
The parameter array $\theta_n$ is the collection of all $a^{\nu}_{k}$, $b^{\nu}_{k}$, and $c_k$ in Equation 1.13.
Notice that Equation 1.13 contains only measured output y in addition to the input
u, which is an essential restriction, but makes it easy to derive a predictor (which is why
the class is defined in this way). Since the values of w(τ) can be computed recursively
from Equation 1.13, and since $E\{w(\tau \mid y_{\tau-1})\} = 0$, the predictor follows directly as
