EXPERT SYSTEMS IN
CHEMISTRY RESEARCH
Markus C. Hemmer
Boca Raton London New York
CRC Press is an imprint of the
Taylor & Francis Group, an informa business
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487‑2742
© 2008 by Taylor & Francis Group, LLC
CRC Press is an imprint of Taylor & Francis Group, an Informa business
No claim to original U.S. Government works
Printed in the United States of America on acid‑free paper
10 9 8 7 6 5 4 3 2 1
International Standard Book Number‑13: 978‑1‑4200‑5323‑4 (Hardcover)
This book contains information obtained from authentic and highly regarded sources. Reprinted
material is quoted with permission, and sources are indicated. A wide variety of references are
listed. Reasonable efforts have been made to publish reliable data and information, but the author
and the publisher cannot assume responsibility for the validity of all materials or for the
consequences of their use.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access
www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC),
222 Rosewood Drive, Danvers, MA 01923, 978‑750‑8400. CCC is a not‑for‑profit organization that
provides licenses and registration for a variety of users. For organizations that have been granted a
photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Library of Congress Cataloging‑in‑Publication Data
Hemmer, Markus C.
Expert systems in chemistry research / Markus C. Hemmer.
p. cm.
Includes bibliographical references and index.
ISBN 978‑1‑4200‑5323‑4 (hardback : alk. paper)
1. Chemistry‑‑Data processing. 2. Chemistry‑‑Research. I. Title.
QD39.3.E46.H46 2007
542’.85633‑‑dc22
2007031226
Visit the Taylor & Francis Web site at
and the CRC Press Web site at
Dedicated to my father, who has aided and
accompanied my scientific career all these years.
Contents
Preface....................................................................................................................xvii
Acknowledgments....................................................................................................xix
Trademark Information............................................................................................xxi
Chapter 1 Introduction........................................................................................... 1
1.1 Introduction...................................................................................................... 1
1.2 What We Are Talking About...........................................................................1
1.3 The Concise Summary.....................................................................................3
1.4 Some Initial Thoughts......................................................................................3
References................................................................................................................... 8
Chapter 2 Basic Concepts of Expert Systems........................................................ 9
2.1 What Are Expert Systems?.............................................................................. 9
2.2 The Conceptual Design of an Expert System................................................ 10
2.3 Knowledge and Knowledge Representation.................................................. 12
2.3.1 Rules................................................................................................... 12
2.3.2 Semantic Networks............................................................................. 14
2.3.3 Frames................................................................................................. 16
2.3.4 Advantages of Rules........................................................................... 18
2.3.4.1 Declarative Language............................................................ 18
2.3.4.2 Separation of Business Logic and Data................................ 18
2.3.4.3 Centralized Knowledge Base................................................ 18
2.3.4.4 Performance and Scalability................................................. 19
2.3.5 When to Use Rules.............................................................................. 19
2.4 Reasoning.......................................................................................................20
2.4.1 The Inference Engine..........................................................................20
2.4.2 Forward and Backward Chaining....................................................... 22
2.4.3 Case-Based Reasoning........................................................................ 22
2.5 The Fuzzy World............................................................................................24
2.5.1 Certainty Factors................................................................................24
2.5.2 Fuzzy Logic........................................................................................25
2.5.3 Hidden Markov Models......................................................................26
2.5.4 Working with Probabilities — Bayesian Networks............................ 27
2.5.5 Dempster-Shafer Theory of Evidence.................................................28
2.6 Gathering Knowledge — Knowledge Engineering....................................... 29
2.7 Concise Summary.......................................................................................... 31
References................................................................................................................. 32
Chapter 3 Development Tools for Expert Systems.............................................. 35
3.1 Introduction.................................................................................................... 35
3.2 The Technical Design of Expert Systems...................................................... 35
3.2.1 Knowledge Base................................................................................. 35
3.2.2 Working Memory................................................................................ 35
3.2.3 Inference Engine................................................................................. 36
3.2.4 User Interface...................................................................................... 36
3.3 Imperative versus Declarative Programming................................................ 37
3.4 List Processing (LISP)...................................................................................40
3.5 Programming Logic (PROLOG)................................................................... 41
3.5.1 PROLOG Facts................................................................................... 41
3.5.2 PROLOG Rules................................................................................... 42
3.6 National Aeronautics and Space Administration’s (NASA’s)
Alternative — C Language Integrated Production System (CLIPS)............. 43
3.6.1 CLIPS Facts........................................................................................44
3.6.2 CLIPS Rules....................................................................................... 45
3.7 Java-Based Expert Systems — JESS............................................................. 47
3.8 Rule Engines — JBoss Rules......................................................................... 48
3.9 Languages for Knowledge Representation.................................................... 49
3.9.1 Classification of Individuals and Concepts (CLASSIC)..................... 50
3.9.2 Knowledge Machine........................................................................... 51
3.10 Advanced Development Tools........................................................................ 53
3.10.1 XpertRule............................................................................................ 55
3.10.2 Rule Interpreter (RI)........................................................................... 56
3.11 Concise Summary.......................................................................................... 57
References................................................................................................................. 58
Chapter 4 Dealing with Chemical Information................................................... 61
4.1 Introduction.................................................................................................... 61
4.2 Structure Representation................................................................................ 61
4.2.1 Connection Tables (CTs)..................................................................... 61
4.2.2 Connectivity Matrices......................................................................... 62
4.2.3 Linear Notations................................................................................. 63
4.2.4 Simplified Molecular Input Line Entry Specification (SMILES)....... 63
4.2.5 SMILES Arbitrary Target Specification (SMARTS).........................64
4.3 Searching for Chemical Structures................................................................64
4.3.1 Identity Search versus Substructure Search.......................................64
4.3.2 Isomorphism Algorithms.................................................................... 65
4.3.3 Prescreening........................................................................................66
4.3.4 Hash Coding.......................................................................................66
4.3.5 Stereospecific Search.......................................................................... 67
4.3.6 Tautomer Search................................................................................. 67
4.3.7 Specifying a Query Structure............................................................. 68
4.4 Describing Molecules.................................................................................... 69
4.4.1 Basic Requirements for Molecular Descriptors.................................. 70
4.4.1.1 Independency of Atom Labeling........................................... 71
4.4.1.2 Rotational/Translational Invariance...................................... 71
4.4.1.3 Unambiguous Algorithmically Computable Definition........ 71
4.4.1.4 Range of Values..................................................................... 71
4.4.2 Desired Properties of Molecular Descriptors..................................... 72
4.4.2.1 Reversible Encoding.............................................................. 73
4.4.3 Approaches for Molecular Descriptors............................................... 73
4.4.4 Constitutional Descriptors.................................................................. 73
4.4.5 Topological Descriptors...................................................................... 74
4.4.6 Topological Autocorrelation Vectors.................................................. 74
4.4.7 Fragment-Based Coding..................................................................... 75
4.4.8 3D Molecular Descriptors................................................................... 76
4.4.9 3D Molecular Representation Based on Electron Diffraction............ 77
4.4.10 Radial Distribution Functions............................................................. 77
4.4.11 Finding the Appropriate Descriptor................................................... 78
4.5 Descriptive Statistics...................................................................................... 79
4.5.1 Basic Terms......................................................................................... 79
4.5.1.1 Standard Deviation (SD)....................................................... 79
4.5.1.2 Variance................................................................................ 79
4.5.1.3 Covariance............................................................................. 80
4.5.1.4 Covariance Matrix................................................................ 80
4.5.1.5 Eigenvalues and Eigenvectors............................................... 80
4.5.2 Measures of Similarity....................................................................... 81
4.5.3 Skewness and Kurtosis....................................................................... 83
4.5.4 Limitations of Regression................................................................... 85
4.5.5 Conclusions for Investigations of Descriptors.................................... 86
4.6 Capturing Relationships — Principal Components....................................... 87
4.6.1 Principal Component Analysis (PCA)................................................ 87
4.6.1.1 Centering the Data................................................................ 89
4.6.1.2 Calculating the Covariance Matrix....................................... 89
4.6.2 Singular Value Decomposition (SVD)................................................ 91
4.6.3 Factor Analysis................................................................................... 94
4.7 Transforming Descriptors.............................................................................. 95
4.7.1 Fourier Transform............................................................................... 95
4.7.2 Hadamard Transform.......................................................................... 96
4.7.3 Wavelet Transform.............................................................................. 96
4.7.4 Discrete Wavelet Transform............................................................... 97
4.7.5 Daubechies Wavelets.......................................................................... 98
4.7.6 The Fast Wavelet Transform............................................................... 99
4.8 Learning from Nature — Artificial Neural Networks................................. 102
4.8.1 Artificial Neural Networks in a Nutshell.......................................... 103
4.8.2 Kohonen Neural Networks — The Classifiers................................. 105
4.8.3 Counterpropagation (CPG) Neural Networks —
The Predictors................................................................................... 107
4.8.4 The Tasks: Classification and Modeling........................................... 109
4.9 Genetic Algorithms (GAs)........................................................................... 110
4.10 Concise Summary........................................................................................ 112
References............................................................................................................... 115
Chapter 5 Applying Molecular Descriptors....................................................... 119
5.1 Introduction.................................................................................................. 119
5.2 Radial Distribution Functions (RDFs)......................................................... 119
5.2.1 Radial Distribution Function............................................................ 119
5.2.2 Smoothing and Resolution................................................................ 120
5.2.3 Resolution and Probability................................................................ 122
5.3 Making Things Comparable — Postprocessing of RDF Descriptors......... 123
5.3.1 Weighting.......................................................................................... 123
5.3.2 Normalization................................................................................... 124
5.3.3 Remark on Linear Scaling................................................................ 124
5.4 Adding Properties — Property-Weighted Functions................................... 125
5.4.1 Static Atomic Properties................................................................... 125
5.4.2 Dynamic Atomic Properties............................................................. 126
5.4.3 Property Products versus Averaged Properties................................ 126
5.5 Describing Patterns...................................................................................... 128
5.5.1 Distance Patterns.............................................................................. 129
5.5.2 Frequency Patterns............................................................................ 129
5.5.3 Binary Patterns................................................................................. 130
5.5.4 Aromatic Patterns............................................................................. 130
5.5.5 Pattern Repetition............................................................................. 130
5.5.6 Symmetry Effects............................................................................. 130
5.5.7 Pattern Matching with Binary Patterns............................................ 131
5.6 From the View of an Atom — Local and Restricted RDF Descriptors...... 131
5.6.1 Local RDF Descriptors..................................................................... 132
5.6.2 Atom-Specific RDF Descriptors....................................................... 132
5.7 Straight or Detour — Distance Function Types.......................................... 133
5.7.1 Cartesian RDF.................................................................................. 133
5.7.2 Bond-Path RDF................................................................................. 133
5.7.3 Topological Path RDF....................................................................... 134
5.8 Constitution and Conformation.................................................................... 135
5.9 Constitution and Molecular Descriptors...................................................... 136
5.10 Constitution and Local Descriptors............................................................. 139
5.11 Constitution and Conformation in Statistical Evaluations........................... 140
5.12 Extending the Dimension — Multidimensional Function Types................ 145
5.13 Emphasizing the Essential — Wavelet Transforms..................................... 147
5.13.1 Single-Level Transforms................................................................... 150
5.13.2 Wavelet-Compressed Descriptors..................................................... 151
5.14 A Tool for Generation and Evaluation of RDF Descriptors — ARC.......... 151
5.14.1 Loading Structure Information......................................................... 153
5.14.2 The Default Code Settings................................................................ 153
5.14.3 Calculation and Investigation of a Single Descriptor....................... 154
5.14.4 Calculation and Investigation of Multiple Descriptor Sets............... 155
5.14.5 Binary Comparison........................................................................... 155
5.14.6 Correlation Matrices......................................................................... 155
5.14.7 Training a Neural Network............................................................... 155
5.14.8 Investigation of Trained Network..................................................... 157
5.14.9 Prediction and Classification for a Test Set...................................... 157
5.15 Synopsis....................................................................................................... 157
5.15.1 Similarity and Diversity of Molecules.............................................. 162
5.15.2 Structure and Substructure Search................................................... 162
5.15.3 Structure–Property Relationships..................................................... 162
5.15.4 Structure–Activity Relationships...................................................... 162
5.15.5 Structure–Spectrum Relationships................................................... 162
5.16 Concise Summary........................................................................................ 163
References............................................................................................................... 165
Chapter 6 Expert Systems in Fundamental Chemistry...................................... 167
6.1 Introduction.................................................................................................. 167
6.2 How It Began — The DENDRAL Project.................................................. 167
6.2.1 The Generator — CONGEN............................................................. 168
6.2.2 The Constructor — PLANNER....................................................... 168
6.2.3 The Testing — PREDICTOR........................................................... 169
6.2.4 Other DENDRAL Programs............................................................ 171
6.3 A Forerunner in Medical Diagnostics......................................................... 171
6.4 Early Approaches in Spectroscopy.............................................................. 175
6.4.1 Early Approaches in Vibrational Spectroscopy................................ 176
6.4.2 Artificial Neural Networks for Spectrum Interpretation.................. 177
6.5 Creating Missing Information — Infrared Spectrum Simulation............... 178
6.5.1 Spectrum Representation.................................................................. 178
6.5.2 Compression with Fast Fourier Transform....................................... 179
6.5.3 Compression with Fast Hadamard Transform.................................. 179
6.6 From the Spectrum to the Structure — Structure Prediction...................... 179
6.6.1 The Database Approach.................................................................... 181
6.6.2 Selection of Training Data................................................................ 181
6.6.3 Outline of the Method....................................................................... 182
6.6.3.1 Preprocessing of Spectrum Information............................. 182
6.6.3.2 Preprocessing of Structure Information.............................. 182
6.6.3.3 Generation of a Descriptor Database.................................. 182
6.6.3.4 Training............................................................................... 182
6.6.3.5 Prediction of the Radial Distribution Function (RDF)
Descriptor............................................................................ 183
6.6.3.6 Conversion of the RDF Descriptor...................................... 184
6.6.4 Examples for Structure Derivation................................................... 184
6.6.5 The Modeling Approach................................................................... 187
6.6.6 Improvement of the Descriptor......................................................... 188
6.6.7 Database Approach versus Modeling Approach.............................. 189
6.7 From Structures to Properties...................................................................... 190
6.7.1 Searching for Similar Molecules in a Data Set................................ 191
6.7.2 Molecular Diversity of Data Sets...................................................... 193
6.7.2.1 Average Descriptor Approach............................................. 194
6.7.2.2 Correlation Approach.......................................................... 194
6.7.3 Prediction of Molecular Polarizability............................................. 199
6.8 Dealing with Localized Information — Nuclear Magnetic Resonance
(NMR) Spectroscopy................................................................................... 201
6.8.1 Commercially Available Products.................................................... 201
6.8.2 Local Descriptors for Nuclear Magnetic Resonance
Spectroscopy.....................................................................................202
6.8.3 Selecting Descriptors by Evolution...................................................205
6.8.4 Learning Chemical Shifts.................................................................206
6.8.5 Predicting Chemical Shifts...............................................................207
6.9 Applications in Analytical Chemistry.........................................................208
6.9.1 Gamma Spectrum Analysis..............................................................208
6.9.2 Developing Analytical Methods — Thermal Dissociation
of Compounds................................................................................... 209
6.9.3 Eliminating the Unnecessary — Supporting Calibration................ 215
6.10 Simulating Biology...................................................................................... 217
6.10.1 Estimation of Biological Activity..................................................... 217
6.10.2 Radioligand Binding Experiments.................................................... 218
6.10.3 Effective and Inhibitory Concentrations........................................... 219
6.10.4 Prediction of Effective Concentrations............................................. 221
6.10.5 Progestagen Derivatives.................................................................... 221
6.10.6 Calcium Agonists.............................................................................. 223
6.10.7 Corticosteroid-Binding Globulin (CBG) Steroids............................ 224
6.10.8 Mapping a Molecular Surface.......................................................... 226
6.11 Supporting Organic Synthesis..................................................................... 229
6.11.1 Overview of Existing Systems.......................................................... 230
6.11.2 Elaboration of Reactions for Organic Synthesis............................... 232
6.11.3 Kinetic Modeling in EROS............................................................... 233
6.11.4 Rules in EROS.................................................................................. 233
6.11.5 Synthesis Planning — Workbench for the Organization of Data
for Chemical Applications (WODCA).............................................. 234
6.12 Concise Summary........................................................................................ 236
References............................................................................................................... 239
Chapter 7 Expert Systems in Other Areas of Chemistry................................... 247
7.1 Introduction.................................................................................................. 247
7.2 Bioinformatics.............................................................................................. 247
7.2.1 Molecular Genetics (MOLGEN)...................................................... 248
7.2.2 Predicting Toxicology — Deductive Estimation of Risk
from Existing Knowledge (DEREK) for Windows.......................... 249
7.2.3 Predicting Metabolism — Meteor.................................................... 251
7.2.4 Estimating Biological Activity — APEX-3D................................... 251
7.2.5 Identifying Protein Structures.......................................................... 254
7.3 Environmental Chemistry............................................................................ 257
7.3.1 Environmental Assessment — Green Chemistry Expert
System (GCES)................................................................................. 257
7.3.2 Synthetic Methodology Assessment for Reduction Techniques....... 258
7.3.3 Green Synthetic Reactions................................................................ 259
7.3.4 Designing Safer Chemicals...............................................................260
7.3.5 Green Solvents/Reaction Conditions................................................ 261
7.3.6 Green Chemistry References............................................................ 261
7.3.7 Dynamic Emergency Management — Real-Time Expert
System (RTXPS)............................................................................... 262
7.3.8 Representing Facts — Descriptors................................................... 262
7.3.9 Changing Facts — Backward-Chaining Rules................................. 263
7.3.10 Triggering Actions — Forward-Chaining Rules.............................. 263
7.3.11 Reasoning — The Inference Engine.................................................264
7.3.12 A Combined Approach for Environmental Management................. 265
7.3.13 Assessing Environmental Impact — EIAxpert................................266
7.4 Geochemistry and Exploration.................................................................... 267
7.4.1 Exploration........................................................................................ 267
7.4.2 Geochemistry.................................................................................... 268
7.4.3 X-Ray Phase Analysis....................................................................... 268
7.5 Engineering.................................................................................................. 269
7.5.1 Monitoring of Space-Based Systems — Thermal Expert System
(TEXSYS)......................................................................................... 269
7.5.2 Chemical Equilibrium of Complex Mixtures — CEA..................... 270
7.6 Concise Summary........................................................................................ 271
References............................................................................................................... 274
Chapter 8 Expert Systems in the Laboratory Environment............................... 277
8.1 Introduction.................................................................................................. 277
8.2 Regulations................................................................................................... 277
8.2.1 Good Laboratory Practices............................................................... 278
8.2.1.1 Resources, Organization, and Personnel............................. 278
8.2.1.2 Rules, Protocols, and Written Procedures.......................... 278
8.2.1.3 Characterization.................................................................. 278
8.2.1.4 Documentation.................................................................... 278
8.2.1.5 Quality Assurance............................................................... 279
8.2.2 Good Automated Laboratory Practice (GALP)................................ 279
8.2.3 Electronic Records and Electronic Signatures (21 CFR Part 11)..... 280
8.3 The Software Development Process............................................................ 281
8.3.1 From the Requirements to the Implementation................................ 282
8.3.1.1 Analyzing the Requirements............................................... 282
8.3.1.2 Specifying What Has to Be Done....................................... 282
8.3.1.3 Defining the Software Architecture.................................... 282
8.3.1.4 Programming...................................................................... 282
8.3.1.5 Testing the Outcome........................................................... 283
8.3.1.6 Documenting the Software.................................................. 283
8.3.1.7 Supporting the User............................................................. 283
8.3.1.8 Maintaining the Software................................................... 283
8.3.2 The Life Cycle of Software............................................................... 283
8.4 Knowledge Management.............................................................................. 287
8.4.1 General Considerations..................................................................... 287
8.4.2 The Role of a Knowledge Management System (KMS)................... 288
8.4.3 Architecture...................................................................................... 289
8.4.4 The Knowledge Quality Management Team....................................290
8.5 Data Warehousing........................................................................................290
8.6 The Basis — Scientific Data Management Systems.................................... 293
8.7 Managing Samples — Laboratory Information Management Systems
(LIMS)......................................................................................................... 295
8.7.1 LIMS Characteristics........................................................................ 296
8.7.2 Why Use a LIMS?............................................................................ 297
8.7.3 Compliance and Quality Assurance (QA)........................................ 297
8.7.4 The Basic LIMS................................................................................ 298
8.7.5 A Functional Model.......................................................................... 298
8.7.5.1 Sample Tracking.................................................................. 298
8.7.5.2 Sample Analysis.................................................................. 299
8.7.5.3 Sample Organization........................................................... 299
8.7.6 Planning System............................................................................... 299
8.7.7 The Controlling System....................................................................300
8.7.8 The Assurance System......................................................................300
8.7.9 What Else Can We Find in a LIMS?................................................ 301
8.7.9.1 Automatic Test Programs.................................................... 301
8.7.9.2 Off-Line Client.................................................................... 301
8.7.9.3 Stability Management......................................................... 301
8.7.9.4 Reference Substance Module..............................................302
8.7.9.5 Recipe Administration........................................................302
8.8 Tracking Workflows — Workflow Management Systems...........................302
8.8.1 Requirements.................................................................................... 303
8.8.2 The Lord of the Runs........................................................................ 303
8.8.3 Links and Logistics...........................................................................304
8.8.4 Supervisor and Auditor.....................................................................304
8.8.5 Interfacing......................................................................................... 305
8.9 Scientific Documentation — Electronic Laboratory Notebooks (ELNs).... 305
8.9.1 The Electronic Scientific Document.................................................307
8.9.2 Scientific Document Templates........................................................309
8.9.3 Reporting with ELNs........................................................................ 310
8.9.4 Optional Tools in ELNs.................................................................... 310
8.10 Scientific Workspaces.................................................................................. 312
8.10.1 Scientific Workspace Managers........................................................ 313
8.10.2 Navigation and Organization in a Scientific Workspace.................. 315
8.10.3 Using Metadata Effectively.............................................................. 315
8.10.4 Working in Personal Mode............................................................... 319
8.10.5 Differences of Electronic Scientific Documents.............................. 319
8.11 Interoperability and Interfacing................................................................... 320
8.11.1 eXtensible Markup Language (XML)-Based Technologies............. 320
8.11.1.1 Simple Object Access Protocol (SOAP).............................. 321
8.11.1.2 Universal Description, Discovery, and Integration
(UDDI)................................................................................ 321
8.11.1.3 Web Services Description Language (WSDL).................... 321
8.11.2 Component Object Model (COM) Technologies.............................. 321
8.11.3 Connecting Instruments — Interface Port Solutions........................ 322
8.11.4 Connecting Serial Devices................................................................ 322
8.11.5 Developing Your Own Connectivity — Software
Development Kits (SDKs)................................................................. 324
8.11.6 Capturing Data — Intelligent Agents............................................... 325
8.11.7 The Inbox Concept............................................................................ 327
8.12 Access Rights and Administration.............................................................. 328
8.13 Electronic Signatures, Audit Trails, and IP Protection................................ 329
8.13.1 Signature Workflow.......................................................................... 329
8.13.2 Event Messaging............................................................................... 331
8.13.3 Audit Trails and IP Protection.......................................................... 331
8.13.4 Hashing Data.................................................................................... 331
8.13.5 Public Key Cryptography................................................................. 332
8.13.5.1 Secret Key Cryptography.................................................... 333
8.13.5.2 Public Key Cryptography.................................................... 333
8.14 Approaches for Search and Reuse of Data and Information....................... 333
8.14.1 Searching for Standard Data............................................................. 334
8.14.2 Searching with Data Cartridges........................................................ 334
8.14.3 Mining for Data................................................................................ 335
8.14.4 The Outline of a Data Mining Service for Chemistry...................... 336
8.14.4.1 Search and Processing of Raw Data................................... 336
8.14.4.2 Calculation of Descriptors.................................................. 337
8.14.4.3 Analysis by Statistical Methods.......................................... 337
8.14.4.4 Analysis by Artificial Neural Networks.............................. 337
8.14.4.5 Optimization by Genetic Algorithms.................................. 338
8.14.4.6 Data Storage........................................................................ 338
8.14.4.7 Expert Systems.................................................................... 338
8.15 A Bioinformatics LIMS Approach.............................................................. 338
8.15.1 Managing Biotransformation Data................................................... 339
8.15.2 Describing Pathways......................................................................... 340
8.15.3 Comparing Pathways........................................................................ 342
8.15.4 Visualizing Biotransformation Studies............................................. 343
8.15.5 Storage of Biotransformation Data................................................... 344
8.16 Handling Process Deviations....................................................................... 344
8.16.1 Covered Business Processes............................................................. 345
8.16.2 Exception Recording......................................................................... 346
8.16.2.1 Basic Information Entry...................................................... 346
8.16.2.2 Risk Assessment.................................................................. 346
8.16.2.3 Cause Analysis.................................................................... 347
8.16.2.4 Corrective Actions............................................................... 347
8.16.2.5 Efficiency Checks................................................................ 348
8.16.3 Complaints Management.................................................................. 348
8.16.4 Approaches for Expert Systems........................................................ 349
8.17 Rule-Based Verification of User Input......................................................... 350
8.17.1 Creating User Dialogues................................................................... 350
8.17.2 User Interface Designer (UID)......................................................... 351
8.17.3 The Final Step — Rule Generation.................................................. 354
8.18 Concise Summary........................................................................................ 354
References............................................................................................................... 358
Chapter 9 Outlook.............................................................................................. 361
9.1 Introduction.................................................................................................. 361
9.2 Attempting a Definition............................................................................... 361
9.3 Some Critical Considerations....................................................................... 362
9.3.1 The Comprehension Factor............................................................... 363
9.3.2 The Resistance Factor....................................................................... 363
9.3.3 The Educational Factor..................................................................... 363
9.3.4 The Usability Factor.........................................................................364
9.3.5 The Commercial Factor.................................................................... 365
9.4 Looking Forward......................................................................................... 365
Reference................................................................................................................ 366
Index....................................................................................................................... 367
Preface
Sitting in the breakfast room of my hotel I thought, and not for the first time,
about this term expert. Would I consider myself a specialist in jam, just because
I eat it every morning at breakfast? And then, why not? Isn’t a consumer a specialist because of his experience with consuming products? Certainly, he would not be
considered an expert on the raw materials, the production process, or the quality
control in jam production. Still, he is the consumer, so he knows the most about
consuming jam.
Whatever constitutes an expert, one of the fascinating topics of computer science arises from the question of how to take advantage of the expert’s knowledge in
a computer program. Wouldn’t it be great if we could transfer some of the expert’s
knowledge and reasoning to a computer program and use this program for education and problem solving?
If we continue thinking in this direction, we encounter a serious problem: We all
know that human reasoning and decision making are a result of knowledge, experience, and intuition. How can such a thing as intuition be expressed in logical
terms? The answer is that it cannot.
In fact, knowledge and reasoning cannot really be expressed in static terms,
since they result from a complex combination of all three properties. However, computer software is able to store facts. If we were able to describe the relationships
between facts and the more complex topics of knowledge, experience, and intuition,
we could imagine software that does reasoning and makes decisions.
At the beginning of a lecture about artificial intelligence at the university, I
explained to my students that this topic is basically simple, since it has to do with
our perception of the world rather than the sometimes complex computational point
of view. One of them looked at me and asked, “Why then do I have to learn it here?”
The answer is, “Because simpler things may be much more difficult to understand
due to their general nature.”
This is the very crux of expert systems: On the one hand, we have a somewhat
complex logic that we have to develop and encode in a computer program, to make
things easier for the expert sitting in front of the screen; on the other hand, the more
generalized context of the expert might be even harder to understand than the program running in the background. And, there is still a big gap between the expert and
the expert system.
During my Internet research for this book I stumbled across a nice phrase in a
presentation by Joy Scaria of the Biological Sciences Group of the Birla Institute of Technology and Science in Pilani, India. In an introductory slide, she
stated, “Bioinformatics is … complicating biology with introducing algorithms,
scripts, statistics, and confusing softwares so that no one understands it anymore.…”
Sometimes, I think, this applies to expert systems and the underlying field of artificial intelligence, as well. We tend too often to complicate things rather than to
simplify them. Consequently, one of my goals in writing this book was to simplify
things as well as possible; unfortunately, I did not succeed entirely.
It lies in the nature of the topic that the mathematical, information technology,
and regulatory aspects have to be formulated in a more complex manner, whereas
conceptual aspects can be described in a more entertaining fashion. The book
is ultimately meant to be a good mixture of scientific literature and captivating novel.
Nothing remains but to wish you an exciting journey into chemistry’s future.
Markus C. Hemmer
Bonn, Germany
Acknowledgments
This book is ultimately the result of a series of discussions and contributions around the
scientific idea of expert systems, and particularly around simplifying matters. A
number of people contributed to this goal in one way or another, and I would like to
express my gratitude to them: Dr. Joao Aires de Sousa (Department of Chemistry,
New University of Lisbon, Caparica, Portugal), Dr. Jürgen Angerer (Institute and
Outpatient Clinic of Occupational, Social and Environmental Medicine, University
of Erlangen, Germany), Ulrike Burkard (Institute for Didactics in Physics, University of Bremen, Germany), Dr. Roberta Bursi (N.V. Organon, The Netherlands), Dr.
Antony N. Davies (Division of Chemistry and Forensic Science, University of Glamorgan, United Kingdom), Dr. Thomas Engel (Chemical Computing Group AG, Köln,
Germany), Dr. Thorsten Fröhlich (Waters GmbH, Frechen), Dr. Johann Gasteiger
(Computer Chemistry Center, University of Erlangen, Germany), Dr. Wolfgang Graf
zu Castell (Institute of Biomathematics and Biometrics, GSF, Neuherberg, Germany),
Alexander von Homeyer (Computer Chemistry Center, University of Erlangen, Germany), Dr. Ulrich Jordis (Institute for Applied Synthesis Chemistry, University
of Vienna, Austria), Michael McBrian (Advanced Chemistry Development, Inc.,
Toronto, Canada), Dr. Reinhard Neudert (Wiley-VCH, Weinheim, Germany), Dr.
Livia Sangeorzan (Department of Computer Science, University Transilvania of
Brasov, Romania), Dr. Thomas Sauer (Institute of Mathematics, University of Erlangen, Germany), Dr. Axel Schunk (Institute for Didactics of Chemistry, University
of Frankfurt Main, Germany), Dr. Christoph Schwab (Molecular Networks GmbH,
Erlangen, Germany), Dr. Valentin Steinhauer (Danet Group, Darmstadt, Germany),
Dr. Jürgen Sühnel (Bioinformatics Group, Institute of Molecular Biology, Jena, Germany), Dr. Lothar Terfloth (Computer Chemistry Center, University of Erlangen,
Germany), and Dr. Heiner Tobschall (Department of Applied Geology, University of
Erlangen, Germany).
Parts of the research in this work were supported by the German Federal Ministry of Education and Research (BMFT), the German National Research and Education Network (DFN), the German Academic Exchange Service (DAAD), the
National Cancer Institute (U.S. National Institutes of Health), and Waters Corporation, Milford, Massachusetts.
Trademark Information
Apache® is a registered trademark of Apache Software Foundation.
CA® and CAS® are registered trademarks of the American Chemical Society.
Citrix® is a registered trademark of Citrix Systems, Inc.
Citrix®, the Citrix logo, ICA®, Program Neighborhood®, MetaFrame®, WinFrame®,
VideoFrame®, MultiWin®, and other Citrix product names referenced herein are
trademarks of Citrix Systems, Inc.
CleverPath® and Aion® are registered trademarks of Computer Associates International, Inc., New York.
Contergan® is a registered trademark of Grünenthal GmbH, Aachen, Germany.
CORINA® and WODCA® are registered trademarks of Molecular Networks GmbH,
Erlangen, Germany.
EXSYS® and EXSYS CORVID® are registered trademarks of EXSYS Inc.,
Albuquerque, New Mexico.
HTML®, XML®, XHTML®, and W3C® are trademarks or registered trademarks of
W3C®, World Wide Web Consortium, Massachusetts Institute of Technology.
HyperChem® is a registered trademark of Hypercube, Inc.
IBM®, DB2®, OS/2®, Parallel Sysplex®, and WebSphere® are trademarks of IBM
Corporation in the United States and/or other countries.
Java® is a registered trademark of Sun Microsystems, Inc.
JavaScript® is a registered trademark of Sun Microsystems, Inc., used under license
for technology invented and implemented by Netscape®.
JBoss® is a registered trademark of Red Hat, Inc., New Orleans, Louisiana.
Jess® is a registered trademark of Sandia National Laboratories.
Loom® is a registered trademark of the University of Southern California.
MDL® and ISIS® are trademarks of Elsevier MDL.
Microsoft®, WINDOWS®, NT®, EXCEL®, Word®, PowerPoint®, ODBC®, OLE®,
.NET®, ActiveX®, and SQL Server® are registered trademarks of Microsoft
Corporation.
MOLGEN® is a registered trademark of Stanford University.
MySQL® is a trademark of MySQL AB, Sweden.
NUTS® is a registered trademark of Acorn NMR, Inc.
Oracle® is a registered trademark of Oracle Corporation.
SAP® is a registered trademark of SAP AG, Germany.
SMILES™ is a trademark and SMARTS® is a registered trademark of Daylight
Chemical Information Systems Inc., Aliso Viejo, California.
Sun®, Sun Microsystems®, Solaris®, Java®, JavaServer Web Development Kit®, and
JavaServer Pages® are trademarks or registered trademarks of Sun Microsystems,
Inc.
UNIX®, X/Open, OSF/1, and Motif are registered trademarks of the Open Group.
UNIX® is a registered trademark of the Open Group.
VAX® and VMS® are registered trademarks of Digital Equipment Corporation,
Maynard, Massachusetts.
V-Modell® is a registered trademark of Bundesministerium des Innern (BMI),
Germany.
All other product names mentioned in this book are trademarks of their respective owners.
Chapter 1 Introduction
1.1 Introduction
Allen Newell, a researcher in computer science and cognitive psychology in the
School of Computer Science at Carnegie Mellon University, was involved in the
design of some of the earliest developments in artificial intelligence. In 1976 he gave a speech at Carnegie Mellon University that was later published as his essay
“Fairy Tales,” from which the following statement is taken [1]:
Exactly what the computer provides is the ability not to be rigid and unthinking but,
rather, to behave conditionally. That is what it means to apply knowledge to action: It
means to let the action taken reflect knowledge of the situation, to be sometimes this
way, sometimes that, as appropriate.
What is interesting about this statement is that it apparently contradicts our perception of computers: Computers usually carry the stigma of being less reasonable
and adaptive than the human brain. However, if we think about what characterizes
a human decision, we find that exactly what Newell described as behaving conditionally
is a fundamental basis for reasoning, evaluation, and assessment.
A second fundamental idea behind this phrase is the concept of applying knowledge to action. Although theoretical research is important for modeling a scientific
basis, the application of scientific ideas in the best sense is what finally leads to
advancement, a fundamental concept of evolution on Earth.
Expert systems particularly use conditional methodologies and are designed to
apply knowledge to practical problems. Knowledge and experience play a major role
in the handling of scientific information. Computers are indispensable tools for processing and retrieval of the huge amounts of laboratory data, and expert systems can
aid an expert in making decisions about a certain problem. Human experts rely on
experience as well as on knowledge. Experience can be regarded as a specialized kind
of knowledge created by a complex interaction of rules and decisions. Instead of representing knowledge in a static way, rule-based systems represent knowledge in terms
of rules that lead to conclusions. Computer software will never be able to replace
the human expert in interpreting information. However, expert systems can assist the
human expert by organizing information and by making estimations and predictions.
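The idea of representing knowledge as rules that lead to conclusions can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the function name forward_chain, the fact labels, and the three toy rules are hypothetical and are not taken from any system described in this book, nor are they validated spectroscopic rules. It shows the conditional behavior discussed above: which conclusions are reached depends on which facts are present.

```python
# Minimal forward-chaining sketch. All names, facts, and rules are
# hypothetical placeholders chosen for illustration only.

def forward_chain(facts, rules):
    """Apply rules repeatedly until no rule adds a new conclusion.

    facts: iterable of strings regarded as true.
    rules: list of (conditions, conclusion) pairs; a rule fires when
           every one of its conditions is among the known facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conclusion not in known and set(conditions) <= known:
                known.add(conclusion)
                changed = True  # a new fact may enable further rules
    return known


# Toy rule base: two observations lead to intermediate conclusions,
# which together lead to a final one.
RULES = [
    (("has_OH_band",), "hydroxyl_group"),
    (("has_CO_band",), "carbonyl_group"),
    (("hydroxyl_group", "carbonyl_group"), "carboxylic_acid_candidate"),
]

if __name__ == "__main__":
    print(sorted(forward_chain({"has_OH_band", "has_CO_band"}, RULES)))
```

With both observations present, the third rule fires after the first two; with only one observation, it does not. Real expert system shells add conflict resolution, uncertainty handling, and explanation facilities on top of this basic loop.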
1.2 What We Are Talking About
The patient reader will find out that this book is unconventional in several respects.
This is partly because expert systems are an unconventional topic and
from time to time require a different point of view than one would expect at
the outset.
This book is written primarily for scientists, particularly chemists and biochemists, and the thread leading through it is the application of expert
systems and similar software in chemistry, biochemistry, and related research areas. I
know that scientists expect an overview of existing research before a book or article
turns to what is new. We will follow this approach, but only where it is necessary for the
context of the new content presented. The information technology aspects are
described to an extent that allows a scientist to understand the principles; the book does not give
instructions on how to develop expert systems. Nevertheless, we will deal with
a series of mathematical aspects from cheminformatics and chemometrics that are
required to represent chemical information in a computer. The mathematical techniques are introduced at a reasonable level and provide the background for understanding
the examples of expert systems and their applications that are introduced afterward.
The expert systems introduced are selected to cover different application areas. Even
though the list is far from complete, we will find some historical systems and
more recent noncommercial and commercial software. Since applying expert systems
requires integrating them with other software, we will finally have a look at the laboratory software environment. This final topic will give readers an idea of the
challenges that occur in the practical implementation of such systems in research
laboratories and, hopefully, will support their own ideas for developing or applying
expert systems. The following gives a summary of what to find in this book.
Later in this chapter, we will start thinking about some initial aspects of intelligence to get a framework for what we want to address.
Chapter 2 gives an overview of the ideas and concepts underlying the term expert
systems. This is a necessary basis for understanding the approaches that are subsequently described. At this point, we will focus on two domains: the conceptual
background and the scientific methodologies that support the concepts.
Chapter 3 provides a concise summary of technical design, programming paradigms, and development tools for expert systems.
Chapter 4 covers technologies for representing and processing chemical information in a computer as well as a summary of supporting technologies that typically
appear in expert systems.
Chapter 5 deals with a particular method for describing molecules in a computer,
which serves as a representative of molecular descriptors, a topic of particular importance when dealing with chemistry in software. It introduces applications for the
different molecular descriptors depicted in the previous chapter and closes with the
introduction of a software package developed for the investigations.
Chapter 6 covers expert systems and their application in chemistry research,
starting with the historical forerunners of these systems and then focusing on different application areas.
Chapter 7 comprises expert system applications in areas related to chemistry
research, such as bioinformatics and industrial areas.
Chapter 8 deals with the software environment in the laboratory, gives an overview of
typical software packages, and describes requirements that have to be taken into account
when using expert systems in conjunction with the existing laboratory software.
Chapter 9 closes with a generic definition and a critical assessment of expert
systems and an outlook.