


SERVICE QUALITY
OF CLOUD-BASED
APPLICATIONS




Eric Bauer
Randee Adams

IEEE PRESS


Copyright © 2014 by The Institute of Electrical and Electronics Engineers, Inc.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey. All rights reserved
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or
by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as
permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior
written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to
the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax
(978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be
addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030,
(201) 748-6011, fax (201) 748-6008, or online at />
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or
completeness of the contents of this book and specifically disclaim any implied warranties of
merchantability or fitness for a particular purpose. No warranty may be created or extended by sales
representatives or written sales materials. The advice and strategies contained herein may not be suitable
for your situation. You should consult with a professional where appropriate. Neither the publisher nor
author shall be liable for any loss of profit or any other commercial damages, including but not limited to
special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our
Customer Care Department within the United States at (800) 762-2974, outside the United States at
(317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may
not be available in electronic formats. For more information about Wiley products, visit our web site at
www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Bauer, Eric.
  Service quality of cloud-based applications / Eric Bauer, Randee Adams.
    pages cm
  ISBN 978-1-118-76329-2 (cloth)
  1.  Cloud computing.  2.  Application software–Reliability.  3.  Quality of service (Computer
networks) I.  Adams, Randee.  II.  Title.
  QA76.585.B3944 2013
  004.67'82–dc23

2013026569
Printed in the United States of America
10  9  8  7  6  5  4  3  2  1


CONTENTS

Figures
Tables and Equations

1 INTRODUCTION
1.1 Approach
1.2 Target Audience
1.3 Organization

I CONTEXT

2 APPLICATION SERVICE QUALITY
2.1 Simple Application Model
2.2 Service Boundaries
2.3 Key Quality and Performance Indicators
2.4 Key Application Characteristics
2.4.1 Service Criticality
2.4.2 Application Interactivity
2.4.3 Tolerance to Network Traffic Impairments
2.5 Application Service Quality Metrics
2.5.1 Service Availability
2.5.2 Service Latency
2.5.3 Service Reliability
2.5.4 Service Accessibility
2.5.5 Service Retainability
2.5.6 Service Throughput
2.5.7 Service Timestamp Accuracy
2.5.8 Application-Specific Service Quality Measurements
2.6 Technical Service versus Support Service
2.6.1 Technical Service Quality
2.6.2 Support Service Quality
2.7 Security Considerations

3 CLOUD MODEL
3.1 Roles in Cloud Computing
3.2 Cloud Service Models
3.3 Cloud Essential Characteristics
3.3.1 On-Demand Self-Service
3.3.2 Broad Network Access
3.3.3 Resource Pooling
3.3.4 Rapid Elasticity
3.3.5 Measured Service
3.4 Simplified Cloud Architecture
3.4.1 Application Software
3.4.2 Virtual Machine Servers
3.4.3 Virtual Machine Server Controllers
3.4.4 Cloud Operations Support Systems
3.4.5 Cloud Technology Components Offered “as-a-Service”
3.5 Elasticity Measurements
3.5.1 Density
3.5.2 Provisioning Interval
3.5.3 Release Interval
3.5.4 Scaling In and Out
3.5.5 Scaling Up and Down
3.5.6 Agility
3.5.7 Slew Rate and Linearity
3.5.8 Elasticity Speedup
3.6 Regions and Zones
3.7 Cloud Awareness

4 VIRTUALIZED INFRASTRUCTURE IMPAIRMENTS
4.1 Service Latency, Virtualization, and the Cloud
4.1.1 Virtualization and Cloud Causes of Latency Variation
4.1.2 Virtualization Overhead
4.1.3 Increased Variability of Infrastructure Performance
4.2 VM Failure
4.3 Nondelivery of Configured VM Capacity
4.4 Delivery of Degraded VM Capacity
4.5 Tail Latency
4.6 Clock Event Jitter
4.7 Clock Drift
4.8 Failed or Slow Allocation and Startup of VM Instance
4.9 Outlook for Virtualized Infrastructure Impairments

II ANALYSIS

5 APPLICATION REDUNDANCY AND CLOUD COMPUTING
5.1 Failures, Availability, and Simplex Architectures
5.2 Improving Software Repair Times via Virtualization
5.3 Improving Infrastructure Repair Times via Virtualization
5.3.1 Understanding Hardware Repair
5.3.2 VM Repair-as-a-Service
5.3.3 Discussion
5.4 Redundancy and Recoverability
5.4.1 Improving Recovery Times via Virtualization
5.5 Sequential Redundancy and Concurrent Redundancy
5.5.1 Hybrid Concurrent Strategy
5.6 Application Service Impact of Virtualization Impairments
5.6.1 Service Impact for Simplex Architectures
5.6.2 Service Impact for Sequential Redundancy Architectures
5.6.3 Service Impact for Concurrent Redundancy Architectures
5.6.4 Service Impact for Hybrid Concurrent Architectures
5.7 Data Redundancy
5.7.1 Data Storage Strategies
5.7.2 Data Consistency Strategies
5.7.3 Data Architecture Considerations
5.8 Discussion
5.8.1 Service Quality Impact
5.8.2 Concurrency Control
5.8.3 Resource Usage
5.8.4 Simplicity
5.8.5 Other Considerations

6 LOAD DISTRIBUTION AND BALANCING
6.1 Load Distribution Mechanisms
6.2 Load Distribution Strategies
6.3 Proxy Load Balancers
6.4 Nonproxy Load Distribution
6.5 Hierarchy of Load Distribution
6.6 Cloud-Based Load Balancing Challenges
6.7 The Role of Load Balancing in Support of Redundancy
6.8 Load Balancing and Availability Zones
6.9 Workload Service Measurements
6.10 Operational Considerations
6.10.1 Load Balancing and Elasticity
6.10.2 Load Balancing and Overload
6.10.3 Load Balancing and Release Management
6.11 Load Balancing and Application Service Quality
6.11.1 Service Availability
6.11.2 Service Latency
6.11.3 Service Reliability
6.11.4 Service Accessibility
6.11.5 Service Retainability
6.11.6 Service Throughput
6.11.7 Service Timestamp Accuracy

7 FAILURE CONTAINMENT
7.1 Failure Containment
7.1.1 Failure Cascades
7.1.2 Failure Containment and Recovery
7.1.3 Failure Containment and Virtualization
7.2 Points of Failure
7.2.1 Single Points of Failure
7.2.2 Single Points of Failure and Virtualization
7.2.3 Affinity and Anti-affinity Considerations
7.2.4 No SPOF Assurance in Cloud Computing
7.2.5 No SPOF and Application Data
7.3 Extreme Solution Coresidency
7.3.1 Extreme Solution Coresidency Risks
7.4 Multitenancy and Solution Containers

8 CAPACITY MANAGEMENT
8.1 Workload Variations
8.2 Traditional Capacity Management
8.3 Traditional Overload Control
8.4 Capacity Management and Virtualization
8.5 Capacity Management in Cloud
8.6 Storage Elasticity Considerations
8.7 Elasticity and Overload
8.8 Operational Considerations
8.9 Workload Whipsaw
8.10 General Elasticity Risks
8.11 Elasticity Failure Scenarios
8.11.1 Elastic Growth Failure Scenarios
8.11.2 Elastic Capacity Degrowth Failure Scenarios

9 RELEASE MANAGEMENT
9.1 Terminology
9.2 Traditional Software Upgrade Strategies
9.2.1 Software Upgrade Requirements
9.2.2 Maintenance Windows
9.2.3 Client Considerations for Application Upgrade
9.2.4 Traditional Offline Software Upgrade
9.2.5 Traditional Online Software Upgrade
9.2.6 Discussion
9.3 Cloud-Enabled Software Upgrade Strategies
9.3.1 Type I Cloud-Enabled Upgrade Strategy: Block Party
9.3.2 Type II Cloud-Enabled Upgrade Strategy: One Driver per Bus
9.3.3 Discussion
9.4 Data Management
9.5 Role of Service Orchestration in Software Upgrade
9.5.1 Solution-Level Software Upgrade
9.6 Conclusion

10 END-TO-END CONSIDERATIONS
10.1 End-to-End Service Context
10.2 Three-Layer End-to-End Service Model
10.2.1 Estimating Service Impairments via the Three-Layer Model
10.2.2 End-to-End Service Availability
10.2.3 End-to-End Service Latency
10.2.4 End-to-End Service Reliability
10.2.5 End-to-End Service Accessibility
10.2.6 End-to-End Service Retainability
10.2.7 End-to-End Service Throughput
10.2.8 End-to-End Service Timestamp Accuracy
10.2.9 Reality Check
10.3 Distributed and Centralized Cloud Data Centers
10.3.1 Centralized Cloud Data Centers
10.3.2 Distributed Cloud Data Centers
10.3.3 Service Availability Considerations
10.3.4 Service Latency Considerations
10.3.5 Service Reliability Considerations
10.3.6 Service Accessibility Considerations
10.3.7 Service Retainability Considerations
10.3.8 Resource Distribution Considerations
10.4 Multitiered Solution Architectures
10.5 Disaster Recovery and Geographic Redundancy
10.5.1 Disaster Recovery Objectives
10.5.2 Georedundant Architectures
10.5.3 Service Quality Considerations
10.5.4 Recovery Point Considerations
10.5.5 Mitigating Impact of Disasters with Georedundancy and Availability Zones

III RECOMMENDATIONS

11 ACCOUNTABILITIES FOR SERVICE QUALITY
11.1 Traditional Accountability
11.2 The Cloud Service Delivery Path
11.3 Cloud Accountability
11.4 Accountability Case Studies
11.4.1 Accountability and Technology Components
11.4.2 Accountability and Elasticity
11.5 Service Quality Gap Model
11.5.1 Application’s Resource Facing Service Gap Analysis
11.5.2 Application’s Customer Facing Service Gap Analysis
11.6 Service Level Agreements

12 SERVICE AVAILABILITY MEASUREMENT
12.1 Parsimonious Service Measurements
12.2 Traditional Service Availability Measurement
12.3 Evolving Service Availability Measurements
12.3.1 Analyzing Application Evolution
12.3.2 Technology Components
12.3.3 Leveraging Storage-as-a-Service
12.4 Evolving Hardware Reliability Measurement
12.4.1 Virtual Machine Failure Lifecycle
12.5 Evolving Elasticity Service Availability Measurements
12.6 Evolving Release Management Service Availability Measurement
12.7 Service Measurement Outlook

13 APPLICATION SERVICE QUALITY REQUIREMENTS
13.1 Service Availability Requirements
13.2 Service Latency Requirements
13.3 Service Reliability Requirements
13.4 Service Accessibility Requirements
13.5 Service Retainability Requirements
13.6 Service Throughput Requirements
13.7 Timestamp Accuracy Requirements
13.8 Elasticity Requirements
13.9 Release Management Requirements
13.10 Disaster Recovery Requirements

14 VIRTUALIZED INFRASTRUCTURE MEASUREMENT AND MANAGEMENT
14.1 Business Context for Infrastructure Service Quality Measurements
14.2 Cloud Consumer Measurement Options
14.3 Impairment Measurement Strategies
14.3.1 Measurement of VM Failure
14.3.2 Measurement of Nondelivery of Configured VM Capacity
14.3.3 Measurement of Delivery of Degraded VM Capacity
14.3.4 Measurement of Tail Latency
14.3.5 Measurement of Clock Event Jitter
14.3.6 Measurement of Clock Drift
14.3.7 Measurement of Failed or Slow Allocation and Startup of VM Instance
14.3.8 Measurements Summary
14.4 Managing Virtualized Infrastructure Impairments
14.4.1 Minimize Application’s Sensitivity to Infrastructure Impairments
14.4.2 VM-Level Congestion Detection and Control
14.4.3 Allocate More Virtual Resource Capacity
14.4.4 Terminate Poorly Performing VM Instances
14.4.5 Accept Degraded Performance
14.4.6 Proactive Supplier Management
14.4.7 Reset End Users’ Service Quality Expectations
14.4.8 SLA Considerations
14.4.9 Changing Cloud Service Providers

15 ANALYSIS OF CLOUD-BASED APPLICATIONS
15.1 Reliability Block Diagrams and Side-by-Side Analysis
15.2 IaaS Impairment Effects Analysis
15.3 PaaS Failure Effects Analysis
15.4 Workload Distribution Analysis
15.4.1 Service Quality Analysis
15.4.2 Overload Control Analysis
15.5 Anti-Affinity Analysis
15.6 Elasticity Analysis
15.6.1 Service Capacity Growth Scenarios
15.6.2 Service Capacity Growth Action Analysis
15.6.3 Service Capacity Degrowth Action Analysis
15.6.4 Storage Capacity Growth Scenarios
15.6.5 Online Storage Capacity Growth Action Analysis
15.6.6 Online Storage Capacity Degrowth Action Analysis
15.7 Release Management Impact Effects Analysis
15.7.1 Service Availability Impact
15.7.2 Server Reliability Impact
15.7.3 Service Accessibility Impact
15.7.4 Service Retainability Impact
15.7.5 Service Throughput Impact
15.8 Recovery Point Objective Analysis
15.9 Recovery Time Objective Analysis

16 TESTING CONSIDERATIONS
16.1 Context for Testing
16.2 Test Strategy
16.2.1 Cloud Test Bed
16.2.2 Application Capacity under Test
16.2.3 Statistical Confidence
16.2.4 Service Disruption Time
16.3 Simulating Infrastructure Impairments
16.4 Test Planning
16.4.1 Service Reliability and Latency Testing
16.4.2 Impaired Infrastructure Testing
16.4.3 Robustness Testing
16.4.4 Endurance/Stability Testing
16.4.5 Application Elasticity Testing
16.4.6 Upgrade Testing
16.4.7 Disaster Recovery Testing
16.4.8 Extreme Coresidency Testing
16.4.9 PaaS Technology Component Testing
16.4.10 Automated Regression Testing
16.4.11 Canary Release Testing

17 CONNECTING THE DOTS
17.1 The Application Service Quality Challenge
17.2 Redundancy and Robustness
17.3 Design for Scalability
17.4 Design for Extensibility
17.5 Design for Failure
17.6 Planning Considerations
17.7 Evolving Traditional Applications
17.7.1 Phase 0: Traditional Application
17.7.2 Phase I: High Service Quality on Virtualized Infrastructure
17.7.3 Phase II: Manual Application Elasticity
17.7.4 Phase III: Automated Release Management
17.7.5 Phase IV: Automated Application Elasticity
17.7.6 Phase V: VM Migration
17.8 Concluding Remarks

Abbreviations
References
About the Authors
Index



FIGURES

Figure 1.1. Sample Cloud-Based Application.
Figure 2.0. Organization of Part I: Context.
Figure 2.1. Simple Cloud-Based Application.
Figure 2.2. Simple Virtual Machine Service Model.
Figure 2.3. Application Service Boundaries.
Figure 2.4. KQIs and KPIs.
Figure 2.5. Application Consumer and Resource Facing Service Indicators.
Figure 2.6. Application Robustness.
Figure 2.7. Sample Application Robustness Scenario.
Figure 2.8. Interactivity Timeline.
Figure 2.9. Service Latency.
Figure 2.10. Small Sample Service Latency Distribution.
Figure 2.11. Sample Typical Latency Variation by Workload Density.
Figure 2.12. Sample Tail Latency Variation by Workload Density.
Figure 2.13. Understanding Complementary Cumulative Distribution Plots.
Figure 2.14. Service Latency Optimization Options.
Figure 3.1. Cloud Roles for Simple Application.
Figure 3.2. Elastic Growth Strategies.
Figure 3.3. Simple Model of Cloud Infrastructure.
Figure 3.4. Abstract Virtual Machine Server.
Figure 3.5. Provisioning Interval (TGrow).
Figure 3.6. Release Interval TShrink.
Figure 3.7. VM Scale In and Scale Out.
Figure 3.8. Horizontal Elasticity.
Figure 3.9. Scale Up and Scale Down of a VM Instance.
Figure 3.10. Idealized (Linear) Capacity Agility.
Figure 3.11. Slew Rate of Square Wave Amplification.
Figure 3.12. Elastic Growth Slew Rate and Linearity.
Figure 3.13. Regions and Availability Zones.
Figure 4.1. Virtualized Infrastructure Impairments Experienced by Cloud-Based Applications.
Figure 4.2. Transaction Latency for Riak Benchmark.
Figure 4.3. VM Failure Impairment Example.
Figure 4.4. Simplified Nondelivery of VM Capacity Model.
Figure 4.5. Characterizing Virtual Machine Nondelivery.
Figure 4.6. Nondelivery Impairment Example.
Figure 4.7. Simple Virtual Machine Degraded Delivery Model.
Figure 4.8. Degraded Resource Capacity Model.
Figure 4.9. Degraded Delivery Impairment Example.
Figure 4.10. CCDF for Riak Read Benchmark for Three Different Hosting Configurations.
Figure 4.11. Tail Latency Impairment Example.
Figure 4.12. Sample CCDF for Virtualized Clock Event Jitter.
Figure 4.13. Clock Event Jitter Impairment Example.
Figure 4.14. Clock Drift Impairment Example.
Figure 5.1. Simplex Distributed System.
Figure 5.2. Simplex Service Availability.
Figure 5.3. Sensitivity of Service Availability to MTRS (Log Scale).
Figure 5.4. Traditional versus Virtualized Software Repair Times.
Figure 5.5. Traditional Hardware Repair versus Virtualized Infrastructure Restoration Times.
Figure 5.6. Simplified VM Repair Logic.
Figure 5.7. Sample Automated Virtual Machine Repair-as-a-Service Logic.
Figure 5.8. Simple Redundancy Model.
Figure 5.9. Simplified High Availability Strategy.
Figure 5.10. Failure in a Traditional (Sequential) Redundant Architecture.
Figure 5.11. Sequential Redundancy Model.
Figure 5.12. Sequential Redundant Architecture Timeline with No Failures.
Figure 5.13. Sample Redundant Architecture Timeline with Implicit Failure.
Figure 5.14. Sample Redundant Architecture Timeline with Explicit Failure.
Figure 5.15. Recovery Times for Traditional Redundancy Architectures.
Figure 5.16. Concurrent Redundancy Processing Model.
Figure 5.17. Client Controlled Redundant Compute Strategy.
Figure 5.18. Client Controlled Redundant Operations.
Figure 5.19. Concurrent Redundancy Timeline with Fast but Erroneous Return.
Figure 5.20. Hybrid Concurrent with Slow Response.
Figure 5.21. Application Service Impact for Very Brief Nondelivery Events.
Figure 5.22. Application Service Impact for Brief Nondelivery Events.
Figure 5.23. Nondelivery Impact to Redundant Compute Architectures.
Figure 5.24. Nondelivery Impact to Hybrid Concurrent Architectures.
Figure 6.1. Proxy Load Balancer.
Figure 6.2. Proxy Load Balancing.
Figure 6.3. Load Balancing between Regions and Availability Zones.
Figure 7.1. Reliability Block Diagram of Simplex Sample System (with SPOF).
Figure 7.2. Reliability Block Diagram of Redundant Sample System (without SPOF).
Figure 7.3. No SPOF Distribution of Component Instances across Virtual Servers.
Figure 7.4. Example of No Single Point of Failure with Distributed Component Instances.
Figure 7.5. Example of Single Point of Failure with Poorly Distributed Component Instances.
Figure 7.6. Simplified VM Server Control.
Figure 8.1. Sample Daily Workload Variation (Logarithmic Scale).
Figure 8.2. Traditional Maintenance Window.
Figure 8.3. Traditional Congestion Control.
Figure 8.4. Simplified Elastic Growth of Cloud-Based Applications.
Figure 8.5. Simplified Elastic Degrowth of Cloud-Based Applications.
Figure 8.6. Sample of Erratic Workload Variation (Linear Scale).
Figure 8.7. Typical Elasticity Orchestration Process.
Figure 8.8. Example of Workload Whipsaw.
Figure 8.9. Elastic Growth Failure Scenarios.
Figure 9.1. Traditional Offline Software Upgrade.
Figure 9.2. Traditional Online Software Upgrade.
Figure 9.3. Type I, “Block Party” Upgrade Strategy.
Figure 9.4. Application Elastic Growth and Type I, “Block Party” Upgrade.
Figure 9.5. Type II, “One Driver per Bus” Upgrade Strategy.
Figure 10.1. Simple End-to-End Application Service Context.
Figure 10.2. Service Boundaries in End-to-End Application Service Context.
Figure 10.3. Measurement Points 0–4 for Simple End-to-End Context.
Figure 10.4. End-to-End Measurement Points for Simple Replicated Solution Context.
Figure 10.5. Service Probes across User Service Delivery Path.
Figure 10.6. Three Layer Factorization of Sample End to End Solution.
Figure 10.7. Estimating Service Impairments across the Three-Layer Model.
Figure 10.8. Decomposing a Service Impairment.
Figure 10.9. Centralized Cloud Data Center Scenario.
Figure 10.10. Distributed Cloud Data Center Scenario.
Figure 10.11. Sample Multitier Solution Architecture.
Figure 10.12. Disaster Recovery Time and Point Objectives.
Figure 10.13. Service Impairment Model of Georedundancy.
Figure 11.1. Traditional Three-Way Accountability Split: Suppliers, Customers, External.
Figure 11.2. Example Cloud Service Delivery Chain.
Figure 11.3. Service Boundaries across Cloud Delivery Chain.
Figure 11.4. Functional Responsibilities for Applications Deployed on IaaS.
Figure 11.5. Sample Application.
Figure 11.6. Service Outage Accountability of Sample Application.
Figure 11.7. Application Elasticity Configuration.
Figure 11.8. Service Gap Model.
Figure 11.9. Service Quality Zone of Tolerance.
Figure 11.10. Application’s Resource Facing Service Boundary.
Figure 11.11. Application’s Customer Facing Service Boundary.
Figure 12.1. Traditional Service Operation Timeline.
Figure 12.2. Sample Application Deployment on Cloud.
Figure 12.3. “Network Element” Boundary for Sample Application.
Figure 12.4. Logical Measurement Point for Application’s Service Availability.
Figure 12.5. Reliability Block Diagram of Sample Application (Traditional Deployment).
Figure 12.6. Evolving Sample Application to Cloud.
Figure 12.7. Reliability Block Diagram of Sample Application on Cloud.
Figure 12.8. Side-by-Side Reliability Block Diagrams.
Figure 12.9. Accountability of Sample Cloud Based Application.
Figure 12.10. Connectivity-as-a-Service as a Nanoscale VPN.
Figure 12.11. Sample Application with Database-as-a-Service.
Figure 12.12. Accountability of Sample Application with Database-as-a-Service.
Figure 12.13. Sample Application with Outboard RAID Storage Array.
Figure 12.14. Sample Application with Storage-as-a-Service.
Figure 12.15. Accountability of Sample Application with Storage-as-a-Service.
Figure 12.16. Virtual Machine Failure Lifecycle.
Figure 12.17. Elastic Capacity Growth Timeline.
Figure 12.18. Outage Normalization for Type I “Block Party” Release Management.
Figure 12.19. Outage Normalization for Type II “One Driver per Bus” Release Management.
Figure 13.1. Maximum Acceptable Service Disruption.
Figure 14.1. Infrastructure Impairments and Application Impairments.
Figure 14.2. Loopback and Service Latency.
Figure 14.3. Simplified Measurement Architecture.
Figure 15.1. Sample Side-by-Side Reliability Block Diagrams.
Figure 15.2. Worst-Case Recovery Point Scenario.
Figure 15.3. Best-Case Recovery Point Scenario.
Figure 16.1. Measuring Service Disruption Latency.
Figure 16.2. Service Disruption Latency for Implicit Failure.
Figure 16.3. Sample Endurance Test Case for Cloud-Based Application.
Figure 17.1. Virtualized Infrastructure Impairments Experienced by Cloud-Based Applications.
Figure 17.2. Application Robustness Challenge.
Figure 17.3. Sequential (Traditional) Redundancy.
Figure 17.4. Concurrent Redundancy.
Figure 17.5. Hybrid Concurrent with Slow Response.
Figure 17.6. Type I, “Block Party” Upgrade Strategy.
Figure 17.7. Sample Phased Evolution of a Traditional Application.



TABLES AND EQUATIONS

TABLES
TABLE 2.1. Mean Opinion Scores [P.800]
TABLE 13.1. Service Availability and Downtime Ratings

EQUATIONS
Equation 2.1. Availability Formula
Equation 5.1. Simplex Availability
Equation 5.2. Traditional Availability
Equation 10.1. Estimating General End-to-End Service Impairments
Equation 10.2. Estimating End-to-End Service Downtime
Equation 10.3. Estimating End-to-End Service Availability
Equation 10.4. Estimating End-to-End Typical Service Latency
Equation 10.5. Estimating End-to-End Service Defect Rate
Equation 10.6. Estimating End-to-End Service Accessibility
Equation 10.7. Estimating End to End Service Retainability (as DPM)
Equation 13.1. DPM via Operations Attempted and Operations Successful
Equation 13.2. DPM via Operations Attempted and Operations Failed
Equation 13.3. DPM via Operations Successful and Operations Failed
Equation 14.1. Computing VM FITs
Equation 14.2. Converting FITs to MTBF



1
INTRODUCTION

Customers expect applications and services deployed on cloud computing infrastructure to deliver
service quality, reliability, availability, and latency comparable to what the same applications deliver
when deployed on traditional, native hardware configurations. Cloud computing infrastructure
introduces a new family of service impairment risks rooted in the virtualized compute, memory,
storage, and networking resources that an Infrastructure-as-a-Service (IaaS) provider delivers to
hosted application instances. As a result, application developers and cloud consumers must mitigate
these impairments to assure that the application service delivered to end users is not unacceptably
impacted. This book methodically analyzes the impacts of cloud infrastructure impairments on the
application service delivered to end users, as well as the opportunities for improvement afforded by
the cloud. The book also recommends architectures, policies, and other techniques to maximize the
likelihood of delivering comparable or better service to end users when applications are deployed
to the cloud.

1.1  APPROACH
Cloud-based application software executes within a set of virtual machine instances,
and each individual virtual machine instance relies on virtualized compute, memory,
Service Quality of Cloud-Based Applications, First Edition. Eric Bauer and Randee Adams.
© 2014 The Institute of Electrical and Electronics Engineers, Inc. Published 2014 by John Wiley & Sons, Inc.
