

Berkeley RAD Lab:
Research in
Internet-scale Computing Systems
Randy H. Katz

28 March 2007
Five Year Mission

Observation: Internet systems complex, fragile, manually
managed, evolving rapidly

To scale eBay, must build an eBay-sized company

To scale YouTube, get acquired by a Google-sized company

Mission: Enable a single person to create, evolve, and
operate the next-generation IT service

“The Fortune 1 Million” by enabling rapid innovation

Approach: Create core technology spanning systems,
networking, and machine learning

Focus: Making datacenter easier to manage to enable
one person to Analyze, Deploy, Operate a scalable IT
service
Jan 07 Announcements by
Microsoft and Google



Microsoft and Google race to build next-gen DCs

Microsoft announces a $550 million DC in TX

Google confirms plans for a $600 million site in NC

Google plans two more DCs in SC; they may cost another $950
million, with about 150,000 computers each

Internet DCs are the next computing platform

Power availability drives deployment decisions
Datacenter is the Computer

Google program == Web search, Gmail,…

Google computer == the datacenter

Warehouse-sized
facilities and workloads
likely more common
Luiz Barroso’s talk at RAD Lab 12/11/06
Sun Project Blackbox
10/17/06
Compose datacenter from 20 ft. containers!

Power/cooling for 200 kW


External taps for electricity,
network, cold water

250 Servers, 7 TB DRAM,
or 1.5 PB disk in 2006

20% energy savings

1/10th? cost of a building
See web2.wsj2.com/ruby_on_rails_11_web_20_on_rocket_fuel.htm
Datacenter Programming System

Ruby on Rails: open source Web framework
optimized for programmer happiness and
sustainable productivity:

Convention over configuration

Scaffolding: automatic, Web-based UI to stored data

Program the client: write browser-side code in Ruby, compile to
JavaScript

“Duck Typing/Mix-Ins”

Proven Expressiveness

Lines of code Java vs. RoR: 3:1


Lines of configuration Java vs. RoR: 10:1

More than a fad

Java on Rails, Python on Rails, …
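The "convention over configuration" idea above can be sketched in plain Ruby. This is an illustrative toy, not Rails itself: the `Model`, `Article`, `table_name`, and `index_route` names are invented to show how a framework can derive configuration from class names (real Rails does this via ActiveSupport's inflector, which also handles irregular plurals).

```ruby
# Toy sketch of "convention over configuration": the framework derives
# the table name and RESTful index route from the model's class name,
# so the developer writes no mapping configuration at all.
class Model
  # Derive a plural, lowercase table name from the class name by convention.
  def self.table_name
    name.downcase + "s"
  end

  # Derive the conventional RESTful index route from the table name.
  def self.index_route
    "/" + table_name
  end
end

# Defining the model is the entire "configuration".
class Article < Model; end

puts Article.table_name    # the convention yields "articles"
puts Article.index_route   # and "/articles"
```

Because both values fall out of the class name, renaming the model is the only change ever needed.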
Datacenter Synthesis + OS

Synthesis: change DC via written specification

DC Spec Language compiled to logical configuration

OS: allocate, monitor, adjust during operation

Director using machine learning, Drivers send commands
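The "change the DC via written specification" step can be caricatured in a few lines. All names and fields below are invented for illustration; the slides do not show the actual DC Spec Language. The sketch expands a declarative tier spec into the logical configuration (concrete VM slots) that the OS's drivers could act on.

```ruby
# Hypothetical declarative spec: how many replicas each tier wants,
# and how big each replica is. (Invented format.)
SPEC = {
  "web" => { replicas: 4, cores: 2 },
  "db"  => { replicas: 2, cores: 8 },
}

# "Compile" the spec: expand each tier into named logical VM slots.
def compile_spec(spec)
  spec.flat_map do |tier, req|
    (1..req[:replicas]).map do |i|
      { name: "#{tier}-#{i}", cores: req[:cores] }
    end
  end
end

config = compile_spec(SPEC)
puts config.length          # 4 web + 2 db = 6 logical VMs
puts config.first[:name]    # "web-1"
```

The point of the declarative form is that changing `replicas: 4` to `replicas: 8` and recompiling is the whole operator workflow; allocation, monitoring, and adjustment are the OS's job.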
“System” Statistical
Machine Learning

S²ML Strengths

Handle SW churn: Train vs. write the logic

Beyond queuing models: Learns how to handle/make
policy between steady states


Beyond control theory: Coping with complex cost
functions

Discovery: Finding trends, needles in data haystack

Exploit cheap processing advances: fast enough to
run online

S²ML as an integral component of DC OS
Datacenter Monitoring

S²ML needs data to analyze

DC components come with sensors already

CPUs (performance counters)

Disks (SMART interface)

Add sensors to software

Log files

DTrace for Solaris, Mac OS


Trace 10K++ nodes within and between DCs

*Trace: App-oriented path recording framework

X-Trace: Cross-layer/-domain including network layer
Middleboxes in Today’s DC

Middleboxes inserted on
physical path

Policy via plumbing

Weakest link: 1 point of
failure, bottleneck

Expensive to upgrade
and introduce new
functionality

Identity-based Routing
Layer: policy not plumbing
to route classified packets
to appropriate middlebox
services
[Figure: firewall, load balancer, and intrusion detector middleboxes inserted on the physical path of a high-speed network]
First Milestone:
DC Energy Conservation

DCs limited by power

For each dollar spent on servers, add $0.48 (2005)/$0.71
(2010) for power/cooling

$26B spent to power and cool servers in 2005 grows to
$45B in 2010

Attractive application of S²ML

Bringing processor resources on/off-line: Dynamic
environment, complex cost function, measurement-driven
decisions

Preserve 100% Service Level Agreements

Don’t hurt hardware reliability

Then conserve energy

Conserve energy and improve reliability


MTTF: stress of on/off cycle vs. benefits of off-hours
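The measurement-driven on/off decision above can be caricatured in a few lines. A real S²ML controller would learn the policy and cost function from data rather than hard-code them; the headroom factor, rates, and node counts below are invented for illustration.

```ruby
# Toy sketch of SLA-preserving energy conservation: keep just enough
# nodes on to carry the measured request rate with headroom, and power
# the rest off. Headroom covers the lag of bringing nodes back online,
# so SLAs come first and energy savings second.
def nodes_needed(request_rate, capacity_per_node, headroom: 1.25)
  ((request_rate * headroom) / capacity_per_node).ceil
end

total_nodes = 10
active = nodes_needed(400.0, 100.0)   # 400 req/s, 100 req/s per node
puts active                            # nodes kept on
puts total_nodes - active              # nodes powered off to save energy
```

The slide's reliability caveat would enter here as an extra constraint: don't cycle a node off if the MTTF cost of the on/off stress exceeds the energy saved.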
DC Networking and Power

Within DC racks, network equipment often the “hottest”
components in the hot spot

Network opportunities for power reduction

Transition to higher speed interconnects (10 Gb/s) at DC scales
and densities

High function/high power assists embedded in network element
(e.g., TCAMs)
Thermal Image of Typical Cluster Rack
[Thermal image of a cluster rack; labels: Rack, Switch]
M. K. Patterson, A. Pratt, P. Kumar,
“From UPS to Silicon: an end-to-end evaluation of datacenter efficiency”, Intel Corporation
DC Networking and Power

Selectively power down ports/portions of net elements

Enhanced power-awareness in the network stack


Power-aware routing and support for system virtualization

Support for datacenter “slice” power down and restart

Application and power-aware media access/control

Dynamic selection of full/half duplex

Directional asymmetry to save power,
e.g., 10Gb/s send, 100Mb/s receive

Power-awareness in applications and protocols

Hard state (proxying), soft state (caching),
protocol/data “streamlining” for power as well as b/w reduction

Power implications for topology design

Tradeoffs in redundancy/high-availability vs. power consumption

VLAN support for power-aware system virtualization
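The directional-asymmetry bullet above can be illustrated with a toy rate selector (the rates and demand figures are invented): choose each direction's link speed independently, as the slowest standard rate that still carries the measured traffic, rather than running both directions at the peak rate.

```ruby
# Standard Ethernet rates, in Mb/s, from cheapest to most power-hungry.
RATES_MBPS = [100, 1_000, 10_000]

# Pick the slowest rate that covers the measured demand; if even the
# fastest rate is exceeded, we can only offer the fastest rate.
def pick_rate(demand_mbps)
  RATES_MBPS.find { |r| r >= demand_mbps } || RATES_MBPS.last
end

# A node that mostly sends (e.g., a video server): fast out, slow in.
send_rate = pick_rate(8_000)   # heavy outbound traffic
recv_rate = pick_rate(80)      # light inbound traffic
puts [send_rate, recv_rate].inspect
```

Evaluating at what timescale such rate changes can be made without disrupting flows is exactly the kind of question the proposed testbed would answer.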
Why University Research?

Imperative that future technical leaders learn to deal
with scale in modern computing systems

Draw on talented but inexperienced people


Pick from worldwide talent pool for students & faculty

Don’t know what they can’t do

Inexpensive allows focus on speculative ideas

Mostly grad student salaries

Faculty part time

Tech Transfer engine

Success = Train students to go forth and replicate

Promiscuous publication, including source code

Ideal launching point for startups
Why a New Funding Model?

DARPA is exiting long-term research in experimental
computing systems

NSF swamped with proposals, yielding even more
conservative decisions

Community emphasis on theoretical vs. experimentally
oriented systems-building research

Alternative: turn to Industry for funding


Opportunity to shape research agenda
New Funding Model

30 grad students + 5 undergrads + 6 faculty + 4 staff

Foundation Companies: $500K/yr for 5 years

Google, Microsoft, Sun Microsystems

Prefer founding partner technology in prototypes

Many from each company attend retreats, advise on directions, get a
head start on research results

Putting IP in the public domain so partners can use it without being sued

Large Affiliates $100K/yr: Fujitsu, HP, IBM, Siemens

Small Affiliates $50K/yr: Nortel, Oracle

State matching programs add $1M/year: MICRO, Discovery
Summary

“DC is the Computer”


OS: ML+VM, Net: Identity-based Routing, FS: Web
Storage

Prog Sys: RoR, Libraries: Web Services

Development Environment: RAMP (simulator), AWE
(tester), Web 2.0 apps (benchmarks)

Debugging Environment: *Trace + X-Trace

Milestones

DC Energy Conservation + Reliability Enhancement

Web 2.0 Apps in RoR
Conclusions

Develop-Analyze-Deploy-Operate modern systems at
Internet scale

Ruby-on-Rails for rapid applications development

Declarative datacenter for correct-by-construction system
configuration and operation

Resource management by System Statistical Machine Learning

Virtual Machines and Network Storage for flexible resource
allocation

Power reduction and reliability enhancement by fast power-
down/restart for processing nodes

Pervasive monitoring, tracing, simulation, workload generation for
runtime analysis/operation
Discussion Points

Jointly designed datacenter testbed

Mini-DC consisting of clusters, middleboxes, and
network equipment

Representative network topology

Power-aware networking

Evaluation of existing network elements

Platform for investigating power reduction schemes in
network elements

Mutual information exchange

Network storage architecture

System Statistical Machine Learning

Ruby on Rails = DC PL

Reasons to love Ruby on Rails
1. Convention over Configuration

Rails framework feature enabled by Ruby language
feature (Meta Object Programming)
2. Scaffolding: automatic, Web-based, (pedestrian)
user interface to stored data
3. Program the client: v 1.1 write browser-side code
in Ruby, then compile to JavaScript
4. “Duck Typing/Mix-Ins”

Looks like a string, responds like a string: it's a string!

Mix-in improvement over multiple inheritance
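Duck typing and mix-ins, as listed above, can be shown together in a few lines of plain Ruby. The module and class names below are invented for illustration: the mix-in shares behavior without inheritance, and callers check only what an object responds to, never what class it is.

```ruby
# A mix-in: shared behavior injected into unrelated classes,
# avoiding the pitfalls of multiple inheritance.
module Greetable
  def greet
    "hello from #{label}"   # relies on the including class providing #label
  end
end

class LogFile
  include Greetable
  def label; "logfile"; end
end

class Sensor
  include Greetable
  def label; "sensor"; end
end

# Duck typing: we never ask "what class are you?", only
# "do you respond like something greetable?"
[LogFile.new, Sensor.new].each do |thing|
  puts thing.greet if thing.respond_to?(:greet)
end
```

A mix-in composes cleanly because each module contributes methods without pulling in a second superclass hierarchy, which is the improvement over multiple inheritance the slide alludes to.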
DC Monitoring

Imagine a world where path information is always
passed along, so user requests can always be
tracked throughout the system

Across apps, OS, network components and
layers, different computers on LAN, …

Unique request ID

Components touched


Time of day

Parent of this request
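A record carrying exactly the fields listed above might look like the sketch below. This is a hypothetical shape with invented names, not the *Trace implementation: a unique request ID, the parent request, and each component touched with its time of day.

```ruby
require 'securerandom'

# Hypothetical path record for one request flowing through the system.
class PathRecord
  attr_reader :id, :parent, :hops

  def initialize(parent: nil)
    @id     = SecureRandom.hex(8)   # unique request ID
    @parent = parent                # parent of this request, if any
    @hops   = []                    # components touched, with timestamps
  end

  # Each component the request passes through appends itself.
  def touch(component)
    @hops << { component: component, at: Time.now }
    self
  end
end

req   = PathRecord.new.touch("load-balancer").touch("app-server").touch("db")
child = PathRecord.new(parent: req.id)   # a sub-request spawned by req

puts req.hops.map { |h| h[:component] }.inspect
```

With the parent pointer, a whole request tree can be reassembled later across apps, OS, and network components, which is what the slide is asking for.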
*Trace: The 1% Solution

*Trace Goal: Make path-based analysis low-overhead
so it can be always on inside the datacenter

“Baseline” path info collection with ≤ 1% overhead

Selectively add more local detail for specific requests

*Trace: an end-to-end path recording framework

Capture & timestamp a unique requestID across all system
components

“Top level” log contains path traces

Local logs contain additional detail,
correlated to path ID

Built on X-trace
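The "baseline always-on, detail sampled" split can be sketched as below. The sampling interval and log shapes are invented; the point is only that every request pays for a cheap top-level record while a small fraction additionally gets verbose local logging, correlated by request ID.

```ruby
TOP_LEVEL_LOG = []   # cheap, always-on path traces
LOCAL_LOG     = []   # verbose detail, only for sampled requests
DETAIL_EVERY  = 100  # detailed logging on ~1% of requests

def record(request_id, component, detail_on:)
  # Baseline record: always written, kept tiny for <= 1% overhead.
  TOP_LEVEL_LOG << [request_id, component]
  # Extra local detail only when this request was selected.
  LOCAL_LOG << [request_id, component, "verbose state dump"] if detail_on
end

100.times do |i|
  # Deterministic stand-in for the sampling decision.
  record(i, "app-server", detail_on: (i % DETAIL_EVERY).zero?)
end

puts TOP_LEVEL_LOG.length   # every request has a baseline record
puts LOCAL_LOG.length       # only the sampled request carries detail
```

Because both logs share the request ID, an operator can start from a top-level path trace and drill into the local detail when it exists.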
X-Trace: comprehensive tracing
through Layers, Networks, Apps

Trace connectivity of distributed
components


Capture causal connections
between requests/responses

Cross-layer

Include network and middleware
services such as IP and LDAP

Cross-domain

Multiple datacenters, composed
services, overlays, mash-ups

Control to individual
administrative domains
 “Network path” sensor

Put individual
requests/responses, at
different network layers, in the
context of an end-to-end
request
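The core mechanism described above, metadata propagated down through the layers with each layer reporting a causal edge, can be sketched as follows. The field names are illustrative, not X-Trace's actual metadata format.

```ruby
REPORTS = []   # collected causal edges, to be reassembled offline

# Trace metadata carried with the request: which task, which operation.
Metadata = Struct.new(:task_id, :op_id)

# Each layer derives a child operation and reports the causal edge
# from its parent, so connectivity can be rebuilt across layers.
def next_op(meta, layer)
  child = Metadata.new(meta.task_id, meta.op_id + 1)
  REPORTS << { task: meta.task_id, from: meta.op_id,
               to: child.op_id, layer: layer }
  child
end

m = Metadata.new("task-42", 0)
m = next_op(m, "HTTP")   # application layer
m = next_op(m, "TCP")    # transport layer
m = next_op(m, "IP")     # network layer

puts REPORTS.map { |r| r[:layer] }.inspect   # one edge per layer crossed
```

Because every report names the same task ID, edges collected in different layers, or even different administrative domains, can be joined into one end-to-end request graph.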
Actuator:
Policy-based Routing Layer

Assign ID to incoming packets (hash + table lookup)

Route based on IDs, not locations (i.e., not IP addr)


Sets up logical paths without changing network topology

Set of common middleboxes gets a single ID

No single weakest link: robust, scalable throughput
Identity-based Routing Layer
[Figure: packets tagged (ID_F, ID_LB) or (ID_ID, ID_S) flow through the Firewall (ID_F), Load-Balancer (ID_LB), Intrusion-Detection (ID_ID), and Service (ID_S) middleboxes]

So simple it can be done in an FPGA?

More general than MPLS
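The "hash + table lookup, then route by ID" actuator can be caricatured in a few lines. The policy table, ID names, and flow key are invented for illustration; a real element would classify on header fields in hardware.

```ruby
require 'digest'

# Middlebox services, addressed by identity rather than IP address.
MIDDLEBOXES = { "ID_F"  => "firewall",
                "ID_LB" => "load-balancer",
                "ID_ID" => "intrusion-detection",
                "ID_S"  => "service" }

# Policy table: each packet class maps to an ordered ID chain.
# Changing policy means editing this table, not re-plumbing the network.
POLICY = { 0 => %w[ID_F ID_LB], 1 => %w[ID_ID ID_S] }

# "Hash + table lookup": hash the packet's flow key into a class.
def classify(packet)
  Digest::MD5.hexdigest(packet).to_i(16) % POLICY.size
end

# Route by ID, not location: resolve the chain to middlebox instances.
def route(packet)
  POLICY[classify(packet)].map { |id| MIDDLEBOXES[id] }
end

puts route("10.0.0.1:80->10.0.0.2:5432").inspect   # chain for this class
```

Since a whole set of equivalent middleboxes can sit behind one ID, any instance can serve the hop, which is what removes the single weakest link.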
Other RAD Lab Projects

Research Accelerator for Multiple Processors (RAMP)
= DC simulator

Automatic Workload Evaluator (AWE)
= DC tester

Web Storage (GFS, Bigtable, Amazon S3)
= DC File System

Web Services (MapReduce, Chubby)
= DC Libraries
