Database Security Lecture: Security Methods for Statistical Databases - Trần Thị Kim Chi


Security Methods for Statistical Databases

Introduction
§ Statistical databases containing medical information are often used for research
§ Some of the data is protected by laws that safeguard the privacy of the patient
§ Proper security precautions must be implemented to comply with these laws and respect the sensitivity of the data

Accuracy vs. Confidentiality
§ Accuracy – Researchers want to extract accurate and meaningful data
§ Confidentiality – Patients, laws, and database administrators want to maintain the privacy of patients and the confidentiality of their information

Laws
§ Health Insurance Portability and Accountability Act – HIPAA (Privacy Rule)
§ Covered organizations must comply by April 14, 2003
§ Designed to improve the efficiency of the healthcare system through the electronic exchange of data while maintaining security
§ Covered entities (health plans, healthcare clearinghouses, healthcare providers) may not use or disclose protected information except as permitted or required
§ The Privacy Rule establishes a “minimum necessary standard” for the purpose of making covered entities evaluate their current regulations and security precautions

HIPAA Compliance
§ Companies offer 3rd Party Certification of covered entities
§ Such companies will check your company and associated companies for compliance with HIPAA
§ Can help with rapid implementation of, and compliance with, HIPAA regulations

Types of Statistical Databases

§ Static – a static database is made once and never changes
§ Example: U.S. Census
§ Dynamic – changes continuously to reflect real-time data
§ Example: most online research databases

Security Methods
§ Access Restriction
§ Query Set Restriction
§ Microaggregation
§ Data Perturbation
§ Output Perturbation
§ Auditing
§ Random Sampling

Access Restriction
§ Databases normally have different access levels for different types of users
§ User IDs and passwords are the most common methods for restricting access
§ In a medical database:
  § Doctors/Healthcare Representatives – full access to information
  § Researchers – access only to partial information (e.g. aggregate information)
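
The doctor/researcher split described on this slide could be sketched roughly as follows. This is only an illustration: the `PATIENTS` records, the `ACCESS_LEVELS` mapping, and the `fetch` helper are hypothetical names that do not come from the lecture, and a real system would normally rely on the database's own authentication and authorization features.

```python
from statistics import mean

# Hypothetical patient records, for illustration only.
PATIENTS = [
    {"name": "A", "age": 34, "diagnosis": "flu"},
    {"name": "B", "age": 51, "diagnosis": "asthma"},
    {"name": "C", "age": 47, "diagnosis": "flu"},
]

# Assumed mapping from user role to access level.
ACCESS_LEVELS = {"doctor": "full", "researcher": "aggregate"}


def fetch(role):
    """Return only what the given role is allowed to see."""
    level = ACCESS_LEVELS.get(role)
    if level == "full":
        # Doctors see copies of the individual records.
        return [dict(p) for p in PATIENTS]
    if level == "aggregate":
        # Researchers see summary statistics only.
        return {
            "count": len(PATIENTS),
            "mean_age": mean(p["age"] for p in PATIENTS),
        }
    raise PermissionError(f"role {role!r} has no access")
```

Calling `fetch("researcher")` returns only a record count and a mean age, while an unknown role is rejected outright.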

Query Set Restriction
§ A query-set size control sets a minimum number of records that must be in the result set
§ The query results are displayed only if the size of the query set satisfies the condition
§ Setting a minimum query-set size can help protect against the disclosure of individual data

Query Set Restriction
§ Let K represent the minimum number of records that must be present in the query set
§ Let R represent the size of the query set
§ The query set can only be displayed if K ≤ R
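
A minimal sketch of this check, assuming the condition is that the query-set size R must be at least K. The function name `restricted_query` and its parameters are hypothetical and chosen only for illustration.

```python
def restricted_query(records, predicate, k):
    """Return the query set only if it holds at least k records."""
    query_set = [rec for rec in records if predicate(rec)]
    size = len(query_set)  # this is R in the notation of the slide
    if size < k:
        # Too few records match: withhold the result entirely.
        return None
    return query_set


ages = [10, 12, 13, 57, 54, 59]
print(restricted_query(ages, lambda a: a > 50, k=2))  # three matches, allowed
print(restricted_query(ages, lambda a: a > 58, k=2))  # one match, withheld
```

With K = 2, the first query matches three records and is answered, while the second matches a single record and is refused, since answering it would reveal one individual's value.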

Query Set Restriction
[Figure: Query 1 and Query 2 are each run against the Original Database; the size of each result set is compared with K before the Query Results are shown.]

Microaggregation
§ Raw (individual) data is grouped into small aggregates before publication
§ The average value of the group replaces each individual value in the group
§ Data with the most similarities are grouped together to maintain data accuracy
§ Helps to prevent disclosure of individual data

Microaggregation
§ The National Agricultural Statistics Service (NASS) publishes data about farms
§ To protect against data disclosure, data is only released at the county level
§ Farms in each county are averaged together to maintain as much purity as possible while still protecting against disclosure

Microaggregation
Age    Microaggregated Age
10     11.67
12     11.67
13     11.67
57     56.67
54     56.67
59     56.67
(Each group of three ages is replaced by the group average.)
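
The ages in this example can be reproduced with a short sketch. It assumes the simplest possible grouping rule: sort the values and cut them into fixed-size groups. That rule is an assumption made for illustration, not necessarily how the lecture forms its groups, and the function name `microaggregate` is hypothetical.

```python
def microaggregate(values, group_size):
    """Replace each value with the mean of its group of similar values."""
    # Sort indices by value so that similar values land in the same group.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    for start in range(0, len(order), group_size):
        # The last group may be smaller if the sizes do not divide evenly.
        group = order[start:start + group_size]
        average = sum(values[i] for i in group) / len(group)
        for i in group:
            result[i] = round(average, 2)
    return result


ages = [10, 12, 13, 57, 54, 59]
print(microaggregate(ages, 3))
# → [11.67, 11.67, 11.67, 56.67, 56.67, 56.67]
```

Because similar ages are grouped before averaging, each published value stays close to the original while no individual age is released.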

Microaggregation
[Figure: the Original Data is averaged to produce the Microaggregated Data, which is what the User sees.]

Data Perturbation
§ Perturbed data is raw data with noise added
§ Pro: With perturbed databases, if unauthorized data is accessed, the true value is not disclosed
§ Con: Data perturbation runs the risk of presenting biased data
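
As a rough sketch of data perturbation, the snippet below builds a perturbed copy of a small data set by adding random noise to every stored value. The choice of zero-mean Gaussian noise, the `scale` parameter, and the function name `perturb_database` are assumptions made for illustration; the lecture does not specify a noise distribution.

```python
import random


def perturb_database(values, scale, seed=None):
    """Return a perturbed copy of the data: each value plus random noise."""
    rng = random.Random(seed)
    # Zero-mean noise is an assumption, chosen here to limit systematic bias.
    return [v + rng.gauss(0, scale) for v in values]


original = [10, 12, 13, 57, 54, 59]
perturbed = perturb_database(original, scale=2.0, seed=42)
# Users would query `perturbed`; `original` is never exposed to them.
print(perturbed)
```

Every user sees the same perturbed copy, so a record that leaks does not reveal its true value, but every statistic computed from the copy inherits the noise.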

Data Perturbation
[Figure: noise is added to the Original Database to produce the Perturbed Database, which User 1 and User 2 query.]

Output Perturbation

§ Instead of the raw data being transformed as in Data Perturbation, only the output or query results are perturbed
§ The bias problem is less severe than with data perturbation
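
For contrast with data perturbation, here is a minimal sketch in which the stored data stays exact and noise is added only to the answer of a query. The mean query, the Gaussian noise, and the name `noisy_mean` are illustrative assumptions, not details taken from the lecture.

```python
import random


def noisy_mean(values, predicate, scale, rng=random):
    """Compute the true mean of the query set, then perturb only the answer."""
    query_set = [v for v in values if predicate(v)]
    if not query_set:
        return None
    true_mean = sum(query_set) / len(query_set)
    # The stored values are untouched; noise is applied to the result only.
    return true_mean + rng.gauss(0, scale)


ages = [10, 12, 13, 57, 54, 59]
print(noisy_mean(ages, lambda a: a > 50, scale=1.0))
```

Since the statistic is computed from the true data first, the noise enters once per answer rather than once per record.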

Output Perturbation
[Figure: User 1 and User 2 send queries to the Original Database; noise is added to the results before they are returned.]

Auditing

§ Auditing is the process of keeping track of all queries made by each user
§ Usually done with up-to-date logs
§ Each time a user issues a query, the log is checked to see if the user is querying the database maliciously
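
One way such a log check might look is sketched below. The lecture does not say what counts as malicious querying, so this sketch assumes a single simple rule: a query is suspicious if its query set differs from one of the same user's earlier query sets by fewer than `min_difference` records, since comparing the two answers could isolate individuals. The class `QueryAuditor` and that rule are hypothetical.

```python
from collections import defaultdict


class QueryAuditor:
    """Keep a per-user log of query sets and flag suspicious follow-ups."""

    def __init__(self, min_difference):
        self.min_difference = min_difference
        self.log = defaultdict(list)  # user -> list of frozensets of record ids

    def check(self, user, record_ids):
        """Log the query and return True if it may be answered."""
        new = frozenset(record_ids)
        # new ^ old holds the records that appear in exactly one of the sets.
        suspicious = any(
            0 < len(new ^ old) < self.min_difference
            for old in self.log[user]
        )
        self.log[user].append(new)  # every query is logged, answered or not
        return not suspicious


auditor = QueryAuditor(min_difference=2)
print(auditor.check("u1", {1, 2, 3, 4}))  # first query by u1
print(auditor.check("u1", {1, 2, 3}))     # differs from it by one record
```

The second query is refused because subtracting its answer from the first would expose record 4. Checking every new query against the whole log is also what makes auditing costly per query.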

Random Sampling
§ Only a sample of the records meeting the requirements of the query is shown
§ Must maintain consistency by giving exactly the same results to the same query
§ Weakness – logically equivalent queries can result in different query sets
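
The consistency requirement and its weakness can both be seen in a small sketch. It assumes that the sample is chosen by hashing the query text together with each record's position. That selection rule, the `rate` parameter, and the name `sample_query` are assumptions made for illustration.

```python
import hashlib


def sample_query(records, predicate, query_text, rate=0.5):
    """Return a repeatable pseudo-random sample of the matching records."""
    sample = []
    for record_id, record in enumerate(records):
        if not predicate(record):
            continue
        # Hash the query text with the record id, so the same query text
        # always selects the same records.
        key = f"{query_text}|{record_id}".encode()
        digest = hashlib.sha256(key).digest()
        if int.from_bytes(digest[:8], "big") < rate * 2**64:
            sample.append(record)
    return sample


ages = list(range(100))
first = sample_query(ages, lambda a: a > 50, "age > 50")
again = sample_query(ages, lambda a: a > 50, "age > 50")
print(first == again)  # the same query text yields the same sample
```

Repeating the same query cannot be used to average the sampling away. The weakness remains, though: a logically equivalent query written differently, such as "NOT age <= 50", hashes differently and will generally produce a different sample.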

Comparison Methods
The following criteria are used to determine the most effective methods of statistical database security:
§ Security – possibility of exact disclosure, partial disclosure, robustness
§ Richness of Information – amount of non-confidential information eliminated, bias, precision, consistency
§ Costs – initial implementation cost, processing overhead per query, user education

A Comparison of Methods
Method                  Security        Richness of Information   Costs
Query-set Restriction   Low             Low [1]                   Low
Microaggregation        Moderate        Moderate                  Moderate
Data Perturbation       High            High-Moderate             Low
Output Perturbation     Moderate        Moderate-Low              Low
Auditing                Moderate-Low    Moderate                  High
Sampling                Moderate        Moderate-Low              Moderate

[1] Quality is low because a lot of information can be eliminated if the query does not meet the requirements

