
Journal of Information & Computational Science 9: 3 (2012) 619–633
Available at

A Privacy-preserving Query on Outsourced Database with B-tree ⋆

Sha Ma a,∗, Bo Yang b, Kangshun Li a, Feng Xia a

a Department of Informatics, South China Agricultural University, Guangzhou 510642, China
b School of Computer Science, Shaanxi Normal University, Shaanxi 710062, China

Abstract
In an outsourced database, once the data is encrypted, query processing becomes more difficult than in a traditional plaintext database. Providing query service while preserving privacy is an essential concern in such a framework. This paper proposes a novel method for privacy-preserving queries on an outsourced database with a B-tree: the B-tree is searched using PIR, and the query results are then obtained using PIR again. We describe a scheme that enables a user to access an encrypted database accurately, retrieve information privately, and obtain only the query results without leaking other information. Our contributions include a set of security notions for such a system as well as a construction that is secure under the newly introduced security notions.
Keywords: Outsourced Database; Database Security; Private Information Retrieval; B-tree; Order-preserving Symmetric Encryption

1 Introduction

The proliferation of a new breed of data management applications that store and process data at remote locations has led to the emergence of data outsourcing, or database as a service, as an important research problem [1, 2, 3]. In a typical setting of the problem, data is stored at the remote location in an encrypted form. A query generated at the client side is transformed into a representation that can be evaluated directly on encrypted data at the remote location. The results might be processed by the client after decryption to determine the final answers. The nature of data processing starts to change when the level of trust in the service provider itself decreases from complete to partial to (perhaps) none at all. Such a varying trust scenario necessitates the use of various security-enhancing techniques in the context of outsourced databases [4, 5, 6, 7, 8, 9]. Our motivation is to preserve query privacy in the passive adversary (e.g., the database administrator or the user) model.


This work is supported by the National Natural Science Foundation of China under Grants 60973134, 61173164
and 70971043, and the Natural Science Foundation of Guangdong Province under Grant 10351806001000000

Corresponding author.
Email address: martin (Sha Ma).

1548–7741 / Copyright © 2012 Binary Information Press
March 2012



Why do we care about the privacy of a database query? Consider the following typical real-life scenario: an outsourced database server contains diagnosis information about various diseases. Alice thinks that she may have some disease, so she wants to investigate it further. After Alice sends the description of the disease to the outsourced database, it returns the corresponding diagnosis information. If Alice’s query is found in the database, the server immediately knows that Alice may have such a disease; even worse, after receiving Alice’s description of the disease, it can derive much else about Alice, such as other health problems that Alice might have. If the server is not trustworthy, it could disclose the information about Alice to other parties, and Alice might have difficulty getting employment, insurance, credit, etc. But even if Alice trusts the server, and it has no intention of disclosing Alice’s private information, the server itself might prefer that Alice’s query be kept private out of liability concerns: if the server knows Alice’s disease information, and that information is accidentally disclosed (perhaps through an external system intrusion), the server might face an expensive lawsuit from Alice. From this perspective, a trusted server will actually prefer not to know either Alice’s query or its response.
Private Information Retrieval (PIR) techniques, which have been widely studied [10, 11, 12, 13], are the most closely related to this problem. The PIR problem consists of devising a protocol involving a user and a database server, each having a secret input. The database’s secret input is called the data string, an n-bit string B = b1, b2, · · · , bn. The user’s secret input is an integer i between 1 and n. The protocol should enable the user to learn bi in a communication-efficient way and at the same time hide i from the database. However, PIR cannot be directly applied to an outsourced database, for three main reasons. First, the user does not know the physical address, e.g., i = 2, in the outsourced database. The user usually sends an SQL statement including predicates, e.g., attribute op constant (where op is one of =, <, >, ≤, ≥, etc.), to the database server, and the database server retrieves the correct results according to the predicates. Second, most PIR research focuses on user privacy without considering data privacy. In other words, the user may obtain other physical bits of the data (i.e., bj for j ≠ i) or other information, such as the exclusive-or of certain subsets of the bits of B, beyond the single physical bit bi. Although the Oblivious Transfer (OT) protocol in cryptography can meet this requirement, applying OT to an outsourced database is not a good solution because of its significant communication complexity. Third, PIR does not guarantee data confidentiality, since the data are still stored in plaintext. In an outsourced database, however, the data stored at the service provider should be encrypted, because the data are out of the data owner’s control and may be stolen by a malicious adversary. A new protocol is therefore required to realize a privacy-preserving query on an outsourced database.
A trivial solution for a privacy-preserving query on an outsourced database is to send all encrypted data to the client, which can then decrypt it and execute the query on the plaintext data. Obviously, this weakens the advantage of an outsourced database because of the drastic increase in the user’s computational cost. In addition, data privacy is no longer guaranteed, because the client obtains all information after decryption. While much research focuses on how to query encrypted data efficiently [14, 2, 15], privacy preservation in this scenario has also become an interesting research direction [16, 17, 18, 19]. This paper addresses a specific way of querying an outsourced database that combines PIR with a B-tree index. Since a search on a B-tree index starts from specified nodes, e.g., the root node of the tree, whose physical addresses are stable as long as the tree’s structure is not changed, we can use PIR to realize a privacy-preserving query. However, a problem we face is that when the user receives the specified nodes of the index tree, the decryption of the nodes may still disclose



other information that should be hidden from the user: the decrypted data may lie outside the query results, which violates data privacy. Our solution is to apply order-preserving encryption to the search keys of the nodes in the B-tree index.
The rest of this paper is organized as follows. Section 2 describes the preliminaries used in
our construction. In Section 3, we present a general framework of privacy-preserving query on
outsourced database with B-tree and security definitions. Section 4 describes our construction
and proves that it is secure under the introduced security notions. Finally, Section 5 concludes.

2 Preliminaries

2.1 B-tree

To speed up data access, the B-tree index structure is very popular in modern database applications (in this paper, we refer to all variants of the B-tree, e.g., the B+tree, as B-tree). In [15], the author chooses to encrypt each tree node as a whole, because protecting a tree-based index by encrypting each of its fields would disclose the ordering relationship between the index values. The original tree is then stored as a table with two attributes: the node ID, automatically assigned by the system on insertion, and an encrypted value representing the node content. The advantage of this solution is that the content of the B-tree nodes is not visible to the untrusted DBMS. The drawback, however, is that user privacy and data privacy are not protected during query processing. Intuitively, to execute an interval query, the front end has to perform a sequence of queries that retrieve tree nodes at progressively deeper levels. The user’s access pattern may be disclosed, since the information collected during the retrieval of tree nodes reveals the structure of the whole tree. Fig. 1 shows an example of a B-tree on attribute Customer with sample values.
Assume the front end produces a sequence of queries that access nodes 0, 1, and 4 in turn; the server then knows that the user accessed nodes 0, 1, and 4, and that node 0 is the root, node 1 is an internal node, and node 4 is a leaf node of the tree. Using such information collected gradually, together with statistical methods, the server can rebuild the whole tree and infer sensitive information from the encrypted database. To solve this problem, we use PIR to access the nodes at each level of the B-tree, which provides user privacy. In addition, during the query, the user learns more than the query result: by decrypting nodes 0, 1, and 4, the user learns that there are at least two other customers, named Jane and Donna, in the database, so data privacy is not satisfied. To solve this problem, we encrypt each node of the B-tree twice, first with an OPE algorithm and then with a general encryption algorithm. Our solution stems from a simple idea: preserve the typical structure of the B-tree by encrypting each of its fields with OPE, while breaking the correlation between the data and its corresponding index items by using different identifying information and different encryption.

2.2 PIR


A Private Information Retrieval (PIR) scheme allows a user to retrieve information from a database while keeping the query private from the database managers. In this model, the database is viewed as an n-bit string x out of which the user retrieves the i-th bit xi, while giving the database no information about the index i. The main cost measure for such a scheme is its communication complexity. The notion of PIR was introduced in Ref. [10], where it was shown that if only one copy of the database is available, then n bits of communication are needed (for information-theoretic user privacy). However, if there are k ≥ 2 non-communicating copies of the database, then there are solutions with much better communication complexity. Gertner et al. first introduced the model of Symmetrically-private Information Retrieval (SPIR) [20], in which the privacy of the data, as well as the privacy of the user, is guaranteed. That is, in every invocation of a SPIR protocol, the user learns only a single physical bit of x and no other information about the data. SPIR is realized with k databases (k ≥ 2) as the first implementation of a distributed version of 1-out-of-n oblivious transfer. A single-database PIR scheme, which is a non-trivial PIR protocol, was proposed by Giovanni Di Crescenzo et al. at EUROCRYPT 2000 [13]. At the end of the execution of the protocol, the following two properties must hold: (1) after applying the reconstruction function, the user obtains the i-th data bit xi; and (2) the distributions of the queries sent to the database are computationally indistinguishable for any two indices i, i′.

Fig. 1: (a) B-tree and (b) Plaintext table and encrypted table for B-tree
Definition 1 (PIR) Let $(D, U)$ be an interactive protocol, and let $R$ be a polynomial-time algorithm. $\mathrm{Prob}[R_1; \cdots; R_n : E]$ denotes the probability of event $E$ after the execution of random processes $R_1; \cdots; R_n$. The notation $t_{A,B}(x, r_A, y, r_B)$ denotes the transcript of an execution of an interactive protocol $(A, B)$ with input $x$ for $A$ and $y$ for $B$, and with random string $r_A$ for $A$ and $r_B$ for $B$; $(r_A, r_B, t) \leftarrow t_{A,B}(x, \cdot, y, \cdot)$ denotes the case where the random strings for both $A$ and $B$ are chosen uniformly at random. We say that $(D, U, R)$ is a private information retrieval (PIR) scheme if:
1. (Correctness) For each $n \in \mathbb{N}$, each $i \in \{1, \ldots, n\}$, each $x \in \{0,1\}^n$, where $x = x_1 \circ \cdots \circ x_n$ and $x_l \in \{0,1\}$ for $l = 1, \ldots, n$, for all constants $c$, and all sufficiently large $k$,
$$\mathrm{Prob}[(r_D, r_U, t) \leftarrow t_{D,U}((1^k, x), \cdot, (1^k, n, i), \cdot) : R(1^k, n, i, r_U, t) = x_i] \geq 1 - k^{-c}$$
2. (User Privacy) For each $n \in \mathbb{N}$, each $i, j \in \{1, \ldots, n\}$, each $x \in \{0,1\}^n$, where $x = x_1 \circ \cdots \circ x_n$ and $x_l \in \{0,1\}$ for $l = 1, \ldots, n$, for each polynomial-time $D'$, for all constants $c$, and all sufficiently large $k$, it holds that $|p_i - p_j| \leq k^{-c}$, where
$$p_i = \mathrm{Prob}[(r_{D'}, r_U, t) \leftarrow t_{D',U}((1^k, x), \cdot, (1^k, n, i), \cdot) : D'(1^k, x, r_{D'}, t) = 1]$$
$$p_j = \mathrm{Prob}[(r_{D'}, r_U, t) \leftarrow t_{D',U}((1^k, x), \cdot, (1^k, n, j), \cdot) : D'(1^k, x, r_{D'}, t) = 1]$$
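To make the multi-database setting concrete, the following minimal Python sketch shows the classic two-server XOR construction. It is not the single-database scheme of [13], and it is only an illustration under stated assumptions: two non-colluding servers hold identical copies of x, the function names (make_queries, server_answer, retrieve) are hypothetical, and positions are 0-indexed rather than 1-indexed. Each server sees a uniformly random selection vector, so neither learns i on its own. Each query is n bits long, so the sketch illustrates user privacy rather than communication efficiency.

```python
import secrets


def make_queries(n: int, i: int) -> tuple[list[int], list[int]]:
    """Build one selection vector per server; they differ only at position i."""
    s1 = [secrets.randbelow(2) for _ in range(n)]  # uniformly random bits
    s2 = list(s1)
    s2[i] ^= 1  # flip the wanted position in the second vector
    return s1, s2


def server_answer(x: list[int], s: list[int]) -> int:
    """Run by each server: XOR of the database bits selected by s."""
    acc = 0
    for bit, sel in zip(x, s):
        acc ^= bit & sel
    return acc


def retrieve(x: list[int], i: int) -> int:
    """User side: query both (simulated) servers and combine the answers."""
    s1, s2 = make_queries(len(x), i)
    return server_answer(x, s1) ^ server_answer(x, s2)


# Sample database bits (placeholder values chosen for illustration).
x = [1, 0, 1, 1, 0, 0, 1, 0]
recovered = [retrieve(x, i) for i in range(len(x))]
```

The two answers differ exactly in the contribution of position i, so their XOR equals the wanted bit; this mirrors the correctness condition, while the uniform distribution of each vector mirrors the user-privacy condition.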

2.3 OPE

Order-preserving Symmetric Encryption (OPE) is a deterministic encryption scheme whose encryption function preserves the numerical ordering of the plaintexts. More precisely, for A, B ⊆ N with |A| ≤ |B|, a function f : A → B is order-preserving if for all i, j ∈ A, f(i) > f(j) iff i > j. OPE has a long history in the form of one-part codes, which are lists of plaintexts and the corresponding ciphertexts, both arranged in alphabetical or numerical order so that only a single copy is required for efficient encryption and decryption. Agrawal et al. first suggested OPE as a primitive for allowing efficient range queries on encrypted data in the database community [21]. However, their construction is rather ad hoc and has certain limitations; in particular, its encryption algorithm must take as input all the plaintexts in the database. It is not always practical to assume that users know all these plaintexts in advance, so a stateless scheme whose encryption algorithm can process single plaintexts on the fly is preferable. Moreover, the work in [21] neither defines security nor provides any formal security analysis. Alexandra Boldyreva et al. proposed an efficient OPE scheme and proved its security based on the pseudorandomness of an underlying block cipher [22]. Their construction is based on a natural relation between a random order-preserving function and the hypergeometric probability distribution. In this paper, OPE is applied to each field of the B-tree so that query processing can be done exactly as efficiently as for unencrypted data. The user can locate the desired ciphertext in the nodes without obtaining additional information, which satisfies data privacy.
Definition 2 (OPE) Let $SE = (K, Enc, Dec)$ be an order-preserving encryption scheme with plaintext space $[M]$ and ciphertext space $[N]$ for $M, N \in \mathbb{N}$ such that $2^{k-1} \leq N < 2^k$ for some $k \in \mathbb{N}$. Then there exists an IND-OCPA (indistinguishability under ordered chosen-plaintext attack) adversary $A$ against $SE$ such that
$$\mathrm{Adv}^{\mathrm{ind\text{-}ocpa}}_{SE}(A) \geq 1 - \frac{2k}{M-1}$$
So $k$ should be almost as large as $M$ for $A$’s advantage to be small.
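The order-preserving property and the bound above can be explored with a small Python sketch. This is a toy, table-based order-preserving map in the spirit of a one-part code, not the stateless construction of [22]. The names toy_ope_table, toy_ope_enc and ocpa_lower_bound are hypothetical, the seeded generator is not cryptographically secure, and the parameter values are arbitrary examples.

```python
import random


def toy_ope_table(M: int, N: int, seed: int) -> list[int]:
    """Sample a random strictly increasing map from {1..M} into {1..N}."""
    rng = random.Random(seed)  # illustration only: not a cryptographic generator
    return sorted(rng.sample(range(1, N + 1), M))


def toy_ope_enc(table: list[int], m: int) -> int:
    """Encrypt plaintext m in {1..M} by table lookup."""
    return table[m - 1]


def ocpa_lower_bound(M: int, N: int) -> float:
    """Evaluate 1 - 2k/(M - 1), where k satisfies 2**(k-1) <= N < 2**k."""
    k = N.bit_length()
    return 1 - 2 * k / (M - 1)


table = toy_ope_table(M=100, N=10_000, seed=7)
bound = ocpa_lower_bound(M=1000, N=2**20)  # k = 21 for this N
```

For M = 1000 and N = 2^20, the bound evaluates to 1 − 42/999, so the adversary’s advantage is close to 1 unless the ciphertext space is made far larger.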

3 Model and Definitions

3.1 Model

Fig. 2 illustrates the four primary entities of the DAS (database-as-a-service) model: the Data Owner (DO), the user (U), the trusted front end (F), and the Database Service Provider (DSP). We assume that DO stores the encrypted Database (DB) at the DSP and that the outsourced data allows a certain amount of query processing for U to occur at the DSP without jeopardizing privacy. Below, we propose a protocol that relies on F to meet the security requirements of this DAS model. Our assumption for this protocol is that F does not collude with DO, U, or DSP under any circumstances. Furthermore, F is usually the deputy of the DO and is responsible for query transformation. It can send queries to the server on behalf of DO when allowed, since the user has registered to use the data owner’s service. We briefly describe the properties of our protocol below.
Fig. 2: Our model (diagram showing the Data owner, the DSP hosting DB, F, and the User, with the links between them labeled Trusted or Untrusted)
In our model, DO outsources information to the DSP and charges U for using its data. The outsourced information is valuable, so all of it should be encrypted to prevent analysis by the DSP and other intruders; we call this data confidentiality. Meanwhile, the user is not allowed to obtain any information beyond what she is querying on DB; we call this data privacy. In addition, whenever the user accesses DB, she does not want DO and DSP to know exactly what she is concerned about, neither the query nor its result; we call this user privacy.

3.2 Adversarial Model

There are three types of adversaries in our model.
1. A naive player (U or DSP): who gets a copy of the encrypted data stored in the outsourced
database and wants to infer some information.
2. A curious service provider: who wants to infer some information from a query or the response
to a query.
3. A curious user: who wants to infer some information from the response to a query.

3.3 Storage Model

To illustrate the storage model, we give a simple example in this section. DSP uses a table for storing and maintaining data entries. The table stores encrypted data entries, each associated with a unique id. For example, consider a regular table that has 3 attributes, such as name, age, and salary. The encrypted table contains 2 columns: tid and etuple, where tid is the unique number of a tuple, usually numbered sequentially starting from 1, and etuple is the encrypted value of the plaintext tuple. In addition, the encrypted table for storing the B-tree consists of 2n + 2 attributes: nid, a unique number of the node in the B-tree, which, unlike tid, is generated as a random number; n search keys; and n + 1 pointers. Here n is a parameter associated with each B-tree index that determines the layout of all blocks of the B-tree. See Fig. 3. In more detail, each plaintext data entry is encrypted as a whole tuple by a general encryption algorithm, because encrypting by row is preferable to encrypting by field for queries from the TPC-H benchmark. The encrypted table for the B-tree supports search functions by which the user can obtain the exact results without leakage of other information. Specifically, all pointers are encrypted once, under a key different from the one used to encrypt the plaintext entries, and all search keys are encrypted twice: first using OPE, and then using the same encryption algorithm as for the pointers.
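The two table layouts described in this section can be sketched as Python data classes. This is only an illustration of the layout: the class and function names (EncryptedTuple, EncryptedNode, placeholder_node) are hypothetical, and random bytes stand in for real ciphertexts, since the concrete encryption algorithms are not fixed here.

```python
from dataclasses import dataclass
import secrets


@dataclass
class EncryptedTuple:
    tid: int       # sequential tuple number, starting from 1
    etuple: bytes  # the whole plaintext row, encrypted as one value


@dataclass
class EncryptedNode:
    nid: int               # random node number, unlike the sequential tid
    keys: list[bytes]      # n search keys (OPE first, then general encryption)
    pointers: list[bytes]  # n + 1 pointers (general encryption only)

    def attribute_count(self) -> int:
        """Number of columns in the B-tree table row: 1 + n + (n + 1)."""
        return 1 + len(self.keys) + len(self.pointers)


def placeholder_node(n: int) -> EncryptedNode:
    """Fill a node with random bytes standing in for real ciphertexts."""
    return EncryptedNode(
        nid=secrets.randbits(32),
        keys=[secrets.token_bytes(16) for _ in range(n)],
        pointers=[secrets.token_bytes(16) for _ in range(n + 1)],
    )


node = placeholder_node(n=3)
row = EncryptedTuple(tid=1, etuple=secrets.token_bytes(32))
```

With n = 3, the node row has 2n + 2 = 8 attributes, matching the count given in the text.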

Fig. 3: (a) Plaintext table and B-tree and (b) Encrypted table for data entries and B-tree

3.4 Operations

We now provide an informal description of our solution. Consider a database system D, a data owner DO, the database service provider DSP, a user U, and the trusted party F. Suppose the database D consists of m records $\{d_1, \cdots, d_m\}$, each of which contains n attributes $\{a_1, \cdots, a_n\}$. For a record $d_i$, we use $id(d_i)$ to denote the identifying information that is uniquely associated with $d_i$, such as the value of the primary key. The DSP hosts not only the encrypted version of D, denoted by $D' = \{d'_1, \cdots, d'_m\}$, where $d'_i = \langle id(d_i), E(d_i) \rangle$ ($E(d_i)$ is an encryption of $d_i$), but also an encrypted version of the B-tree, denoted by $BTree'$, which is constructed on each attribute $a_j$ ($j \in \{1, \cdots, n\}$). $Pre$ is the predicate expression of the query, whose value is TRUE or FALSE, representing whether or not the predicates are satisfied.
Definition 3 A privacy-preserving query on outsourced database with B-tree consists of the following probabilistic polynomial-time algorithms and protocols:
1. $KeyGen(1^s)$ outputs public and private keys: $(A_{public}, A_{private})$ for the encryption and decryption of data entries, $(B_{public}, B_{private})$ for the encryption and decryption of the B-tree, and $C_{private}$ for the OPE algorithm.
2. $Store_{DO,DSP,F}(D, BTree, A_{public}, B_{public}, C_{private})$ is a protocol that allows DO to send to DSP $D'$, the encryption of $D$ under $A_{public}$, together with an associated $BTree'$ for each attribute, the encryption of the B-tree under $B_{public}$ and $C_{private}$. $C_{private}$ is held only by F.
3. $Query_{U,DSP,F}(Pre, A_{private}, B_{private})$ is a protocol that retrieves all records satisfying $Pre$ for U. $Pre$, $A_{private}$, and $B_{private}$ are held only by U.

3.5 Security Properties


Firstly, we define the security of database encryption.
Definition 4 (Security of database encryption [4]) An encryption scheme $(Gen, Enc, Dec)$ for database tables, which consists of a key generation scheme $Gen$, an encryption function $Enc$, and a decryption function $Dec$, has indistinguishable encryptions if for every polynomial-size circuit family $\{C_n\}$, every polynomial $p$, all sufficiently large $n$, and all databases $R_1, R_2 \in \{0,1\}^{poly(n)}$ with the same schema and the same number of tuples (i.e., $|R_1| = |R_2|$):
$$\left| \Pr\{C_n(Enc_{Gen(1^n)}(R_1)) = 1\} - \Pr\{C_n(Enc_{Gen(1^n)}(R_2)) = 1\} \right| < \frac{1}{p(n)}$$
The probability in the above terms is taken over the internal coin tosses of $Gen$ and $Enc$.
Next, we describe correctness and privacy for such a system.
Definition 5 (Query Correctness) Let $A_{public}, A_{private}, B_{public}, B_{private} \leftarrow KeyGen(1^s)$. Fix a finite sequence of messages and indexes: $\{\{d'_i\}_{i=1}^{m}, \{BTree'_j\}_{j=1}^{n}\}$. Suppose that, for all $i \in [m]$ and $j \in [n]$, the protocol $Store_{DO,DSP,F}(D, BTree, A_{public}, B_{public}, C_{private})$ is executed by DO, DSP and F. Denote by $R_{Pre}$ the results that U receives after the execution of $Query_{U,DSP,F}(Pre, A_{private}, B_{private})$. Then, a privacy-preserving query on outsourced database with B-tree is said to be correct on the sequence $\{\{d'_i\}_{i=1}^{m}, \{BTree'_j\}_{j=1}^{n}\}$ if
$$\Pr[R_{Pre(a_w)} = \{d_i \mid Pre(d_i.a_w) = \mathrm{TRUE}\}] > 1 - neg(1^s)$$
for each predicate, where the probability is taken over all internal randomness used in the protocols Store and Query. A privacy-preserving query on outsourced database with B-tree is said to be correct if it is correct on all such finite sequences.
DO’s privacy is twofold: first, all stored data should be indistinguishable to the DSP; second, the user should not learn any information beyond the results of the user’s query.
Definition 6 For DO’s privacy to DSP, consider the following game between an adversary $A$ and a challenger $C$. $A$ plays the role of DSP and $C$ plays the role of DO. The game consists of the following steps:
1. $KeyGen(1^s)$ is executed by $C$, who sends the output $A_{public}$ and $B_{public}$ to $A$.
2. $A$ asks queries of the form $(D, BTree)$, where $D$ is the plaintext database and $BTree$ is the plaintext index on $D$; $C$ answers by executing the protocol $Store(D, BTree, A_{public}, B_{public}, C_{private})$.
3. $A$ chooses two pairs $(D_0, BTree_0)$, $(D_1, BTree_1)$ to be sent to $C$, where $D_0$ and $D_1$ are of equal size, and $BTree_0$ and $BTree_1$ are of equal size.
4. $C$ picks a random bit $b \in_R \{0, 1\}$ and executes $Store(D_b, BTree_b, A_{public}, B_{public}, C_{private})$ with $A$.
5. $A$ asks more queries of the form $(D, BTree)$ and $C$ responds by executing the protocol $Store(D, BTree, A_{public}, B_{public}, C_{private})$ with $A$.
6. $A$ outputs a bit $b' \in \{0, 1\}$.
We define the adversary’s advantage as $Adv_A(1^s) = |\Pr[b = b'] - \frac{1}{2}|$. We say that a privacy-preserving query on outsourced database satisfies DO’s privacy to DSP if, for all $A \in$ PPT, $Adv_A(1^s)$ is a negligible function.
Definition 7 For DO’s privacy to U, consider the following game between an adversary $A$ and a challenger $C$. $A$ plays the role of U and $C$ plays the role of DO. The game consists of the following steps:
1. $KeyGen(1^s)$ is executed by $C$, who sends the output $A_{public}$, $B_{public}$ to $A$.
2. $A$ asks queries of the form $(D, BTree)$, where $D$ is the plaintext database and $BTree$ is the plaintext index on $D$; $C$ answers by executing the protocol $Store(D, BTree, A_{public}, B_{public}, C_{private})$.
3. $A$ chooses two pairs $(D_0, BTree_0)$, $(D_1, BTree_1)$ and sends them to $C$, where the databases and the B-trees are of equal size, respectively.
4. $C$ picks a random bit $b \in_R \{0, 1\}$ and executes $Store(D_b, BTree_b, A_{public}, B_{public}, C_{private})$ with $A$.
5. $A$ asks queries of the form $Pre$, where the predicate is on a certain attribute; $C$ answers by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
6. $A$ asks more queries $Pre$ and $C$ responds by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
7. $A$ outputs a bit $b' \in \{0, 1\}$.
We define the adversary’s advantage as $Adv_A(1^s) = |\Pr[b = b'] - \frac{1}{2}|$. We say that a privacy-preserving query on outsourced database satisfies DO’s privacy to U if, for all $A \in$ PPT, $Adv_A(1^s)$ is a negligible function.
Definition 8 For query privacy, consider the following game between an adversary $A$ and a challenger $C$. $A$ plays the role of DSP, and $C$ plays the role of U. The game proceeds as follows:
1. $KeyGen(1^s)$ is executed by $C$, who sends the output $A_{public}$, $B_{public}$ to $A$.
2. $A$ asks queries of the form $Pre$, where the predicate is on a certain attribute; $C$ answers by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
3. $A$ chooses two predicates $Pre(a_0)$, $Pre(a_1)$ and sends them to $C$, where $a_0$ and $a_1$ are both attributes of $D$.
4. $C$ picks a random bit $b \in_R \{0, 1\}$ and executes the protocol $Query(Pre(a_b), A_{private}, B_{private})$ with $A$.
5. $A$ asks more queries $Pre$ and $C$ responds by executing the protocol $Query(Pre, A_{private}, B_{private})$ with $A$.
6. $A$ outputs a bit $b' \in \{0, 1\}$.
We define the adversary’s advantage as $Adv_A(1^s) = |\Pr[b = b'] - \frac{1}{2}|$. We say that a privacy-preserving query on outsourced database satisfies query privacy if, for all $A \in$ PPT, $Adv_A(1^s)$ is a negligible function.
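The games in Definitions 6 to 8 share one shape: the challenger hides a bit b, the adversary sees a transcript and outputs a guess b′, and the advantage is the distance of the success probability from 1/2. The following Python sketch estimates that quantity empirically for two toy transcripts. All names (estimate_advantage, leaky_transcript, hiding_transcript, guess_from_transcript) are hypothetical, the transcripts are stand-ins rather than the Store or Query protocols, and an empirical estimate can only suggest, never prove, that an advantage is negligible.

```python
import random


def estimate_advantage(transcript_fn, adversary, trials: int, seed: int) -> float:
    """Empirical |Pr[b = b'] - 1/2| over independent runs of the game."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        b = rng.randrange(2)       # challenger's hidden bit
        t = transcript_fn(b, rng)  # what the adversary observes
        wins += int(adversary(t) == b)
    return abs(wins / trials - 0.5)


def leaky_transcript(b: int, rng: random.Random) -> int:
    """Toy transcript that exposes b, e.g. a query sent in the clear."""
    return b


def hiding_transcript(b: int, rng: random.Random) -> int:
    """Toy transcript that is independent of b."""
    return rng.randrange(2)


def guess_from_transcript(t: int) -> int:
    """Adversary that outputs whatever it observed."""
    return t


adv_leaky = estimate_advantage(leaky_transcript, guess_from_transcript, 20_000, seed=1)
adv_hiding = estimate_advantage(hiding_transcript, guess_from_transcript, 20_000, seed=1)
```

The leaky transcript yields the maximum advantage of 1/2, while the hiding transcript yields an estimate near 0, which is the behaviour the definitions require of a secure scheme.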

4 Our Construction

We present a construction of a privacy-preserving query on outsourced database in a “semi-honest” model. In our context, the term “semi-honest” refers to a party that correctly executes the protocol but may collect information during the protocol’s execution. Correctness and privacy will be proved under a computational assumption. We assume the outsourced data are encrypted by a semantically secure public-key encryption scheme satisfying Definition 4. The key generation and encryption algorithms are denoted by K and E, respectively. We define the required algorithms below. First, let us restate our assumptions about the parties involved: DO, U, DSP, and F. In general, there could be many data owners but, for the purpose of describing the protocol, we need only name one. DO is assumed to hold the data, the B-tree, and the public key. U holds the private key and submits queries to the database. DSP stores the encrypted data and B-tree and provides the search service. F is the deputy of DO and assists in the execution of the user’s query.
1. $KeyGen(s)$: Run $K(1^s)$, the key generation algorithm of the underlying cryptosystem, to create public and private keys: $(A_{public}, A_{private})$ for the encryption and decryption of data entries, $(B_{public}, B_{private})$ for the encryption and decryption of the B-tree, and $C_{private}$ for the search keys of nodes in the B-tree. Private and public parameters for a PIR scheme are also generated by this algorithm.
2. $Store_{DO,DSP,F}(D, BTree, A_{public}, B_{public}, C_{private})$: DO sends the encrypted database and indexes to the DSP. The protocol consists of the following steps:
(a) DO sends the encrypted version of the database $D' = \{(id_i, E_{A_{public}}(d_i))\}_{i=1}^{m}$ and its $BTrees' = \{BTrees'_j\}_{j=1}^{n}$ to DSP. Specifically, all pointers are encrypted once using $B_{public}$, and all search keys are encrypted twice: first using $C_{private}$ for OPE, then using $B_{public}$ for a general encryption algorithm.
(b) DSP receives and stores $D'$ and $BTrees'$.
(c) DSP sends the address $\rho$ of the root node of each encrypted B-tree back to DO and F.
3. $Query_{U,DSP,F}(Pre, A_{private}, B_{private})$: U wishes to retrieve all records satisfying $Pre$ from DSP. Suppose $Pre$ is defined on the attribute $a_i$. The protocol proceeds as follows:
(a) U sends $Pre$ to F, which generates an encrypted $Pre'$ (e.g., $E_{OPE}(a_i)$ op $E_{OPE}(v)$ for $a_i$ op $v$, where $E_{OPE}(x)$ denotes the encryption of $x$ by the OPE algorithm).
(b) U receives the address $\rho$ of $BTree'_i$ from F.
(c) U executes an efficient PIR protocol to get the encrypted root node of $BTree'_i$: $NODE_{level_0}$.
(d) U decrypts the answers to the PIR queries using the key $B_{private}$ to obtain $E_{OPE}(NODE_{level_0}.value)$.
(e) U compares $E_{OPE}(NODE_{level_0}.value)$ with $E_{OPE}(v)$. If the result is “>”, U decrypts only the encrypted right pointer to get the address of the next-level node. If the result is “≤”, U decrypts only the encrypted left pointer to get the address of the next-level node.
(f) U repeats steps (c) to (e) on the next-level node.
(g) U repeats step (f) until a leaf node is reached.
(h) U decrypts the pointers corresponding to the ids of the final results using the private key $B_{private}$.
(i) If there are k records in the final results, U executes k efficient PIR protocols to get the records from the encrypted table.
(j) U decrypts the encrypted results using the private key $A_{private}$.
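The traversal in steps (c) to (i) can be sketched in Python for an equality predicate. This is a simplified illustration under several assumptions, not the full protocol: every node holds a single search key with a left and a right pointer, the outer encryption layer under the B keys is omitted, pir_fetch is a plain lookup standing in for a real PIR execution, ope is a toy strictly increasing function, and the comparison in step (e) is read as following the right pointer when the OPE ciphertext of v is greater than that of the node value. All names and sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    ope_key: int                     # E_OPE(search key); outer encryption omitted
    left: Optional[int] = None       # address of the left child (None at a leaf)
    right: Optional[int] = None      # address of the right child (None at a leaf)
    record_id: Optional[int] = None  # id of the matching record (leaves only)


def ope(v: int) -> int:
    """Toy order-preserving map (assumption): any strictly increasing function."""
    return 3 * v + 7


def pir_fetch(store: dict, address: int):
    """Stand-in for one PIR execution; a real protocol hides `address` from DSP."""
    return store[address]


def query_equal(index: dict, root: int, ope_v: int) -> Optional[int]:
    """Steps (c) to (g): walk from root to leaf comparing OPE ciphertexts only."""
    node = pir_fetch(index, root)
    while node.left is not None:  # still at an internal node
        nxt = node.right if ope_v > node.ope_key else node.left
        node = pir_fetch(index, nxt)
    return node.record_id if node.ope_key == ope_v else None


# Placeholder index over the plaintext keys 10, 20, 30, 40 (address 0 is the root).
index = {
    0: Node(ope(20), left=1, right=2),
    1: Node(ope(10), left=3, right=4),
    2: Node(ope(30), left=5, right=6),
    3: Node(ope(10), record_id=101),
    4: Node(ope(20), record_id=102),
    5: Node(ope(30), record_id=103),
    6: Node(ope(40), record_id=104),
}
records = {101: "record-101", 102: "record-102", 103: "record-103", 104: "record-104"}

rid = query_equal(index, root=0, ope_v=ope(30))
result = pir_fetch(records, rid) if rid is not None else None  # step (i)
```

The traversal never sees a plaintext key: every branching decision uses only the order of OPE ciphertexts, and every node or record access goes through the PIR stand-in, which is where a real deployment would hide the access pattern from DSP.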
Theorem 1 The privacy-preserving query on outsourced database with B-tree is correct according to Definition 5.
Proof The correctness of the protocol is straightforward. Suppose that a record $d'_i$ is generated by DO, where $Pre(d_i) = \mathrm{TRUE}$. The correctness of the protocol comprises the correctness of the traversal of the B-tree and the correctness of retrieving data based on the results of that traversal. OPE allows index processing on B-trees to be done exactly as for unencrypted data, and PIR guarantees that U correctly retrieves the required items from the database.

Theorem 2 Assuming security of the underlying cryptosystem, the privacy-preserving query on outsourced database satisfies DO’s privacy to DSP, according to Definition 6.
Proof Suppose that there exists an adversary $A \in$ PPT that can succeed in breaking the security game from Definition 6 with some non-negligible advantage. Then $A$ can distinguish the distribution of $Store(D_0, BTree_0, A_{public}, B_{public}, C_{private})$ from the distribution of $Store(D_1, BTree_1, A_{public}, B_{public}, C_{private})$, where the word “distribution” refers to the distribution of the transcript of the interaction between the parties. A transcript of $Store(D, BTree, A_{public}, B_{public}, C_{private})$ essentially consists of just $E_{A_{public}}(D)$ and $E_{B_{public}}(BTree)$.



We assumed that there exists an adversary $A$ that can distinguish these two distributions.
Hence the encrypted tables, the encrypted B-trees, or the combination of the two must be
distinguishable. So there exists an adversary $A' \in$ PPT that can distinguish the encrypted
tables $E_{A_{public}}(D)$, the encrypted B-trees $E_{B_{public}}(BTree)$, or the correlation
between $E_{A_{public}}(D)$ and $E_{B_{public}}(BTree)$.
1. If $A'$ distinguishes the encrypted tables, it has distinguished $E_{A_{public}}(D_0)$ from
$E_{A_{public}}(D_1)$, which violates our assumption of Definition 4.
2. If $A'$ distinguishes the encrypted B-trees, it has distinguished $E_{B_{public}}(BTree_0)$ from
$E_{B_{public}}(BTree_1)$, which violates our assumptions of Definition 2 and Definition 4.
3. If $A'$ distinguishes the correlation between $E_{A_{public}}(D)$ and $E_{B_{public}}(BTree)$,
we again reach a contradiction: there is no correlation between $E_{A_{public}}(D)$ and
$E_{B_{public}}(BTree)$, because the data itself and the corresponding index entry for the same
element are independent.
So we conclude that no such $A$ exists in the first place, and hence the system is secure according
to Definition 6.
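The shape of the argument can be made concrete with a small experiment. The sketch assumes that the game of Definition 6 has the usual left-or-right form, in which the challenger picks a bit $b$ and the adversary sees the Store transcript for world $b$. A one-time pad drawn from a seeded, non-cryptographic generator stands in for both encryptions, and `otp_encrypt`, `store_transcript`, `play_game` and `low_bit_adversary` are hypothetical names.

```python
import random


def otp_encrypt(rng, data):
    """Stand-in encryption: XOR with a fresh pad (seeded PRNG, not secure)."""
    pad = bytes(rng.randrange(256) for _ in data)
    return bytes(a ^ b for a, b in zip(data, pad))


def store_transcript(rng, table, btree):
    """What the DSP sees after Store: the encrypted table and encrypted index."""
    return otp_encrypt(rng, table), otp_encrypt(rng, btree)


def play_game(adversary, world0, world1, trials, seed):
    """Estimate the adversary's advantage |Pr[guess == b] - 1/2|."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        b = rng.randrange(2)  # challenger's secret bit
        table, btree = (world0, world1)[b]
        guess = adversary(store_transcript(rng, table, btree))
        wins += (guess == b)
    return abs(wins / trials - 0.5)


def low_bit_adversary(transcript):
    """A sample strategy: guess from the low bit of the first ciphertext byte."""
    enc_table, _ = transcript
    return enc_table[0] & 1


# Two candidate worlds of equal size, as (table, B-tree) pairs.
world0 = (b"alice:100", b"idx:0001")
world1 = (b"bobby:999", b"idx:9999")
advantage = play_game(low_bit_adversary, world0, world1, trials=2000, seed=1)
```

For this stand-in encryption the estimated advantage stays close to zero. The proof argues by contradiction that the same must hold for every PPT adversary against the actual encryptions.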
Theorem 3 Assuming security of the underlying cryptosystem, the privacy-preserving query on
outsourced database provides DO’s privacy to U, according to Definition 7.
Proof Suppose that there exists an adversary $A \in$ PPT that can succeed in breaking the security
game of Definition 7 with some non-negligible advantage. Then $A$ can
distinguish $E_{A_{public}}(D_0)$ from $E_{A_{public}}(D_1)$ according to the transcript of the
interaction between the parties. A transcript of the Query protocol consists of a value encrypted
using OPE, the address of the B-tree, and a sequence of PIR protocols that occur in Query, denoted by
$\{PIR(NODE_{level_i})\}_{i=0}^{d}$ and $\{PIR(x_j)\}_{j=1}^{\ell}$, where $d$ is the depth of the
B-tree and $\ell$ is the number of query results.
We assumed that there exists an adversary $A$ that can distinguish $E_{A_{public}}(D_0)$ from
$E_{A_{public}}(D_1)$. Since $E_{A_{public}}(D_0)$ and $E_{A_{public}}(D_1)$ are then computationally
distinguishable, there exists an adversary $A' \in$ PPT that can obtain information beyond the query
results. If $A'$ can obtain information beyond the query results, consider the following transcript:
$$E_{C_{private}}(v) \quad \rho \quad PIR(NODE_{level_0}) \cdots PIR(NODE_{level_d}) \quad PIR(x_1) \cdots PIR(x_\ell)$$
Since the first and the second items are unrelated to DO’s data, $A'$ must infer this information
from $\{PIR(NODE_{level_i})\}_{i=0}^{d}$ or $\{PIR(x_j)\}_{j=1}^{\ell}$.
1. If $A'$ can infer other information from $\{PIR(NODE_{level_i})\}_{i=0}^{d}$, it means that $A'$ can
obtain more information from the nodes $NODE_{level_i}$ by decryption using $B_{private}$. In our model, the
construction of the B-tree is described in Section 3.3. Each node of the B-tree consists of the
following items:
$$E_{B_{public}}(p_0),\ E_{B_{public}}(E_{C_{private}}(v_1)),\ E_{B_{public}}(p_1),\ \cdots,\ E_{B_{public}}(E_{C_{private}}(v_m)),\ E_{B_{public}}(p_m)$$
Although $A'$ can obtain the pointers of each node by decryption using $B_{private}$, they are
unrelated to DO’s data itself. So $A'$ must infer the information from
$\{E_{C_{private}}(v_i)\}_{i=1}^{m}$, which violates our
assumption of Definition 2.



2. If $A' \in$ PPT can infer information beyond the query results from $\{PIR(x_j)\}_{j=1}^{\ell}$, it can
infer it from some $PIR(x_j)$, which violates our assumption of Definition 1.
So we conclude that no such A exists in the first place, and hence the system is secure according
to Definition 7.
Theorem 4 Assuming security of the underlying PIR primitive, the privacy-preserving query
on outsourced database with PIR provides query privacy, according to Definition 8.

Proof Suppose that there exists an adversary $A \in$ PPT that can succeed in breaking the security
game of Definition 8 with some non-negligible advantage. Then $A$ can distinguish
$Query_{U,DSP,F}(Pre_0, A_{private}, B_{private})$ from $Query_{U,DSP,F}(Pre_1, A_{private}, B_{private})$
with non-negligible advantage. The transcript of a Query protocol consists of a value encrypted using
OPE, the address of the B-tree, and a sequence of PIR protocols that occur in Query, denoted by
$\{PIR(NODE_{level_i})\}_{i=0}^{d}$ and $\{PIR(x_j)\}_{j=1}^{\ell}$, where $d$ is the depth of the
B-tree and $\ell$ is the number of query results. Obviously, the first item is the same; the second
item is indistinguishable since the address of the B-tree is random. Supposing that the databases,
the B-trees and the results have the same size respectively, there will be an equal number of PIR
queries regardless of the predicate $Pre$. Indeed, the number of PIR queries on the B-tree depends
on the depth of the B-tree, and the number of PIR queries on the data itself depends on the number
of results. Consider the following sequence of distributions:
$$E_{C_{private}}(v) \quad \rho_0 \quad PIR_0(NODE_{level_0}) \cdots PIR_0(NODE_{level_d}) \quad PIR_0(x_1) \cdots PIR_0(x_\ell)$$
$$E_{C_{private}}(v) \quad \rho_1 \quad PIR_1(NODE_{level_0}) \cdots PIR_1(NODE_{level_d}) \quad PIR_1(x_1) \cdots PIR_1(x_\ell)$$
The first line is the transcript distribution of Query on $D_0$ and the second line is the transcript
distribution of Query on $D_1$. Since there exists $A \in$ PPT that can distinguish the first distribution
from the second distribution, there must exist an adversary $A' \in$ PPT that can distinguish
a pair of corresponding PIR queries. Therefore, for some $i \in \{0, \ldots, d\}$ or some
$j \in \{1, \ldots, \ell\}$, $A'$ can distinguish $PIR_0(NODE_{level_i})$ from $PIR_1(NODE_{level_i})$
on the B-tree, or $PIR_0(x_j)$ from $PIR_1(x_j)$ on the data entries. In both cases this contradicts
our initial assumption, according to Definition 1.
Therefore, no such $A \in$ PPT exists, and hence our construction is secure according to Definition
8.
Theorem 5 (Communication Complexity) The privacy-preserving query on outsourced database
from the preceding construction has sub-linear communication complexity in $n$, the number of
records held by the DSP.
Proof Suppose $n$ is the maximum number of tuples to be stored, so that $O(\log n)$ is the depth of
the B-tree. Additionally, there are two other parameters: $\tau$, the ratio of the size of a search
key to the size of a tuple, and $\omega$, the ratio of the number of results to the
total number $n$. So the total storage size of the B-tree is $O(\tau \cdot n)$. Obviously, the
value in $Pre$ encrypted using OPE and the address $\rho$ of the B-tree are independent of $n$ (i.e., their
size does not grow as $n$ grows). Therefore the total communication of the protocol
is $O(\log n \cdot \mathrm{polylog}(\tau \cdot n)) + O(\omega \cdot \mathrm{polylog}(n))$ using any $\mathrm{polylog}(n)$ PIR protocol, e.g. [11, 12].
Since $\tau$ and $\omega$ are values between 0 and 1, the communication complexity is $O(\log n \cdot \mathrm{polylog}(n))$.
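The simplification in the last step can be written out as follows, taking the cost expression in the proof as given and assuming that the polylogarithmic PIR cost is non-decreasing in its argument (so that the bound for a database of size $\tau \cdot n$ is at most the bound for size $n$):

```latex
\begin{align*}
O(\log n \cdot \mathrm{polylog}(\tau \cdot n)) + O(\omega \cdot \mathrm{polylog}(n))
  &\le O(\log n \cdot \mathrm{polylog}(n)) + O(\mathrm{polylog}(n))
     && \text{since } 0 < \tau, \omega \le 1 \\
  &= O(\log n \cdot \mathrm{polylog}(n)) \\
  &= o(n)
     && \text{since } (\log n)^{c} = o(n) \text{ for every constant } c .
\end{align*}
```

The last line is what makes the bound sub-linear in $n$.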




5


Conclusion

This paper proposes a novel method of privacy-preserving query on outsourced database with
PIR. We first perform a range query using an encrypted B-tree index with PIR, and then obtain the
encrypted records using PIR again, based on the results of the search on the B-tree. We address two
main problems. The first is how to search on an encrypted B-tree: we use an OPE algorithm
to support searching on encrypted data. The second is how to keep the database
query private: we propose formal security definitions of DO’s privacy to DSP, DO’s privacy to U and
query privacy, and then give proofs for our construction.

References
[1] W. Lehner, K. U. Sattler, Database as a service (dbaas), in 2010 IEEE 26th International
Conference on Data Engineering (ICDE 2010), 1-6 March 2010, Piscataway, NJ, USA, 2010, 1216-1217
[2] H. Hacigumus, B. Iyer, C. Li, S. Mehrotra, Executing SQL over encrypted data in the
database-service-provider model, Proceedings of the ACM SIGMOD International Conference on
Management of Data, June 3-6, 2002, Madison, WI, United States, 2002, 216-227
[3] D. Agrawal, A. El Abbadi, F. Emekci, A. Metwally, Database management as a service: Challenges
and opportunities, in 2009 IEEE 25th International Conference on Data Engineering (ICDE 2009),
29 March - 2 April 2009, Piscataway, NJ, USA, 2009, 1709-1716
[4] M. Kantarcioglu, C. Clifton, Security issues in querying encrypted data, Data and Applications
Security 19, 2005, 325
[5] G. Amanatidis, A. Boldyreva, A. O’Neill, Provably-secure schemes for basic query support in
outsourced databases, Data and Applications Security XXI, 2007, 14-30
[6] J. Li, E. Omiecinski, Efficiency and security trade-off in supporting range queries on encrypted
databases, Data and Applications Security XIX, 2005, 69-83
[7] M. Xie, H. Wang, J. Yin, X. Meng, Integrity auditing of outsourced data, Proc. VLDB Endow.,
2007, 782-793
[8] D. X. Song, D. Wagner, A. Perrig, Practical techniques for searches on encrypted data, Proceedings
of the IEEE Computer Society Symposium on Research in Security and Privacy. Berkeley, CA,
USA: IEEE, 2000, 44-55
[9] H. Pang, J. Zhang, K. Mouratidis, Scalable verification for outsourced dynamic databases, Proc.
VLDB Endow., vol. 2, no. 1, 2009, 802-813

[10] B. Chor, O. Goldreich, E. Kushilevitz, M. Sudan, Private information retrieval, in Proceedings of
the 1995 IEEE 36th Annual Symposium on Foundations of Computer Science, Milwaukee, WI,
USA, 1995, 41-50
[11] C. Cachin, S. Micali, M. Stadler, Computationally private information retrieval with polylogarithmic communication, in Advances in Cryptology - Eurocrypt’99, Lecture Notes in Computer
Science, vol. 1592, 1999, 402-414
[12] Y. Chang, Single database private information retrieval with logarithmic communication, in ACISP.
Springer, 2004, 50-61
[13] G. Di Crescenzo, T. Malkin, R. Ostrovsky, Single database private information retrieval implies
oblivious transfer, in Proceedings of Advances in Cryptology - Eurocrypt 2000, 14-18 May 2000,
Berlin, Germany, 2000, 122-138



[14] F. Li, M. Hadjieleftheriou, G. Kollios, L. Reyzin, Dynamic authenticated index structures for
outsourced databases, in 2006 ACM SIGMOD International Conference on Management of Data,
June 27-29, 2006, Chicago, IL, United States, 2006, 121-132
[15] E. Damiani, S. D. C. D. Vimercati, S. Jajodia, S. Paraboschi, P. Samarati, Balancing confidentiality
and efficiency in untrusted relational DBMSs, in Proceedings of the 10th ACM Conference on
Computer and Communications Security, CCS 2003, October 27-31, 2003, Washington, DC, United
States, 2003, 93-102
[16] F. Bao, R. H. Deng, X. Ding, Y. Yang, Private query on encrypted data in multi-user settings,
in 4th Information Security Practice and Experience Conference, ISPEC 2008, April 21-23, 2008,
Lecture Notes in Computer Science, vol. 4991, 2008, 71-85
[17] B. Thompson, S. Haber, W. Horne, T. Sander, D. Yao, Privacy-preserving computation and
verification of aggregate queries on outsourced databases, in Privacy Enhancing Technologies,
2009, 185-201
[18] Z. Yang, S. Zhong, R. Wright, Privacy-preserving queries on encrypted data, Proceedings of the
11th European Symposium on Research in Computer Security (ESORICS 2006), 2006, 479-495
[19] B. Carbunar, R. Sion, Joining privately on outsourced data, in 7th VLDB Workshop on Secure
Data Management, SDM 2010, Lecture Notes in Computer Science, vol. 6358, 2010, 70-86
[20] Y. Gertner, Y. Ishai, E. Kushilevitz, T. Malkin, Protecting data privacy in private information
retrieval schemes, in Proceedings of 13th Annual ACM Symposium on Theory of Computing
(STOC’98), 23-26 May 1998, vol. 60, USA, 2000, 592-629
[21] R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, Order preserving encryption for numeric data, in
Proceedings of the ACM SIGMOD International Conference on Management of Data, SIGMOD
2004, Jun 13-18 2004, Paris, France, 563-574
[22] A. Boldyreva, N. Chenette, Y. Lee, A. O’Neill, Order-preserving symmetric encryption, Advances
in Cryptology-Eurocrypt 2009, 2009, 224-241


