
MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Nguyen Huy Truong

RESEARCH ON DEVELOPMENT OF METHODS OF GRAPH THEORY
AND AUTOMATA IN STEGANOGRAPHY AND SEARCHABLE
ENCRYPTION

DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS

Hanoi - 2020


MINISTRY OF EDUCATION AND TRAINING
HANOI UNIVERSITY OF SCIENCE AND TECHNOLOGY

Nguyen Huy Truong

RESEARCH ON DEVELOPMENT OF METHODS OF GRAPH THEORY
AND AUTOMATA IN STEGANOGRAPHY AND SEARCHABLE
ENCRYPTION
Major: Mathematics and Informatics
Major code: 9460117

DOCTORAL DISSERTATION IN MATHEMATICS AND INFORMATICS

SUPERVISORS:
1. Assoc. Prof. Dr. Sc. Phan Thi Ha Duong


2. Dr. Vu Thanh Nam

Hanoi - 2020


DECLARATION OF AUTHORSHIP
I hereby certify that I am the author of this dissertation, and that I have completed it
under the supervision of Assoc. Prof. Dr. Sc. Phan Thi Ha Duong and Dr. Vu Thanh Nam. I
also certify that the dissertation’s results have not been published by other authors.

Hanoi, February 03, 2020
PhD. Student

Nguyen Huy Truong

Supervisors

Assoc. Prof. Dr. Sc. Phan Thi Ha Duong

Dr. Vu Thanh Nam


ACKNOWLEDGMENTS
I am extremely grateful to Assoc. Prof. Dr. Sc. Phan Thi Ha Duong.
I want to thank Dr. Vu Thanh Nam.
I would also like to extend my deepest gratitude to the late Assoc. Prof. Dr. Phan Trung Huy.
I would like to thank my co-workers at the School of Applied Mathematics and Informatics, Hanoi University of Science and Technology, for all their help.
I also wish to thank the members of the Seminar on Mathematical Foundations for Computer Science at the Institute of Mathematics, Vietnam Academy of Science and Technology, for their valuable comments and helpful advice.
I give thanks to the PhD students of the late Assoc. Prof. Dr. Phan Trung Huy for sharing and exchanging information on steganography and searchable encryption.
Finally, I must also thank my family for supporting all my work.


CONTENTS

LIST OF SYMBOLS
LIST OF ABBREVIATIONS
LIST OF FIGURES
LIST OF TABLES
INTRODUCTION
CHAPTER 1 PRELIMINARIES
1.1 Basic Structures
1.1.1 Strings
1.1.2 Graph
1.1.3 Deterministic Finite Automata
1.1.4 The Galois Field GF(p^m)
1.2 Digital Image Steganography
1.3 Exact Pattern Matching
1.4 Longest Common Subsequence
1.5 Searchable Encryption
CHAPTER 2 DIGITAL IMAGE STEGANOGRAPHY BASED ON THE GALOIS FIELD USING GRAPH THEORY AND AUTOMATA
2.1 Introduction
2.2 The Digital Image Steganography Problem
2.3 A New Digital Image Steganography Approach
2.3.1 Mathematical Basis Based on The Galois Field
2.3.2 Digital Image Steganography Based on The Galois Field GF(p^m) Using Graph Theory and Automata
2.4 The Near Optimal and Optimal Data Hiding Schemes for Gray and Palette Images
2.5 Experimental Results
2.6 Conclusions
CHAPTER 3 AN AUTOMATA APPROACH TO EXACT PATTERN MATCHING
3.1 Introduction
3.2 The New Algorithm - The MRc Algorithm
3.3 Analysis of The MRc Algorithm
3.4 Experimental Results
3.5 Conclusions
CHAPTER 4 AUTOMATA TECHNIQUE FOR THE LONGEST COMMON SUBSEQUENCE PROBLEM
4.1 Introduction
4.2 Mathematical Basis
4.3 Automata Models for Solving The LCS Problem
4.4 Experimental Results
4.5 Conclusions
CHAPTER 5 CRYPTOGRAPHY BASED ON STEGANOGRAPHY AND AUTOMATA METHODS FOR SEARCHABLE ENCRYPTION
5.1 Introduction
5.2 A Novel Cryptosystem Based on The Data Hiding Scheme (2, 9, 8)
5.3 Automata Technique for Exact Pattern Matching on Encrypted Data
5.4 Automata Technique for Approximate Pattern Matching on Encrypted Data
5.5 Conclusions
CONCLUSION
BIBLIOGRAPHY
LIST OF PUBLICATIONS


LIST OF SYMBOLS

$\Sigma$: An alphabet
$\Sigma^*$: The set of all strings on $\Sigma$
$\emptyset$: The empty set
$\varepsilon$: The empty string
$|S|$: The number of elements of a set $S$
$|u|$: The length of a string $u$
$GF(p^m)$: The Galois field constructed from the polynomial ring $\mathbb{Z}_p[x]$, where $p$ is prime and $m$ is a positive integer
$(GF(p^m)^n, +, \cdot)$: A vector space over the field $GF(p^m)$
$LCS(p, x)$: A longest common subsequence of $p$ and $x$
$lcs(p, x)$: The length of an $LCS(p, x)$
$LeftID(u)$: The least element (the leftmost location) of $u$
$Rm_p(u)$: The last component of $LeftID(u)$ in $p$
$(\mathcal{I}, \mathcal{M}, \mathcal{K}, Em, Ex)$: A data hiding scheme
$\mathcal{I}$: The set of all image blocks with the same size and image format
$\mathcal{M}$: A finite set of secret elements
$\mathcal{K}$: A finite set of secret keys
$Em$: An embedding function that embeds a secret element in an image block
$Ex$: An extracting function that extracts an embedded secret element from an image block
$q_{colour}$: The number of different ways to change the colour of each pixel in an arbitrary image block
$I$: An image block
$M$: A secret element
$K$: A secret key
$Adjacent(c_p, a)$: An adjacent vertex of $c_p$
$Pos_p(z)$: The last position of appearance of $z$ in $p$
$M_p$: An automaton accepting the pattern $p$
$Config(p)$: The set of all the configurations of $p$
$W_p(u)$: The weight of $u$ in $p$
$W_p(C)$: The weight of $C$
$WConfig(p)$: The set of the weights of all the configurations of $p$
$W_p^i(a)$: The weight of $a$ at the location $i$ in $p$
$Wm_p(a)$: The heaviest weight of $a$ in $p$
$W(a)$: The weight of $a$ in $p$
$c$ block: A string of length $c$



LIST OF ABBREVIATIONS

AOSO: Average Optimal Shift Or
BF: Brute Force
BFS: Breadth First Search
BMH: Boyer Moore Horspool
BNDM: Backward Nondeterministic Dawg Matching
CTL: Chang Tseng Lin
EBOM: Extended Backward Oracle Matching
ER: Embedding Rate
FJS: Franek Jennings Smyth
FOPA: Fastest Optimal Parity Assignment
FSBNDM: Forward SBNDM
HASH: Hashing
HCIH: High Capacity of Information Hiding
LBNDM: Long BNDM
LCS: Longest Common Subsequence
LSB: Least Significant Bit
MSDR: Maximal Secret Data Ratio
MSE: Mean Square Error
NP: Nondeterministic Polynomial
OPA: Optimal Parity Assignment
PA: Parity Assignment
PCT: Pan Chen Tseng
PSNR: Peak Signal to Noise Ratio
RGB: Red Green Blue
SA: Shift Add
SAE: Searchable Asymmetric Encryption
SBNDM: Simplified BNDM
SE: Searchable Encryption
SSE: Searchable Symmetric Encryption
TVSBS: Thathoo Virmani Sai Balakrishnan Sekar
WF: Wagner Fischer
WL: Wu Lee


LIST OF FIGURES

Figure 1.1. A simple graph
Figure 1.2. A spanning tree of the graph given in Figure 1.1
Figure 1.3. The transition diagram of A in Example 1.3
Figure 1.4. The basic diagram of digital image steganography
Figure 1.5. The degree of appearance of the pattern p
Figure 2.1. The nine commonly used 8-bit gray cover images sized 512 × 512 pixels
Figure 2.2. The nine commonly used 8-bit palette cover images sized 512 × 512 pixels
Figure 2.3. The binary cover image sized 2592 × 1456 pixels
Figure 3.1. Sliding window mechanism
Figure 3.2. The basic idea of the proposed approach
Figure 3.3. The transition diagram of the automaton M_p, p = abcba


LIST OF TABLES

Table 1.1. An adjacency list representation of the simple graph given in Figure 1.1
Table 1.2. The performing steps of the BF algorithm
Table 1.3. The dynamic programming matrix L
Table 2.1. Elements of the Galois field GF(2^2) represented by binary strings and decimal numbers
Table 2.2. Operations + and · on the Galois field GF(2^2)
Table 2.3. The representation of E and the arc weights of G for the gray image
Table 2.4. The payload, ER and PSNR for the optimal data hiding scheme (1, 2^n − 1, n) for palette images with q_colour = 1
Table 2.5. The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for gray images with q_colour = 3
Table 2.6. The payload, ER and PSNR for the near optimal data hiding scheme (2, 9, 8) for palette images with q_colour = 3
Table 2.7. The comparisons of embedding and extracting time between the chapter’s and Chang et al.’s approach for the same optimal data hiding scheme (1, N, ⌊log2(N + 1)⌋), where N = 2^n − 1, for the binary image with q_colour = 1. Time is given in seconds
Table 3.1. The performing steps of the MR1 algorithm
Table 3.2. Experimental results on rand4 problem
Table 3.3. Experimental results on rand8 problem
Table 3.4. Experimental results on rand16 problem
Table 3.5. Experimental results on rand32 problem
Table 3.6. Experimental results on rand64 problem
Table 3.7. Experimental results on rand128 problem
Table 3.8. Experimental results on rand256 problem
Table 3.9. Experimental results on a genome sequence (with |Σ| = 4)
Table 3.10. Experimental results on a protein sequence (with |Σ| = 20)
Table 4.1. The Ref_p of p = bacdabcad
Table 4.2. The comparisons of the lcs(p, x) computation time for n = 50666
Table 4.3. The comparisons of the lcs(p, x) computation time for n = 102398


INTRODUCTION
In modern life, as the use of computers and the Internet becomes more and more essential, digital data (information) can be illegally copied and accessed. As a result, information security becomes increasingly important. There are two popular methods to provide security: cryptography and data hiding [2, 5, 6, 20, 56, 62, 81]. Cryptography is used to encrypt data in order to make them unreadable by a third party [5]. Data hiding is used to embed data in digital media. Depending on the purpose of the application, data hiding is generally divided into steganography, which hides the existence of data to protect the embedded data, and watermarking, which protects the copyright ownership and authentication of the digital media carrying the embedded data.

Steganography can be used as an alternative to cryptography. However, steganography becomes weak if attackers detect the existence of hidden data. Hence, integrating cryptography with steganography is a third choice for data security [2, 5, 6, 12, 19, 61, 62, 81, 86, 93].
With the rapid development of applications based on Internet infrastructure, cloud computing has become one of the hottest topics in information technology. It is an Internet-based computing system that provides on-demand services ranging from application and system software to data storage and processing. For example, users of a cloud storage service can upload information to the servers and then access it online. Meanwhile, enterprises do not need to spend large sums of money on owning and maintaining a system of hardware and software. Although cloud computing brings many benefits to individuals and organizations, cloud security is still an open problem, since cloud providers can abuse users’ information and cloud users lose control of it. Thus, guaranteeing the privacy of tenants’ information without negating the benefits of cloud computing is necessary [28, 38, 40, 41, 60, 95, 102]. In order to protect cloud users’ privacy, sensitive data need to be encrypted before being outsourced to servers. Unfortunately, encryption makes searching ciphertext on the servers much more difficult than searching plaintext. To solve this problem, many searchable encryption techniques have been presented since 2000. Searchable encryption not only stores users’ encrypted data securely but also allows information search over ciphertext [26, 28, 29, 38, 40, 60, 71, 85, 102].

Searchable encryption for exact pattern matching is a new class of searchable encryption techniques. Solutions for this class have been built on algorithms for [26] or approaches to [41, 89] exact pattern matching.
As with information retrieval over plaintexts, it is necessary to develop searchable encryption with approximate string matching capability, where the search string can be either a predetermined keyword that is encrypted and stored on cloud servers or an arbitrary pattern [28, 40, 71].
Motivated by the above problems, by the methods based on graph theory and automata proposed by P. T. Huy et al. for solving the problems of exact pattern matching (2002), the longest common subsequence (2002) and steganography (2011, 2012 and 2013), and by the potential applications of these methods in steganography and searchable encryption, and under the direction of the supervisors, the dissertation was assigned the title: research on development of methods of graph theory and automata in steganography and searchable encryption.
The purpose of the dissertation is to develop new, high-quality solutions using graph theory and automata, to suggest their applications in steganography and searchable encryption, and to apply them there.
Based on the results and suggestions introduced by P. T. Huy et al., the dissertation focuses on the following four problems in steganography and searchable encryption:
- Digital image steganography;
- Exact pattern matching;
- Longest common subsequence;
- Searchable encryption.
The first problem is newly stated in Chapter 2; the remaining three problems are recalled and clarified in Chapter 1.
For the first three problems, the dissertation’s work is to find new and efficient solutions using graph theory and automata. These solutions are then applied to solve the last problem.
The dissertation is structured as follows. Apart from the Introduction at the beginning and the Conclusion at the end, its main content is divided into five chapters.
Chapter 1. Preliminaries. This chapter recalls the basic knowledge used throughout the dissertation (strings, graphs, deterministic finite automata, digital images, the basic model of digital image steganography, some parameters to determine the quality of digital image steganography, the exact pattern matching problem, the longest common subsequence problem, and searchable encryption) and re-presents important concepts and results that are used and further developed in the remaining chapters (adjacency lists, breadth first search, the Galois field, the fastest optimal parity assignment method, the module method and the concept of the maximal secret data ratio, the concept of the degree of fuzziness (appearance), the Knapsack Shaking approach, and the definition of a cryptosystem).
Chapter 2. Digital image steganography based on the Galois field using graph theory and automata. Firstly, from some proposed concepts of optimal and near optimal secret data hiding schemes, this chapter states the problem of interest in digital image steganography. Secondly, the chapter proposes a new approach based on the Galois field using graph theory and automata to design a general form of steganography in binary, gray and palette images, gives sufficient conditions for the existence of, and proves the existence of, some optimal and near optimal secret data hiding schemes, applies the proposed schemes to hiding a finite sequence of secret data in an image, and gives security analyses. Finally, the chapter presents experimental results showing the efficiency of the proposed results.
Chapter 3. An automata approach to exact pattern matching. This chapter proposes a flexible approach using automata to design an effective algorithm for exact pattern matching in practice. For given cases of patterns and alphabets, the efficiency of the proposed algorithm is shown by theoretical analyses and experimental results.
Chapter 4. Automata technique for the longest common subsequence problem. This chapter proposes two efficient algorithms, one sequential and one parallel, for computing the length of a longest common subsequence of two strings in practice, using the automata technique. Theoretical analysis of the parallel algorithm and experimental results confirm that using the automata technique to design algorithms for the longest common subsequence problem is the best choice.
Chapter 5. Cryptography based on steganography and automata methods for searchable encryption. This chapter first proposes a novel, highly secure cryptosystem based on a data hiding scheme proposed in Chapter 2. Moreover, unlike existing hybrid techniques of cryptography and steganography, its ciphertexts do not depend on the size of the input image, and encoding and embedding are done at once. The chapter then applies the automata-based results of Chapters 3 and 4 to construct two algorithms for exact and approximate pattern matching on secret data encrypted by the proposed cryptosystem. These algorithms have $O(n)$ time complexity in the worst case, under the assumption that the approximate algorithm uses $\lceil (1-\alpha)m \rceil$ processors, where $\alpha$, $m$ and $n$ are the error of the string similarity measure proposed in this chapter and the lengths of the pattern and the secret data, respectively. In searchable encryption, the cryptosystem can be used to encode and decode secret data on the users’ side, and the pattern matching algorithms can be used to perform pattern search on the cloud providers’ side.
The contents of the dissertation are written based on the paper [T1] published in,
the revised manuscript [T4] submitted to KSII Transactions on Internet and Information
Systems (ISI), and the papers [T2, T3] published in Journal of Computer Science and
Cybernetics in 2019. The main results of the dissertation have been presented at:
- Seminar on Mathematical Foundations for Computer Science at the Institute of Mathematics, Vietnam Academy of Science and Technology,
- The 9th Vietnam Mathematical Congress, Nha Trang, August 14-18, 2018,
- Seminar at the School of Applied Mathematics and Informatics, Hanoi University of Science and Technology.


CHAPTER 1

PRELIMINARIES
This chapter recalls the terminology, concepts, algorithms and results needed to present the dissertation’s new results clearly and logically and to help readers follow its content easily. The background knowledge re-presented here consists of basic structures (Section 1.1: strings (Subsection 1.1.1), graphs (Subsection 1.1.2), deterministic finite automata (Subsection 1.1.3), and the Galois field $GF(p^m)$ (Subsection 1.1.4)), digital image steganography (Section 1.2), exact pattern matching (Section 1.3), longest common subsequence (Section 1.4) and searchable encryption (Section 1.5).

1.1 Basic Structures
1.1.1 Strings
Since secret data are treated as strings in this dissertation, some terms related to strings are recalled here [11, 24, 83].
A finite set $\Sigma$ is called an alphabet. The number of elements of $\Sigma$ is denoted by $|\Sigma|$. An element of $\Sigma$ is called a letter. A string $x$ of length $n$ on the alphabet $\Sigma$ is a finite sequence of letters of $\Sigma$, and we write
$$x = x[1]x[2]\ldots x[n], \quad x[i] \in \Sigma, \quad 1 \le i \le n,$$
where $n$ is a positive integer.
A special string is the empty string, which has no letters and is denoted by $\varepsilon$. The length of a string $x$ is the number of letters in it, denoted by $|x|$. Thus $|\varepsilon| = 0$.
Notice that for the string $x = x[1]x[2]\ldots x[n]$, we can also write $x = x[1..n]$ for short. The set of all strings on the alphabet $\Sigma$ is denoted by $\Sigma^*$. The operation on strings is concatenation, which writes strings one after another as a compound. The concatenation of two strings $u_1$ and $u_2$ is denoted by $u_1u_2$.
Let $x$ be a string. A string $p$ is called a substring of the string $x$ if $x = u_1pu_2$ for some strings $u_1$ and $u_2$. In case $u_1 = \varepsilon$ (resp. $u_2 = \varepsilon$), the string $p$ is called a prefix (resp. suffix) of the string $x$. The prefix (resp. suffix) $p$ is called proper if $p \neq x$. Note that a prefix or a suffix can be empty.
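The definitions of substring, prefix and suffix translate directly into predicates on strings. The following Python sketch is only illustrative: the function names are hypothetical, and Python strings are indexed from 0 whereas the text indexes letters from 1, so a helper converts the notation $x[i..j]$.

```python
def is_substring(p: str, x: str) -> bool:
    """p is a substring of x iff x = u1 p u2 for some strings u1, u2."""
    return any(x[i:i + len(p)] == p for i in range(len(x) - len(p) + 1))


def is_prefix(p: str, x: str) -> bool:
    """p is a prefix of x iff x = p u2 (the case u1 = empty string)."""
    return len(p) <= len(x) and x[:len(p)] == p


def is_suffix(p: str, x: str) -> bool:
    """p is a suffix of x iff x = u1 p (the case u2 = empty string)."""
    return len(p) <= len(x) and x[len(x) - len(p):] == p


def is_proper_prefix(p: str, x: str) -> bool:
    """A prefix p of x is proper if p != x."""
    return is_prefix(p, x) and p != x


def letters(x: str, i: int, j: int) -> str:
    """x[i..j] in the 1-based notation of the text."""
    return x[i - 1:j]


x = "abcba"
print(is_prefix("ab", x), is_suffix("ba", x), is_substring("cb", x))  # True True True
print(is_proper_prefix(x, x), is_prefix("", x))                       # False True
print(letters(x, 2, 4))                                               # bcb
```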
1.1.2 Graph
Besides some basic concepts of graph theory, this subsection recalls how to represent a graph by adjacency lists, as well as breadth first search [82]. These are used in Chapter 2.
A finite undirected graph (hereafter called a graph for short) $G = (V, E)$ consists of a nonempty finite set of vertices $V$ and a finite set of edges $E$, where each edge has either one or two vertices associated with it. A graph with weights assigned to its edges is called a weighted graph.


An edge connecting a vertex to itself is called a loop. Multiple edges are edges connecting the same pair of vertices. A graph having no loops and no multiple edges is called a simple graph.
In a simple graph, the edge associated with an unordered pair of vertices $\{i, j\}$ is called the edge $\{i, j\}$. Two vertices $i$ and $j$ in a graph $G$ are called adjacent if they are the vertices of an edge of $G$.
A graph without multiple edges can be described by adjacency lists, which specify the vertices adjacent to each vertex of the graph.
Example 1.1. Using adjacency lists, the simple graph given in Figure 1.1 can be represented as in Table 1.1.

Figure 1.1. A simple graph
Table 1.1. An adjacency list representation of the simple graph given in Figure 1.1

Vertex    Adjacent vertices
1         2, 3
2         1, 3, 4
3         1, 2, 4, 5
4         2, 3, 5
5         3, 4
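As a small illustration of how adjacency lists arise from the edge set, the following Python sketch builds them for the graph of Figure 1.1. The edge list is read off Table 1.1, and the function name is hypothetical; isolated vertices (none here) would not appear in the result.

```python
from collections import defaultdict


def adjacency_lists(edges):
    """Build adjacency lists of a simple undirected graph from its edges.

    Each edge {i, j} contributes j to the list of i and i to the list of j.
    """
    adj = defaultdict(list)
    for i, j in edges:
        adj[i].append(j)
        adj[j].append(i)
    # Sort vertices and their neighbours so the result reads like Table 1.1.
    return {v: sorted(ns) for v, ns in sorted(adj.items())}


# Edges of the simple graph in Figure 1.1, as listed in Table 1.1.
EDGES = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (3, 5), (4, 5)]
ADJ = adjacency_lists(EDGES)
print(ADJ)  # {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4, 5], 4: [2, 3, 5], 5: [3, 4]}
```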

Given a simple graph $G$, a subgraph of $G$ that is a tree including every vertex of $G$ is called a spanning tree of $G$. A spanning tree of a connected simple graph can be built by breadth first search (BFS). This algorithm is shown in pseudocode as follows.

Breadth First Search:
Input: A connected simple graph G with vertices ordered as i1, i2, ..., in.
Output: A spanning tree T.
Set T to be a tree consisting only of i1;
Set L to be an empty list;
Put i1 in L;
While (L is not empty)
{
    Remove the first vertex i from L;
    For each adjacent vertex j of i
        If (j is in neither L nor T)
        {
            Add j to the end of L;
            Add j and the edge {i, j} to T;
        }
}
Return T;
End.

Example 1.2. For the graph given in Figure 1.1, a spanning tree found by BFS is shown in Figure 1.2.

Figure 1.2. A spanning tree of the graph given in Figure 1.1
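The BFS pseudocode can be sketched in Python as follows, with a `deque` playing the role of the list L. The sketch assumes that neighbours are examined in the order given in Table 1.1 and that vertex 1 is the first vertex i1; the function name is hypothetical. Because every vertex put in L is also put in T, testing membership in T alone is enough to check that j is in neither L nor T.

```python
from collections import deque


def bfs_spanning_tree(adj, root):
    """Return the edges of a BFS spanning tree of a connected simple graph.

    adj maps each vertex to its adjacent vertices, in the order in which
    they are examined; root plays the role of i1 in the pseudocode.
    """
    in_tree = {root}        # vertices of T (every vertex of L is also in T)
    queue = deque([root])   # the list L
    tree_edges = []         # edges {i, j} of T, stored as pairs (i, j)
    while queue:
        i = queue.popleft()             # remove the first vertex i from L
        for j in adj[i]:
            if j not in in_tree:        # j is in neither L nor T
                queue.append(j)         # add j to the end of L
                in_tree.add(j)          # add j ...
                tree_edges.append((i, j))  # ... and the edge {i, j} to T
    return tree_edges


# Adjacency lists of the graph in Figure 1.1 (Table 1.1).
ADJ = {1: [2, 3], 2: [1, 3, 4], 3: [1, 2, 4, 5], 4: [2, 3, 5], 5: [3, 4]}
print(bfs_spanning_tree(ADJ, 1))  # [(1, 2), (1, 3), (2, 4), (3, 5)]
```

With this ordering the tree has the $n - 1 = 4$ edges $\{1, 2\}$, $\{1, 3\}$, $\{2, 4\}$ and $\{3, 5\}$; a different neighbour order or root may give a different spanning tree.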

A graph with directed edges (or arcs) is called a directed graph. Each arc is associated with an ordered pair of vertices. In a simple directed graph, the arc associated with the ordered pair $(i, j)$ is called the arc $(i, j)$; the vertex $i$ is said to be adjacent to the vertex $j$, and the vertex $j$ is said to be adjacent from the vertex $i$.
1.1.3 Deterministic Finite Automata
Studying the construction and use of deterministic finite automata is one of the objectives of the dissertation. Hence, this subsection clarifies this model of computation [44, 82].
Definition 1.1 ([44]). Let $\Sigma$ be a finite alphabet. A deterministic finite automaton (hereafter called an automaton for short) $A = (\Sigma, Q, q_0, \delta, F)$ over $\Sigma$ consists of:
- A finite set $Q$ of elements called states,
- An initial state $q_0$, one of the states in $Q$,
- A set $F$ of final states, where $F$ is a subset of $Q$,
- A state transition function (or simply, transition function), denoted by $\delta$, that takes as arguments a state and a letter and returns a state, so that $\delta: Q \times \Sigma \to Q$.
The transition function can be extended so that it takes a state and a string and returns a state. Formally, this extended transition function can be defined recursively by $\delta: Q \times \Sigma^* \to Q$ such that for all $q \in Q$, $\delta(q, \varepsilon) = q$, and for all $s \in \Sigma^*$ and $a \in \Sigma$, $\delta(q, as) = \delta(\delta(q, a), s)$.


An alternative and simple way of presenting an automaton is to use a "transition diagram". A transition diagram of an automaton $A = (\Sigma, Q, q_0, \delta, F)$ is a directed graph given as follows [44].
a) Each state of $Q$ is a vertex.
b) Let $q' = \delta(q, a)$, where $q$ is a state of $Q$ and $a$ is a letter of $\Sigma$. Then the transition diagram has an arc $(q, q')$ labeled $a$. If several letters cause transitions from $q$ to $q'$, then the arc $(q, q')$ is labeled by the list of these letters.
c) There is an arrow into the initial state $q_0$. This arrow does not originate at any vertex.
d) Vertices corresponding to states not in $F$ are marked by a single circle, and vertices corresponding to final states are marked by a double circle.
marked by a double circle.
Example 1.3. Consider an automaton $A = (\Sigma, Q, q_0, \delta, F)$ over $\Sigma = \{a, b\}$, where $Q = \{q_0, q_1, q_2\}$, $F = \{q_2\}$, and $\delta$ is given by the following table. Then the transition diagram of $A$ is shown in Figure 1.3.

δ      a      b
q0     q0     q1
q1     q2     q1
q2     q2     q2

Figure 1.3. The transition diagram of A in Example 1.3
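The extended transition function can be evaluated by reading a string letter by letter. The following Python sketch does this for the automaton of Example 1.3, with the transition table as read here: $\delta(q_0, a) = q_0$, $\delta(q_0, b) = q_1$, $\delta(q_1, a) = q_2$, $\delta(q_1, b) = q_1$ and $\delta(q_2, a) = \delta(q_2, b) = q_2$. A string $s$ is treated as accepted when $\delta(q_0, s) \in F$, the notion of acceptance formalized in Definition 1.2. The function and variable names are hypothetical.

```python
from typing import Dict, Set, Tuple

State = str
Transitions = Dict[Tuple[State, str], State]


def extended_delta(delta: Transitions, q: State, s: str) -> State:
    """Extended transition function: delta(q, eps) = q and
    delta(q, a s) = delta(delta(q, a), s).

    Applying the one-letter transition to the letters of s from left to
    right unfolds exactly this recursion.
    """
    for a in s:
        q = delta[(q, a)]
    return q


def accepts(delta: Transitions, q0: State, final: Set[State], s: str) -> bool:
    """s is accepted iff delta(q0, s) is a final state."""
    return extended_delta(delta, q0, s) in final


# Transition table of the automaton A in Example 1.3 over {a, b}.
DELTA: Transitions = {
    ("q0", "a"): "q0", ("q0", "b"): "q1",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q2", ("q2", "b"): "q2",
}

for word in ["", "ab", "ba", "aabab"]:
    print(repr(word), accepts(DELTA, "q0", {"q2"}, word))
# prints: '' False, 'ab' False, 'ba' True, 'aabab' True (one per line)
```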

Definition 1.2 ([82]). A string $p$ is said to be accepted by the automaton $A = (\Sigma, Q, q_0, \delta, F)$ if it takes the initial state $q_0$ to a final state, that is, $\delta(q_0, p)$ is a state in $F$.

1.1.4 The Galois Field GF(p^m)
This subsection describes how to construct a finite field with $p^m$ elements, called the Galois field $GF(p^m)$, where $p$ is prime and $m \ge 1$ is an integer [88]. This algebraic structure will be used in Chapter 2.
Let $p$ be a prime number. Define $\mathbb{Z}_p[x]$ to be the set of all polynomials in the variable $x$ whose coefficients belong to the field $\mathbb{Z}_p$. Addition and multiplication in $\mathbb{Z}_p[x]$ are defined in the usual way, and then the coefficients are reduced modulo $p$ at the end.
For $f(x) \in \mathbb{Z}_p[x]$, the degree of $f(x)$, denoted by $\deg(f)$, is the largest exponent of $x$ in $f(x)$. A polynomial $f(x) \in \mathbb{Z}_p[x]$ is called irreducible if there do not exist polynomials $f_1(x), f_2(x) \in \mathbb{Z}_p[x]$ such that
$$f(x) = f_1(x)f_2(x),$$
where $\deg(f_1) > 0$ and $\deg(f_2) > 0$.
Let $f(x) \in \mathbb{Z}_p[x]$ be an irreducible polynomial with $\deg(f) = m \ge 1$. Define $\mathbb{Z}_p[x]/(f(x))$ to be the set of the $p^m$ polynomials of degree at most $m - 1$ in $\mathbb{Z}_p[x]$. Addition and multiplication in $\mathbb{Z}_p[x]/(f(x))$ are given as in $\mathbb{Z}_p[x]$, followed by a reduction modulo $f(x)$. Then $\mathbb{Z}_p[x]/(f(x))$ with these operations is a field having $p^m$ elements, called the Galois field $GF(p^m)$. Note that for $p$ prime and $m \ge 1$, the Galois field $GF(p^m)$ is unique up to isomorphism.
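To make the construction concrete, the following Python sketch implements addition and multiplication in $\mathbb{Z}_p[x]/(f(x))$, storing each polynomial as a tuple of coefficients, lowest degree first. It assumes that $f$ is monic and irreducible; the function names are hypothetical. For $GF(2^2)$ it uses $f(x) = x^2 + x + 1$, the only irreducible polynomial of degree 2 over $\mathbb{Z}_2$; no particular element encoding (such as the one in Table 2.1) is assumed.

```python
from itertools import product


def poly_mod(a, f, p):
    """Reduce the polynomial a over Z_p modulo the monic polynomial f.

    Polynomials are coefficient lists, lowest degree first. The result is a
    tuple of exactly m = deg(f) coefficients, i.e. an element of Z_p[x]/(f(x)).
    """
    m = len(f) - 1
    a = [c % p for c in a] + [0] * m      # pad so that short inputs have m slots
    for d in range(len(a) - 1, m - 1, -1):
        c = a[d]
        if c:                              # cancel the x^d term: a -= c * x^(d-m) * f
            for i, fc in enumerate(f):
                a[d - m + i] = (a[d - m + i] - c * fc) % p
    return tuple(a[:m])


def gf_add(a, b, p):
    """Addition in GF(p^m): coefficient-wise addition modulo p."""
    return tuple((x + y) % p for x, y in zip(a, b))


def gf_mul(a, b, f, p):
    """Multiplication in GF(p^m): multiply in Z_p[x], then reduce modulo f(x)."""
    prod = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            prod[i + j] += x * y
    return poly_mod(prod, f, p)


def gf_elements(p, m):
    """The p^m polynomials of degree at most m - 1, as coefficient tuples."""
    return list(product(range(p), repeat=m))


# GF(2^2) built from f(x) = x^2 + x + 1 (coefficients lowest degree first).
F4 = [1, 1, 1]
x = (0, 1)                          # the polynomial x
print(gf_mul(x, x, F4, 2))          # (1, 1), i.e. x * x = 1 + x
print(gf_add((1, 1), (0, 1), 2))    # (1, 0), i.e. (1 + x) + x = 1
```

Since $f$ is irreducible, every nonzero element has a multiplicative inverse, which is what makes the quotient ring a field; the test below checks this for $GF(2^2)$ and, as a second example, for $GF(3^2)$ built from $x^2 + 1$.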

1.2 Digital Image Steganography
The problem of interest in Chapter 2 is digital image steganography. This section recalls the concept of digital images, the basic model of digital image steganography and some parameters to determine the efficiency of digital image steganography, and lastly re-presents results that are further developed and used in Chapter 2, such as the fastest optimal parity assignment (FOPA) method, the module method and the concept of the maximal secret data ratio (MSDR) [18, 20, 21, 39, 49, 50, 51, 53, 61, 63, 65, 76, 78, 104].

A digital image is a matrix of pixels. Each pixel is represented by a nonnegative integer in the form of a string of binary bits. This value indicates the colour of the pixel [39].
Based on how the colours of pixels are represented, digital images can be divided into the following types [78].
1. Binary image: Each pixel is represented by one bit. In this image type, the colour of a pixel is white (value "1") or black (value "0").
2. Gray image: Each pixel is typically represented by eight bits (an 8-bit gray image). The colour of any pixel is then a shade of gray, from black, corresponding to colour value "0", to white, corresponding to colour value "255".
3. Red green blue image: Each pixel is usually represented by a string of 24 bits (a 24-bit RGB image), where the first 8 bits, the next 8 bits and the last 8 bits correspond to shades of red, green and blue, specifying the red, green and blue colour components of the pixel, respectively. The colour of the pixel is then a combination of these three components.
4. Palette image: The colour of each pixel is not given directly by the number representing the pixel, as it is for RGB images. Instead, this number is the index of the pixel's colour in the colour table (the palette), an ordered set of values (strings of 24 bits, as in RGB images) representing all colours used in the image and stored in the file together with the image. The size of the palette is determined by the length of the bit string representing a pixel (a $k$-bit index can address at most $2^k$ colours), and this length is limited to 8 bits. Palette images whose pixels are represented by strings of 8 bits are called 8-bit palette images.
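The difference between RGB and palette images can be illustrated with a few bit operations. The following Python sketch assumes, as in item 3, that the most significant 8 bits of a 24-bit pixel hold red and the least significant 8 bits hold blue; the function names and the two-colour palette are hypothetical.

```python
def rgb_components(pixel: int) -> tuple:
    """Split a 24-bit RGB pixel into its (red, green, blue) components.

    Assumes the first (most significant) 8 bits hold red, the next 8 bits
    green and the last 8 bits blue.
    """
    return (pixel >> 16) & 0xFF, (pixel >> 8) & 0xFF, pixel & 0xFF


def palette_colour(index: int, palette: list) -> tuple:
    """In a palette image the pixel value is only an index into the palette;
    the actual colour is the 24-bit palette entry at that index."""
    return rgb_components(palette[index])


def max_palette_size(bits_per_pixel: int) -> int:
    """A k-bit colour index can address at most 2**k palette entries."""
    return 2 ** bits_per_pixel


print(rgb_components(0xFF8000))                  # (255, 128, 0)
print(palette_colour(1, [0x000000, 0xFFFFFF]))   # (255, 255, 255)
print(max_palette_size(8))                       # 256
```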
The objective of digital image steganography is to protect data by hiding them in a digital image well enough that unauthorized users will not even be aware of their existence [21, 18]. Figure 1.4 shows the basic model of digital image steganography, where the cover image is a digital image used as a carrier in which secret data are embedded, and the stego image is the digital image obtained after embedding secret data into the cover image by the function block Embed with the secret key on the Sender side. For steganography in general, the secret data need to be fully extracted by the block Extract with the secret key on the Receiver side [20, 61, 63, 76].
The total number of bits of the secret data sequence embedded in the cover image is called the Payload. For a given Payload, the embedding rate (ER) is used to measure the embedding capacity of the cover image and is defined as follows [104]:
$$ER = \frac{Payload}{W \times H} \ \text{(bpp)}, \tag{1.1}$$
where $W$ and $H$ are the cover image’s width and height.

Figure 1.4. The basic diagram of digital image steganography
The peak signal-to-noise ratio (PSNR) is used to evaluate the quality of the stego image. The PSNR value indicates the degree of similarity between the cover image and the stego image: the higher the PSNR, the higher the quality of the stego image, and the lower the PSNR, the lower its quality. In general, for a digital image, PSNR is defined by the following formula [20, 53]:
$$\mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}}\ \text{(dB)}, \qquad (1.2)$$
where
$$\mathrm{MSE} = \frac{1}{3WH} \sum_{i=0}^{W-1} \sum_{j=0}^{H-1} \Big[ \big(B(i,j) - B'(i,j)\big)^2 + \big(G(i,j) - G'(i,j)\big)^2 + \big(R(i,j) - R'(i,j)\big)^2 \Big],$$
where $B(i,j)$, $G(i,j)$, $R(i,j)$ and $B'(i,j)$, $G'(i,j)$, $R'(i,j)$ are the colour values of the Blue, Green and Red components of the pixel at position $(i,j)$ in the cover image and the stego image, respectively. For the human eye, the threshold PSNR value is 30 dB [20, 53, 65, 104]: when the PSNR is higher than 30 dB, it is hard to distinguish between the cover image and its stego image.
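A direct Python transcription of (1.2), assuming each image is given as a list of rows of (R, G, B) tuples with 8-bit components; the function names `mse` and `psnr` are illustrative.

```python
import math


def mse(cover, stego):
    """MSE of (1.2): mean squared error over the R, G and B components.

    cover, stego: equal-size 2-D lists (rows of pixels) of (R, G, B) tuples.
    """
    height, width = len(cover), len(cover[0])
    total = 0
    for cover_row, stego_row in zip(cover, stego):
        for (r, g, b), (r2, g2, b2) in zip(cover_row, stego_row):
            total += (b - b2) ** 2 + (g - g2) ** 2 + (r - r2) ** 2
    return total / (3 * width * height)


def psnr(cover, stego):
    """PSNR in dB; infinite when the two images are identical."""
    error = mse(cover, stego)
    if error == 0:
        return math.inf
    return 10 * math.log10(255 ** 2 / error)


# A 2 x 2 grey cover image; the stego image differs by 1 in one blue value.
cover = [[(100, 100, 100), (100, 100, 100)], [(100, 100, 100), (100, 100, 100)]]
stego = [[(100, 100, 101), (100, 100, 100)], [(100, 100, 100), (100, 100, 100)]]
print(round(psnr(cover, stego), 2))  # 58.92, well above the 30 dB threshold
```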
Let $G$ be a palette image and $P = \{c_1, c_2, \ldots, c_n\}$ be its palette, where $c_i$ is the colour of a pixel of $G$ corresponding to the colour index $i$. Each colour $c$ in $P$ is considered as a vector consisting of red, green and blue components. Suppose $d$ is a distance function on $P$. The FOPA method [50] seeks functions $\mathrm{Next}: P \to P$ and $\mathrm{Val}: P \to \mathbb{Z}_2$ such that the following two conditions are satisfied for all $c \in P$.
1. $d(c, \mathrm{Next}(c)) = \min_{v \in P,\, v \neq c} d(c, v)$,
2. $\mathrm{Val}(c) = \mathrm{Val}(\mathrm{Next}(c)) + 1$ in the field $\mathbb{Z}_2$.
Thus replacing a colour $c$ by $\mathrm{Next}(c)$ flips the bit $\mathrm{Val}(c)$ while moving to a closest possible colour.
Let $G_P = (V_P, E_P)$ be the weighted complete undirected graph of the palette image $G$, where $V_P = P$ and the weight of the edge $\{c, c'\}$ is $d(c, c')$. The function $\mathrm{Nearest}: P \to P$ is given by $\mathrm{Nearest}(c) = c'$, where $d(c, c') = \min_{v \in P,\, v \neq c} d(c, v)$. A rho forest $F = (V, E)$ is a directed graph with vertices weighted by the function Val, where $V = V_P$, $E$ is the set of all arcs $(v, \mathrm{Next}(v))$, and each vertex $v \in V$ has the weight $\mathrm{Val}(v)$. The essence of the FOPA method is an algorithm that constructs $F$.
Algorithm for FOPA:
Input: A weighted complete undirected graph $G_P$, the function Nearest.
Output: A rho forest $F = (V, E)$.
Choose a vertex $c \in P$, set $V = \{c\}$ and $C = P \setminus \{c\}$; set $\mathrm{Val}(c) = 0$; // Or 1 randomly
While ($C$ is not empty) // Update $F$
{
a) Take one element $v \in C$;
b) Initialize $v_0 = v$ and set $\mathrm{Val}(v_0) = 0$ (or 1 randomly). By a finite loop, find a longest sequence of $k + 1$ different elements of $P$, $v_0, v_1, \ldots, v_k$, such that $\mathrm{Nearest}(v_i) = v_{i+1}$ and $v_i \in C$ for all $i = 0, \ldots, k-1$, and $v_k \in C$ or $v_k \in V$; set $\mathrm{Next}(v_i) = v_{i+1}$, $i = 0, \ldots, k-1$;
b1) Case $v_k \in C$: set $\mathrm{Val}(v_i) = 1 + \mathrm{Val}(v_{i-1})$, $i = 1, \ldots, k$, and $\mathrm{Next}(v_k) = v_{k-1}$;
set $V = V \cup \{v_0, v_1, \ldots, v_k\}$ and $C = C \setminus \{v_0, v_1, \ldots, v_k\}$;
b2) Case $v_k \in V$ ($v_k$ is already a vertex of $F$): set $\mathrm{Val}(v_i) = 1 + \mathrm{Val}(v_{i+1})$, $i = k-1, \ldots, 1, 0$;
set $V = V \cup \{v_0, v_1, \ldots, v_{k-1}\}$ and $C = C \setminus \{v_0, v_1, \ldots, v_{k-1}\}$;
}
Return $F$;
End.
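The following Python sketch builds a rho forest satisfying the two FOPA conditions. It assumes $d$ is the squared Euclidean distance in RGB space (which selects the same nearest colours as the Euclidean distance), always starts a chain with Val = 0 instead of a random bit, and, unlike the pseudocode, starts from an empty forest so that every colour, including the first, receives a Next value. The names `fopa` and `sq_dist` are illustrative, not taken from [50].

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two RGB colours (assumed choice of d)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def fopa(palette):
    """Sketch of a rho-forest construction in the spirit of FOPA.

    palette: list of at least two distinct (R, G, B) tuples.
    Returns (nxt, val): nxt[i] is the index of Next(c_i), val[i] = Val(c_i) in {0, 1}.
    """
    n = len(palette)
    # Nearest(c_i): index of a closest other colour (lowest index on ties)
    nearest = [min((j for j in range(n) if j != i),
                   key=lambda j: sq_dist(palette[i], palette[j]))
               for i in range(n)]
    nxt, val = [None] * n, [None] * n
    in_v = [False] * n  # True once a colour is a vertex of the forest (set V)
    for v0 in range(n):  # step a): take an element that is still in C
        if in_v[v0]:
            continue
        seq = [v0]
        while True:  # step b): follow Nearest as far as possible
            w = nearest[seq[-1]]
            if in_v[w]:  # the chain reaches the forest: case b2, w is v_k
                seq.append(w)
                break
            if w in seq:  # the chain closes on itself: case b1
                break
            seq.append(w)
        k = len(seq) - 1
        for i in range(k):
            nxt[seq[i]] = seq[i + 1]  # Next(v_i) = v_{i+1}
        if in_v[seq[k]]:  # case b2: propagate values back from v_k
            for i in range(k - 1, -1, -1):
                val[seq[i]] = (val[seq[i + 1]] + 1) % 2
            added = seq[:-1]
        else:  # case b1: alternate values from v_0; v_k points back to v_{k-1}
            nxt[seq[k]] = seq[k - 1]
            for i, v in enumerate(seq):
                val[v] = i % 2
            added = seq
        for v in added:
            in_v[v] = True
    return nxt, val


print(fopa([(0, 0, 0), (1, 0, 0), (3, 0, 0), (6, 0, 0), (20, 0, 0)]))
```

Setting $\mathrm{Next}(v_k) = v_{k-1}$ in case b1 still respects condition 1: along a Nearest chain the distances $d(v_i, v_{i+1})$ never increase, which forces $d(v_k, v_{k-1})$ to equal the minimum distance from $v_k$.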
Definition 1.3 ([51]). Let $M$ be a module over the ring $\mathbb{Z}_m$, $k > 0$ be a natural number, and $U$ be a subset of $M \setminus \{0\}$. Call $U$ a $k$-base of $M$ if for any $v$ in $M \setminus \{0\}$, there exist $t$ elements $v_1, v_2, \ldots, v_t \in U$, $t \le k$, together with $a_1, a_2, \ldots, a_t \in \mathbb{Z}_m$ such that $v = v_1 a_1 + v_2 a_2 + \cdots + v_t a_t$.
Let $G$ be a digital image and let $C_G$ be the set of all colours of pixels in $G$. Consider the case $m = 2$ and $G$ a binary image. Then $C_G = \{0, 1\}$, and for a positive integer $n$, the set $M = \mathbb{Z}_2^n = \{(x_1, x_2, \ldots, x_n) \mid x_i \in \mathbb{Z}_2, i = 1, \ldots, n\}$, with element addition and scalar multiplication defined as usual, is a module over the ring $\mathbb{Z}_2$ [49]. For $k = 1$, the set $U = M \setminus \{0\}$ is the unique 1-base of $M$ [51]. Two functions $\mathrm{Next}: C_G \to C_G$ and $\mathrm{Val}: C_G \to \mathbb{Z}_2$ satisfying $\mathrm{Val}(c) = \mathrm{Val}(\mathrm{Next}(c)) + 1$ in the ring $\mathbb{Z}_2$ are defined in [49]. Suppose that $N \ge |U|$, $I = \{I_1, I_2, \ldots, I_N\}$ is an arbitrary image block of $G$, $K = \{K_1, K_2, \ldots, K_N \mid K_i \in \mathbb{Z}_2, i = 1, \ldots, N\}$ is a secret key, $d$ is any element of $M$, and $h$ is a surjective function from $I$ to $U$. In the module method, $d$ is the secret data, which is embedded in and extracted from the image block $I$ with the key $K$ by the blocks Embed and Extract as follows [49, 51].


The block Embed (embedding $d$ in $I$):
Step 1) Compute $m = \sum_{i=1}^{N} h(I_i)\,(\mathrm{Val}(I_i) + K_i)$;
Step 2) Case $d = m$: keep $I$ intact;
Case $d \neq m$: find $v \in U$ such that $d + (-m) = v$. Based on $v$ and $h$, determine an element $I_i$ of $I$ with $h(I_i) = v$ (it exists because $h$ is surjective). Then change $I_i$ to $I'_i = \mathrm{Next}(I_i)$;
Return $I'$;
The block Extract (extracting $d$ from $I'$): $d = \sum_{i=1}^{N} h(I'_i)\,(\mathrm{Val}(I'_i) + K_i)$;
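To make the Embed and Extract blocks concrete, here is a minimal Python sketch for a binary image with $n$-bit secret data and a block of $N = 2^n - 1$ pixels. The concrete Next, Val and h are assumptions for illustration only (the source defers them to [49]): Next flips a pixel, Val is the pixel value, and h maps the pixel at 0-based position i to the vector whose binary expansion is i + 1. Vectors of $\mathbb{Z}_2^n$ are stored as n-bit integers, so vector addition is XOR and $-m = m$.

```python
# Hypothetical choices for a binary image; the source takes Next, Val and h from [49].
def val(c):
    """Val(c) = c, so Val(c) = Val(Next(c)) + 1 holds in Z_2."""
    return c


def next_colour(c):
    """Next(c) flips the binary pixel."""
    return 1 - c


def h(i):
    """Surjection from positions 0..2**n - 2 onto U = Z_2^n minus {0} (n-bit ints)."""
    return i + 1


def weighted_sum(block, key):
    """Sum of h(I_i) * (Val(I_i) + K_i) in Z_2^n, where vector addition is XOR."""
    m = 0
    for i, (c, k) in enumerate(zip(block, key)):
        if (val(c) + k) % 2 == 1:
            m ^= h(i)
    return m


def embed(block, key, d):
    """Embed the n-bit secret d into a block of N = 2**n - 1 pixels, changing at most one."""
    v = d ^ weighted_sum(block, key)  # v = d + (-m); in Z_2^n, -m = m
    stego = list(block)
    if v != 0:  # case d != m: change the pixel whose h-value is v
        i = v - 1  # inverse of h
        stego[i] = next_colour(stego[i])
    return stego


def extract(stego, key):
    return weighted_sum(stego, key)


block, key = [0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 0, 1]
print(extract(embed(block, key, 0b011), key))  # 3
```

With this choice of h, at most one pixel changes per block, which matches $\mathrm{MSDR}_1(2^n - 1) = n$ from (1.3) with $q_{colour} = 1$.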

Definition 1.4 ([49]). $\mathrm{MSDR}_k(N)$ is the largest number of secret data bits that can be embedded in an image block of $N$ pixels by changing the colours of at most $k$ pixels in the block, where $k$ and $N$ are positive integers.
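Let $q_{colour}$ be a positive integer denoting the number of different ways to change the colour of each pixel in an arbitrary image block of $N$ pixels. According to [49],
$$\mathrm{MSDR}_k(N) = \left\lfloor \log_2\left(1 + q_{colour} C_N^1 + q_{colour}^2 C_N^2 + \cdots + q_{colour}^k C_N^k\right) \right\rfloor. \qquad (1.3)$$
The argument of the logarithm counts the distinct ways to modify the block: leave it unchanged, or choose $j \le k$ pixels ($C_N^j$ ways) and one of the $q_{colour}$ new colours for each of them ($q_{colour}^j$ ways).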
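Formula (1.3) can be evaluated exactly with integer arithmetic. A sketch, where the function name `msdr` is hypothetical:

```python
from math import comb


def msdr(k: int, n_pixels: int, q_colour: int) -> int:
    """MSDR_k(N) from (1.3): floor(log2(sum over j = 0..k of q^j * C(N, j)))."""
    ways = sum(q_colour ** j * comb(n_pixels, j) for j in range(k + 1))  # j = 0 gives 1
    return ways.bit_length() - 1  # exact floor(log2(ways)) for a positive integer


print(msdr(1, 7, 1))  # binary block of 7 pixels, one change: 3 bits
print(msdr(2, 7, 1))  # 1 + 7 + 21 = 29 ways, so 4 bits
```

Using `bit_length` avoids floating-point rounding when the sum is an exact power of two.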

1.3 Exact Pattern Matching
This section restates the exact pattern matching problem and recalls the concept of the degree of fuzziness (appearance) used in Chapter 3 [24, 52, 68].
Let $x$ be a string of length $n$. For $1 \le i \le j \le n$, denote the substring $x[i]x[i+1]\ldots x[j]$ of $x$ by $x[i..j]$ and the $i$th element of $x$ by $x[i]$, and call $i$ a position in $x$. Let $p$ be a substring of $x$ of length $m$, where $m$ is a positive integer; then there exists $i$ with $1 \le i \le n - m + 1$ such that $p = x[i..i+m-1]$. We say that $i$ is an occurrence of $p$ in $x$, or that $p$ occurs in $x$ at position $i$.
Definition 1.5 ([68]). Let $p$ be a pattern of length $m$ and $x$ be a text of length $n$ over the alphabet $\Sigma$. Then the exact pattern matching problem is to find all occurrences of the pattern $p$ in $x$.
The following example uses the Brute Force (BF) algorithm [24] to demonstrate the most basic way of solving this problem.
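Table 1.2. The steps performed by the BF algorithm

x:        d f a h f k f a h a   Result
Step 1:   f a h                 mismatch at p[1] (f ≠ d)
Step 2:     f a h               match, occurrence at position 2
Step 3:       f a h             mismatch at p[1] (f ≠ a)
Step 4:         f a h           mismatch at p[1] (f ≠ h)
Step 5:           f a h         mismatch at p[2] (a ≠ k)
Step 6:             f a h       mismatch at p[1] (f ≠ k)
Step 7:               f a h     match, occurrence at position 7
Step 8:                 f a h   mismatch at p[1] (f ≠ a)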


Example 1.4. Let $p = fah$ and $x = dfahfkfaha$. Then $p$ occurs twice in $x$, at positions 2 and 7, since $x[2..4] = x[7..9] = fah$. The BF algorithm performs the steps presented in Table 1.2: at each step, the letters of the pattern are compared with the aligned letters of the text from left to right until a mismatch occurs or the whole pattern matches. Many letters of the text are scanned more than once by the BF algorithm, because after each mismatch or match the pattern is shifted only one position to the right.
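A minimal Python version of the BF algorithm of Example 1.4, returning 1-based positions as in Definition 1.5 (the name `brute_force` is illustrative):

```python
def brute_force(p, x):
    """Return all 1-based occurrences of pattern p in text x."""
    m, n = len(p), len(x)
    occurrences = []
    for i in range(n - m + 1):  # align p with x[i+1..i+m] (1-based)
        j = 0
        while j < m and p[j] == x[i + j]:  # compare letters left to right
            j += 1
        if j == m:  # every letter matched
            occurrences.append(i + 1)
        # after a match or a mismatch, p is shifted right by exactly one position
    return occurrences


print(brute_force("fah", "dfahfkfaha"))  # [2, 7]
```

In the worst case this performs about $(n - m + 1)m$ letter comparisons, which is the repeated scanning noted in Example 1.4.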
Chapter 3 uses the degree of fuzziness from [52] to determine the longest prefix of the pattern that ends at any given position of the text. However, this terminology may mislead readers, so throughout this dissertation the degree of fuzziness is replaced by the degree of appearance, whose definition is restated as follows.
Definition 1.6 ([52]). Let $p$ be a pattern and $x$ be a text of length $n$ over the alphabet $\Sigma$. Then for each $1 \le i \le n$, the degree of appearance of $p$ in $x$ at position $i$ is the length of a longest substring of $x$ that is a prefix of $p$ and whose right end letter is $x[i]$.
Notice that if the degree of appearance of $p$ in $x$ at a position $i$ equals $|p|$, then $p$ occurs in $x$ at position $i - |p| + 1$. Figure 1.5 illustrates the concept of the degree of appearance of the pattern $p$ in $x$.

The degree of appearance of p in x at the position being scanned (the letter a) is equal to 4:
x:  .... c d b c b a d b c c ....
p:           b c b a c
(the aligned part b c b a is a prefix of p)
Figure 1.5. The degree of appearance of the pattern p
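The degree of appearance at every position can be computed in one left-to-right pass. The sketch below uses a standard technique, the Knuth-Morris-Pratt failure function, rather than anything specific to [52]; its state after reading $x[i]$ is exactly the length of the longest prefix of $p$ ending at $x[i]$. The function name is illustrative.

```python
def degrees_of_appearance(p, x):
    """Return [deg_1, ..., deg_n]: the degree of appearance of p in x at each position."""
    m = len(p)
    # fail[q]: length of the longest proper prefix of p[:q] that is also a suffix of it
    fail = [0] * (m + 1)
    k = 0
    for q in range(1, m):
        while k > 0 and p[q] != p[k]:
            k = fail[k]
        if p[q] == p[k]:
            k += 1
        fail[q + 1] = k
    degrees, q = [], 0
    for ch in x:
        if q == m:  # a full occurrence just ended; fall back before extending
            q = fail[m]
        while q > 0 and ch != p[q]:
            q = fail[q]
        if ch == p[q]:
            q += 1
        degrees.append(q)
    return degrees


print(degrees_of_appearance("bcbac", "cdbcbadbcc"))  # [0, 0, 1, 2, 3, 4, 0, 1, 2, 0]
```

The value 4 at the sixth letter (the a) reproduces Figure 1.5, and positions with degree $|p|$ yield the occurrences $i - |p| + 1$ noted above.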

1.4 Longest Common Subsequence
This section recalls the longest common subsequence (LCS) problem and the Knapsack Shaking approach to this problem, which is developed further in Chapter 4 [24, 47, 94, 101].
Definition 1.7 ([101]). Let $p$ be a string of length $m$ and $u$ be a string over the alphabet $\Sigma$. Then $u$ is a subsequence of $p$ if there exists an integer sequence $j_1, j_2, \ldots, j_t$ such that $1 \le j_1 < j_2 < \cdots < j_t \le m$ and $u = p[j_1]p[j_2]\ldots p[j_t]$.

Definition 1.8 ([101]). Let $u$, $p$ and $x$ be strings over the alphabet $\Sigma$. Then $u$ is a common subsequence of $p$ and $x$ if $u$ is a subsequence of both $p$ and $x$.
Definition 1.9 ([101]). Let $u$, $p$ and $x$ be strings over the alphabet $\Sigma$. Then $u$ is a longest common subsequence of $p$ and $x$ if the following two conditions are satisfied.
(i) $u$ is a common subsequence of $p$ and $x$,
(ii) there is no common subsequence $v$ of $p$ and $x$ such that $|v| > |u|$.



Denote an arbitrary longest common subsequence of $p$ and $x$ by $\mathrm{LCS}(p, x)$, and its length by $\mathrm{lcs}(p, x)$. By convention, if $p$ and $x$ do not have any nonempty common subsequence, then $\mathrm{lcs}(p, x)$ is taken to be 0.
Example 1.5. Let $p = bgcadb$ and $x = abhcbad$. Then the string $bcad$ is an $\mathrm{LCS}(p, x)$ and $\mathrm{lcs}(p, x) = 4$.
Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. The longest common subsequence problem for two strings (LCS problem) can be stated in the following two forms [24, 47].
Problem 1. Find a longest common subsequence of $p$ and $x$.
Problem 2. Compute the length of a longest common subsequence of $p$ and $x$.
A simple way to solve the LCS problem is to use the algorithm introduced by Wagner and Fischer in 1974 (called the Algorithm WF). This algorithm defines a dynamic programming matrix $L(m, n)$ recursively to find an $\mathrm{LCS}(p, x)$ and compute $\mathrm{lcs}(p, x)$ as follows [94]:
$$L(i, j) = \begin{cases} 0 & \text{if } i = 0 \text{ or } j = 0, \\ L(i-1, j-1) + 1 & \text{if } p[i] = x[j], \\ \max\{L(i, j-1),\ L(i-1, j)\} & \text{otherwise}, \end{cases}$$
where $L(i, j) = \mathrm{lcs}(p[1..i], x[1..j])$ for $1 \le i \le m$, $1 \le j \le n$.

Example 1.6. Let $p = bgcadb$ and $x = abhcbad$. Using the Algorithm WF, the matrix $L(m, n)$ shown in Table 1.3 is obtained, so $\mathrm{lcs}(p, x) = L(6, 7) = 4$. In Table 1.3, the traceback procedure, starting from the value 4 and going back to the value 1, finds the $\mathrm{LCS}(p, x)$ $bcad$.
Table 1.3. The dynamic programming matrix L (column i corresponds to p[i], row j to x[j])

          i:  0  1  2  3  4  5  6
       p[i]:     b  g  c  a  d  b
j  x[j]
0             0  0  0  0  0  0  0
1  a          0  0  0  0  1  1  1
2  b          0  1  1  1  1  1  2
3  h          0  1  1  1  1  1  2
4  c          0  1  1  2  2  2  2
5  b          0  1  1  2  2  2  3
6  a          0  1  1  2  3  3  3
7  d          0  1  1  2  3  4  4
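The recurrence and the traceback of Example 1.6 can be sketched in Python as follows. Strings are 0-based in Python, so $p[i]$ in the text corresponds to `p[i - 1]` in the code. When both neighbours hold the same value, this traceback prefers decreasing $i$; other tie-breaking rules may return a different LCS of the same length. The function name is illustrative.

```python
def lcs_wf(p, x):
    """Algorithm WF: return (lcs(p, x), one LCS(p, x), matrix L)."""
    m, n = len(p), len(x)
    L = [[0] * (n + 1) for _ in range(m + 1)]  # L[i][j] = lcs(p[1..i], x[1..j])
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if p[i - 1] == x[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i][j - 1], L[i - 1][j])
    # Traceback from L[m][n]: take the letter on a match,
    # otherwise move towards the neighbour holding the larger value.
    letters, i, j = [], m, n
    while i > 0 and j > 0:
        if p[i - 1] == x[j - 1]:
            letters.append(p[i - 1])
            i, j = i - 1, j - 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return L[m][n], "".join(reversed(letters)), L


length, lcs, L = lcs_wf("bgcadb", "abhcbad")
print(length, lcs)  # 4 bcad
```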

Definition 1.10 ([47]). Let $u = p[j_1]p[j_2]\ldots p[j_t]$ be a subsequence of $p$. Then the tuple $(j_1, j_2, \ldots, j_t)$ is called a location of $u$ in $p$.

By Definition 1.10, every subsequence $u$ of $p$ has at least one location in $p$. If all the different locations of $u$ are arranged in dictionary order, the least element is called the leftmost location of $u$, denoted by $\mathrm{LeftID}(u)$. The last component of $\mathrm{LeftID}(u)$ is denoted by $\mathrm{Rmp}(u)$ [47].



Example 1.7. Let $p = aabcadabcd$ and $u = abd$. Then $u$ is a subsequence of $p$ and has eight different locations in $p$; in dictionary order they are
$(1, 3, 6), (1, 3, 10), (1, 8, 10), (2, 3, 6), (2, 3, 10), (2, 8, 10), (5, 8, 10), (7, 8, 10)$.
It follows that $\mathrm{LeftID}(u) = (1, 3, 6)$ and $\mathrm{Rmp}(u) = 6$.
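The following sketch enumerates all locations of $u$ in $p$ by brute force and, separately, computes $\mathrm{LeftID}(u)$ by greedy leftmost matching, which yields the least location in dictionary order. The helper names `locations`, `left_id` and `rmp` are hypothetical.

```python
from itertools import combinations


def locations(u, p):
    """All locations (1-based index tuples) of u in p, in dictionary order."""
    return [tuple(j + 1 for j in idx)
            for idx in combinations(range(len(p)), len(u))  # lexicographic order
            if all(p[j] == c for j, c in zip(idx, u))]


def left_id(u, p):
    """LeftID(u): match each letter of u at its earliest possible index; None if absent."""
    loc, start = [], 0
    for c in u:
        k = p.find(c, start)
        if k < 0:
            return None
        loc.append(k + 1)
        start = k + 1
    return tuple(loc)


def rmp(u, p):
    """Rmp(u): last component of LeftID(u), for a nonempty subsequence u of p."""
    return left_id(u, p)[-1]


print(len(locations("abd", "aabcadabcd")))  # 8
print(left_id("abd", "aabcadabcd"), rmp("abd", "aabcadabcd"))  # (1, 3, 6) 6
```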
Definition 1.11 ([47]). Let $p$ be a string of length $m$. A configuration $C$ of $p$ is defined as follows.
1. Either $C$ is the empty set; then $C$ is called the empty configuration of $p$, denoted by $C_0$.
2. Or $C = \{x_1, x_2, \ldots, x_t\}$ is an ordered set of $t$ subsequences of $p$, $1 \le t \le m$, such that the following two conditions are satisfied.
(i) $\forall i,\ 1 \le i \le t$: $|x_i| = i$,
(ii) $\forall x_i, x_j \in C$: if $|x_i| > |x_j|$, then $\mathrm{Rmp}(x_i) > \mathrm{Rmp}(x_j)$.
The set of all configurations of $p$ is denoted by $\mathrm{Config}(p)$.
Definition 1.12 ([47]). Let $p$ be a string of length $m$ over the alphabet $\Sigma$, $C \in \mathrm{Config}(p)$ and $a \in \Sigma$. Then the state transition function $\varphi$ on $\mathrm{Config}(p)$,
$$\varphi: \mathrm{Config}(p) \times \Sigma \to \mathrm{Config}(p),$$
is defined as follows.
1. $\varphi(C, a) = C$ if $a \notin p$.
2. $\varphi(C_0, a) = \{a\}$ if $a \in p$.
3. Set $C' = \varphi(C, a)$. Suppose $a \in p$ and $C = \{x_1, x_2, \ldots, x_t\}$ for $1 \le t \le m$. Then $C'$ is determined by a loop whose control variable $i$ runs from $t$ down to 0:
a) for $i = t$, if the letter $a$ appears in $p$ at a location index greater than $\mathrm{Rmp}(x_t)$, then $x_{t+1} = x_t a$;
b) for $i = t - 1$ down to 1, if the letter $a$ appears in $p$ at a location index in the interval $(\mathrm{Rmp}(x_i), \mathrm{Rmp}(x_{i+1}))$, then $x_{i+1} = x_i a$;
c) for $i = 0$, if the letter $a$ appears in $p$ at a location index smaller than $\mathrm{Rmp}(x_1)$, then $x_1 = a$;
d) $C' = C$.
4. To accept an input string, $\varphi$ is extended to
$$\varphi: \mathrm{Config}(p) \times \Sigma^* \to \mathrm{Config}(p)$$
such that $\varphi(C, as) = \varphi(\varphi(C, a), s)$ and $\varphi(C, \varepsilon) = C$ for all $C \in \mathrm{Config}(p)$, $s \in \Sigma^*$ and $a \in \Sigma$.

Example 1.8. Let $p = bacdabcad$ and $C = \{c, ad, bab\}$. Then $C$ is a configuration of $p$ and $C' = \varphi(C, a) = \{a, ad, ada, baba\}$.
In 2002, P. T. Huy et al. introduced a method that solves Problem 1 using the automaton given in the following theorem, and named it the Knapsack Shaking approach [47].
Theorem 1.1 ([47]). Let $p$ and $x$ be two strings of lengths $m$ and $n$ over the alphabet $\Sigma$, $m \le n$. Let $A_p = (\Sigma, Q, q_0, \varphi, F)$ be the automaton over $\Sigma$ corresponding to $p$, where
the set of states is $Q = \mathrm{Config}(p)$,
the initial state is $q_0 = C_0$,
the transition function $\varphi$ is given as in Definition 1.12,
the set of final states is $F = \{C_n\}$, where $C_n = \varphi(q_0, x)$.
Suppose $C_n = \{x_1, x_2, \ldots, x_t\}$ for $1 \le t \le m$. Then
1. For every common subsequence $u$ of $p$ and $x$, there exists $x_i \in C_n$, $1 \le i \le t$, such that the following two conditions are satisfied.
(i) $|u| = |x_i|$,
(ii) $\mathrm{Rmp}(x_i) \le \mathrm{Rmp}(u)$.
2. An $\mathrm{LCS}(p, x)$ equals $x_t$.
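A compact Python sketch of the automaton $A_p$: a configuration is stored as a list of pairs $(x_i, \mathrm{Rmp}(x_i))$ with 1-based positions, and `phi` implements rules 1 to 3 of Definition 1.12 for one letter. Two facts are assumed here rather than taken from [47]: $\mathrm{Rmp}(x_i a)$ is the first occurrence of $a$ in $p$ after $\mathrm{Rmp}(x_i)$ (this follows from greedy leftmost matching), and checking that first occurrence suffices to decide rule 3b. The function names are illustrative.

```python
def next_pos(p, a, after):
    """Smallest 1-based index greater than `after` at which p has letter a, or None."""
    k = p.find(a, after)  # 0-based start `after` covers exactly the 1-based indices > after
    return k + 1 if k >= 0 else None


def phi(p, C, a):
    """One transition of Definition 1.12; C is a list of pairs (x_i, Rmp(x_i))."""
    if a not in p:  # rule 1
        return C
    if not C:  # rule 2: phi(C_0, a) = {a}
        return [(a, next_pos(p, a, 0))]
    t, new = len(C), list(C)  # rule 3: every test reads the old configuration C
    k = next_pos(p, a, C[t - 1][1])  # a) i = t: x_{t+1} = x_t a
    if k is not None:
        new.append((C[t - 1][0] + a, k))
    for i in range(t - 1, 0, -1):  # b) i = t-1 down to 1: x_{i+1} = x_i a
        k = next_pos(p, a, C[i - 1][1])
        if k is not None and k < C[i][1]:
            new[i] = (C[i - 1][0] + a, k)
    k = next_pos(p, a, 0)  # c) i = 0: x_1 = a
    if k < C[0][1]:
        new[0] = (a, k)
    return new  # d) the updated configuration C'


def knapsack_shaking_lcs(p, x):
    """Run A_p on x from q_0 = C_0 (rule 4) and return x_t, an LCS(p, x)."""
    C = []
    for a in x:
        C = phi(p, C, a)
    return C[-1][0] if C else ""


print(phi("bacdabcad", [("c", 3), ("ad", 4), ("bab", 6)], "a"))  # Example 1.8
print(knapsack_shaking_lcs("bgcadb", "abhcbad"))  # bcad
```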

1.5 Searchable Encryption
This section clarifies the term searchable encryption (SE) and recalls the definition of a cryptosystem; both are studied and used in Chapter 5 [26, 40, 60, 85, 88, 102].
Consider the following problem that arises in cloud security [60, 85, 102]. Cloud tenants, for example enterprises and individuals with limited software and hardware resources, store data containing sensitive information on cloud servers. Assume that these servers cannot be fully trusted: they may not only be curious about the users' information but also abuse the data they receive. Users therefore wish to encrypt their data before uploading them to the servers. Because of the limitations of their own information technology systems, users also wish that cloud providers could perform information search directly on the ciphertexts. However, encryption makes it difficult for servers to search the encrypted data. This raises the problem of finding a solution that satisfies both wishes of cloud users when they choose a cloud storage service.
SE is a way to solve the above problem. It is a system consisting of two main components: a cryptosystem used for encryption and decryption on the cloud users' side, and algorithms for searching the encrypted data, run on the cloud providers' side [40, 102].
In cryptography, SE can be either searchable symmetric encryption (SSE) or searchable asymmetric encryption (SAE). In SSE, only private key holders can create encrypted data and produce trapdoors for search. In SAE, anyone who has the public key can create ciphertexts, but only private key holders can generate trapdoors [26, 102].

Since the dissertation proposes a new symmetric encryption system for SSE in Chapter 5, the correctness of this system needs to be proved. In this dissertation, the components and properties of a cryptosystem defined in [88] serve as the standard against which this is verified. This definition is recalled below.
Definition 1.13 ([88]). A cryptosystem is a five-tuple $(P, C, K, E, D)$ such that the following properties are satisfied.
1. $P$ is a finite set of plaintexts,
2. $C$ is a finite set of ciphertexts,
3. $K$ is a finite set of secret keys,
4. for each $k \in K$, there exist an encryption function $e_k \in E$ and a corresponding decryption function $d_k \in D$, where $e_k: P \to C$ and $d_k: C \to P$ satisfy $d_k(e_k(x)) = x$ for all $x \in P$.
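To make property 4 of Definition 1.13 concrete, the sketch below checks it for the shift cipher on $\mathbb{Z}_{26}$, a textbook example chosen here purely for illustration; it is not the encryption system proposed in Chapter 5, and the names are hypothetical.

```python
# Shift cipher over Z_26, used only to illustrate Definition 1.13.
P = C = K = range(26)  # plaintexts, ciphertexts and keys are all Z_26


def encrypt(k: int, x: int) -> int:
    """e_k: P -> C."""
    return (x + k) % 26


def decrypt(k: int, y: int) -> int:
    """d_k: C -> P."""
    return (y - k) % 26


# Property 4: d_k(e_k(x)) = x for every key k and every plaintext x.
assert all(decrypt(k, encrypt(k, x)) == x for k in K for x in P)
print(encrypt(3, 25), decrypt(3, 2))  # 2 25
```

Because $P$, $C$ and $K$ are finite, property 4 can be verified exhaustively here; a real scheme needs a proof instead, which is what Chapter 5 provides for the proposed system.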
