MINISTRY OF EDUCATION AND TRAINING
THAI NGUYEN UNIVERSITY
________________
CHU DUC TOAN
STUDY ADAPTIVE CONTROL ALGORITHMS WITH
REFERENCE FLOWS TO IMPROVE THE SPEED OF
SPECIALIZED PARALLEL PROCESSING SYSTEMS
Major: Control Engineering and Automation
Code: 62.52.02.16
SUMMARY OF ENGINEERING THESIS
THAI NGUYEN - 2013
This work was completed at: Thai Nguyen University
Scientific supervisors: Assoc. Prof. Dr. Do Xuan Tien
Assoc. Prof. Dr. Nguyen Huu Cong
Reviewer 1:
Reviewer 2:
Reviewer 3:
The thesis will be defended before the university-level examination board,
meeting at …………………………………………………………….
at …… (hour) on …… (date) …… (month) …… (year)
The thesis can be found at: the Library of the Industrial Engineering
University, Thai Nguyen University; the Learning Materials Center of
Thai Nguyen University; and the National Library of Vietnam.
SUMMARY OF THESIS
1. The necessity of the topic
Many new areas such as computer graphics, artificial intelligence,
numerical analysis, parallel computing in the petroleum industry,
unmanned equipment, and equipment for identifying and monitoring mobile
targets require processing very large volumes of data at high speed. For
most of these problems, a sequential computer does not meet the actual
requirements. Research on parallel processing systems currently focuses
on two main directions:
The first is to study multi-processor systems such as supercomputers
[45], [54], mainframes and minicomputers that are built to be versatile:
the hardware structure and the software functions of such a computer
must be organized to be multi-functional, and are therefore complex. The
mathematical models are very complex and go beyond an ordinary computing
structure. Consequently, when applied to specialized applications, their
processing speed is often slower than the capability of the
microprocessors, and the real-time parameters are not controlled
correctly. A large-scale parallel multi-processing system, together with
its very complicated operating software, is also very expensive. This
makes it a difficult way to solve specific problems that require high
speed at a low cost consistent with the economic conditions of Vietnam.
The second is to study parallel multi-CPU processing systems such as
parallel-specialized processing systems built for one specific problem,
or one class of problems, with the same function. The manipulation
methods, the structure of the groups of data to be processed and the
structure of the resulting data are therefore defined in advance, so it
is easier to decompose the functions and to select a data organization
and a processing method appropriate to the speed requirements. With
specific tasks and defined data structures, the optimal processing
algorithm can be designed together with an appropriate hardware
structure that uses the system resources reasonably. Because the
function of a parallel-specialized processing system is limited and
explicit, the Monitor program can be built optimally and with high
scalability and, more importantly, it can respond quickly to the
requirements of the operational processes in the system.
Based on the above analysis, the thesis selects the second direction,
the parallel-specialized multi-CPU processing system. In such a system
the shared storage space (SSS) is very important: it stores the database
to be processed and the operating program. When many reference flows
access the shared memory at the same time, conflicts can arise; the
system may then hang or the access speed may fall, so the performance of
the shared memory is reduced and no longer meets the speed requirements
of the problem. The key element of the SSS is the set of controls for
the reference flows. On that basis, the problem to be solved is to
synthesize an adaptive control structure for the reference flows to the
SSS that minimizes the probability of conflicts when accessing shared
resources and improves the computing speed. From this analysis, the
problem of building a parallel-specialized multi-CPU processing system
that provides fast and reliable processing at a reasonable price is very
necessary, and it forms the basis of the thesis topic: "Study adaptive
control algorithms with reference flows to improve the speed of
specialized parallel processing systems".
2. Object and scope of research
- The object of the thesis is the SSS in a parallel-specialized
multi-CPU processing system.
- The scope of the thesis is limited to building the mathematical model
of the reference flows to the SSS in a parallel-specialized multi-CPU
processing system; specifying the binding conditions between these
parameters and the adjustable parameters; and synthesizing an optimal
(adaptive) control system for the reference flows to the SSS, in order
to improve the efficiency and reduce the probability of conflicts when
accessing shared resources.
3. Research methods of the thesis
- Classical theory, namely queuing theory and probability theory
(stopping Markov processes, the Poisson distribution), is used to build
and calculate the performance of the mathematical model of references to
the SSS in the parallel multi-CPU processing system.
- Describe mathematically the model of the shared memory in the parallel
multi-CPU processing system.
- Study the control of the system by simulation and by practical
implementation with modern FPGA technology.
4. The scientific and practical significance of the thesis
4.1. In science
The scientific significance is to study and apply an optimal (adaptive)
controller for the reference flows to the SSS of a parallel-specialized
multi-CPU processing system, in order to improve the performance and
speed and to minimize the probability of conflicts when accessing shared
resources.
4.2. In practice
The research results will serve as a reference for students, graduate
students and researchers interested in parallel-specialized multi-CPU
processing. The results of this research also form a basis for further
studies aimed at the wide application of parallel-specialized multi-CPU
processing systems in practice in Vietnam, especially for systems with
high speed requirements.
5. The structure of the thesis
The thesis consists of three explanatory chapters, the conclusions, and
the references.
Chapter 1. Architecture of parallel multi-CPU processing
system.
1.1. System Resources
1.1.1. Hardware Resources
1.1.2. Software Resources
1.2. The definition of parallel processing system
1.3. Classification of parallel processing system
- Michael J. Flynn proposed four architectural models of parallel
processing systems: (i) the SISD model, (ii) the SIMD model, (iii) the
MISD model, (iv) the MIMD model.
- Handler classifies parallel processing systems based on the level of
parallelism and the level of pipelined processing in the hardware
structure.
1.4. Overall architecture of parallel multi-CPU processing
system
1.4.1. Model
1.4.2. The issues related to performance
1.5. The architecture of the parallel-specialized multi-CPU processing
system
1.5.1. The characteristics of the parallel-specialized multi-CPU
processing system
a. Specialized function
The specialized function is also reflected in the data structure that
the system must process. This data structure is largely vector data,
because the elements have a similar structure and are arranged in order
(for example, the structure range-azimuth-height), which allows this
data to be vectorized easily. The consequence is that data processing
operations can easily be performed with the pipeline mechanism, a
mechanism that improves the performance of the processing system.
b. The structure of minimal hardware
Because a parallel-specialized processing system performs a defined
task, and this task is established only for one class of problems, the
structural parameters can be determined quite accurately. As a result,
the hardware organization is guaranteed to be minimal, with a standard
partitioning algorithm.
c. The high speed and performance
d. The high reliability
This is a requirement, as well as a characteristic, of the
parallel-specialized processing system. At first sight it seems to
conflict with the requirement for high speed. However, unlike
general-purpose computers, a parallel-specialized processing system is
usually difficult, or even impossible, to maintain (for example,
processing systems mounted on satellites, on self-guided missiles, or in
early-warning systems under the sea ...), so high reliability is
required.
1.5.2. The architecture of the parallel-specialized multi-CPU processing
system
a. Model of the parallel-specialized multi-CPU processing system
b. Factors affecting the performance of the parallel-specialized
multi-CPU processing system
c. Branch instructions
1.6. Commentary and research orientation of the topic
Sections 1.4 and 1.5 analysed and introduced the parallel multi-CPU
processing system and the parallel-specialized multi-CPU processing
system.
For a parallel-specialized multi-CPU processing system, the performance
largely depends on the speed of access to the common resources, and the
most important of these is the SSS, because the possibility of conflict
is highest there (the frequency of using the SSS is much higher than
that of other resources such as I/O ports and peripherals ...). One of
the critical tasks of the synthesis stage of the system is therefore to
minimize the possibility of conflict when the CPU units refer to the
SSS. Consider, for example, a system monitoring aircraft. The aircraft
are (i) at different distances and (ii) flying at very different speeds.
The parameters to be monitored for an aircraft are (i) distance, (ii)
azimuth and (iii) height; once these parameters are tracked, the flight
trajectories can be drawn and further decisions can be made (to destroy
the target or not ...).
- The research situation in Vietnam: the project "Solving the level-1
problem for Radar intelligence information", a scientific research
project at the Department of Defense level, Dr. Nguyen Van Lien
(2008-2012).
Figure 1.14: Observation range of the system (range rings N1 to N1024
within the outermost range ring; surge generator at 375 Hz; pulses
reflected from targets at other ranges; the pulse repetition cycle).
This project shows clearly that all the parameters, namely range,
azimuth and height, must be solved at the same time for 1024 range
rings. However, the project did not consider the SSS.
- The research situation abroad: typical studies by three groups of
authors from 2000 onward continue the study of parallel processing
systems: a study published in 2000 [5]; Baghdadi A. and Zergainoh N. E.
in 2004 [13]; Chou Y., Fahs B. and Abraham S., also in 2004; and the
work of Ken Mai, Ron Ho, Elad Alon, Dean Liu, Dinesh Patil and Mark
Horowitz [39]. However, these studies address large multi-CPU systems
and supercomputers, so they consider an unlimited number of CPUs,
possibly up to thousands of CPUs. Many tight binding parameters are
therefore not specified, and drawing large survey graphs becomes
difficult. In a parallel-specialized multi-CPU processing system the
number of CPUs is not too large and the functional decomposition is very
good.
1.7. Conclusion of Chapter 1
The assessment and analysis in Chapter 1 have solved the following
problems:
- An overview of the parallel multi-processing system and the
parallel-specialized multi-CPU processing system was introduced.
- The object of study was selected: the SSS of the parallel-specialized
multi-CPU processing system.
- The study of the parallel-specialized multi-CPU processing system was
oriented toward modern control methods for the reference flows to the
SSS in the parallel multi-CPU processing system, namely adaptive control
of the reference flows to the SSS in order to minimize the probability
of conflicts when accessing shared resources.
On the basis of these preliminary studies of the parallel-specialized
multi-CPU processing system, Chapter 2 studies the problem further,
analyses it, and builds the mathematical model of references to the SSS.
Chapter 2. Building the mathematical model of references to the shared
memory in the parallel multi-CPU processing system
2.1. Theoretical basis
To build the mathematical model of the optimal control mechanism for the
reference flows in the parallel-specialized multi-CPU processing system,
based on the requirements of the functional processing described in
Chapter 1 for the SSS, the thesis uses:
- Queuing theory, to describe the n reference flows to the SSS with a
queuing mechanism at the entrance/exit.
- Probability theory, namely Markov processes, to describe the uniform
referring mechanism of the n reference flows to the SSS with the
synchronization mechanism in the operation of the parallel-specialized
multi-CPU processing system; that is, the state of the system is
established after each system clock. Moreover, only stopping Markov
processes are used, to state that the future state of the system depends
only on its current state (and not on the previous states).
- The Poisson distribution for the references to the SSS of the
parallel-specialized multi-CPU processing system: the system has a good
functional decomposition, so the time spent on references is much
smaller than the working time of the single CPUs of the system.
2.2. Building the mathematical model of references to the shared memory
in the parallel multi-CPU processing system
2.2.1. The traditional reference model to the shared memory in the
parallel multi-CPU processing system
2.2.2. Building the improved reference model to the shared memory in the
parallel multi-CPU processing system
To build the mathematical model, the thesis starts from the definition
of the performance E, which is defined here as the ratio

E = Nacc / Nacc0

in which Nacc is the total number of successful references and Nacc0 is
the total number of references launched by the system.
Figure 2.1: Referring to the shared memory in a parallel processing
system (multiplexer MUX, control block, shared memory; address channel
and control channel).
If we call E the probability of a successful reference to the SSS, then
to ensure a successful reference we need on average 1/E attempts.
Let P be the conditional probability that the entrance reference
register is unoccupied, and Q = 1 - P. To refer successfully when the
entrance register is unoccupied we need on average 1/El attempts (El is
the performance when the entrance reference register is unoccupied). The
probability that the entrance reference register is occupied is 1 - P,
and to ensure a successful reference in that case we need on average
1/Ep attempts (Ep is the performance when the entrance reference
register is busy). We thus arrive at a problem of conditional
probabilities, with the relationship:
Nacc0 / Nacc = 1/E = P · (1/El) + Q · (1/Ep)

The expression of the performance is rewritten as follows:

E = (El · Ep) / (P · Ep + Q · El)        (2.1)
This is the mathematical model that determines the performance of the
shared-memory architecture with buffers acting as queues at the entrance
and exit of the physical memory modules. To evaluate this model and
expose the control parameters, we need to calculate three components:
(i) P, the probability that the entrance reference register is
unoccupied; (ii) Ep, the performance when the entrance reference
register is occupied; (iii) El, the performance when the entrance
reference register is unoccupied. These quantities are complex and
depend strongly on parameters related to the structure of the system.
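As an illustrative aid only, relation (2.1) can be evaluated directly once the three components are known. The Python sketch below assumes hypothetical values for P, El and Ep; it is not the thesis's software.

```python
def shared_memory_performance(p_free, e_l, e_p):
    """Performance E of the shared memory, equation (2.1).

    p_free : P,  probability that the entrance reference register is unoccupied
    e_l    : El, performance when the entrance reference register is unoccupied
    e_p    : Ep, performance when the entrance reference register is occupied
    """
    q_busy = 1.0 - p_free                      # Q = 1 - P
    return (e_l * e_p) / (p_free * e_p + q_busy * e_l)

# Example with assumed (hypothetical) component values:
P, El, Ep = 0.8, 0.9, 0.4
E = shared_memory_performance(P, El, Ep)
# Consistency check: 1/E = P/El + Q/Ep (average attempts per successful reference)
assert abs(1.0 / E - (P / El + (1 - P) / Ep)) < 1e-12
print(f"E = {E:.3f}")
```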
2.2.2.1. Determining the quantity P, the probability that the entrance
reference register is unoccupied
- To determine P we need to model the entire process by which the CPUs
refer to the shared memory. Based on the queuing theory model [4], [46]
described in Figure 2.2.a, combined with the characteristics of the
parallel-specialized multi-CPU processing system, the architecture of
the shared memory can be regarded as a system of k independent queues of
type M/D/1/m. That means: the reference process to the shared memory is
a Poisson process with the Markov property (M); the service time of the
memory is deterministic (D); there is one service place for references;
and the queue size of each memory module is m.
- The probability that the entrance reference register is unoccupied is
then determined by:
P = (a truncated Poisson-type sum over the queue states n = 0, ..., m,
with terms of the form e^(-λ) λ^(n-i) / (n-i)!; the full expression is
given in the thesis)        (2.15)
2.2.2.2. Identify Ep, the performance when the queue of a memory module
is full
Consider the performance when the queue of the memory modules is full:
when a reference is launched from any single CPU, there is still a
probability that it is served (a reference with a certain probability of
success). We therefore compute Ep for the case in which the memory
module is full; Ep is calculated as follows:
Ep = (a closed-form function of the probability qP, the number of
reference flows n and the timing ratio T/Tb; the full expression is
given in the thesis)        (2.18)
2.2.2.3. Identify El, the performance when the entrance reference
register is unoccupied
Consider the performance when the entrance reference register refers to
unoccupied memory: when a reference is launched, a probability of
failure still exists, so we need to calculate El (the performance when
the entrance reference register refers to unoccupied memory). Each
reference flow is in one of three states: (i) the free state; (ii) the
state in which the reference being executed will be successful; (iii)
the state in which the reference being executed will be unsuccessful.
Suppose the following quantities exist: q, the probability that a free
reference flow initiates a reference, together with the probabilities
that a reference flow stays free, that an executed reference is
successful, and that it is unsuccessful. El is then calculated as
follows:
El = (a closed-form function of q and of these state probabilities; the
full expression is given in the thesis)        (2.25)
2.3. Conclusion of Chapter 2
In Chapter 2, the thesis solved the following issues:
- A mathematical model of references to the SSS of the
parallel-specialized multi-CPU processing system was built, in which the
binding parameters, such as the queue size m and the number b of memory
bandwidths, can be calculated and controlled.
- The mathematical model (2.1) will be used in Chapter 3 to build the
system with adaptive control of the reference flows to the SSS for the
parallel-specialized multi-CPU processing system.
Chapter 3. Simulation and the control model
3.1. Building simulation software
3.1.1. Building the main simulation software module
Figure 3.1: Software interface for calculating the performance of the
multi-CPU processing system
3.1.2. Building the software module calculating the performance of the
multi-CPU processing system in relation to the shared memory cycle Tc
Figure 3.2: Software interface for calculating the performance of the
multi-CPU processing system in relation to the shared memory cycle Tc
3.1.3. Building the software module calculating the performance of the
multi-CPU processing system in relation to the number of reference
flows n
Figure 3.3: Software interface for calculating the performance of the
multi-CPU processing system in relation to the number of reference
flows n
3.1.4. Building the software module calculating the performance of the
multi-CPU processing system in relation to the shared memory cycle Tc
with ρ = 0.5
Figure 3.4: Software interface for calculating the performance of the
multi-CPU processing system in relation to the shared memory cycle Tc
with ρ = 0.5
3.2. Survey and evaluation of the performance of the control model by
simulation
Using the software that was built, the performance of the system was
surveyed according to the relationships established above, giving the
correlation graphs among them. The results are as follows: the logical
memory model is consistent with the results of the scalar simulation and
achieves a performance above 0.6. In particular, when T = Tc = 16 the
simulation gives 0.27 without a queue (m = 0) and a performance of 0.65
when the queue is used.
Figure 3.5: Efficiency of random references to the logical memory
bandwidth as a function of T, compared for m = 2 and for the case
without logical memory bandwidth (Bailey model, m = 0, Tl = Td = 0).
Figure 3.6: The dependence of E on the physical cycle Tc of the memory
modules as m changes
Figure 3.7: E as a function of the number of reference flows n, for
m = 0, 2, 4 and 6. a) Tc = 10, b) Tc = 5
According to the survey results, the performance increases as the queue
size increases. However, we cannot design the buffer with too large a
size, because only a few cycles after data are written a reference may
be required to read those data immediately. The larger the queue size,
the longer the waiting time for writing to memory, and other reference
flows may then read incorrect data.
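To make this trade-off concrete, the toy Monte Carlo sketch below (not the thesis's Delphi simulator) models n reference flows that each issue a reference in a cycle with probability q, addressed to one of k memory modules chosen at random; each module serves one reference every Tc cycles and can buffer at most m waiting references, and rejected references are simply counted as unsuccessful. All parameter values are hypothetical.

```python
import random

def simulate_E(n=16, k=4, m=2, Tc=5, q=0.5, cycles=20_000, seed=7):
    """Toy estimate of E = accepted references / launched references."""
    random.seed(seed)
    backlog = [0] * k    # references waiting or in service at each module
    busy = [0] * k       # remaining service cycles of the reference in service
    launched = accepted = 0
    for _ in range(cycles):
        for _ in range(n):                      # each flow may launch one reference
            if random.random() < q:
                launched += 1
                mod = random.randrange(k)
                if backlog[mod] <= m:           # 1 in service + m places in the FIFO
                    backlog[mod] += 1
                    accepted += 1               # otherwise the reference is lost
        for mod in range(k):                    # deterministic service, Tc cycles each
            if backlog[mod] and busy[mod] == 0:
                busy[mod] = Tc
            if busy[mod]:
                busy[mod] -= 1
                if busy[mod] == 0:
                    backlog[mod] -= 1           # reference completed
    return accepted / launched

for m in (0, 2, 4, 6):
    print("m =", m, " E ~", round(simulate_E(m=m), 3))
```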
Figure 3.8: E as a function of the physical cycle Tc of the memory
modules, for m = 0, 2, 4 and 6, while keeping ρ fixed at 0.5
Consider a system composed of n reference flows and l logical memory
bandwidths, with Tl = 1, m = 4 to 6 and q = 1.0 (full load). To achieve
a performance E > 0.90 with ρ in (0, 1), with m = 2 one needs to choose
ρ < 0.2. Increasing Tl reduces the overall performance, and in order to
maintain the required performance it is necessary to keep ρ < 0.5.
Similar relationships are obtained when the survey is made with Tc
fixed while the number of logical bandwidths and the number of reference
flows change. From the results obtained, based on the relationship
between ρ and E, we can completely define the parameters that satisfy
the requirements of the given problem class.
3.3. Building the adaptive control model
The controller uses FPGA technology, since an FPGA allows the hardware
architecture to be reconfigured by a program.
Figure 3.10: Block diagram of adaptive control for the
parallel-specialized multi-CPU processing system (adaptive controller
and controlled object E = (El · Ep)/(P · Ep + Q · El); required
performance Eyc, output performance Eout, error ∆e, queue size m,
reference-flow density λ, FPGA control block and FPGA)
3.4. FPGA technology
3.4.1. Reconfiguration of the hardware architecture by program
3.4.2. System design on FPGA
3.5. Adaptive control diagram with respect to the parameter m
The simulation clearly shows the quantitative relation between the
performance E and the queue size. However, E also depends on the density
of the reference flow over time (the parameter n is not a constant), so
we need a mechanism that controls the size m of the
queue in relation to the density of the reference flow. This structure
is designed as follows: the queue is regarded as a FIFO structure, and
restructuring the FIFO according to the size parameter m can be done
quickly and easily with FPGA technology [2]. For a FIFO of size m > 1
(Figure 3.16.a), the FPGA is used in the manner shown in Figure 3.16.b.
In this structure, the block of "controlling signals for the FPGA"
essentially evaluates the structure of the reference flow and its
average density over time in order to decide what queue size is optimal.
The actuator is then reprogrammed to restructure the FPGA accordingly.
In this way we approach a system that adapts to the density of the
reference flow.
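The adaptation described above can be summarised as a simple control loop: measure the achieved performance and the reference-flow density over a window, compare the performance with the required value Eyc, and restructure the FIFO in the FPGA when the error is significant. The Python sketch below is a conceptual illustration only; the step size, thresholds and function names are hypothetical and not part of the thesis.

```python
def adapt_queue_size(e_out: float, ref_density: float, m: int,
                     e_required: float = 0.90, m_min: int = 0,
                     m_max: int = 6, deadband: float = 0.02) -> int:
    """One adaptation step for the FIFO size m (conceptual sketch only).

    e_out       -- performance measured over the last observation window
    ref_density -- measured density of the reference flow in that window
    m           -- current FIFO size
    Returns the new FIFO size; whenever it changes, the caller would issue
    the 3-bit control codes of Table 3.2 to restructure the FIFO in the FPGA.
    """
    error = e_required - e_out              # corresponds to ∆e = Eyc - Eout
    if error > deadband and m < m_max:
        return m + 2                        # performance too low: enlarge the FIFO
    if error < -deadband and m > m_min and ref_density < 0.2:
        return m - 2                        # light load and E above target: shrink it
    return m

# Hypothetical usage over successive observation windows:
m = 0
for e_out, density in [(0.55, 0.6), (0.78, 0.6), (0.93, 0.1)]:
    m = adapt_queue_size(e_out, density, m)
print(m)   # final FIFO size after the three windows
```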
To control the FPGA, we need a binary code with a length of at least 3
bits to control the matching chain according to the required functions,
as shown in Table 3.2. Note that at start-up the first step is to issue
code 000 to ensure that the entire circuit is opened.
From Table 3.2 we have:
- Using 3 delay stages at the entrance of the pipeline: step 1: code
001; step 2: code 100; step 3: code 110.
- Using 2 delay stages at the entrance of the pipeline: step 1: code
001; step 2: code 101; step 3: code 111.
- Using 1 delay stage at the entrance of the pipeline: step 1: code 011;
step 2: code 111; step 3: code 111.
Figure 3.16: Model of the controller for the queue size m
Table 3.2: Control codes for the FPGA (3-bit binary); 000 = open the
entire circuit, 111 = do nothing.

                Input D2    Input D3    Input of the next pipeline stage
Output D1         001         010         011
Output D2                     100         101
Output D3                                 110

Order of the control codes: step 1, step 2, step 3.
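For convenience, the code sequences of Table 3.2 used in Section 3.5 can be written as a small lookup table; the snippet below is merely a transcription of the table above, with hypothetical names.

```python
# 3-bit FPGA control-code sequence (step 1, step 2, step 3) from Table 3.2,
# keyed by the number of delay stages used at the pipeline entrance.
FPGA_CODE_SEQUENCE = {
    3: ("001", "100", "110"),
    2: ("001", "101", "111"),   # 111 = do nothing
    1: ("011", "111", "111"),
}
OPEN_ENTIRE_CIRCUIT = "000"     # issued first, at start-up
```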
(Figure 3.16 shows: a) a chain of D flip-flops D1/Q1, D2/Q2, ..., Dn/Qn
clocked together with the data; b) the same chain with the FPGA
inserted, n = 3, driven by the control signals to the FPGA.)
3.6. Conclusion of Chapter 3
Chapter 3 has solved the following problems:
- Algorithms and simulation diagrams were developed in the Delphi
environment.
- The simulation results show that the mathematical model of references
to the SSS gives lower performance without in/out buffering (queue size
m = 0) than with buffering of in/out queue size m ≠ 0.
- The size m is an important parameter for optimizing the memory
structure for a given problem class, so that the parallel multi-CPU
processing system has not only high performance but also high
reliability. This is the basis of the adaptive control structure: the
size m becomes a function of the frequency with which the system refers
to the SSS. Additional structures detect and determine the access
frequency, and the queue size in the FPGA structure of the memory bank
is then controlled to match this frequency.
Conclusions and recommendations
1. The conclusions:
The parallel multi-CPU processing system is increasingly widely applied
in many fields, both civil and military. With the advanced engineering
and technology available today, research oriented toward designing
parallel-specialized multi-CPU processing systems with high performance
and an optimal, flexible structure for a given class of application
problems is a correct direction. The research carried out in the thesis
has contributed the following new results:
- A mathematical model was found and proven that allows the access
performance of the shared memory of the parallel-specialized multi-CPU
processing system to be identified as a function of the memory cycles,
the in/out queue size m, and the other parameters involved.
- The results obtained allow the configuration of the shared memory in
the parallel multi-CPU processing system to be calculated. They also
show how the performance E depends on the queue size m and on the number
of reference flows; this means we can increase the number of CPUs of the
multi-processing system to solve very large problems, such as problems
with large databases and high specified coefficients, using the
technical solution of resizing the queues m with FPGA technology to suit
each problem class.
- Taken together, the results of the thesis can be used as a supporting
tool for the integrated design of parallel-specialized multi-CPU
processing systems that meet practical requirements. The technical
solutions given are feasible, and the available advanced technology
allows them to be implemented.
2. Recommendations
The thesis has stopped at an adaptive control model in which the only
adjustable parameter is the queue size m, which is flexible but not yet
fully flexible. A further research direction of the thesis is therefore
to integrate other parameters into the adaptive control mechanism, such
as the memory cycle Tc and the number b of memory bandwidths of the SSS.