Extended Process Scheduler for Improving
User Experience in Multi-core Mobile Systems
Giang Son Tran
ICTLab, University of Science and Technology of Hanoi, VAST*

Thi Phuong Nghiem
ICTLab, University of Science and Technology of Hanoi, VAST*

Tuong Vinh Ho
Institute Francophone International, Vietnam National University
IRD, UMI 209 UMMISCO

Chi Mai Luong
Institute of Information Technology, VAST*

ABSTRACT
Mobile phones are now well integrated into people’s daily lives. Because users spend a large amount of time with them, they expect a good experience in their daily tasks. The mobile operating system’s scheduler is in charge of distributing CPU computational power among these tasks. However, it does not yet take into account the dynamic frequencies of CPU cores at runtime. This unawareness of CPU frequency makes the user interface less responsive to user interactions and consequently degrades the user experience on mobile devices. In this paper, we propose an extension of the process scheduler that takes the dynamic CPU frequency into account when scheduling tasks. Our method increases the smoothness of the user interface by lowering and stabilizing interface frame times. Experimental results show that our proposed scheduler reduces the number of frame time peaks by up to 40%, which greatly helps improve user experience on mobile devices.

CCS Concepts
•Software and its engineering → Scheduling; •Human-centered computing → Smartphones;

Keywords
Process Scheduler, CPU Frequency, User Experience, Mobile System, Operating System

∗Vietnam Academy of Science and Technology, 18, Hoang Quoc Viet, Cau Giay, Hanoi, Vietnam

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from

SoICT ’16, December 08-09, 2016, Ho Chi Minh City, Viet Nam
© 2016 ACM. ISBN 978-1-4503-4815-7/16/12. . . $15.00

1. INTRODUCTION
Nowadays, advances in computing infrastructure and technology have made mobile phones a crucial part of our daily lives. Almost everyone owns a mobile phone and uses it for daily activities such as organizing events with a calendar, browsing the web, sending and receiving emails, and entertainment.
To meet the enormous demand of the mobile market, manufacturers strive to produce mobile devices with the highest possible capabilities. For example, it is not uncommon nowadays to have mobile phones with 8 cores and a low power consumption of 0.3W in a System-on-Chip model [1]. This effort by manufacturers can be considered a marketing strategy to improve user satisfaction with mobile phones.
In their work, Yong et al., 2006 [2] show that user satisfaction with mobile devices depends not only on the technological capabilities of the phones, but also on the responsiveness of the mobile user interface to user interactions. Unfortunately, users often make excessive use of their mobile devices by performing many tasks at the same time. For example, one may simultaneously check email, send messages to friends, download data from the internet, listen to music, and read news. These concurrent actions commonly result in high background load or an unresponsive user interface, and consequently reduce user experience on mobile devices. Responsiveness is one of many non-functional requirements that affect the success of any mobile application [3].
One direction to overcome the unresponsiveness of the user interface on mobile devices is to improve CPU allocation so that the CPU processes the tasks required by users more efficiently. Following this direction, studies focus on an operating system kernel mechanism called the process scheduler [4]. The process scheduler is the kernel component that shares CPU resources among running tasks according to their types (classes), priorities, and CPU usage. Its main job is to decide which tasks to execute and for how long each task will be executed. The scheduler’s decisions are among the crucial factors that determine the CPU computational power each task receives, as well as the overall performance of the mobile system [5].
Another important criterion that affects user experience on mobile phones is energy consumption [6]. If a mobile phone quickly depletes its battery, users easily get annoyed and consequently have a negative user experience. Because of this important role of energy consumption, operating systems running on mobile devices need to minimize power consumption. One popular method to reach this goal is to dynamically adjust the CPU frequency to the workload demand. Following this approach, a CPU governor [4] in the operating system kernel is responsible for this task: it increases the CPU frequency when the required workload is high, to meet the demand, and decreases it when the workload is low.
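As a rough illustration of this load-driven policy, the Python sketch below picks a frequency for the next sampling interval from the load measured in the previous one, loosely in the spirit of the classic ondemand governor. The frequency table, the 80% up-threshold and the function name `next_frequency` are hypothetical choices for illustration; real Linux governors have their own tunables and more elaborate logic.

```python
# Hypothetical frequency table (kHz) for one core; real tables are
# device-specific and exposed by the kernel's CPUFreq subsystem.
FREQS_KHZ = [384_000, 702_000, 1_026_000, 1_512_000]

UP_THRESHOLD = 0.80  # assumed: ramp to max when the core was >80% busy


def next_frequency(load, current_khz):
    """Choose the frequency for the next sampling interval.

    load is the fraction (0.0-1.0) of the last interval during which
    the core was busy while running at current_khz.
    """
    if load > UP_THRESHOLD:
        # High demand: jump straight to the maximum frequency.
        return FREQS_KHZ[-1]
    # Otherwise pick the lowest frequency that would still have handled
    # the same amount of work (load * current_khz) below the threshold.
    required = load * current_khz / UP_THRESHOLD
    for f in FREQS_KHZ:
        if f >= required:
            return f
    return FREQS_KHZ[-1]


print(next_frequency(0.95, 702_000))    # 1512000: high load, ramp up
print(next_frequency(0.10, 1_512_000))  # 384000: low load, ramp down
```

Note that the decision is taken once per sampling interval, independently of the scheduler: nothing in this loop tells the scheduler that the next interval will run at a different speed.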
In this research context, we follow the direction of improving CPU allocation so as to enhance user experience on mobile phones. By analyzing the process scheduler, we observe that it currently does not take CPU frequency into account as a criterion for scheduling running tasks. As a consequence, the user interface becomes less responsive to user interactions, since the same amount of rendering work (determined by the scheduler) has to be done over a longer duration when the CPU frequency (controlled by the governor) is lowered.
Having identified this problem of the process scheduler, in this paper we propose an extension of the Linux default scheduler, the Completely Fair Scheduler (CFS). Our extended scheduler incorporates CPU frequency into its scheduling decisions when selecting appropriate running tasks. We will show that our proposal improves the smoothness of the user interface on mobile systems in comparison with the Linux default scheduler CFS.
The remainder of this paper is organized as follows. Section 2 briefly reviews related work on CPU allocation. In Section 3, we present the concepts of the CFS scheduler and point out its current limitation. Section 4 is devoted to the principle and algorithm of our proposed frequency-aware scheduler. In Section 5, we describe our experiments and analyze the results. The paper ends with Section 6, which includes a general conclusion and possible future work.

2. RELATED WORK

There exist various works in the literature on improving the efficiency of CPU allocation. Yang et al., 2001 [7] proposed a divide-and-conquer algorithm for improving runtime flexibility and reducing computational complexity. The algorithm is divided into two scheduling phases: design-time scheduling and runtime scheduling. Their work also shows that energy is an important criterion when scheduling embedded multiprocessor Systems-on-Chip.
Another energy-aware scheduler was proposed by Rizvandi et al., 2010 [8]. The authors proposed a slack reclamation algorithm in the scheduler that uses a linear combination of the processor’s maximum and minimum frequencies. The method saves energy while still providing enough computational power for applications. Similarly, Mostafa et al., 2016 [9] proposed an energy-saving scheduler for high performance computing systems. The authors relocate thread weights for each active process so as to decrease the number of context switches. Although these methods [8, 9] reduce the energy consumption of the scheduler, they target desktop or server systems with heavy workloads rather than mobile systems with far fewer active threads per process.
Another notable work, GRACE-OS, was proposed by Yuan et al., 2003 [5] to reduce CPU energy consumption on mobile devices using soft real-time scheduling. The method enhances the CPU scheduler by performing scheduling and speed scaling at the same time. Although applicable to mobile devices, this approach mainly targets multimedia applications, which require statistical performance guarantees (for example, that 96% of deadlines are met [5]), and does not take into account user interaction latency, which is an important aspect of user experience in mobile applications.
Concerning the limitation of a kernel scheduler that does not take CPU frequency into account, operating system researchers have raised the research question of connecting the Linux scheduler and governor [10]. Some researchers have argued that these two components could be merged into a single entity [11], proposing an optimization in the use of CPU power for scheduling tasks. Valente et al. [12] discuss future research directions on improving the responsiveness of user interfaces to user interactions. This is important for mobile devices, since user interface responsiveness and power saving are the two major criteria for ensuring mobile user experience.
In this work, we propose an extension of the Linux kernel scheduler (CFS) that takes into account the current frequency of CPU cores when making scheduling decisions on mobile systems. Our work focuses on the low-latency requirement of user interaction. Unlike the aforementioned works, we focus on improving user experience when interacting with mobile devices rather than on saving energy. By lowering and stabilizing interface frame times, our work increases the responsiveness of the user interface to user interactions on mobile devices.

3. COMPLETELY FAIR SCHEDULER

In this section, we present the internal concepts and algorithms of CFS, the standard scheduler of the Linux kernel. We then show a scenario in which CFS allocates the CPU inefficiently because it does not take CPU frequency into account.

3.1 CFS Model

CFS is the Linux kernel scheduler, which uses time slice estimation for selecting running tasks [13]. CFS draws on the Earliest Eligible Virtual Deadline First (EEVDF) scheduler [14]. To achieve high responsiveness for all tasks, CFS tries to divide a certain amount of time (called the period; usually a small value, with a minimum of 20ms) among all runnable tasks. The time slice for task $T_i$ is given by the following equation:

$$S_i = \frac{\omega_i}{\Omega_r} \times P \tag{1}$$

where
• $S_i$ is the time slice length for task $T_i$ at the current decision time;
• $\omega_i$ is the calculated weight of $T_i$;
• $\Omega_r$ is the total weight of the whole run queue of the current CPU (each weight represents a given process’s priority); and
• $P$ is the target period within which the scheduler tries to execute all tasks.

When the number of tasks in the run queue increases, $P$ is lengthened to reduce the performance overhead caused by too many context switches in a short amount of time.
CFS uses an important quantity called vruntime (virtual runtime) to track the performance and scheduling status of each active thread throughout its lifecycle. The virtual runtime $\upsilon_i$ of task $T_i$ is increased after each calculated time slice:

$$\upsilon_i \leftarrow \upsilon_i + \frac{t_i}{\omega_i} \times N_0 \tag{2}$$

where $t_i$ is the execution time of task $T_i$ in the last execution period and $N_0$ is a constant ($N_0 = 1024$). Nice is a per-task parameter representing its priority; it determines the task’s weight $\omega_i$. The vruntime values and other scheduling information of all tasks are stored by CFS in a self-balancing binary search tree, a red-black tree. CFS keeps the task $T_i$ with the lowest $\upsilon_i$ at the left-most node of the tree, so that it can be retrieved instantly at the next scheduling decision.
Looking into the internal CFS scheduling algorithm, we can see that none of the calculations of $S_i$ and $\upsilon_i$ in equations (1) and (2) takes into account the frequency $f_j$ of the target CPU core $c_j$ in a multi-core or multi-processor system. When a running task is migrated at runtime from one core $c_j$ to another core $c_k$ with a different frequency ($f_j \neq f_k$), or when the governor reduces the core frequency, a large amount of CPU power may be lost, and the system may consequently deliver a very poor user experience. One example is when the migrated task is responsible for rendering the user interface (UI): the system then becomes very laggy when responding to user interactions on mobile devices.

3.2 CFS Limitation

In order to demonstrate the current limitation of CFS, we present a scenario in which the user interface is unresponsive to user interactions under the CFS scheduler. This scenario is illustrated in Figure 1. Grey bars represent the CPU load and the red solid line represents the corresponding user interface frame time at the same moment. Frame time is the duration between two consecutive fully rendered user interface frames. A horizontal dotted line indicates the 16.6ms limit for each frame time, equivalent to rendering 60 frames per second (fps). If frame time stays below this limit, the user’s eyes cannot perceive real differences between two consecutive frames [15], so animations shown on the user interface appear smooth and fluid. Indeed, Claypool et al., 2006 [16] showed that users’ perception performance improved sevenfold when the frame rate increased from 3 to 60 fps.

Figure 1: Unresponsiveness of user interface (UI) to user interactions in CFS scheduler. [CPU load (grey bars) and interface frame time (red line) over time, with instants a to k and m and the 16.6ms limit; annotations: “High CPU Load, No CPU speed limit, Frame time reduces”; “Low CPU Load, Reduce CPU speed, Frame time increases”; “Workload reduces”; “Unresponsive User Interface”.]

Figure 1 shows two situations in which interface frame times exceed the optimal 16.6ms limit: overload and underload. At overload times a and b, with high CPU load, the UI thread is not provided enough CPU power to keep the drawing process below 16.6ms. System load decreases at times c and d, leaving more CPU to the rendering thread. This load reduction results in a lower interface frame time (in other words, a higher interface frame rate). As system load continues to decrease (to an underload point), the governor decides that the CPU frequency should be reduced to lower power consumption (between times d and e). A lower CPU frequency also reduces the CPU power provided to the UI rendering thread. As a result, the UI thread struggles to maintain a good frame rate, since an optimal user experience requires at least 60fps, or 16.6ms per frame.
Furthermore, the governor works with a larger interval than the scheduler: in the Linux kernel, the governor’s sampling time $\tau_i$ is usually configured as a multiple $\pi$ of the CFS time slice, $\tau_i = \pi \times S_i$, so CFS works with a smaller (finer) time granularity than the governor. Not until time k does the governor notice that a high CPU load is present and raise the CPU frequency. This results in a drop of interface frame time (from time k to time m), bringing it back under the 16.6ms limit. The user interface is therefore unresponsive between times e and k. This is caused by the scheduler’s unawareness of the lowered CPU frequency. Had the scheduler been aware of this change, it would have re-prioritized the UI thread by increasing its time slice and reducing the time slices of the other running background threads. By reconsidering the time slices of all threads, the scheduler can potentially provide more CPU power to the UI thread and preserve fairness under frequency changes.
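Equations (1) and (2) can be sketched in a few lines of Python to make this frequency blindness concrete. This is a toy model, not kernel code: a binary heap (`heapq`) stands in for the kernel’s red-black tree when picking the smallest vruntime, and the task names, weights, the 20ms period and the two core frequencies are hypothetical values. The core frequency `core_mhz` appears only when converting time into work; neither the slice nor the vruntime update depends on it.

```python
import heapq
from dataclasses import dataclass, field

N0 = 1024  # weight of a nice-0 task (NICE_0_LOAD in the Linux kernel)


@dataclass(order=True)
class Task:
    vruntime: float                            # only field used for ordering
    name: str = field(compare=False)
    weight: int = field(compare=False)         # derived from the nice value


def time_slice(task, run_queue, period_ms):
    """Equation (1): S_i = w_i / Omega_r * P. Note: no frequency term."""
    omega_r = sum(t.weight for t in run_queue)
    return task.weight / omega_r * period_ms


def run_period(run_queue, period_ms, core_mhz):
    """Run every task once, smallest vruntime first, for its CFS slice.

    Returns the work each task received, in millions of CPU cycles.
    """
    heapq.heapify(run_queue)                   # stands in for the rb-tree
    slices = {t.name: time_slice(t, run_queue, period_ms) for t in run_queue}
    work, ran = {}, []
    while run_queue:
        task = heapq.heappop(run_queue)        # "left-most node": min vruntime
        s = slices[task.name]
        task.vruntime += s / task.weight * N0  # equation (2) with t_i = S_i
        work[task.name] = s * core_mhz / 1000  # ms * MHz / 1000 = Mcycles
        ran.append(task)
    for task in ran:                           # tasks remain runnable
        heapq.heappush(run_queue, task)
    return work


ui, bg = Task(0.0, "ui", 1024), Task(0.0, "bg", 1024)
queue = [ui, bg]
print(run_period(queue, 20.0, 1512)["ui"])  # 15.12 Mcycles at 1512 MHz
print(run_period(queue, 20.0, 384)["ui"])   # 3.84: same 10ms slice, ~4x less work
```

After the governor lowers the frequency, the UI thread keeps exactly the same 10ms slice and vruntime increment, yet receives roughly a quarter of the work, which corresponds to the interval between times e and k in Figure 1.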

4. IMPROVEMENT TO CFS

In this section, we propose a frequency-aware scheduler (hereinafter called FA-CFS) as an extension of the CFS scheduler. The main idea of our proposed scheduler is to adapt the task weight $\omega_i$ and time slice $S_i$ of a target task $T_i$ to frequency changes.
We propose a scheduler that balances workload against differences in frequency. In detail, we model a workload and its parameters on a multi-core CPU whose dynamic frequency is managed by the governor and whose thread scheduling is managed by the scheduler. This workload is executed in a multitasking, time-sharing, and preemptive operating system.
Let $W$ be a workload performed in a single thread, which can be considered as the number of CPU cycles required to perform a task. A workload is measured as the product of speed and time. In the simplest case, if this workload is scheduled on a single-core CPU with a constant frequency $f$ (approximately proportional to the number of instructions per second), we have:

$$W = f \times T \tag{3}$$

where $T$ is the total execution time (in seconds).
Generally, the scheduler spends a small amount of CPU time ($\zeta_i$) on accounting and selecting the next scheduled thread after each sampling time $\tau_i$ [4]. This time can be considered the performance overhead of the process scheduler. Therefore, $T$ in equation (3) becomes:

$$T = \sum_{i=1}^{n} (\tau_i + \zeta_i) \tag{4}$$

where $n$ is the total number of sampling times during the whole execution duration. When taking $\zeta_i$ into account, our global workload in equation (3) becomes:

$$W = f \times \sum_{i=1}^{n} (\tau_i + \zeta_i) \tag{5}$$

As previously discussed, since the CPU frequency is managed by the governor (in order to minimize power consumption), it fluctuates at runtime based on the total workload of the whole system. As a result, the CPU frequency $f$ is not a constant:

$$W = \sum_{i=1}^{n} f_i \times (\tau_i + \zeta_i) \tag{6}$$

where $f_i$ is the CPU frequency at sampling time $\tau_i$. Since we have a multi-core processor, equation (6) becomes:

$$W = \sum_{i=1}^{n} f_{ij} \times (\tau_i + \zeta_i) \tag{7}$$

where $f_{ij}$ is the frequency of CPU core $c_j$ at sampling time $\tau_i$.
Consider that our global workload $W$ is split into $n$ micro-workloads $\omega_i$ performed during the $n$ sampling times: $W = \sum_{i=1}^{n} \omega_i$. Each micro-workload at sampling time $\tau_i$ is therefore:

$$\omega_i = f_{ij} \times (\tau_i + \zeta_i) \tag{8}$$

Since the governor’s sampling time is usually configured in the Linux kernel as a multiple $\pi$ of the CFS time slice ($\tau_i = \pi \times S_i$, i.e. CFS works with a finer time granularity than the governor), equation (8) becomes:

$$\omega_i = f_{ij} \times (\pi \times S_i + \zeta_i) \tag{9}$$

Due to the large difference between the governor’s sampling time and the scheduler’s time slice, when the running thread of workload $W$ is migrated from one core $c_j$ to another core $c_k$ with frequencies $f_{ij} \ge f_{ik}$, the performance penalty $\delta_i$ (in terms of work) of a single sampling time slot $\tau_i$ at the lowered CPU speed can be estimated as follows, where $\omega'_i$ is the micro-workload achieved on core $c_k$:

$$\delta_i = \omega_i - \omega'_i \le (f_{ij} - f_{ik}) \times \pi \times S_i + f_{ij} \times \zeta_i - f_{ik} \times \zeta_i \tag{10}$$

On the other hand, CFS has a scheduling complexity of $O(\log N)$, where $N$ is the number of active tasks [17]. $N$ often remains unchanged unless a new thread or process is created. Therefore, the amount of work (frequency × time) for accounting and scheduling is generally constant between sampling intervals; in other words, $f_{ij} \times \zeta_i = f_{ik} \times \zeta_i$. Inequality (10) can thus be simplified as:

$$\delta_i \le (f_{ij} - f_{ik}) \times \pi \times S_i \tag{11}$$

During the workload duration, with $m$ migrations or frequency changes, the total performance penalty (in terms of amount of work) of inequality (11) becomes:

$$\Delta = \sum_{p=1}^{m} \delta_p \le \sum_{p=1}^{m} \left( (f_{pj} - f_{pk}) \times \pi \times S_p \right) \tag{12}$$

Having defined the total performance penalty caused by frequency changes in equation (12), we can state the main objective of our improvement in FA-CFS as:

$$\text{Minimize} \sum_{p=1}^{m} \left( (f_{pj} - f_{pk}) \times \pi \times S_p \right) \tag{13}$$

To reach this goal, in each single time slice $S_i$, it is possible to counteract the change of frequency (i.e. minimize the performance penalty) by providing more CPU computational power to this particular workload. The extra CPU power can be allocated to this task on core $c_k$ by increasing its time slice length to $S'_i$ (the previously allocated time slice on core $c_j$ being $S_i$).
When applying this counterbalance, the performance penalty $\delta_i$ in inequality (11) becomes

$$\delta_i \le f_{ij} \times \pi \times S_i - f_{ik} \times \pi \times S'_i \tag{14}$$

In an ideal situation, this performance penalty can be suppressed (i.e. we completely counterbalance the frequency difference), $\delta_i = 0$; therefore

$$f_{ij} \times \pi \times S_i - f_{ik} \times \pi \times S'_i \le 0 \tag{15}$$

Thus, we can proportionally rescale the time slice:

$$S'_i \ge \frac{f_{ij}}{f_{ik}} \times S_i \tag{16}$$
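The rescaling rule of equation (16), and its effect on the penalty bound of equation (14), can be checked numerically with a small Python sketch. The function names are hypothetical, and the frequencies (1512 MHz on $c_j$, 384 MHz on $c_k$), the ratio $\pi = 4$ and the 10ms slice are illustrative values only; frequencies in MHz multiplied by durations in ms give work in thousands of cycles.

```python
def micro_workload(f_mhz, pi, slice_ms, overhead_ms):
    """Equation (9): w_i = f_ij * (pi * S_i + zeta_i), in thousands of cycles."""
    return f_mhz * (pi * slice_ms + overhead_ms)


def penalty_bound(f_src, f_dst, pi, slice_src, slice_dst):
    """Right-hand side of (14): f_ij*pi*S_i - f_ik*pi*S'_i (thousands of cycles)."""
    return f_src * pi * slice_src - f_dst * pi * slice_dst


def rescaled_slice(f_src, f_dst, slice_ms):
    """Equation (16) taken with equality: S'_i = f_ij / f_ik * S_i."""
    return f_src / f_dst * slice_ms


# Hypothetical migration from core c_j (1512 MHz) to core c_k (384 MHz).
f_j, f_k, pi, s = 1512, 384, 4, 10.0
print(penalty_bound(f_j, f_k, pi, s, s))      # 45120.0: slice unchanged, as in CFS
s_new = rescaled_slice(f_j, f_k, s)
print(s_new)                                   # 39.375 (ms)
print(penalty_bound(f_j, f_k, pi, s, s_new))  # 0.0: penalty fully compensated
```

In this example the penalty-free slice is almost four times the original one. Such extra time has to come from the other tasks in the run queue, which is consistent with the paper’s choice of redistributing task weights (equation (18)) so that background threads receive shorter slices.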

As in equations (1) and (2) of Section 3.1, the CFS time slice is proportional to the task weight and inversely proportional to the run queue weight. Applying equation (1) to both sides of inequality (16), where $\omega_i$ again denotes the task weight of equation (1), gives:

$$\frac{\omega'_i}{\Omega_r} \times P \ge \frac{f_{ij}}{f_{ik}} \times \frac{\omega_i}{\Omega_r} \times P \tag{17}$$

As a result, our scheduler can counteract frequency changes by proportionally redistributing these weights as follows:

$$\omega'_i \ge \frac{f_{ij}}{f_{ik}} \times \omega_i \tag{18}$$

We implement our proposed frequency-aware scheduler in a Linux environment, where our model acts as a frequency-aware extension of the CFS scheduler. We use the CPUFreq interface to query the governor for CPU frequencies [18]. Having extracted the frequencies, we implement our proposed algorithm to balance time slices in Linux’s CFS.

5. EVALUATION

The goal of this section is to present the improvement in responsiveness of the mobile user interface to user interactions provided by our FA-CFS scheduler in comparison with the CFS scheduler. We first present the setup of our experiments and then analyze our experimental results.

5.1 Experimental Setup

Interface Frame Time Measurement:
In order to evaluate our proposed FA-CFS scheduler, we use interface frame time as the main metric to measure the improvement in responsiveness of the mobile user interface to user interactions. We chose to measure interface frame time since it plays an important role in ensuring user experience. A fully rendered frame passes through a series of steps in the Android rendering pipeline: executing the issued layout commands, processing the buffer swap, preparing the textures, and finally drawing the content to the screen.
Evaluation Scenario:
In our experiment, we implement a popular scenario in which users browse an online news website using smartphones and tablets running either CFS or FA-CFS. Since we want to compare the efficiency of our FA-CFS with CFS, we divide our scenario into two main steps: in the first step, users were asked to browse the online news website (http://bbc.com in our experiments) on devices running CFS; in the second step, users were asked to browse the same website on devices running FA-CFS. In both steps, we recorded the interface frame times produced by user interactions during the browsing sessions.
A browsing session in our experiment consists of the following steps: (1) the user starts the stock browser, (2) types the URL http://bbc.com, (3) waits for the page to load, and finally (4) scrolls up and down as soon as one or more parts of the page content appear. In this scenario, three different types of workload are created: by the UI thread, by the background network threads (fetching data from the remote server), and by the browser engine (in charge of parsing HTML and processing JavaScript with Chromium’s V8 JavaScript engine). To avoid preloaded images, we clear the browser cache before starting each experiment session.
A total of 5 users took part in our experiment. Each user performed 16 browsing sessions (8 on each of the two Android devices described below). On each device, users performed 4 sessions with the CFS scheduler and 4 sessions with our FA-CFS scheduler. With each scheduler, 4 governors with different characteristics were used to manage rising and declining system load with frequency ramp-up and ramp-down. The governors included in our experiments are interactive (default; fastest ramp-up with intermediate frequencies; best latency), conservative (slow ramp-up), ondemand (fast ramp-up and fast ramp-down, mostly switching between the minimum and maximum frequencies), and performance (keeps the highest frequency; wastes energy) [18].
Technical Choices:
On the hardware side, our experiments are performed on two categories of Android devices: an LG Nexus 4 and an Asus Nexus 7 Wifi (2012), representing a phone and a tablet, respectively. The LG Nexus 4 has a better hardware configuration (double the RAM and a 30% higher CPU core frequency) than the Nexus 7 Wifi 2012.
On the software side, we build from source an aftermarket open-source operating system called CyanogenMod, based on the Android Open Source Project (AOSP). We use the latest version of CyanogenMod with its supported Linux kernel to implement our model. We decided to build CyanogenMod from source because it allows us to customize the Linux kernel and flash (install) the kernel along with the whole operating system onto our devices.
We use an Android developer option called “Profile GPU rendering” to monitor and gather interface frame times during the experiments. We then use Android’s integrated “dumpsys” tool to collect various statistics from the mobile devices through a USB cable, including the monitored interface frame times.
We use CPUFreq’s userspace sysfs interface to gather statistical information.
5.2 Experimental Results

Interface Frame Time Peaks:
Figure 2 shows a set of frame times captured from one user session on the LG Nexus 4 with CFS and the interactive governor. The figure shows that frame times during this session are not stable, although they are generally below the optimal 16.6ms. In the first part of this session (frames 0 to 100), frame times were relatively high because the web browser needs to perform 3 tasks at the same time: fetching web content, parsing partial HTML content as it arrives, and rendering it on the screen. The rendering thread is not provided with enough computational power because the background threads are overloading the CPU, so the UI thread struggles to maintain an optimal frame time. After frame 125, the page fetching and HTML parsing tasks are finished, yet very high frame times still occur, some exceeding 40ms.

These peaks (or spikes) cause “micro stuttering”, a term used to denote irregular delays between rendered frames [19]. Micro stuttering degrades user experience even when the average frame rate is high enough. These high frame time peaks can be explained as a consequence of CPU core frequency changes: the UI thread suffers from these differences, similar to the scenario previously discussed in Section 3 (Figure 1).

Figure 2: Interface frame time peaks on the LG Nexus 4 with CFS scheduler and Interactive governor. [Frame time (ms, 0-50) per frame (0-250), broken down into Draw, Prepare, Process and Execute, with the 16.6ms (60fps) limit; annotated regions: “High frame times, Unresponsive User Interface” and “Choppy Frames with Frame Time Peaks”.]

Table 1: Average frame time percentile (ms) of CFS vs. FA-CFS with 4 governors on Nexus 7 Wifi
(± is the relative change of FA-CFS with respect to CFS; negative values mean shorter frame times.)

| Percentile | Interactive CFS | Interactive FA-CFS | ± | Ondemand CFS | Ondemand FA-CFS | ± | Conservative CFS | Conservative FA-CFS | ± | Performance CFS | Performance FA-CFS | ± |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90 | 17.18 | 14.27 | -16.9% | 17.94 | 15.02 | -16.3% | 22.26 | 21.19 | -4.8% | 11.93 | 11.81 | -1.0% |
| 91 | 17.33 | 14.46 | -16.6% | 18.11 | 15.24 | -15.8% | 22.64 | 21.55 | -4.8% | 12.36 | 12.21 | -1.2% |
| 92 | 17.76 | 14.68 | -17.3% | 18.45 | 15.45 | -16.3% | 22.83 | 22.77 | -0.3% | 12.75 | 12.6 | -1.2% |
| 93 | 18.22 | 14.93 | -18.1% | 18.72 | 15.65 | -16.4% | 23.34 | 23.19 | -0.6% | 13.29 | 13.42 | 1.0% |
| 94 | 18.89 | 15.45 | -18.2% | 19.96 | 15.88 | -20.4% | 23.87 | 23.56 | -1.3% | 13.98 | 14.05 | 0.5% |
| 95 | 20.44 | 15.98 | -21.8% | 22.89 | 16.25 | -29.0% | 27.6 | 28.02 | 1.5% | 14.73 | 14.61 | -0.8% |
| 96 | 23.13 | 16.08 | -30.5% | 23.49 | 16.73 | -28.8% | 30.03 | 31.62 | 5.3% | 15.62 | 15.88 | 1.7% |
| 97 | 27.25 | 17.21 | -36.8% | 29.83 | 21.29 | -28.6% | 34.53 | 33.24 | -3.7% | 16.79 | 16.47 | -1.9% |
| 98 | 31.29 | 23.16 | -26.0% | 32.83 | 25.57 | -22.1% | 45.68 | 43.92 | -3.9% | 19.03 | 19.49 | 2.4% |
| 99 | 38.94 | 29.12 | -25.2% | 41.77 | 35.1 | -16.0% | 60.34 | 62.17 | 3.0% | 23.68 | 25.23 | 6.5% |
| 100 | 48.58 | 37.3 | -23.2% | 55.12 | 44.05 | -20.1% | 83.1 | 77.38 | -6.9% | 31.21 | 30.85 | -1.2% |

Table 2: Average frame time percentile (ms) of CFS vs. FA-CFS with 4 governors on Nexus 4
(± is the relative change of FA-CFS with respect to CFS; negative values mean shorter frame times.)

| Percentile | Interactive CFS | Interactive FA-CFS | ± | Ondemand CFS | Ondemand FA-CFS | ± | Conservative CFS | Conservative FA-CFS | ± | Performance CFS | Performance FA-CFS | ± |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 90 | 14.54 | 14.03 | -3.5% | 15.41 | 13.14 | -14.7% | 20.75 | 21.25 | 2.4% | 11.58 | 12.32 | 6.4% |
| 91 | 14.78 | 14.18 | -4.1% | 15.62 | 13.45 | -13.9% | 21.07 | 22.32 | 5.9% | 11.86 | 12.53 | 5.6% |
| 92 | 14.82 | 14.25 | -3.8% | 16.03 | 13.87 | -13.5% | 22.37 | 22.81 | 2.0% | 12.15 | 12.85 | 5.8% |
| 93 | 15.48 | 14.81 | -4.3% | 16.19 | 14.32 | -11.6% | 23.86 | 23.99 | 0.5% | 12.51 | 13.28 | 6.2% |
| 94 | 15.63 | 15.27 | -2.3% | 16.49 | 14.67 | -11.0% | 25.91 | 26.46 | 2.1% | 12.99 | 13.75 | 5.9% |
| 95 | 16.03 | 15.42 | -3.8% | 16.91 | 15.29 | -9.6% | 27.77 | 27.29 | -1.7% | 13.34 | 14.06 | 5.4% |
| 96 | 16.49 | 15.86 | -3.8% | 17.5 | 15.83 | -9.5% | 31.46 | 31.86 | 1.3% | 14.02 | 14.5 | 3.4% |
| 97 | 17.58 | 16.56 | -5.8% | 18.04 | 16.75 | -7.2% | 36.17 | 35.82 | -1.0% | 15.16 | 16.42 | 8.3% |
| 98 | 18.24 | 17.14 | -6.0% | 19.27 | 19.41 | 0.7% | 40.34 | 41.19 | 2.1% | 16.42 | 17.31 | 5.4% |
| 99 | 24.52 | 20.36 | -17.0% | 23.83 | 22.01 | -7.6% | 50.48 | 52.16 | 3.3% | 19.32 | 19.38 | 0.3% |
| 100 | 35.36 | 30.31 | -14.3% | 36.65 | 33.41 | -8.8% | 73.55 | 71.81 | -2.4% | 24.98 | 25.21 | 0.9% |
Frame Time Percentile:
To analyze the effectiveness of our FA-CFS scheduler, we use frame time percentiles as a statistical metric. The metric is defined as follows: an xth frame time percentile of y milliseconds means that, during the experiment, x% of all frame times are less than y milliseconds. Frame time percentiles represent the stability of frame times and thus the “quality” of user experience in interactions. In this part of the evaluation, we focus on the average frame time percentiles over all user sessions.
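A minimal Python sketch of how a percentile and the ± columns of Tables 1 and 2 could be computed is shown below. The nearest-rank percentile definition is an assumption (the paper does not state which definition its tooling uses, and nearest-rank counts frames “at most” y rather than strictly “less than” y); the sample frame times are hypothetical, and the per-session averaging used in the paper is omitted.

```python
import math


def frame_time_percentile(frame_times_ms, x):
    """x-th percentile (nearest-rank): smallest recorded frame time y such
    that at least x% of the frames take y milliseconds or less."""
    ordered = sorted(frame_times_ms)
    rank = max(1, math.ceil(x * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]


def relative_change(cfs_ms, fa_cfs_ms):
    """The ± columns: change of FA-CFS relative to CFS, in percent."""
    return (fa_cfs_ms - cfs_ms) / cfs_ms * 100


frames = [10, 12, 14, 15, 16, 17, 20, 25, 30, 40]  # hypothetical session
print(frame_time_percentile(frames, 90))        # 30
print(round(relative_change(23.13, 16.08), 1))  # -30.5 (Table 1, 96th, interactive)
```

The second call reproduces one entry of Table 1 from its two frame time columns, confirming that the ± values are relative changes with respect to CFS.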
Tables 1 and 2 show the average frame time percentiles of all user sessions on the Asus Nexus 7 Wifi 2012 and the LG Nexus 4, respectively, with 4 different governors. As expected, the frame time percentiles of the Nexus 7 are larger than those of the Nexus 4, because the Nexus 7 has a weaker hardware configuration yet a higher screen resolution. It is worth recalling that interactive is the default governor on most mobile phones.
With the two highly dynamic governors, interactive and ondemand, these tables show that FA-CFS generally achieves a larger frame time reduction on the Nexus 7 than on the Nexus 4. The Nexus 7 benefits greatly from our time slice optimization, with frame times at the 95th percentile reduced by 21.8% and 29.0% (with interactive and ondemand, respectively). On the Nexus 4, FA-CFS shows smaller improvements in frame time percentiles but still achieves reductions of 3.8% and 9.6% at the 95th percentile. These differences between the Nexus 7 and the Nexus 4 can be attributed to their different hardware configurations (the Nexus 4 has a 30% faster CPU and 4% fewer screen pixels than the Nexus 7).
Our frequency-aware FA-CFS scheduler not only reduces average frame times but also provides better frame time stabilization than the traditional CFS: the 97th, 98th and 99th frame time percentiles mostly show clear improvements on both devices. In particular, with a better 99th percentile (25.2% and 16.0% reductions for interactive and ondemand on the Nexus 7), users get a smoother and more responsive interface and experience fewer micro-stuttering frames during their interactions.
Furthermore, Table 1 shows that FA-CFS achieves a considerably lower average frame time than CFS with the interactive governor. Between the 90th and 95th percentiles, the gain stays between 16.6% and 21.8%. The difference starts increasing at the 96th percentile (30.5%), peaks at the 97th (36.8%), and still keeps a wide margin up to the 100th percentile (maximum frame time). Additionally, FA-CFS improves frame time stabilization: on the Asus Nexus 7 Wifi it keeps about 96% of frames under the 16.6ms limit (96th percentile: 16.08ms), compared with fewer than 90% under the mainline CFS (90th percentile: 17.18ms).
On the other hand, the right halves of these tables show smaller improvements on both devices with the conservative and performance governors, which are less dynamic than the interactive and ondemand governors discussed above. With the completely static performance governor, FA-CFS achieved barely any improvement across all user sessions. This can be explained by the fact that the performance governor always provides maximum CPU computational power to all threads without frequency changes. With the conservative governor, FA-CFS achieves only 6.9% (on the Nexus 7) and 2.4% (on the Nexus 4) improvement at the 100th percentile. Overall frame times did not decrease much because this governor tries to keep the CPU frequency as low as possible without many frequency changes.

Figure 3: Frame time distribution of CFS on Motorola Moto X 2nd edition. [Histogram: number of frames (log scale, 1 to 1000) vs. frame time (ms, 0 to 40).]

Figure 4: Frame time distribution of FA-CFS on Motorola Moto X 2nd edition. [Histogram: number of frames (log scale, 1 to 1000) vs. frame time (ms, 0 to 40).]
The analyses of tables 1 and 2 above show that our FACFS enhances frame time stabilization, increases average
frame rate and reduces frame time peaks (or spikes) with
widely used governors (interactive and ondemand ). Due to
this, our FA-CFS scheduler proves its efficiency in improving
user experiences while interacting with mobile devices.
Frame Time Distribution:
In order to further analyze the effectiveness of our FACFS scheduler, we use another statistical metric frame time
distribution. For this, we setup our experiment on an additional user session with a higher end mobile phone, Motorola
Moto X (2nd edition) with a quad-core CPU, each running
at 2.5GHz. During user interactions, we gather 1189 frame
times (in approximately 19 seconds of browsing BBC homepage) of CFS and FA-CFS with interactive governor, and
represent them in histograms in figures 3 and 4, respectively.
It can be seen from figure 3 that even on a high end
phone, CFS causes micro stuttering with frames longer than
16.6ms. Some frames take even more than 32ms (doubles
the 16.6ms limit). These peaks cause choppy during web
content scrolling in the browser. Applying our FA-CFS into
Cyanogenmod greatly reduces these peaks (figure 4). Maximum frame time for FA-CFS is 24ms, when compared to
37ms on CFS. In total of 1189 frames, FA-CFS produced
only 14 frames longer than the 16.6ms limit. In contrast,
this number of the CFS counterpart is 24. From these re-

j

quency changes ( f ik = 1). Its conservative sibling achieves
i

423

sults, our FA-CFS achieves a reduction of 40% frame time
peaks (24 frames down to 14 frames). Additionally, frame
times are better packed in the mean 8ms range. These two
figures clearly illustrate the benefit to improve user experience of our FA-CFS, even on high end mobile device.
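Histograms like those in figures 3 and 4, and the count of frames over the frame budget, can be computed from the same kind of trace. The sketch below is a hypothetical helper set, not the authors' tooling: it bins frame times into 1ms buckets, counts frames over the 60 fps budget (or over twice that budget), and computes the relative peak reduction. The sample trace is invented; only the counts 24 and 14 come from the text.

```python
from collections import Counter

# 60 fps frame budget (the text rounds it to 16.6 ms).
FRAME_BUDGET_MS = 1000.0 / 60.0


def histogram(frame_times_ms, bin_ms=1.0):
    """Count frames per fixed-width bin, keyed by the bin's lower edge in ms."""
    counts = Counter(int(t // bin_ms) * bin_ms for t in frame_times_ms)
    return dict(sorted(counts.items()))


def count_over_budget(frame_times_ms, budget_ms=FRAME_BUDGET_MS):
    """Number of frames that miss the budget, i.e. visible stutter candidates."""
    return sum(1 for t in frame_times_ms if t > budget_ms)


def peak_reduction_pct(peaks_before, peaks_after):
    """Relative reduction in the number of over-budget frames, in percent."""
    return (peaks_before - peaks_after) / peaks_before * 100.0


if __name__ == "__main__":
    trace = [7.9, 8.2, 8.6, 8.1, 17.0, 33.5]  # hypothetical frame times (ms)
    print(histogram(trace))          # {7.0: 1, 8.0: 3, 17.0: 1, 33.0: 1}
    print(count_over_budget(trace))  # 2 frames over ~16.67 ms
    print(count_over_budget(trace, 2 * FRAME_BUDGET_MS))  # 1 frame over ~33.3 ms
    # Counts reported in the text: 24 (CFS) vs. 14 (FA-CFS) over-budget frames.
    print(round(peak_reduction_pct(24, 14), 1))  # 41.7
```

Plotting the `histogram` output with a logarithmic y-axis reproduces the layout of figures 3 and 4, where rare long frames remain visible next to the dense cluster of short ones.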

6. CONCLUSION AND PERSPECTIVES

This paper proposed a new frequency-aware process scheduler for improving user experience on multi-core mobile systems. We built a model that extends the Linux default scheduler (Completely Fair Scheduler, CFS) to take the dynamic CPU frequency into account when scheduling tasks. Our model increases the responsiveness of the mobile user interface to user interactions by lowering and stabilizing interface frame times. The experiments showed that our proposed FA-CFS scheduler reduces the number of frame time peaks by up to 40%, which benefits multi-core mobile systems where user experience relies largely on the responsiveness of the user interface.
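As a rough illustration of what frequency-aware accounting in a CFS-like scheduler can look like, the Python sketch below scales a task's elapsed execution time by the ratio of current to maximum core frequency before charging it to the task's virtual runtime. This is a hypothetical simplification, not the FA-CFS formula or time slice computation defined earlier in the paper. It is similar in spirit to the frequency-invariant load tracking in recent Linux kernels, which applies such scaling to load averages rather than to vruntime. `Task`, `charge` and `pick_next` are illustrative names, and real CFS uses fixed-point arithmetic and a red-black tree rather than a Python list.

```python
from dataclasses import dataclass

NICE_0_WEIGHT = 1024  # CFS load weight of a nice-0 task


@dataclass
class Task:
    name: str
    weight: int = NICE_0_WEIGHT
    vruntime_ns: float = 0.0


def charge(task, delta_exec_ns, cur_freq_khz, max_freq_khz, freq_aware=True):
    """Advance task.vruntime_ns after the task ran for delta_exec_ns.

    Plain CFS accounting: vruntime += delta_exec * NICE_0_WEIGHT / weight.
    Hypothetical frequency-aware variant: first scale delta_exec by
    cur_freq / max_freq, so time spent on a down-clocked core is charged
    as less work.
    """
    work_ns = delta_exec_ns
    if freq_aware:
        work_ns = delta_exec_ns * cur_freq_khz / max_freq_khz
    task.vruntime_ns += work_ns * NICE_0_WEIGHT / task.weight
    return task.vruntime_ns


def pick_next(runqueue):
    """Pick the runnable task with the smallest vruntime.

    CFS takes the leftmost node of its red-black tree; min() over a list
    gives the same choice for this small example.
    """
    return min(runqueue, key=lambda t: t.vruntime_ns)


if __name__ == "__main__":
    ui, bg = Task("ui"), Task("bg")
    # UI thread ran 4 ms on a core clocked at half its maximum frequency.
    charge(ui, 4_000_000, cur_freq_khz=1_000_000, max_freq_khz=2_000_000)
    # Background thread ran 3 ms at full frequency.
    charge(bg, 3_000_000, cur_freq_khz=2_000_000, max_freq_khz=2_000_000)
    print(pick_next([ui, bg]).name)  # ui
```

In this toy run the UI task ran longer in wall-clock time but on a half-speed core, so it is charged less virtual runtime and is picked next; with plain CFS accounting (`freq_aware=False`) the background task would be chosen instead.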
Several research directions can be pursued to continue this work. First and foremost, since our work improves user experience on mobile systems, it is worth evaluating our model on various workloads to see whether it can also bring benefits to larger multi-core and multi-CPU platforms, e.g. desktops and virtualized servers [20]. Secondly, combining our frequency-aware scheduler with a performance-oriented scheduler (e.g. the BFS scheduler) is also an interesting research direction: FA-CFS could take advantage of BFS's advancements to improve UI responsiveness and save CPU power. Last but not least, we would like to investigate whether our frequency-aware improvement can be applied to the red-black tree itself, by restructuring it based on core frequencies at runtime.

7. REFERENCES

[1] C. Van Berkel. Multi-core for mobile phones. In Proceedings of the Conference on Design, Automation and Test in Europe, pages 1260–1265. European Design and Automation Association, 2009.
[2] Y. G. Ji, J. H. Park, C. Lee, and M. H. Yun. A usability checklist for the usability evaluation of mobile phone user interface. International Journal of Human-Computer Interaction, 20(3):207–231, 2006.
[3] A. I. Wasserman. Software engineering issues for mobile application development. In Proceedings of the FSE/SDP workshop on Future of software engineering research, pages 397–400. ACM, 2010.
[4] A. Silberschatz, P. Galvin, and G. Gagne. Applied operating system concepts. John Wiley & Sons, Inc., 2001.
[5] W. Yuan and K. Nahrstedt. Energy-efficient soft real-time CPU scheduling for mobile multimedia systems. In ACM SIGOPS Operating Systems Review, volume 37, pages 149–163. ACM, 2003.
[6] D. Ferreira, A. K. Dey, and V. Kostakos. Understanding human-smartphone concerns: a study of battery life. In Pervasive computing, pages 19–33. Springer, 2011.
[7] P. Yang, C. Wong, P. Marchal, F. Catthoor, D. Desmet, D. Verkest, and R. Lauwereins. Energy-aware runtime scheduling for embedded-multiprocessor SoCs. IEEE Design & Test of Computers, (5):46–58, 2001.
[8] N. B. Rizvandi, J. Taheri, A. Y. Zomaya, and Y. C. Lee. Linear combinations of DVFS-enabled processor frequencies to modify the energy-aware scheduling algorithms. In Cluster, Cloud and Grid Computing (CCGrid), 2010 10th IEEE/ACM International Conference on, pages 388–397. IEEE, 2010.
[9] S. M. Mostafa and S. Kusakabe. Towards reducing energy consumption using inter-process scheduling in preemptive multitasking OS. In 2016 International Conference on Platform Technology and Service (PlatCon), pages 1–6, Feb 2016.
[10] V. Pallipadi and S. B. Siddha. Processor power management features and process scheduler: Do we need to tie them together? LinuxConf Europe, pages 1–8, 2007.
[11] J. H. Schönherr, J. Richling, M. Werner, and G. Mühl. Event-driven processor power management. In Proceedings of the 1st International Conference on Energy-Efficient Computing and Networking, pages 61–70. ACM, 2010.
[12] P. Valente and M. Andreolini. Improving application responsiveness with the BFQ disk I/O scheduler. In Proceedings of the 5th Annual International Systems and Storage Conference, page 6. ACM, 2012.
[13] C. S. Pabla. Completely fair scheduler. Linux Journal, 2009(184):4, 2009.
[14] I. Stoica, H. Abdel-Wahab, K. Jeffay, S. K. Baruah, J. E. Gehrke, and C. G. Plaxton. A proportional share resource allocation algorithm for real-time, time-shared systems. In Real-Time Systems Symposium, 1996., 17th IEEE. IEEE, 1996.
[15] C. McAnlis, P. Lubbers, B. Jones, D. Tebbs, A. Manzur, S. Bennett, F. d'Erfurth, B. Garcia, S. Lin, I. Popelyshev, et al. Applying old-school video game techniques in modern web games. In HTML5 Game Development Insights. Springer, 2014.
[16] M. Claypool, K. Claypool, and F. Damaa. The effects of frame rate and resolution on users playing first person shooter games. In Electronic Imaging 2006, pages 607101–607101. International Society for Optics and Photonics, 2006.
[17] P. Pawar, S. Dhotre, and S. Patil. CFS for addressing CPU resources in multi-core processors with AA tree. International Journal of Computer Science and Information Technologies, 2014.
[18] V. Pallipadi and A. Starikovskiy. The ondemand governor. In Proceedings of the Linux Symposium, volume 2, pages 215–230. sn, 2006.
[19] J.-M. Arnau, J.-M. Parcerisa, and P. Xekalakis. Parallel frame rendering: trading responsiveness for energy on a mobile GPU. In Proceedings of the 22nd international conference on Parallel architectures and compilation techniques. IEEE Press, 2013.
[20] G. Von Laszewski, L. Wang, A. J. Younge, and X. He. Power-aware scheduling of virtual machines in DVFS-enabled clusters. In Cluster Computing and Workshops, 2009. CLUSTER'09. IEEE International Conference on, pages 1–10. IEEE, 2009.
