Flow Sensitive Information Flow Analysis for C Programs

Jun Furuse¹, Dzung Dinh-Khac², and Viet Ha Nguyen²
¹ Graduate School of Information Science and Technology, the University of Tokyo
² A.N.Lab Joint Stock Company, Vietnam

Abstract. The VITC compiler aims to provide information security for legacy
C applications using type-based information flow analysis. We have recently
made its typing discipline flow sensitive, whereas those of the
other realistic information-secure compiler implementations, for Java [5]
and ML [8], are flow insensitive. This is because local states in C are too
frequently stored in global variables such as errno.

1 Introduction

Language-based information flow analysis verifies the non-interference property of
programs: roughly speaking, a non-interferent program cannot unintentionally leak
its secret information to attackers. It is a very strong measure against
program security holes. However, information flow analysis has not been widely
applied to the C language, which is one of the most popular targets for attackers
who try to steal secrets.
The goal of our VITC compiler project is to secure existing C applications by
bringing this information flow based security to the C language. Our static and dynamic
type systems track the information flow of a C program annotated with secrecy
specifications and verify its non-interference property with respect to them.
In this short paper, after a brief discussion of VITC's key features, we explain
our recent achievement: VITC's flow sensitive information flow typing system,
which is mandatory for analyzing the information flow of realistic C applications. Such
imperatively written programs often use global variables such as errno to store
program states, which cannot be typed well flow-insensitively.

2 VITC type system overview

VITC programs are annotated with security specifications using the lattice model
of security levels[2], where the security level constants ℓ called labels form a
lattice (L, ⊑):
xRULE(L < H)            // specification of the security lattice
int xSEC(L) l = 0;
int xSEC(H) h = 42;
l = h + 1;

The declaration xRULE(L < H) specifies the finite lattice L = {L, H} where L ⊑ H,
with L for the lower secrecy and H for the higher. Using C type attribute syntax,
the macro xSEC(ℓ) specifies that variables l and h store information of lower (L) and
higher (H) secrecy respectively. Using these specifications, our static type system
detects that the assignment l = h + 1 illegally leaks the higher secrecy information
derived from h to l of the lower secrecy³. So-called implicit flows are also tracked,
so that insecure code like if(h){ l = 1;} can be properly rejected.
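
The check described above can be sketched in C for the two-point lattice. This is only an illustrative model, not VITC's implementation: the names level, flows_to, join and assignment_ok are hypothetical, and the program counter argument pc stands for the secrecy of the enclosing conditionals.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding of the lattice {L, H} with L below H. */
typedef enum { LEVEL_L = 0, LEVEL_H = 1 } level;

/* flows_to(a, b) holds when a is below or equal to b in the lattice. */
static bool flows_to(level a, level b) { return a <= b; }

/* Least upper bound of two levels. */
static level join(level a, level b) { return a > b ? a : b; }

/* An assignment "dst = src" executed under program counter pc is accepted
   only if join(pc, src) flows to the level declared for dst. */
static bool assignment_ok(level pc, level src, level dst) {
    return flows_to(join(pc, src), dst);
}

/* The two insecure examples from the text, plus one secure assignment. */
static void check_paper_examples(void) {
    /* l = h + 1;  explicit flow from H to L */
    assert(!assignment_ok(LEVEL_L, LEVEL_H, LEVEL_L));
    /* if(h){ l = 1; }  implicit flow: pc is H inside the branch */
    assert(!assignment_ok(LEVEL_H, LEVEL_L, LEVEL_L));
    /* h = l;  a flow from L to H is allowed */
    assert(assignment_ok(LEVEL_L, LEVEL_L, LEVEL_H));
}
```

Modelling the implicit flow through the pc argument is a common design choice in such type systems; the constant 1 has level L, but the assignment is still rejected because it happens under a condition of level H.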
For this type system to track the information flow correctly, the programs
must be compiled by a memory-safe C compiler [6, 7]. Once memory-secured, C
becomes a very imperative functional language. Thus, our static type system is
partially based on one for ML [8]: for example, to handle the functional aspects of C, C functions
may have polymorphic security types.
Even under the assumption of memory safety, static analyses of C programs
are very hard because of type casts. This is also true for information flow analysis,
and some flows around type casts must be checked dynamically at run-time. For
example, when an expression e of type int is cast to a pointer type, as in (int
*)e, we may not be able to statically determine how secret its content is. Our
compiler requires programmers to write an explicit annotation here, like (int
xSEC(L)*)e, which embeds code that dynamically type-checks whether the result
of e is a pointer to lower security information or not.
Even when these checks for memory safety and dynamic typing fail,
a VITC program must continue its execution in a failure-oblivious manner [3],
rather than simply aborting in a fail-safe way. This is because careless program termination
may leak secrets to whoever observes the termination: for example, a termination
at e during the execution of if(h){ e; } gives a clue that h had a non-zero value.

3 Flow sensitive analysis

3.1 A motivating example: errno

Until recently the VITC type system was flow insensitive, as in [5, 8]; that is, a variable
had the same fixed secrecy in all contexts. This was acceptable as long as we type-checked
only very simple examples, but it broke down once we tried to compile more realistic
applications with global variables like errno. To demonstrate the problem, let
us consider the following program:
int errno;                  /* Global variable */
int main()
{ int xSEC(H) h;            /* variable of higher secrecy */
  int xSEC(L) l;            /* variable of lower secrecy */
  ...
  if(h) { errno = 1; }
  errno = 0;
  ...
  l = errno; }


Let errno be of secrecy ℓ. It is easily seen that after the if-statement, ℓ = H (or
higher than H) due to the indirect information flow from h to errno.

³ Typing of a variable with an xSEC specification is done flow-insensitively.

In the flow insensitive information flow analysis, the secrecy of errno is
fixed throughout the program. Therefore, the analysis reports an error for the assignment
l = errno, since a flow from H (higher) to L (lower) is forbidden. In practice, however,
the example should be accepted, since after the assignment
errno = 0 the variable carries no information of higher secrecy. In C, which lacks
modern language features such as exceptions, a global variable like
errno is often used to store states that are only locally meaningful. This
use of global variables for temporary states prevents secure programs from being
typed with flow insensitive information flow analysis.
Flow sensitive information flow analysis solves this problem,
since variables can have different secrecy after assignments. After the assignment
errno = 0, the secrecy of errno can be lowered to L. If the code between
errno = 0 and l = errno does not raise the secrecy of errno, the last assignment
does not raise any error, since it only induces a flow from L to L.
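
The difference between the two analyses on the errno example can be sketched by tracking the level of errno statement by statement. This is a simplified simulation under stated assumptions, not the VITC algorithm: the initial level of errno is assumed to be L, the two branches of the conditional are merged with a join, and all function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { LEVEL_L = 0, LEVEL_H = 1 } level;

static level join(level a, level b) { return a > b ? a : b; }
static bool flows_to(level a, level b) { return a <= b; }

/* Level given to a variable by an assignment of a value of level
   `value` executed under program counter `pc`. */
static level assigned_level(level pc, level value) { return join(pc, value); }

/* Flow sensitive: the level of errno is updated at each assignment. */
static level errno_level_flow_sensitive(void) {
    level h = LEVEL_H;
    level err = LEVEL_L;                 /* assumed initial level of errno */

    /* if(h) { errno = 1; }  the branch runs under pc = H; the empty
       else-branch keeps the old level, and the two results are joined. */
    err = join(assigned_level(h, LEVEL_L), err);     /* err is now H */

    /* errno = 0;  at top level (pc = L) the old level is overwritten. */
    err = assigned_level(LEVEL_L, LEVEL_L);          /* err is now L */
    return err;
}

/* Flow insensitive: errno has one fixed level, which must be an upper
   bound of the levels of all assignments to it. */
static level errno_level_flow_insensitive(void) {
    level err = LEVEL_L;
    err = join(err, assigned_level(LEVEL_H, LEVEL_L));  /* errno = 1 under h */
    err = join(err, assigned_level(LEVEL_L, LEVEL_L));  /* errno = 0 */
    return err;
}

/* l = errno; is accepted when the level of errno flows to L. */
static bool final_assignment_ok(level errno_level) {
    return flows_to(errno_level, LEVEL_L);
}
```

With these definitions, final_assignment_ok(errno_level_flow_sensitive()) holds while final_assignment_ok(errno_level_flow_insensitive()) does not, which mirrors the behaviour described in the text.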
In the literature, there have been a number of approaches to flow-sensitive information
flow analysis, e.g. [1, 4]. Although they give nice theoretical results, they
do not consider functions, which are very common in C programs, and
this makes them less practical. We argue that our system, presented
below, is more useful because it allows for functions.
3.2 The language

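Syntax. For our formalization, we first define a small C-like language which
supports global variables, conditionals, function declarations and function calls:

\[
\begin{array}{rcll}
e & ::= & & \text{Expressions} \\
  & \mid & n & \text{Constants} \\
  & \mid & x \mid f & \text{Variables} \\
  & \mid & f(e) & \text{Function calls} \\[4pt]
s & ::= & & \text{Statements} \\
  & \mid & \mathtt{skip} & \text{Skip} \\
  & \mid & x := e & \text{Assignments} \\
  & \mid & s;\ s & \text{Sequences} \\
  & \mid & \mathtt{if}\ e\ \mathtt{then}\ s\ \mathtt{else}\ s & \text{Conditionals} \\[4pt]
t & ::= & \mathtt{int} \mid \mathtt{char} \mid \ldots & \\[4pt]
d & ::= & & \\
  & \mid & t\ x = n; & \text{Variable decls} \\
  & \mid & t^{\ell}\ x = n; & \text{Variables with specs} \\
  & \mid & t\ f(t\ x)\ \{\, s;\ \mathtt{return}\ e;\, \} & \text{Function decls} \\[4pt]
p & ::= & d \ldots d & \text{Programs}
\end{array}
\]
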

The language can have security level constants ℓ only at variable declarations
$t^{\ell}\ x = n;$ (e.g. int xSEC(L) x = 0;). Such variables with levels give the security
specifications of a program, and their typing is flow insensitive, while the other
variables are typed flow sensitively.
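Types, constraints, conditions, and subtyping. Types τ in our type system
are fully annotated with flow types λ, each of which is either a level constant ℓ or a type
variable α for polymorphism. A functional type $\tau \xrightarrow{\pi} \tau$ is annotated with its effect
π, the security lower bound of the side effects inside the function:

\[
\begin{array}{rcll}
\lambda, \pi & ::= & & \text{Flow types} \\
  & \mid & \ell & \text{Level constants} \\
  & \mid & \alpha & \text{Type variables} \\[4pt]
\tau & ::= & & \text{Mono-types} \\
  & \mid & t^{\lambda} & t = \mathtt{int}, \mathtt{char}, \ldots \\
  & \mid & \tau \xrightarrow{\pi} \tau & \text{Functional types}
\end{array}
\]
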

As we allow variables in types, it is necessary to have a formal manner to
express the ordering relation between type variables and flow types. In our system, it is represented by type constraints (or constraints for short) of the form
k ::= λ1 ⊑ λ2 . A set of constraints K forms a trivial constraint system and we
write K ⊢ λ1 ⊑ λ2 when λ1 ⊑ λ2 is inferable from K.
As in [4], to allow types to be flow sensitive, the type of a variable
must be able to "vary", i.e. a variable may have different types before and after a
statement (especially an assignment) is executed. Therefore, to keep track of such
types of variables during program execution, we need conditions C, which
are partial maps from variables x to mono-types τ. The typing of a statement is
annotated with a pre-condition and a post-condition, which represent the types
of variables before and after the statement is executed. Moreover, unlike
[4], since we allow function calls in expressions and functions may have
different pre- and post-conditions, each expression is also annotated with a pre-
and a post-condition with the same meaning.
Partial order between security levels ℓ is naturally extended to the following
subtyping relationships between types and conditions. The subtyping of function
parameters is contra-variant:
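\[
\frac{K \vdash \lambda \sqsubseteq \lambda'}
     {K \vdash t^{\lambda} \sqsubseteq t^{\lambda'}}
\qquad
\frac{K \vdash \lambda_1 \sqsubseteq \lambda'_1 \quad K \vdash \lambda'_2 \sqsubseteq \lambda_2 \quad K \vdash \pi' \sqsubseteq \pi}
     {K \vdash t_2^{\lambda_2} \xrightarrow{\pi} t_1^{\lambda_1} \sqsubseteq t_2^{\lambda'_2} \xrightarrow{\pi'} t_1^{\lambda'_1}}
\]

\[
\frac{\mathrm{Dom}(C_1) = \mathrm{Dom}(C_2) \quad \forall x \in \mathrm{Dom}(C_1).\ K \vdash C_1(x) \sqsubseteq C_2(x)}
     {K \vdash C_1 \sqsubseteq C_2}
\]
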

It remains to define polymorphic functional types, which have the form
$\forall \alpha_1 \ldots \alpha_n [K].\ \{C_{pre}\}\ \tau_1 \xrightarrow{\pi} \tau_2\ \{C_{post}\}$.
Intuitively, such a type states that for all type variables $\alpha_i$ which
satisfy the constraint set K, if the pre-condition is $C_{pre}$, then the corresponding
function works as typed $\tau_1 \xrightarrow{\pi} \tau_2$ and modifies the condition to $C_{post}$.
3.3 Typing rules

Our flow sensitive typing system, depicted in Appendix A, is a mixture and an
extension of a flow sensitive type system for while programs [4] and a flow insensitive
polymorphic constraint typing for ML [8]. Type judgments for expressions
and statements take the form K, π, Γ ⊢ {C1} · {C2}, with two kinds of
typing environments. C is a condition, a flow sensitive environment which records
the types of flow sensitive variables. Γ is the flow insensitive counterpart,
a partial mapping from variables to polymorphic types or mono-types, which
is used for functions and for flow insensitive variables annotated with security
specifications. K is a set of constraints between type variables
and level constants that the judgment requires. π is the so-called "program counter",
which denotes the secrecy of the program execution flow. Unlike in the type systems on which ours is based,
conditions and a program counter also appear in the judgment for expressions,
K, π, Γ ⊢ {C1} e : τ {C2}, since function calls with side effects may
occur inside the expression e.
The core of flow sensitivity is the rule t-Asgn: after an assignment, the type of a
flow sensitive variable is replaced by the type $t^{\lambda}$ of the assigned value,
whatever its previous type $t^{\lambda'}$ was. Thus
errno can have different security levels at different points. Apart from this rule,
pre- and post-conditions must join correctly at each computation step. On the
other hand, assignments to flow insensitive variables with security specifications
are typed not by t-Asgn but by t-AsgnInsens. This is very similar to the
classical flow insensitive typings: the type of the variable at an assignment must
be equal to that of the assigned value and is never modified.
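
As a worked illustration of the two assignment rules, consider the last two assignments of the errno example in Section 3.1. The instances below are our reading of t-Asgn and t-AsgnInsens, under the assumptions that the program counter at top level is L, that the condition C before errno = 0 maps errno to int at level H, and that the constant 0 is given level L by t-Const.

```latex
% t-Asgn: errno is flow sensitive, so its type in the condition is replaced.
\[
\frac{K, L, \Gamma \vdash \{C\}\ 0 : \mathtt{int}^{L}\ \{C\}
      \quad K \vdash L \sqsubseteq L
      \quad C(\mathtt{errno}) = \mathtt{int}^{H}
      \quad C' = C[\mathtt{errno} : \mathtt{int}^{L}]}
     {K, L, \Gamma \vdash \{C\}\ \mathtt{errno} := 0\ \{C'\}}
\]

% t-AsgnInsens: l carries the specification xSEC(L), so Gamma(l) is fixed
% and the condition C' is left unchanged.
\[
\frac{K, L, \Gamma \vdash \{C'\}\ \mathtt{errno} : \mathtt{int}^{L}\ \{C'\}
      \quad K \vdash L \sqsubseteq L
      \quad \Gamma(\mathtt{l}) = \mathtt{int}^{L}}
     {K, L, \Gamma \vdash \{C'\}\ \mathtt{l} := \mathtt{errno}\ \{C'\}}
\]
```

The second instance is only derivable because the first one has lowered errno to level L in C'; with errno still at level H, the premise on the type of l could not be met.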
The type of a function is also flow insensitive; it is therefore also bound in Γ,
with a polymorphic type. Its polymorphic type is instantiated (t-Instantiate)
and then applied (t-FunCall) for each application independently, in order to achieve
polymorphism. The type instantiation S must be meaningful: S(K) must be
satisfiable (⊨ S(K)), that is, it must not contain contradictory constraints like H ⊑ L.
The judgment for declarations has the form K0, Γ, C0 ⊢ d. Since all the
definitions are declared at the top level once and for all, we have no notion of
a program counter or of pre- and post-conditions, only the initial condition C0.
The global constraints K0 are the constraints which must hold throughout the
program, and they must be satisfiable.
The rule t-FunDecl types a function declaration. The function body s; return e;
must be typed under a constraint set K0 + Kα and under the pre-condition extended
with the function argument x. The free variables α introduced in the typing of the body
are the targets of type generalization, and every constraint in the extended part Kα of
the constraint set must mention the generalized variables α. In the polymorphic type
of the function, these generalized variables quantify the mono-type, the pre- and
post-conditions, and the extended part Kα of the constraints.

4 Type inference and implementation issues

The type inference algorithm is obtained almost automatically by using the typing
rules bottom-up and then checking the satisfiability of the constraints. A problem arises at
function declarations, since t-FunDecl uses polymorphic recursion, which requires
complex inference. Currently our algorithm does not support polymorphic
recursion: a recursive function is typed monomorphically inside its body.
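
The satisfiability check on the collected constraints can be sketched as follows for the two-point lattice. This is a minimal illustration under our own assumptions, not the solver used in VITC: the names flow_type, constraint and satisfiable are hypothetical, type variables are numbered from 0 to MAX_VARS - 1, and the solver computes the least solution by fixpoint iteration before checking every constraint against it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { LEVEL_L = 0, LEVEL_H = 1 } level;

/* A flow type is either a level constant or a type variable (an index). */
typedef struct { bool is_var; int var; level constant; } flow_type;

/* A constraint "lo is below or equal to hi". */
typedef struct { flow_type lo, hi; } constraint;

#define MAX_VARS 16   /* assumed bound on the number of type variables */

static flow_type lvl(level l) { flow_type f = { false, 0, l }; return f; }
static flow_type var(int i)   { flow_type f = { true, i, LEVEL_L }; return f; }

static level value(flow_type f, const level *env) {
    return f.is_var ? env[f.var] : f.constant;
}

/* Returns true when some assignment of levels to variables satisfies all
   n constraints. Variables start at L and are raised only when forced, so
   env ends up as the least solution of the constraints whose right-hand
   side is a variable; if that solution violates a constraint with a
   constant on the right, every other solution does too. */
static bool satisfiable(const constraint *ks, size_t n) {
    level env[MAX_VARS] = { LEVEL_L };
    bool changed = true;
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; i++) {
            if (!ks[i].hi.is_var) continue;
            level lo = value(ks[i].lo, env);
            if (lo > env[ks[i].hi.var]) {
                env[ks[i].hi.var] = lo;
                changed = true;
            }
        }
    }
    for (size_t i = 0; i < n; i++)
        if (value(ks[i].lo, env) > value(ks[i].hi, env)) return false;
    return true;
}
```

For instance, the constraints H ⊑ α0 and α0 ⊑ α1 are satisfiable (both variables become H), while H ⊑ α0 together with α0 ⊑ L is not, which corresponds to an illegal flow from H to L through α0.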
Our implementation, based on this formalization, types various errno examples
like the one in Section 3.1 well. Programmers are sometimes forced to lower the
secrecy level of errno by inserting reset assignments like errno = 0, but this is
easy and a small price to pay for the information security obtained.

5 Conclusion and future works

Flow sensitive typing is required for information flow analysis of C programs,
since they often use global variables such as errno to store states that
are only locally meaningful. We have formalized and implemented such a flow
sensitive polymorphic information flow typing system for C.
We leave a formal proof of the soundness of the system as future work. Frankly
speaking, we believe that soundness is not very difficult to show, since the proof in
[4] can be adapted to our system.

Currently flow sensitivity is only permitted for pure integers, and pointers
are typed flow insensitively. Flow sensitivity for pointers is left as future work,
which will require detailed pointer analysis, as pointed out in [1].

References
1. David Clark, Chris Hankin, and Sebastian Hunt. Information flow for Algol-like
languages. Computer Languages, 28(1), 2002.
2. Dorothy E. Denning. A lattice model of secure information flow. Commun. ACM,
19(5):236–243, 1976.
3. Martin Rinard et al. Enhancing server availability and security through
failure-oblivious computing, December 2004.
4. Sebastian Hunt and David Sands. On flow-sensitive security types. In Proc. Principles
of Programming Languages, 33rd Annual ACM SIGPLAN-SIGACT Symposium (POPL'06),
pages 79–90, Charleston, South Carolina, USA, January 2006. ACM Press.
5. Andrew C. Myers. JFlow: Practical mostly-static information flow control. In
Symposium on Principles of Programming Languages, pages 228–241, 1999.
6. George C. Necula, Scott McPeak, and Westley Weimer. CCured: type-safe
retrofitting of legacy code. In Symposium on Principles of Programming Languages,
pages 128–139, 2002.
7. Yutaka Oiwa, Tatsurou Sekiguchi, Eijiro Sumii, and Akinori Yonezawa. Fail-safe
ANSI-C compiler: An approach to making C programs secure (progress report),
February 2003.
8. François Pottier and Vincent Simonet. Information flow inference for ML. In
Symposium on Principles of Programming Languages, pages 319–330, 2002.


A Flow sensitive typing rules

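Expressions:

\[
K, \pi, \Gamma \vdash \{C\}\ n : t_n^{\lambda}\ \{C\}
\quad \text{(t-Const)}
\]

\[
\frac{C(x) = \tau \ \text{ or } \ \Gamma(x) = \tau}
     {K, \pi, \Gamma \vdash \{C\}\ x : \tau\ \{C\}}
\quad \text{(t-Var)}
\]

\[
\frac{\Gamma(f) = \forall \alpha [K].\ \{C_1\}\ \tau\ \{C_2\} \qquad \models S(K)}
     {S(K), \pi, \Gamma \vdash \{S(C_1)\}\ f : S(\tau)\ \{S(C_2)\}}
\quad \text{(t-Instantiate)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : \tau\ \{C_2\} \qquad
      K, \pi, \Gamma \vdash \{C_2\}\ f : \tau \xrightarrow{\pi} \tau'\ \{C_3\}}
     {K, \pi, \Gamma \vdash \{C_1\}\ f(e) : \tau'\ \{C_3\}}
\quad \text{(t-FunCall)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : \tau\ \{C_2\} \qquad K \vdash \tau \sqsubseteq \tau' \qquad
      K \vdash C_1 \sqsubseteq C'_1 \qquad K \vdash C'_2 \sqsubseteq C_2 \qquad K \vdash \pi' \sqsubseteq \pi}
     {K, \pi', \Gamma \vdash \{C'_1\}\ e : \tau'\ \{C'_2\}}
\quad \text{(t-SubExp)}
\]

Statements:

\[
K, \pi, \Gamma \vdash \{C\}\ \mathtt{skip}\ \{C\}
\quad \text{(t-Skip)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \qquad K \vdash \pi \sqsubseteq \lambda \qquad
      C_2(x) = t^{\lambda'} \qquad C_3 = C_2[x : t^{\lambda}]}
     {K, \pi, \Gamma \vdash \{C_1\}\ x := e\ \{C_3\}}
\quad \text{(t-Asgn)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \qquad K \vdash \pi \sqsubseteq \lambda \qquad
      \Gamma(x) = t^{\lambda}}
     {K, \pi, \Gamma \vdash \{C_1\}\ x := e\ \{C_2\}}
\quad \text{(t-AsgnInsens)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_i\}\ s_i\ \{C_{i+1}\} \qquad i = 1, 2}
     {K, \pi, \Gamma \vdash \{C_1\}\ s_1; s_2\ \{C_3\}}
\quad \text{(t-Seq)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ e : t^{\lambda}\ \{C_2\} \qquad K \vdash \pi' \sqsupseteq \pi \sqcup \lambda \qquad
      K, \pi', \Gamma \vdash \{C_2\}\ s_i\ \{C_3\} \qquad i = 1, 2}
     {K, \pi, \Gamma \vdash \{C_1\}\ \mathtt{if}\ e\ \mathtt{then}\ s_1\ \mathtt{else}\ s_2\ \{C_3\}}
\quad \text{(t-Cond)}
\]

\[
\frac{K, \pi, \Gamma \vdash \{C_1\}\ s\ \{C_2\} \qquad
      K \vdash C_1 \sqsubseteq C'_1 \qquad K \vdash C'_2 \sqsubseteq C_2 \qquad K \vdash \pi' \sqsubseteq \pi}
     {K, \pi', \Gamma \vdash \{C'_1\}\ s\ \{C'_2\}}
\quad \text{(t-SubStmt)}
\]

Declarations:

\[
\frac{C(x) = t_n^{\lambda} \qquad \models K_0}
     {K_0, \Gamma, C \vdash t_n\ x = n;}
\quad \text{(t-VarDecl)}
\]

\[
\frac{\Gamma(x) = t_n^{\ell} \qquad \models K_0}
     {K_0, \Gamma, C \vdash t_n^{\ell}\ x = n;}
\quad \text{(t-VarDeclInsens)}
\]

\[
\frac{\begin{array}{c}
      \Gamma(f) = \forall \alpha [K_\alpha].\ \{C_1\}\ t^{\lambda} \xrightarrow{\pi} t'^{\lambda'}\ \{C_3\} \\
      \alpha = \mathrm{FV}(\{C_1\}\ t^{\lambda} \xrightarrow{\pi} t'^{\lambda'}\ \{C_3\}) \setminus (\mathrm{FV}(\Gamma) \cup \mathrm{FV}(C)) \\
      K_0 + K_\alpha, \pi, \Gamma \vdash \{C_1[x : t^{\lambda}]\}\ s\ \{C_2\} \qquad
      K_0 + K_\alpha, \pi, \Gamma \vdash \{C_2\}\ e : t'^{\lambda'}\ \{C_3\} \\
      \models K_0 + K_\alpha \qquad \mathrm{Dom}(C_1) \subseteq \mathrm{Dom}(C) \\
      \forall k \in K_\alpha.\ \mathrm{FV}(k) \cap \alpha \neq \emptyset \qquad
      \forall k \in K_0.\ \mathrm{FV}(k) \cap \alpha = \emptyset
      \end{array}}
     {K_0, \Gamma, C \vdash t'\ f(t\ x)\ \{\, s;\ \mathtt{return}\ e;\, \}}
\quad \text{(t-FunDecl)}
\]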