Discrete Probability with
R
Nguyen An Khuong,
Huynh Tuong Nguyen
Chapter 7
Discrete Probability with R
Discrete Structures for Computer Science (CO1007) on
December 7th, 2015
Contents
Randomness
Sampling with R
Probability
Probability Rules
Probability with R
Discrete RVs
Some Discrete
Probability Models
Geometric Model
Binomial Model
The built-in
distributions in R
Densities
Cdf
Quantiles
Random numbers
Nguyen An Khuong, Huynh Tuong Nguyen
Faculty of Computer Science and Engineering
University of Technology, VNU-HCM
References
Contents
1 Randomness
2 Random sampling with R
3 Probability
4 Probability Rules
5 Probability calculations and combinatorics with R
6 Discrete Random variables
7 Some Discrete Probability Models
Geometric Model
Binomial Model
8 The built-in distributions in R
Densities
Cumulative distribution functions
Quantiles
Random numbers
9 References and Further Reading
Motivations
• Gambling
• Real life problems
• Computer Science: cryptology, coding theory, algorithmic
complexity,...
Randomness
Which of these are random phenomena?
• The number you get when rolling a fair die
• The winning number sequence for the lottery's special prize (random by law!)
• Your blood type (No!)
• Hitting a red light on the way to school
• The traffic light itself is not random: it runs on a timer.
• The pattern of your riding is random.
So what is special about randomness?
In the long run, random phenomena are predictable: each outcome settles to a stable relative frequency
(the fraction of times the event occurs when the experiment is repeated over and over).
Randomness in Statistics
• Randomness and probability: central to statistics.
• Empirical fact: Most experiments and investigations are not
perfectly reproducible.
• The degree of irreproducibility may vary:
• Some experiments in physics may yield data that are accurate to many decimal places,
• whereas data on biological systems are typically much less reliable.
• View of data as something coming from a statistical
distribution: vital to understanding statistical methods.
• We outline the basic ideas of probability and the functions
that R has for random sampling and handling of theoretical
distributions.
Random Numbers with R
• Much of the earliest work in probability theory was about
games and gambling issues, based on symmetry
considerations.
• The basic notion then is that of a random sample: dealing
from a well-shuffled pack of cards or picking numbered balls
from a well-stirred urn.
• In R, we can simulate these situations with the sample
function.
• If we want to pick five numbers at random from the set 1:40, then we can write
> sample(1:40,5)
[1] 4 30 28 40 13
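Because sample draws pseudo-random numbers, the five numbers shown above will generally differ from run to run. A minimal sketch of how a draw can be made repeatable, assuming the generator state is fixed with set.seed (the seed value 2015 is an arbitrary choice, not taken from the slides):

```r
# Fix the random number generator state so the same draw can be repeated.
# The seed 2015 is an arbitrary illustrative value.
set.seed(2015)
first_draw <- sample(1:40, 5)

set.seed(2015)
second_draw <- sample(1:40, 5)

# Same seed, same sample; the five numbers are distinct and lie in 1..40.
identical(first_draw, second_draw)
anyDuplicated(first_draw) == 0
all(first_draw %in% 1:40)
```

All three checks should evaluate to TRUE; the actual numbers drawn depend on the seed and the R version.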
Sample function
• The first argument (x) is a vector of values to be sampled.
• The second (size) is the sample size.
• In fact, sample(40, 5) would suffice, since a single number n is
interpreted as the sequence of integers 1:n.
• Notice that the default behavior of sample is sampling
without replacement.
• That is, the samples will not contain the same number twice,
and size obviously cannot be bigger than the length of the
vector to be sampled.
• If we want sampling with replacement, then we need to add
the argument replace = TRUE.
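The points above can be checked directly. The following is a small sketch, not part of the original slides; the variable names and the seed are illustrative assumptions.

```r
set.seed(1)                      # arbitrary seed, only for repeatability
a <- sample(1:40, 5)
set.seed(1)
b <- sample(40, 5)               # a single number n stands for 1:n
same_draw <- identical(a, b)     # both calls give the same sample

# With replacement, the sample may be longer than the vector and may repeat values.
die_rolls <- sample(1:6, 20, replace = TRUE)

# Without replacement, asking for more values than are available is an error.
too_many <- tryCatch(sample(1:6, 20), error = function(e) "error")
```

Here tryCatch is used only to capture the error so the script keeps running; at the console the call would simply stop with an error message.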
Sampling with replacement
• Sampling with replacement is suitable for modelling coin
tosses or throws of a die.
• So, for instance, to simulate 10 coin tosses we could write
> sample(c("H","T"), 10, replace=TRUE)
[1] "T" "T" "T" "T" "T" "H" "H" "T" "H" "T"
• In fair coin-tossing, the probability of heads should equal the
probability of tails, but the idea of a random event is not
restricted to symmetric cases.
Data with unequal probabilities
• You can simulate data with unequal probabilities for the
outcomes (say, a 90% chance of success) by using the prob
argument to sample, as in
> sample(c("succ", "fail"), 10, replace=TRUE,
         prob=c(0.9, 0.1))
[1] "succ" "succ" "succ" "succ" "succ"
"fail" "succ" "succ" "succ" "fail"
• This may not be the best way to generate such a sample,
though. See the later discussion of the binomial distribution.
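The remark above points ahead to the binomial distribution. As a hedged sketch of what that alternative might look like, R's rbinom function can generate the same kind of data numerically. Coding success as 1 and failure as 0, and the seed value, are assumptions made here for illustration.

```r
set.seed(7)  # arbitrary seed for repeatability

# Ten independent trials, each succeeding with probability 0.9
# (1 = success, 0 = failure).
trials <- rbinom(10, size = 1, prob = 0.9)
labels <- ifelse(trials == 1, "succ", "fail")

# Or draw the number of successes among 10 trials in a single step.
n_success <- rbinom(1, size = 10, prob = 0.9)
```

Working with 0/1 values makes it easy to count successes with sum(trials) or estimate the success rate with mean(trials).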
Terminology
• Experiment/trial (thí nghiệm ngẫu nhiên / phép thử): a procedure that yields one of a given set of possible outcomes randomly.
• Tossing a coin to see the face
• Rolling a die
• ...
• Sample space (không gian mẫu, Ω): set of all possible
outcomes
• {Head, Tail}
• {1, 2, 3, 4, 5, 6}
• Event (sự kiện): a subset of the sample space.
• You see Head after tossing a coin: {Head} is an event.
• You roll an odd number with a die: {1, 3, 5} is an event.
Example
Experiment: Rolling two dice. What is the sample space?
Answer: It depends on what we’re going to ask!
• The total of the two dice?
{2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}
• The number shown on each die?
{(1,1), (1,2), (1,3), . . ., (6,6)}
Which is better?
The latter, because its 36 outcomes are equally likely, whereas the totals are not.
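To see why the second sample space is more convenient, the 36 ordered outcomes can be enumerated and grouped by their total. This is an illustrative sketch; the object names are not from the slides.

```r
# All 36 ordered outcomes (die1, die2) of rolling two dice.
outcomes <- expand.grid(die1 = 1:6, die2 = 1:6)
n_outcomes <- nrow(outcomes)

# How many ordered outcomes produce each total from 2 to 12?
total_counts <- table(outcomes$die1 + outcomes$die2)
total_counts
```

The counts differ across totals (for instance, only one outcome gives a total of 2, while six give a total of 7), so the totals are not equally likely.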
The Law of Large Numbers (LLN)
Definition
The Law of Large Numbers (Luật số lớn) states that the relative
frequency of an event over repeated independent trials gets closer and
closer to its true probability as the number of trials
increases.
Example
Do you believe that the true relative frequency of Head when you
toss a coin is 50%?
Let’s try!
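One way to try this is to simulate many tosses and track the running relative frequency of heads. The sketch below assumes a fair coin simulated with sample; the number of tosses and the seed are arbitrary choices.

```r
set.seed(42)  # arbitrary seed for repeatability
n_tosses <- 10000
tosses <- sample(c("H", "T"), n_tosses, replace = TRUE)

# Relative frequency of heads after 1, 2, ..., n_tosses tosses.
running_freq <- cumsum(tosses == "H") / seq_len(n_tosses)

# Look at the frequency after 10, 100, 1000 and 10000 tosses.
running_freq[c(10, 100, 1000, 10000)]

# Optional plot of the convergence towards 0.5:
# plot(running_freq, type = "l", log = "x"); abline(h = 0.5, lty = 2)
```

Early values can be far from 0.5, while the later ones should settle close to it.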
Be Careful!
Don’t misunderstand the Law of Large Numbers (LLN). Misapplying it can
lead to lost money and poor business decisions.
Example
I have 8 children, all of them girls. Thanks to the LLN (!?), there
is a high probability that the next one will be a boy.
(Overpopulation!!!)
Example
I’m playing Bầu cua tôm cá. The fish has not appeared in the last 5
games, so it must be more likely to appear in the next game. Thus, I bet all
my money on the fish. (Sorry, you lose!)
Probability
Definition
The probability (xác suất) of an event E of a finite nonempty
sample space of equally likely outcomes Ω is:
\[ p(E) = \frac{|E|}{|\Omega|}. \]
• Note that E ⊆ Ω so 0 ≤ |E| ≤ |Ω|
• 0 ≤ p(E) ≤ 1
• 0 indicates impossibility
• 1 indicates certainty
People often say: “It has a 20% probability”
Examples
Example (1)
What is the probability of getting a Head when tossing a coin?
Answer:
• There are |Ω| = 2 possible outcomes
• Getting a Head is |E| = 1 outcome, so
p(E) = 1/2 = 0.5 = 50%
Example (2)
What is the probability of getting a 7 by rolling two dice?
Answer:
• Product rule: There are a total of 36 equally likely possible
outcomes
• There are six successful outcomes:
(1,6), (2,5), (3,4), (4,3), (5,2), (6,1)
• Thus, |E| = 6, |Ω| = 36, p(E) = 6/36 = 1/6
Examples
Example (3)
We toss a fair coin 6 times. What is the probability of H on the 6th toss if all
of the previous 5 tosses were T?
Answer:
Don’t be silly! Still 1/2, because the tosses are independent.
Example (4)
Which is more likely:
• Rolling an 8 when 2 dice are rolled?
• Rolling an 8 when 3 dice are rolled?
Answer:
Two dice: 5/36 ≈ 0.139
Three dice: 21/216 ≈ 0.097
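These counts can be verified by brute-force enumeration. The helper p_sum below is a hypothetical name introduced for illustration; it assumes fair six-sided dice and simply lists all 6^n outcomes.

```r
# Probability that n_dice fair dice sum to target,
# computed by enumerating all 6^n_dice equally likely outcomes.
p_sum <- function(n_dice, target) {
  rolls <- expand.grid(rep(list(1:6), n_dice))
  mean(rowSums(rolls) == target)
}

p_sum(2, 7)   # Example (2): a total of 7 with two dice
p_sum(2, 8)   # Example (4): a total of 8 with two dice
p_sum(3, 8)   # Example (4): a total of 8 with three dice
```

Enumeration is only practical for a handful of dice, since the number of outcomes grows as 6^n.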
Formal Probability
Rule 1
A probability is a number between 0 and 1.
0 ≤ p(E) ≤ 1
Rule 2: Something has to happen rule
The probability of the set of all possible outcomes of a trial must
be 1.
p(Ω) = 1.
Rule 3: Complement Rule
The probability of an event occurring is 1 minus the probability
that it doesn’t occur.
\[ p(E) = 1 - p(\overline{E}). \]
Example (Birthday Problem)
Consider a group of n < 365 students. We’ll ignore leap years and
assume that all birthdays are equally likely.
i) If we pick a specific day (say December 7th), then what is the
chance that at least one student was born on that day?
ii) What is the probability that at least two students share
the same birthday?
Answer i).
• The sample space is the set of all 365^n possible choices of
birthdays for n individuals.
• p1(n) = P(At least one student was born on December 7th)
= 1 − P(No students were born on December 7th), so
\[ p_1(n) = 1 - \frac{364^n}{365^n}. \]
• We have p1(30) ≈ 7.9%, and p1(91) ≈ 22.1%.
• In order for the probability that at least one other person
shares your birthday to exceed 50%, we need n large enough
that
\[ p_1(n) = 1 - \frac{364^n}{365^n} > 0.5, \quad \text{or} \quad n \ge 253. \]
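The values of p1 in Answer i) are easy to evaluate numerically. The sketch below is not from the slides; the function name p1 mirrors the slide notation, and the search range 1:1000 is an arbitrary assumption that is wide enough here.

```r
# p1(n): probability that at least one of n students was born on a fixed day,
# ignoring leap years and assuming equally likely birthdays.
p1 <- function(n) 1 - (364 / 365)^n

# Percentages for the two group sizes mentioned above.
round(100 * p1(c(30, 91)), 1)

# Smallest n for which p1(n) exceeds 0.5.
n_half <- min(which(p1(1:1000) > 0.5))
n_half
```

Because which returns positions in 1:1000, the minimum position is directly the smallest qualifying group size.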
Birthday Problem (cont’d)
Answer ii).
• p2(n) = P(At least 1 same birthday)
= 1 − P(No same birthdays), so
\[ p_2(n) = 1 - \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}. \]
• Using a first-order approximation for $e^x$ for $x \ll 1$:
\[ e^x \approx 1 + x, \]
we have
\[ \frac{365 \times 364 \times \cdots \times (365 - n + 1)}{365^n}
   = \left(1 - \frac{1}{365}\right) \times \left(1 - \frac{2}{365}\right) \times \cdots \times \left(1 - \frac{n-1}{365}\right) \]
\[ \approx e^{-\frac{1}{365}} \times e^{-\frac{2}{365}} \times \cdots \times e^{-\frac{n-1}{365}}
   = e^{-\frac{1 + 2 + \cdots + (n-1)}{365}}
   = e^{-\frac{n(n-1)}{2 \times 365}}. \]
• So
\[ p_2(n) \approx 1 - e^{-\frac{n(n-1)}{2 \times 365}}. \]
• For n = 23, p2(23) ≈ 0.507. (Surprisingly!)
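The exact product and the exponential approximation can be compared numerically. This is a sketch under the same assumptions as the slide (365 equally likely birthdays); the function names p2_exact and p2_approx are introduced here for illustration.

```r
# Exact probability that at least two of n people share a birthday.
p2_exact <- function(n) {
  sapply(n, function(k) 1 - prod((365 - seq_len(k) + 1) / 365))
}

# First-order approximation 1 - exp(-n(n-1) / (2 * 365)).
p2_approx <- function(n) 1 - exp(-n * (n - 1) / (2 * 365))

# Side-by-side comparison for a few group sizes.
n <- c(5, 10, 20, 23, 50)
round(cbind(n, exact = p2_exact(n), approx = p2_approx(n)), 3)
```

For n = 23 the exact value is about 0.507, while the approximation gives about 0.500, so the two agree closely for small groups.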
p2 values
n       p2(n)
1       0
5       2.7%
10      11.7%
20      41.1%
23      50.7%
50      97.0%
60      99.4%
70      99.9%
100     99.99997%
366     100%
Generalization and Variation of Birthday Problem
• More generally, suppose we have N objects, where N is large. There are r people, and each chooses an object.
• Then, similarly to the above approximation,
\[ p = P(\text{there is a match}) \approx 1 - e^{-\frac{r(r-1)}{2N}} \approx 1 - e^{-\frac{r^2}{2N}}. \]
• Now if we want p ≥ 1/2, then we can choose $\frac{r^2}{2N} \ge \ln 2$, or $r \ge \sqrt{2N \ln 2} \approx 1.177\sqrt{N}$.
• If there are N possibilities and we have a list of length $\sqrt{N}$, then there is a good chance of a match: ≈ 40%.
• If we want to increase the chance of a match, we can make a list whose length is a constant times $\sqrt{N}$.
• As a variation, suppose there are N objects and there are two groups of r people. Each person from each group selects an object. What is the probability that someone from the first group chooses the same object as someone from the second group?
• $P(\text{there is a match between two groups}) \approx 1 - e^{-\frac{r^2}{N}}$. (Rather difficult!)
• E.g., if we take N = 365 and r = 30, then $P(\text{there is a match between two groups}) \approx 1 - e^{-30^2/365} = 0.915$.
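The approximations on this slide can be evaluated directly. The following sketch uses hypothetical helper names (p_match, p_match_two_groups) and only the formulas stated above.

```r
# Approximate probability of a match when r people each pick one of N objects.
p_match <- function(r, N) 1 - exp(-r^2 / (2 * N))

# Approximate probability of a match between two groups of r people each.
p_match_two_groups <- function(r, N) 1 - exp(-r^2 / N)

sqrt(2 * log(2))                # the constant in r >= 1.177 * sqrt(N)
p_match(sqrt(365), 365)         # list of length sqrt(N): roughly 40%
p_match_two_groups(30, 365)     # two groups of 30 people with N = 365
```

Note that p_match(sqrt(N), N) equals 1 - exp(-1/2) for any N, which is why the 40% figure does not depend on the number of objects.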
A birthday attack on discrete logarithm
• We want to solve $\alpha^x \equiv \beta \pmod{p}$.
• Make two lists, both of length around $\sqrt{p}$:
• 1st list: $\alpha^k \pmod{p}$ for random k.
• 2nd list: $\beta\alpha^{-h} \pmod{p}$ for random h.
• There is a good chance that there is a match:
$\alpha^k \equiv \beta\alpha^{-h} \pmod{p}$.
• Hence, x = h + k.
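The steps above can be sketched in R for a toy-sized prime. Everything specific here is an assumption made for illustration: the helper names mod_pow and birthday_dlog, the prime 1009, the base 11, the hidden exponent 345, the list length of twice the square root of p, and the retry loop when no match is found. The arithmetic uses ordinary doubles, so this is only suitable for small moduli.

```r
# Square-and-multiply modular exponentiation; adequate for small moduli only,
# since it relies on ordinary double-precision arithmetic.
mod_pow <- function(base, exponent, modulus) {
  result <- 1
  base <- base %% modulus
  while (exponent > 0) {
    if (exponent %% 2 == 1) result <- (result * base) %% modulus
    base <- (base * base) %% modulus
    exponent <- exponent %/% 2
  }
  result
}

# Birthday attack sketch: build both lists, look for a common value,
# and retry with fresh random exponents if there is none.
birthday_dlog <- function(alpha, beta, p, list_len = 2 * ceiling(sqrt(p))) {
  repeat {
    k <- sample(0:(p - 2), list_len, replace = TRUE)
    h <- sample(0:(p - 2), list_len, replace = TRUE)
    list1 <- sapply(k, function(e) mod_pow(alpha, e, p))
    # alpha^(-h) equals alpha^(p - 1 - h) modulo a prime p (Fermat's little theorem).
    list2 <- sapply(h, function(e) (beta * mod_pow(alpha, p - 1 - e, p)) %% p)
    pos <- match(list1, list2)   # position in list2 of each list1 value, or NA
    hit <- which(!is.na(pos))
    if (length(hit) > 0) {
      i <- hit[1]
      return((k[i] + h[pos[i]]) %% (p - 1))
    }
  }
}

set.seed(3)                      # arbitrary seed for repeatability
p <- 1009; alpha <- 11           # small illustrative prime and base
beta <- mod_pow(alpha, 345, p)   # pretend the exponent 345 is unknown
x <- birthday_dlog(alpha, beta, p)
mod_pow(alpha, x, p) == beta     # check that x solves the congruence
```

The returned x is reduced modulo p − 1, so it need not equal the hidden exponent; it only has to satisfy the congruence, which the final line checks.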
Formal Probability
General Addition Rule
p(E1 ∪ E2 ) = p(E1 ) + p(E2 ) − p(E1 ∩ E2 )
• If E1 ∩ E2 = ∅, the events are disjoint, which means they can’t
occur together,
• and then p(E1 ∪ E2 ) = p(E1 ) + p(E2 ).
Example
Example (1)
If you choose an integer from 1 to 100 uniformly at random, what is the probability
that it is divisible by either 2 or 5?
Short Answer:
\[ \frac{50}{100} + \frac{20}{100} - \frac{10}{100} = \frac{3}{5} \]
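The short answer can be checked by counting, both directly and through the General Addition Rule. This is an illustrative sketch; the variable names are not from the slides.

```r
x <- 1:100
div2 <- x %% 2 == 0      # E1: divisible by 2
div5 <- x %% 5 == 0      # E2: divisible by 5

# Direct count of the numbers divisible by 2 or 5.
p_direct <- mean(div2 | div5)

# General Addition Rule: p(E1) + p(E2) - p(E1 and E2).
p_rule <- mean(div2) + mean(div5) - mean(div2 & div5)

c(p_direct, p_rule)
```

Both computations give 0.6, in agreement with 3/5; the subtracted term accounts for the multiples of 10, which would otherwise be counted twice.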
Example (2)
A survey shows that about 45% of the Vietnamese population has type O
blood, 40% type A, 11% type B, and the rest type AB. What is the
probability that a blood donor has type A or type B?
Short Answer:
40% + 11% = 51% (the two blood types are disjoint events)
Conditional Probability (Xác suất có điều kiện)
• “Knowledge” changes probabilities