This article introduces probability theory and random variables. For more
detail, the book "Probability and Random Processes" by Grimmett & Stirzaker
(1982) is a good place to start.
Suppose a needle is dropped onto a floor made up of planks of wood.
The needle may or may not intersect one of the joints between the planks. A
single throw of the needle is called an experiment or trial.
There are two possible outcomes. Either the needle intersects a joint(s)
or it lands between the joints. By repeating the experiment a large number of
times, the probability P of a particular outcome or event can be estimated.
Let A be the event "needle intersects a joint" and let N(A) be the number of
occurrences of A over n trials. As n tends to infinity, N(A)/n converges to
P(A), the probability that A occurs on any particular trial. Occasionally, a
probability can be assigned to an outcome without experiment.
Buffon (1777) found this to be the case for the needle throwing
experiment. The set of all possible outcomes of an experiment is called the sample
space and is denoted by Ω. A random variable is a function X : Ω → ℝ.
Uppercase letters will be used to represent generic random
variables, whilst lowercase letters will be used to represent possible
numerical values of these variables. To describe the probability of possible
values of X, consider the following definition. The distribution function
of a random variable X is the function F_X : ℝ → [0, 1] given by
F_X(x) = P(X ≤ x).
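The frequency interpretation of the needle experiment described above, in
which N(A)/n approaches P(A) as n grows, can be illustrated with a short
simulation. This is only a sketch under stated assumptions: the needle length,
the plank width, the seed and the function names are hypothetical choices, the
needle's centre and angle are assumed to be uniformly distributed, and the
closed form 2l/(πd) is the classical result usually attributed to Buffon for a
needle no longer than the plank width (it is not stated in this article).

```python
import math
import random


def estimate_crossing_probability(n_trials, needle_length=1.0, plank_width=2.0, seed=0):
    """Estimate P(A) as N(A)/n, where A is "needle intersects a joint"."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    crossings = 0
    for _ in range(n_trials):
        # Distance from the needle's centre to the nearest joint: uniform on [0, d/2].
        centre = rng.uniform(0.0, plank_width / 2.0)
        # Acute angle between the needle and the joints: uniform on [0, pi/2].
        angle = rng.uniform(0.0, math.pi / 2.0)
        # The needle reaches the joint if its half-length, projected
        # perpendicular to the joints, covers that distance.
        if centre <= (needle_length / 2.0) * math.sin(angle):
            crossings += 1
    return crossings / n_trials


def exact_crossing_probability(needle_length=1.0, plank_width=2.0):
    """Classical closed form 2l / (pi d), valid when l <= d."""
    return 2.0 * needle_length / (math.pi * plank_width)


for n in (100, 10_000, 100_000):
    print(n, estimate_crossing_probability(n))
print("closed form", exact_crossing_probability())
```

As n increases, the printed ratios should settle near the closed-form value,
which is the sense in which a probability can here be assigned without
experiment.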
Discrete random variables
The random variable X is discrete if it takes values only in some
countable subset {x_1, x_2, …} of ℝ. The distribution function of such a
random variable has jump discontinuities at the values x_1, x_2, … and is
constant in between. The function f_X : ℝ → [0, 1] given by
f_X(x) = P(X = x) is called the (probability) mass function
of X. The mean value, or expectation, or expected value
of X with mass function f_X, is defined to be
\[ E(X) = \sum_{x} x f_X(x) \tag{1} \]
The expected value of X is often written as μ.
It is often of great interest to measure the extent to which a
random variable X is dispersed. The variance of X, Var(X),
is defined as follows:
\[ \mathrm{Var}(X) = E\bigl( (X - E(X))^2 \bigr) \tag{2} \]
The variance of X is often written as σ²,
while its positive square root is called the standard deviation. Since X
is discrete, (2) can be re-expressed accordingly, with μ = E(X):
\[ \mathrm{Var}(X) = \sum_{x} (x - \mu)^2 f_X(x) \tag{3} \]
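In the special case where the mass function f_X(x)
is constant and X takes n real values x_1, …, x_n, (3) reduces to a
well-known equation determining the variance of a set of n numbers:
\[ \sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \mu)^2, \qquad \mu = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{4} \]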
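The reduction to the variance of a set of n numbers can be checked
numerically. The sketch below weights each squared deviation from the mean by
the mass function, as in the discrete form of the variance, and compares the
result for a constant mass function 1/n with the population variance from
Python's standard library. The data values and the function name are
hypothetical, chosen only for illustration.

```python
import statistics


def variance_from_mass_function(values, probabilities):
    """Sum of (x - mu)^2 * f(x) over the values, where mu is the sum of x * f(x)."""
    mean = sum(x * p for x, p in zip(values, probabilities))
    return sum((x - mean) ** 2 * p for x, p in zip(values, probabilities))


values = [1.0, 2.0, 4.0, 7.0]   # hypothetical data: the n values taken by X
n = len(values)
constant_mass = [1.0 / n] * n   # constant mass function, f(x) = 1/n

weighted_variance = variance_from_mass_function(values, constant_mass)
population_variance = statistics.pvariance(values)  # (1/n) * sum of (x_i - mu)^2

print(weighted_variance, population_variance)
```

The two printed numbers should agree, since dividing by n is the same as
weighting every value by 1/n.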
Events A and B are said to be independent if
and only if the incidence of A does not change the probability of B
occurring. An equivalent statement is P(A ∩ B) = P(A)P(B). Similarly, the
discrete random variables X and Y are called independent if the
numerical value of X does not affect the distribution of Y. In
other words, the events {X = x} and {Y = y} are
independent for all x and y. The joint distribution function
F_{X,Y} : ℝ² → [0, 1] of X and Y is given by
F_{X,Y}(x, y) = P(X ≤ x and Y ≤ y). Their joint mass function
f_{X,Y} : ℝ² → [0, 1] is given by
f_{X,Y}(x, y) = P(X = x and Y = y). X and Y are independent if
and only if f_{X,Y}(x, y) = f_X(x)f_Y(y) for all x, y ∈ ℝ.
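The factorisation criterion for independence can be tested mechanically when
the joint mass function is a finite table. The following sketch sums the joint
mass function to obtain the two marginals and then compares every entry with
the product of the marginals. The two example tables (a pair of fair dice, and
a fair coin whose value is copied into Y) and all function names are
hypothetical illustrations, not taken from the article.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the mass functions of X and Y obtained by summing the joint table."""
    f_x, f_y = {}, {}
    for (x, y), p in joint.items():
        f_x[x] = f_x.get(x, 0) + p
        f_y[y] = f_y.get(y, 0) + p
    return f_x, f_y


def is_independent(joint):
    """True if joint(x, y) == f_X(x) * f_Y(y) for every pair of values."""
    f_x, f_y = marginals(joint)
    return all(
        joint.get((x, y), 0) == f_x[x] * f_y[y]
        for x, y in product(f_x, f_y)
    )


# Hypothetical example 1: two fair dice, every pair equally likely.
two_dice = {(x, y): Fraction(1, 36) for x, y in product(range(1, 7), repeat=2)}

# Hypothetical example 2: X is a fair coin (0 or 1) and Y is a copy of X.
copied_coin = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(two_dice))
print(is_independent(copied_coin))
```

Exact fractions are used so that the equality test is not affected by
floating-point rounding.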
Consider an archer, shooting arrows at the target shown in Figure
1. Suppose the archer is a very poor shot and hits the target randomly - in
other words, target regions of equal area will have the same probability of
being hit. For simplicity, it is assumed the archer always hits the target. If
the archer is allowed to fire two arrows, the sample space
is Ω = { AA, AB, AC, AD, AE, BA, BB, …,
DD, DE, EA, EB, EC, ED, EE }.

Figure 1: An archery target. A hit in region A scores 4
points, B scores 3 points, C scores 2 points, D scores 1 point and E scores
nothing.
Let the variable X(ω) represent
the score of a particular outcome. The scoring guidelines outlined in Figure 1
imply
X(AA) = 8, X(AB) = X(BA) = 7, X(AC) = X(BB) = X(CA) = 6, …,
X(CE) = X(DD) = X(EC) = 2, X(DE) = X(ED) = 1, X(EE) = 0.
Clearly X is a discrete random variable, mapping the sample
space Ω to scores (real numbers).
The probability that an arrow hits a target region is directly
proportional to the area of the region. The regions A to E are annuli with
inner and outer radii as shown in Figure 1. The probabilities of hitting A to E
are 1/25, 3/25, 5/25, 7/25 and 9/25 respectively. The mass function of X,
f_X(x), is then
f_X(0) = P(X = 0) = P(Hit E)P(Hit E) = 81/625,
f_X(1) = P(X = 1) = 2 · P(Hit D)P(Hit E) = 126/625,
f_X(2) = P(X = 2) = 2 · P(Hit C)P(Hit E) + P(Hit D)P(Hit D) = 139/625
and so on.
From (1), the expected value of X is E(X) =
0 · 81/625 + 1 · 126/625 + 2 · 139/625 + 3 · 124/625 + … = 2.4. From (3),
the variance of X is Var(X) = (2.4)² · 81/625 + (1.4)² · 126/625
+ (0.4)² · 139/625 + (0.6)² · 124/625 + … = 2.72. The
distribution function of X, F_X(x), is then
F_X(0) = P(X ≤ 0) = f_X(0),
F_X(1) = P(X ≤ 1) = f_X(1) + f_X(0),
F_X(2) = P(X ≤ 2) = f_X(2) + f_X(1) + f_X(0)
and so on. The distribution function F_X(x) is shown
in Figure 2.

Figure 2: The distribution function F_X of X
for the archery target.
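The archery calculations can be reproduced by enumerating the 25 outcomes in
the sample space. The sketch below uses the scores and hit probabilities
stated in the text and, as in the worked values of the mass function, treats
the two arrows as independent. The function and variable names are
hypothetical; exact fractions are used so the results can be compared
directly with the figures quoted above.

```python
from fractions import Fraction
from itertools import product

# Score and hit probability for each region, as given in the text.
SCORE = {"A": 4, "B": 3, "C": 2, "D": 1, "E": 0}
P_HIT = {
    "A": Fraction(1, 25),
    "B": Fraction(3, 25),
    "C": Fraction(5, 25),
    "D": Fraction(7, 25),
    "E": Fraction(9, 25),
}


def mass_function():
    """Mass function of the total score X over two independent arrows."""
    f = {}
    for first, second in product(SCORE, repeat=2):
        x = SCORE[first] + SCORE[second]
        f[x] = f.get(x, Fraction(0)) + P_HIT[first] * P_HIT[second]
    return dict(sorted(f.items()))  # keys in increasing score order


def expectation(f):
    """Sum of x * f(x) over all scores."""
    return sum(x * p for x, p in f.items())


def variance(f):
    """Sum of (x - mu)^2 * f(x) over all scores."""
    mu = expectation(f)
    return sum((x - mu) ** 2 * p for x, p in f.items())


def distribution_function(f):
    """Running total of the mass function, evaluated at each possible score."""
    F, total = {}, Fraction(0)
    for x, p in f.items():  # relies on f being ordered by increasing score
        total += p
        F[x] = total
    return F


f_X = mass_function()
F_X = distribution_function(f_X)
for x in f_X:
    print(x, f_X[x], F_X[x])
print("E(X) =", float(expectation(f_X)), "Var(X) =", float(variance(f_X)))
```

The enumeration gives 81/625, 126/625, 139/625 and 124/625 for scores 0 to 3,
with E(X) = 12/5 = 2.4 and Var(X) = 68/25 = 2.72, in agreement with the
values in the text.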
Continuous random variables
The random variable X is continuous if its
distribution function can be expressed as
\[ F_X(x) = \int_{-\infty}^{x} f_X(u) \, du \tag{5} \]
for some integrable function f_X : ℝ → [0, ∞). In this case, f_X
is called the (probability) density function of X. The
fundamental theorem of calculus and (5) imply
\[ f_X(x) = \frac{d}{dx} F_X(x) \]
wherever F_X is differentiable.
The quantity f_X(x)dx can be
thought of as the element of probability P(x ≤ X ≤ x + dx),
where
\[ P(x \le X \le x + dx) = F_X(x + dx) - F_X(x) \approx f_X(x) \, dx \tag{6} \]
If B_1 is a measurable subset of ℝ
(such as a line segment or union of line segments) then
\[ P(X \in B_1) = \int_{B_1} f_X(x) \, dx \tag{7} \]
where P(X ∈ B_1)
is the probability that the outcome of this random choice lies in B_1.
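The expected value (or expectation) of X with density function f_X
is
\[ E(X) = \int_{-\infty}^{\infty} x f_X(x) \, dx \tag{8} \]
whenever this integral exists. The variance of X, Var(X), is defined by
the already familiar (2). Since X is
continuous, (2) can be re-expressed accordingly, with μ = E(X):
\[ \mathrm{Var}(X) = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x) \, dx \tag{9} \]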
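The joint distribution function of the continuous random
variables X and Y is the function F_{X,Y} : ℝ² → [0, 1] given by
F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y). X and Y are (jointly) continuous
with joint (probability) density function f_{X,Y} : ℝ² → [0, ∞) if
\[ F_{X,Y}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{X,Y}(u, v) \, du \, dv \]
for each x, y ∈ ℝ. The fundamental theorem of calculus suggests the following
result:
\[ f_{X,Y}(x, y) = \frac{\partial^2}{\partial x \, \partial y} F_{X,Y}(x, y) \]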
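The quantity f_{X,Y}(x, y)dxdy
can be thought of as the element of probability
P(x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy), where
\[ P(x \le X \le x + dx, \; y \le Y \le y + dy) \approx f_{X,Y}(x, y) \, dx \, dy \tag{10} \]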
If B_2 is a measurable subset of ℝ²
(such as a rectangle or union of rectangles and so on) then
\[ P((X, Y) \in B_2) = \iint_{B_2} f_{X,Y}(x, y) \, dx \, dy \tag{11} \]
where P((X, Y) ∈ B_2)
is the probability that the outcome of this random choice lies in B_2.
X and Y are independent if and only if {X ≤ x} and {Y ≤ y} are
independent events for all x, y ∈ ℝ. If X and Y are independent,
F_{X,Y}(x, y) = F_X(x)F_Y(y) for all x, y ∈ ℝ.
An equivalent condition is f_{X,Y}(x, y) = f_X(x)f_Y(y)
whenever F_{X,Y} is differentiable at (x, y).
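One direction of this equivalence can be seen by differentiating the product
form of the joint distribution function. The short derivation below is a
sketch that assumes the marginal distribution functions are differentiable at
x and y respectively, and uses subscripted symbols for the marginal and joint
distribution and density functions.

```latex
\begin{align*}
f_{X,Y}(x, y)
  &= \frac{\partial^2}{\partial x \, \partial y} F_{X,Y}(x, y) \\
  &= \frac{\partial^2}{\partial x \, \partial y} \bigl[ F_X(x) \, F_Y(y) \bigr]
     && \text{(independence)} \\
  &= \frac{dF_X}{dx}(x) \, \frac{dF_Y}{dy}(y) \\
  &= f_X(x) \, f_Y(y).
\end{align*}
```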
An example of a continuous random variable can be found in the
needle-throwing experiment described earlier. A needle is thrown onto the
floor and lands with random angle ω relative to some fixed axis. The
sample space is Ω = [0, 2π).
The angle ω is equally likely to lie anywhere in the real interval
[0, 2π). Therefore, the probability that the angle
lies in some interval is directly proportional to the length of the interval.
Consider the continuous random variable X(ω) = ω. The distribution function
of X, shown graphically in Figure 3, is
F_X(0) = P(X ≤ 0) = 0,
F_X(x) = P(X ≤ x) = x/(2π), (0 ≤ x < 2π)
F_X(2π) = P(X ≤ 2π) = 1.
The density function f_X of X is as follows:
\[ f_X(x) = \begin{cases} 1/(2\pi), & 0 \le x < 2\pi, \\ 0, & \text{otherwise.} \end{cases} \]

Figure 3: The distribution function F_X of X
for the needle.
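The continuous definitions can be tried out on the needle angle. The sketch
below takes the angle to be uniform on [0, 2π), so its density is 1/(2π) on
that interval and zero elsewhere, and approximates the integrals for the
expected value and the variance with a simple midpoint rule. The function
names, the step count and the choice of integration rule are assumptions made
for illustration. The values π for the mean and π²/3 for the variance are the
standard results for a uniform distribution on this interval; they are not
quoted in the article.

```python
import math

TWO_PI = 2.0 * math.pi


def density(x):
    """Density of the needle angle: constant on [0, 2*pi), zero elsewhere."""
    return 1.0 / TWO_PI if 0.0 <= x < TWO_PI else 0.0


def distribution(x):
    """Distribution function of the needle angle."""
    if x < 0.0:
        return 0.0
    if x >= TWO_PI:
        return 1.0
    return x / TWO_PI


def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))


# The density vanishes outside [0, 2*pi), so the integrals over the whole
# real line reduce to integrals over that interval.
mean = integrate(lambda x: x * density(x), 0.0, TWO_PI)
var = integrate(lambda x: (x - mean) ** 2 * density(x), 0.0, TWO_PI)

print("E(X)   ~", mean, " (pi     =", math.pi, ")")
print("Var(X) ~", var, " (pi^2/3 =", math.pi ** 2 / 3.0, ")")

# Probability that the angle falls in the first quadrant, [0, pi/2].
print("P(0 <= X <= pi/2) =", distribution(math.pi / 2.0) - distribution(0.0))
```

The last line illustrates that the probability of an interval depends only on
its length: a quarter of the full circle carries a quarter of the probability.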
References
BUFFON, G. L. L. Comte de. Essai d'Arithmétique
Morale. In: Supplément à l'Histoire Naturelle, v. 4. Paris: Imprimerie Royale
(1777).
GRIMMETT, G. and STIRZAKER, D. Probability and
Random Processes, Clarendon Press, Oxford (1982).