Let’s integrate slash do probability the 20th century way! Part 1.

Pre(r)amble

After half-a-dozen blog posts that went nowhere, which is something considering the drivel I have posted, I have something!

In a month and a half, I take my prelim exam in financial mathematics. For the more finance-leaning part I’ve been reviewing Shreve’s Stochastic Calculus for Finance series. For the heavier math part I’ve been reviewing Øksendal’s Stochastic Differential Equations. Part of that prep has been doing as many exercises as I can.

I’m going to present one of those exercises along with some requisite background. The requisites will be presented in a fairly glossy manner if only because you can (and people do) devote an entire class(es) to this subject.

Warning. Math involved. I’m trying to write as simply as possible without breaking down every single definition, so some mathematical maturity is required. But if something is still unclear, then in all likelihood I’ve done goofed and you should correct me (grammar and spelling too please!).

Probability

Probability is strange. Or really, the discourse and history of probability is strange. To see how strange, just look at one of the few paradoxes attributed to a Bertrand who was not Russell. Please, read it. It’s disturbing. The same problem (what percentage of darts will land in a small circle in the middle), solved three different ways, producing three different results. That’s unsettling.

If you hang around social/medical scientists or statisticians, after three drinks you’ll be sucked into a debate between frequentism and Bayesianism. This is not intellectual masturbation. It’s interesting if you have my perverse tastes and it has a major impact on academic publishing.

But these are all beyond the scope of what will already be a lengthy conversation.

At the end of the post, we should be able to define a random variable in a rigorous fashion; one where we can start actually doing probability.

We don’t need to pick a side or argue. We rise above. Enter Kolmogorov. He defines a probability space as having three components and denoted by the triple $(\Omega,\mathcal{F},\mathbb{P})$.

$\Omega$ is simply a set of objects called the sample space. To be concrete, if we wanted to talk about throwing a pair of dice, then $\Omega_1 = \{(1,1),(1,2),(1,3),\ldots,(5,5),(5,6),(6,6)\}$; i.e., $\Omega_1$ is the set of all possible dice throws. Or if we were talking about an infinite sequence of coin tosses, then $\Omega_2=\{x_1 x_2 x_3 \cdots : x_i \in \{H,T\}\}$. Thus the sequence of all heads, $HHHH\cdots$, is in $\Omega_2$, as is the sequence of alternating heads and tails, $HTHTHTHT\cdots$. As is the sequence where the $n$-th entry is heads if $n$ is prime and tails otherwise. We can also have a more mundane space like $\Omega_3 =\mathbb{R}$.

$\mathcal{F}$ is called a $\sigma$-algebra over $\Omega$. That is, $\mathcal{F}$ is a family of subsets of $\Omega$; i.e., if $A \in \mathcal{F}$ then $A \subseteq \Omega$. It’s not very difficult to be a $\sigma$-algebra, but to be one $\mathcal{F}$ has to have certain properties: (1) $\emptyset \in \mathcal{F}$, (2) $\Omega \in \mathcal{F}$, (3) if $E \in \mathcal{F}$ then $E^c \in \mathcal{F}$, and (4) if $A_1,A_2,A_3,\ldots \in \mathcal{F}$ then $\cup_{i=1}^\infty A_i \in \mathcal{F}$. We call an element $A$ in $\mathcal{F}$ an event. We should also note there is no requirement that every subset of $\Omega$ be an event.
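
To make the four properties concrete, here is a minimal Python sketch that checks them on a finite sample space. The function name `is_sigma_algebra` and the toy families `trivial`, `coarse`, and `broken` are made up for illustration. On a finite family, checking pairwise unions is enough to get closure under countable unions.

```python
from itertools import combinations

def is_sigma_algebra(omega, family):
    """Check the sigma-algebra properties for a family of subsets of a finite omega."""
    omega = frozenset(omega)
    family = {frozenset(s) for s in family}
    # (1) and (2): the empty set and the whole space must be events
    if frozenset() not in family or omega not in family:
        return False
    # (3): closed under complements
    if any(omega - a not in family for a in family):
        return False
    # (4): closed under unions (pairwise suffices because the family is finite)
    return all(a | b in family for a, b in combinations(family, 2))

omega = {1, 2, 3, 4}
trivial = [set(), omega]                  # the smallest possible sigma-algebra
coarse = [set(), {1, 2}, {3, 4}, omega]   # {1} is a subset of omega but not an event
broken = [set(), {1}, omega]              # fails: the complement {2, 3, 4} is missing
```

Here `trivial` and `coarse` pass the check and `broken` does not. The `coarse` family also illustrates the last remark above: it is a perfectly good $\sigma$-algebra even though most subsets of $\Omega$ are not events in it.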

Harkening back to our previously defined sample spaces, $\{(i,j) \in \Omega_1 : i+j=7 \}$ is an event in $\Omega_1$; i.e., the event where the dice sum to 7. And $\{ Hx_2x_3x_4\cdots : x_i \in \{H,T\}\}$ is the set of all sequences where the first toss comes up heads, which is an event in $\Omega_2$.

If we have $\Omega = \mathbb{R}$ then we usually consider the Borel $\sigma$-algebra, which we construct by saying we want all the open sets of $\mathbb{R}$ to be in $\mathcal{F}$ (and consequently all closed sets are also in $\mathcal{F}$), along with all countable unions, complements, etc. We usually denote the Borel $\sigma$-algebra by $\mathcal{B}$. The way we construct $\mathcal{B}$, it is the “smallest” $\sigma$-algebra that contains all the open sets of $\mathbb{R}$. Note, we can construct a subset $A$ of the real numbers that is not in $\mathcal{B}$. So there are subsets of $\mathbb{R}$ that are not events. We can construct them, but they’re so odd that if you’re thinking of a subset of the reals, what’s in your head is generally also in $\mathcal{B}$. We will talk more about $\mathcal{B}$ when we actually get to random variables.

Now $\mathbb{P}$ is a function from $\mathcal{F}$ to the interval $[0,1]$. To each event $A$, $\mathbb{P}$ assigns a value $\mathbb{P}(A)$ between 0 and 1. In addition, $\mathbb{P}$ must satisfy three properties: $\mathbb{P}(\emptyset)=0$, $\mathbb{P}(\Omega)=1$, and if $A_1,A_2,\ldots$ are mutually disjoint ($A_i \cap A_j = \emptyset$ when $i \neq j$), then $\mathbb{P}(\cup_{i=1}^\infty A_i)= \sum_{i=1}^\infty \mathbb{P}(A_i)$. This is an intuitive set of requirements: the probability of nothing happening is 0, the probability of something happening is 1, and the probability of a union of disjoint events is the sum of the probabilities of each event.
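
As a sanity check, here is a small Python sketch of one possible $\mathbb{P}$ for the dice. The assumptions are mine: I use ordered pairs of two fair dice, take every subset to be an event, and weight every outcome equally. The names `prob`, `sum_is_7`, and `sum_is_11` are just for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumption: ordered pairs of two fair dice, and every subset is an event
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """Uniform measure on omega: P(A) = |A| / |Omega|."""
    return Fraction(len(set(event) & omega), len(omega))

sum_is_7 = {(i, j) for (i, j) in omega if i + j == 7}
sum_is_11 = {(i, j) for (i, j) in omega if i + j == 11}

# The two events are disjoint, so their probabilities should add
both = prob(sum_is_7 | sum_is_11)
separately = prob(sum_is_7) + prob(sum_is_11)
```

With this choice, `prob(set())` is 0, `prob(omega)` is 1, and `both` equals `separately`, which is the additivity requirement in the finite case. A different choice of weights (loaded dice, say) would give an equally valid $\mathbb{P}$.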

Going back to our coin toss example ($\Omega_2$), it makes sense to assign probability 1/2 to the event where the first toss is heads and probability 1/2 to the event where the first toss is tails. We can also assign probability 1/4 to each of the events where the sequence starts $HH$, $HT$, $TH$, or $TT$; probability 1/8 to each of the events where the first three tosses are $HHH$, $HHT$, $HTH$, $HTT$, $THH$, $THT$, $TTH$, or $TTT$; and so on and so forth. By this train of assignments, it’s not hard to show $\mathbb{P} ( \{HHHHHHH \cdots \})=0$. So there are non-empty events that have probability (or measure if we get ahead of ourselves) of 0.
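
One way to fill in the “not hard to show” step, as a sketch: let $A_n$ (a name I’m introducing here) be the event that the first $n$ tosses are all heads, so the assignments above give $\mathbb{P}(A_n) = 2^{-n}$. The all-heads sequence belongs to every $A_n$, and additivity implies $\mathbb{P}(A) \le \mathbb{P}(B)$ whenever $A \subseteq B$ are both events, so

$$0 \le \mathbb{P}(\{HHHH\cdots\}) \le \mathbb{P}(A_n) = 2^{-n} \quad \text{for every } n \ge 1.$$

Letting $n \to \infty$ squeezes the probability to 0. That $\{HHHH\cdots\}$ is an event at all follows, assuming each $A_n$ is in $\mathcal{F}$, from $\{HHHH\cdots\} = \cap_{n=1}^\infty A_n$ and closure under complements and countable unions.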

This is technical, but it sidesteps a lot of non-technical arguments. What’s the probability of something happening? It’s simply the number which $\mathbb{P}$ assigns to an event $A \in \mathcal{F}$. It’s boring, but we can work with it, and not have to talk about the frequency of an infinite number of events, estimates and beliefs, and it even solves the above linked paradox. In all three cases, the set of events is different, so of course there should be different answers. The downside is we have to choose what we call events and choose the probabilities assigned to them. There’s no reason to think our choices and assignments are natural or even appropriate. Kolmogorov doesn’t debate, he just calculates numbers and pushes symbols. He’s dull like that.

Finally, we can define a random variable. It’s simply a function $X$ from our sample space $\Omega$ to the real numbers $\mathbb{R}$. We can actually be more general and say $X:\Omega \to \mathbb{R}^n$ but we won’t need to think about multiple dimensions for our purposes. We also need $X$ to be $\mathcal{F}$-measurable. That means if we take a set $B$ from our Borel $\sigma$-algebra $\mathcal{B}$, then the set $\{ \omega \in \Omega: X(\omega) \in B\}$ is an element of $\mathcal{F}$. Or more succinctly, $X^{-1}(B) \in \mathcal{F}$ for all Borel sets $B \in \mathcal{B}$.
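
Measurability is easier to see on a toy example. Below is a minimal Python sketch on a four-point sample space with a deliberately coarse $\sigma$-algebra. The names `preimage`, `is_measurable`, `first_half`, and `identity` are mine. The shortcut of only checking single values works because the space is finite and the family is assumed to already be a $\sigma$-algebra: every preimage $X^{-1}(B)$ is then a finite union of preimages of single values.

```python
def preimage(X, omega, B):
    """Return {w in omega : X(w) in B} as a frozenset."""
    return frozenset(w for w in omega if X(w) in B)

def is_measurable(X, omega, family):
    """Check F-measurability of X on a finite omega.

    Assumes `family` is already a sigma-algebra, so it is enough to check
    that the preimage of each single value of X is an event.
    """
    family = {frozenset(s) for s in family}
    values = {X(w) for w in omega}
    return all(preimage(X, omega, {v}) in family for v in values)

omega = {1, 2, 3, 4}
coarse = [set(), {1, 2}, {3, 4}, omega]

def first_half(w):
    # Only depends on whether w is in {1, 2} or {3, 4}
    return 1 if w <= 2 else 0

def identity(w):
    # Distinguishes every outcome, which the coarse family cannot do
    return w
```

Here `first_half` is measurable with respect to `coarse`, but `identity` is not, since the preimage of `{1}` is `{1}`, which is not an event. Same sample space, same function machinery, and whether something counts as a random variable depends entirely on which $\mathcal{F}$ we chose.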

Next for Me

Now we know the modern formulation of probability, and have defined random variables in a way that avoids any pitfalls. In the next post we will show how to do basic things like compute the expected value of a continuous random variable (I don’t care about discrete random variables because of a nifty theorem I’ll talk about in the next post). To do this, we will introduce Lebesgue integration, which is an extension of the integration you learned in high school. To justify this complexity we will have to see where the integration taught at the high school/undergraduate level, Riemann–Stieltjes and Darboux integration, breaks down.

Next for You

I don’t know. It’s Friday. Maybe grab a drink. See some shitty pop-punk band. Do key bumps in the restroom until you get kicked out. Yell, “Well fuck this place. I don’t even want to be here.” Go home. Netflix. More blow. Complain about the plot holes in The Flash to everyone around the mirrored table. Gum the leftovers your amateur friend couldn’t snort off the mirror. It’s a wide open world.

Postscript

As always, corrections and suggestions are appreciated. And as always “Max, stop being a [racist/sexist/homophobic] epithet,” is not a useful comment or suggestion.
Cheers,
-M