Mathematical Statistics Problem #1
motivation
Through outlets like #HANIC, the NFL’s Big Data Bowl, Pro Football Focus, and many others, I’ve been exposed to a lot of great sports analyses that utilize advanced statistical modeling. I want to get on that level, which means that I need to shore up my knowledge of statistical theory (I’m the type of person who needs to understand what’s happening “under the hood”). To that end, I’ve decided to work through some problems in mathematical statistics. I’m going to post my solutions to any problems that I find particularly interesting/tricky in order to:
a) give people opportunities to correct me if/when I make mistakes, and
b) keep me honest in trekking through this material
So without further ado, let’s check out a problem:
problem statement
If \(Y\) is a discrete random variable that assigns positive probabilities to only the positive integers, show that \[E(Y) = \sum_{k=1}^{\infty} P(Y \geq k).\]
solution
We’ll start by getting a better idea of \(Y\)’s probability distribution. From the problem statement, we know that \(Y\) has positive probability only at positive integers, so its support is contained in \(\mathbb{Z}^+\). To keep the bookkeeping simple, we’ll define the probability mass function (pmf) on the set of all integers (\(\mathcal{D}_Y = \mathbb{Z}\)) and set it to zero everywhere outside the positive integers: \[ P(Y=y) = p_Y(y) = \begin{cases} p_y, & \textrm{for } y \in \mathbb{Z}^+ \\ 0, & \textrm{for } y \in \mathbb{Z}^-\cup\{0\} \end{cases} \] where \(p_y \geq 0\) and \(\sum_{y=1}^{\infty} p_y = 1\).
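We can visually represent the pmf using a table:
\[
\begin{array}{c|ccccccccc}
y & \dots & -2 & -1 & 0 & 1 & 2 & 3 & 4 & \dots \\ \hline
p_Y(y) & \dots & 0 & 0 & 0 & p_1 & p_2 & p_3 & p_4 & \dots
\end{array}
\]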
Moving on, let’s first look at the left side of the equation.
The expected value of a discrete random variable, \(Y\), is defined as \(E[Y] = \sum\limits_{y \in \mathcal{D}_Y} y \cdot p_Y(y)\).
So, for \(Y\):
\[
\begin{aligned}
E[Y] &= \sum\limits_{i \in \mathbb{Z}} i \cdot p_Y(i) \\
&= \dots + (-2)(0) + (-1)(0) + (0)(0) + (1)(p_1) + (2)(p_2) + (3)(p_3) + (4)(p_4) + \dots \\
&= \sum\limits_{j \in \mathbb{Z}^- \cup \{0\}} j \cdot 0 + \sum\limits_{k \in \mathbb{Z}^+} k \cdot p_k \\
&= \sum\limits_{k = 1}^{\infty} k \cdot p_k
\end{aligned}
\]
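As a quick sanity check on this formula, take a fair six-sided die (a hypothetical example of my own choosing, not part of the problem), so that \(p_k = \frac{1}{6}\) for \(k = 1, \dots, 6\) and \(p_k = 0\) otherwise:
\[
E[Y] = \sum\limits_{k=1}^{6} k \cdot \frac{1}{6} = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5
\]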
Now, let’s pivot to the right side of the equation.
\(\sum\limits_{k=1}^{\infty} P(Y \geq k)\) is harder to read at a glance, so let’s expand it term by term:
\[
\begin{aligned}
\sum_{k=1}^{\infty} P(Y \geq k) &= P(Y \geq 1) + P(Y \geq 2) + P(Y \geq 3) + P(Y \geq 4) + \dots \\
&= [P(Y=1) + P(Y=2) + P(Y=3) + P(Y=4) + \dots] \\
&\quad + [P(Y=2) + P(Y=3) + P(Y=4) + \dots] \\
&\quad + [P(Y=3) + P(Y=4) + \dots] \\
&\quad + [P(Y=4) + \dots] + \dots \\
&= [p_1 + p_2 + p_3 + p_4 + \dots] + [p_2 + p_3 + p_4 + \dots] + [p_3 + p_4 + \dots] + [p_4 + \dots] + \dots \\
&= (1)(p_1) + (2)(p_2) + (3)(p_3) + (4)(p_4) + \dots \\
&= \sum\limits_{k = 1}^{\infty} k \cdot p_k
\end{aligned}
\]
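The term-collecting step in the expansion above can also be written more compactly as a swap in the order of a double sum. This is just a sketch of the same argument in different notation; the index \(j\) is a helper symbol I’m introducing here. Since \(P(Y \geq k) = \sum_{j=k}^{\infty} p_j\), and the pairs \((k, j)\) with \(1 \leq k \leq j\) can be enumerated either by \(k\) first or by \(j\) first:
\[
\begin{aligned}
\sum_{k=1}^{\infty} P(Y \geq k) &= \sum_{k=1}^{\infty} \sum_{j=k}^{\infty} p_j \\
&= \sum_{j=1}^{\infty} \sum_{k=1}^{j} p_j \\
&= \sum_{j=1}^{\infty} j \cdot p_j
\end{aligned}
\]
Swapping the order of summation is allowed here because every \(p_j\) is nonnegative.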
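When collecting terms in the bracketed expansion, each \(p_k\) appears exactly once in each of the first \(k\) brackets (the ones for \(P(Y \geq 1), \dots, P(Y \geq k)\)), which is where the coefficient \(k\) comes from. Because every term is nonnegative, regrouping the terms this way doesn’t change the value of the sum. Both sides of the equation therefore simplify to the same summation, \(\sum_{k=1}^{\infty} k \cdot p_k\), which proves that \(E[Y] = \sum_{k=1}^{\infty} P(Y \geq k)\).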
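To double-check the result numerically, here’s a minimal Python sketch. The pmf is a made-up example with finite support (so the infinite sums stop after finitely many terms), and the helper names `expected_value` and `tail_sum` are just my own illustrative choices. I’m using `fractions.Fraction` so the comparison is exact rather than subject to floating-point rounding.

```python
from fractions import Fraction


def expected_value(pmf):
    """Left side: E[Y] = sum of k * p_k over the support."""
    return sum(k * p for k, p in pmf.items())


def tail_sum(pmf):
    """Right side: sum of P(Y >= k) for k = 1, ..., max(support).

    Terms beyond the largest support point are zero, so we can stop there.
    """
    largest = max(pmf)
    return sum(
        sum(p for j, p in pmf.items() if j >= k)  # P(Y >= k)
        for k in range(1, largest + 1)
    )


# Hypothetical pmf on the positive integers (an assumption for illustration):
# P(Y=1) = 1/2, P(Y=2) = 1/3, P(Y=5) = 1/6, and zero everywhere else.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 3), 5: Fraction(1, 6)}
assert sum(pmf.values()) == 1  # sanity check: probabilities sum to one

print(expected_value(pmf), tail_sum(pmf))
```

For this pmf, both sides come out to exactly 2: the left side is \(\frac{1}{2} + \frac{2}{3} + \frac{5}{6}\), and the right side is \(1 + \frac{1}{2} + \frac{1}{6} + \frac{1}{6} + \frac{1}{6}\). A check like this doesn’t replace the proof, but it’s a handy way to catch an algebra slip.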