Sheet 1

Prof. Leif Döring, Felix Benning
Course: Wahrscheinlichkeitstheorie 1 · Semester: FSS 2022 · Tutorial date: 21.02.2022 · Due: 10:15 in the exercise on Monday, 21.02.2022
Exercise 1 (Properties of Conditional Expectation).

Let X and Y be two integrable random variables, and let 𝒢 ⊆ ℱ be σ-algebras.

  (i)

    Prove that 𝔼[𝔼[X|ℱ]] = 𝔼[X] and 𝔼[1|ℱ] = 1 a.s.

    Solution.

    Using the second defining property of conditional expectation with the set Ω we have

    𝔼[𝔼[X|ℱ]] = 𝔼[𝟙_Ω 𝔼[X|ℱ]] = 𝔼[𝟙_Ω X] = 𝔼[X].

    For the second claim we simply check the definition of conditional expectation for the constant function 1. Since constants are measurable with respect to any σ-algebra, the first requirement is immediate, and the second one is trivial. ∎

  (ii)

    Prove that 𝔼[𝔼[X|ℱ]|𝒢] = 𝔼[𝔼[X|𝒢]|ℱ] = 𝔼[X|𝒢] a.s.

    Solution.

    Since 𝔼[X|𝒢] is by definition 𝒢-measurable and 𝒢 ⊆ ℱ, it is also ℱ-measurable. We therefore have

    𝔼[𝔼[X|𝒢]|ℱ] = 𝔼[X|𝒢] · 𝔼[1|ℱ] = 𝔼[X|𝒢] a.s., where the first equality pulls the ℱ-measurable factor 𝔼[X|𝒢] out of the conditional expectation and the second uses (i).

    To show that 𝔼[𝔼[X|ℱ]|𝒢] = 𝔼[X|𝒢], we show that 𝔼[X|𝒢] fulfils the defining requirements of the conditional expectation of 𝔼[X|ℱ] given 𝒢. The 𝒢-measurability is obvious; what is left to show is that for any A ∈ 𝒢 we have

    𝔼[𝟙_A 𝔼[X|ℱ]] = 𝔼[𝟙_A X] = 𝔼[𝟙_A 𝔼[X|𝒢]],

    using A ∈ 𝒢 ⊆ ℱ for the first equality and A ∈ 𝒢 for the second. ∎ (A finite numerical check of (i) and (ii) is sketched after part (iii).)
  (iii)

    Assume that X=Y almost surely. Prove that, almost surely, 𝔼[X|Y]=Y and 𝔼[Y|X]=X.

    Solution.

    Due to symmetry it is sufficient to prove 𝔼[X|Y] = Y. To show this we simply check the defining properties of 𝔼[X|Y]: first, Y is σ(Y)-measurable by definition, and second, for all A ∈ σ(Y) we have

    𝔼[X𝟙_A] = 𝔼[Y𝟙_A], since X = Y almost surely. ∎
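The following Python sketch is not part of the sheet; it verifies (i) and (ii) exactly on a small finite probability space, where a conditional expectation is just the weighted average over the blocks of a generating partition. The sample space, the weights and the random variable X are arbitrary choices.

```python
from fractions import Fraction

# Finite probability space Omega = {(x, w, v)} with arbitrary positive weights.
omegas = [(x, w, v) for x in (0, 1) for w in (0, 1) for v in (0, 1)]
weight = {o: 1 + o[0] + 2 * o[1] + 3 * o[2] for o in omegas}
total = sum(weight.values())
prob = {o: Fraction(weight[o], total) for o in omegas}

X = {o: o[0] - 2 * o[1] + o[0] * o[2] for o in omegas}  # an arbitrary random variable

def expectation(f):
    return sum(prob[o] * f[o] for o in omegas)

def cond_exp(f, blocks):
    """E[f | sigma-algebra generated by the partition `blocks`], as a function on Omega."""
    out = {}
    for block in blocks:
        p_block = sum(prob[o] for o in block)
        value = sum(prob[o] * f[o] for o in block) / p_block
        for o in block:
            out[o] = value
    return out

# G is generated by x alone, F by the pair (x, w); hence G is coarser: G ⊆ F.
G_blocks = [[o for o in omegas if o[0] == x] for x in (0, 1)]
F_blocks = [[o for o in omegas if (o[0], o[1]) == key] for key in {(o[0], o[1]) for o in omegas}]

X_given_F = cond_exp(X, F_blocks)
X_given_G = cond_exp(X, G_blocks)

# (i): E[E[X|F]] = E[X]
assert expectation(X_given_F) == expectation(X)
# (ii): E[E[X|F]|G] = E[E[X|G]|F] = E[X|G]
assert cond_exp(X_given_F, G_blocks) == cond_exp(X_given_G, F_blocks) == X_given_G
print("Exercise 1 (i) and (ii) hold exactly on this finite example.")
```

Using exact Fractions avoids any floating-point tolerance in the two assertions.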
Exercise 2.

The following questions are independent.

  (i)

    Let X and Y be i.i.d. Bernoulli random variables: ℙ(X=1) = 1 - ℙ(X=0) = p for some p ∈ (0,1). We set Z ≔ 𝟙_{X+Y=0}. Compute 𝔼[X|Z] and 𝔼[Y|Z]. Are these random variables independent?

    Solution.

    First, since Z = 𝟙_{X+Y=0} ∈ {0,1}, we can write

    𝔼[X|Z] = 𝔼[X|Z=0] 𝟙_{Z=0} + 𝔼[X|Z=1] 𝟙_{Z=1}
    = (𝔼[X𝟙_{Z=0}] / ℙ(Z=0)) 𝟙_{Z=0} + (𝔼[X𝟙_{Z=1}] / ℙ(Z=1)) 𝟙_{Z=1},

    where the second summand vanishes because X = 0 on {Z=1}. So we only need to consider the first term. First, we have

    ℙ(Z=0) = 1 - ℙ(Z=1) = 1 - ℙ(X=0)ℙ(Y=0) = 1 - (1-p)²
    = 2p - p² = p(2-p),

    and due to X𝟙_{Z=0} = X we get

    𝔼[X𝟙_{Z=0}] = 𝔼[X] = 𝔼[𝟙_{X=1}] = p.

    Collecting these three results, we obtain

    𝔼[X|Z] = (1/(2-p)) 𝟙_{Z=0}.

    Due to symmetry we get the same result for 𝔼[Y|Z], which implies 𝔼[X|Z] = 𝔼[Y|Z] a.s. In particular, they are not independent, since a non-constant random variable is never independent of itself! ∎ (A numerical check is sketched after part (ii).)

  (ii)

    Let X be a square-integrable random variable and 𝒢 a sub-σ-algebra. We define the conditional variance

    Var(X|𝒢) ≔ 𝔼[(X - 𝔼[X|𝒢])²|𝒢].

    Prove the following identity:

    Var(X)=𝔼[Var(X|𝒢)]+Var(𝔼[X|𝒢]).
    Solution.

    Recall that Var(X) = 𝔼[X²] - 𝔼[X]²; the same holds true for the conditional variance:

    Var(X|𝒢) = 𝔼[X² - 2X𝔼[X|𝒢] + 𝔼[X|𝒢]² | 𝒢]
    = 𝔼[X²|𝒢] - 2𝔼[X𝔼[X|𝒢]|𝒢] + 𝔼[𝔼[X|𝒢]²|𝒢]
    = 𝔼[X²|𝒢] - 𝔼[X|𝒢]². (1)

    With these identities we can prove the claim by direct calculation:

    𝔼[Var(X|𝒢)] + Var(𝔼[X|𝒢]) = 𝔼[Var(X|𝒢)] + 𝔼[𝔼[X|𝒢]²] - 𝔼[𝔼[X|𝒢]]²
    = 𝔼[Var(X|𝒢) + 𝔼[X|𝒢]²] - 𝔼[𝔼[X|𝒢]]²
    = 𝔼[𝔼[X²|𝒢]] - 𝔼[𝔼[X|𝒢]]²   (by (1))
    = 𝔼[X²] - 𝔼[X]²   (by Exercise 1 (i))
    = Var(X). ∎
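Not part of the sheet: a short exact sanity check in Python for both parts of Exercise 2, taking 𝒢 = σ(Z) in (ii). The particular value of p is an arbitrary choice; Fractions keep the arithmetic exact.

```python
from fractions import Fraction

p = Fraction(1, 3)  # any p in (0,1) works
outcomes = [(x, y) for x in (0, 1) for y in (0, 1)]
prob = {(x, y): (p if x else 1 - p) * (p if y else 1 - p) for (x, y) in outcomes}
Z = {o: 1 if o[0] + o[1] == 0 else 0 for o in outcomes}

def E(f):
    return sum(prob[o] * f(o) for o in outcomes)

def cond_exp_X_given_Z(z):
    """E[X | Z = z], computed from the definition E[X 1_{Z=z}] / P(Z=z)."""
    return E(lambda o: o[0] * int(Z[o] == z)) / E(lambda o: int(Z[o] == z))

# (i): E[X|Z] equals 1/(2-p) on {Z=0} and 0 on {Z=1}
assert cond_exp_X_given_Z(0) == 1 / (2 - p) and cond_exp_X_given_Z(1) == 0

# (ii): law of total variance with G = sigma(Z)
EX_given_Z = {o: cond_exp_X_given_Z(Z[o]) for o in outcomes}
var_X = E(lambda o: o[0] ** 2) - E(lambda o: o[0]) ** 2
E_var = E(lambda o: (o[0] - EX_given_Z[o]) ** 2)  # E[Var(X|Z)], using the tower property
var_E = E(lambda o: EX_given_Z[o] ** 2) - E(lambda o: EX_given_Z[o]) ** 2
assert var_X == E_var + var_E
print("Exercise 2 (i) and (ii) verified exactly for p =", p)
```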
Exercise 3 (Factorization lemma).

(Important!) Let X, Y : (Ω,𝒜) → (ℝ, ℬ(ℝ)) be random variables. Show that, if X is (σ(Y), ℬ(ℝ))-measurable, then there exists a measurable function g : (ℝ, ℬ(ℝ)) → (ℝ, ℬ(ℝ)) such that X = g(Y).

Solution.

First we consider simple functions, i.e.

X = ∑_{k=1}^n a_k 𝟙_{A_k}

for a_k ≥ 0 and A_k ∈ 𝒜 for all k = 1,…,n. Without loss of generality we can assume the A_k to be disjoint and the a_k to be pairwise distinct. This, together with the fact that X is supposed to be σ(Y)-measurable, implies that there exists some B_k ∈ ℬ(ℝ) with A_k = Y⁻¹(B_k) for all k, because we have

A_k = X⁻¹({a_k}) ∈ σ(Y) = {Y⁻¹(B) : B ∈ ℬ(ℝ)},

where the first equality uses that the A_k are disjoint and the a_k distinct.

Hence we have

X = ∑_{k=1}^n a_k 𝟙_{A_k} = ∑_{k=1}^n a_k (𝟙_{B_k} ∘ Y) = (∑_{k=1}^n a_k 𝟙_{B_k}) ∘ Y ≕ g ∘ Y,

where g is measurable.

Now assume only that X ≥ 0 is (σ(Y), ℬ(ℝ))-measurable, which implies that there exists a sequence (X_n)_{n∈ℕ} of (σ(Y), ℬ(ℝ))-measurable simple functions with X_n ↑ X. Taking intersections of the indicator sets of X_{n+1} and X_n shows that ΔX_n ≔ X_n - X_{n-1} is a simple function too, which can therefore be written as ΔX_n = g_n(Y). Since X_n converges we have

X = lim_{n→∞} X_n = lim_{n→∞} ∑_{k=1}^n ΔX_k = ∑_{k=1}^∞ g_k(Y) = (∑_{k=1}^∞ g_k) ∘ Y ≕ g ∘ Y,

using X_0 ≔ 0 without loss of generality. The function g is measurable as a pointwise limit of measurable functions, and it is well defined because ∑_{k=1}^n g_k with g_k ≥ 0 is monotonically increasing in n, although the limit might be infinite; this is why we used the telescoping trick. Splitting a general X into X = X⁺ - X⁻ yields the general case. ∎
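As an illustration (not from the sheet), here is a Python sketch of the construction in the simple-function case on a finite sample space: σ(Y)-measurability forces X to be constant on each level set {Y = y}, so g can be read off as a lookup table. The concrete Ω, X and Y are made up for the example.

```python
def factor(sample_points, X, Y):
    """Given X and Y on a finite sample space with X sigma(Y)-measurable,
    return a function g with X = g(Y)."""
    table = {}
    for omega in sample_points:
        y, x = Y(omega), X(omega)
        # X must be constant on the level set {Y = y}; otherwise it is not sigma(Y)-measurable.
        if table.setdefault(y, x) != x:
            raise ValueError("X is not sigma(Y)-measurable: it separates {Y = %r}" % y)
    return lambda y: table[y]

# Example: Y(omega) = omega % 3 and X(omega) = (omega % 3) ** 2 on Omega = {0, ..., 11}.
omegas = range(12)
Y = lambda omega: omega % 3
X = lambda omega: (omega % 3) ** 2
g = factor(omegas, X, Y)
assert all(X(o) == g(Y(o)) for o in omegas)
```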

Exercise 4 (Best Estimators).

Let X be a real random variable.

  (i)

    Find the estimator m minimizing 𝔼[𝟙_{X≠m}] when X is discrete. What is the best estimator for a dice roll using this loss?

    Solution.

    The best estimator is the maximum likelihood estimator argmax_m ℙ(X=m), as

    min_m 𝔼[𝟙_{X≠m}] = min_m ℙ(X≠m) = min_m [1 - ℙ(X=m)] = 1 - max_m ℙ(X=m).

    So for a fair die, any of its faces 1,…,6 is a minimizer. ∎
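A tiny numerical check, not part of the sheet: for a loaded die (an arbitrary example, unlike the fair die above), minimizing the 0-1 loss 1 - ℙ(X=m) picks exactly the modes.

```python
from fractions import Fraction

# A loaded die: faces 1 and 6 are the most likely, so they are the modes.
pmf = {1: Fraction(1, 4), 2: Fraction(1, 8), 3: Fraction(1, 8),
       4: Fraction(1, 8), 5: Fraction(1, 8), 6: Fraction(1, 4)}
assert sum(pmf.values()) == 1

zero_one_loss = {m: 1 - pmf[m] for m in pmf}  # E[1_{X != m}] = 1 - P(X = m)
minimizers = {m for m, loss in zero_one_loss.items() if loss == min(zero_one_loss.values())}
modes = {m for m, q in pmf.items() if q == max(pmf.values())}
assert minimizers == modes == {1, 6}
```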

We define the median of X to be 𝕄[X] ≔ inf{m ∈ ℝ : f(m) > 0}, where

f : ℝ → [-1,1], m ↦ ℙ(X ≤ m) - ℙ(X > m).
  (ii)

    (Unimportant) Prove that ℙ(X ≤ 𝕄[X]) ≥ 1/2 and ℙ(X ≥ 𝕄[X]) ≥ 1/2, and show that for all m > 𝕄[X] we do not have ℙ(X ≥ m) ≥ 1/2.

    Hint.

    Calculate the limits lim_{m↓𝕄[X]} f(m) and lim_{m↑𝕄[X]} f(m) using the continuity of measures. Also note that f(m) = 2F_X(m) - 1, where F_X is the cumulative distribution function of X.

    Solution.

    As f is right-continuous, since F_X(m) = ℙ(X ≤ m) is right-continuous and f(m) = 2F_X(m) - 1, we have

    2ℙ(X ≤ 𝕄[X]) - 1 = f(𝕄[X]) = lim_{m↓𝕄[X]} f(m) ≥ 0

    and therefore ℙ(X ≤ 𝕄[X]) ≥ 1/2. For the second claim we need to use continuity of measures directly:

    2ℙ(X < 𝕄[X]) - 1 = ℙ(X < 𝕄[X]) - ℙ(X ≥ 𝕄[X])
    = ℙ(⋃_{m<𝕄[X]} {X ≤ m}) - ℙ(⋂_{m<𝕄[X]} {X > m})
    = lim_{m↑𝕄[X]} [ℙ(X ≤ m) - ℙ(X > m)]
    = lim_{m↑𝕄[X]} f(m) ≤ 0,

    where the last inequality holds because f(m) ≤ 0 for all m < 𝕄[X] by definition of the infimum. Therefore ℙ(X < 𝕄[X]) ≤ 1/2, which implies ℙ(X ≥ 𝕄[X]) ≥ 1/2.

    Lastly, for any m>𝕄[X], we can find a small ϵ>0 with m-ϵ>𝕄[X]. Therefore

    2ℙ(X < m) - 1 ≥ 2ℙ(X ≤ m-ϵ) - 1 = f(m-ϵ) > 0,

    which implies ℙ(X < m) > 1/2 and therefore ℙ(X ≥ m) < 1/2. ∎

    Remark.

    This is usually taken as the defining property of the median. If it does not determine the median uniquely, 𝕄[X] is the largest median; a smallest median can be characterized similarly.

  (iii)

    Prove that the median minimizes the L¹ error, i.e. 𝕄[X] ∈ argmin_m 𝔼[|X-m|]. Whenever useful, you can assume continuity.

    Hint.

    Recall that for X ≥ 0 a.s., we have 𝔼[X] = ∫₀^∞ ℙ(X ≥ t) dt = ∫₀^∞ ℙ(X > t) dt. Use this fact to prove

    𝔼[(X-m)⁺] = ∫_m^∞ ℙ(X > t) dt.
    Solution.

    Recall that for X ≥ 0 a.s., we have 𝔼[X] = ∫₀^∞ ℙ(X ≥ t) dt = ∫₀^∞ ℙ(X > t) dt. Applying this fact to (X-m)⁺ we get

    𝔼[(X-m)⁺] = ∫₀^∞ ℙ((X-m)⁺ > t) dt = ∫₀^∞ ℙ(X - m > t) dt = ∫₀^∞ ℙ(X > t + m) dt
    = ∫_m^∞ ℙ(X > s) ds.

    And similarly

    𝔼[(X-m)⁻] = ∫₀^∞ ℙ((X-m)⁻ ≥ t) dt = ∫₀^∞ ℙ(m - X ≥ t) dt = ∫₀^∞ ℙ(X ≤ m - t) dt
    = ∫_{-∞}^m ℙ(X ≤ s) ds,
    using {(X-m)⁻ ≥ t} = {m - X ≥ t} for t > 0.

    Putting these together, we have

    𝔼[|X-m|] = ∫_m^∞ ℙ(X > s) ds + ∫_{-∞}^m ℙ(X ≤ s) ds.

    Assuming ℙ(X > s) and ℙ(X ≤ s) are both continuous in s, we can use the fundamental theorem of calculus to obtain

    (d/dm) 𝔼[|X-m|] = ℙ(X ≤ m) - ℙ(X > m) = f(m).

    Therefore we have

    𝔼[|X-m|] - 𝔼[|X-𝕄[X]|] = ∫_{𝕄[X]}^m f(t) dt ≥ 0,

    as f(t) > 0 for t > 𝕄[X] and f(t) ≤ 0 for t < 𝕄[X], where in the latter case swapping the integration bounds produces another sign flip. Notice that f is strictly greater than zero for t > 𝕄[X], which implies that 𝕄[X] is the largest minimizer, i.e. all numbers greater than 𝕄[X] do not minimize 𝔼[|X-m|].

    As mentioned, one can similarly single out a smallest minimizer, and all numbers in between fulfil the defining property of medians, ℙ(X ≤ m) ≥ 1/2 and ℙ(X ≥ m) ≥ 1/2. ∎ (A numerical sketch for (ii) and (iii) follows after the remark.)

    Remark.

    In the non-continuous case, one still has right-continuity of f and can prove right-differentiability of 𝔼[|X-m|]. This might be sufficient to prove that the map is monotonically increasing after 𝕄[X] and monotonically decreasing before. But such a proof is likely to be involved or to require stronger analytic tools. Alternatively, a short and elementary (but unintuitive) proof of the general case can be found here: https://math.stackexchange.com/a/2790390/445105.
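To close, a Python sketch that is not part of the sheet: for a small, arbitrarily chosen discrete distribution it computes 𝕄[X] = inf{m : f(m) > 0}, checks both inequalities from (ii), and confirms that 𝕄[X] minimizes m ↦ 𝔼|X-m| over a grid of candidates, as in (iii).

```python
from fractions import Fraction

pmf = {0: Fraction(1, 2), 1: Fraction(1, 4), 3: Fraction(1, 4)}  # an arbitrary discrete law

def P(event):
    return sum(q for x, q in pmf.items() if event(x))

def f(m):  # f(m) = P(X <= m) - P(X > m), as defined before part (ii)
    return P(lambda x: x <= m) - P(lambda x: x > m)

# M[X] = inf{m : f(m) > 0}; for a finitely supported law the infimum is attained at a support point.
median = min(x for x in pmf if f(x) > 0)

# (ii): both half-mass inequalities hold at M[X].
assert P(lambda x: x <= median) >= Fraction(1, 2)
assert P(lambda x: x >= median) >= Fraction(1, 2)

# (iii): M[X] minimizes the L1 error m -> E|X - m| over a grid around the support.
def l1_error(m):
    return sum(q * abs(x - m) for x, q in pmf.items())

candidates = [Fraction(k, 4) for k in range(-8, 17)]
assert min(l1_error(m) for m in candidates) == l1_error(median)
```

For this particular law the minimizer is not unique: every m ∈ [0,1] gives 𝔼|X-m| = 1, and 𝕄[X] = 1 is the largest of these minimizers, matching the discussion in (iii).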