Convergence and limit theorems

The law of large numbers and the central limit theorem are the two pillars of probability theory.

Andrei Kolmogorov

Introduction

What happens when a random experiment is repeated a large number of times? The law of large numbers says that the empirical mean stabilizes around the expectation. The central limit theorem says that the fluctuations of the empirical mean around the expectation, rescaled by \(\sqrt{n}\), are asymptotically normal. These two fundamental theorems justify the use of probability in statistics and in the sciences, and explain why the Gaussian distribution is so common in nature.


Modes of convergence

Definition 292 (Almost sure convergence)

\((X_n)\) converges almost surely (a.s.) to \(X\) if

\[\mathbb{P}\bigl(\{\omega : X_n(\omega) \to X(\omega)\}\bigr) = 1\]

We write \(X_n \xrightarrow{\text{a.s.}} X\).

Definition 293 (Convergence in probability)

\((X_n)\) converges in probability to \(X\) if

\[\forall \varepsilon > 0, \quad \mathbb{P}(|X_n - X| > \varepsilon) \to 0\]

We write \(X_n \xrightarrow{\mathbb{P}} X\).

Definition 294 (Convergence in law)

\((X_n)\) converges in law (or in distribution) to \(X\) if \(F_{X_n}(x) \to F_X(x)\) at every continuity point \(x\) of \(F_X\). We write \(X_n \xrightarrow{\mathcal{L}} X\).

Definition 295 (Convergence in \(L^p\))

\((X_n)\) converges in \(L^p\) (\(p \geq 1\)) to \(X\) if \(\mathbb{E}[|X_n - X|^p] \to 0\). Convergence in \(L^2\) is called mean-square convergence.

Theorem 79 (Hierarchy of modes of convergence)

\[\text{a.s.} \implies \text{in probability} \implies \text{in law}\]

\[L^p \implies \text{in probability} \implies \text{in law}\]

No other implication between these four modes holds without additional assumptions.

Proof. a.s. \(\implies\) in probability: Let \(\varepsilon > 0\) and \(A_n = \{|X_n - X| > \varepsilon\}\). If \(X_n \to X\) a.s., then almost surely only finitely many \(A_n\) occur, that is, \(\mathbb{P}(\limsup A_n) = 0\). The events \(\bigcup_{k \geq n} A_k\) decrease to \(\limsup A_n\), so \(\mathbb{P}\bigl(\bigcup_{k \geq n} A_k\bigr) \to 0\). Since \(A_n \subset \bigcup_{k \geq n} A_k\), we get \(\mathbb{P}(A_n) \to 0\).

In probability \(\implies\) in law: Let \(x\) be a continuity point of \(F_X\) and \(\varepsilon > 0\). Since \(\{X \leq x - \varepsilon\} \subset \{X_n \leq x\} \cup \{|X_n - X| > \varepsilon\}\), we have \(F_X(x-\varepsilon) \leq F_{X_n}(x) + \mathbb{P}(|X_n-X| > \varepsilon)\). Likewise, \(\{X_n \leq x\} \subset \{X \leq x + \varepsilon\} \cup \{|X_n - X| > \varepsilon\}\) gives \(F_{X_n}(x) \leq F_X(x+\varepsilon) + \mathbb{P}(|X_n-X| > \varepsilon)\). Letting \(n \to \infty\): \(F_X(x-\varepsilon) \leq \liminf_n F_{X_n}(x) \leq \limsup_n F_{X_n}(x) \leq F_X(x+\varepsilon)\). Letting \(\varepsilon \to 0\) and using the continuity of \(F_X\) at \(x\): \(F_{X_n}(x) \to F_X(x)\).

\(L^p \implies\) in probability: By Markov's inequality, for every \(\varepsilon > 0\): \(\mathbb{P}(|X_n-X| > \varepsilon) = \mathbb{P}(|X_n-X|^p > \varepsilon^p) \leq \mathbb{E}[|X_n-X|^p]/\varepsilon^p \to 0\).

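Remark 140

Convergence in law is the weakest mode: it concerns the laws of the variables, not the variables themselves. The variables \(X_n\) and \(X\) need not even be defined on the same probability space.
Counterexample, in probability \(\not\Rightarrow\) a.s.: On \([0,1]\) with Lebesgue measure, enumerate the indicator functions of the intervals \([k/2^n, (k+1)/2^n]\), \(0 \leq k < 2^n\), level by level (\(n = 0, 1, 2, \dots\)). This sequence converges to 0 in probability, since the intervals of level \(n\) have length \(2^{-n}\), but not a.s., since every \(\omega\) belongs to at least one interval of each level.
Counterexample, a.s. \(\not\Rightarrow\) \(L^2\): On the same space, \(X_n = n\mathbf{1}_{[0,1/n]}\) converges a.s. to 0 (for \(\omega > 0\), \(X_n(\omega) = 0\) as soon as \(n > 1/\omega\)), but \(\mathbb{E}[X_n^2] = n \to +\infty\). Even \(\mathbb{E}[X_n] = 1 \not\to 0\), so there is no convergence in \(L^1\) either.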
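
The dyadic-interval counterexample in the remark above can be explored numerically. The sketch below is only an illustration: the function name `typewriter_hits` and the choice \(\omega = 0.3\) are assumptions made for the example, not part of the course. For each level \(n\), it lists the indices \(k\) such that \(\omega \in [k/2^n, (k+1)/2^n]\).

```python
def typewriter_hits(omega, levels):
    """For each level n < levels, list the k with omega in [k/2^n, (k+1)/2^n]."""
    hits = []
    for n in range(levels):
        size = 2 ** n
        # Divisions by a power of two are exact in floating point here.
        hits.append([k for k in range(size) if k / size <= omega <= (k + 1) / size])
    return hits


omega = 0.3  # hypothetical sample point, not an endpoint of any interval used below
hits = typewriter_hits(omega, levels=10)
for n, ks in enumerate(hits):
    # Length 2^-n of each interval of level n, and the interval(s) containing omega.
    print(n, 2.0 ** -n, ks)
```

At every level exactly one indicator equals 1 at \(\omega\), so the sequence of values at \(\omega\) contains infinitely many ones and cannot converge to 0, even though the probability of seeing a 1 at level \(n\) is \(2^{-n}\).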

Law of large numbers

Theorem 80 (Weak law of large numbers)

Let \((X_n)\) be i.i.d. with \(\mathbb{E}[X_1] = \mu\) and \(\text{Var}(X_1) = \sigma^2 < +\infty\). Then the empirical mean satisfies

\[\bar{X}_n = \frac{1}{n}\sum_{k=1}^n X_k \xrightarrow{\mathbb{P}} \mu\]

Proof. By linearity, \(\mathbb{E}[\bar{X}_n] = \mu\). By independence, \(\text{Var}(\bar{X}_n) = \sigma^2/n\). By Chebyshev's inequality, for every \(\varepsilon > 0\):

\[\mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{\text{Var}(\bar{X}_n)}{\varepsilon^2} = \frac{\sigma^2}{n\varepsilon^2} \to 0\]
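
The Chebyshev bound used in this proof can be compared with a Monte Carlo estimate. The following sketch assumes fair coin flips (\(\mu = 1/2\), \(\sigma^2 = 1/4\)); the function name `deviation_frequency`, the seed, and the sample sizes are illustrative choices only.

```python
import random


def deviation_frequency(n, eps, trials, rng):
    """Monte Carlo estimate of P(|mean of n fair coin flips - 1/2| > eps)."""
    count = 0
    for _ in range(trials):
        mean = sum(rng.random() < 0.5 for _ in range(n)) / n
        if abs(mean - 0.5) > eps:
            count += 1
    return count / trials


rng = random.Random(0)      # fixed seed so the run is reproducible
eps, sigma2 = 0.1, 0.25     # sigma2 is the variance of one fair coin flip
for n in (25, 100, 400):
    freq = deviation_frequency(n, eps, trials=2000, rng=rng)
    bound = sigma2 / (n * eps ** 2)   # Chebyshev bound sigma^2 / (n eps^2)
    print(n, freq, min(bound, 1.0))
```

The estimated frequency should sit well below the bound: Chebyshev's inequality is enough to prove convergence, but it is usually far from sharp.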

Theorem 81 (Strong law of large numbers (Kolmogorov))

If \((X_n)\) is i.i.d. with \(\mathbb{E}[|X_1|] < +\infty\) and \(\mathbb{E}[X_1] = \mu\), then:

\[\bar{X}_n \xrightarrow{\text{a.s.}} \mu\]

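Proof. The complete proof (Kolmogorov, 1930) requires more advanced tools. The case \(\mathbb{E}[X_1^4] < +\infty\) follows from Markov's inequality applied to \((\bar{X}_n - \mu)^4\):

\[\mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) \leq \frac{\mathbb{E}[(\bar{X}_n-\mu)^4]}{\varepsilon^4}\]

Set \(Y_k = X_k - \mu\). When \(\mathbb{E}\bigl[(\sum_{k=1}^n Y_k)^4\bigr]\) is expanded, every term containing an isolated factor \(Y_k\) vanishes because \(\mathbb{E}[Y_k] = 0\), which leaves \(n\,\mathbb{E}[Y_1^4] + 3n(n-1)\,\mathbb{E}[Y_1^2]^2\). Dividing by \(n^4\) gives \(\mathbb{E}[(\bar{X}_n-\mu)^4] = O(1/n^2)\). Hence \(\sum_n \mathbb{P}(|\bar{X}_n - \mu| > \varepsilon) < +\infty\), and by Borel-Cantelli, \(\mathbb{P}(|\bar{X}_n - \mu| > \varepsilon \text{ i.o.}) = 0\) for every \(\varepsilon > 0\). Applying this to \(\varepsilon = 1/m\) for every integer \(m \geq 1\) gives \(\bar{X}_n \to \mu\) a.s.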

Remark 141

The strong law is a stronger statement than the weak law: for almost every \(\omega\), the sequence \((\bar{X}_n(\omega))\) converges to \(\mu\). This is the mathematical justification of the frequentist interpretation: the empirical frequency of an event converges to its probability.
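
The strong law is a statement about a single trajectory, which a simulation can display directly. The sketch below follows the running mean of rolls of a fair die (\(\mu = 3.5\)); the function name `running_means` and the seed are illustrative assumptions.

```python
import random


def running_means(n, rng):
    """Running means of n rolls of a fair six-sided die along one trajectory."""
    total, means = 0, []
    for k in range(1, n + 1):
        total += rng.randint(1, 6)
        means.append(total / k)
    return means


means = running_means(100_000, random.Random(0))
for k in (10, 100, 1_000, 10_000, 100_000):
    print(k, means[k - 1])   # running mean after k rolls
```

Changing the seed changes the early values, but along almost every trajectory the running mean settles near 3.5.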

Central limit theorem

Theorem 82 (Central limit theorem (CLT))

Let \((X_n)\) be i.i.d. with \(\mathbb{E}[X_1] = \mu\) and \(\text{Var}(X_1) = \sigma^2 \in \,]0, +\infty[\). Then

\[\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} = \frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{\mathcal{L}} \mathcal{N}(0,1)\]

where \(S_n = X_1 + \cdots + X_n\). Equivalently, since \(\Phi\) is continuous, for every \(x \in \mathbb{R}\):

\[\mathbb{P}\!\left(\frac{S_n - n\mu}{\sigma\sqrt{n}} \leq x\right) \to \Phi(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^x e^{-t^2/2} dt\]

Proof. Via characteristic functions (Lévy's proof).

Let \(Y_k = (X_k - \mu)/\sigma\), which is centered with unit variance, and let \(Z_n = \frac{1}{\sqrt{n}}\sum_{k=1}^n Y_k\).

Since \(Y_k\) has a finite second moment, its characteristic function \(\phi\) is twice differentiable, with \(\phi(0) = 1\), \(\phi'(0) = i\mathbb{E}[Y_k] = 0\), \(\phi''(0) = i^2\mathbb{E}[Y_k^2] = -1\).

By independence: \(\phi_{Z_n}(t) = \phi(t/\sqrt{n})^n\).

By a second-order Taylor expansion of \(\phi\) at 0, evaluated at \(t/\sqrt{n}\):

\[\phi(t/\sqrt{n}) = 1 - \frac{t^2}{2n} + o(t^2/n)\]

Hence, for fixed \(t\), \(\phi_{Z_n}(t) = \left(1 - \frac{t^2}{2n} + o(1/n)\right)^n \xrightarrow{n \to \infty} e^{-t^2/2}\).

Now \(e^{-t^2/2}\) is the characteristic function of \(\mathcal{N}(0,1)\). By Lévy's continuity theorem (pointwise convergence of the characteristic functions to that of a given law \(\iff\) convergence in law to that law), \(Z_n \xrightarrow{\mathcal{L}} \mathcal{N}(0,1)\).
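
The convergence in law stated by the CLT can be observed by comparing an empirical distribution function with \(\Phi\). The sketch below assumes uniform variables on \([0,1]\) (\(\mu = 1/2\), \(\sigma^2 = 1/12\)); the names `standardized_sum` and `normal_cdf`, the seed, and the sample sizes are illustrative choices.

```python
import math
import random


def standardized_sum(n, rng):
    """(S_n - n*mu) / (sigma*sqrt(n)) for n i.i.d. uniform variables on [0, 1]."""
    mu, sigma = 0.5, math.sqrt(1 / 12)
    s = sum(rng.random() for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))


def normal_cdf(x):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


rng = random.Random(0)
sample = [standardized_sum(30, rng) for _ in range(4000)]
for x in (-2, -1, 0, 1, 2):
    empirical = sum(z <= x for z in sample) / len(sample)
    print(x, empirical, normal_cdf(x))   # empirical CDF next to Phi(x)
```

Replacing the uniform law by another law with finite nonzero variance only requires changing `mu`, `sigma` and the sampling line; the comparison with \(\Phi\) stays the same.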

Remark 142

The CLT is universal: the limit does not depend on the law of the \(X_i\), provided its variance is finite and nonzero. This is why the normal law appears so widely: a quantity that is the sum of many small independent effects is approximately Gaussian (measurement errors, thermal fluctuations, IQ scores, etc.).

Example 157

Normal approximation of the binomial. If \(X_n \sim \mathcal{B}(n, p)\) with \(p \in \,]0,1[\) fixed, then as \(n \to \infty\):

\[\frac{X_n - np}{\sqrt{np(1-p)}} \xrightarrow{\mathcal{L}} \mathcal{N}(0,1)\]

Rule of thumb: the approximation is considered good if \(np \geq 5\) and \(n(1-p) \geq 5\).

Fluctuation interval. For large \(n\), \(\bar{X}_n \approx \mathcal{N}(\mu, \sigma^2/n)\), so

\[\mathbb{P}\!\left(\mu - 1.96\frac{\sigma}{\sqrt{n}} \leq \bar{X}_n \leq \mu + 1.96\frac{\sigma}{\sqrt{n}}\right) \approx 0.95\]

This is the basis of confidence intervals (next chapter).
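
The normal approximation of the binomial in the example above can be checked against an exact computation. The sketch below uses \(n = 100\), \(p = 0.5\) and the threshold \(k = 55\), all chosen for illustration. The continuity correction (replacing \(k\) by \(k + 0.5\)) is a standard refinement that the text does not mention; it is included here as an assumption, for comparison.

```python
import math


def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(math.comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))


def normal_cdf(x):
    """Standard normal distribution function, written with the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


n, p, k = 100, 0.5, 55
mean, sd = n * p, math.sqrt(n * p * (1 - p))
exact = binomial_cdf(k, n, p)
plain = normal_cdf((k - mean) / sd)            # approximation as stated by the CLT
corrected = normal_cdf((k + 0.5 - mean) / sd)  # with continuity correction (assumption)
print(exact, plain, corrected)
```

Here \(np = n(1-p) = 50\), so the rule of thumb is satisfied with a wide margin; trying small values of \(np\) shows how the approximation degrades.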

Borel-Cantelli lemma

Theorem 83 (Borel-Cantelli lemma)

Let \((A_n)\) be a sequence of events.

1. If \(\sum_n \mathbb{P}(A_n) < +\infty\), then \(\mathbb{P}(\limsup A_n) = 0\) (a.s., only finitely many \(A_n\) occur).
2. If the \(A_n\) are independent and \(\sum_n \mathbb{P}(A_n) = +\infty\), then \(\mathbb{P}(\limsup A_n) = 1\) (a.s., infinitely many \(A_n\) occur).

Proof. 1. For every \(n\), \(\mathbb{P}(\limsup A_n) = \mathbb{P}\!\left(\bigcap_{m=1}^\infty \bigcup_{k \geq m} A_k\right) \leq \mathbb{P}\!\left(\bigcup_{k \geq n} A_k\right) \leq \sum_{k \geq n} \mathbb{P}(A_k)\), and the right-hand side tends to 0 as \(n \to \infty\) (tail of a convergent series).

2. By independence, and since \(1 - x \leq e^{-x}\):

\[\mathbb{P}\!\left(\bigcap_{k=n}^N \bar{A}_k\right) = \prod_{k=n}^N (1-\mathbb{P}(A_k)) \leq \prod_{k=n}^N e^{-\mathbb{P}(A_k)} = e^{-\sum_{k=n}^N \mathbb{P}(A_k)} \to 0\]

as \(N \to +\infty\) (because the series diverges). Hence \(\mathbb{P}(\bigcap_{k \geq n} \bar{A}_k) = 0\) for every \(n\). The complement of \(\limsup A_n\) is \(\bigcup_n \bigcap_{k \geq n} \bar{A}_k\), a countable union of null events, so \(\mathbb{P}(\limsup A_n) = 1\).
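
The contrast between the two parts of the lemma can be illustrated by simulation. The sketch below draws independent uniform variables \(U_n\) and counts how many of the events \(\{U_n < 1/n^2\}\) (convergent series) and \(\{U_n < 1/n\}\) (divergent series) occur for \(n \leq N\). These two families of events, the function name `count_occurrences`, and the seed are illustrative assumptions. Both families reuse the same \(U_n\) only for convenience; within each family the events are independent.

```python
import random


def count_occurrences(N, rng):
    """Count occurrences of {U_n < 1/n^2} and {U_n < 1/n} for n = 1..N."""
    count_conv = count_div = 0
    last_conv = 0
    for n in range(1, N + 1):
        u = rng.random()
        if u < 1 / n:                 # sum of 1/n diverges
            count_div += 1
            if u < 1 / n ** 2:        # sum of 1/n^2 converges
                count_conv += 1
                last_conv = n         # index of the latest occurrence so far
    return count_conv, last_conv, count_div


for N in (100, 10_000, 1_000_000):
    print(N, count_occurrences(N, random.Random(0)))
```

With a fixed seed, the count for \(1/n^2\) typically stops changing after a small index, while the count for \(1/n\) keeps growing slowly (roughly like \(\ln N\)) as \(N\) increases.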

Example 158

Application to the strong LLN: For \(A_n = \{|\bar{X}_n - \mu| > \varepsilon\}\) with \(\mathbb{E}[X_1^4] < +\infty\), we have \(\sum_n \mathbb{P}(A_n) < +\infty\) for every \(\varepsilon > 0\), so by Borel-Cantelli, \(\mathbb{P}(A_n \text{ i.o.}) = 0\): \(\bar{X}_n \to \mu\) a.s.

Typing monkey: A monkey types at random, uniformly and independently, on a keyboard with 26 keys. Let \(A_n\) be the event "the word HAMLET occupies positions \(6n+1\) to \(6n+6\)". Then \(\mathbb{P}(A_n) = (1/26)^6 > 0\), the \(A_n\) are independent because they involve disjoint blocks of keystrokes, and \(\sum \mathbb{P}(A_n) = +\infty\): a.s. the monkey types HAMLET infinitely often.

Poisson approximation

Theorem 84 (Poisson's theorem (law of rare events))

If \(X_n \sim \mathcal{B}(n, p_n)\) with \(np_n \to \lambda > 0\), then

\[X_n \xrightarrow{\mathcal{L}} \mathcal{P}(\lambda)\]

Proof. Fix an integer \(k \geq 0\). Then \(\mathbb{P}(X_n = k) = \binom{n}{k} p_n^k (1-p_n)^{n-k}\). With \(p_n = \lambda/n + o(1/n)\):

\[\binom{n}{k} p_n^k \approx \frac{n^k}{k!} \cdot \frac{\lambda^k}{n^k} = \frac{\lambda^k}{k!}, \qquad (1-p_n)^{n-k} \approx \left(1-\frac{\lambda}{n}\right)^n \to e^{-\lambda}\]

Hence \(\mathbb{P}(X_n = k) \to e^{-\lambda}\frac{\lambda^k}{k!}\). Since all these laws are carried by the nonnegative integers, this convergence for every \(k\) implies convergence in law.
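
The convergence of the binomial probabilities to the Poisson probabilities can be checked numerically. The sketch below takes \(p_n = \lambda/n\) with \(\lambda = 2\) and measures the largest gap between the two probability mass functions over \(k \leq 30\); the function names, the value of \(\lambda\), and the cutoff `kmax` are illustrative assumptions.

```python
import math


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_pmf_gap(n, lam, kmax=30):
    """Largest |B(n, lam/n)(k) - Poisson(lam)(k)| over k = 0..min(n, kmax)."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, kmax) + 1)
    )


lam = 2.0
for n in (10, 100, 1000):
    print(n, max_pmf_gap(n, lam))
```

The gap shrinks as \(n\) grows with \(np_n\) held at \(\lambda\), which is the regime of many trials and rare successes described by the theorem.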


Summary

Mode of convergence	Definition
Almost sure	\(\mathbb{P}(X_n \to X) = 1\)
In probability	\(\mathbb{P}(|X_n - X| > \varepsilon) \to 0\) for every \(\varepsilon > 0\)
In law	\(F_{X_n}(x) \to F_X(x)\) at the continuity points of \(F_X\)
In \(L^p\)	\(\mathbb{E}[|X_n - X|^p] \to 0\)

Theorem	Statement
Weak LLN	\(\bar{X}_n \xrightarrow{\mathbb{P}} \mu\) (via Chebyshev)
Strong LLN	\(\bar{X}_n \xrightarrow{\text{a.s.}} \mu\) (Kolmogorov, via Borel-Cantelli)
CLT	\((S_n - n\mu)/(\sigma\sqrt{n}) \xrightarrow{\mathcal{L}} \mathcal{N}(0,1)\) (via characteristic functions)
Poisson	\(\mathcal{B}(n, \lambda/n) \xrightarrow{\mathcal{L}} \mathcal{P}(\lambda)\)
Borel-Cantelli 1	\(\sum \mathbb{P}(A_n) < \infty \implies\) finitely many \(A_n\) occur a.s.
Borel-Cantelli 2	\(\sum \mathbb{P}(A_n) = \infty\) + independence \(\implies\) infinitely many \(A_n\) occur a.s.