---
jupytext:
  text_representation:
    extension: .md
    format_name: myst
    format_version: 0.13
    jupytext_version: 1.16.0
kernelspec:
  name: python3
  display_name: Python 3
---

# Bayesian statistics

> In an uncertain world, every belief is a distribution, not a point.

## Introduction: two views of probability

Frequentist and Bayesian statistics ask the same question (*what do the data tell us?*) but answer it from radically different philosophies.

**Frequentist view**: probability is the limiting frequency of an event over infinitely many repetitions of an experiment. The parameters $\theta$ are fixed values (unknown but not random). Estimators and intervals are built to have good properties under repeated sampling.

**Bayesian view**: probability is a *degree of belief*. The parameters $\theta$ are treated as random variables that represent our uncertainty. Beliefs are updated in light of the data.

```{code-cell} python
:tags: [hide-input]

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
import pandas as pd
from scipy import stats
from scipy.special import betaln

sns.set_theme(style="whitegrid", palette="muted", font_scale=1.1)

# Conceptual illustration: prior, likelihood, posterior
fig, axes = plt.subplots(1, 3, figsize=(15, 4))

theta = np.linspace(0, 1, 500)

# Example: estimating a conversion rate
# Prior: Beta(2, 8), i.e. a prior belief of about 20% conversion
alpha_prior, beta_prior = 2, 8
prior = stats.beta.pdf(theta, alpha_prior, beta_prior)

# Data: 15 conversions out of 50 visitors
n_obs, k_obs = 50, 15
# Likelihood of k=15 successes for each value of theta
likelihood = stats.binom.pmf(k_obs, n_obs, theta)
# Rescale to unit area for display only
# (np.trapezoid needs NumPy >= 2.0; use np.trapz on older versions)
likelihood /= np.trapezoid(likelihood, theta)

# Posterior: Beta(alpha + k, beta + n - k) by conjugacy
alpha_post = alpha_prior + k_obs
beta_post = beta_prior + (n_obs - k_obs)
posterior = stats.beta.pdf(theta, alpha_post, beta_post)

# Posterior mode (MAP), mean and median
mode_post = (alpha_post - 1) / (alpha_post + beta_post - 2)
mean_post = alpha_post / (alpha_post + beta_post)
median_post = stats.beta.ppf(0.5, alpha_post, beta_post)

titles = ['Prior\nBeta(2, 8)', 'Likelihood\n(data: 15/50)',
          f'Posterior\nBeta({alpha_post}, {beta_post})']
distribs = [prior, likelihood, posterior]
colors = ['steelblue', 'tomato', 'darkgreen']

for ax, dist, title, color in zip(axes, distribs, titles, colors):
    ax.plot(theta, dist, color=color, lw=2.5)
    ax.fill_between(theta, 0, dist, alpha=0.2, color=color)
    ax.set_xlabel('θ (conversion rate)')
    ax.set_title(title)
    ax.set_xlim(0, 1)

# Annotate the posterior
axes[2].axvline(mode_post, color='black', ls='--', lw=1.5, label=f'MAP = {mode_post:.3f}')
axes[2].axvline(mean_post, color='orange', ls='-.', lw=1.5, label=f'Mean = {mean_post:.3f}')
axes[2].axvline(k_obs / n_obs, color='gray', ls=':', lw=1.5, label=f'MLE = {k_obs/n_obs:.2f}')
axes[2].legend(fontsize=8)

plt.suptitle('Bayesian inference: Prior × Likelihood → Posterior', fontsize=12,
             fontweight='bold', y=1.02)
plt.tight_layout()
plt.savefig('_static/13_bayes_intro.png', dpi=120, bbox_inches='tight')
plt.show()
```

## Bayes' theorem

Bayes' theorem is at the heart of all Bayesian inference:

$$\underbrace{P(\theta \mid \text{data})}_{\text{Posterior}} = \frac{\underbrace{P(\text{data} \mid \theta)}_{\text{Likelihood}} \times \underbrace{P(\theta)}_{\text{Prior}}}{\underbrace{P(\text{data})}_{\text{Evidence}}}$$

In proportional form, writing $\mathbf{y}$ for the data (the evidence does not depend on $\theta$, so it is often dropped as a normalizing constant):

$$P(\theta \mid \mathbf{y}) \propto P(\mathbf{y} \mid \theta) \times P(\theta)$$
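
The proportional form can be applied literally on a grid: multiply prior by likelihood pointwise, then divide by the numerical integral. The sketch below does this for the conversion-rate numbers used in the introduction (Beta(2, 8) prior, 15 successes out of 50) and compares the result with the closed-form Beta(17, 43) posterior. The grid size and the variable names are illustrative choices, not part of the original example.

```python
import numpy as np
from scipy import stats

# Grid of candidate values for theta (conversion rate)
theta = np.linspace(0.0, 1.0, 2001)
dtheta = theta[1] - theta[0]

# Prior Beta(2, 8) and binomial likelihood of 15 successes out of 50
prior = stats.beta.pdf(theta, 2, 8)
likelihood = stats.binom.pmf(15, 50, theta)

# Posterior up to a constant, then normalised with a Riemann sum
unnormalised = likelihood * prior
posterior_grid = unnormalised / (unnormalised.sum() * dtheta)

# Closed-form conjugate posterior for comparison: Beta(2 + 15, 8 + 35)
posterior_exact = stats.beta.pdf(theta, 17, 43)

max_abs_error = np.max(np.abs(posterior_grid - posterior_exact))
grid_mean = np.sum(theta * posterior_grid) * dtheta
print(f"max |grid - exact|    = {max_abs_error:.2e}")
print(f"posterior mean (grid) = {grid_mean:.4f}  (exact: {17 / 60:.4f})")
```

The evidence never has to be written down analytically: it is simply whatever constant makes the product integrate to one.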

### Role of each component

- **Prior $P(\theta)$**: encodes beliefs *before* the data are observed. It can be informative (domain expertise) or non-informative (maximal uncertainty).
- **Likelihood $P(\mathbf{y} \mid \theta)$**: quantifies how compatible the data are with each value of $\theta$.
- **Posterior $P(\theta \mid \mathbf{y})$**: the updated distribution after observation, that is, *our uncertainty about $\theta$ given the data*.

```{admonition} Informative vs non-informative priors
:class: note

A **non-informative** (or diffuse) **prior** lets the data speak: Beta(1,1), which is uniform on [0,1], or the Jeffreys prior. An **informative prior** builds in existing domain knowledge: if an earlier study estimated the rate at about 20%, one can use Beta(4,16), whose mean is 0.20 and whose weight is comparable to 20 prior observations.
```
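
To make the effect of the prior concrete, the sketch below updates three priors with the same data (15 conversions out of 50, as in the introduction) and prints the posterior mean and an equal-tailed 95% interval for each. The Jeffreys prior for a binomial proportion is Beta(0.5, 0.5); the dictionary layout and names are illustrative.

```python
from scipy import stats

k, n = 15, 50  # same data as the introductory example

priors = {
    "Uniform Beta(1, 1)": (1.0, 1.0),
    "Jeffreys Beta(0.5, 0.5)": (0.5, 0.5),
    "Informative Beta(4, 16)": (4.0, 16.0),
}

results = {}
for name, (a, b) in priors.items():
    # Conjugate update: Beta(a + k, b + n - k)
    post = stats.beta(a + k, b + n - k)
    low, high = post.ppf([0.025, 0.975])
    results[name] = (post.mean(), low, high)
    print(f"{name:<26} mean = {post.mean():.3f}, 95% interval = [{low:.3f}, {high:.3f}]")
```

The two diffuse priors give nearly the same answer, close to the MLE of 0.30, while the informative prior pulls the estimate toward 0.20.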

## Conjugate distributions

A family of priors $\mathcal{P}$ is **conjugate** to a likelihood $\mathcal{L}$ if the posterior belongs to the same family as the prior. Conjugacy makes the posterior available in exact closed form.

### Beta–Binomial conjugacy

**Model:**
- Prior: $\theta \sim \text{Beta}(\alpha, \beta)$
- Likelihood: $Y \mid \theta \sim \text{Binomial}(n, \theta)$
- **Posterior: $\theta \mid Y=k \sim \text{Beta}(\alpha + k, \beta + n - k)$**
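
The update rule follows in one line from the proportional form of Bayes' theorem, dropping every factor that does not depend on $\theta$:

$$P(\theta \mid Y=k) \propto \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{likelihood}} \times \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\text{prior}} = \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1}$$

The right-hand side is the kernel of a $\text{Beta}(\alpha + k, \beta + n - k)$ density, so $\alpha$ and $\beta$ can be read as prior counts of successes and failures.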

```{code-cell} python
def posterior_beta_binomial(alpha_prior, beta_prior, n_obs, k_obs):
    """Return the (alpha, beta) parameters of the Beta posterior."""
    return alpha_prior + k_obs, beta_prior + n_obs - k_obs

# Numerical example
print("Beta-Binomial conjugacy")
print("="*40)
print(f"Prior: Beta({2}, {8})")
print(f"  Prior mean     : {2/(2+8):.2f}")
print(f"  Prior variance : {2*8/((2+8)**2*(2+8+1)):.4f}")
print()

# Each (n, k) pair is a separate scenario analysed with the same prior
observations = [(10, 2), (20, 6), (50, 16), (200, 64)]
alpha_p, beta_p = 2, 8

for n, k in observations:
    a_post, b_post = posterior_beta_binomial(alpha_p, beta_p, n, k)
    print(f"Data: {k}/{n} (MLE = {k/n:.2f})")
    print(f"  → Posterior Beta({a_post}, {b_post})")
    print(f"     Posterior mean : {a_post/(a_post+b_post):.3f}")
    print(f"     Posterior sd   : {np.sqrt(a_post*b_post/((a_post+b_post)**2*(a_post+b_post+1))):.3f}")
    print()
```

### Normal–Normal conjugacy

For a Gaussian mean with known variance $\sigma^2$:

- Prior: $\mu \sim \mathcal{N}(\mu_0, \tau_0^2)$
- Likelihood: $Y_i \mid \mu \sim \mathcal{N}(\mu, \sigma^2)$

The posterior is:

$$\mu \mid \mathbf{y} \sim \mathcal{N}\left(\mu_n, \tau_n^2\right)$$

with:

$$\frac{1}{\tau_n^2} = \frac{1}{\tau_0^2} + \frac{n}{\sigma^2}, \qquad \mu_n = \tau_n^2\left(\frac{\mu_0}{\tau_0^2} + \frac{n\bar{y}}{\sigma^2}\right)$$
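
Substituting $\tau_n^2$ into the expression for $\mu_n$ shows that the posterior mean is a precision-weighted average of the prior mean and the sample mean:

$$\mu_n = w\,\mu_0 + (1 - w)\,\bar{y}, \qquad w = \frac{1/\tau_0^2}{1/\tau_0^2 + n/\sigma^2}$$

The weight $w$ on the prior shrinks toward 0 as $n$ grows. For instance, with $\tau_0^2 = 9$, $\sigma^2 = 4$ and $n = 25$, $w = \frac{1/9}{1/9 + 25/4} \approx 0.017$, so the data carry about 98% of the weight.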

```{code-cell} python
def posterior_normal_normal(mu0, tau0_sq, sigma_sq, data):
    """Normal-Normal posterior for the mean (known variance)."""
    n = len(data)
    y_bar = np.mean(data)
    tau_n_sq = 1 / (1/tau0_sq + n/sigma_sq)
    mu_n = tau_n_sq * (mu0/tau0_sq + n*y_bar/sigma_sq)
    return mu_n, tau_n_sq

np.random.seed(42)
# Scenario: estimate the mean duration of a treatment (days)
# Prior: mu0 = 10 days, tau0 = 3 (large prior uncertainty)
# Data: 25 patients, true mean = 7 days, sigma = 2 days
true_mu = 7.0
sigma = 2.0
n_patients = 25
data_durees = np.random.normal(true_mu, sigma, n_patients)

mu0, tau0_sq = 10.0, 9.0
mu_post, tau_post_sq = posterior_normal_normal(mu0, tau0_sq, sigma**2, data_durees)

print("Normal-Normal conjugacy (treatment duration)")
# 95% prior interval is mu0 ± 1.96 * tau0, i.e. about [4.1, 15.9]
print(f"Prior         : N({mu0}, {tau0_sq:.1f}) → μ in "
      f"[{mu0 - 1.96*np.sqrt(tau0_sq):.1f}, {mu0 + 1.96*np.sqrt(tau0_sq):.1f}] at 95%")
print(f"Data          : n={n_patients}, ȳ={data_durees.mean():.2f}")
print(f"Posterior     : N({mu_post:.3f}, {tau_post_sq:.4f})")
print(f"  posterior σ : {np.sqrt(tau_post_sq):.4f}")
print(f"  95% CrI     : [{mu_post - 1.96*np.sqrt(tau_post_sq):.2f}, "
      f"{mu_post + 1.96*np.sqrt(tau_post_sq):.2f}]")
```

### Gamma–Poisson conjugacy

For a Poisson rate $\lambda$, with the Gamma distribution parameterized by shape $\alpha$ and rate $\beta$ (so that $E[\lambda] = \alpha/\beta$):
- Prior: $\lambda \sim \text{Gamma}(\alpha, \beta)$
- Likelihood: $Y_i \mid \lambda \sim \text{Poisson}(\lambda)$
- **Posterior: $\lambda \mid \mathbf{y} \sim \text{Gamma}(\alpha + \sum y_i, \beta + n)$**

```{code-cell} python
def posterior_gamma_poisson(alpha_prior, beta_prior, data):
    """Gamma-Poisson posterior for the rate lambda (beta is a rate parameter)."""
    n = len(data)
    sum_y = np.sum(data)
    return alpha_prior + sum_y, beta_prior + n

np.random.seed(2024)
true_lambda = 3.5
data_counts = np.random.poisson(true_lambda, 30)

alpha_g, beta_g = 2.0, 1.0  # Gamma(2, 1) prior (shape, rate)
alpha_g_post, beta_g_post = posterior_gamma_poisson(alpha_g, beta_g, data_counts)

print("Gamma-Poisson conjugacy (event rate)")
print(f"Prior    : Gamma({alpha_g}, {beta_g}) → E[λ] = {alpha_g/beta_g:.1f}")
print(f"Data     : n={len(data_counts)}, Σy={data_counts.sum()}, ȳ={data_counts.mean():.2f}")
print(f"Posterior: Gamma({alpha_g_post}, {beta_g_post})")
print(f"  E[λ|y]  : {alpha_g_post/beta_g_post:.3f}")
print(f"  MAP     : {(alpha_g_post-1)/beta_g_post:.3f}")
```
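
One practical detail when going from the Gamma posterior to a credible interval: `scipy.stats.gamma` takes a shape `a` and a `scale`, where `scale = 1 / rate`. The sketch below shows the conversion on a small hard-coded sample. The counts are hypothetical and chosen only for illustration; they are not the simulated data from the cell above.

```python
import numpy as np
from scipy import stats

# Hypothetical event counts (illustrative only)
counts = np.array([3, 5, 2, 4, 4, 1, 6, 3])

alpha_prior, beta_prior = 2.0, 1.0           # Gamma(shape=2, rate=1) prior
alpha_post = alpha_prior + counts.sum()      # shape: 2 + 28 = 30
beta_post = beta_prior + len(counts)         # rate:  1 + 8  = 9

# SciPy expects scale = 1 / rate
posterior = stats.gamma(a=alpha_post, scale=1.0 / beta_post)
low, high = posterior.ppf([0.025, 0.975])

print(f"Posterior: Gamma(shape={alpha_post:.0f}, rate={beta_post:.0f})")
print(f"E[λ | y] = {posterior.mean():.3f}")
print(f"95% equal-tailed credible interval: [{low:.3f}, {high:.3f}]")
```

Passing the rate directly as `scale` is an easy mistake to make and silently yields a posterior centred on the wrong value.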

## Sequential updating

A fundamental property of Bayesian inference is that today's posterior becomes tomorrow's prior. Data can therefore arrive in batches without reprocessing everything from scratch.

```{code-cell} python
:tags: [hide-input]

fig, axes = plt.subplots(2, 3, figsize=(15, 9))

theta = np.linspace(0, 1, 500)

# Running example: conversion rate in an A/B test
# Each tuple is (visitors, conversions) for one batch; the first entry is the prior alone
batches = [
    (0, 0),   # prior
    (10, 2),  # batch 1
    (15, 4),  # batch 2
    (20, 7),  # batch 3
    (30, 9),  # batch 4
    (50, 18)  # batch 5
]

alpha_p, beta_p = 3, 12  # prior: ~20%

titles = ['Prior\nBeta(3, 12)', 'After 10 obs.\n(2 successes)',
          'After 25 obs.\n(6 successes)', 'After 45 obs.\n(13 successes)',
          'After 75 obs.\n(22 successes)', 'After 125 obs.\n(40 successes)']
colors_seq = sns.color_palette("viridis_r", 6)

cumul_n, cumul_k = 0, 0
alpha_curr, beta_curr = alpha_p, beta_p

for idx, (ax, (n_batch, k_batch), title, color) in enumerate(
        zip(axes.ravel(), batches, titles, colors_seq)):
    if idx > 0:
        cumul_n += n_batch
        cumul_k += k_batch
        alpha_curr = alpha_p + cumul_k
        beta_curr = beta_p + cumul_n - cumul_k
    else:
        alpha_curr, beta_curr = alpha_p, beta_p

    dist = stats.beta(alpha_curr, beta_curr)
    mean_c = dist.mean()
    ci_low, ci_high = dist.ppf(0.025), dist.ppf(0.975)

    ax.plot(theta, dist.pdf(theta), color=color, lw=2.5)
    ax.fill_between(theta, 0, dist.pdf(theta), alpha=0.25, color=color)
    ax.axvline(mean_c, color='black', ls='--', lw=1.5,
               label=f'Mean = {mean_c:.3f}')
    ax.fill_betweenx([0, dist.pdf(theta).max() * 0.8],
                      ci_low, ci_high, alpha=0.15, color='tomato',
                      label=f'95% CrI: [{ci_low:.2f},{ci_high:.2f}]')
    ax.set_xlim(0, 0.7)
    ax.set_title(title)
    ax.set_xlabel('θ')
    ax.legend(fontsize=7, loc='upper right')

plt.suptitle('Sequential updating: Beta-Binomial conjugacy\n'
             'Estimating a conversion rate', fontsize=12, fontweight='bold', y=1.01)
plt.tight_layout()
plt.savefig('_static/13_sequential.png', dpi=120, bbox_inches='tight')
plt.show()
```
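
The claim that batches can be processed one at a time is easy to check numerically. The minimal sketch below reuses the batch sizes and the Beta(3, 12) prior from the figure above and confirms that feeding each posterior back in as the next prior gives exactly the same parameters as a single update on the pooled data. Variable names are illustrative.

```python
# Batches of (visitors, conversions), as in the figure above
batches = [(10, 2), (15, 4), (20, 7), (30, 9), (50, 18)]
alpha0, beta0 = 3, 12

# Sequential: each posterior is reused as the prior for the next batch
alpha_seq, beta_seq = alpha0, beta0
for n_batch, k_batch in batches:
    alpha_seq += k_batch
    beta_seq += n_batch - k_batch

# One-shot: a single update on the pooled data
n_total = sum(n for n, _ in batches)
k_total = sum(k for _, k in batches)
alpha_all = alpha0 + k_total
beta_all = beta0 + n_total - k_total

print(f"Sequential : Beta({alpha_seq}, {beta_seq})")
print(f"All at once: Beta({alpha_all}, {beta_all})")
```

Both routes give Beta(43, 97): the order and grouping of the observations do not matter, only the totals.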

## Bayesian point estimation

Three natural point estimators can be read off the posterior:

- **MAP** (Maximum A Posteriori): the mode of the posterior. It coincides with the MLE under a flat prior; otherwise the prior acts as a regularizer on the MLE.
- **Posterior mean** (EAP, *Expected A Posteriori*): minimizes the posterior expected squared error.
- **Posterior median**: minimizes the posterior expected absolute error.

```{code-cell} python
# Comparing MAP, posterior mean and MLE
alpha_ex, beta_ex = 3, 12  # prior
n_ex, k_ex = 100, 35       # data

alpha_post_ex = alpha_ex + k_ex
beta_post_ex = beta_ex + n_ex - k_ex

mle = k_ex / n_ex
map_est = (alpha_post_ex - 1) / (alpha_post_ex + beta_post_ex - 2)
mean_post_ex = alpha_post_ex / (alpha_post_ex + beta_post_ex)
median_post_ex = stats.beta.ppf(0.5, alpha_post_ex, beta_post_ex)

print("Point estimators (n=100, k=35)")
print(f"  MLE              : {mle:.4f}")
print(f"  MAP              : {map_est:.4f}")
print(f"  Posterior mean   : {mean_post_ex:.4f}")
print(f"  Posterior median : {median_post_ex:.4f}")
print(f"\n  (The Beta(3,12) prior pulls toward 3/15 = {3/15:.2f})")
print(f"  The posterior mean is a weighted average of the MLE and the prior mean.")
```
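
The link between each estimator and its loss function can be checked by brute force. The sketch below draws samples from the Beta(38, 77) posterior of the example above, evaluates the Monte Carlo expected loss over a grid of candidate estimates, and reports which candidate minimizes each loss. The sample size, seed and grid are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
posterior = stats.beta(38, 77)  # Beta(3, 12) prior updated with 35/100
samples = posterior.rvs(size=100_000, random_state=rng)

# Candidate point estimates on a fine grid around the posterior bulk
candidates = np.linspace(0.25, 0.42, 341)

# Monte Carlo approximation of the posterior expected loss
sq_loss = [np.mean((samples - c) ** 2) for c in candidates]
abs_loss = [np.mean(np.abs(samples - c)) for c in candidates]

best_sq = candidates[np.argmin(sq_loss)]
best_abs = candidates[np.argmin(abs_loss)]

print(f"argmin squared loss  = {best_sq:.4f}  (posterior mean   = {posterior.mean():.4f})")
print(f"argmin absolute loss = {best_abs:.4f}  (posterior median = {posterior.median():.4f})")
```

Up to grid resolution and Monte Carlo noise, the minimizers land on the posterior mean and median respectively.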

## Credible intervals vs confidence intervals

This is one of the most important conceptual differences between the two approaches.

```{admonition} Fundamental difference
:class: important

**Confidence interval (frequentist)**: "If I repeat this experiment many times, 95% of the intervals constructed this way will contain the true value of θ." The parameter θ is fixed; it is the interval that is random.

**Credible interval (Bayesian)**: "Given the observed data, there is a 95% probability that θ lies in this interval." This is the intuitive reading that many people wrongly attach to the frequentist confidence interval.
```

```{code-cell} python
:tags: [hide-input]

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# Beta-Binomial posterior
theta = np.linspace(0, 1, 1000)
alpha_cred, beta_cred = 3 + 35, 12 + 65  # Beta(38, 77)
post_dist = stats.beta(alpha_cred, beta_cred)

# Equal-tailed 95% credible interval (2.5% and 97.5% quantiles).
# Note: this is not the HPD (Highest Posterior Density) interval, although
# the two are close here because the posterior is nearly symmetric.
ci_low_cred = post_dist.ppf(0.025)
ci_high_cred = post_dist.ppf(0.975)

ax = axes[0]
ax.plot(theta, post_dist.pdf(theta), 'darkgreen', lw=2.5, label='Posterior')
mask_cred = (theta >= ci_low_cred) & (theta <= ci_high_cred)
ax.fill_between(theta, 0, post_dist.pdf(theta), where=mask_cred,
                alpha=0.35, color='darkgreen', label=f'95% credible interval\n[{ci_low_cred:.3f}, {ci_high_cred:.3f}]')
ax.axvline(post_dist.mean(), color='black', ls='--', lw=1.5, label=f'Mean = {post_dist.mean():.3f}')
ax.set_xlabel('θ'); ax.set_ylabel('Density')
ax.set_title('Bayesian credible interval (95%)\n"P(θ ∈ CrI | data) = 0.95"')
ax.legend(fontsize=8)

# Frequentist confidence intervals (simulation, Wald interval p̂ ± 1.96·SE)
ax = axes[1]
np.random.seed(42)
n_simul = 50
true_theta = 0.35
n_sample = 100

ic_contain = 0
ys = []
lows, highs = [], []
for i in range(n_simul):
    k_sim = np.random.binomial(n_sample, true_theta)
    p_hat = k_sim / n_sample
    se = np.sqrt(p_hat * (1 - p_hat) / n_sample)
    low = p_hat - 1.96 * se
    high = p_hat + 1.96 * se
    lows.append(low); highs.append(high); ys.append(p_hat)
    if low <= true_theta <= high:
        ic_contain += 1

colors_ic = ['steelblue' if l <= true_theta <= h else 'tomato'
             for l, h in zip(lows, highs)]
for i, (y, l, h, c) in enumerate(zip(ys, lows, highs, colors_ic)):
    ax.plot([l, h], [i, i], color=c, lw=1.5, alpha=0.7)
    ax.scatter(y, i, color=c, s=20)
ax.axvline(true_theta, color='black', ls='--', lw=1.5, label=f'True value θ = {true_theta}')
ax.set_xlabel('θ'); ax.set_yticks([])
ax.set_title(f'Frequentist confidence intervals (95%)\n'
             f'{ic_contain}/{n_simul} contain θ (expected: ~{0.95 * n_simul:.1f})')
ax.legend(fontsize=8)

plt.tight_layout()
plt.savefig('_static/13_credible_vs_ci.png', dpi=120, bbox_inches='tight')
plt.show()
```
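
The interval drawn above uses the 2.5% and 97.5% posterior quantiles, which gives an equal-tailed interval. A highest posterior density (HPD) interval is instead the narrowest interval containing 95% of the mass. For a unimodal posterior, one possible way to obtain it is to search over the lower tail probability for the shortest interval, as in this sketch. The helper name `interval_width` is hypothetical, and the approach assumes unimodality.

```python
from scipy import stats
from scipy.optimize import minimize_scalar

posterior = stats.beta(38, 77)  # same posterior as in the figure above
mass = 0.95

def interval_width(lower_tail):
    """Width of the interval holding `mass` probability, starting at quantile `lower_tail`."""
    return posterior.ppf(lower_tail + mass) - posterior.ppf(lower_tail)

# The lower tail probability can range between 0 and 1 - mass
res = minimize_scalar(interval_width, bounds=(0.0, 0.05), method="bounded")
hpd_low, hpd_high = posterior.ppf(res.x), posterior.ppf(res.x + mass)

eq_low, eq_high = posterior.ppf(0.025), posterior.ppf(0.975)

print(f"Equal-tailed 95%: [{eq_low:.4f}, {eq_high:.4f}]  width = {eq_high - eq_low:.4f}")
print(f"HPD 95%         : [{hpd_low:.4f}, {hpd_high:.4f}]  width = {hpd_high - hpd_low:.4f}")
```

For a nearly symmetric posterior such as this one the two intervals are almost identical; they diverge more for strongly skewed posteriors, for example with very few observations or rates close to 0 or 1.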

## Bayes factor

The **Bayes factor** compares two models $M_1$ and $M_2$ through the ratio of their marginal likelihoods:

$$\text{BF}_{12} = \frac{P(\mathbf{y} \mid M_1)}{P(\mathbf{y} \mid M_2)} = \frac{\int P(\mathbf{y} \mid \theta_1, M_1) P(\theta_1 \mid M_1) \, d\theta_1}{\int P(\mathbf{y} \mid \theta_2, M_2) P(\theta_2 \mid M_2) \, d\theta_2}$$

A commonly used interpretation scale, adapted from Jeffreys (values below 1 favor $M_2$; read the scale on $1/\text{BF}_{12}$ in that case):

| BF₁₂ | Evidence in favor of M₁ |
|-------|------------------------|
| 1–3 | Anecdotal |
| 3–10 | Moderate |
| 10–30 | Strong |
| 30–100 | Very strong |
| > 100 | Decisive |

```{code-cell} python
# Closed-form Bayes factor for the Beta-Binomial model
from scipy.special import betaln, gammaln

def log_marginal_beta_binomial(k, n, alpha, beta):
    """Log marginal likelihood of k successes out of n under a Beta(alpha, beta) prior."""
    log_binom = gammaln(n+1) - gammaln(k+1) - gammaln(n-k+1)
    log_beta_prior = betaln(alpha, beta)
    log_beta_post = betaln(alpha + k, beta + n - k)
    return log_binom + log_beta_post - log_beta_prior

# Compare M1: theta ~ Beta(1,1) vs M2: theta ~ Beta(2,8)
k_obs_bf, n_obs_bf = 20, 50

log_m1 = log_marginal_beta_binomial(k_obs_bf, n_obs_bf, 1, 1)  # uniform prior
log_m2 = log_marginal_beta_binomial(k_obs_bf, n_obs_bf, 2, 8)  # prior concentrated near 20%

log_bf = log_m1 - log_m2
bf = np.exp(log_bf)

print(f"Bayes factor BF₁₂ = {bf:.3f}")
print(f"  M1: Beta(1,1) prior (uniform)")
print(f"  M2: Beta(2,8) prior (concentrated around 20%)")
if bf > 1:
    print(f"  → The data favor M1 (non-informative prior)")
else:
    print(f"  → The data favor M2 (informative prior)")
```
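
The closed-form marginal likelihood is just the integral in the Bayes factor definition evaluated analytically. As a sanity check, the sketch below recomputes it by numerical quadrature for the same two priors and the same data (20 successes out of 50). The function names are illustrative; under the uniform prior the marginal likelihood should equal $1/(n+1)$ for every $k$.

```python
import numpy as np
from scipy import integrate, stats
from scipy.special import betaln, gammaln

def log_marginal_closed_form(k, n, a, b):
    """Closed-form log marginal likelihood for the Beta-Binomial model."""
    log_binom = gammaln(n + 1) - gammaln(k + 1) - gammaln(n - k + 1)
    return log_binom + betaln(a + k, b + n - k) - betaln(a, b)

def marginal_by_quadrature(k, n, a, b):
    """Integrate likelihood x prior over theta in [0, 1]."""
    integrand = lambda t: stats.binom.pmf(k, n, t) * stats.beta.pdf(t, a, b)
    value, _ = integrate.quad(integrand, 0.0, 1.0)
    return value

k, n = 20, 50
results = {}
for a, b in [(1, 1), (2, 8)]:
    closed = np.exp(log_marginal_closed_form(k, n, a, b))
    numeric = marginal_by_quadrature(k, n, a, b)
    results[(a, b)] = (closed, numeric)
    print(f"Beta({a},{b}) prior: closed form = {closed:.6f}, quadrature = {numeric:.6f}")

bf_12 = results[(1, 1)][0] / results[(2, 8)][0]
print(f"BF12 = {bf_12:.3f}")
```

Working on the log scale, as the closed form does, matters for larger $n$, where the individual terms underflow or overflow long before their ratio does.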

## Full example: Bayesian A/B testing

```{code-cell} python
:tags: [hide-input]

# Bayesian A/B test: compare two conversion rates
np.random.seed(2024)

# Version A: 800 visitors, 180 conversions
# Version B: 800 visitors, 210 conversions
nA, kA = 800, 180
nB, kB = 800, 210

# Non-informative priors
alpha0, beta0 = 1, 1

alpha_A = alpha0 + kA
beta_A = beta0 + nA - kA
alpha_B = alpha0 + kB
beta_B = beta0 + nB - kB

theta_grid = np.linspace(0, 0.5, 1000)
post_A = stats.beta.pdf(theta_grid, alpha_A, beta_A)
post_B = stats.beta.pdf(theta_grid, alpha_B, beta_B)

# P(θ_B > θ_A) by Monte Carlo simulation
N_sim = 100000
samples_A = np.random.beta(alpha_A, beta_A, N_sim)
samples_B = np.random.beta(alpha_B, beta_B, N_sim)
prob_B_better = (samples_B > samples_A).mean()
lift = ((samples_B - samples_A) / samples_A)
lift_median = np.median(lift)
lift_ci = np.percentile(lift, [2.5, 97.5])

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

ax = axes[0]
ax.plot(theta_grid, post_A, 'steelblue', lw=2.5, label=f'Version A ({kA}/{nA} = {kA/nA:.1%})')
ax.plot(theta_grid, post_B, 'tomato', lw=2.5, label=f'Version B ({kB}/{nB} = {kB/nB:.1%})')
ax.fill_between(theta_grid, 0, post_A, alpha=0.2, color='steelblue')
ax.fill_between(theta_grid, 0, post_B, alpha=0.2, color='tomato')
ax.set_xlabel('Conversion rate θ'); ax.set_ylabel('Density')
ax.set_title(f'A/B posteriors\nP(θ_B > θ_A) = {prob_B_better:.1%}')
ax.legend(fontsize=9)

ax = axes[1]
lift_vals = np.linspace(-0.3, 0.5, 500)
# Kernel density estimate of the lift distribution
from scipy.stats import gaussian_kde
kde_lift = gaussian_kde(lift)
ax.plot(lift_vals, kde_lift(lift_vals), 'darkgreen', lw=2.5)
ax.fill_between(lift_vals, 0, kde_lift(lift_vals),
                where=(lift_vals >= 0), alpha=0.3, color='darkgreen',
                label=f'Lift > 0: {(lift > 0).mean():.1%}')
ax.fill_between(lift_vals, 0, kde_lift(lift_vals),
                where=(lift_vals < 0), alpha=0.3, color='tomato')
ax.axvline(lift_median, color='black', ls='--', lw=1.5,
           label=f'Median lift: {lift_median:.1%}')
ax.axvline(0, color='gray', ls=':', lw=1)
ax.set_xlabel('Relative lift (θ_B - θ_A) / θ_A')
ax.set_ylabel('Density')
ax.set_title(f'Lift distribution\n95% CrI: [{lift_ci[0]:.1%}, {lift_ci[1]:.1%}]')
ax.legend(fontsize=8)

plt.tight_layout()
plt.savefig('_static/13_ab_test.png', dpi=120, bbox_inches='tight')
plt.show()

print(f"P(Version B better than A) = {prob_B_better:.1%}")
print(f"Median lift = {lift_median:.1%}")
print(f"95% credible interval for the lift = [{lift_ci[0]:.1%}, {lift_ci[1]:.1%}]")
```

## Comparison with the frequentist approach

```{code-cell} python
from scipy.stats import chi2_contingency, norm

# Frequentist chi-squared test for the same A/B test
# (chi2_contingency applies Yates' continuity correction to 2x2 tables by default)
table = np.array([[kA, nA - kA], [kB, nB - kB]])
chi2, p_val, _, _ = chi2_contingency(table)

# Frequentist confidence interval for the difference in proportions
pA_hat = kA / nA
pB_hat = kB / nB
se_diff = np.sqrt(pA_hat*(1-pA_hat)/nA + pB_hat*(1-pB_hat)/nB)
diff = pB_hat - pA_hat
ci_freq = (diff - 1.96*se_diff, diff + 1.96*se_diff)

# Posterior draws of the absolute difference, so both approaches report the same quantity
diff_samples = samples_B - samples_A
diff_ci = np.percentile(diff_samples, [2.5, 97.5])

print("Frequentist vs Bayesian comparison: A/B test")
print("="*50)
print("\nFREQUENTIST approach:")
print(f"  Difference : {diff:.4f} ({diff:.1%})")
print(f"  95% CI     : [{ci_freq[0]:.4f}, {ci_freq[1]:.4f}]")
print(f"  χ² = {chi2:.3f}, p-value = {p_val:.4f}")
print(f"  Conclusion : {'Significant difference' if p_val < 0.05 else 'Not significant'}")

print("\nBAYESIAN approach (Beta(1,1) prior):")
print(f"  Median difference     : {np.median(diff_samples):.4f}")
print(f"  95% credible interval : [{diff_ci[0]:.4f}, {diff_ci[1]:.4f}]")
print(f"  Median relative lift  : {lift_median:.1%}")
print(f"  P(B>A) = {prob_B_better:.1%}")
print(f"  Conclusion : Version B is better with probability {prob_B_better:.1%}")
print("\n→ The Bayesian approach yields a direct probability statement about θ_B > θ_A.")
```

## Summary

```{admonition} Key points: Bayesian statistics
:class: note

- Bayes' theorem: **posterior ∝ likelihood × prior**. Beliefs are updated in light of the data.
- **Conjugacy** gives the posterior in exact closed form: Beta-Binomial, Normal-Normal, Gamma-Poisson.
- **Sequential updating** comes for free: the current posterior becomes the prior for the next batch of data.
- The **credible interval** is a direct probability statement about θ (unlike the frequentist confidence interval).
- The **Bayes factor** compares models and is not limited to testing a null hypothesis.
- Bayesian A/B testing provides P(B > A), which is more actionable than a p-value.
- When the posterior has no closed form, one turns to MCMC (next chapter).
```