This section covers unbiased estimation, mean squared error, and the Cramér-Rao lower bound.
Reference: Statistical Inference, Casella and Berger, Duxbury, 2nd edition.
Unbiasedness
An estimator \(\hat{\theta}\) of a parameter \(\theta\) is unbiased if \[\text{E}[\hat{\theta}(X)]=\theta_0,\] where \(\theta_0\) denotes the true parameter value. The difference \(\text{Bias}_{\theta_0}[\hat{\theta}(X)]=\text{E}[\hat{\theta}(X)]-\theta_0\) is called the bias.
- 🌰Example 1. Let \(X_1,\dots,X_n\) follow a Bernoulli distribution with parameter \(\theta_0\). Both the maximum likelihood estimator and the method-of-moments estimator equal \(\hat{\theta}(X)=\frac{1}{n}\sum\limits_{i=1}^nX_i=\bar{X}\), which is an unbiased estimator (a simulation check follows the formula reminders below): \[\text{E}[\hat{\theta}(X)]=\text{E}\left[\frac{1}{n}\sum_{i=1}^nX_i\right]=\frac{1}{n}\text{E}\left[\sum_{i=1}^nX_i\right]=\frac{1}{n}\sum_{i=1}^n\text{E}\left[X_i\right]=\frac{1}{n} \cdot n\cdot\theta_0=\theta_0.\]
\(\text{E}[a\cdot X]=a\cdot\text{E}[X]\)
\(\text{E}[X+Y]=\text{E}[X]+\text{E}[Y]\)
\(\text{E}[X]=\int_{-\infty}^{\infty}x\cdot f(x)dx\)
\(\text{V}[X]=\text{E}[(X-\text{E}[X])^2]=\text{E}[X^2]-\text{E}^2[X]\)
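As a quick numerical sanity check of Example 1, here is a minimal Monte Carlo sketch (assuming NumPy; the variable names are illustrative) that draws many Bernoulli samples and compares the average of \(\bar{X}\) across replications with the true \(\theta_0\):

```python
import numpy as np

rng = np.random.default_rng(0)
theta0, n, reps = 0.3, 50, 100_000      # true parameter, sample size, replications

# Each row is one sample X_1, ..., X_n ~ Bernoulli(theta0).
samples = rng.binomial(1, theta0, size=(reps, n))
theta_hat = samples.mean(axis=1)        # the estimator: the sample mean

# If the estimator is unbiased, the replication average is close to theta0.
print(theta_hat.mean())                 # ~ 0.3
```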
Mean Squared Error
The mean squared error (MSE) of an estimator \(\hat{\theta}\) of the parameter \(\theta\) is a function of \(\theta_0\), defined as \(\text{E}_{\theta_0}[(\hat{\theta}(X)-\theta_0)^2]\).
- 🌰Example 2. Let \(X_1,\dots,X_n\) follow a Bernoulli distribution with parameter \(\theta_0\) and take the estimator \(\hat{\theta}(X)=\bar{X}\). Its mean squared error is \[\text{MSE}=\text{E}[(\hat{\theta}(X)-\theta_0)^2]=\text{E}[(\hat{\theta}(X)-\text{E}[\hat{\theta}(X)])^2]=\text{V}[\hat{\theta}(X)]=\frac{1}{n^2}\sum_{i=1}^n\text{V}[X_i]=\frac{1}{n^2}\cdot n\cdot \theta_0(1-\theta_0)=\frac{\theta_0(1-\theta_0)}{n},\] where the second equality uses the unbiasedness \(\text{E}[\hat{\theta}(X)]=\theta_0\) from Example 1. Note: as \(n\to\infty\) the MSE tends to 0, i.e., the estimator converges to the true value \(\theta_0\). (A numerical check follows the variance rules below.)
\(\text{V}[a\cdot X]=a^2\cdot \text{V}[X]\)
\(\text{V}[X+Y]=\text{V}[X]+\text{V}[Y]+2\cdot\text{Cov}[X,Y]\)
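Continuing the setting of Example 2, a small sketch (same illustrative Bernoulli setup as above) compares the empirical mean squared error of \(\bar{X}\) with the closed form \(\frac{\theta_0(1-\theta_0)}{n}\):

```python
import numpy as np

rng = np.random.default_rng(1)
theta0, n, reps = 0.3, 50, 100_000

samples = rng.binomial(1, theta0, size=(reps, n))
theta_hat = samples.mean(axis=1)

mse_empirical = np.mean((theta_hat - theta0) ** 2)
mse_theory = theta0 * (1 - theta0) / n   # theta0(1-theta0)/n from Example 2
print(mse_empirical, mse_theory)         # both ~ 0.0042
```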
The mean squared error can be split into two parts: \[\begin{align} \text{MSE}&=\text{E}[(\hat{\theta}(X)-\theta_0)^2]\\ &=\text{E}\left[\left(\hat{\theta}(X)-\text{E}[\hat{\theta}(X)]+\text{E}[\hat{\theta}(X)]-\theta_0\right)^2\right]\\ &=\text{E}\left[(\hat{\theta}(X)-\text{E}[\hat{\theta}(X)])^2+2(\hat{\theta}(X)-\text{E}[\hat{\theta}(X)])(\text{E}[\hat{\theta}(X)]-\theta_0)+(\text{E}[\hat{\theta}(X)]-\theta_0)^2\right]\\ &=\text{E}\left[(\hat{\theta}(X)-\text{E}[\hat{\theta}(X)])^2\right]+2(\text{E}[\hat{\theta}(X)]-\theta_0)\cdot\underbrace{\text{E}\left[\hat{\theta}(X)-\text{E}[\hat{\theta}(X)]\right]}_{=0}+\left(\text{E}[\hat{\theta}(X)]-\theta_0\right)^2\\ &=\text{V}[\hat{\theta}(X)]+\text{Bias}_{\theta_0}^2[\hat{\theta}(X)] \end{align}\] That is, the mean squared error equals the variance of the estimator plus the squared bias.
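To see both terms of the decomposition in action, one needs a biased estimator; the sketch below uses \(\hat{\theta}(X)=\frac{1}{n+1}\sum_{i=1}^nX_i\), a deliberately biased variant chosen here purely for illustration (it is not from the text), and checks that the empirical MSE matches variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(2)
theta0, n, reps = 0.3, 50, 200_000

samples = rng.binomial(1, theta0, size=(reps, n))
theta_hat = samples.sum(axis=1) / (n + 1)    # biased: E[theta_hat] = n*theta0/(n+1)

mse = np.mean((theta_hat - theta0) ** 2)
var = theta_hat.var()
bias_sq = (theta_hat.mean() - theta0) ** 2
print(mse, var + bias_sq)                    # agree up to Monte Carlo error
```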
The Cramér-Rao Lower Bound
If \(X_1,\dots,X_n\) are independent with common density \(f(x;\theta)\), then (under standard regularity conditions that allow differentiation under the integral sign) the variance of any unbiased estimator \(\hat{\theta}\) satisfies \[\text{V}[\hat{\theta}(X)]\ge\frac{1}{n\cdot I_{X_1}(\theta)},\] where \(I_{X_1}(\theta)\) is the Fisher information, defined as \[\begin{align} I_{X_1}(\theta)&=\text{E}\left[\left(\frac{\partial}{\partial\theta}l_{X_1}(\theta)\right)^2\right]:=\text{E}[l'_{X_1}(\theta)^2]\\ &=-\text{E}\left[\frac{\partial^2}{\partial\theta^2}l_{X_1}(\theta)\right]:=-\text{E}[l''_{X_1}(\theta)], \end{align}\] with \(l_{X_1}(\theta)=\log f_{X_1}(x;\theta)\) the log-likelihood of a single observation.
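For the Bernoulli model of the examples above, the Fisher information can be worked out symbolically; the sketch below (assuming SymPy) computes \(I_{X_1}(\theta)=-\text{E}[l''_{X_1}(\theta)]\) and recovers \(\frac{1}{\theta(1-\theta)}\), so the Cramér-Rao bound \(\frac{\theta(1-\theta)}{n}\) coincides with the MSE of \(\bar{X}\) from Example 2, i.e. the sample mean attains the bound:

```python
import sympy as sp

theta, x = sp.symbols('theta x', positive=True)

# Log-likelihood of one Bernoulli observation: f(x; theta) = theta^x * (1-theta)^(1-x)
l = x * sp.log(theta) + (1 - x) * sp.log(1 - theta)

# I(theta) = -E[l''(theta)], the expectation over x in {0, 1} with P(X=1) = theta.
l2 = sp.diff(l, theta, 2)
I = sp.simplify(-(theta * l2.subs(x, 1) + (1 - theta) * l2.subs(x, 0)))
print(I)   # equivalent to 1/(theta*(1 - theta))
```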
- Proof: Let \(Y=\hat{\theta}(X)\) and \(Z=l'_X(\theta)\). Since the correlation coefficient of \(Y\) and \(Z\) lies in \([-1,1]\), i.e. \[-1\le\frac{\text{Cov}[Y,Z]}{\sqrt{\text{V}[Y]\cdot\text{V}[Z]}}\le 1,\] it follows that \[\begin{equation} \text{V}[Y]=\text{V}[\hat{\theta}(X)]\ge\frac{\text{Cov}^2[Y,Z]}{\text{V}[Z]}. \end{equation}\] It remains to evaluate the numerator \(\text{Cov}^2[Y,Z]\) and the denominator \(\text{V}[Z]\). First, \(\text{V}[Z]\) can be written as \[\begin{align} \text{V}[Z]&=\text{V}[l'_X(\theta)]=\underbrace{\text{V}\left[\frac{\partial}{\partial\theta}\log f_X(x;\theta)\right]}_{\text{log-likelihood}}=\underbrace{\text{V}\left[\frac{\partial}{\partial\theta}\log\prod_{i=1}^nf_{X_i}(x;\theta)\right]}_{\log(ab)=\log a+\log b}\\ &=\underbrace{\text{V}\left[\sum_{i=1}^n\frac{\partial}{\partial\theta}\log f_{X_i}(x;\theta)\right]}_{\text{the }X_i\text{ are independent}}=\sum_{i=1}^n\text{V}\left[\frac{\partial}{\partial\theta}\log f_{X_i}(x;\theta)\right]\\ &=n\cdot\text{V}[l'_{X_1}(\theta)]=n\cdot\left\{\text{E}[l'_{X_1}(\theta)^2]-\text{E}^2[l'_{X_1}(\theta)]\right\}. \end{align}\] By the definition of expectation, \[\begin{align} &\text{E}[l'_{X_1}(\theta)]=\int_{-\infty}^{\infty}l'_{X_1}(\theta)f_{X_1}(x;\theta)dx=\int_{-\infty}^{\infty}\frac{\partial}{\partial\theta}\log f_{X_1}(x;\theta)\cdot f_{X_1}(x;\theta)dx\\ =&\int_{-\infty}^{\infty}\frac{\frac{\partial}{\partial\theta}f_{X_1}(x;\theta)}{f_{X_1}(x;\theta)}f_{X_1}(x;\theta)dx=\int_{-\infty}^{\infty}{\frac{\partial}{\partial\theta}f_{X_1}(x;\theta)}dx=\frac{\partial}{\partial\theta}\underbrace{\int_{-\infty}^{\infty}f_{X_1}(x;\theta)dx}_{\text{a density integrates to }1}=0, \end{align}\] so \(\text{V}[Z]=n\cdot\text{E}[l'_{X_1}(\theta)^2]=n\cdot I_{X_1}(\theta)\). By the definition of covariance, \[\begin{align} \text{Cov}[Y,Z]&=\text{Cov}[\hat{\theta}(X),l'_X(\theta)]=\text{E}[\hat{\theta}(X)\cdot l'_X(\theta)]-\text{E}[\hat{\theta}(X)]\cdot\text{E}[l'_X(\theta)]\\ &=\text{E}[\hat{\theta}(X)\cdot l'_X(\theta)]-\text{E}[\hat{\theta}(X)]\cdot n\cdot\underbrace{\text{E}[l'_{X_1}(\theta)]}_{0}=\text{E}[\hat{\theta}(X)\cdot l'_X(\theta)], \end{align}\] and again by definition, \[\begin{align} \text{E}[\hat{\theta}(X)\cdot l'_X(\theta)]=\int_{-\infty}^{\infty}\hat{\theta}(x)\frac{\frac{\partial}{\partial\theta}f_X(x;\theta)}{f_X(x;\theta)}f_X(x;\theta)dx=\frac{\partial}{\partial\theta}\int_{-\infty}^{\infty}\hat{\theta}(x)\cdot f_X(x;\theta)dx=\frac{\partial}{\partial\theta}\text{E}[\hat{\theta}(X)]=1, \end{align}\] where the last step uses unbiasedness, \(\text{E}[\hat{\theta}(X)]=\theta\). Hence \(\text{Cov}[Y,Z]=1\), and altogether \(\text{V}[\hat{\theta}(X)]\ge\left(n\cdot I_{X_1}(\theta)\right)^{-1}\), which completes the proof.
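The two ingredients the proof rests on, \(\text{E}[l'_{X_1}(\theta)]=0\) and \(\text{V}[l'_X(\theta)]=n\cdot I_{X_1}(\theta)\), can also be checked numerically; a minimal sketch in the same illustrative Bernoulli setup, where the per-observation score is \(l'(\theta)=\frac{x}{\theta}-\frac{1-x}{1-\theta}\):

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 0.3, 50, 200_000

samples = rng.binomial(1, theta0, size=(reps, n))

# Score of a single observation: x/theta - (1-x)/(1-theta).
score = samples / theta0 - (1 - samples) / (1 - theta0)
total_score = score.sum(axis=1)   # l'_X(theta): the scores add up by independence

print(score.mean())               # ~ 0, matching E[l'_{X_1}(theta)] = 0
print(total_score.var())          # ~ n/(theta0*(1-theta0)) = n * I(theta0), about 238.1
```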