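This section gives a brief introduction to parameter estimation, focusing on the method of moments and maximum likelihood estimation.
Reference: 《概率论与数理统计》 (Probability Theory and Mathematical Statistics), Zhejiang University, 4th edition
Method of Moments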
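Let \(X_1,X_2,\dots,X_n\) be a sample from a random variable \(X\). If \(X\) is a discrete random variable with probability mass function \(P\{X=x\}=p(x;\theta_1,\dots,\theta_k)\), where \(\theta_1,\dots,\theta_k\) are the parameters to be estimated, then the \(m\)-th moment of the population \(X\) is
\[\mu_m(\theta_1,\dots,\theta_k)=\text{E}[X^m]=\sum_{x\in R_X}x^mp(x;\theta_1,\theta_2,\dots,\theta_k),\]
where \(R_X\) is the set of possible values of \(X\). If \(X\) is a continuous random variable with probability density \(f(x;\theta_1,\theta_2,\dots,\theta_k)\), then the \(m\)-th moment of the population \(X\) is
\[\mu_m(\theta_1,\dots,\theta_k)=\text{E}[X^m]=\int_{-\infty}^{\infty}x^mf(x;\theta_1,\theta_2,\dots,\theta_k)dx.\]
Suppose the first \(k\) moments of \(X\) exist. Since the sample moment \(A_m=\frac{1}{n}\sum_{i=1}^nX_i^m\) converges in probability to the corresponding population moment \(\mu_m\), the sample moments can be used as estimators of the population moments. For \(m=1,2,\dots,k\), this gives the system of equations
\[\begin{equation*} \left\{ \begin{array}{c} \mu_1(\theta_1,\dots,\theta_k)=A_1,\\ \mu_2(\theta_1,\dots,\theta_k)=A_2,\\ \vdots\\ \mu_k(\theta_1,\dots,\theta_k)=A_k \end{array} \right. \end{equation*}\]
Solving this system for \(\theta_1,\dots,\theta_k\) yields estimators \(\hat{\theta}_j=\hat{\theta}_j(A_1,\dots,A_k)\), \(j=1,\dots,k\), which are called moment estimators.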
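As an illustration of the procedure above, here is a minimal sketch that assumes a normal population \(N(\mu,\sigma^2)\); this choice of distribution is an assumption made for the example and is not taken from the text. In that case \(\mu_1=\mu\) and \(\mu_2=\sigma^2+\mu^2\), so setting \(\mu_1=A_1\) and \(\mu_2=A_2\) gives \(\hat{\mu}=A_1\) and \(\hat{\sigma}^2=A_2-A_1^2\). The function names `sample_moment` and `normal_moment_estimates` are hypothetical.

```python
import random


def sample_moment(xs, m):
    """m-th sample moment A_m = (1/n) * sum(x_i ** m)."""
    return sum(x ** m for x in xs) / len(xs)


def normal_moment_estimates(xs):
    """Moment estimates for an assumed N(mu, sigma^2) population.

    Population moments: mu_1 = mu, mu_2 = sigma^2 + mu^2.
    Solving mu_1 = A_1, mu_2 = A_2 gives
    mu_hat = A_1 and sigma2_hat = A_2 - A_1^2.
    """
    a1 = sample_moment(xs, 1)
    a2 = sample_moment(xs, 2)
    return a1, a2 - a1 ** 2


# Small hand-checkable sample: A_1 = 2.5, A_2 = 7.5.
print(normal_moment_estimates([1.0, 2.0, 3.0, 4.0]))  # → (2.5, 1.25)

# Simulated sample with true mu = 2 and sigma = 3 (so sigma^2 = 9);
# the estimates should land near these values for large n.
rng = random.Random(0)
data = [rng.gauss(2.0, 3.0) for _ in range(10_000)]
print(normal_moment_estimates(data))
```

Note that \(A_2-A_1^2\) divides by \(n\) rather than \(n-1\), so it is not the unbiased sample variance.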
Maximum Likelihood Estimation
Let \(X_1,X_2,\dots,X_n\) be a sample from a random variable \(X\). If \(X\) is a discrete random variable with probability mass function \(P\{X=x\}=p(x;\theta_1,\dots,\theta_k)\), then the joint probability mass function of \(X_1,X_2,\dots,X_n\) is
\[\prod_{i=1}^np(x_i;\theta_1,\dots,\theta_k).\]
Suppose the observed values of the sample \(X_1,X_2,\dots,X_n\) are \(x_1,x_2,\dots,x_n\). Then the probability of the event \(\{X_i=x_i,~i=1,\dots,n\}\) is
\[P\{X_i=x_i,~i=1,\dots,n\}=\prod_{i=1}^np(x_i;\theta_1,\dots,\theta_k)=L(\theta_1,\dots,\theta_k).\]
Since this is the event corresponding to the observed values and it has actually occurred, it is reasonable to regard its probability as relatively large. Based on this principle, we choose the parameters that maximize \(L(\theta_1,\dots,\theta_k)\), which is called the likelihood function of the sample. The maximum likelihood estimate is then
\[(\hat{\theta}_1,\dots,\hat{\theta}_k)=\arg\max_{\theta_1,\dots,\theta_k}\prod_{i=1}^np(x_i;\theta_1,\dots,\theta_k).\]
For a continuous random variable with probability density \(f(x;\theta_1,\theta_2,\dots,\theta_k)\), the likelihood function is \(L(\theta_1,\dots,\theta_k)=\prod_{i=1}^nf(x_i;\theta_1,\dots,\theta_k)\), and the maximum likelihood estimate is likewise
\[(\hat{\theta}_1,\dots,\hat{\theta}_k)=\arg\max_{\theta_1,\dots,\theta_k}\prod_{i=1}^nf(x_i;\theta_1,\dots,\theta_k).\]
To solve this optimization problem when \(p(x_i;\theta_1,\dots,\theta_k)\) or \(f(x_i;\theta_1,\dots,\theta_k)\) is differentiable with respect to each \(\theta_j\), the maximum likelihood estimate can usually be obtained from the system of likelihood equations
\[\begin{equation*} \left\{ \begin{array}{c} \frac{\partial}{\partial\theta_1}L(\theta_1,\dots,\theta_k)=0,\\ \frac{\partial}{\partial\theta_2}L(\theta_1,\dots,\theta_k)=0,\\ \vdots\\ \frac{\partial}{\partial\theta_k}L(\theta_1,\dots,\theta_k)=0 \end{array} \right. \end{equation*}\]
A solution of this system is a stationary point of \(L\), so it should still be checked that it is in fact a maximum. Because the logarithm is strictly increasing, \(L(\theta_1,\dots,\theta_k)\) and \(\log L(\theta_1,\dots,\theta_k)\) attain their maximum at the same point, so the maximum likelihood estimate can also be obtained by solving the following log-likelihood equations:
\[\begin{equation*} \left\{ \begin{array}{c} \frac{\partial}{\partial\theta_1}\log L(\theta_1,\dots,\theta_k)=\frac{\partial}{\partial\theta_1}\sum_{i=1}^n\log p(x_i;\theta_1,\dots,\theta_k)=0,\\ \frac{\partial}{\partial\theta_2}\log L(\theta_1,\dots,\theta_k)=\frac{\partial}{\partial\theta_2}\sum_{i=1}^n\log p(x_i;\theta_1,\dots,\theta_k)=0,\\ \vdots\\ \frac{\partial}{\partial\theta_k}\log L(\theta_1,\dots,\theta_k)=\frac{\partial}{\partial\theta_k}\sum_{i=1}^n\log p(x_i;\theta_1,\dots,\theta_k)=0 \end{array} \right.\qquad \text{(discrete random variable)} \end{equation*}\]
or
\[\begin{equation*} \left\{ \begin{array}{c} \frac{\partial}{\partial\theta_1}\log L(\theta_1,\dots,\theta_k)=\frac{\partial}{\partial\theta_1}\sum_{i=1}^n\log f(x_i;\theta_1,\dots,\theta_k)=0,\\ \frac{\partial}{\partial\theta_2}\log L(\theta_1,\dots,\theta_k)=\frac{\partial}{\partial\theta_2}\sum_{i=1}^n\log f(x_i;\theta_1,\dots,\theta_k)=0,\\ \vdots\\ \frac{\partial}{\partial\theta_k}\log L(\theta_1,\dots,\theta_k)=\frac{\partial}{\partial\theta_k}\sum_{i=1}^n\log f(x_i;\theta_1,\dots,\theta_k)=0 \end{array} \right.\qquad \text{(continuous random variable)} \end{equation*}\]
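To make the log-likelihood equation concrete, the sketch below assumes an exponential population with density \(f(x;\lambda)=\lambda e^{-\lambda x}\), \(x>0\); this distribution is chosen only for illustration and does not appear in the text. Here \(\log L(\lambda)=n\log\lambda-\lambda\sum_{i=1}^nx_i\), so \(\frac{d}{d\lambda}\log L=\frac{n}{\lambda}-\sum_{i=1}^nx_i=0\) gives \(\hat{\lambda}=n/\sum_{i=1}^nx_i\). The code compares this closed form with a direct numerical maximization of \(\log L\); all function names are hypothetical, and the golden-section search is just one simple choice of one-dimensional optimizer.

```python
import math


def exp_log_likelihood(lam, xs):
    """log L(lam) = n*log(lam) - lam*sum(x_i) for f(x; lam) = lam*exp(-lam*x)."""
    return len(xs) * math.log(lam) - lam * sum(xs)


def exp_mle_closed_form(xs):
    """Root of the log-likelihood equation n/lam - sum(x_i) = 0."""
    return len(xs) / sum(xs)


def maximize_1d(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)  # left interior point
        d = a + inv_phi * (b - a)  # right interior point
        if f(c) > f(d):
            b = d  # the maximum lies in [a, d]
        else:
            a = c  # the maximum lies in [c, b]
    return (a + b) / 2


xs = [0.5, 1.5, 2.0, 4.0]  # n = 4, sum = 8
lam_closed = exp_mle_closed_form(xs)
lam_numeric = maximize_1d(lambda lam: exp_log_likelihood(lam, xs), 1e-6, 10.0)
print(lam_closed)   # → 0.5
print(lam_numeric)  # close to 0.5, up to the accuracy of the search
```

The search is valid here because \(\log L(\lambda)\) is concave in \(\lambda\), so the stationary point is the unique maximum. For models without a closed-form solution, the same numerical approach applies to the log-likelihood, although a multi-parameter optimizer would then be needed.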