
{{回归侧栏}}
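In [[统计学|statistics]], '''Bayesian multivariate linear regression''' is a [[贝叶斯推断|Bayesian]] approach to [[一般线性模型#多元线性回归|multivariate linear regression]], that is, [[线性回归|linear regression]] where the predicted outcome is a vector of correlated random variables rather than a single scalar random variable. A more general treatment of this approach can be found in the article on the [[最小均方误差|minimum mean square error]] estimator.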

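==Details==
Consider a regression problem where the dependent variable to be predicted is not a single real-valued scalar but an ''m''-dimensional vector of correlated real numbers. As in the standard regression setup, there are ''n'' observations, where each observation ''i'' consists of ''k''−1 explanatory variables, grouped into a ''k''-dimensional vector <math>\mathbf{x}_i</math> (a [[虚拟变量|dummy variable]] with value 1 is added to allow for an intercept coefficient). For each observation ''i'', this can be viewed as a set of ''m'' related regression problems: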
<math display="block">\begin{align}
y_{i,1} &= \mathbf{x}_i^\mathsf{T}\boldsymbol\beta_{1} + \epsilon_{i,1} \\
&\;\;\vdots \\
y_{i,m} &= \mathbf{x}_i^\mathsf{T}\boldsymbol\beta_{m} + \epsilon_{i,m}
\end{align}</math>
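where the errors <math>\{ \epsilon_{i,1}, \ldots, \epsilon_{i,m}\}</math> are all correlated. Equivalently, it can be viewed as a single regression problem where the outcome is a [[行向量|row vector]] <math>\mathbf{y}_i^\mathsf{T}</math> and the regression coefficient vectors are placed next to each other: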
<math display="block">\mathbf{y}_i^\mathsf{T} = \mathbf{x}_i^\mathsf{T}\mathbf{B} + \boldsymbol\epsilon_{i}^\mathsf{T}.</math>

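The coefficient matrix '''B''' is a <math>k \times m</math> matrix whose columns are the coefficient vectors <math>\boldsymbol\beta_1,\ldots,\boldsymbol\beta_m</math> of the individual regression problems, stacked horizontally: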
<math display="block">\mathbf{B} =
\begin{bmatrix}
\begin{pmatrix} \\ \boldsymbol\beta_1 \\ \\ \end{pmatrix}
\cdots
\begin{pmatrix} \\ \boldsymbol\beta_m \\ \\ \end{pmatrix}
\end{bmatrix}
=
\begin{bmatrix}
\begin{pmatrix}
\beta_{1,1} \\ \vdots \\ \beta_{k,1}
\end{pmatrix}
\cdots
\begin{pmatrix}
\beta_{1,m} \\ \vdots \\ \beta_{k,m}
\end{pmatrix}
\end{bmatrix}
.</math>

The noise vector <math>\boldsymbol\epsilon_{i}</math> for each observation ''i'' is jointly normal, so that the outcomes for a given observation are correlated:
<math display="block">\boldsymbol\epsilon_i \sim N(0, \boldsymbol\Sigma_{\epsilon}).</math>

The entire regression problem can be written in matrix form as
<math display="block">\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},</math>
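where '''Y''' and '''E''' are <math>n \times m</math> matrices. The [[设计矩阵|design matrix]] '''X''' is an <math>n \times k</math> matrix with the observations stacked vertically, as in the standard [[线性回归|linear regression]] setup: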
<math display="block">
 \mathbf{X} = \begin{bmatrix} \mathbf{x}^\mathsf{T}_1 \\ \mathbf{x}^\mathsf{T}_2 \\ \vdots \\ \mathbf{x}^\mathsf{T}_n \end{bmatrix}
 = \begin{bmatrix} x_{1,1} & \cdots & x_{1,k} \\
 x_{2,1} & \cdots & x_{2,k} \\
 \vdots & \ddots & \vdots \\
 x_{n,1} & \cdots & x_{n,k}
 \end{bmatrix}.
</math>
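The matrix form of the model can be illustrated with a small simulation. The following is a minimal sketch, assuming NumPy; the dimensions, the "true" coefficient matrix <code>B_true</code> and the error covariance <code>Sigma_true</code> are hypothetical values chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k, m = 200, 3, 2   # observations, regressors (incl. intercept), outcomes

# Design matrix X (n x k): the leading column of ones is the intercept dummy.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])

# Hypothetical coefficient matrix B (k x m); column j holds beta_j.
B_true = np.array([[ 1.0, -2.0],
                   [ 0.5,  0.0],
                   [-1.5,  3.0]])

# Hypothetical error covariance Sigma (m x m) with correlated outcomes.
Sigma_true = np.array([[1.0, 0.6],
                       [0.6, 2.0]])

# Each row of E is an independent draw from N(0, Sigma): errors are correlated
# across the m outcomes of one observation, independent across observations.
E = rng.multivariate_normal(np.zeros(m), Sigma_true, size=n)

# Y = X B + E, an n x m matrix whose i-th row is y_i^T.
Y = X @ B_true + E
```

Row ''i'' of <code>Y</code> is the row vector <math>\mathbf{y}_i^\mathsf{T}</math> from the single-observation equation above.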

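The classical, frequentist {{link-en|线性最小二乘|Linear least squares|linear least squares}} solution simply estimates the matrix of regression coefficients <math>\hat{\mathbf{B}}</math> using the [[摩尔－彭若斯广义逆|Moore-Penrose pseudoinverse]] (the closed form below assumes that '''X''' has full column rank):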
<math display="block"> \hat{\mathbf{B}} = (\mathbf{X}^\mathsf{T}\mathbf{X})^{-1}\mathbf{X}^\mathsf{T}\mathbf{Y}.</math>
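As a rough numerical illustration of this estimator (a sketch assuming NumPy, with randomly generated placeholder data), the normal-equations formula can be compared with the pseudoinverse and with a least-squares solver. It also shows that each column of <math>\hat{\mathbf{B}}</math> is the ordinary least-squares fit of the corresponding outcome taken on its own.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 50, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # placeholder design matrix
Y = rng.normal(size=(n, m))                                     # placeholder outcomes

# Normal-equations form (X^T X)^{-1} X^T Y; valid when X has full column rank.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Equivalent routes that do not form X^T X explicitly.
B_pinv = np.linalg.pinv(X) @ Y
B_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

# Column 0 of B_hat equals the least-squares fit of outcome 0 alone.
beta_first = np.linalg.lstsq(X, Y[:, 0], rcond=None)[0]
```

Solving the linear system with <code>np.linalg.solve</code> rather than inverting <math>\mathbf{X}^\mathsf{T}\mathbf{X}</math> explicitly is a common numerical choice, not something the derivation requires.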

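To obtain the Bayesian solution, we first specify the conditional likelihood and then find an appropriate conjugate prior. As in the univariate case of Bayesian linear regression, a natural conditional conjugate prior (which is scale-dependent) can be specified.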

The conditional likelihood can be written as<ref name="BSaM">Peter E. Rossi, Greg M. Allenby, Rob McCulloch. ''Bayesian Statistics and Marketing''. John Wiley & Sons, 2012, p. 32.</ref>
<math display="block">\rho(\mathbf{E}|\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-n/2} \exp\left(-\tfrac{1}{2} \operatorname{tr}\left(\mathbf{E}^\mathsf{T} \mathbf{E} \boldsymbol\Sigma_{\epsilon}^{-1}\right) \right) ,</math>
Writing the error <math>\mathbf{E}</math> in terms of <math>\mathbf{Y},\mathbf{X},\mathbf{B}</math> yields
<math display="block">\rho(\mathbf{Y}|\mathbf{X},\mathbf{B},\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-n/2} \exp(-\tfrac{1}{2} \operatorname{tr}((\mathbf{Y}-\mathbf{X} \mathbf{B})^\mathsf{T} (\mathbf{Y}-\mathbf{X} \mathbf{B}) \boldsymbol\Sigma_{\epsilon}^{-1} ) ) ,</math>
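The trace form of the likelihood is the product of the ''n'' row-wise multivariate normal densities. The sketch below (assuming NumPy; function names and data are illustrative) evaluates the log-likelihood both ways, including the normalizing constant that the proportionality sign above drops.

```python
import numpy as np

def log_likelihood(Y, X, B, Sigma):
    """Trace form: -n/2 log|Sigma| - 1/2 tr((Y-XB)^T (Y-XB) Sigma^{-1}) + const."""
    n, m = Y.shape
    R = Y - X @ B                             # n x m residual matrix E
    _, logdet = np.linalg.slogdet(Sigma)      # log|Sigma|; Sigma assumed positive definite
    quad = np.trace(R.T @ R @ np.linalg.inv(Sigma))
    return -0.5 * n * m * np.log(2 * np.pi) - 0.5 * n * logdet - 0.5 * quad

def log_likelihood_rowwise(Y, X, B, Sigma):
    """Sum of the log-densities of the n independent N(0, Sigma) error rows."""
    m = Y.shape[1]
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    total = 0.0
    for y_i, x_i in zip(Y, X):
        r = y_i - x_i @ B                     # residual of observation i (length m)
        total += -0.5 * (m * np.log(2 * np.pi) + logdet + r @ Sigma_inv @ r)
    return total

rng = np.random.default_rng(2)
n, k, m = 30, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = rng.normal(size=(n, m))
B = rng.normal(size=(k, m))
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

ll_trace = log_likelihood(Y, X, B, Sigma)
ll_rows = log_likelihood_rowwise(Y, X, B, Sigma)
```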

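We seek a natural conjugate prior: a joint density <math>\rho(\mathbf{B},\boldsymbol\Sigma_{\epsilon})</math> of the same functional form as the likelihood. Since the likelihood is quadratic in <math>\mathbf{B}</math>, we rewrite it so that it is normal in <math>(\mathbf{B}-\hat{\mathbf{B}})</math>, the deviation from the classical sample estimate.
Using the same technique as in {{link-en|贝叶斯线性回归|Bayesian linear regression|Bayesian linear regression}}, we decompose the exponential term using a matrix form of the sum-of-squares technique. Here, however, we also need matrix differential calculus (the [[克罗内克积|Kronecker product]] and [[向量化|vectorization]] transformations).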

First, applying the sum-of-squares decomposition gives a new expression for the likelihood:
<math display="block">\rho(\mathbf{Y}|\mathbf{X},\mathbf{B},\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-(n-k)/2} \exp(-\operatorname{tr}(\tfrac{1}{2}\mathbf{S}^\mathsf{T} \mathbf{S} \boldsymbol\Sigma_{\epsilon}^{-1}))  
|\boldsymbol\Sigma_{\epsilon}|^{-k/2} \exp(-\tfrac{1}{2} \operatorname{tr}((\mathbf{B}-\hat{\mathbf{B}})^\mathsf{T} \mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B}-\hat{\mathbf{B}}) \boldsymbol\Sigma_{\epsilon}^{-1} ) )
,</math>
<math display="block">\mathbf{S} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}</math>
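The decomposition rests on the fact that the least-squares residuals are orthogonal to the columns of '''X''', so the cross terms vanish. A quick numerical check, as a sketch assuming NumPy and arbitrary placeholder data:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, m = 40, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = rng.normal(size=(n, m))
B = rng.normal(size=(k, m))        # an arbitrary coefficient matrix

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)   # classical estimate
S = Y - X @ B_hat                           # residual matrix S = Y - X B_hat
D = B - B_hat                               # deviation from the classical estimate

# Left side: (Y - XB)^T (Y - XB)
lhs = (Y - X @ B).T @ (Y - X @ B)

# Right side: S^T S + (B - B_hat)^T X^T X (B - B_hat)
rhs = S.T @ S + D.T @ (X.T @ X) @ D

# The cross term X^T S is zero up to rounding, which is why the split is exact.
cross = X.T @ S
```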

We would like to develop a conditional form for the prior:
<math display="block">\rho(\mathbf{B},\boldsymbol\Sigma_{\epsilon}) = \rho(\boldsymbol\Sigma_{\epsilon})\rho(\mathbf{B}|\boldsymbol\Sigma_{\epsilon}),</math>
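where <math>\rho(\boldsymbol\Sigma_{\epsilon})</math> is an [[逆威沙特分布|inverse-Wishart distribution]] and <math>\rho(\mathbf{B}|\boldsymbol\Sigma_{\epsilon})</math> is some form of [[正态分布|normal distribution]] in the matrix <math>\mathbf{B}</math>. This is accomplished using the [[向量化|vectorization]] transformation, which converts the likelihood from a function of the matrices <math>\mathbf{B}, \hat{\mathbf{B}}</math> to a function of the vectors <math>\boldsymbol\beta = \operatorname{vec}(\mathbf{B}), \hat{\boldsymbol\beta} = \operatorname{vec}(\hat{\mathbf{B}})</math>.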

<math display="block">\operatorname{tr}((\mathbf{B} - \hat{\mathbf{B}})^\mathsf{T}\mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B} - \hat{\mathbf{B}}) \boldsymbol\Sigma_\epsilon^{-1}) = \operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}})^\mathsf{T} \operatorname{vec}(\mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B} - \hat{\mathbf{B}}) \boldsymbol\Sigma_{\epsilon}^{-1} )</math>

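By the identity <math>\operatorname{vec}(\mathbf{A}\mathbf{Z}\mathbf{C}) = (\mathbf{C}^\mathsf{T} \otimes \mathbf{A})\operatorname{vec}(\mathbf{Z})</math> and the symmetry of <math>\boldsymbol\Sigma_{\epsilon}</math>, we have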
<math display="block"> \operatorname{vec}(\mathbf{X}^\mathsf{T} \mathbf{X}(\mathbf{B} - \hat{\mathbf{B}}) \boldsymbol\Sigma_{\epsilon}^{-1} ) =  (\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^\mathsf{T}\mathbf{X} )\operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}}), </math>
where <math>\mathbf{A} \otimes \mathbf{B}</math> denotes the [[克罗内克积|Kronecker product]] of matrices '''A''' and '''B''', a generalization of the [[外积|outer product]].

Then
<math display="block">\begin{align}
&\operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}})^\mathsf{T} (\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^\mathsf{T}\mathbf{X} )\operatorname{vec}(\mathbf{B} - \hat{\mathbf{B}}) \\
&= (\boldsymbol\beta - \hat{\boldsymbol\beta})^\mathsf{T}(\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^\mathsf{T}\mathbf{X} )(\boldsymbol\beta-\hat{\boldsymbol\beta})
\end{align}</math>
which leads to a likelihood that is normal in <math>(\boldsymbol\beta - \hat{\boldsymbol\beta})</math>.
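The vectorization step can be checked numerically. The sketch below assumes NumPy, takes <math>\operatorname{vec}</math> to be column stacking (Fortran order), and uses random symmetric positive-definite stand-ins for <math>\mathbf{X}^\mathsf{T}\mathbf{X}</math> and <math>\boldsymbol\Sigma_{\epsilon}</math>.

```python
import numpy as np

def vec(M):
    """Stack the columns of M into one long vector (column-major order)."""
    return M.reshape(-1, order="F")

rng = np.random.default_rng(4)
k, m = 3, 2

G = rng.normal(size=(k + 2, k))
XtX = G.T @ G                      # stand-in for X^T X (k x k, symmetric)
H = rng.normal(size=(m + 2, m))
Sigma = H.T @ H                    # stand-in for Sigma_eps (m x m, symmetric)
Sigma_inv = np.linalg.inv(Sigma)

D = rng.normal(size=(k, m))        # plays the role of B - B_hat

# Trace form of the quadratic term in the exponent.
trace_form = np.trace(D.T @ XtX @ D @ Sigma_inv)

# Kronecker form: vec(D)^T (Sigma^{-1} kron X^T X) vec(D).
K = np.kron(Sigma_inv, XtX)        # (k*m) x (k*m) matrix
quad_form = vec(D) @ K @ vec(D)

# Intermediate identity: vec(X^T X D Sigma^{-1}) = (Sigma^{-1} kron X^T X) vec(D).
lhs_vec = vec(XtX @ D @ Sigma_inv)
rhs_vec = K @ vec(D)
```

The Kronecker matrix has size <math>km \times km</math>, so in practice the trace form is usually cheaper to evaluate; the vectorized form mainly serves to expose the normal kernel in <math>\boldsymbol\beta</math>.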

With the likelihood in a more tractable form, we can now find a natural (conditional) conjugate prior.

===Conjugate prior distribution===
The natural conjugate prior in terms of the vectorized variable <math>\boldsymbol\beta</math> has the form<ref name="BSaM" />
<math display="block">\rho(\boldsymbol\beta, \boldsymbol\Sigma_{\epsilon}) = \rho(\boldsymbol\Sigma_{\epsilon})\rho(\boldsymbol\beta|\boldsymbol\Sigma_{\epsilon}),</math>
where
<math display="block"> \rho(\boldsymbol\Sigma_{\epsilon}) \sim \mathcal{W}^{-1}(\mathbf V_0,\boldsymbol\nu_0)</math>

<math display="block"> \rho(\boldsymbol\beta|\boldsymbol\Sigma_{\epsilon}) \sim N(\boldsymbol\beta_0, \boldsymbol\Sigma_{\epsilon} \otimes \boldsymbol\Lambda_0^{-1}).</math>
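The Kronecker covariance <math>\boldsymbol\Sigma_{\epsilon} \otimes \boldsymbol\Lambda_0^{-1}</math> means that, given <math>\boldsymbol\Sigma_{\epsilon}</math>, the coefficient matrix can be drawn without ever forming the <math>km \times km</math> covariance. The following sketch (assuming NumPy; the helper name <code>sample_B_given_Sigma</code> and all hyperparameter values are hypothetical) draws <math>\mathbf{B}</math> from two small Cholesky factors and verifies that the implied covariance of <math>\operatorname{vec}(\mathbf{B})</math> is the Kronecker product.

```python
import numpy as np

def sample_B_given_Sigma(B0, Lambda0, Sigma, rng):
    """Draw B with vec(B) ~ N(vec(B0), Sigma kron Lambda0^{-1})."""
    L_row = np.linalg.cholesky(np.linalg.inv(Lambda0))  # L_row L_row^T = Lambda0^{-1}
    L_col = np.linalg.cholesky(Sigma)                   # L_col L_col^T = Sigma
    Z = rng.standard_normal(B0.shape)                   # k x m i.i.d. N(0, 1) entries
    return B0 + L_row @ Z @ L_col.T

rng = np.random.default_rng(5)
k, m = 3, 2
B0 = np.zeros((k, m))                 # hypothetical prior mean
Lambda0 = np.diag([1.0, 2.0, 4.0])    # hypothetical prior precision (k x k)
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])        # a fixed value of Sigma_eps (m x m)

B_draw = sample_B_given_Sigma(B0, Lambda0, Sigma, rng)

# Deterministic check: vec(L_row Z L_col^T) = (L_col kron L_row) vec(Z),
# so the covariance of vec(B) is (L_col kron L_row)(L_col kron L_row)^T.
L_row = np.linalg.cholesky(np.linalg.inv(Lambda0))
L_col = np.linalg.cholesky(Sigma)
T = np.kron(L_col, L_row)
cov_vec = T @ T.T
target = np.kron(Sigma, np.linalg.inv(Lambda0))
```

A full prior draw would first sample <math>\boldsymbol\Sigma_{\epsilon}</math> from the inverse-Wishart distribution, for example with <code>scipy.stats.invwishart</code> if SciPy is available; that step is left out here to keep the sketch NumPy-only.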

===Posterior distribution===
Using the above prior and likelihood, the posterior distribution can be expressed as<ref name="BSaM" />
<math display="block">\begin{align}
\rho(\boldsymbol\beta,\boldsymbol\Sigma_{\epsilon}|\mathbf{Y},\mathbf{X})
\propto{}& |\boldsymbol\Sigma_{\epsilon}|^{-(\boldsymbol\nu_0 + m + 1)/2}\exp{(-\tfrac{1}{2}\operatorname{tr}(\mathbf V_0 \boldsymbol\Sigma_{\epsilon}^{-1}))} \\
&\times|\boldsymbol\Sigma_{\epsilon}|^{-k/2}\exp{(-\tfrac{1}{2} \operatorname{tr}((\mathbf{B}-\mathbf B_0)^\mathsf{T}\boldsymbol\Lambda_0(\mathbf{B}-\mathbf B_0)\boldsymbol\Sigma_{\epsilon}^{-1}))} \\
&\times|\boldsymbol\Sigma_{\epsilon}|^{-n/2}\exp{(-\tfrac{1}{2}\operatorname{tr}((\mathbf{Y}-\mathbf{XB})^\mathsf{T}(\mathbf{Y}-\mathbf{XB})\boldsymbol\Sigma_{\epsilon}^{-1}))},
\end{align}</math>
where <math>\operatorname{vec}(\mathbf B_0) = \boldsymbol\beta_0</math>.
Writing <math>\boldsymbol\Lambda_0 = \mathbf{U}^\mathsf{T}\mathbf{U}</math>, the terms involving <math>\mathbf{B}</math> can be grouped as
<math display="block">\begin{align}
& \left(\mathbf{B} - \mathbf B_0\right)^\mathsf{T} \boldsymbol\Lambda_0 \left(\mathbf{B} - \mathbf B_0\right) + \left(\mathbf{Y} - \mathbf{XB}\right)^\mathsf{T} \left(\mathbf{Y} - \mathbf{XB}\right) \\
={}& \left(\begin{bmatrix}\mathbf Y \\ \mathbf U \mathbf B_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}\right)^\mathsf{T} \left(\begin{bmatrix}\mathbf{Y}\\ \mathbf U \mathbf B_0\end{bmatrix}-\begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf{B}\right) \\
={}& \left(\begin{bmatrix}\mathbf Y \\ \mathbf U \mathbf B_0\end{bmatrix} - \begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf B_n\right)^\mathsf{T}\left(\begin{bmatrix}\mathbf{Y}\\ \mathbf U \mathbf B_0\end{bmatrix}-\begin{bmatrix}\mathbf{X}\\ \mathbf{U}\end{bmatrix}\mathbf B_n\right) + \left(\mathbf B - \mathbf B_n\right)^\mathsf{T} \left(\mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0\right) \left(\mathbf{B}-\mathbf B_n\right) \\
={}& \left(\mathbf{Y} - \mathbf X \mathbf B_n \right)^\mathsf{T} \left(\mathbf{Y} - \mathbf X \mathbf B_n\right) + \left(\mathbf B_0 - \mathbf B_n\right)^\mathsf{T} \boldsymbol\Lambda_0 \left(\mathbf B_0 - \mathbf B_n\right) + \left(\mathbf{B} - \mathbf B_n\right)^\mathsf{T} \left(\mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0\right)\left(\mathbf B - \mathbf B_n\right),
\end{align}</math>
where
<math display="block">\mathbf B_n = \left(\mathbf{X}^\mathsf{T}\mathbf{X} + \boldsymbol\Lambda_0\right)^{-1}\left(\mathbf{X}^\mathsf{T} \mathbf{X} \hat{\mathbf{B}} + \boldsymbol\Lambda_0\mathbf B_0\right) = \left(\mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0\right)^{-1}\left(\mathbf{X}^\mathsf{T} \mathbf{Y} + \boldsymbol\Lambda_0 \mathbf B_0\right).</math>
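The grouping above is a completing-the-square argument: the prior acts like <math>k</math> extra pseudo-observations <math>(\mathbf{U}, \mathbf{U}\mathbf{B}_0)</math> appended to the data. A numerical check of both the identity and the augmented-regression reading, as a sketch assuming NumPy and random placeholder values:

```python
import numpy as np

rng = np.random.default_rng(6)
n, k, m = 40, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = rng.normal(size=(n, m))

B0 = rng.normal(size=(k, m))       # placeholder prior mean
U = rng.normal(size=(k, k))
Lambda0 = U.T @ U                  # prior precision, Lambda0 = U^T U
B = rng.normal(size=(k, m))        # arbitrary point at which to test the identity

# Posterior mean B_n = (X^T X + Lambda0)^{-1} (X^T Y + Lambda0 B0).
Lambda_n = X.T @ X + Lambda0
B_n = np.linalg.solve(Lambda_n, X.T @ Y + Lambda0 @ B0)

# First line of the derivation.
lhs = (B - B0).T @ Lambda0 @ (B - B0) + (Y - X @ B).T @ (Y - X @ B)

# Last line of the derivation.
rhs = ((Y - X @ B_n).T @ (Y - X @ B_n)
       + (B0 - B_n).T @ Lambda0 @ (B0 - B_n)
       + (B - B_n).T @ Lambda_n @ (B - B_n))

# Augmented regression: least squares of [Y; U B0] on [X; U] also yields B_n.
X_aug = np.vstack([X, U])
Y_aug = np.vstack([Y, U @ B0])
B_n_aug = np.linalg.lstsq(X_aug, Y_aug, rcond=None)[0]
```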

The posterior can now be written in a more useful form:
<math display="block">\begin{align}
\rho(\boldsymbol\beta,\boldsymbol\Sigma_{\epsilon}|\mathbf{Y},\mathbf{X})
\propto{}&|\boldsymbol\Sigma_{\epsilon}|^{-(\boldsymbol\nu_0 + m + n + 1)/2}\exp{(-\tfrac{1}{2}\operatorname{tr}((\mathbf V_0 + (\mathbf{Y}-\mathbf{X}\mathbf{B}_n)^\mathsf{T} (\mathbf{Y}-\mathbf{X}\mathbf{B}_n) + (\mathbf B_n-\mathbf B_0)^\mathsf{T}\boldsymbol\Lambda_0(\mathbf B_n-\mathbf B_0))\boldsymbol\Sigma_{\epsilon}^{-1}))} \\
&\times|\boldsymbol\Sigma_{\epsilon}|^{-k/2}\exp{(-\tfrac{1}{2}\operatorname{tr}((\mathbf{B}-\mathbf B_n)^\mathsf{T} (\mathbf{X}^\mathsf{T}\mathbf{X} + \boldsymbol\Lambda_0) (\mathbf{B}-\mathbf B_n)\boldsymbol\Sigma_{\epsilon}^{-1}))}.
\end{align}</math>

This takes the form of an [[逆威沙特分布|inverse-Wishart distribution]] times a [[矩阵正态分布|matrix normal distribution]]:
<math display="block">\rho(\boldsymbol\Sigma_{\epsilon}|\mathbf{Y},\mathbf{X}) \sim \mathcal{W}^{-1}(\mathbf V_n,\boldsymbol\nu_n)</math>
<math display="block"> \rho(\mathbf{B}|\mathbf{Y},\mathbf{X},\boldsymbol\Sigma_{\epsilon}) \sim \mathcal{MN}_{k,m}(\mathbf B_n, \boldsymbol\Lambda_n^{-1}, \boldsymbol\Sigma_{\epsilon}).</math>

The parameters of this posterior are given by
<math display="block">\mathbf V_n = \mathbf V_0 + (\mathbf{Y}-\mathbf{X}\mathbf{B}_n)^\mathsf{T}(\mathbf{Y}-\mathbf{X}\mathbf{B}_n) + (\mathbf B_n - \mathbf B_0)^\mathsf{T}\boldsymbol\Lambda_0(\mathbf B_n-\mathbf B_0)</math>
<math display="block">\boldsymbol\nu_n = \boldsymbol\nu_0 + n</math>
<math display="block">\mathbf B_n = (\mathbf{X}^\mathsf{T}\mathbf{X} + \boldsymbol\Lambda_0)^{-1}(\mathbf{X}^\mathsf{T} \mathbf{Y} + \boldsymbol\Lambda_0\mathbf B_0)</math>
<math display="block">\boldsymbol\Lambda_n = \mathbf{X}^\mathsf{T} \mathbf{X} + \boldsymbol\Lambda_0</math>
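These four update equations translate directly into code. The following is a minimal sketch, assuming NumPy; the function name <code>posterior_params</code>, the simulated data and the deliberately weak prior are hypothetical choices made for illustration.

```python
import numpy as np

def posterior_params(X, Y, B0, Lambda0, V0, nu0):
    """Return (B_n, Lambda_n, V_n, nu_n) of the conjugate posterior."""
    n = X.shape[0]
    Lambda_n = X.T @ X + Lambda0
    B_n = np.linalg.solve(Lambda_n, X.T @ Y + Lambda0 @ B0)
    R = Y - X @ B_n                    # residuals at the posterior mean
    D = B_n - B0                       # shift of the mean away from the prior
    V_n = V0 + R.T @ R + D.T @ Lambda0 @ D
    nu_n = nu0 + n
    return B_n, Lambda_n, V_n, nu_n

# Simulated data with hypothetical true parameters.
rng = np.random.default_rng(7)
n, k, m = 100, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
B_true = np.array([[1.0, -2.0], [0.5, 0.0], [-1.5, 3.0]])
Sigma_true = np.array([[1.0, 0.6], [0.6, 2.0]])
Y = X @ B_true + rng.multivariate_normal(np.zeros(m), Sigma_true, size=n)

# A very weak prior: near-zero precision on B, minimal degrees of freedom.
B0 = np.zeros((k, m))
Lambda0 = 1e-8 * np.eye(k)
V0 = np.eye(m)
nu0 = m + 2

B_n, Lambda_n, V_n, nu_n = posterior_params(X, Y, B0, Lambda0, V0, nu0)

# With such a weak prior the posterior mean is essentially the classical estimate.
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```

As <math>\boldsymbol\Lambda_0</math> grows relative to <math>\mathbf{X}^\mathsf{T}\mathbf{X}</math>, <math>\mathbf{B}_n</math> moves from <math>\hat{\mathbf{B}}</math> toward the prior mean <math>\mathbf{B}_0</math>, which is the matrix-weighted average visible in the first expression for <math>\mathbf{B}_n</math>.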

==See also==
* {{link-en|贝叶斯线性回归|Bayesian linear regression|Bayesian linear regression}}
* [[矩阵正态分布|Matrix normal distribution]]

==References==
{{Reflist}}
* {{cite book |authorlink= George E. P. Box |last= Box |first= G. E. P. |author2-link=George Tiao |last2= Tiao |first2= G. C. |year= 1973 |title= Bayesian Inference in Statistical Analysis |chapter= 8 |publisher= Wiley |isbn= 0-471-57428-7 }}
* {{cite journal|last= Geisser|first= S. |year= 1965 |title= Bayesian Estimation in Multivariate Analysis |url= https://archive.org/details/sim_annals-of-mathematical-statistics_1965-02_36_1/page/150|journal= [[The Annals of Mathematical Statistics]] |volume= 36 |issue= 1 |pages= 150&ndash;159 |jstor= 2238083}}
* {{cite journal|last= Tiao |first= G. C. |last2= Zellner |first2= A. |year= 1964 |title= On the Bayesian Estimation of Multivariate Regression |journal= [[Journal of the Royal Statistical Society. Series B (Methodological)]] |volume= 26 |issue= 2 |pages= 277&ndash;285 |jstor= 2984424}}

{{DEFAULTSORT:Bayesian Multivariate Linear Regression}}
[[Category:贝叶斯推断]]