# MCEM-MH Algorithm

- 14 mins

## 1 Summary

The main purpose of this paper Brooks S P, Morgan B J, Ridout M S, et al.(1997) is to consider the use of finite mixture models for six overdispersed data sets recording fetal control mortality in mouse litters. A standard approach are employed to describe the data by means of a beta-binomial model or to use quasi-likelihood methods. The major conclusion of this paper is that, while these approached may provide an acceptable desription of the main body of the data, they could conceal a more comlicated mixture structure, which in turn exerts influence on the beta-binomial model parameter estimates.

There is too much overdispersion in the data for a single binomial model to provide anything other than a crude description in all six cases. The beta-binomial and correlated-binomial models provide an adequate description when compared with the beta-correalated-binomial for AVSS data set. But for the other five data sets, the correlated-binomial model can be significantly improved by the beta-correlated-binomial model.

We have shown that it is important to proceed further than just using a standard model, such as the beta-binomial model, if the data set is sufficiently large to merit this. However, even for small data sets, a mixture of two binomial distributions may well provide a good model, especially if a biological basis can be found for the components of the mixture. The value of nonparametric mixture approach is that it places a limit on the number of terms that need to be considered for inclusion in a mixture. For the data sets we have considered, it has not been necessary to fit more than a two-point mixture.

## 4 MCEM-MH Method

The Monte Carlo E-M algorithm is the EM algorithm based on Monte Carlo approach where the expectation in the E-step is realized by Monte Carlo method. MCEM-MH is MCEM algorithm based on Metropolis-Hastings algorithm. M-H algorithm is used to sample from $f(\alpha \mid y)$ when it is extremely difficult to calculate the density function upto normalizing constants in practice.

• E-M algorithm

• Select starting values $\theta^{(0)} = (\mu^{(0)},\sigma^{(0)})$
• E-step: Calculate $Q(\theta \mid \theta^{(k)}) = E{log*f*(y,\alpha \mid \theta)\mid y,\theta^{(k)}}$
• M-step: Determine $\theta^{k+1} = ^{\arg\max}_{\theta} Q(\theta \mid \theta^{(k)})$
• Set k = k+1, and return to E-step until convergence is desired,
• e.g. Stop when $\sqrt{(\mu^{(k)} - \mu^{(k-1)})^{2} + (\sigma^{(k)} - \sigma^{(k-1)})^{2}} < 0.001$
• Note that, the expectation in the E-step cannot be computed in closed form. Here we use the M-H algorithm to construct Monte Carlo approximation for the E-step.
• MH algorithm

• Given a jumping distribution q(x,y) such that $q(x,\dot)$ is a pdf(or pmf) for every x and has symmetric properties, i.e. q(x,y) = q(y,x).
• Given the current value $X_{t} = x$, a candidate value for $X_{t+1}$, say y, is sampled from $q(x,\dot)$.
• A decision is then made on whether or not to “jump” to the candidate, based on acceptance function:$\gamma = \frac{f(y)q(y,x)}{f(x)q(x,y)}\wedge 1$
• Then $X_{t+1} = \begin{cases} y \quad \text{with prob. } \gamma \\ x \quad \text{with prob. } 1-\gamma\\ \end{cases}$
• MCEM-MH algorithm

• This generates a MC, with the transition kernel $K(x,y) = \alpha q(x,y) + 1 - p(x)\delta_{x}(y)$ where $p(x) = \int \alpha q(x,y) dy$

## 5 Analysis using MCMC  • We have $(\hat{\theta} - \theta) \dot{\sim} *N* (0,I^{-1}(\theta))$. Let $l_{i}(\theta) = log *f*(y_{i}\mid \theta)$ be the log-likelihood of $(y_{i1},y_{i2},…,y_{in_{i}})’$. Although I($\hat{\theta}$) estimates I(${\theta}$), I($\hat{\theta}$) can be difficult to evaluate directly. But we can use method to estimate I($\theta$).

$I(\theta) \approx I_{OC}(\theta) = - E[\frac{\partial^{2}}{\partial \theta \partial \theta '}log\ast f\ast (y, \alpha \mid \theta)\mid y, \theta]$ $= - E[\frac{\partial^{2}}{\partial \theta \partial \theta '}log\ast f\ast (y \mid \alpha \theta)\mid y, \theta ] - E[\frac{\partial^{2}}{\partial \theta \partial \theta '}log\ast f\ast (\alpha \mid \theta)\mid y, \theta ]$ $\approx C_{MH} + D_{MH}$ $\approx \hat{C}_{MH} + \hat{D}_{MH}$

where $C_{MH}$ and $D_{MH}$ are Monte Carlo approximations(via Metropolis-Hastings algorithm) of C and D, respectively, using the last set of samples $\alpha^{(d)}{i} \mid y{i}, \theta^{K-1}$ ‘s , $i = 1,…,m$, $d = 1,…,D$ , where K is the last iteration in the MCEM algorithm. Then

$\hat{C}_{\text{MH}} = C_{\text{MH}}\mid _{ \theta = \hat{\theta}}$ and

• To obtain the s.e. in a practical way, take the square roots of the inverse of the estimated $I(\theta)$.

## 6 Comparison

• The PQL and NI estimates are more stable, that is the estimates are fixed. But MCEM estimates vary each time we run the algorithm.

• The estimates of each method differ from each other.

• MCEM algorithm using MH took much longer than the other two methods to obtain the estimators.

• The PQL method might not be consistent. NI method can be consistent when nAGQ is large enough.

## Reference

Brooks S P, Morgan B J, Ridout M S, et al. Finite mixture models for proportions.[J]. Biometrics, 1997, 53(3):1097-115.

#Appendix