MCEMMH Algorithm
 13 mins1 Summary
The main purpose of this paper Brooks S P, Morgan B J, Ridout M S, et al.(1997) is to consider the use of finite mixture models for six overdispersed data sets recording fetal control mortality in mouse litters. A standard approach are employed to describe the data by means of a betabinomial model or to use quasilikelihood methods. The major conclusion of this paper is that, while these approached may provide an acceptable desription of the main body of the data, they could conceal a more comlicated mixture structure, which in turn exerts influence on the betabinomial model parameter estimates.
There is too much overdispersion in the data for a single binomial model to provide anything other than a crude description in all six cases. The betabinomial and correlatedbinomial models provide an adequate description when compared with the betacorrealatedbinomial for AVSS data set. But for the other five data sets, the correlatedbinomial model can be significantly improved by the betacorrelatedbinomial model.
We have shown that it is important to proceed further than just using a standard model, such as the betabinomial model, if the data set is sufficiently large to merit this. However, even for small data sets, a mixture of two binomial distributions may well provide a good model, especially if a biological basis can be found for the components of the mixture. The value of nonparametric mixture approach is that it places a limit on the number of terms that need to be considered for inclusion in a mixture. For the data sets we have considered, it has not been necessary to fit more than a twopoint mixture.
2 Data Analysis Using PQL
3 Data Analysis Using Numerical Integration
4 MCEMMH Method
The Monte Carlo EM algorithm is the EM algorithm based on Monte Carlo approach where the expectation in the Estep is realized by Monte Carlo method. MCEMMH is MCEM algorithm based on MetropolisHastings algorithm. MH algorithm is used to sample from $f(\alpha \mid y)$ when it is extremely difficult to calculate the density function upto normalizing constants in practice.

EM algorithm
 Select starting values $\theta^{(0)} = (\mu^{(0)},\sigma^{(0)})$
 Estep: Calculate $Q(\theta \mid \theta^{(k)}) = E{log*f*(y,\alpha \mid \theta)\mid y,\theta^{(k)}}$
 Mstep: Determine $\theta^{k+1} = ^{\arg\max}_{\theta} Q(\theta \mid \theta^{(k)})$
 Set k = k+1, and return to Estep until convergence is desired,
 e.g. Stop when $\sqrt{(\mu^{(k)}  \mu^{(k1)})^{2} + (\sigma^{(k)}  \sigma^{(k1)})^{2}} < 0.001$
 Note that, the expectation in the Estep cannot be computed in closed form. Here we use the MH algorithm to construct Monte Carlo approximation for the Estep.

MH algorithm
 Given a jumping distribution q(x,y) such that $q(x,\dot)$ is a pdf(or pmf) for every x and has symmetric properties, i.e. q(x,y) = q(y,x).
 Given the current value $X_{t} = x$, a candidate value for $X_{t+1}$, say y, is sampled from $q(x,\dot)$.
 A decision is then made on whether or not to “jump” to the candidate, based on acceptance function:\(\gamma = \frac{f(y)q(y,x)}{f(x)q(x,y)}\wedge 1\)
 Then \(X_{t+1} = \begin{cases} y \quad \text{with prob. } \gamma \\ x \quad \text{with prob. } 1\gamma\\ \end{cases}\)

MCEMMH algorithm
 This generates a MC, with the transition kernel \(K(x,y) = \alpha q(x,y) + 1  p(x)\delta_{x}(y)\) where $p(x) = \int \alpha q(x,y) dy$
5 Analysis using MCMC
 We have $(\hat{\theta}  \theta) \dot{\sim} *N* (0,I^{1}(\theta))$. Let $l_{i}(\theta) = log *f*(y_{i}\mid \theta)$ be the loglikelihood of $(y_{i1},y_{i2},…,y_{in_{i}})’$. Although I($\hat{\theta}$) estimates I(${\theta}$), I($\hat{\theta}$) can be difficult to evaluate directly. But we can use method to estimate I($\theta$).
\(I(\theta) \approx I_{OC}(\theta) =  E[\frac{\partial^{2}}{\partial \theta \partial \theta '}log\ast f\ast (y, \alpha \mid \theta)\mid y, \theta]\) \(=  E[\frac{\partial^{2}}{\partial \theta \partial \theta '}log\ast f\ast (y \mid \alpha \theta)\mid y, \theta ]  E[\frac{\partial^{2}}{\partial \theta \partial \theta '}log\ast f\ast (\alpha \mid \theta)\mid y, \theta ]\) \(\approx C_{MH} + D_{MH}\) \(\approx \hat{C}_{MH} + \hat{D}_{MH}\)
where $C_{MH}$ and $D_{MH}$ are Monte Carlo approximations(via MetropolisHastings algorithm) of C and D, respectively, using the last set of samples $\alpha^{(d)}{i} \mid y{i}, \theta^{K1}$ ‘s , $i = 1,…,m$, $d = 1,…,D$ , where K is the last iteration in the MCEM algorithm. Then
\(\hat{C}_{\text{MH}} = C_{\text{MH}}\mid _{ \theta = \hat{\theta}}\) and
\[\hat{D}_{MH} = D_{MH}\mid _{\theta = \hat{\theta}}\] To obtain the s.e. in a practical way, take the square roots of the inverse of the estimated $I(\theta)$.
6 Comparison

The PQL and NI estimates are more stable, that is the estimates are fixed. But MCEM estimates vary each time we run the algorithm.

The estimates of each method differ from each other.

MCEM algorithm using MH took much longer than the other two methods to obtain the estimators.

The PQL method might not be consistent. NI method can be consistent when nAGQ is large enough.
Reference
Brooks S P, Morgan B J, Ridout M S, et al. Finite mixture models for proportions.[J]. Biometrics, 1997, 53(3):1097115.
#Appendix