MCEM-MH Algorithm


1 Summary

The main purpose of the paper by Brooks, Morgan, Ridout, et al. (1997) is to consider the use of finite mixture models for six overdispersed data sets recording fetal control mortality in mouse litters. A standard approach is to describe the data by means of a beta-binomial model or to use quasi-likelihood methods. The major conclusion of the paper is that, while these approaches may provide an acceptable description of the main body of the data, they can conceal a more complicated mixture structure, which in turn exerts influence on the beta-binomial parameter estimates.

There is too much overdispersion in the data for a single binomial model to provide anything more than a crude description in all six cases. For the AVSS data set, the beta-binomial and correlated-binomial models provide an adequate description when compared with the beta-correlated-binomial model. For the other five data sets, however, the correlated-binomial model is significantly improved upon by the beta-correlated-binomial model.

We have shown that it is important to proceed further than just fitting a standard model, such as the beta-binomial model, if the data set is sufficiently large to merit this. Even for small data sets, however, a mixture of two binomial distributions may well provide a good model, especially if a biological basis can be found for the components of the mixture. The value of the nonparametric mixture approach is that it places a limit on the number of terms that need to be considered for inclusion in a mixture. For the data sets considered here, it has not been necessary to fit more than a two-point mixture; a sketch of fitting such a mixture follows.
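The following minimal sketch, not taken from the paper, shows how a two-point binomial mixture can be fitted by direct maximum likelihood in R; the vectors y (deaths per litter) and n (litter sizes) are hypothetical.

# Minimal sketch: two-point binomial mixture by direct ML.
# Hypothetical data: y = deaths per litter, n = litter sizes.
y <- c(0, 1, 0, 2, 8, 0, 1, 9, 0, 2)
n <- c(10, 12, 9, 11, 10, 12, 10, 11, 9, 12)

# Mixture log-likelihood; parameters are mapped to their natural
# ranges with plogis() so the optimizer is unconstrained.
binmix2.loglik <- function(par, y, n) {
  w  <- plogis(par[1])  # mixing weight in (0, 1)
  p1 <- plogis(par[2])  # component 1 death probability
  p2 <- plogis(par[3])  # component 2 death probability
  sum(log(w * dbinom(y, n, p1) + (1 - w) * dbinom(y, n, p2)))
}

fit <- optim(c(0, -2, 0), binmix2.loglik, y = y, n = n,
             control = list(fnscale = -1))  # fnscale = -1 maximizes
plogis(fit$par)  # estimated (w, p1, p2)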

2 Data Analysis Using PQL

## Linear mixed-effects model fit by maximum likelihood
##   Data: obs
##   Log-likelihood: NA
##   Fixed: y ~ 1
## (Intercept)
##   -2.198494
##
## Random effects:
##  Formula: ~1 | litter
##         (Intercept)  Residual
## StdDev:   0.7953074 0.8697554
##
## Variance function:
##  Structure: fixed weights
##  Formula: ~invwt
## Number of Observations: 10533
## Number of Groups: 1328
## The MLE estimate of mu using PQL method is -2.1984941549868
## The MLE estimate of sigma using PQL method is 0.795307393849152
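The fit above was produced with glmmPQL from the MASS package; the call, repeated from the Appendix, is:

pql <- glmmPQL(y ~ 1, random = ~ 1 | litter, family = binomial, data = obs)
mu.pql <- fixef(pql)                  # fixed intercept: the estimate of mu
sigma.pql <- sqrt(getVarCov(pql)[1])  # random-effect SD: the estimate of sigma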

3 Data Analysis Using Numerical Integration

## Generalized linear mixed model fit by maximum likelihood (Adaptive
##   Gauss-Hermite Quadrature, nAGQ = 50) [glmerMod]
##  Family: binomial  ( logit )
## Formula: y ~ (1 | litter)
##    Data: obs
##       AIC       BIC    logLik  deviance  df.resid
##  7146.401  7160.926 -3571.200  7142.401     10531
## Random effects:
##  Groups Name        Std.Dev.
##  litter (Intercept) 0.6793  
## Number of obs: 10533, groups:  litter, 1328
## Fixed Effects:
## (Intercept)  
##       -2.28
## The MLE estimate of mu using Numerical Integration method is -2.27998147524742
## The MLE estimate of sigma using Numerical Integration method is 0.679261533709715
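This fit was produced with glmer from the lme4 package, using adaptive Gauss-Hermite quadrature with 50 nodes; the call, repeated from the Appendix, is:

ni <- glmer(y ~ (1 | litter), data = obs, family = binomial, nAGQ = 50)
mu.ni <- fixef(ni)                          # estimate of mu
sigma.ni <- sqrt(varcomp.mer(ni)[["D11"]])  # estimate of sigma (HLMdiag)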

4 MCEM-MH Method

The Monte Carlo EM (MCEM) algorithm is the EM algorithm in which the expectation in the E-step is approximated by Monte Carlo sampling. MCEM-MH is the MCEM algorithm in which those Monte Carlo samples are drawn using the Metropolis-Hastings (M-H) algorithm. The M-H algorithm is used to sample from $f(\alpha \mid y)$ because, in practice, this density can be evaluated only up to its normalizing constant.
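In the implementation in the Appendix, the proposal for each random effect $\alpha_i$ is drawn independently from its current prior, $N(0, \sigma_{k-1}^2)$. With this independence proposal the prior and proposal densities cancel in the M-H ratio, so the acceptance probability for litter $i$ reduces to a likelihood ratio:

$$a_i = \min\left\{1,\ \left(\frac{1 + e^{\mu + \alpha_i}}{1 + e^{\mu + \alpha_i^{*}}}\right)^{n_i} \exp\left[\left(\alpha_i^{*} - \alpha_i\right) \sum_{j=1}^{n_i} y_{ij}\right]\right\},$$

where $\alpha_i^{*}$ is the proposed value, $\alpha_i$ the current value, $n_i$ the litter size, and $\sum_j y_{ij}$ the number of deaths in litter $i$. This is exactly the quantity computed by pmin(1, ...) in the Appendix code.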

5 Analysis using MCMC

(Trace plots of the MCEM-MH estimates of $\mu$ and $\sigma$ across iterations; see the plotting code in the Appendix.)

## The MLE estimate of mu using MCEM-MH method is -2.29746043478105
## The MLE estimate of sigma using MCEM-MH method is 0.714258145517588

where $C_{\mathrm{MH}}$ and $D_{\mathrm{MH}}$ are Monte Carlo approximations (via the Metropolis-Hastings algorithm) of $C$ and $D$, respectively, computed from the last set of samples $\alpha_i^{(d)} \mid y_i, \theta^{(K-1)}$, $i = 1, \dots, m$, $d = 1, \dots, D$, where $K$ is the last iteration of the MCEM algorithm.

6 Comparison

##             mu     sigma
## PQL  -2.198494 0.7953074
## NI   -2.279981 0.6792615
## MCEM -2.297460 0.7142581

Reference

Brooks, S. P., Morgan, B. J., Ridout, M. S., et al. (1997). Finite mixture models for proportions. Biometrics, 53(3), 1097–1115.

Appendix

### Organize data
library("nlme")
library("MASS")
library("lme4")
library("HLMdiag")
obs <- read.csv("observations.csv")
head(obs)
### 2 Data Analysis Using PQL
obs$litter <- as.factor(obs$litter)
head(obs)
pql <- glmmPQL(y ~ 1,random = ~ 1 | litter, family = binomial, data = obs)
mu.pql <- fixef(pql)
sigma.pql <- sqrt(getVarCov(pql)[1])
### 3 Data Analysis Using Numerical Integration
ni <- glmer(y ~ (1|litter),data = obs,family = binomial, nAGQ = 50)
mu.ni <- fixef(ni)
sigma.ni <- sqrt(varcomp.mer(ni)[["D11"]])
### 4 MCEM-MH Method
### 5 Analysis using MCMC
# Project 3:  Guideline for MCEM algorithm with M-H

MCEM.MH <- function(
  max_iter = 500, tolerance = 0.001,
  mh_iter = 3000, mh_burn = 1000)
{
  obs = read.csv("observations.csv")
  obs$litter = factor(obs$litter)

  # Statistics from data
  y_i = split(obs$y, obs$litter)
  sum_y_i = sapply(y_i, sum)
  n_i = sapply(y_i, length)
  m = length(unique(obs$litter))

  # Set up initial values and vector of iterations
  initial =
    glmmPQL(y ~ 1, ~ 1 | litter, family = binomial, data = obs,
            verbose = FALSE)
  mu_est = numeric(max_iter)
  mu_est[1] = fixef(initial)[[1]]
  sigma_est = numeric(max_iter)
  sigma_est[1] = sqrt(getVarCov(initial)[[1]])

  alpha = matrix(0, mh_iter, m)

  for (k in 2:max_iter) {
    # E Step
    # ======
    # Calculate Q using MH.
    for (d in 2:mh_iter) {
      draw = rnorm(m,0,sigma_est[k-1])

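      # Independence M-H: the proposal N(0, sigma_est[k-1]) equals the current
      # prior on alpha, so prior and proposal densities cancel and the
      # acceptance probability is the likelihood ratio of proposed vs. current.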
      a = pmin(1,(((1+exp(mu_est[k-1] + alpha[d-1,]))/(1+exp(mu_est[k-1] +
          draw)))^n_i)* exp((draw - alpha[d-1,])*sum_y_i))

      is_accepted = runif(m, 0, 1) < a
      alpha[d, ] = ifelse(is_accepted, draw, alpha[d - 1, ])
    }

    alpha_keep = alpha[(mh_burn + 1):mh_iter, ]

    Q = function(mu) {
      expit = exp(mu + alpha_keep) / (1 + exp(mu + alpha_keep))
      A = colSums(log(expit))
      B = colSums(log(1 - expit))
      (sum_y_i %*% A + (n_i - sum_y_i) %*% B) / nrow(alpha_keep)
    }

    # On next EM iteration, start MH from last value, so MH converges faster.
    alpha[1, ] = alpha[mh_iter, ]

    # M Step
    # ======
    # Maximize Q to get estimates for mu and sigma.
    mu_est[k] = optimize(Q, interval = c(-4, -1), maximum = TRUE)$maximum
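    # Closed-form M-step for sigma: with alpha ~ N(0, sigma^2), Q is
    # maximized by the root mean square of the sampled random effects.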
    sigma_est[k] = sqrt(mean(alpha_keep^2))

    # Check for convergence.
    # Print out estimates from each iteration until convergence.
    cat(sprintf("Iteration %i: Mu %.4f, Sigma %.4f\n",
                k, mu_est[k], sigma_est[k]))

    tol = max(abs(mu_est[k] - mu_est[k - 1]), abs(sigma_est[k] - sigma_est[k - 1]))
    # Record the latest estimates so they are defined even if max_iter is
    # reached before the tolerance is met.
    mu.mcem <- mu_est[k]
    sigma.mcem <- sigma_est[k]
    if (tol < tolerance) break
  }

  # Make some plots.
  plot(mu_est[1:k],type = 'l')

  plot(sigma_est[1:k],type = "l")
  return(list(mu.mcem,sigma.mcem))
}

mcem <- MCEM.MH()
mu.mcem <- mcem[[1]]
sigma.mcem <- mcem[[2]]
### 6 Comparison
mu <- c(mu.pql,mu.ni,mu.mcem)
sigma <- c(sigma.pql,sigma.ni,sigma.mcem)
para <- cbind(mu,sigma)
rownames(para) <- c("PQL","NI","MCEM")
para