Topic 3 Introduction to EM
3.1 Introduction
Given condition:
We have observed variables \(X\), but we believe (assume) that there are some “latent” variables \(Z\) that we cannot observe.
Usually, when we only have knowledge of the observed variables \(X\), we want to calculate the MLE for the parameter of interest \(\theta\):
\[ \hat{\theta}_{MLE} = \underset{\theta}{\operatorname{argmax}} \log L(\theta), \quad \text{where } L(\theta) = p(X \mid \theta) \]
However, when latent variables are present, \(\log L(\theta) = \log \int p(X, Z \mid \theta) \, dZ\) is the log of an integral (or sum) over \(Z\), which is often hard to maximize directly.
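As a minimal illustration of the difficulty (assuming, for this sketch only, a two-component Gaussian mixture with unit variances, where \(Z\) is the unobserved component label), the observed-data log-likelihood wraps a sum over \(Z\) inside the log:

```python
import numpy as np
from scipy.stats import norm

# Observed-data log-likelihood for an illustrative two-component Gaussian
# mixture (unit variances); Z is the unobserved component label.
# The log wraps a SUM over the values of Z, so the argmax over
# theta = (pi, mu1, mu2) has no closed-form solution.
def log_likelihood(x, pi, mu1, mu2):
    return np.sum(np.log(pi * norm.pdf(x, loc=mu1, scale=1.0)
                         + (1 - pi) * norm.pdf(x, loc=mu2, scale=1.0)))
```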
→ We use the EM algorithm instead!
3.2 EM Algorithm: Given Condition & Limitation
Given condition:
We have observed variables \(X\), but we believe (assume) that there are some “latent” variables \(Z\) that we cannot observe.
Assume some parameterized distribution for the latent variables \(Z\), i.e., for \(p(Z \mid X, \theta)\)
Limitation:
The EM estimate is only guaranteed to never get worse from one iteration to the next; it won’t necessarily find the global optimum of the likelihood
In practice, start EM from multiple initial guesses and choose the one with the largest likelihood as the final guess for \(\theta\), as sketched below
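A minimal sketch of that restart strategy, assuming a hypothetical one-run function `run_em(x, theta0)` that returns the fitted \(\theta\) and its log-likelihood:

```python
import numpy as np

# Multiple-restart wrapper. `run_em` is a hypothetical function that runs
# EM once from the initial guess `theta0` and returns (theta_hat, log_lik).
def em_with_restarts(x, run_em, n_restarts=10, seed=0):
    rng = np.random.default_rng(seed)
    best_theta, best_ll = None, -np.inf
    for _ in range(n_restarts):
        theta0 = rng.normal(size=3)        # random initial guess for theta
        theta_hat, ll = run_em(x, theta0)
        if ll > best_ll:                   # keep the run with the largest likelihood
            best_theta, best_ll = theta_hat, ll
    return best_theta, best_ll
```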
3.3 EM Steps (5-step version)
The EM algorithm is usually described as two steps (the E-step and the M-step), but we think it’s helpful to think of EM as five distinct steps (a concrete code sketch follows the list):
Pick an initial guess \(\theta^{t = 0}\) for the parameter of interest \(\theta\)
Given the observed variables \(X\) and the current guess \(\theta^{t}\), calculate the conditional distribution of the latent variable \(Z\): \(p(Z \mid X, \theta^{t})\)
Find the Q-function, which is the expected \(\log p(X,Z\mid \theta)\), where the expectation is taken with respect to the conditional distribution of the latent variable \(Z\) from the last step \[ \begin{align} Q(\theta \mid \theta^t) & = \text{expected } \log p(X,Z\mid \theta) \\ & = E_{Z\mid X,\theta^t} [\log p(X,Z\mid \theta)] \\ & = \int_{\text{all } Z} \log p(X,Z \mid \theta) \, p(Z\mid X,\theta^t) \,dZ \end{align} \]
Find the new guess \(\theta^{t+1}\) by maximizing the Q-function above
Let \(t = t+1\) and go back to Step 2
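To make the five steps concrete, here is a minimal sketch, assuming (for illustration only) the same two-component Gaussian mixture with unit variances, so \(\theta = (\pi, \mu_1, \mu_2)\) and Steps 3 and 4 have a closed-form solution:

```python
import numpy as np
from scipy.stats import norm

# Five-step EM sketch for an illustrative two-component Gaussian mixture
# with unit variances; theta = (pi, mu1, mu2).
def em_gmm(x, theta0, n_iter=100):
    pi, mu1, mu2 = theta0                        # Step 1: initial guess theta^0
    for t in range(n_iter):
        # Step 2: conditional distribution of Z given X, p(Z | X, theta^t)
        a = pi * norm.pdf(x, loc=mu1, scale=1.0)
        b = (1 - pi) * norm.pdf(x, loc=mu2, scale=1.0)
        gamma = a / (a + b)                      # P(Z_i = 1 | x_i, theta^t)
        # Steps 3-4: for this model, maximizing Q(theta | theta^t) has a
        # closed form: weighted averages under the responsibilities gamma
        pi = gamma.mean()
        mu1 = np.sum(gamma * x) / np.sum(gamma)
        mu2 = np.sum((1 - gamma) * x) / np.sum(1 - gamma)
        # Step 5: t = t + 1 and repeat (the loop)
    return pi, mu1, mu2
```

For this toy model, Steps 3 and 4 collapse into the three update lines because the maximizer of \(Q(\theta \mid \theta^t)\) is available in closed form; in general, the M-step may itself require numerical optimization.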
3.4 EM Steps (2-key-step version)
We also include the E-step and M-step version of the EM algorithm below for reference (a sketch of the stopping rule follows the list):
Start with a random guess for \(\theta\)
E step (finding the Q-function): \(Q(\theta \mid \theta^t) = E_{Z|X,\theta^t} [\log{p(X, Z \mid \theta)}]\)
M step (finding the new guess for \(\theta\) by maximizing the Q-function): \(\theta^{t+1} = \underset{\theta}{\operatorname{argmax}} \: Q(\theta \mid \theta^t)\)
Repeat until \(|\theta^{t+1} - \theta^t| < \epsilon\)
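A minimal sketch of this stopping rule, assuming a hypothetical function `em_update(x, theta)` that performs one E step and one M step and returns \(\theta^{t+1}\):

```python
import numpy as np

# Iterate a hypothetical one-iteration function `em_update` until the
# change in theta falls below epsilon.
def run_until_converged(x, theta0, em_update, eps=1e-6, max_iter=1000):
    theta = np.asarray(theta0, dtype=float)
    for _ in range(max_iter):
        theta_new = em_update(x, theta)              # E step, then M step
        if np.max(np.abs(theta_new - theta)) < eps:  # |theta^{t+1} - theta^t| < eps
            return theta_new
        theta = theta_new
    return theta                                     # hit max_iter without converging
```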