In this assignment, your task will be to implement and analyze inference in a Bayesian Gaussian Mixture Model (GMM) as described in the slides on Approximate Inference in Bayesian Models. The preferred and easiest way of accomplishing this task is to complete this Jupyter Notebook, which already comes with the training data definition and code for Maximum Likelihood (ML) training of a contrastive GMM system. If you do not have any experience with Jupyter Notebook, the easiest way to start is to install Anaconda3, run Jupyter Notebook and open this notebook, downloaded from BAYa_Assignment2019.ipynb. You can also benefit from reusing pieces of code from the Jupyter Notebooks provided for this class, namely vb_gmm_training.ipynb and gs_gmm_training.ipynb.
The following cell contains the code defining the training data and training the contrastive system. You should not edit this part! The code does the following:
import numpy as np
import scipy.stats as sps
import matplotlib.pyplot as plt
from scipy.special import logsumexp
from scipy.special import digamma
%matplotlib inline
#%matplotlib qt5 #get an extra window for the plots
plt.rcParams.update({'figure.figsize': (10.0, 8.0), 'font.size': 18}) #Make the plots bigger
def GMM_pdf(x, mus, vars, pis):
    # Function for evaluating univariate GMM at values given by vector 'x'
    # 'mus', 'vars', 'pis' are vectors of means, variances and weights of individual Gaussian components
    # 'x[:,np.newaxis]' makes 'x' a column vector to evaluate every coefficient of 'x' w.r.t. every Gaussian
    # component (row vectors 'mus' and 'vars') at once
    return sps.norm.pdf(x[:,np.newaxis], mus, np.sqrt(vars)).dot(pis)
#1. Handcraft some GMM parameters
mus_true = [-4.0, 0.0, 4.0, 5]
vars_true = [1.0, 1.96, 1.44, 1]
pis_true = [0.1, 0.4, 0.2, 0.3]
#2. N data points pre-generated from the GMM distribution above
x=np.array([ -3.07371088e+00, -4.14348725e+00, -3.01490821e+00, -3.54303388e+00, -3.48708234e+00, -3.59262207e+00,
-5.76100178e+00, -5.02726789e+00, 8.28817153e-04, 4.99898450e-01, 2.83745605e-01, 6.71947042e-01,
7.66679495e-01, 6.96995763e-01, 1.33546855e+00, 6.03847649e-01, -1.05992122e+00, 2.91024229e+00,
-2.12682520e+00, 8.33533885e-01, 1.77147857e+00, 7.37629536e-01, -1.25040836e+00, 1.87318623e+00,
-4.14582880e-01, 5.05680493e-01, 1.63091140e+00, 6.63219549e-01, -3.30841863e-01, -1.21874646e+00,
2.64384057e+00, -4.32674840e-01, -1.79034947e+00, 3.13567565e-01, -5.43911715e-01, 2.28572951e+00,
9.55653291e-01, -5.43582974e-01, -2.73850574e-01, -1.50383720e+00, 1.15925073e-01, 3.92541838e+00,
-1.57267817e+00, 4.43581114e-01, -8.11856886e-01, 2.62442641e+00, -4.36298436e-01, -6.72286580e-01,
1.52223784e+00, 1.25033658e+00, 4.88645989e+00, 2.96110183e+00, 4.74249957e+00, 2.17531545e+00,
3.43072143e+00, 3.49315547e+00, 2.51223591e+00, 2.55369053e+00, 2.93122261e+00, 6.40247818e+00,
5.12748233e+00, 4.08277439e+00, 4.96716209e+00, 1.56304959e+00, 4.31869585e+00, 2.07957592e+00,
4.56265393e+00, 3.74342366e+00, 4.36177483e+00, 5.21824922e+00, 4.94100019e+00, 4.70062989e+00,
6.12111884e+00, 6.69125720e+00, 5.03104495e+00, 5.72199065e+00, 4.29367941e+00, 3.72747772e+00,
4.41461701e+00, 5.48741263e+00, 4.56782193e+00, 6.45701533e+00, 5.49118936e+00, 4.25947605e+00,
3.39235348e+00, 4.10586407e+00, 2.76696554e+00, 6.66059909e+00, 6.00107916e+00, 5.92828295e+00,
4.97460855e+00, 2.77746143e+00, 2.99416076e+00, 5.24459233e+00, 6.44720235e+00, 4.71084807e+00,
5.62475093e+00, 3.76422931e+00, 5.79482964e+00, 5.11432194e+00])
#3. Choose some initial GMM parameters for EM training
C = 6 # number of GMM components
mus_ml = x[:C] # We choose the first few observations as the initial means
vars_ml = np.repeat(np.var(x), C) # Variances for all components are set to the global variance of the training data
pis_ml = np.ones(C)/C # All component weights are set to the same value 1/C
#EM algorithm for ML GMM training
for _ in range(1000):
    #E-step
    log_p_xz = sps.norm.logpdf(x[:,np.newaxis], mus_ml, np.sqrt(vars_ml)) + np.log(pis_ml)
    log_p_x = logsumexp(log_p_xz, axis=1, keepdims=True)
    gammas_ml = np.exp(log_p_xz - log_p_x)
    #M-step
    Nc = gammas_ml.sum(axis=0)
    mus_ml = x.dot(gammas_ml) / Nc
    vars_ml = (x**2).dot(gammas_ml) / Nc - mus_ml**2
    pis_ml = Nc / Nc.sum()
#4. Plot the true GMM, ML trained GMM and the observations
t = np.linspace(-10,10,1000)
true_GMM_pdf = GMM_pdf(t, mus_true, vars_true, pis_true)
ml_GMM_pdf = GMM_pdf(t, mus_ml, vars_ml, pis_ml);
plt.plot(t, true_GMM_pdf, 'gray')
plt.plot(t, ml_GMM_pdf, 'k')
plt.plot(x, np.zeros_like(x), '+k');
For the Bayesian GMM, use the following setting of the prior parameters!
# Parameters of NormalGamma prior over the means (mu) and precisions (lambda)
m0, kappa0, a0, b0 = [0.0, 0.05, 0.05, 0.05]
# Parameters of Dirichlet prior over the weights
alpha0 = 1.0
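As a reminder (useful later when plotting the approximate posteriors $q(\mu_c,\lambda_c)$), the NormalGamma density factorizes as a Gamma density over the precision $\lambda$ times a Gaussian density over the mean $\mu$ with precision $\kappa\lambda$. A small helper for evaluating its log-density could look as follows; the function name 'norm_gamma_logpdf' is our own and is not required by the assignment:
def norm_gamma_logpdf(mu, lam, m, kappa, a, b):
    # log NG(mu, lambda | m, kappa, a, b) = log Gam(lambda | a, b) + log N(mu | m, 1/(kappa*lambda))
    # 'b' is the rate parameter of the Gamma distribution, hence scale=1/b in scipy
    return (sps.gamma.logpdf(lam, a, scale=1.0 / b)
            + sps.norm.logpdf(mu, m, 1.0 / np.sqrt(kappa * lam)))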
Now, add code implementing the Variational Bayes (VB) approximate inference in the Bayesian GMM.
- Use the responsibilities obtained from the ML training (EM algorithm implemented above; variable 'gammas_ml') to initialize $q(\zz)$. This implies that we use the same number of Gaussian components $C=6$ as for the ML training.
- Iteratively run the updates of $q(\mmu,\llambda)$, $q(\ppi)$ and $q(\zz)$ until the algorithm converges (one possible form of these updates is sketched below the code cell).
Once the VB inference converges:
- Plot the estimated approximate distributions $q(\mu_c,\lambda_c)$ for all the Gaussian components $c \in \{1\dts C\}$. You can reuse the code for plotting the NormalGamma distribution from the vb_gmm_training.ipynb notebook.
- Print also the vector of parameters $\aalpha^*$ of the approximate posterior distribution of the weights $q(\ppi)$.
- Analyze these results. What can you say about the obtained posterior distributions? What do they represent? How do these posterior distributions compare to the parameter estimates obtained from the EM algorithm?
- Generate several (say 50) samples from the approximate posterior distribution $q(\ppi,\mmu,\llambda) = q(\ppi) \prod_{c=1}^C q(\mu_c,\lambda_c)$. Each such sample is a set of parameters of one GMM. Plot the GMM distributions corresponding to all the samples into a single figure. Comment on this plot. What do the individual GMM distributions represent?
- Now, average all the GMM distributions from the previous step. Attention! Average the GMM distributions, not their parameters as sampled from $q(\ppi,\mmu,\llambda)$. What can you say about the obtained average distribution? What does it represent?
#Your code goes here :)
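As a hint, below is a minimal sketch of one possible implementation of the VB updates and of the sampling/averaging step, based on the standard VB update equations for a univariate GMM with NormalGamma and Dirichlet priors. The variable names ('gammas_vb', 'alpha_vb', 'm_vb', 'kappa_vb', 'a_vb', 'b_vb', ...) are our own, a fixed number of iterations is used instead of a proper convergence check, and the update formulas should be verified against the slides:
# Sketch of the VB updates (initialize q(z) from the ML responsibilities)
gammas_vb = gammas_ml.copy()
for _ in range(100):
    # Update q(mu,lambda) and q(pi) given the current q(z)
    Nc_vb = gammas_vb.sum(axis=0)
    xbar = x.dot(gammas_vb) / Nc_vb
    S = ((x[:, np.newaxis] - xbar)**2 * gammas_vb).sum(axis=0)
    alpha_vb = alpha0 + Nc_vb
    kappa_vb = kappa0 + Nc_vb
    m_vb = (kappa0 * m0 + Nc_vb * xbar) / kappa_vb
    a_vb = a0 + 0.5 * Nc_vb
    b_vb = b0 + 0.5 * S + 0.5 * kappa0 * Nc_vb * (xbar - m0)**2 / kappa_vb
    # Update q(z) using the expectations under q(mu,lambda) and q(pi)
    E_log_lambda = digamma(a_vb) - np.log(b_vb)
    E_log_pi = digamma(alpha_vb) - digamma(alpha_vb.sum())
    E_quad = 1.0 / kappa_vb + a_vb / b_vb * (x[:, np.newaxis] - m_vb)**2
    log_rho = E_log_pi + 0.5 * E_log_lambda - 0.5 * np.log(2 * np.pi) - 0.5 * E_quad
    gammas_vb = np.exp(log_rho - logsumexp(log_rho, axis=1, keepdims=True))
print("alpha*:", alpha_vb)
# Draw samples of GMM parameters from q(pi,mu,lambda), plot the corresponding GMMs
# and accumulate their average
n_samples = 50
avg_pdf_vb = np.zeros_like(t)
for _ in range(n_samples):
    pis_s = np.random.dirichlet(alpha_vb)
    lambdas_s = np.random.gamma(a_vb, 1.0 / b_vb)                    # Gamma(shape=a, rate=b)
    mus_s = np.random.normal(m_vb, 1.0 / np.sqrt(kappa_vb * lambdas_s))
    pdf_s = GMM_pdf(t, mus_s, 1.0 / lambdas_s, pis_s)
    avg_pdf_vb += pdf_s / n_samples
    plt.plot(t, pdf_s, 'gray')
plt.plot(t, avg_pdf_vb, 'r')                                         # average of the sampled GMM densities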
Now estimate and plot the VB (approximate) posterior predictive distribution. How does the resulting posterior predictive distribution compare to the average of the sampled GMM distributions from the previous step?
#Your code for posterior predictive distribution
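One possible route, to be verified against the slides: with a NormalGamma posterior for each component, integrating the Gaussian likelihood over $q(\mu_c,\lambda_c)$ yields a Student's t density, and taking the posterior mean of the weights under $q(\ppi)$ gives mixing proportions $\alpha^*_c/\sum_k \alpha^*_k$, so the VB posterior predictive becomes a mixture of Student's t distributions. A sketch using the variable names from the VB sketch above:
# VB posterior predictive sketched as a mixture of Student's t densities
# (uses alpha_vb, m_vb, kappa_vb, a_vb, b_vb from the VB sketch above)
st_scale = np.sqrt(b_vb * (kappa_vb + 1) / (a_vb * kappa_vb))        # scale of the Student's t components
post_pred_vb = sps.t.pdf(t[:, np.newaxis], 2 * a_vb, m_vb, st_scale).dot(alpha_vb / alpha_vb.sum())
plt.plot(t, post_pred_vb, 'r')
plt.plot(t, ml_GMM_pdf, 'k')                                         # ML-trained GMM for comparison
plt.plot(t, true_GMM_pdf, 'gray')                                    # true GMM for comparison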
Now repeat the whole exercise, but this time using inference based on Gibbs sampling (not the collapsed version).
- Start from fixed $\zz$ initialized as the most likely assignments of observations to GMM components according to the ML training (i.e. the assignments will again be derived from the variable 'gammas_ml').
- In each iteration, plot the GMM distribution corresponding to the sample from $p(\ppi,\mmu,\llambda|\zz, \xx)$. What do the individual GMM distributions represent? How do they compare to the samples from the VB approximate posterior $q(\ppi,\mmu,\llambda)$? (One possible form of the sampler is sketched below the code cell.)
- Average the distributions from the previous step to obtain an approximation to the posterior predictive distribution. Plot this distribution. How does it compare to the VB posterior predictive distribution? Is it any different? If yes, why?
#Your code for Gibbs sampling inference
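Below is a minimal sketch of one possible implementation of the (non-collapsed) Gibbs sampler described above, alternating between sampling $(\ppi,\mmu,\llambda)$ given $\zz$ and sampling $\zz$ given the parameters. The variable names, the fixed number of iterations, the absence of a burn-in period and the guard against empty components are our own simplifications, and the conditional posterior formulas should be verified against the slides:
# Sketch of Gibbs sampling inference
z = np.argmax(gammas_ml, axis=1)                    # initial hard assignments from the ML responsibilities
n_iters = 50
avg_pdf_gibbs = np.zeros_like(t)
for _ in range(n_iters):
    # Sample (pi, mu, lambda) from p(pi, mu, lambda | z, x)
    z1h = np.identity(C)[z]                          # one-hot encoding of z, shape (N, C)
    Nc = z1h.sum(axis=0)
    xbar = x.dot(z1h) / np.maximum(Nc, 1e-10)        # guard: empty components fall back to the prior
    S = ((x[:, np.newaxis] - xbar)**2 * z1h).sum(axis=0)
    kappaN = kappa0 + Nc
    mN = (kappa0 * m0 + Nc * xbar) / kappaN
    aN = a0 + 0.5 * Nc
    bN = b0 + 0.5 * S + 0.5 * kappa0 * Nc * (xbar - m0)**2 / kappaN
    lambdas_s = np.random.gamma(aN, 1.0 / bN)
    mus_s = np.random.normal(mN, 1.0 / np.sqrt(kappaN * lambdas_s))
    pis_s = np.random.dirichlet(alpha0 + Nc)
    # Sample z from p(z | pi, mu, lambda, x)
    log_p = sps.norm.logpdf(x[:, np.newaxis], mus_s, 1.0 / np.sqrt(lambdas_s)) + np.log(pis_s)
    p = np.exp(log_p - logsumexp(log_p, axis=1, keepdims=True))
    z = np.array([np.random.choice(C, p=pn) for pn in p])
    # Plot the GMM given by the current parameter sample and accumulate the average
    pdf_s = GMM_pdf(t, mus_s, 1.0 / lambdas_s, pis_s)
    avg_pdf_gibbs += pdf_s / n_iters
    plt.plot(t, pdf_s, 'gray')
plt.plot(t, avg_pdf_gibbs, 'r')                      # Monte Carlo approximation of the posterior predictive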
Generate a larger number of training observations (e.g. 1000 samples) from the true distribution (mus_true, vars_true, pis_true) and repeat the whole experiment once more (for both VB and Gibbs sampling inference) with this larger amount of training data. Regenerate all the plots and comment on how they change compared to the previous experiments with the smaller training data set.
#Your code
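One possible way of generating the larger training set (the names 'N_large', 'x_large' and 'components' are our own): draw a component index for each observation according to 'pis_true' and then sample from the corresponding Gaussian:
# Draw 1000 observations from the true GMM: first the component, then the Gaussian sample
N_large = 1000
components = np.random.choice(len(pis_true), size=N_large, p=pis_true)
x_large = np.random.normal(np.array(mus_true)[components],
                           np.sqrt(np.array(vars_true)[components]))
# Re-run the ML EM initialization/training and both inference schemes with x replaced by x_large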
Try to implement as much as you can and answer as many questions as you can. It is fine if you do not understand some questions or if you do not know some of the answers.