# Problem Set 3

Data analysis and machine learning

Due date: 2021-03-18 17:00 (Melbourne time) unless by prior arrangement

Your submission should be a PDF that includes the relevant figures. The PDF can be compiled from $$\LaTeX$$, exported from a Jupyter notebook, or similar. You must also submit the code that reproduces your work in full.

Marks will depend on the results/figures that you produce, and the clarity and depth of your accompanying interpretation. Don't just submit figures and code! You must demonstrate understanding and justification of what you have submitted. Please ensure figures have appropriate axes, that you have adopted sensible mathematical nomenclature, et cetera.

$\newcommand{\transpose}{^{\scriptscriptstyle \top}} \newcommand{\vec}[1]{\mathbf{#1}}$

In total there are 3 questions in this problem set, with a total of 60 marks available.

### Question 1

(Total 10 marks available)

Draw a probabilistic graphical model for the model you specified in Question 7 of Problem Set 1.

### Question 2

(Total 20 marks available)

Generate $$N = 1000$$ data points that are drawn from a mixture of $$K = 5$$ one-dimensional Gaussians. You can either set the model parameters $$\vec{\theta} = \{\vec{\mu},\vec{\sigma},\vec{\pi}\}$$ yourself, or set them randomly.

If you set them randomly, remember to seed the random number generator directly before creating the random values.
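As an illustration of seeding directly before the random draws, the following is a minimal sketch that assumes NumPy is available. The function name `draw_mixture` and every parameter value shown are hypothetical choices for illustration, not values required by this problem set.

```python
import numpy as np

def draw_mixture(n, mu, sigma, pi, seed=0):
    """Draw n samples from a 1-D Gaussian mixture.

    Returns the samples and the index of the component each one came from.
    """
    # Seed immediately before any random values are created.
    rng = np.random.default_rng(seed)
    # Pick a component for each data point with probability pi_k ...
    labels = rng.choice(len(pi), size=n, p=pi)
    # ... then draw each point from its component's Gaussian.
    samples = rng.normal(loc=mu[labels], scale=sigma[labels])
    return samples, labels

# Hypothetical parameter values, chosen for illustration only.
mu = np.array([-6.0, -2.0, 0.0, 3.0, 7.0])
sigma = np.array([0.5, 1.0, 0.4, 0.8, 1.2])
pi = np.array([0.1, 0.2, 0.3, 0.25, 0.15])  # mixing weights, must sum to 1

y, labels = draw_mixture(1000, mu, sigma, pi, seed=42)
```

Calling `draw_mixture` again with the same seed returns identical samples, which is what makes the work reproducible.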

#### Question 2, Part A

Specify the model.

#### Question 2, Part B

Choose an initial guess for $$\vec{\theta}$$ that is 'far' away from the true values.

Provide and explain the equation for the membership probability of each of the $$N$$ data points in each of the $$K$$ mixture components, and calculate these membership probabilities conditioned on the initial estimate of the model parameters given above.

Provide and explain the equation for updating the model parameters, conditioned on some set of membership probabilities. Calculate new estimates of the model parameters $$\vec{\theta} = \{\vec{\mu}, \vec{\sigma}, \vec{\pi}\}$$ conditioned on the membership probabilities that you have just calculated.

#### Question 2, Part C

Write code to perform the calculations from Question 2, Part B, alternating between the Expectation and Maximization steps for 100 iterations. Store the log likelihood at every iteration. Make a plot showing the log likelihood as a function of E-M step.
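One possible way to structure such a loop is sketched below, assuming NumPy and the standard maximum-likelihood updates for a one-dimensional Gaussian mixture. The function names (`log_weighted_densities`, `em_step`, `run_em`) are hypothetical, and the sketch works in log space to avoid numerical underflow. It is a starting point under those assumptions, not a reference solution.

```python
import numpy as np

def log_weighted_densities(y, mu, sigma, pi):
    """Return an (N, K) array of log(pi_k) + log N(y_n | mu_k, sigma_k^2)."""
    z = (y[:, None] - mu[None, :]) / sigma[None, :]
    return np.log(pi) - 0.5 * np.log(2 * np.pi) - np.log(sigma) - 0.5 * z**2

def em_step(y, mu, sigma, pi):
    """One E-M iteration.

    Returns the updated (mu, sigma, pi) and the log likelihood evaluated
    at the parameters that were passed in.
    """
    log_w = log_weighted_densities(y, mu, sigma, pi)
    # log p(y_n | theta), computed stably as a log-sum-exp over components.
    log_norm = np.logaddexp.reduce(log_w, axis=1)
    # E-step: membership probabilities; each row sums to 1.
    resp = np.exp(log_w - log_norm[:, None])
    # M-step: responsibility-weighted means, variances, and mixing weights.
    n_k = resp.sum(axis=0)
    mu_new = resp.T @ y / n_k
    var_new = (resp * (y[:, None] - mu_new) ** 2).sum(axis=0) / n_k
    return mu_new, np.sqrt(var_new), n_k / y.size, log_norm.sum()

def run_em(y, mu, sigma, pi, n_iter=100):
    """Alternate E and M steps, storing the log likelihood at each iteration."""
    log_likelihoods = []
    for _ in range(n_iter):
        mu, sigma, pi, ll = em_step(y, mu, sigma, pi)
        log_likelihoods.append(ll)
    return mu, sigma, pi, np.array(log_likelihoods)
```

The stored log likelihoods should never decrease from one iteration to the next, which is a useful check on any implementation.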

#### Question 2, Part D

Make a figure showing the data points, and the probability density for each of the $$K$$ mixture components using the model parameters found after 100 E-M steps.

### Question 3

(Total 30 marks available)

There is overwhelming evidence of anthropogenic climate change. This question relates to what can be inferred from only a subset of the evidence available for global warming: a fictitious set of global temperatures spanning 135 years.

In this file you will find 1,000 fictitious time series. Each series has length 135, assuming one measurement per year of temperature deviation from the mean, covering the period from 1880 to 2014, inclusive. The data were first generated by drawing 1,000 random series (with some homoscedastic noise). Then, some of those series were randomly selected and had a trend added to them. The trends that were added were either +1°C / century or -1°C / century.
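To make the generative process described above concrete, here is a minimal sketch that produces data of the same shape, assuming NumPy. The noise standard deviation, the fraction of series given a trend, and the name `make_series` are all hypothetical: the values actually used to create the file are not stated here, so this should not be read as how the file was produced.

```python
import numpy as np

def make_series(n_series=1000, n_years=135, noise_sd=0.1,
                trend_fraction=0.3, seed=0):
    """Simulate fictitious temperature-deviation series.

    noise_sd and trend_fraction are assumed values for illustration only.
    """
    rng = np.random.default_rng(seed)
    years = 1880 + np.arange(n_years)  # 1880 to 2014 inclusive when n_years=135
    # Homoscedastic noise: the same standard deviation for every measurement.
    series = rng.normal(0.0, noise_sd, size=(n_series, n_years))
    # Slopes in deg C per year: 0 for no trend, +/-0.01 for +/-1 deg C per century.
    slopes = np.zeros(n_series)
    has_trend = rng.random(n_series) < trend_fraction
    slopes[has_trend] = rng.choice([-0.01, 0.01], size=has_trend.sum())
    # Centre the years so that each trend averages to zero over the period.
    series += slopes[:, None] * (years - years.mean())[None, :]
    return years, series, slopes

years, series, slopes = make_series()
```

Simulated data like this, where the true slopes are known, can be useful for checking an inference procedure before applying it to the supplied file.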