Data analysis and machine learning

**Due date:** 2022-03-11 17:00 (Melbourne time) unless by prior arrangement

Your submission should be a PDF that includes all relevant figures. The PDF can be compiled from \(\LaTeX\), exported from a Jupyter notebook, or produced by similar means. You must also submit the code that reproduces your work in full.

Marks will depend on the results/figures that you produce, **and** the clarity and depth of your accompanying interpretation. Don't just submit figures and code! You must demonstrate that you understand, and can justify, what you have submitted. Please ensure figures have appropriately labelled axes, that you have adopted sensible mathematical nomenclature, *et cetera*.

There are 50 marks allocated to this problem set, and the problem set accounts for 12% of your final grade. For example, if you receive 45 marks (90%) on this problem set, then it will contribute 10.8% to your final grade.

\[ \newcommand{\transpose}{^{\scriptscriptstyle \top}} \newcommand{\vec}[1]{\mathbf{#1}} \]

There are five questions in this problem set, with a total of 50 marks available.

**Question 1** *(Total 10 marks available)*

Rosenbrock's function is \[ f(x, y) = (1 - x)^2 + 100(y - x^2)^2 \quad . \]

Calculate \(\log{f(x,y)}\) on a grid of \(x\) and \(y\) values spanning \(x \in [-2, 2]\) and \(y \in [-2, 2]\), with 250 grid points in each direction. Plot contours of \(\log{f(x,y)}\) and include a colour bar showing its value.
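As a starting point, the grid evaluation and contour plot might be sketched as follows. This assumes NumPy and Matplotlib, takes \(\log\) to mean the natural logarithm, and uses a function name (`rosenbrock`), colour map, and number of contour levels that are illustrative choices only.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line inside a notebook
import matplotlib.pyplot as plt


def rosenbrock(x, y):
    """Rosenbrock's function f(x, y) = (1 - x)^2 + 100 (y - x^2)^2."""
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2


# 250 grid points in each direction over [-2, 2].
x = np.linspace(-2, 2, 250)
y = np.linspace(-2, 2, 250)
X, Y = np.meshgrid(x, y)

# f is zero only at (1, 1), which does not fall on this grid,
# so the logarithm is finite at every grid point.
log_f = np.log(rosenbrock(X, Y))

fig, ax = plt.subplots(figsize=(6, 5))
contours = ax.contourf(X, Y, log_f, levels=30, cmap="viridis")
fig.colorbar(contours, ax=ax, label=r"$\log f(x, y)$")
ax.set_xlabel(r"$x$")
ax.set_ylabel(r"$y$")
fig.tight_layout()
```

Call `fig.savefig(...)` or `plt.show()` afterwards to write or display the figure. Optimiser paths can later be drawn on the same axes with `ax.plot`.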

Starting from an initial value of \((x,y) = (-1, -1)\), find the minimum of Rosenbrock's function using

- the BFGS (or L-BFGS-B) algorithm,
- the Nelder-Mead algorithm (also known as the simplex algorithm), and
- an algorithm of your choice.

For each algorithm, record the \((x,y)\) points trialled by the algorithm and plot these on the figure you have made. This will show the path of each optimisation algorithm. Be sure to include a legend so it is clear which path corresponds to which algorithm.

**Hot Tip**: You can record the positions trialled by an optimisation algorithm using an *example* like this:
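A minimal sketch of one such pattern, assuming SciPy's `scipy.optimize.minimize` is the optimiser: wrap the objective function so that every point it is evaluated at is appended to a list. The helper name `record_path` is hypothetical.

```python
import numpy as np
from scipy.optimize import minimize


def rosenbrock(theta):
    """Rosenbrock's function with the parameters packed as theta = (x, y)."""
    x, y = theta
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2


def record_path(method, x0):
    """Minimise Rosenbrock's function; return the result and all trialled points."""
    trialled = []

    def objective(theta):
        # Store a copy, because an optimiser may reuse the same array between calls.
        trialled.append(np.array(theta, dtype=float))
        return rosenbrock(theta)

    result = minimize(objective, x0, method=method)
    return result, np.array(trialled)


x0 = np.array([-1.0, -1.0])
results, paths = {}, {}
for method in ("BFGS", "Nelder-Mead"):
    results[method], paths[method] = record_path(method, x0)
    print(f"{method}: stopped at {results[method].x} "
          f"after {len(paths[method])} function evaluations")
```

Because no analytic gradient is supplied here, BFGS estimates it by finite differences, so its recorded points include small perturbations around each iterate. If only the accepted iterates are wanted, the `callback` argument of `minimize` is an alternative way to record them.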

**Question 2** *(Total 5 marks available)*

Use the following code to generate some fake data that are drawn from the model \[ y \sim \mathcal{N}\left(\theta_0 + \theta_1x + \theta_2x^2 + \theta_3x^3,\sigma_{y}\right) \]

Even though this model includes \(x^2\) and \(x^3\) terms, this is still a **linear** problem that you can solve with linear algebra! Do you see why that is? (This is the generalised representation of ordinary least-squares!)
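One way to make the linearity explicit is to write the model as a matrix product. The symbols \(\vec{Y}\), \(\vec{A}\), and \(\vec{C}\) below are notation introduced here for illustration, with \(N\) data points and \(\sigma_{y}\) interpreted as a per-point standard deviation \(\sigma_{y,i}\): \[ \vec{Y} = \begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} , \quad \vec{A} = \begin{bmatrix} 1 & x_1 & x_1^2 & x_1^3 \\ \vdots & \vdots & \vdots & \vdots \\ 1 & x_N & x_N^2 & x_N^3 \end{bmatrix} , \quad \vec{\theta} = \begin{bmatrix} \theta_0 \\ \theta_1 \\ \theta_2 \\ \theta_3 \end{bmatrix} , \quad \vec{C} = \mathrm{diag}\left(\sigma_{y,1}^2, \ldots, \sigma_{y,N}^2\right) \quad . \] The model prediction \(\vec{A}\vec{\theta}\) is linear in \(\vec{\theta}\) even though it is cubic in \(x\), and the weighted least-squares estimate is \[ \hat{\vec{\theta}} = \left(\vec{A}\transpose\vec{C}^{-1}\vec{A}\right)^{-1}\vec{A}\transpose\vec{C}^{-1}\vec{Y} \quad . \]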

Let us assume that the generated data have no uncertainties in the \(x\)-direction, and that the uncertainties in the \(y\)-direction are normally distributed. Cast this problem with matrices and find a point estimate of the best-fitting model parameters. Make a plot showing the data, the true model, and the model using the best-fitting model parameters.
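The matrix solution might be sketched with NumPy as follows. The synthetic data here are a hypothetical stand-in with made-up parameter values, not the data set generated for this question, and the variable names are illustrative.

```python
import numpy as np

# Hypothetical stand-in data (NOT the problem set's data): a cubic with
# made-up parameters and Gaussian noise in the y-direction only.
rng = np.random.default_rng(0)
theta_true = np.array([1.0, -0.5, 0.25, 0.1])
x = np.sort(rng.uniform(-3, 3, size=30))
y_err = 0.5 * np.ones_like(x)

# Design matrix with columns 1, x, x^2, x^3.
A = np.vander(x, 4, increasing=True)
y = A @ theta_true + rng.normal(0, y_err)

# Weighted least squares: solve (A^T C^-1 A) theta = A^T C^-1 y,
# where C is the diagonal covariance matrix of the y uncertainties.
C_inv = np.diag(1.0 / y_err ** 2)
ATCinvA = A.T @ C_inv @ A
theta_hat = np.linalg.solve(ATCinvA, A.T @ C_inv @ y)

# Covariance matrix of the estimated parameters.
theta_cov = np.linalg.inv(ATCinvA)
print(theta_hat)
```

Using `np.linalg.solve` rather than explicitly inverting \(\vec{A}\transpose\vec{C}^{-1}\vec{A}\) to obtain the point estimate is generally the more numerically stable choice.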

**Question 3** *(Total 10 marks available)*

In this question we will use a leapfrog integrator to integrate the motion of a particle in a potential. This will be used in Question 4 when you implement a Hamiltonian Monte Carlo sampler. Recall that at time \(t\) a particle in some system with a location \(\vec{x}(t)\) and a momentum \(\vec{p}(t)\) can be fully described by the Hamiltonian \[ \mathcal{H}\left(\vec{x},\vec{p}\right) = U\left(\vec{x}\right) + K\left(\vec{p}\right) \] where the sum of the potential energy \(U\) and the kinetic energy \(K\) remains constant with time. Let \[ K(\vec{p}) = \frac{\vec{p}\transpose\vec{p}}{2} \] and \[ U(\vec{x}) = -\log{p(\vec{x})} \] where \(\log{p(\vec{x})}\) is short-hand notation for the log posterior probability for the model parameters \(\vec{x}\) (which we normally denote as \(\vec{\theta}\)) \[ \log{p(\vec{x})} \equiv \log{p(\vec{\theta}|\vec{y})} \quad . \] The time evolution of this particle is given by \[ \frac{d\vec{x}}{dt} = \frac{\partial\mathcal{H}}{\partial\vec{p}} = \vec{p} \] and \[ \frac{d\vec{p}}{dt} = -\frac{\partial\mathcal{H}}{\partial\vec{x}} = -\frac{\partial{}U(\vec{x})}{\partial{}\vec{x}} = \frac{\partial{}\log{p(\vec{x})}}{\partial{}\vec{x}} \quad . \] It is clear that to integrate the motion of a particle in this system we will need the derivative of \(\log{p(\vec{x})}\) with respect to \(\vec{x}\).
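A leapfrog integrator for these equations might be sketched as follows: a half step in momentum, a full step in position, then another half step in momentum. The function name `leapfrog`, its arguments, and the quadratic test potential are assumptions made for illustration and are not the potential used in this question.

```python
import numpy as np


def leapfrog(x, p, grad_U, step_size, n_steps):
    """Integrate Hamilton's equations for H = U(x) + p.p / 2.

    grad_U(x) must return dU/dx. Returns arrays of positions and
    momenta at every step, including the initial state.
    """
    x = np.atleast_1d(np.asarray(x, dtype=float)).copy()
    p = np.atleast_1d(np.asarray(p, dtype=float)).copy()
    xs, ps = [x.copy()], [p.copy()]
    for _ in range(n_steps):
        p = p - 0.5 * step_size * grad_U(x)  # half step in momentum
        x = x + step_size * p                # full step in position
        p = p - 0.5 * step_size * grad_U(x)  # half step in momentum
        xs.append(x.copy())
        ps.append(p.copy())
    return np.array(xs), np.array(ps)


# Illustrative potential (an assumption): U(x) = x.x / 2, so dU/dx = x.
xs, ps = leapfrog(x=[1.0], p=[0.0], grad_U=lambda x: x,
                  step_size=0.1, n_steps=100)

# Total energy U + K at every step of the orbit.
energy = 0.5 * np.sum(xs ** 2, axis=1) + 0.5 * np.sum(ps ** 2, axis=1)
print(energy.max() - energy.min())
```

The printed spread in total energy is small but not exactly zero, because the integrator takes finite steps. Plotting `xs` against `ps` gives the orbit in phase space.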

Let us assume a model where we have a single data point \(y\) that is drawn from the model \[ y \sim \mathcal{N}\left(y_t, \sigma_y\right) \] where our only model parameter is the true value \(y_t\). The single data point is \((y, \sigma_y) = (1.2, 1)\).

The negative log posterior probability function for this model can be written as:
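One possible form is sketched below. It assumes a flat (improper) prior on \(y_t\), so that the posterior is proportional to the likelihood, and it treats \(\sigma_y\) as a standard deviation. The function and constant names are chosen for illustration.

```python
import numpy as np

# The single data point (y, sigma_y) given in the text.
Y_OBS, SIGMA_Y = 1.2, 1.0


def neg_log_posterior(y_t):
    """U(y_t) = -log p(y_t | y), assuming a flat prior on y_t."""
    return (0.5 * ((Y_OBS - y_t) / SIGMA_Y) ** 2
            + 0.5 * np.log(2 * np.pi * SIGMA_Y ** 2))


def grad_neg_log_posterior(y_t):
    """dU/dy_t, the derivative needed by the leapfrog integrator."""
    return (y_t - Y_OBS) / SIGMA_Y ** 2


print(neg_log_posterior(0.0), grad_neg_log_posterior(0.0))
```

The additive constant does not affect the dynamics, since only the gradient enters the equations of motion, but it shifts the total energy by a fixed amount.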

Make a plot showing the particle position (x-axis) and its momentum (y-axis) at every step of the integrated orbit. In another plot, show the total energy of the system with each step. Is the energy conserved? Why or why not?

**Question 4** *(Total 15 marks available)*

Use the potential and momentum distribution defined in Question 3 to build a Hamiltonian Monte Carlo sampler for the model defined in Question 3. Run your sampler for at least 1,000 steps, ensuring that you take an appropriate number of steps (with an appropriate step size) in the leapfrog integration. Start the sampler from an initial value of \(x = -10\). Make a plot of the MCMC chain, and a histogram of the posterior on \(y_t\). Has the chain converged?
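The overall structure of such a sampler might look like the sketch below. It assumes a flat prior on \(y_t\) (so that \(U\) is the negative log likelihood up to a constant), and the step size, number of leapfrog steps, and function names are illustrative choices rather than recommended values.

```python
import numpy as np

Y_OBS, SIGMA_Y = 1.2, 1.0  # the single data point from Question 3


def U(x):
    """Potential: negative log posterior with additive constants dropped."""
    return 0.5 * ((Y_OBS - x) / SIGMA_Y) ** 2


def grad_U(x):
    return (x - Y_OBS) / SIGMA_Y ** 2


def hmc(x0, n_samples, step_size=0.1, n_leapfrog=20, seed=0):
    """Hamiltonian Monte Carlo for a single parameter with K(p) = p^2 / 2."""
    rng = np.random.default_rng(seed)
    chain = np.empty(n_samples)
    x, n_accepted = float(x0), 0
    for i in range(n_samples):
        p0 = rng.normal()  # draw momentum from N(0, 1), matching K(p)
        x_new, p = x, p0

        # Leapfrog integration of the trajectory.
        p -= 0.5 * step_size * grad_U(x_new)
        for step in range(n_leapfrog):
            x_new += step_size * p
            if step < n_leapfrog - 1:
                p -= step_size * grad_U(x_new)
        p -= 0.5 * step_size * grad_U(x_new)

        # Metropolis acceptance test on the change in total energy.
        dH = (U(x_new) + 0.5 * p ** 2) - (U(x) + 0.5 * p0 ** 2)
        if np.log(rng.uniform()) < -dH:
            x, n_accepted = x_new, n_accepted + 1
        chain[i] = x
    return chain, n_accepted / n_samples


chain, acceptance = hmc(x0=-10.0, n_samples=1000)
print(chain[200:].mean(), chain[200:].std(), acceptance)
```

Discarding the first 200 samples as burn-in is an arbitrary choice made here; inspect the chain plot to decide what is appropriate for your own sampler.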

**Question 5** *(Total 10 marks available)*

Implement the same model as in Question 4 using PyMC or Stan. Use the default sampling algorithm and default hyperparameters to sample the posterior. Make a plot comparing the chains produced by your sampler with those from PyMC or Stan. Comment on any differences.