Problem Set 5

Data analysis and machine learning

Due date: 2021-03-18 17:00 (Melbourne time) unless by prior arrangement

Your submission should be in the form of a PDF that includes relevant figures. The PDF can be compiled from \(\LaTeX\), exported from a Jupyter notebook, or produced by similar means. You must also submit code scripts that reproduce your work in full.

Marks will depend on the results/figures that you produce, and the clarity and depth of your accompanying interpretation. Don't just submit figures and code! You must demonstrate understanding and justification of what you have submitted. Please ensure figures have appropriate axes, that you have adopted sensible mathematical nomenclature, et cetera.

\[ \newcommand{\transpose}{^{\scriptscriptstyle \top}} \newcommand{\vec}[1]{\mathbf{#1}} \]

In total there are 4 questions in this problem set, with a total of 40 marks available.

Question 1

(Total 10 marks available)

In this file you will find a set of 100 observations. Each observation has some features \(x_1\) and \(x_2\), and some classification of 0 or 1.

Plot the data \(x_1\) and \(x_2\), coloured by the classification \(c\). Your task is to build a neural network that will predict some classification given \(x_1\) and \(x_2\). Your network should have one hidden layer, with three neurons in the hidden layer. Choose an activation function. Draw the network, labelling inputs and weights. Derive the updated estimates for the weights by finding the derivatives of the loss function with respect to the weights.
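
One possible notation for setting up the derivation is sketched here. It is an illustration, not the required answer: it assumes a sigmoid activation \(\sigma\), a binary cross-entropy loss \(\mathcal{L}\), bias terms \(\vec{b}_1\) and \(b_2\), and a learning rate \(\eta\), none of which are mandated by the question.

\[ \vec{h} = \sigma\left(\vec{W}_1\transpose\vec{x} + \vec{b}_1\right), \qquad \hat{y} = \sigma\left(\vec{w}_2\transpose\vec{h} + b_2\right) \]

\[ \mathcal{L} = -\frac{1}{N}\sum_{n=1}^{N}\left[c_n\log\hat{y}_n + (1 - c_n)\log\left(1 - \hat{y}_n\right)\right] \]

\[ w \leftarrow w - \eta\frac{\partial\mathcal{L}}{\partial w} \]

Here \(\vec{x} = (x_1, x_2)\transpose\), \(\vec{W}_1\) is a \(2 \times 3\) matrix of input-to-hidden weights, \(\vec{w}_2\) is the vector of three hidden-to-output weights, and \(w\) stands for any single weight. Under these assumptions each derivative follows from the chain rule together with \(\sigma'(z) = \sigma(z)\left(1 - \sigma(z)\right)\).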

Question 2

(Total 10 marks available)

Implement the neural network from Question 1 using numpy. Train the neural network using an appropriate portion of the data, and plot the training and test loss as a function of epoch. (You will need to make appropriate decisions regarding learning rates, initialisations, data segmentation, and number of epochs. Be sure to comment on these decisions.)
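
As a rough illustration of the overall structure such an implementation might take, the sketch below trains a 2-3-1 network with full-batch gradient descent and records the training and test loss at every epoch. Everything specific in it is an assumption made for illustration: the synthetic data stand in for the real data file, and the sigmoid activation, binary cross-entropy loss, bias terms, 80/20 split, learning rate `eta`, and number of epochs are arbitrary choices that you would need to justify or replace.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the 100 observations (assumption: the real data
# file is not reproduced here). Labels are linearly separable by design.
X = rng.normal(size=(100, 2))
c = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Shuffle, then hold out 20% of the observations as a test set (assumed split).
idx = rng.permutation(len(X))
n_train = 80
train, test = idx[:n_train], idx[n_train:]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    """Return hidden activations (n, 3) and output probabilities (n, 1)."""
    h = sigmoid(X @ W1 + b1)
    y = sigmoid(h @ W2 + b2)
    return h, y


def bce(y, c, eps=1e-12):
    """Mean binary cross-entropy, clipped to avoid log(0)."""
    y = np.clip(y, eps, 1 - eps)
    return float(-np.mean(c * np.log(y) + (1 - c) * np.log(1 - y)))


# Small random initial weights, zero biases (assumed initialisation).
W1 = rng.normal(scale=0.5, size=(2, 3))
b1 = np.zeros(3)
W2 = rng.normal(scale=0.5, size=(3, 1))
b2 = np.zeros(1)

eta, n_epochs = 0.5, 500  # assumed learning rate and number of epochs
train_loss, test_loss = [], []

for epoch in range(n_epochs):
    h, y = forward(X[train], W1, b1, W2, b2)

    # Record both losses before the weights are updated this epoch.
    train_loss.append(bce(y, c[train]))
    test_loss.append(bce(forward(X[test], W1, b1, W2, b2)[1], c[test]))

    # Backpropagation. For a sigmoid output with cross-entropy loss, the
    # gradient with respect to the output pre-activation is (y - c) / N.
    d2 = (y - c[train]) / n_train
    gW2 = h.T @ d2
    gb2 = d2.sum(axis=0)
    d1 = (d2 @ W2.T) * h * (1 - h)
    gW1 = X[train].T @ d1
    gb1 = d1.sum(axis=0)

    # Gradient descent step (all gradients were computed before updating).
    W1 -= eta * gW1
    b1 -= eta * gb1
    W2 -= eta * gW2
    b2 -= eta * gb2
```

The lists `train_loss` and `test_loss` can then be plotted against epoch number with the plotting library of your choice.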

Question 3

(Total 10 marks available)

Implement the neural network from Question 1 using a neural network package of your choice (e.g., keras/TensorFlow, PyTorch). Make the same plot as you did for Question 2.

Question 4

(Total 10 marks available)

A colleague is trying to train a neural network to make some classifications of objects. They have tried many different choices of initialisation, data splits, and network architecture, and they have encountered different problems for each choice they have made. For each of the situations below, the colleague has asked your opinion on what might be causing the problem, and what to do about it.

The training loss is decreasing with epoch but the test loss is unchanged, maybe even increasing.
The training loss is decreasing, but so slowly that it is going to take a lifetime to train!
The training and test losses are both increasing!
I've run for 1,000 epochs and the training loss is exactly the same as it was in the first epoch.
The training loss has decreased, and the test loss has decreased, but the test loss has not decreased as much as the training loss. Is there something I could do to improve the network predictions in a generalised way so it performs better on unseen data?