Sometimes I want to make a simple (or complex) graphical user interface (GUI) for exploratory data analysis. I use Python, but there are probably better ways to do this. This tutorial serves as a basic walkthrough on how to produce a Python GUI that contains an interactive matplotlib figure.

A caveat before we start: There are a few different options available for GUIs in Python and elsewhere (Tkinter, Qt, Enthought Traits, etc.). I’ve tried all of these at some point, but you shouldn’t take my word on what is described below as being “the best way” to do things; this is just what I have found to work best for me. Here is my setup (i.e., the things you may need to make this work):

  1. Anaconda Python installation. Both Python 2.7 and 3.x will work, but with 3.x you may need to run conda install pyside after you have installed Anaconda.

  2. Qt 4.8.7

If you have these requirements already and just want to test the GUI, here is the “TL;DR”:

git clone git@github.com:andycasey/pyside-intro.git
cd pyside-intro/
python my_gui.py

Let’s create a GUI

If you have Qt installed, then you should also have Qt Designer. Load it and create a new widget:

pyside-1

On the left you can see widgets, which you can drag and drop onto the background. Double-click buttons or text to edit them.

pyside-2

In Qt we use layouts and size policies to make things sit in the right spot. First let’s put our buttons into a “horizontal layout” and drop a horizontal spacer before the last button:

pyside-3

To put everything in the right spot (eventually..), right-click on the parent window and give it a “vertical layout”:

pyside-4

pyside-5

Now let’s add a default “widget” at the top, which will later become our matplotlib figure.

pyside-6

Let’s set the size policies so that the widget has MinimumExpanding policy for the vertical direction.

pyside-7

Produce Python code from your widget

In the Qt Designer, save your widget to a file with a .ui extension: my_gui.ui. Now from the terminal we will create a Python script called my_gui.py from our my_gui.ui file:

pyuic4 my_gui.ui > my_gui.py

The my_gui.py file looks like this:

# -*- coding: utf-8 -*-

# Form implementation generated from reading ui file 'my_gui.ui'
#
# Created by: PyQt4 UI code generator Unknown
#
# WARNING! All changes made in this file will be lost!

from PyQt4 import QtCore, QtGui

try:
    _fromUtf8 = QtCore.QString.fromUtf8
except AttributeError:
    def _fromUtf8(s):
        return s

try:
    _encoding = QtGui.QApplication.UnicodeUTF8
    def _translate(context, text, disambig):
        return QtGui.QApplication.translate(context, text, disambig, _encoding)
except AttributeError:
    def _translate(context, text, disambig):
        return QtGui.QApplication.translate(context, text, disambig)

class Ui_Form(object):
    def setupUi(self, Form):
        Form.setObjectName(_fromUtf8("Form"))
        Form.resize(640, 480)
        self.verticalLayout = QtGui.QVBoxLayout(Form)
        self.verticalLayout.setObjectName(_fromUtf8("verticalLayout"))
        self.widget = QtGui.QWidget(Form)
        sizePolicy = QtGui.QSizePolicy(QtGui.QSizePolicy.Preferred, QtGui.QSizePolicy.MinimumExpanding)
        sizePolicy.setHorizontalStretch(0)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(self.widget.sizePolicy().hasHeightForWidth())
        self.widget.setSizePolicy(sizePolicy)
        self.widget.setObjectName(_fromUtf8("widget"))
        self.verticalLayout.addWidget(self.widget)
        self.horizontalLayout = QtGui.QHBoxLayout()
        self.horizontalLayout.setObjectName(_fromUtf8("horizontalLayout"))
        self.pushButton = QtGui.QPushButton(Form)
        self.pushButton.setObjectName(_fromUtf8("pushButton"))
        self.horizontalLayout.addWidget(self.pushButton)
        self.pushButton_3 = QtGui.QPushButton(Form)
        self.pushButton_3.setObjectName(_fromUtf8("pushButton_3"))
        self.horizontalLayout.addWidget(self.pushButton_3)
        spacerItem = QtGui.QSpacerItem(40, 20, QtGui.QSizePolicy.Expanding, QtGui.QSizePolicy.Minimum)
        self.horizontalLayout.addItem(spacerItem)
        self.pushButton_2 = QtGui.QPushButton(Form)
        self.pushButton_2.setObjectName(_fromUtf8("pushButton_2"))
        self.horizontalLayout.addWidget(self.pushButton_2)
        self.verticalLayout.addLayout(self.horizontalLayout)

        self.retranslateUi(Form)
        QtCore.QMetaObject.connectSlotsByName(Form)

    def retranslateUi(self, Form):
        Form.setWindowTitle(_translate("Form", "Form", None))
        self.pushButton.setText(_translate("Form", "Show data", None))
        self.pushButton_3.setText(_translate("Form", "Change color", None))
        self.pushButton_2.setText(_translate("Form", "OK", None))

(I should note here that you can actually just import .ui files directly into Python through the QUiLoader class, but I have not shown that here just so you can see the “bare bones” of what the Python code looks like.)

Clean up the automatically-generated code

The code in our my_gui.py file is not quite ready yet. The pyuic4 tool is not perfectly suited to our needs, so we will have to change some of the code that it generated. Specifically, we should replace the PyQt imports with PySide imports, remove the translation function, and rename our widgets. To save time, below is the complete updated file (you can check the line-by-line differences yourself). You should be able to run this code and show a blank widget by just typing python my_gui.py from the terminal.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

""" My awesome GUI! """

from __future__ import (division, print_function, absolute_import,
                        unicode_literals)


from PySide import QtCore, QtGui


class MyGUI(QtGui.QDialog):

    def __init__(self, **kwargs):
        super(MyGUI, self).__init__(**kwargs)


        self.setGeometry(600, 480, 600, 480)
        self.move(QtGui.QApplication.desktop().screen().rect().center() \
            - self.rect().center())
        self.setWindowTitle("My awesome GUI")

        vertical_layout = QtGui.QVBoxLayout(self)
        self.figure_widget = QtGui.QWidget(self)
        sizePolicy = QtGui.QSizePolicy(
            QtGui.QSizePolicy.Preferred, QtGui.QSizePolicy.MinimumExpanding)
        sizePolicy.setHorizontalStretch(0)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(
            self.figure_widget.sizePolicy().hasHeightForWidth())
        self.figure_widget.setSizePolicy(sizePolicy)
        vertical_layout.addWidget(self.figure_widget)

        horizontal_layout = QtGui.QHBoxLayout()
        self.btn_show_data = QtGui.QPushButton(self)
        self.btn_show_data.setText("Show data")
        horizontal_layout.addWidget(self.btn_show_data)

        self.btn_change_color = QtGui.QPushButton(self)
        self.btn_change_color.setText("Change color")
        horizontal_layout.addWidget(self.btn_change_color)
        spacer = QtGui.QSpacerItem(
            40, 20, QtGui.QSizePolicy.Expanding, QtGui.QSizePolicy.Minimum)
        horizontal_layout.addItem(spacer)
        self.btn_ok = QtGui.QPushButton(self)
        self.btn_ok.setText("OK")
        horizontal_layout.addWidget(self.btn_ok)
        vertical_layout.addLayout(horizontal_layout)

        return None


if __name__ == "__main__":

    import sys

    app = QtGui.QApplication(sys.argv)
    window = MyGUI()
    window.exec_()
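
The mechanical part of the clean-up described above (swapping the imports and unwrapping the _fromUtf8 and _translate calls) can also be scripted. The following is only a rough sketch using the standard library: the function name pyqt4_to_pyside is made up for illustration, it assumes the generated code uses plain double-quoted strings as in the listing above, and it leaves the helper definitions and the widget renaming to be done by hand.

```python
import re

# A double-quoted string literal, as written by pyuic4 (assumption).
_QUOTED = r'"(?:[^"\\]|\\.)*"'


def pyqt4_to_pyside(source):
    """Hypothetical helper: rewrite pyuic4 output towards PySide."""
    # Swap the import line.
    source = source.replace("from PyQt4 import", "from PySide import")
    # _fromUtf8("text") -> "text"
    source = re.sub(r"_fromUtf8\((%s)\)" % _QUOTED, r"\1", source)
    # _translate("Form", "text", None) -> "text"
    source = re.sub(
        r"_translate\(%s,\s*(%s),\s*None\)" % (_QUOTED, _QUOTED),
        r"\1", source)
    return source


# A few lines taken from the generated file above.
generated = '''from PyQt4 import QtCore, QtGui
        Form.setObjectName(_fromUtf8("Form"))
        self.pushButton.setText(_translate("Form", "Show data", None))
'''

converted = pyqt4_to_pyside(generated)
print(converted)
```

The try/except blocks that define _fromUtf8 and _translate are not matched by these patterns, so they stay in the file until you delete them yourself.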

Integrate matplotlib functionality

Now we are going to add a matplotlib widget and get Python code to be executed when we click buttons. First create a file called mpl.py that contains the following code:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

""" Functionality to use matplotlib figures in PySide GUIs. """

from __future__ import (division, print_function, absolute_import,
                        unicode_literals)

import os
import matplotlib
from warnings import simplefilter

# Ignore warnings from matplotlib about fonts not being found.
simplefilter("ignore", UserWarning)

# Load our matplotlibrc file.
matplotlib.rc_file(os.path.join(os.path.dirname(__file__), "matplotlibrc"))

from matplotlib.backends.backend_qt4agg import FigureCanvasQTAgg as FigureCanvas
from matplotlib.figure import Figure

from PySide import QtCore, QtGui


class MPLWidget(FigureCanvas):
    """
    A widget to contain a matplotlib figure.
    """

    def __init__(self, parent=None, toolbar=False, tight_layout=True,
        autofocus=False, background_hack=True, **kwargs):
        """
        A widget to contain a matplotlib figure.

        :param autofocus: [optional]
            If set to `True`, the figure will be in focus when the mouse hovers
            over it so that keyboard shortcuts/matplotlib events can be used.
        """
        super(MPLWidget, self).__init__(Figure())
        
        self.figure = Figure(tight_layout=tight_layout)
        self.canvas = FigureCanvas(self.figure)
        self.canvas.setParent(parent)

        # Focus the canvas initially.
        self.canvas.setFocusPolicy(QtCore.Qt.WheelFocus)
        self.canvas.setFocus()

        self.toolbar = None 

        if autofocus:
            self._autofocus_cid = self.canvas.mpl_connect(
                "figure_enter_event", self._focus)


        self.figure.patch.set_facecolor([v/255. for v in 
            self.palette().color(QtGui.QPalette.Window).getRgb()[:3]])

        return None


    def _focus(self, event):
        """ Set the focus of the canvas. """
        self.canvas.setFocus()

And create a matplotlibrc file in the same folder, which lets you style your figures.

backend     : qt4agg
backend.qt4 : PySide

axes.titlesize  : 9.0
axes.labelsize  : 9.0 

xtick.labelsize : 9.0
ytick.labelsize : 9.0

legend.fontsize : 9.0  

font.family     : serif  
font.serif      : Computer Modern Roman  

text.antialiased : True
text.dvipnghack  : None

figure.figsize  : 7.3, 4.2  
figure.dpi      : 80

Add matplotlib functionality to our widget

Now let’s change our Widget to be a matplotlib widget (MPLWidget). In my_gui.py add import mpl at the top of the file and change this:

        self.figure_widget = QtGui.QWidget(self)

To this:

        self.figure_widget = mpl.MPLWidget(tight_layout=True)

And we’ll set up the axes at the end of our __init__ function:

        # Create a matplotlib axes.
        ax = self.figure_widget.figure.add_subplot(111)
        ax.set_xlabel(r"$x$")
        ax.set_ylabel(r"$y$")
        ax.scatter([], [])

Connect signals to widgets

All widgets have signals that they emit when something happens in the GUI. For example, when a button is clicked, it emits a clicked signal. We just need to connect these signals to functions we have written. Here’s one example:

        # Connect the signals.
        self.btn_ok.clicked.connect(self.close)
        self.btn_show_data.clicked.connect(self.show_data)


    def show_data(self):
        """ A function to show data. """

        x = np.random.uniform(size=100)
        y = np.random.uniform(size=100)

        self.figure_widget.figure.axes[0].collections[0].set_offsets(
            np.array([x, y]).T)
        self.figure_widget.draw()
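
If the signal idea is new to you, here is a toy stand-in written in plain Python. It is not how PySide implements signals (the Signal and Button classes below are invented for illustration), but it shows the pattern: a signal keeps a list of connected functions and calls each of them when it is emitted.

```python
class Signal(object):
    """Toy signal: remembers connected callables and calls them on emit."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        self._slots.append(slot)

    def emit(self, *args):
        for slot in self._slots:
            slot(*args)


class Button(object):
    """Toy button with a `clicked` signal, loosely mimicking QPushButton."""

    def __init__(self, text):
        self.text = text
        self.clicked = Signal()

    def click(self):
        # In a real GUI the toolkit emits this when the user clicks.
        self.clicked.emit()


log = []
btn_ok = Button("OK")
btn_ok.clicked.connect(lambda: log.append("closed"))
btn_ok.click()
print(log)
```

In the real GUI, Qt does the emitting for you; all you write is the connect call and the function it points to.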

Putting this all together, here is our completed my_gui.py file:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

""" My awesome GUI! """

from __future__ import (division, print_function, absolute_import,
                        unicode_literals)

import numpy as np

from PySide import QtCore, QtGui

import mpl


class MyGUI(QtGui.QDialog):

    def __init__(self, **kwargs):
        super(MyGUI, self).__init__(**kwargs)

        self.setGeometry(600, 480, 600, 480)
        self.move(QtGui.QApplication.desktop().screen().rect().center() \
            - self.rect().center())
        self.setWindowTitle("My awesome GUI")

        vertical_layout = QtGui.QVBoxLayout(self)
        self.figure_widget = mpl.MPLWidget(tight_layout=True)
        sizePolicy = QtGui.QSizePolicy(
            QtGui.QSizePolicy.Preferred, QtGui.QSizePolicy.MinimumExpanding)
        sizePolicy.setHorizontalStretch(0)
        sizePolicy.setVerticalStretch(0)
        sizePolicy.setHeightForWidth(
            self.figure_widget.sizePolicy().hasHeightForWidth())
        self.figure_widget.setSizePolicy(sizePolicy)
        vertical_layout.addWidget(self.figure_widget)

        horizontal_layout = QtGui.QHBoxLayout()
        self.btn_show_data = QtGui.QPushButton(self)
        self.btn_show_data.setText("Show data")
        horizontal_layout.addWidget(self.btn_show_data)

        self.btn_change_color = QtGui.QPushButton(self)
        self.btn_change_color.setText("Change color")
        horizontal_layout.addWidget(self.btn_change_color)
        spacer = QtGui.QSpacerItem(
            40, 20, QtGui.QSizePolicy.Expanding, QtGui.QSizePolicy.Minimum)
        horizontal_layout.addItem(spacer)
        self.btn_ok = QtGui.QPushButton(self)
        self.btn_ok.setText("OK")
        horizontal_layout.addWidget(self.btn_ok)
        vertical_layout.addLayout(horizontal_layout)

        # Create a matplotlib axes.
        ax = self.figure_widget.figure.add_subplot(111)
        ax.set_xlabel(r"$x$")
        ax.set_ylabel(r"$y$")
        ax.set_xlim(0, 1)
        ax.set_ylim(0, 1)
        ax.scatter([], [])

        # Connect the signals.
        self.btn_ok.clicked.connect(self.close)
        self.btn_show_data.clicked.connect(self.show_data)
        self.btn_change_color.clicked.connect(self.change_color)

        # Create a matplotlib event listener.
        self.figure_widget.mpl_connect("button_press_event", self.button_press)

        return None


    def show_data(self):
        """ A function to show data. """

        x = np.random.uniform(size=100)
        y = np.random.uniform(size=100)

        self.figure_widget.figure.axes[0].collections[0].set_offsets(
            np.array([x, y]).T)
        self.figure_widget.draw()


    def change_color(self):
        """ Change the color of the data points. """

        colors = "rgbmky"
        self.figure_widget.figure.axes[0].collections[0].set_color(
            colors[np.random.randint(0, len(colors))])
        self.figure_widget.draw()

        return None


    def button_press(self, event):
        """ A function for when a button has been pressed in the figure. """

        print("Button press event!", event)


if __name__ == "__main__":

    import sys

    app = QtGui.QApplication(sys.argv)
    window = MyGUI()
    window.exec_()

Which should look something like this:

pyside-8

That’s it!

End hiatus.

Today we (Schlaufman and I) posted our latest paper on extremely metal-poor (EMP) stars to the arXiv.

Extremely metal-poor stars are interesting because they uniquely inform us about the early chemical state of the universe, amongst other things (metal-free stellar populations, supernovae, etc.). Unfortunately EMP stars are extremely rare and usually intrinsically faint. In fact, progress on identifying and characterising EMP stars is limited by how faint these stars typically are.

To address this, Schlaufman and I have developed a novel selection technique that identifies intrinsically luminous EMP stars using only infrared all-sky photometry. There is good astrophysical basis for our selection, which we have iterated upon with a data-driven approach. Our selection is as efficient as existing techniques but the candidates we identify are typically 3 magnitudes (a factor of ~16 in flux) brighter than those found by other groups. That means it takes ~15 minutes to get good (high-resolution, high S/N) spectra for these stars, instead of the ~4 hours that would be required for targets identified by other methods.
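
For anyone who wants to check the arithmetic: the magnitude scale is logarithmic, so a difference of Δm magnitudes corresponds to a flux ratio of 10^(0.4 Δm). The sketch below assumes that the exposure time needed to reach a fixed S/N scales inversely with flux (i.e., source photon noise dominates), which is a simplification rather than a proper exposure-time calculation.

```python
def flux_ratio(delta_mag):
    """Flux ratio corresponding to a magnitude difference (Pogson's relation)."""
    return 10 ** (0.4 * delta_mag)


ratio = flux_ratio(3.0)

# Assumption: time to a fixed S/N scales as 1 / flux (photon-noise limited).
exposure_minutes = 4 * 60 / ratio

print("Flux ratio for 3 mag: {0:.1f}".format(ratio))
print("Equivalent of a 4 hour exposure: {0:.0f} min".format(exposure_minutes))
```

So three magnitudes buys roughly a factor of 16 in flux, which is where the difference between ~4 hours and ~15 minutes comes from.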

Using only infrared photometry has a number of advantages over existing selection techniques. Unlike objective prism surveys, our selection works well in crowded fields. Additionally, the effect of dust is ~50 times smaller in infrared photometry than in the optical. That means our approach is uniquely suited to places with high extinction (e.g., the bulge, where most Population III stars are expected to reside). And since our input photometry covers the entire sky we can focus on the Northern hemisphere, where there has been relatively little work on searching for extremely metal-poor stars.

Now that we have proven that our selection works, we are increasing our rate of follow-up: next semester we are submitting proposals for telescope time on five different telescopes (with apertures between 2.5 m and 8 m) to exploit our novel technique. Hopefully the telescope time allocation committees will take note of our quick turn-around in this paper: most of our 506 stars were only observed 11 weeks ago! And a lot of that time was spent with Schlaufman and me debating as to who would lead the first paper. We were both arguing for the other to lead.

In addition to calculating ensemble (homogenised) parameters for the sample of CoRoT stars in the Gaia-ESO Survey this week (blog post to appear later), I’ve been working with a student of Thomas Masseron’s. Masseron wanted to know if we could identify spectroscopic binary systems from limited, noisy photometry alone, and infer the system properties (e.g., stellar parameters of both stars, mass ratios). It’s a cool problem for a lot of reasons.

Spectroscopists often just throw away the binary systems because they aren’t worth the effort to analyse. The fraction of data thrown away for this reason is of order a few percent. That’s a lot of stars for big surveys, which means being able to identify these objects from photometry is a big win. There are obvious scientific extensions too: the binary fraction itself, binary fraction distributions for multiple populations within globular clusters, mass ratio distributions, etc. Without any astrophysical priors on mass/radius/luminosity ratios, it turns out you can identify these systems very easily with modest photometric data. However, as one might expect, the quality of inference depends on the properties of individual systems: stars of similar mass and evolutionary state are much harder to distinguish, because you’re essentially just seeing a not-quite-right blackbody curve. The student (L. Orfali) will investigate the inference quality for different binary system properties, and determine what minimum photometric quality (and in which bands) is required to constrain these systems. Spectroscopic modelling will occur next week too, but that part is trivial and easier to intuit.

Laura Watkins (STScI) gave an excellent talk this week on the possible existence of an intermediate mass black hole at the center of omega Cen. Lots of exquisite data (HST and ground-based spectra), with very detailed modelling. It’s an awesome project!

Rule of Observing

The first rule of observing is: you don’t leave the telescope until your data are reduced and analysed. If that seems like too much to ask then you’re using old analysis approaches and your competition isn’t.

When I was learning how to analyse high-resolution stellar spectra I wrote an intuitive, graphical software package for analysing spectra quickly and precisely. What used to take ~1 day per star now takes a couple of minutes, and it means I (and now, all my collaborators) follow the rules! It means we can vet candidates quickly, find the most interesting objects and return to them in the same night. Now the reduction takes more than an order of magnitude longer than the analysis! The code is described in Chapter 3 of my thesis, and a screenshot is below. There are more objective (read: better) ways to do stellar spectroscopy – and I will post about this in the future – but the code allows us to get a very good idea on what we’re looking at, very quickly. That’s important.

SMH

The last three nights I’ve been observing on Magellan (with Schlaufman) using the MIKE spectrograph, looking for extremely metal-poor stars using a novel technique devised by Schlaufman and me. The selection approach is as efficient as (or more efficient than) existing techniques, but the candidates are ~3 magnitudes brighter. That makes the requisite follow-up spectroscopy achievable for a large sample of stars. And our approach only uses existing all-sky surveys, so targets are available throughout the year no matter where you’re observing from. The approach will appear in print later this year.

You should always reduce your data carefully by hand. Unless you’re lazy or time-poor. If that’s the case and you’re using MIKE on Magellan (where this post is written from) then the CarPy pipeline will do a pretty good reduction for you in most cases.

However it turns out it’s broken on the Las Campanas Observatory computers. Here’s how to fix it:

setenv PYPREFIX /usr/local/CarPy
setenv PYTHONBASE /usr/local/CarPy/builds/Darwin.10.6.x86_64/
source /usr/local/CarPy/Setup.csh 
setenv PATH /usr/local/CarPy/builds/Darwin.10.6.x86_64/bin:/usr/local/CarPy/builds/Darwin.10.6.x86_64/Python.framework/Versions/2.5/bin:/usr/local/CarPy/dist/bin_local:/usr/local/CarPy/dist/bin:/usr/local/CarPy/dist/bin_oldnumeric:/usr/local/CarPy/builds/Darwin.10.6.x86_64/bin:/usr/local/CarPy/builds/Darwin.10.6.x86_64/Python.framework/Versions/Current/bin:/usr/local/CarPy/dist/bin_local:/usr/local/CarPy/dist/bin:/usr/local/CarPy/dist/bin_oldnumeric:/Library/Frameworks/EPD64.framework/Versions/Current/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/texbin:/usr/X11/bin:/usr/local/wcstools/bin:/usr/local/isis/bin:/usr/local/cdsclient/bin:/Applications/itt/idl/bin:/usr/local/magellan/bin:/usr/local/lco/bin
setenv PYTHONPATH /usr/local/CarPy/dist/lib_local:/usr/local/CarPy/dist/lib:/usr/local/CarPy/dist/lib_oldnumeric
setenv PYTHONDATA /usr/local/CarPy/datafiles

Now you can follow the instructions properly. But there is one additional step for the blue arm. After you’ve run this step:

mikesetup -db DATABASE_FILE -blue -all -mk Makefile

You will need to add a flag in the lampblue/Makefile file before running make. To find the right line number (it’s usually 59):

cd lampblue
grep -n mikeMatchLamps Makefile
59:	mikeMatchLamps lampblue_lamp1136fbspecs.fits -x 5 -o 4

Then just add -maxsh 300 so it looks like:

grep -n mikeMatchLamps Makefile
59:	mikeMatchLamps lampblue_lamp1136fbspecs.fits -x 5 -o 4 -maxsh 300

And now you should be good to make.
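
If you have many nights to reduce, the Makefile edit can be scripted. This is just a sketch: the function name add_maxsh is made up, and it assumes the mikeMatchLamps command sits on its own recipe line as in the grep output above. It works on the Makefile contents as a string, so reading and writing lampblue/Makefile is left to you.

```python
def add_maxsh(makefile_text, max_shift=300):
    """Append `-maxsh <max_shift>` to mikeMatchLamps lines that lack it."""
    patched = []
    for line in makefile_text.splitlines():
        is_match_lamps = line.strip().startswith("mikeMatchLamps")
        if is_match_lamps and "-maxsh" not in line:
            line = "{0} -maxsh {1}".format(line.rstrip(), max_shift)
        patched.append(line)
    return "\n".join(patched) + "\n"


# Example contents (the target name is illustrative, the recipe line is the one above).
example = (
    "lamps:\n"
    "\tmikeMatchLamps lampblue_lamp1136fbspecs.fits -x 5 -o 4\n"
)

patched = add_maxsh(example)
print(patched)
```

Running the function twice is harmless, because lines that already carry -maxsh are left alone.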

Triple J is an Australian radio station and every year they run the Triple J’s Hottest 100, a democratically elected pick of the top 100 songs produced in the previous year. It is the largest democratic music election in the world, and each year it becomes more popular. Every person can vote for 10 songs, and on Australia Day they count down to #1. Any song is eligible for a vote, but Triple J usually only lists the top ~1,000 songs on their website.

Last year I wanted to make “The most informed decision I ever made” – I would listen to every song on the Triple J website, give it a score, and then choose my top 10 from my highest rated songs. It took around 2 weeks to listen to every song, and there were certainly some crappy songs. But after all of it, I had a great playlist of songs with “4 or more stars”. Last year I had to write some Python code to scrape all the songs from Triple J, search YouTube, download the video from YouTube, extract the audio to MP3, and put it in an iTunes playlist.

This year it’s even easier because they have put all 1,008 songs in a Spotify playlist. Here’s what I didn’t do, but what I would do if I wanted to grab all these songs:

Steps to making the most informed decision you’ll ever make

** Note: Read all the steps first, you might find you can skip Step #1 :-) **

  1. Open Spotify and find the Triple J Hottest 100 Candidates playlist

  2. Select all songs, copy, and paste to notepad. This is what it should look like. Save this file as hottest-100-candidates.txt in a new folder.

  3. I’m assuming you have Ruby installed here. If so, from a terminal use gem install spotify-to-mp3

  4. spotify-to-mp3 hottest-100-candidates.txt will find the artist and name for each song, search for it on Grooveshark, and download it to the current directory.

  5. Add all your songs to an iTunes playlist. Listen to it. Rate each song out of five stars as it ends.

  6. Vote!

After I had listened to ~2 weeks of music last year, carefully rating each song, I forgot to do the last step. So for me, “The most informed decision I ever made” became “The most informed decision I never made”.

In this post I’m going to give some very basic examples on how to get Python and TOPCAT (or other VO/SAMP applications) to talk to each other. The Python module you’ll need is called SAMPy. This module will eventually be incorporated into the AstroPy package. To install SAMPy:

pip install sampy

(or if you must, use easy_install sampy)

For our first example we’ll get TOPCAT to notify Python when we highlight a point or row in TOPCAT:

""" Interact with TOPCAT via SAMPy at the most basic level """

import sampy

if __name__ == "__main__":

    # The 'Hub' is for multiple applications to talk to each other.
    hub = sampy.SAMPHubServer()
    hub.start()

    # We need a client that will connect to the Hub. TOPCAT will also
    # connect to our Hub.
    client = sampy.SAMPIntegratedClient(metadata={
        "samp.name": "topdog",
        "samp.description.text": "Live demos are destined for disaster."
        })
    client.connect()

    # Create a 'callback' - something to do when a point or row is highlighted in TOPCAT
    def receive_samp_notification(private_key, sender_id, mtype, params, extra):
        print("Notification of {0} from {0} ({1}): {2}, {3}".format(mtype, sender_id, private_key, params, extra))

    # Register the callback
    client.bindReceiveNotification("table.highlight.row", receive_samp_notification)

Steps:

  1. Run the above code by putting it in a file named basic_example.py then from the terminal write: python basic_example.py

  2. Open TOPCAT and load a file. Ensure there are 3 icons in the SAMP Clients tab at the bottom of the TOPCAT GUI.

  3. In the “Current Table Properties”, make sure the “Broadcast Row” icon is ticked.

  4. Highlight a row and look at the Python output:

In [1]: run -i basic_example.py
[SAMP] Info    (2013-12-10T16:01:45.882344): Hub started

In [2]: 
Notification of table.highlight.row from table.highlight.row (cli#3): 5338b24be010f6ca598c744f3eea3afc, {'url': 'file:/Users/andycasey/thesis/presentations/2013/2013-csiro-astro/data/fld_list_230611', 'row': '4'}

I noticed something weird today. The exact same inputs and code were exhibiting completely different behaviour on two different clusters. The only difference between them was SciPy versions: 0.10.1 (correct behaviour) and 0.12.0 (incorrect behaviour). Here’s the line in question:

p1, cov_p, infodict, mesg, ier = scipy.optimize.leastsq(errfunc, p0.copy()[0], args=args, full_output=True)

The correct behaviour on 0.10.1:

ipdb> scipy.__version__
'0.10.1'
ipdb> errfunc(p0.copy()[0], *args)
array([ 0.06799529,  0.07318012,  0.06680378,  0.05200964,  0.05814424,
        0.09025226,  0.09680308,  0.05702837, -0.14674592, -0.22665459,
       -0.15485406, -0.01311882,  0.08502507,  0.10292671,  0.08557168,
        0.05098229,  0.04956718,  0.06520266, -0.05950772, -0.29728424])
ipdb> scipy.optimize.leastsq(errfunc, p0.copy()[0], args=args)
(array([  4.78875656e+03,   6.67606610e-02,   6.42906789e-01]), 2)

The incorrect behaviour on 0.12.0 (after excluding all other differences and possibilities):

ipdb> scipy.__version__
'0.12.0'
ipdb> errfunc(p0.copy()[0], *args)
array([ 0.06799529,  0.07318012,  0.06680378,  0.05200964,  0.05814424,
        0.09025226,  0.09680308,  0.05702837, -0.14674592, -0.22665459,
       -0.15485406, -0.01311882,  0.08502507,  0.10292671,  0.08557168,
        0.05098229,  0.04956718,  0.06520266, -0.05950772, -0.29728424])
ipdb> scipy.optimize.leastsq(errfunc, p0.copy()[0], args=args)
(array([  4.78874773e+03,   7.96486918e-02,   4.68803543e-01]), 2)

You can see that errfunc behaves the same way, but scipy.optimize.leastsq does not. Well, if you ever have this problem too then all you need to do is set the epsfcn argument explicitly. The epsfcn argument is described as:

A suitable step length for the forward-difference approximation of the Jacobian (for Dfun=None). If epsfcn is less than the machine precision, it is assumed that the relative errors in the functions are of the order of the machine precision.

In scipy 0.10.1 the default value is 0.0, but in 0.12.0 the default value is None. In this example, 0.0 and None are very different beasts, which makes the default behaviour for scipy.optimize.leastsq unintuitively different between versions.

On 0.12.0 (the previously ‘incorrect’ behaviour):

ipdb> scipy.__version__
'0.12.0'
ipdb> optimize.leastsq(errfunc, p0.copy()[0], args=args, epsfcn=0.0)
(array([  4.78875656e+03,   6.67606608e-02,   6.42906789e-01]), 2)
ipdb> optimize.leastsq(errfunc, p0.copy()[0], args=args, epsfcn=None)
(array([  4.78874773e+03,   7.96486918e-02,   4.68803543e-01]), 2)

So there you go. If you’re using scipy.optimize.leastsq, make sure you specify epsfcn as 0.0 (or whatever) to be sure your code is future-compatible.
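
To see why a different default can change the answer, here is a small standard-library sketch of a forward-difference derivative. It is not SciPy's code: the step rule h = sqrt(max(epsfcn, machine precision)) * |x| is my understanding of what the underlying MINPACK routine does, so treat it as an assumption, and the function names are made up.

```python
import math
import sys


def forward_difference(func, x, epsfcn=0.0):
    """Forward-difference derivative with an assumed MINPACK-style step."""
    eps = math.sqrt(max(epsfcn, sys.float_info.epsilon))
    h = eps * abs(x) or eps  # fall back to eps when x is exactly zero
    return (func(x + h) - func(x)) / h


def model(x):
    return math.exp(x)  # the exact derivative is also exp(x)


x0 = 1.0
exact = math.exp(x0)

# epsfcn below machine precision: the step is set by machine precision.
error_small = abs(forward_difference(model, x0, epsfcn=0.0) - exact)

# A larger epsfcn gives a larger step and a cruder derivative estimate.
error_large = abs(forward_difference(model, x0, epsfcn=1e-4) - exact)

print(error_small, error_large)
```

The estimated Jacobian feeds every iteration of the optimiser, so a different step size can send the fit to a slightly different place, which is consistent with what we saw above.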

I use git every day. You should use git or some other git-esque system when writing research papers, because it’s a great way to track all of your changes. The real-world problem is that my co-authors don’t use git.

Typically I’ll draft a manuscript, distribute the document (PDF and/or LaTeX) by email, and wait for feedback. Some will provide changes to the LaTeX, others will annotate the PDF, some will provide itemised text responses, and some will print it, scribble on it, and hand me a butchered manuscript.

After sending the manuscript around once to everyone, I don’t want them to have to read everything through again: they should just notice the changes. It’s easier, and faster for everyone. To accomplish this I’ve installed latexdiff. It’s a Perl script that highlights the differences between two TeX files. You can download it here, or just read about it.

Once latexdiff is installed, let’s initiate a git repository and start writing a paper.

mrmagoo:research andycasey$ mkdir my-paper
mrmagoo:research andycasey$ cd my-paper/
mrmagoo:my-paper andycasey$ git init
Initialized empty Git repository in /Users/andycasey/research/my-paper/.git/
mrmagoo:my-paper andycasey$ echo "This is a fake paper to test latexdiff" > README
mrmagoo:my-paper andycasey$ git add README 
mrmagoo:my-paper andycasey$ git commit -m "Initial commit"
[master da886dd] Initial commit
 1 file changed, 1 insertion(+)
  create mode 100644 README

When I make major revisions to a paper (e.g., when I send out copies to co-authors), I want to use latexdiff to automatically create a file that highlights the changes from the previous version. Let’s set up a post-commit hook by putting the following code into a new file in your folder called .git/hooks/post-commit. Make sure this is executable by using chmod +x .git/hooks/post-commit. Now any time we commit to the repository, this script will run.

#!/bin/bash

# Post-commit hook for revision-awsm-ness

function gettempfilename()
{
    tempfilename=$1-$RANDOM$RANDOM.tex
    if [ -e $tempfilename ]
    then
        tempfilename=$(gettempfilename "$1")
    fi
    echo $tempfilename
}

# A revision is any commit whose message contains "revision vN", "revision vN.N", and so on
revision_pattern='revision v[0-9]+(\.[0-9]+)*'
num_revisions=$(git log --pretty=oneline | grep -icE "$revision_pattern")

# See if there are at least two revisions so that we can do a comparison
if [ "$num_revisions" -lt 2 ]
then
    exit
else

    # Check to see if the last named revision is actually the commit hash that just happened
    current_hash=$(git rev-parse HEAD)
    current_revision=$(git log --pretty=oneline | grep -iE "$revision_pattern" | head -n 1 | grep -oiE "$revision_pattern" | head -n 1 | awk '{ print $2 }')
    most_recent_revision_hash=$(git log --pretty=oneline | grep -iE "$revision_pattern" | head -n 1 | awk '{ print $1 }')
    
    # If the last commit wasn't the one that contained the most recent revision number, then there's nothing to do.
    if [ "$current_hash" != "$most_recent_revision_hash" ]; then
        exit
    fi

    previous_revision=$(git log --pretty=oneline | grep -iE "$revision_pattern" | sed -n 2p | grep -oiE "$revision_pattern" | head -n 1 | awk '{ print $2 }')
    previous_revision_hash=$(git log --pretty=oneline | grep -iE "$revision_pattern" | sed -n 2p | awk '{ print $1 }')

    # Use the most edited tex file in this repository as the manuscript, unless the manuscript filename was specified as an argument
    most_edited_tex_file=$(git log --pretty=format: --name-only | sort | uniq -c | sort -rn | grep '\.tex$' | head -n 1 | awk '{ print $2 }')
    manuscript_filename=${1:-$most_edited_tex_file}
    manuscript_filename_no_ext="${manuscript_filename%.*}"

    # If we can't find the manuscript filename, then exit.
    if [ ! -f "$manuscript_filename" ]; then
        echo "Manuscript file $manuscript_filename does not exist."
        exit 1
    fi

    # Get the manuscript file associated with the previous revision hash
    previous_manuscript_filename=$(gettempfilename previous)
    git show $previous_revision_hash:$manuscript_filename > $previous_manuscript_filename

    # Use latexdiff to create a difference version
    diff_ms_no_file_ext="$manuscript_filename_no_ext-revisions-$current_revision"
    latexdiff $previous_manuscript_filename $manuscript_filename > $diff_ms_no_file_ext.tex
    rm -f $previous_manuscript_filename

    # Compile the difference file
    pdflatex $diff_ms_no_file_ext.tex > /dev/null 2>&1 
    # bibtex works on the .aux basename, not on the .tex file
    bibtex $diff_ms_no_file_ext > /dev/null 2>&1
    pdflatex $diff_ms_no_file_ext.tex > /dev/null 2>&1
    pdflatex $diff_ms_no_file_ext.tex > /dev/null 2>&1
    
    # Remove the intermediate files
    ls $diff_ms_no_file_ext.* | grep -v pdf | xargs rm -f
    echo "Revisions to $manuscript_filename made between $previous_revision"\
         "and $current_revision are highlighted in $diff_ms_no_file_ext.pdf"
fi
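
The gettempfilename helper in the hook builds its own random filename and checks that nothing with that name exists yet. If you would rather not roll your own, mktemp does the same job in one step. Here is a minimal sketch, assuming mktemp is available on your system; make_scratch_file is a name I made up for illustration, and the file lands in your temporary directory rather than in the repository.

```shell
#!/bin/sh
# Sketch only: make_scratch_file is a hypothetical helper, not part of the hook.
make_scratch_file() {
    # mktemp replaces the trailing XXXXXX with random characters and
    # creates the (empty) file itself, so no existence check is needed.
    mktemp "${TMPDIR:-/tmp}/${1:-previous}-XXXXXX"
}

scratch=$(make_scratch_file previous)
echo "Scratch file: $scratch"

# Clean up once we are done with it
rm -f "$scratch"
```

latexdiff does not care about the file extension of its inputs, so the missing .tex suffix should not matter here.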

Okay, now let’s work with an example of a “real” paper. Here’s the LaTeX for a manuscript:

\documentclass{article}

\begin{document}

\title{Fun with git and \LaTeX{}}
\author{Andrew R. Casey, Alice, Bob}

\maketitle

\begin{abstract}
One does not simply write an abstract.
\end{abstract}

\section{Introduction}
Here is the text of our introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta }
\end{equation}


\section{Conclusion}
There are many loose seals in the ocean.

\end{document}

Let’s commit this to the repository, and make a note in the commit message that this is revision v0.1 of the paper.

mrmagoo:my-paper andycasey$ git add manuscript.tex
mrmagoo:my-paper andycasey$ git commit -m "First draft of paper, so revision v0.1"
[master 8857fdb] First draft of paper, so revision v0.1
 1 file changed, 26 insertions(+)
  create mode 100644 manuscript.tex

I send this version around to my co-authors Alice and Bob, and wait for their responses. Each time someone responds, I implement their suggestions and commit the changes to the repository.

Alice says:

> You forgot my last name! I think you’re missing a constant from Equation 1. Also, can we use “simply not” instead of “not simply”?

We make the changes, then commit to the repository.

mrmagoo:my-paper andycasey$ git add manuscript.tex 
mrmagoo:my-paper andycasey$ git commit -m "Implemented changes suggested by Alice"
[master 6df5f6f] Implemented changes suggested by Alice
 1 file changed, 2 insertions(+), 2 deletions(-)

Bob says:

> You should be more explicit in the conclusions. Perhaps mention how Buster should act accordingly? Also, you forgot my last name!

Bob’s suggestions are good, so we implement them. Here’s what the final LaTeX looks like:

\documentclass{article}

\begin{document}

\title{Fun with git and \LaTeX{}}
\author{Andrew R. Casey, Alice A. Aaronson, Bob B. Baaronson}

\maketitle

\begin{abstract}
One does simply not write an abstract.
\end{abstract}

\section{Introduction}
Here is the introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta } + C
\end{equation}


\section{Conclusion}
There are many loose seals in the ocean. Buster is not allowed to swim in the ocean.

\end{document}

Since Bob is the last of our co-authors, once we’ve implemented his changes we can call this v0.2. Notice that “Revision v0.2” can appear anywhere in the commit message, and that it is not case-sensitive.

mrmagoo:my-paper andycasey$ git add manuscript.tex
mrmagoo:my-paper andycasey$ git commit -m "Put in changes suggested by Bob. Revision v0.2 ready to be sent out"
Revisions to manuscript.tex made between v0.1 and v0.2 are highlighted in manuscript-revisions-v0.2.pdf
[master de29aa6] Put in changes suggested by Bob. Revision v0.2 ready to be sent out
 1 file changed, 2 insertions(+), 2 deletions(-)

Notice the extra message at the start? Our post-commit hook has run and seen that we have more than one revision in our commit history. It found the previous revision, compared the two TeX files with latexdiff, and compiled the result for us!

Take a look:

mrmagoo:my-paper andycasey$ ls
README              manuscript-revisions-v0.2.pdf   manuscript.tex

Now we can send out the revised version (manuscript.pdf, compiled from manuscript.tex as usual), as well as a PDF with the highlighted changes between version 0.1 and version 0.2 (manuscript-revisions-v0.2.pdf). This will happen any time you commit with something like revision vX in the commit message. You can also be as pedantic as you want: revision v1, revision v32.4, revision v0.1.3, etc. are all acceptable. Easiest way to create automatic PDF diff files, ever!
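
If you want to check whether a commit message will be picked up as a revision before you commit, you can try the matching on its own. This is a minimal sketch of the idea using grep -E with a case-insensitive pattern. The extract_revision function is a name I made up for illustration, and the pattern is my reading of the rule described above rather than a line copied from the hook.

```shell
#!/bin/sh
# Sketch only: extract_revision is a hypothetical helper, not part of the hook.
extract_revision() {
    # Keep only the "revision vN[.N...]" phrase (case-insensitive),
    # then print its second word, which is the version label.
    printf '%s\n' "$1" \
        | grep -oiE 'revision v[0-9]+(\.[0-9]+)*' \
        | head -n 1 \
        | awk '{ print $2 }'
}

extract_revision "First draft of paper, so revision v0.1"                               # v0.1
extract_revision "Put in changes suggested by Bob. Revision v0.2 ready to be sent out"  # v0.2
extract_revision "revision v0.1.3"                                                      # v0.1.3
extract_revision "Implemented changes suggested by Alice"                               # prints nothing
```

A message that prints nothing here is treated as an ordinary commit, so no diff PDF is produced for it.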

Here’s what manuscript-revisions-v0.2.pdf looks like:

manuscript-revisions

This makes it much easier for your co-authors to digest what has changed, and should shorten the turnaround between manuscript revisions. If you were wondering, it doesn’t take the first TeX file it sees; it finds the TeX file that has been edited the most times in the repository, which is probably your manuscript!
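
To see how the “most edited TeX file” guess works, here is a small sketch of the counting step run on a made-up list of filenames instead of a real git history. In the hook the list comes from git log --pretty=format: --name-only; most_edited_tex is a name I invented for this example.

```shell
#!/bin/sh
# Sketch only: most_edited_tex is a hypothetical helper. It reads one
# filename per line on standard input and prints the .tex file that
# appears most often.
most_edited_tex() {
    grep '\.tex$' \
        | sort \
        | uniq -c \
        | sort -rn \
        | head -n 1 \
        | awk '{ print $2 }'
}

# Made-up history: manuscript.tex was touched three times, table.tex once.
printf '%s\n' README manuscript.tex manuscript.tex table.tex "" manuscript.tex \
    | most_edited_tex
# prints: manuscript.tex
```

Like the hook itself, this assumes filenames without spaces, because awk splits on whitespace.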