Sometimes I want to make a simple (or complex) graphical user interface (GUI) for exploratory data analysis. I use Python, but there are probably better ways to do this. This tutorial serves as a basic walkthrough on how to produce a Python GUI that contains an interactive matplotlib figure.
A caveat before we start: There are a few different options available for GUIs in Python and elsewhere (Tkinter, Qt, Enthought Traits, etc). I’ve tried all of these at some point, but you shouldn’t take what is described below as being “the best way” to do things; this is just what I have found to work best for me. Here is my setup (i.e., things you may need to make this work):
Anaconda Python installation. Both Python 2.7 and 3.x will work, but with 3.x you may need to do conda install pyside after you have installed Anaconda.
If you have these requirements already and just want to test the GUI, here is the “TL;DR”:
Let’s create a GUI
If you have Qt installed, then you should also have Qt Designer. Load it and create a new widget:
On the left you can see widgets, which you can drag and drop onto the background. Double-click buttons or text to edit them.
In Qt we use layouts and size policies to make things sit in the right spot. First let’s put our buttons into a “horizontal layout” and drop a horizontal spacer before the last button:
To put everything in the right spot (eventually..), right-click on the parent window and give it a “vertical layout”:
Now let’s add a default “widget” at the top, which will later become our matplotlib figure.
Let’s set the size policies so that the widget has a MinimumExpanding policy in the vertical direction.
Produce Python code from your widget
In the Qt Designer, save your widget to a file with a .ui extension: my_gui.ui. Now from the terminal we will create a Python script called my_gui.py from our my_gui.ui file:
The my_gui.py file looks like this:
(I should note here that you can actually just import .ui files directly into Python through the QUiLoader class, but I have not shown that here just so you can see the “bare bones” of what the Python code looks like.)
Clean up the automatically-generated code
The code in our my_gui.py file is not quite ready yet. This is because the pyuic4 tool is not perfectly suited to our needs, so we will have to change some of the code that it has generated. Specifically, we should replace the PyQt imports with PySide imports, remove the translation function, and rename our widgets. To save time, below is the complete updated file (you can check the line-by-line differences yourself). You should be able to run this code and show a blank widget by just typing python my_gui.py from the terminal.
Integrate matplotlib functionality
Now we are going to add a matplotlib widget and get Python code to be executed when we click buttons. First create a file called mpl.py that contains the following code:
And create a matplotlibrc file in the same folder, which lets you style your figures.
Add matplotlib functionality to our widget
Now let’s change our Widget to be a matplotlib widget (MPLWidget). In my_gui.py add import mpl at the top of the file and change this:
And we’ll set up the axes at the end of our __init__ function:
Connect signals to widgets
All widgets have signals that they emit when something happens in the GUI. For example, when a button is clicked, it emits a clicked signal. We just need to connect these signals to a function we have written. Here’s one example:
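The mechanics can also be mimicked in plain Python, without Qt installed. The sketch below is a toy: the `Signal` and `Button` classes and the `on_plot_clicked` function are hypothetical stand-ins, not Qt's implementation. With a real Qt button the connection itself is a single call of the form `button.clicked.connect(on_plot_clicked)`.

```python
class Signal:
    """Toy stand-in for a Qt signal: keeps a list of callbacks (slots)."""

    def __init__(self):
        self._slots = []

    def connect(self, slot):
        # Register a function to be called whenever the signal is emitted.
        self._slots.append(slot)

    def emit(self, *args):
        # Call every connected function, in the order it was connected.
        for slot in self._slots:
            slot(*args)


class Button:
    """Toy stand-in for a push button that exposes a `clicked` signal."""

    def __init__(self, label):
        self.label = label
        self.clicked = Signal()

    def click(self):
        # In a real GUI the toolkit emits this when the user presses the button.
        self.clicked.emit()


messages = []


def on_plot_clicked():
    # Hypothetical slot: in the real GUI this is where the figure would be updated.
    messages.append("re-draw the figure")


button = Button("Plot")
button.clicked.connect(on_plot_clicked)
button.click()
print(messages)
```

The point to take away is that the button knows nothing about plotting; it only announces that it was clicked, and whatever has been connected gets called.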
Putting this all together, here is our completed my_gui.py file:
Today we (Schlaufman and I) posted our latest paper on extremely metal-poor (EMP) stars to the arXiv.
Extremely metal-poor stars are interesting because they uniquely inform us about the early chemical state of the universe, amongst other things (metal-free stellar populations, supernovae, etc). Unfortunately, EMP stars are extremely rare and usually intrinsically faint. In fact, progress on identifying and characterising EMP stars is limited because of how faint these stars typically are.
To address this, Schlaufman and I have developed a novel selection technique that identifies intrinsically luminous EMP stars using only infrared all-sky photometry. There is a good astrophysical basis for our selection, which we have iterated upon with a data-driven approach. Our selection is as efficient as existing techniques, but the candidates we identify are typically 3 magnitudes (a factor of ~16) brighter than those found by other groups. That means it takes ~15 minutes to get good (high-resolution, high S/N) spectra for these stars, instead of the ~4 hours that would be required for targets identified by other methods.
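The link between magnitudes and exposure time can be checked with Pogson's relation, in which a difference of Δm magnitudes corresponds to a flux ratio of 10^(0.4 Δm). Here is a minimal sketch, assuming photon-limited observations so that the exposure time needed to reach a fixed S/N scales inversely with flux. The function name is illustrative.

```python
def flux_ratio(delta_mag):
    """Flux ratio corresponding to a magnitude difference (Pogson's relation)."""
    return 10 ** (0.4 * delta_mag)


ratio = flux_ratio(3.0)
print(f"3 magnitudes brighter = {ratio:.1f} times the flux")

# Assumption: photon-limited regime, so S/N scales as sqrt(flux * time) and
# the time needed for a fixed S/N scales as 1 / flux.
faint_exposure_min = 4 * 60
bright_exposure_min = faint_exposure_min / ratio
print(f"{faint_exposure_min} min for the faint target -> "
      f"{bright_exposure_min:.0f} min for the bright one")
```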
Using only infrared photometry has a number of advantages over existing selection techniques. Unlike objective prism surveys, our selection works well in crowded fields. Additionally, the effect of dust is ~50 times smaller in the infrared than in the optical. That means our approach is uniquely suited to places with high extinction (e.g., the bulge, where most Population III stars are expected to reside). And since our input photometry covers the entire sky, we can focus on the Northern hemisphere, where there has been relatively little work on searching for extremely metal-poor stars.
Now that we have proved our selection we are increasing our rate of follow-up: next semester we are submitting proposals for telescope time on five different telescopes (with apertures from 2.5 m to 8 m) to exploit our novel technique. Hopefully the telescope time allocation committees will take note of our quick turn-around in this paper: most of our 506 stars were only observed 11 weeks ago! And a lot of that time was spent with Schlaufman and me debating who would lead the first paper. We were both arguing for the other to lead.
In addition to calculating ensemble (homogenised) parameters for the sample of CoRoT stars in the Gaia-ESO Survey this week (blog post to appear later), I’ve been working with a student of Thomas Masseron’s. Masseron wanted to know if we could identify spectroscopic binary systems from limited, noisy photometry alone, and infer the system properties (e.g., stellar parameters of both stars, mass ratios). It’s a cool problem for a lot of reasons.
Spectroscopists often just throw away the binary systems because they aren’t worth the effort to analyse. The fraction of data thrown away for this reason is of order a few percent. That’s a lot of stars for big surveys, which means being able to identify these objects from photometry is a big win. There are obvious scientific extensions too: the binary fraction itself, binary fraction distributions for multiple populations within globular clusters, mass ratio distributions, etc. Without any astrophysical priors on mass/radius/luminosity ratios, it turns out you can identify these systems very easily with modest photometric data. However, as one might expect, the quality of inference depends on the properties of individual systems: stars of similar mass and evolutionary state are much harder to distinguish, because you’re essentially just seeing a not-quite-right blackbody curve. The student (L. Orfali) will investigate the inference quality for different binary system properties, and determine the minimum photometric quality (and the bands) required to constrain these systems. Spectroscopic modelling will occur next week too, but that part is trivial and easier to intuit.
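The “not-quite-right blackbody” idea can be illustrated with a toy calculation. The sketch below treats each star as a pure blackbody (a strong simplification, and not the model used in the project), takes rough B, V and K wavelengths as illustrative bands, and uses made-up temperatures and relative radii. It fits a single blackbody to the optical colour of a binary and then checks how well that single star reproduces the optical-to-infrared colour.

```python
import math

H = 6.62607015e-34   # Planck constant (J s)
C = 2.99792458e8     # speed of light (m / s)
K_B = 1.380649e-23   # Boltzmann constant (J / K)


def planck(wavelength_m, temperature_k):
    """Blackbody spectral radiance B_lambda(T)."""
    x = H * C / (wavelength_m * K_B * temperature_k)
    return 2.0 * H * C**2 / wavelength_m**5 / math.expm1(x)


def colour(band_a_m, band_b_m, components):
    """Uncalibrated magnitude difference (band a minus band b).

    `components` is a list of (temperature_k, relative_radius) pairs; each
    star contributes flux proportional to radius**2 * B_lambda(T).
    """
    flux_a = sum(r**2 * planck(band_a_m, t) for t, r in components)
    flux_b = sum(r**2 * planck(band_b_m, t) for t, r in components)
    return -2.5 * math.log10(flux_a / flux_b)


def matching_temperature(target, band_a_m, band_b_m, lo=2000.0, hi=20000.0):
    """Bisect for the single-blackbody temperature with the target colour."""
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        # A hotter blackbody is bluer: its short-minus-long colour is smaller.
        if colour(band_a_m, band_b_m, [(mid, 1.0)]) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


B, V, K = 0.44e-6, 0.55e-6, 2.2e-6  # approximate band wavelengths (m)

# Twin binary: two identical stars have exactly the colours of one star.
twin_offset = colour(V, K, [(5500.0, 1.0), (5500.0, 1.0)]) - colour(V, K, [(5500.0, 1.0)])

# Unequal binary: match the optical colour, then compare the V-K colour.
binary = [(6000.0, 1.2), (4000.0, 0.7)]  # illustrative values only
t_single = matching_temperature(colour(B, V, binary), B, V)
infrared_excess = colour(V, K, binary) - colour(V, K, [(t_single, 1.0)])

print(f"twin binary V-K offset: {twin_offset:.3f} mag")
print(f"best single star: {t_single:.0f} K, V-K excess: {infrared_excess:.3f} mag")
```

In this toy setup the twin binary is indistinguishable from a single star by colour alone, while the unequal pair shows a small infrared excess relative to the single star that matches its optical colour.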
Laura Watkins (STScI) gave an excellent talk this week on the possible existence of an intermediate-mass black hole at the center of omega Cen. Lots of exquisite data (HST and ground-based spectra), with very detailed modelling. It’s an awesome project!
The first rule of observing is: you don’t leave the telescope until your data are reduced and analysed. If that seems like too much to ask then you’re using old analysis approaches and your competition isn’t.
When I was learning how to analyse high-resolution stellar spectra I wrote an intuitive, graphical software package for analysing spectra quickly and precisely. What used to take ~1 day per star now takes a couple of minutes, and it means I (and now, all my collaborators) follow the rules! It means we can vet candidates quickly, find the most interesting objects and return to them in the same night. Now the reduction takes more than an order of magnitude longer than the analysis! The code is described in Chapter 3 of my thesis, and a screenshot is below. There are more objective (read: better) ways to do stellar spectroscopy – and I will post about this in the future – but the code allows us to get a very good idea on what we’re looking at, very quickly. That’s important.
The last three nights I’ve been observing on Magellan (with Schlaufman) using the MIKE spectrograph, looking for extremely metal-poor stars using a novel technique devised by Schlaufman and me. The selection approach is at least as efficient as existing techniques, but the candidates are ~3 magnitudes brighter. That makes the requisite follow-up spectroscopy achievable for a large sample of stars. And our approach uses only existing all-sky surveys, so targets are available throughout the year no matter where you’re observing from. The approach will appear in print later this year.
You should always reduce your data carefully by hand. Unless you’re lazy or time-poor. If that’s the case and you’re using MIKE on Magellan (where this post is written from) then the CarPy pipeline will do a pretty good reduction for you in most cases.
However it turns out it’s broken on the Las Campanas Observatory computers. Here’s how to fix it:
Now you can follow the instructions properly. But there is one additional step for the blue arm. After you’ve run this step:
Triple J is an Australian radio station and every year they run Triple J’s Hottest 100, a democratically elected pick of the top 100 songs produced in the previous year. It is the largest democratic music election in the world, and each year it becomes more popular. Every person can vote for 10 songs, and on Australia Day they count down to #1. Any song is eligible for a vote, but Triple J usually only lists the ~top 1,000 songs on their website.
Last year I wanted to make “The most informed decision I ever made” – I would listen to every song on the Triple J website, give it a score, and then choose my top 10 from my highest rated songs. It took around 2 weeks to listen to every song, and there were certainly some crappy songs. But after all of it, I had a great playlist of songs with “4 or more stars”. Last year I had to write some Python code to scrape all the songs from Triple J, search YouTube, download the video from YouTube, extract the audio to MP3, and put it in an iTunes playlist.
This year it’s even easier because they have put all 1,008 songs in a Spotify playlist. Here’s what I didn’t do, but what I would do if I wanted to grab all these songs:
Steps to making the most informed decision you’ll ever make
** Note: Read all the steps first, you might find you can skip Step #1 :-) **
After I had listened to ~2 weeks of music last year, carefully rating each song, I forgot to do the last step. So for me, “The most informed decision I ever made” became “The most informed decision I never made”.
In this post I’m going to give some very basic examples of how to get Python and TOPCAT (or other VO/SAMP applications) to talk to each other. The Python module you’ll need is called SAMPy. This module will eventually be incorporated into the AstroPy package. To install SAMPy:
pip install sampy
(or if you must, use easy_install sampy)
For our first example we’ll get TOPCAT to notify Python when we highlight a point or row in TOPCAT:
Run the above code by putting it in a file named basic_example.py then from the terminal write: python basic_example.py
Open TOPCAT and load a file. Ensure there are 3 icons in the SAMP Clients tab at the bottom of the TOPCAT GUI.
In the “Current Table Properties”, make sure the “Broadcast Row” icon is ticked.
I noticed something weird today. The exact same inputs and code were exhibiting completely different behaviour on two different clusters.
The only difference between them was SciPy versions: 0.10.1 (correct behaviour) and 0.12.0 (incorrect behaviour). Here’s the line
The correct behaviour on 0.10.1:
The incorrect behaviour on 0.12.0 (after excluding all other differences and possibilities):
You can see that errfunc behaves the same way, but scipy.optimize.leastsq does not. Well, if you ever have this problem too then all you need to do is set the epsfcn keyword argument explicitly. The epsfcn argument is described as:
A suitable step length for the forward-difference approximation of the Jacobian (for Dfun=None). If epsfcn is less than the machine precision, it is assumed that the relative errors in the functions are of the order of the machine precision.
In scipy 0.10.1 the default value is 0.0, but in 0.12.0 the default value is None. In this example, 0.0 and None are very different beasts, which makes the default behaviour for scipy.optimize.leastsq unintuitively different between versions.
On 0.12.0 (the previously ‘incorrect’ behaviour):
So there you go. If you’re using scipy.optimize.leastsq, make sure you specify epsfcn explicitly (as 0.0, or whatever value suits your problem) to be sure your code is future-compatible.
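To see why the step length matters at all, here is a small pure-Python sketch of a forward-difference derivative. The step rule, h = sqrt(max(epsfcn, machine precision)) × |x|, is assumed to follow the MINPACK convention that leastsq wraps; treat it as an assumption rather than a statement about any particular scipy version. The names `forward_difference` and `model` are illustrative.

```python
import math
import sys

MACHINE_EPS = sys.float_info.epsilon  # about 2.2e-16 for double precision


def forward_difference(func, x, epsfcn=0.0):
    """Forward-difference derivative with a MINPACK-style relative step."""
    # Assumed convention: an epsfcn below machine precision is replaced by it.
    eps = math.sqrt(max(epsfcn, MACHINE_EPS))
    step = eps * abs(x) if x != 0.0 else eps
    return (func(x + step) - func(x)) / step


def model(x):
    return math.sin(x)


x0 = 1.0
exact = math.cos(x0)
for epsfcn in (0.0, 1e-8, 1e-2):
    estimate = forward_difference(model, x0, epsfcn)
    print(f"epsfcn={epsfcn:g}: derivative={estimate:.6f}, "
          f"error={abs(estimate - exact):.1e}")
```

A step that is too large gives a visibly biased derivative, and a biased Jacobian can send the optimiser somewhere quite different. That is a good reason to pass epsfcn explicitly instead of relying on a default that may change.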
I use git every day. You should use git or some other git-esque system when writing research papers, because it’s a great way to track all of your changes. The real-world problem is that my co-authors don’t use git.
Typically I’ll draft a manuscript, distribute the document (PDF and/or LaTeX) by email, and wait for feedback.
Some will provide changes to the LaTeX, others will annotate the PDF, some will provide itemised text responses, and some will print it,
scribble on it, and hand me a butchered manuscript.
After sending the manuscript around once to everyone, I don’t want them to have to read everything through again: they should only need to look at the changes. It’s easier and faster for everyone. To accomplish this I’ve installed latexdiff. It’s a Perl script that highlights the differences between two TeX files. You can download it here, or just read about it.
Once latexdiff is installed, let’s initiate a git repository and start writing a paper.
When I make major revisions to a paper (e.g., when I send out copies to co-authors), I want to use latexdiff to automatically create a file that highlights the changes from the previous version. Let’s set up a post-commit hook by putting the following code into a new file called .git/hooks/post-commit in your repository. Make sure this is executable by using chmod +x .git/hooks/post-commit. Now any time we commit to the repository, this script will run.
Okay, now let’s work with an example of a “real” paper. Here’s the LaTeX for a manuscript:
Let’s commit this to the repository, and make a note in the commit message that this is version v0.1 of the paper.
I send this version around to my co-authors Alice and Bob, and wait for their responses. Each time someone responds, I implement their suggestions and commit the changes to the repository.
> You forgot my last name! I think you’re missing a constant from Equation 1. Also, can we use “simply not” instead of “not simply”?
We make the changes, then commit to the repository.
> You should be more explicit in the conclusions. Perhaps mention how Buster should act accordingly? Also, you forgot my last name!
Bob’s suggestions are good, so we implement them. Here’s what the final LaTeX looks like:
Since Bob is the last of our co-authors, once we’ve implemented his changes we can call this v0.2. Notice in this commit message that Revision v0.2 can be anywhere in the commit message, and is not case-sensitive.
Notice the extra message at the start? Our post-commit hook has run and seen that we have more than one revision in our commit history. It has found the previous version, compared the two TeX files, and compiled the result for us!
Take a look:
Now we can send out the revised version (manuscript.pdf), as well as a PDF with the highlighted changes between version 0.1 and version 0.2 (manuscript-revisions-v0.2.pdf). This will happen any time you commit with something like revision vX in the commit message. Also you can be as pedantic as you want: revision v1, revision v32.4, revision v0.1.3, etc. are all acceptable. Easiest way to create automatic PDF diff files, ever!
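For the curious, the kind of matching described here (anywhere in the message, case-insensitive, any dotted version number) can be sketched with a regular expression. This is an illustration in Python and not the hook itself; the pattern and the `find_revision` helper are hypothetical.

```python
import re

# Hypothetical pattern: "revision", whitespace, then "v" and a dotted number.
REVISION_PATTERN = re.compile(r"revision\s+v(\d+(?:\.\d+)*)", re.IGNORECASE)


def find_revision(commit_message):
    """Return the version string (e.g. '0.2'), or None if there is no tag."""
    match = REVISION_PATTERN.search(commit_message)
    return match.group(1) if match else None


for message in (
    "Implemented Bob's comments. Revision v0.2",
    "REVISION v32.4: referee response",
    "Fixed a typo in the abstract",
):
    print(repr(message), "->", find_revision(message))
```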
Here’s what manuscript-revisions-v0.2.pdf looks like:
This makes it much easier for your co-authors to digest what has changed, and will drastically shorten the turnaround between manuscript revisions. If you were wondering, it doesn’t take the first TeX file it sees; it finds the TeX file that has been edited the most times in the repository, which is probably your manuscript!
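The “most edited TeX file” idea can be sketched in a few lines of Python. This is not the hook's own code: the `most_edited_tex_file` helper is hypothetical, and it assumes you feed it the text produced by something like `git log --name-only --pretty=format:`, which lists the paths changed in each commit.

```python
from collections import Counter


def most_edited_tex_file(name_only_log):
    """Return the .tex path that appears in the most commits, or None.

    `name_only_log` is text with one changed path per line and blank lines
    between commits (assumed to come from `git log --name-only`).
    """
    counts = Counter(
        line.strip()
        for line in name_only_log.splitlines()
        if line.strip().endswith(".tex")
    )
    return counts.most_common(1)[0][0] if counts else None


# Made-up log text for three commits, newest first.
example_log = """
manuscript.tex

manuscript.tex
figures/plot.py

table1.tex
manuscript.tex
"""
print(most_edited_tex_file(example_log))
```

Counting commits per file is a crude heuristic, but in a repository that holds one paper it should pick out the manuscript.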