Triple J is an Australian radio station, and every year they run the Triple J Hottest 100, a democratically elected pick of the top 100 songs produced in the previous year. It is the largest democratic music election in the world, and each year it becomes more popular. Every person can vote for 10 songs, and on Australia Day they count down to #1. Any song is eligible for a vote, but Triple J usually lists only the top ~1,000 songs on their website.

Last year I wanted to make “The most informed decision I ever made”: I would listen to every song on the Triple J website, give it a score, and then choose my top 10 from my highest-rated songs. It took around 2 weeks to listen to every song, and there were certainly some crappy songs. But after all of it, I had a great playlist of songs with “4 or more stars”. To get there I had to write some Python code to scrape all the songs from Triple J, search YouTube, download each video from YouTube, extract the audio to MP3, and put it in an iTunes playlist.

This year it’s even easier because they have put all 1,008 songs in a Spotify playlist. Here’s what I didn’t do, but what I would do if I wanted to grab all these songs:

Steps to making the most informed decision you’ll ever make

Note: Read all the steps first; you might find you can skip Step #1 :–)

  1. Open Spotify and find the Triple J Hottest 100 Candidates playlist

  2. Select all songs, copy, and paste to notepad. This is what it should look like. Save this file as hottest-100-candidates.txt in a new folder.

  3. I’m assuming you have Ruby installed here. If so, from a terminal use gem install spotify-to-mp3

  4. spotify-to-mp3 hottest-100-candidates.txt will find the artist and name for each song, search for it on Grooveshark, and download it to the current directory.

  5. Add all your songs to an iTunes playlist. Listen to it. Rate each song out of five stars as it ends.

  6. Vote!
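
Steps 5 and 6 boil down to a filter and a sort. Here is a minimal sketch in Python, assuming the ratings have been exported into a list of (title, stars) pairs. The shortlist function and the sample ratings are hypothetical, not something iTunes or Triple J provides:

```python
def shortlist(ratings, min_stars=4, votes=10):
    """Return up to `votes` titles rated at least `min_stars`, best first."""
    keepers = [(stars, title) for title, stars in ratings if stars >= min_stars]
    # Sort by stars (descending), breaking ties alphabetically by title
    keepers.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for stars, title in keepers[:votes]]


# Hypothetical ratings, purely for illustration
ratings = [("Song A", 5), ("Song B", 3), ("Song C", 4), ("Song D", 5)]
print(shortlist(ratings))
```

If more than 10 songs end up with 4 or more stars, the tie-break here is alphabetical, which is arbitrary; in practice you would probably re-listen to the borderline songs.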

After I had listened to ~2 weeks of music last year, carefully rating each song, I forgot to do the last step. So for me, “The most informed decision I ever made” became “The most informed decision I never made”.

In the coming weeks I will be putting up posts about how hard or easy it would be to rig the world’s biggest democratic music election, and some statistical predictions for this year’s Hottest 100.

~ astrowizicist

 

In this post I’m going to give some very basic examples of how to get Python and TOPCAT (or other VO/SAMP applications) to talk to each other. The Python module you’ll need is called SAMPy. This module will eventually be incorporated into the AstroPy package. To install SAMPy:

pip install sampy

(or if you must, use easy_install sampy)

For our first example we’ll get TOPCAT to notify Python when we highlight a point or row in TOPCAT:

""" Interact with TOPCAT via SAMPy at the most basic level """

import sampy

if __name__ == "__main__":

    # The 'Hub' is for multiple applications to talk to each other.
    hub = sampy.SAMPHubServer()
    hub.start()

    # We need a client that will connect to the Hub. TOPCAT will also
    # connect to our Hub.
    client = sampy.SAMPIntegratedClient(metadata={
        "samp.name": "topdog",
        "samp.description.text": "Live demos are destined for disaster."
        })
    client.connect()

    # Create a 'callback' - something to do when a point or row is highlighted in TOPCAT
    def receive_samp_notification(private_key, sender_id, mtype, params, extra):
        print("Notification of {0} from {0} ({1}): {2}, {3}".format(mtype, sender_id, private_key, params, extra))

    # Register the callback
    client.bindReceiveNotification("table.highlight.row", receive_samp_notification)

Steps:

(1) Save the above code in a file named basic_example.py, then run it: either python basic_example.py from the terminal or, as in the output below, run -i basic_example.py inside IPython.

(2) Open TOPCAT and load a file. Ensure there are 3 icons in the SAMP Clients tab at the bottom of the TOPCAT GUI.

(3) In the “Current Table Properties”, make sure the “Broadcast Row” icon is ticked.

(4) Highlight a row and look at the Python output:

In [1]: run -i basic_example.py
[SAMP] Info    (2013-12-10T16:01:45.882344): Hub started

In [2]:
Notification of table.highlight.row from table.highlight.row (cli#3): 5338b24be010f6ca598c744f3eea3afc, {'url': 'file:/Users/andycasey/thesis/presentations/2013/2013-csiro-astro/data/fld_list_230611', 'row': '4'}
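
The params dictionary is where the useful information lives. Here is a small sketch (written for Python 3) of how a callback might unpack it, assuming, as in the output above, that TOPCAT sends the url and row as strings. parse_highlight is a hypothetical helper, not part of SAMPy, and the path is a shortened stand-in:

```python
from urllib.parse import urlparse


def parse_highlight(params):
    """Return (local_path, row_index) from a table.highlight.row message."""
    path = urlparse(params["url"]).path
    # The row arrives as a string, so convert it before using it as an index
    return path, int(params["row"])


# Shortened, made-up stand-in for the params printed above
params = {"url": "file:/Users/andycasey/data/fld_list_230611", "row": "4"}
path, row = parse_highlight(params)
print(path, row)
```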
 

I noticed something weird today. The exact same inputs and code were exhibiting completely different behaviour on two different clusters. The only difference between them was SciPy versions: 0.10.1 (correct behaviour) and 0.12.0 (incorrect behaviour). Here’s the line in question:

p1, cov_p, infodict, mesg, ier = scipy.optimize.leastsq(errfunc, p0.copy()[0], args=args, full_output=True)

The correct behaviour on 0.10.1:

ipdb> scipy.__version__
'0.10.1'
ipdb> errfunc(p0.copy()[0], *args)
array([ 0.06799529,  0.07318012,  0.06680378,  0.05200964,  0.05814424,
        0.09025226,  0.09680308,  0.05702837, -0.14674592, -0.22665459,
       -0.15485406, -0.01311882,  0.08502507,  0.10292671,  0.08557168,
        0.05098229,  0.04956718,  0.06520266, -0.05950772, -0.29728424])
ipdb> scipy.optimize.leastsq(errfunc, p0.copy()[0], args=args)
(array([  4.78875656e+03,   6.67606610e-02,   6.42906789e-01]), 2)

The incorrect behaviour on 0.12.0 (after excluding all other differences and possibilities):

ipdb> scipy.__version__
'0.12.0'
ipdb> errfunc(p0.copy()[0], *args)
array([ 0.06799529,  0.07318012,  0.06680378,  0.05200964,  0.05814424,
        0.09025226,  0.09680308,  0.05702837, -0.14674592, -0.22665459,
       -0.15485406, -0.01311882,  0.08502507,  0.10292671,  0.08557168,
        0.05098229,  0.04956718,  0.06520266, -0.05950772, -0.29728424])
ipdb> scipy.optimize.leastsq(errfunc, p0.copy()[0], args=args)
(array([  4.78874773e+03,   7.96486918e-02,   4.68803543e-01]), 2)

You can see that errfunc behaves the same way, but scipy.optimize.leastsq does not. Well, if you ever have this problem too then all you need to do is set the epsfcn argument explicitly. The epsfcn argument is described as:

A suitable step length for the forward-difference approximation of the Jacobian (for Dfun=None). If epsfcn is less than the machine precision, it is assumed that the relative errors in the functions are of the order of the machine precision.

In scipy 0.10.1 the default value is 0.0, but in 0.12.0 the default value is None. In this example, 0.0 and None are very different beasts, which makes the default behaviour for scipy.optimize.leastsq unintuitively different between versions.

On 0.12.0 (the previously ‘incorrect’ behaviour):

ipdb> scipy.__version__
'0.12.0'
ipdb> optimize.leastsq(errfunc, p0.copy()[0], args=args, epsfcn=0.0)
(array([  4.78875656e+03,   6.67606608e-02,   6.42906789e-01]), 2)
ipdb> optimize.leastsq(errfunc, p0.copy()[0], args=args, epsfcn=None)
(array([  4.78874773e+03,   7.96486918e-02,   4.68803543e-01]), 2)

So there you go. If you’re using scipy.optimize.leastsq, make sure you specify epsfcn as 0.0 (or whatever) to be sure your code is future-compatible.
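
To see why the step length matters at all, here is a rough sketch of a forward-difference derivative in plain Python. It is not what leastsq does internally (the function name and the two step sizes are chosen purely for illustration). It only shows that the same function gives different derivative estimates for different steps, which is enough to send an optimiser down a different path.

```python
import math
import sys


def forward_difference(func, x, step):
    """Approximate func'(x) using a single forward step of size `step`."""
    return (func(x + step) - func(x)) / step


exact = math.exp(1.0)  # d/dx exp(x) at x = 1 is exp(1)

# Two illustrative step lengths: a coarse one, and sqrt(machine epsilon)
coarse = forward_difference(math.exp, 1.0, 1e-2)
fine = forward_difference(math.exp, 1.0, math.sqrt(sys.float_info.epsilon))

print("coarse step error:", abs(coarse - exact))
print("fine step error:  ", abs(fine - exact))
```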

~ astrowizicist

 

I use git every day. You should use git or some other git-esque system when writing research papers, because it’s a great way to track all of your changes. The real-world problem is that my co-authors don’t use git.

Typically I’ll draft a manuscript, distribute the document (PDF and/or LaTeX) by email, and wait for feedback. Some will provide changes to the LaTeX, others will annotate the PDF, some will provide itemised text responses, and some will print it, scribble on it, and hand me a butchered manuscript.

After sending the manuscript around once to everyone, I don’t want them to have to read everything through again: they should only need to look at the changes. It’s easier and faster for everyone. To accomplish this I’ve installed latexdiff. It’s a Perl script that highlights the differences between two TeX files. You can download it here, or just read about it.

Once latexdiff is installed, let’s initiate a git repository and start writing a paper.

mrmagoo:research andycasey$ mkdir my-paper
mrmagoo:research andycasey$ cd my-paper/
mrmagoo:my-paper andycasey$ git init
Initialized empty Git repository in /Users/andycasey/research/my-paper/.git/
mrmagoo:my-paper andycasey$ echo "This is a fake paper to test latexdiff" > README
mrmagoo:my-paper andycasey$ git add README 
mrmagoo:my-paper andycasey$ git commit -m "Initial commit"
[master da886dd] Initial commit
 1 file changed, 1 insertion(+)
  create mode 100644 README

When I make major revisions to a paper (e.g., when I send out copies to co-authors), I want to use latexdiff to automatically create a file that highlights the changes from the previous version. Let’s set up a post-commit hook by putting the following code into a new file in your folder called .git/hooks/post-commit. Make sure this is executable by using chmod +x .git/hooks/post-commit. Now any time we commit to the repository, this script will run.

#!/bin/bash

# Post-commit hook for revision-awsm-ness
# (bash rather than sh, because it relies on $RANDOM and [[ ... ]])

# Print the name of a temporary TeX file that does not exist yet
gettempfilename()
{
    tempfilename=$1-$RANDOM$RANDOM.tex
    if [ -e "$tempfilename" ]
    then
        tempfilename=$(gettempfilename "$1")
    fi
    echo "$tempfilename"
}

# A commit counts as a revision if its message contains "revision vX",
# "revision vX.Y", "revision vX.Y.Z" and so on (case-insensitive)
revision_regex='revision v\d+(?:\.\d+)*'

num_revisions=$(git log --pretty=oneline | grep -icP "$revision_regex")

# See if there are at least two revisions so that we can do a comparison
if [ "$num_revisions" -lt 2 ]
then
    exit
else

    # Check to see if the last named revision is actually the commit hash that just happened
    current_hash=$(git rev-parse HEAD)
    current_revision=$(git log --pretty=oneline | grep -iP "$revision_regex" | head -n 1 | grep -oPi "$revision_regex" | head -n 1 | awk '{ print $2 }')
    most_recent_revision_hash=$(git log --pretty=oneline | grep -iP "$revision_regex" | head -n 1 | awk '{ print $1 }')

    # If the last commit wasn't the one that contained the most recent revision number, then there's nothing to do.
    if [[ "$current_hash" != "$most_recent_revision_hash" ]]; then
        exit
    fi

    previous_revision=$(git log --pretty=oneline | grep -iP "$revision_regex" | sed -n 2p | grep -oPi "$revision_regex" | head -n 1 | awk '{ print $2 }')
    previous_revision_hash=$(git log --pretty=oneline | grep -iP "$revision_regex" | sed -n 2p | awk '{ print $1 }')

    # Use the most edited tex file in this repository as the manuscript, unless the manuscript filename was specified as an argument
    most_edited_tex_file=$(git log --pretty=format: --name-only | sort | uniq -c | sort -rg | grep "\.tex$" | head -n 1 | awk '{ print $2 }')
    manuscript_filename=${1:-$most_edited_tex_file}
    manuscript_filename_no_ext="${manuscript_filename%.*}"

    # If we can't find the manuscript filename, then exit.
    if [ ! -f "$manuscript_filename" ]; then
        echo "Manuscript file $manuscript_filename does not exist."
        exit 1
    fi

    # Get the manuscript file associated with the previous revision hash
    previous_manuscript_filename=$(gettempfilename previous)
    git show "$previous_revision_hash:$manuscript_filename" > "$previous_manuscript_filename"

    # Use latexdiff to create a difference version
    diff_ms_no_file_ext="$manuscript_filename_no_ext-revisions-$current_revision"
    latexdiff "$previous_manuscript_filename" "$manuscript_filename" > "$diff_ms_no_file_ext.tex"
    rm -f "$previous_manuscript_filename"

    # Compile the difference file (bibtex takes the name without the .tex extension)
    pdflatex "$diff_ms_no_file_ext.tex" > /dev/null 2>&1
    bibtex "$diff_ms_no_file_ext" > /dev/null 2>&1
    pdflatex "$diff_ms_no_file_ext.tex" > /dev/null 2>&1
    pdflatex "$diff_ms_no_file_ext.tex" > /dev/null 2>&1

    # Remove the intermediate files
    ls "$diff_ms_no_file_ext".* | grep -v pdf | xargs rm -f
    echo "Revisions to $manuscript_filename made between $previous_revision"\
         "and $current_revision are highlighted in $diff_ms_no_file_ext.pdf"
fi

Okay, now let’s work with an example of a “real” paper. Here’s the LaTeX for a manuscript:

\documentclass{article}

\begin{document}

\title{Fun with git and \LaTeX{}}
\author{Andrew R. Casey, Alice, Bob}

\maketitle

\begin{abstract}
One does not simply write an abstract.
\end{abstract}

\section{Introduction}
Here is the text of our introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta }
\end{equation}


\section{Conclusion}
There are many loose seals in the ocean.

\end{document}

Let’s commit this to the repository, and make a note in the commit message that this is version v0.1 of the paper.

mrmagoo:my-paper andycasey$ git add manuscript.tex
mrmagoo:my-paper andycasey$ git commit -m "First draft of paper, so revision v0.1"
[master 8857fdb] First draft of paper, so revision v0.1
 1 file changed, 26 insertions(+)
  create mode 100644 manuscript.tex

I send this version around to my co-authors Alice and Bob, and wait for their responses. Each time someone responds, I implement their suggestions and commit the changes to the repository.

Alice says:

You forgot my last name! I think you’re missing a constant from Equation 1. Also, can we use “simply not” instead of “not simply”?

We make the changes, then commit to the repository.

mrmagoo:my-paper andycasey$ git add manuscript.tex 
mrmagoo:my-paper andycasey$ git commit -m "Implemented changes suggested by Alice"
[master 6df5f6f] Implemented changes suggested by Alice
 1 file changed, 2 insertions(+), 2 deletions(-)

Bob says:

You should be more explicit in the conclusions. Perhaps mention how Buster should act accordingly? Also, you forgot my last name!

Bob’s suggestions are good, so we implement them. Here’s what the final LaTeX looks like:

\documentclass{article}

\begin{document}

\title{Fun with git and \LaTeX{}}
\author{Andrew R. Casey, Alice A. Aaronson, Bob B. Baaronson}

\maketitle

\begin{abstract}
One does simply not write an abstract.
\end{abstract}

\section{Introduction}
Here is the introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta } + C
\end{equation}


\section{Conclusion}
There are many loose seals in the ocean. Buster is not allowed to swim in the ocean.

\end{document}

Since Bob is the last of our co-authors, once we’ve implemented his changes we can call this v0.2. Notice in this commit message that Revision v0.2 can be anywhere in the commit message, and is not case-sensitive.

mrmagoo:my-paper andycasey$ git add manuscript.tex
mrmagoo:my-paper andycasey$ git commit -m "Put in changes suggested by Bob. Revision v0.2 ready to be sent out"
Revisions to manuscript.tex made between v0.1 and v0.2 are highlighted in manuscript-revisions-v0.2.pdf
[master de29aa6] Put in changes suggested by Bob. Revision v0.2 ready to be sent out
 1 file changed, 2 insertions(+), 2 deletions(-)

Notice the extra message at the start? Our post-commit hook has run and seen that we have more than one revision in our commit history. It has found the previous version, compared the two TeX files, and compiled the result for us!

Take a look:

mrmagoo:my-paper andycasey$ ls
README              manuscript-revisions-v0.2.pdf   manuscript.tex

Now we can send out the revised version (manuscript.pdf), as well as a PDF with the highlighted changes between version 0.1 and version 0.2 (manuscript-revisions-v0.2.pdf). This will happen any time you commit with something like revision vX in the commit message. Also you can be as pedantic as you want: revision v1, revision v32.4, revision v0.1.3, etc. are all acceptable. Easiest way to create automatic PDF diff files, ever!
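
The matching rule can be sketched in Python. This is an illustration of the rule described above rather than the hook's own code; REVISION and find_revisions are hypothetical names:

```python
import re

# "revision" followed by v and dot-separated numbers, in any letter case
REVISION = re.compile(r"revision (v\d+(?:\.\d+)*)", re.IGNORECASE)


def find_revisions(commit_messages):
    """Return the revision label found in each message that has one."""
    found = []
    for message in commit_messages:
        match = REVISION.search(message)
        if match:
            found.append(match.group(1))
    return found


# Commit messages from the example repository, newest first
messages = [
    "Put in changes suggested by Bob. Revision v0.2 ready to be sent out",
    "Implemented changes suggested by Alice",
    "First draft of paper, so revision v0.1",
]
print(find_revisions(messages))
```

Only the first and last messages count as revisions, so the diff is taken between those two commits and Alice's intermediate commit is folded into it.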

Here’s what manuscript-revisions-v0.2.pdf looks like:

This makes it infinitely easier for your co-authors to digest what has changed, and will drastically shorten the turnaround between manuscript revisions. If you were wondering, it doesn’t take the first TeX file it sees; it finds the TeX file that has been edited the most times in the repository, which is probably your manuscript!
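
Picking the most-edited TeX file is just a counting exercise. Here is a sketch in Python, assuming you already have the list of file names touched by each commit (for example from git log --name-only). most_edited_tex_file is a hypothetical helper written for illustration:

```python
from collections import Counter


def most_edited_tex_file(changed_files):
    """Return the .tex file that appears most often, or None if there are none."""
    counts = Counter(name for name in changed_files if name.endswith(".tex"))
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# One entry per file per commit, as in the example repository
changed = ["README", "manuscript.tex", "manuscript.tex", "manuscript.tex"]
print(most_edited_tex_file(changed))
```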

~ astrowizicist

 

A reasonably well-known astrophysics professor once gave me some unsolicited advice:

“I always told people that if they cited me, I’d buy them a beer for every citation.”

He went on to say that even though he had a very well-known astrophysical relationship named after him, many more people knew him because of his open beverage offer. I thought this was a good idea, and recently I’ve been toying with the new API for the NASA/SAO Astrophysics Data System (ADS). You can check out my code on GitHub. For the most recent example I’ve written a little script that will check to see if I have any new citations, and will tell me who I owe beer(s) to. Here’s the code:

# coding: utf-8

""" Beers for citations. The new underground currency. """

__author__ = "Andy Casey <acasey@mso.anu.edu.au>"

# Standard library
import httplib
import json
import os
import urllib
from collections import Counter

# Module specific
import ads

# Couple of mutable variables for the reader
author_query = "^Casey, Andrew R."
records_filename = "citations.json"

my_papers = ads.search(author_query)

# How many citations did we have last time this ran?
if not os.path.exists(records_filename):
    all_citations_last_time = {"total": 0}

else:
    with open(records_filename, "r") as fp:
        all_citations_last_time = json.load(fp)

# Build a dictionary with all of our citations
bibcodes, citations = zip(*[(paper.bibcode, paper.citation_count)
    for paper in my_papers])

all_citations = dict(zip(bibcodes, citations))
all_citations["total"] = sum(citations)

# Check if we have more citations than last time, but only if we have run 
# this script beforehand, too. Otherwise we'll get 1,000 notifications on
# the first time the script has been run
if  all_citations["total"] > all_citations_last_time["total"] \
and len(all_citations_last_time) > 1:

    # Someone has cited us since the last time we checked.
    newly_cited_papers = {}
    for bibcode, citation_count in zip(bibcodes, citations):

        # Papers published since the last run have no previous record, so start them at zero
        new_citations = citation_count - all_citations_last_time.get(bibcode, 0)

        if new_citations > 0:
            # Who were the first authors for the new papers that cited us?
            citing_papers = ads.search("citations(bibcode:{0})"
                .format(bibcode), rows=new_citations)
            newly_cited_papers[bibcode] = [paper.author[0] for paper in citing_papers]

    # Ok, so now we have a dictionary (called 'newly_cited_papers') that contains 
    # the bibcodes and names of authors who we owe beers to. But instead, we
    # would like to know how many beers we owe, and who we owe them to.
    beers_owed = Counter(sum(newly_cited_papers.values(), []))

    # Let's not buy ourself beers.
    if my_papers[0].author[0] in beers_owed:
        del beers_owed[my_papers[0].author[0]]

    for author, num_of_beers_owed in beers_owed.iteritems():

        readable_name = " ".join([name.strip() for name in author.split(",")[::-1]])
        this_many_beers = "{0} beers".format(num_of_beers_owed) \
            if num_of_beers_owed > 1 else "a beer"
        message = "You owe {0} {1} because they just cited you!" \
            .format(readable_name, this_many_beers)

        print(message)

        if not "PUSHOVER_TOKEN" in os.environ \
        or not "PUSHOVER_USER" in os.environ:
            print("No pushover.net notification sent because PUSHOVER_TOKEN or"
                " PUSHOVER_USER environment variables not found.")
            continue

        conn = httplib.HTTPSConnection("api.pushover.net:443")
        conn.request("POST", "/1/messages.json",
          urllib.urlencode({
            "token": os.environ["PUSHOVER_TOKEN"],
            "user": os.environ["PUSHOVER_USER"],
            "message": message
          }), { "Content-type": "application/x-www-form-urlencoded" })
        conn.getresponse()

else:
    print("No new citations!")

# Save these citations
with open(records_filename, "w") as fp:
    json.dump(all_citations, fp)

That script will only work if you already have an ADS 2.0 username and an API key for ADS stored in ~/.ads/dev_key. The first time you run it, you won’t get any notifications. This is just to make sure you don’t get 1,000+ notifications the first time it’s run.

In the above example, it will keep track of your citations in the file named by records_filename. That way you can run this script as frequently as you like (e.g., daily, weekly, monthly) and it will only notice citations since the last time it was run. That makes it really easy to set up a cron job. For example, we can be notified automatically on the first of each month when we’re cited:
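
The bookkeeping is a comparison of two dictionaries. Here is a stripped-down sketch with made-up bibcodes and counts, independent of the ads module; new_citations is a hypothetical helper:

```python
def new_citations(previous, current):
    """Return {bibcode: increase} for papers cited more often than before."""
    increases = {}
    for bibcode, count in current.items():
        if bibcode == "total":
            continue
        # Papers missing from the previous record start from zero
        gained = count - previous.get(bibcode, 0)
        if gained > 0:
            increases[bibcode] = gained
    return increases


# Made-up records standing in for two runs of the script
last_run = {"paper-a": 3, "paper-b": 1, "total": 4}
this_run = {"paper-a": 5, "paper-b": 1, "paper-c": 2, "total": 8}
print(new_citations(last_run, this_run))
```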

andycasey@moron>crontab -l
# m h  dom mon dow   command
  0 7   1   *   *    python beers-for-cites.py 

This is great, but at the moment it will just print out who we owe beer(s) to. In reality if we’re running this as a cron job then we’ll want to be notified somehow. I like to use Pushover.net to send free notifications to my devices. So I’ve created an account and an application called “Beers for Citations”, then put the application token and user as the environment variables PUSHOVER_TOKEN and PUSHOVER_USER. Now I’ll get a notification to my phone when someone cites any of my papers.
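
For reference, here is roughly what the notification step looks like in Python 3, where httplib and urllib.urlencode have moved to http.client and urllib.parse. It is only a sketch: build_pushover_body and send_pushover are hypothetical names, and the endpoint and field names are simply copied from the script above.

```python
import http.client
import os
import urllib.parse


def build_pushover_body(token, user, message):
    """URL-encode the form fields that the script above posts to Pushover."""
    return urllib.parse.urlencode(
        {"token": token, "user": user, "message": message})


def send_pushover(message):
    """POST a message using credentials taken from the environment."""
    body = build_pushover_body(
        os.environ["PUSHOVER_TOKEN"], os.environ["PUSHOVER_USER"], message)
    conn = http.client.HTTPSConnection("api.pushover.net", 443)
    conn.request("POST", "/1/messages.json", body,
                 {"Content-type": "application/x-www-form-urlencoded"})
    return conn.getresponse().status


# Only build the request body here; send_pushover needs real credentials
print(build_pushover_body("TOKEN", "USER", "You owe Alice a beer"))
```

Keeping the body-building separate from the network call makes it possible to check the message without sending anything.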

The end result looks something like this:

So there you go. Cite any of my papers, and I’ll buy you a beer the next time I see you.

~ astrowizicist

 

It was about time I updated my website. Here goes.

$ git init
 