Publishing with Git

Reading time ~7 minutes

I use git everyday. You should use git or some other git-esque system when writing research papers, because it’s a great way to track all of your changes. The real-world problem is my co-authors don’t use git.

Typically I’ll draft a manuscript, distribute the document (PDF and/or LaTeX) by email, and wait for feedback. Some will provide changes to the LaTeX, others will annotate the PDF, some will provide itemised text responses, and some will print it, scribble on it, and hand me a butchered manuscript.

After sending the manuscript around once to everyone, I don’t want them to have to read everything through again: they should just notice the changes. It’s easier, and faster for everyone. To accomplish this I’ve installed latexdiff. It’s a Perl script that highlights the differences between two TeX files. You can download it here, or just read about it.

Once latexdiff is installed, let’s initiate a git repository and start writing a paper.

mrmagoo:research andycasey$ mkdir my-paper
mrmagoo:research andycasey$ cd my-paper/
mrmagoo:my-paper andycasey$ git init
Initialized empty Git repository in /Users/andycasey/research/my-paper/.git/
mrmagoo:my-paper andycasey$ echo "This is a fake paper to test latexdiff" > README
mrmagoo:my-paper andycasey$ git add README 
mrmagoo:my-paper andycasey$ git commit -m "Initial commit"
[master da886dd] Initial commit
 1 file changed, 1 insertion(+)
  create mode 100644 README

When I make major revisions to a paper (e.g., when I send out copies to co-authors), I want to use latexdiff to automatically create a file that highlights the changes from the previous version. Let’s set up a post-commit hook by putting the following code into a new file in your folder called .git/hooks/post-commit. Make sure this is executable by using chmod +x .git/hooks/post-commit. Now any time we commit to the repository, this script will run.

#!/bin/sh

# Post-commit hook for revision-awsm-ness

function gettempfilename()
{
    tempfilename=$1-$RANDOM$RANDOM.tex
    if [ -e $tempfilename ]
    then
        tempfilename=$(gettempfilename)
    fi
    echo $tempfilename
}

num_revisions=$(git log --pretty=oneline | grep -ic "revision [v\d+(?:\.\d+)*]")

# See if there are at least two revisions so that we can do a comparison
if [ $num_revisions -lt 2 ]
then
    exit
else

    # Check to see if the last named revision is actually the commit hash that just happened
    current_hash=$(git rev-parse HEAD)
    current_revision=$(git log --pretty=oneline | grep -i "revision [v\d\.]" | grep -oPi "v\d+(?:\.\d+)" | head -n 1)
    most_recent_revision_hash=$(git log --pretty=oneline | grep -i "revision [v\d+(?:\.\d+)*]" | head -n 1 | awk '{ print $1 }')
    
    # If the last commit wasn't the one that contained the most recent revision number, then there's nothing to do.
    if [[ "$current_hash" != "$most_recent_revision_hash" ]]; then
        exit
    fi

    previous_revision=$(git log --pretty=oneline | grep -i "revision [v\d\.]" | grep -oPi "v\d+(?:\.\d+)" | sed -n 2p)
    previous_revision_hash=$(git log --pretty=oneline | grep -i "revision [v\d+(?:\.\d+)*]" | sed -n 2p | awk '{ print $1 }')

    # Use the most edited tex file in this repository as the manuscript, unless the manuscript filename was specified as an argument
    most_edited_tex_file=$(git log --pretty=format: --name-only | sort | uniq -c | sort -rg | grep ".tex$" | head -n 1 | awk '{ print $2 }')
    manuscript_filename=${1:-$most_edited_tex_file}
    manuscript_filename_no_ext="${manuscript_filename%.*}"

    # If we can't find the manuscript filename, then exit.
    if [ ! -f $manuscript_filename ]; then
        echo "Manuscript file $manuscript_filename does not exist."
    fi

    # Get the manuscript file associated with the previous revision hash
    previous_manuscript_filename=$(gettempfilename previous)
    git show $previous_revision_hash:$manuscript_filename > $previous_manuscript_filename

    # Use latexdiff to create a difference version
    diff_ms_no_file_ext="$manuscript_filename_no_ext-revisions-$current_revision"
    latexdiff $previous_manuscript_filename $manuscript_filename > $diff_ms_no_file_ext.tex
    rm -f $previous_manuscript_filename

    # Compile the difference file
    pdflatex $diff_ms_no_file_ext.tex > /dev/null 2>&1 
    bibtex $diff_ms_no_file_ext.tex > /dev/null 2>&1
    pdflatex $diff_ms_no_file_ext.tex > /dev/null 2>&1
    pdflatex $diff_ms_no_file_ext.tex > /dev/null 2>&1
    
    # Remove the intermediate files
    ls $diff_ms_no_file_ext.* | grep -v pdf | xargs rm -f
    echo "Revisions to $manuscript_filename made between $previous_revision"\
         "and $current_revision are highlighted in $diff_ms_no_file_ext.pdf"
fi

Okay, now let’s work with an example of a “real” paper. Here’s the LaTeX for a manuscript:

\documentclass{article}

\begin{document}

\title{Fun with git and \LaTeX{}}
\author{Andrew R. Casey, Alice, Bob}

\maketitle

\begin{abstract}
One does not simply write an abstract.
\end{abstract}

\section{Introduction}
Here is the text of our introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta }
\end{equation}


\section{Conclusion}
There are many loose seals in the ocean.

\end{document}

Let’s commit this to the repository, and make a note in the commit message that this is version v0.1 of the paper.

mrmagoo:my-paper andycasey$ git add manuscript.tex
mrmagoo:my-paper andycasey$ git commit -m "First draft of paper, so revision v0.1"
[master 8857fdb] First draft of paper, so revision v0.1
 1 file changed, 26 insertions(+)
  create mode 100644 manuscript.tex

I send this version around to my co-authors Alice and Bob, and wait for their responses. Each time someone responds, I implement their suggestions and commit the changes to the repository.

Alice says,.. > You forgot my last name! I think you’re missing a constant from Equation 1. Also, can we use “simply not” instead of “not simply”?

We make the changes, then commit to the repository.

mrmagoo:my-paper andycasey$ git add manuscript.tex 
mrmagoo:my-paper andycasey$ git commit -m "Implemented changes suggested by Alice"
[master 6df5f6f] Implemented changes suggested by Alice
 1 file changed, 2 insertions(+), 2 deletions(-)

Bob says,.. > You should be more explicit in the conclusions. Perhaps mention how Buster should act accordingly? Also, you forgot my last name!

Bob’s suggestions are good, so we implement them. Here’s what the final LaTeX looks like:

\documentclass{article}

\begin{document}

\title{Fun with git and \LaTeX{}}
\author{Andrew R. Casey, Alice A. Aaronson, Bob B. Baaronson}

\maketitle

\begin{abstract}
One does simply not write an abstract.
\end{abstract}

\section{Introduction}
Here is the introduction.

\begin{equation}
    \label{simple_equation}
    \alpha = \sqrt{ \beta } + C
\end{equation}


\section{Conclusion}
There are many loose seals in the ocean. Buster is not allowed to swim in the ocean.

\end{document}

Since Bob is the last of our co-authors, once we’ve implemented his changes we can call this v0.2. Notice in this commit message that Revision v0.2 can be anywhere in the commit message, and is not case-sensitive.

mrmagoo:my-paper andycasey$ git add manuscript.tex
mrmagoo:my-paper andycasey$ git commit -m "Put in changes suggested by Bob. Revision v0.2 ready to be sent out"
Revisions to manuscript.tex made between v0.1 and v0.2 are highlighted in manuscript-revisions-v0.2.pdf
[master de29aa6] Put in changes suggested by Bob. Revision v0.2 ready to be sent out
 1 file changed, 2 insertions(+), 2 deletions(-)

Notice the extra message at the start? Our post-commit hook has run and seen that we have more than one revision in our commit history. It’s found the previous version, made a comparison on the two TeX files and compiled it for us!

Take a look:

mrmagoo:my-paper andycasey$ ls
README              manuscript-revisions-v0.2.pdf   manuscript.tex

Now we can send out the revised version (manuscript.pdf), as well as a PDF with the highlighted changes between version 0.1 and version 0.2 (manuscript-revisions-v0.2.pdf). This will happen anytime you commit with something like revision vX in the commit message. Also you can be as pedantic as you want: revision v1, revision v32.4, revision v0.1.3, etc are all acceptable. Easiest way to create automatic PDF diff files, ever!

Here’s what manuscript-revisions-v0.2.pdf looks like:

manuscript-revisions

This makes it infinitely easier for your co-authors to digest what has changed, and will drastically shorten the turnaround between manuscript revisions. If you were wondering, it doesn’t take the fist TeX file it sees; it finds the TeX file that has been edited the most times in the repository, which is probably your manuscript!

Making Python GUIs

Sometimes I want to make a simple (or complex) graphical user interface (GUI) for exploratory data analysis. I use Python, but there are ...… Continue reading

Hiatus

Published on September 13, 2015

Best and Brightest EMP Stars

Published on September 18, 2014