Git for Solo Users

Is this lesson right for you?

Use this Checklist

Warning: This is a spare lesson, as Git lessons go. We’ll tell you how to do things properly, but we’re not going to tell you everything you can do—until later.

In this lesson, you will learn how to type things like git commit -m "my log message here." (among other Git commands) and know what it means.

Later, you will learn things like this:

Prepare yourself. As Jenny Bryan (whom we quote assiduously within) wrote, “It’s not you, it’s Git.”

Install Git

You need Git so you can use it at the command line and so RStudio can call it.

Is it already, miraculously, installed on your system?

If there’s any chance it’s installed already, verify that, rejoice, and skip the rest of the “Install Git” section.

If, instead, you see something more like git: command not found, keep reading.

macOS users might get an immediate offer to install command line developer tools. Yes, you should accept! Click Install and carry on.

Windows Users (highly recommended): Install Git for Windows, also known as msysgit or “Git Bash”, to get Git in addition to some other useful tools, such as the Bash shell. Yes, all those names are totally confusing, but you might encounter them elsewhere and we want you to be well informed.

We like this because Git for Windows leaves the Git executable in a conventional location, which will help you and other programs, e.g. RStudio, find it and use it. This also supports a transition to more expert use, because the “Git Bash” shell will be useful as you venture outside of R within RStudio.

NOTE: When asked about “Adjusting your PATH environment”, make sure to select “Git from the command line and also from 3rd-party software”. Otherwise, we believe it is good to accept the defaults. Note also that RStudio for Windows prefers for Git to be installed below C:/Program Files and this appears to be the default. This implies, for example, that the Git executable on my Windows system is found at C:/Program Files/Git/bin/git.exe. Unless you have specific reasons to otherwise, follow this convention.

Installing this way also leaves you with a Git client, though not a very good one. So check out Git clients we recommend.

If you were not able to download and install Git, I highly recommend following Jenny Bryan’s instructions. They are uniquely simple and clear.

FYI, this appears to be equivalent to what you would download from here: https://git-scm.com/download/.

What you Need for this Lesson

  1. An open shell (a.k.a. “terminal”) like this one:

  1. Data, which I have loaded for you in the tibble scores. Type scores and run the code to see what’s in it.
scores

We’ll use your shell and the data to work through some Git basics. We will also open the data file in RStudio.

git init: Make a New Repo and Put Something in It

  1. Place yourself in a directory where you can create another directory that you’ll make into a Git “repository” or, as many people say, “repo.” I like to put all my repos in the folder GitHub, which resides in Documents on my computer.

  1. From your shell, create a directory called scores and navigate into it. It should be empty.

Now execute this Git command:

  1. git init

NOTE: You can also create a directory while initializing it by sitting in the parent directory of your as-yet-unmade one and typing git init my_new_directory.

Next, see what’s in the directory. Specifically, look for the directory .git.

  1. ls
  2. ls *.git

  1. For some people, this looks like an empty directory, because you haven’t made invisible files visible in your operating system’s settings. Even if you can’t see it, cd into .git, just to see if you can.

Of course you can.

  1. Let’s go back into RStudio. Save scores as “scores.csv” in your lovely new repo using the command write.csv(scores, "/your_path_here/scores.csv").

You have a repo that has something in it. Well done.

I have a really cool shell setup called iTerm2. The prompt changes to include not only my directory’s name, but also the fact that I’m in a Git repo, which branch of the repo I’m currently using (the master branch—and more about branches later), and whether or not I have uncommitted material in the repo (see the ex?).

Any old shell terminal will work, though. You don’t have to get fancy to do this lesson.

git add: Preparing for a commit

Use git add to tell Git which files you want to include in the next commit. I could go into a lot of detail about indices and how Git’s guts interpret add, but let’s just not. Git’s guts are frankly rather frightening. You don’t need to know about them yet. Use Git’s documentation if you want more detail—but first, brace yourself.

The easiest syntax to use from the command line is plain old git add ., where the dot means “everything”. In your head, you would say, “Git, add everything that has changed since the last time I added things.” You can replace the dot with specific file names if you need to. Most of the time, though, you won’t.

Notice that my ex didn’t go away. That’s because there are changes in the directory that I haven’t committed yet.

git commit: Advance a Repo

The git commit command records changes to the repo.

Do it Right

Let’s try it. Follow these steps:

  1. If you are following this tutorial, you already told Git that there is new stuff in the repo by typing git add .

  1. Type git commit -m"Added the data file." (you can put a space between -m and the message if you like; it doesn’t matter).

You should fall gracefully to the next command line, with new information provided to you along the way, but without any error messages. And did you notice that my ex went away? That’s because the local repo is up to date with the remote repo.

Forget to Stage (i.e. add) your Changes

This is what happens when you commit without first telling Git that you have added things:

The Dreaded Editor: Commit Wrong and Find Yourself . . . Where? Why?

This is what happens if you forget to send a message for the log along with your commit:

Inside an editor with no instructions on how to edit anything (only a demand that you do), no instructions on how to leave, and a powerful sense that if you mess this up, you will hurt your repo: this is one of the worst places to end up in all of coding.

I will tell you the carefully guarded secrets of this editor just in case you end up there some day:

  1. Press i to enter “insert mode.” Now you can type.
  2. Uncomment (i.e., delete hashtags preceding) any line(s) you want to use to describe this commit.
  3. Press Esc. You are no longer in insert mode.
  4. Type wq to save and quit the editor and finish the commit.

None of this is intuitive. Remember our mantra:

It’s not you. It’s Git.

Then soldier on.

git branch and git checkout: Switching Branches

“What’s a branch?” You say.

It’s a mysterious concept you’re supposed to just git, say I. Ha.

Branching

Many people have gotten this far and begun to collapse into confusion because terms like “branch” are used by kindly developers all over the internet with nary a definition in sight.

The problem is that the metaphors are all mixed up. A branch is actually about wood and trees and growing things, but here we must view it as being about the flow of time (yes, flowing branches).

We tell Git to keep track of things for us, and Git obliges by creating a history of our changes. It takes snapshots of whatever we tell it to add and commit. If you lay those snapshots one above another, consecutively, you get a sort of tree. Right? You aren’t actually copying files anywhere; you are keeping track of how they have changed so you could, should you need to, reconstruct a prior version of the file stored further down on the branch. So don’t think wood so much as a geneology (in which all the parents are single).

Right now, we have only one branch. Find it by typing git show-branch. Git will show you all the branches and the commits that have been applied to them.

It might not surprise you to find out that our tree is extremely simple. Go ahead, try it.

See? One branch called “master” (to bring in yet another metaphor), which has seen a single commit (remember doing that git commit just before you got to this section? That commit).

Here is what Jenny Bryan & Co. say about branches: “Branches afford explicit workflows for integrating different lines of work on your own terms.” If people make major changes on branches, the master branch can stay as it is until the changes on other branches are worth keeping. The branches still track the history of files, but in different places from where the master branch has it. You can merge branches back together at will. Well, the person(s) who own(s) the parent branch (in our case, the master branch) can. But we’ll get to that later.

According to Bryan et al., Branching means that you take a “detour” from the main “stream” of development (oh, the metaphors!) and do work without changing the main “stream”.

Creating a branch and switching between branches is nearly instantaneous. Git’s handling of branches encourages workflows in which you create small branches for exploration or for new features and merge them back together quickly and accurately.

Checking out Branches

“What does it mean to check something out in Git?” you say.

One web resource said, “It is the process that works to checking out previous commits and files in which the current working folder is updated to get equality on a selected branch.”

Huh?

Let’s try again. “git-slam-branch (for instance) slams a few non-stashed remote branches outside all failed downstream origins, and after a git-terrorize-ref (logged by git-transform-object) remotes a subtree, unsuccessfully imported tips are prevented for the user, and trees that were quiltimported during initializing are left in a cloned state.”

What?

Okay, okay, I confess: That last one came from the Git Man (Manual) Page Generator, which is really fun and in fact slighly better than the real Git documentation found on your computer and mine.

Back to the question at hand: What is git checkout?

“You use git checkout to switch between branches” (Jenny Bryan of happygitwithr.com).

Now that makes sense.

You can create a new branch with git branch, then checkout the branch with git checkout (meanwhile adding one more to the list of metaphors).

You can also use the shortcut

git checkout -b issue-5

to create and checkout the branch all at once. We’ll talk more about checking out stuff in a minute.

Meanwhile—what do you do if you are working on a branch and need to switch, but the work on the current branch is not complete?

One option is git stash (which we can look at later), but generally, a better option is to safeguard the current state with a temporary commit.

Here I use “WIP” as the commit message to indicate work in progress. The command is git commit --all -m"WIP".

So you’ve committed that, and gone off and made some changes on the master branch and committed them, thus:

…and now you want to come back to the first branch and pick up where you left off, pretending you never committed.

You want, in fact, to undo the commit you just did and pretend it never happened. Read on to learn how to use git reset to do that.

git reset: Undoing Changes

You are sitting in a branch that you saved by committing it before you actaully wanted to. How do you undo that commit?

Well, you don’t actually “undo” it. You git reset it.

Here, we check out the branch we made in the previous section of this tutorial.

When you come back to the branch and continue your work, you need to undo the temporary commit by resetting your state. Specifically, we want a “mixed reset”, but we have so many mixed metaphors that we’ll just call it a reset for now. A reset eliminates the temporary “WIP” commit we made in the previous section.

Use the reference HEAD^ to take the commit state back to the parent (HEAD) of the current commit.

Why do we have a parent named “HEAD”? Why do we not call the parent “PARENT”? Because that would be waaay too consistent. This is Git, where the metaphors are always mixed.

(Actually, HEAD^ is a reference to tape recording. But I digress.)

Just . . . just go with it.

And there you are, back to the exact state you were in before you had to rush off and do something on another branch.

git log: What Have I Done?

So you just inherited this project from someone else and you were told to roll back a feature. How do you figure out when it was introduced and how you might (later) go back to before that?

I typed git log and my shell changed to show me this:

You might have to press Ctrl-z to get back to the command line. I did.

This log says that we committed once on Christmas Eve 2019 at 8 past 4 in the afternoon eastern time and that we said the commit “added the data file.”

Some logs are complicated. For complex projects you probably want to install a Git client such as those listed here. Most of them keep the log available to you at all times as a sort of map, like this (scroll right to see all of it):

git diff and git status: Seeing what you’re Doing

git diff shows you what changed, thus:

Consider version A of a file and a modified version, version B. Assume that version A was part of one Git commit and version B was part of the next commit. The set of differences between A and B is called a “diff”.

Here is an example of one I got while working on this lesson:

Git users contemplate diffs a lot.

Diff inspection is how you re-explain to yourself how version A differs from version B.

In fact, Git’s notion of any specific version of a document, code, or data asset is as an accumulation of diffs. If you go back far enough, you find the commit where the file was created in the first place. Every later version is stored by Git as that initial version plus all the intervening diffs in the history that affect the file.

Another way to look at changes is to check the status of your repo. I used git status in this repo twice, once before and once after I created another_file.

Merging Local Branches

Here is how you merge:

If I wasn’t up to date, Git would have opened a file for me to save a comment about the merge. When that happens, you can add a note and save the file, thus completing the merge. The result, from git log, is this:

If you are still having issues with Git dumping you into a line editor that you don’t know how to use, see this set of instructions—from Jenny Bryan, of course. At the bottom of the page is how to configure a different Git editor. Emacs is a bit easier to use than whatever the default editor is for, say, macOS. You can write in an Emacs editor and save what you wrote without having to both do something unintuitive and also deal with the fact that neither user interface nor instruction, anywhere.

git configure: Introduce yourself to Git

Use git config to tell Git who you are. These are the most useful things to tell Git about yourself:

  • Your user name with git config --global user.name 'Your Name'
  • Your email with git config --global user.email 'jane@example.com'

Here is an example:

If you want to know what else you can configure, try typing git config --list. The command will disappear and you will see a very long list. Here’s what is currently at the top of it:

See Jenny Bryan’s happygitwithr.com pages to find out more.

git tag: Naming a Commit

A tag is a name for a commit. Git already names each commit with a sha, a set of unique random numbers and letters for each commit. But suppose you want a better name, like “v 1.0” to indicate that a given commit was the beginning of version 1.0? Let’s try that in the shell.

When you flag git tag with -a, you can add some notes to the tag if you want to. This is what that looks like in Emacs (the editor I have configured as Git’s default).

You can make sure a commit got tagged by using git log.

You can see the sha, Git’s internal name for the commit. It’s a long, complicated set of characters. That’s why tagging is helpful.

Some Helpful Videos

I like this video about using Git with GitKraken, a rather nice user interface that functions a bit like RStudio does with R: It isn’t the actual software, but it is a very nice way of interfacing with the software.

GitKraken’s videos build on vocabulary you already know even if you know nothing at all about Git. The first part of every tutorial is a good explanation of a Git concept. The end of each video is a demo of how you can use what you just learned and do something in GitKraken. Skip the bits about GitKraken at the end if you like, or consider using GitKraken to manage your files with Git.

References

Main author and content wrangler: Sheila Braun (CHOP).

Portions of this lesson were wrenched wholesale, or copied and somewhat altered to match my own particular je ne sais quoi, from Happy Git and GitHub for the useR, which was the work of Jenny Bryan, Software Engineer at RStudio on the tidyverse/r-lib team.

The development and delivery of Happy Git and GitHub for the useR also benefited greatly from contributions by

  • Dean Attali, Shiny consultant and STAT 545 TA alum
  • Bernhard Konrad, Software Engineer at Google and STAT 545 TA alum
  • Shaun Jackman, Bioinformatics Ph.D. student at UBC, lead maintainer of Linuxbrew, and STAT 545 TA alum
  • Jim Hester, Software Engineer at RStudio on the tidyverse/r-lib team
  • A growing number of GitHub contributors