Code School: Git Real 2
by Gregg Pollack
Learn advanced Git techniques like Reflog, interactive rebase, and protecting secure data.
Git, a distributed source control repository. I try to picture clusters of information as they move through the file system with the commit messages like history books. I kept dreaming of a repository I thought I'd never see. And then one day, Git Real. (techno music) You're watching Git Real 2. I'm Gregg Pollack, and in this level, we're going to be taking a closer look at rebase. First of all, before we get into it, let's make sure everyone's on the same page with what exactly rebase does. So here you can see we have a master branch, and here are the last three commits, and here we have a unicorns branch, and here are the last three commits. As you can see, they have some common commits, so if we were to illustrate this out in a tree, we can see the master branch has two unique commits and the unicorn branch also has two unique commits. Our objective here is that we want to get our unicorns branch up to date. We basically want to replay commits from the master branch onto the unicorn branch. So to do this, we first checkout the unicorn branch, then we run git rebase master. The first thing this is going to do is move the unique commits on the unicorn branch into a temporary area. Now the unicorns branch is moved to the final commit of master. Then the commits we put in the temporary area are replayed on top of the unicorn branch. So here's what our branch looked like before we ran rebase, and here's what it looked like after we ran rebase. Now, it may be tempting to say that the commits from master were moved onto the unicorn branch, but that's not really what happened here. Basically, once we moved the new commits on the unicorn branch to the temporary area, the next thing that we did is change the unicorn branch to point to the last commit on the master, to the final commit on the master branch, and then we reran the commits from the temporary area on top of that branch. 
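The rebase walk-through above can be tried in a throwaway repository. This is a minimal sketch rather than the course's own demo: only the branch names master and unicorns come from the lesson, while the file names, commit messages, and the commit helper function are made up for illustration.

```shell
#!/bin/sh
set -e
# Throwaway repository; file names and commit messages are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is named master
git config user.name "Example User"
git config user.email "example@example.com"

commit() {  # hypothetical helper: append a line to a file and commit it
  echo "$2" >> "$1"
  git add "$1"
  git commit -q -m "$2"
}

commit base.txt "shared commit"
git checkout -q -b unicorns
commit unicorns.txt "unicorn one"
commit unicorns.txt "unicorn two"
git checkout -q master
commit master.txt "master one"
commit master.txt "master two"

# Bring unicorns up to date: its two unique commits are set aside,
# the branch is moved to the tip of master, and the commits are replayed on top.
git checkout -q unicorns
git rebase -q master

git log --oneline
behind=$(git rev-list --count unicorns..master)   # commits on master missing from unicorns
ahead=$(git rev-list --count master..unicorns)    # commits unique to unicorns
echo "behind master: $behind, ahead of master: $ahead"
```

After the rebase, unicorns contains everything on master plus its own two replayed commits, which is why the first count is zero.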
At some point while you're working with Git, you're going to run into a situation where you want to redo the commits on the same branch you're working on, you know, change them in some way. This is where git rebase -i comes in, or interactive rebase. In our case, we want to redo the last three commits. So we do git rebase -i HEAD~3, meaning three commits before the current HEAD. When we execute this command, it's going to pop up an editor. Inside this editor, we find the rebase script. You can think of these as commands that are going to get run once we save and exit this editor. Here you can see we're using the pick command for each of our commits, and in the next couple slides, we'll go over what the other commands do and why you might use them. Now, if we save and exit the editor, what is it going to do exactly? Well, first, it's going to move those three commits into a temporary area. Then it's going to go through and run each of the commands in our script. In this case, since we're using the pick keyword, it's going to rerun each commit one at a time, just like so, and nothing is going to change.
Interactive rebase alters every commit after the one you specify. So if we specified HEAD here, well, there are no commits after HEAD, right? So in this case, it would pop up an editor with basically nothing to do, noop. If we did git rebase -i HEAD^, with a caret, that means start from the parent of HEAD, so only the last commit is affected. Inside the editor, we would simply have one commit, as you see here.
So now let's use git rebase to manipulate some of our commits. If we run git log, we might notice that, well, two of our commits are in the wrong order. How might we switch the order? So here's what our branches look like, and we just want to switch the order of two of our commits. Well, first thing we might do is run git rebase -i, same command we saw in the previous slide, and that's going to pop up this editor.
One thing that's interesting to notice is when we run git log, it's showing us the list of commits from newest to oldest, but when we run the git rebase command and we jump into our rebase script, it's showing our commits from oldest to newest. Notice how the order's slightly different. So in our case, we want to change the order of these commits, and all we have to do to make that happen is switch the order inside this editor, as you see here. Now if we save and exit the editor, it's going to rerun the commits in the order we specified inside this script, and if we run git log, we'll see that it's actually been switched. So what's happening here when we exit that rebase script is it's rewinding our code, taking the commits into a temporary area, and then replaying them in the order that we specified inside the script. Now, what if we look back at our commits and we notice that we used a bad commit message? Well, rebase interactive can also help us fix that. To do that, we simply run our same rebase command, and inside the editor, we simply need to change the pick command to the reword command for the commit we want to change the message on. Then when we save and exit this file, another editor is going to pop up. This is where we can edit our commit message. So we'll go ahead and edit the commit message, and then when we save and exit, it's going to rerun those commits, change the message so that when we log out the commits again, we can see that the message has properly been changed. Another way we can manipulate commits inside interactive rebase is when we need to split a single commit into two. Maybe there was too much code in that single commit and it really should've been split up into two or even three. So here's what our branch currently looks like, and we want to split that last commit into two. To do this, inside of our rebase script for the commit that we want to split, we change the keyword pick to edit. We want to edit this commit. 
When we save and exit this file, it's going to rerun the commits, and when it gets to the commit that we specified the edit command for, it's going to run that commit and spit us out on the command line with the message that you see here. If we wanted to add things to this commit, we could stage some code, use git commit --amend to attach it to this commit, and then use git rebase --continue, but what we want to do is split this commit. Well, it's already been committed, so the first thing we're going to have to do on the command line is to type git reset HEAD^, which is going to roll back our last commit, leaving the changes in our working directory. If we would've done a hard reset here with that --hard option, it would've actually erased our changes, but we want to split our current changes, not erase them. So what we might do now is simply stage the files that we want in our first commit and commit them, and then stage the files we want in our second commit and commit them. At this point, now we want to run git rebase --continue. This will continue running any other commands in our rebase script, but in our case, this was the last one, so now when we do a log command, we can see that the commits have been split like we wanted them to.
Sometimes we might have the opposite problem. We might have two or three commits that should've been just one, so we need a way of squashing them together. In our case, we have those two commits we just split, but what if we wanted to squash those back into one? So we want to squash the commits. Well, in this case, we're going to run the git rebase command, this time on the last four commits. Here's our editor. Inside our editor, we are simply going to specify which commit we want to squash into the previous commit. Once we save and exit the rebase script, another editor is going to pop up. This is going to show us the commit message from the first commit and the second commit that we're going to squash into the first.
And as you can see here, it says enter a new commit message. So in this case, we're simply going to write a new commit message that combines them both, in this case just change plurals to capybara. Now when we save and exit this file, well, the two commits are going to be squashed into one, and if we take a look at our log, we can see that's what happened. Visually, here's what's happening. So when we run git rebase, it's taking those four commits, moving them to a temporary area, and then picking out each commit and rerunning it onto our timeline. Then when it gets to the squash command, it's taking that commit and merging it with the previous commit, giving it our updated commit message.
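The squash described above can be reproduced without typing into an editor. The sketch below is an illustration under stated assumptions: the file name and commit messages are made up, and instead of editing the rebase script by hand it sets Git's GIT_SEQUENCE_EDITOR and GIT_EDITOR environment variables so that a sed command changes pick to squash and the proposed combined message is accepted unchanged.

```shell
#!/bin/sh
set -e
# Throwaway repository; the file name and commit messages are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.com"

for n in 1 2 3; do
  echo "change $n" >> notes.txt
  git add notes.txt
  git commit -q -m "commit $n"
done

# For HEAD~2 the rebase script lists the last two commits, oldest first:
#   pick <sha> commit 2
#   pick <sha> commit 3
# Changing "pick" to "squash" on the second line folds commit 3 into commit 2.
# GIT_SEQUENCE_EDITOR edits the script for us, and GIT_EDITOR=true accepts
# the combined commit message that Git proposes.
GIT_SEQUENCE_EDITOR="sed -i.bak '2s/^pick/squash/'" GIT_EDITOR=true \
  git rebase -q -i HEAD~2

git log --oneline
total=$(git rev-list --count HEAD)
echo "commits after squash: $total"
```

In everyday use you would leave both variables unset and make the same one-word change in the editor that pops up. Reordering, reword, and edit work the same way: only the lines of the rebase script differ.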
Git Real You're watching Git Real 2 and this is level two, where we're going to be talking about stashing. So there you are working on a feature branch and you're about part way through a commit when your boss calls and says, 'hey, there's something wrong with the server. I need you to make a commit right now to the master branch.' And you go, well huh, okay I need to go back over to master branch but I'm right in the middle of this html file, about halfway done with this "gerbil" right here. And you can't go ahead and commit it halfway through. That would be kind of bad. So, what do you do? Well, this is where stashing comes in. Stashing allows you to take some files that may not make up a full commit and store them away in a temporary area so you can restore them at a later time. So in our case, if we run 'git stash save,' it's going to take those files that haven't been completed yet and it's going to save them away in a temporary area. It's also going to restore the state from the last commit. So, in this case, if I run 'git diff,' there's no changes and if I run 'git status,' I can see there's nothing to commit on our gerbil branch. So now we can safely go back over to our master branch from here and make all the changes that we need. We can pull down updates, we can make commits, and we can push up the changes. Now when we're ready to resume working on our feature branch, we can go ahead and check out that particular branch and then run the command, 'git stash apply' This will rerun the changes that we stashed away before so that we can continue working on that code and eventually make a commit. Every time you run 'git stash save' it pushes that stash onto the stash stack. (chuckles) It's a stack of stashes. So if you run 'git stash list' you'll see a list of all of the stashes that you've used there. You'll see, it says, WIP as in work-in-progress on master, that's the branch where we stashed. And it gives you the last commit before we stashed, right? 
Because a stash is not a commit, it's giving you the commit that was right before the time that we stashed. The stashes are each given a name that you can reference if you want to apply a certain stash. So, if you wanted to apply just the middle stash, stash number one, we could call 'git stash apply stash@{1}' like you see here, and it would be applied into our code. stash@{0}, obviously that's the one at the top of the stack, is going to be applied by default if we don't specify a stash by name. When we run the stash apply command it's going to apply our stash but it's not going to pop our stash from the stash stack. In order to drop that stash off the stack we can run 'git stash drop' like so, and that's going to remove it from the list. And if we run 'git stash list' again, we can see that it's gone.
There are some intelligent defaults for running stash commands. For example, we can just run 'git stash' and it's the same thing as running 'git stash save.' If we run 'git stash apply' it's going to apply the stash at the top of the stack, which is going to be stash@{0}. 'git stash drop' is going to automatically drop the stash at the top of the stash stack, and lastly, there's the 'git stash pop' command, which runs 'git stash apply' and 'git stash drop.' So it actually applies the stash and then pops it off of the top of the stack.
Just like when you merge branches, there is the possibility of running into conflicts or other problems when you try to take two files and merge them together. So when you run 'git stash apply' well, sometimes you might get an error message that looks like this. In this case it wasn't able to modify the local file. So what it says here is, commit your changes or stash them before you go ahead and try to run your stash apply command.
So it actually didn't run the 'git stash apply' it's saying, 'whoa whoa whoa, this isn't going to work until you, you know, either roll back your changes or commit your changes, then do the apply again.' Sometimes when you run 'git stash apply' it will just merge and it won't abort, in which case you're either going to want to keep the merged changes and make a commit, or maybe even roll back. If you're using the 'stash pop' command and you find some merge conflicts, you'll want to resolve them as usual, however, you should note that it doesn't automatically pop the stash off of the stack. That's because you might want to just, you know, do a hard reset and then do the pop command again. So, it doesn't assume that you want to delete that stash. So in this case, if it does run into conflicts when you're using pop, you're going to want to go back into the stash list and manually drop that stash.
So Jane's currently working on some code and she has some changes she wants to commit and some that she wants to stash for a later commit. Well, she might try running 'git stash save' on her working directory, but if she runs 'git status' she'll find that it stashed everything, including the staging area. So that wasn't right. Let's get our changes back into our current working directory. So, if Jane runs 'git stash pop' it's going to restore the state. And you'll notice there that 'git stash' will save both the staged and the un-staged changes. So let's try this again and this time we'll use the --keep-index option. What that's going to do is keep around the files that we already have staged and ready to commit. So now if we run 'git status' we'll see that only our un-staged files have been stashed. Now, Jane can go ahead and make that commit. And then to restore the files she can run 'git stash pop' and then she can get back to work. But what if some of the files that Jane needed to stash weren't tracked files? So they were brand new files that were just created.
Well, if she runs 'git stash save' and then runs 'git status' she'll see that those un-tracked files aren't getting saved into the stash. It's only going to stash files that Git knows about. To fix this we can run 'git stash save' with the --include-untracked option. This is going to include all of the un-tracked files. So after we run that and we run 'git status' we can see that it's properly stashed them, and then to un-stash them we can run 'git stash pop' and it'll properly pop tracked and un-tracked files off of the stack.
When you have a bunch of stashes in your stash list, sometimes it may be hard to tell them apart and figure out which one you want. Well, luckily we can run 'git stash list' with several options. We could run it with the --stat option as you see here and it would list out all the different stats for all the different stashes. But really we can use any option that we would typically use with the 'git log' command. We can use the options here to get more details about each of the stashes. If we want more information about one particular stash, we could run 'git stash show' and then specify the name of that stash, like you see here. If we run 'git stash show' and don't specify any stash, well, it's going to show us details about the most recent stash. Much like 'git stash list', 'git stash show' takes formatting options, in its case the ones that the 'git diff' command takes. So in this case we could do 'git stash show --patch' and it shows the file changes within the stash. When you run 'git stash save' you can optionally provide a stash message just like a commit message. And then when you run 'git stash list' you're going to see that message on the list.
So Gregg is back working on the gerbils again and well, he started working on the gerbil toy section but then again, management called and said, 'hey, you need to go ahead and deploy those gerbil things that you're working on.'
So let's go ahead and stash save the toy section that we've been working on, and then we'll maybe merge our existing code over into master to deploy. But what if, hypothetically, Gregg destroyed that gerbils branch, you know, the one where we stashed the current work we did on toys? Well, now we need a new branch to restore that stashed toys page. We can use this command, 'git stash branch gerbil-toys stash@{0}.' What this is going to do is create a new branch called gerbil-toys and automatically pop the work that we did on gerbil toys off of the stash and onto that branch. Then we can go ahead and make our first commit onto this new branch. What if you have a bunch of stashes in your stash list and you no longer need them? Well, that's where the 'git stash clear' command comes in. You run that command, and it blows away all your stashes as you can see here.
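The stash commands from this level can be exercised end to end in a scratch repository. This is a hedged sketch: the file names, page contents, and stash message are assumptions made for illustration, and it keeps the 'git stash save' spelling used in the lesson (newer Git versions steer you toward 'git stash push' for the same job).

```shell
#!/bin/sh
set -e
# Throwaway repository; file names, contents, and the stash message are illustrative.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.com"

echo "<h1>Gerbils</h1>" > gerbils.html
git add gerbils.html
git commit -q -m "Add gerbils page"

# Half-finished work: one change to a tracked file and one brand-new, untracked file.
echo "<p>half done</p>" >> gerbils.html
echo "<h1>Toys</h1>" > toys.html

# --include-untracked stashes toys.html as well; the message shows up in the list.
git stash save --include-untracked "gerbil toys in progress"
dirty=$(git status --short | wc -l | tr -d ' ')   # changes left in the working tree
git stash list

# apply restores the changes but leaves the stash on the stack ...
git stash apply -q 'stash@{0}'
after_apply=$(git stash list | wc -l | tr -d ' ')

# ... so it has to be dropped explicitly (pop would do both steps at once).
git stash drop -q 'stash@{0}'
after_drop=$(git stash list | wc -l | tr -d ' ')

echo "dirty after save: $dirty, stashes after apply: $after_apply, after drop: $after_drop"
```

Replacing the apply and drop pair with a single 'git stash pop' should leave the repository in the same state.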
Git Real. You're watching Git Real 2, and this is level three, where we're going to be talking about purging history. So our coworker Bob did something he shouldn't have. He committed the passwords.txt file to his repository. And how should he fix this? Well, he could create another commit which removes that file, but then the next time he pushes up his commits, well, it's going to push up that file, as well, because even if it was deleted, well, Git keeps a history of everything that was created and deleted. So what do we do in this circumstance? There are commands in Git to rewrite history, but with great power comes great responsibility. So you need to know some things about this.
First of all, there are some reasons not to rewrite history. For one, why bother? Your data is already compromised. Maybe you should just change your password in this instance and not worry about messing with rewriting history. Also, everyone else has got to update their work to, you know, pick up your revised commits, the ones where you deleted the files, as if the original commits had never existed.
When should you rewrite history? Well, when, perhaps, what you've committed violates somebody's copyright, like you're using someone's library that shouldn't be in there. Obviously you might want to rewrite history, get rid of that commit so those files have no trace in your repository. Also, you might have accidentally committed large files to your repository. For example, maybe large video files. We've done that a bit around here. And so we might want to delete those commits that have records of those files so that our repository stays small. And lastly, it's okay to rewrite commits if you're doing it on your local repository. As soon as you push up changes, you should probably consider those commits set in stone. The first thing that we're going to do before we rewrite history is to make a backup. Pretty straightforward.
So we can run git clone petshop petshop-filter, and this is going to clone our repo, so we have a backup just in case we accidentally delete some commits that we didn't mean to. To rewrite history, we're going to use the git filter-branch command with the --tree-filter option. We can then specify after that any shell command that we want to run. What it's then going to do is check out each commit into a working directory, run that command against it, and then recommit the code. Bob might want to do this and run rm -f passwords.txt, which is going to go through each commit and remove any file that's called passwords.txt in it. If we had some large video files we wanted to remove, we could run this shell command which finds any files with the extension MP4 and removes them. Let's go ahead and run the passwords command, and we're going to need to specify at the end -- and then --all. What this is going to do is run this command on all commits in all of our branches. We could also specify HEAD instead of --all here if we only wanted this filter to run on our current branch. Now if we take a look inside of our log, we can see that our passwords.txt file has been removed.
What would happen, though, if we ran rm passwords.txt without that -f? Well, not every commit is going to have that passwords.txt file, so the ones that don't are going to fail, and then the entire filter is going to fail. So that's why we add the -f option to our remove command, to make sure that even if a file doesn't exist, it doesn't return a failure and our filter doesn't stop. As you might imagine, checking out every commit and running a command against it and recommitting it can take a while, especially if you have a large codebase. That's why there's a different option you can use here called --index-filter. The commands we specify to send into --index-filter must operate on the staging area.
Git's going to run this command on every commit, but it's not going to check all the files out, so whatever command you specify needs to operate on the staging area. What this means is that our rm -f passwords.txt command isn't going to work. What we need to do in this case is specify a git command. In this case, we might use git rm with the --cached option, which will operate on our staging area. You see that --ignore-unmatch option? Well, if we didn't have that and we just did the command as you see here, it would fail the first time that passwords.txt isn't present. Just like with --tree-filter, if the command we specify fails, it's going to stop doing the filter. So we want to make sure that when we run this git rm command, we specify --ignore-unmatch, which'll make sure that the command runs successfully even if the file doesn't exist.
If you try to run filter-branch a second time, you might get this error message you see here. What happens is the first time you run filter-branch, it leaves a backup of your tree in the .git directory. So what you'll need to do when you run this a second time is make sure you specify the -f option. Dash f stands for force, which forces it to overwrite the backup.
If you're deleting files from history, some of your commits might end up empty, which isn't good. Why would you want to leave a commit around if it doesn't actually do anything to your code? Here you can see our commit history, and there's one commit in there that's empty. How do we get rid of it? Well, we can run the filter-branch command with the --prune-empty option. This is going to delete any commits which are empty. Now if we run git log again, we can see that the empty commit is gone. We could've included this --prune-empty option when we ran filter-branch initially and removed the passwords.txt file, and if it found any empty commits, it would simply get rid of them.
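The pieces of this level fit together into one command. The sketch below is an illustration, not Bob's actual repository: the pets.txt file and the commit messages are made up, and FILTER_BRANCH_SQUELCH_WARNING is set only to silence the notice and pause that newer Git versions add before filter-branch runs.

```shell
#!/bin/sh
set -e
# Throwaway repository; pets.txt and the commit messages are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.com"

echo "cats" > pets.txt
git add pets.txt
git commit -q -m "Add pets"

echo "hunter2" > passwords.txt
git add passwords.txt
git commit -q -m "Add passwords (oops)"

echo "dogs" >> pets.txt
git add pets.txt
git commit -q -m "Add dogs"

# Newer Git versions print a warning and pause before filter-branch; this skips that.
export FILTER_BRANCH_SQUELCH_WARNING=1

# --index-filter works on the staging area, so the command must be a git command.
# --ignore-unmatch keeps it from failing on commits that lack the file, and
# --prune-empty drops the commit that only added passwords.txt.
git filter-branch --prune-empty \
  --index-filter 'git rm --cached --ignore-unmatch -q passwords.txt' \
  -- --all > /dev/null

git log --oneline
touching=$(git rev-list --count HEAD -- passwords.txt)   # commits that still touch the file
total=$(git rev-list --count HEAD)
echo "commits touching passwords.txt: $touching, commits left: $total"
```

The original commits remain reachable through the backup that filter-branch keeps under refs/original, so the counts above are taken from HEAD rather than from every ref.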
Git Real You're watching Git Real 2 and this is level four, where we're going to be talking about working together. As you may know, different operating systems sometimes use different line separators. So, with OS X or Linux we use a line feed, and if you're familiar with string encoding this is also sometimes represented as \n. And if you're on Windows, the line separator by default is going to be a carriage return followed by a line feed, or \r\n. Problems with this arise when you have some team members that are working on OS X or Linux and maybe they have some line separators in their file like this, and then another team member on Windows opens up that same file, and since there are no carriage returns they don't see separate lines at all.
Luckily, Git comes with some configuration to deal with this. If you're on a Unix-like system like Linux or OS X and you run git config --global core.autocrlf input, this is going to make sure that in any files you commit, any carriage returns followed by line feeds get changed to just line feeds. Then on Windows systems you can run git config --global core.autocrlf true. What this is going to do is change all line feeds to carriage return line feeds when you check out the file, that way somebody on Windows is going to properly see all the line endings, and it will convert them back to line feeds when you commit the files. If you're working on a team that only uses Windows, and doesn't need to worry about getting rid of the carriage returns or adding them, then you can use git config --global core.autocrlf false, which is just going to leave the carriage returns there.
You don't have to rely on everybody setting that configuration item though; instead you can create a .gitattributes file. This sits in your project root, and on the left side it has file types, on the right side it has conversion settings, and over the next couple slides we'll describe what each of these mean.
First let's talk about file type, which is on the left side of the file. If you have a * it's obviously going to match any file. You can do *.html which obviously will only match HTML files, and of course *.jpg will match any JPEG file. Then on the right side of the file we have conversion settings, which allow you to specify how a particular file type should be handled. In this case with text=auto it's going to detect whether our file is a text file and if it is, it's going to convert it properly. Then if it says text, it's going to treat the file as a text file, properly converting the line endings. If you want to specify exactly how our text files are converted, we can use these configuration settings here. eol stands for end of line, so if you use the top setting here, text eol=crlf, when you check out code, it's going to add the carriage returns so you can see the line breaks if you're on Windows, and then when you check in the code it's going to remove the carriage returns. Whereas the second setting here, text eol=lf, is simply going to make sure that there are never any carriage returns in your code. Lastly, the binary configuration option is going to treat the files as binary so it's not going to try to do any conversion.
Here are some typical rules that you'll find in a .gitattributes file. First of all, at the top you have * text=auto, so by default convert line endings for all text files that you find. Then secondly, let's go through and make sure all HTML and CSS files are treated as text files, and obviously these aren't needed if you're using the text=auto above because it's going to detect the HTML and CSS files as text files and convert them appropriately. If we needed to make sure that our images were interpreted as binary files, then we could specify *.jpg binary and *.png binary.
And lastly we want to make sure that all shell scripts, or .sh files, never have carriage returns in them and are treated appropriately, but .bat files, which might be batch files in Windows format, do have the carriage returns so when we execute them they work properly.
Next up we're going to be talking about cherry picking, so when do you need cherry picking? Well, let's say we're working in our current production branch and we realize that we need a piece of functionality that we coded in another branch. However, that piece of code that we need is a commit in the middle of a bunch of other commits. Well, what we need to do here is cherry-pick that commit and put it into our production branch. Visually, here's what our branches look like, and what we want to do is take a commit that's currently in our development branch and cherry-pick it into our production branch. To do this, the first thing we're going to do is check out our production branch, make sure we're on the right branch, and then we're going to write git cherry-pick and specify the hash for that particular commit, in this case the one that starts with 53212, and that's all you have to do. Now if we run git log on our production branch we'll see that that particular commit has been cherry-picked and put onto our branch. Notice here that the SHA on our cherry-picked commit changed when we copied it from the development branch to the production branch. That's because it has a different parent on the production branch.
Sometimes we might want to have a different commit message when we cherry-pick a commit, in which case we can use the --edit option. When we execute this command it's going to pop up an editor where we can edit the commit message and change it to whatever we want it to be. Once we save and quit, the cherry-pick proceeds as usual, but with a different commit message, as we can see down here in the log.
Sometimes we might want to take multiple commits from a branch cherry-pick them, and combine them into a single commit in our current branch. How can we do that? To do this we're going to use the no commit option and then specify the two different commits we want to combine. The no commit option takes the changes from these two commits, applies them to our current head, but does not make any commits. So when we run git status, we can see the changes that applying these commits made, but they haven't made any commits to the local branch, which we need to do now. So now all we need to do is commit these changes into our own commit in the production branch. This no commit option is really useful when you want to cherry-pick a commit but make small changes before you commit them to your local branch. When you cherry-pick it's really useful to keep track of where that commit came from. One way to do this is to use the -x option, what this is going to do is insert into the commit message what you see here, so you can see it was cherry-picked from commit and it shows you the hash right there. Now when we run the log command we can see which commit this came from. This is only useful when you're cherry-picking from public branches, because if someone checks out your code and you cherry-picked from a local branch, they don't have your local branch and so that hash isn't going to lead them anywhere. When you cherry-pick a commit, the author of that commit also gets moved over to the new commit, but you might need to keep track of who did the cherry-picking, that's where the dash dash signoff option comes in handy. This is going to add to the commit message showing who cherry-picked this commit or really who signed off on this commit, same thing.
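The cherry-pick options above can be tried in a scratch repository. This is a minimal sketch under stated assumptions: the branch names production and development follow the lesson, while the file names and commit messages are made up, and the commit hash is captured in a shell variable rather than typed out.

```shell
#!/bin/sh
set -e
# Throwaway repository; file names and commit messages are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/production   # name the first branch production
git config user.name "Example User"
git config user.email "example@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

git checkout -q -b development
echo "feature a" > a.txt
git add a.txt
git commit -q -m "Add feature A"
echo "feature b" > b.txt
git add b.txt
git commit -q -m "Add feature B"
wanted=$(git rev-parse HEAD)   # the commit in the middle that production needs
echo "feature c" > c.txt
git add c.txt
git commit -q -m "Add feature C"

git checkout -q production
# -x records "(cherry picked from commit <sha>)" in the message;
# --signoff adds a Signed-off-by line for whoever ran the cherry-pick.
git cherry-pick -x --signoff "$wanted" > /dev/null

git log -1 --format=%B
picked=$(git rev-parse HEAD)
echo "original: $wanted"
echo "copy:     $picked"
```

The copy gets a new SHA because its parent is the production commit, not the development one, and only b.txt arrives on production.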
Git Real. You're watching Git Real 2 and this is level six where we're going to be talking about reflog. So there Gregg is going through his code, and he's got the third section, second section, first section, hmm. And he decides to drop the commit. Easy way to do that is to do the git reset hard and then specify which commit to go back to, effectively blowing away the third section. But wait, that was a mistake. Maybe he actually does want a third section. But now how do we get back that commit that we just blew away? Well, luckily Git never actually deletes a commit, partially because of situations like this. So right now we have our branch and our head pointing to the second commit here, but we want to restore that last commit. How do we do that? Well, the commit's not listed in the log, so that's not any help, but Git keeps a second log which is only in your local repo called the reflog. So if we do git reflog, we'll see something that looks like that. And if we read through the reflog, we can see, oh look, it knows when we've called reset. It also knows about the last three commits, and we can see the commit that we want back. So good, it still exists. That's what we want. And let's keep note of that hash there. Every time our HEAD moves, either because we have a new commit or we changed branches or we do a reset, another entry gets put in our reflog. So that's why we have an entry for where we did reset as well as an entry for each commit. In the left column, the reflog shows us the SHA of the commit that the HEAD was pointing at in its current state. It also shows us a shortname that we can use to refer to that particular commit. And lastly, it gives us a description of what caused the HEAD to move. So the second entry here, that's the commit that we want back that's currently not attached to any branch. To get it back, we can use git reset hard and then specify that commit. Or if we want, we can use that shortname to check out that commit. 
And if we run a git log at this point, we can see that our commit is now back at our HEAD. You should note that the reflog only keeps track of what your HEAD is doing locally, right? So if Jane clones Gregg's repo, she's not going to be able to see his reflog. Her reflog is starting from scratch.
All right, it's time for another story. Jane's working on a new aviary branch, but she's decided that it's not needed, so she goes to destroy it, and it tells her, uh-oh, are you sure you want to delete it? You've got code on that branch that hasn't been merged. But she says yeah, yeah, I don't need that branch, and she uses a capital D, which of course is going to delete the branch anyway. And then, lo and behold, she goes, oh no, I actually did need that branch. I need to get it back. Well, how does she get that back? As we saw before, Git never really deletes any commits. It just deleted the branch. So if we can find the latest commit from that branch, we can create a new branch which points back to that commit. That's all we have to do and it'll be like the branch was never deleted at all. So to find the last commit for that branch, we need to go back into the reflog, and we can use the same command as before, but that only gives us one line for each time the HEAD changed. If we want more detail, we can run git log --walk-reflogs, and that'll give us more information about each of these commits. So here we can take note of the shortname or the SHA. Then, instead of doing a reset like we did before to recover the commit, we're going to create a new branch. So we'll do git branch aviary. That's the name of our new branch. And then reference the hash of the last commit. We can obviously also use the reflog shortname, HEAD@{1}. Now we can check out that new branch, and if we run a log on that branch, it's as if we never deleted it. The birds have been resurrected.
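Both recovery stories from this level can be replayed in a scratch repository. This is a hedged sketch: the aviary branch name comes from the lesson, while the file names and commit messages are assumptions, and the HEAD@{1} shortnames are only correct for the exact sequence of commands shown here.

```shell
#!/bin/sh
set -e
# Throwaway repository; file names and commit messages are illustrative assumptions.
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.com"

for n in first second third; do
  echo "$n section" >> page.txt
  git add page.txt
  git commit -q -m "Add $n section"
done
lost=$(git rev-parse HEAD)

# Story 1: drop the third commit, then realise that was a mistake.
git reset -q --hard HEAD^
git reflog                       # newest entry first: the reset, then the commits
git reset -q --hard 'HEAD@{1}'   # HEAD@{1} is where HEAD pointed before the reset
restored=$(git rev-parse HEAD)

# Story 2: force-delete an unmerged branch, then bring it back.
git checkout -q -b aviary
echo "parrots" > birds.txt
git add birds.txt
git commit -q -m "Add birds"
tip=$(git rev-parse HEAD)
git checkout -q master
git branch -q -D aviary

git log --walk-reflogs --oneline -3   # more detail on recent HEAD movements
git branch aviary 'HEAD@{1}'          # HEAD@{1}: the commit made on aviary before switching away
recovered=$(git rev-parse aviary)
echo "restored commit: $restored"
echo "recovered branch tip: $recovered"
```

In a real repository the right entry is rarely HEAD@{1}, so reading the reflog output and copying the SHA is usually the safer route.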
Gregg is passionate about taking complex topics and teaching them efficiently. He's helped build Envy Labs, Starter Studio, and Code School. He also furthers education through BarCamp in Orlando,...
Released 9 May 2013