# (PART) Part Three {-}
```{r, include = FALSE}
source("common.R")
```
# Git and GitHub {#git}
If you're serious about software development, you need to learn about Git. Git is a _version control system_, a tool that tracks changes to your code and shares those changes with others. Git is most useful when combined with [GitHub](http://github.com), a website that allows you to share your code with the world, solicit improvements via pull requests and track issues. Git + GitHub is the most popular version control system for developers of R packages (witness the thousands of R packages hosted on GitHub).
Git and GitHub are generally useful for all software development and data analysis, not just R packages. I've included them here because they are so useful when you're making a package. There's no way I could be as productive without Git and GitHub at my back, enabling me to rapidly spot mistakes and easily collaborate with others.
Why use Git + GitHub?
* It makes sharing your package easy. Any R user can install your package with
just two lines of code:
```{r, eval = FALSE}
install.packages("devtools")
devtools::install_github("username/packagename")
```
* GitHub is a great way to make a barebones website for your package. Readers
can easily browse code, and read documentation (via Markdown). They can
report bugs, suggest new features with [GitHub issues][gh-issues], and
propose improvements to your code with pull requests.
* Have you ever tried to collaboratively write code with someone by sending
files back and forth via email or a Dropbox folder? It takes a lot of effort
just to make sure that the two of you aren't working on the same file and
overwriting each other's changes. With Git, both of you can work on the
same file at the same time. Git will either combine your changes
automatically, or it will show you all the ambiguities and conflicts.
* Have you ever accidentally pressed `s` instead of Cmd + S to save your
file? It's very easy to accidentally introduce a mistake that takes a few
minutes to track down. Git makes this problem easy to spot because it
allows you to see exactly what's changed and undo any mistakes.
You can do many of these same things with other tools (like [subversion](https://subversion.apache.org) or [mercurial](https://www.mercurial-scm.org/)) and other websites (like [gitlab](https://about.gitlab.com) and [bitbucket](https://bitbucket.org)). But I think Git + GitHub is the most user-friendly system (especially for new developers), not least because its popularity means that the answer to almost any question or problem can be found on StackOverflow. Git is most useful in conjunction with GitHub, and vice versa, so I'll make no effort to distinguish between features that belong to Git and those that belong to GitHub.
This is not to say that Git is easy to learn. Your initial experiences with Git are likely to be frustrating and you will frequently curse at the strange terminology and unhelpful error messages. Fortunately, there are many tutorials available online, and while they aren't always well written (many provide a lot of information but little guidance about what to do with it or why you need to care), you can absolutely master Git with a little practice. Don't give up! Persevere and you'll unlock the super-power of code collaboration.
## RStudio, Git and GitHub {#git-rstudio}
RStudio makes day-to-day use of Git simpler. Once you've set up a project to use Git, you'll see a new pane and toolbar icon. These provide shortcuts to the most commonly used Git commands. However, because only a handful of the 150+ Git commands are available in RStudio, you also need to be familiar with using Git from the __shell__ (aka the command line or the console). It's also useful to be familiar with using Git in a shell because if you get stuck you'll need to search for a solution with the Git command names.
The easiest way to get to a shell from RStudio is Tools > Shell. This will open a new shell located in the root directory of your project. (NB: on Windows, this opens up a _bash_ shell, the standard Linux shell, which behaves a little differently from the usual `cmd.exe` shell.)
Don't worry if you've never used the shell before because it's very similar to using R. The main difference is that instead of functions, you call commands, which have a slightly different syntax. For example, in R you might write `f(x, y = 1)`, where in the shell you'd write `f x --y=1` or `f x -y1`. Also, while shell commands are even less regular than R functions, you fortunately only need to be familiar with a few. In this chapter, you won't be doing much in the shell apart from running Git commands. However, it's a good idea to learn the three most important shell commands:
* `pwd`: print working directory. This tells you which directory you're currently in.
* `cd <name>`: change directory. Use `cd ..` to move up the directory hierarchy.
* `ls`: list files. Shows all files in the current directory.
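To see the three commands above working together, here is a minimal sketch that explores a throwaway directory. The directory and file names (`mypackage`, `DESCRIPTION`) are placeholders I've made up for illustration, and `mktemp -d` is only used so that nothing on your computer is touched.

```bash
# Create a throwaway directory to explore (all names here are hypothetical).
demo_dir="$(mktemp -d)"
mkdir "$demo_dir/mypackage"
touch "$demo_dir/mypackage/DESCRIPTION"

cd "$demo_dir/mypackage"   # change into the package directory
pwd                        # print the directory you are now in
ls                         # list its files (here, just DESCRIPTION)
cd ..                      # move up one level in the hierarchy
ls                         # list files again (here, just mypackage)
```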
If you've never used the shell before, I recommend playing [Terminus](http://web.mit.edu/mprat/Public/web/Terminus/Web/main.html). It's a fun way to learn the basics of the shell. I also recommend taking a look at Philip Guo's [Basic Unix-like command line tutorial](http://pgbovine.net/command-line-tutorial.htm) videos, and at <http://www.ee.surrey.ac.uk/Teaching/Unix/unix1.html> and <https://p1k3.com/userland-book/>.
## Initial set up {#git-setup}
If you've never used Git or GitHub before, start by installing Git and creating a GitHub account. Then, link the two together:
1. Install Git:
* Windows: <http://git-scm.com/download/win>.
* OS X: <http://git-scm.com/download/mac>.
* Debian/Ubuntu: `sudo apt-get install git`.
* Other Linux distros: <http://git-scm.com/download/linux>.
1. Tell Git your name and email address. These are used to label each commit
so that when you start collaborating with others, it's clear who made
each change. In the shell, run:
```bash
git config --global user.name "YOUR FULL NAME"
git config --global user.email "YOUR EMAIL ADDRESS"
```
(You can check if you're set up correctly by running
`git config --global --list`.)
1. Create an account on GitHub, <https://github.com> (the free plan is fine).
Use the same email address as above.
1. If needed, generate an SSH key. SSH keys allow you to securely
communicate with websites without a password. There are two parts to
an SSH key: one public, one private. People with your public key can
securely encrypt data that can only be read by someone with your
private key.
From R, you can check if you already have an SSH key-pair by running:
```{r, eval = FALSE}
file.exists("~/.ssh/id_rsa.pub")
```
If that returns `FALSE`, you'll need to create a new key. You can either follow the
[instructions on GitHub](https://help.github.com/articles/generating-ssh-keys)
or use RStudio. Go to RStudio's global options, choose the Git/SVN panel, and
click "Create RSA key...":
```{r, echo = FALSE}
knitr::include_graphics("images/git-config-2.png", dpi = 220)
```
1. Give GitHub your SSH public key: <https://github.com/settings/ssh>.
The easiest way to find the key is to click "View public key" in
RStudio's Git/SVN preferences pane.
## Create a local Git repository {#git-init}
Now that you have installed and configured Git, you can use it! To use GitHub with your package, you'll need to initialise a local repository, or __repo__ for short. This creates a `.git` directory that stores configuration files and a database that records changes to your code. A new repo exists only on your computer; you'll learn how to share it with others shortly.
To create a new repo:
* In RStudio, go to project options, then to the Git/SVN panel. Change the
"Version control system" from "None" to "Git":
```{r, echo = FALSE}
knitr::include_graphics("images/git-proj-config.png", dpi = 220)
```
You'll then be prompted to restart RStudio.
* In a shell, run `git init`. Restart RStudio and reopen your package.
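If you take the shell route, the whole process looks roughly like this. This is only a sketch: it uses a temporary directory as a stand-in for your package, whereas in practice you'd run `git init` from the root directory of your own package.

```bash
# A temporary directory stands in for your package (hypothetical location).
pkg_dir="$(mktemp -d)"
cd "$pkg_dir"

git init --quiet   # create the .git directory that holds the repo
ls -a              # the listing now includes .git
```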
Once Git has been initialised, you'll see two new components:
* The __git pane__, at the top-right, shows you what files have
changed and includes buttons for the most important Git commands:
```{r, echo = FALSE}
knitr::include_graphics("images/git-pane.png", dpi = 220)
```
* The __git dropdown__ menu, found in the toolbar, includes Git and GitHub commands
that apply to the current file:
```{r, echo = FALSE}
knitr::include_graphics("images/git-dropdown.png", dpi = 220)
```
## See what's changed {#git-status}
The first benefit of Git is that you can easily see the changes you've made. I find this really helpful, as I often accidentally mistype keyboard shortcuts, leaving stray characters in my code. The RStudio Git pane lists every file that's been added, modified or deleted. The icon describes the change:
* `r knitr::include_graphics("images/git-modified.png", dpi = 220)`,
__Modified__. You've changed the contents of the file.
* `r knitr::include_graphics("images/git-unknown.png", dpi = 220)`,
__Untracked__. You've added a new file that Git hasn't seen before.
* `r knitr::include_graphics("images/git-deleted.png", dpi = 220)`,
__Deleted__. You've deleted a file.
You can get more details about modifications with a "diff", `r knitr::include_graphics("images/git-diff.png", dpi = 220)`. This opens a new window showing the detailed **diff**erences:
```{r, echo = FALSE}
knitr::include_graphics("images/git-diff-window.png", dpi = 220)
```
The background colours tell you whether the text has been added (green) or removed (red). (If you're colourblind you can use the line numbers in the two columns at the far left as a guide: a number in the first column identifies the old version, a number in the second column identifies the new version.) The grey lines of code above and below the changes give you additional context.
In the shell, use `git status` to see an overview of changes and `git diff` to show detailed differences.
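Here is a small sketch of what those two commands report. It builds a throwaway repo, then simulates a stray keystroke and a brand new file. The file names, the author details and the use of `--short` (a compact form of `git status`) are my choices for the example, not anything your project needs.

```bash
# Set up a throwaway repo with one committed file (all names are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity, local to this repo
git config user.email "author@example.com"
printf 'add <- function(x, y) x + y\n' > add.R
git add add.R
git commit --quiet -m "Add add()"

# Simulate a stray keystroke in a tracked file, and create an untracked file.
printf 'add <- function(x, y) x + ys\n' > add.R
touch notes.txt

git status --short   # add.R is flagged as modified, notes.txt as untracked (??)
git diff             # line-by-line differences in add.R
```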
## Record changes {#git-commit}
The fundamental unit of work in Git is a __commit__. A commit takes a snapshot of your code at a specified point in time. Using a Git commit is like using anchors and other protection when climbing. If you're crossing a dangerous rock face you want to make sure you've used protection to catch you if you fall. Commits play a similar role: if you make a mistake, you can't fall past the previous commit. Coding without commits is like free-climbing: you can travel much faster in the short-term, but in the long-term the chances of catastrophic failure are high! Like rock climbing protection, you want to be judicious in your use of commits. Committing too frequently will slow your progress; use more commits when you're in uncertain or dangerous territory. Commits are also helpful to others, because they show your journey, not just the destination.
There are five key components to every commit:
* A unique identifier, called a SHA (short for secure hash algorithm).
* A changeset that describes which files were added, modified and deleted.
* A human-readable commit message.
* A parent, the commit that came before this one. (There are two exceptions to
this rule: the initial commit doesn't have a parent, and merges, which you'll
learn about later, have two parents.)
* An author.
You create a commit in two stages:
1. You __stage__ files, telling Git which changes should be included in the
next commit.
1. You __commit__ the staged files, describing the changes with a message.
In RStudio, staging and committing are done in the same place, the commit window, which you can open by clicking `r knitr::include_graphics("images/git-commit.png", dpi = 220)` or by pressing Ctrl + Alt + m.
```{r, echo = FALSE}
knitr::include_graphics("images/git-commit-window.png", dpi = 220)
```
The commit window is made up of three panes:
* The top-left pane shows the current status in the same way as the Git pane in the
main RStudio window.
* The bottom pane shows the diff of the currently selected file.
* The top-right pane is where you'll enter the commit message, a human
readable message summarising the changes made in the commit. More on
that shortly.
(Yes, this is exactly the same window you see when clicking `r knitr::include_graphics("images/git-diff.png", dpi = 220)`!)
To create a new commit:
1. __Save your changes__.
1. __Open the commit window__ by clicking
`r knitr::include_graphics("images/git-commit.png", dpi = 220)` or
pressing `Ctrl + Alt + m`.
1. __Select files__. To stage (select) a single file for inclusion, tick its
check box. To stage all files, press Ctrl/Cmd + A, then click
`r knitr::include_graphics("images/git-stage.png", dpi = 220)`.
As you stage each file, you'll notice that its status changes. The icon
will change columns from right (unstaged status) to left (staged status),
and you might see one of two new icons:
* Added: `r knitr::include_graphics("images/git-added.png", dpi = 220)`:
after staging an untracked file, Git now knows that you want to add it
to the repo.
* Renamed: `r knitr::include_graphics("images/git-renamed.png", dpi = 220)`:
If you rename a file, Git initially sees it as a deletion and addition.
Once you stage both changes, Git will recognise that it's a rename.
Sometimes you'll see a status in both columns, e.g.
`r knitr::include_graphics("images/git-modified-staged.png", dpi = 220)`.
This means that you have both staged and unstaged changes in the same file.
This happens when you've made some changes, staged them, and then made some
more. Clicking the staged checkbox will stage your new changes; clicking
it again will unstage both sets of changes.
1. __Stage files__, as above.
1. __Write a commit message__ (top-right panel) which describes the changes
that you've made. The first line of a commit message is called the subject
line and should be brief (50 characters or less). For complicated commits,
you can follow it with a blank line and then a paragraph or bulleted list
providing more detail. Write messages in the imperative, like you're telling
someone what to do: "fix this bug", not "fixed this bug" or
"this bug was fixed".
1. __Click Commit__.
Staging files is a little more complicated in the shell. You use `git add` to stage new and modified files, and `git rm` to stage deleted files. To create the commit, use `git commit -m <message>`.
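A sketch of that shell workflow, run in a throwaway repo, might look like the following. The file names, commit messages and author details are invented for the example.

```bash
# Throwaway repo (all names and messages below are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity, local to this repo
git config user.email "author@example.com"

printf 'Package: mypackage\n' > DESCRIPTION
printf 'scratch\n' > old.R
git add DESCRIPTION old.R                    # stage two new files
git commit --quiet -m "Add package skeleton"

printf 'Title: A Demo Package\n' >> DESCRIPTION
git add DESCRIPTION                          # stage a modified file
git rm --quiet old.R                         # delete old.R and stage the deletion
git commit --quiet -m "Add title and remove scratch file"

git log --oneline                            # two commits, newest first
```

Note that `git rm` both deletes the file and stages the deletion, so there is no separate `git add` step for it.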
## Commit best practices {#commit-best-practices}
Ideally, each commit should be _minimal_ but _complete_:
* __Minimal__: A commit should only contain changes related to a single
problem. This will make it easier to understand the commit at a glance, and to
describe it with a simple message. If you should discover a new problem, you
should do a separate commit.
* __Complete__: A commit should solve the problem that it claims to solve.
If you think you've fixed a bug, the commit should contain a unit test
that confirms you're right.
Each commit message should:
* __Be concise, yet evocative__. At a glance, you should be able to see
what a commit does. But there should be enough detail so you can remember
(and understand) what was done.
* __Describe the why, not the what__. Since you can always retrieve the diff
associated with a commit, the message doesn't need to say exactly what
changed. Instead it should provide a high-level summary that focuses on the
reasons for the change.
If you do this:
* It'll be easier to work with others. For example, if two people have changed
the same file in the same place, it'll be easier to resolve conflicts if the
commits are small and it's clear why each change was made.
* Project newcomers can more easily understand the history by reading the commit
logs.
* You can load and run your package at any point along its development history.
This can be tremendously useful with tools like
[bisectr](https://github.com/wch/bisectr), which allow you to use binary
search to quickly find the commit that introduced a bug.
* If you can figure out exactly when a bug was introduced, you can easily
understand what you were doing (and why!).
You might think that because no one else will ever look at your repo, writing good commit messages is not worth the effort. But keep in mind that you have one very important collaborator: future-you! If you spend a little time now polishing your commit messages, future-you will thank you if and when they need to do a post-mortem on a bug.
Remember that these directives are aspirational. You shouldn't let them get in your way. If you look at the commit history of my repositories, you'll notice a lot of them aren't that good, especially when I start to get frustrated that I __still__ haven't managed to fix a bug. Strive to follow these guidelines, and remember it's better to have multiple bad commits than to have one perfect commit.
## Ignoring files {#git-ignore}
Often, there are files that you don't want to include in the repository. They might be transient (like LaTeX or C build artefacts), very large, or generated on demand. Rather than carefully _not_ staging them each time, you should instead add them to `.gitignore`. This will prevent them from accidentally being added. The easiest way to do this is to right-click on the file in the Git pane and select `Ignore`:
```{r, echo = FALSE}
knitr::include_graphics("images/git-ignore.png", dpi = 220)
```
If you want to ignore multiple files, you can use a wildcard "glob" like `*.png`. To learn more about the options, see [ignoring files](http://git-scm.com/book/ch2-2.html#Ignoring-Files) in Pro-Git.
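To check that a glob does what you expect, you can try it in a throwaway repo. This sketch assumes two made-up files, `plot.png` and `analysis.R`, and uses `git check-ignore` to ask Git which pattern (if any) matches a file.

```bash
# Throwaway repo with one file we want ignored (file names are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
touch plot.png analysis.R

printf '*.png\n' > .gitignore   # ignore every PNG file

git status --short              # lists .gitignore and analysis.R, but not plot.png
git check-ignore -v plot.png    # reports the .gitignore line that matched
```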
Some developers never commit derived files, files that can be generated automatically. For an R package this would mean ignoring the `NAMESPACE` file and the files in the `man/` directory because they're generated from comments. From a practical perspective, it's better to commit these files: R packages have no way to generate `.Rd` files on installation so ignoring derived files means that users who install your package from GitHub will have no documentation.
## Undo a mistake {#git-undo}
The best thing about using commits is that you can undo mistakes. RStudio makes it particularly easy:
* To undo the changes you've just made, right click on the file in the Git
pane and select "revert". This will roll any changes back to the previous
commit. Beware: you can't undo this operation!
You can also undo changes to just part of a file in the diff window. Look
for a __discard chunk__ button above the block of changes that you want to
undo: `r knitr::include_graphics("images/git-chunk.png", dpi = 220)`. You
can also discard changes to individual lines or selected text.
* If you committed changes too early, you can modify the previous commit by
staging the extra changes. Before you click commit, select
`r knitr::include_graphics("images/git-commit-amend.png", dpi = 220)`.
(Don't do this if you've pushed the previous commit to GitHub - you're
effectively rewriting history, which should be done with care when you're
doing it in public.)
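If you prefer the shell, amending the previous commit can be sketched with `git commit --amend`. The example below assumes a throwaway repo and made-up file names; `--no-edit` keeps the original commit message. The same warning applies: don't amend a commit you've already pushed.

```bash
# Throwaway repo (file names and identity are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity, local to this repo
git config user.email "author@example.com"

printf 'first\n' > a.txt
git add a.txt
git commit --quiet -m "Add a.txt"            # committed too early

printf 'second\n' > b.txt                    # the change that was left out
git add b.txt
git commit --quiet --amend --no-edit         # fold it into the previous commit

git log --oneline                            # still a single commit
git show --stat --oneline HEAD               # which now contains both files
```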
If you didn't catch the mistake right away, you'll need to look backwards in history and find out where it occurred:
1. Open the history window by clicking
`r knitr::include_graphics("images/git-history.png", dpi = 220)` in the
Git pane.
```{r, echo = FALSE}
knitr::include_graphics("images/git-history-window.png", dpi = 220)
```
The history window is divided into two parts. The top part lists every
commit to your repo. The bottom part shows you the commit:
the SHA (the unique id), the author, the date, the parent and
the changes in the commit.
1. Navigate back in time until you find the commit where the mistake occurred.
Write down the parent SHA: that's the commit that occurred before the
mistake so it will be good.
Now you can use that SHA in the shell:
* See what the file looked like in the past so you can copy-and-paste the old
code:
```bash
git show <SHA>:<filename>
```
* Or copy the version from the past back in to the present:
```bash
git checkout <SHA> <filename>
```
In both cases you'll need to finish by staging and committing the files.
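Here is a sketch of the whole recovery in a throwaway repo. The file name and contents are invented, and I capture the good SHA with `git rev-parse` rather than copying it from the history window. The `<SHA>:<filename>` form of `git show` prints the file's contents as of that commit, and `git checkout <SHA> -- <filename>` stages the restored file for you, so only the commit remains.

```bash
# Throwaway repo with a good commit followed by a mistake (names are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity, local to this repo
git config user.email "author@example.com"

printf 'version 1\n' > file.txt
git add file.txt
git commit --quiet -m "Add file"
good_sha="$(git rev-parse HEAD)"             # the last known-good commit

printf 'version 2 (with a mistake)\n' > file.txt
git commit --quiet -am "Update file"

git show "$good_sha:file.txt"                # print the old contents
git checkout "$good_sha" -- file.txt         # copy the old version into the present
git commit --quiet -m "Restore file.txt from earlier commit"
```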
(It's also possible to use Git as if you went back in time and prevented the mistake from happening in the first place. This is an advanced technique called __rebasing history__. As you might imagine, going back in time to change the past can have a profound impact on the present. It can be useful, but it needs to be done with extreme care.)
If you're still stuck, try <http://sethrobertson.github.io/GitFixUm/fixup.html> or <http://justinhileman.info/article/git-pretty/>. They give step-by-step approaches to fixing many common (and not so common!) problems.
## Synchronising with GitHub {#github-init}
So far we've only been working locally, using commits to track the progress of a project and to provide safe checkpoints. However, Git really shines when you start sharing your code with others with [GitHub](http://github.com). While there are other choices, I recommend GitHub because it is free for open source projects, it has all the features you'll need, and is a popular choice in the R world.
To publish, or __push__, your code to GitHub:
1. Create a new repo on GitHub: <https://github.com/new>. Give it the same
name as your package, and include the package title as the repo
description. Leave all the other options as is, then click Submit.
1. Open a shell, then follow the instructions on the new repo page.
They'll look something like this:
```bash
git remote add origin git@github.com:hadley/r-pkgs.git
git push -u origin master
```
The first line tells Git that your local repo has a remote version on
GitHub, and calls it "origin". The second line pushes all your current
work to that repo.
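If you'd like to rehearse these two commands before touching GitHub, you can push to a local "bare" repository instead. This is only a sketch: the bare repo at a temporary path is a stand-in for the GitHub repo, and the file, identity and paths are made up. The `git branch -M master` line just makes sure the branch name matches the one used in this chapter, whatever your Git's default is.

```bash
# A local bare repo stands in for GitHub (all paths are hypothetical).
work="$(mktemp -d)"
git init --quiet --bare "$work/remote.git"

git init --quiet "$work/local"
cd "$work/local"
git config user.name "Example Author"        # placeholder identity, local to this repo
git config user.email "author@example.com"
printf 'Package: mypackage\n' > DESCRIPTION
git add DESCRIPTION
git commit --quiet -m "Initial commit"
git branch -M master                         # use the branch name from this chapter

git remote add origin "$work/remote.git"     # register the remote and call it "origin"
git push --quiet -u origin master            # publish, and remember origin/master as upstream
git remote -v                                # show where origin points
```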
Now let's make a commit and verify that the remote repo updates:
1. Modify `DESCRIPTION` to add `URL` and `BugReports` fields that link
to your new GitHub site. For example, dplyr has:
```yaml
URL: http://github.com/hadley/dplyr
BugReports: http://github.com/hadley/dplyr/issues
```
1. Save the file and commit (with the message "Updating DESCRIPTION to add
links to GitHub site").
1. __Push__ your changes to GitHub by clicking
`r knitr::include_graphics("images/git-push.png", dpi = 220)`. (This is
the same as running `git push` in the shell).
1. Go to your GitHub page and look at the `DESCRIPTION`.
Usually, each push will include multiple commits. This is because you push much less often than you commit. How often you push versus commit is completely up to you, but pushing code means publishing code. So strive to push code that works.
To ensure your code is clean, I recommend always running `R CMD check` before you push (a topic you'll learn about in the chapter on [automated checking](#check)). If you want to publish code that doesn't work (yet), I recommend using a branch, as you'll learn about below in [branching](#git-branch).
Once you've connected your repo to GitHub, the Git pane will show you how many commits you have locally that are not on GitHub: `r knitr::include_graphics("images/git-local-commits.png", dpi = 220)`. This message indicates that I have 1 commit locally (my branch) that is not on GitHub ("origin/master").
## Benefits of using GitHub {#github-benefit}
1. You get a decent website. The GitHub page for your project, e.g.
<https://github.com/hadley/testthat> (the GitHub repo for testthat), lists
all the files and directories in your package. `.R` files will be formatted
with syntax highlighting, and `.md`/`.Rmd` files will be rendered as HTML.
And, if you include a `README.md` file in the top-level directory, it will
be displayed on the homepage. You'll learn more about the benefits of
creating this file in [README.md](#readme).
1. It makes it easy for anyone to install your package (and to benefit from your
hard work):
```r
devtools::install_github("<your_username>/<your_package>")
```
1. You can track the history of the project in the commit view, e.g.
<https://github.com/hadley/testthat/commits/master>. When I'm working on
a package with others, I often keep this page open so I can see what
they're working on. Individual commits show the same information that you
see in the commit/diff window in RStudio.
1. It's easy to see the history of a file. If you navigate to a file and click
__History__, you'll see every commit that affected that file. Another useful
view is __Blame__; it shows the last change made to each line of code, who
made the change, and the commit the change belongs to. This is tremendously
helpful when you're tracking down a bug.
You can jump directly to these pages from RStudio with the Git dropdown in
the main toolbar:
```{r, echo = FALSE}
knitr::include_graphics("images/git-dropdown.png", dpi = 220)
```
1. You can comment on commits. To comment on the commit as a whole, use the
comment box at the bottom of the page. To comment on an individual line,
click the plus sign that appears when you mouse over a line number,
`r knitr::include_graphics("images/github-comment-line.png", dpi = 220)`.
This is a great way to let your collaborators know if you see a mistake or
have a question. It's better than email because it's public so anyone
working on the repo (both present and future) can see the conversation.
## Working with others {#git-pull}
You use __push__ to send your changes to GitHub. If you're working with others, they also push their changes to GitHub. But, to see their changes locally you'll need to __pull__ their changes from GitHub. In fact, to make sure everyone is in sync, Git will only let you push to a repo if you've retrieved the most recent version with a pull.
When you pull, Git first downloads (__fetches__) all of the changes and then __merges__ them with the changes that you've made. A merge is a commit with two parents. It takes two different lines of development and combines them into a single result. In many cases, Git can do this automatically: for example, when changes are made to different files, or to different parts of the same file. However, if changes are made to the same place in a file, you'll need to resolve the __merge conflict__ yourself.
In RStudio, you'll discover that you have a merge conflict when:
* A pull fails with an error.
* In the Git pane, you see a status like
`r knitr::include_graphics("images/git-commit-conflict.png", dpi = 220)`
RStudio currently doesn't provide any tools to help with merge conflicts, so you'll need to use the command line. I recommend starting by setting your merge conflict "style" to `diff3`. The `diff3` style shows three things when you get a merge conflict: your local changes, the original file and the remote changes. The default style, `merge`, shows only your changes and the remote changes, which generally makes it harder to figure out what's happened.
* If you've encountered your first merge conflict, do the following:
```bash
# Abort this merge
git merge --abort
# Set the conflict style
git config --global merge.conflictstyle diff3
# Re-try the merge
git pull
```
* If you're not in the middle of a merge conflict, just run
```bash
git config --global merge.conflictstyle diff3
```
To resolve a merge conflict, you need to open every file with the status `r knitr::include_graphics("images/git-commit-conflict.png", dpi = 220)`. In each file, you'll find a conflict marker that looks like this:
```
<<<<<<< HEAD
||||||| merged common ancestors
=======
>>>>>>> remote
```
This shows all three versions of the conflicting code:
* At the top, your local code.
* In the middle, the code from the last commit before the split between the
two lines of development. (This is missing in the default conflict
style, so if you don't see it, follow the instructions above.)
* At the bottom, the remote code that you pulled down from GitHub.
You need to work through each conflict and decide either which version is better, or how to combine both versions. Then, before you stage the file, make sure you've deleted all the conflict markers. Once you've fixed all conflicts, make a new commit and push to GitHub.
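It can help to provoke a conflict on purpose in a throwaway repo, so that the first one you meet isn't in real work. The sketch below makes some assumptions: a local branch called `feature` plays the role of the remote changes, the file `code.R` and its contents are invented, and the `diff3` style is set only for this repo. The exact label after `|||||||` varies between Git versions.

```bash
# Throwaway repo (branch, file and identity are hypothetical).
repo="$(mktemp -d)"
cd "$repo"
git init --quiet
git config user.name "Example Author"        # placeholder identity, local to this repo
git config user.email "author@example.com"
git config merge.conflictstyle diff3         # set for this repo only

printf 'x <- 1\n' > code.R
git add code.R
git commit --quiet -m "Add code.R"
git branch -M master

git checkout --quiet -b feature              # stands in for your collaborator's work
printf 'x <- 2\n' > code.R
git commit --quiet -am "Set x to 2"

git checkout --quiet master                  # your own line of development
printf 'x <- 3\n' > code.R
git commit --quiet -am "Set x to 3"

git merge feature || true                    # stops with a conflict in code.R
conflicted="$(cat code.R)"
printf '%s\n' "$conflicted"                  # your version, the original, and theirs

# Resolve: write the version you want (no markers left), stage, and commit.
printf 'x <- 3\n' > code.R
git add code.R
git commit --quiet -m "Merge feature, keeping x <- 3"
```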
A couple of pointers when fixing text generated by roxygen:
* Don't fix problems in `man/*.Rd` files. Instead, resolve any conflicts in
the underlying roxygen comments and re-document the package.
* Merge conflicts in the `NAMESPACE` file will prevent you from re-loading or
re-documenting a package. Resolve them enough so that the package
can be loaded, then re-document to generate a clean and correct `NAMESPACE`.
Handling merge conflicts is one of the trickier parts of Git. You may need to read a few tutorials before you get the hang of it. Google and StackOverflow are great resources. If you get terribly confused, you can always abort the merge and try again by running `git merge --abort` then `git pull`.
## Issues {#github-issues}
Every GitHub repo comes with a page for tracking issues. Use it! If you encounter a bug while working on another project, jot a note down on the issues page. When you have a smaller project, don't worry too much about milestones, tags and assigning issues to specific people. Those are more useful once you get over a page of issues (>50). Once you get to that point, read the GitHub guide on issues: <https://guides.github.com/features/issues/>.
A useful technique is closing issues from a commit message. Just put `Closes #<issue number>` somewhere in your commit message and GitHub will close the issue for you once the commit reaches the repo's default branch on GitHub. The best thing about closing issues this way is that it makes a link from the issue to the commit. This is useful if you ever have to come back to the bug and want to see exactly what you did to fix it. You can also link to issues without closing them; just refer to `#<issue number>`.
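As a rough illustration, a commit made from the shell might look like the sketch below. The issue numbers `#12` and `#7`, the file `parser.R`, and the user identity are hypothetical; the temporary repo is only there so the commands can be run as-is.

```bash
# Sketch: a commit message that closes one issue and links to another.
# Issue numbers, file name and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

echo "# fixed" > parser.R
git add parser.R

# The first -m is the subject line; the second -m becomes the message body
git commit -q -m "Fix off-by-one error in parser" -m "Closes #12. See also #7."

git log -1 --format=%B      # show the full message of the last commit
```

Here `#12` would be closed and `#7` would only be linked, because the closing keyword applies to the issue number that directly follows it.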
As you'll learn in [NEWS.md](#news), it's a good idea to add a bullet to `NEWS.md` whenever you close an issue. The bullet point should describe the issue in terms that users will understand, as opposed to the commit message, which is written for developers.
## Branches {#git-branch}
Sometimes you want to make big changes to your code without having to disturb your main stream of development. Maybe you want to break it up into multiple simple commits so you can easily track what you're doing. Maybe you're not sure what you've done is the best approach and you want someone else to review your code. Or, maybe you want to try something experimental (you can merge it back only if the experiment succeeds). Branches and pull requests provide powerful tools to handle these situations.
Although you might not have realised it, you're already using branches. The default branch is called __master__; it's where you've been saving your commits. If you synchronise your code to GitHub you'll also have a branch called __origin/master__: it's a local copy of all the commits on GitHub, which gets synchronised when you pull. `git pull` does two things:
1. `git fetch origin master` to update the local `origin/master` branch with
the latest commits from GitHub.
1. `git merge origin/master` to combine the remote changes with your changes.
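The two steps can be tried out without touching GitHub. In the sketch below, a local bare repo called `github.git` plays the role of GitHub, and `mine` and `theirs` are two clones of it; all of these names, the file `feature.R`, and the user identity are assumptions made for the example.

```bash
# Sketch: do by hand what `git pull` does, using a local stand-in for GitHub.
# Directory names, file name and identity are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# Build the stand-in remote and two clones of it
git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed "$work/github.git"
git clone -q "$work/github.git" mine
git clone -q "$work/github.git" theirs

# A collaborator pushes a commit
cd "$work/theirs"
echo "new function" > feature.R
git add feature.R
git commit -q -m "Add feature"
git push -q origin master

# In your clone: fetch updates origin/master, merge brings it into master
cd "$work/mine"
git fetch -q origin master
git merge -q origin/master
```

Between the `fetch` and the `merge` you can inspect what is about to arrive, for example with `git log master..origin/master`, which is the main reason to run the two steps separately.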
It's useful to create your own branches when you want to (temporarily) break away from the main stream of development. You can create a new branch with `git checkout -b <branch-name>`. Names should be in lower case letters and numbers, with `-` used to separate words.
Switch between branches with `git checkout <branch-name>`. For example, to return to the main line of development use `git checkout master`. You can also use the branch switcher at the top right of the Git pane:
```{r, echo = FALSE}
knitr::include_graphics("images/git-branch.png", dpi = 220)
```
If you've forgotten the name of your branch in the shell, you can use `git branch` to list all existing branches.
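Put together, creating, listing and switching branches looks something like this sketch. It runs in a temporary repo, and the branch name `fix-typos` and the user identity are made up.

```bash
# Sketch: create a branch, list branches, and switch back to master.
# Branch name and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b fix-typos    # create the branch and switch to it
git branch                      # list branches; * marks the current one
git checkout -q master          # return to the main line of development
```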
If you try to synchronise this branch to GitHub from inside RStudio, you'll notice that push and pull are disabled: `r knitr::include_graphics("images/git-no-remote.png", dpi = 220)`. To enable them, you'll need to first tell Git that your local branch has a remote equivalent:
```bash
git push --set-upstream origin <branch-name>
```
After you've done that once, you can use the pull and push buttons as usual.
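If you're curious what `--set-upstream` actually records, the following sketch pushes a new branch to a local stand-in for GitHub and then asks Git which remote branch it tracks. The bare repo `github.git`, the branch `new-feature`, and the user identity are hypothetical.

```bash
# Sketch: push a new branch and record its remote equivalent.
# Directory names, branch name and identity are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

git init -q seed
git -C seed symbolic-ref HEAD refs/heads/master
git -C seed commit -q --allow-empty -m "Initial commit"
git clone -q --bare seed "$work/github.git"    # stand-in for GitHub
git clone -q "$work/github.git" mine
cd "$work/mine"

git checkout -q -b new-feature
git commit -q --allow-empty -m "Start new feature"
git push -q --set-upstream origin new-feature

# Ask Git which remote branch new-feature now tracks
git rev-parse --abbrev-ref 'new-feature@{upstream}'
```

`git branch -vv` shows the same information for every local branch at once.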
If you've been working on a branch for a while, other work might have been going on in the master branch. To integrate that work into your branch, run `git merge master`. You will need to resolve any merge conflicts (see above). It's best to do this fairly frequently - the less your branch diverges from the master, the easier it will be to merge.
Once you're done working on a branch, merge it back into the master, then delete the branch:
```bash
git checkout master
git merge <branch-name>
git branch -d <branch-name>
```
(Git won't let you delete a branch with `-d` unless you've merged it back into the master branch. If you do want to abandon a branch without merging it, you'll need to force delete with `-D` instead of `-d`. If you accidentally delete a branch, don't panic: it's usually possible to get it back. See the advice about undoing mistakes.)
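The difference between `-d` and `-D` is easy to see in a throwaway repo. The sketch below also shows one way to bring a deleted branch back, assuming you still know the SHA of its last commit; the branch name `experiment` and the user identity are hypothetical.

```bash
# Sketch: -d refuses to delete an unmerged branch, -D forces it,
# and the branch can be recreated from the SHA of its last commit.
# Branch name and identity are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "user@example.com"
git commit -q --allow-empty -m "Initial commit"

git checkout -q -b experiment
git commit -q --allow-empty -m "Risky experiment"
tip=$(git rev-parse HEAD)       # remember where the branch points
git checkout -q master

if git branch -d experiment 2>/dev/null; then
  echo "deleted (not expected here)"
else
  echo "refused: experiment has not been merged"
fi

git branch -D experiment        # force delete
git branch experiment "$tip"    # recreate the branch at the saved commit
```

If you didn't save the SHA beforehand, `git reflog` usually still lists it.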
## Making a pull request {#git-pullreq}
A pull request is a tool for proposing and discussing changes before merging them into a repo. The most common use for a pull request is to contribute to someone else's code: it's the easiest way to propose changes to code that you don't control.
Below, you'll learn about pull requests to make changes to your own code. This may seem a bit pointless because you don't _need_ them, as you can directly modify your code. But pull requests are surprisingly useful because they allow you to get feedback on proposed changes. We use them frequently at RStudio to get feedback before merging major changes.
GitHub has some good documentation on using pull requests: <https://help.github.com/articles/using-pull-requests/>. In this chapter, I'll focus on the basics you need to know to use pull requests effectively, and show you how they fit in with the Git commands you've learned so far.
To create a pull request, you create a branch, commit code, then push the branch to GitHub. When you next go to the GitHub website you'll see a header that invites you to submit a pull request. You can also do it by:
1. Switching branches:
```{r, echo = FALSE}
knitr::include_graphics("images/github-branches.png", dpi = 220)
```
1. Clicking `r knitr::include_graphics("images/pr-create.png", dpi = 220)`
This will create a page that looks like this:
```{r, echo = FALSE}
knitr::include_graphics("images/pr.png", dpi = 220)
```
This pull request, which fixes a couple of small problems, is one that was submitted to this book's GitHub site.
There are three parts to a pull request:
* A __conversation__,
`r knitr::include_graphics("images/pr-conversation.png", dpi = 220)`,
where you can discuss the changes as a whole.
* The __commits__ view,
`r knitr::include_graphics("images/pr-commits.png", dpi = 220)`,
where you can see each individual commit.
* The __file changes__,
`r knitr::include_graphics("images/pr-changes.png", dpi = 220)`,
where you see the overall diff of the commits, and you can comment
on individual lines.
Once you're done discussing a pull request, you choose either to merge it or to close it. Merging it is equivalent to running `git merge <branch-name>` from the shell; closing it and deleting its branch is equivalent to abandoning the branch with `git branch -D <branch-name>`.
## Submitting a pull request to another repo {#pr-make}
To submit a pull request to a repo that you don't own, you first need to create a copy of the repo that you can own, called a __fork__, and then clone that fork on your own computer:
1. __Fork__ the original repo by going to the repo on GitHub and clicking
`r knitr::include_graphics("images/github-fork.png", dpi = 220)`.
This creates a copy of the repo that belongs to you.
1. __Clone__ the forked repo to create a local copy of the remote repo.
It's possible to do this within RStudio (using "Create New Project" from
"Version Control") but I think it's easier to do it from the shell:
```bash
git clone git@github.com:<your-name>/<repo>.git
cd <repo>
```
A fork is a _static_ copy of the repo: once you've created it, GitHub does nothing to keep it in sync with the upstream repo. This is a problem because while you're working on a pull request, changes might occur in the original repo. To keep the forked and the original repo in sync, start by telling your repo about the upstream repo:
```bash
git remote add upstream git@github.com:<original-name>/<repo>.git
git fetch upstream
```
Then you can merge changes from the upstream repo to your local copy:
```bash
git merge upstream/master
```
When working on a forked repo, I recommend that you don't work on the master branch. Because you're not really working on the main line of development for that repo, using your master branch makes things confusing.
If you always create pull requests in branches, you can make it a little easier to keep your local repo in sync with the upstream repo by running:
```bash
# Make the local master branch track upstream/master
git branch -u upstream/master master
```
Then you can update your local repo with the following code:
```bash
git checkout master
git pull
```
Changes may occur while you're working on the pull request, so remember to merge them into your branch with:
```bash
git checkout <my-branch>
git merge master
```
A pull request (PR) maps one-to-one to a branch, so you can also use this technique to make updates based on the pull request discussion. Don't create a new pull request each time you make a change; instead, just push the branch that the PR is based on and the PR webpage will automatically update.
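The whole cycle can be rehearsed locally. In the sketch below, two bare repos stand in for the original repo (`upstream.git`) and your fork (`fork.git`), and a clone called `maintainer` plays the part of the upstream authors. Every directory, branch and file name, as well as the user identity, is an assumption made for the example.

```bash
# Sketch: keep a pull request branch in sync with upstream, using local
# stand-ins for GitHub. All names and the identity are hypothetical.
set -e
work=$(mktemp -d)
cd "$work"
export GIT_AUTHOR_NAME="Example User" GIT_AUTHOR_EMAIL="user@example.com"
export GIT_COMMITTER_NAME="Example User" GIT_COMMITTER_EMAIL="user@example.com"

# upstream.git is the original repo; fork.git is your copy of it
git init -q maintainer
git -C maintainer symbolic-ref HEAD refs/heads/master
git -C maintainer commit -q --allow-empty -m "Initial commit"
git clone -q --bare maintainer "$work/upstream.git"
git clone -q --bare "$work/upstream.git" "$work/fork.git"
git -C maintainer remote add origin "$work/upstream.git"

# Clone the fork, add the upstream remote, and make master track it
git clone -q "$work/fork.git" mine
cd "$work/mine"
git remote add upstream "$work/upstream.git"
git fetch -q upstream
git branch -u upstream/master master

# Do the pull request work on its own branch
git checkout -q -b my-fix
echo "fix" > fix.txt
git add fix.txt
git commit -q -m "Add fix"

# Meanwhile, the original repo moves on
echo "news" > "$work/maintainer/news.txt"
git -C "$work/maintainer" add news.txt
git -C "$work/maintainer" commit -q -m "Upstream change"
git -C "$work/maintainer" push -q origin master

# Update master from upstream, then merge it into the PR branch
git checkout -q master
git pull -q
git checkout -q my-fix
git merge -q --no-edit master

# Pushing the branch to the fork is what updates an open pull request
git push -q origin my-fix
```

After the final push, the `my-fix` branch in the fork contains both your fix and the upstream change, which is the state a reviewer would see in the pull request.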
The diagram below illustrates the main steps of creating a pull request and updating the request as the upstream repo changes:
```{r, echo = FALSE}
knitr::include_graphics("diagrams/pull-request-process.png", dpi = 220)
```
## Reviewing and accepting pull requests {#pr-accept}
As your package gets more popular, you're likely to receive pull requests. Receiving a pull request is fantastic. Someone not only cares about your package enough to use it, they've actually read the source code and made an improvement!
When you receive a pull request, I recommend reviewing it using the three-step approach described by Sarah Sharp. I summarise the three phases below, but I highly recommend reading the full article at <http://sarah.thesharps.us/2014/09/01/the-gentle-art-of-patch-review/>:
1. Is it a good idea? If you don't think the contribution is a good fit for
your project, it's polite to let the contributor know as quickly as
possible. Thank them for their work, and refocus them on a better area to
work on.
1. Is the overall approach sound? At this point you want to focus on the big
picture: have they modified the right functions in the right way? Avoid
nitpicking minor style problems (that's the final phase); instead just
provide a pointer to your style preferences, e.g.
<http://r-pkgs.had.co.nz/style.html>.
1. Is it polished? In the final review phase, make sure that the non-code
parts of the PR are polished. Prompt the contributor to update the
documentation, point out spelling mistakes and suggest better wording.
I recommend asking the contributor to include a bullet point in `NEWS.md`,
briefly describing the improvement and crediting themselves with their GitHub
username. More details to follow in [post release](#post-release).
After discussion is complete, you can incorporate the changes by clicking the merge button. If the button doesn't work, GitHub provides some instructions on how to do it from the command line. While you've seen all the pieces before, it's useful to read through this just so you understand what exactly is happening.
```bash
# Create a new branch, and sync it with the pull request
git checkout -b <branch> master
git pull https://github.com/<user>/<repo>.git patch-3
# Merge the changes into the main line of development
git checkout master
git merge --no-ff <branch>
# Resolve any conflicts, then stage and commit.
# Sync your local changes with GitHub
git push origin master
```
## Learning more {#git-learning}
Git and GitHub are a rich and powerful set of tools, and there's no way this chapter has taught you everything you need to know. However, you should now have the basic knowledge to be effective, and you should be in a good position to learn more. Some good resources are:
* GitHub help, <https://help.github.com>, not only teaches you about
GitHub, but also has good tutorials on many Git features.
* If you'd like to learn more about the details of Git, read
[Pro Git](http://git-scm.com/book/en/v2) by Scott Chacon and Ben Straub.
Finally, StackOverflow is a vital resource for Git users: when you have a problem that you don't know how to solve, StackOverflow should be your first port of call. It's highly likely that someone has already had exactly the same problem as you, and that there will be a variety of approaches and solutions to choose from.
[gh-issues]:https://guides.github.com/features/issues/
[gh-pr]: https://help.github.com/articles/using-pull-requests/
[gh-releases]: https://help.github.com/articles/about-releases/