-
Notifications
You must be signed in to change notification settings - Fork 0
/
intro_dependencies.qmd
70 lines (40 loc) · 6.55 KB
/
intro_dependencies.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
---
title: "Software Dependencies"
---
# Dependencies Overview
The general concept of software dependencies is relatively straightforward: "dependencies" are other softwares/programs that the software you're using *depends* on to function. If the dependencies are not installed, dependent software will either be missing features, not work as expected, or possibly not work at all.
We make an arbitrary distinction between "system dependencies" and "package dependencies" for the purposes of this tutorial, but there is principally no difference between the two.The distinction is made to clarify that system dependencies are not R packages, but rather are other software programs that are necessary for the project to run. Package dependencies are R packages that are necessary for the project to run.
## Package Dependencies
Package dependencies are R packages that are necessary for the project to run. For example, if you are using the `dplyr` package in your project, you will need to have the `dplyr` package installed on your system. However, `dplyr` itself depends on other R packages such as `tibble`, `rlang`, and `vctrs`. And `tibble`, `rlang`, and `vctrs` have their dependencies too.
This process works recursively until all unique packages are identified, and a project that appears to only use e.g. 5-10 packages can ultimately require a few dozen or hundred packages. This is not an R-specific problem, but rather a common issue in software development known as "dependency hell."[^1]
[^1]: If you have enough time, read about how the [left-pad incident](https://qz.com/646467/how-one-programmer-broke-the-internet-by-deleting-a-tiny-piece-of-code) broke the internet as an example of how dependencies can go wrong.
:::{.callout-note collapse="true" title="Finding Dependencies for R Packages"}
If you're ever curious about the dependencies of a package, they should all be declared in the `DESCRIPTION` file of an R package. There are a variety of ways you can access this information:
- Methods for reading the `DESCRIPTION` file of an R package:
- Directly on [CRAN by finding the package](https://cran.r-project.org/web/packages/available_packages_by_name.html) by name
- `utils::packageDescription()` to print the complete `DESCRIPTION` file of a package to the console
- `tools::package_dependencies()` for just a list of dependencies from the `DESCRIPTION` file
- Looking at the `DESCRIPTION` file online on the package's CRAN or GitHub page
- `pak::pkg_deps_tree()` for a visual representation of the direct dependencies **and** the recursive dependencies. An example of this will be shown in the [Starting Details](starting_details.qmd) chapter.
:::
## System Dependencies
Systems dependencies are not R packages, but rather are other software programs that are necessary for the project to run. For example, if you are using R to run a Python script, you will need to have Python installed on your system. Or if you are using R to run a C++ script, you will need to have a C++ compiler installed on your system. These system dependencies can be more difficult to manage than package dependencies, as they are not always explicitly declared in the project files.
# Intro to Management Practices
Given the potential complexity of software dependencies, it is important to manage these dependencies in a way that is both efficient and reproducible. This is especially important when sharing code with others or when working on a project over a long period of time so that you can easily recreate the environment in which the code was originally run.
You might already have a system in place for managing your dependencies, such as writing a document outlining installation procedures or you maintain a list of project packages, you might already use an automated package manager, or you might not have a system at all. Regardless of what you currently use, the manual and automated approaches are discussed below, and we clarify why you should use an automated approach for managing dependencies going forward.
## Manual management
As noted, the dependencies for your project can be managed manually with e.g. a written list of the softwares and versions used. The main pros to this approach is that little technical knowledge is needed to create a simple Word or text file with this information, and such information can likewise be shared easily. However, the cons are that this approach is inexact, error-prone, still requires another user to manually download each package by hand, and the recursive dependencies are unlikely to be logged. This approach is not recommended for any serious project, especially if you are working with others or if you are working on a project over a long period of time.
## Automated Management
Automated management of dependencies is the recommended approach for managing dependencies. This approach involves using a package manager to automatically install the necessary software dependencies for a project. The main pros to this approach are that it is more accurate, less error-prone, and can be easily shared with others. The cons are that it requires some technical knowledge to set up and can be more difficult to troubleshoot if something goes wrong.
There are many package managers available for different programming languages, and the choice of package manager can depend on the programming language you are using, the complexity of your project, and your personal preferences. Some common package managers for R and Python are listed below:
::: panel-tabset
## R
The major current and historical options for managing R dependencies are:
1. **{renv}** is arguably the most important package for managing R dependencies, is maintained by Posit, and is our recommendation
2. **{groundhog}** is a package that is similar to {renv}, and developed by independent, open-source contributors
3. **{packrat}** is a deprecated package that is no longer recommended for new projects
## Python
If you're coming from a background in Python, you might be familiar with the following analogous approaches:
The pip, pipenv, and poetry packages can, generally speaking, be used for package management and for declaring necessary Python versions in the respective `requirements.txt`, `Pipfile`, and `pyproject.toml` files. However, one would usually use this in conjunction with software like Anaconda or Miniconda to manage Python versions and environments.
:::
The next chapter delves into some necessary [Technical Definitions](./technical_definitions.html) in the R ecosystem, the following chapter will really dive in to the details of [R Package Managers](./dependencies_in_r.html).