Note
|
this section has been rewritten, using OASIS metastandards repository as a starting place. If you have anything to add or review, please comment. |
Details
- Explanations
-
Design roles focused on the functionality provided, not the software implementation.
- Rationale
-
Try to design roles focused on the functionality, not on the software implementation behind it. This will help abstracting differences between different providers, and help the user to focus on the functionality, not on technical details.
- Examples
-
For an example, designing a role to implement an NTP configuration on a server would be a role. The role internally would have the logic to decide whether to use ntpd, chronyd, and the ntp site configurations. However, when the underlying implementations become too divergent, for example implementing an email server with postfix or sendmail, then separate roles are encouraged.
Design roles to accomplish a specific, guaranteed outcome and limit the scope of the role to that outcome. This will help abstracting differences between different providers (see above), and help the user to focus on the functionality, not on technical details.
-
Design the interface focused on the functionality, not on the software implementation behind it.
Details
- Explanations
-
Limit the consumer’s need to understand specific implementation details about a collection, the role names within, and the file names. Presenting the collection as a low-code/no-code "automation application" provides the developer flexibility as the content grows and matures, and limits change the consumer may have to make in a later version.
- Examples
-
-
In the first example, the
mycollection.run
role has been designed to be an entry-point for the execution of multiple roles.mycollection.run
provides a standard interface for the user, without exposing the details of the underlying automations. The implementation details ofthing_1
andthing_2
may be changed by the developer without impacting the user as long as the interface ofmycollection.run
does not change.
-
Do this:- hosts: all gather_facts: false tasks: - name: Perform several actions include_role: mycollection.run vars: actions: - name: thing_1 vars: thing_1_var_1: 1 thing_1_var_2: 2 - name: thing_2 vars: thing_2_var_1: 1 thing_2_var_2: 2
-
In this example, the user must maintain awareness of the structure of the
thing_1
andthing_2
roles, and the order of operations necessary for these roles to be used together. If the implementation of these roles is changed, the user will need to modify their playbook. .Don’t do this:
- hosts: all gather_facts: false tasks: - name: Do thing_1 include_role: name: mycollection.thing_1 tasks_from: thing_1.yaml vars: thing_1_var_1: 1 thing_1_var_2: 2 - name: Do thing_2 include_role: name: mycollection.thing_2 tasks_from: thing_2.yaml vars: thing_2_var_1: 1 thing_2_var_2: 2
-
Place content common to multiple roles in a single reusable role, "common" is a typical name for this role when roles are packaged in a collection. Author loosely coupled, hierarchical content.
Details
- Explanations
-
Roles that have hard dependencies on external roles or variables have limited flexibility and increased risk that changes to the dependency will result in unexpected behavior or failures. Coupling describes the degree of dependency between roles and variables that need to act in coordination. Hierarchical content is an architectural approach to designing your content where individual roles have parent-child relationships in an overall tree structure
Details
- Explanations
-
New roles should be initiated in line, with the skeleton directory, which has standard boilerplate code for a Galaxy-compatible Ansible role and some enforcement around these standards
- Rationale
-
A consistent file tree structure will help drive consistency and reusability across the entire environment.
Details
- Explanations
-
Use semantic versioning for Git release tags. Use 0.y.z before the role is declared stable (interface-wise).
- Rationale
-
There are some restrictions for Ansible Galaxy and Automation Hub. The versioning must be in strict X.Y.Z[ab][W] format, where X, Y, and Z are integers.
Details
-
All defaults and all arguments to a role should have a name that begins with the role name to help avoid collision with other names. Avoid names like
packages
in favor of a name likefoo_packages
.- Rationale
-
Ansible has no namespaces, doing so reduces the potential for conflicts and makes clear what role a given variable belongs to.)
-
Same argument applies for modules provided in the roles, they also need a
$ROLENAME_
prefix:foo_module
. While they are usually implementation details and not intended for direct use in playbooks, the unfortunate fact is that importing a role makes them available to the rest of the playbook and therefore creates opportunities for name collisions. -
Moreover, internal variables (those that are not expected to be set by users) are to be prefixed by two underscores:
__foo_variable
.- Rationale
-
role variables, registered variables, custom facts are usually intended to be local to the role, but in reality are not local to the role - as such a concept does not exist, and pollute the global namespace. Using the name of the role reduces the potential for name conflicts and using the underscores clearly marks the variables as internals and not part of the common interface. The two underscores convention has prior art in some popular roles like geerlingguy.ansible-role-apache). This includes variables set by set_fact and register, because they persist in the namespace after the role has finished!
-
Prefix all tags within a role with the role name or, alternatively, a "unique enough" but descriptive prefix.
-
Do not use dashes in role names. This will cause issues with collections.
Details
When there are multiple implementations of the same functionality, we call them “providers”.
A role supporting multiple providers should have an input variable called $ROLENAME_provider
.
If this variable is not defined, the role should detect the currently running provider on the system, and respect it.
- Rationale
-
users can be surprised if the role changes the provider if they are running one already. If there is no provider currently running, the role should select one according to the OS version.
- Example
-
on RHEL 7, chrony should be selected as the provider of time synchronization, unless there is ntpd already running on the system, or user requests it specifically. Chrony should be chosen on RHEL 8 as well, because it is the only provider available.
The role should set a variable or custom fact called $ROLENAME_provider_os_default
to the appropriate default value for the given OS version.
- Rationale
-
users may want to set all their managed systems to a consistent state, regardless of the provider that has been used previously. Setting
$ROLENAME_provider
would achieve it, but is suboptimal, because it requires selecting the appropriate value by the user, and if the user has multiple system versions managed by a single playbook, a common value supported by all of them may not even exist. Moreover, after a major upgrade of their systems, it may force the users to change their playbooks to change their$ROLENAME_provider
setting, if the previous value is not supported anymore. Exporting$ROLENAME_provider_os_default
allows the users to set$ROLENAME_provider: "{{ $ROLENAME_provider_os_default }}"
(thanks to the lazy variable evaluation in Ansible) and thus get a consistent setting for all the systems of the given OS version without having to decide what the actual value is - the decision is delegated to the role).
Details
- Explanations
-
Avoid testing for distribution and version in tasks. Rather add a variable file to "vars/" for each supported distribution and version with the variables that need to change according to the distribution and version.
- Rationale
-
This way it is easy to add support to a new distribution by simply dropping a new file in to "vars/", see below Supporting multiple distributions and versions. See also Vars vs Defaults which mandates "Avoid embedding large lists or 'magic values' directly into the playbook." Since distribution-specific values are kind of "magic values", it applies to them. The same logic applies for providers: a role can load a provider-specific variable file, include a provider-specific task file, or both, as needed. Consider making paths to templates internal variables if you need different templates for different distributions.
Details
- Rationale
-
Packaging roles as a collection allows you to distribute many roles in a single cohesive unit of re-usable automation. Inside a collection, you can share custom plugins across all roles in the collection instead of duplicating them in each role’s
library/
directory. Collections give your roles a namespace, which removes the potential for naming collisions when developing new roles. - Example
-
See the Ansible documentation on migrating roles to collections for details.
Details
-
The role should work in check mode, meaning that first of all, they should not fail check mode, and they should also not report changes when there are no changes to be done. If it is not possible to support it, please state the fact and provide justification in the documentation. This applies to the first run of the role.
-
Reporting changes properly is related to the other requirement: idempotency. Roles should not perform changes when applied a second time to the same system with the same parameters, and it should not report that changes have been done if they have not been done. Due to this, using
command:
is problematic, as it always reports changes. Therefore, override the result by usingchanged_when:
-
Concerning check mode, one usual obstacle to supporting it are registered variables. If there is a task which registers a variable and this task does not get executed (e.g. because it is a
command:
or another task which is not properly idempotent), the variable will not get registered and further accesses to it will fail (or worse, use the previous value, if the role has been applied before in the play, because variables are global and there is no way to unregister them). To fix, either use a properly idempotent module to obtain the information (e.g. instead of usingcommand: cat
to read file into a registered variable, useslurp
and apply.content|b64decode
to the result like here), or apply propercheck_mode:
andchanged_when:
attributes to the task. more_info. -
Another problem are commands that you need to execute to make changes. In check mode, you need to test for changes without actually applying them. If the command has some kind of "--dry-run" flag to enable executing without making actual changes, use it in check_mode (use the variable
ansible_check_mode
to determine whether we are in check mode). But you then need to setchanged_when:
according to the command status or output to indicate changes. See (https://github.com/linux-system-roles/selinux/pull/38/files#diff-2444ad0870f91f17ca6c2a5e96b26823L101) for an example. -
Another problem is using commands that get installed during the install phase, which is skipped in check mode. This will make check mode fail if the role has not been executed before (and the packages are not there), but does the right thing if check mode is executed after normal mode.
-
To view reasoning for supporting why check mode in first execution may not be worthwhile: see here. If this is to be supported, see hhaniel’s proposal, which seems to properly guard even against such cases.
Details
- Explanations
-
Reporting changes properly is related to the other requirement: idempotency. Roles should not perform changes when applied a second time to the same system with the same parameters, and it should not report that changes have been done if they have not been done. Due to this, using
command:
is problematic, as it always reports changes. Therefore, override the result by usingchanged_when:
- Rationale
-
Additional automation or other integrations, such as with external ticketing systems, should rely on the idempotence of the ansible role to report changes accurately
Details
- Use Cases
-
-
The role developer needs to be able to set role variables to different values depending on the OS platform and version. For example, if the name of a service is different between EL8 and EL9, or a config file location is different.
-
The role developer needs to handle the case where the user specifies
gather_facts: false
in the playbook. -
The role developer needs to access the platform specific vars in role integration tests without making a copy.
-
Note
|
The recommended solution below requires at least some ansible_facts to be defined, and so relies on gathering some facts.
If you just want to ensure the user always uses gather_facts: true , and do not want to handle this in the role, then the role documentation should state that gather_facts: true or setup: is required in order to use the role, and the role should use fail: with a descriptive error message if the necessary facts are not defined.
|
If it is desirable to use roles that require facts, but fact gathering is expensive, consider using a cache plugin List of Cache Plugins, and also consider running a periodic job on the controller to refresh the cache.
Details
- Explanations
-
You normally use
vars/main.yml
(automatically included) to set variables used by your role. If some variables need to be parameterized according to distribution and version (name of packages, configuration file paths, names of services), use this in the beginning of yourtasks/main.yml
: - Examples
- name: Ensure ansible_facts used by role
setup:
gather_subset: min
when: not ansible_facts.keys() | list |
intersect(__rolename_required_facts) == __rolename_required_facts
- name: Set platform/version specific variables
include_vars: "{{ __rolename_vars_file }}"
loop:
- "{{ ansible_facts['os_family'] }}.yml"
- "{{ ansible_facts['distribution'] }}.yml"
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
vars:
__rolename_vars_file: "{{ role_path }}/vars/{{ item }}"
when: __rolename_vars_file is file
-
Add this as the first task in
tasks/main.yml
:- name: Set platform/version specific variables include_tasks: tasks/set_vars.yml
-
Add files to
vars/
for the required OS platforms and versions.
The files in the loop
are in order from least specific to most specific:
-
os_family
covers a group of closely related platforms (e.g.RedHat
covers RHEL, CentOS, Fedora) -
distribution
(e.g.Fedora
) is more specific thanos_family
-
distribution
_distribution_major_version
(e.g.RedHat_8
) is more specific thandistribution
-
distribution
_distribution_version
(e.g.RedHat_8.3
) is the most specific
See Commonly Used Facts for an explanation of the facts and their common values.
Each file in the loop
list will allow you to add or override variables to specialize the values for platform and/or version.
Using the when: item is file
test means that you do not have to provide all of the vars/
files, only the ones you need.
For example, if every platform except Fedora uses srv_name
for the service name, you can define myrole_service: srv_name
in vars/main.yml
then define myrole_service: srv2_name
in vars/Fedora.yml
.
In cases where this would lead to duplicate vars files for similar distributions (e.g. CentOS 7 and RHEL 7), use symlinks to avoid the duplication.
Note
|
With this setup, files can be loaded twice.
For example, on Fedora, the distribution_major_version is the same as distribution_version so the file vars/Fedora_31.yml will be loaded twice if you are managing a Fedora 31 host.
If distribution is RedHat then os_family will also be RedHat , and vars/RedHat.yml will be loaded twice.
This is usually not a problem - you will be replacing the variable with the same value, and the performance hit is negligible.
If this is a problem, construct the file list as a list variable, and filter the variable passed to loop using the unique filter (which preserves the order):
|
- name: Set vars file list
set_fact:
__rolename_vars_file_list:
- "{{ ansible_facts['os_family'] }}.yml"
- "{{ ansible_facts['distribution'] }}.yml"
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
- name: Set platform/version specific variables
include_vars: "{{ __rolename_vars_file }}"
loop: "{{ __rolename_vars_file_list | unique | list }}"
vars:
__rolename_vars_file: "{{ role_path }}/vars/{{ item }}"
when: __rolename_vars_file is file
Or define your __rolename_vars_file_list
in your vars/main.yml
.
The task Ensure ansible_facts used by role
handles the case where the user specifies gather_facts: false
in the playbook.
It gathers only the facts required by the role.
The role developer may need to add additional facts to the list, and use a different gather_subset
.
See Setup Module for more information.
Gathering facts can be expensive, so gather only the facts required by the role.
Using a separate task file for tasks/set_vars.yml
allows role integration tests to access the internal variables.
For example, if the role developer wants to pre-populate a VM with the packages used by the role, the following tasks can be used:
- hosts: all
tasks:
- name: Set platform/version specific variables
include_role:
name: my.fqcn.rolename
tasks_from: set_vars.yml
public: true
- name: Install test packages
package:
name: "{{ __rolename_packages }}"
state: present
In this way, the role developer does not have to copy and maintain a separate list of role packages.
Details
Platform specific tasks, however, are different.
You probably want to perform platform specific tasks once, for the most specific match.
In that case, use lookup('first_found')
with the file list in order of most specific to least specific, including a "default":
- name: Perform platform/version specific tasks
include_tasks: "{{ lookup('first_found', __rolename_ff_params) }}"
vars:
__rolename_ff_params:
files:
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_major_version'] }}.yml"
- "{{ ansible_facts['distribution'] }}.yml"
- "{{ ansible_facts['os_family'] }}.yml"
- "default.yml"
paths:
- "{{ role_path }}/tasks/setup"
Then you would provide tasks/setup/default.yml
to do the generic setup, and e.g. tasks/setup/Fedora.yml
to do the Fedora specific setup.
The tasks/setup/default.yml
is required in order to use lookup('first_found')
, which will give an error if no file is found.
If you want to have the "use first file found" semantics, but do not want to have to provide a default file, add skip: true
:
- name: Perform platform/version specific tasks
include_tasks: "{{ lookup('first_found', __rolename_ff_params) }}"
vars:
__rolename_ff_params:
files:
- "{{ ansible_facts['distribution'] }}_{{ ansible_facts['distribution_version'] }}.yml"
- "{{ ansible_facts['os_family'] }}.yml"
paths:
- "{{ role_path }}/tasks/setup"
skip: true
NOTE:
-
Use
include_tasks
orinclude_vars
withlookup('first_found')
instead ofwith_first_found
.loop
is not needed - the include forms take a string or a list directly. -
Always specify the explicit, absolute path to the files to be included, using
{{ role_path }}/vars
or{{ role_path }}/tasks
, when using these idioms. See below "Ansible Best Practices" for more information. -
Use the
ansible_facts['name']
bracket notation rather than theansible_facts.name
oransible_name
form. For example, useansible_facts['distribution']
instead ofansible_distribution
oransible.distribution
. Theansible_name
form relies on fact injection, which can break if there is already a fact of that name. Also, the bracket notation is what is used in Ansible documentation such as Commonly Used Facts and Operating System and Distribution Variance.
Details
Use a task file per provider and include it from the main task file, like this example from storage:
- name: include the appropriate provider tasks
include_tasks: "main_{{ storage_provider }}.yml"
The same process should be used for variables (not defaults, as defaults can
not be loaded according to a variable).
You should guarantee that a file exists for each provider supported, or use an explicit, absolute path using role_path
.
See below "Ansible Best Practices" for more information.
Details
-
Add
{{ ansible_managed | comment }}
at the top of the template file file to indicate that the file is managed by Ansible roles, while making sure that multi-line values are properly commented. For more information, see Adding comments to files. -
When commenting, don’t include anything like "Last modified: {{ date }}". This would change the file at every application of the role, even if it doesn’t need to be changed for other reasons, and thus break proper change reporting.
-
Use standard module parameters for backups, keep it on unconditionally (
backup: true
), until there is a user request to have it configurable. -
Make prominently clear in the HOWTO (at the top) what settings/configuration files are replaced by the role instead of just modified.
-
Use
{{ role_path }}/subdir/
as the filename prefix when including files if the name has a variable in it.- Rationale
-
your role may be included by another role, and if you specify a relative path, the file could be found in the including role. For example, if you have something like
include_vars: "{{ ansible_facts['distribution'] }}.yml"
and you do not provide every possiblevars/{{ ansible_facts['distribution'] }}.yml
in your role, Ansible will look in the including role for this file. Instead, to ensure that only your role will be referenced, useinclude_vars: "{{role_path}}/vars/{{ ansible_facts['distribution'] }}.yml"
. Same with other file based includes such asinclude_tasks
. See Ansible Developer Guide » Ansible architecture » The Ansible Search Path for more information.
Details
-
Avoid embedding large lists or "magic values" directly into the playbook. Such static lists should be placed into the
vars/main.yml
file and named appropriately -
Every argument accepted from outside of the role should be given a default value in
defaults/main.yml
. This allows a single place for users to look to see what inputs are expected. Document these variables in the role’s README.md file copiously -
Use the
defaults/main.yml
file in order to avoid use of the default Jinja2 filter within a playbook. Using the default filter is fine for optional keys on a dictionary, but the variable itself should be defined indefaults/main.yml
so that it can have documentation written about it there and so that all arguments can easily be located and identified. -
Don’t define defaults in
defaults/main.yml
if there is no meaningful default. It is better to have the role fail if the variable isn’t defined than have it do something dangerously wrong. Still do add the variable todefaults/main.yml
but commented out, so that there is one single source of input variables. -
Avoid giving default values in
vars/main.yml
as such values are very high in the precedence order and are difficult for users and consumers of a role to override. -
As an example, if a role requires a large number of packages to install, but could also accept a list of additional packages, then the required packages should be placed in
vars/main.yml
with a name such asfoo_packages
, and the extra packages should be passed in a variable namedfoo_extra_packages
, which should default to an empty array indefaults/main.yml
and be documented as such.
Details
- Rationale
-
The documentation is essential for the success of the content. Place the README file in the root directory of the role. The README file exists to introduce the user to the purpose of the role and any important information on how to use it, such as credentials that are required.
At minimum include:
-
Example playbooks that allow users to understand how the developed content works are also part of the documentation.
-
The inbound and outbound role argument specifications
-
List of user-facing capabilities within the role
-
The unit of automation the role provides
-
The outcome of the role
-
The roll-back capabilities of the role
-
Designation of the role as idempotent (True/False)
-
Designation of the role as atomic if applicable (True/False)
Details
- Explanations
-
It is relatively common to use (inventory) group names in roles:
-
either to loop through the hosts in the group, generally in a cluster context
-
or to validate that a host is in a specific group
Instead, store the host name(s) in a (list) variable, or at least make the group name a parameter of your role. You can always set the variable at group level to avoid repetitions.
-
- Rationale
-
Groups are a feature of the data in your inventory, meaning that you mingle data with code when you use those groups in your code. Rely on the inventory-parsing process to provide your code with the variables it needs instead of enforcing a specific structure of the inventory. Not all inventory sources are flexible enough to provide exactly the expected group name. Even more importantly, in a cluster context for example, if the group name is fixed, you can’t describe (and hence automate) more than one cluster in each inventory. You can’t possibly have multiple groups with the same name in the same inventory. On the other hand, variables can have any kind of value for each host, so that you can have as many clusters as you want.
- Examples
-
Assuming we have the following inventory (not according to recommended practices for sake of simplicity):
An inventory with two clusterslink:dont_use_groups/inventory[role=include]
We can then use one of the following three approaches in our role (here as playbook, again for sake of simplicity):
A playbook showing how to loop through a grouplink:dont_use_groups/playbook.yml[role=include]
The first approach is probably best to create a cluster configuration file listing all cluster’s hosts. The other approaches are good to make sure each action is performed only once, but this comes at the price of many skips. The second one fails if the first host isn’t reachable (which might be what you’d want anyway), and the last one has the best chance to be executed once and only once, even if some hosts aren’t available.
Tipthe variable cluster_group_name
could have a default group name value in your role, of course properly documented, for simple use cases.Overall, it is best to avoid this kind of constructs if the use case permits, as they are clumsy.
Details
- Explanation
-
It is a common practice to have
tasks/main.yml
file including other tasks files, which we’ll call sub-tasks files. Make sure that the tasks' names in these sub-tasks files are prefixed with a shortcut reminding of the sub-tasks file’s name. - Rationale
-
Especially in a complex role with multiple (sub-)tasks file, it becomes difficult to understand which task belongs to which file. Adding a prefix, in combination with the role’s name automatically added by Ansible, makes it a lot easier to follow and troubleshoot a role play.
- Examples
-
In a role with one
tasks/main.yml
task file, includingtasks/sub.yml
, the tasks in this last file would be named as follows:A prefixed task in a sub-tasks file- name: sub | Some task description mytask: [...]
The log output will then look something like
TASK [myrole : sub | Some task description] **
, which makes it very clear where the task is coming from.Tipwith a verbosity of 2 or more, ansible-playbook will show the full path to the task file, but this generally means that you need to restart the play in a higher verbosity to get the information you could have had readily available.
Details
- Explanation
-
Starting from ansible version 2.11, an option is available to activate argument validation for roles by utilizing an argument specification. When this specification is established, a task is introduced at the onset of role execution to validate the parameters provided for the role according to the defined specification. If the parameters do not pass the validation, the role execution will terminate.
- Rationale
-
Argument validation significantly contributes to the stability and reliability of the automation. It also makes the playbook using the role fail fast instead of failing later when an incorrect variable is utilized. By ensuring roles receive accurate input data and mitigating common issues, we can enhance the effectiveness of the Ansible playbooks using the roles.
- Examples
-
The specification is defined in the meta/argument_specs.yml. For more details on how to write the specification, refer to https://docs.ansible.com/ansible/latest/playbook_guide/playbooks_reuse_roles.html#specification-format.
Argument Specification file that validates the arguments provided to the role.argument_specs: main: short_description: Role description. options: string_arg1: description: string argument description. type: "str" default: "x" choices: ["x", "y"] dict_arg1: description: dict argument description. type: dict required: True options: key1: description: key1 description. type: int key2: description: key2 description. type: str key3: description: key3 description. type: dict
Details
Links that contain additional standardization information that provide context, inspiration or contrast to the standards described above.
-
https://github.com/debops/debops/blob/v0.7.2/docs/debops-policy/code-standards-policy.rst). For inspiration, as the DebOps project has some specific guidance that we do not necessarily want to follow.
-
https://docs.openstack.org/openstack-ansible/latest/contributor/code-rules.html