otent.
- When using command or shell, always pair with creates, removes, or register variables to check state.
- Utilize changed_when and failed_when to explicitly define task outcomes based on command output.
Code Example:
# tasks/main.yml
- name: Check if application binary exists
  ansible.builtin.stat:
    path: /opt/myapp/bin/myapp
  register: app_binary

- name: Compile application from source
  ansible.builtin.command:
    cmd: make install
    chdir: /tmp/myapp-src
    creates: /opt/myapp/bin/myapp
  register: compile_result

- name: Restart service only if binary changed
  ansible.builtin.systemd:
    name: myapp
    state: restarted
  when: compile_result.changed
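When a command has no file that creates or removes can key on, changed_when and failed_when can be derived from registered output instead. The following is a minimal sketch, assuming a hypothetical myapp-ctl CLI; its subcommands, return codes, and output strings are invented for illustration and are not part of the original example.

```yaml
# tasks/migrate.yml (illustrative; myapp-ctl and its output are assumptions)
- name: Read current schema status
  ansible.builtin.command:
    cmd: /opt/myapp/bin/myapp-ctl schema-status
  register: schema_status
  changed_when: false                            # a read-only query never reports a change
  failed_when: schema_status.rc not in [0, 1]    # assume rc 1 means "migration pending"

- name: Apply pending schema migration
  ansible.builtin.command:
    cmd: /opt/myapp/bin/myapp-ctl migrate
  register: migrate_result
  when: schema_status.rc == 1
  changed_when: "'applied' in migrate_result.stdout"
  failed_when: migrate_result.rc != 0 or 'ERROR' in migrate_result.stderr
```

With guards like these, a second run against an already migrated host should report ok rather than changed.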
2. Asynchronous Orchestration Pattern
Long-running tasks hold a connection open from the control node for their full duration, which slows the play and risks SSH timeouts. The Asynchronous Orchestration pattern runs such tasks in the background on the managed nodes, so the control node can move on to other work and poll for completion later.
Implementation Strategy:
- Use async and poll parameters for tasks exceeding 10 seconds.
- Set poll: 0 for fire-and-forget tasks, then use async_status to check progress.
- Combine with until loops for robust retry logic.
Code Example:
# tasks/deploy.yml
- name: Start long-running database migration
  ansible.builtin.command: /usr/local/bin/db-migrate.sh
  async: 3600   # allow the job up to one hour
  poll: 0       # return immediately instead of waiting
  register: migration_task

- name: Wait for migration to complete
  ansible.builtin.async_status:
    jid: "{{ migration_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 360  # 360 x 10 s = 3600 s, matching the async limit above
  delay: 10
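The same fire-and-forget approach extends to several jobs started from a loop, with one async_status check per job. This is a sketch under stated assumptions: the reindex.sh script and the table names are hypothetical.

```yaml
# tasks/reindex.yml (illustrative; reindex.sh and the table names are assumptions)
- name: Start one background reindex job per table
  ansible.builtin.command:
    cmd: "/usr/local/bin/reindex.sh {{ item }}"
  loop:
    - users
    - orders
    - invoices
  async: 1800
  poll: 0
  register: reindex_jobs

- name: Wait for every reindex job to finish
  ansible.builtin.async_status:
    jid: "{{ item.ansible_job_id }}"
  loop: "{{ reindex_jobs.results }}"
  loop_control:
    label: "{{ item.item }}"
  register: reindex_status
  until: reindex_status.finished
  retries: 180   # 180 x 10 s = 1800 s, matching the async limit
  delay: 10
```

Because all jobs are started before any is awaited, the total wait is roughly the duration of the slowest job rather than the sum of all of them.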
3. Dynamic Inventory with Caching
Static inventory files become bottlenecks in dynamic cloud environments. The Dynamic Inventory pattern leverages plugins to fetch host data from cloud providers while caching results to minimize API calls.
Implementation Strategy:
- Configure ansible.cfg to use inventory plugins (e.g., aws_ec2, azure_rm).
- Enable caching with a reasonable TTL to balance freshness and API rate limits.
- Use compose to derive Ansible variables from cloud metadata.
Configuration Snippet (ansible.cfg):
[defaults]
inventory = ./inventories/aws_ec2.yml

[inventory]
enable_plugins = amazon.aws.aws_ec2
# Inventory caching is configured in this section, not under [defaults]
cache = True
cache_plugin = ansible.builtin.jsonfile
cache_connection = ~/.ansible/cache
cache_timeout = 3600
Inventory File (inventories/aws_ec2.yml):
plugin: amazon.aws.aws_ec2
regions:
  - us-east-1
filters:
  tag:Environment: production
keyed_groups:
  - key: tags.Role
    prefix: role
compose:
  ansible_host: public_ip_address
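With keyed_groups configured as above, each distinct Role tag value becomes a group named role_<value>, which a play can target directly. A minimal sketch, assuming some instances carry the tag Role=web (the tag value and the file name site.yml are assumptions):

```yaml
# site.yml (illustrative; assumes instances tagged Role=web)
- name: Configure the web tier discovered by the inventory plugin
  hosts: role_web
  gather_facts: false
  tasks:
    - name: Show the address Ansible will connect to
      ansible.builtin.debug:
        msg: "{{ inventory_hostname }} -> {{ ansible_host }}"
```

Running ansible-inventory -i inventories/aws_ec2.yml --graph first is a quick way to confirm which groups the plugin actually generated.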
4. Execution Environment (EE) Isolation
Dependency management is a critical failure point. The EE pattern packages Ansible, collections, and system dependencies into containerized images, ensuring deterministic execution across control nodes.
Implementation Strategy:
- Define dependencies in execution-environment.yml.
- Build images using ansible-builder.
- Run playbooks using ansible-runner or Automation Controller targeting the EE image.
Execution Environment Definition:
# execution-environment.yml
version: 3
images:
  base_image:
    name: quay.io/ansible/ansible-runner:stable-2.15-latest
dependencies:
  galaxy: requirements.yml
  python: requirements.txt
  system: bindep.txt
# The version 3 schema has no container_engine option and no
# EE_BUILDER_IMAGE build argument; choose the engine at build time instead:
#   ansible-builder build --container-runtime podman
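The galaxy entry in the definition points at a requirements.yml that lists the collections to bake into the image. A possible sketch follows; the chosen collections and version ranges are assumptions for illustration, not requirements of the pattern.

```yaml
# requirements.yml (illustrative; collections and version ranges are assumptions)
collections:
  - name: amazon.aws
    version: ">=6.0.0,<8.0.0"
  - name: community.hashi_vault
    version: ">=6.0.0"
```

Pinning ranges here keeps rebuilt images from silently picking up a new major version of a collection.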
5. Secret Management Integration
Hardcoding secrets is a security anti-pattern. The integration pattern ensures secrets are resolved at runtime using Ansible Vault or external secret managers.
Implementation Strategy:
- Encrypt sensitive variables using ansible-vault.
- Use lookup plugins to fetch secrets from HashiCorp Vault or AWS Secrets Manager during runtime.
- Implement no_log: true on tasks handling sensitive data.
Code Example:
# tasks/configure.yml
- name: Fetch database credentials from Vault
  ansible.builtin.set_fact:
    # The field is selected with ":" after the secret path
    db_password: "{{ lookup('community.hashi_vault.hashi_vault', 'secret=data/db:password') }}"
  no_log: true

- name: Configure application with vault secret
  ansible.builtin.template:
    src: app.conf.j2
    dest: /etc/myapp/app.conf
    owner: root
    group: root
    mode: '0600'
  no_log: true
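For the ansible-vault route, a common layout keeps a readable variables file that points at vault-prefixed names stored in an encrypted file. The sketch below assumes a production group and a vault_ naming prefix; both are conventions chosen for illustration, and the secret value is a placeholder.

```yaml
---
# group_vars/production/vars.yml (plain text, safe to grep and review)
db_user: myapp
db_password: "{{ vault_db_password }}"
---
# group_vars/production/vault.yml (shown decrypted; encrypt it before committing)
# The value below is a placeholder, not a real secret.
vault_db_password: "CHANGE_ME"
```

The second file would be encrypted with ansible-vault encrypt group_vars/production/vault.yml, so reviewers can still see which variables exist without seeing their values.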
Pitfall Guide
1. Misusing shell and command Modules
Mistake: Using shell for tasks that have dedicated modules (e.g., shell: apt-get install nginx).
Impact: Breaks idempotency, reduces readability, and increases failure rates.
Best Practice: Always prefer native modules. Use shell only when no module exists, and implement guards.
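As a sketch of the preferred form, the apt-get call from the mistake above can be replaced with the dedicated module, which reports changed only when the package was actually installed:

```yaml
# Instead of: shell: apt-get install nginx
- name: Install nginx with the dedicated module
  ansible.builtin.apt:
    name: nginx
    state: present
    update_cache: true
    cache_valid_time: 3600   # skip the cache refresh if one ran within the last hour
  become: true
```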
2. Ignoring Variable Precedence
Mistake: Defining variables in multiple locations without understanding the precedence order (e.g., role defaults vs. host vars vs. extra vars).
Impact: Unexpected configuration values and debugging nightmares.
Best Practice: Document variable sources. Use group_vars and host_vars hierarchically. Prefer set_fact for derived values over implicit overrides.
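To make the precedence order concrete, here is a small sketch with one variable defined at two levels. The variable name myapp_port and the group name production are hypothetical.

```yaml
---
# roles/myapp/defaults/main.yml (lowest precedence: meant to be overridden)
myapp_port: 8080
---
# group_vars/production.yml (overrides the role default for hosts in this group)
myapp_port: 8443
```

Hosts in the production group would resolve myapp_port to 8443, and passing -e myapp_port=9443 on the command line would override both, since extra vars have the highest precedence.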
3. Overusing delegate_to
Mistake: Delegating tasks to the control node for every host, creating a bottleneck.
Impact: Control node resource exhaustion and slow execution.
Best Practice: Use run_once for tasks that need to execute once per batch. Use delegate_to sparingly for specific cross-node interactions.
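The difference between the two keywords can be sketched as follows. The load balancer host lb01.example.com and the lb-drain.sh script are assumptions made up for this example.

```yaml
# Illustrative; lb01.example.com and lb-drain.sh are assumptions
- name: Run the schema migration once per batch, not once per host
  ansible.builtin.command:
    cmd: /usr/local/bin/db-migrate.sh
  run_once: true

- name: Drain this host from the load balancer before upgrading it
  ansible.builtin.command:
    cmd: "/usr/local/bin/lb-drain.sh {{ inventory_hostname }}"
  delegate_to: lb01.example.com
```

The first task executes a single time for the whole batch; the second still runs once per host, but on the load balancer rather than on the host being upgraded.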
4. Forgetting check_mode Support
Mistake: Writing tasks that fail or modify state when run with --check.
Impact: Inability to perform dry-runs, reducing confidence in deployments.
Best Practice: Use the ansible_check_mode variable in conditional logic. Ensure modules support check mode or implement custom logic to skip modifications during dry-runs.
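A minimal sketch of both techniques, using the built-in ansible_check_mode variable and the check_mode task keyword. The --version flag on the myapp binary is an assumption.

```yaml
- name: Read installed version (read-only, so run it even under --check)
  ansible.builtin.command:
    cmd: /opt/myapp/bin/myapp --version
  register: app_version
  changed_when: false
  check_mode: false

- name: Run migration only on real runs
  ansible.builtin.command:
    cmd: /usr/local/bin/db-migrate.sh
  when: not ansible_check_mode
```

Setting check_mode: false on the query keeps app_version defined during a dry run, so later tasks that reference it do not fail.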
5. Monolithic Inventory Files
Mistake: Maintaining a single hosts file with thousands of entries.
Impact: Slow inventory parsing, merge conflicts in version control, and lack of segmentation.
Best Practice: Split inventory into environment-specific directories. Use dynamic inventory plugins. Group hosts logically by function and region.
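One way to lay out an environment-specific static inventory is a YAML file per environment with function and region groups. The host names and group names below are placeholders.

```yaml
# inventories/production/hosts.yml (illustrative host and group names)
all:
  children:
    web:
      hosts:
        web01.prod.example.com:
        web02.prod.example.com:
    db:
      hosts:
        db01.prod.example.com:
    us_east_1:
      children:
        web:
        db:
```

A run would then select the environment explicitly, for example ansible-playbook -i inventories/production playbook.yml.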
6. Dependency Hell in Roles
Mistake: Roles with implicit dependencies or version conflicts.
Impact: Playbook failures when roles are reused across projects.
Best Practice: Explicitly declare dependencies in meta/main.yml. Version-pin dependencies. Use Collections to namespace roles and avoid collisions.
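Explicit, pinned dependencies might look like the sketch below. The common role, its Git URL, and the version numbers are hypothetical and stand in for whatever the role really depends on.

```yaml
---
# roles/myapp/meta/main.yml (illustrative; the dependent role name is an assumption)
dependencies:
  - role: common
---
# requirements.yml: pin what the project installs
roles:
  - name: common
    src: https://git.example.com/ansible/role-common.git
    scm: git
    version: "1.4.2"
collections:
  - name: community.general
    version: ">=8.0.0,<9.0.0"
```

Installing from this file with ansible-galaxy install -r requirements.yml gives every project that reuses the role the same dependency versions.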
7. Lack of Testing Strategy
Mistake: Deploying playbooks without automated testing.
Impact: Regression bugs and configuration drift in production.
Best Practice: Implement Molecule for role testing. Use ansible-lint in CI pipelines. Test against multiple OS versions and configurations.
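A Molecule scenario typically pairs a converge playbook with a verify playbook. The sketch below assumes the default scenario directory and checks the config file from the earlier template task; a molecule.yml describing the driver and platforms is also needed and is omitted here.

```yaml
---
# molecule/default/converge.yml: apply the role to the test instance
- name: Converge
  hosts: all
  roles:
    - role: myapp
---
# molecule/default/verify.yml: assert the outcome
- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Stat the rendered configuration file
      ansible.builtin.stat:
        path: /etc/myapp/app.conf
      register: app_conf

    - name: Config file exists with restrictive permissions
      ansible.builtin.assert:
        that:
          - app_conf.stat.exists
          - app_conf.stat.mode == '0600'
```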
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| Small Team / MVP | Modular Roles with Vault | Balances structure with speed; Vault secures secrets without external dependencies. | Low (Developer time) |
| Multi-Cloud Enterprise | Dynamic Inventory + EE | Handles dynamic host discovery; EE ensures consistent execution across diverse environments. | Medium (Infrastructure setup) |
| High Compliance | Collections + Automation Controller | Collections provide versioned, auditable content; Controller offers RBAC and audit logging. | High (License + Setup) |
| CI/CD Integration | ansible-runner + GitOps | Enables lightweight execution in pipelines; supports ephemeral environments. | Low (Tooling integration) |
| Legacy Refactor | Incremental Role Extraction | Reduces risk by refactoring piecemeal; allows parallel execution improvements. | Medium (Engineering effort) |
Configuration Template
ansible.cfg for Production:
[defaults]
inventory = ./inventories
roles_path = ./roles
collections_path = ./collections
remote_tmp = /tmp/.ansible-${USER}/tmp
local_tmp = /tmp/.ansible-${USER}/tmp
forks = 50
timeout = 30
log_path = /var/log/ansible/ansible.log
vault_password_file = ~/.ansible/vault_pass
fact_caching = jsonfile
fact_caching_connection = /tmp/ansible_facts_cache
fact_caching_timeout = 3600
[privilege_escalation]
become = True
become_method = sudo
become_user = root
become_ask_pass = False
[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
control_path_dir = /tmp/.ansible/cp
Role Directory Structure:
roles/
└── myapp/
    ├── tasks/
    │   ├── main.yml
    │   └── install.yml
    ├── handlers/
    │   └── main.yml
    ├── templates/
    │   └── app.conf.j2
    ├── files/
    │   └── binary.tar.gz
    ├── vars/
    │   └── main.yml
    ├── defaults/
    │   └── main.yml
    ├── meta/
    │   └── main.yml
    └── tests/
        ├── inventory
        └── test.yml
Quick Start Guide
1. Initialize Project:
   mkdir ansible-project && cd ansible-project
   ansible-galaxy init roles/myapp
   touch ansible.cfg playbook.yml
2. Configure ansible.cfg:
   Copy the production template above, adjusting paths and vault settings to match your environment.
3. Scaffold the Role:
   Edit roles/myapp/tasks/main.yml to include modular task files. Define defaults in roles/myapp/defaults/main.yml.
4. Write the Playbook:
   # playbook.yml
   - hosts: all
     roles:
       - role: myapp
         tags: ['myapp']
5. Validate and Run:
   # Lint and check syntax
   ansible-lint .
   ansible-playbook playbook.yml --syntax-check
   # Dry run
   ansible-playbook playbook.yml --check --diff
   # Execute
   ansible-playbook playbook.yml
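For the "Scaffold the Role" step, the two files could start out roughly as follows. This is a sketch: install.yml comes from the role directory structure above, while the variable names and values are assumptions.

```yaml
---
# roles/myapp/tasks/main.yml: keep this file as an index of task files
- name: Install the application
  ansible.builtin.import_tasks: install.yml
  tags: [install]
---
# roles/myapp/defaults/main.yml (illustrative; variable names are assumptions)
myapp_version: "1.0.0"
myapp_install_dir: /opt/myapp
```

Keeping main.yml as a list of imports makes it easy to add further task files later without growing a single monolithic file.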
By adhering to these patterns and utilizing the production bundle, teams can transform Ansible from a fragile scripting tool into a robust, scalable automation platform capable of managing complex infrastructure with confidence and efficiency.