This Ansible role configures a full observability stack, including: Prometheus Operator, Grafana Agent, Grafana, Prometheus, Alertmanager, Loki, Promtail, Tempo, and OpenTelemetry Collector.
It is designed to configure VMs cloned using the machine
Terraform module, and configured with k3s Ansible role.
It integrates with the ClusterIssuer deployed by default by the cert_manager role for issuing TLS certificates.
The default Prometheus instance should be used for scraping metrics from apps running in the cluster. Prometheus Operator PodMonitor and ServiceMonitor resources should be used to configure scraping of metrics.
This role uses Cert Manager for generating certificates for the OpenTelemetry Operator.
Make sure to install the cert_manager
role first.
Variables
observability_kube_prometheus_stack_chart_version
- Version of prometheus-community/kube-prometheus-stack Helm chart to deploy. Default is45.8.1
.observability_loki_chart_version
- Version of grafana/loki Helm chart to deploy. Default is4.8.0
.observability_promtail_chart_version
- Version of grafana/promtail Helm chart to deploy. Default is6.8.1
.observability_tempo_chart_version
- Version of grafana/tempo Helm chart to deploy. Default is1.0.2
.observability_opentelemetry_operator_chart_version
- Version of OpenTelemetry Operator Helm chart to deploy. Default is0.23.0
.observability_grafana_host
- Host for Grafana. Required.observability_grafana_user
- Username of the Grafana admin. Required.observability_grafana_password
- Password of the Grafana admin. Required.observability_service_monitor_selector
- Additional labels for default Prometheus instance ServiceMonitor selector. Default is{}
.observability_pod_monitor_selector
- Additional labels for default Prometheus instance PodMonitor selector. Default is{}
.
The Ingress for Grafana is annotated to use the letsencrypt
ClusterIssuer configured by cert_manager
role.
Make sure observability_grafana_host
is in the zone (domain) the ClusterIssuer is configured for.
Example Playbook
Given a production group in the Ansible inventory, this playbook installs a single node K3s cluster and configures the monitoring setup on each host:
---
- name: Configure Kubernetes cluster
hosts: production
roles:
- role: lkummer.homelab.k3s
- role: lkummer.homelab.cert_manager
vars:
cert_manager_cloudflare_email: you@example.com
cert_manager_cloudflare_token: REDACTED
cert_manager_cloudflare_zone: example.com
- role: lkummer.homelab.prometheus
vars:
observability_grafana_host: grafana.example.com
observability_grafana_user: admin
observability_grafana_password: REDACTED
It is recommended to use Ansible Vault to encrypt secrets stored in infrastructure repositories.
The default Prometheus instance will scrape PodMonitor and ServiceMonitor resources labelled release: prometheus
.
You may set additional labels to scrape with the observability_service_monitor_selector
and observability_pod_monitor_selector
role variables.
For example, the following PodMonitor will configure the default Prometheus instance to scrape pods for metrics:
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
name: example-podmonitor
labels:
release: prometheus
spec:
selector:
matchLabels:
app.kubernetes.io/name: example
podMetricsEndpoints:
- port: http
path: /metrics
namespaceSelector:
matchNames:
- example