Picture this. You just finished up a change to your homelab and everything seems to be in working order. A couple of hours pass while you’re away from the keyboard, and you get a message from a friend or family member: “Jellyfin isn’t working”. Well, shoot, you just tested it earlier! Turns out you flipped the wrong switch on your reverse proxy and broke off-premises connectivity. There goes your impeccable uptime percentage. The horror!
Well, maybe that doesn’t really happen too often, and it’s not like you’re giving your aunt an uptime SLA. But it would be nice to know about breakage as soon as it happens, ideally before a user notices.
A game of ping and pong
Gatus is a simple, read-only dashboard that monitors resources you define in a YAML configuration file. It can monitor website uptime, API endpoints, microservices, and infrastructure over HTTP/HTTPS, DNS, TCP, and ICMP. It also supports sending alerts to a multitude of services, such as Discord, Telegram, and Slack. On top of that, Gatus’ “Suites” is an up-and-coming feature that enables extensive workflow testing, like authentication flows and CRUD sequences.
Those are a lot of selling points, but in all honesty, this guide will only cover the basics of Gatus configuration. The focus is more on deployment automation with Ansible, which I’m also eager to cover in my upcoming Komodo installation guide.
The beauty of using Ansible lies in the fact that you can run this playbook against any cloud VPS, as many times as you want, and you’ll always end up with the same result: Gatus monitoring your services. There’s no need to fear a manual restoration after Oracle Cloud shuts down your free tier instance, and you can easily move between providers if you want to jump ship. With that aside, here’s what you’ll need before we get started.
You can run the Ansible playbook manually from any Linux host or Ansible docker container, but using Forgejo and Renovate gives a higher level of automation that really makes it shine. Only the first two are required for a basic setup.
- A cloud VPS with Docker (I’m using the Oracle Cloud Free Tier)
- Some external-facing services to monitor
- Forgejo and Forgejo Actions (Nick Cunningham has a great guide)
- Renovate Bot (covered in another one of Nick’s guides)
Setting up
Ensure you have the following directory structure in your homelab Git repository. The playbook install_gatus.yaml will define the steps Ansible takes to install Gatus on hosts defined in the inventory file. It will also import the configuration template, gatus_config.yaml, to said hosts. Forgejo will run the playbook via the run_gatus_playbook.yaml workflow, and Renovate Bot will watch Gatus’ version, defined in .renovaterc.json.
Ansible requires an SSH key to log in to the target host. If you’re using Forgejo Actions to run the playbook, make a new SSH keypair and copy the contents of the private key into a Forgejo Secret called ANSIBLE_SSH_KEY. Append the contents of the public key to ~/.ssh/authorized_keys on your target host.
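If you haven’t generated a dedicated keypair before, here’s a minimal sketch of how it could look. The file name ansible_key, the key comment, and the scratch directory are my own placeholders rather than anything the workflow requires, and ed25519 is just a reasonable default.

```shell
# Work in a scratch directory so no existing key gets overwritten
key_dir=$(mktemp -d)

# -N "" creates the key without a passphrase so the runner can use it unattended
ssh-keygen -q -t ed25519 -N "" -C "forgejo-ansible" -f "$key_dir/ansible_key"

# Private key: paste the contents of this file into the ANSIBLE_SSH_KEY secret
ls "$key_dir/ansible_key"

# Public key: append this single line to ~/.ssh/authorized_keys on the target host
cat "$key_dir/ansible_key.pub"
```

If password login is still enabled on the VPS, ssh-copy-id with the -i option pointing at the .pub file is one way to place the public key there instead of pasting it by hand.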
Directory structure
mkdir -p ansible/{inventory,playbooks/templates} .forgejo/workflows
touch ansible/{inventory/inventory,playbooks/{install_gatus.yaml,templates/gatus_config.yaml}} .forgejo/workflows/run_gatus_playbook.yaml .renovaterc.json
.
├── ansible/
│   ├── inventory/
│   │   └── inventory
│   └── playbooks/
│       ├── install_gatus.yaml
│       └── templates/
│           └── gatus_config.yaml
├── .forgejo/
│   └── workflows/
│       └── run_gatus_playbook.yaml
└── .renovaterc.json
inventory
Replace vps.cloud with the IP address or domain name of your VPS. The value of ansible_user is the username Ansible will use to log in via SSH.
[cloud]
vps.cloud ansible_user=ubuntu
Gatus Configuration
The example configuration for Gatus is just a snippet of what can be done with this software. For a broader view, I’d suggest browsing their docs.
Best practice dictates that you shouldn’t commit tokens to source control. Personally, I consider leaked credentials for these alert endpoints a minor inconvenience at worst. Make an informed decision, based on your own threat model, about whether to keep these tokens out of the repository.
ui:
  title: "Status Page"
alerting:
  discord:
    webhook-url: "https://discord.com/api/webhooks/**********/**********"
  telegram:
    token: "1234567890:AABBCCDDEEFFGGHHIIJJKKLLMMNNOOPPQQR"
    id: "000000000"
endpoints:
  - name: Service 1
    url: https://service1.domain.tld
    interval: 60s
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: discord
        description: "Service 1 is down!"
        send-on-resolved: true
  - name: Service 2
    url: https://service2.domain.tld
    interval: 2m
    conditions:
      - "[STATUS] == 200"
    alerts:
      - type: telegram
        send-on-resolved: true
Ansible Playbook
The following playbook is idempotent, meaning it can be run as many times as you want and always achieve the same result. It will target the hosts defined in the cloud group in the above inventory. I’ve set gather_facts to false because we’re not using any of them and it saves a bit of run time. I’ve also set the gatus_version variable, which is what Renovate Bot will watch and update.
First, the playbook ensures Gatus is stopped but ignores any errors like if Gatus is not running already. Then, it creates the required directories and plants Gatus’ configuration file there. After that, it deploys Gatus with a docker run command and checks to see if Gatus has launched successfully. To decrease the fragility of the playbook, I considered using the Docker community collection instead of the brittle shell module, but I prefer to stick to the builtin collection.
I only require alerts on failure, so I’ve disabled the external dashboard. If you want to have a status page, remove 127.0.0.1: from the docker run command and consider setting up Caddy as a reverse proxy on the VPS.
install_gatus.yaml
---
- hosts: cloud
  gather_facts: false
  vars:
    gatus_version: "v5.29.0"
  tasks:
    - name: Ensure Gatus is stopped
      shell: docker stop gatus
      ignore_errors: true
    - name: Create Gatus structure
      file:
        path: $HOME/docker/gatus/config
        state: directory
    - name: Configure Gatus
      template:
        src: "gatus_config.yaml"
        dest: $HOME/docker/gatus/config/config.yaml
    - name: Run Gatus
      shell: "docker run -d --rm -p 127.0.0.1:50475:8080 --mount type=bind,source=$HOME/docker/gatus/config/config.yaml,target=/config/config.yaml --name gatus twinproduction/gatus:{{ gatus_version }}"
    - name: Check Gatus
      uri:
        url: "http://localhost:50475/health"
        status_code: 200
      register: gatus_health
      # Some older Ansible releases only honor retries when until is set
      until: gatus_health is succeeded
      retries: 5
      delay: 5
At this point, you have everything you need to install Gatus on a cloud node using Ansible. You can run the playbook manually from a Linux host or Ansible docker container, but if you want to automate updates, keep reading!
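For reference, a manual run could look something like the sketch below, started from the repository root. The helper name run_gatus_playbook, the DRY_RUN switch, and the default key path ~/.ssh/ansible_key are assumptions of mine; only the ansible-playbook options mirror the ones used in this guide.

```shell
# Hypothetical helper: prints the command when DRY_RUN=1, otherwise runs it.
run_gatus_playbook() {
  # First argument is the private key path; the default is an assumed location
  key="${1:-$HOME/.ssh/ansible_key}"

  # Paths are relative to the repository root shown in the directory structure
  set -- ansible-playbook --private-key "$key" \
    -i ansible/inventory ansible/playbooks/install_gatus.yaml

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Preview the command without touching the VPS
DRY_RUN=1 run_gatus_playbook
```

Calling run_gatus_playbook without DRY_RUN executes the playbook for real, so the inventory and key need to be in place first.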
Forgejo Workflow
In order to update Gatus automatically, we have to pair Forgejo Actions and Renovate Bot together. Once a new version of Gatus releases, Renovate Bot will open a Pull Request to replace the old version variable with the new one. Once merged, Forgejo will notice a change and run the playbook.
Here’s a breakdown of the following workflow.
- Run the playbook on any changes to install_gatus.yaml
- Install Node.js (required for actions/checkout)
- Checkout the code
- Copy the SSH key from Forgejo Secrets to the runner
- Run the playbook from the ansible directory
It’s necessary to disable StrictHostKeyChecking when running the playbook because it will always be the container’s first time logging in via SSH to the target host. Another more secure method is to import a predefined known_hosts file to the runner.
name: Install Gatus on a target host
on:
  push:
    branches:
      - 'main'
    paths:
      - '**/install_gatus.yaml'
jobs:
  run-playbooks:
    runs-on: your-runner # CHANGE ME
    container:
      image: alpine/ansible:latest
    steps:
      - name: Install Node.js
        run: apk add --update nodejs
      - name: Checkout code
        uses: https://code.forgejo.org/actions/checkout@v5
      - name: Setup credentials
        run: |
          mkdir -p $HOME/.ssh/
          echo "${{ secrets.ANSIBLE_SSH_KEY }}" > $HOME/.ssh/ansible_key
          chmod 600 $HOME/.ssh/ansible_key
      - name: Run Ansible playbook
        run: |
          cd ${{ forgejo.workspace }}/ansible
          ansible-playbook --private-key $HOME/.ssh/ansible_key \
            -i inventory playbooks/install_gatus.yaml \
            --ssh-extra-args="-o StrictHostKeyChecking=no"
Renovate Bot
Now, we can set up Renovate Bot to watch for new versions based on the version we define in the playbook variable. Using regex, matchStrings will do just that and pair it to the Docker source. Simply add this snippet to your Renovate configuration file, and Renovate Bot will start monitoring Gatus’ version.
"customManagers": [
  {
    "customType": "regex",
    "managerFilePatterns": [
      "ansible/playbooks/install_gatus.yaml"
    ],
    "matchStrings": [
      "gatus_version: [\"']v?(?<currentValue>.+?)[\"']"
    ],
    "datasourceTemplate": "docker",
    "depNameTemplate": "twinproduction/gatus"
  }
]
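Before trusting the pattern, it can be worth checking what it captures. The sketch below is only an approximation: sed’s extended regex has no named groups or lazy quantifiers, so the capture is written as a plain group that starts at the first digit. The sample line is copied from the playbook, and the variable names are placeholders.

```shell
# Sample line as it appears in install_gatus.yaml
line='gatus_version: "v5.29.0"'

# Approximate the matchStrings pattern: skip the optional "v", capture up to the closing quote
current_value=$(printf '%s\n' "$line" \
  | sed -nE "s/.*gatus_version: [\"']v?([0-9][^\"']*)[\"'].*/\1/p")

echo "currentValue=$current_value"
```

An empty result here would suggest the version line in the playbook has drifted from what the pattern expects, for example if the quotes were dropped.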
Afterword
Funnily enough, this is the only thing I’ve set up that I never want to see in action. That dastardly notification can mean anything from a one-minute fix to a weekend-long disaster recovery. I hope this guide has helped you set up uptime monitoring for your sites, and that you rarely see the alert!
