Backup/Restore
Admin path: System > Backup/Restore (view_system_backup.cfm).
CLI-only by design. Backup and restore run from the Docker host's shell, not from the admin console.
The admin console's Backup/Restore page is a read-only info surface (CLI examples + a list of backups detected on disk + a link back to this doc). There are no buttons. Combining long-running operations with a web UI is a known footgun (page reload kills progress, browser timeouts, race conditions); the CLI is the canonical interface.
What ships in this release
Two scripts under scripts/:
system_backup.sh
Hot mode by default — zero application downtime. Uses application-native hot-backup primitives: mariadb-dump --single-transaction, slapcat, and live tar of mail tiers (Dovecot, Amavis, Postfix all use atomic-rename writes safe for live tar). Toggles occ maintenance:mode --on briefly during Nextcloud file tar to pause NC user writes (mail flow unaffected). --cold flag stops the full stack for legal-hold / forensic snapshots that need absolute byte-level consistency.
system_restore.sh
Always cold on the restore side (we're overwriting tier contents — concurrent reads/writes would corrupt). Verifies the manifest + per-archive SHA256 BEFORE any destructive action, refuses on storage-topology mismatch unless FORCE_REMAP=1 is set, restores DBs via socket auth, restores OpenLDAP via slapadd, rsyncs in-scope tiers from staging to mount paths with --delete, restarts the stack.
Backup scopes
The -B flag chooses what to back up. Pick the scope that matches your need — there's no reason to back up 500 GB of vmail every night if only the DBs and configs are churning.
| Scope | What it captures | Typical cadence | Duration and impact |
|---|---|---|---|
| system | Config tier + Data tier + 6 DB dumps + LDAP slapcat | Nightly | seconds to a few minutes (dominated by /mnt/data tar size; DB+LDAP dumps are fast) |
| archive | Archive tier (Amavis quarantine) | Weekly or per retention policy | proportional to archive size; mail intake continues uninterrupted |
| vmail | Vmail tier (Dovecot mailboxes) | Weekly | proportional to mailbox size; mail flow continues uninterrupted |
| nextcloud | Nextcloud tier (NC files) | Weekly | proportional to NC file size; NC web UI shows "under maintenance" during the tar; mail unaffected |
| all | Everything above | Periodic full-DR snapshot | sum of all of the above |
Hot-mode safety per component
Why we don't need downtime:
| Component | Hot-backup method | Why it is safe |
|---|---|---|
| MariaDB | mariadb-dump --single-transaction --routines --triggers --events --databases <db> | InnoDB MVCC gives a consistent point-in-time snapshot. No table locks. Stored procedures, triggers, and scheduled events captured. |
| OpenLDAP | slapcat -b dc=hermes,dc=local inside hermes_ldap | Standard hot LDIF export. |
| Dovecot (vmail) | tar /mnt/vmail live | maildir/sdbox writes are atomic-rename (write to temp filename, atomic mv to final name). No torn files. Worst case: messages arriving during the tar window may land after the tar's snapshot; they're durable upstream (postfix queue, sender's MX retries) and captured by the next backup. |
| Amavis (archive) | tar /mnt/archive live | Amavis quarantine writes are atomic-rename. Same as Dovecot. |
| Nextcloud (files) | tar /mnt/files live, with occ maintenance:mode --on toggled around the tar | NC writes are atomic, but the filesystem ↔ oc_filecache DB table can drift if a user uploads mid-tar. Maintenance mode pauses NC user writes: the NC web UI shows "under maintenance" briefly, but mail flow is unaffected. Use --no-nc-maintenance to skip the toggle if needed. |
| Postfix (data tier) | tar /mnt/data/postfix live | Postfix queue files are atomic-rename. |
| Service logs (data tier) | tar live | Append-only. A torn last line is cosmetic. |

Hot mode is the daily backup. Cold mode (--cold) is the escape hatch for use cases where absolute byte-level consistency matters more than uptime: legal hold, forensic snapshots, regulatory archive. Cold mode does docker compose stop for the full duration.
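To make the mariadb-dump primitive described above concrete, here is a minimal sketch of a hot dump of a single database. It is not the code system_backup.sh actually runs: dump_db is a hypothetical helper, the hermes_db_server container name is taken from elsewhere in this doc, and the sketch assumes mariadb-dump inside that container can authenticate without extra options (for example via socket auth).

```shell
#!/bin/sh
# Hypothetical helper: hot-dump one database from the running DB container.
# $1 = database name, $2 = existing output directory on the host.
# Assumption: no credentials are needed on the command line (socket auth).
dump_db() {
  docker exec hermes_db_server mariadb-dump \
    --single-transaction --routines --triggers --events \
    --databases "$1" > "$2/$1.sql"
}
```

Because --single-transaction reads from one consistent InnoDB snapshot, the container keeps serving queries while the dump runs. A call such as dump_db mydb /mnt/backups/staging would write mydb.sql into that directory; both the database name and the staging path are placeholders.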
Backup
Backup quick start
sudo /opt/hermes-seg-docker-gl/scripts/system_backup.sh -P /mnt/backups -B system --yes
The script creates /mnt/backups/hermes-backup-system-<build_no>-<UTC-timestamp>.tar. The outer tar is uncompressed (each tier inside is already .tar.gz); operators can tar -xf it once to inspect the manifest before deciding to restore.
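When scripting around the backup directory (off-site copy, reporting), it can help to split a backup filename back into its parts. The sketch below assumes the naming pattern shown above, that the build number contains no hyphen, and that the timestamp looks like 20260601T103000Z as in the restore example later in this doc. parse_backup_name is a hypothetical helper, not part of the Hermes scripts.

```shell
#!/bin/sh
# Hypothetical helper: print scope, build, and timestamp for a backup filename.
# Assumed pattern: hermes-backup-<scope>-<build_no>-<UTC-timestamp>.tar
parse_backup_name() (
  name=$(basename -- "$1")
  parsed=$(printf '%s\n' "$name" | sed -nE \
    's/^hermes-backup-(system|archive|vmail|nextcloud|all)-([^-]+)-([0-9]{8}T[0-9]{6}Z)\.tar$/scope=\1 build=\2 timestamp=\3/p')
  if [ -z "$parsed" ]; then
    echo "not a Hermes backup filename: $name" >&2
    exit 1
  fi
  printf '%s\n' "$parsed"
)
```

The function returns non-zero for any file that does not match, so it can double as a filter when looping over a directory that also holds unrelated files.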
Output layout
Inside the outer .tar (only the archives relevant to the chosen scope are present):
backup_manifest.json   ← scope, mode (hot/cold), topology, SHA256 per archive
databases.tar.gz       ← 6 .sql files; system / all scopes only
ldap.ldif.gz           ← slapcat output; system / all scopes only
config.tar.gz          ← install root MINUS data tiers
                         (excludes install-logs/ and .git/);
                         system / all scopes only
data.tar.gz            ← Data tier; system / all scopes only
                         (excludes mysql/ ldap/ clamav/: captured
                         authoritatively by dumps / slapcat / are
                         regenerable)
archive.tar.gz         ← Archive tier; archive / all scopes only
vmail.tar.gz           ← Vmail tier; vmail / all scopes only
nextcloud.tar.gz       ← Nextcloud tier; nextcloud / all scopes only
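The layout above can be turned into a quick sanity check on an outer tarball before it is shipped off-site. This is a sketch under stated assumptions: expected_members and check_members are hypothetical helpers, the member list is copied from the layout above, and the manifest is assumed to be present for every scope. It only checks that the expected members exist; it does not verify checksums, which system_restore.sh does itself.

```shell
#!/bin/sh
# Hypothetical helper: print the members the layout above lists for a scope.
expected_members() {
  core="databases.tar.gz ldap.ldif.gz config.tar.gz data.tar.gz"
  case "$1" in
    system)    echo "backup_manifest.json $core" ;;
    archive)   echo "backup_manifest.json archive.tar.gz" ;;
    vmail)     echo "backup_manifest.json vmail.tar.gz" ;;
    nextcloud) echo "backup_manifest.json nextcloud.tar.gz" ;;
    all)       echo "backup_manifest.json $core archive.tar.gz vmail.tar.gz nextcloud.tar.gz" ;;
    *)         echo "unknown scope: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper: check_members <outer.tar> <scope>
# Exits non-zero if any expected member is missing from the outer tar.
check_members() (
  members=$(expected_members "$2") || exit 1
  listing=$(tar -tf "$1") || exit 1
  status=0
  for member in $members; do
    # Member names may carry a leading "./" depending on how the tar was built.
    if ! printf '%s\n' "$listing" | grep -qxE "(\./)?$member"; then
      echo "missing: $member" >&2
      status=1
    fi
  done
  exit "$status"
)
```

Listing with tar -tf is cheap because the outer tar is uncompressed, so the check reads only headers rather than unpacking the tiers.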
Backup flags
-P <path>
Required. Output directory. Must exist and be writable.
-B <scope>
Required. One of: system, archive, vmail, nextcloud, all.
--cold
Stop the full stack for the duration of the backup. Use for legal-hold / forensic snapshots. Default is HOT mode (zero application downtime).
--no-nc-maintenance
Skip the brief occ maintenance:mode --on that hot-mode nextcloud / all backups use to pause NC user writes during the file tar. Without it, file uploads happening mid-tar may be missed by the backup.
--yes (or -y)
Skip the interactive confirmation prompt. Use for cron / Ofelia.
--dry-run (or -n)
Print what would happen without changing anything.
--help (or -h)
Show usage.
Scheduling
For nightly automated backups, use host cron on the Docker host. system_backup.sh is a host-level script (it runs docker compose stop, reads .env from the host, writes to /mnt/backups on the host) — host cron is the natural fit. Example /etc/cron.d/hermes-backup:
# m h dom mon dow user command
0 3 * * * root /opt/hermes-seg-docker-gl/scripts/system_backup.sh -P /mnt/backups -B system --yes >> /var/log/hermes-backup.log 2>&1
0 4 * * 0 root /opt/hermes-seg-docker-gl/scripts/system_backup.sh -P /mnt/backups -B vmail --yes >> /var/log/hermes-backup.log 2>&1
0 5 1 * * root /opt/hermes-seg-docker-gl/scripts/system_backup.sh -P /mnt/backups -B all --yes >> /var/log/hermes-backup.log 2>&1
A typical cadence:
| Cadence | Scope | Notes |
|---|---|---|
| Nightly | system | Small + fast. Captures DBs, LDAP, configs, install-root state. Run with hot mode = zero downtime. |
| Weekly | vmail (or archive or nextcloud, rotated) | Larger but slower-changing. |
| Monthly | all | Full disaster-recovery snapshot. |
The script's exit code reflects success (0) or failure (non-zero). For built-in email alerting, use the --notify-email=ADDR flag (see below). For "Hermes is so dead it can't even tell you" cases, see External monitoring.
Why host cron and not Ofelia? Ofelia runs as a container (hermes_ofelia). Its job model (job-exec into a named container, job-local on the Ofelia container itself) doesn't fit system_backup.sh cleanly: the script needs host-level docker compose access, root, and write access to /mnt/backups. Ofelia's image lacks the docker compose plugin and root host access. Native Ofelia integration is deliberately NOT on the roadmap; the existing System > Scheduled Tasks admin page lists Ofelia jobs but does NOT support adding new ones from the UI today.
Failure / success email alerting
Use --notify-email=ADDR to receive an email on backup completion. By default the script emails on failure only (the "noisy on failure, silent on success" pattern most operators want). Add --notify-on-success to also email on success, which is useful for "daily I-am-alive confirmation" use cases.
# Email on failure only (typical)
sudo /opt/hermes-seg-docker-gl/scripts/system_backup.sh -P /mnt/backups -B system --yes \
--notify-email=admin@example.com
# Email on both failure AND success
sudo /opt/hermes-seg-docker-gl/scripts/system_backup.sh -P /mnt/backups -B all --yes \
--notify-email=admin@example.com --notify-on-success
Subject lines are bracketed for easy scanning in a mail client:
Success: [SUCCESS] Hermes backup on <hostname> (scope=<scope>)
Failure: [FAILURE] Hermes backup on <hostname> (scope=<scope>)
Failure bodies include the timestamp, scope, mode, reason, log file path, and the last 50 lines of the log. Success bodies include the timestamp, scope, mode, output filename, file size, and run duration.
How it works: the script shells out to docker exec -i hermes_postfix_dkim sendmail -t and pipes the message into the Postfix container's sendmail binary. Postfix queues and delivers it like any other outbound mail from Hermes. No host MTA configuration is needed — Hermes's own Postfix does the work.
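The mechanism described above can be reproduced by hand when debugging the notification path. The sketch below is an illustration, not the script's actual code: build_notice is a hypothetical helper that prints a minimal message in the form sendmail -t expects (headers, a blank line, then the body), using the subject format documented above.

```shell
#!/bin/sh
# Hypothetical helper: build_notice <to> <SUCCESS|FAILURE> <scope> <body>
# Prints a minimal message; sendmail -t reads the recipient from the To: header.
build_notice() {
  printf 'To: %s\n' "$1"
  printf 'Subject: [%s] Hermes backup on %s (scope=%s)\n' "$2" "$(uname -n)" "$3"
  printf '\n%s\n' "$4"
}

# Usage (not executed here): pipe the message into the Postfix container.
#   build_notice admin@example.com FAILURE system "manual test" \
#     | docker exec -i hermes_postfix_dkim sendmail -t
```

The -i flag on docker exec matters: without it the container's sendmail receives no stdin and has no message to queue.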
Verify the path before wiring into cron — --test-notify sends one [TEST] [SUCCESS] sample and one [TEST] [FAILURE] sample to the address you give, then exits without running a backup:
sudo /opt/hermes-seg-docker-gl/scripts/system_backup.sh --test-notify \
--notify-email=admin@example.com
Both test messages have a [TEST] prefix in the subject so any ops-alert filters watching for [FAILURE] are not tripped. If both arrive, your notification path is good. If neither arrives, check hermes_postfix_dkim is running and look at the log file the script prints for sendmail errors.
Caveat — needs Hermes to be at least partially healthy: if the failure cause is "the Postfix container is down" or "the Docker daemon is down", docker exec has nothing to talk to and the email won't go out. The script logs the failure-to-notify as a warning and exits with the original non-zero status, but you won't get the email. This is the gap external monitoring fills — see below.
External monitoring (strongly recommended)
Built-in email alerting covers the "backup ran but something went wrong" case (the 99% case). It does NOT cover "Hermes itself is so broken it can't send any email at all" — Docker daemon crashed, host out of disk, container restart loop, network partition, etc. For that, you need an external monitoring tool that lives off the Hermes host and tells YOU when Hermes goes dark.
Strongly recommended for every production install. Common choices:
The healthchecks.io pattern works nicely alongside cron-based backups:
# Pings healthchecks.io on success only (curl runs after the backup via &&; the URL is your check's ping URL).
# cron does not support backslash line continuation, so the entry must stay on one line.
0 3 * * * root /opt/.../system_backup.sh -P /mnt/backups -B system --yes --notify-email=admin@example.com && curl -fsS --retry 3 https://hc-ping.com/<your-uuid> >/dev/null
If the backup fails, the --notify-email sends the failure email (assuming Postfix is up). If the backup succeeds, healthchecks.io gets the ping. If the WHOLE HOST is down (no ping, no email), healthchecks.io alerts you after the scheduled interval. Three-layer coverage with minimal moving parts.
Off-site copy
system_backup.sh writes to the local -P path only. Off-site copy is left to your existing tooling — rclone, rsync to remote storage, aws s3 cp, restic, whatever you already use. Typical pattern:
sudo /opt/hermes-seg-docker-gl/scripts/system_backup.sh -P /mnt/backups -B system --yes \
&& rclone sync /mnt/backups remote:hermes-backups/
Restore
Restore quick start
sudo /opt/hermes-seg-docker-gl/scripts/system_restore.sh -F /mnt/backups/hermes-backup-system-v260119-20260601T103000Z.tar
The restore replaces the data in the backup's scope and leaves other scopes alone. Restoring a system backup overwrites the install root + Data tier + DBs + LDAP; the Vmail / Archive / Nextcloud tiers are untouched. Restoring a vmail backup overwrites only /mnt/vmail. The stack is stopped for the duration of the restore (always — even hot-mode backups are restored cold).
Safety: SHA256, version, and topology gates
Three gates fire BEFORE any destructive action:
- SHA256: the manifest and each archive's SHA256 checksum are verified.
- Version: build_no (captured at backup time from system_settings.build_no) is compared against the current host's build_no. If they differ, restore refuses unless FORCE_VERSION_MISMATCH=1 is set. Schema migrations between Hermes builds make cross-version restore unsafe: restoring a v260119 DB dump onto a v260201 host leaves the schema in a state the running code does not expect, which breaks silently when something hits a missing or renamed column. The correct procedure is to install Hermes at the matching build first (git checkout <build>), restore, then upgrade forward via scripts/system_update_docker.sh.
- Topology: if the tier paths recorded in the backup (/mnt/data, /mnt/vmail, etc.) don't match this host's current mount paths from .env, restore refuses.
To restore a backup onto a host with a different storage topology (e.g., a backup from a 5-tier-split host restored onto a single-mount host where everything is on /mnt/data), set FORCE_REMAP=1:
sudo FORCE_REMAP=1 /opt/hermes-seg-docker-gl/scripts/system_restore.sh -F /path/to/backup.tar
FORCE_REMAP=1 is all-or-nothing in Phase A. A per-tier --remap-tiers flag will land in Phase B.
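The version and topology gates follow the same shape: compare a value recorded in the backup with the host's current value, and refuse unless they match or the operator has set the corresponding override. The sketch below illustrates that logic only. version_gate and topology_gate are hypothetical names, and how system_restore.sh actually reads and compares these values is not shown in this doc.

```shell
#!/bin/sh
# Hypothetical helper: version_gate <backup_build_no> <host_build_no>
version_gate() {
  if [ "$1" = "$2" ]; then return 0; fi
  if [ "${FORCE_VERSION_MISMATCH:-0}" = "1" ]; then
    echo "warning: build mismatch ($1 vs $2) overridden" >&2
    return 0
  fi
  echo "refusing: backup build $1 does not match host build $2" >&2
  return 1
}

# Hypothetical helper: topology_gate <backup_mount_paths> <host_mount_paths>
# Each argument is a single string listing the tier mount paths.
topology_gate() {
  if [ "$1" = "$2" ]; then return 0; fi
  if [ "${FORCE_REMAP:-0}" = "1" ]; then
    echo "warning: topology mismatch overridden by FORCE_REMAP=1" >&2
    return 0
  fi
  echo "refusing: backup topology does not match this host" >&2
  return 1
}
```

Both checks run before anything is written, which is why a refused restore leaves the host untouched.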
Disaster-recovery flow (different host)
1. Install Hermes on the new host with install_hermes_docker.sh. The install root + .env need to exist before restore can succeed.
2. scp the backup tarball from off-site storage to the new host.
3. Run system_restore.sh -F /path/to/backup.tar. If the new host's mount paths differ from the original (typical when restoring onto different hardware), prefix with FORCE_REMAP=1.
4. Verify that you can log in to the admin console.
A cross-host restore needs more than the restore itself. The restored data carries the source host's identity and credentials, so several things must be reconciled by hand: run system_rehost.sh, re-activate the Pro license, and re-save the Content Checks pages to re-apply the milter chain. Follow the full checklist: Post-Restore Steps.
Restore flags
-F <path>
Required. Path to the backup tarball produced by system_backup.sh.
--yes (or -y)
Skip the interactive confirmation prompt.
--dry-run (or -n)
Show what would happen without changing anything.
--help (or -h)
Show usage.
FORCE_REMAP=1 (env)
Required to proceed past the topology-mismatch refusal.
When to use hypervisor snapshots instead
The cold-mode escape hatch (--cold) covers the byte-level-consistency use cases that the Hermes scripts can satisfy. For two other cases, hypervisor / VM snapshots are the right tool, not the Hermes scripts:
- Pre-upgrade rollback point. Take a hypervisor snapshot before running system_update_docker.sh; that gives you a working rollback if the upgrade fails mid-flight. The methodology doc codifies this.
- Zero-downtime full-host snapshot. If you want a single consistent point-in-time image of the entire Hermes host, take a hypervisor snapshot.
Per-hypervisor snapshot mechanisms:
| Platform | Snapshot mechanism |
|---|---|
| Proxmox VE | Datacenter > Backup, or Snapshot from the VM's right-click menu |
| VMware vSphere / ESXi | VM > Snapshots > Take Snapshot |
| KVM / libvirt | virsh snapshot-create-as <domain> <name> --disk-only --atomic |
| AWS EC2 | EBS volume snapshot (or an AMI) |
| Azure VMs | Disk snapshot, or Recovery Services Vault |
| Google Compute Engine | Disk snapshot |
| Hyper-V | Checkpoint |
A whole-VM snapshot captures every storage tier, every database, every container's state, and the Docker daemon's own metadata in one consistent point-in-time image. Restoration is your hypervisor's standard "revert to snapshot" workflow; no Hermes-specific tooling is needed.
What you should NOT do
Do NOT run the legacy bare-metal scripts on a Docker host
The legacy (pre-Docker) bare-metal scripts still exist in the repository at config/hermes/opt/hermes/scripts/system_backup.sh and system_restore.sh. They are kept in the repo for reference and for the legacy-to-Docker migration path. Do not run them on a Docker install. Specifically:
- system_restore.sh does cd / && tar -xvzf <backup-file>, which overwrites the /etc/, /opt/, and /var/ directories with files from a layout that does not match the Docker host's reality.
- system_backup.sh has no awareness of the Docker layout: it will not capture the Authelia or Nextcloud databases (which did not exist in the bare-metal era), and it will not correctly stop containers before snapshotting their volumes.
Do NOT tar a running storage tier with tar directly
If for some reason you reach for tar directly instead of system_backup.sh, do NOT tar /mnt/data, /mnt/vmail, /mnt/files, or /mnt/archive while the stack is running without using the hot-backup primitives the script uses. Specifically:
- /mnt/data contains MariaDB's tablespace files; tar'ing them while hermes_db_server is running produces a backup that mariadb-backup or MariaDB itself will reject as inconsistent on restore. Use system_backup.sh (which excludes mysql/ from the data tar and captures DBs via mariadb-dump) instead.
- Without slapcat, a raw tar of data/ldap mid-write captures inconsistent slapd state.
- Without the Nextcloud maintenance-mode toggle, a raw tar of /mnt/files can drift from the oc_filecache table if a user uploads mid-tar.
The Hermes scripts handle all of this correctly. Use them.
Do NOT trust an untested restore procedure
Whatever backup strategy you adopt, practice the restore at least once on a non-production system before you rely on it. Take a backup of your live Hermes host, spin up a second VM, run the restore, verify you can log into the admin console and send a test message. A backup procedure that has never been restored from is not a backup procedure; it is wishful thinking.
What's upcoming in Phase B
The Phase A scripts cover the common cases (hot daily system backup, scoped tier backups, cold-mode forensic snapshot, scope-aware restore). The Phase B refactor (post-Link-Guard) will add:
- --retain-last=N (deletes older backups beyond N)
- Per-tier --remap-tiers <old>:<new> replacing the all-or-nothing FORCE_REMAP=1 env var
- Selective container restart instead of full compose down on the restore side (faster restart, smaller blast radius)
- Filesystem-snapshot integration (LVM / ZFS / btrfs detection), for cases where --cold is too disruptive
Not on the Phase B roadmap (deliberately dropped):
- Native Ofelia integration. Ofelia's job model (job-exec into a named container, job-local on the Ofelia container) doesn't fit a host-level script cleanly. Forcing it would require the docker compose plugin + Docker socket + root access, plus admin-page UI work to add jobs, all to honor a pattern that doesn't fit. Host cron is the natural fit.
Failure / success notification is a separate discussion; see the Scheduling section above. Today the answer is cron's MAILTO= / pipe exit code into existing alerting; if operators ask for native built-in notification, it's a small Phase B addition.
Tracking: #219 for the backup-side enhancements, #220 for the restore-side.
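Until a retention flag ships, pruning is left to the operator. The following is a minimal sketch of one way to keep only the newest N backups of a scope. It is not part of the Hermes scripts: prune_backups is a hypothetical helper, it assumes the hermes-backup-<scope>-*.tar naming used by system_backup.sh, and it orders files by modification time.

```shell
#!/bin/sh
# Hypothetical helper: prune_backups <dir> <scope> <keep>
# Deletes all but the <keep> most recently modified backups of one scope.
prune_backups() (
  dir=$1
  scope=$2
  keep=$3
  cd "$dir" || exit 1
  # ls -t lists newest first; backup names follow a fixed pattern with no
  # whitespace, so line-based parsing is safe here.
  ls -1t hermes-backup-"$scope"-*.tar 2>/dev/null |
    tail -n "+$(($keep + 1))" |
    while IFS= read -r old; do
      echo "removing $old"
      rm -f -- "$old"
    done
)
```

Running it only after a successful backup (for example chained with && in the cron entry) avoids deleting good backups when the newest run has failed. Other scopes in the same directory are left alone because the glob is scope-specific.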
Migrating from a legacy bare-metal install
A separate migration tool exists at scripts/migrate_legacy_to_docker.sh for operators moving from a legacy bare-metal install to the Docker install. It consumes a backup produced by the legacy system_backup.sh (which is correct in the bare-metal context where it ran) and restores it into the Docker layout via a translation step, which is NOT the same as running the legacy restore script directly.
See the migration section of the v260119 release notes for current scope and limitations.
Cross-references
- Storage Topology: the five-tier layout the backup operates on
- Release & Update Methodology: recommends taking a hypervisor snapshot before running system_update_docker.sh
- scripts/migrate_legacy_to_docker.sh: separate from backup/restore; for one-time bare-metal-to-Docker migration only