mergerfs + snapraid
nunks edited this page 2026-03-08 20:46:14 -03:00
Table of Contents
- Summary
- Relevant documentation
- Open questions
- Step by step
- MERGERFS: Creating and mounting the disks...
- Disks
- Creating the directories https://trapexit.github.io/mergerfs/latest/config/branches/#mount-points
- Formatting https://trapexit.github.io/mergerfs/latest/config/branches/#formatting
- Mounting
- After mounting
- Finally, mounting the merged pool
- SNAPRAID: Creating and mounting the disks...
Summary
Relevant documentation
- mergerfs and hardlinks, with the *arr stack in mind
- mergerfs: quickstart
- https://wiki.archlinux.org/title/SnapRAID
- https://zackreed.me/posts/modern-snapraid-maintenance-script/
Open questions
- Which filesystem is best for mergerfs? EXT4? XFS?
- Which filesystem is best for the SnapRAID parity disk?
Step by step
- READ THE DOCUMENTATION
- Install the SATA card and the new 10 TB HDD (hardware)
- Format /mnt/SINKBAK and also the new HDD
- Mount SINKBAK and the new HDD at /mnt/storage/disks/data-drive-1 and data-drive-2
- Install mergerfs and configure it to merge the two data-drive* above, in a mode that preserves hardlinks
- Mount the new merged filesystem at /mnt/storage/data
- Run hardlink tests "simulating" radarr: create a destination directory and a download directory, create a file in the download directory, and hardlink it into the destination directory
- Once everything has been tested satisfactorily, rsync /mnt/SINK to /mnt/storage/data and let mergerfs distribute the files as needed
- Test pointing the *arr stack at the new mounts under /mnt/storage/data/media
- Once everything is working and no further rsync is needed (DO ONE LAST SYNC TO BE SURE), continue with the SnapRAID procedure:
  - Format /mnt/SINK as the SnapRAID parity disk and mount it at /mnt/storage/disks/parity-drive-1
  - Install SnapRAID and configure /data-drive-* and /parity-drive-1 accordingly
  - Run the first SnapRAID sync
  - Schedule future syncs
- The end?...
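The hardlink test from the plan can be sketched as a small bash function. This is an illustrative sketch, not the author's procedure: the directory and file names (hardlink-test, downloads, movies, movie.mkv) are hypothetical, and with no argument it works in a throwaway temporary directory. Pass /mnt/storage/data to test the merged pool itself. A hardlink shares the inode of its source and raises the link count to 2, which is what the function checks with GNU stat.

```shell
#!/usr/bin/env bash
# Sketch of the radarr-style hardlink test. Directory and file names are
# hypothetical; without an argument it runs in a throwaway temp directory.
set -euo pipefail

hardlink_test() {
    local base="${1:-$(mktemp -d)}"
    local root="$base/hardlink-test"
    local src="$root/downloads/movie.mkv"               # "download" location
    local dst="$root/movies/Movie (2024)/movie.mkv"     # "library" location
    local src_inode dst_inode links

    mkdir -p "$(dirname "$src")" "$(dirname "$dst")"
    head -c 1048576 /dev/zero > "$src"                  # 1 MiB dummy file
    ln "$src" "$dst"                                    # hardlink, as radarr would

    # A hardlink shares the source's inode and raises its link count to 2.
    src_inode="$(stat -c %i "$src")"
    dst_inode="$(stat -c %i "$dst")"
    links="$(stat -c %h "$src")"

    if [[ "$src_inode" == "$dst_inode" && "$links" -eq 2 ]]; then
        echo "OK: hardlink works under $base (inode $src_inode)"
        rm -rf "$root"
        return 0
    fi
    echo "FAIL: $dst is not a hardlink of $src" >&2
    return 1
}

# Run directly: ./hardlink-test.sh [/mnt/storage/data]
if [[ "${BASH_SOURCE[0]:-$0}" == "$0" ]]; then
    hardlink_test "${1:-}"
fi
```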
MERGERFS: Creating and mounting the disks...
Disks
NAME PATH SERIAL
sda /dev/sda WD-BC0AV53J
sdc /dev/sdc ZA2BNCBQ
Creating the directories https://trapexit.github.io/mergerfs/latest/config/branches/#mount-points
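# Create one mount point per data disk; anything after the name is a free-form comment
while read -r DEVICE COMMENT; do
  MP="/mnt/storage/disks/${DEVICE}"
  echo "$(date '+%F %T') Creating mountpoint ${MP}"
  mkdir -p "${MP}"
  chown root:root "${MP}"
  chmod 0000 "${MP}"
  # Immutable: nothing can be written into the mount point while the disk is not mounted
  chattr +i "${MP}"
done <<EOF
hdd01-10T-WD-BC0AV53J
hdd02-10T-ZA2BNCBQ #sinkbak
EOF
# Tag the parent directory with the xattr described in the mergerfs mount-points docs linked above
setfattr -n user.mergerfs.branch_mounts_here /mnt/storage/disks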
Formatting https://trapexit.github.io/mergerfs/latest/config/branches/#formatting
mkfs.xfs -L WD-BC0AV53J /dev/sda
mkfs.xfs -L ZA2BNCBQ /dev/sdc
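Since the fstab entries mount by LABEL, it can be worth validating each label before running mkfs.xfs, which does not accept labels longer than 12 characters. The helper below is a hypothetical sketch (the function name and the default base directory /mnt/storage/disks are assumptions drawn from this page's layout) that rejects invalid labels and prints the matching fstab line in the same format used here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: validate an XFS label and print the fstab entry in the
# format used on this page. The default base directory is an assumption.
set -euo pipefail

xfs_fstab_line() {
    local label="$1" name="$2" base="${3:-/mnt/storage/disks}"
    # XFS labels are limited to 12 characters; an empty label is rejected here too.
    if (( ${#label} == 0 || ${#label} > 12 )); then
        echo "invalid XFS label '$label' (must be 1-12 characters)" >&2
        return 1
    fi
    # Same fields as the fstab entries: type auto, option nofail, dump 0, pass 2
    printf 'LABEL=%s %s/%s auto nofail 0 2\n' "$label" "$base" "$name"
}

# Example:
#   xfs_fstab_line WD-BC0AV53J hdd01-10T-WD-BC0AV53J
#   -> LABEL=WD-BC0AV53J /mnt/storage/disks/hdd01-10T-WD-BC0AV53J auto nofail 0 2
#   xfs_fstab_line ZTN19R59 hdd01-10T-ZTN19R59 /mnt/storage/parity
```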
Mounting
#/etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
LABEL=WD-BC0AV53J /mnt/storage/disks/hdd01-10T-WD-BC0AV53J auto nofail 0 2
LABEL=ZA2BNCBQ /mnt/storage/disks/hdd02-10T-ZA2BNCBQ auto nofail 0 2
mount -a
After mounting
for D in /mnt/storage/disks/hdd*; do
chown root:root ${D};
chmod 1777 ${D};
setfattr -n user.mergerfs.branch ${D};
getfattr ${D};
done;
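Because the mount points are immutable and fstab uses nofail, a disk that fails to mount leaves an empty, immutable directory behind instead of stopping the boot. A quick check before mounting the pool (or before a sync) can catch that. This is a sketch under assumptions: it treats a directory as mounted when its device number differs from its parent's, as reported by GNU stat, and the hdd* pattern follows this page's naming. mountpoint -q from util-linux is an alternative.

```shell
#!/usr/bin/env bash
# Sketch: report which branch directories are separate mounted filesystems.
# Heuristic (assumption): a mount point has a different st_dev than its
# parent directory. Needs GNU coreutils stat; bind mounts are not detected.
set -uo pipefail

check_branches() {
    local parent="${1:-/mnt/storage/disks}"
    local parent_dev dir dev missing=0
    parent_dev="$(stat -c %d "$parent")" || return 2
    for dir in "$parent"/hdd*; do
        [[ -d "$dir" ]] || continue              # also skips an unmatched glob
        dev="$(stat -c %d "$dir")"
        if [[ "$dev" == "$parent_dev" ]]; then
            echo "NOT MOUNTED: $dir"
            missing=1
        else
            echo "mounted:     $dir"
        fi
    done
    return "$missing"
}
```

A non-zero return can be used to skip mounting the pool or running a sync while a branch is missing.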
Finally, mounting the merged pool
#fstab
/mnt/storage/disks/hdd* /mnt/storage/data mergerfs cache.files=off,category.create=pfrd,func.getattr=newest,dropcacheonclose=false 0 0
SNAPRAID: Creating and mounting the disks...
Disks
sdb /dev/sdb ZTN19R59
Creating the directory
mkdir -p /mnt/storage/parity/hdd01-10T-ZTN19R59 #sink
Formatting
mkfs.xfs -L ZTN19R59 /dev/sdb
Mounting
#fstab
LABEL=ZTN19R59 /mnt/storage/parity/hdd01-10T-ZTN19R59 auto nofail 0 2
Configuring
# /etc/snapraid.conf
parity /mnt/storage/parity/hdd01-10T-ZTN19R59/snapraid.parity
content /var/snapraid/snapraid.content
content /mnt/nvme1/snapraid.content
content /mnt/storage/disks/hdd01-10T-WD-BC0AV53J/snapraid.content
content /mnt/storage/disks/hdd02-10T-ZA2BNCBQ/snapraid.content
data d1 /mnt/storage/disks/hdd01-10T-WD-BC0AV53J
data d2 /mnt/storage/disks/hdd02-10T-ZA2BNCBQ
exclude /media/transmission
exclude $RECYCLE.BIN/
exclude .Trash-*/
exclude *.unrecoverable
exclude /tmp/
exclude /lost+found/
exclude downloads/
exclude *.!sync
exclude .AppleDouble
exclude ._AppleDouble
exclude .DS_Store
exclude ._.DS_Store
exclude Thumbs.db
exclude .fseventsd
exclude .Spotlight-V100
exclude .TemporaryItems
exclude .Trashes
exclude .AppleDB
autosave 250
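Before the first sync, a quick pre-flight check of /etc/snapraid.conf can catch path typos. The function below is a hypothetical sketch, not a SnapRAID command. It assumes one directive per line, no spaces in paths, and no split parity (comma-separated parity files). It checks that data directories and the directories holding parity and content files exist, and that there is at least one content file per parity level plus one, the minimum the SnapRAID manual asks for.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for snapraid.conf (not part of SnapRAID).
# Assumptions: one directive per line, no spaces in paths, no split parity.
set -uo pipefail

check_snapraid_conf() {
    local conf="${1:-/etc/snapraid.conf}"
    local errors=0 parity=0 content=0 key a b dir

    while read -r key a b _; do
        case "$key" in
            ''|\#*)
                continue ;;                          # blank line or comment
            parity|[2-6]-parity|z-parity)
                parity=$((parity + 1))
                dir="$(dirname "$a")"
                if [[ ! -d "$dir" ]]; then
                    echo "missing parity dir: $dir"; errors=1
                fi ;;
            content)
                content=$((content + 1))
                dir="$(dirname "$a")"
                if [[ ! -d "$dir" ]]; then
                    echo "missing content dir: $dir"; errors=1
                fi ;;
            data)
                # "data d1 /mnt/..." -> $a is the disk name, $b its directory
                if [[ ! -d "$b" ]]; then
                    echo "missing data dir for $a: $b"; errors=1
                fi ;;
        esac
    done < "$conf"

    # SnapRAID needs at least (parity levels + 1) content files.
    if (( content < parity + 1 )); then
        echo "need at least $((parity + 1)) content files, found $content"
        errors=1
    fi
    echo "parity=$parity content=$content"
    return "$errors"
}
```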
First sync
snapraid sync
Maintenance to schedule
snapraid diff
snapraid sync
snapraid scrub
***
snapraid diff
snapraid sync
snapraid scrub -p new
snapraid scrub -p 100 -o 20
# In case something went wrong
snapraid -e fix
snapraid -p bad scrub
snapraid touch
snapraid smart
snapraid status
snapraid down
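The scheduled routine above boils down to diff, then sync, then a partial scrub. A minimal sketch of that chain follows. It assumes, as the script below does, that snapraid diff exits with 2 when a sync is needed, and that any failure from sync or scrub should stop the run. SNAPRAID is a hypothetical override so the logic can be tried against a stub, and the scrub values mirror the script's SCRUB_PERCENT=3 and SCRUB_AGE=10.

```shell
#!/usr/bin/env bash
# Minimal nightly chain: diff -> sync (only when needed) -> partial scrub.
# Assumption: `snapraid diff` exits 0 when nothing changed and 2 when a sync
# is needed; any other code is treated as an error.
set -uo pipefail

SNAPRAID="${SNAPRAID:-snapraid}"   # hypothetical override (e.g. a test stub)

nightly() {
    local rc
    "$SNAPRAID" diff
    rc=$?
    case "$rc" in
        0) echo "no changes, skipping sync" ;;
        2) "$SNAPRAID" sync || { echo "sync failed" >&2; return 1; } ;;
        *) echo "diff failed (rc=$rc)" >&2; return 1 ;;
    esac
    # Scrub blocks that were synced but never scrubbed...
    "$SNAPRAID" scrub -p new || return 1
    # ...then 3% of the array, limited to blocks older than 10 days
    "$SNAPRAID" scrub -p 3 -o 10 || return 1
}
```

Scheduling could then be a single /etc/cron.d entry such as 30 3 * * * root /usr/local/sbin/snapraid-nightly.sh (script path hypothetical). Unlike the full script below, this sketch has no deleted/updated thresholds, so it will sync even after a mass deletion.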
#!/usr/bin/env bash
#######################################################################
# SnapRAID helper script:
# 1) Optionally pauses configured Docker services
# 2) Runs snapraid diff
# 3) If del/updated thresholds exceeded -> warn + optionally force sync after N warnings
# 4) If authorized -> runs snapraid sync
# 5) If in-sync (or sync completed) -> runs snapraid scrub (partial, configurable)
# 6) Optionally runs snapraid smart + snapraid down
# 7) Restores services and emails output (if configured)
#
# Modernized with:
# - Robust BEGIN/END markers for each SnapRAID job
# - Exit-code capture + warning/failure reporting
# - Optional DIFF list summarization for email (full untrimmed log still preserved)
# - Optional Healthchecks ping integration (/start, success, /<exitcode>)
#######################################################################
#######################
# USER CONFIGURATION #
#######################
EMAIL_ADDRESS="yourusername@gmail.com"
# Thresholds of deleted (DEL) and updated (UP) files; if either is exceeded, the sync job does not run.
DEL_THRESHOLD=100
UP_THRESHOLD=500
# 0 -> always force a sync (ignore thresholds)
# -1 -> never force a sync (manual intervention required if thresholds exceeded)
# N -> force a sync after N warnings
SYNC_WARN_THRESHOLD=-1
# Set percentage of array to scrub if it is in sync.
# 0 disables scrub. 100 scrubs the full array in one run (can take a long time).
SCRUB_PERCENT=3
SCRUB_AGE=10
# Spindown disks after jobs complete.
# 1 = run `snapraid down` (spins down array disks)
# 0 = skip spindown (useful if you have other jobs running, or want disks warm)
SPINDOWN_DISKS=0
# Log SMART info.
SMART_LOG=1
SNAPRAID_BIN="/usr/local/bin/snapraid"
MAIL_BIN="/usr/bin/mutt"
DOCKER_BIN="/usr/bin/docker"
SNAPRAID_CONF="/etc/snapraid.conf"
# Docker services control (pause containers by name).
MANAGE_SERVICES=1
SERVICES=(sabnzbd sonarr radarr lidarr)
PAUSED_SERVICES=()
# Where to keep the warning counter (persists across runs, but /tmp is usually cleared on reboot)
SYNC_WARN_FILE="/tmp/snapRAID.warnCount"
# Optional: prevent overlapping runs (recommended for cron)
LOCK_FILE="/tmp/snapraid-sync.lock"
# Exit-code policy:
# 0 = continue on failures (but warn and block downstream risky steps)
# 1 = fail fast (exit on first non-zero exit code from snapraid, except diff rc=2)
FAIL_FAST=1
# Summarize the verbose `snapraid diff` file list in the emailed report.
# Full untrimmed log is always saved to disk.
SUMMARIZE_DIFF_EMAIL=1
# When summarizing DIFF list: keep first N and last N file-change lines (add/remove/...)
DIFF_LIST_HEAD=20
DIFF_LIST_TAIL=20
# Where to store full logs persistently (email will include the path)
LOG_DIR="/var/log/snapraid"
# Healthchecks integration (optional)
HEALTHCHECKS_ALERTS=1
HEALTHCHECKS_ID="588220cd-28b1-40dc-6524-6e28a0g1d1a3"
HEALTHCHECKS_URL="https://healthchecks.yourdomain.com/ping/"
HC_TIMEOUT_SECS=10
HC_RETRIES=3
############################
# DO NOT EDIT BELOW THIS #
############################
set -u
set -o pipefail
shopt -s lastpipe 2>/dev/null || true
SECONDS=0
TMP_OUTPUT=""
EMAIL_OUTPUT=""
FULL_LOG_FILE=""
EMAIL_SUBJECT_PREFIX=""
GRACEFUL=0
CHK_FAIL=0
DO_SYNC=0
JOBS_DONE=""
DEL_COUNT=""
ADD_COUNT=""
MOVE_COUNT=""
COPY_COUNT=""
UPDATE_COUNT=""
RESTORED_COUNT=""
SYNC_WARN_COUNT=""
DIFF_RC=0
SYNC_RC=0
SCRUB_RC=0
SMART_RC=0
DOWN_RC=0
TOUCH_RC=0
SERVICE_RC=0
HAD_FAILURE=0
SERVICES_PAUSED_COUNT=0
SERVICES_RESTORED_COUNT=0
SERVICES_FAILED_PAUSE=0
SERVICES_FAILED_RESTORE=0
# Healthchecks
HC_ENABLED=0
HC_TOOL="" # curl|wget
HC_SENT_START=0
#######################################################################
# HELPER FUNCTIONS
#######################################################################
# Simple logging wrapper
log() {
printf '%s\n' "$*"
}
# Fatal error - exit immediately
die() {
log "**ERROR** $*"
exit 1
}
# Check if a command exists
have_cmd() {
command -v "$1" >/dev/null 2>&1
}
# Format duration in seconds to human-readable format
format_duration() {
local total_seconds=$1
local hours=$((total_seconds / 3600))
local minutes=$(((total_seconds % 3600) / 60))
local seconds=$((total_seconds % 60))
if (( hours > 0 )); then
printf '%dh %dm %ds' "$hours" "$minutes" "$seconds"
elif (( minutes > 0 )); then
printf '%dm %ds' "$minutes" "$seconds"
else
printf '%ds' "$seconds"
fi
}
# Verify all required binaries are present and executable
require_bins() {
[[ -x "$SNAPRAID_BIN" ]] || die "snapraid binary not found/executable at: $SNAPRAID_BIN"
[[ -f "$SNAPRAID_CONF" ]] || die "snapraid config not found at: $SNAPRAID_CONF"
if [[ -n "${EMAIL_ADDRESS:-}" ]]; then
[[ -x "$MAIL_BIN" ]] || die "mail binary not found/executable at: $MAIL_BIN"
fi
if (( MANAGE_SERVICES == 1 )); then
[[ -x "$DOCKER_BIN" ]] || die "docker binary not found/executable at: $DOCKER_BIN"
fi
for b in awk sed grep hostname date tee mkdir mktemp; do
have_cmd "$b" || die "$b not found"
done
}
# Print a section header for better log readability
section() {
log
log "----------------------------------------"
log "$1"
}
#######################################################################
# HEALTHCHECKS INTEGRATION
#######################################################################
# Initialize healthchecks - determine if enabled and which tool to use
hc_init() {
if (( HEALTHCHECKS_ALERTS != 1 )); then
HC_ENABLED=0
return 0
fi
if [[ -z "${HEALTHCHECKS_ID:-}" || -z "${HEALTHCHECKS_URL:-}" ]]; then
log "WARNING: HEALTHCHECKS_ALERTS=1 but HEALTHCHECKS_ID/HEALTHCHECKS_URL not set. Disabling."
HC_ENABLED=0
return 0
fi
if have_cmd curl; then
HC_TOOL="curl"
HC_ENABLED=1
elif have_cmd wget; then
HC_TOOL="wget"
HC_ENABLED=1
else
log "WARNING: Healthchecks enabled but neither curl nor wget found. Disabling."
HC_ENABLED=0
fi
}
# Build the healthchecks ping URL with optional suffix
hc_ping_url() {
local suffix="${1:-}"
local base="${HEALTHCHECKS_URL%/}/"
local url="${base}${HEALTHCHECKS_ID}"
[[ -n "$suffix" ]] && url="${url}/${suffix}"
printf '%s' "$url"
}
# Send a ping to healthchecks (monitoring must never block maintenance)
hc_send() {
(( HC_ENABLED == 1 )) || return 0
local suffix="${1:-}"
local body="${2:-}"
local url
url="$(hc_ping_url "$suffix")"
local result=0
# Use curl if available, otherwise wget (--retry-all-errors requires curl 7.71 or newer)
if [[ "$HC_TOOL" == "curl" ]]; then
if [[ -n "$body" ]]; then
curl -fsS --max-time "$HC_TIMEOUT_SECS" --retry "$HC_RETRIES" \
--retry-delay 1 --retry-all-errors \
-X POST --data-raw "$body" "$url" >/dev/null 2>&1 || result=$?
else
curl -fsS --max-time "$HC_TIMEOUT_SECS" --retry "$HC_RETRIES" \
--retry-delay 1 --retry-all-errors \
"$url" >/dev/null 2>&1 || result=$?
fi
else
if [[ -n "$body" ]]; then
# wget cannot read the POST body from stdin, so pass it inline
wget -qO- --timeout="$HC_TIMEOUT_SECS" --tries="$HC_RETRIES" \
--post-data="$body" "$url" >/dev/null 2>&1 || result=$?
else
wget -qO- --timeout="$HC_TIMEOUT_SECS" --tries="$HC_RETRIES" "$url" >/dev/null 2>&1 || result=$?
fi
fi
# Log ping failures for debugging (non-fatal)
if (( result != 0 )); then
log "DEBUG: Healthcheck ping failed (non-fatal): $url" >&2
fi
return 0
}
# Signal job start to healthchecks
hc_start() {
(( HC_ENABLED == 1 )) || return 0
hc_send "start" "SnapRAID job started on $(hostname) at $(date)"
HC_SENT_START=1
}
# Signal successful completion to healthchecks
hc_finish_success() {
(( HC_ENABLED == 1 )) || return 0
hc_send "" "SnapRAID job success on $(hostname) at $(date). Jobs: ${JOBS_DONE}"
}
# Signal failure to healthchecks with exit code
hc_finish_fail() {
(( HC_ENABLED == 1 )) || return 0
local code="${1:-1}"
(( code >= 1 && code <= 255 )) || code=1
hc_send "$code" "SnapRAID job WARNING/FAIL on $(hostname) at $(date). Subject: ${SUBJECT:-"(no subject)"}"
}
#######################################################################
# ROBUST JOB MARKERS AND COMMAND RUNNER
#######################################################################
# Mark the beginning of a SnapRAID job in the log
mark_begin() {
local name="$1"
echo "__SNAPRAID_${name}_BEGIN__ [$(date)]" | tee -a "$TMP_OUTPUT" >/dev/null
}
# Mark the end of a SnapRAID job with its exit code
mark_end() {
local name="$1"
local rc="$2"
{
echo "__SNAPRAID_${name}_END__ [$(date)] rc=${rc}"
echo
} | tee -a "$TMP_OUTPUT" >/dev/null
}
# Check if a job completed (has an END marker)
marker_end_present() {
local name="$1"
grep -q "__SNAPRAID_${name}_END__" "$TMP_OUTPUT"
}
# snapraid diff: rc=2 means "differences found" (normal, not an error)
is_snapraid_diff_ok() {
local rc="$1"
[[ "$rc" -eq 0 || "$rc" -eq 2 ]]
}
# Run a command with robust logging and error handling
run_cmd() {
local name="$1"; shift
mark_begin "$name"
{
echo "###${name} [$(date)]"
"$@"
} 2>&1 | tee -a "$TMP_OUTPUT"
local rc=${PIPESTATUS[0]}
mark_end "$name" "$rc"
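# snapraid diff exits with rc=2 when differences are found; that is not a failure
if [[ "$name" == "DIFF" ]] && is_snapraid_diff_ok "$rc"; then
return "$rc"
fi
if (( rc != 0 )); then
HAD_FAILURE=1
log "**WARNING** ${name} returned non-zero exit code: ${rc}"
if (( FAIL_FAST == 1 )); then
die "${name} failed with rc=${rc} (FAIL_FAST=1)"
fi
fi
return "$rc"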
}
#######################################################################
# DOCKER SERVICE MANAGEMENT
#######################################################################
# Pause configured Docker services to prevent file changes during sync
service_pause() {
local s running
for s in "${SERVICES[@]}"; do
running="$("$DOCKER_BIN" inspect -f '{{.State.Running}}' "$s" 2>/dev/null || true)"
if [[ "$running" == "true" ]]; then
echo "Pausing Service - ${s}" | tee -a "$TMP_OUTPUT"
if "$DOCKER_BIN" pause "$s" >/dev/null 2>&1; then
PAUSED_SERVICES+=("$s")
((SERVICES_PAUSED_COUNT++))
else
echo "WARNING: failed to pause $s" | tee -a "$TMP_OUTPUT"
((SERVICES_FAILED_PAUSE++))
SERVICE_RC=1
HAD_FAILURE=1
fi
elif [[ "$running" == "false" ]]; then
echo "Service not running (skip pause) - ${s}" | tee -a "$TMP_OUTPUT"
else
echo "Service not found (skip pause) - ${s}" | tee -a "$TMP_OUTPUT"
fi
done
}
# Unpause previously paused Docker services
service_unpause() {
local s st
for s in "${PAUSED_SERVICES[@]}"; do
st="$("$DOCKER_BIN" inspect -f '{{.State.Status}}' "$s" 2>/dev/null || true)"
if [[ "$st" == "paused" ]]; then
echo "Unpausing Service - ${s}" | tee -a "$TMP_OUTPUT"
if "$DOCKER_BIN" unpause "$s" >/dev/null 2>&1; then
((SERVICES_RESTORED_COUNT++))
else
echo "WARNING: failed to unpause $s" | tee -a "$TMP_OUTPUT"
((SERVICES_FAILED_RESTORE++))
SERVICE_RC=1
HAD_FAILURE=1
fi
else
echo "Service not paused (skip unpause) - ${s} (status: $st)" | tee -a "$TMP_OUTPUT"
fi
done
}
# Restore all paused services
restore_services() {
(( MANAGE_SERVICES == 1 )) || return 0
if [[ ${#PAUSED_SERVICES[@]} -eq 0 ]]; then
log "No services to restore."
return 0
fi
service_unpause
return 0
}
# Cleanup function - runs on script exit (normal or interrupted)
cleanup() {
local exit_code=$?
# Always try to restore services
restore_services || {
log "WARNING: Failed to restore services during cleanup" >&2
# Don't overwrite a non-zero exit code with service restoration failure
(( exit_code == 0 )) && exit_code=1
}
# Clean up lock file on successful exit
if (( exit_code == 0 )) && [[ -f "$LOCK_FILE" ]]; then
rm -f "$LOCK_FILE" 2>/dev/null || true
fi
exit $exit_code
}
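# Register cleanup to run once on exit; INT/TERM are turned into non-zero
# exits so cleanup sees a failure code instead of running twice
trap 'exit 130' INT
trap 'exit 143' TERM
trap cleanup EXIT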
#######################################################################
# SNAPRAID CONFIG PARSING
#######################################################################
# Parse SnapRAID config to extract content and parity file paths
parse_snapraid_conf() {
# Extract all content file paths
mapfile -t CONTENT_FILES < <(
awk '
# Skip blank lines and comments
/^[[:space:]]*($|#|;)/ { next }
# Match "content" keyword (standard SnapRAID format)
$1 == "content" && $2 != "" {
print $2
}
' "$SNAPRAID_CONF"
)
((${#CONTENT_FILES[@]} > 0)) || die "Could not determine content files from $SNAPRAID_CONF"
# Use the first content file as primary
CONTENT_FILE="${CONTENT_FILES[0]}"
# Extract all parity file paths (handles comma-separated values)
mapfile -t PARITY_FILES < <(
awk '
function trim(s) {
gsub(/^[[:space:]]+|[[:space:]]+$/, "", s)
return s
}
# Skip blank lines and comments
/^[[:space:]]*($|#|;)/ { next }
# Match parity keywords: parity, 2-parity, 3-parity, ..., z-parity
$1 == "parity" || $1 ~ /^([2-6]|z)-parity$/ {
if ($2 == "") next
# Handle comma-separated paths in $2
n = split($2, a, ",")
for (i = 1; i <= n; i++) {
path = trim(a[i])
if (path != "") print path
}
}
' "$SNAPRAID_CONF"
)
((${#PARITY_FILES[@]} > 0)) || die "Could not determine parity files from $SNAPRAID_CONF"
}
# Verify that all content and parity files exist
sanity_check() {
local cf pf
log "Verifying all content files are present."
for cf in "${CONTENT_FILES[@]}"; do
[[ -e "$cf" ]] || die "Content file not found: $cf"
done
log "Verifying all parity files are present."
for pf in "${PARITY_FILES[@]}"; do
[[ -e "$pf" ]] || die "Parity file not found: $pf"
done
log "All content and parity files found. Continuing..."
}
#######################################################################
# SNAPRAID DIFF ANALYSIS
#######################################################################
# Extract change counts from snapraid diff output
# Updated format (as of recent SnapRAID versions):
# " 50 added"
# " 9 removed"
# " 0 updated"
get_counts() {
# Extract only the DIFF section from the log
# Note: Using "in_block" to avoid confusion with awk's `in` operator
local diff_block
diff_block="$(
awk '
/__SNAPRAID_DIFF_BEGIN__/ { in_block=1; next }
/__SNAPRAID_DIFF_END__/ { in_block=0 }
in_block { print }
' "$TMP_OUTPUT"
)"
# Fallback to full output if DIFF block not found
[[ -n "$diff_block" ]] || diff_block="$(cat "$TMP_OUTPUT")"
# Parse the summary lines from snapraid diff output
ADD_COUNT="$(awk '/^[[:space:]]*[0-9]+[[:space:]]+added$/ {print $1; exit}' <<<"$diff_block" || true)"
DEL_COUNT="$(awk '/^[[:space:]]*[0-9]+[[:space:]]+removed$/ {print $1; exit}' <<<"$diff_block" || true)"
UPDATE_COUNT="$(awk '/^[[:space:]]*[0-9]+[[:space:]]+updated$/ {print $1; exit}' <<<"$diff_block" || true)"
MOVE_COUNT="$(awk '/^[[:space:]]*[0-9]+[[:space:]]+moved$/ {print $1; exit}' <<<"$diff_block" || true)"
COPY_COUNT="$(awk '/^[[:space:]]*[0-9]+[[:space:]]+copied$/ {print $1; exit}' <<<"$diff_block" || true)"
RESTORED_COUNT="$(awk '/^[[:space:]]*[0-9]+[[:space:]]+restored$/ {print $1; exit}' <<<"$diff_block" || true)"
# Ensure restored count defaults to 0 if not found
RESTORED_COUNT="${RESTORED_COUNT:-0}"
}
# Check if deleted files are below threshold
chk_del() {
if [[ -n "$DEL_COUNT" ]] && (( DEL_COUNT < DEL_THRESHOLD )); then
log "Deleted files ($DEL_COUNT) below threshold ($DEL_THRESHOLD). SYNC authorized."
DO_SYNC=1
else
log "**WARNING** Deleted files ($DEL_COUNT) exceeded threshold ($DEL_THRESHOLD)."
CHK_FAIL=1
fi
}
# Check if updated files are below threshold
chk_updated() {
if (( UPDATE_COUNT < UP_THRESHOLD )); then
log "Updated files ($UPDATE_COUNT) below threshold ($UP_THRESHOLD). SYNC authorized."
DO_SYNC=1
else
log "**WARNING** Updated files ($UPDATE_COUNT) exceeded threshold ($UP_THRESHOLD)."
CHK_FAIL=1
fi
}
# Handle forced sync after N warnings
chk_sync_warn() {
if (( SYNC_WARN_THRESHOLD > -1 )); then
log "Forced sync is enabled. [$(date)]"
# Load warning count from file
if [[ -f "$SYNC_WARN_FILE" ]]; then
SYNC_WARN_COUNT="$(awk 'NR==1 && $0 ~ /^[0-9]+$/ {print $0; exit}' "$SYNC_WARN_FILE" || true)"
else
SYNC_WARN_COUNT=""
fi
SYNC_WARN_COUNT="${SYNC_WARN_COUNT:-0}"
# Check if we've hit the warning threshold
if (( SYNC_WARN_COUNT >= SYNC_WARN_THRESHOLD )); then
log "Warning count ($SYNC_WARN_COUNT) reached/exceeded threshold ($SYNC_WARN_THRESHOLD). Forcing SYNC. [$(date)]"
DO_SYNC=1
else
# Increment warning count
((SYNC_WARN_COUNT += 1))
printf '%s\n' "$SYNC_WARN_COUNT" > "$SYNC_WARN_FILE"
log "$((SYNC_WARN_THRESHOLD - SYNC_WARN_COUNT)) warning(s) remaining until forced sync. NOT proceeding with SYNC. [$(date)]"
DO_SYNC=0
fi
else
log "Forced sync is not enabled. Check output for details. NOT proceeding with SYNC. [$(date)]"
DO_SYNC=0
fi
}
# Check for and fix files with zero sub-second timestamps
chk_zero() {
run_cmd "TOUCH_CHECK" "$SNAPRAID_BIN" status
local timelog
timelog="$(grep -E 'You have [1-9][0-9]* files with zero sub-second timestamp\.' "$TMP_OUTPUT" | tail -n 1 || true)"
if [[ -n "$timelog" ]]; then
log "${timelog/You have/Found}"
run_cmd "TOUCH" "$SNAPRAID_BIN" touch
TOUCH_RC=$?
JOBS_DONE="${JOBS_DONE:+$JOBS_DONE + }TOUCH"
else
log "No files with zero sub-second timestamps found."
fi
}
#######################################################################
# EMAIL PREPARATION
#######################################################################
# Build the email subject line based on job results
prepare_mail_subject() {
local msg=""
local STATUS_ICON="🟢"
local STATUS_WORD="COMPLETED"
# Check for threshold violations (warnings)
if (( CHK_FAIL == 1 )); then
STATUS_ICON="🟠"
STATUS_WORD="WARNING"
if (( DEL_COUNT >= DEL_THRESHOLD && DO_SYNC == 0 )); then
msg="Deleted ($DEL_COUNT>=$DEL_THRESHOLD)"
fi
if (( DEL_COUNT >= DEL_THRESHOLD && UPDATE_COUNT >= UP_THRESHOLD && DO_SYNC == 0 )); then
msg="${msg} & "
fi
if (( UPDATE_COUNT >= UP_THRESHOLD && DO_SYNC == 0 )); then
msg="${msg}Updated ($UPDATE_COUNT>=$UP_THRESHOLD)"
fi
SUBJECT="${STATUS_ICON} [${STATUS_WORD}] ${msg} ${EMAIL_SUBJECT_PREFIX}"
return 0
fi
# Check for command failures
if (( HAD_FAILURE == 1 )); then
STATUS_ICON="🔴"
STATUS_WORD="FAILED"
SUBJECT="${STATUS_ICON} [${STATUS_WORD}] ${EMAIL_SUBJECT_PREFIX}"
return 0
fi
# Success case
SUBJECT="${STATUS_ICON} [${STATUS_WORD}] ${JOBS_DONE} ${EMAIL_SUBJECT_PREFIX}"
}
# Create a summarized email copy of the log (full log preserved separately)
# This intelligently trims the verbose file-change list while preserving context
summarize_diff_for_email() {
# Start with a copy of the full log
cp -f "$TMP_OUTPUT" "$EMAIL_OUTPUT"
# Skip summarization if disabled
(( SUMMARIZE_DIFF_EMAIL == 1 )) || return 0
# Use awk to keep only head/tail of file-change lines and add summary
awk -v head="$DIFF_LIST_HEAD" -v tail="$DIFF_LIST_TAIL" '
# Helper function to identify file-change action lines
function is_action(line) {
return (line ~ /^(add|remove|update|move|copy|restore)[[:space:]]+/)
}
# Extract the action type from a line
function action_type(line, a) {
split(line, a, /[[:space:]]+/)
return a[1]
}
BEGIN {
in_block=0
action_count=0
tail_length=0
tail_start=1
# Counters for each action type
add_count=0
remove_count=0
update_count=0
move_count=0
copy_count=0
restore_count=0
}
# Detect start of DIFF block
/__SNAPRAID_DIFF_BEGIN__/ {
in_block=1
print
next
}
# Detect end of DIFF block - output tail buffer and summary
/__SNAPRAID_DIFF_END__/ {
if (in_block) {
# If we omitted lines, show how many and breakdown by type
if (action_count > head + tail) {
omitted = action_count - (head + tail)
print ""
printf "... (%d file-change lines omitted from email; breakdown: add=%d remove=%d update=%d move=%d copy=%d restore=%d; see full log on disk) ...\n",
omitted, add_count, remove_count, update_count, move_count, copy_count, restore_count
print ""
}
# Output the tail buffer
for (i=1; i<=tail_length; i++) {
idx = tail_start + i - 1
if (idx > tail) idx -= tail
print tail_buffer[idx]
}
}
in_block=0
print
next
}
{
# Pass through non-DIFF content unchanged
if (!in_block) {
print
next
}
# Handle file-change action lines
if (is_action($0)) {
action_count++
# Track action type
t = action_type($0)
if (t=="add") add_count++
else if (t=="remove") remove_count++
else if (t=="update") update_count++
else if (t=="move") move_count++
else if (t=="copy") copy_count++
else if (t=="restore") restore_count++
# Print first N lines directly
if (action_count <= head) {
print
next
}
# Store last N lines in circular buffer
if (tail > 0) {
if (tail_length < tail) {
tail_length++
pos=tail_length
} else {
pos=tail_start
tail_start++
if (tail_start > tail) tail_start=1
}
tail_buffer[pos] = $0
}
next
}
# Pass through all other lines within DIFF block
print
}
' "$TMP_OUTPUT" > "$EMAIL_OUTPUT".tmp && mv -f "$EMAIL_OUTPUT".tmp "$EMAIL_OUTPUT"
}
# Format the email output for better readability (plain text optimized)
beautify_email_output() {
local tmp duration_str
tmp="$(mktemp -t snapraid.pretty.XXXXXX)"
# Calculate human-readable duration
duration_str="$(format_duration "$SECONDS")"
# Use awk to format the email with headers, sections, and simplified output.
# Requires GNU awk: match() with a capture array is a gawk extension.
awk -v subject="$SUBJECT" \
-v host="$(hostname)" \
-v logfile="$FULL_LOG_FILE" \
-v duration="$duration_str" \
-v del_count="${DEL_COUNT:-0}" \
-v add_count="${ADD_COUNT:-0}" \
-v update_count="${UPDATE_COUNT:-0}" \
-v move_count="${MOVE_COUNT:-0}" \
-v copy_count="${COPY_COUNT:-0}" \
-v restored_count="${RESTORED_COUNT:-0}" \
-v del_thresh="${DEL_THRESHOLD}" \
-v up_thresh="${UP_THRESHOLD}" \
-v warn_count="${SYNC_WARN_COUNT:-0}" \
-v warn_thresh="${SYNC_WARN_THRESHOLD}" \
-v chk_fail="${CHK_FAIL}" \
-v do_sync="${DO_SYNC}" '
# Helper functions for formatted output
function hr() {
print "============================================================"
}
function h1(t) {
print ""
hr()
print t
hr()
print ""
}
function h2(t) {
print ""
print "=============="
print t
print "=============="
print ""
}
# ASCII box drawing for critical warnings
function box_start() {
print "╔════════════════════════════════════════════════════════════╗"
}
function box_line(t) {
printf "║ %-58s ║\n", t
}
function box_end() {
print "╚════════════════════════════════════════════════════════════╝"
}
BEGIN {
# Email header with key metadata
h1(subject)
print "Host: " host
print "Duration: " duration
if (logfile != "") print "Full log: " logfile
print "Finished: " strftime("%c")
print ""
# Change summary section (keep as-is - it is clean and useful)
h2("Change Summary")
printf " Added: %6d files\n", add_count
printf " Removed: %6d files\n", del_count
printf " Updated: %6d files\n", update_count
printf " Moved: %6d files\n", move_count
printf " Copied: %6d files\n", copy_count
printf " Restored: %6d files\n", restored_count
# Critical warning box if thresholds exceeded
if (chk_fail == 1 && do_sync == 0) {
print ""
box_start()
box_line("⚠ CRITICAL: Manual review needed")
box_line("")
if (del_count >= del_thresh) {
box_line(sprintf("Deleted files: %d (threshold: %d)", del_count, del_thresh))
}
if (update_count >= up_thresh) {
box_line(sprintf("Updated files: %d (threshold: %d)", update_count, up_thresh))
}
if (warn_thresh > -1) {
box_line(sprintf("Warning count: %d/%d (will force sync at %d)", warn_count, warn_thresh, warn_thresh))
}
box_end()
}
# State tracking for content filtering
in_status_report = 0
skip_until_blank = 0
in_scrub_section = 0
in_smart_report = 0
in_wait_chart = 0
pause_header_shown = 0
unpause_header_shown = 0
pause_last_line = 0
blank_count = 0
just_had_verified = 0
# Scrub stats tracking
scrub_last = ""
scrub_oldest = ""
scrub_median = ""
scrub_newest = ""
scrub_errors = ""
# SMART data tracking
smart_disk_count = 0
smart_data_count = 0
smart_parity_count = 0
smart_other_count = 0
smart_high_temp_count = 0
smart_high_fp_count = 0
smart_error_count = 0
smart_max_temp = 0
smart_overall_fp = ""
delete smart_warnings
smart_warning_count = 0
}
# REMOVE ALL INTERNAL MARKERS - these should never appear in the email
/^__SNAPRAID_[A-Z0-9_]+_(BEGIN|END)__/ {
next
}
# Remove internal job timestamps
/^###[A-Z0-9_]+ \[/ {
next
}
# Group service pause messages into a dedicated section
/^Pausing Service -/ || /^Service not running.*skip pause/ || /^Service not found.*skip pause/ {
if (!pause_header_shown) {
h2("Services Paused")
pause_header_shown = 1
}
print " " $0
pause_last_line = NR
next
}
# Group service unpause messages into a dedicated section
/^Unpausing Service -/ || /^Service not paused.*skip unpause/ {
if (!unpause_header_shown) {
h2("Services Restored")
unpause_header_shown = 1
}
print " " $0
next
}
# Add section header after service pause messages when we see "Self test..."
/^Self test\.\.\./ {
# If we just finished showing service pause messages, add a section header
if (pause_header_shown && NR - pause_last_line < 5) {
h2("DIFF Analysis")
}
print
next
}
# FILTER OUT: Entire SnapRAID status report (verbose, not needed in email)
/^SnapRAID status report:/ {
in_status_report = 1
next
}
# End status report when we hit "The oldest block was scrubbed" line
in_status_report == 1 && /^The oldest block was scrubbed/ {
# Extract scrub statistics for simplified summary
if (match($0, /scrubbed ([0-9]+) days ago, the median ([0-9]+), the newest ([0-9]+)/, arr)) {
scrub_oldest = arr[1]
scrub_median = arr[2]
scrub_newest = arr[3]
}
in_status_report = 0
in_scrub_section = 1
next
}
# Skip all lines within the status report
in_status_report == 1 {
next
}
# Capture scrub error info if present
in_scrub_section == 1 && /^No error detected/ {
scrub_errors = "✓ No errors detected"
in_scrub_section = 0
next
}
in_scrub_section == 1 && /error/ {
scrub_errors = "⚠ Errors detected - check full log"
in_scrub_section = 0
next
}
# End scrub section after a few lines if no error line found
in_scrub_section == 1 {
scrub_line_count++
if (scrub_line_count > 3) {
in_scrub_section = 0
if (scrub_errors == "") scrub_errors = "Status unknown"
}
next
}
# Detect SCRUB job completion and output simplified summary
/^Self test completed OK/ || /^Scrubbing completed/ {
# Only show scrub summary if we have data
if (scrub_oldest != "" || scrub_errors != "") {
h2("Scrub Summary")
if (scrub_oldest != "") {
print " Last scrub: " scrub_newest " days ago"
print " Oldest block: " scrub_oldest " days (median: " scrub_median " days, newest: " scrub_newest " days)"
}
if (scrub_errors != "") {
print " Status: " scrub_errors
}
print ""
}
# Reset for next potential scrub
scrub_oldest = ""
scrub_median = ""
scrub_newest = ""
scrub_errors = ""
next
}
# FILTER OUT: Wait time charts (not readable in email, not actionable)
/^[[:space:]]*(d[0-9]+|parity|2-parity|raid|hash|sched|misc)[[:space:]]+[0-9]+%[[:space:]]*\|/ {
in_wait_chart = 1
next
}
# End of wait time chart
in_wait_chart == 1 && /wait time \(total, less is better\)/ {
in_wait_chart = 0
next
}
in_wait_chart == 1 {
next
}
# Detect start of SMART report and begin parsing
/^SnapRAID SMART report:/ {
in_smart_report = 1
smart_in_header = 1
next
}
# Skip SMART header lines
in_smart_report == 1 && smart_in_header == 1 && /^[[:space:]]*$/ {
next
}
in_smart_report == 1 && smart_in_header == 1 && /Temp Power Error FP Size/ {
next
}
in_smart_report == 1 && smart_in_header == 1 && /C OnDays Count TB Serial/ {
next
}
in_smart_report == 1 && smart_in_header == 1 && /^[[:space:]]*-+[[:space:]]*$/ {
smart_in_header = 0
next
}
# Parse SMART data lines
in_smart_report == 1 && !smart_in_header && /^[[:space:]]+[0-9-]+[[:space:]]+/ {
# Extract fields: Temp, Power, Error, FP, Size, Serial, Device, Disk
temp = $1
power = $2
error = $3
fp = $4
size = $5
serial = $6
device = $7
disk = $8
smart_disk_count++
# Categorize disk
if (disk ~ /^d[0-9]+$/) {
smart_data_count++
} else if (disk ~ /parity/) {
smart_parity_count++
} else {
smart_other_count++
}
# Check for warnings
has_warning = 0
warning_msg = ""
# High failure probability (>50%)
if (fp ~ /^[0-9]+%$/) {
fp_val = fp
gsub(/%/, "", fp_val)
if (fp_val + 0 > 50) {
smart_high_fp_count++
has_warning = 1
if (warning_msg != "") warning_msg = warning_msg " | "
warning_msg = warning_msg "High failure risk (" fp ")"
}
}
# High temperature (>40°C)
if (temp ~ /^[0-9]+$/ && temp + 0 > 40) {
smart_high_temp_count++
has_warning = 1
if (warning_msg != "") warning_msg = warning_msg " | "
warning_msg = warning_msg "High temp (" temp "°C)"
}
# Track max temp
if (temp ~ /^[0-9]+$/ && temp + 0 > smart_max_temp) {
smart_max_temp = temp
smart_max_temp_disk = disk
}
# Errors present
if (error ~ /^[0-9]+$/ && error + 0 > 0) {
smart_error_count++
has_warning = 1
if (warning_msg != "") warning_msg = warning_msg " | "
warning_msg = warning_msg error " errors"
}
# Store warning if present
if (has_warning) {
smart_warning_count++
smart_warnings[smart_warning_count] = sprintf(" • %s (%s) - %s - %s - %s°C", \
disk, fp, device, serial, temp)
if (warning_msg != "") {
smart_warnings[smart_warning_count] = smart_warnings[smart_warning_count] "\n " warning_msg
}
}
next
}
# Capture overall failure probability
in_smart_report == 1 && /^Probability that at least one disk/ {
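# Portable POSIX awk (also works in mawk): skip the leading "is " and drop the trailing "%"
if (match($0, /is [0-9]+%/)) {
smart_overall_fp = substr($0, RSTART + 3, RLENGTH - 4)
}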
in_smart_report = 0
# Output SMART summary
h2("SMART Summary")
printf " Disks monitored: %d total", smart_disk_count
if (smart_data_count > 0 || smart_parity_count > 0 || smart_other_count > 0) {
printf " ("
parts = 0
if (smart_data_count > 0) {
printf "%d data", smart_data_count
parts++
}
if (smart_parity_count > 0) {
if (parts > 0) printf " + "
printf "%d parity", smart_parity_count
parts++
}
if (smart_other_count > 0) {
if (parts > 0) printf " + "
printf "%d other", smart_other_count
}
printf ")"
}
print ""
print ""
# Show warnings or all-clear
if (smart_warning_count > 0) {
if (smart_high_fp_count > 0) {
print " ⚠ High failure probability:"
for (i = 1; i <= smart_warning_count; i++) {
if (smart_warnings[i] ~ /High failure risk/) {
print smart_warnings[i]
}
}
print ""
}
if (smart_high_temp_count > 0) {
print " ⚠ Temperature warnings (>40°C):"
for (i = 1; i <= smart_warning_count; i++) {
if (smart_warnings[i] ~ /High temp/) {
print smart_warnings[i]
}
}
print ""
}
if (smart_error_count > 0) {
print " ⚠ Disks with errors:"
for (i = 1; i <= smart_warning_count; i++) {
if (smart_warnings[i] ~ /errors/) {
print smart_warnings[i]
}
}
print ""
}
} else {
print " ✓ All disks healthy"
print ""
}
# Show max temp and overall failure probability when available
if (smart_max_temp > 0) {
printf " Highest temp: %d°C", smart_max_temp
if (smart_max_temp_disk != "") {
printf " (%s)", smart_max_temp_disk
}
print ""
}
if (smart_overall_fp != "") {
printf " Overall failure probability: %s%%", smart_overall_fp
print " (at least one disk in next year)"
}
print ""
next
}
# Skip remaining SMART report lines
in_smart_report == 1 {
next
}
# Detect job section headers from the log structure
/^##(Preprocessing|Processing|Postprocessing)/ {
# Extract the section name
section = $0
gsub(/^##/, "", section)
h2(section)
next
}
# REMOVE: SnapRAID raw summary lines (duplicate of our formatted summary at top)
/^[[:space:]]*[0-9]+[[:space:]]+equal/ { next }
/^[[:space:]]*[0-9]+[[:space:]]+added/ { next }
/^[[:space:]]*[0-9]+[[:space:]]+removed/ { next }
/^[[:space:]]*[0-9]+[[:space:]]+updated/ { next }
/^[[:space:]]*[0-9]+[[:space:]]+moved/ { next }
/^[[:space:]]*[0-9]+[[:space:]]+copied/ { next }
/^[[:space:]]*[0-9]+[[:space:]]+restored/ { next }
/^There are differences!/ { next }
/^No differences/ { next }
/^\*\*SUMMARY of changes/ { next }
# Remove the horizontal line separators from the log
/^----------------------------------------$/ { next }
# Remove standalone "Everything OK" lines (redundant with our summaries)
/^Everything OK$/ { next }
# Preserve the file-change omission message with better spacing
/^\.\.\. \([0-9]+ file-change lines omitted/ {
# Ensure single blank line before
if (blank_count == 0) print ""
print $0
print ""
blank_count = 1
next
}
# Reduce excessive spacing around file operations (add/remove/update lines)
/^(add|remove|update|move|copy|restore)[[:space:]]+/ {
blank_count = 0
print
next
}
# Reduce spacing after "Verified" lines and similar status messages
/^(Saving state to|Verified|Scanned|Using|Initializing|Resizing|Syncing|Scrubbing|Selecting|Comparing)/ {
# Suppress consecutive "Verified" lines; print all other status messages
if ($0 !~ /^Verified/ || !just_had_verified) {
print
}
if ($0 ~ /^Verified/) just_had_verified = 1
else just_had_verified = 0
blank_count = 0
next
}
# Skip excessive blank lines (more than 1 in a row)
/^[[:space:]]*$/ {
if (blank_count >= 1) next
blank_count++
print
next
}
# Reset blank line counter on non-blank lines
{
blank_count = 0
print
}
' "$EMAIL_OUTPUT" > "$tmp" && mv -f "$tmp" "$EMAIL_OUTPUT"
}
# Send the formatted email
send_mail() {
if ! "$MAIL_BIN" -s "$SUBJECT" "$EMAIL_ADDRESS" < "$EMAIL_OUTPUT"; then
log "ERROR: Failed to send email to $EMAIL_ADDRESS"
return 1
fi
}
# Save the full unformatted log to disk for reference
persist_full_log() {
mkdir -p "$LOG_DIR" || die "Unable to create log dir: $LOG_DIR"
local ts host
ts="$(date +'%Y%m%d-%H%M%S')"
host="$(hostname)"
FULL_LOG_FILE="${LOG_DIR}/snapraid-${host}-${ts}.log"
cp -f "$TMP_OUTPUT" "$FULL_LOG_FILE" || die "Unable to write full log to: $FULL_LOG_FILE"
}
#######################################################################
# MAIN EXECUTION
#######################################################################
main() {
# Extend PATH first so cron runs can find flock, snapraid and the mail binary
export PATH="/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:$PATH"
# Prevent overlapping runs using flock if available
if have_cmd flock; then
exec 200>"$LOCK_FILE"
flock -n 200 || die "Another snapraid job appears to be running (lock: $LOCK_FILE)."
fi
# Initialize
require_bins
hc_init
EMAIL_SUBJECT_PREFIX="(SnapRAID on $(hostname))"
TMP_OUTPUT="$(mktemp -t snapraid.out.XXXXXX)"
EMAIL_OUTPUT="$(mktemp -t snapraid.email.XXXXXX)"
: > "$TMP_OUTPUT"
: > "$EMAIL_OUTPUT"
parse_snapraid_conf
hc_start
log "SnapRAID Script Job started [$(date)]"
#####################################################################
# PREPROCESSING
#####################################################################
section "##Preprocessing"
# Pause Docker services to prevent file changes during sync
if (( MANAGE_SERVICES == 1 )); then
log "###Stop Services [$(date)]"
service_pause
fi
# Verify all content and parity files exist
sanity_check
#####################################################################
# PROCESSING
#####################################################################
section "##Processing"
# Check for and fix zero sub-second timestamp files
chk_zero
#
# DIFF - Analyze what has changed since last sync
# Note: snapraid diff returns rc=2 when differences are found (this is normal)
#
mark_begin "DIFF"
{
echo "###DIFF [$(date)]"
"$SNAPRAID_BIN" diff
} 2>&1 | tee -a "$TMP_OUTPUT"
DIFF_RC=${PIPESTATUS[0]}
mark_end "DIFF" "$DIFF_RC"
JOBS_DONE="DIFF"
# Handle DIFF exit code (0 or 2 are both acceptable)
if ! is_snapraid_diff_ok "$DIFF_RC"; then
HAD_FAILURE=1
log "**WARNING** DIFF returned non-zero exit code: ${DIFF_RC}"
if (( FAIL_FAST == 1 )); then
die "DIFF failed with rc=${DIFF_RC} (FAIL_FAST=1)"
fi
fi
# Extract change counts from DIFF output
get_counts
# Verify we got all required counts
if [[ -z "${DEL_COUNT:-}" || -z "${ADD_COUNT:-}" || -z "${MOVE_COUNT:-}" || -z "${COPY_COUNT:-}" || -z "${UPDATE_COUNT:-}" ]]; then
log "**ERROR** Failed to extract change counts from DIFF output. Unable to proceed safely."
persist_full_log
if [[ -n "${EMAIL_ADDRESS:-}" ]]; then
SUBJECT="${EMAIL_SUBJECT_PREFIX} WARNING - Unable to proceed with SYNC/SCRUB job(s). Check DIFF job output."
summarize_diff_for_email
beautify_email_output
send_mail
fi
hc_finish_fail 2
exit 1
fi
log
log "**SUMMARY of changes - Added [$ADD_COUNT] - Deleted [$DEL_COUNT] - Moved [$MOVE_COUNT] - Copied [$COPY_COUNT] - Updated [$UPDATE_COUNT]**"
log
#
# SYNC Decision Logic
#
if (( DEL_COUNT > 0 || ADD_COUNT > 0 || MOVE_COUNT > 0 || COPY_COUNT > 0 || UPDATE_COUNT > 0 )); then
# Changes detected - check thresholds
if (( SYNC_WARN_THRESHOLD == 0 )); then
# Always force sync when threshold is 0
DO_SYNC=1
else
# Check deletion threshold
chk_del
# Only check update threshold if deletion check passed
if (( CHK_FAIL == 0 )); then
chk_updated
fi
# If either threshold was exceeded, check if we should force sync anyway
if (( CHK_FAIL == 1 )); then
chk_sync_warn
fi
fi
else
# No changes detected
log "No changes detected. Not running SYNC job. [$(date)]"
DO_SYNC=0
fi
#
# SYNC - Update parity if authorized
#
if (( DO_SYNC == 1 )); then
run_cmd "SYNC" "$SNAPRAID_BIN" sync -q
SYNC_RC=$?
JOBS_DONE="${JOBS_DONE} + SYNC"
# Reset the forced-sync warning counter once a sync has been authorized (done regardless of the SYNC exit code)
[[ -e "$SYNC_WARN_FILE" ]] && rm -f "$SYNC_WARN_FILE"
fi
#
# SCRUB - Verify data integrity on a portion of the array
#
if (( SCRUB_PERCENT > 0 )); then
# Don't scrub if thresholds were exceeded and sync was skipped
if (( CHK_FAIL == 1 && DO_SYNC == 0 )); then
log "Scrub job cancelled - parity info is out of sync (threshold breached). [$(date)]"
else
# If SYNC ran, verify it completed successfully before scrubbing
if (( DO_SYNC == 1 )); then
if ! marker_end_present "SYNC"; then
log "**WARNING** SYNC end marker missing. Not proceeding with SCRUB. [$(date)]"
elif (( SYNC_RC != 0 )); then
log "**WARNING** SYNC failed with rc=${SYNC_RC}. Not proceeding with SCRUB. [$(date)]"
else
run_cmd "SCRUB" "$SNAPRAID_BIN" scrub -p "$SCRUB_PERCENT" -o "$SCRUB_AGE" -q
SCRUB_RC=$?
JOBS_DONE="${JOBS_DONE} + SCRUB"
fi
else
# No SYNC needed, safe to scrub
run_cmd "SCRUB" "$SNAPRAID_BIN" scrub -p "$SCRUB_PERCENT" -o "$SCRUB_AGE" -q
SCRUB_RC=$?
JOBS_DONE="${JOBS_DONE} + SCRUB"
fi
fi
else
log "Scrub job is not enabled (SCRUB_PERCENT=0). Skipping SCRUB. [$(date)]"
fi
#####################################################################
# POSTPROCESSING
#####################################################################
section "##Postprocessing"
#
# SMART - Log disk SMART attributes
#
if (( SMART_LOG == 1 )); then
run_cmd "SMART" "$SNAPRAID_BIN" smart
SMART_RC=$?
JOBS_DONE="${JOBS_DONE} + SMART"
fi
#
# DOWN - Spindown array disks
#
if (( SPINDOWN_DISKS == 1 )); then
run_cmd "DOWN" "$SNAPRAID_BIN" down
DOWN_RC=$?
JOBS_DONE="${JOBS_DONE} + DOWN"
else
log "Spindown disabled (SPINDOWN_DISKS=0). Skipping \`snapraid down\`."
DOWN_RC=0
fi
# Restore paused services
restore_services
log "All jobs completed. [$(date)]"
#####################################################################
# REPORTING
#####################################################################
# Save full log to disk
persist_full_log
# Prepare and send email if configured
if [[ -n "${EMAIL_ADDRESS:-}" ]]; then
prepare_mail_subject
summarize_diff_for_email
beautify_email_output
send_mail
fi
# Send healthcheck ping
if [[ "${SUBJECT:-}" == *"[WARNING]"* || ${HAD_FAILURE:-0} -eq 1 || ${CHK_FAIL:-0} -eq 1 ]]; then
hc_finish_fail 1
else
hc_finish_success
fi
exit 0
}
# Execute main function
main "$@"