Compare commits

..

4 commits

2 changed files with 69 additions and 38 deletions

View file

@ -1,5 +1,4 @@
Nagios plugin to check Postgres Streaming replication
=====================================================
# Nagios plugin to check Postgres Streaming replication
This script could be used as Nagios check plugin to verify Postgres Streaming replication state.
@ -13,7 +12,7 @@ This script :
- retreive master connection informations from Postgres recovery configuration file (_UNKNOWN_ raise on error). Default Postgres master TCP port will be used if port is not specify.
- retreive the current state and sync state of the host from Postgres master server by making a connection on master server (_UNKNOWN_ raise on error).
- check if the current state of the host is "streaming" (_CRITICAL_ raise if not)
- check if the current sync state of the host is "sync" (_CRITICAL_ raise if not)
- check if the current sync state of the host is "sync" (or the state specified using `-e` parameter, _CRITICAL_ raise if not)
- if the check of the current XLOG file of the master host is enabled :
- retreive current _xlog_ file from Postgres master server by making a connection on master server (_UNKNOWN_ raise on error).
- check if the current master _xlog_ file is the last received _xlog_ file (_CRITICAL_ raise if not)
@ -24,22 +23,31 @@ This script :
- check if stand-by client(s) is connected (_WARNING_ raise if not)
- Return _OK_ state with list and count of stand-by client(s)
**Note :** This script was originally write for PostgreSQL 9.1 and test on 9.1, 9.5 and 9.6 but it could be compatible with other versions of PostgreSQL. Some adjustments have been made for PostgreSQL >= 10 (without testing it). Do not hesitate to tell me how this script work with other versions and share some fix. All contributions are welcome !
**Note :** This script was originally write for PostgreSQL 9.1 and test on 9.1, 9.5, 9.6, 11, 13 and 15. Do not hesitate to tell me how this script work with other versions and share some fix. All contributions are welcome !
Requirements
------------
## Requirements
* Some CLI tools: `sudo`, `awk`, `sed`, `bc`, `psql` and `pg_lscluster`
* **On master node:** Slaves must be able to connect with user from `recovery.conf` (or user specify using `-U`) to database with the same name (or another specified with `-D`) as `trust` (or via `md5` using password specified in `~/.pgpass`). This user must have `SUPERUSER` privilege (need to get replication details).
* **On master node:** Slaves must be able to connect with user from `recovery.conf` / `postgresql.auto.conf` (or user specify using `-U`) to database with the same name (or another specified with `-D`) as `trust` (or using password specified in `~/.pgpass`). This user must have `SUPERUSER` privilege (need to get replication details).
* **On standby node:** `PG_USER` must be able to connect localy on the database with the same name `(or another specified with -D)` as `trust` (or via `md5` using password specified in `~/.pgpass`).
* **On standby node:** `PG_USER` must be able to connect localy on the database with the same name `(or another specified with -D)` as `trust` (or using password specified in `~/.pgpass`).
Usage
-----
## Installation
```
Usage: check_pg_streaming_replication [-d] [-h] [options]
apt install sudo awk sed bc postgresql-client
git clone https://gitea.zionetrix.net/bn8/check_pg_streaming_replication.git \
/usr/local/src/check_pg_streaming_replication
mkdir -p /usr/local/lib/nagios/plugins
ln -s /usr/local/src/check_pg_streaming_replication/check_pg_streaming_replication \
/usr/local/lib/nagios/plugins/check_pg_streaming_replication
```
## Usage
```
Usage: ./check_pg_streaming_replication [-d] [-h] [options]
-u pg_user Specify local Postgres user (Default: try to auto-detect or use postgres)
-b psql_bin Specify psql binary path (Default: /usr/bin/psql)
-B pg_lsclusters_bin Specify pg_lsclusters binary path (Default: /usr/bin/pg_lsclusters)
@ -47,7 +55,7 @@ Usage: check_pg_streaming_replication [-d] [-h] [options]
-m pg_main Specify Postgres main directory path (Default: try to auto-detect or use
/var/lib/postgresql//main)
-r recovery_conf Specify Postgres recovery configuration file path
(Default: [PG_MAIN]/recovery.conf)
(Default: [PG_MAIN]/recovery.conf for PG <= 11, [PG_MAIN]/postgresql.auto.conf for PG >= 12)
-U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file)
-p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL
port if detected or use 5432)
@ -57,17 +65,15 @@ Usage: check_pg_streaming_replication [-d] [-h] [options]
of the last received LSN (Default: 1)
-w replay_warn_delay Specify the replay warning delay in second (Default: 3)
-c replay_crit_delay Specify the replay critical delay in second (Default: 5)
-e expected_sync_state The expected replication state ('sync' or 'async', default: sync)
-d Debug mode
-h Show this message
```
Copyright
---------
## Copyright
Copyright (c) 2014-2020 Benjamin Renard
Copyright (c) 2014-2024 Benjamin Renard
License
-------
## License
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

View file

@ -8,19 +8,19 @@
#
# Some CLI tools: sudo, awk, sed, bc, psql and pg_lscluster
#
# On master node: Slaves must be able to connect with user from recovery.conf
# (or user specify using -U) to database with the same name
# (or another specified with -D) as trust (or via md5 using
# password specified in ~/.pgpass). This user must have
# SUPERUSER privilege (need to get replication details).
# On master node: Slave nodes must be able to connect with user from recovery.conf /
# `postgresql.auto.conf` (or user specify using -U) to database with the same
# name (or another specified with -D) as trust (or using password specified in
# ~/.pgpass). This user must have SUPERUSER privilege (need to get replication
# details).
#
# On standby node: PG_USER must be able to connect localy on the database
# with the same name (or another specified with -D) as trust
# (or via md5 using password specified in ~/.pgpass).
# On standby node: PG_USER must be able to connect localy on the database with the same name
# (or another specified with -D) as trust (or using password specified in
# ~/.pgpass).
#
# Author: Benjamin Renard <brenard@easter-eggs.com>
# Date: Wed, 04 Nov 2020 15:31:13 +0100
# Source: https://gogs.zionetrix.net/bn8/check_pg_streaming_replication
# Date: Mon, 03 Jun 2024 15:31:29 +0200
# Source: https://gitea.zionetrix.net/bn8/check_pg_streaming_replication
# SPDX-License-Identifier: GPL-3.0-or-later
#
@ -34,7 +34,6 @@ PG_MAIN=""
PG_MASTER_USER=""
PSQL_BIN=/usr/bin/psql
PG_LSCLUSTER_BIN=/usr/bin/pg_lsclusters
RECOVERY_CONF_FILENAME=recovery.conf
RECOVERY_CONF=""
PG_DEFAULT_PORT=""
PG_DEFAULT_APP_NAME=$( hostname )
@ -42,10 +41,13 @@ PG_DB=""
CHECK_CUR_MASTER_LSN=1
REPLAY_WARNING_DELAY=3
REPLAY_CRITICAL_DELAY=5
EXPECTED_SYNC_STATE=sync
DEBUG=0
function usage () {
ERROR="$1"
[ -n "$ERROR" ] && echo -e "$ERROR\n"
cat << EOF
Usage: $0 [-d] [-h] [options]
-u pg_user Specify local Postgres user (Default: try to auto-detect or use $DEFAULT_PG_USER)
@ -55,7 +57,7 @@ Usage: $0 [-d] [-h] [options]
-m pg_main Specify Postgres main directory path (Default: try to auto-detect or use
$DEFAULT_PG_MAIN)
-r recovery_conf Specify Postgres recovery configuration file path
(Default: [PG_MAIN]/$RECOVERY_CONF_FILENAME)
(Default: [PG_MAIN]/recovery.conf on PG <= 11, [PG_MAIN]/postgresql.auto.conf on PG >= 12)
-U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file)
-p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL
port if detected or use $DEFAULT_PG_PORT)
@ -65,13 +67,14 @@ Usage: $0 [-d] [-h] [options]
of the last received LSN (Default: $CHECK_CUR_MASTER_LSN)
-w replay_warn_delay Specify the replay warning delay in second (Default: $REPLAY_WARNING_DELAY)
-c replay_crit_delay Specify the replay critical delay in second (Default: $REPLAY_CRITICAL_DELAY)
-e expected_sync_state The expected replication state ('sync' or 'async', default: $EXPECTED_SYNC_STATE)
-d Debug mode
-h Show this message
EOF
exit 0
[ -n "$ERROR" ] && exit 1 || exit 0
}
while getopts "hu:b:B:V:m:r:U:p:D:C:w:c:d" OPTION
while getopts "hu:b:B:V:m:r:U:p:D:C:w:c:e:d" OPTION
do
case $OPTION in
u)
@ -110,6 +113,11 @@ do
c)
REPLAY_CRITICAL_DELAY=$OPTARG
;;
e)
[ "$OPTARG" != "sync" -a "$OPTARG" != "async" ] && \
usage "Invalid expected replication state '$OPTARG'. Possible values: sync or async."
EXPECTED_SYNC_STATE=$OPTARG
;;
d)
DEBUG=1
;;
@ -180,7 +188,10 @@ id "$PG_USER" > /dev/null 2>&1
[ ! -d "$PG_MAIN/" ] && echo "UNKNOWN: Invalid Postgres main directory path ($PG_MAIN)" && exit 3
# Check RECOVERY_CONF
[ -z "$RECOVERY_CONF" ] && RECOVERY_CONF="$PG_MAIN/$RECOVERY_CONF_FILENAME"
if [ -z "$RECOVERY_CONF" ]; then
[ $PG_VERSION -le 11 ] && RECOVERY_CONF_FILENAME="recovery.conf" || RECOVERY_CONF_FILENAME="postgresql.auto.conf"
RECOVERY_CONF="$PG_MAIN/$RECOVERY_CONF_FILENAME"
fi
# Check PG_DEFAULT_PORT
[ $( echo "$PG_DEFAULT_PORT"|grep -c -E '^[0-9]*$' ) -ne 1 ] && "UNKNOWN: Postgres default master TCP port must be an integer." && exit 3
@ -313,8 +324,22 @@ then
M_APP_NAME=$( echo "$MASTER_CONN_INFOS"| grep 'application_name=' | sed "s/^.*application_name=[ \'\"]*\([^ \'\"]\+\)[ \'\"]*.*$/\1/" )
if [ ! -n "$M_APP_NAME" ]
then
debug "Master application name not specified, use default: $PG_DEFAULT_APP_NAME"
M_APP_NAME=$PG_DEFAULT_APP_NAME
if [ $PG_VERSION -ge 12 ]
then
debug "Master application name not specified, use cluster_name if defined"
CLUSTER_NAME=$( psql_get "SELECT current_setting('cluster_name')" )
debug "Cluster name: $CLUSTER_NAME"
if [ -n "$CLUSTER_NAME" ]
then
M_APP_NAME=$CLUSTER_NAME
else
debug "Cluster name not defined, use default: $PG_DEFAULT_APP_NAME"
M_APP_NAME=$PG_DEFAULT_APP_NAME
fi
else
debug "Master application name not specified, use default: $PG_DEFAULT_APP_NAME"
M_APP_NAME=$PG_DEFAULT_APP_NAME
fi
else
debug "Master application name: $M_APP_NAME"
fi
@ -338,9 +363,9 @@ then
M_CUR_SYNC_STATE=$( echo "$M_CUR_REPL_STATE_INFO"|cut -d'|' -f2 )
debug "Master current sync state: $M_CUR_SYNC_STATE"
if [ "$M_CUR_SYNC_STATE" != "sync" ]
if [ "$M_CUR_SYNC_STATE" != "$EXPECTED_SYNC_STATE" ]
then
echo "CRITICAL: this host is not synchronized according to master host (current sync state = '$M_CUR_SYNC_STATE')"
echo "CRITICAL: unexpected replication state '$M_CUR_SYNC_STATE' (expected state = '$EXPECTED_SYNC_STATE')"
exit 2
fi