Compare commits

4 commits

2 changed files with 69 additions and 38 deletions

@ -1,5 +1,4 @@
Nagios plugin to check Postgres Streaming replication # Nagios plugin to check Postgres Streaming replication
=====================================================
This script could be used as Nagios check plugin to verify Postgres Streaming replication state. This script could be used as Nagios check plugin to verify Postgres Streaming replication state.
@ -13,7 +12,7 @@ This script :
- retrieve master connection information from the Postgres recovery configuration file (_UNKNOWN_ raised on error). The default Postgres master TCP port is used if no port is specified. - retrieve master connection information from the Postgres recovery configuration file (_UNKNOWN_ raised on error). The default Postgres master TCP port is used if no port is specified.
- retrieve the current state and sync state of the host from the Postgres master server by connecting to it (_UNKNOWN_ raised on error). - retrieve the current state and sync state of the host from the Postgres master server by connecting to it (_UNKNOWN_ raised on error).
- check if the current state of the host is "streaming" (_CRITICAL_ raised if not) - check if the current state of the host is "streaming" (_CRITICAL_ raised if not)
- check if the current sync state of the host is "sync" (_CRITICAL_ raised if not) - check if the current sync state of the host is "sync" (or the state specified using the `-e` parameter, _CRITICAL_ raised if not)
- if the check of the current XLOG file of the master host is enabled: - if the check of the current XLOG file of the master host is enabled:
- retrieve the current _xlog_ file from the Postgres master server by connecting to it (_UNKNOWN_ raised on error). - retrieve the current _xlog_ file from the Postgres master server by connecting to it (_UNKNOWN_ raised on error).
- check if the current master _xlog_ file is the last received _xlog_ file (_CRITICAL_ raised if not) - check if the current master _xlog_ file is the last received _xlog_ file (_CRITICAL_ raised if not)
@ -24,22 +23,31 @@ This script :
- check if stand-by client(s) are connected (_WARNING_ raised if not) - check if stand-by client(s) are connected (_WARNING_ raised if not)
- Return _OK_ state with list and count of stand-by client(s) - Return _OK_ state with list and count of stand-by client(s)
**Note:** This script was originally written for PostgreSQL 9.1 and tested on 9.1, 9.5 and 9.6, but it may be compatible with other versions of PostgreSQL. Some adjustments have been made for PostgreSQL >= 10 (untested). Do not hesitate to tell me how this script works with other versions and to share fixes. All contributions are welcome! **Note:** This script was originally written for PostgreSQL 9.1 and tested on 9.1, 9.5, 9.6, 11, 13 and 15. Do not hesitate to tell me how this script works with other versions and to share fixes. All contributions are welcome!
Requirements ## Requirements
------------
* Some CLI tools: `sudo`, `awk`, `sed`, `bc`, `psql` and `pg_lsclusters` * Some CLI tools: `sudo`, `awk`, `sed`, `bc`, `psql` and `pg_lsclusters`
* **On master node:** Slaves must be able to connect with the user from `recovery.conf` (or the user specified using `-U`) to the database with the same name (or another specified with `-D`) as `trust` (or via `md5` using a password specified in `~/.pgpass`). This user must have the `SUPERUSER` privilege (needed to get replication details). * **On master node:** Slaves must be able to connect with the user from `recovery.conf` / `postgresql.auto.conf` (or the user specified using `-U`) to the database with the same name (or another specified with `-D`) as `trust` (or using a password specified in `~/.pgpass`). This user must have the `SUPERUSER` privilege (needed to get replication details).
* **On standby node:** `PG_USER` must be able to connect locally to the database with the same name `(or another specified with -D)` as `trust` (or via `md5` using a password specified in `~/.pgpass`). * **On standby node:** `PG_USER` must be able to connect locally to the database with the same name `(or another specified with -D)` as `trust` (or using a password specified in `~/.pgpass`).
Usage ## Installation
-----
``` ```
Usage: check_pg_streaming_replication [-d] [-h] [options] apt install sudo awk sed bc postgresql-client
git clone https://gitea.zionetrix.net/bn8/check_pg_streaming_replication.git \
/usr/local/src/check_pg_streaming_replication
mkdir -p /usr/local/lib/nagios/plugins
ln -s /usr/local/src/check_pg_streaming_replication/check_pg_streaming_replication \
/usr/local/lib/nagios/plugins/check_pg_streaming_replication
```
## Usage
```
Usage: ./check_pg_streaming_replication [-d] [-h] [options]
-u pg_user Specify local Postgres user (Default: try to auto-detect or use postgres) -u pg_user Specify local Postgres user (Default: try to auto-detect or use postgres)
-b psql_bin Specify psql binary path (Default: /usr/bin/psql) -b psql_bin Specify psql binary path (Default: /usr/bin/psql)
-B pg_lsclusters_bin Specify pg_lsclusters binary path (Default: /usr/bin/pg_lsclusters) -B pg_lsclusters_bin Specify pg_lsclusters binary path (Default: /usr/bin/pg_lsclusters)
@ -47,7 +55,7 @@ Usage: check_pg_streaming_replication [-d] [-h] [options]
-m pg_main Specify Postgres main directory path (Default: try to auto-detect or use -m pg_main Specify Postgres main directory path (Default: try to auto-detect or use
/var/lib/postgresql//main) /var/lib/postgresql//main)
-r recovery_conf Specify Postgres recovery configuration file path -r recovery_conf Specify Postgres recovery configuration file path
(Default: [PG_MAIN]/recovery.conf) (Default: [PG_MAIN]/recovery.conf for PG <= 11, [PG_MAIN]/postgresql.auto.conf for PG >= 12)
-U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file) -U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file)
-p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL -p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL
port if detected or use 5432) port if detected or use 5432)
@ -57,17 +65,15 @@ Usage: check_pg_streaming_replication [-d] [-h] [options]
of the last received LSN (Default: 1) of the last received LSN (Default: 1)
-w replay_warn_delay Specify the replay warning delay in seconds (Default: 3) -w replay_warn_delay Specify the replay warning delay in seconds (Default: 3)
-c replay_crit_delay Specify the replay critical delay in seconds (Default: 5) -c replay_crit_delay Specify the replay critical delay in seconds (Default: 5)
-e expected_sync_state The expected replication state ('sync' or 'async', default: sync)
-d Debug mode -d Debug mode
-h Show this message
``` ```
Copyright ## Copyright
---------
Copyright (c) 2014-2020 Benjamin Renard Copyright (c) 2014-2024 Benjamin Renard
License ## License
-------
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version. This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

@ -8,19 +8,19 @@
# #
# Some CLI tools: sudo, awk, sed, bc, psql and pg_lsclusters # Some CLI tools: sudo, awk, sed, bc, psql and pg_lsclusters
# #
# On master node: Slaves must be able to connect with user from recovery.conf # On master node: Slave nodes must be able to connect with user from recovery.conf /
# (or user specified using -U) to database with the same name # `postgresql.auto.conf` (or user specified using -U) to database with the same
# (or another specified with -D) as trust (or via md5 using # name (or another specified with -D) as trust (or using password specified in
# password specified in ~/.pgpass). This user must have # ~/.pgpass). This user must have SUPERUSER privilege (need to get replication
# SUPERUSER privilege (need to get replication details). # details).
# #
# On standby node: PG_USER must be able to connect locally on the database # On standby node: PG_USER must be able to connect locally on the database with the same name
# with the same name (or another specified with -D) as trust # (or another specified with -D) as trust (or using password specified in
# (or via md5 using password specified in ~/.pgpass). # ~/.pgpass).
# #
# Author: Benjamin Renard <brenard@easter-eggs.com> # Author: Benjamin Renard <brenard@easter-eggs.com>
# Date: Wed, 04 Nov 2020 15:31:13 +0100 # Date: Mon, 03 Jun 2024 15:31:29 +0200
# Source: https://gogs.zionetrix.net/bn8/check_pg_streaming_replication # Source: https://gitea.zionetrix.net/bn8/check_pg_streaming_replication
# SPDX-License-Identifier: GPL-3.0-or-later # SPDX-License-Identifier: GPL-3.0-or-later
# #
@ -34,7 +34,6 @@ PG_MAIN=""
PG_MASTER_USER="" PG_MASTER_USER=""
PSQL_BIN=/usr/bin/psql PSQL_BIN=/usr/bin/psql
PG_LSCLUSTER_BIN=/usr/bin/pg_lsclusters PG_LSCLUSTER_BIN=/usr/bin/pg_lsclusters
RECOVERY_CONF_FILENAME=recovery.conf
RECOVERY_CONF="" RECOVERY_CONF=""
PG_DEFAULT_PORT="" PG_DEFAULT_PORT=""
PG_DEFAULT_APP_NAME=$( hostname ) PG_DEFAULT_APP_NAME=$( hostname )
@ -42,10 +41,13 @@ PG_DB=""
CHECK_CUR_MASTER_LSN=1 CHECK_CUR_MASTER_LSN=1
REPLAY_WARNING_DELAY=3 REPLAY_WARNING_DELAY=3
REPLAY_CRITICAL_DELAY=5 REPLAY_CRITICAL_DELAY=5
EXPECTED_SYNC_STATE=sync
DEBUG=0 DEBUG=0
function usage () { function usage () {
ERROR="$1"
[ -n "$ERROR" ] && echo -e "$ERROR\n"
cat << EOF cat << EOF
Usage: $0 [-d] [-h] [options] Usage: $0 [-d] [-h] [options]
-u pg_user Specify local Postgres user (Default: try to auto-detect or use $DEFAULT_PG_USER) -u pg_user Specify local Postgres user (Default: try to auto-detect or use $DEFAULT_PG_USER)
@ -55,7 +57,7 @@ Usage: $0 [-d] [-h] [options]
-m pg_main Specify Postgres main directory path (Default: try to auto-detect or use -m pg_main Specify Postgres main directory path (Default: try to auto-detect or use
$DEFAULT_PG_MAIN) $DEFAULT_PG_MAIN)
-r recovery_conf Specify Postgres recovery configuration file path -r recovery_conf Specify Postgres recovery configuration file path
(Default: [PG_MAIN]/$RECOVERY_CONF_FILENAME) (Default: [PG_MAIN]/recovery.conf on PG <= 11, [PG_MAIN]/postgresql.auto.conf on PG >= 12)
-U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file) -U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file)
-p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL -p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL
port if detected or use $DEFAULT_PG_PORT) port if detected or use $DEFAULT_PG_PORT)
@ -65,13 +67,14 @@ Usage: $0 [-d] [-h] [options]
of the last received LSN (Default: $CHECK_CUR_MASTER_LSN) of the last received LSN (Default: $CHECK_CUR_MASTER_LSN)
-w replay_warn_delay Specify the replay warning delay in seconds (Default: $REPLAY_WARNING_DELAY) -w replay_warn_delay Specify the replay warning delay in seconds (Default: $REPLAY_WARNING_DELAY)
-c replay_crit_delay Specify the replay critical delay in seconds (Default: $REPLAY_CRITICAL_DELAY) -c replay_crit_delay Specify the replay critical delay in seconds (Default: $REPLAY_CRITICAL_DELAY)
-e expected_sync_state The expected replication state ('sync' or 'async', default: $EXPECTED_SYNC_STATE)
-d Debug mode -d Debug mode
-h Show this message -h Show this message
EOF EOF
exit 0 [ -n "$ERROR" ] && exit 1 || exit 0
} }
while getopts "hu:b:B:V:m:r:U:p:D:C:w:c:d" OPTION while getopts "hu:b:B:V:m:r:U:p:D:C:w:c:e:d" OPTION
do do
case $OPTION in case $OPTION in
u) u)
@ -110,6 +113,11 @@ do
c) c)
REPLAY_CRITICAL_DELAY=$OPTARG REPLAY_CRITICAL_DELAY=$OPTARG
;; ;;
e)
[ "$OPTARG" != "sync" -a "$OPTARG" != "async" ] && \
usage "Invalid expected replication state '$OPTARG'. Possible values: sync or async."
EXPECTED_SYNC_STATE=$OPTARG
;;
d) d)
DEBUG=1 DEBUG=1
;; ;;
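
The new `-e` branch above rejects any value other than `sync` or `async` before storing it in `EXPECTED_SYNC_STATE`. A minimal sketch of the same validation as a standalone function might look like this; the name `is_valid_sync_state` is hypothetical and does not exist in the script. PostgreSQL's `pg_stat_replication.sync_state` can also report `potential` and `quorum`, which this sketch, like the diff, treats as invalid.

```shell
# Hypothetical helper (not part of the script): succeed only for the two
# values that the -e option accepts in this change.
is_valid_sync_state() {
    case "$1" in
        sync|async) return 0 ;;
        *) return 1 ;;
    esac
}

# Illustrative loop over a few candidate values.
for state in sync async potential; do
    if is_valid_sync_state "$state"; then
        echo "$state: accepted"
    else
        echo "$state: rejected"
    fi
done
```

A `case` pattern avoids the `-a` operator of `[`, which POSIX marks as obsolescent, while keeping the same accept/reject behaviour.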
@ -180,7 +188,10 @@ id "$PG_USER" > /dev/null 2>&1
[ ! -d "$PG_MAIN/" ] && echo "UNKNOWN: Invalid Postgres main directory path ($PG_MAIN)" && exit 3 [ ! -d "$PG_MAIN/" ] && echo "UNKNOWN: Invalid Postgres main directory path ($PG_MAIN)" && exit 3
# Check RECOVERY_CONF # Check RECOVERY_CONF
[ -z "$RECOVERY_CONF" ] && RECOVERY_CONF="$PG_MAIN/$RECOVERY_CONF_FILENAME" if [ -z "$RECOVERY_CONF" ]; then
[ $PG_VERSION -le 11 ] && RECOVERY_CONF_FILENAME="recovery.conf" || RECOVERY_CONF_FILENAME="postgresql.auto.conf"
RECOVERY_CONF="$PG_MAIN/$RECOVERY_CONF_FILENAME"
fi
# Check PG_DEFAULT_PORT # Check PG_DEFAULT_PORT
[ $( echo "$PG_DEFAULT_PORT"|grep -c -E '^[0-9]*$' ) -ne 1 ] && echo "UNKNOWN: Postgres default master TCP port must be an integer." && exit 3 [ $( echo "$PG_DEFAULT_PORT"|grep -c -E '^[0-9]*$' ) -ne 1 ] && echo "UNKNOWN: Postgres default master TCP port must be an integer." && exit 3
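
The hunk above picks the default recovery configuration file from the PostgreSQL version: `recovery.conf` up to version 11, `postgresql.auto.conf` from version 12. The following is a rough sketch of that selection as a function. The name `default_recovery_conf` is hypothetical, and stripping the minor part of a version such as `9.6` is an assumption of this sketch, since the diff does not show how `PG_VERSION` is computed.

```shell
# Hypothetical helper (not part of the script): print the default recovery
# configuration path.
#   $1 = PostgreSQL version, e.g. "9.6", "11" or "15"
#   $2 = Postgres main directory
default_recovery_conf() {
    # Assumption: keep only the part before the first dot so that the
    # integer comparison below also works for "9.6"-style versions.
    major=${1%%.*}
    if [ "$major" -le 11 ]; then
        echo "$2/recovery.conf"
    else
        echo "$2/postgresql.auto.conf"
    fi
}

default_recovery_conf 9.6 /var/lib/postgresql/9.6/main
default_recovery_conf 15 /var/lib/postgresql/15/main
```

Quoting the operands of `[` keeps the test well-formed even when a variable is empty, which the unquoted form in the diff does not guarantee.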
@ -253,7 +264,7 @@ RECOVERY_MODE=0
if [ -f $RECOVERY_CONF ] if [ -f $RECOVERY_CONF ]
then then
debug "File recovery.conf found. Hot-standby mode." debug "File recovery.conf found. Hot-standby mode."
# Check recovery mode # Check recovery mode
if [ $RECOVERY_MODE -ne 1 ] if [ $RECOVERY_MODE -ne 1 ]
then then
@ -313,8 +324,22 @@ then
M_APP_NAME=$( echo "$MASTER_CONN_INFOS"| grep 'application_name=' | sed "s/^.*application_name=[ \'\"]*\([^ \'\"]\+\)[ \'\"]*.*$/\1/" ) M_APP_NAME=$( echo "$MASTER_CONN_INFOS"| grep 'application_name=' | sed "s/^.*application_name=[ \'\"]*\([^ \'\"]\+\)[ \'\"]*.*$/\1/" )
if [ ! -n "$M_APP_NAME" ] if [ ! -n "$M_APP_NAME" ]
then then
debug "Master application name not specified, use default: $PG_DEFAULT_APP_NAME" if [ $PG_VERSION -ge 12 ]
M_APP_NAME=$PG_DEFAULT_APP_NAME then
debug "Master application name not specified, use cluster_name if defined"
CLUSTER_NAME=$( psql_get "SELECT current_setting('cluster_name')" )
debug "Cluster name: $CLUSTER_NAME"
if [ -n "$CLUSTER_NAME" ]
then
M_APP_NAME=$CLUSTER_NAME
else
debug "Cluster name not defined, use default: $PG_DEFAULT_APP_NAME"
M_APP_NAME=$PG_DEFAULT_APP_NAME
fi
else
debug "Master application name not specified, use default: $PG_DEFAULT_APP_NAME"
M_APP_NAME=$PG_DEFAULT_APP_NAME
fi
else else
debug "Master application name: $M_APP_NAME" debug "Master application name: $M_APP_NAME"
fi fi
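
The hunk above changes how the application name is chosen when the connection string does not set `application_name`: on PostgreSQL >= 12 the script now tries `cluster_name` before falling back to the hostname. As far as I know, this matches the server's own behaviour since version 12, where the standby connection uses `cluster_name` as its default application name. The order of precedence can be sketched as below; `pick_app_name` and its positional parameters are hypothetical, and the version is assumed to be an integer major version.

```shell
# Hypothetical helper (not part of the script): choose the application name
# to look for in pg_stat_replication on the master.
#   $1 = PostgreSQL major version (integer)
#   $2 = application_name found in the connection string (may be empty)
#   $3 = cluster_name setting (may be empty)
#   $4 = default name (the script uses the output of hostname)
pick_app_name() {
    if [ -n "$2" ]; then
        echo "$2"                 # explicit application_name wins
    elif [ "$1" -ge 12 ] && [ -n "$3" ]; then
        echo "$3"                 # PG >= 12: fall back to cluster_name
    else
        echo "$4"                 # otherwise use the default
    fi
}

pick_app_name 15 "" "pg-cluster-a" "standby1"
pick_app_name 11 "" "pg-cluster-a" "standby1"
```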
@ -338,9 +363,9 @@ then
M_CUR_SYNC_STATE=$( echo "$M_CUR_REPL_STATE_INFO"|cut -d'|' -f2 ) M_CUR_SYNC_STATE=$( echo "$M_CUR_REPL_STATE_INFO"|cut -d'|' -f2 )
debug "Master current sync state: $M_CUR_SYNC_STATE" debug "Master current sync state: $M_CUR_SYNC_STATE"
if [ "$M_CUR_SYNC_STATE" != "sync" ] if [ "$M_CUR_SYNC_STATE" != "$EXPECTED_SYNC_STATE" ]
then then
echo "CRITICAL: this host is not synchronized according to master host (current sync state = '$M_CUR_SYNC_STATE')" echo "CRITICAL: unexpected replication state '$M_CUR_SYNC_STATE' (expected state = '$EXPECTED_SYNC_STATE')"
exit 2 exit 2
fi fi
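
The last hunk replaces the hard-coded `sync` comparison with `$EXPECTED_SYNC_STATE`. A minimal sketch of that check as a function returning a Nagios-style status code could look like this. The function name `check_sync_state` and the OK message are illustrative assumptions; only the CRITICAL message and exit code 2 come from the diff.

```shell
# Hypothetical helper (not part of the script): compare the sync state
# reported by the master with the expected one.
#   $1 = current sync state, $2 = expected sync state
# Returns 0 (OK) when they match, 2 (CRITICAL) otherwise.
check_sync_state() {
    if [ "$1" != "$2" ]; then
        echo "CRITICAL: unexpected replication state '$1' (expected state = '$2')"
        return 2
    fi
    # Assumption: this OK line is illustrative, the real script continues
    # with further checks instead of returning here.
    echo "OK: replication state is '$1'"
    return 0
}

check_sync_state async async
check_sync_state async sync || echo "exit code: $?"
```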