Compare commits

22 commits

Author SHA1 Message Date
Benjamin Renard d33476ce8d Update comment and README.md file 2020-11-04 19:19:22 +01:00
Benjamin Renard 0443f56b1d Improve LSN master/slave checks 2020-11-04 16:23:50 +01:00
Benjamin Renard d4cbdb3c79 Remove spaces before colon (:) 2020-11-04 15:41:06 +01:00
Benjamin Renard c8e6c80dc0 Update comment and README.md file 2020-11-04 15:37:20 +01:00
Benjamin Renard ea72d09399 Adjust recovery control function names for PG > 10 2020-11-04 15:16:41 +01:00
Benjamin Renard 50c3363cbc Improve master node output message 2020-11-04 15:04:57 +01:00
Benjamin Renard dc07eccd09 Improve PostgreSQL installation parameters auto-detection 2020-11-04 15:01:59 +01:00
Benjamin Renard b092186b89 Make the check more adjustable to allow some delay between xlog files sent by master, received and replayed 2020-02-20 11:32:58 +01:00
Benjamin Renard 327f382b30 Improve README file 2019-03-15 17:21:18 +01:00
Benjamin Renard 914e33d335 Use sudo instead of su command to run command as postgres user 2019-03-15 17:15:22 +01:00
Benjamin Renard 950e21be0e Improve README and convert to markdown 2019-03-15 16:35:31 +01:00
Benjamin Renard 54b9b79023 Improve debug mode about master SQL request 2019-03-15 16:30:20 +01:00
Benjamin Renard f883216050 Improve master connection informations detection reliability 2019-03-15 16:29:07 +01:00
Benjamin Renard be30219100 Improve RECOVERY_MODE detection reliability 2019-03-15 16:28:39 +01:00
Benjamin Renard 99bacd9979 Improve debug mode 2019-03-15 16:27:42 +01:00
Benjamin Renard a33e0bfabc Fix handling -D parameter 2019-03-15 16:26:05 +01:00
Benjamin Renard 40ecc78570 Wording 2019-03-15 16:23:37 +01:00
Benjamin Renard 690811ccfb Add -U parameter to provide user to use on master host 2018-01-30 17:45:15 +01:00
Benjamin Renard 4efda74bf6 Improve help message and docs 2017-08-25 16:19:43 +02:00
Benjamin Renard 3966aac8c1 Try to auto-detect PG_MAIN directory. 2017-08-25 16:19:15 +02:00
Benjamin Renard d8a4cfac51 Use PG_DB also on slave and use PG_USER as default value. 2017-08-25 16:18:51 +02:00
Benjamin Renard 22bbb4223b Add -D parameter and retreive master user in recovery.conf file 2017-08-25 15:12:05 +02:00
3 changed files with 388 additions and 150 deletions

74
README

@ -1,74 +0,0 @@
Nagios plugin to check Postgres Streaming replication
=====================================================
This script could be used as Nagios check plugin to verify Postgres Streaming
replication state.
This script :
- check if Postgres is running (CRITICAL raise if not)
- check if Postgres is in recovery mode :
- if Postgres is in recovery mode :
- retreive from Postgres the last xlog file receive and the xlog file replay
- check if Postgres recovery configuration file is NOT present (CRITICAL
raise if present)
- retreive master connection informations from Postgres recovery configuration
file (UNKNOWN raise on error). Default Postgres master TCP port will be used
if port is not specify.
- retreive current xlog file from Postgres master server (UNKNOWN raise on error).
- check if the last receive xlog file is the last replay xlog file (WARNING raise if not)
- Return OK state
- if Postgres is not in recovery mode :
- check if Postgres recovery configuration file is present (CRITICAL raise if present)
- check if stand-by client(s) is connected (WARNING raise if not)
- Return OK state with list and count of stand-by client(s)
Note : This script was originally write and test for PostgreSQL 9.1, but it could be compatible
with other versions of PostgreSQL. Do not hesitate to tell me how this script work with other
versions and share some fix. All contributions are welcome !
Requirements
------------
* On master node :
Slaves node must be able to connect with user PG_USER to database postgres as trust.
* On standby node :
PG_USER must be able to connect localy as trust
Usage
-----
Usage : ./check_pg_streaming_replication [-h] [-d] [options]
-u pg_user Specify Postgres user (Default : postgres)
-b psql_bin Specify psql binary path (Default : /usr/bin/psql)
-m pg_main Specify Postgres main directory path
(Default : /var/lib/postgresql/9.1/main)
-r recovery_conf Specify Postgres recovery configuration file path
(Default : /var/lib/postgresql/9.1/main/recovery.conf)
-p pg_port Specify default Postgres master TCP port (Default : 5432)
-d Debug mode
-h Show this message
Copyright
---------
Copyright (c) 2013 Benjamin Renard
License
-------
This program is free software; you can redistribute it and/or
modify it under the terms of the GNU General Public License version 2
as published by the Free Software Foundation.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

76
README.md Normal file

@ -0,0 +1,76 @@
Nagios plugin to check Postgres Streaming replication
=====================================================
This script can be used as a Nagios check plugin to verify the state of Postgres streaming replication.
This script:
- checks that Postgres is running (raises _CRITICAL_ if not)
- checks whether Postgres is in recovery mode:
  - if Postgres is in recovery mode:
    - retrieves from Postgres the last received _xlog_ file and the last replayed _xlog_ file
    - checks that the Postgres recovery configuration file is present (raises _CRITICAL_ if not)
    - retrieves the master connection information from the Postgres recovery configuration file (raises _UNKNOWN_ on error). The default Postgres master TCP port is used if no port is specified.
    - retrieves the current state and sync state of this host from the Postgres master server by connecting to it (raises _UNKNOWN_ on error)
    - checks that the current state of this host is "streaming" (raises _CRITICAL_ if not)
    - checks that the current sync state of this host is "sync" (raises _CRITICAL_ if not)
    - if the check of the master's current _xlog_ file is enabled:
      - retrieves the current _xlog_ file from the Postgres master server by connecting to it (raises _UNKNOWN_ on error)
      - checks that the current master _xlog_ file is the last received _xlog_ file (raises _CRITICAL_ if not)
    - checks whether the last received _xlog_ file is the last replayed _xlog_ file: if not, compares the current delay since the last replayed transaction against the _replay_warn_delay_ and _replay_crit_delay_ thresholds and raises the corresponding state if they are exceeded
    - returns the _OK_ state
  - if Postgres is not in recovery mode:
    - checks that the Postgres recovery configuration file is NOT present (raises _CRITICAL_ if present)
    - checks that at least one stand-by client is connected (raises _WARNING_ if not)
    - returns the _OK_ state with the list and count of stand-by client(s)
**Note:** This script was originally written for PostgreSQL 9.1 and tested on 9.1, 9.5 and 9.6, but it may be compatible with other versions of PostgreSQL. Some adjustments have been made for PostgreSQL >= 10 (without testing). Do not hesitate to tell me how this script works with other versions and to share fixes. All contributions are welcome!
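The replay-delay step of the list above can be sketched as follows. This is only an illustration, not the plugin's own code: the helper name `classify_replay_delay` is hypothetical, and `awk` is used here for the fractional comparison where the script itself calls `bc`. The thresholds correspond to the `-w` and `-c` options, and the numbers printed are the usual Nagios exit codes.

```shell
# Hypothetical helper (not part of the plugin): map a replay delay in
# seconds (possibly fractional) to a Nagios state name and exit code.
# Assumption: awk stands in for the bc call used by the real script.
classify_replay_delay() {
    delay="$1"; warn="$2"; crit="$3"
    awk -v d="$delay" -v w="$warn" -v c="$crit" 'BEGIN {
        # Adding 0 forces a numeric comparison instead of a string one.
        if (d + 0 >= c + 0)      { print "CRITICAL 2" }
        else if (d + 0 >= w + 0) { print "WARNING 1" }
        else                     { print "OK 0" }
    }'
}

# Default thresholds from the usage text: warning at 3 s, critical at 5 s.
classify_replay_delay 0.42 3 5
classify_replay_delay 3.7 3 5
classify_replay_delay 12 3 5
```

The critical threshold is tested first so that a delay exceeding both thresholds is reported as critical rather than warning.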
Requirements
------------
* Some CLI tools: `sudo`, `awk`, `sed`, `bc`, `psql` and `pg_lsclusters`
* **On master node:** Slaves must be able to connect with the user from `recovery.conf` (or the user specified with `-U`) to the database with the same name (or another specified with `-D`) as `trust` (or via `md5` using the password stored in `~/.pgpass`). This user must have the `SUPERUSER` privilege (needed to get replication details).
* **On standby node:** `PG_USER` must be able to connect locally to the database with the same name (or another specified with `-D`) as `trust` (or via `md5` using the password stored in `~/.pgpass`).
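A quick way to verify the CLI tool requirement before deploying the check might look like the sketch below. The helper `missing_tools` is hypothetical and looks tools up in `PATH`, whereas the plugin itself defaults to absolute paths such as `/usr/bin/psql`, so treat a miss here as a hint rather than a definitive failure.

```shell
# Hypothetical helper (not part of the plugin): print each named tool
# that cannot be found in PATH.
missing_tools() {
    for tool in "$@"
    do
        command -v "$tool" > /dev/null 2>&1 || echo "$tool"
    done
}

# Tool list taken from the Requirements section.
MISSING=$( missing_tools sudo awk sed bc psql pg_lsclusters )
if [ -n "$MISSING" ]
then
    # Unquoted on purpose: joins the missing names on a single line.
    echo "UNKNOWN: missing required tool(s):" $MISSING
else
    echo "All required tools found"
fi
```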
Usage
-----
```
Usage: check_pg_streaming_replication [-d] [-h] [options]
-u pg_user Specify local Postgres user (Default: try to auto-detect or use postgres)
-b psql_bin Specify psql binary path (Default: /usr/bin/psql)
-B pg_lsclusters_bin Specify pg_lsclusters binary path (Default: /usr/bin/pg_lsclusters)
-V pg_version Specify Postgres version (Default: try to auto-detect or use 9.1)
-m pg_main Specify Postgres main directory path (Default: try to auto-detect or use
/var/lib/postgresql/9.1/main)
-r recovery_conf Specify Postgres recovery configuration file path
(Default: [PG_MAIN]/recovery.conf)
-U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file)
-p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL
port if detected or use 5432)
-D dbname Specify DB name on Postgres master/slave to connect to (Default: PG_USER, must
match the one in .pgpass if used)
-C 1/0 Enable or disable the check that the current LSN of the master host is the same
as the last received LSN (Default: 1)
-w replay_warn_delay Specify the replay warning delay in seconds (Default: 3)
-c replay_crit_delay Specify the replay critical delay in seconds (Default: 5)
-d Debug mode
-h Show this message
```
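The auto-detected defaults mentioned in the usage text come from the first line of `pg_lsclusters -h` output. A minimal sketch of that field extraction is shown below. The helper name `cluster_field` is hypothetical, and the sample line is the one quoted in the script's own comment.

```shell
# Hypothetical helper (not part of the plugin): print one
# whitespace-separated field of a pg_lsclusters output line.
# Fields used by the script: 1=version, 3=port, 5=owner, 6=data directory.
cluster_field() {
    echo "$1" | awk -v n="$2" '{ print $n }'
}

# Sample line copied from the comment in the script.
LINE="9.6 main 5432 online,recovery postgres /var/lib/postgresql/9.6/main /var/log/postgresql/postgresql-9.6-main.log"

echo "version: $( cluster_field "$LINE" 1 )"
echo "port:    $( cluster_field "$LINE" 3 )"
echo "user:    $( cluster_field "$LINE" 5 )"
echo "main:    $( cluster_field "$LINE" 6 )"
```

Any value given explicitly on the command line (`-V`, `-p`, `-u`, `-m`) takes precedence over the detected one.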
Copyright
---------
Copyright (c) 2014-2020 Benjamin Renard
License
-------
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <https://www.gnu.org/licenses/>.


@ -1,51 +1,77 @@
#!/bin/bash
#
# Nagios plugin to check PostgreSQL streaming replication state
#
#
# Can be used on master or on standby node
#
# Requirement :
# Requirements:
#
# On master node : Slaves must be able to connect with user PG_USER
# to database postgres as trust
# Some CLI tools: sudo, awk, sed, bc, psql and pg_lsclusters
#
# On standby node : PG_USER must be able to connect localy as trust
# On master node: Slaves must be able to connect with user from recovery.conf
# (or user specified using -U) to database with the same name
# (or another specified with -D) as trust (or via md5 using
# password specified in ~/.pgpass). This user must have
# SUPERUSER privilege (needed to get replication details).
#
# Author : Benjamin Renard <brenard@easter-eggs.com>
# Date : Wed, 14 Mar 2012 14:45:55 +0000
# Source : http://git.zionetrix.net/check_pg_streaming_replication
# On standby node: PG_USER must be able to connect locally on the database
# with the same name (or another specified with -D) as trust
# (or via md5 using password specified in ~/.pgpass).
#
# Author: Benjamin Renard <brenard@easter-eggs.com>
# Date: Wed, 04 Nov 2020 15:31:13 +0100
# Source: https://gogs.zionetrix.net/bn8/check_pg_streaming_replication
# SPDX-License-Identifier: GPL-3.0-or-later
#
PG_USER=postgres
DEFAULT_PG_USER=postgres
DEFAULT_PG_VERSION=9.1
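# PG_VERSION is still empty at this point, so build the default path from DEFAULT_PG_VERSION
DEFAULT_PG_MAIN=/var/lib/postgresql/$DEFAULT_PG_VERSION/main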
DEFAULT_PG_PORT=5432
PG_USER=""
PG_VERSION=""
PG_MAIN=""
PG_MASTER_USER=""
PSQL_BIN=/usr/bin/psql
PG_MAIN=/var/lib/postgresql/9.1/main
if [ -f /etc/redhat-release ]
then
PG_MAIN=/var/lib/pgsql/9.1/data
fi
PG_LSCLUSTER_BIN=/usr/bin/pg_lsclusters
RECOVERY_CONF_FILENAME=recovery.conf
RECOVERY_CONF=""
PG_DEFAULT_PORT=5432
PG_DEFAULT_PORT=""
PG_DEFAULT_APP_NAME=$( hostname )
PG_DB=""
CHECK_CUR_MASTER_LSN=1
REPLAY_WARNING_DELAY=3
REPLAY_CRITICAL_DELAY=5
DEBUG=0
function usage () {
cat << EOF
Usage : $0 [-d] [-h] [options]
-u pg_user Specify Postgres user (Default : $PG_USER)
-b psql_bin Specify psql binary path (Default : $PSQL_BIN)
-m pg_main Specify Postgres main directory path
(Default : $PG_MAIN)
Usage: $0 [-d] [-h] [options]
-u pg_user Specify local Postgres user (Default: try to auto-detect or use $DEFAULT_PG_USER)
-b psql_bin Specify psql binary path (Default: $PSQL_BIN)
-B pg_lsclusters_bin Specify pg_lsclusters binary path (Default: $PG_LSCLUSTER_BIN)
-V pg_version Specify Postgres version (Default: try to auto-detect or use $DEFAULT_PG_VERSION)
-m pg_main Specify Postgres main directory path (Default: try to auto-detect or use
$DEFAULT_PG_MAIN)
-r recovery_conf Specify Postgres recovery configuration file path
(Default : $PG_MAIN/$RECOVERY_CONF_FILENAME)
-p pg_port Specify default Postgres master TCP port (Default : $PG_DEFAULT_PORT)
(Default: [PG_MAIN]/$RECOVERY_CONF_FILENAME)
-U pg_master_user Specify Postgres user to use on master (Default: user from recovery.conf file)
-p pg_port Specify default Postgres master TCP port (Default: same as local PostgreSQL
port if detected or use $DEFAULT_PG_PORT)
-D dbname Specify DB name on Postgres master/slave to connect to (Default: PG_USER, must
match the one in .pgpass if used)
-C 1/0 Enable or disable the check that the current LSN of the master host is the same
as the last received LSN (Default: $CHECK_CUR_MASTER_LSN)
-w replay_warn_delay Specify the replay warning delay in seconds (Default: $REPLAY_WARNING_DELAY)
-c replay_crit_delay Specify the replay critical delay in seconds (Default: $REPLAY_CRITICAL_DELAY)
-d Debug mode
-h Show this message
EOF
exit 0
}
while getopts "hu:b:m:r:p:d" OPTION
while getopts "hu:b:B:V:m:r:U:p:D:C:w:c:d" OPTION
do
case $OPTION in
u)
@ -54,15 +80,36 @@ do
b)
PSQL_BIN=$OPTARG
;;
B)
PG_LSCLUSTER_BIN=$OPTARG
;;
V)
PG_VERSION=$OPTARG
;;
m)
PG_MAIN=$OPTARG
;;
r)
RECOVERY_CONF=$OPTARG
;;
U)
PG_MASTER_USER=$OPTARG
;;
p)
PG_DEFAULT_PORT=$OPTARG
;;
D)
PG_DB=$OPTARG
;;
C)
CHECK_CUR_MASTER_LSN=$OPTARG
;;
w)
REPLAY_WARNING_DELAY=$OPTARG
;;
c)
REPLAY_CRITICAL_DELAY=$OPTARG
;;
d)
DEBUG=1
;;
@ -75,40 +122,116 @@ do
esac
done
function debug() {
if [ $DEBUG -eq 1 ]
then
>&2 echo -e "[DEBUG] $1"
fi
}
debug "Starting options (before handling auto-detection/default values):
PG_VERSION = $PG_VERSION
PG_DB = $PG_DB
PG_USER = $PG_USER
PSQL_BIN = $PSQL_BIN
PG_LSCLUSTER_BIN = $PG_LSCLUSTER_BIN
PG_MAIN = $PG_MAIN
RECOVERY_CONF = $RECOVERY_CONF
PG_DEFAULT_PORT = $PG_DEFAULT_PORT
PG_DEFAULT_APP_NAME = $PG_DEFAULT_APP_NAME
CHECK_CUR_MASTER_LSN = $CHECK_CUR_MASTER_LSN
REPLAY_WARNING_DELAY = $REPLAY_WARNING_DELAY
REPLAY_CRITICAL_DELAY = $REPLAY_CRITICAL_DELAY
"
# Auto-detect PostgreSQL information using pg_lsclusters
if [ -x "$PG_LSCLUSTER_BIN" ]
then
PG_CLUSTER=$( $PG_LSCLUSTER_BIN -h 2>/dev/null|head -n1 )
if [ -n "$PG_CLUSTER" ]
then
debug "pg_lsclusters output:\n\t$PG_CLUSTER"
# Output example:
# 9.6 main 5432 online,recovery postgres /var/lib/postgresql/9.6/main /var/log/postgresql/postgresql-9.6-main.log
[ -z "$PG_VERSION" ] && PG_VERSION=$( echo "$PG_CLUSTER"|awk -F ' +' '{print $1}' )
[ -z "$PG_DEFAULT_PORT" ] && PG_DEFAULT_PORT=$( echo "$PG_CLUSTER"|awk -F ' +' '{print $3}' )
[ -z "$PG_USER" ] && PG_USER=$( echo "$PG_CLUSTER"|awk -F ' +' '{print $5}' )
[ -z "$PG_MAIN" ] && PG_MAIN=$( echo "$PG_CLUSTER"|awk -F ' +' '{print $6}' )
fi
else
debug "pg_lsclusters not found ($PG_LSCLUSTER_BIN): parameters auto-detection disabled"
fi
# If auto-detection failed, use default values
[ -z "$PG_USER" ] && PG_USER="$DEFAULT_PG_USER"
[ -z "$PG_VERSION" ] && PG_VERSION="$DEFAULT_PG_VERSION"
[ -z "$PG_MAIN" ] && PG_MAIN="$DEFAULT_PG_MAIN"
[ -z "$PG_DEFAULT_PORT" ] && PG_DEFAULT_PORT="$DEFAULT_PG_PORT"
# Check PG_USER
[ -z "$PG_USER" ] && echo "UNKNOWN : Postgres user not specify" && exit 3
[ -z "$PG_USER" ] && echo "UNKNOWN: Postgres user not specified" && exit 3
id "$PG_USER" > /dev/null 2>&1
[ $? -ne 0 ] && echo "UNKNOWN : Invalid Postgres user ($PG_USER)" && exit 3
[ $? -ne 0 ] && echo "UNKNOWN: Invalid Postgres user ($PG_USER)" && exit 3
# Check PSQL_BIN
[ ! -x "$PSQL_BIN" ] && echo "UNKNOWN : Invalid psql bin path ($PSQL_BIN)" && exit 3
[ ! -x "$PSQL_BIN" ] && echo "UNKNOWN: Invalid psql bin path ($PSQL_BIN)" && exit 3
# Check PG_MAIN
[ ! -d "$PG_MAIN/" ] && echo "UNKNOWN : Invalid Postgres main directory path ($PG_MAIN)" && exit 3
[ ! -d "$PG_MAIN/" ] && echo "UNKNOWN: Invalid Postgres main directory path ($PG_MAIN)" && exit 3
# Check RECOVERY_CONF
[ -z "$RECOVERY_CONF" ] && RECOVERY_CONF="$PG_MAIN/$RECOVERY_CONF_FILENAME"
# Check PG_DEFAULT_PORT
[ $( echo "$PG_DEFAULT_PORT"|grep -c -E '^[0-9]*$' ) -ne 1 ] && "UNKNOWN : Postgres default master TCP port must be an integer." && exit 3
[ $( echo "$PG_DEFAULT_PORT"|grep -c -E '^[0-9]*$' ) -ne 1 ] && echo "UNKNOWN: Postgres default master TCP port must be an integer." && exit 3
# If PG_DB is not provided with -D parameter, use PG_USER as default value
[ -z "$PG_DB" ] && PG_DB="$PG_USER"
function psql_get () {
echo "$1"|su - $PG_USER -c "$PSQL_BIN -t -P format=unaligned"
sql="$1"
debug "Exec 'echo \"$sql\"|sudo -u $PG_USER $PSQL_BIN -d \"$PG_DB\" -w -t -P format=unaligned'"
echo "$sql"|sudo -u $PG_USER $PSQL_BIN -d "$PG_DB" -w -t -P format=unaligned
}
function debug() {
if [ $DEBUG -eq 1 ]
then
echo "[DEBUG] $1"
fi
function psql_master_get () {
sql="$1"
debug "Exec 'echo \"$sql\"|sudo -u $PG_USER $PSQL_BIN -U $M_USER -h $M_HOST -w -p $M_PORT -d $PG_DB -t -P format=unaligned'"
echo "$sql"|sudo -u $PG_USER $PSQL_BIN -U $M_USER -h $M_HOST -w -p $M_PORT -d $PG_DB -t -P format=unaligned
}
debug "Running options :
debug "Running options:
PG_VERSION = $PG_VERSION
PG_DB = $PG_DB
PG_USER = $PG_USER
PSQL_BIN = $PSQL_BIN
PG_LSCLUSTER_BIN = $PG_LSCLUSTER_BIN
PG_MAIN = $PG_MAIN
RECOVERY_CONF = $RECOVERY_CONF
PG_DEFAULT_PORT = $PG_DEFAULT_PORT"
PG_DEFAULT_PORT = $PG_DEFAULT_PORT
PG_DEFAULT_APP_NAME = $PG_DEFAULT_APP_NAME
CHECK_CUR_MASTER_LSN = $CHECK_CUR_MASTER_LSN
REPLAY_WARNING_DELAY = $REPLAY_WARNING_DELAY
REPLAY_CRITICAL_DELAY = $REPLAY_CRITICAL_DELAY
"
# Set some stuff to PostgreSQL version
if [ $( echo "$PG_VERSION < 10" |bc -l ) -eq 1 ]
then
pg_last_wal_receive_lsn='pg_last_xlog_receive_location()'
pg_last_wal_replay_lsn='pg_last_xlog_replay_location()'
pg_current_wal_lsn='pg_current_xlog_location()'
pg_wal_lsn_diff='pg_xlog_location_diff'
sent_lsn='sent_location'
write_lsn='write_location'
else
pg_last_wal_receive_lsn='pg_last_wal_receive_lsn()'
pg_last_wal_replay_lsn='pg_last_wal_replay_lsn()'
pg_current_wal_lsn='pg_current_wal_lsn()'
pg_wal_lsn_diff='pg_wal_lsn_diff'
sent_lsn='sent_lsn'
write_lsn='write_lsn'
fi
# Postgres is running ?
if [ $DEBUG -eq 0 ]
@ -119,13 +242,13 @@ else
fi
if [ $? -ne 0 ]
then
echo "CRITICAL : Postgres is not running !"
echo "CRITICAL: Postgres is not running !"
exit 2
fi
debug "Postgres is running"
RECOVERY_MODE=0
[ $( psql_get 'SELECT pg_is_in_recovery();' ) == "t" ] && RECOVERY_MODE=1
[ "$( psql_get 'SELECT pg_is_in_recovery();' )" == "t" ] && RECOVERY_MODE=1
if [ -f $RECOVERY_CONF ]
then
@ -134,69 +257,148 @@ then
# Check recovery mode
if [ $RECOVERY_MODE -ne 1 ]
then
echo "CRITICAL : Not in recovery mode while recovery.conf file found !"
echo "CRITICAL: Not in recovery mode while recovery.conf file found !"
exit 2
fi
debug "Postgres is in recovery mode"
LAST_XLOG_RECEIVE=$( psql_get "SELECT pg_last_xlog_receive_location()" )
debug "Last xlog file receive : $LAST_XLOG_RECEIVE"
LAST_XLOG_REPLAY=$( psql_get "SELECT pg_last_xlog_replay_location()" )
debug "Last xlog file replay : $LAST_XLOG_REPLAY"
# Get local current last received/replayed LSN
LAST_RECEIVED_LSN=$( psql_get "SELECT $pg_last_wal_receive_lsn" )
debug "Last received LSN: $LAST_RECEIVED_LSN"
LAST_REPLAYED_LSN=$( psql_get "SELECT $pg_last_wal_replay_lsn" )
debug "Last replayed LSN: $LAST_REPLAYED_LSN"
# Get master connection informations from recovery.conf file
MASTER_CONN_INFOS=$( egrep '^ *primary_conninfo' $RECOVERY_CONF|sed "s/^ *primary_conninfo *= *[\"\']\([^\"\']*\)[\"\'].*$/\1/" )
MASTER_CONN_INFOS=$( egrep '^ *primary_conninfo' $RECOVERY_CONF|sed "s/^ *primary_conninfo *= *\(.\+\) *$/\1/" )
if [ ! -n "$MASTER_CONN_INFOS" ]
then
echo "UNKNOWN : Can't retreive master connection informations form recovery.conf file"
echo "UNKNOWN: Can't retrieve master connection information from recovery.conf file"
exit 3
fi
debug "Master connection informations : $MASTER_CONN_INFOS"
debug "Master connection informations: $MASTER_CONN_INFOS"
M_HOST=$( echo "$MASTER_CONN_INFOS"|sed 's/^.*host= *\([^ ]*\) *.*$/\1/' )
M_HOST=$( echo "$MASTER_CONN_INFOS"| grep 'host=' | sed 's/^.*host= *\([0-9a-zA-Z.-]\+\) *.*$/\1/' )
if [ ! -n "$M_HOST" ]
then
echo "UNKNOWN : Can't retreive master host from recovery.conf file"
echo "UNKNOWN: Can't retrieve master host from recovery.conf file"
exit 3
fi
debug "Master host : $M_HOST"
debug "Master host: $M_HOST"
M_PORT=$( echo "$MASTER_CONN_INFOS"|sed 's/^.*port= *\([^ ]*\) *.*$/\1/' )
M_PORT=$( echo "$MASTER_CONN_INFOS"| grep 'port=' | sed 's/^.*port= *\([0-9a-zA-Z.-]\+\) *.*$/\1/' )
if [ ! -n "$M_PORT" ]
then
debug "Master port not specify, use default : $PG_DEFAULT_PORT"
debug "Master port not specified, use default: $PG_DEFAULT_PORT"
M_PORT=$PG_DEFAULT_PORT
else
debug "Master port : $M_PORT"
debug "Master port: $M_PORT"
fi
# Get current xlog file from master
M_CUR_XLOG="$( echo 'SELECT pg_current_xlog_location()'|su - $PG_USER -c "$PSQL_BIN -h $M_HOST -p $M_PORT -t -P format=unaligned" )"
if [ ! -n "$M_CUR_XLOG" ]
if [ -n "$PG_MASTER_USER" ]
then
echo "UNKNOWN : Can't retreive current xlog from master server"
debug "Master user provided by command-line, use it: $PG_MASTER_USER"
M_USER="$PG_MASTER_USER"
else
M_USER=$( echo "$MASTER_CONN_INFOS"| grep 'user=' | sed 's/^.*user= *\([0-9a-zA-Z.-]\+\) *.*$/\1/' )
if [ ! -n "$M_USER" ]
then
debug "Master user not specified, use default: $PG_USER"
M_USER=$PG_USER
else
debug "Master user: $M_USER"
fi
fi
M_APP_NAME=$( echo "$MASTER_CONN_INFOS"| grep 'application_name=' | sed "s/^.*application_name=[ \'\"]*\([^ \'\"]\+\)[ \'\"]*.*$/\1/" )
if [ ! -n "$M_APP_NAME" ]
then
debug "Master application name not specified, use default: $PG_DEFAULT_APP_NAME"
M_APP_NAME=$PG_DEFAULT_APP_NAME
else
debug "Master application name: $M_APP_NAME"
fi
# Get current replication state information from master
M_CUR_REPL_STATE_INFO="$( psql_master_get "SELECT state, sync_state, $sent_lsn AS sent_lsn, $write_lsn AS write_lsn FROM pg_stat_replication WHERE application_name='$M_APP_NAME';" )"
if [ ! -n "$M_CUR_REPL_STATE_INFO" ]
then
echo "UNKNOWN: Can't retrieve current replication state information from master server"
exit 3
fi
debug "Master current xlog : $M_CUR_XLOG"
debug "Master current replication state:\n\tstate|sync_state|sent_lsn|write_lsn\n\t$M_CUR_REPL_STATE_INFO"
# Master current xlog is the last receive xlog ?
if [ "$M_CUR_XLOG" != "$LAST_XLOG_RECEIVE" ]
M_CUR_STATE=$( echo "$M_CUR_REPL_STATE_INFO"|cut -d'|' -f1 )
debug "Master current state: $M_CUR_STATE"
if [ "$M_CUR_STATE" != "streaming" ]
then
echo "CRITICAL : Master current xlog is not the last receive xlog"
echo "CRITICAL: this host is not in streaming state according to master host (current state = '$M_CUR_STATE')"
exit 2
fi
debug "Master current xlog is the last receive xlog"
# The last receive xlog is the last replay file ?
if [ "$LAST_XLOG_RECEIVE" != "$LAST_XLOG_REPLAY" ]
M_CUR_SYNC_STATE=$( echo "$M_CUR_REPL_STATE_INFO"|cut -d'|' -f2 )
debug "Master current sync state: $M_CUR_SYNC_STATE"
if [ "$M_CUR_SYNC_STATE" != "sync" ]
then
echo "WARNING : last receive xlog file is not the last replay file"
echo "CRITICAL: this host is not synchronized according to master host (current sync state = '$M_CUR_SYNC_STATE')"
exit 2
fi
M_CUR_SENT_LSN=$( echo "$M_CUR_REPL_STATE_INFO"|cut -d'|' -f3 )
M_CUR_WRITED_LSN=$( echo "$M_CUR_REPL_STATE_INFO"|cut -d'|' -f4 )
debug "Master current last sent/writed LSN: '$M_CUR_SENT_LSN' / '$M_CUR_WRITED_LSN'"
# Check current master LSN vs last received LSN
if [ "$CHECK_CUR_MASTER_LSN" == "1" ]
then
# Get current LSN from master
M_CUR_LSN="$( psql_master_get "SELECT $pg_current_wal_lsn" )"
if [ ! -n "$M_CUR_LSN" ]
then
echo "UNKNOWN: Can't retrieve current LSN from master server"
exit 3
fi
debug "Master current LSN: $M_CUR_LSN"
# Master current LSN is the last received LSN ?
if [ "$M_CUR_LSN" != "$LAST_RECEIVED_LSN" ]
then
echo "CRITICAL: Master current LSN is not the last received LSN"
exit 2
fi
debug "Master current LSN is the last received LSN"
fi
# The last received LSN is the last replayed ?
if [ "$LAST_RECEIVED_LSN" != "$LAST_REPLAYED_LSN" ]
then
debug "/!\ The last received LSN is NOT the last replayed LSN ('$LAST_RECEIVED_LSN' / '$LAST_REPLAYED_LSN')"
REPLAY_DELAY="$( psql_get 'SELECT EXTRACT(EPOCH FROM now() - pg_last_xact_replay_timestamp());' )"
debug "Replay delay is $REPLAY_DELAY second(s)"
if [ $( echo "$REPLAY_DELAY >= $REPLAY_CRITICAL_DELAY"|bc -l ) -gt 0 ]
then
echo "CRITICAL: last received LSN is not the last replayed ('$LAST_RECEIVED_LSN' / '$LAST_REPLAYED_LSN') and replay delay is $REPLAY_DELAY second(s)"
exit 2
fi
if [ $( echo "$REPLAY_DELAY >= $REPLAY_WARNING_DELAY"|bc -l ) -gt 0 ]
then
echo "WARNING: last received LSN is not the last replayed ('$LAST_RECEIVED_LSN' / '$LAST_REPLAYED_LSN') and replay delay is $REPLAY_DELAY second(s)"
exit 1
fi
debug "Replay delay is not worrying"
fi
debug "Last received LSN is the last replayed file"
# The master last sent LSN is the last received (and synced) ?
if [ "$M_CUR_SENT_LSN" != "$LAST_RECEIVED_LSN" ]
then
echo "WARNING: master last sent LSN has not yet been received (and synced to disk) by slave. There may be some network delay or load on slave"
echo "Master last sent LSN: $M_CUR_SENT_LSN"
echo "Slave last received (and synced to disk) LSN: $LAST_RECEIVED_LSN"
exit 1
fi
debug "Last receive xlog file is the last replay file"
echo "OK : Hot-standby server is uptodate"
echo "OK: Hot-standby server is up to date"
exit 0
else
debug "File recovery.conf not found. Master mode."
@ -204,31 +406,65 @@ else
# Check recovery mode
if [ $RECOVERY_MODE -eq 1 ]
then
echo "CRITICAL : In recovery mode while recovery.conf file not found !"
echo "CRITICAL: In recovery mode while recovery.conf file not found !"
exit 2
fi
debug "Postgres is not in recovery mode"
# Retrieve current LSN
CURRENT_LSN=$( psql_get "SELECT $pg_current_wal_lsn" )
if [ -z "$CURRENT_LSN" ]
then
echo "UNKNOWN: Failed to retrieve current LSN (Log Sequence Number)"
exit 3
fi
debug "Current LSN: $CURRENT_LSN"
# Check standby client
STANDBY_CLIENTS=$( psql_get "SELECT client_addr, sync_state FROM pg_stat_replication;" )
STANDBY_CLIENTS=$( psql_get "SELECT application_name, client_addr, sent_lsn, write_lsn, state, sync_state, current_lag
FROM (
SELECT application_name, client_addr, sent_lsn, write_lsn, state, sync_state, current_lag
FROM (
SELECT application_name, client_addr, $sent_lsn AS sent_lsn, $write_lsn AS write_lsn, state, sync_state,
$pg_wal_lsn_diff($pg_current_wal_lsn, $write_lsn) AS current_lag
FROM pg_stat_replication
) AS s2
) AS s1" )
if [ ! -n "$STANDBY_CLIENTS" ]
then
echo "WARNING : no stand-by client connected"
echo "WARNING: no stand-by client connected"
exit 1
fi
debug "Stand-by client(s) : $( echo -n $STANDBY_CLIENTS|sed 's/\n/ , /g' )"
debug "Stand-by client(s):\n\t$( echo -e "$STANDBY_CLIENTS"|sed 's/\n/\n\t/' )"
STANDBY_CLIENTS_TXT=""
STANDBY_CLIENTS_COUNT=0
CURRENT_LSN_IS_LAST_SENT=1
for line in $STANDBY_CLIENTS
do
let STANDBY_CLIENTS_COUNT=STANDBY_CLIENTS_COUNT+1
IP=$( echo $line|cut -d '|' -f 1 )
MODE=$( echo $line|cut -d '|' -f 2 )
STANDBY_CLIENTS_TXT="$STANDBY_CLIENTS_TXT $IP (mode=$MODE)"
NAME=$( echo $line|cut -d '|' -f 1 )
IP=$( echo $line|cut -d '|' -f 2 )
SENT_LSN=$( echo $line|cut -d '|' -f 3 )
WRITED_LSN=$( echo $line|cut -d '|' -f 4 )
STATE=$( echo $line|cut -d '|' -f 5 )
SYNC_STATE=$( echo $line|cut -d '|' -f 6 )
LAG=$( echo $line|cut -d '|' -f 7 )
STANDBY_CLIENTS_TXT="$STANDBY_CLIENTS_TXT\n$NAME ($IP): $STATE/$SYNC_STATE (LSN: sent='$SENT_LSN' / writed='$WRITED_LSN', Lag: ${LAG}b)"
[ "$SENT_LSN" != "$CURRENT_LSN" ] && CURRENT_LSN_IS_LAST_SENT=0
done
echo "OK : $STANDBY_CLIENTS_COUNT stand-by client(s) connected - $STANDBY_CLIENTS_TXT"
exit 0
if [ $CURRENT_LSN_IS_LAST_SENT -eq 1 ]
then
echo "OK: $STANDBY_CLIENTS_COUNT stand-by client(s) connected"
EXIT_CODE=0
else
echo "WARNING: current master LSN is not the last sent to the connected stand-by client(s). The master may be under load."
EXIT_CODE=1
fi
echo "Current master LSN: $CURRENT_LSN"
echo -e "$STANDBY_CLIENTS_TXT"
exit $EXIT_CODE
fi
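The script extracts `host`, `port`, `user` and `application_name` from `primary_conninfo` with `sed` expressions that rely on GNU extensions (`\+`). As a hedged alternative, the sketch below does the same lookup with plain shell word splitting. The helper name `conninfo_get` and the sample connection string are hypothetical, and the sketch assumes the value has already been stripped of its surrounding quotes and contains no quoted values with embedded spaces.

```shell
# Hypothetical helper (not part of the plugin): print the value of one
# key from a space-separated key=value connection string.
# Returns 1 when the key is absent so the caller can apply a default.
conninfo_get() {
    key="$1"
    # Word splitting on whitespace is intended; globbing is disabled
    # so that characters like * in a value are not expanded.
    set -f
    for kv in $2
    do
        case "$kv" in
            "$key="*) set +f; printf '%s\n' "${kv#*=}"; return 0 ;;
        esac
    done
    set +f
    return 1
}

# Illustrative connection string (assumed values).
CONNINFO="host=192.0.2.10 port=5433 user=replicator application_name=standby1"

M_HOST=$( conninfo_get host "$CONNINFO" )
# Fall back to the default port when none is given, as the script does.
M_PORT=$( conninfo_get port "$CONNINFO" ) || M_PORT=5432
echo "host=$M_HOST port=$M_PORT"
```

Because the pattern is anchored at the start of each word, looking up `host` does not accidentally match a `hostaddr=` entry.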