HADR Info in db2diag.log

The db2diag.log file is an important DB2 diagnostic log. It records the history of events in DB2. The db2diag tool can be used to parse and filter the db2diag.log file. This page describes the db2diag.log messages relevant to HADR.

The HADR Threads

One or more threads in the database engine work on HADR tasks. A thread is also known as an EDU (Engine Dispatching Unit). You can use the "db2pd -edus" command to list threads. The HADR threads in a primary database are named "db2hadrp". Those in a standby database are named "db2hadrs". Upon a role change ("takeover" command), the threads are renamed to match the new role.

Log Stream id and Standby id

In V10.1 and later, the log stream id and standby id are appended to the HADR edu name, in the form name.streamId.standbyId, to support pureScale and multiple standbys. The standby id is always zero for db2hadrs edus because the standby id is assigned by the primary and does not propagate to the standby (similarly, the STANDBY_ID monitor field is always zero when the monitor query is issued on a standby). Example:

db2hadrp.0.1 is the primary side edu for log stream 0, standby 1. There is a dedicated edu for each standby. So if there are 3 standbys, you will see db2hadrp.0.1, db2hadrp.0.2, and db2hadrp.0.3 on the primary database.
db2hadrs.0.0 is the standby side edu for log stream 0. In a pureScale environment, there is a dedicated edu for each log stream. So if there are 4 pureScale members in the system, you will see db2hadrs.0.0, db2hadrs.1.0, db2hadrs.2.0, and db2hadrs.3.0 on the standby replay member.

The "EDUNAME" field in a db2diag.log message shows the edu that wrote the message. The "DB" field shows the database from which the message comes. The DB name is also embedded in the EDUNAME field in parentheses. When looking at messages, be sure not to confuse messages from different databases, standby ids, and stream ids. The db2diag tool can be used to parse and filter the db2diag.log file.
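The naming scheme above can be decoded mechanically. The following is a minimal Python sketch, assuming the EDUNAME value has the form shown in the sample messages on this page ("db2hadrp (HADRDB)" in V9.7 and earlier, "db2hadrp.0.1 (HADRDB)" in V10.1 and later). The names HadrEdu and parse_hadr_eduname are hypothetical helpers for illustration and are not part of any DB2 tool.

```python
import re
from typing import NamedTuple, Optional


class HadrEdu(NamedTuple):
    role: str                     # "primary" (db2hadrp) or "standby" (db2hadrs)
    log_stream_id: Optional[int]  # None for V9.7-style names without ids
    standby_id: Optional[int]     # None for V9.7-style names without ids
    db_name: str


# Assumed format: db2hadr<p|s>[.<streamId>.<standbyId>] (<DBNAME>)
_EDU_RE = re.compile(r"^db2hadr([ps])(?:\.(\d+)\.(\d+))?\s*\(([^)]+)\)")


def parse_hadr_eduname(value: str) -> Optional[HadrEdu]:
    """Parse an EDUNAME field value; return None if it is not an HADR edu."""
    m = _EDU_RE.match(value.strip())
    if m is None:
        return None
    role = "primary" if m.group(1) == "p" else "standby"
    stream = int(m.group(2)) if m.group(2) is not None else None
    standby = int(m.group(3)) if m.group(3) is not None else None
    return HadrEdu(role, stream, standby, m.group(4))


if __name__ == "__main__":
    for name in ("db2hadrp.0.1 (HADRDB)", "db2hadrs.3.0 (HADRDB)", "db2agent (HADRDB)"):
        print(name, "->", parse_hadr_eduname(name))
```

Returning None for non-HADR edus such as db2agent makes it easy to skip unrelated messages, but note that some HADR events (for example role changes triggered by a command) are written by db2agent edus, as the samples below show.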

HADR Role Change

The HADR role is changed by an HADR command (start/stop/takeover HADR; see HADR commands). Look for the keyword hdrSetDbRole in V9.7 and earlier, or hdrSetDbRoleAndDbType in V10.1 and later, in db2diag.log. Sample messages:

V9.7

"Start HADR as primary" command results in "standard" to "primary" role change:

2013-10-07-14.40.35.450719-420 E68393E386          LEVEL: Event
PID     : 4737                 TID : 46914980538688PROC : db2sysc
INSTANCE: zhuge                NODE : 000          DB   : HADRDB
EDUID   : 524                  EDUNAME: db2hadrp (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetDbRole, probe:10010
CHANGE : HADR role set to Primary (was Standard)

V10.1

A role switch results in a "primary" to "standby" change (shown below) and a "standby" to "primary" change (not shown):

2013-10-07-14.26.41.918203-420 E132007E1255          LEVEL: Event
PID     : 1562                 TID : 46916419184960 PROC : db2sysc
INSTANCE: zhuge                NODE : 000            DB   : HADRDB
APPHDL : 0-101                APPID: *LOCAL.DB2.131007212657
HOSTNAME: zeeman
EDUID   : 468                  EDUNAME: db2agent (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetDbRoleAndDbType, probe:10020
CHANGE : HADR DATABASE ROLE/TYPE -
HADR database role set to STANDBY (was PRIMARY).
HADR database type set to PHYSICAL (was PHYSICAL).
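To list role changes across both message formats, a small script can scan the log text for the two wordings shown above. This is a minimal Python sketch; find_role_changes is a hypothetical helper, and the patterns are assumptions derived only from the two sample messages on this page, so other releases may word the message differently.

```python
import re

# V9.7 and earlier: "HADR role set to Primary (was Standard)"
# V10.1 and later:  "HADR database role set to STANDBY (was PRIMARY)."
_ROLE_RE = re.compile(r"HADR (?:database )?role set to (\w+) \(was (\w+)\)")


def find_role_changes(log_text: str):
    """Return (new_role, old_role) pairs in log order, upper-cased."""
    return [(new.upper(), old.upper()) for new, old in _ROLE_RE.findall(log_text)]


if __name__ == "__main__":
    sample = (
        "CHANGE : HADR role set to Primary (was Standard)\n"
        "CHANGE : HADR DATABASE ROLE/TYPE -\n"
        "HADR database role set to STANDBY (was PRIMARY).\n"
        "HADR database type set to PHYSICAL (was PHYSICAL).\n"
    )
    for new_role, old_role in find_role_changes(sample):
        print(old_role, "->", new_role)
```

Upper-casing the role names lets V9.7 and V10.1 results be compared directly. The "database type" line is deliberately not matched, since it does not describe a role change.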

HADR State Change

Internal vs. External HADR States

The HADR state shown in db2diag.log is the internal state. It has finer granularity than the external state reported by monitor interfaces. The internal state indicates what the local database (primary or standby) is doing, while the external state indicates what the HADR pair is doing. Internal states have a P_ or S_ prefix (primary or standby). The internal states are:

HDR_NO_STATE: Initial state, like a logical "NULL" state.
HDR_P_BOOT and HDR_S_BOOT: HADR primary/standby is starting.
HDR_S_LOC_CATCHUP: Standby is doing local catchup. In V9.7 and earlier, state is HDR_S_LOC_CATCHUP when standby is retrieving logs from archive.
HDR_P_REM_CATCHUP_PENDING and HDR_S_REM_CATCHUP_PENDING: Primary/standby is waiting for remote catchup to start, either because primary and standby are not connected yet, or standby is still doing local catchup. In V10.1 and later, state is HDR_S_REM_CATCHUP_PENDING when standby is retrieving logs from archive.
HDR_P_NPEER and HDR_S_NPEER: Primary/standby is in the transition from remote catchup to peer (nearly peer). Externally, this is reported as remote catchup (not peer yet).
HDR_P_PEER and HDR_S_PEER: Primary and standby are in peer state.
HDR_P_DISCONN_PEER and HDR_S_DISCONN_PEER: Primary/standby is in disconnected peer state (peer window is enabled and the pair lost connection).

For more info on HADR states, see HADR log shipping.

Sample Messages from V9.7

Entering Peer state:

2013-10-07-14.40.36.401196-420 E76255E392          LEVEL: Event
PID     : 4737                 TID : 46914980538688PROC : db2sysc
INSTANCE: zhuge                NODE : 000          DB   : HADRDB
EDUID   : 524                  EDUNAME: db2hadrp (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to P-Peer (was P-NearlyPeer)

During a role switch, the old primary changes from P-Peer into S-Peer state (shown below). There is a matching primary to standby role change message (not shown).

2013-10-07-14.40.50.329996-420 E81365E448          LEVEL: Event
PID     : 4737                 TID : 46914993121600PROC : db2sysc
INSTANCE: zhuge                NODE : 000          DB   : HADRDB
APPHDL : 0-113                APPID: *LOCAL.DB2.131007214120
EDUID   : 553                  EDUNAME: db2agent (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to S-Peer (was P-Peer)

Sample Messages from V10.1

Entering Peer state:

2013-10-07-14.26.24.033020-420 E126131E433           LEVEL: Event
PID     : 1562                 TID : 46916289161536 PROC : db2sysc
INSTANCE: zhuge                NODE : 000            DB   : HADRDB
HOSTNAME: zeeman
EDUID   : 453                  EDUNAME: db2hadrp.0.1 (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrSetHdrState, probe:10000
CHANGE : HADR state set to HDR_P_PEER (was HDR_P_NPEER), connId=1
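State change messages from both releases can be pulled out with one pattern. The sketch below is an illustration in Python, assuming the message wording matches the samples above; StateChange and find_state_changes are hypothetical names, and conn_id is None when no connId is present in the message (as in the V9.7 samples).

```python
import re
from typing import List, NamedTuple, Optional


class StateChange(NamedTuple):
    new_state: str
    old_state: str
    conn_id: Optional[int]  # None when the message carries no connId


# V9.7:  "HADR state set to P-Peer (was P-NearlyPeer)"
# V10.1: "HADR state set to HDR_P_PEER (was HDR_P_NPEER), connId=1"
_STATE_RE = re.compile(r"HADR state set to (.+?) \(was (.+?)\)(?:, connId=(\d+))?")


def find_state_changes(log_text: str) -> List[StateChange]:
    """Return HADR internal state transitions in the order they were logged."""
    changes = []
    for m in _STATE_RE.finditer(log_text):
        conn_id = int(m.group(3)) if m.group(3) is not None else None
        changes.append(StateChange(m.group(1), m.group(2), conn_id))
    return changes


if __name__ == "__main__":
    sample = (
        "CHANGE : HADR state set to P-Peer (was P-NearlyPeer)\n"
        "CHANGE : HADR state set to HDR_P_PEER (was HDR_P_NPEER), connId=1\n"
    )
    for change in find_state_changes(sample):
        print(change)
```

The state names are returned verbatim rather than normalized, because V9.7 and V10.1 spell the same internal state differently (P-NearlyPeer versus HDR_P_NPEER) and this page does not give a complete mapping between the two spellings.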

Connection ID

In V10.1 and later, each time the primary and standby establish a connection, a connection id (connId) is assigned. The connection id is shared by the primary and standby (they negotiate a common id for a particular connection). The connId is included in all HADR state change messages. You can use this id to match messages on the primary and standby. For example, the "entering peer" messages on the primary and standby that share the same connId are from the same peer session. Previously, you could only use timestamps to match primary and standby messages, which is not reliable, especially when the primary and standby hosts have clock skew. Sample messages:

Primary side:

2013-10-07-14.26.23.545742-420 I119733E421           LEVEL: Info
PID     : 1562                 TID : 46916289161536 PROC : db2sysc
INSTANCE: zhuge                NODE : 000            DB   : HADRDB
HOSTNAME: zeeman
EDUID   : 453                  EDUNAME: db2hadrp.0.1 (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrHandleHsAck, probe:30440
DATA #1 : <preformatted>
Connection succeeded, connId=1

Standby side message for the same round of handshake (connId is also 1):

2013-10-07-14.26.23.551195-420 I48195E422            LEVEL: Info
PID     : 2973                 TID : 46914393336128 PROC : db2sysc
INSTANCE: zhuge                NODE : 000            DB   : HADRDB
HOSTNAME: zernike
EDUID   : 315                  EDUNAME: db2hadrs.0.0 (HADRDB)
FUNCTION: DB2 UDB, High Availability Disaster Recovery, hdrHandleHsAck, probe:30440
DATA #1 : <preformatted>
Connection succeeded, connId=1
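Matching by connId can be scripted. The following Python sketch splits each log into records and groups the records that mention a connId, assuming every record begins with a timestamp line in the format shown in the samples and that a single primary/standby pair is being compared. The helper names split_records and group_by_conn_id are hypothetical, and the two log strings are abbreviated excerpts of the sample messages above.

```python
import re
from collections import defaultdict

# A record is assumed to start with a timestamp such as
# "2013-10-07-14.26.23.545742-420 " at the beginning of a line.
_RECORD_START = re.compile(
    r"^\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3} ", re.MULTILINE
)
_CONN_ID = re.compile(r"connId=(\d+)")


def split_records(log_text):
    """Split raw db2diag.log text into one string per record."""
    starts = [m.start() for m in _RECORD_START.finditer(log_text)]
    ends = starts[1:] + [len(log_text)]
    return [log_text[s:e].strip() for s, e in zip(starts, ends)]


def group_by_conn_id(primary_log, standby_log):
    """Map connId -> {"primary": [records], "standby": [records]}."""
    groups = defaultdict(lambda: {"primary": [], "standby": []})
    for side, text in (("primary", primary_log), ("standby", standby_log)):
        for record in split_records(text):
            m = _CONN_ID.search(record)
            if m:  # records without a connId are ignored
                groups[int(m.group(1))][side].append(record)
    return dict(groups)


PRIMARY_LOG = """\
2013-10-07-14.26.23.545742-420 I119733E421           LEVEL: Info
HOSTNAME: zeeman
EDUID   : 453                  EDUNAME: db2hadrp.0.1 (HADRDB)
Connection succeeded, connId=1
"""

STANDBY_LOG = """\
2013-10-07-14.26.23.551195-420 I48195E422            LEVEL: Info
HOSTNAME: zernike
EDUID   : 315                  EDUNAME: db2hadrs.0.0 (HADRDB)
Connection succeeded, connId=1
"""

if __name__ == "__main__":
    for conn_id, sides in group_by_conn_id(PRIMARY_LOG, STANDBY_LOG).items():
        print("connId", conn_id)
        for side, records in sides.items():
            for record in records:
                print("  ", side, record.splitlines()[0])
```

With multiple standbys, it is probably safer to run the grouping once per standby log rather than merging all standby logs, since this page does not say whether connection ids are unique across different standbys.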

HADR Command History

You can reconstruct the history of start/stop/takeover HADR command invocations from db2diag.log.

Start HADR command

In V10.1 and later, the following messages are recorded:

For "start HADR as primary":
Received START HADR PRIMARY command.
For "start HADR as primary by force":
Received START HADR PRIMARY FORCE command.
For "start HADR as standby":
Received START HADR SECONDARY command.

In V9.7 and earlier, the following message is recorded for all variations of this command:

Received START HADR command.

You may use role and state information to determine which variation of the command was issued.

Stop HADR command

The following message is recorded for this command:

Received STOP HADR command.

Takeover HADR command

The following message is recorded for this command in the standby's db2diag.log:

Received TAKEOVER HADR command.

On the standby, a message following this message indicates which variation of takeover command was issued:

For "takeover HADR" (no "by force" option):
Standby has initiated a takeover.
For "takeover HADR by force":
Standby has initiated a takeover by force.
For "takeover HADR by force peer window only":
Standby has initiated a takeover by force peer window only.

On the primary, you can find the following messages:

For "takeover HADR" (no "by force" option):
Primary has started non-forced takeover request.
For "takeover HADR by force" (this message is printed only when the primary and standby are connected at takeover time; it also covers the "peer window only" variation):
Primary has started a forced takeover request.
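The command messages listed in this section can be collected into a simple timeline. The Python sketch below does a plain substring search for exactly the message texts quoted on this page; command_history and HADR_COMMAND_MESSAGES are hypothetical names, and the descriptions on the right-hand side are this sketch's own labels, not DB2 output.

```python
# Message text (as quoted on this page) -> short description of the command.
HADR_COMMAND_MESSAGES = {
    "Received START HADR PRIMARY command.": "start HADR as primary",
    "Received START HADR PRIMARY FORCE command.": "start HADR as primary by force",
    "Received START HADR SECONDARY command.": "start HADR as standby",
    "Received START HADR command.": "start HADR (V9.7 and earlier, variation not recorded)",
    "Received STOP HADR command.": "stop HADR",
    "Received TAKEOVER HADR command.": "takeover HADR received on standby",
    "Standby has initiated a takeover.": "standby: non-forced takeover",
    "Standby has initiated a takeover by force.": "standby: takeover by force",
    "Standby has initiated a takeover by force peer window only.":
        "standby: takeover by force peer window only",
    "Primary has started non-forced takeover request.": "primary: non-forced takeover",
    "Primary has started a forced takeover request.": "primary: forced takeover",
}


def command_history(log_text):
    """Return descriptions of HADR command messages in the order they appear."""
    events = []
    for line in log_text.splitlines():
        for message, meaning in HADR_COMMAND_MESSAGES.items():
            if message in line:
                events.append(meaning)
                break  # at most one command message per line
    return events


if __name__ == "__main__":
    sample = (
        "Received START HADR PRIMARY FORCE command.\n"
        "Received TAKEOVER HADR command.\n"
        "Standby has initiated a takeover by force peer window only.\n"
    )
    for event in command_history(sample):
        print(event)
```

Because each message is matched together with its trailing period, the shorter messages (for example "Standby has initiated a takeover.") do not accidentally match the longer variations. If a release writes these messages without the period, the table would need adjusting.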