HADR with TSA using db2haicu (A DB2 v9.7 utility) on AIX/Linux

Date: 04 Apr 2010
Version: 2.0
Authors: Neeraj Sharma ([email protected]), Abhishek Iyer ([email protected])
Abstract: This is a step-by-step guide for setting up end-to-end HADR with TSA using the db2haicu utility.
Audience: The target audience for this document is customers.
Conventions
Distinct text styles indicate user inputs, questions prompted by the system/utility, and informational messages from the system/utility. This nomenclature does not apply to the cluster status output in section 11, where the styles instead indicate an online node, an offline node (in warm mode), and a failed node.
Table of contents

Part 1 : DB2 configuration
  1. Check/Update /etc/hosts file
  2. Check/Update /etc/services file
  3. Enable HADR on databases
  4. Take database backup
  5. Restore database backup on standby node
  6. Check database states
  7. Start HADR on databases
Part 2 : TSA Cluster setup
  8. Prepare cluster nodes first
  9. On Standby node (should be executed on standby node first)
  10. Execute db2haicu on Primary node
  11. Check cluster status
Part 3 : Miscellaneous tasks / Diagnostics
  12. Disable TSA cluster
  13. Enable TSA cluster
  14. db2haicu – maintenance mode
  15. HADR automated failover resource listing using db2pd
  16. HADR status using db2pd
Part 4 : Remove TSA/HADR configuration
  17. Remove TSA cluster
  18. Stop HADR
Part 5 : Automatic client reroute (ACR)
  19. Add ACR configuration
  20. Remove ACR configuration
Appendix
  Why database creation is required on both nodes?
  Quorum device
  Cluster node name resolution
  Cluster Public/Private networks
  HADR States
    Peer
    LocalCatchup
    RemoteCatchupPending
    RemoteCatchup
    Disconnected
  Manual takeover
  Definite NO NO for HADR setup
  DB2_LOAD_COPY_NO_OVERRIDE
  Application impact
  Upgrading DB2
References
Part 1 : DB2 configuration
Overall architecture
Create the database on both database servers (yes, creation on both servers is required; the reason is explained in the appendix).
Apply the configuration settings on both databases.
Create the database layout and other database objects on just one database, which will initially act as the primary node (say hadrnode01).
1. Check/Update /etc/hosts file
Make sure the following entries exist on both nodes:
[Host entries for hadrnode01 and hadrnode02; not legible in this copy. A sketch follows.]
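As a sketch, each node's /etc/hosts maps both HADR hosts to their addresses (the IP addresses below are placeholders, not values from the original):

    192.0.2.11    hadrnode01
    192.0.2.12    hadrnode02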
2. Check/Update /etc/services file
Make sure the following entries exist on both nodes:
[/etc/services entries; not legible in this copy. A sketch follows.]
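As a sketch, /etc/services on both nodes defines the DB2 instance connection port and one HADR service port per node (the service names and port numbers below are assumed, not taken from the original):

    db2c_db2inst1      50000/tcp    # instance connection port (assumed name/port)
    hadr_node01_svc    55001/tcp    # HADR service port used by hadrnode01 (assumed)
    hadr_node02_svc    55002/tcp    # HADR service port used by hadrnode02 (assumed)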
3. Enable HADR on databases
hadrnode01
[Transcript of the DB CFG updates for HADR on hadrnode01; not legible in this copy. A sketch follows.]
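The following is only a sketch of the kind of DB CFG updates this step performs, using the DB2 9.7 parameter names. The database name, hosts, services and values are placeholders, the three logging/index parameters listed first are an assumption, and the same updates are run on hadrnode02 with the local and remote values reversed:

    db2 update db cfg for <dbname> using LOGRETAIN RECOVERY
    db2 update db cfg for <dbname> using LOGINDEXBUILD ON
    db2 update db cfg for <dbname> using INDEXREC RESTART
    db2 update db cfg for <dbname> using HADR_LOCAL_HOST hadrnode01
    db2 update db cfg for <dbname> using HADR_LOCAL_SVC <local_hadr_service>
    db2 update db cfg for <dbname> using HADR_REMOTE_HOST hadrnode02
    db2 update db cfg for <dbname> using HADR_REMOTE_SVC <remote_hadr_service>
    db2 update db cfg for <dbname> using HADR_REMOTE_INST <remote_instance>
    db2 update db cfg for <dbname> using HADR_SYNCMODE <syncmode>
    db2 update db cfg for <dbname> using HADR_TIMEOUT <timeout_seconds>
    db2 update db cfg for <dbname> using HADR_PEER_WINDOW <peer_window_seconds>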
hadrnode02
[The corresponding DB CFG updates on hadrnode02, with the local and remote host, service and instance values reversed; not legible in this copy.]
The values of HADR_TIMEOUT and HADR_PEER_WINDOW should be tuned for each particular setup (the standard recommended values given in the original are not legible in this copy).
Once LOGRETAIN is set to RECOVERY, the database moves into the “Backup pending” state.

hadrnode01
[Transcript showing hadrnode01 in “Backup pending” state; not legible in this copy.]

hadrnode02
[Transcript showing hadrnode02 in “Backup pending” state; not legible in this copy.]
4. Take database backup
hadrnode01
[Transcript of the database backup on hadrnode01; not legible in this copy. A sketch follows.]
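A sketch of this step, with placeholder names:

    db2 backup database <dbname> to <backup_dir>
    # copy the resulting backup image to hadrnode02, for example with scp
    scp <backup_dir>/<backup_image> <instance_owner>@hadrnode02:<backup_dir>/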
This will create a backup image file in the current directory (the file name shown in the original is not legible here). Make this file available on hadrnode02 for restoration during step 5.
hadrnode02
[Transcript on hadrnode02 showing the backup image made available there; not legible in this copy.]
The HADR DB CFG parameters mentioned in section 3 (the 4th through the 11th) should be set after taking the backup; the first 3 parameters should be set before the backup itself.
5. Restore database backup on standby node
hadrnode02
[Transcript of the database restore on hadrnode02; not legible in this copy. A sketch follows.]
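A sketch of the restore on hadrnode02 (names are placeholders). The restored database is left in rollforward pending state, which is what the standby needs for HADR:

    db2 restore database <dbname> from <backup_dir> taken at <timestamp>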
The HADR DB CFG parameters mentioned in section 3 (the 4th through the 11th) should be set after performing the restore.
6. Check database states
hadrnode01
[Transcript checking the database state on hadrnode01; not legible in this copy.]
hadrnode02
[Transcript checking the database state on hadrnode02; not legible in this copy. A sketch follows.]
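A sketch of how the role can be checked on each node (database name is a placeholder). Before HADR is started, the role shows as STANDARD on both nodes:

    db2 get db cfg for <dbname> | grep -i "HADR database role"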
7. Start HADR on databases
hadrnode02
[Transcript of starting HADR as standby on hadrnode02; not legible in this copy.]
hadrnode01
[Transcript of starting HADR as primary on hadrnode01; not legible in this copy. A sketch follows.]
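The standard DB2 9.7 commands for this step are sketched below (database name is a placeholder); the standby must be started first:

    # On hadrnode02 (standby), first:
    db2 start hadr on database <dbname> as standby
    # Then on hadrnode01 (primary):
    db2 start hadr on database <dbname> as primary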
Part 2 : TSA Cluster setup
RSCT version 2.5.3.3 and TSA version 3.1.0.0 were used for this implementation.
8. Prepare cluster nodes first
Execute the following commands as root on both nodes; all other commands should be executed as the instance owner.
[The commands shown here are not legible in this copy; a sketch follows.]
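These are most likely the RSCT node preparation commands, sketched here as an assumption:

    # Run as root on hadrnode01 and on hadrnode02:
    preprpnode hadrnode01 hadrnode02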
9. On Standby node (should be executed on standby node first)
[Full interactive db2haicu session on the standby node: creating the cluster domain, defining the quorum device and the public/private cluster networks, and registering the HADR database. The transcript is not legible in this copy; a sketch of the invocation follows.]
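A sketch of the invocation, with the prompts summarized as comments; the topics listed are the ones discussed in the appendix, and the actual prompt wording and order may differ:

    # As the instance owner on hadrnode02 (standby):
    db2haicu
    # db2haicu then prompts interactively for, among other things:
    #   - the cluster domain name and the cluster nodes (hadrnode01, hadrnode02)
    #   - a network quorum device (an IP address reachable from both nodes)
    #   - the public/private cluster networks and their interfaces (e.g. eth3/eth10)
    #   - the HADR database to be automated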
10. Execute db2haicu on Primary node
[Interactive db2haicu session on the primary node; not legible in this copy. See the sketch below.]
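Only the invocation can be sketched here: db2haicu is run on the primary as the instance owner, and it should detect the cluster domain created in step 9 and complete the HADR automation for the primary side:

    # As the instance owner on hadrnode01 (primary):
    db2haicu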
11. Check cluster status
On successful completion of the commands, the cluster domain and node status can be checked with the following commands (the command output should be the same across all nodes in the cluster).
[Cluster domain, node and resource status output; not legible in this copy. A sketch of the commands follows.]
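The commands typically used for this check (output omitted):

    lsrpdomain    # cluster domain name and its operational state
    lsrpnode      # state of each node in the domain
    lssam         # TSA resource groups, resources and their states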
Now the HADR setup is ready for automatic failover in case of disaster. The TSA cluster starts automatically, with no manual intervention required.
Part 3 : Miscellaneous tasks / Diagnostics
12. Disable TSA cluster
This option is for scheduled maintenance of the setup. It only stops TSA; HADR will remain active (but failover will not happen). Execute this on the primary node as the instance owner.
[Transcript of the db2haicu session that disables the cluster; not legible in this copy. See the sketch below.]
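The operation itself is the db2haicu disable option:

    # As the instance owner on the primary node:
    db2haicu -disable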
13. Enable TSA cluster
Launch this command from the primary node as the instance owner.
[Transcript of the db2haicu session that re-enables the cluster; not legible in this copy. See the sketch below.]
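A sketch of the re-enable step: db2haicu is simply run again on the primary node and, when it reports that high availability is currently disabled, you confirm that it should be re-enabled:

    # As the instance owner on the primary node:
    db2haicu
    # confirm (Yes) when asked whether high availability should be enabled again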
14. db2haicu – maintenance mode
db2haicu produces output like the following if it is already active; this mode is called maintenance mode. You can add or remove nodes, databases, virtual IPs, network interfaces and the quorum device. This command should be executed from the primary database node.
[db2haicu maintenance-mode session transcript; not legible in this copy.]
15. HADR automated failover resource listing using db2pd
This command can be issued from any of the database nodes.
[db2pd output listing the automated failover (cluster) resources; not legible in this copy. See the sketch below.]
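A likely form of the command (shown as an assumption):

    db2pd -ha    # lists the cluster manager resources known to DB2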
16. HADR status using db2pd
This command can be issued from any of the database nodes.
[db2pd HADR status output; not legible in this copy. See the sketch below.]
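The HADR status is typically obtained with the -hadr option of db2pd (database name is a placeholder):

    db2pd -db <dbname> -hadr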
Part 4 : Remove TSA/HADR configuration
In order to remove the TSA configuration settings (cluster and resource groups) completely from the cluster, perform the following steps.
17. Remove TSA cluster
[Transcript of the db2haicu session that deletes the cluster configuration; not legible in this copy. See the sketch below.]
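The removal is performed with the db2haicu delete option:

    # As the instance owner on the primary node:
    db2haicu -delete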
18. Stop HADR
hadrnode02
[Transcript of stopping HADR on hadrnode02; not legible in this copy.]
hadrnode01
[Transcript of stopping HADR on hadrnode01; not legible in this copy. A sketch of both steps follows.]
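The standard commands for this step (database name is a placeholder):

    # On hadrnode02 (standby):
    db2 deactivate database <dbname>
    db2 stop hadr on database <dbname>
    # On hadrnode01 (primary):
    db2 stop hadr on database <dbname>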
Part 5 : Automatic client reroute (ACR)
Automatic client reroute is a DB2 Database for Linux, UNIX, and Windows feature that
allows client applications to recover from a loss of communication with the server so that
the application can continue its work with minimal interruption. Automatic client reroute
can be accomplished only if an alternate server has been specified prior to the loss of
communication. The client must connect to the server at least once in order to be able to use the ACR feature. No configuration changes on the client side are required.
19. Add ACR configuration
hadrnode01
[Transcript of registering the alternate server on hadrnode01; not legible in this copy.]
hadrnode02
[Transcript of registering the alternate server on hadrnode02; not legible in this copy. A sketch of both steps follows.]
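The alternate-server registration, with each node pointing its clients at the other node (database name and port are placeholders):

    # On hadrnode01:
    db2 update alternate server for database <dbname> using hostname hadrnode02 port <conn_port>
    # On hadrnode02:
    db2 update alternate server for database <dbname> using hostname hadrnode01 port <conn_port>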
20. Remove ACR configuration
hadrnode01
[Transcript of removing the alternate server configuration on hadrnode01; not legible in this copy.]
hadrnode02
[Transcript of removing the alternate server configuration on hadrnode02; not legible in this copy.]
Appendix
Why database creation is required on both nodes?
Database creation is required on both database nodes so that LOGRETAIN can be set to RECOVERY successfully. If the backup from hadrnode01 is restored on hadrnode02 before enabling LOGRETAIN, the database will have the “Backup pending” and “Rollforward pending” flags ON. In this state the database cannot be backed up, because it is in Rollforward pending state, and the HADR standby cannot be started on this database, because the database is in “Backup pending” state.
Quorum device
A quorum device helps a cluster manager make cluster management decisions when the cluster manager's normal decision process does not produce a clear choice. When a cluster manager has to choose between multiple potential actions, it counts how many cluster domain nodes support each of the potential actions and then chooses the action that is supported by the majority of cluster domain nodes. If exactly the same number of cluster domain nodes supports more than one choice, the cluster manager refers to a quorum device to make the choice.
A network quorum device is an IP address to which every cluster domain node can
connect (ping) at all times.
In the current implementation the network route IP is used, assuming that as long as the network segment is up and running, the router will always be available. No special software needs to be installed on the quorum device. It should be reachable (ping-able) from both nodes at all times. During a disaster or node crash, this device must remain reachable from the surviving node (whether it is acting as primary or standby).
Cluster node name resolution
This resolution is required only when IP addresses are used during the HADR configuration instead of hostnames. Automatic name resolution will not happen using the /etc/hosts file. The right hostname should be provided at this prompt, and the names provided here should match the ones provided to db2haicu during the configuration in step 9.
This prompt will not appear if the HADR configuration uses hostnames instead of IP addresses.
Cluster Public/Private networks
Typically servers will have multiple network interfaces. In the current implementation we have two network interfaces on each node (eth3 and eth10). The public network on both nodes uses the eth3 network adapter and the private network uses the eth10 network adapter. Network interfaces with compatible IP addresses should be grouped together. In the current implementation, eth3 on both nodes is grouped into the public cluster network and eth10 into the private cluster network (the network names shown in the original are not legible in this copy).
HADR States
Peer
This state indicates that the primary and standby nodes are in sync and the standby is ready to take over as the primary database.
LocalCatchup
This state indicates that the database has just started and is reading local log files from the log and archive paths. Immediately after this state, the node enters the RemoteCatchupPending state.
RemoteCatchupPending
This state indicates the standby node is waiting for the primary to come up in order to receive log updates. A node can also enter this state if the connection between the primary and the standby breaks during a RemoteCatchup.
RemoteCatchup
This state indicates the standby is applying the log gap between the primary and the standby onto the standby node.
Disconnected
This state indicates the standby node is not available.
Manual takeover
To make the standby node the primary manually, issue the following command at the standby node:
[Takeover command transcript; not legible in this copy.]
If for some reason the automatic failover fails, issue the following command to force a manual takeover:
[Forced takeover command transcript; not legible in this copy. A sketch of both commands follows.]
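The standard DB2 takeover commands (database name is a placeholder):

    # Graceful takeover, issued on the standby node:
    db2 takeover hadr on database <dbname>
    # Forced takeover, if the graceful takeover fails:
    db2 takeover hadr on database <dbname> by force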
Definite NO NO for HADR setup
Never use a “nonrecoverable” load on an HADR setup. It will stop log shipping to the standby, and the only option left to make HADR work again is to take a full backup of the primary node and restore it onto the standby node. This might not be an option at all if the database size is of the order of terabytes, because the backup and restore will take several hours to complete.
DB2_LOAD_COPY_NO_OVERRIDE
HADR replicates load with certain restrictions. Because load data is not embedded in the
log stream, the standby can only get the data from a load copy. Thus load is replicated
only if there is a load copy. The copy must be available to the standby when it replays the
load. The standby may attempt to access the copy any time after the primary completes
the load. It is recommended that a shared copy device such as NFS is used. If you transfer the copy by other means, such as physically transferring a tape, it is recommended that the standby be stopped ("db2 deactivate database") before the primary starts the load. Once the
primary finishes the load and the copy is available to the standby, restart the standby
("db2 activate database"). The standby will then reconnect to the primary and replay the
load.
When the primary does a load with COPY NO option, by default the load is
automatically converted to NONRECOVERABLE. When standby replays the load, the
table is marked as invalid. COPY NO loads can be optionally converted to COPY YES
via the DB2_LOAD_COPY_NO_OVERRIDE registry variable.
Application impact
In the event of a failover from hadrnode01 to hadrnode02 (either due to a disaster or a manual takeover), the application will receive the following message after a successful takeover.
[SQL error message returned to the client after the connection is rerouted; not legible in this copy.]
In order to make this failover look transparent to users, the application needs to catch and handle the above SQL code and either retry internally or ask the user to try again after a few minutes.
Here is one sample code snippet (reconstructed; the SQLCODE values shown are an assumption and depend on the client driver):
try {
    // ... execute the SQL statement / transaction ...
} catch (SQLException ex) {
    // A rerouted connection typically surfaces as SQLCODE -4498 (JDBC) or
    // -30108 (CLI/embedded); treat it as "retry the transaction".
    if (ex.getErrorCode() == -4498 || ex.getErrorCode() == -30108) {
        // retry the transaction here, or ask the user to try again later
    } else {
        throw ex;
    }
}
Upgrading DB2
In the case of a Rolling Upgrade (DB2 fix pack application), both nodes should be brought to the same DB2 fix pack level as soon as possible. In other words, it is not possible to apply the fix pack on one node and then, after a few days, apply it to the second node. As soon as the primary node detects that the standby is on a different DB2 level, the HADR state becomes “Disconnected”.
Rolling Upgrade for DB2 fix pack / OS (P = primary, S = standby):
0. Starting point: P on fpN, HADR connection, S on fpN. Both P and S on fpN.
1. Shut down S, upgrade S, restart S: P on fpN, HADR connection, S on fpN+1. S is now on fpN+1.
2. Takeover on S. The connection is closed at the end of the takeover; the new P is on fpN+1 and the old primary becomes S on fpN.
3. Shut down the new S (the old primary), upgrade it, restart it: P on fpN+1, HADR connection, S on fpN+1. Both are on fpN+1.
4. Optional 2nd takeover to restore the original roles; both remain on fpN+1.
References
1. http://www.ibm.com/developerworks/wikis/display/data/HADR_home
2. http://www.ibm.com/developerworks/db2/bestpractices/
3. http://publib.boulder.ibm.com/infocenter/db2luw/v8/topic/com.ibm.db2.udb.doc/core/c0011585.htm
4. http://www.demos.ibm.com/on_demand/Demo/IBM_Demo_DB2_HADRJan05.html
5. ftp://ftp.software.ibm.com/ps/products/db2/info/vr97/pdf/en_US/DB2RecoveryAndHA-db2hae971.pdf