HADR with TSA using db2haicu (a DB2 v9.7 utility) on AIX/Linux

Date    : 04 Apr 2010
Version : 2.0
Authors : Neeraj Sharma ([email protected])
          Abhishek Iyer ([email protected])

Abstract: This is a step-by-step guide for setting up end-to-end HADR with TSA using the db2haicu utility.

Audience: The target audience for this document is customers.

Nomenclature: In the examples that follow, user inputs, questions prompted by the system/utility, and informational messages from the system/utility are each set in their own text style. This nomenclature does not apply to the cluster status output in section 11, where Online indicates an online node, Offline indicates an offline node (in warm mode), and Failed offline indicates a failed node.

Table of contents

Part 1 : DB2 configuration
  1. Check/Update /etc/hosts file
  2. Check/Update /etc/services file
  3. Enable HADR on databases
  4. Take database backup
  5. Restore database backup on standby node
  6. Check database states
  7. Start HADR on databases
Part 2 : TSA Cluster setup
  8. Prepare cluster nodes first
  9. On Standby node (should be executed on standby node first)
  10. Execute db2haicu on Primary node
  11. Check cluster status
Part 3 : Miscellaneous tasks / Diagnostics
  12. Disable TSA cluster
  13. Enable TSA cluster
  14. db2haicu – maintenance mode
  15. HADR automated failover resource listing using db2pd
  16. HADR status using db2pd
Part 4 : Remove TSA/HADR configuration
  17. Remove TSA cluster
  18. Stop HADR
Part 5 : Automatic client reroute (ACR)
  19. Add ACR configuration
  20. Remove ACR configuration
Appendix
  Why database creation is required on both nodes?
  Quorum device
  Cluster node name resolution
  Cluster Public/Private networks
  HADR States
    Peer
    LocalCatchup
    RemoteCatchupPending
    RemoteCatchup
    Disconnected
  Manual takeover
  Definite NO NO for HADR setup
  DB2_LOAD_COPY_NO_OVERRIDE
  Application impact
  Upgrading DB2
References
Part 1 : DB2 configuration

Overall architecture

- Create the database on both database servers [yes, creation on both servers is required; why it is required is explained in the appendix].
- Apply the configuration settings on both databases.
- Create the database layout and other database objects on just one database, which will initially act as the primary node (say hadrnode01).

1. Check/Update /etc/hosts file

Make sure entries for both cluster hosts (hadrnode01 and hadrnode02, with their IP addresses) exist on both nodes.

2. Check/Update /etc/services file

Make sure entries for the HADR service ports of both instances (the ports that will be used for HADR_LOCAL_SVC and HADR_REMOTE_SVC in section 3) exist on both nodes.

3. Enable HADR on databases

Update the database configuration on hadrnode01 and hadrnode02. The first three parameters make the database recoverable and log index builds (LOGINDEXBUILD and INDEXREC are the usual companions to LOGRETAIN for HADR); the remaining eight are the HADR parameters, with local and remote values mirrored between the two nodes:

    db2 update db cfg for <dbname> using LOGRETAIN RECOVERY
    db2 update db cfg for <dbname> using LOGINDEXBUILD ON
    db2 update db cfg for <dbname> using INDEXREC RESTART
    db2 update db cfg for <dbname> using HADR_LOCAL_HOST <local host>
    db2 update db cfg for <dbname> using HADR_LOCAL_SVC <local service/port>
    db2 update db cfg for <dbname> using HADR_REMOTE_HOST <remote host>
    db2 update db cfg for <dbname> using HADR_REMOTE_SVC <remote service/port>
    db2 update db cfg for <dbname> using HADR_REMOTE_INST <remote instance>
    db2 update db cfg for <dbname> using HADR_TIMEOUT <seconds>
    db2 update db cfg for <dbname> using HADR_SYNCMODE <mode>
    db2 update db cfg for <dbname> using HADR_PEER_WINDOW <seconds>

The values of HADR_TIMEOUT and HADR_PEER_WINDOW should be tuned for a particular setup.

Note: once LOGRETAIN is set to RECOVERY, the database moves to the "Backup pending" state. Verify this on both nodes:

    db2 get db cfg for <dbname> | grep -i "backup pending"

4. Take database backup

hadrnode01:

    db2 backup database <dbname> to <backup dir>

This creates a backup image file in the chosen directory. Make this file available on hadrnode02 for restoration during step 5.

hadrnode02: take a local backup of the database on hadrnode02 as well, so that its copy also comes out of the "Backup pending" state (see the appendix for why the database exists on both nodes).

The HADR DB CFG parameters mentioned in section 3 (4th to 11th) should be set after doing the Backup.
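The eight HADR-related UPDATE DB CFG statements in section 3 follow a fixed pattern, so they can be generated once and reviewed before being applied. A minimal sketch; the database name SAMPLE, host names, ports 55001/55002, instance name db2inst1, and the HADR_TIMEOUT/HADR_SYNCMODE/HADR_PEER_WINDOW values are all placeholders, not this document's values:

```shell
#!/bin/sh
# Emit the eight HADR-related UPDATE DB CFG statements for one node.
# Review the output, then apply it with "db2 -tvf <file>".
# Every value passed in below is a placeholder for this sketch.
gen_hadr_cfg() {
    db=$1 lhost=$2 lsvc=$3 rhost=$4 rsvc=$5 rinst=$6
    cat <<EOF
UPDATE DB CFG FOR $db USING HADR_LOCAL_HOST $lhost;
UPDATE DB CFG FOR $db USING HADR_LOCAL_SVC $lsvc;
UPDATE DB CFG FOR $db USING HADR_REMOTE_HOST $rhost;
UPDATE DB CFG FOR $db USING HADR_REMOTE_SVC $rsvc;
UPDATE DB CFG FOR $db USING HADR_REMOTE_INST $rinst;
UPDATE DB CFG FOR $db USING HADR_TIMEOUT 120;
UPDATE DB CFG FOR $db USING HADR_SYNCMODE NEARSYNC;
UPDATE DB CFG FOR $db USING HADR_PEER_WINDOW 300;
EOF
}

# hadrnode01's local values are hadrnode02's remote values, and vice versa:
gen_hadr_cfg SAMPLE hadrnode01 55001 hadrnode02 55002 db2inst1
```

Calling the same function with the host/port pairs swapped produces the statements for hadrnode02, which keeps the two configurations mirror-consistent.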
The first 3 parameters should be set before the Backup itself.

5. Restore database backup on standby node

hadrnode02:

    db2 restore database <dbname> from <backup dir>

Answer y to the prompt asking whether to continue the restore over the existing database.

The HADR DB CFG parameters mentioned in section 3 (4th to 11th) should be set after doing the Restore.

6. Check database states

Verify the state flags on both nodes:

    db2 get db cfg for <dbname> | grep -i pending

On hadrnode01 the database should show no pending flags. On hadrnode02 the restored copy should be in "Rollforward pending" state, which is exactly what HADR requires before the database can be started as a standby.

7. Start HADR on databases

Always start the standby first.

hadrnode02:

    db2 start hadr on database <dbname> as standby

hadrnode01:

    db2 start hadr on database <dbname> as primary

Part 2 : TSA Cluster setup

RSCT version 2.5.3.3 and TSA version 3.1.0.0 were used for this implementation.

8. Prepare cluster nodes first

Execute the following command as root on both nodes; all other commands should be executed as the instance owner:

    preprpnode hadrnode01 hadrnode02

This prepares RSCT peer-domain security between the two nodes so that db2haicu can create the cluster domain.

9. On Standby node (should be executed on standby node first)

As the instance owner, run db2haicu on the standby node:

    db2haicu

The utility walks through an interactive dialog. It asks you to create a new cluster domain and give it a name, to add both cluster nodes (hadrnode01 and hadrnode02) to the domain, to configure a network quorum device (the IP address of a host that is always reachable, such as the network router; see "Quorum device" in the appendix), and to group the network interfaces of both nodes into public and private networks (see "Cluster Public/Private networks" in the appendix). Confirm each step; db2haicu then creates the domain and registers the standby instance and its HADR database with the cluster.

10. Execute db2haicu on Primary node
Run db2haicu as the instance owner on the primary node:

    db2haicu

The utility detects the domain created in step 9 and asks you to validate the same settings (quorum device, public and private networks). It then configures the cluster objects for the HADR pair: it creates the resource group for the HADR database, asks whether you want a virtual IP address for the database (supply the IP address and netmask if clients should always follow the primary), and brings the resources online. On completion, HADR failover for the database is under cluster control.

11. Check cluster status

On successful completion of the commands, the cluster domain and node status can be checked with the following commands [the command output should be the same across all nodes in the cluster]:

    lsrpdomain
    lsrpnode
    lssam

lssam lists the resource groups and resources created by db2haicu. The instance resources on both nodes and the HADR database resource group should show as Online, with the standby side of the HADR pair Offline, and the virtual IP (ServiceIP) resource, if configured, Online on the primary node.

Now the HADR setup is all set for automatic failover in case of disaster. The TSA cluster starts automatically with absolutely no manual intervention.

Part 3 : Miscellaneous tasks / Diagnostics

12. Disable TSA cluster

This option is for scheduled maintenance of the setup. It only stops TSA; HADR will remain active (but failover will not happen). Execute this on the primary node as the instance owner:

    db2haicu -disable

db2haicu asks for confirmation and then takes the cluster resources for the instance out of automated control.

13. Enable TSA cluster

Launch this command from the primary node as the instance owner:

    db2haicu

When db2haicu detects that high availability is disabled for the instance, it asks whether you want to re-enable it, re-validates the cluster configuration, and brings the resources back under automated control.

14. db2haicu – maintenance mode

When the instance is already under cluster control, running db2haicu presents a menu instead of the initial dialog. This mode is called maintenance mode. You can add/remove nodes, databases, virtual IPs, network interfaces, and the quorum device. This command should be executed from the primary database node.
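Part 3 is about spotting trouble quickly, and the lssam listing from section 11 lends itself to a scripted health check. A minimal sketch; the helper and the sample text below are illustrative only, not real lssam output or real resource names:

```shell
#!/bin/sh
# Count resources that lssam reports in the "Failed offline" state; anything
# above zero means the cluster needs attention. Feed it lssam output on
# stdin. The sample below is made up for illustration.
failed_resources() {
    grep -c 'Failed offline'
}

sample='Online IBM.ResourceGroup:db2_instance_rg Nominal=Online
Online IBM.Application:db2_hadrnode01_0-rs
Failed offline IBM.Application:db2_hadrnode02_0-rs'

n=$(printf '%s\n' "$sample" | failed_resources)
echo "failed resources: $n"
```

On a live system the pipeline would be `lssam | failed_resources`, which makes the check easy to drop into a monitoring cron job.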
15. HADR automated failover resource listing using db2pd

This command can be issued from any of the database nodes:

    db2pd -ha

The output lists the cluster domain, the quorum device, the cluster nodes with their states, the network interfaces grouped into the public and private networks, and the resource group created for the HADR database, including the virtual IP resource if one was configured.

16. HADR status using db2pd

This command can be issued from any of the database nodes:

    db2pd -db <dbname> -hadr

The output shows the HADR role (Primary/Standby), the HADR state (for example Peer), the synchronization mode, the connection status, heartbeats missed, the log positions on both ends, and the log gap between primary and standby.

Part 4 : Remove TSA/HADR configuration

In order to remove the TSA configuration settings (cluster domain and resource groups) completely from the cluster:

17. Remove TSA cluster

Run the following as the instance owner on the primary node:

    db2haicu -delete

db2haicu asks for confirmation and then deletes the resource groups and the cluster domain it created.
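For scripting around the db2pd output shown in section 16, the role and state can be extracted mechanically rather than read by eye. A minimal sketch; the sample text only imitates the v9.7 output layout and is not a real capture:

```shell
#!/bin/sh
# Print "<Role> <State>" (e.g. "Primary Peer") from "db2pd -db <db> -hadr"
# style output read on stdin: find the header line that starts with "Role"
# and report the first two columns of the line after it.
hadr_role_state() {
    awk '/^Role/ { getline; print $1, $2; exit }'
}

sample='Database Partition 0 -- Database SAMPLE -- Active -- HADR Information:
Role    State                SyncMode HeartBeatsMissed
Primary Peer                 Nearsync 0'

printf '%s\n' "$sample" | hadr_role_state
```

On a live node the pipeline would be `db2pd -db <dbname> -hadr | hadr_role_state`, which gives a one-line answer suitable for monitoring scripts.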
18. Stop HADR

hadrnode02:

    db2 deactivate database <dbname>
    db2 stop hadr on database <dbname>

hadrnode01:

    db2 stop hadr on database <dbname>

Part 5 : Automatic client reroute (ACR)

Automatic client reroute is a DB2 Database for Linux, UNIX, and Windows feature that allows client applications to recover from a loss of communication with the server, so that the application can continue its work with minimal interruption. Automatic client reroute can be accomplished only if an alternate server has been specified prior to the loss of communication. A client must connect to the server at least once in order to be able to use the ACR feature, because the alternate server information is returned to the client on connect. No configuration changes on the client side are required.

19. Add ACR configuration

Each node names the other node as its alternate server.

hadrnode01:

    db2 update alternate server for database <dbname> using hostname hadrnode02 port <port>

hadrnode02:

    db2 update alternate server for database <dbname> using hostname hadrnode01 port <port>

20. Remove ACR configuration

hadrnode01:

    db2 update alternate server for database <dbname> using hostname NULL port NULL

hadrnode02:

    db2 update alternate server for database <dbname> using hostname NULL port NULL

Appendix

Why database creation is required on both nodes?

Database creation is required on both db nodes so that LOGRETAIN can be set to RECOVERY successfully. If the backup from hadrnode01 is restored on hadrnode02 before enabling LOGRETAIN, the database will have the "Backup pending" and "Rollforward pending" flags ON. In this state the database cannot be backed up because it is in Rollforward pending state, and the HADR standby cannot be started on this database because the database is in "Backup pending" state.
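In a two-node domain like this one, the cluster manager decides between competing actions by majority vote among the domain nodes and refers an exact tie to the network quorum device. A minimal sketch of that decision rule, for illustration only (this is a toy model, not RSCT code):

```shell
#!/bin/sh
# Toy model of the tie-break: each candidate action carries a count of
# supporting cluster nodes; the majority wins, and an exact tie is
# referred to the quorum device.
decide() {
    a=$1 b=$2
    if [ "$a" -gt "$b" ]; then
        echo "take action A"
    elif [ "$b" -gt "$a" ]; then
        echo "take action B"
    else
        echo "refer to quorum device"
    fi
}

decide 2 0
decide 1 1
```

With only two nodes, every disagreement is a 1-1 tie, which is why the guide insists that the quorum IP stay pingable from the surviving node during a failure.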
Quorum device

A quorum device helps a cluster manager make cluster management decisions when the cluster manager's normal decision process does not produce a clear choice. When a cluster manager has to choose between multiple potential actions, it counts how many cluster domain nodes support each of the potential actions, and then chooses the action that is supported by the majority of cluster domain nodes. If exactly the same number of cluster domain nodes supports more than one choice, the cluster manager refers to a quorum device to make the choice.

A network quorum device is an IP address to which every cluster domain node can connect (ping) at all times. In the current implementation the network route IP is used, on the assumption that as long as the network segment is up and running, the router will always be available. No special software needs to be installed on the quorum device. It should be reachable (ping-able) from both nodes at all times. During a disaster or node-crash event, this device must remain reachable from the surviving node (whether it is acting as primary or standby).

Cluster node name resolution

This resolution step is required only when IP addresses, rather than hostnames, are used during HADR configuration. Automatic name resolution will not happen using the /etc/hosts file, so the right hostname should be provided at this prompt. Names provided at this prompt should match the ones provided to db2haicu during configuration in step 9. This prompt will not appear if the HADR configuration uses hostnames instead of IP addresses.

Cluster Public/Private networks

Typically servers have multiple network interfaces. In the current implementation we have 2 network interfaces on each node [eth3 and eth10]. The public network on both nodes uses the eth3 network adapter and the private network uses the eth10 adapter. Network interfaces with compatible IP addresses should be grouped together: in the current implementation, the eth3 interfaces of both nodes are grouped into the public network and the eth10 interfaces into the private network.

HADR States

Peer

This state indicates that the primary and standby nodes are in sync and the standby is ready to take over as the primary database.

LocalCatchup

This state indicates that the database has just started and is reading local log files from the log and archive paths. Immediately after this state, the node enters the RemoteCatchupPending state.

RemoteCatchupPending

This state indicates the standby node is waiting for the primary to come up in order to receive any log updates. A node can also enter this state if the connection between primary and standby breaks during a RemoteCatchup.

RemoteCatchup

This state indicates the standby is applying the log gap between the primary and the standby onto the standby node.

Disconnected

This state indicates the standby node is not available.

Manual takeover

To make the standby node the primary manually, issue the following command at the standby node:

    db2 takeover hadr on database <dbname>

If for some reason automatic failover fails, issue the following command to force a manual takeover:

    db2 takeover hadr on database <dbname> by force

Definite NO NO for HADR setup

Never use a "nonrecoverable" load on an HADR setup. It will stop log shipping to the standby, and the only option left to make HADR work again is to take a full backup of the primary node and restore it onto the standby node. This might not be an option at all if the database size is of the order of terabytes, because the backup and restore will take several hours to complete.

DB2_LOAD_COPY_NO_OVERRIDE

HADR replicates load with certain restrictions. Because load data is not embedded in the log stream, the standby can only get the data from a load copy. Thus a load is replicated only if there is a load copy. The copy must be available to the standby when it replays the load. The standby may attempt to access the copy any time after the primary completes the load. It is recommended that a shared copy device such as NFS is used.
If you transfer the copy by means like physically transferring a tape, it is recommended that the standby be stopped ("db2 deactivate database") before the primary starts the load. Once the primary finishes the load and the copy is available to the standby, restart the standby ("db2 activate database"). The standby will then reconnect to the primary and replay the load.

When the primary does a load with the COPY NO option, by default the load is automatically converted to NONRECOVERABLE. When the standby replays the load, the table is marked as invalid. COPY NO loads can optionally be converted to COPY YES via the DB2_LOAD_COPY_NO_OVERRIDE registry variable.

Application impact

In the event of a failover from hadrnode01 to hadrnode02 (either due to disaster or manual takeover), the application receives SQL30108N ("A connection failed but has been re-established", SQLSTATE 08506) once, after the successful takeover.

In order to make this failover look transparent to users, the application needs to catch and handle the above SQL code and retry internally, or request the user to try again after a few minutes. Here is one sample of the retry logic:

    try {
        // application work
    } catch (SQLException e) {
        if (e.getErrorCode() == -30108) {
            // connection was rerouted to the alternate server;
            // roll back and retry the transaction
            retry();
        } else {
            throw e;
        }
    }

Upgrading DB2

In the case of a Rolling Upgrade (DB2 fix pack application), both nodes should be brought to the same DB2 fix pack level as soon as possible. In other words, it is not possible to apply the fix pack on one node and then, after a few days, apply it to the second node. As soon as the primary node detects that the standby is on a different DB2 level, the HADR state becomes "Disconnected".

Rolling Upgrade for DB2 fix pack, OS (P = primary, S = standby, fpN = current fix pack level):

Step 0: P on fpN <-- HADR connection --> S on fpN. Both P and S on fpN.
Step 1: Shut down S, upgrade S, restart S. P on fpN <-- HADR connection --> S on fpN+1.
Step 2: Takeover on S. The connection is closed at the end of the takeover; the new P is on fpN+1 and the old primary becomes S on fpN.
Step 3: Shut down the new S (the old primary), upgrade it, restart it. S on fpN+1 <-- HADR connection --> P on fpN+1. Both on fpN+1.
Step 4: Optional 2nd takeover to restore the original roles. Both on fpN+1.

References

1. http://www.ibm.com/developerworks/wikis/display/data/HADR_home
2. http://www.ibm.com/developerworks/db2/bestpractices/
3. http://publib.boulder.ibm.com/infocenter/db2luw/v8/topic/com.ibm.db2.udb.doc/core/c0011585.htm
4. http://www.demos.ibm.com/on_demand/Demo/IBM_Demo_DB2_HADRJan05.html
5. ftp://ftp.software.ibm.com/ps/products/db2/info/vr97/pdf/en_US/DB2RecoveryAndHA-db2hae971.pdf