
Automated Instance Failover Using IBM DB2 High Availability Instance Configuration Utility (db2haicu) on Shared Storage (AIX/Linux)
Date: March 18, 2010
Version: 2.0
Authors: Abhishek Iyer ([email protected]), Neeraj Sharma ([email protected])
Abstract: This is a step-by-step guide for setting up an end-to-end HA DB2
instance on shared storage with TSA using the db2haicu utility on Linux.
The same procedure applies to AIX when the equivalent AIX commands are used.
Table of Contents
1. Introduction and Overview
2. Before we begin
   2.1 Hardware Configuration used
   2.2 Software Configuration used
   2.3 Overall Architecture
       2.3.1 Hardware Topology
       2.3.2 Network Topology
3. Pre Configuration steps
   3.1 Configuring the /etc/hosts file
   3.2 Configuring the db2nodes.cfg file
   3.3 Configuring the /etc/services file
4. Configuring the Standby Node
   4.1 NFS Server settings (configuring /etc/exports file)
   4.2 NFS Client settings (updating /etc/fstab file)
   4.3 Storage failover settings (updating /etc/fstab file)
5. Configuring a DB2 Instance for HA using db2haicu
   5.1 Procedure of running db2haicu
   5.2 Appendix for db2haicu
6. Configuring the NFS Server for HA
7. Post Configuration steps (at the customer site)
8. Cluster monitoring
9. Lessons learnt during HA implementations
   9.1 Hostname conflict
   9.2 Prevent auto-mount of shared file systems
   9.3 Preventing file system consistency checks at boot time
1. Introduction and Overview
This document is intended to serve as an end-to-end guide for configuring a database
instance as an HA (highly available) instance across shared storage. A highly available
database instance across shared storage is typically needed in a Balanced Warehouse
(BCU) environment, where one standby node acts as a failover node for all the data and
admin nodes. This is discussed further in the Overall Architecture section below.
The implementation discussed in this document is based on the DB2 High Availability
(HA) feature and the DB2 High Availability Instance Configuration Utility (db2haicu),
which is available in DB2 Version 9.5 and later. This utility uses the Tivoli System
Automation (TSA) cluster manager to configure the shared database instance. A user can
use this utility in the following two modes:
• Interactive Mode: The user would need to provide all the required inputs step by
step as prompted on the command line by the utility.
• XML Mode: In this mode all the inputs need to be written into an XML file which
the utility would parse to extract the required data.
This document explains how to configure a shared instance on shared storage using the
step by step interactive mode (in section 5 below).
2. Before we begin
It is important that you go through the setup information before moving on to the actual
HA configuration steps. The hardware used in the current implementation is a D5100
Balanced Warehouse (BCU) which has the following nodes (Linux boxes):
• 1 Admin node (admin01)
• 3 Data nodes (data01, data02 and data03)
• 1 Standby node (stdby01)
• 1 Management node (mgmt01)
2.1 Hardware Configuration used
• Each node is an x3650 server with a Quad-Core Intel Xeon Processor X5470 (3.33 GHz).
• All nodes have 32 GB of memory except the management node, which has 8 GB.
• The Admin and Data nodes each have 24 external hard disks of 146 GB capacity.
2.2 Software Configuration used
• DB2 Linux Enterprise Server Edition 9.7.0.1
• DWE 9.7.0.0
• IBM Tivoli System Automation 3.1.0.0
Operating System: SUSE Linux Enterprise Server, VERSION = 10, PATCHLEVEL = 2
Kernel Information: 2.6.16.60-0.21-smp
2.3 Overall Architecture
This section describes the overall architecture, in terms of hardware and network topology,
of the highly available database cluster under implementation.
2.3.1 Hardware Topology
In a typical D5100 Balanced Warehouse environment, the standby node is designed to be
the failover node for the admin node as well as the data nodes. The management node is
not a part of the HA cluster, as it is only used to manage the other nodes using the cluster
system management utilities. Hence we’ll not be referring to the management node
henceforth in this document.
As mentioned in the hardware configuration above, each of the admin and data nodes
have their respective storage disks which are connected through Fiber Optic cables
(shown by red lines in Figure 1 below). The Standby node would be configured to take
control of the storage mount points of the failed node in the event of a failover (shown by
red dotted lines in Figure 1 below).
Even though the database instance resides across all of the admin and data nodes in a
balanced warehouse, any external application would typically connect only to the admin
node, which internally acts as the coordinator node. An NFS server runs on the admin
node and all other data nodes are NFS clients. The communication between the admin
and data nodes takes place using the Gigabit Ethernet network (shown in purple lines in
Figure 1 below). In the event that a data node fails over to the standby node, the standby
node must start functioning as an NFS client and in case the admin node fails over, the
standby node must function as the NFS server and take the role of the coordinator node.
The step by step configuration of the standby node for each of these failover scenarios is
described in detail in the following sections.
Figure 1: Hardware Topology
2.3.2 Network Topology
A D5100 Balanced Warehouse typically has the following networks:
• Cluster management network. This network supports the management, monitoring
and administration of the cluster. The management node (mgmt01) uses this network to
manage the other nodes using the Cluster Systems Management utilities. This network may
or may not be made highly available. In the current implementation this network is on
subnet 192.168.111.0 (shown in brown lines in Figure 1 above) and we would be making
it highly available.
• Baseboard management controller. Additionally there is a service processor network
that is linked to this network. This service processor, called the baseboard management
controller (BMC), provides alerting, monitoring, and control of the servers. This is port 1
of the integrated network ports, and this port is shared with the cluster management
network.
• DB2 fast communications manager (FCM) network. The DB2 FCM network is the
network which is used for internal database communication between database partitions
on different physical servers. This Gigabit Ethernet network supports FCM traffic as well
as the NFS instance directory used in a DB2 with Database Partitioning Feature (DPF)
configuration. This network is made highly available as all the data transfers between
different nodes happen over this network. In the current implementation, this network
exists on subnet 172.16.10.0 (shown in purple lines in Figure 1 above).
• Corporate network (optional). This network allows external applications and clients
to access and query the database. Typically, external applications would only require
connecting to the admin node which would internally coordinate with all the other data
nodes on the FCM network, but in some cases for more flexibility, data nodes are also
made reachable on the corporate network. In the current implementation, only the admin
and standby nodes are reachable on the corporate network on the subnet 192.168.253.0
(shown with green lines in Figure 1 above). Standby is made available on the corporate
network to provide an alternate route for the external applications in case the admin node
goes down.
3. Pre Configuration steps
There are some pre-configuration steps that must be done in order to ensure that the
HA configuration is successful.
3.1 Configuring the /etc/hosts file:
All the nodes in the cluster must have similar entries in the /etc/hosts file to ensure that
all hosts are mutually recognizable. Please make sure that the entries follow the format
shown below.
/etc/hosts file contents for all nodes on cluster
Entries for all networks for each database node should be exactly the same on all nodes in
the cluster.
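The original listing is not reproduced here. A minimal sketch of the expected format, using the host names from this implementation and placeholder addresses on the FCM subnet (172.16.10.0); the actual addresses and any additional per-network entries must be taken from the real setup:

172.16.10.101   admin01
172.16.10.102   data01
172.16.10.103   data02
172.16.10.104   data03
172.16.10.105   stdby01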
3.2 Configuring the db2nodes.cfg file:
Depending on the number of database partitions, the db2nodes.cfg file under the ~/sqllib/
directory must have contents in the format shown below, identical across all nodes. Typically in a
D5100 Balanced Warehouse, the db2nodes.cfg file is present under the /db2home directory,
which is NFS-shared from admin01:/shared_db2home on all the nodes.
In the current implementation there are a total of 13 partitions: 1 on admin and 4 on each of the
three data nodes. Hence the /db2home/bculinux/sqllib/db2nodes.cfg file looks like the example
sketched below:
db2nodes.cfg file contents
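The original listing is not reproduced here. A minimal sketch of the expected format (partition number, host name, logical port), assuming partition 0 on admin01 and four logical partitions on each data node as described above; an optional fourth column (netname) may also appear in the real file:

0  admin01 0
1  data01  0
2  data01  1
3  data01  2
4  data01  3
5  data02  0
6  data02  1
7  data02  2
8  data02  3
9  data03  0
10 data03  1
11 data03  2
12 data03  3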
3.3 Configuring the /etc/services file
All the nodes in the cluster must have the following entries in the /etc/services file to
enable both inter-partition and intra-partition DB2 communication.
• The first entry below (db2c_bculinux 50001/tcp) corresponds to the port number that is
used for external communication with the node.
• The following entries (from DB2_bculinux 60000/tcp to DB2_bculinux_END
60012/tcp) correspond to the port numbers used for intra-partition communication for a
node. In the current example, since the standby node would be configured to take over
the admin node and the 3 data nodes with 4 partitions each, the maximum number of
partitions that would run on the standby is 13. Hence 13 ports, from 60000 to 60012, are
reserved in this particular case. Also, since a BCU demands that all the nodes have the
same configuration, the same 13 ports must be reserved on all nodes in the cluster.
• Please ensure that all these port numbers are unique in the /etc/services file and are not
used for any other communication.
DB2 port settings in /etc/services
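The original listing is not reproduced here. Based on the port numbers described above, the entries follow this pattern (the service-name suffixes between the first and last FCM entries are a sketch):

db2c_bculinux      50001/tcp
DB2_bculinux       60000/tcp
DB2_bculinux_1     60001/tcp
DB2_bculinux_2     60002/tcp
...
DB2_bculinux_11    60011/tcp
DB2_bculinux_END   60012/tcp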
4. Configuring the Standby Node
This section describes the settings needed so that all the storage in the cluster is visible and
available to the standby node, and mountable there in the event of a failover. It also covers
the settings needed on the standby node so that it can act as the NFS server (in case the
admin node goes down) and/or as an NFS client (in case any node goes down).
4.1 NFS Server settings (configuring /etc/exports file)
As mentioned before, in a D5100 Balanced Warehouse, the DB2 instance-owning admin
node (admin01) acts as an NFS server for all the nodes in the cluster (including itself)
which act as NFS clients. Typically there are two directories that are NFS shared across all
the nodes:
• /shared_db2home: The DB2 Instance home directory
• /shared_home: User home directory for all non-DB2 users.
The NFS server settings in a D5100 Balanced Warehouse are:
rw,sync,fsid=X,no_root_squash
Open the /etc/exports file of the admin node to confirm this. In case the admin node goes
down, the standby node must be able to take over as the NFS server for all the nodes.
Hence, manually edit the /etc/exports file on the standby node so that it has entries identical
to those of the admin node. A sketch of such entries is shown below.
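The actual entries were not recoverable from the source. A minimal sketch using the export options given above (the host/netmask restrictions and fsid values are placeholders):

/shared_db2home  *(rw,sync,fsid=0,no_root_squash)
/shared_home     *(rw,sync,fsid=1,no_root_squash)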
4.2 NFS Client settings (updating /etc/fstab file)
Since all the nodes in the cluster (including admin node) act as NFS clients, the standby
node must be configured to act as an NFS client in the event of a failover.
The NFS client settings in a D5100 Balanced Warehouse are:
rw,hard,bg,intr,suid,tcp,nfsvers=3,timeo=600,nolock
Check the /etc/fstab file on all the nodes (including admin) for the NFS client entries for
/db2home and /home, then manually edit the /etc/fstab file on the standby node and add
identical entries. A sketch of what these entries look like is shown below.
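The actual entries were not recoverable from the source. A minimal sketch using the mount options given above (admin01 is later replaced by the HA-NFS virtual IP in the 'Configuring the NFS Server for HA' section):

admin01:/shared_db2home  /db2home  nfs  rw,hard,bg,intr,suid,tcp,nfsvers=3,timeo=600,nolock  0 0
admin01:/shared_home     /home     nfs  rw,hard,bg,intr,suid,tcp,nfsvers=3,timeo=600,nolock  0 0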
Create the /db2home directory on standby and manually mount both /home and
/db2home on standby.
4.3 Storage failover settings (updating /etc/fstab file)
This section describes the settings that need to be done to ensure failover of the storage
mount points to the standby node. First we'll check whether all file systems are visible and
available to standby, and then we'll configure standby to make them mountable in the
event of a failover.
Verify that all the logical volumes from all the nodes are visible to standby:
As root, run an LVM listing command (for example, lvscan) on standby. You should see a
list of all the logical volumes from all the nodes in the HA group.
Logical volume setup on standby node
Verify that all these logical volumes are available on standby:
As root, run lvdisplay (or an equivalent command) on standby. For every logical volume
listed, the 'LV Status' field should read 'available'.
Logical volume details
If any logical volume is marked as ’not available’, reboot the standby node and check
again.
Define the file systems on standby node and configure the /etc/fstab file so that it is
able to mount respective storage in the event of a failover:
Since all the admin and data nodes can failover to the standby node, all the file systems
and /etc/fstab file on the standby node must be configured to be identical to that of
admin and data nodes. In the current example we define the following file systems on
standby:
• Define the file systems for the DB2 instance home directory, the user home
directory, and the NFS control directory (which are present in admin):
• Define the file system for staging space (present in admin):
• Define file systems for database partitions 0 – 12 (present in admin and data nodes):
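The original command listings for the three items above could not be recovered from the source. As a rough sketch of the pattern, each file system is created with mkfs.ext3 on its logical volume and a matching mount point is created. The vgnfs names below match the logical volumes referenced in the NFS section later in this document; the partition logical-volume name and the /db2fs mount point are placeholders inferred from the db2mnt resource names shown by lssam:

mkfs.ext3 /dev/vgnfs/lvnfsdb2home          # DB2 instance home (/shared_db2home)
mkdir /shared_db2home
mkfs.ext3 /dev/datavg01/lvdata01p01        # database partition file system (placeholder name)
mkdir -p /db2fs/bculinux/NODE0001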
Similarly define the file systems for partitions 03 – 12 and add entries to the /etc/fstab file.
In the current implementation, the entries added to /etc/fstab follow the pattern sketched below:
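The original /etc/fstab listing could not be recovered from the source. A minimal sketch of the style of entry, in which device names are placeholders, the /db2fs mount point is inferred from the db2mnt resource names shown by lssam later, and every ext3 entry carries the 'noauto' option discussed next:

/dev/vgnfs/lvnfsdb2home    /shared_db2home           ext3  noauto  0 0
/dev/vgnfs/lvnfshome       /shared_home              ext3  noauto  0 0
/dev/datavg01/lvdata01p01  /db2fs/bculinux/NODE0001  ext3  noauto  0 0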
It is extremely important that the 'noauto' mount option is used for each of the ext3 file
systems in the /etc/fstab file. This prevents the system from auto-mounting the file
systems after a reboot; TSA takes care of mounting the required file systems based
on which nodes are up. For instance, initially the admin and data nodes would have all their
respective file systems manually mounted. If data01 goes down, TSA will mount
all of its mount points that need to be transferred on standby. Once data01 comes back up and
standby is taken offline, TSA will again ensure that the mount control is
transferred back to data01.
As part of the D5100 Balanced Warehouse configuration, the admin and data nodes
would already have this 'noauto' option set in their respective /etc/fstab files. If
not, please set the 'noauto' option in /etc/fstab across all nodes.
5. Configuring a DB2 Instance for HA using db2haicu
Now that all the required pre-configuration is complete, we'll configure the DB2 instance
for high availability using the db2haicu utility. Recall that the db2haicu utility can be run in
two modes, namely the XML mode and the interactive mode. This document covers the
configuration using the step-by-step interactive mode on the command line.
5.1 Procedure of running db2haicu
1. Prepare cluster nodes for db2haicu: Run the preparation command on all admin, data and
standby nodes as root (a sketch follows):
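The exact command could not be recovered from the source. For a TSA-managed db2haicu configuration, this preparation step is typically the RSCT preprpnode command run as root against all cluster hosts; as an assumed sketch for this cluster:

preprpnode admin01 data01 data02 data03 stdby01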
On a D5100 Balanced Warehouse, you can also run this as a single command from the mgmt
node as root, using the cluster management utility 'dsh', as sketched below:
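Again as an assumed sketch (this presumes CSM's dsh is configured to reach all cluster nodes with the -a option):

dsh -a "preprpnode admin01 data01 data02 data03 stdby01"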
2. Activate the database: On the admin node, as instance owner (in this case bculinux), issue:
db2 activate db BCUDB
BCUDB is the database name used in the current implementation.
3. Run db2haicu: On the Admin node, as instance owner, issue the db2haicu command. Once
you run db2haicu, you'll be prompted step by step for the inputs required for the HA
configuration. Below is a sample execution of db2haicu from the current
implementation. Please note that in the current implementation, bond0 had been created
using two network ports on the FCM network (on each node) and bond1 was created
using two network ports on the Cluster network (on each node). Typically, this utility is
run in-house, i.e. before the system is shipped to the customer, so the corporate network
isn't available at that point. Once the setup is delivered to the customer, additional configuration
needs to be done to make the corporate network highly available, which is covered in
the 'Post Configuration steps' section.
Text in RED indicates user inputs.
Text in BLUE indicates questions prompted by system/utility.
Text in BLACK indicates information message by system/utility.
bculinux@admin01:~> db2haicu
Welcome to the DB2 High Availability Instance Configuration Utility
(db2haicu).
You can find detailed diagnostic information in the DB2 server
diagnostic log file called db2diag.log. Also, you can use the utility
called db2pd to query the status of the cluster domains you create.
For more information about configuring your clustered environment using
db2haicu, see the topic called 'DB2 High Availability Instance
Configuration Utility (db2haicu)' in the DB2 Information Center.
db2haicu determined the current DB2 database manager instance is
bculinux. The cluster configuration that follows will apply to this
instance.
db2haicu is collecting information on your current setup. This step may
take some time as db2haicu will need to activate all databases for the
instance to discover all paths ...
When you use db2haicu to configure your clustered environment, you
create cluster domains. For more information, see the topic 'Creating a
cluster domain with db2haicu' in the DB2 Information Center. db2haicu
is searching the current machine for an existing active cluster
domain ...
db2haicu did not find a cluster domain on this machine. db2haicu will
now query the system for information about cluster nodes to create a
new cluster domain ...
db2haicu did not find a cluster domain on this machine. To continue
configuring your clustered environment for high availability, you must
create a cluster domain; otherwise, db2haicu will exit.
Create a domain and continue? [1]
1. Yes
2. No
1
Create a unique name for the new domain:
ha_domain
Nodes must now be added to the new domain.
How many cluster nodes will the domain ha_domain contain?
5
Enter the host name of a machine to add to the domain:
admin01
Enter the host name of a machine to add to the domain:
stdby01
Enter the host name of a machine to add to the domain:
data01
Enter the host name of a machine to add to the domain:
data02
Enter the host name of a machine to add to the domain:
data03
db2haicu can now create a new domain containing the 5 machines that you
specified. If you choose not to create a domain now, db2haicu will
exit.
Create the domain now? [1]
1. Yes
2. No
1
Creating domain ha_domain in the cluster ...
Creating domain ha_domain in the cluster was successful.
You can now configure a quorum device for the domain. For more
information, see the topic "Quorum devices" in the DB2 Information
Center. If you do not configure a quorum device for the domain, then a
human operator will have to manually intervene if subsets of machines
in the cluster lose connectivity.
Configure a quorum device for the domain called ha_domain? [1]
1. Yes
2. No
1
The following is a list of supported quorum device types:
1. Network Quorum
Enter the number corresponding to the quorum device type to be used:
[1]
1
Specify the network address of the quorum device:
172.16.10.1
Refer to the appendix for details on quorum device
Configuring quorum device for domain ha_domain ...
Configuring quorum device for domain ha_domain was successful.
The cluster manager found 10 network interface cards on the machines in
the domain. You can use db2haicu to create networks for these network
interface cards. For more information, see the topic 'Creating networks
with db2haicu' in the DB2 Information Center.
Create networks for these network interface cards? [1]
1. Yes
2. No
1
Enter the name of the network for the network interface card: bond0 on
cluster node: admin01
1. Create a new public network for this network interface card.
2. Create a new private network for this network interface card.
Enter selection:
2
Refer to the appendix below for more details
Are you sure you want to add the network interface card bond0 on
cluster node admin01 to the network db2_private_network_0? [1]
1. Yes
2. No
1
Adding network interface card bond0 on cluster node admin01 to the
network db2_private_network_0 ...
Adding network interface card bond0 on cluster node admin01 to the
network db2_private_network_0 was successful.
Enter the name of the network for the network interface card: bond0 on
cluster node: data01
1. db2_private_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond0 on
cluster node data01 to the network db2_private_network_0? [1]
1. Yes
2. No
1
Adding network interface card bond0 on cluster node data01 to the
network db2_private_network_0 ...
Adding network interface card bond0 on cluster node data01 to the
network db2_private_network_0 was successful.
Enter the name of the network for the network interface card: bond0 on
cluster node: data02
1. db2_private_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond0 on
cluster node data02 to the network db2_private_network_0? [1]
1. Yes
2. No
1
Adding network interface card bond0 on cluster node data02 to the
network db2_private_network_0 ...
Adding network interface card bond0 on cluster node data02 to the
network db2_private_network_0 was successful.
Enter the name of the network for the network interface card: bond0 on
cluster node: data03
1. db2_private_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond0 on
cluster node data03 to the network db2_private_network_0? [1]
1. Yes
2. No
1
Adding network interface card bond0 on cluster node data03 to the
network db2_private_network_0 ...
Adding network interface card bond0 on cluster node data03 to the
network db2_private_network_0 was successful.
Enter the name of the network for the network interface card: bond0 on
cluster node: stdby01
1. db2_private_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond0 on
cluster node stdby01 to the network db2_private_network_0? [1]
1. Yes
2. No
1
Adding network interface card bond0 on cluster node stdby01 to the
network db2_private_network_0 ...
Adding network interface card bond0 on cluster node stdby01 to the
network db2_private_network_0 was successful.
Enter the name of the network for the network interface card: bond1 on
cluster node: stdby01
1. db2_private_network_0
2. Create a new public network for this network interface card.
3. Create a new private network for this network interface card.
Enter selection:
3
Create a separate private network for bond1
Are you sure you want to add the network interface card bond1 on
cluster node data03 to the network db2_private_network_1? [1]
1. Yes
2. No
1
Adding network interface card bond1 on cluster node data03 to the
network db2_private_network_1 ...
Adding network interface card bond1 on cluster node data03 to the
network db2_private_network_1 was successful.
Enter the name of the network for the network interface card: bond1 on
cluster node: data02
1. db2_private_network_1
2. db2_private_network_0
3. Create a new public network for this network interface card.
4. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond1 on
cluster node data02 to the network db2_private_network_1? [1]
1. Yes
2. No
1
Adding network interface card bond1 on cluster node data02 to the
network db2_private_network_1 ...
Adding network interface card bond1 on cluster node data02 to the
network db2_private_network_1 was successful.
Enter the name of the network for the network interface card: bond1 on
cluster node: data01
1. db2_private_network_1
2. db2_private_network_0
3. Create a new public network for this network interface card.
4. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond1 on
cluster node data01 to the network db2_private_network_1? [1]
1. Yes
2. No
1
Adding network interface card bond1 on cluster node data01 to the
network db2_private_network_1 ...
Adding network interface card bond1 on cluster node data01 to the
network db2_private_network_1 was successful.
Enter the name of the network for the network interface card: bond1 on
cluster node: admin01
1. db2_private_network_1
2. db2_private_network_0
3. Create a new public network for this network interface card.
4. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card bond1 on
cluster node admin01 to the network db2_private_network_1? [1]
1. Yes
2. No
1
Adding network interface card bond1 on cluster node admin01 to the
network db2_private_network_1 ...
Adding network interface card bond1 on cluster node admin01 to the
network db2_private_network_1 was successful.
Retrieving high availability configuration parameter for instance
bculinux ...
The cluster manager name configuration parameter (high availability
configuration parameter) is not set. For more information, see the
topic "cluster_mgr - Cluster manager name configuration parameter" in
the DB2 Information Center. Do you want to set the high availability
configuration parameter?
The following are valid settings for the high availability
configuration parameter:
1.TSA
2.Vendor
Enter a value for the high availability configuration parameter: [1]
1
Setting a high availability configuration parameter for instance
bculinux to TSA.
Now you need to configure the failover policy for the instance
bculinux. The failover policy determines the machines on which the
cluster manager will restart the database manager if the database
manager is brought offline unexpectedly.
The following are the available failover policies:
1. Local Restart -- during failover, the database manager will
restart in place on the local machine
2. Round Robin -- during failover, the database manager will restart
on any machine in the cluster domain
3. Mutual Takeover -- during failover, the database partitions on one
machine will failover to a specific machine and vice versa (used with
DPF instances)
4. M+N -- during failover, the database partitions on one machine
will failover to any other machine in the cluster domain (used with DPF
instances)
5. Custom -- during failover, the database manager will restart on a
machine from a user-specified list
Enter your selection:
4
You can identify mount points that are noncritical for failover. For
more information, see the topic 'Identifying mount points that are
noncritical for failover' in the DB2 Information Center. Are there any
mount points that you want to designate as noncritical? [2]
1. Yes
2. No
2
The following DB2 database partitions can be made highly available:
DB2 database partition number 0
DB2 database partition number 1
DB2 database partition number 2
DB2 database partition number 3
DB2 database partition number 4
DB2 database partition number 5
DB2 database partition number 6
DB2 database partition number 7
DB2 database partition number 8
DB2 database partition number 9
DB2 database partition number 10
DB2 database partition number 11
DB2 database partition number 12
Do you want to make all these DB2 database partitions highly available?
[1]
1. Yes
2. No
1
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 0.
Should the cluster node data01 be designated as an idle node for DB2
database partition 0? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 0? [1]
1. Yes
2. No
1
For all partitions we choose stdby01 as the idle node
Should the cluster node data03 be designated as an idle node for DB2
database partition 0? [1]
1. Yes
2. No
2
Should the cluster node data02 be designated as an idle node for DB2
database partition 0? [1]
1. Yes
2. No
2
Adding DB2 database partition 0 to the cluster ...
Adding DB2 database partition 0 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 0?
[2]
1. Yes
2. No
2
For details on virtual IP, refer to the appendix
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 1.
Should the cluster node admin01 be designated as an idle node for DB2
database partition 1? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 1? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 1? [1]
1. Yes
2. No
2
Should the cluster node data02 be designated as an idle node for DB2
database partition 1? [1]
1. Yes
2. No
2
Adding DB2 database partition 1 to the cluster ...
Adding DB2 database partition 1 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 1?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 2.
Should the cluster node admin01 be designated as an idle node for DB2
database partition 2? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 2? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 2? [1]
1. Yes
2. No
2
Should the cluster node data02 be designated as an idle node for DB2
database partition 2? [1]
1. Yes
2. No
2
Adding DB2 database partition 2 to the cluster ...
Adding DB2 database partition 2 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 2?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 3.
Should the cluster node admin01 be designated as an idle node for DB2
database partition 3? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 3? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 3? [1]
1. Yes
2. No
2
Should the cluster node data02 be designated as an idle node for DB2
database partition 3? [1]
1. Yes
2. No
2
Adding DB2 database partition 3 to the cluster ...
Adding DB2 database partition 3 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 3?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 4.
Should the cluster node admin01 be designated as an idle node for DB2
database partition 4? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 4? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 4? [1]
1. Yes
2. No
2
Should the cluster node data02 be designated as an idle node for DB2
database partition 4? [1]
1. Yes
2. No
2
Adding DB2 database partition 4 to the cluster ...
Adding DB2 database partition 4 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 4?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 5.
Should the cluster node data01 be designated as an idle node for DB2
database partition 5? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 5? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 5? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 5? [1]
1. Yes
2. No
2
Adding DB2 database partition 5 to the cluster ...
Adding DB2 database partition 5 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 5?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 6.
Should the cluster node data01 be designated as an idle node for DB2
database partition 6? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 6? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 6? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 6? [1]
1. Yes
2. No
2
Adding DB2 database partition 6 to the cluster ...
Adding DB2 database partition 6 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 6?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 7.
Should the cluster node data01 be designated as an idle node for DB2
database partition 7? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 7? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 7? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 7? [1]
1. Yes
2. No
2
Adding DB2 database partition 7 to the cluster ...
Adding DB2 database partition 7 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 7?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 8.
Should the cluster node data01 be designated as an idle node for DB2
database partition 8? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 8? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 8? [1]
1. Yes
2. No
1
Should the cluster node data03 be designated as an idle node for DB2
database partition 8? [1]
1. Yes
2. No
2
Adding DB2 database partition 8 to the cluster ...
Adding DB2 database partition 8 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 8?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 9.
Should the cluster node data01 be designated as an idle node for DB2
database partition 9? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 9? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 9? [1]
1. Yes
2. No
1
Should the cluster node data02 be designated as an idle node for DB2
database partition 9? [1]
1. Yes
2. No
2
Adding DB2 database partition 9 to the cluster ...
Adding DB2 database partition 9 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition: 9?
[2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 10.
Should the cluster node data01 be designated as an idle node for DB2
database partition 10? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 10? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 10? [1]
1. Yes
2. No
1
Should the cluster node data02 be designated as an idle node for DB2
database partition 10? [1]
1. Yes
2. No
2
Adding DB2 database partition 10 to the cluster ...
Adding DB2 database partition 10 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition:
10? [2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 11.
Should the cluster node data01 be designated as an idle node for DB2
database partition 11? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 11? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 11? [1]
1. Yes
2. No
1
Should the cluster node data02 be designated as an idle node for DB2
database partition 11? [1]
1. Yes
2. No
2
Adding DB2 database partition 11 to the cluster ...
Adding DB2 database partition 11 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition:
11? [2]
1. Yes
2. No
2
M+N failover policy was chosen. You will need to specify the idle nodes
for database partition 12.
Should the cluster node data01 be designated as an idle node for DB2
database partition 12? [1]
1. Yes
2. No
2
Should the cluster node admin01 be designated as an idle node for DB2
database partition 12? [1]
1. Yes
2. No
2
Should the cluster node stdby01 be designated as an idle node for DB2
database partition 12? [1]
1. Yes
2. No
1
Should the cluster node data02 be designated as an idle node for DB2
database partition 12? [1]
1. Yes
2. No
2
Adding DB2 database partition 12 to the cluster ...
Adding DB2 database partition 12 to the cluster was successful.
Do you want to configure a virtual IP address for the DB2 partition:
12? [2]
1. Yes
2. No
2
The following databases can be made highly available:
Database: BCUDB
Do you want to make all active databases highly available? [1]
1. Yes
2. No
1
Adding database BCUDB to the cluster domain ...
Adding database BCUDB to the cluster domain was successful.
All cluster configurations have been completed successfully. db2haicu
exiting ...
4. Check the status of the cluster: Once db2haicu exits, you can use the 'lssam' command to
check the status of the cluster. The details on how to interpret the output are covered in
the 'Cluster monitoring' section below. For now, just check that it shows 'Online' for all
instance partitions and storage mount points on their respective nodes and 'Offline' for all
instance partitions and storage mount points on standby, as illustrated below:
bculinux@admin01:~> lssam
Online IBM.ResourceGroup:db2_bculinux_0-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_0-rs
                |- Online IBM.Application:db2_bculinux_0-rs:admin01
                '- Offline IBM.Application:db2_bculinux_0-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0000-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0000-rs:admin01
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0000-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_10-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_10-rs
                |- Online IBM.Application:db2_bculinux_10-rs:data03
                '- Offline IBM.Application:db2_bculinux_10-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0010-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0010-rs:data03
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0010-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_11-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_11-rs
                |- Online IBM.Application:db2_bculinux_11-rs:data03
                '- Offline IBM.Application:db2_bculinux_11-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0011-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0011-rs:data03
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0011-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_12-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_12-rs
                |- Online IBM.Application:db2_bculinux_12-rs:data03
                '- Offline IBM.Application:db2_bculinux_12-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0012-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0012-rs:data03
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0012-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_1-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_1-rs
                |- Online IBM.Application:db2_bculinux_1-rs:data01
                '- Offline IBM.Application:db2_bculinux_1-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0001-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0001-rs:data01
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0001-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_2-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_2-rs
                |- Online IBM.Application:db2_bculinux_2-rs:data01
                '- Offline IBM.Application:db2_bculinux_2-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0002-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0002-rs:data01
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0002-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_3-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_3-rs
                |- Online IBM.Application:db2_bculinux_3-rs:data01
                '- Offline IBM.Application:db2_bculinux_3-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0003-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0003-rs:data01
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0003-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_4-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_4-rs
                |- Online IBM.Application:db2_bculinux_4-rs:data01
                '- Offline IBM.Application:db2_bculinux_4-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0004-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0004-rs:data01
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0004-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_5-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_5-rs
                |- Online IBM.Application:db2_bculinux_5-rs:data02
                '- Offline IBM.Application:db2_bculinux_5-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0005-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0005-rs:data02
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0005-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_6-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_6-rs
                |- Online IBM.Application:db2_bculinux_6-rs:data02
                '- Offline IBM.Application:db2_bculinux_6-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0006-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0006-rs:data02
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0006-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_7-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_7-rs
                |- Online IBM.Application:db2_bculinux_7-rs:data02
                '- Offline IBM.Application:db2_bculinux_7-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0007-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0007-rs:data02
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0007-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_8-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_8-rs
                |- Online IBM.Application:db2_bculinux_8-rs:data02
                '- Offline IBM.Application:db2_bculinux_8-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0008-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0008-rs:data02
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0008-rs:stdby01
Online IBM.ResourceGroup:db2_bculinux_9-rg Nominal=Online
        |- Online IBM.Application:db2_bculinux_9-rs
                |- Online IBM.Application:db2_bculinux_9-rs:data03
                '- Offline IBM.Application:db2_bculinux_9-rs:stdby01
        '- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs
                |- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:data03
                '- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:stdby01
5. Setting instance name in profiles.reg on standby: On the standby node verify that the
registry file profiles.reg in the DB2 installation directory (/opt/IBM/dwe/db2/<version>)
contains the name of the DB2 instance. If necessary, add the instance name to this file.
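The original listing is not reproduced here. As a sketch, on stdby01 the file should contain the instance name, for example:

cat /opt/IBM/dwe/db2/<version>/profiles.reg
bculinux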
6. Taking resource groups offline and online: Once the instance has been made highly
available, you can use the 'chrg -o <state>' command to take the resource groups online
and offline. For example, to take all the resource groups offline you can put the following
commands in a file and run them as a script:
chrg -o Offline db2_bculinux_0-rg
chrg -o Offline db2_bculinux_1-rg
chrg -o Offline db2_bculinux_2-rg
chrg -o Offline db2_bculinux_3-rg
chrg -o Offline db2_bculinux_4-rg
chrg -o Offline db2_bculinux_5-rg
chrg -o Offline db2_bculinux_6-rg
chrg -o Offline db2_bculinux_7-rg
chrg -o Offline db2_bculinux_8-rg
chrg -o Offline db2_bculinux_9-rg
chrg -o Offline db2_bculinux_10-rg
chrg -o Offline db2_bculinux_11-rg
chrg -o Offline db2_bculinux_12-rg
Similarly, the online script for this environment would contain:
chrg -o Online db2_bculinux_0-rg
chrg -o Online db2_bculinux_1-rg
chrg -o Online db2_bculinux_2-rg
chrg -o Online db2_bculinux_3-rg
chrg -o Online db2_bculinux_4-rg
chrg -o Online db2_bculinux_5-rg
chrg -o Online db2_bculinux_6-rg
chrg -o Online db2_bculinux_7-rg
chrg -o Online db2_bculinux_8-rg
chrg -o Online db2_bculinux_9-rg
chrg -o Online db2_bculinux_10-rg
chrg -o Online db2_bculinux_11-rg
chrg -o Online db2_bculinux_12-rg
Although these commands return immediately, it takes some time for the resource groups
to be brought online or offline. You can use the lssam command to monitor the status of the
resource groups.
You can also check the HA domains and the nodes that are part of the respective domains by
using the 'lsrpdomain' and 'lsrpnode' commands as shown below:
bculinux@admin01:~> lsrpdomain
Name      OpState RSCTActiveVersion MixedVersions TSPort GSPort
ha_domain Online  2.5.1.2           No            12347  12348
bculinux@admin01:~> lsrpnode
Name    OpState RSCTVersion
admin01 Online  2.5.1.2
stdby01 Online  2.5.1.2
data01  Online  2.5.1.2
data02  Online  2.5.1.2
data03  Online  2.5.1.2
5.2 Appendix for db2haicu
Network Quorum device: A network quorum device is an IP address to which every cluster
domain node can connect (ping) at all times. In the current implementation, the FCM
network gateway IP is used assuming that as long as the FCM network segment is UP and
RUNNING, the gateway will always be ping-able. No special software needs to be
installed on the quorum device; it should just be reachable (ping-able) from all the nodes at
all times.
Public vs. private networks: In case the networks that you are trying to make highly
available are private networks (internal to the Warehouse setup) like the FCM or the
Cluster network, then you can choose to create a private network equivalency (e.g.
db2_private_network_0). For public networks, i.e. the networks which the external
applications use to connect to the setup, like the corporate network, you can choose to
create a public network equivalency (e.g. db2_public_network_0)
Virtual IP address: This is a highly available IP address that external clients/applications
use to connect to the database. Hence, this address should be configured on the same
subnet that is exposed to the external clients/applications, and only for database partition
0 on the administration BCU. As in the current implementation, if db2haicu is run
before the system is put on the corporate network, configuring the virtual IP is not required
at this stage. Once the system is put on the corporate network, this additional configuration
can be done by running db2haicu again. This is covered in the 'Post Configuration steps' section.
6. Configuring the NFS Server for HA
As mentioned earlier, the admin node in a D5100 Balanced Warehouse acts as an NFS
server for all the other nodes (including itself). We will now see how to make this NFS
server highly available to ensure that even if the admin node goes down, the NFS server
keeps running on the standby node. Recall that the two directories that are NFS-shared
across all nodes are /shared_db2home and /shared_home.
Procedure
1. The NFS server would already be running on the admin node, so before re-configuring
the NFS server for high availability, take it offline using the following sequence of steps:
a. Take the DB2 instance offline.
b. Un-mount all of the NFS clients. (/db2home and /home)
c. Take the NFS server offline.
2. Obtain an unused IP address in the same subnet (on the FCM network) as the admin
node that will be used by the HA NFS server as a virtual IP address. In the current
implementation we take the IP address 172.16.10.40 as the NFS server virtual IP.
3. Since TSA starts the services required to run the NFS server automatically, these
services must be turned off on admin and standby nodes. In the event of a failure on the
admin node, TSA would automatically start the NFS services on the standby node to avoid
any downtime. If the NFS services start automatically at boot time, the failed node (admin)
will attempt to restart another NFS server after it is restored, even though the NFS server is
already running on the standby node. To prevent this situation from happening, we need to
ensure that the NFS server does not automatically start at boot time on both admin and
standby nodes by executing the following commands:
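The exact commands were not recoverable from the source. On SUSE Linux this is typically done with chkconfig; the service names below are an assumption for this distribution and release, and any additional NFS-related services present on the system should be disabled the same way:

chkconfig nfsserver off
chkconfig nfs off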
4. There is a small file system on the lvnfsvarlibnfs logical volume that is created on the
admin node during the initial setup phase of the D5100 Balanced Warehouse. On the admin
node, mount this partition on /varlibnfs and copy all the files from /var/lib/nfs to this small
file system. Then, un-mount the file system.
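A sketch of this step, assuming the logical volume lives in the vgnfs volume group (the volume-group name is an assumption):

mount /dev/vgnfs/lvnfsvarlibnfs /varlibnfs
cp -a /var/lib/nfs/* /varlibnfs/
umount /varlibnfs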
5. On the standby node create the /varlibnfs directory first. Then on both admin and
standby node, back up the original /var/lib/nfs directory and create a link to the /varlibnfs
mount point using the following commands:
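The original commands are not reproduced here; they amount to something like the following (the name of the backup directory is an assumption):

mkdir /varlibnfs                        # on stdby01 only
mv /var/lib/nfs /var/lib/nfs.original   # back up the original directory (admin and standby)
ln -s /varlibnfs /var/lib/nfs           # link to the shared mount point (admin and standby)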
6. Verify that these conditions are still true for admin and standby nodes:
• On both servers, the shared_home file system exists on /dev/vgnfs/lvnfshome and
the shared_db2home file system exists on /dev/vgnfs/lvnfsdb2home.
• On both servers, the /shared_home and /shared_db2home mount points exist.
• On both servers, the /etc/exports file includes the entries for the shared home
directories (sketched below).
• On both servers, the /etc/fstab file includes the entries for the shared home
directories (sketched below).
• On both servers, check that the /etc/fstab file also contains the entry for the
/varlibnfs file system.
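A hedged sketch of the entries referred to in the list above, using the export and mount options given earlier (fsid values, field spacing and the volume group for lvnfsvarlibnfs are assumptions):

/etc/exports:
/shared_db2home  *(rw,sync,fsid=0,no_root_squash)
/shared_home     *(rw,sync,fsid=1,no_root_squash)

/etc/fstab:
/dev/vgnfs/lvnfsdb2home    /shared_db2home  ext3  noauto  0 0
/dev/vgnfs/lvnfshome       /shared_home     ext3  noauto  0 0
/dev/vgnfs/lvnfsvarlibnfs  /varlibnfs       ext3  noauto  0 0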
7. On all the NFS clients (all nodes), modify the /etc/fstab entries for the shared directories
so that they use the HA-NFS service virtual IP address (172.16.10.40 in this implementation)
instead of the admin01 host name. A sketch of the entries before and after this change is
shown below.
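A sketch of the change, with the mount options as given in the NFS client section earlier:

Before:
admin01:/shared_db2home       /db2home  nfs  rw,hard,bg,intr,suid,tcp,nfsvers=3,timeo=600,nolock  0 0
admin01:/shared_home          /home     nfs  rw,hard,bg,intr,suid,tcp,nfsvers=3,timeo=600,nolock  0 0

After:
172.16.10.40:/shared_db2home  /db2home  nfs  rw,hard,bg,intr,suid,tcp,nfsvers=3,timeo=600,nolock  0 0
172.16.10.40:/shared_home     /home     nfs  rw,hard,bg,intr,suid,tcp,nfsvers=3,timeo=600,nolock  0 0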
8. In the directory /usr/sbin/rsct/sapolicies/nfsserver on the admin node, edit the
sa-nfsserver.conf file by changing the following lines.
• In the nodes field, add the host names of both servers:
# --list of nodes in the NFS server cluster
nodes="admin01 stdby01"
• Change the IP address to the virtual IP address and netmask of the NFS server used before:
# --IP address and netmask for NFS server
ip_1="172.16.10.40,255.255.255.0"
• Add the network interface name used by the NFS server and the host name of each server:
# --List of network interfaces ServiceIP ip_x depends on.
# Entries are lists of the form <network-interface-name>:<node-name>,...
nieq_1="bond0:admin01,bond0:stdby01"
• Add the mount points for the varlibnfs, shared_home and shared_db2home file systems:
# --common local mountpoint for shared data
# If more instances of <data_>, add more rows, like: data_tmp,
data_proj...
# Note: the keywords need to be unique!
data_varlibnfs="/varlibnfs"
data_work="/shared_db2home"
data_home="/shared_home"
• This configuration file must be identical on the admin node and the standby node.
Therefore, copy this file over to the standby node:
# scp sa-nfsserver.conf stdby01mgt:/usr/sbin/rsct/sapolicies/nfsserver
9. The sam.policies package comes with two versions of the nfsserver scripts. On both the
admin node and the standby node, make the DB2 version of the script the active version
using the following commands:
# mv /usr/sbin/rsct/sapolicies/nfsserver/nfsserverctrl-server \
/usr/sbin/rsct/sapolicies/nfsserver/nfsserverctrl-server.original
# cp /usr/sbin/rsct/sapolicies/nfsserver/nfsserverctrl-server.DB2 \
/usr/sbin/rsct/sapolicies/nfsserver/nfsserverctrl-server
10. On one of the servers, change to the /usr/sbin/rsct/sapolicies/nfsserver directory and
then run the automatic configuration script to create the highly available NFS resources:
# cd /usr/sbin/rsct/sapolicies/nfsserver
# /usr/sbin/rsct/sapolicies/nfsserver/cfgnfsserver -p
11. Bring up the highly available NFS server:
$ chrg -o Online SA-nfsserver-rg
Although this command returns immediately, it takes some time for the NFS server to
come online. Verify the status of the resource groups by issuing the ‘lssam’ command:
After the resource groups have been brought online, your output for the SA-nfsserver-rg
should look similar to the following:
Online IBM.ResourceGroup:SA-nfsserver-rg Nominal=Online
|- Online IBM.Application:SA-nfsserver-data-home
|- Online IBM.Application:SA-nfsserver-data-home:admin01
'- Offline IBM.Application:SA-nfsserver-data-home:stdby01
|- Online IBM.Application:SA-nfsserver-data-varlibnfs
|- Online IBM.Application:SA-nfsserver-data-varlibnfs:admin01
'- Offline IBM.Application:SA-nfsserver-data-varlibnfs:stdby01
|- Online IBM.Application:SA-nfsserver-data-work
|- Online IBM.Application:SA-nfsserver-data-work:admin01
'- Offline IBM.Application:SA-nfsserver-data-work:stdby01
|- Online IBM.Application:SA-nfsserver-server
|- Online IBM.Application:SA-nfsserver-server:admin01
'- Offline IBM.Application:SA-nfsserver-server:stdby01
'- Online IBM.ServiceIP:SA-nfsserver-ip-1
|- Online IBM.ServiceIP:SA-nfsserver-ip-1:admin01
'- Offline IBM.ServiceIP:SA-nfsserver-ip-1:stdby01
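Because the resources take a short while to come online, it can be convenient to keep refreshing the lssam output until all members reach their nominal state. One simple way on Linux, using the standard watch utility:
# watch -n 5 lssam
Press Ctrl+C to stop watching once SA-nfsserver-rg shows Online.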
12. Manually mount the client NFS mount points on all servers. Verify that the
/home and /db2home directories are mounted on both the admin node and standby node
and that the /home and /db2home directories are readable and writable by each server.
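A minimal read/write check that can be run on each node (the test file names are arbitrary and can be removed afterwards):
# touch /home/ha_write_test_$(hostname) /db2home/ha_write_test_$(hostname)
# ls -l /home/ha_write_test_* /db2home/ha_write_test_*
# rm /home/ha_write_test_* /db2home/ha_write_test_*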
13. To verify the configuration, use the following command to move the location of the
NFS server from admin01 to stdby01:
rgreq -o move SA-nfsserver-rg
Verify that this command executes successfully. Issue the lssam command and verify that
the NFS server resources are offline on admin01 and online on stdby01. Issue the same
command to move the location of the NFS server from stdby01 back to admin01.
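While the NFS server is running on stdby01, it is also worth confirming from a client node that the shared directories remain accessible, since the failover should be transparent to the NFS clients. For example:
# df -h /home /db2home
# ls /home /db2home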
14. Create dependencies between the DB2 partitions and the NFS server by issuing the
following commands from a script as the root user:
# for DB2 resources: create a DependsOnAny relationship from each database
# partition resource to the HA NFS server resource. Partitions 0 to 12 are
# used here; adjust the range to match your db2nodes.cfg.
for x in $(seq 0 12); do
    mkrel -S IBM.Application:db2_bculinux_${x}-rs \
          -G IBM.Application:SA-nfsserver-server \
          -p DependsOnAny db2_bculinux_${x}-rs_DependsOn_SA-nfsserver-server-rel
done
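The relationships created by the loop above can be listed afterwards with the lsrel command (part of the same SA MP command set as mkrel); filtering on the NFS server resource name shows just the new dependencies:
# lsrel | grep SA-nfsserver-server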
15. Bring the DB2 instance back online and verify that all resources can start.
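One way to do this, assuming the DB2 resource groups were taken offline with chrg before the NFS configuration, is to bring each db2_bculinux_<x>-rg group back online and then check the overall state with lssam. A sketch for the partition numbers used in this setup:
# for x in $(seq 0 12); do chrg -o Online db2_bculinux_${x}-rg; done
# lssam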
7. Post Configuration steps (at the customer site)
Once the setup is delivered at the customer site and placed on the corporate network, a few additional steps are needed to create a public network equivalency for the corporate network and make it highly available. You need to find an unused IP address on the corporate network to be used as the virtual IP (the highly available IP address which external clients and applications will use to connect to the database).
This additional setup does not disturb the initial configuration we created. You can run db2haicu again as the instance owner and simply create the new equivalencies.
Procedure
1. Run the db2haicu tool as the instance owner and select option 2. ’Add or remove a
network interface.’
Do you want to add or remove network interface cards to or from a
network? [1]
1. Add
2. Remove
1
Enter the name of the network interface card:
eth1
(eth1 is the logical port name of the corporate network on admin01)
Enter the host name of the cluster node which hosts the network
interface card eth1:
admin01
Enter the name of the network for the network interface card: eth1 on
cluster node: admin01
1. SA-nfsserver-nieq-1
2. db2_private_network_0
3. Create a new public network for this network interface card.
4. Create a new private network for this network interface card.
Enter selection:
3
(this creates a new public network equivalency for the corporate network)
Are you sure you want to add the network interface card eth1 on cluster
node admin01 to the network db2_public_network_0? [1]
1. Yes
2. No
1
Adding network interface card eth1 on cluster node admin01 to the
network db2_public_network_0 ...
Adding network interface card eth1 on cluster node admin01 to the
network db2_public_network_0 was successful.
Do you want to add another network interface card to a network? [1]
1. Yes
2. No
1
Enter the name of the network interface card:
eth1
(eth1 is the logical port name of the corporate network on stdby01)
Enter the host name of the cluster node which hosts the network
interface card eth1:
stdby01
Enter the name of the network for the network interface card: eth1 on
cluster node: stdby01
1. db2_public_network_0
2. SA-nfsserver-nieq-1
3. db2_private_network_0
4. Create a new public network for this network interface card.
5. Create a new private network for this network interface card.
Enter selection:
1
Are you sure you want to add the network interface card eth1 on cluster
node stdby01 to the network db2_public_network_0? [1]
1. Yes
2. No
1
Adding network interface card eth1 on cluster node stdby01 to the
network db2_public_network_0 ...
Adding network interface card eth1 on cluster node stdby01 to the
network db2_public_network_0 was successful
2. We now need to configure the virtual IP that will be used as the highly available IP address by external applications and clients. This is configured only on database partition 0 (the admin node). Find an unused IP address on the corporate network and run db2haicu as the instance owner. Select option 6, 'Add or remove an IP address.'
Do you want to add or remove IP addresses to or from the cluster? [1]
1. Add
2. Remove
1
Which DB2 database partition do you want this IP address to be
associated with?
0
Enter the virtual IP address:
192.168.253.166
Enter the subnet mask for the virtual IP address 192.168.253.166:
[255.255.255.0]
255.255.255.0
Select the network for the virtual IP 192.168.253.166:
1. db2_public_network_0
2. SA-nfsserver-nieq-1
3. db2_private_network_0
Enter selection:
1
Adding virtual IP address 192.168.253.166 to the domain ...
Adding virtual IP address 192.168.253.166 to the domain was successful.
Do you want to add another virtual IP address? [1]
1. Yes
2. No
2
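Once db2haicu reports success, the virtual IP should be active on the admin node's corporate interface. A quick operating system level check (standard Linux commands, not part of db2haicu):
# ip addr show eth1 | grep 192.168.253.166
# ping -c 3 192.168.253.166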
3. Create dependencies between database partition 0 and the corporate network
equivalency created before. Take the DB2 resources offline and then run the following
command as the root user to create the dependency:
mkrel -S IBM.Application:db2_bculinux_0-rs \
      -G IBM.Equivalency:db2_public_network_0 \
      -p DependsOn db2_bculinux_0-rs_DependsOn_db2_public_network_0-rel
4. Create the network quorum device for the corporate network. Run the db2haicu tool and select option 10, 'Create a new quorum device for the domain.' Specify the gateway IP of the corporate network (in this case 192.168.253.1).
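The tie-breaker created by db2haicu can be verified afterwards with the standard RSCT resource commands, for example:
# lsrsrc IBM.TieBreaker
# lsrsrc -c IBM.PeerNode OpQuorumTieBreaker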
This is how the network looks after the above HA configuration:
8. Cluster monitoring
This section describes how to interpret the output of the lssam command that we used before and how to monitor the cluster once it is configured. We will also discuss how the lssam output indicates that a data node or the admin node has successfully failed over to the standby node.
To explain how to interpret the lssam output, let's take a snippet of the output we obtained once our HA configuration was done.
Online IBM.ResourceGroup:db2_bculinux_9-rg Nominal=Online
|- Online IBM.Application:db2_bculinux_9-rs
|- Online IBM.Application:db2_bculinux_9-rs:data03
'- Offline IBM.Application:db2_bculinux_9-rs:stdby01
'- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs
|- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:data03
'- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:stdby01
The above snippet is for database partition 9, which exists on node data03. The state in the first line indicates the overall state of the resource group db2_bculinux_9-rg. There are two resources that have been made highly available in this group:
• db2_bculinux_9-rs
This is DB2 database instance partition 9, which has been made highly available. The resource is currently shown Online on data03 and Offline on stdby01, which indicates that this instance partition is currently active on data03.
• db2mnt-db2fs_bculinux_NODE0009-rs
This resource represents the file system storage mount points that have been made highly available for database partition 9. These are currently mounted on data03, and hence the status is shown as Online on data03 and Offline on stdby01.
In the event that the data03 node goes down (to test, take it off the FCM network), TSA will try to bring up all the database instance partitions of the data03 node and their associated storage on stdby01. If you run the lssam command as soon as you take out the FCM network cables (both cables of bond0), you will see the following lssam output:
Pending Offline IBM.ResourceGroup:db2_bculinux_9-rg Nominal=Online
|- Pending Offline IBM.Application:db2_bculinux_9-rs
|- Pending Offline IBM.Application:db2_bculinux_9-rs:data03
'- Offline IBM.Application:db2_bculinux_9-rs:stdby01
'- Pending Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs
|- Pending Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:data03
'- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:stdby01
The above output is shown only for the db2_bculinux_9-rg resource group. You will see similar output for the other data03 resource groups db2_bculinux_10-rg, db2_bculinux_11-rg and db2_bculinux_12-rg as well.
Soon after, the lssam output should show something similar to the following for all of the db2_bculinux_9-rg, db2_bculinux_10-rg, db2_bculinux_11-rg and db2_bculinux_12-rg resource groups:
Online IBM.ResourceGroup:db2_bculinux_9-rg Nominal=Online
|- Online IBM.Application:db2_bculinux_9-rs
|- Offline IBM.Application:db2_bculinux_9-rs:data03
'- Online IBM.Application:db2_bculinux_9-rs:stdby01
'- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs
|- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:data03
'- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0009-rs:stdby01
This now indicates that database instance partition 9 and its associated storage mount points have been successfully started on stdby01. If you have a database query that selects data from this partition, you can run it to confirm that the data is successfully retrieved.
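As a sketch of such a check, a query that groups rows by partition number will confirm that partition 9 still returns data after the failover. The database, schema, table and column names below are placeholders for objects in your own environment:
$ db2 connect to <dbname>
$ db2 "SELECT DBPARTITIONNUM(pk_col) AS partition_num, COUNT(*) AS row_count FROM myschema.mytable GROUP BY DBPARTITIONNUM(pk_col)"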
Also, if you now open the ~/sqllib/db2nodes.cfg file, you will see that its contents have been modified: all occurrences of data03 will have been replaced by stdby01:
"
%
&
"
"
$
$
$
$
"
$
$
$
$
Please note that in order to bring the resources back up on data03, you must not only reconnect it to the FCM network but also take stdby01 offline. To take corrective action on the failed node, it is usually necessary to bring it back up on the network first, and only once all the hardware/software reconfiguration (if needed) has been done can the resources be made active on that node again. Hence, resources do not automatically fail back to data03 unless the standby node is deliberately taken offline.
In case the admin node fails over to the standby node, you would see the SA-nfsserver-rg resource group also Online on the standby node (and Offline on the admin node), along with db2_bculinux_0-rg:
Online IBM.ResourceGroup:SA-nfsserver-rg Nominal=Online
|- Online IBM.Application:SA-nfsserver-data-home
|- Offline IBM.Application:SA-nfsserver-data-home:admin01
'- Online IBM.Application:SA-nfsserver-data-home:stdby01
|- Online IBM.Application:SA-nfsserver-data-varlibnfs
|- Offline IBM.Application:SA-nfsserver-data-varlibnfs:admin01
'- Online IBM.Application:SA-nfsserver-data-varlibnfs:stdby01
|- Online IBM.Application:SA-nfsserver-data-work
|- Offline IBM.Application:SA-nfsserver-data-work:admin01
'- Online IBM.Application:SA-nfsserver-data-work:stdby01
|- Online IBM.Application:SA-nfsserver-server
|- Offline IBM.Application:SA-nfsserver-server:admin01
'- Online IBM.Application:SA-nfsserver-server:stdby01
'- Online IBM.ServiceIP:SA-nfsserver-ip-1
|- Offline IBM.ServiceIP:SA-nfsserver-ip-1:admin01
'- Online IBM.ServiceIP:SA-nfsserver-ip-1:stdby01
Online IBM.ResourceGroup:db2_bculinux_0-rg Nominal=Online
|- Online IBM.Application:db2_bculinux_0-rs
|- Offline IBM.Application:db2_bculinux_0-rs:admin01
'- Online IBM.Application:db2_bculinux_0-rs:stdby01
'- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0000-rs
|- Offline IBM.Application:db2mnt-db2fs_bculinux_NODE0000-rs:admin01
'- Online IBM.Application:db2mnt-db2fs_bculinux_NODE0000-rs:stdby01
9. Lessons learnt during HA implementations
Over various implementations we have come across a few common observations that cause errors and configuration issues. They are discussed here so that the same issues can be avoided on the first attempt.
9.1 Hostname conflict
Make sure that the OS host name on each node matches the host name specified in the db2nodes.cfg file. If the host names do not match, db2haicu is likely to fail with a host name related error while configuring the cluster.
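A quick pre-check can be run as the instance owner on every node before starting db2haicu; this minimal sketch simply compares the OS host name with the entries in db2nodes.cfg:
$ hostname
$ grep -w "$(hostname)" ~/sqllib/db2nodes.cfg || echo "host name not found in db2nodes.cfg"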
9.2 Prevent auto-mount of shared file systems
If you remember while adding entries in the /etc/fstab for all nodes, we emphasized on
the “noauto” option. This option becomes very critical for the TSA to function smoothly
and auto-mount file systems based on which node is up.
In case “noauto” is not specified, the default option is ‘auto’ which means as soon as the
failed node is re-booted, it will itself try to mount all file systems. This is undesirable
since in most cases it is required to bring up the failed node before taking any corrective
actions (hardware/software re-configurations). The “noauto” option prevents automounting of file systems at this stage and waits for TSA to mount them when the standby
node is offlined.
9.3 Preventing file system consistency checks at boot time
If you recollect the format of the entries we used in /etc/fstab, you will notice that we set the sixth field to zero ('0'):
/dev/vgdb2fsp0/lvdb2fsp0    /db2fs/bculinux/NODE0000  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp1/lvdb2fsp1    /db2fs/bculinux/NODE0001  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp2/lvdb2fsp2    /db2fs/bculinux/NODE0002  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp3/lvdb2fsp3    /db2fs/bculinux/NODE0003  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp4/lvdb2fsp4    /db2fs/bculinux/NODE0004  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp5/lvdb2fsp5    /db2fs/bculinux/NODE0005  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp6/lvdb2fsp6    /db2fs/bculinux/NODE0006  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp7/lvdb2fsp7    /db2fs/bculinux/NODE0007  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp8/lvdb2fsp8    /db2fs/bculinux/NODE0008  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp9/lvdb2fsp9    /db2fs/bculinux/NODE0009  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp10/lvdb2fsp10  /db2fs/bculinux/NODE0010  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp11/lvdb2fsp11  /db2fs/bculinux/NODE0011  ext3  noauto,acl,user_xattr  1 0
/dev/vgdb2fsp12/lvdb2fsp12  /db2fs/bculinux/NODE0012  ext3  noauto,acl,user_xattr  1 0
This sixth field is used by fsck (the file system check utility) to determine the order in which file systems should be checked at boot time. If it is not set to '0', the fsck utility might attempt to check these file systems for consistency at boot time, which is again undesirable. Consider the case when a data node fails over to standby: TSA mounts all of the corresponding data node's file systems on the standby node.
Now, if the sixth field in /etc/fstab on the data node is non-zero and the node is rebooted, the Linux fsck utility might attempt to check the consistency of all of its storage file systems at boot time, even though control of them is with the standby node. In such a case, the data node might not boot at all. The only workaround at that point is to boot the data node in safe mode, log on as root, and remove or comment out these entries in /etc/fstab.
On the other hand, if the sixth field is set to '0', fsck does not attempt to check these file systems at boot time, which prevents this situation.
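A simple way to confirm that the pass field is zero for every DB2 file system entry, assuming the logical volume naming used above, is to print the first and sixth fields of the relevant /etc/fstab lines:
# awk '$1 ~ /lvdb2fsp/ {print $1, $6}' /etc/fstab
Every line of the output should end in 0.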