DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide

Kiran Chinta ([email protected]), Software Developer, IBM
Rob Causley ([email protected]), Software Developer, IBM
Vincent Kulandai Samy ([email protected]), Software Developer, IBM

17 October 2013

Although DB2® high availability disaster recovery (HADR) is billed as a feature that's easy to set up, customers often have problems picking the right settings for their environment. This article presents a use case that shows how you can use the HADR Simulator tool to configure and troubleshoot your HADR configuration in a real-world scenario. Using the examples and generalized guidance that this article provides, you should be able to test your own setups and pick the optimal settings.

High availability disaster recovery overview

HADR is an easy-to-use high availability and disaster recovery feature that uses physical log shipping from the primary database to the standby database. Transactional logs are shipped from the primary to the standby, which is typically in a different location than the primary, and then replayed on the standby. HADR performance relies on log shipping and replay performance. These two factors, in turn, depend on the system configuration and on how well the system is tuned and maintained. The HADR system should be able to cope with varying log generation rates, network bandwidth, and various other performance-influencing factors.

You can find generalized best practices on tuning and maintaining an HADR system in the existing documentation. This article, however, is an in-depth exploration of the technical details of tuning a real-world setup, and it provides a step-by-step guide that should help you understand how to tune the configuration of your HADR system. Although this article focuses on an HADR setup for DB2 for Linux®, UNIX®, and Windows® Version 9.7, it is also applicable to subsequent releases.

© Copyright IBM Corporation 2013

Influences on HADR replication

The performance of HADR replication is influenced by various factors, including but not limited to the following:

1. System configuration on the primary and standby (such as CPU and memory)
2. The setting for the hadr_syncmode configuration parameter
3. Network bandwidth
4. File system I/O rate on the primary and standby
5. Workload on the primary
6. Replay speed on the standby

This article helps you understand and evaluate each of these items. The goal is to develop an HADR configuration that performs well with the given infrastructure.

Evaluating infrastructure capacity

A key initial step in choosing your HADR configuration is evaluating your system's capacity. This is not just a one-off exercise, however; due to continuous business (and, as a result, database) growth and changing business demands, your requirements are likely to change over time, which leads to subsequent changes in hardware and software configuration. To make sure the system can handle the growing database and workload, and can continue to meet the service level agreement, you need to evaluate the infrastructure capacity not only at initial setup time but also periodically at run time. This article walks through the sequence of steps required to evaluate the system capacity.
In this process, you can use various operating-system-level commands and the HADR Simulator tool to calculate and understand how well the system performs given the current set of configurations. The HADR Simulator is a lightweight tool that estimates HADR performance under various conditions without even requiring you to start any databases. As its name suggests, the HADR Simulator simulates DB2 log write and HADR log shipping. You can find more information on the tool and download the executable from here. The example system The primary system used in the demonstration is located in Beaverton, Oregon, USA, and the standby system is in San Jose, California, USA. The distance between the sites is approximately 1,000 km (660 miles). Figure 1. HADR setup and WAN used for the example throughout this article DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 2 of 35 ibm.com/developerWorks/ developerWorks® The installed DB2 product is DB2 Version 9.7 Fix Pack 5. The two hosts, hadrPrimaryHost and hadrStandbyHost, have the following hardware: 1. hadrPrimaryHost: 1. CPU: 4 x 2 GHz AMD Opteron 846 2. Memory: 12GB 3. Disk: 2 x 73GB, 3x146GB @ 15k RPM 4. Operating system: SUSE Linux Enterprise Server v10 SP3 2. hadrStandbyHost: 1. CPU:2.6 GHz dual-core AMD Opteron 2. Memory: 24GB 3. Disk: 4 x 200GB 4. Operating system: SUSE Linux Enterprise Server v10 SP3 Allocating storage in an HADR environment When allocating the storage for a database, it is important to understand the various storage options available and the storage performances. A database primarily needs storage for the following things: 1. Transactional log files 2. Overflow log files if the overflow path is set 3. Mirror log files if the mirror log path is set 4. Table space data 5. Log archiving if the logarchmeth1 configuration parameter is set, the logarchmeth2 configuration parameter is set, or both parameters are set Transactional logs are written in a sequential order, whereas table space data is mostly written in random order based on the page being accessed and written. Allocate a fast writing device to store transactional log files. Most devices have documentation that provides disk write performance; however, if you do not have this information, you can use the following method to approximate those values. We used the IBM DB2 HADR Simulator to perform large writes (4096 pages per write) and small writes (1 page per write) on all disks. The simulator writes multiple times and gives us both a range and an average for each type of write. We consider the throughput value achieved with large writes as the transfer rate or throughput of the device; that is when a big write is performed, most of the I/O is spent in writing and the overhead (seek time and disk rotation) is negligible. Conversely, most of the reported I/O time for a small write is spent on disk rotation and seek. As a result, we consider the throughput value achieved with the small writes as the overhead. Perform the following tasks on the primary and standby hosts: 1. List all the available storage on the system using the df command. 2. Run the HADR Simulator with the -write option to calculate disk write speed for each file system listed by the df command. DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 3 of 35 developerWorks® ibm.com/developerWorks/ The -write option takes one argument: the file system with the file name to which the data is written. 
Use the -flushsize option to control the size of each write. The default value for flush size (16 pages) is sufficient for an OLTP system. ~/simhadr -write /filesystem/temp_file/file1 -flushsize 4096 3. Allocate your storage according to the results in Step 2. On a system that does not use the DB2 pureScale Feature, the active log (containing transaction log records) is written to a single logical device by a single thread. Each transaction is written to the log but not necessarily to a table space on disk. An application commit must wait for logs to be written to the active log path (disk). Table space data updates are buffered and written asynchronously by sophisticated and efficient algorithms. As a result, the bottleneck is at the single thread that writes all of the transaction log records to the active log path sequentially. The file system allocated to active logs must handle the peak logging rate. To handle the peak logging rate, choose the best performing disk for active logs. For the archive path, the file system should provide greater than average logging rate throughput. At the peak logging time, the archive could fall behind but it catches up at the non-peak logging time, assuming there is enough space on active log path to buffer peak time logs. Here are these steps and their corresponding output for our example system: 1. On the primary (hadrPrimaryHost) df Executing df command on hadrPrimaryHost df -kl Filesystem 1K-blocks Used Available Use% Mounted on /dev/sda7 20641788 6020460 13572688 31% / udev 5931400 176 5931224 1% /dev /dev/sda6 313200 40896 272304 14% /boot /dev/sda9 1035660 1728 981324 1% /notnfs /dev/sda8 2071384 3232 1962928 1% /var/tmp /dev/sdb2 66421880 48542864 14504968 77% /work1 /dev/sdc2 136987020 1852144 128176324 2% /work2 /dev/sdd2 136987020 36976444 93052024 29% /work3 /dev/sde2 136987020 1631756 128396712 2% /work4 /dev/sda10 42354768 28268360 11934908 71% /work5 tmpfs 4194304 12876 4181428 1% /tmp --- The command is run on all devices. Comparing the results, the file system /work3/kkchinta/ performed better. Here are the results for this disk: ~/simhadr -write /work3/kkchinta/simhadr.tmp -verbose -flushsize 4096 Measured sleep overhead: 0.003709 second, using spin time 0.004450 second. Simulation run time = 4 seconds Writing to file /work3/kkchinta/simhadr.tmp Press Ctrl-C to stop. Writing 4096 pages Writing 4096 pages Writing 4096 pages Writing 4096 pages Writing 4096 pages Writing 4096 pages Writing 4096 pages Writing 4096 pages DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 4 of 35 ibm.com/developerWorks/ Writing Writing Writing Writing Writing 4096 4096 4096 4096 4096 developerWorks® pages pages pages pages pages Total 13 writes in 4.109773 seconds, 0.316136 sec/write, 4096 pages/write Total 218.103808 MBytes written in 4.109773 seconds. 53.069551 MBytes/sec Distribution of write time (unit is microsecond): Total 13 numbers, Sum 4109773, Min 303356, Max 330640, Avg 316136 From 262144 to 524287 13 numbers --~/simhadr -write /work3/kkchinta/simhadr.tmp -verbose -flushsize 1 Total 3581 writes in 4.000320 seconds, 0.001117 sec/write, 1 pages/write Total 14.667776 MBytes written in 4.000320 seconds. 
3.666651 MBytes/sec

Distribution of write time (unit is microsecond):
Total 3581 numbers, Sum 4000320, Min 325, Max 25220, Avg 1117
From 256 to 511      1143 numbers
From 512 to 1023     2217 numbers
From 1024 to 2047       1 numbers
From 2048 to 4095       9 numbers
From 4096 to 8191      86 numbers
From 8192 to 16383    105 numbers
From 16384 to 32767    20 numbers
---

2. On the standby (hadrStandbyHost)

Executing df command on hadrStandbyHost

df -khl
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       276G  151G  125G  55% /
udev             12G  216K   12G   1% /dev
/dev/sda1       134M   96M   38M  72% /boot
/dev/sdb1       280G  220G   60G  79% /home
/dev/sdc1       181G   80G   93G  47% /perf1
/dev/sdd1       181G  116G   57G  68% /perf2
/dev/sde1       181G   58G  115G  34% /perf3
/dev/sdf1       181G  116G   57G  68% /perf4
/dev/sdg1       181G   58G  115G  34% /perf5
/dev/sdh1       181G  116G   57G  68% /perf6
/dev/sdj1       181G  147G   26G  86% /perf8
/dev/sdi1       136G   58G   79G  43% /perf7
/dev/md0        139G   99G   40G  72% /stripe
---

simhadr -write /perf5/kkchinta/simhadr.tmp -verbose -flushsize 4096

Measured sleep overhead: 0.003970 second, using spin time 0.004764 second.
Simulation run time = 4 seconds
Writing to file /perf5/kkchinta/simhadr.tmp
Press Ctrl-C to stop.
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages
Writing 4096 pages

Total 16 writes in 4.252596 seconds, 0.265787 sec/write, 4096 pages/write
Total 268.435456 MBytes written in 4.252596 seconds. 63.122727 MBytes/sec

Distribution of write time (unit is microsecond):
Total 16 numbers, Sum 4252596, Min 246759, Max 328503, Avg 265787
From 131072 to 262143    9 numbers
From 262144 to 524287    7 numbers
---

simhadr -write /perf5/kkchinta/simhadr.tmp -verbose -flushsize 1

Total 165 writes in 4.018807 seconds, 0.024356 sec/write, 1 pages/write
Total 0.675840 MBytes written in 4.018807 seconds. 0.168169 MBytes/sec

Distribution of write time (unit is microsecond):
Total 165 numbers, Sum 4018807, Min 10614, Max 110876, Avg 24356
From 8192 to 16383     26 numbers
From 16384 to 32767   127 numbers
From 32768 to 65535    11 numbers
From 65536 to 131071    1 numbers
---

Table 1 and Table 2 show the disk performance for the primary and standby:

Table 1. Performance results for hadrPrimaryHost

Disk                Speed
/work3/kkchinta     53.069551 MB/s

Table 2. Performance results for hadrStandbyHost

Disk                Speed
/perf5/kkchinta/    63.122727 MB/s

Based on these results, the recommended file system allocation is as follows:

• On hadrPrimaryHost:
  • DB2 transactional log files: /work3/kkchinta (53.069551 MB/s)
  • Table space data: /u/kkchinta
  • Log archive: /work4/kkchinta
• On hadrStandbyHost:
  • DB2 transactional log files: /perf5/kkchinta/ (63.122727 MB/s)
  • Table space data: /home/kkchinta
  • Log archive: /work1

Choosing an HADR synchronization mode

The HADR synchronization mode determines the degree of protection your HADR database solution has against transaction loss. Choosing the correct synchronization mode is one of the most important configuration decisions you have to make, because achieving optimal network throughput and performance from your HADR pair is part of satisfying your business's service-level agreement.
At the same time, a variety of factors have an impact on how fast transactions are processed. In other words, there can be a trade off between synchronization and performance. The synchronization mode determines when the primary database considers a transaction complete. For the modes that specify tighter synchronization, SYNC and NEARSYNC, this means that the primary waits for an acknowledgement message from the standby. For the looser synchronization modes, the primary considers a transaction complete as soon as it sends the logs to the standby (ASYNC) or as soon as it writes the logs to its local log device (SUPERASYNC). Although the general rule would be to choose a synchronization mode based on network speed, there are a number of other things to consider when choosing your synchronization mode: • Distance between the primary and standby site: At a high level, the suggested synchronization modes are follows: • SYNC if the primary and standby are located in the same data center • NEARSYNC if the primary and standby are located in different data centers but same city limits • ASYNC or SUPERASYNC if the primary and standby are separated by great distances As stated earlier, the distance between the sites in our example scenario is approximately 1,000 km (660 miles). • Network type between the primary and the standby: The general recommendation is to use SYNC or NEARSYNC for systems over a LAN and ASYNC, SUPERASYNC for systems over a WAN. In our example scenario, a WAN connects the primary and standby sites: • Memory resources on the primary and standby The primary system has 12GB and the standby system has 24GB. • Log generation rate on the primary Defining the workload and estimating the amount of log data generated (as well as the flush size) is necessary to enable smooth log shipping and replay on standby. You should estimate the number of write transactions per second that take place in your business and the maximum amount of data (transactional logs) written by each transaction. Alternatively, DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 7 of 35 developerWorks® ibm.com/developerWorks/ you can do a quick test run on a standard database. The equation for log generation rate is the following: Total data generated/sec = num. of transaction per sec × data per transaction In addition, if you are using the standby purely for disaster recovery and can tolerate some risk of data loss, you might also choose one of the less synchronous modes. Using the HADR Simulator to determine performance of different synchronization modes The best way to see how your HADR deployment will perform under different synchronization modes is to use the HADR Simulator to measure throughput and performance under different modes. Use the following command to describe your HADR setup to the simulator: ~/simhadr -role HADR_ROLE_value -lhost HADR_LOCAL_HOST_value -lport HADR_LOCAL_PORT_value -rhost HADR_REMOTE_HOST_value -rport HADR_REMOTE_PORT_value -syncmode HADR_SYNCMODE_value -flushSize value -sockSndBuf TCP_socket_send_value -sockRcvBuf TCP_socket_receive_value -disk transfer_rateoverhead The HADR Simulator supports only port numbers. It does not support service names for the -lport and -rport options. Choosing a value for -flushSize The flush size is nondeterministic, so for the purposes of choosing a synchronization mode, keep the default setting of 16. 
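To make the log generation rate formula given earlier (under "Log generation rate on the primary") concrete, here is a small worked example. The numbers are purely hypothetical and are not measurements from the example system; substitute your own transaction rate and per-transaction log volume:

Total data generated/sec = num. of transactions per sec × data per transaction
                         = 500 transactions/sec × 0.01 MB (10 KB) of log data per transaction
                         = 5 MB/sec

In this hypothetical case, roughly 5 MB/s of sustained log data is what the network and the standby would have to absorb, with higher bursts at peak times; compare such a number against your measured network send rate and disk write speeds.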
Choosing a value for -sockSndBuf and -sockRcvBuf These parameters specify the socket send and receive buffer size for the HADR connection. On most platforms, the TCP buffer size is the same as the TCP window size. If the TCP window size (defined below) is too small, the network cannot fully utilize its bandwidth, and applications like HADR experience throughput lower than the nominal bandwidth. On WAN systems, you should pick a setting that is larger than the system default because of the relatively long round-trip time. On LAN systems, the system default socket buffer size is usually large enough because round-trip time is short. The rule of thumb for choosing the appropriate TCP window size is: TCP window size = send_rate × round_trip_time Check with your network equipment vendor or service provider to know the send rate of your network. Alternatively, you can calculate the send rate with the existing (or default) TCP window sizes with one of the following methods: • Send data via FTP or by using the rcp command to the other host on the network and calculate data sent/time taken. DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 8 of 35 ibm.com/developerWorks/ developerWorks® • Use the test TCP (TTCP) tool. We used the TTCP tool. First, run the tool on the receiving side to have a port waiting for data: ttcp -r ttcp-r: ttcp-r: ttcp-r: ttcp-r: ttcp-r: ttcp-r: -s -p 16372 buflen=8192, nbuf=2048, align=16384/0, port=16372 tcp socket accept from 9.47.73.33 16777216 bytes in 7.93 real seconds = 2066.74 KB/sec +++ 10609 I/O calls, msec/call = 0.77, calls/sec = 1338.26 0.0user 0.0sys 0:07real 0% 0i+0d 0maxrss 0+2pf 10608+1csw Then run the tool on the sending side to send some data: ttcp -t -s -p 16372 hadrStandbyHost.svl.ibm.com ttcp-t: buflen=8192, nbuf=2048, align=16384/0, port=16372 tcp -> hadrStandbyHost.svl.ibm.com ttcp-t: socket ttcp-t: connect ttcp-t: 16777216 bytes in 7.92 real seconds = 2068.38 KB/sec +++ ttcp-t: 2048 I/O calls, msec/call = 3.96, calls/sec = 258.55 ttcp-t: 0.0user 0.0sys 0:07real 1% 0i+0d 0maxrss 0+3pf 526+0csw Based on this test, the send rate in our setup is 2.02 MB/s. To calculate the round-trip time, you can issue a ping command: ping -c 10 hadrStandbyHost.svl PING hadrStandbyHost.svl.ibm.com (9.30.4.113) 56(84) bytes of data. 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=1 ttl=51 time=26.0 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=2 ttl=51 time=25.8 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=3 ttl=51 time=26.8 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=4 ttl=51 time=34.1 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=5 ttl=51 time=26.0 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=6 ttl=51 time=26.5 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=7 ttl=51 time=27.3 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=8 ttl=51 time=28.4 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=9 ttl=51 time=29.1 ms 64 bytes from hadrStandbyHost.svl.ibm.com (9.30.4.113): icmp_seq=10 ttl=51 time=26.5 ms --- hadrStandbyHost.svl.ibm.com ping statistics --10 packets transmitted, 10 received, 0% packet loss, time 9024ms rtt min/avg/max/mdev = 25.851/27.704/34.115/2.378 ms In this scenario, choose the average value as 27.704 ms (0.02770 sec). 
Based on the calculated send rate and round-trip time, the minimum TCP/IP send/receive buffer (window) size is as follows:

TCP window size = send_rate × round_trip_time
                = 2.02 MB/s × 0.02770 s
                = 0.055 MB
                = 58672 bytes

If the system default is larger than the calculated value, there is no need to provide an explicit buffer size or change any system settings. If the system default is smaller, you might need to explicitly set the buffer size. Before setting the buffer size, however, confirm that your system allows this value as a buffer size. To do this on Linux, find the TCP receive and write memory values from your system configuration, namely the following three values:

1. net.ipv4.tcp_rmem: TCP receive window
2. net.ipv4.tcp_wmem: TCP send window
3. net.ipv4.tcp_mem: Total TCP buffer space allocable

You can use the following command:

/sbin/sysctl -a | grep tcp
net.ipv4.tcp_rmem = 4096    87380    174760
net.ipv4.tcp_wmem = 4096    16384    131072
net.ipv4.tcp_mem  = 196608  262144   393216

The three values returned for each parameter indicate the minimum, default, and maximum setting in bytes. In our case, 58,672 bytes is within these limits, so it is an allowed value. If the amount of memory needed is not within the allowed system limit, you should modify that limit. You can get the current settings of the TCP/IP networking parameters from the operating system. For Linux, based on the version you are running, determine which parameter controls the maximum settings and run /sbin/sysctl -a | grep net to get the current settings. For AIX, look at the sb_max and rfc1323 settings; you can get their current values by running the no -a command. When changing these variables, a system reboot might be necessary. Verify that no other applications running on the same host are adversely affected by the change.

After you determine the TCP window size, keep increasing its value (try doubling or tripling it, or more) and rerun the network throughput test each time. At some point, the throughput stops increasing even though the TCP window size is increased. The last value that still improved throughput is the one that makes the best use of the network.

Choosing a value for -disk

This parameter specifies disk speed (transfer rate and overhead) using two values: the data rate in MB/s and the per-I/O-operation overhead in seconds. Earlier, we tested disk write speed with the HADR Simulator tool for the file systems dedicated to the transactional log files on the primary (/work3/kkchinta) and on the standby (/perf5/kkchinta/). Here is a snippet of those results:

~/simhadr -write /work3/kkchinta -verbose -flushsize 1

Total 3581 writes in 4.000320 seconds, 0.001117 sec/write, 1 pages/write
Total 14.667776 MBytes written in 4.000320 seconds. 3.666651 MBytes/sec
------------
~/simhadr -write /work3/kkchinta -verbose -flushsize 4096

Total 13 writes in 4.109773 seconds, 0.316136 sec/write, 4096 pages/write
Total 218.103808 MBytes written in 4.109773 seconds. 53.069551 MBytes/sec

You can use these results to determine the data rate and per-I/O-operation overhead as follows:

• The write time for the run with a 1-page flush size is the I/O operation overhead.
• The MB/s amount for the run with a large flush size (in our case, 4096 pages) is the transfer rate. On the primary, the value for the overhead is 0.001117 s and the transfer rate is 53.069551 MB/s, and on the standby the value for the overhead is 0.024283 and the transfer rate is 63.122727. You can do a run with a 1-page flush size. The reported write time is an approximation of per-write overhead. Then do a run with a large flush size such as 500 or 1000. The reported MB/s is an approximation of write rate. Alternatively, you can solve the following equation to determine the write rate and per-write overhead: IO_time = data_amount × data_rate + per_IO_overhead Table 3 lists all of the set values in place to describe the system to the HADR Simulator tool. The next step is to try out the different synchronization modes, tabulate the results of each test, and then compare the performance of the different modes. Table 3. Set values for the HADR Simulator tool Host hadrPrimaryHost hadrStandbyHost Sync mode Flush size (4 K pages) 32 32 Overhead per write (seconds) 0.001117 .024283 Transfer rate (MB/s) 53.069551 63.122727 TCP/IP send buffer size (bytes) 58672 58672 TCP/IP receive buffer size (bytes) 58672 58672 HADR receive buffer size (4K Pages) 128 128 Throughput(MB/s) (primary sending/standby receiving) Percentage of network wait Throughput achieved in SYNC mode Run the HADR Simulator tool on the primary and standby, with the appropriate values. You can start the primary or standby first. The one started first waits for the other one to start to make a connection. The tool writes to standard output. It does not write log data to disk; instead, it uses the provided numbers from the -disk option to simulate log writes. For this example scenario, issue the following command on the primary: ~/simhadr -role primary -lhost hadrPrimaryHost -lport 53970 -rhost hadrStandbyHost.svl.ibm.com -rport 28239 -syncmode sync -flushSize 32 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 53.069551 0.001117 DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 11 of 35 developerWorks® ibm.com/developerWorks/ Run the HADR Simulator tool with SYNC mode only for the purposes of comparing the results. Given the long distance between the two sites, that is not a realistic setting. The output from the tool is as follows: Measured sleep overhead: 0.003727 second, using spin time 0.004472 second. Simulation run time = 4 seconds Resolving local host hadrPrimaryHost via gethostbyname() hostname=hadrPrimaryHost.beaverton.ibm.com alias: hadrPrimaryHost address_type=2 address_length=4 address: 9.47.73.33 Resolving remote host hadrStandbyHost.svl.ibm.com via gethostbyname() hostname=hadrStandbyHost.svl.ibm.com address_type=2 address_length=4 address: 9.30.4.113 Socket property upon creation BlockingIO=true NAGLE=true SO_SNDBUF=16384 SO_RCVBUF=87380 SO_LINGER: onoff=0, length=0 Calling setsockopt(SO_SNDBUF) Calling setsockopt(SO_RCVBUF) Socket property upon buffer resizing BlockingIO=true NAGLE=true SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Binding socket to local address. Listening on local host TCP port 53970 ---> [The output stops here until the simhadr tool is executed on the standby] Connected. 
Calling fcntl(O_NONBLOCK) Calling setsockopt(TCP_NODELAY) Socket property upon connection BlockingIO=false NAGLE=false SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Sending handshake message: syncMode=SYNC flushSize=32 connTime=2012-04-13_11:58:15_PDT Sending log flushes. Press Ctrl-C to stop. SYNC: Total 3014656 bytes in 4.126519 seconds, 0.730557 MBytes/sec Total 23 flushes, 0.179414 sec/flush, 32 pages (131072 bytes)/flush disk speed: 53.069551 MB/second, overhead: 0.001117 second/write Total 3014656 bytes written in 0.082478 seconds. 36.551032 MBytes/sec Total 23 write calls, 131.072 kBytes/write, 0.003586 sec/write Total 3014656 bytes sent in 4.126519 seconds. 0.730557 MBytes/sec DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 12 of 35 ibm.com/developerWorks/ developerWorks® Total 57 send calls, 52.888 KBytes/send, Total 34 congestions, 0.014269 seconds, 0.000419 second/congestion Total 1104 bytes recv in 4.126519 seconds. 0.000268 MBytes/sec Total 23 recv calls, 0.048 KBytes/recv Distribution of log write size (unit is byte): Total 23 numbers, Sum 3014656, Min 131072, Max 131072, Avg 131072 Exactly 131072 23 numbers Distribution of log shipping time (unit is microsecond): Total 23 numbers, Sum 4043919, Min 139617, Max 267547, Avg 175822 From 131072 to 262143 22 numbers From 262144 to 524287 1 numbers Distribution of congestion duration (unit is microsecond): Total 34 numbers, Sum 14269, Min 206, Max 893, Avg 419 From 128 to 255 7 numbers From 256 to 511 24 numbers From 512 to 1023 3 numbers Distribution of send size (unit is byte): Total 57 numbers, Sum 3014656, Min 7992, Max 79640, Avg 52888 From 4096 to 8191 1 numbers From 8192 to 16383 9 numbers From 16384 to 32767 2 numbers From 32768 to 65535 22 numbers From 65536 to 131071 23 numbers Distribution of recv size (unit is byte): Total 23 numbers, Sum 1104, Min 48, Max 48, Avg 48 Exactly 48 23 numbers Then, issue the following command on the standby: ~/simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245 -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 63.122727 0.024283 + simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245 -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 63.122727 0.024283 The output from the tool is as follows: Measured sleep overhead: 0.003931 second, using spin time 0.004717 second. Resolving local host hadrStandbyHost.svl.ibm.com via gethostbyname() hostname=hadrStandbyHost.svl.ibm.com alias: hadrStandbyHost address_type=2 address_length=4 address: 9.30.4.113 Resolving remote host hadrPrimaryHost.beaverton.ibm.com via gethostbyname() hostname=hadrPrimaryHost.beaverton.ibm.com address_type=2 address_length=4 address: 9.47.73.33 Socket property upon creation BlockingIO=true NAGLE=true SO_SNDBUF=16384 SO_RCVBUF=87380 SO_LINGER: onoff=0, length=0 DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 13 of 35 developerWorks® ibm.com/developerWorks/ Calling setsockopt(SO_SNDBUF) Calling setsockopt(SO_RCVBUF) Socket property upon buffer resizing BlockingIO=true NAGLE=true SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Connecting to remote host TCP port 28239 Connected. 
Calling fcntl(O_NONBLOCK) Calling setsockopt(TCP_NODELAY) Socket property upon connection BlockingIO=false NAGLE=false SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Received handshake message: syncMode=SYNC flushSize=32 connTime=2012-04-13_11:58:15_PDT Standby receive buffer size 128 pages (524288 bytes) Receiving log flushes. Press Ctrl-C on primary to stop. Zero byte received. Remote end closed connection. SYNC: Total 3014656 bytes in 4.118283 seconds, 0.732018 MBytes/sec Total 23 flushes, 0.179056 sec/flush, 32 pages (131072 bytes)/flush disk speed: 63.122727 MB/second, overhead: 0.024283 second/write Total 3014656 bytes written in 2.743122 seconds. 1.098987 MBytes/sec Total 111 write calls, 27.159 kBytes/write, 0.024713 sec/write Total 1104 bytes sent in 4.118283 seconds. 0.000268 MBytes/sec Total 23 send calls, 0.048 KBytes/send, Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion Total 3014656 bytes recv in 4.118283 seconds. 0.732018 MBytes/sec Total 111 recv calls, 27.159 KBytes/recv Distribution of log write size (unit is byte): Total 111 numbers, Sum 3014656, Min 4096, Max 65536, Avg 27159 Exactly 4096 2 numbers Exactly 8192 1 numbers Exactly 16384 58 numbers Exactly 32768 30 numbers Exactly 49152 15 numbers Exactly 65536 5 numbers Distribution of send size (unit is byte): Total 23 numbers, Sum 1104, Min 48, Max 48, Avg 48 Exactly 48 23 numbers Distribution of recv size (unit is byte): Total 111 numbers, Sum 3014656, Min 1024, Max 65536, Avg 27159 Exactly 4344 1 numbers Exactly 8688 1 numbers Exactly 16384 57 numbers Exactly 32768 30 numbers Exactly 18712 1 numbers Exactly 1024 1 numbers Exactly 49152 15 numbers DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 14 of 35 ibm.com/developerWorks/ Exactly 65536 developerWorks® 5 numbers After the test is complete, add the results to the table. In Table 4, the last row Percentage of network wait is calculated the following way: (time spent in waiting for network to consume more data/total time) = (total time/reported for congestion/total run time) For our primary, it is (0.014269 / 4.126519) and for the standby, it is 0. Table 4. Set values for the HADR Simulator tool Host hadrPrimaryHost hadrStandbyHost Sync mode SYNC SYNC Sync mode Flush size (4 K pages) 32 32 Overhead per write (seconds) 0.001117 .024283 Transfer rate (MB/s) 53.069551 63.122727 TCP/IP send buffer size (bytes) 58672 58672 TCP/IP receive buffer size (bytes) 58672 58672 HADR receive buffer size (4K Pages) 128 128 Throughput(MB/s) (primary sending/standby receiving) 0.730557 0.732018 Percentage of network wait YES (0.3%) NO Throughput achieved in NEARSYNC mode Run the HADR Simulator tool on the primary and standby, with the appropriate values. You can start the primary or standby first. The one started first waits for the other one to start to make a connection. The tool writes to standard output. It does not write log data to disk; instead, it uses the provided numbers from the –disk option to simulate log write. For our example system, issue the following command on the primary: ~/simhadr -role primary -lhost hadrPrimaryHost -lport 53970 -rhost hadrStandbyHost.svl.ibm.com -rport 28239 -syncmode nearsync -flushSize 32 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 53.069551 0.001117 The output from the tool is as follows: Measured sleep overhead: 0.003609 second, using spin time 0.004330 second. 
Simulation run time = 4 seconds Resolving local host hadrPrimaryHost via gethostbyname() hostname=hadrPrimaryHost.beaverton.ibm.com alias: hadrPrimaryHost address_type=2 address_length=4 address: 9.47.73.33 Resolving remote host hadrStandbyHost.svl.ibm.com via gethostbyname() hostname=hadrStandbyHost.svl.ibm.com DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 15 of 35 developerWorks® ibm.com/developerWorks/ address_type=2 address_length=4 address: 9.30.4.113 Socket property upon creation BlockingIO=true NAGLE=true SO_SNDBUF=16384 SO_RCVBUF=87380 SO_LINGER: onoff=0, length=0 Calling setsockopt(SO_SNDBUF) Calling setsockopt(SO_RCVBUF) Socket property upon buffer resizing BlockingIO=true NAGLE=true SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Binding socket to local address. Listening on local host TCP port 53970 --> [The output stops here until the simhadr tool is executed on the standby] Connected. Calling fcntl(O_NONBLOCK) Calling setsockopt(TCP_NODELAY) Socket property upon connection BlockingIO=false NAGLE=false SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Sending handshake message: syncMode=NEARSYNC flushSize=32 connTime=2012-04-13_11:59:02_PDT Sending log flushes. Press Ctrl-C to stop. NEARSYNC: Total 3801088 bytes in 4.099373 seconds, 0.927236 MBytes/sec Total 29 flushes, 0.141358 sec/flush, 32 pages (131072 bytes)/flush disk speed: 53.069551 MB/second, overhead: 0.001117 second/write Total 3801088 bytes written in 0.103994 seconds. 36.551032 MBytes/sec Total 29 write calls, 131.072 kBytes/write, 0.003586 sec/write Total 3801088 bytes sent in 4.099373 seconds. 0.927236 MBytes/sec Total 80 send calls, 47.513 KBytes/send, Total 51 congestions, 0.018008 seconds, 0.000353 second/congestion Total 1392 bytes recv in 4.099373 seconds. 0.000340 MBytes/sec Total 29 recv calls, 0.048 KBytes/recv Distribution of log write size (unit is byte): Total 29 numbers, Sum 3801088, Min 131072, Max 131072, Avg 131072 Exactly 131072 29 numbers Distribution of log shipping time (unit is microsecond): Total 29 numbers, Sum 4099263, Min 92349, Max 288847, Avg 141353 From 65536 to 131071 9 numbers From 131072 to 262143 19 numbers From 262144 to 524287 1 numbers Distribution of congestion duration (unit is microsecond): Total 51 numbers, Sum 18008, Min 189, Max 660, Avg 353 DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 16 of 35 ibm.com/developerWorks/ From 128 to 255 From 256 to 511 From 512 to 1023 developerWorks® 18 numbers 31 numbers 2 numbers Distribution of send size (unit is byte): Total 80 numbers, Sum 3801088, Min 752, Max 79640, Avg 47513 From 512 to 1023 1 numbers From 8192 to 16383 19 numbers From 16384 to 32767 3 numbers From 32768 to 65535 28 numbers From 65536 to 131071 29 numbers Distribution of recv size (unit is byte): Total 29 numbers, Sum 1392, Min 48, Max 48, Avg 48 Exactly 48 29 numbers Then issue the following command on the standby: ~/simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245 -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 63.122727 0.024283 + simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245 -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 63.122727 0.024283 The output from the tool is as follows: Measured sleep overhead: 0.003686 second, using spin time 0.004423 second. 
Resolving local host hadrStandbyHost.svl.ibm.com via gethostbyname() hostname=hadrStandbyHost.svl.ibm.com alias: hadrStandbyHost address_type=2 address_length=4 address: 9.30.4.113 Resolving remote host hadrPrimaryHost.beaverton.ibm.com via gethostbyname() hostname=hadrPrimaryHost.beaverton.ibm.com address_type=2 address_length=4 address: 9.47.73.33 Socket property upon creation BlockingIO=true NAGLE=true SO_SNDBUF=16384 SO_RCVBUF=87380 SO_LINGER: onoff=0, length=0 Calling setsockopt(SO_SNDBUF) Calling setsockopt(SO_RCVBUF) Socket property upon buffer resizing BlockingIO=true NAGLE=true SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Connecting to remote host TCP port 28239 Connected. Calling fcntl(O_NONBLOCK) Calling setsockopt(TCP_NODELAY) Socket property upon connection BlockingIO=false DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 17 of 35 developerWorks® ibm.com/developerWorks/ NAGLE=false SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Received handshake message: syncMode=NEARSYNC flushSize=32 connTime=2012-04-13_11:59:02_PDT Standby receive buffer size 128 pages (524288 bytes) Receiving log flushes. Press Ctrl-C on primary to stop. Zero byte received. Remote end closed connection. NEARSYNC: Total 3801088 bytes in 4.124563 seconds, 0.921574 MBytes/sec Total 29 flushes, 0.142226 sec/flush, 32 pages (131072 bytes)/flush disk speed: 63.122727 MB/second, overhead: 0.024283 second/write Total 3801088 bytes written in 0.764411 seconds. 4.972571 MBytes/sec Total 29 write calls, 131.072 kBytes/write, 0.026359 sec/write Total 1392 bytes sent in 4.124563 seconds. 0.000337 MBytes/sec Total 29 send calls, 0.048 KBytes/send, Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion Total 3801088 bytes recv in 4.124563 seconds. 0.921574 MBytes/sec Total 175 recv calls, 21.720 KBytes/recv Distribution of log write size (unit is byte): Total 29 numbers, Sum 3801088, Min 131072, Max 131072, Avg 131072 Exactly 131072 29 numbers Distribution of send size (unit is byte): Total 29 numbers, Sum 1392, Min 48, Max 48, Avg 48 Exactly 48 29 numbers Distribution of recv size (unit is byte): Total 175 numbers, Sum 3801088, Min 1024, Max 65536, Avg 21720 From 1024 to 2047 2 numbers From 2048 to 4095 2 numbers From 4096 to 8191 4 numbers From 8192 to 16383 4 numbers From 16384 to 32767 135 numbers From 32768 to 65535 14 numbers From 65536 to 131071 14 numbers After the test is complete, add the results to the table. In Table 5, the last row Percentage of network wait is calculated the following way: (time spent in waiting for network to consume more data/total time) = (total time/reported for congestion/total run time) For our primary, it is (0.018008/4.099373) and for the standby, it is 0. DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 18 of 35 ibm.com/developerWorks/ developerWorks® Table 5. 
Results for NEARSYNC mode Host hadrPrimaryHost hadrStandbyHost hadrPrimaryHost hadrStandbyHost Sync mode SYNC SYNC NEARSYNC NEARSYNC Flush size (4 K pages) 32 32 32 32 Overhead per write (seconds) 0.001117 .024283 0.001117 .024283 Transfer rate (MB/s) 53.069551 63.122727 53.069551 63.122727 TCP/IP send buffer size (bytes) 58672 58672 58672 58672 TCP/IP receive buffer size (bytes) 58672 58672 58672 58672 HADR receive buffer size (4K Pages) 128 128 128 128 Throughput (Mbytes/sec) (Primary sending/Standby receiving) 0.730557 0.732018 0.927236 0.921574 NO YES (0.4%) NO Percentage of network wait YES (0.3%) Throughput achieved in ASYNC mode Run the HADR Simulator tool on the primary and standby, with the appropriate values. You can start the primary or standby first. The one started first waits for the other one to start to make a connection. The tool writes to standard output. It does not write log data to disk instead it uses the provided numbers from the –disk option to simulate log write. For our example system, issue the following command on the primary: ~/simhadr -role primary -lhost hadrPrimaryHost -lport 53970 -rhost hadrStandbyHost.svl.ibm.com -rport 28239 -syncmode async -flushSize 32 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 53.069551 0.001117 The output from the tool is as follows: Measured sleep overhead: 0.003709 second, using spin time 0.004450 second. Simulation run time = 4 seconds Resolving local host hadrPrimaryHost via gethostbyname() hostname=hadrPrimaryHost.beaverton.ibm.com alias: hadrPrimaryHost address_type=2 address_length=4 address: 9.47.73.33 Resolving remote host hadrStandbyHost.svl.ibm.com via gethostbyname() hostname=hadrStandbyHost.svl.ibm.com address_type=2 address_length=4 address: 9.30.4.113 Socket property upon creation BlockingIO=true NAGLE=true SO_SNDBUF=16384 SO_RCVBUF=87380 SO_LINGER: onoff=0, length=0 DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 19 of 35 developerWorks® ibm.com/developerWorks/ Calling setsockopt(SO_SNDBUF) Calling setsockopt(SO_RCVBUF) Socket property upon buffer resizing BlockingIO=true NAGLE=true SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Binding socket to local address. Listening on local host TCP port 53970 --> [The output stops here until the simhadr tool is executed on the standby] Connected. Calling fcntl(O_NONBLOCK) Calling setsockopt(TCP_NODELAY) Socket property upon connection BlockingIO=false NAGLE=false SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Sending handshake message: syncMode=ASYNC flushSize=32 connTime=2012-04-13_12:00:10_PDT Sending log flushes. Press Ctrl-C to stop. ASYNC: Total 8781824 bytes in 4.068864 seconds, 2.158299 MBytes/sec Total 67 flushes, 0.060729 sec/flush, 32 pages (131072 bytes)/flush disk speed: 53.069551 MB/second, overhead: 0.001117 second/write Total 8781824 bytes written in 0.240262 seconds. 36.551032 MBytes/sec Total 67 write calls, 131.072 kBytes/write, 0.003586 sec/write Total 8781824 bytes sent in 4.068864 seconds. 2.158299 MBytes/sec Total 292 send calls, 30.074 KBytes/send, Total 225 congestions, 4.058363 seconds, 0.018037 second/congestion Total 0 bytes recv in 4.068864 seconds. 
0.000000 MBytes/sec Total 0 recv calls, 0.000 KBytes/recv Distribution of log write size (unit is byte): Total 67 numbers, Sum 8781824, Min 131072, Max 131072, Avg 131072 Exactly 131072 67 numbers Distribution of log shipping time (unit is microsecond): Total 67 numbers, Sum 4063639, Min 878, Max 186689, Avg 60651 From 512 to 1023 1 numbers From 1024 to 2047 1 numbers From 32768 to 65535 26 numbers From 65536 to 131071 38 numbers From 131072 to 262143 1 numbers Distribution of congestion duration (unit is microsecond): Total 225 numbers, Sum 4058363, Min 282, Max 104746, Avg 18037 From 256 to 511 8 numbers From 512 to 1023 102 numbers From 1024 to 2047 6 numbers From 4096 to 8191 3 numbers From 8192 to 16383 1 numbers From 32768 to 65535 104 numbers From 65536 to 131071 1 numbers DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 20 of 35 ibm.com/developerWorks/ developerWorks® Distribution of send size (unit is byte): Total 292 numbers, Sum 8781824, Min 816, Max 79640, Avg 30074 From 512 to 1023 1 numbers From 1024 to 2047 1 numbers From 2048 to 4095 4 numbers From 4096 to 8191 21 numbers From 8192 to 16383 27 numbers From 16384 to 32767 86 numbers From 32768 to 65535 150 numbers From 65536 to 131071 2 numbers Then issue the following command on the standby: ~/simhadr -role standby -lhost hadrStandbyHost.svl.ibm.com -lport 28245 -rhost hadrPrimaryHost.beaverton.ibm.com -rport 28239 -sockSndBuf 58672 -sockRcvBuf 58672 -disk 63.122727 0.024283 The output from the tool is as follows: Measured sleep overhead: 0.003974 second, using spin time 0.004768 second. Resolving local host hadrStandbyHost.svl.ibm.com via gethostbyname() hostname=hadrStandbyHost.svl.ibm.com alias: hadrStandbyHost address_type=2 address_length=4 address: 9.30.4.113 Resolving remote host hadrPrimaryHost.beaverton.ibm.com via gethostbyname() hostname=hadrPrimaryHost.beaverton.ibm.com address_type=2 address_length=4 address: 9.47.73.33 Socket property upon creation BlockingIO=true NAGLE=true SO_SNDBUF=16384 SO_RCVBUF=87380 SO_LINGER: onoff=0, length=0 Calling setsockopt(SO_SNDBUF) Calling setsockopt(SO_RCVBUF) Socket property upon buffer resizing BlockingIO=true NAGLE=true SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Connecting to remote host TCP port 28239 Connected. Calling fcntl(O_NONBLOCK) Calling setsockopt(TCP_NODELAY) Socket property upon connection BlockingIO=false NAGLE=false SO_SNDBUF=117344 SO_RCVBUF=117344 SO_LINGER: onoff=0, length=0 Received handshake message: syncMode=ASYNC DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 21 of 35 developerWorks® ibm.com/developerWorks/ flushSize=32 connTime=2012-04-13_12:00:10_PDT Standby receive buffer size 128 pages (524288 bytes) Receiving log flushes. Press Ctrl-C on primary to stop. Zero byte received. Remote end closed connection. ASYNC: Total 8781824 bytes in 4.187657 seconds, 2.097073 MBytes/sec Total 67 flushes, 0.062502 sec/flush, 32 pages (131072 bytes)/flush disk speed: 63.122727 MB/second, overhead: 0.024283 second/write Total 8781824 bytes written in 1.766053 seconds. 4.972571 MBytes/sec Total 67 write calls, 131.072 kBytes/write, 0.026359 sec/write Total 0 bytes sent in 4.187657 seconds. 0.000000 MBytes/sec Total 0 send calls, 0.000 KBytes/send, Total 0 congestions, 0.000000 seconds, 0.000000 second/congestion Total 8781824 bytes recv in 4.187657 seconds. 
2.097073 MBytes/sec
Total 429 recv calls, 20.470 KBytes/recv

Distribution of log write size (unit is byte):
Total 67 numbers, Sum 8781824, Min 131072, Max 131072, Avg 131072
Exactly 131072    67 numbers

Distribution of recv size (unit is byte):
Total 429 numbers, Sum 8781824, Min 2328, Max 65536, Avg 20470
From 2048 to 4095       2 numbers
From 4096 to 8191       1 numbers
From 8192 to 16383      2 numbers
From 16384 to 32767   365 numbers
From 32768 to 65535    41 numbers
From 65536 to 131071   18 numbers

After the test is complete, add the results to the table. In Table 6, the last row, Percentage of network wait, is calculated the following way:

Percentage of network wait = (time spent waiting for the network to consume more data) / (total time)
                           = (total time reported for congestion) / (total run time)

For our primary, it is (4.058363 / 4.068864), and for the standby, it is 0.

Table 6. Results for ASYNC mode

Host                                  hadrPrimaryHost  hadrStandbyHost  hadrPrimaryHost  hadrStandbyHost  hadrPrimaryHost  hadrStandbyHost
Sync mode                             SYNC             SYNC             NEARSYNC         NEARSYNC         ASYNC            ASYNC
Flush size (4K pages)                 32               32               32               32               32               32
Overhead per write (seconds)          0.001117         0.024283         0.001117         0.024283         0.001117         0.024283
Transfer rate (MB/s)                  53.069551        63.122727        53.069551        63.122727        53.069551        63.122727
TCP/IP send buffer size (bytes)       58672            58672            58672            58672            58672            58672
TCP/IP receive buffer size (bytes)    58672            58672            58672            58672            58672            58672
HADR receive buffer size (4K pages)   128              128              128              128              128              128
Throughput (MB/s)
(primary sending/standby receiving)   0.730557         0.732018         0.927236         0.921574         2.158299         2.097073
Percentage of network wait            YES (0.3%)       NO               YES (0.4%)       NO               YES (99.7%)      NO

Analysis of results from synchronization mode tests

We achieved the highest throughput (2.158299 MB/s) in ASYNC mode. As you can see in the Percentage of network wait row of Table 6, we experienced congestion on the primary in all three modes that were tested. We did not test SUPERASYNC mode; it is identical to RCU (remote catchup) with a flush size of 16, and the results should be close to ASYNC because the primary does not wait for an acknowledgement from the standby.

The network being congested for a small period of time at peak workload might be normal. In SYNC and NEARSYNC modes, the primary waits for an acknowledgment from the standby, so the primary is throttled. In ASYNC mode, the primary is not throttled because it does not wait for an acknowledgement from the standby: as soon as a send call hands data to the TCP buffer, the primary is ready to send more as transactions are processed. When log writing is faster than the network, you might therefore see congestion in ASYNC mode. In the next section, we demonstrate how to tune the configuration to relieve the network congestion.

Tuning the configuration to address congestion

The next thing to do is set up HADR based on the results of the HADR Simulator tool. Then you do a base run with a real production workload so that you can monitor specific aspects of the performance and make the appropriate adjustments.

HADR configurations

Set the following configuration parameters and registry variables according to your testing:

• logfilsiz: For more information and recommended settings, click here. We use 76800 (300MB) for our scenario.
• logbufsz: For more information and recommended settings, click here. We use 2048 (8MB) for our scenario.
• hadr_syncmode: For more information and recommended settings, click here. For our scenario, ASYNC is used because it provided us with the best throughput. • DB2_HADR_BUF_SIZE: For more information and recommended settings, click here. We use 4096 (2 × logbufsz) for our scenario. • HADR_PEER_WINDOW: For more information and recommended settings, click here. For our scenario, this variable is ignored because we are using ASYNC synchronization mode. • DB2_HADR_PEER_WAIT_LIMIT: Use this variable as necessary. For more information and recommended settings, click here. For our scenario, this is not set. • DB2_HADR_SORCVBUF and DB2_HADR_SOSNDBUF: For more information and recommended settings, click here. We use 58672 for our scenario. Setting up HADR 1. Set up HADR with the standard HADR-specific configuration parameters as well as the settings discussed in the preceding section. Our setup is as follows: 1. On the primary: db2 restore db raki from /u/kkchinta/info/rakibackup/ on /u/kkchinta/kkchinta DBPATH on /u/kkchinta/kkchinta NEWLOGPATH /work3/kkchinta without rolling forward db2 "update db cfg for raki using HADR_LOCAL_HOST hadrPrimaryHost.beaverton.ibm.com HADR_REMOTE_HOST hadrStandbyHost.svl.ibm.com HADR_LOCAL_SVC 53970 HADR_REMOTE_SVC 28245 HADR_REMOTE_INST kkchinta HADR_TIMEOUT 120 HADR_SYNCMODE ASYNC LOGARCHMETH1 DISK:/work4/kkchinta LOGINDEXBUILD ON LOGFILSIZ 76800 LOGBUFSZ 2048" db2set DB2_HADR_SORCVBUF=58672 DB2_HADR_SOSNDBUF=58672 DB2_HADR_BUF_SIZE=4096 DB2COMM=TCPIP 2. On the standby: DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 24 of 35 ibm.com/developerWorks/ developerWorks® db2 restore db raki from /nfshome/kkchinta/rakibackup/ on /home/kkchinta/kkchinta DBPATH on /home/kkchinta/kkchinta NEWLOGPATH /perf5/kkchinta/ db2 "update db cfg for raki using HADR_LOCAL_HOST hadrStandbyHost.svl.ibm.com HADR_REMOTE_HOST hadrPrimaryHost.beaverton.ibm.com HADR_LOCAL_SVC 28245 HADR_REMOTE_SVC 28239 HADR_REMOTE_INST kkchinta HADR_TIMEOUT 120 HADR_SYNCMODE ASYNC LOGARCHMETH1 DISK:/work1/kkchinta LOGINDEXBUILD ON LOGFILSIZ 76800 LOGBUFSZ 2048" db2set DB2_HADR_SORCVBUF=58672 DB2_HADR_SOSNDBUF=58672 DB2_HADR_BUF_SIZE=4096 DB2COMM=TCPIP 2. Start HADR on both the primary and standby, and issue the db2pd command with the -hadr option to ensure they enter peer state: Important: The format of the db2pd command with the -hadr option output is different in releases 10.1 and later. 
db2pd -db raki -hadr Database Partition 0 -- Database RAKI -- Active -- Up 0 days 00:22:37 -- Date 2012-04-13-14.45.17.920125 HADR Information: Role State Primary Peer SyncMode Async HeartBeatsMissed 0 LogGapRunAvg (bytes) 0 ConnectStatus ConnectTime Timeout Connected Fri Apr 13 14:35:33 2012 (1334352933) 120 LocalHost hadrPrimaryHost.beaverton.ibm.com LocalService 53970 RemoteHost hadrStandbyHost.svl.ibm.com RemoteService 28245 PrimaryFile PrimaryPg S0000000.LOG 1 PrimaryLSN 0x000000000A329BF2 StandByFile StandByPg S0000000.LOG 1 StandByLSN 0x000000000A329BF2 RemoteInstance kkchinta --db2pd -db raki -hadr Database Partition 0 -- Database RAKI -- Standby -- Up 0 days 00:09:47 -- Date 2012-04-13-14.45.19.109598 HADR Information: Role State Standby Peer SyncMode HeartBeatsMissed LogGapRunAvg (bytes) Async 0 0 ConnectStatus ConnectTime Timeout Connected Fri Apr 13 14:35:33 2012 (1334352933) 120 LocalHost hadrStandbyHost.svl.ibm.com LocalService 28245 RemoteHost hadrPrimaryHost.beaverton.ibm.com RemoteService 28239 DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide RemoteInstance kkchinta Page 25 of 35 developerWorks® ibm.com/developerWorks/ PrimaryFile PrimaryPg S0000000.LOG 1 PrimaryLSN 0x000000000A329BF2 StandByFile StandByPg S0000000.LOG 1 StandByLSN StandByRcvBufUsed 0x000000000A329BF2 0% Running the workload and monitoring performance Execute a real production workload and monitor it using the db2pd command with the -hadr option. Pay attention to the following fields: • State: This gives the current state of the database. • LogGapRunAvg: This gives the running average of the gap between the primary log sequence number (LSN) and the standby log LSN. • ConnectStatus (on the primary): This is where congestion is reported. • StandbyRcvBufUsed (on the standby): This is the percentage of standby log receiving buffer used. You can use the following script: for i in {1..15}; do echo "#################################################################" >> /tmp/kk_hadr; echo "Collecting stats $i" >> /tmp/kk_hadr; rsh hadrPrimaryHost "/bin/bash -c '~/sqllib/adm/db2pd -db raki -hadr'" >> /tmp/kk_hadr ; rsh hadrStandbyHost.svl "/bin/bash -c '~/sqllib/adm/db2pd -db raki -hadr'" >> /tmp/kk_hadr; sleep 5; done Monitor the output to see if there is any congestion and if the standby’s receive memory is full. 
That was the case in our example, as we looked in the output file ~/perfpaper/db2pd.out: egrep -A1 "Congested|StandByRcvBufUsed" ~/perfpaper/db2pd.out | grep -A5 Congested Congested Sun Apr 15 20:56:31 2012 (1334548591) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000001.LOG 72893 0x000000002EBE56D8 100% -Congested Sun Apr 15 20:56:36 2012 (1334548596) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000001.LOG 73604 0x000000002EEAC37B 99% -Congested Sun Apr 15 20:56:43 2012 (1334548603) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000001.LOG 74385 0x000000002F1B9F8D 96% --Congested Sun Apr 15 20:57:00 2012 (1334548620) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000002.LOG 105 0x000000002FB918AC 100% --Congested Sun Apr 15 20:57:10 2012 (1334548630) 120 -StandByFile StandByPg S0000002.LOG 1619 -- StandByLSN StandByRcvBufUsed 0x000000003017B315 94% DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 26 of 35 ibm.com/developerWorks/ -Congested developerWorks® Sun Apr 15 20:57:23 2012 (1334548643) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000002.LOG 3840 0x0000000030A28AAF 96% --Congested Sun Apr 15 20:57:34 2012 (1334548654) 120 -StandByFile StandByPg S0000002.LOG 5925 StandByLSN StandByRcvBufUsed 0x000000003124DBB4 94% Increasing the buffer size Try different settings for the HADR receive buffer to see if that addresses the congestion. The default is 2 times the primary’s setting for the logbufsz configuration parameter. To absorb the primary logging peak, a larger value is often needed. As you try different settings for the HADR buffer size, gather your results in a table as in the following example for our scenario: Table 7. Results from initial run of workload Test Test1 Synchronization mode ASYNC logfilsiz (4K) 76800 logbufsz 2048 DB2_HADR_BUF_SIZE 4096 HADR_PEER_WINDOW Ignored in ASYNC SOSNDBUF/SORCVBUF 58672 Commit delay observed YES Congestion YES Increase DB2_HADR_BUF_SIZE to 8192 and restart the instance. You can rerun the workload and capture the following relevant data: egrep -A1 "Congested|StandByRcvBufUsed" ~/perfpaper/db2pd_2.out | grep -A5 Congested Congested Sun Apr 15 22:53:03 2012 (1334555583) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000003.LOG 22423 0x0000000047EBFC28 100% -Congested Sun Apr 15 22:53:09 2012 (1334555589) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000003.LOG 23293 0x0000000048225D75 100% -Congested Sun Apr 15 22:53:14 2012 (1334555594) 120 -StandByFile StandByPg S0000003.LOG 24286 -- StandByLSN StandByRcvBufUsed 0x0000000048606FE7 100% DB2 Linux, Unix and Windows HADR Simulator use case and troubleshooting guide Page 27 of 35 developerWorks® Congested ibm.com/developerWorks/ Sun Apr 15 22:53:20 2012 (1334555600) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000003.LOG 25196 0x00000000489945F5 100% --Congested Sun Apr 15 22:53:56 2012 (1334555636) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000003.LOG 29613 0x0000000049AD5932 100% --Congested Sun Apr 15 22:54:08 2012 (1334555648) 120 -StandByFile StandByPg StandByLSN StandByRcvBufUsed S0000003.LOG 30744 0x0000000049F40E75 100% -Congested Sun Apr 15 22:54:13 2012 (1334555653) 120 -StandByFile StandByPg S0000003.LOG 31350 StandByLSN StandByRcvBufUsed 0x000000004A19EC51 100% You can see the output (and Table 8) that the standby receive buffer size is still not sufficient. Table 8. 
Table 8. Results from second run of workload

Test                    Test1              Test2
Synchronization mode    ASYNC              ASYNC
logfilsiz (4K)          76800              76800
logbufsz                2048               2048
DB2_HADR_BUF_SIZE       4096               8192
HADR_PEER_WINDOW        Ignored in ASYNC   Ignored in ASYNC
SOSNDBUF/SORCVBUF       58672              58672
Commit delay observed   YES                YES
Congestion              YES                YES

Another thing to analyze is when the actual congestion occurs. As the output for our example shows, the congestion occurred while the standby was receiving and replaying log file S0000003.LOG. Take a look at the flush size by using the db2flushsize script, as described in the following steps:

1. Find out where the transactional logs are stored, as in the following example:

   db2pd -db raki -dbcfg | egrep -i "Path to log files|LOGARCHMETH"
   Path to log files (memory)   /work3/kkchinta/
   Path to log files (disk)     /work3/kkchinta/
   LOGARCHMETH1 (memory)        DISK:/work4/kkchinta/
   LOGARCHMETH1 (disk)          DISK:/work4/kkchinta/
   LOGARCHMETH2 (memory)        OFF
   LOGARCHMETH2 (disk)          OFF

2. Find where S0000003.LOG exists and run the db2flushsize script against it. In our example, the script returns:

   Total 24897 flushes. Average flush size 2.3 pages

You can also query the relevant monitor elements to estimate the flush size and the log write rate. LOG_WRITE_TIME_S and LOG_WRITE_TIME_NS together give the total log write time in seconds and nanoseconds, so:

Average flush size (pages per write)  = LOG_WRITES / NUM_LOG_WRITE_IO
Log write I/O operations per second   = NUM_LOG_WRITE_IO / (LOG_WRITE_TIME_S + LOG_WRITE_TIME_NS / 10^9)

In our example, the snapshot would be:

db2 "select SNAPSHOT_TIMESTAMP, LOG_WRITES, LOG_WRITE_TIME_S, LOG_WRITE_TIME_NS,
     NUM_LOG_WRITE_IO, NUM_LOG_PART_PAGE_IO
     from TABLE(snap_get_db_v97('raki', -1)) as data" | formatdb2

SNAPSHOT_TIMESTAMP:    2012-04-15-23.07.39.388524
LOG_WRITES:            88566
LOG_WRITE_TIME_S:      100
LOG_WRITE_TIME_NS:     874278000
NUM_LOG_WRITE_IO:      62140
NUM_LOG_PART_PAGE_IO:  18768

1 record(s) selected.

With these numbers, the primary performed 62140 log write I/O operations in roughly 100.9 seconds (about 616 writes per second), and 88566 pages over 62140 writes works out to an average flush of about 1.4 pages, which, like the db2flushsize output, indicates a small average flush size.

There are a few things you can do to address the cause of the congestion:

• Check whether there is a replay speed issue on the standby. First, use the db2pd command with the -hadr option to determine the standby's replay speed by checking LogGapRunAvg or by comparing the LSNs on the primary (PrimaryLSN) and standby (StandByLSN). Next, determine the primary log generation rate, as described earlier in this document. If the standby log replay is moving at a constant speed but cannot catch up to the primary, then increase the HADR receive buffer size.
• Check whether there is a disk I/O issue on the standby. To do this, run the DB2 HADR Simulator with the -disk option. As explained earlier in this document, the -disk option calculates write speed. If writes are taking too long, a possible cause is that the standby's disks are not fast enough.
• Check the receive buffer usage by looking at the StandByRcvBufUsed value in the output of the db2pd command with the -hadr option. In our case, the primary is flushing logs at a faster rate than the standby can replay them (the value of StandByRcvBufUsed is 100%), so set the standby receive buffer to a much larger value: 262144 (in 4 KB units, which equals 1 GB), as shown in the sketch that follows this list.
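The following is a minimal sketch of that change, not taken verbatim from the article. It assumes you can briefly recycle the standby instance (the article notes that a restart is required after changing DB2_HADR_BUF_SIZE) and that reactivating the standby database restarts HADR in the standby role; adjust the database name to your environment.

# Sketch: enlarge the HADR receive buffer on the standby instance and recycle it.
# Setting the same value on the primary keeps the configuration symmetric for a
# future role switch.
db2 deactivate db raki            # run on the standby instance
db2set DB2_HADR_BUF_SIZE=262144   # 262144 x 4 KB pages = 1 GB
db2stop
db2start
db2 activate db raki              # reactivating the standby database restarts HADR
db2pd -db raki -hadr              # confirm StandByRcvBufUsed drops below 100%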
After rerunning the workload, you can see that no congestion is reported:

egrep -A1 "Congested|StandByRcvBufUsed" ~/perfpaper/db2pd_2.out | grep -A5 Congested

No congestion reported.

Table 9. Results from third run of workload

Test                    Test1              Test2              Test3
Synchronization mode    ASYNC              ASYNC              ASYNC
logfilsiz (4K)          76800              76800              76800
logbufsz                2048               2048               2048
DB2_HADR_BUF_SIZE       4096               8192               262144
HADR_PEER_WINDOW        Ignored in ASYNC   Ignored in ASYNC   Ignored in ASYNC
SOSNDBUF/SORCVBUF       58672              58672              58672
Commit delay observed   YES                YES                NO
Congestion              YES                YES                NO

Note: In the example system, ASYNC is chosen as the synchronization mode because it was observed to have better throughput than the other modes. ASYNC does not provide the same level of data protection, so it might not meet the business SLA. In such situations, you can use SYNC or NEARSYNC instead, but expect lower throughput with those synchronization modes. In cases like this, consider providing better resources and tuning the current set of resources to address the problem at hand. If the theoretical network bandwidth is low, try moving HADR log shipping to a network with higher bandwidth. Sharing the log shipping network with other applications can hurt log shipping throughput and can lead to high commit times for transactions on the primary database.

Tuning tips for a growing database or workload

After you perform the previous steps and obtain the right configuration, you should see good HADR system performance. That said, business growth or the adoption of new technology poses challenges for the HADR system. Over time, it is common for data to accumulate, increasing the size of the database and the volume of log files generated. As a result, the configuration that you initially came up with might not perform as well. In general, the size of the database is not that important to HADR. What matters to HADR is the type of workload and any increase in workload. When the workload increases, your database could be generating logs at a higher rate, and your initial configuration might be unable to keep up. If you observe this kind of performance degradation, consider the troubleshooting tips in the following section or in one of the HADR best practices documents. Alternatively, you can rerun the whole exercise with the DB2 HADR Simulator and develop an updated configuration.

Troubleshooting common problems

Slow replay

Using the db2pd command with the -hadr option, check whether there is a high log gap (LogGapRunAvg) and whether the HADR receive buffer (StandByRcvBufUsed) is full. If the log gap is high and the receive buffer is full, the replay might be processing a large database-wide operation, such as a reorganization, which makes replay appear slow. Avoid running database-wide operations during peak business hours and plan maintenance activity for idle or low-activity times. You can keep monitoring whether the replay is making progress from the db2pd output; one way to do that is shown in the sketch that follows. Congestion can occur if the standby does not make progress over a period of time.
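To tell a genuinely stalled replay apart from one that is slow but steady, you can sample the standby's log position twice and compare. This is a minimal sketch, not from the original article; it assumes the DB2 9.7 db2pd -hadr layout shown earlier, where the value line after the StandByRcvBufUsed header carries the standby log file, page, and LSN, and it is run on the standby host.

# Sketch: take two db2pd samples 60 seconds apart on the standby and compare
# the standby log position. If the file, page, and LSN do not advance, replay
# is stuck rather than merely slow.
db2pd -db raki -hadr | awk '/StandByRcvBufUsed/ { getline; print "standby position:", $1, $2, $3 }'
sleep 60
db2pd -db raki -hadr | awk '/StandByRcvBufUsed/ { getline; print "standby position:", $1, $2, $3 }'

If the position advances between samples, replay is progressing and the gap should eventually close; if it does not, work through the replay-speed and disk I/O checks described earlier.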
If the standby's log receive buffer (sized by the DB2_HADR_BUF_SIZE registry variable) fills up because of slow replay on the standby or a spike in transactions on the primary, new transactions on the primary can be blocked because the primary cannot send any more log data to the standby. One way to avoid this is to use SUPERASYNC mode, which prevents back pressure on the primary because the primary never waits for the standby. However, you might not want to use SUPERASYNC if you want control over how far the standby can fall behind the primary (which, in turn, influences how long a graceful takeover takes to complete) or if you do not want the potential for data loss if the primary fails.

Another alternative, introduced in Version 10.1, is to use log spooling. For more related information, consult the HADR multiple standbys white paper and the DB2 Information Center. Log spooling allows the standby to continue receiving log data, which is written to disk on the standby and replayed later, so the standby can catch up when the primary's logging rate drops. Log spooling is enabled by default starting in Version 10.5. The advantage of this feature over using SUPERASYNC mode (although the two methods can be used in tandem) is that you have protection from data loss. Essentially, you are choosing where to keep the yet-to-be-replayed log data: on the primary (SUPERASYNC) or on the standby (log spooling). Note that you should choose your spool limit setting carefully. A huge spool (for example, if you set it to unlimited, the spool can be as large as the disk space in the active log path) can lead to a long takeover time because the standby cannot become the primary until it has replayed all of the spooled data.

Consider revisiting the storage-level design and database design to see whether the design still holds up. Confirm that the table spaces and the transactional logs are not on the same file system. Make sure hot and dependent objects, such as tables and their indexes, are placed in different table spaces. On the standby, replay works in parallel, and when these objects fall into different table spaces, there is less contention during parallel replay, resulting in better parallelism. Also, if you are using the reads on standby feature to read data from the standby database, using different table spaces for indexes and table data helps improve I/O efficiency. Finally, using a large extent size can be beneficial when applications perform load, import, and bulk insert operations, or issue CREATE INDEX statements.

Primary hang

When a transaction on the primary database appears to hang, it could be hanging for reasons not related to HADR. If it is an HADR issue, a typical cause is network congestion when the database is using the SYNC, NEARSYNC, or ASYNC synchronization mode. If that is the case, it could be a side effect of the slow replay mentioned above, or the workload might be generating more logs than originally estimated. To understand this better, monitor the TCP/IP buffer usage and make sure that there are no issues at that layer; a sketch of one way to do this follows this section. You can repeat the steps for calculating the flush size with the db2flushsize script to estimate the workload's log generation rate, and then reconfigure the HADR system for better transaction throughput.
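The article does not show a specific command for checking the TCP/IP layer, so the following is a hedged sketch using standard Linux tools; the port number 28245 is the HADR service port from the example configuration, and what counts as a problem depends on your network.

# Sketch: inspect kernel socket buffer limits and the HADR connection's queues.
# A Send-Q on the primary side of the HADR connection that stays non-zero
# suggests the network (or the standby) is not draining log data fast enough.
sysctl net.core.rmem_max net.core.wmem_max net.ipv4.tcp_rmem net.ipv4.tcp_wmem
netstat -tn | grep 28245

If the socket buffers themselves are the limit, the DB2_HADR_SOSNDBUF and DB2_HADR_SORCVBUF registry variables (the SOSNDBUF/SORCVBUF row in the tables above) control the HADR socket buffer sizes.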
Transaction delay on the primary

If you observe a transaction delay that lasts a short period of time and happens intermittently, you might be running into resource contention. If the database is using SYNC mode, a transaction can commit only after its log data is received by the standby and written to disk on the standby. If the disk I/O on the standby system is not as good as on the primary system, commits on the primary can be slowed down because the primary waits to hear that the standby has finished writing the log to disk. Check the I/O statistics on the disks of both the primary and the standby and compare them. You might even see this situation in NEARSYNC mode if there is poor disk I/O on the standby. Even though NEARSYNC mode does not require the log page to be written to disk on the standby before the transaction is committed, commit times can still be high because, while the HADR standby thread is writing pages to disk, it is unresponsive to new data sent by the primary.

If the HADR network runs over an unstable WAN that is causing transaction delays on the primary, consider using the DB2_HADR_PEER_WAIT_LIMIT registry variable to limit those delays. If the network issues are not fixed and you expect them to persist for a long period of time, explore using SUPERASYNC as your synchronization mode. This mode does not guarantee data protection, but it is very useful in helping you avoid transaction delays. Use this mode if you value data availability much more than data protection.
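A minimal sketch of those two remedies follows, assuming the raki database from the example; the 60-second peer wait limit is illustrative, not a recommendation from the article, and the synchronization mode must be changed on both the primary and the standby.

# Sketch: cap how long the primary can be blocked waiting on the standby.
# DB2_HADR_PEER_WAIT_LIMIT is in seconds; registry variable changes are
# typically picked up when the instance is restarted.
db2set DB2_HADR_PEER_WAIT_LIMIT=60

# Alternatively, switch to SUPERASYNC so the primary never waits on the standby.
# Apply the same update on the standby, then restart HADR so the new
# synchronization mode takes effect.
db2 update db cfg for raki using HADR_SYNCMODE SUPERASYNC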
Application performance drops after a takeover

In environments where the application server is located much closer to the primary site than to the standby site, you might see some drop in application performance after a takeover occurs. The performance drop is likely if the round-trip time (RTT) between the application server and the new primary server (the previous standby server) is much higher than the RTT between the application server and the previous primary server. You can address this performance drop with a combination of different optimizations. If a secondary application server exists close to the standby server, consider failing over to that secondary application server. Explore hardware and software network and protocol compression solutions; some WAN optimization technologies can significantly improve data replication performance. You might also be able to tune your workload to optimize the data transferred between the client and server by using DB2 stored procedures or compound SQL.

Conclusion

This exercise covers most of the basic configurations but is not exhaustive. We recommend the HADR configuration and tuning wiki page for details about several other tuning parameters and their uses.

Acknowledgements

We would like to thank Yuke Zhuge and Roger Zheng for their technical contributions and Eric Koeck for his editorial contributions.

About the authors

Kiran Chinta

Kiran Chinta is a Software Developer and has worked in DB2 for Linux, UNIX, and Windows development at the IBM Beaverton lab since 2006. He contributed to the development of various projects, including the IEEE-compliant DECFLOAT data type, HADR Reads on Standby, and Multi-Temperature Warehouse. Since 2011, he has been providing HADR technical support and resolving high-severity problems for businesses. In this role, he sometimes suggests architectural changes and presents best practices to achieve the best performance for the HADR environment. At IOD 2011, Kiran presented the best practices for HADR over WAN as experienced by banking systems in China.

Rob Causley

Rob Causley is a member of the DB2 Information Development team and is based in the IBM Toronto lab (although he is lucky enough to reside in Vancouver). A self-described "word nerd," Rob is passionate about crafting clear and coherent documentation. He is responsible for the high availability and data movement components of the DB2 for Linux, UNIX, and Windows library. His previous work includes the documentation for HADR multiple standbys, the ingest utility, and best practices papers for HADR and backup and restore.

Vincent Kulandai Samy

Vincent Kulandai Samy is a DB2 kernel developer in the IBM Beaverton lab and has worked on DB2 for Linux, UNIX, and Windows kernel development for the past 10 years. He came to IBM as part of the Informix acquisition, before which he worked on the Informix IDS and XPS database kernels. His areas of expertise are the database kernel, DB2 HADR, Multi-Temperature Warehouse, recovery, backup and restore, Linux kernel internals, and kernel debugging. He was the technical lead for the DB2 HADR Reads on Standby feature, released in DB2 Version 9.7 Fix Pack 1. For the past three years, Vincent has also been championing several DB2 HADR adoptions and new sales and deployments through on-site customer visits, consultancy, and customer advocacy. He presented DB2 HADR/TSA customer success stories at the IOD conferences in 2008 and 2010 with Fidelity Investments and PepsiCo.

© Copyright IBM Corporation 2013 (www.ibm.com/legal/copytrade.shtml)
Trademarks (www.ibm.com/developerworks/ibm/trademarks/)