IBM InfoSphere Data Replication's Change Data Capture Version 10.2 (DB2 for z/OS)
Performance Comparison to Version 6.5

April 4, 2013
Greg Bezoff

Contents

Executive Summary
Introduction
Testing Protocol
InfoSphere CDC Performance Measurements
Insert Performance Tests
Update Performance Tests
Delete Performance Tests
Conclusions
Appendix A: Machine and Environment Specifications
Appendix B: InfoSphere CDC for z/OS Configuration
Appendix C: Table Definitions
Notices

Executive Summary

CDC for z/OS version 10.2 can replicate upwards of 90,000 rows per second, or as much as 129 MB per second, which exceeds the transport capacity of a gigabit network (roughly 125 MB per second). It should be noted that both the source and target LPARs for these tests were on the same physical machine, so the data never traversed the network. To replicate at this rate, one entire CPU is consumed on the source and about 1½ CPUs are consumed on the target. Lower rates of replication consume correspondingly fewer CPU cycles.

For users of CDC for z/OS version 6.5, throughput in version 10.2 is dramatically increased. Insert and delete throughput is double to quadruple that of version 6.5, and update throughput is increased by 60 – 225%. Source CPU in version 10.2 is reduced 10 – 33% relative to version 6.5, depending on the workload; target CPU is reduced 2 – 34%, also depending on the workload.

Introduction

This paper describes the results of a performance comparison of IBM® InfoSphere™ Data Replication's Change Data Capture version 10.2 (CDC) versus IBM® InfoSphere™ Change Data Capture version 6.5. The information here demonstrates the characteristics and behavior of data replication on DB2® for z/OS®. This paper presents the study results, the performance analysis, and the tuning techniques that were applied to the environment.

The performance measurements focus on answering three typical concerns that users have:

• The rate at which data can be published from the source to the target
• The end-to-end throughput rate of data replication
• The cost in CPU cycles to run data replication

To illustrate the scalability of data replication in terms of cost, the workload of each series of tests was varied while the hardware configuration was kept constant.

The following figure illustrates the components of Change Data Capture (CDC) replication as configured for this test. The Source Engine captures data from the database log and sends it over TCP/IP to the target system, where the Target Engine applies the data to the target database.

[Figure: CDC replication components - Source Engine, TCP/IP transport, Target Engine, and the source and target databases.]

The hardware configuration is detailed in Appendix A: Machine and Environment Specifications.

Testing Protocol

The intent of these tests was to see how quickly data could be published and applied, and to determine the CPU footprint of the source and target engines in both version 6.5 and version 10.2. Three sets of tests were run: an insert test, an update test, and a delete test. For each test, the requisite number of inserts, updates, or deletes was performed against the source tables. After all the changes were made to the tables, the subscription was started with a scheduled end at the current head of log, so that the subscription would stop immediately after replicating all the changes. The workload characteristics were:

• 10 inserts, updates, or deletes per commit; 20 columns per row
• 240, 480, 600, 1000, 2000, and 4000 byte rows

For all tests, one million rows were replicated, with the rows equally distributed across ten tables. See Appendix C: Table Definitions for details. A sketch of one unit of work from this protocol follows.
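The following is a minimal sketch of one such unit of work for the insert test, using the 240-byte table defined in Appendix C: ten single-row inserts followed by a commit. The key and data values are illustrative only; the actual test driver is not part of this paper.

-- One illustrative unit of work from the insert test:
-- ten single-row inserts into a 240-byte table, then a commit.
-- Each VARCHAR value is 13 bytes long, which yields 240-byte rows.
INSERT INTO DMC0070.TB_20_240_0
       (INT1, INT2, INT3, INT4, INT5, DEC1, DEC2, DEC3, CHR1, CHR2,
        VCR1, VCR2, VCR3, VCR4, VCR5, VCR6, VCR7, VCR8, VCR9, VCR0)
VALUES (1, 2, 3, 4, 5, 1, 2, 3, 'CHAR DATA ONE', 'CHAR DATA TWO',
        '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE',
        '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE',
        '13-BYTE VALUE', '13-BYTE VALUE');
-- ... eight more single-row INSERTs with ascending INT1 key values ...
INSERT INTO DMC0070.TB_20_240_0
       (INT1, INT2, INT3, INT4, INT5, DEC1, DEC2, DEC3, CHR1, CHR2,
        VCR1, VCR2, VCR3, VCR4, VCR5, VCR6, VCR7, VCR8, VCR9, VCR0)
VALUES (10, 2, 3, 4, 5, 1, 2, 3, 'CHAR DATA ONE', 'CHAR DATA TWO',
        '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE',
        '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE', '13-BYTE VALUE',
        '13-BYTE VALUE', '13-BYTE VALUE');
COMMIT;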
CPU measurements include the InfoSphere CDC and communication processes for both the source and target systems. The CPU for the target database itself is included in the CPU for the target apply process.

InfoSphere CDC Performance Measurements

All of the following tests, under the parameters described in Appendix A, were performed using a single InfoSphere CDC subscription. If higher throughput is required, additional subscriptions can be added at the cost of some additional CPU.

Insert Performance Tests

The following table details the results of the insert performance tests in version 10.2. The derived columns follow from the measured ones: each run replicates 1,000,000 rows, so the rows-per-second figures divide 1,000,000 by the elapsed or CPU seconds, and the MB figures are based on a total data volume of row length × 1,000,000 bytes.

Row Len  Elapsed  Src CPU  Tgt CPU  Rows/s   Rows/s   Rows/s   MB/s     MB/s     MB/s     Src CPU  Tgt CPU
(bytes)  (s)      (s)      (s)      elapsed  src CPU  tgt CPU  elapsed  src CPU  tgt CPU  (ms/MB)  (ms/MB)
240      11       13.1     16.9     90909    76336    59172    21.8     18.3     14.2     54.6     70.4
480      11       13.8     18.0     90909    72464    55556    43.6     34.8     26.7     28.8     37.5
600      12       14.2     18.8     83333    70423    53191    50.0     42.3     31.9     23.7     31.3
1000     16       16.2     24.3     62500    61728    41152    62.5     61.7     41.2     16.2     24.3
2000     20       18.6     29.1     50000    53763    34364    100.0    107.5    68.7     9.3      14.6
4000     31       27.1     39.2     32258    36900    25510    129.0    147.6    102.0    6.8      9.8

The following charts summarize the improvement in insert throughput (rows per second) and CPU consumption of version 10.2 over version 6.5.

[Chart: INSERT Rows per Second - V6.5 versus V10.2. Rows per second against row length in bytes, for Version 10.2 and Version 6.5.]

The insert throughput for version 10.2 increased over version 6.5 by 109% for 240-byte rows and by 338% for 4,000-byte rows. This increase in throughput is due to three major factors. Firstly, the pipelined apply allows a single subscription to use multiple CPUs for the apply process, which leads to better scaling. Secondly, an improvement in the signaling between the tasks in the apply process results in fewer task switches during replication. Thirdly, array inserts are now used whenever possible during the apply process; a sketch of an array insert follows.
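To make the third factor concrete, the following is a minimal sketch of an array insert, assuming three captured rows are applied with a single multi-row INSERT statement. The engine itself binds host-variable arrays (DB2's multi-row INSERT with the FOR n ROWS clause) rather than literal values; the column subset and values here are illustrative, with unlisted columns taking their defaults.

-- Three captured rows applied in one SQL statement instead of three,
-- reducing the per-row SQL overhead on the target.
INSERT INTO DMC0070.TB_20_240_0 (INT1, INT2, CHR1, VCR1)
VALUES (1001, 1, 'ROW ONE', 'FIRST'),
       (1002, 2, 'ROW TWO', 'SECOND'),
       (1003, 3, 'ROW THREE', 'THIRD');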
It should be pointed out that for the longest rows, data is replicated at the rate of 129 MB per second, which exceeds the transport capacity of a gigabit network. It should be noted that both the source and target LPARs were on the same physical machine, so the data never traversed the network.

[Chart: INSERT Source CPU - V6.5 versus V10.2. CPU seconds for 1,000,000 inserts against row length in bytes, for Version 10.2 and Version 6.5.]

Source CPU for inserts in version 10.2 is reduced by 10 – 25% relative to version 6.5. This is due to general improvements in the source engine, mainly in the communications protocol.

[Chart: INSERT Target CPU - V6.5 versus V10.2. CPU seconds for 1,000,000 inserts against row length in bytes, for Version 10.2 and Version 6.5.]

Target CPU for inserts in version 10.2 is reduced by 20 – 30% relative to version 6.5. This is mainly due to the use of array inserts.

Update Performance Tests

The following table details the results of the update performance tests in version 10.2.

Row Len  Elapsed  Src CPU  Tgt CPU  Rows/s   Rows/s   Rows/s   MB/s     MB/s     MB/s     Src CPU  Tgt CPU
(bytes)  (s)      (s)      (s)      elapsed  src CPU  tgt CPU  elapsed  src CPU  tgt CPU  (ms/MB)  (ms/MB)
240      17       15.3     24.7     58824    65359    40486    14.1     15.7     9.7      63.8     102.9
480      18       16.4     25.4     55556    60976    39370    26.7     29.3     18.9     34.2     52.9
600      18       17.3     26.1     55556    57803    38314    33.3     34.7     23.0     28.8     43.5
1000     21       19.2     30.7     47619    52083    32573    47.6     52.1     32.6     19.2     30.7
2000     23       23.6     32.4     43478    42373    30864    87.0     84.7     61.7     11.8     16.2
4000     48       42.0     48.4     20833    23810    20661    83.3     95.2     82.6     10.5     12.1

The following charts summarize the improvement in update throughput (rows per second) and CPU consumption of version 10.2 over version 6.5.

[Chart: UPDATE Rows per Second - V6.5 versus V10.2. Rows per second against row length in bytes, for Version 10.2 and Version 6.5.]

Update throughput in version 10.2 is increased 60 – 225% over version 6.5. This is due to the pipelined apply and the improved signaling between tasks.

[Chart: UPDATE Source CPU - V6.5 versus V10.2. CPU seconds for 1,000,000 updates against row length in bytes, for Version 10.2 and Version 6.5.]

Source CPU for updates in version 10.2 is reduced about 10% relative to version 6.5. Once again, this is due to general improvements in the source engine.

[Chart: UPDATE Target CPU - V6.5 versus V10.2. CPU seconds for 1,000,000 updates against row length in bytes, for Version 10.2 and Version 6.5.]

Target CPU for updates in version 10.2 is reduced 2 – 12% relative to version 6.5.

Delete Performance Tests

The following table details the results of the delete performance tests in version 10.2.

Row Len  Elapsed  Src CPU  Tgt CPU  Rows/s   Rows/s   Rows/s   MB/s     MB/s     MB/s     Src CPU  Tgt CPU
(bytes)  (s)      (s)      (s)      elapsed  src CPU  tgt CPU  elapsed  src CPU  tgt CPU  (ms/MB)  (ms/MB)
240      13       13.8     17.9     76923    72464    55866    18.5     17.4     13.4     57.5     74.6
480      14       14.5     18.9     71429    68966    52910    34.3     33.1     25.4     30.2     39.4
600      14       14.8     19.7     71429    67568    50761    42.9     40.5     30.5     24.7     32.8
1000     17       16.2     22.6     58824    61728    44248    58.8     61.7     44.2     16.2     22.6
2000     23       19.7     25.6     43478    50761    39063    87.0     101.5    78.1     9.9      12.8
4000     40       26.9     35.1     25000    37175    28490    100.0    148.7    114.0    6.7      8.8

The following charts summarize the improvement in delete throughput (rows per second) and CPU consumption of version 10.2 over version 6.5.

[Chart: DELETE Rows per Second - V6.5 versus V10.2. Rows per second against row length in bytes, for Version 10.2 and Version 6.5.]

Throughput for delete in version 10.2 is increased 123 – 270% over version 6.5. Some of this is due to the pipelined apply and the improved signaling between tasks. The remainder is due to deletes being grouped when possible by building a WHERE clause that specifies multiple rows to be deleted, as sketched below.
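A minimal sketch of such a grouped delete, assuming three captured deletes against the same table are combined into one statement; the key values in the IN predicate are illustrative.

-- Three captured deletes collapsed into a single statement by listing
-- the unique key (INT1) values of the rows to be removed.
DELETE FROM DMC0070.TB_20_240_0
WHERE INT1 IN (1001, 1002, 1003);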
[Chart: DELETE Source CPU - V6.5 versus V10.2. CPU seconds for 1,000,000 deletes against row length in bytes, for Version 10.2 and Version 6.5.]

Source CPU for delete in version 10.2 is reduced 12 – 33% relative to version 6.5. Once again, this is due to general improvements in the source engine.

[Chart: DELETE Target CPU - V6.5 versus V10.2. CPU seconds for 1,000,000 deletes against row length in bytes, for Version 10.2 and Version 6.5.]

Target CPU for delete in version 10.2 is reduced 30 – 34% relative to version 6.5. This is mainly due to the grouping of deletes.

Conclusions

Throughput in version 10.2 is dramatically increased over version 6.5. Insert and delete throughput is double to quadruple that of version 6.5, and update throughput is increased by 60 – 225%. Source CPU in version 10.2 is reduced 10 – 33% relative to version 6.5, depending on the workload, and target CPU is reduced 2 – 34%, also depending on the workload. Overall, version 10.2 does significantly more work while using less CPU.

Appendix A: Machine and Environment Specifications

DB2 for z/OS (Source and Target)

• Utilized a z10 2097 E12. System capacity number is 708; approximate SI MIPS = 6439.
• Used 18 GB memory and 2 TB DASD in a DS8000 with a RAID 5 configuration.
• Source LPAR had 4 dedicated CPUs.
• Target LPAR had 4 dedicated CPUs.
• DB2 Version 9.1 subsystem utilized on both source and target. No data sharing defined. DB2 had 890 cylinders.

Used InfoSphere CDC for z/OS Version 10.2 GA.

Appendix B: InfoSphere CDC for z/OS Configuration

InfoSphere CDC for z/OS configuration (where it differs from the defaults):

* Basic configuration
CONFIG PALRETPD=2,                  /* Default 14, keep 2 days of events.   */
       PALCLEANUPTIME=01:00,        /* Default NONE, cleanup at 1 AM.       */
       REPSTATSINTERVAL=5,          /* Default 0, report stats every 5 min. */
       HEARTBEATTIMEOUT=3,          /* Default 15, time out after 3 min.    */
       STG64LIMIT=16G               /* Default 2G, allow big staging space. */

* TCP/IP Configuration
TCP/IP SERVICENAME=65516,
       CTRLQUEUESIZE=256K,          /* Default 64K, ensure msgs flow.       */
       DATAQUEUESIZE=16M            /* Default 1M, keep source scraping.    */

* DB2 Configuration
DB2    PLANSUFFIX=78,
       RECOVERYRETRYLIMIT=10,       /* Default 0, non-zero for arraying.    */
       TIMEOUTRECOVERYDELAY=10,     /* Default 0, non-zero for arraying.    */
       RETRYCACHESIZE=1G,           /* Default 2G, sufficient for arraying. */
       COMMITFREQ=(2500,30,2500),   /* Default (1000,1,100), commit         */
                                    /* every 2500 records on target.        */
       LOGPOLLINTERVAL=(5,ALWAYS),  /* Default 5, poll in non-data-sharing. */
       MAXSUBSCRSTAGESIZE=(8G,200), /* Default (2G,10), big staging space   */
                                    /* to keep source scraping.             */
       ARRAYGROUPSIZE=(30,256K)     /* Accept default value for arraying.   */

Appendix C: Table Definitions

Note that the first three sets of tables have identical DDL. The length of the values inserted into the VARCHAR(49) columns is set to 13, 37, or 49 bytes to get 240-, 480-, or 600-byte rows, respectively.
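These row lengths can be cross-checked with simple arithmetic, assuming DB2's standard storage sizes (packed decimal occupies floor(p/2) + 1 bytes, and each VARCHAR column carries a 2-byte length field):

  Fixed portion of each row:
    5 x INTEGER                                   =  20 bytes
    DECIMAL(31,0) + DECIMAL(25,0) + DECIMAL(17,0) =  38 bytes (16 + 13 + 9)
    2 x CHAR(16)                                  =  32 bytes
    10 x VARCHAR length fields                    =  20 bytes
    Fixed total                                   = 110 bytes

  Row length = 110 + 10 x VARCHAR data length:
    110 + 10 x 13 = 240    110 + 10 x 37 = 480    110 + 10 x 49 = 600

The same arithmetic yields the 1000-, 2000-, and 4000-byte rows for the VARCHAR(389) tables later in this appendix (data lengths 89, 189, and 389).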
The names of the tables using this schema on the source are:

TB_20_240_0 through TB_20_240_9
TB_20_480_0 through TB_20_480_9
TB_20_600_0 through TB_20_600_9

The DDL (apart from the table name and index name) for all these tables is:

CREATE TABLE DMC0070.TB_20_240_0
  (INT1 INTEGER DEFAULT NOT NULL,
   INT2 INTEGER DEFAULT NOT NULL,
   INT3 INTEGER DEFAULT NOT NULL,
   INT4 INTEGER DEFAULT NOT NULL,
   INT5 INTEGER DEFAULT NOT NULL,
   DEC1 DECIMAL(31,0) DEFAULT NOT NULL,
   DEC2 DECIMAL(25,0) DEFAULT NOT NULL,
   DEC3 DECIMAL(17,0) DEFAULT NOT NULL,
   CHR1 CHAR(16) DEFAULT NOT NULL,
   CHR2 CHAR(16) DEFAULT NOT NULL,
   VCR1 VARCHAR(49) DEFAULT NOT NULL,
   VCR2 VARCHAR(49) DEFAULT NOT NULL,
   VCR3 VARCHAR(49) DEFAULT NOT NULL,
   VCR4 VARCHAR(49) DEFAULT NOT NULL,
   VCR5 VARCHAR(49) DEFAULT NOT NULL,
   VCR6 VARCHAR(49) DEFAULT NOT NULL,
   VCR7 VARCHAR(49) DEFAULT NOT NULL,
   VCR8 VARCHAR(49) DEFAULT NOT NULL,
   VCR9 VARCHAR(49) DEFAULT NOT NULL,
   VCR0 VARCHAR(49) DEFAULT NOT NULL)
  IN DB007D.TBSP007S
  DATA CAPTURE CHANGES;

CREATE UNIQUE INDEX DMC0070.TB_20_240_0_X
  ON DMC0070.TB_20_240_0 (INT1 ASC)
  USING STOGROUP GRP007
  ERASE NO
  FREEPAGE 0
  PCTFREE 10
  GBPCACHE CHANGED
  BUFFERPOOL BP0
  CLOSE YES
  PIECESIZE 2097152K;

The other tables used contain 20 columns and 1000, 2000, or 4000 bytes per row. The length of the values inserted into the VARCHAR(389) columns is set to 89, 189, or 389 bytes to get 1000-, 2000-, or 4000-byte rows, respectively. The names of these tables are:

TB_20_1000_0 through TB_20_1000_9
TB_20_2000_0 through TB_20_2000_9
TB_20_4000_0 through TB_20_4000_9

The DDL (apart from table names and index names) for these tables is:

CREATE TABLE DMC0070.TB_20_1000_0
  (INT1 INTEGER DEFAULT NOT NULL,
   INT2 INTEGER DEFAULT NOT NULL,
   INT3 INTEGER DEFAULT NOT NULL,
   INT4 INTEGER DEFAULT NOT NULL,
   INT5 INTEGER DEFAULT NOT NULL,
   DEC1 DECIMAL(31,0) DEFAULT NOT NULL,
   DEC2 DECIMAL(25,0) DEFAULT NOT NULL,
   DEC3 DECIMAL(17,0) DEFAULT NOT NULL,
   CHR1 CHAR(16) DEFAULT NOT NULL,
   CHR2 CHAR(16) DEFAULT NOT NULL,
   VCR1 VARCHAR(389) DEFAULT NOT NULL,
   VCR2 VARCHAR(389) DEFAULT NOT NULL,
   VCR3 VARCHAR(389) DEFAULT NOT NULL,
   VCR4 VARCHAR(389) DEFAULT NOT NULL,
   VCR5 VARCHAR(389) DEFAULT NOT NULL,
   VCR6 VARCHAR(389) DEFAULT NOT NULL,
   VCR7 VARCHAR(389) DEFAULT NOT NULL,
   VCR8 VARCHAR(389) DEFAULT NOT NULL,
   VCR9 VARCHAR(389) DEFAULT NOT NULL,
   VCR0 VARCHAR(389) DEFAULT NOT NULL)
  IN DB007D.TBSP007S
  DATA CAPTURE CHANGES;

CREATE UNIQUE INDEX DMC0070.TB_20_1000_0_X
  ON DMC0070.TB_20_1000_0 (INT1 ASC)
  USING STOGROUP GRP007
  ERASE NO
  FREEPAGE 0
  PCTFREE 10
  GBPCACHE CHANGED
  BUFFERPOOL BP0
  CLOSE YES
  PIECESIZE 2097152K;

Notices

© Copyright IBM Corporation 2013. All Rights Reserved.

IBM Canada
8200 Warden Avenue
Markham, ON L6G 1C7
Canada

Neither this documentation nor any part of it may be copied or reproduced in any form or by any means or translated into another language without the prior consent of the above-mentioned copyright owner.

IBM makes no warranties or representations with respect to the content hereof and specifically disclaims any implied warranties of merchantability or fitness for any particular purpose. IBM assumes no responsibility for any errors that may appear in this document. The information contained in this document is subject to change without notice. IBM reserves the right to make any such changes without obligation to notify any person of such revision or changes. IBM makes no commitment to keep the information contained herein up to date.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here. All performance data contained in this publication was obtained in the specific operating environment and under the conditions described above and is presented as an illustration only. Performance obtained in other operating environments may vary, and customers should conduct their own testing.

The information in this document concerning non-IBM products was obtained from the supplier(s) of those products. IBM has not tested such products and cannot confirm the accuracy of the performance, compatibility, or any other claims related to non-IBM products. Questions about the capabilities of non-IBM products should be addressed to the supplier(s) of those products.

IBM, the IBM logo, DB2, InfoSphere, and z/OS are trademarks or registered trademarks of International Business Machines Corporation in the United States, other countries, or both. Other company, product, or service names may be trademarks or service marks of others.

References in this publication to IBM products or services do not imply that IBM intends to make them available in all countries in which IBM operates.