IBM Cognos Proven Practices: Using DB2 Row Compression in Adaptive Warehouse Projects Product(s): Adaptive Analytics Framework, Adaptive Warehouse, Adaptive Analytics, DB2 LUW; Area of Interest: Performance Greg Harris Software Engineer IBM 14 December 2010 This document describes implementation of DB2 row compression in an Adaptive Warehouse generated data warehouse. View more content in this series Introduction Purpose This document describes implementation of DB2 row compression in an Adaptive Warehouse generated data warehouse. DB2 row compression features can decrease disk storage requirements while at the same time improve sequential read access of large queries which perform full table scans. Applicability Adaptive Warehouse all versions using DB2 ESE LUW 9.1 with the row compression option or greater as a target warehouse backing store. Exclusions and Exceptions None. About DB2 Row Compression DB2 row compression was designed to reduce disk storage space requirements of database objects. It does this by creating a compression dictionary of symbols which replace repeating data patterns at the row level. There is some CPU overhead in performing the dictionary translation, however there is also an improvement in disk IO efficiency as more data can be retrieved per IO unit. A data page also remains compressed in the DB2 buffer pool there by allowing more data to be cached in memory for potential reuse. © Copyright IBM Corporation 2010 IBM Cognos Proven Practices: Using DB2 Row Compression in Adaptive Warehouse Projects Trademarks Page 1 of 6 developerWorks® ibm.com/developerWorks/ Applications implemented on DB2 using the Adaptive Warehouse (AW) which contain large fact tables are suitable candidates for implementing DB2 row compression. Report queries where full table scans are performed can gain marked improvement in query performance when row compression is applied. Example Query Performance without and with Row Compression Enabled Following are DB2BATCH run sample summaries illustrating query execution times before and after row compression has been enabled. These examples refer to the fact table ORDER_ACTIVITY_MEASURES from the GOSales sample warehouse application. This report query requires a full table scan of the fact table. * Timestamp: Mon Jun 29 2009 12:08:05 Eastern Daylight Time --------------------------------------------* SQL Statement Number 1: SELECT "T0"."C0" "Yearkey" , "T0"."C1" "C__memberCaption" , "T0"."C0" "Ordinal0" , "T0"."C2" "Quarterkey" , "T0"."C3" "C__memberCaption0" , "T0"."C4" "Ordinal1" , "T0"."C5" "Monthkey" , "T0"."C6" "C__memberCaption1" , "T0"."C7" "Ordinal" , "T0"."C8" "C___PMT" , "T0"."C9" "C___SMT1" FROM (SELECT "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" "C0" , "ALL_TIME"."GREGORIAN_YEAR_CAPTION" "C1" , "ALL_TIME"."GREGORIAN_CALENDAR_QUARTER" "C2" , "ALL_TIME"."GREGORIAN_QUARTER_CAPTION" "C3" , "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" * 4 + "ALL_TIME"."GREGORIAN_CALENDAR_QUARTER" "C4" , "ALL_TIME"."GREGORIAN_CALENDAR_MONTH" "C5" , "ALL_TIME"."GREGORIAN_MONTH_CAPTION" "C6" , "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" * 12 + "ALL_TIME"."GREGORIAN_CALENDAR_MONTH" "C7" , SUM("Order_Activity_Measures"."ORDER_NET_AMOUNT") "C8" , SUM("Order_Activity_Measures"."ORDER_MARGIN_AMOUNT") "C9" FROM "DB2ADMIN"."ALL_TIME" "ALL_TIME", "DB2ADMIN"."ORDER_ACTIVITY_MEASURES" "Order_Activity_Measures" WHERE "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" * 12 + "ALL_TIME"."GREGORIAN_CALENDAR_MONTH" BETWEEN 24062 AND 24073 AND ( 'Rolling Periods' = 'Rolling Periods' OR 'Rolling Periods' = 'YTD' ) AND "ALL_TIME"."CALENDAR_TYPE_NAME" IN ('Gregorian', '-' , 'Late Date', 'Early Date') AND NOT "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" IS NULL AND "ALL_TIME"."CALENDAR_TYPE_NAME" IN ('Gregorian', '-' , 'Late Date', 'Early Date') AND NOT "ALL_TIME"."GREGORIAN_CALENDAR_QUARTER" IS NULL AND "ALL_TIME"."CALENDAR_TYPE_NAME" IN ('Gregorian', '-' , IBM Cognos Proven Practices: Using DB2 Row Compression in Adaptive Warehouse Projects Page 2 of 6 ibm.com/developerWorks/ developerWorks® 'Late Date', 'Early Date') AND NOT "ALL_TIME"."GREGORIAN_CALENDAR_MONTH" IS NULL AND "Order_Activity_Measures"."ORDER_DATE_ALL_TIME_SID" = "ALL_TIME"."ALL_TIME_SID" GROUP BY "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" "ALL_TIME"."GREGORIAN_YEAR_CAPTION" "ALL_TIME"."GREGORIAN_CALENDAR_QUARTER" "ALL_TIME"."GREGORIAN_QUARTER_CAPTION" "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" * 4 + "ALL_TIME"."GREGORIAN_CALENDAR_QUARTER", "ALL_TIME"."GREGORIAN_CALENDAR_MONTH" "ALL_TIME"."GREGORIAN_MONTH_CAPTION" "ALL_TIME"."GREGORIAN_CALENDAR_YEAR" * 12 + "ALL_TIME"."GREGORIAN_CALENDAR_MONTH" ) "T0" FOR FETCH ONLY; , , , , , , Yearkey C__memberCaption Ordinal0 Quarterkey C__memberCaption0 Ordinal1 Monthkey C__memberCaption1 Ordinal C___PMT C___SMT1 ----------- -------------------- ----------- ----------- -------------------- --------------------- -------------------- ----------- --------------------------------- -------------------------------2006 2006 2006 1 Q1/2006 8025 1 Jan/2006 24073 2704738640.30 1129361652.90 * 1 row(s) fetched, 1 row(s) output. * Elapsed Time is: 12.079891 seconds * Summary Table: Type Number Repetitions Total Time (s) Min Time (s) Max Time (s) Arithmetic Mean Geometric Mean Row(s) Fetched Row(s) Output --------- ----------- ----------- -------------- -------------- ---------------------------- -------------- -------------- ------------Statement 1 1 12.079891 12.079891 12.079891 12.079891 12.079891 1 1 * Total Entries: 1 * Total Time: 12.079891 seconds * Minimum Time: 12.079891 seconds * Maximum Time: 12.079891 seconds * Arithmetic Mean Time: 12.079891 seconds * Geometric Mean Time: 12.079891 seconds --------------------------------------------* Timestamp: Mon Jun 29 2009 12:08:18 Eastern Daylight Time After enabling compression the query executes nearly 30% more quickly, as shown below: * 1 row(s) fetched, 1 row(s) output. * Elapsed Time is: 8.547579 seconds * Summary Table: Type Number Repetitions Total Time (s) Min Time (s) Max Time (s) Arithmetic Mean Geometric Mean Row(s) Fetched Row(s) Output --------- ----------- ----------- -------------- -------------- ---------------------------- -------------- -------------- ------------Statement 1 1 8.547579 8.547579 8.547579 8.547579 8.547579 1 1 * Total Entries: * Total Time: 1 8.547579 seconds IBM Cognos Proven Practices: Using DB2 Row Compression in Adaptive Warehouse Projects Page 3 of 6 developerWorks® ibm.com/developerWorks/ * Minimum Time: 8.547579 seconds * Maximum Time: 8.547579 seconds * Arithmetic Mean Time: 8.547579 seconds * Geometric Mean Time: 8.547579 seconds --------------------------------------------* Timestamp: Mon Jun 29 2009 12:36:24 Eastern Daylight Time Altering Adaptive Warehouse Fact Tables to enable Row Compression There are some differences in how row compression is implemented using DB2 version 9.1 and later versions. For DB2 9.1: a. Run AW load management to deliver some initial data for a fact build. Before clicking the run button, check the synchronization tab and note the name of any table objects ending with “_MEASURES”. This will be the physical name of the fact table created in the warehouse. Complete the data load. b. Using the DB2cmd SQL tool, alter the AW fact tables noted in (a) to enable row compression. For example to enable row compression for the table MYSCHEMA.JOB_ACTION_MEASURES, enter the command: ALTER TABLE MYSCHEMA.JOB_ACTION_MEASURES COMPRESS YES; c. Reorganize the table to create a compression dictionary and compress the existing contents.Eg. REORG TABLE MYSCHEMA.JOB_ACTION_MEASURES RESETDICTIONARY; d. Continue load management operations as you normally would. All subsequent data added to the table will be compressed. For later versions of DB2: NaN From DB2 version 9.5 the database can dynamically create the compression dictionary as data is loaded into the fact table. Using load management check the synchronization tab for the fact build table objects, locate those with names ending with “_MEASURES” and note the name and status. NaN If the synchronization status is “will be created”, select “Script” and choose the “Create a separate script for database synchronisation” option then click OK . Edit the generated SQL script adding the COMPRESS YES clause to the CREATE TABLE statement for the noted fact tables. Run the SQL script on the warehouse schema. NaN If the synchronization status is “same as db”, alter the table using the DB2cmd SQL editor. For example. ALTER TABLE MYSCHEMA.JOB_ACTION_MEASURES COMPRESS YES; NaN Continue load management operations as you would normally. All subsequent data added to the table will be compressed. Script to enable Row Compression across AW Fact Tables The following script can be edited and run on your warehouse database to generate a script to enable row compression across all AW created fact tables. The script below will scan for AW IBM Cognos Proven Practices: Using DB2 Row Compression in Adaptive Warehouse Projects Page 4 of 6 ibm.com/developerWorks/ developerWorks® fact tables named using the wildcard expression %_MEASURES and scripts ALTER TABLE and REORG commands. Note the script runs the generated commands immediately so review the script accordingly. If running DB2 9.1 you will have had to run load management to initially deliver some data to your fact tables. For later versions of DB2 you can remove or ignore the REORG statements or their generation SQL, which are not required to be run. A copy of the script is installed in the <AW root >\PW\scripts folder when installing the Adaptive Application Framework, see the file example_row_compression.db2. db2 db2 db2 db2 db2 db2 db2 db2 db2 db2 db2 db2 db2 db2 -- example_row_compression.db2 --- Description: This script enables db2 row compression -on all AAF described fact tables. -Note that DB2 Enterprise Edition -with the row compresson option or -DB2 Data Warehouse Edition is -require to support this. -Ensure DB is idle before using and -warehouse is populated with data. --- Usage: Edit for proper <database_name> and -<schema_name> . Run from DB2 CLIP window. -- db2 connect to <database_name> db2 -x -z comp1.sql "select distinct 'ALTER TABLE <schema_name>.'||a.name||' COMPRESS YES;' from <schema_name>.pwd_physical_object a, <schema_name>.pwd_physical_object_type b where a.type_code=b.type_code and trim(b.type_name)='fact_detail'" db2 -tvf comp1.sql db2 commit db2 -x -z comp2.sql "select distinct 'REORG TABLE <schema_name>.'||a.name||' RESETDICTIONARY;' from <schema_name>.pwd_physical_object a, <schema_name>.pwd_physical_object_type b where a.type_code=b.type_code and trim(b.type_name)='fact_detail'" db2 -tvf comp2.sql db2 commit db2 terminate Sample script output created: ALTER TABLE WH.ORDER_ACTIVITY_MEASURES COMPRESS YES; ALTER TABLE WH.RETURN_ACTIVITY_MEASURES COMPRESS YES; ALTER TABLE WH.SHIPPED_ORDERS_SUMM_MEASURES COMPRESS YES; REORG TABLE WH.ORDER_ACTIVITY_MEASURES RESETDICTIONARY; REORG TABLE WH.RETURN_ACTIVITY_MEASURES RESETDICTIONARY; REORG TABLE WH.SHIPPED_ORDERS_SUMM_MEASURES RESETDICTIONARY; IBM Cognos Proven Practices: Using DB2 Row Compression in Adaptive Warehouse Projects Page 5 of 6 developerWorks® ibm.com/developerWorks/ About the author Greg Harris Greg Harris is a Database Administrator and ERP software specialist who has been a member of the Adaptive Applications Framework development team since 2004. © Copyright IBM Corporation 2010 (www.ibm.com/legal/copytrade.shtml) Trademarks (www.ibm.com/developerworks/ibm/trademarks/) IBM Cognos Proven Practices: Using DB2 Row Compression in Adaptive Warehouse Projects Page 6 of 6