Case Studies in Hardware XPath Acceleration
SYSTOR'11, May 30–June 1, 2011, Haifa, Israel
Dorit Nuzman, Victor Kaplansky, Sergei Dyshel, Alon Dayan, David Maze, Matthias Nicola, Glenn Marcy
© 2011 IBM Corporation

Main Goal and Results
Acceleration of XPath processing by hardware in two real-world applications: WebSphere Business Monitor (WBM) and DB2 pureXML.
- WebSphere Business Monitor: 27% improvement in total running time.
- DB2 pureXML: up to 6.2x improvement in total query processing time.

IBM's Power Edge of Network (PowerEN)
[Figure: PowerEN block diagram. 16 A2 cores and four 2 MB L2 caches; pattern-matching, compression/decompression, crypto and XML accelerators; memory controllers; PCI Express Gen 2; an Ethernet packet offload engine with 4x 10GE and 2x 1GE MACs. The diagram traces a request through the chip: 1. receive network traffic, 2. SSL decryption, 3. HTTP handling, 4. XML parsing, 5. XPath matching (filtering/tagging), 6. XSLT transformation, 7. output.]

XPath Filtering and Tagging
<catalog>
  <book>
    <title/> <year/>
  </book>
  <book>
    <title/> <year/>
    <special-edition> <year/> </special-edition>
  </book>
  <magazine>
    <title/> <year/>
  </magazine>
</catalog>
Filter: /catalog/book
Tag: //year
- Elements matched by the tag expression are highlighted by the hardware.
- Elements not matched by the filter expression (here, the <magazine> subtree) are not included in the parse tree.
XPath is a language used to navigate through elements and attributes in an XML document.

XPath acceleration opportunities: 1) XML in healthcare / XML databases
- XML is the de facto standard for interoperability of electronic medical health records.
[Figure: clinical XML data (XNAT) stored in DB2 pureXML; a PowerEN-based hospital appliance connected over HTTPS and DICOM to PACS systems that hold medical images.]

Current DB2 pureXML flow
[Figure: a query runs against a database of XML documents (fragments such as <xnat:dbinfo database.name="CNDA" database="MYSQL"/>) and returns results such as Age: 34, Age: 64, Age: 52.]
Processing may consist of:
1. Table operations on indexed elements
2. Navigation of the XML documents

Proposed DB2 pureXML flow with PowerEN
1) Indexed part: filter documents (table rows).
2) XML navigation part: select the relevant parts of the documents (XPath matching within documents).
[Figure: the same query, documents and results as in the current flow, with an XML compiler added on the XML navigation path.]
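The filter/tag semantics on the XPath Filtering and Tagging slide can be reproduced in software for experimentation. The sketch below is an illustration under stated assumptions, not the PowerEN hardware or its compiler: it handles only linear XPaths built from child (`/`) and descendant (`//`) steps with an optional `*` wildcard, compiles them into a lazily determinized automaton (each automaton state is a set of "expression, steps matched" pairs, built on first use), and streams the catalog document through Python's `xml.etree.ElementTree.XMLPullParser`. The names `parse_steps`, `StreamingMatcher` and `filter_and_tag` are hypothetical, and the element is written `<special-edition>` so that the sample is well-formed XML.

```python
# Hypothetical software sketch of filter/tag matching; not the PowerEN hardware or compiler.
import xml.etree.ElementTree as ET


def parse_steps(xpath):
    """Split a linear XPath such as '/catalog/book' or '//year' into (axis, name) steps."""
    steps, i = [], 0
    while i < len(xpath):
        if xpath.startswith("//", i):
            axis, i = "descendant", i + 2
        elif xpath.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError(f"not an absolute linear path: {xpath!r}")
        end = xpath.find("/", i)
        end = len(xpath) if end == -1 else end
        steps.append((axis, xpath[i:end]))
        i = end
    return steps


class StreamingMatcher:
    """Lazily determinized automaton: a state is a frozenset of (expr_id, steps_matched)."""

    def __init__(self, xpaths):
        self.steps = [parse_steps(x) for x in xpaths]
        self.initial = frozenset((e, 0) for e in range(len(self.steps)))
        self.cache = {}  # (state, element name) -> next state, built on first use

    def step(self, state, name):
        key = (state, name)
        if key not in self.cache:
            nxt = set()
            for e, k in state:
                if k == len(self.steps[e]):
                    continue  # expression already fully matched at an ancestor
                axis, test = self.steps[e][k]
                if test in (name, "*"):
                    nxt.add((e, k + 1))
                if axis == "descendant":
                    nxt.add((e, k))  # '//' lets the same step match deeper down
            self.cache[key] = frozenset(nxt)
        return self.cache[key]

    def finals(self, state):
        """Ids of the expressions whose last step matched the element just entered."""
        return {e for e, k in state if k == len(self.steps[e])}


def filter_and_tag(xml_text, filter_xpath, tag_xpath):
    """Return (path, tagged) for every element inside a subtree selected by the filter."""
    matcher = StreamingMatcher([filter_xpath, tag_xpath])  # expr 0 = filter, 1 = tag
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()
    stack = [(matcher.initial, False, "")]  # (automaton state, inside filter?, path)
    kept = []
    for event, elem in parser.read_events():
        if event == "end":
            stack.pop()
            continue
        state, inside, path = stack[-1]
        state = matcher.step(state, elem.tag)
        hits = matcher.finals(state)
        inside = inside or (0 in hits)
        path = f"{path}/{elem.tag}"
        if inside:
            kept.append((path, 1 in hits))
        stack.append((state, inside, path))
    return kept


doc = """<catalog>
  <book><title/><year/></book>
  <book><title/><year/><special-edition><year/></special-edition></book>
  <magazine><title/><year/></magazine>
</catalog>"""

result = filter_and_tag(doc, "/catalog/book", "//year")
for path, tagged in result:
    print(("TAG " if tagged else "    ") + path)
```

Only elements inside a `/catalog/book` subtree are kept, and the three `year` elements among them are flagged as tagged; the `year` under `<magazine>` matches the tag expression but is dropped because its subtree fails the filter. Caching transitions per (state, element name) pair builds each automaton state once, on demand, instead of enumerating all subsets up front.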
XPath acceleration opportunities: 2) WebSphere Business Monitor

(A) XPath expression list
1: cbe:CommonBaseEvents/cbe:CommonBaseEvent/@globalInstanceId
2: cbe:CommonBaseEvents/cbe:CommonBaseEvent/@creationTime
3: wbi:event/wbi:eventHeaderData/wbi:ECSCurrentID/text()
4: wbi:event/wbi:eventHeaderData/wbi:ECSParentID/text()
5: wbi:event/wbi:eventPointData/wbi:eventNature/text()
6: wbi:event/wbi:eventPointData/bpc:processTemplateName/text()
7: wbi:event/wbi:eventPointData/bpc:bpelId/text()

(B) Incoming CBE event
<cbe:CommonBaseEvents>
  <cbe:CommonBaseEvent globalInstanceId="..." creationTime="...">
    <cbe:contextDataElements name="WBIEventVersion" type="string">
      <cbe:contextValue>6.1</cbe:contextValue>
    </cbe:contextDataElements>
    <wbi:event>
      <wbi:eventHeaderData>
        <wbi:ECSCurrentID>...</wbi:ECSCurrentID>
        <wbi:ECSParentID>...</wbi:ECSParentID>
      </wbi:eventHeaderData>
      <wbi:eventPointData>
        <wbi:eventNature>ENTRY</wbi:eventNature>
        <bpc:BPCEventCode>21000</bpc:BPCEventCode>
        <bpc:processTemplateName>...</bpc:processTemplateName>
      </wbi:eventPointData>
    </wbi:event>
  </cbe:CommonBaseEvent>
</cbe:CommonBaseEvents>

(C) WBM processing
(1) Setup: register XPaths, create compiler
(2) Compile XPaths
(3) XML processing: (a) parse, (b) XPath tag/filter, (c) deliver matched values, (d) fill in the expression table
(4) Business processing: apply processing methods according to keyed values

(D) Expression-keyed table of matching values
Key  Value
1    globalInstanceId="..."
2    creationTime="..."
3    <wbi:ECSCurrentID>...</wbi:ECSCurrentID>
4    <wbi:ECSParentID>...</wbi:ECSParentID>
5    <wbi:eventNature>ENTRY</wbi:eventNature>
6    <bpc:processTemplateName>...</bpc:processTemplateName>
7    null
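Steps (3a) to (3d) and table (D) can be sketched in software as one streaming pass that fills the expression-keyed table. This is a minimal sketch under stated assumptions, not WBM's or PowerEN's code: the namespace URIs and the attribute and text values in the sample event are hypothetical placeholders (the slide elides them as "..."), each expression is matched as a suffix of the current element path (as if it began with `//`), and `text()` results are returned as plain strings rather than as the serialized elements shown in the table. `expand`, `compile_xpath` and `fill_expression_table` are illustrative names.

```python
# Hypothetical sketch of WBM steps (3a)-(3d); namespace URIs and sample values are placeholders.
import xml.etree.ElementTree as ET

NS = {"cbe": "urn:example:cbe", "wbi": "urn:example:wbi", "bpc": "urn:example:bpc"}

# (A) The expression list from the slide, keyed as in table (D).
XPATHS = {
    1: "cbe:CommonBaseEvents/cbe:CommonBaseEvent/@globalInstanceId",
    2: "cbe:CommonBaseEvents/cbe:CommonBaseEvent/@creationTime",
    3: "wbi:event/wbi:eventHeaderData/wbi:ECSCurrentID/text()",
    4: "wbi:event/wbi:eventHeaderData/wbi:ECSParentID/text()",
    5: "wbi:event/wbi:eventPointData/wbi:eventNature/text()",
    6: "wbi:event/wbi:eventPointData/bpc:processTemplateName/text()",
    7: "wbi:event/wbi:eventPointData/bpc:bpelId/text()",
}


def expand(qname):
    """Turn 'cbe:CommonBaseEvent' into ElementTree's '{uri}CommonBaseEvent' form."""
    prefix, local = qname.split(":")
    return "{%s}%s" % (NS[prefix], local)


def compile_xpath(xpath):
    """Split a path into element steps plus an attribute name (None means take text)."""
    *elements, last = xpath.split("/")
    if last.startswith("@"):
        return [expand(e) for e in elements], last[1:]
    if last == "text()":
        return [expand(e) for e in elements], None
    return [expand(e) for e in elements + [last]], None


def fill_expression_table(xml_text, xpaths):
    """Parse once, test every expression at each element end, keep the first value per key."""
    compiled = {key: compile_xpath(x) for key, x in xpaths.items()}
    table = {key: None for key in xpaths}  # unmatched keys stay None ("null")
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()
    stack = []
    for event, elem in parser.read_events():
        if event == "start":
            stack.append(elem.tag)
            continue
        # At 'end' the element's attributes and text are complete.
        for key, (steps, attr) in compiled.items():
            if table[key] is None and stack[-len(steps):] == steps:
                table[key] = elem.get(attr) if attr else (elem.text or "").strip()
        stack.pop()
    return table


EVENT = """<cbe:CommonBaseEvents xmlns:cbe="urn:example:cbe"
    xmlns:wbi="urn:example:wbi" xmlns:bpc="urn:example:bpc">
  <cbe:CommonBaseEvent globalInstanceId="gid-1" creationTime="t-1">
    <cbe:contextDataElements name="WBIEventVersion" type="string">
      <cbe:contextValue>6.1</cbe:contextValue>
    </cbe:contextDataElements>
    <wbi:event>
      <wbi:eventHeaderData>
        <wbi:ECSCurrentID>cur-1</wbi:ECSCurrentID>
        <wbi:ECSParentID>par-1</wbi:ECSParentID>
      </wbi:eventHeaderData>
      <wbi:eventPointData>
        <wbi:eventNature>ENTRY</wbi:eventNature>
        <bpc:BPCEventCode>21000</bpc:BPCEventCode>
        <bpc:processTemplateName>tmpl-1</bpc:processTemplateName>
      </wbi:eventPointData>
    </wbi:event>
  </cbe:CommonBaseEvent>
</cbe:CommonBaseEvents>"""

table = fill_expression_table(EVENT, XPATHS)
for key in sorted(table):
    print(key, table[key])
```

Key 7 stays `None` because the event carries no `bpc:bpelId` element, in line with the `null` entry in table (D). As in the slide's flow, all expressions are registered and compiled before any event arrives, so each event is parsed only once no matter how many expressions are registered.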
Technical details
[Figure: accelerated processing flow. Inputs: an XPath file and an XML file. Components: 1) compiler, 2) XML accelerator code (PPE program), 3) XML accelerator parser, 4) data items, 5) bridge layer, 6) query results, 7) matching.]

A few technical details: 1) the XPath compiler
Filter: /catalog/book
Tag: //year
/catalog/cd
[Figure: automaton generated for these expressions, with states STATE 0 (initial) to STATE 5 and transitions labeled catalog, book, cd, year and * (any other element). STATE 2, STATE 3 and STATE 5 are final, reporting expressions 0, 1 and 2, respectively.]

A few technical details: 1) the XPath compiler (cont.)
[Figure: the same automaton, with each state annotated with filter (F) and tag (T) flags.]
Note: only streamable XPaths are supported; for example, /catalog/book[special-edition]/year is not supported.

A few technical details: 2) the bridge layer
A. XCI program (Example1.java):
   1) Registration and initialization
   2) Prepare(): compile XPath "//year"
   3) Execute(): create a cursor to navigate to matching locations
   4) Navigate (toNext(), fork(), toChildren(), toAttributes())
B. PowerEN XCI adapter layer (XJ, PowerEN Java layer)
C. PowerEN C layer (libxj.so, XG5 card)
D. XPath matcher (xpd_executable)

The integrated experiment, using JDBC
DB2 (baseline):
(1) Filter documents (rows)
(2) Navigate the parsed document to find matches
(3) Serialize the results
(4) Transmit the results to the client
Processor: dual x86 Harpertown processors @ 2.83 GHz

Combined DB2 + Prism:
(1) DB2 filters and serializes documents
(2) Send the XML document from the host to Prism (XG5)
(3) Parse the document to find matches (and compile the XPath query into a program that runs on the XML accelerator)
(4) Send the results back to the host
(5) Serialize the results
(6) Transmit the results to the client
Processors: 1) dual x86 Harpertown processors @ 2.83 GHz, 2) Prism offloading the XML processing
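In the combined DB2 + Prism flow, documents and results cross the host/accelerator boundary, and a software bridge also crosses the Java/C boundary; each crossing has a fixed cost. A common way to amortize it is to buffer requests and cross once per batch. The following is a hypothetical sketch, not the XJ or PowerEN bridge API: `BufferedBridge` and `count_year_tags` are illustrative names, and `backend` stands in for one expensive crossing (for example, one JNI call or one host-to-accelerator transfer) that can process a whole batch.

```python
# Hypothetical bridge sketch; not the XJ/PowerEN bridge API.
from typing import Callable, List, Sequence

# One call to a Backend models one costly boundary crossing that handles a whole batch.
Backend = Callable[[Sequence[str]], List[int]]


class BufferedBridge:
    """Buffers documents and hands them to the backend in batches."""

    def __init__(self, backend: Backend, batch_size: int) -> None:
        if batch_size < 1:
            raise ValueError("batch_size must be at least 1")
        self.backend = backend
        self.batch_size = batch_size
        self.pending: List[str] = []
        self.results: List[int] = []
        self.crossings = 0  # number of backend calls actually made

    def submit(self, document: str) -> None:
        """Queue one document; cross the boundary only when the buffer is full."""
        self.pending.append(document)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is buffered (if anything) in a single crossing."""
        if not self.pending:
            return
        self.results.extend(self.backend(self.pending))
        self.crossings += 1
        self.pending = []


def count_year_tags(documents: Sequence[str]) -> List[int]:
    """Stand-in for accelerator work: count <year> start tags in each document."""
    return [doc.count("<year") for doc in documents]


docs = ["<book><title/><year/></book>"] * 10
unbuffered = BufferedBridge(count_year_tags, batch_size=1)
buffered = BufferedBridge(count_year_tags, batch_size=4)
for bridge in (unbuffered, buffered):
    for d in docs:
        bridge.submit(d)
    bridge.flush()  # drain the partially filled last batch

print(unbuffered.crossings, buffered.crossings)  # 10 3
```

Both bridges produce the same results, but the buffered one crosses the boundary 3 times instead of 10 (four documents, four more, then the remaining two). The price is latency: a document may wait in the buffer until the batch fills or `flush()` is called, so a production bridge would typically also flush on a timeout or at end of stream.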
XPath query acceleration speedups, HL7
[Figure: bar chart of speedup (0.0 to 6.0) for Query1 to Query5, each run on four data sets: 3.2 MB x 10, 320 KB x 100, 32 KB x 1000 and 4.3 KB x 10000 documents. Query1 and Query2: many matches, large output; Query3: many matches, small output; Query4: count (many matches, no output); Query5: single match, large output.]

Breakdown of the accelerated path, HL7 queries
[Figure: stacked bars (0% to 100%) splitting the accelerated-path time of Query1 to Query5, on the same four data sets, into: DB2 read and serialize documents, process query, serialize results.]

WebSphere Business Monitor acceleration speedups
- XML processing part improved by 27%
- Overall WBM application improved by 11%
An efficient bridging layer is critical for overall accelerated performance:
- buffering of requests to the accelerator
- reduced JNI calls and Java–C conversions
Applications have to use the "right" API.

Conclusions
- High potential for acceleration exists in applications that use large documents and XPath queries that match large numbers of XML nodes and produce large outputs, as in the healthcare and life sciences domains.
- Limited potential for acceleration exists in applications that use small documents and XPath queries that match few XML nodes or produce small outputs, as in the event-processing and financial domains.
- An efficient bridging layer is critical for overall accelerated performance.
  Optimizations to the software bridging layers, such as buffering requests to the accelerator and reducing JNI calls and Java–C conversion overheads, yielded a 33% improvement in the WBM accelerated path and up to a 2.7x improvement in the HL7 accelerated query-processing path.

Future Work
- Extend the applicability of XPath acceleration coprocessors.
- Increase speedups.
- Devise a cost model that can automatically identify scenarios that can profit from XPath acceleration.
- Extend XML APIs to express more involved XPath scenarios (such as simultaneous filtering and tagging, and multi-step XML processing).
- In the native XML database domain specifically, data serialization costs are relatively high, so support for compact data formats in the hardware XPath accelerator is critical.

The End
Questions?