Presentation

Case Studies in
Hardware XPath Acceleration
SYSTOR’11, May 30–June 1, 2011, Haifa, Israel.
Dorit Nuzman, Victor Kaplansky, Sergei Dyshel, Alon Dayan
David Maze, Matthias Nicola, Glenn Marcy
© 2011 IBM Corporation
Main Goal and Results.
Acceleration of XPath processing by hardware in two
real-world applications: WebSphere Business Monitor (WBM) and DB2-pureXML.
WebSphere Business Monitor: 27% improvement in
total running time.
DB2-pureXML: up to 6.2x improvement in total query
processing time.
IBM's Power Edge of Network (PowerEN)
[Figure: PowerEN chip block diagram. Labeled blocks: 16 A2 cores (labels At0 to At3), four 2 MB L2 caches, memory controllers (MC) and memory PHYs, pervasive logic, PIC, internal and external bus interface controllers, and accelerators for pattern matching, XML, crypto, and compression/decompression, plus an XPath compiler. I/O blocks: Ethernet packet offload engine with 4x 10GE MAC, 2x 1GE MAC, PCI Express Gen. 2, EI3 links (4B+4B), flash ROM and miscellaneous I/O logic.]
Processing steps annotated on the diagram:
1. Receive network traffic
2. SSL decryption
3. HTTP handling
4. XML parsing
5. XPath matching (filtering/tagging)
6. XSLT transformation
7. Output
XPath Filtering and Tagging
<catalog>
<book>
<title/>
<year/>
</book>
<book>
<title/>
<year/>
<special edition>
<year/>
</special edition>
</book>
<magazine>
<title/>
<year/>
</magazine>
</catalog>
Filter: /catalog/book
Tag: //year
Matched by “tag” expression,
and highlighted by hardware
Not matched by filter expression,
so not included in parse tree
XPath is a language used to navigate through
elements and attributes in an XML document
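The filter and tag semantics above can be illustrated in software. This is only a minimal sketch using Python's standard `xml.etree.ElementTree`, not the hardware mechanism: the function name `filter_and_tag` is made up for illustration, and `special_edition` stands in for the slide's `special edition` element because XML names cannot contain spaces.

```python
import xml.etree.ElementTree as ET

# Sample document modeled on the slide; "special_edition" is an assumed
# replacement for the slide's "special edition" element name.
DOC = """<catalog>
  <book><title/><year/></book>
  <book><title/><year/><special_edition><year/></special_edition></book>
  <magazine><title/><year/></magazine>
</catalog>"""


def filter_and_tag(xml_text):
    """Keep subtrees matching /catalog/book and tag //year inside them."""
    root = ET.fromstring(xml_text)
    # Filter: only <book> children of the <catalog> root survive.
    kept = root.findall("book") if root.tag == "catalog" else []
    # Tag: every <year> at any depth inside a kept subtree.
    tagged = [year for book in kept for year in book.iter("year")]
    return kept, tagged


kept, tagged = filter_and_tag(DOC)
print(len(kept), len(tagged))  # 2 3
```

The `<magazine>` subtree is dropped by the filter, so its `<year>` is never tagged, which mirrors the "not included in parse tree" note on the slide.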
XPath acceleration opportunities: 1) XML in Healthcare / XML Databases
XML: de-facto standard for electronic medical health record interoperability
[Figure: healthcare deployment diagram. Labels: clinical XML data, XNAT, DB2-pureXML, hospital, appliance, PowerEN, HTTPS, DICOM, PACS (three instances), medical images.]
Current DB2-pureXML flow
[Figure: a query is sent to a database of XML documents (xnat:dbinfo records such as <xnat:dbinfo database.name="CNDA" database="MYSQL"/>), and a result is returned (Age: 34, Age: 64, Age: 52).]
Processing may consist of:
1. Table operations on indexed elements
2. Navigation of the XML documents
Proposed DB2-pureXML flow with PowerEN
[Figure: the same query, database of XML documents, and result (Age: 34, Age: 64, Age: 52) as in the current flow, with an added block labeled "XML compiler".]
The query is handled in two parts:
1) Indexed part: filter documents (table rows)
2) XML navigation part: select relevant parts from the documents (XPath matching within documents)
XPath acceleration opportunities: 2) WebSphere Business Monitor
(A) XPath expression list
1: cbe:CommonBaseEvents/cbe:CommonBaseEvent/@globalInstanceId
2: cbe:CommonBaseEvents/cbe:CommonBaseEvent/@creationTime
3: wbi:event/wbi:eventHeaderData/wbi:ECSCurrentID/text()
4: wbi:event/wbi:eventHeaderData/wbi:ECSParentID/text()
5: wbi:event/wbi:eventPointData/wbi:eventNature/text()
6: wbi:event/wbi:eventPointData/bpc:processTemplateName/text()
7: wbi:event/wbi:eventPointData/bpc:bpelId/text()
(B) Incoming CBE event
<cbe:CommonBaseEvents>
  <cbe:CommonBaseEvent globalInstanceId="..." creationTime="...">
    <cbe:contextDataElements name="WBIEventVersion" type="string">
      <cbe:contextValue>6.1</cbe:contextValue>
    </cbe:contextDataElements>
    <wbi:event>
      <wbi:eventHeaderData>
        <wbi:ECSCurrentID>...</wbi:ECSCurrentID>
        <wbi:ECSParentID>...</wbi:ECSParentID>
      </wbi:eventHeaderData>
      <wbi:eventPointData>
        <wbi:eventNature>ENTRY</wbi:eventNature>
        <bpc:BPCEventCode>21000</bpc:BPCEventCode>
        <bpc:processTemplateName>...</bpc:processTemplateName>
      </wbi:eventPointData>
    </wbi:event>
  </cbe:CommonBaseEvent>
</cbe:CommonBaseEvents>
(C) WBM
(1) Setup: register XPaths, create compiler
(2) Compile XPaths
(3) XML processing: (a) parse, (b) XPath tag/filter, (c) deliver matched values, (d) fill in expression table
(4) Business processing: apply processing methods according to keyed values
(D) Expression-keyed table of matching values
Key   Value
1     globalInstanceId="..."
2     creationTime="..."
3     <wbi:ECSCurrentID>...</wbi:ECSCurrentID>
4     <wbi:ECSParentID>...</wbi:ECSParentID>
5     <wbi:eventNature>ENTRY</wbi:eventNature>
6     <bpc:processTemplateName>...</bpc:processTemplateName>
7     null
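The way an expression-keyed table is filled from one event can be sketched in software. This is an illustrative approximation, not WBM's code: the namespace URIs, the attribute and text values, and the helper name `build_table` are all assumptions (the slide shows only prefixes and elided values), and the sketch stores the attribute or text value rather than the serialized element shown in the table.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs: the slide shows only the prefixes.
NS = {"cbe": "urn:example:cbe", "wbi": "urn:example:wbi", "bpc": "urn:example:bpc"}

XPATHS = {
    1: "cbe:CommonBaseEvents/cbe:CommonBaseEvent/@globalInstanceId",
    2: "cbe:CommonBaseEvents/cbe:CommonBaseEvent/@creationTime",
    3: "wbi:event/wbi:eventHeaderData/wbi:ECSCurrentID/text()",
    4: "wbi:event/wbi:eventHeaderData/wbi:ECSParentID/text()",
    5: "wbi:event/wbi:eventPointData/wbi:eventNature/text()",
    6: "wbi:event/wbi:eventPointData/bpc:processTemplateName/text()",
    7: "wbi:event/wbi:eventPointData/bpc:bpelId/text()",
}

# Sample event with made-up values in place of the slide's "...".
EVENT = """\
<cbe:CommonBaseEvents xmlns:cbe="urn:example:cbe"
    xmlns:wbi="urn:example:wbi" xmlns:bpc="urn:example:bpc">
  <cbe:CommonBaseEvent globalInstanceId="G-1" creationTime="T-1">
    <wbi:event>
      <wbi:eventHeaderData>
        <wbi:ECSCurrentID>C-1</wbi:ECSCurrentID>
        <wbi:ECSParentID>P-1</wbi:ECSParentID>
      </wbi:eventHeaderData>
      <wbi:eventPointData>
        <wbi:eventNature>ENTRY</wbi:eventNature>
        <bpc:BPCEventCode>21000</bpc:BPCEventCode>
        <bpc:processTemplateName>Tmpl</bpc:processTemplateName>
      </wbi:eventPointData>
    </wbi:event>
  </cbe:CommonBaseEvent>
</cbe:CommonBaseEvents>"""


def build_table(xml_text, xpaths):
    """Map each expression key to its matched value, or None if no match."""
    # Wrap the document so that ".//" can also match the root element.
    holder = ET.Element("holder")
    holder.append(ET.fromstring(xml_text))
    table = {}
    for key, expr in xpaths.items():
        # ElementTree has no trailing /@attr or /text() step, so split it off.
        path, _, last = expr.rpartition("/")
        node = holder.find(".//" + path, NS)
        if node is None:
            table[key] = None
        elif last.startswith("@"):
            table[key] = node.get(last[1:])
        else:  # last step is text()
            table[key] = node.text
    return table


table = build_table(EVENT, XPATHS)
print(table[5], table[7])  # ENTRY None
```

Key 7 maps to `None` because the sample event, like the one on the slide, contains no `bpc:bpelId` element.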
Technical details
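[Figure: accelerated XPath processing pipeline. Inputs: XPath file, XML file. Numbered labels: 1) compiler, 2) XML accelerator code (PPE program), 3) XML accelerator parser, followed by steps 4) to 7), labeled bridge layer (twice), matching data items, and query results. The PPE is shown as a separate block.]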
A few technical details: 1) the XPath compiler
Filter: /catalog/book
Tag: //year
/catalog/cd
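[Figure: state machine compiled from the three expressions above. STATE 0 is the initial state; transitions are labeled catalog, book, cd, year, and *. Final states: STATE 2 (expr: 0), STATE 3 (expr: 1), STATE 5 (expr: 2). STATE 1 and STATE 4 are intermediate states.]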
A few technical details: 1) the XPath compiler - cont.
Filter: /catalog/book
Tag: //year
/catalog/cd
[Figure: the same state machine with each state annotated with F and T markers (presumably filter and tag). STATE 0: F (initial); STATE 1: F,T; STATE 2: F, Ffinal (expr: 0); STATE 3: T, Tfinal (expr: 1); STATE 4: T; STATE 5: T, Tfinal (expr: 2).]
Note: only streamable XPaths are supported (for example, /catalog/book[special edition]/year is not supported).
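Streamable matching means that each match decision is made when a start tag arrives, using only the ancestor path seen so far. The sketch below illustrates that idea in software for the three expressions on the slide. It is a small NFA-style simulation, not the compiler's actual state machine or the accelerator program; the helper names `compile_xpath` and `match_stream` are hypothetical, predicates are not handled, and each expression is matched independently rather than combining filter and tag.

```python
import io
import xml.etree.ElementTree as ET


def compile_xpath(expr):
    """Split '/a/b' or '//a' into (axis, name) steps; no predicates."""
    steps, i = [], 0
    while i < len(expr):
        if expr.startswith("//", i):
            axis, i = "descendant", i + 2
        elif expr.startswith("/", i):
            axis, i = "child", i + 1
        else:
            raise ValueError("expected '/' in " + expr)
        j = expr.find("/", i)
        j = len(expr) if j == -1 else j
        steps.append((axis, expr[i:j]))
        i = j
    return steps


def match_stream(xml_bytes, exprs):
    """Return (expr index, element path) pairs, decided at each start tag."""
    compiled = [compile_xpath(e) for e in exprs]
    # One stack of state sets per expression; state k = first k steps matched.
    stacks = [[{0}] for _ in compiled]
    path, hits = [], []
    events = ET.iterparse(io.BytesIO(xml_bytes), events=("start", "end"))
    for event, elem in events:
        if event == "start":
            path.append(elem.tag)
            for idx, steps in enumerate(compiled):
                nxt = set()
                for k in stacks[idx][-1]:
                    if k == len(steps):
                        continue  # already fully matched at an ancestor
                    axis, name = steps[k]
                    if name in ("*", elem.tag):
                        nxt.add(k + 1)  # this element satisfies step k
                    if axis == "descendant":
                        nxt.add(k)  # keep looking deeper
                stacks[idx].append(nxt)
                if len(steps) in nxt:
                    hits.append((idx, "/" + "/".join(path)))
        else:
            path.pop()
            for stack in stacks:
                stack.pop()
    return hits


DOC = (b"<catalog><book><title/><year/></book>"
       b"<book><title/><year/><special><year/></special></book>"
       b"<magazine><title/><year/></magazine></catalog>")
EXPRS = ["/catalog/book", "//year", "/catalog/cd"]

for idx, where in match_stream(DOC, EXPRS):
    print(EXPRS[idx], "matched", where)
```

A predicate such as `book[...]` cannot be decided at the start tag of `book`, because it depends on children that have not been read yet, which is why expressions like the one in the note fall outside the streamable subset.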
A few technical details: 2) the bridge layer
Layers:
A. XCI program (Example1.java)
   - Registration and initialization
   - Prepare(): compile XPath “//year”
   - Execute(): create a cursor to navigate to matching locations
   - Navigate (toNext(), fork(), toChildren(), toAttributes())
B. PowerEN XCI adapter layer
C. PowerEN Java layer
D. PowerEN C layer (libxj.so, XG5 card)
[Figure: the XPath and XML inputs pass through these layers in steps 1) to 4) to reach the XPath matcher (xpd_executable) on the XG5 card. The diagram also shows a sample of the XG5 output records, with fields such as 0x25, TLA, qcode, TXT, and txt_addr.]
The integrated experiment, using JDBC
[Figure: two configurations side by side, each showing a database of XML documents (xnat:dbinfo records) and a result (Age: 34, Age: 64, Age: 52). The combined configuration adds the XG5 card.]
DB2:
(1) Filter documents (rows)
(2) Navigate the parsed document to find matches
(3) Serialize the results
(4) Transmit the results to the client
Processor:
1) dual x86 Harpertown processors @2.83GHz
Combined DB2+Prism:
(1) DB2 filter and serialize documents
(2) Send the XML document from host to Prism
(3) Parse the document to find matches (+ compile the XPath query into a program that would run on the XML accelerator)
(4) Send the results back to the host
(5) Serialize the results
(6) Transmit the results to the client
Processor:
1) dual x86 Harpertown processors @2.83GHz
2) PRISM offloading the XML processing
XPath query acceleration speedups, HL7
[Figure: bar chart of speedup (y-axis, 0.0 to 6.0) for Query1 to Query5, one bar per data set: 4.3KBx10000, 32KBx1000, 320KBx100, 3.2MBx10. Panel labels (a), (b), (c) appear along the x-axis.]
Query1: many matches, large output
Query2: many matches, large output
Query3: many matches, small output
Query4: count (many matches, no output)
Query5: single match, large output
Breakdown of accelerated path, HL7 query
[Figure: stacked bar chart (y-axis, 0% to 100%) breaking down the accelerated path for Query1 to Query5 on each data set (4.3KBx10000, 32KBx1000, 320KBx100, 3.2MBx10). Components: DB2 read and serialize documents; process query; serialize results. Panel labels (a), (b), (c) appear along the x-axis.]
Query1: many matches, large output
Query2: many matches, large output
Query3: many matches, small output
Query4: count (many matches, no output)
Query5: single match, large output
WebSphere Business Monitor acceleration speedups
- XML processing part improved by 27%
- WBM overall application improved by 11%
An efficient bridging layer is critical for overall accelerated
performance
- buffering of requests to the accelerator
- reduced JNI calls/Java–C conversions
Applications have to use the “right” API
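The buffering point can be illustrated with a small sketch. It assumes that every hand-off to the accelerator carries a fixed per-call cost (for example a JNI crossing), so grouping requests reduces the number of crossings. The class `BufferedBridge` and the stand-in `fake_accelerator` are hypothetical names for illustration, not part of the PowerEN bridge layer.

```python
from typing import Callable, List


class BufferedBridge:
    """Collects requests and hands them to the accelerator in batches."""

    def __init__(self, submit_batch: Callable[[List[bytes]], List[str]],
                 batch_size: int):
        # submit_batch is hypothetical: one call models one Java-to-C crossing.
        self._submit_batch = submit_batch
        self._batch_size = batch_size
        self._pending: List[bytes] = []
        self.results: List[str] = []
        self.crossings = 0

    def submit(self, document: bytes) -> None:
        """Queue a document; flush automatically when the batch is full."""
        self._pending.append(document)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send all pending documents in a single crossing."""
        if not self._pending:
            return
        self.results.extend(self._submit_batch(self._pending))
        self.crossings += 1
        self._pending = []


def fake_accelerator(batch: List[bytes]) -> List[str]:
    """Stand-in for the accelerator call; returns one result per document."""
    return [f"matched {len(doc)} bytes" for doc in batch]


bridge = BufferedBridge(fake_accelerator, batch_size=4)
for _ in range(10):
    bridge.submit(b"<doc/>")
bridge.flush()  # send the final partial batch
print(bridge.crossings, len(bridge.results))  # 3 10
```

Ten documents cross the boundary in three calls instead of ten. The trade-off is added latency for requests that wait in the buffer.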
Conclusions
- High potential for acceleration can be found in applications using
large documents and XPath queries matching large numbers of XML
nodes and producing large outputs, such as in the healthcare and life
sciences domains.
- Limited potential for acceleration can be found in applications using
small documents and XPath requests matching small numbers of XML
nodes or producing small outputs, such as in the event processing and
financial domains.
- An efficient bridging layer is critical for overall accelerated
performance. Optimizations to the software bridging layers, such as
buffering of requests to the accelerator and reducing JNI calls and Java–C
conversion overheads, yielded a 33% improvement to the WBM
accelerated path and up to a 2.7x improvement to the HL7 accelerated
query processing path.
Future Work
- extend the applicability of XPath acceleration coprocessors
- increase speedups:
Devise a cost model that can automatically identify scenarios that can profit from
XPath acceleration.
Extend XML APIs to express more involved XPath scenarios (such as
simultaneous filtering and tagging, and multi-step XML processing).
Specifically in the native XML Database domain, data serialization costs are
relatively high, and support for compact data formats by the hardware XPath
accelerator is critical.
The End
Questions?