Syntactic Difference Based Approach for NTCIR-9 RITE Task

IBM Research
Team ID: IBM
Yuta Tsuboi, Hiroshi Kanayama, Masaki Ohno
IBM Research – Tokyo
Yuya Unno
Preferred Infrastructure
Submitted results for four Japanese RITE subtasks (BC, MC, EXAM and RITE4QA)
- Achieved the best scores in the MC and EXAM subtasks
© 2011 IBM Corporation
Overview of the approach
• Very difficult task due to the complex, real-world dataset
– Impossible to write down hand-crafted rules
– Machine-learning approach with various features
• Similarities and differences between <S> and <T>* are the key features for entailment determination
– Surface matching is not sufficient
– Calculation of the tree edit distance between parsed trees
 • Alignment of the two sentences considering their syntactic structures
 • Estimation of similarity (small edit-operation cost → similar pair)
 • Edit operations (insert, delete, and replace) as features for machine learning
* <S> and <T> denote an input sentence pair (<t1> and <t2> in the original file).
Techniques and resources for our machine learning approach
[Pipeline diagram] Input <S>,<T> → syntactic parsing → tree edit distance computation (drawing on a Japanese thesaurus and an ontology as resources) → pair feature extraction → training (logistic regression / cross-validation) on an expanded training corpus → classifier model → prediction → output labels (Y/N for BC; F/R/B/C/I for MC)
Tree edit distance - Concept
 Hypotheses
– Two sentences (<S> and <T>) have syntactic similarities and differences
– A pair of similar sentences is likely to be an entailment pair
– The differing parts can be clues for the entailment decision
 Solution
– Parse two sentences
– Align parse trees by calculating the tree edit distance between them
Example 1:
<S>学校に毎日行く</S> ('I go to school every day')
<T>学校に行く</T> ('I go to school')
– Aligned trees: 行く (go) with dependents 学校に (to school) and 毎日 (every day)
– Edit: delete the adverb "every day" (毎日)

Example 2:
<S>大統領が殺された</S> ('The president was killed')
<T>熊が殺された</T> ('A bear was killed')
– Aligned trees: 殺された (be killed) with dependent 大統領が (president) / 熊が (bear)
– Edit: replace "president" (大統領) with "bear" (熊)
Tree Edit Distance – General Implementation
 Edit distance δ(s, t): the minimum total cost of a sequence of edit operations transforming tree s into tree t
 Edit operations and costs
– Replacement of node s with node t: cost γ(s, t)
– Deletion of node s: cost γ(s, ε)
– Insertion of node t: cost γ(ε, t)
• Edit distance computation: O(|s|²|t|²) time and space
• Our code for tree edit distance is available at
https://github.com/unnonouno/tree-edit-distance (new BSD license)
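The idea behind tree edit distance can be sketched with the classic recursion over ordered forests. This is a minimal illustration with unit insert/delete costs and a 0/1 replacement cost, not the optimized O(|s|²|t|²) algorithm (for that, see the repository linked above); tree representation and names are our own.

```python
from functools import lru_cache

def forest_size(forest):
    # total number of nodes in a forest of (label, children) trees
    return sum(1 + forest_size(children) for _, children in forest)

@lru_cache(maxsize=None)
def ted(f, g):
    # tree edit distance between ordered forests f and g (tuples of trees)
    if not f and not g:
        return 0
    if not f:
        return forest_size(g)              # insert every remaining node of g
    if not g:
        return forest_size(f)              # delete every remaining node of f
    (f_label, f_children), f_rest = f[-1], f[:-1]
    (g_label, g_children), g_rest = g[-1], g[:-1]
    return min(
        ted(f_rest + f_children, g) + 1,   # delete the last root of f
        ted(f, g_rest + g_children) + 1,   # insert the last root of g
        ted(f_rest, g_rest) + ted(f_children, g_children)
            + (f_label != g_label),        # match the roots; replace if labels differ
    )

# <S> "go to school every day" vs <T> "go to school": one deletion.
s = ("go", (("to school", ()), ("every day", ())))
t = ("go", (("to school", ()),))
print(ted((s,), (t,)))   # 1 (delete the adverb "every day")
```

Memoization over hashable tuples keeps the toy version tractable; the minimizing branch at each step also yields the edit operations used later as features.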
Cost Functions for Tree Edit Distance
 Insertion / deletion cost: constant
– γ(s, ε) = γ(ε, t) = 1
 Replacement cost: a smaller value for a more similar bunsetsu pair
– Mix of various similarity metrics:

Cost function | How the similarity of two bunsetsus is measured
Jaccard distance using word overlap (WO) | Overlap ratio between the morpheme sets (handling both content and functional words)
Semantic distance using an ontology (Ontology) | Inverse of the shortest path length between the two head content words in the ontology (see next slide)
Semantic distance using a thesaurus (BGH)* | Common depth of the two head content words in the thesaurus tree
Heuristic distance metrics (HDM) | Similarity value considering parts of speech and the thesaurus above

* National Language Research Institute. Bunrui-goi-hyo (revised and enlarged edition), 1996. (In Japanese.)
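The WO metric can be sketched as a Jaccard distance over morpheme sets; the function name and the exact Jaccard form are our illustrative assumptions.

```python
def wo_replacement_cost(morphemes_s, morphemes_t):
    # Jaccard distance between the morpheme sets of two bunsetsus,
    # treating content and functional words alike
    s, t = set(morphemes_s), set(morphemes_t)
    if not s and not t:
        return 0.0
    return 1.0 - len(s & t) / len(s | t)

# identical bunsetsus cost 0; completely disjoint ones cost 1
print(wo_replacement_cost(["学校", "に"], ["学校", "に"]))  # 0.0
print(wo_replacement_cost(["学校", "に"], ["学校", "へ"]))  # 1 - 1/3 ≈ 0.667
```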
Semantic Similarity and Resources
 We defined two measures of semantic similarity based on two complementary resources.
• Manually constructed resource: Japanese thesaurus [分類語彙表 (Bunrui-goi-hyo)]
– High coverage of basic Japanese expressions
• Automatically constructed resource: IS-A ontology generated from Wikipedia*
– Up-to-date knowledge using the latest edition
[Thesaurus example] Category tree: 人間 (human) → 感覚 (sense), 感情 (emotion); 感覚 → 視覚 (visual sense), 聴覚 (hearing sense), …
– Similarity measure: common depth of the two head content words in the tree
[Ontology example] Instances: ソーシャル・ネットワーク・サービス (social network service) → Google+, Foursquare, …; ソーシャル・ブックマーク (social bookmark) → Delicious, Tumblr
– Similarity measure: inverse of the shortest path length in the ontology tree
* Y. Shibaki. Constructing Large-Scale General Ontology from Wikipedia. Master's thesis, Nagaoka University of Technology, Japan, 2011.
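The path-based ontology measure can be sketched on a toy IS-A graph. The mini ontology, the helper names, and the smoothing 1/(1+d) (so self-similarity stays finite) are our illustrative assumptions, not the actual resource.

```python
from collections import deque

ISA = {  # child -> parent, toy version of the Wikipedia IS-A ontology
    "Google+": "SNS", "Foursquare": "SNS",
    "Delicious": "SocialBookmark",
    "SNS": "WebService", "SocialBookmark": "WebService",
}

def path_length(a, b):
    # undirected shortest-path length in the IS-A tree (BFS)
    graph = {}
    for child, parent in ISA.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None  # not connected / unknown word

def ontology_similarity(a, b):
    d = path_length(a, b)
    return 0.0 if d is None else 1.0 / (1 + d)

print(ontology_similarity("Google+", "Foursquare"))  # 1/(1+2) ≈ 0.333
```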
Pair features (1) – Similarity and difference between S and T
 Represent a sentence pair with several features
 Train the logistic regression model using the annotated data
Edit distance and operations (EDO)
– Normalized edit distance: δ(s, t) / max(|s|, |t|)  (0 = same sentence, 1 = totally different)
– Edit operations as features, at the word and POS level:
 • insertion / deletion, e.g. Deletion: "everyday"; Deletion: ADVERB
 • replacement, e.g. Replacement: "president"–"bear"; Replacement: Noun–Noun
Word overlapping (Word)
– Overlap ratio: |ms ∩ mt| / |mt|  (ms, mt: morpheme sets of S and T)
– All combinations of head content words as word pairs and POS pairs,
 e.g. word pairs: (school, school) (go, school) (everyday, school) (school, go) (go, go) (everyday, go)
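The two numeric pair features above can be sketched as follows; the function name and signature are our illustration (in the real system, the categorical edit-operation features would additionally be added as sparse indicators such as Deletion:ADVERB).

```python
def edo_word_features(edit_dist, s_nodes, t_nodes, s_morphs, t_morphs):
    # Two of the slide's pair features:
    #  - normalized tree edit distance (EDO): 0 = same sentence, 1 = totally different
    #  - morpheme overlap ratio (Word): |ms ∩ mt| / |mt|
    return {
        "norm_edit_dist": edit_dist / max(s_nodes, t_nodes),
        "overlap_ratio": len(set(s_morphs) & set(t_morphs)) / len(set(t_morphs)),
    }

feats = edo_word_features(1, 3, 2,
                          ["school", "everyday", "go"], ["school", "go"])
print(feats)  # norm_edit_dist = 1/3, overlap_ratio = 1.0
```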
Pair features (2) – Ad-hoc strong clues
Sentiment polarity matching
• Applied an existing sentiment detector
– "It is excellent" → positive
– "I don't like this" → negative
• Sentiment orientation of the sentence pair: same, different, or opposite polarity
– Opposite polarity is a strong clue for non-entailment
PAS fulfillment test (PAS)
• Convert S and T into sets of predicate–argument structures
• Test whether all predicate–argument structures in T are covered by those in S
– Fulfillment is a strong clue for entailment
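The PAS fulfillment test can be sketched as a coverage check. Modeling a PAS as a (predicate, frozenset of (case, argument)) tuple is our simplification of whatever structures the real parser emits.

```python
def pas_fulfilled(pas_s, pas_t):
    # True iff every predicate-argument structure of T also appears in S
    cover = set(pas_s)
    return all(p in cover for p in pas_t)

s_pas = {("殺す", frozenset({("が", "熊"), ("を", "大統領")})),
         ("逃げる", frozenset({("が", "熊")}))}
t_pas = {("殺す", frozenset({("が", "熊"), ("を", "大統領")}))}
print(pas_fulfilled(s_pas, t_pas))  # True: strong clue for entailment
```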
Pair features (3) – Designed for EXAM subtask
Temporal Matching
 Many sentence pairs in the EXAM data include temporal expressions
 Exploit a feature indicating whether the temporal expressions in S and T overlap
Example (temporal match → strong clue for entailment):
<S>シュマルカルデン同盟とは、1531年に、…</S> 'The Schmalkaldic League is … in 1531.' → (year) 1531 = [1531, 1531]
<T>16世紀に、ドイツでは、シュマルカルデン同盟…</T> 'In 16th-century Germany, the Schmalkaldic League …' → 16th century = [1501, 1600]
→ the intervals overlap

Example (temporal mismatch → strong clue for non-entailment):
<S>1997年まで東南アジアバブルであったが、…</S> 'Until 1997 there was an economic bubble in South-east Asia, …' → (year) 1997 = [1997, 1997]
<T>1990年代初めに起こった日本でのバブル経済の崩壊が、…</T> 'The collapse of the Japanese bubble economy, which happened in the early 1990s, …' → early 1990s = [1990, 1994]
→ the intervals do not overlap
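The overlap test on the year intervals above is a one-liner; the function name is our illustration.

```python
def temporal_match(span_s, span_t):
    # spans are inclusive year intervals (start, end)
    return span_s[0] <= span_t[1] and span_t[0] <= span_s[1]

print(temporal_match((1531, 1531), (1501, 1600)))  # True  -> clue for entailment
print(temporal_match((1997, 1997), (1990, 1994)))  # False -> clue for non-entailment
```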
Expansion of the Training Data
 Convert the training data for the MC subtask into additional training data for the BC subtask, and vice versa.
– e.g. a forward-entailment label (F) between S and T corresponds to true entailment (Y) for S → T and false entailment (N) for T → S.
 Label conversion rule
– [Conversion table] * An F or B over an arrow denotes either forward (F) or bidirectional (B) entailment between S and T (label ambiguity).
 Enhanced data
– BC + MC' data: 500 + 880 pairs
– MC + BC' data: 500 + 440 pairs (to handle label ambiguities, we train the logistic regression by marginal log-likelihood maximization)
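The conversion can be sketched as a label map. Only the F row is stated explicitly on the slide; the R, B, C, and I rows are our reading of the standard RITE label definitions, not a table reproduced from the paper.

```python
MC_TO_BC = {
    # MC label -> (BC label for S -> T, BC label for T -> S)
    "F": ("Y", "N"),  # forward entailment: S entails T only (from the slide)
    "R": ("N", "Y"),  # reverse entailment: T entails S only
    "B": ("Y", "Y"),  # bidirectional entailment
    "C": ("N", "N"),  # contradiction: no entailment in either direction
    "I": ("N", "N"),  # independence: no entailment in either direction
}

def expand_mc_example(s, t, mc_label):
    # one MC training pair yields two directed BC training pairs
    fwd, rev = MC_TO_BC[mc_label]
    return [((s, t), fwd), ((t, s), rev)]

print(expand_mc_example("S", "T", "F"))
# [(('S', 'T'), 'Y'), (('T', 'S'), 'N')]
```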
Results – BC Subtask
Run | Cost Function | Features | Training | CV | AC
w/o edit distance | None | Word + Sentiment + PAS + Temporal | BC | 52.8 | 52.0
IBM BC1 | HDM | EDO + Word + Sentiment + PAS | BC | 54.8 | 52.2
IBM BC2 | HDM + BGH | EDO + Word + Sentiment + PAS | BC | 54.0 | 52.6
IBM BC3 | HDM + BGH + WO | EDO (POS fine) + Word | BC + MC' | 64.1 | 47.0
Oracle (best feature set in the formal run) | BGH | EDO + Word + Sentiment + PAS | BC | 51.8 | 56.0
(CV: cross-validation accuracy on the training data; AC: accuracy in the formal run)
• Positive performance gain with the edit distance method
• Very low correlation between CV and AC → the results are unpredictable
• BC+MC' increases CV by 8%–13%, but has no effect on AC
Results – MC Subtask
Run | Cost Function | Features | Training | CV | AC
w/o edit distance | None | Word + Sentiment + PAS + Temporal | MC | 33.6 | 35.9
IBM MC1 | HDM | EDO + Word + Sentiment + PAS | MC + BC' | 46.8 | 43.6
IBM MC2 | HDM + BGH + WO | EDO + Word | MC | 50.2 | 50.2
IBM MC3 | HDM + BGH + WO | EDO (POS fine) + Word | MC + BC' | 51.3 | 44.5
Oracle | HDM + Ont + WO | EDO + Word + Sentiment | MC | 51.1 | 51.6
• Achieved high accuracy for 5-way classification
• EDO features increased the accuracy by 10%
• The other pair features tend not to work well
• The extended development data (MC + BC') was not effective
Results – EXAM Subtask
Run | Cost Function | Features | Training | CV | AC
w/o edit distance | None | Word + Sentiment + PAS + Temporal | EXAM | 67.9 | 69.5
— | HDM | EDO + Word + Sentiment + PAS | EXAM | 63.7 | 67.6
IBM EX1 | HDM | EDO + Word + Sentiment + PAS + Temporal | EXAM | 69.1 | 72.2
IBM EX2 | Ontology | EDO + Word + Sentiment + PAS + Temporal | EXAM | 62.5 | 67.4
IBM EX3 | HDM + BGH + WO + Ontology | EDO (POS fine) + Word + Sentiment + PAS + Temporal | EXAM | 61.5 | 58.4
Oracle | HDM | EDO + Word + Sentiment + PAS + Temporal | EXAM | 68.3 | 72.6
• Relatively small contribution of the edit distance
• Temporal increased the accuracy by 5%
• High correlation between CV and AC
Summary
 Achieved good performance in MC and EXAM subtasks
–Machine learning approach with various features
–Tree edit operations worked as key features (especially in MC task)
–Use of thesaurus and ontology – complementary resources
 Performance is unpredictable – the model is still immature
 No special treatment for 5-way classification in the MC task – more observation is needed
Backup
Details of MC results – confusion matrix
Correct label (rows) vs. system output (columns):
      F    R    B    C    I
F    87    6   21   30   40
R     7   89   20    9   16
B     7   12   30   16    6
C     4    0    1    5    4
I     5    3    3    5   14
Results – RITE4QA Subtask
Run | Cost Function | Features | Training | AC | Top1 | MRR@5
w/o edit distance | None | Word + Sentiment + PAS + Temporal | BC | 34.5 | 5.5 | 19.8
IBM R4QA1 | HDM & Ont & WO | EDO (POS fine) + Word + Sentiment + PAS | BC | 33.3 | 11.3 | 23.3
IBM R4QA2 | HDM & BGH | EDO | BC | 31.6 | 9.1 | 21.7
IBM R4QA3 | HDM & Ont & WO | EDO (POS fine) + Word + Sentiment + PAS + Temporal | BC + MC' | 40.1 | 8.7 | 22.2
Oracle | None | Word + Sentiment + PAS + Temporal | BC + MC' | 63.5 | 18.1 | 29.0