
IBM Research
The IBM Semantic Concept Detection Framework
Arnon Amir, Giri Iyengar, Ching-Yung Lin, Chitra Dorai,
Milind Naphade, Apostol Natsev, Chalapathy Neti,
Harriet Nock, Ishan Sachdev, John Smith,
Yi Wu, Belle Tseng, Dongqing Zhang
11/17/2003 | TRECVID Workshop 2003
© 2002 IBM Corporation
Outline
• Concept Detection as a Machine Learning Problem
• The IBM TREC 2003 Concept Detection Framework
  – Modeling in low-level features
  – Multi-classifier decision fusion
  – Modeling in high-level (semantic) features
• Putting it All Together: TREC 2003 Concept Detection
• Observations
The IBM TREC-2003 Concept Detection Framework
© 2003 IBM Corporation
Multimedia Analytics by Supervised Learning
[Diagram: a user annotates training videos from the video repository (MPEG-7 annotations); feature extraction feeds the training of semantic concept models; test videos go through feature extraction, detection against the trained models, and analysis.]
Multi-layered Concept Detection: Working in Increasingly (Semantically) Meaningful Feature Spaces
• Improving detection
• Building complex concepts (e.g. News Subject Monologue)

[Diagram: Videos → low-level feature extraction (e.g. color, texture, shape, MFCC, motion) → detection using models built in low-level feature spaces (e.g. SVM, GMM, HMM, TF-IDF) → high-level feature space mapping (e.g. Face, People, Cityscape) → detection and manipulation in high-level feature spaces using high-level models (e.g. Multinet, DMF (SVM, NN), propagation rules).]
The Evolving IBM Concept Detection System
IBM TREC'01, '02:
• SVM, GMM and HMM classifiers for modeling low-level features
• Ensemble and discriminant fusion (TREC'02) of multiple models of the same concept: improved performance over single models
• Rule-based preprocessing (e.g. Non-Studio Setting = NOT(Studio_Indoor_Setting) OR Outdoors)

Post-TREC'02 experiments:
• SVM, GMM and HMM classifiers for modeling low-level and high-level features
• Ensemble and discriminant fusion of multiple models of the same concept: improved performance over single models
• Validity-weighted similarity: improves robustness
• Semantic-feature-based models (Multinet, DMF): improve performance over single-concept models

IBM TREC'03:
• SVM, GMM and HMM classifiers for low-level and high-level features
• Ensemble and discriminant fusion of multiple models of the same concept: improved performance over single models
• Validity-weighted similarity: improves robustness
• Semantic-feature-based models (Multinet, DMF-SVMs, NN, boosting), ontology: improve performance over single-concept models
• Post-filtering: improves precision
Video Concept Detection Pipeline
[Pipeline diagram: annotation and data preparation (annotation, region extraction) → feature extraction from videos (CH, CC, CLG, CT, WT, TAM, EH, MI, MV, AUD; some paths training-only, others training and testing) → low-level feature-based models (SD/A, V1, V2; BOU = best uni-model ensemble run) → fusing models of each concept across low-level feature-based techniques (VW, EF, MLP, EF2; BOF = best multi-modal ensemble run) → high-level (semantic) context-based methods (MN, DMF17, DMF64, MLP, ONT) → post-filtering → BOBO = best-of-the-best ensemble run. Covers the 17 TREC benchmark concepts plus 47 other concepts.]
Corpus Issues
• The multi-layered detection approach needs multiple sets for cross-validation.
• The Feature Development Set is partitioned so that each level of processing has a training-set and a test-set partition that is unadulterated by the processing at the previous level.
• E.g. low-level feature-based concept models are built using the Training Set and performance is optimized over the Validation Set.
• Single-concept, multi-model fusion is performed using the Validation Set for training and Fusion Validation Set 1 for testing.
• Semantic-level fusion is performed using Fusion Validation Set 1 as the training set and Fusion Validation Set 2 as the test set.
• Runs submitted to NIST are finally chosen on the performance of all systems and algorithms on Fusion Validation Set 2.

Partitioning procedure: all videos are aligned by their temporal order, and for each set of 10 videos:
• First 6 → Training Set (60%)
• 7th → Validation Set 1 (10%)
• 8th → Fusion Validation Set 1 (10%)
• Last 2 → Fusion Validation Set 2 (20%)
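The 6/1/1/2 split over each temporally ordered group of 10 videos can be sketched as follows; the video IDs are hypothetical placeholders, not actual corpus identifiers:

```python
# Sketch of the corpus partitioning above: videos stay in temporal order and
# each consecutive group of 10 is split 6/1/1/2 across the four sets.

def partition(videos):
    """Split temporally ordered videos into the four sets (60/10/10/20)."""
    sets = {"train": [], "validation1": [], "fusion_val1": [], "fusion_val2": []}
    for i, video in enumerate(videos):
        pos = i % 10                              # position within the group of 10
        if pos < 6:
            sets["train"].append(video)           # first 6 -> Training Set
        elif pos == 6:
            sets["validation1"].append(video)     # 7th -> Validation Set 1
        elif pos == 7:
            sets["fusion_val1"].append(video)     # 8th -> Fusion Validation Set 1
        else:
            sets["fusion_val2"].append(video)     # last 2 -> Fusion Validation Set 2
    return sets

videos = [f"video_{i:03d}" for i in range(100)]   # 100 videos in temporal order
sets = partition(videos)
print({k: len(v) for k, v in sets.items()})
# {'train': 60, 'validation1': 10, 'fusion_val1': 10, 'fusion_val2': 20}
```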
Video Concept Detection Pipeline: Features
[Pipeline diagram, feature-extraction stage highlighted: videos undergo annotation and region extraction, then extraction of features CH, CC, CLG, CT, WT, TAM, EH, MI, MV and AUD (some paths training-only, others training and testing).]
Feature Extraction
Features extracted globally and regionally:
• Color: color histograms (512 dim), auto-correlograms (166 dim)
• Structure & shape: edge orientation histogram (64 dim), Dudani moment invariants (6 dim)
• Texture: co-occurrence texture (96 dim), coarseness (1 dim), contrast (1 dim), directionality (1 dim), wavelet (12 dim)
• Motion: motion vector histogram (6 dim)
• Audio: MFCC
• Text: ASR transcripts
• Regions: object (motion, camera registration); background (5 regions / shot)

[Diagram: shot segmentation → annotation, feature extraction and region segmentation, driven by the concept lexicon.]

References: Lin (ICME 2003)
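As one concrete example of the color features above, a 512-dimensional color histogram corresponds to 8×8×8 joint RGB binning. A minimal sketch, using random stand-in pixels rather than decoded keyframes:

```python
import numpy as np

# Minimal sketch of a 512-dim color histogram (8x8x8 RGB bins), one of the
# global color features listed above. The frame is random stand-in data; a
# real system would use decoded keyframe pixels.

def color_histogram(frame, bins=8):
    """frame: HxWx3 uint8 RGB image -> L1-normalized histogram of bins**3 dims."""
    q = (frame.astype(np.int32) * bins) // 256               # quantize to [0, bins)
    idx = (q[..., 0] * bins + q[..., 1]) * bins + q[..., 2]  # joint bin index
    hist = np.bincount(idx.ravel(), minlength=bins ** 3).astype(float)
    return hist / hist.sum()

frame = np.random.randint(0, 256, size=(120, 160, 3), dtype=np.uint8)
h = color_histogram(frame)
print(h.shape, round(h.sum(), 6))  # (512,) 1.0
```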
Video Concept Detection Pipeline: Low-level Feature Modeling
[Pipeline diagram, low-level modeling stage highlighted: extracted features feed low-level feature-based models (SD/A, V1, V2) for the 17 TREC benchmark concepts and 47 other concepts; BOU marks the best uni-model ensemble run.]
Low-level Feature-based Concept Models: Statistical Learning for Concept Building with SVMs
[Diagram: features f1…fK from the Training and Validation Sets feed SVM training; a grid search over candidate models m1…mP on the Validation Set selects parameters; fusion (normalization & aggregation) produces the final model.]

• SVM models used for 2 sets of visual features:
  – Combined color correlogram, edge histogram, co-occurrence features and moment invariants
  – Color histogram, motion, Tamura texture features
• For each concept:
  – Built multiple models for each feature set by varying kernels and parameters
  – Up to 27 models per concept for each feature type
• A total of 64 concepts from the TREC 2003 lexicon covered through SVM-based models
• The Validation Set is then used to search for the best model parameters and feature set
• Identical approach as in the IBM system for TREC 2002
• Fusion Validation Set II MAP: 0.22
References: IBM TREC 2002, Naphade et al (ICME 2003, ICIP 2003)
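The per-concept model selection above amounts to a grid search scored on the Validation Set. A minimal sketch, where `train` and `average_precision` are hypothetical stand-ins for the real SVM trainer and AP scorer:

```python
import itertools

# Sketch of the model-selection loop: train one model per (kernel, C, gamma)
# combination and keep whichever scores best on the Validation Set.
# `train` and `average_precision` are placeholder stand-ins.

def train(kernel, C, gamma, train_data):
    return {"kernel": kernel, "C": C, "gamma": gamma}   # placeholder "model"

def average_precision(model, val_data):
    # Stand-in score; a real system evaluates ranked detection output.
    return val_data.get((model["kernel"], model["C"], model["gamma"]), 0.0)

def grid_search(train_data, val_data):
    kernels = ["linear", "rbf", "poly"]
    Cs = [0.1, 1.0, 10.0]
    gammas = [0.01, 0.1, 1.0]
    best_model, best_ap = None, -1.0
    # 3 * 3 * 3 = 27 candidate models per feature set, as on the slide.
    for kernel, C, gamma in itertools.product(kernels, Cs, gammas):
        model = train(kernel, C, gamma, train_data)
        ap = average_precision(model, val_data)
        if ap > best_ap:
            best_model, best_ap = model, ap
    return best_model, best_ap

# Pretend validation APs for a few parameter settings:
val_scores = {("rbf", 1.0, 0.1): 0.22, ("linear", 0.1, 0.01): 0.15}
model, ap = grid_search(None, val_scores)
print(model["kernel"], ap)  # rbf 0.22
```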
Low-level Feature-based Concept Models: Statistical Learning Based on ASR Transcripts
TRAINING: manually examine examples to find frequently co-occurring relevant words (e.g. "… some weather news overseas" → WEATHER NEWS).

Query word set: weather, news, low, pressure, storm, cloudy, mild, windy, … (etc.)

The Okapi system searches the ASR transcripts (e.g. "… update on low pressure storm") and returns ranked shots.

Fusion Validation II MAP = 0.19
References: Nock et al (SIGIR 2003)
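The ranking step can be sketched with the standard Okapi BM25 weighting; the transcripts and query words below are illustrative, not from the actual corpus, and the real system's parameters are not stated on the slide:

```python
import math
from collections import Counter

# Minimal Okapi BM25 sketch for ranking shots by their ASR transcripts
# against a concept's query word set.

def bm25_rank(query, transcripts, k1=1.2, b=0.75):
    docs = [t.lower().split() for t in transcripts]
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    df = Counter(w for d in docs for w in set(d))        # document frequencies
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        s = 0.0
        for w in query:
            if w not in tf:
                continue
            idf = math.log(1 + (N - df[w] + 0.5) / (df[w] + 0.5))
            s += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * (1 - b + b * len(d) / avgdl))
        scores.append((s, i))
    return sorted(scores, reverse=True)                  # best shot first

query = ["weather", "storm", "pressure"]
shots = ["update on low pressure storm",
         "sports scores from last night",
         "some weather news overseas"]
ranking = bm25_rank(query, shots)
print([i for _, i in ranking])  # shot 0 matches two query words and ranks first
```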
Video Concept Detection Pipeline: Fusion I
[Pipeline diagram, fusion stage highlighted: models of each concept are fused across low-level feature-based techniques (VW, EF, MLP, EF2); BOF marks the best multi-modal ensemble run.]
Multi-Modality/ Multi-Concept Fusion Methods
Ensemble Fusion:
• Normalization: rank, Gaussian, linear.
• Combination: average, product, min, max
• Works well for uni-modal concepts with few training examples
• Computationally low-cost method of combining multiple classifiers.
• Fusion Validation Set II MAP: 0.254
• SearchTest MAP: 0.26
• References: Tseng et al (ICME 2003, ICIP 2003)
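The normalization and combination choices above can be sketched as follows; the classifier scores are made-up illustrations of two incompatible confidence scales:

```python
import numpy as np

# Sketch of ensemble fusion: per-classifier confidence lists are normalized
# (rank, Gaussian/z-score, or linear/min-max) and then combined (average,
# product, min, or max).

def normalize(scores, method="linear"):
    s = np.asarray(scores, dtype=float)
    if method == "rank":                        # higher score -> higher rank value
        return s.argsort().argsort() / (len(s) - 1)
    if method == "gaussian":                    # z-score normalization
        return (s - s.mean()) / s.std()
    return (s - s.min()) / (s.max() - s.min())  # linear (min-max)

def combine(score_lists, method="average"):
    m = np.vstack(score_lists)
    return {"average": m.mean(axis=0), "product": m.prod(axis=0),
            "min": m.min(axis=0), "max": m.max(axis=0)}[method]

# Two classifiers scoring the same 4 shots on incompatible scales:
svm_scores = [0.1, 0.9, 0.4, 0.6]
gmm_scores = [120, 300, 80, 250]
fused = combine([normalize(svm_scores), normalize(gmm_scores)], "average")
print(fused.argmax())  # shot 1 ranks highest under both classifiers -> 1
```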
Multi-Modality/Multi-Concept Fusion Methods: Validity Weighting

Validity Weighting:
• Work in the high-level feature space generated by classifier confidences for all
concepts
• Basic idea is to give more importance to reliable classifiers.
• Revise distance metric to include a measure of the goodness of the classifier.
• Many fitness or goodness measures
• Average Precision
• 10-point AP
• Equal Error rate
• Number of Training Samples in Training Set.
• Computationally efficient, low-cost option for merit/performance-based combination of multiple classifiers.
• Improves robustness by relying more heavily on high-performance classifiers.
• Fusion Validation Set II MAP: 0.255
• References: Smith et al (ICME 2003, ICIP 2003)
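The idea above — weighting each classifier's contribution by a goodness measure such as its validation-set AP — can be sketched as follows, with invented numbers:

```python
# Sketch of validity weighting: each classifier's contribution is weighted by
# a goodness measure (here its validation-set average precision), so reliable
# classifiers dominate the fused confidence.

def validity_weighted_fusion(confidences, validities):
    """confidences: {classifier: per-shot scores}; validities: {classifier: AP}."""
    total = sum(validities.values())
    n = len(next(iter(confidences.values())))
    fused = [0.0] * n
    for name, scores in confidences.items():
        w = validities[name] / total          # normalized validity weight
        for i, s in enumerate(scores):
            fused[i] += w * s
    return fused

confs = {"color_svm": [0.9, 0.2, 0.5], "texture_gmm": [0.1, 0.8, 0.5]}
validity = {"color_svm": 0.40, "texture_gmm": 0.10}   # APs on the validation set
print(validity_weighted_fusion(confs, validity))
# the reliable color SVM (AP 0.40) dominates: [0.74, 0.32, 0.5]
```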
Video Concept Detection Pipeline: Semantic-Feature based Models
[Pipeline diagram, semantic stage highlighted: fused per-concept outputs feed high-level (semantic) context-based methods (MN, DMF17, DMF64, MLP, ONT); BOBO marks the best-of-the-best ensemble run.]
Semantic Feature Based Models: Incorporating Context
• Multinet: a probabilistic graphical context-modeling framework that uses loopy probability propagation in undirected graphs. It learns conceptual relationships automatically and uses these learned relationships to modify detection (e.g. uses Outdoor detection to influence Non-Studio Setting in the right proportion).
• Discriminant Model Fusion using SVMs: uses a training set of semantic feature vectors with ground truth to learn the dependence of model outputs across concepts.
• Discriminant Model Fusion and Regression using Neural Networks and Boosting: uses a training set of semantic feature vectors with ground truth to learn the dependence of model outputs across concepts. Boosting helps especially with rare concepts.
• Ontology-based processing: uses the manually constructed annotation hierarchy (or ontology) to modify detection of root nodes based on robust detection of parent nodes, i.e. uses "Outdoor" detection to influence detection …
Semantic Context Learning and Exploitation: Multinet
• Problem: building each concept model independently fails to utilize spatial, temporal and conceptual context, and makes sub-optimal use of the available information.
• Approach (Multinet): a network of concept models represented as a graph with undirected edges; probabilistic graphical models encode and enforce context.

[Diagram: multimedia features feed a conceptual network over Sky, Landscape, Person, Urban Setting, Face, Indoors, Outdoors, Greenery, Tree, People, Road and Transportation, with positive and negative pairwise interactions; factor-graph loopy propagation implementation (CIVR '03).]

• Results:
  – The factor-graph Multinet with Markov-chain temporal models improves mean average precision by more than 27% over the best IBM run for TREC 2002, and by 36% in conjunction with SVM-DMF
  – Highest MAP for TREC '03
  – Low training cost; no extra training data needed; high inference cost
  – Fusion Validation Set II MAP: 0.268
  – SearchTest MAP: 0.263
  – References: Naphade et al (CIVR 2003, TCSVT 2002)
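The context-enforcement idea behind Multinet can be illustrated on a toy two-concept case. The real system runs loopy propagation on a factor graph over many concepts; here the graph is small enough for exact inference, and the compatibility potentials are invented for illustration:

```python
import numpy as np

# Toy sketch of pairwise context: an undirected compatibility between
# "Outdoors" and "Sky" re-weights the independent detector confidences.

def contextual_posterior(p_a, p_b, compat):
    """Exact inference on a 2-node MRF.
    p_a, p_b: independent detector probabilities P(concept = 1).
    compat[i][j]: compatibility potential of (a = i, b = j)."""
    joint = np.zeros((2, 2))
    for a in (0, 1):
        for b in (0, 1):
            pa = p_a if a else 1 - p_a
            pb = p_b if b else 1 - p_b
            joint[a, b] = pa * pb * compat[a][b]
    joint /= joint.sum()
    return joint[1, :].sum(), joint[:, 1].sum()   # marginals P(a=1), P(b=1)

# Outdoors and Sky tend to co-occur: high compatibility on the diagonal.
compat = [[3.0, 1.0], [1.0, 3.0]]
p_out, p_sky = contextual_posterior(0.9, 0.5, compat)
print(round(p_out, 3), round(p_sky, 3))  # Sky is pulled up by a strong Outdoors
```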
Multi-Modality/Multi-Concept Fusion Methods: DMF using SVM
Using an SVM/NN to re-classify the outputs of classifiers 1–N:
• No normalization required
• The Validation Set is used for training; Fusion Validation Set 1 for optimization and parameter selection
• Training cost is low when the number of classifiers being fused is small (a few tens)
• Classification cost is low
• Used for fusing together multiple concepts in the semantic feature-space methods
• Fusion Validation Set II MAP: 0.273
• SearchTest MAP: 0.247
• References: Iyengar et al (ICME 2002, ACM '03)
[Diagram: each shot is represented by a "model vector" of per-concept confidences (M1…M6, e.g. People); Concept Model X is trained on this model-vector space against Concept X annotation ground truth.]
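The DMF idea — training a second-stage discriminant classifier on model vectors — can be sketched as follows. A perceptron stands in for the SVM/NN of the real system, and the model vectors and labels are invented toy data:

```python
import numpy as np

# Sketch of discriminant model fusion: each shot is a "model vector" of
# per-concept confidences; a second-stage classifier is trained on these
# vectors against the target concept's ground truth. A perceptron is a
# lightweight stand-in for the SVM/NN used in the real system.

def train_perceptron(X, y, epochs=50):
    w = np.zeros(X.shape[1] + 1)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append bias term
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            w += (yi - pred) * xi               # perceptron update
    return w

# Model vectors (Outdoors, Sky, Indoors confidences); target: "Landscape".
X = np.array([[0.9, 0.8, 0.1], [0.8, 0.9, 0.2],
              [0.1, 0.2, 0.9], [0.2, 0.1, 0.8]])
y = np.array([1, 1, 0, 0])                      # ground-truth labels
w = train_perceptron(X, y)

def score(mv):                                  # fused confidence for a shot
    return float(np.append(mv, 1.0) @ w)

print(score(np.array([0.85, 0.9, 0.1])) > score(np.array([0.1, 0.1, 0.9])))  # True
```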
Multi-Concept Fusion: Semantic Space Modeling Through Regression
• Problem: given a (small) set of related concept exemplars, learn a concept representation.
• Approach: learn and exploit semantic correlations and class co-dependencies.
  – Build (robust) classifiers for a set of basis concepts (e.g., SVM models)
  – Model (rare) concepts in terms of known (frequent) concepts, or anchors:
    • Represent images as semantic model vectors, i.e. vectors of confidences w.r.t. known models
    • Model new concepts as a sub-space of the semantic model-vector space
  – Learn the weights of the separating hyperplane through regression:
    • Optimal linear regression (least-squares fit)
    • Non-linear MLP regression (multi-layer perceptron neural networks)
  – Can be used to boost the performance of basis models or to build additional models
  – Fusion Validation Set II MAP: 0.274
  – SearchTest MAP: 0.252
  – References: Natsev et al (ICIP 2003)

[Diagram: example hyperplane weights for a target concept over basis concepts, e.g. Animal 0.17, Tree 0.01, Transportation -0.29, Sky 0.0, Road 0.48, Person -0.25, People 0.34, Outdoors -0.1, Landscape -0.02, Greenery 0.07, Face -0.27, Building -0.19, and Indoors.]
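The least-squares variant above can be sketched directly: fit hyperplane weights over basis-concept confidences to model a rare concept. The model vectors and labels are toy illustrations:

```python
import numpy as np

# Sketch of semantic-space regression: a rare concept is modeled as a linear
# combination of basis-concept confidences, with hyperplane weights fit by
# least squares.

# Rows: shots; columns: basis-concept confidences (e.g. Sky, Road, People).
M = np.array([[0.9, 0.1, 0.2],
              [0.8, 0.2, 0.1],
              [0.1, 0.9, 0.8],
              [0.2, 0.8, 0.9],
              [0.5, 0.5, 0.5]])
y = np.array([1.0, 1.0, 0.0, 0.0, 0.5])      # rare-concept ground truth

# Append a bias column and solve min ||Mb w - y||^2.
Mb = np.hstack([M, np.ones((len(M), 1))])
w, *_ = np.linalg.lstsq(Mb, y, rcond=None)

new_shot = np.array([0.85, 0.15, 0.2, 1.0])  # model vector + bias term
print(float(new_shot @ w) > 0.5)             # resembles positive exemplars -> True
```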
Multi-Concept Fusion: Ontology-based Boosting
• Basic idea:
  – The concept hierarchy is created manually based on a semantics ontology
  – Classifiers influence each other within this ontology structure
  – Utilize information from reliable classifiers as much as possible
• Influence within the ontology structure:
  – Boosting factor: boost children's precision from more reliable ancestors (shrinkage theory: parameter estimates in data-sparse children shrink toward the estimates of data-rich ancestors in ways that are provably optimal under appropriate conditions)
  – Confusion factor: the probability of misclassifying Cj as Ci when Cj and Ci cannot coexist
• Fusion Validation Set II MAP: 0.266
• SearchTest MAP: 0.261
• References: Wu et al (ICME 2004, submitted)
[Ontology diagram: the root splits into Outdoors and Indoors; Outdoors into Natural-vegetation (Greenery, Tree) and Natural-non-vegetation (Sky, Cloud, Smoke); Indoors into Studio-setting, Non-Studio-setting, House-setting and Meeting-setting; boosting and confusion factors propagate along the edges.]
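The boosting/confusion adjustment can be sketched with a simple linear update; the factors and scores below are invented for illustration, whereas the real system derives them from validation data:

```python
# Sketch of ontology-based score adjustment: a child's confidence is boosted
# when a reliable ancestor fires, and suppressed by confusion with a concept
# it cannot coexist with.

def adjust(child_score, ancestor_score, rival_score, boost=0.3, confusion=0.2):
    """child <- child + boost * ancestor - confusion * mutually exclusive rival."""
    s = child_score + boost * ancestor_score - confusion * rival_score
    return min(1.0, max(0.0, s))              # clamp to [0, 1]

# "Tree" under "Outdoors", mutually exclusive with "Studio-setting":
tree = adjust(child_score=0.45, ancestor_score=0.9, rival_score=0.1)
print(round(tree, 3))  # 0.45 + 0.27 - 0.02 = 0.7
```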
Video Concept Detection Pipeline: Post-Filtering
[Pipeline diagram, post-filtering stage highlighted: outputs of the high-level (semantic) context-based methods pass through a filtering step before the final ensemble runs (BOU, BOF, BOBO).]
Post Filtering - News/Commercial Detector
Keyframes of a test video are matched against station templates (CNN, ABC) to produce a binary news/non-news decision; match-filter outputs pass through median filters to give the final news detection result.

• Match filter: for each template,

  S = δ(S_C > τ'_C) & δ(S_E > τ'_E)

  where C denotes color, E denotes edge, and

  S_C = (1/N) Σ_n δ(d(P_C, P_MC) > τ_C)
  S_E = (1/N) Σ_n δ(d(P_E, P_ME) > τ_E)

• Thresholds τ_C, τ_E, τ'_C, τ'_E were decided from two training videos; all templates use the same thresholds. Templates were arbitrarily chosen from 3 training videos.
• Performance, measured as misclassification (miss + false alarm) on the Validation Set:
  – CNN: 8 out of 1790 shots (accuracy = 99.6%)
  – ABC: 60 out of 2111 shots (accuracy = 97.2%)
• Our definition of news: news-program shots (non-commercial, non-miscellaneous shots)
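The match-filter formulas above can be sketched directly. The features, block layout and thresholds below are toy values, not the ones fit on the training videos:

```python
import numpy as np

# Sketch of the match filter: the fraction of keyframe blocks whose color
# (and edge) distance to a station template exceeds a threshold is itself
# thresholded to yield the binary decision.

def block_vote(P, P_M, tau):
    """S = (1/N) * sum_n delta(d(P_n, P_M_n) > tau), with d = L1 distance."""
    d = np.abs(P - P_M).sum(axis=1)         # per-block distance to the template
    return float((d > tau).mean())

def match_filter(P_C, P_MC, P_E, P_ME, tau_C, tau_E, tau_Cp, tau_Ep):
    S_C = block_vote(P_C, P_MC, tau_C)      # color vote
    S_E = block_vote(P_E, P_ME, tau_E)      # edge vote
    return (S_C > tau_Cp) and (S_E > tau_Ep)

rng = np.random.default_rng(0)
template_c = rng.random((16, 8))            # 16 blocks x 8-dim color features
template_e = rng.random((16, 4))            # 16 blocks x 4-dim edge features
# A frame far from the template in every block trips both votes:
frame_c = template_c + 1.0
frame_e = template_e + 1.0
print(match_filter(frame_c, template_c, frame_e, template_e,
                   tau_C=0.5, tau_E=0.5, tau_Cp=0.5, tau_Ep=0.5))  # True
```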
P@100 vs. Number of examples
[Scatter plot: precision at 100 (%) vs. number of training examples (log scale, 1 to 10000). Frequent concepts such as Outdoors, People and Non-studio cluster at high P@100; rare concepts such as NS-Monologue and Physical Violence sit at low P@100; Sport Event, Nature, NS-Face, Car, Weather, Aircraft, Female Speech, Building, Road, Animal, Zoom-in and Albright fall in between.]
Performance is roughly log-linear in the number of training examples, yet there are deviations.
→ Can log-linear be considered the default against which to evaluate concept complexity?
TRECVID 2003 – Average Precision Values
[Bar chart: per-concept average precision, Best IBM vs. Best Non-IBM, for the mean and the 17 benchmark concepts: Sporting Event, Weather, Zoom In, Physical Violence, Madeleine Albright, Animal, Female Speech, Car/Truck/Bus, Aircraft, NS Monologue, Non-Studio Setting, Road, Vegetation, Outdoors, NS Face, People, Building.]
• IBM has the best average precision on 14 out of the 17 concepts.
• The best mean average precision of the IBM system (0.263) is 34 percent better than the second best.
• Pooling skews some AP numbers for high-frequency concepts, which makes judgement difficult, but AP can be considered a loose lower bound on performance.
• A bug in the Female_Speech model affected second-level fusion of Female_Speech, News_Subject_Monologue and Madeleine_Albright, among others. This especially hurt the model-vector-based techniques (DMF, NN, Multinet, Ontology).
TRECVID 2003 -- Precision at Top 100 Returns
[Bar chart: per-concept precision at top 100 returns, Best IBM vs. Best Non-IBM, for the mean and the 17 benchmark concepts: Animal, Female Speech, Car/Truck/Bus, Aircraft, NS Monologue, Non-Studio Setting, Sports Event, Weather, Zoom In, Physical Violence, Madeleine Albright, Road, Vegetation, Outdoors, NS Face, People, Building.]
• IBM has the highest precision @ 100 on 13 out of the 17 concepts.
• Mean precision @ 100 of the best IBM system: 0.6671.
• The best mean precision of the IBM system is 28 percent better than that of the other systems.
• Different model-vector-based fusion techniques improve performance for different classes of concepts.
Precision of 10 IBM Runs Submitted
(Columns: Outdoors, NS_Face, People, Building, Road, Vegetation, Animal, Female_Speech, Vehicle, Aircraft, Monologue, NonStudio, Sports, Weather, Zoom_In, Violence, Albright; Mean over the 17 concepts. Average row rounded to one decimal.)

Run       Outd NSFc Peop Bldg Road Vege Anim FSpk Vehi Airc Mono NStu Sprt Wthr Zoom Viol Albr    Mean
BOU         81   80   90   53   46   96   10   46   68   38   24   97   81   79   44   33   32  58.706
EF          67   77   95   60   33   97   47   69   80   63   25   96   99   98   44   28   28  65.059
BOF         71   77   97   71   52   93   47   69   80   47   25   96   98  100   44   35   32  66.706
DMF17       82   93   90   54   49   97   45   35   76   70    1   99   98   99   44    9   28  62.882
DMF64       82   73   79   53   41   96   33   79   56   67    0   93   98   99   44   34    4  60.647
MLP_BOR     78   75   97   61   53   94   47   38   70   65    1   95  100   97   44   27   30  63.059
MLP_EFC     73   67   97   41   33   96   48   19   49   60    3   97   99   99   44   27   27  57.588
MN          85   55   99   52   45   97   47   66   81   63   25   96   99   98   44   22   28  64.824
ONT         67   77   95   56   42   97   47   69   83   69    6   94   99   98   44   28   28  64.647
BOBO        85   73   99   56   52   93   10   66   56   63    0   97   98   99   44   22   32  61.471
Maximum     85   93   99   71   53   97   48   79   83   70   25   99  100  100   44   35   32  66.706
Average   76.9 73.9 93.4 55.4 45.0 95.7 44.9 53.6 70.7 63.0  8.7 95.7 98.7 98.6 44.0 26.0 25.3  62.908
• Processing beyond a single classifier per concept improves performance.
• Dividing the TREC benchmark concepts into 3 types by frequency of occurrence:
  – Performance on highly frequent concepts (>80/100) is further enhanced by Multinet (e.g. Outdoors, Nature_Vegetation, People)
  – Performance on moderately frequent concepts (>50 and <80) is usually improved by discriminant re-classification techniques such as SVMs (DMF17/64) or NNs (MLP_BOR, MLP_EFC)
  – Performance on very rare concepts needs to be boosted through better feature extraction and processing in the initial stages
• Based on Fusion Validation Set 2 evaluation, visual models outperform audio/ASR models for 9 concepts, while the reverse is true for 6 concepts.
• Semantic-feature-based techniques improve MAP by 20% over visual models alone.
• Fusion of multiple modalities (audio, visual) improves MAP by 20% over the best unimodal (visual) run (using Fusion Validation Set II for comparison).
Observations and Future Directions
qGeneric Trainable Methods for Concept Detection
demonstrate impressive performance.
qNeed to increase Vocabulary of Concepts Modeled
qNeed to improve Modeling of Rare Concepts
qNeed Multimodality at an earlier level of analysis (e.g.
multimodal model of Monologue (TREC’02) better than
fusion of multiple unimodal classifiers (TREC’03)
qMulti-classifier, Multi-concept and Multi-modal fusion
offer promising improvement in detection (as measured
on TREC’02 and TREC’03 Fusion Validation Set 2 and
in part also by TREC SearchTest 03)
Acknowledgements
• Thanks for additional contributions from:
  – Chitra Dorai (IBM) for the Zoom-In detector
  – Javier Ruiz-del-Solar (Univ. of Chile) for the Face detector
  – Ishan Sachdev (summer intern, MIT) for helping with the visual uni-models
  – For collaborative annotation:
    • IBM: Ying Li, Christian Lang, Ishan Sachdev, Larry Sansone, Matthew Hill
    • Columbia U.: Winston Hsu
    • Univ. of Chile: Alex Jaimes, Dinko Yaksic, Rodrigo Verschae
Concept Detection Example: Cars
• "Car/truck/bus: segment contains at least one automobile, truck, or bus exterior"
• The concept was trained on the annotated training set.
• Results are shown on the test set:

  Run        Precision @100
  Best IBM   0.83

[Keyframe grid of top-ranked test shots]
Concept Detection Example: Ms. Albright
• "Person X: segment contains video of person x (x = Madeleine Albright)."
• Contributions of the audio-based and visual-based models, on the CF2 (validation) set:

  Run                      Average Precision
  Best IBM Audio Models    0.30
  Best IBM Visual Models   0.29
  Best of Fusion           0.47

• Results on the test set (TREC evaluation by NIST):

  Run        Precision
  Best IBM   0.32

[Keyframe examples of top-ranked test shots]