BONy: a knowledge centric collaborative learning platform

Proceedings of the first
Workshop on Supporting eLearning with
Language Resources and Semantic Data
May 22nd, 2010
Valletta, Malta
In conjunction with LREC 2010
Language resources are of crucial importance not only for research and development in language
and speech technology but also for eLearning applications. In addition, the increasing availability
of semantically interpreted data in the Web 3.0 is having a huge impact on semantic technology.
Social media applications such as Delicious, Flickr, YouTube, and Facebook, provide us with data
in the form of tags and interactions among users. We believe that the exploitation of semantic data
(emerging both from the Semantic Web and from social media) and language resources will drive
the next generation eLearning platforms. The integration of these technologies within eLearning
applications should also facilitate access to learning material in developing economies.
The workshop aims at bringing together computational linguists, language resources developers,
knowledge engineers, social media researchers and researchers involved in technology-enhanced
learning as well as developers of eLearning material, ePublishers and eLearning practitioners. It
will provide a forum for interaction among members of different research communities, and a
means for attendees to increase their knowledge and understanding of the potential of language
resources in eLearning. We will especially target eLearning practitioners in the Mediterranean
Partner Countries.
The proceedings of the workshop contain 10 papers discussing the integration of language
resources, natural language processing techniques, ontologies and social media in eLearning. The
organizers hope that the selection of papers presented here will be of interest to a broad audience,
and will be a starting point for further discussion and cooperation.
Paola Monachesi, Alfio Massimiliano Gliozzo and Eline Westerhout
The Workshop Programme
Session on Language Resources, NLP and eLearning
Language resources and CALL applications
Helmer Strik, Jozef Colpaert, Joost van Doremalen and Catia Cucchiarini
Challenges for Discontiguous Phrase Extraction
Dale Gerdemann and Gaston Burek
Towards Resolving Morphological Ambiguity in Arabic Intelligent Language
Tutoring Framework
Khaled Shaalan, Doaa Samy and Marwa Magdi
Language Resources and Visual Communication in a Deaf-Centered Multimodal E-Learning Environment: Issues to be Addressed
Elena Antinoro Pizzuto, Claudia S. Bianchini, Daniele Capuano, Gabriele Gianfreda and
Paolo Rossini
Deaf People Education: crossing linguistic borders through e-learning
Giuseppe Nuccetelli and Maria Tagarelli De Monte
Session on ontologies, social media and learning
BONy: a knowledge centric collaborative learning platform
Alfio Massimiliano Gliozzo, Concetto Elvio Bonafede and Aldo Gangemi
Social E-SPACES; socio-collaborative spaces within the virtual world ecosystem
Vanessa Camilleri and Matthew Montebello
A Semantic Knowledge Base for Personal Learning and Cloud Learning
Alexander Mikroyannidis, Paul Lefrere and Peter Scott
Semantic Annotation for Semi-Automatic Positioning of the Learner
Petya Osenova and Kiril Simov
Facilitating cross-language retrieval and machine translation by multilingual domain ontologies
Petr Knoth, Trevor Collins, Elsa Sklavounou and Zdenek Zdrahal
Wrap up, discussion, plans for common projects
Workshop Organisers
Paola Monachesi
University of Malta, Malta
and Utrecht University, The Netherlands
[email protected]
Alfio Massimiliano Gliozzo
[email protected]
Eline Westerhout
Utrecht University, The Netherlands
[email protected]
Workshop Programme Committee
Claudio Baldassarre (Open University)
Roberto Basili (University of Rome Tor Vergata)
Eva Blomqvist (ISTC–CNR)
Antonio Branco (University of Lisbon)
Dan Cristea (University of Iaşi)
Ernesto William De Luca (TU Berlin)
Philippe Dessus (University Pierre-Mendès-France, Grenoble)
Claudio Giuliano (FBK-irst)
Wolfgang Greller (Open University of the Netherlands)
Alessio Gugliotta (Innova spa)
Jamil Itmazi (Palestine Ahliya University)
Susanne Jekat (Zürich Winterthur Hochschule)
Vladislav Kubon (Charles University Prague)
Lothar Lemnitzer (Berlin-Brandenburgische Akademie der Wissenschaften)
Stefanie Lindstaedt (Know-Center)
Angelo Marco Luccini (INSEAD)
Manuele Manente (JOGroup)
Dunja Mladenic (J. Stefan Institute)
Matthew Montebello (University of Malta)
Jad Najjar (WU Vienna)
Valentina Presutti (ISTC–CNR)
Adam Przepiorkowski (Polish Academy of Sciences)
Mike Rosner (University of Malta)
Doaa Samy (Cairo University)
Khaled Shaalan (Cairo University)
Kiril Simov (Bulgarian Academy of Sciences)
Stefan Trausan-Matu (University of Bucharest)
Cristina Vertan (University of Hamburg)
Fridolin Wild (Open University)
Table of Contents
Programme Committee
Table of Contents
Author Index
Language resources and CALL applications
Helmer Strik, Jozef Colpaert, Joost van Doremalen and Catia Cucchiarini
Challenges for Discontiguous Phrase Extraction
Dale Gerdemann and Gaston Burek
Towards Resolving Morphological Ambiguity in Arabic Intelligent Language
Tutoring Framework
Khaled Shaalan, Doaa Samy and Marwa Magdi
Language Resources and Visual Communication in a Deaf-Centered Multimodal
E-Learning Environment: Issues to be Addressed
Elena Antinoro Pizzuto, Claudia S. Bianchini, Daniele Capuano, Gabriele Gianfreda
and Paolo Rossini
Deaf People Education: crossing linguistic borders through e-learning
Giuseppe Nuccetelli and Maria Tagarelli De Monte
BONy: a knowledge centric collaborative learning platform
Alfio Massimiliano Gliozzo, Concetto Elvio Bonafede and Aldo Gangemi
Social E-SPACES; socio-collaborative spaces within the virtual world ecosystem
Vanessa Camilleri and Matthew Montebello
A Semantic Knowledge Base for Personal Learning and Cloud Learning
Alexander Mikroyannidis, Paul Lefrere and Peter Scott
Semantic Annotation for Semi-Automatic Positioning of the Learner
Petya Osenova and Kiril Simov
Facilitating cross-language retrieval and machine translation by multilingual
domain ontologies
Petr Knoth, Trevor Collins, Elsa Sklavounou and Zdenek Zdrahal
Author Index
Antinoro Pizzuto, E.
Bianchini, C.S.
Bonafede, C.E.
Burek, G.
Camilleri, V.
Capuano, D.
Collins, T.
Colpaert, J.
Cucchiarini, C.
van Doremalen, J.
Gangemi, A.
Gerdemann, D.
Gianfreda, G.
Gliozzo, A. M.
Knoth, P.
Lefrere, P.
Magdi, M.
Mikroyannidis, A.
Montebello, M.
Nuccetelli, G.
Osenova, P.
Rossini, P.
Samy, D.
Scott, P.
Shaalan, K.
Simov, K.
Sklavounou, E.
Strik, H.
Tagarelli De Monte, M.
Zdrahal, Z.
Language resources and CALL applications: speech data and speech technology
in the DISCO project
Helmer Strik (a), Jozef Colpaert (b), Joost van Doremalen (a), Catia Cucchiarini (a)
(a) CLST, Department of Linguistics, Radboud University, Nijmegen, The Netherlands
(b) Linguapolis, Institute for Education and Information Sciences, University of Antwerp, Antwerp, Belgium
E-mail: H.Strik | J.vanDoremalen | [email protected]; [email protected]
The current paper deals with the relation between language resources and Computer Assisted Language Learning (CALL) systems:
language resources are essential in the development of CALL applications; during the development of the system, new resources are created;
and finally, the CALL system itself can be used to generate additional resources that are useful for research and development of new
(CALL) systems.
We focus on the system developed in the project DISCO (Development and Integration of Speech technology into COurseware for
language learning): we describe the language resources employed for developing the DISCO system and present the DISCO system
paying attention to the design, the automatic speech recognition modules, and the resources produced within the project. Finally, we
discuss how additional language resources can be generated through the DISCO system.
1. Introduction
In the last few years the interest in applying Automatic
Speech Recognition (ASR) technology to second
language (L2) learning has been growing considerably
(Eskenazi, 2009). The addition of ASR technology to
Computer Assisted Language Learning (CALL) systems
makes it possible to assess oral skills in a second language
and to provide corrective feedback automatically. The
latter feature appears particularly appealing, since
research has shown that usage-based acquisition in the L2
is not as successful as in the L1 (Ellis and Larsen-Freeman,
2006: 571), that L2 learners have difficulty identifying
their own errors (Dlaska and Krekeler, 2008), and that
they indeed need guidance to improve their language
skills (Ellis and Bogart, 2007). Since providing practice
and feedback for speaking proficiency is particularly
time-consuming, the necessary amount of practice is
almost never achieved in traditional teacher-fronted
lessons. Against this background, ASR-based CALL
systems would seem to make for an interesting
supplement to traditional L2 classes.
However, developing ASR-based CALL systems that can
provide accurate and useful feedback on oral proficiency
is not trivial, because the speech of L2 learners poses
special difficulties to ASR technology (Van Compernolle,
2001; Benzeghiba et al., 2007; Van Doremalen et al., 2009a;
Van Doremalen et al., 2009b). In addition, existing systems
in general fail to provide corrective feedback that is
sufficiently detailed and accurate, especially on L2
pronunciation, which is considered a particularly
challenging skill, both for L2 learners (Flege, 1995) and
for CALL systems (Menzel et al., 2000: 54; Morton and
Jack, 2005).
Another problem that has hampered the realization of
ASR-based CALL systems, especially for the smaller
languages, is that although companies, especially
publishers, are willing to use the technology, many of
them do not have the means to finance the development of
such technology. For these and other reasons, in the
Netherlands and Flanders a programme was started,
called STEVIN (a Dutch acronym that stands for
Essential Language Resources in Dutch), which is funded
by the Flemish and Dutch governments and aims at
stimulating the development of basic language and speech
technology for the Dutch language.
Within the framework of the STEVIN programme a
project called DISCO (Development and Integration of
Speech technology into COurseware for language
learning) was started that aims at developing a prototype
of an ASR-based CALL system for practicing oral skills in
Dutch L2. The system addresses different aspects of
speaking proficiency (syntax, morphology and
phonology), detects errors in speaking performance,
points them out to the learners and gives them the
opportunity to try again until they manage to produce the
correct form.
One of the interesting things about this project is that,
since it is carried out within the STEVIN programme, the
technology that is developed for the present project will
be made publicly available to interested users (researchers,
HLT companies and publishers) through the Dutch HLT
Agency.
In the current paper we discuss the relation between
language resources and CALL systems: language
resources are essential in the development of CALL
applications, during R&D resources are created, and
finally the CALL system itself can be used to generate
additional resources that are useful for research and
development of new (CALL) systems.
In section 2 we describe which language resources were
employed in the DISCO project. In section 3 we present
the DISCO system paying attention to the design, the
automatic speech recognition modules, some preliminary
results and the resources produced within the project. In
section 4 we discuss how additional language resources
can be generated through the DISCO system.
2. CALL applications and the need for
language resources
An important requirement for developing ASR-based
CALL applications is the availability of language
resources such as language and speech corpora and
speech technology toolkits.
In order to develop technology that is able to identify
errors in oral proficiency we need to know which errors
are made by L2 learners in the first place. Part of this
information can be found in the literature, but, in general,
the information provided in the literature is not complete
and not sufficiently quantified to be suitable for
developing CALL applications.
In our previous research on developing a computer
assisted pronunciation training (CAPT) system for Dutch,
Dutch-CAPT (Cucchiarini et al., 2009), we needed to
draw up an inventory of pronunciation errors. We
discovered that the information on L2 errors provided in
the literature was mostly based on observational studies,
was often incomplete, and not quantitative in nature. For
this reason we had no other choice than conducting L2
error studies ourselves (Neri et al., 2006). However, since
a speech corpus of non-native Dutch was not available at
the time, we had to resort to the auditory analysis of Dutch
L2 speech recordings that had been collected in the
framework of previous projects (Neri et al., 2006).
For the DISCO project we had the opportunity of using
the results of another STEVIN project that had been
completed in the meantime, the JASMIN corpus
(Cucchiarini et al., 2008).
2.1 The JASMIN speech corpus
The JASMIN corpus is an extension of the large Spoken
Dutch Corpus (CGN; Oostdijk, 2002). JASMIN contains
speech by children of different age groups, elderly people
and non-natives with different mother tongues. The
JASMIN corpus was collected in the Netherlands and
Flanders and is specifically aimed at facilitating the
development of speech-based applications for children,
non-natives and elderly people. In the case of non-native
speakers the applications envisaged were especially
language learning applications because there is
considerable demand for CALL products that can help
make Dutch L2 teaching more efficient.
In selecting the non-native speakers for this corpus,
mother tongue constituted an important variable. For the
Flemish part, Francophone speakers were selected
because they form a significant proportion of the Dutch
learning population. In the Netherlands, on the other hand,
a miscellaneous group of L2 learners with various mother
tongues was selected because this more realistically
reflects the situation in Dutch L2 classes.
Since an important aim in collecting non-native speech
material was that of developing language learning
applications for education in Dutch L2, various experts
were consulted to determine for which proficiency level
such applications are most needed. It turned out that for
the lowest levels of the Common European Framework
(CEF), namely A1, A2 or B1, there is relatively little
material and that ASR-based applications would be very
welcome. For this reason, speech from adult Dutch L2
learners at these lower proficiency levels was recorded.
The speech collected in the JASMIN corpus was recorded
in two different modalities: about 50% of the material
consists of read speech material while the other 50% is
made up of extemporaneous speech produced in
human-machine dialogues. The JASMIN dialogues were
collected through a Wizard-of-Oz-based platform and
were designed such that the wizard was in control of the
dialogue and could intervene when necessary. In addition,
recognition errors were simulated and difficult questions
were asked to elicit some typical phenomena of
human-machine interaction that are known to be
problematic in the development of spoken dialogue
systems, such as hyperarticulation, restarts, filled pauses,
self talk and repetitions.
The speech recordings were annotated at different levels.
For the DISCO project, the verbatim transcription and the
automatically generated phonemic transcription are
particularly relevant.
For all the reasons mentioned above the JASMIN speech
material turned out to be extremely useful and appropriate
for the development of the DISCO system.
Both read and extemporaneous speech were analyzed to
study which errors are made at the level of pronunciation,
morphology and syntax. For this purpose the annotations
contained in JASMIN were supplemented with extra
annotations of the morphological and syntactical errors
made by the speakers. The automatically generated
phonemic transcriptions were manually verified by
trained students and where necessary improved.
Subsequently they were used to study which
pronunciation errors are made by L2 learners of Dutch
with different mother tongues.
The human-machine dialogues were used for conducting
experiments for the DISCO system because they closely
resemble the situation we will encounter in this CALL
application.
2.2 The SPRAAK speech recognizer
The speech recognizer adopted in the DISCO project is
SPRAAK (Demuynck et al., 2008), a hidden Markov
model (HMM)-based ASR package developed for over 15
years by ESAT at the University of Leuven and later
enriched with knowledge and code from other partners
through the STEVIN project SPRAAK. The availability
of a speech recognition system for Dutch was considered
to be an important requirement by the whole language and
speech technology (LST) community in the Netherlands
and Flanders. For this reason a project was started within
the STEVIN programme for this specific purpose: the
SPRAAK project. The aim of SPRAAK was twofold: a)
developing a highly modular toolkit for research into
speech recognition algorithms and b) providing a
state-of-the-art recogniser for Dutch with a simple
interface that could be used by non-specialists. SPRAAK
is distributed as open source for academic usage and at
moderate cost for commercial exploitation.
3. The DISCO system
3.1 Design of the DISCO system
Within the STEVIN programme a project called DISCO
was started on 01-02-2008, in which a CALL system will
be developed. The target user group for the DISCO
system consists of immigrants who want to learn Dutch as
L2 to be able to work in the Netherlands or Flanders.
The model adopted for designing the system is
Distributed Language Learning (DLL), a methodological
framework for competency-oriented and effective
language education (Colpaert, 2004). Its starting point is
the design of a language learning environment for a
specific language learning situation. The design is based
on a thorough analysis of all factors and actors in the
language learning situation, and on the identification of
aspects amenable to change or improvement. The main
phases of the design are goal-oriented conceptualization
and ontological specification. Goal-oriented
conceptualization stands for the formulation of a solution
based on the realization of ‘practical goals’ as a
hypothetical compromise between (often conflicting)
personal and pedagogical goals, both for teachers and
learners. Ontological specification is a detailed
description of the architecture of the language learning
environment, defined as the network of interactions
between learner, co-learner, teacher, content, native, etc.
inside or outside the learning place.
In DISCO, we limit our general design space to closed
response conversation simulation courseware and
interactive participatory drama (IPD), a genre in which
learners play an active role in a pre-programmed scenario
by interacting with computerized characters or “agents”.
The use of drama is beneficial for various reasons:
1. it “reduces inhibition, increases spontaneity, and
enhances motivation, self-esteem and empathy”
(Hubbard, 2002),
2. it casts language in a social context, and
3. its notion implies a form of planning,
scenario-writing and fixed roles, which is consistent
with the limitations we set for the role of speech
technology in DISCO.
To summarize, this framework allows us to create a rich
and communicative CALL application that stimulates
Dutch L2 learners to produce speech and experience the
social context. On the other hand, these choices are
appropriate from a technological perspective, since they
make it possible to successfully deploy speech technology
while taking into account its limitations (Strik et al.,
2009).
To gain more insight into appropriate feedback strategies,
pedagogical goals, and personal goals a number of
preparatory studies were carried out, such as exploratory
in-depth interviews with Dutch L2 teachers and experts,
focus group discussions to elicit the personal goals of
learners, and pilot studies through partial systems with
limited functionality (e.g. no speech technology). The
functions of the system that were not implemented (play
prompts, give feedback, etc.) were simulated. The results
of these preparatory studies were taken into account in
finalizing the design of the DISCO system.
The learning process starts with a relatively free
conversation simulation, taking well into account what is
(not) possible with speech technology: learners are given
the opportunity to choose from a number of prompts at
every turn (branching, decision tree). Based on the errors
they make in this conversation they will be offered
remedial exercises, which are very specific exercises with
little freedom.
Feedback depends on individual learning preferences: the
default feedback strategy is immediate corrective
feedback, which is visually implemented through
highlighting, and from an interaction perspective by
putting the conversation on hold and focusing on the
mistakes. Learners that wish to have more conversational
freedom can choose to receive communicative recasts as
feedback, which let the conversation go on while
highlighting mistakes for a short period of time.
The final system will have several parameters that can be
changed by the learner or teacher. During development
and implementation, we will try to have these parameters
behave intelligently (based on error analysis and learner
behavior), so that the system can adapt itself to the learner.
For future research these parameters offer the possibility
of studying different modes of behavior of the CALL
system and their effect on language learners.
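The branching conversation simulation of the design (a choice among several prompts at every turn, followed by remedial exercises that depend on the errors made) can be pictured as a small decision tree. The following is only a minimal sketch under our own assumptions: the class `Turn`, the helper functions, the Dutch example lines and the exercise names are hypothetical and are not taken from the DISCO courseware.

```python
from dataclasses import dataclass, field


@dataclass
class Turn:
    """One turn of the simulated conversation (hypothetical structure)."""
    agent_line: str
    # Maps each prompt the learner may choose to the id of the next turn.
    options: dict[str, str] = field(default_factory=dict)


def run_dialogue(turns, start, choices):
    """Follow the learner's chosen prompts through the tree; return visited turn ids."""
    path = [start]
    current = start
    for choice in choices:
        current = turns[current].options[choice]
        path.append(current)
    return path


def remedial_exercises(error_levels, catalogue):
    """Pick remedial exercises for the error levels observed in the conversation."""
    selected = []
    for level in sorted(set(error_levels)):
        selected.extend(catalogue.get(level, []))
    return selected


# Hypothetical content, for illustration only.
TURNS = {
    "greet": Turn("Goedemorgen. Waarmee kan ik u helpen?",
                  {"Ik zoek werk.": "job", "Ik wil een cursus volgen.": "course"}),
    "job": Turn("Wat voor werk zoekt u?"),
    "course": Turn("Welke cursus wilt u volgen?"),
}
CATALOGUE = {
    "syntax": ["word-order drill"],
    "morphology": ["verb-ending drill"],
    "phonology": ["vowel-length drill"],
}
```

For example, `run_dialogue(TURNS, "greet", ["Ik zoek werk."])` visits the turns `greet` and `job`, after which `remedial_exercises` would select drills only for the levels at which errors were detected.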
3.2 The speech recognition modules
First, we provide some technical details about our system.
As mentioned above, the human-machine dialogues were
used for conducting experiments for the DISCO system.
The material used consisted of speech from 45 speakers
who each give answers to 39 questions about a journey.
The input speech, sampled at 16 kHz, is divided into
overlapping 32 ms Hamming windows with a 10 ms shift
and a pre-emphasis factor of 0.95. Twelve Mel-frequency
cepstral coefficients (MFCCs: C1-C12) plus C0 (energy)
and their first and second order derivatives were
calculated, and cepstral mean subtraction (CMS) was
applied. The constrained language models and
pronunciation lexicons are implemented as finite state
machines (FSM).
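To make the front-end figures concrete, the sketch below works out the frame arithmetic (512-sample windows, 160-sample shift, 39 features per frame) and shows pre-emphasis, Hamming windowing and cepstral mean subtraction in plain Python. It is an illustration under our own assumptions, not the SPRAAK implementation: the function names are hypothetical, pre-emphasis is applied here to the whole signal before framing, and the MFCC computation itself is left out.

```python
import math

SAMPLE_RATE = 16000                       # Hz
FRAME_LEN = SAMPLE_RATE * 32 // 1000      # 32 ms window -> 512 samples
FRAME_SHIFT = SAMPLE_RATE * 10 // 1000    # 10 ms shift  -> 160 samples
PRE_EMPHASIS = 0.95
FEATURE_DIM = (12 + 1) * 3                # C1-C12 plus C0, with first and second derivatives


def pre_emphasize(signal, coeff=PRE_EMPHASIS):
    """y[n] = x[n] - coeff * x[n-1]; the first sample is passed through unchanged."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - coeff * signal[n - 1]
                          for n in range(1, len(signal))]


def hamming(length):
    """Hamming window of the given length."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def frames(signal, length=FRAME_LEN, shift=FRAME_SHIFT):
    """Cut the signal into overlapping, Hamming-weighted frames."""
    window = hamming(length)
    return [[s * w for s, w in zip(signal[start:start + length], window)]
            for start in range(0, len(signal) - length + 1, shift)]


def cepstral_mean_subtraction(features):
    """Subtract the per-coefficient mean, computed over all frames of an utterance."""
    count = len(features)
    means = [sum(column) / count for column in zip(*features)]
    return [[value - mean for value, mean in zip(row, means)] for row in features]
```

With these settings one second of speech (16000 samples) yields 97 full frames, each of which would be turned into a 39-dimensional feature vector.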
morphological constructions, the approach used for
detecting phonological errors will be used also for
detecting some of the morphological errors, for instance
those concerning regular verb forms. Irregular verbs, on
the other hand, may require an approach that is more
similar to that adopted for detecting syntactic errors. Once
the system arrives at this final stage, the system has a
detailed overview of all the errors on the different levels
and based on this overview the system can provide
feedback to the student.
3.3 The resources produced in the project
The resources mentioned above are employed to develop
the DISCO system, which consists of various parts. First
of all, there is a blueprint of the design, together with the
speech technology modules for recognition (i.e. for
selecting an utterance from the predicted list, and verifying
the selected utterance) and for error detection (errors in
pronunciation, morphology, and syntax). In addition, the following
resources have been developed: an inventory of errors at
all these three levels, a prototype of the DISCO system
with content, specifications for exercises and feedback
strategies, and a list of predicted correct and incorrect
responses.
In the DISCO system feedback on speaking performance
is given on three levels: syntax, morphology and
phonology. To give feedback, errors on these levels have
to be detected automatically. In our system architecture,
this task is divided in two modules: (1) the speech
recognition module and (2) the error detection module.
The first module, speech recognition, determines the
sequence of words the student uttered. For each prompt a
list of predicted correct and (grammatically) incorrect
responses is created beforehand based on errors that are
expected on empiric grounds. This list is the basis for a
Finite State Grammar (FSG) language model, which is
used by a hidden Markov model (HMM)-based speech
recognition system. The recognition system is forced to
choose among the predicted responses in the list.
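The idea of turning a list of predicted responses into a finite state grammar can be illustrated with a word-level prefix tree. This is a minimal sketch under our own assumptions, not the DISCO or SPRAAK code: the function names and the example responses are hypothetical, and a real recognizer would search such a grammar against the acoustics rather than against a word string.

```python
def build_fsg(responses):
    """Build a word-level prefix-tree acceptor from the predicted responses.

    States are integers and state 0 is the start state. Returns a pair
    (transitions, finals): transitions[state][word] gives the next state,
    and finals maps each accepting state to the response that ends there.
    """
    transitions = {0: {}}
    finals = {}
    for response in responses:
        state = 0
        for word in response.lower().split():
            if word not in transitions[state]:
                new_state = len(transitions)
                transitions[state][word] = new_state
                transitions[new_state] = {}
            state = transitions[state][word]
        finals[state] = response
    return transitions, finals


def accepts(fsg, words):
    """Return the predicted response matching the word sequence, or None."""
    transitions, finals = fsg
    state = 0
    for word in words:
        state = transitions[state].get(word)
        if state is None:
            return None
    return finals.get(state)


# Hypothetical prompt list: one correct and one grammatically incorrect response.
PREDICTED = ["ik ga naar huis", "ik gaat naar huis"]
```

Because every path through the grammar corresponds to one listed response, anything the learner says is mapped onto one of the predicted responses, which is why a separate verification step is needed afterwards.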
The fact that DISCO is being carried out within the
STEVIN programme implies that its results, all the
resources mentioned above, will become available for
research and development through the Dutch Flemish
Human Language Technology (HLT) Agency
(TST-Centrale). This makes it
possible to reuse these resources for conducting research
and for developing specific applications for ASR-based
language learning.
To avoid false accepts, for example when an utterance is
uttered that is not in the list of predicted responses,
utterance verification (UV) is carried out. Using a
combination of acoustic and durational similarity
measures it is determined whether the response chosen by
the speech recognizer reflects what has been said. If it is
rejected the user is asked to try again; if it is accepted, the
system will proceed to error detection (Van Doremalen et
al. 2009a, b).
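The accept/reject decision of utterance verification can be sketched as a threshold on a combined score. The weighted sum, the weights and the threshold below are our own illustrative assumptions; the measures and the way they are combined in DISCO are those described in Van Doremalen et al. (2009b), which this sketch does not reproduce.

```python
def verification_score(acoustic_sim, duration_sim, w_acoustic=0.7, w_duration=0.3):
    """Weighted combination of two similarity measures, each assumed to lie in [0, 1]."""
    return w_acoustic * acoustic_sim + w_duration * duration_sim


def verify_utterance(acoustic_sim, duration_sim, threshold=0.5):
    """Return 'accept' (proceed to error detection) or 'reject' (ask the learner to try again)."""
    score = verification_score(acoustic_sim, duration_sim)
    return "accept" if score >= threshold else "reject"
```

Raising the threshold trades fewer false accepts for more false rejects, that is, more requests to repeat an utterance that was in fact one of the predicted responses.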
3.4 Evaluation
A system that gives meaningful feedback must operate in
a manner that is similar to what a competent teacher
would do. Therefore, for the final evaluation of the whole
system we intend to use a design in which different groups
of students of Dutch as a second language (DL2) at the
University of Antwerp and at the Radboud University in
Nijmegen use the system and fill in a questionnaire with
which we can measure the students’ satisfaction in
working with the system.
Teachers of DL2 will then assess all sets of system prompt,
student response and system feedback for the quality of
the feedback on the level of pronunciation, morphology
and syntax. For this purpose, recordings will be made of
students who complete the exercises developed to test the
DISCO system.
Given the evaluation design sketched above, we consider
the project successful from a scientific point of view if the
DL2 teachers agree that the system behaves in a way that
makes it as useful for the students as a teacher is, and if
the students rate the system positively on its most
important aspects.
Note that once the chosen response is accepted by the
utterance verifier we can already detect errors on the
syntactic level because the system is confident enough
that the student uttered a specific sequence of words and it
also knows what the student was supposed to say.
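Since both the recognized word sequence and the target sequence are known at this point, errors at the word level can be located by aligning the two. The sketch below uses Python's standard `difflib.SequenceMatcher` for the alignment; this is our own choice for illustration and is not claimed to be the method used in DISCO. The function name and the example sentences are hypothetical.

```python
from difflib import SequenceMatcher


def word_level_differences(target, recognized):
    """Align two word sequences and return the non-matching spans.

    Each item is (operation, target_words, recognized_words), where the
    operation is 'replace', 'delete' or 'insert' relative to the target.
    """
    target_words = target.split()
    recognized_words = recognized.split()
    matcher = SequenceMatcher(a=target_words, b=recognized_words, autojunk=False)
    return [(tag, target_words[i1:i2], recognized_words[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]
```

For instance, comparing the target "ik ga morgen naar huis" with the recognized "ik ga naar huis" reports a single deletion of "morgen", while identical sequences yield an empty list.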
Detecting errors on the morphological and phonological
levels requires another, more detailed analysis of the
speech signal. The starting point of this analysis is a
segmentation of the speech signal into a sequence of
phones obtained from the speech recognition module.
Using a variety of spectral and temporal features a
confidence measure (CM) is calculated for each of these
phones. Based on this CM the system decides to mark the
hypothesized phone in the segmentation as correctly
pronounced or incorrectly pronounced (Van Doremalen et
al. 2009c).
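The final decision step, marking each hypothesized phone as correctly or incorrectly pronounced on the basis of its confidence measure, amounts to a comparison with a threshold. The following is a minimal sketch assuming that the CM has already been computed and scaled to [0, 1]; the class `PhoneSegment`, the default threshold and the optional per-phone thresholds are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PhoneSegment:
    """One phone of the segmentation with its confidence measure (hypothetical layout)."""
    phone: str
    start: float        # seconds
    end: float          # seconds
    confidence: float   # assumed to lie in [0, 1]; higher means more likely correct


def flag_mispronounced(segments, threshold=0.5, per_phone_thresholds=None):
    """Return the segments whose confidence falls below the applicable threshold."""
    per_phone_thresholds = per_phone_thresholds or {}
    return [segment for segment in segments
            if segment.confidence < per_phone_thresholds.get(segment.phone, threshold)]
```

Allowing a separate threshold per phone is one simple way to be stricter on the phones that are known to be problematic for a given learner group.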
In the way described above, phonological errors can be
detected. Since some phonemes are critical for certain
4. Generating additional language resources
available for research.
Above we described which resources we used in
developing our CALL system, and which resources
become available during development of the system. In
this section, we describe which additional resources can
be collected by using the CALL system.
5. Conclusions
In this paper we have discussed the importance of
language resources for CALL application development on
the basis of our experiences in the DISCO project in
which speech data and speech technology are employed to
develop a system for practicing oral skills in a second
language. We have seen that language resources are
actually indispensable for developing sound CALL
applications. Once developed, such applications can also
be employed to produce new valuable language resources
which can in turn be used to develop new, improved
CALL systems.
After the CALL system has been developed, language
learners can use it to practice oral skills. The system has
been designed and developed in such a way that it is
possible to log details regarding the interactions with the
users. This logbook can contain, e.g., the following
information: what appeared on the screen, how the user
responded, how long the user waited, what was done
(speak an utterance, move the mouse and click on an item,
use the keyboard, etc.), the feedback provided by the
system, how the user reacted to this feedback (listen to
example (or not), try again, ask for additional, e.g.
meta-linguistic, feedback, etc.).
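A logbook of this kind can be represented as one structured record per interaction. The sketch below is an illustration under our own assumptions: the field names, the JSON-lines serialization and the `audio_file` field for linking a recording to its context are hypothetical and do not describe the actual DISCO log format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class InteractionLogEntry:
    """One logged interaction between learner and system (hypothetical fields)."""
    timestamp: float                      # seconds since the start of the session
    screen_content: str                   # what appeared on the screen
    user_action: str                      # e.g. "speak", "click", "keyboard"
    response_latency_s: float             # how long the user waited before acting
    recognized_response: Optional[str] = None
    feedback_given: Optional[str] = None
    reaction_to_feedback: Optional[str] = None   # e.g. "listen to example", "try again"
    audio_file: Optional[str] = None      # recording of the utterance, if any


def to_json_line(entry):
    """Serialize one log entry as a single JSON line."""
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)
```

Writing one such line per interaction keeps every recorded utterance tied to the prompt, the feedback and the learner's reaction that surrounded it.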
6. Acknowledgements
The DISCO project is carried out within the STEVIN
programme funded by the Dutch and Flemish
governments.
Finally, all the utterances spoken by the users can be
recorded in such a way that it is possible to know exactly
in which context the utterance was spoken, i.e. it can be
related to all the information in the logbook mentioned
above. An ASR-based CALL system, like DISCO, can
thus be used for acquiring additional non-native speech
data, for extending already existing corpora like JASMIN,
or for creating new ones. This could be done within the
framework of already ongoing research without
necessarily having to start corpus collection projects.
7. References
Benzeghiba, M., Mori, R. D., Deroo, O., Dupont, S.,
Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A.,
Ris, C., Rose, R., Tyagi, V., Wellekens, C. (2007).
Automatic speech recognition and speech variability: a
review. Speech Communication, 49, 763–786.
Colpaert, J. (2004). Design of Online Interactive
Language Courseware: Conceptualization,
Specification and Prototyping. Research into the
impact of linguistic-didactic functionality on software
architecture. (Doctoral dissertation). University of
Antwerp.
Cucchiarini, C., Neri, A., and Strik, H. (2009). Oral
proficiency training in Dutch L2: The contribution of
ASR-based corrective feedback. Speech
Communication, 51, 853-863.
Cucchiarini, C., Driesen, J., Van hamme, H., and Sanders,
E., (2008). Recording speech of children, non-natives
and elderly people for HLT applications: the
JASMIN-CGN corpus. In Proceedings of LREC-2008.
Demuynck, K., Roelens, J., van Compernolle, D., and
Wambacq, P. (2008) SPRAAK: an open source SPeech
Recognition and Automatic Annotation Kit. In
Proceedings of ICSLP-2008, p. 495.
Dlaska, A. and Krekeler, C. (2008). Self-assessment of
pronunciation. System, 36, pp. 506-516.
Ellis, N.C., Bogart, P.S.H. (2007). Speech and Language
Technology in Education: the perspective from SLA
research and practice. In Proc. SLaTE, Farmington PA,
pp. 1-8.
Ellis, N. and Larsen-Freeman, D. (2006). Language
emergence: implications for applied linguistics.
Applied Linguistics, 27(4), 558–589.
Eskenazi, M. (2009). An overview of Spoken Language
Technology for Education. Speech Communication.
Such a corpus and the log-files can be useful for various
purposes: for research on language acquisition and second
language learning, studying the effect of various types of
feedback, research on various aspects of man-machine
interaction, and of course for developing new, improved
CALL systems. Such a CALL system will also make it
possible to create research conditions that were hitherto
impossible to create, thus opening up possibilities for new
lines of research.
For instance, at the moment a project is being carried out
at the Radboud University of Nijmegen, which is aimed at
studying the impact of corrective feedback on the
( Within
this project the availability of an ASR-based CALL
system makes it possible to study how corrective
feedback on oral skills is processed on-line, whether it
leads to uptake in the short term and to actual acquisition
in the long term.
This has several advantages compared to other studies
that were necessarily limited to investigating interaction
in the written modality: the learner’s oral production can
be assessed on line, corrective feedback can be provided
immediately under near-optimal conditions, all
interactions between learner and system can be logged so
that data on input, output and feedback are readily
Flege, J. (1995). Second language speech learning: theory,
findings and problems. In W. Strange (Ed.) Speech
perception and linguistic experience, Baltimore: York
Press, pp. 233-272.
Hubbard, P. (2002). Interactive Participatory Dramas for
Language Learning. Simulation and Gaming, vol. 33,
pp. 210-216.
Morton, H., Jack, M. (2005). Scenario-Based Spoken
Interaction with Virtual Agents. Computer Assisted
Language Learning, 18, 171-191.
Oostdijk, N. (2002). The design of the spoken dutch
corpus. In New Frontiers of Corpus Research, P. Peters,
P. Collins, and A. Smith, Eds. Rodopi, pp. 105–112.
H. Strik, Cornillie, F., van Doremalen, J., Cucchiarini, C.
(2009). Developing a CALL System for Practicing Oral
Proficiency: How to Match Design and Speech
Technology. In Proc. SLATE, Wroxall Abbey.
Van Compernolle, D. (2001). Recognizing speech of
goats, wolves, sheep and ... non-natives. Speech
Communication, 35, 71-79.
Van Doremalen, J., Cucchiarini, C., Strik, H. (2009a).
Optimizing automatic speech recognition for
low-proficient non-native speakers. Accepted for
publication in EURASIP Journal on Audio, Speech, and
Music Processing, to appear.
Van Doremalen, J., Strik, H., Cucchiarini, C. (2009b).
Utterance Verification in Language Learning
Applications. In Proc. SLATE, Wroxall Abbey.
Van Doremalen, J., Cucchiarini, C., Strik, H. (2009c).
Automatic Detection of Vowel Pronunciation Errors
Using Multiple Information Sources. Proceedings of
the biannual IEEE workshop on Automatic Speech
Recognition and Understanding (ASRU).
Challenges for Discontiguous Phrase Extraction
Dale Gerdemann, Gaston Burek
Dept. Linguistics
University of Tübingen
[email protected], [email protected]
Suggestions are made as to how phrase extraction algorithms should be adapted to handle gapped phrases. Such variable phrases are
useful for many purposes, including the characterization of learner texts. The basic problem is that there is a combinatorial explosion of
such phrases. Any reasonable program must start by putting the exponentially many phrases into equivalence classes (Yamamoto and
Church, 2001). This paper discusses the proper characterization of gappy phrases and sketches a suffix-array algorithm for discovering
these phrases.
1. Introduction
Writing is an essential part of learning, and evaluating written texts is an essential part of teaching. A good teacher
must attempt to understand the ideas presented in a learner
text and evaluate whether or not these ideas make sense.
Such evaluation can obviously not be performed by a computer. But on the other hand, computers are good at evaluating other aspects of texts. Computers are, for example,
very good at picking out patterns of linguistic usage, in particular terms and phrases1 that are used repeatedly. It is often the case that choice of terminology can be surprisingly
effective in characterizing texts. For example, the terms
“Latent Semantic Analysis” and “Latent Semantic Indexing” mean essentially the same thing, but the former is more
characteristic of the educational and psychological communities whereas the latter is more characteristic of the information retrieval community. In a similar vein, Biber (2009)
uses characteristic phrases to distinguish between written
and spoken English. Up to now, in the eLearning community, bag-of-words based approaches have been most popular for evaluating student essays (Landauer and Dumais,
1997). It is the contention of this paper that the next step
of considering phrases will not be possible until eLearning
practitioners immerse themselves into the somewhat technical combinatorial pattern matching literature.
This paper is concerned with extracting phrases with gaps.
This is an important topic since many phrases occur in alternative forms. For example, the English phrase one and
the same has an essentially verbatim counterpart in Bulgarian, but the Bulgarian phrase occurs in a variety of forms
depending on gender and number of the following noun.
The following forms were extracted from a few Bulgarian
. In this simple Bulgarian phrase, there are three
different alternations. First
(’one’) occurs with inflections −∅, - , - and - . Second,
- (’same’) occurs with
inflections - , - , and - . And third,
also contains the
“fleeting” or “ghost” vowel , which alternates with ∅.2 If
we consider this Bulgarian expression as a sequence of letters, then the inflection on
is in the middle, whereas
the inflection on
- is on the right periphery. Both of
these instances of variation are problematic. The variation
in the middle, however, is somewhat more problematic, and
is the main focus of this paper.
Most phrase extraction programs are based on pattern
matching algorithms developed for computational molecular biology. To adapt such algorithms for natural language, with worst case examples such as the Bulgarian
phrase above, will require a great deal of thought. In particular, cooperation between language researchers and computer scientists is required. Too often language researchers
use off-the-shelf software packages, and apply no particular
programming skills at all.3 Hence, the goal of the present
paper is not to present a new algorithm for gapped phrase
extraction, but rather to present some features of what such
a phrase extraction program ought to provide. Some technical literature is presented, but the intended readership of
this paper is non-technical.
1.1. Algorithmic Introduction
Efficient algorithms for phrase (or n-gram) extraction were
introduced into the computational linguistics literature by
Yamamoto and Church (2001) and have subsequently been
used for a wide variety of applications such as lexicography,
phrase-based machine translation and bag-of-phrases based
text categorization (Burek and Gerdemann, 2009).4 Ultimately, the goal of such algorithms is to discover repetitive
structure as represented by frequently recurring sequences
of symbols. Unfortunately, the approach of Yamamoto and
Church often misses repetitive structure since phrases often
occur with slight variations. For example, the middle term
of a phrase might occur in different morphological variants:
all join in vs all joined in; or the middle term may vary in
other ways: give me a vs give him a.
We use the term “phrase” to mean a repeated sequence of tokens. This is quite flexible, allowing any kind of tokenizer and
phrases of any non-negative length.
Ghost vowels are a characteristic of Bulgarian and Slavic languages in general (Jetchev, 1997). The vowel (IPA: /i/) is, however, idiosyncratic as a ghost vowel.
For language researchers wishing to acquire some programming skills, there is probably no better starting point than
Sedgewick and Wayne (2010 forthcoming).
Similar algorithms are also used by Dickinson and Meurers
(2005) for detecting inconsistencies in annotated corpora. This is
particularly relevant, since they are specifically interested in discontinuous (or gapped) annotations.
advantage of avoiding some difficult problems such as compound nouns in German and word segmentation in Chinese
(Zhang and Lee, 2006). In this paper, we assume that some
tokenization (and also possibly normalization) is performed
on the corpus, and that tokens are replaced by integers.
Recently, an algorithm for finding such paired repeats was
presented by Apostolico and Satta (2009). This algorithm
is quite efficient, as it is shown to run in linear time with
respect to the output size. Unfortunately, however, the algorithm is designed to extract “tandem repeats,” which are
defined in a way that may not be entirely appropriate for
the researcher interested in extracting gapped phrasal expressions. The goal of this paper is, then, to specify the requirements of such researchers. The hope is that this paper
will provide a challenge for algorithm designers who may
either want to adapt the Apostolico and Satta algorithm or
design a new competing algorithm.
One difference between the Yamamoto-Church algorithm
and the Apostolico-Satta algorithm is that the former is based
on suffix arrays, whereas the latter is based on suffix trees.
This should, however, not be seen as a major distinction,
since recent developments with suffix arrays have tended
to blur the distinction (Abouelhoda et al., 2004; Kim et al.,
2008).5 To some extent, one may think of suffix arrays simply as a data structure for implementing suffix trees. Further implementation issues will be discussed below.
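To make the data structure concrete, here is a minimal Python sketch of a naively built suffix array with its longest-common-prefix (LCP) table, used to list repeated contiguous phrases. It is only an illustration under simplifying assumptions: the function names are hypothetical, the construction sorts whole suffixes and is far slower than the algorithms cited above, and it is not the Yamamoto-Church method itself.

```python
def suffix_array(tokens):
    """Start positions of all suffixes, sorted lexicographically (naive construction)."""
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def lcp_array(tokens, sa):
    """lcp[k] is the length of the common prefix of the suffixes at sa[k-1] and sa[k]."""
    lcp = [0] * len(sa)
    for k in range(1, len(sa)):
        a, b = sa[k - 1], sa[k]
        n = 0
        while a + n < len(tokens) and b + n < len(tokens) and tokens[a + n] == tokens[b + n]:
            n += 1
        lcp[k] = n
    return lcp


def repeated_phrases(tokens, min_len=2):
    """Every contiguous phrase of at least min_len tokens that occurs more than once."""
    sa = suffix_array(tokens)
    lcp = lcp_array(tokens, sa)
    found = set()
    for k in range(1, len(sa)):
        for n in range(min_len, lcp[k] + 1):
            found.add(tuple(tokens[sa[k]:sa[k] + n]))
    return found


words = "give me a break and give him a break".split()
print(sorted(repeated_phrases(words)))  # [('a', 'break')]
```

On this toy input only the contiguous repeat is reported. The variation between give me a and give him a is not captured, which is the limitation of contiguous extraction discussed above.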
We now present a rather incomplete list of desirable features for gapped phrase extraction.
3.1. Main Parameters
By default an extracted gapped phrase αmβ should have
|α| ≥ 1, |β| ≥ 1 and m = [a1 | . . . | an ] where n ≥ 2.
These are minimal values, and may be set to larger values
to extract possibly more interesting phrases. If the length
of α or β is set to 0, then the gap will be on the periphery.
The length of α may also be seen as an efficiency consideration. The Apostolico and Satta algorithm, for example, first picks out candidate left parts, and
then, for each of these, makes a recursive call to find a
corresponding right part.7 Putting a length restriction on α
means that there are fewer candidates, and therefore fewer
recursive calls. Clearly, an alternative approach would be
to start with the right piece and recursively search for corresponding left pieces.
2. Some Terminology
3.2. Conditions on the Gap
A language researcher studying gapped phrases may find a
gap of length 4 interesting (from one end of the Earth to
the other) but a gap of length 7 uninteresting (Medical bills
from one puppy catching something and passing it on to
the other puppy). With character-based tokenization, however, a gap of length 6 or more may well be interesting:
and half − [believ|form|melt|slouch]ed.8
In addition to specifying the maximum length of the gap, it
may be desirable to be able to specify a minimum length.
An alternation like b[|o]ut for ’bout’ and ’but’ seems particularly perverse, though perhaps there are other ways to
filter out such uninteresting cases. Biber (2009) limits the
gap to be of length exactly one. But this seems to merely
reflect the limitations of a particular software package since
in the context from one X to the other, there is very little
difference between the single word ’extreme’ and the four
word phrase ’end of the Earth’. It may also be possible for
the gap to have negative length, effectively meaning that the
left and right parts overlap. This is allowed, for example,
in the Apostolico-Satta algorithm, though it is unclear what
advantages this “feature” has for natural language texts.9
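The minimum and maximum gap conditions can be illustrated with a small Python sketch over word tokens. The function name `find_gapped` and its parameters are hypothetical, and the brute-force scan merely stands in for a real extraction algorithm.

```python
def find_gapped(tokens, left, right, min_gap=1, max_gap=4):
    """Return the gap fillers found between `left` and `right` in `tokens`.

    Only gaps whose length lies between min_gap and max_gap (inclusive) are accepted.
    """
    fills = []
    for i in range(len(tokens) - len(left) + 1):
        if tokens[i:i + len(left)] != left:
            continue
        for gap in range(min_gap, max_gap + 1):
            j = i + len(left) + gap  # candidate start of the right part
            if tokens[j:j + len(right)] == right:
                fills.append(tokens[i + len(left):j])
                break  # keep only the nearest right part for this left part
    return fills


text = ("from one end of the Earth to the other and medical bills from one puppy "
        "catching something and passing it on to the other puppy").split()
left, right = ["from", "one"], ["to", "the", "other"]

print(find_gapped(text, left, right, max_gap=4))       # only the 4-word gap
print(len(find_gapped(text, left, right, max_gap=7)))  # the 7-word gap is now included as well
```

Setting `min_gap=1` excludes alternations with an empty alternative, and a word-based `max_gap` of 4 keeps end of the Earth while dropping the seven-word puppy example.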
To start with, let us consider a typical gapped expression:
from one X to the other.6 The goal of gapped phrase extraction is to discover gapped expressions such as this. Once
such a pattern is discovered, a researcher can easily find
further instances of the pattern by searching with regular
expressions in other corpora. Initially however, the phrase
extraction may discover just a couple of instantiations for
X, which may be expressed as a simple regular expression
using only alternation: from one [shore|edge] to the other.
In referring to patterns such as this, we will use α to refer
to the left part from one and β to refer to the right part
to the other. It will generally be assumed that the left and
right parts are non-empty. For the alternation in the middle,
we will use the letter m. It will generally be assumed that
the middle consists of at least two alternatives.
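As an illustration of this notation, the following Python sketch turns a left part α, a set of middle alternatives m and a right part β into a regular expression, and also builds a more general pattern for searching other corpora. The helper names and the word-based gap limit are assumptions made for this example only.

```python
import re


def gapped_regex(left, alternatives, right):
    """Regex for left [a1|...|an] right, using only alternation in the middle."""
    alts = "|".join(re.escape(a) for a in alternatives)
    return re.compile(rf"{re.escape(left)} (?:{alts}) {re.escape(right)}")


def generalised_regex(left, right, max_gap_words=4):
    """Regex that accepts any filler of 1..max_gap_words words between left and right."""
    gap = rf"((?:\w+ ){{1,{max_gap_words}}})"
    return re.compile(rf"{re.escape(left)} {gap}{re.escape(right)}")


specific = gapped_regex("from one", ["shore", "edge"], "to the other")
general = generalised_regex("from one", "to the other")

print(bool(specific.search("they sailed from one shore to the other")))
match = general.search("from one end of the Earth to the other")
print(match.group(1).split())
```

The first pattern records exactly the instantiations found so far, while the second is the kind of expression a researcher might use to look for further fillers of X.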
As usual, we will use letters from the beginning of the alphabet a, b, c to represent single symbols, and letters from
the end of the alphabet w, x, y to represent sequences. The
reader should keep in mind, however, that what counts as a
symbol depends on the tokenization. The two obvious approaches are character-based and word-based tokenization,
with the latter in particular requiring algorithms adapted to
a large alphabet. In some sense, word-based tokenization is
more natural, though the character-based approach has the
Kim et al. (2008) is of particular interest for NLP, since their
approach is optimized for a large alphabet, as opposed to most
of the bioinformatics literature which uses a four-letter alphabet.
With a large alphabet, it becomes possible to tokenize a text by
words, and treat each word as a “letter.”
Perhaps eLearning practitioners who are interested in ontologies will find this example interesting. There is clearly a class
of “polarized entities” that can serve as good instantiations for X.
Paired, but non-polarized entities like sock and shoe are not very
felicitous. Is there a WordNet synset for this?
We’re simplifying quite a bit here. The “recursive call” is, in
fact, rather different from the original call.
This pattern is found in Moby Dick. A language researcher
might be interested in such an example since it seems to pick out
a semantic class of actions that occur or can be performed in a
partial manner.
In fact, the Apostolico-Satta algorithm has a parameter d
not for the length of the gap, but rather for the maximum distance
between the beginning of the left part and the beginning of the
right part. If d < |α|, then there could be overlap. This, however,
does not seem to be a serious limitation, since it would be easy
enough to adapt the Apostolico-Satta algorithm to let d be some
function of |α|.
patterns as Ahab r . . . ed and Ahab re . . . ed. That is, think
of the middle part as not really part of the pattern, but rather
as providing information about occurrences of the pattern.
In this sense, Ahab re . . . ed appears to be more specific,
since the occurrence with rushed is lost. But there is a problem here. Recall that the . . . matches sequences no longer
than length d. If we set d to be 4, then the supposedly less
specific pattern will not match Ahab remained, and the supposedly more specific pattern will match this occurrence.
This suggests that the Apostolico-Satta approach of letting
d be the distance from the beginning of the left piece to
the beginning of the right piece may be preferable. On the
other hand, their approach allows the left and right parts to overlap.
More sophisticated possibilities also exist. For example,
one could specify the gap length conditions as a function of the lengths of the left and right pieces. Or perhaps
a function of the contents of the left and right parts and the
gap could be used. Another possibility would be to measure the gap length as number of syllables or number of
some other kind of linguistic unit. Probably, it would not
be possible to incorporate such conditions directly into the
extraction algorithm. Most likely, a secondary filter would
be the required approach.
3.3. Principle of Maximal Extension
A fundamental notion in the pattern recognition literature
is that of saturation, which Apostolico (2009) defines as
  . . . a pattern is saturated relative to its subject
  text, if it cannot be made more specific without
  losing some of its occurrences.
3.4. No Overlap
The Apostolico-Satta algorithm is designed to find tandem
occurrences of two strings, which they explain as follows:
  By the two strings occurring in tandem, we mean
  that there is no intermediate occurrence of either
  one in between.
This is stated in a rather imprecise way, but the intention
should be clear. Suppose that the pattern mumbo has occurrences at (i, i), (j, j) and (k, k). Suppose further that the
pattern is extended (made more specific) to mumbo jumbo
and that occurrences are now found at (i, i + 1), (j, j + 1)
and (k, k + 1). Then the 3 old occurrences should not
be seen as lost, but rather as replaced by 3 corresponding longer occurrences. So the pattern for the incomplete
phrase mumbo is unsaturated.
Suffix trees and suffix arrays are a kind of asymmetrical
data structure that make extensions to the right easier to
find than extensions to the left. So given mumbo, it is easy
to extend this to the right, but given jumbo, it is much harder
to extend this to the left. For left extensions, Abouelhoda
et al. (2004) advocate the use of a Burrows and Wheeler
transformation table.
For gapped phrases, the issue of extension to the left and
right becomes even more complex. Given a pattern α[ax1 |
· · · | axn ]β, it seems reasonable to extract the a, turning the
pattern into αa[x1 | · · · | xn ]β, capturing the generalization
that the middle part always starts with a.
If the left and right parts are both extended, then one can
find patterns like Ahab r[each|emain|etir|ush]ed (from
Moby Dick), where extension of the right part represents the
linguistically interesting fact that all the verbs are in the past
tense. The extension of the left part, on the other hand, captures the rather uninteresting fact that all the verbs happen
to start with r. If the left part is now further extended, then
the pattern becomes more specific, and loses some of its
occurrences: Ahab re[ach|main|tir]ed. It is unclear how
a gapped phrase extraction program should be designed to
rule out such uninteresting extensions.10
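The extension step itself is easy to state in code. The Python sketch below moves material shared by all middle alternatives into the left and right parts. The function names are hypothetical, and the sketch deliberately does not try to decide whether an extension is linguistically interesting, which is the open problem raised above.

```python
def common_prefix(seqs):
    """Longest prefix shared by every sequence in seqs."""
    first = seqs[0]
    n = 0
    while n < len(first) and all(len(s) > n and s[n] == first[n] for s in seqs):
        n += 1
    return first[:n]


def maximally_extend(left, middles, right):
    """Move what all middle alternatives share at their edges into the left and right parts."""
    pre = common_prefix(middles)
    middles = [m[len(pre):] for m in middles]
    suf = common_prefix([m[::-1] for m in middles])[::-1]
    if suf:
        middles = [m[:len(m) - len(suf)] for m in middles]
    return left + pre, middles, suf + right


pattern = maximally_extend("Ahab ", ["reached", "remained", "retired", "rushed"], "")
print(pattern)  # ('Ahab r', ['each', 'emain', 'etir', 'ush'], 'ed')
```

With character-based tokenization this reproduces the Moby Dick pattern: the shared ed moves to the right part and the shared r to the left part.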
It is interesting to think about the example in the previous
paragraph in terms of saturation. Suppose we think of the
To illustrate the problem of intermediate occurrences, consider the following truncated version of Moby Dick (tokenized by character):
the boat. the white whale
The sequence the occurs twice, so this is a candidate left
part. The sequence wh occurs twice, both times with the
to the left (supposing d = 6, for example). So without
taking care, one might extract the nonsense pattern the [|
white] wh.
The Apostolico-Satta algorithm is designed from the beginning to rule out such overlaps. But the basic algorithm
presented in Section 4 has a problem with these. An extra
step would be required just to filter out such overlaps.
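Such an extra filtering step might look like the following Python sketch, which discards a left/right pairing when another occurrence of either piece starts strictly in between. The function names, the `max_gap` parameter (a gap length rather than the parameter d of Apostolico and Satta) and the short test string are assumptions chosen for illustration.

```python
def occurrences(text, piece):
    """All start positions of `piece` in `text`."""
    return [i for i in range(len(text) - len(piece) + 1) if text.startswith(piece, i)]


def tandem_pairs(text, left, right, max_gap):
    """Pairs (left start, right start) with no intermediate occurrence of either piece."""
    lefts, rights = occurrences(text, left), occurrences(text, right)
    pairs = []
    for i in lefts:
        start = i + len(left)  # first position after the left part
        for j in rights:
            if not (start <= j <= start + max_gap):
                continue
            # Reject the pair if another left or right piece starts strictly in between.
            if any(i < k < j for k in lefts) or any(i < k < j for k in rights):
                continue
            pairs.append((i, j))
    return pairs


sample = "the whale. the white whale"  # made-up string, tokenized by character
print(tandem_pairs(sample, "the", "wh", max_gap=7))  # [(0, 4), (11, 15)]
```

Without the filter, the second the would also be paired with the wh of the final whale, skipping over the wh of white.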
3.5. Boundaries
A common feature in the study of (gapped) phrases is that
they are allowed to cross many, but not all, kinds of boundaries. For example, the “lexical bundles” studied by
Biber (2009), more often than not, cross the category boundaries of traditional linguistics. Typical examples are: as a result of and it is possible to. With tokenizing
by letter, one often finds partial words (example from Moby
Dick): contrast [between|in|of|to] th. Here the partial
word th seems to play an important role in English.
Still there are some boundaries that should not be crossed.
Dickinson and Meurers (2005), for example, note that the
patterns that they were looking for should not cross sentence boundaries. There is therefore a temptation to put
such boundary constraints into the phrase extraction program. We believe, however, that this is a mistake. The
phrase extraction program is already complicated enough
without having to deal with such special cases.
In this case there seems to be a fairly simple-minded alternative. Simply use a tokenizer that replaces each boundary
punctuation character (period, question mark, etc.) with a
unique integer identifier. This requires a bit of bookkeeping to remember which integers have been used to represent
which punctuation characters, but it is still much easier than
modifying the suffix arrays or trees. A similar approach
is described in Section 4 to avoid extraction of “phrases”
which start near the end of one text in the corpus, and conclude near the beginning of the next text.
On a personal note, it is examples like this that inspired us to
write this paper. We had started off by implementing an algorithm
similar to that of Apostolico and Satta (2009), and after encountering problematic cases like this, decided to put the algorithm
aside for a while, and to concentrate on writing a specification of
desirable features for any gapped phrase extraction program.
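The boundary-marking tokenizer suggested in Section 3.5 can be sketched in a few lines of Python. The function name, the choice of boundary characters and the regular expression used for splitting are all assumptions of this example. The point is only that every boundary mark receives a fresh integer, so no repeated phrase can contain one.

```python
import re


def tokenize_with_boundaries(text, boundary_chars=".?!"):
    """Map words to shared integer ids and every boundary mark to a fresh, unique id."""
    boundary = set(boundary_chars)
    pieces = re.findall(r"\w+|[^\w\s]", text.lower())
    vocab = {}  # word -> integer id shared by all occurrences of that word
    for p in pieces:
        if p not in boundary:
            vocab.setdefault(p, len(vocab))
    ids, boundaries = [], {}
    next_id = len(vocab)  # boundary ids start above the vocabulary ids
    for p in pieces:
        if p in boundary:
            boundaries[next_id] = p  # bookkeeping: which id stands for which mark
            ids.append(next_id)
            next_id += 1
        else:
            ids.append(vocab[p])
    return ids, vocab, boundaries


ids, vocab, boundaries = tokenize_with_boundaries("It is possible. Is it possible?")
print(ids)         # [0, 1, 2, 3, 1, 0, 2, 4]
print(boundaries)  # {3: '.', 4: '?'}
```

Because the two boundary ids differ, a sequence spanning a sentence boundary can never occur twice, so the extraction program needs no special boundary logic.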
4. Algorithmic Specifications
In this section, we sketch a rather basic algorithm which
may serve as the basis for something more useful.13 The
idea is quite simple. Given a phrase extraction algorithm for
non-gapped phrases, candidate left parts can be extracted.
To reduce the search space, these candidate left parts may
be required to be maximally extended or “interesting” in
various ways. For a given phrase p, find all occurrences of
p in the corpus, and denote each such occurrence as (i, j),
where i and j are the indices of the first and last tokens
of the occurrence in the corpus. For each such occurrence,
specify the right context as (j + 1, j + d + 1), where d is
the maximal length allowed for the gap. Clearly, these right
contexts can be found efficiently using either suffix trees or
suffix arrays. Now form a new corpus by treating each of
these right contexts as a single text in this subcorpus. Following the idea of Yamamoto and Church (2001), the texts
in this subcorpus should be concatenated, using sentinels to
separate one text from the next, and also with one sentinel
at the end. Assuming that the text is represented by integer id’s, then the smallest otherwise unused integers can be
used for the sentinels.
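The construction of the subcorpus can be sketched as follows in Python. The names `encode` and `right_context_subcorpus` are hypothetical, the occurrences of the left part are found by a brute-force scan rather than with a suffix array, and the integer ids are assumed to be dense so that the first sentinel is simply one more than the largest id.

```python
def encode(words):
    """Replace tokens by integers, giving each distinct token the next free id."""
    vocab = {}
    return [vocab.setdefault(w, len(vocab)) for w in words], vocab


def right_context_subcorpus(corpus, left, d):
    """Concatenate the right contexts (j+1, j+d+1) of every occurrence of `left`."""
    sub = []
    sentinel = max(corpus) + 1  # smallest integer not used by any token
    n = len(left)
    for i in range(len(corpus) - n + 1):
        if corpus[i:i + n] == left:
            j = i + n - 1                        # last token of this occurrence
            sub.extend(corpus[j + 1:j + d + 2])  # d + 1 tokens of right context
            sub.append(sentinel)                 # a different sentinel after each context
            sentinel += 1
    return sub


words = ("from one end of the Earth to the other . from one extreme to the other foo "
         "from one shore to the other bar").split()
ids, vocab = encode(words)
sub = right_context_subcorpus(ids, [vocab["from"], vocab["one"]], d=4)

inverse = {i: w for w, i in vocab.items()}
print(" ".join(inverse.get(t, "$") for t in sub))
# end of the Earth to $ extreme to the other foo $ shore to the other bar $
```

The printed form shows every sentinel as $, although each one is a distinct integer in `sub`.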
Assuming that a subcorpus is built up in this way, then finding right parts corresponding to each left part is mostly just
a matter of running the phrase extraction program again for
each subcorpus. There are, however, a couple of issues to
watch out for. First, it is important that a different integer is used for each sentinel. Otherwise the sentinels themselves, including possibly context around the sentinels, will
be seen as repeated phrases.
Second, there is a problem with limiting the right context
to be of length d + 1. If the gap is of length d, then the
right context is just long enough to include one token from
the right part. Consider, for example, the following subcorpus for the left part from one with d = 4: end of the
Earth to $ extreme to the other foo $ shore to the other
bar $.14 From this subcorpus, one would find the patterns:
from one [end of the Earth | extreme | shore] to and
from one [extreme|shore] to the other. It is clear that
the first of these patterns has been artificially truncated.
This problem is solvable, but it takes a bit of bookkeeping. The idea here is that when a subcorpus is formed,
for each token in the subcorpus, a record is kept of where
that token was located in the original (parent) corpus.15
With this record, the end locations of each occurrence of
from one [end of the Earth | extreme | shore] to can
be found in the parent corpus. The longest common prefix can then be found for the set of sequences starting at
these end locations, and this can be used to extend the truncated right part. There is still a problem, however, since
if from one [end of the Earth|extreme|shore] to is extended to from one [extreme|shore] to the other, then two
instances of this latter pattern will be found. So an efficient
way of avoiding such duplications must be found.
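The extension of a truncated right part can be illustrated with a short Python sketch. It assumes that the end locations of the truncated pattern in the parent corpus have already been recovered from the per-token record. The function name `extend_right` and the toy corpus are hypothetical, and the duplication problem mentioned above is not addressed.

```python
def extend_right(corpus, end_positions):
    """Longest token sequence that follows every given end position in the parent corpus."""
    starts = [e + 1 for e in end_positions]
    extension = []
    k = 0
    while all(s + k < len(corpus) for s in starts):
        following = {corpus[s + k] for s in starts}
        if len(following) != 1:  # the continuations diverge here
            break
        extension.append(following.pop())
        k += 1
    return extension


parent = ("from one end of the Earth to the other . from one extreme to the other foo "
          "from one shore to the other bar").split()
# Positions in the parent corpus of the truncated right part "to" (assumed known).
ends = [6, 13, 20]
print(extend_right(parent, ends))  # ['the', 'other']
```

Here the truncated right part to is extended to to the other, since all three occurrences continue with the same two tokens before diverging.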
Interesting Phrases
To be useful, a phrase extraction program must be equipped
with a notion of what kinds of phrases are interesting. Citing Apostolico (2009):
  Irrespective of the particular model or representation chosen, the tenet of pattern discovery equates
  overrepresentation with surprise, and hence with interest.
In linguistics, there are other ways of defining interest. For
example, a phrase may be considered interesting if it exhibits some degree of non-compositional semantics, or if it
exhibits some particular syntactic pattern. For an overview,
see Evert (2009).
Another way of measuring interest is more goal directed.
One might say, for example, that a phrase is interesting
if it is useful for distinguishing positive camera reviews
from negative ones (Tchalakova, 2010). Or alternatively,
a phrase could be considered interesting if it is helpful for
distinguishing high quality online posts from low quality
ones (Burek and Gerdemann, 2009).
A central insight of Yamamoto and Church (2001) is that
measures of interest are most commonly based upon basic
measures of term frequency and document frequency, and
that these measures need only be calculated for the saturated phrases.1112 So, for example, the term frequency and
document frequency for mumbo is exactly the same as for
mumbo jumbo, so this information can be stored just once at
the appropriate node in a suffix tree or for an lcp-interval in
a suffix array. The problem is, of course, that jumbo really
ought to be included in this class as well, and neither suffix
trees nor suffix arrays provide a natural way of representing
such equivalence classes.
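The point about equivalence classes can be checked with a small Python sketch that counts term frequency and document frequency by brute force. The function name `tf_df` and the three toy documents are invented for this illustration, and the counting does not use suffix arrays.

```python
def tf_df(documents, phrase):
    """Term frequency and document frequency of `phrase` (a list of tokens)."""
    n = len(phrase)
    tf = df = 0
    for doc in documents:
        hits = sum(1 for i in range(len(doc) - n + 1) if doc[i:i + n] == phrase)
        tf += hits
        df += 1 if hits else 0
    return tf, df


docs = [
    "what mumbo jumbo is this".split(),
    "more mumbo jumbo and yet more mumbo jumbo".split(),
    "no nonsense here".split(),
]
for phrase in (["mumbo"], ["mumbo", "jumbo"], ["jumbo"]):
    print(phrase, tf_df(docs, phrase))  # each phrase gives (3, 2)
```

All three phrases share the same statistics, so they belong in one class. A structure that only extends to the right would group mumbo with mumbo jumbo but would treat jumbo separately.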
A key question to answer is how the interest measure should
be incorporated into the gapped phrase extraction algorithm. The simplest approach would be to extract phrases
initially without regard to interest, and then use the interest
measure as a filter to remove uninteresting cases. Another
approach would be to incorporate the interest measure into
the algorithm, perhaps by restricting candidate left parts to
just the interesting cases before looking for matching right
contexts. We leave this as an open question.
This was at least the basic intuition. In fact, the Yamamoto-Church algorithm did not maximally extend phrases to the left
since they did not use the Burrows and Wheeler transformation
table as advocated by Abouelhoda et al. (2004).
Aires et al. (2008) presents a rather more complicated formula, in which the interest of a phrase is a function of both the
term frequency of its subphrases and the superphrases containing
the phrase as a subphrase. This is algorithmically more complex,
but may be an improvement.
An alternative is presented in Gerdemann (2010).
The tokens foo and bar are arbitrary. All sentinels are printed
as $ even though different integers are used.
Such record keeping is required in any case if document frequencies are required for the phrases.
Alberto Apostolico. 2009. Monotony and Surprise: Pattern
Discovery under Saturation Constraints. In Anne Condon, David Harel, Joost N. Kok, Arto Salomaa, and Erik
Winfree, editors, Algorithmic Bioprocesses, pages 15–
29. Springer.
Douglas Biber. 2009. A corpus-driven approach to formulaic language in English: Multi-word patterns in speech
and writing. International Journal of Corpus Linguistics, 14(3):275–311.
Gaston Burek and Dale Gerdemann. 2009. Maximal
phrases based analysis for prototyping online discussion
forums postings. In Proceedings of the RANLP workshop on Adaptation of Language Resources and Technology to New Domains (AdaptLRTtoND), Borovets, Bulgaria.
Markus Dickinson and W. Detmar Meurers. 2005. Detecting errors in discontinuous structural annotation. In Proceedings of the 43rd Annual Meeting of the Association
for Computational Linguistics (ACL-05), Ann Arbor, MI,
Stefan Evert. 2009. Corpora and collocations. In A. Lüdeling and M. Kytö, editors, Corpus Linguistics: An International Handbook of the Science of Language and Society, volume 2, chapter 58, pages 1212–1248. Mouton
de Gruyter, Berlin/New York.
Dale Gerdemann. 2010. Suffix and prefix arrays for gappy
phrase discovery. Presented at: First Tübingen Workshop
on Machine Learning; Slides at: dg/ks.pdf.
Georgi Jetchev. 1997. Ghost Vowels and Syllabification:
Evidence from Bulgarian and French. Ph.D. thesis,
Scuola Normale Superiore di Pisa.
Dong Kyue Kim, Minhwan Kim, and Heejin Park. 2008.
Linearized suffix tree: an efficient index data structure
with the capabilities of suffix trees and suffix arrays. Algorithmica, 52(3):350–377.
Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato’s problem: The latent semantic analysis
theory of acquisition, induction, and representation of
knowledge. Psychological Review, 104(2):211–240.
Robert Sedgewick and Kevin Wayne. 2010 (forthcoming). Algorithms. Addison-Wesley, 4th edition.
Web page: (see in
particular: and
Maria Tchalakova. 2010. Automatic sentiment classification of product reviews. Master’s thesis, Universität
Tübingen, Germany.
Mikio Yamamoto and Kenneth W. Church. 2001. Using
suffix arrays to compute term frequency and document
frequency for all substrings in a corpus. Comput. Linguist., 27(1):1–30.
D. Zhang and W.S. Lee. 2006. Extracting key-substring-group features for text classification. In Proceedings
of the 12th ACM SIGKDD international conference on
Knowledge discovery and data mining, pages 474–483.
Another problem also involves maximal extension. Suppose that the saturated pattern α is chosen as the left part.
Since it is saturated, it cannot be extended to aα or αb without losing some of its occurrences. Now suppose that β is
chosen as a corresponding right part, so that the gapped pattern is α . . . β. Now it may be that α by itself is saturated,
but nevertheless in this context extensions could be made
to aα . . . β or αb . . . β without losing any occurrences. Extending the pattern to αb . . . β is problematic, since it encroaches upon the
length of the gap (represented by . . .). So rather than extending the left part, it is preferable to filter out cases such
as α . . . β where the left part is extendable. Suppose that α
can be extended to α′, where α and α′ are both saturated.
Then both α and α′ will be considered as candidate left
parts. So more specific instances of α . . . β may be found in
any case when this pattern is not saturated. The efficiency
of the algorithm is, however, an issue, since the filtering
turns it partially into a generate-and-test algorithm.16
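The test half of this generate-and-test step might be sketched as below in Python. The function name `left_part_extendable` is hypothetical, and the start positions of the left part within each match of α . . . β are assumed to be supplied by the extraction step.

```python
def left_part_extendable(corpus, match_starts, left_len):
    """True if the left part can grow in every match of the gapped pattern.

    match_starts holds the start index of the left part for each match of α…β.
    """
    before = {corpus[i - 1] if i > 0 else None for i in match_starts}
    after = {corpus[i + left_len] for i in match_starts}
    extend_left = len(before) == 1 and None not in before  # same token precedes α everywhere
    extend_right = len(after) == 1                         # same token follows α everywhere
    return extend_left or extend_right


corpus = ("he said from one shore to the other and she said from one edge "
          "to the other but not from one day").split()
# "from one" occurs three times, but only the matches at 2 and 11 continue with "to the other".
print(left_part_extendable(corpus, [2, 11], left_len=2))  # True: both are preceded by "said"

other = "he sailed from one shore to the other and she ran from one edge to the other".split()
print(left_part_extendable(other, [2, 11], left_len=2))   # False
```

In the first corpus the left part from one is not extendable on its own, because its third occurrence has a different left neighbour. Within the gapped pattern, however, it is always preceded by said, so the pattern would be filtered out in favour of the one built on the longer left part.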
5. Conclusion
Gapped phrase extraction clearly has a lot of utility, as witnessed by the number of language researchers who have
investigated such phrases, using very imperfect tools. The
proper tool for this purpose is an open question which has
not been resolved in this paper. The hope is that, as specified in the title, this paper will serve as a challenge, both
to someone interested in algorithm design and implementation and to someone who is interested in further specifying
what features a gapped phrase extraction program ought to provide.
The benefits to eLearning will be that learner texts will be
better characterized in terms of the phrases that the
learner uses, instead of simply in terms of a bag-of-words
model. Learners should get feedback indicating which
phrases are effective, high-quality, appropriate for a particular domain, etc. Such feedback will result in improved
writing, in turn leading to better communication. And ultimately, in terms of social theories of learning, better communication will result in improved learning.
6. References
Mohamed Ibrahim Abouelhoda, Stefan Kurtz, and Enno
Ohlebusch. 2004. Replacing suffix trees with enhanced
suffix arrays. J. of Discrete Algorithms, 2(1):53–86.
José Aires, Gabriel Lopes, and Joaquim Silva. 2008. Efficient multi-word expressions extractor using suffix arrays and related structures. In Proceeding of the 2nd
ACM workshop on Improving non-English web searching, pages 1–8, Napa Valley, California.
Alberto Apostolico and Giorgio Satta. 2009. Discovering
subword associations in strings in time linear in the output size. J. of Discrete Algorithms, 7(2):227–238.
Even as a partly generate-and-test algorithm, initial tests suggest that this approach may be efficient enough for practical purposes. One helpful strategy would be to recognize special cases
where the tests can be avoided. For example, if the candidate left
part is already supermaximal (Abouelhoda et al., 2004) by itself,
then it will not be necessary to check for extensions of this left
part when it combines with a right part.
Towards Resolving Morphological Ambiguity in Arabic
Intelligent Language Tutoring Framework
Khaled Shaalan1, Marwa Magdy2, Doaa Samy3
Faculty of Computers & Information, Cairo University, 5 Ahmed Zewel St., Giza 12613 Egypt
Q$2A%HL*$22A2,R10([email protected]%3"DLD2PH.RB@([email protected]%H0L%P3"[email protected]%H0L%P""
that were written by beginner to intermediate Second Language Learners. The morphological disambiguation has been developed and
effectively evaluated using real test data. It achieved satisfactory results in terms of the recall rate.
An Intelligent Language Tutoring System (ILTS) is a
computer-based educational system that allows simulation
of a human tutor. An ILTS is a valuable tool used in
language e-learning programs. Besides, it is highly
Processing field since it helps people in the language
several ways such as parsing of the learner input and
diagnosis of morphological and syntactic errors
(Nerbonne, 2003). However, ILTS for error diagnosis to
analyze learners' input and provide intelligent and real-
time feedback is highly needed for the following reasons:
• ILTS provide individualized tutoring to learners
• Reliable error diagnosis systems would allow
users/authors to overcome the limitations of
multiple choice questions and fill-in-the-blanks
types of exercises. Besides, ILT systems can
communicative and interactive tasks to learners
Unfortunately, almost all NLP tools such as parsers,
formed input. So, to handle ill-formed input in ILTS,
techniques such as constraint relaxation are employed
(Faltin, 2003). In any language model, the partial
structures can combine only if some constraints or
attachment is allowed even if the constraint is not
satisfied. The relaxed constraint must be marked on the
structure such that the type and position of the detected
error can be indicated (confirmed) later on. In ILTS,
relaxing the constraints of the language to analyze
for only well-formed input (Attia, 2006). Consider, for
This paper addresses issues related to the
of erroneous Arabic verbs written by beginner to
intermediate Second Language Learners (SLLs). The
It considers the likelihood of an error which takes into
account the level of instruction and the frequency and/or
misleading or incorrect feedback. The result of
and provide the error specific feedback.
Ahmed (2000) addressed the problem of Arabic
morphological disambiguation to select the most likely
text. He used a powerful dynamic n-gram statistical
disambiguation technique. The statistical knowledge of
presents a brief discussion of Arabic morphological
ambiguity problem. Section 3 describes the proposed
system. Section 4 discusses the results from the conducted
experiment. Finally, in Section 5, we give some
Buckwalter transliteration is used here to Romanize Arabic
examples (Buckwalter 2002).
Arabic Morphological Ambiguity
Arabic language is one of the Semitic languages that is
defined as a diacritized language where the pronunciation
of its words cannot be fully determined by their spelling
below the spelling characters to determine the correct
vocalization and, thus, the correct pronunciation.
Unfortunately, diacritics are rarely used in current
Arabic writing conventions. The correct pronunciation
and interpretation of non- or partially diacritized text
depends on the native language competence and the
context. Due to the optional diacritization, two or more
words in Arabic are homographic: they have the same
is totally different (Ahmed, 2000; Attia, 2006; Habash,
Different Interpretations
/yuEid/ (bring back)
/yaEud~/ (count)
change in pronunciation without any explicit
orthographical effect due to lack of short vowels
(diacritics). An example of this is the ambiguity of
Some prefixes and suffixes can be homographic
second person singular masculine, 3) second person
singular feminine, or 4) third person singular
form that is homographic with another full form
/>asad/ (lion) or />a-sud~/ (I-block)
Difficulties in the process of Arabic morphological
Table 1: An Arabic word that is homographic
However, other factors contribute to the problem of
morphological ambiguity in Arabic. Among these factors
Orthographic alteration operations (such as
Table 1. These alteration operations are due to the
phonological constraints of certain root consonants.
weak verbs that include one or more weak letter.
Weak letters can be deleted or substituted by other
letters because of Arabic phonological constraints
(El-Sadany and Hashish 1989). For example, the
deletion of the letter (و) in taking the present
(imperfect) tense of the trilateral root /w-E-d/,
using regular rules would generate /ya-wEid/
Many inflectional operations underlie a slight
Some Arabic patterns are different only in that
in writing of their corresponding forms such as ':0;'
The Proposed Disambiguation System
Magdy and Fahmy, 2010). Afterwards, the ILTS sends
Feedback Message
The following example clarifies how the system works.
Consider the following question that is presented to the
Example 1:
can be used to conjugate the present tense of the 3rd
person feminine singular
and the 2nd person
Complete the following sentence with the correct
conjugation of the given root in imperfect tense active
masculine singular
...... /jad~atiy Al>aruz~/
In the above example, the root /b-y-E/ contains
middle weak letter /y/ so it needs special rules to
into imperfect passive voice, the middle weak letter
should be substituted by /A/ so it becomes /tu-baAE/
Assume the following two answers; where (a)
a. /ta-biAE jad~atiy Al>aruz~/ (My-
/ta-biyE jad~atiy Al>aruz~/ (My-
The ILTS produces two possible analyses for the
• Third person singular feminine imperfect verb in
the active voice with converted middle letter /y/
to /A/.
• Third person singular feminine imperfect verb in
the passive voice.
Then the disambiguation system selects the most
appropriate analysis according to: the learner level and
a verb instead of its active voice. Therefore, the system
adopts some prioritized conditions to select the most
preferred word analysis. Hence, in this case, the system
will select the first analysis. This analysis is later on used
by ILTS to detect the error made by the (incorrect
In the proposed system, we investigated our
disambiguation approach on the following three types of
The orthographic match in non-diacritized text
between Arabic conjugated verb forms in passive voice,
person singular masculine in active voice, while (
/nuqil/) is the perfect tense for the 3rd person singular
masculine in passive voice. Same phenomenon is repeated
in terms of spelling characters. These affixes are used to
The orthographic match between Arabic verb
derivation patterns and non-derivative patterns. For
example, the verb /saEada/ (to be happy) is a root,
non-derivative verb. A possible derivative pattern is
/AsEada/ (to make happy). The imperfect conjugation
for the first person of the first verb is (/AsEada/),
which is identical to the conjugation of the 3rd person
singular in the perfect tense of the second verb
of the scope of the current system as the system has no
some systems, where the system has insufficient
the learner in order to guide the selection of appropriate
expression, e.g. (Hsieh et al., 2002). Figure 2 presents
how the system disambiguates multiple analyses and the
Analyses
In case of the first ambiguity type, the system selects
the word analysis a student most likely intended. It
implements two prioritized conditions to select the most
1. If the question goal is to test passive voice then
the system selects passive voice analysis;
otherwise, it selects the active voice analysis, or
Example of these types is when the noun has the same
If the question goal is to test imperative tense
then the system selects the imperative tense
analysis; otherwise, it selects the perfect or
imperfect tense analysis.
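The two prioritized conditions described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `Analysis` record, the `question_goal` dictionary, and its `"voice"` and `"tense"` keys are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Analysis:
    """Hypothetical record for one candidate word analysis."""
    voice: str  # "active" or "passive"
    tense: str  # "perfect", "imperfect" or "imperative"


def select_analysis(candidates, question_goal):
    """Pick the analysis the learner most likely intended.

    question_goal is an assumed dict, e.g. {"voice": "passive"}.
    """
    if not candidates:
        raise ValueError("no candidate analyses to choose from")
    # Condition 1: prefer passive only when the question tests passive voice.
    wanted_voice = "passive" if question_goal.get("voice") == "passive" else "active"
    by_voice = [a for a in candidates if a.voice == wanted_voice] or list(candidates)
    # Condition 2: prefer imperative only when the question tests the imperative.
    if question_goal.get("tense") == "imperative":
        by_tense = [a for a in by_voice if a.tense == "imperative"]
    else:
        by_tense = [a for a in by_voice if a.tense in ("perfect", "imperfect")]
    # Fall back to the voice-filtered list if the tense filter removed everything.
    return (by_tense or by_voice)[0]
```

For a question that tests the imperfect tense in the active voice, the active-voice reading is returned in preference to the passive one, which mirrors the choice of the first analysis described for Example 1.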
condition to select the first analysis (Third person singular
feminine imperfect verb in the active voice). Notice,
In case of the second ambiguity type (i.e. orthographic
match between different affixes), the system collects all
in their morpho-syntactic features in one entry with a
For example, consider the following learner input;
a. /muHam~ad tawar~aTt
fiy jariymap qatol/ (Mohamed was-involved in
/jad~iy wajad~apiy
jariymap qatol/ (Mohamed was-involved in
between the subject Mohamed and the verb was-
involved
. Four possible analyses of the erroneous
• First person singular perfect verb in the active
• Second person singular masculine perfect verb in
the active voice.
• Second person singular feminine perfect verb in
the active voice.
• Third person singular feminine perfect verb in
the active voice
These four possible analyses are combined into the
• Singular perfect verb in the active voice.
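One way to picture the combination step is as an intersection of feature sets: only the features shared by every candidate analysis survive in the combined entry. The sketch below assumes analyses are plain dictionaries with hypothetical keys (`person`, `number`, `gender`, `tense`, `voice`); the paper does not specify its internal representation.

```python
def merge_analyses(analyses):
    """Keep only the features on which all candidate analyses agree."""
    if not analyses:
        return {}
    merged = dict(analyses[0])
    for analysis in analyses[1:]:
        for key in list(merged):
            # Drop any feature that is missing or different in another analysis.
            if analysis.get(key) != merged[key]:
                del merged[key]
    return merged


# The four analyses listed above, encoded with assumed feature names.
candidates = [
    {"person": 1, "number": "singular", "tense": "perfect", "voice": "active"},
    {"person": 2, "number": "singular", "gender": "masculine",
     "tense": "perfect", "voice": "active"},
    {"person": 2, "number": "singular", "gender": "feminine",
     "tense": "perfect", "voice": "active"},
    {"person": 3, "number": "singular", "gender": "feminine",
     "tense": "perfect", "voice": "active"},
]
combined = merge_analyses(candidates)
```

Here `combined` keeps number, tense and voice, and leaves person and gender unspecified, which corresponds to the single entry "Singular perfect verb in the active voice".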
In case of the third ambiguity type (i.e. orthographic
match between different patterns), the system collects all
For example, consider the following question that is
Example 2:
Complete the following sentence with the correct
conjugation of the given root in perfect tense active voice.
/jad~iy wajad~apiy
{inotaqalA <ilaY bayot jadiyd/ (my-grandfather
The learner here has made two errors: 1) subject-verb
disagreement between the subject 'my-grandmother and
is dual while the verb is conjugated in the masculine
plural form and, 2) incorrect use of the root pattern of a
perfect verb form; the correct pattern is ':0P;+' while the
• Third person masculine plural perfect verb in the
active voice following the pattern ':0;'.
• Third person masculine plural perfect verb in the
active voice following the pattern ':<0;'.
• Third person masculine plural perfect verb in the
active voice.
We conducted an experiment that measures how
successfully the proposed model selects the most
appropriate analysis that is used later on to detect the
exact source of error the learner has made. The
quantitative measures are used. These measures rely on
collecting different test sets written by real SLLs in a
typical teaching/learning environment. It was necessary
that these learners have different backgrounds (i.e., differ
in their first language) to test if the system is general
enough and not aimed to a specific sort of learners. The
test set is then fed into the system and the solved
ambiguous cases and unsolved are reported. The recall
rate is calculated. This measure has been used in
evaluating similar research (cf. Wagner et al., 2007;
test set that consists of 116 real Arabic sentences. The
cases. The system successfully solved 46 cases of them
The next section will discuss all failed cases.
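The rate reported in this evaluation can be read as the share of ambiguous cases that the system resolved correctly. The sketch below assumes that reading; the function name `solved_rate` is hypothetical, and the total of 50 ambiguous cases in the usage line is a made-up placeholder, because the actual total does not survive in the text above (only the 46 solved cases do).

```python
def solved_rate(solved_cases, total_ambiguous_cases):
    """Fraction of ambiguous cases the system resolved correctly."""
    if total_ambiguous_cases <= 0:
        raise ValueError("total_ambiguous_cases must be positive")
    if not 0 <= solved_cases <= total_ambiguous_cases:
        raise ValueError("solved_cases must lie between 0 and the total")
    return solved_cases / total_ambiguous_cases


# 46 solved cases comes from the text; the total of 50 is purely hypothetical.
example_rate = solved_rate(46, 50)
```

The remaining cases (total minus solved) are the failed cases that the classification in the next section is meant to explain.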
4.1 Evaluation Problems Classification
In this section, we discuss all problems which the
major problem is it is difficult to determine what the
• Orthographic match between un-vocalized forms.
Arabic ILTS handles unvocalized rather than vocalized
written Arabic text. This leads sometimes to more than
categories. The total number of occurrences of this
Orthographic matches produced for Arabic
verbs after relaxing the short vowel to the long one. For
Orthographic/homographs match between verb
and noun forms. This case happens when an Arabic verb
has the same orthographic form as a noun. For example,
consider the word تناول; it can lead to three possible
/Eay~a$-tu/ (I-sustained) with using the pattern ':<0;'
word to be: 1) the noun
/tanAwul/ (dealing with/
eating), 2) the perfect verb
/tanAwala/ (he/it-dealt
allowing incompatible usage of connected pronouns. For
over/ deliver). The total number of occurrences of this
whether the learner meant the word to be: 1) the perfect
The special case of the orthographic match
verb
/>aEomal-tu/ (I-employed) or, 2) the perfect
between the Arabic third person singular perfect verb
verb /Eamiltu/ (I-worked) by using incompatible
following the pattern ':0;2' />afoEal/ and the first person
pronouns ا, ت (Alef, Teh). The total number of
singular imperfect verb as the word IQ52. It can lead to two
Additional orthographic matches as a result of
relaxing a constraint. Applying the constraints relaxation
answers sometimes introduces extra orthographic
Orthographic matches produced for Arabic
verbs after relaxing the long vowel to the short one. For
(he/she/it-emigrated) by making the long vowel a short
matches produced
problem presents a challenge to ILTS. That is because
misleading feedback or an error might be overlooked.
resolved. The preferred method in ILTS for
disambiguating multiple readings of a wrong answer
should consider the likelihood of an error and the
difficulty of concepts. But with the lack of erroneous
corpus, we depend on some linguistic studies that
investigate the likelihood of errors. However, the
Ahmed, M. A. 2000. A Large-Scale Computational
allowing incorrect conjugation of a verb. For instance,
Attia, M. A. 2006. An Ambiguity-Controlled
the learner meant the word to be: 1) the imperfect verb
Challenge of Arabic for NLP/MT Conference, 2006.
Buckwalter, T. 2002. Buckwalter Arabic Morphological
Analyzer Version 1.0. Linguistic Data Consortium,
University of Pennsylvania, LDC Catalog No.:
El-Sadany, T. A. and Hashish, M. A. 1989. An Arabic
Faltin, A. V. 2003. Syntactic Error Diagnosis in the
Context of Computer Assisted Language Learning.
Habash, N. 2004. Large Scale Lexeme Based Arabic
Morphological Generation. In Proceedings of
Traitement Automatique du Langage Naturel (TALN-
Heift, T. 1998. Designed Intelligence: A Language
Exploiting Knowledge Representation in an Intelligent
Tutoring System for English Lexical Errors. In
Proceedings of the International Conference on
Computers in Education ICCE 2002, Auckland, New
Nerbonne, J. 2003. Natural Language Processing in
Computer-Assisted Language Learning. In Ruslan
Mitkov, editors, the Oxford Handbook of
Shaalan, K., Magdy, M., and Fahmy, A. 2010.
Morphological Analysis of Ill-formed Arabic Verbs in
Intelligent Language Tutoring Framework. In
Proceedings of FLAIRS-23, Daytona Beach, Florida,
Sjöbergh, J., and Knutsson, O. 2005. Faking Errors to
Avoid Making Errors: Machine Learning for Error
Wagner, J., Foster, J., and Genabith, J. V. 2007. A
Comparative Evaluation of Deep and Shallow
Approaches to the Automatic Detection of Common
2007, Prague, Czech Republic, pp: 112-121.
Elena Antinoro Pizzuto, Claudia S. Bianchini, Daniele Capuano, Gabriele Gianfreda
Q(702++8&4"!"="><[email protected]"D,7'1&24"#%20R?"J"A-&B,7$&%P")&"6(:2"S*2T&,-.2U4"K&T27%&:,-%(")&"#-V(7:2%&+24"D&+%(7&20"
This paper examines some of the major problems linked to the task of designing appropriate multilingual e-learning
environments for deaf learners (DL). Due to their hearing disability most DL experience dramatic difficulties in
acquiring appropriate literacy skills. E-learning tools could in principle be very useful for facilitating access to
web-based knowledge and promoting literacy development in DL. However, designing appropriate e-learning
environments for DL is a complex task especially because of the different linguistic background and experience DL
of this paper is twofold: (1) describe and discuss issues we believe need to be addressed, focusing on the limitations
that appear to characterize several e-learning platforms that have been proposed for DL; (2) present and discuss
face-to-face language of the Italian deaf community
Italian (Italian-L1). It is important to stress that, on the
whole, !"#$%&'"()*%"+%,- experience severe difficulties
in achieving appropriate literacy levels - though of
course ‘exceptional learners’ who overcome these
With respect to signers, the following must be
began with Stokoe's (1960) pioneering work on
led to describe, and to recognize as full-fledged human
natural languages, a very large number of national SL,
including LIS and all the other major European signed
languages. The use of SL for instructional purposes has
been explicitly recommended by the European
Parliament (see Resolution 17-6-1988, art. D).
oral/written language instruction to deaf students have
been developed in several countries, including Italy
in fostering DL's general linguistic competence.
The inclusion of SL within e-learning platforms
designed for DL has come as a natural development of
to develop appropriate e-learning environments for DL
1. Introduction
It is widely known that all over the world deaf children
and, later, adults, experience dramatic difficulties in
only in oral or vocal language (VL) but also in written
language. The vast majority of deaf learners (DL)
achieve literacy levels that are markedly below those
proper of their hearing peers (see among others Caselli,
Maragna & Volterra, 2006; Garcia & Derycke, 2010;
through adulthood, DL experience equally dramatic
the rich learning environments made available by
advanced multimedia technologies, most notably
e-learning environments. Appropriate written language
skills are in fact unquestionably a pre-requisite for
especially complex due to the very different language
background and experience deaf persons may have
preferred means of communication, or L1. It is in fact
necessary to distinguish two groups: (1) those who use
Italian Sign language (LIS), the visual-gestural,
For reasons linked to the demography of deafness and to the
complex sociolinguistic and cultural properties of signed
languages the observations we make here with respect to Italy
can be easily extended across nations and cultures, with the
necessary changes concerning the national signed and
texts to sign. One example is “Signspeak” (see also the
exhibit, and/or implicate some major conceptual,
In section 5 we present and discuss ongoing
8"! #7'0$%%9)10+31$)'+):'1)/3*9031$)+,'
!"! #$%&'(&)&*+,'-*$.,&%/'0$)0&*)1)('
used, one additional limitation of many current efforts
towards integrating SL materials into e-learning
platforms concerns a failure to recognize important
the problems posed by the dramatically insufficient
reference tools, and overall linguistic descriptions, that
It must first be recalled that all SL are languages
without a written tradition. More importantly from a
research standpoint, and even though almost 50 years
have passed since the modern study of SL has begun,
researchers still have not found an agreement on: (a)
and, on this basis, develop appropriate reference tools
(e.g. dictionaries, grammars, usage-based corpora etc)
that are unquestionably necessary for both the
communities of signers, and the exploitation of SL for
educational and instructional purposes (see Cuxac &
Antinoro Pizzuto, 2010; Garcia, 2006; 2010; Garcia &
It is not trivial to stress that, although our
not have any monolingual dictionary or grammar, for
any of the SL that has been to date investigated - not
proposals aimed at exploiting SL for instructional
purposes would dedicate particular care in making
explicit the models of SL elements and discourse they
adopt. This appears especially necessary because, as
Sallandre (2007) we will refer to these models as
of models highlight primarily the structural similarities
between SL and VL, while the second ones underscore
that, in addition to important similarities there are
equally relevant differences between SL and VL.
Within the limits of the present context, we
illustrate some of the crucial differences between these
two types of models in relation to the problem of
descriptions of ASL provided by Stokoe (1960) and
subsequently Klima & Bellugi (1979), assimilationist
models assume that SL constituents units are essentially
comparable to VL words, and are primarily sequentially
organized in time. These models are still largely
prevailing in current research on SL and have been for
the most acritically adopted in educational applications
desired platforms are often just “sketched”, and provide
inclusion of SL videos with SL translations or
explanations of the written texts found in a specific
many existing or planned platforms appear to be
designed primarily for DL who know SL, but seem to
On the whole, there appears thus to be a general
trend towards creating and including SL materials for
implementing written text-based environments. The
contents encoded in written language are made more
accessible to (signing) DL via SL translations and
explanations. Other examples are the platform created
within the project DEAL for teaching foreign
vocal-written languages to DL, or the one designed by
A fairly large body of work has been dedicated to
the development of signing avatars to be added to the
signers (see for example Efthimiou & Fotinea, 2007;
KRRUH( Q).-5V$3'( W).$+)D$3'( <-&$*!)( P( N=&#$/$-5'( KRRT'(
Many projects for realizing signing avatars exhibit
They start, for the most, from VL written texts and aim
thus ignore or underestimate the problem of translating
from sign to vocal-written texts. There are only few
projects that explicitly aim at realizing signing avatars
functioning in both directions, i.e. from sign to speech
and/or also written texts, and from speech and written
See for ex.: Individuals who are Deaf or Hard of Hearing,
Center for Assistive Technology and Environmental Access
IMS guidelines for developing Accessible Learning
ml; General guidelines for Inclusive Online Cultural Content.
Canadian Network for Inclusive Cultural Exchange,
sorts to be “attended to” and processed, and other
information stemming from the interaction with other
fellow students and/or with a tutor (e.g. in exchanges
orient their visual attention, to process both kinds of
information, the two tasks cannot be carried out at the
explanatory materials displayed on the computer screen
the same materials which they must always decode
primarily via vision (e.g. by lip-reading spoken
utterances, processing a message in SL, reading
This is much unlike what happens, in the same
cooperative learning situation, for hearing learners who
can simultaneously process communicative messages
conveyed through sounds and freely orient their visual
attention to other types of information coming from the
computer screen. Devising an appropriate e-learning
environment for DL thus requires accurate analyses of
In contrast, non assimilationist models, based on
extensive analyses of SL discourse, show that SL
constituent elements cannot be easily assimilated to VL
units. In addition to word-like elements, SL possess
complex, highly iconic structures (HIS) that are
simultaneously organized in a multilinear fashion that
and non-word-like units are marked by non manual and
manual articulators, most notably by modality-specific
eye-gaze patterns: when producing word-like units, the
signer's gaze is directed towards the interlocutor,
whereas when producing HIS the signer's gaze is
Figure 1 below provides just two illustrative
examples of a word-like unit (1a) and a non-word-like
HIS (1b) that are commonly found in SL discourse. The
examples are taken from LIS discourse but a wealth of
similar examples can be found in all SL (for relevant
discussions, see especially Cuxac, 2000; Cuxac &
2"! 3-4'/5%)5+'678+,*+/+5)9&(*$($,:&'()',5)
Figure 2 schematically illustrates a model for an
e-learning platform prototype (ELPP) we are
currently developing within the frame of a national
(1) improving multilingual / multimodal e-learning
environments for DL (High School and University
students); (2) promoting their literacy skills.
Figure 1: Word-like sign (1a) and HIS (1b)
The point we wish to stress here is the following:
J083.# N# O()%(!'!# P%CC0)!-# KLHLM# :3//3($'&-# KLLXM# S%#
\&(C!# N# 3/-# KLLZB5# 6)# ,<!0/$# )<0,# 7&# &9%$&()# )<3)# :;#
descriptions of any sort, including modelisations via
signing avatars, cannot disregard as “marginal” these
SL elements are for the most “just like VL words” thus
exhibit severe limitations that need to be recognized,
!"! #$%&'()'**+,*$-,).'**+/,%)$,)01))
An appropriate e-learning environment for DL at large,
i.e. for both signers and non-signers, must take in due
previous research. When working with a computer, the
visual attention patterns proper of DL markedly differ
from those observable in hearing learners. This is true
especially in situation of cooperative learning where the
students must simultaneously attend to visual
information concerning written materials of different
The ELLP model illustrated in Figure 2 aims at
trans-disciplinary competences across the fields of: -SL
linguistics; -special and bilingual education for DL;
-multimedia tools for DL and hearing learners;
-human-computer interaction and visual learning in e-learning
environments; -foreign language teaching methodologies in
overcoming the limitations proper of many e-learning
a deaf-centered e-learning platform. The platform is
grounded upon the idea that research aimed at creating
useful products for deaf users needs to be developed,
deaf people. Accordingly, and rather differently from
what is reported for many past and ongoing projects
directed to deaf persons, this idea guides our actual
‘project management’ practice. The project-leader team
includes six deaf colleagues who participate as
research project, not only as “end users” or “end
evaluators” of the language resources and didactic tools
are highly proficient in LIS: three learned to sign in
infancy, within deaf signing families, three at different
ages, as it happens to most deaf signers (see Cuxac &
Antinoro Pizzuto, 2010); they possess different degrees
We give here just few practical examples of the
crucial involvement of our deaf colleagues. The choice
of the “contents” we will focus on for developing the
will eventually be presented to DL on the ELPP (e.g.
spoken and written texts, speech-to-text captions, SL
translations and explanations, graphic illustrations), was
made following extensive discussions, among the deaf
and the hearing members of the project, of different,
alternative possibilities. Our deaf colleagues are
contributing to the preparation of ad-hoc questionnaires
and to a thorough examination and evaluation of
language tasks, materials, multimedia technologies we
are using and/or are currently developing (including for
of deaf colleagues ensures that the end products of our
crid”, (see Figure 2) - i.e. a complex configuration of
experiential and conceptual knowledge that is strongly
J* \.,.(?* O]]^8?* .(5?* !(* +,#* !+,#$* ,.(5?* #11#%+'"#-2*
One other important element of the deaf-hearing
collaboration we are promoting within the project is the
following: all the hearing members of the project-leader
of the five (hearing) young researchers of the other
research teams involved in the project are currently
attending classes to learn LIS. We are also seeking
further collaborations with deaf experts who use Italian
As noted above, most e-learning platforms for DL
shown in Figure 2, our ELPP aims at addressing the
needs of both signing (LIS-L1) and non signing
(Italian-L1) DL. In fact, as also noted above, both such
development. Our research aims at ascertaining the
DL and the extent to which these are, or are not
comparable. We expect that the results of our
investigations will provide: (a) novel, relevant
groups of DL, clarifying also whether, and/or how
the acquisition and use of spoken/written Italian; (b)
important indications on how we may need to
be created for promoting literacy development in DL
with LIS-L1 as compared to DL with Italian-L1. For
#D.&0-#?* $#%.--'()* 9,.+* (!+#5* '(* /#%+'!(* `?* '+* 9!6-5* 7#*
plausible to hypothesize that, for DL with LIS-L1, the
simultaneously organized, multilinear linguistic
structures that are highly specific of their SL, namely
HIS, may negatively interfere with the learning of more
sequentially organized linguistic structures that are
should be absent in DL with Italian-L1. However, these
hypotheses can be evaluated only by comparing the
A substantial novelty of the multilingual /
multimodal ELPP e-learning environment we are
designing concerns the use, presentation (hence, by the
two major types of language resources that will be
employed for pedagogical purposes, namely: Italian and
Figure 2, written texts will be provided not only in
written Italian (the target language in which we aim to
promote DL literacy development), but also in written
For the instructional materials to be provided in
written Italian, guided easification procedures will be
used to facilitate DL's access to textual materials;
speech-to-text captioning tools will grant visual
accessibility to materials given in spoken Italian;
linguistic accessibility to the contents and forms of
for DL with LIS-L1, via appropriate videos providing
translations and explanations in (face-to-face) LIS. Due
three types of language resources, which will be
implemented driving on a consolidated experience in
variable and only partially linked to the educational level
college graduate, one University student, three high school
For space limits we can only mention here the ‘general
contents’ of the ELPP: we will focus on the history, evolution
D)( >#'$( ,-( *)B)#->( D",="$( -&/( >/-A)+,( D"##( >/-B"*)( &4(
;&+=( $))*)*6( $-B)#( "$.-/;',"-$( .-/( '( !),,)/(
&$*)/4,'$*"$%( -.( =-D( B"4&'#( "$.-/;',"-$( $))*4( ,-( !)(
4>',"'##<( '$*( ,);>-/'##<( 4,/&+,&/)*( "$( )J#)'/$"$%(
)$B"/-$;)$,4( .-/( 016( '4( +-;>'/)*( ,-( =)'/"$%( &4)/4?(
\=)4)( '$'#<4)4( D"##( '#4-( '##-D( ,-( &4( '4+)/,'"$( D=),=)/(
#'$%&'%)( !'+L%/-&$*( '++)44( '$*( &4)( B"4&'##<( %/-&$*)*(
,=)( a*)'.( D-/#*( B")D`6( D)( ,="$L( ,=',( D)!J!'4)*(
4(*+'4,5')% +,67-.*.8',$% )-5% *,)0-'-8% +..*$( .-/( '(
*)'.J+)$,)/)*( ]1ZZ( ;'<( !)( 4"%$"."+'$,#<( ";>/-B)*(
";>#);)$,"$%( '( B"4&'##<J!'4)*( %/'>="+( "$,)/.'+)?(
'++)44( '$*( &4)( )'4"#<( '$*( a"$,&","B)#<`( !)+'&4)( ,)C,&'#(
"$.-/;',"-$(2D="+=("4(*".."+&#,(.-/( ,=);:("4(4"%$"."+'$,#<(
2"+-$"+:( "$.-/;',"-$?( \="4( )$,'"#4( ,=)( $))*( -.( +/)',"$%( '(
$)D6( %/'>="+( D'<( .-/( !/-D4"$%( D)!( >'%)46( '$*(
I-/( ,=)( $',&/'#6( *)'.J>)+&#"'/( B"4&'#( D'<( -.(
,=)-/")4( -.( ,49.5',5% 6.8-'+'.-%'$*( $+.0:+,**'-8( 21'L-..(
@",="$( ,="4( >'/'*"%;6( ,=)( #)'/$"$%( >/-+)44( +'$( !)(
;),'>=-/"+'##<( /)>/)4)$,)*( '4( '( 4,-/<( ,=',( "$+#&*)4( ,=)(
&4)/('4( ,=)(;'"$( +='/'+,)/?(G++-/*"$%#<6(,=)(&4)/(a#"B)4`(
,=)( #)'/$"$%( >/-+)44( !<( >=<4"+'##<( )C>)/")$+"$%( ",( M( "$(
>#'+)6( '( 4)_&)$+)( -.( 4)B)/'#( #)'/$"$%( 4,)>46( '$*( '( ."$'#(
%-'#?( F&+=( '(;),'>=-/(4));4( ,-(!)('(B)/<("$,&","B)(D'<(
-.( /)>/)4)$,"$%( ,=)( #)'/$"$%( )$B"/-$;)$,?( S-/)-B)/6( ",(
!"#"$%&'#( )*&+',"-$( .-/( 01( 23'4)##"( 5( '#6( 7889:6( '$*(
;-/)( %)$)/'##<( "$( #'$%&'%)( ,)'+="$%( ;),=-*-#-%")46( '4(
*),'"#)*( "$( -&/( %/'$,( >/->-4'#?( @)( *)4+/"!)( !/").#<( ,=)(
/',"-$'#)6( );>"/"+'#( %/-&$*46( '$*( ;'A-/( '";4( -.( -&/(
'( D/",,)$( ,/'*","-$?( I-/( 01( D",=( 1EFJ1K6( ,=)( #'+L( -.( '(
D/",,)$( .-/;( -.( ,=)"/( -D$( F1( ;'<( D)##( !)( -$)( -.( ,=)(
-!4,'+#)4( -$( ,=)( /-'*( ,-D'/*4( '+=")B"$%( '>>/->/"',)(
*-)4( ='B)( '( D/",,)$( ,/'*","-$( !&,( "4( '#4-( ,<>-#-%"+'##<(
'!-B)( -$( F1( NEF:?( O)+)$,( /)4)'/+=( 4=-D4( ,=',( E,'#"'$(
4"%$)/4( +'$(>/-.",'!#<(&4)(F"%$( @/","$%(2F@:6('(%/'>="+(
4<4,);( >/->-4)*( !<( F&,,-$( 2KPPP:( .-/( D/","$%( F16( .-/Q(
,=)( ."/4,( ,";)( "$( ,=)( ="4,-/<( -.( ,="4( F16( ,)C,4( +-$+)"B)*(
*"/)+,#<( "$( D/",,)$( 1EF( 2F@( ='4( !))$( '*'>,)*( .-/( ,=)4)(
>&/>-4)4( ,-( 1EF:?( S-/)( ";>-/,'$,#<( .-/( ,=)( >/)4)$,(
*"4+&44"-$6( ,="4( /)4)'/+=( 4=-D4( ,=',6( /)#<"$%( -$(
[email protected])$+-*)*( 1EF( ,)C,46( 4"%$)/4( +'$( '&,-$-;-&4#<(
>)/.-/;( ;)'$"$%.&#( +-;>'/"4-$4( !),D))$( 1EF( '$*(
4>-L)$TD/",,)$( E,'#"'$6( ',( '##( 4,/&+,&/'#( #)B)#4( J( #)C"+'#6(
U$( ,="4( !'4"46( 4"%$)/4( +'$( .-/;&#',)( ;),'+-%$","B)(
+-;>'/)*( ,-( 4>-L)$TD/",,)$( E,'#"'$6( '$*( ;-/)( %)$)/'##<(
-$( ,=)( /)#',"-$4( !),D))$( V-/'#",<W( -/( .'+)J,-J.'+)( B4?(
D/",,)$( +-;;&$"+',"-$6( "$( '( D'<( ,=',( ='4( $)B)/( !))$(
>-44"!#)6( .-/( ,=);6( D",=-&,( /)#<"$%( -$( '( D/",,)$(
+/&+"'#( /-#)( ,=',( ;),'+-%$","B)( '$*( ;),'#"$%&"4,"+( 4L"##4(
$-,-/"-&4#<( >#'<( "$( ,=)( *)B)#->;)$,( -.( #",)/'+<( 4L"##46(
,=)4)( /)4)'/+=( ."$*"$%4( ='B)( ;-,"B',)*( &4( ,-( .&/,=)/(
)C>)/";)$,( D/",,)$( 1EF6( -$( -&/( ]1ZZ6( '4( '( >-,)$,"'##<(
B)/<( >-D)/.&#( >)*'%-%"+'#( ,--#( .-/( >/-;-,"$%( #",)/'+<(
'!"#",")4?( [email protected])$+-*)*6( D/",,)$( /)>/)4)$,',"-$4( -.( 1EF(
5( '#6( 788[:6( >'B"$%( ,=)( D'<( .-/( ;-/)( '>>/->/"',)(
;-*)#"4',"-$4( D="+=( ;'<( !)( &4)*( .-/( !-,=( %)$)/'#(
*)4+/">,"B)( >&/>-4)46( '$*( .-/( ";>#);)$,"$%( ,=)( &4)( -.(
@)( $-,)*( "$( 4)+,"-$( ^( ,=',( !"#$% &'$()*% )++,-+'.-%
/)++,0-$% '-% 123% ;'<( 4"%$"."+'$,#<( *"..)/( ./-;( ,=-4)( -.(
=)'/"$%( #)'/$)/4?( U$)( -,=)/( '**","-$'#( $-B)#,<( -.( -&/(
>/-A)+,( +-$+)/$4( ,=)( &4)( -.( )<)J,/'+L"$%( )_&">;)$,( .-/(
'$'#<X"$%( 01`4( B"4&'#( ',,)$,"-$( >',,)/$46( '$*( +-;>'/)(
,=);( D",=( ,=-4)( -.( =)'/"$%( #)'/$)/4`6( *&/"$%( #)'/$"$%(
,'4L4( D="+=( *);'$*( ,=)( 4";&#,'$)-&4( >/-+)44"$%( -.(
#'$%&'%)( /)4-&/+)4( '#-$%( D",=( B"4&'#( "$.-/;',"-$( -.(
*"..)/)$,( 4-/,4?( Z/)#";"$'/<( /)4&#,4( -.( '( >"#-,( 4,&*<( D)(
;&#,"#'$%&'%)( ;',)/"'#46( ,=)( %'X)( >',,)/$4( -.( 01( D",=(
1EFJ1K( ;'/L)*#<( *"..)/( ./-;( ,=-4)( -.( =)'/"$%( #)'/$)/4(
,/&4,( ,=',( ,=)( ;-/)()C,)$4"B)( "$B)4,"%',"-$4(-$(,="4( ,->"+(
!"! #$%&'()*+,*-*&./0
E,'#"'$( S"$"4,/<( -.( ]*&+',"-$( '$*( O)4)'/+=(
")-8()8,( T( @3A;"( MOdf]8c^\g1( 2788PJ78K7:6( 4))Q(
=,,>QTTBBBC&'$,*C6-0C'+R( J\=)( G44-+"',"-$( D0.8,++'%
E,*'6'+F6( Z/-A)+,( V@/","$%( 1EF( '$*( F"%$@/","$%W( O-;)6(
2788g( J:?( @)( ,='$L( -&/( +-##)'%&)4( F,).'$-( 1)B"'#*"6(
S'/"#)$'( 0)( S'/4"+-6( G$$'( 1'!)##'( '$*( G#)44"-( 0"(
1"! 2*3*4*&$*/0
/)>/)4)$,',"-$( "44&)( '$*( ",4( ;&#,".'+),)*( '4>)+,4( "$(
+-$4,/&+,"$%( 4"%$( #'$%&'%)( +-/>-/'Q( _&)4,"-$46(
'$4D)/46( .&/,=)/( >/-!#);4?( D0.6,,5'-8$% .>% +7,( G05%
?.0H$7./% .-% +7,% I,/0,$,-+)+'.-% )-5% D0.6,$$'-8% .>%
A'8-% ")-8()8,$C% 1O]3( 788[6( S'//'L)+=(
!"#$%$&' ()&' *+,,+-./%&' 0)12)&' 3' 4#5%,,6%/1*"#7+&' 8)'
9:;<;=)' >%5$#+,6$?' @#A+6-%' %$' ,+-B#%5' .%5' 56B-%5C'
%-$/%' D"-$6-##A' %$' E+/6+$6"-5)' 8-' !)' >+/D6+' %$' 0)'
2+$.+).",&&' 3+,-+-(' ()' !"4.5)5&' -H' <I<&' A+/5' :;<;&'
M+N#+-"&' ()&' O%E6+,.6&' *)' 3' 2-$6-"/"' P677#$"&' Q)'
M+5%,,6&' 0)M)&' 0+/+B-+&' *)' 3' T",$%//+&' T)' 9:;;U=)'
M#V+D&' M)' 9:;;;=)' 3+'3+,-#('%(&'!.-,(&' 7$+,8+.&(9' *(&'
:".(&'%('*;<4",.4.)5&' 4+6$5' .%' O+-B#%5' -H<J1<U&' P+/65C'
M#V+D&' M)' 3' 2-$6-"/"' P677#$"&' Q)' 9:;<;=&' QA%/B%-D%&'
-"/A%' %$' E+/6+$6"-' .+-5' ,%5' ,+-B#%5' .%5' 56B-%5' C' E%/5'
#-%' /%.?S6-6$6"-' -"$6"--%,,%)' 8-' !)' >+/D6+' 3' 0)'
2+$.+).",&&' 3+,-+-(' ()' !"4.5)5&' -H' <I<&' A+/5' :;<;&'
M#V+D&' M)&' (+,,%&' P)' 9:;;K=)' P/"R,?A+$6X#%' .%5'
56B-%5)' 8-C' =$+.)(1(,)' >#)"1+).?#(' %(&' 3+,-#(&@'
A>=>3>B' C' =$+.)(1(,)' +#)"1+).?#(' %(&' *+,-#(&' %(&'
M#V+D&' M)' %$' *+,,+-./%&' 012)' 9:;;K=)' 8D"-6D6$F' +-.'
+/R6$/+/6-%55' 6-' 4/%-D@' *6B-' O+-B#+B%C' @6B@,F' 6D"-6D'
5$/#D$#/%5&' .%B%-%/+$%.' 6D"-6D6$F' +-.' .6+B/+AA+$6D'
6D"-6D6$F&' 6-' Q)' P677#$"&' P)' P6%$/+-./%+' 3' [)' *6A"-%'
9%.5=&' :($J+*' +,%' !.-,(%' 3+,-#+-(&@' K"1E+$.,-'
(6' [%-7"&' 2)&' >6+-S/%.+&' >)&' O+A+-"&' O)&' O#D6",6&' ^)&'
[email protected]&' !)&' ["556-6&' P)&' [email protected]&' M)*)&' P%$6$$+&' >)&'
2-$6-"/"' P677#$"&' Q)&' 9:;;_=)' [%N/%5%-$+$6"-' `'
2-+,F565' 1' [%N/%5%-$+$6"-C' -"E%,' +NN/"+D@%5' $"' $@%'
5$#.F' "S' S+D%1$"1S+D%' +-.' \/6$$%-' -+//+$6E%5' 6-' 8$+,6+-'
*6B-' O+-B#+B%' 9O8*=)' P+N%/' P/%5%-$%.' +$' $@%' K<3!'
<,)($,+).",+*'K",L($(,4('",'!.-,'3+,-#+-(&&' Z+A#/&'
(6' [%-7"&' 2)&' O+A+-"&' O)&' O#D6",6&' ^)&' [email protected]&' !)&'
P"-7"&' O)&' 9:;;U=&' 8$+,6+-' *6B-' O+-B#+B%C' M+-' \%'
9%.5)=&' 3MNK' OPPQ@' R"$S&D"E' T$"4((%.,-&' 9RCUVBF'
!(4",%' R"$S&D"E' ",' )D(' M(E$(&(,)+).",' +,%'
(/6B+5&' 2)*)' 3' c"#/%A%-"5&' ()' 9:;;J=)' 2-' %1O%+/-6-B'
"L' R!N>!' =$+,&+4).",&' ",' >%2+,4(&' .,' N,-.,(($.,-'
[email protected]"#&'Q)'3'4"$6-%+&'*)'9:;;K=&'2-'Q-E6/"-A%-$'S"/'
(%+S' 2DD%556R6,6$F' $"' Q.#D+$6"-+,' M"-$%-$)'
T$"4((%.,-&'"L' )D('7.$&)'<,)($,+).",+*' K",L($(,4('",'
<,L"$1+).",' +,%' K"11#,.4+).",' =(4D,"*"-W' +,%'
[email protected]"#&' Q)' 3' 4"$6-%+&' *)' 9:;;Y=&' ^"",5' S"/' (%+S'
2DD%556R6,6$F' $"' +-' %>WT' Q-E6/"-A%-$&' 6-' 3(4)#$('
0")(&' .,' K"1E#)($' !4.(,4(' A30K!=&' T",)' J<;J&'
>+/D6+&' !)' 9:;;U=)' ^@%' A%$@".","B6D+,&' ,6-B#65$6D' +-.'
5%A6","B6D+,' R+5%5' S"/' $@%' %,+R"/+$6"-' "S' +' \/6$$%-'
3MNK'OPPQ'C' R"$S&D"E' T$"4((%.,-&' 9RCUVB@'!(4",%'
R"$S&D"E' ",' )D(' M(E$(&(,)+).",' +,%' T$"4(&&.,-' "L'
T$"J*51+).?#(&' %(' *+' &4$.E)#$.&+).",' ()' 1"%5*.&+).",'
%(&' J+&' ,.2(+#X' (,' 3+,-#(' %(&' !.-,(&' 7$+,8+.&('
A3!7B/' 0?A"6/%' .de+R6,6$+$6"-' f' (6/6B%/' ,%5'
>+/D6+&' !)' 3' (%/FDG%&' 0)' 9:;<;=)' 8-$/".#D$6"-)' 8-' !)'
>+/D6+' 3' 0)' (%/FDG%' 9%.5)=&' !"#$%&' ()' *+,-#(' %(&'
&.-,(&/' 0"$1(' ()' 2+$.+).",&&' 3+,-+-(' ()' !"4.5)5&' -H'
-"/A%5' .+-5' ,%5' .%#V' ,+-B#%5' %-' N/?5%-D%' D@%7' ,%5'
5"#/.5' ,"D#$%#/5' .%' ,+' O+-B#%' .%5' *6B-%5' 4/+-j+65%'
9O*4=)' 8-' !)' >+/D6+' %$' 0)' (%/FDG%' 9%.5=&' !"#$%&' ()'
*+,-#(' %(&' &.-,(&/' 0"$1(' ()' 2+$.+).",&&' 3+,-+-(' ()'
>6+-S/%.+&'>)&'P%$6$$+&'>)&[email protected]&'M)*)&'(6'[%-7"'2)&'
["556-6&' P)&' ' O#D6",6&' ' ^)&' [email protected]&' !)&' O+A+-"&' O)'
9:;;_=)' (+,,+' A".+,6$f' S+DD6+1+1S+DD6+' +' #-+' ,6-B#+'
5D/6$$+' %A%/B%-$%C' -#"E%' N/"5N%$$6E%' 5#' $/+5D/676"-%' %'
5D/6$$#/+' .%,,+' O6-B#+' .%6' *%B-6' 8$+,6+-+' 9O8*=)' 8-' M)'
M"-5+-6&' M)' 4#/6+556&' 4)' >#+77%,,+' 3' M)' P%/$+' 9%.5)=&'
>)).' %(*' YZ' K",-$(&&"' %(**[>&&"4.+\.",(' <)+*.+,+' %.'
%.' _."$-."' M+.1",%"' K+$%",+)' P%/#B6+C' >#%//+'
8A+7&' 0)' 3' !%-F"-)' (' 9:;;K=)' `(&.-,.,-' H.)D'J*(,%&)'
c+/N"#765&'c)&'M+/6.+G65&'>)&'4"$6-%+&'*)1Q)&[email protected]"#&'
Q)' 9:;;K=&' Q.#D+$6"-+,' /%5"#/D%5' +-.' 6AN,%A%-$+$6"-'
"S' +' >/%%G' 56B-' ,+-B#+B%' 5F-$@%565' [email protected]$%D$#/%)'
[email protected]+B"'P/%55)'
g-6E%/56$F'"[email protected]+B"'P/%55)'
O+-%&' e)&' e"SSA%65$%/&' [)k)' 3' !+@+-&' !)' 9<__U=)' 2'
i"#/-%F' 6-$"' $@%' (%+S1a"/,.)' *+-' (6%B"&' M2C'
P677#$"&' Q)&' P6%$/+-./%+&' P)&' 3' *6A"-%&' [)' 8-$/".#D$6"-)'
8-' Q)' P677#$"&' P)' P6%$/+-./%+' 3' [)' *6A"-%' 9%.5)='
9:;;K=&' :($J+*' +,%' !.-,(%' 3+,-#+-(&' C' K"1E+$.,-'
&)$#4)#$(&@' 4",&)$#4)&' +,%' 1()D"%"*"-.(&&' !%/,6-' l'
P677#$"&'Q)'["556-6&' P)'3'[#55"&'^)'9:;;U=)'[%N/%5%-$6-B'
56B-%.',+-B#+B%5' 6-'\/6$$%-'S"/AC'X#%5$6"-5'$@+$'-%%.'
T$"4((%.,-&' 9RCUVBF' !(4",%' R"$S&D"E' ",' )D('
M(E$(&(,)+).",' +,%' T$"4(&&.,-' "L' !.-,' O+-B#+B%5@'
*+,,+-./%&' 0)12)' 9:;;I=)' 3(&'#,.)5&'%#'%.&4"#$&(' (,'
%(' 4+)(-"$.\+).",' %+,&' *(' 4+%$(' %[#,(' -$+11+.$('
%(' *[.4",.4.)5/' [email protected]%' .%' ("D$"/+$' %-' *D6%-D%5' .#'
*#$$"-&' T)' 9<___=)' 3(&&",&' .,' !.-,R$.).,-/' =(X)J""S' a'
R"$SJ""S)' O+'k",,+&'M2C'(%+S'2D$6"-' M"AA6$$%%'S"/'
*6B-' a/6$6-B' 9:-.' %.6$6"-&' <5$' %.6$6"-' <__J=)'
Deaf People Education: crossing linguistic borders through e-learning
Giuseppe Nuccetelli, Maria Tagarelli De Monte
Istituto per Sordi di Roma
Via Nomentana 56, 00161 Roma, Italy
E-mail: [email protected], [email protected]
The introduction of Web technologies and the development and spread of portable devices have improved the quality of life of deaf
people by making distance communication easier. In particular, the development of online systems that include video messaging and
allow the upload of user-generated content has given deaf people other, more direct means of communication. Similarly, the
development of e-learning platforms and their adoption by most universities worldwide is reshaping the way education is conceived,
leading to new systems that merge in-class education with e-learning. Our contribution gives a first account of how Information and
Communication Technology (ICT) can be a strategic resource for giving deaf people equal educational opportunities, focusing on the
development of appropriate language skills and on the strategies through which these opportunities can become effective. Our
experience is based on the results and outcomes of the DEAL Project (Deaf people in Europe Acquiring Languages through E-Learning),
carried out by the Istituto Statale per Sordi Roma (ISSR - State Institute for the Deaf in Rome) with co-financing from the European
Commission. Its objective was to create an e-learning model for teaching foreign languages to deaf individuals in professional
education and to give a new basis to research in the field.
1. Linguistic competences in Deaf People:
an integration problem
Deaf people officially certified in our country (Italy)
number about 60,000, but it is estimated that this figure
does not reflect the true dimension of the problem. About
11 in every 10,000 children are born deaf.
Deafness is a deficit, but not a cognitive one. However,
schools still offer no effective, systematic response to the
problem of deaf education. The social costs of this
situation are enormous: deaf people are often excluded
from written communication as well as from spoken
communication, and in many cases they cannot perform
professional tasks involving minimal competence in
written language and cannot access higher levels of
education.
Research in this field (Caselli et al., 2007; Fabbretti et al.,
2006) reveals that deaf people, especially those whose
deafness arose at a pre-linguistic age (before 18-30
months), have typical problems in the acquisition of
written language and in the development of linguistic
skills. These problems are specific to each culture and
each language, and they are not always comparable. In
Italian, for example, deaf people show weaknesses in the
use of free morphology, clitic pronouns, prepositions,
articles and so on. They therefore need tools and
educational methods aimed at resolving these problems.
This is often a difficult task because of the differences in
deaf people's logopedic rehabilitation and educational
paths and, consequently, in their writing skills. Any
possible solution has to adapt both to the type (genetic,
illness-related, etc.) and degree of deafness (profound,
medium, light, partial) and to the learners' specific
linguistic and communicative competences and abilities.
In this perspective, the evolution of web technologies
towards portability and adaptability to users' needs,
together with educational strategies based on e-learning
tools, promises to make actions directed at this specific
target group more effective.
From the user's point of view, the new forms of digital
communication constitute a horizon of authentic
interactions in the national written language (or rather,
written/spoken) in which deaf people immerse
themselves spontaneously and with strong motivation.
This means that, inevitably, they acquire language skills
through these interactions.
In short, the use of new technologies in deaf education
opens up, for the first time, a domain in which deaf people
with medium or low skills in the written language can
improve through involvement in real communication and
not only through learning contexts. They can thus acquire
languages, not only learn them.
The condition, however, is that strategies and tools must
be genuinely oriented to the needs and resources of deaf
learners. This is the crucial point of the research and
experimentation carried out so far, and it can be divided
into a number of critical issues that are considered in the
rest of our contribution. Most of the findings described
here are based on the experience gained working on the
DEAL Project (Deaf people in Europe Acquiring
Languages through E-Learning)¹.
2. Sign Language as a possible tool for
promoting deaf people linguistic
In the case of deaf people using sign language², its role in
didactic communication with and among the students is
particularly important for promoting the development of
skills in the target language. Deaf students who use sign
language find it particularly comfortable as a language of
reference, and it puts them in the right emotional
condition to become learners.
Within the process of building these skills, we have
considered sign language the ideal candidate to be one of
the cornerstone resources in the design of all activities
involving didactic communication: research, problem
setting and problem solving, metalinguistic reflection,
metacognitive analysis.
¹ Please refer to the acknowledgement chapter for further
information on the project.
² All the research and development of the project described
here has considered the micro-culture of deaf people using sign
language, to whom we will refer, from now on, as “deaf people”
or simply “the deaf”.
In building the e-learning platform, we chose to use sign
language in interactions both among peers and with
teachers, integrating the online educational path with
videos and explanations in sign language and giving
students the possibility to obtain further information
through the video-chat system.
Implementing this strategy has highlighted the
importance of creating tools specially designed not only
to allow sign language interactions regulated according to
their purposes, but also to support feedback structured on
a mosaic of codes. This means not only stimulating the
use of sign language, but also creating a feedback system
between teachers and learners, as well as among the
learners themselves, so that didactic activities are truly
effective. By following what learners are doing, the
teacher has the opportunity to intervene with different
degrees of feedback, tailored to the learners' needs.
3. Deaf People in Europe Acquiring
Language through e-learning: the
construction of a specific educational path
The actions planned in the DEAL project were meant to
operate significantly within this framework, through the
introduction of educational tools based on an e-learning
strategy and targeting the needs and specific capacities of
deaf adults.
In the DEAL e-learning approach, we strengthened the
methodological strategies and educational techniques that
act upon the critical features of lexical and grammatical
production identified by research in the field: we worked
both at the level of the lexicon and on linguistic
structures, developing the language skills of deaf learners
by integrating Sign Language from an educational
perspective.
The system is based on an open source e-learning
platform (Moodle) and a videoconferencing system based
on Openmeetings/Red5. The choice of Moodle followed
that of many European universities, which have adopted
this platform for their online courses. Appropriate
adaptations were studied and applied to meet the needs of
the target group (teenage students of technical schools
for enterprise secretaries).
The applications that have been added are:
• Explanatory and introductory videos in the local sign
language
• Animated segments with subtitles, upon which
educational activities have been developed
• Interactive teaching activities in which the tutors can
work with the students starting from their questions
and doubts in the educational system. Explanations
thus arise from active interaction with the students
and are not given “from above”
• Videoconference possibility
• Forum
While following the teaching activities, at various set
points along the course, deaf students use special
supports in their own sign languages. There are two kinds
of support:
One way:
• Presentation of the teaching unit
• Lexical micro windows on the dialogue
• Grammatical, syntactic and pragmatic support on the
key concepts of the unit
• Full translation of the dialogue
Two way:
• Videoconference among peers
• Videoconference with the teaching team
The project has produced three courses: German, Italian
and Spanish as second languages for the deaf students of
the partner countries. For example, Italian deaf students
had a Spanish and a German course available. This means
that each course is supported by two sign languages: for
example, the Italian course has supporting windows in
both Catalan Sign Language and Austrian Sign Language.
Figure 1: example of an Italian comprehension exercise
with micro-window explanation in Austrian Sign
Language
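The course arrangement described earlier, in which each written-language course is accompanied by support windows in the sign languages of the other partner countries, can be pictured as a small lookup table. The sketch below is only an illustration and not the DEAL implementation: the dictionary and function names are hypothetical, and the pairing of each course language with one sign language (in particular "Italian Sign Language" for Italy, which the text does not name) is an assumption.

```python
# Hypothetical illustration of the course / support-language arrangement.
# Assumption: each partner country contributes one written language (the
# course) and one sign language (used for support windows in other courses).
PARTNERS = {
    "Italian": "Italian Sign Language",  # assumed; not named in the text
    "Spanish": "Catalan Sign Language",
    "German": "Austrian Sign Language",
}


def support_sign_languages(course: str) -> list[str]:
    """Sign languages offered as support for a given course.

    A course teaches a foreign written language, so its support windows
    use the sign languages of the other partner countries.
    """
    if course not in PARTNERS:
        raise ValueError(f"unknown course: {course}")
    return sorted(sl for lang, sl in PARTNERS.items() if lang != course)


def courses_available(sign_language: str) -> list[str]:
    """Courses that a user of the given sign language can follow."""
    return sorted(lang for lang, sl in PARTNERS.items() if sl != sign_language)


if __name__ == "__main__":
    for course in PARTNERS:
        print(course, "->", support_sign_languages(course))
```

Under these assumptions, the Italian course is paired with Catalan and Austrian Sign Language, and a user of Italian Sign Language is offered the German and Spanish courses, which matches the examples given in the text.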
An interesting issue in working in such a multilingual
environment has been, from several points of view, the
lack of human resources with the skills and capacities
required by the project: for example, a tutor able to sign
in Catalan Sign Language in order to give information
about the German or Italian language course. This could
be an issue to discuss at an international level, also with a
view to defining possible new professional profiles.
4. Evaluating the DEAL platform: issues
and future developments
The DEAL project began in September 2006, and the
main prototype test was carried out in May 2008 in Italy
for the Spanish course. The experimentation took place at
the Istituto Statale Superiore Magarotto (ISISS - State
High School “Magarotto”). Eight deaf teenagers
participated, all students of a high school for commercial
secretaries, six of whom agreed to take part in the final
interview. They were all familiar with computers and had
never studied Spanish.
The platform was tested in a blended modality, with
technical support in the classroom as well as a teacher to
whom the students could ask questions. The
experimentation also tested both the asynchronous and
the synchronous interaction modality. During the test,
while following the course indications, the students could
share their questions either in a Forum (asynchronous
modality) or in a Videoconference environment
(synchronous modality), where the teacher would reply to
questions with the help of an interpreter.
The materials used to collect information from the
experiments were: anamnestic questionnaires for teachers,
observation checklists filled in by the researchers, and a
final interview with the participating students.
The anamnestic questionnaires for teachers collected
personal data on the participants, information concerning
the type of deafness, their family situation, and their
linguistic competences in Italian and in foreign languages,
if any, in both the vocal and the sign language modality.
Observation checklists were filled in by 2 researchers per
participant, in three sessions of 20 minutes each, held at
the beginning, in the middle and at the end of the
experimentation. The information collected in this phase
concerned the interaction of the students in the classroom and
with the teacher, the chosen linguistic form, and other free
At the end of the test, participants were asked to give
their opinion on the degree and type of knowledge
achieved during the course, to compare it with traditional
in-class courses, to describe their feelings about the
interaction with the system as a whole, and to offer
suggestions on how to improve it.
The results confirmed the validity of the chosen
educational methodology: the participants reported
learning something new about Spanish in a more
stimulating and fascinating way. Participants liked using
the videoconference system as well as the sign language
explanatory windows, which were considered a fun and
clear way to acquire knowledge.
However, the data collected in this phase also
revealed the need to improve the overall navigation in the
system, making the whole online experience more
We believe that a solid evaluation of the platform will
come with its use within the deaf community, to which
the system has been made available on the project
website. However, the experimentation has already
provided important information concerning not only the
methodology to use on an e-learning platform, but also
the management of language codes and system
interfacing.
Not only must the educational path be adapted to the
e-learning model; the quantity and quality of information
given at each step must also be managed according to the
user’s special needs and visual skills, since sight is the
only sense through which all information is conveyed
during interaction with the platform.
5. The management of time and space on
an e-learning platform for the deaf: the
importance of data transmission efficiency
Developing an e-learning platform for the deaf also
requires special attention to the management of time and
screen space (Keating & Mirus, 2003).
This emerged clearly during the experimentation phase of
the DEAL project, for example when giving signed
explanations of words or grammatical segments. In such
cases, giving students enough time to move from the
sentence under analysis (written text) to the video/chat is
fundamental for both educational and motivational
reasons. Teacher, computer screen, interpreter (if any)
and other students play the role of “educational objects”,
taking turns in the construction of meaning for the student
along both a spatial and a linear axis.
Along the spatial axis, all “educational objects” must be
positioned so that students can return to the selected
resource when needed: each must be well localized in
space and must not change position. The linear axis is
that of “turn taking” in the dialogical relationship among the
“educational objects”, and the amount of information
In a multilingual educational environment based on
blended learning, where in-class sessions are
complemented by sessions with online tutors, this
becomes particularly important. The role of the tutor is to
make the course contents further adaptable, tailoring
them to each learner’s specific needs. Having the tutor
online while students carry out educational tasks means
that every learner has the possibility to ask questions
about the course content, in a dialogical relationship with
the tutor and the other students. Similarly, this feature
allows the tutor to monitor the progress of the class in
relation to the course contents and to manage the
students’ community
discussion in order to enhance learning in particular
A possible scenario is that of a student working from
home while the tutor follows her and other students from
a separate location. Students are given the possibility to
follow the tutor's explanations either on video or in
written chat.
Deaf students are continuously engaged in following and
decoding messages through the sense of sight alone. In a
context like the one described above, their cognitive
resources are thus engaged in processing at least three
different codes: text, sign language video and teacher’s
This means that a teacher who is also a signer will have
to give students enough time for sight to complete the
decoding of the video message, possibly supplemented by
hints given through the written or video chat, and then to
think and reply either in sign language or in written chat,
in a construction of meaning at a distance. The situation
is further complicated in the case of teachers who are
non-signers, when an interpreter needs to be added.
Incorrect management of these types of interaction could
lead to frustration, demotivation and possibly
abandonment of the learning session. This is also the case
when working on the enhancement of deaf people's
writing skills in the learners’ local language (e.g. Italian
deaf learner – Italian written language): it has been shown
that deaf people's approach to written language is often
affected by the difficulties faced during their linguistic
rehabilitation and schooling, and by the frustration they
experience in building their writing skills (Fabbretti et al.
2006).
Proper management of screen space and time affects the
emerging relationship between students and teachers and
the construction of the learning environment. For hearing
students, hearing and sight work simultaneously in the
construction of meaning, on two different levels (the
student can watch the screen contents while listening to
the teacher’s explanation). For deaf students there is only
one level to work on, sight, which has to receive multiple
inputs at the same time. Visual elements on the screen
should therefore be highly visible, easy to decode, and
should give good navigational cues, also in order to
support the ongoing interactions in the system.
This extensive use of video and visual communication
tools makes data transmission quality one of the main
issues for e-learning platforms for the deaf. Real-time
online video communication, such as video chat for sign
language or lip reading, is strongly affected by the
efficiency of data transmission, since the video should be
as close as possible to real people's movements. There are
in fact many cases in which multiple video chats make
communication between deaf people (either bimodal or
oralist) nearly impossible because of the poor quality of
video transmission. This constitutes a strong limit on the
development of online educational solutions for deaf
Clearly, inefficient video transmission, poor management
of the visual objects on the website and incorrect
management of time could cause deaf students to lose
both their comprehension of the main topics and their
motivation to follow the course.
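To see why several simultaneous video chats strain a connection, a rough back-of-the-envelope estimate can help. The sketch below assumes a server-relayed conference in which every participant uploads one video stream and downloads one stream from each of the other participants. The per-stream bitrate and the link capacities are hypothetical placeholder values, not measurements from the DEAL platform, and the names `Link`, `required_kbps` and `max_participants` are introduced here only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """A participant's connection, in kilobits per second (hypothetical)."""
    down_kbps: float
    up_kbps: float


def required_kbps(participants: int, stream_kbps: float) -> tuple[float, float]:
    """Return the (download, upload) capacity needed by one participant.

    Assumes each participant sends one stream to a relay server and
    receives one stream from every other participant.
    """
    if participants < 2:
        raise ValueError("a conference needs at least two participants")
    return (participants - 1) * stream_kbps, stream_kbps


def max_participants(link: Link, stream_kbps: float) -> int:
    """Largest conference size this link can sustain under the model above."""
    if link.up_kbps < stream_kbps:
        return 0  # the participant cannot even send their own video
    # Number of incoming streams that fit, plus the participant themselves.
    return int(link.down_kbps // stream_kbps) + 1
```

With an assumed 500 kbps per stream, for example, `required_kbps(4, 500)` gives 1500 kbps of download and 500 kbps of upload per participant, so download demand grows with every participant added to the session.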
Within this framework, we therefore need to find the best
structure for educational communication with deaf
learners and to define the role given to sign language
among the variety of possible codes. This point is closely
related to the regulation of interaction (learner/learner,
learner/teacher, etc.) and to time balancing (synchronous,
asynchronous) to
grant the maximum efficiency in the learning
One result of our research is that educational interaction
in video conferences requires a well-defined number of
participants. Based on the DEAL experience, our
hypothesis is that the optimal number for smooth
interaction is 4 people, i.e. one tutor and 3 students.
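The hypothesis above suggests a simple rule for organizing a class into videoconference sessions. The following sketch is an illustration and not part of the DEAL platform: it splits a list of students into sessions of at most three, each meant to be followed by one tutor. The function name `plan_sessions` and its `students_per_session` parameter are assumptions introduced here.

```python
import math


def plan_sessions(students: list[str], students_per_session: int = 3) -> list[list[str]]:
    """Split students into videoconference sessions.

    Each session holds at most `students_per_session` students, so that
    together with one tutor it does not exceed the hypothesized optimum
    of 4 people. Session sizes are balanced as evenly as possible.
    """
    if students_per_session < 1:
        raise ValueError("students_per_session must be at least 1")
    n = len(students)
    if n == 0:
        return []
    session_count = math.ceil(n / students_per_session)
    base, extra = divmod(n, session_count)
    sessions, start = [], 0
    for i in range(session_count):
        # The first `extra` sessions take one additional student.
        size = base + (1 if i < extra else 0)
        sessions.append(students[start:start + size])
        start += size
    return sessions


if __name__ == "__main__":
    group = ["S1", "S2", "S3", "S4", "S5", "S6", "S7", "S8"]
    for number, session in enumerate(plan_sessions(group), start=1):
        print(f"session {number}: tutor + {session}")
```

For a group of eight students, the size of the DEAL test group, this yields three sessions with 3, 3 and 2 students. Balancing the sizes rather than filling sessions greedily is a design choice made here to reduce the chance that a student ends up alone with the tutor.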
However, the problem with such a system is the
regulation of turn taking and the balancing of the
different communication channels, i.e. video chat vs. text
chat vs. the working area where the student is engaged in
her educational activity. It is difficult to optimize sign
language as a means of educational communication in an
environment in which the target language remains
written and, in a multilingual environment, is a foreign
language.
The problems we have discussed so far are certainly
strategic with regard to the target group, but their
relevance seems to go beyond this specific scenario. In a
“regular” educational environment, there are issues that
are normally underrated because of the redundancy of
communication possibilities among hearing people, who
are able to pick up the information they need from the
ongoing communication process.
Working on a multilingual platform for deaf education
has thus prompted reflection not only on the specific
problems that this type of user may meet, but also on the
nature of educational communication in foreign language
learning.
In fact, these problems show that educational
communication in e-learning environments has margins
of inefficiency that are amplified, but not generated, by
deafness. Working towards the solution of these issues
can thus have important theoretical implications also in
the frame of second language education in digital learning
As one of the first experiences in Europe of teaching a
foreign language to deaf students with the support of
e-learning, the DEAL project has focused mainly on the
structure of the didactic content and on the use of sign
language and short “explanation” windows in a
complementary and innovative way, in order to support
the needs of several types of deaf learners. This has
raised challenges for other aspects of the educational
path, such as the selection of the best technology to use,
the design of an appropriate interface for deaf learners,
the combination of multiple communication channels and
the “rhythm” of the ongoing interactions in the system.
One of the points that the DEAL project has raised is the
importance of creating a collaborative network among
students and tutors through the use of effective and
reliable technological support.
The DEAL project has been financed by the European Union within Leonardo’s programme, and immediately received the European Label 2008 for Innovative Projects. The partners of the project have been: Istituto Statale per Sordi di Roma - ISSR, Istituto di Scienze e Tecnologie del CNR - ISTC, Istituto Superiore di Istruzione Specializzata per Sordi Magarotto - ISISS [Rome, Italy], Universitat de Barcelona, Fundaciò del Centre d’Estudios de Llengua de Signes Catalana [Barcelona, Spain], Klagenfurt Universitat [Austria].
The project has been re-funded in the frame of Leonardo’s Transfer of Innovation programme 2009-2012, which sees University College London as a new partner of the project, in place of Klagenfurt Universitat.
The ISSR has recently begun working on a project for the improvement of deaf people's Italian writing skills through e-learning (VISEL).
Both authors are in complete agreement concerning the paper’s contents. The main contributor for chapters 2, 3 and 4 has been Dott. Giuseppe Nuccetelli, while Dott. Maria Tagarelli De Monte is to be considered the main contributor for chapters 1, 5, 6 and 7.
Keatin, E.G., Miru, G.S. (2003). American sign language
in virtual space: Interactions between deaf users of
computer-mediated video communication and the
impact of technology on language practices. Language
in Society, 32(05):693-714.
Elsendoorn, B.A.G., Coninx, F. (1993), Interactive
Learning Technology for the Deaf, Proceedings of the
NATO Advanced Research Workshop on Interactive
Learning Technology for the Deaf. The Netherlands,
NATO ASI Series, Computer and Systems Sciences,
13(113): p. 285.
Fabbretti, D., Volterra, V., Pontecorvo, C. (1998). Written
language abilities in deaf italians. Journal of Deaf
Studies and Deaf Education, 3(3):231--244.
Pizzuto, E., Caselli, M. C., Volterra, V. (2000). Language,
cognition, and deafness. Seminars in Hearing,
Rinaldi, P., Caselli, C. (2009). Lexical and grammatical
abilities in deaf italian preschoolers: The role of
duration of formal language experience. Journal of
Deaf Studies and Deaf Education, 14(1):63--75.
Maragna, S., Nuccetelli, G (2008). An e-learning model
for deaf people’s linguistic training. Proceedings of the
DEAL project final meeting. Publicacions I Edicions de
la Universitat de Barcelona.
Fabbretti, D., Tomasuolo E. (2006). Scrittura e sordità.
Roma: Carocci Editore S.p.A.
BONy: a knowledge centric collaborative learning platform
Alfio Massimiliano Gliozzo, Concetto Elvio Bonafede, Aldo Gangemi
Via Nomentana 56, 00161, Rome, Italy
[email protected], [email protected], [email protected]
In this paper we describe BONy, a technology enhanced platform for collaborative learning. Semantic technology, and in particular an RDF/OWL ontology, is used to integrate the different modules of the system, allowing strong interoperability between linguistic data and structured knowledge. This allows us to develop advanced intelligent functionalities, including expert finding, mentoring and semantic search. Those functionalities largely exceed the capabilities of existing state of the art e-Learning platforms, for example by allowing multilingual search. BONy is a unique showcase for the next generation of semantic systems for e-Learning. The BONy platform is currently working as a free on-line service.
1. Introduction
Electronic learning (e-learning) is a type of education where the medium of instruction is computer technology. It is a planned teaching/learning experience using a wide spectrum of technologies, mainly internet based, to reach learners at a distance. The base units of e-learning systems are called learning objects. They are resources, usually digital and web-based such as HTML pages or animations, that can be used and re-used to support learning. They represent an atomic piece of knowledge and are composed into courses. At their core there is instructional content, practice, and assessment. The way in which these units can be stored, retrieved and managed has been the focal point of most Learning Content Management Systems (LCMS).
The current mechanisms for managing learning objects, mainly based on web standards such as XML, are not able to meet the new requirements of collaborative learning, where teachers and users are no longer two different players in the network. In fact, in a Web 2.0 perspective, students are asked to supervise other students and are supposed to actively contribute to the development of learning objects, playing the role of professors with respect to the areas of expertise where their skills are higher. In addition, in a collaborative learning scenario, the student is typically exposed to highly unstructured information (e.g. wikis developed by other students, forums, chats), requiring the intervention of a professor or an expert in the field to recommend a personalized learning path and to ensure the selection of high quality content.
On the other hand, non-semantic technology, such as Web 2.0 platforms, does not allow us to implement a fully automatic system satisfying the new needs of collaborative learning, and in particular to represent the user profile and assess the user's skills. To this aim, semantic technology such as ontologies can play a big role, for example to represent the user profile with respect to different subjects and to represent the content of learning objects. For this purpose, within the BONy project, we turned to semantic technologies, anticipating the next generation Web 3.0 solutions for eLearning while providing a showcase of the new generation capabilities.
BONy is a knowledge centric LCMS where a core ontology is used for three main purposes:
1. enhancing interoperability and system integration
2. integrating linguistic information from learning objects with structured information from databases
3. allowing intelligent services such as expert finding, mentoring and semantic search
The core component of the system is an RDF/OWL ontology, developed according to best practices and by applying Ontology Design Patterns (Gangemi, 2005; Presutti et al., 2008; Reich, 1999; Svatek, 2004). As far as interoperability and system integration are concerned, the ontology is used to enhance the integration of three existing open source platforms: an LCMS (DOKEOS) (Grandmontagne, 2008), a framework for social networking (SPREE) (Bauckhage et al., 2007; Metze et al., 2007) and a collaborative authoring tool (Semantic Media Wiki). The ontology is automatically populated by re-engineering data from the different databases exploited by the three platforms integrated so far.
A major role of the ontology is linking the textual data to the knowledge structures. This is done by extracting keywords from the text embedded in the Learning Objects, and associating these keywords with a set of topics of interest for the domain of the course. This allows us to map different courses in different languages to the same topic structure, and to improve search and multilingual retrieval. It also allows us to implement a set of intelligent functionalities, including an automatic mentoring algorithm designed for the generation of personalized learning paths, multilingual search and expert finding. To this aim, we connect Learning Objects and user profiles with a shared taxonomy of topics describing the content of the e-Learning course, and we use SPARQL queries and a reasoner. This has been done by extracting keywords for each course.
The platform is currently running as a free on-line service, available on the web. We invite the reader to join the BONy network and experience the different user experience provided by semantic technology in use.
This paper is structured as follows: in Section 2 we illustrate the architecture of the platform, Section 3 is devoted to describing the ontology used in the system, and in Section 4 we describe the intelligent functionalities of BONy. Section 5 concludes the paper.
2. Architecture
The BONy platform is an integration of three existing open source platforms: DOKEOS, SPREE and Semantic Media Wiki.
The architecture of the system is described in Figure 1: an RDF/OWL ontology is used to represent data coming from the different databases adopted by the integrated open source solutions. The ontology semantically describes the three main components of the platform, and in particular the Learning Objects, the European Project Management domain (Topic Ontology) and the user profile in the social network (User Ontology), as in Figure 1. Unlike in other e-Learning systems, data is represented in the ontology in RDF/OWL format.
In addition, when data is represented in the ontology, it is also linked semantically to a topic ontology describing the content of the course. In particular, user profiles and learning objects are linked together across topics. The richer expressivity of this formalism allows us to develop semantic functionalities such as user profiling, learning path generation and expert finding.
Thanks to the ontology it is possible to enhance the consistency of the inserted data. This is done by using a reasoner to check the consistency of the entire database every time new data is inserted.
The technology adopted to represent and manage the data in the ontology is based on state-of-the-art Java open source solutions: Jena, Pellet 2 (Sirin et al., 2006), and Protégé. We used Protégé to build the ontology, Jena to access the ontology and Pellet to reason on the data. Access to the ontology from the various sub-systems is implemented by adopting a client/server architecture developed in Java.
3. The BONy Ontology
The development of the BONy ontology has been inspired by the following principles:
• re-usability: when adapted to a new course, the OWL schema of the BONy ontology is preserved and only the RDF data change, which allows us to minimize the cost of adapting to new domains;
• modularity: the ontology is composed of three modules representing the eLearning content, the social network and the topics of the course, which allows us to change the course while preserving the community;
• best practices: the ontology has been designed by specializing Ontology Design Patterns (ODP).
Regarding re-usability, we carefully distinguished the OWL part of the ontology (i.e. the metamodel) from the actual data. In this way, the platform can be adapted to new communities, learning objects and topics without any change in the ontology. To this aim, the topic taxonomy has been reified, so that topics are instances and not classes.
To allow modularity, the ontology has been subdivided into three main components (see Figure 1):
• Topic Ontology: it describes the subjects covered by the eLearning course and their conceptual dependencies. Topics are instances of the class TOPIC, and they are related to each other by the object properties isSubTopicOf and nearTopicTo, which connect different instances of the same class.
• eLearning Ontology: it is about the learning objects and describes their different features, e.g. the type of electronic support adopted, dependencies between learning objects and the time required for learning. This part of the ontology is composed of different classes such as LearningActivity, SCO and CourseRole. The instances of those classes and their relations have mostly been derived from the corresponding SCORM descriptions by a re-engineering process.
• User Ontology: it is about the social network players, representing students' and teachers' profiles, their relationships and their skills. All the users in the network are represented by instances of the class AGENT. Specific subclasses are STUDENT, TEACHER and EXAMINER.
The topic ontology operates as a link between the eLearning ontology and the user ontology. For example, users and SCOs can share a relation with a common topic, allowing the development of recommendation services and the automatic assessment of the user profile.
Users are linked to topics by the knowsTopic relation, which reflects their skills in 5 specific degrees: knowsMediocre, knowsBasic, knowsFair, knowsGood and knowsPerfect. In a similar way, Learning Objects are linked to Topics by the relation hasTopic. This relation is derived from the keyword annotation performed on the learning objects, which is represented in the ontology as well. In fact, keywords are linked to topics, which allows the hasTopic relation between learning objects and topics to be inferred.
Our ontology is developed according to the Ontology Design Pattern (ODP) paradigm (Gangemi, 2005), i.e. utilizing and specializing already existing reusable ontologies to describe a particular piece of domain knowledge. An ODP is usually a small ontology that solves complex modelling issues to enhance the semantic interoperability of different knowledge components. The notion of ODP was introduced in 1999 for a particular problem domain in biology (Reich, 1999). Afterwards, ODPs appeared under different names such as semantic patterns, knowledge patterns and design patterns for Semantic Web ontologies, which are now called ODPs. A large repository of ODPs is available on line.
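To make the role of reified topics more concrete, the following is a minimal sketch in plain Python of how topics, users and learning objects could be related once topics are instances rather than classes. It is not the BONy implementation, which relies on RDF/OWL, Jena and Pellet. The property names isSubTopicOf, hasTopic and knowsGood come from this section; the names hasKeyword and keywordOfTopic, the Ontology class and all sample identifiers are hypothetical and only serve the illustration.

```python
from dataclasses import dataclass, field

# Skill degrees named in the paper, ordered from weakest to strongest.
DEGREES = ["knowsMediocre", "knowsBasic", "knowsFair", "knowsGood", "knowsPerfect"]


@dataclass
class Ontology:
    """Toy triple store: topics are plain instances (strings), not classes."""
    triples: set = field(default_factory=set)

    def add(self, subject, predicate, obj):
        self.triples.add((subject, predicate, obj))

    def objects(self, subject, predicate):
        return {o for s, p, o in self.triples if s == subject and p == predicate}

    def subjects(self, predicate, obj):
        return {s for s, p, o in self.triples if p == predicate and o == obj}


def infer_has_topic(onto):
    """Add (learning_object, hasTopic, topic) whenever both share a keyword.

    hasKeyword and keywordOfTopic are hypothetical property names.
    """
    inferred = set()
    for lo, predicate, keyword in onto.triples:
        if predicate != "hasKeyword":
            continue
        for topic in onto.objects(keyword, "keywordOfTopic"):
            inferred.add((lo, "hasTopic", topic))
    onto.triples |= inferred
    return inferred


# Hypothetical sample data for a project management course.
onto = Ontology()
onto.add("RiskManagement", "isSubTopicOf", "ProjectManagement")
onto.add("risk register", "keywordOfTopic", "RiskManagement")
onto.add("sco_12", "hasKeyword", "risk register")
onto.add("alice", "knowsGood", "RiskManagement")

infer_has_topic(onto)
experts = onto.subjects("knowsGood", "RiskManagement")
```

Because a topic is just a value in a triple, adapting the sketch to a new course only means loading different triples; the code (standing in for the OWL schema) does not change.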
3.1. Populating the ontology
The ontology is populated by re-engineering data coming from different databases belonging to different applications, and in particular: a) from the e-Learning course (described by the manifest file in the SCORM syntax) and b) from the user profiles in the LCMS and in the Social Network (see Figure 1). In addition, the Topic ontology has been manually populated with topics of interest for the domain of the course and their relationships.
Figure 1: Services and data are linked by the BONy ontology
Figure 2: The BONy Ontology: concept hierarchy
Populating the eLearning Ontology To populate the eLearning ontology we re-engineered data from SCORM to RDF following the metamodel developed for the eLearning ontology, which basically reflects the SCORM distinctions. To this aim, we represent some of the relevant distinctions in the SCORM definition as properties of the ontology. This process is fully automatic and is performed once the course is loaded into the platform. It is done partially by re-engineering the XML-based metadata in the SCORM manifest file. In order to connect the learning objects to the topic ontology, we exploited the keyword annotation developed to build the topic taxonomy, and we inferred a relation between a topic and a learning object if one or more keywords are associated with both. This is done by a CONSTRUCT query in the SPARQL language.
Populating the Topic Ontology The ontology class Topic is one of the most important classes. It allows us to link users and learning objects. In our ontology we use a semiotic notion of Topic as a (usually potential) collection of SocialObject(s). For example, Project management is a topic constituted by the set of social objects that are associated with project-management related entities, such as tasks and deliverables.
Topics are related to each other by Narrower and Broader relations. The procedure adopted to build the topic ontology was entirely manual, but at the same time inspired by quantitative principles aimed at preserving a fairly uniform distribution of learning objects across topics.
To achieve this, we first selected a set of keywords describing each learning object, then we looked for their corresponding pages in Wikipedia in order to find their corresponding categories. We selected those categories as topics after manual revision, and we browsed the narrower/broader relationships among them to work out a meaningful taxonomy describing the project management domain.
Figure 3: Interface for user profiling in the BONy platform
Populating the User Ontology The user data are derived from a variety of different systems integrated in the BONy platform. The ontology allows us to integrate different frameworks such as DOKEOS (where personal data are collected in a database) and SPREE (an open source knowledge exchange network), where the data about the know-how of each user are registered. Relations between Users and Topics are first established at registration time by the user profiling module, and can then be refined by the user at any time. Synchronization between the user profile in the Social Network and the ontology is guaranteed by updating the ontology at every change.
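As a rough illustration of how a profiling module could establish initial relations between users and topics, the sketch below compares bag-of-words vectors with cosine similarity, the measure this paper attributes to the SPREE framework. The function names, the tokenization, the sample texts and the topic names are illustrative assumptions and do not reproduce the SPREE implementation.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_topics(user_text, topic_texts):
    """Return (topic, similarity) pairs, best match first."""
    user_vec = bag_of_words(user_text)
    scores = {
        name: cosine(user_vec, bag_of_words(text))
        for name, text in topic_texts.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical topic descriptions and user-supplied text.
topics = {
    "RiskManagement": "risk mitigation risk register contingency",
    "Budgeting": "budget cost funding financial report",
}
ranking = rank_topics("I maintain the risk register and plan risk mitigation", topics)
```

How a similarity score would be translated into one of the five skill degrees is not specified in the paper, so the sketch stops at the ranking.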
4. Intelligent Functionalities of BONy
The BONy platform provides three main semantic services which are far beyond the capabilities of current eLearning technology: mentoring (i.e. the generation of a personalized learning path within the course on the basis of the user profile), semantic expert finding (i.e. looking for experts within the network who are able to answer specific questions) and multilingual search (i.e. the capability of retrieving Learning Objects in any of the 11 different languages of the BONy course). Mentoring and expert finding are based on the user profile, automatically inferred by the platform and represented in the ontology.
Even though some of these services have already been proposed in the literature, BONy is the first working platform implementing all of them at the same time in an integrated environment, thanks to the massive use of semantic technology.
Figure 4: Expert finding process. When a question is submitted, the system categorizes the question (categories box on the left figure) and searches the experts (Experts box on the right).
User profiling The user profile is represented in the ontology and consists of biographic data, such as email, name and address, as well as the assessment of the user's skills. The user's skills are represented by relations between the user and topics in the ontology, as described in Section 3. The BONy platform is able to assess the competence of each user in a semi-automatic way, by looking at web pages and other content indicated by the user as reference material for her/his competence. This process is easy, quick and effective, and works as follows.
Every time a new user is enrolled in the system, she/he is asked to enter a set of web pages describing her/his skills (e.g. her/his home page, the home page of her/his university or organization). In addition, she/he is asked to enter a set of keywords describing her/his skills. This is illustrated in the left part of Figure 3.
Then, the BONy platform uses Information Retrieval and Natural Language Processing techniques to match the content described so far with the topic ontology, in order to establish new relations between her/his profile and the topic ontology. To this aim, we exploit one of the core capabilities provided by the SPREE framework, which is able to crawl specified sites, extract bags of words, represent each page in a vector space model, and then measure the similarity between the vectors associated with users and those associated with topics in the ontology by cosine similarity. To this aim, the SPREE platform generates bag-of-words vectors for each topic in the ontology when it is installed, adopting an approach very similar to the one described for the user (Bauckhage et al., 2007; Wetzker et al., 2007). The result of this process is a preliminary assessment of the user's skills, which can be further refined by the user by adding new topics or modifying the degree of relevance of each category, ranging from basic to perfect. This is illustrated in the right part of Figure 3. The user model obtained so far is then stored in the ontology while checking its logical consistency.
Mentoring The aim of this service is to recommend a minimal sequence of learning objects to a new user on the basis of her/his profile. This sequence is generated automatically by an algorithm whose goal is to select learning objects so that the user does not study subjects she/he already knows, while concentrating on filling the gap between her/his initial skills (i.e. those inferred by the user profiling module described in the previous section) and the full range of topics covered by the course. The goal of this process is to minimize the time required to study all the topics of the course by avoiding subjects that are already well known, while taking into account dependencies between learning objects.
The output of this process is illustrated in Figure 5. After clicking on “yes, I would like to try”, the automatic mentoring process starts and after a few seconds returns the sequence of learning objects, a subset of which is marked by a green sign (see the right part of Figure 5), meaning that the student does not need to go through them since she/he is already skilled in the subject. The effect of this process is that the system generates a minimal set of learning objects, saving the student from going through the full course, which takes around 5 hours in the European Project Management case study. Instead, the student is supposed to study less, saving time (about 1 hour in the example in Figure 5).
To implement this service, a typical approach in Artificial Intelligence is to use a planner. Given the reduced number of constraints and the relatively small scale of the domain, it was possible to implement the same set of capabilities in a much simpler way by defining ad-hoc SPARQL queries and using a reasoner. This yields a planner that is different from those using a rigorous logical formalism and a clear definition of goals. Instead, using SPARQL we can only make an approximation, because the objective is not formalized. In fact, each learning object is linked to one or more Topics. This allows us to link the user profile (the degree of knowledge in the different Topics) with the learning objects regarding topics she/he knows better. A simple SPARQL query allows us to select all those Learning Objects about topics that are not in the user profile, generating the mentoring service we are interested in. This service is implemented by adopting the Jena API to perform the SPARQL queries and Pellet 2 to reason on the data.
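The selection step can be sketched in a few lines. The example below is written in plain Python rather than with the SPARQL queries and reasoner the platform is described as using, so it should be read as an approximation of the idea and not as the BONy algorithm. It assumes that a topic counts as known once the user holds it at a sufficient degree (the threshold is left to the caller), that a learning object can be skipped when every topic it covers is known, and that every prerequisite listed under "requires" is itself part of the course. The function name learning_path, the dictionary layout and the sample course are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def learning_path(course, known_topics):
    """Split a course into (to_study, skippable) lists in dependency order.

    course maps a learning-object id to a dict with:
      "topics":   set of topic names the object covers
      "requires": set of learning-object ids that must come first
    A learning object is skippable when every topic it covers is known.
    """
    # TopologicalSorter expects a mapping from node to its predecessors.
    graph = {lo: info["requires"] for lo, info in course.items()}
    ordered = list(TopologicalSorter(graph).static_order())

    to_study, skippable = [], []
    for lo in ordered:
        if course[lo]["topics"] <= known_topics:
            skippable.append(lo)
        else:
            to_study.append(lo)
    return to_study, skippable


# Hypothetical three-object course.
course = {
    "intro": {"topics": {"ProjectManagement"}, "requires": set()},
    "risk": {"topics": {"RiskManagement"}, "requires": {"intro"}},
    "budget": {"topics": {"Budgeting"}, "requires": {"intro"}},
}
to_study, skippable = learning_path(course, {"ProjectManagement", "RiskManagement"})
```

The skippable list corresponds to the learning objects marked with a green sign in the interface, while to_study is the reduced path presented to the student.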
Expert finding The role of the expert finding service is to look for other students in the network who are able to answer a specific question. Every user is regarded as a possible expert on the Topics with which her/his user profile has the strongest association. BONy is able to look for suitable experts by simply classifying questions with respect to the topics in the ontology, which is the same one adopted to represent the user skills.
To this aim, a bag of words for each topic in the ontology is retrieved from on-line or off-line resources. The same process is used to describe the user profile. Then each expert is mapped to one or more topics by using similarity metrics (Bauckhage et al., 2007; Wetzker et al., 2007).
Every time a new question is submitted, the system classifies it with respect to the topic ontology. Classification is used together with a similarity measure between the query and the expert profiles in order to select the five top-scored experts. The question is then automatically sent to them by email. The answers collected so far are then stored in a public forum and can be ranked by using a feedback mechanism, in order to assess the reputation of users in the network and to promote new experts for forthcoming questions.
Figure 4 presents an example of the expert finding process, showing the categories of the question and the retrieved experts.
Multilingual Search All learning objects and their textual content have been indexed by a search engine (i.e. Lucene). The index is built using the text within the slides and the keywords associated with each of them. Since the learning objects are aligned across languages by a common representation in the ontology, it is possible to write queries in any language and to retrieve pages in different languages. Expanding the text in learning objects with the keywords in the ontology is also a way to implement semantic search. Figure 6 shows a screen-shot of the search engine and its multilingual capabilities.
5. Conclusion and future work
In this paper we presented BONy, a technology enhanced platform for collaborative learning using semantic technology to enhance interoperability between systems and to allow advanced functionalities such as expert finding, mentoring and multilingual search. Those functionalities largely exceed the capabilities of existing state of the art e-Learning platforms. BONy is a unique showcase for the next generation of semantic systems for e-Learning and can be used on line.
The main focus of our work has been showing the new capabilities allowed by connecting linguistic data with knowledge bases, how to represent this information in a proper knowledge base and how to make it interoperable with linked data in the semantic web. Therefore we did not concentrate on boosting the performance of the single components, for example by using richer ontologies or more advanced Natural Language Processing techniques. In the future, we are going to develop the 3.0 version of the BONy platform, where semantic web data will play a big role in shifting from an information centric to a knowledge centric system. In particular, we are going to implement a knowledge centric authoring tool for learning objects, where semantic web
data are composed of Ontology Design Patterns specialized on the subject of interest for the course. We are also going to exploit agent based technologies for advanced tutoring and mentoring, and to replace the retrieval engine with a more powerful recommendation engine, looking for semantic web data as well as for internal repositories of learning objects. Last but not least, we are going to explore the potential of applying advanced NLP tools for information extraction from text and of linking the extracted information to dictionaries like WordNet and other linguistic resources available from collaborative work, such as Wiktionaries and DBpedia.
Figure 5: Output of the mentoring process
Figure 6: Full text multilingual search inside the eLearning
This work has been supported by the BONy project, financed by the Education and Culture DG of the EU, grant agreement N 135263-2007-IT-KA3-KA3MP, under the Lifelong Learning Programme 2007 managed by EACEA.
C. Bauckhage, T. Alpcan, S. Agarwal, F. Metze, R. Wetzker, M. Ilic, and S. Albayrak. 2007. An intelligent knowledge sharing system for web communities. In IEEE Int. Conf. on Systems, Man, and Cybernetics, Montreal, Canada.
A. Gangemi. 2005. Ontology design patterns for semantic web content. In Proceedings of the ISWC 2005, volume 3729 of Lecture Notes in Computer Science (LNCS).
Y. Grandmontagne. 2008. Technical report, DOKEOS.
F. Metze, C. Bauckhage, T. Alpcan, K. Dobbrott, and C. Clemens. 2007. A community based expert finding system. In Proceedings of IEEE Int. Conf. on Semantic Computing, Irvine, CA.
V. Presutti, A. Gangemi, S. David, G. A. de Cea, M. C. S., E. Montiel-Ponsoda, and M. Poveda. 2008. Deliverable 2.5.1: A library of ontology design patterns: reusable solutions for collaborative design of networked ontologies. Deliverable Project Number IST-2005-027595, NeOn: Lifecycle Support for Networked Ontologies.
J. R. Reich. 1999. Ontological design patterns for the integration of molecular biological information. In Proceedings of the GCB’99.
E. Sirin, B. Parsia, B. C. Grau, A. Kalyanpur, and Y. Katz. 2006. Pellet: A practical OWL-DL reasoner. Technical report, Maryland Information and Network Dynamics Lab Semantic Web Agent Project.
V. Svatek. 2004. Design patterns for semantic web ontologies: Motivation and discussion. In Proceedings of the 7th Conf. on Business Information Systems, Poland.
R. Wetzker, T. Alpcan, C. Bauckhage, W. Umbrath, and S. Albayrak. 2007. An unsupervised hierarchical approach to document categorization. In IEEE Intl. Conf. on Web Intelligence (WI’07), Silicon Valley, USA.
Social E-SPACES: socio-collaborative spaces within Virtual Worlds
Vanessa Camilleri, Matthew Montebello
University of Malta
E-mail: [email protected], [email protected]
Abstract
This paper presents research based on a current study validating the effectiveness of the teaching and learning process in the context of virtual spaces. A report about teens and social media (Lenhart, Madden, Rankin Macgill, & Smith, 2007) reveals that 93% of the teens who were interviewed use the Internet as a social meeting place. This, coupled with recent internet usage statistics, establishes ‘digital natives’ as active participants in the design of new media as social collaborative tools. Would these social tools be effective in the e-learning context or will they form part of a wider knowledge management framework? The purpose of this study is to outline the design of the measurement of interaction processes in the virtual spaces used for e-learning.
interaction processes, rather than knowledge acquisition.
The socio constructivist approach in this scenario
describes learning as a collaborative meaning-making
experience where learners participate in a number of
interaction processes which facilitate the learning process.
The interactions between learning communities, as well
as individuals within the learning communities, as has
been argued by Alier (2006), in essence would enforce the
reason for existence of Virtual Worlds (VWs)
transforming gaming into serious ga ming, breeding social
communities of practice (CoP) which eventually develop
into learning communities.
The scope of this study is to create the framework for the
measurement criteria assessing the validity of VWs for
the teaching-learning process using human-behaviour
parameters. The rest of the paper is structured as follows:
Section 2 gives a brief overview of the insights into
e-learning perspectives, whereas Section 3 highlights the
pedagogical value of collaborative spaces. Section 4 has a
look at established pedagogies which can be implemented
in VWs whereas Section 5 and Section 6 propose a design
and framework parameters for E-SPACES. Section 7
looks at future developments of the framework for the
measurement and validity of the effectiveness of social
collaborative process in the VWs.
1. Introduction
Tiropanis, et al., (2009) discuss how the level of adoption
and use of tools and services within the higher education
sector in the UK associated with teaching and learning,
are various. In addition to these tools and services in the
form of Web2.0 applications, or Learning Management
Systems (LMS), a number of educational institutions,
make use of Virtual Worlds (VWs) for various learning
activities (NMC, 2010). These learning activities take the
form of seminars or tutorials or simulations as well as
other problem solving exercises, engaging learners in
their knowledge building process (Wrzesien & Alcaniz
Raya, 2010). More learning activities are described in
Petrakou (2009) who illustrates in detail the scope of
having a specific virtual environment for the facilitation
of transmission of the online content whilst Kumar et al.
(2008) portray the VW as being a social environment
which holds computer-based simulations which users can
make use of without any pre-defined objectives but which
yet assimilates groups of people together through an
expression of interest. Carey (2007) argues that VWs are
intended to be immersive social experiences which not
only offer alternatives to face to face interactions but
which can also provide new forms of human experiences,
built upon a vast array of communication tools which can
offer the same emotional satisfaction as gathered from the
social exchanges happening on the daily basis. This of
course is discussed within the context of the online
environment which is discussed extensively in
(Dillenbourg, Lehtinen et al., and Slavin in Petrakou,
(2009)). These describe how the transition towards the
migration of learning content to the online environment is
further assisted by a number of interaction processes
within collaborative learning. The latter, being one of the
pillars of the design for e-learning systems, contributes to
the construction of new concepts, collectively brought
together through communities, most often, established by
dialogical interaction (Etelapelta & Lahti, 2008). The
premise of this study is built around learning theories
which adopt the socio-constructivist approach (Vygotsky,
1978) describing knowledge construction through
2. E-learning Perspectives
Over the years the use of ICT in education has shifted
from mere Computer Based Learning (CBL), making use
of software as a means of knowledge transmission, to
Computer Enhanced Learning (CEL), which aims to
improve the environment for creative knowledge practice.
Studies have in fact shown that merely pushing content
online is not returning the results expected (Spalter &
Simpson, 2000).
Solimeno et al. (2008) show how, up until the late 1980s,
the first models of computer supported education put the
learner as a solo-user, creating an isolated niche where the
promotion of learning at one's own individual pace was
highlighted. However, more recently, due to the social
networking boom, researchers have been looking at a
more advanced form of computer supported collaborative
learning, as an additional enhancement to the online
teaching model. Such a derivation of the CEL is based
upon constructivist learning theories which focus
primarily on the social interdependence as affecting the
learning process. This in fact has given rise to a new
evolution to the use of learning management systems,
which in addition to providing content, are also providing
some means of online interactivity, paving the way for
social interactions as a means of constructing knowledge
concepts. The e-learning paradigm has shown an
evolution from Learning Management Systems (LMS)
where the scope is that of utilising the web as a pipe, there
only to deliver content, to a meeting point, a place in which
to hang out with others in specific Communities of Practice (CoP).
Brophy (in Paechter, Maier, & Macher, 2010) proposes
five fields of instruction as being core components of
e-learning design. These include the course design and the
electronic environment, the interaction between students
and instructors, the interactions among peers, the
individual learning process and the course outcomes. The
interactions and processes will be discussed in more detail
in Section 4. In addition to interactions, Granic, Mifsud,
& Cukusic (2009) further propose that clear pedagogical
objectives based on sound pedagogic principles need to be
incorporated within the e-learning design for more
effectiveness within the teaching learning process to be
achieved. The pedagogic approach chosen by the authors
for their study is built around the concept of "active
learning", with core components which include aspects of
constructivism, blended learning and collaborative
learning. Engaging and further motivating the learner for
a more active involvement using Kolb's experiential
learning theories (Kolb, 1984) is one of the basic
pedagogic principles which will be adopted in this study.
This then leads to the development of a
socio-constructivist model for e-learning which will be
used to enhance deeper conceptual thinking.
social capital is defined as a pool of resources which an
individual can accumulate as a result of developed
interrelationships. The parameters within which learning
communities are assessed include:
• A measure of learners' satisfaction during
• Characterisation of interpersonal relationships
during collaborative practice;
• Peer support, indicating connectivity throughout
the experience;
• Change in behaviour owing to the social capital
These parameters will be taken into account when
designing the framework for measuring the effectiveness
of virtual worlds for knowledge building activities.
4. Virtual Pedagogies?
Camilleri & Montebello (2008) propose a virtual
assistant within the social context of VWs which not
only aims to assist and aid in cooperative knowledge
building, but which can also learn and sustain a mentally
stimulating interactive conversation, a two-way
communication which finds its roots in every social
networking application. Online users' requirements are
considered quite distinct; however, those resident in VWs
have unique needs. These include:
• Maintaining student engagement;
• Developing a community;
• Providing immediate feedback;
• Similar learning opportunities;
• Hands-on interactive activities;
• Student-content interaction;
• Faculty-student interaction;
• Student-student interaction.
Furthermore the authors identify a number of strategies
which when implemented within the virtual world space
contribute to the creation of a 'Learnscape', a learning
environment within the VWs which is built upon:
Flow in balancing inactivity and challenge.
Repetition allowing learners to repeat experiments until
they are satisfied with the outcomes.
Experimentation in encouraging learners to try and learn
in the process.
Experience which is more engaging than other digitally
mediated technologies.
Doing through practice.
Observing through an essential communication platform.
Motivation stimulated by the people's own active part.
3. Collaborative Spaces
Having established that research trends in pedagogies
applied to the online learning environment point towards
the setting up of communities for collaborative
constructivist models, this research proposes to determine
the parameters around which such communities are built.
Miller & Brunner (2008) make use of the Social Impact
Theory (SIT) (Latane in Miller & Brunner, 2008) to
understand how learners' interpersonal characteristics
affect peers during collaborative learning experiences.
This theory is described as changes in an individual's
behaviour, resulting from communication exchanges with
'perceived' and real individuals. The concept of the
perceived peers can be made use of within the virtual
world ecosystem, an environment designed and built as a
collaborative space. The online environment in itself has
been indicated as being more of a support and a
supplement to face to face interaction. Research (Tomai
et al., 2010) has
shown that the development of online communities and
social networks contributes to a possible increase in the
social capital for each individual within the group. The
Virtual pedagogies are also designed around different
approaches and perspectives. Bonanno (2008) in his
discussion of learning through collaborative gaming: a
process-oriented pedagogy, comes up with a new model
which derives its inspiration from "connectionist" and
"constructivist" perspectives and which serves the
and the major factors that influence them during
Monahan et al. (2008) describe the transition to
the Virtual Reality (VR) environment as one in which the
shift is from the 'conventional text-based' to the
immersive and intuitive one, where the computer
simulates the natural environment thus making it easier
for the learner to identify with. The project Virtual
European Schools (VES) (Bouras in Monahan et al.,
2008) simulates a collaborative learning
environment within virtual classrooms themed around
specific school subjects. This project has achieved a high
level of user satisfaction, highlighting the social presence
as a 'major advantage'. The authors describe CLEV-R, a
3D learning environment which proposes the social
interaction between learners as one of the most important
elements which is exploited from its marked absence in
other conventional, text based learning systems.
E-SPACES will attempt to build upon this research, by
making use of pedagogies and principles which have
already been ascertained to bring about a change in learner
behaviour, within VWs. These will be used to create a
measurable standard for the effectiveness of VWs on
learning, using distinct parameters for integrating human
complex behaviour in community-based learning.
Perceived usefulness of VWs for learning;
connecting relationships established through
learning communities.
The setting will be piloted within a specific case scenario,
in the higher education context. In this scenario, students
following the teacher training course (B.Ed. (Hons.)) will
experience collaborative learning practices, through a
hands-on pilot study held inside an immersive
environment, such as Second Life (SL, 2010) or Olive
(Forterra Systems, 2010). This pilot study will embark on
offering individual learning 'objects' within the virtual
world following the 'FREEDOM' model outlined in
Section 4 of this document.
6. The E-SPACES Framework
Based upon the perspectives of e-learning design, the
E-SPACES framework will take into account all the
interaction processes for connectivity and build a virtual
The E-SPACES framework will be designed around a
simulating environment exploiting the VW through
collaborative, constructivist and experiential activities.
Content presented for the pilot study will focus around
specific tasks and activities which future teachers can
design and create for their students. This means that,
through the virtual world, these future teachers will
partake in their own active learning processes to design
different activities for school children at different levels.
Collaboration will take place within this virtual meeting
place, which also offers sandboxes in which to
experience, in practice, their peers' task designs. The
scope of the framework is to clearly define the
measurement parameters and establish whether VWs
increase the effectiveness of the teaching/learning process
by comparing the results to a real world control group
participating in the same exercise in a face-to-face
classroom setting.
E-SPACES proposes that the content bridges the gap
between the pedagogic approaches and the interactions
between the actors involved. In the VW, E-SPACES
proposes three distinct actors all having a number of
interactions; the educator as the instructional designer, the
virtual agent as an intelligent assistant facilitating the
virtual experience, and the learners actively involved in
their own learning process. The interactions proposed
involve the three actors, interrelating with the content
presented within the socio-collaborative environment.
The approaches connected with the actors' interactions
will build this virtual ecosystem which will be the niche of
the learning experience in the social space constructed.
The research questions will be shared amongst the actors
in this framework. The methodology proposes both
qualitative and quantitative data collection, taking views
attainment targets at the end of a pilot course in the
E-SPACES framework.
The questions proposed in this study are designed for the
measurement of effectiveness within this learning
framework and will facilitate a clearer understanding of
5. Proposed Design
One very important component of this study is the
implementation of a virtual space, included in which are
the key elements which would enable users to experiment
with their own learning and interact with each other in a
collaborative environment, in a persistent space,
facilitating meetings, collaboration, and socialisation for
the construction of new concepts. It is also important for
the study to establish the validity of the theories proposed
and the implementation of the technologies applied in
terms of the teaching/learning experience.
Through the use of virtual reality learners can thus
become more visually aware of their companions, through
they are no longer isolated in their online learning sphere.
The design of E-SPACES proposes that avatar presence is
persistent within the VWs. In Camilleri & Montebello
(2008) the concept of persistence and scope is emphasised
in that VWs without a collective scope or interest remain
void and fulfil nothing more than a static representation of
content transmission. The pedagogical approaches
proposed use concepts of 'active learning', involving
learners in their own knowledge building, using the
constructivist and collaborative models as well as
process-oriented models (Bonanno, 2008). Categories of
interactions, influenced by a number of parameters and
interpersonal factors in the connected VW will also be
applied within the design. One fundamental approach to
the design is the specification of the learning communities
of practice in the VW context, and the content which will
serve to connect learners.
The complex human behaviour relationships which will
be targeted through E-SPACES will measure the:
• Attitude of users towards 3D VWs;
• Perceived behavioural control of users in
relation to the VWs;
9. References
the findings.
Question #1: How does learning occur in the VW?
Question #2: What are the students' perceptions of
learning in the online context?
Question #3: What are the students' perceptions of
learning in the VW context?
Question #4: Does learning transfer from the VW to real
Question #5: What is the perceived usefulness of the VW
context for the learners?
Question #6: How are the interactions in the VW
Question #7: How useful for their learning do learners
find the interactions within the space?
Ajjan, H., & Hartshorne, R. (2008). Investigating Faculty
decisions to adopt Web2.0 technologies: Theory and
empirical tests. Internet and Higher Education , 71-80.
Alier, M. (2006). A Social Constructionist Approach to
Learning Communities: Moodle. In M. D. L, & N. (.
Ambjorn, Open Source for Knowledge and Learning
Management: Strategies beyond Tools. Idea Group,
Barbour, M. K., & Reeves, T. (2009). The reality of
virtual schools: A review of literature. Computers &
Education , 402-416.
Bonanno, P. (2008). Learning through Collaborative
Gaming: A Process-oriented Pedagogy. Finland:
Camilleri, V., & Montebello, M. (2008). SLAVE - Second
Life Assistant in a Virtual Learning Environment.
RELIVE08 - Researching Learning in Virtual
Environments. Milton Keynes: The Open University.
Carey, J. (2007). Expressive Communication and Social
Conventions in Virtual Worlds. The Data Base for
Advances in Information Systems , 81-85.
Casamayor, A., Amandi, A., & Campo, M. (2009).
Intelligent assistance for teachers in collaborative
learning environments. Computers & Education ,
Chou, S.-W., & Min, H.-T. (2009). The impact of media
on collaborative learning in virtual settings: The
perspective of social construction. Computers &
Education , 417-431.
Etelapelta, A., & Lahti, J. (2008). The resources and
obstacles of creative collaboration in a long-term
learning community. Thinking Skills and Creativity ,
Granic, A., Mifsud, C., & Cukusic, M. (2009). Design,
implementation and validation of a Europe-wide
pedagogical framework for e-Learning. Computers &
Education , 1052-1081.
Jarmon, L., Traphagan, T., M, M., & Trivedi, A. (2009).
Virtual world teaching, experiential learning, and
assessment: An interdisciplinary communication
course in Second Life. Computers & Education ,
Kolb, D. A. (1984). Experiential learning: Experience as
a source of learning and development. Englewood
Cliffs, NJ: Prentice-Hall.
Kumar, S., Chhugani, J., Kim, C., Kim, D., Nguyen, A.,
Dubey, P., et al. (2008). Second Life and the New
Generation of Virtual Worlds. Computer , 46-53.
Miller, M., & Brunner, C. C. (2008). Social impact in
examination of online influence. Computers in Human
Behavior , 2972-2991.
Monahan, T., G, M., & Bertolotto, M. (2008). Virtual
Reality for Collaborative e-learning . Computers &
Education , 1339-1353.
NMC. (2010). What is Happening in Virtual Worlds? US:
Paechter, M., Maier, B., & Macher, D. (2010). Students'
expectations of, and experiences in e-learning: Their
relation to learning achievements and course
satisfaction. Computers & Education , 222-229.
Petrakou, A. (2009). Interacting through avatars: Virtual
worlds as a context for online education. Computers &
7. Future Developments
The E-SPACES framework is interdependent on a number
of parameters including the VW platform chosen, the
target sector of learners involved in the pilot study, and the
content which is chosen to bridge the gap between the
pedagogic approaches and the interactions proposed. It is
being proposed that the current study undergoes specific
analysis to gather data for this framework. It is then
proposed that data and content are integrated within the
framework and implemented during a short pilot course.
The limitations and challenges of this study will surface if
a limited number of students are chosen for this study.
This might occur depending on the content chosen and the
participants available for the duration of the course. The
quantitative measure of the effectiveness of the social
spaces will need to be performed against a control. Such
control might be difficult to establish in the context of the
learning environment. It is expected that the future
development of E-SPACES is to identify limitations and
challenges, for the design of the study measuring the
effectiveness of social niches established in the context of
8. Conclusion
Whilst 3D VWs seem to point towards
increased use in the learning contexts of the future, there
is limited research validating their effectiveness based
upon pedagogic approaches taking into consideration
collaborative learning in the socio-constructivist
perspective. Virtual spaces have a number of
characteristics which can be found commonly throughout
all platforms including the presence of avatars, an
immersive experience and a series of interactions between
player characters, non-player characters and other world
components. VWs are a combination allowing for
real classroom, including all the interactions and
exchanges, indeed be transferred to the virtual world?
How can this challenge be identified and overcome? Can
technology be used to increase the effectiveness of this
learning medium? This research is needed to understand
how experiential collaborative activities may apply to a
number of instructional contexts within the VWs.
Education .
Solimeno, A., M.E., M., Tomai, M., & Francescato, D.
(2008). The influence of students and teachers
characteristics on the efficacy of face-to-face and
computer supported collaborative learning. Computers
& Education , 109-128.
Spalter, A., & Simpson, R. (2000). Integrating interactive
computer-based learning experiences into established
curricula: a case study. Proceedings of the 5th annual
SIGCSE/SIGCUE ITiCSE conference on Innovation and
technology in computer science education (pp. 116-119). Helsinki, Finland: ACM, New York.
Tiropanis, T., Davis, H., Millard, D., Weal, M., White, S.,
& Wills, G. (2009). JISC - SemTech Project Report.
Southampton, UK: JISC CETIS.
Tomai, M., Rosa, V., Mebane, M. E., A, D., Benedetti, M.,
& Francescato, D. (2010). Virtual communities in
schools as tools to promote social capital with high
school students. Computers & Education , 265-274.
Vygotsky, L. (1978). Mind and society: The development
of higher mental processes. Cambridge, MA: Harvard
University Press.
Wrzesien, M., & Alcaniz Raya, M. (2010). Learning in
serious virtual worlds: Evaluation of learning
effectiveness and appeal to students in the E-Junior
project. Computers & Education .
A Semantic Knowledge Base for Personal Learning and Cloud Learning
Alexander Mikroyannidis, Paul Lefrere, Peter Scott
Knowledge Media Institute, The Open University
Milton Keynes MK7 6AA, United Kingdom
E-mail: {A.Mikroyannidis, P.Lefrere, [email protected]
Personal Learning Environments (PLEs) and Cloud Learning Environments (CLEs) have recently experienced rapid growth, as a
response to the rising demand of learners for multi-sourced content and environments targeting their needs and preferences. This
paper introduces a semantic knowledge base that utilises a multi-layered architecture consisting of learning ontologies customized
for certain aspects of PLEs and CLEs. A number of stakeholder clusters, including learners, educators, and domain experts, are
identified and are assigned distinct roles for the collaborative management of this knowledge base.
presents integration mechanisms for the different layers
of the knowledge base. Section 5 describes the involved
stakeholder clusters and their roles within the
management of the knowledge base. Section 6 discusses
certain challenges arising from the collaborative nature
of the management of the knowledge base. Finally, the
paper is concluded and the next steps for progressing this
work are provided.
Personal Learning Environments (PLEs) and Cloud
Learning Environments (CLEs) are gradually gaining
ground over traditional Learning Management Systems
(LMS) by facilitating the lone or collaborative study of
user-chosen blends of content and courses from
heterogeneous sources, including Open Educational
Resources (OER).
PLEs follow a learner-centric approach, allowing the use
of lightweight services and tools that belong to and are
controlled by individual learners. Rather than integrating
different services into a centralised system, PLEs provide
the learner with a variety of services and hands over
control to her to select and use these services the way she
deems fit (Chatti et al., 2007).
The OpenLearn case study
The Open University provides a wide
range of OER through the OpenLearn educational
environment. OER can be
described as “teaching, learning and research resources
that reside in the public domain or have been released
under an intellectual property license that permits their
free use or repurposing by others depends on which
Creative Commons license is used” (Atkins et al., 2007).
OER are freely available on the web and can be accessed
through common web sites or Virtual Learning
Environments (VLEs), and more recently through PLEs
and CLEs. They can be used, edited and shared by any
interested party, such as learners, teachers, institutions,
and learning communities.
CLEs extend PLEs by considering the cloud as a large
autonomous system not owned by any educational
organisation. In this system, the users of cloud-based
services are academics or learners, who share the same
privileges, including control, choice, and sharing of
content on these services. This approach has the potential
to enable and facilitate both formal and informal learning
for the learner. It also promotes the openness, sharing
and reusability of OER on the web (Malik, 2009).
OpenLearn users have the ability to learn at their own
pace, keep a learning journal in order to monitor their
progress, complete self assessment exercises, and discuss
with other learners in forums. OpenLearn has gathered
the interest of a wide audience ranging from
governmental and non-governmental entities interested
in promoting continuing professional development,
public and private higher education institutes, academic
teachers, training course designers, graduate and
postgraduate students, educational researchers, and
generally anyone interested in informal learning (Okada,
In the context of the European project ROLE (Responsive
Open Learning Environments), we are targeting the adaptivity and
personalization of learning environments, in terms of
content and navigation, as well as the entire learning
environment and its functionalities. We propose the use
of ontologies to model various aspects of the learning
process within such an environment. In particular, we
consider a semantic knowledge base as the core of the
learning environment, enabling the collaboration
between diverse stakeholder clusters.
OpenLearn is essentially a traditional LMS, based on the
Moodle platform, following a
course-based paradigm, rather than a learner-based one.
It has been built around units of study and not the
personal profiles of learners. Currently, OU students are
missing a place where they can aggregate the content
offered by different OU services, such as OpenLearn and
iTunesU, and mix it together with other educational
The remainder of this paper is organised as follows.
Section 2 describes the OpenLearn case study, consisting
of a traditional LMS in transition towards the PLE and
CLE paradigms. Section 3 introduces the architecture of
the proposed semantic knowledge base and discusses the
various learning ontologies that formulate it. Section 4
Figure 1. Climate change OER in OpenLearn
content. Therefore, what we aim to offer OU students in
the context of ROLE, is a combined aggregator and
e-portfolio, where they can set their learning goals,
gather and organise various learning resources, monitor
their progress, get recommendations from the system and
their peers, and connect with other learners.
widgets, configuring them, tagging them, and organising
them into thematic categories in different tabs.
In the context of the ROLE project, we are working on
the transition from the LMS-based approach of
OpenLearn towards the PLE and CLE paradigms, by
placing emphasis on the needs and preferences of learners.
In particular, we aim at providing them with a wider
range of OER to choose from, both from OpenLearn as
well as from external Web 2.0 sources. However,
discovering OER from such a wide range is not an easy
task; therefore providing the learners with OER
recommendations based on information from their
profiles and portfolios is very important.
In order to explore the present limitations of OpenLearn,
we have been comparing its capabilities with those of a
PLE, by delivering the same learning resources with both
approaches. For this purpose, we have created a
collection of OER related to the UK 10:10 climate
change campaign. Figure 1
shows this collection delivered by the existing
OpenLearn environment, featuring OpenLearn courses
and OU albums from iTunesU. In addition, content from
external sources, such as YouTube and SlideShare, is
included. However, syndication from dynamic Web 2.0
sources, such as the blogosphere, Twitter, and
FriendFeed, is not supported.
We propose the use of ontologies to model various
aspects of the learning process within the transformed
OpenLearn environment. In particular, we consider a
semantic knowledge base as the core of this learning
environment, enabling the use of metadata and
ontologies to annotate learning resources, and model
various aspects of the learning process, such as learner
profiles. The curation of the proposed semantic
knowledge base is supported by the active involvement
and collaboration between different stakeholder clusters.
On the other hand, the PLE of Figure 2 is a showcase of
a widget-based environment hosting the same climate
change resources as in OpenLearn, in addition to
dynamic Web 2.0 sources. Compared to OpenLearn, this
approach offers more flexibility in terms of creating new
Figure 2. A widget-based PLE for climate change OER
Starting from the top of the pyramid, the Learner layer
contains ontologies that model the profiles of the learners
involved in the learning process. In particular, the
ontologies of this layer model the learners’ profiles
according to their interests, goals, preferences, and skills.
Some ontology standards corresponding to this layer are
the IEEE Learning Objects Metadata Standard (LOM),
the IEEE Public and Private Information
for Learners (IEEE PAPI), both developed by the IEEE
Learning Technology Standards Committee (LTSC), the
(, and the IMS
Reusable Definition of Competency and Educational
Semantic knowledge base architecture
In order to efficiently manage the metadata associated
with different aspects of the learning process, we
propose their organisation into a number of ontology
layers. Figure 3 shows the multi-layered semantic
knowledge base adapted from the Heraclitus II
framework (Mikroyannidis and Theodoulidis, 2006,
Mikroyannidis, 2007, Mikroyannidis and Theodoulidis,
In this pyramid, the lower layers represent more generic
and all-purpose ontologies, while the ontologies of the
upper layers are customized for certain uses within a
PLE or CLE. When traversing the pyramid from bottom
to top, each layer reuses and extends the previous ones.
In addition, whenever a layer extends the ones below it
(e.g. with the insertion of new concepts), these
extensions are propagated to the lower layers. Different
stakeholder clusters curate each layer, depending on the
expertise that each layer requires. The integration of the
ontology pyramid layers is achieved with the use of
ontology mappings between ontologies belonging to the
same or different layers.
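The layering described above can be pictured with a small sketch. It is an illustration only, not the Heraclitus II implementation: the `Ontology` class, the `LAYERS` list and both helper functions are hypothetical names introduced here, and the bottom-to-top order (Lexical, Learning Domain, Learning Resource, Learner) is read off the layer descriptions in this section.

```python
from dataclasses import dataclass, field

# Pyramid layers ordered from bottom (most generic) to top (most specific).
# The list is an illustrative assumption based on the layer descriptions.
LAYERS = ["Lexical", "Learning Domain", "Learning Resource", "Learner"]


@dataclass
class Ontology:
    """Hypothetical container: a named set of concepts living in one layer."""
    name: str
    layer: str
    concepts: set = field(default_factory=set)


def layers_below(layer: str) -> list:
    """Return the layers that `layer` reuses and extends (those beneath it)."""
    return LAYERS[:LAYERS.index(layer)]


def is_more_generic(a: Ontology, b: Ontology) -> bool:
    """True when ontology `a` sits in a lower (more generic) layer than `b`."""
    return LAYERS.index(a.layer) < LAYERS.index(b.layer)


# Example objects; the concept sets are placeholders, not real ontology content.
wordnet = Ontology("WordNet", "Lexical", {"climate", "gene"})
profile = Ontology("learner-profile", "Learner", {"interest", "goal"})
print(layers_below("Learner"))
print(is_more_generic(wordnet, profile))
```

Keeping the layer order in a single list means that "reuses and extends the previous ones" reduces to a slice of that list, which is enough for reasoning about which stakeholder cluster is affected when a layer changes.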
The Learning Resource layer models the learning
resources that are employed within a PLE or CLE by
learners. These resources are mainly widgets of
educational tools and content. For example, the climate
change PLE of Figure 2 includes widgets of:
• OpenLearn OER
• iTunesU albums
• External resources, e.g. blog feeds, YouTube videos,
SlideShare presentations, Google gadgets, etc.
• Knowledge maps
The ontologies of the Learning Resource layer are
constructed out of annotations of these widgets. These
annotations can be user-generated tags, or automatically
generated semantic annotations, e.g. with the use of IE
(Information Extraction) and NLP (Natural Language
Processing) techniques. Apart from the Learner layer, the
IEEE Learning Objects Metadata Standard (LOM) also
corresponds to this layer, as it defines models for
learning objects, including multimedia content,
instructional content, as well as instructional software
and software tools.
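As a rough illustration of how user-generated tags could feed the Learning Resource layer, the sketch below aggregates tags per widget and proposes frequently used tags as candidate concepts. Everything in it is an assumption made for the example: the function names, the `(widget_id, tag)` input format and the `min_count` threshold are hypothetical, and automatically generated IE/NLP annotations are not covered.

```python
from collections import Counter, defaultdict


def build_tag_index(annotations):
    """Group tags by widget, counting each tag after simple normalisation.

    `annotations` is an iterable of (widget_id, tag) pairs (assumed format).
    """
    index = defaultdict(Counter)
    for widget_id, tag in annotations:
        index[widget_id][tag.strip().lower()] += 1
    return index


def candidate_concepts(index, min_count=2):
    """Propose tags used at least `min_count` times on a widget as concepts."""
    return {
        widget_id: sorted(tag for tag, count in tags.items() if count >= min_count)
        for widget_id, tags in index.items()
    }


# Placeholder annotations for two hypothetical widgets.
annotations = [
    ("w1", "Climate"),
    ("w1", "climate "),
    ("w1", "energy"),
    ("w2", "CO2"),
]
print(candidate_concepts(build_tag_index(annotations)))
```

A threshold of this kind is only one possible filter; in a curated knowledge base the producers and consumers of learning resources would still review the proposed concepts.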
Knowledge base integration
The integration of the ontology pyramid layers into a
single manageable scheme is achieved with the use of
ontology mappings. In terms of the layers of the ontology
pyramid being mapped, ontology mappings are either
intra-layer, mapping ontologies of the same ontology
layer, or inter-layer, mapping ontologies belonging to
different layers.
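The intra-layer versus inter-layer distinction can be sketched in a few lines. The `OntologyMapping` record and the `mapping_scope` function are hypothetical names used only for this illustration, and the identifiers in the example are placeholders rather than real ontology terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OntologyMapping:
    """Hypothetical record of one mapping between two ontology objects."""
    source: str
    source_layer: str
    target: str
    target_layer: str


def mapping_scope(mapping: OntologyMapping) -> str:
    """Classify a mapping as staying within one layer or crossing layers."""
    if mapping.source_layer == mapping.target_layer:
        return "intra-layer"
    return "inter-layer"


# Placeholder identifiers: two domain ontologies, then a learner-to-resource link.
same = OntologyMapping("domainA:cell", "Learning Domain",
                       "domainB:cell", "Learning Domain")
cross = OntologyMapping("learner:interest", "Learner",
                        "widget:climate-feed", "Learning Resource")
print(mapping_scope(same), mapping_scope(cross))
```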
From an architectural point of view, ontology mappings
can be either structural, namely referring to the structure
of the mapped ontologies, e.g. via is-a relations, or
semantic when mapping two ontology objects via a
semantic relation, such as an employer-employee
relation. OWL Full (Bechhofer et al., 2004) offers a
variety of constructs for representing structural ontology
mappings, including rdfs:subClassOf, owl:sameAs,
The Learning Domain layer models the learning domain
of interest. These are more generic ontologies describing
a certain domain of interest to the learner, e.g.
bioinformatics. The ontologies of the Gene Ontology
(GO) project (The Gene Ontology Consortium, 2000)
and the Foundational Model of Anatomy (FMA)
(Cornelius Rosse, 2003) are some widely used domain
ontologies in bioinformatics.
Ontology mappings are particularly useful for the
extraction of recommendations to the learner, as they
link her profile to learning resources, as well as to
profiles of other learners. They can therefore be used to
recommend learning resources of potential interest to the
learner. They can also be used to recommend a
‘study-buddy’, with whom the learner shares common
abilities and interests.
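A minimal sketch of both kinds of recommendation follows. It assumes, purely for illustration, that the ontology mappings have already been flattened into sets of shared concepts per learner and per resource; the paper does not specify a scoring method, so the overlap count and the Jaccard similarity used here are stand-ins, and all function names are hypothetical.

```python
def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two concept sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend_resources(interests: set, resources: dict) -> list:
    """Rank resources by how many concepts they share with the learner.

    `resources` maps a resource name to its set of concepts. Resources with
    no shared concept are dropped; ties are broken alphabetically.
    """
    scored = [(len(interests & concepts), name) for name, concepts in resources.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


def recommend_study_buddy(learner: str, profiles: dict):
    """Return the other learner whose profile is most similar, or None."""
    others = sorted(name for name in profiles if name != learner)
    if not others:
        return None
    return max(others, key=lambda name: jaccard(profiles[learner], profiles[name]))


# Placeholder profiles and resources.
profiles = {
    "ana": {"climate", "energy"},
    "ben": {"climate", "energy", "policy"},
    "cy": {"anatomy"},
}
resources = {"r1": {"climate"}, "r2": {"anatomy"}, "r3": {"climate", "energy"}}
print(recommend_resources(profiles["ana"], resources))
print(recommend_study_buddy("ana", profiles))
```

Set overlap is the simplest measure that respects the idea of linking profiles through shared concepts; a real system could weight concepts by their position in the ontology instead.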
Finally, the Lexical layer contains domain-independent
ontologies of a purely lexicographical nature. An
example of such an ontology is the widely adopted
WordNet (Fellbaum, 1998). A lexical ontology is the
most generic form of ontology that can be constructed.
The ontologies of this layer can be used to model
practically any domain. The ontologies of all the other
layers are independent of the language used, or other
linguistic issues, which concern only this layer.
Stakeholder clusters
Since each ontology layer represents a different degree
of specialization, different stakeholder clusters are
required to contribute to the curation of each layer.
Starting from the bottom of the pyramid, lexicographers
have the knowledge of language structures that is
required at this level. Domain experts need to be
employed for the next layer. These are professionals in a
certain domain, e.g. biologists are responsible for a
biology-related ontology.
Although lexical ontologies constitute a strong basis for
the construction of any domain-specific ontology, their
relations tend quite often to be imprecise and thus not
suitable for logical reasoning. This can be addressed with
the use of more strictly constructed, general purpose
ontologies, such as SUMO (Sevcenko, 2003). Such
models can act as structuring mechanisms for lexical
ontologies or intermediates between lexical and domain
Figure 3. Multi-layered semantic knowledge base
For the Learning Resource layer, a more diverse group is
suitable: producers and consumers of learning resources.
The producers are those that develop learning resources,
either content or tools. They can be lecturers, learning
designers, or team leaders who develop new courses,
workshops or training sessions and author new learning
material. The consumers are learners who use and
annotate the offered learning resources.
conflicts. Various technologies can be used to
address this issue, such as CVS (The Gene Ontology
Consortium, 2000), Wiki (Auer et al., 2006,
Schaffert, 2006), or peer-to-peer based solutions
(Becker et al., 2005, Xexeo et al., 2004).
Consistency maintenance: Parts of the knowledge
base curated by different authors may be
inconsistent with each other, since an ontology
usually reflects the point of view of each author.
Mechanisms for structural and semantic consistency
preservation as well as change propagation need to
be provided to ensure that the knowledge base is
free of inconsistencies at all times.
Privilege management: In order to ensure the
accuracy of the knowledge base, a collaborative
environment needs to assign different levels of
privileges to its users, based on their expertise,
authority, and responsibility. Our architecture is
based on a flat scheme regarding privilege
management, by giving each stakeholder cluster
equal privileges in their layer of responsibility.
History maintenance: Collaborative environments
should provide the means to recover from wrong or
unintended changes to the knowledge base. All
changes to the knowledge base should thus be
recorded, so that the authorship of each change can be
tracked and the loss of important information
prevented. The bitemporal ontology model of
Heraclitus II (Mikroyannidis, 2007) retains the
necessary information to achieve this goal.
Scalability: Long-term collaboration of diverse
parties usually increases the size of knowledge bases;
therefore, a collaborative environment has to be
scalable to large ontologies. This is particularly
important in the abundant environment of CLEs,
where a wide variety of cloud-based services is
Finally, the Learner layer is curated by learners, who
provide information about themselves in order to receive
recommendations about learning resources and create
personal networks with users from different learning
environments, with whom they may share common
learning interests.
Depending on the scope of intra and inter-layer ontology
mappings, these are performed by one or more
stakeholder clusters. For example, an inter-layer
ontology mapping between the lexical and the domain
layer will be created jointly by the stakeholder clusters of
these two layers, namely lexicographers and domain
experts. Intra-layer ontology mappings are performed by
the stakeholder cluster of the corresponding layer. The
assignment of stakeholder clusters as curators of the
ontology pyramid layers is summarized in Table 1.
Ontology layer            Stakeholder cluster
Lexical layer             Lexicographers
Learning domain           Domain experts
Learning resource         Learning resource developers /
Learner layer             Learners
Inter-layer ontology      Stakeholder clusters of corresponding layers
Intra-layer ontology      Stakeholder cluster of corresponding layer
Table 1. Assignment of stakeholder clusters as curators
of the semantic knowledge base
Challenges in collaborative ontology
PLEs and CLEs address the crucial demands of today’s
learner for a personalized and adaptive learning
environment. In order to achieve these goals, we propose
the use of ontologies for modeling the learning process
and assigning distinct curator roles to the involved
stakeholder clusters. We perceive a semantically
enhanced PLE or CLE as the evolution of the present
OpenLearn environment, as well as the evolution of
LMS-based approaches in general.
Collaboration between stakeholder clusters in curating
the semantic knowledge base is essential; however, it
involves several challenges, including concurrency,
consistency, and scalability issues. We will be targeting
the following set of parameters for collaborative
ontology management, as outlined in (Bao et al., 2006):
Conclusion and next steps
Knowledge integration: A fundamental task in a
collaborative environment is the integration of
contributions from multiple participants. The
proposed semantic knowledge base consists of a
multi-layer architecture that is curated by diverse
clusters of stakeholders. Reusability and integration
are supported through ontology mappings.
We are currently in the process of refining the
specifications of the proposed semantic knowledge base
for addressing particular requirements of the OpenLearn
case study. This refinement includes reviewing existing
ontology standards in terms of their suitability to be
OpenLearn-specific ontology pyramid.
Concurrency management: Different ontology
authors need to be able to work on different parts of
the knowledge base simultaneously. In case the
same part of the knowledge base is concurrently
edited by more than one author, this can cause
Management and Evolution for Business Intelligence.
International Journal of Information Management,
Okada, A. (2007) Knowledge Media Technologies for
Open Learning in Online Communities. International
Journal of Technology, Knowledge and Society, 3(5),
Schaffert, S. (2006) IkeWiki: A Semantic Wiki for
Collaborative Knowledge Management. 15th IEEE
International Workshops on Enabling Technologies:
(WETICE'06). Manchester, UK, 388-396.
Sevcenko, M. (2003) Online Presentation of an Upper
Ontology. Znalosti 2003. Ostrava, Czech Republic.
The Gene Ontology Consortium (2000) Gene Ontology:
tool for the unification of biology. Nat Genetics, 25,
Xexeo, G., De Souza, J. M., Vivacqua, A., Miranda, B.,
Braga, B., Almentero, B. K., D' Almeida, J. N., Jr. &
Castilho, R. (2004) Peer-to-peer collaborative editing
of ontologies. 8th International Conference on
Computer Supported Cooperative Work in Design
(CSCWD 2004). Xiamen, China, 186-190
The research work described in this paper is partially
funded through the ROLE Integrated Project, part of the
Seventh Framework Programme for Research and
Technological Development (FP7) of the European
Union in Information and Communication Technologies.
Atkins, D. E., Brown, J. S. & Hammond, A. L. (2007) A
Review of the Open Educational Resources (OER)
Movement: Achievements, Challenges, and New
Opportunities. The William and Flora Hewlett
Auer, S., Dietzold, S. & Riechert, T. (2006) OntoWiki - A Tool for Social, Semantic Collaboration. 5th
International Semantic Web Conference (ISWC 2006).
Athens, GA, USA, Springer LNCS, 736-749.
Bao, J., Hu, Z., Caragea, D., Reecy, J. & Honavar, V. G.
(2006) A Tool for Collaborative Construction of
Large Biological Ontologies. 17th International
Conference on Database and Expert Systems
Applications (DEXA'06). Krakow, Poland, 191-195.
Bechhofer, S., Harmelen, F. V., Hendler, J., Horrocks, I.,
Mcguinness, D. L., Patel-Schneider, P. F. & Stein, L.
A. (2004) OWL Web Ontology Language Reference.
Recommendation. World Wide Web Consortium,
Becker, P., Eklund, P. & Roberts, N. (2005) Peer-to-peer
based ontology editing. International Conference on
Next Generation Web Services Practices (NWeSP
2005). Seoul, Korea, 259-264.
Chatti, M. A., Jarke, M. & Frosch-Wilke, D. (2007) The
future of e-learning: a shift to knowledge networking
and social software. International Journal of
Knowledge and Learning, 3(4/5), 404-420.
Rosse, C. & Mejino, J. L. V., Jr. (2003) A reference
ontology for biomedical informatics: the Foundational
Model of Anatomy. Journal of Biomedical Informatics,
36, 478-500.
Fellbaum, C. (1998) WordNet: An Electronic Lexical
Database: The MIT Press.
Malik, M. (2009) Cloud Learning Environment - What it
Mikroyannidis, A. (2007) Heraclitus II: A Framework
for Ontology Management and Evolution. PhD
Thesis, Manchester Business School, University of
Manchester, Manchester
Mikroyannidis, A. & Theodoulidis, B. (2006) Heraclitus
II: A Framework for Ontology Management and
Evolution. 2006 IEEE /WIC/ACM International
Conference on Web Intelligence (WI 2006). Hong
Kong, China, IEEE Computer Society, 514-521.
Mikroyannidis, A. & Theodoulidis, B. (2010) Ontology
Semantic Annotation for Semi-Automatic Positioning of the Learner
Petya Osenova, Kiril Simov
Linguistic Modelling Laboratory, IPP-BAS
Acad. G.Bonchev 25A, 1113 Sofia, Bulgaria
The learner’s positioning with respect to a curriculum is of great importance both to lifelong learning (an informal learner needs to
achieve a certain level of competency) and to mobility learning (a student spends a semester at another university). In both
cases it is necessary to determine the learner’s prior knowledge, so that he can profit optimally from the subsequent learning
process. The learner’s positioning requires a tutor to grade pre-course questionnaires, which is tedious and time-consuming
work. In this paper we present the first implementation of a knowledge-rich method for supporting the tutor in the positioning task. Our
method exploits the potential of semantic annotation with regard to the curriculum and the learner’s questionnaire answers. The
annotation of the curriculum provides the level of competence to be covered in the course, while the annotation of the questionnaire
answers provides evidence for the learner’s actual competence. The final judgment is left to the tutor. With only slight modifications,
the presented method might also be used for the learner’s self-positioning.
Section 4 describes the semantic annotation of the
curricula and answers. Section 5 outlines some
preliminary evaluation of the method. Section 6 presents
further extensions to the semantic annotation.
Section 7 concludes the paper.
Learner’s positioning has proved, on the one hand, to be a
very important step in the learning process and, on the
other hand, to be a very difficult task. It has often been
considered in the context of self-positioning (Ross
2006) or in the context of various groups of practices (Braun
and Schmidt 2008). The tutor plays the central role in the
positioning task, since no completely automatic method
has yet been devised that is fully successful
and reliable.
Hence, our aim is to support the tutor in his judgments
when positioning the learners. We assume that the tutor is
inspecting a set of learner’s answers to a questionnaire.
The questionnaire would reflect the required knowledge
that has to be covered by the learner. The actual
positioning is with respect to a curriculum, which presents
the following aspects: the knowledge-oriented
requirements for a learner, a set of learning materials to
support him during the learning process, and links to
people who might help him with the learning topics. Thus,
the questionnaire is designed on the basis of the
curriculum. The positioning in this setting is viewed as
a set of recommendations from the tutor to the learner
which direct the learner within the curriculum, i.e. which
materials to study, which people to contact, etc.
Our method relies on the comparison between the
curriculum and the related learner’s answers, both
semantically annotated. This comparison highlights the
learner’s ability to express the necessary concepts in the
answers to the questions from the questionnaire. The tutor
can use the results of the comparison to balance his
judgments for each learner individually and for the learners
as a group. The method also has the advantage that some
conceptual or terminological gaps or inconsistencies in the
curriculum itself might be discovered.
The structure of the paper is as follows: Section 2
concentrates on the various aspects of the knowledge rich
method. Section 3 overviews the design of the curriculum
and the questionnaire answers as well as their interaction.
The Knowledge-Rich Approach
In our work on the positioning of the learner with the help
of the knowledge rich approach 1 we rely on the ideas
reported in (Kalz et al. 2007). They discuss the notion of
the learning network. According to it, learner’s
competence can be automatically compared to a set of
concept evidences of the target competence. Our goal is to
achieve an ontology-based positioning where the learner
competence is represented by a learner’s competence
ontology and curriculum competence ontology. However,
reliable competence ontologies are still missing. Thus, in
our work we rely on domain ontologies, which are
supposed to reflect the knowledge part of the learner’s
competence. The ontological analyses of the learner’s
portfolio (mainly tests and CVs) and the textual
description of the relevant curriculum might be
considered an approximation of the learner’s (per se)
competence against the curriculum (required) competence.
Thus we consider the learning network a set of different
resources including tutors, experts, learning materials and
learners, whose connections are mediated by ontologies.
The positioning of a learner within the learning network is
identical to the task of creating a learning path for each
learner within the established network.
Our method supports the tutor in the positioning task by
analyzing some of the textual elements of the network.
The knowledge rich methods rely on the analysis of
the text by using knowledge sources external to this text,
such as ontologies, lexicons and grammars. These sources are
used to achieve a semantically rich text analysis that
explicates the conceptual content of the learner’s answers
We call the method knowledge rich because it requires an
appropriate ontology to represent the conceptual knowledge to
be explicated in the curriculum and learner’s answers.
connection from ontology via lexicon to grammars is also
relied on for the semantic (concept) annotation of the text.
In this way, we established a connection between the
ontology and the texts. The relation between the lexicon
and the ontology is used for defining user queries with
respect to the appropriate segments within the documents.
In a more general setting, these relations can be extended
to cover the overall learning network. For example, the
annotation of a document with concepts connects it to the
ontology, which with the help of the lexicon and the
grammar connects it to other documents. Similarly, it is
possible to annotate other resources, such as images, web
sites, people profiles, etc.
to the questionnaire. The main steps in the text analysis
that we envisage as necessary in order to address the
problem in a reliable way are: (1) grammar-based semantic
annotation with concepts from an ontology; (2) discourse
segmentation; (3) lexical chain creation to support the
disambiguation of the concept annotation from (1) and
concept distribution within the text; and (4) sentiment
analysis for evaluation of the concept usage in the text.
The combination of all these analyses should best
explicate the conceptual content of the curriculum and
the learner texts, which is then used for the positioning.
Our first implementation of the positioning service
fully realizes only point (1) and provides an initial sketch of
the other processing tasks. Therefore, in the
rest of this section we will concentrate on analysis (1),
namely grammar-based semantic annotation with
concepts from an ontology. The reason is that this analysis
has already been fully implemented and preliminarily
evaluated in a designed learning-based setting. The other
steps are discussed as further extensions to the task design
in Section 6.
As mentioned above, the knowledge rich approaches are
usually connected with the availability and the usage of
knowledge rich databases, such as ontologies and
lexicons. The ontologies reflect the conceptualizations in
some domain of interest, for example the DAML ontology
library, SWOOGLE, the LT4eL ontology, etc. These
ontologies have to be connected to an upper ontology in
order to better cover general knowledge, for
example DOLCE, SUMO or the SIMPLE Core Ontology. The
best-known knowledge rich lexicons are the so-called
wordnets (WordNet, EuroWordNet, BalkaNet, SIMPLE).
Such resources are exploited for the semantic annotation
of documents and/or for semantic retrieval.
Within the LT4eL project an ontology-to-text relation was
defined, which addresses the interaction between
conceptualizations and terms (Simov and Osenova 2007;
Simov and Osenova 2008). For clarity, this relation is
briefly presented here. We assume that the ontology is the
repository of the lexical meaning of the language. Thus,
we start with a concept in the ontology and we search for
lexical items and non-lexical phrases that convey the
content of the concept. There are two possible problems:
(1) there is no lexical item for some of the concepts in the
ontology, and (2) there are lexical items in the language
without a concept representing the meaning of the lexical
item in the ontology. The first problem is overcome by
allowing in the lexicon also non-lexical phrases to be
represented. The second problem is solved by extension
of the ontology. The lexicon items are then mapped to
grammars. We call them concept annotation grammars.
These grammars relate the lexicon to the text. Such a
mapping is necessary inasmuch as lexical items and
phrases from the lexicons allow for multiple realizations
in the text. Thus, they require some additional linguistic
knowledge in order to disambiguate between different
meanings of a lexical item or phrase.
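The chain from ontology concept via lexicon to text can be illustrated with a minimal sketch. It replaces the concept annotation grammars with a plain longest-match lookup over a regular expression, so it performs no linguistic disambiguation; the lexicon entries, the concept names and the function name are hypothetical and serve only as an example.

```python
import re

# Hypothetical lexicon: each ontology concept lists the terms and phrases that realize it.
LEXICON = {
    "Monitor": ["monitor", "monitors"],
    "CRTMonitor": ["crt monitor", "cathode ray tube"],
    "Display": ["display", "displays"],
    "RAM": ["ram", "random access memory"],
}


def annotate(text, lexicon=LEXICON):
    """Return (start, end, concept) spans for every lexicon term found in the text."""
    # Longest terms first, so that "crt monitor" is preferred over "monitor".
    pairs = sorted(((term, concept)
                    for concept, terms in lexicon.items() for term in terms),
                   key=lambda pair: -len(pair[0]))
    term_to_concept = dict(pairs)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(term) for term, _ in pairs) + r")\b",
        re.IGNORECASE,
    )
    return [(m.start(), m.end(), term_to_concept[m.group(1).lower()])
            for m in pattern.finditer(text)]


spans = annotate("A CRT monitor uses a cathode ray tube; flat displays need less RAM.")
concepts = [concept for _, _, concept in spans]
```

A real concept annotation grammar would add the linguistic context needed to choose between several candidate concepts for the same term, which this lookup cannot do.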
We have been using the relations between the different
elements for the task of ontology-based search. The
Design of the curriculum and the related
As mentioned above, we assume that a curriculum
consists of a set of topics providing the content of a course
or a set of courses. Each topic is then associated with a set
of learning materials – lectures, tests, descriptions of
expected answers, etc.
The learner needs to acquire at least two kinds of
knowledge studying the curriculum: the content
knowledge and the skills necessary to apply the content
knowledge in a community of practice. Here we focus on
the content knowledge.
The questionnaire, on the other hand, consists of
questions of various types, which check the learner’s
status with respect to the curriculum topics. They might
reflect more surface as well as more profound aspects of
the topics.
However, as a first practical approximation we decided to
exploit a set which more or less amalgamates both
perspectives, curriculum plus questionnaire, and which at the
same time is used in real job-seeking situations. As
our design setting we used a sample of 10 topic questions,
provided by BitMedia within the LTfLL project. The
topics are in the IT area. Each question addressed more
surface knowledge when asking about types of things, and
further asked about functions and properties. Two
examples: Explain the meaning of the concept
RAM and describe its properties; Name as many PC ports
as you can and give some examples. These topic questions
were accompanied by a set of required example
answers. Since the set was provided in German, it
was translated into English and Bulgarian.
Thus, only the real answers had to be gathered. This part
of the setting is described in Section 5.
On the basis of this concrete curriculum, we identified the
following types of questions: (1) content questions which
require answers; (2) skill questions which highlight the
learner’s abilities to apply the knowledge in practice; (3)
personal questions which demonstrate the learner’s abilities
to communicate within a group, etc. Our primary goal is
to cover the evaluation of questions of the first kind.
Semantic annotation
This section presents the annotation of the curriculum and
of the questionnaire answers.
is not covered by the learner competence and they can be
used to suggest further learning activities; (3) list of
additional concepts – these concepts could demonstrate
some wrong understanding of the topic by the learner or
gaps in the curriculum (topics or semantic annotation).
In the context of the above example a learner responded
with the following answer:
Output device, monitor, display devices of a
PC; there are two types: Monitors with
cathode ray tube (CRT) - heavy, need more
power, occupy more space; Flat panel
displays - light, need less power, and occupy
less space.
The terms in the text recognized as related to concepts in
the ontology are highlighted. The three lists are as
Common concepts:
CRT monitor, display, monitor
Missing concepts:
contrast, frame rate, graphical elements,
image, LCD monitor, picture, pixel, ratio,
refresh rate, rendering, resolution, screen, size,
Additional concepts:
types, devices, Output device, PC, power,
The concepts in the first two lists are lexicalized on the
basis of the lexicon mapped to the ontology. The concepts
in the last list are represented with the terms used by the
learner. This helps the learner and the tutor to identify the
usage context of these concepts. As mentioned
above, the usage of additional concepts is not always
evidence of wrong knowledge, but could provide good
feedback to both the learners and the tutors. The
expression output device in the above example might be
considered a good concept to include in the
annotation of the query.
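The construction of the three lists amounts to set operations over the two annotations. The following minimal sketch uses the concepts of the monitor example from this section; the function name is hypothetical, the additional concepts are only a partial selection, and all concepts are assumed to be already normalized to plain strings.

```python
def compare_annotations(question_concepts, answer_concepts):
    """Split the concepts into common, missing and additional lists (each sorted)."""
    required, found = set(question_concepts), set(answer_concepts)
    return {
        "common": sorted(required & found),      # required and used by the learner
        "missing": sorted(required - found),     # required but not used
        "additional": sorted(found - required),  # used but not required
    }


# Concepts assigned by the expert to the question about monitors.
question = ["CRT monitor", "display", "contrast", "frame rate", "graphical elements",
            "image", "LCD monitor", "monitor", "picture", "pixel", "ratio",
            "refresh rate", "rendering", "resolution", "screen", "size", "VGA"]
# Concepts recognized in the learner's answer (additional ones are illustrative).
answer = ["monitor", "display", "CRT monitor", "output device", "PC", "power"]

lists = compare_annotations(question, answer)
for name, concepts in lists.items():
    print(name, concepts)
```

In the service itself the additional concepts are shown with the learner's own terms rather than with lexicon entries, which this sketch does not distinguish.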
The semantic annotation of a curriculum includes two
steps. First, all the learning materials related to the
curriculum are annotated automatically with concepts
from the domain ontology. Then, the tutor (or the teaching
administrator responsible for the curriculum) creates a set
of queries to reflect the content knowledge of the
curriculum. Each question is also annotated with
appropriate concepts to reflect this content knowledge. The
coverage of the questions is then checked against the
requirements within the curriculum.
It is worth mentioning that other questions concerning the
skills of the learner might be additionally provided within
the question set, but they are not necessarily annotated
with concepts from the ontology, and their answers have
to be evaluated in a different way.
During the creation of the questions, the tutor has at his
disposal the ontology and the semantic annotation of the
learning materials. Then the questions are also
automatically annotated and the mappings are again
presented to the tutor. To sum up, our approach
demonstrates the usage of automatic procedures, which
alternate with the tutor’s intervention, when required.
In our practical setting, the questions, related to the
curriculum, were given in advance. Thus, we only
provided the automatic annotation of the questions
themselves and the example answers. The question
annotation was additionally edited by experts in the area
of IT.
The following example shows a question from the
BitMedia questionnaire with the list of its assigned
concepts. The learner’s answers to the questions were
annotated with concepts automatically by the semantic
annotation module described above. In our setting this
step was performed exactly in this way. Here is one
example of a query:
Name some of the technical specifications of
different kinds of monitors.
The following is a list of concepts, selected as annotations
for this question by an expert:
CRT monitor, display, contrast, frame rate,
graphical elements, image, LCD monitor,
monitor, picture, pixel, ratio, refresh rate,
rendering, resolution, screen, size, VGA
This list of concepts demonstrates that the tutor could
include not only concepts that directly answer the
question, but also related concepts which are necessary in
order to ensure that the learner uses the concepts related to
the answers within the proper context. The above list also
demonstrates the case in which both concepts and
their sub-concepts are included in the list because they
define slightly different contexts of usage.
The next example shows the annotation of the learner’s
answers. The annotation is done within the text of the
answer. Then the concepts from this annotation are
compared to the concepts from the question annotation
and three lists of concepts are created: (1) list of common
concepts – the concepts that demonstrate how well the
learner competence matches the required competence; (2)
list of missing concepts – these concepts determine what
Having the semantic annotation of the curriculum and the
learner’s answers, the service calculates several lists of
concepts, as reported in the previous section. The
real evaluation within the learning network of BitMedia is
under implementation. Here we report on a small-scale
evaluation, which we ran in order to obtain some first evidence
of the usefulness of the service and to acquire some ideas
about its future development.
The concept evidence of the learner’s competence can be
automatically compared to a set of concept evidences of
the target competence (the learning network in the terms of
(Kalz et al. 2007)). Those concepts are selected which are not
covered by the learner’s current competence. For the
comparison of the concept evidences we use the standard
vector metrics from the Information Retrieval community.
The automatic evaluation was constructed as the ratio of
the list of common concepts to the list of concepts
from the annotation of the query.
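Read as a ratio of list sizes, the automatic evaluation can be sketched as follows. The interpretation in terms of list sizes, the function name and the sample concepts are assumptions made for illustration; the exact vector metric used in the service is not specified here.

```python
def coverage_ratio(question_concepts, answer_concepts):
    """Share of the question's concepts that also occur in the answer annotation."""
    required = set(question_concepts)
    if not required:
        raise ValueError("the question has no annotated concepts")
    common = required & set(answer_concepts)
    return len(common) / len(required)


# Hypothetical annotations: two of the five required concepts occur in the answer.
score = coverage_ratio(
    ["CRT monitor", "display", "monitor", "pixel", "resolution"],
    ["monitor", "display", "output device"],
)
print(f"{score:.2f}")
```

Because additional concepts do not enter the ratio, a verbose answer cannot raise the score by itself, while a short but correct answer that avoids the exact required concepts receives a low score.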
In order to evaluate the automatic method, the 10
questions were given to Bulgarian students in the IT area. We
co-references has been approached from various
perspectives. For example, (Lech and de Smedt 2006) and
(Nikolov et al. 2009), among others, exploit the semantic
features from an ontology in order to improve
co-reference chaining; (Kawazoe et al. 2003) designed
software that helps experts in the biomedical domain to create
ontologies and annotate texts with co-references. In our
task, we exploited these papers (together with the work on
anaphora and co-reference annotation in general) in the
annotation of the corpus. In our future work, we will apply
their approaches for the implementation of a new version
of our ontology-to-text relation.
One of the reasons for the underestimation of the learner
answers by the automatic method is that the
concept annotation requires very exact answers, which
are sometimes not present among the learners’ answers. The
learners use a freer style of expressing their knowledge.
Thus, they rely on concepts similar to the ones in the
curriculum annotation, such as more general or sibling
concepts. In order to handle this problem, we
envisage extending the annotation from domain concepts
via domain terms to general concepts via general lexica.
As mentioned in the goal of classifying the concepts
used by the learner, we would like to evaluate the level of
knowledge of each used concept. To do this, we will exploit
a version of sentiment analysis. In our case, the
sentiment analysis determines the attitude of the learner to
the concepts explicated within the answers. As a starting
point for developing the sentiment analysis, we
consider the work reported in (Moilanen and Pulman
2007) and (Liu 2008). It is often underlined that adding
knowledge rich features improves the results in sentiment
analysis, for example in (Moilanen and Pulman 2007),
(Kennedy and Inkpen 2006) and (Kim and Hovy 2006). The
input for this module will be the results from the previous
In order to construct a concept evidence of the learner’s
competence, we first need to extract the concepts which
are mentioned within the text of the answers. Then, on the basis
of ontological reasoning, the implied concepts will be
added. For example, if the author of the answer says that
he/she is used to giving injections 2 , this automatically
means, on a more general level, that he/she can intervene in
order to improve the situation and, on a more specific level,
that he/she can put liquid under the skin by using a syringe.
We also need to know in what context each of the
concepts in the text was mentioned by the learner. For
example, the learner might state two opposite facts: it is easy
to give an intradermal injection, but it is difficult to give
an intramuscular one. From this short context the
conclusion can be drawn that the learner is not
experienced in giving injections as a whole. Thus,
comparing conceptual information and discourse relations
about the context, each mention of a concept will be
evaluated with one of the values: ‘well known’, ‘known’,
and ‘unknown’. We will use the methods developed in the
areas of sentiment and opinion analysis. As it was already
gathered more than 10 answers per topic on average. Then,
the same answers were given to two tutors in the IT area to
grade them. First, we compared the concepts present in
the answers to those required in the descriptions. Then,
we compared the automatic grading to that of the tutors. The
results are as follows: there is a big mismatch between the
descriptions and the answers due to the students’ short
productions or their avoidance of certain concepts. On the
other hand, the tutors’ grading was different. It took into account
certain aspects (such as a detailed description of the
characteristics of the main concepts to be covered by the
learner’s answers), but not others (such as the
presence/absence or distribution of concepts). The last
point reflects the fact that in a verbose answer it is
relatively easy to overestimate the learner’s knowledge,
especially under time pressure.
Thus, the preliminary evaluation showed that purely
automatic comparison might underestimate the learner’s
knowledge, while pure tutor grading also skips some aspects of
the learner’s knowledge and puts more weight on others.
The conclusion is that the best way is for the tutor to have
at his disposal the intersection list of concepts from the
curriculum and the learners’ answers in order to present the
final judgement with respect to the learners’ status and future
learning materials. The tutor has the concepts from the
curriculum which were mentioned in the answers, as well
as the list of those not mentioned.
However, in the long run we aim at more finely profiled
concept evidence, which will become possible once the
extensions to the semantic annotation are added (see the
discussion in the next section). In that case the learner’s
competence would be a set of concept descriptions
extracted from the answer. For the moment we envisage
extending the classification of the concepts from three lists
to five. We will divide the set of concepts into the following
subsets: (1) known concepts; (2) partially known concepts;
(3) unknown concepts; (4) concepts with contradictory
usages; and (5) additional concepts. The first subset will
contain all the concepts that are evaluated as known in
the answer. The second subset will contain concepts that
are mentioned in the answer, but for which there is not
enough evidence about the learner’s level of knowledge.
The third subset will contain concepts that the learner
explicitly mentions as unknown. The fourth subset will
include the concepts for which there is both positive and
negative evidence about the learner’s knowledge. The last
set is the same as the one described above. It can influence
the other groups through its relevance or irrelevance. In
addition to the extracted concepts, we will extract links to
the occurrences of the concepts in the text.
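The five-way split can be illustrated with a minimal Python sketch. It is not the implementation used in the project. It assumes that each concept occurrence in the answer has already been given a polarity (+1 for evidence of knowledge, -1 for evidence of lack of knowledge, 0 for a bare mention), and that "additional" concepts are those found in the answer but absent from the curriculum. The function name and the polarity encoding are hypothetical.

```python
from collections import defaultdict


def classify_concepts(curriculum, evidence):
    """Split the concepts found in an answer into five subsets.

    curriculum: set of concept names expected by the tutor.
    evidence:   iterable of (concept, polarity) pairs; polarity is
                +1, -1 or 0 (an assumed encoding, see the lead-in).
    """
    polarities = defaultdict(set)
    for concept, polarity in evidence:
        polarities[concept].add(polarity)

    subsets = {"known": set(), "partially_known": set(), "unknown": set(),
               "contradictory": set(), "additional": set()}
    for concept, seen in polarities.items():
        if concept not in curriculum:
            subsets["additional"].add(concept)       # outside the curriculum
        elif 1 in seen and -1 in seen:
            subsets["contradictory"].add(concept)    # positive and negative
        elif 1 in seen:
            subsets["known"].add(concept)
        elif -1 in seen:
            subsets["unknown"].add(concept)
        else:
            subsets["partially_known"].add(concept)  # mentioned only
    return subsets
```

Curriculum concepts that never occur in the answer are deliberately left out of all five subsets in this sketch; a tutor-facing tool would probably list them separately.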
Extensions to the semantic annotation
For better semantic annotation and its use in positioning,
we also consider additional context-oriented information:
co-referential relation annotation, annotation of general
lexica and sentiment analysis of the concept usage in the
The relation between concept annotation and
The examples in this section are from preliminary work in
the medical domain.
European Project LTfLL (
We would like to thank our colleagues from the project for
the discussions on topics related to the task of
positioning the learner. We would also like to thank the
two reviewers for their valuable comments.
mentioned, a pre-defined requirement list of necessary
concepts with definitions will be used in order to estimate
the degree of competence demonstrated by the learner in the
portfolio. There will be three types of evaluation:
coverage, level of detail and relevance. Coverage will be
estimated from the number of mentioned relevant concepts
that match the pre-defined list. The level of detail will be
evaluated from the depth of the conceptual space. Relevance
will be estimated via the ontological relations from a given
concept to the other co-occurring concepts within the
discourse segment.
Braun, Simone, Andreas Schmidt. (2008). People
Tagging & Ontology Maturing: Towards Collaborative
Competence Management. In: 8th International
Conference on the Design of Cooperative Systems
(COOP '08), Carry-le-Rouet, France, May 20-23, 2008.
Kalz, Marco; Van Bruggen, Jan; Rusman, Ellen; Giesbers,
Bas; Koper, Rob. (2007). Positioning of Learners in
Learning Networks with Content, Metadata and
Ontologies. In Interactive Learning Environments,
Volume 15, Issue 2, August 2007, pages 191–200.
Kennedy, Alistair and Inkpen, Diana. (2006). Sentiment
classification of movie reviews using contextual
valence shifters. Computational Intelligence. vol. 22, 2,
pp. 110-125.
Kim, Soo-Min and Hovy, Eduard. (2006). Extracting
Opinions, Opinion Holders, and Topics Expressed in
Online News Media Text. In Proceedings of
ACL/COLING Workshop on Sentiment and
Subjectivity in Text, Sydney, Australia.
Lech, Till Christopher and Koenraad de Smedt. (2006).
Enhancing Semantic Annotation through Coreference
Chaining: An Ontology-based Approach. In: Siegfried
Handschuh, Thierry Declerck, Marja-Riitta Koivunen
(eds.), CEUR Workshop Proceedings, Vol. 185, 2006.
Liu, Bing. (2008). Web Data Mining: Exploring
Hyperlinks, Contents, and Usage Data. Springer.
Moilanen, Karo and Pulman, Stephen. (2007). Sentiment
Composition. In Proceedings of Recent Advances in
Natural Language Processing (RANLP 2007).
September 27-29, Borovets, Bulgaria, pp. 378-382.
Nikolov, Andriy, Victoria Uren, Enrico Motta and Anne
de Roeck. (2009). Towards instance coreference
resolution in a multi-ontology environment. Presented
at: Workshop on matching and meaning, Edinburgh,
UK, April 2009.
Simov, Kiril and Petya Osenova. (2007). Applying
Ontology-Based Lexicons to the Semantic Annotation
of Learning Objects. In Proc. of RANLP 2007
workshop: Natural Language Processing and
Environments. Borovets, 26. September 2007.
Simov, Kiril and Petya Osenova. (2008). Language
Resources and Tools for Ontology-Based Semantic
Annotation. In Proc. of the Workshop ‘OntoLex 2008’
at LREC 2008.
John A. Ross. (2006). The Reliability, Validity, and Utility
of Self-Assessment. In: Practical Assessment, Research
and Evaluation. Volume 11 Number 10, November
In this paper we presented a knowledge-rich method for
supporting the tutor in the positioning task. We presented
a preliminary evaluation setting, which showed the
potential of domain ontologies for semantic annotation
within the life-long learning context; the role of the tutor
in the same context; and natural ways to extend the
annotation further, aiming at a more precise and wider
elicitation of the learner’s knowledge.
The result of the service will be used further to compare
the concept evidence of the learner’s competence with the
learner network. The comparisons will use a vector
representation of concept evidence of the learner’s
competence and concept evidence of the target
competence. The vector for the target competence will be
fixed within the learner network. The vector for learner’s
competence will be created by the assessor on the basis of
the above sets of concepts. Our goal is not just to calculate
these sets of concepts, but also to use them for giving
feedback to the learner and thus achieving better results in
the learning activities. This kind of feedback will be even
more useful when the approach is used for
self-positioning of the learner.
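As an illustration of the vector comparison, the following Python sketch builds a learner vector and a target vector over the same ordered list of concepts and compares them. The text does not say which similarity measure will be used; cosine similarity is an assumption made here, and the function names and score values are hypothetical.

```python
import math


def evidence_vector(concepts, scores):
    """Build a vector over a fixed concept order; unseen concepts get 0.0."""
    return [float(scores.get(c, 0.0)) for c in concepts]


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def missing_concepts(concepts, learner, target):
    """Concepts whose learner evidence falls below the target; usable as feedback."""
    return [c for c, l, t in zip(concepts, learner, target) if l < t]
```

The list returned by `missing_concepts` is one simple way of turning the comparison into feedback for the learner, in the spirit of the goal stated above.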
A knowledge-rich approach requires some initial effort to
prepare the necessary resources in order to achieve the
goals of positioning the learner. In our view (also
discussed and shared by other colleagues from the LTfLL
project, especially Christoph Mauerhofer from
Bitmedia), the effort invested at the beginning will pay off
over long and wide exploitation. This could be true, for
example, when big software companies introduce new
products and the company itself has an interest in
constructing appropriate resources (ontologies, lexicons,
curriculum, tests, etc.). The advantages of the
knowledge-rich approach are the exactness of the evidence of
learner competency, the links to the learning materials and
the definition of learning paths. Another advantage of the
approach is multilinguality: the curriculum and its
annotation could be prepared in one language, and then
reused with little additional effort in many other
languages for learners who do not know the original
language of the curriculum.
The work reported within this paper is supported by the
Facilitating cross-language retrieval and machine translation by multilingual
domain ontologies
Petr Knoth∗, Trevor Collins∗, Elsa Sklavounou†, Zdenek Zdrahal∗
KMI, The Open University
Milton Keynes, United Kingdom
{p.knoth, t.d.collins, [email protected]
Paris, France
[email protected]
This paper presents a method for facilitating cross-language retrieval and machine translation in domain specific collections. The method
is based on a semi-automatic adaption of a multilingual domain ontology and it is particularly suitable for the eLearning domain. The
presented approach has been integrated into a real-world system supporting cross-language retrieval and machine translation of large
amounts of learning resources in nine European languages. The system was built in the context of Eurogene, a project supported by the European Commission,
and it is now being used as a European reference portal for teaching human genetics.
1. Introduction
Because of the low frequency of polysemy in domain-specific collections, domain-specific MT systems are capable
of achieving high performance. However, one of the main
obstacles remains the acquisition of terminology. At the
same time, the domain terminology is usually an essential
artefact used for query composition. Our method is motivated by this problem and approaches it by using a
single terminological access point, embodied by the multilingual domain ontology, for both CLIR and MT. This makes it possible to combine the strengths of ontology-based retrieval
and domain-specific machine translation. In Section 2, approaches to domain CLIR and their relation to MT are introduced. The theoretical foundation of the method for facilitating domain CLIR and MT is explained in Section 3. The
application of the approach in the Eurogene system is then
presented in Section 4 and its performance is discussed in
Section 5. Finally, the contribution of the paper to the
eLearning domain is summarized in Section 6.
A significant amount of research has been carried out in
the NLP and Semantic Web technology fields in recent
years. A few activities and projects, such as LT4eL (Lemnitzer et al., 2007) or LTfLL (LTfLL, 2008), have been
launched with the objective of integrating these technologies
with eLearning systems. One of the vital sub-objectives
of these projects is to allow seamless access to and retrieval
of multilingual learning materials. In this paper we report
on the activities undertaken in the context of the Eurogene (The
First Pan-European Learning Service in the Field of Genetics) project related to the problem of accessing and sharing
multilingual learning resources.
More specifically, the article builds on the idea that eLearning systems should not only allow the cross-language retrieval of learning resources, but should be extended with
machine translation capabilities to provide a better user experience. The proposed approach synchronizes the adaption of cross-language retrieval and machine translation in
such a way that the performance of both systems improves.
Although the presented method has been integrated into an
eLearning system in the human genetics field, it is applicable in a broader context.
Many of the important players in the information retrieval
field (including Google and Yahoo!) offer cross-language
information retrieval (CLIR), and some of them also provide
machine translation (MT). While the performance of these
systems is usually sufficient for general queries, CLIR
and MT are often inaccurate for domain-specific queries.
Large repositories storing domain-specific content, such as
PubMed, which stores vast amounts of scholarly articles,
have successfully adopted large thesauri/ontologies of domain terminology to improve the performance of their retrieval systems (Lu et al., 2009). While there are efforts targeting cross-language retrieval in eLearning (Lemnitzer et
al., 2007; Eichmann et al., 1998; Lu et al., 2008), the combination of domain-specific retrieval and machine translation is rarely available.
2. Approaches to domain CLIR
There are two typical approaches to CLIR:
1. MT approach - The user’s query is translated from the
source language to the target language and submitted
to the search system. This approach can be further
divided into two cases:
(a) MT of the query is performed and the query is
submitted in all languages of interest.
(b) A multilingual ontology is developed and used to
map the submitted query to different languages.
2. Statistical approaches - The system is trained on a collection of texts (usually parallel). The user’s query
is then mapped to a language independent document
vector using approaches, such as Latent Semantic Indexing (LSI) (Dumais, 1997).
Approach 1(a) requires the search system to be well-adapted for the translation of the terminology of the target
domain. Depending on the MT system at hand, domain adaption is rule-based or statistical. Rule-based
approaches allow specifying rules expressing that a given
term tL1 in language L1 corresponds to term tL2 in L2.
Statistical approaches to machine translation support automatic learning of such pairs from parallel corpora. Approach 1(b) is motivated by the fact that monolingual domain ontologies can be employed to improve the performance of the retrieval system through query expansion, leveraging the ability of ontologies to represent synonyms linked to
a concept and the hierarchical structure of concepts. Monolingual ontologies can be extended to multilingual ontologies.
Approach 2 is influenced by the size of the available parallel corpora, which is critical for the performance of the
retrieval system. The approach is, in general, more suitable for bilingual cross-language retrieval, as it is usually
difficult to find experts to build a domain-specific training
set that would contain parallel texts from each language of
interest to a common interlingua.
(cognitive units of meaning - abstract ideas or mental symbols), T is a set of terms (textual representations of concepts), E is a set of oriented relations (is-a relations), such
that ⟨C, E⟩ is a directed acyclic graph, and f : T → C is
a surjective function from terms to concepts. Note that this
implies that polysemy cannot be represented in our ontology. This is intentional for our purposes, as we understand
a domain to be an area or part of an area in which the terminology is unambiguous.1 Today, lightweight ontologies can
be built by reusing existing ontologies or by applying NLP
methods for term extraction and ontology learning (Cimiano and Völker, 2005).
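The definition of the monolingual ontology as a quadruple of concepts C, terms T, is-a relations E and a mapping f can be made concrete with a short Python sketch. It is only an illustration under assumptions: the class and attribute names are hypothetical, f is stored as a dictionary (which enforces that each term names exactly one concept, so polysemy cannot be expressed), and acyclicity is checked with Kahn's algorithm, a technique chosen here and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class MonolingualOntology:
    concepts: set   # C
    terms: set      # T
    edges: set      # E, as (narrower_concept, broader_concept) pairs
    f: dict         # f : T -> C; each term maps to exactly one concept

    def is_valid(self) -> bool:
        # f must be defined on every term and on nothing else.
        if set(self.f) != self.terms:
            return False
        # Surjectivity: every concept is named by at least one term.
        if set(self.f.values()) != self.concepts:
            return False
        # Every is-a edge must connect two known concepts.
        if any(a not in self.concepts or b not in self.concepts
               for a, b in self.edges):
            return False
        return self._is_acyclic()

    def _is_acyclic(self) -> bool:
        # Kahn's algorithm: repeatedly remove concepts with no incoming edge.
        indegree = {c: 0 for c in self.concepts}
        for _, broader in self.edges:
            indegree[broader] += 1
        ready = [c for c, d in indegree.items() if d == 0]
        removed = 0
        while ready:
            node = ready.pop()
            removed += 1
            for narrower, broader in self.edges:
                if narrower == node:
                    indegree[broader] -= 1
                    if indegree[broader] == 0:
                        ready.append(broader)
        # If a cycle exists, its concepts are never removed.
        return removed == len(self.concepts)
```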
In the second step, the initial domain ontology is translated using MT and is validated by domain experts. The
accuracy of MT is usually low at this stage, as the system has not yet been sufficiently trained for the target domain. The resulting multilingual ontology is a 6-tuple
O = ⟨C, T, E, f, L, lang⟩, where L is the set of languages
and lang : T → L is a mapping from terms to languages.
After the validation, the multilingual ontology is integrated
with the retrieval system and the available collection of language resources is indexed in terms of the ontology. A set
of terms {t | lang(t) = language of the resource} is used for
The bootstrapping phase can be iterated as many times
as necessary. The mutual updating procedure is shown in
Figure 1. This phase can be further divided into:
3. Synergy of CLIR and MT
Our method is based on the assumption that when we start
to build a domain-specific system for sharing language resources, the amount of parallel corpora available is often
limited. Our methodology uses a multilingual domain ontology as we argue that ontologies are well-suited for domain CLIR and can also be used for the adaption of the
machine translation system. We presume that an IR system and
an MT system are available. More specifically, our approach requires a hybrid MT system combining rule-based
and statistical MT.
The method consists of two phases, which will be discussed in this section in detail: the initialization phase
and the bootstrapping phase. The initialization phase takes
as the input a collection of domain texts or an existing
monolingual domain ontology and produces as an output
a lightweight multilingual ontology of the target domain.
While this step is performed just once, the bootstrapping
phase is repeated as many times as necessary. The bootstrapping phase takes as the input the multilingual ontology
produced in the initialization phase and adapts the MT system by extracting domain specific translation rules from the
ontology. As the amount of learning resources stored in the
system systematically grows, a statistical module of the MT
system can be applied at any time to extract bilingual pairs
of domain terms from the available collection of learning
resources. These pairs are then used to semi-automatically
enrich the multilingual ontology, thus improving the performance of the CLIR and later also the MT system.
The initialization phase can be further divided into:
1. Adaption of the MT dictionaries
2. Adaption of the multilingual ontology
In the first step of the bootstrapping phase, the MT system is adapted to the domain using bilingual substitution
rules of the form tL1 → tL2 extracted from the multilingual
ontology and satisfying the condition f(tL1) = f(tL2),
where tL1 ∈ TL1, tL2 ∈ TL2 and TLn is defined as
TLn = {t | lang(t) = Ln}. For MT systems that translate using an interlingua, the term on the left-hand side of a
rule is a term in the language of the interlingua and the term
on the right-hand side is a term in any other supported language. For bilingual MT systems, all combinations of terms
are exploited and used for the generation of the translation
rules. Supplying MT with rules extracted from the ontology can also be useful when a domain is accessed from a
general-purpose search engine. IR systems can be equipped
with a classification component that can calculate the most
probable domain of a document, select the most suitable
domain ontology available, and extract the rules for adaption of the MT system.
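The rule extraction step can be sketched in Python as follows. This is an illustration and not the code of the Eurogene system: the function name and the dictionary representation of f and lang are assumptions, and the sketch additionally assumes that two terms of the same language (synonyms) do not form a translation rule. The sample labels are those of the linkage analysis concept.

```python
from collections import defaultdict
from itertools import permutations


def extract_rules(f, lang, interlingua=None):
    """Return the set of rules (t1, t2), read as t1 -> t2, with f(t1) == f(t2).

    f:           dict mapping each term to its concept.
    lang:        dict mapping each term to its language.
    interlingua: if given, only rules whose left-hand side is in this
                 language are produced; otherwise all cross-language
                 combinations are produced.
    """
    terms_by_concept = defaultdict(list)
    for term, concept in f.items():
        terms_by_concept[concept].append(term)

    rules = set()
    for terms in terms_by_concept.values():
        for source, target in permutations(terms, 2):
            if lang[source] == lang[target]:
                continue  # same-language synonyms are not translations
            if interlingua is not None and lang[source] != interlingua:
                continue
            rules.add((source, target))
    return rules


f = {"Linkage analysis": "linkage_analysis",
     "Kopplungsanalyse": "linkage_analysis",
     "Linkage-Analyse": "linkage_analysis",
     "Analyse de liaison": "linkage_analysis"}
lang = {"Linkage analysis": "en", "Kopplungsanalyse": "de",
        "Linkage-Analyse": "de", "Analyse de liaison": "fr"}
interlingua_rules = extract_rules(f, lang, interlingua="en")
```

With `interlingua="en"` every rule starts from the English label; calling `extract_rules(f, lang)` without the argument yields rules in both directions for every pair of languages, as described for bilingual MT systems.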
For the second step of the bootstrapping phase, let us assume that the content stored in our system grows over time.
Each time a new learning resource is submitted, it is indexed and put into the document collection. The submitted
learning resource may be a translation of an already existing resource stored in the collection. Such parallel texts
can be automatically recognized (Resnik and Smith, 2003)
and used by the machine translation system for training.2
1. Development of a seed monolingual ontology.
2. Extension of the ontology to multiple languages.
The first step of our approach requires building a small
monolingual domain ontology of concepts. For our purposes, we will define the monolingual ontology as a
quadruple O = ⟨C, T, E, f⟩, where C is a set of concepts
Note that this assumption is not always true.
Most of the statistical MT systems require parallel corpora
Figure 1: Collaboration of CLIR and MT. Translation rules are extracted from the multilingual ontology and are used to
adapt the MT system. New terminology discovered in the statistical training phase is sent to the CLIR system which adapts
the multilingual ontology. The updates are validated by a domain expert.
The output of the statistical training is a set of quadruples
of the form (tL1, tL2, conf, langq), where conf is the
confidence measure of translating term tL1 to tL2 estimated
from text and langq : T → L is a mapping from terms
to languages. The statistical model of the MT system is
updated and the quadruples are sent to the CLIR system,
which uses the following algorithm to update the ontology:
The algorithm requires one pass through the set of quadruples Q (line 2). During initialization a sufficiently high
value of parameter τ is set (line 1). Each quadruple is
first tested for the compatibility with the ontological language set and for its confidence (line 3). Later, it is checked
whether the terms suggested by MT can be mapped to the
ontology (lines 4 and 9). The ontology is then updated using the components of the quadruple (lines 5-7 and 10-12).
Finally, the algorithm assembles the new ontology (line 16).
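Since the prose above refers to the algorithm by line number only, the following Python sketch reconstructs one plausible reading of it. It is a reconstruction under assumptions and not the authors' listing: τ is taken to be a confidence threshold, langq is represented as a dictionary from terms to languages, and a quadruple is used only when exactly one of its two terms is already in the ontology, in which case the other term is attached to the same concept. The function and variable names are hypothetical.

```python
def update_ontology(f, lang, languages, quadruples, tau=0.9):
    """Propose an updated (f, lang) from MT-suggested term pairs.

    f, lang:    current mappings T -> C and T -> L (left unmodified).
    languages:  the language set L of the ontology.
    quadruples: iterable of (t1, t2, conf, lang_q) with lang_q a dict.
    tau:        assumed confidence threshold, set "sufficiently high".
    """
    new_f, new_lang = dict(f), dict(lang)
    for t1, t2, conf, lang_q in quadruples:          # single pass over Q
        # Test language compatibility and confidence.
        if (lang_q.get(t1) not in languages
                or lang_q.get(t2) not in languages
                or conf < tau):
            continue
        if t1 in new_f and t2 not in new_f:
            # t1 is known: attach t2 to the concept of t1.
            new_f[t2] = new_f[t1]
            new_lang[t2] = lang_q[t2]
        elif t2 in new_f and t1 not in new_f:
            # t2 is known: attach t1 to the concept of t2.
            new_f[t1] = new_f[t2]
            new_lang[t1] = lang_q[t1]
        # Pairs with no known term would need a new concept and are
        # left to the domain experts in this sketch.
    return new_f, new_lang
```

The concept set C and the relations E are untouched here, because the sketch only adds terms to existing concepts.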
When the ontology is updated, domain terminology administrators are notified of the updates by the system and,
if necessary, modifications can be made (for example, when new concepts should be added or a better translation than
the one proposed exists). Validation causes new
pairs of rules tL1 → tL2 to be extracted from the validated
part of the ontology and submitted back to the rule
base of the MT system. As the amount of content grows,
the system bootstraps and the performance of both MT and
CLIR improves.
4. Application in human genetics
In this section, we describe an application of the method of
Section 3 in the context of the Eurogene project, which provides an eLearning system for sharing learning resources in
human genetics.3 The learning resources are submitted to
the system typically in the form of slides, books and research articles represented in a variety of formats including Portable Document Format, Word, PowerPoint and
many others. The Eurogene system also supports multimedia resources, such as images and videos in a number
of formats. Resources can be handled in nine European
languages4, which are English, German, French, Spanish,
Italian, Greek, Dutch, Czech and Lithuanian. More than 30
universities and other institutions located mainly across Europe, but also in non-European countries, are actively contributing to this collection.
for training, however there have been research studies that investigated learning of multilingual terminology from non-parallel
texts, such as in (Fung and McKeown, 1997).
In Eurogene, the initial genetic ontology was developed by
merging six monolingual ontologies5 that contained a descriptive, but not too extensive, terminology of the domain.
This ontology was translated into the above nine European
languages by domain experts (English is used as an interlingua, i.e. it is used
to label the names of concepts), and an
upper-level ontology was inferred using the Unified Medical Language System (UMLS). A more comprehensive description of the ontology building process can be found in
(Zdrahal et al., 2009).
The upper-level ontology helps to organize concepts from
a relatively flat structure into a concept hierarchy, which
is represented in the Simple Knowledge Organization System (SKOS) format; this representation satisfies our definition of the ontology from the previous section. Figure 2 shows how the
genetic concept linkage analysis is represented in our ontology.
The multilingual ontology was then integrated with the
CLIR system. Since then, available content has been annotated. Textual resources are annotated automatically; multimedia resources are annotated manually, but the annotation
procedure is guided by the ontology.
In the first part of the bootstrapping phase, rules were extracted from the multilingual ontology to adapt the MT system as described in the previous section. This typically
helps to improve the performance of MT. For example, before the adaption, our system wrongly translated the English collocation linkage analysis into French as analyse de
tringlerie. After the rule Linkage analysis → Analyse de liaison was extracted from the part of the ontology
in Figure 2 and put into the MT rule base, the system
correctly translated the term as Analyse de liaison.
The CLIR system is powered by Lucene extended with a
dedicated query parser that allows the user to combine terminological and full-text queries. Queries can be expressed
in any of the available languages, and the results can be
filtered by a subset of the available languages. Queries
are mapped to a language independent representation using the ontology. The CLIR system can also be used during query composition to visualize the concept hierarchy
and to interactively control query expansion for broader
and/or narrower terms (Figure 3), thus utilizing the benefits of ontology-based retrieval.
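The mapping of a query term to a language-independent concept, followed by controlled expansion, can be sketched in a few lines of Python. This is not the Lucene-based query parser of the Eurogene system: the function name, its parameters and the dictionary representation are assumptions, only one level of broader or narrower concepts is followed, and the sample data reuses labels of the linkage analysis concept.

```python
def expand_query(term, f, lang, broader, target_langs,
                 include_narrower=False, include_broader=False):
    """Return the terms, in the requested languages, that express the
    concept of `term` and, optionally, its direct neighbours.

    f:       dict term -> concept.
    lang:    dict term -> language.
    broader: set of (narrower_concept, broader_concept) pairs.
    """
    concept = f[term]                    # language-independent representation
    selected = {concept}
    if include_narrower:
        selected |= {n for n, b in broader if b == concept}
    if include_broader:
        selected |= {b for n, b in broader if n == concept}
    return {t for t, c in f.items()
            if c in selected and lang[t] in target_langs}


f = {"Linkage analysis": "LA", "Kopplungsanalyse": "LA",
     "Analyse de liaison": "LA",
     "Parametric linkage analysis": "PLA",
     "Non-parametric linkage analysis": "NPLA"}
lang = {"Linkage analysis": "en", "Kopplungsanalyse": "de",
        "Analyse de liaison": "fr",
        "Parametric linkage analysis": "en",
        "Non-parametric linkage analysis": "en"}
broader = {("PLA", "LA"), ("NPLA", "LA")}
expanded = expand_query("Kopplungsanalyse", f, lang, broader,
                        target_langs={"en", "fr"}, include_narrower=True)
```

A German query term is thus resolved to its concept and turned into English and French search terms, including the narrower concepts when the user asks for them.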
A hybrid system developed by SYSTRAN is used for MT
tasks, i.e. for the MT of resources and also for the learning of relations from parallel texts (SYSTRAN, 2009). The
CLIR and MT systems communicate using SOAP messages
that allow the sending of extracted translation rules from
CLIR to MT, and the sending of newly proposed translations from MT to CLIR. When newly proposed translations
are received by CLIR, the ontology is updated using the
algorithm in Section 3. Domain experts then perform terminology validation, which is supported by the system and
results in sending new translation rules to the MT rule base.
This synchronization provides a mechanism for continuous
semi-automatic adaption of both CLIR and MT systems.
Figure 2: Representation of the concept linkage analysis in
the multilingual ontology. The preferred label of this concept is the English version Linkage analysis. The concept
has two alternative representations in German (Linkage-Analyse and Kopplungsanalyse).7 The representation in
French is Analyse de liaison and in Spanish Analisis de ligamiento. The concept Linkage analysis is a broader concept for Parametric linkage analysis and Non-parametric
linkage analysis, and it is related to the concept Marker analysis.
Figure 3: User interface of the Eurogene CLIR system.
The CLIR system allows the user to control the expansion for
broader/narrower terms.
While CLIR allows users to pose queries and receive results in any
of the mentioned languages, MT is limited to language pairs supported by the SYSTRAN system. Please also note that MT is not
applied to images and videos.
Published by the University of Washington in Seattle, National Institute of General Medical Sciences in Bethesda, Elsevier,
Oracle ThinkQuest, University of Michigan and Centre for Genetics Education in Sydney.
5. Performance analysis
The performance of the proposed method and its impact
on the resulting CLIR and MT systems can be influenced
by a number of factors. These include mainly the suitability of the multilingual ontology for the target domain,
the amount of domain corpora available in the statistical
phase, the performance of the multilingual keyword extraction system and the validity of the judgements performed by
domain experts in the ontology refinement process. Given
the number of possible error sources, it seems much more
sensible to make sure that the method satisfies certain properties rather than performing a quantitative evaluation that
would be biased by too many components.
One of the important properties that the proposed method in
Section 3 should have is that the performance of both CLIR
and MT should never decrease as a result of any bootstrapping iteration. Let us assume that the initial ontology has
been validated by domain experts, so that it does not include any spurious translations. There are now two tasks
which could have a negative impact on the performance of
the CLIR or MT systems. These tasks correspond to 1) the
update of the MT rule base and 2) the update of the multilingual ontology as described in Section 3.
If we assume that our domain is sufficiently small, so that
no domain specific term appearing in the multilingual ontology is polysemous in our collection, then updating the
dictionary of the MT system may either improve or not
change the precision of the MT system. Since it is not possible to extract a spurious translation rule from the multilingual ontology, the resulting MT system cannot perform
worse than before the update.
It must be expected that the statistical training phase
described in Section 3 may produce quadruples describing
translations that are in fact invalid and may thus introduce
errors into the ontology. However, since all the updates must
be validated by domain experts before they can be used by
the CLIR system, it is possible to assume that no errors
are introduced. In reality this is difficult to guarantee, as humans are
themselves prone to introducing errors. Thus, the quality of
the ontology used by CLIR can deteriorate only under the
condition that an error has been introduced by a domain
expert.
To summarize, if all the above mentioned conditions are
met, the method is guaranteed to improve or in the worst
case not to worsen the performance of the CLIR and MT
systems after each iteration.
Multilingual ontologies are particularly suitable for domains where terminology is used for query composition,
such as in eLearning. They can be used as a synchronization component for domain adaption of CLIR and MT systems. In addition, the solution is easily readable and adjustable by humans and does not preclude the use of statistical approaches for terminology extraction when a large
corpus is available. In the future, publishing of multilingual ontologies on the Web in a standard format may allow
an application to decide which domain ontology to use for
query expansion and for adaption of the MT system based
on the context of the query. This may be helpful when
a user accesses a specific domain from a general-purpose
search engine.
Philipp Cimiano and Johanna Völker. 2005. Text2onto - a
framework for ontology learning and data-driven change
Susan T. Dumais. 1997. Automatic cross-language retrieval using latent semantic indexing.
David Eichmann, Miguel E. Ruiz, and Padmini Srinivasan. 1998. Cross-language information retrieval with
the UMLS Metathesaurus. In Proc. of the 21st Annual
International ACM SIGIR Conference on Research and
Development in Information Retrieval, pages 72–80.
Pascale Fung and Kathleen McKeown. 1997. Finding terminology translations from non-parallel corpora.
Lothar Lemnitzer, Cristina Vertan, Alex Killing,
Kiril Ivanov Simov, Diane Evans, Dan Cristea, and
Paola Monachesi. 2007. Improving the search for
learning objects with keywords and ontologies. In Erik
Duval, Ralf Klamma, and Martin Wolpers, editors,
EC-TEL, volume 4753 of Lecture Notes in Computer
Science, pages 202–216. Springer.
LTfLL. 2008. Language technology for lifelong learning
Wen-Hsiang Lu, Ray S. Lin, Yi-Che Chan, and Kuan-Hsi
Chen. 2008. Using web resources to construct multilingual medical thesaurus for cross-language medical information retrieval. Decis. Support Syst., 45(3):585–595.
Zhiyong Lu, Won Kim, and W. John Wilbur. 2009. Evaluation of query expansion using MeSH in PubMed. Inf.
Retr., 12(1):69–80.
Philip Resnik and Noah A. Smith. 2003. The web as a
parallel corpus. Computational Linguistics, 29:349–380.
SYSTRAN. 2009. Systran’s machine translation technology url:
Zdenek Zdrahal, Petr Knoth, Trevor Collins, and Paul Mulholland. 2009. Reasoning across multilingual learning
resources in human genetics. In Proceedings of ICL
6. Implications for eLearning
This paper showed that current eLearning applications supporting CLIR can also easily adopt MT and tailor it for their
domain. In addition, the synergy of CLIR and MT may
help to improve the performance of both. The main reason
why the method is particularly useful in eLearning is that
we should expect that the users of eLearning applications
will very often use domain terminology as a part of their
submitted queries, thus the added value will become more
noticeable than in other contexts.
The paper makes the following contributions:
• Development of a new method for facilitating cross-language retrieval and machine translation by multilingual domain ontologies.
• Development of a real-world eLearning application
enhanced by the use of the presented method.