Proceedings of the first Workshop on Supporting eLearning with Language Resources and Semantic Data May 22nd, 2010 Valletta, Malta In conjunction with LREC 2010 i Foreword Language resources are of crucial importance not only for research and development in language and speech technology but also for eLearning applications. In addition, the increasingly availability of semantically interpreted data in the WEB 3.0 is creating a huge impact in semantic technology. Social media applications such as Delicious, Flickr, YouTube, and Facebook, provide us with data in the form of tags and interactions among users. We believe that the exploitation of semantic data (emerging both from the Semantic Web and from social media) and language resources will drive the next generation eLearning platforms. The integration of these technologies within eLearning applications should also facilitate access to learning material in developing economies. The workshop aims at bringing together computational linguists, language resources developers, knowledge engineers, social media researchers and researchers involved in technology-enhanced learning as well as developers of eLearning material, ePublishers and eLearning practitioners. It will provide a forum for interaction among members of different research communities, and a means for attendees to increase their knowledge and understanding of the potential of language resources in eLearning. We will especially target eLearning practitioners in the Mediterranean Partner Countries. The proceedings of the workshop contain 10 papers discussing the integration of language resources, natural language processing techniques, ontologies and social media in eLearning. The organizers hope that the selection of papers presented here will be of interest to a broad audience, and will be a starting point for further discussion and cooperation. Paola Monachesi, Alfio Massimiliano Gliozzo and Eline Westerhout ii The Workshop Programme 14.30-14.45 Introduction Session on Language Resources, NLP and eLearning 14.45-15.05 Language resources and CALL applications Helmer Strik, Jozef Colpaert, Joost van Doremalen and Catia Cucchiarini 15.05-15.15 Challenges for Discontiguous Phrase Extraction Dale Gerdemann and Gaston Burek 15.15-15.25 Towards Resolving Morphological Ambiguity in Arabic Intelligent Language Tutoring Framework Khaled Shalaan, Doaa Samy and Marwa Magdi 15.25-15.45 Language Resources and Visual Communication in a Deaf-Centered Multimodal ELearning Environment: Issues to be Addressed Elena Antinoro Pizzuto, Claudia S. Bianchini, Daniele Capuano, Gabriele Gianfreda and Paolo Rossini 15.45-15.55 Deaf People Education: crossing linguistic borders through e-learning Giuseppe Nuccetelli and Maria Tagarelli De Monte 16.00- 16.30 Break Session on ontologies, social media and learning 16.30-16.50 BONy: a knowledge centric collaborative learning platform Alfio Massimiliano Gliozzo, Concetto Elvio Bonafede and Aldo Gangemi 16.50-17.00 Social E-SPACES; socio-collaborative spaces within the virtual world ecosystem Vanessa Camilleri and Matthew Montebello 17.00-17.20 A Semantic Knowledge Base for Personal Learning and Cloud Learning Environments Alexander Mikroyannidis, Paul Lefrere and Peter Scott 17.20-17.30 Semantic Annotation for Semi-Automatic Positioning of the Learner Petya Osenova and Kiril Simov 17.30-17.50 Facilitating cross-language retrieval and machine translation by multilingual domain ontologies Petr Knoth, Trevor Collins, Elsa Sklavounou and Zdenek Zdrahal 18.00-19.00 Wrap up, discussion, plans for common projects iii Workshop Organisers Paola Monachesi University of Malta, Malta and Utrecht University, The Netherlands [email protected] Alfio Massimiliano Gliozzo ISTC-CNR, Italy [email protected] Eline Westerhout Utrecht University, The Netherlands [email protected] iv Workshop Programme Committee Claudio Baldassarre (Open University) Roberto Basili (University of Rome Tor Vergata) Eva Blomqvist (ISTC–CNR) Antonio Branco (University of Lisbon) Dan Cristea (University of Iaşi) Ernesto William De Luca (TU Berlin) Philippe Dessus (University Pierre-Mendès-France, Grenoble) Claudio Giuliano (FBK-irst) Wolfgang Greller (Open University of the Netherlands) Alessio Gugliotta (Innova spa) Jamil Itmazi (Palestine Ahliya University) Susanne Jekat (Zürich Winterthur Hochschule) Vladislav Kubon (Charles University Prague) Lothar Lemnitzer (Berlin-Brandenburgische Akademie der Wissenschaften) Stefanie Lindstaedt (Know-Center) Angelo Marco Luccini (INSEAD) Manuele Manente (JOGroup) Dunja Mladenic (J. Stefan Institute) Mattew Montebello (University of Malta) Jad Najjar (WU Vienna) Valentina Presutti (ISTC–CNR) Adam Przepiorkowski (Polish Academy of Sciences) Mike Rosner (University of Malta) Doaa Samy (Cairo University) Khaled Shaalan (Cairo University) Kiril Simov (Bulgarian Academy of Sciences) Stefan Trausan-Matu (University of Bucarest) Cristina Vertan (University of Hamburg) Fridolin Wild (Open University) v Table of Contents Foreword ii Programme iii Organizers iv Programme Committee v Table of Contents vi Author Index vii Language resources and CALL applications Helmer Strik, Jozef Colpaert, Joost van Doremalen and Catia Cucchiarini 1 Challenges for Discontiguous Phrase Extraction Dale Gerdemann and Gaston Burek 7 Towards Resolving Morphological Ambiguity in Arabic Intelligent Language Tutoring Framework Khaled Shalaan, Doaa Samy and Marwa Magdi 12 Language Resources and Visual Communication in a Deaf-Centered Multimodal E-Learning Environment: Issues to be Addressed Elena Antinoro Pizzuto, Claudia S. Bianchini, Daniele Capuano, Gabriele Gianfreda and Paolo Rossini 18 Deaf People Education: crossing linguistic borders through e-learning Giuseppe Nuccetelli and Maria Tagarelli De Monte 24 BONy: a knowledge centric collaborative learning platform Alfio Massimiliano Gliozzo, Concetto Elvio Bonafede and Aldo Gangemi 29 Social E-SPACES; socio-collaborative spaces within the virtual world ecosystem Vanessa Camilleri and Matthew Montebello 35 A Semantic Knowledge Base for Personal Learning and Cloud Learning Environments Alexander Mikroyannidis, Paul Lefrere and Peter Scott 40 Semantic Annotation for Semi-Automatic Positioning of the Learner Petya Osenova and Kiril Simov 46 Facilitating cross-language retrieval and machine translation by multilingual domain ontologies Petr Knoth, Trevor Collins, Elsa Sklavounou and Zdenek Zdrahal 51 vi Author Index Antinoro Pizzuto, E. Bianchini, C.S. Bonafede, C.E. Burek, G. Camilleri, V. Capuano, D. Collins, T. Colpaert, J. Cucchiarini, C. van Doremalen, J. Gangemi, A. Gerdemann, D. Gianfreda, G. Gliozzo, A. M. Knoth, P. Lefrere, P. Magdi, M. Mikroyannidis, A. Montebello, M. Nuccetelli, G. Osenova, P. Rossini, P. Samy, D. Scott, P. Shalaan, K. Simov, K. Sklavounou, E. Strik, H. Tagarelli De Monte, M. Zdrahal, Z. 18 18 29 7 35 18 51 1 1 1 29 7 18 29 51 40 12 40 35 24 46 18 12 40 12 46 51 1 24 51 vii Language resources and CALL applications: speech data and speech technology in the DISCO project Helmer Strik a, Jozef Colpaert b, Joost van Doremalen a, Catia Cucchiarini a a b CLST, Department of Linguistics, Radboud University, Nijmegen, The Netherlands Linguapolis, Institute for Education and Information Sciences, University of Antwerp, Antwerp, Belgium E-mail: H.Strik | J.vanDoremalen | [email protected]; [email protected] Abstract The current paper deals with the relation between language resources and Computer Assisted Language Learning (CALL) systems: language resources are essential in the development of CALL applications, during the development of the system resources are created, and finally the CALL system itself can be used to generate additional resources that are useful for research and development of new (CALL) systems. We focus on the system developed in the project DISCO (Development and Integration of Speech technology into COurseware for language learning): we describe the language resources employed for developing the DISCO system and present the DISCO system paying attention to the design, the automatic speech recognition modules, and the resources produced within the project. Finally, we discuss how additional language resources can be generated through the DISCO system. 1. have the means to finance the development of such technology. For these and other reasons, in the Netherlands and Flanders a programme was started, called STEVIN (a Dutch acronym that stands for Essential Language Resources in Dutch), which is funded by the Flemish and Dutch governments and aims at stimulating the development of basic language and speech technology for the Dutch language. Introduction In the last few years the interest in applying Automatic Speech Recognition (ASR) technology to second language (L2) learning has been growing considerably (Eskenazi, 2009). The addition of ASR technology to Computer Assisted Language Learning (CALL) systems makes it possible to assess oral skills in a second language and to provide corrective feedback automatically. The latter feature appears particularly appealing, since research has shown that usage-based acquisition in the L2 is not as successful as in the L1 (Ellis and Larsen-Freeman, 2006: 571), that L2 learners have difficulty identifying their own errors (Dlaska and Krekeler, 2008), and that they indeed need guidance to improve their language skills (Ellis and Bogart, 2007). Since providing practice and feedback for speaking proficiency is particularly time-consuming, the necessary amount of practice is almost never achieved in traditional teacher-fronted lessons. Against this background, ASR-based CALL systems would seem to make for an interesting supplement to traditional L2 classes. Within the framework of the STEVIN programme a project called DISCO (Development and Integration of Speech technology into COurseware for language learning, http://lands.let.kun.nl/~strik/research/DISCO) was started that aims at developing a prototype of an ASR-based CALL system for practicing oral skills in Dutch L2. The system addresses different aspects of speaking proficiency (syntax, morphology and phonology), detects errors in speaking performance, points them out to the learners and gives them the opportunity to try again until they manage to produce the correct form. One of the interesting things about this project is that since it is carried within the STEVIN programme, the technology that is developed for the present project will be publicly made available to interested users (researchers, HLT companies and publishers) through the Dutch HLT Agency. However, developing ASR-based CALL systems that can provide accurate and useful feedback on oral proficiency is not trivial, because the speech of L2 learners poses special difficulties to ASR technology (Compernolle 2001; Benzeghiba et al. 2007; Doremalen et al. 2009a; Doremalen et al. 2009b). In addition, existing systems in general fail to provide corrective feedback that is detailed enough and accurate, especially on L2 pronunciation which is considered a particularly challenging skill, both for L2 learners (Flege, 1995) and CALL systems (Menzel et al. 2000: 54; Morton and Jack, 2005). In the current paper we discuss the relation between language resources and CALL systems: language resources are essential in the development of CALL applications, during R&D resources are created, and finally the CALL system itself can be used to generate additional resources that are useful for research and development of new (CALL) systems. Another problem that has hampered the realization of ASR-based CALL systems, especially for the smaller languages, is that although companies, esp. publishers, are willing to use the technology, many companies do not In section 2 we describe which language resources were employed in the DISCO project. In section 3 we present 1 the DISCO system paying attention to the design, the automatic speech recognition modules, some preliminary results and the resources produced within the project. In section 4 we discuss how additional language resources can be generated through the DISCO system. 2. learning population. In the Netherlands, on the other hand, a miscellaneous group of L2 learners with various mother tongues was selected because this more realistically reflects the situation in Dutch L2 classes. Since an important aim in collecting non-native speech material was that of developing language learning applications for education in Dutch L2, various experts were consulted to determine for which proficiency level such applications are most needed. It turned out that for the lowest levels of the Common European Framework (CEF), namely A1, A2 or B1, there is relatively little material and that ASR-based applications would be very welcome. For this reason, speech from adult Dutch L2 learners at these lower proficiency levels was recorded. CALL applications and the need for language resources An important requirement for developing ASR-based CALL applications is the availability of language resources such as language and speech corpora and speech technology toolkits. In order to develop technology that is able to identify errors in oral proficiency we need to know which errors are made by L2 learners in the first place. Part of this information can be found in the literature, but, in general, the information provided in the literature is not complete and not sufficiently quantified to be suitable for developing CALL applications. The speech collected in the JASMIN corpus was recorded in two different modalities: about 50% of the material consists of read speech material while the other 50% is made up of extemporaneous speech produced in human-machine dialogues. The JASMIN dialogues were collected through a Wizard-of-Oz-based platform and were designed such that the wizard was in control of the dialogue and could intervene when necessary. In addition, recognition errors were simulated and difficult questions were asked to elicit some typical phenomena of human-machine interaction that are known to be problematic in the development of spoken dialogue systems, such as hyperarticulation, restarts, filled pauses, self talk and repetitions. In our previous research on developing a computer assisted pronunciation training (CAPT) for Dutch, Dutch-CAPT (Cucchiarini et al., 2009), we needed to draw up an inventory of pronunciation errors. We discovered that the information on L2 errors provided in the literature was mostly based on observational studies, was often incomplete, and not quantitative in nature. For this reason we had no other choice than conducting L2 error studies ourselves (Neri et al., 2006). However, since a speech corpus of non-native Dutch was not available at the time, we had to resort to the auditory analysis of Dutch L2 speech recordings that had been collected in the framework of previous projects (Neri et al., 2006). The speech recordings were annotated at different levels. For the DISCO project, the verbatim transcription and the automatically generated phonemic transcription are particularly relevant. For all the reasons mentioned above the JASMIN speech material turned out to be extremely useful and appropriate for the development of the DISCO system. For the DISCO project we had the opportunity of using the results of another STEVIN project that had been completed in the meantime, the JASMIN corpus (Cucchiarini et al., 2008). Both read and extemporaneous speech were analyzed to study which errors are made at the level of pronunciation, morphology and syntax. For this purpose the annotations contained in JASMIN were supplemented with extra annotations of the morphological and syntactical errors made by the speakers. The automatically generated phonemic transcriptions were manually verified by trained students and where necessary improved. Subsequently they were used to study which pronunciation errors are made by L2 learners of Dutch with different mother tongues. 2.2.1. The JASMIN speech corpus The JASMIN corpus is an extension of the large Spoken Dutch Corpus (CGN; Oostdijk, 2002). JASMIN contains speech by children of different age groups, elderly people and non-natives with different mother tongues. The JASMIN corpus was collected in the Netherlands and Flanders and is specifically aimed at facilitating the development of speech-based applications for children, non-natives and elderly people. In the case of non-native speakers the applications envisaged were especially language learning applications because there is considerable demand for CALL products that can help making Dutch L2 teaching more efficient. The human-machine dialogues were used for conducting experiments for the DISCO system because they closely resemble the situation we will encounter in this CALL application. In selecting the non-native speakers for this corpus, mother tongue constituted an important variable. For the Flemish part, Francophone speakers were selected because they form a significant proportion of the Dutch 2 enhances motivation, self-esteem and empathy” (Hubbard, 2002), 2. it casts language in a social context, and 3. its notion implies a form of planning, scenario-writing and fixed roles, which is consistent with the limitations we set for the role of speech technology in DISCO. To summarize, this framework allows us to create a rich and communicative CALL application that stimulates Dutch L2 learners to produce speech and experience the social context. On the other hand, these choices are appropriate from a technological perspective, since they make it possible to successfully deploy speech technology while taking into account its limitations (Strik et al., 2009). 2.2.2. The SPRAAK speech recognizer The speech recognizer adopted in the DISCO project is SPRAAK (Demuynck et al., 2008), a hidden Markov model (HMM)-based ASR package developed for over 15 years by ESAT at the University of Leuven and later enriched with knowledge and code from other partners through the STEVIN project SPRAAK. The availability of a speech recognition system for Dutch was considered to be an important requirement by the whole language and speech technology (LST) community in the Netherlands and Flanders. For this reason a project was started within the STEVIN programme for this specific purpose: the SPRAAK project. The aim of SPRAAK was twofold: a) developing a highly modular toolkit for research into speech recognition algorithms and b) providing a state-of-the art recogniser for Dutch with a simple interface that could be used by non-specialists. SPRAAK is distributed as open source for academic usage and at moderate cost for commercial exploitation (for further details, see http://www.spraak.org/). 3. To gain more insight into appropriate feedback strategies, pedagogical goals, and personal goals a number of preparatory studies were carried out, such as exploratory in-depth interviews with Dutch L2 teachers and experts, focus group discussions to elicit the personal goals of learners, and pilot studies through partial systems with limited functionality (e.g. no speech technology). The functions of the system that were not implemented (play prompts, give feedback, etc.) were simulated. The results of these preparatory studies were taken into account in finalizing the design of the DISCO system. The DISCO system 3.1 Design of the DISCO system Within the STEVIN programme a project called DISCO was started on 01-02-2008, in which a CALL system will be developed. The target user group for the DISCO system are immigrants who want to learn Dutch as L2 to be able to work in the Netherlands or Flanders. The learning process starts with a relatively free conversation simulation, taking well into account what is (not) possible with speech technology: learners are given the opportunity to choose from a number of prompts at every turn (branching, decision tree). Based on the errors they make in this conversation they will be offered remedial exercises, which are very specific exercises with little freedom. The model adopted for designing the system is Distributed Language Learning (DLL), a methodological and conceptual framework for designing competency-oriented and effective language education (Colpaert, 2004). Its starting point is the design of a language learning environment for a specific language learning situation. The design is based on a thorough analysis of all factors and actors in the language learning situation, and on the identification of aspects amenable to change or improvement. The main phases of the design are goal-oriented conceptualization and ontological specification. Goal-oriented conceptualization stands for the formulation of a solution based on the realization of ‘practical goals’ as a hypothetical compromise between (often conflicting) personal and pedagogical goals, both for teachers and learners. Ontological specification is a detailed description of the architecture of the language learning environment, defined as the network of interactions between learner, co-learner, teacher, content, native, etc. inside or outside the learning place. In DISCO, we limit our general design space to closed response conversation simulation courseware and interactive participatory drama (IPD), a genre in which learners play an active role in a pre-programmed scenario by interacting with computerized characters or “agents”. The use of drama is beneficial for various reasons: 1. it “reduces inhibition, increases spontaneity, and Feedback depends on individual learning preferences: the default feedback strategy is immediate corrective feedback, which is visually implemented through highlighting, and from an interaction perspective by putting the conversation on hold and focusing on the mistakes. Learners that wish to have more conversational freedom can choose to receive communicative recasts as feedback, which let the conversation go on while highlighting mistakes for a short period of time. The final system will have several parameters that can be changed by the learner or teacher. During development and implementation, we will try to have these parameters behave intelligently (based on error analysis and learner behavior), so that the system can adapt itself to the learner. For future research these parameters offer the possibility of studying different modes of behavior of the CALL system and their effect on language learners. 3.2 The speech recognition modules First, we provide some technical details about our system. As mentioned above, the human-machine dialogues were 3 used for conducting experiments for the DISCO system. The material used consisted of speech from 45 speakers who each give answers to 39 questions about a journey. The input speech, sampled at 16kHz, is divided into overlapping 32ms Hamming windows with a 10ms shift and pre-emphasis factor of 0.95. 12 Mel-frequency cepstral coefficients (MFCCs: C1-C12) plus C0 (energy), and their first and second order derivatives were calculated and cepstral mean subtraction (CMS) was applied. The constrained language models and pronunciation lexicons are implemented as finite state machines (FSM). morphological constructions, the approach used for detecting phonological errors will be used also for detecting some of the morphological errors, for instance those concerning regular verb forms. Irregular verbs, on the other hand, may require an approach that is more similar to that adopted for detecting syntactic errors. Once the system arrives at this final stage, the system has a detailed overview of all the errors on the different levels and based on this overview the system can provide feedback to the student. 3.3 The resources produced in the project The resources mentioned above are employed to develop the DISCO system which consists of various parts. First of all, a blue-print of the design and the speech technology modules for recognition (i.e. for selecting an utterance from the predicted list, and verifying the selected utterance) and for error detection (errors in pronunciation, morphology, and syntax). In addition, the following resources have been developed: an inventory of errors at all these three levels, a prototype of the DISCO system with content, specifications for exercises and feedback strategies, and a list of predicted correct and incorrect utterances. In the DISCO system feedback on speaking performance is given on three levels: syntax, morphology and phonology. To give feedback, errors on these levels have to be detected automatically. In our system architecture, this task is divided in two modules: (1) the speech recognition module and (2) the error detection module. The first module, speech recognition, determines the sequence of words the student uttered. For each prompt a list of predicted correct and (grammatically) incorrect responses is created beforehand based on errors that are expected on empiric grounds. This list is the basis for a Finite State Grammar (FSG) language model, which is used by an hidden Markov model (HMM)-based speech recognition system. The recognition system is forced to choose among the predicted response from the list. The fact that DISCO is being carried out within the STEVIN programme implies that its results, all the resources mentioned above, will become available for research and development through the Dutch Flemish Human Language Technology (HLT) Agency (TST-Centrale; www.inl.nl/tst-centrale). This makes it possible to reuse these resources for conducting research and for developing specific applications for ASR-based language learning. To avoid false accepts, for example when an utterance is uttered that is not in the list of predicted responses, utterance verification (UV) is carried out. Using a combination of acoustic and durational similarity measures it is determined whether the response chosen by the speech recognizer reflects what has been said. If it is rejected the user is asked to try again; if it is accepted, the system will proceed to error detection (Van Doremalen et al. 2009a, b). 3.4 Evaluation A system that gives meaningful feedback must operate in a manner that is similar to what a competent teacher would do. Therefore, for the final evaluation of the whole system we intend to use a design in which different groups of students of Dutch as a second language (DL2) at the University of Antwerp and at the Radboud University in Nijmegen use the system and fill in a questionnaire with which we can measure the students’ satisfaction in working with the system. Teachers of DL2 will then assess all sets of system prompt, student response and system feedback for the quality of the feedback on the level of pronunciation, morphology and syntax. For this purpose, recordings will be made of students who complete the exercises developed to test the DISCO system. Given the evaluation design sketched above, we consider the project successful from a scientific point of view if the DL2 teachers agree that the system behaves in a way that makes it as useful for the students as a teacher is, and if the students rate the system positively on its most important aspects. Note that once the chosen response is accepted by the utterance verifier we can already detect errors on the syntactic level because the system is confident enough that the student uttered a specific sequence of words and it also knows what the student was supposed to say. Detecting errors on the morphological and phonological levels requires another, more detailed analysis of the speech signal. The starting point of this analysis is a segmentation of the speech signal into a sequence of phones obtained from the speech recognition module. Using a variety of spectral and temporal features a confidence measure (CM) is calculated for each of these phones. Based on this CM the system decides to mark the hypothesized phone in the segmentation as correctly pronounced or incorrectly pronounced (Van Doremalen et al. 2009c). In the way described above, phonological errors can be detected. Since some phonemes are critical for certain 4 4. Generating additional language resources available for research. Above we described which resources we used in developing our CALL system, and which resources become available during development of the system. In this section, we describe which additional resources can be collected by using the CALL system. 5. Conclusions In this paper we have discussed the importance of language resources for CALL application development on the basis of our experiences in the DISCO project in which speech data and speech technology are employed to develop a system for practicing oral skills in a second language.. We have seen that language resources are actually indispensable for developing sound CALL applications. Once developed, such applications can also be employed to produce new valuable language resources which can in turn be used to develop new, improved CALL systems. After the CALL system has been developed, language learners can use it to practice oral skills. The system has been designed and developed in such a way that it is possible to log details regarding the interactions with the users. This logbook can contain, e.g., the following information: what appeared on the screen, how the user responded, how long the user waited, what was done (speak an utterance, move the mouse and click on an item, use the keyboard, etc.), the feedback provided by the system, how the user reacted on this feedback (listen to example (or not), try again, ask for additional, e.g. meta-linguistic, feedback, etc.). 6. Acknowledgements The DISCO project is carried out within the STEVIN programme funded by the Dutch and Flemish Governments (http://taalunieversum.org/taal/technologie/stevin/). Finally, all the utterances spoken by the users can be recorded in such a way that it is possible to know exactly in which context the utterance was spoken, i.e. it can be related to all the information in the logbook mentioned above. An ASR-based CALL system, like DISCO, can thus be used for acquiring additional non-native speech data, for extending already existing corpora like JASMIN, or for creating new ones. This could be done within the framework of already ongoing research without necessarily having to start corpus collection projects. 7. References Benzeghiba, M., Mori, R. D., Deroo, O., Dupont, S., Erbes, T., Jouvet, D., Fissore, L., Laface, P., Mertins, A., Ris, C., Rose, R., Tyagi, V., Wellekens, C. (2007). Automatic speech recognition and speech variability: a review. Speech Communication, 49, 763–786. Colpaert, J. (2004). Design of Online Interactive Language Courseware: Conceptualization, Specification and Prototyping. Research into the impact of linguistic-didactic functionality on software architecture. (Doctoral dissertation). University of Antwerp, 2004. Cucchiarini, C., Neri, A., and Strik, H. (2009). Oral proficiency training in Dutch L2: The contribution of ASR-based corrective feedback. Speech Communication, 51, 853-863. Cucchiarini, C., Driesen, J., Van hamme, H., and Sanders, E., (2008). Recording speech of children, non-natives and elderly people for HLT applications: the JASMIN-CGN corpus. In Proceedings of LREC-2008. Demuynck, K., Roelens, J., van Compernolle, D., and Wambacq, P. (2008) SPRAAK: an open source SPeech Recognition and Automatic Annotation Kit. In Proceedings of ICSLP-2008, p. 495. Dlaska, A. and Krekeler, C. (2008). Self-assessment of pronunciation. System, 36, pp. 506-516. Ellis, N.C., Bogart, P.S.H. (2007). Speech and Language Technology in Education: the perspective from SLA research and practice. In Proc. SLaTE, Farmington PA, pp. 1-8. Ellis, N. and Larsen-Freeman, D. (2006). Language emergence: implications for applied. Linguistics, Applied Linguistics, 27.4: 558–89. Eskenazi, M. (2009). An overview of Spoken Language Technology for Education, Speech Communication. Such a corpus and the log-files can be useful for various purposes: for research on language acquisition and second language learning, studying the effect of various types of feedback, research on various aspects of man-machine interaction, and of course for developing new, improved CALL systems. Such a CALL system will also make it possible to create research conditions that were hitherto impossible to create, thus opening up possibilities for new lines of research. For instance, at the moment a project is being carried out at the Radboud University of Nijmegen, which is aimed at studying the impact of corrective feedback on the acquisition of syntax in oral proficiency (http://lands.let.kun.nl/~strik/research/FASOP). Within this project the availability of an ASR-based CALL system makes it possible to study how corrective feedback on oral skills is processed on-line, whether it leads to uptake in the short term and to actual acquisition in the long term. This has several advantages compared to other studies that were necessarily limited to investigating interaction in the written modality: the learner’s oral production can be assessed on line, corrective feedback can be provided immediately under near-optimal conditions, all interactions between learner and system can be logged so that data on input, output and feedback are readily 5 Flege, J. (1995). Second language speech learning: theory, findings and problems. In W. Strange (Ed.) Speech perception and linguistic experience, Baltimore: York Press, pp. 233-272. Hubbard, P. (2002). Interactive Participatory Dramas for Language Learning. Simulation and Gaming, vol. 33, pp. 210-216. Morton, H., Jack, M. (2005). Scenario-Based Spoken Interaction with Virtual Agents. Computer Assisted Language Learning, 18, 171-191. Oostdijk, N. (2002). The design of the spoken dutch corpus. In New Frontiers of Corpus Research, P. Peters, P. Collins, and A. Smith, Eds. Rodopi, pp. 105–112. H. Strik, Cornillie, F., van Doremalen, J., Cucchiarini, C. (2009). Developing a CALL System for Practicing Oral Proficiency: How to Match Design and Speech Technology. In Proc. SLATE, Wroxall Abbey. Van Compernolle, D. (2001). Recognizing speech of goats, wolves, sheep and ... non-natives. Speech Communiciation, 35, 71-79. Van Doremalen, J., Cucchiarini, C., Strik, H. (2009a). Optimizing automatic speech recognition for low-proficient non-native speakers. Accepted for publication in EURASIP Journal on Audio, Speech, and Music Processing, to appear. Van Doremalen, J., Strik, H., Cucchiarini, C. (2009b). Utterance Verification in Language Learning Applications. In Proc. SLATE, Wroxall Abbey. Van Doremalen, J., Cucchiarini, C., Strik, H. (2009c). Automatic Detection of Vowel Pronunciation Errors Using Multiple Information Sources. Proceedings of the biannual IEEE workshop on Automatic Speech Recognition and Understanding (ASRU). 6 Challenges for Discontiguous Phrase Extraction Dale Gerdemann, Gaston Burek Dept. Linguistics University of Tübingen [email protected], [email protected] Abstract Suggestions are made as to how phrase extraction algorithms should be adapted to handle gapped phrases. Such variable phrases are useful for many purposes, including the characterization of learner texts. The basic problem is that there is a combinatorial explosion of such phrases. Any reasonable program must start by putting the exponentially many phrases into equivalence classes (Yamamoto and Church, 2001). This paper discusses the proper characterization of gappy phrases and sketches a suffix-array algorithm for discovering these phrases. 1. Introduction we consider this Bulgarian expression as a sequence of letters. Then the inflection on is in the middle, whereas the inflection on - is on the right periphery. Both of these instances of variation are problematic. The variation in the middle, however, is somewhat more problematic, and is the main focus of this paper. Most phrase extraction programs are based on pattern matching algorithms developed for computational molecular biology. To adapt such algorithms for natural language, with worst case examples such as the Bulgarian phrase above will require a great deal of thought. In particular, cooperation between language researchers and computer scientists is required. Too often language researchers use off-the-shelf software packages, and apply no particular programming skills at all.3 Hence, the goal of the present paper is not to present a new algorithm for gapped phrase extraction, but rather to present some features of what such a phrase extraction program ought to provide. Some technical literature is presented, but the intended readership of this paper is non-technical. Writing is an essential part of learning and evaluating written texts is an essential part of teaching. A good teacher must attempt to understand the ideas presented in a learner text and evaluate whether or not these ideas make sense. Such evaluation can obviously not be performed by a computer. But on the other hand, computers are good at evaluating other aspects of texts. Computers are, for example, very good at picking out patterns of linguistic usage, in particular terms and phrases1 that are used repeatedly. It is often the case that choice of terminology can be surprisingly effective in characterizing texts. For example, the terms “Latent Semantic Analysis” and “Latent Semantic Indexing” mean essentially the same thing, but the former is more characteristic of the educational and psychological communities whereas the latter is more characteristic of the information retrieval community. In a similar vein, Biber (2009) uses characteristic phrases to distinguish between written and spoken English. Up to now, in the eLearning community, bag-of-words based approaches have been most popular for evaluating student essays (Landauer and Dumais, 1997). It is the contention of this paper that the next step of considering phrases will not be possible until eLearning practitioners immerse themselves into the somewhat technical combinatorial pattern matching literature. This paper is concerned with extracting phrases with gaps. This is an important topic since many phrases occur in alternative forms. For example, the English phrase one and the same has an essentially verbatim counterpart in Bulgarian, but the Bulgarian phrase occurs in a variety of forms depending on gender and number of the following noun. The following forms were extracted from a few Bulgarian texts: , , , . In this simple Bulgarian phrase, there are three different alternations. First (’one’) occurs with inflections −∅, - , - and - . Second, - (’same’) occurs with inflections - , - , and - . And third, also contains the “fleeting” or “ghost” vowel , which alternates with ∅.2 If 1.1. Algorithmic Introduction Efficient algorithms for phrase (or n-gram) extraction were introduced into the computational linguistics literature by Yamamoto and Church (2001) and have subsequently been used for a wide variety of applications such as lexicography, phrase-based machine translation and bag-of-phrases based text categorization (Burek and Gerdemann, 2009).4 Ultimately, the goal of such algorithms is to discover repetitive structure as represented by frequently recurring sequences of symbols. Unfortunately, the approach of Yamamoto and Church often misses repetitive structure since phrases often occur with slight variations. For example, the middle term of a phrase might occur in different morphological variants: guages in general (Jetchev, 1997). The vowel (IPA: /i/) is, however, idiosyncratic as a ghost vowel. 3 For language researchers wishing to acquire some programming skills, there is probably no better starting point than Sedgewick and Wayne (2010 forthcoming). 4 Similar algorithms are also used by Dickinson and Meurers (2005) for detecting inconsistencies in annotated corpora. This is particularly relevant, since they are specifically interested in discontinuous (or gapped) annotations. 1 We use the term “phrase” to mean repeated sequence of tokens. This is quite flexible, allowing any kind of tokenizer and phrases of any non-negative length. 2 Ghost vowels are a characteristic of Bulgarian and Slavic lan- 7 advantage of avoiding some difficult problems such as compound nouns in German and word segmentation in Chinese Zhang and Lee (2006). In this paper, we assume that some tokenization (and also possibly normalization) is performed on the corpus, and that tokens are replaced by integers. all join in vs all joined in; or the middle term may vary in other ways: give me a vs give him a. Recently, an algorithm for finding such paired repeats was presented by Apostolico and Satta (2009). This algorithm is quite efficient, as it is shown to run in linear time with respect to the output size. Unfortunately, however, the algorithm is designed to extract “tandem repeats,” which are defined in a way that may not be entirely appropriate for the researcher interested in extracting gapped phrasal expressions. The goal of this paper is, then, to specify the requirements of such researchers. The hope is that this paper will provide a challenge for algorithm designers who may either want to adapt the Apostolico and Satta algorithm or design a new competing algorithm. One difference between the Yamamoto-Church algorithm and the Apostolico-Satta algorithm is the former is based on suffix arrays, whereas the latter is based on suffix trees. This should, however, not be seen as a major distinction, since recent developments with suffix arrays have tended to blur the distinction (Abouelhoda et al., 2004; Kim et al., 2008).5 To some extent, one may think of suffix arrays simply as a data structure for implementing suffix trees. Further implementation issues will be discussed below. 2. 3. Desiderata We now present a rather incomplete list of desirable features for gapped phrase extraction. 3.1. Main Parameters By default an extracted gapped phrase αmβ should have |α| ≥ 1, |β| ≥ 1 and m = [a1 | . . . | an ] where n ≥ 2. These are minimal values, and may be set to larger values to extract possibly more interesting phrases. If the length of α or β is set to 0, then the gap will be on the periphery. The length of α may also be seen as an efficiency consideration. The central idea of the Apostolico and Satta algorithm, for example, picks out candidate left parts first, and then for each of these, a recursive call is made to find a corresponding right part.7 Putting a length restriction on α means that there are fewer candidates, and therefore fewer recursive calls. Clearly, an alternative approach would be to start with the right piece and recursively search for corresponding left pieces. Some Terminology 3.2. Conditions on the Gap A language researcher studying gapped phrases may find a gap of length 4 interesting (from one end of the Earth to the other) but a gap of length 7 uninteresting (Medical bills from one puppy catching something and passing it on to the other puppy). With character-based tokenization, however, a gap of length 6 or more may well be interesting: and half − [believ|f orm|melt|slouch]ed.8 In addition to specifying the maximum length of the gap, it may be desirable to be able to specify a minimum length. An alternation like b[|o]ut for ’boat’ and ’but’ seems particularly perverse, though perhaps there are other ways to filter out such uninteresting cases. Biber (2009) limits the gap to be of length exactly one. But this seems to merely reflect the limitations of a particular software package since in the context from one X to the other, there is very little difference between the single word ’extreme’ and the four word phrase ’end of the Earth’. It may also be possible for the gap to have negative length, effectively meaning that the left and right parts overlap. This is allowed, for example, in the Apostolico-Satta algorithm, though it is unclear what advantages this “feature” has for natural language texts.9 To start with, let us consider a typical gapped expression: from one X to the other.6 The goal of gapped phrase extraction is to discover gapped expressions such as this. Once such a pattern is discovered, a researcher can easily find further instances of the pattern by searching with regular expressions in other corpora. Initially however, the phrase extraction may discover just a couple of instantiations for X, which may be expressed as a simple regular expression using only alternation: f romone[shore|edge]totheother. In referring to patterns such as this, we will use α to refer to the left part f rom one and β to refer to the right part to the other. It will generally be assumed that the left and right parts are non-empty. For the alternation in the middle, We will use the letter m. It will generally be assumed that the middle consists of at least two alternatives. As usual, we will use letters from the beginning of the alphabet a, b, c to represent single symbols, and letters from the end of the alphabet w, x, y to represent sequences. The reader should keep in mind, however, that what counts as a symbol depends on the tokenization. The two obvious approaches are character-based and word-based tokenization, with the latter in particular requiring algorithms adapted to a large alphabet. In some sense, word-based tokenization is more natural, though the character-based approach has the 5 Kim et al. (2008) is of particular interest for NLP, since their approach is optimized for a large alphabet, as opposed to most of the bioinformatics literature which uses a four-letter alphabet. With a large alphabet, it becomes possible to tokenize a text by words, and treat each word as a “letter.” 6 Perhaps eLearning practitioners who are interested in ontologies will find this example interesting. There is clearly a class of “polarized entities” that can serve as good instantiations for X. Paired, but non-polarized entities like sock and shoe are not very felicitous. Is there a WordNet synset for this? 8 7 We’re simplifying quite a bit here. The “recursive call” is, in fact, rather different from the original call. 8 This pattern is found in Moby Dick. A language researcher might be interested in such an example since it seems to pick out a semantic class of actions that occur or can be performed in a partial manner. 9 In fact, the the Apostolico-Satta algorithm has a parameter d not for the length of the gap, but rather for the maximum distance between the beginning of the left part and the beginning of the right part. If d < |α|, then there could be overlap. This, however, does not seem to be a serious limitation, since it would be easy enough to adapt the Apostolico-Satta allgorithm to let d be some function of |α|. patterns as Ahab r . . . ed and Ahab re . . . ed. That is, think of the middle part as not really part of the pattern, but rather as providing information about occurrences of the pattern. In this sense, Ahab re . . . ed appears to be more specific, since the occurrence with rushed is lost. But there is a problem here. Recall that the . . . matches sequences no longer than length d. If we set d to be 4, then the supposedly less specific pattern will not match Ahab remained, and the supposedly more specific pattern will match this occurrence. This suggests that the Apostolico-Satta approach of letting d be the distance from the beginning of the left piece to the beginning of the right piece may be preferable. On the other hand, their approach allows the left and right parts to overlap. More sophisticated possibilities also exist. For example, one could specify the the gap length conditions as a function of the lengths of the left and right pieces. Or perhaps a function of the contents of the left and right parts and the gap could be used. Another possibility would be to measure the gap length as number of syllables or number of some other kind of linguistic unit. Probably, it would not be possible to incorporate such conditions directly into the extraction algorithm. Most likely, a secondary filter would be the required approach. 3.3. Principle of Maximal Extension A fundamental notion in the pattern recognition literature is that of saturation, which Apostolico (2009) defines as follows: 3.4. No Overlap The Apoostolico-Satta algorithm is designed to find tandem occurrences of two strings, which they explain as follows: . . . a pattern is saturated relative to its subject text, if it cannot be made more specific without losing some of its occurrences. By the two strings occurring in tandem, we mean that there is no intermediate occurrence of either one in between. This is stated in a rather imprecise way, but the intention should be clear. Suppose that the pattern mumbo has occurrences at (i, i), (j, j) and (k, k). Suppose further that the pattern is extended (made more specific) to mumbo jumbo and that occurrences are now found at (i, i + 1), (j, j + 1) and (k, k + 1). Then the 3 old occurrences should not be seen as lost, but rather as replaced by 3 corresponding longer occurrences. So the pattern for the incomplete phrase mumbo is unsaturated. Suffix trees and suffix arrays are a kind of asymmetrical data structure that make extensions to the right easier to find than extensions to the left. So given mumbo, it is easy to extend this to the right, but given jumbo, it is much harder to extend this to the left. For left extensions, Abouelhoda et al. (2004) advocate the use of a Burrows and Wheeler transformation table. For gapped phrases, the issue of extension to the left and right becomes even more complex. Given a pattern α[ax1 | · · · | axn ]β, it seems reasonable to extract the a, turning the pattern into αa[x1 | · · · | xn ]β, capturing the generalization that the middle part always starts with a. If the left and right parts are both extended, then one can find patterns like Ahab r[each|emain|etir|ush]ed (from Moby Dick), where extension of the left part represents the linguistically interesting fact that all the verbs are in the past tense. The extension of the left part, on the other hand, captures the rather uninteresting fact that all the verbs happen to start with r. If the left part is now further extended, then the pattern becomes more specific, and loses some of its occurrences: Ahab re[ach|main|tir]ed. It is unclear how a gapped phrase extraction program should be designed to rule out such uninteresting extensions.10 It is interesting to think about the example in the previous paragraph in terms of saturation. Suppose we think of the To illustrate the problem of intermediate occurrences, consider the following truncated version of Moby Dick (tokenized by character): the boat. the white whale The sequence the occurs twice, so this is a candidate left part. The sequence wh occurs twice, both times with the to the left (supposing d = 6, for example). So without taking care, one might extract the nonsense pattern the [| white] wh. The Apostolico-Satta algorithm is designed from the beginning to rule out such overlaps. But the basic algorithm presented in section 4. has a problem with these. An extra step would be required just to filter out such overlaps. 3.5. Boundaries A common feature in the study of (gapped) phrases is that they are allowed to cross many, but not all kinds, of boundaries. For example, in the “lexical bundles” studied by Biber (2009) is that they, more often than not, cross the category boundaries of traditional linguistics. Typical examples are: as a result of and it is possible to. With tokenizing by letter, one often finds partial words (example from Moby Dick): contrast [between|in|of |to] th. Here the partial word th seems to play an important role in English. Still there are some boundaries that should not be crossed. Dickinson and Meurers (2005), for example, note that the patterns that they were looking for should not cross sentence boundaries. There is therefore a temptation to put such boundary constraints into the phrase extraction program. We believe, however, that this is a mistake. The phrase extraction program is already complicated enough without having to deal with such special cases. In this case there seems to be a fairly simple-minded alternative. Simply use a tokenizer that replaces each boundary punctuation character (period, question mark, etc) with a unique integer identifier. This requires a bit of bookkeeping to remember which integers have been used to represent 10 On a personal note, it is examples like this that inspired us to write this paper. We had started off by implementing an algorithm similar to that of Apostolico and Satta (2009), and after encountering problematic cases like this, decided to put the algorithm aside for a while, and to concentrate on writing a specification of desirable features for any gapped phrase extraction program. 9 4. which punctuation characters, but it is still much easier than modifying the suffix arrays or trees. A similar approach is described in section 4. to avoid extraction of “phrases” which start near the end of one text in the corpus, and conclude near the beginning of the next text. 3.6. Algorithmic Specifications In this section, we sketch a rather basic algorithm which may serve as the basis for something more useful.13 The idea is quite simple. Given a phrase extraction algorithm for non-gapped phrases, candidate left parts can be extracted. To reduce the search space, these candidate left parts may be required to be maximally extended or “interesting” in various ways. For a given phrase p, find all occurrences of p in the corpus, and denote each such occurrence as (i, j), where i and j are the indices of the first and last tokens of the occurrence in the corpus. For each such occurrence, specify the right context as (j + 1, j + d + 1), where d is the maximal length allowed for the gap. Clearly, these right contexts can be found efficiently using either suffix trees or suffix arrays. Now form a new corpus by treating each of these right contexts as a single text in this subcorpus. Following the idea of Yamamoto and Church (2001), the texts in this subcorpus should be concatenated, using sentinels to separate one text from the next, and also with one sentinel at the end. Assuming that the text is represented by integer id’s, then the smallest otherwise unused integers can be used for the sentinels. Assuming that a subcorpus is built up in this way, then finding right parts corresponding to each left part is mostly just a matter of running the phrase extraction program again for each subcorpus. There are, however a couple of issues to watch out for. First,pp it is important that a different integer is used for each sentinel. Otherwise the sentinels themselves, including possibly context around the sentinels, will be seen as repeated phrases. Second, there is a problem with limiting the right context to be of length d + 1. If the gap is of length d, then the right context is just long enough to include one token from the right part. Consider, for example the following subcorpus for the left part from one with d = 4: end of the Earth to $ extreme to the other foo $ shore to the other bar $.14 From this subcorpus, one would find the patterns: f rom one [end of the Earth | extreme | shore] to and f rom one [extreme|shore] to the other. It is clear that the first of these patterns has been artificially truncated. This problem is solvable, but it takes a bit of bookkeeping. The idea here is that when a subcorpus is formed, for each token in the subcorpus, a record is kept of where that token was located in the original (parent) corpus.15 With this record, the end locations of each occurrence of f rom one [end of the Earth | extreme | shore] to can be found in the parent corpus. The longest common prefix can then be found for the set of sequences starting at these end locations, and this can be used to extend the truncated right part. There is still a problem, however, since if f rom one [end of the Earth|extreme|shore] to is extended to f romone[extreme|shore]totheother, then two instances of this latter pattern will be found. So an efficient way of avoiding such duplications must be found. Interesting Phrases To be useful, a phrase extraction program must be equipped with a notion of what kinds of phrases are interesting. Citing Apostolico (2009): Irrespective of the particular model or representation chosen, the tenet of pattern discovery equates overrepresentation with surprise, and hence with interest. In linguistics, there are other ways of defining interest. For example, a phrase may be considered interesting if it exhibits some degree of non-compositional semantics, or if it exhibits some particular syntactic pattern. For an overview, see Evert (2009). Another way of measuring interest is more goal directed. One might say, for example, that a phrase is interesting if it is useful for distinguishing positive camera reviews from negative ones (Tchalakova, 2010). Or alternatively, a phrase could be considered interesting if it is helpful for distinguishing high quality online posts from low quality ones (Burek and Gerdemann, 2009). A central insight of (Yamamoto and Church, 2001) is that measures of interest are most commonly based upon basic measures of term frequency and document frequency, and that these measures need only be calculated for the saturated phrases.1112 So, for example, the term frequency and document frequency for mumbo is exactly the same as for mumbo jumbo, so this information can be stored just once at the appropriate node in a suffix tree or for an lcp-interval in a suffix array. The problem is, of course, that jumbo really ought to be included in this class as well, and neither suffix trees nor suffix arrays provide a natural way of representing such equivalence classes. A key question to answer is how the interest measure should be incorporated into the gapped phrase extraction algorithm. The simplest approach would be to extract phrases initially without regard to interest, and then use the interest measure as a filter to remove uninteresting cases. Another approach would be to incorporate the interest measure into the algorithm, perhaps by restricting candidate left parts to just the interesting cases before looking for matching right contexts. We leave this as an open question. 11 This was at least the basic intuition. In fact, the YamamotoChurch algorithm did not maximally extend phrases to the left since they did not use the Burrows and Wheeler transformation table as advocated by Abouelhoda et al. (2004). 12 Aires et al. (2008) presents a rather more complicated formula, in which the interest of a phrase is a function of both the term frequency of its subphrases and the superphrases containing the phrase as a subphrase. This is algorithmically more complex, but may be an improvement. 13 An alternative is presented in Gerdemann (2010). The tokens foo and bar are arbitrary. All sentinels are printed as $ even though different integers are used. 15 Such record keeping is required in any case if document frequencies are required for the phrases. 14 10 Alberto Apostolico. 2009. Monotony and Surprise: Pattern Discovery under Saturation Constraints. In Anne Condon, David Harel, Joost N. Kok, Arto Salomaa, and Erik Winfree, editors, Algorithmic Bioprocesses, pages 15– 29. Springer. Douglas Biber. 2009. A corpus-driven approach to formulaic language in english: Multi-word patterns in speech and writing. International Journal of Corpus Linguistics, 14(3):275–311. Gaston Burek and Dale Gerdemann. 2009. Maximal phrases based analysis for prototyping online discussion forums postings. In Proceedings of the RANLP workshop on Adaptation of Language Resources and Technology to New Domains (AdaptLRTtoND), Borovets, Bulgaria. Markus Dickinson and W. Detmar Meurers. 2005. Detecting errors in discontinuous structural annotation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL-05), Ann Arbor, MI, USA. Stefan Evert. 2009. Corpora and collocations. In A. Lüdeling and M. Kytö, editors, Corpus Linguistics: An International Handbook of the Science of Language and Society, volume 2, chapter 58, pages 1212–1248. Mouton de Gruyter, Berlin/New York. Dale Gerdemann. 2010. Suffix and prefix arrays for gappy phrase discovery. Presented at: First TübingenWorkshop on Machine Learning; Slides at: http://www.sfs.unituebingen.de/ dg/ks.pdf. Georgi Jetchev. 1997. Ghost Vowels and Syllabification: Evidence from Bulgarian and French. Ph.D. thesis, Scuole Normale Superiore di Pisa. Dong Kyue Kim, Minhwan Kim, and Heejin Park. 2008. Linearized suffix tree: an efficient index data structure with the capabilities of suffix trees and suffix arrays. Algorithmica, 52(3):350–377. Thomas K. Landauer and Susan T. Dumais. 1997. A solution to Plato’s problem: The latent semantic analysis theory of acquisition, induction, and representation of knowledge. Psychological Review, 104(2):211–240. Robert Sedgewick and Kevin Wayne. 2010 (forthcoming). Algorithms. Addison-Wesley, 4th edition. Web page: www.cs.princeton.edu/algs4/home (see in particular: www.cs.princeton.edu/algs4/51radix and www.cs.princeton.edu/courses/archive/spring10/cos226/ lectures/16-51RadixSorts-2x2.pdf). Maria Tchalakova. 2010. Automatic sentiment classification of product reviews. Master’s thesis, Universität Tübingen, Germany. Mikio Yamamoto and Kenneth W. Church. 2001. Using suffix arrays to compute term frequency and document frequency for all substrings in a corpus. Comput. Linguist., 27(1):1–30. D. Zhang and W.S. Lee. 2006. Extracting key-substringgroup features for text classification. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 474–483. ACM. Another problem also involves maximal extension. Suppose that the saturated pattern α is chosen as the left part. Since it is saturated, it cannot be extended to aα or αb without losing some of its occurrences. Now suppose that β is chosen as a corresponding right part, so that the gapped pattern is α . . . β. Now it may be that α by itself is saturated, but nevertheless in this context extensions could be made to aα . . . β or αb . . . β without losing any occurrences. Extending the pattern to αb . . . β, since it encroaches upon the length of the gap (represented by . . .). So rather than extending the left part, it is preferable to filter out cases such as α . . . β where the left part is extendable. Suppose that α can be extended to α$, where α and α$ are both saturated. Then both α and α$ will be considered as candidate left parts. So more specific instances of α . . . β may be found in any case when this pattern is not saturated. The efficiency of the algorithm is, however, an issue, since the filtering turns it partially into a generate-and-test algorithm.16 5. Conclusion Gapped phrase extraction clearly has a lot of utility, as witnessed by the number of language researchers who have investigated such phrases, using very imperfect tools. The proper tool for this purpose is an open question which has not been resolved in this paper. The hope is that, as specified in the title, this paper will serve as a challenge, both to someone interested in algorithm design and implementation or to someone who is interested in further specifying what features a gapped phrase extraction program ought to have. The benefits to eLearning will be that learner texts will be better characterized in terms of the phrases that that the learner uses, instead of simply in terms of a bag-of-words model. Learners should get feedback indicating which phrases are effective, high-quality, appropriate for a particular domain, etc. Such feedback will result in improved writing, in turn leading to better communication. And ultimately, in terms of social theories of learning, better communication will result in improved learning. 6. References Mohamed Ibrahim Abouelhoda, Stefan Kurtz, and Enno Ohlebusch. 2004. Replacing suffix trees with enhanced suffix arrays. J. of Discrete Algorithms, 2(1):53–86. José Aires, Gabriel Lopes, and Joaquim Silva. 2008. Efficient multi-word expressions extractor using suffix arrays and related structures. In Proceeding of the 2nd ACM workshop on Improving non-English web searching, pages 1–8, Napa Valley, California. Alberto Apostolico and Giorgio Satta. 2009. Discovering subword associations in strings in time linear in the output size. J. of Discrete Algorithms, 7(2):227–238. 16 Even as a partly generate-and-test algorithm, initial tests suggest that this approach may be efficient enough for practical purposes. One helpful strategy would be to recognize special cases where the tests can be avoided. For example, if the candidate left part is already supermaximal (Abouelhoda et al., 2004) by itself, then it will not be necessary to check for extensions of this left part when it combines with a right part. 11 Towards Resolving Morphological Ambiguity in Arabic Intelligent Language Tutoring Framework Khaled Shaalan1, Marwa Magdy2, Doaa Samy3 ! "#$%"&'()(*$"+,(-%'*()."(,"/012(3"45"&67"89:;!:"/012(3"+<=" > "?2@0A)."6B"C6DE0)%'*"F"G,B6'D2)(6,3"C2('6"+,(-%'*().3":"<$D%H"I%J%A"K)L3"M(N2"!>O!8"=P.E)" 8" C2('6"+,(-%'*()." Q$2A%HL*$22A2,R10(HL2@L2%3"DLD2PH.RB@(S@0L%H0L%P3"H622*2D.R@0L%H0L%P"" Abstract <D1(P0()."(*"2"D2T6'"(**0%"(,"2,."UV4"2EEA(@2)(6,")$2)"6@@0'*"J$%,"D0A)(EA%"(,)%'E'%)2)(6,*"6B")$%"*2D%"A2,P02P%"E$%,6D%,6,"2'%" E'6H0@%HL"M(-%,")$%"@6DEA%7()."6B")$%"<'21(@"D6'E$6A6P(@2A"*.*)%D3"()"(*"H(BB(@0A)")6"H%)%'D(,%"J$2)")$%"(,)%,H%H"D%2,(,P"6B")$%" J'()%'"(*L"W6'%6-%'3"G,)%AA(P%,)"V2,P02P%"#0)6'(,P"K.*)%D*"J$(@$",%%H")6"2,2A.N%"%''6,%60*"A%2',%'"2,*J%'*3"P%,%'2AA.3"(,)'6H0@%" )%@$,(X0%*3"*0@$"2*"@6,*)'2(,)*"'%A272)(6,3")$2)"J60AH"E'6H0@%"D6'%"(,)%'E'%)2)(6,*")$2,"*.*)%D*"H%*(P,%H"B6'"E'6@%**(,P"J%AASB6'D%H" (,E0)L"#$(*"E2E%'"2HH'%**%*"(**0%*"'%A2)%H")6")$%"D6'E$6A6P(@2A"H(*2D1(P02)(6,"6B"@6''%@)%H"(,)%'E'%)2)(6,*"6B"%''6,%60*"<'21(@"-%'1*" )$2)"J%'%"J'())%,"1."1%P(,,%'")6"(,)%'D%H(2)%"K%@6,H"V2,P02P%"V%2',%'*L"#$%"D6'E$6A6P(@2A"H(*2D1(P02)(6,"$2*"1%%,"H%-%A6E%H"2,H" %BB%@)(-%A."%-2A02)%H"0*(,P"'%2A")%*)"H2)2L"G)"2@$(%-%H"*2)(*B2@)6'."'%*0A)*"(,")%'D*"6B")$%"'%@2AA"'2)%L" " 1. Introduction <," G,)%AA(P%,)" V2,P02P%" #0)6'(,P" K.*)%D" YGV#KZ" (*" 2" @6DE0)%'S12*%H"%H0@2)(6,2A"*.*)%D")$2)"2AA6J*"*(D0A2)(6," 6B" 2" $0D2," )0)6'L" <," GV#K" (*" 2" -2A021A%" )66A" 0*%H" (," A2,P02P%" %SA%2',(,P" E'6P'2D*L" &%*(H%*3" ()" (*" $(P$A." H%D2,H%H"2*"2,"2EEA(@2)(6,"J()$(,")$%"U2)0'2A"V2,P02P%" 4'6@%**(,P" B(%AH" *(,@%" ()" $%AE*" E%6EA%" (," )$%" A2,P02P%" A%2',(,P"E'6@%**"%()$%'"B6'",2)(-%"6'"B6'"B6'%(P,"A2,P02P%*L" #$%*%"UV4")66A*"0*%H"(,"A2,P02P%"A%2',(,P"@2,"1%"0*%H"(," *%-%'2A" J2.*" *0@$" 2*" parsing 6B" )$%" A%2',%'" (,E0)" 2,H" diagnosis 6B" D6'E$6A6P(@2A" 2,H" *.,)2@)(@" %''6'*" YU%'16,,%3" >;;8ZL" [6J%-%'3" GV#K" B6'" %''6'" H(2P,6*(*" )6" 2,2A.N%" A%2',%'*\" (,E0)" 2,H" E'6-(H%" (,)%AA(P%,)" 2,H" '%2AS )(D%"B%%H12@Q"(*"$(P$A.",%%H%H"B6'")$%"B6AA6J(,P"'%2*6,*]"" ! GV#K" E'6-(H%" (,H(-(H02A(N%H" )0)6'(,P" )6" A%2',%'*" J$6"2'%"6B)%,"A%B)")6")$%D*%A-%*"2,H"@2,,6)"'%A." 0E6,")%2@$%'*"2,H")0)6'*")6"$%AE")$%DL"" ! ^%A(21A%" %''6'" H(2P,6*(*" *.*)%D*" J60AH" 2AA6J" 0*%'*_20)$6'*" )6" 6-%'@6D%" )$%" A(D()2)(6,*" 6B" D0A)(EA%" @$6(@%" X0%*)(6,*" 2,H" B(AAS(,S)$%S1A2,Q*" ).E%*" 6B" %7%'@(*%*L" &%*(H%*3" GV#" *.*)%D*" @2," E'6-(H%"2"*0()21A%"EA2)B6'D"B6'"(,)'6H0@(,P"D6'%" @6DD0,(@2)(-%" 2,H" (,)%'2@)(-%" )2*Q*" )6" A%2',%'*" YV\$2('%"2,H"?2A)(,3">;;8ZL" +,B6')0,2)%A.3" 2AD6*)" 2AA" UV4" )66A*" *0@$" 2*" E2'*%'*3" D6'E$6A6P(@2A"2,2A.N%'3"%)@3"2'%"H%*(P,%H")6"$2,HA%"J%AAS B6'D%H" (,E0)L" K63" )6" $2,HA%" (AASB6'D%H" (,E0)" (," GV#K3" )%@$,(X0%*" *0@$" 2*" @6,*)'2(,)" '%A272)(6," 2'%" %DEA6.%H" Y?2A)(,3" >;;8ZL" G," 2,." A2,P02P%" D6H%A3" )$%" E2')(2A" *)'0@)0'%*" @2," @6D1(,%" 6,A." (B" *6D%" @6,*)'2(,)*" 6'" @6,H()(6,*"2'%"D%)L"`$%,")$%*%"@6,*)'2(,)*"2'%"'%A27%H3"2," 2))2@$D%,)" (*" 2AA6J%H" %-%," (B" )$%" @6,*)'2(,)" (*" ,6)" *2)(*B(%HL" #$%" '%A27%H" @6,*)'2(,)" D0*)" 1%" D2'Q%H" 6," )$%" *)'0@)0'%" *0@$" )$2)" )$%" ).E%" 2,H" E6*()(6," 6B" )$%" H%)%@)%H" %''6'" @2," 1%" (,H(@2)%H" Y@6,B('D%HZ" A2)%'" 6,L" G," GV#K3" '%A27(,P" )$%" @6,*)'2(,)*" 6B" )$%" A2,P02P%" )6" 2,2A.N%" A%2',%'a*"2,*J%'"(,%-()21A."E'6H0@%"2D1(P060*"*6A0)(6,*3" (L%L3"D6'%"@6''%@)%H"(,)%'E'%)2)(6,*3")$2,"*.*)%D*"H%*(P,%H" B6'" 6,A." J%AASB6'D%H" (,E0)" Y<))(23" >;;OZL" C6,*(H%'3" B6'" 12 %72DEA%3")$%"A%2',%'"(,E0)"<'21(@"J6'H""!"#$%L"#$(*"J60AH" $2-%")J6"(,)%'E'%)2)(6,*]"!Z")$%"A%2',%'"D(P$)"D%2,""!"#%" _=(b)0_!"YA(-%HSGZ"J$(@$"(*"'%A2)%H")6"E'61A%D*"J()$"-6J%A" A%))%'*")$2)"D2Q%*")$%"*$6')"-6J%A"&'()*+"_(_"A6,P"6,%"",-"."_._3" 6'">Z"*_$%"D(P$)"D%2,"!#$%"_=2.c2b)0_"Y*0*)2(,%HSGZL"" #$(*" E2E%'" 2HH'%**%*" (**0%*" '%A2)%H" )6" )$%" D6'E$6A6P(@2A"H(*2D1(P02)(6,"6B"@6''%@)%H"(,)%'E'%)2)(6,*" 6B" %''6,%60*" <'21(@" -%'1*" J'())%," 1." 1%P(,,%'" )6" (,)%'D%H(2)%" K%@6,H" V2,P02P%" V%2',%'*" YKVV*ZL" #$%" E'6E6*%H"*.*)%D"B6AA6J*")$%"2EE'62@$"2"A2,P02P%")%2@$%'" 0*%*"(,"H(*2D1(P02)(,P"2,H"*%A%@)(,P"2"E'%B%''%H"2,2A.*(*L" G)" @6,*(H%'*" )$%" A(Q%A($66H" 6B" 2," %''6'" J$(@$" )2Q%*" (,)6" 2@@60,)")$%" A%-%A"6B" (,*)'0@)(6,"2,H" )$%"B'%X0%,@." 2,H_6'" H(BB(@0A)."6B"<'21(@"@6,@%E)*L"#$%"@6,@%',"$%'%"(*")6"2-6(H" D(*A%2H(,P" 6'" (,@6''%@)" B%%H12@QL" #$%" '%*0A)" 6B" H(*2D1(P02)(6,"2,H"*%A%@)(,P"2EE'6E'(2)%"2,2A.*(*"(*"0*%H" J()$(,"GV#K"B'2D%J6'Q")6"H%)%@)")$%"%72@)"*60'@%"6B"%''6'" 2,H"E'6-(H%")$%"%''6'"*E%@(B(@"B%%H12@QL" <$D%H" Y>;;;Z" 2HH'%**%H" )$%" E'61A%D" 6B" <'21(@" D6'E$6A6P(@2A" H(*2D1(P02)(6," )6" *%A%@)" )$%" D6*)" A(Q%A." D6'E$6A6P(@2A"2,2A.*(*"B6'"%2@$"J%AASB6'D%H"J6'H"(,")$%" )%7)L" [%" 0*%H" 2" E6J%'B0A" H.,2D(@" ,SP'2D" *)2)(*)(@2A" H(*2D1(P02)(6," )%@$,(X0%L" #$%" *)2)(*)(@2A" Q,6JA%HP%" 6B" )$%"*.*)%D"D2."1%"2A)%'%H"6'"2HT0*)%H"2,.)(D%")6"@6,*(H%'" 2,."H%*('%H")%7)"@6'E0*L"&0)3")6")$%"1%*)"6B"60'"Q,6JA%HP%" ,6"'%*%2'@$"$2*"2HH'%**%H")$%"E'61A%D"6B"H(*2D1(P02)(,P" corrected"(,)%'E'%)2)(6,*"6B"(AASB6'D%H"<'21(@"-%'1*L"" #$%"'%*)"6B")$(*"E2E%'"(*"*)'0@)0'%H"2*"B6AA6J*L"K%@)(6,">" E'%*%,)*" 2" 1'(%B" H(*@0**(6," 6B" <'21(@" D6'E$6A6P(@2A" 2D1(P0()." E'61A%DL" K%@)(6," 8" H%*@'(1%*" )$%" E'6E6*%H" *.*)%DL"K%@)(6,"9"H(*@0**%*")$%"'%*0A)*"B'6D")$%"@6,H0@)%H" %7E%'(D%,)L" ?(,2AA.3" (," K%@)(6," :3" J%" P(-%" *6D%" @6,@A0H(,P"'%D2'Q*L" "" """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" ! "&0@QJ2A)%'" )'2,*A()%'2)(6," (*" 0*%H" $%'%" )6" ^6D2,(N%" <'21(@" %72DEA%*"Y&0@QJ2A)%'">;;>ZL" " 2. Arabic Morphological Ambiguity Problem 8L <'21(@" A2,P02P%" (*" 6,%" 6B" )$%" K%D()(@" A2,P02P%*" )$2)" (*" H%B(,%H"2*"2"diacritized A2,P02P%"J$%'%")$%"E'6,0,@(2)(6," 6B" ()*" J6'H*" @2,,6)" 1%" B0AA." H%)%'D(,%H" 1." )$%('" *E%AA(,P" @$2'2@)%'*"6,A.L"/(2@'()(@*"2'%"*E%@(2A"D2'Q*"E0)"216-%"6'" 1%A6J" )$%" *E%AA(,P" @$2'2@)%'*" )6" H%)%'D(,%" )$%" @6''%@)" -6@2A(N2)(6,"2,H3")$0*3")$%"@6''%@)"E'6,0,@(2)(6,L" +,B6')0,2)%A.3" H(2@'()(@*" 2'%" '2'%A." 0*%H" (," @0''%,)" <'21(@" J'()(,P" @6,-%,)(6,*L" #$%" @6''%@)" E'6,0,@(2)(6," 2,H" (,)%'E'%)2)(6," 6B" ,6,%" 6'" E2')(2AA." H(2@'()(N%H" )%7)" H%E%,H*" 6," )$%" ,2)(-%" A2,P02P%" @6DE%)%,@%" 2,H" )$%" @6,)%7)L" /0%" )6" )$%" 6E)(6,2A" H(2@'()(N2)(6,3" )J6" 6'" D6'%" J6'H*" (," <'21(@" 2'%" $6D6P'2E$(@]" )$%." $2-%" )$%" *2D%" 6')$6P'2E$(@"B6'D3")$60P$")$%"E'6,0,@(2)(6,"2,H"D%2,(,P" (*" )6)2AA." H(BB%'%,)" Y<$D%H3" >;;;d" <))(23" >;;Od" [212*$3" >;;9ZL"#21A%"!"A(*)%H"*6D%"$6D6P'2E$(@"%72DEA%*L" " Word Lemma /0."_.=H_" Different Interpretations 1-%2"_e2=<H_" /30."_.0=(H_"Y1'(,P"12@QZ" 1-%"_=<H_" /40."_.2=0H_"Y'%)0',Z" /%5"_J2=(H_" /30."_.2=(H_"YE'6D(*%Z" /%"_=2Hc_" 6/407."_.2=0Hc_"Y@60,)Z" /%2"_e2=Hc_" 6/304."_.0=(Hc_"YE'%E2'%Z"" @$2,P%" (," E'6,0,@(2)(6," J()$60)" 2,." %7EA(@()" 6')$6P'2E$(@2A" %BB%@)" H0%" )6" A2@Q" 6B" *$6')" -6J%A*" YH(2@'()(@*ZL"<," %72DEA%" 6B" )$(*" (*" )$%" 2D1(P0()." 6B" 2@)(-%"-*L"E2**(-%"-*L"(DE%'2)(-%"-%'1"B6'D*L" 9L K6D%" E'%B(7%*" 2,H" *0BB(7%*" @2," 1%" $6D6P'2E$(@" J()$"%2@$"6)$%'L"?6'"%72DEA%3")$%"E%'B%@)"-%'1"*0BB(7" ="_#%$_"@2,"(,H(@2)%"%()$%']"!Z"B('*)"E%'*6,"*(,P0A2'3">Z" *%@6,H" E%'*6," *(,P0A2'" D2*@0A(,%3" 8Z" *%@6,H" E%'*6," *(,P0A2'" B%D(,(,%3" 6'" 9Z" )$('H" E%'*6," *(,P0A2'" B%D(,(,%L"" :L 4'%B(7%*"2,H"*0BB(7%*"@2,"2@@(H%,)2AA."E'6H0@%"2" B6'D" )$2)" (*" $6D6P'2E$(@" J()$" 2,6)$%'" B0AA" B6'D" J6'HL"?6'"%72DEA%3")$%"J6'H""/">2"@2,"1%"(,)%'E'%)%H"2*" />2"_e2*2H_"YA(6,Z"6'"</?>2""_e2S*0Hc_"YGS&A6@QZL" /(BB(@0A)(%*" (," )$%" E'6@%**" 6B" <'21(@" D6'E$6A6P(@2A" H(*2D1(P02)(6,"2'%")$%"D2(,"'%2*6,"1%$(,H"2HH'%**(,P")$%" @$2AA%,P%*"6B"H%-%A6E(,P"2"D6'E$6A6P(@2A"H(*2D1(P02)(6," D6H0A%_)66A_"%)@")$2)"@2,"$2,HA%"(AASB6'D%H"<'21(@"-%'1*L" 3. Table 1: <,"<'21(@"J6'H")$2)"(*"$6D6P'2E$(@" " [6J%-%'3" 6)$%'" B2@)6'*" @6,)'(10)%" )6" )$%" E'61A%D" 6B" D6'E$6A6P(@2A" 2D1(P0()." (,"<'21(@L"<D6,P" )$%*%" B2@)6'*" Y<))(23">;;OZ]"" !L 5')$6P'2E$(@" 2A)%'2)(6," 6E%'2)(6,*" Y*0@$" 2*" H%A%)(6,Z"B'%X0%,)A."E'6H0@%"(,BA%@)%H"B6'D*")$2)"@2," 1%A6,P")6")J6"6'"D6'%"H(BB%'%,)"A%DD2*"2*"*$6J,"(," #21A%" !L" #$%*%" 2A)%'2)(6," 6E%'2)(6,*" 2'%" H0%" )6" )$%" E$6,6A6P(@2A" @6,*)'2(,)*" 6B" @%')2(," '66)" @6,*6,2,)*L" #$%"(DE6')2,)"(''%P0A2'()."(**0%*"2'%"'%A2)%H")6"<'21(@" J%2Q" -%'1*" )$2)" (,@A0H%" 6,%" 6'" D6'%" J%2Q" A%))%'L" `%2Q" A%))%'*" @2," 1%" H%A%)%H" 6'" *01*)()0)%H" 1." 6)$%'" A%))%'*" 1%@20*%" 6B" <'21(@" E$6,6A6P(@2A" @6,*)'2(,)*" Y=ASK2H2,." 2,H" [2*$(*$" !fgfZL" ?6'" %72DEA%3" )$%" H%A%)(6," 6B" )$%" A%))%'" Y5Z" (," )2Q(,P" )$%" E'%*%,)" Y(DE%'B%@)Z" )%,*%" 6B" )$%" )'(A2)%'2A" '66)" 1S8S5" _JS=SH_3" 0*(,P" '%P0A2'" '0A%*" J60AH" P%,%'2)%" "/"%9.h" _.2SJ=(H_" 10)"2*"()"(*"2"2**(D(A2)%H"YB('*)"J%2QZ"-%'1"()"*$60AH"1%" P%,%'2)%H"2@@6'H(,P")6"*E%@(2A"J%2Q"'0A%*"2,H")$0*"()" 2EE%2'*"(,"J'())%,")%7)*"2*"/0."_.2S=(H_"YE'6D(*%ZL" >L W2,." (,BA%@)(6,2A" 6E%'2)(6,*" 0,H%'A(%" 2" *A(P$)" K6D%"<'21(@" E2))%',*" 2'%" H(BB%'%,)" 6,A." (," )$2)" 6,%"6B")$%D"$2*"2"H601A%H"*60,H"J$(@$"(*",6)"%7EA(@()" (," J'()(,P" 6B" )$%('" @6''%*E6,H(,P" B6'D*" *0@$" 2*" :" "0;" _B2=2A2_"2,H":<0;"_B2=c2A2_L" 13 The Proposed Disambiguation System #$%"E'6E6*%H"*.*)%D"(*"2,"(,)%P'2A"E2')"6B"2,"<'21(@"GV#K" B6'"KVV*L"#$%"*.*)%D"(*"@21A%"6B"2,2A.N(,P"16)$"J%AAS"2,H" (AASB6'D%H"A%2',%'"2,*J%'*L"#$%"GV#K"2,2A.N%*"%2@$"(,E0)" J6'H"2,H"E'6H0@%*"2AA"6B"()*"E6**(1A%"2,2A.*%*"YK$22A2,3" W2PH." 2,H" ?2$D.3" >;!;ZL" <B)%'J2'H*3" )$%" GV#K" *%,H*" )$%*%"2,2A.*%*")6")$%"H(*2D1(P02)(6,"*.*)%D")6"*%A%@)")$%" 2EE'6E'(2)%"2,2A.*(*L"#$%"*%A%@)%H"2,2A.*(*"(*")$%,"0*%H")6" H%)%@)")$%"%72@)"*60'@%"6B"%''6'"(,)'6H0@%H"1.")$%"A%2',%'" 2,H3"@6,*%X0%,)A.3")$%"GV#K"P%,%'2)%*"2"B0AA"H(2P,6*(*"6B" )$%"A%2',%'"(,E0)L"#$(*"(*"@A2'(B(%H"1.")$%"B6AA6J(,P"B(P0'%L" " V%2',%'"<,*J%' " " 46**(1A%"`6'H" " <,2A.*%*" " `6'H" /(*2D1(P02)(6, " W6H0A%" <,2A.N%'" " W6H0A% " " K%A%@)%H"`6'H" i0%*)(6, <,2A.*(*" " " =''6'" " /%)%@)(6," G)%D" " W6H0A% &2,Q(,P" " =''6'"#.E% " " " #0)6'(,P" " ?%%H12@Q"W%**2P% W6H0A%" " " ?(P0'%"!]"<'21(@"GV#K"?'2D%J6'Q" " #$%" B6AA6J(,P" %72DEA%" @A2'(B(%*" $6J" )$%" *.*)%D" J6'Q*L" C6,*(H%'" )$%" B6AA6J(,P" X0%*)(6," )$2)" (*" E'%*%,)%H" )6" )$%" A%2',%']" " @6,T0P2)%"H(BB%'%,)"-%'1"B6'D*L"?6'"%72DEA%")$%"E'%B(7"Y=Z" Example 1: @2," 1%" 0*%H" )6" @6,T0P2)%" )$%" E'%*%,)" )%,*%" 6B" )$%" 8'H" " E%'*6," B%D(,(,%" *(,P0A2'" Y"M " NO"F"E"NZ" 2,H" )$%" >,H" E%'*6," Complete the following sentence with the correct conjugation of the given root in imperfect tense active voice. D2*@0A(,%"*(,P0A2'"YMNOF"!K2Z" 8L ""BCD+"EF/G"Y8SAS@Z"LLLLLL "_jL"Y1S.S=Z"T2Hc2)(."<Ae2'0Nc_"kD."P'2,HD6)$%'"LLLL"Y*%AAZ" )$%"'(@%l" G,")$%"216-%"%72DEA%3")$%"'66)" 8SAS@"_1S.S=_"@6,)2(,*" D(HHA%" J%2Q" A%))%'" A" _._" *6" ()" ,%%H*" *E%@(2A" '0A%*" )6" @6,T0P2)%"()"(,"H(BB%'%,)"B6'D*L"?6'"%72DEA%")6"@6,T0P2)%"()" (,)6" (DE%'B%@)" E2**(-%" -6(@%3" )$%" D(HHA%" J%2Q" A%))%'" *$60AH"1%"*01*)()0)%H"1."+"_<_"*6"()"1%@6D%""8-"H4F"_)0S12<=_" YJ2*"*6AHZ"" <**0D%" )$%" B6AA6J(,P" )J6" 2,*J%'*d" J$%'%" Y2Z" (,@A0H%*"2"J'6,P"@6,T0P2)(6,"6B"2"Hollow"YD(HHA%"J%2QZ" -%'13"2,H"Y1Z"(*")$%"@6''%@)"2,*J%'L" 2L B" CD+"EF/"G"8-"HF" _)2S1(<=" T2Hc2)(."<Ae2'0Nc_" YW.S P'2,HD6)$%'"*%AA*")$%S'(@%ZL" 1L B" CD+"EF/"G"I"$HF" _)2S1(.=" T2Hc2)(." <Ae2'0Nc_" YW.S P'2,HD6)$%'"*%AA*")$%S'(@%ZL" #$%" GV#K" E'6H0@%*" )J6" E6**(1A%" 2,2A.*%*" B6'" )$%" %''6,%60*"J6'H"8-HF]" ! Third person singular feminine imperfect verb in the active voice with converted middle letter A /y/ to + /A/L" ! Third person singular feminine imperfect verb in the passive voice." #$%," )$%" H(*2D1(P02)(6," *.*)%D" *%A%@)*" )$%" D6*)" 2EE'6E'(2)%" 2,2A.*(*" 2@@6'H(,P" )6]" )$%" A%2',%'" A%-%A" 2,H" H(BB(@0A)."6B"<'21(@"@6,@%E)*>L"?6'"%72DEA%"(,"<'21(@3")$%" E2**(-%"-6(@%"(*"2"'2'%"@6,*)'0@)(6,"2,H"()"(*"H601)B0A")$2)"2" 1%P(,,%'"A%2',%'"6B"<'21(@"J60AH"J'()%"2"E2**(-%"-6(@%"6B" 2" -%'1" (,*)%2H" 6B" ()*" 2@)(-%" -6(@%L" #$%'%B6'%3" )$%" *.*)%D" 2H6E)*" *6D%" prioritized conditions" )6" *%A%@)" )$%" D6*)" E'%B%''%H" J6'H" 2,2A.*(*L" [%,@%3" (," )$(*" @2*%3" )$%" *.*)%D" J(AA"*%A%@)")$%"first analysisL"#$(*"2,2A.*(*"(*"A2)%'"6,"0*%H" 1." GV#K" )6" H%)%@)" )$%" %''6'" D2H%" 1." )$%" Y(,@6''%@)" @6,T0P2)(6,"6B"-%'1"(,"(DE%'B%@)")%,*%"2@)(-%"-6(@%Z"""" G," )$%" E'6E6*%H" *.*)%D3" J%" (,-%*)(P2)%H" 60'" H(*2D1(P02)(6," 2EE'62@$" 6," )$%" B6AA6J(,P" )$'%%" ).E%*" 6B" 2D1(P060*"2,2A.*(*"6B"%''6,%60*"A%2',%'"(,E0)]" !L #$%" 6')$6P'2E$(@" D2)@$" (," ,6,SH(2@(')(N%H" )%7)" 1%)J%%,"<'21(@" @6,T0P2)%H" -%'1" B6'D*" (," E2**(-%" -6(@%3" 2,H"2@)(-%"-6(@%3"(DE%'B%@)"6'"E%'B%@)")%,*%3"'%*E%@)(-%A.L" ?6'"%72DEA%3"Y"":"7J7K"_naqala/Z"(*")$%"E%'B%@)")%,*%"6B")$%"8'H" E%'*6," *(,P0A2'" D2*@0A(,%" (," 2@)(-%" -6(@%3" J$(A%" Y:""3J4K" /nuqil_Z" (*" )$%" E%'B%@)" )%,*%" B6'" )$%" 8'H" E%'*6," *(,P0A2'" D2*@0A(,%"(,"E2**(-%"-6(@%L"K2D%"E$%,6D%,6,"(*"'%E%2)%H" (,")$%"(DE%'B%@)")%,*%"Y:7JL4.m:4JL7."/yanqul|yunqal/Z" >L #$%"6')$6P'2E$(@"D2)@$"1%)J%%,"H(BB%'%,)"2BB(7%*" (," )%'D*" 6B" *E%AA(,P" @$2'2@)%'*L"#$%*%" 2BB(7%*" 2'%" 0*%H" )6" #$%" 6')$6P'2E$(@" D2)@$" 1%)J%%," <'21(@" -%'1" H%'(-2)(6," E2))%',*" 2,H" ,6,SH%'(-2)(-%" E2))%',*L" ?6'" %72DEA%3" )$%" -%'1" /" 0">" _*2=2H2_" Y)6" 1%" $2EE.Z" (*" 2" '66)3" ,6,SH%'(-2)(-%" -%'1L" <" E6**(1A%" H%'(-2)(-%" E2))%'," (*" "/0">2_<*=2H2_Y)6" D2Q%" $2EE.ZL"#$%" (DE%'B%@)" @6,T0P2)(6," B6'" )$%" B('*)" E%'*6," 6B" )$%" B('*)" -%'1" (*" Y"/0">2" _AsEada_Z3" J$(@$" (*" (H%,)(@2A" )6" )$%" @6,T0P2)(6," 6B" )$%" 8'H" E%'*6," *(,P0A2'" (," )$%" E%'B%@)" )%,*%" 6B" )$%" *%@6,H" -%'1" Y"/" 0">2"9"N" _AsEada_ZL" #$%'%"2'%"*6D%"6)$%'").E%*"6B"2D1(P0()(%*8")$2)"2'%"60)" 6B" )$%" *@6E%" 6B" )$%" @0''%,)" *.*)%D" 2*" )$%" *.*)%D" $2*" ,6" H('%@)"Q,6JA%HP%"6B"J$2)")$%"*)0H%,)"D%2,)")6"%7E'%**L"G," *6D%" *.*)%D*3" J$%'%" )$%" *.*)%D" $2*" (,*0BB(@(%,)" Q,6JA%HP%")6"E'6@%%H"J()$3"2"H(2A6P0%"(*"%*)21A(*$%H"J()$" )$%" A%2',%'" (," 6'H%'" )6" P0(H%" )$%" *%A%@)(6," 6B" 2EE'6E'(2)%" %7E'%**(6,3" %LPL" Y[*(%$" %)" 2AL3" >;;>ZL" ?(P0'%" >" E'%*%,)*" $6J" )$%" *.*)%D" H(*2D1(P02)%*" D0A)(EA%" 2,2A.*%*" 2,H" )$%" '%*)"6B")$(*"*%@)(6,"%7EA2(,*"(,"D6'%"H%)2(A*L" " " " 4'(6'()(N%H" " C6,H()(6,*" " " " " " <BB(7" " C6AA%@)(6," " " " 5' " " " 42))%'," " K%A%@)%H"`6'H" C6AA%@)(6," " <,2A.*(*" " W0A)(EA%"`6'H" " <,2A.*%*" " U6"<@)(6," " " " ?(P0'%">]"/(*2D1(P02)(6,"K.*)%D"K)'0@)0'%" " G," @2*%" 6B" )$%" first ambiguity type3" )$%" *.*)%D" *%A%@)*" )$%" J6'H" 2,2A.*(*" 2" *)0H%,)" D6*)" A(Q%A." (,)%,H%HL" G)" (DEA%D%,)*"two E'(6'()(N%H"@6,H()(6,*")6"*%A%@)*")$%"D6*)" E'%B%''%H"J6'H"2,2A.*(*]" !L GB" )$%" X0%*)(6," P62A" (*" )6" )%*)"passive voice )$%," "" " )$%" *.*)%D" *%A%@)*" passive voice 2,2A.*(*d" 6)$%'J(*%3"()"*%A%@)*")$%"active voice 2,2A.*(*3"6'" """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" 8 "=72DEA%" 6B" )$%*%" ).E%*" (*" J$%," )$%" noun $2*" )$%" *2D%" 6')$6P'2E$(@"B6'D"2*"verb" """"""""""""""""""""""""""""""""""""""""""""""""""""""""""" > "#$(*"'0A%"(*"2EEA(%H"1."<'21(@"A2,P02P%")%2@$%'"Y[%(B)3"!ffgZL" 14 >L 2L GB" )$%" X0%*)(6," P62A" (*" )6" )%*)" imperative" tense )$%," )$%" *.*)%D" *%A%@)*" )$%" imperative tense ,2Xc2A0J<"n(A2o"12.6)"T2H(.H"_"YD.SP'2,HB2)$%'" 2,2A.*(*d" 6)$%'J(*%3" ()" *%A%@)*" )$%" perfect or 2,H"D.SP'2,HD6)$%'"D6-%H")6"2",%J"$60*%ZL" imperfect tense"2,2A.*(*L" &.")$(*"J2.3"(,"=72DEA%"!3")$%"*.*)%D"2EEA(%*")$%"B('*)" @6,H()(6,")6"*%A%@)")$%"B('*)"2,2A.*(*"YThird person singular feminine imperfect verb in the active voiceZL" U6)(@%3" $6J%-%'3")$2)")$%"X0%*)(6,"61T%@)(-%"(*")6")%*)"@6,T0P2)(6," 6B"(DE%'B%@)"2@)(-%"-6(@%"-%'1L"" G,"@2*%"6B")$%"second ambiguity type"Y(L%L"6')$6P'2E$(@" D2)@$" 1%)J%%," H(BB%'%,)" 2BB(7%*Z3" )$%" *.*)%D" @6AA%@)*" 2AA" 2BB(7%*"J()$")$%"*2D%"6')$6P'2E$(@"B6'D"10)"J$(@$"H(BB%'*" (," )$%('" D6'E$6S*.,)2@)(@" B%2)0'%*" (," 6,%" %,)'." J()$" 2" P%,%'(@"B%2)0'%"*)'0@)0'%L" ?6'" %72DEA%3" @6,*(H%'" )$%" B6AA6J(,P" A%2',%'" (,E0)d" J$%'%"Y1Z"(*")$%"@6''%@)"2,*J%']" 2L :""PQ"R""S.'G"E"";"!""TC9F"/""SUV" _D0[2Dc2H" )2J2'c2#)" B(." T2'(.D2E" X2)6A_" YW6$2D%H" J2*S(,-6A-%H" (," D0'H%'"@'(D%ZL" 1L /" "./G"!"$["E"*\"+9"]JK"EF/"G"5"A/"G" _" T2Hc(." J2T2Hc2E(." ":"PQ"RS.'G"E;"WC9F"/SUV"_D0[2Dc2H")2SJ2'c2#2"B(." T2'(.D2E" X2)6A_" YW6$2D%H" J2*S(,-6A-%H" (," D0'H%'"@'(D%ZL" #$%"A%2',%'"$%'%"$2*"D2H%"2"*01T%@)S-%'1"H(*2P'%%D%,)" 1%)J%%," )$%" *01T%@)" W6$2D%H/""SUV" 2,H" )$%" -%'1" J2*S (,-6A-%H" ! " "TC9FL" ?60'" E6**(1A%" 2,2A.*%*" 6B" )$%" %''6,%60*" -%'1"2'%"E'6H0@%H]"" ! First person singular perfect verb in the active voice. ! Second person singular masculine perfect verb in the active voice. ! Second person singular feminine perfect verb in the active voice. ! Third person singular feminine perfect verb in the active voice #$%*%" B60'" E6**(1A%" 2,2A.*%*" 2'%" @6D1(,%H" (,)6" )$%" P%,%'(@"2,2A.*(*]"" ! Singular perfect verb in the active voice. G," @2*%" 6B" )$%" third ambiguity type" Y(L%L" 6')$6P'2E$(@" D2)@$"1%)J%%," H(BB%'%,)" E2))%',*Z3")$%" *.*)%D" @6AA%@)*" 2AA" )$%*%"E2))%',*"(,"6,%"%,)'."J()$"2"P%,%'(@"B%2)0'%"*)'0@)0'%L" " ?6'" %72DEA%3" @6,*(H%'" )$%" B6AA6J(,P" X0%*)(6," )$2)" (*" E'%*%,)%H")6")$%"A%2',%']" " Example 2: " Complete the following sentence with the correct conjugation of the given root in perfect tense active voice. 1L /" " ./G"!""$["E""*\"^JP""K+"EF/""G5"A/""G" _T2Hc(." J2T2Hc2E(." p(,6)2X2A<" n(A2o" 12.6)" T2H(.H_" YD.SP'2,HB2)$%'" 2,H"D.SP'2,HD6)$%'"D6-%H")6"2",%J"$60*%ZL" #$%" A%2',%'" $%'%" $2*" D2H%" )J6" %''6'*]" !Z" *01T%@)S-%'1" H(*2P'%%D%,)" 1%)J%%," )$%" *01T%@)" qD.SP'2,HD6)$%'" 2,H" D.SP'2,HB2)$%'"EF/G5"A/Gq"2,H")$%"-%'1"q"+9"]JKq3")$%"*01T%@)" (*" H02A" J$(A%" )$%" -%'1" (*" @6,T0P2)%H" (," )$%" D2*@0A(,%" EA0'2A" B6'D" 2,H3" >Z" (,@6''%@)" 0*%" 6B" )$%" '66)" E2))%'," 6B" 2" E%'B%@)" -%'1" B6'Dd" )$%" @6''%@)" E2))%'," (*" \:""0P;+\" J$(A%" )$%" A%2',%'"0*%H")$%"E2))%',"\":"0;\L"[6J%-%'3")$%"GV#K"E'6H0@%H" )J6"E6**(1A%"2,2A.*%*"2*"*$6J,"(,")$%"B6AA6J(,P]" ! Third person masculine plural perfect verb in the active voice following the pattern ':0;'. ! Third person masculine plural perfect verb in the active voice following the pattern ':<0;'. #$%*%")J6"E6**(1A%"2,2A.*%*"2'%"@6D1(,%H"(,)6"P%,%'(@" B%2)0'%"*)'0@)0'%]"" ! Third person masculine plural perfect verb in the active voice." 4. Experiment `%" @6,H0@)%H" 2," %7E%'(D%,)" )$2)" D%2*0'%*" $6J" *0@@%**B0AA." )$%" E'6E6*%H" D6H%A" *%A%@)*" )$%" D6*)" 2EE'6E'(2)%" 2,2A.*(*" )$2)" (*" 0*%H" A2)%'" 6," )6" H%)%@)" )$%" %72@)" *60'@%" 6B" %''6'" )$%" A%2',%'" $2*" D2H%L" #$%" quantitative" D%2*0'%*" 2'%" 0*%HL" #$%*%" D%2*0'%*" '%A." 6," @6AA%@)(,P" H(BB%'%,)" )%*)" *%)*" J'())%," 1." '%2A" KVV*" (," 2" ).E(@2A" )%2@$(,P_A%2',(,P" %,-('6,D%,)L" G)" J2*" ,%@%**2'." )$2)")$%*%"A%2',%'*"$2-%"H(BB%'%,)"12@QP'60,H*"Y(L%L3"H(BB%'" (," )$%('" B('*)" A2,P02P%Z" )6" )%*)" (B" )$%" *.*)%D" (*" P%,%'2A" %,60P$" 2,H" ,6)" 2(D%H" )6" 2" *E%@(B(@" *6')" 6B" A%2',%'*L"#$%" )%*)" *%)" (*" )$%," B%H" (,)6" )$%" *.*)%D" 2,H" )$%" *6A-%H" 2D1(P060*" @2*%*" 2,H" 0,*6A-%H" 2'%" '%E6')%HL" #$%" '%@2AA" '2)%" (*" @2A@0A2)%HL" #$(*" D%2*0'%" $2*" 1%%," 0*%H" (," %-2A02)(,P" *(D(A2'" '%*%2'@$" Y@BL" `2P,%'" %)" 2AL3" >;;rd" KTs1%'P$"2,H"t,0)**6,">;;:d"?2A)(,">;;8ZL" #$%"216-%D%,)(6,%H"D%)$6H6A6P."(*"2EEA(%H"6,"2"'%2A" )%*)" *%)" )$2)" @6,*(*)*" 6B" !!O" '%2A" <'21(@" *%,)%,@%*L" #$%" ,0D1%'"6B"J6'H*"E%'"*%,)%,@%"-2'(%*"B'6D"8")6"!:"J6'H*3" J()$"2,"2-%'2P%"6B":L!"J6'H*"E%'")%*)"*%,)%,@%L"#$%")6)2A" ,0D1%'"6B"J6'H*"(,"2AA")%*)"*%,)%,@%*"2'%":gr"J6'H*3"!!g" 6B")$%D"$2-%"A%7(@2A"-%'1"%''6'*L"r>"-%'1*"2'%"2D1(P060*" @2*%*L" #$%" *.*)%D" *0@@%**B0AA."*6A-%H" 9O" @2*%*" 6B" )$%D" J$(A%"()"B2(A%H")6"*%A%@)")$%"@6''%@)"2,2A.*(*"B6'">O"@2*%*L" #$%",%7)"*%@)(6,"J(AA"H(*@0**"2AA"B2(A%H"@2*%*L" 4.1 Evaluation Problems Classification ""/./G"!$["E*\"YZSYSXZ"LLLL"EF/G5"A/G "_T2Hc(."J2T2Hc2E(."jL"Y,SXSAZ"n(A2o"12.6)"T2H(.H_"YD." P'2,HB2)$%'"2,H"D."P'2,HD6)$%'"LLLL")6"2",%J"$60*%Z" <**0D%")$%"B6AA6J(,P"A%2',%'"(,E0)d"J$%'%"(,E0)"Y1Z"(*")$%" @6''%@)"2,*J%']" G," )$(*" *%@)(6,3" J%" H(*@0**" 2AA" E'61A%D*" J$(@$" )$%" E'6E6*%H"*.*)%D"B2(A%H")6"*%A%@)")$%"@6''%@)"2,2A.*(*L"#$%" D2T6'" E'61A%D" (*" ()" (*" H(BB(@0A)" )6" H%)%'D(,%" J$2)" )$%" (,)%,H%H"D%2,(,P"6B")$%"A%2',%'"P(-%,")$%"@6DEA%7()."6B" <'21(@"A2,P02P%L"" #$%">O"B2(A%H"@2*%*"2'%"@A2**(B(%H"2*"B6AA6J*]"" ! Orthographic match between un-vocalized formsL" <'21(@" GV#K" $2,HA%*" 0,S-6@2A(N%H" '2)$%'" )$2," -6@2A(N%H" 15 J'())%," <'21(@" )%7)L" #$(*" A%2H*" *6D%)(D%*" )6" D6'%" )$2," T0J1_"YGS%7EA6'%ZL"#$%")6)2A",0D1%'"6B"6@@0''%,@%*"6B")$(*" 6,%"E6**(1A%"D2)@$"1%)J%%,")$%"*2D%"2,H"H(BB%'%,)"J6'H" E'61A%D"(*"r"@2*%*""""""" @2)%P6'(%*L" #$%" )6)2A" ,0D1%'" 6B" 6@@0''%,@%*" 6B" )$(*" @2)%P6'."(*"g"@2*%*L"#$%."2'%"@A2**(B(%H"2*"B6AA6J*]" o Orthographic matches produced for Arabic verbs after relaxing the short vowel to the long one." ?6'" Orthographic/homographs match between verb (,*)2,@%3"@6,*(H%'")$%"%''6,%60*"J6'H""!"#$%L"G)"(*",6)"@A%2'" and noun formsL"#$(*"@2*%"$2EE%,*"J$%,"2,"<'21(@"-%'1" J$%)$%'")$%"A%2',%'"D%2,)")$%"J6'H")6"1%]"!Z""!"#%"_=(bS)0_" $2*" )$%" *2D%" 6')$6P'2E$(@" B6'D" 2*" 2" ,60,L" ?6'" %72DEA%3" YGSA(-%HZ"1."D2Q(,P")$%"*$6')"-6J%A"2"A6,P"6,%"6'3">Z"!#$%" @6,*(H%'" )$%" J6'H" Z5-""LFd" ()" @2," A%2H" )6" )$'%%" E6**(1A%" _=2.c2bS)0_" YGS*0*)2(,%HZ" J()$" 0*(,P" )$%" E2))%'," :""<0;" @6''%@)"J6'H*L"G)"(*",6)"@A%2'"J$%)$%'")$%"A%2',%'"D%2,)")$%" _B2=c2A_L"#$%")6)2A",0D1%'"6B"6@@0''%,@%*"6B")$(*"E'61A%D" J6'H" )6" 1%]" !Z" )$%" ,60," Z " 5-"LF" _)2,<J0A_" YH%2A(,P" J()$_" (*">"@2*%*L"" %2)(,PZ3" >Z" )$%" E%'B%@)" -%'1" Z " 5-"LF" _)2,<J2A2_" Y$%_()SH%2A)" o o Orthographic matches produced after J()$_"2)%Z3"6'"8Z")$%"(DE%'B%@)"-%'1""Z5-"LF"_)0S,<J(A_"Y$2,H" allowing incompatible usage of connected pronouns." ?6'" 6-%'_" H%A(-%'ZL" #$%" )6)2A" ,0D1%'" 6B" 6@@0''%,@%*" 6B" )$(*" (,*)2,@%3"@6,*(H%'")$%"%''6,%60*"J6'H""!"]S%2L"G)"(*",6)"@A%2'" E'61A%D"(*"r"@2*%*L" J$%)$%'" )$%" A%2',%'" D%2,)" )$%" J6'H" )6" 1%]" !Z" )$%" E%'B%@)" The special case of the orthographic match -%'1" ! " "]S%2" _e2=6D2AS)0_" YGS%DEA6.%HZ" 6'3" >Z" )$%" E%'B%@)" between the Arabic third person singular perfect verb -%'1" !""]S%" _=2D(A)0_" YGSJ6'Q%HZ" 1." 0*(,P" (,@6DE2)(1A%" following the pattern ":"0;2 />afoEal/ and the first person E'6,60,*" 23" =" Y<A%B3" #%$ZL" #$%" )6)2A" ,0D1%'" 6B" singular imperfect verb as the word IQ52L"G)"@2,"A%2H")6")J6" 6@@0''%,@%*"6B")$(*"E'61A%D"(*"6,%"@2*%L"""" U6)(@%3"$6J%-%'3")$2)"J%"2*Q%H"$0D2,"A(,P0(*)*"2160)" B2(A%H"@2*%*"2,H"$%"$2*"(H%,)(B(%H"D6*)"6B")$%*%*"@2*%*"2*" 2D1(P060*L"" o E6**(1A%"(,)%'E'%)2)(6,*L"G)"(*",6)"@A%2'"J$%)$%'")$%"A%2',%'" D%2,)")$%"J6'H")6"1%]"!Z")$%"E%'B%@)"-%'1""I"Q52"_e2J6X2=2_" Y$%_()S(,BA(@)%HZ3"6'">Z"(DE%'B%@)"-%'1""I"Q52"_e0SJ2Xc(=_"YGS *(P,ZL""#$%")6)2A",0D1%'"6B"6@@0''%,@%*"6B")$(*"E'61A%D"(*" 6,%"@2*%L"" "" ! Additional- orthographic matches as a result of relaxing a constraint."<EEA.(,P")$%"@6,*)'2(,)*"'%A272)(6," )%@$,(X0%"(,"6'H%'")6"1%"21A%")6"2,2A.N%"%''6,%60*"A%2',%'" 2,*J%'*" *6D%)(D%*" (,)'6H0@%*" %7)'2" 6')$6P'2E$(@" D2)@$%*L"#$%")6)2A",0D1%'"6B"6@@0''%,@%*"6B")$(*"@2)%P6'." (*"!g"@2*%*L"#$%."2'%"@A2**(B(%H"2*"B6AA6J*]" o Orthographic matches produced for Arabic verbs after relaxing the long vowel to the short one." ?6'" (,*)2,@%3"@6,*(H%'")$%"%''6,%60*"J6'H""'"_NL"G)"(*",6)"@A%2'" J$%)$%'")$%"A%2',%'"D%2,)")$%"J6'H")6"1%]"!Z""'G-"N"_$<T2'2_" Y$%_*$%_()S%D(P'2)%HZ" 1." D2Q(,P" )$%" A6,P" -6J%A" 2" *$6')" 6,%3">Z""'"<_N"_$2Tc2'2_"Y$%_()SH%E6')%HZ"1."0*(,P")$%"E2))%'," :<0;"_B2=c2A_3"8Z'_N""_$2T2'2_"Y$%_()SA%B)Z"1."0*(,P")$%"E2))%'," ":"0;"_B2=2A_3"6'"9Z""'"_N"_$2T6'_"Y212,H6,(,PZ"1."0*(,P",60,*" (,*)%2H"6B"-%'1*L"#$%")6)2A",0D1%'"6B"6@@0''%,@%*"6B")$(*" E'61A%D"(*"g"@2*%*"""""" o Orthographic matches" produced after 5. Conclusion #$%"2D1(P0()."E'61A%D"(*"2"*)2,H2'H"E'61A%D"(,"2,."UV4" 2EEA(@2)(6,L"G)"(*")$%"D2T6'"'%2*6,"J$."@6DE0)%'*"H6",6)" .%)"0,H%'*)2,H",2)0'2A"A2,P02P%L"[6J%-%'3")$%"2D1(P0()." E'61A%D" E'%*%,)*" 2" @$2AA%,P%" )6" GV#KL" #$2)" (*" 1%@20*%" *%A%@)(,P")$%"J'6,P"2,2A.*(*"6B"*)0H%,)"(,E0)"@2,"A%2H")6" D(*A%2H(,P" B%%H12@Q" 6'" 2," %''6'" D(P$)" 1%" 6-%'A66Q%HL" &%*(H%")$2)"P(-%,")$%"@6DEA%7()."6B"<'21(@"A2,P02P%3")$(*" D2Q%*")$%"2D1(P0()."2"*%'(60*"E'61A%D"2,H",%%H*")6"1%" '%*6A-%HL" #$%" E'%B%''%H" D%)$6H" (," GV#K" B6'" H(*2D1(P02)(,P" D0A)(EA%" '%2H(,P*" 6B" 2" J'6,P" 2,*J%'" *$60AH" @6,*(H%'" )$%" A(Q%A($66H" 6B" 2," %''6'" 2,H" )$%" H(BB(@0A)." 6B" @6,@%E)*L" &0)" J()$" )$%" A2@Q" 6B" %''6,%60*" @6'E0*3" J%" H%E%,H" 6," *6D%" A(,P0(*)(@" *)0H(%*" )$2)" (,-%*)(P2)%" )$%" A(Q%A($66H" 6B" %''6'*L" [6J%-%'3" )$%" 2D1(P0()."E'61A%D"@2,,6)"1%"'%*6A-%H")6)2AA."2,H")$%'%"(*" 2",%%H")6"(**0%"2"H(2A6P0%"J()$")$%"A%2',%'")6"Q,6J"J$2)" %72@)A."$%"D%2,*L"W6'%6-%'3"(B"2"A2'P%")2PP%H"%''6,%60*" @6'E0*"%7(*)")$%,")$%"2D1(P0()."E'61A%D"@2,"1%"'%*6A-%H" 1."@6,*(H%'(,P")$%"A(Q%A($66H"6B"%''6'*"""" 6. References <$D%H3" WL" <L" >;;;L" " <" V2'P%SK@2A%" C6DE0)2)(6,2A" 4'6@%**6'"6B")$%"<'21(@"W6'E$6A6P.3"2,H"<EEA(@2)(6,*L" W2*)%'")$%*(*3"C2('6"+,(-%'*().3"=P.E)L" allowing incorrect conjugation of a verb." ?6'" (,*)2,@%3" <))(23" WL" <L" >;;OL" <," <D1(P0().SC6,)'6AA%H" @6,*(H%'")$%"%''6,%60*"J6'H""@9"G2L"G)"(*",6)"@A%2'"J$%)$%'" W6'E$6A6P(@2A"<,2A.N%'"B6'"W6H%',"K)2,H2'H"<'21(@" )$%" A%2',%'" D%2,)" )$%" J6'H" )6" 1%]" !Z" )$%" (DE%'B%@)" -%'1" W6H%A(,P"?(,()%"K)2)%"U%)J6'Q*L"G,"4'6@%%H(,P*"6B")$%" "M"$G2"_e0ST(.1_"YGS2,*J%'Z3">Z"6'"(DE%'B%@)"-%'1""@9"G2"_e2S C$2AA%,P%" 6B" <'21(@" B6'" UV4_W#" C6,B%'%,@%3" >;;OL" #$%"&'()(*$"C6DE0)%'"K6@(%).3"V6,H6,L" 16 &0@QJ2A)%'3" #L" >;;>L" &0@QJ2A)%'" <'21(@" W6'E$6A6P(@2A" <,2A.N%'" u%'*(6," !L;L" V(,P0(*)(@" /2)2" C6,*6')(0D3" +,(-%'*()." 6B" 4%,,*.A-2,(23" V/C" C2)2A6P" U6L]" V/C>;;>V9f3"GK&U"!S:g:O8S>:rS;L" =ASK2H2,.3" #L" <L" 2,H" [2*$(*$3" WL" <L" !fgfL" <," <'21(@" W6'E$6A6P(@2A"K.*)%DL"G,"G&W"K.*)%D*"v60',2A3">gY9Z]" O;;S"O!>L" ?2A)(,3" <L" uL" >;;8L" K.,)2@)(@" =''6'" /(2P,6*(*" (," )$%" C6,)%7)" 6B" C6DE0)%'" <**(*)%H" V2,P02P%" V%2',(,PL" 4$/")$%*(*3"+,(-%'*()."6B"M%,%-23"KJ()N%'A2,HL" [212*$3" UL" >;;9L" V2'P%" K@2A%" V%7%D%" &2*%H" <'21(@" W6'E$6A6P(@2A" M%,%'2)(6,L" G," 4'6@%%H(,P*" 6B" #'2()%D%,)" <0)6D2)(X0%" H0" V2,P2P%" U2)0'%A" Y#<VUS >;;9ZL"?%N3"W6'6@@6L" [%(B)3" #L" !ffgL" /%*(P,%H" G,)%AA(P%,@%]" <" V2,P02P%" #%2@$%'"W6H%AL"4$L/L"#$%*(*3"K(D6,"?'2*%'"+,(-%'*().3" C2,2H2L" [*(%$3"CLSCL3"#*2(3"#LS[L3"`(1A%3"/L"2,H"[*03"`LSVL">;;>L" =7EA6()(,P" t,6JA%HP%" ^%E'%*%,)2)(6," (," 2," G,)%AA(P%,)" #0)6'(,P" K.*)%D" B6'" =,PA(*$" V%7(@2A" =''6'*L" G," 4'6@%%H(,P*" 6B" )$%" G,)%',2)(6,2A" C6,B%'%,@%" 6," C6DE0)%'*" (," =H0@2)(6," GCC=" >;;>3" <0@QA2,H3" U%J" I%2A2,H3"EE]"!!:S!!OL" V\$2('%3"KL"2,H"?2A)(,3"<L"uL">;;8L"=''6'"/(2P,6*(*"(,")$%" ?'%%#%7)"4'6T%@)L"G,"C2A(@6"v60',2A3">;"Y8Z]"9g!S9f:L" U%'16,,%3" vL" >;;8L" U2)0'2A" V2,P02P%" 4'6@%**(,P" (," C6DE0)%'S<**(*)%H" V2,P02P%" V%2',(,PL" G," ^0*A2," W()Q6-3" %H()6'*3" )$%" 57B6'H" [2,H166Q" 6B" C6DE0)2)(6,2A"V(,P0(*)(@*L"57B6'H3"EE]"Or;SOfgL" K$22A2,3" tL3" W2PH.3" WL3" 2,H" ?2$D.3" <L" >;!;L" W6'E$6A6P(@2A"<,2A.*(*" 6B" GAASB6'D%H"<'21(@" u%'1*" (," G,)%AA(P%,)" V2,P02P%" #0)6'(,P" ?'2D%J6'QL" G," 4'6@%%H(,P*" 6B" ?V<G^KS>83" /2.)6,2" &%2@$3" ?A6'(H23" +K<L"#6"2EE%2'L" KTw61%'P$3" vL3" 2,H" t,0)**6,3" 5L" >;;:L" ?2Q(,P" =''6'*" )6" <-6(H" W2Q(,P" =''6'*]" W2@$(,%" V%2',(,P" B6'" =''6'" /%)%@)(6,"(,"`'()(,PL"G,"4'6@%%H(,P*"6B"^<UV4">;;:3" &6'6-%)*3"&0AP2'(23"EE]":;OS:!>L" `2P,%'3" vL3" ?6*)%'3" vL3" 2,H" M%,21()$3" vL" uL" >;;rL" <" C6DE2'2)(-%" =-2A02)(6," 6B" /%%E" 2,H" K$2AA6J" <EE'62@$%*" )6" )$%" <0)6D2)(@" /%)%@)(6," 6B" C6DD6," M'2DD2)(@2A"=''6'*L"G,"4'6@%%H(,P*"6B"=WUV4SC6UVV" >;;r3"4'2P0%3"CN%@Q"^%E01A(@3"EE]"!!>S!>!L" "" " " " " " 17 !"#$%"$&'(&)*%+,&)'"#-'./)%"0'1*22%#/,"3/*#'/#'"'4&"561&+&-' 7%03/2*-"0'86!&"+#/#$'8#9/+*#2:';))%&)'3*'<&'=--+&))&-'' 80&#"'=#3/#*+*'>/??%3*@A'10"%-/"'BC'D/"#,E/#/'FA'@A''4"#/&0&'1"G%"#*'H'I"<+/&0&'I/"#5+&-"'JA'@'A' >"*0*'(*))/#/@'' !" #$%&%'%(")&"*+&,-.,","/,+-(0(1&,"),002"3(1-&.&(-,4"3(-$&10&("52.&(-20,"),00,"6&+,7+8,4"9&2"5(:,-%2-24";<"=">>!<!4"" 6(:2?"@"A-&B,7$&%C"D27&$"E4"6',"),"02"F&G,7%C4"@"H"IJ;@<"*%?K,-&$"3LKLM"N"K#O#F#FL4"A-&B,7$&%P")&"D,7'1&24"D&2..2" Q(702++8&4"!"="><!@J4"D,7'1&24"#%20R?"J"A-&B,7$&%P")&"6(:2"S*2T&,-.2U4"K&T27%&:,-%(")&"#-V(7:2%&+24"D&+%(7&20" 3(:T'%&-1"F2G(72%(7R4"9&2"*2027&2"!!J4">>!IE"6(:2?"W"K&T27%&:,-%(")&"*+&,-.,"),00XL)'+2.&(-,","),002"O(7:2.&(-,4" A-&B,7$&%P")&"Q2+,72%24"D?0,"Y,7%,00&4"!4"3?)2"9200,G(-24"<@!>>"Q2+,72%2"="#%20R4"""" " LH:2&0Z",0,-2?T&..'%([&$%+?+-7?&%4"+8&2)'!W[%&$+20&?&%4",0?,&-2)[1:2&0?+(:4"8R'12[,:2&0?&%4"T2(0(?7($$&-&[&$%+?+-7?&%" =<)3+",3'' /8&$"T2T,7" ,\2:&-,$" $(:,"(V"%8," :2](7"T7(G0,:$" 0&-^,)" %("%8," %2$^"(V"),$&1-&-1"2TT7(T7&2%," :'0%&0&-1'20",H0,27-&-1" ,-B&7(-:,-%$" V(7" ),2V" 0,27-,7$" _KF`?" K'," %(" %8,&7" 8,27&-1" )&$2G&0&%R" :($%" KF" ,\T,7&,-+," )72:2%&+" )&VV&+'0%&,$" &-" 2+a'&7&-1" 2TT7(T7&2%," 0&%,72+R" $^&00$?" LH0,27-&-1" %((0$" +('0)" &-" T7&-+&T0," G," B,7R" '$,V'0" V(7" V2+&0&%2%&-1" 2++,$$" %(" b,GHG2$,)" ^-(b0,)1," 2-)" T7(:(%&-1" 0&%,72+R" ),B,0(T:,-%" &-" KF?" c(b,B,74" ),$&1-&-1" 2TT7(T7&2%," ,H0,27-&-1" ,-B&7(-:,-%$"V(7"KF" &$"2" +(:T0,\" %2$^" ,$T,+&200R"G,+2'$," (V"%8,")&VV,7,-%" 0&-1'&$%&+"G2+^17('-)"2-)",\T,7&,-+,"KF" :2R"82B,4"2-)"(V"%8,":'0%&:()20"02-1'21,"7,$('7+,$"%82%"-,,)"%("G,"T7(B&),)"2-)"&-%,172%,)"_,?1?"02-1'21,"T7()'+,)"&-" %8,"B&$'20H1,$%'720"(7"$&1-,)":()20&%R4"&-"b7&%%,-"%,\%$4"+0($,)"+2T%&(-&-1"V(7"B(+20"02-1'21,"&-V(7:2%&(-`?"/8,""T'7T($," (V"%8&$"T2T,7"&$"%b(V(0)Z"_!`"),$+7&G," 2-)")&$+'$$" &$$',$"b,"G,0&,B,"-,,)" %("G," 2))7,$$,)4"V(+'$&-1"(-"%8," 0&:&%2%&(-$" %82%" 2TT,27" %(" +8272+%,7&.," $,B,720" ,H0,27-&-1" T02%V(7:$" %82%" 82B," G,,-" T7(T($,)" V(7" KFd" _@`" T7,$,-%" 2-)" )&$+'$$" (-1(&-1"7,$,27+8"2&:,)"2%"(B,7+(:&-1"%8,$,"0&:&%2%&(-$?" V2+,H%(HV2+," 02-1'21," (V" %8," #%20&2-" ),2V" +(::'-&%R" _F#*HF!`d"_@`"%8($,"b8("T7,V,7"%("'$,"$T(^,-"2-)"b7&%%,-" #%20&2-" _#%20&2-HF!`?" #%" &$" &:T(7%2-%" %(" $%7,$$" %82%4" (-" %8," b8(0,4" !"#$%&'"()*%"+%,-" ,\T,7&,-+," $,B,7," )&VV&+'0%&,$" &-" 2+8&,B&-1" 2TT7(T7&2%," 0&%,72+R" 0,B,0$" =" %8('18" (V" +('7$," h,\+,T%&(-20" 0,27-,7$X" b8(" (B,7+(:," %8,$," )&VV&+'0%&,$"+2-"G,"V('-)"b&%8&-",2+8"17('T?"" " i&%8" 7,$T,+%" %(" $&1-,7$4" %8," V(00(b&-1" :'$%" G," -(%,)?"*&-+,"%8,":(),7-"$%')R"(V"$&1-,)"02-1'21,$"_*F`" G,12-" b&%8" *%(^(,X$" _!I<>`" T&(-,,7&-1" b(7^" (-" g:,7&+2-"*&1-"02-1'21,"_g*F`4"b(70)Hb&),"7,$,27+8"82$" 0,)" %(" ),$+7&G,4" 2-)" %(" 7,+(1-&.," 2$" V'00HV0,)1,)" 8':2-" -2%'720" 02-1'21,$4" 2" B,7R" 0271," -':G,7" (V" -2%&(-20" *F4" &-+0')&-1" F#*" 2-)" 200" %8," (%8,7" :2](7" L'7(T,2-" $&1-,)" 02-1'21,$?"/8,"'$,"(V"*F" V(7" &-$%7'+%&(-20"T'7T($,$"82$" G,,-" ,\T0&+&%0R" 7,+(::,-),)" GR" %8," L'7(T,2-" D270&2:,-%" _$,," 6,$(0'%&(-" !jH<H!IEE4" 27%?" K`?"" " Y&0&-1'20",)'+2%&(-"T7(172:$"%82%"(VV,7"$&1-,)"2-)" (720Nb7&%%,-" 02-1'21," &-$%7'+%&(-" %(" ),2V" $%'),-%$" 82B," G,,-" ),B,0(T,)" &-" $,B,720" +('-%7&,$4" &-+0')&-1" #%20R" b8,7,"%8,R"82B,"G,,-"2TT0&,)"V(7"%8,":($%"%("L0,:,-%27R" $+8((0"+8&0)7,-?"g$"7,T(7%,)"GR"32$,00&"e"20"_@>><`4"&%"&$" '-a',$%&(-2G0,"%82%"%8,"'$,"(V"2"*F4",B,-"&V"0&:&%,)"%("&%$" '$'204"V2+,H%(HV2+,H"V(7:4"+2-"T02R"2"B,7R"&:T(7%2-%"7(0," &-" V($%,7&-1" KFX$" 1,-,720" 0&-1'&$%&+" +(:T,%,-+,?" " /8," &-+0'$&(-" (V" *F" b&%8&-" ,H0,27-&-1" T02%V(7:$" ),$&1-,)" V(7" KF" 82$" +(:," 2$" 2" -2%'720" ),B,0(T:,-%" (V" %8,"2)B2-+,:,-%"%82%"82B,"G,,-":2),"&-"('7"^-(b0,)1," (V"*F"2-)"(V"),2V"$&1-,7$?""c(b,B,74"2$"b,"T(&-%"('%"&-" $,+%&(-$"@"%("W"G,0(b4":2-R"7,+,-%"2-)"+'77,-%"2%%,:T%$" %(" ),B,0(T" 2TT7(T7&2%," ,H0,27-&-1" ,-B&7(-:,-%$" V(7" KF" @C! ;#3+*-%,3/*#' #%"&$"b&),0R"^-(b-"%82%" 200"(B,7"%8,"b(70)"),2V"+8&0)7,-" 2-)4" 02%,74" 2)'0%$4" ,\T,7&,-+," )72:2%&+" )&VV&+'0%&,$" &-" 2+8&,B&-1"2TT7(T7&2%,"7,+,T%&B,"2-)",\T7,$$&B,"$^&00$"-(%" (-0R" &-" (720" (7" B(+20" 02-1'21," _9F`" G'%" 20$(" &-" b7&%%,-" 02-1'21,?" /8," B2$%" :2](7&%R" (V" ),2V" 0,27-,7$" _KF`" 2+8&,B," 0&%,72+R" 0,B,0$" %82%" 27," :27^,)0R" G,0(b" %8($," T7(T,7"(V"%8,&7"8,27&-1"T,,7$"_$,,"2:(-1"(%8,7$" 32$,00&4" Q2721-2" e" 9(0%,7724" @>><d" f27+&2" e" K,7R+^,4" @>!>d" f27+&2"e"D,7&-&4"@>!>`?"g$"2"7,$'0%4"&-"%8,&7"$+8((0"R,27$" %87('18" 2)'0%8(()4" KF" ,\T,7&,-+," ,a'200R" )72:2%&+" )&VV&+'0%&,$"&-"2++,$$&-1"%8,"B2$%"G()R"(V"^-(b0,)1,4"2-)" %8," 7&+8" 0,27-&-1" ,-B&7(-:,-%$" :2)," 2B2&02G0," GR" 2)B2-+,)" :'0%&:,)&2" %,+8-(0(1&,$4" :($%" -(%2G0R" ,H0,27-&-1" ,-B&7(-:,-%$?" gTT7(T7&2%," b7&%%,-" 02-1'21," $^&00$" 27," &-" V2+%" '-a',$%&(-2G0R" 2" T7,H7,a'&$&%," V(7" ,\T0(&%&-1"%8,"T($$&G&0&%&,$"27&$&-1"V7(:"$'+8":'0%&:,)&2" 2-)":'0%&:()20"0,27-&-1",-B&7(-:,-%$?"" " #-"#%20R"2$"200"(B,7"%8,"b(70)!"%8,"$&%'2%&(-"(V"KF"&$" ,$T,+&200R" +(:T0,\" )'," %(" %8," B,7R" )&VV,7,-%" 02-1'21," G2+^17('-)" 2-)" ,\T,7&,-+," ),2V" T,7$(-$" :2R" 82B," ),T,-)&-1"'T(-"%8,"02-1'21,"%8,R"'$,"2$"%8,&7"T7&:27R"(7" T7,V,77,)" :,2-$" (V" +(::'-&+2%&(-4" (7" F!?" #%" &$" &-" V2+%" -,+,$$27R" %(" )&$%&-1'&$8" %b(" 17('T$Z" _!`" %8($," b8(" '$," #%20&2-" *&1-" 02-1'21," _F#*`4" %8," B&$'20H1,$%'7204" """"""""""""""""""""""""""""""""""""""""""""""""""""""""""""" ! "O(7"7,2$(-$"0&-^,)"%("%8,"),:(172T8R"(V"),2V-,$$" 2-)"%("%8," +(:T0,\" $(+&(0&-1'&$%&+" 2-)" +'0%'720" T7(T,7%&,$" (V" $&1-,)" 02-1'21,$" %8,"(G$,7B2%&(-$"b,":2^,"8,7,"b&%8" 7,$T,+%"%("#%20R" +2-" G," ,2$&0R" ,\%,-),)" 2+7($$" -2%&(-$" 2-)" +'0%'7,$4" b&%8" %8," -,+,$$27R" +82-1,$" +(-+,7-&-1" %8," -2%&(-20" $&1-,)" 2-)" B(+20Nb7&%%,-"02-1'21,$?"" 18 &!"&3( &-(3$6*7(L*!(!")/01!( $3( CG$6*30!)DE( (I3!!( )13-( &#!( 0.-4!2&((C@$2&)(3$6*EJ(9(( !"#$%$&'( )*+,-.( $/01$2)&!( 3-/!( /)4-.( 2-*2!0&5)1'( /!&#-+-1-6$2)1()*+(0.)2&$2)1(1$/$&)&$-*37(( ( 8*( 3!2&$-*( 9( :!( 0.!3!*&( )*+( +$32533( -*6-$*6( .!3!).2#()$/!+()&(-;!.2-/$*6(&#!3!(1$/$&)&$-*37( 8"! #7'0$%%9)10+31$)'+):'1)/3*9031$)+,' %+3&*1+,/;'<=+3'%$:&,/'$5'#7'3$'+:$-3>' !"! #$%&'(&)&*+,'-*$.,&%/'0$)0&*)1)(' &21/31)('&4,&+*)1)('-,+35$*%/'5$*'67'' 8..!30!2&$;!(-=(:#!&#!.(.!)1(3$6*!.3(-.(3$6*$*6();)&).3().!( 53!+'( -*!( )++$&$-*)1( 1$/$&)&$-*( -=( /)*?( 25..!*&( !==-.&3( &-:).+3( $*&!6.)&$*6( GA( /)&!.$)13( $*&-( !>1!).*$*6( 01)&=-./3( 2-*2!.*3( )( =)$15.!( &-( .!2-6*$V!( $/0-.&)*&( +$==!.!*2!3(%!&:!!*(GA()*+(;-2)1,:.$&&!*(1)*65)6!3'()*+( &#!( 0.-%1!/3( 0-3!+( %?( &#!( +.)/)&$2)11?( $*35==$2$!*&( .!=!.!*2!( &--13'( )*+( -;!.)11( 1$*65$3&$2( +!32.$0&$-*3'( &#)&( ).!(25..!*&1?();)$1)%1!(=-.(GA7(( 8&( /53&( =$.3&( %!( .!2)11!+( &#)&( 5))$?/$52"$ )5'(35("!$ 06%+&3%$ 5$ 026%%"'$ %25,6%6&'7( Y-.!( $/0-.&)*&1?( =.-/( )( .!3!).2#( 3&)*+0-$*&'( )*+( !;!*( &#-56#( )1/-3&( 9R( ?!).3( #);!( 0)33!+( 3$*2!( &#!( /-+!.*( 3&5+?( -=( GA( #)3( %!65*'( .!3!).2#!.3( 3&$11( #);!( *-&( =-5*+( )*( )6.!!/!*&( -*F( I)J( :#)&().!(&#!(2-*3&$&5!*&(!1!/!*&3(-=(GAH(I%J(:#)&(6.)0#$2( 3?3&!/3(2)*(%!(53!+(=-.(.!0.!3!*&$*6(GA($*(:.$&&!*(=-./( )*+'( -*( &#$3( %)3$3'( +!;!1-0( )00.-0.$)&!( .!=!.!*2!( &--13( I!767( +$2&$-*).$!3'( 6.)//).3'( 53)6!>%)3!+( 2-.0-.)( !&2J( &#)&( ).!( 5*[5!3&$-*)%1?( *!2!33).?( =-.( %-&#( &#!( 2-//5*$&$!3( -=( 3$6*!.3'( )*+( &#!( !"01-$&)&$-*( -=( GA( =-.( !+52)&$-*)1( )*+( $*3&.52&$-*)1( 05.0-3!3( I3!!( W5")2( P( B*&$*-.-( `$VV5&-'( KR\RH( ^).2$)'( KRRaH( KR\RH( ^).2$)( P( @!.?2D!'(KR\RJ7(( 8&( $3( *-&( &.$;$)1( &-( 3&.!33( &#)&'( )1&#-56#( -5.( D*-:1!+6!(-=(GA(#)3(2-*3$+!.)%1?()+;)*2!+'(:!(3&$11(+-( *-&( #);!( )*?( /-*-1$*65)1( +$2&$-*).?( -.( 6.)//).'( =-.( )*?( -=( &#!( GA( &#)&( #)3( %!!*( &-( +)&!( $*;!3&$6)&!+( ( >( *-&( !;!*(=-.(BGA7(( ( 8*(&#$3(2-*&!"&'(-*!(2-51+(!"0!2&(&#)&(:!11>6.-5*+!+( 0.-0-3)13( )$/!+( )&( !"01-$&$*6( GA( =-.( $*3&.52&$-*)1( 05.0-3!3( :-51+( +!+$2)&!( 0).&$251).( 2).!( $*( /)D$*6( !"01$2$&( &#!( /-+!13( -=( GA( !1!/!*&3( )*+( +$32-5.3!( &#!?( )+-0&7( M#$3( )00!).3( !30!2$)11?( *!2!33).?( %!2)53!'( )3( .!2)11!+(#!.!)=&!.'(&#!.!().!()&(0.!3!*&(&:-(/)4-.(21)33!3( -=(/-+!13(=-.(+!32.$%$*6(GA7(8*()6.!!/!*&(:$&#(W5")2(P( G)11)*+.!( IKRRTJ( :!( :$11( .!=!.( &-( &#!3!( /-+!13( )3( C)33$/$1$)&$-*$3&E(;37(C*-*()33$/$1$)&$-*$3&EF(&#!(=$.3&(&?0!( -=( /-+!13( #$6#1$6#&( 0.$/).$1?( &#!( 3&.52&5.)1( 3$/$1).$&$!3( %!&:!!*(GA( )*+(ZA'(:#$1!( &#!( 3!2-*+(-*!3((5*+!.32-.!( &#)&'( $*( )++$&$-*( &-( $/0-.&)*&( 3$/$1).$&$!3( &#!.!( ).!( ![5)11?( .!1!;)*&( +$==!.!*2!3( %!&:!!*( GA( )*+( ZA7 ( b$&#$*( &#!( 1$/$&3( -=( &#!( 0.!3!*&( 2-*&!"&'( :!( $1153&.)&!( 3-/!( -=( &#!( 2.52$)1( +$==!.!*2!3( %!&:!!*( &#!3!( &:-( &?0!3( -=( /-+!13( $*( .!1)&$-*( &-( &#!( 0.-%1!/( -=( +!=$*$*6(:#)&().!(&#!(2-*3&$&5!*&(!1!/!*&3(-=(GA7(( 8*(35%3&)*&$)1()6.!!/!*&(:$&#(!).1?'(;!.?($*=15!*&$)1( +!32.$0&$-*3( -=( BGA( 0.-;$+!+( %?( G&-D-!( I\caRJ( )*+( 35%3![5!*&1?( Q1$/)( P( d!1156$( I\cTcJ'( )33$/$1)&$-*$3&( /-+!13( )335/!(&#)&(GA(2-*3&$&5!*&3(5*$&3().!("!!"'%65))E$ *׵A)"$%&$4/$0&2,!'( )*+( ).!( 126#526)E$!"F3"'%65))E$ &2(5'6G",$ 6'$ %6#"7( M#!3!( /-+!13( ).!( 3&$11( 1).6!1?( 0.!;)$1$*6( $*( 25..!*&( .!3!).2#( -*( GA( )*+( #);!( %!!*( =-.( &#!( /-3&( )2.$&$2)11?( )+-0&!+( $*( !+52)&$-*)1( )001$2)&$-*3( <-.(&#!(05.0-3!3(-=(&#$3(0)0!.'(:!(1$/$&(-5.()&&!*&$-*3(&-( !>1!).*$*6(01)&=-./3(+!3$6*!+(=-.(?-5*6(-.()+51&(@A7(B*( -;!.;$!:(-=(3!;!.)1(352#(01)&=-./3(.!;!)13(&#!(=-11-:$*6( /)4-.(1$/$&)&$-*37(<$.3&'(&#!(65$+!1$*!3(=-.(+!;!1-0$*6(&#!( +!3$.!+(01)&=-./3().!(-=&!*( 453&(C3D!&2#!+E'()*+(0.-;$+!( =)$.1?(6!*!.)1(3566!3&$-*3(2-*2!.*$*6'(=-.(!")/01!F(>(&#!( $*2153$-*( -=( GA( ;$+!-3( :$&#( GA( &.)*31)&$-*3( -.( !"01)*)&$-*3( -=( &#!( :.$&&!*( &!"&3( =-5*+( $*( )( 30!2$=$2( !>1!).*$*6(01)&=-./H(>(&#!(+!;!1-0/!*&(-=()5&-/)&$2(&--13( I$7!7();)&).3J(=-.(&.)*31)&$*6(:.$&&!*(&!"&3($*&-(GAH(>&#!(53!( -=(2--0!.)&$-*(&--13(352#()3(;$+!-(2-*=!.!*2$*6K7(G!2-*+'( /)*?( !"$3&$*6( -.( 01)**!+( 01)&=-./3( )00!).( &-( %!( +!3$6*!+( 0.$/).$1?( =-.( @A( :#-( D*-:( GA'( %5&( !""#$ %&$ '"()"*%$%+"$'"",!$&-$./$0+&$12"-"2$%&$3!"$4/7(( L*( &#!( :#-1!'( &#!.!( )00!).3( 5( &-( %!( )( 6!*!.)1( &.!*+( &-:).+3( 2.!)&$*6( )*+( $*215+$*6( GA( /)&!.$)13( =-.( $/01!/!*&$*6( :.$&&!*( &!"&>%)3!+( !*;$.-*/!*&37( M#!( 2-*&!*&3( !*2-+!+( $*( :.$&&!*( 1)*65)6!( ).!( /)+!( /-.!( )22!33$%1!( ( &-( I3$6*$*6J( @A( ;$)( GA( &.)*31)&$-*3( )*+( !"01)*)&$-*37( L&#!.( !")/01!3( ).!( &#!( 01)&=-./( 2.!)&!+( :$&#$*( &#!( 0.-4!2&( @NBA O (=-.( &!)2#$*6( =-.!$6*( ;-2)1>:.$&&!*( 1)*65)6!3( &-( @A'( -.( &#!( -*!( +!3$6*!+( %?( @.$6)3(P(Q-5.!/!*-3(IKRR9J(=-.(;-2)&$-*)1()*+(6!*!.)1( !+52)&$-*)1(&.)$*$*67(( B( =)$.1?( 1).6!( %-+?( -=( :-.D( #)3( %!!*( +!+$2)&!+( &-( &#!( +!;!1-0/!*&( -=( 3$6*$*6( );)&).3( &-( %!( )++!+( &-( &#!( 53!.3S($*&!.=)2!'(.!01)2$*6(GA(/)&!.$)13(0.!3!*&!+(%?(.!)1( 3$6*!.3( I3!!( =-.( !")/01!( N=&#$/$-5( P( <-&$*!)'( KRRTH( KRRUH( Q).-5V$3'( W).$+)D$3'( <-&$*!)( P( N=&#$/$-5'( KRRT'( -.()13-(&#!(.!2!*&(8&)1$)*(0.-4!2&(CBMABGE(XJ7(( ( Y)*?( 0.-4!2&3( =-.( .!)1$V$*6( 3$6*$*6( );)&).3( !"#$%$&( #-:!;!.'($*(-5.(;$!:'()(.)&#!.(35.0.$3$*6(1$/$&)&$-*F(&#!?( )00!).(&-(#);!$5$3'6,62"*%6&'5)7$4/8*"'%"2",$1"2!1"*%69"7( M#!?(3&).&'(=-.(&#!(/-3&'(=.-/(ZA(:.$&&!*( &!"&3( )*+()$/( )&(0.-+52$*6();)&).3(&#)&(2)*(&.)*31)&!(352#(:.$&&!*(&!"&3( $*&-($*+$;$+5)1(3$6*3()*+(3$6*!+(3![5!*2!37(M#!3!(0.-4!2&( 5( $6*-.!( -.( 5*+!.!3&$/)&!( &#!( 0.-%1!/( -=(%25'!)5%6'($ -2&#$ !6('$ %&$ 9&*5):026%%"'$ %";%!7( M#!.!( ).!( -*1?( =!:( 0.-4!2&3( &#)&( !"01$2$&1?( )$/( )&( .!)1$V$*6( 3$6*$*6( );)&).3( =5*2&$-*$*6( $*( %-&#( +$.!2&$-*3'( $7!7( =.-/( 3$6*( &-( 30!!2#( )*+,-.( )13-( :.$&&!*( &!"&3'( )*+( =.-/( 30!!2#( )*+( :.$&&!*( ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( K (G!!( =-.( !"7F( <',696,35)!$ 0+&$ 52"$ ."5-$ &2$ =52,$ &-$ ="526'('( W!*&!.( =-.( B33$3&$;!( M!2#*-1-6?( )*+( N*;$.-*/!*&)1( B22!33( IWBMNBJ'( #&&0F,,:::7)22!33!1!).*$*67*!&,/-+\,\]RK70#0H( <>?$ (36,")6'"!$ -&2$ ."9")&16'($ @**"!!6A6)"$ /"52'6'($ @11)6*5%6&'!'( 8YG( ^1-%)1( A!).*$*6( W-*3-.&$5/'(( #&&0F,,:::7$/361-%)17-.6,)22!33$%$1$&?,)22!33$%1!;!.3,$*+!"7#& /1H( B"'"25)$ (36,")6'"!$ -&2$ <'*)3!69"$ C')6'"$ D3)%325)$ D&'%"'%7$ W)*)+$)*( _!&:-.D( =-.( 8*2153$;!( W51&5.)1( N"2#)*6!'( #&&0F,,2*$2!75&-.-*&-72),65$+!1$*!370#0( ( O (#&&0F,,:::7+!)1>1!-*).+-7!5( X ((((((((((((((((((((((((((((((((((((((((((((((((((((((((((((( 9 (#&&0F(:::73$6*0!)D7!5H(#&&0F,,:::7+$2&)3$6*7!5( (#&&0F,,:::7)&1)370-1$&-7$&,( 19 ,!'),# )!# 7&# T3))&($&$# )!U# 3($# +'!.&,,&$-# 3($# !)<&'# %("!'43)%!(# ,)&44%(1# "'!4# )<&# %()&'3.)%!(# =%)<# !)<&'# "&//!=# ,)0$&(),# 3($^!'# =%)<# 3# )0)!'# @&515# %(# &8.<3(1&,# )3?%(1#+/3.&#%(#3.)03/#./3,,'!!4,#!'#%(#9%$&!.!("&'&(.&,B5# :%(.&#$&3"#+&',!(,#40,)#0,&#)<&%'#,%1<)-#3($#3..!'$%(1/*# !'%&()# )<&%'# 9%,03/# 3))&()%!(-# )!# +'!.&,,# 7!)<# ?%($,# !"# %("!'43)%!(-# )<&# )=!# )3,?,# .3((!)# 7&# .3''%&$# !0)# 3)# )<&# ,34&#)%4&E#S;#.3((!)#,%40/)3(&!0,/*#/!!?#3)#)&3.<%(1#!'# &8+/3(3)!'*# 43)&'%3/,# $%,+/3*&$# !(# )<&# .!4+0)&'# ,.'&&(# !"##3)#/%(10%,)%.-#%()&'3.)%!(273,&$#%("!'43)%!(#1%9&(#!(# )<&# ,34&# 43)&'%3/,# =<%.<# )<&*# 40,)# 3/=3*,# $&.!$&# +'%43'%/*# 9%3# 9%,%!(# @&515# 7*# /%+2'&3$%(1# ,+!?&(# 0))&'3(.&,-# +'!.&,,%(1# 3# 4&,,31&# %(# :;-# '&3$%(1# ,07)%)/&,B5## D<%,# %,# 40.<# 0(/%?&# =<3)# <3++&(,-# %(# )<&# ,34&# .!!+&'3)%9&# /&3'(%(1# ,%)03)%!(-# "!'# <&3'%(1# /&3'(&',# =<!# .3(# ,%40/)3(&!0,/*# +'!.&,,# .!440(%.3)%9&# 4&,,31&,# .!(9&*&$# )<'!01<# ,!0($,# 3($# "'&&/*# !'%&()# )<&%'# 9%,03/# 3))&()%!(#)!#!)<&'# )*+&,#!"# %("!'43)%!(#.!4%(1#"'!4# )<&# .!4+0)&'# ,.'&&(5# S&9%,%(1# 3(# 3++'!+'%3)&# &2/&3'(%(1# &(9%'!(4&()# "!'# S;# )<0,# '&W0%'&,# 3..0'3)&# 3(3/*,&,# !"# )<&#=3*,#%(#=<%.<#)<&,&#/&3'(&',#0,($#$%,)'%70)&#)<&%'# 9%,03/#3))&()%!(#=<&(#+&'"!'4%(1#$%""&'&()#/&3'(%(1#)3,?,-# 3($#<!=#)<%,#.3(#%("/0&(.&#)<&#/&3'(%(1#+'!.&,,5# !"#$%""&'&()#)*+&,-#%(./0$%(1#&2/&3'(%(1#+/3)"!'4,5# # 6(# .!()'3,)-# (!(# 3,,%4%/%3)%!(%,)# 4!$&/,-# 73,&$# !(# &8)&(,%9&# 3(3/*,&,# !"# :;# $%,.!0',&-# ,<!=# )<3)# :;# .!(,)%)0&()# &/&4&(),# .3((!)# 7&# &3,%/*# 3,,%4%/3)&$# )!#>;# 0(%),5# 6(# 3$$%)%!(# )!# =!'$2/%?&# &/&4&(),-# :;# +!,,&,,# .!4+/&8-# <%1</*# %.!(%.# ,)'0.)0'&,# @A6:B# )<3)# 3'&# ,%40/)3(&!0,/*# !'13(%C&$# %(# 3# 40/)%/%(&3'# # "3,<%!(# )<3)# <3,#(!#+3'3//&/#%(#>;5#D<&#$%""&'&(.&,#7&)=&&(#=!'$2/%?&# 3($#(!(2=!'$2/%?�(%),# 3'&# 43'?&$#7*#(!(#43(03/# 3($# 43(03/# 3')%.0/3)!',-# 4!,)# (!)37/*# 7*# 4!$3/%)*2,+&.%"%.# &*&213C&# +3))&'(,E# =<&(# +'!$0.%(1# =!'$2/%?&# 0(%),-# )<&# ,%1(&'F,# 13C&# %,# $%'&.)&$# )!=3'$,# )<&# %()&'/!.0)!'-# =<&'&3,# =<&(# +'!$0.%(1# A6:# )<&# ,%1(&'F,# 13C&# %,# )*+%.3//*#$%'&.)&$#3=3*#"!'4#)<&#%()&'/!.0)!'5## G%10'&# H# 7&/!=# +'!9%$&,# I0,)# )=!# %//0,)'3)%9&# &834+/&,# !"# 3# =!'$2/%?&# 0(%)# @H3B# 3($# 3# (!(2=!'$2/%?&# A6:#@H7B#)<3)#3'&#.!44!(/*#"!0($#%(#:;#$%,.!0',&5#D<&# &834+/&,# 3'&# )3?&(# "'!4# ;6:# $%,.!0',&# 70)# 3# =&3/)<# !"# ,%4%/3'# &834+/&,# .3(# 7&# "!0($# %(# 3//# :;# @"!'# '&/&93()# $%,.0,,%!(,-# ,&&# &,+&.%3//*# J083.-# KLLLM# J083.# N# O()%(!'!#P%CC0)!-#KLHLM#P%CC0)!-#P%&)'3($'&3#N#:%4!(&-# KLLQM#R3'.%3#N#S&'*.?&M#KLHLB5# # 2"! 3-4'/5%)5+'678+,*+/+5)9&(*$($,:&'()',5) 9&(*$9-5'()+7(+'/,$,:).('*6-/9%) G%10'&# K# ,.<&43)%.3//*# %//0,)'3)&,# 3# 4!$&/# "!'# 3(# &2/&3'(%(1# +/3)"!'4# +'!)!)*+&# @];PPB# +'!)!)*+&# =&# 3'&# .0''&()/*# $&9&/!+%(1# =%)<%(# )<&# "'34&# !"# 3# (3)%!(3/# +'!I&.)#=<%.<#+0',0&,#)=!#43I!'-#%()&''&/3)&$#!7I&.)%9&,E# @HB# %4+'!9%(1# 40/)%/%(103/# ^# 40/)%4!$3/# &2/&3'(%(1# &(9%'!(4&(),# "!'# S;# @A%1<# :.<!!/# 3($# _(%9&',%)*# ,)0$&(),BM#@KB#+'!4!)%(1#)<&%'#/%)&'3.*#,?%//,#`5## # ! H3E#T.<%/$U############H7E#T.<%/$#/!!?%(1#!0),%$&-# /&3(%(1#!(#3#=%($!=,%//U# # G%10'&#HE#V!'$2/%?&#,%1(#@H3B#3($#A6:#@H7B# # D<&# +!%()# =&# =%,<# )!# ,)'&,,# <&'&# %,# )<&# "!//!=%(1E# A6:#3'	&'*#"'&W0&()#%(#:;#$%,.!0',&-#'3(1%(1#"'!4#XLY# )!#3,#40.<#3,#ZLY#@$&+&($%(1#!(#$%,.!0',&('&B#!"#)<&# .!(,)%)0&()#&/&4&(),#)<3)#.3(#7&#%$&()%"%&$#3($#+3',&$##%(# :;#$%,.!0',&#@[!0)&)-#:3//3($'&#N#G0,&//%&'2:!0C3-#KLHLM# J083.# N# O()%(!'!# P%CC0)!-# KLHLM# :3//3($'&-# KLLXM# S%# \&(C!# N# 3/-# KLLZB5# 6)# ,<!0/$# )<0,# 7&# &9%$&()# )<3)# :;# $&,.'%+)%!(,# !"# 3(*# ,!')-# %(./0$%(1# 4!$&/%,3)%!(,# 9%3# ,%1(%(1# 393)3',-# .3((!)# $%,'&13'$# 3,# T43'1%(3/U# )<&,&# ,)'0.)0'&,#)<3)#3++&3'#0(%W0&#!"#:;#@,&&#J083.#N#S3//&-# KLLQB5#]2/&3'(%(1#43)&'%3/,#73,&$#!(#)<,,04+)%!(#)<3)# :;#&/&4&(),# 3'&#"!'#)<!,)# TI0,)# /%?&#>;#=!'$,U# )<0,# &8<%7%)# ,&9&'&# /%4%)3)%!(,# )<3)# (&&$# )!# 7&# '&.!1(%C&$-# .'%)%.3//*#$%,.0,,&$#3($-#<!+&"0//*-#34&($&$5# # # # !"! #$%&'()'**+,*$-,).'**+/,%)$,)01)) O(# 3++'!+'%3)&# &2/&3'(%(1# &(9%'!(4&()# "!'# S;# 3)# /3'1&-# %5&5# "!'# 7!)<# ,%1(&',# 3($# (!(2,%1(&',-# 40,)# )3?&# %(# $0&# 3..!0()#3#.!(,)'3%()#)<3)#.3(#7&#&3,%/*#!7,&'9&$#3($#*&)-# )!#!0'#?(!=/&$1&-#<3,#(!)#7&&(#.3'&"0//*#%(9&,)%13)&$#%(# +'&9%!0,# '&,&3'.<5# V<&(# =!'?%(1# =%)<# 3# .!4+0)&'-# )<&# 9%,03/# 3))&()%!(# +3))&'(,# +'!+&'# !"# S;# 43'?&$/*# $%""&'# "'!4# )<!,&# !7,&'937/&# %(# <&3'%(1# /&3'(&',5# D<%,# %,# )'0&# &,+&.%3//*#%(#,%)03)%!(#!"#.!!+&'3)%9&# /&3'(%(1#=<&'&#)<&# ,)0$&(),# 40,)# ,%40/)3(&!0,/*# 3))&($# )!# 9%,03/# %("!'43)%!(# .!(.&'(%(1# ='%))&(# 43)&'%3/,# !"# $%""&'&()# G%10'&#KE#O#4!$&/#!"#$&3"2.&()&'&$#&2/&3'(%(1#+/3)"!'4# # D<&# ];;P# 4!$&/# %//0,)'3)&$# %(# G%10'&# K# 3%4,# 3)# ############################################################# ` #D<&#+'!I&.)#%(9!/9&,#"%9&#'&,&3'.<#)&34,#+'!9%$%(1#%()&'2#3($# )'3(,2$%,.%+/%(3'*# .!4+&)&(.&,# 3.'!,,# )<&# "%&/$,# !"E# 2:;# /%(10%,)%.,M# 2,+&.%3/# 3($# 7%/%(103/# &$0.3)%!(# "!'# S;M# 240/)%4&$%3# )!!/,# "!'# S;# 3($# <&3'%(1# /&3'(&',M# 2<043(2.!4+0)&'# %()&'3.)%!(# 3($#9%,03/# /&3'(%(1# %(# &2/&3'(%(1# &(9%'!(4&(),M# 2"!'&%1(# /3(1031&# )&3.<%(1# 4&)<!$!/!1%&,# %(# 7!)<#)'3$%)%!(3/#3($#&2/&3'(%(1#&(9%'!(4&(),5# 20 !"#$%!&'()* +,#* -'&'+.+'!(/* 0$!0#$* !1* &.(2* #3-#.$('()* 0-.+1!$&/*4/##*5'/%6//'!(*.7!"#8*'(*5'11#$#(+*9.2/:*;.%,* !1*+,#*&.<!$*=%!(%#0+6.-*%!&0!(#(+/>*!1*+,#*&!5#-*'/*.+* +,#*/.&#*+'&#*!"#$%&#'()*+?*.(5*,'-'..&/+)0"/*5#/')('()* .* ('&01-',#'/'() '12'&/,$,3) 42�"/!:* @,#* 0-.+1!$&* '/* )$!6(5#5* 60!(* +,#* '5#.* +,.+* $#/#.$%,* .'* .+* %$#.+'()* 6/#16-* 0$!56%+/* 1!$* 5#.1* 6/#$/* (##5/* +!* 7#* 5#"#-!0#5?* 1$!&*+,#*"#$2*/+.$+?*5$#6*5#.1*0#$/!(/?*(!+*<6/+*0"/?*!$*",* 5#.1* 0#!0-#:* A%%!$5'()-2?* .(5* $.+,#$* 5'11#$#(+-2* 1$!&* 9,.+* '/* $#0!$+#5* 1!$* &.(2* 0./+* .(5* !()!'()* 0$!<#%+/* 5'$#%+#5* +!* 5#.1* 0#$/!(/?* +,'/* '5#.* )6'5#/* !6$* .%+6.-* B0$!<#%+* &.(.)#&#(+C* 0$.%+'%#:* @,#* 0$!<#%+3-#.5#$* +#.&* '(%-65#/* /'D* 5#.1* %!--#.)6#/* 9,!* 4&/#$-$4&#') &.) 4/"#&3",$.#.)$,)#6')42&,,$,3)&,()&/#$-72&#$",)"0)#6')',#$/') /'.'&/-6) 4/"8'-#?* (!+* !(-2* ./* =#(5* 6/#$/>* !$* =#(5* #".-6.+!$/>*!1*+,#* -.()6.)#*$#/!6$%#/* .(5*5'5.%+'%* +!!-/* +!*7#*0$!56%#5*!$*'&0-#&#(+#5:*A--*!6$*5#.1*%!--#.)6#/* .$#* ,'),-2* 0$!1'%'#(+* '(* EFGH* +,$##* -#.$(#5* +!* /')(* '(* '(1.(%2?* 9'+,'(* 5#.1* /')('()* 1.&'-'#/?* +,$##* .5* 5'11#$#(+* .)#/?* ./* '+* ,.00#(/* +!* &!/+* 5#.1* /')(#$/* 4/##* I6D.%* J* A(+'(!$!* K'LL6+!?* MNON8P* +,#2* 0!//#//* 5'11#$#(+* 5#)$##/* !1*Q(!9-#5)#*!1*/0!Q#(*R9$'++#(*F+.-'.(*9,'%,*&'$$!$*'(* 0.$+*+,#'$*#56%.+'!(.-*7.%Q)$!6(5*S:** T#* )'"#* ,#$#* <6/+* 1#9* 0$.%+'%.-* #D.&0-#/* !1* +,#* %$6%'.-* '("!-"#&#(+* !1* !6$* 5#.1* %!--#.)6#/:* @,#* %,!'%#* !1* +,#* =%!(+#(+/>* 9#* 9'--* 1!%6/* !(* 1!$* 5#"#-!0'()* +,#* ;EKKU?*.(5*!1*+,#*5'11#$#(+*1!$&/*'(*9,'%,*/6%,*%!(+#(+/* 9'--* #"#(+6.--2* 7#* 0$#/#(+#5* +!* VE* !(* +,#* ;EKK* 4#:):* /0!Q#(* .(5* 9$'++#(* +#D+/?* /0##%,3+!3+#D+* %.0+'!(/?* GE* +$.(/-.+'!(/*.(5* #D0-.(.+'!(/?*)$.0,'%*'--6/+$.+'!(/8?*9./* &.5#* 1!--!9'()* #D+#(/'"#* 5'/%6//'!(/?* .&!()* +,#* 5#.1* .(5* +,#* ,#.$'()* &#&7#$/* !1* +,#* 0$!<#%+?* !1* 5'11#$#(+?* .-+#$(.+'"#* 0!//'7'-'+'#/:* W6$* 5#.1* %!--#.)6#/* .$#* %!(+$'76+'()* +!* +,#* 0$#0.$.+'!(* !1* .53,!%* X6#/+'!((.'$#/* .(5* +!* .* +,!$!6),* #D.&'(.+'!(* .(5* #".-6.+'!(* !1* -.()6.)#* +./Q/?* &.+#$'.-/?* &6-+''.* +#%,(!-!)'#/* 9#* .$#* 6/'()* .(5R!$* .$#* %6$$#(+-2* 5#"#-!0'()* 4'(%-65'()* 1!$* #D:*+,#*;EKK*'(+#$1.%#8:*F(*/,!$+?*+,#*.%+'"#*'("!-"#&#(+* !1* 5#.1* %!--#.)6#/* #(/6$#/* +,.+* +,#* #(5* 0$!56%+/* !1* !6$* 0$!<#%+*7#?*!(*!(#*,.(5?*%!(/'/+#(+*9'+,*+,#*=('&0)5"/2() %$'5>?* 4/##* Y')6$#* M8* Z* ':#:* .* %!&0-#D* %!(1')6$.+'!(* !1* #D0#$'#(+'.-* .(5* %!(%#0+6.-* Q(!9-#5)#* +,.+* '/* /+$!()-2* )$!6(5#5*'(*%$.$",*4/##*.&!()*!+,#$/*E.(#?*[!11&#'/+#$* J* \.,.(?* O]]^8?* .(5?* !(* +,#* !+,#$* ,.(5?* #11#%+'"#-2* $#/0!(5**+!*9:),''(.*4/##*Y')6$#*M8:** W(#* !+,#$* '&0!$+.(+* #-#&#(+* !1* +,#* 5#.13,#.$'()* %!--.7!$.+'!(*9#*.$#*0$!&!+'()*9'+,'(* +,#*0$!<#%+* '/* +,#* 1!--!9'()H*.--* +,#*,#.$'()*&#&7#$/*!1*+,#*0$!<#%+3-#.5#$* +#.&*0!//#//*.*)!!5*!$*.5".(%#5*Q(!9-#5)#*!1*EFGP*1!6$* !1* +,#* 1'"#* 4,#.$'()8* 2!6()* $#/#.$%,#$/* !1* +,#* !+,#$* $#/#.$%,* +#.&/* '("!-"#5* '(* +,#* 0$!<#%+* .$#* %6$$#(+-2* .++#(5'()* %-.//#/* +!* -#.$(* EFG:* T#* .$#* .-/!* /##Q'()* ************************************************************* 16$+,#$* %!--.7!$.+'!(/* 9'+,* 5#.1* #D0#$+/* 9,!* 6/#* F+.-'.(* 4$.+,#$*+,.(*EFG8*./*+,#'$*0$#1#$$#5*-.()6.)#:* * A/* (!+#5* .7!"#?* &!/+* #3-#.$('()* 0-.+1!$&/* 1!$* VE* .00#.$*+!*7#*('.$3,'()",2+)0"/).$3,$,3)9::*F(*%!(+$./+?*./* /,!9(* '(* Y')6$#* M?* !6$* ;EKK* .'&/* .+* &((/'..$,3) #6') ,''(.) "0* !"#$) .$3,$,3) ;:<=1:>?) &,() ,",) .$3,$,3) ;<#&2$&,1:>?)9::* F(* 1.%+?* ./* .-/!* (!+#5* .7!"#?* 7!+,* /6%,* )$!60/*!1*VE*#D0#$'#(%#*5$.&.+'%*5'11'%6-+'#/*'(*2$#'/&-+) ('%'2"4!',#:* W6$* $#/#.$%,* .'&/* .+* ./%#$+.'('()* +,#* /0#%'1'%*%!&&6('%.+'"#3-'()6'/+'%*(##5/*!1*#.%,*)$!60*!1* VE* .(5* +,#* #D+#(+* +!* 9,'%,* +,#/#* .$#?* !$* .$#* (!+* %!&0.$.7-#:* T#* #D0#%+* +,.+* +,#* * $#/6-+/* !1* !6$* '("#/+').+'!(/* 9'--* 0$!"'5#H* 4.8* (!"#-?* $#-#".(+* '(1!$&.+'!(*!(*+,#*-'()6'/+'%3%!)('+'"#*0$!1'-#*!1*+,#*+9!* )$!60/* !1* VE?* %-.$'12'()* .-/!* 9,#+,#$?* .(5R!$* ,!9* Q(!9-#5)#*!1*EFG*./*EO*&.2?*!$*&.2*(!+?*'(+#$1#$#*9'+,* +,#* .%X6'/'+'!(* .(5* 6/#* !1* /0!Q#(R9$'++#(* F+.-'.(P* 478* '&0!$+.(+* '(5'%.+'!(/* !(* ,!9* 9#* &.2* (##5* +!* 5'11#$#(+'.+#*+,#*&6-+'-'()6.-*.(5*&6-+'&!5.-*&.+#$'.-/*+!* 7#* %$#.+#5* 1!$* 0$!&!+'()* -'+#$.%2* 5#"#-!0&#(+* '(* VE* 9'+,* EFG3EO* ./* %!&0.$#5* +!* VE* 9'+,* F+.-'.(3EO:* Y!$* #D.&0-#?* $#%.--'()* 9,.+* (!+#5* '(* /#%+'!(* `?* '+* 9!6-5* 7#* 0-.6/'7-#* +!* ,20!+,#/'L#* +,.+?* 1!$* VE* 9'+,* EFG3EO?* +,#* /'&6-+.(#!6/-2* !$).('L#5?* &6-+'-'(#.$* -'()6'/+'%* /+$6%+6$#/* +,.+* .$#* ,'),-2* /0#%'1'%* !1* +,#'$* GE?* (.&#-2* [FG?*&.2*(#).+'"#-2* '(+#$1#$#*9'+,* +,#* -#.$('()*!1*&!$#* /#X6#(+'.--2* !$).('L#5* -'()6'/+'%* /+$6%+6$#/* +,.+* .$#* 0$!0#$*!1*9$'++#(*-.()6.)#:*F+*9!6-5*7#*#X6.--2*0-.6/'7-#* +!*,20!+,#/'L#*+,.+*+,#/#*0!+#(+'.-*(#).+'"#*'(+#$1#$#(%#/* /,!6-5*7#*.7/#(+*'(*VE*9'+,*F+.-'.(3EO:* [!9#"#$?*+,#/#* ,20!+,#/#/* %.(* 7#* #".-6.+#5* !(-2* 72* %!&0.$'()* +,#* -'()6'/+'%3%!)('+'"#*0$!1'-#/*!1*+,#*+9!*)$!60/*!1*VE?*./* 9#*0-.(*+!*5!*'(*!6$*0$!<#%+:** A* /67/+.(+'.-* (!"#-+2* !1* +,#* &6-+'-'()6.-* R* &6-+'&!5.-* ;EKK* #3-#.$('()* #("'$!(&#(+* 9#* .$#* 5#/')('()* %!(%#$(/* +,#* 6/#?* 0$#/#(+.+'!(* 4,#(%#?* 72* +,#* /.&#*+!Q#(?*#D0-'%'+*&!5#-'()*.(5*$#0$#/#(+.+'!(8*!1*+,#* +9!* &.<!$* +20#/* !1* 2&,37&3') /'."7/-'.* +,.+* 9'--* 7#* #&0-!2#5*1!$*0#5.)!)'%.-*06$0!/#/?*(.&#-2H*F+.-'.(* .(5* EFG:*T,.+*'/*(!"#-*'(*!6$*&!5#-*'/*+,.+?*./*'--6/+$.+#5*'(* Y')6$#* M?* 9$'++#(* +#D+/* 9'--* 7#* 0$!"'5#5* (!+* !(-2* '(* 5/$##',)<#&2$&,)4+,#* +.$)#+* -.()6.)#* '(* 9,'%,* 9#* .'&* +!* 0$!&!+#* VE* -'+#$.%2* 5#"#-!0&#(+8?* 76+* .-/!* '(* 5/$##',) :<=*Z*.*-.()6.)#*$#/!6$%#*9,'%,?*+!*!6$*Q(!9-#5)#?*,./* (#"#$*7##(*#D0#$'&#(+#5*'(*#3-#.$('()*0-.+1!$&/*1!$*VE:* =4"@',)<#&2$&,*.(5*0&-'1#"10&-'):<=*4+,#*-.++#$*'(*+,#*1!$&* !1*5')'+.-*"'5#!/8*9'--*.-/!*7#*6/#5*4/##*Y')6$#*M8:** Y!$* +,#* '(/+$6%+'!(.-* &.+#$'.-/* +!* 7#* 0$!"'5#5* '(* 5/$##',) <#&2$&,?* )6'5#5* #./'1'%.+'!(* 0$!%#56$#/* 9'--* 7#* 6/#5* +!* 1.%'-'+.+#* VEC/* .%%#//* +!* +#D+6.-* &.+#$'.-/P* /0##%,3+!3+#D+* %.0+'!('()* +!!-/* 9'--* )$.(+* "'/6.-* .%%#//'7'-'+2* +!* &.+#$'.-/* )'"#(* '(* .4"@',) <#&2$&,P* -'()6'/+'%* .%%#//'7'-'+2* +!* +,#* %!(+#(+/* .(5* 1!$&/* !1* F+.-'.(3#(%!5#5*'(/+$6%+'!(.-*&.+#$'.-/*9'--*7#*#(,.(%#5?* 1!$* VE* 9'+,* EFG3EO?* "'.* .00$!0$'.+#* "'5#!/* 0$!"'5'()* +$.(/-.+'!(/* .(5* #D0-.(.+'!(/* '(* 41.%#3+!31.%#8* EFG:* V6#* +!*/0.%#*-'&'+/?*(!*16$+,#$*5#+.'-/*.$#*)'"#(*,#$#*!(*+,#/#* +,$##* +20#/* !1* -.()6.)#* $#/!6$%#/?* 9,'%,* 9'--* 7#* '&0-#&#(+#5* 5$'"'()* !(* .* %!(/!-'5.+#5* #D0#$'#(%#* '(* S *G0!Q#(R9$'++#(*-.()6.)#*0$!1'%'#(%2*'(*5#.1*0#$/!(/*'/*,'),-2* ".$'.7-#* .(5* !(-2* 0.$+'.--2* -'(Q#5* +!* +,#* #56%.+'!(.-* -#"#-* .%,'#"#5:*W6$*5#.1*%!--#.)6#/*'(%-65#*!(#*5!%+!$.-*/+65#(+?*!(#* %!--#)#* )$.56.+#?* !(#* _('"#$/'+2* /+65#(+?* +,$##* ,'),* /%,!!-* )$.56.+#/:** U *Y!$* /0.%#* -'&'+/* 9#* %.(* !(-2* &#(+'!(* ,#$#* +,#* B)#(#$.-* %!(+#(+/C*!1*+,#*;EKKH*9#*9'--* 1!%6/*!(*+,#*,'/+!$2?*#"!-6+'!(* .(5*6/#*!1*9$'+'()?*.(5*%!&0.$#*!$.-R/')(#5*"/:*)$.0,'%R9$'++#(* 1!$&/*!1*%!&&6('%.+'!(:** 21 D)( >#'$( ,-( *)B)#->( D",="$( -&/( >/-A)+,( D"##( >/-B"*)( &4( ;&+=( $))*)*6( $-B)#( "$.-/;',"-$( .-/( '( !),,)/( &$*)/4,'$*"$%( -.( =-D( B"4&'#( "$.-/;',"-$( $))*4( ,-( !)( 4>',"'##<( '$*( ,);>-/'##<( 4,/&+,&/)*( "$( )J#)'/$"$%( )$B"/-$;)$,4( .-/( 016( '4( +-;>'/)*( ,-( =)'/"$%( &4)/4?( \=)4)( '$'#<4)4( D"##( '#4-( '##-D( ,-( &4( '4+)/,'"$( D=),=)/( ,=)/)('/)(2-/($-,:(/)#)B'$,(*"..)/)$+)4(!),D))$(4"%$"$%(B4?( $-$J4"%$"$%(*)'.(4,&*)$,46(D=)$(,=)4)(01(D",=(*"..)/)$,( #'$%&'%)( !'+L%/-&$*( '++)44( '$*( &4)( B"4&'##<( %/-&$*)*( "$.-/;',"-$6(-.(!-,=(#"$%&"4,"+('$*($-$J#"$%&"4,"+(,<>)?(( I"$'##<6(/)+'##"$%(,=)(+/&+"'#(";>-/,'$+)(-.(B"4"-$("$( ,=)( a*)'.( D-/#*( B")D`6( D)( ,="$L( ,=',( D)!J!'4)*( 4(*+'4,5')% +,67-.*.8',$% )-5% *,)0-'-8% +..*$( .-/( '( *)'.J+)$,)/)*( ]1ZZ( ;'<( !)( 4"%$"."+'$,#<( ";>/-B)*( ";>#);)$,"$%( '( B"4&'##<J!'4)*( %/'>="+( "$,)/.'+)?( 0/'D"$%(-$(-$%-"$%(/)4)'/+=(-$(,=)(,->"+(23'>&'$-(5('#6( 4&!;",,)*:6(D)('";(',(*)4"%$"$%('$("$,)/.'+)(,=',(01(+'$( '++)44( '$*( &4)( )'4"#<( '$*( a"$,&","B)#<`( !)+'&4)( ,)C,&'#( "$.-/;',"-$(2D="+=("4(*".."+&#,(.-/( ,=);:("4(4"%$"."+'$,#<( /)*&+)*6(-/()B)$()$,"/)#<(/)>#'+)*(!<(;-4,#<($-$J,)C,&'#( 2"+-$"+:( "$.-/;',"-$?( \="4( )$,'"#4( ,=)( $))*( -.( +/)',"$%( '( $)D6( %/'>="+( D'<( .-/( !/-D4"$%( D)!( >'%)46( '$*( "$,)/'+,"$%(D",=(,=)(]1ZZ?( I-/( ,=)( $',&/'#6( *)'.J>)+&#"'/( B"4&'#( D'<( -.( %/'4>"$%("$.-/;',"-$(,-(!)()C>#-",)*("$(-&/(>#',.-/;6(D)( '/)(%-"$%(,-(&4)('($)D("$,)/'+,"-$(>'/'*"%;(!'4)*(-$(,=)( ,=)-/")4( -.( ,49.5',5% 6.8-'+'.-%'$*( $+.0:+,**'-8( 21'L-..( 5(b-=$4-$6(KP[8R(b-=$4-$6(KP[cR(E;'X(5(d)$<-$6(788c:?( @",="$( ,="4( >'/'*"%;6( ,=)( #)'/$"$%( >/-+)44( +'$( !)( ;),'>=-/"+'##<( /)>/)4)$,)*( '4( '( 4,-/<( ,=',( "$+#&*)4( ,=)( &4)/('4( ,=)(;'"$( +='/'+,)/?(G++-/*"$%#<6(,=)(&4)/(a#"B)4`( ,=)( #)'/$"$%( >/-+)44( !<( >=<4"+'##<( )C>)/")$+"$%( ",( M( "$( ,=)(B"/,&'#(4>'+)(-.(,=)(]1ZZ(M('4('(>',=(D",=('(4,'/,"$%( >#'+)6( '( 4)_&)$+)( -.( 4)B)/'#( #)'/$"$%( 4,)>46( '$*( '( ."$'#( %-'#?( F&+=( '(;),'>=-/(4));4( ,-(!)('(B)/<("$,&","B)(D'<( -.( /)>/)4)$,"$%( ,=)( #)'/$"$%( )$B"/-$;)$,?( S-/)-B)/6( ",( 4));4(,-(!)('$('*)_&',)("$,)/'+,"-$(>'/'*"%;()4>)+"'##<( .-/(*)'.(&4)/46(4"$+)(",()C>#-",4(,=)(B"4&'#(+='$$)#('4(,=)( ;'"$(4-&/+)(-.("$.-/;',"-$?(( !"#"$%&'#( )*&+',"-$( .-/( 01( 23'4)##"( 5( '#6( 7889:6( '$*( ;-/)( %)$)/'##<( "$( #'$%&'%)( ,)'+="$%( ;),=-*-#-%")46( '4( *),'"#)*( "$( -&/( %/'$,( >/->-4'#?( @)( *)4+/"!)( !/").#<( ,=)( /',"-$'#)6( );>"/"+'#( %/-&$*46( '$*( ;'A-/( '";4( -.( -&/( $-B)#()C>)/";)$,',"-$(-.(D/",,)$(1EF?( G4($-,)*("$(4)+,"-$(H6('##(F1('/)(',(>/)4)$,(D",=-&,( '( D/",,)$( ,/'*","-$?( I-/( 01( D",=( 1EFJ1K6( ,=)( #'+L( -.( '( D/",,)$( .-/;( -.( ,=)"/( -D$( F1( ;'<( D)##( !)( -$)( -.( ,=)( -!4,'+#)4( -$( ,=)( /-'*( ,-D'/*4( '+=")B"$%( '>>/->/"',)( #",)/'+<(4L"##4("$('(#'$%&'%)(M(#"L)(E,'#"'$(J(,=',($-,(-$#<( *-)4( ='B)( '( D/",,)$( ,/'*","-$( !&,( "4( '#4-( ,<>-#-%"+'##<( B)/<(*"..)/)$,(./-;(,=)"/(-D$(24))()4>)+"'##<(-&/(/);'/L4( '!-B)( -$( F1( NEF:?( O)+)$,( /)4)'/+=( 4=-D4( ,=',( E,'#"'$( 4"%$)/4( +'$(>/-.",'!#<(&4)(F"%$( @/","$%(2F@:6('(%/'>="+( 4<4,);( >/->-4)*( !<( F&,,-$( 2KPPP:( .-/( D/","$%( F16( .-/Q( J,/'$4+/"!"$%(1EF(.'+)J,-J.'+)(>/-*&+,"-$4R(J(+/)',"$%6(.-/( ,=)( ."/4,( ,";)( "$( ,=)( ="4,-/<( -.( ,="4( F16( ,)C,4( +-$+)"B)*( *"/)+,#<( "$( D/",,)$( 1EF( 2F@( ='4( !))$( '*'>,)*( .-/( ,=)4)( >&/>-4)4( ,-( 1EF:?( S-/)( ";>-/,'$,#<( .-/( ,=)( >/)4)$,( *"4+&44"-$6( ,="4( /)4)'/+=( 4=-D4( ,=',6( /)#<"$%( -$( F@J)$+-*)*( 1EF( ,)C,46( 4"%$)/4( +'$( '&,-$-;-&4#<( >)/.-/;( ;)'$"$%.&#( +-;>'/"4-$4( !),D))$( 1EF( '$*( 4>-L)$TD/",,)$( E,'#"'$6( ',( '##( 4,/&+,&/'#( #)B)#4( J( #)C"+'#6( ;-/>=-#-%"+'#6(4<$,'+,"+6(,)C,&'#6(>/'%;',"+?(( U$( ,="4( !'4"46( 4"%$)/4( +'$( .-/;&#',)( ;),'+-%$","B)( '$*(;),'#"$%&"4,"+(/).#)+,"-$4(-$(,=)(4,/&+,&/)(-.(1EF('4( +-;>'/)*( ,-( 4>-L)$TD/",,)$( E,'#"'$6( '$*( ;-/)( %)$)/'##<( -$( ,=)( /)#',"-$4( !),D))$( V-/'#",<W( -/( .'+)J,-J.'+)( B4?( D/",,)$( +-;;&$"+',"-$6( "$( '( D'<( ,=',( ='4( $)B)/( !))$( >-44"!#)6( .-/( ,=);6( D",=-&,( /)#<"$%( -$( '( D/",,)$( /)>/)4)$,',"-$(-.(,=)"/(F1(24))(';-$%(-,=)/4(0"(O)$X-(5( '#6(7889R(788PR(Y"'$./)*'(5('#6(788PR(Z"XX&,-(5('#(7889R( G$,"$-/-(Z"XX&,-(5('#6(788[:?(\'L"$%("$(*&)('++-&$,(,=)( +/&+"'#( /-#)( ,=',( ;),'+-%$","B)( '$*( ;),'#"$%&"4,"+( 4L"##4( $-,-/"-&4#<( >#'<( "$( ,=)( *)B)#->;)$,( -.( #",)/'+<( 4L"##46( ,=)4)( /)4)'/+=( ."$*"$%4( ='B)( ;-,"B',)*( &4( ,-( .&/,=)/( )C>)/";)$,( D/",,)$( 1EF6( -$( -&/( ]1ZZ6( '4( '( >-,)$,"'##<( B)/<( >-D)/.&#( >)*'%-%"+'#( ,--#( .-/( >/-;-,"$%( #",)/'+<( '!"#",")4?( F@J)$+-*)*6( D/",,)$( /)>/)4)$,',"-$4( -.( 1EF( ='B)('#4-(>/-B)$(,-(!)()C,/);)#<(&4).&#(.-/('*B'$+"$%("$( ,=)(#"$%&"4,"+('$'#<4"4(-.(,=)(#'$%&'%)(2G$,"$-/-(Z"XX&,-( 5( '#6( 788[:6( >'B"$%( ,=)( D'<( .-/( ;-/)( '>>/->/"',)( ;-*)#"4',"-$4( D="+=( ;'<( !)( &4)*( .-/( !-,=( %)$)/'#( *)4+/">,"B)( >&/>-4)46( '$*( .-/( ";>#);)$,"$%( ,=)( &4)( -.( 1EF('4('(#"$%&"4,"+(/)4-&/+)(-$()J#)'/$"$%(>#',.-/;4?(( @)( $-,)*( "$( 4)+,"-$( ^( ,=',( !"#$% &'$()*% )++,-+'.-% /)++,0-$% '-% 123% ;'<( 4"%$"."+'$,#<( *"..)/( ./-;( ,=-4)( -.( =)'/"$%( #)'/$)/4?( U$)( -,=)/( '**","-$'#( $-B)#,<( -.( -&/( >/-A)+,( +-$+)/$4( ,=)( &4)( -.( )<)J,/'+L"$%( )_&">;)$,( .-/( '$'#<X"$%( 01`4( B"4&'#( ',,)$,"-$( >',,)/$46( '$*( +-;>'/)( ,=);( D",=( ,=-4)( -.( =)'/"$%( #)'/$)/4`6( *&/"$%( #)'/$"$%( ,'4L4( D="+=( *);'$*( ,=)( 4";&#,'$)-&4( >/-+)44"$%( -.( #'$%&'%)( /)4-&/+)4( '#-$%( D",=( B"4&'#( "$.-/;',"-$( -.( *"..)/)$,( 4-/,4?( Z/)#";"$'/<( /)4&#,4( -.( '( >"#-,( 4,&*<( D)( ='B)(+-$*&+,)*("$*"+',)(,=',6("$(>/-+)44"$%(;&#,";-*'#(T( ;&#,"#'$%&'%)( ;',)/"'#46( ,=)( %'X)( >',,)/$4( -.( 01( D",=( 1EFJ1K( ;'/L)*#<( *"..)/( ./-;( ,=-4)( -.( =)'/"$%( #)'/$)/4( 23'>&'$-6(1)B"'#*"(5(G$,"$-/-(Z"XX&,-6(4&!;",,)*:?(@)( ,/&4,( ,=',( ,=)( ;-/)()C,)$4"B)( "$B)4,"%',"-$4(-$(,="4( ,->"+( !"! #$%&'()*+,*-*&./0 \=)(>/)>'/',"-$(-.(,="4(D-/L(D'4(>'/,"'##<(.&$*)*(!<(J,=)( E,'#"'$( S"$"4,/<( -.( ]*&+',"-$( '$*( O)4)'/+=( 2SEeOJIEOd:6(Z/-A)+,(;<*,)0-'-8=%!,)>-,$$%)-5%?0'++,-% ")-8()8,( T( @3A;"( MOdf]8c^\g1( 2788PJ78K7:6( 4))Q( =,,>QTTBBBC&'$,*C6-0C'+R( J\=)( G44-+"',"-$( D0.8,++'% E,*'6'+F6( Z/-A)+,( V@/","$%( 1EF( '$*( F"%$@/","$%W( O-;)6( 2788g( J:?( @)( ,='$L( -&/( +-##)'%&)4( F,).'$-( 1)B"'#*"6( S'/"#)$'( 0)( S'/4"+-6( G$$'( 1'!)##'( '$*( G#)44"-( 0"( O)$X-(.-/(=)#>.&#(+-;;)$,4?(( ( 1"! 2*3*4*&$*/0 G$,"$-/-(Z"XX&,-6(]?6(3="'/"6(E?(5(O-44"$"6(Z?(2788[:?(\=)( /)>/)4)$,',"-$( "44&)( '$*( ",4( ;&#,".'+),)*( '4>)+,4( "$( +-$4,/&+,"$%( 4"%$( #'$%&'%)( +-/>-/'Q( _&)4,"-$46( '$4D)/46( .&/,=)/( >/-!#);4?( D0.6,,5'-8$% .>% +7,( G05% ?.0H$7./% .-% +7,% I,/0,$,-+)+'.-% )-5% D0.6,$$'-8% .>% A'8-% ")-8()8,$C% 1O]3( 788[6( S'//'L)+=( 2=,,>QTTDDD?#/)+J+-$.?-/%T>/-+))*"$%4T#/)+788[T:6( Kg8JKg[?( 22 !"#$%$&' ()&' *+,,+-./%&' 0)12)&' 3' 4#5%,,6%/1*"#7+&' 8)' 9:;<;=)' >%5$#+,6$?' @#A+6-%' %$' ,+-B#%5' .%5' 56B-%5C' %-$/%' D"-$6-##A' %$' E+/6+$6"-5)' 8-' !)' >+/D6+' %$' 0)' (%/FDG%'9%.5)=&'!"#$%&'()'*+,-#('%(&'&.-,(&/'0"$1('()' 2+$.+).",&&' 3+,-+-(' ()' !"4.5)5&' -H' <I<&' A+/5' :;<;&' JJ1KL)' M+N#+-"&' ()&' O%E6+,.6&' *)' 3' 2-$6-"/"' P677#$"&' Q)' 95#RA6$$%.=)'QAR".6%.'E65#+,',%+/-6-B'+-.'.%+S-%55C'+' D"-D%N$'N+N%/)'' M+5%,,6&' 0)M)&' 0+/+B-+&' *)' 3' T",$%//+&' T)' 9:;;U=)' 3.,-#+--."'('&"$%.)6)'!","B-+C'8,'0#,6-"&' M#V+D&' M)' 9:;;;=)' 3+'3+,-#('%(&'!.-,(&' 7$+,8+.&(9' *(&' :".(&'%('*;<4",.4.)5&' 4+6$5' .%' O+-B#%5' -H<J1<U&' P+/65C' WN@/F5)' M#V+D&' M)' 3' 2-$6-"/"' P677#$"&' Q)' 9:;<;=&' QA%/B%-D%&' -"/A%' %$' E+/6+$6"-' .+-5' ,%5' ,+-B#%5' .%5' 56B-%5' C' E%/5' #-%' /%.?S6-6$6"-' -"$6"--%,,%)' 8-' !)' >+/D6+' 3' 0)' (%/FDG%'9%.5)=&'!"#$%&'()'*+,-#('%(&'&.-,(&/'0"$1('()' 2+$.+).",&&' 3+,-+-(' ()' !"4.5)5&' -H' <I<&' A+/5' :;<;&' IK1JI)' M#V+D&' M)&' (+,,%&' P)' 9:;;K=)' P/"R,?A+$6X#%' .%5' D@%/D@%#/5'%-'$/+6$%A%-$'+#$"A+$6X#%'.%5',+-B#%5'.%5' 56B-%5)' 8-C' =$+.)(1(,)' >#)"1+).?#(' %(&' 3+,-#(&@' A>=>3>B' C' =$+.)(1(,)' +#)"1+).?#(' %(&' *+,-#(&' %(&' &.-,(&&' T",)' LY&' Z)' I&' <J1I;' D))EFGGHHH/+)+*+/"$-GCI"%(*.&+).",C()C)$+.)(1(,)C%(&C' M#V+D&' M)' %$' *+,,+-./%&' 012)' 9:;;K=)' 8D"-6D6$F' +-.' +/R6$/+/6-%55' 6-' 4/%-D@' *6B-' O+-B#+B%C' @6B@,F' 6D"-6D' 5$/#D$#/%5&' .%B%-%/+$%.' 6D"-6D6$F' +-.' .6+B/+AA+$6D' 6D"-6D6$F&' 6-' Q)' P677#$"&' P)' P6%$/+-./%+' 3' [)' *6A"-%' 9%.5=&' :($J+*' +,%' !.-,(%' 3+,-#+-(&@' K"1E+$.,-' !)$#4)#$(&@'K",&)$#4)&'+,%'I()D"%"*"-.(&&'0"#$"-'.%' >/#F$%/&'!%/,6-1Z%\']"/G&'<I1II)' (6' [%-7"&' 2)&' >6+-S/%.+&' >)&' O+A+-"&' O)&' O#D6",6&' ^)&' P%--+DD@6&' !)&' ["556-6&' P)&' !6+-D@6-6&' M)*)&' P%$6$$+&' >)&' 2-$6-"/"' P677#$"&' Q)&' 9:;;_=)' [%N/%5%-$+$6"-' `' 2-+,F565' 1' [%N/%5%-$+$6"-C' -"E%,' +NN/"+D@%5' $"' $@%' 5$#.F' "S' S+D%1$"1S+D%' +-.' \/6$$%-' -+//+$6E%5' 6-' 8$+,6+-' *6B-' O+-B#+B%' 9O8*=)' P+N%/' P/%5%-$%.' +$' $@%' K<3!' <,)($,+).",+*'K",L($(,4('",'!.-,'3+,-#+-(&&' Z+A#/&' !%,B6#A&'Z"E%AR%/'<U1:;&':;;_)' (6' [%-7"&' 2)&' O+A+-"&' O)&' O#D6",6&' ^)&' P%--+DD@6&' !)&' P"-7"&' O)&' 9:;;U=&' 8$+,6+-' *6B-' O+-B#+B%C' M+-' \%' \/6$%'6$'+-.'$/+-5D/6R%'6$'\6$@'*6B-'a/6$6-B'b'8-'QO[2' 9%.5)=&' 3MNK' OPPQ@' R"$S&D"E' T$"4((%.,-&' 9RCUVBF' !(4",%' R"$S&D"E' ",' )D(' M(E$(&(,)+).",' +,%' T$"4(&&.,-'"L'!.-,'3+,-#+-(&'&'<<1<U)' (/6B+5&' 2)*)' 3' c"#/%A%-"5&' ()' 9:;;J=)' 2-' %1O%+/-6-B' 0+-+B%A%-$'*F5$%A'S"/'$@%'(%+S'N%"N,%&'T$"4((%.,-&' "L' R!N>!' =$+,&+4).",&' ",' >%2+,4(&' .,' N,-.,(($.,-' N%#4+).",&'T",'8&':&':;1:L)' QS$@6A6"#&'Q)'3'4"$6-%+&'*)'9:;;K=&'2-'Q-E6/"-A%-$'S"/' (%+S' 2DD%556R6,6$F' $"' Q.#D+$6"-+,' M"-$%-$)' T$"4((%.,-&'"L' )D('7.$&)'<,)($,+).",+*' K",L($(,4('",' <,L"$1+).",' +,%' K"11#,.4+).",' =(4D,"*"-W' +,%' >44(&&.J.*.)W&'' QS$@6A6"#&' Q)' 3' 4"$6-%+&' *)' 9:;;Y=&' ^"",5' S"/' (%+S' 2DD%556R6,6$F' $"' +-' %>WT' Q-E6/"-A%-$&' 6-' 3(4)#$(' 0")(&' .,' K"1E#)($' !4.(,4(' A30K!=&' T",)' J<;J&' LLU1LJI)' >+/D6+&' !)' 9:;;U=)' ^@%' A%$@".","B6D+,&' ,6-B#65$6D' +-.' 5%A6","B6D+,' R+5%5' S"/' $@%' %,+R"/+$6"-' "S' +' \/6$$%-' S"/A'"S'O*4'94/%-D@'*6B-'O+-B#+B%=)'8-'QO[2'9%.5)=&' 3MNK'OPPQ'C' R"$S&D"E' T$"4((%.,-&' 9RCUVB@'!(4",%' R"$S&D"E' ",' )D(' M(E$(&(,)+).",' +,%' T$"4(&&.,-' "L' !.-,'O+-B#+B%5@'I<1IU)' >+/D6+&'!)'9:;<;=)'!"#$%&@'&#$%.)5@'*+,-#(A&B'%(&'&.-,(&'()' 5E.&)51"*"-.(' %(&' &4.(,4(&' %#' *+,-+-(/' T$"J*51+).?#(&' %(' *+' &4$.E)#$.&+).",' ()' 1"%5*.&+).",' %(&' J+&' ,.2(+#X' (,' 3+,-#(' %(&' !.-,(&' 7$+,8+.&(' A3!7B/' 0?A"6/%' .de+R6,6$+$6"-' f' (6/6B%/' ,%5' [%D@%/D@%5&'g-6E%/56$?'P+/65'Yh*+6-$1(%-65)' >+/D6+&' !)' 3' (%/FDG%&' 0)' 9:;<;=)' 8-$/".#D$6"-)' 8-' !)' >+/D6+' 3' 0)' (%/FDG%' 9%.5)=&' !"#$%&' ()' *+,-#(' %(&' &.-,(&/' 0"$1(' ()' 2+$.+).",&&' 3+,-+-(' ()' !"4.5)5&' -H' <I<&'A+/5':;<;&'J1<K)' >+/D6+&'!)'%$'P%/6-6&'0)'9:;<;=)'Z"/A%5'%-'i%#'%$'i%#'.%5' -"/A%5' .+-5' ,%5' .%#V' ,+-B#%5' %-' N/?5%-D%' D@%7' ,%5' 5"#/.5' ,"D#$%#/5' .%' ,+' O+-B#%' .%5' *6B-%5' 4/+-j+65%' 9O*4=)' 8-' !)' >+/D6+' %$' 0)' (%/FDG%' 9%.5=&' !"#$%&' ()' *+,-#(' %(&' &.-,(&/' 0"$1(' ()' 2+$.+).",&&' 3+,-+-(' ()' !"4.5)5&'-H'<I<&'A+/5':;<;&'KJ1_L)' >6+-S/%.+&'>)&'P%$6$$+&'>)&'!6+-D@6-6&'M)*)&'(6'[%-7"'2)&' ["556-6&' P)&' ' O#D6",6&' ' ^)&' P%--+DD@6&' !)&' O+A+-"&' O)' 9:;;_=)' (+,,+' A".+,6$f' S+DD6+1+1S+DD6+' +' #-+' ,6-B#+' 5D/6$$+' %A%/B%-$%C' -#"E%' N/"5N%$$6E%' 5#' $/+5D/676"-%' %' 5D/6$$#/+' .%,,+' O6-B#+' .%6' *%B-6' 8$+,6+-+' 9O8*=)' 8-' M)' M"-5+-6&' M)' 4#/6+556&' 4)' >#+77%,,+' 3' M)' P%/$+' 9%.5)=&' >)).' %(*' YZ' K",-$(&&"' %(**[>&&"4.+\.",(' <)+*.+,+' %.' 3.,-#.&).4+'>EE*.4+)+']'^$+*.)6'G'!4$.))#$+/'<,'1(1"$.+' %.' _."$-."' M+.1",%"' K+$%",+)' P%/#B6+C' >#%//+' Q.676"-6&'L<I1LIK)' 8A+7&' 0)' 3' !%-F"-)' (' 9:;;K=)' `(&.-,.,-' H.)D'J*(,%&)' ^@%'08^'P/%55)' c+/N"#765&'c)&'M+/6.+G65&'>)&'4"$6-%+&'*)1Q)&'3'QS$@6A6"#&' Q)' 9:;;K=&' Q.#D+$6"-+,' /%5"#/D%5' +-.' 6AN,%A%-$+$6"-' "S' +' >/%%G' 56B-' ,+-B#+B%' 5F-$@%565' +/D@6$%D$#/%)' K"1E#)($&'a'N%#4+).",&'L_&'JL1KL)' c,6A+'Q)*)'%$'!%,,#B6&'g)'9<_K_=)'=D('!.-,&'"L'3+,-#+-()' M+AR/6.B%&'02C'e+/E+/.'g-6E%/56$F'P/%55)' k"@-5"-&'0)'9<_YK=)'=D('J"%W'.,')D('1.,%)'g-6E%/56$F'"S' M@6D+B"'P/%55)' O+G"SS&'>)'3'k"@-5"-&'0)'9<_Y;=)'I()+ED"$'R('3.2('bW/' g-6E%/56$F'"S'M@6D+B"'P/%55)' O+-%&' e)&' e"SSA%65$%/&' [)k)' 3' !+@+-&' !)' 9<__U=)' 2' i"#/-%F' 6-$"' $@%' (%+S1a"/,.)' *+-' (6%B"&' M2C' (+\-*6B-'P/%55)' P677#$"&' Q)&' P6%$/+-./%+&' P)&' 3' *6A"-%&' [)' 8-$/".#D$6"-)' 8-' Q)' P677#$"&' P)' P6%$/+-./%+' 3' [)' *6A"-%' 9%.5)=' 9:;;K=&' :($J+*' +,%' !.-,(%' 3+,-#+-(&' C' K"1E+$.,-' &)$#4)#$(&@' 4",&)$#4)&' +,%' 1()D"%"*"-.(&&' !%/,6-' l' Z%\']"/GC'0"#$"-'(%'>/#F$%/&'<1<;)' P677#$"&'Q)'["556-6&' P)'3'[#55"&'^)'9:;;U=)'[%N/%5%-$6-B' 56B-%.',+-B#+B%5' 6-'\/6$$%-'S"/AC'X#%5$6"-5'$@+$'-%%.' $"'R%'N"5%.)'8-'QO[2'9%.5)=&'3MNK'OPPQ']'R"$S&D"E' T$"4((%.,-&' 9RCUVBF' !(4",%' R"$S&D"E' ",' )D(' M(E$(&(,)+).",' +,%' T$"4(&&.,-' "L' !.-,' O+-B#+B%5@' <1U)' *+,,+-./%&' 0)12)' 9:;;I=)' 3(&'#,.)5&'%#'%.&4"#$&(' (,' 3+,-#('%(&'!.-,(&'7$+,8+.&('A3!7B']'=(,)+).2('%(' %(' 4+)(-"$.\+).",' %+,&' *(' 4+%$(' %[#,(' -$+11+.$(' %(' *[.4",.4.)5/' ^@m5%' .%' ("D$"/+$' %-' *D6%-D%5' .#' O+-B+B%&'P+/65&'g-6E%/56$?'P+/65'Y)' *$"G"%&'a)M)'9<_U;='*6B-'O+-B#+B%'*$/#D$#/%&'6-'!)#%.(&' .,'3.,-#.&).4&']'^44+&.",+*'T+E($'-)'Y&'<_U;'9/%E)'%.)' O6-5$"G'P/%55&'*6,E%/'*N/6-B&'0(&'<_KY=)' *#$$"-&' T)' 9<___=)' 3(&&",&' .,' !.-,R$.).,-/' =(X)J""S' a' R"$SJ""S)' O+'k",,+&'M2C'(%+S'2D$6"-' M"AA6$$%%'S"/' *6B-' a/6$6-B' 9:-.' %.6$6"-&' <5$' %.6$6"-' <__J=)' 23 Deaf People Education: crossing linguistic borders through e-learning Giuseppe Nuccetelli, Maria Tagarelli De Monte Istituto per Sordi di Roma Via Nomentana 56, 00161 Roma, Italy E-mail: [email protected], [email protected] Abstract The introduction of Web Technologies and the development and spread of portable devices has improved the quality of life of deaf people making distant communication easier. In particular, the development of online systems including video-messaging and the possibility to upload user generated contents, has given deaf people the possibility to rely on other, more direct, means of communication. Similarly, the development of e-learning platforms and their adoption in most Universities worldwide, is shaping the way education is conceived, leading to new and innovative systems merging in-class education with e-learning systems. Our contribution gives a first explanation of how Information and Communication Technology (ICT) can be a strategic resource to give deaf people equal educational opportunities focusing on the development of appropriate language skills, and the strategies through which these opportunities can become effective. Our experience is based on the results and outcomes of DEAL Project (Deaf people in Europe Acquiring Languages through E-Learning), carried out from Istituto Statale per Sordi Roma (ISSR - State Institute for the Deaf in Rome) with co-financing from the European Commission. The objective being that of creating an e-learning model for teaching foreign languages to deaf individuals in professional education, and giving new bases to researches in the field. 1. communication constitute a horizon of authentic interactions in the national written language (or rather, written/spoken) in which deaf people immerge themselves spontaneously and with strong motivation. This means that, inevitably, through these interactions they acquire language skills. In short, the use of new technologies in deaf people education configures for the first time a domain in which deaf people with medium/low skills in the written language can improve themselves through the involvement in real communication phenomena and not only through learning contexts. They can thus acquire languages, not only learn them. Linguistic competences in Deaf People: an integration problem Deaf people officially certified in our country (Italy) are about 60,000, but it is estimated that this number does not reflect the true dimension of the problem. About 11 of every 10,000 children born deaf. Deafness is a deficit, but not a cognitive one. However, School still offers no effective systematic response to the problem of deaf education. The social cost of this situation are enormous: not only deaf people are often excluded from written communication, as well as from the spoken one; in many cases, they cannot perform professional tasks involving minimum competences in written language and cannot access higher levels of education. Researches done in this field (Caselli et al., 2007; Fabbretti et al., 2006), reveal that deaf people, especially those whose deafness aroused in pre-linguistic age (before 18-30 months), have typical problems in the acquisition of written language and in the development of linguistic skills. These problems are specific for each culture and each language, and they are not always comparable. In Italian, for example, deaf people show lacks in the use of free morphology, clitic pronouns, prepositions, articles and so on. This means they need tools and educational methods aimed at resolving them. This is often a difficult task, due to the differences in deaf people logopedic rehabilitation and educational paths, and, thus, their different writing skills. Any possible solution has to adapt both to the type (genetic, sickness, etc.) and degree of deafness (deep, medium, light, partial), as well as the learners’ specific linguistic and communicational competences and abilities. In this perspective, the evolution of web technologies towards portability and adaptability to users’ needs, and the use of educational strategies based on e-learning tools can forecast an enhancement of the effectiveness of the actions directed to this specific target. On the user point of view, the new forms of digital 2. Sign Language as a possible tool for promoting deaf people linguistic competences The condition, however, is that strategies and tools are to be really oriented on the needs and resources of deaf learners. This is the crucial point of the researches and experimentations achieved so far, and can be divided into a number of critical issues that will be considered in the development of our contribution. Most of the findings here described are based on the experience gained working on the DEAL Project (Deaf people in Europe Acquiring Languages through E-Learning)1. In the case of deaf people using sign language2, the role of it in the didactic communication with and within the students is particularly important as part of promoting the development of skills in the target language. In fact, deaf students using sign language find it particularly comfortable as a language to refer to, putting them in the correct emotional condition to become a learner. Within the process of building these skills, we have 1 Please refer to the acknowledgement chapter for further information on the project. 2 All researches and developments of the project here depicted has considered the micro-culture of deaf people using sign language, to which we will refer, from now on, as “deaf people” or simply “the deaf”. 24 considered sign language as the perfect candidate to be one of the cornerstone resources in the design of all activities concerning the didactic communication: research, problem setting and problem solving, meta-linguistic reflection, metacognitive analysis. Building the e-learning platform, we have chosen to use sign language in both the interactions among peers and with teachers, integrating the online educational path with videos and explanations in sign language, and the possibility for the students to obtain further information through the video-chat system. The effective implementation of this strategy has brought up the importance of creating tools specially designed not only to allow sign language interactions regulated according to their purposes, but also to support building of feedback structured on a mosaic of codes. This means not only stimulating the use of sign language, but also creating a feedback system among teachers and learners, as well as between the learners themselves, allowing didactic activities to be really effective. Following what learners are doing, teacher will have the opportunity to intervene with different feedback degrees, tailored on the learners needs. • Videoconference possibility • Forum While following the teaching activities, at various set points along the course, deaf students uses special supports in their own sign languages. There are two kind of support: One way: • Presentation of the teaching unit • Lexical micro windows on the dialogue • Grammatical, syntactic and pragmatic support on the key concepts of the unit • Full translation of the dialogue Bidirectional: • Videoconference among peers • Videoconference with the teaching team The project has produced three courses: German, Italian and Spanish as second languages for the deaf students of the partner countries. For example: Italian deaf students had a Spanish and a German course available. This means that each course has two sign language to support it: for example, the Italian course has both supporting windows in Catalan Sign Language and in Austrian Sign Language. 3. Deaf People in Europe Acquiring Language through e-learning: the construction of a specific educational path The actions forecasted in the DEAL project were meant to significantly operate in this framework, through the introduction of educational tools based on an e-learning strategy, targeting the needs and the specific capacities of deaf adults. In DEAL e-learning based approach, we enhanced the methodological strategies and educational techniques that allowed the action upon those critical features in lexical and grammatical production indicated by the researches carried out in the field: we worked both on a lexicon level and on the linguistic structures for the development of the language skills of deaf learners through the integration of Sign Language in an educational perspective. The system is based on the use of an open source e-learning platform (Moodle) and a videoconferencing system based on Openmeetings/Red5. The choice of Moodle has followed that of many European Universities, adopting this platform for their online courses. Opportune adaptations were studied and applied to meet the needs of the target group (teenager students of technical schools for enterprise secretaries). The applications that have been added are: • Explanation and introductive videos in the local sign language • Animated segments with subtitles upon which educational activities has been developed. • Interactive teaching activities where the tutors can work with the students starting from their questions and their doubts in the educational system. Explanations are thus given from the active interaction with the students and not “from above”. Figure1: example of an Italian comprehension exercise with micro-window explanation in Austrian Sign Language. An interesting issue in working in such a multilingual environment has been, on several point of view, the lack of human resources having the skills and capacities required from the project: i.e. a tutor capable to sign in Catalan Sign Language to give information about German or Italian language course. This could be an issue to discuss in an international environment, also for the construction of possible professional figures. 4. Evaluating the DEAL platform, issues and future develooments The DEAL project has begun in September 2006 and the main prototype test has been carried out in May 2008 in Italy for the Spanish course. The experimentation took place in the Istituto Statale Superiore Magarotto (ISISS State High School “Magarotto”). Eight deaf teenagers has 25 participated, all students of a high school for commercial secretaries, of which six have accepted to reply to the final interview. They were all familiar with computers and have never studied Spanish. The platform has been tested in a blended modality, having a technical support in the classroom as well as a teacher they could ask questions to. The experimentation has also tested both the asynchronous and synchronous interaction modality. During the test, while following the course indications, the students could share their questions both in a Forum (asynchronous modality) or a Videoconference environment (synchronous modality) where the teacher would reply to questions through the help of an interpreter. The materials used to collect the information coming from the experiments has been: anamnesic questionnaires for teachers, observation checklist filled by the researchers, and a final interview to participant students. Anamnesic questionnaires for teachers has collected personal data of the participants, information concerning the type of deafness, her familiar situation, and her linguistic competences in Italian and foreign languages, if any, both in vocal or sign language modality. Observation checklist were filled by 2 researchers per participant, in three sessions of 20 minutes each situated in the beginning, in the middle and in the end of the experimentation. The information collected in this phase being the interaction of the students in the classroom and with the teacher, the chosen linguistic form, and other free observations. At the end of the test, participants were asked to express their opinion upon the degree and type of knowledge achieved during the course, a comparison with traditional in-class courses, feelings about the interaction with the system as a whole and possible suggestions on how to improve it. The results have confirmed the validity of the chosen educational methodology, as the participants have confirmed learning something new about Spanish in a more stimulating and fascinating way. Participants liked using the videoconference system as well as the sign language explanatory windows, which has been considered a funny and clear way to achieve knowledge. However, the overall data collected in this phase has revealed the need to improve the overall navigation in the system, making the whole online experience more “friendly”. We believe that a solid evaluation of the platform will come with its use within the deaf community to which the system has been made available on the project website. However, the experimentation has given important information not only for that concerning the methodology to use on an e-learning platform, but also for that concerning the management of language codes and system interfacing. Not only the educational path needs to be adapted to the e-learning model, but also the quantity and quality of information to give in each step must be managed according to the user’s special needs and visual skills, as sight is the only sense in which all the information are conveyed during the interaction with the platform. 5. The management of time and space on an e-learning platform for the deaf: the importance of data transmission efficiency Developing an e-learning platform for the deaf also requires a special attention to the management of time and screen space (Keatin & Miru, 2003). This has emerged clearly during the experimentation phase of the DEAL project when, for example, giving signed explanations of words or grammatical segments. In cases like the one described here, giving students enough time to pass from the sentence under analysis (written text) to the video/chat is fundamental for both educational and motivational reasons. Teacher, computer screen, (eventual) interpreter, and other students play the role of “educational objects” taking their turn in the construction of sense for the student on both a spatial and linear line. On the spatial line, all “educational objects” must be positioned in order to allow students to return to the selected resource when needed, well localized in space and not undergoing changes. The linear line will be that of “taking turn” in the dialogical relationship among the “educational objects”, and the amount of information given. In a multilingual educational environment, in blended learning, where in-class sessions are completed by sessions with online tutors, this becomes particularly important. The role of the tutor is that of providing further adaptability to the course contents, cut upon the single learners’ specific needs. To have the tutor online while developing educational tasks means that every single learner will have the possibility to ask questions about the course content, in a dialogical relationship with the tutor and the other students. Similarly, this feature allows the tutor to monitor the class development in relation to the course contents and to manage the students’ community discussion in order to enhance learning in particular fields. A possible scenario for this case is that of the student being home while the tutor follows her and other students in a separate ambient. Students are given the possibility to follow tutor explanation both on video or written chat. Deaf students are continuously engaged in following and decoding messages through the only sense of sight. In a context like the one described above, their cognitive resources are thus engaged in processing at least three different codes: text, sign language video and teacher’s explanation. This means that, in the hypothesis of a teacher who is also a sign language speaker, s/he will have to give students enough time to allow sight to complete the video message decoding, eventually integrated with hints given through the written or video chat, think and then reply either in sign language or on written chat, in a distant construction of sense. The depicted situation is furthermore complicated in case of teachers who are non-signers, and the interpreter figure needs to be added. 26 An incorrect management of these types of interaction could lead to frustration, demotivation and possible abandon of the learning session. This is also the case when working on deaf people writing skills enhancement in the learners’ local language (i.e. Italian deaf learner – Italian written language): it is proven that deaf people approach to written language is often affected by the difficulties faced during their linguistic rehabilitation and scholastic path, and the frustration they experience in constructing their writing skills (Fabbretti et al. 2006). A proper management of screen space and time will impact the emerging relationship between students and teachers and the construction of the learning environment. In fact, while in the case of hearing students speech and sight works contemporarily in the construction of sense and on two different levels (student can watch the screen contents while listening to the teacher’s explanation), in the case of deaf students there is only one level to work on, sight, which is engaged in receiving multiple inputs contemporarily. Visual elements in the screen should be managed in order to be highly visible, easy to decode, and giving good navigational cues also for the enhancement of the ongoing interactions in the system. This great use of video and visual communication tools, makes data transmission quality one of the main issues of e-learning platforms for the deaf. Real-time online video communication such as video-chat for sign language or lip movement are strongly affected by the efficiency of data transmission, as this should be as close as possible to real people movements. Many are, in fact, the cases in which multiple video chats makes communication between deaf people (either bimodal or oralists) nearly impossible, due to the scarce quality of video transmission. This constitutes a strong limit in the development of online educational solutions for deaf people. As it’s possible to understand, a lack of efficiency in video transmission, a poor website visual objects management and a incorrect management of time could end up to a loss in deaf students comprehension of the main topics and their motivation in following the course. 6. In this framework, thus, we need to search the best structure for educational communication with deaf learners and the role given to sign language in the variety of possible codes. This point is strictly related to the interaction regulation (learner/learner, learner/teacher, etc.) and time balancing (synchronous, asynchronous) to grant the maximum efficiency in the learning environment. One of the results of our researches has been that the educational interaction in video conferences requires a definite number of participant. Basing on the DEAL experience, our hyphotesis is that an optimal number for a smooth interaction could be that of 4 people: i.e. one tutor and 3 students. However, the problem of a system like this is the regulation of speech turn and the different communicational channels balancing: i.e. video-chat vs. textual chat vs. working area where the student is involved in her educational activity. There is a problem in optimizing sign language as a mean of educational communication in an environment in which the target language remains written and, in multilingual environment, is a foreign language. The problems we have developed so far are surely strategic with regards to the target group, but they also have a relevance that seems to go beyond this specific scenario. In a “regular” educational environment, there are issues that are normally underrated due to the redundancy of communicational possibilities between hearing people who are able to pick up the information they need from the ongoing communicational process. Working on a multilingual platform for deaf people education has thus opened reflection not only on the specific problems that this type of user could meet but have also given a base for reflection on the nature of educational communication in foreign language learning. In fact, these problems shows that the educational communication in e-learning environments shows inefficiency margins, amplified but not generated by deafness. Working towards the solution of these issues can thus have important theoretical implications also in the frame of second language education in digital learning environments. Conclusions Being one of the first experiences in Europe trying to teach a foreign language to deaf students through the support of e-learning, DEAL project has focused mainly on the structure of the didactic content, and the use of sign language and short “explanation” windows in a complementary and innovative way, in order to support several type of deaf learners needs. This has challenged other aspects of the educational path, such as the selection of the best technology to use, the design of a correct interface for deaf learners, the combination of multiple communicational channels and the “rhythm” of the ongoing interactions in the system. One of the points that the DEAL project has aroused is the importance of creating a collaborative network among students and tutors, through the use of an effective and reliable technological support. 7. Acknowledgements The DEAL project has been financed by the European Union within Leonardo’s programme, and immediately received the European Label 2008 for Innovative Projects in Language Teaching and Learning [www.deal-leonardo.eu]. Partners of the project has been: Istituto Statale per Sordi di Roma - ISSR, Istituto di Scienze e Tecnologie del CNR - ISTC, Istituto Superiore di Istruzione Specializzata per Sordi Magarotto – ISISS [Rome, Italy], Universitat de Barcelona, Fundaciò del Centre d’Estudios de Llengua de Signes Catalana [Barcelona, Spain], Klagenfurt Universitat [Austria]. The project has been re-funded in the frame of Leonardo’s Transfer of Innovation programme 2009-2012 that sees the University College London as a new partner of the 27 project, in place of Klagenfurt Universitat. The ISSR has recently begun working on a project for the improvement of deaf people Italian writing skills through e-learning (VISEL). Both authors are in complete agreement for that concerning the paper’s contents. Main contributor for chapters 2,3 and 4 has been Dott. Giuseppe Nuccetelli while Dott. Maria Tagarelli De Monte is to consider the main contributor for chapter 1, 5,6 and 7. 8. References Keatin, E.G., Miru, G.S. (2003). American sign language in virtual space: Interactions between deaf users of computer-mediated video communication and the impact of technology on language practices. Language in Society, 32(05):693-714. Elsendoorn, B.A.G., Coninx, F. (1993), Interactive Learning Technology for the Deaf, Proceedings of the NATO Advanced Research Workshop on Interactive Learning Technology for the Deaf. The Netherlands, NATO ASI Series, Computer and Systems Sciences, 13(113): p. 285. Fabbretti, D., Volterra, V., Pontecorvo, C. (1998). Written language abilities in deaf italians. Journal of Deaf Studies and Deaf Education, 3(3):231--244. Pizzuto, E., Caselli, M. C., Volterra, V. (2000). Language, cognition, and deafness. Seminars in Hearing, 21(04):343--358. Rinaldi, P., Caselli, C. (2009). Lexical and grammatical abilities in deaf italian preschoolers: The role of duration of formal language experience. Journal of Deaf Studies and Deaf Education, 14(1):63--75. Maragna, S., Nuccetelli, G (2008). An e-learning model for deaf people’s linguistic training. Proceedings of the DEAL project final meeting. Publicacions I Edicions de la Universitat de Barcelona. Fabbretti, D., Tomasuolo E. (2006). Scrittura e sordità. Roma: Carocci Editore S.p.A. 28 BONy: a knowledge centric collaborative learning platform Alfio Massimiliano Gliozzo, Concetto Elvio Bonafede, Aldo Gangemi *STLab-ISTC-CNR Via Nomentana 56, 00161, Rome, Italy [email protected], [email protected], [email protected] Abstract In this paper we describe BONy, a technology enhanced platform for collaborative learning. Semantic technology, and in particular an RDF/OWL ontology, is used to integrate different modules of the system, allowing strong interoperability between linguistic data and structured knowledge. This allows us to develop intelligent advanced functionalities, including expert finding, mentoring and semantic search. Those functionalities largely exceed the capabilities of existing state of the art e-Learning platforms, for example allowing multilingual search. BONy is an unique showcase for the next generation semantic systems for e-Learning. The BONy platform is currently working as a free on-line service. 1. Introduction 1. enhance interoperability and system integration Electronic learning (e-learning) is a type of education where the medium of instruction is computer technology. It is a planned teaching/learning experience using a wide spectrum of technologies, mainly internet based, to reach learners at a distance. The base units of e-learning systems are called learning objects. They are resources, usually digital and web-based such as HTML pages or animations, that can be used and re-used to support learning. They represent an atomic piece of knowledge and are composed into courses. At their core there will be instructional content, practice, and assessment. The way in which the units can be stored, retrieved and managed has been the focal point of most Learning Content Management Systems (LCMS). The actual mechanisms to manage the learning objects, mainly based on web standards such as XML, is not able to face the new requirements of collaborative learning, where teachers and users are no longer two different players in the network. In fact, in a web2.0 perspective, students are asked to supervise other students and are supposed to actively contribute to the development of learning objects, playing the role of professors with respect to the areas of expertise where theirs skills are higher. In addition, in a collaborative learning scenario, the student is typically exposed to a very highly unstructured information (e.g. wikis developed by other students, forums, chats), requiring the intervention of a professor or an expert in the field to recommend a personalized learning path and to ensure the selection of high quality content. On the other hand, non-semantic technology, such as web 2.0 platforms, do not allow us to implement a fully automatic system satisfying the new needs of collaborative learning, and in particular to represent the user profile and assess his skill. To this aim, semantic technology such as ontologies can play a big role, for example to represent the user profile with respect to different subjects and to represent the content of learning objects. To this purpose, within the BONy project, we looked forward to semantic technologies, anticipating the next generation WEB 3.0 solutions for eLearning while providing a showcase of the new generation capabilities. BONy is a knowledge centric LCMS where a core ontology is used for two main purposes: 2. integrating linguistic information from learning objects with structured information from databases 3. allow intelligent services such as expert finding, mentoring and semantic search The core component of the system is a ”RDF/OWL” ontology, developed according to the best practices and by applying Ontology Design Patterns (Gangemi, 2005; Presutti et al., 2008; Reich, 1999; Svatek, 2004). As far as interoperability and system integration are concerned, the ontology is used to enhance the integration of three existing open source platforms: a LCMS (DOKEOS) (Grandmontagne., 2008; ?), a framework for social networking (SPREE) (Bauckhage et al., 2007; Metze et al., 2007) and a collaborative authoring tool (Semantic Media Wiki) . The ontology is automatically populated by re-engineering data from the different databases exploited by the three platforms integrated so far. A mayor role of the ontology is linking the textual data to the knowledge structures. This is done by extracting keywords from the text embedded in the Learning Objects, and associating different keywords to a set of topics of interest for the domain of the course. This allows us to map different courses in different languages to the same topic structure, and to improve search and multilingual retrieval. This allows us to implement a set of intelligent functionalities, including an automatic mentoring algorithm designed for the generation of personalized learning paths, multilingual search and expert finding. To this aim, we connect Learning Objects and user profiles with a shared taxonomy of topics describing the content of the e-Learning course, and we used SPARQL queries and a reasoner. This has been done by extracting keywords for each course The platform is currently working as an free on-line service, available on the web at the address social.bonynetwork.eu . We invite the reader to join the BONy network and feel the different user experience provided by semantic technology in use. This paper is structured as follows: in section 2. we illustrate the architecture of the platform, section 3. is devoted to describe the ontology used in system, in section 4. we 29 To allow modularity, the ontology has been subdivided into three main components (see Figure 1): describe the intelligent functionalities of BONy. Section 5. concludes the paper. • Topic Ontology: it describes the subjects covered by the eLearning course and their conceptual dependencies. Topics are instances of the class TOPIC, and they are related between each other by the object properties isSubTopicOf and nearTopicTo connecting different instances of the same class. 2. Architecture The BONy platform is an integration of three existing open source platforms: DOKEOS, SPREE and Semantic Media Wiki. The architecture of the system is described in Figure 1: an RDF/OWL ontology is used to represent data coming from the different databases adopted by the integrated open source solutions. The ontology describes semantically the three main components of the platform, and in particular the Learning Objects, the European Project Management domain (Topic ontology) and the user profile in the social network (User Ontology) as in Figure 1. Differently from other e-Learning system, data is represented in the ontology in RDF/OWL format. In addition, when data is represented into the ontology, it is also linked semantically to a topic ontology, describing the content of the course. In particular, user profiles and learning objects are linked together across topics. The richer expressivity of this formalism allows us to develop semantic functionalities such as user profiling, learning path generation and expert finding. Thanks to the ontology it is possible to enhance the consistency of the inserted data. This is done by using a reasoner to check the consistency of the entire database every time new data is inserted. The technology adopted to represent and manage the data in the ontology is based on state-of-the-art Java open source solutions: Jena1 , Pellet 2(Sirin et al., 2006), and Protégé 2 . We used protege to build the ontology, Jena to access the ontology and Pellet to reason on the data. The access to the ontology from the various sub-system is implement by adopting a client/server architecture developed in Java. • eLearning Ontology: it is about the learning objects and describes different features, e.g. the type of electronic support adopted, dependencies between learning objects and the time required for learning. This part of ontology is composed by different classes such as: LearningActivity, SCO and CourseRole. The instances of those classes and their relations have been mostly derived from the corresponding SCORM descriptions by a reengineering process. • User Ontology: it is about the social network players, representing students and teachers’ profiles, their relationships and their skills. All the users in the network are represented by instances of the class AGENT. Specific subclasses are STUDENT, TEACHER and EXAMINER The topic ontology operates as a link between the eLearning ontology and the user ontology. For example, users and SCOs can share a relation with a common topic, allowing the development of recommending services and the automatic assessment of the user profile. Users are linked to topics by the knowsTopic relation, reflecting their skills into 5 specific degrees: knowsMediocre, knowsBasic, knowsFair, knowsGood and knowsPerfect. In a similar way, Learning Objects are linked to Topics by the relation hasTopic. This is derived by the keywords annotation performed on the learning objects and represented in the ontology as well. In fact, keywords are linked to topics, allowing to infer the has topic relation between topics and learning objects. Our ontology is developed according to the Ontology Design Pattern (Gangemi, 2005) (ODP) paradigm, i.e. utilizing and specializing some already existing reusable ontology to describe particular piece of domain knowledge. An ODP is usually a small ontology that solves complex modelling issues to enhance semantic interoperability of different knowledge components. The notion of ODP was introduced in 1999 for a particular problem domain in biology (Reich, 1999). Afterwards, ODP appeared under different names such as semantic patterns, knowledge patterns and the designing patterns for Semantic Web ontologies that are now called ODPs. A large repository of ODP is available on line3 . 3. The BONy Ontology The development of the BONy ontology has been inspired by the following principles: • re-usability: when adapted to a new course, the OWL schema of the BONy ontology is preserved and only RDF data change, it allows us to minimize the adaptation costs to new domains; • modularity: the ontology is composed by three modules representing the eLearning content, the social networks and the topics of the course, it allows us to change the course while preserving the community; • best practices: the ontology has been designed by specializing Ontology Design Patters (ODP) Regarding re-usability, we carefully distinguished the OWL part of the ontology (i.e. the metamodel) from the actual data. In this way, the platform can be adapted to new communities, learning Objects and topics without any change in the ontology. To this aim, the topic taxonomy has been reified, so that topics are instances and not classes. 1 2 3.1. Populating the ontology The ontology is populated by re-engineering data coming from different databases belonging to different applications, and in particular: a) from the e-Learning course (described by the Manifest file in the SCORM syntax) b) jena.sourceforge.net protege.stanford.edu 3 30 http://www.ontologydesignpatterns.org Figure 1: Services and data are linked by the BONy ontology Figure 2: The Bony Ontology: concept hierarchy from the user profiles in the LCMS and in the Social Network (see Figure 1). In addition, the Topic ontology has been manually populated by topics of interest for the domain of the course and their relationships. meaningful taxonomy describing the project management domain. Mentoring Populating the eLearning Ontology To populate the eLearnign ontology we re-engineered data from SCORM to RDF following the metamodel developed for the eLearning ontology, which basically reflected the SCORM distinctions. To this aim, we represent some of the relevant distinctions in the SCORM definition into properties of the ontology. This process is totally automatically and is performed once the course is loaded into the platform. This is done partially by re-engineering the XML based metadata in the SCORM manifest file. In order to connect the Leaning objects to the topic ontology we exploited the keyword annotation developed to build the topic taxonomy and we inferred the relations between topics and learning objects if one or more keyword is associated to both. This is done by a CONSTRUCT query in the SPARQL language. Populating the Topic Ontology The ontology class Topic is one of the most important classes. It allows us to link users and learning objects. In our Ontology we use a semiotic notion of Topic as a (usually potential) collection of SocialObject(s). For example, Project management is a topic constituted by the set of social objects that are associated with project-management related entities, such as tasks and deliverables. Topics are related each other by Narrower and Broader relations. The procedure adopted to build the topic ontology was entirely manual, but at the same time inspired by quantitative principles aimed at preserving a pretty uniform distribution of learning object for each topic. To achieve this, we first selected a set of keywords describing each learning object, then we look for their corresponding pages in wikipedia, in order to find their corresponding category. We select those categories as topic after manual revision, and we browse the narrower/broader relationships among them to figure out a meaningful taxonomy a Populating the User Ontology The user data are derived from a variety of different systems integrated in the BONy platform. The ontology allows us to integrate different frameworks such as DOKEOS (where personal data are 31 Figure 3: Interface for user profiling in the BONy platform collected in a database) and SPREE (a open source knowledge exchange network) where the data about the knowhow of the single user are registered. Relations between Users and Topics are first established at registration time by the user profiling module, and then refined by the user at any time. Synchronization between the user profile in the Social Network and the ontology is guaranteed by updating the ontology at every change. 4. Then, the BONy platform uses Information Retrieval and Natural Language Processing techniques to match the content described so far with the topic ontology, in order to establish new relations between her/his profile and topic ontology. To this aim, we exploit one of the core capabilities provided by the SPREE framework, which is able to crawl specified sites, extracting bag-of-words, and therefore representing each page in a vector space model, and then measuring the similarity among vectors associated to users and those associated to topics in the ontology by cosine similarity. To this aim, the SPREE platform generate bag of word vectors for each topic in the ontology when it is installed by adopting a very similar approach to what described for the user (Bauckhage et al., 2007; Wetzker et al., 2007). The result of this process is a preliminary assessment of the user skills that can be further refined by the user itself by adding new topics or modifying the degree of relevance of each category, ranging from basic to perfect. This is illustrated by right part of Figure 3. The user model obtained so far is then stored in the ontology while checking the logical consistence. The aim of this service is to recommend a minimal sequence of learning objects to a new user on the basis of his profile. This set will be generated automatically by an algorithm whose goal is to select a sequence of learning objects so that the user is not studying subjects he is already aware of, while concentrating on filling the gap between her/his initial user skills (i.e. those inferred by the user profiling module described in the previous section) and the full range of topics covered by the course. The goal of this process is to minimize the time required to study the full topics of the course while avoiding subjects already well known, while taking into account dependencies between learning objects. The output of this process is illustrated in Figure 5. Clicking on the ”yes, I would like to try”, the automatic mentoring process starts and after a few seconds returns the the sequence of learning objects where a subset has been marked by a green sign (see righter part of Figure 5), meaning that the student does not need to go trough them since he is already skilled in the subject. The effect of this process is that the system generates a minimal set of learning objects, avoiding the student to go thought the full course, which Intelligent Functionalities of the BONy platform The BONy platform provides three main semantic services which are far behind the capabilities of current eLearning technology: mentoring, (i.e. the generation of personalized learning path within the course on the basis of the user profile), semantic expert finding (i.e. looking for experts within the network which are able to answer to specific questions) and multilingual search (i.e. the capability of retrieving Learning Objects in any of the 11 different languages of the BONy course). Mentoring and expert finding are based on the user profile, automatically inferred by the platform and represented in the ontology. Even though some of them have been already proposed in the literature, BONy is the first working platform implementing all them at the same time in a integrated environment, thanks to the massive use of semantic technology. User profiling The user profile is represented in the ontology and consists of biographic data, such as email, name, address, as well the assessment of the user skills. The user skills are represented by relations between them and topics in the ontology, as described in section 3.. The BONy platform is able to assess the competence of each users in a semi-automatic way, by looking at web pages and other content indicated by the user as a reference material for his competence. This process is easy, quick and effective, and works as follows. Every time a new user is enrolled in the system, she/he is asked to enter a set of web pages describing her/his skills (e.g. her/his home page, the home page of her/his university or organization). In addition, she/he is asked to enter a set of keywords describing her/his skills. This is illustrated in the the left part of Figure 3. 32 Figure 4: Expert finding process. When a question is submitted, the system categorizes the question (categories box on the left figure) and searches the experts (Experts box on the right). will take around 5 hours in the European Project Management case study. Rather he is supposed to study less, saving time (about 1 hour in the example in Figure 5) To implement this service, a typical approach in Artificial Intelligence is to use a planner. Given the reduced number of constraints and the relatively small scale domain, it was possible to implement the same set of capabilities in a much simpler way by defining ad-hoc SPARQL queries and using a reasoner. This generates a planner that is different from those using a rigorous logical formalism and a clear definition of goals. Instead using SPARQL we can make an approximation because the objective is not formalized. In fact, each learning object is linked to one or more Topics. This allows us to link the user profile (degree of knowledge in the different Topics) with the learning objects regarding topics he knows better. A simple SPARQL query allows us to select all those Learning Objects about topics that are not in the user profile, generating the mentoring service we are interested in. This service is implemented by adopting the Jena API to perform the SPARQL queries and Pellet 2 to reason on the data. rum and can be ranked by using a feedback mechanism, in order to assess the reputation of users in the network and to promote new experts for forthcoming questions. Figure 4 presents an example of the expert finding process, showing the categories of the question and the retrieved users. Multilingual Search All learning objects and their textual content have been indexed by a search engine (i.e. Lucene). The index is done by using the text within the slides and the keywords associated to each of them. As far as the learning objects are aligned among languages by a common representation in the ontology, it is possible to write queries in any language, and to retrieve pages in different languages. Expanding text in learning objects by the keywords in the ontology is also a way to implement semantic search. Figure 6 describes a screen-shot of the search engine and his multilingual capabilities. 5. Conclusion and future work In this paper we presented BONy, a technology enhanced platform for collaborative learning using semantic technology to enhance interoperability between systems and to allow advanced functionalities such as including expert finding, mentoring and multilingual search. Those functionalities largely exceed the capabilities of existing state of the art e-Learning platforms. BONy is an unique showcase for the next generation semantic systems for e-Learning and can be used on line at the address social.bonynetwork.eu . The main focus of our work has been showing the new capabilities allowed by connecting linguistic data with knowledge bases, how to represent this information into a proper knowledge base and how to make it interoperable with linked data in the semantic web. Therefore we did not concentrated in boosting the performances of the single components, for example by using richer ontologies or more advanced Natural Language Processing techniques. In the future, we are going to develop the 3.0 version of the BONy platform, where semantic web data will play a big role to shift from an information to a knowledge centric system. In particular, we are going to implement a knowledge centric authoring tool for learning objects, where semantic web Expert finding The role of the expert finding service is to look for other students in the network which are able to answer a specific question. Every user is regarded as a possible expert on the Topics where is user profile has stronger association. BONy is able to look for suitable experts by simply classifying questions with respect to the topics in the ontology, which is the same adopted to represent the user skills. To this aim, a bag of words for each topic in the ontology is retrieved from on-line or off-line resources. The same process is done to describe the user profile. Then each expert is mapped to one or more topics by using similarity metrics (Bauckhage et al., 2007; Wetzker et al., 2007). Every time a new question is submitted, the system classifies it with respect to the topic ontology. Classification is used together with a similarity measure between the query and expert profiles in order to select the first five top scored experts. The question is then automatically sent to them by email. The answers collected so far are then stored in a public fo- 33 Figure 5: Output of the mentoring process 6. References C. Bauckhage, T. Alpcan, S. Agarwal, F. Metze, R. Wetzker, M. Ilic, and S. Albayrak. 2007. An intelligent knowledge sharing system for web communities. In In IEEE Int. Conf. on Systems, Man, and Cybernetics, Montreal, Canada. A. Gangemi. 2005. Ontology design patterns for semantic web content. In Proceedings of the ISWC 2005, volume 1729 of Lecture Notes in Computer Science (LCNS). Y. Grandmontagne. 2008. Technical report, DOKEOS. Available via http://www.dokeos.com/en/press. F. Metze, C. Bauckhage, T. Alpcan, K. Dobbrott, and C. Clemens. 2007. A community based expert finding system. In Proceedings of IEEE Int. Conf. on Semantic Computing., Irvine, CA. V. Presutti, A. Gangemi, S. David, G. A. de Cea, M. C. S., E. Montiel-Ponsoda, and M. Poveda. 2008. Deliverable 2.5.1: A library of ontology design patterns: reusable solutions for collaborative design of networked ontologies. Deliverable Project Number IST-2005-027595, NeOn: Lifecycle Support for Networked Ontologies. J. R. Reich. 1999. Ontological design patterns for the integration of molecular biological information. In Proceedings of the GCB’99. E. Sirin, B. Parsia, B.C. Grau, A. Kalyanpur, and Y. Katz. 2006. Pellet: A practical owl-dl reasoner. Technical report, Maryland information and network dynamics lab semantic web agent project. V. Svatek. 2004. Design patterns for semantic web ontologies: Motivation and discussion. In Proceedings of the 7th Conf. on Business Information Systems., Poland. R. Wetzker, T. Alpcan, C. Bauckhage, W. Umbrath, and S. Albayrak. 2007. An unsupervised hierarchical approach to document categorization.. In IEEE Intl. Conf. on Web Intelligence (WI’07), Silicon Valley, USA. Figure 6: Fulll text multilingual search inside the eLearning content data are composed by Ontology Design Patterns specialized on the subject of interest for the course, we are going to exploit agent based technologies for advanced tutoring and mentoring, we are going to replace the retrieval engine with a more powerful recommending engine, looking for semantic web data as well as for internal repositories of learning objects. Last but not least, we are going to explore the potentiality of applying advanced NLP tools for information extraction from text and to link the extracted information to dictionaries like wordnet and other linguistic resources available from the collaborative work, such as wikitionaries and DBpedia. Acknowledgments This work has been supported by the BONy project, financed by the Education and culture DG of the EU, grant agreement N 135263-2007-IT-KA3-KA3MP, under the Lifelong Learning Programme 2007 managed by EACEA. 34 Social E-SPA C ES: socio-collaborative spaces within Virtual Worlds Vanessa C amilleri M atthew Montebello University of Malta Malta E-mail: [email protected], [email protected] A bstract This paper presents research based on a current study validating the effectiveness of the teaching and learning process in the context of virtual spaces. A report about teens and social Media (Lenhart, Madden, Rankin Macgill, & Smith, 2007) reveals that 93% of the teens who were interviewed use the Internet !"#!#"$%&!'#())*&+,#-'!%).##/0&"1#%$2-')3#4&*0#5)%)+*#&+*)5+)*#2"!,)#"*!*&"*&%"1#)"*!6'&"0)"#73&,&*!'# +!*&8)"9#!"#!%*&8)#-!5*&%&-!+*"#&+#*0)#3)"&,+#$:#+)4#()3&!#!"#"$%&!'#%$''!6$5!*&8)#*$$'".#;$2'3#*0)")#"$%&!'#*$$'"#6)#)::)%*&ve in the e-learning context or will they form part of a wider knowledge management framework? The purpose of this study is to outline the design of the measurement of interaction processes in the virtual spaces used for e-learning. interaction processes, rather than knowledge acquisition. The socio constructivist approach in this scenario describes learning as a collaborative meaning-making experience where learners participate in a number of interaction processes which facilitate the learning process. The interactions between learning communities, as well as individuals within the learning communities, as has been argued by Alier (2006), in essence would enforce the reason for existence of Virtual Worlds (VWs) transforming gaming into serious ga ming, breeding social communities of practice (CoP) which eventually develop into learning communities. The scope of this study is to create the framework for the measurement criteria assessing the validity of VWs for the teaching-learning process using human-behaviour parameters. The rest of the paper is structured as follows: Section 2 gives a brief overview of the insights into e-learning perspectives, whereas Section 3 highlights the pedagogical value of collaborative spaces. Section 4 has a look at established pedagogies which can be implemented in VWs whereas Section 5 and Section 6 propose a design and framework parameters for E-SPACES. Section 7 looks at future developments of the framework for the measurement and validity of the effectiveness of social collaborative process in the VWs. 1. Introduction Tiropanis, et al., (2009) discuss how the level of adoption and use of tools and services within the higher education sector in the UK associated with teaching and learning, are various. In addition to these tools and services in the form of Web2.0 applications, or Learning Management Systems (LMS), a number of educational institutions, make use of Virtual Worlds (VWs) for various learning activities (NMC, 2010). These learning activities take the form of seminars or tutorials or simulations as well as other problem solving exercises, engaging learners in their knowledge building process (Wrzesien & Alcaniz Raya, 2010). More learning activities are described in Petrakou (2009) who illustrates in detail the scope of having a specific virtual environment for the facilitation of transmission of the online content whilst Kumar et al. (2008) portray the VW as being a social environment which holds computer-based simulations which users can make use of without any pre-defined objectives but which yet assimilates groups of people together through an expression of interest. Carey (2007) argues that VWs are intended to be immersive social experiences which not only offer alternatives to face to face interactions but which can also provide new forms of human experiences, built upon a vast array of communication tools which can offer the same emotional satisfaction as gathered from the social exchanges happening on the daily basis. This of course is discussed within the context of the online environment which is discussed extensively in (Dillenbourg, Lehtinen et al., and Slavin in Petrakou, (2009)). These describe how the transition towards the migration of learning content to the online environment is further assisted by a number of interaction processes within collaborative learning. The latter, being one of the pillars of the design for e-learning systems contributes to the construction of new concepts, collectively brought together through communities, most often, established by dialogical interaction (Etelapelta & Lahti, 2008). The premise of this study is built around learning theories which adopt the socio-constructivist approach (Vygotsky, 1978) describing knowledge construction through 2. E-learning Perspectives Over the years the use of ICT in education has shifted from mere Computer Based Learning (CBL), making use of software as a means of knowledge transmission, to Computer Enhanced Learning (CEL), which aims to improve the environment for creative knowledge practice. Studies have in fact shown that merely pushing content online is not returning the results expected (Spalter & Simpson, 2000). Solimeno et al (2008) show, 0$4#2-#2+*&'#*0)#'!*)#<=>?9"# the first models of computer supported education, put the learner as a solo-user creating an isolated niche where the -5$($*&$+#$:#')!5+&+,#!*#$+)9"#$4+#&+3&8&32!'#-!%)# 4!"# highlighted. However more recently due to the social networking boom, researchers have been looking at a more advanced form of computer supported collaborative learning, as an additional enhancement to the online 35 teaching model. Such a derivation of the CEL is based upon constructivist learning theories which focus primarily on the social interdependence as affecting the learning process. This in fact has given rise to a new evolution to the use of learning management systems, which in addition to providing content, are also providing some means of online interactivity, paving the way for social interactions as a means of constructing knowledge concepts. /$3!@9"# )-learning paradigm has shown an evolution from Learning Management Systems (LMS) where the scope is that of utilising the web as a pipe, there only to deliver content, to a meeting point, a place where to hang out with others in specific CoP. Brophy (in Paechter, Maier, & Macher, 2010) propose five fields of instruction as being core components of e-learning design. These include the course design and the electronic environment, the interaction between students and instructors, the interactions among peers, the individual learning process and the course outcomes. The interactions and processes will be discussed in more detail in Section 4. In addition to interactions, Granic, Mifsud, & Cukusic (2009) further propose that clear pedagogical objectives based on sound pedagogic principles need to be incorporated within the e-learning design for more effectiveness within the teaching learning process to be achieved. The pedagogic approach chosen by the authors :$5# *0)&5# "*23@# &"# 62&'*# !5$2+3# *0)# %$+%)-*# $:# A!%*&8)# ')!5+&+,B#4&*0#%$5e components which include aspects of constructivism, blended learning and collaborative learning. Engaging and further motivating the learner for !# ($5)# !%*&8)# &+8$'8)()+*# 2"&+,# C$'69"# )D-)5&)+*&!'# learning theories (Kolb, 1984) is one of the basic pedagogic principles which will be adopted in this study. This then leads to the development of a socio-constructivist model for e-learning which will be used to enhance deeper conceptual thinking. social capital is defined as a pool of resources which an individual can accumulate as a result of developed interrelationships. The parameters within which learning communities are assessed include: ! E# ()!"25)# $:# ')!5+)5"9# "!*&":!%*&$+ during learning; ! Characterisation of interpersonal relationships during collaborative practice; ! Peer support, indicating connectivity throughout the experience; ! Change in behaviour owing to the social capital constructed. These parameters will be taken into account when designing the framework for measuring the effectiveness of virtual worlds for knowledge building activities. 4. Virtual Pedagogies? Camilleri & Montebello, (2008) propose a virtual assistant within the social context of the VWs which not only aim to assist and aid in cooperative knowledge building, but which can learn and sustain a mentally stimulating interactive conversation F a two-way communication which finds its roots in every social +)*4$5G&+,# !--'&%!*&$+.# H+'&+)# 2")5"9# 5)I2&5)()+*"# !5)# considered quite distinct however those resident in VWs have unique needs. These include: ! Maintaining student engagement; ! Developing a community; ! Providing immediate feedback; ! Similar learning opportunities; ! Hands-on interactive activities; ! Student ! content interaction; ! Faculty ! student interaction; ! Student ! student interaction. Furthermore the authors identify a number of strategies which when implemented within the virtual world space contr&62*)#*$#*0)#%5)!*&$+#$:#!#7J)!5+"%!-)9# F a learning environment within the VWs which is built upon: Flow in balancing inactivity and challenge. Repetition allowing learners to repeat experiments until they are satisfied with the outcomes. Experimentation in encouraging learners to try and learn in the process. Experience which is more engaging than other digitally mediated technologies. Doing through practice. Observing through an essential communication platform. M$*&8!*&$+#"*&(2'!*)3#6@#*0)#-)$-')9"#$4+#active part. 3. Collaborative Spaces Having established that research trends in pedagogies applied to the online learning environment point towards the setting up of communities for collaborative constructivist models, this research proposes to determine the parameters around which such communities are built. Miller & Brunner, (2008) make use of the Social Impact Theory (SIT) (Latane in Miller & Brunner, 2008) to 2+3)5"*!+3# 0$4# ')!5+)5"9# &+*)5-)5"$+!'# %0!5!%*)5&"*&%"# affect peers during collaborative learning experiences. /0&"# *0)$5@# &"# 3)"%5&6)3# !"# %0!+,)"# &+# !+# &+3&8&32!'9"# behaviour, resultant from communication exchanges with 7-)5%)&8)39# !+3# 5)!'# &+38&32!'".# /0)# %$+%)-*# $:# *0)# perceived peers can be made use of within the virtual world ecosystem, an environment designed and built as a collaborative space. The online environment in itself has been indicated as being more of a support and a supplement to face to face interaction. Research (Tomai, Rosa, Mebane, A, Benedetti, & Francescato, 2010) has shown that the development of online communities and social networks contributes to a possible increase in the social capital for each individual within the group. The Virtual pedagogies are also designed around different approaches and perspectives. Bonanno (2008) in his discussion of learning through collaborative gaming: a process-oriented pedagogy, comes up with a new model which derives its inspiration :5$(# A%$++)%*&$+&"*B# !+3# A%$+"*52%*&8&"*B# -)5"-)%*&8)"# !+3# 40&%0# ")58)"# *0)# -25-$")#$:#!+!'@"&+,#3&::)5)+*#A%!*),$5&)"#$:#&+*)5!%*&$+"# and the major factors that influence them during %$''!6$5!*&8)#,!(&+,B. Monahan & Bertolotto, (2008) describe the transition to 36 ! ! the Virtual Reality (VR) environment as one in which the shift is from the 7%$+8)+*&$+!'# *)D*-6!")39# *$# *0)# immersive and intuitive one, where the computer simulates the natural environment thus making it easier for the learner to identify with. The project Virtual European Schools (VES) F (Bouras in Monahan, G, & Bertolotto, 2008) simulates a collaborative learning environment within virtual classrooms themed around specific school subjects. This project has achieved a high level of user satisfaction highlighting the social presence !"#!#7(!K$5#!38!+*!,)9.##/0)#!2*0$5"#3)"%5&6)#LJMN-R, a 3D learning environment which proposes the social interaction between learners as one of the most important elements which is exploited from its marked absence in other conventional, text based learning systems. E-SPACES will attempt to build upon this research, by making use of pedagogies and principles which have already be ascertained to bring about a change in learner behaviour, within VWs. These will be used to create a measurable standard for the effectiveness of VWs on learning, using distinct parameters for integrating human complex behaviour in community-based learning. Perceived usefulness of VWs for learning; connecting relationships established through learning communities. The setting will be piloted within a specific case scenario, in the higher education context. In this scenario, students following the teacher training course (B.Ed (Hons.) will experience collaborative learning practices, through a hands-on pilot study held inside an immersive environment, such as Second Life (SL,2010) or Olive (Fortrerra Systems, 2010). This pilot study will embark on $::)5&+,# &+3&8&32!'# ')!5+&+,# 7$6K)%*"9# 4&*0in the virtual 4$5'3# :$''$4&+,# *0)# 7OPMMQHR9# ($3)'# $2*'&+)3# &+# section 4 of this document. 6. T he E-SPA C ES F ramewor k Based upon the perspectives of e-learning design, the E-SPACEs framework will take into account all the interaction processes for connectivity and build a virtual "-!%)#2"&+,#*0)#7!%*&8)9#($3)'. The E-SPACES framework will be designed around a simulating environment exploiting the VW through collaborative, constructivist and experiential activities. Content presented for the pilot study will focus around specific tasks and activities which future teachers can design and create for their students. This means that the through the virtual world, these future teachers, will partake into their own active learning processes to design different activities for school children at different levels. Collaboration will take place within this virtual meeting place which also offers sandboxes, to be able to )D-)5&)+%)# &+# -5!%*&%)1# *0)&5# -))5"9# *!"G# 3)"&,+".# /0)# scope of the framework is to clearly define the measurement parameters and establish whether VWs increase the effectiveness of the teaching/learning process comparing the results to a real world control group participating in the same exercise on a face-to-face classroom setting. E-SPACES proposes that the content bridges the gap between the pedagogic approaches and the interactions between the actors involved. In the VW, E-SPACES proposes three distinct actors all having a number of interactions; the educator as the instructional designer, the virtual agent as an intelligent assistant facilitating the virtual experience, and the learners actively involved in their own learning process. The interactions proposed involve the three actors, interrelating with the content presented within the socio-collaborative environment. /0)# !--5$!%0)"# %$++)%*)3# 4&*0# *0)# !%*$5"9# &+*)5!%*&$+"# will build this virtual ecosystem which will be the niche of the learning experience in the social space constructed. The research questions will be shared amongst the actors in this framework. The methodology proposes both qualitative and quantitative data collection, taking views :5$(#)32%!*$5"#!+3#')!5+)5"1#!+3#!'"$#()!"25&+,#"*23)+*"9# attainment targets at the end of a pilot course in the E-SPACE framework. The questions proposed in this study are designed for the measurement of effectiveness within this learning framework and will facilitate a clearer understanding of 5. Proposed Design One very important component of this study is the implementation of a virtual space, included in which are the key elements which would enable users to experiment with their own learning and interact with each other in a collaborative environment, in a persistent space, facilitating meetings, collaboration, and socialisation for the construction of new concepts. It is also important for the study to establish the validity of the theories proposed and the implementation of the technologies applied in terms of the teaching/learning experience. Through the use of virtual reality learners can thus become more visually aware of their companions, through *0)&5#!8!*!5"1#"*&(2'!*&+,#*0)#7-)5%)-*&$+9#$:#*0)#(&+3#*0!*# they are no longer isolated in their online learning sphere. The design of E-SPACES proposes that avatar presence is persistent within the VWs. In Camilleri & Montebello (2008) the concept of persistence and scope is emphasised in that VWs without a collective scope or interest remain void and fulfil nothing more than a static representation of content transmission. The pedagogical approaches -5$-$")3# 2")# %$+%)-*"# $:# 7!%*&8)# ')!5+&+,9# &+8$'8&+,# learners in their own knowledge building, using the constructivist and collaborative models as well as process-oriented models (Bonanno, 2008). Categories of interactions, influenced by a number of parameters and interpersonal factors in the connected VW will also be applied within the design. One fundamental approach to the design is the specification of the learning communities of practice in the VW context, and the content which will serve to connect learners. The complex human behaviour relationships which will be targeted through E-SPACES will measure the: ! attitude of users towards 3D VWs; ! Perceived behavioural control of users in relation to the VWs; 37 9. References the findings. Question #1: How does learning occur in the VW? S2)"*&$+# TUV# ;0!*# !5)# *0)# "*23)+*"9# -)5%)-*&$+" of learning in the online context? S2)"*&$+# TWV# ;0!*# !5)# *0)# "*23)+*"9# -)5%)-*&$+"# $:# learning in the VW context? Question #4: Does learning transfer from the VW to real life? Question #5: What is the perceived usefulness of the VW context for the learners? Question #6: How are the interactions in the VW established? Question #7: How useful for their learning do learners find the interactions within the space? Ajjan, H., & Hartshorne, R. (2008). Investigating Faculty descisions to adopt Web2.0 technologies: Theory and empirical tests. Internet and Higher Education , 71-80. Alier, M. (2006). A Social Constructionist Approach to Learning Communities: Moodle. In M. D. L, & N. (. Ambjorn, Open Source for Knowledge and Learning Management: Strategies beyond Tools. Idea Group, INC. Barbour, M. K., & Reeves, T. (2009). The reality of virtual schools: A review of literature. Computers & Education , 402-416. Bonanno, P. (2008). Learning through Collaborative Ga ming: A Process-oriented Pedagogy. Finland: Joensuu. Camilleri, V., & Montebello, M. (2008). SLAVE F Second Life Assistant in a Virtual Learning Environment. RELIVE08 ! Researching Learning in Virtual Environments. Milton-Keyes: The Open University. Carey, J. (2007). Expressive Communication and Social Conventions in Virtual Worlds. The Data Base for Advances in Information Systems , 81-85. Casamayor, A., Amandi, A., & Campo, M. (2009). Intelligent assistance for teachers in collaborative learning environments. Computers & Education , 1147-1154. Chou, S.-W., & Min, H.-T. (2009). The impact of media on collaborative learning in virtual settings: The perspective of social construction. Computers & Education , 417-431. Etelapelta, A., & Lahti, J. (2008). The resources and obstacles of creative collaboration in a long-term learning community. Thinking Skills and Creativity , 226-240. Granic, A., Mifsud, C., & Cukusic, M. (2009). Design, implementation and validation of a Europe-wide pedagogical framework for e-Learning. Computers & Education , 1052-1081. Jarmon, L., Traphagan, T., M, M., & Trivedi, A. (2009). Virtual world teaching, experiential learning, and assessment: An interdisciplinary communication course in Second Life. Computers & Education , 169-182. Kolb, D. A. (1984). Experiential learning: Experience as a source of learning and development. Englewood Cliffs, NJ: Prentice-Hall. Kumar, S., Chhugani, J., Kim, C., Kim, D., Nguyen, A., Dubey, P., et al. (2008). Second Life and the New Generation of Virtual Worlds. Computer , 46-53. Miller, M., & Brunner, C. C. (2008). Social impact in technologically-mediated communication: An examination of online influence. Computers in Human Behavior , 2972-2991. Monahan, T., G, M., & Bertolotto, M. (2008). Virtual Reality for Collaborative e-learning . Computers & Education , 1339-1353. NMC. (2010). What is Happening in Virtual Worlds? US: NMC. Paechter, M., Maier, B., & Macher, D. (2010). Students' expectations of, and experiences in e-learning: Their relation to learning achievements and course satisfaction. Computers & Education , 222-229. Petrakou, A. (2009). Interacting through avatars: Virtual worlds as a context for online education. Computers & 7. F uture Developments The E-SPACES framework is interdependent on a number of parameters including the VW platform chosen, the target sector of learners involved in the pilot study, and the content which is chosen to bridge the gap between the pedagogic approaches and the interactions proposed. It is being proposed that the current study undergoes specific analysis to gather data for this framework. It is then proposed that data and content are integrated within the framework and implemented during a short pilot course. The limitations and challenges of this study, will surface if a limited number of students are chosen for this study. This might occur depending on the content chosen and the participants available for the duration of the course. The quantitative measure of the effectiveness of the social spaces, will need to be performed against a control. Such control might be difficult to establish in the context of the learning environment. It is expected that the future development of E-SPACES is to identify limitations and challenges, for the design of the study measuring the effectiveness of social niches established in the context of VWs. 8. Conclusion Whilst the use of 3D-VWs seem to point towards their increased use for the learning contexts of the future, there is limited research validating their effectiveness based upon pedagogic approaches taking into consideration collaborative learning in the socio-constructivist perspective. Virtual spaces have a number of characteristics which can be found commonly throughout all platforms including the presence of avatars, an immersive experience and a series of interactions between player characters, non player characters and other world components. VWs are a combination allowing for "&(2'!*&$+#!+3#*0)#A5)!'B#8&5*2!'&*@.#L!+#40!*#0!--)+"#&+#!# real classroom, including all the interactions and exchanges, indeed be transferred to the virtual world? How can this challenge be identified and overcome? Can technology be used to increase the effectiveness of this learning medium? This research is needed to understand how experiential collaborative activities may apply to a number of instructional contexts within the VWs. 38 Education . Solimeno, A., M.E., M., Tomai, M., & Francescato, D. (2008). The influence of students and teachers characteristics on the efficacy of face-to-face and computer supported collaborative learning. Computers & Education , 109-128. Spalter, A., & Simpson, R. (2000). Integrating interactive computer-based learning experiences into established curricula: a case study. Proceedings of the 5th annual S IG C S E/S IG C U E ITiC S Econference on Innovation and technology in computer science education (pp. 116 119 ). Helsinki, Finland: ACM, New York. Tiropanis, T., Davis, H., Millard, D., Weal, M., White, S., & Wills, G. (2009). JIS C - SemTech Project Report. Southampton, UK: JISC CETIS. Tomai, M., Rosa, V., Mebane, M. E., A, D., Benedetti, M., & Francescato, D. (2010). Virtual communities in schools as tools to promote social capital with high school students. Computers & Education , 265-274. Vygotsky, L. (1978). Mind and society: The development of higher mental processes. Cambridge, MA: Harvard University Press. Wrzesien, M., & Alcaniz Raya, M. (2010). Learning in serious virtual worlds: Evaluation of learning effectiveness and appeal to students in the E-Junior project. Computers & Education . 39 A Semantic Knowledge Base for Personal Learning and Cloud Learning Environments Alexander Mikroyannidis, Paul Lefrere, Peter Scott Knowledge Media Institute, The Open University Milton Keynes MK7 6AA, United Kingdom E-mail: {A.Mikroyannidis, P.Lefrere, Peter.Scott}@open.ac.uk Abstract Personal Learning Environments (PLEs) and Cloud Learning Environments (CLEs) have recently encountered a rapid growth, as a response to the rising demand of learners for multi-sourced content and environments targeting their needs and preferences. This paper introduces a semantic knowledge base that utilises a multi-layered architecture consisting of learning ontologies customized for certain aspects of PLEs and CLEs. A number of stakeholder clusters, including learners, educators, and domain experts, are identified and are assigned distinct roles for the collaborative management of this knowledge base. 1. presents integration mechanisms for the different layers of the knowledge base. Section 5 describes the involved stakeholder clusters and their roles within the management of the knowledge base. Section 6 discusses certain challenges arising from the collaborative nature of the management of the knowledge base. Finally, the paper is concluded and the next steps for progressing this work are provided. Introduction Personal Learning Environments (PLEs) and Cloud Learning Environments (CLEs) are gradually gaining ground over traditional Learning Management Systems (LMS) by facilitating the lone or collaborative study of user-chosen blends of content and courses from heterogeneous sources, including Open Educational Resources (OER). 2. PLEs follow a learner-centric approach, allowing the use of lightweight services and tools that belong to and are controlled by individual learners. Rather than integrating different services into a centralised system, PLEs provide the learner with a variety of services and hands over control to her to select and use these services the way she deems fit (Chatti et al., 2007). The OpenLearn case study The Open University (www.open.ac.uk) provides a wide range of OER through the OpenLearn educational environment (http://openlearn.open.ac.uk). OER can be described as “teaching, learning and research resources that reside in the public domain or have been released under an intellectual property license that permits their free use or repurposing by others depends on which Creative Commons license is used” (Atkins et al., 2007). OER are freely available on the web and can be accessed through common web sites or Virtual Learning Environments (VLEs), and more recently through PLEs and CLEs. They can be used, edited and shared by any interested party, such as learners, teachers, institutions, and learning communities. CLEs extend PLEs by considering the cloud as a large autonomous system not owned by any educational organisation. In this system, the users of cloud-based services are academics or learners, who share the same privileges, including control, choice, and sharing of content on these services. This approach has the potential to enable and facilitate both formal and informal learning for the learner. It also promotes the openness, sharing and reusability of OER on the web (Malik, 2009). OpenLearn users have the ability to learn at their own pace, keep a learning journal in order to monitor their progress, complete self assessment exercises, and discuss with other learners in forums. OpenLearn has gathered the interest of a wide audience ranging from governmental and non-governmental entities interested in promoting continuing professional development, public and private higher education institutes, academic teachers, training course designers, graduate and postgraduate students, educational researchers, and generally anyone interested in informal learning (Okada, 2007). In the context of the European project ROLE (Responsive Open Learning Environments www.role-project.eu) we are targeting the adaptivity and personalization of learning environments, in terms of content and navigation, as well as the entire learning environment and its functionalities. We propose the use of ontologies to model various aspects of the learning process within such an environment. In particular, we consider a semantic knowledge base as the core of the learning environment, enabling the collaboration between diverse stakeholder clusters. OpenLearn is essentially a traditional LMS, based on the Moodle platform (http://moodle.org), following a course-based paradigm, rather than a learner-based one. It has been built around units of study and not the personal profiles of learners. Currently, OU students are missing a place where they can aggregate the content offered by different OU services, such as OpenLearn and iTunesU, and mix it together with other educational The remainder of this paper is organised as follows. Section 2 describes the OpenLearn case study, consisting of a traditional LMS into transition towards the PLE and CLE paradigms. Section 3 introduces the architecture of the proposed semantic knowledge base and discusses the various learning ontologies that formulate it. Section 4 40 Figure 1. Climate change OER in OpenLearn (http://tinyurl.com/yene49o) content. Therefore, what we aim to offer OU students in the context of ROLE, is a combined aggregator and e-portfolio, where they can set their learning goals, gather and organise various learning resources, monitor their progress, get recommendations from the system and their peers, and connect with other learners. widgets, configuring them, tagging them, and organising them into thematic categories in different tabs. In the context of the ROLE project, we are working on the transition from the LMS-based approach of OpenLearn towards the PLE and CLE paradigms, by putting emphasis to the needs and preferences of learners. In particular, we aim at providing them with a wider range of OER to choose from, both from OpenLearn as well as from external Web 2.0 sources. However, discovering OER from such a wide range is not an easy task; therefore providing the learners with OER recommendations based on information from their profiles and portfolios is very important. In order to explore the present limitations of OpenLearn, we have been comparing its capabilities with those of a PLE, by delivering the same learning resources with both approaches. For this purpose, we have created a collection of OER related to the UK 10:10 climate change campaign (http://www.1010uk.org/). Figure 1 shows this collection delivered by the existing OpenLearn environment, featuring OpenLearn courses and OU albums from iTunesU. In addition, content from external sources, such as YouTube and SlideShare, is included. However, syndication from dynamic Web 2.0 sources, such as the blogosphere, Twitter, and FriendFeed, is not supported. We propose the use of ontologies to model various aspects of the learning process within the transformed OpenLearn environment. In particular, we consider a semantic knowledge base as the core of this learning environment, enabling the use of metadata and ontologies to annotate learning resources, and model various aspects of the learning process, such as learner profiles. The curation of the proposed semantic knowledge base is supported by the active involvement and collaboration between different stakeholder clusters. On the other hand, the PLE of Figure 2 is a showcase of a widget-based environment hosting the same climate change resources as in OpenLearn, in addition to dynamic Web 2.0 sources. Compared to OpenLearn, this approach offers more flexibility in terms of creating new 41 Figure 2. A widget-based PLE for climate change OER (http://tinyurl.com/m6zrhl) 3. Starting from the top of the pyramid, the Learner layer contains ontologies that model the profiles of the learners involved in the learning process. In particular, the ontologies of this layer model the learners’ profiles according to their interests, goals, preferences, and skills. Some ontology standards corresponding to this layer are the IEEE Learning Objects Metadata Standard (LOM) (http://ltsc.ieee.org/wg12/files/LOM_1484_12_1_v1_Fin al_Draft.pdf), the IEEE Personal and Private Information for Learner (IEEE PAPI) both developed by the IEEE Learning Technology Standards Committee (LTSC), the IMS Learner Information Package (LIP) (http://www.imsglobal.org/profiles), and the IMS Reusable Definition of Competency and Educational Objective (RDCEO) (http://www.imsglobal.org/competencies). Semantic knowledge base architecture In order to efficiently manage the metadata associated with different aspects of the learning process, we propose their organisation into a number of ontology layers. Figure 3 shows the multi-layered semantic knowledge base adapted from the Heraclitus II framework (Mikroyannidis and Theodoulidis, 2006, Mikroyannidis, 2007, Mikroyannidis and Theodoulidis, 2010). In this pyramid, the lower layers represent more generic and all-purpose ontologies, while the ontologies of the upper layers are customized for certain uses within a PLE or CLE. When traversing the pyramid from bottom to top, each layer reuses and extends the previous ones. In addition, whenever a layer extends the ones below it (e.g. with the insertion of new concepts), these extensions are propagated to the lower layers. Different stakeholder clusters curate each layer, depending on the expertise that each layer requires. The integration of the ontology pyramid layers is achieved with the use of ontology mappings between ontologies belonging to the same or different layers. The Learning Resource layer models the learning resources that are employed within a PLE or CLE by learners. These resources are mainly widgets of educational tools and content. For example, the climate change PLE of Figure 2 includes widgets of: • OpenLearn OER • iTunesU albums • External resources, e.g. blog feeds, YouTube videos, 42 • SlideShare presentations, Google gadgets, etc. Knowledge maps ontologies. 4. The ontologies of the Learning Resource layer are constructed out of annotations of these widgets. These annotations can be user-generated tags, or automatically generated semantic annotations, e.g. with the use of IE (Information Extraction) and NLP (Natural Language Processing) techniques. Apart from the Learner layer, the IEEE Learning Objects Metadata Standard (LOM) also corresponds to this layer, as it defines models for learning objects, including multimedia content, instructional content, as well as instructional software and software tools. Knowledge base integration The integration of the ontology pyramid layers into a single manageable scheme is achieved with the use of ontology mappings. In terms the layers of the ontology pyramid being mapped, ontology mappings are either intra-layer, mapping ontologies of the same ontology layer, or inter-layer, mapping ontologies belonging to different layers. From an architectural point of view, ontology mappings can be either structural, namely referring to the structure of the mapped ontologies, e.g. via is-a relations, or semantic when mapping two ontology objects via a semantic relation, such as an employer-employee relation. OWL Full (Bechhofer et al., 2004) offers a variety of constructs for representing structural ontology mappings, including owl:subclassOf, owl:sameAs, owl:inverseOf, owl:equivalentClass, and owl:equivalentProperty. The Learning Domain layer models the learning domain of interest. These are more generic ontologies describing a certain domain of interest to the learner, e.g. bioinformatics. The ontologies of the Gene Ontology (GO) project (The Gene Ontology Consortium, 2000) and the Foundational Model of Anatomy (FMA) (Cornelius Rosse, 2003) are some widely used domain ontologies in bioinformatics. Ontology mappings are particularly useful for the extraction of recommendations to the learner, as they link her profile to learning resources, as well as to profiles of other learners. They can therefore be used to recommend learning resources of potential interest to the learner. They can also be used to recommend a ‘study-buddy’, with whom the learner shares common abilities and interests. Finally, the Lexical layer contains domain-independent ontologies of a purely lexicographical nature. An example of such an ontology is the widely adopted WordNet (Fellbaum, 1998). A lexical ontology is the most generic form of ontology that can be constructed. The ontologies of this layer can be used to model practically any domain. The ontologies of all the other layers are independent of the language used, or other linguistic issues, which concern only this layer. 5. Stakeholder clusters Since each ontology layer represents a different degree of specialization, different stakeholder clusters are required to contribute to the curation of each layer. Starting from the bottom of the pyramid, lexicographers have the knowledge on language structures that is required in this level. Domain experts need to be employed for the next layer. These are professionals on a certain domain, e.g. biologists are responsible for a biology-related ontology. Although lexical ontologies constitute a strong basis for the construction of any domain-specific ontology, their relations tend quite often to be imprecise and thus not suitable for logical reasoning. This can be addressed with the use of more strictly constructed, general purpose ontologies, such as SUMO (Sevcenko, 2003). Such models can act as structuring mechanisms for lexical ontologies or intermediates between lexical and domain Figure 3. Multi-layered semantic knowledge base 43 For the Learning Resource layer, a more diverse group is suitable: producers and consumers of learning resources. The producers are those that develop learning resources, either content or tools. They can be lecturers, learning designers, or team leaders who develop new courses, workshops or training sessions and author new learning material. The consumers are learners who use and annotate the offered learning resources. conflicts. Various technologies can be used to address this issue, such as CVS (The Gene Ontology Consortium, 2000), Wiki (Auer et al., 2006, Schaffert, 2006), or peer-to-peer based solutions (Becker et al., 2005, Xexeo et al., 2004). • Consistency maintenance: Parts of the knowledge base curated by different authors may be inconsistent with each other, since an ontology usually reflects the point of view of each author. Mechanisms for structural and semantic consistency preservation as well as change propagation need to be provided to ensure that the knowledge base is free of inconsistencies at all times. • Privilege management: In order to ensure the accuracy of the knowledge base, a collaborative environment needs to assign different levels of privileges to its users, based on their expertise, authority, and responsibility. Our architecture is based on a flat scheme regarding privilege management, by giving each stakeholder cluster equal privileges in their layer of responsibility. • History maintenance: Collaborative environments should provide the means to recover from wrong or unintended changes to the knowledge base. All changes to the knowledge base should be thus recorded in order to be able to track the authorship of a change and to prevent loss of important information. The bitemporal ontology model of Heraclitus II (Mikroyannidis, 2007) retains the necessary information to achieve this goal. • Scalability: Long-term collaboration of diverse parties usually increases the size of knowledge bases; therefore, a collaborative environment has to be scalable to large ontologies. This is particularly important in the abundant environment of CLEs, where a wide variety of cloud-based services is employed. Finally, the Learner layer is curated by learners, who provide information about themselves in order to receive recommendations about learning resources and create personal networks with users from different learning environments, with whom they may share common learning interests. Depending on the scope of intra and inter-layer ontology mappings, these are performed by one or more stakeholder clusters. For example, an inter-layer ontology mapping between the lexical and the domain layer will be created jointly by the stakeholder clusters of these two layers, namely lexicographers and domain experts. Intra-layer ontology mappings are performed by the stakeholder cluster of the corresponding layer. The assignment of stakeholder clusters as curators of the ontology pyramid layers is summarized in Table 1. Ontology layer Stakeholder cluster Lexical layer Lexicographers Learning domain Domain experts layer Learning resource Learning resource developers / layer Learners Learner layer Learners Inter-layer ontology Stakeholder clusters of mappings corresponding layers Intra-layer ontology Stakeholder cluster of mappings corresponding layer Table 1. Assignment of stakeholder clusters as curators of the semantic knowledge base 6. 7. Challenges in collaborative ontology management PLEs and CLEs address the crucial demands of today’s learner for a personalized and adaptive learning environment. In order to achieve these goals, we propose the use of ontologies for modeling the learning process and assigning distinct curator roles to the involved stakeholder clusters. We perceive a semantically enhanced PLE or CLE as the evolution of the present OpenLearn environment, as well as the evolution of LMS-based approaches in general. Collaboration between stakeholder clusters in curating the semantic knowledge base is essential; however, it involves several challenges, including concurrency, consistency, and scalability issues. We will be targeting the following set of parameters for collaborative ontology management, as outlined in (Bao et al., 2006): • • Conclusion and next steps Knowledge integration: A fundamental task in a collaborative environment is the integration of contributions from multiple participants. The proposed semantic knowledge base consists of a multi-layer architecture that is curated by diverse clusters of stakeholders. Reusability and integration is supported through ontology mappings. We are currently in the process of refining the specifications of the proposed semantic knowledge base for addressing particular requirements of the OpenLearn case study. This refinement includes reviewing existing ontology standards in terms of their suitability to be reused, repurposed and adapted within an OpenLearn-specific ontology pyramid. Concurrency management: Different ontology authors need to be able to work on different parts of the knowledge base simultaneously. In case the same part of the knowledge base is concurrently edited by more than one author, this can cause 44 8. Acknowledgements Management and Evolution for Business Intelligence. International Journal of Information Management, (forthcoming). Okada, A. (2007) Knowledge Media Technologies for Open Learning in Online Communities. International Journal of Technology, Knowledge and Society, 3(5), 61-74. Schaffert, S. (2006) IkeWiki: A Semantic Wiki for Collaborative Knowledge Management. 15th IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WETICE'06). Manchester, UK, 388-396. Sevcenko, M. (2003) Online Presentation of an Upper Ontology. Znalosti 2003. Ostrava, Czech Republic. The Gene Ontology Consortium (2000) Gene Ontology: tool for the unification of biology. Nat Genetics, 25, 25-29. Xexeo, G., De Souza, J. M., Vivacqua, A., Miranda, B., Braga, B., Almentero, B. K., D' Almeida, J. N., Jr. & Castilho, R. (2004) Peer-to-peer collaborative editing of ontologies. 8th International Conference on Computer Supported Cooperative Work in Design (CSCWD 2004). Xiamen, China, 186-190 The research work described in this paper is partially funded through the ROLE Integrated Project, part of the Seventh Framework Programme for Research and Technological Development (FP7) of the European Union in Information and Communication Technologies. 9. References Atkins, D. E., Brown, J. S. & Hammond, A. L. (2007) A Review of the Open Educational Resources (OER) Movement: Achievements, Challenges, and New Opportunities. The William and Flora Hewlett Foundation, http://www.oerderves.org/wp-content/uploads/2007/0 3/a-review-of-the-open-educational-resources-oer-mo vement_final.pdf. Auer, S., Dietzold, S. & Riechert, T. (2006) OntoWiki A Tool for Social, Semantic Collaboration. 5th International Semantic Web Conference (ISWC 2006). Athens, GA, USA, Springer LNCS, 736-749. Bao, J., Hu, Z., Caragea, D., Reecy, J. & Honavar, V. G. (2006) A Tool for Collaborative Construction of Large Biological Ontologies. 17th International Conference on Database and Expert Systems Applications (DEXA'06). Krakow, Poland, 191-195. Bechhofer, S., Harmelen, F. V., Hendler, J., Horrocks, I., Mcguinness, D. L., Patel-Schneider, P. F. & Stein, L. A. (2004) OWL Web Ontology Language Reference. IN DEAN, M. & SCHREIBER, G. (Eds.) W3C Recommendation.World Wide Web Consortium, http://www.w3.org/TR/owl-ref/. Becker, P., Eklund, P. & Roberts, N. (2005) Peer-to-peer based ontology editing. International Conference on Next Generation Web Services Practices (NWeSP 2005). Seoul, Korea, 259-264. Chatti, M. A., Jarke, M. & Frosch-Wilke, D. (2007) The future of e-learning: a shift to knowledge networking and social software. International Journal of Knowledge and Learning, 3(4/5), 404-420. Cornelius Rosse, J. L. V. M. J. (2003) A reference ontology for biomedical informatics: the Foundational Model of Anatomy. Biomedical Informatics 36 (2003), 478 - 500. Fellbaum, C. (1998) WordNet: An Electronic Lexical Database: The MIT Press. Malik, M. (2009) Cloud Learning Environment - What it is? EduBlend. http://edublend.blogspot.com/2009/12/cloud-learning -environment-what-it-is.html. Mikroyannidis, A. (2007) Heraclitus II: A Framework for Ontology Management and Evolution. PhD Thesis, Manchester Business School, University of Manchester, Manchester Mikroyannidis, A. & Theodoulidis, B. (2006) Heraclitus II: A Framework for Ontology Management and Evolution. 2006 IEEE /WIC/ACM International Conference on Web Intelligence (WI 2006). Hong Kong, China, IEEE Computer Society, 514-521. Mikroyannidis, A. & Theodoulidis, B. (2010) Ontology 45 Semantic Annotation for Semi-Automatic Positioning of the Learner Petya Osenova, Kiril Simov Linguistic Modelling Laboratory, IPP-BAS Acad. G.Bonchev 25A, 1113 Sofia, Bulgaria [email protected], [email protected] Abstract The learner’s positioning with respect to a curriculum is of a great importance to the life-long learning (an informal learner needs to achieve a certain level of competency) as well as to the mobility learning (a student spending a semester in another university). In both cases it is necessary to determine learner’s prior knowledge. Thus, he might profit in an optimal way from the consequent learning process. The learner’s positioning requires grading of pre-course questionnaires by a tutor. This grading is tedious and time-consuming work. In this paper we present the first implementation of a knowledge-rich method for supporting the tutor in the positioning task. Our method exploits the potential of the semantic annotation with regard to the curriculum and the learner’s questionnaire answers. The annotation of the curriculum provides the level of the competence to be covered in the course, while the annotation of the questionnaire answers provides evidence for the learner competence per se. The final judgment is assigned to the tutor. The presented method might be well used also for the learner’s self-positioning with slight modifications only. 1. Section 4 describes the semantic annotation of the curricula and answers. Section 5 outlines some preliminary evaluation of the method. Section 6 presents the further extensions over the semantic annotation. Section 7 concludes the paper. Introduction Learner’s positioning, on the one hand, has proved to be a very important step in the learning process, and on the other hand, to be a very difficult task. It has been often considered in the context of the self-positioning (Ross 2006) or the context of various groups of practices (Braun and Schmidt 2008). The central role in the positioning task plays the tutor, since there has not been invented yet a completely automatic way, which to be 100 % successful and reliable. Hence, our aim is to support the tutor in his judgments when positioning the learners. We assume that the tutor is inspecting a set of learner’s answers to a questionnaire. The questionnaire would reflect the required knowledge that has to be covered by the learner. The actual positioning is with respect to a curriculum, which presents the following aspects: the knowledge-oriented requirements for a learner, a set of learning materials to support him during the learning process, and links to people who might help him with the learning topics. Thus, the questionnaire is designed on the base of the curriculum. The positioning in these settings is viewed as a set of recommendations from the tutor to the learner which directs the learner within the curriculum, i.e. which materials to study, which people to contact, etc. Our method relies on the comparison among the curriculum and the related learner’s answers, both semantically annotated. This comparison highlights the learner abilities to express the necessary concepts in the answers of the questions from the questionnaire. The tutor can use the results from the comparison to balance his judgments individually for each learner and collectively, as a group. This method has also the advantage that some conceptual or terminological gaps/inconsistencies might also be discovered in the curriculum itself. The structure of the paper is as follows: Section 2 concentrates on the various aspects of the knowledge rich method. Section 3 overviews the design of the curriculum and the questionnaire answers as well as their interaction. 2. The Knowledge-Rich Approach In our work on the positioning of the learner with the help of the knowledge rich approach 1 we rely on the ideas reported in (Kalz et al. 2007). They discuss the notion of the learning network. According to it, learner’s competence can be automatically compared to a set of concept evidences of the target competence. Our goal is to achieve an ontology-based positioning where the learner competence is represented by a learner’s competence ontology and curriculum competence ontology. However, reliable competence ontologies are still missing. Thus, in our work we rely on domain ontologies, which are supposed to reflect the knowledge part of the learner’s competence. The ontological analyses of the learner’s portfolio (mainly tests and CVs) and the textual description of the relevant curriculum might be considered an approximation of the learner’s (per se) competence against the curriculum (required) competence. Thus we consider the learning network a set of different resources including tutors, experts, learning materials and learners, whose connections are mediated by ontologies. The positioning of a learner within the learning network is identical to the task of creating a learning path for each learner within the established network. Our method facilitates the tutor in positioning task by analyzing some of the textual elements of the network. Thus, the knowledge rich methods rely on the analysis of the text by using knowledge sources, external to this text, such as ontologies, lexicons, grammars. These sources are used to achieve a semantically rich text analysis which to explicate the conceptual content of the learner’s answers 1 We call the method knowledge rich because it requires an appropriate ontology to represent the conceptual knowledge to be explicated in the curriculum and learner’s answers. 46 connection from ontology via lexicon to grammars is also relied on for the semantic (concept) annotation of the text. In this way, we established a connection between the ontology and the texts. The relation between the lexicon and the ontology is used for defining user queries with respect to the appropriate segments within the documents. In a more general setting, these relations can be extended to cover the overall learning network. For example, the annotation of a document with concepts connects it to the ontology, which with the help of the lexicon and the grammar connects it to other documents. Similarly, it is possible to annotate other resources, such as images, web sites, people profiles, etc. to the questionnaire. The main steps in the text analysis that we envisage as necessary in order to support the problem in reliable way, are: (1) grammar-based semantic annotation with concepts from an ontology, (2) discourse segmentation; (3) lexical chains creation to support the disambiguation of concept annotation from (1) and concept distribution within the text; and (4) sentiment analysis for evaluation of the concept usage in the text. The combination of all these analyses should explicate in the best way the conceptual content of the curriculum and the learner texts, which to be used for the positioning. Our first implementation of the positioning service realizes completely only point (1) and sketches initially a version of the other processing tasks. Therefore, in the rest of this section we will concentrate on analysis (1), namely grammar-based semantic annotation with concepts from an ontology. The reason is that this analysis has already been completely performed and preliminary evaluated in a designed learning-based setting. The other steps are discussed as further extensions to the task design in Section 6. As mentioned above, the knowledge rich approaches are usually connected with the availability and the usage of knowledge rich databases, such as ontologies and lexicons. The ontologies reflect the conceptualizations in some domain of interest – for example, DAML ontology library, SWOOGLE, LT4eL ontology, etc. These ontologies have to be connected to an upper ontology in order to cover in a better way the general knowledge – for example, DOLCE, SUMO, SIMPLE Core Ontology. The most famous knowledge rich lexicons are the so-called wordnets (WordNet, EuroWordNet, BalkaNet, SIMPLE). Such resources are exploited for the semantic annotation of documents and/or for semantic retrieval. Within LT4eL project an ontology-to-text relation was defined, which approaches the interaction among conceptualizations and terms (Simov and Osenova 2007; Simov and Osenova 2008). For clarity, this relation is briefly presented here. We assume that the ontology is the repository of the lexical meaning of the language. Thus, we start with a concept in the ontology and we search for lexical items and non-lexical phrases that convey the content of the concept. There are two possible problems: (1) there is no lexical item for some of the concepts in the ontology, and (2) there are lexical items in the language without a concept representing the meaning of the lexical item in the ontology. The first problem is overcome by allowing in the lexicon also non-lexical phrases to be represented. The second problem is solved by extension of the ontology. The lexicon items are then mapped to grammars. We call them concept annotation grammars. These grammars relate the lexicon to the text. Such a mapping is necessary as much as lexical items and phrases from the lexicons allow for multiple realizations in the text. Thus, they require some additional linguistic knowledge in order to disambiguate between different meanings of some lexical item or phrase. We have been using the relations between the different elements for the task of ontology-based search. The 3. Design of the curriculum and the related questionnaires As it was mentioned above, we assume that a curriculum consists of set of topics providing the content of a course or a set of courses. Each topic is then associated with a set of learning materials – lectures, tests, descriptions of expected answers, etc. The learner needs to acquire at least two kinds of knowledge studying the curriculum: the content knowledge and the skills necessary to apply the content knowledge in a community of practise. Here we focus on the content knowledge. The questionnaire, on the other hand, consists of questions of various types, which check the learner’s status with respect to the curriculum topics. They might reflect more surface as well as more profound aspects of the topics. However, as a first practical approximation we decided to exploit a set, which more or less amalgamates both perspectives – curriculum plus questionnaire, but at the same time is being used in real job seeking situations. As our design setting we used a sample of 10 topic questions, provided by BitMedia within the LTfLL project. The topics are in the IT area. Each question suggested a more surface background when asking about types of things. It also further asked about functions and properties. Some examples are in order: Explain the meaning of the concept RAM and describe its properties; Name as many PC ports as you can and give some examples. These topic questions have been equipped with a set of required example answers. Since the set was provided in German, translation was performed into English and Bulgarian. Thus, only the real answers had to be gathered. This part of the setting is described in Section 5. On the base of this concrete curriculum, we identified the following types of questions: (1) content questions which require answers; (2) skill questions which highlight the learner’s abilities to apply the knowledge in practice; (3) personal questions which demonstrate learner’s abilities to communicate within a group, etc. Our primary goal is to cover evaluation of questions of the first kind. 4. Semantic annotation In this section the annotation of the curriculum and questionnaire answers will be presented. 47 is not covered by the learner competence and they can be used to suggest further learning activities; (3) list of additional concepts – these concepts could demonstrate some wrong understanding of the topic by the learner or gaps in the curriculum (topics or semantic annotation). In the context of the above example a learner responded with the following answer: Output device, monitor, display devices of a PC; there are two types: Monitors with cathode ray tube (CRT) - heavy, need more power, occupy more space; Flat panel displays - light, need less power, and occupy less space. The terms in the text recognized as related to concepts in the ontology are highlighted. The three lists are as follows: Common concepts: CRT monitor, display, monitor Missing concepts: contrast, frame rate, graphical elements, image, LCD monitor, picture, pixel, ratio, refresh rate, rendering, resolution, screen, size, VGA Additional concepts: types, devices, Output device, PC, power, space The concepts in the first two lists are lexicalized on the base of the lexicon, mapped to the ontology. The concepts in the last list are represented with the terms used by the learner. This helps the learner and the tutor to identify the usage context of these concepts. As it was mentioned above, the usage of additional concepts is not always an evidence of wrong knowledge, but could be a good feedback to both - the learners and the tutors. The expression output device in the above example might be considered as a good concept to be included in the annotation of the query. The semantic annotation of a curriculum includes two steps. First, all the learning materials related to the curriculum, are annotated automatically with concepts from the domain ontology. Then, the tutor (or the teaching administrator responsible for the curriculum) creates a set of queries to reflect the content knowledge of the curriculum. Each question is also annotated with appropriate concepts to reflect this content knowledge. A comparison is made whether the coverage of the questions meets the requirements within the curriculum. It is worth mentioning that other questions concerning the skills of the learner might be additionally provided within the question set, but they are not necessarily annotated with concepts from the ontology, and their answers have to be evaluated in a different way. During the creation of the questions, the tutor has at his disposal the ontology and the semantic annotation of the learning materials. Then the questions are also automatically annotated and the mappings are again presented to the tutor. To sum up, our approach demonstrates the usage of automatic procedures, which alternate with the tutor’s intervention, when required. In our practical setting, the questions, related to the curriculum, were given in advance. Thus, we only provided the automatic annotation of the questions themselves and the example answers. The question annotation was additionally edited by experts in the area of IT. The following example demonstrates the questions in BitMedia questionnaire with the list of the assigned concepts. The learner’s answers to the questions were annotated with concepts automatically by the semantic annotation module, described above. In our setting this step was performed exactly in this way. Here is one example of a query: Name some of the technical specifications of different kinds of monitors. The following is a list of concepts, selected as annotations for this question by an expert: CRT monitor, display, contrast, frame rate, graphical elements, image, LCD monitor, monitor, picture, pixel, ratio, refresh rate, rendering, resolution, screen, size, VGA This list of concepts demonstrates that the tutor could include not only concepts that are directly answers to the question, but also related concepts which are necessary in order to ensure that the learner uses the concepts related to the answers within the proper context. The above list also demonstrates the case in which concepts and sub-concepts are also included in the list because they define slightly different contexts of usage. The next example shows the annotation of the learner’s answers. The annotation is done within the text of the answer. Then the concepts from this annotation are compared to the concepts from the question annotation and three lists of concepts are created: (1) list of common concepts – the concepts that demonstrate how well the learner competence matches the required competence; (2) list of missing concepts – these concepts determine what 5. Evaluation Having the semantic annotation of the curriculum and the learner’s answers, the service calculates several lists of concepts, as it was reported in the previous section. The real evaluation within the learning network of BitMedia is under implementation. Here we report on a small scale evaluation, run by us in order to have some first evidences for the usefulness of the service and to acquire some ideas about the future development of the service. The concept evidence of the learner’s competence can be automatically compared to a set of concept evidences of the target competence (learning network in the terms of (Kalz et al. 2007)). Those are selected, which are not covered by the current learner’s competence. For the comparison of the concept evidences we use the standard vector metrics from Information Retrieval community. The automatic evaluation was constructed as a ration of the list of the common concepts with the list of concepts from the annotation of the query. In order to do evaluation of the automatic method, the 10 questions were given to Bulgarian students in IT area. We 48 co-references has been approached from various perspectives. For example, (Lech and de Smedt 2006) and (Nikolov et. al 2009), among others, exploit the semantic features from ontology in order to improve the co-reference chaining; (Kawazoe et al. 2003) designed a software that helps experts in biomedical domain to create ontologies and annotate texts with co-references. In our task, we exploited these papers (together with the work on anaphora and co-reference annotation in general) in the annotation of the corpus. In our future work, we will apply their approaches for the implementation of a new version of our ontology-to-text relation. One of the reasons for the underestimation of the learner answers by the automatic method is due to the fact that the concept annotation requires very exact answers which sometimes are not present among learners’ answers. The learners use freer style of expressing their knowledge. Thus, they rely on similar concepts to the ones in the curriculum annotation – such as, more general or sibling concepts, etc. In order to handle this problem, we envisage extending the annotation from domain concepts via domain terms to general concepts via general lexica. As it was mentioned in the goal of classifying concepts used by the learner, we would like to evaluate the level of knowledge of the used concept. To do this, we will exploit a version of the sentiment analysis. In our case, the sentiment analysis determines the attitude of the learner to the concepts explicated within the answers. As a starting point for developing of the sentiment analysis, we consider the work reported within (Moilanen and Pulman 2007) and (Liu 2008). It is often underlined that adding knowledge rich features improves the results in sentiment analysis. For example – (Moilanen and Pulman 2007), (Kennedy and Inkpen 2006), (Kim and Hovy 2006). The input for this module will be the results from the previous modules. In order to construct a concept evidence of the learner’s competence, we first need to extract the concepts which are mentioned within the answers text. Then, on the base of the ontological reasoning, the implied concepts will be added. For example, if the answer’s holder says that he/she is used to giving injections 2 , this automatically means: on more general level, that he/she can intervene in order to improve the situation, and, on more specific level, that he/she can put liquid under the skin by using a syringe. We also need to know in what context each of the concepts in the text was mentioned by the learner. For example, if the learner stated two opposite fact: it is easy to give an intradermal injection, but it is difficult to give an intramuscular one. From this short context a conclusion can be drawn that the learner is not experienced in giving injections as a whole. Thus, comparing conceptual information and discourse relations about the context, each mentioning of a concept will be evaluated by one of the values: ‘well known’, ‘known’, and ‘unknown’. We will use the methods developed in the areas of sentiment and opinion analysis. As it was already gathered more than 10 answers per topic at average. Then, the same answers were given to two tutors in IT area to grade them. First, we compared the concepts, presented in the answers, to those, required in the descriptions. Then, we compared the automatic grading to the tutors’ one. The results are as follows: there is a big mismatch between the descriptions and the answers due to short students’ productions or avoidance of certain concepts. On the other hand, tutors’ grading was different. It accounted certain aspects (such as detailed description of characteristics of the main concepts to be covered by learner’s answers), but not others (such as the presence/absence or distribution of concepts). The last point reflects the fact that in a verbose answer it is relatively easy to overestimate the learner’s knowledge – especially under time pressure. Thus, the preliminary evaluation showed that: pure automatic comparison might underestimate learner’s knowledge; pure tutor grading also skips some aspects of learner’s knowledge while putting more weight to others. The conclusion is that the best way is for the tutor to have at disposal the intersection list of concepts from curriculum and learners’ answers in order to present the final judgement with respect to learners’ status and future learning materials. The tutor has the concepts from the curriculum, which were mentioned in the answers as well as the list of ones not mentioned. However, in the long run we aim at a more profiled concept evidence, which would be possible when the extensions to the semantic annotation are added (see the discussion in the next section). In such a case the learner’s competence would be set of concept descriptions extracted from the answer. For the moment we envisage to extend the classification of the concepts from three lists to five. We will divide the set of concepts in the following subsets: (1) known concepts; (2) partially known concepts; (3) unknown concepts; (4) concepts with contradictory usages; and (5) additional concepts. The first subset will contain all the concepts which are evaluated as known in the answer. The second subset will contain concepts that are mentioned in the answer, but there is no enough evidence about the level of knowledge of the learner with respect to them. The third subset will contain concepts that are definitely mentioned as unknown by the learner. In the fourth subset we will include the concepts for which there are positive and negative evidences about the knowledge of the learner. The last set is the same as the described above. It can influence the other groups with its relevance or irrelevance. In addition to the extracted concepts, we will extract links to the occurrences of the concepts in the text. 6. Extensions to the semantic annotation For better semantic annotation and its usage in positioning, we consider also additional context-oriented information: co-referential relation annotation, annotation of general lexica and sentiment analysis of the concept usage in the text. The relation between concept annotation and 2 The examples in this section are from a preliminary work in the medical domain. 49 European Project LTfLL (http://www.ltfll-project.org/). We would like to thank the colleagues from the project for the discussions on the topics related to the task of positioning of the learner. Also we would like to thank the two reviewers for their valuable comments. mentioned, a pre-defined requirement list of necessary concepts with definitions will be used in order to estimate the degree of competence, delivered by the learner in the portfolio. There will be three types of evaluation: coverage, degree of detailness and relevance. The coverage will be estimated over the number of the mentioned relevant concepts that match the pre-defined list. The degree of detailness will be evaluated over the depth of the conceptual space. And the relevance will be estimated via the ontological relations from a given concept to the other co-occuring concepts within the discourse segment. 7. 9. Braun, Simone, Andreas Schmidt. (2008). People Tagging & Ontology Maturing: Towards Collaborative Competence Management. In: 8th International Conference on the Design of Cooperative Systems (COOP '08), Carry-le-Rouet, France, May 20-23, 2008. Kalz, Marco; Van Bruggen, Jan; Rusman, Ellen; Giesbers, Bas; Koper, Rob. (2007). Positioning of Learners in Learning Networks with Content, Metadata and Ontologies. In Interactive Learning Environments, Volume 15, Issue 2 August 2007 , pages 191 – 200 Kennedy, Alstair and Inkpen, Diana. (2006). Sentiment classification of movie reviews using contextual valence shifters. Computational Intelligence. vol. 22, 2, pp. 110-125. Kim, Soo-Min and Hovy, Eduard. (2006). Extracting Opinions, Opinion Holders, and Topics Expressed in Online News Media Text. In Proceedings of ACL/COLING Workshop on Sentiment and Subjectivity in Text, Sydney, Australia. Lech, Till Christopher and Koenraad de Smedt. (2006). Enhancing Semantic Annotation through Coreference Chaining: An Ontology-based Approach. In: Siegfried Handschuh, Thierry Declerck, Marja-Riitta Koivunen (eds.), CEUR Workshop Proceedings, Vol. 185, 2006. Liu, Bing. (2008). Web Data Mining: Exploring Hyperlinks, Contents, and Usage Data. Springer. Moilanen, Karo and Pulman, Stephen. (2007). Sentiment Composition. In Proceedings of Recent Advances in Natural Language Processing (RANLP 2007). September 27-29, Borovets, Bulgaria, pp. 378-382. Nikolov , Andriy, Victoria Uren, Enrico Motta and Anne de Roeck. (2009). Towards instance coreference resolution in a multi-ontology environment. Presented at: Workshop on matching and meaning, Edinburgh, UK, April 2009. Simov, Kiril and Petya Osenova. (2007). Applying Ontology-Based Lexicons to the Semantic Annotation of Learning Objects. In Proc. Of RANLP 2007 workshop: Natural Language Processing and Knowledge Representation for eLearning Environments. Borovets, 26. September 2007. Simov, Kiril and Petya Osenova. (2008). Language Resources and Tools for Ontology-Based Semantic Annotation. In Proc. at the Workshop ‘OntoLex 2008’ at LREC 2008. John A. Ross. (2006). The Reliability, Validity, and Utility of Self-Assessment. In: Practical Assessment, Research and Evaluation. Volume 11 Number 10, November 2006. Conclusions In this paper we presented a knowledge-rich method for supporting the tutor in his positioning task. We presented a preliminary evaluation setting, which showed: the potential of the domain ontologies in the semantic annotation within the life-long learning context; the role of the tutor in the same context; and the natural ways for further extension of the annotation, which aims at a more precise and wider eliciting of learner’s knowledge evidences. The result of the service will be used further to compare the concept evidence of the learner’s competence with the learner network. The comparisons will use a vector representation of concept evidence of the learner’s competence and concept evidence of the target competence. The vector for the target competence will be fixed within the learner network. The vector for learner’s competence will be created by the assessor on the basis of the above sets of concepts. Our goal is not just to calculate these sets of concepts, but also to use them for giving feedback to the learner and thus achieving better results in the learning activities. This kind of feedback will be even more useful when the approach is used for self-positioning of the learner. Knowledge rich approach requires some initial effort to prepare the necessary resources in order to achieve the goals of positioning of the learner. In our view (also discussed and shared by other colleagues from the LTfLL project – especially Christoph Mauerhofer from Bitmedia), the effort invested at the beginning will pay off during a long and wide exploitation. This could be true in cases of introducing new products of big software companies, where the company itself has the interest to construct appropriate resources (ontologies, lexicons, curriculum, tests, etc). The advantages of the knowledge rich approach are: the exactness of the evidences of the learner competency, the links to the learning materials and the definition of learning paths. Another advantage of the approach is multilinguality – the curriculum and its annotation could be prepared in one language, but it might be reused with little additional effort in many other languages for the learners who do not know the original language of the curriculum. 8. References Acknowledgements The work reported within this paper is supported by the 50 Facilitating cross-language retrieval and machine translation by multilingual domain ontologies Petr Knoth∗ , Trevor Collins∗ , Elsa Sklavounou† , Zdenek Zdrahal∗ ∗ KMI, The Open University Milton Keynes, United Kingdom {p.knoth, t.d.collins, z.zdrahal}@open.ac.uk † SYSTRAN Paris, France [email protected] Abstract This paper presents a method for facilitating cross-language retrieval and machine translation in domain specific collections. The method is based on a semi-automatic adaption of a multilingual domain ontology and it is particularly suitable for the eLearning domain. The presented approach has been integrated into a real-world system supporting cross-language retrieval and machine translation of large amounts of learning resources in nine European languages. The system was built in the context of a European Commission Supported project Eurogene and it is now being used as a European reference portal for teaching human genetics. 1. Introduction Because of the low frequency of polysemy in domain specific collections, domain-specific MT systems are capable of achieving high performance. However, one of the main obstacles remain in the acquisition of terminology. At the same time, the domain terminology is usually an essential artefact used for query composition. Our method is motivated by this problem and tries to approach it by using a single terminological access point embodied by the multilingual domain ontology for both CLIR and MT. This allows to combine the strengths of ontology-based retrieval and domain-specific machine translation. In Section 2, approaches to domain CLIR with relation to MT are introduced. The theoretical foundation of the method for facilitating domain CLIR and MT is explained in Section 3. The application of the approach in the Eurogene system is then presented in Section 4 and the performance is discussed in Section 5. Finally, the contribution of the paper for the eLearning domain is summarized in Section 6. A significant amount of research has been carried out in the NLP and Semantic Web technology fields in the last years. A few activities and projects, such as LT4eL (Lemnitzer et al., 2007) or LTfLL (LTfLL, 2008), have been launched with the objective to integrate these technologies with eLearning systems. One of the vital sub-objectives of these projects is to allow seamless access and retrieval of multilingual learning materials. In this paper we report on the activities undertaken in the context of Eurogene (The First Pan-European Learning Service in the Field of Genetics) project related to the problem of accessing and sharing multilingual learning resources. More specifically, the article builds on the idea that eLearning systems should not only allow the cross-language retrieval of learning resources, but should be extended with machine translation capabilities to provide a better user experience. The proposed approach synchronizes the adaption of cross-language retrieval and machine translation in such a way that the performance of both systems improves. Although the presented method has been integrated into an eLearning system in the human genetics field, it is applicable in a broader context. Many of the important players in the information retrieval field (including Google and Yahoo!) offer cross-language information retrieval (CLIR), some of them also provide machine translation (MT). While the performance of these systems is usually sufficient for general queries, CLIR and MT are often inaccurate for domain-specific queries. Large repositories storing domain specific content, such as PubMed which stores vast amounts of scholarly articles, have successfully adopted large thesauri/ontologies of domain terminology to improve the performance of their retrieval system (Lu et al., 2009). While there are efforts targeting cross-language retrieval in eLearning (Lemnitzer et al., 2007; Eichmann et al., 1998; Lu et al., 2008), the combination of the domain-specific retrieval and machine translation is rarely available. 2. Approaches to domain CLIR There are two typical approaches to CLIR: 1. MT approach - The user’s query is translated from the source language to the target language and submitted to the search system. This approach can be further divided into two cases: (a) MT of the query is performed and the query is submitted in all languages of interest. (b) A multilingual ontology is developed and used to map the submitted query to different languages. 2. Statistical approaches - The system is trained on a collection of texts (usually parallel). The user’s query is then mapped to a language independent document vector using approaches, such as Latent Semantic Indexing (LSI) (Dumais, 1997). Approach 1(a) requires the search system to be welladapted for the translation of the terminology of the tar- 51 get domain. Depending on the MT system in hand, domain adaption is rule or statistically based. Rule-based approaches allow specifying rules expressing that a given term tL1 in language L1 corresponds to term tL2 in L2 . Statistical approaches to machine translation support automatic learning of such pairs from parallel corpora. Approach 1(b) is motivated by the fact that monolingual domain ontologies can be employed to improve the performance of the retrieval system by query expansion leveraging the ability of ontologies to represent synonyms linked to a concept and the hierarchical structure of concepts. Monolingual ontologies can be extended to multilingual ontologies. Approach 2 is influenced by the size of the available parallel corpora which is critical for the performance of the retrieval system. The approach is, in general, more suitable for bilingual cross-language retrieval as it is usually difficult to find experts to build a domain-specific training set that would contain parallel texts from each language of interest to a common interlingua. (cognitive units of meaning - abstract ideas or mental symbols), T is a set of terms (textual representations of concepts), E is a set of oriented relations (is-a relations), such that !C, E" is a directed acyclic graph, and f : T → C is a surjective function from terms to concepts. Note that this implies that polysemy cannot be represented in our ontology. This is for our purposes intentional as we comprehend a domain as an area or part of an area in which the terminology is unambiguous.1 . Today, lightweight ontologies can be built by reusing existing ontologies or by applying NLP methods for term extraction and ontology learning (Cimiano and Völker, 2005). In the second step, the initial domain ontology is translated using MT and is validated by domain experts. The accuracy of MT is at this moment usually low as the system has not yet been sufficiently trained for the target domain. The resulting multilingual ontology is a 6-tuple O = !C, T, E, f, L, lang", where L is the set of languages and lang : T → L is a mapping from terms to languages. After the validation, the multilingual ontology is integrated with the retrieval system and the available collection of language resources is indexed in terms of the ontology. A set of terms {t|lang(t) = language of the resource} is used for indexing. The bootstrapping phase can be iterated as many times as necessary. The mutual updating procedure is shown in Figure 1. This phase can be further divided into: 3. Synergy of CLIR and MT Our method is based on the assumption that when we start to build a domain-specific system for sharing language resources, the amount of parallel corpora available is often limited. Our methodology uses a multilingual domain ontology as we argue that ontologies are well-suited for domain CLIR and can also be used for the adaption of the machine translation system. We presume an IR system and a MT system to be available. More specifically, our approach requires a hybrid MT system combining rule-based and statistical-based MT. The method consists of two phases, which will be discussed in this section in detail: the initialization phase and the bootstrapping phase. The initialization phase takes as the input a collection of domain texts or an existing monolingual domain ontology and produces as an output a lightweight multilingual ontology of the target domain. While this step is performed just once, the bootstrapping phase is repeated as many times as necessary. The bootstrapping phase takes as the input the multilingual ontology produced in the initialization phase and adapts the MT system by extracting domain specific translation rules from the ontology. As the amount of learning resources stored in the system systematically grows, a statistical module of the MT system can be applied at any time to extract bilingual pairs of domain terms from the available collection of learning resources. These pairs are then used to semi-automatically enrich the multilingual ontology, thus improve the performance of the CLIR and later also the MT system. The initialization phase can be further divided into: 1. Adaption of the MT dictionaries 2. Adaption of the multilingual ontology In the first step of the bootstrapping phase, the MT system is adapted to the domain using bilingual substitution rules of form tL1 → tL2 extracted from the multilingual ontology and satisfying the condition f (tL1 ) = f (tL2 ), where tL1 ∈ TL1 , tL2 ∈ TL2 and TLn is defined as TLn = {t|lang(t) = Ln }. For MT systems that translate using an interlingua, the term on the left hand side of a rule is a term in the language of the interlingua and the term on the right hand side is a term in any other supported language. For bilingual MT systems all combinations of terms are exploited and used for the generation of the translation rules. Supplying MT with rules extracted from the ontology can be also useful when a domain is accessed from a general-purpose search engine. IR systems can be equipped with a classification component that can: calculate the most probable domain of a document, select the most suitable domain ontology available, and extract the rules for adaption of the MT system. For the second step of the bootstrapping phase, let us assume that the content stored in our system grows over time. Each time a new learning resource is submitted, it is indexed and put into the document collection. The submitted learning resource may be a translation of an already existing resource stored in the collection. Such parallel texts can be automatically recognized (Resnik and Smith, 2003) and used by the machine translation system for training.2 1. Development of a seed monolingual ontology. 2. Extension of the ontology to multiple languages. The first step of our approach requires building a small monolingual domain ontology of concepts. For our purposes, we will define the monolingual ontology as a quadruple O = !C, T, E, f ", where C is a set of concepts 1 2 52 Note that this assumption is not always true. Most of the statistical MT systems require parallel corpora Figure 1: Collaboration of CLIR and MT. Translation rules are extracted from the multilingual ontology and are used to adapt the MT system. New terminology discovered in the statistical training phase is sent to the CLIR system which adapts the multilingual ontology. The updates are validated by a domain expert. The output of the statistical training is a set of quadruples of the form (tL1 , tL2 , conf, langq ), where conf is the confidence measure of translating term tL1 to tL2 estimated from text and langq : T → L is a mapping from terms to languages. The statistical model of the MT system is updated and the quadruples are sent to the CLIR system which uses the following algorithm to update the ontology: The algorithm requires one pass through the set of quadruples Q (line 2). During initialization a sufficiently high value of parameter τ is set (line 1). Each quadruple is first tested for the compatibility with the ontological language set and for its confidence (line 3). Later, it is checked whether the terms suggested by MT can be mapped to the ontology (lines 4 and 9). The ontology is then updated using the components of the quadruple (lines 5-7 and 10-12). Finally, the algorithm assembles the new ontology (line 16). When the ontology is updated, domain terminology administrators are made aware of the updates by the system and, if necessary, modifications can be performed (for example, new concepts should be added or better translation than the one proposed exists). Performed validation causes new pairs of rules tL1 → tL2 to be extracted from the validated part of the ontology and to be submitted back to the rule base of the MT system. As the amount of content grows, the system bootstraps and the performance of both MT and CLIR is improved. 4. Application in human genetics In this section, we describe an application of the method of Section 2 in the context of the Eurogene project, which provides an eLearning system for sharing learning resources in human genetics.3 The learning resources are submitted to the system typically in the form of slides, books and research articles represented in a variety of formats including Portable Document Format, Word, Power Point and many others. The Eurogene system also supports multimedia resources, such as images and videos in a number for training, however there have been research studies that investigated learning of multilingual terminology from non-parallel texts, such as in (Fung and Mckeown, 1997). 3 The system can http://eurogene.open.ac.uk/ 53 be freely accessed at of formats. Resources can be handled in nine European languages4 , which are English, German, French, Spanish, Italian, Greek, Dutch, Czech and Lithuanian. More than 30 universities and other institutions located mainly across Europe, but also in non-European countries are actively contributing to this collection. In Eurogene, the initial genetic ontology was developed by merging six monolingual ontologies5 that contained a descriptive, but not too extensive, terminology of the domain. This ontology was translated into the above nine European languages (English is used as an interlingua, i.e. it is used to label the names of concepts) by domain experts and an upper-level ontology has been inferred using Unified Medical Language System (UMLS). A more comprehensive description of the ontology building process can be found in (Zdrahal et al., 2009). The upper-level ontology helps to organize concepts from a relatively flat structure into a concept hierarchy, which is represented in the Simple Knowledge Organization System (SKOS) format which satisfies our definition of the ontology from the previous section. Figure 2 shows how a genetic concept linkage analysis is represented in our ontology. The multilingual ontology was then integrated with the CLIR system. Since then, available content is being annotated. Textual resources are annotated automatically, multimedia resources are annotated manually, but the annotation procedure is guided by the ontology. In the first part of the bootstrapping phase, rules were extracted from the multilingual ontology to adapt the MT system as described in the previous section. This typically helps to improve the performance of MT. For example, before the adaption, our system wrongly translated the English collocation linkage analysis to French as analyse de triglerie, whereas since the rule Linkage analysis → Analyse de liasion was extracted from the part of the ontology in Figure 2 and it was put into the MT rule base, the system has correctly translated the term as Analyse de liasion. The CLIR system is powered by Lucene extended with a dedicated query parser that allows the user to combine terminological and full-text queries. Queries can be expressed in any of the available languages, and the results can be filtered by a subset of the available languages. Queries are mapped to a language independent representation using the ontology. The CLIR system can also be used during query composition to visualize the concept hierarchy and to interactively control query expansion for broader and/or narrower terms (Figure 3), thus utilizing the benefits of ontology-based retrieval. A hybrid system developed by SYSTRAN is used for MT tasks, i.e. for the MT of resources and also for the learning of relations from parallel texts (SYSTRAN, 2009). The Figure 2: Representation of a concept linkage analysis in the multilingual ontology. The preferred label of this concept is the English version Linkage analysis. The concept has a two alternative representations in German (LinkageAnalyse and Kopplungsanalyse).7 The representation in French is Analyse de liasion and in Spanish Analisis de ligamiento. The concept Linkage analysis is a broader concept for Parametric linkage analysis and Non-parametric linkage analysis, and it is related to a concept Marker analysis. Figure 3: User interface of the Eurogene CLIR system. The CLIR system allows to control the expansion for broader/narrower terms. CLIR and MT systems communicate using SOAP messages that allow the sending of extracted translation rules from CLIR to MT, and the sending of newly proposed translations from MT to CLIR. When newly proposed translations are received by CLIR, the ontology is updated using the algorithm in Section 2. Domain experts then perform terminology validation which is supported by the system and results in sending new translation rules to the MT rule base. This synchronization provides a mechanism for continuous semi-automatic adaption of both CLIR and MT systems. 4 While CLIR allows to pose queries and receive results in any of the mentioned languages, MT is limited to language pairs supported by the Systran system. Please also note that MT is not applied to images and videos. 5 Published by the University of Washington in Seattle, National Institute of General Medical Sciences in Bethesda, Elsevier, Oracle ThinkQuest, University of Michigan and Centre for Genetics Education in Sydney 5. Performance analysis The performance of the proposed method and its impact on the resulting CLIR and MT systems can be influenced by a number of factors. These include mainly the suitability of the multilingual ontology for the target domain, 54 7. the amount of domain corpora available in the statistical phase, the performance of the multilingual keyword extraction system and the validity of the judgements performed by domain experts in the ontology refinement process. Given the number of possible error sources, it seems much more sensible to make sure that the method satisfies certain properties rather than performing a quantitative evaluation that would be biased by too many components. One of the important properties that the proposed method in Section 3 should have is that the performance of both CLIR and MT should never decrease as a result of any bootstrapping iteration. Let us assume that the initial ontology has been validated by domain experts, so that it does not include any spurious translations. There are now two tasks which could have a negative impact on the performance of the CLIR or MT systems. These tasks correspond to 1) the update of the MT rule base and 2) the update of the multilingual ontology as described in Section 3. If we assume that our domain is sufficiently small, so that no domain specific term appearing in the multilingual ontology is polysemous in our collection, then updating the dictionary of the MT system may either improve or not change the precision of the MT system. Since it is not possible to extract a spurious translation rule from the multilingual ontology, the resulting MT system cannot perform worse than before the update. It is essential to expect that the statistical training phase described in Section 3 may produce quadruples describing translations that are in fact invalid and may thus introduce errors to the ontology. However, since all the updates must be validated by domain experts before they can be used by the CLIR system, it is possible to assume that no errors are introduced. This is in reality difficult as humans are in fact vulnerable to introducing errors. Thus, the quality of the ontology used by CLIR can deteriorate only under the condition that an error has been introduced by a domain expert. To summarize, if all the above mentioned conditions are met, the method is guaranteed to improve or in the worst case not to worsen the performance of the CLIR and MT systems after each iteration. 6. Conclusion Multilingual ontologies are particularly suitable for domains where terminology is used for query composition, such as in eLearning. They can be used as a synchronization component for domain adaption of CLIR and MT systems. In addition, the solution is easily readable and adjustable by humans and does not preclude the use of statistical approaches for terminology extraction when a large corpora is available. In the future, publishing of multilingual ontologies on the Web in a standard format may allow an application to decide which domain ontology to use for query expansion and for adaption of the MT system based on the context of the query. This may be helpful when a user accesses a specific domain from a general-purpose search engine. 8. References Philipp Cimiano and Johanna Völker. 2005. Text2onto - a framework for ontology learning and data-driven change discovery. Susan T. Dumais. 1997. Automatic cross-language retrieval using latent semantic indexing. David Eichmann, Miguel E. Ruiz, and Padmini Srinivasan. 1998. Cross-language information retrieval with the umls metathesaurus. In In: Proc. of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 72–80. Pascale Fung and Kathleen Mckeown. 1997. Finding terminology translations from non-parallel corpora. Lothar Lemnitzer, Cristina Vertan, Alex Killing, Kiril Ivanov Simov, Diane Evans, Dan Cristea, and Paola Monachesi. 2007. Improving the search for learning objects with keywords and ontologies. In Erik Duval, Ralf Klamma, and Martin Wolpers, editors, EC-TEL, volume 4753 of Lecture Notes in Computer Science, pages 202–216. Springer. LTfLL. 2008. Language technology for lifelong learning (ltfll). Wen-Hsiang Lu, Ray S. Lin, Yi-Che Chan, and Kuan-Hsi Chen. 2008. Using web resources to construct multilingual medical thesaurus for cross-language medical information retrieval. Decis. Support Syst., 45(3):585–595. Zhiyong Lu, Won Kim, and W. John Wilbur. 2009. Evaluation of query expansion using mesh in pubmed. Inf. Retr., 12(1):69–80. Philip Resnik and Noah A. Smith. 2003. The web as a parallel corpus. Computational Linguistics, 29:349–380. SYSTRAN. 2009. Systran’s machine translation technology url: http://www.systran.co.uk/systran/corporateprofile/translation-technology. Zdenek Zdrahal, Petr Knoth, Trevor Collins, and Paul Mulholland. 2009. Reasoning across multilingual learning resources in human genetics. In Proceedings of ICL 2009. Implications for eLearning This paper showed that current eLearning applications supporting CLIR can also easily adopt MT and tailor it for their domain. In addition, the synergy of CLIR and MT may help to improve the performance of both. The main reason why the method is particularly useful in eLearning is that we should expect that the users of eLearning applications will very often use domain terminology as a part of their submitted queries, thus the added value will become more noticeable than in other contexts. The paper brought the following contribution: • Development of a new method for facilitating crosslanguage retrieval and machine translation by multilingual domain ontologies. • Development of a real-world eLearning application enhanced by the use of the presented method. 55