Digital libraries
Advanced and multilingual search in digital libraries and OPACs
Users of digital libraries or OPACs are accustomed to using search engines such as Google or Yahoo as their primary source for information retrieval.
The user interfaces of traditional OPACs are not designed with these types of searches in mind, focusing primarily on the categorization of data rather than data retrieval.
Through close collaboration with international libraries and research centers (in particular the Department of Computer Science of the University of Bolzano), CELI has created a solution that can be integrated into an existing OPAC while making it as simple as possible for users to retrieve the texts that interest them.
The approach is based on two guiding principles:
- Simplicity: a single text field should be enough to give the user access to the desired information. Recent studies have shown that more complex forms of querying and navigation are rarely used, if at all.
- Intelligent processing: despite the simplicity of the interface, an indexing and retrieval system such as those used for Digital Libraries and E-Catalogs must rely on state-of-the-art technologies to solve problems that simply don't arise in ordinary text retrieval.
Ease of use and intelligent analysis
Thanks to the linguistic capabilities of the Sophia Semantic Engine, it is possible to improve searching in digital catalogues and/or text collections in electronic format:
- lemmatization and morphological analysis (the queries "tropical plant" and "tropical plants" produce the same results)
- thesaurus-based query expansion (the query "tropical plants" also retrieves results that contain the words "tropical trees")
- automatic recognition of proper nouns (the query "Buffalo Bill" will provide documents discussing the American soldier and historical figure as its first results, rather than documents about the animal 'buffalo' or the legal term for a proposed law, 'bill')
- weighted indexing of various fields in the catalogue and texts in electronic format
- integration with the various forms of subject glossaries used by a library
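The features above can be illustrated with a toy Python sketch. It is not the Sophia Semantic Engine's implementation: the lemma table, thesaurus, proper-noun list and field weights are hypothetical stand-ins, and a tiny lookup table takes the place of real morphological analysis. It shows how reducing query and record words to lemmas, keeping known multi-word names together, expanding terms through a thesaurus, and weighting matches by catalogue field can combine into a single ranking.

```python
import re

# Hypothetical lemma table standing in for real morphological analysis.
LEMMAS = {"plants": "plant", "trees": "tree"}
# Hypothetical thesaurus: lemma -> related lemmas used to broaden a query.
THESAURUS = {"plant": {"tree"}}
# Hypothetical gazetteer of two-word proper nouns kept together as one term.
PROPER_NOUNS = {("buffalo", "bill")}
# Hypothetical field weights: a title match counts more than an abstract match.
FIELD_WEIGHTS = {"title": 3.0, "subject": 2.0, "abstract": 1.0}


def analyze(text):
    """Lowercase, tokenize and lemmatize text, merging known proper nouns."""
    tokens = [LEMMAS.get(t, t) for t in re.findall(r"\w+", text.lower())]
    terms, i = [], 0
    while i < len(tokens):
        if tuple(tokens[i:i + 2]) in PROPER_NOUNS:
            terms.append(" ".join(tokens[i:i + 2]))
            i += 2
        else:
            terms.append(tokens[i])
            i += 1
    return terms


def score(record, query):
    """Sum field weights for each query term found in each field.

    Exact lemma matches score the full weight; thesaurus-only matches score half.
    """
    total = 0.0
    for field, weight in FIELD_WEIGHTS.items():
        field_terms = set(analyze(record.get(field, "")))
        for term in analyze(query):
            if term in field_terms:
                total += weight
            elif THESAURUS.get(term, set()) & field_terms:
                total += weight * 0.5
    return total


def search(records, query):
    """Return the ids of records with a positive score, best first."""
    scored = [(score(r, query), r["id"]) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [rid for s, rid in scored if s > 0]


# Hypothetical sample catalogue records.
catalogue = [
    {"id": "b1", "title": "Buffalo Bill and the Wild West"},
    {"id": "b2", "title": "The American buffalo", "abstract": "A bill on wildlife."},
    {"id": "b3", "title": "Tropical trees of Brazil"},
    {"id": "b4", "title": "A tropical plant guide"},
]
print(search(catalogue, "Buffalo Bill"))     # ['b1']
print(search(catalogue, "tropical plants"))  # ['b4', 'b3']
```

Because "plants" and "plant" map to the same lemma, both queries return the same ranking, and the thesaurus lets "Tropical trees of Brazil" match at reduced weight. Halving the weight of thesaurus-only matches is an arbitrary illustrative choice.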
Multilingual search
Most libraries have foreign-language texts in their catalogues.
Moreover, researchers and other "knowledge professionals" are increasingly mobile, so many libraries face multilingual search requests every day. For meta-OPACs operating at a European level, guaranteeing multilingual access to catalogues becomes essential.
Drawing on decades of activity in Cross-Language Information Retrieval, CELI provides multilingual access to electronic catalogues and digital libraries with the same simplicity of access as the corresponding monolingual versions. The end user simply chooses the languages of the books they wish to retrieve and enters a query in their native language; the system then finds the best results for each selected language. Its main features are:
- a wide range of bilingual dictionaries for query translation: Italian, English, French, German, Polish, Russian...
- automatic updating of the bilingual dictionaries based on user queries
- semantic disambiguation to identify the correct translation in a given domain
- multilingual query expansion to identify relevant digital objects even when the lexica contain no exact translation of the words in the user's query
- independence from the underlying cataloguing system: the system exploits various national classification systems (for example glossaries and subject headings) to improve results, but its strategy for answering multilingual queries does not depend on the glossary in use
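The query-translation step can be sketched in Python as follows. This is a toy illustration under stated assumptions, not CELI's system: the miniature Italian-English and Italian-French dictionaries, the domain labels attached to each translation, and the rule of keeping an untranslatable term unchanged (handy for proper names) are all hypothetical. When the chosen domain selects a sense, only that translation is kept (disambiguation); otherwise every candidate is kept (expansion).

```python
# Hypothetical bilingual dictionaries: (source, target) -> term -> [(translation, domains)].
DICTIONARIES = {
    ("it", "en"): {
        "pianta": [("plant", {"botany"}), ("map", {"cartography"}), ("sole", {"anatomy"})],
        "tropicale": [("tropical", {"botany", "geography"})],
    },
    ("it", "fr"): {
        "pianta": [("plante", {"botany"}), ("plan", {"cartography"})],
        "tropicale": [("tropical", {"botany", "geography"})],
    },
}


def translate_term(term, source, target, domain=None):
    """Translate one term, disambiguating by domain when possible."""
    candidates = DICTIONARIES.get((source, target), {}).get(term, [])
    if not candidates:
        # No entry: keep the term unchanged (assumption; works for names and cognates).
        return [term]
    in_domain = [word for word, domains in candidates if domain in domains]
    # Keep only in-domain senses if any exist; otherwise expand to every sense.
    return in_domain or [word for word, _ in candidates]


def translate_query(query, source, targets, domain=None):
    """Build one translated query per target language.

    Each query is a list of alternatives per source term, which a retrieval
    engine could OR together within a term and AND across terms.
    """
    words = query.lower().split()
    return {
        target: [translate_term(word, source, target, domain) for word in words]
        for target in targets
    }


print(translate_query("pianta tropicale Garibaldi", "it", ["en", "fr"], "botany"))
# {'en': [['plant'], ['tropical'], ['garibaldi']], 'fr': [['plante'], ['tropical'], ['garibaldi']]}
```

With the domain set to "cartography", "pianta" would instead translate to "map" in English; with no domain at all, the three English senses are all kept, so the query is expanded rather than disambiguated.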