Enterprise search 2.0
More effective document retrieval thanks to semantic analysis
CELI integrates semantic technologies into the DocDigger search engine.
DocDigger analyzes the contents of portals and knowledge bases and provides the end user with new and more effective ways of retrieving the documents contained therein.
The greatest strength of DocDigger lies in its ability to provide optimal results in specialized domains. Unlike general-purpose search engines, such as those available online, DocDigger is capable of understanding documents to a certain degree and of disambiguating terms on the basis of the search domain.
This makes it possible, for example, to distinguish between the musical meaning of bow (the stick used to play a violin), the archery meaning of bow, and the communication meaning of bow (inclining the head or body as a salute).
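The bow example can be illustrated with a minimal sketch. It is not DocDigger's actual method, which is not described here: it assumes a hand-written sense inventory (the hypothetical `SENSES` table) and simply picks the domain whose cue words overlap most with the surrounding text.

```python
import re

# Hypothetical sense inventory: the cue words are illustrative, not DocDigger data.
SENSES = {
    "bow": {
        "music": {"violin", "string", "cello", "rosin"},
        "archery": {"arrow", "target", "quiver", "archer"},
        "communication": {"greeting", "salute", "respect", "audience"},
    }
}


def disambiguate(term, context):
    """Return the domain whose cue words overlap most with the context.

    Returns None when the term is unknown or no cue word occurs in the context.
    """
    senses = SENSES.get(term.lower())
    if not senses:
        return None
    # Lowercase the context and split it into alphabetic words.
    words = set(re.findall(r"[a-z]+", context.lower()))
    scores = {domain: len(cues & words) for domain, cues in senses.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Under these assumptions, `disambiguate("bow", "She drew the bow across the violin string.")` selects the music sense, while a context mentioning an archer and an arrow selects the archery sense.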
DocDigger owes its precision to the integration of semantic analysis technologies, which are the culmination of decades of research and development.
In particular, in accordance with the most recent indications from the academic community, DocDigger integrates symbolic analysis techniques (using dictionaries, grammars, thesauri, etc.) with statistical analysis algorithms, aimed at supporting processes such as automatic document classification, clustering, etc.
From the point of view of the end user, such integration results in more precise retrieval of relevant documents and a notable reduction in search latency.
Characteristics of DocDigger
Thanks to the linguistic capabilities of the Sophia Semantic Engine, it is possible to improve the searching of portals and knowledge bases.
- Free text search (allows the identification of keywords specified in user input, regardless of their morphological inflection).
- Identification of the concepts that occur most frequently in documents, and their use as keywords to refine searching.
- Automatic summary of concepts present in a document (Snapshot View).
- Expansion based on conceptual similarity.
- Possible category view.
- Automatic classification.
- Clustering (the grouping of documents into classes by similarity, not explicitly specified by the user).
- Automatic extraction of entities and their use in the search phase (e.g. email addresses, dates, names of companies, names of people, numbers, etc.).
- Multilingual (languages currently available, with varying depths of analysis: Italian, English, French, Spanish, Catalan, Portuguese, German, Dutch, Swedish, Norwegian, Finnish, Danish, Polish, Russian, Belarusian, Estonian, Latvian, Lithuanian, Ukrainian, Greek, Turkish, Arabic, Hebrew, Armenian, Albanian, Croatian, Serbian, Czech, Slovak, Slovenian, Romanian, Bulgarian, Hungarian, Chinese, Japanese).
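As an illustration of the entity extraction item in the list above, the following sketch pulls email addresses and dates out of text and indexes documents by the entities found, so that they can be used at search time. It is a simplified assumption, not the Sophia Semantic Engine: the two patterns are deliberately narrow (dates only in YYYY-MM-DD form), and the names `extract_entities` and `build_entity_index` are hypothetical.

```python
import re

# Deliberately simple patterns, for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
DATE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")  # ISO-style dates only


def extract_entities(text):
    """Return the email addresses and dates found in the text."""
    return {"email": EMAIL.findall(text), "date": DATE.findall(text)}


def build_entity_index(docs):
    """Map each (entity type, value) pair to the ids of the documents containing it."""
    index = {}
    for doc_id, text in docs.items():
        for kind, values in extract_entities(text).items():
            for value in values:
                index.setdefault((kind, value), set()).add(doc_id)
    return index


# Made-up documents to show how the index answers an entity query.
docs = {
    "a": "Write to info@example.com before 2024-05-01.",
    "b": "The deadline has moved to 2024-05-01",
}
index = build_entity_index(docs)
matching = index[("date", "2024-05-01")]  # ids of documents mentioning that date
```

Names of companies and people generally cannot be captured with patterns like these and would require dictionaries or statistical models.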
Facet browsing
DocDigger is based on the methodology of faceted classification, which allows it to surpass the limits of traditional taxonomies.
This methodology introduces a multidimensional approach in which contents are described in terms of multiple "facets" and can thus be searched according to multiple criteria.
Multidimensional classification facilitates access to the contents and, thanks to the navigable taxonomy, offers implicit suggestions for additional search routes, bringing the search closer to users' needs and expectations.
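The idea of faceted classification can be sketched in a few lines. The example below is an assumption for illustration, not DocDigger's data model: the documents, facet names and helper functions are all hypothetical. Each document carries several facets, a search can combine any number of them, and counting the facet values of the remaining results yields the implicit suggestions for further refinement.

```python
from collections import Counter

# Made-up documents, each described along several facets.
DOCS = [
    {"title": "Annual report", "facets": {"language": "English", "type": "report"}},
    {"title": "Rapporto annuale", "facets": {"language": "Italian", "type": "report"}},
    {"title": "Product manual", "facets": {"language": "English", "type": "manual"}},
]


def filter_by_facets(docs, **criteria):
    """Keep the documents whose facets match every given criterion."""
    return [
        doc for doc in docs
        if all(doc["facets"].get(name) == value for name, value in criteria.items())
    ]


def facet_counts(docs, facet):
    """Count how many documents carry each value of the given facet."""
    return Counter(doc["facets"][facet] for doc in docs if facet in doc["facets"])


english = filter_by_facets(DOCS, language="English")
suggestions = facet_counts(english, "type")  # values available for narrowing further
```

Because every facet is an independent dimension, the same document can be reached by language first and type second, or the other way round, which a single fixed hierarchy does not allow.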