About

Sõnaveeb [WordWeb] is the language portal of the Institute of the Estonian Language (EKI) containing lexical information from a growing number of dictionaries and databases. The portal was released in February 2019. The information displayed in Sõnaveeb comes from Ekilex, a Dictionary Writing System maintained and developed by the Institute in collaboration with the software company TripleDev. As of February 2024, Ekilex contains 130+ lexical databases: general as well as specialised dictionaries. Databases are constantly updated and edited, including changes that are made upon receiving feedback from users. As of February 2024, the portal contained about 300,000 words and phrases in Estonian, about 220,000 words and phrases in Russian and 116,000 in English. The versions of Sõnaveeb are updated and archived once a year.

Since 2020, all information from the separate lexical databases has been displayed in a unified mode as a single source titled EKI ühendsõnastik [The EKI Combined Dictionary, CombiDic]. The EKI Combined Dictionary displays information from different lexical databases: "The Dictionary of Estonian 2019", "Estonian Collocations Dictionary 2019", "Basic Estonian Dictionary" (2014), "The Estonian Morphological Database of the Institute of the Estonian Language 2019–". It also displays information from bilingual lexical databases: "Estonian-Russian orthographic dictionary for students 2018" (1st edition 2011), "Estonian-Russian Dictionary 2018" (1st edition 1997–2009), "The Russian Morphological Database of the Institute of the Estonian Language 2019–".

In addition to carefully selected usage examples, we display web examples from 'The Corpus of Web Examples for Estonian' via the corpus query system KORP API.

The creation and development of the portal were funded by the Digital Focus Program of the Ministry of Education and Research (2018–2021), by the EKI-ASTRA program (2016–2022), and by the State Shared Service Centre (24.01.2022–31.10.2023).

Technical support: OÜ TripleDev.

Copyright: Institute of the Estonian Language 2024

Estonian data

Information about Estonian is displayed when Estonian is selected as the target language. The user can choose between two modes of information display: Sõnaveeb or Learner's Sõnaveeb.

Sõnaveeb displays all available information on the word that you are looking for.

Learner's Sõnaveeb shows less information and covers fewer words, and the information is presented in a simpler way: the explanations are shorter, and there are fewer supplementary materials, additional explanations, and comments.

Learner's Sõnaveeb

The user can currently choose between two modes of information display: Sõnaveeb (for advanced users) or Learner's Sõnaveeb (for language learners). Sõnaveeb is intended primarily for native speakers. It displays all the information on a word that comes from different sources. The advanced mode is a sophisticated view that might require more options for further filtering. Learner's Sõnaveeb is intended primarily for learners at the A2–B1 proficiency levels. It covers 6,000 basic Estonian words, and the information is presented in a simpler way: the definitions are shorter, knowledge is organized using controlled vocabulary, there is explicit information about the most frequent morphological forms, etc.

Pronunciation

Users can listen to the pronunciation of about 5,000 of the most frequent headwords, as well as their most important inflected forms, and of about 7,000 unadapted loan words. The information on pronunciation has been aggregated from different datasets. The unadapted loan words were pronounced by Estonians who speak foreign languages at a high level of proficiency. The most frequent words and their inflected forms were pronounced by professional actresses.

Users can also listen to the usage examples chosen by lexicographers. The text-to-speech synthesis was developed by the Institute of the Estonian Language.

Dictate!

Speech recognition, developed by the Department of Cybernetics of the Tallinn University of Technology, is used when dictating words. Speech recognition operates in real time. For the best results, users should pronounce the search word clearly and steadily.

Morphological forms

The information on the morphological forms of Estonian and Russian comes from "The EKI Estonian Morphological Database 2022".

Collocations

Collocations are words that are often used together in a language. Each language has special combinations of words, and knowing them is essential for speaking and writing that language naturally and fluently. Collocations are displayed primarily to help language learners use the language in a way similar to native speakers. The information on collocations comes from the dictionary database "Estonian Collocations Dictionary 2018". The collocations have been semi-automatically selected from the Estonian National Corpus (2013 and 2017).

Usage examples

Usage examples have been carefully selected from the Estonian National Corpus (2013, 2017, 2019, 2021) and minimally edited by the lexicographers.

Web sentences

In Sõnaveeb, authentic examples from the corpus are displayed. They have been selected automatically and have not been edited. The examples are queried from 'The Corpus of Web Examples for Estonian' via the Corpus Query System KORP API.

The examples for Russian are queried from the ruSkELL 1.6 corpus via the Sketch Engine JSON API.

Picture Dictionary

The EKI Picture Dictionary is a multilingual tool suitable for language learners at both basic and advanced levels. The Picture Dictionary can be used to learn Estonian, Russian, and Ukrainian. The dictionary contains approximately 1,000 pictures, organized into 52 themes (e.g., vehicles, fruits and berries, sports, school, and office). The Estonian version of the Picture Dictionary is audio-enabled, so the pronunciation of every word can be heard. The pictures are accompanied by usage examples. Additionally, users can navigate directly from the Picture Dictionary to the Learner's Sõnaveeb, where they can explore the meaning of a word, its synonyms, and its collocations, and learn how to inflect or conjugate the word.

The EKI Picture Dictionary was compiled by Jelena Kallas, Kristina Koppel, Olga Kiisla, and Ellen Dovgan, with artists Kristel Karp and Anni Piits-Jamnik and with technical support by Katrin Tsepelina and TripleDev OÜ.

Copyrights

The copyright of 'EKI ühendsõnastik' [Combined Dictionary of Estonian] belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2024

The copyright of 'EKI terminibaas Esterm' belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2024

The copyrights of terminological databases belong to the authors
Copyright: authors

The copyright of the language portal Sõnaveeb belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2024

The copyright of the dictionary system Ekilex belongs to Eesti Keele Instituut [Institute of the Estonian Language]
Copyright: Eesti Keele Instituut 2024

Referencing

A word or a phrase in 'EKI ühendsõnastik 2024' should be referenced as follows:

Word or phrase. EKI ühendsõnastik 2024. Eesti Keele Instituut, Sõnaveeb 2024. https://sonaveeb.ee/sõna või väljend (DD.MM.YYYY)

A word or a phrase in the terminological database should be referenced as follows:

Word or phrase. The name of the database. Eesti Keele Instituut, Sõnaveeb 2024. https://sonaveeb.ee/sõna või väljend (DD.MM.YYYY)
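
The two reference patterns above can be sketched in code. The following Python function is only an illustration, not an official Sõnaveeb tool: the names `format_reference`, `BASE_URL`, `source` and `portal_year` are hypothetical, and percent-encoding the word in the URL is an assumption (the pattern above shows the word or phrase appended directly to the address).

```python
from datetime import date
from urllib.parse import quote

# Base address taken from the reference pattern above.
BASE_URL = "https://sonaveeb.ee/"


def format_reference(entry, accessed, source="EKI ühendsõnastik 2024", portal_year=2024):
    """Build a reference string following the pattern described above.

    entry       -- the word or phrase being cited
    accessed    -- the access date (datetime.date), rendered as DD.MM.YYYY
    source      -- the dictionary or terminological database name
    portal_year -- the Sõnaveeb version year (assumed parameter)
    """
    # Assumption: non-ASCII letters and spaces are percent-encoded in the URL.
    url = BASE_URL + quote(entry)
    accessed_text = accessed.strftime("%d.%m.%Y")
    return (
        f"{entry}. {source}. Eesti Keele Instituut, "
        f"Sõnaveeb {portal_year}. {url} ({accessed_text})"
    )


if __name__ == "__main__":
    # Reference to an entry in the combined dictionary.
    print(format_reference("kass", date(2024, 2, 15)))
    # Reference to an entry in a terminological database (name is an example).
    print(format_reference("kass", date(2024, 1, 5), source="Esterm"))
```

For a terminological database, pass the name of the database as `source`; everything else in the pattern stays the same.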

Versioning

The EKI ühendsõnastik [Combined Dictionary of Estonian] and the terminological databases are constantly updated and edited. New versions are created and archived once a year and marked with the date.

All versions are registered in METASHARE and archived in the Centre of Estonian Language Resources and in the Ekilex database.