The Eesti Murrete Korpus (EMK, Corpus of Estonian Dialects) contains data on Estonian dialects. Although the area where Estonian is spoken is quite small, the number of dialects in use is considerable: at least eight main dialects and more than a hundred sub-dialects have been classified. Until now, only a few comparative studies of Estonian dialect grammar and phonology have been undertaken. The corpus aims to compile a database of information on Estonian dialects to facilitate further research on this subject.
All data included in the corpus are gathered and handled according to the same principles, which makes it possible to compare phonological and grammatical structures across Estonian dialects.
The Corpus of Estonian Dialects is a joint project of the University of Tartu and the Institute of Estonian Language that started in 1998. The corpus includes dialect data sources from both the Institute of Estonian Language and the University of Tartu, starting with the oldest recordings.