COSER (Audible Corpus of Spoken Rural Spanish)

The Audible Corpus of Spoken Rural Spanish (known by its Spanish acronym, COSER [1]) is a dialectal corpus based on interviews with the kind of informants on whom traditional dialectology has focused: elderly rural native speakers with a low level of education. So far, 1,497 informants have been recorded, with the following distribution by sex:

Males: 662 (44.2%)
Females: 835 (55.7%)
Total: 1,497

The informants' overall average age is 72.9 years. The COSER survey targets informants who were born in the first third of the 20th century and who have received little formal instruction. On the whole, they attended elementary school for a few years in order to learn, in their own words, "to read and write, and four more rules [of elementary mathematics]". The COSER recordings have been collected regularly from 1990 until recently in a series of survey campaigns. This fieldwork has been organized with the support of several research projects and as part of the coursework for the elective subjects "Hispanic Dialectology" (1988-1996) and "The Spoken Spanish: Peninsular Variants" (1996-2011), both belonging to the degree in Hispanic Philology at the Autonomous University of Madrid (UAM). So far, informants in 801 rural localities in the center and north of the Iberian Peninsula have been interviewed. The final objective is to obtain recordings of the Spanish spoken in rural areas across the whole Iberian Peninsula. The localities surveyed so far are shown on the map:

COSER localities.jpg

For the time being, the audio materials cover the central band of the Iberian Peninsula. The density of the survey network is comparable to, or even greater than, that of the regional atlases. Overall, COSER currently holds about 1,000 hours of recordings, all of which have been digitized. Some have also been transcribed as text files, thanks to the support of several research projects and to the participation of many cohorts of UAM students, who transcribed recordings they had collected themselves as part of their coursework. Forty hours of these recordings and their transcriptions are now available, and 150 hours will be available in 2012.


The methodology used in COSER consists of sociolinguistic interviews that the interviewers steer towards topics of traditional country life. Focusing the interview on such specific topics does not prevent the conversation from turning, once some time has passed and the informant’s confidence has been gained, to other subjects such as education, personal hopes and experiences, life or family, depending on how relaxed and spontaneous the informant proves to be. The decision to focus the interview on specific topics related to rural life “of former times” has much to do with the fact that, in order to agree to be interviewed, potential informants must be able to show some knowledge of a way of life in decline. This knowledge is the product of their own personal experience and age, and it gives them informative "authority" in the eyes of the urban interviewer. Informants accept the interview once they realize that we are interested in a testimony about a declining way of life that very few people still remember and on which they know they are experts. We think that the informants’ spontaneous cooperation would be much harder to obtain if they were asked from the outset about personal views or experiences, linguistic matters or other topics beyond rural life. The interviewing team's insistence on its specific interest in the strictly local tradition, as opposed to that of other rural enclaves, and on the informant’s unique status as a bearer of that tradition, has on many occasions been a decisive factor in getting the interview accepted. Informants are always contacted at random, with no prior arrangement, from among the local inhabitants who meet the above-mentioned requirements.
The rather unrewarding experience of some interviews, owing to the informants’ poor communicative ability (people reluctant to speak, who answered in very short sentences or just in monosyllables), led us to subsequently add the condition of loquacity (“that the informants like talking”) to the informant selection protocol. As anyone who has ever carried out fieldwork well knows, success is never assured, and interviews that start under the same conditions may turn out excellent or dreadful. Thus, not all interviews are equally suitable or informative, depending on the informants’ willingness, the interviewers’ skill and the interaction between them; however, no testimony should be disregarded for that reason. This methodology cannot avoid the problem of accommodation between informant and interviewer, or the questionable representativeness of a randomly chosen informant. Nevertheless, we think that the quantity of data makes it possible to circumvent these potential problems, since the data consistently show geographical coherence and make it possible to discard those informants who could be considered anomalous within their area. As for the number of informants per enclave, COSER has generally preferred to interview a single person in depth, either a man or a woman. However, recording conditions have sometimes made it impossible to avoid interruptions from other individuals (generally family members or acquaintances who, drawn by such an extraordinary event as the interview, cannot resist the temptation to take part by giving their own testimony). Thus, although as many as 1,497 informants have been recorded in COSER, in most cases only one informant per enclave has actually been interviewed in depth as intended (almost half).
The average duration of the recordings is one hour and fifteen minutes (75 minutes) per enclave, although it may range from just half an hour to more than two and a half hours. The quality of the recorded data is not directly proportional to the duration, since there are excellent and very informative recordings of just half an hour whose results are comparable to those obtained in a longer session.

Utility and limitations

COSER is a corpus designed to measure the differences that may be found in the speech of less educated sociocultural groups in rural areas. It therefore complements both the linguistic atlases and the various corpora of educated and urban speech that have been compiled, or are planned, in the Spanish-speaking world. The uniformity of the methodology makes it useful for measuring both the linguistic distance that separates different areas (physical distance) and the linguistic distance that separates this social group from others, such as speakers with a higher sociocultural level or younger speakers (social distance). Although the proportions of men and women interviewed are not identical (55.7% women vs. 44.2% men), the number of speakers of each gender is statistically representative and also makes it possible to investigate linguistic differences associated with gender. Because most Spanish oral corpora draw on the media as their source, COSER is somewhat singular: the kind of speakers it interviews are rarely recorded in that setting. Comparing the data obtained in COSER with those of other corpora of spoken Spanish thus makes it possible to identify clear sociocultural differences. In this regard, COSER has proved especially useful because it enables the study of non-standard grammatical solutions, which are usually systematically avoided in written language and in the speech of more highly educated sociocultural groups. For that reason, Chambers (1995) has proposed, as a sociolinguistic universal, the qualitative character (presence/absence) of grammatical variables on the social scale, in contrast to the quantitative character of phonetic variables.

Research lines and publications

COSER materials have made it possible to investigate several aspects of grammatical variation in Spanish, and the results have been published periodically [2]. The research topics so far have been the following: accusative/dative clitic alternation, mass neuter, clitic order, subjunctive/indicative variation, double determination, personal infinitives with 3rd person plural subjects, reflexive passives and reflexive impersonals, analogical verb forms, and lexical variation in adverbs.

Research team

Inés Fernández-Ordóñez Hernández, Project Director and Main Researcher

Enrique Pato Maldonado, Researcher

Javier Rodríguez Molina, Postdoctoral Researcher

Bautista Horcajada Diezma, ICT Developer

Carlota de Benito Moreno, PhD Student in Dialectology

Víctor Lara Bermejo, PhD Student in Dialectology

Beatriz Martín Izquierdo, Research Assistant

Sara García Motilla, Research Assistant

