Chapter 3: Transcription protocol


This protocol is a manual for transcribing dialect material from oral interviews that take place between an assistant interviewer and an informant.

This protocol is also available in PDF format.

The audio material collected consists of:

  • Training of the assistant interviewer;
  • Questions uttered by the assistant interviewer;
  • Informal conversation in the dialect (5-10 mins.);
  • The administration of the questionnaire (where the recorded questions are being played);
  • Questions by the field worker.

Of all this material, at least the administration of the questionnaire is transcribed. The instruction of the assistant interviewer, the questions of the field worker and the informal conversation will not be transcribed during the SAND project. These parts will nevertheless remain available as sound material. Facts and comments in these parts that are relevant for the SAND-research are written down in a short report that the field worker has made of each interview.

General remarks

If you have doubts about the correct application of the instructions in this protocol, present the problem to the others. To guarantee uniformity of the transcriptions it is of great importance that everyone transcribes in the same way. Hence, consultation is necessary.

As a general rule, the SAND-data are transcribed in Standard Dutch. ‘Het Groene Boekje’ (see note 1) represents the standard. When you hear biek or buuk for boek (‘book’), you transcribe boek. Functional elements receive a special transcription; you can read more about this in section 6, Deviations from the Standard Dutch spelling, which also covers the other deviations. For words that are not contained in ‘Het Groene Boekje’ and are not functional words (that is, dialect words such as stoet for brood, ‘bread’), you can devise your own spelling. To keep the spelling of dialect words internally consistent, every transcriber keeps a log of the spelling decisions he or she has taken.


The programme Praat is used for the transcriptions. It allows the creation of several tiers, so that, for instance, different speakers can be distinguished. On the tiers you can insert boundaries, so that the transcription is cut up into short pieces with a maximum length of 10 seconds. Always insert a boundary on the tier at the start of a new sentence. If the sentence lasts longer than 10 seconds, you must insert one or more boundaries within the sentence. Make sure that a boundary does not fall in the middle of a word.
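The 10-second limit lends itself to a mechanical check once the intervals of a tier have been exported from Praat. The sketch below is only an illustration: the `(start, end, text)` tuple layout, the function name and the sample sentences are assumptions made here, not part of the protocol or of Praat itself.

```python
from typing import List, Tuple

MAX_INTERVAL_SECONDS = 10.0  # limit stated in the protocol

# Hypothetical representation of one interval: (start, end, text), times in seconds.
Interval = Tuple[float, float, str]


def overlong_intervals(tier: List[Interval],
                       limit: float = MAX_INTERVAL_SECONDS) -> List[Interval]:
    """Return the intervals whose duration exceeds the limit."""
    return [iv for iv in tier if (iv[1] - iv[0]) > limit]


# Invented sample tier; the second interval lasts 11.8 seconds.
informant_tier = [
    (0.0, 4.2, "Ik heb het boek gelezen."),
    (4.2, 16.0, "xxx"),
    (16.0, 25.5, "Dat zeggen we hier niet."),
]
flagged = overlong_intervals(informant_tier)
```

Any interval returned by such a check would still need a boundary inserted by hand, since the boundary must not fall in the middle of a word.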

For the interviews we use five tiers:

1. Assistant interviewer
2. Informant
3. Field worker
4. Cluster
5. Commentary

If more informants are present, an extra informant tier is added.

1. Assistant interviewer tier
The assistant interviewer tier contains all speech of the assistant interviewer, including the questions and instructions recorded on tape. In the next stage, this tier will be provided with grammatical tags and Dutch lemmas, and will subsequently be syntactically annotated.

2. Informant tier
The informant tier contains all speech of the informant. In the next stage, this tier will be provided with grammatical tags and Dutch lemmas, and will subsequently be syntactically annotated.

3. Field worker tier
The field worker tier is used for comments of the field worker during the interview. The expectation is that this tier will for the most part remain empty, as the field worker is not supposed to take part in the interview.

4. Cluster tier
Clusters of verbs and pronouns and clusters of complementizers and pronouns will be transcribed as clusters in the assistant interviewer tier and informant tier. See section 6.2 for the guidelines. In the cluster tier, transcribers provide an initial analysis with respect to the splitting up of these clusters.

5. Commentary tier
The commentary tier is used to make comments about badly audible parts of the interview or for other cases in which you have doubts about the transcription.

In addition, you can indicate which parts are not transcribed because they do not contain linguistic information. You do this by inserting a start and end boundary and adding ‘n. a.’, which means that this part is not applicable to the SAND-project. An example of such a part is an extensive discussion between assistant interviewer and informant about the use of the word stoet (for brood, ‘bread’), which can be used in some but not all contexts. If such a part consists of only one sentence, it is transcribed.

The commentary tier is also used to mention special sounds for which one suspects syntactic relevance, such as nasalization, glottal stops, lengthening, shortening. See section 6.3.

Start, end, question and answer codes

Questions and answers are provided with a code as indicated in the table below. A ‘question’ is the instruction plus the sentence from the questionnaire. An ‘answer’ is the actual answer to the question from the questionnaire. If there are multiple answers, each answer is assigned its own start and end code. All remaining sentences that appear in the question or answer start with a capital and end with a full stop or a question mark. Syntactically irrelevant digressions are not transcribed.

Table 1: Start, end, question and answer codes

Item | Tier | Code or instruction
Start of sentence | All | Capital letter.
End of sentence | All | Full stop (.).
End of question | All | Question mark (?).
Start of question | Assistant interviewer | Insert the number of the question followed by a space at the start of a question from the questionnaire.
End of question | Assistant interviewer | Insert a slash (possibly followed by the number of the question) and a space directly after the question mark.
Start of answer | Informant | Insert the number of the question followed by a space at the start of the actual answer.
End of answer | Informant | Insert a slash (possibly followed by the number of the question) and a space directly after the full stop of the actual answer.
'yes' to 'Does this occur'-question | Informant | Add a y to the usual start code for an answer, directly after the number of the question. (The end of the answer has the standard end code for answers.)
'no' to 'Does this occur'-question | Informant | Add an n to the usual start code for an answer, directly after the number of the question. (The end of the answer has the standard end code for answers.)
Most common sentence of a series | Informant | Add a g to the usual start code for an answer, directly after the number of the question. (The end of the answer has the standard end code for answers.)
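The answer codes in Table 1 are regular enough to be checked with a pattern. The following sketch shows one possible reading of them: a question number, an optional y, n or g, a space, the answer ending in a full stop, a slash, and optionally the question number again. The names `CodedAnswer` and `parse_answer` and the sample segments are hypothetical, and treating a mismatched end number as an error is an assumption of this sketch, not a rule from the protocol.

```python
import re
from typing import NamedTuple, Optional


class CodedAnswer(NamedTuple):
    question: int  # number of the question
    flag: str      # "y", "n", "g", or "" when no flag is present
    text: str      # the answer itself, including its full stop


# number, optional y/n/g flag, space, answer ending in a full stop,
# slash, optional repeated question number, optional trailing space.
ANSWER_RE = re.compile(r"^(\d+)([yng]?) (.+\.)/(\d*)\s*$")


def parse_answer(segment: str) -> Optional[CodedAnswer]:
    """Parse one coded answer; return None if the codes are malformed."""
    match = ANSWER_RE.match(segment)
    if match is None:
        return None
    number, flag, text, end_number = match.groups()
    # Assumption: if the end code repeats the number, it must be the same one.
    if end_number and end_number != number:
        return None
    return CodedAnswer(int(number), flag, text)


# Invented sample segments.
ok = parse_answer("12y Ja dat zeggen we./12 ")
plain = parse_answer("12 Dat kan./")
mismatch = parse_answer("12 Dat kan./13 ")
```

Because numbers in the running text are written out in words, a leading digit can safely be read as the start of a code.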

Punctuation marks and other marks not representing sounds

In general: only the marks below are used, so no commas, quotation marks, exclamation marks, etc. Beware: full stops, question marks and numbers are unique codes (see table above) and should not be used for any other purpose. All numbers in the transcribed speech are therefore written out in words.

List of other allowed marks:

- Hyphen at the end of unfinished sentences
xxx Indicates a part (of a sentence) that is difficult to decipher
xx Indicates a part of a word that is difficult to decipher
ggg Indicates clearly audible sounds from the speakers such as laughter, crying, shouting, coughing, etc.
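A quick scan for marks that should not occur can catch slips before a transcription is handed in. The sketch below flags only the marks the protocol names (commas, quotation marks, exclamation marks) plus semicolons and colons, which are included here as an assumption under the protocol's "etc."; apostrophes are deliberately left out because they can belong to the spelling of a word. The function name is hypothetical.

```python
import re

# Commas, exclamation marks and double quotation marks are named in the protocol;
# semicolon and colon are an assumption of this sketch.
DISALLOWED = re.compile(r"[,!\";:“”]")


def disallowed_marks(text: str) -> list:
    """Return the distinct disallowed marks found in text, sorted."""
    return sorted(set(DISALLOWED.findall(text)))


# Invented sample sentences.
bad = disallowed_marks("Ja, dat is goed!")
clean = disallowed_marks("Ik heb het boek gelezen.")
```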

Deviations from the Standard Dutch spelling

Functional elements

We make a distinction between lexical and functional elements. Functional elements are transcribed as precisely as possible.

Guidelines for the transcription of functional elements

- If one or more sounds do not appear that are present in the Standard Dutch spelling, these sounds are not written down;
- If one or more sounds are present that do not appear in Standard Dutch spelling, these sounds are written down;
- If other sounds are audible than those suggested by the Standard Dutch spelling, these other sounds are written down;
- If you do not know which letter to use to represent an extra or different sound, consult the group.

Some examples (from Dutch):

(1)    wa    denk   je   wien   ik  zien  heb   in  de   stad
       what  think  you  who-n  I   seen  have  in  the  city

(2)    variants like zwemm, zwemme, zwemmen and zitn, zitte, zitten for Standard Dutch zwemmen 'to swim' and zitten 'to sit', respectively.

List of functional elements

Below a list of functional elements is given. The terminology of the Algemene Nederlandse Spraakkunst (ANS, a grammar of Dutch) is used as much as possible. The ANS can therefore be used as an aid during the transcription process to check if a word or morpheme is a functional element. When in doubt, consult the SAND-group. If a word or morpheme does not belong to the elements below, it is lexical and transcribed according to Dutch orthography.

  • Adpositions (prepositions and postpositions):
    • Preposition and complementizer are written as two words. So: Voor dat je naar huis gaat (and not: Voordat je naar huis gaat) 'Before you go home'. 
  • Inflection (morphological realization of properties like number, gender, person, definiteness, case, tense and mood; note that this list is not exhaustive):
    • Verbal inflection (including prefixes to participles and vowel change in the verbal paradigm)
    • Complementizer agreement
    • Morphology on determiners, pronouns, adjectives, nouns which suggest that they encode case, person and/or number distinctions
  • Auxiliaries (of modality, aspect, tense and passive)
  • Determiners
  • Negative elements
  • Count words
  • Complementizers
  • Pronouns
    • Demonstrative pronouns
    • Relative pronouns
    • Possessive pronouns
    • Indefinite pronouns
    • Personal pronouns
    • Reciprocal pronouns
    • Reflexive pronouns
    • R-pronouns, such as er 'there', hier 'here', daar 'there', ergens 'somewhere', overal 'everywhere', nergens 'nowhere'
    • Exclamative pronouns
    • Interrogative pronouns


Clusters

In the cases below, a cluster is transcribed unanalyzed (i.e. without indicating the morpheme boundaries) on the assistant interviewer and informant tier; a cluster ends before the following first lexical word (see above for the distinction between lexical and functional elements). It is possible that the list is not exhaustive. When in doubt, consult the SAND-group.

  • Clusters consisting of a complementizer and one or more pronouns. 
    Examples: danzezunder (that-they-they), dase (that-they), ovvie (if-he).
  • Clusters consisting of a verb and one or more pronouns.
    Examples: loopse (walk-they), lopie (walks-he), geefeketem (give-I-it-him).
  • Clusters consisting of a verb, negation particle and possibly pronouns.
    Example: keneen oast nietsnie gezeid. (nobody has.2sg nothing-not said)
  • Complex negation clusters, such as nietsnie (nothing-not) above, are also represented as clusters.

On the specific cluster tier the transcriber does provide a subdivision of the cluster. This splitting up has the status of a first guess, not of an accomplished fact. This means that in splitting up the cluster the transcriber follows his/her first intuition and does not start a mini-investigation into the correct analysis of the cluster. Examples of subdivision: dan ze zunder, geef ek et em, ek en een.
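One simple consistency check between the two tiers is that the subdivision on the cluster tier, with its spaces removed, spells out the unanalyzed cluster. The sketch below assumes that splitting only inserts spaces and never restores or changes letters; the protocol does not state this, so a mismatch is a prompt to look again rather than proof of an error. The function name is hypothetical.

```python
def split_matches_cluster(cluster: str, split: str) -> bool:
    """True if the subdivision, with spaces removed, equals the cluster."""
    return "".join(split.split()) == cluster


# Examples taken from the protocol.
first = split_matches_cluster("danzezunder", "dan ze zunder")
second = split_matches_cluster("geefeketem", "geef ek et em")
# A subdivision that restores a letter (hypothetical) is flagged.
third = split_matches_cluster("dase", "dat se")
```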

Special sounds

Occasionally, sounds or sound properties occur that may have syntactic relevance. These involve nasalization, glottal stops, and the lengthening or shortening of vowels.

  • Nasalization is indicated with an n.
    Example: dã is written as dan.
    Syntactic relevance of nasalization is written down on the commentary tier.
  • Glottal stops are not represented. If you suspect syntactic relevance, indicate this on the commentary tier.
  • Lengthening and shortening of vowels can be syntactically relevant in verbal and nominal paradigms but require an extensive analysis of the paradigm as a whole and are therefore not represented. If you suspect syntactic relevance, indicate this on the commentary tier.
  • The schwa is represented with an e, also with weak pronouns, as in Standard Dutch.
    Examples: werkte (work-te) ‘worked’, de ‘the’, em ‘him’, et ‘it’, ze ‘she/they’, er ‘there’.
    Exception: the determiner een ‘a(n)’ is written as een when pronounced with a schwa, to distinguish it from the conjunction en ‘and’. When it is pronounced as één, it is written with two accents, just like the count word.
  • Linking phonemes will be transcribed.

Separable compound verbs

In contrast to the guidelines for Standard Dutch, the two parts of a separable compound verb are written down disjointly.

Dit probleem is niet eerder voor gekomen.
'This problem has not occurred before.'

Jan   wilde   Marie  op  bellen.
John  wanted  Mary   up  call
'John wanted to call up Mary.'

Het  vensterluik     is  open  gewaaid.
the  window-shutter  is  open  blown
'The window shutter has blown open.'

Separable verbs can be identified by using them as a finite form in a main clause: the two parts are then separated.

Dit probleem komt nooit voor.
'This problem never occurs.'

Inseparable compound verbs are written as one word.

Dit probleem kan niet worden voorkomen.
'This problem cannot be prevented.'
(Test: Jan voorkomt dit probleem.)

Pronominal adverbs

Pronominal adverbs are words that consist of an R-pronoun (er ‘there’, daar ‘there’, hier ‘here’, waar ‘where’, overal ‘everywhere’, ergens ‘somewhere’, nergens ‘nowhere’) and a preposition. These two elements are written disjointly.

Marie  schrijft  daar   mee.
Mary   writes    there  with
'Mary writes with that.'

Combinations of a preposition and a complementizer

This involves forms of the type voordat (for.that) 'before', omdat (around.that) 'because', nadat (after.that) 'after', etc. These are written disjointly: voor dat, om dat, na dat.
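As a checking aid, the disjoint spelling of these forms can be applied with a small lookup table. The sketch below covers only the three forms named above; the "etc." means the table is incomplete, and the function name is hypothetical. It is meant for spotting one-word spellings in a draft, not for overriding what the transcriber actually heard.

```python
import re

# Only the three forms named in the protocol; extend as needed.
DISJOINT = {"voordat": "voor dat", "omdat": "om dat", "nadat": "na dat"}
_PATTERN = re.compile(r"\b(" + "|".join(DISJOINT) + r")\b", re.IGNORECASE)


def write_disjointly(sentence: str) -> str:
    """Split preposition + complementizer forms into two words."""
    def repl(match):
        word = match.group(0)
        split = DISJOINT[word.lower()]
        # Keep the sentence-initial capital if there was one.
        return split.capitalize() if word[0].isupper() else split
    return _PATTERN.sub(repl, sentence)


fixed = write_disjointly("Voordat je naar huis gaat")
```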


Interjections

Interjections are not directly important for the SAND project and different spellings of these words will not lead to great problems. The table below can serve as a guide for the spelling of interjections.

ah   bwa          hu             oesje     tut ('tut tut')
aha  ei, eikes    hum            oh, o     uh, uhm
ai   goh          jee ('o jee')  oho       uhu
au   ha, haha     mm-hu          poeh      wauw
bah  hè, hé, hei  mmm            pst       who
boe  ho           oeh, oei       sjt, sst  zuh, zulle

1. ‘Het Groene Boekje’ (The Green Booklet), officially called Woordenlijst Nederlandse Taal (Wordlist of the Dutch Language), is a publication of the Institute of Dutch Lexicology and is generally accepted as the standard for Dutch spelling.