Chapter 4: Tagging protocol

From Dialectsyntax
Revision as of 10:42, 12 January 2016 by Admin (Talk | contribs)


The tagset that is used in the Edisyn search engine can be viewed here (note that this is work in progress). This tagset is used to label the parts of speech of (dialect) databases. The document shows how the tags of the various databases are connected to those of the Edisyn search engine. The column 'Edisyn search engine' lists the tags used in this search engine. The other columns show the tags that apply to each individual database. Each row shows the correspondence between a tag of a database and that of the search engine.
The tags of the Edisyn search engine consist of two parts: a linguistic category (e.g. V, verb), which may be modified by one or more features (e.g. 1,s, first person singular). In the search engine one can search by category, by feature, or by both. To make many databases interoperable, the categories and features are kept somewhat general. A justification of the tagset can be opened here.

The protocol below is a manual for performing Parts of Speech tagging. It was developed by Sjef Barbiers and Guido Vanden Wyngaerd, for the SAND-project (Syntactic Atlas of Dutch Dialects), but can be useful for other dialect research groups/projects.

This protocol is also available in PDF format.

Introduction

This tagging protocol provides an overview of the tags that were used during the parts of speech tagging of the SAND-project (Syntactic Atlas of Dutch Dialects). Every tag is represented both by a numeric code and by a label consisting of one or more capitalized abbreviations. In the tagging application, which assigns the tags semi-automatically, the numeric and capital codes are two alternative but equivalent ways to assign a tag to the transcription.

Every tag has the following format: Category, Attribute, Value, Specification, Specification. For example: V FEAT FIN PT 1.PL; Category = V (verb); Attribute = FEAT (feature/characteristic); Value = FIN (finite); Specification = PT (present tense); Specification = 1.PL (first person plural). Category, attribute, value and specification are marked in capitals. If these capitals are between brackets, the marking/filling-in is optional.
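As a minimal sketch of this format, the following Python snippet splits a capital-letter label into its slots. The names `Tag` and `parse_tag` are illustrative assumptions and are not part of the SAND tagging application. Inflection values such as -(e)n contain brackets of their own and are not handled here.

```python
from typing import NamedTuple, Optional, Tuple


class Tag(NamedTuple):
    """Hypothetical container for the slots of one tag label."""
    category: str
    attribute: Optional[str]
    value: Optional[str]
    specifications: Tuple[str, ...]


def parse_tag(label: str) -> Tag:
    """Split a label such as 'V FEAT FIN PT 1.PL' into its fields."""
    # Brackets only mark a field as optional, so they are dropped here.
    parts = label.replace("(", " ").replace(")", " ").split()
    if not parts:
        raise ValueError("empty tag label")
    if len(parts) > 5:
        raise ValueError(f"too many fields in tag label: {label!r}")
    # Pad missing attribute/value slots with None.
    padded = parts + [None] * (3 - len(parts))
    return Tag(padded[0], padded[1], padded[2], tuple(parts[3:]))


tag = parse_tag("V FEAT FIN PT 1.PL")
print(tag.category, tag.attribute, tag.value, tag.specifications)
```

Returning a named tuple keeps the slots addressable by name, which mirrors the way the protocol talks about category, attribute, value and specification.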

Every tag corresponds to a five-digit code, structured as follows. The first digit indicates the category (for example 1 = N, i.e. noun). The second digit indicates the attribute (for example 3 = CASE, i.e. case). The third digit marks the value of the attribute (for example 1 = OBL, i.e. oblique). The fourth and fifth digits specify the value (for example 2 = DAT, i.e. dative). Depending on its position, a zero marks no category, no attribute, no value or no specification. The number code 13120 thus corresponds to the tag N CASE OBL DAT.
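The digit-by-digit structure can be sketched as a prefix lookup. The table `LABELS` below is an assumption for illustration and contains only the entries needed for the example code 13120; a real table would list every code in this protocol. The function name `label_for` is likewise hypothetical.

```python
# Prefix of the number code -> label of the slot that the last digit fills.
# Only the entries for the example 13120 are included (illustrative subset).
LABELS = {
    "1": "N",        # category: noun
    "13": "CASE",    # attribute: case
    "131": "OBL",    # value: oblique
    "1312": "DAT",   # specification: dative
}


def label_for(code: str) -> str:
    """Translate a five-digit code into its capital-letter label."""
    if len(code) != 5 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"not a five-digit code: {code!r}")
    if code == "00000":
        return "O"  # uncertainty tag
    parts = []
    for end in range(1, 6):
        if code[end - 1] == "0":
            break  # a zero means the slot (and, in this sketch, all later slots) is empty
        parts.append(LABELS.get(code[:end], "?"))
    return " ".join(parts)


print(label_for("13120"))
```

Keying the table on prefixes rather than on single digits reflects that the meaning of a digit depends on the digits before it (3 means CASE only under category 1).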

Every tag is followed by a short description of category/attribute/value/specification and, if necessary, an illustration (examples). In most cases it will be necessary to assign more than one number code to a word. For example: blackberries in a bucket (of) blackberries gets code 11400: N INFL -(e)s (noun with inflection -es), plus the code 12300: N POS POST-N (noun in postnominal position).
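A word that carries several number codes can be modelled as a form plus a list of codes. The sketch below is an assumption about how such a record might look, not the data model of the tagging application. The class name `TaggedWord` is hypothetical, and the check that all codes of one word share the same category digit is an assumption rather than a stated rule. The example uses the codes 12300 (N POS POST-N) and 15200 (N Number PL) from the noun tables.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaggedWord:
    """Hypothetical record: one transcribed word with all of its codes."""
    form: str
    codes: List[str] = field(default_factory=list)

    def add(self, code: str) -> None:
        if len(code) != 5 or not (code.isascii() and code.isdigit()):
            raise ValueError(f"not a five-digit code: {code!r}")
        # Assumption: every code on one word belongs to the same category.
        if self.codes and code[0] != self.codes[0][0]:
            raise ValueError(f"{code} has a different category than {self.codes[0]}")
        self.codes.append(code)


word = TaggedWord("blackberries")
word.add("12300")  # N POS POST-N
word.add("15200")  # N Number PL
print(word)
```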

The tagset is inspired by the tagset used in Corpus Gesproken Nederlands (Spoken Dutch Corpus; F. Van Eynde, Part of Speech Tagging en Lemmatisering, Centre for Computational Linguistics, K.U. Leuven, 2000), which is based on the EAGLES standard for tagsets. The SAND tagset differs in a number of ways from both the CGN and EAGLES tagsets. These differences are mentioned and illustrated in this document where necessary.

0. Uncertainty tag 00000 O No tag. Use this code if the category of the word is not clear. This code is not to be used if the category is clear, but the attribute, value or specification is not. In this latter case, the 0 is to be inserted in the position of attribute/value/specification.

Noun

10000 N Noun

2.1 Inflection

11000 N (INFL) inflection.
The existence of this attribute indicates the occurrence of an audible morphological marking for categories like person, number, gender, case, definiteness, etc. Inflection does not refer to zero morphemes or diminutive suffixes. The following rule applies: inflection is only tagged as such if the word can also occur without the morpheme. The following tags give information about the inflection found:
11100 N (INFL) -(e)n
11200 N (INFL) -(e)t
11300 N (INFL) -e
11400 N (INFL) -(e)s
11500 N (INFL) -st
11600 N (INFL) OT Other inflectional morphology

2.2 Position

12000 N POS Position
12100 N POS PRE-N Prenominal
For example: a few books. We call few prenominal, because books is the head of the phrase, as follows from the agreement on the finite verb (there are a few books on the table).
12200 N POS N Nominal
This is the 'ordinary' use of the noun, as the head of an (argument) noun phrase.
12300 N POS POST-N Postnominal
For example: a bucket (of) blackberries. We call blackberries postnominal, because bucket is the head of the phrase, as follows from the agreement on the finite verb (there is a bucket on the table).
12400 N POS FREE Tag for nouns that are not part of an argument noun phrase.
12410 N POS FREE PRED Predicative
For example: John is (a) doctor/mayor.
12420 N POS FREE ADV Adverbial
For example: Zondags gaat zij naar de kerk, '[lit:] Sundays goes she to the church'.

2.3 Case

13000 N (CASE) Case
Case is assigned to nouns only if the inflection is audible. So, nominative nouns don't get this attribute.
13100 N (CASE) (OBL) Oblique
All nouns that have audible case inflection that is not genitive are assigned the value ‘oblique’. This value can be specified as ‘accusative’ or ‘dative’. If it is not clear whether a constituent is accusative or dative, oblique is not specified further.
13110 N (CASE) (OBL)(ACC) Accusative
13120 N (CASE) (OBL)(DAT) Dative
13200 N (CASE) (GEN) Genitive
Value for nouns with genitive inflection.

2.4 Person

14000 N Person Person
14300 N Person 3 All nouns are third person. This tag will be the default.

2.5 Number

15000 N Number Number
15100 N Number S Singular
15200 N Number PL Plural

2.6 Gender

16000 N (GENUS) Grammatical gender
As the gender of a word is not always clear, this attribute is optional (can but need not be assigned).
16100 N (GENUS) Z Non-neuter
16110 N (GENUS) Z (M) Masculine
16120 N (GENUS) Z (F) Feminine
16200 N (GENUS) Neu Neuter

2.7 Function

The function of a DP (determiner phrase, noun phrase) is normally not incorporated in the tagging process. However, the assignment of the values subject (SUBJ), object (D-OBJ), indirect object (I-OBJ) and prepositional object (P-OBJ) is essential in order to be able to search the database, at a stage where (full) syntactic annotation is lacking. This value is assigned to the noun.

17000 N (FUNCTION) Grammatical function
17100 N (FUNCTION) SUBJ Subject
Is assigned if the DP agrees with the finite verb in person/number.
17200 N (FUNCTION) D-OBJ Direct object
Applies when the DP is the direct object.
17300 N (FUNCTION) I-OBJ Indirect object
Is assigned when the DP is the indirect object (without a preposition).
17400 N (FUNCTION) P-OBJ Object of preposition
Applies when the DP is the complement of a preposition.

Adjective

20000 A

3.1 Inflection

21000 A (INFL) Inflection
This attribute indicates the existence of an audible morphological marking for categories like person, number, gender, case, etc. The -s (e.g. iets moois, 'something beautiful') counts as inflection. This is not the case for degrees of comparison (comparative, superlative), zero morphemes or diminutive suffixes. The following rule applies: inflection is only tagged as such if the word can also occur without the morpheme. This attribute has the following values:
21100 A (INFL) -(e)n
21200 A (INFL) -(e)t
21300 A (INFL) -e
21400 A (INFL) -(e)s
21500 A (INFL) -st
21600 A (INFL) OT Other inflectional morpheme

3.2 Position

22000 A POS Position
22100 A POS PRE-N Prenominal
For example: a beautiful book. Words like many and few are categorized as adjectives (not as quantifiers), because they can be used in degrees of comparison, and with adverbs of degree (very, extremely, almost).
22110 A POS PRE-N ELL Prenominal with ellipsis
As, for example, in the second conjunct of He has a white plate and I a green (one). The sentence itself need not contain the antecedent: Ik heb gisteren een witte telefoon gekocht. Hij past beter bij het interieur dan de groene, 'I bought a white telephone yesterday. It fits the interior better than the green (one)'.

22200 A POS N Nominal
Nominal (or substantively used) adjectives are not treated as substantives, but as adjectives. Arguments in favor of this method are, among others: the existence of comparative and superlative forms (e.g. de ouderen, 'the elderly', de rijksten, 'the richest (people)'), the compatibility with adverbs of degree (de zeer rijken, 'the very rich'), and the fact that plural marking on nominal adjectives differs from that on substantives.
22300 A POS POST-N Postnominal
E.g. kindeke teer, ‘child fragile’, alle rivieren bevaarbaar in de winter, ‘all rivers navigable in winter’, niets bijzonders, ‘nothing special’, iets groters, ‘something bigger’.
22400 A POS FREE Adjectives that are not part of a DP
22410 A POS FREE PRED Predicative
Predicative adjectives are adjectives that function as a subject complement, e.g. het schip is schoon, ‘the ship is clean’, or as a secondary predicate, in other words as a predicative adjunct. There are three types of predicative adjuncts: Hij veegt het schip schoon, ‘he wipes the ship clean’ (resultative), Hij vindt Marie aardig, ‘[lit.:] He finds Mary nice’/‘He likes Mary’ (predicative), Hij gaf de tas leeg terug, ‘he returned the bag empty’ (depictive). Separable prefixes (of verbs) are also assigned this tag if they have the same form as an adjective, whether they are separated from the verb or not (e.g. Hij drinkt de beker leeg, ‘[lit.:] He drinks the mug empty’, dat hij de beker leeg drinkt, ‘[lit.:] that he the mug empty drinks’, dat het venster open waait, ‘that the window open blows’).
22420 A POS FREE ADV Adverbial
Adverbially used adjectives are not treated as adverbs, but as adjectives (like in the ANS-97, CELEX, RN, WOTAN-2, CGN and the German STTS-95). To distinguish adverbs from adverbially used adjectives, we use the following criterion: if the word is also used in prenominal position with the same meaning, then it is not an adverb but an adjective. Vrij ‘free’ is an adjective in Je kan hier vrij rondlopen ‘You can walk around here freely’, whereas the same word is an adverb in een vrij warme dag, '[lit.] a free (somewhat) hot day’.

3.3 Degrees of comparison

23000 A (DEGREE) The comparative and superlative, and A + diminutive. The positive is not assigned ‘DEGREE’. Indirect comparatives (of the type more A, most A, less A, least A; e.g. most beautiful) get DEGREE ‘comparative’ or ‘superlative’. In that case, the tag is assigned to the modifier of the adjective (most); the adjective that is the head (beautiful) is not assigned the tag DEGREE.
23100 A (DEGREE) COMP Comparative. E.g. bigger
23200 A (DEGREE) SUP Superlative. E.g. biggest
23300 A (DEGREE) DIM Diminutive. E.g. dunnetjes '[lit.:] thinly.DIM'

Verb

The problem of proclisis and enclisis arises when dealing with the finite verb: weak personal pronouns and/or the negative particle may form one word or unit with the finite verb. In such cases, it is not always easy to distinguish between verb and pronoun (e.g. ganeme, normal: gaan we, 'go we', issem, normal: is hij, 'is-he', zakzekik, normal: zal ik ze, 'will-I-them-I-I', etc.). Therefore, we have two different tiers in the transcription: the first one (informant tier or assistant interviewer tier) contains the unanalyzed cluster, while in the second tier (the cluster tier) the cluster is divided into separate parts (e.g. is em, za k ze kik). Part of speech tagging is done on both tiers. In the informant or assistant interviewer tier, the word is tagged as V-clitic-cluster (33000). In the cluster tier the verb gets the relevant tag, and the same holds for the other elements of the cluster. The following tags can be assigned to verbs either in the informant tier (if the finite verb is not part of a cluster) or in the cluster tier (when it is part of a cluster).
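The two-tier treatment of clusters can be sketched as rows of (tier, form, code). This is only an illustration under assumptions: the function name `tag_cluster` and the row layout are hypothetical, and the codes chosen for the parts of issem (34114 for is, 44400 for em) are picked from the verb and pronoun tables in this protocol for the sake of the example.

```python
from typing import List, Tuple

V_CLITIC_CLUSTER = "33000"  # tag for the unanalysed cluster on the informant tier


def tag_cluster(informant_form: str,
                parts: List[Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (tier, form, code) rows for one verb-clitic cluster."""
    # The informant tier keeps the cluster as one word with the cluster tag.
    rows = [("informant", informant_form, V_CLITIC_CLUSTER)]
    # The cluster tier lists each part with its own tag.
    rows += [("cluster", form, code) for form, code in parts]
    return rows


for row in tag_cluster("issem", [("is", "34114"), ("em", "44400")]):
    print(row)
```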

30000 V Verb

4.1 Inflection

31000 V (INFL) Inflection
The occurrence of this attribute indicates the presence of an audible affix. The following rule applies: inflection is only marked if the word can also occur without the inflectional morpheme. The attribute has the following values:
31100 V (INFL) -(e)n
31200 V (INFL) -(e)t
31300 V (INFL) -e
31400 V (INFL) -(e)s
31500 V (INFL) -st
31600 V (INFL) OT Other inflectional morpheme

4.2 Position

32000 V POS Position
4.2.1 Regular order
32100 V POS REG Regular order
Position of the finite verb (finite form), directly following the subject (and followed by the rest of the sentence). Also in case of subject doubling, where the subject precedes and follows the finite verb, the finite verb gets this value.
4.2.2 Inverted order
32200 V POS INV Inverted order
Position of the finite verb (finite form), directly followed by the subject. All verb initial sentences (e.g. yes/no questions, imperatives) and non-subject-initial matrix sentences will get this value.
4.2.3 Position right periphery
32300 V POS END End position
The verb is not in first or second position, but is situated at the end of the sentence. Participles and infinitives are always V-final (unless they are free or used adnominally, see below). If more than one verb is in the final position of the sentence, we call it a verb cluster, e.g. dat Jan ze wel op zou willen eten, '[lit.:] that John them AFF PRT would want eat'. The hierarchical (not the linear) position of each verb is given by means of the specifications mentioned below. A verb cluster can be ‘interrupted’ by DPs, prepositions, adverbs, elements like te (the infinitive marker: hij zit te studeren, ‘[lit.:] he sits to study’, meaning 'he is studying'), etc., but not by complementizers. The verbs probeert and te lezen form a verb cluster in the following sentence: dat Jan probeert de krant te lezen, ‘[lit.:] that John tries the paper to read’, but they do not form a verb cluster in this sentence: dat Jan probeert om de krant te lezen, ‘[lit.:] that John tries in order the paper to read’. This last sentence contains a complementizer, marking a new subordinate clause domain, and thus (potentially) a new verb cluster domain.
32310 V POS END (1) Hierarchically highest verb
The finite verb always gets this specification.
E.g. Ik denk dat Jan de wagen gemaakt zou kunnen hebben, '[lit.:] I think that John the car fixed could can have'.

32320 V POS END (2) Hierarchically second highest verb
This is always a non-finite verb. Even if the finite verb is in first or second position (in the main clause), counting in the cluster (at the end) starts with this specification.
F.e.:Ik denk dat Jan de wagen gemaakt zou kunnen hebben, ‘[lit:] I think that John the car fixed could can have’.
32330 V POS END (3) Hierarchically third highest verb
F.e. Ik denk dat Jan de wagen gemaakt zou kunnen hebben, '[lit.:] I think that John the car fixed could can have'.
32340 V POS END (4) Hierarchically lowest verb
F.e. Ik denk dat Jan de wagen gemaakt zou kunnen hebben, '[lit.:] I think that John the car fixed could can have'.
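The numbering of verbs in a cluster can be sketched as follows, assuming that the specification simply counts down from the hierarchically highest verb and that counting starts at 2 when the finite verb stands outside the cluster. The function name `cluster_position_codes` and its parameters are hypothetical. The caller is expected to supply the verbs in hierarchical order, which the code cannot determine by itself.

```python
from typing import List, Tuple


def cluster_position_codes(verbs_high_to_low: List[str],
                           finite_in_cluster: bool = True) -> List[Tuple[str, str]]:
    """Pair each verb of a clause-final cluster with its 323x0 code.

    verbs_high_to_low lists the verbs in hierarchical order (highest first),
    not in the linear order in which they are pronounced.
    """
    start = 1 if finite_in_cluster else 2
    ranks = range(start, start + len(verbs_high_to_low))
    if not verbs_high_to_low or ranks[-1] > 4:
        raise ValueError("the protocol provides specifications (1) to (4) only")
    return [(verb, f"323{rank}0") for verb, rank in zip(verbs_high_to_low, ranks)]


# ... dat Jan de wagen gemaakt zou kunnen hebben
print(cluster_position_codes(["zou", "kunnen", "hebben", "gemaakt"]))
```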
4.2.4 Other positions
32400 V POS OT Other verbal positions
32410 V POS OT PRE-N Prenominal
E.g. een gedurfd voorstel, 'a daring proposal'.
32420 V POS OT N Nominal
E.g. de vrijgestelden, 'the freed/released/liberated (people)', het zingen van de nachtegaal, 'the singing of the nightingale'.
32430 V POS OT POST-N Postnominal
F.e. een tas gemaakt van leer, 'a bag made of leather'.

4.3 Verb-clitic-cluster

33000 V CL-CL This attribute is assigned at the informant or assistant interviewer tier to verbs that form a cluster with weak pronouns (see introduction of this paragraph).

4.4 Features

34000 V FEAT
4.4.1 Finite
34100 V FEAT FIN Finite verb (form)
4.4.1.1 Present tense, indicative
34110 V FEAT FIN PT Present tense
34111 V FEAT FIN PT 1.S Present tense, first person singular
34112 V FEAT FIN PT 2.S Present tense, second person singular
34113 V FEAT FIN PT 2.S-P Present tense, second person singular, polite form
34114 V FEAT FIN PT 3.S Present tense, third person singular
34115 V FEAT FIN PT 1.PL Present tense, first person plural
34116 V FEAT FIN PT 2.PL Present tense, second person plural
34117 V FEAT FIN PT 2.PL-P Present tense, second person plural, polite form
34118 V FEAT FIN PT 3.PL Present tense, third person plural
4.4.1.2 Present tense, conjunctive
34120 V FEAT FIN PT.CONJ Present tense, conjunctive
34121 V FEAT FIN PT.CONJ 1.S Present tense, conjunctive, first person singular
34122 V FEAT FIN PT.CONJ 2.S etc.
34123 V FEAT FIN PT.CONJ 2.S-P
34124 V FEAT FIN PT.CONJ 3.S
34125 V FEAT FIN PT.CONJ 1.PL
34126 V FEAT FIN PT.CONJ 2.PL
34127 V FEAT FIN PT.CONJ 2.PL-P
34128 V FEAT FIN PT.CONJ 3.PL
4.4.1.3 Imperative
34130 V FEAT FIN TT.IMP Finite imperative
34131 V FEAT FIN TT.IMP S Finite imperative, singular
34132 V FEAT FIN TT.IMP PL Finite imperative, plural
4.4.1.4 Past tense indicative
34140 V FEAT FIN PastT Past tense
34141 V FEAT FIN PastT 1.S Past tense, first person singular
34142 V FEAT FIN PastT 2.S etc.
34143 V FEAT FIN PastT 2.S-P
34144 V FEAT FIN PastT 3.S
34145 V FEAT FIN PastT 1.PL
34146 V FEAT FIN PastT 2.PL
34147 V FEAT FIN PastT 2.PL-P
34148 V FEAT FIN PastT 3.PL
4.4.1.5 Past tense conjunctive
34150 V FEAT FIN PAST.CONJ Conjunctive, past tense
34151 V FEAT FIN PAST.CONJ 1.S Conjunctive, past tense, first person singular
34152 V FEAT FIN PAST.CONJ 2.S etc
34153 V FEAT FIN PAST.CONJ 2.S-p
34154 V FEAT FIN PAST.CONJ 3.S
34155 V FEAT FIN PAST.CONJ 1.Pl
34156 V FEAT FIN PAST.CONJ 2.Pl
34157 V FEAT FIN PAST.CONJ 2.Pl-p
34158 V FEAT FIN PAST.CONJ 3.Pl
4.4.1.6 Past tense imperative
34160 V FEAT FIN PAST.IMP Imperative, past tense
34161 V FEAT FIN PAST.IMP S Imperative, past tense singular
34162 V FEAT FIN PAST.IMP PL Imperative, past tense plural
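The finite codes above follow a regular pattern: 341, then a digit for tense and mood, then a digit for person and number (or number only, for imperatives). A small sketch that rebuilds the code from its parts is given below. The function name `finite_code` and the table names are assumptions for illustration, and the polite-form labels are written with a capital P throughout.

```python
# Fourth digit: tense and mood of forms that are specified for person/number.
TENSE = {"PT": 1, "PT.CONJ": 2, "PastT": 4, "PAST.CONJ": 5}
# Fourth digit for imperatives, which are specified for number only.
IMPERATIVE = {"TT.IMP": 3, "PAST.IMP": 6}
# Fifth digit.
PERSON_NUMBER = {"1.S": 1, "2.S": 2, "2.S-P": 3, "3.S": 4,
                 "1.PL": 5, "2.PL": 6, "2.PL-P": 7, "3.PL": 8}
IMPERATIVE_NUMBER = {"S": 1, "PL": 2}


def finite_code(tense: str, specification: str = "") -> str:
    """Build the five-digit code of a finite verb form (V FEAT FIN ...)."""
    if tense in TENSE:
        fourth, table = TENSE[tense], PERSON_NUMBER
    elif tense in IMPERATIVE:
        fourth, table = IMPERATIVE[tense], IMPERATIVE_NUMBER
    else:
        raise ValueError(f"unknown tense label: {tense!r}")
    # An empty specification leaves the fifth digit at zero.
    fifth = table[specification] if specification else 0
    return f"341{fourth}{fifth}"


print(finite_code("PT", "1.PL"))
```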
4.4.2 Non-finite
34200 V FEAT INF Infinitive
34210 V FEAT INF N Infinitive, used nominally
34220 V FEAT INF FREE Free infinitive
4.4.3 Participles
34300 V FEAT PART Participle
34310 V FEAT PART PAST Past participle
34311 V FEAT PART PAST +Prefix Past participle with prefix ge- or e-
34312 V FEAT PART PAST -Prefix Past participle without a prefix
34320 V FEAT PART PRES Present participle

4.5 Type

35000 V TYPE
4.5.1 Auxiliary
35100 V TYPE AUX Auxiliary verb
35110 V TYPE AUX PERF Perfective auxiliary
35120 V TYPE AUX MOD Modal auxiliary
E.g. can, must, want, need, would, could

35130 V TYPE AUX ASP Aspectual auxiliary
E.g. go, come, stand, lay, sit, stay, begin
35140 V TYPE AUX PASS Passive auxiliary
E.g. worden in Er wordt gedanst, '[lit:] there is/get dance.part'. If the sentence has a perfective meaning at the same time, then assign tag 35110 (f.e. Er is gedanst, 'there has been dancing going on'/'[lit:]there is danced').
4.5.2 Matrix
35200 V TYPE HEAD Matrix verb
4.5.2.1 Inherent reflexive
35210 V TYPE HEAD REFL Inherent reflexive matrix verb
4.5.2.2 Transitive verb
35220 V TYPE HEAD TRANS Transitive matrix verb
4.5.2.3 Intransitive verb
35230 V TYPE HEAD INTR Intransitive matrix verb
35231 V TYPE HEAD INTR UNACC Unaccusative verb
These verbs use zijn 'to be' in the perfect in Dutch, cannot be used in an impersonal passive construction, and can modify a noun that corresponds to the subject if they are used as prenominal participles (e.g. de gestorven man, '[lit.:] the died.part man').
E.g. sterven, 'to die', vallen, 'to fall'.
35232 V TYPE HEAD INTR UNERG Unergative verb
These verbs use hebben 'to have' in the perfect in Dutch and can be used in an impersonal passive construction, but cannot modify a noun that corresponds to the subject as a prenominal participle.
F.e. werken, 'to work', slapen, 'to sleep'.
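The three diagnostics for unaccusative and unergative verbs can be combined into a simple decision, sketched below. The function name `intransitive_code` and its parameters are hypothetical, and falling back to the underspecified code 35230 when the diagnostics disagree is an assumption, not a rule stated in this protocol.

```python
def intransitive_code(perfect_auxiliary: str,
                      impersonal_passive_ok: bool,
                      prenominal_participle_ok: bool) -> str:
    """Choose between 35231 (UNACC), 35232 (UNERG) and 35230 (INTR)."""
    if (perfect_auxiliary == "zijn"
            and not impersonal_passive_ok
            and prenominal_participle_ok):
        return "35231"  # unaccusative, e.g. sterven
    if (perfect_auxiliary == "hebben"
            and impersonal_passive_ok
            and not prenominal_participle_ok):
        return "35232"  # unergative, e.g. werken
    # Assumption: mixed evidence leaves the specification empty.
    return "35230"


print(intransitive_code("zijn", False, True))
```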

Pronouns

Dutch has a number of different elements that are classified as pronouns (the following list is taken from ANS, Algemene Nederlandse Spraakkunst 'General Dutch Grammar'):

This is a large and rather heterogeneous group that is, in our opinion, incomplete in one respect: determiners and numerals would also fit into this group. We therefore decided to include these elements in this category.

40000 PRON

5.1 Inflection

41000 PRON (INFL) Inflection
This attribute indicates the presence of an audible morpheme for categories like: person, number, gender, case, mode, definiteness etc. For example the –e suffix attached to possessive pronouns. Differences between hen vs. hun (‘them.ACC’ vs. ‘them.DAT’) or hem vs. hen (‘he.ACC’ vs. ‘them.ACC’) are not marked (tagged) through this attribute. The following rule applies: inflection is only marked if the word can also occur without the inflectional morpheme. The following values give information regarding the nature of the inflection:
41100 PRON (INFL) -(e)n
41200 PRON (INFL) -(e)t
41300 PRON (INFL) -e
41400 PRON (INFL) -(e)s
41500 PRON (INFL) -st
41600 PRON (INFL) OT Other inflectional morpheme

5.2 Position

42000 PRON POS
42100 PRON POS PRE-N Prenominal
Determiners are always prenominal. Interrogative pronouns, relative pronouns, demonstrative pronouns, possessive pronouns and quantifiers can be prenominal.
E.g. which N, that N, my N, all/three/some N.
42110 PRON POS PRE-N (ELL) Prenominal with ellipsis
We speak of ellipsis if the noun can be added (but is not), in constructions like ik heb deze boeken gekocht, en hij die, 'I bought these books, [lit.:] and he those'. There is no ellipsis in constructions like the following: de mijne (*boeken) liggen nog op mijn kamer, '[lit.:] The mine lay still at my room'. Another example of ellipsis: Ik heb die rode gekocht, '[lit.:]I have those red bought'.
42200 PRON POS N Nominal
A pronoun is used nominally, if it is the head of an NP, and if there are (or can be) other elements inside the same NP. If this last part of the definition is not the case, we call the position free.
E.g. dit alles, 'this all', dit allemaal, 'this all', de mijne, 'the mine'.
42300 PRON POS POST-N Postnominal
E.g. Zij allen/beiden hebben het geweigerd, '[lit.:] they all/both have it refused'.
42400 PRON POS FREE Free
A pronoun is free if it forms an NP on its own. Personal pronouns, reflexive pronouns, reciprocal pronouns and R-pronouns are always free. Interrogative pronouns, relative pronouns, possessive pronouns, demonstrative pronouns, and quantifiers can be free (who, what, this, that, all, nothing, nobody). Pronouns that are marked for genitive case (f.e. wiens 'whose') are free.
42410 PRON POS FREE (PRED) Predicative pronouns
Predicative pronouns are found in constructions of the type Die fiets is mijns, 'that bicycle is mine.GEN', Jan is daar, 'John is there'.

5.3 Case

Just like with nouns, pronouns are labelled with an attribute for case if the case is morphologically visible.
43000 PRON (CASE)
43100 PRON (CASE)(NOM) Nominative
This is normally the pronoun that is used as the subject and that agrees with the finite verb in person and number.
43200 PRON (CASE)(OBL) Oblique
This means: not nominative or genitive. This value can be further specified as accusative or dative. If it is not entirely clear whether the pronoun is in accusative or dative case, the value oblique is sufficient, and no further specification is given.
43210 PRON (CASE)(OBL)(ACC) Accusative
43220 PRON (CASE)(OBL)(DAT) Dative
43300 PRON (CASE)(GEN) Genitive

5.4 Person and number

44000 PRON (FEAT)
44100 PRON (FEAT) 1.S First person singular
44200 PRON (FEAT) 2.S Second person singular
44300 PRON (FEAT) 2.S-p Second person singular, polite form
44400 PRON (FEAT) 3.S Third person singular
44500 PRON (FEAT) 1.PL First person plural
44600 PRON (FEAT) 2.PL Second person plural
44700 PRON (FEAT) 2.PL-p Second person plural, polite form
44800 PRON (FEAT) 3.PL Third person plural

5.5 Gender

45000 PRON (GEND)
45100 PRON (GEND) Z Non-neuter
45110 PRON (GEND) Z (M) Masculine. This specification is only added in clear cases.
45120 PRON (GEND) Z (F) Feminine. This specification is only added in clear cases.
45200 PRON (GEND) N Neuter

5.6 Function

Grammatical function has, strictly speaking, no place in parts-of-speech tagging. However, it is necessary to add this attribute in order to be able to search the database when syntactic annotation has not been done (yet).
46000 PRON (FUNCT)
46100 PRON (FUNCT) SUBJ Subject
Constituent that agrees with the finite verb.
46200 PRON (FUNCT) D-OBJ Direct object
46300 PRON (FUNCT) I-OBJ Indirect object
Indirect object, without a preposition.
46400 PRON (FUNCT) P-OBJ Prepositional object

5.8 Type

All types of pronouns are tagged under this attribute. For most of them we follow ANS (except for determiner, R-pronoun and quantifier, which are not classified as pronouns in ANS).
48000 PRON TYPE
5.8.1 Personal
48100 PRON TYPE PERS Personal pronouns
5.8.1.1 Subject doubling
48110 PRON TYPE PERS (DOUBL) Personal pronoun in a subject doubling construction.
48111 PRON TYPE PERS (DOUBL) 1-STRONG Personal pronoun that is linearly first in a doubling construction, if it is strong.
E.g. Zij heeft ze me niet gebeld, 'She.STRONG has she.WEAK me not called'.
48112 PRON TYPE PERS (DOUBL) 1-WEAK Personal pronoun that is linearly first in a doubling construction, if it is weak.
E.g. Ze heeft zij me niet gebeld, 'She.WEAK has she.STRONG me not called'.
48113 PRON TYPE PERS (DOUBL) 2-STRONG Personal pronoun that is linearly second in a doubling construction, if it is strong.
E.g. Ze heeft zij me niet gebeld, 'She.WEAK has she.STRONG me not called'.
48114 PRON TYPE PERS (DOUBL) 2-WEAK Personal pronoun that is linearly second in a subject doubling construction, if it is weak.
E.g. Zij heeft ze me niet gebeld, 'She.STRONG has she.WEAK me not called'.
48115 PRON TYPE PERS (DOUBL) 3-STRONG Personal pronoun that is linearly third in a subject doubling (= tripling) construction, if it is strong.
E.g. Marie heeft ze zij me niet gebeld, 'Marie has she.WEAK she.STRONG me not called'.
48116 PRON TYPE PERS (DOUBL) 3-WEAK Personal pronoun that is linearly third in a subject doubling (= tripling) construction, if it is weak.
E.g. Marie heeft zij ze me niet gebeld, 'Marie has she.STRONG she.WEAK me not called'.
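The six doubling codes are ordered by linear position first and by strength second, so the last digit can be computed. The following sketch assumes that position is counted within the doubling construction as a whole (so that ze in Marie heeft ze zij ... is second). The function name `doubling_code` is hypothetical.

```python
def doubling_code(position: int, strong: bool) -> str:
    """Return the 4811x code for a pronoun in a subject doubling construction.

    position is the linear position (1, 2 or 3) within the doubling
    construction; strong distinguishes strong from weak pronouns.
    """
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    # Each position takes two consecutive digits: strong first, weak second.
    last_digit = 2 * (position - 1) + (1 if strong else 2)
    return f"4811{last_digit}"


# Zij heeft ze me niet gebeld: Zij is 1-STRONG, ze is 2-WEAK.
print(doubling_code(1, True), doubling_code(2, False))
```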
5.8.1.2 Strong and weak (non-doubling)
48120 PRON TYPE PERS STATUS This tag distinguishes between strong and weak pronouns in a non-doubling construction. It is assigned only if the distinction is clear. When in doubt, don't assign this tag.
48121 PRON TYPE PERS STATUS STRONG Strong pronoun, no doubling
48122 PRON TYPE PERS STATUS WEAK Weak pronoun, no doubling
5.8.1.3 Expletive
48130 PRON TYPE PERS EXPL This tag is assigned to pronouns that function as expletive subject.
E.g. Het is jammer dat Piet gaat, 'It is a shame that Pete leaves'.
Er loopt een poes in de tuin, '[lit.:] There walks a cat in the garden'.
48131 PRON TYPE PERS EXPL STRONG This specification is only assigned to strong pronouns that function as expletives.
E.g. Dat schijnt dat Piet komt, '[lit.:] That (it.STRONG) seems that Piet comes'. Daar was eens een prinses die trouwen wilde, '[lit.:] There was once a princess that marry wanted' ('Once upon a time there was a princess that wanted to get married').
5.8.2 Reflexive and reciprocal
48200 PRON TYPE REFL
48210 PRON TYPE REFL Reflexive pronoun
E.g. Jan kent zichzelf, '[lit.:] John knows REFL/himself'.
48211 PRON TYPE REFL SIMPL Reflexive pronoun consisting of one morpheme.
E.g. Jan wast zich, 'John washes REFL'.
48212 PRON TYPE REFL COMPL Reflexive pronoun consisting of two morphemes.
E.g. Jan wast zichzelf, 'John washes REFL', Piet schaamt z'n eigen, '[lit.:] Pete shames HIS OWN'.
5.8.3 Possessive
48300 PRON TYPE POSS Possessive pronoun
48310 PRON TYPE POSS STRONG Strong possessive pronoun
E.g. Marie heeft haar huis verkocht, 'Marie sold her house'.
48320 PRON TYPE POSS WEAK Weak possessive pronoun
E.g. Marie heeft d'r huis verkocht, 'Marie sold her house'.
5.8.4 Demonstrative
48400 PRON TYPE DEM Demonstrative pronoun
48410 PRON TYPE DEM DEF Definite demonstrative pronoun.
E.g. die fiets, 'that bicycle'.
48420 PRON TYPE DEM INDEF Indefinite demonstrative pronoun
E.g. zulke boeken, 'such books', zo'n brug, 'such-a bridge'.
5.8.5 Interrogative
48500 PRON TYPE WH Interrogative pronoun
E.g. who, what, which.
5.8.6 Relative
48600 PRON TYPE REL Relative pronoun
48610 PRON TYPE REL W Relative pronoun starting with a /w/
E.g. Alles wat Jan weet, '[lit.:] Everything what John knows', Het meisje met wie hij uitging, '[lit.:] The girl with who he dated', De bal waarmee zij speelden, '[lit.:] The ball where-with they played'.
48620 PRON TYPE REL D Relative pronoun starting with a /d/
E.g. Alles dat Jan weet, '[lit.:] Everything that John knows', Hij is iemand die graag praat, '[lit.:] He is someone that gladly talks'.
5.8.7 R-pronouns
48700 PRON TYPE R-PRON This tag is assigned to er 'there', daar 'there', overal 'everywhere', ergens 'somewhere', nergens 'nowhere', hier 'here', waar 'where', whenever they are used non-adverbially.
E.g. Hij denkt nergens meer aan, 'He thinks of nothing anymore'.
48710 PRON TYPE R-PRON STRONG Strong R-pronoun
E.g. Kijk, daar loopt een adelaar, 'Look, there walks an eagle'.
48720 PRON TYPE R-PRON WEAK Weak R-pronoun
E.g. D'r staat een paard in de gang, '[lit.:] There.WEAK stands a horse in the hallway'.
5.8.8 Quantifiers
48800 PRON TYPE QUANT This category resembles the indefinite pronouns used in ANS, but is enlarged with numerals.
48810 PRON TYPE QUANT NUM Numerals, both ordinals and cardinals. Veel 'many' and weinig 'few' do not belong to this class, but are adjectives.
48820 PRON TYPE QUANT UNIV Universal quantifiers: each, every, all, altogether, everything, everybody, both.
48830 PRON TYPE QUANT EXIS Existential quantifiers: any, some.
48840 PRON TYPE QUANT NEG Negative quantifiers: nobody, nothing, no.
5.8.9 Determiners
48900 PRON TYPE ART
48910 PRON TYPE ART DEF Definite articles: de 'the.MASC/FEM', het, 'the.NEUT'.
48920 PRON TYPE ART IND Indefinite articles: een 'a'.

Adpositions: prepositions and postpositions

Adpositions normally take a complement. This can be a DP, but also a PP, an adverb, an adjective, a numeral or a verbal projection (V, VP, IP, CP). In the latter case, adpositions could be treated as complementizers, but this results in a systematic ambiguity for words like tot 'until', sedert, sinds 'since', voor 'before', na 'after', naar 'to', zonder 'without', met 'with', door 'through', om 'because of/in order to'. Just like CGN, the SAND does not follow this strategy, and classifies these words as adpositions. This also holds for te, which introduces an infinitive, aan '[lit.:] on' in constructions like aan het vissen 'fishing, busy fishing', op 'on' in constructions like op springen staan '[lit.:] on jump standing, be about to explode', and uit in uit vissen gaan 'go out fishing'.

50000 P Adpositions

6.1 Inflection

We talk about inflection if there is an audible morpheme attached to P, for categories like: person, number, gender, case, mode, definiteness etc. The following rule applies: inflection is only marked if the word can also occur without the inflectional morpheme. There are 6 specifications:
51000 P (INFL)
51100 P (INFL) -(e)n
51200 P (INFL) -(e)t
51300 P (INFL) -e
51400 P (INFL) -(e)s
51500 P (INFL) -st
51600 P (INFL) OT Other inflectional morpheme

6.2 Position

52000 P POS
52100 P POS PREP Prepositional, e.g. op de brug, 'on the bridge'.
52110 P POS PREP FUSION Prepositional and fused with (a part of) its complement.
E.g. ter plaatse, 'on.the spot', ten geleide '[lit.:] at.the guard, preface'.
52200 P POS POST Postpositional, e.g. onder de brug door, '[lit.:] under the bridge through'. Adpositions accompanied by an R-pronoun are postpositional, even if the R-pronoun does not directly precede or follow the adposition. E.g. dat ik er gisteren met Jan over gesproken heb, '[lit.:] that I there yesterday with John about spoken have (that I talked to John about it yesterday)'.
52300 P POS FREE Adpositions can also occur without a complement. This is the case for adverbially or predicatively used prepositions. E.g. het bier is op, '[lit.:] the beer is on (we ran out of beer)', het licht is aan, '[lit.:] the light is on'. Separable prefixes of a verb also count as predicatively used prepositions, whether they are separated from the verb or not. (E.g. Hij belt haar op, '[lit.:] he calls her up (he calls her)', ... dat hij haar opbelt, '[lit.:] that he her up-calls').
52320 P POS FREE ADV Adverbially used adpositions.
E.g. Dat hij liever binnen werkt, 'that he rather works inside/indoors'.
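
The numeric codes in this section appear to mirror the tag hierarchy: each more specific tag fills in one further digit of its parent's code (52110 under 52100 under 52000 under 50000). The following Python sketch illustrates that reading. It is not part of the SAND tagging application; the function name ancestors and the TAGS table are introduced here for illustration only, and the digit-by-digit nesting is an assumption inferred from the codes listed above.

```python
# Position tags for adpositions, copied from the list above.
TAGS = {
    50000: "P",
    52000: "P POS",
    52100: "P POS PREP",
    52110: "P POS PREP FUSION",
    52200: "P POS POST",
    52300: "P POS FREE",
    52320: "P POS FREE ADV",
}


def ancestors(code):
    """Return the more general codes above `code`, nearest parent first.

    Assumption: a parent code is obtained by zeroing the last
    non-zero digit of a five-digit code.
    """
    digits = list(f"{code:05d}")
    result = []
    # Walk from the last digit towards the first; the first digit
    # identifies the category itself and is never zeroed.
    for i in range(4, 0, -1):
        if digits[i] != "0":
            digits[i] = "0"
            result.append(int("".join(digits)))
    return result


if __name__ == "__main__":
    for parent in ancestors(52110):
        print(parent, TAGS.get(parent, "(not listed)"))
```

Under this assumption, a search for 52100 could also retrieve tokens tagged 52110, because 52100 is among the ancestors of 52110.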

Complementizers

On the one hand, there are complementizers in the strict sense: that, or, if, then, to, and on the other hand there are combinations of a preposition and a complementizer: doordat 'through.that/because of', nadat 'after.that/after', omdat 'to.that/because', opdat 'on.that/so that', totdat 'until.that/until', voordat 'before.that/before/until'. Furthermore there are complementizers of the type terwijl 'meanwhile', alhoewel 'although', tenzij 'unless', alvorens 'all/already.forward/before'. In the CGN tagset, these types are all classified as complementizers. This does not seem to be an optimal situation: from this point of view voor 'for' is a preposition, but voordat 'for.that' is a complementizer. We therefore decided to assign two different tags to the preposition-complementizer combination, and for this reason we split these combinations in the transcriptions (doordat is transcribed as door dat).
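
The splitting convention can be illustrated with a short Python sketch. It is a minimal illustration, not the SAND transcription tooling: the function name split_prep_complementizer is hypothetical, and the list of prepositions is limited to the six combinations named in the paragraph above.

```python
# Prepositions that combine with "dat" in the examples above.
PREPOSITIONS = ("door", "na", "om", "op", "tot", "voor")


def split_prep_complementizer(token):
    """Split e.g. 'doordat' into [('door', 'P'), ('dat', 'C')].

    Tokens that are not a listed preposition + 'dat' are returned
    unsplit, with no category assigned (None).
    """
    lower = token.lower()
    for prep in PREPOSITIONS:
        if lower == prep + "dat":
            # Keep the original casing of the transcription.
            return [(token[:len(prep)], "P"), (token[len(prep):], "C")]
    return [(token, None)]


if __name__ == "__main__":
    for word in ("doordat", "Voordat", "terwijl"):
        print(word, "->", split_prep_complementizer(word))
```

Complementizers of the terwijl type are deliberately left untouched, since the protocol only splits preposition-complementizer combinations.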

60000 C

7.1 Inflection

Complementizer agreement constructions involve, just like inflected verbs, person and number features. We would therefore expect the tagging of complementizers to correspond closely to that of finite verbs in this respect. The question then arises whether these features should also be indicated on complementizers in dialects that lack complementizer agreement. From experience with the written questionnaire, we know it is sometimes difficult to see whether there is agreement or not. This is an argument in favour of treating all complementizers alike: not assigning person and number features to any complementizer. The complementizer to be tagged is in many cases part of a cluster with the subject pronoun, so person and number marking on the complementizer itself is redundant. We therefore decided to select a minimal feature set for complementizers. The subject close to the complementizer provides the necessary person and number features. The subject is recognizable because it is assigned the SUBJ tag.

The attribute INFL indicates the presence of an audible morphological marking for categories like person, number, gender, case, mode etc. The following rule applies: INFL is only assigned if the word can also occur without the inflectional morpheme.

61000 C (INFL)

The following tags provide information regarding the encountered inflection:

61100 C (INFL) -(e)n
61200 C (INFL) -(e)t
61300 C (INFL) -e
61400 C (INFL) -(e)s
61500 C (INFL) -st
61600 C (INFL) OT Other inflectional morphology

7.2 Complementizer-clitic-cluster

This tag concerns enclisis: weak personal pronouns that are phonologically connected to the complementizer and, moreover, form one word with it. In such cases, it is not always clear where the (word) boundaries are, especially in dialects that have visible agreement (e.g. attem '[lit.:] as-t-he.ACC(used NOM)', dakzekik (Standard Dutch: dat ik ze) '[lit.:] that.I-them.ACC-I-I', ovveme 'if.-e-we.ACC(used NOM)'). We use two tiers in the transcriptions: in one (the informant or assistant interviewer tier) we transcribe the non-analyzed, undivided cluster, and the cluster tier contains the divided cluster. Tags are assigned in both tiers: the undivided cluster gets the tag for C-clitic-cluster, and the complementizer and the other elements of the divided cluster get separate tags in the cluster tier.

62000 C CL-CL

7.3 Complementizer type

This tag can be assigned at the informant tier or assistant tier (if the complementizer is not part of a cluster) or at the cluster tier (if that is the case).

63000 C TYPE

63100 C TYPE COORD Coordinating complementizer, e.g. and, or.
63200 C TYPE SUBORD Subordinating complementizer, e.g. whether.
63210 C TYPE SUBORD FIN Finite subordinating complementizer
63211 C TYPE SUBORD FIN Q Interrogative finite subordinating complementizer
E.g. I asked whether you knew. Words like why, when, how are treated as interrogative pronouns and not as complementizers.
63212 C TYPE SUBORD FIN DECL Declarative finite subordinating complementizer.
E.g. if, that and their dialectal counterparts. Also then, while, now, as soon as, since, before, if, unless. Then is a complementizer following a comparative, but not when it is initiating a 'result'-sentence in an 'if'-sentence (if...- then...); then is an adverb, because inversion follows (in Dutch: als je niet ophoudt, dan word ik boos, '[lit.:] if you not stop, then will.be I angry'). Words that can be followed by a DP (since, before, except, without) are considered as adpositions. Potentially occurring complementizers in relative clauses (E.g. de man die dat ik gezien heb, '[lit.:] the man that/who that I seen have') are declarative as well.
63220 C TYPE SUBORD INF Subordinating complementizer, introducing an infinitival subordinate clause. In Standard Dutch om, whereas many dialects employ van in these constructions, and also voor.
63221 C TYPE SUBORD INF Q Subordinating complementizer that introduces an interrogative infinitival subordinate clause. Possibly if occurs in these constructions.
63230 C TYPE SUBORD NON-S Complementizers can introduce subordinate clauses, but also smaller phrases, for example in comparatives: richer than Fons, not as big as John. In the latter case the distinction 'finite' vs. 'infinitival' does not apply and the tag non-sentential is assigned.
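
The distinctions in 7.3 can be read as a small decision procedure. The Python sketch below is one possible way to write it down; the function name complementizer_type and its parameters are hypothetical and not part of the tagging application. It returns the capital-letter tags only.

```python
def complementizer_type(coordinating=False, clause="finite", interrogative=False):
    """Return the C TYPE tag for a complementizer.

    clause is one of 'finite', 'infinitival' or 'non-sentential'
    (hypothetical parameter values chosen for this sketch).
    """
    if coordinating:
        return "C TYPE COORD"
    if clause == "non-sentential":
        # Finite vs. infinitival does not apply, e.g. 'richer than Fons'.
        return "C TYPE SUBORD NON-S"
    if clause == "finite":
        return "C TYPE SUBORD FIN Q" if interrogative else "C TYPE SUBORD FIN DECL"
    if clause == "infinitival":
        return "C TYPE SUBORD INF Q" if interrogative else "C TYPE SUBORD INF"
    raise ValueError(f"unknown clause type: {clause!r}")


if __name__ == "__main__":
    print(complementizer_type(coordinating=True))                  # and, or
    print(complementizer_type(clause="finite", interrogative=True))  # whether
    print(complementizer_type(clause="infinitival"))               # om, van, voor
```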

Adverbs

Adjectives and nouns that are used adverbially are not treated as adverbs, but as adjectives or nouns. In order to distinguish between adverbs and adverbially used adjectives/nouns, we use the following criterion: if the word in question can also be used in prenominal position with the same meaning, then it is not an adverb, but an adjective.

70000 ADV

8.1 Inflection

This attribute indicates the presence of an audible morphological marking for categories like person, number, gender, case, mode etc. The following rule applies: INFL is only assigned if the word can also occur without the inflectional morpheme.

71000 ADV (INFL)

The following tags give information regarding the form/type of the encountered inflection:
71100 ADV (INFL) -(e)n
71200 ADV (INFL) -(e)t
71300 ADV (INFL) -e
71400 ADV (INFL) -(e)s
71500 ADV (INFL) -st
71600 ADV (INFL) OT Other inflectional morphology
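
The inflection lists for P, C and ADV follow the same numbering pattern: the bare (INFL) attribute ends in 1000, and the six specifications add 100 to 600 in a fixed order. The Python sketch below captures that regularity for these three categories only; the names INFL_SPECS, INFL_BASE and infl_tag are introduced for illustration and do not come from the tagging application.

```python
# The six inflection specifications, in the order used in the lists above.
INFL_SPECS = ["-(e)n", "-(e)t", "-e", "-(e)s", "-st", "OT"]

# Numeric code of the bare (INFL) attribute for each category in this section.
INFL_BASE = {"P": 51000, "C": 61000, "ADV": 71000}


def infl_tag(category, spec=None):
    """Return (numeric code, capital code) for an inflection tag."""
    base = INFL_BASE[category]
    if spec is None:
        return base, f"{category} (INFL)"
    # The first specification gets +100, the second +200, and so on.
    offset = (INFL_SPECS.index(spec) + 1) * 100
    return base + offset, f"{category} (INFL) {spec}"


if __name__ == "__main__":
    print(infl_tag("P", "-st"))
    print(infl_tag("ADV"))
```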

8.2 Position

72000 ADV POS
72100 ADV POS AD-A Ad-adjectival
E.g. a really rich baker
72200 ADV POS AD-P Ad-adpositional
E.g. recht op zijn hoofd, 'right on his head'.
72300 ADV POS AD-ADV Ad-adverbial
E.g. een heel erg rijke man, '[lit.:] a whole really rich man (a terribly very rich man)'.
72400 ADV POS FREE
72410 ADV POS FREE ADV Adverbs that modify a verb, a part of a sentence or a complete clause: Hij wil nu naar huis, 'he wants to go home now', hij heeft het wel gedaan, 'he has it AFF done/he did do it', hij wil het niet doen, 'he does not want to do it', Als het regent, dan zijn er wolken, 'if it's raining, then there are clouds'.
72420 ADV POS FREE PRED Adverbs that are nominal predicates: Dat is jammer, 'it's a shame', het feest is nu, 'the party is now'. The same holds for adverbial prefixes of separable verbs, whether they are separated from the verb or not.
E.g. Ik drukte de eierschalen tussen mijn vingers samen, '[lit.:] I pressed the eggshells between my fingers together', dat ik ... samen drukte, '[lit.:] that I ... together pressed'.

8.3 Type

Not all types of adverbs will fit in with the types mentioned below, therefore this attribute is optional.

73000 ADV (TYPE)

8.3.1 Interrogative
73100 ADV (TYPE) Q Interrogative adverbs
E.g. when, how, why, and where. Note that these words can also belong to other classes: in de stad waar ik woon, 'the city where I live', where is a relative pronoun. In Waar kijk je naar '[lit: where] what are you looking at', where is an interrogative pronoun, and in Waar woont hij, 'where does he live', where is an interrogative adverb.
8.3.2 Quantifier
73200 ADV (TYPE) QUANT Quantificational adverb/adverbial quantifier
73210 ADV (TYPE) QUANT UNIV Universal adverbial quantifier
E.g. always, constantly, repeatedly.
73220 ADV (TYPE) QUANT NON-UNIV Non-universal adverbial quantifiers
E.g. once, ever, sometimes, often.
73230 ADV (TYPE) QUANT NEG Negative adverbial quantifiers
E.g. never, rarely, nowhere.

8.4 R-pronouns (adverbial)

74000 ADV FORM R-PRON To this class belong words like here, there, somewhere, nowhere, everywhere when used adverbially.
E.g. Er zitten hier nergens muizen, '[lit.:] EXPL sit here nowhere mice'. This tag is not assigned to R-pronouns that belong to an adposition, as in Hier zit hij op, '[lit.:] here sit he on (this is what he does)' (in Standard Dutch: Hij zit hierop). Those R-pronouns receive the following tag: PRON TYPE R-PRON.
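
The choice between the two R-pronoun tags, together with the rule in 6.2 that an adposition accompanied by an R-pronoun is postpositional, can be summarized in a few lines of Python. This is a sketch only: the function name r_pronoun_tags and its boolean parameter are hypothetical, and deciding whether an R-pronoun belongs to an adposition remains the annotator's judgment.

```python
def r_pronoun_tags(belongs_to_adposition):
    """Return the tags for an R-pronoun and, if present, its adposition."""
    if belongs_to_adposition:
        # E.g. 'Hier zit hij op': hier is pronominal, op is postpositional.
        return {"r_pronoun": "PRON TYPE R-PRON", "adposition": "P POS POST"}
    # E.g. 'Er zitten hier nergens muizen': adverbial use, no adposition.
    return {"r_pronoun": "ADV FORM R-PRON"}


if __name__ == "__main__":
    print(r_pronoun_tags(True))
    print(r_pronoun_tags(False))
```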

