Chapter 4: Tagging protocol

From Dialectsyntax
Revision as of 13:48, 2 November 2011 by Franca


The tagset used in the Edisyn search engine can be viewed here (note that it is a work in progress). It is used to label the parts of speech in (dialect) databases. The document shows how the tags of the various databases map onto those of the Edisyn search engine. The column 'Edisyn search engine' lists the tags used by the search engine, and each of the other columns lists the tags of one individual database. Each row shows the correspondence between a database tag and the matching search engine tag.
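The row-by-row correspondence described above can be thought of as a lookup table. The sketch below is a minimal illustration, not the actual Edisyn data: the database names (`db_A`, `db_B`) and all tag strings are hypothetical placeholders. Each row pairs an Edisyn tag with the tag each database uses for the same category, and the two helper functions (also hypothetical) translate in either direction.

```python
# Hypothetical correspondence table: one row per Edisyn tag, one key per database.
# Database names and tag strings are invented for illustration only.
CORRESPONDENCE = [
    {"edisyn": "V", "db_A": "v", "db_B": "VERB"},
    {"edisyn": "N", "db_A": "n", "db_B": "NOUN"},
    {"edisyn": "ADJ", "db_A": "adj", "db_B": None},  # None: db_B has no matching tag
]


def to_edisyn(database, local_tag):
    """Return the Edisyn tag matching `local_tag` in `database`, or None if unmapped."""
    for row in CORRESPONDENCE:
        if local_tag is not None and row.get(database) == local_tag:
            return row["edisyn"]
    return None


def from_edisyn(edisyn_tag):
    """Return {database: local_tag} for every database that has a tag for `edisyn_tag`."""
    for row in CORRESPONDENCE:
        if row["edisyn"] == edisyn_tag:
            return {db: tag for db, tag in row.items()
                    if db != "edisyn" and tag is not None}
    return {}
```

For example, `to_edisyn("db_A", "v")` yields `"V"`, and `from_edisyn("ADJ")` yields only `{"db_A": "adj"}` because the hypothetical `db_B` has no adjective tag in this table.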
The tags of the Edisyn search engine consist of two parts: a linguistic category (e.g. V, verb), which may be modified by one or more features (e.g. 1,s, first person singular). In the search engine one can search by category, by feature, or by both. To make many databases interoperable, the categories and features are kept fairly general. A rationale for the tagset can be opened here.
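To illustrate the two-part structure and the three search modes (category, feature, or both), here is a minimal sketch. It assumes a tag can be modelled as a category plus a set of features; the `Tag`, `Token` and `search` names, the example words and their feature sets are all hypothetical and do not reflect how the Edisyn search engine is actually implemented.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional


@dataclass(frozen=True)
class Tag:
    category: str                            # linguistic category, e.g. "V" (verb)
    features: FrozenSet[str] = frozenset()   # optional features, e.g. {"1", "s"}


@dataclass(frozen=True)
class Token:
    word: str
    tag: Tag


def search(tokens: Iterable[Token], category: Optional[str] = None,
           features: Iterable[str] = ()) -> List[Token]:
    """Keep tokens whose category matches (if given) and that carry all requested features."""
    wanted = frozenset(features)
    return [t for t in tokens
            if (category is None or t.tag.category == category)
            and wanted <= t.tag.features]


# Hypothetical tagged tokens, for illustration only.
tokens = [
    Token("loop", Tag("V", frozenset({"1", "s"}))),   # 'ik loop', first person singular
    Token("lopen", Tag("V")),                         # verb tagged without features
    Token("huis", Tag("N", frozenset({"s"}))),        # singular noun
]
```

Searching by category alone (`search(tokens, category="V")`) returns both verbs; searching by feature alone (`features={"s"}`) returns `loop` and `huis`; combining both (`category="V", features={"1", "s"}`) returns only `loop`. Treating features as a set means a query matches any token that carries at least the requested features, which suits a deliberately general tagset.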

The protocol below is a manual for part-of-speech (POS) tagging. It was developed by Sjef Barbiers and Guido Vanden Wyngaerd for the SAND project (Syntactic Atlas of Dutch Dialects), but it may also be useful to other dialect research groups and projects.

This protocol is also available in PDF format.

Sections of the protocol:
Introduction
Noun
Adjective
Verb
Pronouns
Adpositions: prepositions and postpositions
Complementizers
Adverbs
