emTag - POS Tagger

About the tool

What is it good for? What does it do?

Using a model learned from training data, the tool determines the lemma and part of speech of every token identified in an earlier processing step, and tags each token accordingly.

What is the input?

The tool processes each sentence separately. Accordingly, the input is a series of sentences, each divided into tokens.
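As a minimal sketch of preparing such input, the snippet below serializes already tokenized sentences in the layout given in the developer notes (one sentence per line, tokens separated by a space). The function name `to_tagger_input` is hypothetical and not part of the tool.

```python
from typing import Iterable, List


def to_tagger_input(sentences: Iterable[List[str]]) -> str:
    """Write one sentence per line, with tokens separated by a single space."""
    lines = []
    for tokens in sentences:
        for tok in tokens:
            # An empty token or one containing whitespace would be
            # indistinguishable from a token boundary, so reject it.
            if not tok or any(ch.isspace() for ch in tok):
                raise ValueError(f"invalid token: {tok!r}")
        lines.append(" ".join(tokens))
    return "".join(line + "\n" for line in lines)


sentences = [
    ["A", "kastély", "nem", "vár", "."],
    ["A", "kastély", "nem", "vár", "senkire", "."],
]
text = to_tagger_input(sentences)
```

The resulting string can be written to a file with UTF-8 encoding and passed to the tagger as its input.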

What is the output?

The tool returns each input token together with its lemma and its POS tag.

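An example, in which the same word form vár is tagged as a noun ("fortress") in the first sentence and as a verb ("waits") in the second: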

A kastély nem vár.

A# a# [/Det|art.Def]
kastély# kastély# [/N][Nom]
nem# nem# [/Adv]
vár# vár# [/N][Nom]
.# .# [/PUNCT]

A kastély nem vár senkire.

A# a# [/Det|art.Def]
kastély# kastély# [/N][Nom]
nem# nem# [/Adv]
vár# vár# [/V][Prs.NDef.3Sg]
senkire# senki# [/N|Pro][Subl]
.# .# [/PUNCT]
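The tagged output can be read back into structured records. The sketch below assumes that each output line holds one sentence as space-separated items of the form token#lemma#tag, with no spaces inside an item and no # inside a token, lemma, or tag. The names `TaggedToken` and `parse_tagged_sentence` are hypothetical and not part of the tool.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    form: str
    lemma: str
    tag: str


def parse_tagged_sentence(line: str) -> List[TaggedToken]:
    """Parse one output line of space-separated token#lemma#tag items."""
    result = []
    for item in line.split():
        parts = item.split("#")
        # Exactly three fields are expected; anything else (for example a
        # token that itself contains '#') is outside this sketch's scope.
        if len(parts) != 3:
            raise ValueError(f"unexpected item: {item!r}")
        result.append(TaggedToken(*parts))
    return result


line = (
    "A#a#[/Det|art.Def] kastély#kastély#[/N][Nom] nem#nem#[/Adv] "
    "vár#vár#[/V][Prs.NDef.3Sg] senkire#senki#[/N|Pro][Subl] .#.#[/PUNCT]"
)
tokens = parse_tagged_sentence(line)
verbs = [t.form for t in tokens if t.tag.startswith("[/V]")]
```

Here `verbs` collects the word forms whose tag starts with [/V], which in the second example sentence is only vár.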

For developers:

Source: https://github.com/ppke-nlpg/purepos
Source code: Java
Input: One sentence per line, with the tokens separated by a space.
Output: The same as the input, but each token is followed by its lemma and the tag assigned to it, each introduced by a # character.
Execution: java -jar purepos-<version>.jar tag -m betanított.model [-i input.txt] [-o output.txt]
Licence: LGPL-3.0
Further information: Compile-time dependency: Maven 2.
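
The execution line above can also be driven from a script. The following is a sketch only: it builds the same argument list and treats -i and -o as optional, as the square brackets suggest. The function names `build_tag_command` and `run_tagger` are hypothetical, the jar file name keeps the <version> placeholder and must be replaced with the actual file name, and a Java runtime is assumed to be on the PATH.

```python
import subprocess
from typing import List, Optional


def build_tag_command(
    jar: str,
    model: str,
    input_path: Optional[str] = None,
    output_path: Optional[str] = None,
) -> List[str]:
    """Build the argument list for: java -jar <jar> tag -m <model> [-i ...] [-o ...]."""
    cmd = ["java", "-jar", jar, "tag", "-m", model]
    if input_path is not None:
        cmd += ["-i", input_path]
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


def run_tagger(cmd: List[str]) -> None:
    """Run the tagger and raise CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)


# Placeholder jar name: substitute the real version before running.
cmd = build_tag_command(
    "purepos-<version>.jar", "betanított.model", "input.txt", "output.txt"
)
```

Passing the command as a list rather than a single string avoids shell quoting problems with non-ASCII file names such as betanított.model. Calling `run_tagger(cmd)` then starts the tagger.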