e-magyar.hu

emTag - POS Tagger

About the tool

What is it good for? What does it do?

Using a model learned from training data, the programme determines the lemma and part of speech of every token identified in the preceding tokenisation step, and annotates each token with them.

What is the input?

The programme processes each sentence separately. Accordingly, the input is a sequence of sentences, each split into tokens.
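For illustration, already tokenised text could be turned into this input as sketched below. The sketch assumes the serialisation used by the command-line tool (one sentence per line, tokens separated by a single space); the class and method names are hypothetical and not part of the tagger.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of the tagger): writes tokenised sentences as input lines. */
class TaggerInputFormatter {

    /** Turns each sentence (a list of tokens) into one space-separated line. */
    static List<String> toInputLines(List<List<String>> sentences) {
        List<String> lines = new ArrayList<>();
        for (List<String> tokens : sentences) {
            // Assumption: tokens contain no whitespace, so a space is a safe separator.
            lines.add(String.join(" ", tokens));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Two sample sentences; accented letters are written as Unicode escapes
        // so that the file compiles under any source encoding.
        List<List<String>> sentences = Arrays.asList(
            Arrays.asList("A", "kast\u00e9ly", "nem", "v\u00e1r", "."),
            Arrays.asList("A", "kast\u00e9ly", "nem", "v\u00e1r", "senkire", "."));
        for (String line : toInputLines(sentences)) {
            System.out.println(line);
        }
    }
}
```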

What is the output?

The programme returns the input tokens together with their lemmas and POS tags.

An example:

A kastély nem vár.

A#	a#	[/Det|art.Def]
kastély#	kastély#	[/N][Nom]
nem#	nem#	[/Adv]
vár#	vár#	[/N][Nom]
.#	.#	[/PUNCT]

A kastély nem vár senkire.

A#	a#	[/Det|art.Def]
kastély#	kastély#	[/N][Nom]
nem#	nem#	[/Adv]
vár#	vár#	[/V][Prs.NDef.3Sg]
senkire#	senki#	[/N|Pro][Subl]
.#	.#	[/PUNCT]
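Output of this kind can be read back into structured form with a few lines of code. The following is a minimal sketch, assuming that the raw output holds one sentence per line and that each item has the shape token#lemma#tag, with items separated by spaces (as the Output description in the developer table suggests). The names TaggedSentenceParser, TaggedToken and parseSentence are hypothetical and not part of the tagger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of the tagger): parses one line of tagged output. */
class TaggedSentenceParser {

    /** One analysed token: surface form, lemma and morphological tag. */
    static final class TaggedToken {
        final String form;
        final String lemma;
        final String tag;

        TaggedToken(String form, String lemma, String tag) {
            this.form = form;
            this.lemma = lemma;
            this.tag = tag;
        }
    }

    /** Parses a line of space-separated token#lemma#tag items (assumed format). */
    static List<TaggedToken> parseSentence(String line) {
        List<TaggedToken> result = new ArrayList<>();
        for (String item : line.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank line
            }
            // Split from the right: the tag is assumed never to contain '#'.
            // Tokens or lemmas that themselves contain '#' are not handled here.
            int tagSep = item.lastIndexOf('#');
            int lemmaSep = item.lastIndexOf('#', tagSep - 1);
            if (lemmaSep < 0) {
                throw new IllegalArgumentException("Expected token#lemma#tag, got: " + item);
            }
            result.add(new TaggedToken(
                item.substring(0, lemmaSep),
                item.substring(lemmaSep + 1, tagSep),
                item.substring(tagSep + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        // The first example sentence; accented letters are written as Unicode
        // escapes so that the file compiles under any source encoding.
        String line = "A#a#[/Det|art.Def] kast\u00e9ly#kast\u00e9ly#[/N][Nom] "
            + "nem#nem#[/Adv] v\u00e1r#v\u00e1r#[/N][Nom] .#.#[/PUNCT]";
        for (TaggedToken t : parseSentence(line)) {
            System.out.println(t.form + "\t" + t.lemma + "\t" + t.tag);
        }
    }
}
```

Splitting from the right keeps the parser simple, because the bracketed tags in the examples never contain a # character.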

For developers:

Source	https://github.com/ppke-nlpg/purepos
Implementation language	Java
Input	One sentence per line, with the tokens separated by spaces.
Output	The same as the input, but each token is followed by its lemma and the tag assigned to it, each introduced by a # character.
Execution	java -jar purepos-<version>.jar tag -m trained.model [-i input.txt] [-o output.txt]
Licence	LGPL-3.0
Further information	Build dependency: Maven 2.
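The command in the Execution row can also be assembled and started from Java code. The sketch below is one possible way to do so, not an official API: the class TaggerCommand and its methods are hypothetical, and the jar, model and file names are placeholders that must be replaced with real paths. Only the subcommand and flags shown in the Execution row (tag, -m, -i, -o) are used.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical wrapper around the command line shown in the Execution row. */
class TaggerCommand {

    /** Builds the argument list; inputFile and outputFile are optional and may be null. */
    static List<String> build(String jarPath, String modelPath,
                              String inputFile, String outputFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("tag");
        cmd.add("-m");
        cmd.add(modelPath);
        if (inputFile != null) {
            cmd.add("-i");
            cmd.add(inputFile);
        }
        if (outputFile != null) {
            cmd.add("-o");
            cmd.add(outputFile);
        }
        return cmd;
    }

    /** Starts the tagger as a child process sharing our standard streams; returns its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        return new ProcessBuilder(command).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // Placeholder names only: nothing is executed here, the command is just printed.
        List<String> cmd = build("purepos-<version>.jar", "trained.model",
                                 "input.txt", "output.txt");
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the command construction separate from the process start makes it possible to inspect or log the command before anything is run.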