emTag - POS Tagger
About the tool
What is it good for? What does it do?
Based on its training data, the programme determines the lemma and part of speech of every token identified in the preceding tokenization step, and tags each token accordingly.
What is the input?
The programme processes each sentence separately. Accordingly, the input is a series of sentences divided into tokens.
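The input layout can be illustrated with a minimal Python sketch. It is not part of the tool: the helper name `to_tagger_input` is hypothetical, and the sketch assumes that tokenization has already been done and that no token contains whitespace.

```python
def to_tagger_input(sentences):
    """Serialize tokenized sentences: one sentence per line,
    tokens separated by a single space."""
    lines = []
    for tokens in sentences:
        for tok in tokens:
            # Assumption: a token is non-empty and contains no whitespace,
            # otherwise the space-separated layout would be ambiguous.
            if not tok or any(ch.isspace() for ch in tok):
                raise ValueError(f"invalid token: {tok!r}")
        lines.append(" ".join(tokens))
    return "".join(line + "\n" for line in lines)


sentences = [
    ["A", "kastély", "nem", "vár", "."],
    ["A", "kastély", "nem", "vár", "senkire", "."],
]
text = to_tagger_input(sentences)
```

Each inner list is one sentence, so the tagger sees sentence boundaries as line breaks.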
What is the output?
The programme returns the input tokens together with their lemmas (word stems) and POS tags.
An example (the word "vár" is tagged as a noun in the first sentence and as a verb in the second):
A kastély nem vár.
A# | a# | [/Det|art.Def] |
kastély# | kastély# | [/N][Nom] |
nem# | nem# | [/Adv] |
vár# | vár# | [/N][Nom] |
.# | .# | [/PUNCT] |
A kastély nem vár senkire.
A# | a# | [/Det|art.Def] |
kastély# | kastély# | [/N][Nom] |
nem# | nem# | [/Adv] |
vár# | vár# | [/V][Prs.NDef.3Sg] |
senkire# | senki# | [/N|Pro][Subl] |
.# | .# | [/PUNCT] |
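A tagged line can be read back into (token, lemma, tag) triples with a short Python sketch. The names `TaggedToken` and `parse_tagged_line` are hypothetical, and the sketch assumes the serialized output has the form token#lemma#tag, with items separated by spaces and no "#" inside a lemma or tag.

```python
from typing import List, NamedTuple


class TaggedToken(NamedTuple):
    token: str
    lemma: str
    tag: str


def parse_tagged_line(line: str) -> List[TaggedToken]:
    """Split one tagged sentence into (token, lemma, tag) triples."""
    result = []
    for item in line.split():
        # Split from the right, so the last two '#' characters are
        # treated as separators (assumes lemma and tag contain no '#').
        parts = item.rsplit("#", 2)
        if len(parts) != 3:
            raise ValueError(f"unexpected item: {item!r}")
        result.append(TaggedToken(*parts))
    return result


line = ("A#a#[/Det|art.Def] kastély#kastély#[/N][Nom] nem#nem#[/Adv] "
        "vár#vár#[/V][Prs.NDef.3Sg] senkire#senki#[/N|Pro][Subl] .#.#[/PUNCT]")
tagged = parse_tagged_line(line)
```

Here `tagged[4].lemma` is "senki", the lemma of the inflected form "senkire".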
For developers:
Source | https://github.com/ppke-nlpg/purepos |
Implementation language | Java |
Input | One sentence per row, the tokens divided by a space. |
Output | The same as the input, but each token is followed by its lemma and its assigned tag, each introduced by a # character. |
Execution | java -jar purepos-<version>.jar tag -m betanított.model [-i input.txt] [-o output.txt] |
Licence | LGPL-3.0 |
Further information | Build dependency: Maven 2. |
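The Execution row above can be turned into an argument list from a script, for example in Python. This is only a sketch: the function name `build_purepos_command` is hypothetical, the jar and model paths are placeholders to be replaced with real files, and only the flags shown in the Execution row (`tag`, `-m`, `-i`, `-o`) are used.

```python
from typing import List, Optional


def build_purepos_command(jar_path: str,
                          model_path: str,
                          input_path: Optional[str] = None,
                          output_path: Optional[str] = None) -> List[str]:
    """Build the argument list for the 'tag' command of the tagger."""
    cmd = ["java", "-jar", jar_path, "tag", "-m", model_path]
    # -i and -o are optional in the documented command line.
    if input_path is not None:
        cmd += ["-i", input_path]
    if output_path is not None:
        cmd += ["-o", output_path]
    return cmd


# Placeholder paths: substitute the actual jar and model file names.
cmd = build_purepos_command("purepos-<version>.jar", "betanított.model",
                            "input.txt", "output.txt")
# To run it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with non-ASCII file names such as the model name.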