emTag - POS Tagger
About the tool
What is it good for? What does it do?
Using a model built from training data, the program determines the lemma and part of speech of every previously identified token and annotates each token with them.
What is the input?
The program processes each sentence separately. Accordingly, the input is a series of sentences, each split into tokens.
What is the output?
The program returns the input tokens together with their lemmas and POS tags.
An example:
A kastély nem vár. ("The mansion is not a fortress."; here vár is tagged as a noun.)
| A# | a# | [/Det|art.Def] |
| kastély# | kastély# | [/N][Nom] |
| nem# | nem# | [/Adv] |
| vár# | vár# | [/N][Nom] |
| .# | .# | [/PUNCT] |
A kastély nem vár senkire. ("The mansion waits for no one."; here vár is tagged as a verb.)
| A# | a# | [/Det|art.Def] |
| kastély# | kastély# | [/N][Nom] |
| nem# | nem# | [/Adv] |
| vár# | vár# | [/V][Prs.NDef.3Sg] |
| senkire# | senki# | [/N|Pro][Subl] |
| .# | .# | [/PUNCT] |
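For readers who want to consume the tagger's output programmatically, the following is a minimal Java sketch that splits one output line into (token, lemma, tag) triples. It assumes each item is serialized as `token#lemma#tag` and that items are separated by whitespace. The names `PurePosOutput`, `TaggedToken`, and `parseLine` are illustrative and not part of the tool itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of PurePos: parses one tagged output line.
class PurePosOutput {

    /** One tagged item: surface form, lemma, and morphological tag. */
    static final class TaggedToken {
        final String form;
        final String lemma;
        final String tag;

        TaggedToken(String form, String lemma, String tag) {
            this.form = form;
            this.lemma = lemma;
            this.tag = tag;
        }

        @Override
        public String toString() {
            return form + " / " + lemma + " / " + tag;
        }
    }

    /**
     * Splits a line of whitespace-separated "token#lemma#tag" items.
     * The separators are searched from the right, so a '#' inside the
     * token itself does not break the split (assumption: lemmas and
     * tags never contain '#').
     */
    static List<TaggedToken> parseLine(String line) {
        List<TaggedToken> result = new ArrayList<>();
        for (String item : line.trim().split("\\s+")) {
            if (item.isEmpty()) {
                continue; // blank input line
            }
            int tagSep = item.lastIndexOf('#');
            int lemmaSep = item.lastIndexOf('#', tagSep - 1);
            if (tagSep < 0 || lemmaSep < 0) {
                throw new IllegalArgumentException("Expected token#lemma#tag, got: " + item);
            }
            result.add(new TaggedToken(
                    item.substring(0, lemmaSep),
                    item.substring(lemmaSep + 1, tagSep),
                    item.substring(tagSep + 1)));
        }
        return result;
    }

    public static void main(String[] args) {
        String line = "A#a#[/Det|art.Def] kastély#kastély#[/N][Nom] nem#nem#[/Adv] "
                + "vár#vár#[/N][Nom] .#.#[/PUNCT]";
        for (TaggedToken t : parseLine(line)) {
            System.out.println(t);
        }
    }
}
```

Searching for the separators from the right is a deliberate choice in this sketch: it keeps the parse well defined even if a token happens to contain a `#` character.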
For developers:
| Source | https://github.com/ppke-nlpg/purepos |
| Implementation language | Java |
| Input | One sentence per line, with tokens separated by spaces. |
| Output | The same as the input, but each token is followed by its lemma and its assigned tag, each introduced by a # character. |
| Execution | java -jar purepos-<version>.jar tag -m trained.model [-i input.txt] [-o output.txt] |
| Licence | LGPL-3.0 |
| Further information | Build dependency: Maven 2. |
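As an illustration of the Execution row, here is a hedged Java sketch that assembles the tagging command and runs it as an external process. Only the `tag` subcommand and the `-m`, `-i`, and `-o` flags come from the table. The class name `PurePosCommand`, the `build` helper, and all file paths are placeholders; in particular, the jar path must be supplied by the caller because the version is not specified here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper, not part of PurePos: builds and runs the tag command.
class PurePosCommand {

    /**
     * Builds the argument list for tagging. inputPath and outputPath are
     * optional (the table marks -i and -o as optional); pass null to omit.
     */
    static List<String> build(String jarPath, String modelPath,
                              String inputPath, String outputPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("tag");
        cmd.add("-m");
        cmd.add(modelPath);
        if (inputPath != null) {
            cmd.add("-i");
            cmd.add(inputPath);
        }
        if (outputPath != null) {
            cmd.add("-o");
            cmd.add(outputPath);
        }
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: PurePosCommand <jar> <model> [input] [output]");
            return;
        }
        String input = args.length > 2 ? args[2] : null;
        String output = args.length > 3 ? args[3] : null;
        List<String> cmd = build(args[0], args[1], input, output);

        // Run the tagger and pass its console streams through to this process.
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(process.waitFor());
    }
}
```

Passing the arguments as a list rather than a single command string avoids shell quoting problems with paths that contain spaces.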
