emDep - Dependency parser

About the tool

What is it good for? What does it do?

The tool reveals the dependency relations between the structural units (words, multiword expressions) of a sentence.

What is the input?

Text that has been tokenised and morphologically disambiguated.

What is the output?

Sentences whose words are arranged in so-called parse trees, which reveal the dependency relations between the units of the sentence. Each token is assigned a syntactic tag and its parent node, the head.

An example:

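Az exkatonát kórházba szállították, ahol két műtétet is végrehajtottak rajta.
(The ex-soldier was taken to hospital, where two operations were also performed on him.)

The columns are: token index, word form, lemma, part of speech, morphological features, index of the head (0 for tokens attached to the root) and syntactic tag.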

1 Az az DET Definite=Def|PronType=Art 2 DET
2 exkatonát exkatona NOUN Case=Acc|Number=Sing 4 OBJ
3 kórházba kórház PROPN Case=Ill|Number=Sing 4 OBL
4 szállították szállít VERB Definite=Def|Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin|Voice=Act 0 ROOT
5 , , PUNCT _ 4 PUNCT
6 ahol ahol ADV PronType=Rel 10 LOCY
7 két két NUM Case=Nom|NumType=Card|Number=Sing 8 ATT
8 műtétet műtét NOUN Case=Acc|Number=Sing 10 OBJ
9 is is CONJ _ 8 CONJ
10 végrehajtottak végrehajt VERB Definite=Ind|Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin|Voice=Act 4 ATT
11 rajta rajta PRON Case=Sup|Number=Sing|Person=3|PronType=Prs 10 OBL
12 . . PUNCT _ 0 PUNCT
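Rows like these can be read back into a tree with a few lines of code. The following is a minimal Python sketch, not part of magyarlanc: it assumes exactly seven whitespace-separated columns in the order shown above (index, form, lemma, POS, features, head, tag) and no whitespace inside any field. The names `Token`, `parse_sentence` and `children_of` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Token(NamedTuple):
    index: int  # position of the token in the sentence, starting at 1
    form: str
    lemma: str
    pos: str
    feats: str
    head: int   # index of the parent node; 0 = attached to the root
    tag: str    # syntactic tag, e.g. OBJ, OBL, ROOT


def parse_sentence(text: str) -> list[Token]:
    """Parse one sentence of parser output (one token per row) into Token records."""
    tokens = []
    for line in text.strip().splitlines():
        # Assumption: exactly seven fields, none of which contains whitespace.
        idx, form, lemma, pos, feats, head, tag = line.split()
        tokens.append(Token(int(idx), form, lemma, pos, feats, int(head), tag))
    return tokens


def children_of(tokens: list[Token]) -> dict[int, list[int]]:
    """Map each head index (0 = root) to the indices of its dependents, in order."""
    children = defaultdict(list)
    for tok in tokens:
        children[tok.head].append(tok.index)
    return dict(children)


EXAMPLE = """\
1 Az az DET Definite=Def|PronType=Art 2 DET
2 exkatonát exkatona NOUN Case=Acc|Number=Sing 4 OBJ
3 kórházba kórház PROPN Case=Ill|Number=Sing 4 OBL
4 szállították szállít VERB Definite=Def|Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin|Voice=Act 0 ROOT
5 , , PUNCT _ 4 PUNCT
6 ahol ahol ADV PronType=Rel 10 LOCY
7 két két NUM Case=Nom|NumType=Card|Number=Sing 8 ATT
8 műtétet műtét NOUN Case=Acc|Number=Sing 10 OBJ
9 is is CONJ _ 8 CONJ
10 végrehajtottak végrehajt VERB Definite=Ind|Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin|Voice=Act 4 ATT
11 rajta rajta PRON Case=Sup|Number=Sing|Person=3|PronType=Prs 10 OBL
12 . . PUNCT _ 0 PUNCT
"""

if __name__ == "__main__":
    tokens = parse_sentence(EXAMPLE)
    by_index = {tok.index: tok for tok in tokens}
    for tok in tokens:
        # Print each dependency as "dependent --tag--> head".
        head = "ROOT" if tok.head == 0 else by_index[tok.head].form
        print(f"{tok.form} --{tok.tag}--> {head}")
    print(children_of(tokens))
```

On this sentence `children_of` maps 0 to [4, 12], 4 to [2, 3, 5, 10] and 10 to [6, 8, 11], so both the main verb and the final full stop hang directly from the root.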

For developers

Source: http://rgai.inf.u-szeged.hu/magyarlanc
Source code: Java
Input: the output of the POS tagger, with one token per row and separate columns for the word form, lemma and morphological analysis; sentences are separated by an empty line.
Output: one token per row, with separate columns for the word form, lemma, morphological analysis, parent node (head) and syntactic tag.
Execution: java -Xmx2G -jar magyarlanc-3.0.jar -mode depparse -input in.txt -output out.txt
Licence: The database is licensed under the Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) licence. The GNU General Public License (GPL v3) applies to the software that converts the primary source of the database.
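For scripted use, the input file can be produced and the parser called from another program. The following Python sketch is illustrative only: it assumes tab-separated input columns and UTF-8 files, neither of which this page specifies, and the column layout in the usage example is likewise an assumption. The names `format_input`, `build_command` and `run_depparse` are hypothetical; only the java command line itself is taken from the Execution entry.

```python
import subprocess
from pathlib import Path


def format_input(sentences: list[list[tuple[str, ...]]]) -> str:
    """One token per row, columns joined by tabs (assumed separator),
    sentences separated by an empty line, trailing newline at the end."""
    blocks = ["\n".join("\t".join(cols) for cols in sentence) for sentence in sentences]
    return "\n\n".join(blocks) + "\n"


def build_command(jar: str, input_path: str, output_path: str, heap: str = "2G") -> list[str]:
    """The documented depparse invocation as an argument list."""
    return [
        "java", f"-Xmx{heap}",  # maximum Java heap size (2G in the documentation)
        "-jar", jar,
        "-mode", "depparse",
        "-input", input_path,
        "-output", output_path,
    ]


def run_depparse(jar: str, input_path: str, output_path: str) -> str:
    """Run the parser and return the output file's contents (assumed UTF-8)."""
    # check=True raises CalledProcessError if java exits with a non-zero status.
    subprocess.run(build_command(jar, input_path, output_path), check=True)
    return Path(output_path).read_text(encoding="utf-8")


if __name__ == "__main__":
    # First two tokens of the example sentence; the four-column layout is assumed.
    sentences = [[("Az", "az", "DET", "Definite=Def|PronType=Art"),
                  ("exkatonát", "exkatona", "NOUN", "Case=Acc|Number=Sing")]]
    print(format_input(sentences), end="")
    print(" ".join(build_command("magyarlanc-3.0.jar", "in.txt", "out.txt")))
```

The last line prints `java -Xmx2G -jar magyarlanc-3.0.jar -mode depparse -input in.txt -output out.txt`, the documented command. Passing an argument list to `subprocess.run` rather than a shell string avoids quoting problems with file names that contain spaces.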