emCons - Constituency parser
About the tool
What is it good for? What does it do?
Constituency parsing reveals which phrases the words of a sentence form when combined with each other, and how these phrases combine to build the whole sentence.
What is the input?
The input is text that has already been tokenised and morphologically disambiguated, so that every token carries an appropriate tag. The parser arranges these tokens (the words of the sentence) into a parse tree.
What is the output?
The output is a parse tree over the words of each sentence, showing every phrase that these words form and the syntactic relations between those phrases.
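An example (columns: word form, lemma, part of speech, morphological features, constituent bracketing). In the last column, * stands for the token itself, and the brackets before and after it open and close the phrases that start or end at that token:
Az exkatonát kórházba szállították, ahol két műtétet is végrehajtottak rajta.
(English: "The ex-soldier was taken to hospital, where two operations were performed on him.")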
Az | az | DET | Definite=Def|PronType=Art | (ROOT(CP(NP* |
exkatonát | exkatona | NOUN | Case=Acc|Number=Sing | *) |
kórházba | kórház | PROPN | Case=Ill|Number=Sing | (NP*) |
szállították | szállít | VERB | Definite=Def|Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin|Voice=Act | (V_(V0*)) |
, | , | PUNCT | _ | * |
ahol | ahol | ADV | PronType=Rel | (ADVP*) |
két | két | NUM | Case=Nom|NumType=Card|Number=Sing | (NP* |
műtétet | műtét | NOUN | Case=Acc|Number=Sing | *) |
is | is | CONJ | _ | (C0*) |
végrehajtottak | végrehajt | VERB | Definite=Ind|Mood=Ind|Number=Plur|Person=3|Tense=Past|VerbForm=Fin|Voice=Act | (V_(V0*)) |
rajta | rajta | PRON | Case=Sup|Number=Sing|Person=3|PronType=Prs | (NP*) |
. | . | PUNCT | _ | *)) |
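To make the bracket column concrete, here is a minimal Java sketch (Java being the tool's own language) that turns the last column of the example above into a list of labelled phrases, each with the index of its first and last token. The class ConstituentSpans and its methods are hypothetical illustrations, not part of magyarlanc. The sketch assumes each cell holds zero or more "(LABEL" openings, exactly one *, and then zero or more closing brackets, which is the pattern visible in the example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical helper: recovers labelled constituent spans from the bracket column. */
public class ConstituentSpans {

    /** A phrase covering tokens start..end (0-based, inclusive). */
    public record Span(String label, int start, int end) {
        @Override
        public String toString() {
            return label + "[" + start + "," + end + "]";
        }
    }

    public static List<Span> spans(String[] bits) {
        Deque<String> labels = new ArrayDeque<>(); // labels of phrases still open
        Deque<Integer> starts = new ArrayDeque<>(); // token index where each open phrase began
        List<Span> result = new ArrayList<>();
        for (int i = 0; i < bits.length; i++) {
            String bit = bits[i].trim();
            int star = bit.indexOf('*');
            if (star < 0) {
                throw new IllegalArgumentException("No '*' in bracket cell: " + bit);
            }
            // Everything before '*' is a run of "(LABEL" openings, e.g. "(ROOT(CP(NP".
            for (String label : bit.substring(0, star).split("\\(")) {
                if (!label.isEmpty()) {
                    labels.push(label);
                    starts.push(i);
                }
            }
            // Each ')' after '*' closes the most recently opened phrase at this token.
            for (char c : bit.substring(star + 1).toCharArray()) {
                if (c == ')') {
                    result.add(new Span(labels.pop(), starts.pop(), i));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Last column of the example sentence, one entry per token.
        String[] bits = {
            "(ROOT(CP(NP*", "*)", "(NP*)", "(V_(V0*))", "*", "(ADVP*)",
            "(NP*", "*)", "(C0*)", "(V_(V0*))", "(NP*)", "*))"
        };
        System.out.println(spans(bits));
    }
}
```

Run on the example, it prints the phrases in the order they close: [NP[0,1], NP[2,2], V0[3,3], V_[3,3], ADVP[5,5], NP[6,7], C0[8,8], V0[9,9], V_[9,9], NP[10,10], CP[0,11], ROOT[0,11]]. For instance, NP[0,1] is the noun phrase "Az exkatonát", and ROOT[0,11] covers the whole sentence.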
For developers:
Source | http://rgai.inf.u-szeged.hu/magyarlanc |
Programming language | Java |
Input | The output of the POS tagger: one token per line, with separate columns for the word form, its lemma and its morphological analysis; sentences are separated by an empty line. |
Output | One token per line, with separate columns for the word form, lemma, morphological analysis and constituency parse. |
Execution | java -Xmx2G -jar magyarlanc-3.0.jar -mode constparse -input in.txt -output out.txt |
Licence | The database is licensed under the Creative Commons Attribution-ShareAlike 4.0 (CC BY-SA 4.0) licence. The GNU General Public License (GPL v3) covers the primary source of the database. |
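A minimal Java sketch for post-processing the output file (out.txt in the command above). It assumes, without confirmation from the tool's documentation, that the columns are tab-separated and appear in the order shown in the example (word form, lemma, part of speech, morphological features, bracketing), and that sentences are separated by an empty line. It rebuilds one bracketed tree per sentence by replacing each * with a "(POS word)" leaf. ConstParseReader and readTrees are hypothetical names, not part of magyarlanc.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical reader for the parser's output (tab-separated columns assumed). */
public class ConstParseReader {

    /** Groups lines into sentences (blank-line separated) and returns one bracketed tree per sentence. */
    public static List<String> readTrees(List<String> lines) {
        List<String> trees = new ArrayList<>();
        StringBuilder tree = new StringBuilder();
        for (String line : lines) {
            if (line.isBlank()) {
                // An empty line ends the current sentence.
                if (tree.length() > 0) {
                    trees.add(tree.toString());
                    tree.setLength(0);
                }
                continue;
            }
            String[] cols = line.split("\t");
            if (cols.length < 4) {
                throw new IllegalArgumentException("Unexpected line: " + line);
            }
            String form = cols[0];
            String pos = cols[2];                        // assumed: third column is the POS tag
            String bit = cols[cols.length - 1].trim();   // assumed: last column is the bracketing
            // '*' marks the token's own position inside its phrases.
            tree.append(bit.replace("*", "(" + pos + " " + form + ")"));
        }
        if (tree.length() > 0) {
            trees.add(tree.toString()); // last sentence may lack a trailing empty line
        }
        return trees;
    }

    public static void main(String[] args) throws IOException {
        Path out = Path.of(args.length > 0 ? args[0] : "out.txt");
        for (String t : readTrees(Files.readAllLines(out, StandardCharsets.UTF_8))) {
            System.out.println(t);
        }
    }
}
```

For the example sentence this would print a single line beginning (ROOT(CP(NP(DET Az)(NOUN exkatonát))(NP(PROPN kórházba)), which is the same tree written in the common Penn-style bracket notation. If the real output uses a different separator or column order, only the split and the two column indices need to change.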