Dialogue Evaluation 2023

SEMarkup

Semantic markup


Key dates:

  • 20 January — train dataset publication;
  • 6 February — test dataset and CodaLab publication;
  • 28 March 23:59 (GMT +3) — shared task deadline, results publication;
  • 8 April — paper submission deadline.

Task

We offer participants two tracks:

  • to create a solution that produces semantic markup, given morphosyntactic markup as input;
  • to create a solution that produces morphological, syntactic and semantic markup simultaneously.

Description and data

The dataset consists of news texts from the NewsRU portal. The texts were marked up automatically by the Compreno system, checked manually, and then converted automatically into the UD (Universal Dependencies) format, followed by partial proofreading. The dataset has three levels of markup:

  • morphology (UD);
  • syntax (UD);
  • semantics (deep slots and generic Compreno semantic classes adapted to the UD format).
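
As an illustration of how the three levels could be read together, here is a minimal Python sketch. It assumes that each token line carries the ten standard CoNLL-U columns followed by two extra tab-separated columns for the semantic level. The extra column names (semslot, semclass), their position, and the sample values are assumptions made for this example, not the official file format, so please check the published dataset before relying on them.

```python
from dataclasses import dataclass

# The ten standard CoNLL-U columns defined by Universal Dependencies.
CONLLU_FIELDS = ["id", "form", "lemma", "upos", "xpos",
                 "feats", "head", "deprel", "deps", "misc"]
# ASSUMPTION: two extra columns hold the semantic level (deep slot and
# semantic class). These names and positions are hypothetical.
SEMANTIC_FIELDS = ["semslot", "semclass"]


@dataclass
class Token:
    form: str
    lemma: str
    upos: str       # morphology: part of speech
    feats: dict     # morphology: grammatical features
    head: int       # syntax: index of the head token (0 = root)
    deprel: str     # syntax: dependency relation
    semslot: str    # semantics: deep slot (assumed column)
    semclass: str   # semantics: semantic class (assumed column)


def parse_feats(raw: str) -> dict:
    """Turn 'Case=Nom|Number=Sing' into a dict; '_' means no features."""
    if raw == "_":
        return {}
    return dict(pair.split("=", 1) for pair in raw.split("|"))


def parse_token_line(line: str) -> Token:
    """Parse one plain token line (not a comment or multiword-token range)."""
    values = line.rstrip("\n").split("\t")
    names = CONLLU_FIELDS + SEMANTIC_FIELDS
    if len(values) != len(names):
        raise ValueError(f"expected {len(names)} columns, got {len(values)}")
    row = dict(zip(names, values))
    return Token(
        form=row["form"],
        lemma=row["lemma"],
        upos=row["upos"],
        feats=parse_feats(row["feats"]),
        head=int(row["head"]),
        deprel=row["deprel"],
        semslot=row["semslot"],
        semclass=row["semclass"],
    )


# Invented sample line; the semantic labels are placeholders, not real tags.
sample = "1\tкошка\tкошка\tNOUN\t_\tCase=Nom|Number=Sing\t2\tnsubj\t_\t_\tAgent\tANIMAL"
token = parse_token_line(sample)
print(token.upos, token.deprel, token.semslot, token.semclass)
```

Keeping all three levels on one token object makes it straightforward to feed the morphosyntactic fields to a model as input (first track) or to treat every field as a prediction target (second track).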

There are still no semantic parsers built for Russian-language data. The presence of morphosyntactic markup in the training dataset makes it possible to take this information into account and, in the future, to study the relationship between the different levels of markup.

Simultaneous markup of three language levels is a new and more difficult challenge for participants than in previous competitions, such as GramEval-2020, which covered two levels: morphology and syntax.