First Shared Task on Multilingual Easy-to-Read Translation

Advance methods for producing easy-to-read versions of texts, with a focus on Catalan, Italian, and Spanish (plus a surprise language).


Languages: CA · IT · ES (+ surprise)
Up to 3 submissions per language
Data: iDEM corpus (CSV)

Supported by: Horizon Europe iDEM and IDEAL projects

Overview

Why Easy-to-Read?

Accessible language supports participation for people with language comprehension difficulties (e.g., intellectual disabilities, low literacy), aligning with accessibility and inclusion goals.

Relevant standards and guidelines:

Language versions of easy-to-read standards
ISO 24495-1:2023 (Easy language principles and guidelines)
ISO 24495-2:2025 (Easy language — Part 2)
Guía de Lectura Fácil España

What's new here

MER-TRANS is a multilingual shared task targeting Romance languages (Catalan, Italian, and Spanish), introducing multilingual easy-to-read translation at shared-task scale.

Task

Objective

Automatically produce easy-to-read versions of texts or sentences. Inputs are complex excerpts; outputs should be simplified, readable, and meaning-preserving.

Primary languages: Catalan, Italian, Spanish
Surprise language: Arabic (originally to be disclosed closer to the test release)
Max submissions: up to 3 runs per language per team

Scope

Texts come from a domain-focused corpus (democratic participation) simplified by experts following easy-to-read recommendations and validation procedures.

Tip: design systems that generalize—avoid overfitting to a single dataset style.

Data & Resources

Corpus: iDEM (E2R)

Original + simplified versions aligned at sentence level
Not parallel across languages (but each language has original↔simplified pairs)
Authentic variation; multiple text types (informative, news, policy, etc.)
Format: CSV, one file per language with metadata

Training data policy

No task-specific training set is released. Teams may use existing simplification/adaptation resources (including cross-lingual augmentation).

Examples of relevant external datasets:

Simplext (ES), FEINA (ES), CLEARS @ IberLEF 2025 (ES)
SIMPITIKI (IT)
Newsela (EN), Wikipedia↔Simple Wikipedia, ASSET (EN)
TSAR (lexical simplification), MLSP (lexical complexity/pipeline)

Trial data

Languages: Catalan, Italian, Spanish, plus our surprise language: Arabic
Content: One full document per language together with its easy-language adaptation
Format: ZIP archive with 4 CSV files, one per language
Encoding: UTF-8
Columns: document_id, original_sentence_id, language, original_text, simplified_text
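As a minimal sketch of how the trial files could be read, the snippet below parses a CSV with the columns listed above and checks that all of them are present. The function name `read_pairs` and the two sample rows are invented for illustration; they are not taken from the iDEM corpus, and the actual file names inside the ZIP archive are not assumed here.

```python
import csv
import io

# Column names as listed for the trial data.
TRIAL_COLUMNS = [
    "document_id",
    "original_sentence_id",
    "language",
    "original_text",
    "simplified_text",
]

# Invented sample rows (NOT real corpus content), used only to show the layout.
SAMPLE_CSV = """document_id,original_sentence_id,language,original_text,simplified_text
doc-001,1,es,"Texto original de ejemplo.","Texto fácil de ejemplo."
doc-001,2,es,"Otra frase original.","Otra frase fácil."
"""


def read_pairs(handle):
    """Read original/simplified sentence pairs from an open CSV handle."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [col for col in TRIAL_COLUMNS if col not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


rows = read_pairs(io.StringIO(SAMPLE_CSV))
for row in rows:
    print(row["original_sentence_id"], row["original_text"], "->", row["simplified_text"])
```

For a file on disk, the same function can be called on a handle opened with `open(path, newline="", encoding="utf-8")`, which matches the UTF-8 encoding stated above.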

The iDEM corpus and annotation categories are described in: Bott, S., Riegler, V., Saggion, H., Rascón Alcaina, A., and Khallaf, N. A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes. To appear in the Language Resources and Evaluation Conference (LREC), 2026.

Test data

Languages: Catalan, Italian, Spanish, Arabic
Content: Complex excerpts only, without easy-language adaptations
Format: ZIP archive with 4 CSV files, one per language
Encoding: UTF-8
Columns: document_id, original_sentence_id, language, original_text
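The sketch below shows one way a run file might be produced from a test CSV: each row keeps its four input columns and gains a `simplified_text` column. The output layout is an assumption that simply mirrors the trial and gold columns; the required submission format is defined by the organizers, not here. `identity_simplifier` is a hypothetical placeholder that copies the input unchanged and stands in for a real system, and the sample rows are invented.

```python
import csv
import io

TEST_COLUMNS = ["document_id", "original_sentence_id", "language", "original_text"]
# Assumed output layout: the test columns plus the system's simplification.
OUTPUT_COLUMNS = TEST_COLUMNS + ["simplified_text"]

# Invented sample rows (NOT real test data).
SAMPLE_TEST_CSV = """document_id,original_sentence_id,language,original_text
doc-010,1,it,"Frase originale di esempio."
doc-010,2,it,"Un'altra frase originale."
"""


def identity_simplifier(text):
    """Placeholder system: returns the input unchanged."""
    return text


def write_run(test_handle, out_handle, simplify=identity_simplifier):
    """Copy test rows to out_handle, adding a simplified_text column."""
    reader = csv.DictReader(test_handle)
    writer = csv.DictWriter(out_handle, fieldnames=OUTPUT_COLUMNS, lineterminator="\n")
    writer.writeheader()
    count = 0
    for row in reader:
        out = {col: row[col] for col in TEST_COLUMNS}
        out["simplified_text"] = simplify(row["original_text"])
        writer.writerow(out)
        count += 1
    return count


buffer = io.StringIO()
written = write_run(io.StringIO(SAMPLE_TEST_CSV), buffer)
print(written, "rows written")
print(buffer.getvalue())
```

Keeping `document_id` and `original_sentence_id` untouched in the output makes it straightforward to match system sentences to references later.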

Gold data

Languages: Catalan, Italian, Spanish, Arabic
Content: Official gold easy-language references for the test set
Format: ZIP archive with 4 CSV files, one per language
Encoding: UTF-8
Columns: document_id, original_sentence_id, language, original_text, simplified_text
Purpose: This gold set is released after the official evaluation deadline to support score reproduction, error analysis, comparison with reference adaptations, and post-evaluation experiments for system papers.
Usage: To obtain the actual evaluation results for your system, replace the sample/trial files used by the evaluator with these official gold files and run the evaluator following the repository instructions.
Note: Results obtained after accessing the gold data must be reported as post-evaluation results and do not affect the official ranking.
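For error analysis after the gold release, system outputs can be lined up with the gold references before any scoring. The sketch below is not the official evaluator and computes no metric; it only pairs sentences, assuming that `document_id` and `original_sentence_id` together identify a sentence uniquely within one language file. The function names and sample rows are invented for illustration.

```python
def sentence_key(row):
    """Assumed unique key for a sentence within one language file."""
    return (row["document_id"], row["original_sentence_id"])


def align(system_rows, gold_rows):
    """Pair system and gold simplifications; report unmatched keys."""
    gold_by_key = {sentence_key(r): r["simplified_text"] for r in gold_rows}
    sys_by_key = {sentence_key(r): r["simplified_text"] for r in system_rows}
    pairs = [(sys_by_key[k], gold_by_key[k]) for k in gold_by_key if k in sys_by_key]
    missing = [k for k in gold_by_key if k not in sys_by_key]  # in gold, not in system
    extra = [k for k in sys_by_key if k not in gold_by_key]    # in system, not in gold
    return pairs, missing, extra


# Invented rows (NOT real gold data), shaped like csv.DictReader output.
gold = [
    {"document_id": "doc-020", "original_sentence_id": "1", "simplified_text": "Gold A"},
    {"document_id": "doc-020", "original_sentence_id": "2", "simplified_text": "Gold B"},
]
system = [
    {"document_id": "doc-020", "original_sentence_id": "1", "simplified_text": "System A"},
]

pairs, missing, extra = align(system, gold)
print("aligned pairs:", pairs)
print("missing from system:", missing)
print("not in gold:", extra)
```

A non-empty `missing` list is worth checking before running the evaluator, since unmatched sentences may otherwise distort post-evaluation results.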