First Shared Task on Multilingual Easy-to-Read Translation
Advance methods for producing easy-to-read versions of texts, with a focus on Catalan, Italian, and Spanish (plus a surprise language).
Overview
Why Easy-to-Read?
Accessible language supports participation for people with language comprehension difficulties (e.g., intellectual disabilities, low literacy), aligning with accessibility and inclusion goals.
Relevant standards and guidelines:
- Language versions of easy-to-read standards
- ISO 24495-1:2023 (Plain language, Part 1: Governing principles and guidelines)
- ISO 24495-2:2025 (Plain language, Part 2: Legal communication)
- Guía de Lectura Fácil España
What's new here
MER-TRANS is the first shared task on multilingual easy-to-read translation. It targets three Romance languages: Catalan, Italian, and Spanish.
Task
Objective
Automatically produce easy-to-read versions of texts or sentences. Inputs are complex excerpts; outputs should be simplified, readable, and meaning-preserving.
- Primary languages: Catalan, Italian, Spanish
- Surprise task language: disclosed closer to test release
- Max submissions: up to 3 runs per language per team
Scope
Texts come from a domain-focused corpus (democratic participation) simplified by experts following easy-to-read recommendations and validation procedures.
Tip: design systems that generalize—avoid overfitting to a single dataset style.
Data & Resources
Corpus: iDEM (E2R)
- Original + simplified versions aligned at sentence level
- Not parallel across languages (but each language has original↔simplified pairs)
- Authentic variation; multiple text types (informative, news, policy, etc.)
- Format: CSV, one file per language with metadata
Training data policy
No task-specific training set is released. Teams may use existing simplification/adaptation resources (including cross-lingual augmentation).
Examples of relevant external datasets:
- Simplext (ES), FEINA (ES), CLEARS @ IberLEF 2025 (ES)
- SIMPITIKI (IT)
- Newsela (EN), Wikipedia↔Simple Wikipedia, ASSET (EN)
- TSAR (lexical simplification), MLSP (lexical complexity/pipeline)
Trial data
- Languages: Catalan, Italian, Spanish, plus our surprise language: Arabic
- Content: One full document per language together with its easy-language adaptation
- Format: ZIP archive with 3 CSV files
- Encoding: UTF-8
- Columns:
document_id,original_sentence_id,language,original_text,simplified_text
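The column layout above can be checked before any modelling work starts. The following is a minimal sketch, assuming each CSV carries a header row with exactly these column names; the helper name `read_trial_rows` and the sample row are invented for illustration and are not taken from the corpus.

```python
import csv
import io

# Column names as listed for the trial data.
TRIAL_COLUMNS = [
    "document_id",
    "original_sentence_id",
    "language",
    "original_text",
    "simplified_text",
]


def read_trial_rows(handle):
    """Read trial rows from an open text handle and check the header."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != TRIAL_COLUMNS:
        raise ValueError(f"unexpected columns: {reader.fieldnames}")
    return list(reader)


# Invented sample content, for illustration only (not corpus data).
sample = io.StringIO(
    "document_id,original_sentence_id,language,original_text,simplified_text\n"
    'doc1,1,es,"Texto original, con una coma.",Texto fácil.\n'
)
rows = read_trial_rows(sample)
print(rows[0]["language"], "|", rows[0]["simplified_text"])
```

For a file on disk, the same function can be called on `open(path, newline="", encoding="utf-8")`, which matches the stated UTF-8 encoding and lets the `csv` module handle quoted commas and line breaks inside text fields.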
The iDEM corpus and annotation categories are described in: A Multilingual Human Annotated Corpus of Original and Easy-to-Read Texts to Support Access to Democratic Participatory Processes. Bott, S., Riegler, V., Saggion, H., Rascón Alcaina, A., and Khallaf, N. To appear in the Language Resources and Evaluation Conference (LREC), 2026.
Test data
- Languages: Catalan, Italian, Spanish, Arabic
- Content: Complex excerpts only, without easy-language adaptations
- Format: ZIP archive with 3 CSV files
- Encoding: UTF-8
- Columns:
document_id,original_sentence_id,language,original_text
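Since the test files contain only the complex text, a system has to generate the simplified side itself. The sketch below shows one way to do that, under a loud assumption: this page does not specify the submission format, so the output here simply mirrors the trial layout by appending a `simplified_text` column. The `simplify` function is a hypothetical placeholder that returns its input unchanged, and the sample row is invented.

```python
import csv
import io

TEST_COLUMNS = ["document_id", "original_sentence_id", "language", "original_text"]
# Assumption: the output mirrors the trial layout by adding simplified_text.
OUTPUT_COLUMNS = TEST_COLUMNS + ["simplified_text"]


def simplify(text, language):
    """Hypothetical placeholder system: returns the input unchanged."""
    return text


def write_predictions(test_handle, out_handle):
    """Copy each test row, append the system output, return the row count."""
    reader = csv.DictReader(test_handle)
    writer = csv.DictWriter(out_handle, fieldnames=OUTPUT_COLUMNS)
    writer.writeheader()
    count = 0
    for row in reader:
        row["simplified_text"] = simplify(row["original_text"], row["language"])
        writer.writerow(row)
        count += 1
    return count


# Invented sample content, for illustration only (not corpus data).
test_csv = io.StringIO(
    "document_id,original_sentence_id,language,original_text\n"
    "doc1,1,it,Frase complessa di esempio.\n"
)
output = io.StringIO()
n_rows = write_predictions(test_csv, output)
print(output.getvalue())
```

Keeping `document_id` and `original_sentence_id` untouched in the output makes it straightforward to realign predictions with references later, whatever the final submission format turns out to be.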
Evaluation
Surface similarity
BLEU — compares system output to reference simplifications.
Simplification-focused
SARI — measures add/keep/delete operations vs input and references.
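To make the add/keep/delete idea concrete, here is a deliberately simplified sketch. It works on unigram sets with a single reference, whereas SARI itself uses n-grams up to length 4 and can take several references, so the numbers it prints are not comparable to the official evaluator. The function name `sari_like` and the example sentences are invented for illustration.

```python
def _f1(precision, recall):
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def _ratio(part, whole):
    """Share of `whole` covered by `part`; 0.0 for an empty `whole`."""
    return len(part) / len(whole) if whole else 0.0


def sari_like(source, output, reference):
    """Unigram, single-reference illustration of SARI's three operations."""
    src = set(source.lower().split())
    out = set(output.lower().split())
    ref = set(reference.lower().split())

    # ADD: words the system introduced, judged against words the reference introduced.
    sys_add, ref_add = out - src, ref - src
    good_add = sys_add & ref_add
    add = _f1(_ratio(good_add, sys_add), _ratio(good_add, ref_add))

    # KEEP: source words the system retained, judged against those the reference retained.
    sys_keep, ref_keep = out & src, ref & src
    good_keep = sys_keep & ref_keep
    keep = _f1(_ratio(good_keep, sys_keep), _ratio(good_keep, ref_keep))

    # DELETE: source words the system dropped; SARI scores this by precision only.
    sys_del, ref_del = src - out, src - ref
    delete = _ratio(sys_del & ref_del, sys_del)

    return {"add": add, "keep": keep, "delete": delete,
            "score": (add + keep + delete) / 3}


# Invented example sentences, for illustration only.
scores = sari_like(
    source="the committee will deliberate on the proposal",
    output="the committee will discuss the proposal",
    reference="the committee will talk about the proposal",
)
print(scores)
```

In this example the system keeps and deletes the same words as the reference but adds a different replacement word, so the keep and delete components are perfect while the add component is zero. This is the kind of behaviour that plain surface overlap with the reference does not separate out.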
Semantic similarity
BERTScore and MeaningBERT — meaning preservation signals.
Readability / Complexity
Readability metrics and complexity classifiers may complement the core metrics to assess accessibility.
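The page does not name specific readability metrics, so the following is only an illustrative sketch of the kind of surface signal such metrics build on: average sentence length and average word length. The function name `surface_stats`, the naive punctuation-based sentence splitting, and both example texts are assumptions made for this sketch.

```python
import re


def surface_stats(text):
    """Rough surface statistics; sentence splitting on . ! ? is naive."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"\w+", text)  # \w covers accented letters in Python 3
    return {
        "sentences": len(sentences),
        "words": len(words),
        "words_per_sentence": len(words) / len(sentences) if sentences else 0.0,
        "chars_per_word": sum(len(w) for w in words) / len(words) if words else 0.0,
    }


# Invented example texts, for illustration only (not corpus data).
complex_text = (
    "La participación ciudadana constituye un mecanismo fundamental "
    "de legitimación democrática."
)
easy_text = "Las personas pueden participar. Así la democracia es más fuerte."

print(surface_stats(complex_text))
print(surface_stats(easy_text))
```

Statistics like these are language-dependent and easy to game, which is presumably why they are described as a complement to the core metrics rather than a replacement.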
Evaluator
The official evaluator for this shared task is available in our Git repository.
Official Results
Evaluation results
Results are shown per team, language, and submitted method.
| Team | Language | Method |
|---|---|---|
Higher scores are better for BLEU, SARI, and BERTScore.
Schedule
| Milestone | Date |
|---|---|
Note: All dates are tentative and may be updated. Please check this page regularly for the latest schedule.
Participation
- Registration window: Feb 16–Mar 5, 2026
- Submissions: up to 3 per language per team
Papers
- Paper due: Jun 1, 2026
- Acceptance: Jun 19, 2026
- Camera-Ready: Jun 27, 2026
Organization team
- Horacio Saggion, Universitat Pompeu Fabra, Spain
- Nelson Perez Rojas, Universidad de Costa Rica, Central America
- Stefan Bott, Universitat Pompeu Fabra, Spain
- Nouran Khallaf, University of Leeds, England
- Mehrzad Tareh, Universitat Pompeu Fabra, Spain
- Daniel Adanza, Universitat Pompeu Fabra, Spain
- Almudena Rascon, Plena Inclusion Madrid, Spain
- Sandra Szasz, Universitat Pompeu Fabra, Spain
Contact
Primary contact
Horacio Saggion
Universitat Pompeu Fabra (UPF)
Email:
Ethics
The dataset was created within the iDEM project under strict ethics protocols and in compliance with European data protection requirements.