ICDAR 2023 Competition on Language Guided Document Editing (DocEdit)

About the competition

Billions of digital documents and PDFs are created, edited, or shared each year, a majority of them designed by amateur users. Professional document editing requires a certain level of expertise to perform complex edit operations. To make editing tools accessible to increasingly novice users, there is a need for intelligent document assistant systems that can make or suggest edits based on a user’s natural language request. Such a system should be able to understand the user’s ambiguous requests and contextualize them to the visual cues and textual content found in a document image in order to edit the text and its structured layout. Advances in deep learning, language/foundation models, and generative AI have made it possible to utilize multimodal cues for automated document manipulation. However, several challenges remain unresolved: (1) extracting intent, sequence of actions, and visual attributes of the document, (2) hierarchical document parsing for layout understanding, (3) understanding direct and indirect references to document objects, (4) inferring local and global relations between embedded text and visual objects through multimodal (text+visual+layout) signals, and (5) grounding the requested edits to localized objects and scene text. To address these challenges, it is critical to develop more reliable and accurate AI techniques for automated document editing that can handle the inherent complexity of the task. To this end, we propose the first "ICDAR 2023 Competition on Language-Guided Document Editing (DocEdit)". The main goal of this competition is to bring together researchers from domains such as natural language processing, computer vision, machine learning, human-computer interaction, data mining, graphics, and multimedia to explore artificial intelligence solutions for language-guided document editing.

This competition will be of interest to researchers working in natural language processing, computer vision, multimodal deep learning, document intelligence, signal processing, artificial intelligence, information extraction and retrieval, and data mining, and in particular to those interested in the applications of AI in document understanding. ICDAR 2023 will be a sought-after destination for researchers from the document analysis and representation, computer vision, and language communities. With the rise of Transformer-based and Stable Diffusion-based techniques in AI, the competition will be of very high interest to researchers in generative AI and vision+language modeling. The competition aims to attract young researchers from universities, early-career AI practitioners, and industry veterans in the document intelligence space.

Competition Outline

Dataset

The DocEdit dataset provides pairs of document images and user edit requests along with the ground truth edit command. Each edit request is mapped to an executable command that can be simulated in real-world document editing software. The dataset contains around 15K scanned PDF documents, with edits performed on publicly available PDF documents. It has a diverse mix of edit operations (add, delete, modify, split, merge, replace, move, copy) and reference types (direct, object referring, text referring) from the users.

Evaluation Rules

Participants will need to evaluate their approach on the test set, and will be expected to submit the following:
1. ID-wise predictions on the test set
2. A short and complete system description of the proposed approach
The organizers will tabulate the results for each task and present them at ICDAR 2023 in San Jose, California, USA.
Note that you do not need to attend ICDAR 2023 to participate in this competition.

Tasks

Edit Command Generation: Input - Document image and the user text request; Output - ACTION (< Component >, < Attribute >, < Initial_State >, < Final_State >)
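
As an illustration of the output format above, the following Python sketch models a command as a small record and converts it to and from the string form ACTION(<Component>, <Attribute>, <Initial_State>, <Final_State>). The class name, the helper names, and the example slot values are hypothetical and are not taken from the dataset; the sketch also assumes that slot values contain no commas, which may not hold for real data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EditCommand:
    """Hypothetical container for one edit command (not an official schema)."""
    action: str
    component: str
    attribute: str
    initial_state: str
    final_state: str

    def to_string(self) -> str:
        # Render as ACTION(component, attribute, initial_state, final_state).
        args = ", ".join(
            [self.component, self.attribute, self.initial_state, self.final_state]
        )
        return f"{self.action.upper()}({args})"


def parse_command(text: str) -> EditCommand:
    """Parse 'ACTION(a, b, c, d)' back into an EditCommand.

    Assumption: no slot value contains a comma or a closing parenthesis.
    """
    action, sep, rest = text.strip().partition("(")
    if not sep or not rest.endswith(")"):
        raise ValueError(f"not a command string: {text!r}")
    parts = [p.strip() for p in rest[:-1].split(",")]
    if len(parts) != 4:
        raise ValueError(f"expected 4 arguments, got {len(parts)}")
    return EditCommand(action.strip().lower(), *parts)


# Hypothetical example values, for illustration only.
example = EditCommand("modify", "text", "font_size", "12", "14")
print(example.to_string())
```

Keeping the command as structured fields until the final formatting step makes it easier to validate each slot separately before writing the prediction file.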

Evaluation Metrics

Edit Command Generation: Exact match accuracy (EM %) and ROUGE-L score.
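
For local validation, the two metrics can be approximated as in the Python sketch below. This is not the organizers' scoring script: it assumes whitespace tokenization, comparison after stripping surrounding whitespace, and the balanced F-measure variant of ROUGE-L, any of which may differ from the official evaluation. All function names are hypothetical.

```python
from typing import List, Sequence


def exact_match(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Percentage of predictions identical to their reference string."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not refs:
        return 0.0
    hits = sum(p.strip() == r.strip() for p, r in zip(preds, refs))
    return 100.0 * hits / len(refs)


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, using row-wise DP."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(pred: str, ref: str) -> float:
    """ROUGE-L as the F1 of LCS-based precision and recall (assumed variant)."""
    p_tokens, r_tokens = pred.split(), ref.split()
    lcs = lcs_length(p_tokens, r_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(p_tokens)
    recall = lcs / len(r_tokens)
    return 2 * precision * recall / (precision + recall)


def mean_rouge_l(preds: Sequence[str], refs: Sequence[str]) -> float:
    """Average ROUGE-L F1 over aligned prediction/reference pairs."""
    if len(preds) != len(refs):
        raise ValueError("preds and refs must have the same length")
    if not refs:
        return 0.0
    return sum(rouge_l_f1(p, r) for p, r in zip(preds, refs)) / len(refs)
```

Exact match rewards only fully correct commands, while ROUGE-L gives partial credit when most slots appear in the right order, so reporting both gives a fuller picture of system quality.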

Competition Rules

The competition is open for participants from both industry and academia as long as they comply with the dataset license which can be found along with the dataset. The competition is open for both students and professionals who want to make a contribution to the field of Document AI.
Each participating team can include up to 10 people from one or more affiliations. For the sake of fairness to smaller research groups, we will not allow bigger research groups to participate as a single team.
Winning teams will receive certificates listing the names of their members in the exact format and order in which they were registered.
Participants can participate in one or more tasks.
The total scores for the task will be used to obtain a single final score used to pick the overall winning team.
Participants will be allowed to use their own datasets for training. However, they must notify the organizers about this.
Submissions will be made via this form. Participants will be required to submit the results of their systems in the expected CSV format. We might not consider submissions that do not conform to the correct file format. You may email docedit.icdar2023@gmail.com with queries.
Winners will be required to submit their code to verify the output.
Participants must provide a description of the methods used to produce the results submitted. In the final competition paper, we will summarize these descriptions when we describe the submitted systems. If external datasets were used, participants will also have to provide a full description of these. We reserve the right to disqualify submissions which do not provide a sufficiently detailed description of their system.
To be fair to all participants, any deadline extensions given will apply to all participants, not just to individual research groups who might request them.

Data Download

Command Generation

Training

Test
