News
- August 21-26, 2023: DocEdit Competition at ICDAR 2023
- Mar 15, 2023: DocEdit is open for submission
- Mar 10, 2023: DocEdit website is launched
Billions of digital documents and PDFs are created, edited, and shared every year, and the majority of them are designed by amateur users. Professional document editing requires a certain level of expertise to perform complex edit operations. To make editing tools accessible to an increasingly novice user base, there is a need for intelligent document assistant systems that can make or suggest edits based on a user’s natural language request. Such a system should be able to understand the user’s potentially ambiguous requests and contextualize them with the visual cues and textual content found in a document image in order to edit both the text and its structured layout. Unparalleled advances in deep learning, language/foundation models, and generative AI have made it possible to utilize multimodal cues for automated document manipulation. However, several challenges toward realizing this goal remain unresolved: (1) extracting the intent, the sequence of actions, and the visual attributes of the document, (2) hierarchical document parsing for layout understanding, (3) understanding direct and indirect references to document objects, (4) inferring local and global relations between embedded text and visual objects through multimodal (text+visual+layout) signals, and (5) grounding the requested edits to localized objects and scene text. To address these challenges, it is critical that we develop more reliable and accurate AI techniques for automated document editing that can handle the inherent complexity of the task. To this end, we propose the first "ICDAR 2023 Competition on Language-Guided Document Editing (DocEdit)". The main goal of this competition is to bring together researchers from domains such as natural language processing, computer vision, machine learning, human-computer interaction, data mining, graphics, and multimedia to explore artificial intelligence solutions for language-guided document editing.
This competition will be of interest to researchers working in natural language processing, computer vision, multimodal deep learning, document intelligence, signal processing, artificial intelligence, information extraction and retrieval, and data mining, and in particular to those interested in applications of AI in document understanding. ICDAR 2023 will be a sought-after destination for researchers from the document analysis and representation, computer vision, and language communities. With the rise of Transformer- and Stable Diffusion-based techniques in AI, the competition will be of high interest to researchers in generative AI and vision+language modeling. The competition aims to attract young university researchers and early-career AI practitioners as well as industry veterans in the document intelligence space.
Participants will need to evaluate their approach on the test set and will be expected to submit the following:
1. ID-wise predictions on the test set
2. A short and complete system description of the proposed approach
The organizers will tabulate the results for each task and present them at ICDAR 2023 in San Jose, California, USA.
Note that you do not need to attend ICDAR 2023 to participate in this competition.
Command Generation | Training | Test |
All deadlines are at the end of the day, Anywhere on Earth (AoE, UTC-12).
Contact us: docedit.icdar2023@gmail.com or puneetm@umd.edu