This repo contains 400 government forms, a subset of the test set used to evaluate form structure extraction models developed by the MDSR Research Labs at Adobe:
- 'Document Structure Extraction using Prior based High Resolution Hierarchical Semantic Segmentation'
  Accepted and presented at ECCV 2020 (Paper link)
- 'Multi-Modal Association based Grouping for Form Structure Extraction'
  Published in WACV 2020 proceedings (Paper link)
To facilitate further research in this area, we are making 400 forms available: 300 from the ECCV publication and 100 from the WACV publication. Further details can be found in their respective directories.
If you use one of these data directories, please cite the corresponding paper. If you use both, please cite both papers, as follows:
@inproceedings{sarkar2020document,
  title={Document Structure Extraction Using Prior Based High Resolution Hierarchical Semantic Segmentation},
  author={Sarkar, Mausoom and Aggarwal, Milan and Jain, Arneh and Gupta, Hiresh and Krishnamurthy, Balaji},
  booktitle={European Conference on Computer Vision},
  pages={649--666},
  year={2020},
  organization={Springer}
}

@inproceedings{aggarwal2020multi,
  title={Multi-Modal Association based Grouping for Form Structure Extraction},
  author={Aggarwal, Milan and Sarkar, Mausoom and Gupta, Hiresh and Krishnamurthy, Balaji},
  booktitle={The IEEE Winter Conference on Applications of Computer Vision},
  pages={2075--2084},
  year={2020}
}
This dataset is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.