Genalog: Improving NLP Accuracy on OCR Documents

NLP
Microsoft
conference
paper
Our DI-2021 paper and the open-sourced Genalog: an action-based model built on synthetic document images to improve NER accuracy on OCR output.
Author

synesis

Published

July 21, 2021

Lights, Camera, Action! Genalog framework. Image: LinkedIn.

See how we built an action-based model on synthetic document images to improve named entity recognition (NER) accuracy on OCR output! We are also open-sourcing Genalog to help further research on model robustness to OCR text!

Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents. Document Intelligence Workshop @ KDD 2021: https://document-intelligence.github.io/DI-2021/files/di-2021_final_22.pdf

Genalog: https://microsoft.github.io/genalog/index.html

Amit Gupte, Alexey Romanov, Sahitya M., Dalitso Banda, Jianjie Liu, Muhammad Raza Khan, Lakshmanan Ramu, Benjamin Han, Soundararajan Srinivasan.

Originally posted on LinkedIn.