Genalog: Improving NLP Accuracy on OCR Documents
NLP
Microsoft
conference
paper
Our DI-2021 paper and the open-source Genalog: an action-based model built on synthetic document images to improve NER accuracy on OCR output.

See how we built an action-based model on synthetic document images to improve named entity recognition (NER) accuracy on OCR output. We are also open-sourcing Genalog to support further research on model robustness on OCR text.
Paper: "Lights, Camera, Action! A Framework to Improve NLP Accuracy over OCR documents" (Document Intelligence Workshop @ KDD 2021): https://document-intelligence.github.io/DI-2021/files/di-2021_final_22.pdf
Genalog: https://microsoft.github.io/genalog/index.html
Amit Gupte, Alexey Romanov, Sahitya M., Dalitso Banda, Jianjie Liu, Muhammad Raza Khan, Lakshmanan Ramu, Benjamin Han, Soundararajan Srinivasan.
Originally posted on LinkedIn.