DI-2021 @ KDD 2021: Cha Zhang Talk Recording

NLP

conference

Recording of Cha Zhang’s talk: Visual Document Intelligence in the Wild.

Author

synesis

Published

August 17, 2021

Recording of invited talk 1 of 6 at the Document Intelligence Workshop @ KDD 2021, given by Cha Zhang, IEEE Fellow and Partner Engineering Manager at Microsoft Azure AI.

Title: Visual Document Intelligence in the Wild

Recording: https://youtu.be/dw1RXnl7rbI

Abstract: Recent progress in AI has brought Optical Character Recognition (OCR) and document understanding to a whole new level. In this talk, we will first provide an overview of Microsoft’s latest OCR engine (aka OneOCR), which applies the latest deep learning techniques to recognize mixed printed and handwritten text in over 100 languages, with text lines along arbitrary orientations (even flipped), and with varying degrees of quality and distortion. OneOCR achieves industry-leading accuracy on a wide range of application scenarios such as document, invoice, receipt, business card, slide, menu, book cover, poster, GIF/MEME, street view, product label, handwritten note and whiteboard. We then introduce another breakthrough technology developed at Microsoft for document understanding: LayoutLM. LayoutLM bridges computer vision and language, producing state-of-the-art results on a number of tasks, including document segmentation, classification, TextVQA, and others. Combining OneOCR and LayoutLM, we created the Form Recognizer API in Azure AI, which extracts text, key-value pairs, tables, and structures from documents in the wild. I will demonstrate some of the capabilities of Form Recognizer, highlight its core component technologies, and explain the roadmap ahead.

Program committee (alphabetical): Doug Burdick, Dave Lewis, Yijuan Lu, Hamid Motahari, Sandeep Tata

Chair: Benjamin Han

Originally posted on LinkedIn.