In collaboration with the Computer Vision Centre (CVC), Barcelona, researchers at the Centre for Visual Information Technology (CVIT), IIIT-H, are trying to make document image analysis more fluent and interactive.
They said that spammers typically embed text inside images to circumvent spam-blocking software, as there is still no easy way for computers to read and recognise information presented as an image. “As a community, we are now looking for a superior understanding beyond just recognition,” said CV Jawahar, head of CVIT. He said that they are creating a system that can initiate a dialogue with different forms of written text, such as a document, a book, an annual report, or a comic strip. “Known as Document Visual Question Answering (DocVQA), research here is centered around guiding machines to understand human requests and respond appropriately to them, eventually in real-time,” said Jawahar.
The researchers said that the questions could range from very simple ones, such as asking the system to identify what is in the image (for instance, whether it is a person, an animal, or a food item), to more complex ones, such as identifying a person (if it is a celebrity, the system is asked to say who the celebrity is) and describing what is happening in the image.