It’s not unusual to see companies tout their AI research innovations, but it’s far rarer for those discoveries to be quickly deployed to shipping products. Xuedong Huang, CTO of Azure AI cognitive services, pushed to integrate it into Azure quickly because of the potential benefits for users. His team trained the model with images tagged with specific keywords, which helped give it a visual language most AI frameworks don’t have. Typically, these sorts of models are trained with images and full captions, which makes it more difficult for the models to learn how specific objects interact.
“This visual vocabulary pre-training essentially is the education needed to train the system; we are trying to educate this motor memory,” Huang said in a blog post. That’s what gives this new model a leg up in the nocaps benchmark, which is focused on determining how well AI can caption images they have never seen before.
But while beating a benchmark is significant, the real test for Microsoft’s new model will be how it functions in the real world. According to Boyd, Seeing AI developer Saqib Shaik, who also pushes for greater accessibility at Microsoft as a blind person himself, describes it as a dramatic improvement over their previous offering. And now that Microsoft has set a new milestone, it’ll be interesting to see how competing models from Google and other researchers also compete.