"In the English language there's something like 180,000 words, and we only use 3,000 to 5,000 of them. If you're trying to do voice recognition, there's a really small set of things you actually need to be able to recognize. Think about how many objects there are in the world, distinct objects, billions, and they all come in different shapes and sizes. So the problem of search in vision is just vastly larger than what we've seen with text or even with voice."
Clay Bavor, Google