Back in February, when Meta CEO Mark Zuckerberg announced that the company was working on a range of new AI initiatives, one of the projects he noted was Meta’s development of new experiences with text, images, video, and ‘multi-modal’ elements.

Recently, Meta finally added context to what ‘multi-modal’ means, outlining how this kind of AI could work with the launch of ImageBind, a model that enables AI systems to better understand multiple inputs and deliver more accurate and responsive recommendations.

Multi-Modal

As per Meta:

“When humans absorb information from the world, we innately use multiple senses, such as seeing a busy street and hearing the sounds of car engines. Today, we’re introducing an approach that brings machines one step closer to humans’ ability to learn simultaneously, holistically, and directly from many different forms of information – without the need for explicit supervision. ImageBind is the first AI model capable of binding information from six modalities.”

Meta’s new ImageBind model essentially allows the system to learn associations not only between text, images, and video, but also audio, depth (via 3D sensors), thermal readings, and movement data (via inertial measurement sensors). Combined, these elements can provide more accurate spatial cues, enabling the system to produce more accurate representations and associations, and taking AI experiences a step closer to emulating human responses.
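The key idea is that every modality is projected into a single shared embedding space, so that, say, the sound of a dog barking and a photo of a dog end up as nearby vectors. Meta has open-sourced the model (github.com/facebookresearch/ImageBind), and for the technically curious, a minimal sketch of cross-modal comparison, based on the repository’s published example usage, looks something like the below – the file paths are placeholders, and the exact import layout may differ between versions:

```python
import torch
from imagebind import data
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda:0" if torch.cuda.is_available() else "cpu"

# Load the pretrained ImageBind model
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

# Placeholder inputs: one concept ("a dog") expressed in three modalities
inputs = {
    ModalityType.TEXT: data.load_and_transform_text(["a dog"], device),
    ModalityType.VISION: data.load_and_transform_vision_data(["dog_image.jpg"], device),
    ModalityType.AUDIO: data.load_and_transform_audio_data(["dog_bark.wav"], device),
}

with torch.no_grad():
    embeddings = model(inputs)  # one embedding per modality, all in a shared space

# Because the modalities share one embedding space, a simple dot product
# measures how strongly, e.g., the audio clip matches the text prompt.
print("Text x Audio:", embeddings[ModalityType.TEXT] @ embeddings[ModalityType.AUDIO].T)
print("Vision x Audio:", embeddings[ModalityType.VISION] @ embeddings[ModalityType.AUDIO].T)
```

The notable design choice, per Meta’s paper, is that images act as the ‘binding’ modality: each other sense is trained against image pairings, which is what lets audio and text, for example, align with each other without ever being paired directly in the training data.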

The potential use cases are significant, and if Meta’s systems can establish more accurate alignment between these variable inputs, that could advance the current state of AI tools, which are primarily text- and image-based, to an entirely new realm of interactivity. That, in turn, could facilitate the creation of more accurate VR worlds, a key element in Meta’s advance towards the Metaverse. For example, through Horizon Worlds, people can create their own VR spaces, but the technical limitations, at least right now, mean that most Horizon experiences are still rather basic – like ’80s video game basic.

The Wrap

If Meta can provide more tools that allow users to create whatever they want in VR, simply by speaking it into existence, that could open up a whole new realm of possibility, and could quickly make its VR experience more attractive and engaging. Things aren’t quite there yet, but Meta hopes that advances like these will bring it a step closer to achieving its Metaverse vision. Meta also notes that ImageBind could be used in more immediate ways to advance in-app processes, and these early applications suggest it could end up being one of Meta’s more significant advances in AI development.

Sources

https://bit.ly/3LKUHBu