In the near future, an AI assistant (similar to the XR4ED project) will make itself at home inside your ears, whispering guidance as you go about your daily routine. It will be an active participant in all aspects of your life, providing useful information as you browse the aisles of crowded stores, take your kids to see the pediatrician, or even grab a quick snack from a cupboard in the privacy of your own home. It will mediate all your experiences, including your social interactions with friends, relatives, coworkers and strangers. On the positive side, these assistants will provide valuable information everywhere you go, precisely coordinated with whatever you’re doing, saying or looking at. The guidance will be delivered so smoothly and naturally that it will feel like a superpower: a voice in your head that knows everything, from the specifications of products in a store window, to the names of plants you pass on a hike, to the best dish you can make with the scattered ingredients in your refrigerator. On the negative side, this ever-present voice could be highly persuasive, even manipulative, as it assists you through your daily activities, especially if corporations use these trusted assistants to deploy targeted conversational advertising.
The risk of AI manipulation can be mitigated, but it requires policymakers to focus on this critical issue, which thus far has been largely ignored. Of course, regulators have not had much time: the technology that makes context-aware assistants viable for mainstream use has been available for less than a year. That technology is the multi-modal large language model, a new class of LLM that can accept not just text prompts but also images, audio and video as input. This is a major advancement, for multi-modal models have suddenly given AI systems their own eyes and ears, and they will use these sensory organs to assess the world around us as they give guidance in real time. The first mainstream multi-modal model was OpenAI’s GPT-4, released in March 2023. The most recent major entry into this space is Google’s Gemini LLM, announced just a few weeks ago. The most interesting entry is Meta’s multi-modal LLM, AnyMAL, which also takes in motion cues. This model goes beyond eyes and ears, adding a vestibular sense of movement. It could be used to create an AI assistant that doesn’t just see and hear everything you experience; it also considers your physical state of motion.
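To make the multi-modal idea concrete, here is a minimal sketch (not from the article) of how an application might send a text question together with an image to a vision-capable LLM using OpenAI’s Python client. The model name, image URL and question are illustrative placeholders, and other vendors’ multi-modal APIs follow a broadly similar pattern.

```python
# Minimal sketch: a single request that combines text and an image,
# assuming the OpenAI Python SDK (v1.x) and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4-vision-preview",  # assumed: any vision-capable chat model
    messages=[
        {
            "role": "user",
            "content": [
                # The text part of the prompt
                {"type": "text",
                 "text": "What plant is this, and is it safe to touch?"},
                # The image part of the prompt (a photo the assistant "sees")
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/trail-photo.jpg"}},
            ],
        }
    ],
)

print(response.choices[0].message.content)
```

In a wearable assistant, the image would come from a camera feed and the text from speech recognition, but the request structure, one prompt mixing several modalities, is the same.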
More information:
https://venturebeat.com/ai/2024-will-be-the-year-of-augmented-mentality/