Our Research Direction: Designing for Accessibility
In our initial investigations, we identified a major obstacle to achieving digital equity: the “accessibility gap.” This term refers to the lag between the launch of a new feature and the development of corresponding assistive technologies. To bridge this divide, we are transitioning from traditional reactive tools to proactive systems that are embedded within the interface itself.
Research Pillar: Using Multi-Agent Systems to Enhance Accessibility
Multimodal AI tools represent a promising avenue for creating more accessible interfaces. In prototypes focused on web readability, we’ve implemented an architecture in which a central Orchestrator plans and coordinates reading tasks.
Instead of navigating a convoluted maze of menus, users can rely on the Orchestrator to maintain a shared context. This system understands the document and enhances accessibility by assigning tasks to specialized sub-agents.
- The Summarization Agent: Distills complex documents into their essential information, presenting dense material clearly and understandably.
- The Settings Agent: Dynamically adjusts the user interface, including text scaling, to improve usability.
Our research indicates that this modular approach allows users to interact with systems more intuitively. Specialized tasks are efficiently managed by the appropriate expert, eliminating the need for users to search for the “correct” button.
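The routing pattern described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not our actual implementation: the agent functions are stubs standing in for model calls, and all names (`Context`, `Orchestrator`, `summarization_agent`, `settings_agent`) are illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Context:
    """Shared context the Orchestrator maintains across sub-agents."""
    document: str
    settings: Dict[str, float] = field(default_factory=dict)


def summarization_agent(ctx: Context, request: str) -> str:
    # Stand-in for a model call: return the first sentence as a "summary".
    return ctx.document.split(". ")[0] + "."


def settings_agent(ctx: Context, request: str) -> str:
    # Adjust UI settings, e.g. scale text up when asked for larger print.
    if "larger" in request:
        ctx.settings["text_scale"] = ctx.settings.get("text_scale", 1.0) * 1.25
    return f"text_scale set to {ctx.settings.get('text_scale', 1.0):.2f}"


class Orchestrator:
    """Routes each request to the appropriate specialized sub-agent,
    keeping one shared context so users never hunt for the right menu."""

    def __init__(self, ctx: Context):
        self.ctx = ctx
        self.agents: Dict[str, Callable[[Context, str], str]] = {
            "summarize": summarization_agent,
            "settings": settings_agent,
        }

    def handle(self, intent: str, request: str) -> str:
        agent = self.agents.get(intent)
        if agent is None:
            return "Sorry, I can't help with that yet."
        return agent(self.ctx, request)
```

The key design point is that every sub-agent receives the same `Context` object, so a summary request and a text-scaling request both operate on one shared understanding of the document rather than on isolated tool states.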
Progressing Towards Multimodal Fluency
Additionally, our research aims to advance beyond basic text-to-speech technology toward multimodal fluency. Leveraging Gemini’s capability to process voice, vision, and text concurrently, we’ve developed prototypes that convert live video into interactive audio descriptions in real time.
This effort extends beyond simply describing scenes; it enhances situational awareness. During our co-design sessions, we have noted that enabling users to ask for specific visual details as they occur can significantly reduce cognitive load, transforming a passive viewing experience into an engaging, conversational exploration.