MAMI: Multimodal Automatic Mobile Indexing


We present MAMI (Multimodal Automatic Mobile Indexing), a mobile-phone prototype that allows users to annotate and search their digital photos on a camera phone via speech input. MAMI is implemented as a mobile application that runs in real time on the phone. Users can add speech annotations at the time of capture or at a later time. Additional metadata is also stored with each photo, such as location, user identification, date and time of capture, and image-based features. Users can then search their personal repository by speaking a query. Because MAMI does not require connectivity to a server, it cannot rely on full-fledged speech recognition; instead, we propose a Dynamic Time Warping (DTW)-based metric that measures the distance between the speech input and all existing speech annotations. We present preliminary results with the MAMI prototype and outline future directions of research, including the integration of the additional metadata into the search.
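To make the matching step concrete, the sketch below shows a minimal, textbook DTW distance between two sequences of per-frame feature vectors (e.g. MFCCs), and how a spoken query could be ranked against stored annotations. This is an illustrative assumption, not the authors' implementation; the feature extraction, any path constraints, and the `rank_annotations` helper are hypothetical.

```python
import math

def dtw_distance(query, reference):
    """Return the DTW alignment cost between two feature sequences.

    Each sequence is a list of frames; each frame is a list of floats
    (e.g. one MFCC vector per audio frame). Illustrative only.
    """
    n, m = len(query), len(reference)
    # cost[i][j]: minimum cumulative distance aligning query[:i] with reference[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(query[i - 1], reference[j - 1])  # local frame distance
            cost[i][j] = d + min(cost[i - 1][j],      # insertion
                                 cost[i][j - 1],      # deletion
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

def rank_annotations(query, annotations):
    """Rank stored annotations (label -> feature sequence) by DTW distance.

    Hypothetical helper: the closest annotations come first, so the photos
    they are attached to would be returned as the top search results.
    """
    return sorted(annotations, key=lambda label: dtw_distance(query, annotations[label]))
```

Because DTW compares acoustic sequences directly, the query only needs to sound like a stored annotation; no language model or vocabulary is required, which is what makes the approach feasible entirely on the phone.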

Figure 1. MAMI in capture-and-annotation mode (left) and in search mode (right)

Related Papers

Xavier Anguera, JieJun Xu and Nuria Oliver, "Multimodal Photo Annotation and Retrieval on a Mobile Phone", Proceedings of the Int. Conf. on Multimedia Information Retrieval (MIR'08), Oct 2008.

Xavier Anguera and Nuria Oliver, "MAMI: Multimodal Annotations on a Camera Phone", Proceedings of the Intl. Conf. on Mobile HCI, Amsterdam, Sept 2008.

Xavier Anguera, Nuria Oliver and Mauro Cherubini, "Mobile and Multimodal Personal Image Retrieval: A User Study", Proceedings of the Intl. Workshop on Mobile Information Retrieval (MobIR'08) at SIGIR'08, Singapore, July 2008.