Abstract: As technologies for storage, recognition, and display rapidly evolve, the vision of multi-modal interaction places human needs at the center of pervasive access to services. Users expect interfaces that are both attractive and easy to use, so that they can work efficiently and productively and focus on the task itself rather than on the tools, in keeping with the invisibility sought in pervasive computing. However, with current interaction approaches it is difficult to obtain the right service at the right time in a natural way within a multimedia environment. In this paper, we propose a framework for multi-modal interaction that offers sketch and speech interaction models without limiting the diversity and richness of the services provided in a pervasive computing environment. Furthermore, the framework is applied to a sample application, showing how it can be used to create a new, high-performance interface.