EXPERIMENTAL EVALUATION OF AUDITORY DISPLAY AND SONIFICATION OF TEXTURED IMAGES

Antonio Cesar Germano Martins (1), Rangaraj Mandayam Rangayyan (2)

(1) Laboratório de Sistemas Integráveis (LSI)
Escola Politécnica da Universidade de São Paulo
Av. Prof. Luciano Gualberto, 158, Trav. 3, 05508-900 - SÃO PAULO - SP - BRASIL.
Email: amartins@lsi.usp.br

(2) Department of Electrical and Computer Engineering
The University of Calgary

ABSTRACT

In order to verify the potential of the proposed methods for auditory display and sonification in the aural analysis of textured images, a set of experiments was designed and presented to 10 subjects. The results obtained and the limitations of the methods are discussed.



1- INTRODUCTION

It has been shown that the auditory system is very useful for task monitoring and for the analysis of multidimensional data [1]. However, the use of sound in scientific data analysis is rare, and the analysis and presentation of data are done almost exclusively by visual means. Even when the data are the result of sound, as in ultrasound or sonar examinations, they are first mapped to an image and analyzed visually. Sonification of data can be represented in general as in Figure 1: data characteristics are mapped to sound attributes for presentation and analysis. This mapping is a critical step, and appropriate design of the mapping function makes the difference between success and failure of a sonification procedure.

Figure 1 - Sonification process.
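As a concrete, purely hypothetical illustration of the mapping stage in Figure 1, the sketch below maps a one-dimensional data series to the frequency of a sinusoidal oscillator, one short tone per data value. The frequency range, tone duration, and sampling rate are our own assumptions for illustration, not parameters used in this work.

```python
import numpy as np

def sonify(data, fs=8000, tone_dur=0.25, f_lo=200.0, f_hi=2000.0):
    """Map each data value to a tone whose pitch rises with the value."""
    data = np.asarray(data, dtype=float)
    span = data.max() - data.min()
    # Normalize to [0, 1] so values can index the frequency range.
    norm = (data - data.min()) / span if span > 0 else np.zeros_like(data)
    t = np.arange(int(fs * tone_dur)) / fs
    tones = [np.sin(2 * np.pi * (f_lo + v * (f_hi - f_lo)) * t) for v in norm]
    return np.concatenate(tones)  # one tone per data value, in sequence

signal = sonify([0.1, 0.5, 0.9, 0.3])  # rising, then falling, pitch contour
```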

Many processes related to our everyday experiences can be used to map data to sound, such as metaphorical and affective association [2]. Common metaphorical associations include high frequency for an increase in a data characteristic or a rapidly changing parameter of the data, and, conversely, low frequency for a decrease or a slow rate of change. Affective association tries to associate the feelings evoked by a sound with the data controlling the sound. As an example, in a project on the identification of features in magnetic resonance images of the brain, an initial suggestion from our collaborators in the field of radiology was to map pleasant sounds to healthy tissue and unpleasant sounds to diseased areas. A few mappings from data to sound have been established in the literature [1]. Walker and Kramer [3] argued that a particular mapping should be chosen based on performance in a task, rather than merely on the designers' feeling that the choice is intuitive; their preliminary results show that what is intuitive for some may not be intuitive for others.

We are particularly interested in the sonification of textured images [4]. Among the many problems involved in this task, the first that comes to mind is that sound is essentially a change of pressure with time, i.e., something that evolves in time, whereas an image is a static object. How can something static be mapped into something that is time-varying? One way of addressing this problem is to choose a sound signal, for example a sinusoidal oscillator, and associate pixel characteristics of the image with some of the parameters of the oscillator. Meijer [5] proposed a sonification method using this approach, with the amplitude of a sinusoidal oscillator being proportional to the gray level of a pixel and the frequency depending on the position of the pixel. The image is scanned one column at a time, and the associated oscillator outputs are presented as a sum, followed by a click before the presentation of the next column. The sound signal contains all of the information in the image, but its analysis can be very difficult. The mapping seems to be "natural", but it is arbitrary and needs to be tested with a large group of subjects.

We have developed methods for auditory display and sonification of textured images drawing support from the model for speech generation [6]. Random texture may be modeled as the result of filtering (convolving) a random noise field with a "spot" [7]. Ordered or quasi-periodic texture may be seen as the result of the convolution of a spot (texture element or texton) with an ordered field of impulses. These models compare well with the models for speech, where voiced speech is the result of filtering a quasi-periodic glottal impulse train with the vocal-tract response, and unvoiced speech is the result of filtering random noise with the same system. We have drawn an analogy between speech and texture synthesis [4,9], since both can be modeled, in general, as the convolution of an impulse field with a basic wavelet. In the case of random texture, we convert the two-dimensional (2D) image into one-dimensional (1D) signals by taking projections (the Radon transform) of the image at several angles [8,9]. By the Fourier slice theorem [10], the Fourier transform of a projection of an image is equal to the radial slice of the 2D Fourier transform of the image at the angle of the projection.
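The Fourier slice theorem can be checked directly for the zero-degree projection, where no interpolation is involved. The following sketch is our own illustration (assuming numpy), not code from the methods described here:

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.standard_normal((64, 64))  # stand-in for a random texture field

# Zero-degree projection: integrate the image along the vertical axis.
proj = img.sum(axis=0)

# Fourier slice theorem: the 1-D Fourier transform of this projection
# equals the horizontal (v = 0) slice of the 2-D Fourier transform.
ft_proj = np.fft.fft(proj)
slice_2d = np.fft.fft2(img)[0, :]

assert np.allclose(ft_proj, slice_2d)  # holds exactly for this angle
```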
By presenting several projections as sound, we deliver the spectral characteristics of the random texture. The mapping of the projection data to sound has been fully discussed in a previous publication [4]. Linear prediction was used to model the projection data and to generate sound signals extended to 0.5 s per projection at a sampling rate of 8 kHz. For quasi-periodic texture, we map salient attributes of the texture element and of its periodicity to sound parameters [4]. In particular, we use projections of the texture element as basic wavelets to synthesize voiced-speech-like sounds, with the pitch being a function of the vertical periodicity. The horizontal periodicity is used to provide rhythm in the presentation of the series of projections of the texture element.

The two methods were designed to have close connections to the models for texture synthesis and were derived in the context of speech models. We believe that our methods have the desired aspect of natural mappings, but this by itself is not adequate proof of the concept. In order to verify the potential of the methods for auditory analysis of textured images, we conducted a set of experiments with several subjects. The following sections present the details of the experiments and the results obtained.
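Before turning to the experiments, the sketch below gives a rough illustration of the voiced-speech-like synthesis for quasi-periodic texture: a projection of the texture element serves as the basic wavelet, which is convolved with an impulse train whose period follows the vertical periodicity. The wavelet and the pixels-to-samples scale factor are hypothetical stand-ins, not the actual projections or parameters of the method in [4].

```python
import numpy as np

fs = 8000   # sampling rate used for the synthesized sounds (8 kHz)
dur = 0.5   # duration per projection (0.5 s)

def synthesize(wavelet, vertical_period_px, samples_per_px=3.2):
    """Render one texton projection as a pitched, voiced-speech-like sound.

    samples_per_px is a hypothetical scale factor mapping the vertical
    period (in pixels) to the pitch period (in samples).
    """
    period = max(1, int(samples_per_px * vertical_period_px))
    excitation = np.zeros(int(fs * dur))
    excitation[::period] = 1.0               # quasi-periodic impulse train
    # Convolve the impulse train with the wavelet, as in the speech model.
    return np.convolve(excitation, wavelet)[: excitation.size]

# Example: a decaying cosine standing in for a projection of the texton.
n = np.arange(64)
wavelet = np.exp(-n / 16.0) * np.cos(2 * np.pi * n / 8.0)
sound = synthesize(wavelet, vertical_period_px=25)  # period 80 samples, 100 Hz
```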

2- AUDITORY EXPERIMENTS

The experiments were designed to verify the validity of the model used for the auditory mappings. Their main purpose was to verify whether subjects could perform specific aural analysis tasks, both for random texture and for periodic texture; the tasks are described below. We designed a total of 15 experiments, with 10 for random texture and 5 for periodic texture. For random texture, we conducted the following experiments: