The Indexical Inscription of the Acoustic

John Puterbaugh, 1994

 

 

Introduction

Invariant sound reproduction requires a particular form of representation based on order and persistence over time: in other words, memory. Metaphors provide one approach to understanding memory, allowing someone to conceptualize one thing in terms of another. Metaphors establish heuristics for problem solving by incorporating poetic, evocative, or structural forms, which operate through analogy, transference, and organization, respectively. Models are one kind of metaphor. Technological change has stimulated models of memory ranging from the aviary, the storehouse, the wax tablet, hydraulics, and clockwork to more contemporary models centered on the computer [see Rose 1992, Krell 1990, Carruthers 1993, Yates 1992]. Advances in molecular biology and neuroscience, which have established a physiological basis for memory, offer new models of memory. These models can be used to design technologies for sound reproduction.

The following pages are a preliminary investigation into memory and its role in technologies used to reproduce sound. Three instruments are taken as examples of sound-reproducing devices: the carillon, the phonograph, and the compact disc player. Terminology for describing and classifying these instruments will be defined. The ear and human memory will be presented as alternatives to these instruments, providing another paradigm for the recording and reproduction of sound.

Instruments

The carillon contains a musical cylinder with metal notepins (elevations), claviers (levers), hammers, and bells. As the cylinder rotates, each notepin raises a lever, which lifts and then releases a hammer, causing it to strike a bell. Mechanical instruments such as music boxes and player pianos are based on the same simple procedure the carillon uses to reproduce music: an inscription, carried by a medium, triggers an action on a sound source. Specifically, elevations and impressions retained by a cylinder, circular disc, or roll of paper trigger mechanisms that strike, pluck, bow, or blow a sound-producing device (idiophone, chordophone, aerophone). In most mechanical instruments the recording process involves a human agent who transcribes a piece of music and then inscribes it onto a medium that the sound-reproducing instrument can read. The sound generation itself is never actually a reproduction, because the instrument performs the piece anew through its bells, strings, pipes, etc. The ability of these instruments to reproduce music relies on explicit memory, stored as an inscription within a medium that is external to the sound generation.
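The carillon's procedure, an inscription on a rotating medium triggering actions on a sound source, can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions: the names `Pin` and `play_cylinder` are hypothetical, pin positions are given in degrees, and one rotation is divided into equal steps. The cylinder stores only which bell to strike and when; the sound itself comes from the bells.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pin:
    angle: float  # position on the cylinder, in degrees from 0 up to 360
    bell: str     # the bell whose lever this pin trips

def play_cylinder(pins, degrees_per_step=30.0):
    """Rotate a pinned cylinder once and list the bells struck at each step.

    The cylinder holds only the inscription (where the pins are); the sound
    itself is produced by the bells that the hammers strike.
    """
    steps = int(360 / degrees_per_step)
    schedule = []
    for step in range(steps):
        start = step * degrees_per_step
        end = start + degrees_per_step
        # A pin trips its lever when it passes the lever during this step.
        struck = sorted(pin.bell for pin in pins if start <= pin.angle < end)
        schedule.append(struck)
    return schedule

pins = [Pin(0, "C"), Pin(0, "E"), Pin(90, "G"), Pin(180, "C"), Pin(270, "G")]
print(play_cylinder(pins, degrees_per_step=90.0))
# prints [['C', 'E'], ['G'], ['C'], ['G']]
```

The inscription is discrete: each pin either is or is not present at a given position, and the same pin layout always yields the same sequence of strikes.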

Reproducing sound with a phonograph involves recording through microphones (electroacoustic transducers). A record retains quasi-permanent impressions of acoustic vibrations as vertical or lateral modulations of the grooves embossed on its surface. During playback, the stylus acts as a vehicle that responds to the recorded inscriptions. These inscriptions are engraved sequentially, beginning at the perimeter and winding towards the center of the disc. Early wax phonographs were engraved directly by the stylus, without electrical transduction. In either case, the recording was no longer made directly by a human agent. A record's grooves are continuously varying inscriptions that are analogs of the sound's changing amplitude, in contrast to the discrete inscriptions of most mechanical instruments. Phonographs use loudspeakers to reproduce recorded sound.
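The contrast between the phonograph's continuous inscription and the carillon's discrete one can be illustrated by mapping a signal onto a spiral groove. This is a sketch only: the function name `lateral_groove` is hypothetical, and the rotation speed (33 1/3 rpm), starting radius, groove pitch, and deflection gain are illustrative assumptions rather than specifications. The groove winds inward from the perimeter, and its sideways (radial) deflection is proportional to the signal, so the trace is an analog of the waveform. A sampled sequence stands in for the continuous signal, which is unavoidable in code.

```python
import math

def lateral_groove(signal, sample_rate, rpm=100 / 3, start_radius_mm=146.0,
                   pitch_mm=0.1, gain_mm=0.05):
    """Trace a lateral-cut spiral groove for a signal with values in [-1, 1].

    Returns (x, y) points in millimetres. The groove starts at
    start_radius_mm and moves inward by pitch_mm per revolution; the signal
    displaces it sideways (radially) by gain_mm per unit of amplitude.
    """
    revolutions_per_s = rpm / 60.0
    points = []
    for n, value in enumerate(signal):
        turns = revolutions_per_s * n / sample_rate  # revolutions so far
        theta = 2.0 * math.pi * turns
        # Unmodulated spiral radius plus the signal's analog deflection.
        radius = start_radius_mm - pitch_mm * turns + gain_mm * value
        points.append((radius * math.cos(theta), radius * math.sin(theta)))
    return points

# One second of a 5 Hz tone, sampled at 1 kHz, traced onto the groove.
tone = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
groove = lateral_groove(tone, sample_rate=1000)
```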

A compact disc, or CD, uses a laser beam instead of a stylus to read the stored representation of sound. In the case of the CD the representation is digital: a dynamic property of the sound, its amplitude, is measured at discrete points in time. Each amplitude is quantized and stored as a sequential pattern of bits, encoded as small pits pressed into the disc. The disc's plastic body is transparent, allowing the laser to reach a reflective layer where the pits produce changes in reflectivity. These patterns of reflectivity are translated into an electrical signal, which is decoded and converted into an analog signal. The inscription itself is still sequential, like those of the mechanical instruments and phonographs, but it is generated by sampling the sound, a time-domain process.
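Sampling and quantization, the two steps that make the CD's inscription digital, can be sketched as follows. The defaults are the compact disc's 44.1 kHz sampling rate and 16-bit resolution; the function name `sample_and_quantize` is hypothetical, and the sketch omits the error-correction and channel coding a real CD applies before the bits become pits.

```python
import math

def sample_and_quantize(signal_fn, duration_s, sample_rate=44_100, bits=16):
    """Sample signal_fn at discrete instants and quantize each value.

    signal_fn maps a time in seconds to an amplitude nominally in [-1, 1].
    Returns signed integer codes between -(2**(bits-1)) and 2**(bits-1) - 1.
    """
    max_code = 2 ** (bits - 1) - 1
    min_code = -(2 ** (bits - 1))
    n_samples = int(duration_s * sample_rate)
    codes = []
    for n in range(n_samples):
        t = n / sample_rate                    # the n-th sampling instant
        x = max(-1.0, min(1.0, signal_fn(t)))  # clip out-of-range values
        code = round(x * max_code)             # uniform quantization step
        codes.append(max(min_code, min(max_code, code)))
    return codes

# One millisecond of a 1 kHz tone: 44 samples, each a 16-bit integer.
tone_codes = sample_and_quantize(lambda t: math.sin(2 * math.pi * 1000 * t), 0.001)
```

Unlike the groove, nothing between two sampling instants is retained, and each retained value is rounded to one of a fixed set of levels.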

Definitions

Reproducing an event such as music involves recording. Viewed as a process, recording is transcription: a transformation involving the transduction and inscription of sound. Each of the instruments uses some form of transduction (mechanoacoustic or electroacoustic) and inscription (discrete, analog, or digital). Viewed as a thing, recording is a materialization, what Thomas Levin calls a "reification which transforms an acoustic-temporal event into a trace" [Levin 1990]. As a material trace of some sound event, a recording is based on an actual causal relation between the sound and its representation.

As mentioned above, recording is a transcriptive process. Transcription means making a copy, and it requires some form of transduction. Transduction is the process of transmitting energy from one form (the input) to another (the output), which often involves some type of change between the two forms. Since copying involves changing one form of energy into another, transcription necessarily involves transduction. In terms of sound reproduction, transducers act as sound generators and receivers [Pohlmann 1989]. In the case of the carillon, mechanical energy from the levers and hammers is transduced into acoustic energy through the bell. Microphones and loudspeakers convert between electrical and acoustic energy. Many types of transducers can be used for sound reproduction: electromagnetic, electrostatic, piezoelectric, dynamic, magnetic, and carbon [Parker 1988].

Retaining a transcription involves inscription. An inscription is the registering of quantities on or within some medium. It contains and retains the information required for invariant reproduction. An inscription that denotes its object, sound or music, through some causal correlation or physical connection will be termed an indexical representation. The carillon, phonograph, and CD all use indexical inscriptions to represent sound. An indexical inscription is the simplest form of memory that can be used for invariant reproduction.

An indexical inscription, such as the grooves on a phonograph record, contains many individual indices. Indices operate through contiguity, not resemblance. In other words, a particular sound intensity is encoded because it was present at the time of recording, not because of some intrinsic, discriminable character or quality it possesses. This means that an index has no significant resemblance to the object it represents [Peirce]. If the basis of representation were resemblance, the inscription would be characterized as iconic.

The Ear

The ear, consisting of outer, middle, and inner parts, acts as a highly specialized transducer that converts mechanical vibrations into neural firing patterns. The outer ear filters incoming sound in two ways: through the pinna (the folds and flaps of the exterior ear) and through the ear canal (a hollow chamber). The folds of the pinna reflect the sound, adding many short time delays, and so function much like a comb filter [Rodgers 1981]. This filtering boosts and attenuates various frequencies, which plays an important role in the front-to-back and vertical localization of sounds in the environment. The ear canal is a resonator for frequencies between 2000 and 5000 Hz, with a peak at 3000 Hz, a range important to the perception of music and speech [Handel 1989]. The middle ear connects the ear canal with the fluid-filled cochlea. Fluids have a higher acoustic impedance than air (meaning that vibrations in air transfer poorly into a fluid), a mismatch the middle ear compensates for by increasing the pressure delivered from the eardrum (at the end of the ear canal) to the oval window (at the entrance to the cochlea) through a lever mechanism made up of the hammer, anvil, and stirrup [Kelly 1992]. These vibrations produce a complex spatio-temporal pattern of displacements along the basilar and tectorial membranes (located in the cochlea) [Yang 1992]. The displacements along the basilar membrane bend the cilia (hairlike filaments) of the inner hair cells, which rest on the basilar membrane. Together, the cilia and inner hair cells provide the mechanical-to-neural transduction, linking the basilar membrane to the auditory nerve. The inner ear has been described as a frequency analyzer performing a kind of Fourier analysis [see Kelly 1992], but it is probably more accurate to model it as a cascade of low-pass filter sections [see Kates 1993].
Joe Schoome supports this view, stating that frequency analysis is achieved in the cochlea by a bank of low-pass filters based on the mechanical gradients of the basilar membrane [Bats]. In summary, the outer ear gathers and filters the sound; the middle ear transduces air vibrations into fluid vibrations; and the inner ear divides the sound into frequency components, which are transduced in parallel into neural firing patterns.
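The outer-ear comb filtering and the cochlear cascade of low-pass sections can both be sketched in Python. This is a deliberately simplified illustration, not a reproduction of the models cited above: the pinna is reduced to a single reflection (one delay and one gain), and each cochlear section is a one-pole low-pass filter, whereas published cochlear models such as Kates's use more elaborate sections. The function names, the delay, the cutoff frequencies, and the sample rate are assumptions. Each section's output is kept as one "place" along the model membrane, so the cascade divides the input into overlapping frequency regions.

```python
import math

def comb_filter(x, delay, gain):
    """Feed-forward comb filter: y[n] = x[n] + gain * x[n - delay].

    A crude stand-in for one pinna reflection arriving `delay` samples late.
    """
    return [x[n] + (gain * x[n - delay] if n >= delay else 0.0)
            for n in range(len(x))]

def lowpass_cascade(x, cutoffs_hz, sample_rate):
    """Run a signal through a chain of one-pole low-pass sections.

    Every section feeds the next, and each section's output is returned as
    one tap ("place" along the model membrane). Cutoffs are listed from base
    to apex, so they should decrease along the chain.
    """
    taps = []
    signal = list(x)
    for fc in cutoffs_hz:
        a = math.exp(-2.0 * math.pi * fc / sample_rate)  # pole location
        out, state = [], 0.0
        for s in signal:
            state = (1.0 - a) * s + a * state  # unity gain at 0 Hz
            out.append(state)
        taps.append(out)
        signal = out  # the next section sees this section's output
    return taps

# A click, shaped by one reflection, then analysed by four sections.
fs = 16_000
click = [1.0] + [0.0] * 255
shaped = comb_filter(click, delay=4, gain=0.5)
taps = lowpass_cascade(shaped, [4000, 2000, 1000, 500], fs)
```

Because each section filters the output of the one before it, high frequencies are progressively removed along the chain, which is the sense in which a cascade of low-pass sections performs a place-by-place frequency analysis.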

As described above, the ear's transduction process alters the content of the original sound significantly. Only information based on features of the modified sound is transmitted. Beyond the modifications imposed by the physical apparatus of the ear, further modifications depend on the listener's past experience. Incoming sensory signals are routed in parallel to many processing areas which are reciprocally connected, and these areas send information back towards the ear through the reciprocal connections. There are actually more connections feeding from the brain down to the ear than from the ear up to the brain (top-down versus bottom-up connections, respectively) [Reed and Singer 1993]. The brain can alter incoming signals by sending messages to the cochlea through links to the outer hair cells (located beneath the tectorial membrane), allowing the response to particular frequencies to be adjusted during analysis. For example, frequencies present in a familiar voice can be boosted during conversation while noisier elements are simultaneously attenuated. As Schoome states, "There is growing evidence that the nature of our auditory experience is not a precise reflection of the physical properties of stimuli but rather a highly dynamic reconstruction process which is not only modulated at a rapid time scale by state changes and top-down projections but is also subject to slowly developing and long lasting modifications of processing circuits."
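The top-down adjustment described above can be caricatured as a per-band gain derived from a remembered spectral template. This is a hypothetical toy, not a model of the efferent system: the function `top_down_gains`, the template values in [0, 1], and the `strength` parameter are all assumptions introduced for illustration. Bands typical of the familiar voice are boosted, atypical bands are attenuated, and bands with a template value of 0.5 pass unchanged.

```python
def top_down_gains(band_levels, familiar_template, strength=0.5):
    """Hypothetical top-down gain stage.

    band_levels: level of each frequency band after peripheral analysis.
    familiar_template: for each band, a value in [0, 1] saying how typical
        that band is of a remembered (familiar) voice.
    strength: how strongly memory reshapes the input (0 = no effect).
    """
    if len(band_levels) != len(familiar_template):
        raise ValueError("one template value is needed per band")
    adjusted = []
    for level, typicality in zip(band_levels, familiar_template):
        # typicality 1.0 -> gain 1 + strength; 0.0 -> gain 1 - strength
        gain = (1.0 - strength) + 2.0 * strength * typicality
        adjusted.append(level * gain)
    return adjusted

print(top_down_gains([1.0, 1.0, 1.0], [1.0, 0.0, 0.5]))
# prints [1.5, 0.5, 1.0]
```

The point of the sketch is only that what is passed on depends on a stored history as well as on the incoming signal.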

Memory

Memory is a capacity. Recall and recollection, storage and categorization, association and reconstruction are facets of this capacity. Generally speaking, memory involves information that is received through the senses, filtered in terms of past experience and current context, and then stored. Storage creates connections based on the content of the input and its associations. Remembering is the reactivation of this information. Learning is closely related to memory and involves many activities, including conditioning and training, habituation and imprinting, and trial-and-error and insight [Rahmann 1992]. Von Foerster suggests that the faculties to perceive, remember, and infer, used in learning and cognition, cannot be isolated: "(i) Omit perception: the system is incapable of representing internally environmental regularities. (ii) Omit memory, the system has only throughput. (iii) Omit prediction, i.e., the faculty of drawing inferences: perception degenerates to sensation, and memory to recording" [Von Foerster 1967].

In terms of the neurobiology of memory, Eric Kandel states that "much of what is known can be summarized in just four principles: (1) memory has stages and is continually changing; (2) long-term memory may be represented by physical changes in the brain; (3) the traces for memories are localized in multiple regions throughout the nervous system; and (4) reflexive and declarative memories may involve different neuronal circuits" [Kandel 1992]. The term "stages" refers to the various levels of processing in memory, such as short-, medium-, and long-term storage. For example, about 16 bits of information (out of the roughly 100,000,000,000 bits received through the sensory organs) can enter short-term storage, which has a capacity of 100 to 400 bits lasting around 6 to 25 seconds [Rahmann 1992]. Reflexive memories are those that are gradually accumulated, such as perceptual and motor skills, while declarative memory depends on some form of conscious reflection for retrieval [Kandel 1992].
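If the two flow figures are read as rates in bits per second (an assumption, since the text gives no unit), they agree with the duration given for short-term storage: filling 100 to 400 bits at 16 bits per second takes about 6 to 25 seconds. The arithmetic, with variable names introduced only for illustration:

```python
# Figures from Rahmann (1992) as quoted above. Treating the two flow figures
# as rates in bits per second is an assumption; the text gives no unit.
sensory_bits_per_s = 100_000_000_000  # roughly 10**11 bits from the senses
stm_bits_per_s = 16                   # bits entering short-term storage
stm_capacity_bits = (100, 400)        # capacity range of short-term storage

# Fraction of the sensory input that reaches short-term storage.
fraction_retained = stm_bits_per_s / sensory_bits_per_s

# Time needed to fill short-term storage at the entry rate.
fill_time_s = [capacity / stm_bits_per_s for capacity in stm_capacity_bits]

print(f"{fraction_retained:.1e}")  # 1.6e-10
print(fill_time_s)                 # [6.25, 25.0]
```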

Concepts such as recall and recollection, or reflexive versus declarative memory, are not very useful for specifying a particular model of memory. A model becomes more feasible when learning is stated in terms of pattern detection and association, in other words, locating concomitant properties found in our perception of the environment. In Von Foerster's terms, detecting concomitant properties is the basis of inductive inference (the principle of generalization), which is one form of learning. A recording system based on the auditory system would involve a parallel, frequency-based input. Rather than being solely indexical, like the carillon, phonograph, and CD, the input would be based on resemblance: an iconic feature detector. More importantly, the recording system would have an auditory memory (history) which can be used to modify incoming information through feedback. Analog and digital cochlear models have been described by Lyons 1992, Yang 1992, and Kates 1991, 1993. Just as the ear transduces sound patterns into neural patterns, the output of a feature detector based on the cochlea could be fed into an artificial network of neurons, a neural network. Neural networks are systems with the ability to learn: the acquisition or arrangement of processes needed to solve some problem. Problems can be thought of as tasks a particular "user" of a network wants accomplished. Neural networks are useful for categorizing groups of information and recognizing patterns within an environment. A network can be trained according to learning schemes such as supervised or unsupervised learning. These learning schemes can be described in terms of a teacher. In supervised learning, a teacher external to the network provides feedback by specifying the desired output every time the network itself generates an output [Quinlan 1991]. Unsupervised learning does not use a teacher.
This learning is based on the content of the data the network receives, tending to emphasize information such as "membership in clusters of similar patterns or highly correlated features" [Hrycej 1992]. In other words, rather than being told explicitly what to learn and output, such networks learn by experience: their teacher is the environment they are contained within. Retrieving information from such networks is based on the content of the information presented previously. Rather than an address telling the network where to find the information to recall, the current input pattern's similarity to past associations is used to reproduce the desired output. This type of memory is generally called "content-addressable" memory. Content-addressable and associative memory models have been proposed by Anderson 1968, Nakano 1972, Kosko 1988, and Kanerva 1988 [see Anderson, Pellionisz and Rosenfeld 1990].
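Content-addressable recall can be illustrated with a small associative memory that stores patterns in a weight matrix and retrieves a whole pattern from a corrupted cue. The sketch uses a Hebbian outer-product (correlation-matrix) rule with Hopfield-style iterative recall; it is in the spirit of the correlation-matrix models cited above but is not a reproduction of any one of them, and the function names `store` and `recall` are hypothetical. Patterns are vectors of +1 and -1. No address is supplied at retrieval: when only a few patterns are stored and the cue is not too corrupted, the cue settles on the stored pattern closest to it.

```python
def store(patterns):
    """Build a Hebbian outer-product weight matrix from +1/-1 patterns."""
    n = len(patterns[0])
    weights = [[0.0] * n for _ in range(n)]
    for p in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:  # no self-connections
                    weights[i][j] += p[i] * p[j] / n
    return weights

def recall(weights, cue, max_iters=10):
    """Retrieve by content: update all units at once until the state settles."""
    state = list(cue)
    n = len(state)
    for _ in range(max_iters):
        new_state = [
            1 if sum(weights[i][j] * state[j] for j in range(n)) >= 0 else -1
            for i in range(n)
        ]
        if new_state == state:  # reached a stable pattern
            break
        state = new_state
    return state

p1 = [1, 1, 1, 1, -1, -1, -1, -1]
p2 = [1, -1, 1, -1, 1, -1, 1, -1]
weights = store([p1, p2])
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # p1 with its first unit flipped
print(recall(weights, noisy) == p1)      # True
```

Updating all units at once can oscillate in general, so the loop is capped at `max_iters`; asynchronous (one unit at a time) updating is a common alternative.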

Bibliography


Carruthers, M. The Book of Memory. New York: Cambridge University Press, 1993.

Handel, S. Listening: An Introduction to the Perception of Auditory Events. Cambridge: MIT Press, 1989.

Kandel, E.R. and Schwartz, J.H. Principles of Neural Science, 2nd edn. Amsterdam: Elsevier Science, 1985.

Kates, J.M. A Time-Domain Digital Cochlear Model. IEEE Transactions on Signal Processing, vol. 39, no. 12, December 1991.

Kates, J.M. Accurate Tuning Curves in a Cochlear Model. IEEE Transactions on Speech and Audio Processing, vol. 1, no. 4, October 1993.

Krell, D.F. Of Memory, Reminiscence, and Writing. Bloomington: Indiana University Press, 1990.

Levin, T.Y. For the Record: Adorno on Music in the Age of Its Technological Reproducibility. October 55, 1990.

Parker, S.P., ed. Acoustics Source Book. New York: McGraw-Hill, 1988.

Peirce, C.S. Collected Papers of Charles Sanders Peirce, Vols. I and II. London: Oxford University Press, 1960.

Pohlmann, K.C., ed. Advanced Digital Audio. Indianapolis: SAMS, 1991.

Rahmann, H. and Rahmann, M. The Neurobiological Basis of Memory and Behavior. New York: Springer-Verlag, 1992.

Reed, R. and Singer, W. Sensory Systems. Current Biology, vol. 3, no. 4, August 1993.

Rodgers, C.A. Pinna Transformations and Sound Reproduction. Journal of the Audio Engineering Society, vol. 29, no. 4, April 1981.

Rose, S. The Making of Memory. New York: Doubleday, 1993.

von Foerster, H. What Is Memory that It May Have Hindsight and Foresight as Well? Brain Circuitry and its Structural Basis. BCL Publications, 1967.

Yang, X. Auditory Representations of Acoustic Signals. IEEE Transactions on Information Theory, vol. 38, no. 2, March 1992.

Yates, F.A. The Art of Memory. London: Pimlico, 1992.

  Copyright © 1994 John Puterbaugh

