Time Domain Analysis
In this section I will discuss the time domain signal processing modules implemented in the software system.

 
 
 
 
 
 
        Noise Content Analysis: Linear Prediction
I have used linear prediction as the basis for extracting the degree of "noisiness" of a signal. The motivation behind using the LPC method for musical signals lies in its robust performance in modeling the voice. In the "LPC vocal tract model", the resultant acoustical signal is represented via a noise signal and a sequence of pulses passed through a resonant all-pole filter, shaping the spectral envelope of the voice as shown in figure 2.12. In essence, the linear prediction filter coefficients are used to predict the current sample with a finite number of weighted past samples. Figure 2.13 shows examples of noise content analysis for the flute and electric bass.

 
 
 
 
 
 
 
 
 
 
 
 
Figure 2.12 Vocal tract model
Figure 2.13 Noise content analysis flute and electric bass

The linear prediction model is simply defined as in equation 2.9. It assumes that the current sample may be represented with past samples weighted "appropriately" (Atal and Hanauer 1971).

(2.9)

The coefficients in the difference equation are selected so that the error between the current sample s[k] and the predicted sample from equation 2.9 is minimized, as shown in equation 2.10, using the least squares method.

(2.10)
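As a rough illustration of the least squares fit described above, the sketch below estimates the predictor coefficients with the autocorrelation method and the Levinson-Durbin recursion. This is one standard way to solve the problem and is not necessarily the solver used in the software; the function names `lpc_coefficients` and `prediction_residual` are hypothetical.

```python
def _autocorr(x, max_lag):
    # Unnormalized autocorrelation r[0..max_lag] of a finite sequence.
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]


def lpc_coefficients(x, order):
    """Return a[1..order] so that s_hat[k] = sum_i a[i] * s[k - i].

    Assumption: autocorrelation method solved by Levinson-Durbin.
    """
    r = _autocorr(x, order)
    a = [0.0] * (order + 1)          # a[0] is unused
    err = r[0]                       # prediction error energy
    for i in range(1, order + 1):
        if err == 0.0:               # silent input: nothing to predict
            break
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err                # reflection coefficient
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= (1.0 - k * k)
    return a[1:]


def prediction_residual(x, a):
    """Residual e[k] = s[k] - s_hat[k]; samples before the start count as zero."""
    out = []
    for k in range(len(x)):
        pred = sum(a[i] * x[k - 1 - i]
                   for i in range(len(a)) if k - 1 - i >= 0)
        out.append(x[k] - pred)
    return out
```

For a strongly predictable (pitched) signal the residual is small compared with the input, whereas a noisy signal leaves a large residual, which is the property the noise content analysis relies on.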

The noise content analysis algorithm is shown in figure 2.14. Before the signal is passed to the short term prediction filter block, a pre-emphasis filter (equation 2.11) is used to flatten the spectrum for enhanced performance. The pre-emphasis filter (high pass filter) coefficients range from 0.95 to 0.98. The residual signal ds[n] is passed through a "spike damping filter", which damps spikes that are present in ds[n] and ultimately renders the noise content of the signal (figure 2.15).

(2.11)
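A minimal sketch of the pre-emphasis step follows, assuming the common first-order form y[n] = x[n] - alpha * x[n-1]. That form is an assumption, since the exact expression of equation 2.11 is not reproduced here; the name `pre_emphasis` is hypothetical.

```python
def pre_emphasis(x, alpha=0.97):
    """First-order high pass filter: y[n] = x[n] - alpha * x[n - 1].

    alpha is assumed to lie in the 0.95 to 0.98 range quoted in the text.
    The first sample is passed through unchanged.
    """
    if not x:
        return []
    return [x[0]] + [x[n] - alpha * x[n - 1] for n in range(1, len(x))]
```

A constant (DC) input is almost entirely removed after the first sample, which shows the high pass character of the filter.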



 
 
Figure 2.14 Noise content analysis
Figure 2.15 Noise content analysis of electric bass

 

        Pitch Detection
The pitch detection algorithm uses autocorrelation, natural cubic spline interpolation and period averaging to accurately compute the pitch of the signal. The range of operation is from approximately 26 Hz to 5000 Hz (A0 = 27.50 Hz, C8 = 4186 Hz). Figure 2.16 shows the basic procedure for computing pitch. As seen in figure 2.17, the error for the period-averaging method is smaller than that of both the autocorrelation method without interpolation and the FFT. The period averaging method is discussed in section 2.3.2.4.

 
 
 
 
 
Figure 2.16 Pitch computation
Figure 2.17 Error plot: interp., interp. with period averaging and DFT
           
          Autocorrelation
Autocorrelation is a standard way of determining signal periodicity and is defined as:
(2.12)

The discrete time equivalent is:

(2.13)
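The discrete autocorrelation can be sketched as below. The unnormalized sum over the overlapping part of the frame is assumed here, which may differ from equation 2.13 by a scaling factor; the name `autocorrelation` is hypothetical.

```python
def autocorrelation(x, max_lag):
    """Return r[0..max_lag] with r[lag] = sum_i x[i] * x[i + lag].

    Only the overlapping part of the frame is summed, so the magnitude
    of r tends to decrease as the lag grows.
    """
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]
```

For a periodic input, the first strong peak after lag 0 falls at the period of the signal in samples.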

A typical autocorrelation vector with increasing integer lag values is shown in figure 2.18. In the current implementation, zero crossings of the autocorrelation signal are determined. More precisely, a local maximum bounded by two zero crossing pairs is accepted as a peak if it meets specific magnitude threshold values.
 
 
Figure 2.18 Autocorrelation signal, sine wave at 100 Hz

Comparing figure 2.18 and figure 2.19, it is clear that the time resolution of the autocorrelation vector decreases substantially for higher frequencies, causing greater error. In other words, the number of samples between the autocorrelation peaks (the period of the signal) decreases as the frequency increases (approximately 440 samples vs. 50 samples, figure 2.18 and figure 2.19). One way to improve performance is to use interpolation.
 
 

Figure 2.19 Autocorrelation signal, sine wave at 1010 Hz
          Detection of Periods
Periods in the autocorrelation signal are detected through peaks that correspond to the frequency of the audio signal (figure 2.20). Peaks are extracted using two zero crossing pairs for each peak. These pairs define the range in which a peak that corresponds to the period can be found. The fact that the autocorrelation vector's magnitude decreases as its lag increases is exploited in determining whether a peak is really a period or just a local peak corresponding to strong harmonics. The first period value is used as the basis for finding and computing consecutive peaks in the autocorrelation vector. Hence, an error margin dictated by the first period found guides the search for the remaining peaks. All peaks that are considered periods are subjected to interpolation.
Figure 2.20 Peak detection through zero crossing and interpolation
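The zero crossing based search can be sketched roughly as follows. This is a simplified reading of the procedure: a single relative magnitude threshold is assumed, and the error margin derived from the first period is omitted. The name `find_period_peaks` and the default threshold are hypothetical.

```python
def find_period_peaks(r, threshold=0.3):
    """Return integer lags of autocorrelation peaks taken as periods.

    A candidate is the maximum of a positive lobe bounded by an upward
    and a downward zero crossing. It is kept only if its value reaches
    threshold * r[0] (assumed acceptance rule).
    """
    peaks = []
    start = None
    for lag in range(1, len(r)):
        if r[lag - 1] <= 0.0 < r[lag]:
            start = lag                           # upward zero crossing
        elif r[lag - 1] > 0.0 >= r[lag] and start is not None:
            # downward zero crossing closes the lobe [start, lag)
            best = max(range(start, lag), key=lambda l: r[l])
            if r[best] >= threshold * r[0]:
                peaks.append(best)
            start = None
    return peaks
```

The lobe around lag 0 is skipped because it has no upward crossing, and a lobe that is still open at the end of the vector is ignored.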
          Natural Cubic Spline Interpolation
The natural cubic spline interpolation method is used to pinpoint the "actual" peak, and hence its period, in the autocorrelation function. The basic idea behind the natural cubic spline method is shown in figure 2.21. Each "curvature" connecting the knots is represented by a cubic polynomial equation denoted by .
Figure 2.21 Natural cubic spline

This is essentially a problem of solving each polynomial, bounded by its knots, for its roots (Cheney and Kincaid 1994).

(2.14)
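The following sketch fits a natural cubic spline through the autocorrelation values around an integer peak and locates the refined maximum. As a simplification it samples the spline densely instead of solving the polynomials analytically; all names, the knot window and the step size are hypothetical choices.

```python
def natural_spline_second_derivs(x, y):
    """Second derivatives at the knots, with zero curvature at both ends."""
    n = len(x)
    if n < 3:
        return [0.0] * n
    sub, diag = [0.0] * n, [1.0] * n
    sup, rhs = [0.0] * n, [0.0] * n
    for i in range(1, n - 1):
        h0, h1 = x[i] - x[i - 1], x[i + 1] - x[i]
        sub[i], diag[i], sup[i] = h0, 2.0 * (h0 + h1), h1
        rhs[i] = 6.0 * ((y[i + 1] - y[i]) / h1 - (y[i] - y[i - 1]) / h0)
    for i in range(1, n):                    # tridiagonal forward elimination
        w = sub[i] / diag[i - 1]
        diag[i] -= w * sup[i - 1]
        rhs[i] -= w * rhs[i - 1]
    m = [0.0] * n
    m[n - 1] = rhs[n - 1] / diag[n - 1]
    for i in range(n - 2, -1, -1):           # back substitution
        m[i] = (rhs[i] - sup[i] * m[i + 1]) / diag[i]
    return m


def spline_value(x, y, m, t):
    """Evaluate the spline at t, which must lie within [x[0], x[-1]]."""
    i = 0
    while i < len(x) - 2 and t > x[i + 1]:
        i += 1
    h = x[i + 1] - x[i]
    a, b = x[i + 1] - t, t - x[i]
    return (m[i] * a ** 3 + m[i + 1] * b ** 3) / (6.0 * h) \
        + (y[i] / h - m[i] * h / 6.0) * a \
        + (y[i + 1] / h - m[i + 1] * h / 6.0) * b


def refine_peak(r, peak, half_width=4, step=0.001):
    """Fractional lag of the maximum near the integer lag `peak`."""
    lo, hi = max(0, peak - half_width), min(len(r) - 1, peak + half_width)
    x = [float(v) for v in range(lo, hi + 1)]
    y = list(r[lo:hi + 1])
    m = natural_spline_second_derivs(x, y)
    t_lo, t_hi = max(x[0], peak - 1.0), min(x[-1], peak + 1.0)
    steps = int(round((t_hi - t_lo) / step))
    grid = [t_lo + j * step for j in range(steps + 1)]
    return max(grid, key=lambda t: spline_value(x, y, m, t))
```

With a true period that falls between two integer lags, the refined value moves from the integer peak toward the fractional period.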

          Period Averaging
As mentioned before, an increase in pitch decreases the period length. Since the number of autocorrelation samples per period decreases accordingly, interpolation can only help so much. To get better performance for both low frequency and high frequency pitch detection, I have developed a period averaging method, which simply uses M periods found in the autocorrelation vector to compute the mean after interpolation. In a given frame, if we find M peaks/periods, the respective period can be represented as:
(2.15)

The number of autocorrelation peaks may vary from frame to frame. The maximum number of periods for averaging is controlled by variable M. The average period then becomes:

(2.16)

where is the maximum number of peaks found in frame f. As seen in figure 2.17, this decreases the error margin considerably.
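One plausible reading of the period averaging step is sketched below. It assumes that the m-th interpolated peak lies at m times the period, so that each peak yields its own period estimate; this interpretation, and the names `average_period` and `pitch_hz`, are assumptions rather than a restatement of equations 2.15 and 2.16.

```python
def average_period(peak_lags, max_periods):
    """Mean period in samples from up to max_periods interpolated peaks.

    Assumption: the m-th peak (m = 1, 2, ...) sits at lag m * period,
    so peak_lags[m - 1] / m is one estimate of the period.
    """
    used = peak_lags[:max_periods]
    if not used:
        raise ValueError("no peaks found in this frame")
    estimates = [lag / (m + 1) for m, lag in enumerate(used)]
    return sum(estimates) / len(estimates)


def pitch_hz(period_samples, fs):
    """Convert a period in samples to a frequency in Hz."""
    return fs / period_samples
```

For example, interpolated peaks at lags 43.7, 87.4 and 131.1 average to a period of 43.7 samples, which corresponds to roughly 1009 Hz at a sampling rate of 44100 Hz.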
 
Figure 2.22 Peak averaging (with number of peaks)

        Amplitude Envelope
The amplitude envelope describes the energy change of the signal in the time domain and is generally equivalent to the so-called ADSR (attack, decay, sustain, release) of a musical sound.

 
Figure 2.23 Electric bass envelope

The envelope of the signal is computed with a frame by frame RMS (root mean square) and a 3rd order Butterworth low pass filter.

RMS (equation 2.17) is related to the average power of a signal and differs fundamentally from the signal's average and peak levels. The average changes very little even if the signal consists of numerous transient peaks. The peak level, on the other hand, can vary greatly in a small amount of time without much affecting the average value. RMS is a more perceptually relevant measurement and has been shown to correspond more closely to the way we hear loudness.

(2.17)

The frame by frame RMS is applied in a manner quite similar to the short time Fourier transform method. The length of the RMS frame determines the time resolution of the envelope. A large frame length yields less transient information and a small frame length more transient information. The window length M (equation 2.18) is an integer fraction of N, N being the total length of the signal and p a positive integer.

(2.18)
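The frame by frame RMS can be sketched as below, assuming non-overlapping frames of length M. The Butterworth smoothing stage is left out of this sketch, and the name `rms_envelope` is hypothetical.

```python
import math


def rms_envelope(x, frame_len):
    """RMS value of each complete, non-overlapping frame of frame_len samples.

    Assumption: frames do not overlap and a trailing partial frame is dropped.
    """
    env = []
    for start in range(0, len(x) - frame_len + 1, frame_len):
        frame = x[start:start + frame_len]
        env.append(math.sqrt(sum(v * v for v in frame) / frame_len))
    return env
```

A shorter frame produces more envelope points per second and therefore follows transients more closely, at the cost of a less smooth curve.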

Figure 2.24 Amplitude Envelope

The window size is selectable in the software: a longer window results in less transitional information and a shorter window in more. The cutoff frequencies of the low pass filter have been determined empirically at 350 Hz (fs = 8000 Hz), 1200 Hz (fs = 22050 Hz) and 1700 Hz (fs = 44100 Hz).

        Amplitude Modulation
Detecting amplitude modulation is similar to the amplitude envelope detection algorithm, with a few steps added. Figure 2.25 shows a summary of amplitude modulation analysis. The steady state portion of the signal is extracted and analyzed for peaks, which correspond to the frequency of amplitude modulation. For accurate location of peaks, the cubic spline interpolation method is again used.
Figure 2.25 Amplitude modulation analysis

Amplitude modulation is frequently observed in musical instruments such as the violin, flute and saxophone (figure 2.26). The frequency in Hertz is computed using the following formula:

(2.19)

where fs is the sampling rate, w the RMS frame length and T the period in samples of the RMS signal.
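A small sketch of the conversion from the period of the RMS signal to a modulation frequency follows. It assumes non-overlapping RMS frames, so that one sample of the RMS signal spans w input samples and the frequency is fs / (w * T); this form is inferred from the symbol definitions and is not a verified copy of equation 2.19. The name `am_frequency_hz` is hypothetical.

```python
def am_frequency_hz(fs, frame_len, period_frames):
    """Amplitude modulation frequency in Hz.

    fs            sampling rate of the audio signal in Hz
    frame_len     RMS frame length w in samples (frames assumed not to overlap)
    period_frames period T of the RMS signal, in RMS samples (may be fractional)
    """
    return fs / (frame_len * period_frames)
```

For instance, with fs = 44100 Hz, w = 441 samples and T = 20 RMS samples, the modulation frequency under this assumption is 5 Hz.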
 
Figure 2.26 Amplitude modulation alto-saxophone

        Attack Time
Attack time (Saldanha and Corso 1964; Elliot 1975) is an important feature of timbre. It is defined as the time it takes to reach the maximum amplitude of a signal from a minimum threshold magnitude (McAdams 1999). The minimum threshold value is necessary because it acts as a gating function: measurement of the attack time starts only when this threshold level is exceeded. Although the attack portion embodies a great deal of transitional information of the signal leading to a steady state, it is difficult to say where the attack portion ends and where the steady state begins. As a matter of fact, it is even difficult to say how much information the attack portion actually represents, and no concrete measurement technique has been published to date.
(2.20)
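The definition above can be sketched as a measurement on an amplitude envelope. The envelope rate parameter and the name `attack_time_seconds` are hypothetical, and the sketch takes the global maximum of the envelope as the end of the attack, which is an assumption.

```python
def attack_time_seconds(envelope, envelope_rate, threshold):
    """Time from the first envelope value above threshold to the maximum.

    envelope_rate is the number of envelope values per second.
    Returns None if the envelope is empty or never exceeds the threshold.
    """
    if not envelope:
        return None
    peak_index = max(range(len(envelope)), key=lambda i: envelope[i])
    for i in range(peak_index + 1):
        if envelope[i] > threshold:      # gate opens here
            return (peak_index - i) / envelope_rate
    return None
```

The threshold acts as the gate described in the text: values below it, such as background noise before the note starts, do not count toward the attack time.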

However, this attribute of timbre has been indirectly applied successfully in wavetable synthesis. The basic idea is to take an auditory snapshot of the signal (the attack and the first few milliseconds of the steady state portion) and then loop the steady state portion. This gives the listener the illusion that the whole signal is being played back, although only a fraction of the signal has actually been used. Today's popular music genres and "electronic" jazz music are very much dominated by this technology. Many contemporary composers (Appleton 1991) have also used this technology, mainly via alternative MIDI controllers such as the Radio Baton.
