Time Domain Analysis
In this section I will discuss the time-domain signal processing modules implemented in the software system.
Noise Content Analysis: Linear Prediction
I have used linear prediction as the basis for extracting the degree of "noisiness" of a signal. The motivation for applying the LPC method to musical signals lies in its robust performance in modeling the voice. In the "LPC vocal tract model", the acoustic signal is represented by an excitation, consisting of a noise signal and a sequence of pulses, passed through a resonant all-pole filter that shapes the spectral envelope of the voice, as shown in figure 2.12. In essence, the linear prediction filter coefficients are used to predict the current sample from a finite number of weighted past samples. Figure 2.13 shows examples of noise content analysis for the flute and the electric bass.
Figure 2.12 Vocal tract model

Figure 2.13 Noise content analysis: flute and electric bass
The linear prediction model is defined in equation 2.9. It assumes that the current sample can be represented by past samples weighted "appropriately" (Atal and Hanauer 1971).
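$$\hat{s}[k] = \sum_{i=1}^{p} a_i\, s[k-i] \tag{2.9}$$

where $\hat{s}[k]$ is the predicted sample, $p$ the prediction order and $a_i$ the predictor coefficients.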
The coefficients $a_i$ of this difference equation are selected by the least-squares method, so that the squared error between the current sample $s[k]$ and the predicted sample from equation 2.9 is minimized, as shown in equation 2.10.

$$E = \sum_{k} \Big( s[k] - \sum_{i=1}^{p} a_i\, s[k-i] \Big)^2 \tag{2.10}$$
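A common way to solve the least-squares problem of equation 2.10 is the autocorrelation method with the Levinson-Durbin recursion. The Python below is a minimal sketch assuming that solver (the text does not say which one the software uses); the names `autocorr`, `lpc` and `residual` are hypothetical. The residual it returns is the prediction error, which corresponds to the signal ds[n] used in the noise content analysis.

```python
def autocorr(x, max_lag):
    """Autocorrelation r[0..max_lag] of a finite frame (no normalization)."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def lpc(x, order):
    """Predictor coefficients [a_1, ..., a_p] for s_hat[k] = sum_i a_i * s[k - i],
    found with the autocorrelation method and the Levinson-Durbin recursion."""
    r = autocorr(x, order)
    a = [0.0] * (order + 1)      # a[0] is unused; a[i] holds a_i
    err = r[0]                   # prediction error energy of the order-0 model
    for i in range(1, order + 1):
        if err <= 0.0:
            break                # silent (or perfectly predicted) frame
        # reflection coefficient for order i
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:]


def residual(x, coeffs):
    """Prediction error ds[k] = s[k] - sum_i a_i * s[k - i], taking s[k] = 0 for k < 0."""
    p = len(coeffs)
    return [
        x[k] - sum(coeffs[i] * x[k - 1 - i] for i in range(p) if k - 1 - i >= 0)
        for k in range(len(x))
    ]


# Usage: impulse response of an all-pole filter with s[k] = 1.3 s[k-1] - 0.4 s[k-2]
x = [1.0, 1.3]
for _ in range(298):
    x.append(1.3 * x[-1] - 0.4 * x[-2])
print(lpc(x, 2))  # approximately [1.3, -0.4]
```

For a true all-pole signal the residual collapses to a single impulse; for noisy musical signals it retains the unpredictable (noise-like) part.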
The noise content analysis algorithm is shown in figure 2.14. Before the signal is passed to the short-term prediction filter block, a pre-emphasis filter (equation 2.11) is applied to flatten the spectrum and improve performance. The coefficient of this high-pass filter ranges from 0.95 to 0.98. The residual signal ds[n] (the prediction error) is then passed through a "spike damping filter", which damps the spikes present in ds[n] and ultimately renders the noise content of the signal (figure 2.15).

$$y[n] = x[n] - \alpha\, x[n-1], \qquad 0.95 \le \alpha \le 0.98 \tag{2.11}$$
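Equation 2.11 is a one-tap FIR filter, so it is easy to sketch. The minimal Python below assumes x[-1] = 0 at the start of the frame; the default α = 0.97 is simply a value inside the quoted 0.95 to 0.98 range, and the function names are hypothetical. The spike damping filter is not described in enough detail to sketch.

```python
import cmath


def pre_emphasis(x, alpha=0.97):
    """First-order high-pass y[n] = x[n] - alpha * x[n-1], taking x[-1] = 0."""
    return [x[n] - alpha * (x[n - 1] if n > 0 else 0.0) for n in range(len(x))]


def pre_emphasis_gain(alpha, omega):
    """Magnitude of H(e^{j omega}) = 1 - alpha * e^{-j omega}, omega in rad/sample."""
    return abs(1 - alpha * cmath.exp(-1j * omega))


# A constant (DC) input is almost removed, while the band near Nyquist is boosted.
print(pre_emphasis([1.0, 1.0, 1.0], alpha=0.95))   # approximately [1.0, 0.05, 0.05]
print(pre_emphasis_gain(0.95, 0.0), pre_emphasis_gain(0.95, cmath.pi))  # about 0.05 and 1.95
```

The gain rising from 1 − α at DC to 1 + α at Nyquist is what flattens the typically low-frequency-heavy spectrum of musical signals before prediction.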
Figure 2.14 Noise content analysis

Figure 2.15 Noise content analysis of electric bass
Pitch Detection
The pitch detection algorithm uses autocorrelation, natural cubic spline interpolation and period averaging to compute the pitch of the signal accurately. The range of operation is from approximately 26 Hz to 5000 Hz, which covers the range of the piano (A0 = 27.50 Hz, C8 = 4186 Hz). Figure 2.16 shows the basic procedure for computing pitch. As figure 2.17 shows, the period-averaging method has the smallest error compared with the autocorrelation method without interpolation and the FFT. The period averaging method is discussed in section 2.3.2.4.
Figure 2.16 Pitch computation

Figure 2.17 Error plot: interpolation, interpolation with period averaging, and DFT
Autocorrelation
Autocorrelation
is a standard way of determining signal periodicity and is defined as:
$$R(\tau) = \int_{-\infty}^{\infty} x(t)\, x(t+\tau)\, dt \tag{2.12}$$
The discrete
time equivalent is:
$$R[l] = \sum_{n=-\infty}^{\infty} x[n]\, x[n+l] \tag{2.13}$$
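For a finite frame, the sum in equation 2.13 runs only over the overlapping samples. The minimal Python sketch below evaluates it directly (an FFT-based method would be faster for long frames) and uses a 100 Hz test tone at an assumed 8 kHz sampling rate; the function name is hypothetical.

```python
import math


def autocorrelation(x, max_lag):
    """Finite-frame version of equation 2.13: R[l] = sum_n x[n] x[n + l], l = 0..max_lag."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


# A 100 Hz sine sampled at 8 kHz repeats every 80 samples, so R[l] peaks at l = 80.
fs, f0 = 8000, 100.0
frame = [math.sin(2 * math.pi * f0 * i / fs) for i in range(400)]
r = autocorrelation(frame, 120)
best = max(range(40, 121), key=lambda lag: r[lag])
print(best)  # 80
```

Because fewer terms overlap at larger lags, the finite sum shrinks as the lag grows, which is the decaying magnitude exploited later in period detection.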
A typical autocorrelation vector with increasing integer lag values is shown in figure 2.18. In the current implementation, the zero crossings of the autocorrelation signal are determined first. More precisely, the maximum located in a region bounded by zero crossings is accepted as a peak only if it satisfies specific magnitude threshold values.
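The specific threshold values are not given, so the Python sketch below assumes a single hypothetical relative threshold, `rel_threshold`, measured against R[0]. It scans the autocorrelation for positive lobes (a rising zero crossing followed by a falling one) and keeps the maximum of each lobe that clears the threshold.

```python
def lobe_peaks(r, rel_threshold=0.3):
    """Return lags of peaks in positive lobes of r, each lobe bounded by a rising and a
    falling zero crossing; a peak is kept only if r[peak] >= rel_threshold * r[0].
    The lobe around lag 0 is skipped because it has no rising zero crossing."""
    peaks = []
    start = None                       # first lag of the current positive lobe
    for lag in range(1, len(r)):
        if r[lag - 1] <= 0 < r[lag]:   # rising zero crossing: a lobe begins
            start = lag
        elif r[lag - 1] > 0 >= r[lag] and start is not None:  # falling crossing: lobe ends
            peak = max(range(start, lag), key=lambda i: r[i])
            if r[peak] >= rel_threshold * r[0]:
                peaks.append(peak)
            start = None
    return peaks


r = [10, 5, -2, -3, 1, 6, 2, -1, -4, 2, 1, -1]
print(lobe_peaks(r))        # [5]  (the lobe peaking at lag 9 is below 0.3 * r[0])
print(lobe_peaks(r, 0.1))   # [5, 9]
```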
Figure 2.18 Autocorrelation signal, sine wave at 100 Hz
Comparing figure 2.18 and figure 2.19, it is clear that the time resolution of the autocorrelation vector decreases substantially for higher frequencies, causing greater error. In other words, the number of samples between autocorrelation peaks (one period of the signal) decreases as the frequency increases (approximately 440 samples vs. 50 samples in figure 2.18 and figure 2.19). One way to improve performance is to use interpolation.
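The resolution problem can be quantified with a little arithmetic. The Python sketch below assumes a 44.1 kHz sampling rate (consistent with the roughly 440-sample period quoted for 100 Hz) and compares each true frequency with the estimate obtained when the period is rounded to the nearest whole lag, that is, when the peak is read without interpolation. Function names are hypothetical.

```python
def samples_per_period(f0, fs):
    """Length of one period, in samples, of a tone at f0 Hz."""
    return fs / f0


def integer_lag_estimate(f0, fs):
    """Frequency obtained when the period is rounded to the nearest whole lag."""
    return fs / round(fs / f0)


fs = 44100  # assumed sampling rate
for f0 in (100.0, 1010.0, 4186.0):
    est = integer_lag_estimate(f0, fs)
    print(f"{f0:7.1f} Hz: {samples_per_period(f0, fs):6.1f} samples/period, "
          f"integer-lag estimate {est:7.1f} Hz ({100 * abs(est - f0) / f0:.2f} % error)")
```

At 100 Hz the 441-sample period is resolved exactly, while at 1010 Hz (about 43.7 samples per period) the integer-lag estimate is off by roughly 0.8 %, and at C8 by about 4.2 %, far more than a musically acceptable error.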
Figure 2.19 Autocorrelation signal, sine wave at 1010 Hz
Detection of Periods
Periods in the autocorrelation signal are detected through the peaks that correspond to the frequency of the audio signal (figure 2.20). Each peak is extracted using a pair of zero crossings, which bound the range where a peak corresponding to the period could actually be found. The fact that the magnitude of the autocorrelation vector decreases with increasing lag is exploited to determine whether a peak is really a period or just a local peak caused by strong harmonics. The first period found is used as the basis for locating the subsequent peaks in the autocorrelation vector; that is, an error margin dictated by the first period guides the search for the remaining peaks. All peaks that are considered periods are subjected to interpolation.
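The exact decision rules are not spelled out, so the Python below encodes one hypothetical reading of them. It selects as the first period the earliest candidate that no later candidate exceeds in magnitude (using the decay of the autocorrelation with lag), and accepts later candidates only if they fall within an error margin of an integer multiple of that first period. The 10 % margin and the names are assumptions.

```python
def select_periods(candidates, r, margin=0.1):
    """candidates: peak lags in increasing order; r: autocorrelation values indexed by lag.
    Returns the lags accepted as periods, first period first (hypothetical rules)."""
    if not candidates:
        return []
    # Because |r| decays with lag, a genuine period peak should not be exceeded by any
    # later peak; earlier, smaller peaks are treated as local peaks from harmonics.
    first = next(lag for i, lag in enumerate(candidates)
                 if all(r[lag] >= r[later] for later in candidates[i + 1:]))
    periods, used = [first], {1}
    for lag in candidates:
        if lag <= first:
            continue
        m = round(lag / first)                      # nearest multiple of the first period
        if m not in used and abs(lag - m * first) <= margin * first:
            periods.append(lag)
            used.add(m)
    return periods


r = {40: 50.0, 80: 100.0, 120: 45.0, 161: 90.0, 205: 70.0}   # hypothetical peak values
print(select_periods(sorted(r), r))  # [80, 161]
```

Here the peak at lag 40 is rejected as a harmonic peak because the later peak at 80 is larger, and the peaks at 120 and 205 are rejected because they lie far from 160 and 240.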
Figure 2.20 Peak detection through zero crossings and interpolation
Natural Cubic Spline Interpolation
The natural cubic spline interpolation method is used to pinpoint the "actual" peak in the autocorrelation function, and hence its period. The basic idea behind the natural cubic spline method is shown in figure 2.21. Each curve segment connecting two adjacent knots is represented by a cubic polynomial, denoted by $S_i(x)$.
Figure 2.21 Natural cubic spline
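Finding the peak thus essentially becomes a root-finding problem on each polynomial segment bounded by two knots: the peak lies where the segment's first derivative is zero (Cheney and Kincaid 1994).

$$S_i(x) = a_i + b_i (x - x_i) + c_i (x - x_i)^2 + d_i (x - x_i)^3, \qquad x_i \le x \le x_{i+1} \tag{2.14}$$

For a natural spline, the second derivative is zero at the two end knots.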
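The following Python sketch makes equation 2.14 concrete. It assumes unit knot spacing (autocorrelation values at consecutive integer lags), solves the standard tridiagonal system for the second derivatives of the natural spline with the Thomas algorithm, and then solves the quadratic $S_i'(t) = 0$ on every segment, keeping roots where $S_i'' < 0$. The function names are hypothetical; this illustrates the technique, not the software's own code.

```python
import math


def natural_spline_second_derivs(y):
    """Second derivatives M_0..M_n of the natural cubic spline through (i, y[i]),
    with M_0 = M_n = 0. Solves M_{i-1} + 4 M_i + M_{i+1} = 6 (y[i+1] - 2 y[i] + y[i-1])."""
    n = len(y) - 1
    m = [0.0] * (n + 1)
    c = [0.0] * (n + 1)   # modified super-diagonal (Thomas algorithm)
    d = [0.0] * (n + 1)   # modified right-hand side
    for i in range(1, n):                      # forward sweep
        denom = 4.0 - c[i - 1]
        c[i] = 1.0 / denom
        d[i] = (6.0 * (y[i + 1] - 2.0 * y[i] + y[i - 1]) - d[i - 1]) / denom
    for i in range(n - 1, 0, -1):              # back substitution
        m[i] = d[i] - c[i] * m[i + 1]
    return m


def spline_peak(y, first_lag=0):
    """Fractional lag of the largest interior maximum of the natural spline through y,
    where y[i] is the autocorrelation value at lag first_lag + i (None if none found)."""
    m = natural_spline_second_derivs(y)
    best_x, best_val = None, -math.inf
    for i in range(len(y) - 1):
        # On t = x - x_i in [0, 1]: S_i'(t) = a t^2 + b t + c
        a = (m[i + 1] - m[i]) / 2.0
        b = m[i]
        c = (y[i + 1] - y[i]) - (m[i + 1] - m[i]) / 6.0 - m[i] / 2.0
        if abs(a) < 1e-15:
            roots = [-c / b] if b != 0 else []
        else:
            disc = b * b - 4.0 * a * c
            if disc < 0:
                continue
            s = math.sqrt(disc)
            roots = [(-b + s) / (2 * a), (-b - s) / (2 * a)]
        for t in roots:
            if -1e-12 <= t <= 1 + 1e-12 and 2 * a * t + b < 0:   # S'' < 0: a maximum
                t = min(max(t, 0.0), 1.0)
                val = (m[i] * (1 - t) ** 3 + m[i + 1] * t ** 3) / 6.0 \
                    + (y[i] - m[i] / 6.0) * (1 - t) + (y[i + 1] - m[i + 1] / 6.0) * t
                if val > best_val:
                    best_x, best_val = i + t, val
    return None if best_x is None else first_lag + best_x


# Samples of a cosine (period 80 lags) whose true peak lies between lags 40 and 41.
y = [math.cos(2 * math.pi * (k - 40.3) / 80) for k in range(36, 45)]
print(spline_peak(y, first_lag=36))  # close to 40.3
```

The integer-lag maximum of these samples is 40, so the spline recovers a fraction of a lag that the raw autocorrelation cannot represent.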
Period Averaging
As mentioned before, an increase in pitch decreases the period length. Since the number of autocorrelation samples per period decreases accordingly, interpolation can only help so much. To obtain better performance for both low-frequency and high-frequency pitch detection, I have developed a period averaging method, which simply uses the M periods found in the autocorrelation vector to compute their mean after interpolation. In a given frame, if we find M peaks/periods, the respective period can be represented as:
(2.15)
The number
of autocorrelation peaks
may vary from frame to frame. The maximum number of periods for averaging
is controlled by variable M. The average period then becomes:
(2.16)
where
is
the maximum number of peaks found in frame f. As seen in figure
2.17, this decreases the error margin considerably.
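A minimal Python sketch of the averaging step follows. It assumes that the m-th interpolated peak lies near m periods, so each peak contributes the estimate lag/m, and that at most `max_periods` (the variable M) peaks are averaged; equations 2.15 and 2.16 may define the per-peak period differently. Names, lags and the 44.1 kHz rate are hypothetical.

```python
def average_period(peak_lags, max_periods):
    """Average period from interpolated autocorrelation peak lags.
    Assumption: the m-th peak (m = 1, 2, ...) lies near m periods, so it contributes
    lag_m / m; at most max_periods peaks found in the frame are used."""
    k = min(len(peak_lags), max_periods)
    if k == 0:
        raise ValueError("no periods found in this frame")
    return sum(peak_lags[m] / (m + 1) for m in range(k)) / k


fs = 44100                       # assumed sampling rate
lags = [80.2, 159.8, 240.3]      # hypothetical interpolated peak lags for one frame
period = average_period(lags, max_periods=3)
print(period, fs / period)       # about 80.07 samples, about 550.8 Hz
```

Averaging several peaks spreads the residual interpolation error of any single peak over the whole frame, which is why the error in figure 2.17 drops.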
Figure 2.22 Peak averaging (with number of peaks)
Amplitude Envelope
The amplitude envelope describes the energy change of the signal in the time domain and is generally equivalent to the so-called ADSR (attack, decay, sustain, release) of a musical sound.
Figure 2.23 Electric bass envelope
The envelope of the signal is computed with a frame-by-frame RMS (root mean square) and a third-order Butterworth low-pass filter.
RMS (equation 2.17) is related to the average power of a signal and differs from both the average level and the peak level. The average level changes very little even if the signal contains numerous transient peaks. The peak level, on the other hand, can vary greatly within a short time without much affecting the average value. RMS is a more perceptually relevant measurement and has been shown to correspond more closely to the way we hear loudness.

$$x_{\mathrm{RMS}} = \sqrt{\frac{1}{N} \sum_{n=0}^{N-1} x^2[n]} \tag{2.17}$$

where $x[n]$ are the $N$ samples over which the RMS is taken.
The frame-by-frame RMS is applied much like the short-time Fourier transform: the signal is divided into frames and one RMS value is computed per frame. The length of the RMS frame therefore determines the time resolution of the envelope: a long frame preserves less transient information, and a short frame more. The window length M (equation 2.18) is chosen so that N, the total length of the signal, is an integer multiple of M, with p a positive integer.

$$M = \frac{N}{p}, \qquad p \in \mathbb{Z}^{+} \tag{2.18}$$
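Equations 2.17 and 2.18 combine into a short frame-by-frame RMS routine. The Python below is a minimal sketch assuming non-overlapping frames of length M = N/p (the text does not mention overlap); the function name is hypothetical.

```python
import math


def frame_rms(x, p):
    """Frame-by-frame RMS (equation 2.17 applied per frame) with window length
    M = N / p (equation 2.18); N = len(x) must be divisible by p."""
    n = len(x)
    if p <= 0 or n % p != 0:
        raise ValueError("p must be a positive integer that divides len(x)")
    m = n // p
    return [math.sqrt(sum(v * v for v in x[j * m:(j + 1) * m]) / m) for j in range(p)]


# A quieter frame followed by a louder one: the envelope steps from 1 to 2.
print(frame_rms([1, -1, 1, -1, 2, 2, -2, -2], p=2))   # [1.0, 2.0]
```

Increasing p shortens each frame and therefore keeps more transient detail in the envelope, at the cost of a noisier curve.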
Figure 2.24 Amplitude envelope
The window size is selectable in the software. The cutoff frequencies of the low-pass filter were determined empirically: 350 Hz (fs = 8000 Hz), 1200 Hz (fs = 22050 Hz) and 1700 Hz (fs = 44100 Hz).
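The text specifies only the filter order and the cutoffs, not how the filter is designed or exactly where it sits relative to the RMS stage. One standard realization is sketched below in pure Python: the analog prototype 1/((s + 1)(s² + s + 1)) is mapped with the bilinear transform and frequency prewarping into a first-order section cascaded with a biquad (Q = 1). SciPy's `scipy.signal.butter(3, fc, fs=fs)` performs the same kind of bilinear-transform design; the names `butter3_lowpass`, `iir_filter` and `smooth` are hypothetical.

```python
import math


def butter3_lowpass(fc, fs):
    """Third-order Butterworth low-pass as [(b, a) first-order, (b, a) biquad], a[0] = 1."""
    k = math.tan(math.pi * fc / fs)            # prewarped analog cutoff
    g = k / (1.0 + k)
    first = ([g, g], [1.0, (k - 1.0) / (k + 1.0)])          # from the pole s = -1
    norm = 1.0 / (1.0 + k + k * k)                          # from s^2 + s + 1 (Q = 1)
    b0 = k * k * norm
    second = ([b0, 2.0 * b0, b0], [1.0, 2.0 * (k * k - 1.0) * norm, (1.0 - k + k * k) * norm])
    return [first, second]


def iir_filter(b, a, x):
    """Direct-form I: y[n] = sum b[i] x[n-i] - sum_{j>=1} a[j] y[n-j]."""
    xs, ys, out = [0.0] * len(b), [0.0] * (len(a) - 1), []
    for v in x:
        xs = [v] + xs[:-1]
        y = sum(bi * xi for bi, xi in zip(b, xs)) - sum(aj * yj for aj, yj in zip(a[1:], ys))
        ys = [y] + ys[:-1]
        out.append(y)
    return out


def smooth(x, fc, fs):
    """Run x through both sections of the third-order low-pass."""
    for b, a in butter3_lowpass(fc, fs):
        x = iir_filter(b, a, x)
    return x


CUTOFFS = {8000: 350.0, 22050: 1200.0, 44100: 1700.0}   # empirical values from the text
step = smooth([1.0] * 2000, CUTOFFS[8000], 8000)
print(round(step[-1], 6))   # 1.0: unity gain at DC
```

Both sections have zeros at z = −1, so content at the Nyquist frequency is removed completely while the slowly varying envelope passes with unity gain.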
Amplitude Modulation
Detecting amplitude modulation is similar to the amplitude envelope detection algorithm, with a few steps added. Figure 2.25 summarizes the amplitude modulation analysis. The steady-state portion of the signal is extracted and analyzed for peaks that correspond to the frequency of amplitude modulation. For accurate location of the peaks, the cubic spline interpolation method is again used.
Figure 2.25 Amplitude modulation analysis
Amplitude modulation
is frequently observed in musical instruments such as the violin, flute
and saxophone (figure 2.26). The frequency in hertz is computed using the following formula:

$$f_{\mathrm{AM}} = \frac{f_s}{w\,T} \tag{2.19}$$

where $f_s$ is the sampling rate, $w$ the RMS frame length and $T$ the period, in samples, of the RMS signal.
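Equation 2.19 converts a period measured on the RMS signal back to hertz: each RMS value summarizes w audio samples, so the RMS signal runs at fs/w values per second. In the Python sketch below, `am_frequency` implements the formula, and `estimate_am` is a hypothetical helper that finds T from the lag of the largest autocorrelation peak of the mean-removed envelope. That autocorrelation step is a swapped-in technique for illustration and reads only the integer lag, whereas the text locates the peaks with cubic spline interpolation.

```python
import math


def am_frequency(fs, w, t):
    """Equation 2.19: modulation frequency in Hz from sampling rate fs, RMS frame
    length w (audio samples) and modulation period t (RMS-signal samples)."""
    return fs / (w * t)


def estimate_am(env, fs, w):
    """Hypothetical helper: t = lag of the largest autocorrelation value of the
    mean-removed envelope, searched after the first non-positive lag up to len(env)//2."""
    n = len(env)
    mean = sum(env) / n
    d = [v - mean for v in env]
    r = [sum(d[i] * d[i + lag] for i in range(n - lag)) for lag in range(n // 2)]
    start = next((lag for lag in range(1, len(r)) if r[lag] <= 0), None)
    if start is None:
        return None
    t = max(range(start, len(r)), key=lambda lag: r[lag])
    return am_frequency(fs, w, t)


# One RMS value per 441-sample frame at 44.1 kHz gives 100 envelope values per second,
# so a 20-frame modulation period corresponds to 44100 / (441 * 20) = 5 Hz.
print(am_frequency(44100, 441, 20))                                     # 5.0
env = [1.0 + 0.5 * math.cos(2 * math.pi * n / 20) for n in range(200)]
print(estimate_am(env, 44100, 441))                                     # 5.0
```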
Figure 2.26 Amplitude modulation: alto saxophone
Attack Time
Attack time (Saldanha and Corso 1964; Elliot 1975) is an important feature of timbre. It is defined as the time it takes a signal to reach its maximum amplitude from a minimum threshold magnitude (McAdams 1999). The minimum threshold is necessary because it acts as a gating function: measurement of the attack time starts only when this threshold level is exceeded. Although the attack portion embodies a great deal of the transitional information leading to the steady state, it is difficult to say where the attack portion ends and where the steady state begins. In fact, it is even difficult to say how much information the attack portion actually represents, and no concrete measurement technique has been published to date.
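$$t_{\mathrm{attack}} = t_{\max} - t_{\mathrm{th}} \tag{2.20}$$

where $t_{\mathrm{th}}$ is the time at which the signal first exceeds the threshold and $t_{\max}$ the time at which it reaches its maximum amplitude.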
However, this attribute of timbre has been applied indirectly, and successfully, in wavetable synthesis. The basic idea is to take an auditory snapshot of the signal (the attack and the first few milliseconds of the steady-state portion) and then loop the steady-state portion. This gives the listener the illusion that the whole signal is being played back, although only a fraction of the signal's length has actually been used to render the illusion. Today's popular music genres and "electronic" jazz are heavily dominated by this technology. Many contemporary composers (Appleton 1991) have also used it, mainly via alternative MIDI controllers such as the Radio Baton.
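Once an amplitude envelope is available (for instance the frame-by-frame RMS envelope), the attack time definition translates directly into code. The Python below is a minimal sketch assuming a sampled envelope and an absolute gating threshold; the name and the example envelope are hypothetical. In line with the definition, it measures only up to the envelope maximum and makes no attempt to locate the end of the attack.

```python
def attack_time(env, env_rate, threshold):
    """Attack time in seconds on a sampled amplitude envelope.
    env_rate: envelope values per second (fs / w for a frame-by-frame RMS envelope).
    threshold: gating level; timing starts at the first value >= threshold and
    ends at the first occurrence of the maximum. Returns None if the gate never opens."""
    i_max = max(range(len(env)), key=lambda i: env[i])
    i_th = next((i for i, v in enumerate(env) if v >= threshold), None)
    if i_th is None:
        return None
    return (i_max - i_th) / env_rate


env = [0.0, 0.01, 0.05, 0.2, 0.6, 1.0, 0.8, 0.7]       # hypothetical envelope, 100 values/s
print(attack_time(env, env_rate=100, threshold=0.05))  # 0.03
```

Note that the result can only be as fine as the envelope rate; with long RMS frames, the attack of percussive sounds may collapse to a single frame.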