Saturday, November 27, 2021

Phd thesis dissertation mit


PhD Dissertation Requirements. The Department’s long-standing emphasis on original research is a key element in the Candidate’s educational development. The thesis defense has two stages: i) a final Thesis Committee Meeting report, and ii) a defense. The final Thesis Committee Meeting report involves only the student and the Thesis Committee

This dissertation argues that an application should be able to choose how its storage system handles the tradeoffs inherent in wide-area data management. The main contribution of this dissertation is the design, implementation, and evaluation of a distributed file system called WheelFS, which gives applications direct control over these tradeoffs.



Tristan Jehan PhD Thesis - Chapter 3



Music listening [ 68 ] is concerned with understanding how humans perceive music. As modelers, our goal is to implement algorithms, known as machine listening, capable of mimicking this process.


There are three major machine-listening approaches: the physiological approach, which attempts to model the neurophysical mechanisms of the hearing system; the psychoacoustic approach, rather interested in modeling the effect of the physiology on perception; and the statistical approach, which models mathematically the reaction of a sound input to specific outputs. For practical reasons, this chapter presents a psychoacoustic approach to music listening.


The torso, head, and outer ear filter the sound field mostly below Hz through shadowing and reflection. The outer ear canal is about 2-cm long, which corresponds to a quarter of the wavelength of frequencies near Hz, and emphasizes the ear sensitivity to those frequencies.
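
The quarter-wavelength relation mentioned above can be checked with a few lines of arithmetic. This is only a sketch: the speed of sound (343 m/s, dry air at about 20 °C) is an assumption not stated in the text, and the function name is hypothetical.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumption: dry air at about 20 degrees Celsius


def quarter_wave_resonance_hz(canal_length_m, speed_of_sound=SPEED_OF_SOUND_M_S):
    """Frequency whose quarter wavelength equals the given canal length."""
    wavelength_m = 4.0 * canal_length_m
    return speed_of_sound / wavelength_m


# A 2-cm canal, as quoted in the text.
resonance_hz = quarter_wave_resonance_hz(0.02)
print(f"quarter-wave resonance: about {resonance_hz / 1000.0:.1f} kHz")
```

Under that assumption the resonance falls at roughly 4.3 kHz, inside the 2 to 5 kHz region of high ear sensitivity noted later in the chapter.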


The middle ear is a transducer that converts oscillations in the air into oscillations in the inner ear, which contains fluids. To avoid large losses of energy through reflection, impedance matching is achieved by a mechanical lever system (eardrum, malleus, incus, stapes, and oval window, as in the anatomy figure) that reaches an almost perfect match around Hz.


Along the basilar membrane, there are roughly inner hair cells arranged in a regular geometric pattern. The entire flow of information runs from the inner ear through approximately 30,000 afferent nerve fibers to reach the midbrain, thalamus, and finally the temporal lobe of the cerebral cortex, where it is perceived as sound.


The nature of the central auditory processing is, however, still very much unclear, which mainly motivates the following psychophysical approach [ ].

Figure: Anatomy of the ear. The middle ear is essentially a transducer that converts air oscillations in the outer ear (on the left) into fluid oscillations in the inner ear (on the right). It is depicted in greater detail in the bottom drawing. The vestibular cochlear nerves connect the cochlea with the auditory processing system of the brain. Image from [ 44 ].

Psychoacoustics connects the physical world of sound vibrations in the air to the perceptual world of things we actually hear when we listen to sounds. It is not directly concerned with the physiology of the hearing system as discussed earlier, but rather with its effect on listening perception.


This is found to be the most practical and robust approach to an application-driven work. This chapter is about modeling our perception of music through psychoacoustics. Our model is causal, meaning that it does not require knowledge about the future, and it can be implemented both in real time and faster than real time.


A good review of reasons that motivate and inspire this approach can also be found in [ ].

Let us begin with a monophonic audio signal of arbitrary length and sound quality. Since we are only concerned with the human appreciation of music, the signal may have been formerly compressed, filtered, or resampled. The music can be of any kind: we have tested our system with excerpts taken from jazz, classical, funk, electronic, rock, pop, folk and traditional music, as well as speech, environmental sounds, and drum loops.


We seek to remove the information that is the least critical to our hearing sensation while retaining the most important parts, therefore reducing signal complexity without perceptual loss. The MPEG-1 Audio Layer 3 (MP3) codec [ 18 ] is a good example of an application that exploits this principle for compression purposes.


Our primary interest here is understanding our perception of the signal rather than resynthesizing it; the reduction process is therefore sometimes simplified, but also extended and fully parametric in comparison with usual perceptual audio coders. We experimented with many window types and sizes, which did not have a significant impact on the final results. However, since we are mostly concerned with timing accuracy, we favor short windows e.


The Fast Fourier Transform (FFT) is zero-padded up to 46 ms to gain additional interpolated frequency bins. We calculate the power spectrum and scale its amplitude axis to decibels (dB SPL, a measure of sound pressure level) as in the following equation: 3.
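
The zero-padding and decibel scaling described above can be sketched as follows. This is a minimal illustration, not the chapter's own equation: the sample rate (44.1 kHz), the 512-sample Hann window, the unit reference level `ref`, and the function name are all assumptions. A 2048-point FFT corresponds to about 46 ms at that sample rate.

```python
import numpy as np


def power_spectrum_db(frame, n_fft, ref=1.0, floor_db=-120.0):
    """Windowed, zero-padded power spectrum of one frame, in dB relative to `ref`."""
    windowed = frame * np.hanning(len(frame))
    spectrum = np.fft.rfft(windowed, n=n_fft)  # rfft zero-pads the frame up to n_fft
    power = np.abs(spectrum) ** 2
    floor = ref * 10.0 ** (floor_db / 10.0)    # avoid log10(0) on silent bins
    return 10.0 * np.log10(np.maximum(power, floor) / ref)


sample_rate = 44100   # assumption
frame_len = 512       # assumption: a short window of about 12 ms
n_fft = 2048          # about 46 ms at 44.1 kHz

t = np.arange(frame_len) / sample_rate
frame = np.sin(2.0 * np.pi * 1000.0 * t)      # 1 kHz test tone
spec_db = power_spectrum_db(frame, n_fft)
peak_hz = np.argmax(spec_db) * sample_rate / n_fft
```

Zero-padding does not add information; it only interpolates the spectrum so that peaks such as the 1 kHz tone above are located on a finer frequency grid.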


The threshold of hearing is in fact frequency-dependent and is a consequence of the outer and middle ear response. A transfer function was proposed by Terhardt in [ ], and is defined in decibels as follows: 3.

Figure: Transfer function of the outer and middle ear in decibels, as a function of logarithmic frequency. Note the ear sensitivity between 2 and 5 kHz.

The oscillation of the oval window takes the form of a traveling wave which moves along the basilar membrane.
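
For the outer and middle ear transfer function mentioned above, a commonly cited form of Terhardt's approximation can be sketched as below. The coefficients are the ones usually quoted in the psychoacoustics literature and are an assumption here, since the chapter's own equation is not reproduced in this text; the function name is hypothetical.

```python
import numpy as np


def terhardt_ear_response_db(freq_hz):
    """Approximate outer/middle ear weighting in dB (commonly cited Terhardt form)."""
    f_khz = np.asarray(freq_hz, dtype=float) / 1000.0
    return (-3.64 * f_khz ** -0.8
            + 6.5 * np.exp(-0.6 * (f_khz - 3.3) ** 2)
            - 1e-3 * f_khz ** 4)


# Evaluate on a logarithmic frequency grid, as in the figure's horizontal axis.
freqs_hz = np.geomspace(100.0, 16000.0, 200)
response_db = terhardt_ear_response_db(freqs_hz)
most_sensitive_hz = freqs_hz[np.argmax(response_db)]
```

With these coefficients the curve peaks a little above 3 kHz, which is consistent with the sensitivity between 2 and 5 kHz noted in the figure caption.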


The mechanical properties of the cochlea (wide and stiff at the base, narrower and much less stiff at the tip) act as a cochlear filterbank: a roughly logarithmic decrease in bandwidth i.

Figure: Different scales shown in relation to the unwound cochlea. Mel in particular is a logarithmic scale of frequency based on human pitch perception. Note that all of them are on a linear scale except for frequency. Tip is shown on the left and base on the right.


A Bark unit was defined and led to the so-called critical-band rate scale. The spectrum frequency f is warped to the Bark scale z(f) as in equation 3. An Equivalent Rectangular Bandwidth (ERB) scale was later introduced by Moore and is shown in comparison with the Bark scale in figure [ ].
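
The warping from frequency f to the Bark scale z(f) can be illustrated with a widely used analytic approximation, usually attributed to Zwicker and Terhardt. Whether this is exactly the equation the chapter refers to is an assumption; the function name is hypothetical.

```python
import numpy as np


def hz_to_bark(freq_hz):
    """Critical-band rate z(f) in Bark (commonly cited Zwicker-Terhardt approximation)."""
    f = np.asarray(freq_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


freqs_hz = np.array([100.0, 1000.0, 5000.0, 20000.0])
barks = hz_to_bark(freqs_hz)
for f, z in zip(freqs_hz, barks):
    print(f"{f:8.0f} Hz -> {z:5.2f} Bark")
```

Under this approximation 1 kHz maps to about 8.5 Bark and 20 kHz to a little under 25 Bark, so the audible range spans roughly 25 critical bands.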


The rule-of-thumb Bark-scale approximation is also plotted (figure adapted from [ ]). The effect of warping the power spectrum to the Bark scale is shown in the figure for white noise, and for a pure tone sweeping linearly from 20 Hz to 20 kHz. Note the non-linear auditory distortion of the frequency (vertical) axis.

Figure: Frequency warping onto a Bark scale for [top] white noise; [bottom] a pure tone sweeping linearly from 20 Hz to 20 kHz.

Masking in the frequency domain not only occurs within critical bands, but also spreads to neighboring bands.


A more refined model is highly non-linear and depends on both frequency and amplitude. Masking is the most powerful characteristic of modern lossy coders: more details can be found in [ 17 ]. A non-linear spreading function as found in [ ] and modified by Lincoln in [ ] is: 3. Integrating spreading functions in the case of complex tones is not very well understood. To simplify, we compute the full spectral mask through a series of individual partials.


Figure: [right] Spectral masking curves in the Bark scale as in reference [ ], and its approximation (dashed green). The two spectrograms are zoomed around the frequency range of interest. The top one is raw. The bottom one includes frequency masking curves. In zone A, the two sinusoids are equally loud.


In zones B and C, the amplitude of the tone at Hz is decreased exponentially. Note that in zone C1 the tone at Hz is clearly visible, while in zone C2, it entirely disappears under the masker, which makes it inaudible.

As illustrated in the figure, there are two types of temporal masking besides simultaneous masking: pre-masking and post-masking.


Pre-masking is quite unexpected and not yet conclusively researched, but studies with noise bursts revealed that it lasts for about 20 ms [ ].


Within that period, sounds softer than the masker are typically not audible. We do not implement it since signal-windowing artifacts have a similar smoothing effect. We convolve the envelope of each frequency band with a ms half-Hanning (raised cosine) window.
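
The band-wise convolution with a half-Hanning window can be sketched as below. Several choices are assumptions made for illustration only: the window length is a free parameter in frames, the decaying half of the window is used so that the filter is causal, and the window is normalized to unity gain. The function names are hypothetical.

```python
import numpy as np


def half_hann_smooth(envelope, window_len):
    """Causally smooth one band envelope with the decaying half of a Hann window."""
    # Decaying half: starts at 1.0 and falls to 0.0 over window_len + 1 samples.
    half = np.hanning(2 * window_len + 1)[window_len:]
    half = half / half.sum()  # assumption: unity gain for steady-state levels
    # Keep only the first len(envelope) samples so output stays aligned and causal.
    return np.convolve(envelope, half)[: len(envelope)]


def temporal_masking(band_envelopes, window_len):
    """Apply the smoothing to every band; input shape is (n_bands, n_frames)."""
    return np.vstack([half_hann_smooth(band, window_len) for band in band_envelopes])


# A single click in one band: the response rises at once and then decays.
click = np.zeros((1, 50))
click[0, 5] = 1.0
masked = temporal_masking(click, window_len=20)
```

Because the window's largest value comes first, an onset appears in the output at the same frame as in the input, while the energy after it decays gradually, which matches the idea of smoothing while preserving attacks.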


This stage induces smoothing of the spectrogram, while preserving attacks. The effect of temporal masking is depicted in the figure for various sounds, together with their loudness curve (more on loudness later in this chapter).

Figure: Schematic drawing of temporal masking, including pre-masking, simultaneous masking, and post-masking. Note that post-masking uses a different time origin.

Figure: Bark spectrogram of four sounds with temporal masking: a digital click, a clave, a snare drum, and a staccato violin sound. Note the ms smoothing effect in the loudness curve.

The temporal masking effects have important implications on the perception of rhythm. The figure depicts the relationship between subjective and physical duration of sound events.


The physical duration of the notes gives an incorrect estimation of the rhythm (in green), while if processed through a psychoacoustic model, the rhythm estimation is correct (in blue) and corresponds to what the performer and audience actually hear.

Figure: Importance of subjective duration for the estimation of rhythm. A rhythmic pattern performed by a musician (see staff) results in a subjective sensation (blue) much different from the physical reality (green), that is, the physical duration of the audio signal.


A temporal model is implemented for accurate duration analysis and correct estimation of rhythm. Its outcome is what we call the audio surface. Note that we do not understand music yet, but only sound. The figure displays the audio surface of white noise, a sweeping pure tone, four distinct sounds, and a real-world musical excerpt.

Loudness is derived easily from our auditory spectrogram by adding the amplitudes across all frequency bands: 3. Advanced models of loudness by Moore and Glasberg can be found in [ ] [ 57 ].
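
Summing the amplitudes across all frequency bands, as described above, amounts to a single reduction over the band axis. A minimal sketch, assuming the auditory spectrogram is stored as an array of shape (n_bands, n_frames); the function name and toy values are hypothetical.

```python
import numpy as np


def loudness_curve(auditory_spectrogram):
    """Sum amplitudes across bands; input shape is (n_bands, n_frames)."""
    return np.asarray(auditory_spectrogram, dtype=float).sum(axis=0)


# Toy spectrogram: 2 bands, 3 frames.
spectrogram = np.array([[0.0, 1.0, 2.0],
                        [1.0, 1.0, 0.5]])
loudness = loudness_curve(spectrogram)
print(loudness)
```

The result has one value per frame, which is what makes it usable as a loudness curve over time.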


An example is shown in the figure.

In music, timbre is the quality of a musical note that distinguishes musical instruments. It was shown by Grey [ 66 ] and Wessel [ ] that important timbre characteristics of the orchestral sounds are attack quality (temporal envelope), spectral flux (evolution of the spectral distribution over time), and brightness (spectral centroid).


Low-level descriptors (LLDs) can be organized in various categories, including temporal descriptors computed from the waveform and its envelope, energy descriptors referring to various energy measurements of the signal, spectral descriptors computed from the STFT, harmonic descriptors computed from the sinusoidal harmonic modeling of the signal, and perceptual descriptors computed using a model of the human hearing process [ ] [ ] [ ].


The measurement of loudness through critical band reduction is fairly reasonable, and computationally much more efficient. The next step typically consists of finding the combination of those LLDs which hopefully best matches the perceptive target [ ]. An original approach by Pachet and Zils substitutes the basic LLDs by primitive operators. Psychoacousticians tell us that the critical band can be thought of as a frequency-selective channel of psychoacoustic processing.


For humans, only 25 critical bands cover the full spectrum via the Bark scale. These can be regarded as a reasonable and perceptually grounded description of the instantaneous timbral envelope.
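
The reduction of a spectrum to 25 critical bands can be sketched by grouping FFT bins according to their integer Bark index. This is an illustration under assumptions: it uses the commonly cited Zwicker-Terhardt Bark approximation, simple rectangular (non-overlapping) bands, and a 44.1 kHz sample rate; the function names are hypothetical.

```python
import numpy as np


def hz_to_bark(freq_hz):
    """Critical-band rate in Bark (commonly cited Zwicker-Terhardt approximation)."""
    f = np.asarray(freq_hz, dtype=float)
    return 13.0 * np.arctan(0.00076 * f) + 3.5 * np.arctan((f / 7500.0) ** 2)


def bark_band_energies(power_spectrum, sample_rate, n_bands=25):
    """Sum the bins of a one-sided power spectrum into n_bands Bark bands."""
    n_bins = len(power_spectrum)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_bins)
    # Integer part of the Bark value selects the band; clip the top edge.
    band_index = np.minimum(np.floor(hz_to_bark(freqs)).astype(int), n_bands - 1)
    return np.bincount(band_index, weights=power_spectrum, minlength=n_bands)


# Flat (white) spectrum with 1025 bins, i.e. a 2048-point FFT.
flat_spectrum = np.ones(1025)
bands = bark_band_energies(flat_spectrum, sample_rate=44100)
```

For a flat spectrum the higher bands collect more bins than the lower ones, which reflects the widening of critical bands with frequency.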


An example of that spectral reduction is given in the figure for a rich polyphonic musical excerpt.

This section only refers to the most atomic level of segmentation, that is, the smallest rhythmic events possibly found in music: individual notes, chords, drum sounds, etc.


Organized in time, a sequence of sound segments infers our perception of music. Since we are not concerned with sound source separation, a segment may represent a rich and complex polyphonic sound, usually short. Other kinds of segmentations e.









Dissertations – MIT Philosophy



PhD and ScD theses are also listed (title, author, and abstract) in ProQuest Dissertations & Theses Global. For each thesis received by the Libraries, a digital version is created and made publicly available in DSpace@MIT. Students may choose to submit a PDF of the thesis via the Libraries' voluntary submission portal. Submitting a PDF, in addition to the physical copies, preserves color content, text

Dissertations + Theses. These lists cover undergraduate and graduate alumni who produced either a dissertation or a thesis within or related to the Building Technology discipline. The thesis supervisor or committee chair is listed in parentheses after each document title. (Current Building Technology students are in the people section.)

Dissertations + Theses. These lists cover graduate alumni who produced either a dissertation or a thesis within or related to the HTC discipline or are AKPIA SMArchS students. The thesis supervisor or committee chair is listed in parentheses after each document title. (Current HTC and AKPIA students are in the people section.)
