Accelerated 3D Imaging Using Compressed Sensing

Three-dimensional (3D) imaging of the upper airway during sustained sound production has recently emerged as a promising tool in speech production research as a means to capture the full geometry of the vocal tract. The diversity of tongue shapes and dynamics is made possible, at least in part, through different lingua-palatal bracing, leading to complex airway geometries, the understanding of which is critical for investigations into the production of both normal and disordered speech. In addition to helping shed light on the intricate airway shaping mechanisms underlying the production of various linguistically meaningful speech sounds, 3D imaging also lends itself to providing quantitative volumetric information about the airway regions. The shaping of the tongue and other articulators, and the temporal characteristics of their shaping, give rise to characteristic patterns of acoustic resonance behavior of the vocal tract that define the properties of human speech, which can be modeled with such quantitative information. Recent work has shown that three-dimensional tongue shape and the dynamics underlying shape formation are critical to understanding natural linguistic classes and issues of phonological representation as evidenced in speech motor control. Previous models of speech production often assumed that the position of maximum constriction, defined in the midsagittal plane, was the main “place of articulation” parameter. Imaging studies such as those by Narayanan et al. have suggested that articulation cannot be characterized solely by identifying a constriction position and that speech production targets go beyond the midsagittal plane. Initial speech studies using MRI focused on vowel sounds. The models of the vocal tract constructed from the MR images of different vowels yielded good estimates of vowel formant frequencies and formant patterns, which agreed with the general acoustic implications of tongue height and backness in vowel articulation.
For example, the study by Narayanan et al., which focused on tongue shaping and on 3D vocal tract data and models for the American English vowels /a/, /i/, /u/, showed distinct differences in tongue shaping: the anterior tongue was raised and convex for /i/ compared to the lowered concave shape for /a/, while the tongue back showed an opposite trend in the degree of concavity. These data were used in a finite-element-based simulation of the vocal tract models to study the acoustic properties of the vowel sounds. Other studies have investigated a variety of continuant consonant sounds such as fricatives and liquids. Narayanan et al. examined vocal tract shaping of consonants using MRI and other articulatory measurements, and presented data and results on three-dimensional vocal tract and tongue shapes for fricative sounds produced by talkers of American English. These data showed key differences in tongue shaping between the sibilants /s/ (concave, grooved) and /S/ (convex, cupped) and were helpful in deriving meaningful acoustic source models for these sounds. Using insights gained in imaging work, in conjunction with the quantitative data on vocal tract area functions and the sublingual cavity of Alwan et al., Espy-Wilson et al. created acoustic models for the American English /r/, clearly delineating the role of the oral and pharyngeal constrictions and the sublingual volume. Similar advances have been made toward understanding the acoustics of lateral sounds. While these studies represent significant progress in speech research, they can be further improved by addressing certain technological limitations. These previous MRI studies were based on 2D multi-slice acquisitions, requiring multiple repetitions of the same sound and scan times on the order of several minutes. These procedures are prone to data inconsistency resulting from slightly different positions of the jaw, head, and tongue during each repetition.
Compared to 2D multi-slice acquisition, it is well known that 3D encoding provides contiguous coverage with the potential for thinner slices and improved signal-to-noise ratio (SNR) efficiency. However, 3D encoding with high spatial resolution currently requires a prohibitively long scan time that easily exceeds the normal duration of sustained sound production with minimal subject motion.

Image Reconstruction

Since all data sets were fully sampled along the readout (kx) direction, data were first inverse-Fourier transformed along the readout direction, and image reconstruction was performed separately for each y-z planar section. For each x position, fully sampled data sets were reconstructed using a 2D inverse Fourier transform (IFT). For the simulated and real undersampled acquisitions, un-acquired k-space locations were filled with zeros prior to inverse Fourier transformation. For PC-CS, the phase map was calculated in two ways: (PC-I) Taking the phase of the 2D inverse Fourier transform of fully sampled low spatial frequency data. In order to remove Gibbs ringing artifacts due to k-space truncation, the low spatial frequency data set was multiplied by a 2D Hanning window. (PC-II) Taking the phase of the complex-valued image estimate obtained from a non-PC CS iterative reconstruction. To avoid noise contamination, the PC-II phase map was masked to contain only spatial locations where the magnitude image was greater than 20% of its maximum value.

Figure 1: k-space sampling patterns used in the experimental studies. Relative reduction factors are (a) 1, (b) 1.3, (c) 3, (d) 4, and (e) 5. Note that the region inside the ellipse with radii 30% of the overall k-space was fully sampled in all cases for the estimation of low-resolution image phase.
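
As a rough illustration of the reconstruction steps described in this section, the following NumPy sketch shows the readout-direction inverse transform, zero-filled reconstruction of one y-z plane, and the two phase-map estimates. It is not the original implementation. All function names are hypothetical, and several details are assumptions made only for illustration: the 30% ellipse is interpreted as having semi-axes equal to 30% of the k-space half-extent, the Hanning window is taken as a separable window spanning the bounding box of the fully sampled region, phase outside the PC-II mask is set to zero, and the sampling outside the ellipse is uniformly random. The CS iterative reconstruction itself is not shown; a zero-filled image stands in for its output.

```python
import numpy as np

def ifft_readout(kspace3d):
    """Centered inverse FFT along the readout axis (axis 0 = kx)."""
    shifted = np.fft.ifftshift(kspace3d, axes=0)
    return np.fft.fftshift(np.fft.ifft(shifted, axis=0), axes=0)

def ifft2c(kspace2d):
    """Centered 2D inverse FFT of one ky-kz plane (origin at array center)."""
    return np.fft.fftshift(np.fft.ifft2(np.fft.ifftshift(kspace2d)))

def fft2c(image2d):
    """Centered 2D forward FFT, used here only to synthesize example data."""
    return np.fft.fftshift(np.fft.fft2(np.fft.ifftshift(image2d)))

def elliptical_center(shape, fraction=0.3):
    """Fully sampled central ellipse.
    ASSUMPTION: semi-axes are `fraction` times the k-space half-extent."""
    ny, nz = shape
    ky = (np.arange(ny) - ny // 2) / (ny / 2)
    kz = (np.arange(nz) - nz // 2) / (nz / 2)
    return (ky[:, None] / fraction) ** 2 + (kz[None, :] / fraction) ** 2 <= 1.0

def zero_filled_recon(kspace2d, sampled):
    """Set un-acquired k-space locations to zero, then apply the 2D IFT."""
    return ifft2c(np.where(sampled, kspace2d, 0))

def phase_map_pc1(kspace2d, center):
    """PC-I: phase of the Hanning-windowed low-spatial-frequency image."""
    rows = np.flatnonzero(center.any(axis=1))
    cols = np.flatnonzero(center.any(axis=0))
    # ASSUMPTION: separable Hann window over the bounding box of `center`;
    # the zero end points of np.hanning are dropped so no sample is discarded.
    wy = np.hanning(rows.size + 2)[1:-1]
    wz = np.hanning(cols.size + 2)[1:-1]
    window = np.zeros(center.shape)
    window[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1] = np.outer(wy, wz)
    return np.angle(ifft2c(kspace2d * window * center))

def phase_map_pc2(image_estimate, threshold=0.2):
    """PC-II: phase of an image estimate, kept where magnitude > threshold * max."""
    magnitude = np.abs(image_estimate)
    mask = magnitude > threshold * magnitude.max()
    # ASSUMPTION: phase outside the mask is set to zero.
    return np.where(mask, np.angle(image_estimate), 0.0), mask

# Synthetic example: Gaussian magnitude with a constant 0.5 rad phase.
rng = np.random.default_rng(0)
y, z = np.mgrid[-32:32, -32:32]
truth = np.exp(-(y**2 + z**2) / (2 * 4.0**2)) * np.exp(1j * 0.5)
kspace = fft2c(truth)
center = elliptical_center(kspace.shape)
# ASSUMPTION: uniform random sampling outside the ellipse, for illustration only.
sampled = center | (rng.random(kspace.shape) < 0.2)
zero_filled = zero_filled_recon(kspace, sampled)
phase_pc1 = phase_map_pc1(kspace, center)
# The zero-filled image is a stand-in for the non-PC CS estimate (not shown).
phase_pc2, support = phase_map_pc2(zero_filled)
reduction_factor = sampled.size / sampled.sum()
```

For a 3D data set, `ifft_readout` would be applied once, after which each plane `hybrid[x]` along the first axis can be reconstructed independently with the 2D routines above. Dropping the end points of the Hann window is a design choice of this sketch so that the outermost fully sampled k-space samples still contribute to the phase estimate.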