- Citation Author(s):
- Submitted by:
- Jianjun HE
- Last updated:
- 23 February 2016 - 1:43pm
- Document Type:
- Presentation Slides
In spatial audio analysis-synthesis, one of the key issues is to decompose a signal into primary and ambient components based on their spatial features. Stereo audio signals are often modeled as a linear mixture of primary and ambient components. Existing approaches such as principal component analysis (PCA) and least squares (LS) have been widely employed to extract primary and ambient components from stereo signals. However, the performance of these approaches in primary-ambient extraction (PAE), and how they compare with one another, have not been well studied. In this report, we show that the existing approaches can be generalized into a linear estimation framework. Under this framework, we propose a systematic set of performance measures that identify the components making up the extraction error. Based on the linear estimation framework and these performance measures, we present a comparative study of linear-estimation-based PAE approaches, including the existing PCA and LS approaches and two proposed LS variants that target more practical performance objectives. Experimental results are provided to validate the relationships and differences among these approaches.
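To illustrate how PCA-based PAE fits a linear estimation framework, the following is a minimal Python sketch, not the report's implementation. It assumes the common amplitude-panned stereo model x_c = p_c + a_c (c = 0, 1) with p_1 = k p_0 and uncorrelated ambient components of equal power. The panning factor k is taken from the principal eigenvector of the 2x2 zero-lag covariance matrix, and each primary estimate is then a fixed weighted sum of the two input channels. The function name `pca_pae` is hypothetical.

```python
import math
import random  # used only by the usage example below


def pca_pae(x0, x1):
    """Zero-lag PCA-based primary-ambient extraction for one stereo frame.

    Assumed model: x_c = p_c + a_c, p1 = k * p0 (amplitude panning),
    ambient components uncorrelated with equal power.
    Returns (p0_hat, p1_hat, a0_hat, a1_hat, k_hat).
    """
    n = len(x0)
    # Sample zero-lag auto- and cross-correlations of the two channels.
    r00 = sum(v * v for v in x0) / n
    r11 = sum(v * v for v in x1) / n
    r01 = sum(u * v for u, v in zip(x0, x1)) / n
    if r01 == 0:
        raise ValueError("channels are uncorrelated at zero lag")
    # Slope of the principal eigenvector [1, k] of [[r00, r01], [r01, r11]].
    k = ((r11 - r00) + math.sqrt((r11 - r00) ** 2 + 4 * r01 ** 2)) / (2 * r01)
    # Linear estimation: project each sample pair onto the direction [1, k].
    w0, w1 = 1 / (1 + k * k), k / (1 + k * k)
    p0 = [w0 * u + w1 * v for u, v in zip(x0, x1)]
    p1 = [k * v for v in p0]
    # Ambient estimates are the residuals of the primary estimates.
    a0 = [u - v for u, v in zip(x0, p0)]
    a1 = [u - v for u, v in zip(x1, p1)]
    return p0, p1, a0, a1, k


# Usage: a primary panned with k = 2 plus weak independent ambience.
rng = random.Random(0)
p = [rng.gauss(0, 1) for _ in range(5000)]
x0 = [v + rng.gauss(0, 0.1) for v in p]
x1 = [2 * v + rng.gauss(0, 0.1) for v in p]
p0_hat, p1_hat, a0_hat, a1_hat, k_hat = pca_pae(x0, x1)
```

Because the estimate is a fixed linear combination of the inputs, its error is easy to analyze: if k_hat equals the true k, the channel-0 primary error reduces to the ambient leakage (a_0 + k a_1) / (1 + k^2).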
However, the performance of PCA-based PAE is highly dependent on the assumptions of the input signal model, in which the primary components of the two stereo channels are assumed to be fully correlated at zero lag. One of the most frequently encountered cases, in which the primary components are only partially correlated at zero lag (the primary-complex case), has not been well studied. To alleviate the performance degradation in this case, we propose time-shifted PCA-based PAE. This approach time-shifts the input signal according to its estimated inter-channel time difference (ICTD) before the linear estimation step of PAE. In our simulations and informal listening tests, the time-shifted approach was found to outperform conventional PCA-based PAE methods.
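The time-shifted variant can be sketched by wrapping zero-lag PCA in an ICTD alignment step. This is a minimal sketch under assumptions not stated in the report: the ICTD is estimated as the lag that maximizes the magnitude of the circular cross-correlation within a hypothetical search range `max_lag`, channel 1 is shifted circularly (edge effects are ignored), and the channel-1 primary estimate is shifted back after extraction. All function names are hypothetical.

```python
import math
import random  # used only by the usage example below


def circular_shift(x, s):
    """Return y with y[i] = x[(i + s) % len(x)]."""
    n = len(x)
    return [x[(i + s) % n] for i in range(n)]


def estimate_ictd(x0, x1, max_lag):
    """Lag in samples maximizing |sum_i x0[i] * x1[(i + lag) % n]|."""
    n = len(x0)

    def xcorr(lag):
        return abs(sum(x0[i] * x1[(i + lag) % n] for i in range(n)))

    return max(range(-max_lag, max_lag + 1), key=xcorr)


def pca_primary(x0, x1):
    """Zero-lag PCA primary estimates under the amplitude-panning model."""
    n = len(x0)
    r00 = sum(v * v for v in x0) / n
    r11 = sum(v * v for v in x1) / n
    r01 = sum(u * v for u, v in zip(x0, x1)) / n
    if r01 == 0:
        raise ValueError("channels are uncorrelated after alignment")
    k = ((r11 - r00) + math.sqrt((r11 - r00) ** 2 + 4 * r01 ** 2)) / (2 * r01)
    p0 = [(u + k * v) / (1 + k * k) for u, v in zip(x0, x1)]
    return p0, [k * v for v in p0]


def shifted_pca_pae(x0, x1, max_lag):
    """Time-shifted PCA-based PAE: align channel 1 by the estimated ICTD,
    run zero-lag PCA, then undo the shift on the channel-1 primary."""
    d = estimate_ictd(x0, x1, max_lag)
    p0, p1_aligned = pca_primary(x0, circular_shift(x1, d))
    p1 = circular_shift(p1_aligned, -d)
    a0 = [u - v for u, v in zip(x0, p0)]
    a1 = [u - v for u, v in zip(x1, p1)]
    return p0, p1, a0, a1, d


# Usage: a white primary source reaching channel 1 five samples late.
rng = random.Random(1)
n, delay = 2000, 5
s = [rng.gauss(0, 1) for _ in range(n)]
x0 = [s[i] + rng.gauss(0, 0.1) for i in range(n)]
x1 = [0.8 * s[(i - delay) % n] + rng.gauss(0, 0.1) for i in range(n)]
p0_hat, p1_hat, a0_hat, a1_hat, ictd = shifted_pca_pae(x0, x1, max_lag=20)
```

For such a delayed white source the zero-lag cross-correlation is close to zero, so plain PCA attributes little of channel 1 to the primary; after alignment, the same PCA step recovers it.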
Our study also shows that the existing approaches are still unable to handle more complicated input signals. For example, the stereo signal model requires the primary components to come from a single direction, and the primary and ambient components are characterized only by their inter-channel correlations. These remaining problems motivate further study on formulating more realistic signal models and on classifying the input signal into a specific model. With proper extraction of the primary and ambient components, together with appropriate post-processing techniques, a more immersive 3D audio experience can be achieved.
This report was submitted to Nanyang Technological University in partial fulfillment of the qualification examination for the PhD candidature.