In the recent years, Independent Component Analysis has become a fundamental tool in signal and data processing, especially in the field of Blind Source Separation (BSS); under mild conditions, independent source signals can be recovered from mixtures of them by maximizing a so-called contrast function. Neither the mixing system nor the original sources are needed for that purpose, justifying the "blind" term. Among the existing BSS methods is the class of approaches maximizing Information-Theoretic Criteria (ITC), that rely on Rényi's entropies, including the well-known Shannon and Hartley entropies. These ITC are maximized via adaptive optimization schemes. Two major issues in this field are the following: i) Are ITC really contrast functions? and ii) As most of the algorithms look in fact for a local maximum point, what about the relevance of these local optima from the BSS point of view? Even though there are some partial answers to these questions in the literature, most of them are based on simulations and conjectures; formal developments are often lacking. This thesis aims at filling this lack as well as providing intuitive justifications, too. The BSS problem is stated in Chapter 1, and viewed under the information theory angle. The two next chapters address specifically the above questions: Chapter 2 discusses the contrast function property of ITC while the possible existence of spurious local maximum points in ITC is the purpose of Chapter 3. Finally, Chapter 4 deals with a range-based criterion, the only “entropy-based” contrast function which is discriminant, i.e. free from spurious local maxima. The interest of this approach is confirmed by testing the proposed technique on various examples, including the MLSP 2006 data analysis competition benchmark; our method outperforms the previously obtained results on large-scale and noisy mixture samples obtained through ill-conditioned mixing matrices.
Abstract xiii
Acknowledgments xv
Acronyms xvii
List of Notation xix
Introduction xxiii
1 BSS and its relationship to ICA 1
1.1 BSS: Motivation 2
1.2 ICA: an ecient tool for BSS 7
1.2.1 PD-equivalency and Non-mixing matrices 7
1.2.2 Independence and ICA 11
1.2.3 ICA and BSS 12
1.3 Independence measures 14
1.3.1 Divergence measures between densities 14
1.3.1.1 KL properties 15
1.3.1.2 From KL to mutual information 16
1.3.2 Other measures of independence 17
1.4 Extraction schemes and contrast function de nition 18
1.4.1 Extraction schemes 19
1.4.1.1 Simultaneous separation 19
1.4.1.2 Deation separation 19
1.4.1.3 Partial separation 19
1.4.2 Contrast functions 20
1.4.2.1 Simultaneous separation 20
1.4.2.2 Deation separation 20
1.4.2.3 Partial separation 21
1.5 Whitening preprocessing and geodesic search 22
1.5.1 Whitening 22
viii CONTENTS
1.5.2 Orthogonal contrast functions 24
1.5.3 Angular parametrization in the K=2 case 25
1.5.4 Manifold-constrained problem and geodesic optimization 25
1.6 Adaptive maximization of contrast functions 27
1.7 BSS and information measures 29
1.7.1 Information measure 30
1.7.1.1 Coding using Hartley's formula 30
1.7.1.2 Information and entropy 32
1.7.1.3 Extension to continuous random variables 33
1.7.1.4 Information gain and Mutual information 34
1.7.2 Entropy as a \complexity measure" 37
1.7.3 Generalized information measures 40
1.8 Issues and objectives of the Thesis 42
2 Contrast property of Entropic criteria 45
2.1 Some tools for building contrast functions 47
2.1.1 From orthogonal deation to orthogonal partial separation 47
2.1.2 Huber's superadditivity 49
2.2 Shannon's entropy-based contrast 51
2.2.1 Simultaneous approach 51
2.2.2 Deation approach 53
2.2.2.1 The contrast property 53
2.2.2.2 Non-mixing local maxima 57
2.2.3 Partial approach 58
2.3 Minimum range contrast 59
2.3.1 Support and Brunn-Minkowski Inequality 59
2.3.2 Properties of the range 60
2.3.3 Simultaneous approach 61
2.3.4 Deation approach 62
2.3.5 Partial approach 64
2.3.6 Support versus Range 65
2.3.7 A tool for building a D-BSS contrast based on Huber 65
2.4 Renyi's entropy contrast 66
2.4.1 Taylor development of Renyi's entropy 67
2.4.2 Deation approach 70
2.4.3 Simultaneous approach 71
2.4.4 Partial approach 72
2.4.5 Some counter-examples 72
2.4.5 Some counter-examples 72
2.5 Conclusion of the chapter 78
2.5.1 Summary of results 78
2.5.2 Use of Renyi entropies in blind separation/deconvolution 79
2.6 Appendix: Proofs of results of the Chapter 84
2.6.1 Proof of Corollary 6 (wording p. 50) 84
2.6.2 Proof of Property 4 (wording p. 57) 84
2.6.3 Proof of Theorem 11 (wording p. 58) 86
2.6.4 Proof of Lemma 6 (wording p. 60) 87
2.6.5 Proof of Theorem 14 (wording p. 62) 89
2.6.6 Proof of Theorem 15 (wording p. 63) 90
2.6.7 Proof of Theorem 16 (wording p. 64) 91
2.6.8 Convolution of Gaussian kernels (wording p. 67) 92
2.6.9 Proof of Lemma 7 (wording p. 71) 93
2.6.10 Proof of Lemma 8 (wording p. 71) 94
2.6.11 Proof of Lemma 9 (wording p. 72) 95
2.6.12 Proof of Lemma 10 (wording p. 72) 95
3 Discriminacy property of Entropic contrasts 97
3.1 Concept de nition and justi cation 99
3.2 Discriminacy of Shannon's entropy 100
3.2.1 Informal approach : the multimodal case 102
3.2.1.1 Location of the entropy minima 102
3.2.1.2 Modality and entropy minima 107
3.2.2 Formal analysis using a Taylor expansion 112
3.2.2.1 Simultaneous (mutual information) 112
3.2.2.2 Deation (negentropy) 119
3.2.3 Formal analysis using entropy approximation 120
3.2.3.1 Entropy bounds on multimodal pdf 121
3.2.3.2 Mixing local minima in multimodal BSS 125
3.2.3.3 Local minimum points of H(wU) 126
3.2.3.4 Local minimum points of h(wS) 129
3.2.3.5 Complementary observations 133
3.2.4 Cumulant-based versus Information-theoretic approaches 133
3.3 Discriminacy of Renyi's entropy 138
3.4 Discriminacy of the minimum range approach 139
3.4.1 Preliminaries : K = 2 case 139
3.4.2 Deation approach 140
3.4.2.1 Small variation analysis on manifolds 140
3.4.2.2 Piecewise g-convexity on hyper-sphere 143
3.4.2.3 Second order approach 149
3.4.3 Simultaneous approach 150
3.4.4 Partial approach 152
3.4.5 Givens trajectories and discriminacy 155
3.5 Discriminacy of the minimum support approach 160
3.6 Summary of results and contrast sets con guration 163
3.7 Conclusion of the chapter 164
3.7.1 Summary of results 164
3.7.2 Comparison with existing results 166
3.8 Appendix: Proofs of results of the Chapter 167
3.8.1 Proof of relation (3.14) (wording p. 114) 167
3.8.2 Proof of Lemma 11 (wording p. 116) 169
3.8.3 Proof of Lemma 12 (wording p. 116) 170
3.8.4 Proof of Lemma 13 (wording p. 121) 171
3.8.5 Proof of Lemma 14 (wording p. 126) 173
3.8.6 Proof of Lemma 15 (wording p. 130) 174
3.8.7 Proof of Lemma 16 (wording p. 140) 177
3.8.8 Proof of Theorem 18 (wording p. 141) 178
3.8.9 Proof of Lemma 17 (wording p. 149) 179
3.8.10 Proof of Lemma 19 (wording p. 151) 180
3.8.11 Proof of Corollary 15 (wording p. 152) 181
3.8.12 Proof of Lemma 20 (wording p. 152) 181
3.8.13 Proof of Lemma 21 (wording p. 152) 181
3.8.14 Proof of Lemma 23 (wording p. 154) 185
3.8.15 Proof of Corollary 18 (wording p. 155) 186
4 Minimum output range methods 187
4.1 Minimum range approach 188
4.1.1 Interpretation of the simultaneous approach 189
4.1.1.1 Interpretation in the mixture space 189
4.1.1.2 Interpretation in the output space 190
4.1.2 Interpretation of the deation approach 191
4.2 Range estimation 198
4.2.1 Some existing methods for endpoint estimation 198
4.2.2 Existing Range estimation 200
4.2.2.1 Support estimation via density estimation 202
4.2.2.2 Range estimation for BSS application 203
4.2.3 Quasi-range based approach 203
4.2.3.1 The observed range estimator 204
4.2.3.2 The m-averaged quasi-range estimator 208
4.2.3.3 Robustness of minimum-support ICA algorithms 213
4.3 Range minimization algorithm: SWICA 220
4.3.1 Algorithm 222
4.3.2 Performance analysis of SWICA for OS-based range estimators 224
4.4 Extensions of the minimum range 226
4.4.1 The problem of blind images separation : NOSWICA 227
4.4.1.1 SWICA for correlated images separation 227
4.4.1.2 NOSWICA: a non-orthogonal extension 232
4.4.2 Application to lower- or upper-bounded sources with possible in nite range 238
4.4.2.1 LABICA 238
4.4.2.2 Practical estimation 241
4.4.2.3 Optimization scheme for \hard" ICA problems 242
4.4.2.4 LABICA applied to MLSP'06 benchmark 246
4.5 Conclusion of the Chapter 250
4.6 Appendix 250
4.6.1 Proof of relation (4.52) 250
4.6.2 Expectation of the order statistics cdf di erences 251
4.6.3 Variance of the order statistics cdf di erences 253
Conclusion 255
Appendix A: Announcement of the IEEE MLSP 2006 data analysis competition 275
Appendix B: Author's publication list 279