# Schedule of the Workshop on Low Complexity Models

## Monday, February 15

10:15 - 10:45 |
Registration & Welcome coffee |

10:45 - 10:55 |
Opening remarks |

10:55 - 11:40 |
Rémi Gribonval: Projections, Learning, and Sparsity for Efficient Data Processing |

11:45 - 12:30 |
Jean-Luc Starck: Low Complexity Models to Map the Dark Matter Mass of the Universe |

12:35 - 14:20 |
Lunch break |

14:20 - 15:05 |
John Wright: Nonconvex Recovery of Low-Complexity Models |

15:10 - 15:55 |
Laurent Jacques / Valerio Cambareri: Small width, low distortions: quantized random projections of low-complexity signal sets |

16:00 - 16:30 |
Tea and cake |

16:30 - 17:15 |
Xiadong Li: Phase Retrieval: from Convex to Nonconvex Methods |

17:20 - 18:05 |
Ke Wei: Low rank matrix recovery: From iterative hard thresholding to Riemannian optimization |

afterwards |
Reception |

## Tuesday, February 16

9:35 - 10:20 |
André Uschmajew: Finding a low-rank basis in a matrix subspace |

10:25 - 10:55 |
Group photo and coffee break |

10:55 - 11:40 |
Bart Vandereycken: Manifold optimization for low rank matrix and tensor completion |

11:45 - 12:30 |
Lek-Heng Lim: Blind multilinear identification and tensor nuclear norm |

12:35 - 14:20 |
Lunch break |

14:20 - 16:00 |
Poster Session |

16:00 - 16:30 |
Tea and cake |

16:30 - 17:15 |
Jan Vybiral: Non-asymptotic analysis of l_{1}-support vector machines |

17:20 - 18:05 |
Sjoerd Dirksen: On the gap between restricted isometry properties and sparse recovery conditions |

## Wednesday, February 17

9:35 - 10:20 |
Gabriel Peyré: Exact Support Recovery for Sparse Spikes Deconvolution |

10:25 - 10:55 |
Coffee break |

10:55 - 11:40 |
Lorenzo Rosasco: Less is more: optimal learning with stochastic projection regularization |

11:45 - 12:30 |
Anders Hansen: What is the Solvability Complexity Index (SCI) of your problem? - On the SCI Hierarchy and the foundations of computational mathematics |

12:35 - 14:20 |
Lunch break |

14:20 - 15:05 |
Ben Adcock: Sensing and parallel acquisition |

afterwards |
Excursion to Ludwig van Beethoven's birthplace (Beethoven-Haus, Bonngasse 24-26)We walk together from HIM, leaving directly after Ben Adcock's talk. |

18:00 - |
Conference dinner in the Restaurant Meyer's (Clemens-August-Str. 51a) |

## Thursday, February 18

9:35 - 10:20 |
Fabio Nobile: Polynomial Approximation of Random PDEs by discrete least squares with observations in random points |

10:25 - 10:55 |
Coffee break |

10:55 - 11:40 |
Albert Cohen: Data assimilation in reduced modeling |

11:45 - 12:30 |
Simone Brugiapaglia: CORSING: sparse approximation of PDEs based on compressed sensing |

12:35 - 14:20 |
Lunch break |

14:20 - 15:05 |
Jelani Nelson: Optimal space and fast heavy hitters with high probability |

15:10 - 15:55 |
Martin Lotz: A blind spot in the probabilistic complexity analysis of algorithms |

16:00 - 16:30 |
Tea and cake |

16:30 - 17:15 |
Karin Schnass: Sparsity, Co-sparsity and Learning |

## Friday, February 19

9:35 - 10:20 |
David Gross: Diamond norm as improved regularizer for low rank matrix recovery |

10:25 - 10:55 |
Coffee break |

10:55 - 11:40 |
Felix Krahmer: Matrix factorization with binary components - uniqueness in a randomized model |

11:45 - 12:30 |
Mahdi Soltanolkotabi: Finding low complexity models without the shackles of convexity |

12:35 - |
Lunch break - end of workshop |

# Abstracts

(Underlined titles can be clicked for the video recording)

## Ben Adcock: Sensing and parallel acquisition

Parallel sensing systems are found in various applications where a single sensor may not collect enough data for an adequate reconstruction. These systems allow simultaneous data acquisition in multiple sensors, thus alleviating such problems by providing more overall measurements. Applications include parallel MRI, multi-view imaging, seismic imaging and system identification. In this talk I will consider the combination of compressed sensing with parallel acquisition. I will assess the theoretical improvements of such systems by providing recovery guarantees for which, subject to appropriate conditions, the number of measurements required per sensor decreases linearly with the total number of sensors. I will specifically consider two different sampling scenarios - distinct (i.e. independent sampling in each sensor) and identical (i.e. dependent sampling between sensors) - and a general mathematical framework that allows for a wide range of different sensing matrices. Several distinct consequences of this work are a first theoretical justification of the use of compressed sensing in parallel MRI, and a sparse, discrete generalized sampling theorem analogous to the classical result of Papoulis.

This is joint work with Il Yong Chun (Purdue).

## Simone Brugiapaglia: CORSING: sparse approximation of PDEs based on compressed sensing

We present the CORSING (COmpRessed SolvING) method, a novel technique for the numerical approximation of PDEs based on compressed sensing.

The main idea is to establish an analogy between the sampling of a signal and the Petrov-Galerkin discretization of a PDE, where the bilinear form associated with the weak formulation virtually plays the role of a measuring device. Making use of concepts and tools from compressed sensing, the CORSING method builds a reduced discretization of the PDE and computes an approximate solution via sparse recovery.

A theoretical analysis of the CORSING method is presented, along with numerical experiments that confirm the robustness and reliability of the proposed strategy applied to classical equations of fluid dynamics.

Joint work with Stefano Micheletti (Politecnico di Milano), Fabio Nobile (EPFL) and Simona Perotto (Politecnico di Milano).

We consider the problem of optimal recovery of an element u in a Hilbert space H from a finite number of linear measurements. Motivated by reduced modeling for solving parametric partial differential equations, the a-priori additional information about u is in the form of how well it can be approximated by a certain known subspace of given dimension (reduced bases, POD). Our work make the distinction between the strategy where only one subspace is exploited, and the multi-space strategy in which we combine the available information for several subspaces. Algorithms that yield near optimal recovery bounds are proposed.

This is a joint work with Peter Binev, Wolfgang Dahmen, Ron DeVore, Guergana Petrova and Przemyslaw Wojtaszczyk.

## Sjoerd Dirksen: On the gap between restricted isometry properties and sparse recovery conditions

In this talk I will consider the problem of recovering sparse vectors from underdetermined linear measurements via l_{p}-constrained basis pursuit. Previous analyses of this problem have suggested that two phenomena occur if p is not equal to 2:

1) one may need substantially more than s log(n/s) measurements (optimal for p=2) for uniform recovery of all s-sparse vectors;

2) the matrix that achieves recovery with the optimal number of measurements may not be Gaussian (as for p=2).

These results are based on analyses via generalized restricted isometry properties (RIP). I will present a new analysis which shows that in fact neither 1) nor 2) occurs. This establishes a sizeable gap between the optimal conditions under which restricted isometry properties hold and under which one can uniformly recover all s-sparse vectors.

As an application, I will show that one can obtain a consistent reconstruction from uniform scalar quantized Gaussian (as well as heavier-tailed) measurements in the optimal measurement regime.

Based on joint work with Guillaume Lecué (Ecole Polytechnique) and Holger Rauhut (RWTH Aachen)

## Rémi Gribonval: Projections, Learning, and Sparsity for Efficient Data Processing

The talk will discuss recent generalizations of sparse recovery guarantees and compressive sensing to the context of machine learning. Assuming some low-dimensional model on the probability distribution of the data, we will see that in certain scenarios it is indeed possible to (randomly) compress a large data-collection into a reduced representation, of size driven by the complexity of the learning task, while preserving the essential information necessary to process it. Two case studies will be given: compressive clustering, and compressive Gaussian Mixture Model estimation, with an illustration on large-scale model-based speaker verification. Time allowing, recent results on compressive spectral clustering will also be discussed.

In the common approach to low-rank matrix recovery, one uses the nuclear norm as a convex surrogate for rank. Geometric proof techniques like Tropp's Bowling scheme or Mendelson's small ball method bound the reconstruction error in terms of the descent cone of the norm at the matrix that is to be recovered. Moreover, these arguments suggest that the error would decrease if another convex function be used, which has a smaller descent cone at the set of matrices of interest. We construct such an improved convex function based on the diamond norm, a concept well-known in quantum information theory and operator theory. We demonstrate numerically that the diamond norm indeed outperforms the nuclear norm in a number of applications. The diamond norm is defined for matrices that can be interpreted as order-4 tensors and it turns out that the above condition depends crucially on that tensorial structure. In this sense, this work touches on an aspect of the notoriously difficult tensor completion problem.

This talk addresses some of the fundamental barriers in the theory of computations. Many computational problems can be solved as follows: a sequence of approximations is created by an algorithm, and the solution to the problem is the limit of this sequence (think about computing eigenvalues of a matrix for example). However, as we demonstrate, for several basic problems in computations such as computing spectra of operators, solutions to inverse problems, roots of polynomials using rational maps, solutions to convex optimization problems, imaging problems etc. such a procedure based on one limit is impossible. Yet, one can compute solutions to these problems, but only by using several limits. This may come as a surprise, however, this touches onto the boundaries of computational mathematics. To analyze this phenomenon we use the Solvability Complexity Index (SCI). The SCI is the smallest number of limits needed in order to compute a desired quantity. The SCI phenomenon is independent of the axiomatic setup and hence any theory aiming at establishing the foundations of computational mathematics will have to include the so called SCI Hierarchy. We will specifically discuss the vast amount of classification problems in this non-collapsing complexity/computability hierarchy that occur in inverse problems, compressed sensing problems, l1 and TV optimization problems, spectral problems, PDEs and computational mathematics in general.

Compressed sensing theory (CS) shows that a "signal" can be reconstructed from a few linear, and most often random, observations. Interestingly, this reconstruction is made possible if the number of observations (or measurements) is adjusted to the (expected) intrinsic complexity of the signal (e.g., its sparsity). This principle is thus a generalization of the Shannon-Nyquist sampling theorem where the sampling rate is set by the signal frequency bandwidth. However, a significant aspect of CS systems is quantization (or digitization) of the acquired observations before further processing or for the purpose of transmission and compression.

Quantization is a non-linear transformation that affects compressive observations of signals, i.e., their projections by a random sensing matrix. To assess its impact on signal reconstruction methods, e.g., in order to bound their estimation error, we must therefore understand how the geometry of low-complexity signal sets, such as sparse vectors or low-rank matrices, is preserved in their quantized and randomly projected domain.

Inspired by recent discoveries in 1-bit quantized compressed sensing, this presentation will show how scalar quantization leads to new non-linear l_{1}/l_{2}-embeddings of low-complexity vector sets. These embeddings differ from the common restricted isometry property (RIP) used in "linear" compressed sensing in that they contain an additive distortion term, that is, they display a quasi-isometric nature directly connected to the information loss induced by quantization. In particular, when combined with sub-Gaussian random matrices, quantized random projections define a quasi-isometry with small distortions as soon as the number of measurements grows like the intrinsic complexity of the vector set, as measured by its Gaussian mean width. We will finally explain how these embeddings allows us to bound the reconstruction error of a variant of the basis pursuit program, i.e., to show that its estimation error decays with the number of observations provided the solution is consistent with the quantized signal measurements.

## Felix Krahmer: Matrix factorization with binary components - uniqueness in a randomized model

Motivated by an application in computational biology, we consider low-rank matrix factorization with {0,1}-constraints on the first of the factors and optionally convex constraints on the second one. For not too large ranks, it has been shown by Hein et al. (2013) that one can provably recover the underlying factorization, provided there exists a uniquesolution. We show that when choosing a sparse Bernoulli random model for the binary factor, there is a unique solution with high probability. At the core of our proof is a new asymmetric Littlewood-Offord inequality.

This is joint work with Matthias Hein and David James.

In phase retrieval, one aims to recover a signal from magnitude measurements. In the literature, an effective SDP algorithm, referred to as PhaseLift, was proposed with numerical success as well as strong theoretical guarantees. In this talk, I will first introduce some recent theoretical developments for PhaseLift, which demonstrate the applicability and adaptivity of this convex method.

Although convex methods are provably effective and robust, the computational complexity may be relatively high. Moreover, there is often an issue of storage to solve the lifted problem. To address these issues, we introduce a nonconvex optimization algorithm, named Wirtinger flow, with theoretically guaranteed performance. It is much more efficient than convex methods in terms of computation and memory. Finally, I will introduce how to modify Wirtinger flow when the signal is known to be sparse, in order to improve the accuracy of the recovery.

## Lek-Heng Lim: Blind multilinear identification and tensor nuclear norm

In earlier work with Comon on blind identification, we observed that if on the spectral norm unit ball, the nuclear norm of a d-tensor is bounded by its rank (well-known to be true for d = 1 and 2), then we may use a ratio of nuclear norm and spectral norm as a proxy for both rank and border rank. Unfortunately we found that this assertion is not true for any d > 2. In this talk, we will discuss this and other properties of nuclear norm for higher-order tensors: computational complexity of its weak membership problem and its approximation, analogue of Banach's theorem/Comon's conjecture for nuclear norm, existence of a nuclear norm attaining tensor decomposition, dependence of nuclear norm on the choice of base field, relation to entanglement and separability of quantum states, and as a measure of numerical stability. We speculate that with better understanding, the nuclear norm of higher-order tensors may yet play a useful role in signal processing. This is joint work with Shmuel Friedland.

It is a well-established fact that, in practical settings, iterative numerical algorithms outperform their theoretical complexity guarantees by orders of magnitude; an early example of this is the simplex algorithm. This gap becomes smaller, but does not disappear, in the setting of average-case or the celebrated smoothed analysis of algorithms: while optimization algorithms are shown to run in polynomial time, in practice they run in constant time, and while the average running time of power iteration is infinite, in practice it is polynomial. We present the concept of weak average-case complexity, which is based on the observation that removing a small, "numerically invisible" set from the analysis, is enough to get complexity bounds that are more in line with experience. A few applications and challenges in this field are presented.

## Jelani Nelson: Optimal space and fast heavy hitters with high probability

In turnstile streaming heavy hitters, the goal is to identify high frequency items in a stream of item insertions and deletions. Such sketches can also be used to obtain good "for each" sparse recovery schemes in compressed sensing. The CountMin sketch of Cormode and Muthukrishnan provides a solution using optimal O(lg n) words of space with O(lg n) update time and very slow O(n lg n) query, where n is the universe size. The algorithm succeeds whp 1-1/poly(n). The query time can be brought down to polylg n using the "dyadic trick", at the expense of increasing both space and update time to a suboptimal lg^{2} n.

We show this sacrifice is unnecessary. We obtain lg n space, lg n update time, and polylg n query time simultaneously. Our main technique is a reduction from this problem to a certain graph partitioning problem, which we solve using spectral methods.

Joint work with Kasper Green Larsen, Huy Le Nguyen, and Mikkel Thorup.

We consider a general problem F(u,y)=0 where u is the unknown solution, possibly Hilbert space valued, and y a set of uncertain parameters. We specifically address the situation in which the parameter-to-solution map u(y) is smooth, however y could be very high (or even infinite) dimensional. In particular, we are interested in cases in which F is a partial differential operator, u a Hilbert space valued function and y a distributed, space and/or time varying, random field.

We aim at reconstructing the parameter-to-solution map u(y) from noise-free or noisy observations in random points by discrete least squares on polynomial spaces associated to downward closed index sets. The noise-free case is relevant whenever the technique is used to construct metamodels, based on polynomial expansions, for the output of computer experiments. In the case of PDEs with random parameters, the metamodel is then used to approximate statistics of the output quantities.

We discuss the stability of discrete least squares on random points and present error bounds both in expectation and probability for a priori chosen index sets. We will also discuss theoretical bounds on the minimal error achievable if an optimal choice of the index set for a given sample is performed among all possible downward closed index sets of given cardinality. Our theoretical bound sets a benchmark for adaptive-type algorithms that aim at discovering the optimal polynomial space.

This is a joint work with A. Cohen, A. Chkifa, G. Migliorati and R. Tempone.

In this talk, I study sparse spikes deconvolution over the space of measures, following several recent works (see for instance [2,3]). For non-degenerate sums of Diracs, we show that, when the signal-to-noise ratio is large enough, total variation regularization of measures (which the natural extension of the l1 norm of vectors to the setting of measures) recovers the exact same number of Diracs. We also show that both the locations and the heights of these Diracs converge toward those of the input measure when the noise drops to zero. The exact speed of convergence is governed by a specific dual certificate, which can be computed by solving a linear system. These results extend those obtained by [2]. We also draw connections between the performances of sparse recovery on a continuous domain and on a discretized grid. When the measure is positive, it is known that l1-type methods always succeed when there is no noise. We show that exact support recovery is still possible when there is noise. The signal-to-noise ratio should then scale like 1/t^{2N-1} where there are N spikes separated by a distance t. This reflects the intrinsic explosion of the ill-posedness of the problem [4]. This is joint work with Vincent Duval and Quentin Denoyelle, see [1,4] for more details.

Bibliography:

[1] V. Duval, G. Peyré, Exact Support Recovery for Sparse Spikes Deconvolution, Foundation of Computational Mathematics, 15(5), pp. 1315–1355, 2015.

[2] E. J. Candès and C. Fernandez-Granda. Towards a mathematical theory of super-resolution. Communications on Pure and Applied Mathematics, 67(6), 906-956, 2013.

[3] K. Bredies and H.K. Pikkarainen. Inverse problems in spaces of measures. ESAIM: Control, Optimisation and Calculus of Variations, 19:190-218, 2013.

[4] Q. Denoyelle, V. Duval, G. Peyré. Support Recovery for Sparse Deconvolution of Positive Measures. Preprint Arxiv:1506.08264, 2015.

## Lorenzo Rosasco: Less is more: optimal learning with stochastic projection regularization

In this talk, we will discuss the generalization properies of commonly used techniques to scale up kernel methods and Gaussian processes. In particular, we will focus on data dependent and independent sub-sampling methods, namely Nystrom and random features, and study their generalization properties within a statistical learning theory framework. On the one hand we show that these methods can achieve optimal learning errors while being computational efficient. On the other hand, we show that subsampling can be seen as a form of stochastic projection regularization, rather than only a way to speed up computations.

Joint work with Raffaello Camoriano, Alessandro Rudi.

## Karin Schnass: Sparsity, Co-sparsity and Learning

While (synthesis) sparsity is by now a well-studied low complexity model for signal processing, the dual concept of (analysis) co-sparsity is much less invesitigated but equally promising. We will first give a quick overview over both models and then turn to optimisation formulations for learning sparsifying dictionaries as well as co-sparsifying (analysis) operators. Finally we will discuss the resulting learning algorithms and ongoing research directions.

Joint work with Michi Sandbichler.

## Mahdi Soltanolkotabi: Finding low complexity models without the shackles of convexity

In many applications, one wishes to estimate a large number of parameters from highly incomplete data samples. Low-dimensional models such as sparsity, low-rank, etc provide a principled approach for addressing the challenges posed by such high-dimensional data. The last decade has witnessed a flurry of activity in understanding when and how it is possible to find low complexity models via convex relaxations. However, the computational cost of such convex schemes can be prohibitive. In fact, in this talk I will argue that over insistence on convex methods has stymied progress in many application domains. I will discuss my ongoing research efforts to “unshackle” such problems from the confines of convexity opening the door for new applications.

In the first part of the talk, I will discuss nonconvex schemes for recovering low rank matrices from a few random linear measurements by simple local search heuristics. Starting with rank one positive semidefinite matrices, I will present improved guarantees for Wirtinger flows (see also Xiaodong Li’s talk). I will demonstrate that these fast algorithms are effective without the need for any sophisticated initialization or gradient truncation schemes. I will then discuss a new algorithm dubbed Procrustes flow that allows low-rank matrix recovery for general rectangular matrices of any rank and size.

In the second part of the talk, I will discuss linear inverse problems more broadly and present a unified theoretical framework for sharply characterizing the convergence rates of various (non)convex iterative schemes for solving structured signal recovery problems. The focus will be on problem domains where carefully designed non-convex techniques are effective but convex counterparts are known to fail or yield suboptimal results.

This is based on joint work with collaborators who shall be properly introduced during the talk.

## Jean-Luc Starck: Low Complexity Models to Map the Dark Matter Mass of the Universe

Weak gravitational lensing provides a unique way of mapping directly the dark matter in the universe. Gravitational lensing is the process in which light from distant galaxies is bent by the gravity of intervening mass in the universe as it travels towards us. The convergence field is derived from the gravitational shear of the background galaxy images.

The Euclid mission is the next major spatial survey of the European Spatial Agency. The telescope launch is planned for 2020. The scientific goal of this mission will be to map the geometry of the dark matter and energy. An important point in this task is the galaxies shapes measurement, since it gives a direct mean to constraint the dark matter mass density. This will unavoidably be affected with the point spread function (PSF) of the instrument itself, which is his optical impulse response. The shape distortion due to the dark matter being very slight, It’s critical to account for the PSF accurately. In practice, isolated stars provide a measurement of the point spread function at a given location in the telescope field of view. Yet, this measurement will contain aliasing in EUCLID data due to the size of the CCD sensors. Thus we propose a super-resolution algorithm to recover the suitably sampled PSF, using the low resolution measurements available for the corresponding location in the field. This amounts to solving an inverse problem that we regularize using both matrix factorization and a sparsity.

Then we show how to reconstruct high resolution 2D mass maps from both shear and flexion measurements, and how this method can be extended to the third dimension, allowing us to reconstruct the 3D matter density contrast and therefore to put constraints on the masses of reconstructed structures.

## André Uschmajew: Finding a low-rank basis in a matrix subspace

For a given matrix subspace, a basis consisting of low-rank matrices can be in theory found using a greedy strategy. The greedy procedure involves the problem of finding a matrix of low(est) rank in the subspace, which is NP hard. We propose an algorithm with two phases: first, ranks for the basis are estimated by alternating between soft singular value thresholding and projection on the unit sphere in the given subspace. This procedure generalizes a recent algorithm for the sparse vector problem by Qu, Sun and Wright to low-rank matrices. In a second phase, the basis is constructed by replacing soft with hard thresholding according to the estimated ranks. Numerical experiments indicate that our algorithm can find a low-rank basis quite reliably in several situations. We also compare to a CP tensor decomposition algorithm, which is an alternative approach when the basis is known to consist of rank-one matrices. Our algorithm, however, does not require this information as an input. This is joint work with Yuji Nakatsukasa and Tasuku Soma (University of Tokyo).

## Bart Vandereycken: Manifold optimization for low rank matrix and tensor completion

The minimisation of a smooth objective function subject to (matrix) rank constraints can sometimes be very effectively solved by methods from Riemannian optimisation. This is for instance the case with the low-rank matrix and tensor completion problems However, the standard theory of Riemannian optimisation leaves some questions unanswered regarding its theoretical and practical application. I will focus on two such questions. The first is how the (Riemannian) metric has a significant impact on the convergence of the numerical methods which I will illustrate by deriving recovery guarantees assuming RIP. The second topic is rank adaptivity. In rank-constrained optimisation, the rank is typically not known and instead one is searching for the smallest rank satisfying a certain criterion, like a small residual. I will explain how the geometry of the tangent cone of the variety of matrices of bounded rank can be incorporated so as to obtain rank adaptive algorithms that stay true to the manifold philosophy.

## Jan Vybiral: Non-asymptotic analysis of l_{1}-support vector machines

Support vector machines are one of the most important machine learning algorithms for classification problems. Replacing l_{2} regularization by its l_{1} analogue leads to sparse classifiers of high-dimensional data so important in many real-life applications. We provide analysis of l_{1}-support vector machines with error guaranties valid also in the non-asymptotic regime. This is a joint work with Anton Kolleck (TU Berlin).

The iterative hard thresholding algorithms for low rank matrix recovery from linear measurements are presented, including the normalized iterative hard thresholding and conjugate gradient iterative hard thresholding algorithms. Then we will show that how the Riemannian optimization algorithms based on the embedded low rank matrix manifold can be interpreted as the iterative hard thresholding algorithms with subspace projections. Based on this connection, we establish the recovery guarantees of a class of the Riemannian optimization algorithms for low rank matrix recovery in terms of the restricted isometry constants of the sensing operators. Extensions to low rank matrix completion will also be discussed.

## John Wright: Nonconvex Recovery of Low-Complexity Models

We consider a *complete dictionary recovery* problem, in which we are given a data matrix Y, and the goal is to factor it into a product Y ~ A_{0}X_{0}, with A_{0} a square and invertible matrix and X_{0} is a sparse matrix of coefficients. This is an abstraction of the dictionary learning problem, in which we try to learn a concise approximation to a given dataset. While dictionary learning is widely (and effectively!) used in signal processing and machine learning, relatively little is known about it in theory. Much of the difficulty owes to the fact that standard learning algorithms solve nonconvex problems, and are difficult to analyze globally.

We describe an efficient algorithm which provably learns representations in which the matrix X_{0} has as many as O(n) nonzeros per column, under a suitable probability model for X_{0}. Previous efficient algorithms either only worked for very sparse instances (O(n^{1/2}) nonzeros per column) or required multiple rounds of SDP relaxation. Our results follow from a reformulation of the dictionary recovery problem as a nonconvex optimization over a high dimensional sphere. This particular nonconvex problem has a surprising property: once about n^{3} data samples have been observed, with high probability the objective function has no spurious local minima.

This geometric phenomenon, in which seemingly challenging nonconvex problems can be solved globally by efficient iterative methods, also arises in problems such as tensor decomposition and phase recovery from magnitude measurements. We sketch these connections, and illustrate our results with applications in microscopy and computer vision.

Joint work with Ju Sun and Qing Qu (Columbia).