# Schedule of the Workshop on Harmonic Analysis, Graphs and Learning

All talks take place in the lecture hall "Kleiner Hörsaal" (Wegelerstr. 10). The registration, the coffee breaks and the poster session are in the room "Zeichensaal" (Wegelerstr. 10).

## Monday, March 14

10:15 - 10:45 |
Registration & Welcome coffee |

10:45 - 10:55 |
Opening remarks |

10:55 - 11:40 |
Joel Tropp: Universality laws for randomized dimension reduction |

11:45 - 12:30 |
Rachel Ward: Some recent results in hashing and clustering |

12:35 - 14:20 |
Lunch break |

14:20 - 15:05 |
Gilad Lerman: Fast, Robust and Non-convex Subspace Recovery |

15:10 - 15:55 |
Andrea Montanari: Graph estimation via semi-definite programming |

16:00 - 16:30 |
Coffee break |

16:30 - 17:15 |
Naoki Saito: Matrix Data Analysis using Hierarchical Co-Clustering and Multiscale Basis Dictionaries on Graphs |

afterwards |
Reception at HIM (Poppelsdorfer Allee 45) |

## Tuesday, March 15

9:00 - 9:45 |
Radu Balan: Lipschitz analysis of the phase retrieval problem |

9:50 - 10:35 |
Vladimir Temlyakov: Dictionary descent in optimization |

10:40 - 11:10 |
Group photo and coffee break |

11:10 - 11:55 |
Afonso Bandeira: On solving certain semidefinite programs with low-rank solutions |

12:00 - 12:45 |
Stephan Dahlke: Shearlet Coorbit Spaces and Shearlet Groups |

12:50 - 14:20 |
Lunch break |

14:20 - 16:00 |
Poster Session |

16:00 - 16:30 |
Coffee break |

16:30 - 17:15 |
Gitta Kutyniok: Optimal Compressive Imaging of Fourier Data |

## Wednesday, March 16

9:00 - 9:45 |
Gabriele Steidl: Variational Models for Restoring Manifold-Valued Images |

9:50 - 10:35 |
Holger Boche: Banach-Steinhaus Theory Revisited: Lineability, Spaceability, and related Questions for Phase Retrieval Problems for Functions with finite Energy |

10:40 - 11:10 |
Coffee break |

11:10 - 11:55 |
Pascal Frossard: Sparse representation learning for graph signals |

12:00 - 12:45 |
Tomas Sauer: Learning sparse exponentials from discrete data |

12:50 - 14:20 |
Lunch break |

14:20 - 15:05 |
Robert Calderbank: GraphConnect: A Framework for Regularizing Neural Networks |

afterwards |
Excursion: Exhibition "M.C. Escher" (Max Ernst Museum Brühl) |

19:00 - |
Conference dinner in the Restaurant Meyer's (Clemens-August-Str. 51a) |

## Thursday, March 17

9:00 - 9:45 |
Joan Bruna: Signal Recovery from Scattering Convolutional Networks |

9:50 - 10:35 |
Helmut Bölcskei: Deep convolutional feature extraction: Theory and new architectures |

10:40 - 11:10 |
Coffee break |

11:10 - 11:55 |
Ronen Talmon: Hierarchical Coupled Geometry for Data-driven Analysis of Dynamical Systems |

12:00 - 12:45 |
Yue Lu: Dynamics of Online M-Estimation for High-Dimensional Regression: the ODE Approach |

12:50 - 14:20 |
Lunch break |

14:20 - 15:05 |
Matthias Hein: A nodal domain theorem and a higher-order Cheeger inequality for the graph p-Laplacian - CANCELLED - |

15:10 - 15:55 |
Akram Aldroubi: Iterative actions of normal operators |

16:00 - 16:30 |
Coffee break |

16:30 - 17:15 |
Pierre Vandergheynst: Random sampling of bandlimited signals on graphs |

## Friday, March 18

9:00 - 9:45 |
Alain Pajor: On the covariance matrix - CANCELLED - |

9:50 - 10:35 |
Christine De Mol: Combining information or forecasts for predicting economic variables |

10:40 - 11:10 |
Coffee break |

11:10 - 11:55 |
Philipp Grohs: Perspectives of Computational Harmonic Analysis in Numerics |

12:00 - 12:45 |
Ingrid Daubechies / Rima Alaifari: Phase retrieval in the infinite-dimensional setting |

12:50 - |
Lunch break - end of workshop |

# Abstracts

## Akram Aldroubi: Iterative actions of normal operators

Let A be a normal operator in a Hilbert space H, and let G ⊂ H be a countable set of vectors. We investigate the relations between A, G, and L that makes the system of iterations {A^{n}g : g ∈ G, 0 ≤ n < L(g)} complete, Bessel, a basis, or a frame for H. The problem is motivated by the dynamical sampling problem and is connected to several topics in functional analysis, including, frame theory and spectral theory. It also has relations to topics in applied harmonic analysis including, wavelet theory and time-frequency analysis.

## Radu Balan: Lipschitz analysis of the phase retrieval problem

The phaseless reconstruction problem can be stated as follows. Given the magnitudes of a vector coefficients with respect to a linear redundant system (frame), we want to reconstruct the unknown vector. This problem has first occurred in X-ray crystallography starting from the early 20th century.

The same nonlinear reconstruction problem shows up in speech processing, particularly in speech recognition.

In this talk we present a Lipschitz analysis of the problem as well as Cramer-Rao Lower Bounds that govern any reconstruction algorithm. In particular we show that the left inverse of the nonlinear analysis map can be extended to the entire measurement space with a small increase in the Lipschitz constant independent of the dimension of the space or the frame redundancy.

This is joint work with Dongmian Zou.

## Afonso Bandeira: On solving certain semidefinite programs with low-rank solutions

To address difficult optimization problems, convex relaxations based on semidefinite programming are now common place in many fields. Although solvable in polynomial time, large semidefinite programs tend to be computationally challenging. Over a decade ago, exploiting the fact that in many applications of interest the desired solutions are low rank, Burer and Monteiro proposed a heuristic to solve such semidefinite programs by restricting the search space to low-rank matrices. The accompanying theory does not explain the extent of the empirical success. We focus on Synchronization and Community Detection problems and provide theoretical guarantees shedding light on the remarkable efficiency of this heuristic. This is joint work with Nicolas Boumal and Vlad Voroninski.

## Holger Boche: Banach-Steinhaus Theory Revisited: Lineability, Spaceability, and related Questions for Phase Retrieval Problems for Functions with finite Energy

In the first part of the talk we study the divergence behavior of linear approximation processes in general Banach spaces. We are interested in the structure of the set of divergence creating functions. The Banach-Steinhaus theory gives some information about this set, however, it cannot be used to answer the question whether this set contains infinite dimensional sets with linear structure, i. e. whether the set contains infinite dimensional subspaces. We give necessary and sufficient conditions for the lineability and the spaceability of the set of divergence creating functions, i.e., the existence of infinite dimensional subspaces and closed subspaces. We also discuss the connection to Gower´s dichotomy theorem for Banach spaces.

In the second part of the talk we use the results on lineability and spaceability to characterize the behavior of general reconstruction processes for classical phase retrieval problems for functions with finite energy, i.e. solvability of dispersion relation for functions satisfying Dirichlet´s Principle.

## Helmut Bölcskei: Deep convolutional feature extraction: Theory and new architectures

Deep convolutional neural networks have led to breakthrough results in practical feature extraction applications. The mathematical analysis of such networks was initiated by Mallat, 2012. Specifically, Mallat considered so-called scattering networks based on semi-discrete shift-invariant wavelet frames and modulus non-linearities in each network layer, and proved translation invariance (asymptotically in the wavelet scale parameter) and deformation stability of the corresponding feature extractor. In this talk, we show how Mallat’s theory can be developed further by allowing for general convolution kernels, or in more technical parlance, general semi-discrete shift-invariant frames (including Weyl-Heisenberg, curvelet, shearlet, ridgelet, and wavelet frames) and general Lipschitz-continuous non-linearities (e.g., rectified linear units, shifted logistic sigmoids, hyperbolic tangents, and modulus functions), as well as Lipschitz-continuous pooling operators, all of which can be different in different network layers. We prove deformation stability for a large class of deformations, establish a new translation invariance result which is of vertical nature in the sense of the network depth determining the amount of invariance, and show energy conservation under certain technical conditions. On a conceptual level our results establish that deformation stability, vertical translation invariance, and energy conservation are guaranteed by the network structure per se rather than the specific convolution kernels, non-linearities, and pooling operators. This offers an explanation for the tremendous success of deep convolutional neural networks in a wide variety of practical feature extraction applications. The mathematical techniques we employ are based on continuous frame theory, as developed by Ali et al., 1993, and Kaiser, 1994, and allow to completely detach our proofs from the algebraic structures of the underlying frames and the particular form of the Lipschitz non-linearities and pooling operators. Finally, we introduce new network architectures and we present performance results for cartoon functions.

This is joint work with T. Wiatowski, P. Grohs, and M. Tschannen.

## Joan Bruna: Signal Recovery from Scattering Convolutional Networks

Convolutional Neural Networks (CNN) are a powerful class of non-linear representations that have shown through numerous supervised learning tasks their ability to extract rich information from images, speech and text, with excellent statistical generalization. These are examples of truly high-dimensional signals, in which classical statistical models suffer from the so-called curse of dimensionality, referring to their inability to generalize well unless provided with exponentially large amounts of training data.

In order to gain insight into the reasons for such success, in this talk we will start by studying statistical models defined from wavelet scattering networks, a class of CNNs where the convolutional filter banks are given by complex, multi-resolution wavelet families. As a result of this extra structure, they are provably stable and locally invariant signal representations, and yield state-of-the-art classification results on several pattern and texture recognition problems.

We will give conditions under which signals can be recovered from their scattering coefficients, and will introduce a family of Gibbs processes defined by a collection of scattering CNN sufficient statistics, from which one can sample image and auditory textures. Although the scattering recovery is non-convex and corresponds to a generalized phase recovery problem, gradient descent algorithms show good empirical performance and enjoy weak convergence properties. We will discuss connections with non-linear compressed sensing, applications to texture synthesis and inverse problems such as super-resolution, as well as generalizations to unsupervised learning using deep convolutional sufficient statistics.

## Robert Calderbank: GraphConnect: A Framework for Regularizing Neural Networks

Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, *DropOut* and *DropConnect* are data-independent. We shall describe *GraphConnect*, a complementary method for regularizing deep networks that is data-dependent, and is motivated by the observation that data of interest typically lie close to a manifold. Our proposed method encourages the relationships between the learned decisions to resemble a graph representing the original manifold structure. Essentially *GraphConnect* is designed to learn attributes that are present in data samples in contrast to weight decay, *DropOut* and *DropConnect* which are simply designed to make it more difficult to fit to random error or noise. Analysis of empirical Rademacher complexity suggests that *GraphConnect* is stronger than weight decay as a regularization. Experimental results on several benchmark datasets validate the theoretical analysis, and show that when the number of training samples is small, *GraphConnect* is able to significantly improve performance over weight decay, and when used in isolation, is competitive with *DropOut*.

## Stephan Dahlke: Shearlet Coorbit Spaces and Shearlet Groups

This talk is concerned with recent progress in shearlet theory. First of all, we discuss the group theoretical background of the continuous shearlet transform. Then, we explain how these relationships can be used to combine the shearlet approach with the coorbit theory introduced by Feichtinger and Gröchenig in a series of papers. This combination gives rise to new smoothness spaces, the shearlet coorbit spaces. Then, we discuss the relations of the shearlet groups to other classical groups such as the Weyl-Heisenberg groups and the symplectic groups. In the last part of the talk, we will study the structure of the shearlet coorbit spaces, i.e., we will discuss trace and embedding theorems.

## Christine De Mol: Combining information or forecasts for predicting economic variables

We consider the problem of predicting a given macroeconomic or financial variable, (i) based on the information contained in a large ensemble of (stationary) time series, typically strongly correlated with the series to forecast, using penalized regression methods such as ridge or lasso; (ii) based on an optimal combination of forecasts of that variable provided by different sources (surveys, various models, etc.).

## Pascal Frossard: Sparse representation learning for graph signals

The sparse representation of signals residing on weighted graphs has to incorporate the intrinsic geometric structure of the irregular data domain into the atoms of the dictionary. In this work, we propose a parametric dictionary learning algorithm to design data-adapted, structured dictionaries that sparsely represent graph signals. In particular, we model graph signals as combinations of overlapping local patterns. We impose the constraint that each dictionary is a concatenation of subdictionaries, with each subdictionary being a polynomial of the graph Laplacian matrix, representing a single pattern translated to different areas of the graph. Illustrative application of such dictionaries will be discussed in distributed signal processing and visual data representation.

## Philipp Grohs: Perspectives of Computational Harmonic Analysis in Numerics

In the last decades, directional representation systems such as curvelets, shearlets or ridgelets have made a big impact in applied harmonic analysis and image- and signal processing, due to their superior ability to sparsify curved singularities in multivariate functions, arising for instance as edges in image data.Their approximation properties are vastly superior over standard discretizations such as wavelets for FEM for the approximation of functions with curved singularities and therefore the use of directional representation systems also carries great potential in numerical analysis.

In this talk we discuss the useability of directional representation systems in numerical analysis, namely for the numerical solution of different types of PDEs whose generic solutions possess curved singularities. Concretely, we present a ridgelet-based adaptive solver for linear advection equations which, for the first time, can be shown to converge optimally, even for solutions with line singularities, as well as a construction of shearlet systems on bounded domains which may be used for the design of the first optimally convergent adaptive solvers for elliptic PDEs whose solutions possess singularities along curved submanifolds.

The results are based on joint work with Simon Etter, Gitta Kutyniok, Jackie Ma, Axel Obermeier and Philipp Petersen.

## Matthias Hein: A nodal domain theorem and a higher-order Cheeger inequality for the graph p-Laplacian

We consider the nonlinear graph p-Laplacian and its set of eigenvalues and associated eigenvectors of this operator defined by a variational principle. We prove a nodal domain theorem for the graph p-Laplacian for any p ≥ 1. While for p > 1 the bounds on the number of weak and strong nodal domains are the same as for the linear graph Laplacian (p = 2), the behavior changes for p = 1. We show that the bounds are tight for p ≥ 1 as the bounds are attained by eigenvectors of the graph p-Laplacian on two graphs. Finally, using the properties of the nodal domains, we prove a higher-order Cheeger inequality for the graph p-Laplacian for p > 1. If the eigenvector associated to the k-th variational eigenvalue of the graph p-Laplacian has exactly k strong nodal domains, then the higher order Cheeger inequality becomes tight as p tends to 1.

## Gitta Kutyniok: Optimal Compressive Imaging of Fourier Data

One fundamental problem in applied mathematics is the issue of recovery of continuous data from specific samples. Of particular importance is the case of pointwise samples of the associated Fourier transform, which are, for instance, collected in Magnetic Resonance Imaging (MRI). Strategies to reduce the number of samples required for reconstruction with a prescribed accuracy have thus a direct impact on such applications – which in the case of MRI will, for instance, shorten the time a patient is forced to lie in the scanner.

In this talk, we will present a sparse subsampling strategy of Fourier samples which can be shown to perform optimally for multivariate functions, which are typically governed by anisotropic features. For this, we will introduce a dualizable shearlet frame for reconstruction, which provides provably optimally sparse approximations of cartoon-like images, a class typically regarded as a suitable model for images. The sampling scheme will be based on compressed sensing ideas combined with a coherence-adaptive sampling density considering the coherence between the Fourier basis and the shearlet frame.

This is joint work with W.-Q. Lim (Fraunhofer Heinrich-Hertz-Institute Berlin).

## Gilad Lerman: Fast, Robust and Non-convex Subspace Recovery

This joint work with Tyler Maunu presents a fast and non-convex algorithm for robust subspace recovery. The datasets considered include inliers drawn around a low-dimensional subspace of a higher dimensional ambient space, and a possibly large portion of outliers that do not lie nearby this subspace. The proposed algorithm, which we refer to as Fast Median Subspace (FMS), is designed to robustly determine the underlying subspace of such datasets, while having lower computational complexity than existing methods. We prove convergence of the FMS iterates to a stationary point. Further, under a special model of data, we prove that FMS converges globally sublinearly, and locally r-linearly with overwhelming probability to a point which is near to the global minimum. The latter theorem holds for any fixed fraction of outliers (less than 1) and any fixed positive distance between the limit point and the global minimum. Numerical experiments on synthetic and real data demonstrate its competitive speed and accuracy. The real data experiments emphasize the usefulness of FMS for dimension reduction.

## Yue Lu: Dynamics of Online M-Estimation for High-Dimensional Regression: the ODE Approach

In this talk, I will present an exact analysis of the dynamics of a large class of online row-action methods for solving high-dimensional M-estimation problems. In the large systems limit, the dynamics of these algorithms converges to trajectories governed by a set of deterministic, coupled ODEs. Combined with suitable spectral initialization, this analysis establishes the theoretical performance guarantee of these row-action methods for solving both convex and nonconvex M-estimation problems in high dimensions.

## Andrea Montanari: Graph estimation via semi-definite programming

Denote by A the adjacency matrix of an Erdos-Renyi graph with bounded average degree. I consider the problem of maximizing <A,X> over the set of positive semidefinite matrices X with diagonal entries equal to one. I will prove that for large (bounded) average degree d, the value of this semidefinite program (SDP) is –with high probability– 2n√d, plus lower order terms.

I will next consider the sparse, two-groups, symmetric community detection problem (also known as planted partition). I will prove that SDP achieves the information-theoretically optimal detection threshold for large (bounded) degree. Our approach is based on tools from different research areas: (i) A new 'higher-rank' Grothendieck inequality for symmetric matrices; (ii) An interpolation method inspired from statistical physics; (iii) An analysis of the eigenvectors of deformed Gaussian random matrices.

I will finally compare this rigorous results, with a series of non-rigorous predictions that come from statistical physics, and outline a number of open problems in this area.

Based on Joint work with Adel Javanmard, Federico Ricci-Tersenghi and Subhabrata Sen.

## Alain Pajor: On the covariance matrix

We will survey recent results on the empirical covariance matrix of a sample from a random vector which coordinates are not necessarily independent. We will discuss the quantitative point of view as well as the asymptotic point of view.

## Naoki Saito: Matrix Data Analysis using Hierarchical Co-Clustering and Multiscale Basis Dictionaries on Graphs

Many modern data analysis tasks often require one to efficiently handle and analyze large matrix-form datasets such as term-document matrices and spatiotemporal measurements made via sensor networks. Since such matrices are often shuffled and scrambled, they do not have spatial coherency and smoothness compared to usual images, and consequently, the conventional wavelets and their relatives cannot be used in practice. Instead we propose to use our multiscale basis dictionaries for graphs, i.e., the Generalized Haar-Walsh Transform (GHWT) and the Hierarchical Graph Laplacian Eigen Transform (HGLET). In particular, we build such dictionaries for columns and those for rows, extract the column best basis and the row best basis from the basis dictionaries, and finally construct the tensor product of such best bases, which turns out to reveal hidden dependency and underlying geometric structure of a given matrix data. To build our basis dictionaries, the hierarchical recursive bi-partitioning of columns and rows is essential, and to do so, we fully adopt the spectral co-clustering of I. S. Dhillon at each recursion.

In this talk, I will first review both this co-clustering method and the GHWT dictionary construction. Then, I will discuss the case study of our approach using the popular Science News database consisting of relative frequencies of occurrences of 1042 words over 1153 documents, which are categorized into eight different subjects: Anthropology; Astronomy; Behavioral Sciences; Earth Sciences; Life Sciences; Math/CS; Medicine; and Physics. Our results are quite encouraging, e.g., some of the selected row basis vectors (i.e., a combination of certain words) discriminate the documents of a specific category from the others while some column basis vectors (indicating the grouping structure of the documents) reveal the statistics of the word usages of those document groups as a whole. This is a joint work with Jeff Irion.

## Tomas Sauer: Learning sparse exponentials from discrete data

Reconstructing a function of the form f(x) = ∑_{ω∈Ω} f_{ω} e^{ω⋅x} for some "small" set Ω of complex multivariate frequencies is known as Prony's problem. The talk points out the algebraic structure of this problem and how an approximate solution of this problem can be obtained very fast by methods from numerical Linear Algebra. In addition, some aspects of minimal sampling set, in particular their dependency on the set Ω, will be discussed.

## Gabriele Steidl: Variational Models for Restoring Manifold-Valued Images

We introduce a new non-smooth variational model for the restoration of manifold-valued data which includes second order differences in the regularization term. While such models were successfully applied for real-valued images, we introduce the second order difference and the corresponding variational models for manifold data, which up to now only existed for cyclic data. The approach requires a combination of techniques from numerical analysis, convex optimization and differential geometry. First, we establish a suitable definition of absolute second order differences for signals and images with values in a manifold. Employing this definition, we introduce a variational denoising model based on first and second order differences in the manifold setup.

In order to minimize the corresponding functionals, we generalize three kind of algorithms: i) inexact cyclic proximal point algorithm, ii) half-quadratic minimization, iii) Douglas Rachford splitting. We propose an efficient strategy for the computation of the corresponding proximal mappings in symmetric spaces. For the first algorithm we utilizing the machinery of Jacobi fields. We demonstrate the performance of our algorithms in particular for the n-sphere, the rotation group SO(3) and the manifold of symmetric positive definite matrices. We prove the convergence of the proposed algorithms in Hadamard spaces.

This is joint work with Miroslav Bacak (MPI Leipzig), Ronny Bergmann, Johannes Persch (TU Kaiserslautern), R. Hielscher (TU Chemnitz), and Andreas Weinmann (GSF München).

## Ronen Talmon: Hierarchical Coupled Geometry for Data-driven Analysis of Dynamical Systems

In this talk, we will introduce a data-driven method for building hierarchical coupled geometry of data arising from complex dynamical systems. This method gives rise to the joint organization of the system variables, parameters and dynamic patterns in hierarchical data structures at multiple scales. These structures provide local to global representations of the data, from local partitioning in flexible trees through a new multiscale metric to a global low-dimensional embedding. We will show applications to simulated data as well as to in-vivo recordings of neuronal activity from awake animals. The application of our technique to such recordings demonstrates its capability of capturing the spatio-temporal network complexity in sufficient resolutions. More specifically, it enables us to extract neuronal activity patterns and to identify temporal trends, associated with particular behavioral events and manipulations introduced in the experiments.

## Vladimir Temlyakov: Dictionary descent in optimization

The problem of convex optimization will be discussed. Usually in convex optimization the minimization is over a d-dimensional domain. Very often the convergence rate of an optimization algorithm depends on the dimension d. The algorithms discussed in this talk utilize dictionaries instead of a canonical basis used in the coordinate descent algorithms. We investigate which properties of a dictionary are beneficial for the convergence rate of typical greedy-type algorithms.

## Joel Tropp: Universality laws for randomized dimension reduction

Dimension reduction is the process of embedding high-dimensional data into a lower dimensional space to facilitate its analysis. In the Euclidean setting, one fundamental technique for dimension reduction is to apply a random linear map to the data. The question is how large the embedding dimension must be to ensure that randomized dimension reduction succeeds with high probability.

This talk describes a phase transition in the behavior of the dimension reduction map as the embedding dimension increases. The location of this phase transition is universal for a large class of datasets and random dimension reduction maps. Furthermore, the stability properties of randomized dimension reduction are also universal. These results have many applications in numerical analysis, signal processing, and statistics.

Joint work with Samet Oymak.

## Pierre Vandergheynst: Random sampling of bandlimited signals on graphs

We study the problem of sampling k-bandlimited signals on graphs, as models for information over networks. We propose two sampling strategies that consist in selecting a small subset of nodes at random. The first strategy is non-adaptive, i.e., independent of the graph structure, and its performance depends on a parameter called the graph coherence. On the contrary, the second strategy is adaptive but yields optimal results. Indeed, no more than O(k log(k)) measurements are sufficient to ensure an accurate and stable recovery of all k-bandlimited signals. This second strategy is based on a careful choice of the sampling distribution, which can be estimated quickly. Then, we propose a computationally efficient decoder to reconstruct k-bandlimited signals from their samples. We prove that it yields accurate reconstructions and that it is also stable to noise. Finally, we illustrate this technique by showing one can use it to sketch networked data and efficiently approximate spectral clustering.

Joint work with Gilles Puy, Nicolas Tremblay and Rémi Gribonval.

## Rachel Ward: Some recent results in hashing and clustering

In the first part of the talk, we discuss locality sensitive hashing (LSH) for approximate nearest neighbor search. We introduce a "fast" variant of the cross-polytope LSH scheme for angular distance. To our knowledge, is the first LSH scheme with provably optimal sensitivity which is also fast. In the second part of the talk, we discuss an SDP relaxation of the k-means clustering problem. Under a "stochastic ball model", the SDP provably recovers global k-means clusters with high probability. Stability of the SDP in the setting of overlapping clusters is also discussed.