Principal Component Analysis for Distributed Data David Woodruff IBM Almaden Based on works with Ken Clarkson, Ravi Kannan, and Santosh Vempala Outline 1. What is low rank approximation? 2. How do we solve it offline? 3. How do we solve it in a distributed setting? 2 Low rank approximation § A is an n x d matrix § Think of n points in Rd § E.g., A is a customer-product matrix § Ai,j = how many times customer i purchased item j § A is typically well-approximated by low rank matrix § E.g., high rank because of noise § Goal: find a low rank matrix approximating A § Easy to store, data more interpretable 3 What is a good low rank approximation? Singular Value Decomposition (SVD) Any matrixAAk == argmin U ¢ Σ ¢rank V k matrices B |A-B|F § U has orthonormal columns § Σ is diagonal with non-increasing positive 1/2 2 (|C| ) ) F = (Σ i,jVC i,j The rows of are k entries down the diagonal the orthonormal top k principal § V has rows components Computing Ak exactly is expensive § Rank-k approximation: Ak = Uk ¢ Σk ¢ Vk 4 Low rank approximation § Goal: output a rank k matrix A’, so that |A-A’|F · (1+ε) |A-Ak|F § Can do this in nnz(A) + (n+d)*poly(k/ε) time [S,CW] § nnz(A) is number of non-zero entries of A 5 Solution to low-rank approximation [S] § Given n x d input matrix A § Compute S*A using a sketching matrix S with k/ε << n rows. S*A takes random linear combinations of rows of A A SA § Project rows of A onto SA, then find best rank-k approximation to points inside of SA. 6 What is the matrix S? § S can be a k/ε x n matrix of i.i.d. normal random variables § [S] S can be a k/ε x n Fast Johnson Lindenstrauss Matrix § Uses Fast Fourier Transform § [CW] S can be a poly(k/ε) x n CountSketch matrix [ [ 0010 01 00 1000 00 00 0 0 0 -1 1 0 -1 0 0-1 0 0 0 0 0 1 S ¢ A can be computed in nnz(A) time! 7 Caveat: projecting the points onto SA is slow § Current algorithm: 1. Compute S*A 2. Project each of the rows onto S*A 3. Find best rank-k approximation of projected points inside of rowspace of S*A § Bottleneck is step 2 § [CW] Approximate the projection § Fast algorithm for approximate regression minrank-k X |X(SA)-A|F2 § nnz(A) + (n+d)*poly(k/ε) time 8 Distributed low rank approximation § We have fast algorithms, but can they be made to work in a distributed setting? § Matrix A distributed among s servers § For t = 1, …, s, we get a customer-product matrix from the t-th shop stored in server t. Server t’s matrix = At § Customer-product matrix A = A1 + A2 + … + As § More general than row-partition model in which each customer shops in only one shop 9 Communication cost of low rank approximation § Input: n x d matrix A stored on s servers § Server t has n x d matrix At § A = A1 + A2 + … + As § Output: Server t has n x d matrix Ct satisfying § C = C1 + C2 + … + Cs has rank at most k § |A-C|F · (1+ε)|A-Ak|F § Application: distributed clustering § Resources: Each server is polynomial time, linear space, communication is O(1) rounds. Bound the total number of words communicated § [KVW]: O(skd/ε) communication, independent of n 10 Protocol § Designate one machine the Central Processor (CP) § Let SProblems: be one of the poly(k/ε) x n random matrices above § S can be generated pseudorandomly from small seed tUUTfor § CP chooses smallAseed S and sends to all servers § Can’t output since rank too itlarge § Could communicate AtU to CP, § Server t computes SAt and sends it to CPthen CP computes SVD of Σt AtU UT = AUUT § CP computes Σi=1s SAt = SA § But communicating AtU depends on n § CP sends orthonormal basis UT for row space of SA to each server § Server t computes AtU 11 Approximate SVD lemma § Problem reduces to Communication independent of n! § Server t has n x r matrix Bt = AtU, where r = poly(k/ε) § B = Σt Bt § CP outputs top k principal components of B § Approximate SVD § If WT 2 Rk x r is the matrix of top k principal components of PB, where P is a random r/ε2 x n matrix, |B-BW WT|F · (1+ε) |B-Bk|F § CP sends P to every server § Server t sends PBt to CP who computes PB = Σt PBt § CP computes W, sends everyone W 12 The protocol § Phase 1: § Learn an orthonormal basis U for row space of SA optimal space in U U cost · (1+ε)|A-Ak|F 13 The protocol § Phase 2: § Find an approximately optimal space W inside of U optimal space in U approximate space W in U U cost · (1+ε)2|A-Ak|F 14 Conclusion § O(sdk/ε) communication protocol for low rank approximation § A bit sloppy with words vs. bits but can be dealt with § Almost matching Ω(sdk) bit lower bound § Can be strengthened to Ω(sdk/ε) in one-way model § Can we remove the one-way restriction? § Communication cost of other optimization problems? § Linear programming § Frequency moments § Matching § etc. 15

© Copyright 2019 ExploreDoc