CS 395T: Sublinear Algorithms Fall 2014 Lecture 22 — Nov 11, 2014 Prof. Eric Price 1 Scribe: Taewan Kim Overview In the last few lectures we covered 1. Fourier Transform 2. Sparse Fourier Transform 3. Fourier RIP In this lecture, a new topic ’Oblivious Subspace Embeddings’ is covered, especially algorithms introduced by Clarkson and Woodruff [CW13] for regression and low rank approximation problems. 2 Application Oblivious Subspace Embedding (OSE) is a tool for faster numerical linear algebra. There are two possible applications where OSE can be applied: regression and low rank approximation. 2.1 Regression Problem Statement Given A ∈ Rn×d and b ∈ Rn , Find x ∈ Rd minimizing kAx − bk2 . (n d) A is given data composed of n rows of size d which indicates the d different features. And for those n items, vector b is composed of n outcomes corresponding to each 1 × d feature vector. By finding solution x ∈ minimizing kAx − bk2 , we can find an approximate linear mapping between A and b via x: Ax ≈ b. This problem can be relaxed by allowing error: Find x0 s.t. kAx0 − bk2 ≤ (1 + ) min kAx − bk2 x An algorithm to find optimal solution of the regression problem ( = 0) is using Moore-Penrose pseudoinverse 1 . Algorithm 1. x = A+ b, (A+ is pseudoinverse of A) 1 Details of Moore-Penrose pseudoinverse can be found in Wikipedia or Chapter 4 of Laub, Alan J. Matrix analysis for scientists and engineers. Siam, 2005. 1 2. When rank(A) = d and d n, A+ = (AT A)−1 AT Time complexity of this algorithm to calculate x = A+ b is O(d2 n + d3 ) = O(d2 n) when d n. To speed up, sparsity of A can be utilized when A is sparse. If nnz(A) represents the number of nonzero elements in A, time complexity can be improved to O(d · nnz(A) + d3 ). The actual computation should be done as follows: compute AT x first, which gives O(nnz(A)), then compute (AT A)−1 which gives O(d3 ), and finally compute (AT A)−1 (AT x). However, by using OSE of [CW13] one can achieve: ˜ 3 /2 ) • O(nnz(A)) + O(d ˜ 3 log(1/)) • O(nnz(A) log(1/)) + O(d ˜ ) , f · logO(1) (f )) (Here, O(f 2.2 Low Rank Approximation Problem Statement Given a matrix A ∈ Rn×n , find a matrix B with rank(B) = k which minimizes kA − Bk2F . This low rank approximation problem with Frobenius norm can also be relaxed by allowing error: Find B 0 s.t. kA − B 0 k2F ≤ (1 + ) min B rank(B)=k kA − Bk2F When = 0, Singular Value Decomposition (SVD) gives the best rank-k approximation of A by selecting top k singular values and corresponding singular vectors. SVD requires O(n3 ) of computational time. However by using Power method/subspace iteration: • Each iteration takes O(n2 k) time. • For Frobenius norm approximation, bound is not known. ˜ 2 k/2 ) is possible per iteration. • By allowing spectral error, O(n Also, utilizing OSE can give better time bound: 2 /4 + k 3 /5 ) ˜ • O(nnz(A)) + O(nk For a dense matrix A, rank-k matrix approximation using random projection was introduced by [Sarlos06, CW09]. 3 Oblivious Subspace Embedding Definition 1. Defined on parameters (m, n, d, , δ). An Oblivious Subspace Embedding (OSE) is a distribution on matrices S ∈ Rm×n , s.t. ∀ d-dimensional subspace U of Rn , with probability 1 − δ over S, we have ∀x ∈ U that kSxk2 = (1 ± )kxk2 2 3.1 Regression with OSE Now, we can solve the problem in easier way with lower dimension using OSE. Rather than solving x∗ = arg minx kAx − bk, solve: x0 = arg min kSAx − Sbk x = arg min kS(Ax − b)k x where (Ax − b) ∈ Col(A ◦ b). (Col(A ◦ b) means a column space of A adjoined with the vector b) Then from the definition of OSE, kAx0 − bk ≤ kAx∗ − bk 1+ 1− kS(Ax0 − b)k ≤ kS(Ax∗ − b)k 1+ 1− . 1 + 3 (1) Computational time is determined by ”Embedding time + Solve(m, d)”, where Solve(m, d) represents the time to solve new regression problem with size m × d of SA and m × 1 vector Sb. One example of OSE is Gaussian random matrix which can be defined as: Si,j = N (0, 1/m) (2) With Gaussian OSE, m = O(d/2 ). Therefore, embedding requires O(mnd) = O(d2 n/2 ) and Solve(d, m) requires O(d3 /2 ) computational time. So, total time is O(d2 n/2 + d3 /2 ). 3.2 Fast Johnson-Lindenstrauss Now, we introduce an important lemma, which is called Johnson-Lindenstrauss (JL) lemma. Definition 2 (Johnson-Lindenstrauss Lemma). If m = O((1/2 ) log(1/δ)), then ∀x, kAxk22 = (1 ± )kxk22 w.p. 1 − δ Think as this way: given d-dim. subspace U , take -net: C = (1 + 1/)d points. O((1/2 ) log(1/δ)), then all are preserved, i.e. C can be covered. x = x1 +x2 + 2 x3 + · · · for x1 , · · · ∈ C ⇒ kAxk2 ≥ kAx1 k − kAx2 k − 2 kAx3 k − · · · ≥ 1 − − (1 + )( + 2 + · · · ) ≥ 1 − 3 Faster version of Johnson-Lindenstrauss embedding technique was introduced by [KW11]: If A has RIP of order k, then AD has (, 2−k ) JL property, where ±1 ±1 D= .. . ±1 3 If m = . Last class, it is shown that FΩ∈[n] satisfies (k, ) RIP if |Ω| ≥ (1/2 )k log4 n. So, if m = |Ω| is greater than (d/2 ) log(1/) log4 n, then subspace embeddings with m = (d/2 ) log5 n, and computational time is n log n. So, with Fast JL, embedding requires O(dn log n) and Solve(m, d) requires O((d3 /2 ) log5 n). 3.3 [CW13] To improve the complexity, [CW13] used the sparsity of A. In each column of S, exactly one element has ±1 value defined with hash functions: h : [n] → [m] ← 2-independent σ : [n] → {±1} ← 4-independent therefore OSE matrix S is defined as, Sh(i),i = σi Let’s prove that above S is OSE by showing: a, b ∈ Rn ⇒ hSa, Sbi ≈ ha, bi Proof. Denote δr,i = Ih(i)=r (indicator function). ! n m n X X X hSa, Sbi = δr,i σr,i ai δr,j σr,j bi r=1 " = i=1 n X ai bi i=1 = ha, bi + j=1 m X !# 2 2 δr,i σr,i r=1 m X X + m X X δr,i δr,j σr,i σr,j ai bj r=1 i6=j δr,i δr,j σr,i σr,j ai bj r=1 i6=j ⇒ E[hSa, Sbi] = ha, bi Now let’s consider the variance, V ar[hSa, Sbi]. (This proof can be referred to [NN13]) 2 (V ar[hSa, Sbi]) = m X X 2 2 2 2 E σr,i δr,j (ai bj + ai bj aj bi ) r=1 i6=j ∵ Consider (r, i), (r, j), (r0 , i0 ), (r0 , j 0 ) r = r0 or {i, j} = {i0 , j 0 } → E[ · ] 6= 0 Otherwise → E[ · ] = 0 by independence. X 1 a2i b2j + ai bj aj bi ⇒ V ar[hSa, Sbi] = m i6=j 2 X 2 2 ≤ ai bj m i6=j 2 X 2 2 2 ≤ ai bj = kak22 kbk22 m m i,j 4 Let U ∈ Rn×d have orthonormal columns. We want, kSU xk2 = (1 ± )kxk2 ∀x ∈ Rd ⇔ xT U T S T SU x = (1 ± )xT x ⇔ kU T S T SU − Ik2 ≤ ⇐ kU T S T SU − Ik2F ≤ 2 So it is sufficient to show for Frobenius norm case. (U T S T SU )i,j = hSUi , SUj i (Ui : ith column of U ) Ii,j = hUi , Uj i Also, ∀i, j E[(U T S T SU − I)2i,j ] ≤ 2 m 2d2 ≤ 22 m ⇒ kU T S T SU − Ik2 ≤ ⇒ E[kU T S T SU − Ik2F ] ≤ which shows that kSU xk2 = (1 ± )kxk2 ∀x ∈ Rd , i.e. S is OSE. With this setting of S by [CW13], complexity can be achieved to O(nnz(A) + (d3 /2 ) log5 (d/)), ˜ 3 /2 ). which is O(nnz(A)) + O(d Following Table compares the computational time for introduced algorithms when applied to regression problem. (O notation is omitted.) No OSE: d · nnz(A) + d3 / d2 n + d3 with OSE: Embedding + Solve(d, m) Gaussian: mnd = d2 n/ + d3 /2 Fast JL: dn log n + (d3 /2 ) log5 n C-W: nnz(A) + Solve(d, d2 /2 ) = d4 /2 ← bad! → (d3 /2 ) log5 (d/) Table 1: Comparing complexities for various algorithms for regression 5 References [CW09] Clarkson, Kenneth L., and David P. Woodruff. Numerical linear algebra in the streaming model. Proceedings of the forty-first annual ACM symposium on Theory of computing, ACM, 2009. [CW13] Clarkson, Kenneth L., and David P. Woodruff. Low rank approximation and regression in input sparsity time. Proceedings of the forty-fifth annual ACM symposium on Theory of computing, ACM, 2013. [KW11] Krahmer, Felix, and Rachel Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM Journal on Mathematical Analysis 43.3 (2011): 1269-1281. [NN13] Nelson, Jelani, and Huy L. Nguyn. OSNAP: Faster numerical linear algebra algorithms via sparser subspace embeddings. Foundations of Computer Science (FOCS), 2013 IEEE 54th Annual Symposium on. IEEE, 2013. [Sarlos06] Sarlos, Tamas. Improved approximation algorithms for large matrices via random projections. Foundations of Computer Science, 2006. FOCS’06. 47th Annual IEEE Symposium on, IEEE, 2006. 6

© Copyright 2018 ExploreDoc