Many-core Computing for Big Biomedical Data

Marc A. Suchard
Departments of Biomathematics and Human Genetics, David Geffen School of Medicine at UCLA
Department of Biostatistics, UCLA Fielding School of Public Health
IDRE, July 2014

[Joint work with David Madigan, Patrick Ryan, Martijn Schuemie and Trevor Shaddox]

Marc Suchard - Massively parallel statistical computing


Drug Safety using Massive Observational Databases

Goals: a model-based approach that
- controls for other drugs and covariates
- includes temporal information
- assumes most drugs are without effect (priors)

- Merck recalls Vioxx
- Post-market surveillance: ~100K - 10M patients, ~1K - 10K drugs
- Rare adverse events
- A major concern for Big Pharma and the FDA


Paradigm Shift Required

"Bootstrapping is computationally infeasible with > 20K patients..." - August 2011, ICPE

- Some truth: about 100 days in my hands (preliminary implementation in R)
- Throws out asymptotics


Self-Controlled Case Series

Basic model: events <- inhomogeneous Poisson process

[Figure: one subject's timeline of drug-exposure eras (Vioxx, olanzapine, quetiapine, celecoxib) with myocardial-infarction (MI) events]

- Number of events y_{ik} for subject i = 1, ..., N during era k = 1, ..., K_i (K ~ 10); condition on the total n_i
- Drug indicators x_{ik} = {x_{ikj}} for exposure to drug j during era k
- Drug effects β = {β_j}, J ~ 1000

log posterior: log (conditional) likelihood + log lasso prior, giving the negative log posterior

  -\sum_{i=1}^{N} \left[ \sum_{k=1}^{K_i} y_{ik} x_{ik}^\top \beta
      - n_i \log \sum_{k=1}^{K_i} t_{ik} e^{x_{ik}^\top \beta} \right]
      + \lambda \|\beta\|_1

Convex optimization -> many choices!


Block Relaxation via Cyclic Coordinate Descent (CCD)

[Figure: the same drug-era timeline]

- The lasso forces many β_j = 0
- CCD, with a shrunken Newton step, cycles over coordinates: β_1, β_2, ..., β_J, β_1, ...
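A minimal sketch of CCD with a shrunken (soft-thresholded) Newton step, illustrated on lasso-penalized least squares rather than the SCCS likelihood; all function names here are illustrative, not from the talk:

```python
import numpy as np

def soft_threshold(z, lam):
    """Lasso shrinkage operator: argmin_b 0.5*(b - z)**2 + lam*|b|."""
    return np.sign(z) * max(abs(z) - lam, 0.0)

def ccd_lasso(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for 0.5*||y - X b||^2 + lam*||b||_1.

    Each coordinate gets an exact 1-D Newton step that is then shrunk
    toward zero, so many coefficients land exactly at b_j = 0.
    """
    n, p = X.shape
    beta = np.zeros(p)
    r = y.astype(float).copy()          # residual y - X @ beta
    h = (X ** 2).sum(axis=0)            # 1-D Hessians (constant for least squares)
    for _ in range(n_sweeps):           # cycle beta_1, ..., beta_p, beta_1, ...
        for j in range(p):
            rho = X[:, j] @ r + h[j] * beta[j]   # 1-D gradient information
            new_bj = soft_threshold(rho, lam) / h[j]
            r += X[:, j] * (beta[j] - new_bj)    # keep residual in sync
            beta[j] = new_bj
    return beta
```

With lam = 0 this reduces to ordinary least squares; larger lam zeroes out more coefficients, mirroring how the lasso prior forces many β_j = 0.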
- 1-D gradients/Hessians require column-wise operations
- e^x dominates when dense; reductions dominate when sparse
- Block relaxation is very serial (de Leeuw, 1994)

Solution: exploit sparsity (limited previous work)
- Good convergence
- No matrix inversion
- Compute in parallel using GPUs

log posterior: log (conditional) likelihood + log lasso prior

  -\sum_{i=1}^{N} \left[ \sum_{k=1}^{K_i} y_{ik} x_{ik}^\top \beta
      - n_i \log \sum_{k=1}^{K_i} t_{ik} e^{x_{ik}^\top \beta} \right]
      + \lambda \|\beta\|_1


Getting Down and Dirty with SCCS

Let L(β) denote the SCCS log (conditional) likelihood (the same form arises for conditional logistic regression and the Cox model):

  L(\beta) = Y^\top X \beta - N^\top \log\{ M [T \times \exp(X\beta)] \},

where Y holds era-level event counts, X is a sparse {0,1} design matrix with columns X_j, N holds the per-subject totals n_i, M is a sparse {0,1} matrix mapping eras to subjects, T holds the era lengths t_{ik}, and × denotes the element-wise product. The one-dimensional gradients and Hessians are

  \frac{\partial L}{\partial \beta_j} = Y^\top X_j - N^\top W

  \frac{\partial^2 L}{\partial \beta_j^2} = -N^\top [W \times (1 - W)],

where (with element-wise division)

  W = \frac{M [T \times \exp(X\beta) \times X_j]}{M [T \times \exp(X\beta)]}.

- Forming W requires column-wise operations and sparse matrix-vector multiplication
- β_j changes -> update Xβ; further propagation?
- Fused transformation and reduction for the gradient and Hessian
- Previous (sparse, linear/logistic regression) work: Genkin, Madigan et al. (2007); Park and Hastie (2007); Wu and Lange (2009)


Vectorization of Sparse Updates

CUDA kernel for updating Xβ, T × exp(Xβ) and M [T × exp(Xβ)] given X_j and ∆β_j:
- Start 1 thread per non-zero element in the sparse column X_j (compressed column storage)
- Xβ <- Xβ + ∆β_j X_j

[Figure: one thread per non-zero of X_j; each thread updates its Xβ_k, forms ∆_k = t_k ∆exp(Xβ_k), and maps era k to subject n]
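A CPU sketch of this fused update-and-differentiate step, with `np.add.at` standing in for the kernel's atomic adds; the array names and era-to-subject layout are illustrative assumptions, not the talk's actual code:

```python
import numpy as np

def update_and_differentiate(subj, T, Y, N, Xbeta, denom, j_rows, delta_bj):
    """Apply a change delta_bj along sparse column X_j, then form the
    slide's one-dimensional gradient and Hessian for coordinate j.

    subj[k]  : subject owning era k (the (n, k) pairs kept in COO storage)
    T, Y     : era lengths t_k and event counts y_k
    N        : per-subject event totals n_i
    Xbeta    : linear predictor X @ beta, updated in place
    denom    : per-subject M [T x exp(X beta)], updated in place
    j_rows   : row indices of the non-zero ({0,1}) entries of column X_j

    np.add.at is an unbuffered scatter-add: the CPU stand-in for the
    kernel's atomic adds into M [T x exp(X beta)].
    """
    old = T[j_rows] * np.exp(Xbeta[j_rows])
    Xbeta[j_rows] += delta_bj                  # Xbeta <- Xbeta + delta_bj * X_j
    new = T[j_rows] * np.exp(Xbeta[j_rows])
    np.add.at(denom, subj[j_rows], new - old)  # "atomic" adds of the Delta_k

    numer = np.zeros_like(denom)
    np.add.at(numer, subj[j_rows], new)        # M [T x exp(X beta) x X_j]
    W = numer / denom                          # element-wise division
    grad = Y[j_rows].sum() - N @ W             # Y'X_j - N'W (X_j is {0,1})
    hess = -N @ (W * (1.0 - W))                # -N'[W x (1 - W)]
    return grad, hess
```

Because only the non-zeros of X_j are touched, the work per coordinate scales with the column's sparsity rather than with the total number of exposure eras.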
- Store the (n, k) pairs in coordinate (COO) storage, columns the same as X_j
- Element-wise e^x and × performed in COO storage
- Atomic adds of the ∆_k into M [T × exp(Xβ)] to avoid race conditions


Observational Health Data Sciences and Informatics (OHDSI)

A public-private collaborative between government, industry and academia to inform active medical product surveillance.

Example databases:
- MarketScan Commercial Claims and Encounters (CCAE): 59 million privately insured lives
- MarketScan Lab Database (MSLR): 1.5 million privately insured lives with laboratory results

Example health outcomes:
- Acute liver injury
- Acute renal failure
- Bleeding
- Upper gastrointestinal tract ulcer


Can We Fit in Reasonable Time?

Cases-only datasets range from N = 115K to 3.6M patients, taking J = 1224 to 1428 different drugs, in K = 3.8M to 75M exposure eras.

- Fitting the largest originally drained 51 hours (point estimate)
- 51 hours -> 29 seconds
- Raises hopes for cross-validation and full Bayesian inference

[Figure: fitting time (secs, log scale) versus number of patients; Genkin/Wu (white circles), all sparse on CPU (black circles), all sparse on GPU (black squares)]

- Off by an order of magnitude on the hyperparameter λ
- Ran on the Amazon EC2 cloud


"Bootstrapping Is Computationally Infeasible..."

Pharmacological causes of angioedema

[Figure: effect-size estimates β̂ with bootstrap confidence intervals for the 441 drugs with ρ̂ > 0.5]

- Non-parametric bootstrap 95% confidence intervals (CIs) for the 441 drug effects whose coefficients were non-zero in at least 50% of the bootstrap replicates
- Drotrecogin alfa (a treatment for severe sepsis): 4th largest coefficient, but its CI covers 0. Medical controversy!


Hardware Comparison (via OpenCL)

What about the Intel Phi?
Dense problem (in a different domain)

[Figure: GFLOPS and speedup factor versus number of unique site patterns (100 to 60,000) for:
- GPU: AMD Radeon HD 7970 GHz Edition
- GPU: NVIDIA GeForce GTX 580 (CUDA)
- GPU: NVIDIA Tesla K20m
- MIC: Intel Xeon Phi SE10P
- CPU: Intel Xeon E5-2680 x2 (16 cores)
- CPU: Intel Xeon E5-2680 (single core)]

- Intel Phi: 60+ asynchronous cores, expensive cache, large memory bus
- Modest performance on this highly stereotyped, memory-bandwidth-limited computation (like most inference problems)


Acknowledgments

Joint work with:
- D. Madigan (Columbia)
- P. Ryan and M. Schuemie (J&J)
- T. Shaddox (UCLA Ph.D. student)

Financial support:
- Alfred P. Sloan Fellowship
- John Simon Guggenheim Fellowship
- NSF IIS 1251151 (BigData) and DMS 1264153
- NIH R01 AI107034 and R01 HG006139
- Google Research Gift

[Background figures: "Origins of HIV" map (Kinshasa, Kisangani, Mbuji-Mayi, Bwamanda, Lubumbashi, Likasi) and "Ancient DNA"]
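The non-parametric bootstrap procedure from the angioedema analysis can be sketched generically; here `fit` is a hypothetical stand-in for the lasso model fit (e.g. the GPU-accelerated CCD), and all parameter names are illustrative:

```python
import numpy as np

def bootstrap_cis(patients, fit, n_boot=200, level=0.95, min_nonzero=0.5, seed=1):
    """Non-parametric bootstrap percentile CIs for lasso coefficients.

    Resamples patients with replacement, refits via `fit` (any callable
    mapping a patient list to a coefficient vector), and flags the
    coefficients that are non-zero in at least `min_nonzero` of the
    replicates, i.e. the rho_hat > 0.5 rule on the slide.
    """
    rng = np.random.default_rng(seed)
    n = len(patients)
    p = len(fit(patients))
    draws = np.empty((n_boot, p))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)            # resample patients
        draws[b] = fit([patients[i] for i in idx])  # refit on the replicate
    alpha = (1.0 - level) / 2.0
    lower = np.quantile(draws, alpha, axis=0)
    upper = np.quantile(draws, 1.0 - alpha, axis=0)
    keep = (draws != 0).mean(axis=0) >= min_nonzero
    return lower, upper, keep
```

Each replicate is an independent refit, so the per-fit speedup (51 hours to 29 seconds) is precisely what makes hundreds of bootstrap replicates feasible.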
