STAT 380: Lecture 11
Matthew Schofield

Motivation

We have some function that we want to maximise.
- Often a likelihood; it could be another function.
- We are unable or unwilling to do it analytically, e.g. the MLEs for the Gamma or Beta distributions:
  y_1, ..., y_n iid ~ Gamma(alpha, beta)
  y_1, ..., y_n iid ~ Beta(alpha, beta)

Options

Option 1: use a built-in optimisation routine in R?
- nlm
- optim
- optimize
Problem: we still need to understand what is going on!

Option 2: do it yourself. This is what we will focus on here:
- the bisection, Newton, and secant algorithms.
We will often use the R functions above to check our work.

Question

What do the bisection and Newton algorithms do?
They find solutions to the equation f(x) = 0.

How can we use them to maximise our likelihood?
Apply the methods to the derivative of the likelihood.

Numerically Solving an Equation

We have an equation f(x) = 0.
- Rewrite that equation to give something of the form x = h(x).
- Set up an iterative equation: x^(i+1) = h(x^(i)).
- Iterate (i.e. continually update) until convergence.

There are infinitely many h(x) we can choose. For example, suppose log(x) - x^2 = 0. We could rewrite that as x = sqrt(log(x)), or as x = exp(x^2), etc.

There is a whole lot of theory (MATH 361) that determines:
1. whether there will be a solution
2. whether that solution is unique
3. whether or not the iteration will converge
4. if convergent, the (approximate) rate at which it converges

Depending on h(.), the iteration can actually diverge. The speed of convergence depends on the algorithm and on the choice of h(.).

Convergence

We want the successive x^(i) values to get closer and closer to the true answer; we refer to this as a convergent algorithm. We take a practical view of convergence:
- usually we only know that the x^(i) values get closer and closer to some answer;
- stop the algorithm if successive x^(i) values are very close.

Bisection Method

The bisection method is one of the simplest and most intuitive optimisation algorithms. Given that we wish to solve f(x) = 0, the steps are:
1. Set i = 0 and make an initial guess at the solution, x^(0).
2. Define lower and upper limits such that x_low < x^(0) < x_up.
3. Set i <- i + 1 and set x^(i) = (x_up + x_low)/2.
4. Set x_low = x^(i) if SIGN[f(x^(i))] = SIGN[f(x_low)]; otherwise set x_up = x^(i).
5. Return to step 3; repeat until x_up - x_low is small.
The solution is x = (x_up + x_low)/2.

The bisection method steers the solution by comparing the sign of f at the limits with its sign at the current state. Each successive iteration halves the interval length. The function f is required to be continuous between the limits, but not necessarily differentiable.

Example

Suppose we want to maximise f, where

    f(x) = -x(log(x) - 1) - x^2/4

Find the derivative and set it to zero (on board).

Problem

Analytically finding the solution to

    f'(x) = -log(x) - x/2 = 0

is impossible (at least for me).
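As a concrete sketch of the iterative scheme x^(i+1) = h(x^(i)), one rearrangement of f'(x) = -log(x) - x/2 = 0 is x = exp(-x/2). (This particular h is my choice, not from the slides; the alternative rearrangement x = -2 log(x) has |h'| > 1 at the solution and diverges.)

```r
# Solve f'(x) = -log(x) - x/2 = 0 by fixed-point iteration x = h(x),
# using the rearrangement h(x) = exp(-x/2)
h = function(x) exp(-x/2)

x = 1  # initial guess
for (i in 1:100) {
  x.new = h(x)
  if (abs(x.new - x) < 1e-10) break  # stop when successive values are very close
  x = x.new
}
x.new  # approximately 0.7034674
```

This particular h is contractive near the solution (|h'| is roughly 0.35 there), so the iteration converges quickly from a nearby starting value.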
Example: functions in R

f = function(x) {
  fx = -x * (log(x) - 1) - x^2/4
  return(fx)
}
f.prime = function(x) {
  fx = -log(x) - x/2
  return(fx)
}

Example: plotting f(x) and f'(x)

curve(f, from = 0.1, to = 2, n = 1001)
abline(h = 0, lty = 2)
curve(f.prime, add = T, col = "blue", n = 1001)
legend("bottomleft", legend = c("f", "d/dx f", "zero"),
       lty = c(1, 1, 2), col = c("black", "blue", "black"))

[Figure: f(x) in black and f'(x) in blue over x in (0.1, 2), with a dashed horizontal line at zero.]

Example: bisection method

[Figure: successive bisection iterates marked along f'(x) for x between roughly 0.4 and 1.1.]

The solution is approximately 0.7034674.

Convergence of bisection method

After k steps, the length of the interval is (1/2)^k = 2^(-k) times its initial length. The convergence rate is linear, i.e. the rate at which the interval reduces is constant. Based on this we can define a stopping rule.

Stopping rule for bisection method

There are no hard-and-fast laws regarding stopping rules for these kinds of algorithms. We usually base the rule on the length of the interval:
- define some small constant ε;
- define convergence to have taken place when (x_up - x_low) ≤ ε.
Given a certain value for ε, how many steps will it take to achieve convergence?

Two options:
1. For a given ε, use our knowledge of the bisection method to find the number of iterations k that ensures the desired accuracy, then run a for loop for that many iterations. (This is hard to do for the algorithms we consider next.)
2. Run until x_up - x_low ≤ ε, using a while loop.
The exact choice of ε will typically depend upon the scale of the problem.
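Option 2 (the while loop) can be sketched for f.prime from the slides. The bracketing interval [0.5, 1] is my choice, read off the earlier plot (f.prime is positive at 0.5 and negative at 1):

```r
f.prime = function(x) -log(x) - x/2

# Bisection with a while-loop stopping rule
x.low = 0.5    # f.prime(0.5) > 0
x.up = 1       # f.prime(1) < 0
epsilon = 1e-10
while (x.up - x.low > epsilon) {
  x.mid = (x.up + x.low)/2
  if (sign(f.prime(x.mid)) == sign(f.prime(x.low))) {
    x.low = x.mid   # root lies in the upper half
  } else {
    x.up = x.mid    # root lies in the lower half
  }
}
(x.up + x.low)/2  # approximately 0.7034674
```

Each pass through the loop halves x_up - x_low, so the loop is guaranteed to terminate.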
Finding k for bisection method

Define z to be the initial interval width. We want the final interval width to be less than ε, i.e. we want to find k (the number of iterations) such that:

    (1/2)^k z < ε
    (1/2)^k < ε/z
    k log(0.5) < log(ε/z)
    k > log(ε/z) / log(0.5)

(The inequality flips in the last step because log(0.5) is negative.)

Example: bisection method

If we want accuracy ε = 0.01 with starting interval z = 1:

z = 1
epsilon = 0.01
(k = ceiling(log(epsilon/z)/log(0.5)))
## [1] 7

If we want accuracy ε = 1e-10 with starting interval z = 1:

epsilon = 1e-10
(k = ceiling(log(epsilon/z)/log(0.5)))
## [1] 34

Using built-in functions

Check our code with built-in functions, e.g. optimize:

help(optimize)
optimize(f, interval = c(0, 10), maximum = TRUE)
## $maximum
## [1] 0.7035
##
## $objective
## [1] 0.8272

Exercise: MLE for the Rayleigh distribution

We observe the n = 6 data points

x = c(0.34, 0.77, 1.37, 1.64, 2.46, 0.26)

We believe they are independent samples from the same Rayleigh distribution.

The Rayleigh distribution:
- is defined for values x ≥ 0
- is controlled by the parameter σ > 0
- has density function f(x; σ) = (x/σ^2) exp(-x^2/(2σ^2)).

Given a sample {x_1, ..., x_n}, the log-likelihood for σ is (up to an additive constant):

    ℓ(σ | x_1, ..., x_n) = -n log(σ^2) - Σ_{i=1}^n x_i^2 / (2σ^2)

Exercises

1. Starting with the pdf of the Rayleigh distribution, show that the log-likelihood above is correct.
2. Find the derivative of the log-likelihood. Specify a function (in R) that evaluates the derivative (using the data given) at a given value (or values) of σ.
3. Plot the derivative of the log-likelihood function and use this plot to find appropriate values σ_low and σ_up that lie either side of the value of σ at which the derivative is 0.
4. Write R code that runs one iteration of the bisection method: i.e. start with your values of σ_low and σ_up, (i) find the midpoint σ_i, and (ii) use it to replace either σ_low or σ_up.
5. Write R code that runs k iterations of the bisection method using a for loop. Run it for k = 7 and k = 34 iterations.
6. Write R code that runs the bisection method until σ_up - σ_low < ε for some ε, using a while loop. Run it with ε = 0.01 and with ε = 1e-10.
7. This is a case where the MLE can be obtained in closed form (i.e. you can algebraically solve for it). Find the analytical MLE and use it to check that your functions above are correct.
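One possible solution sketch for exercises 2, 6 and 7, useful for checking your own answers (the bracketing interval [0.5, 2] and the function names l.prime, sigma.bisect are my choices, not from the slides):

```r
# Data from the slides
x = c(0.34, 0.77, 1.37, 1.64, 2.46, 0.26)
n = length(x)

# Derivative of the Rayleigh log-likelihood (exercise 2):
# d/dsigma [ -n log(sigma^2) - sum(x^2)/(2 sigma^2) ] = -2n/sigma + sum(x^2)/sigma^3
l.prime = function(sigma) -2*n/sigma + sum(x^2)/sigma^3

# Bisection until the interval is shorter than epsilon (exercise 6)
s.low = 0.5    # l.prime(0.5) > 0
s.up = 2       # l.prime(2) < 0
while (s.up - s.low > 1e-10) {
  s.mid = (s.up + s.low)/2
  if (sign(l.prime(s.mid)) == sign(l.prime(s.low))) s.low = s.mid else s.up = s.mid
}
sigma.bisect = (s.up + s.low)/2

# Closed-form MLE (exercise 7): setting l.prime to zero gives
# sigma.hat = sqrt(sum(x^2) / (2n))
sigma.hat = sqrt(sum(x^2)/(2*n))
sigma.hat  # approximately 0.9744
```

The bisection answer and the analytical MLE should agree to within the stopping tolerance.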
