Introduction to local search algorithms

This app is designed to help you understand the Steepest Ascent and Newton-Raphson local search algorithms used in statistical estimation. For simplicity, the app focuses on one-dimensional functions with a single local maximum, namely the Poisson and binomial likelihood functions.

Local versus Global algorithms

Optimization algorithms are divided into two categories: local and global algorithms. A local optimization algorithm is appropriate when you know that the function contains a single optimum (see Figure A). A global optimization algorithm is used when little is known about the structure of the function, because it deals better with multimodal functions (see Figure B). You may wonder: if global algorithms are more versatile, why use a local optimization algorithm at all? Besides being computationally more straightforward, local search algorithms are guaranteed to find the local optimum when their assumptions are met, which is not the case for global search algorithms.

Maximum likelihood estimation

We use maximum likelihood estimation (MLE) to estimate the parameters of a statistical model from observations. MLE finds the parameter value that maximizes the likelihood of the observed data given the parameters. As in many statistical applications, we maximize the log-likelihood instead: because the logarithm is a monotonic transformation, the maximum of the log-likelihood coincides with the maximum of the likelihood, while being mathematically simpler because it turns products into sums.
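
As a minimal illustration (with hypothetical data, not part of the app), the likelihood and the log-likelihood of a Poisson sample peak at the same value of \(\theta\):

# hypothetical Poisson counts; both criteria are maximized at the sample mean 2.4
y     <- c(2, 3, 1, 4, 2)
theta <- seq(0.1, 6, by = 0.01)
lik   <- sapply(theta, function(t) prod(dpois(y, t)))            # product of densities
ll    <- sapply(theta, function(t) sum(dpois(y, t, log = TRUE))) # sum of log-densities
theta[which.max(lik)]  # 2.4
theta[which.max(ll)]   # 2.4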

Steepest Ascent

The steepest ascent algorithm is an iterative optimization algorithm for finding the maximum of the likelihood function. It consists of taking repeated steps in the direction of the first derivative. Given a point \(\theta^{t-1}\), the next point \(\theta^{t}\) is obtained by the following updating rule:

$$\theta^{t}=\theta^{t-1} + \alpha f'(\theta^{t-1}) $$

where \(\alpha\) is a small constant chosen to scale the first derivative.
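
The following minimal sketch (assuming the app's default Poisson setting, with sample mean 2.5, n = 10 observations, and starting value 1.5) shows the updating rule in action:

# gradient of the Poisson log-likelihood used in the app
grad  <- function(theta, n = 10, mean = 2.5) -n + mean * n / theta
theta <- 1.5    # starting value
alpha <- 0.015  # step size, as in the app
for (step in 1:200) theta <- theta + alpha * grad(theta)
theta           # approximately 2.5, the sample mean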

In the Shiny app the user can select one of two likelihood functions, Poisson or binomial. The three sliders on the left set the starting point \(\theta^{t-1}\) and the parameters of the function (i.e., the mean and the number of observations for the Poisson, and the numbers of successes and failures for the binomial). The step size \(\alpha\) is fixed at 0.015. The convergence criterion is the termination criterion for the algorithm. The Reset button restarts the visualization after any of these parameters is changed.

The Next step button lets the user visualize each step of the procedure. In the upper figure, the black line is the log-likelihood. The current point \(\theta^{t-1}\) is represented by the blue dot, the continuous red segment is the local approximation, the grey dot is the estimate of \(\theta\) at the next step and, finally, the vertical dashed lines indicate the maximum of the likelihood function. In the lower figure, the x-axis shows the iterations and the black circles are the estimates of \(\theta\) at each step. The green horizontal line is the true value of \(\theta\). Finally, the Finish button lets the algorithm run until it converges or the maximum number of iterations is reached.

Newton-Raphson Algorithm

In the Newton-Raphson algorithm, the likelihood function is locally approximated by a second-order Taylor expansion, and the algorithm then moves to the maximum of this expansion. Given a point \(\theta^{t-1}\), the step from \(\theta^{t-1}\) to the maximum of the second-order Taylor expansion of the log-likelihood is

$$-\frac{f'(\theta^{t-1})}{f''(\theta^{t-1})}.$$

The algorithm takes this step, so the point \(\theta^{t}\) is obtained by the following updating rule:

$$\theta^{t}=\theta^{t-1} - \frac{f'(\theta^{t-1})}{f''(\theta^{t-1})}.$$
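A minimal sketch of the Newton-Raphson iteration, in the same assumed Poisson setting as above:

# gradient and Hessian of the Poisson log-likelihood used in the app
grad  <- function(theta, n = 10, mean = 2.5) -n + mean * n / theta
hess  <- function(theta, n = 10, mean = 2.5) -mean * n / theta^2
theta <- 1.5  # starting value
for (step in 1:6) theta <- theta - grad(theta) / hess(theta)
theta         # 2.5 to numerical precision, after only a few steps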

In the Shiny app the user can select one of two likelihood functions, Poisson or binomial. The three sliders on the left set the starting point \(\theta^{t-1}\) and the parameters of the function (i.e., the mean and the number of observations for the Poisson, and the numbers of successes and failures for the binomial). The convergence criterion is the termination criterion for the algorithm. The Reset button restarts the visualization after any of these parameters is changed.

The Next step button lets the user visualize each step of the procedure. In the upper figure, the black line is the log-likelihood and the red line is the first derivative of the log-likelihood.

The current point \(\theta^{t-1}\) is represented by the blue dot, the continuous green segment is the local approximation of the first derivative, the grey dot is the estimate of \(\theta\) at the next step and, finally, the vertical dashed lines indicate the maximum of the likelihood function. In the lower figure, the x-axis shows the iterations and the black circles are the estimates of \(\theta\) at each step. The green horizontal line is the true value of \(\theta\).

Finally, the Finish button lets the algorithm run until it converges or the maximum number of iterations is reached.

show with app
## libraries ----
library(shiny)
library(shinydashboard) # Guides on how to use this are at: https://rstudio.github.io/shinydashboard/
library(shinyjs)

## Functions ----
source("functions.R")

## Tab pages ----
source("tabs.R")

## UI ----
ui <- dashboardPage(

  dashboardHeader(title = "Local Search Algorithms"),

  dashboardSidebar(
    useShinyjs(),
    includeScript("../../../Matomo-qhelp.js"),
    width = 350,
    sidebarMenu(

      menuItem("Introduction",    tabName = "intro_tab"),
      menuItem("Steepest Ascent", tabName = "accent_tab"),
      menuItem("Newton-Raphson",  tabName = "newtraph_tab"),

      box(
        title = "Variables",
        solidHeader = FALSE, width = NULL, background = "navy",

        actionButton("nextIter",  "Next step"),
        actionButton("finish",    "Finish"),
        actionButton("reset",     "Reset"),
        selectInput("func", "Function",
                    choices = list("Poisson"  = 1, "Binomial" = 2),
                    selected = 1),

        conditionalPanel(
          "input.func == 1",
          sliderInput("pois_start_val", "Starting value", 1.5, min = 1,   max = 5,   step = 0.10),
          sliderInput("pois_mean",      "Mean",           2.5, min = 1.5, max = 4.5, step = 0.10),
          sliderInput("pois_n",         "Observations",   10,  min = 1,   max = 100)
        ),
        conditionalPanel(
          "input.func == 2",
          sliderInput("binom_start_val", "Starting value", .5, min = 0, max = 1),
          sliderInput("binom_successes", "Successes",       3, min = 1, max = 10),
          sliderInput("binom_failures",  "Failures",        7, min = 1, max = 10)
        ),

        numericInput("criterion", "Convergence criterion", 1e-5, min = 1e-10, max = 1e-3)
      )
    )
  ),

  dashboardBody(tabItems(intro_tab, accent_tab, newtraph_tab))
)

## Server ----
server <- function(input, output, session) {
  shinyjs::runjs('toggleCodePosition();')

  # Replace shiny's invalidateLater with invalidateLaterNew (see functions.R),
  # whose extra flag stops the Finish animation once the history is exhausted
  unlockBinding("invalidateLater", as.environment("package:shiny"))
  assign("invalidateLater", invalidateLaterNew, "package:shiny")


  #### reactive values ####
  i       <- reactiveVal(1)
  history <- get_iter_history(input)

  output$accent   <- renderPlot({plot_accent(history$accent, i())})
  output$newtraph <- renderPlot({plot_newtraph(history$newtraph, i())})


  # multimodal toy function (x * sin(x)) used to draw panel B of the intro figure
  test <- function(x) {
    x * sin(x)
  }

  output$intro <- renderPlot({

    par(mfrow=c(1,2))
    plot(loglikpois(seq(0, 5, 0.01), 31, 1)$ll, type = "l", xaxt='n', yaxt = 'n',
         ylab = "Log Likelihood", xlab = expression(theta), main = "A")
    plot(test(seq(0, 4*pi, 0.01)), type = "l", xaxt='n', yaxt = 'n',
         ylab = "Log Likelihood", xlab = expression(theta), main = "B")})

  #### buttons handling ####
  observeEvent(input$reset, {
    i(1)
    history <<- get_iter_history(input)
    output$accent   <- renderPlot({plot_accent(history$accent, i())})
    output$newtraph <- renderPlot({plot_newtraph(history$newtraph, i())})
  })

  observeEvent(input$nextIter, {
    i(i() + 1)
    output$accent   <- renderPlot({plot_accent(history$accent, i())})
    output$newtraph <- renderPlot({plot_newtraph(history$newtraph, i())})
  })

  observeEvent(input$finish, {

    ii <- i()
    jj <- i()

    output$accent <- renderPlot({
      ii <<- ii + 1
      invalidateLaterNew(400, session,  ii <= length(history$accent$theta))
      plot_accent(history$accent, ii)
    })

    output$newtraph <- renderPlot({
      jj <<- jj + 1
      invalidateLaterNew(400, session,  jj <= length(history$newtraph$theta))
      plot_newtraph(history$newtraph, jj)
    })

  })

}

shinyApp(ui, server)
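
## functions.R ----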
#### Helpers ####
t_likelihood  <- function(input){
  switch(
    input$func,
    "1" = "pois",
    "2" = "binom"
  )
}
t_data        <- function(input){
  switch(
    t_likelihood(input),
    "pois"  = list(
      n    = if(is.null(input$pois_n))    10  else input$pois_n,
      mean = if(is.null(input$pois_mean)) 2.5 else input$pois_mean
    ),
    "binom" = list(
      n         = if(is.null(input$binom_failures) || is.null(input$binom_successes)) 10 else input$binom_successes + input$binom_failures,
      successes = if(is.null(input$binom_successes))  3 else input$binom_successes)
  )
}
t_start_pos   <- function(input){
  switch(
    t_likelihood(input),
    "pois"  = if(is.null(input$pois_start_val))  1.5 else input$pois_start_val,
    "binom" = if(is.null(input$binom_start_val))  .5 else input$binom_start_val
  )
}
t_criterion   <- function(input){
  return(if(is.null(input$criterion)) 1e-5 else input$criterion)
}

#### Likelihoods ####
# Log-likelihood of the Poisson ('mean' is the sample mean of the counts;
# additive constants dropped), with gradient and Hessian:
loglikpois <- function(theta, n, mean) {
  ll   <- -n * theta + mean * n * log(theta)
  grad <- -n + mean * n / theta
  hess <- - mean * n / theta^2
  out  <- list(ll = ll, grad = grad, hess = hess)
  return(out)
}
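
# Quick sanity check (illustration only, not used by the app): the analytic
# gradient at theta = 2 with n = 10 and mean = 2.5 is -10 + 25/2 = 2.5,
# matching a central finite difference of the log-likelihood:
# (loglikpois(2 + 1e-6, 10, 2.5)$ll - loglikpois(2 - 1e-6, 10, 2.5)$ll) / 2e-6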

# Log-likelihood of the binomial (additive constants dropped), with gradient and Hessian:
loglikbinom <- function(theta, n, successes){
  ll   <- successes*log(theta)+(n-successes)*log(1-theta)
  grad <- successes/theta - (n-successes)/(1-theta)
  hess <- -successes/theta^2 - (n-successes)/(1-theta)^2
  out  <- list(ll = ll, grad = grad, hess = hess)
  return(out)
}
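
# Quick sanity check (illustration only, not used by the app): the gradient
# vanishes at the sample proportion, e.g. loglikbinom(0.3, 10, 3)$grad is 0,
# confirming that theta = successes / n maximizes this log-likelihood.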

# Wrapper for choosing the likelihood
get_likelihood <- function(likelihood){
  switch(
    likelihood,
    "pois"   = loglikpois,
    "binom"  = loglikbinom
  )
}
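
# e.g. get_likelihood("pois") returns loglikpois, so
# get_likelihood("pois")(theta = 2, n = 10, mean = 2.5)$grad evaluates to 2.5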

#### Algorithms ####
# Steepest ascent algorithm (identifiers keep the original "accent" spelling)
alg_accent   <- function(likelihood, data, start_pos, criterion, alpha = 0.015, maxiter = 500){

  # initiate the algorithm
  args       <- data
  args$theta <- start_pos
  fun        <- get_likelihood(likelihood)
  out        <- do.call(fun, args)

  th.vec      <- start_pos
  LL          <- out$ll
  LLp         <- out$grad
  convergence <- FALSE

  # run the algorithm
  i <- 1
  while (!convergence & (i < maxiter)){

    th.vec[i+1] <- th.vec[i] + alpha*out$grad
    args$theta  <- th.vec[i+1]
    out         <- do.call(fun, args)

    LL[i+1]     <- out$ll
    LLp[i+1]    <- out$grad
    convergence <- (abs(th.vec[i+1]-th.vec[i])) < criterion

    i <- i + 1
  }

  return(list(
    t     = seq(1, i),
    theta = th.vec,
    ell   = LL,
    ellp  = LLp,
    fun   = fun,
    data  = data,
    likelihood = likelihood
  ))
}
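
# e.g. alg_accent("pois", data = list(n = 10, mean = 2.5), start_pos = 1.5,
#      criterion = 1e-5)$theta traces the estimates from 1.5 towards 2.5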

# Newton-Raphson algorithm
alg_newtraph <- function(likelihood, data, start_pos, criterion, maxiter = 500){

  # initiate the algorithm
  args       <- data
  args$theta <- start_pos
  fun        <- get_likelihood(likelihood)
  out        <- do.call(fun, args)

  th.vec      <- start_pos
  LL          <- out$ll
  LLp         <- out$grad
  LLp2        <- out$hess
  convergence <- FALSE

  # run the algorithm
  i <- 1
  while(!convergence & (i < maxiter)){

    th.vec[i+1] <- th.vec[i] - out$grad/out$hess
    args$theta  <- th.vec[i+1]
    out         <- do.call(fun, args)

    LL[i+1]     <- out$ll
    LLp[i+1]    <- out$grad
    LLp2[i+1]   <- out$hess
    convergence <- abs(th.vec[i+1]-th.vec[i]) < criterion

    i <- i + 1
  }

  return(list(
    t     = seq(1, i),
    theta = th.vec,
    ell   = LL,
    ellp  = LLp,
    ellp2 = LLp2,
    fun   = fun,
    data  = data,
    likelihood = likelihood
  ))
}
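
# e.g. alg_newtraph("binom", data = list(n = 10, successes = 3), start_pos = .5,
#      criterion = 1e-5)$theta reaches 0.3 in a single step (the gradient is
#      then exactly zero, so the algorithm stops)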

# Wrapper for obtaining the iteration history
get_iter_history <- function(input){

  likelihood <- isolate(t_likelihood(input))
  data       <- isolate(t_data(input))
  start_pos  <- isolate(t_start_pos(input))
  criterion  <- isolate(t_criterion(input))

  return(list(
    "newtraph" = alg_newtraph(likelihood, data, start_pos, criterion),
    "accent"   = alg_accent(likelihood, data, start_pos, criterion)
  ))
}

#### Plotting functions ####
plot_accent   <- function(history, i){

  th.vec <- history$theta
  LL     <- history$ell
  LLp    <- history$ellp
  fun    <- history$fun
  data   <- history$data
  likelihood <- history$likelihood

  if(i > length(th.vec)){
    i         <- length(th.vec)
    converged <- TRUE
  }else{
    converged <- FALSE
  }

  if(likelihood == "pois"){
    x <- seq(1, 5, length.out = 100)
    x_adj <- .25
  }else if(likelihood == "binom"){
    x <- seq(.05, .95, length.out = 100)
    x_adj <- .05
  }

  args       <- data
  args$theta <- x
  out    <- do.call(fun, args)
  loglik <- out$ll
  grad   <- out$grad

  if(likelihood == "pois"){
    x_range <- range(x)
    y_range <- range(loglik)
  }else if(likelihood == "binom"){
    x_range <- c(0, 1)
    y_range <- range(loglik)
  }

  par(mfrow = c(2, 1))

  # general plot
  plot(NA, type = "n", xlim = x_range, ylim = y_range, las = 1,
       main = "The Steepest-Ascent-Algorithm", ylab = "Log Likelihood", xlab = expression(theta))
  lines(x, loglik, lwd = 2)
  abline(h = max(loglik), lty = 2)
  text(x_range[1], y_range[1], paste("Iter: ", i, if(converged) "(Converged)"), adj = c(0, 0), cex = 1.25)
  text(x_range[2], y_range[1], bquote(hat(theta)*"="*.(format(round(th.vec[i], 3), nsmall = 3))), adj = c(1, 0), cex = 1.25)
  if(likelihood == "pois"){
    abline(v = data$mean, lty = 2, lwd = 2, col = "green")
  }else if(likelihood == "binom"){
    abline(v = data$successes / data$n, lty = 2, lwd = 2, col = "green")
  }
  legend(
    "topright",
    legend = c("f(x)", "f'(x)"),
    col    = c("black", "red"),
    lwd    = 2,
    bty    = "n"
  )

  # prediction of the next movement
  if(i + 1 <= length(th.vec)){
    arr_adj <- if(th.vec[i] > th.vec[i+1]) .01 else -.01
    points(th.vec[i + 1], LL[i + 1], col = "grey", pch = 1, lwd = 3)
    arrows(th.vec[i] + arr_adj/2, y_range[1], th.vec[i+1] - arr_adj/2, length = .10)
  }

  # iterations
  points(th.vec[i], LL[i], col = "blue", pch = 20, lwd = 3)
  segments(
    x0 = th.vec[i] - x_adj,
    y0 = (th.vec[i] - x_adj) * LLp[i] + LL[i] - LLp[i] * th.vec[i],
    x1 = th.vec[i] + x_adj,
    y1 = (th.vec[i] + x_adj) * LLp[i] + LL[i] - LLp[i] * th.vec[i],
    col = "red", lwd = 2)

  # Plot the iteration history:
  x_range2 <- if(i == 1) c(1, 2) else c(1,i)
  plot(1:i, th.vec[1:i], xlim = x_range2, ylim = x_range, lwd = 2, type = "b", las = 1,
       main = "Iteration", ylab = "Parameter Estimate", xlab = "")
  if(likelihood == "pois"){
    abline(h = data$mean, lty = 2, lwd = 2, col = "green")
    text(x_range2[1], data$mean, expression(theta), adj = c(0, 0), cex = 1.25)
  }else if(likelihood == "binom"){
    abline(h = data$successes / data$n, lty = 2, lwd = 2, col = "green")
    text(x_range2[1], data$successes / data$n, expression(theta), adj = c(0, 0), cex = 1.25)
  }
}

plot_newtraph <- function(history, i){

  th.vec <- history$theta
  LL     <- history$ell
  LLp    <- history$ellp
  LLp2   <- history$ellp2
  fun    <- history$fun
  data   <- history$data
  likelihood <- history$likelihood

  if(i > length(th.vec)){
    i         <- length(th.vec)
    converged <- TRUE
  }else{
    converged <- FALSE
  }

  if(likelihood == "pois"){
    x <- seq(1, 5, length.out = 100)
    x_adj <- .25
  }else if(likelihood == "binom"){
    x <- seq(.05, .95, length.out = 100)
    x_adj <- .05
  }

  args       <- data
  args$theta <- x
  out    <- do.call(fun, args)
  loglik <- out$ll
  grad   <- out$grad


  if(likelihood == "pois"){
    x_range <- range(x)
    y_range <- range(c(loglik, grad))
  }else if(likelihood == "binom"){
    x_range <- c(0, 1)
    y_range <- range(c(loglik, grad))
  }

  par(mfrow = c(2, 1))

  # general plot
  plot(NA, type = "n", xlim = x_range, ylim = y_range, las = 1,
       main = "The Newton-Raphson-Algorithm", ylab = "Log Likelihood", xlab = expression(theta))
  lines(x, loglik, lwd = 2)
  lines(x, grad,   col = "red")
  abline(h = 0,    lty = 2)
  text(x_range[1], y_range[1], paste("Iter: ", i, if(converged) "(Converged)"), adj = c(0, 0), cex = 1.25)
  text(x_range[2], y_range[1], bquote(hat(theta)*"="*.(format(round(th.vec[i], 3), nsmall = 3))), adj = c(1, 0), cex = 1.25)
  legend(
    "topright",
    legend = c("f(x)", "f'(x)", "f''(x)"),
    col    = c("black", "red", "green"),
    lwd    = 2,
    bty    = "n"
  )
  if(likelihood == "pois"){
    abline(v = data$mean, lty = 2, lwd = 2, col = "green")
  }else if(likelihood == "binom"){
    abline(v = data$successes / data$n, lty = 2, lwd = 2, col = "green")
  }

  # prediction of the next movement
  if(i + 1 <= length(th.vec)){
    arr_adj <- if(th.vec[i] > th.vec[i+1]) .01 else -.01
    points(th.vec[i + 1], LLp[i + 1], col = "grey", pch = 1, lwd = 3)
    arrows(th.vec[i] + arr_adj/2, y_range[1], th.vec[i+1] - arr_adj/2, length = .10)
  }

  # iterations
  points(th.vec[i], LLp[i], col = "blue", pch = 20, lwd = 5)
  lines(c(th.vec[i] - x_adj, th.vec[i] + x_adj),
        (LLp[i] - LLp2[i] * th.vec[i]) + LLp2[i] * c(th.vec[i] - x_adj, th.vec[i] + x_adj),
        col = "green", lwd = 2)

  # Plot the iteration history:
  x_range2 <- if(i == 1) c(1, 2) else c(1,i)
  plot(1:i, th.vec[1:i], xlim = x_range2, ylim = x_range, lwd = 2, type = "b", las = 1,
       main = "Iteration", ylab = "Parameter Estimate", xlab = "")
  if(likelihood == "pois"){
    abline(h = data$mean, lty = 2, lwd = 2, col = "green")
    text(x_range2[1], data$mean, expression(theta), adj = c(0, 0), cex = 1.25)
  }else if(likelihood == "binom"){
    abline(h = data$successes / data$n, lty = 2, lwd = 2, col = "green")
    text(x_range2[1], data$successes / data$n, expression(theta), adj = c(0, 0), cex = 1.25)
  }
}



# making the animation work
# https://stackoverflow.com/questions/43337147/r-shiny-adding-to-plot-via-a-loop/43344162#43344162
invalidateLaterNew <- function (millis, session = getDefaultReactiveDomain(), update = TRUE) {
  if(update){
    ctx <- shiny:::.getReactiveEnvironment()$currentContext()
    shiny:::timerCallbacks$schedule(millis, function() {
      if (!is.null(session) && session$isClosed()) {
        return(invisible())
      }
      ctx$invalidate()
    })
    invisible()
  }
}
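
## tabs.R ----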
#### intro_tab ####
intro_tab <- tabItem(
  tabName = "intro_tab",
  h3("Introduction to local search algorithms"),
  p("This app is designed to help you understand the Steepest Ascent and Newton-Raphson local search algorithms used in statistical estimation.
    For simplicity, this app focusses on one-dimensional functions with one local maximum, namely the Poisson and Binomial functions"),
  h4("Local versus Global algorithms"),
  p("Optimization algorithms are divided into two categories; local and global algorithms. A local optimization algorithm is used when you know that the function contains a single optimum (see Figure A).
    A global optimization algorithm is used when there is little known about the structure of the function because it deals with multi-model distribtuions better (see Figure B).
    You may wonder, if global algorithms are more versatile, why use a local optimization algorithm at all? Besides that local search algorithms are computationally more straightforward,
    there is a guarantee that it will find the local optima when the assumptions are met,
    which is not the case for global search algorithms"),
  plotOutput("intro", height = "450px", width = "80%"),
  h4("Maximum likelihood estimation"),
  p("We use maximum likelihood estimation (MLE) to estimate the parameters of a statistical model given observations.
    MLE tries to find the parameter value that maximizes the likelihood of making the observations given the parameters.
    As in many statistical applications, we will maximize the log-likelihood; as finding the maximum value of the likelihood is the same as finding the maximum value of the log-likelihood
    (given that it is a monotonic transformation), but it is mathematically simpler because it turns multiplication into sums.")
)

#### steep_tab ####
accent_tab <- tabItem(
  tabName = "accent_tab",
  h3("Steepest Ascent"),
  withMathJax(),
  # p("Steepest Ascent is a local search algorithm, slowly moving to the local maxima"),
  p("The steepest ascent algorithm is an iterative optimization algorithm to find the maximum of the likelihood function. This algorithm consists of taking repeated steps in the direction of the first derivative. Given a point \\(\\theta^{t-1}\\) the next step of the algorithm \\(\\theta^{t}\\) is obtained by the following updating rule:"),
  p("$$\\theta^{t}=\\theta^{t-1} - \\alpha f'(\\theta^{t-1}) $$"),
  p("where \\(\\alpha\\) a smaller constant chooses to scale the first derivative."),
  p("In the ShinyApp the user can select one of the two likelihood functions, namely,
    Poisson and binomial. The three sliders on the left allow to set the starting point \\(\\theta^{t-1}\\)
    and the parameters of the function (i.e., mean and the number of observations for
    the Poisson and number of observations and successes for the binomial). The alpha is set to 0.015.
    The convergence criterion is the termination criterion for the algorithm.
    The ", strong("Reset") ," button allows restarting the visualization after the change of any of these parameters. "),
  p("The ",strong("Next step")," button allows the user to visualize each step of the procedure.
     In the upper figure, the black line is the log-likelihood.
     The current point \\(\\theta^{t-1}\\) is represented by the ", span("blue dot",style= "color:blue"),", the continuous ", span("red segment",style= "color:red")," is the local approximation, the ", span("grey dot",style= "color:grey")," is the estimation of \\(\\theta\\) at the next step and finally, the vertical dashed lines indicate the maximum of the likelihood function.
     In the lower figure, on the x-axis, there are iterations and the black circles are the estimation of \\(\\theta\\) at that step. The green horizontal line is the true value of \\(\\theta\\)
     Finally, the", strong("Finish") ," button allows the algorithm to continue until the algorithm converges or the maximum number of iterations is reached."),

  plotOutput("accent", height = "600px", width = "80%")
)

#### new_raph_tab ####
newtraph_tab <- tabItem(
  tabName = "newtraph_tab",
  h3("Newton-Raphson Algorithm"),
  withMathJax(),
  #p("This algorithm is much faster than Steepest Ascent"),
  p("In the Newton-Raphson algorithm, the likelihood function is locally approximated as a second-order Taylor expansion, then the algorithm computes the maximum of this expansion. Given a point \\(\\theta^{t-1}\\) the maximum of the second-order Taylor expansion of the log-likelihood is"),
  p("$$\\frac{f'(\\theta^{t-1})}{f''(\\theta^{t-1})}.$$"),
  p("The algorithm proceeds in the direction of the maximum and the point \\(\\theta^{t}\\) is obtained by the following updating rule"),
  p("$$\\theta^{t}=\\theta^{t-1} - \\frac{f'(\\theta^{t-1})}{f''(\\theta^{t-1})}.$$"),
  p("In the ShinyApp the user can select one of the two likelihood functions, namely, Poisson and binomial.
    The three sliders on the left allow to set the starting point \\(\\theta^{t-1}\\) and the parameters of the
    function (i.e., mean and the number of observations for the Poisson and number of observations and successes for
    the binomial). The convergence criterion is the termination criterion for the algorithm.
    The ", strong("Reset") ," button allows to restart the visualization after the change of any of
    these parameters. "),
  p("The ",strong("Next step")," button allows the user to visualize each step of the procedure.
    In the upper figure, the black line is the log-likelihood,
    and the ", span("red line",style= "color:red")," is the first derivative of the log-likelihood."),
  p("The current point \\(\\theta^{t-1}\\) is represented by the ", span("blue dot",style= "color:blue"),", the continuous ", span("green segment",style= "color:green")," is the local approximation of the first derivative,
    the ", span("grey dot",style= "color:grey")," is the estimation of \\(\\theta\\) at the next step and finally,
    the vertical dashed lines indicate the maximum of the likelihood function.
    In the lower figure, on the x-axis, there are iterations and the black circles are the estimation of \\(\\theta\\) at that step. The green horizontal line is the true value of \\(\\theta\\) "),
  p("Finally, the ", strong("Finish") ," button allows the algorithm to continue until the algorithm converges or the maximum number of iterations is reached."),

  plotOutput("newtraph", height = "600px", width = "80%")
)