A decision tree for function minimization

What R calls “optimization” is more generally known as function minimization. The tools behind the stats package function optim() are all essentially function minimizers, as are those in the package optimrx found at https://r-forge.r-project.org/projects/optimizer/. optimrx tries to make it easier for users to call quite a few methods without having to learn many different sets of command syntax. It uses the same syntax as optim(), even down to parameter scaling, selecting the solver via the method="name" argument.
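As a hypothetical illustration of that common syntax, here is the classic Rosenbrock test function minimized with base optim(); with optimrx installed, switching solvers should only require changing the method name (the optimr() call shown in the comment is an assumption based on the package's stated aim, not something verified here).

```r
# Rosenbrock banana function, a standard optimization test problem
fr <- function(x) {
  100 * (x[2] - x[1]^2)^2 + (1 - x[1])^2
}
x0 <- c(-1.2, 1)

# Base R solver
sol1 <- optim(x0, fr, method = "BFGS")
sol1$par   # close to the true minimum at c(1, 1)

# With optimrx (if installed), only the method name need change:
# library(optimrx)
# sol2 <- optimr(x0, fr, method = "Rvmmin")
```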

There are, however, a number of methods. Which is best for your purpose? This posting presents a form of decision tree to try to help. It isn’t the last word, and you should not take it as a set of hard rules. But it does reflect a lot of experience and I have tried to give the reasoning behind some of the suggestions.

We assume there is a function we wish to minimize or maximize. Call it fn(x), where x is a parameter vector of length n; we will let edata be exogenous data — those numbers or factors that are needed to compute the function but which we will not be adjusting. So we should really write fn(x, edata), but will often be lazy, just as R allows us to be with exogenous data.

Function fn(x) is assumed:

  • computable
  • to have an initial starting set of parameters x0 that is feasible (which I won’t define just now)
  • to have essentially one local minimum (though we often attempt problems with multiple minima present if these minima are believed to be sufficiently separated).

Note: if fn() is a sum of squares, then packages nlmrt or minpack.lm are preferred. The stats function nls() may also be useful. I plan to write more about nonlinear least squares in a later post. If you are desperate, then get in touch with me!
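For readers who want a taste before that later post, here is a minimal nls() sketch on invented toy data; nlmrt and minpack.lm offer more robust alternatives for harder sum-of-squares problems.

```r
# Toy exponential-decay data with a little noise
set.seed(1)
x <- 1:10
y <- 2 * exp(-0.3 * x) + rnorm(10, sd = 0.001)

# Fit y ~ a * exp(-b * x) by nonlinear least squares
fit <- nls(y ~ a * exp(-b * x), start = list(a = 1, b = 0.1))
coef(fit)   # near a = 2, b = 0.3
```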

Let us proceed in a stepwise fashion.

Step 0: Presets.
If we are maximizing, define fn(x, edata) <- -fn.orig(x, edata). That is, we minimize the negative of the function we want to maximize.
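A minimal sketch of this preset, with an invented one-parameter function:

```r
# Maximize fn.orig(x) = -(x - 3)^2 + 5 by minimizing its negative
fn.orig <- function(x) -(x - 3)^2 + 5
fn <- function(x) -fn.orig(x)

res <- optim(0, fn, method = "BFGS")
res$par      # near 3, the maximizer
-res$value   # near 5, the maximum of fn.orig
```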

Let us set some logical variables to define our problem:

  • haveg = TRUE if there are analytic gradients of fn(x) available
  • bounds = TRUE if there are lower and/or upper bounds on the parameters x
  • masks = TRUE if some parameters are fixed — currently only packages Rvmmin  and Rcgmin support masks well.

Step 1: Check function  (This is always a REALLY good idea.)

  • compute fn(x, edata) for at least x0 and verify the value is correct
  • haveg TRUE: compute g(x0) and compare it with grad(fn, x0, edata) from package numDeriv
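The gradient check can be sketched as follows, assuming numDeriv is installed (the toy function and gradient here are invented for illustration):

```r
library(numDeriv)

fn <- function(x) sum((x - c(1, 2))^2)   # toy objective
gr <- function(x) 2 * (x - c(1, 2))      # its analytic gradient
x0 <- c(0.5, 0.5)

ga <- gr(x0)         # analytic gradient
gn <- grad(fn, x0)   # numerical gradient from numDeriv
max(abs(ga - gn))    # should be tiny if gr() is coded correctly
```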

Step 2: Running a method

  • bounds TRUE,
    • haveg FALSE: try nmkb() from package dfoptim, but note that the transfinite-function approach to bounds in this method cannot have x0 — in fact any initial parameter — on a bound. bobyqa() from package minqa may be much more efficient, but can behave poorly. hjkb() from package dfoptim is very easy to understand, but sometimes stalls and is generally inefficient.
    • haveg TRUE:
      • n < 50 (i.e., small): packages Rvmmin or nlminb
      • n large: packages Rtnmin, Rcgmin, or method L-BFGS-B (from optim() in stats). Package lbfgsb3 is an implementation of the 2011 FORTRAN version of the same algorithm and in theory may be preferred, but L-BFGS-B called via optim() or optimr() is likely more efficient. The three approaches mentioned can also be applied to small problems, and sometimes are as efficient as the methods recommended for small n. Rtnmin and the L-BFGS variants are in the same general class of methods, but have many differences in implementation approach. Generally I have found that the relatively newer Rtnmin may be faster, but less reliable, than Rcgmin, though the former seems to work better when the function has more “nonlinearity” (which for now I will not define). Rvmmin is, by design, very aggressive in trying to find a minimum, so will “try too hard” in many cases. As such it should not be criticised for slowness when it is intended to favour reliability over speed.
  • bounds FALSE:
    • haveg FALSE: methods nmk() from dfoptim or uobyqa() from minqa are indicated (but bobyqa() from minqa may be better, as it is more up to date than uobyqa(); we can use very loose values for the bounds on x)
    • haveg TRUE:
      • n small: packages Rvmmin or ucminf, or function nlm() from stats.
      • n large: method L-BFGS-B or packages Rcgmin and Rtnmin
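As one concrete instance of the bounds-TRUE, haveg-TRUE branch above, here is optim()'s L-BFGS-B on an invented bounded problem:

```r
# Unconstrained minimum is at c(2, 2); the bounds force it to c(1, 1)
fn <- function(x) sum((x - 2)^2)
gr <- function(x) 2 * (x - 2)

res <- optim(c(0, 0), fn, gr, method = "L-BFGS-B",
             lower = c(-1, -1), upper = c(1, 1))
res$par   # both parameters end up on the upper bound, 1
```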

In the above I have not mentioned function spg() from package BB. My experience is that occasionally this can be very efficient, but more often slower than other methods. However it has a particular advantage of allowing the application of quite complicated constraints.
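A hedged sketch of that constraint facility, assuming package BB is installed: spg() accepts a project function that maps each trial point back into the feasible set. The toy problem below is invented for illustration.

```r
library(BB)

# Minimize sum((x + 1)^2) subject to x >= 0;
# unconstrained minimum at c(-1, -1), constrained minimum at c(0, 0)
fn <- function(x) sum((x + 1)^2)
proj <- function(x) pmax(x, 0)   # projection onto the nonnegative orthant

res <- spg(par = c(1, 1), fn = fn, project = proj, quiet = TRUE)
res$par   # near the constrained minimum at the origin
```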




2 thoughts on “Choosing which method to use for “optimizing” functions”

  1. Hi John,
    I have the experience that nlminb works pretty well even if one does not have gradients. Can you provide some explanation for why those functions that can work with a gradient, but do not need it should not be used if one does not have the gradient?


    1. That’s actually worth a blog post itself, and I’ll try to do one sometime. The general approach is that gradient methods supply an approximation when there is no gradient provided. With a simple forward approximation ((f(x + delta) - f(x))/delta) we get a secant method for minimization. This often does even better than the analytic gradient in early steps. But when we get “near” a minimum, we really want the precise gradient to know we are “finished”. For a large percentage of problems, the approximation is fine. But there is a real fraction of problems for which we get into trouble and either stop early or just plain get things wrong. Unfortunately, people (including the one I see in the mirror!) want to get the answers and don’t investigate the failures enough. That’s work that needs to be done to give better support to this answer.
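      The forward approximation just described can be sketched in a few lines of R (fwd.grad is an invented name for illustration):

```r
# Forward-difference gradient: (f(x + delta) - f(x)) / delta, per coordinate
fwd.grad <- function(fn, x, delta = 1e-7) {
  f0 <- fn(x)
  g <- numeric(length(x))
  for (i in seq_along(x)) {
    xi <- x
    xi[i] <- xi[i] + delta
    g[i] <- (fn(xi) - f0) / delta
  }
  g
}

fn <- function(x) sum(x^2)
fwd.grad(fn, c(1, 2))   # close to the true gradient c(2, 4)
```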

      So the argument comes down to “Are we there yet?”. You will find the non-gradient Nelder-Mead family methods do very well getting close to a minimum, but spend a lot of effort deciding they are done. That is where good gradient calculations are valuable.

      FWIW, I’m working on an optimization test structure to try to record and analyze cases. It’s a lot of work, and not stuff that generally gets a lot of applause. However, I’ll keep at it steadily.


