**A decision tree for function minimization**

What R calls “optimization” is more generally known as function minimization. The tools in the **stats **package function *optim() *are all essentially function mimizers, as are those in the package **optimrx **found at https://r-forge.r-project.org/projects/optimizer/. **optimrx **tries to make it easier for users to call quite a few methods without having to learn many different sets of command syntax. It uses the same syntax as *optim() , *even down to parameter scaling, by changing the

**method=”name”**argument.

There are, however, a number of methods. Which is best for your purpose? This posting presents a form of decision tree to try to help. It isn’t the last word, and you should not take it as a set of hard rules. But it does reflect a lot of experience and I have tried to give the reasoning behind some of the suggestions.

We assume there is a function we wish to minimize or maximize. Call it *fn(x) *where *x* is parameter vector of length *n*; we will let e*data* be exogenous data — those numbers or factors that are needed to compute the function but which we will not be adjusting. So we should really write *fn(x, edata)*, but will often be lazy, just as R allows us to be with exogenous data.

Function *fn(x)* is assumed:

- computable
- to have an initial starting set of parameters
*x0*that is feasible (which I won’t define just now) - to have more or less 1 local minimum (though we often try to solve problems with multiple minima present if these minima are believed to be sufficiently separated).

Note: if *fn()* is a sum of squares, then packages **nlmrt** or **minpack.lm **are preferred. The **stats** function *nls()* may also be useful. I plan to write more about nonlinear least squares in a later post. If you are desperate, then get in touch with me!

Let us proceed in a stepwise fashion.

**Step 0: Presets.**

If we are maximizing , create *fn(x, edata) <- -fn.orig(x, edata) . *That is, we

minimize the negative of the function we want to maximize.

Let us set some logical variables to define our problem:

*haveg*= TRUE if there are analytic gradients of*fn(x)*available*bounds*= TRUE if there are lower and/or upper bounds on the parameters*x**masks*= TRUE if some parameters are fixed — currently only packages**Rvmmin**and**Rcgmin**support masks well.

* Step 1: Check function *(This is always a REALLY good idea.)

- compute
*fn(x, edata)*for at least*x0*and verify the value is correct *haveg*TRUE: compute*g(x0)*and compare it with*grad(fn, x0, edata)*from package**numDeriv**

**Step2: Running a method**

*bounds*TRUE,*haveg*FALSE: try*nmkb()*from package**dfOptim**, but note that the transfinite function approach to bounds in this method cannot have*x0*— in fact any initial parameter — on a bound.*bobyqa()*from package**minqa**may be much more efficient, but can behave poorly.*hjkb()*from package**dfOptim**is very easy to understand, but sometimes stalls and generally inefficient.*haveg*TRUE:*n < 50*(i.e., small)*:*packages**Rvmmin**or**nlminb***n large*: packages**Rtnmin**,**Rcgmin**, or method**L-BFGS-B**(from*optim()*in**stats**). Package**lbfgsb3**is an implementation of the 2011 FORTRAN version of the same algorithm and in theory may be preferred, but*L-BFGS-B()*called via*optim()*or*optimr()*is likely more efficient. The three approaches mentioned can also be applied to small*n*problems, and sometimes are as efficient as the methods recommended for small*n*.**Rtnmin**and LBFGS variants are in the same general class of methods, but have many differences in implementation approach. Generally I have found that the relatively newer**Rtnmin**may be faster, but less reliable than,**Rcgmin**though the former seems to work better when the function has more “nonlinearity” (which for now I will not define).**Rvmmin**is, by design, very aggressive in trying to find a minimum, so will “try too hard” in many cases. As such it should not be criticised for slowness when it is intended to favour reliability over speed.

*bounds*FALSE:*haveg*FALSE: Methods*nmk()*from**dfOptim**, or*uobyqa()*from**minqa**are indicated (but*bobyqa()*from**minqa**may be better as it is more up-to-date than*uobyqa();*we can use very loose values for the bounds on*x*)*haveg*TRUE:*n*small: packages**Rvmmin**or**ucminf**, or function*nlm()*from**stats.***n*large: method**L-BFGS-B**or packages**Rcgmin**and**Rtnmin**

In the above I have not mentioned function *spg()* from package **BB**. My experience is that occasionally this can be very efficient, but more often slower than other methods. However it has a particular advantage of allowing the application of quite complicated constraints.