
A Numerical Study on Efficiency and Robustness of
Some Conjugate Gradient Algorithms for
Large-scale Unconstrained Optimization

Neculai ANDREI 1,2

1 Center for Advanced Modeling and Optimization,
ICI Bucharest (National Institute for R & D in Informatics),
8-10 Averescu Blvd., 011455 Bucharest 1, Romania
nandrei@ici.ro

2 Academy of Romanian Scientists,
54 Splaiul Independenţei, Bucharest 5, Romania

Abstract: A numerical evaluation and comparison, using performance profiles, of some representative conjugate gradient algorithms for solving a large variety of large-scale unconstrained optimization problems is carried out. For this intensive numerical study we selected eight well-known conjugate gradient algorithms: Hestenes and Stiefel (HS), Polak-Ribière-Polyak (PRP), CONMIN, ASCALCG, CG-DESCENT, AHYBRIDM, THREECG and DESCON. These algorithms differ in many respects, yet they share many concepts, which makes the numerical comparisons meaningful. In all algorithms the initial search direction is the negative gradient at the initial point, and the step length is computed by the Wolfe line search conditions. Except for CONMIN and CG-DESCENT, all the algorithms in this numerical study implement an acceleration scheme which modifies the step length in a multiplicative manner in order to improve the reduction of the function values along the iterations. The numerical study is based on a set of 800 artificial large-scale unconstrained optimization test functions of different complexity and with different structures of the Hessian matrix. A detailed numerical evaluation based on performance profiles is applied to the comparison of these algorithms, showing that all of them are able to solve difficult large-scale unconstrained optimization problems. However, comparisons using only artificial test problems are weak and depend on arbitrary choices concerning the stopping criteria of the algorithms and on the decision of whether or not an algorithm has found a solution. Drawing definitive conclusions from comparisons based only on artificial test problems is an illusion. Using some real unconstrained optimization applications, however, we can reach a more confident conclusion about the efficiency and robustness of the optimization algorithms considered in this numerical study.

Keywords: Large-scale unconstrained optimization; conjugate gradient algorithms; numerical comparisons; benchmarking; performance profiles; data profiles; MINPACK-2 applications.

CITE THIS PAPER AS:
Neculai ANDREI, A Numerical Study on Efficiency and Robustness of Some Conjugate Gradient Algorithms for Large-scale Unconstrained Optimization, Studies in Informatics and Control, ISSN 1220-1766, vol. 22 (4), pp. 259-284, 2013. https://doi.org/10.24846/v22i4y201302

Introduction

The conjugate gradient method represents an important computational innovation for continuously differentiable large-scale unconstrained optimization, with strong local and global convergence properties and very modest and predictable memory requirements. This family of algorithms includes many variants and extensions with important convergence properties and numerical efficiency. Unlike Newton or quasi-Newton methods (including the limited-memory quasi-Newton methods), the descent condition plays a crucial role in the convergence of conjugate gradient algorithms. Characteristically, the search directions in conjugate gradient algorithms are selected in such a way that, when the method is applied to minimize a strongly convex quadratic function, two successive directions are conjugate with respect to the Hessian of the quadratic function. Therefore, minimizing a convex quadratic function over a subspace spanned by a set of mutually conjugate directions is equivalent to minimizing this function along each conjugate direction in turn. This is a very good and productive idea, leading to many variants of conjugate gradient algorithms, but the performance of these algorithms is strongly dependent on the accuracy of the line search.

For solving the nonlinear unconstrained optimization problem

$$\min_{x \in \mathbb{R}^n} f(x),$$                                                                                                               (1)

where $f : \mathbb{R}^n \to \mathbb{R}$ is a continuously differentiable function, bounded from below, starting from an initial guess $x_0$ a nonlinear conjugate gradient algorithm generates a sequence of points $\{x_k\}$ according to the recurrence

$$x_{k+1} = x_k + \alpha_k d_k,$$                                                                                                 (2)

where $\alpha_k$ is the step length, usually obtained by the Wolfe line search [51, 52],

$$f(x_k + \alpha_k d_k) - f(x_k) \le \rho \alpha_k g_k^T d_k,$$                                                                          (3a)

$$g_{k+1}^T d_k \ge \sigma g_k^T d_k,$$                                                                                              (3b)

with $0 < \rho < \sigma < 1$, and the directions $d_k$ are computed as

$$d_0 = -g_0, \qquad d_{k+1} = -g_{k+1} + \beta_k d_k, \quad k = 0, 1, \ldots$$                                        (4)

Here $\beta_k$ is a scalar known as the conjugate gradient parameter, $g_k = \nabla f(x_k)$, $s_k = x_{k+1} - x_k$ and $y_k = g_{k+1} - g_k$. In the following, $\|\cdot\|$ denotes the Euclidean norm. Even though the conjugate gradient algorithms correspond to different choices of the parameter $\beta_k$, they are often designed in such a way that the search direction $d_k$ satisfies the sufficient descent condition $g_k^T d_k \le -c\|g_k\|^2$ for some positive constant $c > 0$. In these algorithms the conjugacy condition, or the modified conjugacy condition, is $y_k^T d_{k+1} = 0$ or $y_k^T d_{k+1} = -t\, s_k^T g_{k+1}$, where $t$ is a scalar. When applied to general nonlinear functions, the search directions in conjugate gradient algorithms are often computed by formulas which do not satisfy the conjugacy condition. However, by extension we still call them conjugate gradient algorithms.
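
To make the recurrences (2)-(4) concrete, the following is a minimal Python sketch of a nonlinear conjugate gradient iteration with a Wolfe line search. It is only an illustration of the generic scheme above, not one of the Fortran codes compared in this paper; the function name cg_minimize, the use of SciPy's line_search and the PRP+ choice of $\beta_k$ are assumptions made for this example.

# Minimal sketch of the generic conjugate gradient scheme (2)-(4); illustrative only,
# not one of the codes compared in this study. PRP+ is used here as an example of beta_k.
import numpy as np
from scipy.optimize import line_search   # step length satisfying the Wolfe conditions

def cg_minimize(f, grad, x0, tol=1e-6, max_iter=10000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)
    d = -g                                           # d_0 = -g_0
    for _ in range(max_iter):
        if np.linalg.norm(g, np.inf) <= tol:         # stop on a small gradient
            break
        alpha = line_search(f, grad, x, d, gfk=g)[0]     # alpha_k from (3a)-(3b)
        if alpha is None:                            # line search failed: restart with -g
            d = -g
            alpha = line_search(f, grad, x, d, gfk=g)[0]
            if alpha is None:
                break
        x = x + alpha * d                            # recurrence (2)
        g_new = grad(x)
        y = g_new - g                                # y_k = g_{k+1} - g_k
        beta = max((g_new @ y) / (g @ g), 0.0)       # PRP+ value of beta_k (one of many choices)
        d = -g_new + beta * d                        # direction update (4)
        g = g_new
    return x

# Example use on a convex quadratic with an ill-conditioned diagonal Hessian:
# a = np.linspace(1.0, 1.0e4, 1000)
# x_star = cg_minimize(lambda x: 0.5 * x @ (a * x), lambda x: a * x, np.ones(1000))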

The elaboration of nonlinear optimization software based on nonlinear conjugate gradient algorithms is a very active field of research. On the one hand, many conjugate gradient algorithms have reached maturity and are frequently used for solving a wide range of real applied problems in a large variety of areas. On the other hand, plenty of conjugate gradient algorithms continue to be elaborated, and therefore their efficiency and robustness need to be established. The development of different versions of nonlinear conjugate gradient algorithms can be summarized as follows. Classical conjugate gradient algorithms: Hestenes and Stiefel [33], Fletcher and Reeves [26], Daniel [21], Polak and Ribière [42] and Polyak [43], conjugate descent by Fletcher [27], Liu and Storey [36], and Dai and Yuan [22]. Hybrid conjugate gradient algorithms using projections: hybrid Dai-Yuan [23], Gilbert and Nocedal [28], Hu and Storey [34], Touati-Ahmed and Storey [50], hybrid Liu and Storey [36]; and hybrid conjugate gradient algorithms based on convex combinations of classical schemes: convex combination of Hestenes-Stiefel and Dai-Yuan with the Newton direction [3, 4, 8], convex combination of Polak-Ribière-Polyak and Dai-Yuan with the conjugacy condition [7]. Scaled BFGS preconditioned conjugate gradient algorithms by Shanno [47, 48], Birgin and Martínez [18] and Andrei [2, 9]. Conjugate gradient algorithms with guaranteed descent and guaranteed conjugacy conditions by Hager and Zhang [32] and Andrei [12]. Three-term conjugate gradient algorithms [10, 11].
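
For reference, the classical choices of the parameter $\beta_k$ mentioned above are given by the following standard formulas from the literature, written with the notation introduced after (4) (the grouping into a single display is ours):

$$\beta_k^{HS} = \frac{g_{k+1}^T y_k}{y_k^T d_k}, \qquad \beta_k^{FR} = \frac{\|g_{k+1}\|^2}{\|g_k\|^2}, \qquad \beta_k^{PRP} = \frac{g_{k+1}^T y_k}{\|g_k\|^2}, \qquad \beta_k^{DY} = \frac{\|g_{k+1}\|^2}{y_k^T d_k}.$$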

The purpose of this paper is to study the performance of some conjugate gradient algorithms in a controlled numerical environment, to highlight the main differences among them and to indicate to algorithm developers and practitioners the best algorithms and the types of problems that are well suited to each algorithm. Therefore, we are interested in the efficiency and robustness of some conjugate gradient algorithms for solving a large class of large-scale unconstrained optimization problems. For this purpose, from the above classes of algorithms we selected eight conjugate gradient algorithms which seem to be the most representative: Hestenes and Stiefel (HS) [33], Polak-Ribière-Polyak (PRP) [42, 43], CONMIN [47-49], ASCALCG [2, 9], CG-DESCENT [32], AHYBRIDM [3, 8], THREECG [10] and DESCON [12]. For the numerical evaluation of these algorithms, performance profiles [24] and data profiles [37] are now the standard tools for presenting efficiency and robustness as well as numerical comparisons. Besides, the collection of unconstrained optimization test problems used in the evaluation may have a great influence on the conclusions of the numerical study. In order to see the performance of these algorithms we assembled a collection of 800 large-scale unconstrained optimization test problems of great variety, of different complexity and with different structures of the Hessian matrix. The comparisons among the algorithms are presented using performance profiles. Besides, five applications from the MINPACK-2 collection [14] have been used to assess the performance of the conjugate gradient algorithms considered in this numerical study. Our study is limited by the collection of test problems we used; however, we have tried to consider a test set of considerable diversity. Some of the artificial test problems are quadratic or nearly quadratic, while others are cubic, or cubic perturbed by quadratic and linear terms. Some are combinations of quadratics including exp, sin or cos functions. There are varying degrees of nonlinearity and ill-conditioning. The functions are expressed in extended or generalized form as a sum or difference of element functions [31]. It is worth noting that the Hessians of the functions in this collection have different structures: diagonal, block-diagonal, tri-diagonal or penta-diagonal, bounded-diagonal, bounded-block-diagonal, etc., or full. The numerical conclusions concerning the efficiency and robustness of the algorithms are based on this sample of functions, but we hope that they may be more generally useful both for developers of algorithms for unconstrained optimization and for practitioners faced with solving practical applications.
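
As a brief illustration of how the performance profiles of Dolan and Moré [24] used in these comparisons are computed (the framework itself is detailed in Section 3), the sketch below builds the profile curves from a matrix of timings. The function name, the array layout and the treatment of failures via np.inf are assumptions made for this example, and it presumes that every problem is solved by at least one algorithm.

# Minimal sketch of a Dolan-More performance profile [24]; illustrative only.
import numpy as np

def performance_profile(T, taus):
    """T[p, s] holds the metric (e.g. CPU time) of solver s on problem p;
    np.inf marks a failure. Assumes each problem is solved by at least one solver.
    Returns rho[s, i]: the fraction of problems that solver s solves within a factor
    taus[i] of the best solver, i.e. the value of the profile at tau = taus[i]."""
    best = np.min(T, axis=1, keepdims=True)          # best result per problem
    r = T / best                                     # performance ratios r_{p,s}
    return np.array([[np.mean(r[:, s] <= tau) for tau in taus]
                     for s in range(T.shape[1])])

# Example: rho = performance_profile(cpu_times, np.linspace(1.0, 16.0, 200));
# rho[s, 0] is the fraction of problems on which solver s is (tied for) the fastest,
# and rho[s, -1] approaches the robustness of solver s (fraction of problems solved)
# as the largest tau grows.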

The eight Fortran codes implementing the conjugate gradient algorithms considered in this numerical study are not new. The oldest is CONMIN, the 1978 version written by Shanno and Phua [49]. CG-DESCENT is version 1.4 (2005), written by Hager and Zhang [32]. The most recent are ASCALCG (2010) [9], AHYBRIDM (2010) [8], THREECG (2013) [10] and DESCON (2013) [12], all written by Andrei. In our numerical experiments we did not try to tune the algorithms to a particular set of test problems, and a single fixed version of each algorithm, with fixed parameters, was used.

As a general conclusion of this numerical study we can state that the conjugate gradient software analyzed here is able to solve a very large diversity of unconstrained optimization problems of different complexity and with different structures of the Hessian matrix. At least for this set of artificial test problems, concerning efficiency, CG-DESCENT is slightly more efficient, followed by DESCON and then by THREECG. Concerning robustness, DESCON is by far the most robust, followed by THREECG and then by ASCALCG. It seems that the conjugate gradient algorithms implementing both the sufficient descent condition and the conjugacy condition are the best. However, this is not a definitive conclusion.

There is an infinite number of artificial unconstrained optimization test problems, and it is always possible to assemble a set of problems for which the efficiency and robustness of the considered algorithms are completely different. Therefore, in order to reach a more reliable conclusion, we also compared the above algorithms on five applications from the MINPACK-2 collection with $10^6$ variables. In this case DESCON proved to be the fastest and the most reliable algorithm.

The structure of the paper is as follows. In Section 2 the main characteristics of the unconstrained optimization test problems considered in this numerical study are presented. A detailed presentation of the comparison framework, including the performance profiles and the data profiles, their advantages and weaknesses, and the efficiency and robustness of an algorithm, is given in Section 3. In Section 4 we present the conjugate gradient algorithms considered in this numerical study, focusing on their definitions and convergence properties. Section 5 is devoted to the numerical experiments and comparisons using performance profiles. In Section 6 some discussions are given, including comparisons among the algorithms for solving problems with different structures of the Hessian, the weaknesses of numerical experiments and comparisons based on artificial test problems, and some results and comparisons for solving the five MINPACK-2 applications. Conclusions are drawn in the last section.

REFERENCES

  1. ANDREI, N., An Acceleration of Gradient Descent Algorithm with Backtracking for Unconstrained Optimization. Numerical Algorithms, vol. 42, 2006, pp. 63-73.
  2. ANDREI, N., Scaled Conjugate Gradient Algorithms for Unconstrained Optimization. Computational Optimization and Applications, vol. 38, 2007, pp. 401-416.
  3. ANDREI, N., A Hybrid Conjugate Gradient Algorithm for Unconstrained Optimization as a Convex Combination of Hestenes-Stiefel and Dai-Yuan. Studies in Informatics and Control, vol. 17, no. 1, March 2008, pp. 55-70.
  4. ANDREI, N., Another Hybrid Conjugate Gradient Algorithm for Unconstrained Optimization. Numerical Algorithms, vol. 47, 2008, pp. 143-156.
  5. ANDREI, N., An Unconstrained Optimization Test Functions Collection. Advanced Modeling and Optimization, vol. 10, 2008, pp. 147-161.
  6. ANDREI, N., Acceleration of Conjugate Gradient Algorithms for Unconstrained Optimization. Applied Mathematics and Computation, vol. 213, 2009, pp. 361-369.
  7. ANDREI, N., A Hybrid Conjugate Gradient Algorithm for Unconstrained Optimization, Journal of Optimization Theory and Applications, vol. 141, 2009, pp. 249-264.
  8. ANDREI, N., Accelerated Hybrid Conjugate Gradient Algorithm with Modified Secant Condition for Unconstrained Optimization, Numerical Algorithms, vol. 54, 2010, pp. 23-46.
  9. ANDREI, N., Accelerated Scaled Memoryless BFGS Preconditioned Conjugate Gradient Algorithm for Unconstrained Optimization. European Journal of Operational Research, vol. 204, 2010, pp. 410-420.
  10. ANDREI, N., A Simple Three-term Conjugate Gradient Algorithm for Unconstrained Optimization. Journal of Computational and Applied Mathematics, vol. 241, 2013, pp. 19-29.
  11. ANDREI, N., On Three-term Conjugate Gradient Algorithms for Unconstrained Optimization. Applied Mathematics and Computation, vol. 219, 2013, pp. 6316-6327.
  12. ANDREI, N., Another Conjugate Gradient Algorithm with Guaranteed Descent and Conjugacy Conditions for Large-scale Unconstrained Optimization. Journal of Optimization Theory and Applications, DOI: 10.1007/s10957-013-0285-9
  13. ARIS, R., The Mathematical Theory of Diffusion and Reaction in Permeable Catalysts. Oxford, 1975.
  14. AVERICK, B. M., R. G. CARTER, J. J. MORÉ, G. L. XUE, The MINPACK-2 Test Problem Collection. Mathematics and Computer Science Division, Argonne National Laboratory, Preprint MCS-P153-0692, June 1992.
  15. BABAIE-KAFAKI, S., A Note on the Global Convergence Theorem of the Scaled Conjugate Gradient Algorithm Proposed by Andrei, Computational Optimization and Applications, vol. 52, 2012, pp. 409-414.
  16. BEALE, E. M. L., A Derivation of Conjugate Gradients. In F. A. Lootsma (Ed.), Numerical Methods for Nonlinear Optimization, Academic Press, London, 1972, pp. 39-43.
  17. BEBERNES, J., D. EBERLY, Mathematical Problems from Combustion Theory. Applied Mathematical Sciences, Springer-Verlag, vol. 83, 1989.
  18. BIRGIN, E., J. M. MARTÍNEZ, A Spectral Conjugate Gradient Method for Unconstrained Optimization. Applied Mathematics & Optimization, vol. 43, 2001, pp. 117-128.
  19. BONGARTZ, I., A. R. CONN, N. I. M. GOULD, Ph. L. TOINT, CUTE: Constrained and Unconstrained Testing Environment. ACM Transactions on Mathematical Software, vol. 21, 1995, pp. 123-160.
  20. CIMATTI, G., On a Problem of the Theory of Lubrication Governed by a Variational Inequality. Applied Mathematics & Optimization, vol. 3, 1977, pp. 227-242.
  21. DANIEL, J. W., The Conjugate Gradient Method for Linear and Nonlinear Operator Equations. SIAM Journal on Numerical Analysis, vol. 4, 1967, pp. 10-26.
  22. DAI, Y. H., Y. YUAN, A Nonlinear Conjugate Gradient Method with a Strong Global Convergence Property, SIAM Journal on Optimization, vol. 10, 1999, pp. 177-182.
  23. DAI, Y. H., Y. YUAN, An Efficient Hybrid Conjugate Gradient Method for Unconstrained Optimization. Annals of Operations Research, vol. 103, 2001, pp. 33-47.
  24. DOLAN, E. D., J. J. MORÉ, Benchmarking Optimization Software with Performance Profiles, Mathematical Programming, vol. 91, 2002, pp. 201-213.
  25. FILIP, F. G., Sisteme suport pentru decizii (Decision Support Systems). Editura Tehnică, Bucharest, 2004. (2nd edition, revised and completed, 2007.)
  26. FLETCHER, R., C. M. REEVES, Function Minimization by Conjugate Gradients, The Computer Journal, vol. 7, 1964, pp. 149-154.
  27. FLETCHER, R., Practical Methods of Optimization, vol. 1: Unconstrained Optimization, John Wiley & Sons, New York, 1987.
  28. GILBERT, J. C., J. NOCEDAL, Global Convergence Properties of Conjugate Gradient Methods for Optimization, SIAM Journal on Optimization, vol. 2, 1992, pp. 21-42.
  29. GLOWINSKI, R., Numerical Methods for Nonlinear Variational Problems. Springer-Verlag, Berlin, 1984.
  30. GOODMAN, J., R. KOHN, L. REYNA, Numerical Study of a Relaxed Variational Problem from Optimal Design. Computer Methods in Applied Mechanics and Engineering, vol. 57, 1986, pp.107-127.
  31. GRIEWANK, A., Ph. L. TOINT, Partitioned Variable Metric Updates for Large Structured Optimization Problems. Numerische Mathematik, vol. 39, 1982, pp. 119-137.
  32. HAGER, W. W., H. ZHANG, A New Conjugate Gradient Method with Guaranteed Descent and an Efficient Line Search, SIAM Journal on Optimization, vol. 16, 2005, pp. 170-192.
  33. HESTENES, M. R., E. L. STIEFEL, Methods of Conjugate Gradients for Solving Linear Systems. Journal of Research of the National Bureau of Standards, vol. 49, 1952, pp. 409-436.
  34. HU, Y. F., C. STOREY, Global Convergence Result for Conjugate Gradient Methods. Journal of Optimization Theory and Applications, vol. 71, 1991, pp. 399-405.
  35. LI, G., C. TANG, Z. WEI, New Conjugacy Condition and Related New Conjugate Gradient Methods for Unconstrained Optimization, Journal of Computational and Applied Mathematics, vol. 202, 2007, pp. 523-539.
  36. LIU, Y., C. STOREY, Efficient Generalized Conjugate Gradient Algorithms, Part 1: Theory. Journal of Optimization Theory and Applications, vol. 69, 1991, pp. 129-137.
  37. MORÉ, J. J., S. M. WILD, Benchmarking Derivative-free Optimization Algorithms. SIAM Journal on Optimization, vol. 20, 2009, pp. 172-191.
  38. NASH, S. G., J. NOCEDAL, A Numerical Study of the Limited Memory BFGS Method and the Truncated-Newton Method for Large Scale Optimization, SIAM Journal on Optimization, vol. 1, 1991, pp. 358-372.
  39. NOETHER, E., Invariante Variationsprobleme. Nachrichten der Königlichen Gesellschaft der Wissenschaften zu Göttingen, Math.-phys. Klasse, 1918, pp. 235-257.
  40. NITSCHE, J. C. C., Lectures on Minimal Surfaces. vol. 1, Cambridge University Press, 1989.
  41. PERRY, A., A Class of Conjugate Gradient Algorithms with a Two Step Variable Metric Memory. Discussion Paper No. 269, Center for Mathematical Studies in Economics and Management Science, Northwestern University, 1977.
  42. POLAK, E., G. RIBIÈRE, Note sur la convergence de méthodes de directions conjuguées, Revue française d’informatique et de recherche opérationnelle, 3e Année, vol. 16, 1969, pp. 35-43.
  43. POLYAK, B. T., The Conjugate Gradient Method in Extreme Problems. USSR Computational Mathematics and Mathematical Physics, vol. 9, 1969, pp. 94-112.
  44. POWELL, M. J. D., Nonconvex Minimization Calculations and the Conjugate Gradient Method. Numerical Analysis (Dundee, 1983), Lecture Notes in Mathematics, Vol. 1066, Springer, Berlin, 1984, pp. 122-141.
  45. POWELL, M. J. D., Restart Procedures for the Conjugate Gradient Method. Mathematical Programming, vol. 12, 1977, pp. 241-254.
  46. RĂDULESCU, C. Z., M. RĂDULESCU, A Decision Support Tool Based on a Portfolio Selection Model for Crop Planning under Risk, Studies in Informatics and Control, ISSN 1220-1766, vol. 21 (4), 2012, pp. 377-382.
  47. SHANNO, D. F., On the Convergence of a New Conjugate Gradient Algorithm. SIAM Journal on Numerical Analysis, vol. 15, 1978, pp. 1247-1257.
  48. SHANNO, D. F., Conjugate Gradient Methods with Inexact Searches. Mathematics of Operations Research, vol. 3, No. 3, 1978, pp. 244-256.
  49. SHANNO, D. F., K. H. PHUA, Algorithm 500: Minimization of Unconstrained Multivariate Functions. ACM Transactions on Mathematical Software, vol. 2, 1976, pp. 87-94.
  50. TOUATI-AHMED, D., C. STOREY, Efficient Hybrid Conjugate Gradient Techniques, Journal of Optimization Theory and Applications, vol. 64, 1990, pp. 379-397.
  51. WOLFE, P., Convergence Conditions for Ascent Methods, SIAM Review, vol. 11, 1969, pp. 226-235.
  52. WOLFE, P., Convergence Conditions for Ascent Methods, (II): Some Corrections. SIAM Review, vol. 13, 1971, pp. 185-188.