Mathematics of Computation

ISSN 1088-6842(online) ISSN 0025-5718(print)

Penalty methods with stochastic approximation for stochastic nonlinear programming


Authors: Xiao Wang, Shiqian Ma and Ya-xiang Yuan
Journal: Math. Comp. 86 (2017), 1793-1820
MSC (2010): Primary 90C15, 90C30, 62L20, 90C60
DOI: https://doi.org/10.1090/mcom/3178
Published electronically: October 12, 2016

Abstract: In this paper, we propose a class of penalty methods with stochastic approximation for solving stochastic nonlinear programming problems. We assume that only noisy gradients or function values of the objective function are available via calls to a stochastic first-order or zeroth-order oracle. In each iteration of the proposed methods, we minimize an exact penalty function which is nonsmooth and nonconvex with only stochastic first-order or zeroth-order information available. Stochastic approximation algorithms are presented for solving this particular subproblem. The worst-case complexity of calls to the stochastic first-order (or zeroth-order) oracle for the proposed penalty methods for obtaining an $\epsilon$-stochastic critical point is analyzed.
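
To make the setting in the abstract concrete, the following is a minimal sketch and not the authors' algorithm: a plain stochastic subgradient iteration applied to an exact penalty function, with the objective gradient available only through a noisy first-order oracle. Everything specific here is an assumption made for illustration: the toy objective, the single linear equality constraint, the choice of an l1 penalty, the penalty parameter `rho`, the step-size rule, and the names `sfo`, `constraint`, and `penalty_sa`.

```python
import random


def sfo(x, rng, sigma=0.1):
    """Hypothetical stochastic first-order oracle.

    Returns a noisy gradient of the toy objective
    f(x) = (x[0] - 1)^2 + (x[1] - 2)^2 (an assumption, not from the paper).
    """
    return [2.0 * (x[0] - 1.0) + rng.gauss(0.0, sigma),
            2.0 * (x[1] - 2.0) + rng.gauss(0.0, sigma)]


def constraint(x):
    """Toy equality constraint c(x) = x[0] + x[1] - 1 = 0; its gradient is (1, 1)."""
    return x[0] + x[1] - 1.0


def sign(t):
    """Sign of t, with sign(0) = 0 (a valid subgradient of |t| at 0)."""
    return (t > 0) - (t < 0)


def penalty_sa(x0, rho=5.0, iters=5000, seed=0):
    """Stochastic subgradient steps on the penalty f(x) + rho * |c(x)|.

    rho must exceed the Lagrange multiplier magnitude (2 for this toy
    problem) for the penalty to be exact; step sizes decay like 1/k.
    """
    rng = random.Random(seed)
    x = list(x0)
    for k in range(iters):
        g = sfo(x, rng)                # noisy gradient of the objective
        s = sign(constraint(x))        # subgradient of |c(x)| is s * (1, 1)
        step = 0.5 / (k + 1)
        x = [x[i] - step * (g[i] + rho * s) for i in range(2)]
    return x


x = penalty_sa([0.0, 0.0])
print(x)  # expected to be close to the constrained minimizer (0, 1)
```

For this toy problem the constrained minimizer is (0, 1), the projection of (1, 2) onto the line x[0] + x[1] = 1, so the final iterate can be checked against it. The paper's methods and complexity analysis are more general than this sketch, which only illustrates the roles of the oracle and the penalty term.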



Additional Information

Xiao Wang
Affiliation: School of Mathematical Sciences, University of Chinese Academy of Sciences; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, People’s Republic of China
Email: wangxiao@ucas.ac.cn

Shiqian Ma
Affiliation: Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, N. T., Hong Kong
Email: sqma@se.cuhk.edu.hk

Ya-xiang Yuan
Affiliation: State Key Laboratory of Scientific and Engineering Computing, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, People’s Republic of China
Email: yyx@lsec.cc.ac.cn

Keywords: Stochastic programming, nonlinear programming, stochastic approximation, penalty method, global complexity bound
Received by editor(s): April 8, 2015
Received by editor(s) in revised form: December 1, 2015
Additional Notes: The research of the first author was supported in part by Postdoc Grant 119103S175, UCAS President Grant Y35101AY00 and NSFC Grant 11301505.
The research of the second author was supported in part by a Direct Grant of the Chinese University of Hong Kong (Project ID: 4055016) and the Hong Kong Research Grants Council General Research Fund Early Career Scheme (Project ID: CUHK 439513).
The research of the third author was supported in part by NSFC Grants 11331012, 11321061 and 11461161005.
Article copyright: © Copyright 2016 American Mathematical Society