Aerospace Dynamics and Control Laboratory, College of Engineering, Peking University



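Prof. Xi-Ren Cao Visits and Gives an Academic Talk

Photos: Prof. Xi-Ren Cao delivering the talk; Academician Huang Lin chairing the session; the audience listening attentively.

On May 6, 2008, Prof. Xi-Ren Cao of the Hong Kong University of Science and Technology visited by invitation and gave an academic talk entitled “A Map of the World of Stochastic Learning and Optimization”. Academician Huang Lin chaired the session, and some faculty members and students of the College of Engineering attended the talk and the discussion. Learning and optimization of stochastic systems have long attracted wide attention in control, operations research, and computer science, yet the methods used in each field generally differ. The talk showed how the approaches of these fields can be unified from a sensitivity point of view, pointing out that the fundamental elements of learning and optimization are two performance sensitivity formulas and that the learning and optimization problems of each field can be placed within this unified framework. The talk also introduced a new optimization approach derived from the sensitivity point of view, called event-based optimization. Drawing on his own experience of pursuing an education during the Cultural Revolution, Prof. Cao encouraged the audience to cherish their opportunity to study and to work hard, laying a solid foundation for contributing to the country in the future.

Abstract:

In the information technology era, many systems such as the Internet, wireless communication networks, sensor networks, logistics networks, manufacturing systems, transportation networks, and other complex systems can be modeled as stochastic discrete event dynamic systems. Optimization techniques play a crucial role in designing and operating these modern engineering and social systems. Most systems are too complicated to model, or the system parameters cannot be easily measured. Therefore, learning techniques have to be applied. Learning and optimization of stochastic systems have been attracting wide attention from researchers in many disciplines, including control systems, operations research, and computer science. Areas such as perturbation analysis (PA), Markov decision processes (MDPs), reinforcement learning (RL), and adaptive control (AC) share a common goal: to optimize a system’s performance by analyzing its dynamic properties. In PA, performance gradients are first obtained by observing and analyzing a single sample path of the system and are then used to improve the system performance; in AC, system parameters are first identified, and control techniques are then applied to optimize the performance; in MDPs, policy iteration can be implemented to improve the performance by using the performance potentials (or the value functions) estimated on a sample path of the system; in RL, decisions are made by learning from the evolution of the system under different actions.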
Recent research indicates that the above disciplines can be explained from a sensitivity point of view in a unified way. In this talk, we offer an overview of the learning and optimization world from such a sensitivity perspective. Specifically,
1. We show that the fundamental elements of learning and optimization are two performance sensitivity formulas, one for performance gradients and the other for performance differences. These two sensitivity formulas can be easily derived from the Poisson equation with the concept of performance potentials.
2. We show how these two formulas lead to the results in different disciplines, including PA, MDPs, RL, and AC, how these results are closely related, and how one topic leads to the others. In particular, we show how a complete theory for multi-chain average cost MDPs can be derived intuitively from the performance difference formula without discounting.
3. We discuss the implementation of learning techniques and show briefly how Q-learning, TD(λ), neuro-dynamic programming, PA-based gradient estimates, on-line policy iteration, potential aggregation, Lebesgue sampling, etc., fit the general sensitivity-based framework.
4. We show how this sensitivity-based perspective leads naturally to a new optimization approach, called event-based optimization, in which control is applied when some events, rather than states, occur. Because the number of events may scale to the system size, this approach saves a considerable amount of computation.
The fundamental ideas of this talk can be illustrated clearly by a “map” of the learning and optimization world, with the two sensitivity formulas at the center.

Speaker biography:

Xi-Ren Cao received the M.S. and Ph.D. degrees from Harvard University in 1981 and 1984, respectively, where he was a research fellow from 1984 to 1986. He then worked as a consultant engineer/engineering manager at Digital Equipment Corporation, U.S.A., until October 1993.
Then he joined the Hong Kong University of Science and Technology (HKUST), where he is currently a chair professor and director of the Research Center for Networking. He has held visiting positions at Harvard University, Tsinghua University, AT&T Labs, and other institutions. Dr. Cao holds three patents in data and telecommunications and has published three books in the area of discrete event dynamic systems. He received the Outstanding Transactions Paper Award from the IEEE Control Systems Society in 1987 and the Outstanding Publication Award from the Institution of Management Science in 1990. He was elected a Fellow of IEEE in 1995 and a Fellow of IFAC in 2008. He is the Chairman of the IEEE Fellow Evaluation Committee of the IEEE Control Systems Society, Editor-in-Chief of Discrete Event Dynamic Systems: Theory and Applications, and Associate Editor at Large of IEEE Transactions on Automatic Control, and he is on the Board of Governors of the IEEE Control Systems Society and on the Technical Board of IFAC. He is or has been an associate editor of a number of international journals and chairman of several technical committees of international professional societies. His current research areas include discrete event dynamic systems, stochastic learning and optimization, performance analysis of communication systems, signal processing, and financial engineering.
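
Illustrative note on the two sensitivity formulas named in the abstract. The sketch below is not taken from the talk; it is a minimal numerical check under stated assumptions: a finite ergodic Markov chain with transition matrix P and reward vector f, long-run average reward eta = pi f, and potentials g solving the Poisson equation (I - P) g + eta e = f. Under these assumptions the commonly stated forms are eta' - eta = pi' [(P' - P) g + (f' - f)] for the performance difference and d eta / d delta = pi [(P' - P) g + (f' - f)] for the derivative along P + delta (P' - P). All function names and the 3-state example data are hypothetical.

```python
"""Numerical check of the two performance sensitivity formulas for a small
ergodic Markov chain. Names and example data are illustrative assumptions."""


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def stationary(p):
    """Stationary distribution pi, from pi (I - P + E) = 1^T (E is all ones)."""
    n = len(p)
    a = [[(1.0 if i == j else 0.0) - p[j][i] + 1.0 for j in range(n)]
         for i in range(n)]  # transpose of I - P + E
    return solve(a, [1.0] * n)


def average_reward(p, f):
    """Long-run average reward eta = pi f."""
    return sum(x * y for x, y in zip(stationary(p), f))


def potentials(p, f):
    """Potentials g from (I - P + e pi) g = f, which satisfy the Poisson
    equation (I - P) g + eta e = f with the normalization pi g = eta."""
    n = len(p)
    pi = stationary(p)
    a = [[(1.0 if i == j else 0.0) - p[i][j] + pi[j] for j in range(n)]
         for i in range(n)]
    return solve(a, f)


def sensitivity(weights, p, f, p2, f2):
    """Compute weights [(P' - P) g + (f' - f)], with g the potentials of (P, f)."""
    n = len(p)
    g = potentials(p, f)
    return sum(
        weights[i] * (sum((p2[i][j] - p[i][j]) * g[j] for j in range(n))
                      + f2[i] - f[i])
        for i in range(n))


def performance_difference(p, f, p2, f2):
    """Difference formula: weights are the stationary distribution of P'."""
    return sensitivity(stationary(p2), p, f, p2, f2)


def performance_derivative(p, f, p2, f2):
    """Derivative formula at delta = 0: weights are the stationary distribution of P."""
    return sensitivity(stationary(p), p, f, p2, f2)


def interpolate(p, f, p2, f2, delta):
    """Return the perturbed pair (P + delta (P' - P), f + delta (f' - f))."""
    pd = [[x + delta * (y - x) for x, y in zip(r, r2)] for r, r2 in zip(p, p2)]
    fd = [x + delta * (y - x) for x, y in zip(f, f2)]
    return pd, fd


if __name__ == "__main__":
    P = [[0.5, 0.5, 0.0], [0.2, 0.3, 0.5], [0.1, 0.4, 0.5]]
    F = [1.0, 2.0, 3.0]
    P2 = [[0.1, 0.6, 0.3], [0.3, 0.3, 0.4], [0.5, 0.2, 0.3]]
    F2 = [1.5, 2.0, 2.5]
    print("direct difference :", average_reward(P2, F2) - average_reward(P, F))
    print("difference formula:", performance_difference(P, F, P2, F2))
    print("derivative formula:", performance_derivative(P, F, P2, F2))
```

Both formulas use the same potentials g of the original chain; they differ only in whether the bracketed term is weighted by the stationary distribution of the new chain (difference) or of the original chain (derivative).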