Fall 2021
Maryam Fazel (University of Washington)
Date: TBD
Time: TBD
Location: Online (Zoom link will be provided)
Title: TBD
Abstract: TBD
Bio: Maryam Fazel is an Associate Professor of Electrical Engineering at the University of Washington, with adjunct appointments in the departments of Computer Science and Engineering, Mathematics, and Statistics. Maryam received her MS and PhD from Stanford University, her BS from Sharif University of Technology in Iran, and was a postdoctoral scholar at Caltech before joining UW. Her current research interests are in mathematical optimization and applications in machine learning and control. She is a recipient of the NSF Career Award, UWEE Outstanding Teaching Award, UAI conference Best Student Paper Award (with her student), and coauthored a paper selected as a Fast-Breaking paper by Science Watch (2011). She co-leads the NSF Algorithmic Foundations for Data Science Institute (ADSI) at UW, and is an associate editor of SIAM journal on Optimization and SIAM journal on Mathematics of Data Science.
Access: Seminar will be delivered live on the date and time above via the following link: Zoom link
A Zoom account is required in order to access the seminar. Zoom is accessible to UT faculty, staff, and students with support from ITS. Otherwise, you can sign up for a free personal account on the Zoom website.
Fall 2020
Stefanie Jegelka (MIT)
Date: Friday, November 20, 2020
Time: 11:00 AM – 12:00 PM (CST; UTC -6)
Location: Online
Title: Representation and Learning in Graph Neural Networks
Abstract: Graph Neural Networks (GNNs) have become a popular tool for learning representations of graph-structured inputs, with applications in computational chemistry, recommendation, pharmacy, reasoning, and many other areas.
This talk will show recent results on representational power and learning in GNNs. First, we will address representational power and important limitations of popular message passing networks and of some recent extensions of these models. Second, we consider learning, and provide new generalization bounds for GNNs. Third, although many networks may be able to represent a task, some architectures learn it better than others. I will show results that connect the architectural structure and learning behavior, in and out of the training distribution.
This talk is based on joint work with Keyulu Xu, Behrooz Tahmasebi, Jingling Li, Mozhi Zhang, Simon S. Du, Ken-ichi Kawarabayashi, Vikas Garg and Tommi Jaakkola.
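For readers who want to make the message-passing abstraction concrete, here is a minimal NumPy sketch of one GNN layer with sum aggregation and a sum readout. It is a toy illustration on a made-up 4-node graph with random weights, not the speaker's code or models.

```python
import numpy as np

def message_passing_layer(adj, h, w_self, w_neigh):
    """One GNN layer: each node sum-aggregates its neighbors' features and
    applies a shared nonlinear update. adj: (n, n) 0/1 adjacency,
    h: (n, d) node features, w_self/w_neigh: (d, d') weight matrices."""
    neighbor_sum = adj @ h                                        # sum-aggregate messages
    return np.maximum(0.0, h @ w_self + neighbor_sum @ w_neigh)   # ReLU update

# toy graph: a 4-cycle with random 8-dimensional node features
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
h = rng.normal(size=(4, 8))
h1 = message_passing_layer(adj, h, rng.normal(size=(8, 16)), rng.normal(size=(8, 16)))
graph_embedding = h1.sum(axis=0)   # sum readout over nodes gives a graph-level representation
print(graph_embedding.shape)       # (16,)
```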
Bio: Stefanie Jegelka is an X-Window Consortium Career Development Associate Professor in the Department of EECS at MIT, and a member of the Computer Science and AI Lab (CSAIL). Before joining MIT, she was a postdoctoral researcher at UC Berkeley, and obtained her PhD from ETH Zurich and the Max Planck Institute for Intelligent Systems. Stefanie has received a Sloan Research Fellowship, an NSF CAREER Award, a DARPA Young Faculty Award, the German Pattern Recognition Award and a Best Paper Award at ICML. Her research interests span the theory and practice of algorithmic machine learning, including discrete and continuous optimization, discrete probability, and learning with structured data.
Access: Seminar was delivered live via Zoom on Friday, November 20. You can watch a video recording of the talk on our YouTube channel here.
Zahra Ghodsi (NYU)
Date: Friday, October 30, 2020
Time: 11:00 AM – 12:00 PM (CDT; UTC -5)
Location: Online
Title: Secure Frameworks for Outsourced Neural Network Inference
Abstract: Machine learning as a service (MLaaS) has emerged as a paradigm allowing clients to outsource machine learning computations to the cloud. However, MLaaS raises immediate security concerns, specifically relating to the integrity (or correctness) of computations performed by an untrusted cloud, and the privacy of the client’s data. In this talk, I discuss frameworks we built on cryptographic tools that can be used for secure deep learning based inference on an untrusted cloud: CryptoNAS (building models for private inference) and SafetyNets (addressing correctness). CryptoNAS is a novel neural architecture search method for finding and tailoring deep learning models to the needs of private inference. Existing models are not well suited for this task, as operation latencies in the private domain are inverted: non-linear layers dominate latency, while linear operations become effectively free. CryptoNAS introduces the idea of a ReLU budget as a proxy for inference latency, and can build models that maximize accuracy within a given budget. SafetyNets is a framework built on interactive proof protocols that enables an untrusted server (the cloud) to provide a client with a short mathematical proof of the correctness of the inference task that they perform on behalf of the client.
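As a rough illustration of the ReLU-budget idea (not the CryptoNAS search itself), one can count ReLU evaluations per inference for candidate architectures; the layer widths below are hypothetical examples.

```python
def relu_budget(layer_widths):
    """Count ReLU evaluations in one forward pass of a fully connected network
    whose hidden-layer widths are given (input and linear output layers excluded)."""
    return sum(layer_widths[1:-1])  # one ReLU per hidden unit

# two hypothetical candidate architectures with different ReLU budgets
wide_shallow = [784, 512, 10]            # 512 ReLUs per inference
narrow_deep  = [784, 128, 128, 128, 10]  # 384 ReLUs per inference
print(relu_budget(wide_shallow), relu_budget(narrow_deep))
```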
Bio: Zahra Ghodsi is a Ph.D. candidate at New York University. She is a recipient of the AI Research Fellowship from J.P. Morgan, and the Ernst Weber Fellowship from the Department of Electrical and Computer Engineering. Her research interests include designing efficient, secure, and private systems for machine learning. Previously, she received her M.S. degree from NYU, and B.S. degree from Sharif University of Technology.
Access: Seminar was delivered live via Zoom on Friday, October 30. You can watch a video recording of the talk on our YouTube channel here.
Siva Theja Maguluri (Georgia Tech)
Date: Friday, October 23, 2020
Time: 11:00 AM – 12:00 PM (CDT; UTC -5)
Location: Online
Title: Finite Sample Convergence Bounds of Off-Policy Reinforcement Learning Algorithms
Abstract: The focus of our work is to obtain finite-sample and/or finite-time convergence bounds of various model-free Reinforcement Learning (RL) algorithms. Many RL algorithms are special cases of Stochastic Approximation (SA), which is a popular approach for solving fixed point equations when the information is corrupted by noise. We first obtain finite-sample bounds for general SA using a generalized Moreau envelope as a smooth potential/Lyapunov function. We then use this result to establish the first-known convergence rate of the V-trace algorithm for off-policy TD-Learning and to recover the state-of-the-art results for tabular Q-Learning. We also use Lyapunov drift arguments to provide finite-time error bounds for the Q-learning algorithm with linear function approximation under an assumption on the sampling policy. This talk is based on the following papers: https://arxiv.org/abs/2002.00874 and https://arxiv.org/abs/1905.11425
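As a toy illustration of the stochastic-approximation view (not the paper's analysis or code), the sketch below runs tabular Q-learning on a made-up MDP: each update replaces the expectation in the Bellman operator with a single sampled next state.

```python
import numpy as np

rng = np.random.default_rng(1)
n_states, n_actions, gamma = 5, 2, 0.9
# a made-up MDP: random transition kernel P[s, a] and rewards R[s, a]
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
R = rng.uniform(size=(n_states, n_actions))

Q = np.zeros((n_states, n_actions))
for t in range(1, 10001):
    alpha = 1.0 / t**0.6                               # diminishing step size
    for s in range(n_states):
        for a in range(n_actions):
            s_next = rng.choice(n_states, p=P[s, a])    # one sampled transition
            target = R[s, a] + gamma * Q[s_next].max()  # noisy Bellman backup
            Q[s, a] += alpha * (target - Q[s, a])       # stochastic-approximation update
print(Q.max(axis=1))  # estimated optimal state values
```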
Bio: Siva Theja Maguluri is the Fouts Family Early Career Professor and an Assistant Professor in the H. Milton Stewart School of Industrial and Systems Engineering at Georgia Tech. Before that, he was a Research Staff Member in the Mathematical Sciences Department at IBM T. J. Watson Research Center. He obtained his Ph.D. and MS in ECE as well as MS in Applied Math from UIUC, and B.Tech in Electrical Engineering from IIT Madras. His research interests span the areas of Control, Optimization, Algorithms and Applied Probability. In particular, he works on Reinforcement Learning theory, scheduling, resource allocation and revenue optimization problems that arise in a variety of systems including Data Centers, Cloud Computing, Wireless Networks, Block Chains, Ride hailing systems, etc. He is a recipient of the biennial “Best Publication in Applied Probability” award in 2017 and the “CTL/BP Junior Faculty Teaching Excellence Award” in 2020.
Access: Seminar was delivered live via Zoom on Friday, October 23. You can watch a video recording of the talk on our YouTube channel here.
R. Srikant (University of Illinois at Urbana-Champaign)
Date: Friday, October 16, 2020
Time: 11:00 AM – 12:00 PM (CDT; UTC -5)
Location: Online
Title: The Role of Regularization in Overparameterized Neural Networks
Abstract: Overparameterized neural networks have proved to be remarkably successful in many complex tasks such as image classification and deep reinforcement learning. In this talk, we will consider the role of explicit regularization in training overparameterized neural networks. Specifically, we consider ReLU networks and show that the landscape of commonly used regularized loss functions has the property that every local minimum has good memorization and regularization performance. Joint work with Shiyu Liang and Ruoyu Sun.
Bio: R. Srikant is one of two Co-Directors of the C3.ai Digital Transformation Institute, jointly headquartered at UIUC and Berkeley, which is a consortium of universities (Stanford, MIT, CMU, UChicago, Princeton, Berkeley and UIUC) and industries (C3.ai and Microsoft) aimed at promoting research on AI, ML, IoT, and cloud computing for the betterment of society. He is also the Fredric G. and Elizabeth H. Nearing Endowed Professor of ECE and the Coordinated Science Lab at UIUC. His research interests are in machine learning, AI, communication networks and applied probability. He is the winner of the 2019 IEEE Koji Kobayashi Computers and Communications Award, the 2017 Applied Probability Society Best Publication Award and the 2015 IEEE INFOCOM Achievement Award. He also won the Distinguished Alumnus Award from the Indian Institute of Technology, Madras.
Access: Seminar was delivered live via Zoom on Friday, October 16. You can watch a video recording of the talk on our YouTube channel here.
Rahul Jain (USC)
Date: Friday, October 9, 2020
Time: 11:00 AM – 12:00 PM (CDT; UTC -5)
Location: Online
Title: Reinforcement Learning using Generative Models for Continuous State and Action Space Systems
Abstract: Reinforcement Learning (RL) problems for continuous state and action space systems are among the most challenging in RL. Recently, deep reinforcement learning methods have been shown to be quite effective for certain RL problems in settings of very large/continuous state and action spaces. But such methods require extensive hyper-parameter tuning, huge amounts of data, and come with no performance guarantees. We note that such methods are mostly trained ‘offline’ on experience replay buffers. In this talk, I will describe a series of simple reinforcement learning schemes for various settings. Our premise is that we have access to a generative model that can give us simulated samples of the next state. We will start with finite state and action space MDPs. An ‘empirical value learning’ (EVL) algorithm can be derived quite simply by replacing the expectation in the Bellman operator with an empirical estimate. We note that the EVL algorithm has remarkably good numerical performance for practical purposes. We next extend this to continuous state spaces by considering randomized function approximation in a reproducing kernel Hilbert space (RKHS). This allows for arbitrarily good approximation with high probability for any problem due to its universal function approximation property. Last, I will introduce the RANDPOL (randomized function approximation for policy iteration) algorithm, an actor-critic algorithm that uses randomized neural networks and can successfully solve a challenging robotics problem. We also provide theoretical performance guarantees for the algorithm. I will also touch upon the probabilistic contraction analysis framework for iterative stochastic algorithms that underpins the theoretical analysis. This talk is based on joint work with Dileep Kalathil (Texas A&M), Hiteshi Sharma (Microsoft), Abhishek Gupta (Ohio State), William Haskell (Purdue), Vivek Borkar (IIT Bombay) and Peter Glynn (Stanford).
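A minimal sketch of the empirical value learning idea, assuming access to a generative model of a small made-up MDP: the expectation in the Bellman operator is replaced by an empirical average over simulated next states. This is an illustration, not the speaker's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma, m = 6, 3, 0.9, 32   # m = generative-model samples per (s, a)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # made-up MDP
R = rng.uniform(size=(n_states, n_actions))

V = np.zeros(n_states)
for _ in range(200):
    Q = np.empty((n_states, n_actions))
    for s in range(n_states):
        for a in range(n_actions):
            samples = rng.choice(n_states, size=m, p=P[s, a])   # simulated next states
            Q[s, a] = R[s, a] + gamma * V[samples].mean()       # empirical Bellman backup
    V = Q.max(axis=1)                                           # greedy improvement step
print(V)  # empirical value iterate after 200 rounds
```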
Bio: Rahul Jain is the K. C. Dahlberg Early Career Chair and Associate Professor of Electrical Engineering, Computer Science* and ISE* (*by courtesy) at the University of Southern California (USC). He received a B.Tech from the IIT Kanpur, and an MA in Statistics and a PhD in EECS from the University of California, Berkeley. Prior to joining USC, he was at the IBM T J Watson Research Center, Yorktown Heights, NY. He has received numerous awards including the NSF CAREER award, the ONR Young Investigator award, an IBM Faculty award, the James H. Zumberge Faculty Research and Innovation Award, and is a US Fulbright Scholar. His interests span reinforcement learning, stochastic control, statistical learning, stochastic networks, and game theory, with applications in power systems and healthcare.
Access: Seminar was delivered live via Zoom on Friday, October 9. You can watch a video recording of the talk on our YouTube channel here.
Satyen Kale (Google Research)
Date: Friday, October 2, 2020
Time: 1:30 PM – 2:30 PM (CDT; UTC -5)
Location: Online
Title: Optimization algorithms for heterogeneous clients in Federated Learning
Abstract: Federated Learning has emerged as an important paradigm in modern large-scale machine learning, where the training data remains distributed over a large number of clients, which may be phones, network sensors, hospitals, etc. A major challenge in the design of optimization methods for Federated Learning is the heterogeneity (i.e. non i.i.d. nature) of client data. This problem affects the currently dominant algorithm deployed in practice known as Federated Averaging (FedAvg): we provide results for FedAvg quantifying the degree to which this problem causes unstable or slow convergence. We develop two optimization algorithms to handle this problem for two different settings of Federated Learning. In the cross-silo setting of Federated Learning, we propose a new algorithm called SCAFFOLD which uses control variates to correct for client heterogeneity and prove that SCAFFOLD requires significantly fewer communication rounds and is not affected by data heterogeneity or client sampling. In the cross-device setting of Federated Learning, we propose a general framework called Mime which mitigates client-drift and adapts arbitrary centralized optimization algorithms (e.g. SGD, Adam, etc.) to Federated Learning via a combination of control-variates and server-level statistics (e.g. momentum) at every client-update step. Our theoretical and empirical analyses establish the superiority of SCAFFOLD and Mime over other baselines in their respective settings.
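The sketch below illustrates the control-variate correction in a SCAFFOLD-style update on toy heterogeneous quadratic client objectives. It is a simplified illustration with made-up data, not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_clients, K, lr = 10, 5, 10, 0.1
# heterogeneous client objectives f_i(x) = 0.5 * ||x - b_i||^2 with distinct optima b_i
B = rng.normal(size=(n_clients, d))

x = np.zeros(d)                     # server model
c = np.zeros(d)                     # server control variate
c_i = np.zeros((n_clients, d))      # client control variates

for rnd in range(100):
    deltas, delta_cs = [], []
    for i in range(n_clients):
        y = x.copy()
        for _ in range(K):
            grad = y - B[i]                      # local gradient of f_i
            y -= lr * (grad - c_i[i] + c)        # drift-corrected local step
        c_new = c_i[i] - c + (x - y) / (K * lr)  # refresh client control variate
        deltas.append(y - x)
        delta_cs.append(c_new - c_i[i])
        c_i[i] = c_new
    x += np.mean(deltas, axis=0)                 # server aggregation
    c += np.mean(delta_cs, axis=0)
print(np.linalg.norm(x - B.mean(axis=0)))        # should approach 0 (global optimum is the mean of the b_i)
```

Without the control variates (plain FedAvg), the local steps drift toward each client's own optimum, which is exactly the heterogeneity effect the abstract describes.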
Bio: Satyen Kale is a research scientist at Google Research working in the New York office. His current research focuses on the design of efficient and practical algorithms for fundamental problems in Machine Learning and Optimization. More specifically, he is interested in decision making under uncertainty, statistical learning theory, combinatorial optimization, and convex optimization techniques such as linear and semidefinite programming. His research has been recognized with several awards: a best paper award at ICML 2015, a best paper award at ICLR 2018, and a best student paper award at COLT 2018. He was a program chair of COLT 2017 and ALT 2019.
Access: Seminar was delivered live via Zoom on October 2, 2020. You can watch a video recording of the talk here.
Francis Bach (Inria-ENS)
Date: Friday, September 18, 2020
Time: 11:00 AM – 12:00 PM (CDT; UTC -5)
Location: Online
Title: On the convergence of gradient descent for wide two-layer neural networks
Abstract: Many supervised learning methods are naturally cast as optimization problems. For prediction models which are linear in their parameters, this often leads to convex problems for which many guarantees exist. Models which are non-linear in their parameters such as neural networks lead to non-convex optimization problems for which guarantees are harder to obtain. In this talk, I will consider two-layer neural networks with homogeneous activation functions where the number of hidden neurons tends to infinity, and show how qualitative convergence guarantees may be derived. I will also highlight open problems related to the quantitative behavior of gradient descent for such models. (Joint work with Lénaïc Chizat)
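As a qualitative illustration of the setting (not the speaker's analysis), the following NumPy sketch trains a wide two-layer ReLU network on a toy 1-D regression problem with full-batch gradient descent; all data and scalings are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lr = 20, 2000, 0.05                # n data points, m hidden neurons (wide regime)
x = rng.uniform(-1, 1, size=(n, 1))
y = np.sin(3 * x)                        # target function
feats = np.hstack([x, np.ones((n, 1))])  # constant feature so neurons can place kinks anywhere

W = rng.normal(size=(m, 2))                    # input weights (and biases)
a = rng.normal(size=(m, 1)) / np.sqrt(m)       # output weights with 1/sqrt(m) scaling

for step in range(3000):
    pre = feats @ W.T                    # (n, m) pre-activations
    h = np.maximum(pre, 0.0)             # ReLU: positively homogeneous activation
    err = h @ a - y                      # (n, 1) residuals
    a -= lr * (h.T @ err) / n            # full-batch gradient step on output layer
    W -= lr * (((err @ a.T) * (pre > 0)).T @ feats) / n   # and on input layer
print(float(np.mean(err**2)))            # mean-squared training error shrinks as training proceeds
```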
Bio: Francis Bach is a researcher at Inria, leading since 2011 the machine learning team which is part of the Computer Science department at Ecole Normale Supérieure. He graduated from Ecole Polytechnique in 1997 and completed his Ph.D. in Computer Science at U.C. Berkeley in 2005, working with Professor Michael Jordan. He spent two years in the Mathematical Morphology group at Ecole des Mines de Paris, then he joined the computer vision project-team at Inria/Ecole Normale Supérieure from 2007 to 2010. Francis Bach is primarily interested in machine learning, and especially in sparse methods, kernel-based learning, large-scale optimization, computer vision and signal processing. He obtained in 2009 a Starting Grant and in 2016 a Consolidator Grant from the European Research Council, and received the Inria young researcher prize in 2012, the ICML test-of-time award in 2014, as well as the Lagrange prize in continuous optimization in 2018, and the Jean-Jacques Moreau prize in 2019. He was elected in 2020 at the French Academy of Sciences. In 2015, he was program co-chair of the International Conference in Machine learning (ICML), and general chair in 2018; he is now co-editor-in-chief of the Journal of Machine Learning Research.
Spring 2020
VIRTUAL SEMINAR
Kartik Ahuja (IBM Research AI)
Date: Friday, May 22, 2020
Time: 11:00 AM – 12:00 PM (CDT; UTC -5)
Location: Online (see information below)
Title: “Invariant Risk Minimization Games”
Abstract: The standard risk minimization paradigm of machine learning is brittle when operating in environments whose test distributions are different from the training distribution due to spurious correlations. Training on data from many environments and finding invariant predictors reduces the effect of spurious features by concentrating models on features that have a causal relationship with the outcome. In this work, we pose such invariant risk minimization as finding the Nash equilibrium of an ensemble game among several environments. By doing so, we develop a simple training algorithm that uses best response dynamics and, in our experiments, yields similar or better empirical accuracy with much lower variance than the challenging bi-level optimization problem of Arjovsky et al. 2019. One key theoretical contribution is showing that the set of Nash equilibria for the proposed game is equivalent to the set of invariant predictors for any finite number of environments, even with nonlinear classifiers and transformations. As a result, our method also retains the generalization guarantees to a large set of environments shown in Arjovsky et al. 2019. The proposed algorithm adds to the collection of successful game-theoretic machine learning algorithms such as generative adversarial networks.
Bio: Kartik Ahuja is currently an AI Resident at IBM Research AI, IBM TJ Watson Research Center. Kartik received his PhD from UCLA’s Department of Electrical and Computer Engineering and a B.Tech-M.Tech dual degree in Electrical Engineering from the Indian Institute of Technology, Kanpur. Kartik’s research focuses on developing optimization, game theory, and machine learning methods. He has applied these methods to tackle problems related to robustness and trust in machine learning and resource allocation in engineering. His research has been featured in IEEE Spotlight, IEEE Xplore, and the IEEE MMTC letter. He also received the second-best student paper award at the Asilomar Conference on Signals, Systems, and Computers and was nominated for the best paper award at IEEE Globecom. He was the recipient of the Guru Krupa Fellowship and the Dissertation Year Fellowship at UCLA.
VIRTUAL SEMINAR
Prof. Sid Banerjee (Cornell Univ.)
Date: Friday, May 15, 2020
Time: 10:30 AM – 11:30 AM (CDT; UTC -5)
Location: Online (see information below)
Title: “Constant Regret Algorithms for Online Decision-Making”
Abstract: I will present a simple algorithm that achieves constant regret in many widely-studied online decision-making problems, including online resource-allocation and pricing, generalized assignment, and online bin packing. In particular, I will consider a general class of finite-horizon control problems, where we see a stream of stochastic arrivals from some known distribution, and need to select actions, with the final objective depending only on the aggregate type-action counts. For such settings, I will introduce a unified algorithmic paradigm, and provide a simple, yet general, condition under which these algorithms achieve constant regret, i.e., additive loss compared to the hindsight optimal solution which is independent of the horizon and state-space. The results stem from an elementary sample-path coupling argument, which may prove useful for a larger class of problems in online decision-making. Time permitting, I will illustrate this by showing how we can use this technique to obtain simple data-driven implementations of the above algorithms, which achieve constant regret with as little as a single data trace.
Bio: Sid Banerjee is an Assistant Professor in the School of Operations Research and Information Engineering (ORIE) at Cornell, as well as a field member in the CS and ECE Departments and the Center for Applied Mathematics. His research is on stochastic modeling and control, and the design of algorithms and incentives for large-scale systems. He got his Ph.D. in ECE from UT Austin and worked as a postdoctoral researcher in the Social Algorithms Lab at Stanford, as well as a technical consultant at Lyft. His work is supported by an NSF CAREER award, and grants from the NSF and ARL.
VIRTUAL SEMINAR
Prof. Dimitris Papailiopoulos (UW-Madison)
Date: Friday, May 8, 2020
Time: 11:00 AM – 12:00 PM
Location: Online (see information below)
Title: “Robust Distributed Training! But at What Cost?”
Abstract: In this talk, we aim to quantify the robustness of distributed training against worst-case failures and adversarial nodes. We show that there is a gap between robustness guarantees, depending on whether adversarial nodes have full control of the hardware, the training data, or both. Using ideas from robust statistics and coding theory we establish robust and scalable training methods for centralized, parameter server systems.
Perhaps unsurprisingly, we prove that robustness is impossible when a central authority does not own the training data, e.g., in federated learning systems. We then provide a set of attacks that force federated models to exhibit poor performance on either the training, test, or out-of-distribution data sets. Our results and experiments cast doubts on the security presumed by federated learning system providers, and show that if you want robustness, you probably have to give up some of your data privacy.
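One standard robust-statistics ingredient in this area is coordinate-wise median aggregation at the parameter server. The sketch below is a generic illustration (not the authors' specific coded scheme) of how it tolerates a minority of adversarial workers where plain averaging does not.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, n_adversarial = 20, 15, 4
true_grad = rng.normal(size=d)

# honest workers send noisy copies of the true gradient; adversaries send arbitrary vectors
grads = true_grad + 0.1 * rng.normal(size=(n_workers, d))
grads[:n_adversarial] = 100.0 * rng.normal(size=(n_adversarial, d))

mean_agg   = grads.mean(axis=0)          # plain averaging: hijacked by the adversaries
median_agg = np.median(grads, axis=0)    # coordinate-wise median: robust to a minority of adversaries

print(np.linalg.norm(mean_agg - true_grad))    # large error
print(np.linalg.norm(median_agg - true_grad))  # small error
```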
Bio: Dimitris Papailiopoulos is an Assistant Professor of ECE and CS (by courtesy) at UW-Madison. His research spans machine learning, information theory, and distributed systems, with a current focus on scalable and fault-tolerant distributed machine learning systems. Dimitris was a postdoctoral researcher at UC Berkeley and a member of the AMPLab. He earned his Ph.D. in ECE from UT Austin in 2014, under the supervision of Alex Dimakis. Dimitris is a recipient of the NSF CAREER Award (2019), a Sony Faculty Innovation Award (2019), the Benjamin Smith Reynolds Award for Excellence in Teaching (2019), and the IEEE Signal Processing Society, Young Author Best Paper Award (2015). In 2018, he co-founded MLSys, a new conference that targets research at the intersection of machine learning and systems.
Access: The seminar was delivered live via Zoom on Friday, May 8, 2020. You can see a recording of the talk on WNCG’s YouTube channel HERE.
VIRTUAL SEMINAR
Prof. Rich Zemel (Univ. of Toronto)
Date: Thursday, May 7, 2020
Time: 11:00 AM – 12:00 PM
Location: Online (see information below)
Title: “Modeling Uncertainty in Learning with Little Data”
Abstract: Few-shot classification, the task of adapting a classifier to unseen classes given a small labeled dataset, is an important step on the path toward human-like machine learning. I will present what I think are some of the key advances and open questions in this area. I will then focus on the fundamental issue of overfitting in the few-shot scenario. Bayesian methods are well-suited to tackling this issue because they allow practitioners to specify prior beliefs and update those beliefs in light of observed data. Contemporary approaches to Bayesian few-shot classification maintain a posterior distribution over model parameters, which is slow and requires storage that scales with model size. Instead, we propose a Gaussian process classifier based on a novel combination of Pólya-gamma augmentation and the one-vs-each loss that allows us to efficiently marginalize over functions rather than model parameters. We demonstrate improved accuracy and uncertainty quantification on both standard few-shot classification benchmarks and few-shot domain transfer tasks.
Bio: Richard Zemel is a Professor of Computer Science at the University of Toronto, where he has been a faculty member since 2000. Prior to that, he was an Assistant Professor in Computer Science and Psychology at the University of Arizona and a Postdoctoral Fellow at the Salk Institute and at Carnegie Mellon University. He received a B.Sc. degree in History & Science from Harvard University in 1984 and a Ph.D. in Computer Science from the University of Toronto in 1993. He is also the co-founder of SmartFinance, a financial technology startup specializing in data enrichment and natural language processing.
His awards include an NVIDIA Pioneers of AI Award, a Young Investigator Award from the Office of Naval Research, a Presidential Scholar Award, two NSERC Discovery Accelerators, and seven Dean’s Excellence Awards at the University of Toronto. He is a Fellow of the Canadian Institute for Advanced Research and is on the Executive Board of the Neural Information Processing Systems Foundation, which runs the premier international machine learning conference.
His research contributions include foundational work on systems that learn useful representations of data without any supervision; methods for learning to rank and recommend items; and machine learning systems for automatic captioning and answering questions about images. He developed the Toronto Paper Matching System, a system for matching paper submissions to reviewers, which is being used in many conferences, including NIPS, ICML, CVPR, ICCV, and UAI. His research is supported by grants from NSERC, CIFAR, Microsoft, Google, Samsung, DARPA and iARPA.
Prof. Gauri Joshi (Carnegie Mellon University)
Date: Friday, March 6, 2020
Time: 11:00 am – 12:00 pm
Location: EER 1.516
Title: Speeding up Distributed SGD via Communication-Efficient Model Aggregation
Abstract: Large-scale machine learning training, in particular, distributed stochastic gradient descent (SGD), needs to be robust to inherent system variability such as unpredictable computation and communication delays. This work considers a distributed SGD framework where each worker node is allowed to perform local model updates and the resulting models are averaged periodically. Our goal is to analyze and improve the true speed of error convergence with respect to wall-clock time (instead of the number of iterations). For centralized model-averaging, we propose a strategy called AdaComm that gradually increases the model-averaging frequency in order to strike the best error-runtime trade-off. For decentralized model-averaging, we propose MATCHA, where we use matching decomposition sampling of the base graph to parallelize inter-worker information exchange and reduce communication delay. Experiments on training deep neural networks show that AdaComm and MATCHA can take 3x less time to achieve the same final training loss as compared to fully synchronous SGD and vanilla decentralized SGD respectively.
Based on joint work with Jianyu Wang, Anit Sahu, Zhouyi Yang, and Soummya Kar. Links to papers: https://arxiv.org/abs/1808.07576, https://arxiv.org/pdf/1810.08313.pdf, https://arxiv.org/abs/1905.09435
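A minimal sketch of local SGD with periodic model averaging and an AdaComm-style schedule that averages more frequently as training proceeds; the data and schedule below are toy choices, not the papers' code.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_workers, lr = 10, 4, 0.05
A = rng.normal(size=(200, d))
b = A @ rng.normal(size=d) + 0.1 * rng.normal(size=200)        # toy least-squares data
shards = np.array_split(np.arange(200), n_workers)              # each worker holds a data shard

x = np.zeros((n_workers, d))                                    # one model copy per worker
tau = 16                                                        # initial communication period (local steps)
for rnd in range(60):
    for i, idx in enumerate(shards):
        for _ in range(tau):                                    # tau local SGD steps
            j = rng.choice(idx)
            grad = (A[j] @ x[i] - b[j]) * A[j]
            x[i] -= lr * grad
    x[:] = x.mean(axis=0)                                       # periodic model averaging
    if rnd % 20 == 19:
        tau = max(1, tau // 2)                                  # AdaComm-style: average more often later
print(np.linalg.norm(A @ x[0] - b) ** 2 / 200)                  # training loss of the averaged model
```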
Bio: Gauri Joshi has been an assistant professor in the ECE department at Carnegie Mellon University since September 2017. Previously, she worked as a Research Staff Member at IBM T. J. Watson Research Center. Gauri completed her Ph.D. from MIT EECS in June 2016, advised by Prof. Gregory Wornell. She received her B.Tech and M.Tech in Electrical Engineering from the Indian Institute of Technology (IIT) Bombay in 2010. Her awards and honors include the NSF CRII Award (2018), IBM Faculty Research Award (2017), Best Thesis Prize in Computer Science at MIT (2012), Institute Gold Medal of IIT Bombay (2010), and the Claude Shannon Research Assistantship (2015-16).
Prof. Ameet Talwalkar (Carnegie Mellon University)
Date: Friday, February 21, 2020
Time: 11:00 am – 12:00 pm
Location: EER 1.516
Title: Toward the Jet Age of Machine Learning
Abstract: Machine learning today bears resemblance to the field of aviation soon after the Wright Brothers’ pioneering flights in the early 1900s. It took half a century of aeronautical engineering advances for the ‘Jet Age’ (i.e., commercial aviation) to become a reality. Similarly, machine learning (ML) is currently experiencing a renaissance, yet fundamental barriers must be overcome to fully unlock the potential of ML-powered technology. In this talk, I describe our work to help democratize ML by tackling barriers related to scalability, privacy, and safety. In the context of scalability and privacy, I discuss theoretically principled, privacy-preserving approaches to federated learning (i.e., learning over massive networks of edge devices) that rely on novel connections to gradient-based meta-learning. In the context of safety, we reduce the gap between model transparency and model accuracy via a novel model family of interpretable random forests that also serves as a state-of-the-art black-box explanation system.
Bio: Ameet Talwalkar is an assistant professor in the Machine Learning Department at CMU, and also co-founder and Chief Scientist at Determined AI. His interests are in the field of statistical machine learning. His current work is motivated by the goal of democratizing machine learning, with a focus on topics related to scalability, automation, fairness, and interpretability of learning algorithms and systems. He led the initial development of the MLlib project in Apache Spark, is a co-author of the textbook ‘Foundations of Machine Learning’ (MIT Press), and created an award-winning edX MOOC on distributed machine learning. He also helped to create the MLSys conference, serving as the inaugural Program Chair in 2018, General Chair in 2019, and currently as President of the MLSys Board.
Prof. Volkan Cevher (EPFL)
Date: Monday, February 3, 2020
Time: 3:00 pm – 4:00 pm
Location: EER 3.646
Title: Robust Reinforcement Learning with Langevin Dynamics
Abstract:
In this talk, I will discuss principled ways of solving a classical reinforcement learning (RL) problem and introduce its robust variant.
In particular, we rethink the exploration-exploitation trade-off in RL as an instance of a distribution sampling problem in infinite dimensions. Using the powerful Stochastic Gradient Langevin Dynamics (SGLD), we propose a new RL algorithm, which results in a sampling variant of the Twin Delayed Deep Deterministic Policy Gradient (TD3) method. Our algorithm consistently outperforms existing exploration strategies for TD3 based on heuristic noise injection strategies in several MuJoCo environments.
The sampling perspective enables us to introduce an action-robust variant of the RL objective, which is a particular case of a zero-sum two-player Markov game. In this setting, at each step of the game, both players simultaneously choose an action. The reward each player gets after one step depends on the state and the convex combination of the actions of both players. Based on our earlier work (SGLD for the min-max/GAN problem), we propose a new robust RL algorithm with a convergence guarantee and provide numerical evidence for the new algorithm. Finally, I will also discuss future directions on the application of the framework to self-play in games.
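For reference, the core SGLD update is a gradient step on the log-density plus injected Gaussian noise. The sketch below samples from a toy 2-D Gaussian-mixture target; it is a generic illustration of SGLD, not the proposed TD3 variant.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_target(theta):
    """Gradient of the log-density of a toy 2-D equal-weight Gaussian mixture."""
    centers = np.array([[-2.0, 0.0], [2.0, 0.0]])
    diffs = theta - centers                               # (2, 2): theta minus each center
    weights = np.exp(-0.5 * np.sum(diffs**2, axis=1))     # unnormalized responsibilities
    weights /= weights.sum()
    return -(weights[:, None] * diffs).sum(axis=0)

theta, step = np.zeros(2), 0.05
samples = []
for t in range(5000):
    noise = rng.normal(size=2)
    theta = theta + 0.5 * step * grad_log_target(theta) + np.sqrt(step) * noise  # SGLD update
    samples.append(theta.copy())
samples = np.array(samples)
print(samples.mean(axis=0), samples.std(axis=0))  # empirical mean and spread of the Langevin samples
```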
Bio: Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA in 2005. He was a Research Scientist with the University of Maryland, College Park from 2006-2007 and also with Rice University in Houston, TX, from 2008-2009. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University. His research interests include signal processing theory, machine learning, convex optimization, and information theory. Dr. Cevher was the recipient of the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, and an ERC Consolidator Grant in 2016 as well as an ERC Starting Grant in 2011.
Prof. Justin Romberg (Georgia Tech)
Date: Friday, January 31, 2020
Time: 11:00am – 12:00pm
Location: EER 1.516
Title: Distributed Stochastic Approximation: Reinforcement Learning and Optimization with Communication Constraints
Abstract:
We will discuss two problems that have different application spaces but share a common mathematical core. These problems combine stochastic approximation, an iterative method for finding the fixed point of a function from noisy observations, and consensus, a general averaging technique for multiple agents to cooperatively solve a distributed problem.
In the first part of the talk, we will discuss the policy evaluation problem in multi-agent reinforcement learning. In this problem, a set of agents operate in a common environment under a fixed control policy, working together to discover the value (cumulative reward) associated with each environmental state. We give a finite-time analysis of the performance of the well-known “TD-lambda” algorithm that depends on the connectivity of the agents and the intrinsic properties of the Markov decision process driving the agents’ decisions.
In the second part, we discuss distributed optimization problems over a network when the communication between the nodes is constrained, and so information that is exchanged between the nodes must be quantized. We propose a modified consensus-based gradient method that communicates using random (dithered) quantization. We derive upper bounds on the rate of convergence of this method as a function of the bandwidth available between the nodes and the underlying network topology, and derive rates for both convex and strongly convex objective functions.
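A simplified sketch of the second part's ingredients, assuming a toy ring network and quadratic local objectives: nodes exchange dithered-quantized iterates, mix them with a doubly stochastic matrix, and take local gradient steps. This illustrates the idea, not the exact algorithm analyzed in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

def dithered_quantize(x, delta=0.1):
    """Unbiased dithered quantizer with resolution delta: E[q(x)] = x."""
    return delta * np.floor(x / delta + rng.uniform(size=x.shape))

# ring of 5 nodes, each with a local quadratic objective f_i(x) = 0.5 * ||x - b_i||^2
n, d, lr = 5, 3, 0.05
B = rng.normal(size=(n, d))
W = np.zeros((n, n))                      # doubly stochastic mixing matrix for the ring
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

x = np.zeros((n, d))
for t in range(2000):
    q = dithered_quantize(x)              # nodes only exchange quantized iterates
    x = W @ q + (x - q)                   # mix quantized values, keep each node's own residual
    x -= lr * (x - B)                     # local gradient step
print(x.mean(axis=0), B.mean(axis=0))     # the network average tracks the global minimizer (mean of the b_i)
```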
Bio: Dr. Justin Romberg is a Professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. Dr. Romberg received the B.S.E.E. (1997), M.S. (1999) and Ph.D. (2004) degrees from Rice University in Houston, Texas. From Fall 2003 until Fall 2006, he was a Postdoctoral Scholar in Applied and Computational Mathematics at the California Institute of Technology. He spent the Summer of 2000 as a researcher at Xerox PARC, the Fall of 2003 as a visitor at the Laboratoire Jacques-Louis Lions in Paris, and the Fall of 2004 as a Fellow at UCLA’s Institute for Pure and Applied Mathematics. In the Fall of 2006, he joined the Georgia Tech ECE faculty. In 2008 he received an ONR Young Investigator Award, in 2009 he received a PECASE award and a Packard Fellowship, and in 2010 he was named a Rice University Outstanding Young Engineering Alumnus. In 2006-2007 he was a consultant for the TV show “Numb3rs”. He was an Associate Editor for the IEEE Transactions on Information Theory from 2008-2011, for the SIAM Journal on Imaging Science from 2013-2018, and the SIAM Journal on the Mathematics of Data Science from 2018-present. In 2018, he was named an IEEE Fellow.
Fall 2019
Prof. Amin Karbasi (Yale University)

Title: User Friendly Submodular Maximization
Abstract: Submodular functions model the intuitive notion of diminishing returns. Due to their far-reaching applications, they have been rediscovered in many fields such as information theory, operations research, statistical physics, economics, and machine learning. They also enjoy computational tractability as they can be minimized exactly or maximized approximately. The goal of this talk is simple. We see how a little bit of randomness, a little bit of greediness, and the right combination can lead to pretty good methods for offline, streaming, and distributed solutions. I do not assume any background on submodularity and try to explain all the required details during the talk.
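As one example of "a little bit of randomness plus a little bit of greediness", the sketch below implements a stochastic-greedy selection rule on a toy coverage function, evaluating marginal gains only on a small random subsample at each step. It is an illustration in the spirit of the talk, not the speaker's code.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy coverage function: each element covers a random subset of 200 items;
# f(S) = number of items covered (monotone submodular)
n_elements, n_items, k = 50, 200, 5
covers = rng.random((n_elements, n_items)) < 0.05

def coverage(selected):
    if not selected:
        return 0
    return int(covers[list(selected)].any(axis=0).sum())

def stochastic_greedy(k, eps=0.1):
    """At each of the k steps, evaluate marginal gains only on a random subsample
    of size about (n/k) * log(1/eps) instead of all remaining elements."""
    sample_size = int(np.ceil(n_elements / k * np.log(1.0 / eps)))
    S = set()
    for _ in range(k):
        remaining = list(set(range(n_elements)) - S)
        candidates = rng.choice(remaining, size=min(sample_size, len(remaining)), replace=False)
        base = coverage(S)
        gains = [coverage(S | {int(c)}) - base for c in candidates]   # marginal gains
        S.add(int(candidates[int(np.argmax(gains))]))                 # greedy pick within the sample
    return S

S = stochastic_greedy(k)
print(S, coverage(S))
```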
Bio: Amin Karbasi is currently an assistant professor of Electrical Engineering, Computer Science, and Statistics at Yale University. He is also a senior visiting scientist at Google NY. He has been the recipient of the National Science Foundation (NSF) Career Award 2019, Office of Naval Research (ONR) Young Investigator Award 2019, Air Force Office of Scientific Research (AFOSR) Young Investigator Award 2018, DARPA Young Faculty Award 2016, National Academy of Engineering Grainger Award 2017, Amazon Research Award 2018, Google Faculty Research Award 2016, Microsoft Azure Research Award 2016, Simons Research Fellowship 2017, and ETH Research Fellowship 2013. His work has also been recognized with a number of paper awards, including Medical Image Computing and Computer Assisted Interventions Conference (MICCAI) 2017, International Conference on Artificial Intelligence and Statistics (AISTAT) 2015, IEEE ComSoc Data Storage 2013, International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2011, ACM SIGMETRICS 2010, and IEEE International Symposium on Information Theory (ISIT) 2010 (runner-up). His Ph.D. thesis received the Patrick Denantes Memorial Prize 2013 from the School of Computer and Communication Sciences at EPFL, Switzerland.
Prof. Suvrit Sra (MIT)
Date: Friday, November 8, 2019
Time: 11:00am – 12:00pm
Location: EER 1.516
Title: ReLU nets are powerful memorizers: a tight analysis of finite sample expressive power
Abstract: I will talk about finite sample expressivity, aka memorization power of ReLU networks. Recent results showed (unsurprisingly) that arbitrary input data could be perfectly memorized using a shallow ReLU network with one hidden layer having N hidden nodes. I will describe a more careful construction that trades off width for depth to show that a ReLU network with 2 hidden layers, each with 2*sqrt(N) hidden nodes, can perfectly memorize arbitrary datasets. Moreover, we prove that width of Θ(sqrt(N)) is necessary and sufficient for having perfect memorization power. A notable corollary of this result is that mild overparametrization suffices to permit a neural network to achieve zero training loss!
Bio: Suvrit Sra joined MIT’s Department of Electrical Engineering and Computer Science as an Assistant Professor, and IDSS as a core faculty member, in January 2018. Prior to this, he was a Principal Research Scientist in the MIT Laboratory for Information and Decision Systems (LIDS). Before coming to LIDS, he was a Senior Research Scientist at the Max Planck Institute for Intelligent Systems, in Tübingen, Germany. During this time, he was also a visiting faculty member at the University of California at Berkeley (EECS Department) and Carnegie Mellon University (Machine Learning Department). He received his Ph.D. in Computer Science from the University of Texas at Austin.
Suvrit’s research bridges a variety of mathematical topics including optimization, matrix theory, differential geometry, and probability with machine learning. His recent work focuses on the foundations of geometric optimization, an emerging subarea of nonconvex optimization where geometry (often non-Euclidean) enables efficient computation of global optimality. More broadly, his work encompasses a wide range of topics in optimization, especially in machine learning, statistics, signal processing, and related areas. He is pursuing novel applications of machine learning and optimization to materials science, quantum chemistry, synthetic biology, healthcare, and other data-driven domains. His work has won several awards at machine learning conferences, the 2011 “SIAM Outstanding Paper” award, faculty research awards from Criteo and Amazon, and an NSF CAREER award. In addition, Suvrit founded (and regularly co-chairs) the popular OPT “Optimization for Machine Learning” series of Workshops at the Conference on Neural Information Processing Systems (NIPS). He has also edited a well-received book with the same title (MIT Press, 2011).
Spring 2019
Towards demystifying over-parameterization and early stopping in deep learning
Prof. Mahdi Soltanolkotabi
University of Southern California
Date: 5/3/19
Abstract: Many modern neural networks are trained in an over-parameterized regime where the parameters of the model exceed the size of the training dataset. Due to their over-parameterized nature, these models in principle have the capacity to (over)fit any set of labels including pure noise. Despite this high fitting capacity, somewhat paradoxically, models trained via first-order methods (often with early stopping) continue to predict well on yet unseen test data. In this talk I will discuss some results aimed at demystifying such phenomena by demonstrating that gradient methods enjoy a few intriguing properties: (1) when initialized at random the iterates converge at a geometric rate to a global optima, (2) among all global optima of the loss the iterates converge to one with good generalization capability, (3) with early stopping, the iterates are provably robust to noise/corruption/shuffling of a fraction of the labels, with these algorithms fitting only the correct labels and ignoring the corrupted ones.
Bio: Mahdi Soltanolkotabi is an assistant professor in the Ming Hsieh Department of Electrical and Computer Engineering and Computer Science at the University of Southern California where he holds an Andrew and Erna Viterbi Early Career Chair. Prior to joining USC, he completed his Ph.D. in electrical engineering at Stanford in 2014. He was a postdoctoral researcher in the EECS department at UC Berkeley during the 2014-2015 academic year. Mahdi is the recipient of the Packard Fellowship in Science and Engineering, a Sloan Research Fellowship, an NSF CAREER award, an Air Force Office of Scientific Research Young Investigator award (AFOSR YIP), and a Google faculty research award.
Ant Navigation and When to Follow GPS Directions
Prof. Steve Alpern
University of Warwick
Location: EER 1.516 (North Tower)
Abstract: The following problem arose from the work of Fonio et al., a group of ecologists and computer scientists, who tried to understand the behavior of longhorn crazy ants (Paratrechina longicornis) in navigating back to their nest after gathering food. Single ants were demonstrated to be laying pheromone ‘pointers’ to be followed by groups of ants carrying large loads. Sometimes the pointers are wrong. This leads to an optimization problem on networks with a destination node (the nest). A GPS or other system selects a direction (pointer) to the nest at every node. This will be the same direction every time the node is reached. These directions are correct with a known probability p. If followed all the time this might lead to an infinite loop starting at some node with wrong pointing direction. The problem is to choose a ‘trust probability’ q (as a function of p) with which to choose the pointer direction, in order to minimize the expected time to reach the destination node. If the pointer direction is not chosen, a random edge is chosen to leave the node. We show how to attack this problem in several ways.
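A toy simulation of the trade-off on a hypothetical directed network: each node's pointer is correct with probability p, and the walker trusts the pointer with probability q, otherwise leaving by a uniformly random out-edge. This is a simplified illustration, not the analysis in the talk.

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical network: a 10-node cycle with extra chords; node 0 is the nest
n = 10
edges = {v: [(v - 1) % n, (v + 1) % n, (v + 3) % n] for v in range(n)}   # out-neighbors

def mean_hitting_time(p, q, trials=400, max_steps=2000):
    times = []
    for _ in range(trials):
        # each node's pointer is a neighbor closer to the nest w.p. p, else a random out-neighbor
        pointer = {}
        for v in range(1, n):
            good = min(edges[v], key=lambda u: min(u, n - u))   # out-neighbor nearest node 0 on the cycle
            pointer[v] = good if rng.random() < p else rng.choice(edges[v])
        v, t = rng.integers(1, n), 0
        while v != 0 and t < max_steps:
            v = pointer[v] if rng.random() < q else rng.choice(edges[v])  # trust pointer w.p. q
            t += 1
        times.append(t)
    return np.mean(times)

for q in (0.5, 0.8, 1.0):
    print(q, round(mean_hitting_time(p=0.8, q=q), 1))   # full trust risks loops when pointers err
```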
Bio: Prof. Steve Alpern joined the ORMS group at Warwick Business School after a long stay at the London School of Economics, where he was Professor of Mathematics and Operational Research. Alpern grew up and completed his education in the United States, where he received the AB at Princeton (Senior Thesis written with Oskar Morgenstern) and the PhD at the Courant Institute-NYU (thesis on dynamical systems, under Peter Lax). Before crossing the Atlantic, he taught at NYU, UCLA, Bryn Mawr and Yale. Alpern did much of his early work in the area of ergodic theory and dynamical systems, working with J. Oxtoby, S. Kakutani and V. Prasad. Current research interests are in the Operational Research areas of game theory, search theory (particularly rendezvous search, which he first proposed in 1976) and decentralized matching theory. Lately Alpern has been applying theoretical results in search theory to animal behaviour (predator search for prey, ‘pilferer’ search for cached nuts). He has recently become interested in problems of group decision making, including jury voting and committees.
Hashing Algorithms for Extreme Scale Machine Learning
Prof. Anshumali Shrivastava
Rice University
Location: EER 1.516 (North Tower)
Abstract: In this talk, I will discuss some of my recent and surprising findings on the use of hashing algorithms for large-scale estimations. Locality Sensitive Hashing (LSH) is a hugely popular algorithm for sub-linear near neighbor search. However, it turns out that fundamentally LSH is a constant time (amortized) adaptive sampler, of which efficient near-neighbor search is only one of many possible uses. Our observation adds another feather in the cap for LSH. LSH offers a unique capability to do smart sampling and statistical estimations at the cost of a few hash lookups. Our observation bridges data structures (probabilistic hash tables) with efficient unbiased statistical estimations. I will demonstrate how this dynamic and efficient sampling breaks the computational barriers in adaptive estimations, where it is possible to pay roughly the cost of uniform sampling but get the benefits of adaptive sampling. We will demonstrate the power of one simple idea for three favorite problems: 1) partition function estimation for large NLP models such as word2vec, 2) adaptive gradient estimations for efficient SGD, and 3) sub-linear deep learning with a huge parameter space. I will show the power of these randomized algorithms by introducing the SLIDE system. SLIDE is an auspicious illustration of the power of smart randomized algorithms over CPUs in outperforming the best available GPU with an optimized implementation for training large neural networks. Our evaluations on large industry-scale datasets, with some large fully connected architectures, show that training with SLIDE on a 44 core CPU is more than 2.7 times (2 hours vs. 5.5 hours) faster than the same network trained using Tensorflow on a Tesla V100 at any given accuracy level. In the end, if time permits, we will switch to memory cost and see the power of a simple hashing scheme that can shrink the memory requirements associated with classification problems exponentially! Using our algorithms, we can train 100,000 classes with 400,000 features on a single Titan X, while needing 5% or less of the memory required to store all the weights. For a simple logistic regression on this data, a model size of 160 GB would otherwise be unavoidable.
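To illustrate how an LSH table doubles as an adaptive sampler, here is a generic SimHash sketch (not the SLIDE implementation): items landing in the query's bucket collide with probability that grows with their similarity to the query, so the bucket is heavily enriched with similar items.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_bits = 32, 10
center = rng.normal(size=d)
near = center + 0.3 * rng.normal(size=(200, d))     # 200 items similar to the query
far = rng.normal(size=(4800, d))                    # 4800 unrelated items
data = np.vstack([near, far])

planes = rng.normal(size=(n_bits, d))               # SimHash: signed random projections

def simhash(x):
    return tuple((planes @ x > 0).astype(int))      # n_bits-bit signature

table = {}                                          # bucket -> item indices (built once)
for i, x in enumerate(data):
    table.setdefault(simhash(x), []).append(i)

query = center + 0.3 * rng.normal(size=d)
bucket = np.array(table.get(simhash(query), []))
frac_similar = float((bucket < 200).mean()) if len(bucket) else 0.0
print(len(bucket), frac_similar)   # the bucket over-represents the similar items (baseline would be 4%)
```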
Bio: Anshumali Shrivastava is an assistant professor in the computer science department at Rice University. His broad research interests include randomized algorithms for large-scale machine learning. In 2018, Science news named him one of the Top-10 scientists under 40 to watch. He is a recipient of National Science Foundation CAREER Award, a Young Investigator Award from Air Force Office of Scientific Research, and machine learning research award from Amazon. His research on hashing inner products has won Best Paper Award at NIPS 2014 while his work on representing graphs got the Best Paper Award at IEEE/ACM ASONAM 2014. Anshumali finished his Ph.D. in 2015 from Cornell University.
Stein Variational Gradient Descent: Algorithm, Theory, Applications
Prof. Qiang Liu
The University of Texas at Austin
Friday, March 15, 2019
11:00AM – 12:00PM
Location: EER 1.516 (North Tower)
Abstract: Approximate probabilistic inference is a key computational task in modern machine learning, which allows us to reason with complex, structured, hierarchical (deep) probabilistic models to extract information and quantify uncertainty. Traditionally, approximate inference is often performed by either Markov chain Monte Carlo (MCMC) or variational inference (VI), both of which, however, have their own critical weaknesses: MCMC is accurate and asymptotically consistent but suffers from slow convergence; VI is typically faster by formulating the inference problem as gradient-based optimization, but introduces deterministic errors and lacks theoretical guarantees. Stein variational gradient descent (SVGD) is a new tool for approximate inference that combines the accuracy and flexibility of MCMC and the practical speed of VI and gradient-based optimization. The key idea of SVGD is to directly optimize a non-parametric particle-based representation to fit intractable distributions with fast deterministic gradient-based updates, which is made possible by integrating and generalizing key mathematical tools from Stein’s method, optimal transport, and interacting particle systems. SVGD has been found a powerful tool in various challenging settings, including Bayesian deep learning and deep generative models, reinforcement learning, and meta learning. This talk will introduce the basic ideas and theories of SVGD, and cover some examples of application.
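For concreteness, here is a minimal NumPy implementation of the SVGD update with an RBF kernel and the median bandwidth heuristic, run on a toy 1-D Gaussian-mixture target; it is an illustration, not the speaker's code.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_log_p(x):
    """Score of the target: an equal-weight mixture of N(-2, 1) and N(2, 1)."""
    centers = np.array([-2.0, 2.0])
    w = np.exp(-0.5 * (x[:, None] - centers) ** 2)
    w /= w.sum(axis=1, keepdims=True)
    return -(w * (x[:, None] - centers)).sum(axis=1)

def svgd_step(x, step=0.1):
    n = len(x)
    diff = x[:, None] - x[None, :]                   # pairwise differences x_i - x_j
    h = np.median(diff ** 2) / np.log(n + 1) + 1e-8  # median bandwidth heuristic
    K = np.exp(-diff ** 2 / h)                       # RBF kernel matrix
    grad_K = 2.0 * diff / h * K                      # d/dx_j k(x_j, x_i): repulsive term
    phi = (K @ grad_log_p(x) + grad_K.sum(axis=1)) / n   # SVGD direction for each particle
    return x + step * phi

particles = rng.normal(size=100)                     # initialize particles from N(0, 1)
for _ in range(500):
    particles = svgd_step(particles)
print(particles.mean(), particles.std())             # mean near 0; spread approaches the target's std (~2.2)
```

The first term in phi pulls particles toward high-density regions; the kernel-gradient term pushes them apart, which is what keeps the particle set from collapsing to a single mode.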
Bio: Prof. Qiang Liu is an Associate Professor of computer science at The University of Texas at Austin. His research focuses on statistical machine learning, probabilistic inference and learning, and deep learning.
Uncertainty quantification and active sampling for low-rank matrices
Prof. Yao Xie
Georgia Institute of Technology
11:00AM – 12:00PM
Location: EER 1.516 (North Tower)
Geometry and Regularization in Nonconvex Low-Rank Estimation
Prof. Yuejie Chi
Carnegie Mellon University
Friday, February 01, 2019
EER 3.646 – Blaschke Conference Room
Abstract: Recent years have seen a flurry of activities in designing provably efficient nonconvex procedures for solving statistical estimation problems. The premise is that despite nonconvexity, the loss function may possess benign geometric properties that enable fast global convergence under carefully designed initializations, such as local strong convexity, local restricted convexity, etc. In many sample-starved problems, this benign geometry only holds over a restricted region of the entire parameter space with certain structural properties, yet gradient descent seems to follow a trajectory staying within this nice region without explicit regularizations, and is thus extremely computationally efficient and holds strong statistical guarantees. In this talk, we formally establish this “implicit regularization” phenomenon of gradient descent for the fundamental problem of estimating low-rank matrices from noisy, incomplete, rank-one, or 1-bit measurements, by exploiting statistical modeling in analyzing iterative optimization algorithms via a leave-one-out perturbation argument.
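A small sketch of the setting: vanilla gradient descent on the factored (Burer-Monteiro) objective for low-rank matrix completion, with small random initialization and no explicit regularization. The data are toy synthetic matrices; this illustrates the phenomenon's setup, not the paper's experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p, lr = 50, 3, 0.3, 0.25
U_true = rng.normal(size=(n, r)) / np.sqrt(n)
M = U_true @ U_true.T                              # rank-r PSD ground truth
mask = rng.random((n, n)) < p                      # observed entries, sampled independently
mask = np.triu(mask) | np.triu(mask).T             # symmetric observation pattern

U = 0.01 * rng.normal(size=(n, r))                 # small random init, no explicit regularizer
for t in range(3000):
    residual = mask * (U @ U.T - M)                # error on observed entries only
    U -= lr * 2.0 * residual @ U                   # plain gradient step on the factored objective
print(np.linalg.norm(U @ U.T - M) / np.linalg.norm(M))  # relative error on ALL entries is small when sampling suffices
```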
Bio: Dr. Yuejie Chi received the Ph.D. degree in Electrical Engineering from Princeton University in 2012, and the B.E. (Hon.) degree in Electrical Engineering from Tsinghua University, Beijing, China, in 2007. Since January 2018, she has been the Robert E. Doherty Career Development Professor and Associate Professor with the department of Electrical and Computer Engineering at Carnegie Mellon University, after spending 5 years at The Ohio State University. She is interested in the mathematics of data representation that takes advantage of structures and geometry to minimize complexity and improve performance. Specific topics include mathematical and statistical signal processing, machine learning, large-scale optimization, sampling and information theory, with applications in sensing, imaging and data science. She is a recipient of the NSF CAREER Award, AFOSR YIP Award, ONR YIP Award and IEEE SPS Young Author Paper Award.
Statistical State Compression and Primal-Dual Pi Learning
Prof. Mengdi Wang
Princeton University
Friday, January 25, 2019
11:00am – 12:00pm
EER 3.646 – Blaschke Conference Room
Abstract: This talk focuses on the statistical sample complexity and model reduction of Markov decision processes (MDPs). We begin by surveying recent advances on the complexity of solving MDPs, without any dimension reduction. In the first part we study the statistical state compression of general Markov processes. We propose a spectral state compression method for learning state features and aggregation structures from data. The state compression method is able to “sketch” a black-box Markov process from its empirical data, for which both minimax statistical guarantees and scalable computational tools are provided. In the second part, we propose a bilinear primal-dual pi learning method for finding the optimal policy, which utilizes given state and action features. The method is motivated by a saddle point formulation of the Bellman equation. Its sample complexity depends only on the number of parameters and is invariant with respect to the dimension of the problem, making high-dimensional reinforcement learning possible using “small” data.
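As a toy illustration of spectral state compression (not the paper's estimator in full), the sketch below builds an aggregatable Markov chain, forms the empirical transition matrix from one trajectory, and keeps only its top singular directions; the chain and its block structure are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# a Markov chain with 12 states that aggregate into r = 3 meta-states
r, states_per_group = 3, 4
n = r * states_per_group
meta_P = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.7, 0.2],
                   [0.2, 0.1, 0.7]])
group = np.repeat(np.arange(r), states_per_group)
P = meta_P[group][:, group] / states_per_group       # lift meta-transitions to the 12 states

# simulate a trajectory and form the empirical transition matrix
T, s = 20000, 0
counts = np.zeros((n, n))
for _ in range(T):
    s_next = rng.choice(n, p=P[s])
    counts[s, s_next] += 1
    s = s_next
P_hat = counts / np.maximum(counts.sum(axis=1, keepdims=True), 1)

# spectral compression: keep only the top-r singular directions of P_hat
U, S, Vt = np.linalg.svd(P_hat)
P_r = U[:, :r] @ np.diag(S[:r]) @ Vt[:r]
print(np.linalg.norm(P_hat - P, "fro"), np.linalg.norm(P_r - P, "fro"))  # low-rank estimate is closer to P
print(np.round(U[:, :r], 2))   # rows are nearly identical within each aggregate, revealing the partition
```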
Bio: Mengdi Wang is an assistant professor at the Department of Operations Research and Financial Engineering at Princeton University. She is also affiliated with the Department of Computer Science and Princeton’s Center for Statistics and Machine Learning. Her research focuses on data-driven stochastic optimization and applications in machine and reinforcement learning. She received her PhD in Electrical Engineering and Computer Science from Massachusetts Institute of Technology in 2013. At MIT, Mengdi was affiliated with the Laboratory for Information and Decision Systems and was advised by Dimitri P. Bertsekas. Mengdi became an assistant professor at Princeton in 2014. She received the Young Researcher Prize in Continuous Optimization of the Mathematical Optimization Society in 2016 (awarded once every three years), the Princeton SEAS Innovation Award in 2016, the NSF Career Award in 2017, and the Google Faculty Award. She is currently serving as an associate editor for Operations Research.
For a complete list of previous talks in the WNCG Seminar Series, please visit the WNCG website at: