Publications

Below is a collection of my publications, including preprints and technical reports. You can also check my Google Scholar entry, which is more likely to be up to date.

Papers

  1. P. G. Sessa, R. Dadashi, L. Hussenot, J. Ferret, N. Vieillard, A. Ramé, B. Shahriari, S. Perrin, A. Friesen, G. Cideron, S. Girgin, P. Stańczyk, A. Michi, D. Sinopalnikov, S. Ramos, A. Héliou, A. Severyn, M. W. Hoffman, N. Momchev, and O. Bachem. (2024). BOND: Aligning LLMs with Best-of-N Distillation. Google DeepMind. [pdf] [bibtex]

  2. Gemma Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, L. Hussenot, T. Mesnard, B. Shahriari, A. Ramé, J. Ferret, P. Liu, P. Tafti, A. Friesen, M. Casbon, S. Ramos, R. Kumar, C. L. Lan, S. Jerome, A. Tsitsulin, N. Vieillard, P. Stańczyk, S. Girgin, N. Momchev, M. W. Hoffman, S. Thakoor, J.-B. Grill, B. Neyshabur, O. Bachem, et al. (2024). Gemma 2: Improving Open Language Models at a Practical Size. Google DeepMind. [pdf] [bibtex]

  3. M. W. Hoffman, B. Shahriari, J. Aslanides, G. Barth-Maron, N. Momchev, D. Sinopalnikov, P. Stańczyk, S. Ramos, A. Raichuk, D. Vincent, L. Hussenot, R. Dadashi, G. Dulac-Arnold, M. Orsini, A. Jacq, J. Ferret, N. Vieillard, S. K. S. Ghasemipour, S. Girgin, O. Pietquin, F. Behbahani, T. Norman, A. Abdolmaleki, A. Cassirer, F. Yang, K. Baumli, S. Henderson, A. Friesen, R. Haroun, A. Novikov, S. G. Colmenarejo, S. Cabi, C. Gulcehre, T. L. Paine, S. Srinivasan, A. Cowie, Z. Wang, B. Piot, and N. de Freitas. (2022). Acme: A Research Framework for Distributed Reinforcement Learning. Google DeepMind. [pdf] [bibtex]

  4. F. Yang, G. Barth-Maron, P. Stańczyk, M. W. Hoffman, S. Liu, M. Kroiss, A. Pope, and A. Rrustemi. (2021). Launchpad: A programming model for distributed machine learning research. Google DeepMind. [pdf] [bibtex]

  5. C. Gulcehre, S. G. Colmenarejo, Z. Wang, J. Sygnowski, T. Paine, K. Zolna, Y. Chen, M. W. Hoffman, R. Pascanu, and N. de Freitas. (2021). Regularized behavior value estimation. Google DeepMind. [pdf] [bibtex]

  6. C. Gulcehre, Z. Wang, A. Novikov, T. Paine, S. Gómez Colmenarejo, K. Zolna, R. Agarwal, J. S. Merel, D. J. Mankowitz, C. Paduraru, G. Dulac-Arnold, J. Li, M. Norouzi, M. W. Hoffman, N. Heess, and N. de Freitas. (2020). RL unplugged: A suite of benchmarks for offline reinforcement learning. In Neural Information Processing Systems. [bibtex]

  7. Y. Chen, A. L. Friesen, F. Behbahani, A. Doucet, D. Budden, M. W. Hoffman, and N. de Freitas. (2020). Modular meta-learning with shrinkage. In Neural Information Processing Systems. [bibtex]

  8. A. Gu, C. Gulcehre, T. L. Paine, M. W. Hoffman, and R. Pascanu. (2019). Improving the Gating Mechanism of Recurrent Neural Networks. Google DeepMind. [pdf] [bibtex]

  9. T. L. Paine, C. Gulcehre, B. Shahriari, M. Denil, M. W. Hoffman, H. Soyer, R. Tanburn, S. Kapturowski, N. Rabinowitz, D. Williams, G. Barth-Maron, Z. Wang, N. de Freitas, and Worlds Team. (2019). Making Efficient Use of Demonstrations to Solve Hard Exploration Problems. Google DeepMind. [pdf] [bibtex]

  10. B. Shillingford, Y. Assael, M. W. Hoffman, T. Paine, C. Hughes, U. Prabhu, H. Liao, H. Sak, K. Rao, L. Bennett, M. Mulville, B. Coppin, B. Laurie, A. Senior, and N. de Freitas. (2019). Large-scale visual speech recognition. In INTERSPEECH. [pdf] [bibtex]

  11. T. L. Paine, S. G. Colmenarejo, Z. Wang, S. Reed, Y. Aytar, T. Pfaff, M. W. Hoffman, G. Barth-Maron, S. Cabi, D. Budden, and N. de Freitas. (2018). One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL. arXiv:1810.05017. [pdf] [bibtex]

  12. G. Barth-Maron, M. W. Hoffman, D. Budden, W. Dabney, D. Horgan, D. TB, A. Muldal, N. Heess, and T. Lillicrap. (2018). Distributed Distributional Deterministic Policy Gradients. In International Conference on Learning Representations. [pdf] [bibtex]

  13. S. Cabi, S. G. Colmenarejo, M. W. Hoffman, M. Denil, Z. Wang, and N. de Freitas. (2017). The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously. In Conference on Robot Learning. [pdf] [bibtex]

  14. Y. Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. de Freitas. (2017). Learning to learn without gradient descent by gradient descent. In International Conference on Machine Learning. [pdf] [bibtex]

  15. O. Wichrowska, N. Maheswaranathan, M. W. Hoffman, S. G. Colmenarejo, M. Denil, N. de Freitas, and J. Sohl-Dickstein. (2017). Learned optimizers that scale and generalize. In International Conference on Machine Learning. [pdf] [bibtex]

  16. M. Andrychowicz, M. Denil, S. Gomez, M. W. Hoffman, D. Pfau, T. Schaul, and N. de Freitas. (2016). Learning to learn by gradient descent by gradient descent. In Neural Information Processing Systems. [pdf] [bibtex]

  17. J. M. Hernández-Lobato, M. A. Gelbart, R. P. Adams, M. W. Hoffman, and Z. Ghahramani. (2016). A general framework for constrained Bayesian optimization using information-based search. Journal of Machine Learning Research, 17. [pdf] [bibtex]

  18. M. W. Hoffman and Z. Ghahramani. (2015). Output-Space Predictive Entropy Search for Flexible Global Optimization. In NIPS workshop on Bayesian optimization. [pdf] [bibtex]

  19. J. M. Hernández-Lobato, M. A. Gelbart, M. W. Hoffman, R. P. Adams, and Z. Ghahramani. (2015). Predictive Entropy Search for Bayesian Optimization with Unknown Constraints. In International Conference on Machine Learning. [pdf] [bibtex]

  20. B. Shahriari, Z. Wang, M. W. Hoffman, A. Bouchard-Côté, and N. de Freitas. (2015). An Entropy Search Portfolio for Bayesian Optimization. arXiv:1406.4625. [pdf] [bibtex]

  21. M. W. Hoffman and B. Shahriari. (2014). Modular mechanisms for Bayesian optimization. In NIPS workshop on Bayesian optimization. [pdf] [bibtex]

  22. J. M. Hernández-Lobato, M. W. Hoffman, and Z. Ghahramani. (2014). Predictive Entropy Search for Efficient Global Optimization of Black-box Functions. In Neural Information Processing Systems. [pdf] [bibtex]

  23. M. W. Hoffman, B. Shahriari, and N. de Freitas. (2014). On correlation and budget constraints in model-based bandit optimization with application to automatic machine learning. In International Conference on Artificial Intelligence and Statistics. [pdf] [bibtex]

  24. M. W. Hoffman and N. de Freitas. (2012). Inference strategies for solving semi-Markov decision processes. In L. E. Sucar, E. F. Morales, and J. Hoey (Eds.), Decision Theory Models for Applications in Artificial Intelligence: Concepts and Solutions. IGI Global. [pdf] [bibtex]

  25. M. W. Hoffman, A. Lazaric, M. Ghavamzadeh, and R. Munos. (2012). Regularized Least Squares Temporal Difference Learning with Nested ℓ₂ and ℓ₁ Penalization. In European Workshop on Reinforcement Learning. [pdf] [bibtex]

  26. M. Ghavamzadeh, A. Lazaric, M. W. Hoffman, and R. Munos. (2011). Finite-Sample Analysis of Lasso-TD. In International Conference on Machine Learning. [pdf] [bibtex]

  27. M. W. Hoffman, E. Brochu, and N. de Freitas. (2011). Portfolio Allocation for Bayesian Optimization. In Uncertainty in Artificial Intelligence. [pdf] [bibtex]

  28. M. W. Hoffman, H. Kueck, N. de Freitas, and A. Doucet. (2009). New inference strategies for solving Markov decision processes using reversible jump MCMC. In Uncertainty in Artificial Intelligence. [pdf] [bibtex]

  29. M. W. Hoffman, N. de Freitas, A. Doucet, and J. Peters. (2009). An Expectation Maximization algorithm for continuous Markov Decision Processes with arbitrary reward. In International Conference on Artificial Intelligence and Statistics. [pdf] [code] [bibtex]

  30. H. Kueck, M. W. Hoffman, A. Doucet, and N. de Freitas. (2009). Inference and Learning for Active Sensing, Experimental Design and Control. In Iberian Conference on Pattern Recognition and Image Analysis. [pdf] [bibtex]

  31. M. W. Hoffman, A. Doucet, N. de Freitas, and A. Jasra. (2007). Bayesian policy learning with trans-dimensional MCMC. In Neural Information Processing Systems. [pdf] [bibtex]

  32. M. W. Hoffman, A. Doucet, N. de Freitas, and A. Jasra. (2007). On solving general state-space sequential decision problems using inference algorithms (No. TR-2007-04). University of British Columbia, Computer Science. [pdf] [bibtex]

  33. M. W. Hoffman, D. B. Grimes, A. P. Shon, and R. P. N. Rao. (2006). A probabilistic model of gaze imitation and shared attention. Neural Networks, 19. [pdf] [bibtex]

  34. A. P. Shon, D. B. Grimes, C. L. Baker, M. W. Hoffman, S. Zhou, and R. P. N. Rao. (2005). Probabilistic gaze imitation and saliency learning in a robotic head. In International Conference on Robotics and Automation. [pdf] [bibtex]

Thesis

  1. M. W. Hoffman. (2013). Decision making with inference and learning methods (PhD thesis). University of British Columbia. [pdf] [bibtex]