About me
I’m a Member of Technical Staff at AMI Labs, where I think about how to build visually intelligent systems. My research spans multimodal generation, understanding, and open-world perception, with the goal of building a unified world model.
I build large-scale systems and research how they should work. I’ve explored concurrent mixed-modal generation [OneFlow], temporally expansive video generation [Flowception], scaling laws for multimodal pretraining [Beyond Language Modeling], and tokenization-free language modeling [Byte Latent Transformer, ACL Outstanding Paper]. On the systems side, I built Opacus (4M+ downloads) for differentially private model training, and Papaya (MLSys 2022), a large-scale asynchronous federated learning platform deployed to millions of users.
Previously, I was at Meta (FAIR), working on large multimodal models and federated learning. I graduated cum laude from UC Davis with a double major in Statistics and Computer Science (2018) and an M.S. in Computer Science (2019).
Publications (see all)
2026
Beyond Language Modeling: An Exploration of Multimodal Pretraining
- Shengbang Tong*, David Fan*, John Nguyen*, Ellis Brown, Gaoyue Zhou, Shengyi Qian, Boyang Zheng, Théophane Vallaeys, Junlin Han, Rob Fergus, Naila Murray, Marjan Ghazvininejad, Mike Lewis, Nicolas Ballas, Amir Bar, Michael Rabbat, Jakob Verbeek, Luke Zettlemoyer, Koustuv Sinha, Yann LeCun, Saining Xie
- *Joint first author
- Website
2025
Flowception: Temporally Expansive Flow Matching for Video Generation
- Tariq Berrada Ifriqi, John Nguyen, Karteek Alahari, Jakob Verbeek, Ricky T. Q. Chen
OneFlow: Concurrent Mixed-Modal and Interleaved Generation with Edit Flows
- John Nguyen, Marton Havasi, Tariq Berrada, Luke Zettlemoyer, Ricky T. Q. Chen
- Website
VUGEN: Visual Understanding priors for GENeration
- Xingyi Chen, Tom Vallaeys, Maha Elbayad, John Nguyen, Jakob Verbeek
2024
Byte Latent Transformer: Patches Scale Better Than Tokens
- Artidoro Pagnoni, Ram Pasunuru*, Pedro Rodriguez*, John Nguyen*, Benjamin Muller, Margaret Li, Chunting Zhou, Lili Yu, Jason Weston, Luke Zettlemoyer, Gargi Ghosh, Mike Lewis, Ari Holtzman, Srinivasan Iyer
- *Joint second author
- Outstanding Paper Award, ACL 2025
Now It Sounds Like You: Learning Personalized Vocabulary On Device
- Sid Wang, Ashish Shenoy, Pierce Chuang, John Nguyen
- AAAI 2024 Spring Symposium
2023
READ: Recurrent Adaptation of Large Transformers
- John Nguyen*, Sid Wang*, Ke Li, Carole-Jean Wu
- NeurIPS 2023 R0-FoMo: Robustness of Few-shot and Zero-shot Learning in Foundation Models Workshop
On Noisy Evaluation in Federated Hyperparameter Tuning
- Kevin Kuo, Pratiksha Thaker, Mikhail Khodak, John Nguyen, Daniel Jiang, Ameet Talwalkar, Virginia Smith
- Conference on Machine Learning and Systems (MLSys), 2023
Where to Begin? Exploring the Impact of Pre-Training and Initialization in Federated Learning
- John Nguyen, Jianyu Wang, Kshitiz Malik, Maziar Sanjabi, Michael Rabbat
- Spotlight at the International Conference on Learning Representations (ICLR), 2023
- Presentation
2022
Towards Fair Federated Recommendation Learning: Characterizing the Inter-Dependence of System and Data Heterogeneity
- Kiwan Maeng, Haiyu Lu, Luca Melis, John Nguyen, Mike Rabbat, Carole-Jean Wu
- Best Paper Finalist at the ACM Conference on Recommender Systems (RecSys), 2022
Papaya: Practical, Private, and Scalable Federated Learning
- Dzmitry Huba, John Nguyen, Kshitiz Malik, Ruiyu Zhu, Mike Rabbat, Ashkan Yousefpour, Carole-Jean Wu, Hongyuan Zhan, Pavel Ustinov, Harish Srinivas, Kaikai Wang, Anthony Shoumikhin, Jesik Min, Mani Malek
- Conference on Machine Learning and Systems (MLSys), 2022
Federated Learning with Buffered Asynchronous Aggregation
- John Nguyen, Kshitiz Malik, Hongyuan Zhan, Ashkan Yousefpour, Mike Rabbat, Mani Malek, Dzmitry Huba
- International Conference on Artificial Intelligence and Statistics (AISTATS), 2022
- Presentation
2021
Opacus: User-Friendly Differential Privacy Library in PyTorch
- Ashkan Yousefpour*, Igor Shilov*, Alexandre Sablayrolles*, Davide Testuggine, Karthik Prasad, Mani Malek, John Nguyen, Sayan Ghosh, Akash Bharadwaj, Jessica Zhao, Graham Cormode, Ilya Mironov
- *Equal contribution
- Privacy in Machine Learning (PriML) workshop, NeurIPS 2021
