Om Thakkar

Member of Technical Staff
OpenAI
San Francisco, CA, USA
Email : om "at" openai.com

Short Bio

I am a research engineer at OpenAI, working on privacy research in the team of Vinnie Monaco. My research is in privacy-preserving AI, with a specific focus on differential privacy and its applications to deep learning in production systems (including language and speech models).

Before joining OpenAI, I was a senior research scientist at Google, where I worked for 5 years. Prior to Google, I graduated with a Ph.D. in Computer Science from Boston University (BU) in 2019. I was very fortunate to be advised by Dr. Adam Smith. At BU, I was a part of the Security group, and the Theoretical Computer Science group. I completed the first 3.5 years of my Ph.D. in the Department of Computer Science and Engineering at the Pennsylvania State University, advised by Dr. Adam Smith. Before joining Penn State, I completed my B.Tech. in Information and Communication Technology from the Dhirubhai Ambani University, India in 2014.

News

Our paper titled Improving Streaming ASR via Differentially Private Fusion of Data from Multiple Sources has been accepted to appear at ASRU 2025.
I am giving a talk on Privacy Leakage in Speech Models: Attacks and Mitigations @ the CISPA - ELLIS - Summer School 2025 on Trustworthy AI - Secure and Safe Foundation Models (August 2025).
I am part of the panel discussion @ the The Impact of Memorization on Trustworthy Foundation Models – MemFM at ICML 2025 (July 2025).
I am giving a talk on Privacy Leakage in Speech Models: Attacks and Mitigations @ the UMass AI&Sec Security and Privacy seminar (March 2025).
Our paper titled Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models has been accepted to appear at Interspeech 2025.
Our paper titled Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints has been accepted to appear at PETS 2025.
Two papers (Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping, and Quantifying Unintended Memorization in BEST-RQ ASR Encoders) (oral presentation) accepted to appear in Interspeech 2024.
Two papers (Unintended Memorization in Large ASR Models, and How to Mitigate It, and Noise Masking Attacks and Defenses for Pretrained Speech Models) accepted to appear in ICASSP 2024.
Our paper titled Why Is Public Pretraining Necessary for Private Model Training? has been accepted to appear at ICML 2023.
Our paper titled Measuring Forgetting of Memorized Training Examples has been accepted to appear at ICLR 2023 .
Our paper titled Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints has been accepted to appear at PPAI 2023 (AAAI 2023) as an oral presentation.

Resume

My most recent resume (last updated in August, 2025) can be found here.

Manuscripts

Patents

Leveraging Intermediate Checkpoints To Improve The Performance of Trained Differentially Private Models.
Om Thakkar, Arun Ganesh, Virat Shejwalkar, Abhradeep Thakurta, and Rajiv Mathews.
Published US Patent 2024/0095594.
Detecting Unintended Memorization in Language-Model-Fused ASR Systems.
W. Ronny Huang, Steve Chien, Om Thakkar, and Rajiv Mathews.
Published US Patent 2023/0335126.
Generating and/or Utilizing Unintentional Memorization Measure(s) for Automatic Speech Recognition Model(s).
Om Thakkar, Hakim Sidahmed, W. Ronny Huang, Rajiv Mathews, Françoise Beaufays, and Florian Tramèr.
Published US Patent 2023/0317082.
Server Efficient Ehnancement of Privacy in Federated Learning.
Om Thakkar, Peter Kairouz, Brendan McMahan, Borja Balle, and Abhradeep Thakurta.
Published US Patent 2023/0223028.
Phrase Extraction for ASR Models.
Ehsan Amid, Om Thakkar, Rajiv Mathews and Françoise Beaufays.
Published US Patent 2023/0178094.
Leveraging Public Data in Training Neural Networks with Private Mirror Descent.
Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith Suriyakumar, Om Thakkar, and Abhradeep Thakurta.
Published US Patent 2023/0103911.
Ascertaining And/or Mitigating Extent of Effective Reconstruction, of Predictions, From Model Updates Transmitted in Federated Learning.
Om Thakkar, Trung Dang, Swaroop Ramaswamy, Rajiv Mathews, and Françoise Beaufays.
Published US Patent 2022/0383204.
Mixed Client-Server Federated Learning.
Françoise Beaufays, Swaroop Ramaswamy, Rajiv Mathews, Om Thakkar, and Andrew Hard.
Published US Patent 2022/0293093.

Publications

Papers available here may be subject to copyright, and are intended for personal, non-commercial use only. Unless specifically indicated, all publications have authors listed in the alphabetical order of last names (as per the convention in theoretical computer science).

Improving Streaming ASR via Differentially Private Fusion of Data from Multiple Sources.
Virat Shejwalkar, Om Thakkar, Steve Chien, Nicole Rafidi, and Arun Narayanan.
In ASRU 2025.
Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models.
Hongbin Liu, Lun Wang, Om Thakkar, Abhradeep Thakurta, and Arun Narayanan. (In order of contribution).
In Interspeech 2025.
Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints.
Virat Shejwalkar, Arun Ganesh*, Rajiv Mathews*, Yarong Mu*, Shuang Song*, Om Thakkar*, Abhradeep Thakurta*, and Xinyi Zheng*. (*Alphabetical order.)
In Privacy Enhancing Technologies Symposium (PETS) 2025.
Efficiently Train ASR Models that Memorize Less and Perform Better with Per-core Clipping.
Lun Wang, Om Thakkar, Zhong Meng, Nicole Rafidi, Rohit Prabhavalkar, and Arun Narayanan. (In order of contribution).
In Interspeech 2024.
Quantifying Unintended Memorization in BEST-RQ ASR Encoders.
Virat Shejwalkar, Om Thakkar, and Arun Narayanan. (In order of contribution).
In Interspeech 2024. Accepted for an oral presentation.
Unintended Memorization in Large ASR Models, and How to Mitigate It.
Lun Wang, Om Thakkar, and Rajiv Mathews. (In order of contribution).
In the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024).
Noise Masking Attacks and Defenses for Pretrained Speech Models.
Matthew Jagielski, Om Thakkar, and Lun Wang. (In order of contribution).
In the 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2024).
Why Is Public Pretraining Necessary for Private Model Training?
Arun Ganesh, Mahdi Haghifam, Milad Nasr, Sewoong Oh, Thomas Steinke, Om Thakkar, Abhradeep Thakurta, and Lun Wang.
In the Fortieth International Conference on Machine Learning (ICML 2023).
Measuring Forgetting of Memorized Training Examples.
Matthew Jagielski, Om Thakkar, Florian Tramèr, Daphne Ippolito, Katherine Lee, Nicholas Carlini, Eric Wallace, Shuang Song, Abhradeep Thakurta, Nicolas Papernot, and Chiyuan Zhang. (In order of contribution)
In the Eleventh International Conference on Learning Representations (ICLR 2023).
Extracting Targeted Training Data from ASR Models, and How to Mitigate It.
Ehsan Amid*, Om Thakkar*, Arun Narayanan, Rajiv Mathews, and Françoise Beaufays. (In order of contribution)
In Interspeech 2022. Accepted for an oral presentation.
*Equal contribution.
Detecting Unintended Memorization in Language-Model-Fused ASR.
W. Ronny Huang, Steve Chien, Om Thakkar, and Rajiv Mathews. (In order of contribution)
In Interspeech 2022. Accepted for an oral presentation.
Public Data-Assisted Mirror Descent for Private Model Training.
Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith Suriyakumar, Om Thakkar, and Abhradeep Thakurta.
In the Thirty-ninth International Conference on Machine Learning (ICML 2022).
A Method to Reveal Speaker Identity in Distributed ASR Training, and How to Counter It. Abstract▼
End-to-end Automatic Speech Recognition (ASR) models are commonly trained over spoken utterances using optimization methods like Stochastic Gradient Descent (SGD). In distributed settings like Federated Learning, model training requires transmission of gradients over a network. In this work, we design the first method for revealing the identity of the speaker of a training utterance with access only to a gradient. We propose Hessian-Free Gradients Matching, an input reconstruction technique that operates without second derivatives of the loss function (required in prior works), which can be expensive to compute. We show the effectiveness of our method using the DeepSpeech model architecture, demonstrating that it is possible to reveal the speaker's identity with 34% top-1 accuracy (51% top-5 accuracy) on the LibriSpeech dataset. Further, we study the effect of two well-known techniques, Differentially Private SGD and Dropout, on the success of our method. We show that a dropout rate of 0.2 can reduce the speaker identity accuracy to 0% top-1 (0.5% top-5).

Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, and Françoise Beaufays. (In order of contribution)
In the Forty-Seventh International Conference on Acoustics, Speech, & Signal Processing (ICASSP 2022).
The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection. Abstract▼
Hyperparameter optimization is a ubiquitous challenge in machine learning, and the performance of a trained model depends crucially upon their effective selection. While a rich set of tools exist for this purpose, there are currently no practical hyperparameter selection methods under the constraint of differential privacy (DP). We study honest hyperparameter selection for differentially private machine learning, in which the process of hyperparameter tuning is accounted for in the overall privacy budget. To this end, we i) show that standard composition tools outperform more advanced techniques in many settings, ii) empirically and theoretically demonstrate an intrinsic connection between the learning rate and clipping norm hyperparameters, iii) show that adaptive optimizers like DPAdam enjoy a significant advantage in the process of honest hyperparameter tuning, and iv) draw upon novel limiting behaviour of Adam in the DP setting to design a new and more efficient optimizer.

Shubhankar Mohapatra, Sajin Sasy, Gautam Kamath*, Xi He*, and Om Thakkar*. (*Alphabetical order.)
In the Thirty-Sixth AAAI Conference on Artificial Intelligence (AAAI 2022). Accepted for an oral presentation.
Differentially Private Learning with Adaptive Clipping. Abstract▼
Existing approaches for training neural networks with user-level differential privacy (e.g., DP Federated Averaging) in federated learning (FL) settings involve bounding the contribution of each user's model update by clipping it to some constant value. However there is no good a priori setting of the clipping norm across tasks and learning settings: the update norm distribution depends on the model architecture and loss, the amount of data on each device, the client learning rate, and possibly various other parameters. We propose a method wherein instead of a fixed clipping norm, one clips to a value at a specified quantile of the update norm distribution, where the value at the quantile is itself estimated online, with differential privacy. The method tracks the quantile closely, uses a negligible amount of privacy budget, is compatible with other federated learning technologies such as compression and secure aggregation, and has a straightforward joint DP analysis with DP-FedAvg. Experiments demonstrate that adaptive clipping to the median update norm works well across a range of realistic federated learning tasks, sometimes outperforming even the best fixed clip chosen in hindsight, and without the need to tune any clipping hyperparameter.

Galen Andrew, Om Thakkar, Swaroop Ramaswamy, and Brendan McMahan. (In order of contribution)
In the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021).
Revealing and Protecting Labels in Distributed Training. Abstract▼
Distributed learning paradigms such as federated learning often involve transmission of model updates, or gradients, over a network, thereby avoiding transmission of private data. However, it is possible for sensitive information about the training data to be revealed from such gradients. Prior works have demonstrated that labels can be revealed analytically from the last layer of certain models (e.g., ResNet), or they can be reconstructed jointly with model inputs by using Gradients Matching [Zhu et al'19] with additional knowledge about the current state of the model. In this work, we propose a method to discover the set of labels of training samples from only the gradient of the last layer and the id to label mapping. Our method is applicable to a wide variety of model architectures across multiple domains. We demonstrate the effectiveness of our method for model training in two domains - image classification, and automatic speech recognition. Furthermore, we show that existing reconstruction techniques improve their efficacy when used in conjunction with our method. Conversely, we demonstrate that gradient quantization and sparsification can significantly reduce the success of the attack.

Trung Dang, Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, Peter Chin, and Françoise Beaufays. (In order of contribution)
In the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021).
Practical and Private (Deep) Learning without Sampling or Shuffling. Abstract▼
We consider training models with differential privacy (DP) using mini-batch gradients. The existing state-of-the-art, Differentially Private Stochastic Gradient Descent (DP-SGD), requires privacy amplification by sampling or shuffling to obtain the best privacy/accuracy/computation trade-offs. Unfortunately, the precise requirements on exact sampling and shuffling can be hard to obtain in important practical scenarios, particularly federated learning (FL). We design and analyze a DP variant of Follow-The-Regularized-Leader (DP-FTRL) that compares favorably (both theoretically and empirically) to amplified DP-SGD, while allowing for much more flexible data access patterns. DP-FTRL does not use any form of privacy amplification.

Peter Kairouz, Brendan McMahan, Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu.
In the 38th International Conference on Machine Learning (ICML 2021).
Evading the Curse of Dimensionality in Unconstrained Private GLMs. Abstract▼
We revisit the well-studied problem of differentially private empirical risk minimization (ERM). We show that for unconstrained convex generalized linear models (GLMs), one can obtain an excess empirical risk of $\tilde O\left(\sqrt{\texttt{rank}}/\epsilon n\right)$, where $\texttt{rank}$ is the rank of the feature matrix in the GLM problem, $n$ is the number of data samples, and $\epsilon$ is the privacy parameter. This bound is attained via differentially private gradient descent (DP-GD). Furthermore, via the first lower bound for unconstrained private ERM, we show that our upper bound is tight. In sharp contrast to the constrained ERM setting, there is no dependence on the dimensionality of the ambient model space ($p$). (Notice that $\texttt{rank}\leq \min\{n, p\}$.) Besides, we obtain an analogous excess population risk bound which depends on $\texttt{rank}$ instead of $p$.

For the smooth non-convex GLM setting (i.e., where the objective function is non-convex but preserves the GLM structure), we further show that DP-GD attains a dimension-independent convergence of $\tilde O\left(\sqrt{\texttt{rank}}/\epsilon n\right)$ to a first-order-stationary-point of the underlying objective.

Finally, we show that for convex GLMs, a variant of DP-GD commonly used in practice (which involves clipping the individual gradients) also exhibits the same dimension-independent convergence to the minimum of a well-defined objective. To that end, we provide a structural lemma that characterizes the effect of clipping on the optimization profile of DP-GD.

Shuang Song, Thomas Steinke, Om Thakkar, and Abhradeep Thakurta.
In the 24th International Conference on Artificial Intelligence and Statistics (AISTATS 2021).
- Poster
- Video presented as an oral talk at the TPDP 2020 workshop.
Privacy Amplification via Random Check-Ins. Abstract▼
Differentially Private Stochastic Gradient Descent (DP-SGD) forms a fundamental building block in many applications for learning over sensitive data. Two standard approaches, privacy amplification by subsampling, and privacy amplification by shuffling, permit adding lower noise in DP-SGD than via naïve schemes. A key assumption in both these approaches is that the elements in the data set can be uniformly sampled, or be uniformly permuted — constraints that may become prohibitive when the data is processed in a decentralized or distributed fashion. In this paper, we focus on conducting iterative methods like DP-SGD in the setting of federated learning (FL) wherein the data is distributed among many devices (clients). Our main contribution is the random check-in distributed protocol, which crucially relies only on randomized participation decisions made locally and independently by each client. It has privacy/accuracy trade-offs similar to privacy amplification by subsampling/shuffling. However, our method does not require server-initiated communication, or even knowledge of the population size. To our knowledge, this is the first privacy amplification tailored for a distributed learning framework, and it may have broader applicability beyond FL. Along the way, we improve the privacy guarantees of amplification by shuffling and show that, in practical regimes, this improvement allows for similar privacy and utility using data from an order of magnitude fewer users.

Borja Balle, Peter Kairouz, Brendan McMahan, Om Thakkar, and Abhradeep Thakurta.
In the 34th Conference on Neural Information Processing Systems (NeurIPS 2020).
Guaranteed Validity for Empirical Approaches to Adaptive Data Analysis. Abstract▼
We design a general framework for answering adaptive statistical queries that focuses on providing explicit confidence intervals along with point estimates. Prior work in this area has either focused on providing tight confidence intervals for specific analyses, or providing general worst-case bounds for point estimates. Unfortunately, as we observe, these worst-case bounds are loose in many settings --- often not even beating simple baselines like sample splitting. Our main contribution is to design a framework for providing valid, instance-specific confidence intervals for point estimates that can be generated by heuristics. When paired with good heuristics, this method gives guarantees that are orders of magnitude better than the best worst-case bounds. We provide a Python library implementing our method.

Ryan Rogers, Aaron Roth, Adam Smith, Nathan Srebro, Om Thakkar, and Blake Woodworth.
In the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020).
- Conference Talk Video
- Code Repo
Advances in Privacy-Preserving Machine Learning.
Ph.D. Thesis, BU, September 2019.
Supervisor: Dr. Adam Smith.
Towards Practical Differentially Private Convex Optimization. Abstract▼
Building useful predictive models often involves learning from sensitive data. Training models with differential privacy can guarantee the privacy of such sensitive data. For convex optimization tasks, several differentially private algorithms are known, but none has yet been deployed in practice.

In this work, we make two major contributions towards practical differentially private convex optimization. First, we present Approximate Minima Perturbation, a novel algorithm that can leverage any off-the-shelf optimizer. We show that it can be employed without any hyperparameter tuning, thus making it an attractive technique for practical deployment. Second, we perform an extensive empirical evaluation of the state-of-the-art algorithms for differentially private convex optimization, on a range of publicly available benchmark datasets, and real-world datasets obtained through an industrial collaboration. We release open-source implementations of all the differentially private convex optimization algorithms considered, and benchmarks on as many as nine public datasets, four of which are high-dimensional.

Roger Iyengar, Joseph P. Near, Dawn Song, Om Thakkar, Abhradeep Thakurta, and Lun Wang.
In the 40th IEEE Symposium on Security and Privacy (S&P 2019).
- Conference Talk Video (Preview)
- Code Repo
Model-Agnostic Private Learning. Abstract▼
We design differentially private learning algorithms that are agnostic to the learning model assuming access to a limited amount of unlabeled public data. First, we provide a new differentially private algorithm for answering a sequence of $m$ online classification queries (given by a sequence of $m$ unlabeled public feature vectors) based on a private training set. Our algorithm follows the paradigm of subsample-and-aggregate, in which any generic non-private learner is trained on disjoint subsets of the private training set, and then for each classification query, the votes of the resulting classifiers ensemble are aggregated in a differentially private fashion. Our private aggregation is based on a novel combination of the distance-to-instability framework, and the sparse-vector technique. We show that our algorithm makes a conservative use of the privacy budget. In particular, if the underlying non-private learner yields a classification error of at most $\alpha\in (0, 1)$, then our construction answers more queries, by at least a factor of $1/\alpha$ in some cases, than what is implied by a straightforward application of the advanced composition theorem for differential privacy. Next, we apply the knowledge transfer technique to construct a private learner that outputs a classifier, which can be used to answer an unlimited number of queries. In the (agnostic) PAC model, we analyze our construction and prove upper bounds on the sample complexity for both the realizable and the non-realizable cases. Similar to non-private sample complexity, our bounds are completely characterized by the VC dimension of the concept class.

Raef Bassily, Om Thakkar, and Abhradeep Thakurta.
In the 32nd Conference on Neural Information Processing Systems (NeurIPS 2018). Accepted for an oral presentation.
- Conference Talk Video
- Poster
Differentially Private Matrix Completion Revisited. Abstract▼
We provide the first provably joint differentially private algorithm with formal utility guarantees for the problem of user-level privacy-preserving collaborative filtering. Our algorithm is based on the Frank-Wolfe method, and it consistently estimates the underlying preference matrix as long as the number of users $m$ is $\omega(n^{5/4})$, where $n$ is the number of items, and each user provides her preference for at least $\sqrt{n}$ randomly selected items. Along the way, we provide an optimal differentially private algorithm for singular vector computation, based on the celebrated Oja's method, that provides significant savings in terms of space and time while operating on sparse matrices. We also empirically evaluate our algorithm on a suite of datasets, and show that it provides nearly same accuracy as the state-of-the-art non-private algorithm, and outperforms the state-of-the-art private algorithm by as much as 30%.

Prateek Jain, Om Thakkar, and Abhradeep Thakurta.
In the 35th International Conference on Machine Learning (ICML 2018). Presented as a long talk.
- Conference Talk Video
- Poster
Max-Information, Differential Privacy, and Post-Selection Hypothesis Testing. Abstract▼
In this paper, we initiate a principled study of how the generalization properties of approximate differential privacy can be used to perform adaptive hypothesis testing, while giving statistically valid $p$-value corrections. We do this by observing that the guarantees of algorithms with bounded approximate max-information are sufficient to correct the $p$-values of adaptively chosen hypotheses, and then by proving that algorithms that satisfy $(\epsilon,\delta)$-differential privacy have bounded approximate max-information when their inputs are drawn from a product distribution.

This substantially extends the existing connection between differential privacy and max-information, which previously was only known to hold for (pure) $(\epsilon,0)$-differential privacy. It also extends our understanding of max-information as a partially unifying measure controlling the generalization properties of adaptive data analyses. We also show a lower bound, proving that (despite the strong composition properties of max-information), when data is drawn from a product distribution, $(\epsilon,\delta)$-differentially private algorithms can come first in a composition with other algorithms satisfying max-information bounds, but not necessarily second if the composition is required to itself satisfy a nontrivial max-information bound. This, in particular, implies that the connection between $(\epsilon,\delta)$-differential privacy and max-information holds only for inputs drawn from particular distributions, unlike the connection between $(\epsilon,0)$-differential privacy and max-information.

Ryan Rogers, Aaron Roth, Adam Smith, and Om Thakkar.
In the 57th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2016).
- Poster

Workshop Publications

Training Large ASR Encoders with Differential Privacy.
Geeticka Chauhan, Steve Chien, Om Thakkar, Abhradeep Thakurta, and Arun Narayanan. (In order of contribution)
In the 2024 IEEE Spoken Language Technology Workshop (SLT 2024).
Differentially Private Parameter-Efficient Fine-tuning for Large ASR Models.
Hongbin Liu, Lun Wang, Om Thakkar, Abhradeep Thakurta, and Arun Narayanan. (In order of contribution)
In the 7th Deep Learning Security and Privacy Workshop (DLSP 2024), and the Theory and Practice of Differential Privacy Workshop (TPDP 2024).
Recycling Scraps: Improving Private Learning by Leveraging Intermediate Checkpoints.
Virat Shejwalkar, Arun Ganesh*, Rajiv Mathews*, Om Thakkar*, and Abhradeep Thakurta*.
In the Fourth AAAI Workshop on Privacy-Preserving Artificial Intelligence (PPAI 2023). Accepted for an oral presentation.
*Alphabetical order.
Public Data-Assisted Mirror Descent for Private Model Training.
Ehsan Amid, Arun Ganesh, Rajiv Mathews, Swaroop Ramaswamy, Shuang Song, Thomas Steinke, Vinith Suriyakumar, Om Thakkar, and Abhradeep Thakurta.
In the Theory and Practice of Differential Privacy (TPDP) 2022 workshop (ICML 2022).
Practical and Private (Deep) Learning without Sampling or Shuffling.
Peter Kairouz, Brendan McMahan,Shuang Song, Om Thakkar, Abhradeep Thakurta, and Zheng Xu.
In the Theory and Practice of Differential Privacy (TPDP) 2021 workshop (ICML 2021).
The Role of Adaptive Optimizers for Honest Private Hyperparameter Selection.
Shubhankar Mohapatra, Sajin Sasy,Gautam Kamath*, Xi He*, and Om Thakkar*.
In the Theory and Practice of Differential Privacy (TPDP) 2021 workshop (ICML 2021).
*Alphabetical order.
Training Production Language Models without Memorizing User Data.
Swaroop Ramaswamy*, Om Thakkar*, Rajiv Mathews, Galen Andrew, Brendan McMahan, and Françoise Beaufays. (In order of contribution)
In the Privacy Preserving Machine Learning (PPML) 2020 workshop (NeurIPS 2020). Accepted for an oral presentation.
*Equal contribution.
- Video presented as an oral talk at the PPML 2020 workshop.
Privacy Amplification via Random Check-Ins.
Borja Balle, Peter Kairouz, Brendan McMahan, Om Thakkar, and Abhradeep Thakurta.
In the Theory and Practice of Differential Privacy (TPDP) 2020 workshop (CCS 2020).
Understanding Unintended Memorization in Federated Learning. Abstract▼
Recent works have shown that generative sequence models (e.g., next word prediction models) have a tendency to memorize rare or unique sequences in the training data. Since useful models are often trained on sensitive data, it is critical to identify and mitigate such unintended memorization. Federated Learning (FL) has emerged as a novel framework for large-scale distributed learning tasks. It differs in many aspects from the well-studied central learning setting where all the data is stored at the central server, and minibatch stochastic gradient descent is used to conduct training. This work is motivated by our observation that next word prediction models trained under FL exhibited remarkably less propensity to such memorization compared to the central learning setting. Thus, we initiate a formal study to understand the effect of different components of FL on unintended memorization in trained models. Our results show that several differing components of FL play an important role in reducing unintended memorization. To our surprise, we discover that the clustering of data according to users---which happens by design in FL---has the most significant effect in reducing such memorization. Moreover, using the method of Federated Averaging with larger effective minibatch sizes for training causes a further reduction. We also demonstrate that training in FL with a user-level differential privacy guarantee results in models that can provide high utility while being resilient to memorizing out-of-distribution phrases with thousands of insertions across over a hundred users in the training set.

Om Thakkar, Swaroop Ramaswamy, Rajiv Mathews, and Françoise Beaufays. (In order of contribution)
In the PrivateNLP workshop (NAACL’21), the Theory and Practice of Differential Privacy (TPDP) 2020 workshop (CCS 2020), and the Privacy Preserving Machine Learning (PPML) 2020 workshop (NeurIPS 2020).
- Poster
Characterizing Private Clipped Gradient Descent on Convex Generalized Linear Problems.
Shuang Song, Om Thakkar, and Abhradeep Thakurta.
In the Theory and Practice of Differential Privacy (TPDP) 2020 workshop, and the Privacy Preserving Machine Learning (PPML) 2020 workshop (NeurIPS 2020). Accepted for an oral presentation at TPDP 2020 (CCS 2020).
- Poster

Interns

Geeticka Chauhan (Summer'23-Fall'23, co-hosted with Abhradeep Thakurta)
Hongbin Liu (Summer'23-Fall'23, co-hosted with Lun Wang)
Virat Shejwalkar (Summer'22-Fall'22, co-hosted with Abhradeep Thakurta)
Vinith Suriyakumar (Summer'21, co-hosted with Swaroop Ramaswamy)
Trung Dang (Summer'20-Spring'21, co-hosted with Swaroop Ramaswamy)

Internships and Research Visits

Visiting Graduate Student in the Data Privacy program at the Simons Institute, Berkeley during Spring'19.
Research Intern at Google Brain, Mountain View, CA during Summer 2018. Mentors: Úlfar Erlingsson, and Kunal Talwar.
Visiting Student Researcher at University of California, Berkeley, CA during Fall 2017. Host: Dr. Dawn Song.
Research Intern at Google, Seattle, WA during Summer 2017. Mentors: Brendan McMahan, and Martin Pelikan.
Research Intern in the CoreOS: Machine Learning team at Apple, Cupertino, CA during Summer 2016.

Talks and Poster Presentations

Privacy Leakage in Speech Models: Attacks and Mitigations

@ the CISPA - ELLIS - Summer School 2025 on Trustworthy AI - Secure and Safe Foundation Models on August 4, 2025. (Slides)
@ the UMass AI&Sec Security and Privacy seminar on March 31, 2025. (Slides)

Training in Prod Federated Learning with Formal Differential Privacy

@ the Privacy Paradox: AI and Digital Platforms elective, IIM-Ahmedabad, India on October 11, 2022. (Slides)
@ the Privacy Paradox: AI and Digital Platforms elective, IIM-Ahmedabad, India on July 1, 2022. (Slides)
@ the 2022 HAI Spring Conference on Key Advances in Artificial Intelligence on April 12, 2022. (Talk video)

Part of the panel discussion on Accountable AI.

Google AI blog post on which the talks were based.

DP-FTRL in Practice @ the Data Privacy: Foundations and Applications Reunion workshop, Simons Institute, Berkeley on March 14, 2022.
Practical and Private (Deep) Learning without Sampling or Shuffling @ the Boston-area Data Privacy Seminar on February 28, 2022.
Understanding Unintended Memorization in Language Models Under Federated Learning

@ the PrivateNLP 2021 Workshop (NAACL 2021) on June 11, 2021.
@ PPML2020 Workshop (NeurIPS 2020) on December 11, 2020.
@ TPDP 2020 Workshop (CCS 2020) on November 13, 2020.

Towards Training Provably Private Models via Federated Learning in Practice

@ PPML2020 Workshop (NeurIPS 2020) on December 11, 2020.
@ the Workshop on Federated Learning and Analytics 2020 (Google) on July 29, 2020.

Characterizing Private Clipped Gradient Descent on Convex Generalized Linear Problems

@ PPML2020 Workshop (NeurIPS 2020) on December 11, 2020.
@ TPDP 2020 Workshop (CCS 2020) on November 13, 2020.

Privacy Amplification via Random Check-Ins

@ NeurIPS 2020 on December 8, 2020.
@ TPDP 2020 Workshop (CCS 2020) on November 13, 2020.
@ the Ph.D. Intern Research Conference 2020 (Google) on July 22, 2020.

Towards Practical Differentially Private Convex Optimization

@ the Future of Privacy Forum booth, Global Privacy Summit 2019, Washington, DC on May 3, 2019. (Coverage)
@ the Privacy Tools Project meeting, Harvard on March 5, 2018.

Model-Agnostic Private Learning,

@ the 2019 IEEE North American School of Information Theory (NASIT), held at BU, on July 3, 2019. (Poster)
@ the 2018 Open AIR: Industry Open House, BU on October 12, 2018. (Poster)

Building Tools for Controlling Overfitting in Adaptive Data Analysis, @ the Adaptive Data Analysis workshop, Simons Institute, Berkeley on July 7, 2018.
Differentially Private Matrix Completion Revisited

@ the Mathematical Foundations of Data Privacy workshop, BIRS on May 2, 2018. (Talk video)
@ the BU Data Science (BUDS) Day, Boston University on January 26, 2018. (Poster)
@ the Privacy Tools Data Sharing workshop, Harvard University on December 12, 2017. (Poster)
@ the Security Seminar, UC Berkeley on October 9, 2017.

A brief introduction to Concentrated Differential Privacy, @ CSE Theory Seminar, Penn State on April 14, 2017.
Max-Information, Differential Privacy, and Post-selection Hypothesis Testing

@ INSR Industry Day, Penn State on April 24, 2017. (Poster)
@ SMAC Talks, Penn State on December 2, 2016.
@ CSE Theory Seminar, UCSD on November 7, 2016.
@ CSE Theory Seminar, Penn State on October 14, 2016.

Max-Information and Differential Privacy, @ CSE Theory Seminar, Penn State on May 5, 2016.
The Stable Roommates Problem with Random Preferences, @ CSE Theory Seminar, Penn State on April 10, 2015.
The Multiplicative Weights Update Method and an Application to Solving Zero-Sum Games Approximately, @ CSE Theory Seminar, Penn State on November 3, 2014.

Teaching

Teaching assistant:

CMPSC 465 Data Structures and Algorithms, Spring 2017 @ Penn State.
CMPSC 360 Discrete Mathematics for Computer Science, Spring 2015 @ Penn State.
IT 114 Object Oriented Programming, Spring 2014 @ DA-IICT.
IT 105 Introduction to Programming, Fall 2013 @ DA-IICT.

Professional Activities

Program committee member for S&P 2026, AAAI 2025, TPDP (2020, 2022, 2024-2025).
Reviewer for journals: Information Sciences 2024, T-IFS (2019, 2021-2022), JPC (2019, 2022), JSSAM 2021, TSC 2020, JMLR 2018.
Reviewer for conferences: ICML (2018, 2021-2024), ASRU 2023, NeurIPS (2019-2023), RANDOM 2023, AISTATS 2022, S&P (2017, 2019, 2022), PETS (2017-2021), IJCAI 2019, CCS (2018-2019), STOC (2016, 2018), ACSAC 2017, FOCS 2017, WABI 2015.
Reviewer for NIST's The Unlinkable Data Challenge: Advancing Methods in Differential Privacy.

Recent Awards

Received a travel award for S&P 2019.
Received a travel award for NeurIPS 2018.
Received a travel award for ICML 2018.
Received a GSO Conference Travel Grant for Summer 2018.
Received a registration award for FOCS 2014.

Miscellaneous

Report on Node-differentially Private Algorithms for Graph Statistics. It includes joint work with Ramesh Krishnan.