I am a research scientist at Google Brain. My research lies in the broad area of artificial intelligence and machine learning. I am particularly interested in generative models of high-dimensional data, and in designing principled methods for improved robustness and generalization in machine learning. More recently, I have been working on large language models (LLMs): studying attention biases in transformers, improving LLM efficiency through distillation and activation sparsity, and exploring alternative architectures for sequence modeling and their scaling properties.
Before Google, I was a researcher at the IBM TJ Watson Research Center in Yorktown Heights, NY. Before that, I graduated with a Ph.D. from the Department of Computer Science at the University of Maryland, where my advisor was Hal Daumé III. I also spent time working on robust automatic speech recognition during my Master's at CRSS, advised by John H. L. Hansen.