Leading AI Research at Scale

I'm an Applied Research Scientist Lead at Meta AI, where I spearhead the development of multimodal foundation models including Chameleon, Llama 3/4, and DINOv2. My work spans trillion-token-scale pretraining, mixture-of-experts architectures, and next-generation conversational AI agents.

With a Master's from Carnegie Mellon (Department Rank #1) and experience across Meta, Amazon Alexa, and Citadel, I bridge cutting-edge research with real-world applications.

20 Publications
7k+ Citations
4 Major Models
3 Top Companies
🧠

Multimodal Foundation Models

Leading development of next-gen models that understand text, images, and audio simultaneously
💬

Large Language Models (LLMs)

Architecting large language models trained at trillion-token scale with advanced training techniques and optimization
👁️

Computer Vision & NLP

Bridging visual understanding with natural language processing for comprehensive AI systems
🎵

Speech & Audio Processing

Advanced audio AI systems from Amazon Alexa to next-gen conversational agents
🎯

Reinforcement Learning

Training AI agents to make optimal decisions through reward-based learning systems
🛡️

AI Safety & Evaluation

Ensuring responsible AI deployment through rigorous testing and safety protocols

Research & Industry Experience

Meta AI
Applied Research Scientist Lead
2022-Present
Leading multimodal foundation model development, including the Llama 3/4 and Chameleon models. Research on trillion-token pretraining and conversational AI agents.
Amazon Alexa
Applied Scientist
2021-2022
Developed visual-language navigation benchmarks. Created efficient multimodal transformers and video processing applications.
Citadel LLC
Quantitative Research Analyst
2019-2021
Built automated ML pipelines for trading strategies. Optimized frameworks for large-scale financial data processing.

Education

Master's in Language Technologies
Carnegie Mellon University
GPA: 4.19/4.33 • Dept. Rank: 1
B.Tech in Computer Science
IIT Kanpur
GPA: 9.9/10.0

Key Research Internships

EPFL Switzerland
2017
University of Toronto
2016
Xerox Research Centre Europe
2015
Carnegie Mellon
2014

Research Impact & Publications

Advancing the frontiers of AI and machine learning through impactful research contributions

20 Publications
7k+ Citations
4 Top-tier Venues
NeurIPS 2023

MAViL: Masked Audio-Video Learners

Novel multi-modal learning framework that jointly processes audio and video through masked reconstruction, enabling robust cross-modal understanding.

arXiv 2024

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Pioneering foundation model architecture that seamlessly integrates multiple modalities through early fusion, enabling unified reasoning over arbitrarily interleaved text and images.

COLM 2024

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Innovative approach to creating mixture-of-experts models by combining specialized language models, achieving superior performance with efficient parameter utilization.

ICLR 2024

Demystifying CLIP Data

Comprehensive analysis of CLIP training data, providing crucial insights into data quality, bias, and their impact on model performance and fairness.

CVPR 2024

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models

Thorough evaluation framework for vision-language models, revealing important limitations and proposing improvements for better multi-modal understanding.

Let's Build the Future of AI Together

Interested in collaborating on cutting-edge AI research? Looking for expertise in multimodal foundation models or large-scale AI systems? I'm always excited to discuss innovative projects and research opportunities.

Areas of Collaboration

Foundation Model Research
Multimodal AI Systems
AI Safety & Evaluation
Speaking Engagements
Technical Consulting