Leading AI Research at Scale

I'm an Applied Research Scientist Lead at Meta AI, where I spearhead the development of multimodal foundation models including Chameleon, Llama 3/4, and DINOv2. My work spans trillion-token-scale pretraining, mixture-of-experts architectures, and next-generation conversational AI agents.

With a Master's from Carnegie Mellon (Department Rank #1) and experience across Meta, Amazon Alexa, and Citadel, I bridge cutting-edge research with real-world applications.

100+ Publications
11k+ Citations
6 Major Models
3 Top Companies
🧠

Multimodal Foundation Models

Leading development of next-generation models that understand text, images, and audio simultaneously
💬

Large Language Models (LLMs)

Architecting trillion-parameter models with advanced training and optimization techniques
👁️

Computer Vision & NLP

Bridging visual understanding with natural language processing for comprehensive AI systems
🎵

Speech & Audio Processing

Advanced audio AI systems, from Amazon Alexa to next-generation conversational agents
🎯

Reinforcement Learning

Training AI agents to make optimal decisions through reward-based learning systems
🛡️

AI Safety & Evaluation

Ensuring responsible AI deployment through rigorous testing and safety protocols
Professional Experience

Research & Industry Trajectory

A focused journey through leading tech institutions, specializing in multimodal foundation models, large-scale AI infrastructure, and quantitative research.

M Meta AI
2022 - Present
Applied Research Scientist Lead

Spearheading multimodal foundation model development. Instrumental in creating the Llama 3/4 and Chameleon models. Conducting pioneering research in trillion-token pretraining and advanced conversational AI agents.

A Amazon Alexa
2021 - 2022
Applied Scientist

Developed comprehensive visual-language navigation benchmarks. Engineered efficient multimodal transformers and optimized video processing applications for Alexa's core AI systems.

C Citadel LLC
2019 - 2021
Quantitative Research Analyst

Architected automated ML pipelines for high-frequency trading strategies. Optimized distributed computing frameworks to handle large-scale financial data processing.

Research Impact & Publications

Advancing the frontiers of AI and machine learning through impactful research contributions

100+ Publications
11,000+ Citations
16 H-index
20 Top-tier Venues
NeurIPS 2023

MAViL: Masked Audio-Video Learners

Novel multi-modal learning framework that jointly processes audio and video through masked reconstruction, enabling robust cross-modal understanding.

ArXiv 2024

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Pioneering foundation model architecture that seamlessly integrates multiple modalities through early fusion, enabling unified reasoning across text, images, and code.

COLM 2024

Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM

Innovative approach to creating mixture-of-experts models by combining specialized language models, achieving superior performance with efficient parameter utilization.

ICLR 2024

Demystifying CLIP Data

Comprehensive analysis of CLIP training data, providing crucial insights into data quality and bias, and into their impact on model performance and fairness.

CVPR 2024

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models

Thorough evaluation framework for vision-language models, revealing important limitations and proposing improvements for better multi-modal understanding.

Let's Build the Future of AI Together

Interested in collaborating on cutting-edge AI research? Looking for expertise in multimodal foundation models or large-scale AI systems? I'm always excited to discuss innovative projects and research opportunities.

Areas of Collaboration

Foundation Model Research
Multimodal AI Systems
AI Safety & Evaluation
Speaking Engagements
Technical Consulting