- Menlo Park, CA, USA
- https://howiema.github.io/
- https://ai.meta.com/people/926455432572211/haoyu-ma/
Stars
[NeurIPS 2025 Spotlight] Demo implementation of MoCha: Towards Movie-Grade Talking Character Synthesis
Wan: Open and Advanced Large-Scale Video Generative Models
Code for: "Long-Context Autoregressive Video Modeling with Next-Frame Prediction"
Official inference repo for FLUX.1 models
Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
"Effective Whole-body Pose Estimation with Two-stages Distillation" (ICCV 2023, CV4Metaverse Workshop)
A generative world for general-purpose robotics & embodied AI learning.
HunyuanVideo: A Systematic Framework For Large Video Generation Model
Code for [CVPR 2025] ROICtrl: Boosting Instance Control for Visual Generation
[ICCV2025] UniPortrait: A Unified Framework for Identity-Preserving Single- and Multi-Human Image Personalization
Allegro is a powerful text-to-video model that generates high-quality videos up to 6 seconds at 15 FPS and 720p resolution from simple text input.
[ICLR 2025] AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
The image prompt adapter is designed to enable a pretrained text-to-image diffusion model to generate images from an image prompt.
StoryMaker: Towards consistent characters in text-to-image generation
MoVQGAN - a model for image encoding and reconstruction
SEED-Story: Multimodal Long Story Generation with Large Language Model
Autoregressive Model Beats Diffusion: 🦙 Llama for Scalable Image Generation
Official PyTorch implementation of StreamV2V.
Extract frames and motion vectors from H.264 and MPEG-4 encoded video.
A collection of resources on controllable generation with text-to-image diffusion models.
[T-PAMI 2025] V3D: Video Diffusion Models are Effective 3D Generators
Open-Sora: Democratizing Efficient Video Production for All
InstantID: Zero-shot Identity-Preserving Generation in Seconds 🔥
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
[ECCV 2024] Single Image to 3D Textured Mesh in 10 seconds with Convolutional Reconstruction Model.

