Research
I work on computer vision, machine learning, and generative AI. Recently, I have been particularly interested in generative AI in 3D. Below are recent and selected publications.
Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello, William Fedus, Xianzhi Du, Ekin Dogus Cubuk, Aravind Srinivas, Tsung-Yi Lin, Jonathon Shlens, Barret Zoph
NeurIPS, 2021 (spotlight)
Revisit ResNets with modern scaling and training strategies, showing that ResNets remain competitive with modern model architectures.
Multi-Task Self-Training for Learning General Features
Golnaz Ghiasi*, Barret Zoph*, Ekin Dogus Cubuk*, Quoc V. Le, Tsung-Yi Lin
ICCV, 2021
Apply pseudo-labeling to harness knowledge from multiple datasets and tasks to train one general vision model, achieving results competitive with SoTA on PASCAL, ADE20K, and NYUv2.
Patch2CAD: Patchwise Embedding Learning for In-the-Wild Shape Retrieval from a Single Image
Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai
ICCV, 2021
Learn a patch-based image-CAD embedding space for retrieval-based 3D reconstruction, improving upon our prior work Mask2CAD.
iNeRF: Inverting Neural Radiance Fields for Pose Estimation
Lin Yen-Chen, Pete Florence, Jonathan T. Barron, Alberto Rodriguez, Phillip Isola, Tsung-Yi Lin
IROS, 2021
project page / arXiv / video
Given an image of an object and a NeRF of that object, you can estimate that object's pose.
Bottleneck Transformers for Visual Recognition
Aravind Srinivas, Tsung-Yi Lin, Niki Parmar, Jonathon Shlens, Pieter Abbeel, Ashish Vaswani
CVPR, 2021
Explore a hybrid architecture of CNN and transformer by simply replacing spatial convolutions with self-attention in the final three bottleneck blocks.
Simple Copy-Paste Is a Strong Data Augmentation Method for Instance Segmentation
Golnaz Ghiasi, Yin Cui, Aravind Srinivas, Rui Qian, Tsung-Yi Lin, Ekin Dogus Cubuk, Quoc V. Le, Barret Zoph
CVPR, 2021 (oral)
Study copy-paste augmentation for instance segmentation and demonstrate SoTA performance on the COCO and LVIS datasets.
Rethinking Pre-training and Self-training
Barret Zoph*, Golnaz Ghiasi*, Tsung-Yi Lin*, Yin Cui, Hanxiao Liu, Ekin Dogus Cubuk, Quoc V. Le
NeurIPS, 2020 (oral)
Compare self-training and pre-training, and observe that self-training can still improve performance when pre-training hurts in the regime with more labeled data.
Learning to See before Learning to Act: Visual Pre-training for Manipulation
Lin Yen-Chen, Andy Zeng, Shuran Song, Phillip Isola, Tsung-Yi Lin
ICRA, 2020
Blog / Video
Leverage visual pre-training from passive observations to speed up trial-and-error robot learning. The robot can learn to pick up new objects in ~10 minutes.
Mask2CAD: 3D Shape Prediction by Learning to Segment and Retrieve
Weicheng Kuo, Anelia Angelova, Tsung-Yi Lin, Angela Dai
ECCV, 2020 (spotlight)
Given a single-view image, predict an object's 3D shape based on retrieval of CAD models and object pose estimation.
Class-Balanced Loss Based on Effective Number of Samples
Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge Belongie
CVPR, 2019
Propose a benchmark and a simple yet effective class-balanced loss for long-tailed image classification.
DropBlock: A regularization method for convolutional networks
Golnaz Ghiasi, Tsung-Yi Lin, Quoc V. Le
NeurIPS, 2018
Drop intermediate features randomly during training to regularize learning, working for image classification, object detection, and semantic segmentation.
Focal Loss for Dense Object Detection
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollar
ICCV, 2017 (best student paper award)
Propose Focal Loss to address the foreground/background class imbalance in dense object detection. Focal Loss has since been adopted beyond object detection.
Feature Pyramid Networks for Object Detection
Tsung-Yi Lin, Piotr Dollar, Ross Girshick, Kaiming He, Bharath Hariharan, Serge Belongie
CVPR, 2017
Implement an efficient deep network to bring back the idea of pyramidal representations for object detection.
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin, Michael Maire, Serge Belongie, Lubomir Bourdev, Ross Girshick, James Hays, Pietro Perona, Deva Ramanan, Larry Zitnick, Piotr Dollar
ECCV, 2014 (oral)
Collect instance segmentation masks for 80 common object categories for training object detection models. The dataset was later extended to panoptic segmentation, multi-modal image-text learning, and beyond.