I am a Deep Learning Architect at NVIDIA, working on deep learning inference architecture. I work on both hardware and software techniques to enable ultra-low latency LLM inference on NVIDIA GPUs.

I received my Ph.D. in CS from MIT, where I was advised by Professor Daniel Sanchez and Professor Joel Emer. My thesis work focuses on computer architecture, specifically on accelerating irregular and sparse applications such as sparse transformers (GPT, BERT, etc.), sparse CNNs, sparse tensor algebra, and graph analytics.

Before joining MIT, I received a bachelor’s degree in Mathematics and Physics from Tsinghua University in 2019, where I worked with Professor Leibo Liu. I did a summer internship at UC Berkeley working with Professor Kurt Keutzer. I also completed an industry internship at Apple.

You can access my curriculum vitae here.

Blogs

I plan to write blog posts as I learn CUDA/CUTLASS/CuTe/Triton programming. Stay tuned for more!

Publications

Azul: An Accelerator for Sparse Iterative Solvers Leveraging Distributed On-Chip Memory

Axel Feldmann, Courtney Golden, Yifan Yang, Joel S. Emer, Daniel Sanchez
in Proceedings of the 57th annual International Symposium on Microarchitecture (MICRO-57), 2024.
[paper]

Trapezoid: A Versatile Accelerator for Dense and Sparse Matrix Multiplications

Yifan Yang, Joel S. Emer, Daniel Sanchez
in Proceedings of the 51st annual International Symposium on Computer Architecture (ISCA-51), 2024.
[paper]

ISOSceles: Accelerating Sparse CNNs through Inter-Layer Pipelining

Yifan Yang, Joel S. Emer, Daniel Sanchez
in Proceedings of the 29th International Symposium on High Performance Computer Architecture (HPCA-29), 2023.
[paper] [slides] [poster]

SpZip: Architectural Support for Effective Data Compression In Irregular Applications

Yifan Yang, Joel S. Emer, Daniel Sanchez
in Proceedings of the 48th annual International Symposium on Computer Architecture (ISCA-48), 2021.
[paper] [slides] [lightning] [poster]

GraphABCD: Scaling Out Graph Analytics with Asynchronous Block Coordinate Descent

Yifan Yang, Zhaoshi Li, Yangdong Deng, Zhiwei Liu, Shouyi Yin, Shaojun Wei, Leibo Liu
in Proceedings of the 47th annual International Symposium on Computer Architecture (ISCA-47), 2020.
[paper] [slides] [lightning]

Synetgy: Algorithm-hardware Co-design for ConvNet Accelerators on Embedded FPGAs

Yifan Yang, Qijing Huang, Bichen Wu, Tianjun Zhang, Liang Ma, Giulio Gambardella, Michaela Blott, Luciano Lavagno, Kees Vissers, John Wawrzynek, Kurt Keutzer
in Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA), 2019.
[paper] [slides] [code]

Experience

June 2024 - Present: Deep Learning Architect, NVIDIA

Deep learning inference architecture

Summer 2022: Platform Architecture Intern, Apple

CPU cache subsystem performance research

Spring 2022: Teaching Assistant, MIT

6.812/6.825 Hardware Architecture for Deep Learning

Summer 2018: Research Intern, UC Berkeley

Algorithm-hardware co-design for ConvNet accelerators on embedded FPGAs

Services

Selection Committee Member, MICRO 2024 Student Research Competition
Artifact Evaluation Committee Member, ASPLOS 2022, 2023

Yifan Yang (杨轶凡)