Zixuan Wang
- Software Engineer at Apple
- Ph.D. from University of California, San Diego
About
I am Zixuan Wang (王子轩), a software engineer at Apple working on CloudOS. I received my Ph.D. from the University of California, San Diego, where I worked with Prof. Jishen Zhao on architecture and systems research. I also worked with Prof. Steven Swanson at the NVSL Lab, UCSD.
My research focuses on building scalable and secure systems with emerging architecture, systems, and programming technologies. At each level, I conduct systematic analysis, from characterizing performance to attacking and securing the system to developing programming support.
My industry work across multiple companies has focused on deploying emerging technologies in real-world systems, especially trusted execution using confidential virtual machines. At Google in 2021, I worked on modernizing the Linux KVM testing framework with UEFI and AMD SEV/SEV-ES support, which was the first such contribution to the Linux KVM community. At Meta in 2022, I worked on the initial confidential virtual machine platform by initiating and developing its system and software support, and this work was highlighted at Meta's Annual Security Summit. At Google in 2023, I enhanced guest confidential computing with measurable hypervisor service code by leveraging the AMD SEV-SNP SVSM technology. Earlier, at SK hynix in 2019, I worked on the early evaluation of CXL prototypes, which led to one of the first publications on CXL systems.
My open-source work supports research, industry, and personal use. I have contributed tens of patches to the Linux KVM community, which many cloud companies and open-source communities now use. I also develop and maintain multiple open-source projects on GitHub. These projects have received more than 2742 stars and reached more than 200,000 users (200K + 20K + ...).
I earned my bachelor's degree from Zhejiang University, where I worked with Prof. Wenzhi Chen and Prof. Qingsong Shi on architecture and operating systems at the Computer Architecture Lab.
Some fun facts:
- I take photos (my photo gallery).
- My radio call sign is KN6TTT.
- Here's Inu, my kitten, and his Instagram.
- I have a Guinness Record.
- I play table tennis and badminton.
- I learned to ski in my hometown.
- I learned to surf at UCSD.
- I play games on PC, Switch, PS4, and PS5.
- I have a 3D Printer.
- I have a rack in my closet.
- I play guitar.
- I'm a fan of Sony products.
- I have a compound bow.
- I'm practicing for bow hunting.
- I ride a Kawasaki Ninja 400 motorcycle.
Experience
Software Engineer
I'm working on CloudOS at Apple.
Graduate Research Assistant
I worked in the STABLE Lab and the NVSL Lab, doing research on architecture and system design for memory. Recent work includes:
- Heimdall: a heterogeneous system benchmarking framework, demonstrated on various CXL-based systems.
- NVLeak: an off-chip memory architecture reverse engineering and covert/side channel attack framework.
- COARSE: a disaggregated memory system for distributed deep learning training.
- Ayudante: a learning-based persistent memory automatic programming framework.
- LENS: a profiler that discovers NVRAM DIMM micro-architecture design.
- VANS: a cycle-level NVRAM simulator that matches the real products' performance.
Software Engineering Intern
Built AMD SEV-SNP SVSM support in Google Cloud to enhance cloud users' data confidentiality. I received a peer bonus for my work.
Student Researcher
Deployed the first confidential virtual machine platform at Meta.
Software Engineering Intern
Initiated and developed the confidential virtual machine platform at Meta.
Software Engineering Intern
I developed UEFI and AMD SEV/SEV-ES support for KVM-Unit-Tests. The code was merged by the Linux KVM community. I received two peer bonuses during this internship.
Research Intern
We evaluated the performance of the CXL memory prototype and enabled GPU direct access to CXL memory.
Undergraduate Research Assistant
I worked in the Computer Architecture Lab, doing research on operating systems and computer architecture:
- We developed the ZJUNIX operating system from scratch and ran it on ZJU-SoC, a self-implemented MIPS SoC on FPGA.
- We also developed an FPGA accelerator for High-Frequency Trading.
- Another cool project is the Portable Modular 3D Bioprinter, which earned the Outstanding Prize in the Challenge Cup.
Publication
In Progress
The Hitchhiker's Guide to Programming and Optimizing CXL-Based Heterogeneous Systems
Towards LLM-Powered Verilog RTL Assistant: Self-Verification and Self-Correction
CXLeak: Architectural Attacks via Practical CXL Systems
Conference & Journal
NVLeak: Off-Chip Side-Channel Attacks via Non-Volatile Memory Systems
Enabling Efficient Large-Scale Deep Learning Training with Cache Coherent Disaggregated Memory Systems
Ayudante: A Deep Reinforcement Learning Approach to Assist Persistent Memory Programming
Characterizing and Modeling Non-Volatile Memory Systems
Characterizing and Modeling Non-Volatile Memory Systems
Preprint & Workshop
Fork is All You Need in Heterogeneous Systems
Characterizing WebAssembly Performance in the Era of Serverless Computing
COLA: Characterizing and Optimizing the Tail Latency for Safe Level-4 Autonomous Vehicle Systems
Enabling Fast Recovery for Autonomous Vehicle Systems with Linux Container Checkpointing
Basic Performance Measurements of the Intel Optane DC Persistent Memory Module
Reliable and Flexible Large Scale Memory Network
Service
Committee
- Submission Chair @ MICRO 2021
- Shadow TPC @ EuroSys 2023
Organizing Committee
I'm one of the founders and organizers of Students@Systems: we are a group of PhD students in systems research who organize a series of talks, podcasts, and panels to serve students in the systems community. Check out our website and Twitter for the latest information.
I helped organize more than ten online events, including panels on applying to PhD programs and interviews with researchers from underrepresented groups.
I have hosted the following panels:
Submission Chair
I served as submission co-chair for MICRO 2021, working with the program chairs to manage submissions and organize the TPC meeting.
I also open-sourced MightyPC, a toolkit I built to manage conference submissions, which has since been used by MICRO'21, IEEE MICRO TopPicks'22, HPCA'22, MICRO'22, DSN'23, and more.
Invited Talk
NVLeak: Off-Chip Side-Channel Attacks via Non-Volatile Memory Systems
- NVMW'23
Enabling Efficient Large-Scale Deep Learning Training with Cache Coherent Disaggregated Memory Systems
- Intel Co.
- IBM Research
- SK hynix Inc.
- Micron Inc.
- Alibaba Cloud USA Inc.
- Foundational Microarchitecture Research (FoMR)
- CRISP Center at Semiconductor Research Corporation
Characterizing and Modeling Non-Volatile Memory Systems
- TECHCON'20
- NVMW'21
- Foundational Microarchitecture Research (FoMR)
- CRISP Center at Semiconductor Research Corporation
Trust but Verify: Co-Locating Hypervisor Services with User Code via AMD SEV-SNP SVSM
- Google Cloud'23
Securing User Data with Confidential Virtual Machine
- Meta Annual Security Summit'22
Modernizing KVM-Unit-Tests with UEFI and AMD Confidential Virtual Machine
- Google Cloud'21
- AMD'21
Teaching
Teaching Assistant: Introduction to Computer Architecture
Undergraduate-level computer architecture course.
Associate Instructor: Hardware-Based Computer System Design
Developed and instructed a new course that guides students to develop their own CPU (using FPGA) to run their OS.
Associate Instructor: Operating System Course
Developed and instructed a new course that guides students to develop their own OS from scratch.
Project
Architectural Security Attacks in Main Memory Systems
- C
- x86 Assembly
- Linux Kernel
- Reverse Engineering
- Side-Channel Attacks
We are the first to reverse engineer NVRAM and reveal its architecture design. Based on this, we present one of the first architectural attacks on NVRAM.
- Software-based reverse engineering of the micro-architecture of non-volatile main memory.
- Side-channel attacks that leak sensitive information (database tables, private encryption keys).
Trusted Execution of Hypervisor Code within Guest Virtual Machine
- C
- x86 Assembly
- x86 Bootstrap
- KVM
- UEFI
- AMD SEV-SNP
- AMD SVSM
- Rust
I built the initial SVSM support in Google Cloud's Linux kernel, hypervisor, guest firmware, and guest kernel.
- The code will soon be posted to the Linux kernel mailing list.
Confidential Virtual Machine Platform
- QEMU
- C++
- x86 Assembly
- x86 Bootstrap
- KVM
- UEFI
- AMD SEV
- Rust
I built the initial software and operating system support for the first confidential virtual machine platform at Meta.
- The project was highlighted at Meta's Annual Security Summit.
KVM-unit-tests under UEFI and AMD SEV/SEV-ES
- C
- x86 Assembly
- x86 Bootstrap
- KVM
- SeaBIOS
- UEFI
- AMD SEV
- AMD SEV-ES
- GNU Toolchain
- Linker Script
We are among the first to build UEFI and AMD SEV/SEV-ES support in KVM-unit-tests, a widely adopted KVM testing framework:
- Implement UEFI support as an alternative to the existing SeaBIOS+Multiboot solution.
- Implement AMD SEV and SEV-ES support.
Source code:
Accelerating Distributed Training of LLM
- CXL
- FPGA
- TensorFlow
- CUDA
- Verilog
We are the first to attach a CXL-based disaggregated memory to multi-GPU systems and demonstrate speedup in LLM training.
- Built a disaggregated memory prototype on FPGA.
- Train LLMs with far fewer GPUs by extending GPU memory space with CXL memory.
LENS: A Low-Level NVRAM Profiler
- C
- Linux Kernel
- x86 Assembly
We build the first profiler that can discover the non-volatile memory on-DIMM micro-architecture.
- LENS runs in Linux kernel space.
- Micro-benchmarks are written in x86 assembly.
- LENS discovers the complex micro-architecture design of NVRAM DIMM products.
- Source code
VANS: A Validated NVRAM Simulator
- C++ 17
- Python
- R
- Cycle Accurate Simulation
We build a cycle-level NVRAM simulator and validate its performance with Intel Optane Persistent Memory, the first commercially available NVRAM product.
- VANS takes advantage of modern C++ features to simplify the code and increase the simulation performance.
- VANS performance matches the real NVRAM products.
- Automated testing for VANS precision.
- Source code
GPU Direct Access to Cache Coherent Off-Chip Memory
- FPGA
- GPU
- Linux
- GEN-Z
- CXL
We evaluated the GEN-Z memory prototype and developed a framework for GPUs to directly access GEN-Z memory through PCIe.
- The cuDF library runs 16x faster than with indirect access through the CPU.
FPGA Accelerated High-Frequency Trading
- FPGA
- Linux
- Userspace IO
- MIPS Assembly
An FPGA accelerator for high-frequency trading. We offload the decision procedure to the FPGA and forward incoming network data directly to it. This way, trading data does not have to traverse the NIC, PCIe, CPU, and main memory before a decision is made.
- Use FPGA to accelerate network response and decision-making.
- Use 10-Gigabit Ethernet for communication between FPGA and PC.
QEMU micro:bit emulator
- C
- ARM Assembly
- ARM Mbed OS
- Bootloader
- QEMU
micro:bit is a development board for teaching students to program. It has an ARM Cortex-M0 processor as well as many peripherals. micro:bit developers need an emulator to help them debug low-level code, so we built one on QEMU.
Develop:
- Implement Cortex-M0 features based on QEMU's original Cortex-M3 emulator.
- Implement peripheral emulators, e.g., LED matrix, timer, clock generator, random number generator, etc.
- Organize the CPU virtual address space according to micro:bit's spec.
- Disassemble and reverse engineer micro:bit's bootloader.
- Go through ARM Mbed OS to make sure hardware emulation is correct.
- Debug with QEMU and remote GDB, at assembly code level.
Result:
- It can run unmodified micro:bit example code written in C, Python, or JavaScript.
- Outstanding graduation thesis of the computer science department, Zhejiang University, 2018.
- Source code
ZJUNIX Operating System
- Open Source
- OS Design
- Bootloader
- C
- MIPS Assembly
- Linker Script
An OS designed from the ground up, running on our self-designed SoC or on QEMU. We also implemented the corresponding bootloader. This project serves as a sample for OS courses at Zhejiang University.
- Bootloader to load a kernel from filesystem.
- Interrupt and exception.
- Buddy + Slab memory management.
- Process scheduling.
- Userspace programs: shell, ls, ps...
- FAT and ext2 filesystem.
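To illustrate the buddy half of the "Buddy + Slab" memory management listed above, here is a minimal sketch in Python. It is not the ZJUNIX code (which is written in C and MIPS assembly); the class name `BuddyAllocator` and its methods are hypothetical and only show the split-on-allocate, merge-on-free idea over page frames.

```python
class BuddyAllocator:
    """Toy buddy allocator over page frames [0, 2**max_order). Illustrative only."""

    def __init__(self, max_order):
        self.max_order = max_order
        # free_lists[k] holds the start frames of free blocks of 2**k pages.
        self.free_lists = [set() for _ in range(max_order + 1)]
        self.free_lists[max_order].add(0)
        self.allocated = {}  # start frame -> order of the allocated block

    def alloc(self, order):
        """Allocate a block of 2**order pages; return its start frame or None."""
        k = order
        # Find the smallest free block that is large enough.
        while k <= self.max_order and not self.free_lists[k]:
            k += 1
        if k > self.max_order:
            return None  # no block large enough
        start = min(self.free_lists[k])
        self.free_lists[k].remove(start)
        # Split down to the requested order, freeing each upper half.
        while k > order:
            k -= 1
            self.free_lists[k].add(start + (1 << k))
        self.allocated[start] = order
        return start

    def free(self, start):
        """Free a block and merge it with its buddy for as long as possible."""
        order = self.allocated.pop(start)
        while order < self.max_order:
            buddy = start ^ (1 << order)  # buddy differs only in bit `order`
            if buddy not in self.free_lists[order]:
                break
            self.free_lists[order].remove(buddy)
            start = min(start, buddy)
            order += 1
        self.free_lists[order].add(start)
```

A slab layer would typically sit on top of such an allocator, carving the pages it hands out into fixed-size objects.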
ZJU-SoC
- Open Source
- FPGA
- Verilog
- CPU Design
- Peripheral Design
An SoC built from scratch. It can run ZJUNIX as well as Arduino programs. This project serves as a sample for OS courses and Computer Hardware System courses at Zhejiang University.
- Self-implemented 5-stage pipelined MIPS32 CPU on FPGA, with 93 instructions and 2 levels of cache.
- 512 MB DDR3 memory plus VGA, PS/2, and SD controllers.
- Capable of running ZJUNIX Operating System.
- Capable of running Arduino programs, with our implementation of Arduino library.
- Teamwork with two other undergraduate students.
Portable 3D Bioprinter
- Image Processing
- Mechanical Design
- Real-time System
A portable modular 3D bioprinter that prints tissue directly on wounds. The whole device fits in a 28-inch travel suitcase and can be assembled in several minutes.
- Demo Video
- Uses an FPGA-accelerated edge detector to meet the real-time computing requirement.
- Utility model patent, 201720246090.1
- The outstanding prize of Challenge Cup, Zhejiang Province.
- Top 10 Academic Projects, Zhejiang University, 2017
- Teamwork with 8 other undergraduates.
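As a rough software illustration of what an edge detector computes, here is a minimal sketch in Python. It assumes a Sobel operator with a fixed threshold; the page does not say which algorithm the FPGA implements, so the operator, the function name `sobel_edges`, and the threshold are all assumptions made for illustration.

```python
def sobel_edges(image, threshold):
    """Return a binary edge map for a 2-D list of grayscale values.

    Assumption: Sobel gradients with an |gx| + |gy| magnitude estimate.
    Border pixels are left as 0 because they lack a full 3x3 neighborhood.
    """
    gx_kernel = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
    gy_kernel = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]
    height, width = len(image), len(image[0])
    edges = [[0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for dy in range(3):
                for dx in range(3):
                    pixel = image[y + dy - 1][x + dx - 1]
                    gx += gx_kernel[dy][dx] * pixel
                    gy += gy_kernel[dy][dx] * pixel
            # |gx| + |gy| avoids a square root, which is also cheap in hardware.
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 1
    return edges
```

Each output pixel depends only on a 3x3 window, so the inner loops can be unrolled into a fixed pipeline, which is the kind of structure that maps well to an FPGA.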
FPGA Accelerated Fluid Simulation
- Graphics Engine
- FPGA
- FPGA for calculation, PC for rendering.
- FPGA communicates with PC through Ethernet.
- Second prize, Digilent Design Contest, China, 2017.
Project For Fun
VS Code LinkerScript
- VS Code Extension
- YAML
- Linker Script
- VSCode Market
- GitHub Repo
- Over 180K installations
ZJU Thesis
- LaTeX
- LaTeX template, widely used by students at Zhejiang University
- Recommended by School of Undergraduates.
- GitHub Repo
- Total GitHub stars: 2017
Makefile Templates
- Makefile
- GitHub Repo
- Total GitHub stars: 565
Chinese Input Method in MIPS
- MIPS Assembly
- Keyboard Interrupt
- VGA Display
- A Chinese pinyin IM written in pure MIPS assembly, ~2000 lines of code.
- It captures keyboard input, decodes English characters into pinyin, then queries the GB2312 database to find a corresponding Chinese character.
- Demo picture. (The code is lost, though.)
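Since the original assembly is lost, the lookup flow described above (typed letters, then pinyin, then a GB2312 character) can only be sketched. The Python below is a minimal illustration, not a reconstruction: the tiny `PINYIN_TABLE` and the function names are hypothetical, and the real program queried a full GB2312 database from MIPS assembly.

```python
# Hypothetical miniature table; a real input method covers every pinyin syllable.
PINYIN_TABLE = {
    "ni": ["你"],
    "hao": ["好"],
}


def candidates(keys):
    """Map a typed Latin-letter sequence to candidate Chinese characters."""
    return PINYIN_TABLE.get(keys.lower(), [])


def gb2312_bytes(char):
    """Return the two-byte GB2312 encoding of one character as (high, low)."""
    high, low = char.encode("gb2312")
    return high, low


def row_and_cell(char):
    """Return the GB2312 row and cell numbers (each byte minus 0xA0).

    These two numbers index the 94x94 GB2312 grid, which is how a glyph
    table for display could be addressed.
    """
    high, low = gb2312_bytes(char)
    return high - 0xA0, low - 0xA0
```

For example, `candidates("ni")` returns the candidate list for that syllable, and `row_and_cell` converts the chosen character into grid coordinates.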
Tiger Language Compiler
- Bison
- Flex
- C++ 11/14
- GitHub Repo
- Lexical analysis
- Syntax analysis
- Abstract syntax tree
- Intermediate code generator
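The first stage listed above, lexical analysis, can be sketched in a few lines. The project itself uses Flex and Bison with C++; the Python below is only an illustrative stand-in. The token names and the `tokenize` function are assumptions, and comments are treated as non-nested for simplicity even though Tiger allows nested comments.

```python
import re

# Reserved words of the Tiger language.
KEYWORDS = {"array", "break", "do", "else", "end", "for", "function", "if",
            "in", "let", "nil", "of", "then", "to", "type", "var", "while"}

# Order matters: comments before operators, ":=" before ":".
TOKEN_SPEC = [
    ("COMMENT", r"/\*.*?\*/"),  # simplification: no nested comments
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z][A-Za-z0-9_]*"),
    ("STRING", r'"(?:\\.|[^"\\])*"'),
    ("OP", r":=|<>|<=|>=|[-+*/=<>&|,:;.(){}\[\]]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC),
    re.DOTALL,
)


def tokenize(source):
    """Split Tiger source text into (kind, text) pairs."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        kind, text = match.lastgroup, match.group()
        pos = match.end()
        if kind in ("SKIP", "COMMENT"):
            continue  # whitespace and comments produce no tokens
        if kind == "ID" and text in KEYWORDS:
            kind = "KEYWORD"
        tokens.append((kind, text))
    return tokens
```

The resulting token stream is what a parser would consume to build the abstract syntax tree.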
Education
Awards
- IEEE Micro Top Picks: Annually awarded to the 10~12 best computer architecture papers published in the past year, 2021
- Google Peer Bonus: Awarded by peer Googlers recognizing my excellent work, 2023
- Google Peer Bonus: Awarded by peer Googlers recognizing my excellent work. Got two peer bonuses during my summer internship, Google, 2021
- Outstanding Grad Thesis: Outstanding graduation thesis in the CS department, Zhejiang University, 2018
- He-Zhi-Jun Fellowship: Top 10 outstanding students in the CS department, including grad and undergrad students, Zhejiang University, 2017
- Outstanding Prize: Top 10 academic projects, Challenge Cup, National Undergraduate Curricular Academic Science and Technology Works Competition, Zhejiang Province, China, 2017
- Award for Academic Excellence: Top 1% of students in academic achievement in the CS department, Zhejiang University, 2017
- Top 10 Academic Projects: Our portable modular 3D bioprinter won a top 10 academic projects prize, Zhejiang University, 2017
- DDC 2nd Prize: Digilent Design Contest, China, 2017
- 3rd Prize: Advanced Computer Architecture Undergraduate Innovation Competition, CCF China, 2016
Skills
Programming Language
- C/C++
- Python
- R
- x86/ARM/MIPS Assembly
- Go
- Java
- JavaScript
- Verilog
- CUDA
Technologies
- Linux Kernel
- KVM
- Persistent Memory
- AMD SEV/SEV-ES/SEV-SNP
- UEFI
- SeaBIOS
- Virtualization
- Microarchitecture Security
- QEMU
- gem5
- FPGA
- MIPS Arch
- ARM Mbed OS
- MongoDB
- InfluxDB
- Grafana
- TensorFlow
Skills
- Performance Profiling
- x86 Bootstrapping
- Microarchitecture Reverse Engineering
- Side/Covert Channel Attack
- Trusted Execution