Our Client
A leading global technology company driving innovation in AI acceleration and next-generation computing architectures. The organization develops high-performance hardware-software solutions focused on neural processing units (NPUs) and advanced compiler systems. Its products power large-scale AI systems across cloud computing, edge AI, and high-performance research infrastructure.
Mission
As an NPU Compiler & Framework Engineer, you will lead the co-design of AI frameworks and compiler toolchains tailored for NPU acceleration. Your work will directly affect how efficiently large AI models execute, through low-level optimizations, intelligent scheduling, and close hardware-software integration. The role sits at the intersection of systems design, compiler architecture, and AI infrastructure, with strong visibility in both academic and open-source communities.
Responsibilities
NPU-Centric Framework and Runtime Design
- Design and implement cross-layer optimizations for AI frameworks and runtimes (e.g., PyTorch, vLLM) targeting NPU workloads.
- Automate model transformation, quantization, and adaptive deployment for optimized execution on custom NPU hardware.
Compiler and Toolchain Development
- Extend and optimize compiler stacks (LLVM, TVM, GCC) to translate high-level AI models into high-performance NPU code.
- Focus on scheduling strategies, memory management, and parallel execution tailored to NPU microarchitectures.
Hardware-Software Co-Design
- Collaborate with hardware teams to define ISA extensions, performance counters, and architectural features that enable better software-level optimization.
- Participate in shaping the future of AI accelerators through feedback-driven development.
Research & Ecosystem Contribution
- Publish results at top-tier computer architecture, systems, and machine learning conferences (e.g., ISCA, ASPLOS, MLSys).
- Support the developer ecosystem with documentation, tooling, and contributions to open-source AI compiler projects.
Required Qualifications
- Master’s degree in Computer Science, Computer Engineering, or a related field, plus 3 years of experience in systems software; or a recent PhD in a relevant domain.
- Solid knowledge of compiler internals (LLVM, GCC, TVM, XLA) and modern NPU/GPU architectures.
- Strong programming skills in Python and C/C++ for system-level development.
- Excellent communication skills in English, both written and spoken.
Preferred Experience
- Hands-on experience with AI frameworks such as PyTorch or TensorFlow.
- Familiarity with model optimization techniques, quantization, and graph transformation.
- Experience working with DSP/xPU toolchains or specialized accelerators.
- Open-source contributions or publications at relevant conferences.