[Experienced Hire, AMD Beijing] LLM High-Performance Optimization

Original poster: Hann

Contact: Han.Wang@amd.com

WHAT YOU DO AT AMD CHANGES EVERYTHING

We care deeply about transforming lives with AMD technology to enrich our industry, our communities, and the world. Our mission is to build great products that accelerate next-generation computing experiences – the building blocks for the data center, artificial intelligence, PCs, gaming and embedded. Underpinning our mission is the AMD culture. We push the limits of innovation to solve the world’s most important challenges. We strive for execution excellence while being direct, humble, collaborative, and inclusive of diverse perspectives.

AMD together we advance_

THE ROLE:

We are seeking a technically proficient MTS/SMTS Engineer to develop and optimize AI model inference and training solutions for AMD products. You will work hands-on at the framework, model, and operator levels to improve the accuracy and performance of model training and deployment, and to identify and resolve bottlenecks.

KEY RESPONSIBILITIES:

Develop and optimize AI model inference and training solutions for AMD products, covering framework, model, and operator-level optimizations. Analyze and tune accuracy and performance issues during model training and deployment, identify and resolve bottlenecks, and make full use of hardware resources to improve training and inference performance, reduce large-model inference latency, and increase throughput.

Research and develop advanced model optimization techniques, including but not limited to model quantization, model compression, efficient attention mechanisms, and efficient model architectures. Collaborate with AMD software and hardware teams to deliver optimal end-to-end model training and inference solutions.

Provide technical support to AMD customers, helping them achieve optimal performance through effective use of AMD software and hardware.

TECHNICAL REQUIREMENTS:

Experience in model compression and inference acceleration techniques, such as quantization, sparsity, efficient attention, and KV cache compression. Candidates with publications at top-tier conferences will be given priority.

Proficiency in Python/C/C++ programming, ROCm/CUDA/Triton kernel development, with experience in low-level algorithm performance profiling and optimization.

Proficiency in at least one deep learning framework such as PyTorch, and familiarity with common distributed machine learning libraries like Megatron-LM, DeepSpeed, and/or Torchtitan.

Familiarity with mainstream LLM inference engines such as FasterTransformer, vLLM, and TRT-LLM, and common inference optimization methods like FlashAttention, PagedAttention, Continuous Batching, and Speculative Decoding.

PERSONAL ABILITIES:

Strong problem-solving skills to tackle complex technical challenges and find effective solutions.

Excellent learning ability to quickly acquire new knowledge and skills, keeping pace with technological advancements.

Good communication skills to interact clearly and effectively with team members and customers from diverse backgrounds.

Posted on 2025-06-23 17:42:46
