Senior ML Research Engineer - LLM Quantization & Model Optimization

Microsoft
United States, California, Mountain View
Jul 02, 2025
Overview

Do you want to be at the forefront of innovating the latest hardware designs to propel Microsoft's cloud growth? Are you seeking a unique career opportunity that combines technical capabilities and cross-team collaboration with business insight and strategy?

Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees, we come together with a growth mindset, innovate to empower others, and collaborate to achieve our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond. In alignment with our Microsoft values, we are committed to cultivating an inclusive work environment for all employees to positively impact our culture every day.

Join the Strategic Planning and Architecture (SPARC) team within Microsoft's Azure Hardware Systems and Infrastructure (AHSI) organization, the team behind Microsoft's expanding cloud infrastructure and responsible for powering Microsoft's "Intelligent Cloud" mission. Microsoft delivers more than 200 online services to more than one billion individuals worldwide, and AHSI is the team behind our expanding cloud infrastructure. We deliver the core infrastructure and foundational technologies for Microsoft's cloud businesses, including Microsoft Azure, Bing, MSN, Office 365, OneDrive, Skype, Teams and Xbox Live.

We are looking for a Senior ML Research Engineer - LLM Quantization & Model Optimization to join our team!

Responsibilities

Design and develop novel quantization techniques to enable efficient deployment of LLM inference and training in Microsoft's Azure production environments.
Drive software development and proof-of-concept efforts for model optimization tooling to streamline deployment of quantized models.
Analyze performance bottlenecks in state-of-the-art LLM architectures and drive performance improvements.
Prototype and evaluate emerging low-precision data formats through proof-of-concept implementations.
Co-design model architectures optimized for low-precision deployment in close collaboration with companywide AI teams.
Work cross-functionally with data scientists and ML researchers/engineers to align on model accuracy and performance goals.
Partner with hardware architecture and AI software framework teams to ensure end-to-end system efficiency.