Senior Software Engineer - MAIA - AI Accelerator Observability and Infrastructure

Microsoft
United States, Washington, Redmond
Jul 16, 2025
Overview

The MAIA System Infrastructure team is pioneering the next generation of the developer ecosystem for AI accelerators and we are looking to hire a Senior Software Engineer - MAIA - AI Accelerator Observability and Infrastructure. We are building the core infrastructure that enables deep observability into our proprietary MAIA chips, empowering developers to fully harness the capabilities of our custom AI hardware. Our mission is to create a transparent, performant, and developer-centric ecosystem that surpasses traditional GPU observability by offering unparalleled insight into low-level operations, performance characteristics, and system-wide behavior.

We operate at the intersection of advanced AI hardware, system software, and developer tooling, continually pushing the boundaries of what is possible. Our scope extends beyond on-chip instrumentation; we also play a critical role in optimizing the end-to-end data flow infrastructure, including PCIe and frontend networks, to ensure low-latency, high-throughput movement between host systems and accelerators. By decomposing and re-architecting data pathways into state-of-the-art designs, we are unlocking new levels of scalability and performance for AI workloads. Our work involves close collaboration with hardware architects, systems engineers, and AI researchers to build a cohesive observability and runtime foundation that defines the next era of AI system design.

Why Join Us?

This is an opportunity to work on the cutting edge of AI hardware acceleration, directly contributing to the infrastructure that makes deep observability and optimization possible. As a Senior Software Engineer - MAIA - AI Accelerator Observability and Infrastructure on the MAIA System Infrastructure team, you will have the chance to work on challenging, high-impact projects that require a blend of low-level programming, data flow optimization, and system design. You'll be part of a team of highly talented engineers who are passionate about building the next generation of AI tooling infrastructure, and you'll have the opportunity to make a significant impact on how AI workloads are understood and optimized.

This role is ideal for core engineers who are looking to make their mark in an innovative environment, where their contributions will directly influence the performance and capabilities of cutting-edge AI systems.
Responsibilities

As a Senior Software Engineer - MAIA - AI Accelerator Observability and Infrastructure on the MAIA System Infrastructure team, you will help advance the next generation of observability and runtime infrastructure for MAIA AI accelerators. You'll focus on enhancing system intelligence and execution reliability at scale, designing runtimes that can adapt dynamically to complex workload demands while maintaining performance and predictability. Your work will elevate how developers interact with MAIA hardware, enabling streamlined, high-confidence execution across multi-accelerator and multi-node environments. This is a unique opportunity to shape foundational infrastructure at the frontier of AI hardware and distributed systems.

This role requires a deep technical background and a hands-on approach, as you will design and implement software that interfaces with both the MAIA chips and the data flow infrastructure. You will:

- Lead by example in creating an inclusive culture that embraces diversity.
- Mentor and empower teammates, fostering an environment where all voices are heard and valued.
- Cultivate a team dynamic that drives high performance through mutual support and respect.
- Design, develop, and maintain the observability infrastructure for the MAIA AI accelerators, enabling developers to gather the data necessary to debug, profile, analyze, and optimize AI models with unprecedented depth.
- Optimize the data flow infrastructure over PCIe, ensuring efficient and high-throughput communication between the host and MAIA chips.
- Collaborate with hardware architects and system engineers to integrate the observability stack with the broader system, capturing detailed metrics and insights into data movement.
- Develop tools and libraries that provide a holistic view of data flow, execution, and performance, extending beyond traditional GPU observability to meet the unique needs of our accelerators.
- Engage with the AI research and developer community to understand their needs and incorporate feedback into the observability tools and data flow optimizations.
- Ensure that the observability and data flow infrastructure meet the highest standards of performance, security, and reliability.