Gateworks Joins Forces with NXP to Launch a USA-Made M.2 AI Acceleration Card

Deploying AI at the edge has traditionally required redesigning your entire hardware stack, adding cost, complexity and risk.

Gateworks has partnered with NXP Semiconductors to introduce the GW16168, a USA-made M.2 AI acceleration card. Built around NXP's Ara240 Discrete Neural Processing Unit (DNPU), the GW16168 delivers high-performance AI inference in a compact, industrial-grade form factor.

Designed, tested and manufactured in the United States, the GW16168 supports long lifecycle deployments in critical infrastructure and demanding environments where reliability, security and supply chain transparency matter.

Gateworks and NXP officially unveiled the GW16168 at Embedded World 2026, positioning it as a way to deploy scalable AI in industrial settings without redesigning the entire system.

Challenges in Deploying Edge AI at Scale

AI workloads are advancing faster than traditional embedded platforms can support. As models grow in size and complexity, developers are forced into trade-offs that limit performance, increase system complexity or shorten product lifecycles.

GPU-based systems can deliver high performance but require significant power and active cooling, which makes them impractical for many embedded and industrial environments. Integrated NPUs offer better efficiency, but they are tightly coupled to the system-on-chip and may not provide enough performance, limiting scalability as AI requirements evolve. Legacy accelerator modules often lack the memory and compute needed for modern workloads.

At the same time, supply chain volatility and component constraints are introducing additional risk for long-term deployments. Industrial OEMs need solutions that are not only high-performing but also stable, scalable, and available over extended lifecycles.

The result is a growing gap between what edge AI applications require and what traditional architectures can realistically support.

The Solution: Scalable, Industrial Edge AI Without Compromise

The GW16168 introduces a decoupled AI architecture that allows AI acceleration to scale independently from the host system. Instead of redesigning an entire platform to meet AI requirements, developers can add high-performance inference via a standard M.2 interface.

This approach lets teams build on proven embedded platforms, including NXP i.MX processors, while integrating advanced AI capabilities.

Pictured: the GW16168 with the Gateworks i.MX95 Catalina GW9200.

By offloading inference workloads to the GW16168, the host CPU is freed to focus on system control, I/O and real-time operations, improving overall system efficiency and responsiveness.

With up to 40 eTOPS of performance and 16GB of onboard memory, the GW16168 supports modern AI workloads, including vision transformers, multi-model pipelines, and edge-based LLM inference. Its low-power, passively cooled design enables deployment in fanless, rugged systems without adding thermal complexity.

The result is a modular, upgradeable AI architecture that reduces development risk, accelerates time to market and allows systems to evolve as AI demand grows.

"Gateworks' GW16168 illustrates exactly why decoupled AI architectures are the future of edge computing. By combining NXP's Ara240 DNPU with Gateworks' industrial-grade design, customers can scale AI performance without redesigning their entire hardware platform. This brings flexibility, longevity and cost efficiency to real-world AI deployments."
Ravi Annavajjhala, Vice President and General Manager, Neural Processing Units, NXP Semiconductors.

Application Highlight: Secure, USA-Made AI for Autonomous Drones

The GW16168 enables autonomous, USA-made drone systems to run real-time AI inference directly at the edge, eliminating reliance on cloud connectivity and reducing latency in mission-critical environments.

By combining low-power, passively cooled operation with 16GB of onboard memory, the GW16168 supports advanced computer vision tasks such as object detection, inspection and navigation within compact, rugged designs.

Its modular M.2 form factor allows developers to scale AI capabilities without redesigning the flight computer. This makes it an ideal solution for secure, high-performance drone applications in infrastructure, defense and environmental monitoring.


The GW16168 is available directly from Gateworks or through authorized distributors DigiKey, Breamac, RoundSolutions and Avnet.