AI inference on Apple silicon

Feedback from developers using the Mac platform


A true alternative

Learn why it's a game changer for AI projects

You can compute up to 1,000 tokens per second with the Mojo language, and Apple's MLX framework can run up to two times faster than PyTorch thanks to unified-memory optimization.

Memory bandwidth

"For those wondering why the M2 Ultra is so fast, or the M1 & M2 series in general, it's because inference's main bottleneck is memory bandwidth, not compute power. And the M2 Ultra has a bandwidth of 800 GB/s which is about 8 times faster than an average modern desktop CPU (dual-channel DDR4-6400 offers a bandwidth of 102 GB/s)."
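The arithmetic behind this quote can be sketched in a few lines. The bandwidth figures (800 GB/s, DDR4-6400, dual channel) come from the quote itself; the 7 GB model size used for the throughput bound is a hypothetical example, not a figure from the source.

```python
# Back-of-the-envelope check of the bandwidth figures quoted above.

def dual_channel_bandwidth_gbs(mt_per_s: int, bus_bytes: int = 8, channels: int = 2) -> float:
    """Peak DRAM bandwidth in GB/s: transfers/s x bus width x channel count."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9

ddr4 = dual_channel_bandwidth_gbs(6400)   # dual-channel DDR4-6400
m2_ultra = 800.0                          # GB/s, from the quote

print(f"DDR4-6400 dual channel: {ddr4:.1f} GB/s")     # ~102.4 GB/s
print(f"M2 Ultra advantage: {m2_ultra / ddr4:.1f}x")  # ~7.8x, i.e. "about 8 times"

# Why bandwidth bounds inference: generating each token streams all model
# weights through memory once, so tokens/s <= bandwidth / model size in bytes.
model_gb = 7.0  # hypothetical: a 7B-parameter model at 8-bit quantization
print(f"Throughput ceiling on M2 Ultra: {m2_ultra / model_gb:.0f} tokens/s")
```

This is why chips with wide unified memory punch above their weight for LLM inference: raising compute alone cannot lift the ceiling set by memory bandwidth.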

Large Language Model with PyTorch

"PyTorch now has enough support for the Apple silicon devices that inference even with very large models is blazingly fast".
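As a minimal sketch of what this looks like in practice, PyTorch targets Apple's GPU through its `mps` backend. The layer and tensor shapes below are placeholder assumptions for illustration, not a specific large model.

```python
import torch

# Use Apple's Metal Performance Shaders backend when available
# (requires PyTorch >= 1.12 on Apple silicon), else fall back to CPU.
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

# Placeholder workload: a single linear layer stands in for a real model.
layer = torch.nn.Linear(1024, 1024).to(device)
x = torch.randn(8, 1024, device=device)

with torch.no_grad():  # inference only, no autograd bookkeeping
    y = layer(x)

print(y.shape, y.device)
```

The same `device` handle works for loading full model checkpoints with `.to(device)`, which is how large-model inference runs on the built-in GPU.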

Stable Diffusion

"In a significant move to advance the capabilities of their machine learning framework, Apple has announced the open-sourcing of Core ML Stable Diffusion XL (SDXL) for its cutting-edge Apple silicon architecture. The new model, which has grown threefold in size to around 2.6 billion parameters, brings a host of powerful features that enhance performance while maintaining efficiency."

Tutorial CoreML, TensorFlow

"The built-in CPU+GPU system on Apple’s chips is great for machine learning"


Learn more about the Mojo language   Apple MLX framework for training and deployment

MacWeb Computing is an independent company, not affiliated with Modular


Get a Mac in the cloud

Discover our configurations for AI and ML developers

Mac mini   Mac Studio   Request a quote