The Mojo language can process up to 1,000 tokens per second, and Apple's MLX framework runs inference up to twice as fast as PyTorch thanks to unified-memory optimization.
"For those wondering why the M2 Ultra is so fast, or the M1 & M2 series in general, it's because inference's main bottleneck is memory bandwidth, not compute power. And the M2 Ultra has a bandwidth of 800 GB/s, which is about 8 times faster than an average modern desktop CPU (dual-channel DDR5-6400 offers a bandwidth of 102 GB/s)."
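The arithmetic behind the quote above can be checked in a few lines of Python. The bandwidth figures come from the quote itself; the model size and quantization numbers below are illustrative assumptions, not measured benchmarks.

```python
# Memory-bandwidth arithmetic behind the quote above.
# Bandwidth figures are from the quote; model sizes are illustrative assumptions.

# Dual-channel DDR5-6400: 6400 MT/s * 8 bytes per transfer * 2 channels
desktop_gbps = 6400e6 * 8 * 2 / 1e9    # ~102.4 GB/s

m2_ultra_gbps = 800.0                   # Apple's published figure for the M2 Ultra

ratio = m2_ultra_gbps / desktop_gbps    # ~7.8x, the "about 8 times" in the quote
print(f"desktop: {desktop_gbps:.1f} GB/s, ratio: {ratio:.1f}x")

# Why bandwidth bounds token rate: generating one token streams the full set
# of weights through memory, so tokens/s <= bandwidth / model size in bytes.
model_gb = 3.5                          # hypothetical: a 7B-parameter model at 4-bit
max_tokens_per_s = m2_ultra_gbps / model_gb
print(f"bandwidth-bound ceiling: {max_tokens_per_s:.0f} tokens/s")
```

The last two lines show why bandwidth, not compute, is the ceiling for single-stream inference: the upper bound scales directly with GB/s and inversely with model size.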
"PyTorch now has enough support for Apple silicon devices that inference, even with very large models, is blazingly fast."
"In a significant move to advance the capabilities of their machine learning framework, Apple has announced the open-sourcing of Core ML Stable Diffusion XL (SDXL) for its cutting-edge Apple silicon architecture. The new model, which has grown threefold in size to around 2.6 billion parameters, brings a host of powerful features that enhance performance while maintaining efficiency."
"The built-in CPU+GPU system on Apple’s chips is great for machine learning"
MacWeb Computing is an independent company not affiliated with Modular
Discover our configurations for AI and ML developers
Mac mini
Mac Studio
Request a quote