The Mojo language can generate up to 1,000 tokens per second. For many developers, the Mac is one of the best platforms for inference.
"For those wondering why the M2 Ultra is so fast, or the M1 & M2 series in general, it's because inference's main bottleneck is memory bandwidth, not compute power. And the M2 Ultra has a bandwidth of 800 GB/s, which is about 8 times faster than an average modern desktop CPU (dual-channel DDR5-6400 offers a bandwidth of 102 GB/s)."
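The bandwidth argument above can be turned into a back-of-envelope estimate: during autoregressive decoding, each token requires streaming roughly the whole set of model weights through the memory bus once, so bandwidth divided by model size gives an upper bound on tokens per second. A minimal sketch (the 7B-parameter, 4-bit model size is a hypothetical example, not from the source):

```python
def max_tokens_per_sec(bandwidth_gb_s: float, model_size_gb: float) -> float:
    # Each generated token reads (roughly) all model weights once,
    # so memory bandwidth sets a hard ceiling on decode speed.
    return bandwidth_gb_s / model_size_gb

# Hypothetical 7B-parameter model quantized to 4 bits (~3.5 GB):
m2_ultra = max_tokens_per_sec(800, 3.5)  # M2 Ultra: 800 GB/s
desktop = max_tokens_per_sec(102, 3.5)   # dual-channel DDR5-6400: ~102 GB/s
print(round(m2_ultra, 1), round(desktop, 1))
```

The ratio of the two ceilings reproduces the roughly 8x gap the quote describes; real throughput is lower once compute, KV-cache reads, and batching enter the picture.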
"PyTorch now has enough support for the Apple silicon devices that inference even with very large models is blazingly fast".
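The PyTorch support referred to above is the MPS (Metal Performance Shaders) backend, available since PyTorch 1.12. A minimal sketch that selects the Apple GPU when present and falls back to CPU elsewhere, so it stays portable:

```python
import torch

# Prefer Apple's MPS backend when available; fall back to CPU otherwise.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# A small deterministic computation to confirm the device works:
x = torch.ones(4, 4, device=device)
y = (x @ x).sum().item()  # each of the 16 entries is 4.0, so the sum is 64.0
print(device, y)
```

Real models move to the device the same way, e.g. `model.to(device)` before running inference.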
"In a significant move to advance the capabilities of their machine learning framework, Apple has announced the open-sourcing of Core ML Stable Diffusion XL (SDXL) for its cutting-edge Apple silicon architecture. The new model, which has grown threefold in size, boasting around 2.6 billion parameters, brings a host of powerful features that enhance performance while maintaining efficiency."
"The built-in CPU+GPU system on Apple's chips, with its shared unified memory, is great for machine learning."