Just days before its much-anticipated event on September 9, Apple has quietly released two new artificial intelligence models, FastVLM and MobileCLIP2, on the open-source platform Hugging Face. Both models are designed to run locally on devices, promising near-instantaneous performance while keeping user data private.
FastVLM, a vision language model (VLM), is engineered to process high-resolution images at lightning speed. Alongside the main release, Apple has also introduced FastVLM-0.5B, a lighter version that can be tested directly in a browser. Users can connect their cameras, or even virtual camera apps, to feed video to the model.
MobileCLIP2, meanwhile, is a vision-language model that combines image and language understanding. It can analyze pictures, identify objects, generate captions, and even interpret videos, all without relying on the cloud. Apple says the model is 85 times faster and 3.4 times smaller than earlier iterations, thanks to optimization for Apple silicon and the company’s lightweight machine learning framework.
The dual release signals Apple’s intent to push on-device AI as a cornerstone of its ecosystem. By keeping computation local, the company not only promises speed but also underscores its long-standing emphasis on user privacy.
