
Image generated by AI
Apple has dipped its toes into the open-source waters with the release of ‘Ferret’, a multimodal Large Language Model (LLM) developed in collaboration with Cornell University. This move is a notable departure for the tech giant, traditionally known for its closely guarded approach to innovation.
Released quietly on GitHub in October, Ferret’s code, along with the Ferret-Bench evaluation benchmark, is now stirring curiosity among AI researchers and enthusiasts alike.
Unlike typical text-only models, Ferret can interact with images. Users can draw on an image, and the AI will focus on that region, identifying and analyzing the elements within it. This makes it particularly intriguing for applications requiring detailed visual understanding.
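To make that interaction concrete, here is a minimal Python sketch of the pattern described above: a user-drawn region is passed alongside the text query so the model can ground its answer to that area. All names here (Region, build_referring_prompt) are illustrative placeholders, not Ferret’s actual API.

from dataclasses import dataclass
from typing import List

@dataclass
class Region:
    """A user-drawn region, represented here as a simple bounding box in pixel coordinates."""
    x0: int
    y0: int
    x1: int
    y1: int

def build_referring_prompt(question: str, regions: List[Region]) -> str:
    """Embed region coordinates in the text prompt so a region-aware model
    can tie its answer to the referenced areas. This mirrors the general
    referring-and-grounding idea, not Ferret's real prompt format."""
    coords = "; ".join(f"[{r.x0}, {r.y0}, {r.x1}, {r.y1}]" for r in regions)
    return f"{question} (regions: {coords})"

# Usage: ask about the contents of a box the user drew on the image.
prompt = build_referring_prompt(
    "What is the object in this region?",
    [Region(120, 80, 340, 260)],
)
print(prompt)
# -> "What is the object in this region? (regions: [120, 80, 340, 260])"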
This level of precision in an AI’s image-querying capabilities is relatively new and could have significant implications for both academic research and practical applications. The model was trained on eight A100 GPUs, a detail that underscores Apple’s serious investment in AI, even if it does not yet match the scale of AI giants like OpenAI.
“Apple enters the LLM race! Ferret: An End-to-End MLLM that Accept Any-Form Referring and Ground Anything in Response. Github: https://t.co/vKdGTtUYpp” — Andrada (@andradavulpee), December 24, 2023
The machine-learning tool is released under a non-commercial license, so direct commercial applications are off the table for now. Instead, it serves as an open invitation to researchers and developers to explore and expand upon Ferret’s capabilities through collaborative innovation.
The potential of Ferret extends beyond its current state. As an AI that understands and interprets both text and images, its applications could range from enhancing user interactions with technology to providing more nuanced analyses in fields like medical imaging or digital humanities.
[via AppleInsider and Silicon Angle, images via various sources]

