6 Latest AI Models from Meta, OpenAI, Apple and More
It’s hard to keep up when AI models launch daily. AI Business presents six of the most significant models released in the past few weeks that you may have missed. SEE FULL ARTICLE: https://aibusiness.com/nlp/6-latest-ai-models-from-meta-openai-apple-and-more.
Excerpts provided for megalodon.com domain valuation.
Megalodon
Who built it? Meta, University of Southern California, Carnegie Mellon University, University of California San Diego
Is it open source? Yes
Meta’s AI work is best known for naming models after llamas. Last August, the company also borrowed the name of a much larger animal for its Humpback model.
Now, the Facebook parent has teamed up with university researchers to build a giant AI model named after one of the largest animals in history, the Megalodon.
Named after a giant species of shark that lived millions of years ago, Meta’s Megalodon model is meant to be massive in scope. Its scale lies less in parameter count than in how much text it can take in: the accompanying research paper benchmarks a 7-billion-parameter version, trained on 2 trillion tokens, against Meta’s Llama 2.
Meta’s Megalodon is less terrifying than a giant AI shark | Credit AI Business via ChatGPT
Meta’s Megalodon is designed for large, complex tasks and can process very long inputs efficiently.
The model can understand and generate responses across extended conversations or documents without losing context.
It is also designed to be faster and more scalable than older models, so it stays quick on big tasks that involve lots of data.
However, the new Meta model was not compared against Llama 3, the latest in the Llama series of large language models.
Meta is working on a giant 400-billion-parameter version of its new Llama 3. Megalodon is not that model.
How to Access Meta Megalodon?
Meta Megalodon can be found on GitHub, with a Discord server available where users can troubleshoot with other AI experts.
Apple’s OpenELM
Is it open source? Yes
OpenELM sees Apple join the growing list of companies developing open source AI systems.
The iPhone maker is a relative newcomer to open source AI and OpenELM marks its first model release. Last month, the company offered a glimpse of a multimodal AI system called MM1.
OpenELM, which stands for Open-source Efficient Language Models, is small in stature. It comes in sizes ranging from 270 million parameters up to 3 billion.
There are two types of OpenELM: a pre-trained version and an instruction-tuned version suited to responding to natural language instructions.
Each is a text generation model trained with CoreNet, Apple’s new deep neural network library, on a dataset totaling about 1.8 trillion tokens. Apple sourced the training data from publicly available sources, including RefinedWeb (which was used to build the Falcon model), a deduplicated version of the PILE and a subset of the Dolma v1.6 corpus.
Apple’s models use layer-wise scaling to improve the accuracy of their responses. Instead of giving every transformer layer the same configuration, OpenELM varies the number of attention heads and the width of the feed-forward block from layer to layer, allocating its parameter budget more efficiently across the model.
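To make the idea of layer-wise scaling concrete, here is a minimal sketch assuming a simple linear interpolation: each layer’s attention-head count and feed-forward width are scaled by factors that grow from the first layer to the last. The function name, the toy model size and the alpha/beta ranges are illustrative assumptions, not the configuration of any released OpenELM checkpoint.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class LayerConfig:
    num_heads: int  # attention heads in this layer
    ffn_dim: int    # hidden width of this layer's feed-forward block


def layerwise_scaling(num_layers: int, d_model: int, head_dim: int,
                      alpha: tuple[float, float] = (0.5, 1.0),
                      beta: tuple[float, float] = (0.5, 4.0)) -> list[LayerConfig]:
    """Interpolate head count and FFN width linearly from the first to the last layer.

    alpha scales attention width and beta scales FFN width; the default ranges
    are illustrative assumptions, not a released OpenELM configuration.
    """
    configs = []
    for i in range(num_layers):
        # Position in the stack: 0.0 for the first layer, 1.0 for the last.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        configs.append(LayerConfig(num_heads=max(1, round(a * d_model / head_dim)),
                                   ffn_dim=round(b * d_model)))
    return configs


# Hypothetical 4-layer toy model with a 1,024-wide residual stream and 64-dim heads.
for i, cfg in enumerate(layerwise_scaling(num_layers=4, d_model=1024, head_dim=64)):
    print(f"layer {i}: {cfg.num_heads} heads, FFN width {cfg.ffn_dim}")
```

In this toy setup the early layers come out narrower and the later layers wider, so the same overall budget is spread unevenly instead of being copied identically into every layer.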
How to Access OpenELM
Apple’s OpenELM models can be found on Hugging Face while the CoreNet library can be found on GitHub.
Apple’s OpenELM models can be used to power commercial applications. The license, however, is stricter than those used for traditional open source models, with specific terms that users must follow, including not implying endorsement by Apple.
Snowflake’s Arctic
Is it open source? Yes
Snowflake recently unveiled Arctic, a large language model the company claims is the “most open enterprise-grade large language model.”
Arctic is a large language model that’s designed for enterprises. It activates only 17 billion of its parameters for each token it processes, so it does not need huge amounts of power to run.
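Snowflake describes Arctic as a dense-MoE hybrid: a 10-billion-parameter dense transformer combined with a residual mixture-of-experts layer of 128 experts, each about 3.66 billion parameters, with two experts chosen for every token. Assuming those published figures, the small sketch below shows why the 17 billion figure counts only the parameters in use per token, while the full model holds roughly 480 billion. The helper function is hypothetical and exists only for this illustration.

```python
def moe_param_counts(dense_b: float, num_experts: int,
                     expert_b: float, top_k: int) -> tuple[float, float]:
    """Return (total, active) parameter counts, in billions, for a dense + MoE hybrid."""
    total = dense_b + num_experts * expert_b   # every expert has to be stored
    active = dense_b + top_k * expert_b        # only top_k experts run for each token
    return total, active


# Approximate figures from Snowflake's description of Arctic.
total_b, active_b = moe_param_counts(dense_b=10, num_experts=128, expert_b=3.66, top_k=2)
print(f"total ~{total_b:.0f}B parameters, active per token ~{active_b:.1f}B")
```

It prints roughly 478 billion total and 17.3 billion active parameters, in line with Snowflake’s rounded 480B and 17B figures. Storage scales with the total, while per-token compute scales with the active count.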
The model is designed to expertly follow instructions and perform tasks like code generation while being more cost-effective to run compared to other open source models.
Users can also build atop Arctic, designing custom models optimized for specific enterprise use cases.
Arctic performs on par with or better than more expensive-to-run models, such as the 8-billion-parameter version of Meta’s new Llama 3, on enterprise-focused benchmarks. Snowflake claims its new model runs at half the cost.
Unlike its Meta rival, Snowflake touts its Arctic model as being truly open, in that users have access to the model, its weights, code and all the data recipes used to power it.
Meta has not disclosed the datasets it used to train its latest Llama models, despite opening access to them.
Snowflake aims to freeze out Meta’s Llama | Credit AI Business via ChatGPT
Mike Finley, CTO and co-founder of AnswerRocket, said the cost to train Arctic will “make your chief financial officer smile.”
“This snowflake trains like a butterfly and thinks like a bee,” Finley said. “The parameter size, at 17 billion, is notable because it is in an empty band in this competitive space. It’s smaller than the full-sized models by 50% but CTOs will note that it beats many of those larger models on important benchmarks.”
How to Access Snowflake Arctic?
Snowflake Arctic can be downloaded from Hugging Face.
The model is also available from a variety of cloud providers, including AWS, Azure and Nvidia’s API catalog.
The company’s GitHub repository contains recipes to improve Arctic’s inference and fine-tuning.
Microsoft’s Phi-3 Mini
Is it open source? Yes
The latest in Microsoft’s small language model efforts, Phi-3 Mini is just 3.8 billion parameters in size but outperforms models more than double its size.
The new model boasts improved reasoning, coding and math capabilities compared to prior Phi models.
It’s the first model in its class to support a context window of up to 128K tokens, letting it handle sizable inputs with little impact on the quality of its responses.
Phi-3 can be used out of the box as it’s instruction-tuned, meaning it’s suitable for deployments that require it to follow instructions right from the get-go.
Thanks to its small size, Phi-3 Mini can run in edge applications, such as on smartphones or on sensors in industrial environments.
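A rough way to see why a 3.8-billion-parameter model suits edge devices is to estimate how much memory its weights need at different numeric precisions. This sketch counts weights only (ignoring activations, the KV cache and runtime overhead), and the precisions listed are common choices rather than a statement about Phi-3 Mini’s official builds.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate storage for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


PHI3_MINI_PARAMS = 3.8e9  # parameter count reported by Microsoft

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(PHI3_MINI_PARAMS, bits):.1f} GB")
```

At 4-bit precision the weights come to about 1.9 GB, small enough to fit in the memory of many current smartphones, whereas a 16-bit copy needs roughly 7.6 GB.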
Microsoft had only launched Phi-2 last December but is continuing to work on small model development.
“What we’re going to start to see is not a shift from large to small but a shift from a singular category of models to a portfolio of models where customers get the ability to make a decision on what is the best model for their scenario,” said Sonali Yadav, Microsoft’s principal product manager for generative AI.
Further Phi-3 models are on the way, with Phi-3-small (7 billion parameters) and Phi-3-medium (14 billion parameters) launching “in the coming weeks.”
How to Access Microsoft Phi-3
The new Phi-3 Mini model is available on Azure AI Studio, Hugging Face and Ollama.
It’s also available on Nvidia’s NIM platform, with a standard API that can be deployed anywhere.
Mistral AI’s Mixtral 8x22B
Is it open source? Yes, under an Apache 2.0 license
Parameters: 141 billion
Mixtral 8x22B was released by French AI startup Mistral AI. Despite its 141 billion parameters, its file size is just 281GB, meaning most consumer laptops can store it.
Businesses can use Mixtral 8x22B to power their AI applications. It is available under an Apache 2.0 license so users can create their own proprietary software and offer the licensed code to customers.
Mistral’s benchmark tests show it is as powerful as Meta’s Llama 2 and OpenAI’s GPT-3.5, yet it uses only 39 billion of its 141 billion parameters for each token it generates, a feature Mistral touts as “offering unparalleled cost efficiency for its size.”
Mixtral 8x22B works differently from many other models in that it is a mixture-of-experts (MoE) model. MoE models produce responses from the outputs of multiple smaller expert networks working in tandem, akin to assembling a committee rather than having one sole decision-maker: for each token, a router picks a few experts (two of Mixtral’s eight) and combines their outputs.
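A minimal sketch of top-2 routing, the scheme Mixtral uses to send each token to two of its eight experts. Here the “experts” are hypothetical toy functions that just scale a short vector, and the router scores are supplied by hand; in a real model both the experts and the router are learned neural network layers.

```python
import math
from typing import Callable, Sequence

Vector = list[float]


def softmax(xs: Sequence[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(xs)  # subtract the max before exponentiating
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router_logits: Sequence[float],
                experts: Sequence[Callable[[Vector], Vector]],
                top_k: int = 2) -> tuple[Vector, list[int]]:
    """Route x to the top_k highest-scoring experts and mix their outputs."""
    # Indices of the top_k highest router scores.
    chosen = sorted(range(len(experts)), key=lambda i: router_logits[i], reverse=True)[:top_k]
    # Renormalize the scores over the chosen experts only.
    weights = softmax([router_logits[i] for i in chosen])
    out = [0.0] * len(x)
    for w, i in zip(weights, chosen):
        y = experts[i](x)  # only the chosen experts are ever evaluated
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, chosen


# Eight toy experts: expert k multiplies its input by (k + 1).
experts = [lambda v, k=k: [(k + 1) * vi for vi in v] for k in range(8)]
logits = [0.1, 2.0, -1.0, 0.5, 1.0, 0.0, -0.5, 0.3]
output, used = moe_forward([1.0, -1.0], logits, experts)
print("experts used:", used)
print("output:", output)
```

Only experts 1 and 4 run here; the other six are skipped entirely. That is why an MoE model’s compute per token tracks its active parameter count rather than its total size.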
Mistral has previously developed MoE models, including Mixtral 8x7B. Other notable examples of MoE AI systems include Switch Transformers from the team at Google Brain (now Google DeepMind) and the new Gemini 1.5, also from Google.
How to Access Mixtral 8x22B
The new Mistral model can be demoed via Together AI’s API and tested in Perplexity.ai’s model playground, allowing users to experience its language generation capabilities firsthand.
To download it, you can use a cryptic tweet from Mistral that contains nothing but a torrent link. Copying and pasting that link into a BitTorrent client downloads the model onto your computer.
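The torrent link in Mistral’s tweet was a magnet URI, a plain-text identifier that BitTorrent clients use to locate a torrent’s contents. As an illustration, Python’s standard library can split such a URI into its parts. The info-hash, file name and tracker below are made-up placeholders, not Mistral’s actual link, and the helper function is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def parse_magnet(uri: str) -> dict[str, list[str]]:
    """Split a magnet URI into its query parameters (xt, dn, tr, ...)."""
    parts = urlsplit(uri)
    if parts.scheme != "magnet":
        raise ValueError("not a magnet URI")
    return parse_qs(parts.query)  # also percent-decodes values such as tracker URLs


# Hypothetical example link: the hash, name and tracker are placeholders.
example = ("magnet:?xt=urn:btih:0123456789abcdef0123456789abcdef01234567"
           "&dn=example-model&tr=udp%3A%2F%2Ftracker.example.org%3A1337%2Fannounce")
fields = parse_magnet(example)
info_hash = fields["xt"][0].removeprefix("urn:btih:")  # identifies the torrent's contents
print(info_hash, fields["dn"][0], fields["tr"][0])
```

The info-hash is what a BitTorrent client actually uses to find peers sharing the files; the name and tracker fields are optional hints.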
The model can also be downloaded from Hugging Face.
OpenAI’s GPT-4 Turbo
Is it open source? No
Rounding out the list is GPT-4 Turbo from OpenAI, the Microsoft-backed company’s most powerful large language model to date.
Premium ChatGPT subscribers now have access to GPT-4 Turbo, which was first unveiled last November at OpenAI DevDay. It is designed to be an upgrade on GPT-4, offering improved coding, math and reasoning abilities at a reduced cost to run.
OpenAI CEO Sam Altman said the new model is “now significantly smarter and more pleasant to use.”
The new model also has a longer context length, meaning it can take in more input at once.
GPT-4 Turbo has a context length of 128,000 tokens (pieces of words), which is roughly the equivalent of 300 pages of a book. The original GPT-4 launched with a context window of about 8,000 tokens, plus a 32,000-token variant.
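The “300 pages” comparison can be reproduced with back-of-the-envelope arithmetic. This sketch assumes OpenAI’s rule of thumb that a token is roughly three-quarters of an English word, plus an assumed page length of about 320 words; both are approximations, and real ratios vary with the text and the tokenizer.

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text (assumption)
WORDS_PER_PAGE = 320    # assumed words on a typical book page (assumption)


def tokens_to_pages(tokens: int) -> float:
    """Convert a token budget into an approximate number of book pages."""
    return tokens * WORDS_PER_TOKEN / WORDS_PER_PAGE


for name, context in (("GPT-4 (8K)", 8_000), ("GPT-4 Turbo (128K)", 128_000)):
    print(f"{name}: ~{tokens_to_pages(context):.0f} pages")
```

Under these assumptions a 128,000-token window works out to about 300 pages, against roughly 19 pages for an 8,000-token window.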
The new model powering ChatGPT’s premium features also has a more recent knowledge cutoff of December 2023, and users can go beyond that through the platform’s internet access via Bing.
GPT-4 Turbo with Vision, now generally available through OpenAI’s API, can take in images and respond to user queries about them.
Among those using the new GPT-4 Turbo with Vision is Devin, an AI software engineering platform that can automate entire projects. Cognition Labs, which built Devin, is using the Vision model to power visual coding tasks in its platform.
About the Author
Ben Wodecki
Jr. Editor
Ben Wodecki is the Jr. Editor of AI Business, covering a wide range of AI content. Ben joined the team in March 2021 as assistant editor and was promoted to Jr. Editor.