Exploring LLaMA, The Little Language Model with Big Performance
Last week, I talked about LLaMA-Adapter, which costs only about $10 to fine-tune. Today, I want to get into the original LLaMA model. It was released by Meta (Facebook) only this February, and an entire ecosystem has already grown up around it in just a few months!
What is LLaMA?
LLaMA (Large Language Model Meta AI) has become the talk of the town due to its impressive performance. Despite being roughly ten times smaller, LLaMA-13B outperforms GPT-3, a model with 175 billion parameters, on most benchmarks.
This raises the question, does size really matter?
In this blog, we take a closer look at the intricacies of LLaMA and understand its capabilities.
1. Smaller Model Size – Easier to Retrain:
Meta researchers found that smaller models trained on more tokens (words) are considerably easier and cheaper to retrain and fine-tune for specific use cases.
The success of LLaMA suggests we no longer need to rely solely on parameter count to achieve better performance, which is great news for individual machine learning practitioners.
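To make the "easier to retrain" point concrete, here is a minimal sketch of parameter-efficient fine-tuning on a 7B-scale checkpoint using LoRA adapters from the peft library. The local model path is hypothetical, and this is not the recipe from Meta or the LLaMA-Adapter paper, just one common way practitioners retrain a model at this scale.

```python
# Minimal LoRA fine-tuning sketch for a 7B-scale LLaMA checkpoint.
# Assumes converted Hugging Face weights at a hypothetical local path and the
# `transformers` and `peft` libraries installed.
from transformers import LlamaForCausalLM, LlamaTokenizer
from peft import LoraConfig, get_peft_model

model_path = "./llama-7b-hf"  # hypothetical path to converted weights
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(model_path)

# Attach small low-rank adapters so only a tiny fraction of the weights is
# trained; this is what makes retraining a 7B model cheap on modest hardware.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # usually well under 1% of all parameters
# From here, the wrapped model can be passed to a standard Trainer loop.
```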
2. LLaMA Competes with Bigger Models:
Meta's LLaMA is similar to GPT-style models, except that it is released at smaller scales: 7, 13, 33, and 65 billion parameters. That range is close to the GPT-Neo models I have trained on my own, so models of that size are far from obsolete.
LLaMA-65B, for instance, has shown promising performance when compared to the best models like Chinchilla-70B and PaLM-540B.
3. The Efficiency of LLaMA:
LLaMA's exceptional performance can be credited to its efficient training process.
The 13B model reportedly can be run on a single GPU, making it more accessible to developers and researchers. This efficiency breakthrough challenges the long-held assumption that more parameters lead to better performance.
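To give a feel for what "runs on a single GPU" looks like in practice, here is a rough sketch that loads a 13B-scale checkpoint in 8-bit precision and generates a short completion. The local path is hypothetical, and 8-bit quantization via bitsandbytes is a pragmatic choice for consumer cards rather than the setup reported in the paper.

```python
# Rough single-GPU loading sketch for a 13B-scale LLaMA checkpoint.
# Assumes converted weights at a hypothetical local path plus the
# `transformers`, `accelerate`, and `bitsandbytes` packages.
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "./llama-13b-hf"  # hypothetical path to converted weights
tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    load_in_8bit=True,   # int8 weights shrink the footprint to fit on one GPU
    device_map="auto",   # let accelerate place the layers on the available GPU
)

prompt = "The LLaMA family of models is notable because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```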
4. LLaMA's Public Dataset:
Unlike OpenAI's GPT models, Meta explicitly emphasized using only publicly available datasets, which keeps the training data transparent and makes the model easier to study and reproduce.
LLaMA's underlying data sources include the English Common Crawl, the C4 dataset for web-data diversity, GitHub repositories under Apache, BSD, and MIT licenses, Wikipedia in 20 languages, public-domain books, ArXiv, and the 28 largest Stack Exchange sites.
In total, these sources amount to approximately 4.2 TB of raw data, which yields about 1.4 trillion tokens after tokenization.
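As a small illustration of where that token count comes from: LLaMA tokenizes its corpus with a SentencePiece byte-pair-encoding tokenizer, so any slice of raw text can be measured in the same units. The sketch below assumes a hypothetical local copy of the tokenizer.

```python
# Counting tokens the way LLaMA's SentencePiece BPE tokenizer sees them.
# Assumes a converted tokenizer at a hypothetical local path.
from transformers import LlamaTokenizer

tokenizer = LlamaTokenizer.from_pretrained("./llama-7b-hf")  # hypothetical path

text = "LLaMA was trained only on publicly available data such as Common Crawl and Wikipedia."
token_ids = tokenizer.encode(text)
print(f"{len(token_ids)} tokens for {len(text.split())} words")
# English prose usually comes out to a bit more than one token per word, which
# is how terabytes of filtered raw text boil down to about 1.4T training tokens.
```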
5. Commercial Licenses & Misuse:
The folks behind LLaMA are mindful of potential misuse and of the harm that broad commercial licensing could do to the integrity of the model, which is why the weights were released under a non-commercial research license. By keeping a close eye on the model's deployment and applications, they aim to uphold its ethical principles and protect its integrity.
But let's be honest: everyone loves a commercial license. Just as Stanford built Alpaca on top of LLaMA, the research and open-source community will keep producing derivatives, and the push for commercially usable versions will only grow.
Towards the Smaller Future
LLaMA's remarkable performance at a smaller model size has the potential to change the AI landscape. Its efficiency, diverse dataset, and reliance on publicly available data make it a promising player in the field of generative AI models.
As we unlock LLaMA's full potential, we will continue to challenge the conventional understanding of model size, all while maintaining the values of transparency and responsible AI development.