Attention (and Timing) is all you need: LangNet
Attention
In 2017, a paper called “Attention Is All You Need” was published, and it became the catalyst for the massive growth of the LLM and AI space as we know it today. You can read it here (https://arxiv.org/abs/1706.03762), but in summary, it introduced a new neural network architecture for natural language models, called the Transformer. Transformers offer parallelism, speed and accuracy that previous architectures, such as Recurrent Neural Networks and Long Short-Term Memory models, could not. That caught our attention, and we kept our eyes open for what would come next.
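To make the parallelism point concrete, here is a minimal sketch of scaled dot-product attention, the core operation of the paper, in plain Python. The function names and the toy vectors are our own illustrative choices, not code from the paper. Note that each output row depends only on the inputs and never on a previous output row, which is why all positions can be computed at the same time rather than one after another as in a recurrent network.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys and values are lists of equal-length vectors.
    Returns (outputs, weights), one row per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    # Each iteration is independent of the others, so in practice
    # the whole loop is done as one batched matrix multiplication.
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: two tokens, each represented by a 2-dimensional vector.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = attention(q, k, v)
print(weights)
print(outputs)
```

In this toy example each token attends most strongly to the key that matches its own query, so each output row is pulled toward the corresponding value vector.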
Timing
Not long after, we came across Atlas Labs, a company in South Korea that was developing a language blockchain protocol for training data at scale, with a focus on conversational models. The project was code-named “LangNet”. Given our love for ML, the timely introduction of the aforementioned paper and the exceptional team behind the project, we decided to invest in the company.
So much data, not enough power
The current language model landscape is dominated by a few companies with the computational resources to train large models. These models become “smart” when trained with a huge amount of data and a huge number of parameters. However, most of them are “LLMs for everything”: they are trained on data taken from public sources and provide answers without a clear explanation of how the model arrived at them.
Until now, companies that wanted to adopt AI had only two options: build models in house, or use LLMs from OpenAI, Google and others, which means opening up their data and accepting the lack of explainability in these models.
The first approach is unattainable for most companies, since it requires huge investments in hardware and talent and basically means becoming an AI research company.
The second approach carries a huge risk to trust and reputation. Companies need to own their data and models, and they need model personalization that aligns with their values, branding and messaging. A generalized LLM cannot do any of the above.
A third approach that has started to gain steam lately is to provide these companies with smaller pre-trained models and the technology to turn their data into insights. In essence: democratize models by making them small enough that they don’t need a huge amount of compute yet are still able to learn, and let companies train them with their own data for their own specific needs.
Seeing through and adapting early
Atlas Labs adopted this strategy a couple of years ago in the voice recognition space. Their clients had conversational data containing many nuggets of invaluable insight into their customers' behavior, sentiment, issues, preferences etc.
Combining a proprietary conversational voice recognition system with state-of-the-art Large Language Models (LLMs) enabled them to convert speech to text in real time and then train models on the text data to gain insights. These insights can inform not only immediate customer responses but also shape the company's product roadmap and overall business strategy.
Most of today’s LLMs achieved better results by making their models very large and training them with as much data as possible. Recent LLMs such as DeepMind’s Chinchilla have shown that we can probably train smaller models to perform as well as or better than larger ones if we train them with more data or with higher quality data.
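A rough back-of-the-envelope calculation illustrates the Chinchilla point. The sketch below uses the common rule of thumb that training compute is about 6 × parameters × tokens. The model sizes and token counts are the approximate figures published by DeepMind for Gopher and Chinchilla; neither the figures nor the function name come from this article, so treat the numbers as an estimate.

```python
def training_flops(params, tokens):
    """Rule-of-thumb training cost: about 6 FLOPs per parameter per token."""
    return 6 * params * tokens


# Approximate published figures (assumption: taken from DeepMind's papers,
# not from this article): (parameters, training tokens).
models = {
    "Gopher": (280e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}

for name, (params, tokens) in models.items():
    flops = training_flops(params, tokens)
    ratio = tokens / params
    print(f"{name}: {flops:.2e} FLOPs, {ratio:.1f} tokens per parameter")
```

Both models land at a similar compute budget, roughly 5e23 to 6e23 FLOPs, but Chinchilla spends it on a model four times smaller trained on far more tokens per parameter (about 20 instead of about 1). It still outperformed Gopher on most benchmarks DeepMind reported.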
The jury is still out, but simply throwing more data and compute at models seems to yield diminishing returns. It might be the case that we just need higher quality data, or a network of small models trained to make decisions for specific tasks on specific data.
If that’s the case, then startups that help companies fine-tune their models and then gain insights from them will be the winners of tomorrow.