Businesses of every size have been unable to escape the incredible impact that AI has had on the ways in which we do business of late.
From conglomerate to SME, organisations are becoming faster, more agile, and more robust as we outsource administrative and repetitive tasks to our AI co-workers.
One of the newest AI trends is the establishment of Large Language Models (LLMs) in the public domain: machine learning algorithms trained on colossal volumes of data to recognise the structures and patterns of natural language. They are capable of Natural Language Processing (NLP), which allows us to explore huge datasets through everyday questions or commands.
As such, LLMs are the most common way of making AI intelligible – to cite the most well-known example, LLMs are the means by which ChatGPT can answer your questions. But there’s one conventional drawback to that intelligence: it’s stuck in something of a time capsule.
LLMs are intensively trained, with millions upon millions of data points fired at them in a constant feedback loop to teach each model how to make sense of certain data points or patterns. But ‘operationalising’ an LLM – taking it off the training circuit and bringing it online as part of your infrastructure – freezes what it knows at that moment, so by default it learns nothing new. Ask some of the first versions of ChatGPT about very recent events, and they will politely explain their own temporal limitations to you.
That means you’ve got to be sure that the LLM can rely on the systems it will be exploring, and the data available to it. And while the corporate giant might have the funding and the tech stack to make that happen, that’s a brave assumption to make of an SME.
Move it or lose it
Historically, we’ve tended to think of data as static. When the layman downloads a file on their PC, the file isn’t ‘there’ until it pops up in their documents, even as millions of individual bytes quietly stitch themselves into something infinitely more sophisticated.
With that mindset, you can understand why businesses have often opted to capture as much data as they can, and only then set about establishing what they’ve actually collected. Convention would have us pour data into a huge data warehouse or lake, spend an age cleaning and preparing that data, and then dig up different cuts for analysis – a method widely known as batch processing.
This is about as efficient as it sounds. Wrestling an entire dataset duplicates work, camouflages insights, and makes huge demands of hardware and power consumption – all while delaying key business decisions. For the SME trying to find ways to compensate for limited funds and personnel, this method undermines the agility and speed that should be their natural advantage.
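To make the contrast concrete, here is a minimal Python sketch, not a production design. The names `batch_average` and `RunningAverage` and the order values are invented for illustration. The batch function has to wait for the whole dataset and scan all of it, while the streaming version keeps a small running state and has an up-to-date answer after every event.

```python
from dataclasses import dataclass


def batch_average(order_values: list[float]) -> float:
    """Batch style: wait for the full dataset, then scan all of it."""
    return sum(order_values) / len(order_values)


@dataclass
class RunningAverage:
    """Streaming style: update a small running state as each event arrives."""

    count: int = 0
    total: float = 0.0

    def update(self, value: float) -> float:
        self.count += 1
        self.total += value
        return self.total / self.count


orders = [120.0, 80.0, 100.0]  # made-up order values, for illustration only

# Batch: one answer, available only once everything has been collected.
batch_result = batch_average(orders)

# Streaming: a current answer after every single order.
running = RunningAverage()
stream_results = [running.update(value) for value in orders]

print(batch_result)    # 100.0
print(stream_results)  # [120.0, 100.0, 100.0]
```

Both approaches end on the same figure; the difference is that the streaming version never had to hold, or re-read, the full history to get there.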
Until recently, this was not a problem, because information did not need to be collected in real time, let alone consumed in real time. But many newer companies now build their customer value proposition on real-time data: think of calling a taxi with Uber or a similar application, and imagine not seeing the “live” map with the location of your driver. Real-time data is now a “must-have”, not a “nice-to-have”.
Fortunately, LLMs don’t only function on a batch processing basis. They can interact with data in different ways – and some of those ways don’t demand that data stands still.
Ask and ye shall receive
Just as disruptive SMEs seek to overturn older and more established companies, data streaming is replacing batch processing.
Data streaming platforms use data ‘pipelines’ to collect, store, and use data continuously, in real time. The processing, storage, and analysis that batch processing keeps you waiting on can now be achieved as the data arrives.
Streaming manages this through what we call event-driven principles, which essentially means treating each change in a dataset as an ‘event’ in itself. Each event includes a trigger to receive more data, creating a constant cascade of new information. Instead of having to go and fetch data (usually stored in a table somewhere in a database), data sources “publish” their data in real time, at all times, and anyone who wishes to consume that data simply “subscribes” to it.
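The publish and subscribe idea can be sketched in a few lines of Python. This is a toy, in-process illustration under stated assumptions: `EventBus`, the `driver.location` topic, and the event fields are all hypothetical, and a real streaming platform would add durable storage, delivery over a network, and scaling that this sketch leaves out.

```python
from collections import defaultdict
from typing import Callable

Event = dict
Handler = Callable[[Event], None]


class EventBus:
    """Toy in-memory publish/subscribe hub (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        # Register interest once; after that the data comes to you.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: Event) -> None:
        # Push the event to every subscriber of the topic as soon as it occurs.
        for handler in self._subscribers.get(topic, []):
            handler(event)


bus = EventBus()
driver_positions: dict[str, tuple[float, float]] = {}
audit_log: list[Event] = []


def update_map(event: Event) -> None:
    """Keep only the latest known position for each driver."""
    driver_positions[event["driver_id"]] = (event["lat"], event["lon"])


# Two independent consumers subscribe to the same stream of events.
bus.subscribe("driver.location", update_map)
bus.subscribe("driver.location", audit_log.append)

# The data source publishes each change as it happens; nobody polls a table.
bus.publish("driver.location", {"driver_id": "d1", "lat": 51.50, "lon": -0.12})
bus.publish("driver.location", {"driver_id": "d1", "lat": 51.51, "lon": -0.11})

print(driver_positions)  # {'d1': (51.51, -0.11)}
print(len(audit_log))    # 2
```

Note that the publisher knows nothing about who is listening. The live map and the audit log were added without touching the data source, which is much of the appeal of the pattern.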
All of this can help free LLMs from the hard distinction between training and operating. Furthermore, if every data point can be actioned, it becomes possible in principle for the LLM to train itself: to use the correctness of its actions to constantly refine the underlying algorithms that define its purpose.
That means the LLM can draw on a constantly updated and curated dataset, while constantly improving the mechanisms that deliver and contextualise that data. Data isn’t at risk of redundancy or abandoned in some forgotten silo – all you have to do is ask for it!
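One way to picture an LLM drawing on a constantly updated dataset is sketched below, assuming a hypothetical stock-level stream. The function names, event fields, and prompt wording are all invented for illustration, and no real LLM API is called. The sketch does not retrain the model; it only shows the simpler half of the idea, in which the freshest data from the stream is handed to the model at the moment a question is asked.

```python
# Latest known state per product, kept current by incoming events.
latest_stock: dict[str, dict] = {}


def on_stock_event(event: dict) -> None:
    """Called for every event on a (hypothetical) stock-level stream."""
    latest_stock[event["sku"]] = event


def build_prompt(question: str) -> str:
    """Combine the user's question with whatever is current right now."""
    lines = [
        f"- {sku}: {event['quantity']} units (as of {event['updated_at']})"
        for sku, event in sorted(latest_stock.items())
    ]
    context = "\n".join(lines) if lines else "- no data received yet"
    return (
        "Answer using only the current stock levels below.\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Two made-up events arrive; the second supersedes the first.
on_stock_event({"sku": "A100", "quantity": 12, "updated_at": "2024-01-01T09:00:00Z"})
on_stock_event({"sku": "A100", "quantity": 7, "updated_at": "2024-01-01T09:05:00Z"})

prompt = build_prompt("How many A100 units are left?")
print(prompt)  # this text would be sent to whichever LLM the business uses
```

The model itself is unchanged here. What changes is that its answer is grounded in data that is minutes old rather than in whatever it saw during training.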
Cut from the SME cloth
So: what does that mean for the SME?
For one, it takes off the proverbial handbrake. The sheer speed at which LLMs can deliver information through a stream-driven infrastructure empowers decision-makers to drive the business forward at their desired pace, with no batch processing to keep them in second gear. The agility that empowers SMEs to outmanoeuvre larger players is back in abundance.
Those decisions are made with less doubt, and more relevant context, than before. It’s so simple to access a specific insight, thanks to the natural language that LLMs recognise, that data streaming can foster a genuine enthusiasm for business transparency right across the board.
Not only is the output faster and more accurate, but SMEs can free themselves from legacy technology, too. Data streaming can take place entirely on premises, entirely in the cloud, or in a mixture of the two. The heavy-duty hardware often required for batch processing is simply no longer necessary if you can ask an LLM for the same result in record time. There are also several vendors that offer fully managed (turnkey) solutions requiring zero capital investment from the SME.
For SMEs to make the most of LLMs, then, they need to think about the way in which they approach company data. If a company is ready to commit to treating data as a constant stream of information, they’ll be much better placed to maximise the potential that data in motion has to help them evolve.