Plugging digital leaks
- Tamsin Oxford
Data are gathering in pools and lakes. As we dip our toes into these murky waters, we see a sign that says, ‘Here be dragons…’
The student standing in the corner tapping updates onto her Instagram profile. The tutor sending a quick WhatsApp to his wife, ‘Sorry, I’m going to be late’. The accountant uploading documents to the company intranet. Marketing releasing the monthly newsletters. Each individual adding another byte to the data lakes pooling in virtual space, filled with structured and semi-structured data that teases insight and value but never quite seems to deliver.
Oceans of info
This data is supposedly capable of helping decision makers gain granular insight into their business, yet the nature of data is constantly changing in how it is captured, why it is analysed, and what value it can deliver. It’s an evolution from hastily scribbled notes about the good, the bad and the organisational ugly into digital archives swollen with information that has no context or relevance, and yet they whisper about possibility.
“The computer revolution made it economical for data to be stored in increasingly complex and ever-expanding data storage solutions,” says Phumlani Khoza, Associate Lecturer, School of Computer Science and Applied Mathematics and leader of Scilinx Research, a business solutions design and research laboratory.
“The problem is that data hasn’t been strategically recorded in such a way as to deliver a specific economic value, or considered in light of ‘If we do X with the data, then we will achieve Y’. Instead we now have tons of data and no clear vision or idea on what to do with it or how to get it to share its most valuable secrets.”
Potential in the pool
Khoza teamed up with 10 other researchers to develop Scilinx Laboratory and Scilinx Research with the goal of advancing the operational capabilities of organisations through a hybrid structure that targets the generation and application of value-creating research insights. In short, brilliant minds applying themselves to the data conundrum, working to pull out its potential from the mess that relentless data collection has left behind. The goal is to create intelligent networks that define the next generation of analytics and how data relationships are interpreted across multiple data platforms and sources.
At the peak of the big data hype, people were trotting in with fancy algorithms and mathematical constructs supposedly designed to whisk out insights from within these lakes of data. Yet what they saw, what they found, didn’t make much sense. The problem wasn’t the data but the questions that people were asking. Pipelines built to carry data insights into the offices of stressed executives leaked insights from every conduit, but those insights lacked relevance. Where was the insight that would help the business make a decision that would positively impact the bottom line or customer engagement?
“Businesses were told that if they built these data centres and gained access to all this computational capacity that they could extract economic value from this data,” says Khoza.
“But when it came time to do this extraction, it couldn’t be done. The off-the-shelf solutions were incapable of dealing with the heterogeneity [differences] of the data. These collections of data across email, social media, and operations, that were different dependent on the organisation, were impossible to unify into single solutions. You cannot interpret the data-powered insights from a supply chain company against one that operates in financial services.”
What happened next? Companies started to invest in the potential abilities of emergent technologies such as machine learning (ML) and artificial intelligence (AI) – technologies capable of deep diving into the data and scouring the murky depths for even the tiniest grain of relevant insight. These technologies are essentially the pen needed to connect digital dots. Yet they too stumble at one hurdle – context. Is the data generated for the marketing department being interpreted by a data scientist who understands what marketing needs?
“Scilinx combines machine learning and science to find out what is happening,” says Tresia Holtzhausen, a member of the Scilinx Research team and a lecturer at Nelson Mandela University. “Everything around you is a system and these systems can be represented by networks and managed by ML and AI. We examine where we can improve and optimise data interpretation, using techniques that are not embedded in traditional mathematics but that emphasise the connection to maths and network science, to find out where to make the most improvement and to see what’s happening underneath the waters of big data.”
There is so much information. Vast quantities of data with no context, no point of reference, all gathered relentlessly from the moment that someone said, ‘Gosh, maybe this could be useful one day.’
Through the Scilinx Research work, the team has developed a prototype that can pull data from multiple sources such as Twitter, PDF documents, and emails, and build a picture of what is going on.
“We are taking particular datasets and applying a range of ML techniques, some we have developed from scratch, and seeing the results we get, then working out a systematic approach to integrate them,” says Khoza. “We draw a narrative across varied datasets and unlock the relationships hidden within.”
This research combines the information to create an analysis that allows the business to make systematic and relevant decisions. It helps the organisation to pull together data from multiple sources and spaces to create a coherent picture. From the broken cash machine (backend alerts) to the outraged consumer (tweets of fury), the data is collated with context and relevance to present an outline of the real business situation.
“Nobody has the right answer – we are all partially right and partially wrong,” concludes Khoza. “If, together, we can erase our biases and create a more accurate representation, then the data have inordinate value. Data allow us to understand why things happen, what people do, and why things have gone wrong. They allow the business to change and improve, to adapt to what the market wants. As we become increasingly adept at adding context to the data and asking it the right questions, the more we will see how everything is connected.”
The data mess is tidied up not with a plug in a panic, but by dropping a stone into a lake and watching the ripples as they expand outwards and influence markets, businesses, individuals and insights. There, within those myriad mixes of sentiment and data, lie the answers the business seeks, not trapped in numbers but revealed in relationships.
- Tamsin Oxford is a freelance journalist.
- This article first appeared in Curiosity, a research magazine produced by Wits Communications and the Research Office.
- Read more in the eighth issue, themed #Code, on how our researchers are exploring not only the Fourth Industrial Revolution manifestations of code, such as big data, artificial intelligence and machine learning, but also our genetic code, cryptic codes in queer conversation, political speak and knitting, and interpreting meaning through words, animation, theatre, and graffiti. We delve into data surveillance, the 21st Century ‘Big Brothers’ and privacy, and we take a gander at how to win the Lottery by leveraging the universal code of mathematics.