The "big data mess" and how to clean it up

- Wits University

A rethink of operational processes as a complex system, and the application of machine learning as an adaptive analytical framework.

Phumlani Nhlanganiso Khoza, 3rd year PhD student and Associate Lecturer in the School of Computer Science and Applied Mathematics

A new research and design laboratory at Wits University with a unique commercialisation component to develop bespoke business solutions is rethinking how big data and advanced analytics impact on operational processes in organisations.

Founded in the School of Computer Science and Applied Mathematics (CSAM), the new Scilinx Research laboratory and the Scilinx Studio [please note this website is currently only accessible from outside the Wits University network] together form a cross-disciplinary research unit with a novel approach to operations research.

The Scilinx approach, “Advancing Operational Frontier”, gives businesses and organisations direct access to a dedicated research team through its commercial arm, the Scilinx Studio. The team has the relevant expertise to develop tailored, cutting-edge and cross-disciplinary business solutions in the Scilinx Research laboratory.

Through custom-built, machine-learning-enabled tools, its work is a revival of operations research, the application of advanced analytical methods to solve organisational problems. This approach connects the research directly with the nuts and bolts of organisational processes.

“Big data mess”

The brainchild of Phumlani Nhlanganiso Khoza, 3rd year PhD student and Associate Lecturer in the School of Computer Science and Applied Mathematics, Scilinx was conceptualised in response to a growing need among businesses and industries for high-level, specialised expertise to develop solutions for their agile, freely evolving operational systems and processes.

“There is a need to design tools that can extract insights while a system is naturally evolving and use these insights to improve performance, productivity and give the business a competitive edge,” says Khoza.

While this sounds easy and self-evident in principle, the emergence of big data has resulted in complex data sets or “big data mess”: large collections of data from varied sources with no prescribed structure.

“This is making it difficult for organisations to create a holistic view of all of its data. Incomplete data leads to inaccurate analysis and very few organisations know the extent to which their analysis and projections are based on incomplete data,” Khoza explains.

Big data sets and unstructured data have completely disrupted how we deal with more complex organisational processes in dynamic environments. “That is why we need cutting-edge research that advances operational frontiers in organisations, primarily by improving information flows,” says Khoza.

In the Scilinx Research laboratory, an interdisciplinary research team conducts fundamental research in operational complexity by using design thinking to solve business problems and propose innovative solutions.

Traditionally, operations research is conducted to improve the performance of organisations and businesses by using an engineering approach to problem-solving. The basic assumption in an engineered system is that its behaviour can be fully specified, and thus controlled as a result of this type of understanding. In reality, decision-makers rarely know fully what the system can do, or what a new element or interaction pattern will lead to. Fundamentally, as interconnectedness strengthens, predictions of what the outcome will be are increasingly losing efficacy.

“This is not the reality in solving real business problems today,” says Khoza. “Operational systems are constantly subjected to evolution and change due to human factors such as the behaviour of business operators and clients. When dealing with people and business processes you do not really know what is going to happen when you change an element in the process.” All of this motivates new approaches to process design and risk management.

Homegrown technologies

“Therefore, Scilinx is using the iterative bottom-up approach from design thinking, where the basic question is how to experiment and learn. You allow the system to operate as a natural system, and let it evolve. As the data gets messy and tangled, you grow technologies that are able to keep up with the messiness of the data.”

“For instance,” Khoza continues, “there is a saying that data scientists spend about 80% of their time cleaning up data, which is not really what they should be doing. We should be asking how to efficiently handle these types of problems.” Extracting operational insights from unstructured data can be a time-consuming and costly endeavour.

“One of the big problems with data is that you need to source data from people, so there’s still a considerable amount of data engineering work to be done in organisations. In a company, the data that is needed to perform analytics can sit in emails, spreadsheets, someone’s folders, it could be news feeds or social media feeds, and in many more platforms. Thus, one of our core research pillars is the development of machine learning tools that are coupled with well-defined data pipelines. The objective is to quickly and effectively extract the relevant insights and express them in formats that make it possible for various stakeholders, especially executives, to understand the behaviour and structure of their organisations. Our approach to this is the development of semantic networks, and we currently have research projects related to financial markets and Twitter information flows that are based on the construction of these semantic networks.”

The end of the age of silo organisations

The Scilinx Studio is an example of an organisational network where academic researchers and industry partners can come together to form collaborative links to accelerate experimentation in process and technological innovations.

“Firstly, there is an emerging approach to collaborate with domain-expert teams in academic institutions as a means to build organisational capacity and deepen specialisation. Scilinx Studio is drawing on expertise from various disciplines and Faculties at Wits University and beyond, in undergraduate and postgraduate studies. The fluid nature of these relationships not only de-risks experimentation for industry partners, but also enhances their agility while maintaining attractive operational costs.”

“Secondly, organisational competitive advantage should be a top priority in interconnected markets, and this cannot be achieved through the adoption of non-differentiating best practice methods. Scilinx Studio offers bespoke research solutions, and research insights that enjoy limited circulation. These informational offerings are a basis for proprietary strategies.”

“And thirdly, on the research front, our efforts at Scilinx Research are ongoing as we try to understand new problems and develop viable solutions that can be prototyped as well. Our objective is not just to discuss ideas, but to develop functional technologies,” explains Khoza.

Call for collaboration

Khoza says that, through the Scilinx Studio, his team is reaching out to organisations across all sectors, offering services that advance operational capabilities through cutting-edge research and bespoke business solutions.

The areas of focus are risk management, computational finance, and operational complexity as a general framework to enhance operational effectiveness. Current research initiatives cover the mining sector, financial markets, and the development of advanced insight-generating operational technologies.

Later this year, Scilinx Research and Scilinx Studio will host a closed event to showcase their innovation, research and prototypes to company, corporate and industry representatives.

Visit the Scilinx website: [Please note this website is currently only accessible from outside the Wits University network]

Follow on Twitter: @ScilinxResearch