The industrial world gave rise to industrial research labs where innovation and quality control took place. In today’s data-driven century, the rise of the Data Lab is just as evident. Remember Bell Labs or the NatLab of Philips? They were the innovative workspaces that attracted the great minds of the last century and where orbit-shifting innovations took place. In today’s data-savvy century, companies and governments alike have been seeking to explore, find and execute data-driven improvements ever since Thomas H. Davenport took analytics to the management level as the new science of winning.
Data Scientists became the new rock stars, and their scarcity is evident. So why scatter their talent over various single laptops? Why waste the track record of every data experiment when a Data Scientist decides to leave for a new horizon? Why not have reproducibility and version control in place to meet regulatory requirements, or to answer "how did you do this?" questions from the board? And why does it take ages to put your new predictive model into production?
There’s a Data Science platform rising and its name is Data Lab
As a concept, the Data Lab can be regarded from both a technical and an organizational perspective. As usual, the process level gives the most clarity on what it should entail, so let’s have a look at the elementary process steps that every Data Scientist takes today:
The CRISP-DM diagram was created over 20 years ago to popularize the then nerdy niche of Data Mining, and its six elementary steps are still at the core of what every Data Scientist does every day. The dialogue between Business Understanding and Data Understanding is central to grasping every Data Science challenge and defining a meaningful scope. The next steps of Data Preparation, Modeling and Evaluation are the essence of every Data Scientist's job. Deployment is usually the area where a Data Engineer steps in.
So how do we envision a new platform that supports Data Science in every step? At Xomnia we took a good look at all the functionality needed to support every step of this Data Science process and came up with this schema:
A Data Lab should support every step of the Data Science process
So for starters, a Data Lab should be much richer than just a Hadoop cluster. It should at least contain functionality to collaborate and to test and evaluate predictive models, all on a secured platform. Not least, it should include a pre-production facility to test-drive the business value of the predictive model created. And wouldn’t it be nice if these functional building blocks were integrated in a cohesive way to provide an almost seamless user experience? This is exactly what Xomnia has been looking for around the world. We're delighted with what we've found, and we'll share it with you soon!