This might sound as silly as the same question asked about physics or chemistry. Or not? I am not going to speculate about the peaking of interest in various disciplines in certain time periods. The nearest analogy to what is happening with data science is what happened to applied math – at least the part of it related to solving differential equations. In the applied math world, this part eventually became known as ‘computer modeling’. The discipline that was a science for centuries now solves engineering tasks.
For example, imagine you want to design a new water pump. The equations that describe water flow through a network of pipes are not easy to solve. However, with modern software, a user does not even have to know these equations. Literally anyone can buy engineering software with a library of typical pumps, set the desired parameters, and let the software generate an answer.
Similar things are happening with data science. In fact, they have already happened. I am a very down-to-earth person. All of you know that Siri and the rest of the AI assistants are already ‘alive’. Most of you know that ‘cognitive computing’ is here. Some of you know that some software packages are so powerful that they can automate a substantial amount of a data scientist’s work. And all data scientists know that growing a decision tree is often nothing more than executing a line of code. But here’s what triggered me to write this post.
Lately, I have come to feel that an average Joe can generate a reasonably good predictive model without knowing much about the math behind it. Today, most data scientists are not really scientists, but engineers. I’m not talking about mathematicians who work in the field of machine learning and statistics. It takes years to get through all the theory, and then years of work ‘in the field’, to be able to create a better solution to a problem. But for the most part, engineering solutions have become affordable and accessible, both mentally and financially. The following example explains the point.
Let’s say you want to predict the churn of customers in your organization. You have a list of records with customer attributes, and a flag that shows whether the customer is still with you or has left for some reason. You want to know why the customers are leaving and how to prevent it. For that, you can download a demo version of RapidMiner and create a solution using a tutorial and a subset of your customer list small enough to fit the terms of the free trial. After the software suggests a sequence of data transformations and the best algorithm for your task, you only need to write it down and recreate the sequence in free software like R. That’s it. Without much thinking and at no cost, an average Joe can create a machine learning model and prove his data science title to his boss.
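To give a feel for how little code such a model can take, here is a minimal sketch in plain Python of a one-split decision tree (a "stump") fitted to churn records. Everything in it is an assumption made for illustration: the records, the feature names `months_as_customer` and `support_tickets`, and the helper functions are hypothetical, and a real tool would try far more than a single split.

```python
from collections import Counter

# Hypothetical records: (months_as_customer, support_tickets, churned flag)
records = [
    (2, 5, 1), (3, 4, 1), (5, 3, 1), (24, 0, 0),
    (36, 1, 0), (12, 1, 0), (4, 6, 1), (30, 2, 0),
    (6, 4, 1), (18, 0, 0),
]
FEATURES = ["months_as_customer", "support_tickets"]


def majority(labels):
    """Most common label in a list (0 if the list is empty)."""
    return Counter(labels).most_common(1)[0][0] if labels else 0


def fit_stump(rows):
    """Try every feature/threshold split and keep the first one with the fewest errors."""
    best = None
    for f in range(len(FEATURES)):
        for threshold in sorted({r[f] for r in rows}):
            left = [r[-1] for r in rows if r[f] <= threshold]
            right = [r[-1] for r in rows if r[f] > threshold]
            left_label, right_label = majority(left), majority(right)
            errors = (sum(y != left_label for y in left)
                      + sum(y != right_label for y in right))
            if best is None or errors < best[0]:
                best = (errors, f, threshold, left_label, right_label)
    return best[1:]  # (feature index, threshold, left label, right label)


def predict(stump, row):
    """Apply the single learned split to one customer record."""
    f, threshold, left_label, right_label = stump
    return left_label if row[f] <= threshold else right_label


stump = fit_stump(records)
print(f"split on {FEATURES[stump[0]]} <= {stump[1]}")
print(predict(stump, (3, 2)), predict(stump, (20, 0)))
```

The learned split doubles as a crude answer to "why are customers leaving": on this toy data, the customers who leave are the ones with a short tenure.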
Of course, this example illustrates a lousy use of predictive analytics. Most likely the model will not work well enough, since the art of application only comes with practice. But the tools are there. Building a Bayesian classifier is not data science anymore – at least for those who don’t know that Laplace gave Bayes’ rule its mathematical formulation.
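As an illustration of how small a Bayesian classifier can be, the following is a rough sketch of a naive Bayes model with add-one (Laplace) smoothing, written in plain Python. The training rows, the feature meanings (contract type, whether the customer complained) and the function names `fit` and `posterior` are all hypothetical, and the "naive" assumption that features are independent within each class is a simplification, not something the data guarantees.

```python
from collections import Counter, defaultdict

# Hypothetical training data: ((contract_type, has_complained), outcome)
train = [
    (("monthly", "yes"), "left"),
    (("monthly", "yes"), "left"),
    (("monthly", "no"), "left"),
    (("yearly", "no"), "stayed"),
    (("yearly", "no"), "stayed"),
    (("yearly", "yes"), "stayed"),
    (("monthly", "no"), "stayed"),
]


def fit(rows):
    """Count classes and, per class, how often each feature value occurs."""
    class_counts = Counter(label for _, label in rows)
    value_counts = defaultdict(Counter)  # keyed by (feature index, label)
    values = defaultdict(set)            # distinct values seen per feature
    for features, label in rows:
        for i, v in enumerate(features):
            value_counts[(i, label)][v] += 1
            values[i].add(v)
    return class_counts, value_counts, values


def posterior(model, features):
    """Bayes' rule: P(class | features) is proportional to P(class) * prod P(feature | class)."""
    class_counts, value_counts, values = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        score = n / total  # prior P(class)
        for i, v in enumerate(features):
            # Add-one smoothing keeps unseen values from zeroing the product
            score *= (value_counts[(i, label)][v] + 1) / (n + len(values[i]))
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


model = fit(train)
probs = posterior(model, ("monthly", "yes"))
print(max(probs, key=probs.get), probs)
```

On this toy data a monthly customer who has complained comes out as more likely to leave than to stay. Whether such a number means anything on real data is exactly the part that still takes practice.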