Recently I was pointed to a nice article about the relationship between Statistics and data science. Here are my thoughts on it:
- First of all, Statistics is the science of data. It has five main components: data collection (design of experiments, sampling), data preparation (storage, reading, organization, cleaning), exploratory data analysis (numerical summarization, visualization), statistical inference (frequentist and Bayesian), and communication (interpretation).
- Over the past 50 years, statisticians have made the mistake of weighting the development of these five components extremely unequally, focusing mostly on the fourth one, statistical inference.
- Fortunately, the first component, data collection, is now seeing a resurgence in the era of massive data. How to sample the “influential” data points from a massive sample is a big and important research topic.
- People outside the traditional statistics community have been picking up the second and third components, as if adopting two underdeveloped children of statistics. Now the adoptive parents say that these two children are not statistics, and they call them data science instead.
- But Statistics is really about all five components, and all five are equally important.
- Our goal as statisticians is to bring these two children back into the statistics community. We are all statisticians!