My Data Science Philosophy
I am a simple man. I believe that ultimately, data science is about answering questions and providing value to the analytics consumer. At the end of any project, if the needs of the end analytics user aren’t being met, then we are failing in our task. This starting point is tool agnostic and fairly banal as philosophies go. However, it is only the beginning of my philosophy.
My philosophy boils down to five key points:
- Meet the needs of the end user
  a. Any solution must be the minimum to meet their needs
  b. Any solution must be delivered on a timeline that is reasonable
- Understand the user
- Involve the user
- Understand the environment
- Impact > “Cool”
  a. Don’t just think about “sexy” technology
  b. Ethics matter
Meet the needs of the end user
To be successful in data science, we need to meet the needs of the end user. If we aren’t answering the questions they have or supporting their work, then we are failing. To that end, we need to design our processes so that they are likely to lead to success. This is especially important given that a high percentage of data science projects end in failure, and it’s hard to say the needs of the end user are being met when that is the case. A good part of this, in my opinion, is that these projects often suffer from scope creep and fail to match what is possible with data science.
Part and parcel with this is ensuring that we meet their needs on a reasonable timeline. Analytics delivered long after they are needed are neither terribly impactful nor reliable. This of course demands that the user give us a reasonable amount of time to meet their needs. While business and organizational needs can shift, demanding that a complex analytics project be completed on a short turnaround is probably not reasonable, and such a project is unlikely to succeed.
Understand the user
For an analytics product to be impactful, it needs to meet the user where they are. This demands that we understand the user and their needs on a deep level. Who are they? What is their level of education with analytics? What is their comfort level? All of this needs to go into designing an analytics offering that they will be capable of using. The “perfect” analytics solution that the user doesn’t understand (or doesn’t know how to interact with) is hardly perfect. Thus, this is fundamentally a matter of user-centered design and of following the best practices found within that rich tradition.
Involve the user
For the user to be happiest, we need to communicate with them throughout the process of designing the analytics offering. We need to make sure they understand what we can and can’t do. Involvement makes it more likely that they get a solution that meets their needs, that is within what we can manage while balancing other demands, and, perhaps most importantly, that they trust. While few environments are quite like the medical field, the growing practice for analytics within healthcare is to involve practitioners in developing solutions so that they can trust them. This trust and understanding is important because along the way, we’ll be creating people who will act as cheerleaders for analytics within our organization. In the end, this will lead to a greater impact than we would have on our own.
Understand the environment
Successful analytics demands an understanding of the environment. It needs to take into account the organizational culture, the political reality, and power dynamics. It also needs to take into account extra-organizational factors: economic, political, and regulatory pressures can all have an impact. Don’t be the person who does something stupid by violating a law around data. The legal environment is an important one, and while you don’t have to be a lawyer (and likely aren’t), it’s important to understand what’s going on.
Impact > “Cool”
The impact of analytics is more important than using “cool” solutions. If a simpler model can deliver the same impact as a more complex model, then there is no need for the complex solution. Similarly, if all that is needed is a few visualizations, then building a model is overkill. Just because we can do something doesn’t mean that we should. Since our focus is on impact, we shouldn’t chase after the latest shiny toy in analytics. Because of this, my definition of data science is broader than some and includes any and all solutions to data-centric problems.
Going along with this idea of impact, we need to think about our impact in other ways. Ethics matter both in the present and in the future. We need to ensure that our work is ethical and that it maximizes our positive impact on the world while minimizing the negative. This demands that we think broadly about the impact of our offerings on all stakeholders. Analytics can have an outsized impact on people’s lives, and we need to keep that in mind as we do our work. Ethics isn’t just something someone has; it’s what we do, and it needs to permeate our work.