Ramgopal Prajapat:

Learnings and Views

6 Hats of a Data Scientist

By: Ram on Oct 02, 2022

Given the curiosity and interest around AI and ML, people have a long list of questions about data science and data scientists. In this blog, I share my perspective on:

  • What does a data scientist do?
  • What is a data scientist?
  • What does a data scientist do daily?
  • What are data scientist skills?

The role of a data scientist changes over the data science project life cycle. Below we depict the key activities a data scientist is involved in at each stage.

Data Science Project Life Cycle

  1. Business Problem Identification & Business Case Development: A Consultant
  2. Hypothesizing, Data Identification and Data Preparation: Data Engineer and Data Tester
  3. Model Development and Validation: Data Scientist and Model Tester
  4. Communicating Model and Its Value: Data Storyteller
  5. Model Deployment and Scaling: ML Ops Engineer

Next we describe the hats a data scientist wears while playing these different roles. We call these the 6 hats of a data scientist.

Consultant

Before a data science project is picked up for development, the data scientist needs to understand the business context and formulate a business problem statement. We also need to define an overall approach to solving the business problem using machine learning and data science methodologies. If there are multiple potential use cases, we may need to estimate the expected business value of each to prioritize the right ones.

  • Identify business problems
  • Prepare list of use-cases
  • ML based solution approach
  • Create a business case
  • Estimate value: cost savings or incremental revenue
  • Prioritize right use-case
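
As a rough illustration of the last two bullets, the sketch below ranks candidate use cases by expected net value. The use-case names, all figures, and the `success_probability` field are hypothetical placeholders, and the scoring rule (probability-weighted benefit minus build cost) is just one simple assumption; a real business case would likely consider more factors.

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    annual_benefit: float       # estimated cost saving or incremental revenue
    build_cost: float           # estimated cost to develop and deploy
    success_probability: float  # rough chance the model delivers the benefit

    def expected_net_value(self) -> float:
        # Probability-weighted benefit minus the cost of building the solution.
        return self.annual_benefit * self.success_probability - self.build_cost


def prioritize(use_cases):
    # Highest expected net value first.
    return sorted(use_cases, key=lambda u: u.expected_net_value(), reverse=True)


# Hypothetical candidates; all numbers are made up for illustration.
candidates = [
    UseCase("churn prediction", 500_000, 120_000, 0.6),
    UseCase("invoice fraud detection", 900_000, 400_000, 0.4),
    UseCase("demand forecasting", 300_000, 50_000, 0.8),
]

ranked = prioritize(candidates)
for uc in ranked:
    print(f"{uc.name}: {uc.expected_net_value():,.0f}")
```

With these made-up inputs, the cheap and likely-to-succeed use case ranks above the one with the largest headline benefit, which is the kind of trade-off the prioritization step is meant to surface.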

Data Engineer

Once a project is finalized, the work begins, and one of the first steps is to think about data. A data scientist should plan the data requirements and prepare the data for model development.

  • Hypothesize data requirements
  • Data source identification
  • Data Preparation
  • Exploratory Data Analysis (EDA)
  • Data Quality assessment

Data Tester

Critically evaluating the data and the steps at every phase is required. Checking whether the input data is good enough to use, and whether the output of every step is logically correct, is a fundamental expectation of a data scientist.

  • Evaluating data source quality
  • Checking for logical data structure – unit of modelling or data analysis
  • Testing data output at every stage of data preparation
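
The checks listed above can be sketched in a few lines of plain Python. This is a minimal example under stated assumptions: the `check_rows` helper, the column names, and the sample rows are all hypothetical, and the table is assumed to have one row per `customer_id` as its unit of analysis.

```python
from collections import Counter


def check_rows(rows, key, required):
    """Return simple data-quality findings for a list of dict rows."""
    # Count rows where a required field is missing or empty.
    missing = {
        col: sum(1 for r in rows if r.get(col) in (None, ""))
        for col in required
    }
    # The unit of analysis should appear exactly once per key value.
    key_counts = Counter(r.get(key) for r in rows)
    duplicate_keys = sorted(k for k, n in key_counts.items() if n > 1)
    return {
        "row_count": len(rows),
        "missing": missing,
        "duplicate_keys": duplicate_keys,
    }


# Hypothetical customer-level table: one row per customer_id is expected.
rows = [
    {"customer_id": 1, "age": 34, "spend": 120.0},
    {"customer_id": 2, "age": None, "spend": 80.5},
    {"customer_id": 2, "age": 41, "spend": 80.5},
    {"customer_id": 3, "age": 29, "spend": ""},
]

report = check_rows(rows, key="customer_id", required=["age", "spend"])
print(report)
```

Running the same check after each data preparation step makes it easier to see at which stage missing values or duplicated units were introduced.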

The testing role is also critical once the model is developed, when the focus shifts to validating the model's performance.

  • Data Science/ML Models Validation

Once the model is deployed, the data scientist is again in the best position to evaluate whether everything is working as expected.

Data Scientist

The core role of a data scientist is to develop ML models. Before model development, this involves exploring the different ML techniques that can be used and architecting the overall approach to developing the data science model.

  • Creating overall solution for a business problem
  • Research and Explore ML Algorithms
  • Develop ML Models
  • Parameter Tuning
  • Validate and Review Model Performance & Results
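
To make the parameter tuning and validation bullets concrete, here is a small sketch that tunes one parameter against a held-out validation set. The choice of a k-nearest-neighbours classifier, the synthetic data, the 140/60 split, and the candidate grid are all assumptions made for illustration; in practice a library such as scikit-learn would usually handle this.

```python
import random


def knn_predict(train, x, k):
    # Majority vote among the k training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    votes = sum(label for _, label in nearest)
    return 1 if votes * 2 > k else 0


def accuracy(train, holdout, k):
    # Share of held-out points whose predicted label matches the true label.
    hits = sum(1 for x, y in holdout if knn_predict(train, x, k) == y)
    return hits / len(holdout)


# Synthetic 1-D data: the label is 1 when a noisy copy of x exceeds 0.5.
rng = random.Random(0)
data = []
for _ in range(200):
    x = rng.random()
    data.append((x, 1 if x + rng.gauss(0, 0.15) > 0.5 else 0))

train, valid = data[:140], data[140:]

# Parameter tuning: try each candidate k and keep the best validation score.
grid = [1, 3, 5, 9, 15]
scores = {k: accuracy(train, valid, k) for k in grid}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```

Because the validation set was used to pick `best_k`, its score is an optimistic estimate; a final review of model performance would normally use a separate test set that played no part in tuning.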

Data Storyteller

Once a model is developed, we need to share and communicate the results to key business stakeholders. This helps in getting the required sponsorship and visibility for the project.

  • Telling Story from the data and results
  • Get strategic buy-in for the models and their importance

ML Ops Engineer

Once a model is developed and has received approval from the key business stakeholders, we need to work with the technology team to get it deployed, i.e. put into production. In some large organizations, an ML Ops team may take care of deploying the models. Even so, ML Ops skills help data scientists deploy models themselves or get them deployed.

  • Prepare plan for deploying the models
  • Prepare end-to-end data pipelines and decide on refresh/scheduling and the monitoring structure for the model(s)
  • Work with Platform and Engineering teams to get the ML Model deployed
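
One possible building block for the monitoring structure mentioned above is a drift check that compares production inputs or scores against the training-time distribution. The sketch below uses the population stability index (PSI), which is a common choice but an assumption here, not something this post prescribes; the `psi` helper, the bin edges, and the sample scores are hypothetical.

```python
import math


def psi(expected, actual, edges):
    """Population stability index between two samples over shared bins."""
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges that v is greater than or equal to.
            counts[sum(1 for e in edges if v >= e)] += 1
        # Floor at a small value so empty bins do not cause log(0).
        return [max(c / len(values), 1e-6) for c in counts]

    exp, act = shares(expected), shares(actual)
    return sum((a - e) * math.log(a / e) for e, a in zip(exp, act))


# Hypothetical model scores at training time and after a shift in production.
training_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
production_scores = [s * 0.5 for s in training_scores]
edges = [0.25, 0.5, 0.75]

same = psi(training_scores, training_scores, edges)
shifted = psi(training_scores, production_scores, edges)
print(same, shifted)
```

An identical distribution gives a PSI of zero, and the value grows as the two distributions diverge. A scheduled job could compute this at every refresh and raise an alert when it crosses a threshold agreed with the platform team.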

Concluding Thoughts

I hope you find these views helpful in understanding the role a data scientist plays in an organization, along with some of the key skills required to do the job. To carry out these activities, you may use a single tool or multiple technical tools and platforms.
