Sunday, February 28
  • Home
  • Archive
  • Search
Publish Articles Free
Like Haha Love Sad Angry
  • Home
  • Categories
    • Funny News
    • Viral
    • Featured
    • Gallery
  • Contact Form
user avatar
sign in sign up
A password will be e-mailed to you.
Lost password Register Login
or

3 Common Data Science Concepts tested on all interviews

Avatar Rakib Raihan
January 26, 2021
315 views

There are usually several concepts interviewers are testing for on data science interviews but since they might only have time to ask 1-2 questions, they’ll try to pack the concepts into one question. So it’s important to know what these concepts are so you can look out for them in an interview.

 

So what are they really testing for? Really what an interviewer is looking for are interviewee’s with an in-depth understanding of metric design and implementation of a real-world scenarios that would be present in the data. The key phrase here is “real-world scenario”, which means that there are probably going to be multiple edge cases and scenarios you’ll need to think through to solve the problem. There are 3 common concepts that they test for that test your understanding of how to implement code that solves real-world scenarios.

 

Since they only have time to ask 1-2 questions in an interview before their time is up, you’ll often see all 3 concepts wrapped in one question. I see this question, or a version of this question, (https://platform.stratascratch.com/coding-question?id=10300&python=) on almost every interview I’ve been on or given. Follow along with me and see if you would be able to answer this question.

 

The 3 concepts you need to know are CASE statements, JOINs, and subqueries/CTEs. Let’s go through a real interview question that cover these 3 concepts and talk about them in-depth. The link to the question is here ((https://platform.stratascratch.com/coding-question?id=10300&python=) if you want to follow along.

Aggregates from CASE STATEMENTs

You’ll likely get some sort of categorization question where you need to categorize data based on values you see in the table. This is super common in practice and you’ll likely always be categorizing and cleaning up data. So a CASE statement is the simplest technique to test for.

 

Add the addition of aggregates like sum() and count() and they’ll be testing to see if you actually know what is being returned in a case when, not just the implementation of it. Based on the case statements, you can always add an aggregate functions like a count or a sum.

 

Here is an example of a CASE statement with a simple aggregation in the SELECT clause for the question. 

 

You see in the CASe statement below, we’re categorizing users based on if they are paying customers or mot. We then apply a sum() as it’s a quick way to count the number of paying customers vs non-paying customers in one simple query. If we did not have the CASE statement, it would take us two queries to find both numbers.

 

SELECT date, sum(CASE

WHEN paying_customer = ‘yes’ THEN downloads

END) AS paying,

sum(CASE

WHEN paying_customer = ‘no’ THEN downloads

END) AS non_paying

FROM ms_user_dimension a

 

JOINs

The 2nd concept is JOINing tables. Can you join tables? This is the lowest bar you need to jump over to be an analyst, much less a data scientist. This bar is basically on the ground so you can really just step over it.

 

So on interviews — do they usually do a LEFT JOIN, CROSS JOIN, INNER JOIN? Most of your work will be using a LEFT JOIN so they’re testing you based on practicality. You’ll almost never use a cross join. You’ll use an inner join quite a bit but left join is slightly more complicated so they’ll use that just as an additional filter.

 

Self joins are common because it’s not always obvious you’d be using that. But they’re common in practice.

 

In the below example, we’re joining tables to the CASE statement. We’re joining two tables to our main table using a LEFT JOIN.

 

SELECT date, sum(CASE

WHEN paying_customer = ‘yes’ THEN downloads

END) AS paying,

sum(CASE

WHEN paying_customer = ‘no’ THEN downloads

END) AS non_paying

FROM ms_user_dimension a

LEFT JOIN ms_acc_dimension b ON a.acc_id = b.acc_id

LEFT JOIN ms_download_facts c ON a.user_id=c.user_id

GROUP BY date

ORDER BY date

 

Subquery/CTE

The last common concept is a subquery/CTE, basically some concept where you’re doing some work and then need to do more work on it. This is testing to see if you can break up your problem into logical steps. Some solutions take more than one step to solve so they’re testing to see if you can write code that follows a logical flow. Not necessarily complicated or complex, but multi-step and pragmatic. This is especially useful in practice because you’ll 100% be writing code that’s over hundreds of lines long and you need to be able to create solutions that follow a good flow.

 

In the below example, I’m taking the query we wrote above and putting it in a subquery so that we can query its data. This way we can apply an additional filter in the HAVING clause and keep the entire solution to one query.

 

SELECT date, non_paying,

paying

FROM

(SELECT date, sum(CASE

WHEN paying_customer = ‘yes’ THEN downloads

END) AS paying,

sum(CASE

WHEN paying_customer = ‘no’ THEN downloads

END) AS non_paying

FROM ms_user_dimension a

LEFT JOIN ms_acc_dimension b ON a.acc_id = b.acc_id

LEFT JOIN ms_download_facts c ON a.user_id=c.user_id

GROUP BY date

ORDER BY date) t

GROUP BY t.date,

t.paying,

t.non_paying

HAVING (non_paying – paying) >0

ORDER BY t.date ASC

 

Those are the three most common concepts that are tested during data science coding interviews. They’re common concepts because they appear on the job almost everyday, and to be a successful data scientist, you need to learn how to implement and code these solutions. If you want more information on these three concepts, feel free to watch the video here https://youtu.be/lG0PbUq4wkg

 

Categories: Technology
Tags: Data Science data science interviews

Leave a reply

Cancel reply

Your email address will not be published. Required fields are marked *

Post reactions
Like (1)
Haha (0)
Love (0)
Sad (0)
Angry (0)
Related Posts

Which is the best fitness band under 2000 in India?

February 26, 2021

Cloud Security Market Research Report, Size, Share, Industry Outlook – 2021-2027

February 26, 2021

Blockchain scalability issues and the solutions offered by Layer 2 protocols

February 26, 2021

How To Resolve Canon Printer Offline Issue? A Guide to Fix it

February 26, 2021

Why Wooden Watches are the Perfect Accessory

February 26, 2021

Is McAfee Live Safe Better Than Total Protection?

February 26, 2021
Post reactions
Like (1)
Haha (0)
Love (0)
Sad (0)
Angry (0)
Related Posts

Which is the best fitness band under 2000 in India?

February 26, 2021

Cloud Security Market Research Report, Size, Share, Industry Outlook – 2021-2027

February 26, 2021

Blockchain scalability issues and the solutions offered by Layer 2 protocols

February 26, 2021

How To Resolve Canon Printer Offline Issue? A Guide to Fix it

February 26, 2021

Why Wooden Watches are the Perfect Accessory

February 26, 2021

Is McAfee Live Safe Better Than Total Protection?

February 26, 2021
Recent Posts
  • Passive Authentication Market – Global Industry Analysis
  • Global Automotive Airbag and Seat Belt Market
  • Global Medical Equipment Maintenance Market
  • Five Roofing Tips for New Homeowners
  • Global Mobile Photo Printer Market : Industry Analysis
Archives
  • February 2021
  • January 2021
  • December 2020
Categories
  • Adult
  • Automotive
  • Business
  • Crypto
  • Education
  • Entertainment
  • Family
  • Featured
  • Finance
  • Fitnesses
  • Gallery
  • Health
  • Internet
  • Law
  • Press Releases
  • Technology
  • Travel
  • Uncategorized
  • Viral
Copyright 2020 © ArticlePole | All Rights Reserved.
  • Home
  • Archive
  • Search
We use cookies on our website to give you the most relevant experience by remembering your preferences and repeat visits. By clicking “Accept”, you consent to the use of ALL the cookies.
Do not sell my personal information.
Cookie settingsACCEPT
Manage consent

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may affect your browsing experience.
Necessary
Always Enabled

Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.

Non-necessary

Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.

SAVE & ACCEPT
Go to mobile version