How to learn data science in 2021 (resources to get a job)
I want to focus on resources you can use to improve your technical skills and, more specifically, resources to help you get your first job in the field, because getting that first job is the hardest part. Once you have it, you’ll learn the skills you need so fast that you won’t need people like me giving you advice.
It’s really hard to learn data science and actually be good at it, because there’s a long laundry list of things you need to know to be a good data scientist. Some of those topics are:
- Programming skills for data analysis and machine learning
- Statistics, probability, and machine learning theory
- Product sense and business knowledge
If you look at those 3 topics, it’s almost like you need 3 different people. You have a software engineer, a mathematician, and a business person with an MBA.
So how do you learn data science well enough to actually land a job? Let’s take a deeper dive into this topic. You can also watch the video version of these tips, linked at the end of this article.
So let’s get into these 3 topics and talk about resources that can help you improve your skill set.
Programming
Programming is probably the hardest and most time-consuming of the three to learn. What’s hard about programming isn’t learning the syntax of, say, SQL and Python; it’s learning how to approach a problem and implement a solution.
Within programming, you have data analysis and machine learning.
Data analysis is all about being able to pull and manipulate data, and to generate insights and recommendations from it. You’ll need to know both SQL and a scripting language, usually Python or R.
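As a minimal sketch of what "pull and manipulate data" can look like in practice, the example below uses Python’s built-in sqlite3 module to run a SQL aggregation and then post-processes the result in Python. The `orders` table, its columns, and the sample rows are hypothetical, made up purely for illustration.

```python
import sqlite3

def revenue_by_user(rows):
    """Load hypothetical order rows into an in-memory SQLite table and
    return total revenue per user, highest first."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (user_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    # SQL does the pulling and aggregating.
    query = """
        SELECT user_id, SUM(amount) AS revenue
        FROM orders
        GROUP BY user_id
        ORDER BY revenue DESC, user_id ASC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

# Made-up sample data: (user_id, order amount)
sample_orders = [(1, 20.0), (2, 5.0), (1, 15.0), (3, 40.0), (2, 10.0)]
totals = revenue_by_user(sample_orders)

# Python does the follow-up manipulation: share of revenue from the top user.
top_user, top_revenue = totals[0]
top_share = top_revenue / sum(revenue for _, revenue in totals)
print(totals)
print(f"User {top_user} accounts for {top_share:.0%} of revenue")
```

Interview-style questions often have this shape: aggregate with SQL, then reason about the result.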
Everyone’s going to tell you to do projects to get better. I agree with them and will talk about projects more later in this article, but let me give you another piece of advice: try doing interview questions. What better way to succeed in an interview than by practicing interview questions while you get better at data analysis? The main benefit is that you’re solving problems that are relevant to real industries and companies. So when you’re interviewing, you’ll be ready to answer most questions because you’ll have practiced the technical skills that companies want you to have before you work for them.
There are so many platforms out there that can help with interview coding practice. The most popular is LeetCode. You probably know this already. But it’s tailored for software engineers so take that with a grain of salt when you’re doing the problems. There’s also StrataScratch, which is a platform designed specifically for data scientists.
So for data analysis, I’d suggest really mastering both SQL and either Python or R, and doing as many relevant interview questions as possible to understand what companies are looking for in candidates.
Implementing machine learning models is another programming skill you need to have. You’ll usually need to know Python or R really well and understand the data science workflow to build and implement these models. This is where I’d recommend doing projects. There are many places to find projects, such as Kaggle. Find a project there, grab the dataset, install Jupyter Notebook, do the project, and try talking to people to see what you can do to improve.
Another resource is confetti.ai, which has machine learning questions to help you get better at implementing machine learning models. They have practical examples that require coding as well as theoretical questions to help you understand what the model is actually doing.
Learning how to implement machine learning models is probably where I’d spend most of my time, to be honest. That’s not because you’ll be implementing ML models every day as a data scientist. It’s because it teaches you the data science workflow: pulling data, manipulating data, feature engineering, model implementation, model optimization, and making a recommendation. Being good at that workflow, and understanding why you’re making certain decisions and why you’re making a recommendation, is something you’ll do every day on the job, so you need to be really good at it. It takes a long time to get good at.
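To make that workflow concrete, here is a deliberately tiny, standard-library-only sketch of the steps: pulling data, cleaning it, engineering a feature, fitting a model, checking it against a baseline, and turning the result into a recommendation. The dataset, the column meanings, and the helper names (`fit_line`, `mean_abs_error`) are all hypothetical. A real project would typically use libraries such as pandas and scikit-learn, the one-feature least-squares fit only stands in for model implementation, and model optimization is left out to keep the example short.

```python
from statistics import mean

# 1. Pull data: hypothetical rows of (ad_spend, visits, signups); None marks a missing value.
raw = [
    (100, 1000, 52), (200, 1900, 98), (150, None, 70), (300, 3100, 149),
    (250, 2400, 127), (400, 4100, 198), (350, 3400, 176), (500, 5000, 251),
]

# 2. Manipulate data: drop rows with missing values.
clean = [row for row in raw if None not in row]

# 3. Feature engineering: visits in thousands is the single feature, signups the target.
features = [visits / 1000 for _, visits, _ in clean]
targets = [signups for _, _, signups in clean]

# Hold out the last two rows to evaluate the model on data it has not seen.
x_train, y_train = features[:-2], targets[:-2]
x_test, y_test = features[-2:], targets[-2:]

def fit_line(xs, ys):
    """4. Model implementation: one-feature ordinary least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def mean_abs_error(predictions, actuals):
    """Average absolute difference between predictions and actual values."""
    return mean(abs(p - a) for p, a in zip(predictions, actuals))

slope, intercept = fit_line(x_train, y_train)

# 5. Model evaluation: compare against a baseline that always predicts the training mean.
model_mae = mean_abs_error([slope * x + intercept for x in x_test], y_test)
baseline_mae = mean_abs_error([mean(y_train)] * len(y_test), y_test)

# 6. Recommendation: only trust the model if it beats the baseline.
if model_mae < baseline_mae:
    print(f"Each extra 1,000 visits is associated with about {slope:.0f} more signups.")
else:
    print("The model does not beat the baseline; revisit the features.")
```

The point of the sketch is the order of the steps and the habit of justifying each decision, not the particular model.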
Statistics / Probability
Data science is statistics in a nutshell. If you’re not implementing an ML model or regression, or designing experiments, then you’re an analyst, not a data scientist. Everything’s statistics, so let’s break down how to better understand statistics for data science.
I just talked about implementing ML models, right? So, what are ML models? They’re just statistical models, and as someone who builds them, you’d want to know how they actually work. So as I was doing projects and building out my models, both ML and regression models, I was reading about the underlying theory and math. That helped me understand each model’s underlying assumptions, which helped me clean my data and design my features better. This, in turn, helped me develop more accurate models. Interviewers will almost certainly ask you about ML and regression theory, because if you don’t know why you’re doing what you’re doing, no one can trust your results and recommendations.
So where do you go to learn about ML and regression theory? The best resources I’ve found are through Google searches that might take you to Medium or Wikipedia or some other authoritative site you trust. It doesn’t really matter which site, as long as you trust the content. You should read the articles to get a better understanding of the underlying theory.
One site I used a lot, specifically for interview practice, is Brilliant.org. This site is good because its questions are similar to questions you might get in an interview. Just like you’d use LeetCode or StrataScratch to get better at programming, you can use Brilliant.org to get better at statistics and probability.
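As an illustration of the kind of probability question this practice prepares you for (a made-up example, not taken from Brilliant.org or any real interview), consider: what is the probability of rolling at least one six in four rolls of a fair die? The sketch below computes the exact answer with the complement rule and sanity-checks it with a quick simulation. The function names and the number of trials are arbitrary choices.

```python
import random

def exact_at_least_one_six(rolls=4):
    """P(at least one six) = 1 - P(no six in any roll)."""
    return 1 - (5 / 6) ** rolls

def simulated_at_least_one_six(rolls=4, trials=100_000, seed=0):
    """Estimate the same probability by simulating many games."""
    rng = random.Random(seed)
    hits = sum(
        any(rng.randint(1, 6) == 6 for _ in range(rolls))
        for _ in range(trials)
    )
    return hits / trials

exact = exact_at_least_one_six()
estimate = simulated_at_least_one_six()
print(f"exact={exact:.4f}, simulated={estimate:.4f}")
```

Checking a closed-form answer against a simulation is a useful habit when you’re unsure of your reasoning.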
So in summary, learning statistics and probability is a matter of:
- Learning the theory behind ML models and regression. You can do this through projects where you’re implementing models and reading about the underlying theory of each model.
- Getting good at interview questions. You can get better at this through platforms that specialize in stats, like Brilliant.org.
Product sense
Product sense is a non-technical skill that you’ll need to learn to be a data scientist. What is product sense? It’s similar to product management: looking at a problem and making decisions through a business lens.
It deals with questions like:
- How would you measure the success of different parts of the product?
- How would you tell if a product is performing well or not?
Why do you need to know this information as a data scientist? Because it helps you figure out how to approach and analyze a problem, in order to make a recommendation to solve the problem. If you’re not optimizing for the business/product, you’re optimizing for your model, and you don’t need to make your model 100% accurate in order to drive business impact.
How do you get better at product sense? For me, it was reading product management case studies to understand how PMs think and make decisions. Besides case studies, there are also videos and platforms. YouTube has free PM videos, and there’s a popular channel and platform called Exponent where you can learn about product management. PM skills translate very well to data science product sense.
To summarize, I learned data science by breaking it down into these topics:
- Programming
- Statistics and probability
- Product sense
Within both programming and statistics, you have machine learning and regression: you need to learn both how to implement the models and the theory behind them. So there are really 4 different topics that a data scientist should know. It’s hard, and it takes a while to get good at all of them.
Take a look at the resources. I’ve used all of them in the past and found them valuable in my journey to becoming a data scientist. If you’d like to watch the video version of this article, head over here: https://www.youtube.com/watch?v=0GpgMvyN0Fg