29 September 2022
(This article is best viewed on a desktop or laptop screen, not on mobile.)
A young gentleman called Jacob reached out to me yesterday, because he had questions about a course called “Foundations in AI” which he is doing under the AI for Industry (AI4I) learning programme by AI Singapore (AISG). He contacted me via my Discord group, AI Learners, because he wanted advice from someone who he thought had already finished the course.
(Funny thing is, he didn’t know that I got stuck. He thought I had already completed “Foundations”, but in fact I was just a little bit ahead of him.)
It turned out that I was able to give him exactly the advice he wanted, because I had got stuck at precisely the same point where he is stuck now, so I could save him some time by sharing how I got myself unstuck. I decided to post this article because it may help others who might be in the same boat.
The precise point where both of us got stuck is in the Datacamp course called “Preprocessing for Machine Learning in Python”, Chapter 2: Standardizing Data, at an exercise called “Modeling without normalizing”.
To quote the instructions for this exercise:
The scikit-learn model training process should be familiar to you at this point, so we won’t go too in-depth with it. You already have a k-nearest neighbors model available (knn) as well as the X and y sets you need to fit and score on.
The thing is, if you’ve been following the sequence of courses under “Foundations”, at this point you would not yet have been exposed to scikit-learn or k-nearest neighbors. You wouldn’t know what the heck to do or how to carry on at this point.
You would be thinking, Wait a minute, something’s wrong here!
The answer to your confusion lies on the main page of the course. Just scroll down a bit and look in the right-hand-side column. It says “PREREQUISITES”.
Under the Prerequisites, you’ve done “LIB-1: Cleaning Data in Python”, but you haven’t done “SUP-2: Supervised Learning with scikit-learn” yet. That’s actually further down in “Foundations”.
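For anyone who hasn't met scikit-learn yet, here is a rough idea of what "fit and score" means in that exercise. This is only a minimal sketch, not the Datacamp solution: in the exercise, `knn`, `X` and `y` are already provided, whereas here I am assuming scikit-learn's bundled wine dataset as a stand-in and creating the model and the train/test split myself.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled wine dataset stands in for the exercise's X and y.
X, y = load_wine(return_X_y=True)

# Hold out part of the data so the model is scored on rows it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=42
)

# In the exercise, knn is already created for you; here we create it ourselves.
knn = KNeighborsClassifier()

# "Fit" = learn from the training rows; "score" = accuracy on the test rows.
knn.fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)
print(f"Test accuracy without normalizing: {accuracy:.3f}")
```

The pattern (create a model, call `fit`, then call `score`) is what “SUP-2: Supervised Learning with scikit-learn” teaches, which is why the exercise feels impossible if you haven't done that course yet.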
Here’s the thing. When you sign up for the AISG premium membership to do AI4I, it comes with a one-year subscription to Datacamp (using a discount code which they send to you). Most of the courses under “Foundations” are actually Datacamp courses. “Foundations” basically curates Datacamp courses for you to do in the prescribed sequence.
But why is “Foundations” asking you to do a Datacamp course which has a prerequisite which “Foundations” is asking you to do AFTER, not BEFORE?!! That doesn’t make much sense, and in fact causes frustration for learners.
With all due respect to the AISG folks, I think it might be because they learned all this ML stuff eons ago, so there’s no need for them to do any of the Datacamp courses themselves. They probably decided to curate the Datacamp courses in a sequence which they think makes sense to them, and didn’t notice that Datacamp has its own logical sequence with course prerequisites.
I first noticed something was amiss when I was doing the second set of courses under “Foundations”:
AI4I-2: Libraries and Data Manipulation
- LIB-1: Cleaning Data in Python
- LIB-2: Manipulating Time Series Data in Python
- LIB-3: Manipulating DataFrames with pandas
- LIB-4: Merging DataFrames with pandas
The 2nd course (time series) was suddenly a lot harder (it actually took me 5 weeks!), but after that, the 3rd and 4th courses were a lot easier. In fact, the 3rd and 4th courses would have helped with the 2nd course if I had done them first. (I didn’t notice the prerequisites!)
It was like being asked to climb a towering brick wall, only to find, after getting over it with great effort, that there was a ladder on the other side of the wall!
If I could go back in time and do this set again, I would follow the proper logical sequence of progression, which is this:
1. LIB-1: Cleaning Data in Python
2. LIB-3: Manipulating DataFrames with pandas
3. LIB-4: Merging DataFrames with pandas
4. LIB-2: Manipulating Time Series Data in Python
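If you like to think in code, reordering courses by their prerequisites is just a topological sort. Here is a minimal sketch using Python's standard `graphlib` module (Python 3.9+). The prerequisite links in the dictionary are my own assumption, based on which courses I found helpful for which; they are not copied from the Datacamp course pages.

```python
from graphlib import TopologicalSorter

# Hypothetical prerequisite map: course -> courses to finish first.
# These links are an assumption for illustration only.
prerequisites = {
    "LIB-1": set(),               # Cleaning Data in Python
    "LIB-3": {"LIB-1"},           # Manipulating DataFrames with pandas
    "LIB-4": {"LIB-3"},           # Merging DataFrames with pandas
    "LIB-2": {"LIB-3", "LIB-4"},  # Manipulating Time Series Data in Python
}

# static_order() yields every course only after all of its prerequisites.
order = list(TopologicalSorter(prerequisites).static_order())
print(order)
```

You could extend the dictionary with the rest of the “Foundations” courses and the prerequisites shown on each Datacamp course page to get a workable sequence for the whole programme.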
This isn’t the only part of “Foundations” which is out of sequence. Many of the other courses are also jumbled up.
So, what’s the solution?
It took me a long time to discover that Datacamp has this career track called “Machine Learning Scientist with Python”.
If you compare the “Machine Learning Scientist with Python” course listing (which has the proper logical sequence of progression) and the “Foundations in AI” course listing (which has the wrong sequence of progression), you will realize that there is a lot of overlap.
I’ll save you some time by listing both below for you to compare.
First, the “Foundations in AI” courses. Most of these courses overlap with the Datacamp career track. I indicate the non-overlapping courses with a * mark:
AI4I: "Foundations in AI" course listing (2021)
(* = not found in Datacamp)

AI4I-1: Introduction to Python
- IPY-1: Introduction to Python
- IPY-2: Intermediate Python
- IPY-3: Python Data Science Toolbox (Part 1)
- IPY-4: Python Data Science Toolbox (Part 2)

AI4I-2: Libraries and Data Manipulation
- LIB-1: Cleaning Data in Python
- LIB-2: Manipulating Time Series Data in Python
- LIB-3: Manipulating DataFrames with pandas
- LIB-4: Merging DataFrames with pandas

AI4I-3: Exploratory Data Analysis
- EDA-1: Introduction to Data Visualisation in Python
- EDA-2: Preprocessing for Machine Learning in Python
- EDA-3: Feature Engineering for Machine Learning in Python
- EDA-4: Feature Engineering for NLP in Python

AI4I-4: Statistical Thinking
- STAT-1: Statistical Thinking in Python (Part 1)
- STAT-2: Statistical Thinking in Python (Part 2)
- STAT-3: Dimensionality Reduction in Python

AI4I-5: Supervised Learning
- SUP-1: Setting up your Machine Learning Environment *
- SUP-2: Supervised Learning with scikit-learn
- SUP-3: Regression *
- SUP-4: Classification *
- SUP-5: Machine Learning with Tree-based Models in Python
- SUP-6: Model Validation in Python

AI4I-6: Unsupervised Learning
- UCLU-1: Unsupervised Learning in Python
- UCLU-2: Cluster Analysis in Python
- UCLU-3: Unsupervised Learning *

AI4I-7: Deep Learning
- DPL-1: Introduction to Deep Learning in Python
- DPL-2: Introduction to Tensorflow in Python
- DPL-3: Introduction to Deep Learning with Keras
- DPL-4: Deep Learning [outdated: to be updated by AISG] *

AI4I-8: Other Programming Languages and Tools to Learn
- LATO-1: Introduction to SQL
- LATO-2: Joining Data in SQL
- LATO-3: Introduction to Shell
- LATO-4: Introduction to Git
- LATO-5: Hyperparameter Tuning in Python

AI4I-9: Data Science and AI in the Real World
- DSPM-1: Data Science Project Lifecycle *
- DSPM-2: Setting up your own AI System using an AI Makerspace ‘brick’ *
And the Datacamp career track is below. Most of these courses overlap with “Foundations in AI”. I indicate the non-overlapping courses with a # mark:
Datacamp Career track: Machine Learning Scientist with Python
(# = not found in "Foundations in AI")
(Before starting on this track, do the prerequisite, "Introduction to Statistics in Python")

1. Supervised Learning with scikit-learn
2. Unsupervised Learning in Python
3. Linear Classifiers in Python #
4. Machine Learning with Tree-Based Models in Python
5. Extreme Gradient Boosting with XGBoost #
6. Cluster Analysis in Python
7. Dimensionality Reduction in Python
8. Preprocessing for Machine Learning in Python
9. Machine Learning for Time Series Data in Python #
10. Feature Engineering for Machine Learning in Python
11. Model Validation in Python
12. Skill Assessment: Machine Learning Fundamentals in Python #
13. Introduction to Natural Language Processing in Python #
14. Feature Engineering for NLP in Python
15. Introduction to TensorFlow in Python
16. Introduction to Deep Learning in Python
17. Introduction to Deep Learning with Keras
18. Advanced Deep Learning with Keras #
19. Image Processing in Python #
20. Image Processing with Keras in Python #
21. Hyperparameter Tuning in Python
22. Introduction to PySpark #
23. Machine Learning with PySpark #
24. Winning a Kaggle Competition in Python #
When you do “Foundations in AI”, you don’t have to do the courses in the top-down sequence. You can actually do the later ones first.
So, what you can do is to follow the logical sequence of the Datacamp Career track, and when you finish each course, go back to “Foundations” and mark it as completed.
You can start on “Foundations” from the beginning and carry on until you finish “EDA-1: Introduction to Data Visualisation in Python”. Then, do “Introduction to Statistics in Python”, and then follow the Datacamp career track from this point onward.
I recommend that you do the entire Datacamp career track, because it has useful courses (the non-overlapping ones) which are missing from “Foundations”.
Likewise, you can do the non-overlapping courses in “Foundations”, according to whatever sequence you think would work for you.
Some additional notes:
- One of the courses under “Foundations”, namely “DPL-4: Deep Learning”, is outdated. As of June 2022, there is a note saying that this course will be updated, but the note has been there for more than half a year and the course is still not updated.
- Two of the Datacamp courses listed under “Foundations”, Statistical Thinking in Python (Part 1) and (Part 2) by Justin Bois, are actually a bit old: they use a function argument (normed=True) which is marked as deprecated in the matplotlib documentation. Nevertheless, they are excellent and worth doing if you want to be thorough. These 2 courses used to be the prerequisites for “Supervised Learning with scikit-learn”, but as of May 2022, Datacamp has updated it with a new (and shorter) prerequisite, which is “Introduction to Statistics in Python” by Maggie Matsui.
- Notice the last course in the Datacamp track prepares you for Kaggle competitions. That’s a strong motivation to complete the track!
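On the normed=True point in the notes above: if you try the Statistical Thinking exercises on your own machine with a recent matplotlib, my understanding is that `density=True` is the replacement argument (as far as I can tell, newer releases no longer accept `normed` at all). Here is a minimal sketch with made-up random data, not the course's dataset:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Made-up sample data for illustration only.
rng = np.random.default_rng(0)
samples = rng.normal(size=1000)

fig, ax = plt.subplots()

# density=True rescales the bars so that their total area equals 1,
# which is what the older normed=True argument was used for.
heights, bin_edges, _ = ax.hist(samples, bins=20, density=True)

# Check: sum of (bar height x bar width) should be 1.
area = float(np.sum(heights * np.diff(bin_edges)))
print(f"Total area under the histogram: {area:.6f}")

plt.close(fig)
```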
Hope this is useful for anyone out there who is similarly struggling with “Foundations in AI” because of the sequence.
If you benefited from this article, I’d love to hear from you. Please drop me an email or post a comment on my Discord group, AI Learners. I started the group for folks who are preparing for AIAP, and I’ve posted several learning resources (courses and books) which hopefully would be useful to you.
If you reach out to me, I’ll be happy to share more suggestions that might help to speed up your learning, which I haven’t had time to write up as a blog article yet.