This blog article refers to the introductory “AI4I® – Literacy in AI” module of the excellent online course, AI for Industry a.k.a. AI4I, offered by AI Singapore.
(This introductory module is free. The next part of AI4I, Foundations in AI, requires a paid premium membership. It costs about S$150 and is well worth it, because it comes with a one-year subscription to DataCamp.)
The “Literacy in AI” module covers 3 sections:
- AI Basics
- AI Ethics
- Hands-on practice
I will focus here on the last section, the hands-on practice using Orange Data Mining, an open-source machine learning and data visualization tool that you install on your desktop or laptop computer.
For your reference, here’s the link to the Orange Hands-on practice section: https://learn.aisingapore.org/courses/ai-for-industry-part-1/lessons/ai4i-basic-practice-orange/
If you are like me and new to Orange, you may run into some roadblocks when doing this tutorial. I managed to work my way through it by trial and error, and I’m sharing my experience here for others who might get stumped.
There are 3 main points which I will cover in this article:
- Installing Orange Data Mining
- Loading the data: define the column types
- Logistic regression error: “Data has no target variable”
1. Installing Orange Data Mining
The easiest way to install Orange is through Anaconda Navigator, which comes with Python and a host of other data science software, including Orange.
At the time of writing (mid-December 2021), the current version of Orange is 3.26.0. Unfortunately, after I installed this version via Anaconda, it crashed due to a dependency bug. I resolved this by installing an older version, 3.23.1.
If you run into this problem, you can remove the entire virtual environment containing 3.26.0, then click the gear icon shown below to install a specific older version of Orange.
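If you want to double-check which Orange version actually ended up in an environment, a small Python script run with that environment's Python can report it. This is a sketch that assumes Python 3.8+ (for `importlib.metadata`) and that Orange is registered under its distribution name `Orange3`; the helper names `installed_orange_version` and `version_tuple` are my own.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_orange_version() -> Optional[str]:
    """Return the installed Orange3 version string, or None if Orange3 is not
    installed in the Python environment running this script."""
    try:
        return version("Orange3")
    except PackageNotFoundError:
        return None


def version_tuple(text: str) -> tuple:
    """Turn '3.23.1' into (3, 23, 1) for simple comparisons.

    Only purely numeric parts are kept, so pre-release tags are ignored.
    """
    return tuple(int(part) for part in text.split(".") if part.isdigit())


if __name__ == "__main__":
    found = installed_orange_version()
    if found is None:
        print("Orange3 is not installed in this environment")
    else:
        print(f"Orange3 {found}")
        if version_tuple(found) == version_tuple("3.26.0"):
            print("This is the version that crashed for me; 3.23.1 worked instead")
```

Run it from the same Anaconda environment that Navigator uses for Orange, otherwise it will inspect a different Python installation.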
2. Loading the data: define the column types
When you get to page 13 of the slides, “Step 1: Read in the data”, you will be instructed to add the “CSV File Import” widget, double-click on it, and select the CSV file. Then you can view the data by using a “Distributions” widget:
Notice from the slide above that when you open the Distributions widget, you are supposed to see a bar chart with 2 blue columns. But when I opened mine, it was empty. Why?
I didn’t know what to do, so I ignored it for the time being and moved on to the next slide, which was to add the “Select Columns” widget and set the “label” column as the target variable:
Judging from the slide, you should be able to move the “label” item into the “Target Variable” field of the “Select Columns” dialog box, but I could not. In my dialog box, the “>” arrow that is supposed to move “label” into “Target Variable” was greyed out, and nothing happened when I clicked it, as shown below:
So I was stuck until I googled it and realized there was a missing step: the columns in the dataset needed to be assigned the proper column types.
So I had to go back to the previous step, open the “CSV File Import” widget again and adjust its “Import Options”:
“Import Options” opened up another dialog box, shown below:
I selected the “label” column by clicking on it, and I changed its “Column type” to “Categorical”. Next, I selected the “text” column and I changed its “Column type” to “Text”, like this:
Then I clicked “OK”.
This fixed the problem for me. I was able to go back to the “Select Columns” widget and move “label” into “Target Variable”. Also, the missing blue columns now appeared in the “Distributions” widget. Yay!
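Why would column types make such a difference? As I understand it, Orange sets “Text” columns aside as meta attributes: the Distributions widget doesn't plot them, and Select Columns won't accept them as the target, which would explain both the empty chart and the greyed-out arrow. The Python sketch below (plain standard library, not Orange) illustrates the idea under that assumption; the sample rows, the “ham”/“spam” values and the names `COLUMN_TYPES`, `distributions` and `target_candidates` are all hypothetical.

```python
import csv
import io
from collections import Counter

# Hypothetical rows shaped like the tutorial's CSV: a "label" column and a
# "text" column. The messages and the "ham"/"spam" values are made up.
SAMPLE_CSV = """label,text
ham,See you at lunch
spam,You have won a free prize so reply now
ham,Running late and will be there soon
"""

# Column types assigned by hand, playing the role of Orange's "Import Options".
COLUMN_TYPES = {"label": "categorical", "text": "text"}


def load_rows(csv_text):
    """Parse CSV text into a list of dicts, one dict per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def distributions(rows, column_types):
    """Count the values of every categorical column (one bar per value).

    Free-text columns are skipped: nearly every message is unique, so there
    is nothing sensible to chart.
    """
    return {
        name: Counter(row[name] for row in rows)
        for name, kind in column_types.items()
        if kind == "categorical"
    }


def target_candidates(column_types):
    """Columns that could serve as a classification target in this sketch."""
    return [name for name, kind in column_types.items() if kind == "categorical"]


if __name__ == "__main__":
    rows = load_rows(SAMPLE_CSV)
    # With "label" typed as categorical: one count per class.
    print(distributions(rows, COLUMN_TYPES))
    # With every column left as text: nothing to plot, nothing to target.
    all_text = {"label": "text", "text": "text"}
    print(distributions(rows, all_text), target_candidates(all_text))
```

Marking “label” as categorical gives one count per class, which is what the bar chart shows; leaving every column as text leaves nothing to chart and no possible target.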
3. Logistic regression error: “Data has no target variable”
When you get to page 24 of the slides, you will be instructed to add the “Logistic Regression” widget.
You may or may not run into this problem, but when I got to this step, it didn’t work: the widget showed an alert saying “Data has no target variable”.
Through trial and error, I got it to work by going back to the “Select Columns” widget and clicking on the “Send” button, shown below.
Strangely, when I repeated the steps later using the larger dataset (containing 1000 SMSes), I did not get this error.
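Why did clicking “Send” help? My unconfirmed guess is that automatic sending was switched off in the Select Columns widget, so my target assignment never reached the Logistic Regression widget until I pushed it through manually. The toy Python sketch below models that idea with plain standard-library classes. `Dataset`, `SelectColumnsSketch` and `fit_majority_class` are hypothetical names, not Orange's API, and the learner is a simple majority-class baseline standing in for logistic regression, since any classifier refuses to train without a target.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Dataset:
    """Toy data table with column roles (hypothetical, not Orange's API)."""
    rows: list
    features: list
    target: Optional[str] = None
    metas: list = field(default_factory=list)


class SelectColumnsSketch:
    """Hypothetical widget that only forwards its settings when they are sent."""

    def __init__(self, data: Dataset, auto_send: bool = False):
        self.data = data
        self.auto_send = auto_send
        self.pending_target: Optional[str] = None
        self.output = data  # downstream widgets see this until the next send

    def set_target(self, column: str) -> None:
        # Changing the setting alone does not update the output...
        self.pending_target = column
        if self.auto_send:
            self.send()

    def send(self) -> None:
        # ...until the pending role assignment is pushed downstream.
        features = [f for f in self.data.features if f != self.pending_target]
        self.output = Dataset(self.data.rows, features,
                              self.pending_target, self.data.metas)


def fit_majority_class(data: Dataset) -> str:
    """Majority-class baseline (not logistic regression); it needs a target."""
    if data.target is None:
        raise ValueError("Data has no target variable")
    return Counter(row[data.target] for row in data.rows).most_common(1)[0][0]


if __name__ == "__main__":
    rows = [
        {"label": "ham", "text": "See you at lunch"},
        {"label": "spam", "text": "Win a prize now"},
        {"label": "ham", "text": "Call me later"},
    ]
    widget = SelectColumnsSketch(Dataset(rows, features=["label"], metas=["text"]))
    widget.set_target("label")
    try:
        fit_majority_class(widget.output)
    except ValueError as err:
        print(err)  # the target was chosen but never sent
    widget.send()
    print(fit_majority_class(widget.output))
```

With `auto_send=True`, `set_target` sends immediately and the error never appears, which would be consistent with the error showing up in one run but not another.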
There you have it: the 3 roadblocks I ran into and how I resolved them. I hope this helps someone out there and saves you some time.
If you have any questions or comments about this article or doing AI4I, please feel free to reach me via my contact page here.