Introducing Google's Teachable Machine
Learn machine learning basics without the coding
Google offers a free “AI for Anyone” course that is very worthwhile. It explores a broad range of topics in AI, including how AI models are trained to recognize images, sounds, and other data that can be categorized. The course also addresses some of the current issues in AI that get a lot of media exposure. One of the projects well suited for classroom use is the “Teachable Machine” - a way to experiment with machine learning (ML) without having to learn programming or backpropagation techniques.
You’ll find a customizable activity under “Lesson Ideas” that you can use in your 7th - 12th grade classroom. What I found most interesting about the activity is NOT that you can “teach” the program to recognize cats vs dogs, or different types of fruit, or traffic signs, or zoo animals, but that you can experiment with the ML concepts of overfitting, underfitting, and bias. Briefly, overfitting is when the ML model performs well on the data it was trained on but fails to correctly identify new data. Underfitting is when the ML model doesn’t even do a good job on its training data, let alone any new data. And bias is introduced whenever your training data is not representative of the whole set of data - examples include models misidentifying faces with dark skin tones, or image searches for “entrepreneurs” returning mostly men.
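Overfitting can be caricatured with a toy “model” that simply memorizes its training examples. This is a hypothetical sketch for class discussion (the filenames are invented, and Teachable Machine’s real models learn features rather than looking things up), but it shows the telltale gap: perfect on data it has seen, useless on data it hasn’t.

```python
# Toy "model" that memorizes its labeled training examples exactly.
# It scores 100% on images it has seen before, but cannot classify
# anything new - an extreme caricature of overfitting.
# (Hypothetical example; filenames are made up for illustration.)

training_data = {
    "cat_photo_01": "cat", "cat_photo_02": "cat",
    "dog_photo_01": "dog", "dog_photo_02": "dog",
}

def memorizing_model(image_name):
    # Return the memorized label, or "unknown" for unseen images.
    return training_data.get(image_name, "unknown")

# Perfect score on the training set...
train_accuracy = sum(
    memorizing_model(img) == label for img, label in training_data.items()
) / len(training_data)

# ...but a zero on new images it never saw.
new_images = {"cat_photo_99": "cat", "dog_photo_99": "dog"}
test_accuracy = sum(
    memorizing_model(img) == label for img, label in new_images.items()
) / len(new_images)

print(train_accuracy)  # 1.0
print(test_accuracy)   # 0.0
```

A real overfit model is less extreme than this lookup table, but the symptom students should look for is the same: training accuracy far above testing accuracy.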
In a typical ML workflow, training uses about 70% of the data. If it’s images of pets, then each picture includes a label of what the pet is - cat, dog, turtle, ferret, etc. The other 30% of the data is reserved for testing, to see how the model performs on examples it has never seen. Properly trained, we generally expect ML models to perform at 95% accuracy or higher, with an expectation of near 100% accuracy when the results are literally life and death - eye disease, cancer, tumors, etc. You might get a chuckle if a model tells you to put glue on pizza (not really life threatening, if you have common sense), but if it tells you that a dark spot on a CT or MRI is NOT CANCER, it better be right, because your life is at stake.
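The 70/30 split itself is simple enough to sketch in a few lines of Python. The filenames below are invented for illustration; the point is that the data is shuffled first, and the testing set never overlaps the training set.

```python
import random

# Sketch of a 70/30 train/test split on labeled pet images.
# Filenames and labels are made up for illustration.
labeled_images = ([(f"cat_{i:02d}.jpg", "cat") for i in range(40)] +
                  [(f"dog_{i:02d}.jpg", "dog") for i in range(40)])

random.seed(42)                 # fixed seed so the split is reproducible
random.shuffle(labeled_images)  # shuffle so both classes land in both sets

split = int(0.7 * len(labeled_images))
training_set = labeled_images[:split]  # ~70%: used to train the model
testing_set = labeled_images[split:]   # ~30%: held back to measure accuracy

print(len(training_set), len(testing_set))  # 56 24
```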
There are several parts to this exercise, and it will make a great group activity for learning about ML concepts.
1. Find a good dataset - Kaggle.com is the go-to place for free datasets. You’ll need to register for a free account, and it’s well worth it. If you don’t want your students to have to register, you can download lots of datasets and make them available in a shared location.
2. Get a sample of about 30-50 images for each class that you want to train (or sounds, if you want to experiment with things like musical instruments, bird calls, or traffic noises). For a model that recognizes cats vs dogs, you’ll want 30-50 of each type. Try to account for different situations in which you might see the cat or dog - sitting, lying down, running, on a leash, eating, playing fetch, etc.
3. Identify the 70% of images that you will use to train each class, and put the other 30% of each type into a folder called “testing”. Then create the two (or more, but don’t overdo it) classes and upload the training data. These labeled images are used to train the model to identify each type.
4. Once you have trained the model (the instructions are straightforward, and you can collect data with your webcam OR upload from your computer), test it with the reserved data that was not used in training (using training data here would be cheating). The testing tells you whether the model correctly identifies each new image or sound as one of the trained classes.
5. See the lesson ideas for suggestions on how to teach concepts like overfitting, underfitting, and bias. You might also make this a competition to see which group can accurately train their model with the LEAST amount of data, or which group can deliberately introduce (or remove) bias in their dataset.
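Sorting each class’s images into training and testing folders by hand gets tedious, so you (or your students) might script it. This is a hypothetical helper, not part of Teachable Machine - the folder names and the demo files are assumptions; point it at wherever you unpacked your Kaggle download.

```python
import random
import shutil
import tempfile
from pathlib import Path

def split_class_folder(source_dir, train_dir, test_dir,
                       train_frac=0.7, seed=0):
    """Copy ~train_frac of the files in source_dir into train_dir
    and the rest into test_dir. (Hypothetical helper; folder layout
    is an assumption, not a Teachable Machine requirement.)"""
    files = sorted(Path(source_dir).iterdir())
    random.Random(seed).shuffle(files)       # seeded for reproducibility
    cutoff = int(train_frac * len(files))
    for dest, batch in [(train_dir, files[:cutoff]),
                        (test_dir, files[cutoff:])]:
        Path(dest).mkdir(parents=True, exist_ok=True)
        for f in batch:
            shutil.copy(f, Path(dest) / f.name)

# Demo with throwaway files - replace with your downloaded images.
root = Path(tempfile.mkdtemp())
(root / "cats").mkdir()
for i in range(40):
    (root / "cats" / f"cat_{i:02d}.jpg").write_bytes(b"fake image data")

split_class_folder(root / "cats",
                   root / "training" / "cats",
                   root / "testing" / "cats")
print(len(list((root / "training" / "cats").iterdir())))  # 28
print(len(list((root / "testing" / "cats").iterdir())))   # 12
```

Run it once per class (cats, dogs, etc.), then upload each “training” subfolder to its matching class in Teachable Machine and keep the “testing” folders for the evaluation step.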

