Deepening My Understanding of AI/ML with Fast.ai and Duke’s AI PM Course

I recently started a personal project to sharpen my technical chops in AI/ML so that I can understand its capabilities and collaborate more effectively with my teammates. Since I first studied the field as part of an ML-based project in 2015, the state of the art has changed radically, and I've been eager to get back up to speed.

The initial challenge was deciding which courses to start with. As a moderately technical software product manager, I was looking for courses that offered a balanced blend of business applications and nuts-and-bolts implementation details. Finding courses with that profile proved difficult: most of the MOOCs available online are either highly technical and targeted at aspiring ML researchers and engineers, or designed for non-technical generalists, with no math or code to show how things work under the hood.

The first course I took was Duke's AI Product Management course, which is tailored for product managers who want to drive the development of AI-enabled products. The course was excellent, providing a comprehensive beginner's overview of ML/AI: the definition of the field, its history, the various types of models that exist, the phases of the ML project lifecycle, and how ML models are built, trained, interpreted, deployed, and managed. I left satisfied that I had gotten the broad exposure to key concepts and context that I needed to go deeper in my study.

After completing that course, I felt that a more technical course would prepare me to work more effectively as part of a real team. After researching several options, I decided to take fast.ai's Practical Deep Learning for Coders. The course is taught by Jeremy Howard, the co-founder of fast.ai, former president of Kaggle, and an accomplished researcher in the field of machine learning. The course had gotten rave reviews on Hacker News, and I was particularly attracted to its hands-on, learn-by-doing approach.

Fast.ai walks you through how to build real deep learning models with Python. In the first couple of classes, you implement an image recognition model using the fastai library and training data gathered through Bing's Image Search API. The course piqued my interest by demonstrating how well the technology works out of the box, with minimal configuration. The hands-on nature of the class helped me summon the motivation to roll up my sleeves and work through the more demanding technical material, which Howard dives into progressively as the class goes on. Although fast.ai's content overlapped significantly with that of the Duke course, working through real examples in code deepened my understanding of concepts I had previously known only superficially.
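
To give a sense of how little code this takes, here is a minimal sketch in the spirit of those early lessons. It's my own illustration rather than the course's exact notebook: the folder layout, class names, and hyperparameters are placeholders, and it assumes you've already downloaded labeled images (e.g. via an image search API) into one folder per class.

    from fastai.vision.all import *

    # Assumes a directory like bears/grizzly, bears/black, bears/teddy,
    # each holding images downloaded from an image search API.
    path = Path('bears')

    # Hold out 20% of the images as a validation set, and resize
    # everything to 224x224 so the batches are uniform.
    dls = ImageDataLoaders.from_folder(path, valid_pct=0.2, seed=42,
                                       item_tfms=Resize(224))

    # Start from a ResNet pre-trained on ImageNet and fine-tune it.
    learn = vision_learner(dls, resnet18, metrics=error_rate)
    learn.fine_tune(3)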

The most important lesson I learned from fast.ai was that applying machine learning to real-life problems is a fundamentally different skill set from building new ML algorithms. Particularly in my role as a product manager, I think it's key for me to be aware of the range of models that exist and to understand their applications, capabilities, limitations, and trade-offs.

Here are a few of the other interesting lessons that stood out to me from these courses:

  • Transfer Learning: When it comes to images and language, a great deal can be accomplished by taking a pre-trained foundation model like ResNet or LLaMA and fine-tuning it for a particular domain or application. So much of the heavy lifting done in ML research has been packaged up to work right out of the box that newcomers can build real, useful things surprisingly quickly (see the first sketch after this list).
  • The Subjectivity of Model Evaluation: Assessing the effectiveness of a model involves a lot of subjective judgment. For regression-type models, which is better: a small number of large errors, or a large number of small errors? For classification models, what should be prioritized: recall, precision, or accuracy (second sketch below)? For generative language models, what's the ideal amount of “creativity” for the model to demonstrate?
  • The Reason for Separating the Validation Set from the Test Set: I had always wondered why the validation set needs to be different from the test set. It's because if you, as the builder of a model, can see how the model performs on the validation set, you're biased toward tuning the model in ways that overfit that set. Even though the model is never trained directly on the validation data, it ends up indirectly fit to it through the person who has seen the scores. That's why there needs to be a separate test set, hidden from both the model and the people building it, for the final unbiased evaluation (third sketch below).
  • Limitations in the State-of-the-Art: AI still can’t produce good literature. It’s pretty effective at writing short clickbait blog posts, but it just doesn’t have what it takes to produce the kind of character and plot development required for an interesting story.
  • Pace of Improvement in Recent Years: Recent years have seen significant advances in NLP models such as GPT-4, and this progress is expected to accelerate over the next few years due to increased investment in software and specialized hardware.
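
The transfer learning point is the easiest to see in code. Below is a hypothetical sketch in PyTorch (the class count and learning rate are placeholders I chose for illustration): we take a ResNet pre-trained on ImageNet, freeze its backbone, and train only a small new head for our own task.

    import torch.nn as nn
    from torch.optim import Adam
    from torchvision import models

    # Load a ResNet-18 whose weights were already trained on ImageNet;
    # this is the heavy lifting someone else has done for us.
    model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

    # Freeze the pre-trained layers so fine-tuning doesn't disturb them.
    for param in model.parameters():
        param.requires_grad = False

    # Swap in a new final layer sized for our own domain (say, 3 classes).
    model.fc = nn.Linear(model.fc.in_features, 3)

    # Only the new head's parameters are updated during training.
    optimizer = Adam(model.fc.parameters(), lr=1e-3)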
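
On the evaluation point, the metrics themselves are easy to compute; the hard part is deciding which one matters. Here's a toy sketch with scikit-learn, using made-up labels for a hypothetical binary classifier:

    from sklearn.metrics import accuracy_score, precision_score, recall_score

    # Made-up ground truth and predictions, for illustration only.
    y_true = [1, 0, 1, 1, 0, 0, 0, 1]
    y_pred = [1, 0, 0, 1, 0, 1, 0, 1]

    print(accuracy_score(y_true, y_pred))   # fraction of all predictions that are correct
    print(precision_score(y_true, y_pred))  # of predicted positives, how many are real
    print(recall_score(y_true, y_pred))     # of actual positives, how many were caught

Which number to optimize is a product decision: a fraud detector might prioritize recall because missed fraud is costly, while a spam filter might prioritize precision because flagging legitimate mail is costly.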
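
And on the validation/test point, the discipline boils down to carving off the test set first and never looking at it while tuning. A sketch with scikit-learn on synthetic data (the split ratios are arbitrary choices of mine):

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1000, random_state=0)

    # Carve off the test set first; it stays hidden from both the model
    # and the people tuning it until the very end.
    X_dev, X_test, y_dev, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42)

    # Split the remainder into training data and a validation set that
    # humans are allowed to inspect while comparing hyperparameters.
    X_train, X_valid, y_train, y_valid = train_test_split(
        X_dev, y_dev, test_size=0.25, random_state=42)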
