Lesson 2 Notebook

Lesson 4 Notebook

Lesson 5 Notebook

Lesson 6 Notebook

Lesson 3 Notebook

Lesson 11 Notebook

Lesson 8 Notebook

Lesson 9 Notebook

Lesson 12 Notebook

Lesson 10 Notebook

Lesson 13 Notebook

Lesson 14 Notebook - Unoptimized

Lesson 14 Notebook - Optimized

Lesson 7 Notebook

Course Ebook (PDF)

<h3 id="whatareconvolutionalneuralnetworks">What are Convolutional Neural Networks?</h3>
We've had a short introduction to CNNs as a concept in the previous lesson, so let's dive into them in a bit more detail here. Convolutional Neural Networks (CNNs) are, as mentioned earlier, a type of Deep Artificial Neural Networks (ANNs). The name arises from their use of convolutional layers which perform feature extraction - more on them a bit later.

 <div class="alert alert-note">
 <div class="flex">
 
 <div class="flex-shrink-0 mr-3">
 <img src="/assets/images/icon-information-circle-solid.svg" class="icon" aria-hidden="true" />
 </div>
 
 Note: Convolutional layers, and the convolution operation are at the core of CNNs and it's worth taking out a bit of time to really understand what's going on there.

 </div>
 </div>
 Again, like with most other learning networks, CNNs aim to mimic the human brain (currently best-known learning algorithm) - and in this case, the visual perception that humans have. With the expansion of the neocortex, humans (and other mammals, but to a lesser degree) were able to start noticing intricate abstract patterns in the world around them. This higher-level abstraction is what, down the line, allowed us to make out symbols and give them meaning - giving birth to art, culture and language.
These things are hard to mimic, in large part because they're hard to explain. If you ask a musician or artist to explain their notes or strokes with a brush - more often than not, you'll receive an answer along the lines of:

David Landup

Guide to Convolutional Neural Networks

Introduction to Computer Vision

Image Classification with Transfer Learning - Creating Cutting Edge CNN Models

So far, we've been working with a very distinctive, very exemplary architecture. I've noted that it's fairly similar to the VGG architecture that used to reign supreme for a pretty short while, but which is slowly being phased out.
This sort of network is easy to understand because it's practically a 1-to-1 mapping to the most intuitive explanation of how CNNs work - through convolutional layers, pooling layers, flattening layers and a fully-connected layer. It's also the most intuitive to understand with a limited understanding of how the visual cortex works. If you're a neuroscientist - you've likely aggressively cringed at the simplification of the inner-workings of the visual cortex from earlier lessons. The concept of hierarchical representations is there - but that's where our implementation and the cortex part ways.
The architecture used so far is, in a sense, the most natural and gentle introduction to CNNs - conceptually and implementation-wise. It provides fairly decent performance (in terms of accuracy) but bottlenecks with the number of parameters. At this point, you've built multiple classifiers, all of them very capable. You've gotten introduced to the inner-workings of the classifiers, got exposed to latent space visualization, biases, challanged the notion that overfitting is bad, explored the implications of data augmentation and context, implemented a custom loss function and metric, explored class imbalance, and even wrote a research-grade classifier for Invasive Ductal Carcinoma!

Guided Project: Convolutional Neural Networks - Beyond Basic Architectures

In this lesson, we'll be diving into a hands-on project, from start to finish, contemplating what the challenge is, what the reward would be for solving it. Specifically, we'll be classifying benign and malignant Invasive Ductal Carcinoma from histopathology images. If you're unfamiliar with this terminology - no need to worry, it's covered in the guided project.
We'll start out by performing Domain Research, and getting familiar with the domain we're trying to solve a problem in. We'll then proceed with Exploratory Data Analysis, and begin the standard Machine Learning Workflow. For this guide, we'll both be building a CNN from scratch, as well as use pre-defined architectures (such as the EfficientNet family, or ResNet family). Once we benchmark the most promising baseline model - we'll perform hyperparameter tuning, and evaluate the model.
<h3 id="machinelearninginmedicine">Machine Learning in Medicine</h3>
Machine Learning has been increasingly employed in medicine, and is helping save lives from a wide variety of medical conditions. The application of Machine Learning in Medicine is vast, and an extremely complex topic in and of itself, but some of the major areas include:
<ul>
<li>Precision Medicine (Tailoring medicine to individuals)</li>
<li>Medical Imaging Diagnosis (Diagnosing conditions based on images, etc.)</li>
<li>Drug Discovery (Generating structures such as proteins or drug-like molecules, bioactivity prediction, etc.)</li>
</ul>

Guided Project: Breast Cancer Classification

<h3 id="semanticsegmentation">Semantic Segmentation</h3>
Semantic and Instance Segmentation is the natural next step of object detection, and uses much the same architectures with new heads to predict masks, rather than bounding boxes. Many object detection architectures can be converted into segmentation architectures, and some projects ship both capabilities such as Detectron2.
We've worked with YOLOv5 by Ultralytics in a previous project, which currently doesn't support segmentation, but it is in the works. When it gets released in a later version, I'll update the course.
<h3 id="segmentationarchitectures">Segmentation Architectures</h3>
As with every task - there are various architectures that can be employed to perform segmentation. Some of the defining ones are:
<ul>
<li>Mask R-CNN - A Faster R-CNN variant which predicts masks for detected objects. Mask R-CNNs produce pretty good results.</li>
<li>U-Net - An Encoder-Decoder architecture that downsamples (encodes) and upsamples (decodes) input with skip connections between these steps. The architecture is typically visualized in a way that makes it look like a &quot;U&quot; (with skip connections between the left and right hand side of the &quot;U&quot; letter). While simple, it doesn't provide the best results for segmentation.</li>
<li>DeepLabV3+ - An Encoder-Decoder architecture that liberally uses Atrous Convolutions and a module named the Atrous Spatial Pyramid Pooling (ASPP) module. More on both in a moment. One of the most accurate models to date, and it's surprisingly easy to implement as an end-to-end model for semantic segmentation.</li>
</ul>

Guided Project: DeepLabV3+ Semantic Segmentation with Keras

Object detection is a large field in computer vision, and one of the more important applications of computer vision &quot;in the wild&quot;. On one end, it can be used to build autonomous systems that navigate agents through environments - be it robots performing tasks or self-driving cars.
<blockquote>
Naturally - for both of these applications, more than just computer vision is going on. Robotics is oftentimes coupled with Reinforcement Learning (training agents to act within environments), and if you want to give it tasks using natural language, NLP would be required to convert your words into meaningful representations for them to act on.
</blockquote>
However, anomaly detection (such as defective products on a line), locating objects within images, facial detection and various other applications of object detection can be done without intersecting other fields.
When talking about certain architectures in previous chapters - I mentioned that some can be used as &quot;generic vision backbones&quot;. The backbone of what, exactly? The answer, commonly, is for object detection and instance segmentation. A backbone network (CNN) for feature extraction is used, alongside one of the varying techniques for detecting objects, to localize where instances are.
<h3 id="objectdetectionismessyandlarge">Object Detection is... Messy and Large</h3>

Object Detection and Segmentation - R-CNNs, RetinaNet, SSD, YOLO...

<blockquote>
It's true, nobody wants overfitting end models, just like nobody wants underfitting end models.
</blockquote>
Overfit models perform great on training data, but can't generalize well to new instances. What you end up with is a model that's approaching a fully hard-coded model tailored to a specific dataset.
Underfit models can't generalize to new data, but they can't model the original training set either.
The right model is one that fits the data in such a way that it performs well predicting values in the training, validation and test set, as well as new instances.
<h3 id="overfittingvsdatascientists">Overfitting vs. Data Scientists</h3>
Battling overfitting is given a spotlight because it's more illusory, and more tempting for a rookie to create overfit models when they start with their Machine Learning journey. Throughout books, blog posts and courses, a common scenario is given:
<blockquote>
&quot;This model has a 100% accuracy rate! It's perfect! Or not. Actually, it just badly overfits the dataset, and when testing it on new instances, it performs with just X%, which is equal to random guessing.&quot;
</blockquote>
After these sections, entire book and course chapters are dedicated to battling overfitting and how to avoid it. The word itself became stigmatized as a generally bad thing. And this is where the general conception arises:

Overfitting Is Your Friend, Not Your Foe

Time to put the theory into practice! If it didn't all fit into place already, there's a good chance it will now that you can build and see the results. If not - don't worry! Once the practical application is finished, try revisiting the initial explanations in the lesson. Many people have an &quot;a-ha&quot; moment after practicing with CNNs an then re-reading the introductory parts.

 <div class="alert alert-note">
 <div class="flex">
 
 <div class="flex-shrink-0 mr-3">
 <img src="/assets/images/icon-information-circle-solid.svg" class="icon" aria-hidden="true" />
 </div>
 
 Note: With high-level APIs such as Keras that do the heavy lifting, it's easy to forget how things work under the hood, and it's worth revisiting them in the initial phases of learning (as well as some time down the line). If you haven't had any exposition to some of the terminology used here, it might take you a bit of time to get things to click. It's easy to conect the dots looking backwards, but not so much looking forwards. This is how discoveries are made!

 </div>
 </div>
 <h3 id="theintelimageclassificationdatasetimportingandexploration">The Intel Image Classification Dataset - Importing and Exploration</h3>
Let's try working with the <a rel="nofollow noopener noreferrer" target="_blank" href="https://www.kaggle.com/datasets/puneet6060/intel-image-classification">Intel Image Classification</a> dataset. It's a great dataset to go further from, since it's not super easy to get a high accuracy from the get-go. Additionally, there are some features that are easy to mix up for a network, which will serve as a great introduction into model evaluation and how you can learn about what makes it trip up and misclassify an image.

Guided Project: Building Your First Convolutional Neural Network With Keras

<h3 id="deepdream">DeepDream</h3>
Hierarchical abstraction appears to be what our brains do, with increasing support in the field of Neuroscience. While some protest drawing lines between the computation the brain does and silicone computing found in computers, some support the parallels, such as Dana H. Ballard in his book &quot;Brain Computation as Hierarchical Abstraction&quot;, who works as a Computer Science professor at the University of Texas, with ties to Psychology, Neuroscience and the Center for Perceptual Systems.
Inspired by hierarchical abstraction of the visual cortex, CNNs are hierarchical, and hierarchical abstraction is what allows them to do what they do. Exploiting exactly this property is what allows us to create a really fun (and practical) algorithm, and the focus of this lesson. It's called DeepDream, because we associate odd, almost-there-but-not-there visuals with dreams, and the images are induced by a deep convolutional neural network. Here's a visual created with the code in this lesson:
<img src="https://s3.stackabuse.com/media/ebooks/deep+learning+computer+vision/lesson_12/deep-dream-0.png" alt="">
In &quot;Being You&quot; by British neuroscientist Anil Seth, Anil explains how they used deep dream to create a &quot;hallucination machine&quot;. A &quot;hallucination machine&quot; was meant to computationally simulate overreactive perceptual priors. That's a different way to say &quot;emphasize what you expect&quot;. Where I'm from, folk wisdom states that &quot;In fear, eyes are big&quot;, and it's oftentimes used to explain how you can easily see things that aren't there when you're afraid. You're embedding overreactive perceptual priors in your projection so the shirt you forgot to take off the chair now looks like an intruder in your home during night. In a similar way, you might recognize a cloud as being the shape of a country on a map, or the sequence of lines &quot;-_-&quot; as a face with closed eyes and a flat mouth. This is formally known as pareidolia. In 2015, Google engineer Alexander Mordvintsev popularized a way to embed perceptual priors (induce and visualize pareidolia) into CNNs. This algorithm is known as DeepDream.

DeepDream - Neural Networks That Hallucinate?

<h3 id="imagecaptioning">Image Captioning</h3>
In 1974, Ray Kurzweil's company developed the &quot;Kurzweil Reading Machine&quot; - an omni-font OCR machine used to read text out loud. This machine was meant for the blind, who couldn't read visually, but who could now enjoy entire books being read to them without laborious conversion to braille. It opened doors that were closed for many for a long time. Though, what about images?
While giving a diagnosis from X-ray images, doctors also typically document findings such as:
<blockquote>
&quot;The lungs are clear. The heart and pulmonary are normal. Mediastinal contours are normal. Pleural spaces are clear. No acute cardiopulmonary disease.&quot;
</blockquote>
Websites that catalog images and offer search capabilities can benefit from extracting captions of images and comparing their similarity to the search query. Virtual assistants could parse images as additional input to understand a user's intentions before providing an answer.
<blockquote>
In a sense - Image Captioning can be used to explain vision models and their findings.
</blockquote>
The major hurdle is that you need caption data. For highly-specialized use cases, you probably won't have access to this data. For instance, in our Breast Cancer project, there were no comments associated with a diagnosis, and we're not particularly quallified to make captions ourselves. Captioning images takes time. Lots of it. Many big datasets that have captions have crowdsourced them, and in most cases, multiple captions are applied to a single image, since various people would describe them in various ways. Realizing the use cases of image captioning and descriptions - more datasets are springing up, but this is still a relatively young field, with more datasets yet to come.

Guided Project: Image Captioning with CNNs and Transformers

<h3 id="realtimeroadsigndetection">Real-Time Road Sign Detection</h3>
If you drive - there's a chance you enjoy cruising down the road. A responsible driver pays attention to the road signs, and adjusts their speed in accordance to the laws mandating that you follow the speed limit in a given area, amongst other signs that regulate drivers.
Though - what if you miss a sign? Not everyone has a sidekick to also pay attention and to tell them when there's a change in the speed limit or if there's another sign worth acknowledging. Some cars, especially modern ones, are equiped with cameras that read road signs in real time and show the current limit on your dashboard. For example, the Citroen C3 has a &quot;Memory&quot; button, which applies the latest noticed speed limit to your cruise control if it's active.
<blockquote>
Wouldn't it be nice to have a system that also watches for road signs and gives you audio cues when it sees one?
</blockquote>
Whether it's a speed limit sign, a stop sign, or another sign - having a side passenger that reminds you of the signs can be pretty useful, especially if this side passenger doesn't blink, only watches for the signs, and runs on your phone if your car doesn't already have a system built in. My old car doesn't have this system and I'd love to use my already existing phone to also look out for the signs, with no extra cost. Furthermore, if you're app-savvy, you can integrate the model into an application that plays sounds or audio clips of voices calling the road signs out loud.

Guided Project: Real-Time Road Sign Detection with YOLOv5

Working with KerasCV

<h3 id="whothiscourseisfor">Who This Course is For?</h3>
We are interfacing with deep learning and the applications of deep learning algorithms every day. With the technology shifting away from only the &quot;cool kids&quot; to practically everyone - there's a huge influx of researchers from various domains, as well as software engineers in the field.
This course is dedicated to everyone with a basic understanding of machine learning and deep learning, to orient themselves towards or initially step into computer vision - an exciting field in which deep learning has been making strides recently. The course will primarily be using Keras - the official high-level API for TensorFlow, with some PyTorch in later lessons.
While prerequisite knowledge of Keras isn't strictly required, it will undoubtedly help. I won't be explaining what an activation function is, how cross-entropy works or what weights and biases are. There are amazing resources covering these topics, ranging from free blogs to paid books - and covering these topics would inherently steer the focus of the course in a different direction than it's intended to be set in. The course is written to be beginner-friendly, with layers to be extracted through multiple reads. Not everything is likely to &quot;stick&quot; after the first read, so intermediate users will find useful information on critical thinking, advanced techniques, new tools (many are covered and used interchangeably instead of reusing the same pipeline), and optimization techniques. The last lesson covers how we can achieve 90% parameter reduction and 50% training time reduction while maintaining validation accuracy, for example.
I've decided to use Keras for the vast majority of the course because it's proven de-facto to be the best framework for relaying high-level concepts into actionable code that delivers results, without drowning you in the technical details. On the other hand, you can dig deeper and access lower levels of the API when required. While TensorFlow, the back-end typically used with Keras, isn't as highly regarded by the deep learning community, it's still the leading framework powering the majority of production applications, and provides a great ecosystem besides the framework itself. You don't need to interface with TensorFlow's lower-level API if you don't want to, if you use Keras. Additionally, TensorFlow gets a worse reputation than it should have, due to the tech debt present from earlier days and versions, which is rapidly being changed.
Lessons that utilize PyTorch are located later in the course, in a situation where deviating from the TensorFlow ecosystem benefits us more than it doesn't, such as for object detection. A good engineer can switch tools, and while it's tempting to stay in a single ecosystem, you shouldn't base your experience and knowledge on the quirks of a single framework - but rather, focus on building skills and understanding of concepts through different lenses.
While writing the course, I've focused on covering both the technical, intuitive and practical side of concepts, demystifying them and making them approachable.
The name of the course starts with &quot;Practical...&quot;. Many think that this mainly means having lots and lots of code and not much theory. Skipping important theory of application leads to bad science and bad models for production. Throughout the course, I'll be detouring to cover various techniques, tools and concepts and then implement them practically. It's my humble opinion that practicality requires an explained basis, as to why you're practically doing something and how it can help you. I tried making the course practical with a focus on making you able to implement things practically. This includes a lot of breaks in which we ask ourselves &quot;Ok, but why?&quot; after code samples.
This also includes okay-but-not-ideal practices in the beginning, identifying their weaknesses, and correcting them later on. This builds foundations much stronger than just going with the better practice from the get-go, since many times, the differences aren't too aparent from a quick glance. These small differences are what makes ML research irreproducible and production systems fail silently. They're understandable to make, and easy to slip into your workflow. Let's work them out lesson by lesson.
Things don't always work - you should know why you're trying them out. I want this course to not only teach you the technical side of Computer Vision, but also how to be a Computer Vision engineer.
<h5 id="fortheresearcher">For the Researcher:</h5>
I hope that this effort pays off in the sense that professionals coming from different fields don't struggle with finding their way through the landscape of deep learning in the context of computer vision, and can apply the correct methodologies required for reproducible scientific rigor. In my time navigating research papers - there are clear issues with applying deep learning to solve current problems. From leaking testing data into training data, to incorrectly applying transfer learning, to misusing the backbone architectures and preventing them to work well, to utilizing old and deprecated technologies that are objectively out of date - modern research could significantly be improved by providing a guiding hand that helps researchers navigate the landscape and avoid common pitfalls. These mistakes are understandable. Re-orienting from a lifetime of life sciences to applying computer vision to a problem you're passionate about is difficult. Addressing this requires building resources that equip researchers with the required know-how to shine in computer vision as much as they shine in their respective fields. This course tries to do exactly that.
<h5 id="forthesoftwareengineer">For the Software Engineer:</h5>
I used to be a software engineer before diving into machine and deep learning. It's a vastly different experience. Many call deep learning &quot;Software 2.0&quot; - a term coined by Andrej Karpathy, one of the major names in deep learning and computer vision. While some raise disputes about the naming convention - the fact of the matter is that it's fundamentally different than what a classical software engineer is used to. Software is about precisely writing down a sequence of steps for a machine to take to achieve a goal. This is both the beauty and bane of software - if it works, it works exactly and only because you wrote it to work. If it doesn't work, it doesn't work exactly and only because you wrote it to not work (usually accidentally). With Software 2.0, instead of explicitly writing instructions, we write the container for those instructions, and let it figure out a way to reach some desired behavior.
At many junctions and problems I tried to solve using software, it was extremely difficult to come up with instructions, and for some problems, it was downright impossible. Imbuing software with machine and deep learning models allows our solutions to problems to also include something extra - something that's beyond our own expertise. When I wanted to help solve an unrealistic bubble in the real estate market by providing accurate appraisals free of charge for all users of the website - I knew that I would never be able to code the rules of what makes the price of some real estate. It was both beyond my expertise, and beyond my physical capabilities. In the end, I built a machine learning system that outperformed local agencies in appraisals and imbued my software with this ability. As a software engineer - you can empower your code with machine and deep learning.
<h5 id="forthestudent">For the Student:</h5>
Every fresh graduate that lands an internship gets to realize the gap between traditional academic knowledge and production code. It's usually a process in which you get hit with a hard case of an impostor syndrome, fear and self-doubt. While these feelings are unnecessary, they're understandable, as you're suddenly surrounded by a wall of proprietary solutions, frameworks and tools nobody told you about before and nuanced uses of paradigms you might be familiar with. Thankfully, this state is easily dispelled through practice, mentorship and simply getting familiar with the tools, in most cases. I hope that this course helps you get ahold of the reins in the deep learning ecosystem for computer vision, covering various tools, utilities, repositories and ideas that you can keep in the back of your head. Keep at it, slow and steady. Incremental improvement is an amazing thing!
<h5 id="forthedataenthusiast">For the Data Enthusiast:</h5>
You don't have to be a professional, or even a professional in training, to appreciate data and hierarchical abstraction. Python is a high-level programming language, and easy to get a hold of even if you haven't worked with it before. Without any experience in computer science, software engineering, mathematics or data science, the road will definitely be more difficult, though. Many issues you might run into won't necessarily be tied to the language or ecosystem itself - setting up a development environment, handling versions of dependencies, finding fixes for issues, etc. are more likely to be a show stopper for you than learning the syntax of a <code>for</code> loop. For example, debugging is natural for software engineers, but is commonly being put off by practitioners who step into ML/DL without an SE background.
Even so, delegating your environment to free online machines (such as Google Colab or Kaggle) removes a lot of the issues associated with your local environment! They're useful for novices as much as for advanced practicioners. They offer free and paid versions, and really helped make both research and sharing results much easier, especially for those without an SE background.
You might also be a philosopher or ethicist looking to break into data or AI ethics. This is an important and growing field. Computer vision systems (as have other machine learning systems) have faced their criticisms in the past in regards to ethically questionable biases. Only when we realize that we have problems can we start fixing them - and we need more people assessing the work of data scientists and helping to root out bias. Having a surface-level understanding of these systems might be sufficient for some analysis - but having a more in-depth understanding (even if you don't intend on building some yourself) can help you assess systems and aid in improving them.
<h3 id="doineedexpensiveequipment">Do I Need Expensive Equipment?</h3>
No. It's great if you have it, but it's not necessary. Having a tower build with 4 graphics cards won't make you a good deep learning engineer nor researcher - it'll just make the algorithms run faster.
Some datasets, to be fair, are possible but simply impractical to run on slower systems, and computer vision is generally best done with a GPU. If you don't have access to one at all - you can always use cloud-based providers. They're free. Platforms like Kaggle and Google Colab, at the time of writing, provide you with a weekly quota (in hours) of free GPUs you can use. You just connect to their cloud-based service, and run your notebooks. Even if you have a GPU, chances are that theirs are going to be better than yours. The selection of GPUs and access changes through time, so to stay up to date with their offerings, it's best if you visit the websites yourself.
Other providers do exist as well - and they typically offer a subscription that nets you access to better resources and/or have a payment model where you pay for each minute/hour you use their resources for. I purposefully won't mention or explicitly endorse any paid product for obvious reasons in the course, though, a quick Google search can find the competitive services.
Without a doubt - services like these substantially help democratize knowledge and access to resources, making research from any part of the world, from the comfort of your home very possible and plausible. You can get cutting-edge performance models within a reasonable timeframe, on a lot of the tasks you decide to dedicate your time to, with these services.
<h3 id="howthecourseisstructured">How the Course is Structured</h3>
The course is structured through Guides and Guided Projects.
Guides serve as an introduction to a topic, such as the following introduction and guide to Convolutional Neural Networks, and assume no prior knowledge in the narrow field but can assume prior knowledge of prerequisites (such as at least basic understanding of loss functions and activation functions for example).
Guided Projects are self-contained and serve to bridge the gap between the cleanly formatted theory and practice and put you knee-deep into the burning problems and questions in the field. With Guided Projects, we presume only the knowledge of the narrower field that you could gain from following the lessons in the course. You can also enroll into Guided Projects as individual mini-courses, though, you gain access to all relevant Guided Projects by enrolling into this course.
Once we've finished reviewing how they're built, we'll assess why we'd want to build them. Theory is theory and practice is practice. Any theory will necessarily be a bit behind the curve - it takes time to produce resources like books and courses, and it's not easy to &quot;just update them&quot;.
<blockquote>
Guided Projects are our attempt at making our courses stay relevant through the years of research and advancement. Theory doesn't change as fast. The application of that theory does.
</blockquote>
In the following lesson, we'll jump into Convolutional Neural Networks - how they work, what they're made of and how to build them, followed by an overview of some of the modern architectures. This is quickly followed by a real project with imperfect data, a lesson on critical thinking, important techniques and further projects.
<h3 id="sourcecodeandnotebooks">Source Code and Notebooks</h3>
The source code of this course is made <a target="_blank" href="https://github.com/DavidLandup0/dl4cv">public and freely available on GitHub</a>, throughout various Jupyter Notebooks that encapsulate all of the Guided Projects in the course. This GitHub repository is meant to serve as a central place to hold all of the source code, track issues and changes and host a community of people looking to apply Computer Vision to their field.
As APIs change and new practices are put into place, I'll be updating the repository. I've executed and tested all code samples in the course, and they work at the time of publishing.
If you're having issues with some code samples, there's a chance that an API change occured, so check the repository.
<h3 id="contact">Contact</h3>
If you want to contact the author (questions, issues, remarks, other feedback, sharing your own work), please don't hesitate to send an email to <a href="mailto:david@stackabuse.com">david@stackabuse.com</a>!

Preface

In the lessons so far, I've left various tips and tricks in paragraphs that are behind us. It takes time for some things to &quot;click&quot;, and I personally tend to read a resource, do work in that field, and then re-read the resource again once some of the knowledge is really solidified. Usually, I find gems sprinkled around that just went over my head the first time I read it.
For those that don't have the luxury of re-reading multiple books from time to time, and for those who are new to the field and want to keep some guiding notes, here's a TL;DR of some of the advice that was given throughout the course.
<h3 id="modeldesign">Model Design</h3>
If you're designing your own network, here are a few key elements of performant architectures and notes I gathered while reading through design papers, playing around with the architecures and seeing many architectures being implemented by novices:
<ul>
<li>Remember that designing networks isn't limited to PhD owners. Try your hand.</li>
<li>Skip flattening layers, use global pooling (<code>GlobalAveragePooling2D()</code>, <code>GlobalMaxPooling2D()</code>).</li>
<li>Use skip connections (shortcut connections).</li>
<li>Use multiple layers with smaller filter sizes, rather than one layer with a larger filter size.</li>
<li>For memory efficiency and smaller models (especially on CPU), use depthwise separable convolutions (<code>SeparableConv2D()</code>).</li>
<li>For training speed and devices with GPUs, use regular convolutions (<code>Conv2D</code>).</li>
<li>Dropout rates are usually good between <code>0.3</code> and <code>0.5</code>.</li>
<li>When possible, re-use an existing network and tweak it (usually by adding a new input and top). The EfficientNet family is a great all-round network for most work. Alternatively, try ConvNeXt.</li>
</ul>

Optimizing Deep Learning Models for Computer Vision

That concludes this course - &quot;Practical Deep Learning for Computer Vision with Python&quot;. Thank you for taking a ride with me!
This course is the result of many nights, coffees, models and research papers, with a sprinkle of love.
<blockquote>
Online education is spreading through the world, and is becoming an increasingly important part of many lives. I believe that accessible, high-quality resources can help empower people that build tomorrow, and remain guided by that goal.
</blockquote>
The point of this course is to get you from walking to running, and developing computer vision applications. Whether you want to learn computer vision for sheer curiosity, to apply it to medical images, manufacturing optimization, human workplace risk mitigation, automated robotics and cars, or to study the philosophical implications of computer vision - I hope that you found this course useful and insightful.
Authors have to make tough decisions about what to include, and what not to include, how to portray a concept or tool, identify potential confusing points, assess their work from many angles, and make it digestible, interesting and actionable. Above all, authors have to always remember that the point of writing is to help empower others to act and to make decisions themselves.

Thank You for Supporting Online Education

We've gone far from the first chapter! From the foundations of computer vision, how it can be conceptualized, to building your first CNN classifier, validating and testing it, through several projects, discovering landscapes, building highly performant data and augmentation pipelines, intersecting NLP and CV, reading and implementing papers, applying best practices, etc.
Deep Learning is a large field, and you might want to dive deeper into it, or you might be looking to dive deeper into Machine Learning in general, or Data Science as the parent field to it. This course is intended for a varied audience, as mentioned in the beginning, and with that assumption - I've made choices on what to include and what not to include, to maximize the informational value for everyone involved.
One of the assumptions was that you have at least a basic understanding of ML and DL and that there's no need to cover material such as activation and loss functions, what a neural network is, etc. Condensing it all into a small section at the start of the course wouldn't do it justice, nor would it really help people get a grasp on it, so I've omitted it. This is the section in which I'd love to take a moment to point you to a few resources that could help you deepen your grasp on data science in general, as well as other resources on deep learning for computer vision.