Preface
Who Is This Course For?
We interact with deep learning and the applications of deep learning algorithms every day. With the technology shifting away from only the "cool kids" to practically everyone, there's a huge influx of researchers from various domains, as well as software engineers, into the field.
This course is for everyone with a basic understanding of machine learning and deep learning who wants to orient themselves towards, or take their first steps into, computer vision - an exciting field in which deep learning has been making strides recently. The course will primarily be using Keras - the official high-level API for TensorFlow - with some PyTorch in later lessons.
While prerequisite knowledge of Keras isn't strictly required, it will undoubtedly help. I won't be explaining what an activation function is, how cross-entropy works or what weights and biases are. There are amazing resources covering these topics, ranging from free blogs to paid books, and covering them here would steer the focus of the course in a different direction than it's intended to take. The course is written to be beginner-friendly, with layers to be extracted through multiple reads. Not everything is likely to "stick" after the first read, and intermediate users will find useful information on critical thinking, advanced techniques, new tools (many are covered and used interchangeably instead of reusing the same pipeline), and optimization techniques. The last lesson, for example, covers how we can achieve a 90% parameter reduction and a 50% training time reduction while maintaining validation accuracy.
I've decided to use Keras for the vast majority of the course because it has proven to be the de facto best framework for translating high-level concepts into actionable code that delivers results, without drowning you in technical details. At the same time, you can dig deeper and access lower levels of the API when required. While TensorFlow, the back-end typically used with Keras, isn't as highly regarded in parts of the deep learning community, it's still the leading framework powering the majority of production applications, and it provides a great ecosystem beyond the framework itself. If you use Keras, you don't need to interface with TensorFlow's lower-level API unless you want to. Additionally, TensorFlow gets a worse reputation than it deserves, due to the tech debt carried over from earlier days and versions, which is rapidly being addressed.
Lessons that utilize PyTorch come later in the course, in situations where deviating from the TensorFlow ecosystem benefits us more than it costs us, such as for object detection. A good engineer can switch tools, and while it's tempting to stay in a single ecosystem, you shouldn't base your experience and knowledge on the quirks of a single framework - rather, focus on building skills and understanding concepts through different lenses.
While writing the course, I've focused on covering the technical, intuitive and practical sides of concepts, demystifying them and making them approachable.
The name of the course starts with "Practical...". Many take this to mean lots and lots of code and not much theory. Skipping the theory behind an application leads to bad science and bad production models. Throughout the course, I'll be detouring to cover various techniques, tools and concepts and then implement them practically. It's my humble opinion that practicality requires an explained basis - why you're doing something and how it can help you. I've tried to make the course practical, with a focus on enabling you to implement things yourself. This includes a lot of breaks in which we ask ourselves "Okay, but why?" after code samples.
This also includes okay-but-not-ideal practices in the beginning, identifying their weaknesses, and correcting them later on. This builds much stronger foundations than just going with the better practice from the get-go, since many times the differences aren't too apparent at a quick glance. These small differences are what make ML research irreproducible and production systems fail silently. These mistakes are understandable to make, and easy to let slip into your workflow. Let's work them out lesson by lesson.
Things don't always work - you should know why you're trying them out. I want this course to not only teach you the technical side of Computer Vision, but also how to be a Computer Vision engineer.
For the Researcher:
I hope that this effort pays off, in the sense that professionals coming from different fields don't struggle to find their way through the landscape of deep learning in the context of computer vision, and can apply the methodologies required for reproducible scientific rigor. In my time navigating research papers, I've seen clear issues with how deep learning is applied to current problems. From leaking testing data into training data, to incorrectly applying transfer learning, to misusing backbone architectures and preventing them from working well, to relying on old, deprecated technologies - modern research could be significantly improved by a guiding hand that helps researchers navigate the landscape and avoid common pitfalls. These mistakes are understandable. Re-orienting from a lifetime in the life sciences to applying computer vision to a problem you're passionate about is difficult. Addressing this requires building resources that equip researchers with the know-how to shine in computer vision as much as they shine in their respective fields. This course tries to do exactly that.
For the Software Engineer:
I used to be a software engineer before diving into machine and deep learning. It's a vastly different experience. Many call deep learning "Software 2.0" - a term coined by Andrej Karpathy, one of the major names in deep learning and computer vision. While some dispute the naming, the fact of the matter is that it's fundamentally different from what a classical software engineer is used to. Software is about precisely writing down a sequence of steps for a machine to take to achieve a goal. This is both the beauty and bane of software - if it works, it works exactly and only because you wrote it to work. If it doesn't work, it doesn't work exactly and only because you wrote it not to work (usually accidentally). With Software 2.0, instead of explicitly writing instructions, we write the container for those instructions, and let it figure out a way to reach some desired behavior.
At many junctions, for problems I tried to solve with software, it was extremely difficult to come up with instructions, and for some problems, it was downright impossible. Imbuing software with machine and deep learning models lets our solutions include something extra - something beyond our own expertise. When I wanted to help counter an unrealistic bubble in the real estate market by providing accurate appraisals free of charge for all users of the website, I knew I would never be able to code the rules of what determines the price of a property. It was both beyond my expertise and beyond my physical capabilities. In the end, I built a machine learning system that outperformed local agencies in appraisals and imbued my software with this ability. As a software engineer, you can empower your code with machine and deep learning.
For the Student:
Every fresh graduate who lands an internship comes to realize the gap between traditional academic knowledge and production code. It's usually a process in which you get hit with a hard case of impostor syndrome, fear and self-doubt. While these feelings are unnecessary, they're understandable, as you're suddenly surrounded by a wall of proprietary solutions, frameworks and tools nobody told you about before, and nuanced uses of paradigms you might be familiar with. Thankfully, in most cases, this state is easily dispelled through practice, mentorship and simply getting familiar with the tools. I hope that this course helps you get hold of the reins in the deep learning ecosystem for computer vision, covering various tools, utilities, repositories and ideas that you can keep in the back of your head. Keep at it, slow and steady. Incremental improvement is an amazing thing!
For the Data Enthusiast:
You don't have to be a professional, or even a professional in training, to appreciate data and hierarchical abstraction. Python is a high-level programming language, and easy to get a hold of even if you haven't worked with it before. Without any experience in computer science, software engineering, mathematics or data science, the road will definitely be more difficult, though. Many issues you might run into won't necessarily be tied to the language or ecosystem itself - setting up a development environment, handling versions of dependencies, finding fixes for issues, and so on are more likely to be showstoppers for you than learning the syntax of a for loop. For example, debugging comes naturally to software engineers, but is commonly put off by practitioners who step into ML/DL without an SE background.
Even so, delegating your environment to free online machines (such as Google Colab or Kaggle) removes a lot of the issues associated with your local environment! They're as useful for novices as they are for advanced practitioners. They offer free and paid tiers, and have really helped make both research and sharing results much easier, especially for those without an SE background.
You might also be a philosopher or ethicist looking to break into data or AI ethics. This is an important and growing field. Computer vision systems (like other machine learning systems) have faced criticism in the past with regard to ethically questionable biases. Only when we realize that we have problems can we start fixing them - and we need more people assessing the work of data scientists and helping to root out bias. Having a surface-level understanding of these systems might be sufficient for some analysis - but having a more in-depth understanding (even if you don't intend to build one yourself) can help you assess systems and aid in improving them.
Do I Need Expensive Equipment?
No. It's great if you have it, but it's not necessary. Having a tower built with 4 graphics cards won't make you a good deep learning engineer or researcher - it'll just make the algorithms run faster.
Some datasets, to be fair, are simply impractical to work with on slower systems, and computer vision is generally best done with a GPU. If you don't have access to one at all, you can always use cloud-based providers. They're free. Platforms like Kaggle and Google Colab, at the time of writing, provide you with a weekly quota (in hours) of free GPUs. You just connect to their cloud-based service and run your notebooks. Even if you have a GPU, chances are that theirs will be better than yours. The selection of GPUs and access changes over time, so to stay up to date with their offerings, it's best to visit the websites yourself.
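As a minimal, illustrative sketch (not part of the course's Guided Projects): once you're connected to a Colab or Kaggle runtime with a GPU accelerator enabled, a couple of lines of Python are enough to check that TensorFlow can actually see the GPU:

import tensorflow as tf

# List the GPUs visible to TensorFlow on the current runtime.
# An empty list usually means no GPU accelerator is enabled for this session.
print(tf.config.list_physical_devices('GPU'))

If the printed list is empty, enabling the GPU runtime in the platform's settings and restarting the session is usually all it takes.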
Other providers exist as well - they typically offer a subscription that nets you access to better resources, and/or a payment model where you pay for each minute or hour you use their resources. I purposely won't mention or explicitly endorse any paid product in the course, for obvious reasons; a quick Google search will surface the competitive services.
Without a doubt, services like these substantially help democratize knowledge and access to resources, making research possible from any part of the world, from the comfort of your home. With these services, you can train models with cutting-edge performance within a reasonable timeframe on many of the tasks you decide to dedicate your time to.
How the Course is Structured
The course is structured through Guides and Guided Projects.
Guides serve as an introduction to a topic, such as the following introduction and guide to Convolutional Neural Networks. They assume no prior knowledge of the narrow field, but may assume prerequisite knowledge (such as at least a basic understanding of loss functions and activation functions).
Guided Projects are self-contained and serve to bridge the gap between cleanly formatted theory and practice, putting you knee-deep into the burning problems and questions in the field. In Guided Projects, we presume only the knowledge of the narrower field that you could gain from following the lessons in the course. You can also enroll in Guided Projects as individual mini-courses, though you gain access to all relevant Guided Projects by enrolling in this course.
Once we've finished reviewing how they're built, we'll assess why we'd want to build them. Theory is theory and practice is practice. Any theory will necessarily be a bit behind the curve - it takes time to produce resources like books and courses, and it's not easy to "just update them".
Guided Projects are our attempt at making our courses stay relevant through the years of research and advancement. Theory doesn't change as fast. The application of that theory does.
In the following lesson, we'll jump into Convolutional Neural Networks - how they work, what they're made of and how to build them, followed by an overview of some of the modern architectures. This is quickly followed by a real project with imperfect data, a lesson on critical thinking, important techniques and further projects.
Source Code and Notebooks
The source code of this course is public and freely available on GitHub, across various Jupyter Notebooks that encapsulate all of the Guided Projects in the course. This GitHub repository is meant to serve as a central place to hold all of the source code, track issues and changes, and host a community of people looking to apply Computer Vision to their field.
As APIs change and new practices are put into place, I'll be updating the repository. I've executed and tested all code samples in the course, and they work at the time of publishing.
If you're having issues with some code samples, there's a chance that an API change has occurred, so check the repository.
Contact
If you want to contact the author (questions, issues, remarks, other feedback, sharing your own work), please don't hesitate to send an email to [email protected]!