Introduction
Many of us started out programming on desktops and servers, which seemed to have infinite memory and processing power (well, depending on when you started programming, I guess). There was little reason to optimize your code since you weren't likely to exceed the system's limits anyway. And then, when you got into embedded systems, there was the rude awakening. Moving from such a powerful system to a much smaller, less capable one, like an Arduino, was a bit of a shock. All of a sudden you had to think about saving CPU cycles and memory, which doesn't always come easy to programmers just starting out.
As you'll see throughout this article, fast code is not only important for doing calculations, but even more so for IO operations. If you venture into robotics (like drones or other control systems) this will become even more clear, since much of the work done by the microcontroller results in IO. Usually a faster feedback loop means better performance.
After a few weeks of wrangling with a microcontroller to squeeze out every ounce of processing power possible for a drone flight controller, I thought I'd write up an article to help you find ways to improve the speed and efficiency of your own projects.
Throughout this article, I'll be focusing on the Arduino Uno since that seems to be the most common board out there, although much of this article should apply to the other boards as well.
Why Arduinos are Slow
Clock Speed
First of all, you are only as fast as your clock (disregarding multi-core processors), and the Arduino Uno's clock comes from a 16 MHz crystal by default. That means the ATmega microcontroller can execute up to 16 million instructions per second. Now, 16 million instructions per second may sound like a lot (and it is, sort of), but when you consider everything an Arduino needs to do to execute even simple operations, it really isn't that much. For many projects, the clock cycles are shared between things like calculations, I2C communication, reading and writing to pins and registers, and many more operations.
Even then, seemingly simple commands can take up quite a few clock cycles, such as setting a digital pin high. This is one of the simplest IO operations you can perform on an Arduino, but it actually takes a very long time (over 50 clock cycles!) because of the amount of code used in the digitalWrite() method, which I'll address in the next section. So a faster clock would allow you to execute the instructions at a faster pace.
Safety Checks and Validation
So, outside of the clock speed, why are Arduinos slow? Well, it mostly has to do with some of the standard method calls and objects we use throughout our code. Here are just a few of the main culprits:
digitalWrite()
digitalRead()
pinMode()
Many of these methods suffer from the same drawbacks, so let's take a look at the code for one of the most commonly used methods, digitalWrite():
void digitalWrite(uint8_t pin, uint8_t val)
{
    uint8_t timer = digitalPinToTimer(pin);
    uint8_t bit = digitalPinToBitMask(pin);
    uint8_t port = digitalPinToPort(pin);
    volatile uint8_t *out;

    if (port == NOT_A_PIN) return;

    // If the pin that support PWM output, we need to turn it off
    // before doing a digital write.
    if (timer != NOT_ON_TIMER) turnOffPWM(timer);

    out = portOutputRegister(port);

    uint8_t oldSREG = SREG;
    cli();

    if (val == LOW) {
        *out &= ~bit;
    } else {
        *out |= bit;
    }

    SREG = oldSREG;
}
As you can see, there is quite a bit going on here. But shouldn't it be much simpler? All we really need to do is set the pin high or low in a register, right? Turns out that the Arduino creators decided it was more important to add safety checks and validation to the code than it was to make the code fast. After all, this is a platform targeted more towards beginners and education than it is to power users and CPU-intensive applications.
The first few lines use the pin parameter to find the corresponding timer, bit, and port for the given pin. The port is actually just a memory-mapped register, which controls multiple pins. To only turn on or off the pin we want, we need to determine which bit of the register our pin corresponds to, which is what the digitalPinToBitMask() function does.
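To make the pin-to-register lookup more concrete, here is a minimal sketch of the idea for the Uno. It is not the Arduino source (which uses lookup tables in program memory); the names Port, PinLocation, and locatePin are made up for illustration, and the code assumes the standard Uno layout where pins 0-7 sit on port D, pins 8-13 on port B, and A0-A5 (pins 14-19) on port C.

```cpp
#include <cstdint>

// Hypothetical, simplified stand-in for the Uno's pin tables.
enum class Port : uint8_t { None, B, C, D };

struct PinLocation {
    Port port;     // which 8-bit port register the pin belongs to
    uint8_t mask;  // one set bit selecting the pin inside that register
};

// Assumed Uno layout: pins 0-7 -> port D bits 0-7,
// pins 8-13 -> port B bits 0-5, pins 14-19 (A0-A5) -> port C bits 0-5.
constexpr PinLocation locatePin(uint8_t pin) {
    return pin <= 7  ? PinLocation{Port::D, static_cast<uint8_t>(1u << pin)}
         : pin <= 13 ? PinLocation{Port::B, static_cast<uint8_t>(1u << (pin - 8))}
         : pin <= 19 ? PinLocation{Port::C, static_cast<uint8_t>(1u << (pin - 14))}
                     : PinLocation{Port::None, 0};  // not a valid pin
}
```

With this mapping, pin 8 resolves to bit 0 of port B and pin 13 (the on-board LED) to bit 5 of port B. An out-of-range pin comes back as Port::None, which plays the same role as the NOT_A_PIN check in digitalWrite().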
Once we've found the timer, bit, and port, we check to make sure it is in fact a valid pin. This line isn't required for digitalWrite() to do its job, but it acts as a safety net for more inexperienced programmers (and even experienced ones). We'd hate to be writing to the wrong memory location and corrupt the program.
The if (timer != NOT_ON_TIMER) ... line is there to make sure we end any previous PWM usage of the pin before we write a "constant" high or low. Many of the pins on Arduinos can also be used for PWM output, which requires a timer to operate by timing the duty cycles. If needed, this line will turn off the PWM. So again, to ensure we don't see any weird behavior, this is a safety check meant to help the user.
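And finally, in the last few lines we actually write the given value to the port's output register, setting or clearing only the bit for our pin. Interrupts are disabled with cli() around the write and the saved SREG value is restored afterwards, so an interrupt can't modify the register halfway through the update.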
The safety checks do slow down the execution quite a bit, but they also make debugging much easier. This way when something goes wrong, you're less likely to get weird behavior that will leave you scratching your head. There is nothing more frustrating than having seemingly logical code and not getting the expected result. Programming microcontrollers is much different than programming desktop or phone apps (although they have their fair share of difficulties too). Since you're working directly with hardware and don't have an operating system to keep you safe, problems can be hard to find.
If speed isn't your goal, then I'd highly recommend you continue to use the methods provided by Arduino. There is no point in exposing yourself to unnecessary risk if it doesn't help you reach your end goal.
You should also know that method calls aren't always slow because of the amount of code they execute; the physical limitations of the device can be a contributing factor too. For example, analogRead() takes about 100 microseconds per call due to the resolution it provides and the clock it's supplied. A lower ADC resolution would decrease the time each call takes. However, while the hardware is ultimately the limiting factor here, the Arduino code does conservatively set the ADC's maximum sample rate to only about 9600 Hz (while it is capable of around 77 kHz). So, while Arduinos are much slower than they need to be, it isn't always because of design choices and trade-offs. There is a good discussion about this here, and documentation here.
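The 9600 Hz and 77 kHz figures can be reproduced with a little arithmetic. The sketch below assumes a 16 MHz system clock, 13 ADC clock cycles per normal conversion (the figure given in the ATmega328P datasheet), and that the Arduino core uses an ADC prescaler of 128; the function and constant names are my own, not part of any Arduino API.

```cpp
#include <cstdint>

// Assumed values for an Uno: 16 MHz system clock and 13 ADC clock
// cycles per normal conversion.
constexpr uint32_t kCpuHz = 16000000UL;
constexpr uint32_t kAdcCyclesPerConversion = 13;

// Conversions per second for a given ADC prescaler (2, 4, ..., 128).
constexpr uint32_t adcSampleRateHz(uint32_t prescaler) {
    return kCpuHz / prescaler / kAdcCyclesPerConversion;
}

// Time for one conversion in whole microseconds (rounded down).
constexpr uint32_t adcConversionMicros(uint32_t prescaler) {
    return static_cast<uint32_t>(
        static_cast<uint64_t>(kAdcCyclesPerConversion) * prescaler * 1000000ULL / kCpuHz);
}
```

Under these assumptions, a prescaler of 128 gives roughly 9615 conversions per second at about 104 microseconds each, which lines up with the numbers above. Dropping the prescaler to 16 gives about 76,900 conversions per second, at the cost of accuracy.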
How to Speed up Arduino
To be clear, we aren't actually making the Arduino faster; we're making the code more efficient. I point out this distinction because these tricks won't give us a faster clock (although we can speed up the clock, which I'll touch on later), they just execute less code. The difference matters because a faster clock provides other benefits too, like more precise timers and faster communication.
Also, keep in mind that by using the code below you're making some trade-offs. The programmers who developed Arduino weren't just lousy coders who couldn't write fast code. They consciously made the decision to add validations and safety checks to methods like digitalWrite() since it benefits their target customers. Just make sure you understand what can (and will) go wrong with these kinds of trade-offs.
Anyway, on to the code.
Digital Write
Now, I'm not going to show you how to speed up every method, but many of the same concepts from here can be applied to other methods like pinMode(). The minimum amount of code you need to write to a pin is:
#define CLR(x,y) ((x) &= ~(1 << (y)))
#define SET(x,y) ((x) |= (1 << (y)))
Yup, that's it.
As you can see, we get right to the point in these macros. To use them, you'll have to reference both the port and bit position directly instead of conveniently using the pin numbers. For example, we'd be using the macro like this:
SET(PORTB, 0);
This would end up writing a HIGH value to pin 8 on your Arduino Uno. It's a little bit rough, but much faster. This also means we're more prone to doing something wrong, like referencing a non-existent port, writing over an active PWM, or a number of other things.
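If you want to see what the macros do to a register without a board attached, here is a small sketch that runs on a desktop compiler. fakePortB is a made-up stand-in for the real PORTB register, and demoPortWrites is a hypothetical helper; on an actual Uno you would pass PORTB itself.

```cpp
#include <cstdint>

// Same idea as the macros above, with the arguments parenthesized.
#define CLR(x, y) ((x) &= ~(1 << (y)))
#define SET(x, y) ((x) |= (1 << (y)))

// Made-up stand-in for PORTB so this compiles off-target.
static uint8_t fakePortB = 0;

// Drive bit 0 (Uno pin 8) high, then bit 5 (pin 13) high,
// then bit 0 low again, and return the resulting register value.
uint8_t demoPortWrites() {
    SET(fakePortB, 0);  // pin 8 high
    SET(fakePortB, 5);  // pin 13 high, pin 8 untouched
    CLR(fakePortB, 0);  // pin 8 low, pin 13 untouched
    return fakePortB;
}
```

Each macro touches only the bit you name and leaves the other seven pins on the port alone, which is the whole point of the masking. Note that nothing here checks the bit number or turns off PWM for you.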
The macro gives us quite a boost, using up an estimated 2 cycles (a maximum write rate of about 8 MHz at the Uno's 16 MHz clock), whereas digitalWrite() uses up a whopping 56 cycles (about 285 kHz).
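Those frequencies fall straight out of the cycle counts. As a rough sketch, assuming a 16 MHz clock and treating the 2 and 56 cycle estimates as exact (the helper names are mine):

```cpp
#include <cstdint>

// Assumed Uno clock speed.
constexpr uint32_t kClockHz = 16000000UL;

// Back-to-back calls that fit in one second at a given cost per call.
constexpr uint32_t maxCallsPerSecond(uint32_t cyclesPerCall) {
    return kClockHz / cyclesPerCall;
}

// Duration of one call in nanoseconds (one cycle is 62.5 ns at 16 MHz).
constexpr uint64_t nanosPerCall(uint32_t cyclesPerCall) {
    return cyclesPerCall * 1000000000ULL / kClockHz;
}
```

At 2 cycles that works out to 8,000,000 writes per second (125 ns each), and at 56 cycles to about 285,714 writes per second (3,500 ns each), so the macro is roughly 28 times faster on paper.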
Serial
Unfortunately, for some tasks, like serial communication, there isn't a whole lot you can do to improve the speed, but there are some optimizations you can watch out for.
Serial communication is commonly used for sending debugging or status information to the desktop IDE, which means you probably have Serial.println() statements all throughout your code. It's easy to forget about these statements after development, so if you're looking for a speed boost and don't need to debug anymore, try removing all the println() calls and removing Serial from the code altogether. Just having it initialized (and not even using Serial.println()) means you're wasting a lot of cycles in the TX and RX interrupts. In one case, it was measured that just having Serial enabled slowed down the digitalWrite() calls by about 18%. That's a lot of wasted cycles due to dead code.
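One common way to make the debug output easy to strip (my suggestion, not something the measurements above depend on) is to route it through a macro that compiles to nothing unless a flag is defined. The sketch below is written to run on a desktop compiler, so fakeSerial stands in for Serial; DEBUG_SERIAL, DEBUG_PRINTLN, and controlStep are hypothetical names. On a real board the macro body would call Serial.println(msg), and Serial.begin() would sit behind the same #ifdef.

```cpp
#include <sstream>

// Host stand-in for Serial so the example can be built off-target.
static std::ostringstream fakeSerial;

// Uncomment while debugging; leave it out for release builds.
// #define DEBUG_SERIAL

#ifdef DEBUG_SERIAL
  #define DEBUG_PRINTLN(msg) (fakeSerial << (msg) << '\n')
#else
  #define DEBUG_PRINTLN(msg) ((void)0)  // expands to a no-op: no serial code is emitted
#endif

// Hypothetical control-loop step that logs only in debug builds.
int controlStep(int sensorValue) {
    DEBUG_PRINTLN("controlStep called");
    return sensorValue * 2;
}
```

With DEBUG_SERIAL left undefined, the logging line disappears at compile time, so there is nothing to forget to delete later.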
Clock Speed
Although I've mostly focused on the software improvements you can make, don't forget that there is always the "simple" improvement of speeding up the clock. I say it this way because it isn't really a simple plug-and-play improvement after all. In order to speed up the clock on an Arduino, you need to solder a new crystal onto the board, which may or may not be difficult depending on your soldering skills.
Once you've put in a new crystal oscillator, you still need to update the bootloader to reflect the change, otherwise it won't be able to receive code over the serial port. And lastly, you'll need to change the F_CPU value to the proper clock speed. If you upped the clock to 20 MHz (the fastest clock the ATmega is rated for), for example, then you'll need to modify a few files in the Arduino IDE:
- In preferences.txt, change
  - from: build.f_cpu=16000000L
  - to: build.f_cpu=20000000L
- In the makefile, change
  - from: F_CPU = 16000000
  - to: F_CPU = 20000000
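To see why the F_CPU value matters, remember that timing code is compiled against it: delays and baud rates are worked out as a number of clock cycles. The following is a simplified model of that effect, not the actual Arduino timing code, and actualDelayMs is a made-up helper.

```cpp
#include <cstdint>

// Simplified model: a delay compiled for assumedHz counts a fixed number
// of cycles, so on a chip really running at actualHz it finishes in
// requestedMs * assumedHz / actualHz milliseconds.
constexpr uint32_t actualDelayMs(uint32_t requestedMs,
                                 uint32_t assumedHz,
                                 uint32_t actualHz) {
    return static_cast<uint32_t>(
        static_cast<uint64_t>(requestedMs) * assumedHz / actualHz);
}
```

In this model, a one-second delay built with F_CPU still set to 16 MHz would finish in about 800 ms on a 20 MHz crystal, and serial baud rates would be off by the same 25%. Once F_CPU matches the crystal, the delay is back to 1000 ms.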
According to this post, the ATmega328 can be overclocked to 30 MHz, but I don't recommend it =)
Conclusion
Hopefully you found something in this post that you can easily apply to your projects, or at least I'm hoping it will encourage you to browse the Arduino source code to find optimizations of your own. The Arduino is a very capable microcontroller, but it can be capable of so much more.
Have any optimizations of your own you'd like to share? Let us know in the comments!