Introduction
In this guide, we'll take a look at what GitHub Actions are, how they work, and build a workflow using Python to showcase how you can use GitHub Actions to automate tasks.
Since its inception in 2008, GitHub has grown to become the de facto leader in development project hosting. A community-oriented idea to allow all of our favorite open-source programs free hosting in one central place blew up. GitHub became so popular that it became synonymous with git; you'll find dozens of articles online explaining how git is not the same as GitHub, and vice versa.
On its 10-year anniversary, a big company acquired GitHub for 7.5 billion dollars. That company's name is Microsoft. The GitHub acquisition aside, by building WSL and maintaining many open-source projects like VS Code, .NET and TypeScript, just to name a few, Microsoft changed the development game and, with it, the general public's opinion of the company after the invasion of privacy that was Windows 10.
Community-oriented as it still may be, GitHub's next goal was to start making some revenue - by entering the enterprise scene. Cue - GitHub Actions.
Taking a Look at Existing Enterprise Solutions
At the time of Microsoft getting its hands on GitHub, the enterprise scene for software development was already established by a few big players:
- Atlassian's BitBucket allowed for seamless integration with Jira and Trello, the leaders in issue management and organization.
- Amazon's CodeCommit allowed organizations using AWS to never leave the comforts of one UI and one CLI tool.
- GitLab, with its DevOps-oriented approach, aimed to centralize the entire development process under one roof.
In the past few years GitHub has managed to add many of its enterprise competition's features, including CI/CD.
CI/CD and Automation
Modern software development relies heavily on automation, and for a simple reason - it speeds things up. New versions are automatically built, tested and deployed to the appropriate environments.
All it takes is a single effort to write up a couple of scripts and configure a few machines to execute them. GitHub's offering of such features comes in the form of GitHub Actions.
An Overview of GitHub Actions
At the time of writing this guide, GitHub Actions is less than two years old. Despite its young age, the feature has matured pretty well, helped by being built directly into GitHub.
The Community
Countless users jumped aboard and started getting to know the ins and outs of GitHub Actions and started writing up their own reusable modules (or actions) and shared them with the rest of the world. GitHub heavily relies on such contributions in its marketing model. Currently there are over 9,500 different actions which allow you to, in a few lines of code, set up your environments, run linters and testers, interact with numerous major platform APIs etc. All without ever installing any software besides git and your favorite editor.
Workflows
We define our automated process through workflows. They are YAML files which contain, among other things, the name of our workflow, trigger events, jobs and steps of our pipeline and runners to perform them.
YAML
YAML Ain't Markup Language, or YAML (a recursive acronym), is a language mostly used for writing configuration files. It is often preferred over JSON for easier writing and readability. Even though JSON is faster in terms of serialization, and much more strict, YAML is used in places where speed is not of great importance.
If you've never had experience with YAML, I highly encourage you to visit Learn X in Y minutes, where X=YAML.
If you're somewhat experienced, I recommend reading about some of YAML's idiosyncrasies and gotchas.
Trigger Events
The on keyword specifies one or more GitHub (note: not just git) events that will trigger the workflow. The event can be very broad, e.g. on every push to the repository, or very specific, e.g. every time a pull request gets a new comment.
Events can be limited to specific branches, or scheduled in a cron-like fashion:
name: my workflow
on:
  push:
    branches: [main, test]
Here, we've got a trigger event set for every push to either the main or test branch. Another way to register triggers is on a schedule, such as:
name: my nightly build workflow
on:
  schedule:
    - cron: '0 22 * * *'   # schedule takes a list of cron entries
This is a nightly build scheduled for 10 PM UTC every day.
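To make the five fields of a cron expression (minute, hour, day of month, month, day of week) a bit more concrete, here is a minimal Python sketch that checks whether a given moment matches an expression. It is a simplification that only understands `*` and plain integers (no ranges, lists or steps), and the name `cron_matches` is made up for this illustration. It is not how GitHub evaluates schedules internally, only a way to read them.

```python
from datetime import datetime, timezone


def cron_matches(expression: str, moment: datetime) -> bool:
    """Return True if `moment` matches a simplified 5-field cron expression.

    Only '*' and plain integers are supported in this sketch.
    """
    fields = expression.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields: minute hour day month weekday")

    # cron counts weekdays from Sunday = 0, datetime.weekday() from Monday = 0
    cron_weekday = (moment.weekday() + 1) % 7
    actual = [moment.minute, moment.hour, moment.day, moment.month, cron_weekday]

    return all(
        field == "*" or int(field) == value
        for field, value in zip(fields, actual)
    )


# 22:00 UTC on an arbitrary day matches the nightly schedule, 22:01 does not
ten_pm = datetime(2021, 3, 14, 22, 0, tzinfo=timezone.utc)
print(cron_matches("0 22 * * *", ten_pm))                    # True
print(cron_matches("0 22 * * *", ten_pm.replace(minute=1)))  # False
```

Reading the expression field by field this way makes it easier to spot mistakes such as swapping the minute and hour positions.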
Jobs
So far, we've given our workflow a name and configured different events that trigger it. The jobs keyword lists actions that will be executed. One workflow can hold multiple jobs with multiple steps each:
jobs:
  job1:
    steps:
      .
      .
  job2:
    steps:
      .
      .
By default, all jobs run in parallel, but we can make one job wait for the execution of another using the needs keyword:
jobs:
  job1:
    steps:
      .
      .
  job2:
    needs: job1
    steps:
      .
      .
  job3:
    needs: [job1, job2]
    steps:
      .
      .
This ensures the jobs execute one by one, each starting only after the jobs it needs have finished successfully.
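The needs relationships form a dependency graph, and the resulting execution order can be worked out with a topological sort. The following is a small Python sketch of that idea using the standard library's graphlib module (Python 3.9+). It mirrors the three jobs above, but it only illustrates the ordering; it is not GitHub's actual scheduler.

```python
from graphlib import TopologicalSorter  # standard library since Python 3.9

# Each job maps to the set of jobs it lists under `needs`
needs = {
    "job1": set(),
    "job2": {"job1"},
    "job3": {"job1", "job2"},
}

# static_order() yields every job after all of the jobs it depends on
order = list(TopologicalSorter(needs).static_order())
print(order)  # ['job1', 'job2', 'job3']
```

Jobs with no dependency between them could run in any order (or in parallel), which is why only the needs entries constrain the result.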
We can also independently configure each job's environment, or run a job across multiple configurations using the matrix strategy. The documentation notes:
A matrix allows you to create multiple jobs by performing variable substitution in a single job definition.
Here's an example of a matrix build configured to work on multiple platforms:
jobs:
  ubuntu_job:
    runs-on: ubuntu-latest
    steps:
      .
      .
  multi_os_job:
    runs-on: ${{ matrix.os }}   # expressions are written as ${{ ... }}
    strategy:
      matrix:
        os: [ubuntu-latest, windows-2016, macos-latest]
    steps:
      .
      .
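When a matrix has more than one variable, one job is created per combination of values. As a rough illustration, the Python sketch below expands a matrix into its individual job configurations with itertools.product. The `expand_matrix` helper and the second `python` axis are assumptions added for this example; they are not part of the workflow above.

```python
from itertools import product


def expand_matrix(matrix: dict) -> list:
    """Return one dict per combination of the matrix values."""
    keys = list(matrix)
    return [dict(zip(keys, combo)) for combo in product(*(matrix[k] for k in keys))]


# `python` is a hypothetical second axis added for illustration
matrix = {
    "os": ["ubuntu-latest", "windows-2016", "macos-latest"],
    "python": ["3.7", "3.8"],
}

jobs = expand_matrix(matrix)
print(len(jobs))  # 6
print(jobs[0])    # {'os': 'ubuntu-latest', 'python': '3.7'}
```

Three operating systems and two Python versions yield six jobs, which is worth keeping in mind since every added axis multiplies the number of runs.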
Actions
Actions are reusable modules which can be placed in workflows as any other job or step. They can both take inputs and produce outputs. The community marketplace is rich with many bootstrap actions for preparing environments; we will be using a few today.
You can write your own actions either as Docker containers or in vanilla JavaScript, and contribute them to the marketplace or keep them to yourself.
An action can easily be referenced in a workflow like any other step in the list:
jobs:
  compile_code:
    runs-on: ubuntu-latest
    steps:
      - name: check out repo
        uses: actions/checkout@v2
      - name: compile code
        run: gcc main.c
      .
      .
Here, we can see an example of using actions like any other step. Note that steps are, unlike jobs, always executed consecutively.
Runners
Runners, otherwise known as agents or workers, are machines which are tasked with executing your workflows. Each runner can be set up differently. For example, GitHub offers runners in the three most popular OS flavors - Ubuntu, Windows and macOS.
GitHub offers their own runners, but you can also opt to host your own runner with the GitHub Actions runner application configured.
Pricing
GitHub runners execute workflows for free if the repository is public. Private repositories on the free plan get a monthly allowance of 2,000 minutes.
Teams and Enterprises have their own pricing categories (typical) with different perks and prices, at $4/user per month and $21/user per month respectively, as of writing this guide.
For a complete overview of GitHub's plans, check out GitHub's updated pricing page.
Artifacts - Workflow Persistent Data
Since GitHub runners are only temporarily available, so is the data they process and generate. Artifacts are data that can remain available on the repository page after the runners finish executing; they need to be uploaded with the special upload-artifact action.
The default retention period is 90 days, but that can be changed.
The overview screen greets us with a lot of data, including the number of the workflow run, a list of all jobs that are queued for execution or have already executed, the visual representation of different jobs and their connections, as well as any artifacts produced by the workflow.
GitHub Actions in Practice - A Python Benchmarker
Note: this example uses a repository created for this article, which can be found, unsurprisingly, on GitHub.
Let's combine what we've covered into a fully-fledged workflow. We will be creating a Python benchmarker workflow which we will place in .github/workflows/benchmark.yml.
The workflow will be triggered on every push to the main branch.
name: python version benchmarker
on:
push:
branches: [main]
The workflow consists of three stages.
The Lint Stage
The first job is tasked with linting the contents of benchmarker.py, making sure that it has a score of at least 8.0:
jobs:
  pylint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v2                    # checkout repo
      - uses: actions/setup-python@v2                # set up environment for python
        with:
          python-version: 3.7
      - uses: py-actions/py-dependency-install@v2    # install dependencies from requirements.txt
        with:
          path: requirements.txt
      - name: run pylint, fail under 8.0
        run: pip install pylint; pylint benchmarker.py --fail-under=8
Benchmark
We will be running the benchmark across 6 different versions and implementations of Python, failing if the code isn't compatible with all of them (configured with the fail-fast parameter of the matrix strategy, which is true by default):
  benchmark:
    runs-on: ubuntu-latest
    needs: pylint
    outputs:
      pypy2: ${{ steps.result.outputs.pypy2 }}
      pypy3: ${{ steps.result.outputs.pypy3 }}
      py2-7: ${{ steps.result.outputs.py2-7 }}
      py3-6: ${{ steps.result.outputs.py3-6 }}
      py3-7: ${{ steps.result.outputs.py3-7 }}
      py3-8: ${{ steps.result.outputs.py3-8 }}
    strategy:
      matrix:
        include:
          - python-version: pypy2
            out: pypy2
          - python-version: pypy3
            out: pypy3
          - python-version: 2.7
            out: py2-7
          - python-version: 3.6
            out: py3-6
          - python-version: 3.7
            out: py3-7
          - python-version: 3.8
            out: py3-8
    steps:
      - uses: actions/checkout@v2
      - name: setup py
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
      - name: save benchmark stats
        id: result
        run: |
          echo "::set-output name=${{ matrix.out }}::$(python benchmarker.py)"
Let's take a more detailed look at this, to see some finer issues you can come across when using GitHub Actions. The outputs keyword specifies key:value pairs that a job can produce and allow other jobs to reference. The key is the name of the output and the value is a reference to a particular output of a step with a given id.
In our case, the step with id: result will produce an output based on the matrix value of python-version. That value had to be modified and provided through the out parameter, since GitHub's object access syntax doesn't allow dots in object names, or a number in the first position.
There was no inherent way of placing outputs in a single JSON and referencing steps.result.outputs as a JSON object, which can be done for read-only purposes as we will see in the following stage. Each output must instead be defined explicitly.
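The renaming from python-version to out follows a simple pattern: dots become dashes, and names that would start with a digit get a prefix. The Python sketch below reproduces that mapping for the six versions in the matrix. The `to_output_name` helper is hypothetical and only shows the naming rule; in the workflow itself the names are simply written out by hand.

```python
import re


def to_output_name(python_version: str) -> str:
    """Map a version like '3.7' to an identifier-friendly name like 'py3-7'."""
    name = python_version.replace(".", "-")
    # Prefix names that would otherwise start with a digit
    if re.match(r"\d", name):
        name = "py" + name
    return name


versions = ["pypy2", "pypy3", "2.7", "3.6", "3.7", "3.8"]
print([to_output_name(v) for v in versions])
# ['pypy2', 'pypy3', 'py2-7', 'py3-6', 'py3-7', 'py3-8']
```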
Uploading to Pastebin and Creating a New Artifact
The third and final stage will read the previous stage's outputs and compile them into a single file. That file will be uploaded as an artifact as well as uploaded to Pastebin.
In order to make a POST request to Pastebin, we will need to configure an account and then use its API key:
  pastebin:
    runs-on: ubuntu-latest
    needs: benchmark
    steps:
      - uses: actions/checkout@v2
      - uses: actions/setup-python@v2
        with:
          python-version: 3.9
      - uses: py-actions/py-dependency-install@v2
        with:
          path: requirements.txt
      - name: use benchmark data
        run: echo '${{ toJSON(needs.benchmark.outputs) }}' > matrix-outputs.json
      - name: pastebin API request
        env:
          PASTEBIN_API_KEY: ${{ secrets.PASTEBIN_API_KEY }}
        run: python pastebin.py
      - name: upload newly created artifact
        uses: actions/upload-artifact@v2
        with:
          name: benchmark-stats
          path: newpaste.txt
The secret is placed in the step's environment so it can be easily accessed with os.environ['PASTEBIN_API_KEY'] in Python.
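The contents of pastebin.py aren't shown in this guide, so the following is only a sketch of how such a script might read the secret and the JSON file produced by the previous step, and write the file that gets uploaded as an artifact. The helper names and the report format are assumptions, and the actual HTTP request to Pastebin is deliberately left out.

```python
import json
import os


def load_api_key(environ=os.environ) -> str:
    """Read the secret that the workflow step exposes as an environment variable."""
    # Indexing (rather than .get) raises KeyError early if the secret is missing
    return environ["PASTEBIN_API_KEY"]


def build_report(outputs: dict) -> str:
    """Format one 'name: value' line per benchmark output, sorted by name."""
    return "\n".join(f"{name}: {value}" for name, value in sorted(outputs.items()))


def write_report(json_path: str, report_path: str) -> None:
    """Turn the JSON dump of the job outputs into the text file that is uploaded."""
    with open(json_path) as handle:
        outputs = json.load(handle)
    with open(report_path, "w") as handle:
        handle.write(build_report(outputs) + "\n")


# Example with made-up values; in the workflow these come from matrix-outputs.json
print(build_report({"py3-7": "1.20", "pypy3": "0.40"}))
print(load_api_key({"PASTEBIN_API_KEY": "dummy-key"}))
```

In the workflow, a script along these lines would call write_report("matrix-outputs.json", "newpaste.txt") so that the upload-artifact step finds the file it expects.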
Secrets management in GitHub
GitHub offers a safe place for secrets on a repository or project-wide level. To save a secret, navigate to the repository Settings and add a new value in the Secrets tab.
When Not to Choose GitHub Actions as a CI/CD Tool?
Even though we've seen the potential of this new feature of GitHub, there are some things to consider; things that may be deal breakers and make you search for an automation tool elsewhere:
- GitHub's offering of runners is pretty lacking. With 2 cores and 8GB of RAM, they are good for running linters and testing; but don't even think about some serious compilation.
- Workflow debugging can be an unpleasant experience. There is no way to re-run a single job; you have to re-run the entire workflow. If the final step is encountering issues, you'll either have to rewrite the workflow to make troubleshooting a bit more bearable or wait for the entire workflow to run before getting to your point of troubleshooting.
- No support for distributed builds.
Conclusion
GitHub Actions have matured a lot in the past few years, but not enough. Still, the potential is there. With the best API out of all git platforms, and with the innovative approach of writing actions in JavaScript, all backed up by the largest git community in the world - there is no doubt that GitHub Actions has the potential to take over the entire CI/CD game. But not yet.
For now, use this tool for simple compiling/packaging or to append tags to your commits while the enterprise still relies on the likes of Jenkins, Travis CI and GitLab CI.