Automating Version Control Commits

In the previous article I explained how to monitor your data and how to detect changes using tools like Integrit, which is a Host-based Intrusion Detection System (HIDS). Discovering the changes in files is already quite nice, but keeping track of the content and its changes over time is much better. In this article, we will have a look at different ways to use a revision control system by doing commits automatically.

The solutions we have looked at so far help you answer the question "Did my data change?", but they do not tell you what exactly has changed. The original data has already been overwritten, and you notice the changes only in retrospect. There are two ways to identify the differences: compare the current data with your external backup, or use a revision control system like Git or Apache Subversion (SVN). Both tools offer you a version history for files and directories.

As an example, the command git status displays changes that have been made:

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   dataset1.txt
#
no changes added to commit (use "git add" and/or "git commit -a")
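
To see not only which file changed but also what changed inside it, git diff can be used alongside git status. The following is a minimal sketch under stated assumptions: it builds a throwaway repository in a temporary directory, and the file content and the demo identity are invented for illustration.

```shell
#!/bin/bash
# Sketch: show which files changed and what changed inside them.
# The repository location and the file content are made up for this demo.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"          # local identity, demo only
git config user.email "demo@example.com"

echo "first version" > dataset1.txt
git add dataset1.txt
git commit -q -m "initial import"

echo "second version" > dataset1.txt      # simulate a change on disk

# Machine-readable summary: one line per changed path
status_output=$(git status --porcelain)
echo "$status_output"

# Line-by-line differences against the last commit
diff_output=$(git diff -- dataset1.txt)
echo "$diff_output"
```

The --porcelain format is meant for scripts, which makes it a convenient building block for the automation discussed in the rest of this article.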

The disadvantage of these solutions is that they require an additional step: a commit as soon as an entry in the file system has changed. Neither Git nor Subversion does that automatically.

A very straight-forward solution is the following one:

$ git add --all
$ git commit -a -m "current changes"

The first line stages all local changes, including new and deleted files, and the second line commits them. This covers temporary files too, which you may not want. To prevent this, use a file named .gitignore. It lists the file name patterns that are excluded from version control, for example temporary files created by the Vim text editor:

*~
*.sw?
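
Whether the patterns behave as intended can be checked with git check-ignore before relying on them in an automatic commit. The sketch below assumes a throwaway repository; the file names data.txt, data.txt~, and .data.txt.swp are hypothetical stand-ins for a data file and two editor leftovers.

```shell
#!/bin/bash
# Sketch: verify that .gitignore patterns keep editor files out of a commit.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"          # local identity, demo only
git config user.email "demo@example.com"

# The two patterns from the article
printf '%s\n' '*~' '*.sw?' > .gitignore

# Hypothetical working files: one data file, two editor leftovers
echo "data" > data.txt
touch 'data.txt~' .data.txt.swp

# check-ignore prints every given path that matches an ignore pattern
ignored=$(git check-ignore data.txt 'data.txt~' .data.txt.swp)
echo "$ignored"

# The automatic commit now leaves the ignored files alone
git add --all
git commit -q -m "current changes"
tracked=$(git ls-files)
echo "$tracked"
```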

Using the inotify Interface

Combining Git or SVN commands with find, xargs, incron, fswatch, or inotify lets you react to such changes without manual intervention.

Inotify works by watching for changes to the filesystem and notifying applications of those changes.

The following code shows a call based on the inotify kernel subsystem using inotifywait. A Git commit happens as soon as the changes in the file file.txt are written to disk.

$ inotifywait -q -m -e CLOSE_WRITE --format="git commit -m 'autocommit on change' %w" file.txt | sh
...

To track entire directories, add the switch -r.
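
When watching a directory recursively, it can be easier to read the events in a loop than to generate shell commands from the format string. The following is only a sketch: autocommit_events is a hypothetical helper name, and it assumes that inotifywait is started with --format '%w%f' so that every event arrives as one full path per line. Events below the .git directory are skipped, because otherwise each commit would trigger further events.

```shell
#!/bin/bash
# Sketch: commit each file named on stdin, one path per line.
# Meant to sit behind "inotifywait -m -r", which is assumed to print
# the full path of every written file via --format '%w%f'.
autocommit_events() {
    local path
    while IFS= read -r path; do
        case "$path" in
            */.git/*|.git/*) continue ;;   # skip Git's own bookkeeping files
        esac
        git add -- "$path" || continue
        # Commit only if the staged content really differs from HEAD
        if ! git diff --cached --quiet -- "$path"; then
            git commit -q -m "autocommit on change: $path" -- "$path"
        fi
    done
}
```

Called from inside the repository, the function could be fed like this (the watched path is an example):

$ inotifywait -q -m -r -e close_write --format '%w%f' /home/frank/data | autocommit_events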

Combining find, xargs, and git

The following bash script, find-changes.sh, combines find, xargs, and git, tools found on virtually every Linux system. It expects the path of a Git working directory as its argument. Once per hour the script looks for files that changed within the last hour and commits them to the local Git repository.

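#!/bin/bash

while true
do
    # calculate time range (no space between + and the format string)
    currentSec=$(date +"%s")
    currentTime=$(date +"%Y%m%d %H:%M:%S" --date="@$currentSec")

    previousSec=$((currentSec - 3600))
    previousTime=$(date +"%Y%m%d %H:%M:%S" --date="@$previousSec")

    echo "--- $previousTime - $currentTime ---"
    currentPath=$(pwd)
    workingPath="$1"

    # stop if the working directory cannot be entered
    cd "$workingPath" || exit 1
    echo "now here: $workingPath"

    # stage files modified within the time range, skipping Git's own files;
    # -print0 and -0 keep file names with spaces intact, -r avoids an empty run
    find . -path ./.git -prune -o -type f -newermt "$previousTime" ! -newermt "$currentTime" -print0 | xargs -0 -r git add -v

    git commit -v -m "changes from $previousTime - $currentTime"

    cd "$currentPath" || exit 1
    echo "now here: $currentPath"

    # wait for 1 hour
    sleep 60m
done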

Calling the script results in an output like the following:

$ ./find-changes.sh /home/frank/data & 
...
--- 20190115 14:06:29 - 20190115 15:06:29 ---
now here: .
add '.data.txt'
[master 65797bb] changes from 20190115 14:06:29 - 20190115 15:06:29
 1 file changed, 2 insertions(+), 2 deletions(-)
 create mode 100644 data.txt
now here: /home/frank/data
...
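
The time-window selection at the heart of the script can be tried in isolation. The sketch below assumes GNU find and GNU touch; the file names old.txt and recent.txt and their timestamps are invented, while the window boundaries are taken from the sample output above.

```shell
#!/bin/bash
# Sketch: select files modified inside a one-hour window with GNU find.
# File names and modification times are invented for the demonstration.
dir=$(mktemp -d)
cd "$dir" || exit 1

touch -d "2019-01-15 13:30:00" old.txt      # before the window
touch -d "2019-01-15 14:30:00" recent.txt   # inside the window

previousTime="20190115 14:06:29"
currentTime="20190115 15:06:29"

# newer than the window start, but not newer than the window end
selected=$(find . -type f -newermt "$previousTime" ! -newermt "$currentTime" -print)
echo "$selected"
```

Only recent.txt falls into the window, so it is the only file that would be handed to git add.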

To see what has changed between individual commits, you can use gitk, a graphical repository browser.

gitwatch and Flashbake

In the same category of tools you'll find gitwatch and Flashbake. Basically, gitwatch is an extended version of the bash script from above that additionally uses the inotify kernel subsystem. gitwatch has not been packaged for Debian or Ubuntu yet.

Download the script from the project website on GitHub first, and then call it from the command line with the name of the file or directory to be tracked.

(Figure: gitwatch at work while writing this article.)

The second project, Flashbake, has been around much longer and connects Git and cron. Written in Python, it is available both from the project website and as a regular Debian package.

To track file modifications, Flashbake requires a Git repository with a file named .flashbake in it. This file contains the list of files to be tracked.

$ cat .flashbake 

# files to track

data.txt
gitk.png
gitwatch.png
iviewdb.png
find-changes.sh
ils.png

Next, a corresponding cron entry is required to run Flashbake regularly. Use the command crontab -e to open the crontab file, and add the following line:

*/15 * * * * flashbake /home/frank/data 15 > /dev/null

This entry calls Flashbake every 15 minutes and tracks the content of the directory /home/frank/data (change this path to suit your needs). The regular output that Flashbake writes to stdout is redirected to /dev/null and thus discarded. Error messages written to stderr are not affected by this redirection.

To see the changes between the commits, you can use either gitk (see above) or the command git diff.
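
On a machine without a graphical desktop, git log and git diff give the same information on the command line. This is a minimal sketch in a throwaway repository; the commit messages and the content of data.txt are invented for illustration.

```shell
#!/bin/bash
# Sketch: inspect automatic commits from the command line.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"          # local identity, demo only
git config user.email "demo@example.com"

echo "value=1" > data.txt
git add data.txt
git commit -q -m "changes from 14:06 - 15:06"

echo "value=2" > data.txt
git commit -q -a -m "changes from 15:06 - 16:06"

# One line per commit, newest first
history=$(git log --oneline)
echo "$history"

# What changed between the previous commit and the latest one
changes=$(git diff HEAD~1 HEAD -- data.txt)
echo "$changes"
```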

Conclusion

With the help of these tools you can figure out if, when, and especially which data has changed on your system. By using Git you can return to previous versions. Plan for additional storage space, because the version history has to be stored somewhere, ideally on an external device.
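
Going back to an earlier version of a single file can look like the following sketch. It assumes a throwaway repository with two invented commits; in a real repository, HEAD~1 would be replaced by the commit you want to return to.

```shell
#!/bin/bash
# Sketch: bring back the previous version of one file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
git config user.name "Demo User"          # local identity, demo only
git config user.email "demo@example.com"

echo "good" > data.txt
git add data.txt
git commit -q -m "known good state"

echo "broken" > data.txt
git commit -q -a -m "unwanted change"

# Take data.txt from the commit before the latest one
git checkout HEAD~1 -- data.txt
restored=$(cat data.txt)
echo "$restored"
```

The restored content is staged but not committed, so the history keeps a record of both the unwanted change and its correction once you commit again.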

In the context of security, preventing unwanted changes requires further safety measures, for example at the level of access rights and system services.

Acknowledgements

The author would like to thank Axel Beckert, Veit Schiele, and Zoleka Hofmann for their help and critical remarks while preparing this article.

Last Updated: August 3rd, 2023

Frank Hofmann, Author

IT developer, trainer, and author. Coauthor of the Debian Package Management Book (http://www.dpmb.org/).

© 2013-2024 Stack Abuse. All rights reserved.
