In the previous article I explained how to monitor your data and how to detect changes using tools like Integrit, which is a Host-based Intrusion Detection System (HIDS). Discovering the changes in files is already quite nice, but keeping track of the content and its changes over time is much better. In this article, we will look at different ways to use a revision control system with automatic commits.
The solutions we looked at up to now help you answer the question "Did my data change?", but they do not tell you what exactly has changed. The original data has already been overwritten, and you only notice the changes in retrospect. There are two ways to identify the differences: comparing the current data with your external backup, or using a revision control system like Git or Apache Subversion (SVN). Both tools offer a version history for your files and directories.
As an example, the command git status displays the changes that have been made:
$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   dataset1.txt
#
no changes added to commit (use "git add" and/or "git commit -a")
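For scripts, the --porcelain format of git status is easier to evaluate than the human-readable output above, because it prints one line per changed or untracked file and nothing at all for a clean working copy. A minimal sketch of such a check; has_changes is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: test whether a working copy has uncommitted changes.
# git status --porcelain prints one line per changed or untracked file,
# and an empty string for a clean working copy.
# has_changes is a hypothetical helper name, not part of Git.
has_changes() {
    [ -n "$(git -C "${1:-.}" status --porcelain)" ]
}

# Example: commit only when there is something to commit.
# if has_changes .; then git add --all && git commit -m "current changes"; fi
```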
The disadvantage of these solutions is that they require an additional step: a commit as soon as an entry in the filesystem has changed. Neither Git nor Subversion does that automatically.
A very straightforward solution is the following:
$ git add --all
$ git commit -a -m "current changes"
In line one, all local changes (including new and deleted files) are staged, and in line two these changes are committed. This covers temporary files, too, which may not be what you want. To prevent this, use a file named .gitignore. It contains patterns for the file names that are excluded from version control, for example the temporary files created by the Vim text editor.
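A minimal .gitignore sketch for Vim, assuming its default names for swap files (such as .data.txt.swp or .data.txt.swo) and backup files (such as data.txt~). The patterns are an assumption; adjust them to your editor settings:

```shell
#!/bin/sh
# Write a .gitignore that excludes Vim's temporary files.
# Assumed patterns: swap files end in .swp/.swo, backup files end in ~.
# In .gitignore, "*" also matches a leading dot, so "*.swp" covers ".data.txt.swp".
cat > .gitignore <<'EOF'
# Vim swap files
*.swp
*.swo
# Vim backup files
*~
EOF
```

Afterwards, git check-ignore -v .data.txt.swp shows which pattern matches a given file name.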
Using the inotify Interface
Inotify works by watching for changes to the filesystem and notifying applications of those changes.
The following call uses inotifywait, a command-line tool based on the inotify kernel subsystem. A Git commit happens as soon as changes to the file file.txt are written to disk.
$ inotifywait -q -m -e CLOSE_WRITE --format="git commit -m 'autocommit on change' %w" file.txt | sh ...
In order to track entire directories you should use the additional switch
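A minimal sketch for watching a whole directory tree, assuming the -r (recursive) and --exclude options of inotifywait from inotify-tools, and assuming the directory is the top level of a Git repository. The helper names autocommit_file and watch_and_commit are hypothetical:

```shell
#!/bin/bash
# Sketch: commit every file below a directory as soon as it is written.
# Assumptions: the directory is the top level of a Git repository, and
# inotifywait (package inotify-tools) is installed.

# Stage and commit a single file; skip the commit if nothing changed.
autocommit_file() {
    local file="$1"
    git add -- "$file" || return 1
    # git diff --cached --quiet exits with 0 if the staged file matches HEAD
    if ! git diff --cached --quiet -- "$file"; then
        git commit -q -m "autocommit on change: $file" -- "$file"
    fi
}

# Watch the tree recursively (-r), ignore events inside .git,
# and hand the path of every written file to autocommit_file.
watch_and_commit() {
    cd "${1:-.}" || return 1
    inotifywait -q -m -r -e close_write \
        --exclude '(^|/)\.git(/|$)' --format '%w%f' . |
    while IFS= read -r file; do
        autocommit_file "$file"
    done
}
```

After sourcing the file, watch_and_commit /home/frank/data & starts the watcher in the background. Editors that save by writing a temporary file and renaming it may additionally require watching the moved_to event.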
Combining find, xargs, and git
The following bash script uses tools that exist on every Linux system and combines find, xargs, and git. Once per hour the script checks for files that have changed within the last hour and commits them to a local Git repository.
#!/bin/bash
# usage: ./find-changes.sh /path/to/repository
workingPath="$1"

while true
do
  # calculate the time range: the last 3600 seconds
  currentSec=$(date +"%s")
  currentTime=$(date +"%Y%m%d %H:%M:%S" --date="@$currentSec")
  previousSec=$((currentSec - 3600))
  previousTime=$(date +"%Y%m%d %H:%M:%S" --date="@$previousSec")
  echo "--- $previousTime - $currentTime ---"

  currentPath=$(pwd)
  cd "$workingPath" || exit 1
  echo "now here: $workingPath"

  # stage all files modified within the time range (skipping .git;
  # NUL-separated names survive spaces), then commit them
  find . -path ./.git -prune -o -type f -newermt "$previousTime" ! -newermt "$currentTime" -print0 | xargs -0 -r git add -v
  git commit -v -m "changes from $previousTime - $currentTime"

  cd "$currentPath" || exit 1
  echo "now here: $currentPath"

  # wait for 1 hour
  sleep 60m
done
Calling the script produces output like the following:
$ ./find-changes.sh /home/frank/data &
...
--- 20190115 14:06:29 - 20190115 15:06:29 ---
now here: .
add '.data.txt'
[master 65797bb] changes from 20190115 14:06:29 - 20190115 15:06:29
 1 file changed, 2 insertions(+), 2 deletions(-)
 create mode 100644 data.txt
now here: /home/frank/data
...
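As a simpler variant of the time-range calculation, the -mmin test of find selects files modified within the last N minutes directly. The following sketch assumes GNU find; changed_within is a hypothetical helper name, and the .git directory is skipped explicitly:

```shell
#!/bin/sh
# Sketch: list regular files below a directory that were modified within
# the last N minutes, skipping the .git directory. Names are printed
# NUL-terminated so that xargs -0 can handle spaces in file names.
# changed_within is a hypothetical helper name, not part of any tool.
changed_within() {
    dir="$1"
    minutes="$2"
    find "$dir" -path "$dir/.git" -prune -o \
        -type f -mmin "-$minutes" -print0
}

# Example (inside a Git repository): stage everything changed in the last hour.
# changed_within . 60 | xargs -0 -r git add -v
```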
To see what has changed between the individual commits, you may use gitk, a graphical repository browser.
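Besides gitk, the history of a single file can also be inspected on the command line. A minimal sketch, where file_history is a hypothetical helper around git log -p --follow, which prints every commit that touched the file together with its diff:

```shell
#!/bin/sh
# Sketch: show every commit that changed one file, including the diff.
# file_history is a hypothetical helper name; it only wraps git log.
# --follow keeps tracking the file across renames.
file_history() {
    repo="$1"
    file="$2"
    git -C "$repo" log -p --follow -- "$file"
}

# Example: file_history /home/frank/data data.txt
```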
gitwatch and Flashbake
In the same category of tools you'll find gitwatch and Flashbake. Basically, gitwatch is an extended version of the script above, and it also uses the inotify kernel subsystem. At the time of writing, however, gitwatch has not been packaged for Debian or Ubuntu.
First, download the script from the project website on GitHub, and then call it from the command line with the name of the file or directory to be tracked. Here you can see it at work while this article was being written:
The second project, Flashbake, has been around much longer and connects Git and cron. Written in Python, it is available from the project website as well as in the form of a regular Debian package.
In order to track file modifications, Flashbake requires a Git repository as well as a file named .flashbake in it. The latter contains the list of files to be tracked.
$ cat .flashbake
# files to track
data.txt
gitk.png
gitwatch.png
iviewdb.png
find-changes.sh
ils.png
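Instead of maintaining this list by hand, it can be generated from the files Git already tracks. A sketch under the assumption that .flashbake only needs one file name per line plus comment lines, as in the example above; write_flashbake_list is a hypothetical helper, and it overwrites any existing .flashbake:

```shell
#!/bin/sh
# Sketch: create a .flashbake file that lists every file Git already tracks.
# Assumption: one file name per line, "#" starts a comment line.
# write_flashbake_list is a hypothetical helper; it overwrites .flashbake.
write_flashbake_list() {
    repo="${1:-.}"
    {
        echo "# files to track"
        # tracked files, without the .flashbake file itself
        git -C "$repo" ls-files | grep -vx '\.flashbake'
    } > "$repo/.flashbake"
}

# Example: write_flashbake_list /home/frank/data
```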
Next, a corresponding cron entry is required to run Flashbake regularly. Use the command crontab -e to open your crontab, and add the following line:
*/15 * * * * flashbake /home/frank/data 15 > /dev/null
This entry calls Flashbake every 15 minutes and tracks the content of the directory /home/frank/data (you will want to change this path to suit your needs). The regular output sent to stdout is simply redirected to the digital trash called /dev/null; error messages on stderr are not affected by this redirection.
In order to see the changes between the commits you may either use gitk (see above), or use the command
With the help of these tools you can find out if, when, and especially what data has changed on your system. By using Git you can also go back to previous versions. Keep in mind that this requires additional storage space: the backup has to be stored somewhere, ideally on an external device.
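As an illustration of going back to a previous version, the following sketch uses git checkout <commit> -- <file> to restore a file as it was a given number of commits ago; restore_version is a hypothetical helper name. On Git 2.23 or later, git restore --source=HEAD~1 -- data.txt is an alternative.

```shell
#!/bin/sh
# Sketch: restore a file to its state N commits before HEAD (default: 1).
# restore_version is a hypothetical helper name. "git checkout <commit> -- <file>"
# overwrites the working copy and stages the old content.
restore_version() {
    repo="$1"
    file="$2"
    back="${3:-1}"
    git -C "$repo" checkout "HEAD~$back" -- "$file"
}

# Example: restore_version /home/frank/data data.txt 1
```

The restored content is staged, so a subsequent git commit records the restoration as a new commit, and the newer version remains in the history.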
In a security context, preventing unwanted changes requires further safety measures, for example at the level of access rights and system services.
The author would like to thank Axel Beckert, Veit Schiele, and Zoleka Hofmann for their help and critical remarks while preparing this article.