Monitoring Data Changes Using a HIDS

In this article I'll explain how to monitor your data and detect changes to it. This kind of monitoring is usually done with a Host-based Intrusion Detection System (HIDS) like Integrit, but simpler methods exist as well, and we describe several of them for different use cases.

IDS (Overview)

In general, an Intrusion Detection System (IDS) is a device or a software application that observes system activity and analyzes it for malicious activity or policy violations. It inspects network traffic, system calls, and changes made on storage devices in order to discover suspicious actions that deviate from known system behavior ("does the system behave normally, or is the load much higher than usual?"). Ideally, an IDS also alerts you automatically when someone or something tries to compromise your system.

There are two main kinds of IDS: NIDS and HIDS.

Network Intrusion Detection System (NIDS)

An NIDS is strategically positioned at various points on the network to passively monitor traffic to and from network devices and to find suspicious packets and activity. The goal is to detect known attack signatures and to notify the system administrator before the attackers can do serious damage, or simply consume bandwidth and tie up computing power, as in a Denial of Service (DoS) attack.

Usually, an NIDS is a combination of stand-alone hardware sensors and software components that work together to cover a larger part of the network.

Selection of tools: Snort, Suricata, Bro/Zeek.

Host-based Intrusion Detection System (HIDS)

A HIDS tracks down and audits local file changes and modifications. This includes monitoring of data files and directory structures, and discovering changes in the structure itself as well as content, access rights, inode, size, and access information.

Selection of tools: Tripwire, Integrit, AIDE, Samhain, Systraq, Open Source HIDS Security (OSSEC).

As pointed out above, an NIDS is installed on a separate system and takes care of an entire network. A HIDS, as the name suggests, has to be installed on every single computer in the network and takes care of that one machine only.

Furthermore, there are combined versions, so-called Hybrid IDS, that offer both services plus a management level for the network and the host level. Examples include Sagan, Security Onion, Tiger, ACARM-ng, and Audit. A fourth but smaller category contains distribution-specific IDS, like Debsecan for Debian GNU/Linux.

Who needs an IDS?

Still, the question remains who needs an IDS, and why. An IDS is a core component of a corporation's security infrastructure and helps protect business servers and data centers from data damage. An IDS aims to identify any change that could compromise business assets, confidential information, or critical private files.

How does an HIDS Work?

Basically, a HIDS keeps an initial dataset per file or directory in a database. A single entry contains a list of attributes, for example the filename, the creation, access, and modification times, the owner, the inode number, the size, and information about the file content. The latter is commonly stored as a hash value, for example computed with SHA-1 or SHA-256. MD5 is considered broken and is no longer recommended. SHA-1 is still widespread, but since practical SHA-1 collisions were published in 2017, SHA-256 is the safer choice.

To check whether the data is unchanged, the HIDS recalculates the corresponding dataset and compares it with the original one. If both datasets are identical, the entry has not been changed. Otherwise, a possible incident has happened, and the HIDS alerts you.
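The principle of recording a dataset per file and comparing it later can be sketched with standard tools. This is a minimal sketch, not a replacement for a real HIDS: it assumes GNU find, stat, and sha256sum, file names without newlines, and the function names snapshot and verify are hypothetical. Each snapshot line holds the path, owner, inode number, size, and modification time (in seconds since the epoch), followed by the SHA-256 hash of the content.

```shell
#!/bin/sh
# Minimal HIDS-style sketch (hypothetical helpers, not a real HIDS):
# record attributes plus a content hash per file, then recompute and compare.
# Assumes GNU find, stat and sha256sum, and file names without newlines.

# snapshot DIR: print one line per regular file below DIR in the form
#   path|owner|inode|size|mtime <sha256 of the content>
snapshot() {
  find "$1" -type f | sort |
    while IFS= read -r f; do
      printf '%s %s\n' \
        "$(stat -c '%n|%U|%i|%s|%Y' "$f")" \
        "$(sha256sum < "$f" | cut -d ' ' -f 1)"
    done
}

# verify BASELINE DIR: recompute the dataset for DIR and compare it with the
# stored baseline. Returns 0 if both are identical; otherwise diff prints the
# differing entries and the function returns 1.
verify() {
  if snapshot "$2" | diff "$1" -; then
    echo "OK: no changes"
  else
    echo "ALERT: changes detected" >&2
    return 1
  fi
}
```

For example, snapshot /home/user/research-data > baseline.txt records the initial state, and verify baseline.txt /home/user/research-data later lists the entries that differ. Keep the baseline outside the monitored directory; otherwise it would end up in its own snapshot.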

Next, we will show you different ways to check the integrity of your data.

Using Built-in Shell Commands

This solution is based on the find command and is rather simple. It lets you use built-in UNIX/Linux tools to see whether a file has been modified within a specified time range.

The first parameter for the find command, -type f, limits the output to files only. The second parameter, -newermt, sets a boundary for the modification time. In the example below, the lower boundary is set to January 3, 2019, 12h00, and the upper boundary is implicitly the present. find prints only those entries whose modification time is newer than the one specified on the command line. Consequently, the output only contains file2.tmp: file1.tmp was last modified at 12h00 and is therefore not newer than the boundary.

$ ls -l
total 16
-rw-r--r--  1 frank frank    3 Jan  3 12:00 file1.tmp
-rw-r--r--  1 frank frank   11 Jan  4 16:53 file2.tmp
$ find . -type f -newermt "2019-01-03 12:00:00"
./file2.tmp

To set a time window with both a lower and an upper boundary, combine two -newermt tests. The next example sets 12h00 of January 3, 2019 as the lower boundary and 18h00 of the same day as the upper boundary, using ! to negate the second test:

$ find . -type f -newermt "2019-01-03 12:00:00" ! -newermt "2019-01-03 18:00:00"
...
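To try the time-window search without waiting for real modifications, you can set modification times by hand with GNU touch -d. The helper below is only a thin wrapper around the command shown above; its name changed_between is hypothetical.

```shell
#!/bin/sh
# changed_between DIR FROM TO (hypothetical helper): list the regular files
# below DIR whose modification time lies after FROM and not after TO.
# GNU find's -newermt test is strict, so a file modified exactly at FROM
# is not listed, while a file modified exactly at TO is.
changed_between() {
  find "$1" -type f -newermt "$2" ! -newermt "$3"
}
```

For example, after creating a hypothetical file3.tmp with touch -d "2019-01-03 15:30:00" file3.tmp in the example directory, changed_between . "2019-01-03 12:00:00" "2019-01-03 18:00:00" should list only ./file3.tmp: file1.tmp lies at the lower boundary, and file2.tmp was modified on the following day.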

So far, we only know that the discovered file has a different modification time. To find out whether its content was actually modified, a hash value of the content helps. You can calculate hash values with commands like sha1sum, sha256sum, sha512sum, cksum, md5deep and hashdeep, hashalot, and hashrat.

In the following example we use sha1sum to calculate the hash value for the content. sha1sum outputs two columns - the calculated value for the file content, and the filename.

$ sha1sum file*
56ac1c08fa5479fd57c4a5c65861c4ed3ed93ff8  file1.tmp
56ac1c08fa5479fd57c4a5c65861c4ed3ed93ff8  file2.tmp

To compare two states of the files, we need two sets of hash values: one from before and one from after a modification. That's why we first saved the output of sha1sum in a file named snapshot-20190103 as follows:

$ sha1sum f* > snapshot-$(date +"%Y%m%d")

Later, sha1sum -c reads this snapshot, recalculates the hash values, and compares them with the stored ones, which shows the differences between the snapshot and today's state:

$ sha1sum -c snapshot-20190103
file1.tmp: OK
file2.tmp: FAILED
sha1sum: WARNING: 1 computed checksum did NOT match
$

This solution is quite simple and can easily be integrated into a regular check, for example via crontab. In principle, it tells us whether the content of the directory is still the same: sha1sum -c reports modified files as FAILED and files deleted in the meantime as FAILED open or read. Files added after the snapshot, however, are not listed, because the check only covers the entries recorded in the snapshot. The more files are tracked, the harder it gets to figure out what was done.
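Since sha1sum -c ignores new files, comparing two complete snapshots can report added, deleted, and modified files at once. The following sketch assumes GNU coreutils, findutils, and awk, and file names without newlines; the function names are hypothetical. It uses sha256sum instead of sha1sum, which produces the same format: the hash, two spaces, and the file name.

```shell
#!/bin/sh
# take_snapshot DIR (hypothetical): hash every regular file below DIR,
# sorted by name so that two snapshots are easy to compare.
take_snapshot() {
  find "$1" -type f -print0 | sort -z | xargs -0 -r sha256sum
}

# compare_snapshots OLD NEW (hypothetical): report files that were added,
# deleted, or modified between two snapshots in sha256sum format.
compare_snapshots() {
  awk '
    # The path is everything after the two spaces that follow the hash.
    { path = substr($0, index($0, "  ") + 2) }
    # First file: remember the old hash per path.
    FILENAME == ARGV[1] { old[path] = $1; next }
    # Second file: compare with the old state.
    {
      seen[path] = 1
      if (!(path in old))       print "added:    " path
      else if (old[path] != $1) print "modified: " path
    }
    # Paths only present in the old snapshot were deleted.
    END { for (p in old) if (!(p in seen)) print "deleted:  " p }
  ' "$1" "$2"
}
```

For example, a daily cron job could run take_snapshot /home/user/research-data > snapshot-$(date +%Y%m%d), and compare_snapshots snapshot-20190103 snapshot-20190104 would then list what happened between the two days. Store the snapshots outside the monitored directory.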

Using a Revision Control System

The solution so far lets us keep track of changes in structures with a small number of entries. Unfortunately, changes overwrite existing files and data, which cannot be reverted or restored unless a proper, additional backup exists. A revision control system like Git or Apache Subversion (SVN) can help here. Both tools track content changes and also provide a version history for files and directories.

As an example, the command git status displays changes that have been made.

$ git status
# On branch master
# Changes not staged for commit:
#   (use "git add <file>..." to update what will be committed)
#   (use "git checkout -- <file>..." to discard changes in working directory)
#
#   modified:   dataset1.dbs
#
no changes added to commit (use "git add" and/or "git commit -a")
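The same check can be scripted, for example for a cron job. The sketch below assumes Git is installed; the function names and the committer identity are hypothetical. It relies on git status --porcelain, whose output is intended for scripts and is empty when nothing differs from the last commit and there are no untracked files (those are listed with ??).

```shell
#!/bin/sh
# init_tracking DIR (hypothetical): put DIR under Git and commit its
# current state as the baseline.
init_tracking() {
  git -C "$1" init -q &&
    git -C "$1" add -A &&
    git -C "$1" -c user.name="hids" -c user.email="hids@localhost" \
      commit -q -m "baseline"
}

# has_changes DIR (hypothetical): print the changes and return 0 if the
# working tree differs from the last commit (including untracked files);
# return 1 if nothing changed, 2 if git itself failed.
has_changes() {
  local changes
  changes=$(git -C "$1" status --porcelain) || return 2
  [ -n "$changes" ] || return 1
  printf '%s\n' "$changes"
}
```

A modified tracked file shows up as " M dataset1.dbs", which matches the modified: line in the git status output above.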

The disadvantage of this approach is that it requires an additional step: a commit as soon as an entry in the filesystem has changed. Neither Git nor Subversion does that automatically. Tools like incron and fswatch can watch for such events and trigger further actions like automated commits, notifications, or alerts.
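An automated commit can be sketched as a small shell function that a watcher calls after each event. This is a sketch under assumptions, not a ready-made tool: the function name autocommit and the committer identity are hypothetical, and the directory must already be a Git repository. It stages everything with git add -A and commits only if the staged state differs from the last commit, so repeated events do not create empty commits.

```shell
#!/bin/sh
# autocommit DIR (hypothetical): commit every change in the Git repository
# DIR, but only if something actually changed.
autocommit() {
  git -C "$1" add -A || return 1
  # 'git diff --cached --quiet' exits with 1 when staged changes exist.
  if ! git -C "$1" diff --cached --quiet; then
    git -C "$1" -c user.name="autocommit" -c user.email="autocommit@localhost" \
      commit -q -m "auto-commit $(date '+%Y-%m-%d %H:%M:%S')"
  fi
}

# Example trigger (hypothetical), assuming fswatch's -o option (one line per
# batch of events) and -e option (exclude paths matching a regular expression);
# .git is excluded so that the commits themselves do not trigger new events:
#   fswatch -o -e '/\.git/' /home/user/research-data |
#     while read -r _; do autocommit /home/user/research-data; done
```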

Setting up a HIDS

The third group of solutions is based on special tools classified as Host-based Intrusion Detection Systems (HIDS). They track changes of content, user rights, and access times. To illustrate their usage, we will monitor a directory using Integrit, which is available as a regular Debian package.

First, install Integrit from the package repositories with administrative privileges, for example using apt-get as follows:

$ sudo apt-get install integrit

Next, Integrit needs to be configured. Integrit reads a configuration file that tells it which directories to keep an eye on and where to store its data. In our example, the only directory we would like to monitor is /home/user/research-data, so we set the entry root= to this path.

The two databases containing the previous state and the present state are referenced by the entries known= and current=. Keep these databases in a safe place. For illustration purposes, we keep them in the same directory. Save this configuration as the file integrit.conf in a local directory.
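A minimal integrit.conf with the entries described above could be generated as follows. The helper name write_integrit_conf and the database directory passed to it are assumptions for illustration. The two lines starting with ! assume Integrit's rule syntax for ignoring a path; they keep the databases from being reported as changes if they live inside the monitored tree. Check integrit(1) and the documentation shipped with the Debian package before relying on them.

```shell
#!/bin/sh
# write_integrit_conf DATA_DIR DB_DIR OUTFILE (hypothetical helper):
# write a minimal integrit.conf with the root=, known= and current= entries.
write_integrit_conf() {
  local data_dir=$1 db_dir=$2 out=$3
  cat > "$out" <<EOF
# directory tree Integrit checks
root=$data_dir
# database with the previously recorded (known) state, used during a check (-c)
known=$db_dir/known.cdb
# database written during an update (-u)
current=$db_dir/current.cdb
# assumption: '!' marks a path to ignore, so the databases are not reported
!$db_dir/known.cdb
!$db_dir/current.cdb
EOF
}
```

For the setup in this article, write_integrit_conf /home/user/research-data /home/user/research-data integrit.conf would create integrit.conf in the current directory.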

Next, we create the initial database. Run Integrit with administrative privileges (in our case as user root) and invoke the following command:

$ integrit -v -C integrit.conf -u

Integrit initializes its database and stores it in the file current.cdb. Next, we make some changes to our data, rerun Integrit, and see how it reports them. A change can be a new file, or simply a new line added to an existing file.

Before rerunning Integrit, copy the current database to the known database, as named in the Integrit configuration file:

$ cp current.cdb known.cdb

Now we can rerun Integrit, this time in check mode (-c):

$ integrit -v -C integrit.conf -c

As you can see in the following screenshot, Integrit discovers the change and reports it. That's what we wanted :)

Conclusion

Discovering data changes in the file system is not too complicated. It needs a bit of preparation, but it saves you from unpleasant surprises later. The data is yours, and taking care of it works to your advantage.

The selection of tools presented here is not complete, as there are other levels and mechanisms of integrity checks, too. For further reading, have a look at filesystems like BTRFS and ZFS/OpenZFS as well as tools like Rootkit Hunter (rkhunter).

In the next article I'll go over various ways to automatically track and commit file changes using popular version control tools like Git.

Acknowledgements

The author would like to thank Axel Beckert, Veit Schiele, and Zoleka Hofmann for their help and critical remarks while preparing this article.

Berlin -- Genève -- Cape Town
IT developer, trainer, and author. Coauthor of the Debian Package Management Book (http://www.dpmb.org/).