MattockFS; Computer-Forensics File-System : Part Four

This post is the fourth of an eight-part series regarding the MattockFS Computer-Forensics File-System. This series of posts is based on the MattockFS workshop that I gave at the Digital Forensics Research Workshop in Überlingen, Germany earlier this year.

If you didn't read the earlier installments of this series yet, you might want to visit these first:

In the previous three installments of this series, we looked at some essential theoretical background and historical context that we shall be building upon in this fourth post.

In this post, we finally get to the main subject of this series: the MattockFS user-space file-system. Today we look at the core ideas implemented in MattockFS and how these fit into the core design of the file-system.

If we look at MattockFS from the historic perspective discussed in the previous posts, we see that MattockFS can be seen as a CarvFS-compatible data-archive implementation with an integrated (Anycast-relay-like) local message bus. Its implementation features a capability-based API. The concepts of data freezing and of a trusted provenance log further add to the forensic integrity features. Next to integrity, MattockFS has been designed in such a way that it attempts to minimize the amount of spurious reads during data processing.

Basically, MattockFS is a user-space file-system designed specifically for use by lab-side computer forensic frameworks. As such, MattockFS implements a basic write-once data archive, integrity measures geared at anti-anti-forensics and at general computer forensic process integrity, and features aimed at mitigating and attenuating spurious reads. Such spurious reads have been significant process bottlenecks in previous computer forensic data processing solutions based on asynchronous processing, such as the now deprecated Open Computer Forensics Architecture.

Let's start by looking at the MattockFS integrity measures. MattockFS implements three important integrity features. The first is manifested in the write-once property of the data archive functionality. The privilege separation provided by the file-system running as a different user basically provides what could be considered a priv-sep lab-side equivalent of a Sealed Digital Evidence Bag. The second follows from the same priv-sep properties, which allow the file-system to be the source of a trusted provenance log. While at a system level the archive and provenance log are mutable, as far as processes running under the user ids of regular forensic framework modules are concerned, both can be considered protected against tampering by compromised modules and against corruption by buggy ones.

The third and final integrity measure that MattockFS implements is that its functionality is accessible to the forensic-framework modules only through a capability-based API. This API works with sparse capabilities in a way not dissimilar to how MinorFS used to use sparse capabilities, only in a more API-oriented manner. As in MinorFS, sparse capabilities are not fully secure by themselves if we assume true malice. Linux will leak such sparse capabilities to other processes unless we use some mandatory access controls in conjunction. It is thus important to note that while MattockFS provides a good start at a capability-secure API, without a good AppArmor or SELinux configuration the security properties of this API should be considered limited to module failure without malicious intent. True anti-anti-forensics protection will require MattockFS and a mandatory access control framework for Linux to work in conjunction.
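To make the notion of a sparse capability a bit more concrete, here is a minimal Python sketch. It is not the MattockFS API; the class name CapabilityTable and its methods are made up for this illustration. The only idea it shows is that an unguessable token acts as both the designation of a resource and the authority to use it.

```python
import secrets


class CapabilityTable:
    """Hypothetical table mapping sparse capabilities to the objects they grant."""

    def __init__(self):
        self._objects = {}

    def grant(self, obj):
        # 256 bits of randomness make the token practically unguessable,
        # so knowing the token is what proves the authority to use obj.
        token = secrets.token_hex(32)
        self._objects[token] = obj
        return token

    def lookup(self, token):
        # No user or process identity is checked here: possession of the
        # token is all that counts. Unknown tokens yield None.
        return self._objects.get(token)

    def revoke(self, token):
        # Forget the token; later lookups for it will fail.
        self._objects.pop(token, None)
```

Nothing in this sketch stops a token from leaking to another process, which is exactly why mandatory access controls remain necessary once we assume malice.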

Let us move on from integrity to performance. The Open Computer Forensics Architecture (OCFA) used to suffer significantly from performance degradation caused by spurious reads, which stemmed directly from its asynchronous design. It is undeniable that the information technology world is moving to asynchronous processing more and more, and a tool-chain approach, where huge chunks of data are processed by a dynamic chain of tools, is especially hard to align efficiently with this asynchronous reality. To accommodate this alignment, MattockFS attempts to solve at least part of the issues itself while facilitating solutions to the remaining ones.

One important measure geared at reducing spurious reads is the use of so-called opportunistic hashing. The hashing of data entities is a pervasive part of most computer forensic tool-chains. In OCFA, files used to be read from start to end for the purpose of hashing and content-addressed storage. It was the first thing that happened to the data, and even files that would not get any further processing, because they were of usually irrelevant file-types, would get hashed. With opportunistic hashing, we hash the files when there are low-level reads. The hashing is done within the user-space file-system. A low-level read leads to hashing progress for any higher-level entities currently active in the system. As there is now usually no need to read the entire file at the start of the tool-chain just to hash it, the first read of data blocks is delayed. This shrinks the time window in which data blocks are to be considered hot and in which paging out of those blocks is detrimental to performance.

We define two types of opportunistic hashing. The primary type is used most pervasively. In primary opportunistic hashing, we only hash data that is being read anyway. Only if the hashing is not yet complete at the point where the tool-chain actually needs the hash is the trailing data read to finish the hashing process for that data entity.
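Primary opportunistic hashing can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the MattockFS implementation: the class name OpportunisticHash and its methods are hypothetical, SHA-256 is used merely as a stand-in hash, and a single entity is tracked in isolation. The hash only progresses when a read covers the first byte that has not been hashed yet; if the hash is needed before that happens for the whole entity, the trailing data is read after all.

```python
import hashlib
import io


class OpportunisticHash:
    """Hypothetical helper that hashes an entity as a side effect of reads."""

    def __init__(self, size):
        self._size = size      # total size of the data entity in bytes
        self._offset = 0       # everything before this offset has been hashed
        self._hash = hashlib.sha256()

    def observe_read(self, offset, data):
        # Only a read that covers the current hash offset lets hashing progress;
        # reads further ahead in the entity cannot be used.
        end = offset + len(data)
        if offset <= self._offset < end:
            self._hash.update(data[self._offset - offset:])
            self._offset = end

    @property
    def done(self):
        return self._offset >= self._size

    def finish(self, stream):
        # Fallback for when the tool-chain needs the hash now:
        # read the trailing part that no module happened to read.
        if not self.done:
            stream.seek(self._offset)
            self.observe_read(self._offset, stream.read(self._size - self._offset))
        return self._hash.hexdigest()


payload = b"0123456789" * 100
entity = io.BytesIO(payload)
hasher = OpportunisticHash(len(payload))
hasher.observe_read(500, payload[500:600])  # ahead of the hash offset: ignored
hasher.observe_read(0, payload[0:400])      # starts at the hash offset: hashed
digest = hasher.finish(entity)              # the remaining 600 bytes are read now
```

In this toy run, only the last 600 bytes had to be read specifically for hashing. When a module reads the whole entity sequentially, the call to finish costs no extra reads at all.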

Secondary opportunistic hashing isn't implemented in MattockFS yet. It's on my to-do list though. In secondary opportunistic hashing, we not only hash data as it is being read, we actually do extra reads, but only on data that we expect to be currently paged in. Secondary opportunistic hashing is meant only for data derivation modules such as unzippers and carvers. It concerns hashing of data that is still in core, either because the data was just written, for example by an unzip module, or because the data was just read, for example by a zero-storage carving process.

A second feature aimed at reducing spurious reads revolves around combining CarvPath entities denoting specific tool-chain traversals with the fadvise system call. The fadvise system call allows a process holding an open file handle to communicate intended file-fragment usage to the kernel. Given that MattockFS works with a file handle to a huge growing archive file, the use of fadvise can greatly help the kernel in finding a more appropriate strategy for paging out cache pages. MattockFS uses a special reference counting stack for keeping track of active tool-chain carvpath entities and invoking fadvise appropriately on the involved data archive fragments.

The above two slides show the algorithm used by the reference counting stack for carvpath entities.
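As a rough illustration of the general idea, and explicitly not a transcription of the algorithm on the slides, the following Python sketch keeps a reference count per fixed-size block of the archive. The class name FadviseRefCount, the 4096-byte block granularity, and the choice of POSIX_FADV_NORMAL for blocks that become active are all assumptions of mine. The os.posix_fadvise call itself is the standard Python wrapper for the system call and is only available on Unix-like systems; the advise parameter allows a stand-in to be plugged in for experimenting.

```python
import os
from collections import Counter

BLOCK = 4096  # assumed bookkeeping granularity for this sketch


class FadviseRefCount:
    """Hypothetical per-block reference counter that drives fadvise."""

    def __init__(self, fd, advise=None):
        self._fd = fd
        self._advise = advise or os.posix_fadvise
        self._refs = Counter()

    def _blocks(self, offset, size):
        # All block numbers touched by the fragment [offset, offset + size).
        return range(offset // BLOCK, (offset + size + BLOCK - 1) // BLOCK)

    def add(self, fragments):
        # A carvpath entity became active: fragments is a list of (offset, size).
        for offset, size in fragments:
            for block in self._blocks(offset, size):
                self._refs[block] += 1
                if self._refs[block] == 1:
                    # First reference: the block is hot from now on.
                    self._advise(self._fd, block * BLOCK, BLOCK,
                                 os.POSIX_FADV_NORMAL)

    def remove(self, fragments):
        # A carvpath entity is done: must mirror an earlier add() call.
        for offset, size in fragments:
            for block in self._blocks(offset, size):
                self._refs[block] -= 1
                if self._refs[block] == 0:
                    # Last reference gone: the kernel may drop the pages.
                    del self._refs[block]
                    self._advise(self._fd, block * BLOCK, BLOCK,
                                 os.POSIX_FADV_DONTNEED)
```

The point of the reference count is that overlapping entities, such as a disk image and a file carved from it, share blocks. A block is only marked as no longer needed once the last entity that refers to it has left the tool-chain.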

A third important facility needed for spurious read reduction in an asynchronous framework is throttling. While MattockFS does not implement throttling itself, it does implement hooks for querying the fadvise implementation of the reference counting stack for information about the current fadvise status of the archive. This information should be indicative of the current page-cache pressure and is thus useful as an indication of the need to throttle new-data input.
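How a framework might act on such information can be sketched as follows. This is purely hypothetical: the InputThrottle class, the notion of a page-cache budget, and the two water marks are my own assumptions, and the hot_bytes value merely stands in for whatever the status hooks report. The sketch pauses new-data input when the amount of hot archive data gets close to the budget and resumes only after it has dropped well below it, so that input does not flap on and off.

```python
class InputThrottle:
    """Hypothetical throttle decision with a high and a low water mark."""

    def __init__(self, budget, high_water=0.8, low_water=0.5):
        self._high = high_water * budget  # pause input at or above this
        self._low = low_water * budget    # resume input at or below this
        self.paused = False

    def update(self, hot_bytes):
        # hot_bytes: amount of archive data currently considered hot.
        if hot_bytes >= self._high:
            self.paused = True
        elif hot_bytes <= self._low:
            self.paused = False
        # Between the two marks the previous decision is kept.
        return self.paused
```

A module feeding new data into the system would then call update with a fresh reading before submitting each new chunk and wait while it returns True.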

The last main feature of MattockFS is the unavoidable consequence of the marriage of the tool-chain approach to forensic data processing with the asynchronous message-passing approach to concurrency within the MattockFS forensic file-system. MattockFS integrates a local message bus that was inspired by the OCFA Anycast relay. Tool-chain level meta-data can only reside in the message bus. For example, opportunistic hashing state has a tool-chain bound lifetime. The same is true of the reference counting stack for fadvise entities.

On the other hand, the archive is the only place where fadvise system call invocations can take place and where low-level reads can lead to opportunistic hashing progressing.

The combination of these two realities means that the dependencies between the archive, with its fadvise reference counting stack and opportunistic hashing, and the message bus are simply too strong not to implement a local message bus as part of the file-system implementation.

Before we conclude this installment of the MattockFS series, there is still one important point I need to get across. That point is that MattockFS in itself is not a computer forensic framework. MattockFS is meant to be an important building block that future asynchronous computer forensic frameworks should use in their implementation in order to ensure that anti-anti-forensics, forensic process integrity, and spurious-read attenuation are properly addressed.

Such a future framework will need to aim for a symbiotic relationship between MattockFS and other essential elements of a potentially distributed and scalable computer forensic framework for lab-side processing within medium to large computer forensic investigations.

In the next installment of this series, we shall be discussing the file-system-as-API approach, and how a module framework would communicate with MattockFS through such a sparse-capability-based system interface.