Tree-walking for beginners in Ultimate C++: How to write applications that scan your hard disk

in utopian-io • 7 years ago

In this tutorial, I'll look at how to use Ultimate C++ to write applications that scan your hard disk (or even multiple hard disks) and perform some action on each file they find.

Some people find Microsoft's search utility rather unreliable. Perhaps the animated dog in Windows XP's Search Companion is spending too much time getting over-familiar with lamp-posts, because it can often doze off on the job: the progress bar continues to cycle but no new matching files are found. Admittedly, I've got over 10 terabytes of online storage on my main development machine, and perhaps this is too much for Microsoft's search program to cope with. Ultimately, you'll get more satisfaction from writing your own file-search tools.

Another ever-popular utility is the resource trawler: a program that searches for all the executable files on your hard disk. Having found an executable (be it a DLL, an EXE file, or whatever), it examines the innards of the file looking for bitmaps, icons and other graphical goodies. This, too, is something we'll be building over the course of this series.

Yet a third example of this type of utility is a disk-usage analyser; by scanning through the contents of every folder on your hard disk, such a program can determine which subdirectories are eating up the most disk space.

Designing The User Interface in Ultimate C++

Okay, let's roll up our sleeves and get started. First we need to create a basic user interface to test our directory-scanning code. For now, it's fairly spartan, but we'll add more features as we go along. The initial user interface is shown in the accompanying screenshot. A TListView control is used to display all the files that match our file specification, while the 'Go!' button is used to start a file search. While a file search is in progress, the label on this button changes to 'Stop!', reverting to 'Go!' at the end of the search. If you press it during a lengthy scan, you'll be asked whether you want to stop.

[Screenshot: the initial user interface]

Writing a user interface to handle lengthy operations is an interesting problem. One way of dealing with this sort of situation is to fire off a background thread. This takes care of the file-scanning procedure while the application's main (foreground) thread drives the user interface itself. Although there's nothing wrong with this type of approach, I wanted to show you a simpler (and arguably more responsive) technique, which achieves the same objective.

Listening Out For Stop Events

The Scan routine does most of the work, and its basic structure is shown in the section entitled 'Scanners don't live in vain'. For the sake of clarity, a lot of significant code (including the all-important recursion support) isn't shown there. Here's another important code snippet:

if (++idx % 16 == 0) // Give the user a chance to stop the scan
    Application->ProcessMessages();

The 'idx' variable is a local that's initialised to zero each time the Scan routine is called. Each time round the loop (remember, each time that we discover a new file or directory) the value of idx gets incremented by one. If, after incrementing, the value of this variable is exactly divisible by sixteen, then the modulus operator (the % sign in the above code) returns zero. This causes the ProcessMessages method of the Application variable to get called, which – in turn – allows the VCL library to process any outstanding events such as button clicks.


This is necessary so that the event handler associated with the 'Stop!' button gets a chance to execute during the file scan. If it's clicked, it sets a variable called Scanning to false, which causes the Scan routine to stop looping. But you're probably wondering what this modulus-sixteen business is for. It's really a performance optimisation: calling ProcessMessages every time round the loop could slow things down, whereas calling it only once every sixteen iterations adds much less overhead.
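If you're not using the VCL, the same throttling idea carries over to portable C++. The sketch below is my own illustration rather than part of this tutorial's code: `ProcessItems` and its `pumpEvents` callback are hypothetical stand-ins for the Scan loop and Application->ProcessMessages(), and the `scanning` flag plays the role of the Scanning variable.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-in for the Scan loop. It visits each item but only
// "pumps" the event queue once every `interval` items, keeping the overhead low.
// `scanning` plays the role of the Scanning flag; the pump callback may clear it.
int ProcessItems(const std::vector<std::string>& items,
                 bool& scanning,
                 const std::function<void()>& pumpEvents,
                 int interval = 16)
{
    int idx = 0; // counts items handled, like 'idx' in Scan
    for (const std::string& item : items) {
        if (!scanning)
            break;            // the user pressed Stop!
        (void)item;           // a real scanner would examine the file here
        if (++idx % interval == 0)
            pumpEvents();     // give the user a chance to stop the scan
    }
    return idx;
}
```

With 40 items and the default interval of 16, the callback fires twice (after items 16 and 32). If the callback clears `scanning`, the loop stops at its next check.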

All Systems Go!

The Go/Stop button actually has two different event handlers, although only one is attached at any given time. During a file scan, it acts like a Stop button, so its OnClick event is linked to a handler called 'StopButtonClick'. When the application is idle, it functions as a Go button, so it's linked to a handler called 'GoButtonClick'. This might sound like a weird way of doing things, but it does simplify the program logic.

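void __fastcall TForm1::GoButtonClick(TObject *Sender)
{
    // Start scanning: from now on, clicks on this button mean "stop"
    GoButton->OnClick = StopButtonClick;
    GoButton->Caption = "Sto&p!";  // '&' makes P the keyboard accelerator (Alt+P)
    Scanning = true;
    FoundFiles->Items->Clear();    // discard results from any previous scan
    Scan("d:\\windows\\system32\\", true); // true = this is the top-level call
}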

The code for the GoButtonClick routine is shown above; it begins by assigning the StopButtonClick handler to the button. From then on, any button clicks get sent to that routine. Next, it changes the button caption to reflect the fact that we're doing a scan. That boolean variable I mentioned earlier, 'Scanning', is set to true and any previous contents in the list-view control are cleared. Last but not least, it calls the Scan routine to do the actual business of directory scanning.
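To see why swapping handlers keeps the logic simple, here's a minimal, framework-free sketch. `Button`, `ScanForm` and their members are hypothetical stand-ins for the VCL's TButton and TForm1, with a std::function playing the part of the OnClick event. It models only the state changes, not the scan itself.

```cpp
#include <cassert>
#include <functional>
#include <string>

// Hypothetical, minimal model of a button: a caption plus a swappable handler.
struct Button {
    std::string caption = "Go!";
    std::function<void()> onClick;

    void Click() {
        // Copy first: the handler may replace onClick while it is running.
        auto handler = onClick;
        if (handler)
            handler();
    }
};

// Hypothetical stand-in for TForm1: one button, two handlers.
struct ScanForm {
    Button goButton;
    bool scanning = false;

    ScanForm() { goButton.onClick = [this] { GoButtonClick(); }; }

    void GoButtonClick() {
        goButton.onClick = [this] { StopButtonClick(); }; // later clicks mean "stop"
        goButton.caption = "Sto&p!";
        scanning = true;
        // A real application would clear the list and start the scan here.
    }

    void StopButtonClick() {
        scanning = false;
        goButton.onClick = [this] { GoButtonClick(); };   // back to "go" mode
        goButton.caption = "Go!";
    }
};
```

Because each handler installs the other, neither needs an if-statement to work out which mode the button is in.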

Let's Get Recursive

Of course, the most important aspect of the Scan routine is its inherently recursive nature: once a subdirectory has been found, Scan has to call itself again, passing the pathname of the newly found subdirectory. So, suppose we start scanning from the root directory of drive C:, giving a pathname of C:\. When we find the WINDOWS directory, Scan has to call itself again, this time passing C:\WINDOWS\ as the pathname. In this way, successively deeper levels of the directory hierarchy are explored.

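// Is it a directory?
if (sr.Attr & faDirectory) {
    // Yes: time to get recursive! But skip the "." and ".." entries,
    // which point back at this directory and its parent (endless recursion).
    if (sr.Name != "." && sr.Name != "..")
        Scan(FilePath + sr.Name + "\\", false);
}
else { // No: it's just a file
    TListItem *item = FoundFiles->Items->Add();
    item->Caption = sr.Name;         // first column: the filename
    item->SubItems->Add(FilePath);   // second column: the folder it lives in
}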

The 'business end' of the Scan routine, at least in terms of recursion, is illustrated by the code shown above. Once we've found something (FindFirst or FindNext returns zero), the first job is to figure out whether it's a file or a directory. We do this by examining Attr, the attribute field of the TSearchRec search record. Attr is a mask of bit flags that indicate whether a file or directory is hidden, archived, and so on. By AND-ing the file attribute with the faDirectory flag, we can instantly tell whether we're dealing with a directory.

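If it is a directory, we simply append the newly encountered directory name to the existing pathname (as explained above) and call ourselves once again. The exceptions are the special '.' and '..' entries, which refer to the current and parent directories: in subdirectories, FindFirst and FindNext report these as directories too, so they must be skipped or the recursion would never end. If it's a file that we're dealing with, this is the point at which we can plug some useful information into the list-view control. In this case, I've just added the filename and pathname, but you could also display the file's timestamp and size (in bytes) simply by adding extra columns to the list-view.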

One item that requires further explanation is the TopLevel flag, which I've added to the Scan method as a second parameter. In a nutshell, our recursive routine needs to know whether it is the first, outermost invocation or a nested invocation made for a subdirectory it has encountered. Thus, the GoButtonClick code calls Scan with the TopLevel argument set to true. (Note: this doesn't necessarily mean we're looking at a top-level directory; it just means that this is the first invocation of the routine.) Within Scan itself, all nested invocations pass false for this parameter. The reason Scan needs to know whether it's the top-level call is simple: when the top-level call exits, the whole file-scanning process is over. Accordingly, it sets the Scanning variable to false and invokes the StopButtonClick method.
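Here's a stripped-down sketch of the TopLevel logic. It walks a hypothetical in-memory tree (`Node`) so that it runs without a disk or the VCL; `Scanner`, `filesFound` and `onFinished` are my own illustrative names, with `onFinished` standing in for the call to StopButtonClick.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical in-memory directory tree, so the example needs no real disk.
struct Node {
    std::string name;
    bool isDir = false;
    std::vector<Node> children; // empty for files
};

struct Scanner {
    bool scanning = false;
    int filesFound = 0;
    std::function<void()> onFinished; // stands in for StopButtonClick

    void Scan(const Node& dir, bool topLevel) {
        for (const Node& child : dir.children) {
            if (!scanning)
                break;                  // the user pressed Stop!
            if (child.isDir)
                Scan(child, false);     // nested calls are never top-level
            else
                ++filesFound;           // a real scanner would list the file here
        }
        if (topLevel) {                 // only the outermost call ends the scan
            scanning = false;
            if (onFinished)
                onFinished();
        }
    }
};
```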


As with most development issues, there's more than one way to skin a cat! An alternative, and arguably neater, solution would be to increment a 'recursion-depth' counter at the start of the Scan routine and decrement it at the end. With the counter initialised to zero, we can detect the end of the scan as the moment it decrements back to zero. See if you can figure out how to make the necessary changes to this tutorial's code.
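If you'd like to check your answer, here's one possible shape for the depth-counter version, again using a hypothetical in-memory tree (`Dir`) rather than real files. `DepthScanner` and its members are illustrative names, not part of this tutorial's code.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical tree: each directory holds some files and some subdirectories.
struct Dir {
    int files = 0;
    std::vector<Dir> subdirs;
};

class DepthScanner {
public:
    std::function<void()> onFinished; // stands in for StopButtonClick
    int filesFound = 0;

    void Scan(const Dir& dir) {
        ++depth_;                          // entering one more level
        filesFound += dir.files;
        for (const Dir& sub : dir.subdirs)
            Scan(sub);                     // no TopLevel flag needed
        if (--depth_ == 0 && onFinished)   // back to zero: the scan is over
            onFinished();
    }

private:
    int depth_ = 0; // current recursion depth
};
```

The counter only returns to zero when the outermost call finishes, so the same object can run a second scan afterwards. If the per-file work could throw an exception, a sturdier version would decrement the counter in a destructor (RAII) so that it can't get out of step.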

Understanding Attributes

As mentioned earlier, the Attr field inside the TSearchRec data structure is actually a mask made up of a number of bit-flags. The Ultimate C++ identifiers for each of these bit-flags are:

  • faReadOnly : The file is marked as read-only.
  • faHidden : The file is marked as hidden.
  • faSysFile : The file is marked as being for system use only.
  • faVolumeID : This is a volume label, not a file.
  • faDirectory : This is a subdirectory entry, not a file.
  • faArchive : The file is marked for archiving.
  • faAnyFile : See explanation below.

Strictly speaking, faAnyFile isn't a bit-flag. Its value (0x3F) corresponds to all the other bit-flags being set. It's used to match both files and directories when calling FindFirst, and that's how we've used it in this tutorial's code. If either the faHidden or the faSysFile attribute is set, the file won't appear in ordinary directory listings. However, because we've specified faAnyFile in our call to FindFirst, we'll pick up these files too.
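Testing these flags is just a matter of bitwise AND-ing. The sketch below defines its own local constants so that it compiles outside Ultimate C++. The article only confirms faAnyFile's value of 0x3F; the individual values used here (the classic DOS attribute bits, one per bit from 0x01 to 0x20) are an assumption you should check against your own headers. `HasAttr` and `IsNormallyHidden` are hypothetical helpers.

```cpp
#include <cassert>
#include <cstdint>

// Local constants mirroring the classic DOS attribute bits (assumed values).
enum FileAttr : std::uint32_t {
    faReadOnly  = 0x01,
    faHidden    = 0x02,
    faSysFile   = 0x04,
    faVolumeID  = 0x08,
    faDirectory = 0x10,
    faArchive   = 0x20,
    faAnyFile   = 0x3F  // every flag above, OR-ed together
};

static_assert((faReadOnly | faHidden | faSysFile | faVolumeID | faDirectory | faArchive)
                  == faAnyFile,
              "faAnyFile should be all six bits set");

// AND-ing with a flag isolates that bit: non-zero means the flag is set.
constexpr bool HasAttr(std::uint32_t attr, std::uint32_t flag) {
    return (attr & flag) != 0;
}

// Hidden or system files don't show up in ordinary directory listings.
constexpr bool IsNormallyHidden(std::uint32_t attr) {
    return HasAttr(attr, faHidden | faSysFile);
}
```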

If you're using Windows, right-click a file and select Properties. You'll normally see only the read-only, hidden and archive bits. (If you can't see the archive attribute, click Advanced.) The faVolumeID attribute is, in my opinion, something of a hack! When Microsoft added volume labels to MS-DOS, they needed somewhere to store the volume label information. Someone chose to simply store the volume label as an ordinary root directory entry, with an extra little flag, faVolumeID, to show that it's a volume ID and not a file! Naturally, this is also the reason why volume labels can't be longer than 11 characters: the label occupies the entry's name field, which holds eight characters of filename plus three of extension.
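To make the 11-character limit concrete, here's a simplified sketch of the start of a FAT directory entry: an 11-byte, space-padded name field (eight characters plus three, with no dot stored) followed by the attribute byte, in which 0x08 marks a volume label. `DirEntryHead` and `VolumeLabel` are hypothetical names, and a real directory entry is 32 bytes long, with more fields after these two.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified view of the first 12 bytes of a FAT directory entry.
struct DirEntryHead {
    char name[11];      // 8 name chars + 3 extension chars, space-padded
    std::uint8_t attr;  // attribute bit-flags
};

constexpr std::uint8_t kAttrVolumeID = 0x08;

// A volume label reuses the whole 11-byte name field, so it can never be
// longer than 11 characters. Returns "" if the entry isn't a volume label.
std::string VolumeLabel(const DirEntryHead& e) {
    if ((e.attr & 0x3F) == 0x0F)
        return "";      // VFAT long-filename slot (RO|Hidden|System|VolumeID)
    if ((e.attr & kAttrVolumeID) == 0)
        return "";      // an ordinary file or directory
    std::string label(e.name, sizeof e.name);
    label.erase(label.find_last_not_of(' ') + 1); // strip the space padding
    return label;
}
```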



Posted on Utopian.io - Rewarding Open Source Contributors

