Which Data Structure to Use?
Our sysadmin has asked for a report based on our FTP logs. I've parsed
the lines in the FTP log according to the action the user took (STOR,
DELE, RETR, CWD, ...). For each action, I now have an array that contains
just those lines of the FTP log.
There are several 'fields' in each line that I would like to report
on. For example, for the change directory (CWD) array, I'm interested in
three pieces of info: userid, directory, and a running count of hits for
each userid/directory combination.
I'm stumbling on how to reason about (and code) the storage for this data.
The final report would look something like this:
Userid  Directory        Number of Hits
UID1    /usr/bin         25
UID1    /usr/opt         3
UID3    /user/bin/perl   6
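Whatever structure ends up holding the counts, the layout itself can be produced with `printf`/`sprintf`. A minimal sketch, assuming the counts have already been gathered into a list of rows; the rows are the sample report values and the column widths are arbitrary choices:

```perl
use strict;
use warnings;

# Rows copied from the sample report: [userid, directory, hits]
my @rows = (
    [ 'UID1', '/usr/bin',       25 ],
    [ 'UID1', '/usr/opt',        3 ],
    [ 'UID3', '/user/bin/perl',  6 ],
);

# Left-justify the two text columns, right-justify the count
my $fmt    = "%-8s %-16s %14s\n";
my $report = sprintf $fmt, 'Userid', 'Directory', 'Number of Hits';
$report .= sprintf $fmt, @$_ for @rows;

print $report;
```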
So, as I said, with a foreach loop and a regex I've built a
change directory array (@CWD). Now, with another foreach loop and a
split, I've stored the $userid ($CWD) and directory ($CWD) in
scalars, and I have a counter for the hits. Problem number one: how do I
identify the combined userid/directory pair so I know which counter to
apply the hit to?
It seems like the high-level identifier will be the userid, but since the
same user may CWD to multiple directories within an FTP session, perhaps
the userid and directory together will make it unique. Concatenating the
userid and directory and populating a hash with that as the key (the value
being the running count) seemed feasible, but perhaps also somewhat
less than elegant.
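The concatenated-key idea does work. A minimal sketch, assuming the userid/directory pairs have already been split out of @CWD (the pairs below are made-up sample data). The one thing to get right is the separator: it must be a character that cannot occur in either field, otherwise two different pairs can produce the same key. Perl also has this built in: `$hits{$userid, $dir}` joins the subscripts with the `$;` variable (default "\034"), which is the same technique.

```perl
use strict;
use warnings;

# Hypothetical (userid, directory) pairs, as if already split out of @CWD
my @pairs = (
    [ 'UID1', '/usr/bin' ],
    [ 'UID1', '/usr/opt' ],
    [ 'UID1', '/usr/bin' ],
    [ 'UID3', '/user/bin/perl' ],
);

# Join the two fields with a NUL byte, which cannot appear in a userid
# or a path, so distinct pairs can never collide on the same key.
my $SEP = "\0";
my %hits;
for my $p (@pairs) {
    my ($userid, $dir) = @$p;
    $hits{ join $SEP, $userid, $dir }++;
}

# When reporting, split the key back into its two fields
for my $key (sort keys %hits) {
    my ($userid, $dir) = split /\0/, $key;
    printf "%-8s %-16s %5d\n", $userid, $dir, $hits{$key};
}
```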
A hash of arrays (userid => [directory, count]) doesn't get the job
done, because the userid alone is not unique: the same user shows up with
several directories. If I understand the
examples in the Perl Cookbook, the hash of hashes doesn't work for the
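For comparison, a minimal sketch of a hash of hashes keyed first by userid and then by directory, `$hits{$userid}{$dir}`. The sample pairs are made up; the point is that each userid gets one inner hash holding all of that user's directories, so the repeated userid is not a problem. Sorting both levels of keys gives the grouped report directly.

```perl
use strict;
use warnings;

# Hypothetical (userid, directory) pairs, as if already split out of @CWD
my @pairs = (
    [ 'UID1', '/usr/bin' ],
    [ 'UID1', '/usr/opt' ],
    [ 'UID1', '/usr/bin' ],
    [ 'UID3', '/user/bin/perl' ],
);

# Hash of hashes: $hits{userid}{directory} = count
my %hits;
for my $p (@pairs) {
    my ($userid, $dir) = @$p;
    $hits{$userid}{$dir}++;    # the inner hash is autovivified on first use
}

printf "%-8s %-16s %s\n", 'Userid', 'Directory', 'Number of Hits';
for my $userid (sort keys %hits) {
    for my $dir (sort keys %{ $hits{$userid} }) {
        printf "%-8s %-16s %d\n", $userid, $dir, $hits{$userid}{$dir};
    }
}
```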
If I wasn't sure I was confused and off track before I wrote this,
proofreading it has convinced me that I am. Any help will
be much appreciated.
With a little help understanding what might be a reasonable storage
structure/pseudocode approach, I'll take a stab at the code and
revisit y'all when I approach the <wall>.