We have done quite a few e-discovery collections over the past couple of years, and a recurring theme runs through each of them: there is a definite communication barrier between attorneys and us computer forensic types.
Attorneys use words such as mens rea, voir dire, habeas corpus and in camera. Our vocabulary includes words like bit stream copy, logical acquisition, active file collection and MD5 hash values.
Ever read the book Men Are from Mars, Women Are from Venus? A conversation with my wife often goes like this:
"I know that is what I said. But that is not what I meant!"
The problem is that she says one thing and I hear something entirely different. That is a lot like talking to attorneys. Attorneys are from Mars and computer geeks are from Pluto. Well, we would be if it were still a planet.
The hardest part seems to be gathering information before the collection that is accurate and means the same thing to both parties.
When an attorney says to me, "I want a copy of the hard drive," I hear: okay, you want a bit stream forensic image of the entire physical hard drive. Not a problem. At least, not a problem until we find out that they wanted a copy of all of the logical files on the hard drive. Wait a minute. What the heck is a logical file? Can a file be illogical? My wife certainly can be, and that is one of her many endearing qualities.
When we computer forensic types think about hard drives, partitions and files, we tend to think in two realms: physical and logical.
So what's the difference anyway?
When the operating system on a computer shows you, the user, partitions, directories and files located on a physical hard drive, it shows you a logical representation of the physical data on that hard drive. Each operating system has its own little quirks in how it likes to store, arrange and show the files it manages.
Even what most people think of as their hard drive, the C or D or other drive letter they see in their file browser, is a logical representation. That C you see is not a physical hard drive. It is a partition on a physical hard drive. Of course, if your hard drive has only one partition, you could say that drive C is the physical hard drive, but you would still be referring only to its logical representation: the nickname, so to speak, that the operating system gives the partition. Strip the nickname away and you would see something like hda1 or sda1, the names a Linux system gives the first partition on the first IDE or SATA disk. That is what the drive looks like at a lower, less friendly level.
The operating system shows you these nicknames so you have an idea of where your files are, using friendly names.
An easy way to think about drive letters is real names and nicknames. My real name is Lawrence; let's call that my physical name. My nickname is Larry; let's call that my logical name. I answer to either one equally well, but since I, acting as the operating system, present my physical self under the logical name Larry, you don't need to know my real name to yell at me or ask me a question.
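On a Linux system you can see both the "real names" and the partitions in one place: the kernel lists every block device in /proc/partitions, including the whole disk (sda) and the partitions on it (sda1, sda2). Below is a minimal sketch that assumes the standard four-column /proc/partitions layout. The sample text and the function names are made up for illustration. The prefix rule used to tell partitions from disks is a rough heuristic and does not reflect how the kernel decides.

```python
# Made-up sample in the /proc/partitions format (sizes are in 1 KiB blocks).
SAMPLE = """major minor  #blocks  name

   8        0  488386584 sda
   8        1     524288 sda1
   8        2  487861248 sda2
"""


def parse_proc_partitions(text):
    """Turn /proc/partitions-style text into a list of dicts."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines have four fields and start with a numeric major number;
        # this skips the header line and blank lines.
        if len(fields) != 4 or not fields[0].isdigit():
            continue
        major, minor, blocks, name = fields
        entries.append({"major": int(major), "minor": int(minor),
                        "blocks_1k": int(blocks), "name": name})
    return entries


def split_disks_and_partitions(entries):
    """Heuristic only: a device is a partition if another device's name is a prefix of it."""
    names = [e["name"] for e in entries]
    disks, partitions = [], []
    for name in names:
        if any(other != name and name.startswith(other) for other in names):
            partitions.append(name)  # e.g. sda1 lives inside sda
        else:
            disks.append(name)       # e.g. sda, the whole physical disk
    return disks, partitions


if __name__ == "__main__":
    # On a real Linux machine you could pass open("/proc/partitions").read() instead.
    disks, partitions = split_disks_and_partitions(parse_proc_partitions(SAMPLE))
    print("physical disks:", disks)
    print("partitions:", partitions)
```

The heuristic works for common names such as sda/sda1 or nvme0n1/nvme0n1p1. It can misfire on unusual naming, such as a 27th disk called sdaa.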
When you browse around your computer, you will not see files that have been deleted. They are often still there on the physical hard drive until that space is reused, but they are not included in the logical representation the operating system shows you. Being ever so helpful, the operating system assumes that since you deleted the files, you don't want to see them anymore. That is why so many people are surprised when a forensic examination exposes all those nasty little porn files they thought were gone once they deleted them and emptied the Recycle Bin.
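A toy model can illustrate the deleted-files point. This is a hypothetical, heavily simplified sketch. The ToyDisk class keeps raw bytes in one place (the physical view) and a name-to-location table in another (the logical view). Deleting a file removes only its table entry. Many real file systems behave roughly this way: they mark the space as free and leave the data in place until something overwrites it. Details vary, and SSDs that use TRIM may make deleted data unrecoverable sooner.

```python
class ToyDisk:
    """A deliberately oversimplified disk; real file systems are far more complex."""

    def __init__(self, size):
        self.sectors = bytearray(size)  # the "physical" bytes
        self.directory = {}             # the "logical" view: file name -> (offset, length)
        self.next_free = 0              # toy allocator: never reuses space

    def write_file(self, name, data):
        if self.next_free + len(data) > len(self.sectors):
            raise ValueError("toy disk is full")
        offset = self.next_free
        self.sectors[offset:offset + len(data)] = data
        self.directory[name] = (offset, len(data))
        self.next_free += len(data)

    def delete_file(self, name):
        # Only the directory entry goes away; the bytes stay where they were.
        del self.directory[name]

    def list_files(self):
        """What the operating system shows you: the logical view."""
        return sorted(self.directory)

    def raw_contains(self, needle):
        """What a forensic examiner can search: every physical byte."""
        return needle in self.sectors


if __name__ == "__main__":
    disk = ToyDisk(64)
    disk.write_file("memo.txt", b"quarterly numbers")
    disk.write_file("secret.txt", b"delete me")
    disk.delete_file("secret.txt")
    print(disk.list_files())                # ['memo.txt']
    print(disk.raw_contains(b"delete me"))  # True
```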
Back to physical and logical. The other helpful thing your operating system does is show you how much space is left on the hard drive. (I am using the logical representation here, since most of us normally think of a drive letter as a hard drive, even if it is incorrect. It is more convenient.)
When you examine your hard drive in Windows, for example, it might show 500 gigabytes of total space, with 75 gigabytes used and 425 gigabytes free.
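To see the logical numbers the operating system reports, here is a minimal Python sketch that uses the standard library's shutil.disk_usage. That function returns the total, used and free bytes for the volume holding a given path. The function name describe_usage is hypothetical. The sketch reports on the partition (the drive letter), not on the physical disk underneath. On some systems, used plus free falls a little short of total because the file system reserves some space.

```python
import os
import shutil


def describe_usage(path):
    """Report total/used/free space for the volume holding `path`, in decimal gigabytes."""
    usage = shutil.disk_usage(path)  # named tuple of byte counts: total, used, free
    gb = 10 ** 9  # decimal GB; Windows Explorer divides by 2**30, so its figures look smaller
    return {
        "total_gb": usage.total / gb,
        "used_gb": usage.used / gb,
        "free_gb": usage.free / gb,
    }


if __name__ == "__main__":
    root = os.path.abspath(os.sep)  # "/" on Linux or macOS, the current drive root on Windows
    for label, value in describe_usage(root).items():
        print(f"{label}: {value:,.1f}")
```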
Now, if you asked for a copy of the whole hard drive, you were probably thinking of that logical 75 gigabytes, not the whole physical 500 gigabytes. In an e-discovery collection it is rare to want all 500 gigabytes of the physical drive. Why?
1. Most discovery requests don't include deleted files.
2. E-discovery processing is danged expensive and is charged by the gigabyte in most cases. Why pay any more than you have to for processing?
3. Getting to what you REALLY mean: you want all the user files from that physical 500 gigabyte drive and that logical 75 gigabytes.
So, if the collection order specifies all the user files from the entire hard drive, I got that. No problem. Since operating system and program files are left out, you will end up with an actual collection far smaller than even the logical 75 gigabytes.
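A user-file collection like this can be sketched in Python. The sketch walks a folder, copies each active file, and records each file's size and MD5 hash so that every copy can be checked against its original. It is a minimal sketch under stated assumptions, not a forensic tool. The function names and the temporary "user folder" are hypothetical. The sketch does no write-blocking. It never sees deleted files, which is exactly what a logical collection leaves out. On some systems, simply reading a file can update its last-access time. dest_dir is assumed not to sit inside source_dir.

```python
import hashlib
import os
import shutil
import tempfile


def md5_of_file(path, chunk_size=1024 * 1024):
    """MD5 of a file, read in chunks so large files don't have to fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect_user_files(source_dir, dest_dir):
    """Copy each active file under source_dir into dest_dir and return a hash manifest."""
    manifest = []
    for root, _dirs, files in os.walk(source_dir):
        for name in sorted(files):
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(dest_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)  # copy2 also carries over timestamps where the OS allows
            src_md5 = md5_of_file(src)
            if md5_of_file(dst) != src_md5:
                raise IOError(f"copy of {rel} does not match the original")
            manifest.append({"path": rel, "bytes": os.path.getsize(src), "md5": src_md5})
    return manifest


if __name__ == "__main__":
    source = tempfile.mkdtemp()  # stand-in for a custodian's user folder
    dest = tempfile.mkdtemp()
    os.makedirs(os.path.join(source, "Documents"))
    with open(os.path.join(source, "Documents", "letter.txt"), "wb") as fh:
        fh.write(b"hello")
    for entry in collect_user_files(source, dest):
        print(entry)
```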
If the order specifies the entire hard drive, I am going to think: okay, forensic bit stream copy time. You will get the entire 500 gigabytes.
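A bit stream copy reads the source from its first byte to its last, regardless of what the operating system considers in use. Matching MD5 hash values are how examiners show that the copy is identical to the original. Below is a minimal Python sketch of that idea. It assumes a Linux-style source that can be opened and read like a file, and the function name is hypothetical. Real acquisitions use a hardware write blocker and dedicated imaging tools (dd-style imagers, FTK Imager and the like). Those tools also cope with unreadable sectors, which this sketch does not.

```python
import hashlib
import os
import tempfile


def bitstream_image(source, image_path, chunk_size=1024 * 1024):
    """Copy every byte of `source` into `image_path` and return (source_md5, image_md5)."""
    source_md5 = hashlib.md5()
    with open(source, "rb") as src, open(image_path, "wb") as img:
        # Read from the first byte to the last; unlike a file copy, nothing is skipped.
        for chunk in iter(lambda: src.read(chunk_size), b""):
            source_md5.update(chunk)
            img.write(chunk)
    # Hash the finished image in a separate pass so the two values are independent.
    image_md5 = hashlib.md5()
    with open(image_path, "rb") as img:
        for chunk in iter(lambda: img.read(chunk_size), b""):
            image_md5.update(chunk)
    return source_md5.hexdigest(), image_md5.hexdigest()


if __name__ == "__main__":
    # Stand-in for a physical disk: a file full of random bytes.
    # On Linux, `source` could instead be a device node such as /dev/sdb (root required).
    fd, fake_disk = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as fh:
        fh.write(os.urandom(4 * 1024 * 1024))
    image = fake_disk + ".img"
    src_hash, img_hash = bitstream_image(fake_disk, image)
    print("source MD5:", src_hash)
    print("image  MD5:", img_hash)
    print("verified" if src_hash == img_hash else "MISMATCH")
```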
I am going to write more on this blog about the technical stuff in plain English and try to bridge the communication gap between us and the non-technical people we serve.