How Data Defines Us
We all have data. Even if we would like to think otherwise, data defines and drives our world today. You cannot go online, use your cell phone, or even snap a picture without accumulating data. Data is an assortment of facts. It can be visible and known to the user, or it can be hidden in the background. Background, or hidden, data is generally classified as metadata: data about the data.
Data is a collection of values, measurements, and facts. It takes many forms, such as numbers, words, measurements, locations, and descriptions. Data, for all intents and purposes, is an expressed description. Four terms are commonly used to classify it.
- Qualitative Data
This type of data contains descriptions of things or objects.
- Quantitative Data
This type of data denotes a quantity and is generally expressed numerically.
- Discrete Data
This is data that can only take specific, separate values, such as a count of files or the number of photos on a phone. A data field set up for discrete data accepts only those values.
- Continuous Data
This type of data can take any value within a range, such as a file's size or the time elapsed between two events. A data field may also restrict that range, for example by accepting only numeric values between 1 and 50, so that every entry stays within set limits.
Metadata can take any of the forms above, but the term refers specifically to data that describes other data: its source and additional information about it. For example, the metadata in an image may record who took the photo, the date and time, and the GPS location where it was taken. So much descriptive information can be hidden in metadata that it makes an excellent forensic resource for solving crimes.
History of Metadata
Data about data. Sounds fun, huh? We have had this concept for ages. Think of library card catalogs before computers: cards in files, each holding data in specific formats and ranges, and even locations. Think of metadata as descriptive information about a piece of data that has been collected. It can be a management or resource tool. It can also help determine where the data originated. The amount of data collected behind the scenes can vary widely.
The concept of metadata can be traced back even further, to around 280 BC, when small tags were attached to the ends of the scrolls in the Great Library of Alexandria. These tags, similar to the more modern library catalog cards, gave the title, subject matter, and author. Users could then choose a scroll based on its general description without unrolling the many scrolls in the library.
The term metadata was coined by two MIT professors, Stuart McIntosh and David Griffel. In a 1967 progress report describing data stored on computer systems, they expressed a need for a “meta language.” They discussed the process of storing data records and the purpose for which the information is collected: “If many different types of records are generated, it becomes necessary to keep a record (metadata – bibliographic data) of the data records.”
To be sure, different disciplines use different metadata standards. Metadata is generated to suit the specific source or type of file. An example would be an image file with metadata containing the date it was taken, where it was made, and the time of day. Another would be a website whose metadata provides information on the programming language used, the originator, the date it was created, and any associated images or files. Metadata can be designed to a specific standard that encompasses all the required descriptions for any particular type of data, data file, book, and, as we have learned, ancient scroll.
Of course, sooner or later, someone would insist that rules be set. In 1979, metadata standards were defined by the International Press Telecommunications Council, or IPTC. The IPTC created standards for descriptive data that could be inserted into image files. In the late 80s, it then developed standards and definitions for data that could be attached to a variety of files: images, text, or other media. This standard became known as the Information Interchange Model, or IIM.
Advances in metadata continued over the years. Adobe created its own standard, using the data definitions suggested by the IPTC but with its own header system. TIFF and JPEG images continue to use IPTC standards, descriptions, and headers.
From 2000 onward, the advances in metadata and the development of rules and standards continued. In 2001, the National Information Standards Organization (NISO) created metadata systems for the information industry, which includes libraries, publishers, and software companies. In its guide, Metadata Made Simpler: A Guide for Libraries, NISO used the following definition to help explain metadata and systems of descriptions for files. “There are several different types of metadata, including descriptive, administrative, and structural. Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords. Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. Rights management metadata is a form of administrative metadata dealing with intellectual property rights. Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.”
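The three metadata types in the quoted definition can be pictured as a small data structure. The following is only an illustrative sketch: the class names, field names, and sample values are hypothetical and do not come from NISO or any published schema.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DescriptiveMetadata:
    """Supports discovery and identification of a resource."""
    title: str
    author: str
    abstract: str = ""
    keywords: List[str] = field(default_factory=list)


@dataclass
class AdministrativeMetadata:
    """Helps manage a resource: creation details, file type, access, rights."""
    created: str
    file_type: str
    access: str
    rights: str = ""  # rights management is a form of administrative metadata


@dataclass
class StructuralMetadata:
    """Records how the parts of a compound object fit together."""
    page_order: List[str] = field(default_factory=list)


@dataclass
class ResourceMetadata:
    """Groups the three kinds of metadata for one resource."""
    descriptive: DescriptiveMetadata
    administrative: AdministrativeMetadata
    structural: StructuralMetadata


# Hypothetical record for an imaginary scanned report.
record = ResourceMetadata(
    descriptive=DescriptiveMetadata(
        title="Quarterly Report",
        author="J. Doe",
        keywords=["finance", "summary"],
    ),
    administrative=AdministrativeMetadata(
        created="2001-05-14",
        file_type="application/pdf",
        access="staff only",
    ),
    structural=StructuralMetadata(
        page_order=["cover", "chapter-1", "chapter-2"],
    ),
)

print(record.descriptive.title)
print(record.structural.page_order)
```

Real metadata standards define far more fields than this, but the grouping shows how one resource can carry all three kinds of description at once.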
How Metadata Applies to Forensics
In digital forensics, a certified computer examiner or forensics expert follows specific protocols and standards during an investigation. The primary purpose of digital forensics is to investigate by going over all the available media or computer data that is uncovered. A professional digital forensics examiner’s primary goal is to locate evidence by means of seizure, search, or retrieval. They are trained to do this while maintaining the “data integrity” of suspected files.
The first step indicated in “good practices” by digital forensics experts is to hash the suspect files or media. The experts first make a clean, ‘sanitized’ copy of the original data files. This new copy is called the evidence media. Once the copy is made, a ‘hash’ of the original data is computed and compared with a hash of the copy to determine whether the copy is an exact match to the originating data source. “Hashing is the process of getting a validated exclusive fixed string of data that defines a digital property’s originality.” A ‘hash’ is produced when a collection of data is run through a hash command or function. The resulting string is effectively unique to that data and is equivalent to a ‘fingerprint.’ The fingerprint itself reveals nothing about the contents; it only confirms that the copy is unaltered. With integrity established, examiners turn to the metadata, which can contain the following valuable information regarding the media or file being looked at.
- Created: When the data was created.
- Accessed: When the data was last accessed.
- Changed: A history of the modifications made to the data.
- Erased: When the data was removed, or when someone attempted to transfer it.
- Who: Who created the data file, and everyone who has changed or otherwise affected it.
- Time: The time of each access to the information, and who made it.
- GPS: Where the data was created, or where the individual was located when altering it.
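Some of the fields listed above are kept by the file system itself and can be read without any special tooling. The sketch below uses Python's standard `os.stat` call to show a few of them. The helper name `file_metadata` and the sample file are hypothetical, and this is far less than a forensic suite would recover: modification history, deletion records, and GPS data live elsewhere.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_metadata(path):
    """Return basic file-system metadata for `path` as a dictionary."""
    info = os.stat(path)

    def as_utc(timestamp):
        # Convert a POSIX timestamp to a timezone-aware UTC datetime.
        return datetime.fromtimestamp(timestamp, tz=timezone.utc)

    return {
        "size_bytes": info.st_size,
        "last_modified": as_utc(info.st_mtime),
        "last_accessed": as_utc(info.st_atime),
        # st_ctime is the creation time on Windows but the time of the
        # last metadata change on most Unix systems.
        "ctime": as_utc(info.st_ctime),
        # Numeric owner ID; always 0 on Windows.
        "owner_id": info.st_uid,
    }


# Demonstration on a throwaway file standing in for a piece of evidence.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as handle:
    handle.write(b"sample evidence")
    sample_path = handle.name

metadata = file_metadata(sample_path)
for name, value in metadata.items():
    print(f"{name}: {value}")

os.remove(sample_path)
```

Opening a file can update its access time, which is one reason examiners work on a verified copy and not on the original.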
There are many forms that data can take. Data can be found in databases, word processing files, images, entire websites, email, or chat. The list is possibly endless, which is what creates the need for metadata. Forensic scientists who study metadata have a variety of software to help them accomplish their tasks. Metadata software packages for Windows include FTK, Paraben, and Metadata Assistant. Those who prefer Macs often use MacQuisition to perform searches and other functions on metadata.
These software applications give reliable results on evidence data. Using them, a forensic scientist can view, document, and create reports on the data set being investigated. The applications can hash the evidence and establish the fingerprint necessary for comparison. When the hash of a particular file or piece of media does not match its recorded fingerprint, that result identifies which files need to be looked at carefully for modification or further analysis.
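The fingerprint comparison can be sketched in a few lines with Python's standard `hashlib` module. This is a minimal illustration, not how any of the named products works internally: SHA-256 is an assumed choice of hash function, and the helper names and sample files are hypothetical.

```python
import hashlib
import os
import shutil
import tempfile


def sha256_of_file(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_original(original_path, copy_path):
    """True when both files produce the same fingerprint."""
    return sha256_of_file(original_path) == sha256_of_file(copy_path)


# Build a throwaway "original" and make a working copy of it.
workdir = tempfile.mkdtemp()
original = os.path.join(workdir, "original.bin")
working_copy = os.path.join(workdir, "copy.bin")

with open(original, "wb") as handle:
    handle.write(b"contents of the seized file")
shutil.copyfile(original, working_copy)

copy_is_exact = matches_original(original, working_copy)

# Simulate tampering by appending a single byte to the copy.
with open(working_copy, "ab") as handle:
    handle.write(b"!")
altered_is_exact = matches_original(original, working_copy)

print(copy_is_exact, altered_is_exact)  # prints: True False

shutil.rmtree(workdir)
```

Even a one-byte change produces a different digest, which is why a mismatch is a reliable flag for closer examination.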
Forensic metadata is used to help prove cases, solve crimes, and assist in other investigations. Understanding the concept of metadata is just the beginning of knowing how that information can be used, managed, or altered to solve a problem. Building on this background on what metadata is and what it is used for, other articles will further explain some of the software and the additional uses of metadata as legal evidence in court.