Metadata and Signals Intelligence (SIGINT)
Metadata is commonly defined as “data about data.” In other words, every data file, such as an image, carries background information about itself, and that information is its metadata. The user generally sees the image, while the metadata stays hidden. In the case of a snapshot, it can include the GPS location where the picture was taken, the date and time, user information, and even camera settings. It is, as described, additional data about the image.
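To make the GPS detail concrete: photo metadata typically stores a position as degrees, minutes, and seconds plus a hemisphere letter, so it has to be converted before it can be placed on a map. The sketch below shows only that conversion. The function name `dms_to_decimal` and the sample coordinates are invented for illustration, and reading the tags out of a real image file would need an image library that is not shown here.

```python
def dms_to_decimal(dms, hemisphere):
    """Convert (degrees, minutes, seconds) plus N/S/E/W to signed decimal degrees."""
    degrees, minutes, seconds = dms
    value = degrees + minutes / 60 + seconds / 3600
    # South latitudes and west longitudes are negative by convention.
    if hemisphere.upper() in ("S", "W"):
        value = -value
    return value


# Hypothetical values, as they might appear in a photo's GPS tags.
latitude = dms_to_decimal((40, 26, 46), "N")
longitude = dms_to_decimal((79, 58, 56), "W")
print(round(latitude, 6), round(longitude, 6))
```

A pair of decimal coordinates like this is enough to drop a pin on a map, which is why location tags are among the most revealing fields in a photo.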
There are five basic types of metadata:
- Descriptive Metadata – This holds descriptive information about a resource. Easily searchable, descriptive metadata is used for discovery and identification. It can include specific details such as title, abstract, author, and keywords.
- Structural Metadata – This type of metadata holds containers of data and specifies how compound objects are put together. An example would be how pages are ordered to form specific chapters. Structural metadata describes the types, versions, relationships, and other characteristics of digital materials.
- Administrative Metadata – Administrative metadata works much like the administration in an office setting. It includes the details that help manage a resource, like a resource type, applications used, permissions, and when and how it was created.
- Reference Metadata – This is similar to the description of the contents of a book. It includes information about the contents, type of information, and quality of statistical data.
- Statistical Metadata – This is also called process data. Process data is information stored in a business process that can accumulate over time (think Excel spreadsheets or databases). It may also describe methods that collect, process, or produce statistical data.
In other words, for every file, data packet, or image created, the user views one version while data stored in the background of the file remains accessible and can reveal a great deal of personal information.
What is SIGINT?
SIGINT stands for signals intelligence, a form of intelligence gathering that works by intercepting electronic signals. It can be further divided into categories, including communications intelligence (COMINT), which covers communications between people, and electronic intelligence (ELINT), which covers electronic signals that are not used directly for communication. SIGINT generally falls under intelligence collection management.
As a rule, most sensitive information is encrypted, so SIGINT relies on cryptanalysis to decipher and recover details or messages. Traffic analysis is also helpful: it studies who is signaling whom and tracks the quantity of data, and the results are often used to determine which signals should be the focus of further cryptanalysis.
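Traffic analysis as described here needs no message content, only a log of who signaled whom and how much. A minimal sketch follows, assuming a made-up log of (sender, receiver, bytes) tuples; the station names and the `summarize_traffic` helper are hypothetical.

```python
from collections import Counter

# Hypothetical intercept log: (sender, receiver, bytes). No message content is needed.
observations = [
    ("station-a", "station-b", 1200),
    ("station-a", "station-b", 900),
    ("station-c", "station-a", 300),
    ("station-b", "station-d", 4500),
    ("station-a", "station-b", 1100),
]


def summarize_traffic(records):
    """Count messages and total bytes for each (sender, receiver) link."""
    message_counts = Counter()
    byte_totals = Counter()
    for sender, receiver, size in records:
        link = (sender, receiver)
        message_counts[link] += 1
        byte_totals[link] += size
    return message_counts, byte_totals


message_counts, byte_totals = summarize_traffic(observations)
# The busiest link is a candidate for closer attention.
busiest_link, busiest_count = message_counts.most_common(1)[0]
print(busiest_link, busiest_count, byte_totals[busiest_link])
```

Ranking links by message count or volume is one simple way to decide where limited decryption effort might go first. Real prioritization would weigh many more factors.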
Historically speaking, the concept of intercepting signal information has been with us since its first use in 1900 during the Boer War (1899 – 1902). During this war, the British had invested in wireless receivers or sets to have onboard their vessels. The British Navy and Army used limited electronic signaling. Since the British were the only ones who had this capability, no unique interpretations or codes were necessary. However, during the course of the war, the Boers captured some of these wireless sets. They used them to intercept signals from the British and send out messages to trap the British ships.
The United States Department of Defense has defined the term “signals intelligence” as:
- A category of intelligence comprising either individually or in combination all communications intelligence (COMINT), electronic intelligence (ELINT), and foreign instrumentation signals intelligence (FISINT), however transmitted.
- Intelligence derived from communications, electronic, and foreign instrumentation signals.
Being a broad field, SIGINT has many sub-disciplines. The two main ones are communications intelligence (COMINT) and electronic intelligence (ELINT).
Any data collection system must have a ‘target’ at the other end. Otherwise, it could not determine which signals to process. According to the DoD, targeting is the “output of the process of developing collection requirements:
- An intelligence need considered in the allocation of intelligence resources. Within the Department of Defense, these collection requirements fulfill the essential elements of information and other intelligence needs of a commander or an agency.
- An established intelligence need, validated against the appropriate allocation of intelligence resources (as a requirement) to fulfill the essential elements of information and other intelligence needs of an intelligence consumer.”
Trading Details, ICREACH
With the broad mass collection of data, including its attached metadata, the ability to define the search is a significant part of getting correct and complete details. In 2007, the DoD Target Analysis Center released some information on the future of improving operations and the capacity to share data, including metadata. The announcement was made to step beyond the previous boundaries and obstacles to data collection and targeting. The idea is that through sharing data among agencies, more data will also be accumulated and entered into the database. To accomplish this goal, the agency announced the initiation of a project called ICREACH.
ICREACH is a surveillance-related search engine used by the NSA, FBI, DEA, and other agencies. It collects deep metadata on both foreigners and United States residents, and its public discovery was controversial. It was developed after the 9/11 terrorist attacks as a means to prevent terrorism. However, because the Constitution protects Americans from mass surveillance, it came as a shock when information about the search engine was leaked to the public.
The search engine draws on an overwhelmingly large database of records, hundreds of millions of files that can include emails, phone calls, instant messages, and geolocations. It contains data on most residents of the United States, both foreign and natural-born citizens. Since the program was approved under Executive Order 12333, ICREACH gathers data stored in several different databases created under that Reagan-era order. The criteria that organizations must meet to access ICREACH have not been fully disclosed.
The NSA describes the search engine as a ‘one-stop shopping’ tool. The results may not include an individual’s specific conversations, but they do include detailed metadata connected to them. A search may be based on a single piece of data, such as a phone number or email address. What is returned are the details and data points associated with that number or address, including all of the attached metadata. From this, analysts can learn who a person talked to, for how long, when, and where they were located when they made the call. With enough details, a pattern can be developed that shows an individual’s daily habits. It can show times and locations, giving analysts the ability to determine the approximate time a person wakes, which coffee shop they are likely to visit, where they buy groceries, and what hobbies or additional habits the person may have.
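The selector-based search and the pattern it can reveal are easier to see with a toy example. The sketch below is not ICREACH's actual interface or schema, which the article does not describe. The record fields, the `search` and `first_activity_by_day` helpers, and the sample phone numbers are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime

# Hypothetical call records: metadata only, no call content.
records = [
    {"caller": "555-0101", "callee": "555-0199", "start": "2024-03-04T07:05", "minutes": 3, "cell": "tower-12"},
    {"caller": "555-0101", "callee": "555-0142", "start": "2024-03-04T12:30", "minutes": 8, "cell": "tower-40"},
    {"caller": "555-0177", "callee": "555-0101", "start": "2024-03-05T07:10", "minutes": 2, "cell": "tower-12"},
    {"caller": "555-0101", "callee": "555-0199", "start": "2024-03-05T18:45", "minutes": 5, "cell": "tower-40"},
    {"caller": "555-0150", "callee": "555-0160", "start": "2024-03-05T09:00", "minutes": 1, "cell": "tower-07"},
]


def search(selector, rows):
    """Return every record in which the selector appears as caller or callee."""
    return [r for r in rows if selector in (r["caller"], r["callee"])]


def first_activity_by_day(rows):
    """Map each date to the earliest call time seen that day."""
    earliest = {}
    for r in rows:
        start = datetime.fromisoformat(r["start"])
        day = start.date().isoformat()
        if day not in earliest or start.time() < earliest[day]:
            earliest[day] = start.time()
    return {day: t.strftime("%H:%M") for day, t in earliest.items()}


hits = search("555-0101", records)
# Tally the other party on each matching call.
contacts = Counter(
    r["callee"] if r["caller"] == "555-0101" else r["caller"] for r in hits
)
print(len(hits), contacts.most_common(1), first_activity_by_day(hits))
```

Even with five invented rows, one phone number yields a most frequent contact and a rough morning start time. That is the kind of inference the paragraph above describes at much larger scale.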
Metadata and SIGINT
There are other applications used to collect and disseminate metadata through SIGINT. For example, the PROTON (previously called CRISSCROSS) application is a CIA-managed program that provides data extracts or reports from various intelligence agencies (NSA, CIA, DIA, FBI, and DEA). The information is drawn from databases of phone call records and other detailed SIGINT data. Sources can also include HUMINT, open sources, or law enforcement agencies.
For more than 15 years, PROTON has provided intel-derived data about US residents, along with second- and third-party data, to other agencies. The NSA has been providing SIGINT-derived signaling data to both the CIA and DEA. This was done to support multi-agency counter-narcotics analysis and investigation. These transactions and operations were approved by overseeing departments. For security purposes, data from SIGINT and HUMINT is reportedly displayed in such a way that the method of collection cannot be discerned. This reduces the risk to federal sources who might be targeted should the metadata lead back to their identity.
The massive scale of this worldwide data collection and processing has put a tremendous amount of pressure on the NSA’s IT infrastructure. There was concern that this amount of data processing would require large amounts of resources. However, electronic intelligence, human intelligence, and other communication sources each transfer only moderate amounts of data to the CIA PROTON Office. Because the data flows in careful, easily handled portions, no additional personnel have been needed. Under the terms of the agreement, the CIA provides a list of US overseas commercial phone numbers to be minimized when processing the signal data. Phone numbers associated with the Department of State or Department of Defense are also minimized. This demonstrates that there are control measures involved in the collection, accumulation, and dissemination of data.
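The minimization step can be pictured as a filter applied before records are shared. The article does not describe the actual procedure, so the following is only a sketch under stated assumptions: the list contents, the mask text, and the `minimize` helper are hypothetical, and real minimization rules are certainly more involved than masking a field.

```python
# Hypothetical minimization list; the real list and procedure are not described in this article.
MINIMIZE = {"555-0110", "555-0123"}
MASK = "[MINIMIZED]"


def minimize(record, protected=MINIMIZE):
    """Return a copy of the record with any protected number replaced by a mask."""
    cleaned = dict(record)
    for field in ("caller", "callee"):
        if cleaned[field] in protected:
            cleaned[field] = MASK
    return cleaned


batch = [
    {"caller": "555-0110", "callee": "555-0188", "minutes": 4},
    {"caller": "555-0171", "callee": "555-0172", "minutes": 9},
]
shared = [minimize(r) for r in batch]
print(shared)
```

Working on a copy leaves the original record untouched, so the unmasked data stays with the collecting party while only the minimized version is passed on.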
Handing Out the Data
Data on this scale requires careful planning. Working with large data sets, computing storage costs, and protecting the raw data can be challenging enough, but because the NSA runs ICREACH, it is also subject to FOIA requests from the public and news media. The agencies that use the data, such as the DEA, FBI, and CIA, are subject to FOIA requests as well. The Freedom of Information Act (FOIA) is a federal law that requires full or partial disclosure of previously unreleased data under government control when it is requested. The act was created to provide more transparency in government. It defines which agency records are subject to disclosure and outlines mandatory disclosure procedures. The legislation also defines nine exemptions to the statute. The United States Postal Service, although a government agency, may withhold certain commercial information from FOIA requests under its own statute.
Top government departments are turning to intelligent algorithms, smart features, machine learning, and artificial intelligence to help automate their searches and redactions for release. Agencies that handle intelligence, secure information, and local and national security turn to CaseGuard. When a law enforcement or federal agency responds to a FOIA request, it must be able to release information while redacting any details that contain personally identifiable information or classified information. The amount of data to redact is significant for these organizations. When accuracy and speed matter, trust CaseGuard.
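For readers curious what automated redaction looks like at its simplest, here is a pattern-based sketch. It is not how CaseGuard or any particular product works. The two regular expressions, the `redact` helper, and the sample sentence are assumptions for illustration, and real PII detection needs far broader coverage than this.

```python
import re

# Simplified patterns for illustration; real PII detection needs far broader coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def redact(text):
    """Replace each match with a bracketed label naming what was removed."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label} REDACTED]", text)
    return text


sample = "Contact Jane at jane.doe@example.com or 202-555-0143 about the request."
print(redact(sample))
```

Note that the name "Jane" passes through untouched. Fixed patterns catch only well-formatted identifiers, which is one reason agencies look to machine learning for this work.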