SecureState Blog

Identify and Protect Through Discovery

Data Discovery consists of inspecting and auditing systems, devices, and applications for indications that sensitive or regulated data is being stored, transmitted, or accessed. When we discuss Data Discovery, we are generally concerned with the location, types, and controls surrounding important data such as proprietary information, personally identifiable information, health and patient information, privacy information, intellectual property, and financial or credit card data. This data can exist in file systems, shared drives, databases, removable media, container files such as ZIP/RAR archives, emails, encrypted volumes, and similar storage types and locations. The goal of Data Discovery is to provide a readiness plan that helps the organization rapidly learn when sensitive data sets have surfaced, and to ensure that proper, compliant data controls and segmentation are implemented.
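To make the idea concrete, here is a minimal sketch of what "inspecting for indications of sensitive data" can look like for one data type (credit card numbers) inside one container type (a ZIP archive). The pattern, the function names, and the choice of a Luhn checksum to filter false positives are illustrative assumptions, not a description of any particular discovery product.

```python
import io
import re
import zipfile

# Assumed pattern: 13 to 16 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){12,15}\d\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(digits)):
        value = int(char)
        if index % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list[str]:
    """Return Luhn-valid card number candidates found in the text."""
    hits = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            hits.append(digits)
    return hits


def scan_zip(archive_bytes: bytes) -> dict[str, list[str]]:
    """Scan every member of a ZIP archive; map member name to its hits."""
    findings = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            text = archive.read(name).decode("utf-8", errors="ignore")
            hits = find_card_numbers(text)
            if hits:
                findings[name] = hits
    return findings
```

The checksum step matters in practice: a bare digit pattern flags order numbers and phone numbers, while the Luhn filter discards most of them. A real assessment would also need handlers for the other formats mentioned above, such as RAR archives, email stores, and encrypted volumes.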

Identifying, auditing, and protecting specific types of confidential information are key components of any readiness or preparation assessment, and are also key data discovery requirements and best practices within industry-related standards and regulations, such as PCI and HIPAA. Most companies understand how and where sensitive data should reside within the environment, but because of the inter-connectivity between supporting groups and the interdependence of business processes, data tends to end up in systems, applications, databases, and file shares that do not provide the necessary protection, authentication, and confidentiality for that information.



The organization needs to ensure that its data discovery methodologies provide a full audit perspective of all sensitive data, and that they baseline and document the data types, locations, and devices storing or transmitting sensitive data. The methodologies should aim to limit the damage of a potential data incident, reduce recovery time, costs, and liability, and help ensure best practices and compliance through regular identification and audits. SecureState organizes Data Discovery into three primary phases: the Identification Phase, the Implementation Phase, and the Integration Phase.


Identification Phase

Any type of readiness or preparation assessment should begin with an identification phase to determine the threat, the affected resources, and the impact of potential data leakage, interception, or theft. A proper readiness assessment depends on the ability to scope the engagement by identifying data and process classifications, escalation procedures, resources, risk, impact, and prioritization. The Data Discovery team should augment or facilitate the management of this entire process, ultimately helping to identify, create, and mature the business’s planning, preparation, and resiliency. With proper management, a Data Discovery assessment will improve the vulnerability management service and help prepare the organization to identify, correlate, and respond to any incident that may affect its resources and data. Below are the specific objectives that should be included within this phase:

  • Identify the types of data, systems, access, storage, location, and impact
  • Identify required controls and standards that are applicable
  • Identify and interview data-owners and IT administration for data controls, types and access
  • Improve the ability of network staff to prevent, detect, and respond to future threats
  • Develop procedures to comprehensively identify sensitive and regulated data on file systems, commercial and open source databases, documents, shares, and container files such as email storage and compressed archives
  • Help define the scope and control points needed for future assessments or certifications
  • Help the company control the storage, access, and transmission of sensitive data
  • Help support the company’s ability to fine tune its data retention, data access, and disposal policies
  • Develop a Project and Implementation Plan custom to the environment and data classification

The last objective in this list describes the key deliverables created at the end of this phase: the project plan and the implementation plan.


Project Plan

Data Discovery is a long process and requires assistance from, and cooperation among, many separate and distinct areas, groups, technologies, support teams, and resources. It is essential to have a detailed interview schedule for discussing data-owner and custodian impacts and requirements, agreeing on the methods of data discovery for each group, and determining the challenges of reaching each data type and the access required.


Implementation Plan

The organization should recognize that there are multiple regulations, policies, owners, and procedures that dictate how systems can be investigated.  Different architectures, file systems, data owner obligations, credentials, impacts and policies or regulations may require separate and unique methodologies and technology in order to successfully identify data-types.  It is therefore essential to have a detailed implementation plan for how to perform discovery on specific data sets.  The implementation plan should include the scope for what data sets are being discovered (types, amount, general locations, encoding, structure, etc.), specific system and application types, contractual or other regulation controls, NDAs, and connection and access requirements.
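One way to keep an implementation plan consistent across groups is to record each in-scope data set in a fixed structure, so that gaps are visible before scanning starts. The sketch below is only an illustration: the class name, the field names, and the `unanswered` helper are hypothetical, chosen to mirror the items listed in the paragraph above.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class DiscoveryPlanEntry:
    """One data set in scope. Field names are illustrative, not a standard."""
    data_types: list[str] = field(default_factory=list)
    estimated_amount: str = ""
    general_locations: list[str] = field(default_factory=list)
    encoding: str = ""
    structure: str = ""
    system_and_application_types: list[str] = field(default_factory=list)
    contractual_or_regulatory_controls: list[str] = field(default_factory=list)
    nda_required: Optional[bool] = None  # None means "not yet determined"
    connection_and_access_requirements: str = ""


def unanswered(entry: DiscoveryPlanEntry) -> list[str]:
    """Return the names of fields that are still empty or undetermined."""
    return [
        f.name
        for f in fields(entry)
        if getattr(entry, f.name) in ("", [], None)
    ]
```

For example, an entry created with only `data_types` and `nda_required=False` filled in would report every other field as unanswered, which gives the interview schedule a concrete list of questions for the data owner.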


Implementation Phase

Generally, four levels of Data Discovery should be used during the assessment: authenticated audits, host-based agents, centralized network identification, and application-based searches. These methodologies, when used in conjunction, allow the organization to better understand the amount and type of data residing on systems within the environment. The description of each level also shows why it is so important to identify the relevant policies, procedures, owners, and technologies before discovery begins.


Authenticated Audits

The organization should implement an authenticated approach to the Data Discovery assessment.  Authenticated approaches will require credentials or trusts to access different data types across the environment.  This methodology will provide a full audit perspective of all sensitive data and will baseline and document the data types, locations, and devices storing or transmitting sensitive data.  Therefore, the discovery team will need credentials and trusts to access data locations and data contained within controlled applications, container files, databases, and file systems within the domain.


Host-Based Agents

The discovery team will distribute search processing agents, managed centrally, to applicable systems within the environment. Because each system searches its own data in parallel, distributed searching can substantially reduce overall search time and network traffic. Distributed search processing can be handled by the client computers (without any prior installation required), thereby spreading the CPU workload across the network. Host-based agents also keep network activity low by sending only the compressed and encrypted search results, not the complete searchable data.
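The traffic saving is easy to see in a small sketch. Below, a hypothetical agent scans content locally and ships back only a compressed summary of what it found. The function names, the result fields, and the Social Security number pattern are assumptions for illustration. Encryption is deliberately left out of the sketch; in practice the results would also be encrypted, for example by sending them over a TLS channel.

```python
import json
import re
import zlib

# Assumed pattern for illustration: US Social Security number, NNN-NN-NNNN.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def agent_scan(host: str, files: dict[str, str]) -> bytes:
    """Scan file contents on the host and return a compressed result summary.

    Only the path, data type, and match count leave the host; the file
    contents themselves are never transmitted.
    """
    findings = []
    for path, content in files.items():
        count = len(SSN_PATTERN.findall(content))
        if count:
            findings.append(
                {"host": host, "path": path, "type": "ssn", "count": count}
            )
    payload = json.dumps(findings).encode("utf-8")
    return zlib.compress(payload)


def central_receive(blob: bytes) -> list[dict]:
    """Decompress and decode a result summary on the management system."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

Note that the summary carries match counts rather than the matched values. That is a design choice worth considering for any agent: the central console then never becomes a new store of sensitive data.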


Centralized Network Identification

The discovery team may also implement a centralized network identification methodology which will allow systems to be scanned across the network.  This methodology is used to search across multiple file systems, including Macs and Unix-based operating systems.  From one centralized management system, the team will be able to collect and correlate sensitive data types and locations.  By using a centralized network scanner, the team will be able to locate sensitive data residing on systems that cannot be scanned with host-based agents.  While not as quick or as efficient as a host-based solution, this method allows the team to provide the company with an increased view of what sensitive information resides and is accessible within the environment.
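The "collect and correlate" step on the central management system can be sketched as a simple grouping of findings by data type and host. The finding format used here (dictionaries with `type`, `host`, and `path` keys) is a hypothetical one chosen for the example.

```python
from collections import defaultdict


def correlate(findings: list[dict]) -> dict[str, dict[str, list[str]]]:
    """Group findings as data type -> host -> sorted list of paths."""
    by_type = defaultdict(lambda: defaultdict(set))
    for item in findings:
        by_type[item["type"]][item["host"]].add(item["path"])
    return {
        data_type: {host: sorted(paths) for host, paths in hosts.items()}
        for data_type, hosts in by_type.items()
    }
```

Grouping by data type first answers the question an auditor usually asks first: "where does this kind of data live?"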



Application-Based Searches

The final methodology that will be used is application-based scanning. Application-based scanning uses a client-side interface that can properly parse data residing within a database or a specific application. There are many types of databases and applications that typical data discovery scanners cannot connect to, or whose data structures they cannot properly interpret. The discovery team should develop and test an application-based scanning method to ensure that the company receives proper data coverage across multiple databases and applications.
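As a rough illustration of why a client-side interface helps, the sketch below connects to a database through its own driver, enumerates the tables and columns, and counts pattern matches per column. SQLite is used only because it ships with Python; the function name and the email pattern are assumptions, and a real engagement would use the native client for each database or application in scope.

```python
import re
import sqlite3

# Assumed pattern for illustration: a simple email address matcher.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def scan_database(conn: sqlite3.Connection) -> dict[tuple[str, str], int]:
    """Count rows matching the pattern for each (table, column) pair."""
    findings = {}
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master "
            "WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
        )
    ]
    for table in tables:
        # PRAGMA table_info returns one row per column; index 1 is the name.
        columns = [
            row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')
        ]
        for column in columns:
            rows = conn.execute(f'SELECT "{column}" FROM "{table}"')
            hits = sum(
                1
                for (value,) in rows
                if isinstance(value, str) and EMAIL_PATTERN.search(value)
            )
            if hits:
                findings[(table, column)] = hits
    return findings
```

A file-level scanner pointed at the same database file would see an opaque binary blob. Going through the driver is what lets the results name the exact table and column where the data sits.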


Integration Phase

Once data is identified and verified, it is plotted, analyzed, and correlated to assess the effectiveness of the controls that protect the confidentiality, integrity, and availability of that data. The discovery assessment should help identify potential security flaws within the organization, its architecture, or its general design as they relate to sensitive data in transit, in storage, or in processing. As applicable, the discovery team should be able to perform a high-level review of security gaps and misconfigurations, ingress and egress points, device configurations, and whether separate and distinct segments properly protect the organization’s network, sensitive systems, and data. Additionally, this phase should lead to suggestions on monitoring critical files and directories, user and system events and accountability, account management, and policy changes, to help develop methods to prevent and alert on modifications of, and access to, sensitive data. The result of the Integration Phase should give the organization confidence in, and recommendations for, the following:

  • Data locations and types
  • Data segmentation requirements
  • Data storage, transmission and process hardening
  • Access and accountability controls based on roles and need-to-know
  • Data impact and classification integration within IR and BC plans
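As one small example of correlating findings against controls, the sketch below checks whether each host holding sensitive data sits inside a network segment approved for that data. The finding format, the function name, and the CIDR ranges are hypothetical.

```python
import ipaddress


def outside_approved_segments(
    findings: list[dict], approved_cidrs: list[str]
) -> list[dict]:
    """Return findings whose host IP is not inside any approved segment."""
    networks = [ipaddress.ip_network(cidr) for cidr in approved_cidrs]
    flagged = []
    for item in findings:
        address = ipaddress.ip_address(item["ip"])
        if not any(address in network for network in networks):
            flagged.append(item)
    return flagged
```

Each flagged finding is a candidate for either moving the data into a protected segment or extending the segment's controls to cover the host.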


Did You Know?

While building your Data Discovery program and methodologies, you are also working toward compliance: you ensure that all systems that should be within scope are identified, and you narrow the footprint of risk through segmentation and data controls.

Many regulations and frameworks recommend Data Discovery within the environment to ensure the organization has identified all systems and data sets within scope, and to help reduce financial, operational, reputational, and legal impacts through proactive identification and data controls:

  • PCI DSS v2
  • Sarbanes-Oxley
  • GLBA
  • EU Safe Harbor


End Result

Data Discovery assessments provide an excellent way to identify data subject to compliance frameworks and standards, and to ensure that security controls and need-to-know access surround critical business operations and processes. Data Discovery should build a solid foundation for properly developing Data Classification, Data Security Controls, Storage and Destruction Controls, and Incident Impact Plans. The next blog in this series will discuss precisely how Data Discovery is the foundation for larger data security controls and standards.
