Chapter 9

Security Operations

IN THIS CHAPTER

· Understanding and complying with investigations

· Conducting logging and monitoring activities

· Performing configuration management and applying foundational security operations concepts

· Applying resource protection and conducting incident management

· Operating and maintaining detective and preventive measures

· Implementing and supporting patch and vulnerability management

· Understanding and participating in change management processes

· Implementing and testing recovery strategies and disaster recovery processes

· Participating in business continuity planning and exercises

· Implementing and managing physical security

· Addressing personnel safety and security concerns

The Security Operations domain covers lots of essential security concepts and builds on many of the other security domains, including Security and Risk Management (Chapter 3), Asset Security (Chapter 4), Security Architecture and Engineering (Chapter 5), and Communication and Network Security (Chapter 6). Security operations represents routine operations that occur across many of the CISSP domains. This domain represents 13 percent of the CISSP certification exam.

Understand and Comply with Investigations

Conducting investigations for various purposes is an important function for security professionals. You must understand evidence collection and handling procedures, reporting and documentation requirements, various investigative processes, and digital forensics tools and techniques. Successful conclusions in investigations depend heavily on proficiency in these skills.

Crossreference This section covers Objective 7.1 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Evidence collection and handling

Evidence is information presented in a court of law to confirm or dispel a fact that’s under contention, such as the commission of a crime, the violation of policy, or an ethics matter. A case can’t be brought to trial or other legal proceeding without sufficient evidence to support the case. Thus, gathering and protecting evidence properly is one of the most important and most difficult tasks that an investigator must master.

Important evidence collection and handling topics covered on the CISSP exam include the types of evidence, rules of evidence, admissibility of evidence, chain of custody, and the evidence life cycle.

Types of evidence

Sources of legal evidence that you can present in a court of law generally fall into one of four major categories:

· Direct evidence: Oral testimony or a written statement based on information gathered through a witness’s five senses (in other words, an eyewitness account) that proves or disproves a specific fact or issue.

· Real (or physical) evidence: Tangible objects from the actual crime, such as the tools or weapons used and any stolen or damaged property; may also include visual or audio surveillance tapes generated during or after the event. Physical evidence from a computer crime is not always available.

· Documentary evidence: Includes originals and copies of business records, computer-generated and computer-stored records, manuals, policies, standards, procedures, and log files. Most evidence presented in a computer crime case is documentary evidence. The hearsay rule (which we discuss in “Hearsay rule” later in this chapter) is an extremely important test of documentary evidence that must be understood and applied to this type of evidence.

· Demonstrative evidence: Used to aid the court’s understanding of a case. Opinions are considered to be demonstrative evidence and may be expert (based on personal expertise and facts) or nonexpert (based on facts only). Other examples of demonstrative evidence include models, simulations, charts, and illustrations.

Other types of evidence that may fall into one or more of the major categories include:

· Best evidence: Original, unaltered evidence, which courts prefer over secondary evidence. Read more about this evidence in “Best evidence rule” later in this chapter.

· Secondary evidence: A duplicate or copy of evidence, such as a tape backup, screen capture, or photograph.

· Corroborative evidence: Evidence that supports or substantiates other evidence presented in a case.

· Conclusive evidence: Incontrovertible and irrefutable evidence — you know, the smoking gun.

· Circumstantial evidence: Relevant facts that you can’t directly or conclusively connect to other events, but about which a reasonable person can make a reasonable inference.

Rules of evidence

Important rules of evidence for computer crime cases include the best evidence rule and the hearsay evidence rule. The CISSP candidate must understand both of these rules and their applicability to evidence in computer crime cases.

BEST EVIDENCE RULE

The best evidence rule, defined in the U.S. Federal Rules of Evidence, states that “to prove the content of a writing, recording, or photograph, the original writing, recording, or photograph is [ordinarily] required.”

The Federal Rules of Evidence, however, define an exception to this rule as “[i]f data are stored in a computer or similar device, any printout or other output readable by sight, shown to reflect the data accurately, is an ‘original.’”

Thus, data extracted from a computer — if that data is a fair and accurate representation of the original data — satisfies the best evidence rule and may be introduced into court proceedings as such.

HEARSAY RULE

Hearsay evidence is evidence that’s not based on personal, firsthand knowledge of a witness but comes from other sources. Under the Federal Rules of Evidence, hearsay evidence normally is not admissible in court. This rule exists to prevent unreliable testimony from improperly influencing the outcome of a trial.

Business records, including computer records, have traditionally, and perhaps mistakenly, been considered hearsay evidence by most courts because these records cannot be proved to be accurate and reliable. One of the most significant obstacles for a prosecutor to overcome in a computer crime case is getting computer records admitted as evidence.

Tip A prosecutor may be able to introduce computer records as best evidence, rather than hearsay evidence.

Several courts have acknowledged that the hearsay rules are applicable to computer-stored records containing human statements but are not applicable to computer-generated records untouched by human hands.

Perhaps the most successful and commonly applied test of admissibility for computer records, in general, has been the business records exception, established in the Federal Rules of Evidence for records of regularly conducted activity that meet the following criteria:

· Made at (contemporaneously) or near the time when the act occurred

· Made by a person who has knowledge of the business process or from information transmitted by a person who has knowledge of the business process

· Made and relied on during the regular conduct of business or in the furtherance of the business, as verified by the custodian or other witness who is familiar with the records’ use

· Kept for motives that tend to ensure their accuracy

· In the custody of the witness on a regular basis (as required by the chain of evidence)

Tip The chain of evidence establishes accountability for the handling of evidence throughout the evidence life cycle. See “Chain of custody and the evidence life cycle” later in this chapter.

Admissibility of evidence

Because computer-generated evidence can be easily manipulated, altered, or tampered with, and because it’s not easily and commonly understood, this type of evidence is usually considered to be suspect in a court of law. To be admissible, evidence must be

· Relevant: It must tend to prove or disprove facts that are relevant and material to the case.

· Reliable: It must be reasonably proved that what is presented as evidence is what was originally collected and that the evidence itself is reliable. This proof is established in part through proper evidence handling and the chain of custody. (We discuss this topic in the upcoming section “Chain of custody and the evidence life cycle.”)

· Legally permissible: It must be obtained through legal means. Evidence that’s not legally permissible may include evidence obtained through the following means:

· Illegal search and seizure: Law enforcement personnel must obtain a court order. But non–law enforcement personnel, such as a supervisor or system administrator, may be able to conduct an authorized search under some circumstances.

· Illegal wiretaps or phone taps: Anyone conducting wiretaps or phone taps must obtain a court order.

· Entrapment or enticement: Entrapment encourages a person to commit a crime that they may have had no intention of committing. Conversely, enticement lures a person toward certain evidence (a honeypot, if you will) after they have already committed a crime. Enticement isn’t necessarily illegal, but it does raise certain ethical arguments and may not be admissible in court.

· Coercion: Coerced testimony or confessions are not legally permissible. Coercion involves compelling a person to provide evidence involuntarily through the use of threats, violence (torture), bribery, trickery, or intimidation.

· Unauthorized or improper monitoring: Active monitoring must be properly authorized and conducted in a standard manner; users must be notified that they may be subject to monitoring.

Chain of custody and the evidence life cycle

The chain of custody (or chain of evidence) provides accountability and protection for evidence throughout its entire life cycle and includes the following information, which is normally kept in an evidence log:

· People involved (who): Identify any and all people who discovered, collected, seized, analyzed, stored, preserved, transported, or otherwise controlled the evidence; also identify any witnesses or other people who were present during any of these activities

· Description of evidence (what): Ensure that all evidence is described completely and uniquely

· Location of evidence (where): Provide specific information about the evidence’s location when it is discovered, analyzed, stored, or transported

· Date/time (when): Record the date and time when evidence is discovered, collected, seized, analyzed, stored, or transported; also record date and time information for any evidence log entries associated with the evidence

· Methods used (how): Provide specific information about how evidence was discovered, collected, stored, preserved, or transported

Any time evidence changes possession or is transferred to a different media type, it must be recorded properly in the evidence log to maintain the chain of custody.

Law enforcement officials must strictly adhere to chain-of-custody requirements, and this adherence is highly recommended for anyone else who is involved in collecting or seizing evidence. Security professionals and incident response teams must fully understand and follow chain-of-custody principles and procedures, no matter how minor or insignificant a security incident may initially appear to be. In both cases, chain of custody serves to prove that digital evidence has not been modified at any point in the forensic examination and analysis.
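To make the evidence-log fields concrete, here is a minimal sketch of a digital custody record, assuming a simple in-memory design. The class names, the case number, and the people and places in the usage lines are hypothetical. Each entry captures who, what, where, how, and when, and a SHA-256 hash taken at collection time lets anyone later show that the evidence has not been modified. A real evidence-management system would also need tamper-resistant storage and signed entries, which this sketch omits.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustodyEntry:
    """One evidence-log entry answering who, what, where, how, and when."""
    who: str    # person who collected, transferred, analyzed, or stored the item
    what: str   # complete, unique description of the evidence
    where: str  # location at the time of the action
    how: str    # method used (collected, imaged, transported, stored, ...)
    when: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())


class EvidenceLog:
    """Custody record for one evidence item (hypothetical design)."""

    def __init__(self, evidence_id: str, content: bytes):
        self.evidence_id = evidence_id
        # SHA-256 fingerprint taken at collection time.
        self.baseline_sha256 = hashlib.sha256(content).hexdigest()
        self.entries: list[CustodyEntry] = []

    def record(self, entry: CustodyEntry) -> None:
        # Each change of possession or media type is appended; the class offers no edit method.
        self.entries.append(entry)

    def unchanged(self, content: bytes) -> bool:
        # True only if the evidence still hashes to its collection-time value.
        return hashlib.sha256(content).hexdigest() == self.baseline_sha256


log = EvidenceLog("CASE-042-ITEM-1", b"disk image bytes")
log.record(CustodyEntry("J. Smith", "Laptop, serial 12345", "Office 3B", "collected and bagged"))
log.record(CustodyEntry("A. Lee", "Laptop, serial 12345", "Evidence locker 2", "transported and stored"))
print(len(log.entries), log.unchanged(b"disk image bytes"))  # 2 True
```

Frozen entries mean an individual record can't be altered after it is written, which mirrors the rule that the log is added to, never rewritten.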

Even properly trained law enforcement officials sometimes make crucial mistakes in evidence handling and safekeeping. Most attorneys won’t understand the technical aspects of the evidence that you may present in a case, but they will definitely know evidence-handling rules and will most certainly scrutinize your actions in this area. Improperly handled evidence, no matter how conclusive or damaging, will likely be inadmissible in a court of law.

The evidence life cycle describes the various phases of evidence, from its initial discovery to its final disposition. The evidence life cycle has the following five stages:

· Collection and identification

· Analysis

· Storage, preservation, and transportation

· Presentation in court

· Final disposition, such as return to the owner or destruction (for copies)

The following sections explain more about these stages.

COLLECTION AND IDENTIFICATION

Collecting evidence involves taking that evidence into custody. Unfortunately, evidence can’t always be collected and must instead be seized. Many legal issues are involved in seizing computers and other electronic evidence. The publication Searching and Seizing Computers and Obtaining Evidence in Criminal Investigations, 3rd Edition (2009), published by the U.S. Department of Justice Computer Crime and Intellectual Property Section, provides comprehensive guidance on this subject. This publication is available for download at https://www.justice.gov/sites/default/files/criminal-ccips/legacy/2015/01/14/ssmanual2009.pdf.

In general, law enforcement officials can search and/or seize computers and other electronic evidence under any of four circumstances:

· Voluntary or consensual: The owner of the computer or electronic evidence can freely surrender the evidence.

· Subpoena: A court issues a subpoena to a person, ordering that person to deliver the evidence to the court.

· Search warrant or Anton Piller order: A search warrant is issued to a law enforcement official by the court, allowing that official to search and seize specific evidence. An Anton Piller order allows the premises to be searched and evidence seized without warning, usually to prevent possible destruction of evidence.

· Exigent circumstances: If probable cause exists and the destruction of evidence is imminent, that evidence may be searched or seized without a warrant.

When evidence is collected, it must be marked and identified properly so that it can be presented in court as genuine evidence gathered from the scene or incident. The collected evidence must be recorded in an evidence log with the following information:

· A description of the piece of evidence, including specific information such as make, model, serial number, physical appearance, material condition, and preexisting damage

· The name(s) of the person or people who discovered and collected the evidence

· The exact date and time, specific location, and circumstances of the discovery/collection

Additionally, the evidence must be marked according to the following guidelines:

· Mark the evidence. If possible, without damaging the evidence, mark the piece of evidence with the collecting person’s initials, the date, and the case number (if known). Seal the evidence in an appropriate container, and again mark the container with the same information.

· Use an evidence tag. If the actual evidence cannot be marked, attach an evidence tag with the information in the preceding item, seal the evidence and tag it in an appropriate container, and mark the container with the same information.

· Seal the evidence. Seal the container with evidence tape, and mark the tape in a manner that will clearly indicate any tampering or altering of the evidence.

· Protect the evidence. Use extreme caution when collecting and marking evidence to ensure that it’s not damaged. If you’re using plastic bags for evidence containers, make sure that they’re static-free to protect magnetic media.

Always collect and mark evidence in a consistent manner so that you can easily identify evidence and describe your collection and identification techniques to an opposing attorney in court, if necessary.

ANALYSIS

Analysis involves examining the evidence for information pertinent to the case. Analysis should be conducted with extreme caution — and only by experienced, properly trained personnel — to ensure the evidence is not altered, damaged, or destroyed.

STORAGE, PRESERVATION, AND TRANSPORTATION

All evidence must be stored properly in a secure facility and preserved to prevent damage or contamination from various hazards, including intense heat or cold, extreme humidity, water, magnetic fields, and vibration. Evidence that’s not properly protected may be inadmissible in court, and the party responsible for collection and storage may be liable. Care must also be exercised during transportation to ensure that evidence is not lost, temporarily misplaced, damaged, or destroyed.

PRESENTATION IN COURT

Evidence to be presented in court must continue to follow the chain of custody and be handled with the same care as at all other times in the evidence life cycle. This process continues throughout the trial until all testimony related to the evidence is completed and the trial has concluded, or the case is settled or dismissed.

FINAL DISPOSITION

After the conclusion of the trial or other disposition, evidence is normally returned to its proper owner. Under some circumstances, however, certain evidence may be ordered destroyed, such as contraband, drugs, or drug paraphernalia. Any evidence obtained through a search warrant is legally under the control of the court, possibly requiring the original owner to petition the court for its return.

Reporting and documentation

As described in the preceding section, complete and accurate recordkeeping is critical to each investigation. An investigation’s report is intended to be a complete record of an investigation and usually includes the following:

· Incident investigators, including their qualifications and contact information

· Names of the parties interviewed, including their roles, involvement, and contact information

· List of all evidence collected, including chain(s) of custody

· Tools used to examine or process evidence, including versions

· Samples and sampling methodologies used, if applicable

· Computers used to examine, process, or store evidence, including a description of configuration

· Root-cause analysis of incident, if applicable

· Conclusions and opinions of investigators

· Hearings or proceedings

· Parties to whom the report is delivered

Investigative techniques

An investigation should begin immediately upon report of an alleged computer crime, policy violation, or incident. Any incident should be handled, at least initially, as a computer crime investigation or policy violation until a preliminary investigation determines otherwise. Different investigative techniques may be required, depending on the goal of the investigation or applicable laws and regulations. Incident handling, for example, requires expediency to contain any potential damage as quickly as possible. A root-cause analysis requires in-depth examination to determine what happened, how it happened, and how to prevent the same thing from happening again. In all cases, proper evidence collection and handling are essential. Even if a preliminary investigation determines that a security incident was not the result of criminal activity, you should always handle any potential evidence properly, in case further legal proceedings are anticipated or a crime is uncovered during the course of a full investigation. The CISSP candidate should be familiar with the general steps of the investigative process:

1. Detect and contain an incident.

Early detection is critical to a successful investigation. Unfortunately, computer-related incidents usually involve passive or reactive detection techniques (such as the review of audit trails and accidental discovery), which often leave a cold evidence trail. Containment minimizes further loss or damage. The computer incident response team (CIRT), which we discuss later in this chapter, normally is responsible for conducting an investigation. The team should be notified or activated as quickly as possible after a computer crime is detected or suspected.

2. Notify management.

Management must be notified of any investigations as soon as possible. Knowledge of the investigations should be limited to as few people as possible and on a need-to-know basis. Out-of-band communication methods (such as reporting in person) should be used to ensure that an intruder does not intercept sensitive communications about the investigation.

3. Conduct a preliminary investigation.

This preliminary investigation determines whether an incident or crime actually occurred. Most incidents turn out to be honest mistakes rather than malicious conduct. This step includes reviewing the complaint or report, inspecting damage, interviewing witnesses, examining logs, and identifying further investigation requirements.

4. Determine whether the organization should disclose that the crime occurred.

First, and most important, determine whether laws or regulations require the organization to disclose a crime or incident. Next, by coordinating with a public relations or public affairs official of the organization, determine whether the organization wants to disclose this information.

5. Conduct the investigation.

Conducting the investigation involves three activities:

1. Identify potential suspects.

Potential suspects include organization insiders and outsiders. One standard discriminator that helps identify and eliminate potential suspects is the MOM test: Did the suspect have the motive, opportunity, and means? The motive might relate to financial gain, revenge, or notoriety. A suspect had opportunity if they had access, whether as an authorized user for an unauthorized purpose or as an unauthorized user (due to a security weakness or vulnerability) for an unauthorized purpose. Means relates to whether the suspect had the necessary tools and skills to commit the crime.

2. Identify potential witnesses.

Determine whom you want to interview and who should conduct the interviews. Be careful not to alert any potential suspects to the investigation; focus on obtaining facts, not opinions, in witness statements.

3. Prepare for search and seizure.

Identify the types of systems and evidence that you plan to search or seize, designate and train the search and seizure team members (normally, members of the CIRT), obtain and serve proper search warrants (if required), and determine the potential risk to the system during a search-and-seizure effort.

6. Report your findings.

The results of the investigation, including evidence, should be reported to management and turned over to proper law enforcement officials or prosecutors as appropriate.

Remember MOM stands for motive, opportunity, and means. Motive refers to the reason or incentive for a suspect to commit a crime, such as financial gain or revenge. Opportunity refers to the suspect having a chance or opening to commit the crime — for example, if they had access (either authorized or unauthorized) to a system that was breached. Means refers to a suspect having the ability to commit a crime — for example, they had the skills and tools to commit the crime.

Digital forensics tools, tactics, and procedures

Digital forensics is the science of conducting a computer incident investigation to determine what has happened and who is responsible, and to collect legally admissible evidence for use in subsequent legal proceedings, such as a criminal investigation, internal investigation, or lawsuit.

Proper forensic analysis and investigation requires in-depth knowledge of hardware (such as endpoint devices and networking equipment), operating systems (including desktop, server, mobile device, and other device operating systems, such as routers, switches, and load balancers), applications, databases, and software programming languages, as well as knowledge and experience using sophisticated forensics tools and tool kits.

The types of forensic data-gathering techniques include:

· Hard drive forensics: Specialized tools are used to create one or more forensically identical copies of a computer’s hard drive. A device called a write blocker is typically used to prevent any possible alterations to the original drive. Cryptographic checksums can be used to verify that a forensic copy is an exact duplicate of the original. Then tools are used to examine the contents of the hard drive to determine the following:

· Last known state of the computer

· History of files accessed

· History of files created

· History of files deleted

· History of programs executed

· History of websites visited by a browser

· History of attempts by the user to remove evidence

· Live forensics: Specialized tools are used to examine a running system, including the following:

· Running processes

· Currently open files

· Contents of main storage (RAM)

· Keystrokes

· Communications traffic in and out of the computer

Live forensics is difficult to perform because the tools used to collect information can affect the system being examined.
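The checksum step in hard drive forensics can be illustrated with a short sketch using Python's standard `hashlib`. It assumes that the original drive has already been read through a write blocker and that both it and the forensic copy are available as files. The file names in the demonstration are hypothetical, and the copy made with `write_bytes` merely stands in for a real imaging tool.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large disk images need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def image_matches_original(original: Path, image: Path) -> bool:
    """True if the forensic copy is bit-for-bit identical to the original."""
    return sha256_of(original) == sha256_of(image)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "original.dd")
        dst = Path(tmp, "copy.dd")
        src.write_bytes(b"\x00" * 4096 + b"evidence")
        dst.write_bytes(src.read_bytes())           # stands in for the imaging tool
        print(image_matches_original(src, dst))     # True
        dst.write_bytes(b"altered")
        print(image_matches_original(src, dst))     # False
```

Recording both hashes in the evidence log lets an examiner show later that analysis was performed on an exact duplicate.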

Artifacts

Key artifacts that may be collected during an investigation include computers, mobile devices, servers (physical or virtual), network equipment (such as routers and switches), and security equipment (such as a firewall). These artifacts may contain indicators of compromise (IoC) that can be preserved as evidence to support an investigation.

Conduct Logging and Monitoring Activities

Event logging is an essential part of an organization’s IT operations. Increasingly, organizations are implementing centralized log collection systems that often serve as security information and event management (SIEM) platforms.

Crossreference This section covers Objective 7.2 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Intrusion detection and prevention

Intrusion detection is a passive technique used to detect unauthorized activity on a network. An intrusion detection system is frequently called an IDS. Three types of IDSes used today are:

· Network-based: Consists of a separate device attached to a network that listens to all network traffic by using various methods (which we describe later in this section) to detect anomalous activity

· Host-based: Runs on an individual host and monitors only the network traffic destined for that particular host

· Wireless: Another type of network intrusion detection that focuses on wireless intrusion by scanning for rogue access points

Both network- and host-based IDSes use several detection methods:

· Signature-based: A signature-based IDS compares network traffic that is observed with a list of patterns in a signature file. A signature-based IDS detects any of a known set of attacks, but if an intruder is able to change the patterns that they use in the attack, the attack may be able to slip by the IDS without being detected. The other downside of signature-based IDS is that the signature file must be updated frequently.

· Reputation-based: Closely akin to signature-based alerting, reputation-based alerting detects when communications and other activities involve known-malicious domains and IP networks. Some IDSes update themselves several times daily, including adding to a list of known-malicious domains and IP addresses. Then, when any activities are associated with a known-malicious domain or IP address, the IDS can create an alert that lets personnel know about the activity.

· Anomaly-based: An anomaly-based IDS monitors all the traffic over the network and builds traffic profiles. Over time, the IDS will report deviations from the profiles that it has built. The upside of anomaly-based IDSes is that there are no signature files to update periodically. The downside is that you may have a high volume of false positives. Behavior-based and heuristics-based IDSes are similar to anomaly-based IDSes and have many of the same advantages. Rather than detect anomalies in normal traffic patterns, behavior-based and heuristics-based systems attempt to recognize and remember potential attack patterns.
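The three detection methods above can be contrasted in a deliberately simplified sketch. The signature patterns, the reputation list (addresses from documentation-only ranges), and the three-standard-deviation threshold are all hypothetical choices. Real IDS engines inspect reassembled traffic with far richer rule languages. The second call shows the weakness described above: a trivially altered pattern slips past a signature match.

```python
import statistics

# Hypothetical signature file: byte patterns tied to known attacks.
SIGNATURES = {
    "sql-injection": b"' OR 1=1",
    "path-traversal": b"../../",
}

# Hypothetical reputation feed (addresses from documentation-only ranges).
KNOWN_MALICIOUS_IPS = {"203.0.113.66", "198.51.100.7"}


def signature_alerts(payload: bytes) -> list[str]:
    """Signature-based: report every known pattern that appears in the payload."""
    return [name for name, pattern in SIGNATURES.items() if pattern in payload]


def reputation_alert(src_ip: str) -> bool:
    """Reputation-based: flag traffic from addresses on the known-malicious list."""
    return src_ip in KNOWN_MALICIOUS_IPS


def anomaly_alert(profile: list[float], observed: float, threshold: float = 3.0) -> bool:
    """Anomaly-based: flag values more than `threshold` standard deviations from the profile."""
    mean = statistics.mean(profile)
    spread = statistics.stdev(profile)  # needs at least two profile samples
    if spread == 0:
        return observed != mean
    return abs(observed - mean) / spread > threshold


print(signature_alerts(b"GET /login?user=admin' OR 1=1--"))  # ['sql-injection']
print(signature_alerts(b"GET /login?user=admin' or 1=1--"))  # [] (altered pattern slips by)
print(reputation_alert("203.0.113.66"))                      # True
requests_per_minute = [40, 42, 38, 41, 39, 40, 43]           # learned traffic profile
print(anomaly_alert(requests_per_minute, 41))                # False
print(anomaly_alert(requests_per_minute, 400))               # True
```

A lower threshold catches more deviations but raises the false-positive rate that the text warns about for anomaly-based systems.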

Intrusion detection doesn’t stop intruders, but intrusion prevention does … or at least, it slows them down. Intrusion prevention systems (IPSes) are newer and more common than IDSes, and IPSes are designed to detect and block intrusions. An IPS is simply an IDS that can take action, such as dropping a connection or blocking a port, when an intrusion is detected.

Remember Intrusion detection looks for known attacks and/or anomalous behavior on a network or host.

See Chapter 6 for more on IDSes and IPSes.

Security information and event management

Security information and event management (SIEM) solutions provide real-time collection, analysis, correlation, and presentation of security logs and alerts generated by various network sources (such as firewalls, IDSes/IPSes, routers, switches, servers, and workstations).

A SIEM solution can be software- or appliance-based, and may be hosted and managed internally or by a managed security service provider.

A SIEM requires a lot of up-front configuration and tuning so that only the most important, actionable events are brought to the attention of staff members in the organization. The effort is worthwhile, however: A SIEM combs through millions or billions of events daily and presents only the most important few actionable events so that security teams can take appropriate action.
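As an illustration of how correlation reduces raw events to a few actionable alerts, the following sketch applies one hypothetical rule: five or more failed logins from the same IP address within five minutes, across any log source, produce a single alert. The event format, field names, and thresholds are assumptions for illustration only. Commercial SIEMs express such rules in their own query languages.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def correlate_failed_logins(events, threshold=5, window=timedelta(minutes=5)):
    """Collapse many raw failed-login events into at most one alert per source IP."""
    times_by_ip = defaultdict(list)
    for e in events:
        if e["type"] == "login_failed":
            times_by_ip[e["ip"]].append(datetime.fromisoformat(e["time"]))

    alerts = []
    for ip, times in times_by_ip.items():
        times.sort()
        best, start = 0, 0
        # Sliding window: largest number of failures that fit inside `window`.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= threshold:
            alerts.append({"ip": ip, "failures": best})
    return alerts


# Hypothetical normalized events from several sources (VPN, mail, SSH).
base = datetime(2024, 5, 1, 10, 0, 0)
events = [
    {"time": (base + timedelta(seconds=30 * i)).isoformat(), "source": src,
     "ip": "203.0.113.9", "type": "login_failed"}
    for i, src in enumerate(["vpn", "mail", "vpn", "ssh", "mail", "ssh"])
]
events.append({"time": base.isoformat(), "source": "ssh", "ip": "192.0.2.20", "type": "login_failed"})
events.append({"time": base.isoformat(), "source": "vpn", "ip": "192.0.2.20", "type": "login_ok"})

print(correlate_failed_logins(events))  # [{'ip': '203.0.113.9', 'failures': 6}]
```

Eight raw events become one alert, and because the rule ignores the log source, a password-guessing campaign spread across several services is still recognized.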

Many SIEM platforms also have the ability to accept threat intelligence feeds from various vendors, including the SIEM manufacturers. This approach permits the SIEM to adjust its detection and blocking capabilities automatically for the most up-to-date threats.

Security orchestration, automation, and response

A security orchestration, automation, and response (SOAR) solution takes a SIEM one step further by automating the repeatable tasks that follow the detection of an event. A SIEM might produce an alert regarding suspicious activity originating from the Internet, and an analyst might investigate the alert by performing various tasks to obtain additional information about the originating system or the contents of the alert. This investigation may take several minutes, during which time an intruder may be performing reconnaissance or stealing information. With SOAR, these steps may take only seconds, providing the analyst with key actionable information. A SOAR platform can also direct response steps, such as black-holing a domain or IP address, locking a user or administrator account, or enacting a firewall rule to contain an intrusion. In other cases, a SOAR can prepare an action and perform it only when an analyst approves it.
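A SOAR playbook can be pictured as a function that enriches an alert and then either performs or queues containment steps. In this minimal sketch, the lookup tables stand in for the threat intelligence and asset inventory integrations a real SOAR platform would query, and the action strings only describe what a real integration would do. All names, addresses, and the `auto_contain` switch are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for threat intelligence and asset inventory lookups.
THREAT_INTEL = {"203.0.113.66": "known C2 server"}
ASSET_INVENTORY = {"10.0.0.15": "finance-laptop-07 (owner: jdoe)"}


@dataclass
class Alert:
    remote_ip: str      # Internet address involved in the suspicious activity
    internal_host: str  # internal system that was contacted
    user: str           # account active on the internal host


@dataclass
class PlaybookResult:
    enrichment: dict
    actions_taken: list = field(default_factory=list)
    actions_pending: list = field(default_factory=list)


def run_playbook(alert: Alert, auto_contain: bool = False) -> PlaybookResult:
    """Enrich an alert in seconds, then perform or queue containment steps."""
    enrichment = {
        "remote_reputation": THREAT_INTEL.get(alert.remote_ip, "unknown"),
        "internal_asset": ASSET_INVENTORY.get(alert.internal_host, "unknown"),
    }
    result = PlaybookResult(enrichment)
    if enrichment["remote_reputation"] != "unknown":
        steps = [f"block {alert.remote_ip} at firewall", f"lock account {alert.user}"]
        if auto_contain:
            result.actions_taken.extend(steps)    # act immediately
        else:
            result.actions_pending.extend(steps)  # wait for analyst approval
    return result


r = run_playbook(Alert("203.0.113.66", "10.0.0.15", "jdoe"))
print(r.enrichment["remote_reputation"])  # known C2 server
print(r.actions_pending)  # ['block 203.0.113.66 at firewall', 'lock account jdoe']
```

The `auto_contain` flag captures the two modes the text describes: acting directly, or preparing the action and waiting for an analyst.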

Continuous monitoring

Continuous monitoring technology collects and reports security data in near real time. Continuous monitoring components may include:

· Discovery: Ongoing inventory of network and information assets, including hardware, software, and sensitive data

· Assessment: Automatic scanning and baselining of information assets to identify and prioritize vulnerabilities

· Threat intelligence: Feeds from one or more outside organizations that produce high-quality, actionable data

· Audit: Nearly real-time evaluation of device configurations and compliance with established policies and regulatory requirements

· Scanning: Automatic scanning of systems and networks to discover new vulnerabilities

· Patching: Automatic security patch installation and software updating

· Reporting: Aggregating, analyzing, and correlating log information and alerts

Egress monitoring

Egress monitoring (or extrusion detection) is the process of monitoring outbound traffic to discover potential data leakage (or loss). Modern cyberattacks employ various stealth techniques to avoid detection as long as possible for the purpose of data theft. These techniques may include encryption, such as secure sockets layer/transport layer security (SSL/TLS), and steganography (discussed in Chapter 4).

Data loss prevention (DLP) systems are often used to detect the exfiltration of sensitive data, such as personally identifiable information (PII) or protected health information (PHI), in email messages, data uploads, PNG or JPEG images, and other forms of communication. These technologies often perform deep packet inspection to decrypt and inspect outbound traffic that is TLS encrypted.

DLP systems can also be used to disable removable media drive interfaces on servers and workstations, as well as to encrypt data written to removable media so that only systems running the organization’s DLP agent can read the contents of the removable media.

Static DLP tools are used to discover sensitive and proprietary data in databases, file servers, and other data storage systems.
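Content inspection in DLP tools commonly pairs pattern matching with validation checks to cut down on false positives. Here is a minimal sketch, assuming plain-text content that has already been decrypted and extracted; the two detectors (US Social Security numbers, and payment card numbers validated with the Luhn checksum) are illustrative, and commercial DLP products ship far richer rule sets.

```python
import re

SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")          # US SSN format, e.g. 123-45-6789
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")      # 13-16 digits, optional separators

def luhn_valid(number: str) -> bool:
    """Luhn checksum used by payment card numbers; rejects most random digit runs."""
    total = 0
    for i, d in enumerate(int(c) for c in reversed(number)):
        if i % 2 == 1:          # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0

def scan_outbound(text: str) -> list:
    """Return (kind, value) findings for sensitive data in outbound content."""
    findings = [("ssn", m.group()) for m in SSN_RE.finditer(text)]
    for m in CARD_RE.finditer(text):
        digits = re.sub(r"\D", "", m.group())   # drop spaces and dashes
        if luhn_valid(digits):
            findings.append(("card", digits))
    return findings
```

The same scanning function could serve a network DLP sensor (data in motion) or a static DLP crawler (data at rest); only the source of `text` changes.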

Log management

Log data is (or should be) collected from practically every application, server, network device, security device, and user device in an organization’s environment. Without complete log data, it can be nearly impossible to determine exactly what happened in the event of an attack.

To the greatest extent possible, log information should be synchronized to a network time server to ensure that log data from disparate sources can be correlated accurately. Logs should be stored centrally and securely to ensure that the data collected is immutable and can be readily ingested into various security analytics platforms, SIEM solutions, and other security tools for log aggregation, analysis, and correlation. Appropriate retention periods for log information should also be defined and implemented based on legal or regulatory compliance requirements.
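Correlation works only when every timestamp can be placed on a single timeline. A minimal sketch, assuming each source emits ISO 8601 timestamps that include a UTC offset and that each source's entries are already in time order; the source names and log messages are hypothetical:

```python
from datetime import datetime, timezone
import heapq

def to_utc(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to UTC."""
    dt = datetime.fromisoformat(ts)
    if dt.tzinfo is None:
        # Without an offset the true instant is ambiguous, so refuse to guess.
        raise ValueError(f"timestamp lacks a UTC offset: {ts}")
    return dt.astimezone(timezone.utc)

def merge_logs(*sources):
    """Merge (name, entries) sources, each time-ordered, into one UTC timeline."""
    streams = [[(to_utc(ts), name, msg) for ts, msg in entries] for name, entries in sources]
    return list(heapq.merge(*streams))
```

A web server logging `10:14:58+02:00` and a firewall logging `08:15:00+00:00` sort correctly (web server first) only after both are converted to UTC. Offsets fix time zones, but not clock drift, which is why synchronization to a network time server still matters.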

Threat intelligence

Threat intelligence involves collecting and analyzing data about attempted or successful intrusions. Global threat intelligence is usually provided automatically to subscriber organizations via a threat intelligence feed. You might subscribe to numerous threat feeds from your firewall, endpoint protection (antimalware), and other security vendors that collect and analyze threat information from numerous sources, typically including their globally deployed customer bases. Security teams use the information from these threat intelligence feeds to proactively hunt and identify active threats and bad actors within their organization’s environment.

A key element in threat intel feeds is the indicator of compromise (IoC), which is digital information that an organization can use to determine whether specific types of intrusion or malware are present in an organization’s environment. An IoC typically takes the form of a virus signature, IP address, MD5 hash of a malware file, URL, or domain name. Security analysts may use tools to proactively search for IoCs in an activity known as threat hunting.
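A simple form of threat hunting is hashing files and checking the results against known-bad hashes from a feed. The following sketch uses MD5 because the text mentions it; many feeds also publish SHA-256 hashes, which `hashlib` supports the same way. The function names are illustrative.

```python
import hashlib
from pathlib import Path

def file_md5(path: Path) -> str:
    """Hash a file in chunks so large files don't have to fit in memory."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def hunt_hashes(observed: dict, ioc_hashes: set) -> list:
    """Return names of observed files whose MD5 matches a known-bad hash.

    `observed` maps a file name to its MD5 hex digest (e.g., from file_md5).
    """
    bad = {h.lower() for h in ioc_hashes}   # feeds may publish upper- or lowercase hex
    return sorted(name for name, digest in observed.items() if digest.lower() in bad)
```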

Machine-readable threat intel feeds use any of several formats, including CSV (comma-separated values), STIX (Structured Threat Information Expression), XML (Extensible Markup Language), JSON (JavaScript Object Notation), and OpenIOC (Open Indicators of Compromise). TAXII (Trusted Automated Exchange of Indicator Information) is not a data format but a protocol commonly used to exchange threat intel, typically STIX data, between organizations.
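Once a feed is machine readable, matching it against log events is straightforward. This sketch assumes a deliberately simple, hypothetical JSON layout (an `indicators` list whose entries have `type` and `value` keys) rather than the full STIX object model, and hypothetical log field names `dst_ip` and `dns_query`:

```python
import json
from collections import defaultdict

def load_feed(raw: str) -> dict:
    """Group indicators from a simple JSON feed by type (e.g., 'ip', 'domain')."""
    by_type = defaultdict(set)
    for item in json.loads(raw)["indicators"]:
        by_type[item["type"]].add(item["value"].lower())  # normalize case for domains
    return by_type

def match_event(event: dict, iocs: dict) -> list:
    """Return (type, value) pairs where a log event field matches a known indicator."""
    hits = []
    for ioc_type, field_name in (("ip", "dst_ip"), ("domain", "dns_query")):
        value = str(event.get(field_name, "")).lower()
        if value and value in iocs.get(ioc_type, set()):
            hits.append((ioc_type, value))
    return hits
```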

Tip Threat intel tools enable an organization to detect the tactics, techniques, and procedures that threat actors use to attack networks and systems.

User and entity behavior analysis

User and entity behavior analysis (UEBA) is used in various security tools to create a baseline of “normal” user, device, or other activity on the network. Anomalous behavior that deviates from this baseline, beyond a defined threshold, may be an indicator of threat activity and can be alerted on or trigger an automated response. A user who normally logs in between 9 a.m. and 5 p.m. on the East Coast of the United States, for example, may be required to use an authorized (managed or domain-joined) device and multifactor authentication (MFA) if they attempt to log in to the network after 10 p.m. from Europe. The user may, in fact, be on a business trip and therefore legitimately need to log in outside their “normal” hours, in which case they can still log in with some additional precautions to verify their identity. Or they may be sound asleep at home in New York while a threat actor somewhere in Europe attempts to log in, in which case the additional precautions will help prevent a successful intrusion using the unsuspecting user’s credentials.
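The login scenario above can be reduced to a toy baseline-and-threshold check. This is a simplified sketch, not how any particular UEBA product works: it assumes just two features (login hour in the user's home time zone, and country), and the 5 percent threshold is a hypothetical tuning value. Real UEBA tools weigh many more signals using statistical or machine-learning models.

```python
from collections import Counter

def build_baseline(history: list):
    """history: list of (hour_of_day, country) tuples from past successful logins."""
    hours = Counter(h for h, _ in history)
    countries = {c for _, c in history}
    return hours, countries, len(history)

def assess_login(baseline, hour: int, country: str, min_share: float = 0.05) -> str:
    """Return 'allow' for normal-looking logins, 'step_up' for anomalous ones."""
    hours, countries, total = baseline
    share = hours[hour] / total if total else 0.0   # how often this hour appears historically
    if country not in countries or share < min_share:
        return "step_up"   # e.g., require a managed device plus MFA before granting access
    return "allow"
```

Note that the anomalous case triggers additional verification rather than an outright block, matching the business-traveler example.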

Perform Configuration Management

An organization’s information architecture changes all the time. As a result, its security posture changes all the time. Provisioning and decommissioning various information resources can have significant effects, both direct and indirect, on the organization’s security posture. An application, for example, might directly introduce new vulnerabilities into an environment or integrate with a database in a way that compromises the integrity of the database. Or a system administrator might create a new virtual machine (VM) in a public cloud environment and forget a step in the VM’s security settings.

Crossreference This section covers Objective 7.3 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Security planning and analysis must be an integral part of every organization’s resource provisioning processes, as well as throughout the life cycle of all resources. Important security considerations include

· Provisioning: Security should be consulted any time the organization is considering introducing new equipment, such as a Wi-Fi access point or network router from a manufacturer whose products have not previously been deployed in the environment. This approach ensures that security can assess any known risks associated with the new equipment and its impact on the organization’s overall security posture.

· Asset management (or inventory): Maintaining a complete, accurate inventory is critical to ensure that all potential vulnerabilities and risks in an environment can be identified, assessed, and addressed. Indeed, so many other critical security processes depend on sound asset inventory that asset inventory is one of the most important (if most mundane) activities in IT organizations.

Tip Asset management is covered in the first two controls of the Center for Internet Security (CIS) Critical Security Controls and in the first category of the first (Identify) core function of the National Institute of Standards and Technology’s (NIST) Cybersecurity Framework (CSF). Asset management appears first in these and other important frameworks because it is fundamental to most other security controls.

· Baselining: Establishing a baseline helps security teams tune the security events and alerts they receive, and the baseline can also feed the user and entity behavior analytics (UEBA) capabilities (discussed earlier in this chapter) of security tools deployed throughout the environment.

· Change management: Change management processes are used to strictly control changes to systems in production environments so that only duly requested and approved changes are made.

· Configuration management: Configuration management processes need to be implemented and strictly enforced to ensure that information resources are operated in a safe and secure manner. Organizations typically implement an automated configuration management database (CMDB) that is part of a system configuration management system used to manage asset inventory data. Often, this database is also used to manage the configuration history of systems.

· Physical assets: Physical assets must be protected against loss, damage, or theft. Valuable or sensitive data stored on a physical asset may far exceed the value of the asset itself.

· Virtual assets: VM sprawl has increasingly become an issue for organizations as virtualization technology and software-defined networking (SDN) have grown in popularity. VMs can be (and often are) provisioned in a matter of minutes but aren’t always properly decommissioned when they are no longer needed. Dormant VMs aren’t always backed up and can go unpatched for many months, exposing the organization to increased risk from unpatched security vulnerabilities.

Of particular concern to security professionals is the implementation of VMs without proper review and approvals. This problem didn’t exist before virtualization, as organizations had other checks and balances in place to prevent the implementation of unauthorized systems (namely, the purchasing process). But VMs can be implemented unilaterally, often without the knowledge or involvement of other personnel within the organization.

· Cloud assets: As more organizations adopt cloud strategies, including Software as a Service (SaaS), Platform as a Service (PaaS), and Infrastructure as a Service (IaaS) solutions, it’s important to keep track of these assets. Ultimately, an organization is responsible for the security and privacy of its applications and data — not the cloud service provider. Issues of data residency and transborder data flow need to be considered.

A new class of security tools known as cloud access security brokers (CASB) can detect access to and use of cloud-based services. These tools give the organization more visibility into its sanctioned and unsanctioned use of cloud services. Many of these systems, in cooperation with cloud services, can be used to control the use of cloud services.

· Applications: This category includes commercial and custom applications, private clouds, web services, SaaS products, and the interfaces and integrations among application components. Securing the provisioning of these assets requires strict access controls; only designated administrators should be able to deploy and configure them.

· Automation: IT organizations, including DevOps teams, increasingly use automation and orchestration to deploy information resources at scale. This massive level of activity requires coordination with security teams to ensure that information resources can be accounted for properly and security risks identified. It is increasingly common, for example, for an application deployed in a microservices architecture to provision and deprovision thousands of containerized ephemeral microservices, deployed across a multi-cloud environment, with a life span of a few minutes or even seconds.
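Baselining and configuration management come together in drift detection: comparing a system's current settings with its approved baseline, such as one recorded in a CMDB. Here is a minimal sketch, assuming both are flat dictionaries of setting names to values; the setting names in the usage note are hypothetical.

```python
def config_drift(baseline: dict, current: dict) -> dict:
    """Compare a system's current settings with its approved baseline."""
    common = baseline.keys() & current.keys()
    return {
        # Settings present in both but with different values: (approved, actual).
        "changed": {k: (baseline[k], current[k]) for k in common if baseline[k] != current[k]},
        # Required settings that have disappeared from the system.
        "missing": sorted(baseline.keys() - current.keys()),
        # Settings nobody approved, which should go through change management.
        "unexpected": sorted(current.keys() - baseline.keys()),
    }
```

For example, a baseline of `{"ssh_root_login": "no", "firewall": "on", "tls_min": "1.2"}` compared with a running system reporting `{"ssh_root_login": "yes", "firewall": "on", "telnet": "enabled"}` flags the root-login change, the missing TLS setting, and the unapproved Telnet service. Run on a schedule, such a check also catches VMs and cloud assets that were provisioned and then forgotten.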

Apply Foundational Security Operations Concepts

Fundamental security operations concepts that CISSP candidates need to understand and manage well include the principles of need to know and least privilege, separation of duties and responsibilities, monitoring of special privileges, job rotation, information life cycle management, and service-level agreements.

Crossreference This section covers Objective 7.4 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Need-to-know and least privilege

The need-to-know concept states that only people with a valid business justification should have access to specific information or functions. In addition to having a need to know, a person must have an appropriate security clearance level to be granted access. Conversely, a person who has the appropriate security clearance level but no need to know should not be granted access.

One of the most difficult challenges in managing need to know is the use of controls that enforce it. Also, information owners need to be able to distinguish I need to know from I want to know, I want to feel important, and I’m just curious.

Need-to-know is closely related to the concept of least privilege and can help organizations implement least privilege in a practical manner.

The principle of least privilege states that people should have the capability to perform only the tasks (or have access to only the data) required for their primary jobs — no more.

Giving a person more privileges and access than required invites trouble. Offering the capability to perform more than the job requires may become a temptation that results, sooner or later, in an abuse of privilege.

Giving a user full permissions on a network share, for example, rather than just read and modify rights to a specific directory, opens the door not only to abuse of those privileges (such as reading or copying other sensitive information on the network share), but also to costly mistakes (such as accidentally deleting a file or the entire directory). As a starting point, organizations should approach permissions with a “deny all” mentality and add needed permissions as required.
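The “deny all” starting point can be expressed as an access check that returns `True` only when an explicit grant covers the request. A minimal sketch, assuming grants are (user, directory, allowed actions) tuples and that a grant on a directory extends to the files inside it; the users and paths are hypothetical.

```python
from pathlib import PurePosixPath

def is_allowed(user: str, path: str, action: str, grants: list) -> bool:
    """Default deny: access exists only if an explicit grant covers it.

    Assumes paths are already normalized (no '..' components).
    """
    target = PurePosixPath(path)
    for g_user, g_dir, g_actions in grants:
        if g_user == user and action in g_actions:
            base = PurePosixPath(g_dir)
            if target == base or base in target.parents:
                return True
    return False  # nothing matched, so the request is denied
```

With a single grant of `("alice", "/share/finance/reports", {"read", "modify"})`, Alice can read and modify files in that directory but cannot delete them or reach anything else on the share.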

Tip Least privilege is closely related to separation of duties and responsibilities, described in the following section. Distributing the duties and responsibilities for a given job function among several people means that those people require fewer privileges on a system or resource.

Remember The principle of least privilege states that people should have the fewest privileges necessary to allow them to perform their tasks.

Several important concepts associated with need to know and least privilege include

· Entitlement: When a new user account is provisioned in an organization, the permissions granted to that account must be appropriate for the level of access required by the user. In too many organizations, human resources simply instructs the IT department to give a new user “whatever so-and-so (another user in the same department) has access to.” Instead, entitlement needs to be based on the principle of least privilege.

· Aggregation: When people transfer between jobs and/or departments within an organization (see “Job rotation” later in this chapter), they often need different access and privileges to do their new jobs. Far too often, organizational security processes do not adequately ensure that access rights that are no longer required are actually revoked. Instead, people accumulate privileges, and over a period of many years, an employee can have far more access and privileges than they actually need. This situation is known as aggregation, and it’s the antithesis of least privilege. Privilege creep and accumulation of privileges are common terms.

· Transitive trust: Trust relationships (in the context of security domains) are often established within and between organizations to facilitate ease of access and collaboration. A trust relationship enables subjects (such as users or processes) in one security domain to access objects (such as servers or applications) in another security domain. (See Chapters 5 and 7 for more about objects and subjects.) A transitive trust extends access privileges to the subdomains of a security domain, analogous to inheriting permissions to subdirectories within a parent directory structure. Because inherited access can easily exceed what subjects actually need, a nontransitive trust should be implemented instead, requiring access to each subdomain to be granted explicitly, based on the principle of least privilege, rather than inherited.
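Aggregation is typically caught through periodic access reviews that compare what each person holds with what their current role needs. Here is a minimal sketch, assuming role definitions are maintained as sets of privileges; the role and privilege names are hypothetical.

```python
def excess_privileges(current: dict, roles: dict, role_needs: dict) -> dict:
    """Flag privileges held beyond what each user's current role requires.

    current:    user -> set of privileges the user holds today
    roles:      user -> the user's current role
    role_needs: role -> set of privileges that role requires
    """
    report = {}
    for user, held in current.items():
        # A user with no known role needs nothing, so every privilege is flagged.
        needed = role_needs.get(roles.get(user), set())
        extra = held - needed
        if extra:
            report[user] = sorted(extra)   # candidates for revocation
    return report
```

A user who moved from payroll to sales but kept payroll access would show up with those leftover payroll privileges flagged for revocation.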

Separation of duties and responsibilities

The concept of separation (or segregation) of duties (SoD) and responsibilities ensures that no single person has complete authority and control of a critical system or process. This practice promotes security in the following ways:

· Reduces opportunities for fraud or abuse: For fraud or abuse to occur, two or more people must collude or be complicit in the performance of their duties.

· Reduces high-impact mistakes: Because two or more people perform the process, mistakes are less likely to occur, or are more quickly detected and corrected.

· Reduces dependence on a single person: Critical processes are accomplished by groups of people. Multiple people should be trained on different parts of the process (such as through job rotation, discussed in the following section) to help ensure that the absence of one person doesn’t unnecessarily delay or impede the successful completion of a step in the process.

Here are some common examples of separation of duties and responsibilities within organizations:

· A bank assigns the first three numbers of a six-number safe combination to one employee and the second three numbers to another employee. A single employee isn’t permitted to have all six numbers, so a lone employee is unable to gain access to the safe and its contents.

· An accounting department might separate record entry and internal auditing functions or accounts payable and check disbursing functions.

· A system administrator is responsible for setting up new accounts and assigning permissions, which a security administrator then verifies.

· A programmer develops software code, but a separate person is responsible for testing and validation, and yet another person is responsible for loading the code on production systems.

· Destruction of classified materials may require two people to complete or witness the destruction.

· Disposal of assets may require an approval signature by the office manager and verification by building security.
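The programmer/tester/deployer example can be enforced as an automated gate in a release pipeline. The following is a minimal sketch, assuming each release record names the person who filled each role; the record layout and role names are hypothetical.

```python
def check_release(release: dict) -> list:
    """Return separation-of-duties violations; an empty list means roles are separated."""
    roles = ("developer", "tester", "deployer")
    violations = []
    for i, a in enumerate(roles):
        for b in roles[i + 1:]:          # compare every pair of roles once
            if release[a] == release[b]:
                violations.append(f"{a} and {b} are both {release[a]}")
    return violations
```

A pipeline would refuse to promote code to production unless `check_release` returns an empty list.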

In smaller organizations, separation of duties and responsibilities can be difficult to implement because of limited personnel and resources.

Privileged account management

Privileged entity controls are the mechanisms, generally built into computer operating systems and network devices, that provide and monitor privileged access to hardware, software, and data. In Unix and Windows, the controls that permit privileged functions reside in the operating system. Operating systems for servers, desktop computers, and many other devices use the concept of modes of execution to define privilege levels for various user accounts, applications, and processes that run on a system. For instance, the Unix root account and Windows Server Enterprise, Domain, and Local Administrator account roles have elevated rights that allow those accounts to install software, view the entire file system, and in some cases access the OS kernel and memory directly.

Specialized tools are used to monitor and record activities performed by privileged and administrative users. This approach helps ensure accountability on the part of each administrator and aids in troubleshooting through the ability to view actions performed by administrators.

System or network administrators typically use privileged accounts to perform operating system and utility management functions. Supervisor or Administrator mode should be used only for system administration purposes. Unfortunately, many organizations allow system and network administrators to use these privileged accounts or roles as their normal user accounts even when they aren't doing work that requires this level of access. Yet another horrible security practice is allowing administrators to share a single administrator or root account.

Privileged access management (PAM) solutions help security teams organize administrative accounts and service accounts in an environment. Some PAM solutions can permit an administrator to check out temporary privileged access credentials to access a network device or operating system and can even record the administrative session. Although administrators may feel threatened by this level of monitoring, they should remember that a PAM solution that records their session can help exonerate them by implicitly proving that they did not perform some malicious (or accidental) task.
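The checkout idea can be sketched as a small vault class. This class is hypothetical and not modeled on any specific PAM product. It issues a fresh random credential for each checkout, expires it after a time-to-live, prevents two administrators from holding the same account at once, and keeps an audit trail. A real PAM solution would also rotate the actual password on the target system and record the session itself.

```python
import secrets
import time

class CredentialVault:
    """Hypothetical PAM-style vault: time-limited checkout of privileged credentials."""

    def __init__(self, ttl_seconds: int = 3600, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock        # injectable clock, handy for testing expiry
        self.leases = {}          # account -> (holder, secret, expires_at)
        self.audit = []           # append-only record: (time, admin, action, account)

    def checkout(self, account: str, admin: str) -> str:
        now = self.clock()
        lease = self.leases.get(account)
        if lease and lease[2] > now and lease[0] != admin:
            raise PermissionError(f"{account} is checked out by {lease[0]}")
        secret = secrets.token_urlsafe(24)   # fresh credential for every checkout
        self.leases[account] = (admin, secret, now + self.ttl)
        self.audit.append((now, admin, "checkout", account))
        return secret

    def is_valid(self, account: str, secret: str) -> bool:
        lease = self.leases.get(account)
        return bool(lease) and secrets.compare_digest(lease[1], secret) and lease[2] > self.clock()
```

Because every checkout is tied to a named administrator and timestamp, the audit trail provides exactly the accountability (and exoneration) described above, even though the underlying account is shared.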

Warning System or network administrators occasionally grant root or administrator privileges to normal applications as a matter of convenience, rather than spend the time to figure out exactly what privileges the application requires and then create an account role for the application with only those privileges. Allowing a normal application these privileges is a serious mistake, because applications that run in privileged mode bypass some or all security controls, which could lead to unexpected application behavior. Any user of a payroll application, for example, could view or change anyone's data if the application running in privileged mode was never told no by the operating system. Further, if an application running in privileged mode is compromised by an attacker, the attacker may inherit privileged access to the entire system.

Tip Hackers specifically target Supervisor and other privileged modes because those modes have a great deal of power over systems. The use of Supervisor mode should be limited wherever possible, especially on user workstations.

MONITORING (EVERYBODY'S SPECIAL!)

Monitoring the activities of an organization’s users, particularly those who have special (such as administrator) privileges, is an important security operations practice.

User monitoring can include casual or direct observation, analysis of security logs, inspection of workstation hard drives, random drug testing (in certain job functions and in accordance with applicable laws), audits of attendance and building access records, review of call logs and transcripts, and other activities.

User monitoring and its purposes should be fully addressed in an organization’s written policy manuals. Information systems should display a login warning that clearly informs the user that their activities may be monitored and for what purposes. The login warning should also clearly indicate who owns the information and information assets processed on the system or network and that the user has no expectation of privacy with regard to information stored or processed on the system. The login process should require users to affirmatively acknowledge the login warning by clicking OK or I Agree to gain access to the system.

An organization should conduct user monitoring in accordance with its written policies and applicable laws. Also, only personnel who are authorized to do so (such as security, legal, or human resources) should perform this monitoring, and only for authorized purposes. User and entity behavior analytics is a process that is helpful for detecting potential breaches, intrusions, or other malicious activity by using monitoring data to establish baselines of normal behavior or activity and analyzing anomalies.

Job rotation

Job rotation (or rotation of duties) is another effective security control that offers many benefits to an organization. Similar to the concept of separation of duties and responsibilities, job rotation involves regularly or randomly transferring key personnel to different positions or departments within an organization, with or without notice. Job rotations accomplish several important organizational objectives:

· Reduce opportunities for fraud or abuse: Regular job rotations can accomplish this objective in the following two ways:

· People hesitate to set up the means for periodically or routinely stealing corporate information because they know that they could be moved to another shift or task at almost any time.

· People don’t work with one another long enough to form collusive relationships that could damage the company.

· Eliminate single points of failure: By ensuring that numerous people within an organization or department know how to perform several job functions, an organization can reduce dependence on certain people and thereby eliminate single points of failure when a person is absent, incapacitated, no longer employed with the organization, or otherwise unavailable to perform a critical job function.

· Promote professional growth: Through cross-training opportunities, job rotations can help an employee’s professional growth and career development, and reduce monotony and/or fatigue.

Job rotations can also include changing workers’ workstations and work locations, which can also keep would-be saboteurs off balance and less likely to commit wrongful acts.

MANDATORY AND PERMANENT VACATIONS: JOB ROTATIONS OF A DIFFERENT SORT

Mandatory vacations and termination of employment are two important security operations topics that warrant a few paragraphs. You might think of a mandatory vacation as being a short (one- or two-week) job rotation and a termination as being a permanent vacation!

Requiring employees to take one or more weeks of their vacation in a single block of time gives an organization an opportunity to uncover potential fraud or abuse. Employees who engage in illegal or prohibited activities are sometimes reluctant to be away from the office, concerned that these activities will be discovered in their absence as a result of an actual audit or investigation or when someone else who performs that person’s normal day-to-day functions in their absence uncovers an irregularity. Less ominously, mandatory vacations may help in other ways:

· Reducing stress and therefore reducing opportunities for mistakes or coercion by others

· Discovering inefficient processes when a substitute performs a job function faster or discovers a better way to get something done

· Revealing single points of failure, shadow processes, and opportunities for job rotation (and separation of duties and responsibilities) when a process or job function idles because the only person who knows how to perform that function is lying on a beach somewhere

As with the practice of separation of duties, small organizations can have difficulty implementing job rotations.

Finally, it is vital to lock down or revoke local and remote access for a terminated employee as soon as possible, especially when the employee is being fired or laid off. The potential consequences associated with continued access by an angry employee are serious enough to warrant emergency procedures for immediate termination of access.

Service-level agreements

Users of business- or mission-critical information systems need to know whether their systems or services will function when they need them, and users need to know more than “Is it up?” or “Is it down again?” Their customers, and others, hold users accountable for getting their work done in a timely and accurate manner, so those users need to know whether they can depend on their systems and services to help them deliver as promised.

The service-level agreement (SLA) is a quasilegal document (a real legal document when it is included in or referenced by a contract) pledging that the system or service performs to a set of minimum standards, such as

· Hours of availability: The wall-clock hours that the system or service will be available for users, which could be 24 x 7 (24 hours per day, 7 days per week) or something more limited, such as daily from 4 a.m. to 12 p.m. Availability specifications may also cite maintenance windows (such as Sundays from 2–4 a.m.) when users can expect the system or service to be down for testing, upgrades, and maintenance.

· Average and peak number of concurrent users: The average number of users expected to use the system or service at the same time, and the maximum number it can support.

· Transaction throughput: The number of transactions that the system or service can perform or support in a given period. Usually, throughput is expressed as transactions per second, per minute, or per hour.

· Transaction accuracy: The accuracy of transactions that the system or service performs. Generally, this figure is related to complex calculations (such as sales tax) and accuracy of location data.

· Data storage capacity: The amount of data that users can store in the system or service (such as cloud storage). Capacity may be expressed in raw terms (megabytes or gigabytes) or in numbers of transactions.

· Response times: The maximum periods of time (in seconds) that key transactions take. Response times for long processes such as nightly runs and batch jobs also should be covered in the SLA.

· Service desk response and resolution times: The amount of time (usually in hours) that a service or help desk will take to respond to requests for support and resolve any issues.

· Mean time between failures: The amount of time, typically measured in (thousands of) hours, that a component (such as a server hard drive) or system is expected to operate continuously before experiencing a failure.

· Mean time to restore service: The amount of time, typically measured in minutes or hours, that it is expected to take to restore a system or service to normal operation after a failure has occurred.

· Security incident response times: The amount of time (usually in hours or days) between the realization of a security incident and any required notifications to data owners and other affected parties.

· Escalation process during times of failure: When things go wrong, how quickly the service provider will contact the customer, as well as what steps the provider will take to restore service.

Remember Availability is one of the three tenets of information security (confidentiality, integrity, and availability, discussed in Chapter 3). Therefore, SLAs are important security documents.

Because an SLA is a quantified statement, the service provider and the user alike can take measurements to see how well the service provider is meeting the SLA’s standards. This measurement, which is sometimes accompanied by analysis, is frequently called a scorecard.
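A scorecard is simple arithmetic on the measurements. The following is a minimal sketch, assuming an SLA that specifies an availability percentage and a 95th-percentile response time; both metric choices and the target values are hypothetical, and the percentile uses the nearest-rank method.

```python
def sla_scorecard(period_minutes: int, downtime_minutes: float,
                  response_times_ms: list, targets: dict) -> dict:
    """Compare measured availability and p95 response time against SLA targets."""
    if not response_times_ms:
        raise ValueError("need at least one response-time measurement")
    availability = 100 * (period_minutes - downtime_minutes) / period_minutes
    ordered = sorted(response_times_ms)
    rank = (95 * len(ordered) + 99) // 100       # nearest rank: ceil(0.95 * n), integer math
    p95 = ordered[rank - 1]
    return {
        "availability_pct": round(availability, 3),
        "availability_met": availability >= targets["availability_pct"],
        "p95_response_ms": p95,
        "response_met": p95 <= targets["p95_response_ms"],
    }
```

For a 30-day month (43,200 minutes) with 30 minutes of downtime, measured availability is about 99.931 percent, which meets a 99.9 percent target; 60 minutes of downtime (about 99.861 percent) would not.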

Tip Operational-level agreements (OLAs) and underpinning contracts (UCs) are important SLA supporting documents. An OLA is essentially an SLA between the interdependent groups that are responsible for the terms of the SLA, such as a service desk and the desktop support team. UCs are used to manage third-party relationships with entities that help support the SLA, such as an external service provider or vendor.

Finally, for an SLA to be meaningful, it needs to have teeth! How will the SLA be enforced, and what will happen when violations occur? What are the escalation procedures? Will any penalties or service credits be paid in the event of a violation? If so, how will penalties or credits be calculated?

Tip Internal SLAs and OLAs, such as those between an IT department and its users, typically don’t provide penalties or service credits for service violations. Internal SLAs are structured more as a commitment between IT and the user community and are useful for managing service expectations. Clearly defined escalation procedures (who gets notified of a problem, when and how they are notified, and when the problem goes up the chain of command) are critical in an internal SLA.

Tip SLAs rarely, if ever, provide meaningful financial penalties for service violations. An hour of Internet downtime might legitimately cost an e-commerce company $10,000 of business, for example, but most service providers credit only the amount paid for the lost hour of Internet service (a few hundred dollars). This amount may seem incredibly disproportionate, but consider things from the service provider’s perspective. That same credit has to be given to all customers that experienced the outage. Thus, an outage could cost the service provider hundreds of thousands of dollars. If service providers were legally obligated to reimburse every customer for their actual losses, it’s fair to guess that no one would be in the business of providing Internet service (or that an MPLS circuit would cost a few thousand dollars a month). Instead, look for such penalties as an early-termination clause that lets you get out of a long-term contract if your service provider repeatedly fails to meet its service-level obligations.

HOW MANY NINES?

Availability is often expressed as a percentage of uptime, usually in terms of “How many nines?” In other words, an application, server, or site may be available 99 percent of the time, 99.9 percent of the time, or as much as 99.999 percent of the time. Approximate amounts of downtime per year are shown in the following table:

Availability (%)    Number of Nines    Downtime Per Year (24/7/365)
99                  Two                88 hours
99.9                Three              9 hours
99.99               Four               53 minutes
99.999              Five               5 minutes
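The table values come from a single formula: downtime per year equals (1 minus availability) multiplied by the 8,760 hours in a 365-day year. The following sketch reproduces the calculation, assuming a non-leap year and around-the-clock operation. The table rounds the results; for example, 99.9 percent works out to 8.76 hours.

```python
HOURS_PER_YEAR = 24 * 365   # 8,760 hours (24/7/365, ignoring leap years)

def downtime_hours(availability_pct: float) -> float:
    """Maximum downtime per year allowed by an availability percentage."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

def describe(availability_pct: float) -> str:
    """Express the downtime in hours, or in minutes when it is under an hour."""
    hours = downtime_hours(availability_pct)
    return f"{hours:.1f} hours" if hours >= 1 else f"{hours * 60:.1f} minutes"
```

Each additional nine cuts the allowable downtime by a factor of ten: about 87.6 hours, 8.8 hours, 52.6 minutes, and 5.3 minutes, respectively.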

Apply Resource Protection

Resource protection is the broad category of controls that protect information assets and information infrastructure.

Crossreference This section covers Objective 7.5 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Resources that require protection include

· Communications hardware and software: Routers, switches, firewalls, load balancers, IPSes, fax machines, virtual private network (VPN) servers, and so on, as well as the software that these devices use

· Computers and their storage systems: All corporate servers and client workstations, storage area networks, network-attached storage, direct-attached storage, near-line, and offline storage systems, cloud-based storage, and backup devices

· Business data: All stored information, such as financial data, sales and marketing information, personnel and payroll data, customer and supplier data, proprietary product or process data, and intellectual property

· System data: Operating systems, utilities, user IDs and password files, audit trails, and configuration files

· Backup media: Tapes, tape cartridges, removable disks, and offsite replicated disk systems

· Software: Application source code, programs, tools, libraries, vendor software, and other proprietary software

Media management

Media management refers to a broad category of controls that are used to manage information classification and physical media. Data classification refers to the tasks of marking information according to its sensitivity, as well as the subsequent handling, storage, transmission, and disposal procedures that accompany each classification level. Physical media is similarly marked; likewise, controls specify handling, storage, and disposal procedures. See Chapter 4 for more about data classification.

Sensitive information such as financial records, employee data, and information about customers must be clearly marked, properly handled and stored, and appropriately destroyed in accordance with established organizational policies, standards, and procedures:

· Marking: Marking is the process of affixing human-readable classification labels on documents and data files, whether those files are electronic or hard copy. A marking might read PRIVILEGED AND CONFIDENTIAL. See Chapter 4 for a detailed discussion of data classification.

· Tagging: Tagging is the process of affixing machine-readable classification labels on documents and data files.

· Handling: The organization should have established procedures for handling sensitive information. These procedures detail how employees can transport, transmit, and use such information, as well as any applicable restrictions.

· Protection: Protection involves two components:

· Physical protection of the actual media, such as locked cabinets and secured vehicles

· Logical protection of information on media, such as encryption

· Storage and backup: Similar to handling, the organization must have procedures and requirements specifying how sensitive information must be stored and backed up.

· Retention: Most organizations are bound by various laws and regulations to collect and store certain information, as well as keep it for minimum and/or maximum specified periods. An organization must be aware of legal requirements and ensure that it’s in compliance with all applicable regulations. Records retention policies should cover any electronic records that may be located on file servers, document management systems, databases, email systems, archives, and records management systems, as well as paper copies and backup media stored at offsite facilities. Organizations that want to retain information longer than required by law should firmly establish why such information should be kept longer. Nowadays, just having information can be a liability, so keeping information longer should be the exception rather than the norm.

· Destruction: Sooner or later, an organization must destroy sensitive information. The organization must have procedures detailing how to destroy sensitive information that was previously retained, whether the data is in hard-copy form or an electronic file.

Warning At the opposite end of the records retention spectrum, many organizations destroy records (including backup media) as soon as legally permissible to limit the scope and cost of any future discovery requests or litigation. Before implementing draconian retention policies that severely restrict your organization’s retention periods, you should fully understand the negative implications of a policy for your disaster recovery capabilities. Also, consult your organization’s legal counsel to ensure that you’re in full compliance with all applicable laws and regulations. Although extremely short retention policies and practices may be prudent for limiting future discovery requests or litigation, they’re illegal for limiting pending discovery requests or litigation (or even records that you have a reasonable expectation may become the subject of future litigation). In such cases, don’t destroy pertinent records; if you do, you go to jail. You go directly to jail! You don’t pass Go, you don’t collect $200, and (oh, yeah) you don’t pass the CISSP exam, either — or even remain eligible for CISSP certification.
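Retention and destruction rules like these are often enforced in records management software. The following Python sketch is hypothetical (the `Record` fields and function names are illustrative, not a standard schema): it refuses destruction while a legal hold is in place or before the minimum retention period has elapsed, and it flags records that have passed a policy maximum. Actual retention periods must come from applicable laws, regulations, and your legal counsel.

```python
# Hypothetical sketch of retention checks with a legal-hold override.
# Field names and periods are illustrative assumptions only.
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Record:
    record_id: str
    created: date
    min_retention_days: int                   # legally required minimum retention
    legal_hold: bool = False                  # pending or reasonably expected litigation
    max_retention_days: Optional[int] = None  # organizational policy maximum, if any


def may_destroy(record: Record, today: date) -> bool:
    """Return True only if destroying the record today is permitted."""
    if record.legal_hold:
        # Records relevant to pending or anticipated litigation are never destroyed.
        return False
    return today >= record.created + timedelta(days=record.min_retention_days)


def destruction_due(record: Record, today: date) -> bool:
    """Return True if the record has passed its policy maximum and is not on hold."""
    if record.legal_hold or record.max_retention_days is None:
        return False
    return today >= record.created + timedelta(days=record.max_retention_days)
```

Checking the legal hold before any date arithmetic mirrors the warning above: no retention schedule overrides a duty to preserve records.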

Media protection techniques

Media protection techniques span a broad array of technologies and approaches, depending on the media. A mobile device may encrypt data and automatically back up to the cloud. An organization might also use mobile device management (MDM) software to enforce additional protections and controls on the device. Similarly, user endpoints such as desktop and laptop PCs may have a trusted platform module (TPM) chip installed to securely store the cryptographic keys used to encrypt data on the device (discussed in Chapter 5). The operating system may provide additional protections such as disk- or file-level encryption and permissions control. Data loss prevention (DLP) tools may be used to disable an endpoint device’s USB port to prevent data from being copied to a removable USB storage device. Servers and storage arrays employ media protection techniques that may include redundant array of independent disks (RAID) protection, snapshots, and replication. Removable media containing sensitive or valuable information may be placed in locking cabinets or stored at a secure offsite storage facility.

Conduct Incident Management

The formal process of detecting, responding to, and fixing a security problem is known as incident management (more precisely, security incident management).

Crossreference This section covers Objective 7.6 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Warning Do not confuse the concept of incident management, described herein, with the more general concept of incident management as defined by the Information Technology Infrastructure Library’s (ITIL) Service Management best practices.

Incident response frameworks vary slightly in how they divide the process into phases. NIST Special Publication (SP) 800-61 Revision 2, Computer Security Incident Handling Guide, for example, lists four phases of incident handling: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity.

Incident management includes the following steps:

· Preparation: Incident management begins before an incident occurs. Preparation is the key to quick and successful incident management. A well-documented and regularly practiced incident management plan ensures effective preparation. The plan should include

· Response procedures: Include detailed procedures that address different contingencies and situations. Some organizations structure detailed procedures in a set of playbooks.

· Response authority: Clearly define roles, responsibilities, and levels of authority for all members of the CIRT.

· Available resources: Identify people, tools, and external resources (consultants and law enforcement agents) that are available to the CIRT. Training should include the use of these resources when possible.

· Legal review: The incident response plan should be evaluated by appropriate legal counsel to determine whether it complies with applicable laws and whether it’s enforceable and defensible.

· Detection: Detecting a security incident or event is the first step in responding to it and, often, the most difficult step in incident management. Detection may occur through automated monitoring and alerting systems or as the result of a reported security incident (such as a lost or stolen mobile device). Under the best of circumstances, detection may occur in real time as soon as a security incident occurs, such as when antimalware software discovers malware on a computer. More often, a security incident may not be detected for quite some time (months or years), as in the case of a sophisticated “low and slow” cyberattack. Determining whether a security incident has occurred is similar to the detection and containment step in the investigative process (discussed earlier in this chapter) and includes defining what constitutes a security incident for your organization.

· Response: Upon determination that an incident has occurred, it’s important to begin immediate, detailed documentation of every action taken throughout the incident management process. You should also identify the appropriate alert level. Ask questions such as “Is this an isolated incident or a systemwide event?”, “Has personal or sensitive data been compromised?”, and “What laws may have been violated?” The answers will help you determine who to notify and whether to activate the entire incident response team or only certain members. Next, notify the appropriate people about the incident — both incident response team members and management. All contact information should be documented before an incident, and all notifications and contacts during an incident should be documented in the incident log.

· Mitigation: The purpose of this step is to contain the incident and minimize further loss or damage. You may need to eradicate a virus, deny access, or disable services to halt the incident in progress.

· Reporting: This step requires assessing the incident and reporting the results to appropriate management personnel and authorities (if applicable). The assessment includes determining the scope and cause of damage, as well as the responsible (or liable) party.

· Recovery: Recovering normal operations involves eradicating any components of the incident (such as removing malware from a system or disabling email service on a stolen mobile device). Think of recovery as returning a system to its pre-incident state, with any changes required to prevent incident recurrence.

· Remediation: Remediation may include rebuilding systems, repairing vulnerabilities, improving safeguards, and restoring data and services. Do this step in accordance with a business continuity plan that properly identifies recovery priorities.

· Lessons learned: The final phase of incident management requires evaluating the effectiveness of your incident management plan and identifying any lessons learned, which should include not only what went wrong, but also what went right. Organizations often perform an after-action review (AAR) to discuss and understand the steps taken during a recent incident to identify potential improvements in detection or response.

Remember Investigations and incident management follow similar steps but have different purposes. The distinguishing characteristic of an investigation is the gathering of evidence for possible prosecution, whereas incident management focuses on containing the damage and returning to normal operations.
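The response step combines two mechanical tasks: choosing an alert level from a few triage questions and documenting every action as it happens. The following Python sketch illustrates both under stated assumptions: the alert-level names ("moderate," "high," "critical") and the escalation rules are hypothetical, because each organization defines its own in its incident management plan.

```python
# Hypothetical sketch: derive an alert level from the triage questions and
# keep a timestamped incident log. Level names and rules are assumptions.
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Incident:
    summary: str
    log: List[Tuple[str, str]] = field(default_factory=list)

    def record(self, action: str) -> None:
        """Append a UTC-timestamped entry to the incident log."""
        self.log.append((datetime.now(timezone.utc).isoformat(), action))


def alert_level(systemwide: bool, sensitive_data: bool, laws_violated: bool) -> str:
    """Map answers to the triage questions onto a hypothetical alert level."""
    if sensitive_data or laws_violated:
        return "critical"  # full team, plus legal counsel and management
    if systemwide:
        return "high"      # activate the entire incident response team
    return "moderate"      # activate only the relevant team members


incident = Incident("Malware detected on a single laptop")
level = alert_level(systemwide=False, sensitive_data=False, laws_violated=False)
incident.record(f"Alert level set to {level}")
incident.record("Notified CIRT lead and IT manager")
```

Recording each action through a single method makes it harder to skip documentation during a stressful response, and the UTC timestamps keep the log consistent across time zones.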

Operate and Maintain Detective and Preventative Measures

Detective and preventative security measures include various security technologies and techniques.

Crossreference This section covers Objective 7.7 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Important examples of detective and preventative measures include

· Firewalls: Firewalls are typically deployed at the network or data center perimeter and at other network boundaries, such as between zones of trust. Increasingly, host-based firewalls are being deployed to protect endpoints and virtual servers throughout the data center. Firewalls are discussed in more detail in Chapter 6.

· IDSes/IPSes: IDSes passively monitor traffic in a network segment or to and from a host and provide alerts of suspicious activity. An IPS also detects attacks but then actively blocks them, for example by dropping the network packets from the attack source. IDSes and IPSes are discussed earlier in this chapter and in Chapter 6.

· Whitelisting and blacklisting: Whitelisting involves explicitly allowing some action, such as email delivery from a known sender, traffic from a specific IP address range, or execution of a trusted application. Blacklisting explicitly blocks specific actions.

· Third-party security services: Third-party security services cover a wide spectrum of possible security services, such as

· Managed security services (MSS), in which a service provider typically monitors an organization’s IT environment for malfunctions and incidents. Service providers can also manage infrastructure devices, such as network devices and servers.

· Vulnerability management services, where a service provider periodically scans internal and external networks and then reports vulnerabilities to the customer organization for remediation.

· SIEM, discussed earlier in this chapter.

· SOAR, discussed earlier in this chapter.

· IP reputation services, usually in the form of a threat intelligence feed to an organization’s IDSes, IPSes, and firewalls.

· Web content filtering, in which an on-premises appliance or a cloud-based service limits or blocks user access to banned categories of websites (think gambling or pornography), as well as websites known to contain malicious software.

· Data loss prevention (DLP), in which the storage, movement, and use of sensitive information are monitored and (sometimes) blocked, based on organization policy. DLP is described earlier in this chapter.

· Cloud-based malware detection, offered as a service that provides real-time scanning of files in the cloud and uses the speed and scale of the cloud to detect and prevent zero-day threats faster than traditional on-premises antimalware solutions.

· Cloud-based spam filtering, offered as a service that blocks or quarantines spam and phishing emails before they reach the corporate network, thereby significantly reducing the volume of email traffic and performance overhead associated with transmitting and processing unwanted and potentially malicious email.

· Distributed denial of service (DDoS) mitigation, typically deployed in an upstream network to drop or reroute DDoS traffic before it affects the customer’s network, systems, and applications.

· Sandboxing: A sandbox enables untrusted or unknown programs to be executed in a separate, isolated operating environment, so any security threats or vulnerabilities can be safely analyzed. Sandboxing is used in many types of systems today, including antimalware, web filtering, and IPSes.

· Honeypots and honeynets: A honeypot is a decoy system that is used to attract attackers so that their methods and techniques can be observed (somewhat like a Trojan horse for the good guys!). A honeynet is a network of honeypots.

· Antimalware: Antimalware (also known as antivirus) software intercepts operating system routines that store and open files. The antimalware software compares the contents of the file being opened or stored against a database of malware signatures. If a malware signature is matched, the antimalware software prevents the file from being opened or saved and (usually) alerts the user. Enterprise antimalware software typically sends an alert to a central management console so that the organization’s security team is alerted and can take the appropriate action. More advanced antimalware tools also use techniques such as machine learning to detect malware and block it from executing on a system.

· Machine learning and artificial intelligence (AI) tools: Machine learning and AI technologies are increasingly used to perform analytics, identify anomalous and potentially malicious behavior, and automate and orchestrate security responses, such as disconnecting a compromised host from the network or terminating a session from a known-malicious IP source address or domain.
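Whitelisting, blacklisting, and signature-based antimalware all come down to a lookup before an action is allowed. The following Python sketch combines them into one execution decision. It assumes that signatures are SHA-256 hashes of whole files; real antimalware engines also use byte-pattern signatures, heuristics, and machine learning, and the function names here are hypothetical.

```python
# Hypothetical sketch: one execution decision combining a blacklist of
# known-malware hashes with a whitelist of trusted-application hashes.
# Assumption: a "signature" is the SHA-256 hash of the entire file.
import hashlib
from typing import Set


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a file's contents."""
    return hashlib.sha256(data).hexdigest()


def decide(file_bytes: bytes, blacklist: Set[str], whitelist: Set[str],
           default_allow: bool = False) -> str:
    """Return 'block' or 'allow' for a file about to be opened or executed."""
    digest = sha256_of(file_bytes)
    if digest in blacklist:
        return "block"  # a known-malicious match always wins
    if digest in whitelist:
        return "allow"  # explicitly trusted
    # Unknown file: whitelisting implies default-deny, blacklisting default-allow.
    return "allow" if default_allow else "block"


malicious, trusted = b"known bad payload", b"approved application"
blacklist = {sha256_of(malicious)}
whitelist = {sha256_of(trusted)}
print(decide(trusted, blacklist, whitelist))       # allow
print(decide(b"unknown", blacklist, whitelist))    # block (default-deny)
```

Checking the blacklist first means a known-malicious file is blocked even if someone mistakenly whitelists it. The `default_allow` parameter captures the key design choice: a whitelisting approach blocks anything not explicitly trusted, while blacklisting alone allows anything not explicitly blocked.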

Implement and Support Patch and Vulnerability Management

Software bugs and flaws inevitably exist in operating systems, database management systems, tools, and applications, and are continually discovered by researchers. Many of these bugs and flaws are security vulnerabilities that could permit an attacker to control a target system and subsequently access sensitive data or critical functions. Patch and vulnerability management is the process of regularly assessing, testing, installing, and verifying fixes and patches for software bugs and flaws as they are discovered.

Crossreference This section covers Objective 7.8 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

To perform patch and vulnerability management, follow these basic steps:

1. Subscribe to security advisories from vendors and third-party organizations.

2. Perform risk analysis on each advisory and patch to determine its applicability and risk to your organization.

3. Develop a plan to install the security patch or to perform another workaround, if any is available.

You should base your decision on which solution best eliminates the vulnerability or reduces risk to an acceptable level.

4. Proactively apply security patches to systems, devices, and applications based on risk and after appropriate testing.

Testing ensures that stated functions still work properly and that no unexpected side effects arise as a result of installing the patch or workaround.

5. Verify that the patch is properly installed and that systems still perform properly.

6. Update all relevant documentation to include any changes made or patches installed.

7. Perform periodic security scans of internal and external infrastructure to identify systems and applications with unsecure configuration and missing patches.

Security scans serve as quality assurance to make sure that proactive patching and configuration are effective.
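At its core, step 7 compares what is installed against what advisories say is the first fixed version. The following Python sketch shows that comparison; the package names, version numbers, and data layout are invented for illustration, and it handles only simple dotted numeric versions (real scanners must also cope with vendor-specific version schemes and backported fixes).

```python
# Hypothetical sketch: report hosts whose installed software is older than
# the first fixed version named in a security advisory.
from typing import Dict, List, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """Turn a dotted numeric version such as '2.4.10' into (2, 4, 10)."""
    return tuple(int(part) for part in v.split("."))


def missing_patches(inventory: Dict[str, Dict[str, str]],
                    advisories: Dict[str, str]) -> List[Tuple[str, str, str, str]]:
    """Return (host, package, installed, fixed_in) for each unpatched package."""
    findings = []
    for host, packages in sorted(inventory.items()):
        for package, installed in sorted(packages.items()):
            fixed_in = advisories.get(package)
            if fixed_in and parse_version(installed) < parse_version(fixed_in):
                findings.append((host, package, installed, fixed_in))
    return findings


inventory = {  # host -> {package: installed version}
    "web01": {"webserver": "2.4.10", "openlib": "3.0.7"},
    "db01": {"dbms": "14.2", "openlib": "3.0.9"},
}
advisories = {"webserver": "2.4.12", "openlib": "3.0.8"}  # package -> first fixed version

for finding in missing_patches(inventory, advisories):
    print(finding)
```

Comparing versions as tuples of integers rather than as strings matters: as strings, "2.4.9" sorts after "2.4.10", which would hide a missing patch.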

Understand and Participate in Change Management Processes

Change management is the business process used to control architectural and configuration changes in a production environment. Rather than allowing changes to systems (and to the way systems relate to one another) to be made informally, change management imposes a formal process of request, design, review, approval, implementation, and recordkeeping.

Crossreference This section covers Objective 7.9 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Configuration management is the closely related process of actively managing the configuration of every system, device, and application and then thoroughly documenting those configurations.

Remember Keep these two related processes distinct:

· Change management is the approval-based process that ensures that only approved changes are implemented.

· Configuration management is the control that records all the soft configuration (settings and parameters in the operating system, database, and application) and software changes that are performed with approval from the change management process.
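Because change management is a fixed sequence of stages, it can be modeled as a simple workflow that refuses to skip steps. The following Python sketch uses the stage names from the text; the `ChangeRequest` class and its methods are hypothetical, and it omits real-world details such as rejected or rolled-back changes.

```python
# Hypothetical sketch: change management as an ordered workflow in which a
# change can move only to the next stage, so nothing is implemented unapproved.
STAGES = ["request", "design", "review", "approval", "implementation", "recordkeeping"]


class ChangeRequest:
    def __init__(self, change_id: str, description: str) -> None:
        self.change_id = change_id
        self.description = description
        self.stage_index = 0           # every change starts as a request
        self.history = [STAGES[0]]     # audit trail of completed stages

    @property
    def stage(self) -> str:
        return STAGES[self.stage_index]

    def advance(self, to_stage: str) -> None:
        """Move to the next stage; skipping or reordering stages is refused."""
        nxt = self.stage_index + 1
        if nxt >= len(STAGES) or STAGES[nxt] != to_stage:
            raise ValueError(f"{self.change_id}: cannot go from {self.stage} to {to_stage}")
        self.stage_index = nxt
        self.history.append(to_stage)


change = ChangeRequest("CHG-1001", "Open TCP port 443 on the DMZ firewall")
for stage in STAGES[1:]:
    change.advance(stage)
print(change.stage, change.history)
```

The `history` list plays a small part of the configuration management role: it records that the change passed through approval before it was implemented.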

Implement Recovery Strategies

Developing and implementing effective backup and recovery strategies are critical for ensuring the availability of systems and data. Other techniques and strategies are commonly implemented to ensure the availability of critical systems, even in the event of an outage or disaster.

Crossreference This section covers Objective 7.10 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Backup storage strategies

Backups are performed for a variety of reasons that center on a basic principle: Sometimes, things go wrong, and we need to get our data back. To cover all reasonable scenarios, backup storage strategies often involve the following:

· Secure offsite storage: Store backup media at a remote location, far enough away so that the remote location is not directly affected by the same events (weather, natural disasters, or human-made disasters), but close enough that backup media can be retrieved in a reasonable period. When backup data is transmitted electronically to the remote location instead of being physically transported, the approach is known as e-vaulting or remote backup.

· Transport via secure courier: This approach can discourage or prevent theft of backup media while it is in transit to a remote location.

· Backup media encryption: This approach helps prevent any unauthorized third party from recovering data from backup media.

· Data replication: Send data to an offsite or remote data center, or cloud-based storage provider, in near real time.

Recovery site strategies

These strategies include hot sites (fully functional data centers or other facilities that are always up and ready with near-real-time replication of production systems and data), cold sites (data centers or facilities that have some recovery equipment available but not configured and no backup data onsite), and warm sites (data centers or facilities that have some hardware and connectivity prepositioned and configured, as well as an offsite copy of backup data).

Selecting a recovery site strategy has everything to do with cost and service level. The faster you want to recover data processing operations in a remote location, the more you will have to spend to build a site that is ready to go at the speed you require. In a nutshell, speed costs.
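The “speed costs” trade-off can be expressed as a small selection rule: pick the least expensive site type that can resume processing within the required recovery time objective (RTO). In the following Python sketch, the recovery times and annual costs are placeholders, not industry figures; substitute your own estimates.

```python
# Illustrative sketch: choose the cheapest recovery site type that meets a
# required recovery time objective (RTO). All numbers are placeholders.
from typing import Dict, Optional, Tuple

SITE_OPTIONS: Dict[str, Tuple[float, float]] = {
    # site type: (assumed hours to resume processing, assumed annual cost)
    "hot": (1, 1_000_000),
    "warm": (48, 300_000),
    "cold": (336, 75_000),
}


def cheapest_site(rto_hours: float) -> Optional[str]:
    """Return the lowest-cost site type that can recover within rto_hours."""
    candidates = [(cost, name) for name, (hours, cost) in SITE_OPTIONS.items()
                  if hours <= rto_hours]
    return min(candidates)[1] if candidates else None


print(cheapest_site(4))    # hot: only the hot site recovers that fast
print(cheapest_site(72))   # warm
```

Returning `None` when no option qualifies signals that the requirement calls for something faster than any listed site, such as the multiple processing sites described next.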

Multiple processing sites

Many large organizations operate multiple data centers for critical systems with real-time replication and load balancing between the various sites. This approach is the ultimate solution for large commercial sites that have little or no tolerance for downtime. Indeed, a well-engineered multisite application can survive even the outage of an entire data center without customers knowing that anything is wrong.

System resilience, high availability, quality of service, and fault tolerance

The resilience of a system is a measure of its ability to keep running, even under less-than-ideal conditions. Resilience is important at all levels, including network, operating system, subsystem (such as database management system or web server), and application.

Resilience can mean a lot of different things. Here are some examples:

· Filtering malicious input: The system can recognize and reject input that may be an attack. Examples of suspicious input include the kinds of input typically seen in injection, buffer-overflow, and DoS attacks.

· Data replication: The system copies critical data to a separate storage system so that the data remains available in the event of component failure.

· Redundant components: The system contains redundant components that permit it to continue running even when hardware failures or malfunctions occur. Examples of redundant components include multiple power supplies, multiple network interfaces, redundant storage techniques (such as RAID), and redundant server architecture techniques (such as clustering).

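· Eliminating maintenance hooks: Maintenance hooks are hidden, undocumented features in software programs that expose data or functions for inappropriate or illicit use. Removing them closes a path that attackers could use to bypass the system’s security controls.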

· Security countermeasures: Knowing that systems are subject to frequent or constant attack, systems architects need to include several security countermeasures to minimize system vulnerability. Such countermeasures include the following:

· Revealing as little information about the system as possible. For example, don’t permit the system to display the version of operating system, database, or application software that’s running.

· Limiting access to those people who must use the system to perform needed organizational functions.

· Disabling unnecessary services to reduce the number of attack targets.

· Using strong authentication to make it as difficult as possible for outsiders to access the system.
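One common way to filter malicious input is allowlist validation: accept only input that matches an expected pattern and length, and reject everything else before it reaches a database query or a fixed-size buffer. The following Python sketch validates a username field; the field rules (letters, digits, underscore, period, and hyphen, up to 32 characters) are hypothetical examples.

```python
# Sketch of allowlist input validation for a single field. The permitted
# characters and 32-character limit are hypothetical example rules.
import re

USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_.-]{1,32}")


def validate_username(value: str) -> str:
    """Return the username if it matches the allowlist; raise ValueError otherwise."""
    if not USERNAME_PATTERN.fullmatch(value):
        raise ValueError("rejected: disallowed characters, empty, or too long")
    return value


print(validate_username("alice_01"))
try:
    validate_username("alice' OR '1'='1")  # classic SQL injection attempt
except ValueError as err:
    print(err)
```

Describing what good input looks like is generally more robust than trying to enumerate every attack string. Validation complements, rather than replaces, defenses such as parameterized database queries.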

System resilience, high availability, quality of service (QoS), and fault tolerance are similar characteristics that are engineered into a system to make it as reliable as possible:

· System resilience: Includes eliminating single points of failure in system designs and building fail-safes into critical systems.

· High availability (HA): Typically consists of clustered systems and databases configured as active–active (both systems are running and immediately available) or active–passive (one system is active, while the other is in standby but can become active, usually within a matter of seconds). Clusters in active–passive mode use a failover mechanism to automatically switch the active role from one server in the cluster to another.

· Quality of service: Refers to a mechanism in which systems that provide various services prioritize certain services to ensure that they’re always available or perform at a certain level. Voice over Internet Protocol (VoIP) systems, for example, typically are prioritized to ensure that sufficient network bandwidth is always available to prevent any traffic delay or degradation of voice quality. Other services that are not as sensitive to delays (such as web browsing or file downloads) will be prioritized at a lower level in such cases.

· Fault tolerance: Includes engineered redundancies in critical components, such as multiple power supplies, multiple network interfaces, and RAID-configured storage systems.
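QoS prioritization can be illustrated with a strict-priority scheduler: when a link is congested, queued voice packets are always sent before web browsing and file-download packets. In the following Python sketch, the traffic classes and priority values are assumptions; real networks typically mark traffic (for example, with DSCP values) and let routers and switches schedule it.

```python
# Sketch of QoS as a strict-priority queue. Class names and priority values
# are illustrative assumptions; lower numbers are sent first.
import heapq
import itertools
from typing import List, Tuple

PRIORITY = {"voip": 0, "web": 1, "download": 2}


class StrictPriorityQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, str]] = []
        self._counter = itertools.count()  # keeps arrival order within a class

    def enqueue(self, traffic_class: str, packet: str) -> None:
        heapq.heappush(self._heap, (PRIORITY[traffic_class], next(self._counter), packet))

    def dequeue(self) -> str:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = StrictPriorityQueue()
q.enqueue("download", "d1")
q.enqueue("web", "w1")
q.enqueue("voip", "v1")
q.enqueue("voip", "v2")
print([q.dequeue() for _ in range(len(q))])  # voice first, then web, then download
```

Strict priority can starve lower classes if high-priority traffic never lets up, which is why production QoS designs often cap the priority queue or use weighted scheduling.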

HOW VIRTUALIZATION MAKES HIGH AVAILABILITY A REALITY

Server virtualization is a rapidly growing and popular trend that has come of age in recent years. Virtualization allows organizations to build more resilient, highly efficient, cost-effective technology infrastructures to support their business-critical systems and applications. Popular virtualization solutions include VMware vSphere, VirtualBox, and Microsoft Hyper-V. Although virtualization has many benefits, here’s a quick look at the high-availability benefit.

Virtual systems can be replicated or moved between separate physical systems, often without interrupting server operations or network connectivity. This task can be accomplished over a local area network when two physical servers (hosting multiple virtual servers) share common storage (a storage-area network). If Physical Server #1 fails, all the virtual servers on that physical server can quickly be moved to Physical Server #2. In an alternative scenario, if a virtual server on Physical Server #1 reaches a predefined performance threshold (such as processor, memory, or bandwidth use), the virtual server can be moved, automatically and seamlessly, to Physical Server #2.

For business continuity or disaster recovery purposes (discussed in the next section and in Chapter 3), virtual servers can also be pre-staged in separate geographic locations, ready to be activated or booted up when needed. Third-party replication software can continuously replicate critical applications and data to a disaster recovery site or secondary data center in near real time so that normal business operations can be restored as quickly as possible.
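The two scenarios in this sidebar (moving every virtual server when a physical host fails, and moving a single virtual server that crosses a performance threshold) can be sketched as a placement rule. In the following Python sketch, the host and VM names, the 80 percent CPU threshold, and the data layout are assumptions; hypervisor platforms implement this logic with their own HA and load-balancing features.

```python
# Hypothetical sketch of the sidebar's two scenarios: failover of all VMs
# when a host fails, and migration of one VM that crosses a threshold.
from typing import Dict, List

CPU_THRESHOLD = 0.80  # assumed predefined performance threshold


def rebalance(placement: Dict[str, List[str]], host_up: Dict[str, bool],
              vm_cpu: Dict[str, float], primary: str, secondary: str) -> Dict[str, List[str]]:
    """Return a new VM placement after applying the failover and threshold rules."""
    new = {host: list(vms) for host, vms in placement.items()}  # don't mutate input
    if not host_up[primary]:
        # Scenario 1: the physical host failed, so every VM moves.
        new[secondary].extend(new[primary])
        new[primary] = []
        return new
    for vm in list(new[primary]):
        if vm_cpu.get(vm, 0.0) >= CPU_THRESHOLD:
            # Scenario 2: only the overloaded VM moves.
            new[primary].remove(vm)
            new[secondary].append(vm)
    return new


placement = {"server1": ["vm1", "vm2"], "server2": ["vm3"]}
print(rebalance(placement, {"server1": False, "server2": True}, {}, "server1", "server2"))
print(rebalance(placement, {"server1": True, "server2": True},
                {"vm1": 0.92, "vm2": 0.30}, "server1", "server2"))
```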

Implement Disaster Recovery Processes

A variety of disasters can beset an organization’s business operations. They fall into two main categories: natural and human-made.

Crossreference This section covers Objective 7.11 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

In many cases, formal methodologies are used to predict the likelihood of a particular disaster. The term 50-year flood plain, for example, is one that you’ve probably heard to describe the physical extent of a river flood that has a 1-in-50 chance of occurring in any given year (in other words, a flood expected about once in a 50-year period, on average). The likelihood of each of the following disasters depends greatly on local and regional geography:

· Fires and explosions

· Earthquakes

· Storms (snow, ice, hail, prolonged rain, wind, dust, solar)

· Floods

· Hurricanes, typhoons, and cyclones

· Volcanic eruptions and lava flows

· Tornadoes

· Landslides

· Avalanches

· Tsunamis

· Pandemics
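A “50-year” flood is often misread as a flood that happens exactly once every 50 years. Such a flood has a 1-in-50 (2 percent) chance of occurring in any given year, so it can occur twice in a decade or not at all for a century. Assuming independent years, the chance of at least one such event in n years is 1 − (1 − p)^n, as the following Python sketch computes.

```python
# Worked example: chance of at least one "50-year" event over several
# planning horizons, assuming each year is independent.
def prob_at_least_one(annual_probability: float, years: int) -> float:
    """Return the probability of one or more events in the given number of years."""
    return 1 - (1 - annual_probability) ** years


if __name__ == "__main__":
    for years in (10, 30, 50, 100):
        print(f"{years} years: {prob_at_least_one(1 / 50, years):.0%}")
```

The result is roughly 64 percent over 50 years and about 87 percent over 100 years, which is why a facility expected to operate for decades inside a 50-year flood plain should plan for at least one flood.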

Many of these occurrences may have secondary effects; often, these secondary effects have a bigger impact on business operations, sometimes in a wider area than the initial disaster. A landslide in a rural area, for example, can topple power transmission lines, resulting in a citywide blackout. Some of these effects are

· Utility outages: Electric power, natural gas, water, and so on

· Communications outages: Telephone, cable, wireless, TV, and radio

· Transportation outages: Road, airport, train, and port closures

· Evacuations/unavailability of personnel: From both home and work locations

As if natural disasters weren’t enough, human-made disasters can also disrupt business operations as a result of deliberate and accidental acts:

· Accidents: Hazardous-materials spills, power outages, communications failures, and floods due to water-supply accidents

· Crime and mischief: Arson, vandalism, and burglary

· War and terrorism: Bombings, sabotage, and other destructive acts

· Cyberattacks/cyberwarfare: DoS attacks, malware, data destruction, and similar acts

· Civil disturbances: Riots, demonstrations, strikes, sickouts, and other such events

DISASTER RECOVERY PLANNING AND TERRORIST ATTACKS

The 2001 terrorist attacks in New York, Washington, D.C., and Pennsylvania — and the subsequent collapse of the World Trade Center buildings — had disaster recovery planning and business continuity planning officials all over the world scrambling to update their plans.

This kind of planning is still a highly relevant topic more than 20 years later. The attacks redefined the limits of extreme, deliberate acts of destruction. Previously, the most heinous attacks imaginable were large-scale bombings, such as the 1993 attack on the World Trade Center and the 1995 bombing of the Alfred P. Murrah Federal Building in Oklahoma City.

The collapse of the World Trade Center towers killed 40 percent of the employees of the Sandler O’Neill & Partners investment bank. Bond broker Cantor Fitzgerald lost 658 employees in the attack, most of its New York workforce. The sudden loss of a large number of employees had rarely been considered in business continuity and disaster recovery plans before. Businesses suddenly had to address a previously unheard-of scenario in their contingency and recovery plans: “What do we do if significant numbers of employees are suddenly lost?”

Traditional plans nearly always assumed that a business still had plenty of workers around to keep the business rolling; those insiders might be delayed by weather or other events, but eventually, they’d be back to continue running the business. The attacks on September 11, 2001, changed all that forever. Organizations need to include the possibility of losing a significant portion of their workforces in their business continuity plans. They owe this inclusion to their constituents and to their investors.

Tip For a complete reference on disaster recovery planning, we recommend IT Disaster Recovery Planning For Dummies, by Peter H. Gregory (John Wiley & Sons, Inc.).

Disasters can affect businesses in many ways, some obvious and others not so obvious:

· Damage to business buildings: Disasters can damage or destroy a building or make it uninhabitable.

· Damage to business records: Along with damaging a building, a disaster may damage a building’s contents, including business records, whether they are in the form of paper, microfilm, or electronic files.

· Damage to business equipment: A disaster can damage business equipment, including computers, copiers, and all sorts of other machinery. Anything electrical or mechanical, from calculators to nuclear reactors, can be damaged in a disaster.

· Damage to communications: Disasters can damage common-carrier facilities, including telephone networks (both landline and cellular), data networks, and even wireless and satellite-based systems. Even if a business’s buildings and equipment are untouched by a disaster, communications outages can be crippling. Further, damaged communications infrastructure in other cities can knock out many businesses’ voice and data networks. The September 11, 2001, attacks had an immediate impact on communications over a wide area of the northeastern United States, where several telecommunications providers had strategic regional facilities.

· Damage to public utilities: Power, water, natural gas, and steam services can be damaged by a disaster. Even if a business’s premises are undamaged, a utility outage can cause significant business disruption.

· Damage to transportation systems: Freeways, roads, bridges, tunnels, railroads, and airports can all be damaged in a disaster. Damaged transportation infrastructure in other regions (where customers, partners, and suppliers are located, for example) can cripple organizations that depend on the movement of materials, goods, or customers.

· Injuries and loss of life: Violent disasters in populated areas often cause casualties. When employees, contractors, or customers are killed or injured, businesses suffer: there may be fewer customers, or fewer employees available to deliver goods and services, for example. The casualties don’t need to be employees or customers themselves; when family members are injured or in danger, employees usually stay home to care for them and return to work only when those situations have stabilized.

· Indirect damage to suppliers and customers: If a disaster strikes a region where key suppliers or customers are located, the effect on businesses can be almost as serious as though the business itself suffered damage.

PLANNING FOR PANDEMICS

(The third edition of CISSP For Dummies, published in 2010, first included this sidebar. Little did we know that we would witness a global pandemic in our lifetime.)

In the past hundred years (and indeed, in all of recorded history before the 21st century), several pandemics have swept through the world. After COVID-19, we all know too well that a pandemic is the rapid, widespread transmission of a new disease to which few people have natural immunity. Large numbers of people may fall ill, resulting in high rates of absenteeism; supplier slowdowns; and shortages in materials, goods, and services. Some pandemics also have a high mortality rate.

Contingency planning for a pandemic requires a different approach from that for other types of disasters. When a disaster such as an earthquake, hurricane, or volcanic eruption occurs, aid in many forms soon pours into the region to help repair transportation, communications, and other vital services. Organizations can rely on outsourced help or operations in other regions to keep critical operations running. But in a pandemic, no outside help may be available, and much larger regions may be affected. In general, a pandemic can induce a global slowdown in manpower, supplies, and services, as well as depressed demand for most goods and services. Whole national economies can grind to a near halt.

Businesses affected by a pandemic should expect high rates of absenteeism for extended periods. Local or regional municipalities may impose quarantines and travel restrictions, which slow the movement of customers and supplies. Schools may be closed for extended periods, which could require working parents to stay at home. Businesses should plan on operating only the most critical business processes, and they may have to rely on cross-trained staff because some of the usual staff members may be ill, or they may be unwilling or unable to travel to work.

This list isn’t complete, but it should help you think about all the ways that a disaster can affect your organization.

Response

Emergency response teams must be prepared for every reasonably possible scenario. Members of these teams need a variety of specialized training to deal with such things as water and smoke damage, structural damage, flooding, and hazardous materials.

Organizations must document all the types of responses so that the response teams know what to do. The emergency response documentation consists of two major parts: response to each type of incident, and the most up-to-date facts about the facilities and equipment that the organization uses.

In other words, you want your teams to know how to deal with water damage, smoke damage, structural damage, hazardous materials, and many other things. Your teams also need to know everything about every company facility: where to find utility entrances, electrical equipment, heating/ventilation/air conditioning (HVAC) equipment, fire control, elevators, communications, data closets, and so on, as well as which vendors maintain and service them. And you need experts who know about the materials and construction of the buildings themselves. Those experts might be your own employees, outside consultants, or a little of both.

Remember It is the disaster response planning team’s responsibility to identify the experts needed for all phases of emergency response.

Responding to an emergency branches into two activities: salvage and recovery. A tangential activity is preparing financially for the costs associated with salvage and recovery.

Salvage

The salvage team is concerned with restoring full functionality to the damaged facility. This restoration includes several activities:

· Damage assessment: Arrange a thorough examination of the facility to identify the full extent and nature of the damage. Frequently, outside experts, such as structural engineers, perform this inspection.

· Salvage assets: Remove assets, such as computer equipment, records, furniture, inventory, and so on, from the facility.

· Cleaning: Thoroughly clean the facility to eliminate smoke damage, water damage, debris, and more. Outside companies that specialize in these services frequently perform this job.

· Restoring the facility to operational readiness: Complete repairs, and restock and reequip the facility to return it to pre-disaster readiness. At this point, the facility is ready for business functions to resume.

Remember The salvage team is primarily concerned with the restoration of a facility and its return to operational readiness.

Recovery

Recovery comprises equipping the business continuity team with any logistics, supplies, or coordination needed to get alternative functional sites up and running. This activity should be heavily scripted, with lots of procedures and checklists to ensure that every detail is handled.

Financial readiness

The salvage and recovery operations can cost a lot of money. The organization must prepare for potentially large expenses (at least several times the normal monthly operating cost) to restore operations to the original facility.

Financial readiness can take several forms, including

· Insurance: An organization may purchase an insurance policy that pays for the replacement of damaged assets and perhaps even some of the other costs associated with conducting emergency operations.

· Cash reserves: An organization may set aside cash to purchase assets for emergency use, as well as to use for emergency operations costs.

· Line of credit: An organization may establish a line of credit before a disaster to be used to purchase assets or pay for emergency operations should a disaster occur.

· Pre-purchased assets: An organization may choose to purchase assets to be used for disaster recovery purposes in advance and store those assets at or near a location where they will be used in the event of a disaster.

· Letters of agreement: An organization may want to establish legal agreements that take effect in a disaster. These agreements may cover the use of emergency work locations (such as nearby hotels), the use of fleet vehicles, and the appropriation of computers used by lower-priority systems.

· Standby assets: An organization can use existing assets as items to be repurposed in the event of a disaster. A computer system that is used for software testing could be quickly reused for production operations if a disaster strikes, for example.

Personnel

People are the most important resource in any organization. As such, human life must come before all other considerations, both when an organization develops its disaster response plans and when emergency responders take action after a disaster strikes. In terms of life safety, organizations can ensure the safety of personnel in several ways:

· Evacuation plans: Personnel need to know how to safely evacuate a building or work center. Signs should be clearly posted and drills held routinely so that personnel can practice exiting the building or work center calmly and safely. For organizations with large numbers of customers or visitors, additional measures need to be taken so that people who are unfamiliar with evacuation routes and procedures can exit the facilities safely.

· First aid: Organizations need to have plenty of first-aid supplies on hand, including longer-term supplies in the event that a natural disaster prevents paramedics from being able to respond. Personnel need to be trained in first aid and cardiopulmonary resuscitation (CPR) in the event of a disaster, especially when communications and/or transportation facilities are cut.

· Emergency supplies: For disasters that require personnel to shelter in place, organizations need to stock emergency water, food, blankets, and other necessities in the event that employees are stranded at work locations for more than a few hours.

Remember Personnel are the most important resource in any organization.

Communications

A critical component of disaster recovery planning is the communications plan. Employees need to be notified about closed facilities and any special work instructions, such as an alternative location for work. The planning team needs to realize that one or more of the usual means of communication may be adversely affected by the same event that damaged business facilities. If a building has been damaged, for example, the voicemail system that people would try to call to check messages and get workplace status might not be working.

Organizations need to anticipate the effects of an event when considering emergency communications. You need to establish two or more ways to locate each important staff member. These ways may include landlines, cellphones, spouses’ cellphones, and alternative contact numbers (such as neighbors and relatives).

Tip Text messaging is often an effective means of communication, even when mobile communications systems are congested. Text messages require fewer resources than live calls.

Many organizations’ emergency operations plans include the use of audio conference bridges so that personnel can discuss operational issues hour by hour throughout the event. Instead of relying on a single provider (which you might not be able to reach because of communications problems or because it’s affected by the same event), organizations should establish a second and maybe even a third audio conference provider. Emergency communications documentation needs to include dial-in information for both (or all three) conference systems.

In addition to internal communications, the disaster recovery plan must address external communications to ensure that customers, investors, government, and media are provided accurate and timely information.

Assessment

When a disaster strikes, an organization’s disaster recovery plan needs to include procedures to assess damage to buildings and equipment.

First, the response team needs to examine buildings and equipment to determine which assets are a total loss, which are repairable, and which are still usable (although not necessarily in their current locations).

For such events as floods, fires, and earthquakes, a professional building inspector usually needs to examine a building to see whether it is fit for occupation. If not, the next step is determining whether a limited number of personnel will be permitted to enter the building to retrieve needed assets.

When the assessment has been completed, assets can be divided into three categories:

· Salvage: Assets that are a total loss and cannot be repaired. In some cases, components can be removed to repair other assets.

· Repair: Assets that can be repaired and returned to service.

· Reuse: Undamaged assets that can be placed back in service, although they may need to be moved to an alternative work location if the building can’t be occupied.

Restoration

The ultimate objective of the disaster recovery team is the restoration of work facilities with their required assets so that business can return to normal. Depending on the nature of the event, restoration may take the form of building repair, building replacement, or permanent relocation to a different building.

Similarly, assets used in each building may need to undergo their own restoration, whether that restoration takes the form of replacement, repair, or return to service in the chosen location.

Before full restoration, business operations may be conducted in temporary facilities, possibly by alternative personnel, who may be other employees or contractors hired to help out. These temporary facilities may be located near the original facilities or some distance away. The circumstances of the event will dictate some of these matters, as well as the organization’s plans for temporary business operations.

Training and awareness

An organization’s ability to respond effectively to a disaster is highly dependent on its preparations. In addition to the development of high-quality, workable disaster recovery and business continuity plans that are kept up to date, the most important part is making sure that employees and other personnel are trained periodically in response and continuity procedures. Training and practice help reinforce understanding of proper response procedures, giving the organization its best chance at surviving a disaster.

An important part of training is participation in various types of testing, as discussed in the upcoming section, “Test Disaster Recovery Plans.”

Lessons learned

Every disaster is different and requires the best-laid plans to be adapted to address unique circumstances as they arise. It is important to document any changes and adaptations that need to be made to the plan, including what worked well and what didn’t work, as well as other key information collected during an actual disaster. This information will help drive a program of continuous improvement to ensure that the organization is better prepared for future events.

Test Disaster Recovery Plans

By the time an organization has created a disaster recovery plan, it has probably spent hundreds of hours and possibly tens or hundreds of thousands of dollars on consulting fees. You’d think that after making such a big investment, the organization would test the plan to make sure that it really works when an actual disaster strikes!

Crossreference This section covers Objective 7.12 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

The following sections outline testing methods.

Read-through or tabletop

A read-through (or tabletop) test is a detailed review of plan documents, performed by employees on their own. The purpose of a read-through test is to identify inaccuracies, errors, and omissions in documentation.

It’s easy to coordinate this type of test, because each person who performs the test does it when their schedule permits (provided that they complete it before any deadlines).

By itself, a document review is an insufficient way to test a disaster recovery plan, but it’s a logical starting place. You should perform one or more of the other tests described in the following sections shortly after you do a read-through test.

Walkthrough

A walkthrough (or structured walkthrough) test is a team approach to the read-through test in which several business and technology experts in the organization gather to walk through the plan. A moderator or facilitator leads participants to discuss each step in the plan so that they can identify issues and opportunities for making the plan more accurate and complete. Group discussions usually help identify issues that people won’t find when working on their own. Often, the participants want to perform the review at a fancy mountain or oceanside retreat, where they can think much more clearly! (Yeah, right.)

During a walkthrough test, as the group identifies issues, the facilitator writes them down on a whiteboard or flip chart as parking-lot issues (items to be considered later but written down now so they won’t be forgotten). These action items are then used to improve the plan. Each action item needs an accountable person assigned, as well as a completion date, so that it is completed in a reasonable time. Depending on the extent of the changes, a follow-up walkthrough may need to be conducted.

Tip A walkthrough test usually requires two or more hours to complete.

Simulation

In a simulation test, all the designated disaster recovery personnel practice going through the motions associated with a real recovery. In a simulation, the team doesn’t perform any recovery or alternative processing.

An organization that plans to perform a simulation test appoints a facilitator who develops a disaster scenario, using a type of disaster that’s likely to occur in the region. An organization in San Francisco might choose an earthquake scenario, for example, and an organization in Miami could choose a hurricane.

In a simple simulation, the facilitator reads announcements as though they’re news briefs. Such announcements describe an unfolding scenario and can include information about the organization’s status at the time. An example announcement might read like this:

It is 8:15 a.m. local time, and a magnitude 7.1 earthquake has just occurred 15 miles from company headquarters. Building 1 is heavily damaged, and some people are seriously injured. Building 2 (the one containing the organization’s computer systems) is damaged, and personnel are unable to enter the building. Electric power is out, and the generator has not started because of an unknown problem that may be earthquake-related. Executives Jeff Johnson and Sarah Smith (CIO and CFO) are backpacking on the Appalachian Trail and cannot be reached.

The disaster-simulation team, meeting in a conference room, discusses emergency response procedures and how the response might unfold. They consider the conditions described to them and identify any issues that could affect an actual disaster response.

The simulation facilitator makes additional announcements throughout the simulation. Just as in a real disaster, the team doesn’t know everything right away; instead, news trickles in. In the simulation, the facilitator reads scripted statements that … um, simulate the way that information flows in a real disaster.

A more realistic simulation can be held at the organization’s emergency response center, where some resources that support emergency response may be available. Another idea is to hold the simulation on a day that is not announced ahead of time so that responders will be genuinely surprised and possibly be less prepared to respond.

Tip Remember to test your backup media to make sure that you can actually restore data from backups!
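One way to act on this tip is to restore a backup to a scratch location and confirm that every file came back intact. The following Python sketch assumes that a manifest of SHA-256 digests was recorded when the backup was taken; the function names (`build_manifest`, `verify_restore`) are hypothetical and not part of any backup product. It checks only that files were restored completely and unaltered, not that applications can actually use the restored data.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """At backup time: map each file's path (relative to root) to its digest."""
    return {
        str(p.relative_to(root)): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict[str, str], restored_root: Path) -> list[str]:
    """After a test restore: list every missing or altered file.

    An empty list means every file in the manifest was restored
    with a matching digest.
    """
    problems = []
    for rel_path, expected in manifest.items():
        restored = restored_root / rel_path
        if not restored.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_of(restored) != expected:
            problems.append(f"corrupted: {rel_path}")
    return problems
```

For example, you might call `build_manifest(Path("/data"))` when the backup is taken, store the result alongside the backup media, and later run `verify_restore(manifest, Path("/restore-test"))` after a test restore. Both paths are placeholders for your own environment.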

Parallel

A parallel test involves performing all the steps of a real recovery except that you keep the real live production systems running. The actual production systems run in parallel with the disaster recovery systems. The parallel test is very time-consuming, but it does test the accuracy of the applications because analysts compare data on the test recovery systems with production data.

The technical architecture of the target application determines how a parallel test needs to be conducted. The general principle of a parallel test is that the disaster recovery system (meaning the system that remains on standby until a real disaster occurs, at which time the organization presses it into production service) processes work at the same time that the primary system continues its normal work. Precisely how this is accomplished depends on technical details. For a system that operates on batches of data, those batches can be copied to the disaster recovery system for processing there, and results can be compared for accuracy and timeliness.
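For the batch case, the comparison might look something like the following Python sketch. It assumes (hypothetically) that a batch is a list of transaction ID and amount pairs, that production has already produced its results for the batch, and that the disaster recovery system can be invoked as a function on a copy of the same batch. Accuracy is checked record by record, and timeliness is checked against a processing window that you would set from your own recovery requirements.

```python
import time
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical shapes: a batch is a list of (transaction_id, amount) pairs,
# and each system returns a mapping of transaction_id -> computed result.
Batch = list[tuple[str, float]]
Processor = Callable[[Batch], dict[str, float]]


@dataclass
class ParallelTestReport:
    mismatched: list[str] = field(default_factory=list)     # results differ
    missing_on_dr: list[str] = field(default_factory=list)  # DR system dropped these
    extra_on_dr: list[str] = field(default_factory=list)    # only DR produced these
    dr_seconds: float = 0.0
    within_window: bool = True

    @property
    def passed(self) -> bool:
        return self.within_window and not (
            self.mismatched or self.missing_on_dr or self.extra_on_dr
        )


def compare_batch(batch: Batch, primary_results: dict[str, float],
                  dr_process: Processor, batch_window_seconds: float,
                  tolerance: float = 1e-9) -> ParallelTestReport:
    """Run a copy of the batch on the DR system and compare it with production output."""
    start = time.perf_counter()
    dr_results = dr_process(list(batch))  # the DR system works on a copy of the batch
    elapsed = time.perf_counter() - start

    report = ParallelTestReport(dr_seconds=elapsed,
                                within_window=elapsed <= batch_window_seconds)
    # Accuracy: every production result must be present and match on the DR side.
    for txn_id, expected in primary_results.items():
        if txn_id not in dr_results:
            report.missing_on_dr.append(txn_id)
        elif abs(dr_results[txn_id] - expected) > tolerance:
            report.mismatched.append(txn_id)
    report.extra_on_dr = sorted(set(dr_results) - set(primary_results))
    return report
```

Measuring the disaster recovery run against a batch window, rather than against the primary system’s elapsed time, is one reasonable design choice: it asks whether the standby system could finish the work in time to meet the business’s needs, which is the question a parallel test is meant to answer.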

Highly interactive applications are more difficult to test in a strictly parallel test. Instead, it might be necessary to record user interactions on the live system and then play back those interactions using an application testing tool. Responses, accuracy, and timing can then be checked after the test to determine whether the disaster recovery system worked properly.

Although a parallel test may be difficult to set up, its results can provide a good indication of whether disaster recovery systems will perform during a disaster. Also, the risks associated with a parallel test are low, since a failure of the disaster recovery system will not affect real business transactions.

Remember The parallel test includes loading data onto recovery systems without taking production systems down.

Full interruption (or cutover)

A full interruption (or cutover) test is similar to a parallel test except that in a full interruption test, a function’s primary systems are shut off or disconnected. A full interruption test is the ultimate test of a disaster recovery plan because one or more of the business’s critical functions depends on the availability, integrity, and accuracy of the recovery systems.

A full interruption test should be performed only after successful walkthroughs and at least one parallel test. In a full interruption test, backup systems process the full production workload and all primary and ancillary functions, including user access, administrative access, integrations with other applications, support, reporting, and whatever else the main production environment needs to support.

Remember A full interruption test is the ultimate test of the ability for a disaster recovery system to perform properly in a real disaster, but it’s also the test with the highest risk and cost.

Participate in Business Continuity Planning and Exercises

Business continuity and disaster recovery planning are closely related but distinctly different activities. As described in Chapter 3, business continuity focuses on keeping a business running after a disaster or other event has occurred; disaster recovery deals with restoring the organization and its affected processes and capabilities to normal operations.

Crossreference This section covers Objective 7.13 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Tip If you don’t recall the similarities and differences between business continuity and disaster recovery planning, we strongly recommend that you refer to Chapter 3!

Security professionals need to take an active role in their organization’s business continuity planning activities and related exercises. As a CISSP, you’ll be a recognized expert in the area of business continuity and disaster recovery, and you’ll need to contribute your specialized knowledge and experience to help your organization develop and implement effective, comprehensive business continuity and disaster recovery plans.

Implement and Manage Physical Security

Physical security is yet another important aspect of the security professional’s responsibilities. Important physical security concepts and technologies are covered extensively in Chapters 5 and 7.

Crossreference This section covers Objective 7.14 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

As with other information security concepts, ensuring physical security requires appropriate controls at the physical perimeter (including the building exterior, parking areas, and common grounds) as well as internal security controls. These controls protect personnel (most important of all), as well as other physical and information assets, from various threats, such as fire, flooding, severe weather, civil disturbances, terrorism, criminal activity, and workplace violence. Physical security is discussed further in Chapter 5.

Address Personnel Safety and Security Concerns

Security professionals contribute to the safety and security of personnel by helping their organizations develop and implement effective personnel security policies (discussed in Chapter 3) and through physical security measures (discussed in the preceding section, as well as Chapters 5 and 7).

Crossreference This section covers Objective 7.15 of the Security Operations domain in the CISSP Exam Outline (May 1, 2021).

Several important aspects of personnel safety and security that need to be understood and addressed include

· Travel: Personnel may be at greater risk when traveling. They may unwittingly travel into or get lost in a high-crime area of a city, for example. Even at the airport, criminals are looking for travelers, whether for business or leisure, who are easy marks. When employees are traveling, the organization should ensure that its personnel are appropriately briefed on important safety precautions, including local laws and customs (particularly for international travel), travel advisories, and high-crime areas to avoid (if possible). Organizations should develop a means for being able to contact traveling employees in the event of an emergency. Traveling in groups, checking in frequently with other members of the travel group, and using laptop security cables are a few examples of security precautions to consider.

· Security training and awareness: Security training and awareness should include not only information security, but also personal safety and security. Providing your employees with safety and security information that they can use to protect themselves and their families helps promote a positive culture that extends to every important aspect of our lives. Teaching your employees not to post personal data, travel plans, and other sensitive information on social media, for example, can help them avoid identity theft, fraud, burglary, blackmail, extortion, and other crimes that may be perpetrated by violent criminals (not just faraway cybercriminals) in their homes and communities. Security training and awareness programs are discussed further in Chapter 3.

· Emergency management: Saving human lives is the first priority in any emergency situation. Personnel should understand the basics, such as calling 911 (or another local emergency number or system), first-aid care, CPR, use of an automated external defibrillator (AED), fire evacuation routes, and other topics.

· Duress: Personnel may be subjected to or threatened by coercion or violence for various purposes. Your employees should know what resources are available to them from your organization, such as legal or financial assistance, employee assistance programs, grief counseling, and suicide prevention. In public areas, organizations should implement duress alarms so that personnel can summon assistance. Personnel security and safety awareness training should cover not only what resources are available and how to use them, but also the tactics, scams, and schemes that threat actors may use to achieve their deviant purposes.

Remember Saving human lives is the first priority in any life-threatening situation.
