

Enterprise Security Operation Architecture Based on General Technology

Posted by loope at 2020-02-26

1、 Development bottleneck of enterprise network security construction

The SANS Sliding Scale of Cyber Security divides the investment and work of network security construction into five stages: architecture, passive defense, active defense, intelligence, and offense (deterrence).

Most traditional enterprises (as distinct from security vendors and Internet companies) can, after a period of effort, get from nothing to a basic architecture, and then move from "firefighting" (passive) to proactive construction (active). Enterprises at this stage generally have the following capabilities:

Basic attack and defense capability: understand common network attack and defense techniques and be able to carry out penetration tests.

Threat protection capability: use multiple means to protect against and monitor common network attack threats, forming defense in depth. Collect and use threat intelligence for traceability analysis and compromise detection.

Security operations capability: handle security events in a complete closed loop and form an effective risk control mechanism.

Once the basic survival problems of enterprise security are solved, the main contradiction shifts from a lack of tools to the question of how to develop efficiently, evenly, and sustainably. At this stage, we believe enterprise security has relatively mature single-point attack and defense capabilities but lacks overall security capability. Intuitively, the security team can solve most security problems, but it invests too little in tracking and researching new threats and new security technologies. When the data center's business scale grows within a limited range, the team can barely maintain the strength of protection by adding staff, but this crude approach can never keep up with the actual pace of IT development. The main bottlenecks are:

(1) Insufficient security resource scheduling capability

1. Security resource deployment

With the rapid development of virtualization and cloud computing, the computing resources of the data center have become highly elastic and scalable. For security, this means the endpoints that need to be protected and managed change dynamically every day, or even every moment. Agent-based endpoint protection must keep up with the speed of business change: deploy quickly and effectively, keep endpoints under effective management, and promptly track each agent's online, offline, policy, and log status.

In addition, as the business develops, the network structure grows more complex. On the one hand, the data center interacts with the businesses of multiple parties and therefore has multiple boundaries; on the other hand, monitoring and protection must stay consistent across internal zones. Link-based traffic protection has to cover multiple business flows and data flows, with high traffic volumes and more complex access requirements.

2. Security policy management

In security operations, the security policy must be adjusted constantly based on threat intelligence and compromise detection. When the pool of security resources is large, policies must be distributed accurately and promptly and managed in a unified way.

(2) Insufficient security analysis capability

In single-point attack and defense scenarios, security analysis relies mainly on the management platform of each individual security device. To analyze the overall security posture, the enterprise builds a situational awareness platform that collects, analyzes, and displays security logs and threat intelligence in a unified way. Collecting all logs challenges the platform's data processing capacity and calls for big data processing and storage capabilities.

(3) Security positioning and value

This is a well-worn topic. Enterprise security construction has reached a stage where it has a relatively mature methodology and protection system, yet from the outside it still looks like an exercise in "self-improvement". Other units and departments can understand and accept the risks reported by the security department and are willing to cooperate with remediation, but they do not easily understand systematic security work such as defense in depth, security operations, or even vulnerability research. After all, security is a relatively niche technical field. To promote security work, we need not only top-down emphasis through enterprise strategy and organizational structure, but also bottom-up output of security services and security capabilities, especially technology output, to build a sense of identification.

Of these three problems, insufficient resource scheduling capability stems from the failure to pool security capabilities at the bottom layer, and insufficient security analysis capability stems from the failure to process all security data in a unified way. As for security's positioning, it goes without saying: if the work is not done well, it has no value.

2、 Transformation from engineering equipment to practical security operations capability

Based on the gap analysis above, we can sketch out the requirements for the security technology architecture:

(1) Overall demand

1. Adopt general technical solutions.

2. Meet the needs of big data processing, high concurrency, and high availability.

3. Meet distributed deployment requirements.

(2) Agent-based endpoint management

1. Able to manage more than 100,000 agents.

2. With agents active concurrently, able to manage the full life cycle of each agent: going online, going offline, policy distribution, and log collection.
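
The agent life cycle described above (online, offline, policy distribution, log collection) can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the class names, the heartbeat-based definition of "offline", and the 60-second timeout do not come from any specific product, and a real deployment at the 100,000-agent scale would need persistent, clustered state.

```python
from dataclasses import dataclass, field
from enum import Enum


class AgentState(Enum):
    ONLINE = "online"
    OFFLINE = "offline"


@dataclass
class AgentRecord:
    agent_id: str
    last_heartbeat: float       # timestamp in seconds, supplied by the caller
    policy_version: int = 0     # last policy version the agent acknowledged
    logs_received: int = 0      # number of log batches collected so far


@dataclass
class AgentRegistry:
    heartbeat_timeout: float = 60.0   # assumed value; tune per environment
    agents: dict = field(default_factory=dict)

    def heartbeat(self, agent_id: str, now: float, policy_version: int) -> None:
        """Register a new agent (going online) or refresh an existing one."""
        record = self.agents.get(agent_id)
        if record is None:
            record = AgentRecord(agent_id, now)
            self.agents[agent_id] = record
        record.last_heartbeat = now
        record.policy_version = policy_version

    def record_logs(self, agent_id: str, batches: int = 1) -> None:
        """Count log batches collected from an agent."""
        self.agents[agent_id].logs_received += batches

    def state(self, agent_id: str, now: float) -> AgentState:
        """An agent is treated as offline once its heartbeat is too old."""
        record = self.agents[agent_id]
        if now - record.last_heartbeat > self.heartbeat_timeout:
            return AgentState.OFFLINE
        return AgentState.ONLINE

    def needs_policy(self, target_version: int, now: float) -> list:
        """Online agents that have not yet acknowledged the target policy."""
        return sorted(
            a.agent_id
            for a in self.agents.values()
            if a.policy_version < target_version
            and self.state(a.agent_id, now) is AgentState.ONLINE
        )
```

Passing `now` in explicitly rather than reading the clock inside the registry keeps the sketch deterministic and easy to test.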

(3) Link-based traffic protection

1. Able to process a single 10-gigabit link.

2. Able to handle highly concurrent business access.

3. Able to handle access from multiple links.

(4) Security resource management and scheduling

1. Management covers all security resources in the data center.

2. Able to issue and execute policies within minutes.

(5) Security log analysis

1. Able to process more than 100,000 logs per second in real time.

2. Able to retrieve and store massive volumes of logs.

3. Able to present analysis results.

4. Able to link with security resource management and scheduling.
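
To get a feel for what the log requirement implies, the following back-of-the-envelope sketch converts 100,000 logs per second into an ingest rate and a daily raw volume. The average record size of 500 bytes is purely an assumption for illustration; the real figure depends on the log sources and should be measured.

```python
LOGS_PER_SECOND = 100_000   # from the requirement above
AVG_LOG_BYTES = 500         # assumption: average size of one log record
SECONDS_PER_DAY = 86_400


def ingest_rate_mb_per_s(logs_per_second: int, avg_bytes: int) -> float:
    """Sustained ingest rate in megabytes per second (1 MB = 10**6 bytes)."""
    return logs_per_second * avg_bytes / 1_000_000


def daily_volume_tb(logs_per_second: int, avg_bytes: int) -> float:
    """Raw, uncompressed volume per day in terabytes (1 TB = 10**12 bytes)."""
    return logs_per_second * avg_bytes * SECONDS_PER_DAY / 1e12


if __name__ == "__main__":
    print(ingest_rate_mb_per_s(LOGS_PER_SECOND, AVG_LOG_BYTES), "MB/s")
    print(daily_volume_tb(LOGS_PER_SECOND, AVG_LOG_BYTES), "TB/day")
```

Under that assumed record size, the platform would have to sustain about 50 MB/s of ingest and store roughly 4.32 TB of raw logs per day before replication or compression.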

As you can see, security demands as much processing power as the business does. This matches the original intent of security work: since security has to safeguard the business, it must not become a performance bottleneck, and it needs at least the same processing capacity. In practice, because security has to cover the business of the entire data center, its performance requirements are often higher than those of any single business system. Security therefore has a real basis for exporting technology to the business.

3、 Expand enterprise network security technology architecture

Based on the analysis above, we expand the enterprise network security technology architecture, shifting the focus of security construction from accumulating tool sets to building the operation platform and the underlying architecture, and standardizing and pooling security resources.

(1) Content

This layer is essentially the sorting and integration of existing security measures. All kinds of security systems, whether commercially procured "boxes" or software that is open source or developed in-house, have clear tool attributes: first, they are deployed on the "front line" and provide the most direct protection; second, they are independent, with clearly defined functions; third, they are replaceable, and similar products can in principle be "plug and play"; fourth, they should be under unified management.

Integrated, these security measures form a basic defense-in-depth system, which we describe as a "family bucket combo" or "big platter plan". Security measures are embedded in every stage of IT construction to form standardized protection. In other words, whenever a new endpoint goes online, a full set of agent-based endpoint protection is installed by default; wherever there is a link such as a boundary, business flow, or data flow, its traffic is diverted to the traffic scrubbing resource pool for unified protection and monitoring. In addition, online businesses, services, systems, and devices are connected to the various gateways for unified access control, while active probing provides vulnerability detection and asset management, and threat intelligence is brought in.

On this basis, security becomes a standardized service, which effectively avoids uneven protection and difficult product upgrades, and facilitates horizontal scaling and large-scale export of security capabilities.

(2) Operation platform

The operation platform mainly manages and schedules security resources and provides the analysis and presentation that connect to operations work. Through data analysis, it delivers security monitoring, response, and early warning capabilities, resists external threats, and keeps the business running safely and stably.

1. Interface drive

It exposes security resources externally through a unified interface, mainly covering the request and release of resources such as endpoint agent distribution, traffic protection and monitoring access, gateway access, and active probing and scanning, so that the business can consume pooled security resources automatically.
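
The request-and-release idea can be sketched as a thin facade over per-capability pools. This is a hypothetical model only: the class names, the resource kinds such as "traffic_monitoring", and the unit-based capacity accounting are assumptions made for illustration, not the interface of any real platform.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityResourcePool:
    """Hands out units of one pooled security capability (hypothetical model)."""
    kind: str        # e.g. "endpoint_agent", "traffic_monitoring"
    capacity: int
    allocations: dict = field(default_factory=dict)   # business_id -> units

    def in_use(self) -> int:
        return sum(self.allocations.values())

    def acquire(self, business_id: str, units: int = 1) -> bool:
        """Reserve units for a business system; refuse if the pool is full."""
        if units <= 0 or self.in_use() + units > self.capacity:
            return False
        self.allocations[business_id] = self.allocations.get(business_id, 0) + units
        return True

    def release(self, business_id: str) -> int:
        """Return all units held by a business system to the pool."""
        return self.allocations.pop(business_id, 0)


class SecurityInterface:
    """Single entry point through which business systems use the pools."""

    def __init__(self, pools):
        self._pools = {pool.kind: pool for pool in pools}

    def request(self, kind: str, business_id: str, units: int = 1) -> bool:
        pool = self._pools.get(kind)
        return pool is not None and pool.acquire(business_id, units)

    def release(self, kind: str, business_id: str) -> int:
        pool = self._pools.get(kind)
        return pool.release(business_id) if pool is not None else 0
```

A business system would call `request("traffic_monitoring", "app1", 2)` when it goes online and `release("traffic_monitoring", "app1")` when it is retired, without knowing which device serves the request.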

2. Cluster management

Manage all security resources in a unified way, distribute and update policies, and collect logs for real-time processing.

3. Service Bus

Message queues and caches are set up for the massive volume of collected logs, and real-time analysis, processing, storage, and retrieval are carried out with big data technology.

4. Unified presentation

Accumulate security analysis cases, form security scenarios and analysis rules, and display data processing results in a unified way. Establish response and disposal processes in line with the security operations structure to control security risks.

(3) Underlying architecture

1. Security technology

Although a large-scale security technology architecture is built to improve protection, attack and defense techniques remain the foundation and starting point for analysis and protection, and risk control remains the main thread and goal running through all of the work.

2. High availability

The security technology architecture must meet high availability requirements. On the one hand, the performance requirements mean the management platform has to be deployed as a cluster on stable, reliable computing infrastructure. On the other hand, the goal of serving the business means security resources must be deployed physically close to the business systems. Beyond the multi-center deployment the data center itself adopts for high availability, security resources need to be deployed at multiple physical locations even within a single data center. The architecture therefore has to take distributed deployment into account.

3. High concurrency

A single security tool cannot meet the performance requirements of massive traffic and log volumes, so tools must be deployed as clusters. Load balancing is used to distribute the business load across security resources.
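
As a rough illustration of load scheduling across a cluster of identical security tools, the sketch below always assigns the next flow to the node carrying the least load. It is a simplified stand-in for what a real load balancer does; the node names and the weight-based notion of load are assumptions.

```python
import heapq


class LeastLoadBalancer:
    """Assigns each new flow to the node that currently carries the least load."""

    def __init__(self, nodes):
        # Min-heap of (current load, node name); ties are broken by name.
        self._heap = [(0, name) for name in sorted(nodes)]
        heapq.heapify(self._heap)

    def assign(self, weight: int = 1) -> str:
        """Pick the least-loaded node and charge it with the flow's weight."""
        load, node = heapq.heappop(self._heap)
        heapq.heappush(self._heap, (load + weight, node))
        return node

    def loads(self) -> dict:
        """Current load per node."""
        return {node: load for load, node in self._heap}


if __name__ == "__main__":
    balancer = LeastLoadBalancer(["ids-1", "ids-2", "ids-3"])
    for _ in range(4):
        print(balancer.assign())
    print(balancer.loads())
```

The sketch never decreases a node's load; a production scheduler would also release load when flows end and remove nodes that fail health checks.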

4. Big data

Real-time processing, fast retrieval, and storage for massive volumes of logs.

(4) Deployment example

1. LVS/Nginx is used as the cluster deployment tool for load balancing.

2. Kubernetes is used to provide the basic computing environment for the security platform.

3. ZooKeeper is used for unified management of security tool policies.

4. Filebeat is used to collect and transmit logs.

5. Kafka is used as the message queue that receives logs.

6. Flink is used to process the logs in real time.

7. Hive is used as big data storage.

8. Logstash is used to receive the real-time processing results of logs.

9. Elasticsearch is used to store real-time processing results and provide full-text retrieval.

10. Kibana is used to display the data in Elasticsearch.

11. Redis is used as the cache database and MySQL as the main storage.

12. Jira is used for work order management and for following up on log analysis results; it is connected to ZooKeeper for policy adjustment and distribution.

This article focuses on the transformation of the enterprise security technology architecture, so it does not go into detailed tool selection.

4、 Integration with security operations

(1) Data processing flow

Security event handling should form a closed loop following the WPDRRC process: warning, protection, detection, response, recovery, and counterattack. The data processing flow is what puts the security event life cycle into practice.

1. Basic data classification

The data collected at the tool level can be classified into the following categories:

Intelligence: mainly external threat intelligence collected through various channels.

Traceability and forensics: the large volume of logs for traceability and forensics generated by link-traffic DPI (deep packet inspection) tools and EDR (endpoint detection and response) tools, as well as infrastructure operation logs.

Warning: threat alerts detected by the various security tools.

Monitoring: data from the various security tools' monitoring and scanning of business systems.
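
The four categories above can be expressed as a simple lookup from data source to category. The source names in the table (for example "ids_alert" or "vuln_scan") are hypothetical labels chosen for illustration; only the four categories and the DPI, EDR, and infrastructure log examples come from the text.

```python
from enum import Enum


class DataCategory(Enum):
    INTELLIGENCE = "intelligence"
    FORENSICS = "traceability_and_forensics"
    WARNING = "warning"
    MONITORING = "monitoring"


# Hypothetical source names mapped onto the four categories.
SOURCE_CATEGORY = {
    "threat_feed": DataCategory.INTELLIGENCE,
    "dpi": DataCategory.FORENSICS,
    "edr": DataCategory.FORENSICS,
    "infrastructure_log": DataCategory.FORENSICS,
    "ids_alert": DataCategory.WARNING,
    "waf_alert": DataCategory.WARNING,
    "vuln_scan": DataCategory.MONITORING,
    "service_monitor": DataCategory.MONITORING,
}


def classify(record: dict) -> DataCategory:
    """Return the category of a record based on its 'source' field."""
    source = record.get("source")
    if source not in SOURCE_CATEGORY:
        raise ValueError(f"unknown data source: {source!r}")
    return SOURCE_CATEGORY[source]
```

Rejecting unknown sources instead of guessing a default keeps unclassified data from silently entering the wrong processing path.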

2. Data processing module

The processing level mainly includes the following modules:

Threat intelligence base: select intelligence data with high reliability and applicability to build the threat intelligence base, and form early warning information for comparative analysis.

Correlation analysis engine: based on warning data and monitoring data, judge the accuracy, severity, and urgency of threat alerts so that they can be responded to. Threat alerts with high accuracy are fed into the threat intelligence base as internal intelligence.

Traceability and forensics module: compare the threat intelligence base against traceability and forensics data, determine whether a compromise has occurred, and establish the scope and severity of the impact for response.

Response and disposal platform: for situations that require action, create work orders by priority to adjust tool-level policies.
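
One possible way to turn accuracy, severity, and urgency into a work order priority is sketched below. The scoring formula, the 0-to-1 scales, and both thresholds are assumptions made for illustration; the article does not specify how the correlation analysis engine weighs the three factors.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    name: str
    accuracy: float   # 0..1, confidence that the alert is a true positive
    severity: float   # 0..1, potential impact
    urgency: float    # 0..1, how quickly action is needed


def priority_score(alert: Alert) -> float:
    """Assumed formula: impact (mean of severity and urgency) scaled by accuracy."""
    return alert.accuracy * (alert.severity + alert.urgency) / 2


def to_work_orders(alerts, threshold: float = 0.3) -> list:
    """Keep alerts worth acting on, highest priority first."""
    actionable = [a for a in alerts if priority_score(a) >= threshold]
    return sorted(actionable, key=priority_score, reverse=True)


def internal_intelligence(alerts, min_accuracy: float = 0.9) -> list:
    """High-accuracy alerts are fed back into the threat intelligence base."""
    return [a.name for a in alerts if a.accuracy >= min_accuracy]
```

Multiplying by accuracy means a severe but unreliable alert ranks below a moderately severe alert that is almost certainly real, which is one way to keep false positives from flooding the work order queue.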

3. Data flow process

1) Intelligence data is fed into the threat intelligence base.

2) Traceability and forensics data is fed into the log receiving module and then into the full-text search engine.

3) New intelligence data is compared with the stored traceability and forensics data, and the detection results are fed into the correlation analysis engine.

4) New traceability and forensics data is compared with the existing threat intelligence base, and the detection results are fed into the correlation analysis engine.

5) Warning and monitoring data is fed into the correlation analysis engine to identify genuinely high-risk events.

6) The correlation analysis engine evaluates the accuracy, severity, and urgency of each event, assigns a priority, and passes the event to the response and disposal platform, which executes a plan to stop the situation from deteriorating; the event is also passed to the traceability and forensics module.

7) Using the full-text search engine, the traceability and forensics module analyzes the event in detail and judges the scope of impact and how the situation is developing; it develops and refines a solution and passes it to the response and disposal platform to eliminate the risk; and it produces internal threat intelligence that is fed into the threat intelligence base.

8) The response and disposal platform produces alert logs and security tool policies, which are distributed and executed through the work order system.
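
Steps 3) and 4) describe a two-way comparison: new intelligence is checked against stored logs, and new logs are checked against stored intelligence. The sketch below shows the idea with in-memory structures. The class and its dictionary index are hypothetical stand-ins for the threat intelligence base and the full-text search engine, and the IP addresses come from documentation-reserved ranges.

```python
class IntelMatcher:
    """Two-way matching between threat indicators and log observables."""

    def __init__(self):
        self.indicators = set()   # stored intelligence: IPs, domains, hashes
        self.seen = {}            # observable -> ids of logs that contain it

    def add_log(self, log_id: str, observables) -> list:
        """Index a new log and return the stored indicators it hits (step 4)."""
        hits = []
        for value in observables:
            self.seen.setdefault(value, []).append(log_id)
            if value in self.indicators:
                hits.append(value)
        return sorted(hits)

    def add_intel(self, indicator: str) -> list:
        """Store a new indicator and return earlier logs containing it (step 3)."""
        self.indicators.add(indicator)
        return list(self.seen.get(indicator, []))


if __name__ == "__main__":
    matcher = IntelMatcher()
    matcher.add_intel("203.0.113.7")
    print(matcher.add_log("log-1", ["203.0.113.7", "198.51.100.2"]))
    print(matcher.add_intel("198.51.100.2"))
```

The second direction matters because intelligence often arrives after the traffic it describes; matching new indicators against history is what reveals a compromise that went unnoticed at the time.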

(2) Sorting out security scenarios

Based on security operations experience, threats are classified by type, security scenarios are formed, and analysis methods are summarized.

1. Scenario classification and grading

The attack chain (kill chain) model divides a network attack into seven stages. Attack actions in the later stages build on the "results" of the earlier ones and mostly take place inside the data center, so they are more likely to be exposed. By contrast, the early reconnaissance and delivery stages are mostly launched from outside and in large volume. Their results are hard to confirm directly, and they are prone to false positives and false negatives.

Therefore, the first step in classifying scenarios is to sort out the compromise scenarios. Based on the core objectives of information security protection (confidentiality, integrity, and availability), we can define the main categories of compromise scenarios: denial of service, system takeover, and data leakage. Each category is then broken down into detailed items according to the different causes of these outcomes, and a targeted analysis method is designed for each.

2. Staged solutions

As mentioned in the "data flow process" above, policies must be issued and executed at least twice while handling a single event. On the one hand, different policies differ in difficulty and in the time they take to implement: blocking addresses and ports can be done quickly, but rolling out patches and updating antivirus policies takes time, especially in a large data center. On the other hand, tracing and collecting evidence also takes time, so a plan that eradicates the problem cannot be produced right away. In the early, "firefighting" phase of emergency response, the primary goal is to bring the situation under control as soon as possible and prevent it from deteriorating. Disposal measures therefore need to be arranged in echelons by time, with at least a "two-stage" solution, in which the temporary solution must be one that can be implemented immediately. As solutions are continuously optimized, they can be further divided into multiple stages, such as minute-level, hour-level, and calendar-day-level.
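
The echelon idea can be sketched by grouping disposal actions into time tiers. The tier boundaries (under one hour, under one day, longer) and the per-action time estimates are assumptions for illustration; in practice each team would set them from its own operating experience.

```python
from dataclasses import dataclass
from enum import IntEnum


class Tier(IntEnum):
    MINUTE = 1   # can be completed in under an hour
    HOUR = 2     # can be completed within a day
    DAY = 3      # takes one or more calendar days


@dataclass(frozen=True)
class Action:
    description: str
    estimated_minutes: int   # assumed estimate supplied by the operator


def tier_of(action: Action) -> Tier:
    """Map an action's estimated duration onto a time tier."""
    if action.estimated_minutes < 60:
        return Tier.MINUTE
    if action.estimated_minutes < 24 * 60:
        return Tier.HOUR
    return Tier.DAY


def build_plan(actions) -> dict:
    """Group actions into echelons, fastest actions first within each tier."""
    plan = {tier: [] for tier in Tier}
    for action in sorted(actions, key=lambda a: a.estimated_minutes):
        plan[tier_of(action)].append(action.description)
    return plan
```

For example, blocking an address estimated at 5 minutes lands in the minute-level tier and is executed first as the temporary solution, while a patch rollout estimated at two days lands in the day-level tier as part of the eradication plan.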

5、 Main benefits

(1) Evolution of "manual" security operations toward automation and intelligence

With the rapid development of IT construction and a severe network security situation, the volume and difficulty of security operations work keep growing. Using automation and intelligent technology to improve the efficiency of human work is an inevitable trend. On the one hand, it raises the amount of business the security team can support. With a limited external pool of network security professionals and a limited internal headcount budget, the security team can still provide effective security assurance for a data center of growing business scale by relying on a specialized security operations technology architecture. On the other hand, it helps make security work more professional. The core competitiveness of security work lies in accumulated knowledge of security scenarios and experience in handling events. A specialized security operations technology architecture frees experienced security staff from tedious information collection and processing, so that they can devote themselves to building up the knowledge base, tracking new technologies, and researching security governance frameworks, which effectively improves the team's professional capability.

(2) Enhance the value contribution of network security to enterprise information work

As a traditional "cost center", network security cannot generate profit directly. To demonstrate its value, we need a way to measure its benefits. Building and operating the enterprise security operations technology architecture applies advanced technologies such as big data and cloud computing to achieve high concurrency, high availability, and distributed deployment, all in real application scenarios. Exporting these technologies horizontally within the enterprise is a new source of benefit from network security.

In building the enterprise security operations technology architecture, the security team shifts its focus from specialized security technology to general information technology, greatly expands the boundary of security technology work, and opens up broad room for accumulating technical expertise and training professionals. This process helps promote exchange and joint development between security technology and other fields, and improves the acceptance and recognition of security work.

(3) Promote the transformation from passive security to active security

As the network security situation grows more severe, threats keep escalating. Traditional security methods that rely too heavily on protection tools are no longer enough to deal with complex, stealthy, and high-risk attacks. The bottom line of security work, which is to prevent the collapse of core business or even of the whole data center, is under increasing pressure. Implementing the enterprise security operations architecture moves security from passive to active, which improves overall protection capability, especially the ability to respond to advanced threats and to defend against large-scale attacks, and so further demonstrates the core value, capability, and benefit of security work.

(4) Promote the integration and support of security and business

First, security measures are embedded in every stage of the IT construction life cycle, so that security carries the business. Second, the ability to detect and control security risks is improved, further ensuring that the business runs safely and stably. Third, building strong security analysis capability and making full use of security data helps in formulating sound risk control measures, strengthens cooperation between security and the business, and allows security operations to be carried out continuously and efficiently.


Thanks to my leaders and colleagues Lu Yi and Ouyangxin for their help and support!

Thanks to Mr. Zhang Song of Huatai Securities for his guidance and valuable suggestions!

About the author: Dong Yicheng, senior network security engineer in the information security department of the Financial Information Center of the People's Bank of China, CISP, winner of the Bank Science and Technology Development Award. Responsible for building the Internet security protection system and for security operations, with a focus on penetration testing, web security, and PKI/CA.
