
Protecting the Security and Privacy of AIDA and its Data

Paula Cristina Rodrigues — Mon, 07 Nov 2022 15:24:06 +0000


Adapting the RAID platform raises many security and privacy issues that need to be addressed. These issues stem mainly from the transition to an edge architecture, which requires computing resources at the edges of the network, but also from the need to support multiple tenants and network slicing with 5G technology. As mobile phones connect and disconnect, the network is constantly changing, which requires adaptation and monitoring tools to make the best use of resources. And as the complexity of the system increases, attackers gain more opportunities to exploit it. With these issues in mind, various mechanisms and protocols were put in place to ensure the overall security of the platform.

Secure Communications and Malware Identification

The transition to an edge computing architecture introduces new vulnerabilities. Attacks such as man-in-the-middle or eavesdropping become more likely, so it is of utmost importance to secure the communication between the different devices and components. Edge architectures can rely on different solutions, which may or may not seek compliance with industry standards for mobile applications such as Mobile Edge Computing (MEC). Such standards facilitate data processing at the edge, but they also open new threats through the exposure of new APIs, which may require data flows between edge nodes and cloud components.

Secure communications for API-based applications can rely on the advances of the HTTP protocol, namely HTTP/3, which introduces features for increased security and performance (e.g., mandatory TLS 1.3 encryption and a reduced round-trip time for the initial session handshake) as well as support for unreliable links, for instance links with high packet loss. The AIDA platform leverages HTTP/3, which runs over the QUIC protocol. In addition, KubeEdge, which orchestrates the various microservices of the AIDA platform at the edge nodes, supports multiple solutions to secure communications at the control plane: besides TLS connections, it also supports the HTTP/3 protocol.

The AIDA platform also allows ISPs to detect malware on the fly. First, the platform captures network traffic (mainly DNS queries) and forwards it to a botnet analysis service. The service then produces a floating-point score expressing the probability that a specific device is infected, which can be used to trigger a response based on that value.
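As a rough illustration of how such a score could drive a response, the sketch below maps it to an action. The thresholds (`block_at`, `alert_at`), the action names and the function name `choose_response` are hypothetical; the post does not specify how responses are chosen.

```python
def choose_response(infection_score: float,
                    block_at: float = 0.9,
                    alert_at: float = 0.6) -> str:
    """Map a botnet-infection probability to a response (hypothetical policy)."""
    if not 0.0 <= infection_score <= 1.0:
        raise ValueError("infection_score must be a probability in [0, 1]")
    if infection_score >= block_at:
        return "block"  # e.g., isolate the device (hypothetical action)
    if infection_score >= alert_at:
        return "alert"  # e.g., notify the ISP operator (hypothetical action)
    return "allow"
```

Keeping the thresholds as parameters lets an operator tune the trade-off between blocking infected devices early and disturbing legitimate users.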

The detection comprises the following steps: (1) blacklist/whitelist analysis, (2) query rate analysis, (3) domain analysis (whether or not the domain was generated by a Domain Generation Algorithm, DGA) and, finally, (4) a machine learning step. This pipeline-oriented scheme aims at high speed and scalability, so packets leave the pipeline as soon as a consistent evaluation is reached. At every step, a packet can be marked as infected or not infected and leave the pipeline, except in step (2), which can only mark it as infected. A packet proceeds to the next evaluation step only if it meets neither the upper nor the lower bound criterion for infection.
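The early-exit structure of this pipeline can be sketched as a list of stages, each returning a verdict (`1.0` infected, `0.0` clean) or `None` to pass the query on. Only the stage order and the rule that stage (2) can only mark a query as infected come from the description above; the lists, the rate limit, the use of Shannon entropy as a stand-in for a real DGA detector and the placeholder ML score are all assumptions for illustration.

```python
import math
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class DnsQuery:
    device: str
    domain: str
    queries_per_minute: float


# Hypothetical lists; a real deployment would load curated threat feeds.
BLACKLIST = {"evil-c2.example"}
WHITELIST = {"example.org"}


def shannon_entropy(text: str) -> float:
    counts = Counter(text)
    return -sum(c / len(text) * math.log2(c / len(text)) for c in counts.values())


def list_stage(q: DnsQuery) -> Optional[float]:
    """(1) Blacklist/whitelist: can decide either way."""
    if q.domain in BLACKLIST:
        return 1.0
    if q.domain in WHITELIST:
        return 0.0
    return None


def rate_stage(q: DnsQuery, max_rate: float = 120.0) -> Optional[float]:
    """(2) Query rate: can only mark a query as infected."""
    return 1.0 if q.queries_per_minute > max_rate else None


def dga_stage(q: DnsQuery, lower: float = 2.5, upper: float = 3.8) -> Optional[float]:
    """(3) DGA check, approximated here by the entropy of the leftmost label."""
    h = shannon_entropy(q.domain.split(".")[0])
    if h >= upper:
        return 1.0
    if h <= lower:
        return 0.0
    return None


def ml_stage(q: DnsQuery) -> float:
    """(4) Placeholder for a trained classifier; always returns a score."""
    return min(1.0, shannon_entropy(q.domain.split(".")[0]) / 5.0)


STAGES: list[tuple[str, Callable[[DnsQuery], Optional[float]]]] = [
    ("lists", list_stage), ("rate", rate_stage), ("dga", dga_stage), ("ml", ml_stage),
]


def evaluate(q: DnsQuery) -> tuple[float, str]:
    """Run the stages in order and exit as soon as one reaches a verdict."""
    for name, stage in STAGES:
        score = stage(q)
        if score is not None:
            return score, name
    raise RuntimeError("the final stage must always return a score")
```

Cheap checks run first, so most queries never reach the comparatively expensive ML step, which is what makes the scheme fast and scalable.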

Security within Software Components

The AIDA platform is based on microservices, which require special attention to ensure their security. The platform is also expected to generate large amounts of data; therefore, intrusion detection mechanisms must be lightweight while also coping with a dynamic number of replicas and a large number of services.

Looking at the system as a whole, we were able to increase detection rates up to 60%, while only around 25% of the alarms were false positives. Without these techniques, the results would be overwhelmed by a very large number of false positives. Moreover, in environments subject to adaptation, results improved by up to 80%, improving the overall security of the system.

Fig.1 – Results of the intrusion detection methods employed
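To make the two figures quoted above concrete, the sketch below computes a detection rate as the fraction of attacks detected and the false-positive share as the fraction of alarms that were false. This reading of the metrics is an assumption, and the counts in the usage line are toy values, not experimental data.

```python
def detection_metrics(tp: int, fn: int, fp: int) -> dict[str, float]:
    """Detection rate (recall) and share of alarms that are false positives."""
    attacks = tp + fn  # every real intrusion, detected or missed
    alarms = tp + fp   # every alarm raised, correct or not
    return {
        "detection_rate": tp / attacks if attacks else 0.0,
        "false_alarm_share": fp / alarms if alarms else 0.0,
    }
```

For instance, with toy counts of 8 detected attacks, 2 missed attacks and 2 false alarms, `detection_metrics(8, 2, 2)` gives a detection rate of 0.8 and a false-alarm share of 0.2.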

Hoping to improve the intrusion detection rate and minimize false alarms, we also explored machine learning techniques for intrusion detection. To this end, we collected system calls from the microservice systems and used classification techniques to detect intrusions. The results show a high detection rate for two of the five attacks exploiting the tested vulnerabilities. Although only some attacks were detected, the false alarm rate was excellent, staying below 1% for all attacks. We further improved the machine learning results by applying a sliding window as a post-processing technique.
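A minimal sketch of sliding-window post-processing, assuming the classifier emits one binary prediction per system call (1 = anomalous) and an alarm is raised only when at least `threshold` of the last `window` predictions are anomalous. The window size, threshold and function name are hypothetical; the post does not state the exact rule used.

```python
from collections import deque


def sliding_window_alarms(predictions: list[int], window: int = 5,
                          threshold: int = 3) -> list[int]:
    """Return the indices at which the windowed anomaly count reaches the threshold."""
    recent: deque[int] = deque(maxlen=window)  # keeps only the last `window` predictions
    alarms = []
    for i, prediction in enumerate(predictions):
        recent.append(prediction)
        if sum(recent) >= threshold:
            alarms.append(i)
    return alarms
```

An isolated misclassification never fills the window on its own, which is how this kind of smoothing suppresses false alarms while sustained anomalous activity still triggers them.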

The machine learning results showed the need for a better system call representation for intrusion detection techniques. We therefore worked on a representation that conveys more information about the connections between system calls. First, we devised a classification in which system calls are divided into classes and subclasses. Then, we established relationships with different costs between these system calls to create a system call graph. Some parts of the system call graph can be seen in Fig.2. Since this representation was susceptible to our own subjectivity, we designed a validation process that gathered input from other researchers in the field. Based on their feedback, we adjusted the classification of 17.28% of the system calls. The validation of the graph has started and is still in progress.

Fig.2 – Partial representation of the system calls as a graph
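The idea of a classified, weighted system call graph can be sketched as follows. The taxonomy excerpt and the cost rule (same subclass costs 1, same class costs 2, different classes cost 4) are hypothetical placeholders, not the validated classification or the actual costs.

```python
from itertools import combinations

# Hypothetical excerpt of a class/subclass taxonomy (not the validated one).
SYSCALL_TAXONOMY = {
    "open": ("file", "access"),
    "close": ("file", "access"),
    "read": ("file", "io"),
    "write": ("file", "io"),
    "socket": ("network", "setup"),
    "connect": ("network", "setup"),
    "execve": ("process", "execution"),
}


def relation_cost(a: str, b: str) -> int:
    """Assumed rule: the closer two calls are in the taxonomy, the cheaper the edge."""
    class_a, class_b = SYSCALL_TAXONOMY[a], SYSCALL_TAXONOMY[b]
    if class_a == class_b:
        return 1  # same class and subclass
    if class_a[0] == class_b[0]:
        return 2  # same class, different subclass
    return 4      # different classes


def build_graph() -> dict[frozenset, int]:
    """Undirected weighted graph with one edge per pair of distinct system calls."""
    return {frozenset(pair): relation_cost(*pair)
            for pair in combinations(SYSCALL_TAXONOMY, 2)}


def trace_cost(trace: list[str], graph: dict[frozenset, int]) -> int:
    """Sum of edge costs between consecutive calls (a repeated call costs 0)."""
    return sum(graph.get(frozenset((a, b)), 0) for a, b in zip(trace, trace[1:]))
```

For example, the trace `open, read, write, close` costs 2 + 1 + 2 = 5, whereas a jump from file I/O to network setup costs 4 on its own, so such a representation can make unusual transitions stand out as features for a detector.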

Since the AIDA platform will receive dynamic loads, adaptations can be made throughout its execution to make the best use of the allocated resources. Self-adaptation mechanisms were therefore put in place to monitor the system and execute predefined actions that let it react to changes, enabling high performance and availability. To do so, we will use the Trustworthiness Monitoring and Assessment (TMA) framework, which enables self-adaptation in cloud and edge applications. It does so through its REST interfaces, which interact with the probes and actuators of the managed element (i.e., the AIDA platform). Since TMA can easily be tailored to any aspect to be monitored (e.g., performance, availability, security), it was chosen for the AIDA platform. In addition, we have recently released a new dashboard for managing and visualizing TMA configurations.
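The probe/actuator interaction described above can be illustrated with a generic monitor-analyze-plan-execute loop. This is a minimal sketch under stated assumptions: it does not use TMA's actual REST API, and `Rule`, `AdaptationLoop`, the metric names and the actions are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Rule:
    """Trigger `action` when the named metric exceeds `threshold` (hypothetical format)."""
    metric: str
    threshold: float
    action: str


@dataclass
class AdaptationLoop:
    probes: dict[str, Callable[[], float]]    # metric name -> probe reading
    actuators: dict[str, Callable[[], None]]  # action name -> actuator call
    rules: list[Rule]
    history: list[str] = field(default_factory=list)

    def run_once(self) -> list[str]:
        # Monitor: read every probe of the managed element.
        readings = {name: probe() for name, probe in self.probes.items()}
        # Analyze: find the rules whose thresholds are exceeded.
        violated = [r for r in self.rules if readings[r.metric] > r.threshold]
        # Plan: keep each action once, in rule order.
        plan = list(dict.fromkeys(r.action for r in violated))
        # Execute: invoke the corresponding actuators.
        for action in plan:
            self.actuators[action]()
            self.history.append(action)
        return plan
```

In use, a probe reporting 92% CPU against a rule `Rule("cpu", 0.8, "scale_out")` would cause one call to the `scale_out` actuator per loop iteration; in a real deployment the probes and actuators would be calls to the managed platform's interfaces.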

Data Privacy

Ensuring data privacy in the AIDA platform requires suitable anonymization approaches to store and process large amounts of data. A privacy framework was developed to overcome the difficulty of selecting and configuring the appropriate mechanism to fulfill the project requirements. This privacy framework makes it possible to implement, apply, and assess Privacy-Preserving Mechanisms (PPMs) according to the pipeline shown in Fig. 3.

Fig. 3 – Privacy Framework architecture

The privacy framework is structured as a main Python package with a set of subpackages, each containing the corresponding adapter, that is, an abstract class that can be extended by implementing its abstract methods (i.e., the methods relevant to that component). These adapters make the framework easy to extend by allowing new features (e.g., new PPMs or metrics) to be implemented.
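The adapter pattern described above can be sketched with Python's `abc` module. The class names (`PPMAdapter`, `MetricAdapter`, `Suppression`, `ChangedCellRatio`) and the `run_pipeline` helper are hypothetical and only illustrate how abstract adapters let new mechanisms and metrics plug into a common implement/apply/assess flow; they are not the framework's actual API.

```python
from abc import ABC, abstractmethod

Record = dict[str, object]


class PPMAdapter(ABC):
    """Adapter for privacy-preserving mechanisms (hypothetical name)."""

    @abstractmethod
    def apply(self, records: list[Record]) -> list[Record]:
        """Return an anonymized copy of the records."""


class MetricAdapter(ABC):
    """Adapter for privacy or utility metrics (hypothetical name)."""

    @abstractmethod
    def evaluate(self, original: list[Record], anonymized: list[Record]) -> float:
        """Score the anonymized data against the original."""


class Suppression(PPMAdapter):
    """Replace the values of the given attributes with '*'."""

    def __init__(self, attributes: list[str]):
        self.attributes = set(attributes)

    def apply(self, records):
        return [{k: ("*" if k in self.attributes else v) for k, v in r.items()}
                for r in records]


class ChangedCellRatio(MetricAdapter):
    """Fraction of attribute values altered by the mechanisms (crude information loss)."""

    def evaluate(self, original, anonymized):
        total = sum(len(r) for r in original)
        changed = sum(o[k] != a[k] for o, a in zip(original, anonymized) for k in o)
        return changed / total if total else 0.0


def run_pipeline(records, mechanisms, metrics):
    """Apply the mechanisms in order, then assess the result with every metric."""
    anonymized = records
    for mechanism in mechanisms:
        anonymized = mechanism.apply(anonymized)
    return anonymized, {type(m).__name__: m.evaluate(records, anonymized) for m in metrics}
```

Adding a new mechanism only requires a subclass of `PPMAdapter` implementing `apply`; trying to instantiate an adapter that leaves an abstract method unimplemented raises `TypeError`, which catches incomplete extensions early.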

These strategies are complemented by SOTERIA, a distributed system for privacy-preserving machine learning. It was built with both scalability and fault tolerance in mind, allowing large datasets to be processed. See our December post for details.

By University of Coimbra

How can we protect the security and the privacy of the AIDA platform?

Paula Cristina Rodrigues — Mon, 13 Dec 2021 10:39:46 +0000


The evolution of the RAID platform during the AIDA project brings many benefits, but also many security and privacy concerns. Thus, identifying and implementing measures that address these new risks without degrading performance is of utmost importance. The main challenges relate to the transition to the edge, which pushes computational power to the edges of the network, to the integration of 5G with support for multiple tenants and network slicing, and finally to the privacy of the data gathered and analyzed.

Given these changes to the network, communications need to be verified to ensure that they are secure and that performance is not affected. The network is constantly changing as devices connect to and disconnect from it and their traffic rises and falls, which makes it necessary to monitor the platform so that it can respond quickly to any change. These changes also create new potential entry points for attackers and make it harder to defend against attacks that were already possible.

Figure 1: Overview of the changes to the Architecture

Providing Secure Communication among the Components

The AIDA platform, with components running both at the edge of the network and at the core, requires secure communication channels to ensure that the exchanged information is protected against threats such as eavesdropping and man-in-the-middle attacks. A critical security aspect is support for authentication, authorization and accounting (AAA) by all services/functions of the AIDA platform, regardless of where they run.

Robust and secure communication approaches exist, such as the Transport Layer Security (TLS) protocol, which is widely used today to ensure the authentication, confidentiality and integrity of the exchanged data. From this perspective, components should support TLS 1.2 or later, preferably TLS 1.3, given its higher protection level and faster handshake.
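As a minimal sketch of this recommendation, Python's standard `ssl` module can enforce a TLS floor on both sides of a connection. The helper names and file paths are placeholders, and the optional mutual-TLS setup (clients authenticated against a private CA) is one common way to authenticate microservices to each other, not necessarily the configuration used in AIDA.

```python
import ssl
from typing import Optional


def make_client_context(ca_file: Optional[str] = None,
                        require_tls13: bool = False) -> ssl.SSLContext:
    """Client context that verifies the server and refuses anything below TLS 1.2."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = (ssl.TLSVersion.TLSv1_3 if require_tls13
                           else ssl.TLSVersion.TLSv1_2)
    return ctx


def make_server_context(cert_file: str, key_file: str,
                        client_ca_file: Optional[str] = None,
                        require_tls13: bool = False) -> ssl.SSLContext:
    """Server context; with client_ca_file, clients must present a certificate (mutual TLS)."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = (ssl.TLSVersion.TLSv1_3 if require_tls13
                           else ssl.TLSVersion.TLSv1_2)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file is not None:
        ctx.load_verify_locations(cafile=client_ca_file)
        ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx
```

Setting `require_tls13=True` on both sides matches the stated preference for TLS 1.3, at the cost of rejecting peers that only speak TLS 1.2.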

Nonetheless, a plug-and-play solution is not simple. The existence of many microservices, which can run at the edge or at the core of the network, raises issues with the management of the keys and certificates required by TLS connections. A seamless integration with federated identity management approaches such as OpenID Connect can lead to scalability issues if not managed properly.

Assuring a Secure Operation of the Software Components

AIDA microservices need to be monitored using lightweight, fast, and efficient approaches while maintaining a high level of effectiveness. The constant modification of deployment scenarios, with auto-scaling adaptation, requires the behavior profiles used to identify deviations to generalize, so that the security level is not compromised in these dynamic environments. Since some intrusions may still go undetected, incorporating intrusion tolerance provides a way to increase security levels and ensure that the system delivers the intended service level even when intrusions evade the detection mechanisms.

Many security strategies are being evaluated and improved, such as the use of machine learning techniques and classifiers to detect intrusions. The goal is to build profiles of benign behavior that detect deviations from the “normal behavior” used to train the algorithms. After a configurable number of deviations, alarms are raised and the suspicious activity is reported. Intrusion tolerance will be most effective when applied to the key services of the architecture. Commonly used solutions are being studied to identify possible applications in the AIDA scenario. The approaches that provide tolerance to the application range from service diversity, which requires different versions of the technologies or techniques used to develop the services, to architectural patterns that can be applied statically or dynamically according to information collected from the system while in operation.
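A benign behavior profile with a configurable deviation count can be illustrated with a simple, well-known technique: recording the short system-call sequences (n-grams) seen in normal traces and counting unseen sequences at run time. This is swapped in for illustration and is not necessarily the classifier used in AIDA; `BenignProfile`, `n` and `max_deviations` are hypothetical names.

```python
def ngrams(trace: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous length-n windows of a system call trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


class BenignProfile:
    """Set of system call n-grams observed during normal operation."""

    def __init__(self, n: int = 3, max_deviations: int = 2):
        self.n = n
        self.max_deviations = max_deviations  # configurable alarm threshold
        self.known: set[tuple[str, ...]] = set()

    def train(self, benign_traces: list[list[str]]) -> None:
        for trace in benign_traces:
            self.known.update(ngrams(trace, self.n))

    def deviations(self, trace: list[str]) -> int:
        """Number of n-grams in the trace that never appeared in benign traces."""
        return sum(1 for g in ngrams(trace, self.n) if g not in self.known)

    def is_alarm(self, trace: list[str]) -> bool:
        return self.deviations(trace) >= self.max_deviations
```

Because auto-scaled replicas of the same microservice usually issue similar call sequences, a profile trained on some replicas can be reused for new ones, which relates to the generalization requirement mentioned above.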

Figure 2: High level perspective of the secure operation of the main software components

To maintain high availability, a self-adaptation mechanism can also be used to monitor and adapt the various components of the architecture, applying known actions to different components in response to changes in the environment. These actions aim to mitigate problems and improve the performance of the platform, directing resources to where they are needed to achieve high performance and availability. The mechanism also takes care of fixing identified security and privacy problems in the platform.

Maintaining the Privacy of the Data used

Regulations such as GDPR and HIPAA, together with the need to outsource data and computation to third-party infrastructures, make it critical to have privacy-preserving solutions that can be deployed in potentially untrusted environments. This is the case, for instance, in machine learning, which deals with the analysis of sensitive data, often unprotected, that may leak to adversaries at the untrusted premises. Even if this information is encrypted, other types of attacks may compromise confidentiality, as depicted in Figure 3.

Figure 3: Examples of attacks that can affect ML: Adversarial Samples, Model Extraction, Model Inversion, Reconstruction Attacks, and Membership Inference

Although the use of software-based cryptographic schemes is far from coming to a halt, Trusted Execution Environments (TEEs) are increasingly sought as an alternative that can reduce the performance overhead associated with traditional privacy-preserving schemes. In AIDA, we are exploring this technology to provide a privacy-preserving machine learning solution that is usable in practice while scaling out to large datasets. SOTERIA is a system for distributed privacy-preserving machine learning that leverages Apache Spark’s design and its MLlib APIs. It was designed to avoid changing the architecture and processing flow of Apache Spark, thereby keeping its scalability and fault tolerance properties.

Apart from cryptographic mechanisms, privacy guarantees can be provided by applying adequate anonymization mechanisms. However, selecting a privacy-preserving mechanism is quite challenging, not only because there is no standardized and universal definition of privacy, but also because mechanisms must be properly selected and configured according to the data types and privacy requirements. Moreover, the anonymization approaches employed may affect the performance of the machine learning mechanisms considered in the project. Focusing on the data types relevant to the AIDA project, we are developing a privacy framework that allows us to test configurations and to apply and assess privacy-preserving mechanisms according to the privacy and utility levels achieved on the data.
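Testing configurations against both privacy and utility can be sketched as a parameter sweep. Here k-anonymity (the size of the smallest group of records sharing the same quasi-identifier values) stands in as the privacy metric and the fraction of distinct values kept as a crude utility metric. Both metrics, the age-generalization mechanism and all function names are illustrative assumptions, not necessarily what the framework implements.

```python
from collections import Counter


def generalize_age(records: list[dict], width: int) -> list[dict]:
    """Replace each age with the lower bound of its width-sized bucket."""
    return [{**r, "age": (r["age"] // width) * width} for r in records]


def k_anonymity(records: list[dict], quasi_ids: tuple[str, ...]) -> int:
    """Size of the smallest group of records sharing all quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in records)
    return min(groups.values())


def distinct_ratio(original: list[dict], anonymized: list[dict], attr: str) -> float:
    """Crude utility: share of distinct values of `attr` that survive anonymization."""
    return len({r[attr] for r in anonymized}) / len({r[attr] for r in original})


def sweep(records: list[dict], widths: list[int],
          quasi_ids: tuple[str, ...] = ("age", "zip")) -> list[tuple[int, int, float]]:
    """Evaluate (width, k, utility) for each candidate configuration."""
    results = []
    for width in widths:
        anonymized = generalize_age(records, width)
        results.append((width, k_anonymity(anonymized, quasi_ids),
                        distinct_ratio(records, anonymized, "age")))
    return results
```

With six toy records aged 23, 27, 31, 36, 42 and 47 in a single ZIP code, widths 1, 10 and 50 give k = 1, 2 and 6 with utility 1.0, 0.5 and about 0.17: stronger privacy costs utility, which is exactly the trade-off such a framework has to expose.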

By University of Coimbra and INESC TEC