Protecting the Security and Privacy of AIDA and its Data

Adapting the RAID platform raises many security and privacy issues that need to be addressed. These issues stem mainly from the transition to an edge architecture, which places computation resources at the edges of the network, but also from the need to support multiple tenants and network slicing with 5G technology. As mobile phones connect and disconnect, the network is constantly changing, so adaptation and monitoring tools are needed to make the best use of resources. And since the complexity of the system increases, attackers have more opportunities to exploit it. With these issues in mind, various mechanisms and protocols were put in place to ensure the overall security of the platform.

Secure Communications and Malware Identification

The transition to an edge computing architecture introduces new vulnerabilities. Attacks such as Man-in-the-Middle or eavesdropping become more likely, so it is of utmost importance to ensure secure communication between the different devices and components. Edge architectures can leverage different solutions, which may or may not comply with industry standards for mobile applications such as Mobile Edge Computing (MEC). Such standards facilitate data processing at the edge, but also introduce new threats because they expose new APIs, which may require data flows between edge nodes and cloud components.

Secure communications for API-based applications can rely on advances in the HTTP protocol, namely HTTP/3, which introduces functionality for establishing secure sessions more efficiently (e.g., reduced round-trip time for the initial session handshake) and support for unreliable links, for instance those with high packet loss. The AIDA platform leverages HTTP/3, which relies on the QUIC protocol. In addition, KubeEdge orchestrates the diverse microservices of the AIDA platform at edge nodes and supports multiple solutions to secure communications at the control plane: besides TLS connections, there is also support for the HTTP/3 protocol.

The AIDA platform also allows ISPs to detect malware on the fly. First, the platform captures network traffic (mainly DNS queries) and passes it to a botnet analysis service. The service then produces a floating-point score for the probability that a specific device is infected, which can be used to trigger a response based on that value.

The detection consists of four steps: (1) blacklist/whitelist analysis, (2) query rate analysis, (3) domain analysis (whether or not the domain is DGA-generated) and finally (4) a machine learning step. This pipeline-oriented scheme aims for high speed and scalability, so packets leave the pipeline as soon as a consistent evaluation is formed. In every step a packet can be marked as infected or not infected (except step (2), which can only mark a packet as infected) and leave the pipeline. A packet moves on to the next evaluation step only if it meets neither the upper nor the lower bound criterion for infection.
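
The early-exit behaviour of such a pipeline can be sketched as follows. This is a minimal illustration only, not the AIDA implementation: the thresholds UPPER and LOWER, the Query fields, the example lists and the toy scoring inside each stage are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical bounds: a score >= UPPER is treated as infected,
# a score <= LOWER as clean; anything in between goes to the next stage.
UPPER = 0.9
LOWER = 0.1

# Hypothetical example lists.
BLACKLIST = {"bad.example"}
WHITELIST = {"good.example"}


@dataclass
class Query:
    device: str
    domain: str
    queries_per_minute: float


# A stage returns a score in [0, 1], or None when it has no opinion.
Stage = Callable[[Query], Optional[float]]


def list_stage(q: Query) -> Optional[float]:
    # Step 1: blacklist/whitelist analysis.
    if q.domain in BLACKLIST:
        return 1.0
    if q.domain in WHITELIST:
        return 0.0
    return None


def rate_stage(q: Query) -> Optional[float]:
    # Step 2: query rate analysis. It can only flag an infection,
    # never clear a query (the limit of 600 is an assumption).
    return 1.0 if q.queries_per_minute > 600 else None


def domain_stage(q: Query) -> Optional[float]:
    # Step 3: toy stand-in for a DGA detector (share of digits in the first label).
    label = q.domain.split(".")[0]
    return sum(c.isdigit() for c in label) / max(len(label), 1)


def ml_stage(q: Query) -> Optional[float]:
    # Step 4: placeholder for a trained classifier; returns a neutral score.
    return 0.5


PIPELINE: Sequence[Stage] = (list_stage, rate_stage, domain_stage, ml_stage)


def evaluate(q: Query, stages: Sequence[Stage] = PIPELINE) -> Tuple[float, str]:
    """Return (score, name of the stage that produced it), leaving the pipeline early."""
    score, last = 0.5, "none"
    for stage in stages:
        result = stage(q)
        if result is None:
            continue
        score, last = result, stage.__name__
        if score >= UPPER or score <= LOWER:
            break  # decisive evaluation: skip the remaining stages
    return score, last
```

For example, evaluate(Query("phone-1", "bad.example", 3)) leaves after the first stage, while a query with an inconclusive domain score traverses all four stages.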

Security within Software Components

The AIDA platform is based on microservices, which require special attention to ensure their security. The platform is also expected to generate large amounts of data, so intrusion detection mechanisms must be lightweight while also coping with a dynamic number of replicas and a large number of services.

When looking at the system as a whole, we were able to increase detection rates up to 60%, while only around 25% of the alarms were false positives. Without the techniques employed, the results would be overwhelmed by a very large number of false positives. Also, when considering adaptation environments, results improved by up to 80%, improving the overall security of the system.

Fig.1 – Results of the intrusion detection methods employed

Hoping to improve the intrusion detection rate and minimize false alarms, we also explored machine-learning techniques for intrusion detection. To this end, we collected system calls from the microservice systems as our data and used classification techniques to detect intrusions. The results show a high detection rate for two of the five attacks from the tested vulnerabilities. Although only some attacks were detected, the false alarm rate was excellent, staying below 1% for all attacks. We also improved the machine learning results by using a sliding window as a post-processing technique.
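
One common way to use a sliding window as a post-processing step is to raise an alarm only when enough recent classifier outputs agree. The sketch below assumes binary per-sample predictions (1 = intrusion) and uses hypothetical parameters window and min_hits; the exact scheme used in AIDA is not described here and may differ.

```python
from collections import deque
from typing import Iterable, List


def smooth_alarms(predictions: Iterable[int], window: int = 5, min_hits: int = 3) -> List[int]:
    """Raise an alarm at position i only if at least `min_hits` of the
    last `window` raw predictions (including i) were positive."""
    recent = deque(maxlen=window)  # keeps only the most recent `window` predictions
    alarms = []
    for p in predictions:
        recent.append(p)
        alarms.append(1 if sum(recent) >= min_hits else 0)
    return alarms


raw = [0, 1, 0, 0, 0, 1, 1, 1, 0, 0]
smoothed = smooth_alarms(raw)
# The isolated positive at index 1 is suppressed; the burst at 5..7 raises alarms.
```

The trade-off is a short detection delay in exchange for fewer alarms caused by isolated misclassifications.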

Based on the machine learning results, we recognized the need for a better system call representation for intrusion detection techniques. To this end, we worked on a representation that could convey more information about the connections between system calls. First, we devised a classification in which system calls were divided into classes and subclasses. Then, we established relationships of different costs between these system calls to create a system call graph. Part of the system call graph can be seen in Fig.2. Since our representation was susceptible to our own subjectivity, we designed a validation process that gathered feedback from other researchers in the area. In the classification, we adjusted 17.28% of system calls based on this validation. The graph validation has started and is in progress.

Fig.2 – Small representation of the system calls as a graph
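
To make the idea of a classified, cost-weighted system call graph more concrete, here is a small sketch. The class and subclass names, the edges and their costs are invented for illustration and are not the project's actual classification; using the cheapest path as a distance between two system calls is likewise an assumption.

```python
import heapq
from typing import Dict

# Hypothetical classification: system call -> (class, subclass).
CLASSIFICATION = {
    "open": ("file", "access"),
    "read": ("file", "io"),
    "write": ("file", "io"),
    "socket": ("network", "setup"),
    "connect": ("network", "setup"),
}

Graph = Dict[str, Dict[str, int]]


def add_edge(graph: Graph, a: str, b: str, cost: int) -> None:
    # Relationships are treated as undirected in this sketch.
    graph.setdefault(a, {})[b] = cost
    graph.setdefault(b, {})[a] = cost


GRAPH: Graph = {}
add_edge(GRAPH, "read", "write", 1)      # same class, same subclass
add_edge(GRAPH, "open", "read", 2)       # same class, different subclass
add_edge(GRAPH, "socket", "connect", 1)  # same class, same subclass
add_edge(GRAPH, "open", "socket", 5)     # different classes


def distance(graph: Graph, source: str, target: str) -> float:
    """Cost of the cheapest path between two system calls (Dijkstra's algorithm)."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if node == target:
            return cost
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, weight in graph.get(node, {}).items():
            new_cost = cost + weight
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return float("inf")  # no path between the two calls
```

With these made-up costs, distance(GRAPH, "read", "write") is small, while distance(GRAPH, "write", "connect") is large because the path crosses from the file class to the network class.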

Since the AIDA platform will receive dynamic loads, adaptations may be needed throughout execution to make the best use of the allocated resources. Self-adaptation mechanisms were therefore put in place, monitoring the system and executing pre-defined actions that let the system react to changes, allowing for high performance and availability. To do that, we will use the Trustworthiness Monitoring and Assessment (TMA) Framework. TMA enables self-adaptation mechanisms in cloud and edge applications. This is done through its REST interfaces, which interact with the probes and actuators of the managed element (i.e., the AIDA platform). TMA was chosen for the AIDA platform because it can be easily tailored to any aspect to be monitored (e.g., performance, availability, security). In addition, we have recently made available a new dashboard for managing and visualizing TMA configurations.
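
The core of such a self-adaptation loop, mapping monitored values to pre-defined actions, can be illustrated with a generic rule table. This is not the TMA API: the Rule type, the metric names, the thresholds and the action names are all hypothetical, and in a real deployment the readings would come from probes and the actions would be sent to actuators.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class Rule:
    metric: str        # name of the monitored value (hypothetical)
    threshold: float   # the rule fires when the reading exceeds this value
    action: str        # name of the pre-defined action to execute (hypothetical)


RULES: Sequence[Rule] = (
    Rule("cpu_utilisation", 0.80, "scale_out"),
    Rule("failed_logins_per_minute", 50, "block_source"),
)


def plan(readings: Dict[str, float], rules: Sequence[Rule] = RULES) -> List[str]:
    """Return the pre-defined actions triggered by the current probe readings.
    Metrics with no reading are treated as 0.0 and never trigger a rule."""
    return [r.action for r in rules if readings.get(r.metric, 0.0) > r.threshold]
```

For instance, plan({"cpu_utilisation": 0.93}) yields ["scale_out"], while an empty set of readings triggers nothing.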

Data Privacy

Ensuring data privacy in the AIDA platform demands suitable anonymization approaches for storing and processing large amounts of data. A privacy framework was developed to overcome the difficulty of selecting and configuring the appropriate mechanism to fulfill the project requirements. This privacy framework makes it possible to implement, apply, and assess Privacy-Preserving Mechanisms (PPMs) according to the pipeline below.

Fig.3 – Privacy Framework architecture

The architecture of the privacy framework consists of a main Python package with a set of subpackages. Each subpackage contains the corresponding adapter, that is, an abstract class that can be extended by implementing its abstract methods (i.e., the relevant methods for the component). These adapters make the framework easy to extend with new features (e.g., new PPMs or metrics).
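
The adapter pattern described above can be sketched with Python's abc module. The class names PPMAdapter and MetricAdapter, their method names and the two example implementations are hypothetical; the framework's real interfaces may look different.

```python
from abc import ABC, abstractmethod
from typing import List


class PPMAdapter(ABC):
    """Hypothetical adapter for Privacy-Preserving Mechanisms."""

    @abstractmethod
    def apply(self, values: List[float]) -> List[float]:
        """Return a privacy-protected copy of `values`."""


class MetricAdapter(ABC):
    """Hypothetical adapter for metrics that assess a PPM."""

    @abstractmethod
    def evaluate(self, original: List[float], protected: List[float]) -> float:
        """Return a score comparing the original and protected data."""


class RoundingPPM(PPMAdapter):
    """Example PPM: generalize values by rounding to a multiple of `step`."""

    def __init__(self, step: float) -> None:
        self.step = step

    def apply(self, values: List[float]) -> List[float]:
        return [round(v / self.step) * self.step for v in values]


class MeanAbsoluteError(MetricAdapter):
    """Example utility metric: average distortion introduced by a PPM."""

    def evaluate(self, original: List[float], protected: List[float]) -> float:
        return sum(abs(o - p) for o, p in zip(original, protected)) / len(original)


# Implement, apply, assess:
data = [12, 27, 41]
protected = RoundingPPM(step=10).apply(data)
error = MeanAbsoluteError().evaluate(data, protected)
```

Because the adapters are abstract, instantiating PPMAdapter directly raises a TypeError; a new mechanism or metric is added by subclassing and implementing the abstract method.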

These strategies are complemented by SOTERIA, which uses machine learning techniques to create a distributed privacy-preserving system. It was built taking into consideration both scalability and fault tolerance, allowing the processing of large datasets. See our December post for details.

By University of Coimbra