Handling Update Hotspots in Distributed Database Systems

Motivation
Database systems must deal with the fact that real workloads often exhibit hotspots: some items are, at certain times, accessed by concurrent transactions with high probability. This arises in telecoms, sensing, stock trading, shopping, banking, and numerous other applications. Some are as simple as counting events, such as user votes or advertisement impressions on websites. Others, such as prepaid telco plans, selling event tickets, or keeping track of remaining inventory, must not only count but also enforce a bound invariant that ensures the quantity being tracked does not […]
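The bound invariant can be made concrete with a single-node sketch, assuming a hypothetical `BoundedCounter` class guarded by one lock: a decrement is applied only if the tracked quantity stays at or above its lower bound, so concurrent buyers can never oversell. In a distributed database that lock becomes a coordination point every transaction on the hot item must pass through, which is exactly why hotspots are costly. This illustrates the invariant only; it is not the technique proposed in the post.

```python
import threading


class BoundedCounter:
    """Hypothetical counter for a hot item (e.g. remaining tickets) with a lower-bound invariant."""

    def __init__(self, initial: int, lower_bound: int = 0) -> None:
        if initial < lower_bound:
            raise ValueError("initial value already violates the bound")
        self._value = initial
        self._lower_bound = lower_bound
        self._lock = threading.Lock()  # single serialization point for the hot item

    def try_decrement(self, amount: int = 1) -> bool:
        """Decrement atomically, but only if the result stays >= lower_bound."""
        with self._lock:
            if self._value - amount < self._lower_bound:
                return False  # reject instead of overselling
            self._value -= amount
            return True

    def increment(self, amount: int = 1) -> None:
        """Increments cannot break a lower bound, but still need the lock for atomicity."""
        with self._lock:
            self._value += amount

    @property
    def value(self) -> int:
        with self._lock:
            return self._value


if __name__ == "__main__":
    tickets = BoundedCounter(initial=100)
    outcomes = []
    outcomes_lock = threading.Lock()

    def buyer() -> None:
        ok = tickets.try_decrement()
        with outcomes_lock:
            outcomes.append(ok)

    threads = [threading.Thread(target=buyer) for _ in range(150)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sum(outcomes), tickets.value)  # prints: 100 0
```

Running the script starts 150 concurrent buyers against 100 tickets; exactly 100 succeed and the counter ends at 0.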

There is always one more data bug

If we, as data scientists, receive a dataset from a reliable source, we should go ahead with the analysis (classification, clustering, deep learning, etc.), right? Well, yes, that’s what most of us (myself included) often do, especially under a tight deadline. However, this can be dangerous. Let me describe some rude awakenings I have suffered over the past decades, and remind you of some fast and easy preventive measures.
Examples (a.k.a. horror stories)
E1 Geographical data
Two decades ago, we got access to a public dataset of crossroads in California, […]
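The specific bug in the California crossroads dataset is cut off in this excerpt, so the following is only a generic illustration of the kind of fast, cheap checks worth running before any analysis: per-column missing counts and min/max, exact-duplicate rows, and a plausibility test on coordinates. The function names and the approximate California bounding box are assumptions made for this sketch.

```python
import math
from collections import Counter


def profile(rows, numeric_cols):
    """Per-column missing count and min/max, plus the number of exact duplicate rows.

    Assumes each row is a dict whose values are hashable.
    """
    report = {}
    for col in numeric_cols:
        values = [r.get(col) for r in rows]
        present = [
            v for v in values
            if v is not None and not (isinstance(v, float) and math.isnan(v))
        ]
        report[col] = {
            "missing": len(values) - len(present),
            "min": min(present) if present else None,
            "max": max(present) if present else None,
        }
    counts = Counter(tuple(sorted(r.items())) for r in rows)
    report["duplicate_rows"] = sum(c - 1 for c in counts.values() if c > 1)
    return report


# Hypothetical plausibility bounds: a rough bounding box around California.
CA_LAT = (32.5, 42.1)
CA_LON = (-124.5, -114.1)


def outside_california(rows):
    """Rows whose (lat, lon) fall outside the rough box; rows with missing coordinates are skipped."""
    flagged = []
    for r in rows:
        lat, lon = r.get("lat"), r.get("lon")
        if lat is None or lon is None:
            continue  # already counted as missing by profile()
        if not (CA_LAT[0] <= lat <= CA_LAT[1] and CA_LON[0] <= lon <= CA_LON[1]):
            flagged.append(r)
    return flagged
```

A row at (0, 0) or one with latitude and longitude swapped would show up immediately in `outside_california`, and a suspicious `min` or `max` in `profile` often points to the same problem.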

AIDA Research on Suspicious Behavior and Anomaly Detection

One of the goals of the AIDA Project is to investigate and identify new ways to help analysts find anomalous behavior (of any kind) in large and complex pools of data. We hope this can ultimately lead to significant improvements in the detection of fraudulent activity on telecom networks, especially in light of new technologies like 5G that are already expanding in the market. Moreover, we now see ever more sophisticated, complex and robust methods being used to commit fraud on telecom networks, and increasingly expanding to the world of Communication […]

Protecting the Security and Privacy of AIDA and its Data

Adapting the RAID platform raises many security and privacy issues that need to be addressed. These issues arise mainly from the transition to an edge architecture, which requires using computation resources at the edges of the network, but also from the need to support multiple tenants and network slicing with 5G technology. As mobile phones connect and disconnect, the network is constantly changing, so adaptation and monitoring tools are needed to make the best use of resources. And as the complexity of the system increases, attackers have more […]

Fraud detection, micro-clusters and scatterplots

Acknowledgements
The results and analysis presented here were produced with contributions from Mirela Cazzolato (USP, and CMU), Saranya Vijayakumar (CMU), Xinyi (Carol) Zheng (CMU), Meng-Chieh (Jeremy) Lee (CMU), Namyong Park (CMU), Pedro Fidalgo (Mobileum), Bruno Lages (Mobileum), and Agma Traina (USP).
Reminders – Problem definition and past insights
As we mentioned in the February 2022 blog post, the problem we are focusing on is to spot fraudulent behavior in a who-calls-whom-and-when graph. We distinguished between the supervised case, where we are given a list of fraudulent subscribers (labeled data), and the unsupervised one, where […]
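The excerpt does not say how micro-clusters are detected in the post, so this is a deliberately crude, hypothetical illustration rather than the authors' method. It turns (caller, callee) records into two per-subscriber features that could be drawn as a scatterplot (total outgoing calls vs. distinct callees) and reports points shared by several subscribers. A tight group of subscribers with identical behavior is one way a micro-cluster can appear on such a plot; the feature choice and the `min_size` threshold are assumptions.

```python
from collections import Counter, defaultdict


def subscriber_features(calls):
    """Map each caller to (total outgoing calls, distinct callees) from (caller, callee) pairs."""
    total = Counter()
    callees = defaultdict(set)
    for caller, callee in calls:
        total[caller] += 1
        callees[caller].add(callee)
    return {s: (total[s], len(callees[s])) for s in total}


def exact_point_groups(features, min_size=3):
    """Group subscribers that land on exactly the same scatterplot point."""
    groups = defaultdict(list)
    for subscriber, point in features.items():
        groups[point].append(subscriber)
    return {p: sorted(m) for p, m in groups.items() if len(m) >= min_size}


if __name__ == "__main__":
    # Three hypothetical subscribers each calling five distinct numbers once,
    # plus one subscriber with a more ordinary pattern.
    calls = [(f"b{i}", f"n{j}") for i in (1, 2, 3) for j in range(5)]
    calls += [("a", "x"), ("a", "x"), ("a", "y")]
    print(exact_point_groups(subscriber_features(calls)))
```

Real data would need tolerance for near-identical points rather than exact matches; exact grouping just keeps the sketch short.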

Federated Machine Learning

Federated Learning (FL) is a collaborative, decentralized, privacy-preserving technology for overcoming the challenges of data storage and data sensitivity [1]. The last few years have been strongly marked by artificial intelligence, machine learning, smart devices, and deep learning. As a result, two challenges have arisen in data science, affecting how data can be accessed and used. First, with the General Data Protection Regulation (GDPR) [2], personal data became protected by regulation: institutions cannot store or share it without users’ authorization. Another challenge is that in the era of big data, a large volume of […]
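To make collaborative, decentralized training concrete, here is a toy sketch of Federated Averaging (FedAvg, McMahan et al., 2017), one widely used FL algorithm; the excerpt does not say which algorithm the project uses. Each simulated client fits a one-feature linear model on its own data, and the server only ever receives model parameters, which it averages weighted by each client's sample count. The model, learning rate, and round counts are illustrative choices; real deployments typically add safeguards such as secure aggregation.

```python
import random


def make_client_data(rng, n, true_w=3.0, true_b=1.0, noise=0.1):
    """Simulate one client's private samples of y = true_w * x + true_b + noise."""
    xs = [rng.random() for _ in range(n)]
    return [(x, true_w * x + true_b + rng.gauss(0.0, noise)) for x in xs]


def local_train(w, b, data, lr=0.05, epochs=200):
    """Full-batch gradient descent on mean squared error, using only this client's data."""
    n = len(data)
    for _ in range(epochs):
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in data)
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in data)
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def federated_averaging(clients, rounds=20):
    """FedAvg-style loop: broadcast global params, train locally, average by sample count."""
    w, b = 0.0, 0.0
    total = sum(len(data) for data in clients)
    for _ in range(rounds):
        # Only (w, b) and the sample count leave each client, never the raw (x, y) pairs.
        updates = [(local_train(w, b, data), len(data)) for data in clients]
        w = sum(cw * n for (cw, _cb), n in updates) / total
        b = sum(cb * n for (_cw, cb), n in updates) / total
    return w, b


if __name__ == "__main__":
    rng = random.Random(0)
    clients = [make_client_data(rng, 50) for _ in range(3)]
    print(federated_averaging(clients))  # expected to be close to (3.0, 1.0)
```

The weighting by sample count means a client holding more data pulls the global model proportionally harder, which matches how a centralized model would weight those samples.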

Finding Anomalies in Large Scale Graphs

Problem definition
Given a large who-calls-whom graph, how can we find anomalies and fraud? How can we explain the results of our algorithms? This is exactly the focus of this project. We distinguish two settings: static graphs (no timestamps) and time-evolving graphs (with a timestamp for each phone call). We further subdivide each into two sub-cases: supervised and unsupervised. In the supervised case, we have labels (‘fraud’/‘honest’) for some of the nodes, while in the unsupervised case we have no labels at all.
Major lessons
For the supervised case, the natural assumption is that […]
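Both the static and the time-evolving setting can be derived from the same raw records. A minimal sketch, assuming hypothetical call records of the form (caller, callee, Unix timestamp in seconds) and an arbitrary one-hour window: the static view collapses time into per-edge call counts, while the time-evolving view keeps one weighted snapshot per window. Node labels for the supervised case would be attached separately and do not change either construction.

```python
from collections import defaultdict


def static_graph(calls):
    """Collapse (caller, callee, timestamp) records into weighted edges, dropping time."""
    weights = defaultdict(int)
    for caller, callee, _ts in calls:
        weights[(caller, callee)] += 1
    return dict(weights)


def time_evolving_graph(calls, window_seconds=3600):
    """Bucket the same records into per-window snapshots keyed by window start time."""
    snapshots = defaultdict(lambda: defaultdict(int))
    for caller, callee, ts in calls:
        window = ts - ts % window_seconds  # start of the window containing ts
        snapshots[window][(caller, callee)] += 1
    return {w: dict(edges) for w, edges in sorted(snapshots.items())}


if __name__ == "__main__":
    calls = [("A", "B", 10), ("A", "B", 4000), ("B", "C", 20)]
    print(static_graph(calls))         # {('A', 'B'): 2, ('B', 'C'): 1}
    print(time_evolving_graph(calls))  # {0: {('A', 'B'): 1, ('B', 'C'): 1}, 3600: {('A', 'B'): 1}}
```

The window size is a modeling choice: shorter windows expose bursty behavior, longer ones smooth it out.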

A Data Management Architecture for AIDA

One of the major challenges in the evolution of the RAID platform during the AIDA project is the need to further distribute the platform components to achieve greater scalability, by leveraging the increasing edge computing capacity made available by the IoT and the imminent large-scale deployment of 5G cellular technology. The advent of 5G networks and the growing adoption of Internet of Things (IoT) devices create more opportunities for data collection and processing with hybrid edge-cloud systems. In this architecture, edge devices, placed near where the data is being […]

How can we protect the security and the privacy of the AIDA platform?

The evolution of the RAID platform during the AIDA project brings many benefits, but also many security and privacy concerns to consider. Discovering and implementing measures that address these new risks without degrading performance is therefore of utmost importance. The main challenges relate to the transition to the edge, which pushes computational power to the edges of the network; to the integration of 5G, which supports multiple tenants and network slicing; and to the privacy of the data gathered and analyzed. Given these changes […]

Fraud Risk Management

5G presents an opportunity for telecom operators to capture new revenue streams from industrial digitization. In cases such as network-as-a-service (NaaS), network exposure is becoming a reality through the transformation of core telecom network assets into digital assets. With 5G, dynamic provisioning and scaling of network capacity and resources are available for the first time. The vision of managing the network as a service in the same way a developer might manage cloud resources on Azure, AWS, or Google Cloud is becoming reality through a combination of scalable infrastructure and the next generation of digital business support systems (BSS). […]
