Privacy Artificial Intelligence


Attack Simulation

In this notebook, we simulate a simple federated data poisoning attack: the training labels of some clients are shuffled, which turns those clients into adversaries.

The aim of this notebook is to present the class FederatedDataAttack implemented in, whose goal is to implement any attack on the federated data. For more information about basic federated learning concepts, refer to the A Simple Experiment notebook.

For this simulation, we use the EMNIST Digits dataset.

from shfl.data_base import Emnist
from shfl.data_distribution import NonIidDataDistribution
import numpy as np
import random

database = Emnist()
train_data, train_labels, test_data, test_labels = database.load_data()

Next, we distribute the data among 20 client nodes using a non-IID distribution over 10% of the data.

noniid_distribution = NonIidDataDistribution(database)
federated_data, test_data, test_labels = noniid_distribution.get_federated_data(num_nodes=20, percent=10)
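The exact partitioning performed by NonIidDataDistribution is defined by the library. As a rough illustration of what a non-IID split over 10% of the data can look like, the sketch below uses the shard-based scheme that is common in the federated learning literature: the kept samples are sorted by label, cut into shards, and each node receives a few shards, so it sees only a handful of classes. The function shard_split and its shards_per_node parameter are hypothetical and not part of shfl.

```python
import numpy as np


def shard_split(labels, num_nodes, percent, shards_per_node=2, seed=0):
    """Hypothetical label-skewed (non-IID) split of `percent`% of a dataset.

    labels: 1-D array of integer class labels, one per sample.
    Returns a list with one array of sample indices per node.
    """
    rng = np.random.default_rng(seed)
    # Keep a random `percent`% of the samples (integer arithmetic avoids float rounding)
    n_used = len(labels) * percent // 100
    used = rng.permutation(len(labels))[:n_used]
    # Sort the kept indices by label so that each shard covers at most a few classes
    used = used[np.argsort(labels[used], kind="stable")]
    shards = np.array_split(used, num_nodes * shards_per_node)
    # Hand out the shards to the nodes in random order
    order = rng.permutation(len(shards))
    return [
        np.concatenate([shards[j] for j in order[i * shards_per_node:(i + 1) * shards_per_node]])
        for i in range(num_nodes)
    ]


labels = np.repeat(np.arange(10), 100)  # toy labels: 10 classes x 100 samples
splits = shard_split(labels, num_nodes=20, percent=10)
print([np.unique(labels[s]).tolist() for s in splits[:3]])  # classes seen by 3 nodes
```

The shards are disjoint, so no sample is assigned to two nodes, and every node ends up with only a few distinct labels.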

At this point, we are ready to apply a data attack to some of the nodes. In this simulation, we poison 20% of them. To do so, we implement the FederatedTransformation interface in ShuffleNode, which shuffles the training labels of a node, and we create FederatedPoisoningDataAttack, which implements FederatedDataAttack by applying ShuffleNode to a given percentage of the nodes in federated_data.

from shfl.private.federated_operation import FederatedTransformation
from shfl.private.federated_attack import FederatedDataAttack


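class ShuffleNode(FederatedTransformation):
    def apply(self, labeled_data):
        # Permute the labels along the sample axis, breaking the image/label pairing.
        # np.random.shuffle is used instead of random.shuffle because the labels are
        # a 2-D one-hot NumPy array, whose rows random.shuffle can duplicate.
        np.random.shuffle(labeled_data.label)


class FederatedPoisoningDataAttack(FederatedDataAttack):
    def __init__(self, percentage):
        super().__init__()
        self._percentage = percentage
        self._adversaries = []

    @property
    def adversaries(self):
        # Indices of the nodes chosen as adversaries by apply_attack
        return self._adversaries

    def apply_attack(self, federated_data):
        num_nodes = federated_data.num_nodes()
        list_nodes = np.arange(num_nodes)
        # Randomly choose percentage% of the nodes to become adversaries
        self._adversaries = random.sample(list(list_nodes), k=int(self._percentage / 100 * num_nodes))
        boolean_adversaries = [1 if x in self._adversaries else 0 for x in list_nodes]

        # Poison only the adversarial nodes by shuffling their labels
        for node, boolean in zip(federated_data, boolean_adversaries):
            if boolean:
                node.apply_data_transformation(ShuffleNode())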
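To see what label shuffling does and does not change, here is a standalone NumPy sketch on toy one-hot labels, independent of shfl. The helper shuffle_labels is hypothetical; it permutes rows with a Generator permutation, which works for both 1-D integer labels and 2-D one-hot labels.

```python
import numpy as np


def shuffle_labels(labels, rng):
    """Return a copy of `labels` with its rows randomly permuted.

    The permutation acts on the first (sample) axis only, so each row stays a
    valid one-hot vector; only the pairing with the images is broken.
    """
    return labels[rng.permutation(len(labels))]


rng = np.random.default_rng(42)
one_hot = np.eye(10)[[0, 1, 1, 2, 3, 3, 3, 9]]  # 8 toy samples, one-hot encoded
poisoned = shuffle_labels(one_hot, rng)

# The per-class counts are identical before and after the shuffle
print(one_hot.sum(axis=0))
print(poisoned.sum(axis=0))
```

Because the class histogram of an adversarial node is unchanged, simple label statistics cannot reveal this attack; only the image/label correspondence is corrupted.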

We create a FederatedPoisoningDataAttack object with the percentage set to 20% and apply the attack over federated_data.

simple_attack = FederatedPoisoningDataAttack(percentage=20)
simple_attack.apply_attack(federated_data=federated_data)
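With 20 nodes and a percentage of 20, the attack selects 20 × 20 / 100 = 4 adversarial nodes. A float expression such as int(percentage / 100 * num_nodes) can round down for other inputs (29 / 100 * 100 evaluates to 28.999999999999996), so the hypothetical helper below computes the count with integer arithmetic before sampling distinct node indices. It is an illustrative sketch, not part of shfl.

```python
import random


def choose_adversaries(num_nodes, percentage, seed=None):
    """Hypothetical helper: pick percentage% of the node indices at random.

    Integer arithmetic avoids the floating-point rounding that
    int(percentage / 100 * num_nodes) is exposed to.
    """
    k = percentage * num_nodes // 100
    # sample() draws k distinct indices, so no node is chosen twice
    return random.Random(seed).sample(range(num_nodes), k)


adversaries = choose_adversaries(num_nodes=20, percentage=20, seed=0)
print(len(adversaries))  # 4
```

Passing a seed makes the choice of adversaries reproducible across runs, which helps when comparing experiments.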

We can retrieve the adversarial nodes to see which clients were attacked. Since they are sampled at random, the indices differ from run to run.

adversarial_nodes = simple_attack.adversaries
adversarial_nodes
[1, 8, 2, 13]

To show the effect of the attack, we select one adversarial client and a data index, and display the image together with its associated label. To read the data, we first change the data access protection (see FederatedData). Because the poisoning is a random shuffle, some labels may still match their images by chance, but most will not.

import matplotlib.pyplot as plt
from import UnprotectedAccess

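adversarial_index = 0
data_index = 10

# Switch to unprotected access so that the raw node data can be read
federated_data.configure_data_access(UnprotectedAccess())

# Show the image and print its (possibly shuffled) one-hot label
adversarial_node = federated_data[adversarial_nodes[adversarial_index]]
plt.imshow(adversarial_node.query().data[data_index])
print(adversarial_node.query().label[data_index])
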
[0. 0. 0. 0. 0. 0. 1. 0. 0. 0.]
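The printed label is one-hot encoded; the position of the 1 gives the digit. The sketch below decodes it and also quantifies how often a shuffled label still matches by chance. After a uniform shuffle, each position receives the label of a uniformly random sample from the same node, so it keeps its class with probability p_k, the share of its class k, and the expected match rate is the sum over classes of p_k squared. The helpers expected_match_rate and simulated_match_rate are hypothetical and operate on toy integer labels.

```python
import numpy as np

# The label printed above is one-hot encoded; argmax recovers the digit it encodes
label = np.array([0., 0., 0., 0., 0., 0., 1., 0., 0., 0.])
print(int(np.argmax(label)))  # 6


def expected_match_rate(digits):
    """Expected fraction of labels left unchanged by a uniform random shuffle.

    Position i receives the label of a uniformly random sample, so it keeps
    its class k with probability p_k; averaging gives sum_k p_k ** 2.
    """
    _, counts = np.unique(digits, return_counts=True)
    p = counts / counts.sum()
    return float(np.sum(p ** 2))


def simulated_match_rate(digits, rng, trials=200):
    """Monte Carlo estimate of the same quantity."""
    rates = [np.mean(digits == digits[rng.permutation(len(digits))]) for _ in range(trials)]
    return float(np.mean(rates))


rng = np.random.default_rng(0)
balanced = np.repeat(np.arange(10), 50)  # IID-like node: 10 classes, 500 samples
skewed = np.repeat([3, 7], 250)          # non-IID node: only 2 classes
print(expected_match_rate(balanced), simulated_match_rate(balanced, rng))
print(expected_match_rate(skewed), simulated_match_rate(skewed, rng))
```

On a node with ten balanced classes, roughly 10% of the shuffled labels still match. On a node holding only two equally frequent classes, as can happen under a non-IID split, about half still match, so the shuffle corrupts such nodes less.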

At this point, we can train an FL model among these clients (adversarial and regular) using a specific aggregation operator. For more information, see the A Simple Experiment notebook.
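The notebook leaves the choice of aggregation operator open. As one common example, the sketch below implements FedAvg-style aggregation, where the server averages the client parameters weighted by each client's number of local samples. It is a plain NumPy illustration, not shfl's aggregator class, and the function name federated_average is hypothetical.

```python
import numpy as np


def federated_average(client_weights, client_sizes):
    """Weighted average of client parameters (FedAvg-style aggregation).

    client_weights: list over clients; each entry is a list of NumPy arrays
                    (one per model layer), with matching shapes across clients.
    client_sizes:   number of local training samples per client.
    """
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()  # each client's weight is its share of the data
    # zip(*client_weights) groups the same layer across all clients
    return [
        sum(c * layer for c, layer in zip(coeffs, layers))
        for layers in zip(*client_weights)
    ]


clients = [[np.array([1.0, 1.0])], [np.array([3.0, 3.0])]]
print(federated_average(clients, client_sizes=[1, 3]))  # [array([2.5, 2.5])]
```

In this scheme the parameters of adversarial clients enter the average exactly like those of regular clients, which is how the poisoned labels can degrade the global model.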