Inteligencia Artificial de Privacidad


TensorFlow Model

In this notebook, we provide a simple example of how to run an experiment in a federated environment with the help of this framework. We start with a popular dataset: the framework provides functions for loading the EMNIST Digits dataset.

import shfl

database = shfl.data_base.Emnist()
train_data, train_labels, test_data, test_labels = database.load_data()

Let's inspect some properties of the loaded data.

print(len(train_data))
print(len(test_data))
print(type(train_data[0]))
train_data[0].shape
240000
40000
<class 'numpy.ndarray'>

(28, 28)

As the output shows, the dataset consists of 28 by 28 matrices: 240,000 for training and 40,000 for testing. Before starting with the federated scenario, we can take a look at a sample from the training data.

import matplotlib.pyplot as plt

plt.imshow(train_data[0])

[Figure: plot of the first training sample, train_data[0]]

We are going to simulate a federated learning scenario with a set of client nodes that hold private data and a central server that coordinates them. First, we have to simulate the data held by each client, using the dataset loaded above. In this example we assume the data is independent and identically distributed (IID) across nodes, with every node holding approximately the same amount of data.

How the data is distributed is one of the factors with the greatest impact on a federated algorithm, and there are many ways to distribute it. The framework therefore implements some of the most common distributions, so you can easily experiment with different situations. See Sampling Methods for the options the framework currently provides.
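To make the IID assumption concrete, here is a minimal NumPy sketch of such a split: shuffle the sample indices, keep a percentage of them, and cut the result into equally sized shards. The helper `iid_partition` and the toy arrays are hypothetical and are not part of the framework, whose own implementation may differ.

```python
import numpy as np

def iid_partition(data, labels, num_nodes, percent, seed=0):
    """Hypothetical helper: shuffle, keep `percent` % of the samples,
    and split them into `num_nodes` shards of (almost) equal size."""
    rng = np.random.default_rng(seed)
    n_used = int(len(data) * percent / 100)
    # A random permutation makes every shard a uniform sample of the data.
    idx = rng.permutation(len(data))[:n_used]
    shards = np.array_split(idx, num_nodes)
    return [(data[s], labels[s]) for s in shards]

# Toy stand-in for the real dataset: 1000 samples with 4 features each.
data = np.arange(1000 * 4).reshape(1000, 4)
labels = np.arange(1000) % 10

nodes = iid_partition(data, labels, num_nodes=20, percent=10)
print(len(nodes))          # 20
print(nodes[0][0].shape)   # (5, 4)
```

Because the indices are shuffled before splitting, no node is biased toward particular labels, which is what distinguishes this setting from non-IID distributions.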

iid_distribution = shfl.data_distribution.IidDataDistribution(database)
federated_data, test_data, test_label = iid_distribution.get_federated_data(num_nodes=20, percent=10)

That's it! We have created federated data from the EMNIST dataset using 20 nodes and 10 percent of the available data. This data is distributed across a set of data nodes as private data. Let's learn a little more about the federated data.

print(type(federated_data))
print(federated_data.num_nodes())
federated_data[0].private_data
<class 'shfl.private.federated_operation.FederatedData'>
20
Node private data, you can see the data for debug purposes but the data remains in the node
<class 'dict'>
{'5461062416': <shfl.private.data.LabeledData object at 0x149cd6090>}

As we can see, private data in a node is not directly accessible, but the framework provides mechanisms to use this data in a machine learning model. A federated learning algorithm is defined by two components: a machine learning model, deployed locally in each node, that learns from that node's private data; and an aggregation mechanism that combines the model parameters uploaded by the client nodes to a central node. In this example, we will build a TensorFlow learning model. The framework provides classes for using TensorFlow and Keras models in a federated learning scenario (see A Simple Experiment); your only job is to create a function that acts as the model builder. The framework also provides classes for using pretrained TensorFlow and Keras models (see Pretrained Model).

import tensorflow as tf
# If you want to run on a GPU, uncomment the following two lines.
# physical_devices = tf.config.experimental.list_physical_devices('GPU')
# tf.config.experimental.set_memory_growth(physical_devices[0], True)

class CustomDense(tf.keras.layers.Layer):
    """
    Custom linear (fully connected) layer computing inputs @ w + b.

    Attributes
    ----------
    _units : int
        Number of output units
    _w : matrix
        Weights of the layer (created in build)
    _b : array
        Bias of the layer (created in build)
    """

    def __init__(self, units=32, **kwargs):
        super(CustomDense, self).__init__(**kwargs)
        self._units = units

    def get_config(self):
        config = {'units': self._units}
        base_config = super(CustomDense, self).get_config()

        return dict(list(base_config.items()) + list(config.items()))

    def build(self, input_shape):
        """
        Create the layer's weights and bias once the input shape is known

        Parameters
        ----------
        input_shape: list
            size of inputs
        """
        self._w = self.add_weight(shape=(input_shape[-1], self._units),
                                  initializer='random_normal',
                                  trainable=True)

        self._b = self.add_weight(shape=(self._units,),
                                  initializer='random_normal',
                                  trainable=True)

    def call(self, inputs):
        """
        Apply linear layer

        Parameters
        ----------
        inputs: matrix
            Input data

        Returns
        ------
        result : matrix
            the result of linear transformation of the data
        """
        return tf.nn.bias_add(tf.matmul(inputs, self._w), self._b)


def model_builder():
    inputs = tf.keras.Input(shape=(28, 28, 1))
    x = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), padding='same', activation='relu', strides=1)(inputs)
    x = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2, padding='valid')(x)
    x = tf.keras.layers.Dropout(0.4)(x)
    x = tf.keras.layers.Conv2D(32, kernel_size=(3, 3), padding='same', activation='relu', strides=1)(x)
    x = tf.keras.layers.MaxPooling2D(pool_size=2, strides=2, padding='valid')(x)
    x = tf.keras.layers.Flatten()(x)
    x = CustomDense(128)(x)
    x = tf.nn.relu(x)
    x = tf.keras.layers.Dropout(0.1)(x)
    x = CustomDense(64)(x)
    x = tf.nn.relu(x)
    x = CustomDense(10)(x)
    outputs = tf.nn.softmax(x)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)

    model.compile(optimizer="rmsprop", loss="categorical_crossentropy", metrics=["accuracy"])

    return shfl.model.DeepLearningModel(model)
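As a sanity check on the architecture above, the following sketch traces the tensor shapes through the network and counts the trainable parameters by hand. The helper functions are hypothetical and written only for this illustration; dropout, flatten, and the activations are skipped because they add no parameters.

```python
def conv_same(shape, filters, kernel=3):
    """Conv2D with padding='same', stride 1: spatial size is unchanged."""
    h, w, c = shape
    return (h, w, filters), kernel * kernel * c * filters + filters

def max_pool(shape, pool=2):
    """MaxPooling2D with pool_size=2, strides=2, padding='valid'."""
    h, w, c = shape
    return (h // pool, w // pool, c)

def dense(n_in, n_out):
    """Linear layer: an (n_in, n_out) weight matrix plus n_out biases."""
    return n_out, n_in * n_out + n_out

params = {}
shape, params["conv_1"] = conv_same((28, 28, 1), 32)   # (28, 28, 32)
shape = max_pool(shape)                                # (14, 14, 32)
shape, params["conv_2"] = conv_same(shape, 32)         # (14, 14, 32)
shape = max_pool(shape)                                # (7, 7, 32)
flat = shape[0] * shape[1] * shape[2]                  # input size of CustomDense(128)
n, params["dense_128"] = dense(flat, 128)
n, params["dense_64"] = dense(n, 64)
n, params["dense_10"] = dense(n, 10)

print(flat)                  # 1568
print(sum(params.values()))  # 219306
```

The first CustomDense layer holds most of the parameters (1568 * 128 + 128 = 200,832), so it also accounts for most of what each client would upload per round.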

Now, the only piece missing is the aggregation operator. The framework provides several aggregation operators that we can use. In the following piece of code, we define the federated aggregation mechanism, and then the federated government, which is built from the TensorFlow model builder, the federated data, and the aggregation mechanism.

aggregator = shfl.federated_aggregator.FedAvgAggregator()
federated_government = shfl.federated_government.FederatedGovernment(model_builder, federated_data, aggregator)
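The idea behind federated averaging can be sketched in a few lines of NumPy: the server receives one list of per-layer arrays from each client and averages them layer by layer. This is an illustration under assumptions, not the framework's code: `average_weights` is a hypothetical helper, and it uses a plain unweighted mean, whereas some formulations of FedAvg weight each client by its number of samples.

```python
import numpy as np

def average_weights(clients_weights):
    """Hypothetical helper. `clients_weights` holds one entry per client;
    each entry is a list with one array per layer (same shapes for all clients)."""
    # zip(*...) groups the arrays of the same layer across clients.
    return [np.mean(np.stack(layer), axis=0) for layer in zip(*clients_weights)]

# Two toy clients, each with a 2x2 weight matrix and a bias vector.
client_a = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([0.0, 1.0])]
client_b = [np.array([[3.0, 4.0], [5.0, 6.0]]), np.array([2.0, 3.0])]

global_weights = average_weights([client_a, client_b])
print(global_weights[0].tolist())  # [[2.0, 3.0], [4.0, 5.0]]
print(global_weights[1].tolist())  # [1.0, 2.0]
```

With an IID split in which every node holds about the same amount of data, the weighted and unweighted means are nearly identical.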

If you want to see all the aggregation operators, you can check out the Aggregation Operators notebook. Before running the algorithm, we want to apply a transformation to the data. A good practice is to define a federated operation, which ensures that the transformation is applied to the federated data in all the client nodes. We want to add a channel dimension to the images and cast them to float32, so we define the following two FederatedTransformation classes.

import numpy as np

class Reshape(shfl.private.FederatedTransformation):

    def apply(self, labeled_data):
        labeled_data.data = np.reshape(labeled_data.data, (labeled_data.data.shape[0], labeled_data.data.shape[1], labeled_data.data.shape[2],1))

class CastFloat(shfl.private.FederatedTransformation):

    def apply(self, labeled_data):
        labeled_data.data = labeled_data.data.astype(np.float32)

shfl.private.federated_operation.apply_federated_transformation(federated_data, Reshape())
shfl.private.federated_operation.apply_federated_transformation(federated_data, CastFloat())
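To see what these two transformations do to a single node's array, here is the same pair of operations applied to a toy batch in plain NumPy. The batch size and the uint8 input dtype are assumptions made for the illustration.

```python
import numpy as np

# Toy batch of 5 grayscale images (the uint8 dtype is an assumption).
batch = np.zeros((5, 28, 28), dtype=np.uint8)

# Same reshape as in Reshape.apply: append a trailing channel axis of size 1.
reshaped = np.reshape(batch, (batch.shape[0], batch.shape[1], batch.shape[2], 1))
# Same cast as in CastFloat.apply.
as_float = reshaped.astype(np.float32)

print(reshaped.shape)   # (5, 28, 28, 1)
print(as_float.dtype)   # float32

# Indexing with np.newaxis is an equivalent, shorter way to add the axis.
assert np.array_equal(reshaped, batch[..., np.newaxis])
```

The trailing axis is what makes the data match the `shape=(28, 28, 1)` input declared in `model_builder`.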

We are now ready to execute our federated learning algorithm.

test_data = np.reshape(test_data, (test_data.shape[0], test_data.shape[1], test_data.shape[2],1))
test_data = test_data.astype(np.float32)
federated_government.run_rounds(3, test_data, test_label)
Accuracy round 0
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cb6710>: [1.2928415536880493, 0.5999249815940857]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ccd8d0>: [1.2544679641723633, 0.6145250201225281]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6d10>: [1.4649477005004883, 0.4932500123977661]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6f90>: [1.1362718343734741, 0.6897000074386597]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6b50>: [1.3245103359222412, 0.5472750067710876]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0390>: [1.3977092504501343, 0.5182499885559082]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0210>: [1.253063678741455, 0.6249250173568726]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce09d0>: [1.261741042137146, 0.6463249921798706]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0890>: [1.0686293840408325, 0.7071750164031982]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0710>: [1.1946674585342407, 0.6825500130653381]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0250>: [1.4399402141571045, 0.5239999890327454]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0590>: [1.1110390424728394, 0.737725019454956]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0a90>: [1.2109249830245972, 0.6366999745368958]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0b90>: [1.378128170967102, 0.569599986076355]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0cd0>: [1.1583523750305176, 0.6611499786376953]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0e10>: [1.2141239643096924, 0.6123250126838684]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0f50>: [1.3370256423950195, 0.5838249921798706]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab0d0>: [1.1722753047943115, 0.6233000159263611]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab210>: [1.2288721799850464, 0.6362000107765198]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab350>: [1.04910147190094, 0.7120000123977661]
Global model test performance : [1.5315402746200562, 0.6554750204086304]



Accuracy round 1
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cb6710>: [0.7625688314437866, 0.7877249717712402]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ccd8d0>: [0.8031100630760193, 0.7783750295639038]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6d10>: [0.6858523488044739, 0.8170999884605408]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6f90>: [0.722141683101654, 0.7936000227928162]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6b50>: [0.8395906090736389, 0.7243499755859375]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0390>: [1.095941185951233, 0.649150013923645]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0210>: [0.7347734570503235, 0.7852500081062317]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce09d0>: [0.7487986087799072, 0.8043249845504761]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0890>: [0.7880127429962158, 0.7285000085830688]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0710>: [0.6359884738922119, 0.8355249762535095]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0250>: [0.792952299118042, 0.7698500156402588]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0590>: [0.6581923961639404, 0.8294249773025513]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0a90>: [0.5841638445854187, 0.8313500285148621]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0b90>: [0.652135968208313, 0.8245000243186951]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0cd0>: [0.8505703210830688, 0.7363499999046326]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0e10>: [0.651348352432251, 0.8321250081062317]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0f50>: [0.8804095983505249, 0.7199000120162964]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab0d0>: [0.8018162846565247, 0.7846249938011169]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab210>: [0.7168283462524414, 0.7993249893188477]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab350>: [0.9283837676048279, 0.6868249773979187]
Global model test performance : [0.6832903027534485, 0.8578000068664551]



Accuracy round 2
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cb6710>: [0.4508397579193115, 0.872825026512146]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ccd8d0>: [0.6095467805862427, 0.8156999945640564]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6d10>: [0.42919716238975525, 0.8871750235557556]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6f90>: [0.5784087777137756, 0.8202499747276306]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149cd6b50>: [0.4464050829410553, 0.8699749708175659]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0390>: [0.5509712100028992, 0.8209499716758728]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0210>: [0.5002411603927612, 0.8534500002861023]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce09d0>: [0.47208932042121887, 0.8781999945640564]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0890>: [0.46912622451782227, 0.8433250188827515]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0710>: [0.4439672827720642, 0.8736749887466431]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0250>: [0.6297665238380432, 0.8054500222206116]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0590>: [0.4329649806022644, 0.8920750021934509]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0a90>: [0.42659908533096313, 0.8774250149726868]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0b90>: [0.4823418855667114, 0.8494499921798706]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0cd0>: [0.4769411087036133, 0.8621249794960022]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0e10>: [0.4972580671310425, 0.8498749732971191]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149ce0f50>: [0.5624709129333496, 0.8492249846458435]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab0d0>: [0.4566011130809784, 0.8802250027656555]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab210>: [0.46325093507766724, 0.8721749782562256]
Test performance client <shfl.private.federated_operation.FederatedDataNode object at 0x149fab350>: [0.552659809589386, 0.8217499852180481]
Global model test performance : [0.4077396094799042, 0.9021250009536743]
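
The log above suggests a simple pattern for each round: every client trains and is evaluated, and then a global model is evaluated after aggregation. The following toy sketch reproduces that pattern with a linear regression model in NumPy. It is not the framework's `run_rounds` implementation; the clients, the model, and the helpers `local_train` and `mse` are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0, 0.5])

# Four hypothetical clients, each holding its own private (X, y) pair.
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 3))
    clients.append((X, X @ true_w))

X_test = rng.normal(size=(200, 3))
y_test = X_test @ true_w

def local_train(w, X, y, lr=0.1, steps=5):
    """A few gradient-descent steps on the client's own data."""
    w = w.copy()
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

def mse(w, X, y):
    return float(np.mean((X @ w - y) ** 2))

global_w = np.zeros(3)
history = []
for round_id in range(3):
    # 1. Every client starts from the current global model and trains locally.
    local_ws = [local_train(global_w, X, y) for X, y in clients]
    # 2. The server aggregates the uploaded parameters (unweighted mean).
    global_w = np.mean(local_ws, axis=0)
    # 3. The aggregated model is evaluated on the held-out test set.
    history.append(mse(global_w, X_test, y_test))
    print(f"round {round_id}: global test MSE = {history[-1]:.4f}")
```

Only model parameters cross the client boundary in this loop; the `(X, y)` pairs never leave the list entry that owns them, which mirrors the role of private data in the nodes.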