Example of Vertical Federated Learning with Differential Privacy (assuming PSI*)

In this notebook we provide an example of how Differential Privacy is added in Sherpa's platform and the effects it has on the whole process. To set up the federated learning experiment, we replicate the same process as in the vertical Federated Learning case in order to train a deep learning model.

This first part is a replica of the vertical Federated Learning case, which uses datasets belonging to an insurance company and a bank, and it follows the same steps. The description of each part is brief, as the main focus is on how Differential Privacy is applied in this paradigm.

0) Libraries and data

First of all, we will load the libraries that we are going to use:

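import warnings

# Explicit imports for names used later in the notebook (plt, torch, nn, f1_score)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from sklearn.metrics import f1_score
from torch import nn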
from shfl.auxiliar_functions_for_notebooks.business_impact import *
from shfl.auxiliar_functions_for_notebooks.functionsFL import *
from shfl.auxiliar_functions_for_notebooks.preprocessing import *
from shfl.data_base.data_base import split_train_test
from shfl.federated_aggregator import FedSumAggregator
from shfl.federated_government.vertical_federated_government import VerticalFederatedGovernment
from shfl.model.vertical_deep_learning_model_pt import VerticalNeuralNetClientModelPyTorch
from shfl.model.vertical_deep_learning_model_pt import VerticalNeuralNetServerModelPyTorch
from shfl.private.data import LabeledData
from shfl.private.federated_operation import VerticalServerDataNode
from shfl.private.federated_operation import federate_list
from shfl.private.reproducibility import Reproducibility

plt.style.use('seaborn')
pd.set_option("display.max_rows", 30, "display.max_columns", None)
warnings.filterwarnings('ignore')
Reproducibility(456)

We load the data of the bank and the insurance:

data_bank = pd.read_csv("./data_bank.csv", sep=",")
data_insurance = pd.read_csv("./data_insurance.csv", sep=",")

We now convert the bank's target ("yes" or "no") to numeric values so that it can be used for training and prediction:

labels_total_numeric = label_encoder(data_bank.label)
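
`label_encoder` is one of the notebook helper functions, and its implementation is not shown here. As a rough sketch of what such an encoding does, the hypothetical `encode_binary_labels` below maps "no" to 0 and "yes" to 1; the function name and the fixed mapping are assumptions made for illustration only.

```python
def encode_binary_labels(labels, positive="yes", negative="no"):
    """Map string labels to integers: negative -> 0, positive -> 1.

    Hypothetical stand-in for the notebook's `label_encoder` helper.
    """
    mapping = {negative: 0, positive: 1}
    try:
        return [mapping[label] for label in labels]
    except KeyError as err:
        raise ValueError(f"Unexpected label: {err.args[0]!r}") from None


encoded = encode_binary_labels(["yes", "no", "no", "yes"])
print(encoded)  # [1, 0, 0, 1]
```

Rejecting unknown labels early avoids silently training on a wrongly encoded target.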

And we split it from the dataset:

data_bank=data_bank.drop("label", axis=1)

1) Prepare the data and the models for the vertical federated learning scenario preserving the privacy

1.1) Bank Data Preprocessing

We have to One Hot Encode the categorical variables and normalize the numeric variables:

data_bank_array = preprocessing_data(data_bank)
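
The internals of `preprocessing_data` are not shown in this notebook. The plain-Python sketch below illustrates the two operations named above on a toy table; the helper names, the toy values and the choice of min-max scaling (rather than, say, standardization) are assumptions.

```python
def one_hot(column):
    """One-hot encode a list of categorical values (categories sorted alphabetically)."""
    categories = sorted(set(column))
    return [[1.0 if value == cat else 0.0 for cat in categories] for value in column]


def min_max_normalize(column):
    """Scale a list of numbers to the [0, 1] range."""
    low, high = min(column), max(column)
    span = high - low
    if span == 0:
        return [0.0 for _ in column]  # a constant column carries no information
    return [(value - low) / span for value in column]


# Toy table with one categorical and one numeric feature (made-up values).
jobs = ["admin", "technician", "admin"]
ages = [30, 50, 40]

rows = [encoded + [age] for encoded, age in zip(one_hot(jobs), min_max_normalize(ages))]
print(rows)  # [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0], [1.0, 0.0, 0.5]]
```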

We assume that the labels belong to the bank. We split the dataset into training and test sets:

train_data_bank, train_labels, test_data_bank, test_labels = split_train_test(data_bank_array,
                                                                              labels_total_numeric)

1.2) Insurance Data Preprocessing

We do the same preprocessing and splitting for the insurance:

data_insurance_array = preprocessing_data(data_insurance)
train_data_ins, test_data_ins = split_train_test(data_insurance_array)
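
In the vertical setting, row i of the bank table and row i of the insurance table must describe the same customer (the title assumes this alignment has been obtained through PSI), and that must still hold after splitting. How `split_train_test` partitions the rows is not shown here, so the sketch below only makes the requirement explicit: a hypothetical `aligned_split` helper shuffles one shared list of indices and applies it to every table and to the labels.

```python
import random


def aligned_split(tables, labels, test_fraction=0.2, seed=0):
    """Split several row-aligned tables (and the labels) with one shared shuffle."""
    n_rows = len(labels)
    if any(len(table) != n_rows for table in tables):
        raise ValueError("All tables must have the same number of rows")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_test = int(n_rows * test_fraction)
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def take(rows, idx):
        return [rows[i] for i in idx]

    train = [take(table, train_idx) for table in tables]
    test = [take(table, test_idx) for table in tables]
    return train, take(labels, train_idx), test, take(labels, test_idx)


# Row i of both tables describes the same customer (toy values).
bank = [[i, 10 * i] for i in range(10)]
insurance = [[100 + i] for i in range(10)]
labels = [i % 2 for i in range(10)]

(train_bank, train_ins), y_train, (test_bank, test_ins), y_test = aligned_split(
    [bank, insurance], labels
)
# The same customers end up on the same side of the split for both parties.
assert [row[0] for row in train_bank] == [row[0] - 100 for row in train_ins]
```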

1.3) Deploying the full neural network model

To obtain the scheme of the neural network, we are going to build the whole structure.

1.3.1) Create the federated nodes

We generate the federated nodes with the following function:

nodes_federation = federate_list([train_data_bank, train_data_ins],fed_data_node_type="HeterogeneousDataNode")

And we will create the **test data**:

test_data = [test_data_bank, test_data_ins]

We need to specify which **model to use for each client node**. Both will run a **neural network model**, but with different input sizes, since they hold a different number of features.

client_out_dim = 2

def model_nodes_list():
    model0 = nn.Sequential(
        nn.Linear(train_data_bank.shape[1], client_out_dim, bias=True),
    )

    model1 = nn.Sequential(
        nn.Linear(train_data_ins.shape[1], client_out_dim, bias=True),
    )

    optimizer0 = torch.optim.SGD(params=model0.parameters(), lr=0.001)
    optimizer1 = torch.optim.SGD(params=model1.parameters(), lr=0.001)

    batch_size = 32
    model_nodes = [VerticalNeuralNetClientModelPyTorch(model=model0, loss=None, optimizer=optimizer0, batch_size=batch_size),
                   VerticalNeuralNetClientModelPyTorch(model=model1, loss=None, optimizer=optimizer1, batch_size=batch_size)]
    return model_nodes

model_nodes = model_nodes_list()

1.3.2) Create the server model

The server is assigned a linear model followed by a sigmoid, along with the labels to train on.

def server_node():
    model_server = torch.nn.Sequential(torch.nn.Linear(client_out_dim, 1, bias=True),
                                       torch.nn.Sigmoid())
    loss_server = torch.nn.BCELoss(reduction="mean")
    optimizer_server = torch.optim.SGD(params=model_server.parameters(),lr=0.001)
    model = VerticalNeuralNetServerModelPyTorch(model_server, loss_server, optimizer_server,
                                      metrics={"roc_auc_shfl":roc_auc_shfl})
    return model

def vertical_server(nodes_federation,train_labels):
    server = VerticalServerDataNode(nodes_federation=nodes_federation,
                                    model=server_node(),
                                    aggregator=FedSumAggregator(),
                                    data=LabeledData(data=None,
                                                     label=train_labels.reshape(-1,1).astype(np.float32)))
    return server

server = vertical_server(nodes_federation,train_labels)
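
The client and server definitions above can be read as a single split network: each client maps its local features to a `client_out_dim`-dimensional embedding, and the server combines the embeddings into one prediction. Below is a minimal plain-Python sketch of one forward pass, assuming (as the name `FedSumAggregator` suggests) that the server adds the client embeddings element-wise before applying its own linear layer and sigmoid. All weights and feature values are made up.

```python
import math


def linear(weights, bias, features):
    """Apply a dense layer: one output per row of `weights`."""
    return [sum(w * x for w, x in zip(row, features)) + b for row, b in zip(weights, bias)]


def sigmoid(value):
    return 1.0 / (1.0 + math.exp(-value))


# Each party holds different features of the same customer (toy numbers).
bank_features = [1.0, 2.0, 0.5]
insurance_features = [0.0, 1.0]

# Client models: map local features to a 2-dimensional embedding (client_out_dim = 2).
bank_embedding = linear([[0.5, 0.0, 1.0], [0.0, 0.25, 0.0]], [0.0, 0.0], bank_features)
insurance_embedding = linear([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], insurance_features)

# Server: sum the embeddings element-wise, then apply a linear layer and a sigmoid.
summed = [a + b for a, b in zip(bank_embedding, insurance_embedding)]
prediction = sigmoid(linear([[1.0, -1.0]], [0.0], summed)[0])
```

Only the embeddings leave each client in this scheme, which is why they are the values that local Differential Privacy later perturbs.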

# Configure data access to nodes and server
nodes_federation.configure_model_access(meta_params_query)
server.configure_model_access(meta_params_query)
server.configure_data_access(train_set_evaluation)

PyTorch models expect float input data by default:

# Convert to float
nodes_federation.apply_data_transformation(cast_to_float);

2) Run the experiment

We are ready to launch our federated experiment with 20 epochs and without Differential Privacy:

# Create federated government
federated_government = VerticalFederatedGovernment(model_nodes,
                                                   nodes_federation,
                                                   server_node=server)

federated_government.run_epochs(epochs=20,
                                batch_size=32,
                                length_dataset=len(train_data_bank),
                                test_data=test_data,
                                test_label=test_labels.reshape(-1,1),
                                eval_freq=0)
Evaluation in  epoch  1 :
Loss: 0.391084223985672   Accuracy: 0.7140154630071299


Evaluation in  epoch  2 :
Loss: 0.32680657505989075   Accuracy: 0.7363164870049965


Evaluation in  epoch  3 :
Loss: 0.30523717403411865   Accuracy: 0.7436662066732604


Evaluation in  epoch  4 :
Loss: 0.29654932022094727   Accuracy: 0.74783887383166


Evaluation in  epoch  5 :
Loss: 0.2923213243484497   Accuracy: 0.7511411374882173


Evaluation in  epoch  6 :
Loss: 0.2898702025413513   Accuracy: 0.7540682992256347


Evaluation in  epoch  7 :
Loss: 0.28819844126701355   Accuracy: 0.7566138289295304


Evaluation in  epoch  8 :
Loss: 0.28690990805625916   Accuracy: 0.7587437092598693


Evaluation in  epoch  9 :
Loss: 0.28584083914756775   Accuracy: 0.7606230913652601


Evaluation in  epoch  10 :
Loss: 0.28491896390914917   Accuracy: 0.7621514418568487


Evaluation in  epoch  11 :
Loss: 0.2841084897518158   Accuracy: 0.7635078073591959


Evaluation in  epoch  12 :
Loss: 0.28338906168937683   Accuracy: 0.7647054758957532


Evaluation in  epoch  13 :
Loss: 0.2827470004558563   Accuracy: 0.7657924362140704


Evaluation in  epoch  14 :
Loss: 0.28217220306396484   Accuracy: 0.7667910881251084


Evaluation in  epoch  15 :
Loss: 0.2816565930843353   Accuracy: 0.7676595174063237


Evaluation in  epoch  16 :
Loss: 0.2811933755874634   Accuracy: 0.7684885382065603


Evaluation in  epoch  17 :
Loss: 0.2807767391204834   Accuracy: 0.7692193035648198


Evaluation in  epoch  18 :
Loss: 0.28040173649787903   Accuracy: 0.7699452852346368


Evaluation in  epoch  19 :
Loss: 0.28006383776664734   Accuracy: 0.7705858438965527


Evaluation in  epoch  20 :
Loss: 0.27975916862487793   Accuracy: 0.7712013451428174

Now that the baseline experiment without Differential Privacy has finished, we evaluate it with the ROC AUC metric, which we will also use to compare it against the differentially private runs. The ROC curve is a useful diagnostic tool for understanding the trade-off between true positives and false positives at different thresholds, and the ROC AUC summarizes it in a single number that is convenient for comparing models.

First we plot the ROC curve of each experiment, and later we will plot them all in a single chart to compare.

y_prediction_fed = federated_government._server.predict_collaborative_model(test_data)
plot_roc(y_prediction_fed, test_labels)

[Figure: ROC curve of the federated model without Differential Privacy]
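
`plot_roc` is a notebook helper whose code is not shown. For reference, the ROC AUC itself can be computed as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The pairwise sketch below follows that definition directly; the function name and the toy data are illustrative, and a sorting-based implementation would be preferable for large test sets.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a positive outranks a negative (ties count 1/2)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(toy_labels, toy_scores))  # 0.75
```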

3) Adding differential privacy

Differential Privacy is a statistical technique to provide data aggregations, while avoiding the leakage of individual data records. This technique ensures that malicious agents intervening in the communication of local parameters cannot trace this information back to the data sources, adding an additional layer of data privacy.

There are a few steps towards adding this privacy layer, as it needs to be properly adapted to the data: first we calculate the sensitivity of the data, then we choose the noise mechanism, and finally we set the parameter of the mechanism, also called the privacy parameter or privacy budget (ε).
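
To make the role of these three ingredients concrete, here is a minimal standard-library sketch of the textbook Laplace mechanism, in which each value receives noise with scale sensitivity / ε. It is not the `shfl` implementation; the function names and the toy embedding are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from Laplace(0, scale)."""
    if scale == 0:
        return 0.0
    # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(values, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to every value."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    scale = sensitivity / epsilon
    return [v + laplace_noise(scale, rng) for v in values]


rng = random.Random(0)
embedding = [0.2, -1.3]
noisy = laplace_mechanism(embedding, sensitivity=1.0, epsilon=0.5, rng=rng)
```

A larger sensitivity or a smaller ε both increase the noise scale, so the three steps cannot be tuned independently of each other.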

3.1) Calculating sensitivity

Following the same methodology as in the horizontal Federated Learning with Differential Privacy case, we will calculate the sensitivity, now for tabular data.

model_test_bank = nn.Sequential(
    nn.Linear(train_data_bank.shape[1], 256, bias=True),
    nn.Linear(256, 128, bias=True),
    nn.Linear(128, client_out_dim, bias=True),
)

model_test_ins = nn.Sequential(
    nn.Linear(train_data_ins.shape[1], 256, bias=True),
    nn.Linear(256, 128, bias=True),
    nn.Linear(128, client_out_dim, bias=True),
)

max_sensitivity_bank=sensitivity(400,train_data_bank,model_test_bank)
max_sensitivity_ins=sensitivity(400,train_data_ins,model_test_ins)
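
The `sensitivity` helper used above is not defined in this notebook. One common way to estimate a sensitivity empirically is by sampling: evaluate the query (here, a model's output) on many random pairs of inputs and keep the largest observed change. The sketch below assumes that approach with an L1 distance; the function names, the toy model and the reading of 400 as a number of samples are assumptions.

```python
import random


def l1_distance(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def sampled_sensitivity(query, rows, n_samples, seed=0):
    """Largest L1 change of `query` observed over randomly sampled pairs of rows."""
    rng = random.Random(seed)
    largest = 0.0
    for _ in range(n_samples):
        first, second = rng.choice(rows), rng.choice(rows)
        largest = max(largest, l1_distance(query(first), query(second)))
    return largest


def toy_model(row):
    """Stand-in for a client model: a fixed linear map to a 2-dimensional embedding."""
    return [0.5 * row[0] + row[1], row[0] - row[1]]


rows = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
estimate = sampled_sensitivity(toy_model, rows, n_samples=400)
```

An estimate obtained this way can never exceed the true maximum over the sampled data, so it should be treated as a lower bound rather than a guarantee.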

For these experiments, we replicate the same models and structure:

nodes_federation = federate_list([train_data_bank, train_data_ins],fed_data_node_type="HeterogeneousDataNode")

model_nodes = model_nodes_list()

server = vertical_server(nodes_federation,train_labels)

nodes_federation.configure_model_access(meta_params_query)
server.configure_model_access(meta_params_query)
server.configure_data_access(train_set_evaluation)

3.2) Adding the noise mechanism: trade-off between accuracy and privacy

3.2.1) Local Differential Privacy

Adding Differential Privacy comes at a cost in model performance. Tuning the ε value lets us choose the balance, since we want to maintain the effectiveness of the model so that it remains usable in real-world applications. In this case, we use local Differential Privacy to mask the values that the local parties send to the server. A value of ε = 0.5 gives a high amount of privacy, so let's see how the model behaves.

from shfl.differential_privacy import LaplaceMechanism

params_access_definition_bank = LaplaceMechanism(sensitivity=max_sensitivity_bank, epsilon=0.5)
params_access_definition_ins = LaplaceMechanism(sensitivity=max_sensitivity_ins, epsilon=0.5)

nodes_federation[0]._model_params_access_policy=params_access_definition_bank
nodes_federation[1]._model_params_access_policy=params_access_definition_ins
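
To get a feeling for what a given ε means in practice, the sketch below measures the average magnitude of textbook Laplace noise for several ε values at a fixed sensitivity of 1.0. The sensitivity value, the list of ε values and the function name are illustrative assumptions; for Laplace noise the expected magnitude equals the scale sensitivity / ε.

```python
import random


def mean_abs_laplace_noise(sensitivity, epsilon, n_samples=20000, seed=0):
    """Empirical mean |noise| for Laplace noise with scale sensitivity / epsilon."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon
    total = 0.0
    for _ in range(n_samples):
        # The difference of two exponential draws with mean `scale` is Laplace(0, scale).
        total += abs(rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale))
    return total / n_samples


for epsilon in (0.1, 0.5, 1.0, 5.0):
    error = mean_abs_laplace_noise(sensitivity=1.0, epsilon=epsilon)
    print(f"epsilon={epsilon}: mean |noise| ~ {error:.2f} (theory: {1.0 / epsilon:.2f})")
```

Halving ε doubles the typical perturbation of every value sent to the server, which is the accuracy side of the trade-off discussed in this section.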

Then, we run the vertical training:

# Create federated government
federated_government_dp = VerticalFederatedGovernment(model_nodes,
                                                      nodes_federation,
                                                      server_node=server)

federated_government_dp.run_epochs(epochs=20,
                                batch_size=32,
                                length_dataset=len(train_data_bank),
                                test_data=test_data,
                                test_label=test_labels.reshape(-1,1),
                                eval_freq=0)
Evaluation in  epoch  1 :
Loss: 0.4675355851650238   Accuracy: 0.6640544690508267


Evaluation in  epoch  2 :
Loss: 0.3920636475086212   Accuracy: 0.7047485473153106


Evaluation in  epoch  3 :
Loss: 0.3435995578765869   Accuracy: 0.72908241110654


Evaluation in  epoch  4 :
Loss: 0.3171211779117584   Accuracy: 0.7387661910769175


Evaluation in  epoch  5 :
Loss: 0.30362868309020996   Accuracy: 0.7428616358361743


Evaluation in  epoch  6 :
Loss: 0.2964669466018677   Accuracy: 0.7459033023699607


Evaluation in  epoch  7 :
Loss: 0.2925344705581665   Accuracy: 0.7483224819537252


Evaluation in  epoch  8 :
Loss: 0.29021021723747253   Accuracy: 0.7503316310995634


Evaluation in  epoch  9 :
Loss: 0.2887301743030548   Accuracy: 0.7519811683650279


Evaluation in  epoch  10 :
Loss: 0.2876769006252289   Accuracy: 0.753289621051388


Evaluation in  epoch  11 :
Loss: 0.28688913583755493   Accuracy: 0.7544001960856672


Evaluation in  epoch  12 :
Loss: 0.2862595319747925   Accuracy: 0.7553387101991427


Evaluation in  epoch  13 :
Loss: 0.28572171926498413   Accuracy: 0.7561912697837788


Evaluation in  epoch  14 :
Loss: 0.2852461040019989   Accuracy: 0.7569897660958587


Evaluation in  epoch  15 :
Loss: 0.2848053276538849   Accuracy: 0.7577042820997263


Evaluation in  epoch  16 :
Loss: 0.2844155728816986   Accuracy: 0.7583033821284741


Evaluation in  epoch  17 :
Loss: 0.28407275676727295   Accuracy: 0.7588756783156313


Evaluation in  epoch  18 :
Loss: 0.2837557792663574   Accuracy: 0.759440912867469


Evaluation in  epoch  19 :
Loss: 0.2834589183330536   Accuracy: 0.7599086512929552


Evaluation in  epoch  20 :
Loss: 0.283174604177475   Accuracy: 0.7603993969819026

The ROC plot for Differential Privacy applied in the local nodes is:

y_prediction_dp = federated_government_dp._server.predict_collaborative_model(test_data)
plot_roc(y_prediction_dp, test_labels)

[Figure: ROC curve of the federated model with local Differential Privacy]

3.2.2) Differential Privacy on top of everything

The labels and the model of the server can also be sensitive, so, to avoid possible information leaks, Differential Privacy can also be used to protect the server when it returns the gradients for the learning process. The local nodes keep ε = 0.5, which gives a high amount of privacy and lets us compare the performance of local Differential Privacy against Differential Privacy applied everywhere. Again, we replicate the same models and structure for the experiment.

nodes_federation = federate_list([train_data_bank, train_data_ins], fed_data_node_type="HeterogeneousDataNode")
model_nodes = model_nodes_list()

server = vertical_server(nodes_federation,train_labels)

nodes_federation.configure_model_access(meta_params_query)
server.configure_model_access(meta_params_query)
server.configure_data_access(train_set_evaluation)

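In this case, we are applying Differential Privacy on the server as well as locally. The mechanism is applied when the server returns the learnt gradients, so that the real values produced by the server model are not revealed directly. The sensitivity of the server mechanism is set to 0 because it is calculated on the server side, which needs the embeddings to work, and its ε is set to 0.1. Note that in the textbook Laplace mechanism the noise scale is the sensitivity divided by ε, so a sensitivity of 0 corresponds to a noise scale of 0.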

params_access_definition_bank = LaplaceMechanism(sensitivity=max_sensitivity_bank, epsilon=0.5)
params_access_definition_ins = LaplaceMechanism(sensitivity=max_sensitivity_ins, epsilon=0.5)
params_access_definition_server = LaplaceMechanism(sensitivity=0, epsilon=0.1)

nodes_federation[0]._model_params_access_policy=params_access_definition_bank
nodes_federation[1]._model_params_access_policy=params_access_definition_ins

server._model_params_access_policy=params_access_definition_server

Finally, we run the last training:

# Create federated government
federated_government_dp_2 = VerticalFederatedGovernment(model_nodes,
                                                      nodes_federation,
                                                      server_node=server)

federated_government_dp_2.run_epochs(epochs=20,
                                batch_size=32,
                                length_dataset=len(train_data_bank),
                                test_data=test_data,
                                test_label=test_labels.reshape(-1,1),
                                eval_freq=0)
Evaluation in  epoch  1 :
Loss: 0.4450891315937042   Accuracy: 0.4634378895763732


Evaluation in  epoch  2 :
Loss: 0.40160995721817017   Accuracy: 0.555622906851563


Evaluation in  epoch  3 :
Loss: 0.3697430491447449   Accuracy: 0.6586422950648889


Evaluation in  epoch  4 :
Loss: 0.3464324474334717   Accuracy: 0.7115684924714615


Evaluation in  epoch  5 :
Loss: 0.32728689908981323   Accuracy: 0.7324506592909784


Evaluation in  epoch  6 :
Loss: 0.31455641984939575   Accuracy: 0.7394715193617132


Evaluation in  epoch  7 :
Loss: 0.30522421002388   Accuracy: 0.7432747035366036


Evaluation in  epoch  8 :
Loss: 0.2994040846824646   Accuracy: 0.7459770319172248


Evaluation in  epoch  9 :
Loss: 0.2952128052711487   Accuracy: 0.7482064585261046


Evaluation in  epoch  10 :
Loss: 0.2922833263874054   Accuracy: 0.7500587330636548


Evaluation in  epoch  11 :
Loss: 0.2903350293636322   Accuracy: 0.751723836299448


Evaluation in  epoch  12 :
Loss: 0.28914111852645874   Accuracy: 0.7532261422650719


Evaluation in  epoch  13 :
Loss: 0.28820687532424927   Accuracy: 0.7546866099397149


Evaluation in  epoch  14 :
Loss: 0.2874971032142639   Accuracy: 0.7555486609696734


Evaluation in  epoch  15 :
Loss: 0.2868814766407013   Accuracy: 0.7561214127462063


Evaluation in  epoch  16 :
Loss: 0.2864290475845337   Accuracy: 0.7570966777359699


Evaluation in  epoch  17 :
Loss: 0.2859129011631012   Accuracy: 0.7575949406496126


Evaluation in  epoch  18 :
Loss: 0.2854032516479492   Accuracy: 0.7582002670664919


Evaluation in  epoch  19 :
Loss: 0.28495877981185913   Accuracy: 0.7586544137422764


Evaluation in  epoch  20 :
Loss: 0.2845660150051117   Accuracy: 0.7591349086702758

The ROC plot for Differential Privacy also in the server part is:

y_prediction_dp_2 = federated_government_dp_2._server.predict_collaborative_model(test_data)
plot_roc(y_prediction_dp_2, test_labels)

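[Figure: ROC curve of the federated model with Differential Privacy in the local nodes and in the server]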

4) Comparison

We can now see the impact of the different approaches to Differential Privacy in vertical Federated Learning. The decision should take into account how much privacy we need, where it is worthwhile to apply it, and how much noise the model can tolerate before its performance starts to decline.

4.1) ROC curve

values=[y_prediction_fed, y_prediction_dp, y_prediction_dp_2]
titles=['Federated', 'Federated with local DP', 'Federated with DP everywhere']
colors=['blue', 'lightgreen', 'green']
linestyle=[':','-.','-']

plot_all_roc_curves(test_labels, values, titles, colors, linestyle)

[Figure: ROC curves of the three experiments in a single chart]

One effect observed in some cases is that a model can get stuck in a local minimum, and the noise added in each iteration gives it a chance to escape and reach a point closer to the global minimum. Adding a lot of Differential Privacy noise can have this effect, but it can also harm the model, so choosing the best balance between model performance and privacy is always advised.

[Figure: local minimum versus global minimum]
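
The escape effect can be reproduced on a one-dimensional toy problem. The sketch below runs gradient descent on f(x) = x^4 - 3x^2 + x, which has a local minimum near x = 1.13 and a global minimum near x = -1.30. The function, the step size and the use of Gaussian noise that shrinks over time are all assumptions chosen for illustration; they do not correspond to the Laplace mechanism or the models used in this notebook.

```python
import random


def gradient(x):
    """Derivative of f(x) = x**4 - 3*x**2 + x, which has a local and a global minimum."""
    return 4 * x**3 - 6 * x + 1


def descend(x, steps=2000, lr=0.01, noise_std=0.0, seed=0):
    """Gradient descent; Gaussian noise (shrinking over time) is added to each gradient."""
    rng = random.Random(seed)
    for step in range(steps):
        noise = rng.gauss(0.0, noise_std) * (1.0 - step / steps)
        x -= lr * (gradient(x) + noise)
    return x


plain = descend(1.5)                  # settles in the local minimum near x = 1.13
noisy = descend(1.5, noise_std=10.0)  # may or may not cross over to the basin near x = -1.30
```

Whether the noisy run ends in the better basin depends on the seed and on the noise level, which mirrors the caveat above: noise can help, but it is not a reliable optimization strategy.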

With the ROC curves alone we cannot draw a conclusion about the behaviour of the model when the Differential Privacy mechanism is applied, since all the curves are very similar. For that reason, we also calculate the F1-Score of each model.

4.2) F1-Score

n_classes=2

values_f1_fed = (y_prediction_fed > 0.5).astype(int)

values_f1_dp = (y_prediction_dp > 0.5).astype(int)

values_f1_dp2 = (y_prediction_dp_2 > 0.5).astype(int)


score_fed_f1 = f1_score(test_labels, values_f1_fed, average='macro')

score_dp_f1 = f1_score(test_labels, values_f1_dp, average='macro')

score_dp2_f1 = f1_score(test_labels, values_f1_dp2, average='macro')

values=[round(score_fed_f1, 4), round(score_dp_f1, 4), round(score_dp2_f1, 4)]
titles=['Federated', 'Federated with local DP', 'Federated with DP everywhere']
colors=['blue', 'lightgreen', 'green']
plot_all_metric(values, "F1-Score", titles, colors)

[Figure: F1-Score of the three experiments]
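
For readers unfamiliar with the macro-averaged F1-Score used above, the following plain-Python sketch computes it from scratch: predictions are thresholded at 0.5, an F1 value is computed per class, and the per-class values are averaged with equal weight. The helper names and the toy data are illustrative.

```python
def f1_for_class(y_true, y_pred, cls):
    """F1 of one class, treating `cls` as the positive label."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 values."""
    classes = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_class(y_true, y_pred, c) for c in classes) / len(classes)


probabilities = [0.9, 0.2, 0.6, 0.4, 0.1]
y_true = [1, 0, 0, 1, 0]
y_pred = [int(p > 0.5) for p in probabilities]  # same 0.5 threshold as above
print(round(macro_f1(y_true, y_pred), 4))  # 0.5833
```

Because every class counts equally, the macro average is more sensitive than accuracy to errors on the minority class.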

The most private model has a lower F1-Score than the others because of the low values of ε (which mean high privacy). However, the performance loss is small and the utility of the model is preserved. The drop in the scores of the Differential Privacy cases can be mitigated with longer training, but only up to a point, since the noise makes it harder for the problem to converge. In other words, more rounds of training can improve these results, but the computational cost increases, and at some point it is no longer worth training further because a minimum has been reached.
