Example of Vertical Federated Learning with Differential Privacy (assuming PSI*)
In this notebook we provide an example of how Differential Privacy is added in Sherpa's platform and of the effect it has on the whole process. To set up the federated learning experiment, we replicate the same vertical Federated Learning process used to train a deep learning model.
This first part replicates the vertical Federated Learning case with a dataset belonging to an insurance company and a bank, following the same steps. The descriptions of each part are brief, since the main focus is on how Differential Privacy is applied in this paradigm.
0) Libraries and data
First of all, we will load the libraries that we are going to use:
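import warnings
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import torch
from sklearn.metrics import f1_score
from torch import nn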
from shfl.auxiliar_functions_for_notebooks.business_impact import *
from shfl.auxiliar_functions_for_notebooks.functionsFL import *
from shfl.auxiliar_functions_for_notebooks.preprocessing import *
from shfl.data_base.data_base import split_train_test
from shfl.federated_aggregator import FedSumAggregator
from shfl.federated_government.vertical_federated_government import VerticalFederatedGovernment
from shfl.model.vertical_deep_learning_model_pt import VerticalNeuralNetClientModelPyTorch
from shfl.model.vertical_deep_learning_model_pt import VerticalNeuralNetServerModelPyTorch
from shfl.private.data import LabeledData
from shfl.private.federated_operation import VerticalServerDataNode
from shfl.private.federated_operation import federate_list
from shfl.private.reproducibility import Reproducibility
plt.style.use('seaborn')
pd.set_option("display.max_rows", 30, "display.max_columns", None)
warnings.filterwarnings('ignore')
Reproducibility(456)
We load the bank and insurance company data:
data_bank = pd.read_csv("./data_bank.csv", sep=",")
data_insurance = pd.read_csv("./data_insurance.csv", sep=",")
We now convert the bank's target ("yes" or "no") into a numeric label so that we can make predictions:
labels_total_numeric = label_encoder(data_bank.label)
Then we drop the label column from the bank's features:
data_bank=data_bank.drop("label", axis=1)
1) Prepare the data and the models for the vertical federated learning scenario while preserving privacy
1.1) Bank Data Preprocessing
We one-hot encode the categorical variables and normalize the numeric variables:
data_bank_array = preprocessing_data(data_bank)
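`preprocessing_data` comes from the notebook helpers and its implementation is not shown here. To give a rough idea of what one-hot encoding plus normalization produces, the following sketch assumes min-max scaling to [0, 1] for numeric columns and `pandas.get_dummies` for the remaining (categorical) ones. `preprocess_sketch` and the toy columns are hypothetical; the real helper may use a different scaler or column order.

```python
import numpy as np
import pandas as pd


def preprocess_sketch(df: pd.DataFrame) -> np.ndarray:
    """Hypothetical stand-in for preprocessing_data: min-max scaling + one-hot."""
    numeric = list(df.select_dtypes(include="number").columns)
    categorical = [c for c in df.columns if c not in numeric]

    # Min-max normalise numeric columns to [0, 1]; constant columns map to 0
    num = df[numeric].astype(float)
    span = (num.max() - num.min()).replace(0, 1)
    scaled = (num - num.min()) / span

    # One-hot encode categorical columns: one 0/1 column per category value
    encoded = pd.get_dummies(df[categorical], dtype=float)

    return pd.concat([scaled, encoded], axis=1).to_numpy(dtype=np.float32)


example = pd.DataFrame({"age": [25, 40, 55], "job": ["admin", "retired", "admin"]})
print(preprocess_sketch(example))
```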
We assume that the labels belong to the bank. We split the dataset into training and test data:
train_data_bank, train_labels, test_data_bank, test_labels = split_train_test(data_bank_array,
labels_total_numeric)
1.2) Insurance Data Preprocessing
We apply the same preprocessing and splitting to the insurance company's data, which has no labels:
data_insurance_array = preprocessing_data(data_insurance)
train_data_ins, test_data_ins = split_train_test(data_insurance_array)
1.3) Deploying the full neural network model
We now build the whole structure of the vertical neural network: one model for each client and one model for the server.
1.3.1) Create the federated nodes
We generate the federated nodes with the following function:
nodes_federation = federate_list([train_data_bank, train_data_ins], fed_data_node_type="HeterogeneousDataNode")
test_data = [test_data_bank, test_data_ins]
client_out_dim = 2
def model_nodes_list():
model0 = nn.Sequential(
nn.Linear(train_data_bank.shape[1], client_out_dim, bias=True),
)
model1 = nn.Sequential(
nn.Linear(train_data_ins.shape[1], client_out_dim, bias=True),
)
optimizer0 = torch.optim.SGD(params=model0.parameters(), lr=0.001)
optimizer1 = torch.optim.SGD(params=model1.parameters(), lr=0.001)
batch_size = 32
model_nodes = [VerticalNeuralNetClientModelPyTorch(model=model0, loss=None, optimizer=optimizer0, batch_size=batch_size),
VerticalNeuralNetClientModelPyTorch(model=model1, loss=None, optimizer=optimizer1, batch_size=batch_size)]
return model_nodes
model_nodes = model_nodes_list()
The server is assigned a linear model with a sigmoid output, together with the training labels it needs to compute the loss.
def server_node():
model_server = torch.nn.Sequential(torch.nn.Linear(client_out_dim, 1, bias=True),
torch.nn.Sigmoid())
loss_server = torch.nn.BCELoss(reduction="mean")
optimizer_server = torch.optim.SGD(params=model_server.parameters(),lr=0.001)
model = VerticalNeuralNetServerModelPyTorch(model_server, loss_server, optimizer_server,
metrics={"roc_auc_shfl":roc_auc_shfl})
return model
def vertical_server(nodes_federation,train_labels):
server = VerticalServerDataNode(nodes_federation=nodes_federation,
model=server_node(),
aggregator=FedSumAggregator(),
data=LabeledData(data=None,
label=train_labels.reshape(-1,1).astype(np.float32)))
return server
server = vertical_server(nodes_federation,train_labels)
# Configure data access to nodes and server
nodes_federation.configure_model_access(meta_params_query)
server.configure_model_access(meta_params_query)
server.configure_data_access(train_set_evaluation)
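The architecture defined above can be summarised by a plain NumPy forward pass. This sketch assumes that `FedSumAggregator` combines the clients' embeddings by element-wise summation, as its name suggests; the feature sizes and random weights are illustrative only, and the real training runs through the PyTorch wrappers above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy aligned data: 4 customers, bank holds 3 features, insurer holds 5 (illustrative sizes)
x_bank = rng.normal(size=(4, 3))
x_ins = rng.normal(size=(4, 5))
client_out_dim = 2

# Each client owns a linear layer mapping its own features to a 2-dim embedding
w_bank, b_bank = rng.normal(size=(3, client_out_dim)), np.zeros(client_out_dim)
w_ins, b_ins = rng.normal(size=(5, client_out_dim)), np.zeros(client_out_dim)
emb_bank = x_bank @ w_bank + b_bank
emb_ins = x_ins @ w_ins + b_ins

# Assumed aggregation: element-wise sum of the client embeddings
aggregated = emb_bank + emb_ins

# Server: Linear(client_out_dim, 1) followed by a sigmoid, as in server_node()
w_srv, b_srv = rng.normal(size=(client_out_dim, 1)), np.zeros(1)
probs = 1.0 / (1.0 + np.exp(-(aggregated @ w_srv + b_srv)))
print(probs.shape)  # (4, 1)
```

Only the embeddings and the gradients with respect to them cross party boundaries; raw features stay with each client, which is what the Differential Privacy mechanisms later in the notebook protect.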
PyTorch models expect float input data by default:
# Convert to float
nodes_federation.apply_data_transformation(cast_to_float);
2) Run the experiment
We are ready to launch our federated experiment with 20 epochs and without Differential Privacy:
# Create federated government
federated_government = VerticalFederatedGovernment(model_nodes,
nodes_federation,
server_node=server)
federated_government.run_epochs(epochs=20,
batch_size=32,
length_dataset=len(train_data_bank),
test_data=test_data,
test_label=test_labels.reshape(-1,1),
eval_freq=0)
Evaluation in epoch 1 :
Loss: 0.391084223985672 Accuracy: 0.7140154630071299
Evaluation in epoch 2 :
Loss: 0.32680657505989075 Accuracy: 0.7363164870049965
Evaluation in epoch 3 :
Loss: 0.30523717403411865 Accuracy: 0.7436662066732604
Evaluation in epoch 4 :
Loss: 0.29654932022094727 Accuracy: 0.74783887383166
Evaluation in epoch 5 :
Loss: 0.2923213243484497 Accuracy: 0.7511411374882173
Evaluation in epoch 6 :
Loss: 0.2898702025413513 Accuracy: 0.7540682992256347
Evaluation in epoch 7 :
Loss: 0.28819844126701355 Accuracy: 0.7566138289295304
Evaluation in epoch 8 :
Loss: 0.28690990805625916 Accuracy: 0.7587437092598693
Evaluation in epoch 9 :
Loss: 0.28584083914756775 Accuracy: 0.7606230913652601
Evaluation in epoch 10 :
Loss: 0.28491896390914917 Accuracy: 0.7621514418568487
Evaluation in epoch 11 :
Loss: 0.2841084897518158 Accuracy: 0.7635078073591959
Evaluation in epoch 12 :
Loss: 0.28338906168937683 Accuracy: 0.7647054758957532
Evaluation in epoch 13 :
Loss: 0.2827470004558563 Accuracy: 0.7657924362140704
Evaluation in epoch 14 :
Loss: 0.28217220306396484 Accuracy: 0.7667910881251084
Evaluation in epoch 15 :
Loss: 0.2816565930843353 Accuracy: 0.7676595174063237
Evaluation in epoch 16 :
Loss: 0.2811933755874634 Accuracy: 0.7684885382065603
Evaluation in epoch 17 :
Loss: 0.2807767391204834 Accuracy: 0.7692193035648198
Evaluation in epoch 18 :
Loss: 0.28040173649787903 Accuracy: 0.7699452852346368
Evaluation in epoch 19 :
Loss: 0.28006383776664734 Accuracy: 0.7705858438965527
Evaluation in epoch 20 :
Loss: 0.27975916862487793 Accuracy: 0.7712013451428174
To evaluate the models, we use the ROC AUC metric. The ROC curve is a useful diagnostic tool for understanding the trade-off between the true positive rate and the false positive rate at different classification thresholds, and the ROC AUC summarises it in a single number that is useful for comparing models based on their general capabilities.
First, we plot the ROC curve of the federated model without Differential Privacy; at the end of the notebook, all the curves are plotted together in a single chart for comparison.
y_prediction_fed = federated_government._server.predict_collaborative_model(test_data)
plot_roc(y_prediction_fed, test_labels)
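For reference, the ROC AUC equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one (ties count as one half). The sketch below uses scikit-learn's `roc_curve` and `roc_auc_score` on a toy example and checks this pairwise interpretation; it is independent of the `plot_roc` helper, whose internals are not shown.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1])
y_score = np.array([0.1, 0.4, 0.35, 0.8])

# Points of the ROC curve (one per distinct threshold) and the area under it
fpr, tpr, thresholds = roc_curve(y_true, y_score)
auc = roc_auc_score(y_true, y_score)

# Same value from the pairwise definition: P(score_pos > score_neg) + 0.5 * P(tie)
pos = y_score[y_true == 1]
neg = y_score[y_true == 0]
pairwise = (pos[:, None] > neg[None, :]).mean() + 0.5 * (pos[:, None] == neg[None, :]).mean()
print(auc, pairwise)  # 0.75 0.75
```

In the notebook, the equivalent call would be `roc_auc_score(test_labels, np.ravel(y_prediction_fed))`, assuming the predictions can be converted to a NumPy array.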
3) Adding differential privacy
Differential Privacy is a statistical technique for releasing aggregate information while preventing the leakage of individual data records. It makes it hard for malicious agents intercepting the local parameters exchanged during training to trace this information back to the data sources, adding an extra layer of data privacy.
Adding this privacy layer takes a few steps, since it needs to be properly adapted to the data: first, computing the sensitivity; then, choosing the noise mechanism; and finally, setting the parameter of the mechanism, also called the privacy parameter or privacy budget.
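The three steps map onto the standard definitions below. This is the textbook formulation of ε-differential privacy and of the Laplace mechanism; we assume, without having checked the library source, that shfl's `LaplaceMechanism` follows it.

```latex
% epsilon-differential privacy: for all neighbouring datasets D, D' (differing in one record)
% and every set of outputs S,
\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \, \Pr[\mathcal{M}(D') \in S]

% Step 1: L1 sensitivity of the query f
\Delta f = \max_{D, D' \text{ neighbouring}} \lVert f(D) - f(D') \rVert_1

% Steps 2 and 3: Laplace mechanism with privacy budget epsilon
\mathcal{M}(D) = f(D) + (Y_1, \dots, Y_k), \qquad Y_i \sim \mathrm{Laplace}\!\left(0, \frac{\Delta f}{\varepsilon}\right)
```

A smaller ε gives a stronger guarantee but a larger noise scale Δf/ε, which is the accuracy and privacy trade-off explored in the rest of the notebook.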
3.1) Calculating sensitivity
Following the same methodology as in the horizontal Federated Learning with Differential Privacy case, we calculate the sensitivity, this time for tabular data.
model_test_bank = nn.Sequential(
nn.Linear(train_data_bank.shape[1], 256, bias=True),
nn.Linear(256, 128, bias=True),
nn.Linear(128, client_out_dim, bias=True),
)
model_test_ins = nn.Sequential(
nn.Linear(train_data_ins.shape[1], 256, bias=True),
nn.Linear(256, 128, bias=True),
nn.Linear(128, client_out_dim, bias=True),
)
max_sensitivity_bank=sensitivity(400,train_data_bank,model_test_bank)
max_sensitivity_ins=sensitivity(400,train_data_ins,model_test_ins)
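The `sensitivity` helper used above is not shown in this notebook. One common empirical approach, sketched here as an assumption rather than a description of that helper, is to evaluate a query on the full data and on neighbouring datasets with one record removed, keeping the largest L1 change. `empirical_sensitivity` is a hypothetical name; a sampled maximum only gives a lower bound on the true sensitivity, which is a worst case over all neighbouring datasets.

```python
import numpy as np


def empirical_sensitivity(query, data, n_samples, rng=None):
    """Hypothetical helper: estimate the L1 sensitivity of `query` by removing
    one randomly chosen record at a time (neighbouring datasets)."""
    rng = np.random.default_rng(rng)
    full = np.asarray(query(data), dtype=float)
    n = min(n_samples, len(data))
    idx = rng.choice(len(data), size=n, replace=False)
    worst = 0.0
    for i in idx:
        neighbour = np.delete(data, i, axis=0)  # dataset without record i
        diff = np.abs(full - np.asarray(query(neighbour), dtype=float)).sum()
        worst = max(worst, float(diff))
    return worst


data = np.array([[1.0], [2.0], [3.0]])
# Query: column sums; removing the record with value 3 changes the result the most
print(empirical_sensitivity(lambda d: d.sum(axis=0), data, n_samples=3, rng=0))  # 3.0
```

In the notebook's setting, the query would be the output of a client model over its data, which is why auxiliary models with the clients' input and output dimensions are built above.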
For these experiments, we replicate the same models and structure:
nodes_federation = federate_list([train_data_bank, train_data_ins],fed_data_node_type="HeterogeneousDataNode")
model_nodes = model_nodes_list()
server = vertical_server(nodes_federation,train_labels)
nodes_federation.configure_model_access(meta_params_query)
server.configure_model_access(meta_params_query)
server.configure_data_access(train_set_evaluation)
3.2) Adding the noise mechanism: trade-off between accuracy and privacy
3.2.1) Local Differential Privacy
Adding Differential Privacy comes at a cost in model performance. Tuning the ε value lets us choose the balance between privacy and accuracy, since we want the model to remain effective enough to be usable in real-world applications. In this case, we use local Differential Privacy: the local parties mask the values they send to the server. A value of ε = 0.5 gives a high level of privacy, so let's see how the model behaves.
from shfl.differential_privacy import LaplaceMechanism
# One Laplace mechanism per client, calibrated with that client's sensitivity and ε = 0.5
params_access_definition_bank = LaplaceMechanism(sensitivity=max_sensitivity_bank, epsilon=0.5)
params_access_definition_ins = LaplaceMechanism(sensitivity=max_sensitivity_ins, epsilon=0.5)
# Attach each mechanism as the access policy of the corresponding client node,
# so that the values it shares with the server are perturbed with Laplace noise
nodes_federation[0]._model_params_access_policy=params_access_definition_bank
nodes_federation[1]._model_params_access_policy=params_access_definition_ins
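Under the standard Laplace mechanism, the noise scale is the sensitivity divided by ε, and the mean absolute value of Laplace noise equals that scale, so lowering ε from 5 to 0.5 multiplies the expected noise magnitude by ten. The sketch below illustrates this on hypothetical zero-valued embeddings; `laplace_mask` is an illustrative function, not part of shfl.

```python
import numpy as np


def laplace_mask(values, sensitivity, epsilon, rng=None):
    """Add Laplace noise with scale sensitivity / epsilon (standard Laplace mechanism)."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if sensitivity < 0:
        raise ValueError("sensitivity must be non-negative")
    rng = np.random.default_rng(rng)
    scale = sensitivity / epsilon
    return values + rng.laplace(loc=0.0, scale=scale, size=np.shape(values))


embeddings = np.zeros((1000, 2))  # hypothetical client outputs
for eps in (0.1, 0.5, 5.0):
    noisy = laplace_mask(embeddings, sensitivity=1.0, epsilon=eps, rng=0)
    # Mean |noise| should be close to sensitivity / eps
    print(eps, np.abs(noisy).mean())
```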
Then, we run the vertical training:
# Create federated government
federated_government_dp = VerticalFederatedGovernment(model_nodes,
nodes_federation,
server_node=server)
federated_government_dp.run_epochs(epochs=20,
batch_size=32,
length_dataset=len(train_data_bank),
test_data=test_data,
test_label=test_labels.reshape(-1,1),
eval_freq=0)
Evaluation in epoch 1 :
Loss: 0.4675355851650238 Accuracy: 0.6640544690508267
Evaluation in epoch 2 :
Loss: 0.3920636475086212 Accuracy: 0.7047485473153106
Evaluation in epoch 3 :
Loss: 0.3435995578765869 Accuracy: 0.72908241110654
Evaluation in epoch 4 :
Loss: 0.3171211779117584 Accuracy: 0.7387661910769175
Evaluation in epoch 5 :
Loss: 0.30362868309020996 Accuracy: 0.7428616358361743
Evaluation in epoch 6 :
Loss: 0.2964669466018677 Accuracy: 0.7459033023699607
Evaluation in epoch 7 :
Loss: 0.2925344705581665 Accuracy: 0.7483224819537252
Evaluation in epoch 8 :
Loss: 0.29021021723747253 Accuracy: 0.7503316310995634
Evaluation in epoch 9 :
Loss: 0.2887301743030548 Accuracy: 0.7519811683650279
Evaluation in epoch 10 :
Loss: 0.2876769006252289 Accuracy: 0.753289621051388
Evaluation in epoch 11 :
Loss: 0.28688913583755493 Accuracy: 0.7544001960856672
Evaluation in epoch 12 :
Loss: 0.2862595319747925 Accuracy: 0.7553387101991427
Evaluation in epoch 13 :
Loss: 0.28572171926498413 Accuracy: 0.7561912697837788
Evaluation in epoch 14 :
Loss: 0.2852461040019989 Accuracy: 0.7569897660958587
Evaluation in epoch 15 :
Loss: 0.2848053276538849 Accuracy: 0.7577042820997263
Evaluation in epoch 16 :
Loss: 0.2844155728816986 Accuracy: 0.7583033821284741
Evaluation in epoch 17 :
Loss: 0.28407275676727295 Accuracy: 0.7588756783156313
Evaluation in epoch 18 :
Loss: 0.2837557792663574 Accuracy: 0.759440912867469
Evaluation in epoch 19 :
Loss: 0.2834589183330536 Accuracy: 0.7599086512929552
Evaluation in epoch 20 :
Loss: 0.283174604177475 Accuracy: 0.7603993969819026
y_prediction_dp = federated_government_dp._server.predict_collaborative_model(test_data)
plot_roc(y_prediction_dp, test_labels)
3.2.2) Differential Privacy on top of everything
The server's labels and model can also be sensitive, so, to avoid possible information leaks, Differential Privacy can also be used to protect the server when it returns the gradients used in the learning process. We keep ε = 0.5 for the local parties and use ε = 0.1 for the server, which lets us compare the performance of local Differential Privacy with that of Differential Privacy applied everywhere. Again, we replicate the same models and structure for the experiment.
nodes_federation = federate_list([train_data_bank, train_data_ins], fed_data_node_type="HeterogeneousDataNode")
model_nodes = model_nodes_list()
server = vertical_server(nodes_federation,train_labels)
nodes_federation.configure_model_access(meta_params_query)
server.configure_model_access(meta_params_query)
server.configure_data_access(train_set_evaluation)
In this case, we apply Differential Privacy on the server side in addition to the local parties. The mechanism is applied when the server returns the learnt gradients, so that they do not directly reveal the information computed by the server model. The sensitivity is set to 0 because it is computed on the server side, which needs the embeddings to work. Note, however, that for the standard Laplace mechanism the noise scale is the sensitivity divided by ε, so a sensitivity of 0 corresponds to a noise scale of 0.
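To make the server-side step concrete, the sketch below computes the gradient of the mean binary cross-entropy with respect to the aggregated embeddings for a linear-plus-sigmoid server, dL/dz_i = (p_i - y_i) * w / n, and then adds Laplace noise with scale sensitivity / ε. It assumes the noise is applied to exactly these returned gradients; the function name and arguments are hypothetical. With `sensitivity=0.0`, the returned gradients are unchanged.

```python
import numpy as np


def server_backward_with_dp(z, y, w, b, sensitivity, epsilon, rng=None):
    """Hypothetical server step: gradient of mean BCE w.r.t. the aggregated
    embeddings z, perturbed with Laplace noise of scale sensitivity / epsilon.

    Server model: p = sigmoid(z @ w + b), with w of shape (d, 1).
    Returns the clean and the noisy gradient, both of shape (n, d).
    """
    rng = np.random.default_rng(rng)
    n = z.shape[0]
    p = 1.0 / (1.0 + np.exp(-(z @ w + b)))  # predicted probabilities, (n, 1)
    grad = (p - y) @ w.T / n                 # dL/dz = (p - y) w^T / n
    noise = rng.laplace(0.0, sensitivity / epsilon, size=grad.shape)
    return grad, grad + noise


rng = np.random.default_rng(0)
z = rng.normal(size=(4, 2))                  # aggregated embeddings
y = np.array([[0.0], [1.0], [1.0], [0.0]])   # labels held by the server
w, b = rng.normal(size=(2, 1)), np.array([0.0])
clean, noisy = server_backward_with_dp(z, y, w, b, sensitivity=1.0, epsilon=0.1, rng=1)
# Average |noise|; its expected value is 1.0 / 0.1
print(np.abs(noisy - clean).mean())
```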
params_access_definition_bank = LaplaceMechanism(sensitivity=max_sensitivity_bank, epsilon=0.5)
params_access_definition_ins = LaplaceMechanism(sensitivity=max_sensitivity_ins, epsilon=0.5)
params_access_definition_server = LaplaceMechanism(sensitivity=0, epsilon=0.1)
nodes_federation[0]._model_params_access_policy=params_access_definition_bank
nodes_federation[1]._model_params_access_policy=params_access_definition_ins
server._model_params_access_policy=params_access_definition_server
Finally, we run the last training:
# Create federated government
federated_government_dp_2 = VerticalFederatedGovernment(model_nodes,
nodes_federation,
server_node=server)
federated_government_dp_2.run_epochs(epochs=20,
batch_size=32,
length_dataset=len(train_data_bank),
test_data=test_data,
test_label=test_labels.reshape(-1,1),
eval_freq=0)
Evaluation in epoch 1 :
Loss: 0.4450891315937042 Accuracy: 0.4634378895763732
Evaluation in epoch 2 :
Loss: 0.40160995721817017 Accuracy: 0.555622906851563
Evaluation in epoch 3 :
Loss: 0.3697430491447449 Accuracy: 0.6586422950648889
Evaluation in epoch 4 :
Loss: 0.3464324474334717 Accuracy: 0.7115684924714615
Evaluation in epoch 5 :
Loss: 0.32728689908981323 Accuracy: 0.7324506592909784
Evaluation in epoch 6 :
Loss: 0.31455641984939575 Accuracy: 0.7394715193617132
Evaluation in epoch 7 :
Loss: 0.30522421002388 Accuracy: 0.7432747035366036
Evaluation in epoch 8 :
Loss: 0.2994040846824646 Accuracy: 0.7459770319172248
Evaluation in epoch 9 :
Loss: 0.2952128052711487 Accuracy: 0.7482064585261046
Evaluation in epoch 10 :
Loss: 0.2922833263874054 Accuracy: 0.7500587330636548
Evaluation in epoch 11 :
Loss: 0.2903350293636322 Accuracy: 0.751723836299448
Evaluation in epoch 12 :
Loss: 0.28914111852645874 Accuracy: 0.7532261422650719
Evaluation in epoch 13 :
Loss: 0.28820687532424927 Accuracy: 0.7546866099397149
Evaluation in epoch 14 :
Loss: 0.2874971032142639 Accuracy: 0.7555486609696734
Evaluation in epoch 15 :
Loss: 0.2868814766407013 Accuracy: 0.7561214127462063
Evaluation in epoch 16 :
Loss: 0.2864290475845337 Accuracy: 0.7570966777359699
Evaluation in epoch 17 :
Loss: 0.2859129011631012 Accuracy: 0.7575949406496126
Evaluation in epoch 18 :
Loss: 0.2854032516479492 Accuracy: 0.7582002670664919
Evaluation in epoch 19 :
Loss: 0.28495877981185913 Accuracy: 0.7586544137422764
Evaluation in epoch 20 :
Loss: 0.2845660150051117 Accuracy: 0.7591349086702758
The ROC curve of the model with Differential Privacy applied on the server side as well is:
y_prediction_dp_2 = federated_government_dp_2._server.predict_collaborative_model(test_data)
plot_roc(y_prediction_dp_2, test_labels)
4) Comparison
These results show the impact of the different Differential Privacy approaches on vertical Federated Learning. The choice should take into account how much privacy is needed, where it is worth applying it, and how much noise the model can tolerate before its performance starts to decline.
4.1) ROC curve
values=[y_prediction_fed, y_prediction_dp, y_prediction_dp_2]
titles=['Federated', 'Federated with local DP', 'Federated with DP everywhere']
colors=['blue', 'lightgreen', 'green']
linestyle=[':','-.','-']
plot_all_roc_curves(test_labels, values, titles, colors, linestyle)
One effect observed in some cases is that a model gets stuck in a local minimum, and adding noise at each iteration may allow it to escape and reach a value closer to the global minimum. Adding a large amount of Differential Privacy noise can produce this effect, but it can also harm the model, so it is always advisable to choose the best balance between model performance and privacy.
The ROC curves do not allow us to draw a conclusion about the behaviour of the models when the Differential Privacy mechanism is applied, since all three are very similar. For that reason, we also compute the F1-score of each model.
4.2) F1-Score
n_classes=2
values_f1_fed = (y_prediction_fed > 0.5).astype(int)
values_f1_dp = (y_prediction_dp > 0.5).astype(int)
values_f1_dp2 = (y_prediction_dp_2 > 0.5).astype(int)
score_fed_f1 = f1_score(test_labels, values_f1_fed, average='macro')
score_dp_f1 = f1_score(test_labels, values_f1_dp, average='macro')
score_dp2_f1 = f1_score(test_labels, values_f1_dp2, average='macro')
values=[round(score_fed_f1, 4), round(score_dp_f1, 4), round(score_dp2_f1, 4)]
titles=['Federated', 'Federated with local DP', 'Federated with DP everywhere']
colors=['blue', 'lightgreen', 'green']
plot_all_metric(values, "F1-Score", titles, colors)
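The macro-averaged F1-score computes the per-class F1-score, 2·TP / (2·TP + FP + FN), separately for each class and takes their unweighted mean, so the minority class weighs as much as the majority class. The sketch below reproduces `f1_score(..., average='macro')` by hand on hypothetical toy predictions, using the same 0.5 threshold as the cell above.

```python
import numpy as np
from sklearn.metrics import f1_score


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores (what average='macro' computes)."""
    y_true, y_pred = np.ravel(y_true), np.ravel(y_pred)
    scores = []
    for c in np.unique(np.concatenate([y_true, y_pred])):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return float(np.mean(scores))


probs = np.array([[0.2], [0.9], [0.4], [0.1]])  # hypothetical model outputs
y_pred = (probs > 0.5).astype(int)              # same 0.5 threshold as above
y_true = np.array([0, 1, 1, 0])
print(macro_f1(y_true, y_pred), f1_score(y_true, y_pred.ravel(), average="macro"))
```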
The most private model has a lower F1-score than the others due to its low values of ε (which mean high privacy). However, the performance loss is small and the model remains useful. The drop in score in the Differential Privacy cases can be mitigated by training longer, but only up to a point, since the noise makes convergence harder. In other words, more training rounds can improve these results, but the computational cost grows and, once a minimum is reached, further training is no longer worthwhile.