Federated models: non-linear kernel SVM classification

In this notebook, we present an approach to federated non-linear support vector machine (SVM) classification. The model is implemented in the Sherpa.ai Federated Learning Framework and trained on a synthetic database. Moreover, differential privacy is applied and its impact on the global model is assessed.

1) The data

We start by creating a synthetic database:

import shfl
from shfl.data_base.data_base import WrapLabeledDatabase
from sklearn.datasets import make_classification
import numpy as np
from shfl.model.linear_classifier_model import LinearClassifierModel
from shfl.model.svm_classifier_model import SVMClassifierModel
from shfl.auxiliar_functions_for_notebooks.functionsFL import *
from sklearn.svm import NuSVC
from sklearn.svm import SVC
from sklearn import metrics

# Create database:
n_features = 2
n_classes = 3
data, labels = make_classification(
    n_samples=10000, n_features=n_features, n_informative=2, 
    n_redundant=0, n_repeated=0, n_classes=n_classes, 
    n_clusters_per_class=1, weights=None, flip_y=0.1, class_sep=0.5)
database = WrapLabeledDatabase(data, labels)

train_data, train_labels, test_data, test_labels = database.load_data()

C = 1
kwargs = {'C':C, 'kernel':"linear"}
model_use = SVC(**kwargs)
print("Shape of train and test data: " + str(train_data.shape) + str(test_data.shape))
print("Shape of train and test labels: " + str(train_labels.shape) + str(test_labels.shape))
Shape of train and test data: (8000, 2)(2000, 2)
Shape of train and test labels: (8000,)(2000,)

2) The model

Next, we wrap sklearn's support vector machine classifiers in the framework's SVMClassifierModel class. With this implementation, you can use either SVC or NuSVC:

# from shfl.model.utils import check_initialization_classification
# from shfl.model.utils import check_data_features

As a reference, we train a centralized (i.e. non-federated) model. In the case of two features, we can visualize the solution on a plane:

# Train global model using framework's class: 
classes = list(range(n_classes))
model_centralized = SVMClassifierModel(n_features=n_features, classes=classes, model=model_use)
model_centralized.train(data=train_data, labels=train_labels)
if n_features == 2:
    plot_2D_decision_boundary(model_centralized, test_data, labels=test_labels, title = "Benchmark: Classification using Centralized data")
print("Test performance using centralized data: " + str(model_centralized.evaluate(test_data, test_labels)))


predictions_cent = model_centralized.predict(test_data)

[Figure: Benchmark: Classification using Centralized data]

Test performance using centralized data: [('balanced_accuracy', 0.6690983355230552), ('cohen_kappa', 0.5051932152815586)]

2.1) How to aggregate a model's parameters from each federated node

sklearn provides three options for classification: LinearSVC, SVC and NuSVC. The linear version, LinearSVC, is easily incorporated in the platform, since aggregating the model is straightforward (see the notebook for Federated Logistic Regression, in which LinearSVC can be used instead of the logistic regression). For SVC and NuSVC, on the other hand, the output model's parameters are more complex, since they depend on the number of support vectors for each class. Thus, in principle, each client would deliver parameters with different dimensions, which are not straightforward to aggregate.
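To make the dimension mismatch concrete, here is a minimal sketch using plain scikit-learn rather than the framework. The toy data, the two client slices, and the row layout (features followed by the class label, mirroring how the aggregator in this notebook reads the parameters) are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Hypothetical toy data standing in for two clients' private datasets.
X, y = make_classification(
    n_samples=600, n_features=2, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0)
client_slices = [slice(0, 200), slice(200, 600)]

client_params = []
for s in client_slices:
    clf = SVC(C=1, kernel="linear").fit(X[s], y[s])
    # One row per support vector: the features followed by the class label.
    sv_labels = y[s][clf.support_].reshape(-1, 1)
    client_params.append(np.hstack((clf.support_vectors_, sv_labels)))

for i, params in enumerate(client_params):
    print("Client", i, "parameter matrix shape:", params.shape)
```

The number of columns is the same for every client, but the number of rows generally is not, so the matrices cannot simply be averaged element-wise.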

Here, we use the clients' support vectors to train a global model directly on the server, obtaining the aggregated model:

import numpy as np
import inspect


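def svm_aggregator(clients_params):
    """Uses the server's (caller object) model to aggregate the clients' parameters."""
    # Retrieve the server object that invoked this function from the caller's stack frame.
    caller_object = inspect.currentframe().f_back.f_locals['self']
    # Stack all clients' rows; each row is a support vector followed by its class index.
    clients_params_array = np.vstack(clients_params)
    # Train the server's model on the stacked support vectors and their labels.
    caller_object._model.train(clients_params_array[:, 0:-1], clients_params_array[:, -1].astype(int))

    return caller_object._model.get_model_params()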
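The same idea can be sketched without the framework, which may help to see what the server actually does. The functions client_update and server_aggregate, the four equal-sized clients, and the toy data are hypothetical names and choices made for this illustration; they are not part of the Sherpa.ai API.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import balanced_accuracy_score
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=2000, n_features=2, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0)
X_train, y_train, X_test, y_test = X[:1600], y[:1600], X[1600:], y[1600:]


def client_update(X_local, y_local):
    """Train locally and return rows of [support vector | label]."""
    clf = SVC(C=1, kernel="linear").fit(X_local, y_local)
    return np.hstack((clf.support_vectors_, y_local[clf.support_].reshape(-1, 1)))


def server_aggregate(clients_params):
    """Stack all clients' support vectors and fit the global SVM on them."""
    stacked = np.vstack(clients_params)
    return SVC(C=1, kernel="linear").fit(stacked[:, :-1], stacked[:, -1].astype(int))


# Four hypothetical clients, each holding an equal share of the training data.
clients_params = [client_update(X_part, y_part)
                  for X_part, y_part in zip(np.array_split(X_train, 4),
                                            np.array_split(y_train, 4))]
global_model = server_aggregate(clients_params)
centralized = SVC(C=1, kernel="linear").fit(X_train, y_train)

score_global = balanced_accuracy_score(y_test, global_model.predict(X_test))
score_central = balanced_accuracy_score(y_test, centralized.predict(X_test))
print("Global model (trained on support vectors):", score_global)
print("Centralized model:", score_central)
```

Only the support vectors leave the clients in this sketch, which is also why the approach exposes part of the raw data to the server.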

3) Run the federated learning experiment

Once the aggregator is defined, we can run the federated model. Note that the decision boundary can vary even when training is repeated on the same data (this is due to the internal shuffle of the data in the SVM solver). Thus, to compare the centralized and the federated models, it is more relevant to compare their scores on the test data:

iid_distribution = shfl.data_distribution.IidDataDistribution(database)
nodes_federation, test_data, test_labels = iid_distribution.get_nodes_federation(num_nodes=20, percent=100)

classes = list(range(n_classes))
def model_builder():
    model = SVMClassifierModel(n_features=n_features, classes=classes, model=model_use)
    return model

# aggregator = GlobalModelAggregator(SVMClassifierModel(n_features=n_features, classes=classes))
aggregator = svm_aggregator

federated_government = shfl.federated_government.FederatedGovernment(model_builder(), nodes_federation, aggregator)
federated_government.run_rounds(n_rounds=3, test_data=test_data, test_label=test_labels)

if n_features == 2:
    plot_2D_decision_boundary(federated_government._server._model, 
                              test_data, test_labels, 
                              title = "Global model: Classification using Federated data")
Evaluation in round 0:

Collaborative model test -> balanced_accuracy: 0.6711961943239064  cohen_kappa: 0.5082788104256921  

Evaluation in round 1:

Collaborative model test -> balanced_accuracy: 0.6701208304105807  cohen_kappa: 0.5067053481863151  

Evaluation in round 2:

Collaborative model test -> balanced_accuracy: 0.6696095829668179  cohen_kappa: 0.5059459443200327  

[Figure: Global model: Classification using Federated data]

4) Add differential privacy

In instance-based machine learning methods such as SVM or KNN, part of the data (or the entire data, in the worst case) constitutes the resulting model. These methods are thus particularly exposed to reconstruction attacks (see, e.g., Yang et al. 2019). In order to protect private information, we can apply differential privacy (DP) to the model output by the clients and observe its influence on the federated global model.

4.1) Sensitivity by sampling

We first estimate the model's sensitivity by sampling. Recall that the matrices of support vectors are the actual model parameters, and that they can have different numbers of rows. We therefore need to define a distance between such matrices: we choose the maximum Euclidean distance between their rows (see matrix distance).
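As a small worked example of such a distance, the sketch below compares two matrices with different numbers of rows by keeping only the rows that appear in exactly one of them and taking the largest pairwise Euclidean distance among those rows. The function name xor_rows_max_distance and the toy matrices are hypothetical; this is one possible reading of the distance, written with Python sets for readability.

```python
import numpy as np
from scipy.spatial import distance_matrix


def xor_rows_max_distance(a, b):
    """Largest Euclidean distance among rows found in exactly one of a and b."""
    rows_a = {tuple(row) for row in a}
    rows_b = {tuple(row) for row in b}
    unshared = np.array(sorted(rows_a ^ rows_b), dtype=float)
    if unshared.shape[0] == 0:
        return 0.0  # identical sets of rows
    return float(distance_matrix(unshared, unshared).max())


a = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
b = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0], [2.0, 2.0]])

# Unshared rows are (0, 1), (3, 4) and (2, 2); the largest distance is
# between (0, 1) and (3, 4), i.e. sqrt(3**2 + 3**2) = sqrt(18).
print(xor_rows_max_distance(a, b))
```

Note that with this definition a single unshared row yields a distance of zero, since that row is only compared with itself.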

Note that the scikit-learn SVM solver may behave non-deterministically: due to the internal data shuffle, it may deliver slightly different support vectors when trained repeatedly on the same set. Moreover, even with a fixed random seed (see the random_state input parameter), simply swapping one row in the training dataset may result in a slightly different output.

from shfl.differential_privacy import SensitivitySampler
from shfl.differential_privacy import L1SensitivityNorm
from shfl.differential_privacy import SensitivityNorm
from scipy.spatial import distance_matrix
 
class UniformDistribution(shfl.differential_privacy.ProbabilityDistribution):
    """
    Implement Uniform sampling over the data
    """
    def __init__(self, sample_data):
        self._sample_data = sample_data

    def sample(self, sample_size):
        row_indices = np.random.choice(a=self._sample_data.shape[0], size=sample_size, replace=False)
        
        return self._sample_data[row_indices, :]
    
    
class SVMClassifierSample(SVMClassifierModel):
    
    def __call__(self, data_array):
        data = data_array[:, 0:-1]
        labels = data_array[:, -1].astype(int)
        params = np.array([], dtype=np.int64).reshape(0, (self._n_features + 1))
        self.set_model_params(params)
        train_model = self.train(data, labels)
        model_params = self.get_model_params()
        model_params = model_params[:,0:-1] # Exclude the classes indices
        
        return model_params.copy()


class MatrixSetXoRNorm(SensitivityNorm):
    """
    Distance matrix using only rows not in common.
    """
    def compute(self, x_1, x_2):
        nrows, ncols = x_1.shape
        dtype = {'names':['f{}'.format(i) for i in range(ncols)],
                   'formats':ncols * [x_1.dtype]}
        x = np.setxor1d(x_1.view(dtype), x_2.view(dtype))
        x = x.view(x_1.dtype).reshape(-1, ncols)
        if x.shape[0] != 0:
            x = distance_matrix(x,x)
            x = x.max()
        else:
            x = 0

        return x

In the sensitivity sampling, we consider pairs of databases that differ in at most one entry, or that even contain exactly the same data, and yet some of the resulting support vectors turn out to be different. The sampling procedure is therefore expected to deliver results with high variance.

The resulting sensitivity is particularly high, and the application of DP dramatically deteriorates the performance of the global model:

# Create sampling database:
n_instances = 400
sampling_data, sampling_labels = make_classification(
    n_samples=n_instances, n_features=n_features, n_informative=2, 
    n_redundant=0, n_repeated=0, n_classes=n_classes, 
    n_clusters_per_class=1, weights=None, flip_y=0.1, class_sep=0.1)  
sample_data = np.hstack((sampling_data, sampling_labels.reshape(-1,1)))

# Sampling sensitivity:
distribution = UniformDistribution(sample_data)
sampler = SensitivitySampler()

n_data_size = 200 # must be <= n_instances
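kwargs['random_state'] = 123
model_use.set_params(random_state=kwargs['random_state'])  # model_use was built before this line, so pass the seed to it explicitly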
max_sensitivity, mean_sensitivity = sampler.sample_sensitivity(
    SVMClassifierSample(n_features=n_features, classes=classes, model=model_use), 
    MatrixSetXoRNorm(), distribution, n_data_size=n_data_size, m_sample_size=100)

print("Max sensitivity from sampling: " + str(max_sensitivity))
print("Mean sensitivity from sampling: " + str(mean_sensitivity))
Max sensitivity from sampling: 4.884894520827751
Mean sensitivity from sampling: 1.3691717981016511
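The sampling procedure can be pictured with the simplified sketch below: repeatedly draw two neighbouring databases that differ in a single row, apply the query to both, and record the distance between the outputs. The function sample_sensitivity defined here, the mean query, and the L1 distance are hypothetical stand-ins chosen for brevity; the framework's SensitivitySampler may be implemented differently.

```python
import numpy as np

rng = np.random.default_rng(0)


def sample_sensitivity(query, norm, sample, n_data_size, m_sample_size):
    """Empirical max and mean of norm(query(D1), query(D2)) over neighbouring databases."""
    distances = []
    for _ in range(m_sample_size):
        # Draw n + 1 rows: D1 and D2 share the first n - 1 rows and differ in the last one.
        rows = sample(n_data_size + 1)
        d1 = rows[:-1]
        d2 = np.vstack((rows[:-2], rows[-1:]))
        distances.append(norm(query(d1), query(d2)))
    return max(distances), float(np.mean(distances))


pool = rng.normal(size=(400, 2))  # hypothetical data pool to sample from


def sample(size):
    return pool[rng.choice(pool.shape[0], size=size, replace=False)]


def query(data):
    return data.mean(axis=0)  # stand-in for "train the SVM and return its parameters"


def norm(u, v):
    return float(np.abs(u - v).sum())


max_s, mean_s = sample_sensitivity(query, norm, sample, n_data_size=200, m_sample_size=100)
print("Max sensitivity:", max_s)
print("Mean sensitivity:", mean_s)
```

For a smooth query such as the mean, the sampled values are small and stable; for the SVM, where one changed row can swap several support vectors, the spread between the mean and the maximum is much larger, as the output above shows.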

4.2) Run the federated learning experiment with differential privacy

At this stage we are ready to add a layer of DP to our federated learning model. The Gaussian mechanism is employed.
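To get a feeling for the amount of noise involved, the sketch below computes the noise scale under the classical calibration of the Gaussian mechanism, sigma = sqrt(2 ln(1.25 / delta)) * sensitivity / epsilon, whose analysis holds for 0 < epsilon < 1. Whether the framework's GaussianMechanism uses exactly this formula is an assumption; the function gaussian_sigma and the two example rows are hypothetical.

```python
import numpy as np


def gaussian_sigma(sensitivity, epsilon, delta):
    """Classical Gaussian-mechanism noise scale (analysis valid for 0 < epsilon < 1)."""
    return np.sqrt(2.0 * np.log(1.25 / delta)) * np.asarray(sensitivity) / epsilon


# Values taken from this notebook: the max sampled sensitivity for the two
# features, and zero for the label column so that classes are left untouched.
sensitivity = np.array([4.884894520827751, 4.884894520827751, 0.0])
sigma = gaussian_sigma(sensitivity, epsilon=0.9, delta=0.9)
print("Noise standard deviation per column:", sigma)

rng = np.random.default_rng(0)
# Hypothetical rows of [feature 1 | feature 2 | label].
support_vectors = np.array([[0.2, -0.1, 0.0], [0.5, 0.4, 1.0]])
noisy = support_vectors + rng.normal(scale=sigma, size=support_vectors.shape)
print(noisy)
```

Under this assumption the noise standard deviation on each feature is roughly 4.4, which gives an idea of how strongly the perturbed support vectors can be displaced before the server retrains on them.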

from shfl.differential_privacy import GaussianMechanism

sensitivity_array = np.full((n_features+1,), max_sensitivity)
sensitivity_array[-1] = 0  # We don't apply noise on the classes
params_access_definition = GaussianMechanism(sensitivity=sensitivity_array, epsilon_delta=(0.9, 0.9))
nodes_federation.configure_model_params_access(params_access_definition)

federated_governmentDP = shfl.federated_government.FederatedGovernment(
    model_builder(), nodes_federation, aggregator)

federated_governmentDP.run_rounds(n_rounds=1, test_data=test_data, test_label=test_labels)
if n_features == 2:
    plot_2D_decision_boundary(federated_governmentDP._server._model, 
                              test_data, test_labels, 
                              title = "Global model: Classification using Federated data")
Evaluation in round 0:

Collaborative model test -> balanced_accuracy: 0.3333333333333333  cohen_kappa: 0.0  

[Figure: Global model: Classification using Federated data (with DP, sensitivity by sampling)]
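The scores reported for this run, a balanced accuracy of 1/3 and a Cohen's kappa of 0, are what a classifier that ignores its input would obtain on three classes. The sketch below checks this with scikit-learn's metrics on hypothetical labels; it does not prove that the DP model predicts a single class, only that the scores are consistent with chance-level behaviour.

```python
import numpy as np
from sklearn.metrics import balanced_accuracy_score, cohen_kappa_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 3, size=2000)  # hypothetical labels for three classes
y_const = np.zeros_like(y_true)         # classifier that always predicts class 0

balanced_acc = balanced_accuracy_score(y_true, y_const)
kappa = cohen_kappa_score(y_true, y_const)
print("balanced_accuracy:", balanced_acc)
print("cohen_kappa:", kappa)
```

Balanced accuracy averages the per-class recall (here 1, 0 and 0), and kappa subtracts the agreement expected by chance, so both metrics flag a degenerate model regardless of class proportions.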

4.3) Sensitivity associated with the data

Since the SVM's parameters consist of the data itself, we might assume that the model's sensitivity is actually the sensitivity that one would apply to the data itself when accessing it. We then take the component-wise variance of the data as the sensitivity. The performance of the resulting ϵ-private global model is then comparable to that of the non-private version:

from shfl.differential_privacy import GaussianMechanism

sensitivity_array = np.var(sample_data, axis=0)
sensitivity_array[-1] = 0  # We don't apply noise on the classes
print("Component-wise sensitivity: " + str(sensitivity_array))

params_access_definition = GaussianMechanism(sensitivity=sensitivity_array, epsilon_delta=(0.9, 0.9))
nodes_federation.configure_model_params_access(params_access_definition)

federated_governmentDP = shfl.federated_government.FederatedGovernment(
    model_builder(), nodes_federation, aggregator)

federated_governmentDP.run_rounds(n_rounds=1, test_data=test_data, test_label=test_labels)
if n_features == 2:
    plot_2D_decision_boundary(federated_governmentDP._server._model, test_data, test_labels, title = "Global model: Classification using Federated data")
Component-wise sensitivity: [0.61813351 1.09970295 0.        ]
Evaluation in round 0:

Collaborative model test -> balanced_accuracy: 0.6315070166242993  cohen_kappa: 0.4480216645864218  

[Figure: Global model: Classification using Federated data (with DP, sensitivity from data variance)]

4.4) F1-score comparison

For a clear evaluation using the F1-score, we make predictions and we plot the results:

from sklearn.metrics import f1_score

federated_model = federated_government._server._model
predictions_fed = federated_model.predict(test_data)

dp_model = federated_governmentDP._server._model
predictions_dp = dp_model.predict(test_data)

score_fed_f1 = f1_score(test_labels, predictions_fed, average='macro')
score_cent_f1 = f1_score(test_labels, predictions_cent, average='macro')
score_dp_f1 = f1_score(test_labels, predictions_dp, average='macro')


values=[round(score_cent_f1, 3), round(score_fed_f1, 3), round(score_dp_f1, 3)]
titles=['Centralized', 'Federated', 'Federated with DP']
colors=['red', 'blue', 'green']
plot_all_f1(values, titles, colors)

[Figure: macro F1-score of the Centralized, Federated, and Federated with DP models]
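For readers unfamiliar with the macro average used above, the following sketch computes it by hand on a tiny hypothetical example and compares the result with scikit-learn's f1_score. The label arrays are made up for illustration.

```python
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])

per_class = []
for k in np.unique(y_true):
    tp = np.sum((y_pred == k) & (y_true == k))  # true positives for class k
    fp = np.sum((y_pred == k) & (y_true != k))  # false positives for class k
    fn = np.sum((y_pred != k) & (y_true == k))  # false negatives for class k
    per_class.append(2 * tp / (2 * tp + fp + fn))

# Per-class F1 is 0.5, 0.8 and 2/3; the macro average is their unweighted mean.
macro_manual = float(np.mean(per_class))
macro_sklearn = f1_score(y_true, y_pred, average="macro")
print(macro_manual, macro_sklearn)
```

Because every class contributes equally, the macro average penalizes a model that performs poorly on any single class, which suits the three-class comparison in this section.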

5) Remarks

Remark 1: Federated learning round. In this approach, the model's parameters are the actual support vectors. Thus, at each learning round, the clients send their support vectors to the central server, where an additional SVM is trained to aggregate the global model. The (global) support vectors are then sent back to the clients and are used together with the clients' data to train the local models. However, the global support vectors are not considered local data, and thus are not stored as the client's data on the node.

Remark 2: Application of DP. The model's sensitivity depends strongly on the training data, and the resulting model's performance can easily be degraded by the application of DP. Sensitivity is computed both by sampling and from the data variance, and the latter yields lower sensitivity. Nevertheless, neither of the two approaches fits the definition of sensitivity based on either the L1 or the L2 norm, and a more general notion of distance should be introduced for a formal guarantee of DP (see 3.3 in Dwork et al. 2016).

Remark 3: Reduction of training data. Since the SVM is particularly sensitive to duplicates in the training data, these are removed when fitting the model. However, when DP is applied, essentially no two instances are identical, and more sophisticated techniques for reducing the training data should be used (e.g. a clustering technique as in Yu et al. 2003); otherwise the set of training vectors would keep growing at each federated round, introducing excessive noise in the model. When DP is applied as in this approach, it is thus advisable to run only a few federated rounds.

Remark 4: Tuning of the soft margin and kernel parameters. In the presented case, default values are used; in general, however, SVM models need tuning.
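As an illustration of Remark 4, the sketch below tunes the soft-margin parameter C and the RBF kernel width gamma with scikit-learn's GridSearchCV on centralized toy data. The parameter grid and the dataset are hypothetical, and how best to carry out such a search in a federated setting, where the data cannot be pooled, is left open here.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(
    n_samples=600, n_features=2, n_informative=2, n_redundant=0,
    n_classes=3, n_clusters_per_class=1, random_state=0)

# Hypothetical grid over the soft margin (C) and the RBF kernel width (gamma).
param_grid = {"C": [0.1, 1, 10], "gamma": ["scale", 0.1, 1.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid,
                      scoring="balanced_accuracy", cv=5)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validated balanced accuracy:", search.best_score_)
```

The selected estimator could then be passed as the model argument in place of the untuned SVC used throughout this notebook.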
