# Matlab Wrappers for AdversariaLib (with Examples)


Authors: Igino Corona, Battista Biggio, Davide Maiorca, Pattern Recognition and Applications Lab, Dept. of Electrical and Electronic Engineering, University of Cagliari, Italy.

## Outline

In security-sensitive applications, the success of machine learning depends on a thorough vetting of its resistance to adversarial data. In one pertinent, well-motivated attack scenario, an adversary may attempt to evade a deployed system at test time by carefully manipulating attack samples. In this work, we present a simple but effective gradient-based approach that can be exploited to systematically assess the security of several widely-used classification algorithms against evasion attacks.

Following a recently proposed framework for security evaluation, we simulate attack scenarios that exhibit different risk levels for the classifier by increasing the attacker’s knowledge of the system and her ability to manipulate attack samples. This gives the classifier designer a better picture of the classifier performance under evasion attacks, and allows him to perform a more informed model selection (or parameter setting). We evaluate our approach on the relevant security task of malware detection in PDF files, and show that such systems can be easily evaded. We also sketch some countermeasures suggested by our analysis.

Full paper: Biggio B., Corona I., Maiorca D., Nelson B., Srndic N., Laskov P., Giacinto G., Roli F., Evasion attacks against machine learning at test time, in European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2013), Prague, Czech Republic, 2013.

In the following, we describe how to replicate the experiments described in the above research paper. We also describe how to customize or create new experiments for other “adversarial” evaluations, through a rich set of examples (to this end, please also visit the adversariaLib page). This document is released together with our open-source library adversariaLib: a general-purpose library for the security evaluation of machine learning algorithms, which has been used for the experimental evaluation of the attacks described in the paper. Thus, our aim is not only allowing other researchers to replicate the experiments in our paper, but also:

• allowing other researchers to evaluate the proposed gradient-descent attack with other datasets, experimental settings, and classifiers;
• allowing other researchers to build on (and contribute to) our library for the evaluation of machine learning algorithms against adversarial attacks.

## Requirements

In order to replicate the experiments described in the paper, it is necessary to:

• install Matlab: we tested versions R2012 and later.

• On Ubuntu 13.04 for 64-bit architectures (and possibly 32-bit as well), we found that the library libgfortran.so.3 is not properly loaded from Matlab. Here is the solution:

• Locate Matlab’s copy of the dynamic library libgfortran.so.3:

$ locate libgfortran
/home/MATLAB/R2013a/sys/os/glnxa64/libgfortran.so.3
/home/MATLAB/R2013a/sys/os/glnxa64/libgfortran.so.3.0.0

• Look at the files installed by the libgfortran package:

$ dpkg-query -L libgfortran3
/.
/usr
/usr/share
/usr/share/doc
/usr/lib
/usr/lib/x86_64-linux-gnu
/usr/lib/x86_64-linux-gnu/libgfortran.so.3.0.0
/usr/share/doc/libgfortran3
/usr/lib/x86_64-linux-gnu/libgfortran.so.3
• Override Matlab’s dynamic library with the system copy:
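The override itself amounts to backing up Matlab's bundled copy and symlinking the system library in its place. The sketch below demonstrates the technique on scratch paths so it is safe to try; on a real installation, use Matlab's library directory found via locate above (e.g. .../sys/os/glnxa64), and treat the exact paths as assumptions about your setup:

```shell
# Scratch stand-ins so the commands are safe to run; in practice use
# Matlab's real library directory (e.g. .../sys/os/glnxa64) instead.
MATLAB_LIBDIR=$(mktemp -d)
SYSTEM_LIB=/usr/lib/x86_64-linux-gnu/libgfortran.so.3  # system copy (see dpkg-query output)
touch "$MATLAB_LIBDIR/libgfortran.so.3"                # stands in for Matlab's bundled copy

# The actual override: back up Matlab's copy, then link the system one.
mv "$MATLAB_LIBDIR/libgfortran.so.3" "$MATLAB_LIBDIR/libgfortran.so.3.bak"
ln -s "$SYSTEM_LIB" "$MATLAB_LIBDIR/libgfortran.so.3"
```

On a real installation, the mv and ln -s commands on Matlab's directory typically require sudo.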

### Our results

We also provide our computed results as a compressed archive, exp_paper.tar.gz. Just rename or remove the current exp_paper_ecml folder and uncompress the archive: it will recreate the exp_paper_ecml folder, including the results of our experiments. You can then set make_exp = 0 and leave make_plots = 1 to generate the corresponding plots.

We tested our Matlab/adversariaLib scripts on Apple Mac OS X 10.8.4 and Ubuntu 13.04 (64 bits).

## Defining new Experiments

### General Instructions

1. Inside the matlab folder, create a folder dedicated to the new experiment (e.g., expfolder).
2. Inside that experiment folder, for each figure that you want to print, create a folder whose name starts with “exp”.
3. Put one or more configuration files inside each “exp” folder: one per curve that you want to trace on the figure. You can use set_params_example.m as a template for each new experimental setting. NOTE: each configuration file name must start with “set” (e.g., set_params_svm_lin_lambda_0.m).
4. Open main.m inside the matlab folder. Set setup_folder = 'expfolder'.
5. Set make_exp = 1 (to run the experiment using Python) and make_plots = 1 (to have Matlab show the plots).
6. Run main.m!
7. For each “exp” folder, you will find the plots inside the fig folder, in PDF/EPS format: plot.<pdf|eps>.
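For reference, steps 4 and 5 amount to a few assignments at the top of main.m (a minimal sketch; treat the exact variable layout as an assumption about main.m):

```matlab
% main.m (excerpt): select the experiment folder and choose what to run
setup_folder = 'expfolder'; % folder created at step 1
make_exp = 1;               % 1: run the experiments (via Python/adversariaLib)
make_plots = 1;             % 1: have Matlab generate the plots
```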

### Understanding experimental setups for new experiments

Each experiment is defined through a file whose name has the following format: set_params_xxx.m. Each file is used to generate a setup.py file according to the adversariaLib syntax. In the following, we report the contents of set_params_example.m, which can be used as a Matlab template for the definition of new experiments:

exp_params.dataset_name = 'norm_pdf_med';
exp_params.test_fraction = 0.1; %fraction of randomly chosen test data (the rest will be used for training)
exp_params.nfolds = 3; %number of folds for cross validation
exp_params.nsplits = 1; %number of splits
exp_params.threads = 1; %number of threads (note: if you set -1 it will use all the processors)
exp_params.norm_weights = 'norm_weights.txt'; %normalization parameters for the PDF data.
exp_params.fig_folder = 'fig'; %in this folder (inside each experiment folder) the plot will be saved
exp_params.code_folder = '../adversariaLib'; % This is the relative path to the adversarialib code folder
exp_params.dataset_folder = '../../../../../dataset'; %This is the folder in which the dataset is stored

classifier_params.classifier = 'SVM_lin'; % 'SVM_rbf' or 'MLP'
classifier_params.xval=0;
classifier_params.mlp_steepness = 0.0005;
classifier_params.neurons = 5;
classifier_params.svm_C = 1;
classifier_params.svm_gamma = 1;

evil_classifier_params.training_size = 100;
evil_classifier_params.num_rep = 1; %Number of classifier's copies to learn by randomly sampling a training set
evil_classifier_params.classifier = classifier_params.classifier;
evil_classifier_params.xval= classifier_params.xval;
evil_classifier_params.mlp_steepness = classifier_params.mlp_steepness;
evil_classifier_params.neurons = classifier_params.neurons;
evil_classifier_params.svm_C = classifier_params.svm_C;
evil_classifier_params.svm_gamma = classifier_params.svm_gamma;

mimicry_params.mimicry_distance = 'kde_hamming';
mimicry_params.lambda = 0;
mimicry_params.kde_gamma = 0.1;
mimicry_params.max_leg_patterns = 100;

constraint_params.constraint = 'only_increment'; %Choose between: box, hamming, only_increment
constraint_params.constraint_step = 5;
constraint_params.max_boundaries = 10;

plot_params.title=['SVM (Linear)' ...
', \lambda=' num2str(mimicry_params.lambda)]; %Set up plot title
plot_params.legend_lk = ['SVM LIN' '- LK' ' (C=' num2str(classifier_params.svm_C) ')']; %Set up legend for lk
plot_params.legend_pk = ['SVM LIN' '- PK' ' (C=' num2str(classifier_params.svm_C) ')']; %Set up legend for pk
plot_params.color = 'r';

gradient_params.grad_step = 0.01;

Please note that almost all parameters are associated with parameters of an adversariaLib setup.py file (the parameters of surrogate classifiers are defined through the evil_classifier_params structure). The only Matlab-specific parameters are the following:

exp_params.fig_folder = 'fig';
plot_params.title=['SVM (Linear)' ...
', \lambda=' num2str(mimicry_params.lambda)]; %Set up plot title
plot_params.legend_lk = ['SVM LIN' '- LK' ' (C=' num2str(classifier_params.svm_C) ')']; %Set up legend for lk
plot_params.legend_pk = ['SVM LIN' '- PK' ' (C=' num2str(classifier_params.svm_C) ')']; %Set up legend for pk
plot_params.color = 'r';

A description of each of the above settings follows:

• exp_params.fig_folder sets the default folder for the figures of an experiment;

• exp_params.code_folder sets the relative path to the adversariaLib code;

• classifier_params.xval tells the Matlab script that generates adversariaLib setups whether (1) or not (0) to use cross-validation for finding the best classifier parameters. In particular, if classifier_params.xval = 1, the default adversariaLib settings are:

• for linear SVMs: 'grid_search': {'param_grid': dict(C=np.logspace(-3, 2, 6))}
• for RBF kernel SVMs: 'grid_search': {'param_grid': dict(C=np.logspace(-3, 2, 6), gamma=np.logspace(-3, 3, 7))}

These settings can be changed in the Matlab script utils/setup_write_classifier_params.m.
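For concreteness, the two np.logspace grids above expand to the following values (a plain-Python sketch of the same grids, shown without NumPy):

```python
# np.logspace(-3, 2, 6): six log-spaced values from 1e-3 to 1e2
C_grid = [10.0 ** e for e in range(-3, 3)]
# np.logspace(-3, 3, 7): seven log-spaced values from 1e-3 to 1e3
gamma_grid = [10.0 ** e for e in range(-3, 4)]

print(C_grid)      # [0.001, 0.01, 0.1, 1.0, 10.0, 100.0]
print(gamma_grid)  # [0.001, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]
```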

• plot_params.title sets the title of the plot; in this case, classifier type and lambda parameter are displayed;

• plot_params.legend_lk sets the legend of the (solid) curve related to Limited Knowledge (LK);

• plot_params.legend_pk sets the legend of the (dashed) curve related to Perfect Knowledge (PK);

• plot_params.color sets the color of the curve.