Software

The software below is for scientific purposes ONLY. Comments are very welcome!

Matlab programs for generating a user-specified number K of clusters with message passing (Affinity Propagation). The algorithm was published in the paper:
Xiangliang Zhang, Wei Wang, Kjetil Nørvåg, Michèle Sebag, "K-AP: Generating Specified K Clusters by Efficient Affinity Propagation", ICDM 2010, Sydney, Australia, December 14-17, 2010. File size: 30.4 KB.

The programs are distributed under the GNU Lesser General Public License (LGPL).

Matlab code for StrAP: stream clustering with AP (Affinity Propagation), extended with an online adaptation mechanism (1412 KB). The algorithm was published at ECML-PKDD 2008 and SIGKDD 2009.

Xiangliang Zhang, Cyril Furtlehner, Michèle Sebag, "Data Streaming with Affinity Propagation". Proceedings of the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2008), Antwerp, Belgium, pp. 628-643, Lecture Notes in Computer Science 5212, Springer 2008, September 15-19, 2008.

Xiangliang Zhang, Cyril Furtlehner, Julien Perez, Cécile Germain, Michèle Sebag, "Toward Autonomic Grids: Analyzing the Job Flow with Affinity Streaming". Proceedings of the 15th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (SIGKDD 2009), pp. 987-996, Paris, France, June 28 - July 1, 2009.

The programs were developed at INRIA and are thus the property of INRIA. The programs are distributed under the GNU Lesser General Public License (LGPL).

Detecting changes in multidimensional data streams is an important and challenging task. In unsupervised change detection, changes are usually detected by comparing the distribution in a current (test) window with that in a reference window. It is thus essential to design divergence metrics and density estimators for comparing the data distributions; most existing ones are designed for univariate data. Multidimensional data streams make both density estimation and comparison difficult. In this paper, we propose a framework for detecting changes in multidimensional data streams based on Principal Component Analysis (PCA), which is used to project data into a lower-dimensional space, thus facilitating density estimation and change-score calculations. The proposed framework also has advantages over existing approaches: it reduces computational costs with an efficient density estimator, improves the change-score calculation by introducing effective divergence metrics, and minimizes the effort required from users to set the threshold parameter by using the Page-Hinkley test.
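
As a rough illustration of the final thresholding step, a Page-Hinkley test over a stream of change scores can be sketched in Python as follows. This is a minimal sketch of the textbook test for an increase in the mean, not the released implementation; the class name, the `delta` and `threshold` defaults, and the `first_alarm` helper are assumptions made for this example.

```python
class PageHinkley:
    """Textbook Page-Hinkley test for an increase in the mean of a score stream."""

    def __init__(self, delta=0.005, threshold=5.0):
        self.delta = delta          # tolerated drift per step (assumed value)
        self.threshold = threshold  # alarm threshold (assumed value)
        self.count = 0
        self.mean = 0.0             # running mean of all scores seen so far
        self.cumulative = 0.0       # cumulative deviation from the running mean
        self.minimum = 0.0          # smallest cumulative deviation seen so far

    def update(self, score):
        """Consume one change score; return True if a change is signalled."""
        self.count += 1
        self.mean += (score - self.mean) / self.count
        self.cumulative += score - self.mean - self.delta
        self.minimum = min(self.minimum, self.cumulative)
        return self.cumulative - self.minimum > self.threshold


def first_alarm(scores, delta=0.005, threshold=5.0):
    """Return the index of the first alarm, or None if no change is signalled."""
    detector = PageHinkley(delta, threshold)
    for index, score in enumerate(scores):
        if detector.update(score):
            return index
    return None
```

In this sketch the scores would be the change scores computed between the reference and test windows; calling `first_alarm` on a stream whose scores jump from a low to a high level returns an index shortly after the jump, while a flat stream returns `None`.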

More details can be found in the paper:
Abdulhakim A Qahtan, Basma Harbi, Suojin Wang, Xiangliang Zhang, "A PCA-Based Change Detection Framework for Multidimensional Data Streams". In the proceedings of the 21st ACM SIGKDD Conference on Knowledge Discovery and Data Mining - KDD 2015.


Download the code

Attack a Classifier:

In security-sensitive applications, e.g., spam filters and intrusion detection systems, the deployed classification algorithms can be attacked by adversaries through exploratory attacks such as evasion and reverse engineering. For example, an attacker can probe the classifier with queries in order to reveal confidential information about the training dataset used by the system, or to model the classifier's decision boundary. How can such artificial queries be constructed from scratch? Query synthesis is a branch of active learning that generates queries in order to reveal sensitive information about the true decision boundary.

The objective of this study is to learn a deterministic, noise-free halfspace efficiently via query synthesis.
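
To make the idea of query synthesis concrete, the Python sketch below locates one point on the decision boundary of a noise-free halfspace by bisecting the segment between a positively and a negatively labelled point. It only illustrates how synthesized queries narrow down the boundary and is not the algorithm from the paper below; the `make_oracle` and `boundary_point` names and the example weights are hypothetical.

```python
def make_oracle(weights, bias):
    """Hypothetical black-box classifier: returns +1 or -1 for a query point."""
    def oracle(point):
        activation = sum(w * x for w, x in zip(weights, point)) - bias
        return 1 if activation >= 0 else -1
    return oracle


def boundary_point(oracle, positive, negative, tolerance=1e-6):
    """Bisect the segment from a positive point to a negative point.

    Each iteration synthesizes one query (the midpoint of the current
    interval) and halves the part of the segment that must contain the
    decision boundary. Returns the approximate boundary point and the
    number of queries issued.
    """
    queries = 0
    lo, hi = 0.0, 1.0  # position along the segment: 0 = positive, 1 = negative
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        query = [p + mid * (n - p) for p, n in zip(positive, negative)]
        queries += 1
        if oracle(query) == 1:
            lo = mid  # boundary lies further toward the negative point
        else:
            hi = mid  # boundary lies further toward the positive point
    t = (lo + hi) / 2.0
    point = [p + t * (n - p) for p, n in zip(positive, negative)]
    return point, queries
```

The number of queries grows only logarithmically in 1/tolerance, which is the kind of efficiency that motivates synthesizing queries rather than sampling them from a pool.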

The algorithm was published in the paper:

Ibrahim M Alabdulmohsin, Xin Gao, Xiangliang Zhang, "Efficient Active Learning of Halfspaces via Query Synthesis". In the proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence - AAAI 2015.

Download the Matlab code of the algorithm

Protect a Classifier:

In adversarial environments, adversaries can mount exploratory attacks against the defender, such as evasion and reverse engineering. We investigate the use of randomization as a suitable strategy for mitigating the risk of such attacks. In particular, we derive a semidefinite programming (SDP) formulation for learning a distribution of classifiers subject to the constraint that any single classifier picked at random from this distribution provides reliable predictions with high probability. We analyze the tradeoff between the variance of the distribution and its predictive accuracy, and establish that one can almost always incorporate randomization with large variance without incurring a loss in accuracy.
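
The prediction side of this idea can be sketched in Python as follows: each query is answered by a linear classifier drawn at random from a distribution around a mean weight vector. This sketch assumes an isotropic Gaussian with a hand-picked standard deviation; it does not learn the distribution with the SDP described above, and all function names are hypothetical.

```python
import random


def sample_classifier(mean_weights, std, rng):
    """Draw one linear classifier from an isotropic Gaussian (assumed form)."""
    return [rng.gauss(w, std) for w in mean_weights]


def randomized_predict(mean_weights, std, point, rng):
    """Answer one query with a freshly sampled classifier."""
    weights = sample_classifier(mean_weights, std, rng)
    score = sum(w * x for w, x in zip(weights, point))
    return 1 if score >= 0 else -1


def agreement_rate(mean_weights, std, point, trials, rng):
    """Fraction of sampled classifiers that agree with the mean classifier."""
    mean_score = sum(w * x for w, x in zip(mean_weights, point))
    mean_label = 1 if mean_score >= 0 else -1
    hits = sum(
        randomized_predict(mean_weights, std, point, rng) == mean_label
        for _ in range(trials)
    )
    return hits / trials
```

Points far from the mean decision boundary receive the same label from almost every sampled classifier, while points close to it receive noisy answers, which is what makes the boundary harder to reverse engineer through probing.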

More details can be found in the paper:

Ibrahim M Alabdulmohsin, Xin Gao, Xiangliang Zhang, "Adding Robustness to Support Vector Machines Against Adversarial Reverse Engineering". Proceedings of the 23rd ACM International Conference on Information and Knowledge Management - CIKM 2014.

Download the Matlab code of the algorithm

One transfer learning approach that has gained wide popularity lately is attribute-based zero-shot learning. Its goal is to learn novel classes that were never seen during the training stage. The classical route towards realizing this goal is to incorporate prior knowledge, in the form of a semantic embedding of classes, and to learn to predict classes indirectly via their semantic attributes. Despite the amount of research devoted to this subject lately, no known algorithm has yet reported a predictive accuracy that exceeds the accuracy of supervised learning with very few training examples. For instance, the direct attribute prediction (DAP) algorithm, which forms a standard baseline for the task, is known to be only as accurate as supervised learning when as few as two examples from each hidden class are used for training on some popular benchmark datasets! In this paper, we argue that this lack of significant results in the literature is not a coincidence; attribute-based zero-shot learning is fundamentally an ill-posed strategy. The key insight is the observation that the mechanical task of predicting an attribute is, in fact, quite different from the epistemological task of learning the "correct meaning" of the attribute itself. This renders attribute-based zero-shot learning fundamentally ill-posed. In more precise mathematical terms, attribute-based zero-shot learning is equivalent to the mirage goal of learning with respect to one distribution of instances, with the hope of being able to predict with respect to any arbitrary distribution.

The provided code displays the decision rule obtained when applying binary relevance with linear SVMs to the seven-segment display in the zero-shot setting. Please enjoy the code. Note that the code uses the LIBLINEAR package, available at https://www.csie.ntu.edu.tw/~cjlin/liblinear/, so LIBLINEAR should be installed first.
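
For readers unfamiliar with the setting, the Python sketch below shows how a class can be predicted indirectly via attributes on the seven-segment display: each of the seven segments is one binary attribute, and a hidden digit is chosen by matching the predicted segments against the digit signatures. The table uses the standard seven-segment encoding (segments a to g); the nearest-signature rule by Hamming distance and the function names are assumptions made for illustration and are not taken from the provided code. In the binary relevance setup, each predicted segment would come from its own linear SVM; here the predictions are passed in directly.

```python
# Standard seven-segment signatures, segments ordered (a, b, c, d, e, f, g).
SEGMENTS = {
    0: (1, 1, 1, 1, 1, 1, 0),
    1: (0, 1, 1, 0, 0, 0, 0),
    2: (1, 1, 0, 1, 1, 0, 1),
    3: (1, 1, 1, 1, 0, 0, 1),
    4: (0, 1, 1, 0, 0, 1, 1),
    5: (1, 0, 1, 1, 0, 1, 1),
    6: (1, 0, 1, 1, 1, 1, 1),
    7: (1, 1, 1, 0, 0, 0, 0),
    8: (1, 1, 1, 1, 1, 1, 1),
    9: (1, 1, 1, 1, 0, 1, 1),
}


def hamming(first, second):
    """Number of segments on which two attribute vectors disagree."""
    return sum(a != b for a, b in zip(first, second))


def decode_class(predicted_segments, hidden_classes):
    """Pick the hidden digit whose signature is closest to the prediction.

    hidden_classes are digits never seen during training; only their
    signatures are known. Ties go to the smallest digit.
    """
    return min(
        sorted(hidden_classes),
        key=lambda digit: hamming(SEGMENTS[digit], predicted_segments),
    )
```

For example, `decode_class((0, 1, 1, 0, 0, 0, 1), [1, 8])` selects the digit 1, since the predicted segments differ from its signature in a single position.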