For a full list of publications, visit my Google Scholar page.
Zichong Wang, Charles Wallace, Albert Bifet, Xin Yao and Wenbin Zhang
Proceedings of the 34th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), Turin, Italy, 2023
Graph generation models have gained increasing popularity and success across various domains. However, most research in this area has concentrated on enhancing performance, leaving the issue of fairness largely unexplored. Existing graph generation models prioritize minimizing the expected loss of graph reconstruction, which can result in representational disparities in the generated graphs that unfairly impact marginalized groups. This paper addresses this socially sensitive issue by conducting the first comprehensive investigation of fair graph generation: it identifies the root causes of representational disparities and proposes a novel framework that ensures consistent and equitable representation across all groups. Additionally, a suite of fairness metrics has been developed to evaluate bias in graph generation models, standardizing fair graph generation research. Through extensive experiments on five real-world datasets, the proposed framework is demonstrated to outperform existing benchmarks in terms of graph fairness while maintaining competitive prediction performance.
Zichong Wang, Nripsuta Saxena, Tongjia Yu, Sneha Karki, Tyler Zetty, Israat Haque, Shan Zhou,
Dukka Kc, Ian Stockwell, Xuyu Wang, Albert Bifet and Wenbin Zhang
Proceedings of the ACM Conference on Fairness, Accountability, and Transparency (FAccT), Chicago, USA, 2023
Best Paper Award
Bias in machine learning has rightly received significant attention over the last decade. However, most fair machine learning (fair-ML) work to address bias in decision-making systems has focused solely on the offline setting. Despite the wide prevalence of online systems in the real world, work on identifying and correcting bias in the online setting is severely lacking. The unique challenges of the online environment make addressing bias more difficult than in the offline setting. First, Streaming Machine Learning (SML) algorithms must deal with the constantly evolving real-time data stream. Second, they need to adapt to changing data distributions (concept drift) to make accurate predictions on new incoming data. Adding fairness constraints to this already complicated task is not straightforward. In this work, we focus on the challenges of achieving fairness in biased data streams while accounting for the presence of concept drift, processing one sample at a time. We present Fair Sampling over Stream, a novel fair rebalancing approach capable of being integrated with SML classification algorithms. Furthermore, we devise the first unified performance-fairness metric, Fairness Bonded Utility (FBU), to evaluate and compare the trade-off between performance and fairness of different bias mitigation methods efficiently. FBU simplifies the comparison of fairness-performance trade-offs of multiple techniques through one unified and intuitive evaluation, allowing model designers to easily choose a technique. Overall, extensive evaluations show our measures surpass those of other fair online techniques previously reported in the literature.
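The FBU metric itself is defined in the paper rather than in this abstract; as a reference point for the kind of group-fairness quantity such trade-off metrics combine with accuracy, a minimal sketch of the standard statistical parity difference (a common group-fairness measure, not the paper's FBU) looks like this:

```python
def statistical_parity_difference(y_pred, group):
    """Difference in positive-prediction rates between two groups.

    y_pred: iterable of 0/1 predictions
    group:  iterable of 0/1 protected-group indicators (same length)
    A value of 0 indicates parity; positive values favor group 1.
    """
    by_group = {0: [], 1: []}
    for yhat, g in zip(y_pred, group):
        by_group[g].append(yhat)
    rate = lambda xs: sum(xs) / len(xs)
    return rate(by_group[1]) - rate(by_group[0])

# Group 1 receives positives at rate 1.0, group 0 at rate 0.5 -> SPD = 0.5
spd = statistical_parity_difference([1, 0, 1, 1], [0, 0, 1, 1])
```

In a streaming setting this quantity would be maintained incrementally over a window of recent samples rather than over a fixed batch.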
Nripsuta Saxena, Wenbin Zhang and Cyrus Shahabi
Proceedings of the SIAM International Conference on Data Mining (SDM), Blue Sky Track, Minneapolis, USA, 2023
In the last decade or so, the area of fairness in AI has received widespread attention, both within the scientific community and in the general media. Researchers have made significant progress towards fairer AI, with work exploring everything from statistical definitions of individual and group fairness to fairness constraints and algorithms for debiasing models and datasets. Given the nascent nature of the field, however, progress in the space has been somewhat haphazard. For work in fair AI to have as much real-world impact as possible, we need to take a step back and gauge what the gaps are and which research questions need urgent attention. This work analyzes where the field is currently and proposes more focused questions and new areas of research within fair AI.
Wenbin Zhang, Tina Hernandez-Boussard and Jeremy Weiss
Proceedings of the 37th AAAI Conference on Artificial Intelligence (AAAI), Washington, D.C., USA, 2023
Recent works in artificial intelligence fairness attempt to mitigate discrimination by proposing constrained optimization programs that achieve parity for some fairness statistic. Most assume availability of the class label, which is impractical in many real-world applications such as precision medicine, actuarial analysis and recidivism prediction. Here we consider fairness in longitudinal right-censored environments, where the time to event might be unknown, resulting in censorship of the class label and inapplicability of existing fairness studies. We devise applicable fairness measures, propose a debiasing algorithm, and provide necessary theoretical constructs to bridge fairness with and without censorship for these important and socially-sensitive tasks. Our experiments on four censored datasets confirm the utility of our approach.
Wenbin Zhang and Jeremy Weiss
Knowledge and Information Systems (KAIS)
Bests of ICDM
Fairness in machine learning (ML) has gained attention within the ML community and the broader society, with many fairness definitions and algorithms being proposed. Surprisingly, there is little work quantifying and guaranteeing fairness in the presence of uncertainty, which is prevalent in many socially sensitive applications, ranging from marketing analytics to actuarial analysis and recidivism prediction instruments. To this end, we revisit fairness and reveal idiosyncrasies of the existing fairness literature, which assumes certainty on the class label, limiting its real-world utility. Our primary contributions are a formulation of fairness under uncertainty and group constraints, along with a suite of corresponding new fairness definitions and algorithms. We argue that this formulation has broader applicability to practical scenarios concerning fairness. We also show how the newly devised fairness notions involving censored information, together with the general framework for fair predictions in the presence of censorship, allow us to measure and mitigate discrimination under uncertainty, bridging the gap with real-world applications. Empirical evaluations on real-world datasets with censorship and sensitive attributes demonstrate the practicality of our approach.
Mohammad Ariful Islam, Hisham Siddique, Wenbin Zhang and Israat Haque
IEEE Transactions on Network and Service Management (TNSM)
5G networks enable emerging latency- and bandwidth-critical applications like industrial IoT, AR/VR, or autonomous vehicles, in addition to supporting traditional voice and data communications. In 5G infrastructure, Radio Access Networks (RANs) consist of radio base stations that communicate over wireless radio links. The communication, however, is prone to environmental changes like the weather, and can suffer from radio link failures that interrupt ongoing services. The impact is severe in the above-mentioned applications. One way to mitigate such service interruptions is to proactively predict failures and reconfigure the resource allocation accordingly. Existing works, such as supervised ensemble learning-based models, do not consider the spatial-temporal correlation between radio communication and weather changes. This paper proposes a communication link failure prediction scheme based on an LSTM-autoencoder that considers the spatial-temporal correlation between radio communication and the weather forecast. We implement and evaluate the proposed scheme on a large volume of real radio and weather data. The results confirm that the proposed scheme significantly outperforms existing solutions.
Wenbin Zhang and Jeremy Weiss
Proceedings of the 36th AAAI Conference on Artificial Intelligence (AAAI), online, 2022
Also at Research2Clinics Workshop at NeurIPS, online, 2021
Recent works in artificial intelligence fairness attempt to mitigate discrimination by proposing constrained optimization programs that achieve parity for some fairness statistic. Most assume availability of the class label, which is impractical in many real-world applications such as precision medicine, actuarial analysis and recidivism prediction. Here we consider fairness in longitudinal right-censored environments, where the time to event might be unknown, resulting in censorship of the class label and inapplicability of existing fairness studies. We devise applicable fairness measures, propose a debiasing algorithm, and provide necessary theoretical constructs to bridge fairness with and without censorship for these important and socially-sensitive tasks. Our experiments on four censored datasets confirm the utility of our approach.
Tai Le Quy, Arjun Roy, Vasileios Iosifidis, Wenbin Zhang and Eirini Ntoutsi
Data Mining and Knowledge Discovery (DAMI)
Top 10 articles in 2022
As decision-making increasingly relies on machine learning (ML) and (big) data, the issue of fairness in data-driven artificial intelligence systems is receiving increasing attention from both research and industry. A large variety of fairness-aware ML solutions have been proposed which involve fairness-related interventions in the data, learning algorithms, and/or model outputs. However, a vital part of proposing new approaches is evaluating them empirically on benchmark datasets that represent realistic and diverse settings. Therefore, in this paper, we overview real-world datasets used for fairness-aware ML. We focus on tabular data as the most common data representation for fairness-aware ML. We start our analysis by identifying relationships between the different attributes, particularly with respect to the protected attributes and the class attribute, using a Bayesian network. For a deeper understanding of bias in the datasets, we investigate interesting relationships using exploratory analysis.
Zhen Liu, Ruoyu Wang and Wenbin Zhang
Medical & Biological Engineering & Computing (MBEC)
Machine learning techniques have been applied to gene expression profiling for cancer diagnosis. However, gene expression data suffer from the curse of high dimensionality. Various feature reduction methods have been proposed to reduce the number of features for diagnosing a specific cancer. However, given the difficulty of obtaining samples of a particular tumor, the lack of training samples may lead to overfitting. In addition, a feature reduction model built for a specific tumor may not be scalable or generalizable to new cancer types. To handle these problems, this paper proposes an unsupervised feature learning method to reduce the dimensionality of gene expression data. The method amplifies the training set for feature learning by utilizing unlabeled samples from different sources. Two heuristic rules are devised to check whether an unlabeled sample can be used to amplify the training set. The amplified training set is then used to train a feature learning model based on a sparse autoencoder. Since the method leverages knowledge from expression data across different sources, it improves the generalization of unsupervised feature learning and further boosts cancer diagnosis performance. A series of experiments is carried out on gene expression datasets from TCGA and other sources. The experimental results show that our method improves the generalization of cancer diagnosis when unlabeled data are used for latent feature learning.
Thomas Guyet, Wenbin Zhang and Albert Bifet
Proceedings of the 22nd International Conference on Computational Science (ICCS), online, 2022
The need to analyze information from streams arises in a variety of applications. One of the fundamental research directions is mining sequential patterns over data streams. Current studies mine sequences of items based on the existence of a pattern in transactions, but pay no attention to sequences of itemsets and their multiple occurrences. Patterns over a window of an itemset stream, together with their multiple occurrences, however, provide additional capability to recognize the essential characteristics of the patterns and the inter-relationships among them that are unidentifiable by existing item-based and existence-based studies. In this paper, we study this new sequential pattern mining problem and propose a corresponding efficient sequential miner with novel strategies to prune the search space efficiently. Experiments on both real and synthetic data show the utility of our approach.
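To make the distinction concrete, a sequential pattern over itemsets is a list of itemsets matched in order by subset inclusion within a window. The following is a minimal illustrative occurrence check (a generic sketch, not the paper's miner, which also tracks multiple occurrences and prunes the search space):

```python
def occurs(pattern, window):
    """Return True if the sequential pattern (a list of itemsets) occurs in
    the window (a list of itemsets), matching each pattern itemset, in order,
    to some later window itemset that contains it."""
    i = 0
    for itemset in window:
        if i < len(pattern) and pattern[i] <= itemset:  # subset inclusion
            i += 1
    return i == len(pattern)

# The pattern <{a}, {b, c}> occurs in this window; <{b, c}, {a}> does not.
window = [{"a", "x"}, {"b"}, {"b", "c"}]
```

An item-based, existence-only miner would treat each transaction as a flat set and miss both the itemset structure and repeated occurrences within a window.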
Kea Turner, Naomi C Brownstein, Zachary Thompson, Issam El Naqa, Yi Luo, Heather SL Jim, Dana E Rollison, Rachel Howard, Desmond Zeng, Stephen A Rosenberg, Bradford Perez, Andreas Saltos, Laura B Oswald, Brian D Gonzalez, Jessica Y Islam, Amir Alishahi Tabriz, Wenbin Zhang and Thomas J Dilling
Radiotherapy and Oncology
Background and purpose: The study objective was to determine whether longitudinal changes in patient-reported outcomes (PROs) were associated with survival among early-stage, non-small cell lung cancer (NSCLC) patients undergoing stereotactic body radiation therapy (SBRT).
Materials and methods: Data were obtained from January 2015 through March 2020. We ran a joint probability model to assess the relationship between time-to-death, and longitudinal PRO measurements. PROs were measured through the Edmonton Symptom Assessment Scale (ESAS). We controlled for other covariates likely to affect symptom burden and survival including stage, tumor diameter, comorbidities, gender, race/ethnicity, relationship status, age, and smoking status.
Results: The sample included 510 early-stage NSCLC patients undergoing SBRT. The median age was 73.8 (range: 46.3-94.6). The survival component of the joint model demonstrates that longitudinal changes in ESAS scores are significantly associated with worse survival (HR: 1.04; 95% CI: 1.02-1.05). This finding suggests that a one-unit increase in ESAS score increased the hazard of death by 4%. Other factors significantly associated with worse survival included older age (HR: 1.04; 95% CI: 1.03-1.05), larger tumor diameter (HR: 1.21; 95% CI: 1.01-1.46), male gender (HR: 1.87; 95% CI: 1.36-2.57), and current smoking status (HR: 2.39; 95% CI: 1.25-4.56).
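As background for reading these results, in a Cox-style model a hazard ratio is the exponentiated covariate coefficient, and an HR of 1.04 means a one-unit increase in the covariate multiplies the hazard by 1.04, i.e., a 4% higher hazard (a general interpretation sketch, not the study's joint model):

```python
import math

def hazard_ratio(beta):
    """Cox-style hazard ratio implied by coefficient beta: HR = exp(beta)."""
    return math.exp(beta)

def pct_change(hr):
    """Percent change in hazard per one-unit covariate increase."""
    return (hr - 1.0) * 100.0

# beta = 0 leaves the hazard unchanged (HR = 1); HR = 1.04 -> 4% higher hazard.
print(pct_change(hazard_ratio(0.0)), pct_change(1.04))
```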
Conclusion: PROs are increasingly being collected as a part of routine care delivery to improve symptom management. Healthcare systems can integrate these data with other real-world data to predict patient outcomes, such as survival. Capturing longitudinal PROs-in addition to PROs at diagnosis-may add prognostic value for estimating survival among early-stage NSCLC patients undergoing SBRT.
Wenbin Zhang and Jeremy Weiss
Proceedings of the 21st IEEE International Conference on Data Mining (ICDM), online, 2021
Best Paper Award Candidate
There has been concern within the artificial intelligence (AI) community and the broader society regarding the potential lack of fairness of AI-based decision-making systems. Surprisingly, there is little work quantifying and guaranteeing fairness in the presence of uncertainty which is prevalent in many socially sensitive applications, ranging from marketing analytics to actuarial analysis and recidivism prediction instruments. To this end, we study a longitudinal censored learning problem subject to fairness constraints, where we require that algorithmic decisions made do not affect certain individuals or social groups negatively in the presence of uncertainty on class label due to censorship. We argue that this formulation has a broader applicability to practical scenarios concerning fairness. We show how the newly devised fairness notions involving censored information and the general framework for fair predictions in the presence of censorship allow us to measure and mitigate discrimination under uncertainty that bridges the gap with real-world applications. Empirical evaluations on real-world discriminated datasets with censorship demonstrate the practicality of our approach.
Wenbin Zhang, Liming Zhang, Dieter Pfoser and Liang Zhao
Proceedings of the SIAM International Conference on Data Mining (SDM), online, 2021
Deep generative models for graphs have exhibited promising performance in ever-increasing domains such as the design of molecules (i.e., graphs of atoms) and structure prediction of proteins (i.e., graphs of amino acids). Existing work typically focuses on static rather than dynamic graphs, which are very important in applications such as protein folding, molecule reactions, and human mobility. Extending existing deep generative models from static to dynamic graphs is a challenging task, which requires handling the factorization of static and dynamic characteristics as well as mutual interactions among node and edge patterns. This paper proposes a novel framework of factorized deep generative models to achieve interpretable dynamic graph generation. Various generative models are proposed to characterize conditional independence among node, edge, static, and dynamic factors. Then, variational optimization strategies as well as dynamic graph decoders are proposed based on newly designed factorized variational autoencoders and recurrent graph deconvolutions. Extensive experiments on multiple datasets demonstrate the effectiveness of the proposed models.
Zhen Liu, Nathalie Japkowicz, Deyu Tang, Wenbin Zhang and Jie Zhao
Future Generation Computer Systems (FGCS)
Android malware detection has attracted much attention in recent years. Existing methods mainly focus on extracting static or dynamic features from mobile apps and building malware detection models with machine learning algorithms. The number of extracted static or dynamic features can be very high, so the data suffer from high dimensionality. In addition, because malware authors try to avoid detection, malware data is varied and hard to obtain in the first place. To detect zero-day malware, unsupervised malware detection methods have been applied; in such cases, unsupervised feature reduction is a suitable choice for reducing the data dimensionality. In this paper, we propose an unsupervised feature learning algorithm called Subspace-based Restricted Boltzmann Machines (SRBM) for reducing data dimensionality in malware detection. Multiple subspaces in the original data are first searched, and then an RBM is built on each subspace. The outputs of the hidden layers of the trained RBMs are combined to represent the data in a lower dimension. Experimental results on the OmniDroid, CIC2019 and CIC2020 datasets show that the features learned by SRBM perform better than those learned by other feature reduction methods when evaluated with clustering metrics, i.e., NMI, ACC and F-score.
Wenbin Zhang, Albert Bifet, Xiangliang Zhang, Jeremy Weiss and Wolfgang Nejdl
Proceedings of the 25th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), online, 2021
As Artificial Intelligence (AI) is used in more applications, the need to consider and mitigate biases from the learned models has followed. Most work on developing fair learning algorithms focuses on the offline setting. However, in many real-world applications data arrives in an online fashion and needs to be processed on the fly. Moreover, in practical applications there is a trade-off between accuracy and fairness that needs to be accounted for, but current methods often have multiple hyper-parameters with non-trivial interactions to achieve fairness. In this paper, we propose a flexible ensemble algorithm for fair decision-making in the more challenging context of evolving online settings. The algorithm, called FARF (Fair and Adaptive Random Forests), uses online component classifiers updated according to the current distribution, accounts for fairness, and exposes a single hyper-parameter that alters the fairness-accuracy balance. Experiments on real-world discriminated data streams demonstrate the utility of FARF.
Xuejian Wang, Wenbin Zhang, Aishwarya Jadhav and Jeremy Weiss
AAAI Spring Symposium Series (AAAI SSS), online, 2021
Survival analysis models are necessary for clinical forecasting with data censorship. Implicitly, existing works focus on individuals with higher risks, while lower-risk individuals are poorly characterized. Developing survival models that represent individuals of different risk levels equally is a challenging task, but of great importance for providing accurate risk assessments across levels of risk. Here, we characterize this problem and propose an adjusted log-likelihood formulation as a new objective for survival prognostication. Several models are then proposed based on the newly designed objective function; these produce risks that count individuals "equally" on risk ratios, thus providing representative attention to individuals of varying risk. Extensive experiments on multiple real-world datasets demonstrate the benefits of the proposed approach.
Xuejiao Tang, Wenbin Zhang, Yi Yu, Kea Turner, Tyler Derr, Mengyu Wang and Eirini Ntoutsi
Proceedings of the 30th International Conference on Artificial Neural Networks (ICANN), online, 2021
While image understanding at the recognition level has achieved remarkable advancements, reliable visual scene understanding requires not only recognition-level but also cognition-level understanding, which calls for exploiting multi-source information as well as learning different levels of understanding and extensive commonsense knowledge. In this paper, we propose a novel Cognitive Attention Network (CAN) for visual commonsense reasoning to achieve interpretable visual understanding. Specifically, we first introduce an image-text fusion module to fuse information from images and text collectively. Second, a novel inference module is designed to encode commonsense among image, query and response. Extensive experiments on the large-scale Visual Commonsense Reasoning (VCR) benchmark dataset demonstrate the effectiveness of our approach.
Mingli Zhang, Ahmad Chaddad, Fenghua Guo, Wenbin Zhang, Ji Zhang and Alan Evans
Proceedings of the 32nd International Conference on Databases and Expert Systems Applications (DEXA), online, 2021
Variational AutoEncoders (VAEs), a class of neural networks performing nonlinear dimensionality reduction, have become an effective tool in neuroimaging analysis. Most current studies on VAEs consider unsupervised learning to capture the latent representations, and this strategy may be under-explored in the case of heavy noise and imbalanced neural image datasets. From a reinforcement learning point of view, it is necessary to consider the class-wise capability of the decoder. Since the latent space of an autoencoder depends on the distribution of the raw data, the architecture of the model and the dimension of the latent space, combining a supervised linear autoencoder model with a VAE may improve classification performance. In this paper, we propose a supervised linear and nonlinear cascade dual-autoencoder approach, which increases the discriminative capability of the latent space by feeding the low-dimensional latent space from a semi-supervised VAE into a further linear encoder-decoder step. The effectiveness of the proposed approach is demonstrated on brain development data. The proposed method is also evaluated on imbalanced neural spiking classification.
Qiqiang Xu, Ji Zhang, Ting Yu, Wenbin Zhang, Mingli Zhang, Yonglong Luo and Fulong Chen
Proceedings of the 32nd International Conference on Databases and Expert Systems Applications (DEXA), online, 2021
Text classification is a fundamental task widely used in various sub-domains of natural language processing, such as information extraction and semantic understanding. For general text classification problems, various deep learning models, such as Bi-LSTM, Transformer and BERT, have been used and have achieved good performance. In this paper, however, we consider a new problem: how to deal with a special text classification scenario in which there is a weak sequential relationship among different classification entities. A typical example is the block classification of resumes, where sequential relationships exist among the different blocks. By fully exploiting this sequential feature, we propose an effective hybrid model that combines a fully connected neural network with a block-level recurrent neural network with feature fusion. The experimental results show that the average F1-score of our model on three real resume datasets of 1,400 resumes is 5.5-11% higher than that of existing mainstream algorithms.
Xuejiao Tang, Xin Huang, Wenbin Zhang, Travers B. Child, Qiong Hu, Zhen Liu and Ji Zhang
Proceedings of the 23rd International Conference on Big Data Analytics and Knowledge Discovery (DaWaK), online, 2021
Visual Commonsense Reasoning (VCR) predicts an answer with a corresponding rationale, given a question-image input. VCR is a recently introduced visual scene understanding task with a wide range of applications, including visual question answering, automated vehicle systems, and clinical decision support. Previous approaches to the VCR task generally rely on pre-training or on exploiting memory with models that encode long dependency relationships. However, these approaches suffer from a lack of generalizability and prior knowledge. In this paper we propose a dynamic working memory based cognitive VCR network, which stores accumulated commonsense between sentences to provide prior knowledge for inference. Extensive experiments show that the proposed model yields significant improvements over existing methods on the benchmark VCR dataset. Moreover, the proposed model provides an intuitive interpretation of visual commonsense reasoning. A Python implementation of our mechanism is publicly available at https://github.com/tanjatang/DMVCR
Wenbin Zhang, Mingli Zhang, Ji Zhang, Zhen Liu, Zhiyuan Chen, Jianwu Wang, Edward Raff and Enza Messina
Proceedings of the 32nd International Conference on Tools with Artificial Intelligence (ICTAI), online, 2020
Artificial intelligence (AI)-based decision-making systems are nowadays employed in an ever-growing number of online as well as offline services, some of great importance. Relying on sophisticated learning algorithms and available data, these systems are increasingly becoming automated and data-driven. However, they can impact individuals and communities with ethical or legal consequences. Numerous approaches have therefore been proposed to develop decision-making systems that are discrimination-conscious by design. However, these methods assume the underlying data distribution is stationary without drift, which does not hold in many real-world applications. In addition, their focus has been largely on minimizing discrimination while maximizing prediction performance, without the flexibility needed to customize the trade-off for different applications. To this end, we propose a learning algorithm for fair classification that also adapts to evolving data streams and further allows flexible control over the degree of accuracy and fairness. Positive results on a set of discriminated and non-stationary data streams demonstrate the effectiveness and flexibility of this approach.
Liming Zhang, Wenbin Zhang and Nathalie Japkowicz
Proceedings of the 25th International Conference on Pattern Recognition (ICPR), online, 2020
Recognizing human activities from multi-channel time series data collected from wearable sensors is ever more practical. In real-world conditions, however, coherent activities and body movements can happen at the same time, such as moving the head while walking or sitting. This new problem, termed "Coherent Human Activity Recognition (Co-HAR)", is more complicated than normal multi-class classification tasks, since the signals of different movements are mixed and interfere with each other. We treat Co-HAR as a dense labelling problem that classifies each sample at each time step with a label, providing high-fidelity, duration-varied support to applications. In this paper, a novel condition-aware deep architecture, "Conditional-UNet", is developed to allow dense labeling for the Co-HAR problem. We also contribute a first-of-its-kind Co-HAR dataset for head movement recognition under walking or sitting conditions for future research. Experiments on head gesture recognition show that our model achieves an overall 2-3% gain in F1 score over existing state-of-the-art deep methods and, more importantly, systematic and comprehensive improvements on real head gesture classes.
Wenbin Zhang and Albert Bifet
Proceedings of the 23rd International Conference on Discovery Science (DS), online, 2020
Fairness-aware learning is increasingly important in socially sensitive applications for the sake of achieving optimal and non-discriminative decision-making. Most of the proposed fairness-aware learning algorithms process the data in offline settings and assume that the data is generated by a single concept without drift. Unfortunately, in many real-world applications, data is generated in a streaming fashion and can only be scanned once. In addition, the underlying generation process might also change over time. In this paper, we propose and illustrate an efficient algorithm for mining fair decision trees from discriminatory and continuously evolving data streams. This algorithm, called FEAT (Fairness-Enhancing and concept-Adapting Tree), uses a change detector to learn adaptively from non-stationary data streams while also accounting for fairness. We study FEAT's properties and demonstrate its utility through experiments on a set of discriminated and time-changing data streams.
Mingli Zhang, Xin Zhao, Wenbin Zhang, Ahmad Chaddad, Jean-Baptiste Poline and Alan Evans
Proceedings of the 31st International Conference on Databases and Expert Systems Applications (DEXA), online, 2020
Autism spectrum disorder (ASD) is a complex neurodevelopmental disorder characterized by deficits in social communication and repetitive behaviors. We propose imaging-based ASD biomarkers to find the neural patterns related to ASD, with identifying ASD as the primary goal; the secondary goal is to investigate the impact of imaging patterns on ASD. In this paper, we model and explore the identification of ASD by learning a representation of T1 MRI and fMRI that fuses a discriminative learning (DL) approach with a deep convolutional neural network. Specifically, a class-wise analysis dictionary generates non-negative low-rank encoding coefficients from the multi-modal data, and an orthogonal synthesis dictionary reconstructs the data. We then map the reconstructed data, together with the original multi-modal data, as input to the deep learning model. Finally, the priors learned from both models are returned to the fusion framework to perform classification. The effectiveness of the proposed approach was tested on a worldwide cross-site (34 sites) database of 1,127 subjects, and experiments show competitive results for the proposed approach. Furthermore, we were able to capture the status of brain neural patterns with the known input of the same modality.
Wenbin Zhang and Eirini Ntoutsi
Proceedings of the 28th International Joint Conference on Artificial Intelligence (IJCAI), Macau, China, 2019
Automated data-driven decision-making systems are ubiquitous across a wide range of online as well as offline services. These systems depend on sophisticated learning algorithms and available data to optimize the service function for decision support assistance. However, there is a growing concern about the accountability and fairness of the employed models, because the available historical data is often intrinsically discriminatory, i.e., among those receiving positive classifications, the proportion of members sharing one or more sensitive attributes is higher than the proportion in the population as a whole, which leads to a lack of fairness in decision support systems. A number of fairness-aware learning methods have been proposed to address this concern. However, these methods tackle fairness as a static problem and do not take the evolution of the underlying stream population into consideration. In this paper, we introduce a learning mechanism to design a fair classifier for online stream-based decision-making. Our learning model, FAHT (Fairness-Aware Hoeffding Tree), is an extension of the well-known Hoeffding Tree algorithm for decision tree induction over streams that also accounts for fairness. Our experiments show that our algorithm is able to deal with discrimination in streaming environments while maintaining a moderate predictive performance over the stream.
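FAHT's tree induction is specified in the paper; the evaluation protocol it shares with other stream learners, prequential (test-then-train) evaluation, can be sketched generically. `MajorityClass` is a hypothetical toy learner standing in for any stream classifier, not FAHT itself:

```python
class MajorityClass:
    """Toy stream learner that always predicts the most frequent label seen."""
    def __init__(self):
        self.counts = {}

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

    def learn(self, x, y):
        self.counts[y] = self.counts.get(y, 0) + 1


def prequential_accuracy(model, stream):
    """Test-then-train evaluation: each example is first predicted, then used
    for training, so the model is only ever scored on data it has not seen."""
    correct = seen = 0
    for x, y in stream:
        if seen:  # no model to test before the first example is learned
            correct += int(model.predict(x) == y)
        model.learn(x, y)
        seen += 1
    return correct / max(seen - 1, 1)


acc = prequential_accuracy(MajorityClass(), [(None, 1), (None, 1), (None, 1), (None, 0)])
```

A fairness-aware stream learner plugs into the same loop, additionally tracking a discrimination measure over the predictions alongside accuracy.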
Wenbin Zhang, Xuejiao Tang and Jianwu Wang
IEEE International Conference on Data Mining (ICDM), PhD Forum Track, Beijing, China, 2019
Algorithmic data-driven decision-making systems are becoming increasingly automated and have enjoyed tremendous success in a variety of application domains. More recently, these systems are increasingly being used to render all sorts of socially sensitive decisions. Yet, even in the absence of intent, these automated decisions can lead to a lack of fairness, in the sense that members sharing one or more sensitive attributes are treated unequally. In this paper, we handle unfairness in both online and offline settings. We introduce an algorithm-agnostic learning mechanism for optimal and non-discriminatory decision-making as appropriate. This translates to a fairness-aware learning schema that can be immediately applied to most existing algorithms and to general decision-making tasks in dynamic settings where the joint data distribution changes over time.
Wenbin Zhang, Jianwu Wang, Daeho Jin, Lazaros Oreopoulos and Zhibo Zhang
IEEE International Conference on Big Data (BigData), Seattle, USA, 2018
A self-organizing map (SOM) is a type of competitive artificial neural network that projects the high-dimensional input space of the training samples into a low-dimensional space while preserving topological relations. This makes SOMs well suited to organizing and visualizing complex data sets, and they have been used pervasively across numerous disciplines and applications. Notwithstanding its wide applicability, the self-organizing map suffers from inherent randomness, which produces dissimilar SOM patterns even when trained on identical samples with the same parameters every time; this causes usability concerns for domain practitioners and precludes potential users from exploring SOM-based applications in a broader spectrum. Motivated by this practical concern, we propose a deterministic approach as a supplement to the standard self-organizing map. In accordance with the theoretical design, experimental results with satellite cloud data demonstrate the effective and efficient organization as well as the simplification capabilities of the proposed approach.
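One way to remove the randomness the abstract describes is to seed the map from the data's principal axes and visit samples in a fixed order. This toy sketch illustrates that idea only; it is not the paper's algorithm, and the grid size and decay schedule are arbitrary:

```python
import numpy as np

def deterministic_som(data, grid=(4, 4), epochs=10, lr=0.5, sigma=1.0):
    """Toy SOM with all randomness removed: weights are initialized
    from a grid spanning the first two principal components, and
    samples are visited in a fixed order, so repeated runs on the
    same data give identical maps."""
    mean = data.mean(axis=0)
    # Principal axes give a reproducible, data-driven initialization.
    _, _, vt = np.linalg.svd(data - mean, full_matrices=False)
    gx, gy = np.meshgrid(np.linspace(-1, 1, grid[0]),
                         np.linspace(-1, 1, grid[1]))
    weights = mean + np.outer(gx.ravel(), vt[0]) + np.outer(gy.ravel(), vt[1])
    coords = np.stack([gx.ravel(), gy.ravel()], axis=1)
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying learning rate
        for x in data:  # fixed visiting order: no shuffling
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))
            dist = np.linalg.norm(coords - coords[bmu], axis=1)
            h = np.exp(-(dist ** 2) / (2 * sigma ** 2))  # neighborhood
            weights += rate * h[:, None] * (x - weights)
    return weights
```

Because both the initialization and the update order are deterministic, two runs on the same input produce bit-identical maps, which is the usability property the abstract motivates.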
Antonio Candelieri, Wenbin Zhang, Enza Messina and Francesco Archetti
IEEE International Conference on Big Data (BigData), Poster Track, Seattle, USA, 2018
This work stems from the Italian project H-CIM (Health-Care Intelligent Monitoring), aimed at developing a home-monitoring system based on wearable sensor data streams to support self-rehabilitation of elderly outpatients. Unlike pervasive data stream applications, which are typically accompanied by the evolution of unstable class concepts, this project requires stable standard and personalized rehabilitation exercise patterns to assess an outpatient's self-therapy progress at home. In the designed pipeline, the representative sequences of the personal standard rehabilitation exercises in the wearable sensor streams are therefore first benchmarked; then an assessment system integrating multistage data processing and analysis is proposed to enable elders to manage their own rehabilitation progress properly. The system proved to be an effective tool for supporting compliance monitoring and personalized self-rehabilitation; it is currently under further development within the Italian project Home-IoT, with the aim of becoming a more general data stream analytics service, not devoted solely to rehabilitation exercise assessment.
Wenbin Zhang and Jianwu Wang
IEEE International Conference on Bioinformatics and Biomedicine (BIBM), Madrid, Spain, 2018
Recommender systems seek to assist and augment the natural social process of making choices without sufficient personal experience of the alternatives. They have become fundamental applications in electronic commerce and information access, helping users effectively pinpoint information of interest in large catalog spaces. In contrast to the pervasive use of recommender systems in domains such as electronic commerce, their application in the medical domain is limited and further effort is needed. In addition, while a variety of approaches have been proposed for performing recommendation, including collaborative filtering, demographic recommenders and other techniques, each individual method has its own drawbacks. This paper proposes a medically oriented recommendation system in which a patient's background data is used to bootstrap the collaborative filtering engine, and personalized suggestions are provided therein. We present empirical results showing how the content-bootstrapped part of the system enhances the effectiveness of the collaborative filtering's medical article recommendations.
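The bootstrapping idea can be sketched in a few lines: when a new patient has no rating overlap with anyone, profile (background) similarity stands in for rating similarity so that recommendations are still possible. All names and the scoring rule are illustrative; a full system would blend this with rating-based collaborative filtering:

```python
import numpy as np

def cosine(a, b):
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    return a @ b / (na * nb) if na and nb else 0.0

def recommend(ratings, profiles, user, top_n=1):
    """ratings: users x items matrix with 0 = unrated.
    profiles: users x features matrix of background data.
    Unrated items of `user` are scored by a profile-similarity-weighted
    average of other users' ratings, so cold-start users still get
    recommendations (the content bootstrap)."""
    sims = np.array([cosine(profiles[user], profiles[u])
                     for u in range(len(profiles))])
    sims[user] = 0.0  # never use the target user as a neighbor
    scores = {}
    for item in range(ratings.shape[1]):
        if ratings[user, item] != 0:
            continue  # already rated
        raters = ratings[:, item] != 0
        if not raters.any() or sims[raters].sum() == 0:
            continue  # no usable neighbors for this item
        scores[item] = (sims[raters] @ ratings[raters, item]
                        / sims[raters].sum())
    return sorted(scores, key=scores.get, reverse=True)[:top_n]
```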
Wenbin Zhang and Jianwu Wang
IEEE International Congress on Big Data (BigData Congress), Honolulu, USA, 2017
The pervasive imbalanced class distributions occurring in real-world stream applications such as surveillance, security and finance, in which data arrive continuously, have sparked extensive interest in the study of imbalanced stream classification. In such applications, the evolution of unstable class concepts is always accompanied and complicated by the skewed class distribution. However, most existing methods focus on either the class imbalance problem or the non-stationary learning problem; approaches addressing both issues in combination have received relatively little research attention. In this paper, we propose a hybrid framework for imbalanced stream learning that consists of three components: classifier updating, resampling and a cost-sensitive classifier. Based on this framework, we propose a hybrid learning algorithm that combines data-level and algorithm-level methods as well as classifier retraining mechanisms to tackle class imbalance in data streams. Our experiments on real-world and synthetic datasets show that the proposed hybrid learning algorithm achieves better effectiveness and efficiency.
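The combination of data-level and algorithm-level handling can be sketched with a one-sample-at-a-time learner. The perceptron base learner, the fixed misclassification cost, and the minority replay buffer below are illustrative choices standing in for the paper's components:

```python
import numpy as np

class HybridStreamLearner:
    """Sketch of combining data-level (minority-sample replay) and
    algorithm-level (cost-sensitive updates) treatment of class
    imbalance while processing a stream one sample at a time."""

    def __init__(self, n_features, minority_cost=5.0, buffer_size=20):
        self.w = np.zeros(n_features)
        self.b = 0.0
        self.minority_cost = minority_cost
        self.buffer = []          # recent minority samples for replay
        self.buffer_size = buffer_size

    def predict(self, x):
        return 1 if self.w @ x + self.b > 0 else 0

    def _update(self, x, y, cost):
        # Perceptron-style update, scaled by the misclassification cost.
        err = y - self.predict(x)
        self.w += cost * err * x
        self.b += cost * err

    def learn_one(self, x, y):
        # Algorithm-level: mistakes on the minority class cost more.
        cost = self.minority_cost if y == 1 else 1.0
        self._update(x, y, cost)
        # Data-level: retain minority samples and replay them, which
        # rebalances what the classifier effectively sees.
        if y == 1:
            self.buffer.append(x)
            self.buffer = self.buffer[-self.buffer_size:]
        for xb in self.buffer:
            self._update(xb, 1, 1.0)
```

Replaying the buffer on every step is deliberately naive; it simply makes the rebalancing effect visible in a few lines.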
2021
ICDM Best Paper Award Candidate, SIAM Early Career, NeurIPS, AISTATS and IEEE BigData Travel Awards
2020
AAAI Travel Award
2019
ACM SIGAI, C-Fair Youth Forum and ICDM Travel Awards