UT Dallas Data Science Faculty Present Their Breakthrough Research At ACM KDD, The Premier Data Science Conference

Computer Science professors at UT Dallas continue to have a strong presence at top-tier data science and machine learning conferences. They recently published multiple papers at the ACM KDD (Knowledge Discovery and Data Mining) conference, the premier data science meeting, which was held virtually from August 14-18, 2021. The papers cover topics including novel text classification methods, fairness in artificial intelligence, and core decomposition in networks such as social and transportation networks.

“Data Science is one of the top research and graduate education areas in our Computer Science Department, with a growing undergraduate education component. We have exceptional faculty that publish regularly in top-tier data science-related venues and collaborate across multiple schools at UT Dallas, using data science techniques for applications in social sciences, natural sciences, brain and behavioral sciences, management sciences, and engineering. We are building on this success and give our students an outstanding learning experience with respect to research and education in data science.”

Dr. Ovidiu Daescu, Interim CS Department Head and CS Professor.

Dr. Latifur Khan published a paper with his Ph.D. student Yibo Hu titled: Uncertainty-Aware Reliable Text Classification. Deep neural networks have contributed significantly to predictive accuracy in classification tasks. However, they tend to make overconfident predictions in real-world settings, where domain shift and out-of-distribution (OOD) examples exist. Most research on uncertainty estimation focuses on computer vision because it provides visual validation of uncertainty quality; little research has been presented in the natural language processing domain. Unlike Bayesian methods that indirectly infer uncertainty through weight uncertainties, current evidential uncertainty-based methods explicitly model the uncertainty of class probabilities through subjective opinions. They further consider inherent uncertainty in data with different root causes: vacuity (i.e., uncertainty due to a lack of evidence) and dissonance (i.e., uncertainty due to conflicting evidence). In their paper, the authors first apply evidential uncertainty to OOD detection for text classification tasks. They then propose an inexpensive framework that adopts both auxiliary outliers and pseudo-off-manifold samples to train the model with prior knowledge of a certain class that has high vacuity for OOD samples. Extensive empirical experiments demonstrate that their model based on evidential uncertainty outperforms other counterparts for detecting OOD examples. Their approach can be easily deployed to traditional recurrent neural networks and fine-tuned pre-trained transformers.
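
To make the two uncertainty types concrete, here is a minimal Python sketch of how vacuity and dissonance are commonly computed in the subjective-logic literature. It is not taken from the paper: the function name, the Dirichlet parameterization (alpha = evidence + 1), and the example evidence vectors are assumptions chosen for illustration.

```python
def vacuity_and_dissonance(evidence):
    """Return (vacuity, dissonance) for a list of non-negative per-class evidence.

    Assumed formulation (standard subjective logic, not confirmed by the article):
      S        = sum(evidence) + K      (Dirichlet strength, alpha_k = e_k + 1)
      belief_k = e_k / S
      vacuity  = K / S
      dissonance = sum_k belief_k * sum_{j != k} belief_j * Bal(j, k) / sum_{j != k} belief_j
    """
    num_classes = len(evidence)
    strength = sum(evidence) + num_classes
    belief = [e / strength for e in evidence]
    vacuity = num_classes / strength

    def balance(b_j, b_k):
        # Relative balance between two belief masses; 0 when both are empty.
        if b_j + b_k == 0:
            return 0.0
        return 1.0 - abs(b_j - b_k) / (b_j + b_k)

    dissonance = 0.0
    for k, b_k in enumerate(belief):
        others = [b_j for j, b_j in enumerate(belief) if j != k]
        total_other = sum(others)
        if total_other > 0:
            dissonance += b_k * sum(b_j * balance(b_j, b_k) for b_j in others) / total_other
    return vacuity, dissonance


# Hypothetical evidence vectors for a two- or three-class problem.
no_evidence = vacuity_and_dissonance([0.0, 0.0, 0.0])   # nothing observed
conflicting = vacuity_and_dissonance([10.0, 10.0])      # strong but split evidence
confident = vacuity_and_dissonance([20.0, 0.0])         # strong, one-sided evidence
```

Under these assumptions, an input with no evidence has maximal vacuity, while an input with strong but evenly split evidence has low vacuity and high dissonance, which is the distinction the paragraph above draws.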

Dr. Murat Kantarcioglu and his colleagues Dr. Yulia Gel from the Department of Mathematical Sciences at UT Dallas, Friedhelm Victor from the Technische Universität Berlin in Germany, and Dr. Cuneyt Gurcan Akcora from the University of Manitoba, Canada, published the paper titled: Alphacore: Data Depth based Core Decomposition. Core decomposition in networks has proven useful for evaluating the importance of nodes and communities in a variety of application domains, ranging from biology to social networks and finance. However, existing core decomposition algorithms have limitations in simultaneously handling multiple node and edge attributes. The authors propose a novel unsupervised core decomposition method that can be easily applied to directed and weighted networks. Their algorithm, AlphaCore, combines multiple node properties in a systematic and mathematically rigorous way by using the notion of data depth. In addition, it can be used as a mixture of centrality measure and core decomposition. Compared to existing approaches, AlphaCore avoids the need to specify numerous thresholds or coefficients and yields meaningful quantitative and qualitative insights into network structural organization. The authors evaluate AlphaCore’s performance with a focus on financial, blockchain-based token networks, the social network Reddit, and a transportation network of international flight routes. They compare their results with existing core decomposition and centrality algorithms. Using ground truth about node importance, they show that AlphaCore yields the best precision and recall results among core decomposition methods using the same input features.
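
For readers unfamiliar with core decomposition, the sketch below shows the classic degree-based k-core peeling procedure on an undirected, unweighted graph. This is the baseline idea only, not AlphaCore, which according to the description above replaces the single degree criterion with data depth over multiple node properties. The function name and the toy graph are illustrative assumptions.

```python
def core_numbers(adjacency):
    """Classic k-core peeling: repeatedly remove the lowest-degree node.

    `adjacency` maps each node to the set of its neighbors (undirected graph).
    Returns a dict mapping each node to its core number.
    """
    degree = {node: len(neighbors) for node, neighbors in adjacency.items()}
    remaining = set(adjacency)
    core = {}
    k = 0
    while remaining:
        # Pick a node with the smallest remaining degree.
        node = min(remaining, key=degree.get)
        # The core level never decreases as peeling proceeds.
        k = max(k, degree[node])
        core[node] = k
        remaining.remove(node)
        # Removing the node lowers the degree of its surviving neighbors.
        for neighbor in adjacency[node]:
            if neighbor in remaining:
                degree[neighbor] -= 1
    return core


# Hypothetical toy graph: a triangle (a, b, c) with one pendant node d.
toy_graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
toy_cores = core_numbers(toy_graph)
```

In this toy graph the triangle nodes end up in the 2-core and the pendant node in the 1-core. The simple `min` scan keeps the sketch short; production implementations use bucketed degree lists to run in linear time.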

Finally, Dr. Feng Chen published a paper with Dr. Bhavani Thuraisingham and his Ph.D. student Chen Zhao titled: Fairness-Aware Online Meta-learning. Fairness in AI and machine learning is emerging as a crucial research area for ensuring social good. In contrast with offline approaches, two research paradigms have been devised for online learning. (1) Online Meta-Learning (OML) learns good priors over model parameters (or learning to learn) in a sequential setting where tasks are revealed one after another. Although such techniques provide a sub-linear regret bound, they completely ignore the importance of learning with fairness, which is a significant hallmark of human intelligence. (2) Online Fairness-Aware Learning captures many classification problems for which fairness is a concern, but it aims to attain zero-shot generalization without any task-specific adaptation, which limits a model’s capability to adapt to newly arrived data. To overcome these issues and bridge the gap, this paper is the first to propose an online meta-learning algorithm, namely FFML, under the setting of unfairness prevention. The key part of FFML is to learn good priors of an online fair classification model’s primal and dual parameters, which are associated with the model’s accuracy and fairness, respectively. The problem is formulated as a bi-level convex-concave optimization. The theoretical analysis provides sub-linear upper bounds for loss regret and for the violation of cumulative fairness constraints. The experiments demonstrate the versatility of FFML by applying it to classification on three real-world datasets and show substantial improvements over the best prior work on the tradeoff between fairness and classification accuracy.
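
The role of primal and dual parameters can be illustrated with a generic primal-dual gradient update on a toy constrained problem. This is not FFML and involves no meta-learning or real fairness metric: the quadratic loss, the linear constraint standing in for a fairness requirement, the step size, and all names are assumptions made purely for illustration.

```python
def primal_dual(steps=2000, learning_rate=0.1):
    """Gradient descent-ascent on a toy Lagrangian.

    Toy problem (assumed for illustration):
      minimize   loss(theta)      = (theta - 2)^2     (stand-in for accuracy)
      subject to violation(theta) = theta - 1 <= 0    (stand-in for a fairness constraint)
    Lagrangian: loss(theta) + lam * violation(theta), with lam >= 0.
    """
    theta, lam = 0.0, 0.0  # primal and dual parameters
    for _ in range(steps):
        # Gradient of the Lagrangian with respect to the primal parameter.
        grad_theta = 2.0 * (theta - 2.0) + lam
        # Constraint violation drives the dual parameter.
        violation = theta - 1.0
        # Descend on the primal parameter, ascend on the dual one.
        theta -= learning_rate * grad_theta
        lam = max(0.0, lam + learning_rate * violation)  # keep the multiplier non-negative
    return theta, lam


theta_final, lam_final = primal_dual()
```

In this toy setting the primal parameter settles at the constraint boundary (theta near 1) rather than at the unconstrained minimum (theta = 2), and the dual parameter settles near 2, reflecting how strongly the constraint binds. The same tension between a loss term and a constraint term is what the primal and dual parameters balance in the description above.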

“The UT Dallas Data Science research community has come a long way over the years,” noted Dr. Bhavani Thuraisingham. “Our faculty have published papers in every top-tier data science and AI venue, including ACM KDD, IEEE ICDM, SDM, AAAI, IJCAI, ACM SIGMOD, PVLDB, ACM WWW, IEEE DSAA, and IEEE ICDE. In addition, they have also published papers at the intersection of data science and cybersecurity at various top-tier cybersecurity venues, including ACM CCS, IEEE S&P, NDSS, and Usenix Security,” she added. Reminiscing about the past, she also stated, “I remember attending the first KDD conference in Montreal in 1995, which had around 200 participants. While the papers presented included some major breakthroughs for that time, one of the debates at that conference was who owns data mining? Is it the data management, statistics, or the machine learning community? Now, 26 years later, with so much work to do, the multiple communities have all joined forces and are working together and solving some challenging problems from fairness to text classification to decomposition in networks.” Thuraisingham added, “While we focus on both the foundations and a variety of applications of data science, AI and Data Science for Social Good is one of our major focus areas.”

ABOUT THE UT DALLAS COMPUTER SCIENCE DEPARTMENT

The UT Dallas Computer Science Department is one of the largest Computer Science departments in the United States, with over 3,315 bachelor’s-degree students, more than 1,110 master’s students, 165 Ph.D. students, 52 tenure-track faculty members, and 44 full-time senior lecturers, as of Fall 2019. With the University of Texas at Dallas’ unique history of starting as a graduate institution, the CS Department is built on a legacy of valuing innovative research and providing advanced training for software engineers and computer scientists.