DeepNLP KDD2021 Accepted Paper List AI Robotic and STEM Top Conference & Journal Papers

Automated Mechanism Design for Strategic Classification: Abstract for KDD21 Keynote Talk

Vincent Conitzer

AI is increasingly making decisions, not only for us, but also about us -- from whether we are invited for an interview, to whether we are proposed as a match for someone looking for a date, to whether we are released on bail. Often, we have some control over the information that is available to the algorithm; we can self-report some information, and other information we can choose to withhold. This creates a potential circularity: the classifier used, mapping submitted information to outcomes, depends on the (training) data that people provide, but the (test) data depend on the classifier, because people will reveal their information strategically to obtain a more favorable outcome. This setting is not adversarial, but it is also not fully cooperative. Mechanism design provides a framework for making good decisions based on strategically reported information, and it is commonly applied to the design of auctions and matching mechanisms. However, the setting above is unlike these common applications, because in it, preferences tend to be similar across agents, but agents are restricted in what they can report. This creates both new challenges and new opportunities, as we demonstrate in our theoretical work and our initial experiments. This is joint work with Hanrui Zhang, Andrew Kephart, Yu Cheng, Anilesh Krishnaswamy, Haoming Li, and David Rein.

Data Science for Assembly Engineering

Sharon C. Glotzer

Discovery and design of new materials able to self assemble from nanoscale building blocks are becoming increasingly enabled by large-scale molecular simulation. Aided by fast simulation codes leveraging powerful computer architectures, an unprecedented amount of data can be generated in the blink of an eye, shifting the effort and focus of the computational scientist from the simulation to the data. How do we manage so much data, and what do we do with it when we have it? In this talk, we discuss the applications of data science and data-driven thinking to molecular and materials simulation. Although we do so in the context of assembly engineering of soft matter, the tools and techniques discussed are general and applicable to a wide range of problems. We present applications of machine learning to automated, structure identification of complex colloidal crystals, high-throughput mapping of phase diagrams, the study of kinetic pathways between fluid and solid phases, and the discovery of previously elusive design rules and structure-property relationships.Biography: Sharon C. Glotzer is the John W. Cahn Distinguished University Professor at the University of Michigan, Ann Arbor, the Stuart W. Churchill Collegiate Professor of Chemical Engineering, and the Anthony C. Lembke Department Chair of Chemical Engineering. She is also Professor of Materials Science and Engineering, Physics, Applied Physics, and Macromolecular Science and Engineering. Her research on computational assembly science and engineering aims toward predictive materials design of colloidal and soft matter: using computation, geometrical concepts, and statistical mechanics, her research group seeks to understand complex behavior emerging from simple rules and forces, and use that knowledge to design new classes of materials. Glotzers group also develops and disseminates powerful open-source software including the particle simulation toolkit, HOOMD-blue, which allows for fast molecular simulation of materials on graphics processors, the signac framework for data and workflow management, and several analysis and visualization tools. Glotzer received her B.S. in Physics from UCLA and her PhD in Physics from Boston University. She is a member of the National Academy of Sciences, the National Academy of Engineering and the American Academy of Arts and Sciences.

LawyerPAN: A Proficiency Assessment Network for Trial Lawyers

Yanqing An,Qi Liu,Han Wu,Kai Zhang,Linan Yue,Mingyue Cheng,Hongke Zhao,Enhong Chen

Assessing the proficiency of trial lawyers in different legal fields is of significant importance since a qualified lawyer or lawyer team can strive for his clients best rights while ensuring the fairness of litigations. However, proficiency assessment for lawyers is very challenging due to many technical and domain challenges, such as the lack of unified evaluation standards, and the complex interactions between lawyers and cases in real legal systems. To this end, we propose a novel proficiency assessment network for trial lawyers (LawyerPAN) to quantify lawyer proficiency through online litigation records. Specifically, we first leverage the theories in psychological measurement for mapping the proficiency of lawyers in each field into a unified real number space. Meanwhile, the characteristics of cases (i.e., case difficulty and discrimination) are well modeled to ensure fairness when assessing lawyers in different cases and fields. Then, we model the interactions between lawyers and cases from two perspectives: the anticipatory perspective aims to measure the personal proficiency of anticipated strategy, and the adversarial perspective seeks to depict the gap of lawyers proficiency between both sides (i.e., plaintiffs and defendants). Finally, we conduct extensive experiments on real-world data, and the results show the effectiveness and interpretability of our approaches on assessing the proficiency of trial lawyers.

Fine-Grained System Identification of Nonlinear Neural Circuits

Dawna Bagherian,James Gornet,Jeremy Bernstein,Yu-Li Ni,Yisong Yue,Markus Meister

We study the problem of sparse nonlinear model recovery of high dimensional compositional functions. Our study is motivated by emerging opportunities in neuroscience to recover fine-grained models of biological neural circuits using collected measurement data. Guided by available domain knowledge in neuroscience, we explore conditions under which one can recover the underlying biological circuit that generated the training data. Our results suggest insights of both theoretical and practical interests. Most notably, we find that a sign constraint on the weights is a necessary condition for system recovery, which we establish both theoretically with an identifiability guarantee and empirically on simulated biological circuits. We conclude with a case study on retinal ganglion cell circuits using data collected from mouse retina, showcasing the practical potential of this approach.

Multi-facet Contextual Bandits: A Neural Network Perspective

Yikun Ban,Jingrui He,Curtiss B. Cook

Contextual multi-armed bandit has shown to be an effective tool in recommender systems. In this paper, we study a novel problem of multi-facet bandits involving a group of bandits, each characterizing the users needs from one unique aspect. In each round, for the given user, we need to select one arm from each bandit, such that the combination of all arms maximizes the final reward. This problem can find immediate applications in E-commerce, healthcare, etc. To address this problem, we propose a novel algorithm, named MuFasa, which utilizes an assembled neural network to jointly learn the underlying reward functions of multiple bandits. It estimates an Upper Confidence Bound (UCB) linked with the expected reward to balance between exploitation and exploration. Under mild assumptions, we provide the regret analysis of MuFasa. It can achieve the near-optimal u00d5((K + 1) u221aT) regret bound where K is the number of bandits and T is the number of played rounds. Furthermore, we conduct extensive experiments to show that MuFasa outperforms strong baselines on real-world data sets.

Partial Label Dimensionality Reduction via Confidence-Based Dependence Maximization

Wei-Xuan Bao,Jun-Yi Hang,Min-Ling Zhang

Partial label learning deals with training examples each associated with a set of candidate labels, among which only one is valid. Most existing works focus on manipulating the label space by estimating the labeling confidences of candidate labels, while the task of manipulating the feature space by dimensionality reduction has been rarely investigated. In this paper, a novel partial label dimensionality reduction approach named CENDA is proposed via confidence-based dependence maximization. Specifically, CENDA adapts the Hilbert-Schmidt Independence Criterion (HSIC) to help identify the projection matrix, where the dependence between projected feature information and confidence-based labeling information is maximized iteratively. In each iteration, the projection matrix admits closed-form solution by solving a tailored generalized eigenvalue problem, while the labeling confidences of candidate labels are updated by conducting kNN aggregation in the projected feature space. Extensive experiments over a broad range of benchmark data sets show that the predictive performance of well-established partial label learning algorithms can be significantly improved by coupling with the proposed dimensionality reduction approach.

Fast One-class Classification using Class Boundary-preserving Random Projections

Arindam Bhattacharya,Sumanth Varambally,Amitabha Bagchi,Srikanta Bedathur

Several applications, like malicious URL detection and web spam detection, require classification on very high-dimensional data. In such cases anomalous data is hard to find but normal data is easily available. As such it is increasingly common to use a one-class classifier (OCC). Unfortunately, most OCC algorithms cannot scale to datasets with extremely high dimensions. In this paper, we present Fast Random projection-based One-Class Classification (FROCC), an extremely efficient, scalable and easily parallelizable method for one-class classification with provable theoretical guarantees. Our method is based on the simple idea of transforming the training data by projecting it onto a set of random unit vectors that are chosen uniformly and independently from the unit sphere, and bounding the regions based on separation of the data. FROCC can be naturally extended with kernels. We provide a new theoretical framework to prove that that FROCC generalizes well in the sense that it is stable and has low bias for some parameter settings. We then develop a fast scalable approximation of FROCC using vectorization, exploiting data sparsity and parallelism to develop a new implementation called ParDFROCC. ParDFROCC achieves up to 2 percent points better ROC than the next best baseline, with up to 12u00d7 speedup in training and test times over a range of state-of-the-art benchmarks for the OCC task.

Causal Models for Real Time Bidding with Repeated User Interactions

Martin Bompaire,Alexandre Gilotte,Benjamin Heymann

A large portion of online advertising displays are sold through an auction mechanism called Real Time Bidding (RTB). Each auction corresponds to a display opportunity, for which the competing advertisers need to precisely estimate the economical value in order to bid accordingly. This estimate is typically taken as the advertisers payoff for the target event -- such as a purchase on the merchant website attributed to this display -- times this event estimated probability. However, this greedy approach is too naive when several displays are shown to the same user. The purpose of the present paper is to discuss how such an estimation should be made when a user has already been shown one or more displays. Intuitively, while a user is more likely to make a purchase if the number of displays increases, the marginal effect of each display is expected to be decreasing. In this work, we first frame this bidding problem with repeated user interactions by using causal models to value each display individually. Then, based on this approach, we introduce a simple rule to improve the value estimate. This change shows both interesting qualitative properties that follow our previous intuition as well as quantitative improvements on a public data set and online in a production environment.

Aggregating Complex Annotations via Merging and Matching

Alexander Braylan,Matthew Lease

Human annotations are critical for training and evaluating supervised learning models, yet annotators often disagree with one another, especially as annotation tasks increase in complexity. A common strategy to improve label quality is to ask multiple annotators to label the same item and then aggregate their labels. While many aggregation models have been proposed for simple annotation tasks, how can we reason about and resolve annotator disagreement for more complex annotation tasks (e.g., continuous, structured, or high-dimensional), without needing to devise a new aggregation model for every different complex annotation task? We address two distinct challenges in this work. Firstly, how can a general aggregation model support merging of complex labels across diverse annotation tasks? Secondly, for multi-object annotation tasks that require annotators to provide multiple labels for each item being annotated (e.g., labeling named-entities in a text or visual entities in an image), how do we match which annotator label refers to which entity, such that only matching labels are aggregated across annotators? Using general constructs for merging and matching, our model not only supports diverse tasks, but delivers equal or better results than prior aggregation models: general and task-specific.

How Interpretable and Trustworthy are GAMs?

Chun-Hao Chang,Sarah Tan,Ben Lengerich,Anna Goldenberg,Rich Caruana

Generalized additive models (GAMs) have become a leading model class for interpretable machine learning. However, there are many algorithms for training GAMs, and these can learn different or even contradictory models, while being equally accurate. Which GAM should we trust? In this paper, we quantitatively and qualitatively investigate a variety of GAM algorithms on real and simulated datasets. We find that GAMs with high feature sparsity (only using a few variables to make predictions) can miss patterns in the data and be unfair to rare subpopulations. Our results suggest that inductive bias plays a crucial role in what interpretable models learn and that tree-based GAMs represent the best balance of sparsity, fidelity and accuracy and thus appear to be the most trustworthy GAM models.

Graph Deep Factors for Forecasting with Applications to Cloud Resource Allocation

Hongjie Chen,Ryan A. Rossi,Kanak Mahadik,Sungchul Kim,Hoda Eldardiry

Deep probabilistic forecasting techniques have recently been proposed for modeling large collections of time-series. However, these techniques explicitly assume either complete independence (local model) or complete dependence (global model) between time-series in the collection. This corresponds to the two extreme cases where every time-series is disconnected from every other time-series in the collection or likewise, that every time-series is related to every other time-series resulting in a completely connected graph. In this work, we propose a deep hybrid probabilistic graph-based forecasting framework called Graph Deep Factors (GraphDF) that goes beyond these two extremes by allowing nodes and their time-series to be connected to others in an arbitrary fashion. GraphDF is a hybrid forecasting framework that consists of a relational global and relational local model. In particular, we propose a relational global model that learns complex non-linear time-series patterns globally using the structure of the graph to improve both forecasting accuracy and computational efficiency. Similarly, instead of modeling every time-series independently, we learn a relational local model that not only considers its individual time-series but also the time-series of nodes that are connected in the graph. The experiments demonstrate the effectiveness of the proposed deep hybrid graph-based forecasting model compared to the state-of-the-art methods in terms of its forecasting accuracy, runtime, and scalability. Our case study reveals that GraphDF can successfully generate cloud usage forecasts and opportunistically schedule workloads to increase cloud cluster utilization by 47.5% on average.

On Breaking Truss-Based Communities

Huiping Chen,Alessio Conte,Roberto Grossi,Grigorios Loukides,Solon P. Pissis,Michelle Sweering

A k-truss is a graph such that each edge is contained in at least k-2 triangles. This notion has attracted much attention, because it models meaningful cohesive subgraphs of a graph. We introduce the problem of identifying a smallest edge subset of a given graph whose removal makes the graph k-truss-free. We also introduce a problem variant where the identified subset contains only edges incident to a given set of nodes and ensures that these nodes are not contained in any k-truss. These problems are directly applicable in communication networks: the identified edges correspond to vital network connections; or in social networks: the identified edges can be hidden by users or sanitized from the output graph. We show that these problems are NP-hard. We thus develop exact exponential-time algorithms to solve them. To process large networks, we also develop heuristics sped up by an efficient data structure for updating the truss decomposition under edge deletions. We complement our heuristics with a lower bound on the size of an optimal solution to rigorously evaluate their effectiveness. Extensive experiments on 10 real-world graphs show that our heuristics are effective (close to the optimal or to the lower bound) and also efficient (up to two orders of magnitude faster than a natural baseline).

Learning Elastic Embeddings for Customizing On-Device Recommenders

Tong Chen,Hongzhi Yin,Yujia Zheng,Zi Huang,Yang Wang,Meng Wang

In todays context, deploying data-driven services like recommendation on edge devices instead of cloud servers becomes increasingly attractive due to privacy and network latency concerns. A common practice in building compact on-device recommender systems is to compress their embeddings which are normally the cause of excessive parameterization. However, despite the vast variety of devices and their associated memory constraints, existing memory-efficient recommender systems are only specialized for a fixed memory budget in every design and training life cycle, where a new model has to be retrained to obtain the optimal performance while adapting to a smaller/larger memory budget. In this paper, we present a novel lightweight recommendation paradigm that allows a well-trained recommender to be customized for arbitrary device-specific memory constraints without retraining. The core idea is to compose elastic embeddings for each item, where an elastic embedding is the concatenation of a set of embedding blocks that are carefully chosen by an automated search function. Correspondingly, we propose an innovative approach, namely recommendation with universally learned elastic embeddings (RULE). To ensure the expressiveness of all candidate embedding blocks, RULE enforces a diversity-driven regularization when learning different embedding blocks. Then, a performance estimator-based evolutionary search function is designed, allowing for efficient specialization of elastic embeddings under any memory constraint for on-device recommendation. Extensive experiments on real-world datasets reveal the superior performance of RULE under tight memory budgets.

Lu Cheng,Ruocheng Guo,Kai Shu,Huan Liu

Recent years have witnessed remarkable progress towards computational fake news detection. To mitigate its negative impact, we argue that it is critical to understand what user attributes potentially cause users to share fake news. The key to this causal-inference problem is to identify confounders -- variables that cause spurious associations between treatments (e.g., user attributes) and outcome (e.g., user susceptibility). In fake news dissemination, confounders can be characterized by fake news sharing behavior that inherently relates to user attributes and online activities. Learning such user behavior is typically subject to selection bias in users who are susceptible to share news on social media. Drawing on causal inference theories, we first propose a principled approach to alleviating selection bias in fake news dissemination. We then consider the learned unbiased fake news sharing behavior as the surrogate confounder that can fully capture the causal links between user attributes and user susceptibility. We theoretically and empirically characterize the effectiveness of the proposed approach and find that it could be useful in protecting society from the perils of fake news.

Improve Learning from Crowds via Generative Augmentation

Zhendong Chu,Hongning Wang

Crowdsourcing provides an efficient label collection schema for supervised machine learning. However, to control annotation cost, each instance in the crowdsourced data is typically annotated by a small number of annotators. This creates a sparsity issue and limits the quality of machine learning models trained on such data. In this paper, we study how to handle sparsity in crowdsourced data using data augmentation. Specifically, we propose to directly learn a classifier by augmenting the raw sparse annotations. We implement two principles of high-quality augmentation using Generative Adversarial Networks: 1) the generated annotations should follow the distribution of authentic ones, which is measured by a discriminator; 2) the generated annotations should have high mutual information with the ground-truth labels, which is measured by an auxiliary network. Extensive experiments and comparisons against an array of state-of-the-art learning from crowds methods on three real-world datasets proved the effectiveness of our data augmentation framework. It shows the potential of our algorithm for low-budget crowdsourcing in general.

Graph Similarity Description: How Are These Graphs Similar?

Corinna Coupette,Jilles Vreeken

How do social networks differ across platforms? How do information networks change over time? Answering questions like these requires us to compare two or more graphs. This task is commonly treated as a measurement problem, but numerical answers give limited insight. Here, we argue that if the goal is to gain understanding, we should treat graph similarity assessment as a description problem instead. We formalize this problem as a model selection task using the Minimum Description Length principle, capturing the similarity of the input graphs in a common model and the differences between them in transformations to individual models. To discover good models, we propose Momo, which breaks the problem into two parts and introduces efficient algorithms for each. Through an extensive set of experiments on a wide range of synthetic and real-world graphs, we confirm that Momo works well in practice.

Bavarian: Betweenness Centrality Approximation with Variance-Aware Rademacher Averages

Cyrus Cousins,Chloe Wohlgemuth,Matteo Riondato

We present Bavarian, a collection of sampling-based algorithms for approximating the Betweenness Centrality (BC) of all vertices in a graph. Our algorithms use Monte-Carlo Empirical Rademacher Averages (MCERAs), a concept from statistical learning theory, to efficiently compute tight bounds on the maximum deviation of the estimates from the exact values. The MCERAs provide a sample-dependent approximation guarantee much stronger than the state of the art, thanks to its use of variance-aware probabilistic tail bounds. The flexibility of the MCERA allows us to introduce a unifying framework that can be instantiated with existing sampling-based estimators of BC, thus allowing a fair comparison between them, decoupled from the sample-complexity results with which they were originally introduced. Additionally, we prove novel sample-complexity results showing that, for all estimators, the sample size sufficient to achieve a desired approximation guarantee depends on the vertex-diameter of the graph, an easy-to-bound characteristic quantity. We also show progressive-sampling algorithms and extensions to other centrality measures, such as percolation centrality. Our extensive experimental evaluation of Bavarian shows the improvement over the state-of-the art made possible by the MCERA, and it allows us to assess the different trade-offs between sample size and accuracy guarantee offered by the different estimators.

Labeled Data Generation with Inexact Supervision

Enyan Dai,Kai Shu,Yiwei Sun,Suhang Wang

The recent advanced deep learning techniques have shown the promising results in various domains such as computer vision and natural language processing. The success of deep neural networks in supervised learning heavily relies on a large amount of labeled data. However, obtaining labeled data with target labels is often challenging due to various reasons such as cost of labeling and privacy issues, which challenges existing deep models. In spite of that, it is relatively easy to obtain data with inexact supervision, i.e., having labels/tags related to the target task. For example, social media platforms are overwhelmed with billions of posts and images with self-customized tags, which are not the exact labels for target classification tasks but are usually related to the target labels. It is promising to leverage these tags (inexact supervision) and their relations with target classes to generate labeled data to facilitate the downstream classification tasks. However, the work on this is rather limited. Therefore, we study a novel problem of labeled data generation with inexact supervision. We propose a novel generative framework named as ADDES which can synthesize high-quality labeled data for target classification tasks by learning from data with inexact supervision and the relations between inexact supervision and target classes. Experimental results on image and text datasets demonstrate the effectiveness of the proposed ADDES for generating realistic labeled data from inexact supervision to facilitate the target classification task.

MiniRocket: A Very Fast (Almost) Deterministic Transform for Time Series Classification

Angus Dempster,Daniel F. Schmidt,Geoffrey I. Webb

Rocket achieves state-of-the-art accuracy for time series classification with a fraction of the computational expense of most existing methods by transforming input time series using random convolutional kernels, and using the transformed features to train a linear classifier. We reformulate Rocket into a new method, MiniRocket. MiniRocket is up to 75 times faster than Rocket on larger datasets, and almost deterministic (and optionally, fully deterministic), while maintaining essentially the same accuracy. Using this method, it is possible to train and test a classifier on all of 109 datasets from the UCR archive to state-of-the-art accuracy in under 10 minutes. MiniRocket is significantly faster than any other method of comparable accuracy (including Rocket), and significantly more accurate than any other method of remotely similar computational expense.

ST-Norm: Spatial and Temporal Normalization for Multi-variate Time Series Forecasting

Jinliang Deng,Xiusi Chen,Renhe Jiang,Xuan Song,Ivor W. Tsang

Multi-variate time series (MTS) data is a ubiquitous class of data abstraction in the real world. Any instance of MTS is generated from a hybrid dynamical system with their specific dynamics normally unknown. The hybrid nature of such a dynamical system is a result of complex external impacts, which can be summarized as high-frequency and low-frequency from the temporal view, or global and local if we take the spatial view. These impacts also determine the forthcoming development of MTS making them paramount to capture in a time series forecasting task. However, conventional methods face intrinsic difficulties in disentangling the components yielded by each kind of impact from the raw data. To this end, we propose two kinds of normalization modules -- temporal and spatial normalization -- which separately refine the high-frequency component and the local component underlying the raw data. Moreover, both modules can be readily integrated into canonical deep learning architectures such as Wavenet and Transformer. Extensive experiments on three datasets are conducted to illustrate that, with additional normalization modules, the performance of the canonical architectures can be enhanced by a large margin in the application of MTS and achieves state-of-the-art results compared with existing MTS models.

DiffMG: Differentiable Meta Graph Search for Heterogeneous Graph Neural Networks

Yuhui Ding,Quanming Yao,Huan Zhao,Tong Zhang

In this paper, we propose a novel framework to automatically utilize task-dependent semantic information which is encoded in heterogeneous information networks (HINs). Specifically, we search for a meta graph, which can capture more complex semantic relations than a meta path, to determine how graph neural networks (GNNs) propagate messages along different types of edges. We formalize the problem within the framework of neural architecture search (NAS) and then perform the search in a differentiable manner. We design an expressive search space in the form of a directed acyclic graph (DAG) to represent candidate meta graphs for a HIN, and we propose task-dependent type constraint to filter out those edge types along which message passing has no effect on the representations of nodes that are related to the downstream task. The size of the search space we define is huge, so we further propose a novel and efficient search algorithm to make the total search cost on a par with training a single GNN once. Compared with existing popular NAS algorithms, our proposed search algorithm improves the search efficiency. We conduct extensive experiments on different HINs and downstream tasks to evaluate our method, and experimental results show that our method can outperform state-of-the-art heterogeneous GNNs and also improves efficiency compared with those methods which can implicitly learn meta paths.

Global Neighbor Sampling for Mixed CPU-GPU Training on Giant Graphs

Jialin Dong,Da Zheng,Lin F. Yang,George Karypis

Graph neural networks (GNNs) are powerful tools for learning from graph data and are widely used in various applications such as social network recommendation, fraud detection, and graph search. The graphs in these applications are typically large, usually containing hundreds of millions of nodes. Training GNN models on such large graphs efficiently remains a big challenge. Despite a number of sampling-based methods have been proposed to enable mini-batch training on large graphs, these methods have not been proved to work on truly industry-scale graphs, which require GPUs or mixed CPU-GPU training. The state-of-the-art sampling-based methods are usually not optimized for these real-world hardware setups, in which data movement between CPUs and GPUs is a bottleneck. To address this issue, we propose Global Neighborhood Sampling that aims at training GNNs on giant graphs specifically for mixed CPU-GPU training. The algorithm samples a global cache of nodes periodically for all mini-batches and stores them in GPUs. This global cache allows in-GPU importance sampling of mini-batches, which drastically reduces the number of nodes in a mini-batch, especially in the input layer, to reduce data copy between CPU and GPU and mini-batch computation without compromising the training convergence rate or model accuracy. We provide a highly efficient implementation of this method and show that our implementation outperforms an efficient node-wise neighbor sampling baseline by a factor of 2u00d7 ~ 4u00d7 on giant graphs. It outperforms an efficient implementation of LADIES with small layers by a factor of 2u00d7 ~ 14u00d7 while achieving much higher accuracy than LADIES. We also theoretically analyze the proposed algorithm and show that with cached node data of a proper size, it enjoys a comparable convergence rate as the underlying node-wise sampling method.

Individual Fairness for Graph Neural Networks: A Ranking based Approach

Yushun Dong,Jian Kang,Hanghang Tong,Jundong Li

Recent years have witnessed the pivotal role of Graph Neural Networks (GNNs) in various high-stake decision-making scenarios due to their superior learning capability. Close on the heels of the successful adoption of GNNs in different application domains has been the increasing societal concern that conventional GNNs often do not have fairness considerations. Although some research progress has been made to improve the fairness of GNNs, these works mainly focus on the notion of group fairness regarding different subgroups defined by a protected attribute such as gender, age, and race. Beyond that, it is also essential to study the GNN fairness at a much finer granularity (i.e., at the node level) to ensure that GNNs render similar prediction results for similar individuals to achieve the notion of individual fairness. Toward this goal, in this paper, we make an initial investigation to enhance the individual fairness of GNNs and propose a novel ranking based framework---REDRESS. Specifically, we refine the notion of individual fairness from a ranking perspective, and formulate the ranking based individual fairness promotion problem. This naturally addresses the issue of Lipschitz constant specification and distance calibration resulted from the Lipschitz condition in the conventional individual fairness definition. Our proposed framework REDRESS encapsulates the GNN model utility maximization and the ranking-based individual fairness promotion in a joint framework to enable end-to-end training. It is noteworthy mentioning that REDRESS is a plug-and-play framework and can be easily generalized to any prevalent GNN architectures. Extensive experiments on multiple real-world graphs demonstrate the superiority of REDRESS in achieving a good balance between model utility maximization and individual fairness promotion. Our open source code can be found here: https://github.com/yushundong/REDRESS.

Sylvester Tensor Equation for Multi-Way Association

Boxin Du,Lihui Liu,Hanghang Tong

How can we identify the same or similar users from a collection of social network platforms (e.g., Facebook, Twitter, LinkedIn, etc.)? Which restaurant shall we recommend to a given user at the right time at the right location? Given a disease, which genes and drugs are most relevant? Multi-way association, which identifies strongly correlated node sets from multiple input networks, is the key to answering these questions. Despite its importance, very few multi-way association methods exist due to its high complexity. In this paper, we formulate multi-way association as a convex optimization problem, whose optimal solution can be obtained by a Sylvester tensor equation. Furthermore, we propose two fast algorithms to solve the Sylvester tensor equation, with a linear time and space complexity. We further provide theoretic analysis in terms of the sensitivity of the Sylvester tensor equation solution. Empirical evaluations demonstrate the efficacy of the proposed method.

TabularNet: A Neural Network Architecture for Understanding Semantic Structures of Tabular Data

Lun Du,Fei Gao,Xu Chen,Ran Jia,Junshan Wang,Jiang Zhang,Shi Han,Dongmei Zhang

Tabular data are ubiquitous for the widespread applications of tables and hence have attracted the attention of researchers to extract underlying information. One of the critical problems in mining tabular data is how to understand their inherent semantic structures automatically. Existing studies typically adopt Convolutional Neural Network (CNN) to model the spatial information of tabular structures yet ignore more diverse relational information between cells, such as the hierarchical and paratactic relationships. To simultaneously extract spatial and relational information from tables, we propose a novel neural network architecture, TabularNet. The spatial encoder of TabularNet utilizes the row/column-level Pooling and the Bidirectional Gated Recurrent Unit (Bi-GRU) to capture statistical information and local positional correlation, respectively. For relational information, we design a new graph construction method based on the WordNet tree and adopt a Graph Convolutional Network (GCN) based encoder that focuses on the hierarchical and paratactic relationships between cells. Our neural network architecture can be a unified neural backbone for different understanding tasks and utilized in a multitask scenario. We conduct extensive experiments on three classification tasks with two real-world spreadsheet data sets, and the results demonstrate the effectiveness of our proposed TabularNet over state-of-the-art baselines.

When Comparing to Ground Truth is Wrong: On Evaluating GNN Explanation Methods

Lukas Faber,Amin K. Moghaddam,Roger Wattenhofer

We study the evaluation of graph explanation methods. The state of the art to evaluate explanation methods is to first train a GNN, then generate explanations, and finally compare those explanations with the ground truth. We show five pitfalls that sabotage this pipeline because the GNN does not use the ground-truth edges. Thus, the explanation method cannot detect the ground truth. We propose three novel benchmarks: (i) pattern detection, (ii) community detection, and (iii) handling negative evidence and gradient saturation. In a re-evaluation of state-of-the-art explanation methods, we show paths for improving existing methods and highlight further paths for GNN explanation research.

Large-Scale Subspace Clustering via k-Factorization

Jicong Fan

Subspace clustering (SC) aims to cluster data lying in a union of low-dimensional subspaces. Usually, SC learns an affinity matrix and then performs spectral clustering. Both steps suffer from high time and space complexity, which leads to difficulty in clustering large datasets. This paper presents a method called k-Factorization Subspace Clustering (k-FSC) for large-scale subspace clustering. K-FSC directly factorizes the data into k groups via pursuing structured sparsity in the matrix factorization model. Thus, k-FSC avoids learning affinity matrix and performing eigenvalue decomposition, and has low (linear) time and space complexity on large datasets. This paper proves the effectiveness of the k-FSC model theoretically. An efficient algorithm with convergence guarantee is proposed to solve the optimization of k-FSC. In addition, k-FSC is able to handle sparse noise, outliers, and missing data, which are pervasive in real applications. This paper also provides online extension and out-of-sample extension for k-FSC to handle streaming data and cluster arbitrarily large datasets. Extensive experiments on large-scale real datasets show that k-FSC and its extensions outperform state-of-the-art methods of subspace clustering.

Gaussian Process with Graph Convolutional Kernel for Relational Learning

Jinyuan Fang,Shangsong Liang,Zaiqiao Meng,Qiang Zhang

Gaussian Process (GP) offers a principled non-parametric framework for learning stochastic functions. The generalization capability of GPs depends heavily on the kernel function, which implicitly imposes the smoothness assumptions of the data. However, common feature-based kernel functions are inefficient to model the relational data, where the smoothness assumptions implied by the kernels are violated. To model the complex and non-differentiable functions over relational data, we propose a novel Graph Convolutional Kernel, which enables to incorporate relational structures to feature-based kernels to capture the statistical structure of data. To validate the effectiveness of proposed kernel function in modeling relational data, we introduce GP models with Graph Convolutional Kernel in two relational learning settings, i.e., unsupervised settings of link prediction and semi-supervised settings of object classification. The parameters of our GP models are optimized through the scalable variational inducing point method. However, the highly structured likelihood objective requires densely sampling from variational distributions, which is costly and makes its optimization challenging in the unsupervised settings. To tackle this challenge, we propose a Local Neighbor Sampling technique with a provable more efficient computational complexity. Experimental results on real-world datasets demonstrate that our model achieves state-of-the-art performance in two relational learning tasks.

Spatial-Temporal Graph ODE Networks for Traffic Flow Forecasting

Zheng Fang,Qingqing Long,Guojie Song,Kunqing Xie

Spatial-temporal forecasting has attracted tremendous attention in a wide range of applications, and traffic flow prediction is a canonical and typical example. The complex and long-range spatial-temporal correlations of traffic flow bring it to a most intractable challenge. Existing works typically utilize shallow graph convolution networks (GNNs) and temporal extracting modules to model spatial and temporal dependencies respectively. However, the representation ability of such models is limited due to: (1) shallow GNNs are incapable to capture long-range spatial correlations, (2) only spatial connections are considered and a mass of semantic connections are ignored, which are of great importance for a comprehensive understanding of traffic networks. To this end, we propose Spatial-Temporal Graph Ordinary Differential Equation Networks (STGODE).1 Specifically, we capture spatial-temporal dynamics through a tensor-based ordinary differential equation (ODE), as a result, deeper networks can be constructed and spatial-temporal features are utilized synchronously. To understand the network more comprehensively, semantical adjacency matrix is considered in our model, and a well-design temporal dialated convolution structure is used to capture long term temporal dependencies. We evaluate our model on multiple real-world traffic datasets and superior performance is achieved over state-of-the-art baselines.

Multiple-Instance Learning from Similar and Dissimilar Bags

Lei Feng,Senlin Shu,Yuzhou Cao,Lue Tao,Hongxin Wei,Tao Xiang,Bo An,Gang Niu

Multiple-instance learning (MIL) is an important weakly supervised binary classification problem, where training instances are arranged in bags, and each bag is assigned a positive or negative label. Most of the previous studies for MIL assume that training bags are fully labeled. However, in some real-world scenarios, it could be difficult to collect fully labeled bags, due to the expensive time and labor consumption of the labeling task. Fortunately, it could be much easier for us to collect similar and dissimilar bags (indicating whether two bags share the same label or not), because we do not need to figure out the underlying label of each bag in this case. Therefore, in this paper, we for the first time investigate MIL from only similar and dissimilar bags. To solve this new MIL problem, we propose a convex formulation to train a bag-level classifier based on empirical risk minimization and theoretically derive a generalization error bound. In addition, we also propose a strong baseline for this new MIL problem, which aims to train an instance-level classifier by minimizing the instance-level empirical risk. Extensive experimental results clearly demonstrate that our proposed baseline works well, while our proposed convex formulation is even better.

Differentiable Pattern Set Mining

Jonas Fischer,Jilles Vreeken

Pattern set mining has been successful in discovering small sets of highly informative and useful patterns from data. To find good models, existing methods heuristically explore the twice-exponential search space over all possible pattern sets in a combinatorial way, by which they are limited to data over at most hundreds of features, as well as likely to get stuck in local minima. Here, we propose a gradient based optimization approach that allows us to efficiently discover high-quality pattern sets from data of millions of rows and hundreds of thousands of features.In particular, we propose a novel type of neural autoencoder called BinaPs, using binary activations and binarizing weights in each forward pass, which are directly interpretable as conjunctive patterns. For training, optimizing a data-sparsity aware reconstruction loss, continuous versions of the weights are learned in small, noisy steps. This formulation provides a link between the discrete search space and continuous optimization, thus allowing for a gradient based strategy to discover sets of high-quality and noise-robust patterns. Through extensive experiments on both synthetic and real world data, we show that BinaPs discovers high quality and noise robust patterns, and unique among all competitors, easily scales to data of supermarket transactions or biological variant calls.

ProgRPGAN: Progressive GAN for Route Planning

Tao-yang Fu,Wang-Chien Lee

Learning to route has received significant research momentum as anew approach for the route planning problem in intelligent transportation systems. By exploring global knowledge of geographical areas and topological structures of road networks to facilitate route planning, in this work, we propose a novel Generative Adversarial Network (GAN) framework, namely Progressive Route Planning GAN (ProgRPGAN), for route planning in road networks. The novelty of ProgRPGAN lies in the following aspects: 1) we propose to plan a route with levels of increasing map resolution, starting on a low-resolution grid map, gradually refining it on higher-resolution grid maps, and eventually on the road network in order to progressively generate various realistic paths; 2) we propose to transfer parameters of the previous-level generator and discriminator to the subsequent generator and discriminator for parameter initialization in order to improve the efficiency and stability in model learning; and 3) we propose to pre-train embeddings of grid cells in grid maps and intersections in the road network by capturing the network topology and external factors to facilitate effective model learn-ing. Empirical result shows that ProgRPGAN soundly outperforms the state-of-the-art learning to route methods, especially for long routes, by 9.46% to 13.02% in F1-measure on multiple large-scale real-world datasets. ProgRPGAN, moreover, effectively generates various realistic routes for the same query.

Probabilistic and Dynamic Molecule-Disease Interaction Modeling for Drug Discovery

Tianfan Fu,Cao Xiao,Cheng Qian,Lucas M. Glass,Jimeng Sun

Drug discovery aims at finding promising drug molecules for treating target diseases. Existing computational drug discovery methods mainly depend on molecule databases, ignoring valuable data collected from clinical trials. In this work, we propose PRIME to leverage high-quality drug molecules and drug-disease relations in historical clinical trials to narrow down the molecular search space in drug discovery. PRIME also introduces time dependency constraints to model evolving drug-disease relations using a probabilistic deep learning model that can quantify model uncertainty. We evaluated PRIME against leading models on both de novo design and drug repurposing tasks. Results show that compared with the best baselines, PRIME achieves 25.9% relative improvement (i.e., reduction) in average hit-ranking on drug repurposing and 47.6% relative improvement in success rate on de novo design.

Efficient Data-specific Model Search for Collaborative Filtering

Chen Gao,Quanming Yao,Depeng Jin,Yong Li

Collaborative filtering (CF), as a fundamental approach for recommender systems, is usually built on the latent factor model with learnable parameters to predict users preferences towards items. However, designing a proper CF model for a given data is not easy, since the properties of datasets are highly diverse. In this paper, motivated by the recent advances in automated machine learning (AutoML), we propose to design a data-specific CF model by AutoML techniques. The key here is a new framework that unifies state-of-the-art (SOTA) CF methods and splits them into disjoint stages of input encoding, embedding function, interaction function, and prediction function. We further develop an easy-to-use, robust, and efficient search strategy, which utilizes random search and a performance predictor for efficient searching within the above framework. In this way, we can combinatorially generalize data-specific CF models, which have not been visited in the literature, from SOTA ones. Extensive experiments on five real-world datasets demonstrate that our method can consistently outperform SOTA ones for various CF tasks. Further experiments verify the rationality of the proposed framework and the efficiency of the search strategy. The searched CF models can also provide insights for exploring more effective methods in the future.

Unsupervised Graph Alignment with Wasserstein Distance Discriminator

Ji Gao,Xiao Huang,Jundong Li

Graph alignment aims to identify node correspondence across multiple graphs, with significant implications in various domains. As supervision information is often not available, unsupervised methods have attracted a surge of research interest recently. Most of existing unsupervised methods assume that corresponding nodes should have similar local structure, which, however, often does not hold. Meanwhile, rich node attributes are often available and have shown to be effective in alleviating the above local topology inconsistency issue. Motivated by the success of graph convolution networks (GCNs) in fusing network and node attributes for various learning tasks, we aim to tackle the graph alignment problem on the basis of GCNs. However, directly grafting GCNs to graph alignment is often infeasible due to multi-faceted challenges. To bridge the gap, we propose a novel unsupervised graph alignment framework WAlign. We first develop a lightweight GCN architecture to capture both local and global graph patterns and their inherent correlations with node attributes. Then we prove that in the embedding space, obtaining optimal alignment results is equivalent to minimizing the Wasserstein distance between embeddings of nodes from different graphs. Towards this, we propose a novel Wasserstein distance discriminator to identify candidate node correspondence pairs for updating node embeddings. The whole process acts like a two-player game, and in the end, we obtain discriminative embeddings that are suitable for the alignment task. Extensive experiments on both synthetic and real-world datasets validate the effectiveness and efficiency of the proposed framework WAlign.

Maxmin-Fair Ranking: Individual Fairness under Group-Fairness Constraints

David Garcu00eda-Soriano,Francesco Bonchi

We study a novel problem of fairness in ranking aimed at minimizing the amount of individual unfairness introduced when enforcing group-fairness constraints. Our proposal is rooted in the distributional maxmin fairness theory, which uses randomization to maximize the expected satisfaction of the worst-off individuals. We devise an exact polynomial-time algorithm to find maxmin-fair distributions of general search problems (including, but not limited to, ranking), and show that our algorithm can produce rankings which, while satisfying the given group-fairness constraints, ensure that the maximum possible value is to individuals.

Boosted Second Price Auctions: Revenue Optimization for Heterogeneous Bidders

Negin Golrezaei,Max Lin,Vahab Mirrokni,Hamid Nazerzadeh

The second price auction has been the prevalent auction format used by advertising exchanges because of its simplicity and desirable incentive properties. However, even with an optimized choice of reserve prices, this auction is not revenue optimal when the bidders are heterogeneous and their valuation distributions differ significantly. In order to optimize the revenue of advertising exchanges, we propose an auction format called the boosted second price auction, which assigns a boost value to each bidder. The auction favors bidders with higher boost values and allocates the item to the bidder with the highest boosted bid. We propose a data-driven approach to optimize boost values using the previous bids of the bidders. Our analysis of auction data from Googles online advertising exchange shows that the boosted second price auction with data-optimized boost values outperforms the second price auction and empirical Myerson auction by up to 6% and 3%, respectively.

Meaning Error Rate: ASR domain-specific metric framework

Ludmila Gordeeva,Vasily Ershov,Oleg Gulyaev,Igor Kuralenok

Speech recognition became a popular task during the last decade. Automatic speech recognition (ASR) systems are used in many fields: virtual assistants, call-center automation, device speech interfaces, etc. Each application defines its own measure of quality. Improvement in one domain could lead to loss of the recognition quality in the other domain. For ASR services open to the public, it is essential to provide reasonable quality for all customers in their scenarios. State-of-the-art metrics currently do not fit well for this purpose as they do not adapt to domain specifics. In our work, we build a speech recognition quality evaluation framework that unifies feedback coming from different types of customers into a single metric. For this purpose, we collect feedback from customers, train a new dedicated metric for each customer based on their feedback, and finally aggregate these metrics in a single criterion of quality. The resulting metrics have two significant properties: they compare recognition quality in different domains, and their results are easy to interpret.

Towards Computing a Near-Maximum Weighted Independent Set on Massive Graphs

Jiewei Gu,Weiguo Zheng,Yuzheng Cai,Peng Peng

The vertices in many graphs are weighted unequally in real scenarios, but the previous studies on the maximum independent set (MIS) ignore the weights of vertices. Therefore, the weight of an MIS may not necessarily be the largest. In this paper, we study the problem of maximum weighted independent set (MWIS) that is defined as the set of independent vertices with the largest weight. Since it is intractable to deliver the exact solution for large graphs, we design a reducing and tie-breaking framework to compute a near-maximum weighted independent set. The reduction rules are critical to reduce the search space for both exact and greedy algorithms as they determine the vertices that are definitely (or not) in the MWIS while preserving the correctness of solutions. We devise a set of novel reductions including low-degree reductions and high-degree reductions for general weighted graphs. Extensive experimental studies over real graphs confirm that our proposed method outperforms the state-of-the-arts significantly in terms of both effectiveness and efficiency.

UCPhrase: Unsupervised Context-aware Quality Phrase Tagging

Xiaotao Gu,Zihan Wang,Zhenyu Bi,Yu Meng,Liyuan Liu,Jiawei Han,Jingbo Shang

Identifying and understanding quality phrases from context is a fundamental task in text mining. The most challenging part of this task arguably lies in uncommon, emerging, and domain-specific phrases. The infrequent nature of these phrases significantly hurts the performance of phrase mining methods that rely on sufficient phrase occurrences in the input corpus. Context-aware tagging models, though not restricted by frequency, heavily rely on domain experts for either massive sentence-level gold labels or handcrafted gazetteers. In this work, we propose UCPhrase, a novel unsupervised context-aware quality phrase tagger. Specifically, we induce high-quality phrase spans as silver labels from consistently co-occurring word sequences within each document. Compared with typical context-agnostic distant supervision based on existing knowledge bases (KBs), our silver labels root deeply in the input domain and context, thus having unique advantages in preserving contextual completeness and capturing emerging, out-of-KB phrases. Training a conventional neural tagger based on silver labels usually faces the risk of overfitting phrase surface names. Alternatively, we observe that the contextualized attention maps generated from a transformer-based neural language model effectively reveal the connections between words in a surface-agnostic way. Therefore, we pair such attention maps with the silver labels to train a lightweight span prediction model, which can be applied to new input to recognize (unseen) quality phrases regardless of their surface names or frequency. Thorough experiments on various tasks and datasets, including corpus-level phrase ranking, document-level keyphrase extraction, and sentence-level phrase tagging, demonstrate the superiority of our design over state-of-the-art pre-trained, unsupervised, and distantly supervised methods.

Dual Graph enhanced Embedding Neural Network for CTR Prediction

Wei Guo,Rong Su,Renhao Tan,Huifeng Guo,Yingxue Zhang,Zhirong Liu,Ruiming Tang,Xiuqiang He

CTR prediction, which aims to estimate the probability that a user will click an item, plays a crucial role in online advertising and recommender system. Feature interaction modeling based and user interest mining based methods are the two kinds of most popular techniques that have been extensively explored for many years and have made great progress for CTR prediction. However, (1) feature interaction based methods which rely heavily on the co-occurrence of different features, may suffer from the feature sparsity problem (i.e., many features appear few times); (2) user interest mining based methods which need rich user behaviors to obtain users diverse interests, are easy to encounter the behavior sparsity problem (i.e., many users have very short behavior sequences). To solve these problems, we propose a novel module named Dual Graph enhanced Embedding, which is compatible with various CTR prediction models to alleviate these two problems. We further propose a Dual Graph enhanced Embedding Neural Network(DG-ENN) for CTR prediction. Dual Graph enhanced Embedding exploits the strengths of graph representation with two carefully designed learning strategies (divide-and-conquer, curriculum-learning-inspired organized learning) to refine the embedding. We conduct comprehensive experiments on three real-world industrial datasets. The experimental results show that our proposed DG-ENN significantly outperforms state-of-the-art CTR prediction models. Moreover, when applying to state-of-the-art CTR prediction models, Dual graph enhanced embedding always obtains better performance. Further case studies prove that our proposed dual graph enhanced embedding could alleviate the feature sparsity and behavior sparsity problems. Our framework will be open-source based on MindSpore in the near future.

Deep Generative Models for Spatial Networks

Xiaojie Guo,Yuanqi Du,Liang Zhao

Spatial networks represent crucial data structures where the nodes and edges are embedded in a geometric space. Nowadays, spatial network data is becoming increasingly popular and important, ranging from microscale (e.g., protein structures), to middle-scale (e.g., biological neural networks), to macro-scale (e.g., mobility networks). Although, modeling and understanding the generative process of spatial networks are very important, they remain largely under-explored due to the significant challenges in automatically modeling and distinguishing the independency and correlation among various spatial and network factors. To address these challenges, we first propose a novel objective for joint spatial-network disentanglement from the perspective of information bottleneck as well as a novel optimization algorithm to optimize the intractable objective. Based on this, a spatial-network variational autoencoder (SND-VAE) with a new spatial-network message passing neural network (S-MPNN) is proposed to discover the independent and dependent latent factors of spatial and networks. Qualitative and quantitative experiments on both synthetic and real-world datasets demonstrate the superiority of the proposed model over the state-of-the-arts by up to 66.9% for graph generation and 37.3% for interpretability.

Subset Node Representation Learning over Large Dynamic Graphs

Xingzhi Guo,Baojian Zhou,Steven Skiena

Dynamic graph representation learning is a task to learn node embeddings over dynamic networks, and has many important applications, including knowledge graphs, citation networks to social networks. Graphs of this type are usually large-scale but only a small subset of vertices are related in downstream tasks. Current methods are too expensive to this setting as the complexity is at best linear-dependent on both the number of nodes and edges.In this paper, we propose a new method, namely Dynamic Personalized PageRank Embedding (DynamicPPE) for learning a target subset of node representations over large-scale dynamic networks. Based on recent advances in local node embedding and a novel computation of dynamic personalized PageRank vector (PPV), DynamicPPE has two key ingredients: 1) the per-PPV complexity is O (m d / u03b5) where m, d, and u03b5 are the number of edges received, average degree, global precision error respectively. Thus, the per-edge event update of a single node is only dependent on d in average; and 2) by using these high quality PPVs and hash kernels, the learned embeddings have properties of both locality and global consistency. These two make it possible to capture the evolution of graph structure effectively.Experimental results demonstrate both the effectiveness and efficiency of the proposed method over large-scale dynamic networks. We apply DynamicPPE to capture the embedding change of Chinese cities in the Wikipedia graph during this ongoing COVID-19 pandemic. https://en.wikipedia.org/wiki/COVID-19_pandemic. Our results show that these representations successfully encode the dynamics of the Wikipedia graph.

Generalized Zero-Shot Extreme Multi-label Learning

Nilesh Gupta,Sakina Bohra,Yashoteja Prabhu,Saurabh Purohit,Manik Varma

Extreme Multi-label Learning (XML) involves assigning the subset of most relevant labels to a data point from millions of label choices. A hitherto unaddressed challenge in XML is that of predicting unseen labels with no training points. These form a significant fraction of total labels and contain fresh and personalized information desired by end users. Most existing extreme classifiers are not equipped for zero-shot label prediction and hence fail to leverage unseen labels. As a remedy, this paper proposes a novel approach called ZestXML for the task of Generalized Zero-shot XML (GZXML) where relevant labels have to be chosen from all available seen and unseen labels. ZestXML learns to project a data points features close to the features of its relevant labels through a highly sparsified linear transform. This L0-constrained linear map between the two high-dimensional feature vectors is tractably recovered through a novel optimizer based on Hard Thresholding. By effectively leveraging the sparsities in features, labels and the learnt model, ZestXML achieves higher accuracy and smaller model size than existing XML approaches while also promoting efficient training & prediction, real-time label update as well as explainable prediction.Experiments on large-scale GZXML datasets demonstrated that ZestXML can be up to 14% and 10% more accurate than state-of-the-art extreme classifiers and leading BERT-based dense retrievers respectively, while having 10x smaller model size. ZestXML trains on largest dataset with 31M labels in just 30 hours on a single core of a commodity desktop. When added to an large ensemble of existing models in Bing Sponsored Search Advertising, ZestXML significantly improved click yield of IR based system by 17% and unseen query coverage by 3.4% respectively. ZestXMLs source code and benchmark datasets for GZXML will be publically released for research purposes here.

Graph Summarization with Controlled Utility Loss

Mahdi Hajiabadi,Jasbir Singh,Venkatesh Srinivasan,Alex Thomo

We present new algorithms for graph summarization where the loss in utility is fully controllable by the user. Specifically, we make three key contributions. First, we present a utility-driven graph summarization method G-SCIS, based on a clique and independent set decomposition, that produces optimal compression with zero loss of utility. The compression provided is significantly better than state-of-the-art in lossless graph summarization, while the runtime is two orders of magnitude lower. Second, we propose a highly scalable, utility-driven algorithm, T-BUDS, for fully controlled lossy summarization. It achieves high scalability by combining memory reduction using Maximum Spanning Tree with a novel binary search procedure. T-BUDS outperforms state-of-the-art drastically in terms of the quality of summarization and is about two orders of magnitude better in terms of speed. In contrast to the competition, we are able to handle web-scale graphs in a single machine without performance impediment as the utility threshold (and size of summary) decreases. Third, we show that our graph summaries can be used as-is to answer several important classes of queries, such as triangle enumeration, Pagerank and shortest paths.

Dynamic and Multi-faceted Spatio-temporal Deep Learning for Traffic Speed Forecasting

Liangzhe Han,Bowen Du,Leilei Sun,Yanjie Fu,Yisheng Lv,Hui Xiong

Dynamic Graph Neural Networks (DGNNs) have become one of the most promising methods for traffic speed forecasting. However, when adapting DGNNs for traffic speed forecasting, existing approaches are usually built on a static adjacency matrix (no matter predefined or self-learned) to learn spatial relationships among different road segments, even if the impact of two road segments can be changeable dynamically during a day. Moreover, the future traffic speed cannot only be related with the current traffic speed, but also be affected by other factors such as traffic volumes. To this end, in this paper, we aim to explore these dynamic and multi-faceted spatio-temporal characteristics inherent in traffic data for further unleashing the power of DGNNs for better traffic speed forecasting. Specifically, we design a dynamic graph construction method to learn the time-specific spatial dependencies of road segments. Then, a dynamic graph convolution module is proposed to aggregate hidden states of neighbor nodes to focal nodes by message passing on the dynamic adjacency matrices. Moreover, a multi-faceted fusion module is provided to incorporate the auxiliary hidden states learned from traffic volumes with the primary hidden states learned from traffic speeds. Finally, experimental results on real-world data demonstrate that our method can not only achieve the state-of-the-art prediction performances, but also obtain the explicit and interpretable dynamic spatial relationships of road segments.

A Graph-based Approach for Trajectory Similarity Computation in Spatial Networks

Peng Han,Jin Wang,Di Yao,Shuo Shang,Xiangliang Zhang

Trajectory similarity computation is an essential operation in many applications of spatial data analysis. In this paper, we study the problem of trajectory similarity computation over spatial network, where the real distances between objects are reflected by the network distance. Unlike previous studies which learn the representation of trajectories in Euclidean space, it requires to capture not only the sequence information of the trajectory but also the structure of spatial network. To this end, we propose GTS, a brand new framework that can jointly learn both factors so as to accurately compute the similarity. It first learns the representation of each point-of-interest (POI) in the road network along with the trajectory information. This is realized by incorporating the distances between POIs and trajectory in the random walk over the spatial network as well as the loss function. Then the trajectory representation is learned by a Graph Neural Network model to identify neighboring POIs within the same trajectory, together with an LSTM model to capture the sequence information in the trajectory. We conduct comprehensive evaluation on several real world datasets. The experimental results demonstrate that our model substantially outperforms all existing approaches.

Adaptive Transfer Learning on Graph Neural Networks

Xueting Han,Zhenhuan Huang,Bang An,Jing Bai

Graph neural networks (GNNs) is widely used to learn a powerful representation of graph-structured data. Recent work demonstrates that transferring knowledge from self-supervised tasks to downstream tasks could further improve graph representation. However, there is an inherent gap between self-supervised tasks and downstream tasks in terms of optimization objective and training data. Conventional pre-training methods may be not effective enough on knowledge transfer since they do not make any adaptation for downstream tasks. To solve such problems, we propose a new transfer learning paradigm on GNNs which could effectively leverage self-supervised tasks as auxiliary tasks to help the target task. Our methods would adaptively select and combine different auxiliary tasks with the target task in the fine-tuning stage. We design an adaptive auxiliary loss weighting model to learn the weights of auxiliary tasks by quantifying the consistency between auxiliary tasks and the target task. In addition, we learn the weighting model through meta-learning. Our methods can be applied to various transfer learning approaches, it performs well not only in multi-task learning but also in pre-training and fine-tuning. Comprehensive experiments on multiple downstream tasks demonstrate that the proposed methods can effectively combine auxiliary tasks with the target task and significantly improve the performance compared to state-of-the-art methods.

Pruning-Aware Merging for Efficient Multitask Inference

Xiaoxi He,Dawei Gao,Zimu Zhou,Yongxin Tong,Lothar Thiele

Many mobile applications demand selective execution of multiple correlated deep learning inference tasks on resource-constrained platforms. Given a set of deep neural networks, each pre-trained for a single task, it is desired that executing arbitrary combinations of tasks yields minimal computation cost. Pruning each network separately yields suboptimal computation cost due to task relatedness. A promising remedy is to merge the networks into a multitask network to eliminate redundancy across tasks before network pruning. However, pruning a multitask network combined by existing network merging schemes cannot minimise the computation cost of every task combination because they do not consider such a future pruning. To this end, we theoretically identify the conditions such that pruning a multitask network minimises the computation of all task combinations. On this basis, we propose Pruning-Aware Merging (PAM), a heuristic network merging scheme to construct a multitask network that approximates these conditions. The merged network is then ready to be further pruned by existing network pruning methods. Evaluations with different pruning schemes, datasets, and network architectures show that PAM achieves up to 4.87x less computation against the baseline without network merging, and up to 2.01x less computation against the baseline with a state-of-the-art network merging scheme.

DARING: Differentiable Causal Discovery with Residual Independence

Yue He,Peng Cui,Zheyan Shen,Renzhe Xu,Furui Liu,Yong Jiang

Discovering causal structure among a set of variables is a crucial task in various scientific and industrial scenarios. Given finite i.i.d. samples from a joint distribution, causal discovery is a challenging combinatorial problem in nature. The recent development in functional causal models, especially the NOTEARS provides a differentiable optimization framework for causal discovery. They formulate the structure learning problem as a task of maximum likelihood estimation over observational data (i.e., variable reconstruction) with specified structural constraints such as acyclicity and sparsity. Despite its success in terms of scalability, we find that optimizing the objectives of these differentiable methods is not always consistent with the correctness of learned causal graph especially when the variables carry heterogeneous noises (i.e., different noise types and noise variances) in real data from wild environments. In this paper, we provide the justification that their proneness to erroneous structures is mainly caused by the over-reconstruction problem, i.e., the noises of variables are absorbed into the variable reconstruction process, leading to the dependency among variable reconstruction residuals, and thus raise structure identifiability problems according to FCM theories. To remedy this, we propose a novel differentiable method DARING by imposing explicit residual independence constraint in an adversarial way. Extensive experimental results on both simulation and real data show that our proposed method is insensitive to the heterogeneity of external noise, and thus can significantly improve the causal discovery performances.

PcDGAN: A Continuous Conditional Diverse Generative Adversarial Network For Inverse Design

Amin Heyrani Nobari,Wei Chen,Faez Ahmed

Engineering design tasks often require synthesizing new designs that meet desired performance requirements. The conventional design process, which requires iterative optimization and performance evaluation, is slow and dependent on initial designs. Past work has used conditional generative adversarial networks (cGANs) to enable direct design synthesis for given target performances. However, most existing cGANs are restricted to categorical conditions. Recent work on Continuous conditional GAN (CcGAN) tries to address this problem, but still faces two challenges: 1) it performs poorly on non-uniform performance distributions, and 2) the generated designs may not cover the entire design space. We propose a new model, named Performance Conditioned Diverse Generative Adversarial Network (PcDGAN), which introduces a singular vicinal loss combined with a Determinantal Point Processes (DPP) based loss function to enhance diversity. PcDGAN uses a new self-reinforcing score called the Lambert Log Exponential Transition Score (LLETS) for improved conditioning. Experiments on synthetic problems and a real-world airfoil design problem demonstrate that PcDGAN outperforms state-of-the-art GAN models and improves the conditioning likelihood by 69% in an airfoil generation task and up to 78% in synthetic conditional generation tasks and achieves greater design space coverage. The proposed method enables efficient design synthesis and design space exploration with applications ranging from CAD model generation to metamaterial selection.

Federated Adversarial Debiasing for Fair and Transferable Representations

Junyuan Hong,Zhuangdi Zhu,Shuyang Yu,Zhangyang Wang,Hiroko H. Dodge,Jiayu Zhou

Federated learning is a distributed learning framework that is communication efficient and provides protection over participating users raw training data. One outstanding challenge of federate learning comes from the users heterogeneity, and learning from such data may yield biased and unfair models for minority groups. While adversarial learning is commonly used in centralized learning for mitigating bias, there are significant barriers when extending it to the federated framework. In this work, we study these barriers and address them by proposing a novel approach Federated Adversarial DEbiasing (FADE). FADE does not require users sensitive group information for debiasing and offers users the freedom to opt-out from the adversarial component when privacy or computational costs become a concern. We show that ideally, FADE can attain the same global optimality as the one by the centralized algorithm. We then analyze when its convergence may fail in practice and propose a simple yet effective method to address the problem. Finally, we demonstrate the effectiveness of the proposed framework through extensive empirical studies, including the problem settings of unsupervised domain adaptation and fair learning. Our codes and pretrained models are available at: https://github.com/illidanlab/FADE.

Uncertainty-Aware Reliable Text Classification

Yibo Hu,Latifur Khan

Deep neural networks have significantly contributed to the success in predictive accuracy for classification tasks. However, they tend to make over-confident predictions in real-world settings, where domain shifting and out-of-distribution (OOD) examples exist. Most research on uncertainty estimation focuses on computer vision because it provides visual validation on uncertainty quality. However, few have been presented in the natural language process domain. Unlike Bayesian methods that indirectly infer uncertainty through weight uncertainties, current evidential uncertainty-based methods explicitly model the uncertainty of class probabilities through subjective opinions. They further consider inherent uncertainty in data with different root causes, vacuity (i.e., uncertainty due to a lack of evidence) and dissonance (i.e., uncertainty due to conflicting evidence). In our paper, we firstly apply evidential uncertainty in OOD detection for text classification tasks. We propose an inexpensive framework that adopts both auxiliary outliers and pseudo off-manifold samples to train the model with prior knowledge of a certain class, which has high vacuity for OOD samples. Extensive empirical experiments demonstrate that our model based on evidential uncertainty outperforms other counterparts for detecting OOD examples. Our approach can be easily deployed to traditional recurrent neural networks and fine-tuned pre-trained transformers.

HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem

Yun Hua,Xiangfeng Wang,Bo Jin,Wenhao Li,Junchi Yan,Xiaofeng He,Hongyuan Zha

In spite of the success of existing meta reinforcement learning methods, they still have difficulty in learning a meta policy effectively for RL problems with sparse reward. In this respect, we develop a novel meta reinforcement learning framework called Hyper-Meta RL(HMRL), for sparse reward RL problems. It is consisted with three modules including the cross-environment meta state embedding module which constructs a common meta state space to adapt to different environments; the meta state based environment-specific meta reward shaping which effectively extends the original sparse reward trajectory by cross-environmental knowledge complementarity and as a consequence the meta policy achieves better generalization and efficiency with the shaped meta reward. Experiments with sparse-reward environments show the superiority of HMRL on both transferability and policy learning efficiency.

Representation Learning on Knowledge Graphs for Node Importance Estimation

Han Huang,Leilei Sun,Bowen Du,Chuanren Liu,Weifeng Lv,Hui Xiong

In knowledge graphs, there are usually different types of nodes, multiple heterogeneous relations, and numerous attributes of nodes and edges, which impose the challenges on the task of Node Importance Estimation (NIE). Indeed, existing NIE approaches, such as PageRank (PR) and Node-Degree (ND), are not designed for handling knowledge graphs with the rich information related with these multifarious nodes and edges. To this end, in this paper, we propose a representation learning framework to leverage the rich information inherent in these multifarious nodes and edges for improving node importance estimation in knowledge graphs. Specifically, we provide a Relational Graph Transformer Network (RGTN), where a relational graph transformer is first proposed to propagate node information with the consideration of semantic predicate representations. Here, the assumption is that different predicates may have distinct effects on the transmission of node importance. Then, two separate encoders are designed to capture both the structural and semantic information of nodes respectively, and a co-attention module is developed to fuse the two separate representations of nodes. Next, an attention-based aggregation module is adopted to map the representations of nodes to their importance values. In addition, a learning-to-rank loss is designed to ensure that the learned representations can be aware of the relative ranking information among nodes. Finally, extensive experiments have been conducted on real-world knowledge graphs, and the results illustrate that our model outperforms the existing methods consistently for all the evaluation metrics. The code and the data are available at https://github.com/GRAPH-0/RGTN-NIE.

Metric Learning via Penalized Optimization

Hao Huang,Yanan Peng,Ting Gan,Weiping Tu,Ruiting Zhou,Sai Wu

Metric learning aims to project original data into a new space, where data points can be classified more accurately using kNN or similar types of classification algorithms. To avoid trivial learning results such as indistinguishably projecting the data onto a line, many existing approaches formulate metric learning as a constrained optimization problem, like finding a metric that minimizes the distance between data points from the same class, with a constraint of ensuring a certain separation for data points from different classes, and then they approximate the optimal solution to the constrained optimization in an iterative way. In order to improve the classification accuracy as much as possible, we try to find a metric that is able to minimize the intra-class distance and maximize the inter-class distance simultaneously. Towards this, we formulate metric learning as a penalized optimization problem, and provide design guideline, paradigms with a general formula, as well as two representative instantiations for the penalty term. In addition, we provide an analytical solution for the penalized optimization, with which costly computation can be avoid, and more importantly, there is no need to worry about the convergence rates or approximation ratios any more. Extensive experiments on real-world data sets are conducted, and the results verify the effectiveness and efficiency of our approach.

MixGCF: An Improved Training Method for Graph Neural Network-based Recommender Systems

Tinglin Huang,Yuxiao Dong,Ming Ding,Zhen Yang,Wenzheng Feng,Xinyu Wang,Jie Tang

Graph neural networks (GNNs) have recently emerged as state-of-the-art collaborative filtering (CF) solution. A fundamental challenge of CF is to distill negative signals from the implicit feedback, but negative sampling in GNN-based CF has been largely unexplored. In this work, we propose to study negative sampling by leveraging both the user-item graph structure and GNNs aggregation process. We present the MixGCF method---a general negative sampling plugin that can be directly used to train GNN-based recommender systems. In MixGCF, rather than sampling raw negatives from data, we design the hop mixing technique to synthesize hard negatives. Specifically, the idea of hop mixing is to generate the synthetic negative by aggregating embeddings from different layers of raw negatives neighborhoods. The layer and neighborhood selection process are optimized by a theoretically-backed hard selection strategy. Extensive experiments demonstrate that by using MixGCF, state-of-the-art GNN-based recommendation models can be consistently and significantly improved, e.g., 26% for NGCF and 22% for LightGCN in terms of NDCG@20.

Scaling Up Graph Neural Networks Via Graph Coarsening

Zengfeng Huang,Shengzhong Zhang,Chong Xi,Tang Liu,Min Zhou

Scalability of graph neural networks remains one of the major challenges in graph machine learning. Since the representation of a node is computed by recursively aggregating and transforming representation vectors of its neighboring nodes from previous layers, the receptive fields grow exponentially, which makes standard stochastic optimization techniques ineffective. Various approaches have been proposed to alleviate this issue, e.g., sampling-based methods and techniques based on pre-computation of graph filters.In this paper, we take a different approach and propose to use graph coarsening for scalable training of GNNs, which is generic, extremely simple and has sublinear memory and time costs during training. We present extensive theoretical analysis on the effect of using coarsening operations and provides useful guidance on the choice of coarsening methods. Interestingly, our theoretical analysis shows that coarsening can also be considered as a type of regularization and may improve the generalization. Finally, empirical results on real world datasets show that, simply applying off-the-shelf coarsening methods, we can reduce the number of nodes by up to a factor of ten without causing a noticeable downgrade in classification accuracy.

A Broader Picture of Random-walk Based Graph Embedding

Zexi Huang,Arlei Silva,Ambuj Singh

Graph embedding based on random-walks supports effective solutions for many graph-related downstream tasks. However, the abundance of embedding literature has made it increasingly difficult to compare existing methods and to identify opportunities to advance the state-of-the-art. Meanwhile, existing work has left several fundamental questions---such as how embeddings capture different structural scales and how they should be applied for effective link prediction---unanswered. This paper addresses these challenges with an analytical framework for random-walk based graph embedding that consists of three components: a random-walk process, a similarity function, and an embedding algorithm. Our framework not only categorizes many existing approaches but naturally motivates new ones. With it, we illustrate novel ways to incorporate embeddings at multiple scales to improve downstream task performance. We also show that embeddings based on autocovariance similarity, when paired with dot product ranking for link prediction, outperform state-of-the-art methods based on Pointwise Mutual Information similarity by up to 100%.

DisenQNet: Disentangled Representation Learning for Educational Questions

Zhenya Huang,Xin Lin,Hao Wang,Qi Liu,Enhong Chen,Jianhui Ma,Yu Su,Wei Tong

Learning informative representations for educational questions is a fundamental problem in online learning systems, which can promote many applications, e.g., difficulty estimation. Most solutions integrate all information of one question together following a supervised manner, where the representation results are unsatisfactory sometimes due to the following issues. First, they cannot ensure the presentation ability due to the scarcity of labeled data. Then, the label-dependent representation results have poor feasibility to be transferred. Moreover, aggregating all information into the unified may introduce some noises in applications since it cannot distinguish the diverse characteristics of questions. In this paper, we aim to learn the disentangled representations of questions. We propose a novel unsupervised model, namely DisenQNet, to divide one question into two parts, i.e., a concept representation that captures its explicit concept meaning and an individual representation that preserves its personal characteristics. We achieve this goal via mutual information estimation by proposing three self-supervised estimators in a large unlabeled question corpus. Then, we propose another enhanced model, DisenQNet+, that transfers the representation knowledge from unlabeled questions to labeled questions in specific applications by maximizing the mutual information between both. Extensive experiments on real-world datasets demonstrate that DisenQNet can generate effective and meaningful disentangled representations for questions, and furthermore, DisenQNet+ can improve the performance of different applications.

Coupled Graph ODE for Learning Interacting System Dynamics

Zijie Huang,Yizhou Sun,Wei Wang

Many real-world systems such as social networks and moving planets are dynamic in nature, where a set of coupled objects are connected via the interaction graph and exhibit complex behavior along the time. For example, the COVID-19 pandemic can be considered as a dynamical system, where objects represent geographical locations (e.g., states) whose daily confirmed cases of infection evolve over time. Outbreak at one location may influence another location as people travel between these locations, forming a graph. Thus, how to model and predict the complex dynamics for these systems becomes a critical research problem. Existing work on modeling graph-structured data mostly assumes a static setting. How to handle dynamic graphs remains to be further explored. On one hand, features of objects change over time, influenced by the linked objects in the interaction graph. On the other hand, the graph itself can also evolve, where new interactions (links) may form and existing links may drop, which may in turn be affected by the dynamic features of objects. In this paper, we propose coupled graph ODE: a novel latent ordinary differential equation (ODE) generative model that learns the coupled dynamics of nodes and edges with a graph neural network (GNN) based ODE in a continuous manner. Our model consists of two coupled ODE functions for modeling the dynamics of edges and nodes based on their latent representations respectively. It employs a novel encoder parameterized by a GNN for inferring the initial states from historical data, which serves as the starting point of the predicted latent trajectories. Experiment results on the COVID-19 dataset and the simulated social network dataset demonstrate the effectiveness of our proposed method.

TrajNet: A Trajectory-Based Deep Learning Model for Traffic Prediction

Bo Hui,Da Yan,Haiquan Chen,Wei-Shinn Ku

Ridesharing companies such as Ube and DiDi provide ride-hailing services where passengers and drivers are matched via mobile apps. As a result, large amounts of vehicle trajectories and vehicle speed data are collected that can be used for traffic prediction. The recent popularity of graph convolutional networks (GCNs) has opened up new possibilities for real-time traffic prediction and many GCN-based models have been proposed to capture the spatial correlation on the urban road network. However, the graph-based approaches fail to capture the intricate dependencies of consecutive road segments that are well captured by trajectories.Instead of proposing yet another GCN-based model for traffic prediction, we propose a novel deep learning model that treats vehicle trajectories as first-class citizens. Our model, called TrajNet, captures the spatial dependency of traffic flow by propagating information along real trajectories. To improve training efficiency, we organize the multiple trajectories in a batch used for training with a trie structure, to reuse shared computation. TrajNet uses a spatial attention mechanism to adaptively capture the dynamic correlations between different road segments, and dilated causal convolution to capture long-range temporal dependency. We also resolve the inconsistency between the fine-grained road segment coverage by trajectories, and the ground-truth traffic data that are coarse-grained, following a trajectory-based refinement framework. Extensive experiments on real traffic datasets validate the performance superiority of TrajNet over the state-of-the-art GCN-based models.

Fast and Memory-Efficient Tucker Decomposition for Answering Diverse Time Range Queries

Jun-Gi Jang,U Kang

Given a temporal dense tensor and an arbitrary time range, how can we efficiently obtain latent factors in the range? Tucker decomposition is a fundamental tool for analyzing dense tensors to discover hidden factors, and has been exploited in many data mining applications. However, existing decomposition methods do not provide the functionality to analyze a specific range of a temporal tensor. The existing methods are one-off, with the main focus on performing Tucker decomposition once for a whole input tensor. Although a few existing methods with a preprocessing phase can deal with a time range query, they are still time-consuming and suffer from low accuracy. In this paper, we propose Zoom-Tucker, a fast and memory-efficient Tucker decomposition method for finding hidden factors of temporal tensor data in an arbitrary time range. Zoom-Tucker fully exploits block structure to compress a given tensor, supporting an efficient query and capturing local information. Zoom-Tucker answers diverse time range queries quickly and memory-efficiently, by elaborately decoupling the preprocessed results included in the range and carefully determining the order of computations. We demonstrate that Zoom-Tucker is up to 171.9x faster and requires up to 230x less space than existing methods while providing comparable accuracy.

ACE-NODE: Attentive Co-Evolving Neural Ordinary Differential Equations

Sheo Yon Jhin,Minju Jo,Taeyong Kong,Jinsung Jeon,Noseong Park

Neural ordinary differential equations (NODEs) presented a new paradigm to construct (continuous-time) neural networks. While showing several good characteristics in terms of the number of parameters and the flexibility in constructing neural networks, they also have a couple of well-known limitations: i) theoretically NODEs learn homeomorphic mapping functions only, and ii) sometimes NODEs show numerical instability in solving integral problems. To handle this, many enhancements have been proposed. To our knowledge, however, integrating attention into NODEs has been overlooked for a while. To this end, we present a novel method of attentive dual co-evolving NODE (ACE-NODE): one main NODE for a downstream machine learning task and the other for providing attention to the main NODE. Our ACE-NODE supports both pairwise and elementwise attention. In our experiments, our method outperforms existing NODE-based and non-NODE-based baselines in almost all cases by non-trivial margins.

Cross-Network Learning with Partially Aligned Graph Convolutional Networks

Meng Jiang

Graph neural networks have been widely used for learning representations of nodes for many downstream tasks on graph data. Existing models were designed for the nodes on a single graph, which would not be able to utilize information across multiple graphs. The real world does have multiple graphs where the nodes are often partially aligned. For examples, knowledge graphs share a number of named entities though they may have different relation schema; collaboration networks on publications and awarded projects share some researcher nodes who are authors and investigators, respectively; people use multiple web services, shopping, tweeting, rating movies, and some may register the same email account across the platforms. In this paper, I propose partially aligned graph convolutional networks to learn node representations across the models. I investigate multiple methods (including model sharing, regularization, and alignment reconstruction) as well as theoretical analysis to positively transfer knowledge across the (small) set of partially aligned nodes. Extensive experiments on real-world knowledge graphs and collaboration networks show the superior performance of our proposed methods on relation classification and link prediction.

Pre-training on Large-Scale Heterogeneous Graph

Xunqiang Jiang,Tianrui Jia,Yuan Fang,Chuan Shi,Zhe Lin,Hui Wang

Graph neural networks (GNNs) emerge as the state-of-the-art representation learning methods on graphs and often rely on a large amount of labeled data to achieve satisfactory performance. Recently, in order to relieve the label scarcity issues, some works propose to pre-train GNNs in a self-supervised manner by distilling transferable knowledge from the unlabeled graph structures. Unfortunately, these pre-training frameworks mainly target at homogeneous graphs, while real interaction systems usually constitute large-scale heterogeneous graphs, containing different types of nodes and edges, which leads to new challenges on structure heterogeneity and scalability for graph pre-training. In this paper, we first study the problem of pre-training on large-scale heterogeneous graph and propose a novel pre-training GNN framework, named PT-HGNN. The proposed PT-HGNN designs both the node- and schema-level pre-training tasks to contrastively preserve heterogeneous semantic and structural properties as a form of transferable knowledge for various downstream tasks. In addition, a relationbased personalized PageRank is proposed to sparsify large-scale heterogeneous graph for efficient pre-training. Extensive experiments on one of the largest public heterogeneous graphs (OAG) demonstrate that our PT-HGNN significantly outperforms various state-of-the-art baselines.

Learning to Walk across Time for Interpretable Temporal Knowledge Graph Completion

Jaehun Jung,Jinhong Jung,U Kang

Static knowledge graphs (KGs), despite their wide usage in relational reasoning and downstream tasks, fall short of realistic modeling of knowledge and facts that are only temporarily valid. Compared to static knowledge graphs, temporal knowledge graphs (TKGs) inherently reflect the transient nature of real-world knowledge. Naturally, automatic TKG completion has drawn much research interests for a more realistic modeling of relational reasoning. However, most of the existing models for TKG completion extend static KG embeddings that do not fully exploit TKG structure, thus lacking in 1) accounting for temporally relevant events already residing in the local neighborhood of a query, and 2) path-based inference that facilitates multi-hop reasoning and better interpretability. In this paper, we propose T-GAP, a novel model for TKG completion that maximally utilizes both temporal information and graph structure in its encoder and decoder. T-GAP encodes query-specific substructure of TKG by focusing on the temporal displacement between each event and the query timestamp, and performs path-based inference by propagating attention through the graph. Our empirical experiments demonstrate that T-GAP not only achieves superior performance against state-of-the-art baselines, but also competently generalizes to queries with unseen timestamps. Through extensive qualitative analyses, we also show that T-GAP enjoys transparent interpretability, and follows human intuition in its reasoning process.

A Hyper-surface Arrangement Model of Ranking Distributions

Shizuo Kaji,Akira Horiguchi,Takuro Abe,Yohsuke Watanabe

A distribution on the permutations over a fixed finite set is called a ranking distribution. Modelling ranking distributions is one of the major topics in preference learning as such distributions appear as the ranking data produced by many judges. In this paper, we propose a geometric model for ranking distributions. Our idea is to use hyper-surface arrangements in a metric space as the representation space, where each component cut out by hyper-surfaces corresponds to a total ordering, and its volume is proportional to the probability. In this setting, the union of components corresponds to a partial ordering and its probability is also estimated by the volume. Similarly, the probability of a partial ordering conditioned by another partial ordering is estimated by the ratio of volumes. We provide a simple iterative algorithm to fit our model to a given dataset. We show our model can represent the distribution of a real-world dataset faithfully and can be used for prediction and visualisation purposes.

Topology Distillation for Recommender System

SeongKu Kang,Junyoung Hwang,Wonbin Kweon,Hwanjo Yu

Recommender Systems (RS) have employed knowledge distillation which is a model compression technique training a compact student model with the knowledge transferred from a pre-trained large teacher model. Recent work has shown that transferring knowledge from the teachers intermediate layer significantly improves the recommendation quality of the student. However, they transfer the knowledge of individual representation point-wise and thus have a limitation in that primary information of RS lies in the relations in the representation space. This paper proposes a new topology distillation approach that guides the student by transferring the topological structure built upon the relations in the teacher space. We first observe that simply making the student learn the whole topological structure is not always effective and even degrades the students performance. We demonstrate that because the capacity of the student is highly limited compared to that of the teacher, learning the whole topological structure is daunting for the student. To address this issue, we propose a novel method named Hierarchical Topology Distillation (HTD) which distills the topology hierarchically to cope with the large capacity gap. Our extensive experiments on real-world datasets show that the proposed method significantly outperforms the state-of-the-art competitors. We also provide in-depth analyses to ascertain the benefit of distilling the topology for RS.

Learning to Embed Categorical Features without Embedding Tables for Recommendation

Wang-Cheng Kang,Derek Zhiyuan Cheng,Tiansheng Yao,Xinyang Yi,Ting Chen,Lichan Hong,Ed H. Chi

Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models. The standard approach creates an embedding table where each row represents a dedicated embedding vector for every unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g. new video ID) that are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework Deep Hash Embedding (DHE), replacing embedding tables by a deep embedding network to compute embeddings on the fly. DHE first encodes the feature value to a unique identifier vector with multiple hashing functions and transformations, and then applies a DNN to convert the identifier vector to an embedding. The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during the training time to learn embedding generation. Empirical results show that DHE achieves comparable AUC against the standard one-hot full embedding, with smaller model sizes. Our work sheds light on the design of DNN-based alternative embedding schemes for categorical features without using embedding table lookup.

Joint Graph Embedding and Alignment with Spectral Pivot

Paris A. Karakasis,Aritra Konar,Nicholas D. Sidiropoulos

Graphs are powerful abstractions that naturally capture the wealth of relationships in our interconnected world. This paper proposes a new approach for graph alignment, a core problem in graph mining. Classical (e.g., spectral) methods use fixed embeddings for both graphs to perform the alignment. In contrast, the proposed approach fixes the embedding of the target graph and jointly optimizes the embedding transformation and the alignment of the query graph. An alternating optimization algorithm is proposed for computing high-quality approximate solutions and compared against the prevailing state-of-the-art graph aligning frameworks using benchmark real-world graphs. The results indicate that the proposed formulation can offer significant gains in terms of matching accuracy and robustness to noise relative to existing solutions for this hard but important problem.

Auditing for Diversity Using Representative Examples

Vijay Keswani,L. Elisa Celis

Assessing the diversity of a dataset of information associated with people is crucial before using such data for downstream applications. For a given dataset, this often involves computing the imbalance or disparity in the empirical marginal distribution of a protected attribute (e.g. gender, dialect, etc.). However, real-world datasets, such as images from Google Search or collections of Twitter posts, often do not have protected attributes labeled. Consequently, to derive disparity measures for such datasets, the elements need to hand-labeled or crowd-annotated, which are expensive processes.We propose a cost-effective approach to approximate the disparity of a given unlabeled dataset, with respect to a protected attribute, using a control set of labeled representative examples. Our proposed algorithm uses the pairwise similarity between elements in the dataset and elements in the control set to effectively bootstrap an approximation to the disparity of the dataset. Importantly, we show that using a control set whose size is much smaller than the size of the dataset is sufficient to achieve a small approximation error. Further, based on our theoretical framework, we also provide an algorithm to construct adaptive control sets that achieve smaller approximation errors than randomly chosen control sets. Simulations on two image datasets and one Twitter dataset demonstrate the efficacy of our approach (using random and adaptive control sets) in auditing the diversity of a wide variety of datasets.

Q-Learning Lagrange Policies for Multi-Action Restless Bandits

Jackson A. Killian,Arpita Biswas,Sanket Shah,Milind Tambe

Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which N independent processes are managed. However, previous work only study the offline setting where problem dynamics are known. We address this restrictive assumption, designing the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning. Our first approach, MAIQL, extends a method for Q-learning the Whittle index in binary-action RMABs to the multi-action setting. We derive a generalized update rule and convergence proof and establish that, under standard assumptions, MAIQL converges to the asymptotically optimal multi-action RMAB policy as t u2192 u221e. However, MAIQL relies on learning Q-functions and indexes on two timescales which leads to slow convergence and requires problem structure to perform well. Thus, we design a second algorithm, LPQL, which learns the well-performing and more general Lagrange policy for multi-action RMABs by learning to minimize the Lagrange bound through a variant of Q-learning. To ensure fast convergence, we take an approximation strategy that enables learning on a single timescale, then give a guarantee relating the approximations precision to an upper bound of LPQLs return as t u2192 u221e. Finally, we show that our approaches always outperform baselines across multiple settings, including one derived from real-world medication adherence data.

Nicolas Klodt,Lars Seifert,Arthur Zahn,Katrin Casel,Davis Issac,Tobias Friedrich

Chromatic Correlation Clustering (CCC) models clustering of objects with categorical pairwise relationships. The model can be viewed as clustering the vertices of a graph with edge-labels (colors). Bonchi et al. [KDD 2012] introduced it as a natural generalization of the well studied problem Correlation Clustering (CC), motivated by real-world applications from data-mining, social networks and bioinformatics. We give theoretical as well as practical contributions to the study of CCC. Our main theoretical contribution is an alternative analysis of the famous Pivot algorithm for CC. We show that, when simply run color-blind, Pivot is also a linear time 3-approximation for CCC. The previous best theoretical results for CCC were a 4-approximation with a high-degree polynomial runtime and a linear time 11-approximation, both by Anava et al. [WWW 2015]. While this theoretical result justifies Pivot as a baseline comparison for other heuristics, its blunt color-blindness performs poorly in practice. We develop a color-sensitive, practical heuristic we call Greedy Expansion that empirically outperforms all heuristics proposed for CCC so far, both on real-world and synthetic instances. Further, we propose a novel generalization of CCC allowing for multi-labelled edges. We argue that it is more suitable for many of the real-world applications and extend our results to this model.

Fast Rotation Kernel Density Estimation over Data Streams

Runze Lei,Pinghui Wang,Rundong Li,Peng Jia,Junzhou Zhao,Xiaohong Guan,Chao Deng

Kernel density estimation method is a powerful tool and is widely used in many important real-world applications such as anomaly detection and statistical learning. Unfortunately, current kernel methods suffer from high computational or space costs when dealing with large-scale, high-dimensional datasets, especially when the datasets of interest are given in a stream fashion. Although there are sketch methods designed for kernel density estimation over data streams, they still suffer from high computational costs. To address this problem, in this paper, we propose a novel Rotation Kernel. The Rotation Kernel is based on a Rotation Hash method and is much faster to compute. To achieve memory-efficient kernel density estimation over data streams, we design a method, RKD-Sketch, which compresses high dimensional data streams into a small array of integer counters. We conduct extensive experiments on both synthetic and real-world datasets, and experimental results demonstrate that our RKD-Sketch saves up to 216 times computational resources and up to 104 times space resources than state-of-the-arts. Furthermore, we apply our Rotation Kernel in active learning. Results show that our method achieves up to 256 times speedup and saves up to 13 times space to achieve the same accuracy as the baseline methods.

Dip-based Deep Embedded Clustering with k-Estimation

Collin Leiber,Lena G. M. Bauer,Benjamin Schelling,Christian Bu00f6hm,Claudia Plant

The combination of clustering with Deep Learning has gained much attention in recent years. Unsupervised neural networks like autoencoders can autonomously learn the essential structures in a data set. This idea can be combined with clustering objectives to learn relevant features automatically. Unfortunately, they are often based on a k-means framework, from which they inherit various assumptions, like spherical-shaped clusters. Another assumption, also found in approaches outside the k-means-family, is knowing the number of clusters a-priori. In this paper, we present the novel clustering algorithm DipDECK, which can estimate the number of clusters simultaneously to improving a Deep Learning-based clustering objective. Additionally, we can cluster complex data sets without assuming only spherically shaped clusters. Our algorithm works by heavily overestimating the number of clusters in the embedded space of an autoencoder and, based on Hartigans Dip-test - a statistical test for unimodality - analyses the resulting micro-clusters to determine which to merge. We show in extensive experiments the various benefits of our method: (1) we achieve competitive results while learning the clustering-friendly representation and number of clusters simultaneously; (2) our method is robust regarding parameters, stable in performance, and allows for more flexibility in the cluster shape; (3) we outperform relevant competitors in the estimation of the number of clusters.

Large-Scale Data-Driven Airline Market Influence Maximization

Duanshun Li,Jing Liu,Jinsung Jeon,Seoyoung Hong,Thai Le,Dongwon Lee,Noseong Park

We present a prediction-driven optimization framework to maximize the market influence in the US domestic air passenger transportation market by adjusting flight frequencies. At the lower level, our neural networks consider a wide variety of features, such as classical air carrier performance features and transportation network features, to predict the market influence. On top of the prediction models, we define a budget-constrained flight frequency optimization problem to maximize the market influence over 2,262 routes. This problem falls into the category of the non-linear optimization problem, which cannot be solved exactly by conventional methods. To this end, we present a novel adaptive gradient ascent (AGA) method. Our prediction models show two to eleven times better accuracy in terms of the median root-mean-square error (RMSE) over baselines. In addition, our AGA optimization method runs 690 times faster with a better optimization result (in one of our largest scale experiments) than a greedy algorithm.

Shapley Counterfactual Credits for Multi-Agent Reinforcement Learning

Jiahui Li,Kun Kuang,Baoxiang Wang,Furui Liu,Long Chen,Fei Wu,Jun Xiao

Centralized Training with Decentralized Execution (CTDE) has been a popular paradigm in cooperative Multi-Agent Reinforcement Learning (MARL) settings and is widely used in many real applications. One of the major challenges in the training process is credit assignment, which aims to deduce the contributions of each agent according to the global rewards. Existing credit assignment methods focus on either decomposing the joint value function into individual value functions or measuring the impact of local observations and actions on the global value function. These approaches lack a thorough consideration of the complicated interactions among multiple agents, leading to an unsuitable assignment of credit and subsequently mediocre results on MARL. We propose Shapley Counterfactual Credit Assignment, a novel method for explicit credit assignment which accounts for the coalition of agents. Specifically, Shapley Value and its desired properties are leveraged in deep MARL to credit any combinations of agents, which grants us the capability to estimate the individual credit for each agent. Despite this capability, the main technical difficulty lies in the computational complexity of Shapley Value who grows factorially as the number of agents. We instead utilize an approximation method via Monte Carlo sampling, which reduces the sample complexity while maintaining its effectiveness. We evaluate our method on StarCraft II benchmarks across different scenarios. Our method outperforms existing cooperative MARL algorithms significantly and achieves the state-of-the-art, with especially large margins on tasks with more severe difficulties.

A Difficulty-Aware Framework for Churn Prediction and Intervention in Games

Jiayu Li,Hongyu Lu,Chenyang Wang,Weizhi Ma,Min Zhang,Xiangyu Zhao,Wei Qi,Yiqun Liu,Shaoping Ma

Users leaving from the system without further return, called user churn, is a severe negative signal in online games. Therefore, churn prediction and intervention are of great value for improving players experiences and system performance. However, the problem has not been well-studied in the game scenario. Especially, some crucial factors, such as game difficulty, have not been considered for large-scale churn analysis. In this paper, a novel Difficulty-Aware Framework (DAF) for churn prediction and intervention is proposed. Firstly, a Difficulty Flow for each user is proposed, which is utilized to derive users Personalized Perceived Difficulty during the game process. Then, a survival analysis modelD-Cox-Time is designed to model the Dynamic Influence of Perceived Difficulty on player churn intention. Finally, thePersonalized Perceived Difficulty ~(PPD) andDynamic Difficulty Influence ~(DDI) are incorporated to churn prediction and intervention. The proposed DAF framework has been specified in a real-world puzzle game as an example for churn prediction and intervention. Extensive offline experiments show significant improvements in churn prediction by introducing difficulty-related features. Besides, we conduct an online intervention system to adjust difficulty dynamically in the online game. A/B test results verify that the proposed intervention system enhances user retention and engagement significantly. To the best of our knowledge, it is the first framework in games that illustrates an in-depth understanding and leveraging dynamic and personalized perceived difficulty during game playing, which is easy to be integrated with various churn prediction and intervention models.

Mitigating Performance Saturation in Neural Marked Point Processes: Architectures and Loss Functions

Tianbo Li,Tianze Luo,Yiping Ke,Sinno Jialin Pan

Attributed event sequences are commonly encountered in practice. A recent research line focuses on incorporating neural networks with the statistical model--marked point processes, which is the conventional tool for dealing with attributed event sequences. Neural marked point processes possess good interpretability of probabilistic models as well as the representational power of neural networks. However, we find that performance of neural marked point processes is not always increasing as the network architecture becomes more complicated and larger, which is what we call the performance saturation phenomenon. This is due to the fact that the generalization error of neural marked point processes is determined by both the network representational ability and the model specification at the same time. Therefore we can draw two major conclusions: first, simple network structures can perform no worse than complicated ones for some cases; second, using a proper probabilistic assumption is as equally, if not more, important as improving the complexity of the network. Based on this observation, we propose a simple graph-based network structure called GCHP, which utilizes only graph convolutional layers, thus it can be easily accelerated by the parallel mechanism. We directly consider the distribution of interarrival times instead of imposing a specific assumption on the conditional intensity function, and propose to use a likelihood ratio loss with a moment matching mechanism for optimization and model selection. Experimental results show that GCHP can significantly reduce training time and the likelihood ratio loss with interarrival time probability assumptions can greatly improve the model performance.

Efficient Collaborative Filtering via Data Augmentation and Step-size Optimization

Xuejun Liao,Patrick Koch,Shunping Huang,Yan Xu

As a popular approach to collaborative filtering, matrix factorization (MF) models the underlying rating matrix as a product of two factor matrices, one for users and one for items. The MF model can be learned by Alternating Least Squares (ALS), which updates the two factor matrices alternately, keeping one fixed while updating the other. Although ALS improves the learning objective aggressively in each iteration, it suffers from high computational cost due to the necessity of inverting a separate matrix for every user and item. The softImpute-ALS reduces the per-iteration computation significantly using a strategy that requires only two matrix inversions; however, the computation saving leads to shrinkage of objective improvement. In this paper, we introduce a new algorithm, termed Data Augmentation with Optimal Step-size (DAOS), which alleviates the drawback of softImpute-ALS while still maintaining its low cost of computation per iteration. The DAOS is presented in the context that factor matrices may include fixed columns or rows, with this allowing bias terms and/or linear models to be incorporated into the ML model. Experimental results on synthetic and MovieLens 1M Dataset demonstrate the benefits of DAOS over ALS and softImpute-ALS in terms of generalization performance and computational time.

Multi-view Correlation based Black-box Adversarial Attack for 3D Object Detection

Bingyu Liu,Yuhong Guo,Jianan Jiang,Jian Tang,Weihong Deng

Deep neural networks have made tremendous progress in 3D object detection, which is an important task especially in autonomous driving scenarios. Benefited from the breakthroughs in deep learning and sensor technologies, 3D object detection methods based on different sensors, such as camera and LiDAR, have developed rapidly. Meanwhile, more and more researches notice that the abundant information contained in the multi-view data can be used to obtain more accurate understanding of the 3D surrounding environment. Therefore, many sensor-fusion 3D object detection methods have been proposed. As safety is critical in autonomous driving and the deep neural networks are known to be vulnerable to adversarial examples with visually imperceptible perturbations, it is significant to investigate adversarial attacks for 3D object detection. Recent works have shown that both image-based and LiDAR-based networks can be attacked by the adversarial examples while the attacks to the sensor-fusion models, which tend to be more robust, havent been studied. To this end, we propose a simple multi-view correlation based adversarial attack method for the camera-LiDAR fusion 3D object detection models and focus on the black-box attack setting which is more practical in real-world systems. Specifically, we first design a generative network to generate image adversarial examples based on an auxiliary image semantic segmentation network. Then, we develop a cross-view perturbation projection method by exploiting the camera-LiDAR correlations to map each image adversarial example to the space of the point cloud data to form the point cloud adversarial examples in the LiDAR view. Extensive experiments on the KITTI dataset demonstrate the effectiveness of the proposed method.

ControlBurn: Feature Selection by Sparse Forests

Brian Liu,Miaolan Xie,Madeleine Udell

Tree ensembles distribute feature importance evenly amongst groups of correlated features. The average feature ranking of the correlated group is suppressed, which reduces interpretability and complicates feature selection. In this paper we present ControlBurn, a feature selection algorithm that uses a weighted LASSO-based feature selection method to prune unnecessary features from tree ensembles, just as low-intensity fire reduces overgrown vegetation. Like the linear LASSO, ControlBurn assigns all the feature importance of a correlated group of features to a single feature. Moreover, the algorithm is efficient and only requires a single training iteration to run, unlike iterative wrapper-based feature selection methods. We show that ControlBurn performs substantially better than feature selection methods with comparable computational costs on datasets with correlated features.

Reinforced Anchor Knowledge Graph Generation for News Recommendation Reasoning

Danyang Liu,Jianxun Lian,Zheng Liu,Xiting Wang,Guangzhong Sun,Xing Xie

News recommendation systems play a key role in online news reading service. Knowledge graphs (KG), which contain comprehensive structural knowledge, are well known for their potential to enhance both accuracy and explainability. While existing works intensively study using KG to improve news recommendation accuracy, using KG for news recommendation reasoning has not been fully explored. A few works such as KPRN [18], [22] and ADAC [25] have discussed knowledge reasoning in some other recommendation domains such as music or movie, but their methods are not practical for the news. How to make reasoning scalable to generic KGs, easy to deploy for real-time serving and meanwhile elastic for both recall and ranking stages remains an open question.In this paper, we fill the research gap by proposing a novel recommendation reasoning paradigm AnchorKG. For each article, AnchorKG generates a compact Anchor Knowledge G raph, which corresponds to a subset of entities and their k-hop neighbors in the KG, restoring the most important knowledge information of the article. On one hand, the anchor graph can be used to enhance the latent representation of the article. On the other hand, the interaction between two anchor graphs can be used for reasoning. We develop a reinforcement learning-based framework to train the anchor graph generator, in which there are three major components, including the joint learning of recommendation and reasoning, sophisticated reward signals, and a warm-up learning stage. We conduct experiments on one public dataset and one private dataset. Results demonstrate that the AnchorKG framework not only improves recommendation accuracy, but also provides high quality knowledge-aware reasoning. We release the source code at https://github.com/danyang-liu/AnchorKG .

Signed Graph Neural Network with Latent Groups

Haoxin Liu,Ziwei Zhang,Peng Cui,Yafeng Zhang,Qiang Cui,Jiashuo Liu,Wenwu Zhu

Signed graph representation learning is an effective approach to analyze the complex patterns in real-world signed graphs with the co-existence of positive and negative links. Most previous signed graph representation learning methods resort to balance theory, a classic social theory that originated from psychology as the core assumption. However, since balance theory is shown equivalent to a simple assumption that nodes can be divided into two conflicting groups, it fails to model the structure of real signed graphs. To solve this problem, we propose Group Signed Graph Neural Network (GS-GNN) model for signed graph representation learning beyond the balance theory assumption. GS-GNN has a dual GNN architecture that consists of the global and the local module. In the global module, we adopt a more generalized assumption that nodes can be divided into multiple latent groups and that the groups can have arbitrary relations and propose a novel prototype-based GNN to learn node representations based on the assumption. In the local module, to give the model enough flexibility in modeling other factors, we do not make any prior assumptions, treat positive links and negative links as two independent relations, and adopt a relational GNN to learn node representations. Both modules can complement each other, and the concatenation of two modules is fed into downstream tasks. Extensive experimental results demonstrate the effectiveness of our GS-GNN model on both synthetic and real-world signed graphs by greatly and consistently outperforming all the baselines and achieving new state-of-the-art results. Our implementation is available in PyTorch.

NewsEmbed: Modeling News through Pre-trained Document Representations

Jialu Liu,Tianqi Liu,Cong Yu

Effectively modeling text-rich fresh content such as news articles at document-level is a challenging problem. To ensure a content-based model generalize well to a broad range of applications, it is critical to have a training dataset that is large beyond the scale of human labels while achieving desired quality. In this work, we address those two challenges by proposing a novel approach to mine semantically-relevant fresh documents, and their topic labels, with little human supervision. Meanwhile, we design a multitask model called NewsEmbed that alternatively trains a contrastive learning with a multi-label classification to derive a universal document encoder. We show that the proposed approach can provide billions of high quality organic training examples and can be naturally extended to multilingual setting where texts in different languages are encoded in the same semantic space. We experimentally demonstrate NewsEmbeds competitive performance across multiple natural language understanding tasks, both supervised and unsupervised.

Neural-Answering Logical Queries on Knowledge Graphs

Lihui Liu,Boxin Du,Heng Ji,ChengXiang Zhai,Hanghang Tong

Logical queries constitute an important subset of questions posed in knowledge graph question answering systems. Yet, effectively answering logical queries on large knowledge graphs remains a highly challenging problem. Traditional subgraph matching based methods might suffer from the noise and incompleteness of the underlying knowledge graph, often with a prolonged online response time. Recently, an alternative type of method has emerged whose key idea is to embed knowledge graph entities and the query in an embedding space so that the embedding of answer entities is close to that of the query. Compared with subgraph matching based methods, it can better handle the noisy or missing information in knowledge graph, with a faster online response. Promising as it might be, several fundamental limitations still exist, including the linear transformation assumption for modeling relations and the inability to answer complex queries with multiple variable nodes. In this paper, we propose an embedding based method (NewLook) to address these limitations. Our proposed method offers three major advantages. First (Applicability), it supports four types of logical operations and can answer queries with multiple variable nodes. Second (Effectiveness), the proposed NewLook goes beyond the linear transformation assumption, and thus consistently outperforms the existing methods. Third (Efficiency), compared with subgraph matching based methods, NewLook is at least 3 times faster in answering the queries; compared with the existing embed-ding based methods, NewLook bears a comparable or even faster online response and offline training time.

Online Additive Quantization

Qi Liu,Jin Zhang,Defu Lian,Yong Ge,Jianhui Ma,Enhong Chen

Approximate nearest neighbor search (ANNs) plays an important role in many applications ranging from information retrieval, recommender systems to machine translation. Several ANN indexes, such as hashing and quantization, have been designed to update for the evolving database, but there exists a remarkable performance gap between them and retrained indexes on the entire database. To close the gap, we propose an online additive quantization algorithm (online AQ) to dynamically update quantization codebooks with the incoming streaming data. Then we derive the regret bound to theoretically guarantee the performance of the online AQ algorithm. Moreover, to improve the learning efficiency, we develop a randomized block beam search algorithm for assigning each data to the codewords of the codebook. Finally, we extensively evaluate the proposed online AQ algorithm on four real-world datasets, showing that it remarkably outperforms the state-of-the-art baselines.

Tail-GNN: Tail-Node Graph Neural Networks

Zemin Liu,Trung-Kien Nguyen,Yuan Fang

The prevalence of graph structures in real-world scenarios enables important tasks such as node classification and link prediction. Graphs in many domains follow a long-tailed distribution in their node degrees, i.e., a significant fraction of nodes are tail nodes with a small degree. Although recent graph neural networks (GNNs) can learn powerful node representations, they treat all nodes uniformly and are not tailored to the large group of tail nodes. In particular, there is limited structural information (i.e., links) on tail nodes, resulting in inferior performance. Toward robust tail node embedding, in this paper we propose a novel graph neural network called Tail-GNN. It hinges on the novel concept of transferable neighborhood translation, to model the variable ties between a target node and its neighbors. On one hand, Tail-GNN learns a neighborhood translation from the structurally rich head nodes (i.e., high-degree nodes), which can be further transferred to the structurally limited tail nodes to enhance their representations. On the other hand, the ties with the neighbors are variable across different parts of the graph, and a global neighborhood translation is inflexible. Thus, we devise a node-wise adaptation to localize the global translation w.r.t. each node. Extensive experiments on five benchmark datasets demonstrate that our proposed Tail-GNN significantly outperforms the state-of-the-art baselines.

Dialogue Based Disease Screening Through Domain Customized Reinforcement Learning

Zhuo Liu,Yanxuan Li,Xingzhi Sun,Fei Wang,Gang Hu,Guotong Xie

In this paper, we study the problem of leveraging dialogue agents learned from reinforcement learning (RL) that can interact with patients for automatic disease screening. This application requires efficient and effective inquiry of appropriate symptoms to make accurate diagnosis recommendations. Existing studies have tried to use RL to perform both symptom inquiry and diagnosis simultaneously, which needs to deal with a large, heterogeneous action space that affects the learning efficiency and effectiveness. To address the challenge, we propose to leverage the models learned from the dialogue data to customize the settings of the reinforcement learning for more efficient action space exploration. In particular, a supervised diagnosis model is built and involved in the definition of state and reward. We also develop the clustering method to form a hierarchy in the action space. These customizations can make the learning task focus on checking the most relevant symptoms, which effectively boost the confidence of diagnosis. Besides, a novel hierarchical reinforcement learning framework with the pretraining strategy is used to reduce the dimension of action space and help the model to converge. For empirical evaluations, we conduct extensive experiments on both synthetic and real-world datasets. The results have demonstrated the superiority of our approach in diagnostic accuracy and interaction efficiency compared with other baseline methods.

HGK-GNN: Heterogeneous Graph Kernel based Graph Neural Networks

Qingqing Long,Lingjun Xu,Zheng Fang,Guojie Song

While Graph Neural Networks (GNNs) have achieved remarkable results in a variety of applications, recent studies exposed important shortcomings in their ability to capture heterogeneous structures and attributes of an underlying graph. Furthermore, though many Heterogeneous GNN (HGNN) variants have been proposed and have achieved state-of-the-art results, there are limited theoretical understandings of their properties. To this end, we introduce graph kernel to HGNNs and develop a Heterogeneous Graph Kernel-based Graph Neural Networks (HGK-GNN). Specifically, we incorporate the Mahalanobis distance (MD) to build a Heterogeneous Graph Kernel (HGK), and incorporating it into deep neural architectures, thus leveraging a heterogeneous GNN with a heterogeneous aggregation scheme. Also, we mathematically bridge HGK-GNN to metapath-based HGNNs, which are the most popular and effective variants of HGNNs. We theoretically analyze HGK-GNN with the indispensable Encoder and Aggregator component in metapath-based HGNNs, through which we provide a theoretical perspective to understand the most popular HGNNs. To the best of our knowledge, we are the first to introduce HGK into the field of HGNNs, and mark a first step in the direction of theoretically understanding and analyzing HGNNs. Correspondingly, both graph and node classification experiments are leveraged to evaluate HGK-GNN, where HGK-GNN outperforms a wide range of baselines on six real-world datasets, endorsing the analysis.

Are we really making much progress?: Revisiting, benchmarking and refining heterogeneous graph neural networks

Qingsong Lv,Ming Ding,Qiang Liu,Yuxiang Chen,Wenzheng Feng,Siming He,Chang Zhou,Jianguo Jiang,Yuxiao Dong,Jie Tang

Heterogeneous graph neural networks (HGNNs) have been blossoming in recent years, but the unique data processing and evaluation setups used by each work obstruct a full understanding of their advancements. In this work, we present a systematical reproduction of 12 recent HGNNs by using their official codes, datasets, settings, and hyperparameters, revealing surprising findings about the progress of HGNNs. We find that the simple homogeneous GNNs, e.g., GCN and GAT, are largely underestimated due to improper settings. GAT with proper inputs can generally match or outperform all existing HGNNs across various scenarios. To facilitate robust and reproducible HGNN research, we construct the Heterogeneous Graph Benchmark (HGB) , consisting of 11 diverse datasets with three tasks. HGB standardizes the process of heterogeneous graph data splits, feature processing, and performance evaluation. Finally, we introduce a simple but very strong baseline Simple-HGN-which significantly outperforms all previous models on HGB-to accelerate the advancement of HGNNs in the future.

Graph Adversarial Attack via Rewiring

Yao Ma,Suhang Wang,Tyler Derr,Lingfei Wu,Jiliang Tang

Graph Neural Networks (GNNs) have demonstrated their powerful capability in learning representations for graph-structured data. Consequently, they have enhanced the performance of many graph-related tasks such as node classification and graph classification. However, it is evident from recent studies that GNNs are vulnerable to adversarial attacks. Their performance can be largely impaired by deliberately adding carefully created unnoticeable perturbations to the graph. Existing attacking methods often produce perturbation by adding/deleting a few edges, which might be noticeable even when the number of modified edges is small. In this paper, we propose a graph rewiring operation to perform the attack. It can affect the graph in a less noticeable way compared to existing operations such as adding/deleting edges. We then utilize deep reinforcement learning to learn the strategy to effectively perform the rewiring operations. Experiments on real-world graphs demonstrate the effectiveness of the proposed framework. To understand the proposed framework, we further analyze how its generated perturbation impacts the target model and the advantages of the rewiring operations. The implementation of the proposed framework is available at https://github.com/alge24/ReWatt.

BLOCKSET (Block-Aligned Serialized Trees): Reducing Inference Latency for Tree ensemble Deployment

Meghana Madhyastha,Kunal Lillaney,James Browne,Joshua T. Vogelstein,Randal Burns

We present methods to serialize and deserialize gradient-boosted trees and random forests that optimize inference latency when models are not loaded into memory. This arises when models are larger than memory, but also systematically when models are deployed on low-resource devices in the Internet of Things or run as cloud microservices where resources are allocated on demand. Block-Aligned Serialized Trees (BLOCKSET) introduce the concept of selective access for random forests and gradient boosted trees in which only the parts of the model needed for inference are deserialized and loaded into memory. %BLOCKSET combines concepts from external memory algorithms and data-parallel %layouts of random forests that maximize I/O-density for in-memory models. Using principles from external memory algorithms, we block-align the serialization format in order to minimize the number of I/Os. For gradient boosted trees, this results in a more than five time reduction in inference latency over layouts that do not perform selective access and a 2 times latency reduction over techniques that are selective, but do not encode I/O block boundaries in the layout.

Needle in a Haystack: Label-Efficient Evaluation under Extreme Class Imbalance

Neil G. Marchant,Benjamin I. P. Rubinstein

Important tasks like record linkage and extreme classification demonstrate extreme class imbalance, with 1 minority instance to every 1 million or more majority instances. Obtaining a sufficient sample of all classes, even just to achieve statistically-significant evaluation, is so challenging that most current approaches yield poor estimates or incur impractical cost. Where importance sampling has been levied against this challenge, restrictive constraints are placed on performance metrics, estimates do not come with appropriate guarantees, or evaluations cannot adapt to incoming labels. This paper develops a framework for online evaluation based on adaptive importance sampling. Given a target performance metric and model for p(y|x), the framework adapts a distribution over items to label in order to maximize statistical precision. We establish strong consistency and a central limit theorem for the resulting performance estimates, and instantiate our framework with worked examples that leverage Dirichlet-tree models. Experiments demonstrate an average MSE superior to state-of-the-art on fixed label budgets.

Temporal Graph Signal Decomposition

Maxwell J. McNeil,Lin Zhang,Petko Bogdanov

Temporal graph signals are multivariate time series with individual components associated with nodes of a fixed graph structure. Data of this kind arises in many domains including activity of social network users, sensor network readings over time, and time course gene expression within the interaction network of a model organism. Traditional matrix decomposition methods applied to such data fall short of exploiting structural regularities encoded in the underlying graph and also in the temporal patterns of the signal. How can we take into account such structure to obtain a succinct and interpretable representation of temporal graph signals?We propose a general, dictionary-based framework for temporal graph signal decomposition (TGSD). The key idea is to learn a low-rank, joint encoding of the data via a combination of graph and time dictionaries. We propose a highly scalable decomposition algorithm for both complete and incomplete data, and demonstrate its advantage for matrix decomposition, imputation of missing values, temporal interpolation, clustering, period estimation, and rank estimation in synthetic and real-world data ranging from traffic patterns to social media activity. Our framework achieves 28% reduction in RMSE compared to baselines for temporal interpolation when as many as 75% of the observations are missing. It scales best among baselines taking under 20 seconds on 3.5 million data points and produces the most parsimonious models. To the best of our knowledge, TGSD is the first framework to jointly model graph signals by temporal and graph dictionaries.

Cross-Node Federated Graph Neural Network for Spatio-Temporal Data Modeling

Chuizheng Meng,Sirisha Rambhatla,Yan Liu

Vast amount of data generated from networks of sensors, wearables, and the Internet of Things (IoT) devices underscores the need for advanced modeling techniques that leverage the spatio-temporal structure of decentralized data due to the need for edge computation and licensing (data access) issues. While federated learning (FL) has emerged as a framework for model training without requiring direct data sharing and exchange, effectively modeling the complex spatio-temporal dependencies to improve forecasting capabilities still remains an open problem. On the other hand, state-of-the-art spatio-temporal forecasting models assume unfettered access to the data, neglecting constraints on data sharing. To bridge this gap, we propose a federated spatio-temporal model -- Cross-Node Federated Graph Neural Network (CNFGNN) -- which explicitly encodes the underlying graph structure using graph neural network (GNN)-based architecture under the constraint of cross-node federated learning, which requires that data in a network of nodes is generated locally on each node and remains decentralized. CNFGNN operates by disentangling the temporal dynamics modeling on devices and spatial dynamics on the server, utilizing alternating optimization to reduce the communication cost, facilitating computations on the edge devices. Experiments on the traffic flow forecasting task show that CNFGNN achieves the best forecasting performance in both transductive and inductive learning settings with no extra computation cost on edge devices, while incurring modest communication cost.

DeGNN: Improving Graph Neural Networks with Graph Decomposition

Xupeng Miao,Nezihe Merve Gu00fcrel,Wentao Zhang,Zhichao Han,Bo Li,Wei Min,Susie Xi Rao,Hansheng Ren,Yinan Shan,Yingxia Shao,Yujie Wang,Fan Wu,Hui Xue,Yaming Yang,Zitao Zhang,Yang Zhao,Shuai Zhang,Yujing Wang,Bin Cui,Ce Zhang

Mining from graph-structured data is an integral component of graph data management. A recent trending technique, graph convolutional network (GCN), has gained momentum in the graph mining field, and plays an essential part in numerous graph-related tasks. Although the emerging GCN optimization techniques bring improvements to specific scenarios, they perform diversely in different applications and introduce many trial-and-error costs for practitioners. Moreover, existing GCN models often suffer from oversmoothing problem. Besides, the entanglement of various graph patterns could lead to non-robustness and harm the final performance of GCNs. In this work, we propose a simple yet efficient graph decomposition approach to improve the performance of general graph neural networks. We first empirically study existing graph decomposition methods and propose an automatic connectivity-ware graph decomposition algorithm, DeGNN. To provide a theoretical explanation, we then characterize GCN from the information-theoretic perspective and show that under certain conditions, the mutual information between the output after l layers and the input of GCN converges to 0 exponentially with respect to l. On the other hand, we show that graph decomposition can potentially weaken the condition of such convergence rate, alleviating the information loss when GCN becomes deeper. Extensive experiments on various academic benchmarks and real-world production datasets demonstrate that graph decomposition generally boosts the performance of GNN models. Moreover, our proposed solution DeGNN achieves state-of-the-art performances on almost all these tasks.

Semi-Supervised Deep Learning for Multiplex Networks

Anasua Mitra,Priyesh Vijayan,Ranbir Sanasam,Diganta Goswami,Srinivasan Parthasarathy,Balaraman Ravindran

Multiplex networks are complex graph structures in which a set of entities are connected to each other via multiple types of relations, each relation representing a distinct layer. Such graphs are used to investigate many complex biological, social, and technological systems. In this work, we present a novel semi-supervised approach for structure-aware representation learning on multiplex networks. Our approach relies on maximizing the mutual information between local node-wise patch representations and label correlated structure-aware global graph representations to model the nodes and cluster structures jointly. Specifically, it leverages a novel cluster-aware, node-contextualized global graph summary generation strategy for effective joint-modeling of node and cluster representations across the layers of a multiplex network. Empirically, we demonstrate that the proposed architecture outperforms state-of-the-art methods in a range of tasks: classification, clustering, visualization, and similarity search on seven real-world multiplex networks for various experiment settings.

Scalable Hierarchical Agglomerative Clustering

Nicholas Monath,Kumar Avinava Dubey,Guru Guruganesh,Manzil Zaheer,Amr Ahmed,Andrew McCallum,Gokhan Mergen,Marc Najork,Mert Terzihan,Bryon Tjanaka,Yuan Wang,Yuchen Wu

The applicability of agglomerative clustering, for inferring both hierarchical and flat clustering, is limited by its scalability. Existing scalable hierarchical clustering methods sacrifice quality for speed and often lead to over-merging of clusters. In this paper, we present a scalable, agglomerative method for hierarchical clustering that does not sacrifice quality and scales to billions of data points. We perform a detailed theoretical analysis, showing that under mild separability conditions our algorithm can not only recover the optimal flat partition but also provide a two-approximation to non-parametric DP-Means objective. This introduces a novel application of hierarchical clustering as an approximation algorithm for the non-parametric clustering objective. We additionally relate our algorithm to the classic hierarchical agglomerative clustering method. We perform extensive empirical experiments in both hierarchical and flat clustering settings and show that our proposed approach achieves state-of-the-art results on publicly available clustering benchmarks. Finally, we demonstrate our methods scalability by applying it to a dataset of 30 billion queries. Human evaluation of the discovered clusters show that our method finds better quality of clusters than the current state-of-the-art.

An Efficient Framework for Balancing Submodularity and Cost

Sofia Maria Nikolakaki,Alina Ene,Evimaria Terzi

In the classical selection problem, the input consists of a collection of elements and the goal is to pick a subset of elements from the collection such that some objective function u0192 is maximized. This problem has been studied extensively in the data-mining community and it has multiple applications including influence maximization in social networks, team formation and recommender systems. A particularly popular formulation that captures the needs of many such applications is one where the objective function u0192 is a monotone and non-negative submodular function. In these cases, the corresponding computational problem can be solved using a simple greedy (1-1/e)-approximation algorithm.In this paper, we consider a generalization of the above formulation where the goal is to optimize a function that maximizes the submodular function u0192 minus a linear cost function cost. This formulation appears as a more natural one, particularly when one needs to strike a balance between the value of the objective function and the cost being paid in order to pick the selected elements. We address variants of this problem both in an offline setting, where the collection is known apriori, as well as in online settings, where the elements of the collection arrive in an online fashion. We demonstrate that by using simple variants of the standard greedy algorithm (used for submodular optimization) we can design algorithms that have provable approximation guarantees, are extremely efficient and work very well in practice.

Filtration Curves for Graph Representation

Leslie OBray,Bastian Rieck,Karsten Borgwardt

The two predominant approaches to graph comparison in recent years are based on (i) enumerating matching subgraphs or (ii) comparing neighborhoods of nodes. In this work, we complement these two perspectives with a third way of representing graphs: using filtration curves from topological data analysis that capture both edge weight information and global graph structure. Filtration curves are highly efficient to compute and lead to expressive representations of graphs, which we demonstrate on graph classification benchmark datasets. Our work opens the door to a new form of graph representation in data mining.

Explaining Algorithmic Fairness Through Fairness-Aware Causal Path Decomposition

Weishen Pan,Sen Cui,Jiang Bian,Changshui Zhang,Fei Wang

Algorithmic fairness has aroused considerable interests in data mining and machine learning communities recently. So far the existing research has been mostly focusing on the development of quantitative metrics to measure algorithm disparities across different protected groups, and approaches for adjusting the algorithm output to reduce such disparities. In this paper, we propose to study the problem of identification of the source of model disparities. Unlike existing interpretation methods which typically learn feature importance, we consider the causal relationships among feature variables and propose a novel framework to decompose the disparity into the sum of contributions from fairness-aware causal paths, which are paths linking the sensitive attribute and the final predictions, on the graph. We also consider the scenario when the directions on certain edges within those paths cannot be determined. Our framework is also model agnostic and applicable to a variety of quantitative disparity measures. Empirical evaluations on both synthetic and real-world data sets are provided to show that our method can provide precise and comprehensive explanations to the model disparities.

Toward Deep Supervised Anomaly Detection: Reinforcement Learning from Partially Labeled Anomaly Data

Guansong Pang,Anton van den Hengel,Chunhua Shen,Longbing Cao

We consider the problem of anomaly detection with a small set of partially labeled anomaly examples and a large-scale unlabeled dataset. This is a common scenario in many important applications. Existing related methods either exclusively fit the limited anomaly examples that typically do not span the entire set of anomalies, or proceed with unsupervised learning from the unlabeled data. We propose here instead a deep reinforcement learning-based approach that enables an end-to-end optimization of the detection of both labeled and unlabeled anomalies. This approach learns the known abnormality by automatically interacting with an anomaly-biased simulation environment, while continuously extending the learned abnormality to novel classes of anomaly (i.e., unknown anomalies) by actively exploring possible anomalies in the unlabeled data. This is achieved by jointly optimizing the exploitation of the small labeled anomaly data and the exploration of the rare unlabeled anomalies. Extensive experiments on 48 real-world datasets show that our model significantly outperforms five state-of-the-art competing methods.

Fast and Accurate Partial Fourier Transform for Time Series Data

Yong-chan Park,Jun-Gi Jang,U Kang

Given a time-series vector, how can we efficiently detect anomalies? A widely used method is to use Fast Fourier transform (FFT) to compute Fourier coefficients, take first few coefficients while discarding the remaining small coefficients, and reconstruct the original time series to find points with large errors. Despite the pervasive use, the method requires to compute all of the Fourier coefficients which can be cumbersome if the input length is large or when we need to perform many FFT operations.In this paper, we propose Partial Fourier Transform (PFT), an efficient and accurate algorithm for computing only a part of Fourier coefficients. PFT approximates a part of twiddle factors (trigonometric constants) using polynomials, thereby reducing the computational complexity due to the mixture of many twiddle factors. We derive the asymptotic time complexity of PFT with respect to input and output sizes, and tolerance. We also show that PFT provides an option to set an arbitrary approximation error bound, which is useful especially when the fast evaluation is of utmost importance. Experimental results show that PFT outperforms the current state-of-the-art algorithms, with an order of magnitude of speedup for sufficiently small output sizes without sacrificing accuracy. In addition, we demonstrate the accuracy and efficacy of PFT on real-world anomaly detection, with interpretations of anomalies in stock price data.

Faster and Generalized Temporal Triangle Counting, via Degeneracy Ordering

Noujan Pashanasangi,C. Seshadhri

Triangle counting is a fundamental technique in network analysis, that has received much attention in various input models. The vast majority of triangle counting algorithms are targeted to static graphs. Yet, many real-world graphs are directed and temporal, where edges come with timestamps. Temporal triangles yield much more information, since they account for both the graph topology and the timestamps.Temporal triangle counting has seen a few recent results, but there are varying definitions of temporal triangles. In all cases, temporal triangle patterns enforce constraints on the time interval between edges (in the triangle). We define a general notion (u03b41,3, u03b41,2, u03b42,3)-temporal triangles that allows for separate time constraints for all pairs of edges.Our main result is a new algorithm, DOTTT (Degeneracy Oriented Temporal Triangle Totaler), that exactly counts all directed variants of (u03b41,3, u03b41,2, u03b42,3)-temporal triangles. Using the classic idea of degeneracy ordering with careful combinatorial arguments, we can prove that DOTTT runs in O(mu03bau0142og m) time, where m is the number of (temporal) edges and u03ba is the graph degeneracy (max core number). Up to log factors, this matches the running time of the best static triangle counters. Moreover, this running time is better than existing.DOTTT has excellent practical behavior and runs twice as fast as existing state-of-the-art temporal triangle counters (and is also more general). For example, DOTTT computes all types of temporal queries in Bitcoin temporal network with half a billion edges in less than an hour on a commodity machine.

Local Algorithms for Estimating Effective Resistance

Pan Peng,Daniel Lopatta,Yuichi Yoshida,Gramoz Goranci

Effective resistance is an important metric that measures the similarity of two vertices in a graph. It has found applications in graph clustering, recommendation systems and network reliability, among others. In spite of the importance of the effective resistances, we still lack efficient algorithms to exactly compute or approximate them on massive graphs.In this work, we design several local algorithms for estimating effective resistances, which are algorithms that only read a small portion of the input while still having provable performance guarantees. To illustrate, our main algorithm approximates the effective resistance between any vertex pair s,t with an arbitrarily small additive error u03b5 in time O(poly (log n/u03b5)), whenever the underlying graph has bounded mixing time. We perform an extensive empirical study on several benchmark datasets, validating the performance of our algorithms.

MaNIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling

Giulia Preti,Gianmarco De Francisci Morales,Matteo Riondato

We present MaNIACS, a sampling-based randomized algorithm for computing high-quality approximations of the collection of the subgraph patterns that are frequent in a single, large, vertex-labeled graph, according to the Minimum Node Image-based (MNI) frequency measure. The output of MaNIACS comes with strong probabilistic guarantees, obtained by using the empirical Vapnik-Chervonenkis (VC) dimension, a key concept from statistical learning theory, together with strong probabilistic tail bounds on the difference between the frequency of a pattern in the sample and its exact frequency. MaNIACS leverages properties of the MNI-frequency to aggressively prune the pattern search space, and thus to reduce the time spent in exploring subspaces containing no frequent patterns. In turn, this pruning leads to better bounds to the maximum frequency estimation error, which leads to increased pruning, resulting in a beneficial feedback effect. The results of our experimental evaluation of MaNIACS on real graphs show that it returns high-quality collections of frequent patterns in large graphs up to two orders of magnitude faster than the exact algorithm.

Xin Qian,Ryan A. Rossi,Fan Du,Sungchul Kim,Eunyee Koh,Sana Malik,Tak Yeon Lee,Joel Chan

Visualization recommendation is important for exploratory analysis and making sense of the data quickly by automatically recommending relevant visualizations to the user. In this work, we propose the first end-to-end ML-based visualization recommendation system that leverages a large corpus of datasets and their relevant visualizations to learn a visualization recommendation model automatically. Then, given a new unseen dataset from an arbitrary user, the model automatically generates visualizations for that new dataset, derives scores for the visualizations, and outputs a list of recommended visualizations to the user ordered by effectiveness. We also describe an evaluation framework to quantitatively evaluate visualization recommendation models learned from a large corpus of visualizations and datasets. Through quantitative experiments, a user study, and qualitative analysis, we show that our end-to-end ML-based system recommends more effective and useful visualizations compared to existing state-of-the-art rule-based systems.

Network-Wide Traffic States Imputation Using Self-interested Coalitional Learning

Huiling qin,Xianyuan Zhan,Yuanxun Li,Xiaodu Yang,Yu Zheng

Accurate network-wide traffic state estimation is vital to many transportation operations and urban applications. However, existing methods often suffer from the scalability issue when performing real-time inference at the city-level, or not robust enough under limited data. Currently, GPS trajectory data from probe vehicles has become a popular data source for many transportation applications. GPS trajectory data has large coverage area, which is ideal for network-wide applications, but also has the disadvantage of being sparse and highly heterogeneous among different time and locations. In this study, we focus on developing a robust and interpretable network-wide traffic state imputation framework using partially observed traffic information. We introduce a new learning strategy, called self-interested coalitional learning (SCL), which forges cooperation between a main self-interested semi-supervised learning task and a discriminator as a critic to facilitate main task training while providing interpretability on the results. In our detailed model, we use a temporal graph convolutional variational autoencoder (TG-VAE) as the reconstructor, which models the complex spatio-temporal pattern in data and solves the main traffic state imputation task. A discriminator is introduced to output interpretable imputation confidence on the estimated results and also help to enhance the performance of the reconstructor. The framework is evaluated using a large GPS trajectory dataset from taxis in Jinan, China. Extensive experiments against the state-of-the-art baselines demonstrate the effectiveness and robustness of the proposed method for network-wide traffic state estimation.

Retrieval & Interaction Machine for Tabular Data Prediction

Jiarui Qin,Weinan Zhang,Rong Su,Zhirong Liu,Weiwen Liu,Ruiming Tang,Xiuqiang He,Yong Yu

Prediction over tabular data is an essential task in many data science applications such as recommender systems, online advertising, medical treatment, etc. Tabular data is structured into rows and columns, with each row as a data sample and each column as a feature attribute. Both the columns and rows of the tabular data carry useful patterns that could improve the model prediction performance. However, most existing models focus on the cross-column patterns yet overlook the cross-rowpatterns as they deal with single samples independently. In this work, we propose a general learning framework named Retrieval & Interaction Machine (RIM) that fully exploits both cross-row and cross-column patterns among tabular data. Specifically, RIM first leverages search engine techniques to efficiently retrieve useful rows of the table to assist the label prediction of the target row, then uses feature interaction networks to capture the cross-column patterns among the target row and the retrieved rows so as to make the final label prediction. We conduct extensive experiments on 11 datasets of three important tasks, i.e., CTR prediction (classification), top-n recommendation (ranking) and rating prediction (regression). Experimental results show that RIM achieves significant improvements over the state-of-the-art and various baselines, demonstrating the superiority and efficacy of RIM.

ImGAGN: Imbalanced Network Embedding via Generative Adversarial Graph Networks

Liang Qu,Huaisheng Zhu,Ruiqi Zheng,Yuhui Shi,Hongzhi Yin

Imbalanced classification on graphs is ubiquitous yet challenging in many real-world applications, such as fraudulent node detection. Recently, graph neural networks (GNNs) have shown promising performance on many network analysis tasks. However, most existing GNNs have almost exclusively focused on the balanced networks, and would get unappealing performance on the imbalanced networks. To bridge this gap, in this paper, we present a generative adversarial graph network model, called ImGAGN to address the imbalanced classification problem on graphs. It introduces a novel generator for graph structure data, named GraphGenerator, which can simulate both the minority class nodes attribute distribution and network topological structure distribution by generating a set of synthetic minority nodes such that the number of nodes in different classes can be balanced. Then a graph convolutional network (GCN) discriminator is trained to discriminate between real nodes and fake (i.e., generated) nodes, and also between minority nodes and majority nodes on the synthetic balanced network. To validate the effectiveness of the proposed method, extensive experiments are conducted on four real-world imbalanced network datasets. Experimental results demonstrate that the proposed method ImGAGN outperforms state-of-the-art algorithms for semi-supervised imbalanced node classification task.

Individual Treatment Prescription Effect Estimation in a Low Compliance Setting

Thibaud Rahier,Amu00e9lie Hu00e9liou,Matthieu Martin,Christophe Renaudin,Eustache Diemert

Individual Treatment Effect (ITE) estimation is an extensively researched problem, with applications in various domains. We model the case where there exists heterogeneous non-compliance to a randomly assigned treatment, a typical situation in health (because of non-compliance to prescription) or digital advertising (because of competition and ad blockers for instance). The lower the compliance, the more the effect of treatment prescription - or individual prescription effect (IPE) - signal fades away and becomes harder to estimate. We propose a new approach for the estimation of the IPE that takes advantage of observed compliance information to prevent signal fading. Using the Structural Causal Model framework and do-calculus, we define a general mediated causal effect setting and propose a corresponding estimator which consistently recovers the IPE with asymptotic variance guarantees. Finally, we conduct experiments on both synthetic and real-world datasets that highlight the benefit of the approach, which consistently improves state-of-the-art in low compliance settings.

MTrajRec: Map-Constrained Trajectory Recovery via Seq2Seq Multi-task Learning

Huimin Ren,Sijie Ruan,Yanhua Li,Jie Bao,Chuishi Meng,Ruiyuan Li,Yu Zheng

With the increasing adoption of GPS modules, there are a wide range of urban applications based on trajectory data analysis, such as vehicle navigation, travel time estimation, and driver behavior analysis. The effectiveness of urban applications relies greatly on the high sampling rates of trajectories precisely matched to the map. However, a large number of trajectories are collected under a low sampling rate in real-world practice, due to certain communication loss and energy constraints. To enhance the trajectory data and support the urban applications more effectively, many trajectory recovery methods are proposed to infer the trajectories in free space. In addition, the recovered trajectory still needs to be mapped to the road network, before it can be used in the applications. However, the two-stage pipeline, which first infers high-sampling-rate trajectories and then performs the map matching, is inaccurate and inefficient. In this paper, we propose a Map-constrained Trajectory Recovery framework, MTrajRec, to recover the fine-grained points in trajectories and map match them on the road network in an end-to-end manner. MTrajRec implements a multi-task sequence-to-sequence learning architecture to predict road segment and moving ratio simultaneously. Constraint mask, attention mechanism, and attribute module are proposed to overcome the limits of coarse grid representation and improve the performance. Extensive experiments based on large-scale real-world trajectory data confirm the effectiveness and efficiency of our approach.

Dawid Rymarczyk,u0141ukasz Struski,Jacek Tabor,Bartosz Zieliu0144ski

In this work, we introduce an extension to ProtoPNet called ProtoPShare which shares prototypical parts between classes. To obtain prototype sharing we prune prototypical parts using a novel data-dependent similarity. Our approach substantially reduces the number of prototypes needed to preserve baseline accuracy and finds prototypical similarities between classes. We show the effectiveness of ProtoPShare on the CUB-200-2011 and the Stanford Cars datasets and confirm the semantic consistency of its prototypical parts in user-study.

Spectral Clustering of Attributed Multi-relational Graphs

Ylli Sadikaj,Yllka Velaj,Sahar Behzadi,Claudia Plant

Graph clustering aims at discovering a natural grouping of the nodes such that similar nodes are assigned to a common cluster. Many different algorithms have been proposed in the literature: for simple graphs, for graphs with attributes associated to nodes, and for graphs where edges represent different types of relations among nodes. However, complex data in many domains can be represented as both attributed and multi-relational networks.In this paper, we propose SpectralMix, a joint dimensionality reduction technique for multi-relational graphs with categorical node attributes. SpectralMix integrates all information available from the attributes, the different types of relations, and the graph structure to enable a sound interpretation of the clustering results. Moreover, it generalizes existing techniques: it reduces to spectral embedding and clustering when only applied to a single graph and to homogeneity analysis when applied to categorical data.Experiments conducted on several real-world datasets enable us to detect dependencies between graph structure and categorical attributes, moreover, they exhibit the superiority of SpectralMix over existing methods.

Learning Process-consistent Knowledge Tracing

Shuanghong Shen,Qi Liu,Enhong Chen,Zhenya Huang,Wei Huang,Yu Yin,Yu Su,Shijin Wang

Knowledge tracing (KT), which aims to trace students changing knowledge state during their learning process, has improved students learning efficiency in online learning systems. Recently, KT has attracted much research attention due to its critical significance in education. However, most of the existing KT methods pursue high accuracy of student performance prediction but neglect the consistency of students changing knowledge state with their learning process. In this paper, we explore a new paradigm for the KT task and propose a novel model named Learning Process-consistent Knowledge Tracing (LPKT), which monitors students knowledge state through directly modeling their learning process. Specifically, we first formalize the basic learning cell as the tuple exercise---answer time---answer. Then, we deeply measure the learning gain as well as its diversity from the difference of the present and previous learning cells, their interval time, and students related knowledge state. We also design a learning gate to distinguish students absorptive capacity of knowledge. Besides, we design a forgetting gate to model the decline of students knowledge over time, which is based on their previous knowledge state, present learning gains, and the interval time. Extensive experimental results on three public datasets demonstrate that LPKT could obtain more reasonable knowledge state in line with the learning process. Moreover, LPKT also outperforms state-of-the-art KT methods on student performance prediction. Our work indicates a potential future research direction for KT, which is of both high interpretability and accuracy.

Fruit-fly Inspired Neighborhood Encoding for Classification

Kaushik Sinha,Parikshit Ram

Inspired by the fruit-fly olfactory circuit, the Fly Bloom Filter is able to efficiently summarize the data with a single pass and has been used for novelty detection. We propose a new classifier that effectively encodes the different local neighborhoods for each class with a per-class Fly Bloom Filter. The inference on test data requires an efficient Flyhash[6] operation followed by a high-dimensional, but very sparse, dot product with the per-class Bloom Filters. On the theoretical side, we establish conditions under which the predictions of our proposed classifier agrees with the predictions of the nearest neighbor classifier. We extensively evaluate our proposed scheme with 71 data sets of varied data dimensionality to demonstrate that the predictive performance of our proposed neuroscience inspired classifier is competitive to the nearest-neighbor classifiers and other single-pass classifiers.

Deep Clustering based Fair Outlier Detection

Hanyu Song,Peizhao Li,Hongfu Liu

In this paper, we focus on the fairness issues regarding unsupervised outlier detection. Traditional algorithms, without a specific design for algorithmic fairness, could implicitly encode and propagate statistical bias in data and raise societal concerns. To correct such unfairness and deliver a fair set of potential outlier candidates, we propose Deep Clustering based Fair Outlier Detection (DCFOD) that learns a good representation for utility maximization while enforcing the learnable representation to be subgroup-invariant on the sensitive attribute. Considering the coupled and reciprocal nature between clustering and outlier detection, we leverage deep clustering to discover the intrinsic cluster structure and out-of-structure instances. Meanwhile, an adversarial training erases the sensitive pattern for instances for fairness adaptation. Technically, we propose an instance-level weighted representation learning strategy to enhance the joint deep clustering and outlier detection, where the dynamic weight module re-emphasizes contributions of likely-inliers while mitigating the negative impact from outliers. Demonstrated by experiments on eight datasets comparing to 17 outlier detection algorithms, our DCFOD method consistently achieves superior performance on both the outlier detection validity and two types of fairness notions in outlier detection.

Robust Learning by Self-Transition for Handling Noisy Labels

Hwanjun Song,Minseok Kim,Dongmin Park,Yooju Shin,Jae-Gil Lee

Real-world data inevitably contains noisy labels, which induce the poor generalization of deep neural networks. It is known that the network typically begins to rapidly memorize false-labeled samples after a certain point of training. Thus, to counter the label noise challenge, we propose a novel self-transitional learning method called MORPH, which automatically switches its learning phase at the transition point from seeding to evolution. In the seeding phase, the network is updated using all the samples to collect a seed of clean samples. Then, in the evolution phase, the network is updated using only the set of arguably clean samples, which precisely keeps expanding by the updated network. Thus, MORPH effectively avoids the overfitting to false-labeled samples throughout the entire training period. Extensive experiments using five real-world or synthetic benchmark datasets demonstrate substantial improvements over state-of-the-art methods in terms of robustness and efficiency.

Triangle-aware Spectral Sparsifiers and Community Detection

Konstantinos Sotiropoulos,Charalampos E. Tsourakakis

Triangle-aware graph partitioning has proven to be a successful approach to finding communities in real-world data [8, 40, 51, 54]. But how can we explain its empirical success? Triangle-aware graph partitioning methods rely on the count of triangles an edge is contained in, in contrast to the well-established measure of effective resistance [12] that requires global information about the graph.In this work we advance the understanding of triangle-based graph partitioning in two ways. First, we introduce a novel triangle-aware sparsification scheme. Our scheme provably produces a spectral sparsifier with high probability [46, 47] on graphs that exhibit strong triadic closure, a hallmark property of real-world networks. Importantly, our sampling scheme is amenable to distributed computing, since it relies simply on computing node degrees, and edge triangle counts. Finally, we compare our methods to the Spielman-Srivastava sparsification algorithm [46] on a wide variety of real-world graphs, and we verify the applicability of our proposed sparsification scheme on real-world networks.Secondly, we develop a data-driven approach towards understanding properties of real-world communities with respect to effective resistances, and triangle counts. Our empirical approach is mainly based on the notion of ground-truth communities in datasets made available originally by Yang and Leskovec [53]. We perform a study of triangle-aware measures, and effective resistances on edges within, and across communities, and we discover certain interesting empirical findings. For example, we observe that the Jaccard similarity of an edge used by Satuluri [40], and the closely related Tectonic similarity measure introduced by Tsourakakis et al. [51] provide consistently good signals of whether an edge is contained within a community or not.

Probabilistic Gradient Boosting Machines for Large-Scale Probabilistic Regression

Olivier Sprangers,Sebastian Schelter,Maarten de Rijke

Gradient Boosting Machines (GBM) are hugely popular for solving tabular data problems. However, practitioners are not only interested in point predictions, but also in probabilistic predictions in order to quantify the uncertainty of the predictions. Creating such probabilistic predictions is difficult with existing GBM-based solutions: they either require training multiple models or they become too computationally expensive to be useful for large-scale settings. We propose Probabilistic Gradient Boosting Machines (PGBM), a method to create probabilistic predictions with a single ensemble of decision trees in a computationally efficient manner. PGBM approximates the leaf weights in a decision tree as a random variable, and approximates the mean and variance of each sample in a dataset via stochastic tree ensemble update equations. These learned moments allow us to subsequently sample from a specified distribution after training. We empirically demonstrate the advantages of PGBM compared to existing state-of-the-art methods: (i) PGBM enables probabilistic estimates without compromising on point performance in a single model, (ii) PGBM learns probabilistic estimates via a single model only (and without requiring multi-parameter boosting), and thereby offers a speedup of up to several orders of magnitude over existing state-of-the-art methods on large datasets, and (iii) PGBM achieves accurate probabilistic estimates in tasks with complex differentiable loss functions, such as hierarchical time series problems, where we observed up to 10% improvement in point forecasting performance and up to 300% improvement in probabilistic forecasting performance.

Redescription Model Mining

Felix I. Stamm,Martin Becker,Markus Strohmaier,Florian Lemmerich

This paper introduces Redescription Model Mining, a novel approach to identify interpretable patterns across two datasets that share only a subset of attributes and have no common instances. In particular, Redescription Model Mining aims to find pairs of describable data subsets -- one for each dataset -- that induce similar exceptional models with respect to a prespecified model class. To achieve this, we combine two previously separate research areas: Exceptional Model Mining and Redescription Mining. For this new problem setting, we develop interestingness measures to select promising patterns, propose efficient algorithms, and demonstrate their potential on synthetic and real-world data. Uncovered patterns can hint at common underlying phenomena that manifest themselves across datasets, enabling the discovery of possible associations between (combinations of) attributes that do not appear in the same dataset.

Norm Adjusted Proximity Graph for Fast Inner Product Retrieval

Shulong Tan,Zhaozhuo Xu,Weijie Zhao,Hongliang Fei,Zhixin Zhou,Ping Li

Efficient inner product search on embedding vectors is often the vital stage for online ranking services, such as recommendation and information retrieval. Recommendation algorithms, e.g., matrix factorization, typically produce latent vectors to represent users or items. The recommendation services are conducted by retrieving the most relevant item vectors given the user vector, where the relevance is often defined by inner product. Therefore, developing efficient recommender systems often requires solving the so-called maximum inner product search (MIPS) problem. In the past decade, there have been many studies on efficient MIPS algorithms. This task is challenging in part because the inner product does not follow the triangle inequality of metric space.Compared with hash-based or quantization-based MIPS solutions, in recent years graph-based MIPS algorithms have demonstrated their strong empirical advantages in many real-world MIPS tasks. In this paper, we propose a new index graph construction method named norm adjusted proximity graph (NAPG), for efficient MIPS. With adjusting factors estimated on sampled data, NAPG is able to select more meaningful data points to connect with when constructing graph-based index for inner product search. Our extensive experiments on a variety of datasets verify that the improved graph-based index strategy provides another strong addition to the pool of efficient MIPS algorithms.

Analysis and Applications of Class-wise Robustness in Adversarial Training

Qi Tian,Kun Kuang,Kelu Jiang,Fei Wu,Yisen Wang

Adversarial training is one of the most effective approaches to improve model robustness against adversarial examples. However, previous works mainly focus on the overall robustness of the model, and the in-depth analysis on the role of each class involved in adversarial training is still missing. In this paper, we propose to analyze the class-wise robustness in adversarial training. First, we provide a detailed diagnosis of adversarial training on six benchmark datasets, i.e., MNIST, CIFAR-10, CIFAR-100, SVHN, STL-10 and ImageNet. Surprisingly, we find that there are remarkable robustness discrepancies among classes, leading to unbalance/unfair class-wise robustness in the robust models. Furthermore, we keep investigating the relations between classes and find that the unbalanced class-wise robustness is pretty consistent among different attack and defense methods. Moreover, we observe that the stronger attack methods in adversarial learning achieve performance improvement mainly from a more successful attack on the vulnerable classes (i.e., classes with less robustness). Inspired by these interesting findings, we design a simple but effective attack method based on the traditional PGD attack, named Temperature-PGD attack, which proposes to enlarge the robustness disparity among classes with a temperature factor on the confidence distribution of each image. Experiments demonstrate our method can achieve a higher attack rate than the PGD attack. Furthermore, from the defense perspective, we also make some modifications in the training and inference phase to improve the robustness of the most vulnerable class, so as to mitigate the large difference in class-wise robustness. We believe our work can contribute to a more comprehensive understanding of adversarial training as well as rethinking the class-wise properties in robust models.

Choice Set Confounding in Discrete Choice

Kiran Tomlinson,Johan Ugander,Austin R. Benson

Standard methods in preference learning involve estimating the parameters of discrete choice models from data of selections (choices) made by individuals from a discrete set of alternatives (the choice set). While there are many models for individual preferences, existing learning methods overlook how choice set assignment affects the data. Often, the choice set itself is influenced by an individuals preferences; for instance, a consumer choosing a product from an online retailer is often presented with options from a recommender system that depend on information about the consumers preferences. Ignoring these assignment mechanisms can mislead choice models into making biased estimates of preferences, a phenomenon that we call choice set confounding. We demonstrate the presence of such confounding in widely-used choice datasets.To address this issue, we adapt methods from causal inference to the discrete choice setting. We use covariates of the chooser for inverse probability weighting and/or regression controls, accurately recovering individual preferences in the presence of choice set confounding under certain assumptions. When such covariates are unavailable or inadequate, we develop methods that take advantage of structured choice set assignment to improve prediction. We demonstrate the effectiveness of our methods on real-world choice data, showing, for example, that accounting for choice set confounding makes choices observed in hotel booking and commute transportation more consistent with rational utility maximization.

Learning Interpretable Feature Context Effects in Discrete Choice

Kiran Tomlinson,Austin R. Benson

Individuals are constantly making choices---purchasing products, consuming Web content, making social connections---so understanding what contributes to these decisions is crucial in many settings. A major interest is understanding context effects, which occur when the set of available options itself affects an individuals relative preferences. These violate traditional rationality assumptions but are commonly observed in human behavior. At the same time, identifying context effects from choice data remains a challenge; existing models posit a specific context effect a priori and then measure its effect from (often effect-targeting) data. Here, we develop discrete choice models that capture a broad range of context effects, which are learned from choice data rather than baked into the model. Our models yield intuitive, interpretable, and statistically testable context effects, all while being simple to train. We evaluate our model on several empirical choice datasets, discovering, e.g., that people are more willing to book higher-priced hotels when presented with options that are on sale. We also provide the first analysis of context effects in online social network growth, finding that users forming connections place relatively more emphasis on shared neighbors when popular users are an option.

Statistical Models Coupling Allows for Complex Local Multivariate Time Series Analysis

Veronica Tozzo,Federico Ciech,Davide Garbarino,Alessandro Verri

The increased availability of multivariate time-series asks for the development of suitable methods able to holistically analyse them. To this aim, we propose a novel flexible method for data-mining, forecasting and causal patterns detection that leverages the coupling of Hidden Markov Models and Gaussian Graphical Models. Given a multivariate non-stationary time-series, the proposed method simultaneously clusters time points while understanding probabilistic relationships among variables. The clustering divides the time points into stationary sub-groups whose underlying distribution can be inferred through a graphical model. Such coupling can be further exploited to build a time-varying regression model which allows to both make predictions and obtain insights on the presence of causal patterns. We extensively validate the proposed approach on synthetic data showing that it has better performance than the state of the art on clustering, graphical models inference and prediction. Finally, to demonstrate the applicability of our approach in real-world scenarios, we exploit its characteristics to build a profitable investment portfolio. Results show that we are able to improve the state of the art, by going from a -%20 profit to a noticeable 80%.

The Generalized Mean Densest Subgraph Problem

Nate Veldt,Austin R. Benson,Jon Kleinberg

Finding dense subgraphs of a large graph is a standard problem in graph mining that has been studied extensively both for its theoretical richness and its many practical applications. In this paper we introduce a new family of dense subgraph objectives, parameterized by a single parameter p, based on computing generalized means of degree sequences of a subgraph. Our objective captures both the standard densest subgraph problem and the maximum k-core as special cases, and provides a way to interpolate between and extrapolate beyond these two objectives when searching for other notions of dense subgraphs. In terms of algorithmic contributions, we first show that our objective can be minimized in polynomial time for all p u2265 1 using repeated submodular minimization. A major contribution of our work is analyzing the performance of different types of peeling algorithms for dense subgraphs both in theory and practice. We prove that the standard peeling algorithm can perform arbitrarily poorly on our generalized objective, but we then design a more sophisticated peeling method which for p u2265 1 has an approximation guarantee that is always at least 1/2 and converges to 1 as p u27f6 u20b6. In practice, we show that this algorithm obtains extremely good approximations to the optimal solution, scales to large graphs, and highlights a range of different meaningful notions of density on graphs coming from numerous domains. Furthermore, it is typically able to approximate the densest subgraph problem better than the standard peeling algorithm, by better accounting for how the removal of one node affects other nodes in its neighborhood.

Alphacore: Data Depth based Core Decomposition

Friedhelm Victor,Cuneyt G. Akcora,Yulia R. Gel,Murat Kantarcioglu

Core decomposition in networks has proven useful for evaluating the importance of nodes and communities in a variety of application domains, ranging from biology to social networks and finance. However, existing core decomposition algorithms have limitations in simultaneously handling multiple node and edge attributes.We propose a novel unsupervised core decomposition method that can be easily applied to directed and weighted networks. Our algorithm, AlphaCore, allows us to systematically and mathematically rigorously combine multiple node properties by using the notion of data depth. In addition, it can be used as a mixture of centrality measure and core decomposition. Compared to existing approaches, AlphaCore avoids the need to specify numerous thresholds or coefficients and yields meaningful quantitative and qualitative insights into the network structural organization.We evaluate AlphaCores performance with a focus on financial, blockchain-based token networks, the social network Reddit and a transportation network of international flight routes. We compare our results with existing core decomposition and centrality algorithms. Using ground truth about node importance, we show that AlphaCore yields the best precision and recall results among core decomposition methods using the same input features. An implementation is available at https://github.com/friedhelmvictor/alphacore.

Multi-Objective Model-based Reinforcement Learning for Infectious Disease Control

Runzhe Wan,Xinyu Zhang,Rui Song

Severe infectious diseases such as the novel coronavirus (COVID-19) pose a huge threat to public health. Stringent control measures, such as school closures and stay-at-home orders, while having significant effects, also bring huge economic losses. In the face of an emerging infectious disease, a crucial question for policymakers is how to make the trade-off and implement the appropriate interventions timely given the huge uncertainty. In this work, we propose a Multi-Objective Model-based Reinforcement Learning framework to facilitate data-driven decision-making and minimize the overall long-term cost. Specifically, at each decision point, a Bayesian epidemiological model is first learned as the environment model, and then the proposed model-based multi-objective planning algorithm is applied to find a set of Pareto-optimal policies. This framework, combined with the prediction bands for each policy, provides a real-time decision support tool for policymakers. The application is demonstrated with the spread of COVID-19 in China.

Certified Robustness of Graph Neural Networks against Adversarial Structural Perturbation

Binghui Wang,Jinyuan Jia,Xiaoyu Cao,Neil Zhenqiang Gong

Graph neural networks (GNNs) have recently gained much attention for node and graph classification tasks on graph-structured data. However, multiple recent works showed that an attacker can easily make GNNs predict incorrectly via perturbing the graph structure, i.e., adding or deleting edges in the graph. We aim to defend against such attacks via developing certifiably robust GNNs. Specifically, we prove the first certified robustness guarantee of any GNN for both node and graph classifications against structural perturbation. Moreover, we show that our certified robustness guarantee is tight. Our results are based on a recently proposed technique called randomized smoothing, which we extend to graph data. We also empirically evaluate our method for both node and graph classifications on multiple GNNs and multiple benchmark datasets. For instance, on the Cora dataset, Graph Convolutional Network with our randomized smoothing can achieve a certified accuracy of 0.49 when the attacker can arbitrarily add/delete at most 15 edges in the graph.

Privacy-Preserving Representation Learning on Graphs: A Mutual Information Perspective

Binghui Wang,Jiayi Guo,Ang Li,Yiran Chen,Hai Li

Learning with graphs has attracted significant attention recently. Existing representation learning methods on graphs have achieved state-of-the-art performance on various graph-related tasks such as node classification, link prediction, etc. However, we observe that these methods could leak serious private information. For instance, one can accurately infer the links (or node identity) in a graph from a node classifier (or link predictor) trained on the learnt node representations by existing methods. To address the issue, we propose a privacy-preserving representation learning framework on graphs from the mutual information perspective. Specifically, our framework includes a primary learning task and a privacy protection task, and we consider node classification and link prediction as the two tasks of interest. Our goal is to learn node representations such that they can be used to achieve high performance for the primary learning task, while obtaining performance for the privacy protection task close to random guessing. We formally formulate our goal via mutual information objectives. However, it is intractable to compute mutual information in practice. Then, we derive tractable variational bounds for the mutual information terms, where each bound can be parameterized via a neural network. Next, we train these parameterized neural networks to approximate the true mutual information and learn privacy-preserving node representations. We finally evaluate our framework on various graph datasets.

JOHAN: A Joint Online Hurricane Trajectory and Intensity Forecasting Framework

Ding Wang,Pang-Ning Tan

Hurricanes are one of the most catastrophic natural forces with potential to inflict severe damages to properties and loss of human lives from high winds and inland flooding. Accurate long-term forecasting of the trajectory and intensity of advancing hurricanes is therefore crucial to provide timely warnings for civilians and emergency responders to mitigate costly damages and their life-threatening impact. In this paper, we present a novel online learning framework called JOHAN that simultaneously predicts the trajectory and intensity of a hurricane based on outputs produced by an ensemble of dynamic (physical) hurricane models. In addition, JOHAN is designed to generate accurate forecasts of the ordinal-valued hurricane intensity categories to ensure that their severity level can be reliably communicated to the public. The framework also employs exponentially-weighted quantile loss functions to bias the algorithm towards improving its prediction accuracy for high category hurricanes approaching landfall. Experimental results using real-world hurricane data demonstrated the superiority of JOHAN compared to several state-of-the-art learning approaches.

Approximate Graph Propagation

Hanzhi Wang,Mingguo He,Zhewei Wei,Sibo Wang,Ye Yuan,Xiaoyong Du,Ji-Rong Wen

Efficient computation of node proximity queries such as transition probabilities, Personalized PageRank, and Katz are of fundamental importance in various graph mining and learning tasks. In particular, several recent works leverage fast node proximity computation to improve the scalability of Graph Neural Networks (GNN). However, prior studies on proximity computation and GNN feature propagation are on a case-by-case basis, with each paper focusing on a particular proximity measure.In this paper, we propose Approximate Graph Propagation (AGP), a unified randomized algorithm that computes various proximity queries and GNN feature propagations, including transition probabilities, Personalized PageRank, heat kernel PageRank, Katz, SGC, GDC, and APPNP. Our algorithm provides a theoretical bounded error guarantee and runs in almost optimal time complexity. We conduct an extensive experimental study to demonstrate AGPs effectiveness in two concrete applications: local clustering with heat kernel PageRank and node classification with GNNs. Most notably, we present an empirical study on a billion-edge graph Papers100M, the largest publicly available GNN dataset so far. The results show that AGP can significantly improve various existing GNN models scalability without sacrificing prediction accuracy.

Relational Message Passing for Knowledge Graph Completion

Hongwei Wang,Hongyu Ren,Jure Leskovec

Knowledge graph completion aims to predict missing relations between entities in a knowledge graph. In this work, we propose a relational message passing method for knowledge graph completion. Different from existing embedding-based methods, relational message passing only considers edge features (i.e., relation types) without entity IDs in the knowledge graph, and passes relational messages among edges iteratively to aggregate neighborhood information. Specifically, two kinds of neighborhood topology are modeled for a given entity pair under the relational message passing framework: (1) Relational context, which captures the relation types of edges adjacent to the given entity pair; (2) Relational paths, which characterize the relative position between the given two entities in the knowledge graph. The two message passing modules are combined together for relation prediction. Experimental results on knowledge graph benchmarks as well as our newly proposed dataset show that, our method PathCon outperforms state-of-the-art knowledge graph completion methods by a large margin. PathCon is also shown applicable to inductive settings where entities are not seen in training stage, and it is able to provide interpretable explanations for the predicted results. The code and all datasets are available at https://github.com/hwwang55/PathCon.

Deep Learning Embeddings for Data Series Similarity Search

Qitong Wang,Themis Palpanas

A key operation for the (increasingly large) data series collection analysis is similarity search. According to recent studies, SAX-based indexes offer state-of-the-art performance for similarity search tasks. However, their performance lags under high-frequency, weakly correlated, excessively noisy, or other dataset-specific properties. In this work, we propose Deep Embedding Approximation (DEA), a novel family of data series summarization techniques based on deep neural networks. Moreover, we describe SEAnet, a novel architecture especially designed for learning DEA, that introduces the Sum of Squares preservation property into the deep network design. Finally, we propose a new sampling strategy, SEASam, that allows SEAnet to effectively train on massive datasets. Comprehensive experiments on 7 diverse synthetic and real datasets verify the advantages of DEA learned using SEAnet, when compared to other state-of-the-art traditional and DEA solutions, in providing high-quality data series summarizations and similarity search results.

Deconfounded Recommendation for Alleviating Bias Amplification

Wenjie Wang,Fuli Feng,Xiangnan He,Xiang Wang,Tat-Seng Chua

Recommender systems usually amplify the biases in the data. The model learned from historical interactions with imbalanced item distribution will amplify the imbalance by over-recommending items from the majority groups. Addressing this issue is essential for a healthy ecosystem of recommendation in the long run. Existing work applies bias control to the ranking targets (e.g., calibration, fairness, and diversity), but ignores the true reason for bias amplification and trades off the recommendation accuracy.In this work, we scrutinize the cause-effect factors for bias amplification, identifying the main reason lies in the confounding effect of imbalanced item distribution on user representation and prediction score. The existence of such confounder pushes us to go beyond merely modeling the conditional probability and embrace the causal modeling for recommendation. Towards this end, we propose a Deconfounded Recommender System (DecRS), which models the causal effect of user representation on the prediction score. The key to eliminating the impact of the confounder lies in backdoor adjustment, which is however difficult to do due to the infinite sample space of the confounder. For this challenge, we contribute an approximation operator for backdoor adjustment which can be easily plugged into most recommender models. Lastly, we devise an inference strategy to dynamically regulate backdoor adjustment according to user status. We instantiate DecRS on two representative models FM [32] and NFM [16], and conduct extensive experiments over two benchmarks to validate the superiority of our proposed DecRS.

Self-supervised Heterogeneous Graph Neural Network with Co-contrastive Learning

Xiao Wang,Nian Liu,Hui Han,Chuan Shi

Heterogeneous graph neural networks (HGNNs) as an emerging technique have shown superior capacity of dealing with heterogeneous information network (HIN). However, most HGNNs follow a semi-supervised learning manner, which notably limits their wide use in reality since labels are usually scarce in real applications. Recently, contrastive learning, a self-supervised method, becomes one of the most exciting learning paradigms and shows great potential when there are no labels. In this paper, we study the problem of self-supervised HGNNs and propose a novel co-contrastive learning mechanism for HGNNs, named HeCo. Different from traditional contrastive learning which only focuses on contrasting positive and negative samples, HeCo employs cross-view contrastive mechanism. Specifically, two views of a HIN (network schema and meta-path views) are proposed to learn node embeddings, so as to capture both of local and high-order structures simultaneously. Then the cross-view contrastive learning, as well as a view mask mechanism, is proposed, which is able to extract the positive and negative embeddings from two views. This enables the two views to collaboratively supervise each other and finally learn high-level node embeddings. Moreover, two extensions of HeCo are designed to generate harder negative samples with high quality, which further boosts the performance of HeCo. Extensive experiments conducted on a variety of real-world networks show the superior performance of the proposed methods over the state-of-the-arts.

Meta Self-training for Few-shot Neural Sequence Labeling

Yaqing Wang,Subhabrata Mukherjee,Haoda Chu,Yuancheng Tu,Ming Wu,Jing Gao,Ahmed Hassan Awadallah

Neural sequence labeling is widely adopted for many Natural Language Processing (NLP) tasks, such as Named Entity Recognition (NER) and slot tagging for dialog systems and semantic parsing. Recent advances with large-scale pre-trained language models have shown remarkable success in these tasks when fine-tuned on large amounts of task-specific labeled data. However, obtaining such large-scale labeled training data is not only costly, but also may not be feasible in many sensitive user applications due to data access and privacy constraints. This is exacerbated for sequence labeling tasks requiring such annotations at token-level. In this work, we develop techniques to address the label scarcity challenge for neural sequence labeling models. Specifically, we propose a meta self-training framework which leverages very few manually annotated labels for training neural sequence models. While self-training serves as an effective mechanism to learn from large amounts of unlabeled data via iterative knowledge exchange -- meta-learning helps in adaptive sample re-weighting to mitigate error propagation from noisy pseudo-labels. Extensive experiments on six benchmark datasets including two for massive multilingual NER and four slot tagging datasets for task-oriented dialog systems demonstrate the effectiveness of our method. With only 10 labeled examples for each class in each task, the proposed method achieves 10% improvement over state-of-the-art methods demonstrating its effectiveness for limited training labels regime.

Understanding and Improving Fairness-Accuracy Trade-offs in Multi-Task Learning

Yuyan Wang,Xuezhi Wang,Alex Beutel,Flavien Prost,Jilin Chen,Ed H. Chi

As multi-task models gain popularity in a wider range of machine learning applications, it is becoming increasingly important for practitioners to understand the fairness implications associated with those models. Most existing fairness literature focuses on learning a single task more fairly, while how ML fairness interacts with multiple tasks in the joint learning setting is largely under-explored. In this paper, we are concerned with how group fairness (e.g., equal opportunity, equalized odds) as an ML fairness concept plays out in the multi-task scenario. In multi-task learning, several tasks are learned jointly to exploit task correlations for a more efficient inductive transfer. This presents a multi-dimensional Pareto frontier on (1) the trade-off between group fairness and accuracy with respect to each task, as well as (2) the trade-offs across multiple tasks. We aim to provide a deeper understanding on how group fairness interacts with accuracy in multi-task learning, and we show that traditional approaches that mainly focus on optimizing the Pareto frontier of multi-task accuracy might not perform well on fairness goals. We propose a new set of metrics to better capture the multi-dimensional Pareto frontier of fairness-accuracy trade-offs uniquely presented in a multi-task learning setting. We further propose a Multi-Task-Aware Fairness (MTA-F) approach to improve fairness in multi-task learning. Experiments on several real-world datasets demonstrate the effectiveness of our proposed approach.

Error-Bounded Online Trajectory Simplification with Multi-Agent Reinforcement Learning

Zheng Wang,Cheng Long,Gao Cong,Qianru Zhang

Trajectory data has been widely used in various applications, including taxi services, traffic management, mobility analysis, etc. It is usually collected at a sensors side in real time and corresponds to a sequence of sampled points. Constrained by the storage and/or network bandwidth of a sensor, it is common to simplify raw trajectory data when it is collected by dropping some sampled points. Many algorithms have been proposed for the error-bounded online trajectory simplification (EB-OTS) problem, which is to drop as many points as possible subject to that the error is bounded by an error tolerance. Nevertheless, these existing algorithms rely on pre-defined rules for decision making during the trajectory simplification process and there is no theoretical ground supporting their effectiveness. In this paper, we propose a multi-agent reinforcement learning method called MARL4TS for EB-OTS. MARL4TS involves two agents for different decision making problems during the trajectory simplification processes. Besides, MARL4TS has its objective equivalent to that of the EB-OTS problem, which provides some theoretical ground of its effectiveness. We conduct extensive experiments on real-world trajectory datasets, which verify that MARL4TS outperforms all existing algorithms in effectiveness and provides competitive efficiency.

TUTA: Tree-based Transformers for Generally Structured Table Pre-training

Zhiruo Wang,Haoyu Dong,Ran Jia,Jia Li,Zhiyi Fu,Shi Han,Dongmei Zhang

We propose TUTA, a unified pre-training architecture for understanding generally structured tables. Noticing that understanding a table requires spatial, hierarchical, and semantic information, we enhance transformers with three novel structure-aware mechanisms. First, we devise a unified tree-based structure, called a bi-dimensional coordinate tree, to describe both the spatial and hierarchical information of generally structured tables. Upon this, we propose tree-based attention and position embedding to better capture the spatial and hierarchical information. Moreover, we devise three progressive pre-training objectives to enable representations at the token, cell, and table levels. We pre-train TUTA on a wide range of unlabeled web and spreadsheet tables and fine-tune it on two critical tasks in the field of table structure understanding: cell type classification and table type classification. Experiments show that TUTA is highly effective, achieving state-of-the-art on five widely-studied datasets.

Probabilistic Label Tree for Streaming Multi-Label Learning

Tong Wei,Jiang-Xin Shi,Yu-Feng Li

Multi-label learning aims to predict a subset of relevant labels for each instance, which has many real-world applications. Most extant multi-label learning studies focus on a fixed size of label space. However, in many cases, the environment is open and changes gradually and new labels emerge, which is coined as streaming multi-label learning (SMLL). SMLL poses great challenges in twofolds: (1) the target output space expands dynamically; (2) new labels emerge frequently and can reach a significantly large number. Previous attempts on SMLL leverage label correlations between past and emerging labels to improve the performance, while they are inefficient when deal with large-scale problems. To cope with this challenge, in this paper, we present a new learning framework, i.e., the probabilistic streaming label tree(Pslt). In particular, each non-leaf node of the tree corresponding to a subset of labels, and a binary classifier is learned at each leaf node. Initially, Pslt is learned on partially observed labels, both tree structure and node classifiers are updated while new labels emerge. Using carefully designed updating mechanism, Psltcan seamlessly incorporate new labels by first passing them down from the root to leaf nodes and then update node classifiers accordingly. We provide theoretical bounds for the iteration complexity of tree update procedure and the estimation error on newly arrived labels. Experiments show that the proposed approach improves the performance in comparison with eleven baselines in terms of multiple evaluation metrics. The source code is available at https://gitee.com/pslt-kdd2021/pslt.

Towards Robust Prediction on Tail Labels

Tong Wei,Wei-Wei Tu,Yu-Feng Li,Guo-Ping Yang

Extreme multi-label learning (XML) works to annotate objects with relevant labels from an extremely large label set. Many previous methods treat labels uniformly such that the learned model tends to perform better on head labels, while the performance is severely deteriorated for tail labels. However, it is often desirable to predict more tail labels in many real-world applications. To alleviate this problem, in this work, we show theoretical and experimental evidence for the inferior performance of representative XML methods on tail labels. Our finding is that the norm of label classifier weights typically follows a long-tailed distribution similar to the label frequency, which results in the over-suppression of tail labels. Base on this new finding, we present two new modules: (1)ReRank works to re-rank the predicted score, which significantly improves the performance on tail labels by eliminating the effect of label-priors; (2)Taug augments tail labels via a decoupled learning scheme, which can yield more balanced classification boundary. We conduct experiments on commonly used XML benchmarks with hundreds of thousands of labels, showing that the proposed methods improve the performance of many state-of-the-art XML models by a considerable margin (6% performance gain with respect to PSP@1 on average). Anonymous source code is available at https://github.com/ReRANK-XML/rerank-XML.

Enhancing SVMs with Problem Context Aware Pipeline

Zeyi Wen,Zhishang Zhou,Hanfeng Liu,Bingsheng He,Xia Li,Jian Chen

In recent years, many data mining practitioners have treated deep neural networks (DNNs) as a standard recipe of creating the state-of-the-art solutions. As a result, models like Support Vector Machines (SVMs) have been overlooked. While the results from DNNs are encouraging, DNNs also come with their huge number of parameters in the model and overheads in long training/inference time. SVMs have excellent properties such as convexity, good generality and efficiency. In this paper, we propose techniques to enhance SVMs with an automatic pipeline which exploits the context of the learning problem. The pipeline consists of several components including data aware subproblem construction, feature customization, data balancing among subproblems with augmentation, and kernel hyper-parameter tuner. Comprehensive experiments show that our proposed solution is more efficient, while producing better results than the other SVM based approaches. Additionally, we conduct a case study of our proposed solution on a popular sentiment analysis problem---the aspect term sentiment analysis (ATSA) task. The study shows that our SVM based solution can achieve competitive predictive accuracy to DNN (and even majority of the BERT) based approaches. Furthermore, our solution is about 40 times faster in inference and has 100 times fewer parameters than the models using BERT. Our findings can encourage more research work on conventional machine learning techniques which may be a good alternative for smaller model size and faster training/inference.

Triple Adversarial Learning for Influence based Poisoning Attack in Recommender Systems

Chenwang Wu,Defu Lian,Yong Ge,Zhihao Zhu,Enhong Chen

As an important means to solve information overload, recommender systems have been widely applied in many fields, such as e-commerce and advertising. However, recent studies have shown that recommender systems are vulnerable to poisoning attacks; that is, injecting a group of carefully designed user profiles into the recommender system can severely affect recommendation quality. Despite the development from shilling attacks to optimization-based attacks, the imperceptibility and harmfulness of the generated data in most attacks are arduous to balance. To this end, we propose a triple adversarial learning for influence based poisoning attack (TrialAttack), a flexible end-to-end poisoning framework to generate non-notable and harmful user profiles. Specifically, given the input noise, TrialAttack directly generates malicious users through triple adversarial learning of the generator, discriminator, and influence module. Besides, to provide reliable influence for TrialAttack training, we explore a new approximation approach for estimating each fake users influence. Through theoretical analysis, we prove that the distribution characterized by TrialAttack approximates to the rating distribution of real users under the premise of performing an efficient attack. This property allows the injected users to attack in an unremarkable way. Experiments on three real-world datasets show that TrialAttacks attack performance outperforms state-of-the-art attacks, and the generated fake profiles are more difficult to detect compared to baselines.

Quantifying Uncertainty in Deep Spatiotemporal Forecasting

Dongxia Wu,Liyao Gao,Matteo Chinazzi,Xinyue Xiong,Alessandro Vespignani,Yi-An Ma,Rose Yu

Deep learning is gaining increasing popularity for spatiotemporal forecasting. However, prior works have mostly focused on point estimates without quantifying the uncertainty of the predictions. In high stakes domains, being able to generate probabilistic forecasts with confidence intervals is critical to risk assessment and decision making. Hence, a systematic study of uncertainty quantification (UQ) methods for spatiotemporal forecasting is missing in the community. In this paper, we describe two types of spatiotemporal forecasting problems: regular grid-based and graph-based. Then we analyze UQ methods from both the Bayesian and the frequentist point of view, casting in a unified framework via statistical decision theory. Through extensive experiments on real-world road network traffic, epidemics, and air quality forecasting tasks, we reveal the statistical and computational trade-offs for different UQ methods: Bayesian methods are typically more robust in mean prediction, while confidence levels obtained from frequentist methods provide more extensive coverage over data variations. Computationally, quantile regression type methods are cheaper for a single confidence interval but require re-training for different intervals. Sampling based methods generate samples that can form multiple confidence intervals, albeit at a higher computational cost.

Indirect Invisible Poisoning Attacks on Domain Adaptation

Jun Wu,Jingrui He

Unsupervised domain adaptation has been successfully applied across multiple high-impact applications, since it improves the generalization performance of a learning algorithm when the source and target domains are related. However, the adversarial vulnerability of domain adaptation models has largely been neglected. Most existing unsupervised domain adaptation algorithms might be easily fooled by an adversary, resulting in deteriorated prediction performance on the target domain, when transferring the knowledge from a maliciously manipulated source domain.To demonstrate the adversarial vulnerability of existing domain adaptation techniques, in this paper, we propose a generic data poisoning attack framework named I2Attack for domain adaptation with the following properties: (1) perceptibly unnoticeable: all the poisoned inputs are natural-looking; (2)adversarially indirect: only source examples are maliciously manipulated; (3) algorithmically invisible: both source classification error and marginal domain discrepancy between source and target domains will not increase. Specifically, it aims to degrade the overall prediction performance on the target domain by maximizing the label-informed domain discrepancy over both input feature space and class-label space be-tween source and target domains. Within this framework, a family of practical poisoning attacks are presented to fool the existing domain adaptation algorithms associated with different discrepancy measures. Extensive experiments on various domain adaptation benchmarks confirm the effectiveness and computational efficiency of our proposed I2Attack framework.

MapEmbed: Perfect Hashing with High Load Factor and Fast Update

Yuhan Wu,Zirui Liu,Xiang Yu,Jie Gui,Haochen Gan,Yuhao Han,Tao Li,Ori Rottenstreich,Tong Yang

Perfect hashing is a hash function that maps a set of distinct keys to a set of continuous integers without collision. However,most existing perfect hash schemes are static, which means that they cannot support incremental updates, while most datasets in practice are dynamic. To address this issue, we propose a novel hashing scheme, namely MapEmbed Hashing. Inspired by divide-and-conquer and map-and-reduce, our key idea is named map-and-embed and includes two phases: 1) Map all keys into many small virtual tables; 2) Embed all small tables into a large table by circular move. Our experimental results show that under the same experimental setting, the state-of-the-art perfect hashing (dynamic perfect hashing) can achieve around 15% load factor, around 0.3 Mops update speed, while our MapEmbed achieves around 90% ~ 95% load factor, and around 8.0 Mops update speed per thread. All codes of ours and other algorithms are open-sourced at GitHub.

Geometric Graph Representation Learning on Protein Structure Prediction

Tian Xia,Wei-Shinn Ku

Determining a proteins 3D from its sequences is one of the most challenging problems in biology. Recently, geometric deep learning has achieved great success on non-Euclidean domains including social networks, chemistry, and computer graphics. Although it is natural to present protein structures as 3D graphs, existing research has rarely studied protein structures as graphs directly. The present research explores the geometry deep learning of three-dimensional graphs on protein structures and proposes a graph neural network architecture to address these challenges. The proposed Protein Geometric Graph Neural Network (PG-GNN) models both distance geometric graph representation and dihedral geometric graph representation by geometric graph convolutions. This research shed new light on protein 3D structure studies. We investigated the effectiveness of graph neural networks over five real datasets. Our results demonstrate the potential of GNNs for 3D structure prediction.

Forecasting Interaction Order on Temporal Graphs

Wenwen Xia,Yuchen Li,Jianwei Tian,Shenghong Li

Link prediction is a fundamental task for graph analysis and the topic has been studied extensively for static or dynamic graphs. Essentially, the link prediction is formulated as a binary classification problem about two nodes. However, for temporal graphs, links (or interactions) among node sets appear in sequential orders. And the orders may lead to interesting applications. While a binary link prediction formulation fails to handle such an order-sensitive case. In this paper, we focus on such an interaction order prediction problem among a given node set on temporal graphs. For the technical aspect, we develop a graph neural network model named Temporal ATtention network (TAT), which utilizes the fine-grained time information on temporal graphs by encoding continuous real-valued timestamps as vectors. For each transformation layer of the model, we devise an attention mechanism to aggregate neighborhoods information based on their representations and time encodings attached to their specific edges. We also propose a novel training scheme to address the permutation-sensitive property of the problem. Experiments on several real-world temporal graphs reveal that TAT outperforms some state-of-the-art graph neural networks by 55% on average under the AUC metric.

Learning How to Propagate Messages in Graph Neural Networks

Teng Xiao,Zhengyu Chen,Donglin Wang,Suhang Wang

This paper studies the problem of learning message propagation strategies for graph neural networks (GNNs). One of the challenges for graph neural networks is that of defining the propagation strategy. For instance, the choices of propagation steps are often specialized to a single graph and are not personalized to different nodes. To compensate for this, in this paper, we present learning to propagate, a general learning framework that not only learns the GNN parameters for prediction but more importantly, can explicitly learn the interpretable and personalized propagate strategies for different nodes and various types of graphs. We introduce the optimal propagation steps as latent variables to help find the maximum-likelihood estimation of the GNN parameters in a variational Expectation- Maximization (VEM) framework. Extensive experiments on various types of graph benchmarks demonstrate that our proposed frame- work can significantly achieve better performance compared with the state-of-the-art methods, and can effectively learn personalized and interpretable propagate strategies of messages in GNNs.

Partial Multi-Label Learning with Meta Disambiguation

Ming-Kun Xie,Feng Sun,Sheng-Jun Huang

In partial multi-label learning (PML) problems, each instance is partially annotated with a candidate label set, which consists of multiple relevant labels and some noisy labels. To solve PML problems, existing methods typically try to recover the ground-truth information from partial annotations based on extra assumptions on the data structures. While the assumptions hardly hold in real-world applications, the trained model may not generalize well to varied PML tasks. In this paper, we propose a novel approach for partial multi-label learning with meta disambiguation (PML-MD). Instead of relying on extra assumptions, we try to disambiguate between ground-truth and noisy labels in a meta-learning fashion. On one hand, the multi-label classifier is trained by minimizing a confidence-weighted ranking loss, which distinctively utilizes the supervised information according to the label quality; on the other hand, the confidence for each candidate label is adaptively estimated with its performance on a small validation set. To speed up the optimization, these two procedures are performed alternately with an online approximation strategy. Comprehensive experiments on multiple datasets and varied evaluation metrics validate the effectiveness of the proposed method.

NAS-BERT: Task-Agnostic and Adaptive-Size BERT Compression with Neural Architecture Search

Jin Xu,Xu Tan,Renqian Luo,Kaitao Song,Jian Li,Tao Qin,Tie-Yan Liu

While pre-trained language models (e.g., BERT) have achieved impressive results on different natural language processing tasks, they have large numbers of parameters and suffer from big computational and memory costs, which make them difficult for real-world deployment. Therefore, model compression is necessary to reduce the computation and memory cost of pre-trained models. In this work, we aim to compress BERT and address the following two challenging practical issues: (1) The compression algorithm should be able to output multiple compressed models with different sizes and latencies, in order to support devices with different memory and latency limitations; (2) The algorithm should be downstream task agnostic, so that the compressed models are generally applicable for different downstream tasks. We leverage techniques in neural architecture search (NAS) and propose NAS-BERT, an efficient method for BERT compression. NAS-BERT trains a big supernet on a carefully designed search space containing a variety of architectures and outputs multiple compressed models with adaptive sizes and latency. Furthermore, the training of NAS-BERT is conducted on standard self-supervised pre-training tasks (e.g., masked language model) and does not depend on specific downstream tasks. Thus, the compressed models can be used across various downstream tasks. The technical challenge of NAS-BERT is that training a big supernet on the pre-training task is extremely costly. We employ several techniques including block-wise search, search space pruning, and performance approximation to improve search efficiency and accuracy. Extensive experiments on GLUE and SQuAD benchmark datasets demonstrate that NAS-BERT can find lightweight models with better accuracy than previous approaches, and can be directly applied to different downstream tasks with adaptive model sizes for different requirements of memory or latency.

Exploring Self-Supervised Representation Ensembles for COVID-19 Cough Classification

Hao Xue,Flora D. Salim

The usage of smartphone-collected respiratory sound, trained with deep learning models, for detecting and classifying COVID-19 becomes popular recently. It removes the need for in-person testing procedures especially for rural regions where related medical supplies, experienced workers, and equipment are limited. However, existing sound-based diagnostic approaches are trained in a fully-supervised manner, which requires large scale well-labelled data. It is critical to discover new methods to leverage unlabelled respiratory data, which can be obtained more easily. In this paper, we propose a novel self-supervised learning enabled framework for COVID-19 cough classification. A contrastive pre-training phase is introduced to train a Transformer-based feature encoder with unlabelled data. Specifically, we design a random masking mechanism to learn robust representations of respiratory sounds. The pre-trained feature encoder is then fine-tuned in the downstream phase to perform cough classification. In addition, different ensembles with varied random masking rates are also explored in the downstream phase. Through extensive evaluations, we demonstrate that the proposed contrastive pre-training, the random masking mechanism, and the ensemble architecture contribute to improving cough classification performance.

MTC: Multiresolution Tensor Completion from Partial and Coarse Observations

Chaoqi Yang,Navjot Singh,Cao Xiao,Cheng Qian,Edgar Solomonik,Jimeng Sun

Existing tensor completion formulation mostly relies on partial observations from a single tensor. However, tensors extracted from real-world data often are more complex due to: (i) Partial observation: Only a small subset of tensor elements are available. (ii) Coarse observation: Some tensor modes only present coarse and aggregated patterns (e.g., monthly summary instead of daily reports). In this paper, we are given a subset of the tensor and some aggregated/coarse observations (along one or more modes) and seek to recover the original fine-granular tensor with low-rank factorization. We formulate a coupled tensor completion problem and propose an efficient Multi-resolution Tensor Completion model (MTC) to solve the problem. Our MTC model explores tensor mode properties and leverages the hierarchy of resolutions to recursively initialize an optimization setup, and optimizes on the coupled system using alternating least squares. MTC ensures low computational and space complexity. We evaluate our model on two COVID-19 related spatio-temporal tensors. The experiments show that MTC could provide 65.20% and 75.79% percentage of fitness (PoF) in tensor completion with only 5% fine granular observations, which is 27.96% relative improvement over the best baseline. To evaluate the learned low-rank factors, we also design a tensor prediction task for daily and cumulative disease case predictions, where MTC achieves 50% in PoF and 30% relative improvements over the best baseline.

Numerical Formula Recognition from Tables

Qingping Yang,Yixuan Cao,Hongwei Li,Ping Luo

Claims over the numerical relationships among some measures are commonly expressed in tabular forms, and widely exist in the published documents on the Web. This paper introduces the problem of numerical formula recognition from tables, namely recognizing all numerical formulas inside a given table. It can well support many interesting downstream applications, such as numerical error correction in tables, formula recommendation in tables. Here, we emphasize that table is a kind of language that adopts a different linguistic paradigm from natural language. It uses visual grammar like visual layout and visual settings (e.g., indentation, font style) to express the grammatical relationships among the table cells. Understanding tables and recognizing formulas require decoding the visual grammar while simultaneously understanding the textual information. Another challenge is that formulas are complicated in terms of diverse math functions and variable-length of arguments. To address these challenges, we convert this task into a uniform framework, extracting relations of table cell pairs in a table. A two-channel neural network model TaFor is proposed to embed both the textual and visual features for a table cell. Our framework achieves the formula-level F1-score = 0.90 on a real-world dataset of 190179 tables while a retrieval-based method achieves F1-score = 0.72. We also perform extensive experiments to demonstrate the effectiveness of each component in our model, and conduct a case study to discuss the limits of the proposed model. With our published data this study also aims to attract the communitys interest in deep semantic understanding over tables.

TopNet: Learning from Neural Topic Model to Generate Long Stories

Yazheng Yang,Boyuan Pan,Deng Cai,Huan Sun

Long story generation (LSG) is one of the coveted goals in natural language processing. Different from most text generation tasks, LSG requires to output a long story of rich content based on a much shorter text input, and often suffers from information sparsity. In this paper, we propose TopNet to alleviate this problem, by leveraging the recent advances in neural topic modeling to obtain high-quality skeleton words to complement the short input. In particular, instead of directly generating a story, we first learn to map the short text input to a low-dimensional topic distribution (which is pre-assigned by a topic model). Based on this latent topic distribution, we can use the reconstruction decoder of the topic model to sample a sequence of inter-related words as a skeleton for the story. Experiments on two benchmark datasets show that our proposed framework is highly effective in skeleton word selection and significantly outperforms the state-of-the-art models in both automatic evaluation and human evaluation.

Context-aware Outstanding Fact Mining from Knowledge Graphs

Yueji Yang,Yuchen Li,Panagiotis Karras,Anthony K. H. Tung

An Outstanding Fact (OF) is an attribute that makes a target entity stand out from its peers. The mining of OFs has important applications, especially in Computational Journalism, such as news promotion, fact-checking, and news story finding. However, existing approaches to OF mining: (i) disregard the context in which the target entity appears, hence may report facts irrelevant to that context; and (ii) require relational data, which are often unavailable or incomplete in many application domains. In this paper, we introduce the novel problem of mining Context-aware Outstanding Facts (COFs) for a target entity under a given context specified by a context entity. We propose FMiner, a context-aware mining framework that leverages knowledge graphs (KGs) for COF mining. FMiner generates COFs in two steps. First, it discovers top-k relevant relationships between the target and the context entity from a KG. We propose novel optimizations and pruning techniques to expedite this operation, as this process is very expensive on large KGs due to its exponential complexity. Second, for each derived relationship, we find the attributes of the target entity that distinguish it from peer entities that have the same relationship with the context entity, yielding the top-l COFs. As such, the mining process is modeled as a top-(k,l) search problem. Context-awareness is ensured by relying on the relevant relationships with the context entity to derive peer entities for COF extraction. Consequently, FMiner can effectively navigate the search to obtain context-aware OFs by incorporating a context entity. We conduct extensive experiments, including a user study, to validate the efficiency and the effectiveness of FMiner.

Defending Privacy Against More Knowledgeable Membership Inference Attackers

Yu Yin,Ke Chen,Lidan Shou,Gang Chen

Membership Inference Attack (MIA) in deep learning is a common form of privacy attack which aims to infer whether a data sample is in a target classifiers training dataset or not. Previous studies of MIA typically tackle either a black-box or a white-box adversary model, assuming an attacker not knowing (or knowing) the structure and parameters of the target classifier while having access to the confidence vector of the query output. With the popularity of privacy protection methods such as differential privacy, it is increasingly easier for an attacker to obtain the defense method adopted by the target classifier, which poses extra challenge to privacy protection. In this paper, we name such attacker a crystal-box adversary. We present definitions for utility and privacy of target classifier, and formulate the design goal of the defense method as an optimization problem. We also conduct theoretical analysis on the respective forms of the optimization for three adversary models, namely black-box, white-box, and crystal-box, and prove that the optimization problem is NP-hard. Thereby we solve a surrogate problem and propose three defense methods, which, if used together, can make trade-off between utility and privacy. A notable advantage of our approach is that it can be used to resist attacks from three adversary models, namely black-box, white-box, and crystal-box, simultaneously. Evaluation results show effectiveness of our proposed approach for defending privacy against MIA and better performance compared to previous defense methods.

Performance-Adaptive Sampling Strategy Towards Fast and Accurate Graph Neural Networks

Minji Yoon,Thu00e9ophile Gervet,Baoxu Shi,Sufeng Niu,Qi He,Jaewon Yang

The main challenge of adapting Graph convolutional networks (GCNs) to large-scale graphs is the scalability issue due to the uncontrollable neighborhood expansion in the aggregation stage. Several sampling algorithms have been proposed to limit the neighborhood expansion. However, these algorithms focus on minimizing the variance in sampling to approximate the original aggregation. This leads to two critical problems: 1) low accuracy because the sampling policy is agnostic to the performance of the target task, and 2) vulnerability to noise or adversarial attacks on the graph.In this paper, we propose a performance-adaptive sampling strategy PASS that samples neighbors informative for a target task. PASS optimizes directly towards task performance, as opposed to variance reduction. PASS trains a sampling policy by propagating gradients of the task performance loss through GCNs and the non-differentiable sampling operation. We dissect the back-propagation process and analyze how PASS learns from the gradients which neighbors are informative and assigned high sampling probabilities. In our extensive experiments, PASS outperforms state-of-the-art sampling methods by up to 10% accuracy on public benchmarks and up to 53% accuracy in the presence of adversarial attacks.

Extremely Compact Non-local Representation Learning

Ansheng You,Xiangzeng Zhou,Yingya Zhang,Pan Pan,Yinghui Xu

In contrast to regular convolutions with local receptive fields, non-local operations have widely proven an effective method for modeling long-range dependencies. Although lots of prior works have been proposed, prohibitive computation and GPU memory occupation are still the major concerns. Different from that carrying out non-local operations pixel-wise or channel-wise in a computation intensive way, we argue that we can achieve effective non-local operation using a more compact high-order statistic, which can be computed more efficiently and may convey some high-level information. In this paper, we propose an extremely compact non-local learning module (CoNL) with high-order reasoning based on a graph convolution as the core. In our CoNL, a global Hadamard pooling (GHP) as a non-local operation is used to extract a compact second-order feature vector from the input tensor. With the help of a light-weight graph convolution network (GCN), this high-order compact vector is further refined with high-level reasoning. After the GCN refinement, the compact high-order vector intuitively indicates some global semantic characteristics, and is eventually applied to enhance the input tensor through a channel scaling operation. The CoNL module is designed easily pluggable to upgrade existing networks. Extensive experiments on a wide range of tasks demonstrate the effectiveness and efficiency of our work. The proposed CoNL can achieve comparable or superior performance over previous state-of-the-art baselines on video recognition, semantic segmentation, object detection and instance segmentation tasks. For a 96 x 96 x 2048 input, our block consumes 13.6 x less in computational cost than non-local block while 7.6 x smaller in GPU memory occupation.

Fed2: Feature-Aligned Federated Learning

Fuxun Yu,Weishan Zhang,Zhuwei Qin,Zirui Xu,Di Wang,Chenchen Liu,Zhi Tian,Xiang Chen

Federated learning learns from scattered data by fusing collaborative models from local nodes. However, conventional coordinate-based model averaging by FedAvg ignored the random information encoded per parameter and may suffer from structural feature misalignment. In this work, we propose Fed2, a feature-aligned federated learning framework to resolve this issue by establishing a firm structure-feature alignment across the collaborative models. Fed2 is composed of two major designs: First, we design a feature-oriented model structure adaptation method to ensure explicit feature allocation in different neural network structures. Applying the structure adaptation to collaborative models, matchable structures with similar feature information can be initialized at the very early training stage. During the federated learning process, we then propose a feature paired averaging scheme to guarantee aligned feature distribution and maintain no feature fusion conflicts under either IID or non-IID scenarios. Eventually, Fed2 could effectively enhance the federated learning convergence performance under extensive homo- and heterogeneous settings, providing excellent convergence speed, accuracy, and computation/communication efficiency.

Socially-Aware Self-Supervised Tri-Training for Recommendation

Junliang Yu,Hongzhi Yin,Min Gao,Xin Xia,Xiangliang Zhang,Nguyen Quoc Viet Hung

Self-supervised learning (SSL), which can automatically generate ground-truth samples from raw data, holds vast potential to improve recommender systems. Most existing SSL-based methods perturb the raw data graph with uniform node/edge dropout to generate new data views and then conduct the self-discrimination based contrastive learning over different views to learn generalizable representations. Under this scheme, only a bijective mapping is built between nodes in two different views, which means that the self-supervision signals from other nodes are being neglected. Due to the widely observed homophily in recommender systems, we argue that the supervisory signals from other nodes are also highly likely to benefit the representation learning for recommendation. To capture these signals, a general socially-aware SSL framework that integrates tri-training is proposed in this paper. Technically, our framework first augments the user data views with the user social information. And then under the regime of tri-training for multi-view encoding, the framework builds three graph encoders (one for recommendation) upon the augmented views and iteratively improves each encoder with self-supervision signals from other users, generated by the other two encoders. Since the tri-training operates on the augmented views of the same data sources for self-supervision signals, we name it self-supervised tri-training. Extensive experiments on multiple real-world datasets consistently validate the effectiveness of the self-supervised tri-training framework for improving recommendation. The code is released at https://github.com/Coder-Yu/QRec.

Efficient Optimization Methods for Extreme Similarity Learning with Nonlinear Embeddings

Bowen Yuan,Yu-Sheng Li,Pengrui Quan,Chih-Jen Lin

We study the problem of learning similarity by using nonlinear embedding models (e.g., neural networks) from all possible pairs. This problem is well-known for its difficulty of training with the extreme number of pairs. For the special case of using linear embeddings, many studies have addressed this issue of handling all pairs by considering certain loss functions and developing efficient optimization algorithms. This paper aims to extend results for general nonlinear embeddings. First, we finish detailed derivations and provide clean formulations for efficiently calculating some building blocks of optimization algorithms such as function, gradient evaluation, and Hessian-vector product. The result enables the use of many optimization methods for extreme similarity learning with nonlinear embeddings. Second, we study some optimization methods in detail. Due to the use of nonlinear embeddings, implementation issues different from linear cases are addressed. In the end, some methods are shown to be highly efficient for extreme similarity learning with nonlinear embeddings.

Enhancing Taxonomy Completion with Concept Generation via Fusing Relational Representations

Qingkai Zeng,Jinfeng Lin,Wenhao Yu,Jane Cleland-Huang,Meng Jiang

Automatic construction of a taxonomy supports many applications in e-commerce, web search, and question answering. Existing taxonomy expansion or completion methods assume that new concepts have been accurately extracted and their embedding vectors learned from the text corpus. However, one critical and fundamental challenge in fixing the incompleteness of taxonomies is the incompleteness of the extracted concepts, especially for those whose names have multiple words and consequently low frequency in the corpus. To resolve the limitations of extraction-based methods, we propose GenTaxo to enhance taxonomy completion by identifying positions in existing taxonomies that need new concepts and then generating appropriate concept names. Instead of relying on the corpus for concept embeddings, GenTaxo learns the contextual embeddings from their surrounding graph-based and language-based relational information, and leverages the corpus for pre-training a concept name generator. Experimental results demonstrate that GenTaxo improves the completeness of taxonomies over existing methods.

A Transformer-based Framework for Multivariate Time Series Representation Learning

George Zerveas,Srideepika Jayaraman,Dhaval Patel,Anuradha Bhamidipaty,Carsten Eickhoff

We present a novel framework for multivariate time series representation learning based on the transformer encoder architecture. The framework includes an unsupervised pre-training scheme, which can offer substantial performance benefits over fully supervised learning on downstream tasks, both with but even without leveraging additional unlabeled data, i.e., by reusing the existing data samples. Evaluating our framework on several public multivariate time series datasets from various domains and with diverse characteristics, we demonstrate that it performs significantly better than the best currently available methods for regression and classification, even for datasets which consist of only a few hundred training samples. Given the pronounced interest in unsupervised learning for nearly all domains in the sciences and in industry, these findings represent an important landmark, presenting the first unsupervised method shown to push the limits of state-of-the-art performance for multivariate time series regression and classification.

Off-Policy Evaluation via Adaptive Weighting with Data from Contextual Bandits

Ruohan Zhan,Vitor Hadad,David A. Hirshberg,Susan Athey

It has become increasingly common for data to be collected adaptively, for example using contextual bandits. Historical data of this type can be used to evaluate other treatment assignment policies to guide future innovation or experiments. However, policy evaluation is challenging if the target policy differs from the one used to collect data, and popular estimators, including doubly robust (DR) estimators, can be plagued by bias, excessive variance, or both. In particular, when the pattern of treatment assignment in the collected data looks little like the pattern generated by the policy to be evaluated, the importance weights used in DR estimators explode, leading to excessive variance.In this paper, we improve the DR estimator by adaptively weighting observations to control its variance. We show that a t-statistic based on our improved estimator is asymptotically normal under certain conditions, allowing us to form confidence intervals and test hypotheses. Using synthetic data and public benchmarks, we provide empirical evidence for our estimators improved accuracy and inferential properties relative to existing alternatives.

Efficient Incremental Computation of Aggregations over Sliding Windows

Chao Zhang,Reza Akbarinia,Farouk Toumani

Computing aggregation over sliding windows, i.e., finite subsets of an unbounded stream, is a core operation in streaming analytics. We propose PBA (Parallel Boundary Aggregator), a novel parallel algorithm that groups continuous slices of streaming values into chunks and exploits two buffers, cumulative slice aggregations and left cumulative slice aggregations, to compute sliding window aggregations efficiently. PBA runs in O(1) time, performing at most 3 merging operations per slide while consuming O(n) space for windows with n partial aggregations. Our empirical experiments demonstrate that PBA can improve throughput up to 4X while reducing latency, compared to state-of-the-art algorithms.

Data Poisoning Attack against Recommender System Using Incomplete and Perturbed Data

Hengtong Zhang,Changxin Tian,Yaliang Li,Lu Su,Nan Yang,Wayne Xin Zhao,Jing Gao

Recent studies reveal that recommender systems are vulnerable to data poisoning attack due to their openness nature. In data poisoning attack, the attacker typically recruits a group of controlled users to inject well-crafted user-item interaction data into the recommendation models training set to modify the model parameters as desired. Thus, existing attack approaches usually require full access to the training data to infer items characteristics and craft the fake interactions for controlled users. However, such attack approaches may not be feasible in practice due to the attackers limited data collection capability and the restricted access to the training data, which sometimes are even perturbed by the privacy preserving mechanism of the service providers. Such design-reality gap may cause failure of attacks. In this paper, we fill the gap by proposing two novel adversarial attack approaches to handle the incompleteness and perturbations in user-item interaction data. First, we propose a bi-level optimization framework that incorporates a probabilistic generative model to find the users and items whose interaction data is sufficient and has not been significantly perturbed, and leverage these users and items data to craft fake user-item interactions. Moreover, we reverse the learning process of recommendation models and develop a simple yet effective approach that can incorporate context-specific heuristic rules to handle data incompleteness and perturbations. Extensive experiments on two datasets against three representative recommendation models show that the proposed approaches can achieve better attack performance than existing approaches.

Data Poisoning Attacks Against Outcome Interpretations of Predictive Models

Hengtong Zhang,Jing Gao,Lu Su

The past decades have witnessed significant progress towards improving the accuracy of predictions powered by complex machine learning models. Despite much success, the lack of model interpretability prevents the usage of these techniques in life-critical systems such as medical diagnosis and self-driving systems. Recently, the interpretability issue has received much attention, and one critical task is to explain why a predictive model makes a specific decision. We refer to this task as outcome interpretation. Many outcome interpretation methods have been developed to produce human-understandable interpretations by utilizing intermediate results of the machine learning models, such as gradients and model parameters.Although the effectiveness of outcome interpretation approaches has been shown in a benign environment, their robustness against data poisoning attacks (i.e., attacks at the training phase) has not been studied. As the first work towards this direction, we aim to answer an important question: Can training-phase adversarial samples manipulate the outcome interpretation of target samples? To answer this question, we propose a data poisoning attack framework named IMF (Interpretation Manipulation Framework), which can manipulate the interpretations of target samples produced by representative outcome interpretation methods. Extensive evaluations verify the effectiveness and efficiency of the proposed attack strategies on two real-world datasets.

ELITE: Robust Deep Anomaly Detection with Meta Gradient

Huayi Zhang,Lei Cao,Peter VanNostrand,Samuel Madden,Elke A. Rundensteiner

Deep Learning techniques have been widely used in detecting anomalies from complex data. Most of these techniques are either unsupervised or semi-supervised because of a lack of a large number of labeled anomalies. However, they typically rely on a clean training data not polluted by anomalies to learn the distribution of the normal data. Otherwise, the learned distribution tends to be distorted and hence ineffective in distinguishing between normal and abnormal data. To solve this problem, we propose a novel approach called ELITE that uses a small number of labeled examples to infer the anomalies hidden in the training samples. It then turns these anomalies into useful signals that help to better detect anomalies from user data. Unlike the classical semi-supervised classification strategy which uses labeled examples as training data, ELITE uses them as validation set. It leverages the gradient of the validation loss to predict if one training sample is abnormal. The intuition is that correctly identifying the hidden anomalies could produce a better deep anomaly model with reduced validation loss. Our experiments on public benchmark datasets show that ELITE achieves up to 30% improvement in ROC AUC comparing to the state-of-the-art, yet robust to polluted training data.

Knowledge-Enhanced Domain Adaptation in Few-Shot Relation Classification

Jiawen Zhang,Jiaqi Zhu,Yi Yang,Wandong Shi,Congcong Zhang,Hongan Wang

Relation classification (RC) is an important task in knowledge extraction from texts, while data-driven approaches, although achieving high performance, heavily rely on a large amount of annotated training data. Recently, many few-shot RC models have been proposed and yielded promising results in general domain datasets, but when adapting to a specific domain, such as medicine, the performance drops dramatically. In this paper, we propose a Knowledge-Enhanced Few-shot RC model for the Domain Adaptation task (KEFDA), which incorporates general and domain-specific knowledge graphs (KGs) to the RC model to improve its domain adaptability. With the help of concept-level KGs, the model can better understand the semantics of texts and easily summarize the global semantics of relation types from only a few instances. To be more important, as a kind of meta-information, the manner of utilizing KGs can be transferred from existing tasks to new tasks, even across domains. Specifically, we design a knowledge-enhanced prototypical network to conduct instance matching, and a relation-meta learning network for implicit relation matching. The two scoring functions are combined to infer the relation type of a new instance. Experimental results on the Domain Adaptation Challenge in the FewRel 2.0 benchmark demonstrate that our approach significantly outperforms the state-of-the-art models (by 6.63% on average).

Attentive Heterogeneous Graph Embedding for Job Mobility Prediction

Le Zhang,Ding Zhou,Hengshu Zhu,Tong Xu,Rui Zha,Enhong Chen,Hui Xiong

Job mobility prediction is an emerging research topic that can benefit both organizations and talents in various ways, such as job recommendation, talent recruitment, and career planning. Nevertheless, most existing studies only focus on modeling the individual-level career trajectories of talents, while the impact of macro-level job transition relationships (e.g., talent flow among companies and job positions) has been largely neglected. To this end, in this paper we propose an enhanced approach to job mobility prediction based on a heterogeneous company-position network constructed from the massive career trajectory data. Specifically, we design an Attentive heterogeneous graph embedding for sequential prediction (Ahead) framework to predict the next career move of talents, which contains two components, namely an attentive heterogeneous graph embedding (AHGN) model and a Dual-GRU model for career path mining. In particular, the AHGN model is used to learn the comprehensive representation for company and position on the heterogeneous network, in which two kinds of aggregators are employed to aggregate the information from external and internal neighbors for a node. Afterwards, a novel type-attention mechanism is designed to automatically fuse the information of the two aggregators for updating node representations. Moreover, the Dual-GRU model is devised to model the parallel sequences that appear in pair, which can be used to capture the sequential interactive information between companies and positions. Finally, we conduct extensive experiments on a real-world dataset for evaluating our Ahead framework. The experimental results clearly validate the effectiveness of our approach compared with the state-of-the-art baselines in terms of job mobility prediction.

Balancing Consistency and Disparity in Network Alignment

Si Zhang,Hanghang Tong,Long Jin,Yinglong Xia,Yunsong Guo

Network alignment plays an important role in a variety of applications. Many traditional methods explicitly or implicitly assume the alignment consistency which might suffer from over-smoothness, whereas some recent embedding based methods could somewhat embrace the alignment disparity by sampling negative alignment pairs. However, under different or even competing designs of negative sampling distributions, some methods advocate positive correlation which could result in false negative samples incorrectly violating the alignment consistency, whereas others champion negative correlation or uniform distribution to sample nodes which may contribute little to learning meaningful embeddings. In this paper, we demystify the intrinsic relationships behind various network alignment methods and between these competing design principles of sampling. Specifically, in terms of model design, we theoretically reveal the close connections between a special graph convolutional network model and the traditional consistency based alignment method. For model training, we quantify the risk of embedding learning for network alignment with respect to the sampling distributions. Based on these, we propose NeXtAlign which strikes a balance between alignment consistency and disparity. We conduct extensive experiments that demonstrate the proposed method achieves significant improvements over the state-of-the-arts.

Where are we in embedding spaces?

Sixiao Zhang,Hongxu Chen,Xiao Ming,Lizhen Cui,Hongzhi Yin,Guandong Xu

Hyperbolic space and hyperbolic embeddings are becoming a popular research field for recommender systems. However, it is not clear under what circumstances the hyperbolic space should be considered. To fill this gap, This paper provides theoretical analysis and empirical results on when and where to use hyperbolic space and hyperbolic embeddings in recommender systems. Specifically, we answer the questions that which type of models and datasets are more suited for hyperbolic space, as well as which latent size to choose. We evaluate our answers by comparing the performance of Euclidean space and hyperbolic space on different latent space models in both general item recommendation domain and social recommendation domain, with 6 widely used datasets and different latent sizes. Additionally, we propose a new metric learning based recommendation method called SCML and its hyperbolic version HSCML. We evaluate our conclusions regarding hyperbolic space on SCML and show the state-of-the-art performance of hyperbolic space by comparing HSCML with other baseline methods.

ROD: Reception-aware Online Distillation for Sparse Graphs

Wentao Zhang,Yuezihan Jiang,Yang Li,Zeang Sheng,Yu Shen,Xupeng Miao,Liang Wang,Zhi Yang,Bin Cui

Graph neural networks (GNNs) have been widely used in many graph-based tasks such as node classification, link prediction, and node clustering. However, GNNs gain their performance benefits mainly from performing the feature propagation and smoothing across the edges of the graph, thus requiring sufficient connectivity and label information for effective propagation. Unfortunately, many real-world networks are sparse in terms of both edges and labels, leading to sub-optimal performance of GNNs. Recent interest in this sparse problem has focused on the self-training approach, which expands supervised signals with pseudo labels. Nevertheless, the self-training approach inherently cannot realize the full potential of refining the learning performance on sparse graphs due to the unsatisfactory quality and quantity of pseudo labels.In this paper, we propose ROD, a novel reception-aware online knowledge distillation approach for sparse graph learning. We design three supervision signals for ROD: multi-scale reception-aware graph knowledge, task-based supervision, and rich distilled knowledge, allowing online knowledge transfer in a peer-teaching manner. To extract knowledge concealed in the multi-scale reception fields, ROD explicitly requires individual student models to preserve different levels of locality information. For a given task, each student would predict based on its reception-scale knowledge, while simultaneously a strong teacher is established on-the-fly by combining multi-scale knowledge. Our approach has been extensively evaluated on 9 datasets and a variety of graph-based tasks, including node classification, link prediction, and node clustering. The result demonstrates that ROD achieves state-of-art performance and is more robust for the graph sparsity.

Learning Based Proximity Matrix Factorization for Node Embedding

Xingyi Zhang,Kun Xie,Sibo Wang,Zengfeng Huang

Node embedding learns a low-dimensional representation for each node in the graph. Recent progress on node embedding shows that proximity matrix factorization methods gain superb performance and scale to large graphs with millions of nodes. Existing approaches first define a proximity matrix and then learn the embeddings that fit the proximity by matrix factorization. Most existing matrix factorization methods adopt the same proximity for different tasks, while it is observed that different tasks and datasets may require different proximity, limiting their representation power.Motivated by this, we propose Lemane, a framework with trainable proximity measures, which can be learned to best suit the datasets and tasks at hand automatically. Our method is end-to-end, which incorporates differentiable SVD in the pipeline so that the parameters can be trained via backpropagation. However, this learning process is still expensive on large graphs. To improve the scalability, we train proximity measures only on carefully subsampled graphs, and then apply standard proximity matrix factorization on the original graph using the learned proximity. Note that, computing the learned proximities for each pair is still expensive for large graphs, and existing techniques for computing proximities are not applicable to the learned proximities. Thus, we present generalized push techniques to make our solution scalable to large graphs with millions of nodes. Extensive experiments show that our proposed solution outperforms existing solutions on both link prediction and node classification tasks on almost all datasets.

Multi-Task Learning via Generalized Tensor Trace Norm

Yi Zhang,Yu Zhang,Wei Wang

The trace norm is widely used in multi-task learning as it can discover low-rank structures among tasks in terms of model parameters. Nowadays, with the emerging of big complex datasets and the popularity of deep learning techniques, tensor trace norms have been used for deep multi-task models. However, existing tensor trace norms cannot discover all the low-rank structures and they require users to determine the importance of their components manually. To solve those two issues, in this paper, we propose a Generalized Tensor Trace Norm (GTTN). The GTTN is defined as a convex combination of matrix trace norms of all possible tensor flattenings and hence it can discover all the possible low-rank structures. Based on the induced objective function with the GTTN, we can learn combination coefficients in the GTTN with several strategies. Experiments on real-world datasets demonstrate the effectiveness of the proposed GTTN.

Initialization Matters: Regularizing Manifold-informed Initialization for Neural Recommendation Systems

Yinan Zhang,Boyang Li,Yong Liu,Hao Wang,Chunyan Miao

Proper initialization is crucial to the optimization and the generalization of neural networks. However, most existing neural recommendation systems initialize the user and item embeddings randomly. In this work, we propose a new initialization scheme for user and item embeddings called Laplacian Eigenmaps with Popularity-based Regularization for Isolated Data (LEPORID). LEPORID endows the embeddings with information regarding multi-scale neighborhood structures on the data manifold and performs adaptive regularization to compensate for high embedding variance on the tail of the data distribution. Exploiting matrix sparsity, LEPORID embeddings can be computed efficiently. We evaluate LEPORID in a wide range of neural recommendation models. In contrast to the recent surprising finding that the simple K-nearest-neighbor (KNN) method often outperforms neural recommendation systems, we show that existing neural systems initialized with LEPORID often perform on par or better than KNN. To maximize the effects of the initialization, we propose the Dual-Loss Residual Recommendation (DLR^2) network, which, when initialized with LEPORID, substantially outperforms both traditional and state-of-the-art neural recommender systems.

H2MN: Graph Similarity Learning with Hierarchical Hypergraph Matching Networks

Zhen Zhang,Jiajun Bu,Martin Ester,Zhao Li,Chengwei Yao,Zhi Yu,Can Wang

Graph similarity learning, which measures the similarities between a pair of graph-structured objects, lies at the core of various machine learning tasks such as graph classification, similarity search, etc. In this paper, we devise a novel graph neural network based framework to address this challenging problem, motivated by its great success in graph representation learning. As the vast majority of existing graph neural network models mainly concentrate on learning effective node or graph level representations of a single graph, little effort has been made to jointly reason over a pair of graph-structured inputs for graph similarity learning. To this end, we propose Hierarchical Hypergraph Matching Networks (H2sup>MN) to calculate the similarities between graph pairs with arbitrary structure. Specifically, our proposed H2MN learns graph representation from the perspective of hypergraph, and takes each hyperedge as a subgraph to perform subgraph matching, which could capture the rich substructure similarities across the graph. To enable hierarchical graph representation and fast similarity computation, we further propose a hyperedge pooling operator to transform each graph into a coarse graph of reduced size. Then, a multi-perspective cross-graph matching layer is employed on the coarsened graph pairs to extract the inter-graph similarity. Comprehensive experiments on five public datasets empirically demonstrate that our proposed model can outperform state-of-the-art baselines with different gains for graph-graph classification and regression tasks.

Fairness-Aware Online Meta-learning

Chen Zhao,Feng Chen,Bhavani Thuraisingham

In contrast to offline working fashions, two research paradigms are devised for online learning: (1) Online Meta-Learning (OML)[6, 20, 26] learns good priors over model parameters (or learning to learn) in a sequential setting where tasks are revealed one after another. Although it provides a sub-linear regret bound, such techniques completely ignore the importance of learning with fairness which is a significant hallmark of human intelligence. (2) Online Fairness-Aware Learning [1, 8, 21]. This setting captures many classification problems for which fairness is a concern. But it aims to attain zero-shot generalization without any task-specific adaptation. This therefore limits the capability of a model to adapt onto newly arrived data. To overcome such issues and bridge the gap, in this paper for the first time we proposed a novel online meta-learning algorithm, namely FFML, which is under the setting of unfairness prevention. The key part of FFML is to learn good priors of an online fair classification models primal and dual parameters that are associated with the models accuracy and fairness, respectively. The problem is formulated in the form of a bi-level convex-concave optimization. The theoretic analysis provides sub-linear upper bounds O(log T)for loss regret and O(u221alog T)violation of cumulative fairness constraints. Our experiments demonstrate the versatility of FFML by applying it to classification on three real-world datasets and show substantial improvements over the best prior work on the tradeoff between fairness and classification accuracy.

Temporal Biased Streaming Submodular Optimization

Junzhou Zhao,Pinghui Wang,Chao Deng,Jing Tao

Submodular optimization lies at the core of many data mining and machine learning applications such as data summarization and subset selection. For data streams where elements arrive one at a time, streaming submodular optimization (SSO) algorithms are desired. Existing SSO solutions are mainly designed for insertion-only streams where elements in the stream all participate in the analysis, or sliding-window streams where only the most recent data participates in the analysis. SSO for insertion-only streams does not sufficiently emphasize recent data. SSO for sliding-window streams abruptly forgets all past data. In this work, we propose a new SSO problem, i.e., temporal biased streaming submodular optimization (TBSSO), which embraces the special settings of all previous studies. TBSSO leverages a temporal bias function to force each element in the stream to participate in the analysis with a probability decreasing over time and hence elements in the stream are forgotten gradually. We design novel streaming algorithms to solve the TBSSO problem with provable approximation guarantees. Experiments show that our algorithm can find high quality solutions and improve the efficiency to about one order of magnitude faster than the baseline method.

Cluster-Reduce: Compressing Sketches for Distributed Data Streams

Yikai Zhao,Zheng Zhong,Yuanpeng Li,Yi Zhou,Yifan Zhu,Li Chen,Yi Wang,Tong Yang

Sketches, a type of probabilistic algorithms, have been widely accepted as the approximate summary of data streams. Compressing sketches is the best choice in distributed data streams to reduce communication overhead. The ideal compression algorithm should meet the following three requirements: high efficiency of compression procedure, support of direct query without decompression, and high accuracy of compressed sketches. However, no prior work can meet these requirements at the same time. Especially, the accuracy is poor after compression using existing methods. In this paper, we propose Cluster-Reduce, a framework for compressing sketches, which can meet all three requirements. Our key technique nearness clustering rearranges the adjacent counters with similar values in the sketch to significantly improve the accuracy. We use Cluster-Reduce to compress four kinds of sketches in two use-cases: distributed data streams and distributed machine learning. Extensive experimental results show that Cluster-Reduce can achieve up to 60 times smaller error than prior works. The source codes of Cluster-Reduce are available at Github anonymously[1].

Multi-graph Multi-label Learning with Dual-granularity Labeling

Yuhai Zhao,Yejiang Wang,Zhengkui Wang,Chengqi Zhang

Graphs are a powerful and versatile data structure that easily captures real life relationship. Multi-graph Multi-label learning (MGML) is a supervised learning task, which aims to learn a Multi-label classifier to label a set of objects of interest (e.g. image or text) with a bag-of-graphs representation. However, prior techniques on the MGML are developed based on transferring graphs into instances that does not fully utilize the structure information in the learning, and focus on learning the unseen labels only at the bag level. There is no existing work studying how to label the graphs within a bag that is of importance in many applications like image or text annotation. To bridge this gap, in this paper, we present a novel coarse and fine-grained Multi-graph Multi-label (cfMGML) learning framework which directly builds the learning model over the graphs and empowers the label prediction at both the coarse (aka. bag) level and fine-grained (aka. graph in each bag) level. In particular, given a set of labeled multi-graph bags, we design the scoring functions at both graph and bag levels to model the relevance between the label and data using specific graph kernels. Meanwhile, we propose a thresholding rank-loss objective function to rank the labels for the graphs and bags and minimize the hamming-loss simultaneously at one-step, which aims to address the error accumulation issue in traditional rank-loss algorithms. To tackle the non-convex optimization problem, we further develop an effective sub-gradient descent algorithm to handle high-dimensional space computation required in cfMGML. Experiments over various real-world datasets demonstrate cfMGML achieves superior performance than the state-of-arts algorithms.

Accelerating Set Intersections over Graphs by Reducing-Merging

Weiguo Zheng,Yifan Yang,Chengzhi Piao

Given two sets of vertices Sa and Sb of a graph, computing their common vertices, namely set intersection, is one primitive operation in many graph algorithms such as triangle counting, maximal clique enumeration, and subgraph matching. Thus, accelerating set intersections is beneficial to these algorithms. In the paper, we propose a novel reducing-merging framework for set intersections over graphs rather than intersecting the two sets directly. In the reducing phase, the vertices that cannot fall into the intersection are screened out by applying the range reduction. Based on the truncated subsets, the intersection can be easily obtained using the classic merging algorithm. To optimize the range codes that sketch the vertices, we formulate the problem of range code optimization and prove its NP-hardness. We develop efficient yet effective algorithms for two typical scenarios global intersection and local intersection. Moreover, we present a novel two-level merging algorithm to enhance the performance. The results of extensive experiments over real graphs show that our approach can achieve significant speedups compared to the merge-based algorithm.

Triplet Attention: Rethinking the Similarity in Transformers

Haoyi Zhou,Jianxin Li,Jieqi Peng,Shuai Zhang,Shanghang Zhang

The Transformer model has benefited various real-world applications, where the self-attention mechanism with dot-products shows superior alignment ability on building long dependency. However, the pair-wisely attended self-attention limits further performance improvement on challenging tasks. To the extent of our knowledge, this is the first work to define the Triplet Attention (A3) for Transformer, which introduces triplet connections as the complementary dependency. Specifically, we define the triplet attention based on the scalar triplet product, which may be interchangeably used with the canonical one within the multi-head attention. It allows the self-attention mechanism to attend to diverse triplets and capture complex dependency. Then, we utilize the permuted formulation and kernel tricks to establish a linear approximation to A3. The proposed architecture could be smoothly integrated into the pre-training by modifying head configurations. Extensive experiments show that our methods achieve significant performance improvement on various tasks and two benchmarks.

Table2Charts: Recommending Charts by Learning Shared Table Representations

Mengyu Zhou,Qingtao Li,Xinyi He,Yuejiang Li,Yibo Liu,Wei Ji,Shi Han,Yining Chen,Daxin Jiang,Dongmei Zhang

It is common for people to create different types of charts to explore a multi-dimensional dataset (table). However, to recommend commonly composed charts in real world, one should take the challenges of efficiency, imbalanced data and table context into consideration. In this paper, we propose Table2Charts framework which learns common patterns from a large corpus of (table, charts) pairs. Based on deep Q-learning with copying mechanism and heuristic searching, Table2Charts does table-to-sequence generation, where each sequence follows a chart template. On a large spreadsheet corpus with 165k tables and 266k charts, we show that Table2Charts could learn a shared representation of table fields so that recommendation tasks on different chart types could mutually enhance each other. Table2Charts outperforms other chart recommendation systems in both multi-type task (with doubled recall numbers R@3=0.61 and R@1=0.43) and human evaluations.

Xiaotian Zhou,Zhongzhi Zhang

The operation of adding edges has been frequently used to the study of opinion dynamics in social networks for various purposes. In this paper, we consider the edge addition problem for the DeGroot model of opinion dynamics in a social network with n nodes and m edges, in the presence of a small number s << n of competing leaders with binary opposing opinions 0 or 1. Concretely, we pose and investigate the problem of maximizing the equilibrium overall opinion by creating k new edges in a candidate edge set, where each edge is incident to a 1-valued leader and a follower node. We show that the objective function is monotone and submodular. We then propose a simple greedy algorithm with an approximation factor (1 - 1 over e) that approximately solves the problem in O(n3) time. Moreover, we provide a fast algorithm with a (1 - 1 over e -u2208) approximation ratio and u00d5(mkeu2208-2) time complexity for any u2208 > 0, where u00d5 (u22c5) notation suppresses the poly (log n) factors. Extensive experiments demonstrate that our second approximate algorithm is efficient and effective, which scales to large networks with more than a million nodes.

PURE: Positive-Unlabeled Recommendation with Generative Adversarial Network

Yao Zhou,Jianpeng Xu,Jun Wu,Zeinab Taghavi,Evren Korpeoglu,Kannan Achan,Jingrui He

Recommender systems are powerful tools for information filtering with the ever-growing amount of online data. Despite its success and wide adoption in various web applications and personalized products, many existing recommender systems still suffer from multiple drawbacks such as large amount of unobserved feedback, poor model convergence, etc. These drawbacks of existing work are mainly due to the following two reasons: first, the widely used negative sampling strategy, which treats the unlabeled entries as negative samples, is invalid in real-world settings; second, all training samples are retrieved from the discrete observations, and the underlying true distribution of the users and items is not learned.In this paper, we address these issues by developing a novel framework named PURE, which trains an unbiased positive-unlabeled discriminator to distinguish the true relevant user-item pairs against the ones that are non-relevant, and a generator that learns the underlying user-item continuous distribution. For a comprehensive comparison, we considered 14 popular baselines from 5 different categories of recommendation approaches. Extensive experiments on two public real-world data sets demonstrate that PURE achieves the best performance in terms of 8 ranking based evaluation metrics.

Modeling Context-aware Features for Cognitive Diagnosis in Student Learning

Yuqiang Zhou,Qi Liu,Jinze Wu,Fei Wang,Zhenya Huang,Wei Tong,Hui Xiong,Enhong Chen,Jianhui Ma

The contexts and cultures have a direct impact on student learning by affecting students implicit cognitive states, such as the preference and the proficiency on specific knowledge. Motivated by the success of context-aware modeling in various fields, such as recommender systems, in this paper, we propose to study how to model context-aware features and adapt them for more precisely diagnosing students knowledge proficiency. Specifically, by analyzing the characteristics of educational contexts, we design a two-stage framework ECD (Educational context-aware Cognitive Diagnosis), where a hierarchical attentive network is first proposed to represent the context impact on students and then an adaptive optimization is used to achieve diagnosis enhancement by aggregating the cognitive states reflected from both educational contexts and students historical learning records. Moreover, we give three implementations of general ECD framework following the typical cognitive diagnosis solutions. Finally, we conduct extensive experiments on nearly 52 million records of the students sampled by PISA (Programme for International Student Assessment) from 73 countries and regions. The experimental results not only prove that ECD is more effective in student performance prediction since it can well capture the impact from educational contexts to students cognitive states, but also give some interesting discoveries regarding the difference among different educational contexts in different countries and regions.

S-LIME: Stabilized-LIME for Model Explanation

Zhengze Zhou,Giles Hooker,Fei Wang

An increasing number of machine learning models have been deployed in domains with high stakes such as finance and healthcare. Despite their superior performances, many models are black boxes in nature which are hard to explain. There are growing efforts for researchers to develop methods to interpret these black-box models. Post hoc explanations based on perturbations, such as LIME [39], are widely used approaches to interpret a machine learning model after it has been built. This class of methods has been shown to exhibit large instability, posing serious challenges to the effectiveness of the method itself and harming user trust. In this paper, we propose S-LIME, which utilizes a hypothesis testing framework based on central limit theorem for determining the number of perturbation points needed to guarantee stability of the resulting explanation. Experiments on both simulated and real world data sets are provided to demonstrate the effectiveness of our method.

Popularity Bias in Dynamic Recommendation

Ziwei Zhu,Yun He,Xing Zhao,James Caverlee

Popularity bias is a long-standing challenge in recommender systems: popular items are overly recommended at the expense of less popular items that users may be interested in being under-recommended. Such a bias exerts detrimental impact on both users and item providers, and many efforts have been dedicated to studying and solving such a bias. However, most existing works situate the popularity bias in a static setting, where the bias is analyzed only for a single round of recommendation with logged data. These works fail to take account of the dynamic nature of real-world recommendation process, leaving several important research questions unanswered: how does the popularity bias evolve in a dynamic scenario? what are the impacts of unique factors in a dynamic recommendation process on the bias? and how to debias in this long-term dynamic process? In this work, we investigate the popularity bias in dynamic recommendation and aim to tackle these research gaps. Concretely, we conduct an empirical study by simulation experiments to analyze popularity bias in the dynamic scenario and propose a dynamic debiasing strategy and a novel False Positive Correction method utilizing false positive signals to debias, which show effective performance in extensive experiments.

Controllable Generation from Pre-trained Language Models via Inverse Prompting

Xu Zou,Da Yin,Qingyang Zhong,Hongxia Yang,Zhilin Yang,Jie Tang

Large-scale pre-trained language models have demonstrated strong capabilities of generating realistic texts. However, it remains challenging to control the generation results. Previous approaches such as prompting are far from sufficient, and lack of controllability limits the usage of language models. To tackle this challenge, we propose an innovative method, inverse prompting, to better control text generation. The core idea of inverse prompting is to use generated text to inversely predict the prompt during beam search, which enhances the relevance between the prompt and the generated text and thus improves controllability. Empirically, we pre-train a large-scale Chinese language model to perform a systematic study using human evaluation on the tasks of open-domain poem generation and open-domain long-form question answering. Results demonstrate that our proposed method substantially outperforms the baselines and that our generation quality is close to human performance on some of the tasks.

TDGIA: Effective Injection Attacks on Graph Neural Networks

Xu Zou,Qinkai Zheng,Yuxiao Dong,Xinyu Guan,Evgeny Kharlamov,Jialiang Lu,Jie Tang

Graph Neural Networks (GNNs) have achieved promising performance in various real-world applications. However, recent studies have shown that GNNs are vulnerable to adversarial attacks. In this paper, we study a recently-introduced realistic attack scenario on graphs---graph injection attack (GIA). In the GIA scenario, the adversary is not able to modify the existing link structure and node attributes of the input graph, instead the attack is performed by injecting adversarial nodes into it. We present an analysis on the topological vulnerability of GNNs under GIA setting, based on which we propose the Topological Defective Graph Injection Attack (TDGIA) for effective injection attacks. TDGIA first introduces the topological defective edge selection strategy to choose the original nodes for connecting with the injected ones. It then designs the smooth feature optimization objective to generate the features for the injected nodes. Extensive experiments on large-scale datasets show that TDGIA can consistently and significantly outperform various attack baselines in attacking dozens of defense GNN models. Notably, the performance drop on target GNNs resultant from TDGIA is more than double the damage brought by the best attack solution among hundreds of submissions on KDD-CUP 2020.

Counterfactual Graphs for Explainable Classification of Brain Networks

Carlo Abrate,Francesco Bonchi

Training graph classifiers able to distinguish between healthy brains and dysfunctional ones, can help identifying substructures associated to specific cognitive phenotypes. However, the mere predictive power of the graph classifier is of limited interest to the neuroscientists, which have plenty of tools for the diagnosis of specific mental disorders. What matters is the interpretation of the model, as it can provide novel insights and new hypotheses. In this paper we propose counterfactual graphs as a way to produce local post-hoc explanations of any black-box graph classifier. Given a graph and a black-box, a counterfactual is a graph which, while having high structural similarity with the original graph, is classified by the black-box in a different class. We propose and empirically compare several strategies for counterfactual graph search. Our experiments against a white-box classifier with known optimal counterfactual, show that our methods, although heuristic, can produce counterfactuals very close to the optimal one. Finally, we show how to use counterfactual graphs to build global explanations correctly capturing the behaviour of different black-box classifiers and providing interesting insights for the neuroscientists.

All Models Are Useful: Bayesian Ensembling for Robust High Resolution COVID-19 Forecasting

Aniruddha Adiga,Lijing Wang,Benjamin Hurt,Akhil Peddireddy,Przemyslaw Porebski,Srinivasan Venkatramanan,Bryan Leroy Lewis,Madhav Marathe

Timely, high-resolution forecasts of infectious disease incidence are useful for policy makers in deciding intervention measures and estimating healthcare resource burden. In this paper, we consider the task of forecasting COVID-19 confirmed cases at the county level for the United States. Although multiple methods have been explored for this task, their performance has varied across space and time due to noisy data and the inherent dynamic nature of the pandemic. We present a forecasting pipeline which incorporates probabilistic forecasts from multiple statistical, machine learning and mechanistic methods through a Bayesian ensembling scheme, and has been operational for nearly 6 months serving local, state and federal policymakers in the United States. While showing that the Bayesian ensemble is at least as good as the individual methods, we also show that each individual method contributes significantly for different spatial regions and time points. We compare our models performance with other similar models being integrated into CDC-initiated COVID-19 Forecast Hub, and show better performance at longer forecast horizons. Finally, we also describe how such forecasts are used to increase lead time for training mechanistic scenario projections. Our work demonstrates that such a real-time high resolution forecasting pipeline can be developed by integrating multiple methods within a performance-based ensemble to support pandemic response.

Dynamic Language Models for Continuously Evolving Content

Spurthi Amba Hombaiah,Tao Chen,Mingyang Zhang,Michael Bendersky,Marc Najork

The content on the web is in a constant state of flux. New entities,issues, and ideas continuously emerge, while the semantics of the existing conversation topics gradually shift. In recent years, pre-trained language models like BERT greatly improved the state-of-the-art for a large spectrum of content understanding tasks.Therefore, in this paper, we aim to study how these language models can be adapted to better handle continuously evolving web content.In our study, we first analyze the evolution of 2013 - 2019 Twitter data, and unequivocally confirm that a BERT model trained on past tweets would heavily deteriorate when directly applied to data from later years. Then, we investigate two possible sources of the deterioration: the semantic shift of existing tokens and the sub-optimal or failed understanding of new tokens. To this end, we both explore two different vocabulary composition methods, as well as propose three sampling methods which help in efficient incremental training for BERT-like models. Compared to a new model trained from scratch offline, our incremental training (a) reduces the training costs, (b) achieves better performance on evolving content, and (c)is suitable for online deployment. The superiority of our methods is validated using two downstream tasks. We demonstrate significant improvements when incrementally evolving the model from a particular base year, on the task of Country Hashtag Prediction, as well as on the OffensEval 2019 task.

Quantifying and Addressing Ranking Disparity in Human-Powered Data Acquisition

Sihem Amer-Yahia,Shady Elbassuoni,Ahmad Ghizzawi,Anas Hosami

Algorithmic bias has been identified as a key challenge in many AI applications. One major source of bias is the data used to build these applications. For instance, many AI applications rely on human users to generate training data. The generated data might be biased if the data acquisition process is skewed towards certain groups of people based on say gender, ethnicity or location. This typically happens as a result of a hidden association between the peoples qualifications for data acquisition and the peoples protected attributes. In this paper, we study how to unveil and address disparity in data acquisition. We focus on the case where the data acquisition process involves ranking of people and we define disparity as the unbalanced targeting of people by the data acquisition process. To quantify disparity, we formulate an optimization problem that partitions people on their protected attributes, computes the qualifications of people in each partition, and finds the partitioning that exhibits the highest disparity in qualifications. Due to the combinatorial nature of our problem, we devise heuristics to navigate the space of partitions. We also discuss how to address disparity between partitions. We conduct a series of experiments on real and simulated datasets that demonstrate that our proposed approach is successful in quantifying and addressing ranking disparity in human-powered data acquisition.

Auto-Split: A General Framework of Collaborative Edge-Cloud AI

Amin Banitalebi-Dehkordi,Naveen Vedula,Jian Pei,Fei Xia,Lanjun Wang,Yong Zhang

In many industry scale applications, large and resource consuming machine learning models reside in powerful cloud servers. At the same time, large amounts of input data are collected at the edge of cloud. The inference results are also communicated to users or passed to downstream tasks at the edge. The edge often consists of a large number of low-power devices. It is a big challenge to design industry products to support sophisticated deep model deployment and conduct model inference in an efficient manner so that the model accuracy remains high and the end-to-end latency is kept low. This paper describes the techniques and engineering practice behind Auto-Split, an edge-cloud collaborative prototype of Huawei Cloud. This patented technology is already validated on selected applications, is on its way for broader systematic edge-cloud application integration, and is being made available for public use as an automated pipeline service for end-to-end cloud-edge collaborative intelligence deployment. To the best of our knowledge, there is no existing industry product that provides the capability of Deep Neural Network (DNN) splitting.

Unpaired Generative Molecule-to-Molecule Translation for Lead Optimization

Guy Barshatski,Kira Radinsky

Molecular lead optimization is an important task of drug discovery focusing on generating novel molecules similar to a drug candidate but with enhanced properties. Prior works focused on supervised models requiring datasets of pairs of a molecule and an enhanced molecule. These approaches require large amounts of data and are limited by the bias of the specific examples of enhanced molecules. In this work, we present an unsupervised generative approach with a molecule-embedding component that maps a discrete representation of a molecule to a continuous space. The components are then coupled with a unique training architecture leveraging molecule fingerprints and applying double cycle constraints to enable both chemical resemblance to the original molecular lead while generating novel molecules with enhanced properties. We evaluate our method on multiple common molecular optimization tasks, including dopamine receptor (DRD2) and drug likeness (QED), and show our method outperforms previous state-of-the-art baselines. Moreover, we conduct thorough ablation experiments to show the effect and necessity of important components in our model. Furthermore, we demonstrate our methods ability to generate FDA-approved drugs it has never encountered before, such as Perazine and Clozapine, which are used to treat psychotic disorders, like Schizophrenia. The system is currently being deployed for use in the Targeted Drug Delivery and Personalized Medicine laboratories generating treatments using nanoparticle-based technology.

TimeSHAP: Explaining Recurrent Models through Sequence Perturbations

Jou00e3o Bento,Pedro Saleiro,Andru00e9 F. Cruz,Mu00e1rio A.T. Figueiredo,Pedro Bizarro

Although recurrent neural networks (RNNs) are state-of-the-art in numerous sequential decision-making tasks, there has been little research on explaining their predictions. In this work, we present TimeSHAP, a model-agnostic recurrent explainer that builds upon KernelSHAP and extends it to the sequential domain. TimeSHAP computes feature-, timestep-, and cell-level attributions. As sequences may be arbitrarily long, we further propose a pruning method that is shown to dramatically decrease both its computational cost and the variance of its attributions. We use TimeSHAP to explain the predictions of a real-world bank account takeover fraud detection RNN model, and draw key insights from its explanations: i) the model identifies important features and events aligned with what fraud analysts consider cues for account takeover; ii) positive predicted sequences can be pruned to only 10% of the original length, as older events have residual attribution values; iii) the most recent input event of positive predictions only contributes on average to 41% of the models score; iv) notably high attribution to clients age, upheld on higher false positive rates for older clients.

A Framework for Modeling Cyber Attack Techniques from Security Vulnerability Descriptions

Hodaya Binyamini,Ron Bitton,Masaki Inokuchi,Tomohiko Yagyu,Yuval Elovici,Asaf Shabtai

Attack graphs are one of the main techniques used to automate the cybersecurity risk assessment process. In order to derive a relevant attack graph, up-to-date information on known cyber attack techniques should be represented as interaction rules. However, designing and creating new interaction rules is a time consuming task performed manually by security experts. We present a novel, end-to-end, automated framework for modeling new attack techniques from the textual description of security vulnerabilities. Given a description of a security vulnerability, the proposed framework first extracts the relevant attack entities required to model the attack, completes missing information on the vulnerability, and derives a new interaction rule that models the attack; this new rule is then integrated within the MulVal attack graph tool. The proposed framework implements a novel data science pipeline that includes a dedicated cybersecurity linguistic model trained on the NVD repository, a recurrent neural network model used for attack entity extraction, a logistic regression model used for completing the missing information, and a transition probability matrix for automatically generating new interaction rule. We evaluated the performance of each of the individual algorithms, as well as the complete framework, and demonstrated its effectiveness.

VisRel: Media Search at Scale

Fedor Borisyuk,Siddarth Malreddy,Jun Mei,Yiqun Liu,Xiaoyi Liu,Piyush Maheshwari,Anthony Bell,Kaushik Rangadurai

In this paper, we present VisRel, a deployed large-scale media search system that leverages text understanding, media understanding, and multimodal technologies to deliver a modern multimedia search experience. We share our insight on developing image and video understanding models for content retrieval, training efficient and effective media-to-query relevance models, and refining online and offline metrics to measure the success of one of the largest media search databases in the industry. We summarize our learnings gathered from hundreds of A/B test experiments and describe the most effective technical approaches. The techniques presented in this work have contributed 34% (abs.) improvement to media-to-query relevance and 10% improvement to user engagement. We believe that this work can provide practical solutions and insights for engineers who are interested in applying media understanding technologies to empower multimedia search systems that operate at Facebook scale.

GEM: Translation-Free Zero-Shot Global Entity Matcher for Global Catalogs

Karim Bouyarmane

We propose a modular BiLSTM / CNN / Transformer deep-learning encoder architecture, together with a data synthesis and training approach, to solve the problem of matching catalog products across different languages, different local catalogs, and different catalog data contributors. The end-to-end model relies solely on raw natural language textual data in the catalog entries and on images of the products, without any feature engineering, and is entirely translation-free, not requiring the translation of the catalog natural-language data to the same base language for inference. We report experiments results on a 4-languages-scope model (English, French, German, Spanish) matching entities from 4 local catalogs (UK, France, Germany, Spain) of a retail website. We demonstrate that the model achieves performance comparable to state-of-the-art existing entity matchers that operate within a single language, and that the model achieves high-performance zero-shot inference on language pairs not seen in training.

A Semi-Personalized System for User Cold Start Recommendation on Music Streaming Apps

Lu00e9a Briand,Guillaume Salha-Galvan,Walid Bendada,Mathieu Morlon,Viet-Anh Tran

Music streaming services heavily rely on recommender systems to improve their users experience, by helping them navigate through a large musical catalog and discover new songs, albums or artists. However, recommending relevant and personalized content to new users, with few to no interactions with the catalog, is challenging. This is commonly referred to as the user cold start problem. In this applied paper, we present the system recently deployed on the music streaming service Deezer to address this problem. The solution leverages a semi-personalized recommendation strategy, based on a deep neural network architecture and on a clustering of users from heterogeneous sources of information. We extensively show the practical impact of this system and its effectiveness at predicting the future musical preferences of cold start users on Deezer, through both offline and online large-scale experiments. Besides, we publicly release our code as well as anonymized usage data from our experiments. We hope that this release of industrial resources will benefit future research on user cold start recommendation.

Generating Mobility Trajectories with Retained Data Utility

Chu Cao,Mo Li

This paper presents TrajGen, an approach to generate artificial datasets of mobility trajectories based on an original trajectory dataset while retaining the utility of the original data in supporting various mobility applications. The generated mobility data is disentangled with the original data and can be shared without compromising the data privacy. TrajGen leverages Generative Adversarial Nets combined with a Seq2Seq model to generate the spatial-temporal trajectory data. TrajGen is implemented and evaluated with real-world taxi trajectory data in Singapore. The extensive experimental results demonstrate that TrajGen is able to generate artificial trajectory data that retain key statistical characteristics of the original data. Two case studies, i.e. road map updating and Origin-Destination demand estimation are performed with the generated artificial data, and the results show that the artificial trajectories generated by TrajGen retain the utility of original data in supporting the two applications.

Interactive Audience Expansion On Large Scale Online Visitor Data

Gromit Yeuk-Yin Chan,Tung Mai,Anup B. Rao,Ryan A. Rossi,Fan Du,Clu00e1udio T. Silva,Juliana Freire

Online marketing platforms often store millions of website visitors behavior as a large sparse matrix with rows as visitors and columns as behavior. These platforms allow marketers to conduct Audience Expansion, a technique to identify new audiences with similar behavior to the original target audiences. In this paper, we propose a method to achieve interactive Audience Expansion from millions of visitor data efficiently. Unlike other methods that undergo significant computations upon inputs, our approach provides interactive responses when a marketer inputs the target audiences and similarity measures. The idea is to apply data summarization technique on the large visitor matrix to obtain a small set of summaries representing the similarities in the matrix. We propose efficient algorithms to compute the data summaries on a distributed computing environment (i.e., Spark) and conduct the expansion using the summaries. Our experiment shows that our approach (1) provides 10 times more accurate and 27 times faster Audience Expansion results on real datasets and (2) achieves a 98% speed-up compared to straightforward data summarization implementations. We also present an interface to apply the algorithm for real-world scenarios.

Supporting COVID-19 Policy Response with Large-scale Mobility-based Modeling

Serina Chang,Mandy L. Wilson,Bryan Lewis,Zakaria Mehrab,Komal K. Dudakiya,Emma Pierson,Pang Wei Koh,Jaline Gerardin,Beth Redbird,David Grusky,Madhav Marathe,Jure Leskovec

Mobility restrictions have been a primary intervention for controlling the spread of COVID-19, but they also place a significant economic burden on individuals and businesses. To balance these competing demands, policymakers need analytical tools to assess the costs and benefits of different mobility reduction measures. In this paper, we present our work motivated by our interactions with the Virginia Department of Health on a decision-support tool that utilizes large-scale data and epidemiological modeling to quantify the impact of changes in mobility on infection rates. Our model captures the spread of COVID-19 by using a fine-grained, dynamic mobility network that encodes the hourly movements of people from neighborhoods to individual places, with over 3 billion hourly edges. By perturbing the mobility network, we can simulate a wide variety of reopening plans and forecast their impact in terms of new infections and the loss in visits per sector. To deploy this model in practice, we built a robust computational infrastructure to support running millions of model realizations, and we worked with policymakers to develop an interactive dashboard that communicates our models predictions for thousands of potential policies.

Extreme Multi-label Learning for Semantic Matching in Product Search

Wei-Cheng Chang,Daniel Jiang,Hsiang-Fu Yu,Choon Hui Teo,Jiong Zhang,Kai Zhong,Kedarnath Kolluri,Qie Hu,Nikhil Shandilya,Vyacheslav Ievgrafov,Japinder Singh,Inderjit S. Dhillon

We consider the problem of semantic matching in product search: given a customer query, retrieve all semantically related products from a huge catalog of size 100 million, or more. Because of large catalog spaces and real-time latency constraints, semantic matching algorithms not only desire high recall but also need to have low latency. Conventional lexical matching approaches (e.g., Okapi-BM25) exploit inverted indices to achieve fast inference time, but fail to capture behavioral signals between queries and products. In contrast, embedding-based models learn semantic representations from customer behavior data, but the performance is often limited by shallow neural encoders due to latency constraints. Semantic product search can be viewed as an eXtreme Multi-label Classification (XMC) problem, where customer queries are input instances and products are output labels. In this paper, we aim to improve semantic product search by using tree-based XMC models where inference time complexity is logarithmic in the number of products. We consider hierarchical linear models with n-gram features for fast real-time inference. Quantitatively, our method maintains a low latency of 1.25 milliseconds per query and achieves a 65% improvement of Recall@100 (60.9% v.s. 36.8%) over a competing embedding-based DSSM model. Our model is robust to weight pruning with varying thresholds, which can flexibly meet different system requirements for online deployments. Qualitatively, our method can retrieve products that are complementary to existing product search system and add diversity to the match set.

Task-wise Split Gradient Boosting Trees for Multi-center Diabetes Prediction

Mingcheng Chen,Zhenghui Wang,Zhiyun Zhao,Weinan Zhang,Xiawei Guo,Jian Shen,Yanru Qu,Jieli Lu,Min Xu,Yu Xu,Tiange Wang,Mian Li,Weiwei Tu,Yong Yu,Yufang Bi,Weiqing Wang,Guang Ning

Diabetes prediction is an important data science application in the social healthcare domain. There exist two main challenges in the diabetes prediction task: data heterogeneity since demographic and metabolic data are of different types, data insufficiency since the number of diabetes cases in a single medical center is usually limited. To tackle the above challenges, we employ gradient boosting decision trees (GBDT) to handle data heterogeneity and introduce multi-task learning (MTL) to solve data insufficiency. To this end, Task-wise Split Gradient Boosting Trees (TSGB) is proposed for the multi-center diabetes prediction task. Specifically, we firstly introduce task gain to evaluate each task separately during tree construction, with a theoretical analysis of GBDTs learning objective. Secondly, we reveal a problem when directly applying GBDT in MTL, i.e., the negative task gain problem. Finally, we propose a novel split method for GBDT in MTL based on the task gain statistics, named task-wise split, as an alternative to standard feature-wise split to overcome the mentioned negative task gain problem. Extensive experiments on a large-scale real-world diabetes dataset and a commonly used benchmark dataset demonstrate TSGB achieves superior performance against several state-of-the-art methods. Detailed case studies further support our analysis of negative task gain problems and provide insightful findings. The proposed TSGB method has been deployed as an online diabetes risk assessment software for early diagnosis.

Web-Scale Generic Object Detection at Microsoft Bing

Stephen Xi Chen,Saurajit Mukherjee,Unmesh Phadke,Tingting Wang,Junwon Park,Ravi Theja Yada

In this paper, we present Generic Object Detection (GenOD), one of the largest object detection systems deployed to a web-scale general visual search engine that can detect over 900 categories for all Microsoft Bing Visual Search queries in near real-time. It acts as a fundamental visual query understanding service that provides object-centric information and shows gains in multiple production scenarios, improving upon domain-specific models. We discuss the challenges of collecting data, training, deploying and updating such a large-scale object detection model with multiple dependencies. We discuss a data collection pipeline that reduces per-bounding box labeling cost by 81.5% and latency by 61.2% while improving on annotation quality. We show that GenOD can improve weighted average precision by over 20% compared to multiple domain-specific models. We also improve the model update agility by nearly 2 times with the proposed disjoint detector training compared to joint fine-tuning. Finally we demonstrate how GenOD benefits visual search applications by significantly improving object-level search relevance by 54.9% and user engagement by 59.9%.

Curriculum Meta-Learning for Next POI Recommendation

Yudong Chen,Xin Wang,Miao Fan,Jizhou Huang,Shengwen Yang,Wenwu Zhu

Next point-of-interest (POI) recommendation is a hot research field where a recent emerging scenario, next POI to search recommendation, has been deployed in many online map services such as Baidu Maps. One of the key issues in this scenario is providing satisfactory recommendation services for cold-start cities with a limited number of user-POI interactions, which requires transferring the knowledge hidden in rich data from many other cities to these cold-start cities. Existing literature either does not consider the city-transfer issue or cannot simultaneously tackle the data sparsity and pattern diversity issues among various users in multiple cities. To address these issues, we explore city-transfer next POI to search recommendation that transfers the knowledge from multiple cities with rich data to cold-start cities with scarce data. We propose a novel Curriculum Hardness Aware Meta-Learning (CHAML) framework, which incorporates hard sample mining and curriculum learning into a meta-learning paradigm. Concretely, the CHAML framework considers both city-level and user-level hardness to enhance the conditional sampling during meta training, and uses an easy-to-hard curriculum for the city-sampling pool to help the meta-learner converge to a better state. Extensive experiments on two real-world map search datasets from Baidu Maps demonstrate the superiority of CHAML framework.

Robust Object Detection Fusion Against Deception

Ka-Ho Chow,Ling Liu

Deep neural network (DNN) based object detection has become an integral part of numerous cyber-physical systems, perceiving physical environments and responding proactively to real-time events. Recent studies reveal that well-trained multi-task learners like DNN-based object detectors perform poorly in the presence of deception. This paper presents FUSE, a deception-resilient detection fusion approach with three novel contributions. First, we develop diversity-enhanced fusion teaming mechanisms, including diversity-enhanced joint training algorithms, for producing high diversity fusion detectors. Second, we introduce a three-tier detection fusion framework and a graph partitioning algorithm to construct fusion-verified detection outputs through three mutually reinforcing components: objectness fusion, bounding box fusion, and classification fusion. Third but not least, we provide a formal analysis of robustness enhancement by FUSE-protected systems. Extensive experiments are conducted on eleven detectors from three families of detection algorithms on two benchmark datasets. We show that FUSE guarantees strong robustness in mitigating the state-of-the-art deception attacks, including adversarial patches - a form of physical attacks using confined visual distortion.

FASER: Seismic Phase Identifier for Automated Monitoring

Farhan Asif Chowdhury,M Ashraf Siddiquee,Glenn Eli Baker,Abdullah Mueen

Seismic phase identification classifies the type of seismic wave received at a station based on the waveform (i.e., time series) recorded by a seismometer. Automated phase identification is an integrated component of large scale seismic monitoring applications, including earthquake warning systems and underground explosion monitoring. Accurate, fast, and fine-grained phase identification is instrumental for earthquake location estimation, understanding Earths crustal and mantle structure for predictive modeling, etc. However, existing operational systems utilize multiple nearby stations for precise identification, which delays response time with added complexity and manual interventions. Moreover, single-station systems mostly perform coarse phase identification. In this paper, we revisit the seismic phase classification as an integrated part of a seismic processing pipeline. We develop a machine-learned model FASER, that takes input from a signal detector and produces phase types as output for a signal associator. The model is a combination of convolutional and long short-term memory networks. Our method identifies finer wave types, including crustal and mantle phases. We conduct comprehensive experiments on real datasets to show that FASER outperforms existing baselines. We evaluate FASER holding out sources and stations across the world to demonstrate consistent performance for novel sources and stations.

On Post-selection Inference in A/B Testing

Alex Deng,Yicheng Li,Jiannan Lu,Vivek Ramamurthy

When interpreting A/B tests, we typically focus only on the statistically significant results and take them by face value. This practice, termed post-selection inference in the statistical literature, may negatively affect both point estimation and uncertainty quantification, and therefore hinder trustworthy decision making in A/B testing. To address this issue, in this paper we explore two seemingly unrelated paths, one based on supervised machine learning and the other on empirical Bayes, and propose post-selection inferential approaches that combine the strengths of both. Through large-scale simulated and empirical examples, we demonstrate that our proposed methodologies stand out among other existing ones in both reducing post-selection biases and improving confidence interval coverage rates, and discuss how they can be conveniently adjusted to real-life scenarios.

Globally Optimized Matchmaking in Online Games

Qilin Deng,Hao Li,Kai Wang,Zhipeng Hu,Runze Wu,Linxia Gong,Jianrong Tao,Changjie Fan,Peng Cui

As one of the core components of online games, matchmaking is the process of arranging multiple players into matches, where the quality of matchmaking systems directly determines player satisfaction and further affects the life cycle of game products. With the number of candidate players increases, the number of possible match combinations grows exponentially, which makes the current implementation for multiplayer matchmaking can only obtain locally optimal arrangement in an inefficient fashion. In this paper, we focus on the globally optimized matchmaking problem, in which the objective is to decide an optimal matching sequence for the queuing players. To tackle this challenging problem, we propose a novel data-driven matchmaking framework, called GloMatch, based on machine learning principles. Through transforming the matchmaking problem into a sequential decision problem, we solve it with the help of an effective policy-based deep reinforcement learning algorithm. Quantitative experiments on simulation and online game environments demonstrate the effectiveness of the presented framework.

Causal and Interpretable Rules for Time Series Analysis

Amin Dhaou,Antoine Bertoncello,Su00e9bastien Gourvu00e9nec,Josselin Garnier,Erwan Le Pennec

The number of complex infrastructures in an industrial setting is growing and is not immune to unexplained recurring events such as breakdowns or failure that can have an economic and environmental impact. To understand these phenomena, sensors have been placed on the different infrastructures to track, monitor, and control the dynamics of the systems. The causal study of these data allows predictive and prescriptive maintenance to be carried out. It helps to understand the appearance of a problem and find counterfactual outcomes to better operate and defuse the event. In this paper, we introduce a novel approach combining the case-crossover design which is used to investigate acute triggers of diseases in epidemiology, and the Apriori algorithm which is a data mining technique allowing to find relevant rules in a dataset. The resulting time series causal algorithm extracts interesting rules in our application case which is a non-linear time series dataset. In addition, a predictive rule-based algorithm demonstrates the potential of the proposed method.

Deep Learning based Crop Row Detection with Online Domain Adaptation

Rashed Doha,Mohammad Al Hasan,Sohel Anwar,Veera Rajendran

Detecting crop rows from video frames in real time is a fundamental challenge in the field of precision agriculture. Deep learning based semantic segmentation method, namely U-net, although successful in many tasks related to precision agriculture, performs poorly for solving this task. The reasons include paucity of large scale labeled datasets in this domain, diversity in crops, and the diversity of appearance of the same crops at various stages of their growth. In this work, we discuss the development of a practical real-life crop row detection system in collaboration with an agricultural sprayer company. Our proposed method takes the output of semantic segmentation using U-net, and then apply a clustering based probabilistic temporal calibration which can adapt to different fields and crops without the need for retraining the network. Experimental results validate that our method can be used for both refining the results of the U-net to reduce errors and also for frame interpolation of the input video stream.

Exploration in Online Advertising Systems with Deep Uncertainty-Aware Learning

Chao Du,Zhifeng Gao,Shuo Yuan,Lining Gao,Ziyan Li,Yifan Zeng,Xiaoqiang Zhu,Jian Xu,Kun Gai,Kuang-Chih Lee

Modern online advertising systems inevitably rely on personalization methods, such as click-through rate (CTR) prediction. Recent progress in CTR prediction enjoys the rich representation capabilities of deep learning and achieves great success in large-scale industrial applications. However, these methods can suffer from lack of exploration. Another line of prior work addresses the exploration-exploitation trade-off problem with contextual bandit methods, which are recently less studied in the industry due to the difficulty in extending their flexibility with deep models. In this paper, we propose a novel Deep Uncertainty-Aware Learning (DUAL) method to learn CTR models based on Gaussian processes, which can provide predictive uncertainty estimations while maintaining the flexibility of deep neural networks. DUAL can be easily implemented on existing models and deployed in real-time systems with minimal extra computational overhead. By linking the predictive uncertainty estimation ability of DUAL to well-known bandit algorithms, we further present DUAL-based Ad-ranking strategies to boost up long-term utilities such as the social welfare in advertising systems. Experimental results on several public datasets demonstrate the effectiveness of our methods. Remarkably, an online A/B test deployed in the Alibaba display advertising platform shows an 8.2% social welfare improvement and an 8.0% revenue lift.

Clustering for Private Interest-based Advertising

Alessandro Epasto,Andru00e9s Muu00f1oz Medina,Steven Avery,Yijian Bai,Robert Busa-Fekete,CJ Carey,Ya Gao,David Guthrie,Subham Ghosh,James Ioannidis,Junyi Jiao,Jakub Lacki,Jason Lee,Arne Mauser,Brian Milch,Vahab Mirrokni,Deepak Ravichandran,Wei Shi,Max Spero,Yunting Sun,Umar Syed,Sergei Vassilvtiskii,Shuo Wang

We study the problem of designing privacy-enhanced solutions for interest-based advertisement (IBA). IBA is a key component of the online ads ecosystem and provides a better ad experience to users. Indeed, IBA enables advertisers to show users impressions that are relevant to them. Nevertheless, the current way ad tech companies achieve this is by building detailed interest profiles for individual users. In this work we ask whether such fine grained personalization is required, and present mechanisms that achieve competitive performance while giving privacy guarantees to the end users. More precisely we present the first detailed exploration of how to implement Chromes Federated Learning of Cohorts (FLoC) API. We define the privacy properties required for the API and evaluate multiple hashing and clustering algorithms discussing the trade-offs between utility, privacy, and ease of implementation.

Automated Testing of Graphics Units by Deep-Learning Detection of Visual Anomalies

Lev Faivishevsky,Adi Szeskin,Ashwin K. Muppalla,Ravid Shwartz-Ziv,Itamar Ben Ari,Ronen Laperdon,Benjamin Melloul,Tahi Hollander,Tom Hope,Amitai Armon

We present a novel system for performing real-time detection of diverse visual corruptions in videos, for validating the quality of graphics units in our company. The system is used for several types of content, including movies and 3D graphics, with strict constraints on low false alert rates and real-time processing of millions of video frames per day. These constraints required novel solutions involving both hardware and software, including new supervised and weakly-supervised methods we developed. Our deployed system has enabled a ~20X reduction of human effort and discovering new corruptions missed by humans and existing approaches.

Heterogeneous Temporal Graph Transformer: An Intelligent System for Evolving Android Malware Detection

Yujie Fan,Mingxuan Ju,Shifu Hou,Yanfang Ye,Wenqiang Wan,Kui Wang,Yinming Mei,Qi Xiong

The explosive growth and increasing sophistication of Android malware call for new defensive techniques to protect mobile users against novel threats. To address this challenge, in this paper, we propose and develop an intelligent system named Dr.Droid to jointly model malware propagation and evolution for their detection at the first attempt. In Dr.Droid, we first exploit higher-level semantic and social relations within the ecosystem (e.g., app-market, app-developer, market-developer relations etc.) to characterize app propagation patterns; and then we present a structured heterogeneous graph to model the complex relations among different types of entities. To capture malware evolution, we further consider the temporal dependence and introduce a heterogeneous temporal graph to jointly model malware propagation and evolution by considering heterogeneous spatial dependencies with temporal dimensions. Afterwards, we propose a novel heterogeneous temporal graph transformer framework (denoted as HTGT) to integrate both spatial and temporal dependencies while preserving the heterogeneity to learn node representations for malware detection. Specifically, in our proposed HTGT, to preserve the heterogeneity, we devise a heterogeneous spatial transformer to derive heterogeneous attentions over each node and edge to learn dedicated representations for different types of entities and relations; to model temporal dependencies, we design a temporal transformer into the HTGT to attentively aggregate its historical sequences of a given node (e.g., app); the two transformers work in an iterative manner for representation learning. Promising experimental results based on the large-scale sample collections from anti-malware industry demonstrate the performance of Dr.Droid, by comparison with state-of-the-art baselines and popular mobile security products.

SSML: Self-Supervised Meta-Learner for En Route Travel Time Estimation at Baidu Maps

Xiaomin Fang,Jizhou Huang,Fan Wang,Lihang Liu,Yibo Sun,Haifeng Wang

Travel time estimation (TTE) is one of the most critical modules at Baidu Maps, which plays a vital role in intelligent transportation services such as route planning and navigation. During the driving en route, the navigation system of Baidu Maps can provide real-time estimations on when a user will arrive at the destination. It automatically recalculates and updates the remaining travel time from the drivers current position to the destination (hereafter referred to as remaining route) every few minutes. The previously deployed TTE model at Baidu Maps, i.e., ConSTGAT, takes the remaining route as well as the current time as input and provides the corresponding estimated time of arrival. However, it ignores the route that has been already traveled from the origin to the drivers current position (hereafter referred to as traveled route), which could contribute to improving the accuracy of time estimation. In this work, we believe that the traveled route conveys valuable evidence that could facilitate the modeling of driving preference and take that into consideration for the task of en route travel time estimation (ER-TTE). This task is non-trivial because it requires adapting fast to a users driving preference using a few observed behaviors in the traveled route. To this end, we frame ER-TTE as a few-shot learning problem and consider the observed behaviors in the traveled route as training examples while the future behaviors in the remaining route as test examples. To tackle the few-shot learning problem, we propose a novel model-based meta-learning approach, called SSML, to learn the meta-knowledge so as to fast adapt to a users driving preference and improve the time estimation of the remaining route. SSML leverages the technique of self-supervised learning, which is equivalent to generating a significant number of synthetic learning tasks, to further improve the performance. Extensive offline tests conducted on large-scale real-world datasets collected from Baidu Maps demonstrate the superiority of SSML. The online tests before deploying in production were successfully performed, which confirms the practical applicability of SSML.

MoCha: Large-Scale Driving Pattern Characterization for Usage-based Insurance

Zhihan Fang,Guang Yang,Dian Zhang,Xiaoyang Xie,Guang Wang,Yu Yang,Fan Zhang,Desheng Zhang

Given widely adopted vehicle tracking technologies, usage-based insurance has been a rising market over the past few years. With potential discounts from insurance companies, customers voluntarily install sensing devices in their vehicles for insurance companies, which are utilized to analyze their historical driving patterns to derive the risks of future driving. However, it is challenging to characterize and predict driving patterns, especially for new users with limited data. To address this issue, we propose and evaluate a system called MoCha to accurately characterize driving patterns for usage-based insurance. The key question we aim to explore with MoCha is whether we can fully explore long-term driving patterns of new users with only limited historical data of themselves by leveraging abundant data of other users and contextual information. To answer this question, we design (i) a multi-level driving pattern modeling component to capture the spatial-temporal dependency on both individual and group level, and (ii) a multi-task learning method to utilize underlying relations of driving metrics and predict multiple driving metrics simultaneously. We implement and evaluate MoCha with real-world on-board diagnostics data from a large insurance company with more than 340,000 vehicles. Further, we validate the usefulness of MoCha by predicting driving risks based on real-world claim data in a Chinese city, Shenzhen.

Adversarial Attacks on Deep Models for Financial Transaction Records

Ivan Fursov,Matvey Morozov,Nina Kaploukhaya,Elizaveta Kovtun,Rodrigo Rodrigo Rivera-Castro,Gleb Gusev,Dmitry Babaev,Ivan Kireev,Alexey Zaytsev,Evgeny Burnaev

Machine learning models using transaction records as inputs are popular among financial institutions. The most efficient models use deep-learning architectures similar to those in the NLP community, posing a challenge due to their tremendous number of parameters and limited robustness. In particular, deep-learning models are vulnerable to adversarial attacks: a little change in the input harms the models output. In this work, we examine adversarial attacks on transaction records data and defenses from these attacks. The transaction records data have a different structure than the canonical NLP or time-series data, as neighboring records are less connected than words in sentences, and each record consists of both discrete merchant code and continuous transaction amount. We consider a black-box attack scenario, where the attack doesnt know the true decision model and pay special attention to adding transaction tokens to the end of a sequence. These limitations provide a more realistic scenario, previously unexplored in the NLP world. The proposed adversarial attacks and the respective defenses demonstrate remarkable performance using relevant datasets from the financial industry. Our results show that a couple of generated transactions are sufficient to fool a deep-learning model. Further, we improve model robustness via adversarial training or separate adversarial examples detection. This work shows that embedding protection from adversarial attacks improves model robustness, allowing a wider adoption of deep models for transaction records in banking and finance.

A Deep Learning Method for Route and Time Prediction in Food Delivery Service

Chengliang Gao,Fan Zhang,Guanqun Wu,Qiwan Hu,Qiang Ru,Jinghua Hao,Renqing He,Zhizhao Sun

Online food ordering and delivery service has widely served peoples daily demands worldwide, e.g., it has reached a number of 34.9 million online orders per day in Q3 of 2020 in Meituan food delivery platform. For the food delivery service, accurate estimation of the drivers delivery route and time, defined as the FD-RTP task, is very significant to customer satisfaction and driver experience. In the paper, we apply deep learning to the FD-RTP task for the first time, and propose a deep network named FDNET. Different from traditional heuristic search algorithms, we predict the probability of each feasible location the driver will visit next, through mining a large amount of food delivery data. Guided by the probabilities, FDNET greatly reduces the search space in delivery route generation, and the calculation times of time prediction. As a result, various kinds of information can be fully utilized in FDNET within the limited computation time. Careful consideration of the factors having effect on the drivers behaviors and introduction of more abundant spatiotemporal information both contribute to the improvements. Offline experiments over the large-scale real-world dataset, and online A/B test demonstrate the effectiveness of our proposed FDNET.

An Embedding Learning Framework for Numerical Features in CTR Prediction

Huifeng Guo,Bo Chen,Ruiming Tang,Weinan Zhang,Zhenguo Li,Xiuqiang He

Click-Through Rate (CTR) prediction is critical for industrial recommender systems, where most deep CTR models follow an Embedding & Feature Interaction paradigm. However, the majority of methods focus on designing network architectures to better capture feature interactions while the feature embedding, especially for numerical features, has been overlooked. Existing approaches for numerical features are difficult to capture informative knowledge because of the low capacity or hard discretization based on the offline expertise feature engineering. In this paper, we propose a novel embedding learning framework for numerical features in CTR prediction (AutoDis) with high model capacity, end-to-end training and unique representation properties preserved. AutoDis consists of three core components: meta-embeddings, automatic discretization and aggregation. Specifically, we propose meta-embeddings for each numerical field to learn global knowledge from the perspective of field with a manageable number of parameters. Then the differentiable automatic discretization performs soft discretization and captures the correlations between the numerical features and meta-embeddings. Finally, distinctive and informative embeddings are learned via an aggregation function. Comprehensive experiments on two public and one industrial datasets are conducted to validate the effectiveness of AutoDis. Moreover, AutoDis has been deployed onto a mainstream advertising platform, where online A/B test demonstrates the improvement over the base model by 2.1% and 2.7% in terms of CTR and eCPM, respectively. In addition, the code of our framework is publicly available in MindSpore.

We Know What You Want: An Advertising Strategy Recommender System for Online Advertising

Liyi Guo,Junqi Jin,Haoqi Zhang,Zhenzhe Zheng,Zhiye Yang,Zhizhuang Xing,Fei Pan,Lvyin Niu,Fan Wu,Haiyang Xu,Chuan Yu,Yuning Jiang,Xiaoqiang Zhu

Advertising expenditures have become the major source of revenue for e-commerce platforms. Providing good advertising experiences for advertisers by reducing their costs of trial and error in discovering the optimal advertising strategies is crucial for the long-term prosperity of online advertising. To achieve this goal, the advertising platform needs to identify the advertisers optimization objectives, and then recommend the corresponding strategies to fulfill the objectives. In this work, we first deploy a prototype of strategy recommender system on Taobao display advertising platform, which indeed increases the advertisers performance and the platforms revenue, indicating the effectiveness of strategy recommendation for online advertising. We further augment this prototype system by explicitly learning the advertisers preferences over various advertising performance indicators and then optimization objectives through their adoptions of different recommending advertising strategies. We use contextual bandit algorithms to efficiently learn the advertisers preferences and maximize the recommendation adoption, simultaneously. Simulation experiments based on Taobao online bidding data show that the designed algorithms can effectively optimize the strategy adoption rate of advertisers.

Training Recommender Systems at Scale: Communication-Efficient Model and Data Parallelism

Vipul Gupta,Dhruv Choudhary,Peter Tang,Xiaohan Wei,Xing Wang,Yuzhen Huang,Arun Kejariwal,Kannan Ramchandran,Michael W. Mahoney

In this paper, we consider hybrid parallelism---a paradigm that employs both Data Parallelism (DP) and Model Parallelism (MP)---to scale distributed training of large recommendation models. We propose a compression framework called Dynamic Communication Thresholding (DCT) for communication-efficient hybrid training. DCT filters the entities to be communicated across the network through a simple hard-thresholding function, allowing only the most relevant information to pass through. For communication efficient DP, DCT compresses the parameter gradients sent to the parameter server during model synchronization. The threshold is updated only once every few thousand iterations to reduce the computational overhead of compression. For communication efficient MP, DCT incorporates a novel technique to compress the activations and gradients sent across the network during the forward and backward propagation, respectively. This is done by identifying and updating only the most relevant neurons of the neural network for each training sample in the data. We evaluate DCT on publicly available natural language processing and recommender models and datasets, as well as recommendation systems used in production at Facebook. DCT reduces communication by at least 100x and 20x during DP and MP, respectively. The algorithm has been deployed in production, and it improves end-to-end training time for a state-of-the-art industrial recommender model by 37%, without any loss in performance.

Budget Allocation as a Multi-Agent System of Contextual & Continuous Bandits

Benjamin Han,Carl Arndt

Budget allocation for online advertising suffers from multiple complications, including significant delay between the initial ad impression to the call to action as well as cold-start prediction problems for ad campaigns with limited or no historical performance data. To address these issues, we introduce the Contextual Budgeting System (CBS ), a budget allocation framework using a multi-agent system of contextual & continuous Multi-Armed Bandits. Our proposed solution decomposes the problem into a convex optimization problem whose objective is drawn using Thompson Sampling. In order to efficiently deal with context and cold-start, we propose a transfer learning mechanism using supervised learning methods that augment simple parametric models.We apply an implementation of this algorithm to all spending for new driver acquisition at Lyft and measure a (22 u00b1 10)% improvement in the mean Cost Per user Acquisition (CPA) over a previous non-contextual model, based on Markov Chain Monte-Carlo, generating tens of millions of dollars annually in efficiency improvements while also increasing total user acquisition.

MEDTO: Medical Data to Ontology Matching Using Hybrid Graph Neural Networks

Junheng Hao,Chuan Lei,Vasilis Efthymiou,Abdul Quamar,Fatma u00d6zcan,Yizhou Sun,Wei Wang

Medical ontologies are widely used to describe and organize medical terminologies and to support many critical applications on healthcare databases. These ontologies are often manually curated (e.g., UMLS, SNOMED CT, and MeSH) by medical experts. Medical databases, on the other hand, are often created by database administrators, using different terminology and structures. The discrepancies between medical ontologies and databases compromise interoperability between them. Data to ontology matching is the process of finding semantic correspondences between tables in databases to standard ontologies. Existing solutions such as ontology matching have mostly focused on engineering features from terminological, structural, and semantic model information extracted from the ontologies. However, this is often labor intensive and the accuracy varies greatly across different ontologies. Worse yet, the ontology capturing a medical database is often not given in practice. In this paper, we propose MEDTO, a novel end-to-end framework that consists of three innovative techniques: (1) a lightweight yet effective method that bootstrap a semantically rich ontology from a given medical database, (2) a hyperbolic graph convolution layer that encodes hierarchical concepts in the hyperbolic space, and (3) a heterogeneous graph layer that encodes both local and global context information of a concept. Experiments on two real-world medical datasets matching against SNOMED CT show significant improvements compared to the state-of-the-art methods. MEDTO also consistently achieves competitive results on a benchmark from the Ontology Alignment Evaluation Initiative.

Adversarial Feature Translation for Multi-domain Recommendation

Xiaobo Hao,Yudan Liu,Ruobing Xie,Kaikai Ge,Linyao Tang,Xu Zhang,Leyu Lin

Real-world super platforms such as Google and WeChat usually have different recommendation scenarios to provide heterogeneous items for users diverse demands. Multi-domain recommendation (MDR) is proposed to improve all recommendation domains simultaneously, where the key point is to capture informative domain-specific features from all domains. To address this problem, we propose a novel Adversarial feature translation (AFT) model for MDR, which learns the feature translations between different domains under a generative adversarial network framework. Precisely, in the multi-domain generator, we propose a domain-specific masked encoder to highlight inter-domain feature interactions, and then aggregate these features via a transformer and a domain-specific attention. In the multi-domain discriminator, we explicitly model the relationships between item, domain and users general/domain-specific representations with a two-step feature translation inspired by the knowledge representation learning. In experiments, we evaluate AFT on a public and an industrial MDR datasets and achieve significant improvements. We also conduct an online evaluation on a real-world MDR system. We further give detailed ablation tests and model analyses to verify the effectiveness of different components. Currently, we have deployed AFT on WeChat Top Stories. The source code is in https://github.com/xiaobocser/AFT.

Neural Instant Search for Music and Podcast

Helia Hashemi,Aasish Pappu,Mi Tian,Praveen Chandar,Mounia Lalmas,Benjamin Carterette

Over recent years, podcasts have emerged as a novel medium for sharing and broadcasting information over the Internet. Audio streaming platforms originally designed for music content, such as Amazon Music, Pandora, and Spotify, have reported a rapid growth, with millions of users consuming podcasts every day. With podcasts emerging as a new medium for consuming information, the need to develop information access systems that enable efficient and effective discovery from a heterogeneous collection of music and podcasts is more important than ever. However, information access in such domains still remains understudied. In this work, we conduct a large-scale log analysis to study and compare podcast and music search behavior on Spotify, a major audio streaming platform. Our findings suggest that there exist fundamental differences in user behavior while searching for podcasts compared to music. Specifically, we identify the need to improve podcast search performance. We propose a simple yet effective transformer-based neural instant search model that retrieves items from a heterogeneous collection of music and podcast content. Our model takes advantage of multi-task learning to optimize for a ranking objective in addition to a query intent type identification objective. Our experiments on large-scale search logs show that the proposed model significantly outperforms strong baselines for both podcast and music queries.

A Unified Solution to Constrained Bidding in Online Display Advertising

Yue He,Xiujun Chen,Di Wu,Junwei Pan,Qing Tan,Chuan Yu,Jian Xu,Xiaoqiang Zhu

In online display advertising, advertisers usually participate in real-time bidding to acquire ad impression opportunities. In most advertising platforms, a typical impression acquiring demand of advertisers is to maximize the sum value of winning impressions under budget and some key performance indicators constraints, (e.g. maximizing clicks with the constraints of budget and cost per click upper bound). The demand can be various in value type (e.g. ad exposure/click), constraint type (e.g. cost per unit value) and constraint number. Existing works usually focus on a specific demand or hardly achieve the optimum. In this paper, we formulate the demand as a constrained bidding problem, and deduce a unified optimal bidding function on behalf of an advertiser. The optimal bidding function facilitates an advertiser calculating bids for all impressions with only m parameters, where m is the constraint number. However, in real application, it is non-trivial to determine the parameters due to the non-stationary auction environment. We further propose a reinforcement learning (RL) method to dynamically adjust parameters to achieve the optimum, whose converging efficiency is significantly boosted by the recursive optimization property in our formulation. We name the formulation and the RL method, together, as Unified Solution to Constrained Bidding (USCB). USCB is verified to be effective on industrial datasets and is deployed in Alibaba display advertising platform.

Sliding Spectrum Decomposition for Diversified Recommendation

Yanhua Huang,Weikun Wang,Lei Zhang,Ruiwen Xu

Content feed, a type of product that recommends a sequence of items for users to browse and engage with, has gained tremendous popularity among social media platforms. In this paper, we propose to study the diversity problem in such a scenario from an item sequence perspective using time series analysis techniques. We derive a method calledsliding spectrum decomposition (SSD) that captures users perception of diversity in browsing a long item sequence. We also share our experiences in designing and implementing a suitable item embedding method for accurate similarity measurement under long tail effect. Combined together, they are now fully implemented and deployed in Xiaohongshu Apps production recommender system that serves the main Explore Feed product for tens of millions of users every day. We demonstrate the effectiveness and efficiency of the method through theoretical analysis, offline experiments and online A/B tests.

Hierarchical Training: Scaling Deep Recommendation Models on Large CPU Clusters

Yuzhen Huang,Xiaohan Wei,Xing Wang,Jiyan Yang,Bor-Yiing Su,Shivam Bharuka,Dhruv Choudhary,Zewei Jiang,Hai Zheng,Jack Langman

Neural network based recommendation models are widely used to power many internet-scale applications including product recommendation and feed ranking. As the models become more complex and more training data is required during training, improving the training scalability of these recommendation models becomes an urgent need. However, improving the scalability without sacrificing the model quality is challenging. In this paper, we conduct an in-depth analysis of the scalability bottleneck in existing training architecture on large scale CPU clusters. Based on these observations, we propose a new training architecture called Hierarchical Training, which exploits both data parallelism and model parallelism for the neural network part of the model within a group. We implement hierarchical training with a two-layer design: a tagging system that decides the operator placement and a net transformation system that materializes the training plans, and integrate hierarchical training into existing training stack. We propose several optimizations to improve the scalability of hierarchical training including model architecture optimization, communication compression, and various system-level improvements. Extensive experiments at massive scale demonstrate that hierarchical training can speed up distributed recommendation model training by 1.9x without model quality drop.

Deep Inclusion Relation-aware Network for User Response Prediction at Fliggy

Zai Huang,Mingyuan Tao,Bufeng Zhang

User response prediction plays a crucial role in many applications (e.g. search ranking and personalized recommendation) at online travel platforms. Although existing methods have made a great success by focusing on feature interaction or user behaviors, they cannot synthetically exploit item inclusion relations describing relationships of an item including or being included by another one, which are important components among travel items. To this end, in this paper, we propose a novel Deep Inclusion Relation-aware Network (DIRN) for user response prediction by synthetically exploiting inclusion relations among travel items. Specifically, on the item graph constructed with inclusion relations, we first leverage a node embedding approach to learn the item graph-based embedding. Then, we design Representation-based Interest Layer and Relation Path Interest Layer to extract user latent interest with user behaviors in two ways. Representation-based Interest Layer models the item-to-item similarity based on item representations containing the graph-based embedding with an attention mechanism and obtains user temporal interest by summing up representations of interacted items with similarities. Relation Path Interest Layer measures item-to-item realistic associations to extract user interest with inclusion relation paths. Offline experiments on a real-world data from Fliggy clearly validate the effectiveness of DIRN. Furthermore, DIRN has been successfully deployed online in search ranking at Fliggy and achieves significant improvement.

MPCSL - A Modular Pipeline for Causal Structure Learning

Johannes Huegle,Christopher Hagedorn,Michael Perscheid,Hasso Plattner

The examination of causal structures is crucial for data scientists in a variety of machine learning application scenarios. In recent years, the corresponding interest in methods of causal structure learning has led to a wide spectrum of independent implementations, each having specific accuracy characteristics and introducing implementation-specific overhead in the runtime. Hence, considering a selection of algorithms or different implementations in different programming languages utilizing different hardware setups becomes a tedious manual task with high setup costs. Consequently, a tool that enables to plug in existing methods from different libraries into a single system to compare and evaluate the results is substantial support for data scientists in their research efforts.In this work, we propose an architectural blueprint of a pipeline for causal structure learning and outline our reference implementation MPCSL that addresses the requirements towards platform independence and modularity while ensuring the comparability and reproducibility of experiments. Moreover, we demonstrate the capabilities of MPCSL within a case study, where we evaluate existing implementations of the well-known PC-Algorithm concerning their runtime performance characteristics.

Knowledge-Guided Efficient Representation Learning for Biomedical Domain

Kishlay Jha,Guangxu Xun,Nan Du,Aidong Zhang

Pre-trained concept representations are essential to many biomedical text mining and natural language processing tasks. As such, various representation learning approaches have been proposed in the literature. More recently, contextualized embedding approaches (i.e., BERT based models) that capture the implicit semantics of concepts at a granular level have significantly outperformed the conventional word embedding approaches (i.e., Word2Vec/GLoVE based models). Despite significant accuracy gains achieved, these approaches are often computationally expensive and memory inefficient. To address this issue, we propose a new representation learning approach that efficiently adapts the concept representations to the newly available data. Specifically, the proposed approach develops a knowledge-guided continual learning strategy wherein the accurate/stable context-information present in human-curated knowledge-bases is exploited to continually identify and retrain the representations of those concepts whose corpus-based context evolved coherently over time. Different from previous studies that mainly leverage the curated knowledge to improve the accuracy of embedding models, the proposed research explores the usefulness of semantic knowledge from the perspective of accelerating the training efficiency of embedding models. Comprehensive experiments under various efficiency constraints demonstrate that the proposed approach significantly improves the computational performance of biomedical word embedding models.

Bootstrapping for Batch Active Sampling

Heinrich Jiang,Maya R. Gupta

The goal of active learning is to select the best examples from an unlabeled pool of data to label to improve a model trained with the addition of these labeled examples. We discuss a real-world use case for batch active sampling that works at larger scales. The standard margin algorithm has repeatedly been shown difficult to beat in practice for the classic active sampling set-up, but for larger batches and candidate pools, we show that margin sampling may not provide enough diversity. We present a simple variant of margin sampling for the batch setting that scores candidate samples by their minimum margin to a set of bootstrapped margins, and explain how this proposal increases diversity in a supervised and efficient way, and why it differs from the usual ensemble methods for active sampling. Experiments on benchmark datasets show that the proposed min-margin sampling consistently works better than margin as the batch size grows, and better than the five other diversity-encouraging active sampling methods we tested. Two real-world case studies illustrate the practical value, and help highlight challenges of applying and deploying batch active sampling.

FleetRec: Large-Scale Recommendation Inference on Hybrid GPU-FPGA Clusters

Wenqi Jiang,Zhenhao He,Shuai Zhang,Kai Zeng,Liang Feng,Jiansong Zhang,Tongxuan Liu,Yong Li,Jingren Zhou,Ce Zhang,Gustavo Alonso

We present FleetRec, a high-performance and scalable recommendation inference system within tight latency constraints. FleetRec takes advantage of heterogeneous hardware including GPUs and the latest FPGAs equipped with high-bandwidth memory. By disaggregating computation and memory to different types of hardware and bridging their connections by high-speed network, FleetRec gains the best of both worlds, and can naturally scale out by adding nodes to the cluster. Experiments on three production models up to 114 GB show that FleetRec outperforms optimized CPU baseline by more than one order of magnitude in terms of throughput while achieving significantly lower latency.

Network Experimentation at Scale

Brian Karrer,Liang Shi,Monica Bhole,Matt Goldman,Tyrone Palmer,Charlie Gelman,Mikael Konutgan,Feng Sun

We describe our network experimentation framework, deployed at Facebook, which accounts for interference between experimental units. We document this system, including the design and estimation procedures, and detail insights we have gained from the many experiments that have used this system at scale. In our estimation procedure, we introduce a cluster-based regression adjustment that substantially improves precision for estimating global treatment effects, as well as a procedure to test for interference. With our regression adjustment, we find that imbalanced clusters can better account for interference than balanced clusters without sacrificing accuracy. In addition, we show that logging exposure to a treatment can result in additional variance reduction. Interference is a widely acknowledged issue in online field experiments, yet there is less evidence from real-world experiments demonstrating interference in online settings. We fill this gap by describing two case studies that capture significant network effects and highlight the value of this experimentation framework.

Addressing Non-Representative Surveys using Multiple Instance Learning

Yaniv Katz,Oded Vainas

In recent years, non representative survey sampling and non response bias constitute major obstacles in obtaining a reliable population quantity estimate from finite survey samples. As such, researchers have been focusing on identifying methods to resolve these biases. In this paper, we look at this well known problem from a fresh perspective, and formulate it as a learning problem. To meet this challenge, we suggest solving the learning problem using a multiple instance learning (MIL) paradigm. We devise two different MIL based neural network topologies, each based on a different implementation of an attention pooling layer. These models are trained to accurately infer the population quantity of interest even when facing a biased sample. To the best of our knowledge, this is the first time MIL has ever been suggested as a solution to this problem. In contrast to commonly used statistical methods, this approach can be accomplished without having to collect sensitive personal data of the respondents and without having to access population level statistics of the same sensitive data. To validate the effectiveness of our approaches, we test them on a real-world movie rating dataset which is used to mimic a biased survey by experimentally contaminating it with different kinds of survey bias. We show that our suggested topologies outperform other MIL architectures, and are able to partly counter the adverse effect of biased sampling on the estimation quality. We also demonstrate how these methods can be easily adapted to perform well even when part of the survey is based on a small number of respondents.

Micro-climate Prediction - Multi Scale Encoder-decoder based Deep Learning Framework

Peeyush Kumar,Ranveer Chandra,Chetan Bansal,Shivkumar Kalyanaraman,Tanuja Ganu,Michael Grant

This paper presents a deep learning approach for a versatile Micro-climate prediction framework (DeepMC). Micro climate predictions are of critical importance across various applications, such as Agriculture, Forestry, Energy, Search & Rescue, etc. To the best of our knowledge, there is no other single framework which can accurately predict various micro-climate entities using Internet of Things (IoT)data. We present a generic framework (DeepMC) which predicts various climatic parameters such as soil moisture, humidity, windspeed, radiation, temperature based on the requirement over a period of 12 hours - 120 hours with a varying resolution of 1 hour - 6hours, respectively. This framework proposes the following new ideas: 1) Localization of weather forecast to IoT sensors by fusing weather station forecasts with the decomposition of IoT data at multiple scales and 2) A multi-scale encoder and two levels of attention mechanisms which learns a latent representation of the interaction between various resolutions of the IoT sensor data and weather station forecasts. We present multiple real-world agricultural and energy scenarios, and report results with uncertainty estimates from the live deployment of DeepMC, which demonstrate that DeepMC outperforms various baseline methods and reports 90%+ accuracy with tight error bounds.

Architecture and Operation Adaptive Network for Online Recommendations

Lang Lang,Zhenlong Zhu,Xuanye Liu,Jianxin Zhao,Jixing Xu,Minghui Shan

Learning feature interactions is crucial for model performance in online recommendations. Extensive studies are devoted to designing effective structures for learning interactive information in an explicit way and tangible progress has been made. However, the core interaction calculations of these models are artificially specified, such as inner product, outer product and self-attention, which results in high dependence on domain knowledge. Hence model effect is bounded by both restriction of human experience and the finiteness of candidate operations. In this paper, we propose a generalized interaction paradigm to lift the limitation, where operations adopted by existing models can be regarded as its special form. Based on this paradigm, we design a novel model to adaptively explore and optimize the operation itself according to data, named generalized interaction network(GIN). We proved that GIN is a generalized form of a wide range of state-of-the-art models, which means GIN can automatically search for the best operation among these models as well as a broader underlying architecture space. Finally, an architecture adaptation method is introduced to further boost the performance of GIN by discriminating important interactions. Thereby, architecture and operation adaptive network(AOANet) is presented. Experiment results on two large scale datasets show the superiority of our model. AOANet has been deployed to industrial production. In a 7-day A/B test, the click-through rate increased by 10.94%, which represents considerable business benefits.

Dual Attentive Sequential Learning for Cross-Domain Click-Through Rate Prediction

Pan Li,Zhichao Jiang,Maofei Que,Yao Hu,Alexander Tuzhilin

Cross domain recommender system constitutes a powerful method to tackle the cold-start and sparsity problem by aggregating and transferring user preferences across multiple category domains. Therefore, it has great potential to improve click-through-rate prediction performance in online commerce platforms having many domains of products. While several cross domain sequential recommendation models have been proposed to leverage information from a source domain to improve CTR predictions in a target domain, they did not take into account bidirectional latent relations of user preferences across source-target domain pairs. As such, they cannot provide enhanced cross-domain CTR predictions for both domains simultaneously. In this paper, we propose a novel approach to cross-domain sequential recommendations based on the dual learning mechanism that simultaneously transfers information between two related domains in an iterative manner until the learning process stabilizes. In particular, the proposed Dual Attentive Sequential Learning (DASL) model consists of two novel components Dual Embedding and Dual Attention, which jointly establish the two-stage learning process: we first construct dual latent embeddings that extract user preferences in both domains simultaneously, and subsequently provide cross-domain recommendations by matching the extracted latent embeddings with candidate items through dual-attention learning mechanism. We conduct extensive offline experiments on three real-world datasets to demonstrate the superiority of our proposed model, which significantly and consistently outperforms several state-of-the-art baselines across all experimental settings. We also conduct an online A/B test at a major video streaming platform Alibaba-Youku, where our proposed model significantly improves business performance over the latest production system in the company.

Embedding-based Product Retrieval in Taobao Search

Sen Li,Fuyu Lv,Taiwei Jin,Guli Lin,Keping Yang,Xiaoyi Zeng,Xiao-Ming Wu,Qianli Ma

Nowadays, the product search service of e-commerce platforms has become a vital shopping channel in peoples life. The retrieval phase of products determines the search systems quality and gradually attracts researchers attention. Retrieving the most relevant products from a large-scale corpus while preserving personalized user characteristics remains an open question. Recent approaches in this domain have mainly focused on embedding-based retrieval (EBR) systems. However, after a long period of practice on Taobao, we find that the performance of the EBR system is dramatically degraded due to its: (1) low relevance with a given query and (2) discrepancy between the training and inference phases. Therefore, we propose a novel and practical embedding-based product retrieval model, named Multi-Grained Deep Semantic Product Retrieval (MGDSPR). Specifically, we first identify the inconsistency between the training and inference stages, and then use the softmax cross-entropy loss as the training objective, which achieves better performance and faster convergence. Two efficient methods are further proposed to improve retrieval relevance, including smoothing noisy training data and generating relevance-improving hard negative samples without requiring extra knowledge and training procedures. We evaluate MGDSPR on Taobao Product Search with significant metrics gains observed in offline experiments and online A/B tests. MGDSPR has been successfully deployed to the existing multi-channel retrieval system in Taobao Search. We also introduce the online deployment scheme and share practical lessons of our retrieval system to contribute to the community.

Debiasing Learning based Cross-domain Recommendation

Siqing Li,Liuyi Yao,Shanlei Mu,Wayne Xin Zhao,Yaliang Li,Tonglei Guo,Bolin Ding,Ji-Rong Wen

As it becomes prevalent that user information exists in multiple platforms or services, cross-domain recommendation has been an important task in industry. Although it is well known that users tend to show different preferences in different domains, existing studies seldom model how domain biases affect user preferences. Focused on this issue, we develop a casual-based approach to mitigating the domain biases when transferring the user information cross domains. To be specific, this paper presents a novel debiasing learning based cross-domain recommendation framework with causal embedding. In this framework, we design a novel Inverse-Propensity-Score (IPS) estimator designed for cross-domain scenario, and further propose three kinds of restrictions for propensity score learning. Our framework can be generally applied to various recommendation algorithms for cross-domain recommendation. Extensive experiments on both public and industry datasets have demonstrated the effectiveness of the proposed framework.

An Experimental Study of Quantitative Evaluations on Saliency Methods

Xiao-Hui Li,Yuhan Shi,Haoyang Li,Wei Bai,Caleb Chen Cao,Lei Chen

It has been long debated that eXplainable AI (XAI) is an important technology for model and data exploration, validation, and debugging. To deploy XAI into actual systems, an executable and comprehensive evaluation of the quality of generated explanation is highly in demand. In this paper, we briefly summarize the status quo of the quantitative metrics of different properties of XAI including evaluation on faithfulness, localization, sensitivity check, and stability. With an exhaustive experimental study based on them, we conclude that among all the typical methods we compare, no single explanation method dominates others in all metrics. Nonetheless, Gradient-weighted Class Activation Mapping (Grad-CAM) and Randomly Input Sampling for Explanation (RISE) perform fairly well in most of the metrics. We further present a novel utilization of the evaluation results to diagnose the classification bases for models. Hopefully, this valuable work could serve as a guide for future research.

Reinforced Iterative Knowledge Distillation for Cross-Lingual Named Entity Recognition

Shining Liang,Ming Gong,Jian Pei,Linjun Shou,Wanli Zuo,Xianglin Zuo,Daxin Jiang

Named entity recognition (NER) is a fundamental component in many applications, such as Web Search and Voice Assistants. Although deep neural networks greatly improve the performance of NER, due to the requirement of large amounts of training data, deep neural networks can hardly scale out to many languages in an industry setting. To tackle this challenge, cross-lingual NER transfers knowledge from a rich-resource language to languages with low resources through pre-trained multilingual language models. Instead of using training data in target languages, cross-lingual NER has to rely on only training data in source languages, and optionally adds the translated training data derived from source languages. However, the existing cross-lingual NER methods do not make good use of rich unlabeled data in target languages, which is relatively easy to collect in industry applications. To address the opportunities and challenges, in this paper we describe our novel practice in Microsoft to leverage such large amounts of unlabeled data in target languages in real production settings. To effectively extract weak supervision signals from the unlabeled data, we develop a novel approach based on the ideas of semi-supervised learning and reinforcement learning. The empirical study on three benchmark data sets verifies that our approach establishes the new state-of-the-art performance with clear edges. Now, the NER techniques reported in this paper are on their way to become a fundamental component for Web ranking, Entity Pane, Answers Triggering, and Question Answering in the Microsoft Bing search engine. Moreover, our techniques will also serve as part of the Spoken Language Understanding module for a commercial voice assistant. We plan to open source the code of the prototype framework after deployment.

Unveiling Fake Accounts at the Time of Registration: An Unsupervised Approach

Xiao Liang,Zheng Yang,Binghui Wang,Shaofeng Hu,Zijie Yang,Dong Yuan,Neil Zhenqiang Gong,Qi Li,Fang He

Online social networks (OSNs) are plagued by fake accounts. Existing fake account detection methods either require a manually labeled training set, which is time-consuming and costly, or rely on rich information of OSN accounts, e.g., content and behaviors, which incurs significant delay in detecting fake accounts. In this work, we propose UFA (Unveiling Fake Accounts) to detect fake accounts immediately after they are registered in an unsupervised fashion. First, through a measurement study on the registration patterns on a real-world registration dataset, we observe that fake accounts tend to cluster on outlier registration patterns, e.g., IP and phone numbers. Then, we design an unsupervised learning algorithm to learn weights for all registration accounts and their features that reveal outlier registration patterns. Next, we construct a registration graph to capture the correlation between registration accounts, and utilize a community detection method to detect fake accounts via analyzing the registration graph structure. We evaluate UFA using real-world WeChat datasets. Our results demonstrate that UFA achieves a precision 94% with a recall ~80%, while a supervised variant requires 600K manual labels to obtain the comparable performance. Moreover, UFA has been deployed by WeChat to detect fake accounts for more than one year. UFA detects 500K fake accounts per day with a precision ~93% on average, via manual verification by the WeChat security team.

M6: Multi-Modality-to-Multi-Modality Multitask Mega-transformer for Unified Pretraining

Junyang Lin,Rui Men,An Yang,Chang Zhou,Yichang Zhang,Peng Wang,Jingren Zhou,Jie Tang,Hongxia Yang

Multimodal pretraining has demonstrated success in the downstream tasks of cross-modal representation learning. However, it is limited to the English data, and there is still a lack of large-scale dataset for multimodal pretraining in Chinese. In this work, we propose the largest dataset for pretraining in Chinese, which consists of over 1.9TB images and 292GB texts. The dataset has large coverage over domains, including encyclopedia, question answering, forum discussion, etc. Besides, we propose a method called M6, referring to Multi-Modality-to-Multi-Modality Multitask Mega-transformer, for unified pretraining on the data of single modality and multiple modalities. The model is pretrained with our proposed tasks, including text-to-text transfer, image-to-text transfer, as well as multi-modality-to-text transfer. The tasks endow the model with strong capability of understanding and generation. We scale the model to 10 billion parameters, and build the largest pretrained model in Chinese. Experimental results show that our proposed M6 outperforms the baseline in a number of downstream tasks concerning both single modality and multiple modalities, and the 10B-parameter pretrained model demonstrates strong potential in the setting of zero-shot learning.

PAM: Understanding Product Images in Cross Product Category Attribute Extraction

Rongmei Lin,Xiang He,Jie Feng,Nasser Zalmout,Yan Liang,Li Xiong,Xin Luna Dong

Understanding product attributes plays an important role in improving online shopping experience for customers and serves asan integral part for constructing a product knowledge graph. Most existing methods focus on attribute extraction from text description or utilize visual information from product images such as shape and color. Compared to the inputs considered in prior works, a product image in fact contains more information, represented by a rich mixture of words and visual clues with a layout carefully designed to impress customers. This work proposes a more inclusive framework that fully utilizes these different modalities for attribute extraction.Inspired by recent works in visual question answering, we use a transformer based sequence to sequence model to fuse representations of product text, Optical Character Recognition (OCR) tokens and visual objects detected in the product image. The framework is further extended with the capability to extract attribute value across multiple product categories with a single model, by training the decoder to predict both product category and attribute value and conditioning its output on product category. The model provides a unified attribute extraction solution desirable at an e-commerce platform that offers numerous product categories with a diverse body of product attributes. We evaluated the model on two product attributes, one with many possible values and one with a small set of possible values, over 14 product categories and found the model could achieve 15% gain on the Recall and 10% gain on the F1 score compared to existing methods using text-only features.

Large-Scale Network Embedding in Apache Spark

Wenqing Lin

Network embedding has been widely used in social recommendation and network analysis, such as recommendation systems and anomaly detection with graphs. However, most of previous approaches cannot handle large graphs efficiently, due to that (i) computation on graphs is often costly and (ii) the size of graph or the intermediate results of vectors could be prohibitively large, rendering it difficult to be processed on a single machine. In this paper, we propose an efficient and effective distributed algorithm for network embedding on large graphs using Apache Spark, which recursively partitions a graph into several small-sized subgraphs to capture the internal and external structural information of nodes, and then computes the network embedding for each subgraph in parallel. Finally, by aggregating the outputs on all subgraphs, we obtain the embeddings of nodes in a linear cost. After that, we demonstrate in various experiments that our proposed approach is able to handle graphs with billions of edges within a few hours and is at least 4 times faster than the state-of-the-art approaches. Besides, it achieves up to 4.25% and 4.27% improvements on link prediction and node classification tasks respectively. In the end, we deploy the proposed algorithms in two online games of Tencent with the applications of friend recommendation and item recommendation, which improve the competitors by up to 91.11% in running time and up to 12.80% in the corresponding evaluation metrics.

Intention-aware Heterogeneous Graph Attention Networks for Fraud Transactions Detection

Can Liu,Li Sun,Xiang Ao,Jinghua Feng,Qing He,Hao Yang

Fraud transactions have been the major threats to the healthy development of e-commerce platforms, which not only damage the user experience but also disrupt the orderly operation of the market. User behavioral data is widely used to detect fraud transactions, and recent works show that accurate modeling of user intentions in behavioral sequences can propel further improvements on the performances. However, most existing methods treat each transaction as an independent data instance without considering the transaction-level interactions accessed by transaction attributes, e.g., information on remark, logistics, payment, device and etc., which may fail to achieve satisfactory results in more complex scenarios. In this paper, a novel heterogeneous transaction-intention network is devised to leverage the cross-interaction information over transactions and intentions, which consists of two types of nodes, namely transaction and intention nodes, and two types of edges, i.e., transaction-intention and transaction-transaction edges. Then we propose a graph neural method coined IHGAT(Intention-aware Heterogeneous Graph ATtention networks) that not only perceives sequence-like intentions, but also encodes the relationship among transactions. Extensive experiments on a real-world dataset of Alibaba platform show that our proposed algorithm outperforms state-of-the-art methods in both offline and online modes.

Categorization of Financial Transactions in QuickBooks

Juan Liu,Lei Pei,Ying Sun,Heather Simpson,Jocelyn Lu,Nhung Ho

This paper shares our work on building a machine learning system to categorize transactions for Intuits QuickBooks product. Transaction categorization is challenging due to the complexity of accounting, the need for personalization, and the diversity of customers. We have broken down this monolithic problem into smaller pieces based on customers life-cycle stages, and tailored solutions to address customer pain-points for each. Modern machine learning technologies such as deep neural networks, transfer learning, and few-shot learning are adopted to enable accurate transaction categorization. Furthermore our system learns user actions in real-time to provide relevant and in-time category recommendations. This in-session learning capability reduces user workload, improves customer experience, and helps to cultivate confidence.

KompaRe: A Knowledge Graph Comparative Reasoning System

Lihui Liu,Boxin Du,Yi Ren Fung,Heng Ji,Jiejun Xu,Hanghang Tong

Reasoning is a fundamental capability for harnessing valuable insight, knowledge and patterns from knowledge graphs. Existing work has primarily been focusing on point-wise reasoning, including search, link prediction, entity prediction, subgraph matching and so on. This paper introduces comparative reasoning over knowledge graphs, which aims to infer the commonality and inconsistency with respect to multiple pieces of clues. We envision that the comparative reasoning will complement and expand the existing point-wise reasoning over knowledge graphs. In detail, we develop KompaRe, the first of its kind prototype system that provides comparative reasoning capability over large knowledge graphs. We present both the system architecture and its core algorithms, including knowledge segment extraction, pairwise reasoning and collective reasoning. Empirical evaluations demonstrate the efficacy of the proposed KompaRe.

Lane Change Scheduling for Autonomous Vehicle: A Prediction-and-Search Framework

Shuncheng Liu,Han Su,Yan Zhao,Kai Zeng,Kai Zheng

Automation in road vehicles is an emerging technology that has developed rapidly over the last decade. There have been many inter-disciplinary challenges posed on existing transportation infrastructure by autonomous vehicles (AV). In this paper, we conduct an algorithmic study on when and how an autonomous vehicle should change its lane, which is a fundamental problem in vehicle automation field and root cause of most phantom traffic jams. We propose a prediction-and-search framework, called Cheetah (Change lane smart for autonomous vehicle), which aims to optimize the lane changing maneuvers of autonomous vehicle while minimizing its impact on surrounding vehicles. In the prediction phase, Cheetah learns the spatio-temporal dynamics from historical trajectories of surrounding vehicles with a deep model (GAS-LED) and predict their corresponding actions in the near future. A global attention mechanism and state sharing strategy are also incorporated to achieve higher accuracy and better convergence efficiency. Then in the search phase, Cheetah looks for optimal lane change maneuvers for the autonomous vehicle by taking into account a few factors such as speed, impact on other vehicles and safety issues. A tree-based adaptive beam search algorithm is designed to reduce the search space and improve accuracy. Extensive experiments on real and synthetic data evidence that the proposed framework excels state-of-the-art competitors with respect to both effectiveness and efficiency.

Neural Auction: End-to-End Learning of Auction Mechanisms for E-Commerce Advertising

Xiangyu Liu,Chuan Yu,Zhilin Zhang,Zhenzhe Zheng,Yu Rong,Hongtao Lv,Da Huo,Yiqing Wang,Dagui Chen,Jian Xu,Fan Wu,Guihai Chen,Xiaoqiang Zhu

In e-commerce advertising, it is crucial to jointly consider various performance metrics, e.g., user experience, advertiser utility, and platform revenue. Traditional auction mechanisms, such as GSP and VCG auctions, can be suboptimal due to their fixed allocation rules to optimize a single performance metric (e.g., revenue or social welfare). Recently, data-driven auctions, learned directly from auction outcomes to optimize multiple performance metrics, have attracted increasing research interests. However, the procedure of auction mechanisms involves various discrete calculation operations, making it challenging to be compatible with continuous optimization pipelines in machine learning. In this paper, we design Deep Neural Auctions (DNAs) to enable end-to-end auction learning by proposing a differentiable model to relax the discrete sorting operation, a key component in auctions. We optimize the performance metrics by developing deep models to efficiently extract contexts from auctions, providing rich features for auction design. We further integrate the game theoretical conditions within the model design, to guarantee the stability of the auctions. DNAs have been successfully deployed in the e-commerce advertising system at Taobao. Experimental evaluation results on both large-scale data set as well as online A/B test demonstrated that DNAs significantly outperformed other mechanisms widely adopted in industry.

Pre-trained Language Model for Web-scale Retrieval in Baidu Search

Yiding Liu,Weixue Lu,Suqi Cheng,Daiting Shi,Shuaiqiang Wang,Zhicong Cheng,Dawei Yin

Retrieval is a crucial stage in web search that identifies a small set of query-relevant candidates from a billion-scale corpus. Discovering more semantically-related candidates in the retrieval stage is very promising to expose more high-quality results to the end users. However, it still remains non-trivial challenges of building and deploying effective retrieval models for semantic matching in real search engine. In this paper, we describe the retrieval system that we developed and deployed in Baidu Search. The system exploits the recent state-of-the-art Chinese pretrained language model, namely Enhanced Representation through kNowledge IntEgration (ERNIE), which facilitates the system with expressive semantic matching. In particular, we developed an ERNIE-based retrieval model, which is equipped with 1) expressive Transformer-based semantic encoders, and 2) a comprehensive multi-stage training paradigm. More importantly, we present a practical system workflow for deploying the model in web-scale retrieval. Eventually, the system is fully deployed into production, where rigorous offline and online experiments were conducted. The results show that the system can perform high-quality candidate retrieval, especially for those tail queries with uncommon demands. Overall, the new retrieval system facilitated by pretrained language model (i.e., ERNIE) can largely improve the usability and applicability of our search engine.

Que2Search: Fast and Accurate Query and Document Understanding for Search at Facebook

Yiqun Liu,Kaushik Rangadurai,Yunzhong He,Siddarth Malreddy,Xunlong Gui,Xiaoyi Liu,Fedor Borisyuk

In this paper, we present Que2Search, a deployed query and product understanding system for search. Que2Search leverages multi-task and multi-modal learning approaches to train query and product representations. We achieve over 5% absolute offline relevance improvement and over 4% online engagement gain over state-of-the-art Facebook product understanding system by combining the latest multilingual natural language understanding architectures like XLM and XLM-R with multi-modal fusion techniques. In this paper, we describe how we deploy XLM-based search query understanding model that runs <1.5ms @P99 on CPU at Facebook scale, which has been a significant challenge in the industry. We also describe what model optimizations worked (and what did not) based on numerous offline and online A/B experiments. We deploy Que2Search to Facebook Marketplace Search and share our deployment experience to production and tuning tricks to achieve higher efficiency in online A/B experiments. Que2Search has demonstrated gains in production applications and operates at Facebook scale.

AliCoCo2: Commonsense Knowledge Extraction, Representation and Application in E-commerce

Xusheng Luo,Le Bo,Jinhang Wu,Lin Li,Zhiy Luo,Yonghua Yang,Keping Yang

Commonsense knowledge used by humans while doing online shopping is valuable but difficult to be captured by existing systems running on e-commerce platforms. While construction of common- sense knowledge graphs in e-commerce is non-trivial, representation learning upon such graphs poses unique challenge compared to well-studied open-domain knowledge graphs (e.g., Freebase). By leveraging the commonsense knowledge and representation techniques, various applications in e-commerce can be benefited. Based on AliCoCo, the large-scale e-commerce concept net assisting a series of core businesses in Alibaba, we further enrich it with more commonsense relations and present AliCoCo2, the first commonsense knowledge graph constructed for e-commerce use. We propose a multi-task encoder-decoder framework to provide effective representations for nodes and edges from AliCoCo2. To explore the possibility of improving e-commerce businesses with commonsense knowledge, we apply newly mined commonsense relations and learned embeddings to e-commerce search engine and recommendation system in different ways. Experimental results demonstrate that our proposed representation learning method achieves state-of-the-art performance on the task of knowledge graph completion (KGC), and applications on search and recommendation indicate great potential value of the construction and use of commonsense knowledge graph in e-commerce. Besides, we propose an e-commerce QA task with a new benchmark during the construction of AliCoCo2, for testing machine common sense in e-commerce, which can benefit research community in exploring commonsense reasoning.

What Happened Next? Using Deep Learning to Value Defensive Actions in Football Event-Data

Charbel Merhej,Ryan J. Beal,Tim Matthews,Sarvapali Ramchurn

Objectively quantifying the value of player actions in football (soccer) is a challenging problem. To date, studies in football analytics have mainly focused on the attacking side of the game, while there has been less work on event-driven metrics for valuing defensive actions (e.g., tackles and interceptions). Therefore in this paper, we use deep learning techniques to define a novel metric that values such defensive actions by studying the threat of passages of play that preceded them. By doing so, we are able to value defensive actions based on what they prevented from happening in the game. Our Defensive Action Expected Threat (DAxT) model has been validated using real-world event-data from the 2017/2018 and 2018/2019 English Premier League seasons, and we combine our model outputs with additional features to derive an overall rating of defensive ability for players. Overall, we find that our model is able to predict the impact of defensive actions allowing us to better value defenders using event-data.

Diversity driven Query Rewriting in Search Advertising

Akash Kumar Mohankumar,Nikit Begwani,Amit Singh

Retrieving keywords (bidwords) with the same intent as query, referred to as close variant keywords, is of prime importance for effective targeted search advertising. For head and torso search queries, sponsored search engines use a huge repository of same intent queries and keywords, mined ahead of time. Online, this repository is used to rewrite the query and then lookup the rewrite in a repository of bid keywords contributing to significant revenue. Recently generative retrieval models have been shown to be effective at the task of generating such query rewrites. We observe two main limitations of such generative models. First, rewrites generated by these models exhibit low lexical diversity, and hence the rewrites fail to retrieve relevant keywords that have diverse linguistic variations. Second, there is a misalignment between the training objective - the likelihood of training data, v/s what we desire - improved quality and coverage of rewrites. In this work, we introduce CLOVER, a framework to generate both high-quality and diverse rewrites by optimizing for human assessment of rewrite quality using our diversity-driven reinforcement learning algorithm. We use an evaluation model, trained to predict human judgments, as the reward function to finetune the generation policy. We empirically show the effectiveness of our proposed approach through offline experiments on search queries across geographies spanning three major languages. We also perform online A/B experiments on Bing, a large commercial search engine, which shows (i) better user engagement with an average increase in clicks by 12.83% accompanied with an average defect reduction by 13.97%, and (ii) improved revenue by 21.29%.

Andrea Nestler,Nour Karessli,Karl Hajjar,Rodrigo Weffer,Reza Shirvany

E-commerce is growing at an unprecedented rate and the fashion industry has recently witnessed a noticeable shift in customers order behaviour towards stronger online shopping. However, fashion articles ordered online do not always find their way to a customers wardrobe. In fact, a large share of them end up being returned. Finding clothes that fit online is very challenging and accounts for one of the main drivers of increased return rates in fashion e-commerce. Size and fit related returns severely impact 1. the customers experience and their dissatisfaction with online shopping, 2. the environment through an increased carbon footprint, and 3. the profitability of online fashion platforms. Due to poor fit, customers often end up returning articles that they like but do not fit them, which they have to re-order in a different size. To tackle this issue we introduce SizeFlags, a probabilistic Bayesian model based on weakly annotated large-scale data from customers. Leveraging the advantages of the Bayesian framework, we extend our model to successfully integrate rich priors from human experts feedback and computer vision intelligence. Through extensive experimentation, large-scale A/B testing and continuous evaluation of the model in production, we demonstrate the strong impact of the proposed approach in robustly reducing size-related returns in online fashion over~14~countries.

AttDMM: An Attentive Deep Markov Model for Risk Scoring in Intensive Care Units

Yilmazcan Ozyurt,Mathias Kraus,Tobias Hatt,Stefan Feuerriegel

Clinical practice in intensive care units (ICUs) requires early warnings when a patients condition is about to deteriorate so that preventive measures can be undertaken. To this end, prediction algorithms have been developed that estimate the risk of mortality in ICUs. In this work, we propose a novel generative deep probabilistic model for real-time risk scoring in ICUs. Specifically, we develop an attentive deep Markov model called AttDMM. To the best of our knowledge, AttDMM is the first ICU prediction model that jointly learns both long-term disease dynamics (via attention) and different disease states in health trajectory (via a latent variable model). Our evaluations were based on an established baseline dataset (MIMIC-III) with 53,423 ICU stays. The results confirm that compared to state-of-the-art baselines, our AttDMM was superior: AttDMM achieved an area under the receiver operating characteristic curve (AUROC) of 0.876, which yielded an improvement over the state-of-the-art method by 2.2%. In addition, the risk score from the AttDMM provided warnings several hours earlier. Thereby, our model shows a path towards identifying patients at risk so that health practitioners can intervene early and save patient lives.

Amazon SageMaker Automatic Model Tuning: Scalable Gradient-Free Optimization

Valerio Perrone,Huibin Shen,Aida Zolic,Iaroslav Shcherbatyi,Amr Ahmed,Tanya Bansal,Michele Donini,Fela Winkelmolen,Rodolphe Jenatton,Jean Baptiste Faddoul,Barbara Pogorzelska,Miroslav Miladinovic,Krishnaram Kenthapadi,Matthias Seeger,Cu00e9dric Archambeau

Tuning complex machine learning systems is challenging. Machine learning typically requires to set hyperparameters, be it regularization, architecture, or optimization parameters, whose tuning is critical to achieve good predictive performance. To democratize access to machine learning systems, it is essential to automate the tuning. This paper presents Amazon SageMaker Automatic Model Tuning (AMT), a fully managed system for gradient-free optimization at scale. AMT finds the best version of a trained machine learning model by repeatedly evaluating it with different hyperparameter configurations. It leverages either random search or Bayesian optimization to choose the hyperparameter values resulting in the best model, as measured by the metric chosen by the user. AMT can be used with built-in algorithms, custom algorithms, and Amazon SageMaker pre-built containers for machine learning frameworks. We discuss the core functionality, system architecture, our design principles, and lessons learned. We also describe more advanced features of AMT, such as automated early stopping and warm-starting, showing in experiments their benefits to users.

User Consumption Intention Prediction in Meituan

Yukun Ping,Chen Gao,Taichi Liu,Xiaoyi Du,Hengliang Luo,Depeng Jin,Yong Li

For online life service platforms, such as Meituan, user consumption intention, as the internal driving force of consumption behaviors, plays a significant role in understanding and predicting users demand and purchase. However, user consumption intention prediction is quite challenging. Different from consumption behaviors, consumption intention is implicit and always not reflected by behavioral data. Moreover, it is affected by both user intrinsic preference and spatio-temporal context. To overcome these challenges, in Meituan, we design a real-world system consisting of two stages, intention detection and prediction. Specifically, at the intention-detection stage, we combine the knowledge of human experts and consumption information to obtain explicit intentions and match consumption with intentions based on user review data. At the intention-prediction stage, to collectively exploit the rich heterogeneous influencing factors, we design a graph neural network-based intention prediction model GRIP, which can capture user intrinsic preference and spatio-temporal context. Extensive offline evaluations demonstrate that our prediction model outperforms the best baseline by 10.26% and 33.28% for two metrics and online A/B tests on millions of users validate the effectiveness of our system.

Bootstrapping Recommendations at Chrome Web Store

Zhen Qin,Honglei Zhuang,Rolf Jagerman,Xinyu Qian,Po Hu,Dan Chary Chen,Xuanhui Wang,Michael Bendersky,Marc Najork

Google Chrome, one of the worlds most popular web browsers, features an extension framework allowing third-party developers to enhance Chromes functionality. Chrome extensions are distributed through the Chrome Web Store (CWS), a Google-operated online marketplace. In this paper, we describe how we developed and deployed three recommender systems for discovering relevant extensions in CWS, namely non-personalized recommendations, related extension recommendations, and personalized recommendations. Unlike most existing papers that focus on novel algorithms, this paper focuses on sharing practical experiences when building large-scale recommender systems under various real-world constraints, such as privacy constraints, data sparsity and skewness issues, and product design choices (e.g., user interface). We show how these constraints make standard approaches difficult to succeed in practice. We share success stories that turn negative live metrics to positive ones, including: 1) how we use interpretable neural models to bootstrap the systems, help identifying pipeline issues, and pave the way for more advanced models; 2) a new item-item based algorithm for related recommendations that works under highly skewed data distributions; and 3) how the previous two techniques can help bootstrapping the personalized recommendations, which significantly reduces development cycles and bypasses various real-world difficulties. All the explorations in this work are verified in live traffic on millions of users. We believe that the findings in this paper can help practitioners to build better large-scale recommender systems.

Lambda Learner: Fast Incremental Learning on Data Streams

Rohan Ramanath,Konstantin Salomatin,Jeffrey D. Gee,Kirill Talanine,Onkar Dalal,Gungor Polatkan,Sara Smoot,Deepak Kumar

One of the most well-established applications of machine learning is in deciding what content to show website visitors. When observation data comes from high-velocity, user-generated data streams, machine learning methods perform a balancing act between model complexity, training time, and computational costs. Furthermore, when model freshness is critical, the training of models becomes time-constrained. Parallelized batch offline training, although horizontally scalable, is often not time-considerate or cost-effective. In this paper, we propose Lambda Learner, a new framework for training models by incremental updates in response to mini-batches from data streams. We show that the resulting model of our framework closely estimates a periodically updated model trained on offline data and outperforms it when model updates are time-sensitive. We provide theoretical proof that the incremental learning updates improve the loss-function over a stale batch model. We present a large-scale deployment on the sponsored content platform for a large social network, serving hundreds of millions of users across different channels (e.g., desktop, mobile). We address challenges and complexities from both algorithms and infrastructure perspectives, illustrate the system details for computation, storage, stream processing training data, and open-source the system.

RAPT: Pre-training of Time-Aware Transformer for Learning Robust Healthcare Representation

Houxing Ren,Jingyuan Wang,Wayne Xin Zhao,Ning Wu

With the development of electronic health records (EHRs), prenatal care examination records have become available for developing automatic prediction or diagnosis approaches with machine learning methods. In this paper, we study how to effectively learn representations applied to various downstream tasks for EHR data. Although several methods have been proposed in this direction, they usually adapt classic sequential models to solve one specific diagnosis task or address unique EHR data issues. This makes it difficult to reuse these existing methods for the early diagnosis of pregnancy complications or provide a general solution to address the series of health problems caused by pregnancy complications. In this paper, we propose a novel model RAPT, which stands for RepresentAtion by Pre-training time-aware Transformer. To associate pre-training and EHR data, we design an architecture that is suitable for both modeling EHR data and pre-training, namely time-aware Transformer. To handle various characteristics in EHR data, such as insufficiency, we carefully devise three pre-training tasks to handle data insufficiency, data incompleteness and short sequence problems, namely similarity prediction, masked prediction and reasonability check. In this way, our representations can capture various EHR data characteristics. Extensive experimental results for four downstream tasks have shown the effectiveness of the proposed approach. We also introduce sensitivity analysis to interpret the model and design an interface to show results and interpretation for doctors. Finally, we implement a diagnosis system for pregnancy complications based on our pre-training model. Doctors and pregnant women can benefit from the diagnosis system in early diagnosis of pregnancy complications.

A Bayesian Approach to In-Game Win Probability in Soccer

Pieter Robberechts,Jan Van Haaren,Jesse Davis

In-game win probability models, which provide a sports teams likelihood of winning at each point in a game based on historical observations, are becoming increasingly popular. In baseball, basketball and American football, they have become important tools to enhance fan experience, to evaluate in-game decision-making, and to inform coaching decisions. While equally relevant in soccer, the adoption of these models is held back by technical challenges arising from the low-scoring nature of the sport.In this paper, we introduce an in-game win probability model for soccer that addresses the shortcomings of existing models. First, we demonstrate that in-game win probability models for other sports struggle to provide accurate estimates for soccer, especially towards the end of a game. Second, we introduce a novel Bayesian statistical framework that estimates running win, tie and loss probabilities by leveraging a set of contextual game state features. An empirical evaluation on eight seasons of data for the top-five soccer leagues demonstrates that our framework provides well-calibrated probabilities. Furthermore, two use cases show its ability to enhance fan experience and to evaluate performance in crucial game situations.

Contextual Bandit Applications in a Customer Support Bot

Sandra Sajeev,Jade Huang,Nikos Karampatziakis,Matthew Hall,Sebastian Kochman,Weizhu Chen

Virtual support agents have grown in popularity as a way for businesses to provide better and more accessible customer service. Some challenges in this domain include ambiguous user queries as well as changing support topics and user behavior (non-stationarity). We do, however, have access to partial feedback provided by the user (clicks, surveys, and other events) which can be leveraged to improve the user experience. Adaptable learning techniques, like contextual bandits, are a natural fit for this problem setting. In this paper, we discuss real-world implementations of contextual bandits (CB) for the Microsoft virtual agent. It includes intent disambiguation based on neural-linear bandits (NLB) and contextual recommendations based on a collection of multi-armed bandits (MAB). Our solutions have been deployed to production and have improved key business metrics of the Microsoft virtual agent, as confirmed by A/B experiments. Results include a relative increase of over 12% in problem resolution rate and relative decrease of over 4% in escalations to a human operator. While our current use cases focus on intent disambiguation and contextual recommendation for support bots, we believe our methods can be extended to other domains.

Predicting COVID-19 Spread from Large-Scale Mobility Data

Amray Schwabe,Joel Persson,Stefan Feuerriegel

To manage the COVID-19 epidemic effectively, decision-makers in public health need accurate forecasts of case numbers. A potential near real-time predictor of future case numbers is human mobility; however, research on the predictive power of mobility is lacking. To fill this gap, we introduce a novel model for epidemic forecasting based on mobility data, called mobility marked Hawkes model. The proposed model consists of three components: (1) A Hawkes process captures the transmission dynamics of infectious diseases. (2) A mark modulates the rate of infections, thus accounting for how the reproduction number R varies across space and time. The mark is modeled using a regularized Poisson regression based on mobility covariates. (3) A correction procedure incorporates new cases seeded by people traveling between regions. Our model was evaluated on the COVID-19 epidemic in Switzerland. Specifically, we used mobility data from February through April 2020, amounting to approximately 1.5 billion trips. Trip counts were derived from large-scale telecommunication data, i.e., cell phone pings from the Swisscom network, the largest telecommunication provider in Switzerland. We compared our model against various state-of-the-art baselines in terms of out-of-sample root mean squared error. We found that our model outperformed the baselines by 15.52%. The improvement was consistently achieved across different forecast horizons between 5 and 21 days. In addition, we assessed the predictive power of conventional point of interest data, confirming that telecommunication data is superior. To the best of our knowledge, our work is the first to predict the spread of COVID-19 from telecommunication data. Altogether, our work contributes to previous research by developing a scalable early warning system for decision-makers in public health tasked with controlling the spread of infectious diseases.

Learning to Assign: Towards Fair Task Assignment in Large-Scale Ride Hailing

Dingyuan Shi,Yongxin Tong,Zimu Zhou,Bingchen Song,Weifeng Lv,Qiang Yang

Ride hailing is a widespread shared mobility application where the central issue is to assign taxi requests to drivers with various objectives. Despite extensive research on task assignment in ride hailing, the fairness of earnings among drivers is largely neglected. Pioneer studies on fair task assignment in ride hailing are ineffective and inefficient due to their myopic optimization perspective and time-consuming assignment techniques. In this work, we propose LAF, an effective and efficient task assignment scheme that optimizes both utility and fairness. We adopt reinforcement learning to make assignments in a holistic manner and propose a set of acceleration techniques to enable fast fair assignment on large-scale data. Experiments show that LAF outperforms the state-of-the-arts by up to 86.7%, 29.1%, 797% on fairness, utility and efficiency, respectively.

Interpretable Drug Response Prediction using a Knowledge-based Neural Network

Oliver Snow,Hossein Sharifi-Noghabi,Jialin Lu,Olga Zolotareva,Mark Lee,Martin Ester

Predicting drug response based on the genomic profile of a cancer patient is one of the hallmarks of precision oncology. Despite current methods for drug response prediction becoming more accurate, there is still a need to switch from black box predictions to methods that offer high accuracy as well as interpretable predictions. This is of particular importance in real-world applications such as drug response prediction in cancer patients. In this paper, we propose BDKANN, a novel knowledge-based method that employs the hierarchical information on how proteins form complexes and act together in pathways to form the architecture of a deep neural network. We employ BDKANN to predict cancer drug response from cell line gene expression data and our experimental results demonstrate that not only does BDKANN have a low prediction error compared to baseline models but it also allows meaningful interpretation of the network. These interpretations can both explain predictions made and discover novel connections in the biological knowledge that may lead to new hypotheses about mechanisms of drug action.

Maya Srikanth,Anqi Liu,Nicholas Adams-Cohen,Jian Cao,R. Michael Alvarez,Anima Anandkumar

Tracking and collecting fast-evolving online discussions provides vast data for studying social media usage and its role in peoples public lives. However, collecting social media data using a static set of keywords fails to satisfy the growing need to monitor dynamic conversations and to study fast-changing topics. We propose a dynamic keyword search method to maximize the coverage of relevant information in fast-evolving online discussions. The method uses word embedding models to represent the semantic relations between keywords and predictive models to forecast the future trajectory of keywords. We also implement a visual user interface to aid in the decision making process in each round of keyword updates. This allows for both human-assisted tracking and fully-automated data collection. In simulations using historical #MeToo data in 2017, our human-assisted tracking method outperforms the traditional static baseline method significantly, achieving 37.1% improvement in F-1 score in the task of tracking the top trending keywords. We conduct a contemporary case study to cover dynamic conversations about the recent Presidential Inauguration and to test the dynamic data collection system. Our case studies reflect the effectiveness of our process and also points to the potential challenges in future deployment.

A PLAN for Tackling the Locust Crisis in East Africa: Harnessing Spatiotemporal Deep Models for Locust Movement Forecasting

Maryam Tabar,Jared Gluck,Anchit Goyal,Fei Jiang,Derek Morr,Annalyse Kehs,Dongwon Lee,David P. Hughes,Amulya Yadav

East Africa is experiencing the worst locust infestation in over 25 years, which has severely threatened the food security of millions of people across the region. The primary strategy adopted by human experts at the United Nations Food and Agricultural Organization (UN-FAO) to tackle locust outbreaks involves manually surveying at-risk geographical areas, followed by allocating and spraying pesticides in affected regions. In order to augment and assist human experts at the UN-FAO in this task, we utilize crowdsourced reports of locust observations collected by PlantVillage (the worlds leading knowledge delivery system for East African farmers) and develop PLAN, a Machine Learning (ML) algorithm for forecasting future migration patterns of locusts at high spatial and temporal resolution across East Africa. PLANs novel spatio-temporal deep learning architecture enables representing PlantVillages crowdsourced locust observation data using novel image-based feature representations, and its design is informed by several unique insights about this problem domain. Experimental results show that PLAN achieves superior predictive performance against several baseline models - it achieves an AUC score of 0.9 when used with a data augmentation method. PLAN represents a first step in using deep learning to assist and augment human expertise at PlantVillage (and UN-FAO) in locust prediction, and its real-world usability is currently being evaluated by domain experts (including a potential idea to use the heatmaps created by PLAN in a Kenyan TV show). The source code is available at https://github.com/maryam-tabar/PLAN.

Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms

Xiaocheng Tang,Fan Zhang,Zhiwei Qin,Yansheng Wang,Dingyuan Shi,Bingchen Song,Yongxin Tong,Hongtu Zhu,Jieping Ye

Large ride-hailing platforms, such as DiDi, Uber and Lyft, connect tens of thousands of vehicles in a city to millions of ride demands throughout the day, providing great promises for improving transportation efficiency through the tasks of order dispatching and vehicle repositioning. Existing studies, however, usually consider the two tasks in simplified settings that hardly address the complex interactions between the two, the real-time fluctuations between supply and demand, and the necessary coordinations due to the large-scale nature of the problem. In this paper we propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks. At the center of the framework is a globally shared value function that is updated continuously using online experiences generated from real-time platform transactions. To improve the sample-efficiency and the robustness, we further propose a novel periodic ensemble method combining the fast online learning with a large-scale offline training scheme that leverages the abundant historical driver trajectory data. This allows the proposed framework to adapt quickly to the highly dynamic environment, to generalize robustly to recurrent patterns and to drive implicit coordinations among the population of managed vehicles. Extensive experiments based on real-world datasets show considerably improvements over other recently proposed methods on both tasks. Particularly, V1D3 outperforms the first prize winners of both dispatching and repositioning tracks in the KDD Cup 2020 RL competition, achieving state-of-the-art results on improving both total driver income and user experience related metrics.

Bipartite Dynamic Representations for Abuse Detection

Andrew Z. Wang,Rex Ying,Pan Li,Nikhil Rao,Karthik Subbian,Jure Leskovec

Abusive behavior in online retail websites and communities threatens the experience of regular community members. Such behavior often takes place within a complex, dynamic, and large-scale network of users interacting with items. Detecting abuse is challenging due to the scarcity of labeled abuse instances and complexity of combining temporal and network patterns while operating at a massive scale. Previous approaches to dynamic graph modeling either do not scale, do not effectively generalize from a few labeled instances, or compromise performance for scalability. Here we present BiDyn, a general method to detect abusive behavior in dynamic bipartite networks at scale, while generalizing from limited training labels. BiDyn develops an efficient hybrid RNN-GNN architecture trained via a novel stacked ensemble training scheme. We also propose a novel pre-training framework for dynamic graphs that helps to achieve superior performance at scale. Our approach outperforms recent large-scale dynamic graph baselines in an abuse classification task by up to 14% AUROC while requiring 10x less memory per training batch in both open and proprietary datasets.

Live-Streaming Fraud Detection: A Heterogeneous Graph Neural Network Approach

Haishuai Wang,Zhao Li,Peng Zhang,Jiaming Huang,Pengrui Hui,Jian Liao,Ji Zhang,Jiajun Bu

Live-streaming platforms have recently gained significant popularity by attracting an increasing number of young users and have become a very promising form of online shopping. Similar to the traditional online shopping platforms such as Taobao, live-streaming platforms also suffer from online malicious fraudulent behaviors where many transactions are not genuine. The existing anti-fraud models proposed to recognize fraudulent transactions on traditional online shopping platforms are inapplicable on live-streaming platforms. This is mainly because live-streaming platforms are characterized by a unique type of heterogeneous live-streaming networks where multiple heterogeneous types of nodes such as users, live-streamers, and products are connected with multiple different types of edges associated with edge features. In this paper, we propose a new approach based on a heterogeneous graph neural network for LIve-streaming Fraud dEtection (called LIFE). LIFE designs an innovative heterogeneous graph learning model that fully utilizes various heterogeneous information of shopping transactions, users, streamers, and items from a given live-streaming platform. Moreover, a label propagation algorithm is employed within our LIFE framework to handle the limited number of labeled fraudulent transactions for model training. Extensive experimental results on a large-scale Taobao live-streaming platform demonstrate that the proposed method is superior to the baseline models in terms of fraud detection effectiveness on live-streaming platforms. Furthermore, we conduct a case study to show that the proposed method is able to effectively detect fraud communities for live-streaming e-commerce platforms.

Tac-Valuer: Knowledge-based Stroke Evaluation in Table Tennis

Jiachen Wang,Dazhen Deng,Xiao Xie,Xinhuan Shu,Yu-Xuan Huang,Le-Wen Cai,Hui Zhang,Min-Ling Zhang,Zhi-Hua Zhou,Yingcai Wu

Stroke evaluation is critical for coaches to evaluate players performance in table tennis matches. However, current methods highly demand proficient knowledge in table tennis and are time-consuming. We collaborate with the Chinese national table tennis team and propose Tac-Valuer, an automatic stroke evaluation framework for analysts in table tennis teams. In particular, to integrate analysts knowledge into the machine learning model, we employ the latest effective framework named abductive learning, showing promising performance. Based on abductive learning, Tac-Valuer combines the state-of-the-art computer vision algorithms to extract and embed stroke features for evaluation. We evaluate the design choices of the approach and present Tac-Valuers usability through use cases that analyze the performance of the top table tennis players in world-class events.

Reinforcing Pretrained Models for Generating Attractive Text Advertisements

Xiting Wang,Xinwei Gu,Jie Cao,Zihua Zhao,Yulan Yan,Bhuvan Middha,Xing Xie

We study how pretrained language models can be enhanced by using deep reinforcement learning to generate attractive text advertisements that reach the high quality standard of real-world advertiser mediums. To improve ad attractiveness without hampering user experience, we propose a model-based reinforcement learning framework for text ad generation, which constructs a model for the environment dynamics and avoids large sample complexity. Based on the framework, we develop Masked-Sequence Policy Gradient, a reinforcement learning algorithm that integrates efficiently with pretrained models and explores the action space effectively. Our method has been deployed to production in Microsoft Bing. Automatic offline experiments, human evaluation, and online experiments demonstrate the superior performance of our method.

Multimodal Emergent Fake News Detection via Meta Neural Process Networks

Yaqing Wang,Fenglong Ma,Haoyu Wang,Kishlay Jha,Jing Gao

Fake news travels at unprecedented speeds, reaches global audiences and puts users and communities at great risk via social media platforms. Deep learning based models show good performance when trained on large amounts of labeled data on events of interest, whereas the performance of models tends to degrade on other events due to domain shift. Therefore, significant challenges are posed for existing detection approaches to detect fake news on emergent events, where large-scale labeled datasets are difficult to obtain. Moreover, adding the knowledge from newly emergent events requires to build a new model from scratch or continue to fine-tune the model, which can be challenging, expensive, and unrealistic for real-world settings. In order to address those challenges, we propose an end-to-end fake news detection framework named MetaFEND, which is able to learn quickly to detect fake news on emergent events with a few verified posts. Specifically, the proposed model integrates meta-learning and neural process methods together to enjoy the benefits of these approaches. In particular, a label embedding module and a hard attention mechanism are proposed to enhance the effectiveness by handling categorical information and trimming irrelevant posts. Extensive experiments are conducted on multimedia datasets collected from Twitter and Weibo. The experimental results show our proposed MetaFEND model can detect fake news on never-seen events effectively and outperform the state-of-the-art methods.

Domain-Specific Pretraining for Vertical Search: Case Study on Biomedical Literature

Yu Wang,Jinchao Li,Tristan Naumann,Chenyan Xiong,Hao Cheng,Robert Tinn,Cliff Wong,Naoto Usuyama,Richard Rogahn,Zhihong Shen,Yang Qin,Eric Horvitz,Paul N. Bennett,Jianfeng Gao,Hoifung Poon

Information overload is a prevalent challenge in many high-value domains. A prominent case in point is the explosion of the biomedical literature on COVID-19, which swelled to hundreds of thousands of papers in a matter of months. In general, biomedical literature expands by two papers every minute, totalling over a million new papers every year. Search in the biomedical realm, and many other vertical domains is challenging due to the scarcity of direct supervision from click logs. Self-supervised learning has emerged as a promising direction to overcome the annotation bottleneck. We propose a general approach for vertical search based on domain-specific pretraining and present a case study for the biomedical domain. Despite being substantially simpler and not using any relevance labels for training or development, our method performs comparably or better than the best systems in the official TREC-COVID evaluation, a COVID-related biomedical search competition. Using distributed computing in modern cloud infrastructure, our system can scale to tens of millions of articles on PubMed and has been deployed as Microsoft Biomedical Search, a new search experience for biomedical literature: https://aka.ms/biomedsearch.

Representation Learning for Predicting Customer Orders

Tongwen Wu,Yu Yang,Yanzhi Li,Huiqiang Mao,Liming Li,Xiaoqing Wang,Yuming Deng

The ability to predict future customer orders is of significant value to retailers in making many crucial operational decisions. Different from next basket prediction or temporal set prediction, which focuses on predicting a subset of items for a single user, this paper aims for the distributional information of future orders, i.e., the possible subsets of items and their frequencies (probabilities), which is required for decisions such as assortment selection for front-end warehouses and capacity evaluation for fulfillment centers. Based on key statistics of a real order dataset from Tmall supermarket, we show the challenges of order prediction. Motivated by our analysis that biased models of order distribution can still help improve the quality of order prediction, we design a generative model to capture the order distribution for customer order prediction. Our model utilizes representation learning to embed items into a Euclidean space and design a highly efficient SGD algorithm to learn the item embeddings. Future order prediction is done by calibrating orders obtained by random walks over the embedding graph. The experiments show that our model outperforms all the existing methods. The benefit of our model is also illustrated with an application to assortment selection for front-end warehouses.

Medical Entity Relation Verification with Large-scale Machine Reading Comprehension

Yuan Xia,Chunyu Wang,Zhenhui Shi,Jingbo Zhou,Chao Lu,Haifeng Huang,Hui Xiong

Medical entity relation verification is a crucial step to build a practical and enterprise medical knowledge graph (MKG) because high-precision medical entity relation is a key requirement for many MKG-based applications. Existing relation verification approaches for general knowledge graphs are not designed for considering medical domain knowledge, although it is central to achieve high-quality entity relation verification for MKG. To this end, in this paper, we introduce a system for medical entity relation verification with large-scale machine reading comprehension. The proposed system is tailored to overcome the unique challenges of medical relation verification including high variants of medical terms, the high difficulty of evidence searching in complex medical documents, and the lack of evidence labels for supervision. To deal with the problem of variants of medical terms, we introduce a synonym-aware retrieve model to retrieve the potential evidence implicitly verifying the given claim. To better utilize the medical domain knowledge, a relation-aware evidence detector and a medical ontology-enhanced aggregator are developed to improve the performance of the relation verification module. Moreover, to overcome the challenge of providing high-quality evidence due to the lack of labels, we introduce an interactive collaborative-training method to iteratively improve the evidence accuracy. Finally, we conduct extensive experiments to demonstrate that the performance of our proposed system is superior to all comparable models. We also demonstrate that our system can significantly reduce the annotation time by medical experts in real-world verification tasks. It can help to improve the efficiency by nearly 300%. In particular, our system has been embedded into the Baidu Clinical Decision Support System.

EXACTA: Explainable Column Annotation

Yikun Xian,Handong Zhao,Tak Yeon Lee,Sungchul Kim,Ryan Rossi,Zuohui Fu,Gerard de Melo,S. Muthukrishnan

Column annotation, the process of annotating tabular columns with labels, plays a fundamental role in digital marketing data governance. It has a direct impact on how customers manage their data and facilitates compliance with regulations, restrictions, and policies applicable to data use. Despite substantial gains in accuracy brought by recent deep learning-driven column annotation methods, their incapability of explaining why columns are matched with particular target labels has drawn concern, due to the black-box nature of deep neural networks. Such explainability is of particular importance in industrial marketing scenarios, where data stewards need to quickly verify and calibrate the annotation results to ascertain the correctness of downstream applications. This work sheds new light on the explainable column annotation problem, the first of its kind column annotation task. To achieve this, we propose a new approach called EXACTA, which conducts multi-hop knowledge graph reasoning using inverse reinforcement learning to find a path from a column to a potential target label while ensuring both annotation performance and explainability. We experiment on four benchmarks, both publicly available and real-world ones, and undertake a comprehensive analysis on the explainability. The results suggest that our method not only provides competitive annotation performance compared with existing deep learning-based models, but more importantly, produces faithfully explainable paths for annotated columns to facilitate human examination.

DMBGN: Deep Multi-Behavior Graph Networks for Voucher Redemption Rate Prediction

Fengtong Xiao,Lin Li,Weinan Xu,Jingyu Zhao,Xiaofeng Yang,Jun Lang,Hao Wang

In E-commerce, vouchers are important marketing tools to enhance users engagement and boost sales and revenue. The likelihood that a user redeems a voucher is a key factor in voucher distribution decision. User-item Click-Through-Rate (CTR) models are often applied to predict the user-voucher redemption rate. However, the voucher scenario involves more complicated relations among users, items and vouchers. The users historical behavior in a voucher collection activity reflects users voucher usage patterns, which is nevertheless overlooked by the CTR-based solutions. In this paper, we propose a Deep Multi-behavior Graph Networks (DMBGN) to shed light on this field for the voucher redemption rate prediction. The complex structural user-voucher-item relationships are captured by a User-Behavior Voucher Graph (UVG). User behavior happening both before and after voucher collection is taken into consideration, and a high-level representation is extracted by Higher-order Graph Neural Networks. On top of a sequence of UVGs, an attention network is built which can help to learn users long-term voucher redemption preference. Extensive experiments on three large-scale production datasets demonstrate the proposed DMBGN model is effective, with 10% to 16% relative AUC improvement over Deep Neural Networks (DNN), and 2% to 4% AUC improvement over Deep Interest Network (DIN). Source code and a sample dataset are made publicly available to facilitate future research.

FIVES: Feature Interaction Via Edge Search for Large-Scale Tabular Data

Yuexiang Xie,Zhen Wang,Yaliang Li,Bolin Ding,Nezihe Merve Gu00fcrel,Ce Zhang,Minlie Huang,Wei Lin,Jingren Zhou

High-order interactive features capture the correlation between different columns and thus are promising to enhance various learning tasks on ubiquitous tabular data. To automate the generation of interactive features, existing works either explicitly traverse the feature space or implicitly express the interactions via intermediate activations of some designed models. These two kinds of methods show that there is essentially a trade-off between feature interpretability and search efficiency. To possess both of their merits, we propose a novel method named Feature Interaction Via Edge Search (FIVES), which formulates the task of interactive feature generation as searching for edges on the defined feature graph. Specifically, we first present our theoretical evidence that motivates us to search for useful interactive features with increasing order. Then we instantiate this search strategy by optimizing both a dedicated graph neural network (GNN) and the adjacency tensor associated with the defined feature graph. In this way, the proposed FIVES method simplifies the time-consuming traversal as a typical training course of GNN and enables explicit feature generation according to the learned adjacency tensor. Experimental results on both benchmark and real-world datasets show the advantages of FIVES over several state-of-the-art methods. Moreover, the interactive features identified by FIVES are deployed on the recommender system of Taobao, a worldwide leading e-commerce platform. Results of an online A/B testing further verify the effectiveness of the proposed method FIVES, and we further provide FIVES as AI utilities for the customers of Alibaba Cloud.

Towards the D-Optimal Online Experiment Design for Recommender Selection

Da Xu,Chuanwei Ruan,Evren Korpeoglu,Sushant Kumar,Kannan Achan

Selecting the optimal recommender via online exploration-exploitation is catching increasing attention where the traditional A/B testing can be slow and costly, and offline evaluations are prone to the bias of history data. Finding the optimal online experiment is nontrivial since both the users and displayed recommendations carry contextual features that are informative to the reward. While the problem can be formalized via the lens of multi-armed bandits, the existing solutions are found less satisfactorily because the general methodologies do not account for the case-specific structures, particularly for the e-commerce recommendation we study. To fill in the gap, we leverage the D-optimal design from the classical statistics literature to achieve the maximum information gain during exploration, and reveal how it fits seamlessly with the modern infrastructure of online inference. To demonstrate the effectiveness of the optimal designs, we provide semi-synthetic simulation studies with published code and data for reproducibility purposes. We then use our deployment example on Walmart.com to fully illustrate the practical insights and effectiveness of the proposed methods.

PAMI: A Computational Module for Joint Estimation and Progression Prediction of Glaucoma

Linchuan Xu,Ryo Asaoka,Taichi Kiwaki,Hiroshi Murata,Yuri Fujino,Kenji Yamanishi

Glaucoma, which can cause irreversible damage to the sight of human eyes, is conventionally diagnosed by visual field (VF) sensitivity. However, it is labor-intensive and time-consuming to measure VF. Recently, optical coherence tomography (OCT) has been adopted to measure retinal layers thickness (RT) for assisting the diagnosis because glaucoma makes structural changes to RT and it is much less costly to obtain RT. In particular, RT can assist in mainly two manners. One is to estimate a VF from an RT such that clinical doctors only need to obtain an RT of a patient and then convert it to a VF for the diagnosis. The other is to predict future VFs by utilizing both past VFs and RTs, i.e., the prediction of progression of VF over time. The two computational tasks are performed as two data mining tasks because currently there is no knowledge about the exact form of the computations involved. In this paper, we study a novel problem which is the integration of the two data mining tasks. The motivation is that both the two data mining tasks deal with transforming information from the RT domain to the VF domain such that the knowledge discovered in one task can be useful for another. The integration is non-trivial because the two tasks do not share the way of transformation. To address this issue, we design a progression-agnostic and mode-independent (PAMI) module which facilitates cross-task knowledge utilization. We empirically demonstrate that our proposed method outperforms the state-of-the-art method for the estimation by 6.33% in terms of mean of the root mean square error on a real dataset, and outperforms the state-of-the-art method for the progression prediction by 3.49% for the best case.

Session-Aware Query Auto-completion using Extreme Multi-Label Ranking

Nishant Yadav,Rajat Sen,Daniel N. Hill,Arya Mazumdar,Inderjit S. Dhillon

Query auto-completion (QAC) is a fundamental feature in search engines where the task is to suggest plausible completions of a prefix typed in the search bar. Previous queries in the user session can provide useful context for the users intent and can be leveraged to suggest auto-completions that are more relevant while adhering to the users prefix. Such session-aware QACs can be generated by recent sequence-to-sequence deep learning models; however, these generative approaches often do not meet the stringent latency requirements of responding to each user keystroke. Moreover, these generative approaches pose the risk of showing nonsensical queries. One can pre-compute a relatively small subset of relevant queries for common prefixes and rank them based on the context. However, such an approach fails when no relevant queries for the current context are present in the pre-computed set.In this paper, we provide a solution to this problem: we take the novel approach of modeling session-aware QAC as an eXtreme Multi-Label Ranking (XMR) problem where the input is the previous query in the session and the users current prefix, while the output space is the set of tens of millions of queries entered by users in the recent past. We adapt a popular XMR algorithm for this purpose by proposing several modifications to the key steps in the algorithm. The proposed modifications yield a 10x improvement in terms of Mean Reciprocal Rank (MRR) over the baseline XMR approach on a public search logs dataset. We are able to maintain an inference latency of less than 10 ms while still using session context. When compared against baseline models of acceptable latency, we observed a 33% improvement in MRR for short prefixes of up to 3 characters. Moreover, our model yielded a statistically significant improvement of 2.81% over a production QAC system in terms of suggestion acceptance rate, when deployed on the search bar of an online shopping store as part of an A/B test.

FLOP: Federated Learning on Medical Datasets using Partial Networks

Qian Yang,Jianyi Zhang,Weituo Hao,Gregory P. Spell,Lawrence Carin

The outbreak of COVID-19 Disease due to the novel coronavirus has caused a shortage of medical resources. To aid and accelerate the diagnosis process, automatic diagnosis of COVID-19 via deep learning models has recently been explored by researchers across the world. While different data-driven deep learning models have been developed to mitigate the diagnosis of COVID-19, the data itself is still scarce due to patient privacy concerns. Federated Learning (FL) is a natural solution because it allows different organizations to cooperatively learn an effective deep learning model without sharing raw data. However, recent studies show that FL still lacks privacy protection and may cause data leakage. We investigate this challenging problem by proposing a simple yet effective algorithm, named Federated Learning on Medical Datasets using Partial Networks (FLOP), that shares only a partial model between the server and clients. Extensive experiments on benchmark data and real-world healthcare tasks show that our approach achieves comparable or better performance while reducing the privacy and security risks. Of particular interest, we conduct experiments on the COVID-19 dataset and find that our FLOP algorithm can allow different hospitals to collaboratively and effectively train a partially shared model without sharing local patients data.

Improving the Information Disclosure in Mobility-on-Demand Systems

Yue Yang,Yuan Shi,Dejian Wang,Qisheng Chen,Lei Xu,Hanqian Li,Zhouyu Fu,Xin Li,Hao Zhang

Nowadays, the ubiquity of sharing economy and the booming of ride-sharing services prompt Mobility-on-Demand (MoD) platforms to explore and develop new business modes. Different from forcing full-time drivers to serve the dispatched orders, these modes usually aim to attract part-time drivers to share their vehicles and employ a driver-choose-order pattern by displaying a sequence of orders to drivers as a candidate set. A key issue here is to determine which orders should be displayed to each driver. In this work, we propose a novel framework to tackle this issue, known as the Information Disclosure problem in MoD systems. The problem is solved in two steps combining estimation with optimization: 1) in the estimation step, we investigate the drivers choice behavior and estimate the probability of choosing an order or ignoring the displayed candidate set. 2) in the optimization step, we transform the problem into determining the optimal edge configuration in a bipartite graph, then we develop a Minimal-Loss Edge Cutting (MLEC) algorithm to solve it. Through extensive experiments on both the simulation and the real-world data from Huolala business, the proposed method remarkably improves users experience and platform efficiency. Based on these promising results, the proposed framework has been successfully deployed in the real-world MoD system in Huolala.

Talent Demand Forecasting with Attentive Neural Sequential Model

Qi Zhang,Hengshu Zhu,Ying Sun,Hao Liu,Fuzhen Zhuang,Hui Xiong

To cope with the fast-evolving business trend, it becomes critical for companies to continuously review their talent recruitment strategies by the timely forecast of talent demand in recruitment market. While many efforts have been made on recruitment market analysis, due to the sparsity of fine-grained talent demand time series and the complex temporal correlation of the recruitment market, there is still no effective approach for fine-grained talent demand forecast, which can quantitatively model the dynamics of the recruitment market. To this end, in this paper, we propose a data-driven neural sequential approach, namely Talent Demand Attention Network (TDAN), for forecasting fine-grained talent demand in the recruitment market. Specifically, we first propose to augment the univariate time series of talent demand at multiple grained levels and extract intrinsic attributes of both companies and job positions with matrix factorization techniques. Then, we design a Mixed Input Attention module to capture company trends and industry trends to alleviate the sparsity of fine-grained talent demand. Meanwhile, we design a Relation Temporal Attention module for modeling the complex temporal correlation that changes with the company and position. Finally, extensive experiments on a real-world recruitment dataset clearly validate the effectiveness of our approach for fine-grained talent demand forecast, as well as its interpretability for modeling recruitment trends. In particular, TDAN has been deployed as an important functional component of intelligent recruitment system of cooperative partner.

MEOW: A Space-Efficient Nonparametric Bid Shading Algorithm

Wei Zhang,Brendan Kitts,Yanjun Han,Zhengyuan Zhou,Tingyu Mao,Hao He,Shengjun Pan,Aaron Flores,San Gultekin,Tsachy Weissman

Bid Shading has become increasingly important in Online Advertising, with a large amount of commercial [4,12,13,29] and research work [11,20,28] recently published. Most approaches for solving the bid shading problem involve estimating the probability of win distribution, and then maximizing surplus [28]. These generally use parametric assumptions for the distribution, and there has been some discussion as to whether Log-Normal, Gamma, Beta, or other distributions are most effective [8,38,41,44]. In this paper, we show evidence that online auctions generally diverge in interesting ways from classic distributions. In particular, real auctions generally exhibit significant structure, due to the way that humans set up campaigns and inventory floor prices [16,26]. Using these insights, we present a nonparametric method for Bid Shading which enables the exploitation of this deep structure. The algorithm has low time and space complexity, and is designed to operate within the challenging millisecond Service Level Agreements of Real-Time Bid Servers. We deploy it in one of the largest Demand Side Platforms in the United States, and show that it reliably out-performs best in class Parametric benchmarks. We conclude by suggesting some ways that the best aspects of parametric and nonparametric approaches could be combined.

HALO: Hierarchy-aware Fault Localization for Cloud Systems

Xu Zhang,Chao Du,Yifan Li,Yong Xu,Hongyu Zhang,Si Qin,Ze Li,Qingwei Lin,Yingnong Dang,Andrew Zhou,Saravanakumar Rajmohan,Dongmei Zhang

A typical cloud system has a large amount of telemetry data collected by pervasive software monitors that keep tracking the health status of the system. The telemetry data is essentially multi-dimensional data, which contains attributes and failure/success status of the system being monitored. By identifying the attribute value combinations where the failures are mostly concentrated (which we call fault-indicating combination), we can localize the cause of system failures into a smaller scope, thus facilitating fault diagnosis. However, due to the combinatorial explosion problem and the latent hierarchical structure in cloud telemetry data, it is still intractable to localize the fault to a proper granularity in an efficient way. In this paper, we propose HALO, a hierarchy-aware fault localization approach for locating the fault-indicating combinations from telemetry data. Our approach automatically learns the hierarchical relationship among attributes and leverages the hierarchy structure for precise and efficient fault localization. We have evaluated HALO on both industrial and synthetic datasets and the results confirm that HALO outperforms the existing methods. Furthermore, we have successfully deployed HALO to different services in Microsoft Azure and Microsoft 365, witnessed its impact in real-world practice.

AutoLoss: Automated Loss Function Search in Recommendations

Xiangyu Zhao,Haochen Liu,Wenqi Fan,Hui Liu,Jiliang Tang,Chong Wang

Designing an effective loss function plays a crucial role in training deep recommender systems. Most existing works often leverage a predefined and fixed loss function that could lead to suboptimal recommendation quality and training efficiency. Some recent efforts rely on exhaustively or manually searched weights to fuse a group of candidate loss functions, which is exceptionally costly in computation and time. They also neglect the various convergence behaviors of different data examples. In this work, we propose an AutoLoss framework that can automatically and adaptively search for the appropriate loss function from a set of candidates. To be specific, we develop a novel controller network, which can dynamically adjust the loss probabilities in a differentiable manner. Unlike existing algorithms, the proposed controller can adaptively generate the loss probabilities for different data examples according to their varied convergence behaviors. Such design improves the models generalizability and transferability between deep recommender systems and datasets. We evaluate the proposed framework on two benchmark datasets. The results show that AutoLoss outperforms representative baselines. Further experiments have been conducted to deepen our understandings of AutoLoss, including its transferability, components and training efficiency.

An Efficient Deep Distribution Network for Bid Shading in First-Price Auctions

Tian Zhou,Hao He,Shengjun Pan,Niklas Karlsson,Bharatbhushan Shetty,Brendan Kitts,Djordje Gligorijevic,San Gultekin,Tingyu Mao,Junwei Pan,Jianlong Zhang,Aaron Flores

Since 2019, most ad exchanges and sell-side platforms (SSPs), in the online advertising industry, shifted from second to first price auctions. Due to the fundamental difference between these auctions, demand-side platforms (DSPs) have had to update their bidding strategies to avoid bidding unnecessarily high and hence overpaying. Bid shading was proposed to adjust the bid price intended for second-price auctions, in order to balance cost and winning probability in a first-price auction setup. In this study, we introduce a novel deep distribution network for optimal bidding in both open (non-censored) and closed (censored) online first-price auctions. Offline and online A/B testing results show that our algorithm outperforms previous state-of-art algorithms in terms of both surplus and effective cost per action (eCPX) metrics. Furthermore, the algorithm is optimized in run-time and has been deployed into VerizonMedia DSP as production algorithm, serving hundreds of billions of bid requests per day. Online A/B test shows that advertisers ROI are improved by +2.4%, +2.4%, and +8.6% for impression based (CPM), click based (CPC), and conversion based (CPA) campaigns respectively.

Pre-trained Language Model based Ranking in Baidu Search

Lixin Zou,Shengqiang Zhang,Hengyi Cai,Dehong Ma,Suqi Cheng,Shuaiqiang Wang,Daiting Shi,Zhicong Cheng,Dawei Yin

As the heart of a search engine, the ranking system plays a crucial role in satisfying users information demands. More recently, neural rankers fine-tuned from pre-trained language models (PLMs) establish state-of-the-art ranking effectiveness. However, it is nontrivial to directly apply these PLM-based rankers to the large-scale web search system due to the following challenging issues: (1) the prohibitively expensive computations of massive neural PLMs, especially for long texts in the web document, prohibit their deployments in an online ranking system that demands extremely low latency; (2) the discrepancy between existing ranking-agnostic pre-training objectives and the ad-hoc retrieval scenarios that demand comprehensive relevance modeling is another main barrier for improving the online ranking system; (3) a real-world search engine typically involves a committee of ranking components, and thus the compatibility of the individually fine-tuned ranking model is critical for a cooperative ranking system. In this work, we contribute a series of successfully applied techniques in tackling these exposed issues when deploying the state-of-the-art Chinese pre-trained language model, i.e., ERNIE, in the online search engine system. We first articulate a novel practice to cost-efficiently summarize the web document and contextualize the resultant summary content with the query using a cheap yet powerful Pyramid-ERNIE architecture. Then we endow an innovative paradigm to finely exploit the large-scale noisy and biased post-click behavioral data for relevance-oriented pre-training. We also propose a human-anchored fine-tuning strategy tailored for the online ranking system, aiming to stabilize the ranking signals across various online components. Extensive offline and online experimental results show that the proposed techniques significantly boost the search engines performance.

Software as a Medical Device: Regulating AI in Healthcare via Responsible AI

Muhammad Aurangzeb Ahmad,Steve Overman,Christine Allen,Vikas Kumar,Ankur Teredesai,Carly Eckert

With the increased adoption of AI in healthcare, there is a growing recognition and demand to regulate AI in healthcare to avoid potential harm and unfair bias against vulnerable populations. Around a hundred governmental bodies and commissions as well as leaders in the tech sector have proposed principles to create responsible AI systems. However, most of these proposals are short on specifics which has led to charges of ethics washing. In this tutorial we offer a guide to help navigate through complex governmental regulations and explain the various constituent practical elements of a responsible AI system in healthcare in the light of proposed regulations. Additionally, we breakdown and emphasize that the recommendations from regulatory bodies like FDA or the EU are necessary but not sufficient elements of creating a responsible AI system. We elucidate how regulations and guidelines often focus on epistemic concerns to the detriment of practical concerns e.g., requirement for fairness without explicating what fairness constitutes for a use case. FDAs Software as a medical device document and EUs GDPR among other AI governance documents talk about the need for implementing sufficiently good machine learning practices. In this tutorial we elucidate what that would mean from a practical perspective for real world use cases in healthcare throughout the machine learning cycle i.e., Data Management, Data Specification, Feature Engineering, Model Evaluation, Model Specification, Model Explainability, Model Fairness, Reproducibility, checks for data leakage and model leakage. We note that conceptualizing responsible AI as a process rather than an end goal accords well with how AI systems are used in practice. We also discuss how a domain centric stakeholder perspective translates into balancing requirements for multiple competing optimization criteria.

Data Science on Blockchains

Cuneyt Gurcan Akcora,Murat Kantarcioglu,Yulia R. Gel

Blockchain technology garners an ever-increasing interest of researchers in various domains that benefit from scalable cooperation among trust-less parties. As blockchains and their applications proliferate, so do the complexity and volume of data stored by Blockchains. Analyzing this data has emerged as an important research topic, already leading to methodological advancements in information sciences.In this tutorial, we offer a holistic view of applied Data Science on Blockchains. Starting with the core components of Blockchain, we will detail the state of art in Blockchain data analytics for graph, security, and finance domains. Our examples will answer questions, such as, how to parse, extract and clean the data stored in blockchains?, how to store and query Blockchain data? and what features we could compute from blockchains?We will share tutorial notes, collected meta-information, and further reading pointers on our tutorial website at https://blockchaintutorial.github.io/

Online Advertising Incrementality Testing And Experimentation: Industry Practical Lessons

Joel Barajas,Narayan Bhamidipati,James G. Shanahan

Online advertising has historically been approached as user targeting and ad-to-user matching problems within sophisticated optimization algorithms. As the research area and ad tech industry have progressed over the last couple of decades, advertisers have increasingly emphasized the causal effect estimation of their ads (aka incrementality) using controlled experiments (or A/B testing). Even though observational approaches have been derived in marketing science since the 80s including media mix models, the availability of online advertising personalization has enabled the deployment of more rigorous randomized controlled experiments with millions of individuals. These evolutions in marketing science, online advertising, and the ad tech industry have posed incredible challenges for engineers, data scientists, and marketers alike. With low effect percentage differences (or lift) and often sparse conversion rates, the development of incrementality testing platforms at scale suggests tremendous engineering challenges in the measurement precision and detailed implementation. Similarly, the correct interpretation of results addressing a business goal within the marketing science domain requires significant data science and experimentation research expertise. All these challenges on the ongoing evolution of the online advertising industry and the heterogeneity of its sources (social, paid search, native, programmatic, etc). In the current tutorial, we propose a practical, grounded view in the incrementality testing landscape, including: The business need Solutions in the literature Design and choices in the development of incrementality testing platform The testing cycle, case studies, and recommendations to effective results delivery Incrementality testing evolution in the industry We will provide first-hand lessons on developing and operationalizing such a platform in a major combined DSP and ad network; these are based on running tens of experiments for up to two months each over the last couple of years.

Challenges in KDD and ML for Sustainable Development

Laure Berti-Equille,David Dao,Stefano Ermon,Bedharta Goswami

Artificial Intelligence and machine learning techniques can offer powerful tools for addressing the greatest challenges facing humanity and helping society adapt to a rapidly changing climate, respond to disasters and pandemic crisis, and reach the United Nations (UN) Sustainable Development Goals (SDGs) by 2030. In recent approaches for mitigation and adaptation, data analytics and ML are only one part of the solution that requires interdisciplinary and methodological research and innovations. For example, challenges include multi-modal and multi-source data fusion to combine satellite imagery with other relevant data, handling noisy and missing ground data at various spatio-temporal scales, and ensembling multiple physical and ML models to improve prediction accuracy. Despite recognized successes, there are many areas where ML is not applicable, performs poorly or gives insights that are not actionable. This tutorial will survey the recent and significant contributions in KDD and ML for sustainable development and will highlight current challenges that need to be addressed to transform and equip engaged sustainability science with robust ML-based tools to support actionable decision-making for a more sustainable future.

Explainability for Natural Language Processing

Marina Danilevsky,Shipi Dhanorkar,Yunyao Li,Lucian Popa,Kun Qian,Anbang Xu

This lecture-style tutorial, which mixes in an interactive literature browsing component, is intended for the many researchers and practitioners working with text data and on applications of natural language processing (NLP) in data science and knowledge discovery. The focus of the tutorial is on the issues of transparency and interpretability as they relate to building models for text and their applications to knowledge discovery. As black-box models have gained popularity for a broad range of tasks in recent years, both the research and industry communities have begun developing new techniques to render them more transparent and interpretable. Reporting from an interdisciplinary team of social science, human-computer interaction (HCI), and NLP/knowledge management researchers, our tutorial has two components: an introduction to explainable AI (XAI) in the NLP domain and a review of the state-of-the-art research; and findings from a qualitative interview study of individuals working on real-world NLP projects as they are applied to various knowledge extraction and discovery at a large, multinational technology and consulting corporation. The first component will introduce core concepts related to explainability in NLP. Then, we will discuss explainability for NLP tasks and report on a systematic literature review of the state-of-the-art literature in AI, NLP and HCI conferences. The second component reports on our qualitative interview study, which identifies practical challenges and concerns that arise in real-world development projects that require the modeling and understanding of text data.

Fairness and Explanation in Clustering and Outlier Detection

Ian Davidson

As machines move towards replacing humans in decision making the need to make intelligent systems transparent (explainable and fair) becomes paramount. However, fairness and explanation remain understudied problems for unsupervised learning with a recent survey on explanation not covering the topicand the seminal papers on fairness appearing only in 2017. The work in outlier detection is even more recent appearing only in the last year. The need for transparency in unsu- pervised learning is greater than in supervised learning as the lack of supervision means there is no extrinsic measure why a given model was chosen. Hence there is more room to be unfair and a greater demand for explanation. In this tutorial we will consider fairness and explanation for classic unsupervised learning methods that are used extensively in data-mining. The majority of published work is for clustering but we will also cover newer work on unsupervised outlier detection. We will cover both explanation and fairness from multiple perspectives. We begin with the philosophical, legal and ethical motivations of what we are trying to achieve with fairness and explanation. Then we move onto rigorous formal definitions of these problems, algorithmic solutions along with their limitations. We then overview example applications and future work.

New Frontiers of Multi-Network Mining: Recent Developments and Future Trend

Boxin Du,Si Zhang,Yuchen Yan,Hanghang Tong

Networks (i.e., graphs) are often collected from multiple sources and platforms, such as social networks extracted from multiple online platforms, team-specific collaboration networks within an organization, and inter-dependent infrastructure networks, etc. Such networks from different sources form the multi-networks, which can exhibit the unique patterns that are invisible if we mine the individual network separately. However, compared with single-network mining, multi-network mining is still under-explored due to its unique challenges. First ( multi-network models ), networks under different circumstances can be modeled into a variety of models. How to properly build multi-network models from the complex data? Second ( multi-network mining algorithms ), it is often nontrivial to either extend single-network mining algorithms to multi-networks or design new algorithms. How to develop effective and efficient mining algorithms on multi-networks? The objectives of this tutorial are to: (1) comprehensively review the existing multi-network models, (2) elaborate the techniques in multi-network mining with a special focus on recent advances, and (3) elucidate open challenges and future research directions. We believe this tutorial could be beneficial to various application domains, and attract researchers and practitioners from data mining as well as other interdisciplinary fields.

Data Quality for Machine Learning Tasks

Nitin Gupta,Shashank Mujumdar,Hima Patel,Satoshi Masuda,Naveen Panwar,Sambaran Bandyopadhyay,Sameep Mehta,Shanmukha Guttula,Shazia Afzal,Ruhi Sharma Mittal,Vitobha Munigala

The quality of training data has a huge impact on the efficiency, accuracy and complexity of machine learning tasks. Data remains susceptible to errors or irregularities that may be introduced during collection, aggregation or annotation stage. This necessitates profiling and assessment of data to understand its suitability for machine learning tasks and failure to do so can result in inaccurate analytics and unreliable decisions. While researchers and practitioners have focused on improving the quality of models, there are limited efforts towards improving the data quality.Assessing the quality of the data across intelligently designed metrics and developing corresponding transformation operations to address the quality gaps helps to reduce the effort of a data scientist for iterative debugging of the ML pipeline to improve model performance. This tutorial highlights the importance of analysing data quality in terms of its value for ML applications. Finding the data quality issues in data helps different personas like data stewards, data scientists, subject matter experts, or machine learning scientists to get relevant data insights and take remedial actions to rectify any issue. This tutorial surveys all the important data quality related approaches for structured, unstructured and spatio-temporal domains discussed in literature, focusing on the intuition behind them, highlighting their strengths and similarities, and illustrates their applicability to real-world problems. Finally we will discuss the interesting work IBM Research is doing in this space.

Real-time Event Detection for Emergency Response Tutorial

Alejandro Jaimes,Joel Tetreault

The amount of public data being generated on a daily basis has grown exponentially in the last few years and continues to increase at incredible speed. Most of this data is unstructured and includes text in different formats, in different languages, from many different sources; images, video, audio, and data from sensors. A lot of that data contains information about events happening all over the world, many of which require emergency response. Detecting events in public data, in real time, is therefore critical in many applications: from getting information to first responders as quickly as possible, to creating situational awareness in such emergency situations, as getting the right information to the right places as quickly as possible is critical in saving lives. When an event is ongoing, information on what is happening can be critical in making decisions to keep people safe and take control of the particular situation unfolding. First responders have to quickly make decisions that include what resources to deploy and where. Fortunately, in most emergencies, people use social media to publicly share information. At the same time, sensor data is increasingly becoming available. In order to do this, efficient computational approaches must detect and deliver the right information to the right destination. This tutorial will cover techniques at the state-of-the art to detect events in real-time from large-scale heterogeneous sources. We will focus on NLP, Computer Vision, and Anomaly Detection techniques. We will give specific examples and discuss relevant future research directions in Machine Learning, NLP, Computer Vision and other fields relevant to real time event detection. We will also discuss applications of event detection.

Graph Representation Learning: Foundations, Methods, Applications and Systems

Wei Jin,Yao Ma,Yiqi Wang,Xiaorui Liu,Jiliang Tang,Yukuo Cen,Jiezhong Qiu,Jie Tang,Chuan Shi,Yanfang Ye,Jiawei Zhang,Philip S. Yu

Graphs such as social networks and molecular graphs are ubiquitous data structures in the real world. Due to their prevalence, it is of great research importance to extract meaningful patterns from graph structured data so that downstream tasks can be facilitated. Instead of designing hand-engineered features, graph representation learning has emerged to learn representations that can encode the abundant information about the graph. It has achieved tremendous success in various tasks such as node classification, link prediction, and graph classification and has attracted increasing attention in recent years.In this tutorial, we systematically review the foundations, techniques, applications and advances in graph representation learning. We first introduce the foundations on graph theory and graph Fourier analysis. We then cover the key achievements of graph representation learning in recent years. Concretely, we discuss the six topics: 1) network embedding theories and systems; 2) foundations of graph neural networks (GNNs); 3) CogDL toolkit for GNNs; 4) scalable GNNs; 5) self-supervised learning in GNNs and 6) heterogeneous graphs and heterogeneous GNNs. Finally, we will introduce the applications of graph representation learning with a focus on recommender systems.

Machine Learning Robustness, Fairness, and their Convergence

Jae-Gil Lee,Yuji Roh,Hwanjun Song,Steven Euijong Whang

Responsible AI becomes critical where robustness and fairness must be satisfied together. Traditionally, the two topics have been studied by different communities for different applications. Robust training is designed for noisy or poisoned data where image data is typically considered. In comparison, fair training primarily deals with biased data where structured data is typically considered. Nevertheless, robust training and fair training are fundamentally similar in considering that both of them aim at fixing the inherent flaws of real-world data. In this tutorial, we first cover state-of-the-art robust training techniques where most of the research is on combating various label noises. In particular, we cover label noise modeling, robust training approaches, and real-world noisy data sets. Then, proceeding to the related fairness literature, we discuss pre-processing, in-processing, and post-processing unfairness mitigation techniques, depending on whether the mitigation occurs before, during, or after the model training. Finally, we cover the recent trend emerged to combine robust and fair training in two flavors: the former is to make the fair training more robust (i.e., robust fair training), and the latter is to consider robustness and fairness as two equals to incorporate them into a holistic framework. This tutorial is indeed timely and novel because the convergence of the two topics is increasingly common, but yet to be addressed in tutorials. The tutors have extensive experience publishing papers in top-tier machine learning and data mining venues and developing machine learning platforms.

AutoML: A Perspective where Industry Meets Academy

Yaliang Li,Zhen Wang,Bolin Ding,Ce Zhang

Machine learning methods have been adopted for various real-world applications, ranging from social networks, online image/video-sharing platforms, and e-commerce to education, healthcare, etc. However, several components of machine learning methods, including data representation, hyperparameter and model architecture, can largely affect their performance in practice. Moreover, the explosions of data scale and model size make the optimization of these components more and more time-consuming for machine learning developers. To tackle these challenges, Automated Machine Learning (AutoML) aims to automate the process of applying machine learning methods to solve real-world application tasks, reducing the time of tuning machine learning methods while maintaining good performance. In this tutorial, we will introduce the main research topics of AutoML, including Hyperparameter Optimization, Neural Architecture Search and Meta-Learning. Two emerging topics of AutoML, DNN-based Feature Generation and Machine Learning Guided Database, will also be discussed as they are important components for real-world applications. For each topic, we will motivate it with examples from industry, illustrate the state-of-the-art methods, and discuss their pros and cons from both perspectives of industry and academy. We will also discuss some future research directions based on our experience from industry and the trends in academy.

Advances in Mining Heterogeneous Healthcare Data

Fenglong Ma,Muchao Ye,Junyu Luo,Cao Xiao,Jimeng Sun

Thanks to the explosion of heterogeneous healthcare data and advanced machine learning and data mining techniques, specifically deep learning methods, we now have an opportunity to make difference in healthcare. In this tutorial, we will present state-of-the-art deep learning methods and their real-world applications, specifically focusing on exploring the unique characteristics of different types of healthcare data. The first half will be spent on introducing recent advances in mining structured healthcare data, including computational phenotyping, disease early detection/risk prediction and treatment recommendation. In the second half, we will focus on challenges specific to the unstructured healthcare data, and introduce advanced deep learning methods in automated ICD coding, understandable medical language translation, clinical trial mining, and medical report generation. This tutorial is intended for students, engineers and researchers who are interested in applying deep learning methods to healthcare, and prerequisite knowledge will be minimal. The tutorial will be concluded with open problems and a Q&A session.

On the Power of Pre-Trained Text Representations: Models and Applications in Text Mining

Yu Meng,Jiaxin Huang,Yu Zhang,Jiawei Han

Recent years have witnessed the enormous success of text representation learning in a wide range of text mining tasks. Earlier word embedding learning approaches represent words as fixed low-dimensional vectors to capture their semantics. The word embeddings so learned are used as the input features of task-specific models. Recently, pre-trained language models (PLMs), which learn universal language representations via pre-training Transformer-based neural models on large-scale text corpora, have revolutionized the natural language processing (NLP) field. Such pre-trained representations encode generic linguistic features that can be transferred to almost any text-related applications. PLMs outperform previous task-specific models in many applications as they only need to be fine-tuned on the target corpus instead of being trained from scratch. In this tutorial, we introduce recent advances in pre-trained text embeddings and language models, as well as their applications to a wide range of text mining tasks. Specifically, we first overview a set of recently developed self-supervised and weakly-supervised text embedding methods and pre-trained language models that serve as the fundamentals for downstream tasks. We then present several new methods based on pre-trained text embeddings and language models for various text mining applications such as topic discovery and text classification. We focus on methods that are weakly-supervised, domain-independent, language-agnostic, effective and scalable for mining and discovering structured knowledge from large-scale text corpora. Finally, we demonstrate with real-world datasets how pre-trained text representations help mitigate the human annotation burden and facilitate automatic, accurate and efficient text analyses.

Toward Explainable Deep Anomaly Detection

Guansong Pang,Charu Aggarwal

Anomaly explanation, also known as anomaly localization, is as important as, if not more than, anomaly detection in many real-world applications. However, it is challenging to build explainable detection models due to the lack of anomaly-supervisory information and the unbounded nature of anomaly; most existing studies exclusively focus on the detection task only, including the recently emerging deep learning-based anomaly detection that leverages neural networks to learn expressive low-dimensional representations or anomaly scores for the detection task. Deep learning models, including deep anomaly detection models, are often constructed as black boxes, which have been criticized for the lack of explainability of their prediction results. To tackle this explainability issue, there have been numerous techniques introduced over the years, many of which can be utilized or adapted to offer highly explainable detection results. This tutorial aims to present a comprehensive review of the advances in deep learning-based anomaly detection and explanation. We first review popular state-of-the-art deep anomaly detection methods from different categories of approaches, followed by the introduction of a number of principled approaches used to provide anomaly explanation for deep detection models. Through this tutorial, we aim to promote the development in algorithms, theories and evaluation of explainable deep anomaly detection in the machine learning and data mining community. The slides and other materials of the tutorial are made publicly available at https://tinyurl.com/explainableDeepAD.

Data Pricing and Data Asset Governance in the AI Era

Jian Pei,Feida Zhu,Zicun Cong,Xuan Luo,Huiwen Liu,Xin Mu

Data is one of the most critical resources in the AI Era. While substantial research has been dedicated to training machine learning models using various types of data, much less efforts have been invested in the exploration of assessing and governing data assets in end-to-end processes of machine learning and data science, that is, the pipeline where data is collected and processed, and then machine learning models are produced, requested, deployed, shared and evolved. To provide a state-of-the-art overall picture of this important and novel area and advocate the related research and development, we present a tutorial addressing two essential problems. First, in the pipeline of machine learning, how can data and machine learning models be priced properly so that contributions from various parties can be assessed and recognized in a fair manner? Second, in the collaboration among many parties in building, distributing and sharing machine learning models, how can data as assets be managed? Accordingly, the first part of our proposal surveys data and model pricing in the pipeline of machine learning, while the second part discusses data asset governance for collaborative artificial intelligence. Each part is self-contained. At the same time, the two parts echo each other and connect a series of interesting and important problems into a dynamic big picture.

From Tables to Knowledge: Recent Advances in Table Understanding

Jay Pujara,Pedro Szekely,Huan Sun,Muhao Chen

A wealth of human knowledge is expressed in structured tables, across web pages, scientific articles, spreadsheets, and databases. This wealth of knowledge is mirrored by diversity in the vast number of layout structures, content types, formats, and surface forms used to express tables. Recent advances in representation learning and knowledge representation have made progress in exploiting structural regularities in tabular data to unlock this knowledge. In this tutorial, we provide a survey of these advances for a host of table understanding tasks, including table segmentation, semantic typing of cells, transforming tables to knowledge graphs, entity linking, and table retrieval tasks for question answering. The structure of the tutorial will include three major modules. The first will provide attendees an introduction to the seminal work in organization of data in tables, and cover the major goals and approaches of computational systems that undertake table understanding. The second module will cover specific models used for table understanding tasks, such as table discovery, table segmentation and layout detection, cell classification and semantic typing, mapping tables to knowledge graphs and linking to known entities, and table retrieval in search and question answering. The final tutorial module will provide a primer for researchers who want to get involved with the table understanding community, providing them a guide to the most commonly used benchmark datasets and models, downstream applications and evaluations, and a sketch of the open problems in table understanding. Our tutorial is designed to be approachable to many different audiences, and will serve as a timely resource in a field that is quickly progressing. To engage audiences, we will incorporate opportunities to ask questions and discuss, as well as small group activities and exercises with other participants. When possible, we will incorporate practical demos that help illustrate the operation of the tools and models we discuss, as well as pointing out places where the existing state-of-the-art system can be improved. Additionally, we will introduce participants to many of the open-source tools and datasets that the tutors have built and curated, helping new researchers begin working on these problems quickly and effectively.

High-Dimensional Similarity Query Processing for Data Science

Jianbin Qin,Wei Wang,Chuan Xiao,Ying Zhang,Yaoshu Wang

Similarity query (a.k.a. nearest neighbor query) processing has been an active research topic for several decades. It is an essential procedure in a wide range of applications (e.g., classification & regression, deduplication, image retrieval, and recommender systems). Recently, representation learning and auto-encoding methods as well as pre-trained models have gained popularity. They basically deal with dense high-dimensional data, and this trend brings new opportunities and challenges to similarity query processing. Meanwhile, new techniques have emerged to tackle this long-standing problem theoretically and empirically. This tutorial aims to provide a comprehensive review of high-dimensional similarity query processing for data science. It introduces solutions from a variety of research communities, including data mining (DM), database (DB), machine learning (ML), computer vision (CV), natural language processing (NLP), and theoretical computer science (TCS), thereby highlighting the interplay between modern computer science and artificial intelligence technologies. We first discuss the importance of high-dimensional similarity query processing in data science applications, and then review query processing algorithms such as cover tree, locality sensitive hashing, product quantization, proximity graphs, as well as recent advancements such as learned indexes. We analyze their strengths and weaknesses and discuss the selection of algorithms in various application scenarios. Moreover, we consider the selectivity estimation of high-dimensional similarity queries, and show how researchers are bringing in state-of-the-art ML techniques to address this problem. We expect that this tutorial will provide an impetus towards new technologies for data science.

A Visual Tour of Bias Mitigation Techniques for Word Representations

Archit Rathore,Sunipa Dev,Jeff M. Phillips,Vivek Srikumar,Bei Wang

Word vector embeddings have been shown to contain and amplify biases in data they are extracted from. Consequently, many techniques have been proposed to identify, mitigate, and attenuate these biases in word representations. In this tutorial, we will review a collection of state-of-the-art debiasing techniques. To aid this, we provide an open source web-based visualization tool and offer hands-on experience in exploring the effects of these debiasing techniques on the geometry of high-dimensional word vectors. To help understand how various debiasing techniques change the underlying geometry, we decompose each technique into interpretable sequences of primitive operations, and study their effect on the word vectors using dimensionality reduction and interactive visual exploration.

Language Scaling: Applications, Challenges and Approaches

Linjun Shou,Ming Gong,Jian Pei,Xiubo Geng,Xingjie Zhou,Daxin Jiang

Language scaling aims to deploy Natural Language Processing (NLP) applications economically across many countries/regions with different languages. Language scaling has been heavily invested by industry since many parties want to deploy their applications/services to global markets. At the same time, scaling out NLP applications to various languages, essentially a data science problem, remains a grand challenge due to the huge differences in the morphology, syntaxes, and pragmatics among different languages. We present a comprehensive survey and tutorial on language scaling. We start with a clear problem description for language scaling and an intuitive discussion on the overall challenges. Then, we outline two major categories of approaches to language scaling, namely, model transfer and data transfer. We present a taxonomy to summarize various methods in literature. A large part of the tutorial is organized to address various types of NLP applications. Finally, we discuss several important challenges in this area and future directions.

Mixed Method Development of Evaluation Metrics

Brian St. Thomas,Praveen Chandar,Christine Hosey,Fernando Diaz

Designers of online search and recommendation services often need to develop metrics to assess system performance. This tutorial focuses on mixed methods approaches to developing user-focused evaluation metrics. This starts with choosing how data is logged or how to interpret current logged data, with a discussion of how qualitative insights and design decisions can restrict or enable certain types of logging. When we create a metric from that logged data, there are underlying assumptions about how users interact with the system and evaluate those interactions. We will cover what these assumptions look like for some traditional system evaluation metrics and highlight quantitative and qualitative methods that investigate and adapt these assumptions to be more explicit and expressive of genuine user behavior. We discuss the role that mixed methods teams can play at each stage of metric development, starting with data collection, designing both online and offline metrics, and supervising metric selection for decision making. We describe case studies and examples of these methods applied in the context of evaluating personalized search and recommendation systems. Finally, we close with practical advice for applied quantitative researchers who may be in the early stages of planning collaborations with qualitative researchers for mixed methods metrics development.

Causal Inference and Machine Learning in Practice with EconML and CausalML: Industrial Use Cases at Microsoft, TripAdvisor, Uber

Vasilis Syrgkanis,Greg Lewis,Miruna Oprescu,Maggie Hei,Keith Battocchi,Eleanor Dillon,Jing Pan,Yifeng Wu,Paul Lo,Huigang Chen,Totte Harinen,Jeong-Yoon Lee

In recent years, both academic research and industry applications see an increased effort in using machine learning methods to measure granular causal effects and design optimal policies based on these causal estimates. Open source packages such as CausalML and EconML provide a unified interface for applied researchers and industry practitioners with a variety of machine learning methods for causal inference. The tutorial will cover the topics including conditional treatment effect estimators by meta-learners and tree-based algorithms, model validations and sensitivity analysis, optimization algorithms including policy leaner and cost optimization. In addition, the tutorial will demonstrate the production of these algorithms in industry use cases.

Artificial Intelligence for Drug Discovery

Jian Tang,Fei Wang,Feixiong Cheng

Drug discovery is a long and costly process, taking on average 10 years and 2.5 billion dollars to develop a new drug. Artificial intelligence has the potential to significantly accelerate the process of drug discovery by analyzing a large amount of data generated in the biomedical domain such as bioassays, chemical experiments, and biomedical literature. Recently, there is a growing interesting in developing AI techniques for drug discovery in many different communities including machine learning, data mining, and biomedical community. In this tutorial, we will provide a detailed introduction to key problems in drug discovery such as molecular property prediction, de novo molecular design and molecular optimization, retrosynthesis reaction and prediction, and drug repurposing and combination, and also key technique advancements with artificial intelligence for these problems. This tutorial can be served as introduction materials for both computer scientist interested in drug discovery as well as drug discovery practitioners for learning the latest AI techniques along this direction.

Suresh Venkatasubramanian,Carlos Scheidegger,Sorelle Friedler,Aaron Clauset

As ML systems have become more broadly adopted in high-stakes settings, our scrutiny of them should reflect their greater impact on real lives. The field of fairness in data mining and machine learning has blossomed in the last decade, but most of the attention has been directed at tabular and image data. In this tutorial, we will discuss recent advances in network fairness. Specifically, we focus on problems where ones position in a network holds predictive value (e.g., in a classification or regression setting) and favorable network position can lead to a cascading loop of positive outcomes, leading to increased inequality. We start by reviewing important sociological notions such as social capital, information access, and influence, as well as the now-standard definitions of fairness in ML settings. We will discuss the formalizations of these concepts in the network fairness setting, presenting recent work in the field, and future directions.

Counterfactual Explanations in Explainable AI: A Tutorial

Cong Wang,Xiao-Hui Li,Haocheng Han,Shendi Wang,Luning Wang,Caleb Chen Cao,Lei Chen

Deep learning has shown powerful performances in many fields, however its black-box nature hinders its further applications. In response, explainable artificial intelligence emerges, aiming to explain the predictions and behaviors of deep learning models. Among many explanation methods, counterfactual explanation has been identified as one of the best methods due to its resemblance to human cognitive process: to deliver an explanation by constructing a contrastive situation so that human may interpret the underlying mechanism by cognitively demonstrating the difference.In this tutorial, we will introduce the cognitive concept and characteristics of counterfactual explanation, its computational form, mainstream methods, and various adaptation in terms of different explanation settings. In addition, we will demonstrate several typical use cases of counterfactual explanations in popular research areas. Finally, in light of practice, we outline the potential applications of counterfactual explanations like data augmentation or conversation system. We hope this tutorial can help the participants get an overview sense of counterfactual explanations.

Automated Machine Learning on Graph

Xin Wang,Wenwu Zhu

Machine learning on graphs has been extensively studiedin both academic and industry. However, as the literature on graph learning booms with a vast number of emerging methods and techniques, it becomes increasingly difficult to manually design the optimal machine learning algorithm for different graph-related tasks. To solve this critical challenge, automated machine learning (AutoML) on graphs which combines the strength of graph machine learning and AutoML together, is gaining attentions from the research community. In this tutorial, we discuss AutoML on graphs, primarily focusing on hyper-parameter optimization (HPO) and neural architecture search (NAS) for graph machine learning. We further overview libraries related to automated graph machine learning and in depth discuss AutoGL, the first dedicated open-source library for AutoML on graphs. In the end, we share our insights on future research directions for automated graph machine learning. To the best of our knowledge, this tutorial is the first to systematically and comprehensively review automated machine learning on graphs, possessing a great potential to draw a large amount of interests in the community.

Deep Learning on Graphs for Natural Language Processing

Lingfei Wu,Yu Chen,Heng Ji,Bang Liu

There are a rich variety of NLP problems that can be best expressed with graph structures. Due to the great power in modeling non-Euclidean data like graphs, deep learning on graphs techniques (i.e., Graph Neural Networks (GNNs)) have opened a new door to solving challenging graph-related NLP problems, and have already achieved great success. Despite the success, deep learning on graphs for NLP (DLG4NLP) still faces many challenges (e.g., automatic graph construction, graph representation learning for complex graphs, learning mapping between complex data structures). This tutorial will cover relevant and interesting topics on applying deep learning on graph techniques to NLP, including automatic graph construction for NLP, graph representation learning for NLP, advanced GNN based models (e.g., graph2seq, graph2tree, and graph2graph) for NLP, and the applications of GNNs in various NLP tasks (e.g., machine translation, natural language generation, information extraction and semantic parsing). In addition, hands-on demonstration sessions will be included to help the audience gain practical experience on applying GNNs to solve challenging NLP problems using our recently developed open source library -- Graph4NLP, the first library for researchers and practitioners for easy use of GNNs for various NLP tasks.

Adversarial Robustness in Deep Learning: From Practices to Theories

Han Xu,Yaxin Li,Xiaorui Liu,Wentao Wang,Jiliang Tang

Deep neural networks (DNNs) have achieved unprecedented accomplishments in various machine learning tasks. However, recent studies demonstrate that DNNs are extremely vulnerable to adversarial examples. They are manually synthesized input samples which look benign but can severely fool the prediction of DNN models. For machine learning practitioners who are applying DNNs, understanding the behavior of adversarial examples will not only help them improve the safety of their models, but also can help them have deeper insights into the working mechanism of the DNNs. In this tutorial, we provide a comprehensive overview on the recent advances of adversarial examples and their countermeasures, from both practical and theoretical perspectives. From the practical aspect, we give a detailed introduction of the popular algorithms to generate adversarial examples under different adversarys goals. We also discuss how the defending strategies are developed to resist these attacks, and how new attacks come out to break these defenses. From the theoretical aspect, we discuss a series of intrinsic behaviors of robust DNNs which are different from traditional DNNs, especially about their optimization and generalization properties. Finally, we introduce DeepRobust, a Pytorch adversarial learning library which aims to build a comprehensive and easy-to-use platform to foster this research field. Via our tutorial, the audience can grip the main ideas of adversarial attacks and defenses and gain a deep insight of DNNs robustness. The tutorial official website is at https://sites.google.com/view/kdd21-tutorial-adv-robust.

Physics-Guided AI for Large-Scale Spatiotemporal Data

Rose Yu,Paris Perdikaris,Anuj Karpatne

There is a great interest in scientific communities for harnessing the power of AI in applications ranging from climate science to quantum chemistry. The common theme in many of these applications is that the data are spatiotemporal with governing physics. Unfortunately, todays ML approaches are mostly purely data-driven, i.e., they solely rely on (labeled) data for learning statistical patterns. Collecting labeled data can be quite expensive in real-world applications. Moreover, the resulting black-box AI models are difficult to interpret for domain scientists. Many scientific applications contain valuable domain knowledge such as laws of physics or symmetry. On its own, black-box AI may ignore known physical laws, or spend tremendous training time only to re-discover them. This can lead to solutions that may violate physical principles or predictions that generalize poorly to unseen test scenarios. For example, energy conservation is well-understood in climate science but existing ML models predictions often fail to follow such a principle. Physics-guided or physics-informed AI is an emerging area spanning several disciplines to principally integrate physics in AI models and algorithms. The goal of this tutorial is to (1) provide an overview of spatiotemporal data analysis and its central role in science (2) survey development in physics-guided AI and their connection to existing techniques in scientific fields, and (3) identify the benchmark datasets, open problems and future directions in physics-guided AI and its broader impact. We believe this is a very timely and highly impactful topic. This tutorial will draw attention from the data science community to emerging applications in science. It will also bring new audiences from the scientific fields to data science. Currently, many techniques for analyzing scientific data have been developed in isolation. This tutorial aims to bridge the gap and facilitate cross-learning from different domains.

All You Need to Know to Build a Product Knowledge Graph

Nasser Zalmout,Chenwei Zhang,Xian Li,Yan Liang,Xin Luna Dong

Knowledge graphs have been pivotal in supporting downstream applications like search, recommendation, and question answering, among others. Therefore, knowledge graphs have naturally become key enabling technologies in e-Commerce platforms. Developing a high coverage product knowledge graph is more challenging than generic knowledge graphs. The highly specific and complex domain, the sparsity of training data, along with the dynamic taxonomies and product types, can constrain the resulting knowledge graphs. In this tutorial we present best practices and ML innovations in industry towards building a scalable product knowledge graph. Contributions in this domain benefit from the general literature in areas including information extraction and data mining, tailored to address the specific characteristics of e-Commerce platforms.

Simple and Automatic Distributed Machine Learning on Ray

Hao Zhang,Zhuohan Li,Lianmin Zheng,Ion Stoica

In recent years, the pace of innovations in the fields of machine learning (ML) has accelerated, researchers in SysML have created algorithms and systems that parallelize ML training over multiple devices or computational nodes. As ML models become more structurally complex, many systems have struggled to provide all-round performance on a variety of models. Particularly, ML scale-up is usually underestimated in terms of the amount of knowledge and time required to map from an appropriate distribution strategy to the model. Applying parallel training systems to complex models adds nontrivial development overheads in addition to model prototyping, and often results in lower-than-expected performance. This tutorial identifies research and practical pain points in parallel ML training, and discusses latest development of algorithms and systems on addressing these challenges in both usability and performance. In particular, this tutorial presents a new perspective of unifying seemingly different distributed ML training strategies. Based on it, introduces new techniques and system architectures to simplify and automate ML parallelization. This tutorial is built upon the authors years of research and industry experience, comprehensive literature survey, and several latest tutorials and papers published by the authors and peer researchers. The tutorial consists of four parts. The first part will present a landscape of distributed ML training techniques and systems, and highlight the major difficulties faced by real users when writing distributed ML code with big model or big data. The second part dives deep to explain the mainstream training strategies, guided with real use case. By developing a new and unified formulation to represent the seemingly different data- and model- parallel strategies, we describe a set of techniques and algorithms to achieve ML auto-parallelization, and compiler system architectures for auto-generating and exercising parallelization strategies based on models and clusters. The third part of this tutorial exposes a hidden layer of practical pain points in distributed ML training: hyper-parameter tuning and resource allocation, and introduces techniques to improve these aspects. The fourth part is designed as a hands-on coding session, in which we will walk through the audiences on writing distributed training programs in Python, using the various distributed ML tools and interfaces provided by the Ray ecosystem.

Causal Inference from Network Data

Elena Zheleva,David Arbour

This tutorial presents state-of-the-art research on causal inference from network data in the presence of interference. We start by motivating research in this area with real-world applications, such as measuring influence in social networks and market experimentation. We discuss the challenges of applying existing causal inference techniques designed for independent and identically distributed (i.i.d.) data to relational data, some of the solutions that currently exist and the gaps and opportunities for future research. We present existing network experiment designs for measuring different possible effects of interest. Then we focus on causal inference from observational data, its representation, identification, and estimation. We conclude with research on causal discovery in networks.

Multi-Objective Recommendations

Yong Zheng,David (Xuejun) Wang

The development of recommender systems usually deal with single-objective optimizations, such as minimizing prediction errors or maximizing the ranking quality. There is an emerging demand in multi-objective recommendations in which the recommendation list can be generated by optimizing multiple objectives. For example, researchers may balance different evaluation metrics (e.g., accuracy, novelty, diversity) in their models, or consider different objectives in a multi-task recommender. This tutorial provides an overview of the multi-objective optimization and its applications in the area of recommender systems. More specifically, we summarize the multi-objective optimization methods, identify the circumstances in which a multi-objective recommender system could be useful, and point out the challenges in multi-objective recommendations.

Towards Fair Federated Learning

Zirui Zhou,Lingyang Chu,Changxin Liu,Lanjun Wang,Jian Pei,Yong Zhang

Federated learning has become increasingly popular as it facilitates collaborative training of machine learning models among multiple clients while preserving their data privacy. In practice, one major challenge for federated learning is to achieve fairness in collaboration among the participating clients, because different clients contributions to a model are usually far from equal due to various reasons. Besides, as machine learning models are deployed in more and more important applications, how to achieve model fairness, that is, to ensure that a trained model has no discrimination against sensitive attributes, has become another critical desiderata for federated learning. In this tutorial, we discuss formulations and methods such that collaborative fairness, model fairness, and privacy can be fully respected in federated learning. We review the existing efforts and the latest progress, and discuss a series of potential directions.

Fragile Earth: Accelerating Progress towards Equitable Sustainability

Naoki Abe,Kathleen Buckingham,Bistra Dilkina,Emre Eftelioglu,Auroop Ganguly,James Hodson,Ramakrishnan Kannan

Fragile Earth 2021, our annual workshop is taking place as part of the Earth Day events at ACMs KDD 2021 Conference on research in Machine Learning and its applications. The 5th edition of Fragile Earth will bring together the research community, industry, and policymakers to develop radically new technological foundations for advancing and meeting the Sustainable Development Goals in a way that ensures equitable and inclusive progress.

Bijaya Adhikari,Ajitesh Srivastava,Sen Pei,Sarah Kefayati,Rose Yu,Amulya Yadav,Alexander Rodru00edguez,Arvind Ramanathan,Anil Vullikanti,B. Aditya Prakash

The 4th epiDAMIK@SIGKDD workshop is a forum to discuss new insights into how data mining can play a bigger role in epidemiology and public health research. While the integration of data science methods into epidemiology has significant potential, it remains under studied. We aim to raise the profile of this emerging research area of data-driven and computational epidemiology, and create a venue for presenting state-of-the-art and in-progress results-in particular, results that would otherwise be difficult to present at a major data mining conference, including lessons learnt in the trenches. The current COVID-19 pandemic has only showcased the urgency and importance of this area. Our target audience consists of data mining and machine learning researchers from both academia and industry who are interested in epidemiological and public-health applications of their work, and practitioners from the areas of mathematical epidemiology and public health.

AdKDD 2021

Abraham Bagherjeiran,Nemanja Djuric,Mihaljo Grbovic,Kuang-chih Lee,Kun Liu,Vladan Radosavljevic,Suju Rajan

The digital advertising field has always had challenging ML problems, learning from petabytes of data that is highly imbalanced, reactivity times in the milliseconds and more recently compounded with the complex users path to purchase across devices, across platforms and even online/real-world behavior. The AdKDD workshop continues to be a forum for researchers in advertising, during and after KDD. Our website which hosts slides and abstracts receives approximately 2,000 monthly visits. In surveys during AdKDD 2019 and 2020, over 60% agreed that AdKDD is the reason they attended KDD and over 90% indicated they would attend next year. The 2021 edition is particularly timely because of ongoing developments in ad tracking. We will aim to discuss notions of privacy and tracking enforced by GDPR and through company policies. In addition, we will seek papers that discuss fairness in the context of advertising, to what extent does hyper-personalization work, and on whether the ad industry as a whole needs to think through more effective business models such as incrementality. Ad tech is in an interesting place of evolution/maturity now and we would like to use the AdKDD forum to get the researchers to think not only about the ML aspects but also spark conversations about the societal ones.

ODD: Outlier Detection and Description

Siddharth Bhatia,Bryan Hooi,Leman Akoglu,Sourav Chatterjee,Xiaodong Jiang,Manish Gupta

We propose to organize the 6th ODD workshop at KDD 2021, following the successful series of the past five ODD Workshops that have been organized at KDD 2013, KDD 2014, KDD 2015, KDD 2016, and KDD 2018.

BiblioDAP21: The 1st Workshop on Bibliographic Data Analysis and Processing

Zeyd Boukhers,Philipp Mayr,Silvio Peroni

Automatic processing of bibliographic data becomes very important in digital libraries, data science and machine learning due to its importance in keeping pace with the significant increase of published papers every year from one side and to the inherent challenges from the other side. This processing has several aspects including but not limited to I) Automatic extraction of references from PDF documents, II) Building an accurate citation graph, III) Author name disambiguation, etc. Bibliographic data is heterogeneous by nature and occurs in both structured (e.g. citation graph) and unstructured (e.g. publications) formats. Therefore, it requires data science and machine learning techniques to be processed and analysed. Here we introduce BiblioDAP21: The 1st Workshop on Bibliographic Data Analysis and Processing.

Bayesian Causal Inference for Real World Interactive Systems

Nicolas Chopin,Mike Gartrell,Dawen Liang,Alberto Lumbreras,David Rohde,Yixin Wang

Machine learning has allowed many systems that we interact with to improve performance and personalize. Recommender systems in particular are one of the largest users of machine learning in production environments that have improved performance of real-world systems. Learning in these interactive systems requires models that combine very diverse signals, including the logs of the interactive system (indicating if the intervention succeeded or failed) augmented with other data sources including: collaborative filtering, text, and image data. Bayesian inference is a compelling method to combine these diverse signals in a principled manner, but deployment of systems based on Bayesian principles remain challenging. The reward signal in the system logs is often uneven. Accurate estimation of reward is possible for exploiting actions, but often poor for other actions (exploration). Non-Bayesian methods such as inverse propensity score methods, the reinforce algorithm, and other heuristic-based approaches currently dominate practice. These commonly-used heuristics are often ineffective at leveraging diverse data. In contrast, Bayesian methods offer a principled, robust framework for learning from uneven signals and combining different types of information. Drawing upon the bandit and reinforcement learning community, in this workshop we will explore innovations in Bayesian inference for real world interactive systems, and consider advantages and limitations of the Bayesian approach.

Workshop on Online and Adaptative Recommender Systems (OARS)

Xiquan Cui,Estelle Afshar,Khalifeh Al-Jadda,Srijan Kumar,Julian McAuley,Tao Ye,Kamelia Aryafar,Vachik Dave,Mohammad Korayem

Many recommender systems deployed in the real world rely on categorical user-profiles and/or pre-calculated recommendation actions that stay static during a user session. Recent trends suggest that recommender systems should model user intent in real time and constantly adapt to meet user needs at the moment or change user behavior in situ. In addition, there have been many advances that make online and adaptive recommender systems (OARS) feasible, scalable, and more sophisticated. This workshop aims to bring together practitioners and researchers from academia and industry to discuss the challenges and approaches to implement OARS algorithms and systems and improve user experiences by better modeling and responding to user intent.

Measures and Best Practices for Responsible AI

Sunipa Dev,Mehrnoosh Sameki,Jwala Dhamala,Cho-Jui Hsieh

The use of machine learning (ML) based systems has become ubiquitous including their usage in critical applications like medicine and assistive technologies. Therefore, it is important to determine the trustworthiness of these ML models and tasks. A key component in this determination is the development of task specific datasets, metrics, and best practices which are able to measure the various aspects of responsible model development and deployment including robustness, interpretability and fairness. Further, datasets are also key when training for a given task, be it coreference resolution in language modeling or facial recognition in computer vision. Imbalances and inadequate representation in datasets can have repercussions of an undesirable nature. Some common examples include how coreference resolution systems in NLU are often not all gender inclusive, discrepancies in the measurement of how robust and trustworthy machine predictions are in domains where the selective labels problem is prevalent, and discriminatory determination of pain or care levels of people belonging to different demographics in health science applications. Development of task specific datasets which do better in this regard is also extremely vital. In this workshop, we invite contributions towards different (i) datasets which help enhance task performance and inclusivity, (ii) measures and metrics which help in determining the trustworthiness of a model/dataset, (iii) assessment or remediation tools for fairer, more transparent, robust, and reliable models, and (iv) case studies describing responsible development and deployment of AI systems across fields such as healthcare, financial services, insurance, etc. The datasets, measures, mitigation techniques, and best practices could focus on different areas including (but not restricted to) the following: Fairness and Bias Robustness Reliability and Safety Interpretability Explainability Ethical AI Causal Inference Counterfactual Example Analysis They could also be focussed on the applications in diverse fields such as industry, finance, healthcare and beyond. Text based datasets can be in languages other than English as well.

Ming Ding,Yuxiao Dong,Xiao Liu,Jiezhong Qiu,Jie Tang,Zhilin Yang

The International Workshop on Pretraining: Algorithms, Architectures, and Applications (Pretrain@KDD 2021) presents interdisciplinary contributions in pretraining. The workshop is related to machine learning, deep learning, representation learning, natural language processing, computer vision, graph learning, and knowledge discovery. The program of the workshop will focus on presenting and discussing the state-of-the-art, open problems, challenges and latest models, techniques and algorithms in the field of pretraining, covering aspects of algorithms, architectures and applications.

International Workshop on Knowledge Graph: Heterogenous Graph Deep Learning and Applications

Ying Ding,Bogdan Arsintescu,Ching-Hua Chen,Haoyun Feng,Franu00e7ois Scharffe,Oshani Seneviratne,Juan Sequeda

Knowledge graph (KG) is the backbone to enable cognitive Artificial Intelligence (AI), which relies on cognitive computing and semantic reasoning. Knowledge graph is the connected data with the semantically enriched context. It is the necessary step for the next move of AI. Our daily activities have closely intermingled with various applications powered by knowledge graphs. It has even entered our healthcare system to facilitate clinical decision making and improve hospital efficiency. This workshop aims to bring researchers and practitioners to promote research and applications related to knowledge graph.

Data Science with Human in the Loop

Eduard Dragut,Yunyao Li,Lucian Popa,Slobodan Vucetic

The aim of this workshop is to stimulate research on human-computer interaction challenges in data science. We invite researchers and practitioners interested in understanding how to optimize the human-computer cooperation and how to minimize human effort along the data science pipeline in a wide range of data science tasks and real-life applications. One over-arching challenge is to raise the level of abstraction of human-computer interaction to more sophisticated interaction models that better reflect a humans conceptual model and understanding. This workshop will bring together the interdisciplinary researchers from academia, research labs and practice to share, exchange, learn, and develop preliminary results, new concepts, ideas, principles, and methodologies on understanding and improving human-computer interaction for cost-effective development of data science models and for knowledge discovery. We expect the workshop to help develop and grow a strong community of researchers who are interested in this topic, and yield future collaborations and scientific exchanges across the relevant areas of data mining, machine learning, data and knowledge management, human-machine interaction, and user interfaces.

DI-2021: The Second Document Intelligence Workshop

Benjamin Han,Douglas Burdick,Dave Lewis,Yijuan Lu,Hamid Motahari,Sandeep Tata

Business documents are central to the operation of all organizations, and they come in all shapes and sizes: project reports, planning documents, technical specifications, financial statements, meeting minutes, legal agreements, contracts, resumes, purchase orders, invoices, and many more. The ability to read, understand and interpret these documents, referred to here as Document Intelligence (DI), is challenging due to not only many domains of knowledge involved, but also their complex formats and structures, internal and external cross references deployed, and even less-than-ideal quality of scans and OCR oftentimes performed on them. This workshop aims to explore and advance the current state of research and practice in answering these challenges.

Aude Hofleitner,Meng Jiang,Srijan Kumar,Neil Shah,Kai Shu

Misinformation and misbehavior mining on the web (MIS2) workshop is held virtually on August 14, 2021 and is co-located with the ACM SIGKDD 2021 conference. The web has become a breeding ground for misbehavior and misinformation. It is timely and crucial to understand, detect, forecast, and mitigate their harm. MIS2 workshop as an interdisciplinary venue for researchers and practitioners who study the dark side of the web. The workshop program includes a peer-reviewed set of paper presentations and keynote talks, giving the attendees an immersive experience of this research field.

WIT: Workshop on deriving Insights from user-generated Text

Estevam Hruschka,Tom M. Mitchell,Marko Grobelnik,Behzad Golshan

User-Generated text is a rich source of user insights and experiences that can be very helpful in many different daily life situations, such as when deciding what product to buy, what hotel to stay, what company to apply for a job, what region to buy a house, etc. This kind of text also plays a very relevant role in current research efforts in academic research groups, technology companies, as well as big publishers, telecommunications players, recruiting and job-market focuses organizations, etc. The goal of this new workshop is to bring together researchers interested in the application of novel techniques in AI/ML/NLP and Knowledge Discovery to address challenges around harnessing text-heavy user-generated data that is available to organizations and over the Web. The workshop program contains invited speakers, contributed talks, poster sessions and a discussion panel

Daniel Jiang,Haipeng Luo,Chu Wang,Yingfei Wang

The areas of reinforcement learning and multi-armed bandits have recently seen significant innovation, while many application domains, such as e-commerce, are full of problems and challenges to which vanilla RL or MAB methods cannot directly apply. This workshop aims at filling this communication gap by creating a platform for researchers and practitioners from both the method/theory side and application side of the community. Having this platform now instead of at a later time is beneficial to all sides of the community: practitioners and frontline scientists are able to avoid re-inventing existing techniques; theory-oriented researchers can find motivation in industry problems, working within more realistic settings, and making real-world impact. The 1st Multi-armed Bandits and Reinforcement Learning Workshop was a full day workshop co-located with the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining (KDD 2021) in Singapore.

Workshop on Data-Efficient Machine Learning (DeMaL)

Sumeet Katariya,Nikhil Rao,Chandan K. Reddy

The recent increase in the size of neural networks has led to a proportional increase in the demands for high-quality human-annotated data. Labeling data is a costly and time-consuming endeavor, and the need for large data is often satiated through creative techniques such as data augmentation, transfer learning, self-supervised learning, active learning, to name a few. Many of these techniques are designed for specific data types such as images, text, and speech. The data in many data-mining applications however is multi-modal in nature, has implicit signals from user-interactions, and involves multiple agents. Given the uniqueness, importance, and growing interest in these problems, we feel that the ACM Conference on Knowledge Discovery and Data Mining (SIGKDD) 2021 is an appropriate venue for running a workshop on Data-efficient Machine Learning. In this proposal, we discuss our vision for this workshop.

Machine Learning in Finance

Senthil Kumar,Leman Akoglu,Nitesh Chawla,Jose A. Rodriguez-Serrano,Tanveer Faruquie,Saurabh Nagrecha

The finance industry is constantly faced with an ever evolving set of challenges including credit card fraud, identity theft, network intrusion, money laundering, human trafficking, and illegal sales of firearms. There are also newly emerging threats such as fake news in financial media that can lead to distortions in trading strategies and investment decisions. In addition, traditional problems such as customer analytics, forecasting, and recommendations take on a unique flavor when applied to financial data. A number of new ideas are emerging to tackle all these problems including semi-supervised learning methods, deep learning algorithms, network/graph based solutions as well as linguistic approaches. These methods must often be able to work in real-time and be able handle large volumes of data. The purpose of this workshop is to bring together researchers and practitioners to discuss both the problems faced by the financial industry and potential solutions. We have invited regular papers, positional papers and extended abstracts of work in progress. We have also encouraged short papers from financial industry practitioners that introduce domain specific problems and challenges to academic researchers. This event is the fourth in a sequence of finance related workshops we have organized at KDD since 2017.

The KDD 2021 Workshop on Causal Discovery (CD2021)

Thuc Duy Le,Jiuyong Li,Gregory Cooper,Sofia Triantafyllou,Elias Bareinboim,Huan Liu,Negar Kiyavash

As a basic and effective tool for explanation, prediction and decision making, causal relationships have been utilized in almost all disciplines. Traditionally, causal relationships are identified by making use of interventions or randomized controlled experiments. However, conducting such experiments is often expensive or even impossible due to cost or ethical concerns. Therefore, there has been an increasing interest in discovering causal relationships based on observational data, and in the past few decades, significant contributions have been made to this field by computer scientists.Inspired by such achievements and following the success of CD 2016 - CD 2020, CD 2021 continues to serve as a forum for researchers and practitioners in data mining and other disciplines to share their recent research in causal discovery in their respective fields and to explore the possibility of interdisciplinary collaborations in the study of causality. Based on the platform of KDD, this workshop is especially interested in attracting contributions that link data mining/machine learning research with causal discovery, and solutions to causal discovery in large-scale datasets.

The Third International TrueFact Workshop: Making a Credible Web for Tomorrow

Subhabrata Mukherjee,Qi Li,Sihong Xie,Philip Yu,Jing Gao

The Third International TrueFact Workshop: Making a Credible Web for Tomorrow is geared towards bringing academic, industry and government researchers and practitioners together to tackle the challenges in misinformation, data quality, truth finding, fact-checking, credibility analysis and rumor detection -- in heterogeneous and multi-modal sources of information including texts, images, videos, relational data, social networks and knowledge graphs.

Anomaly and Novelty Detection, Explanation, and Accommodation (ANDEA)

Guansong Pang,Jundong Li,Anton van den Hengel,Longbing Cao,Thomas G. Dietterich

The detection of, explanation of, and accommodation to anomalies and novelties are active research areas in multiple communities, including data mining, machine learning, and computer vision. They are applied in various guises including anomaly detection, out-of-distribution example detection, adversarial example recognition and detection, curiosity-driven reinforcement learning, and open-set recognition and adaptation, all of which are of great interest to the SIGKDD community. The techniques developed have been applied in a wide range of domains including fraud detection and anti-money laundering in fintech, early disease detection, intrusion detection in large-scale computer networks and data centers, defending AI systems from adversarial attacks, and in improving the practicality of agents through overcoming the closed-world assumption.This workshop is focused on Anomaly and Novelty Detection, Explanation, and Accommodation (ANDEA). It will gather researchers and practitioners from data mining, machine learning, and computer vision communities and diverse knowledge background to promote the development of fundamental theories, effective algorithms, and novel applications of anomaly and novelty detection, characterization, and adaptation. All materials of keynote talks, panel discussion, and accepted papers of the workshop are made available at https://tinyurl.com/andea2021.

2nd International Workshop on Data Quality Assessment for Machine Learning

Hima Patel,Fuyuki Ishikawa,Laure Berti-Equille,Nitin Gupta,Sameep Mehta,Satoshi Masuda,Shashank Mujumdar,Shazia Afzal,Srikanta Bedathur,Yasuharu Nishi

The 2nd International Workshop on Data Quality Assessment for Machine Learning (DQAML21) is organized in conjunction with the Special Interest Group on Knowledge Discovery and Data Mining (SIGKDD). This workshop aims to serve as a forum for the presentation of research related to data quality assessment and remediation in AI/ML pipeline. Data quality is a critical issue in the data preparation phase and involves numerous challenging problems related to detection, remediation, visualization and evaluation of data issues. The workshop aims to provide a platform to researchers and practitioners to discuss such challenges across different modalities of data like structured, time series, text and graphical. The aim is to attract perspectives from both industrial and academic circles.

VDS21: Visualization in Data Science

Claudia Plant,Alvitta Ottley,Liang Gou,Torsten Mu00f6ller,Adam Perer,Alexander Lex,Junming Shao

Data science is the practice of deriving insight from data, enabled by modeling, computational methods, interactive visual analysis, and domain-driven problem solving. Data science draws from methodology developed in such fields as applied mathematics, statistics, machine learning, data mining, data management, visualization, and HCI. It drives discoveries in business, economy, biology, medicine, environmental science, the physical sciences, the humanities and social sciences, and beyond. Machine learning and data mining and visualization are integral parts of data science, and essential to enable sophisticated analysis of data. Nevertheless, both research areas are currently still rather separated and investigated by different communities rather independently. The goal of this workshop is to bring researchers from both communities together in order to discuss common interests, to talk about practical issues in application-related projects, and to identify open research problems. This summary gives a brief overview of the ACM KDD Workshop on Visualization in Data Science (VDS at ACM KDD and IEEE VIS), which will take place virtually on Aug 14-18, 2021 (Held in conjunction with KDD21). The workshop website is available at: http://www.visualdatascience.org/2021/

MiLeTS21: 7th KDD Workshop on Mining and Learning from Time Series

Sanjay Purushotham,Yaguang Li,Zhengping Che

Time series data are ubiquitous. Rapid advances in diverse sensing technologies, ranging from remote sensors to wearables and social sensing, are generating a rapid growth in the size and complexity of time series archives. This has resulted in a fundamental shift away from parsimonious, infrequent measurement to nearly continuous monitoring and recording. This demands development of new tools and solutions. The goals of this workshop are to: (1) highlight the significant challenges that underpin learning and mining from time series data (e.g. irregular sampling, spatiotemporal structure, and uncertainty quantification), (2) discuss recent algorithmic, theoretical, statistical, or systems-based developments for tackling these problems, and (3) synergize the research activities and discuss both new and open problems in time series analysis and mining.

MLHat: Deployable Machine Learning for Security Defense

Gang Wang,Arridhana Ciptadi,Ali Ahmadzadeh

The MLHat workshop aims to bring together academic researchers and industry practitioners to discuss the open challenges, potential solutions, and best practices to deploy machine learning at scale for security defense. The workshop will discuss related topics from both defender perspectives (white-hat) and the attacker perspectives (black-hat). We call the workshop MLHats, to serve as a place for people who are interested in using machine learning to solve practical security problems. The workshop will focus on defining new machine learning paradigms under various security application contexts and identifying exciting new future research directions. At the same time, the workshop will also have a strong industry presence to provide insights into the challenges in deploying and maintaining machine learning models and the much-needed discussion on the capabilities that the state-of-the-arts failed to provide.

The Fifth International Workshop on Automation in Machine Learning

Tao Wang,Patrick Koch,Brett Wujek,Jun Liu,Hai Li

The Fifth International Workshop on Automation in Machine Learning aims to identify opportunities and challenges for automation in machine learning, to provide an opportunity for researchers to discuss best practices for automation in machine learning potentially leading to definition of standards, and to provide a forum for researchers to speak out and debate on different ideas in automation in machine learning. The workshop agenda includes four invited keynote speakers and four accepted paper presentations chosen from a peer review process. A panel discussion will close out the workshop to allow for an engaging and interactive exchange of thoughts and ideas on AutoML.

Machine Learning for Consumers and Markets

Wen Wang,Han Zhao,Dokyun DK Lee,George H. Chen

Consumers leave digital footprints through large volumes of heterogeneous data which is a wealth of commercial value for firms, waiting to be mined. While there are initial success stories, this area is still under-explored. Further research and communication between the ML community and business community are needed to better align the objectives and create more successful applications. While machine learning is equipped to handle a variety of raw data for predictive tasks, without the theoretical insights from economics and consumer behavior to guide ML models, extracting generalizable insights with clear managerial implications and formulating impactful policies remain elusive. This workshop aims to promote further communication between these disciplines to foster synergistic development of impactful research that could benefit one another.

TMC 2021: 2021 International Workshop on Talent and Management Computing

Hui Xiong,Hengshu Zhu,Tong Xu,Xi Zhang

In todays competitive and fast-evolving business environment, it is a critical time for organizations to rethink how to deal with the talent and management related tasks in a quantitative manner. Indeed, thanks to the era of big data, the availability of large-scale talent data provides unparalleled opportunities for business leaders to understand the rules of talent and management, which in turn deliver intelligence for effective decision making and management for their organizations. In the past few years, talent and management computing have increasingly attracted attentions from KDD communities, and a number of research/applied data science efforts have been devoted. To this end, the purpose of this workshop, i.e., the 2021 International Workshop on Talent and Management Computing, is to bring together researchers and practitioners to discuss both the critical problems faced by talent and management related domains, and potential data-driven solutions by leveraging state-of-the-art data mining technologies.

PLP 2021: Workshop on Programming Language Processing

Chang Xu,Siqi Ma,David Lo

The first international Workshop on Programming Language Processing presents interdisciplinary contributions that address programming language procession problems with machine learning and data mining techniques. Recently, there are lots of successful natural language processing methods. But the mining of programming languages could not exactly follow the manner of natural language processing. The difference between natural language and programming language brings in new research challenges and opportunities. The workshop will bring together researchers from machine learning, data mining and software engineering to discuss and debate the path forward for mining the value of programming languages.

2nd International Workshop on Industrial Recommendation Systems (IRS)

Jianpeng Xu,Lingfei Wu,Xiaolin Pang,Mohit Sharma,Dawei Yin,George Karypis,Justin Basilico,Philip S. Yu

Recommendation systems are used widely across many industries, such as e-commerce, multimedia content platforms and social networks, to provide suggestions that a user will most likely consume or connect; thus, improving the user experience. This motivates people in both industry and research organizations to focus on personalization or recommendation algorithms, which has resulted in a plethora of research papers. While academic research mostly focuses on the performance of recommendation algorithms in terms of ranking quality or accuracy, it often neglects key factors that impact how a recommendation system will perform in a real-world environment. These key factors include but are not limited to: business metric definition and evaluation, recommendation quality control, data and model scalability, model interpretability, model robustness and fairness, and resource limitations, such as computing and memory resources budgets, engineering workforce cost, etc. The gap in constraints and requirements between academic research and industry limits the broad applicability of many of academias contributions for industrial recommendation systems. This workshop aspires to bridge this gap by bringing together researchers from both academia and industry. Its goal is to serve as a venue through which academic researchers become aware of the additional factors that may affect the adoption of an algorithm into real production systems, and how well it will perform if deployed. Industrial researchers will also benefit from sharing the practical insights, approaches, and frameworks as well.

Workshop on Model Mining

Shan You,Chang Xu,Fei Wang,Changshui Zhang

How to mine the knowledge in the pretrained models is of significance in achieving more promising performance, since practitioners have access to many pretrained models easily. This Workshop on Model Mining aims to investigate more diverse and advanced manners in mining knowledge within models, which tends to leverage the pretrained models more wisely, elegantly and systematically. There are many topics related to this workshop, such as distilling a lightweight model from a well-trained heavy model via teacher-student paradigm, and boosting the performance of the model by carefully designing the predecessor tasks, e.g., pre-training, self-supervised and contrastive learning. Model mining as a special way of data mining is relevant to SIGKDD, and its audience including researchers and engineers will benefit a lot for designing more advanced algorithms for their tasks.

The 4th Artificial Intelligence of Things (AIoT) Workshop

Jian Zhang,Jian Tang,Yiran Chen,Jie Liu,Jieping Ye,Marilyn Wolf,Vijaykrishnan Narayanan,Mani Srivastava,Michael I. Jordan,Victor Bahl

With advancement of recent network and chip technologies, IoT devices are becoming smarter with increasing compute power, bandwidth, and storage available on the device. This enables intelligent decision making and information transferring on the devices and unleashes the power of AIoT (Artificial Intelligence of Things) that supports scenarios such as smart city/agriculture/manufacturing/health care and self-driving scenarios. The AIoT Workshop is a forum for researchers, scientists, engineers, and practitioners to share and learn AI powered IoT solutions. The AIoT is a multi-disciplinary area, which include but not limited to IoT, AI/ML, embedded systems, and networking. The 4th AIoT workshop will be hosted virtually in conjunction with the 27th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2021). The workshop program consists of keynote(s), invited talks, accepted technical paper presentations, as well as an indoor location competition panel.

Overview of the 1st Workshop on City Brain Research

Guanjie Zheng,Porter Jenkins,Yanyan Xu,Dongyao Chen

The 1st Workshop on City Brain Research examines the current challenges and recent breakthroughs related to intelligent urban transportation. The workshop will be organized in a novel form --- offering debates on three main components involved in the transportation policy development cycle: data collection, policy learning, and the effects on human behavior. The organizers intend to invite speakers and attendees from different backgrounds, ranging from computer science, transportation, to urban planning. The final outcomes include live discussions of the three consistent topics, a comprehensive annual report summarizing current practices and future directions, and a detailed tutorial on the workshop day.

Xiaoqiang Zhu,Kuang-chih Lee,Guorui Zhou,Biye Jiang,Zhe Wang,Ruiming Tang,Kan Ren,Qingyao Ai,Weinan Zhang

Recently, we have witnessed that deep learning-based approaches has been widely applied to empower many internet-scale applications. However, the data in these internet-scale applications are high dimensional and extremely sparse, which makes it different from those applications with dense data processing, such as image classification and speech recognition, where deep learning-based approaches have been extensively studied. One of the main applications is the user-centric platform that consists of great deal of users, items and user generated tabular data which are quite high-dimensional. The characteristics of such data pose unique challenges to the adoption of deep learning in these applications, including modeling, training, and online serving, etc. More and more communities from both academia and industry have initiated the endeavors to solve these challenges. This workshop will provide a venue for both the research and engineering communities to discuss and formulate the challenges, utilize opportunities, and propose new ideas in the practice of deep learning on high-dimensional sparse data.

Navigation

DeepNLP KDD2021 Accepted Paper List AI Robotic and STEM Top Conference & Journal Papers

Introduction

KDD2021 ACCEPTED PAPER LIST