
This blog post summarizes recent research on dialogue and large language models (LLMs) published at ACL 2023. In total, 79 papers at ACL 2023 this year are related to dialogue. Most of the authors are affiliated with top research institutes (Google Research, DeepMind, Meta FAIR) and universities (Stanford, Berkeley, MIT, CMU, and others).
Navigation
- 1.One Cannot Stand for Everyone! Leveraging Multiple User Simulators to train Task-oriented Dialogue Systems
- 2.SafeConv: Explaining and Correcting Conversational Unsafe Behavior
- 3.Span-Selective Linear Attention Transformers for Effective and Robust Schema-Guided Dialogue State Tracking
- 4.EM Pre-training for Multi-party Dialogue Response Generation
- 5.Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information
- 6.DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations
- 7.DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization
- 8.Facilitating Multi-turn Emotional Support Conversation with Positive Emotion Elicitation: A Reinforcement Learning Approach
- 9.Query Enhanced Knowledge-Intensive Conversation via Unsupervised Joint Modeling
- 10.White-Box Multi-Objective Adversarial Attack on Dialogue Generation
- 11.Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking
- 12.BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
- 13.Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues
- 14.MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
- 15.Injecting knowledge into language generation: a case study in auto-charting after-visit care instructions from medical dialogue
- 16.DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation
- 17.BREAK: Breaking the Dialogue State Tracking Barrier with Beam Search and Re-ranking
- 18.Learning to Generate Equitable Text in Dialogue from Biased Training Data
- 19.TREA: Tree-Structure Reasoning Schema for Conversational Recommendation
- 20.CORE: Cooperative Training of Retriever-Reranker for Effective Dialogue Response Selection
- 21.PVGRU: Generating Diverse and Relevant Dialogue Responses via Pseudo-Variational Mechanism
- 22.MPCHAT: Towards Multimodal Persona-Grounded Conversation
- 23.Towards Boosting the Open-Domain Chatbot with Human Feedback
- 24.Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations
- 25.ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
- 26.Towards Faithful Dialogues via Focus Learning
- 27.Prompter: Zero-shot Adaptive Prefixes for Dialogue State Tracking Domain Adaptation
- 28.Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation
- 29.Privacy-Preserving Domain Adaptation of Semantic Parsers
- 30.VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
- 31.Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona
- 32.FutureTOD: Teaching Future Knowledge to Pre-trained Language Model for Task-Oriented Dialogue
- 33.PeaCoK: Persona Commonsense Knowledge for Consistent and Engaging Narratives
- 34.Retrieval-free Knowledge Injection through Multi-Document Traversal for Dialogue Models
- 35.Annotating and Detecting Fine-grained Factual Errors for Dialogue Summarization
- 36.MMDialog: A Large-scale Multi-turn Dialogue Dataset Towards Multi-modal Open-domain Conversation
- 37.Envisioning Future from the Past: Hierarchical Duality Learning for Multi-Turn Dialogue Generation
- 38.Can Language Models Make Fun? A Case Study in Chinese Comical Crosstalk
- 39.A Dataset of Argumentative Dialogues on Scientific Papers
- 40.Contextual Knowledge Learning for Dialogue Generation
- 41.Speech-Text Pre-training for Spoken Dialog Understanding with Explicit Cross-Modal Alignment
- 42.MidMed: Towards Mixed-Type Dialogues for Medical Consultation
- 43.CASE: Aligning Coarse-to-Fine Cognition and Affection for Empathetic Response Generation
- 44.RECAP: Retrieval-Enhanced Context-Aware Prefix Encoder for Personalized Dialogue Response Generation
- 45.Dual Class Knowledge Propagation Network for Multi-label Few-shot Intent Detection
- 46.The CRINGE Loss: Learning what language not to model
- 47.Modeling User Satisfaction Dynamics in Dialogue via Hawkes Process
- 48.Pre-training Multi-party Dialogue Models with Latent Discourse Inference
- 49.DecompEval: Evaluating Generated Texts as Unsupervised Decomposed Question Answering
- 50.SimOAP: Improve Coherence and Consistency in Persona-based Dialogue Generation via Over-sampling and Post-evaluation
- 51.Improved Instruction Ordering in Recipe-Grounded Conversation
- 52.Dialog-Post: Multi-Level Self-Supervised Objectives and Hierarchical Model for Dialogue Post-Training
- 53.Language Detoxification with Attribute-Discriminative Latent Space
- 54.A Cognitive Stimulation Dialogue System with Multi-source Knowledge Fusion for Elders with Cognitive Impairment
- 55.A Synthetic Data Generation Framework for Grounded Dialogues
- 56.Your spouse needs professional help: Determining the Contextual Appropriateness of Messages through Modeling Social Relationships
- 57.XDailyDialog: A Multilingual Parallel Dialogue Corpus
- 58.HAUSER: Towards Holistic and Automatic Evaluation of Simile Generation
- 59.Socratic Pretraining: Question-Driven Pretraining for Controllable Summarization
- 60.RADE: Reference-Assisted Dialogue Evaluation for Open-Domain Dialogue
- 61.Extrinsic Evaluation of Machine Translation Metrics
- 62.A Cross-Modality Context Fusion and Semantic Refinement Network for Emotion Recognition in Conversation
- 63.PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
- 64.Learning New Skills after Deployment: Improving open-domain internet-driven dialogue with human feedback
- 65.On the Compositional Generalization in Versatile Open-domain Dialogue
- 66.Dialogue Summarization with Static-Dynamic Structure Fusion Graph
- 67.Reference Matters: Benchmarking Factual Error Correction for Dialogue Summarization with Fine-grained Evaluation Framework
- 68.Multimodal Persona Based Generation of Comic Dialogs
- 69.Seen to Unseen: Exploring Compositional Generalization of Multi-Attribute Controllable Dialogue Generation
- 70.Towards Understanding Omission in Dialogue Summarization
- 71.Don't Forget Your ABC's: Evaluating the State-of-the-Art in Chat-Oriented Dialogue Systems
- 72.Bridging The Gap: Entailment Fused-T5 for Open-retrieval Conversational Machine Reading Comprehension
- 73.LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming
- 74.FactKG: Fact Verification via Reasoning on Knowledge Graphs
- 75.Covering Uncommon Ground: Gap-Focused Question Generation for Answer Assessment
- 76.With a Little Push, NLI Models can Robustly and Efficiently Predict Faithfulness
- 77.Controllable Mixed-Initiative Dialogue Generation through Prompting
- 78.Listener Model for the PhotoBook Referential Game with CLIPScores as Implicit Reference Chain
- 79.Towards Fewer Hallucinations in Knowledge-Grounded Dialogue Generation via Augmentative and Contrastive Knowledge-Dialogue
Paper List
1.One Cannot Stand for Everyone! Leveraging Multiple User Simulators to train Task-oriented Dialogue Systems
Yajiao Liu, Xin Jiang, Yichun Yin, Yasheng Wang, Fei Mi, Qun Liu, Xiang Wan, Benyou Wang
Download URL
https://aclanthology.org/2023.acl-long.1/
abstract
User simulators are agents designed to imitate human users; recent advances have found that Task-oriented Dialogue (ToD) systems optimized toward a user simulator could better satisfy the need of human users. However, this might result in a sub-optimal ToD system if it is tailored to only one ad hoc user simulator, since human users can behave differently. In this paper, we propose a framework called MUST to optimize ToD systems via leveraging Multiple User SimulaTors. The main challenges of implementing MUST fall in 1) how to adaptively determine which user simulator to interact with the ToD system at each optimization step, since the ToD system might be over-fitted to some specific user simulators, and simultaneously under-fitted to some others; 2) how to avoid catastrophic forgetting of the adaption for a simulator that is not selected for several consecutive optimization steps. To tackle these challenges, we formulate MUST as a Multi-armed bandits (MAB) problem and provide a method called MUSTadaptive that balances i) the boosting adaption for adaptive interactions between different user simulators and the ToD system and ii) the uniform adaption to avoid the catastrophic forgetting issue. With both automatic evaluations and human evaluations, our experimental results on MultiWOZ show that the dialogue system trained by MUST achieves a better performance than those trained by a single user simulator. It also has a better generalization ability when testing with unseen user simulators.
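To make the multi-armed bandit formulation concrete, below is a minimal Python sketch of a UCB-style scheduler that decides which user simulator the ToD system trains against, mixed with occasional uniform picks to limit forgetting. The reward definition, class interface, and mixing probability are illustrative assumptions, not the authors' released code.

```python
import random
import math

class SimulatorScheduler:
    """UCB-style selection over user simulators, mixed with uniform picks
    so that rarely-selected simulators are still revisited (to limit forgetting)."""

    def __init__(self, num_simulators, uniform_prob=0.2):
        self.counts = [0] * num_simulators
        self.rewards = [0.0] * num_simulators
        self.uniform_prob = uniform_prob

    def select(self):
        # Occasionally sample uniformly to keep adapting to every simulator.
        if random.random() < self.uniform_prob:
            return random.randrange(len(self.counts))
        total = sum(self.counts) + 1
        scores = []
        for c, r in zip(self.counts, self.rewards):
            mean = r / c if c else float("inf")      # unseen arms go first
            bonus = math.sqrt(2 * math.log(total) / c) if c else 0.0
            scores.append(mean + bonus)
        return max(range(len(scores)), key=scores.__getitem__)

    def update(self, arm, reward):
        # reward could be e.g. (1 - task success rate) against that simulator,
        # so harder simulators are selected more often; this is an assumption.
        self.counts[arm] += 1
        self.rewards[arm] += reward
```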
2.SafeConv: Explaining and Correcting Conversational Unsafe Behavior
Mian Zhang, Lifeng Jin, Linfeng Song, Haitao Mi, Wenliang Chen, Dong Yu
Download URL
https://aclanthology.org/2023.acl-long.2/
abstract
One of the main challenges open-domain end-to-end dialogue systems, or chatbots, face is the prevalence of unsafe behavior, such as toxic languages and harmful suggestions. However, existing dialogue datasets do not provide enough annotation to explain and correct such unsafe behavior. In this work, we construct a new dataset called SafeConv for the research of conversational safety: (1) Besides the utterance-level safety labels, SafeConv also provides unsafe spans in an utterance, information able to indicate which words contribute to the detected unsafe behavior; (2) SafeConv provides safe alternative responses to continue the conversation when unsafe behavior detected, guiding the conversation to a gentle trajectory. By virtue of the comprehensive annotation of SafeConv, we benchmark three powerful models for the mitigation of conversational unsafe behavior, including a checker to detect unsafe utterances, a tagger to extract unsafe spans, and a rewriter to convert an unsafe response to a safe version. Moreover, we explore the huge benefits brought by combining the models for explaining the emergence of unsafe behavior and detoxifying chatbots. Experiments show that the detected unsafe behavior could be well explained with unsafe spans and popular chatbots could be detoxified by a huge extent. The dataset is available at https://github.com/mianzhang/SafeConv.
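The checker/tagger/rewriter pipeline that the dataset is meant to support can be wired up roughly as follows. This is a hedged sketch with placeholder callables (checker, tagger, rewriter); it is not the benchmark models released with SafeConv.

```python
# Hedged sketch: the checker, tagger, and rewriter are placeholder callables
# (e.g. a classifier, a span tagger, and a seq2seq model), not released models.
def moderate_response(response, checker, tagger, rewriter, threshold=0.5):
    """Return a safe version of `response` plus the unsafe spans found in it."""
    if checker(response) < threshold:      # checker returns P(unsafe)
        return response, []                # already safe: keep as is
    spans = tagger(response)               # spans judged to be unsafe
    return rewriter(response, spans), spans
```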
3.Span-Selective Linear Attention Transformers for Effective and Robust Schema-Guided Dialogue State Tracking
Björn Bebensee, Haejun Lee
Download URL
https://aclanthology.org/2023.acl-long.6/
abstract
In schema-guided dialogue state tracking models estimate the current state of a conversation using natural language descriptions of the service schema for generalization to unseen services. Prior generative approaches which decode slot values sequentially do not generalize well to variations in schema, while discriminative approaches separately encode history and schema and fail to account for inter-slot and intent-slot dependencies. We introduce SPLAT, a novel architecture which achieves better generalization and efficiency than prior approaches by constraining outputs to a limited prediction space. At the same time, our model allows for rich attention among descriptions and history while keeping computation costs constrained by incorporating linear-time attention. We demonstrate the effectiveness of our model on the Schema-Guided Dialogue (SGD) and MultiWOZ datasets. Our approach significantly improves upon existing models achieving 85.3 JGA on the SGD dataset. Further, we show increased robustness on the SGD-X benchmark: our model outperforms the more than 30x larger D3ST-XXL model by 5.0 points.
4.EM Pre-training for Multi-party Dialogue Response Generation
Yiyang Li, Hai Zhao
Download URL
https://aclanthology.org/2023.acl-long.7/
abstract
Dialogue response generation requires an agent to generate a response according to the current dialogue history, in terms of which two-party dialogues have been well studied, but leaving a great gap for multi-party dialogues at the same time. Different from two-party dialogues where each response is a direct reply to its previous utterance, the addressee of a response utterance should be specified before it is generated in the multi-party scenario. Thanks to the huge amount of two-party conversational data, various pre-trained language models for two-party dialogue response generation have been proposed. However, due to the lack of annotated addressee labels in multi-party dialogue datasets, it is hard to use them to pre-train a response generation model for multi-party dialogues. To tackle this obstacle, we propose an Expectation-Maximization (EM) approach that iteratively performs the expectation steps to generate addressee labels, and the maximization steps to optimize a response generation model. Theoretical analyses and extensive experiments have justified the feasibility and effectiveness of our proposed method. The official implementation of this paper is available at https://github.com/EricLee8/MPDRG.
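A hedged sketch of the EM loop described in the abstract: the E-step infers addressee labels with the current model, the M-step fine-tunes on them. The model.score / model.fit interface and the dialogue dictionary keys are placeholders for illustration, not the authors' released interface.

```python
def em_pretrain(model, dialogues, num_rounds=3):
    """Placeholder EM loop: `model.score` and `model.fit` stand in for
    a likelihood scorer and a fine-tuning step of a response generator."""
    for _ in range(num_rounds):
        # E-step: infer the most likely addressee for each unlabeled response.
        pseudo_labeled = [
            (d, max(d["candidate_addressees"],
                    key=lambda a: model.score(d, addressee=a)))
            for d in dialogues
        ]
        # M-step: maximize response likelihood given the inferred addressees.
        model.fit(pseudo_labeled)
    return model
```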
5.Evaluating Open-Domain Dialogues in Latent Space with Next Sentence Prediction and Mutual Information
Kun Zhao, Bohao Yang, Chenghua Lin, Wenge Rong, Aline Villavicencio, Xiaohui Cui
Download URL
https://aclanthology.org/2023.acl-long.33/
abstract
The long-standing one-to-many issue of the open-domain dialogues poses significant challenges for automatic evaluation methods, i.e., there may be multiple suitable responses which differ in semantics for a given conversational context. To tackle this challenge, we propose a novel learning-based automatic evaluation metric (CMN), which can robustly evaluate open-domain dialogues by augmenting Conditional Variational Autoencoders (CVAEs) with a Next Sentence Prediction (NSP) objective and employing Mutual Information (MI) to model the semantic similarity of text in the latent space. Experimental results on two open-domain dialogue datasets demonstrate the superiority of our method compared with a wide range of baselines, especially in handling responses which are distant to the "golden" reference responses in semantics.
6.DialoGPS: Dialogue Path Sampling in Continuous Semantic Space for Data Augmentation in Multi-Turn Conversations
Ang Lv, Jinpeng Li, Yuhan Chen, Gao Xing, Ji Zhang, Rui Yan
Download URL
https://aclanthology.org/2023.acl-long.70/
abstract
In open-domain dialogue generation tasks, contexts and responses in most datasets are one-to-one mapped, violating an important many-to-many characteristic: a context leads to various responses, and a response answers multiple contexts. Without such patterns, models poorly generalize and prefer responding safely. Many attempts have been made in either multi-turn settings from a one-to-many perspective or in a many-to-many perspective but limited to single-turn settings. The major challenge to many-to-many augment multi-turn dialogues is that discretely replacing each turn with semantic similarity breaks fragile context coherence. In this paper, we propose DialoGue Path Sampling (DialoGPS) method in continuous semantic space, the first many-to-many augmentation method for multi-turn dialogues. Specifically, we map a dialogue to our extended Brownian Bridge, a special Gaussian process. We sample latent variables to form coherent dialogue paths in the continuous space. A dialogue path corresponds to a new multi-turn dialogue and is used as augmented training data. We show the effect of DialoGPS with both automatic and human evaluation.
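The core sampling idea, latent dialogue paths drawn from a Brownian bridge between two endpoints, can be illustrated with a short NumPy snippet. The variance schedule below is the standard bridge form; how utterances are encoded into and decoded from the latent space is assumed and not shown.

```python
import numpy as np

def brownian_bridge_path(z_start, z_end, num_steps, sigma=0.1, rng=None):
    """Sample `num_steps` latent points from z_start to z_end. Interior points
    are Gaussian around the linear interpolation, with bridge variance
    sigma^2 * t * (T - t) / T that vanishes at both endpoints."""
    rng = np.random.default_rng() if rng is None else rng
    z_start, z_end = np.asarray(z_start, float), np.asarray(z_end, float)
    T = num_steps - 1
    path = []
    for t in range(num_steps):
        alpha = t / T
        mean = (1 - alpha) * z_start + alpha * z_end
        var = sigma ** 2 * alpha * (1 - alpha) * T
        path.append(rng.normal(mean, np.sqrt(var)) if var > 0 else mean)
    return np.stack(path)

# Example: a 6-step latent path between two 128-dimensional turn embeddings.
# path = brownian_bridge_path(np.zeros(128), np.ones(128), num_steps=6)
```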
7.DIONYSUS: A Pre-trained Model for Low-Resource Dialogue Summarization
Yu Li, Baolin Peng, Pengcheng He, Michel Galley, Zhou Yu, Jianfeng Gao
Download URL
https://aclanthology.org/2023.acl-long.76/
abstract
Dialogue summarization has recently garnered significant attention due to its wide range of applications. However, existing methods for summarizing dialogues have limitations because they do not take into account the inherent structure of dialogue and rely heavily on labeled data, which can lead to poor performance in new domains. In this work, we propose DIONYSUS (dynamic input optimization in pre-training for dialogue summarization), a pre-trained encoder-decoder model for summarizing dialogues in any new domain. To pre-train DIONYSUS, we create two pseudo summaries for each dialogue example: one from a fine-tuned summarization model and the other from important dialogue turns. We then choose one of these pseudo summaries based on information distribution differences in different types of dialogues. This selected pseudo summary serves as the objective for pre-training DIONYSUS using a self-supervised approach on a large dialogue corpus. Our experiments show that DIONYSUS outperforms existing methods on six datasets, as demonstrated by its ROUGE scores in zero-shot and few-shot settings.
8.Facilitating Multi-turn Emotional Support Conversation with Positive Emotion Elicitation: A Reinforcement Learning Approach
Jinfeng Zhou, Zhuang Chen, Bo Wang, Minlie Huang
Download URL
https://aclanthology.org/2023.acl-long.96/
abstract
Emotional support conversation (ESC) aims to provide emotional support (ES) to improve one's mental state. Existing works stay at fitting grounded responses and responding strategies (e.g., question), which ignore the effect on ES and lack explicit goals to guide emotional positive transition. To this end, we introduce a new paradigm to formalize multi-turn ESC as a process of positive emotion elicitation. Addressing this task requires finely adjusting the elicitation intensity in ES as the conversation progresses while maintaining conversational goals like coherence. In this paper, we propose Supporter, a mixture-of-expert-based reinforcement learning model, and well design ES and dialogue coherence rewards to guide policy's learning for responding. Experiments verify the superiority of Supporter in achieving positive emotion elicitation during responding while maintaining conversational goals including coherence.
9.Query Enhanced Knowledge-Intensive Conversation via Unsupervised Joint Modeling
Mingzhu Cai, Siqi Bao, Xin Tian, Huang He, Fan Wang, Hua Wu
Download URL
https://aclanthology.org/2023.acl-long.97/
abstract
In this paper, we propose an unsupervised query enhanced approach for knowledge-intensive conversations, namely QKConv. There are three modules in QKConv: a query generator, an off-the-shelf knowledge selector, and a response generator. QKConv is optimized through joint training, which produces the response by exploring multiple candidate queries and leveraging corresponding selected knowledge. The joint training solely relies on the dialogue context and target response, getting exempt from extra query annotations or knowledge provenances. To evaluate the effectiveness of the proposed QKConv, we conduct experiments on three representative knowledge-intensive conversation datasets: conversational question-answering, task-oriented dialogue, and knowledge-grounded conversation. Experimental results reveal that QKConv performs better than all unsupervised methods across three datasets and achieves competitive performance compared to supervised methods.
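One plausible inference-time wiring of the three QKConv modules is sketched below: propose several candidate queries, fetch knowledge for each, and keep the best-scoring response. The three callables are placeholders, and the paper's unsupervised joint-training objective is not reproduced here.

```python
def qkconv_generate(context, query_generator, knowledge_selector,
                    response_generator, n_queries=4):
    """Placeholder pipeline: generate candidate queries, retrieve knowledge
    for each, and return the highest-scoring response."""
    best = None
    for query in query_generator(context, n_queries):
        knowledge = knowledge_selector(query)
        response, score = response_generator(context, knowledge)
        if best is None or score > best[1]:
            best = (response, score)
    return best[0]
```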
10.White-Box Multi-Objective Adversarial Attack on Dialogue Generation
Yufei Li, Zexin Li, Yingfan Gao, Cong Liu
Download URL
https://aclanthology.org/2023.acl-long.100/
abstract
Pre-trained transformers are popular in state-of-the-art dialogue generation (DG) systems. Such language models are, however, vulnerable to various adversarial samples as studied in traditional tasks such as text classification, which inspires our curiosity about their robustness in DG systems. One main challenge of attacking DG models is that perturbations on the current sentence can hardly degrade the response accuracy because the unchanged chat histories are also considered for decision-making. Instead of merely pursuing pitfalls of performance metrics such as BLEU, ROUGE, we observe that crafting adversarial samples to force longer generation outputs benefits attack effectiveness: the generated responses are typically irrelevant, lengthy, and repetitive. To this end, we propose a white-box multi-objective attack method called DGSlow. Specifically, DGSlow balances two objectives, generation accuracy and length, via a gradient-based multi-objective optimizer and applies an adaptive searching mechanism to iteratively craft adversarial samples with only a few modifications. Comprehensive experiments on four benchmark datasets demonstrate that DGSlow could significantly degrade state-of-the-art DG models with a higher success rate than traditional accuracy-based methods. Besides, our crafted sentences also exhibit strong transferability in attacking other models.
11.Divide, Conquer, and Combine: Mixture of Semantic-Independent Experts for Zero-Shot Dialogue State Tracking
Qingyue Wang, Liang Ding, Yanan Cao, Yibing Zhan, Zheng Lin, Shi Wang, Dacheng Tao, Li Guo
Download URL
https://aclanthology.org/2023.acl-long.114/
abstract
Zero-shot transfer learning for Dialogue State Tracking (DST) helps to handle a variety of task-oriented dialogue domains without the cost of collecting in-domain data. Existing works mainly study common data- or model-level augmentation methods to enhance the generalization but fail to effectively decouple semantics of samples, limiting the zero-shot performance of DST. In this paper, we present a simple and effective "divide, conquer and combine" solution, which explicitly disentangles the semantics of seen data, and leverages the performance and robustness with the mixture-of-experts mechanism. Specifically, we divide the seen data into semantically independent subsets and train corresponding experts, the newly unseen samples are mapped and inferred with mixture-of-experts with our designed ensemble inference. Extensive experiments on MultiWOZ2.1 upon T5-Adapter show our schema significantly and consistently improves the zero-shot performance, achieving the SOTA on settings without external knowledge, with only 10M trainable parameters.
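The mixture-of-experts ensemble step can be pictured as a routing-weighted combination of per-expert predictions, as in the small PyTorch sketch below. How the routing scores and expert logits are produced is assumed, not taken from the paper.

```python
import torch

def ensemble_experts(expert_logits, routing_scores):
    """expert_logits: (E, V) slot-value logits from E experts;
    routing_scores: (E,) similarity of the input to each expert's subset.
    Returns routing-weighted logits of shape (V,); details are assumptions."""
    weights = torch.softmax(routing_scores, dim=-1)            # (E,)
    return (weights.unsqueeze(-1) * expert_logits).sum(dim=0)  # (V,)
```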
12.BIG-C: a Multimodal Multi-Purpose Dataset for Bemba
Claytone Sikasote, Eunice Mukonde, Md Mahfuz Ibn Alam, Antonios Anastasopoulos
Download URL
https://aclanthology.org/2023.acl-long.115/
abstract
We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. While Bemba is the most populous language of Zambia, it exhibits a dearth of resources which render the development of language technologies or language processing research almost impossible. The dataset is comprised of multi-turn dialogues between Bemba speakers based on images, transcribed and translated into English. There are more than 92,000 utterances/sentences, amounting to more than 180 hours of audio data with corresponding transcriptions and English translations. We also provide baselines on speech recognition (ASR), machine translation (MT) and speech translation (ST) tasks, and sketch out other potential future multimodal uses of our dataset. We hope that by making the dataset available to the research community, this work will foster research and encourage collaboration across the language, speech, and vision communities especially for languages outside the "traditionally" used high-resourced ones. All data and code are publicly available: https://github.com/csikasote/bigc.
13.Schema-Guided User Satisfaction Modeling for Task-Oriented Dialogues
Yue Feng, Yunlong Jiao, Animesh Prasad, Nikolaos Aletras, Emine Yilmaz, Gabriella Kazai
Download URL
https://aclanthology.org/2023.acl-long.116/
abstract
User Satisfaction Modeling (USM) is one of the popular choices for task-oriented dialogue systems evaluation, where user satisfaction typically depends on whether the user's task goals were fulfilled by the system. Task-oriented dialogue systems use task schema, which is a set of task attributes, to encode the user's task goals. Existing studies on USM neglect explicitly modeling the user's task goals fulfillment using the task schema. In this paper, we propose SG-USM, a novel schema-guided user satisfaction modeling framework. It explicitly models the degree to which the user's preferences regarding the task attributes are fulfilled by the system for predicting the user's satisfaction level. SG-USM employs a pre-trained language model for encoding dialogue context and task attributes. Further, it employs a fulfillment representation layer for learning how many task attributes have been fulfilled in the dialogue, an importance predictor component for calculating the importance of task attributes. Finally, it predicts the user satisfaction based on task attribute fulfillment and task attribute importance. Experimental results on benchmark datasets (i.e. MWOZ, SGD, ReDial, and JDDC) show that SG-USM consistently outperforms competitive existing methods. Our extensive analysis demonstrates that SG-USM can improve the interpretability of user satisfaction modeling, has good scalability as it can effectively deal with unseen tasks and can also effectively work in low-resource settings by leveraging unlabeled data. Code is available at https://github.com/amzn/user-satisfaction-modeling.
14.MoralDial: A Framework to Train and Evaluate Moral Dialogue Systems via Moral Discussions
Hao Sun, Zhexin Zhang, Fei Mi, Yasheng Wang, Wei Liu, Jianwei Cui, Bin Wang, Qun Liu, Minlie Huang
Download URL
https://aclanthology.org/2023.acl-long.123/
abstract
Morality in dialogue systems has raised great attention in research recently. A moral dialogue system aligned with users' values could enhance conversation engagement and user connections. In this paper, we propose a framework, MoralDial to train and evaluate moral dialogue systems. In our framework, we first explore the communication mechanisms of morality and resolve expressed morality into three parts, which indicate the roadmap for building a moral dialogue system. Based on that, we design a simple yet effective method: constructing moral discussions between simulated specific users and the dialogue system. The constructed discussions consist of expressing, explaining, revising, and inferring moral views in dialogue exchanges, which makes conversational models learn morality well in a natural manner. Furthermore, we propose a novel evaluation method under the framework. We evaluate the multiple aspects of morality by judging the relation between dialogue responses and human values in discussions, where the multifaceted nature of morality is particularly considered. Automatic and manual experiments demonstrate that our framework is promising to train and evaluate moral dialogue systems.
15.Injecting knowledge into language generation: a case study in auto-charting after-visit care instructions from medical dialogue
Maksim Eremeev, Ilya Valmianski, Xavier Amatriain, Anitha Kannan
Download URL
https://aclanthology.org/2023.acl-long.133/
abstract
Factual correctness is often the limiting factor in practical applications of natural language generation in high-stakes domains such as healthcare. An essential requirement for maintaining factuality is the ability to deal with rare tokens. This paper focuses on rare tokens that appear in both the source and the reference sequences, and which, when missed during generation, decrease the factual correctness of the output text. For high-stake domains that are also knowledge-rich, we show how to use knowledge to (a) identify which rare tokens that appear in both source and reference are important and (b) uplift their conditional probability. We introduce the "utilization rate" that encodes knowledge and serves as a regularizer by maximizing the marginal probability of selected tokens. We present a study in a knowledge-rich domain of healthcare, where we tackle the problem of generating after-visit care instructions based on patient-doctor dialogues. We verify that, in our dataset, specific medical concepts with high utilization rates are underestimated by conventionally trained sequence-to-sequence models. We observe that correcting this with our approach to knowledge injection reduces the uncertainty of the model as well as improves factuality and coherence without negatively impacting fluency.
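A hedged PyTorch sketch of the general idea, uplifting the probability of knowledge-selected rare tokens wherever they occur as targets, is shown below. How the important token ids are chosen and the exact form of the paper's utilization-rate regularizer are not reproduced here.

```python
import torch
import torch.nn.functional as F

def utilization_regularizer(logits, targets, important_token_ids):
    """logits: (T, V); targets: (T,); important_token_ids: 1-D LongTensor.
    Returns a penalty that is small when the model assigns high probability
    to the knowledge-selected tokens at their target positions."""
    log_probs = F.log_softmax(logits, dim=-1)
    mask = torch.isin(targets, important_token_ids)   # positions of key tokens
    if not mask.any():
        return logits.new_zeros(())
    selected = log_probs[mask, targets[mask]]         # log p(key token)
    return -selected.mean()                           # minimizing uplifts them
```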
16.DiffusEmp: A Diffusion Model-Based Framework with Multi-Grained Control for Empathetic Response Generation
Guanqun Bi, Lei Shen, Yanan Cao, Meng Chen, Yuqiang Xie, Zheng Lin, Xiaodong He
Download URL
https://aclanthology.org/2023.acl-long.158/
abstract
Empathy is a crucial factor in open-domain conversations, which naturally shows one's caring and understanding to others. Though several methods have been proposed to generate empathetic responses, existing works often lead to monotonous empathy that refers to generic and safe expressions. In this paper, we propose to use explicit control to guide the empathy expression and design a framework DiffusEmp based on conditional diffusion language model to unify the utilization of dialogue context and attribute-oriented control signals. Specifically, communication mechanism, intent, and semantic frame are imported as multi-grained signals that control the empathy realization from coarse to fine levels. We then design a specific masking strategy to reflect the relationship between multi-grained signals and response tokens, and integrate it into the diffusion model to influence the generative process. Experimental results on a benchmark dataset EmpatheticDialogue show that our framework outperforms competitive baselines in terms of controllability, informativeness, and diversity without the loss of context-relatedness.
17.BREAK: Breaking the Dialogue State Tracking Barrier with Beam Search and Re-ranking
Seungpil Won, Heeyoung Kwak, Joongbo Shin, Janghoon Han, Kyomin Jung
Download URL
https://aclanthology.org/2023.acl-long.159/
abstract
Despite the recent advances in dialogue state tracking (DST), the joint goal accuracy (JGA) of the existing methods on MultiWOZ 2.1 still remains merely 60%. In our preliminary error analysis, we find that beam search produces a pool of candidates that is likely to include the correct dialogue state. Motivated by this observation, we introduce a novel framework, called BREAK (Beam search and RE-rAnKing), that achieves outstanding performance on DST. BREAK performs DST in two stages: (i) generating k-best dialogue state candidates with beam search and (ii) re-ranking the candidates to select the correct dialogue state. This simple yet powerful framework shows state-of-the-art performance on all versions of MultiWOZ and M2M datasets. Most notably, we push the joint goal accuracy to 80-90% on MultiWOZ 2.1-2.4, which is an improvement of 23.6%, 26.3%, 21.7%, and 10.8% over the previous best-performing models, respectively. The data and code will be available at https://github.com/tony-won/DST-BREAK
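The two-stage recipe, generate a k-best pool with beam search and then re-rank it, reduces to a few lines once the generator and re-ranker are given. Both callables in the sketch below are placeholders standing in for trained models, not the released BREAK code.

```python
def break_style_dst(history, generate_k_best, reranker_score, k=5):
    """Placeholder two-stage DST: `generate_k_best` would be beam search over
    a seq2seq DST model, `reranker_score` a model scoring (history, state)."""
    candidates = generate_k_best(history, k)      # stage 1: k-best beam pool
    return max(candidates,
               key=lambda state: reranker_score(history, state))  # stage 2
```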
18.Learning to Generate Equitable Text in Dialogue from Biased Training Data
Anthony Sicilia, Malihe Alikhani
Download URL
https://aclanthology.org/2023.acl-long.163/
abstract
The ingrained principles of fairness in a dialogue system's decision-making process and generated responses are crucial for user engagement, satisfaction, and task achievement. Absence of equitable and inclusive principles can hinder the formation of common ground, which in turn negatively impacts the overall performance of the system. For example, misusing pronouns in a user interaction may cause ambiguity about the intended subject. Yet, there is no comprehensive study of equitable text generation in dialogue. Aptly, in this work, we use theories of computational learning to study this problem. We provide formal definitions of equity in text generation, and further, prove formal connections between learning human-likeness and learning equity: algorithms for improving equity ultimately reduce to algorithms for improving human-likeness (on augmented data). With this insight, we also formulate reasonable conditions under which text generation algorithms can learn to generate equitable text without any modifications to the biased training data on which they learn. To exemplify our theory in practice, we look at a group of algorithms for the GuessWhat?! visual dialogue game and, using this example, test our theory empirically. Our theory accurately predicts relative-performance of multiple algorithms in generating equitable text as measured by both human and automated evaluation.
19.TREA: Tree-Structure Reasoning Schema for Conversational Recommendation
Wendi Li, Wei Wei, Xiaoye Qu, Xian-Ling Mao, Ye Yuan, Wenfeng Xie, Dangyang Chen
Download URL
https://aclanthology.org/2023.acl-long.167/
abstract
Conversational recommender systems (CRS) aim to timely trace the dynamic interests of users through dialogues and generate relevant responses for item recommendations. Recently, various external knowledge bases (especially knowledge graphs) are incorporated into CRS to enhance the understanding of conversation contexts. However, recent reasoning-based models heavily rely on simplified structures such as linear structures or fixed-hierarchical structures for causality reasoning, hence they cannot fully figure out sophisticated relationships among utterances with external knowledge. To address this, we propose a novel Tree structure Reasoning schEmA named TREA. TREA constructs a multi-hierarchical scalable tree as the reasoning structure to clarify the causal relationships between mentioned entities, and fully utilizes historical conversations to generate more reasonable and suitable responses for recommended results. Extensive experiments on two public CRS datasets have demonstrated the effectiveness of our approach.
20.CORE: Cooperative Training of Retriever-Reranker for Effective Dialogue Response Selection
Chongyang Tao, Jiazhan Feng, Tao Shen, Chang Liu, Juntao Li, Xiubo Geng, Daxin Jiang
Download URL
https://aclanthology.org/2023.acl-long.174/
abstract
Establishing retrieval-based dialogue systems that can select appropriate responses from the pre-built index has gained increasing attention. Recent common practice is to construct a two-stage pipeline with a fast retriever (e.g., bi-encoder) for first-stage recall followed by a smart response reranker (e.g., cross-encoder) for precise ranking. However, existing studies either optimize the retriever and reranker in independent ways, or distill the knowledge from a pre-trained reranker into the retriever in an asynchronous way, leading to sub-optimal performance of both modules. Thus, an open question remains about how to train them for a better combination of the best of both worlds. To this end, we present a cooperative training of the response retriever and the reranker whose parameters are dynamically optimized by the ground-truth labels as well as list-wise supervision signals from each other. As a result, the two modules can learn from each other and evolve together throughout the training. Experimental results on two benchmarks demonstrate the superiority of our method.
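One common way to let a retriever and reranker supervise each other list-wise is to pull their score distributions over the same candidate list together, for example with a symmetric KL as in the sketch below. This is a generic formulation under that assumption, not the paper's exact objective.

```python
import torch
import torch.nn.functional as F

def listwise_mutual_loss(retriever_scores, reranker_scores, temperature=1.0):
    """Both score tensors have shape (batch, num_candidates)."""
    p = F.log_softmax(retriever_scores / temperature, dim=-1)
    q = F.log_softmax(reranker_scores / temperature, dim=-1)
    kl_pq = F.kl_div(p, q, log_target=True, reduction="batchmean")  # KL(q || p)
    kl_qp = F.kl_div(q, p, log_target=True, reduction="batchmean")  # KL(p || q)
    return 0.5 * (kl_pq + kl_qp)
```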
21.PVGRU: Generating Diverse and Relevant Dialogue Responses via Pseudo-Variational Mechanism
Yongkang Liu, Shi Feng, Daling Wang, Yifei Zhang, Hinrich Schütze
Download URL
https://aclanthology.org/2023.acl-long.185/
abstract
We investigate response generation for multi-turn dialogue in generative chatbots. Existing generative models based on RNNs (Recurrent Neural Networks) usually employ the last hidden state to summarize the history, which makes models unable to capture the subtle variability observed in different dialogues and cannot distinguish the differences between dialogues that are similar in composition. In this paper, we propose Pseudo-Variational Gated Recurrent Unit (PVGRU). The key novelty of PVGRU is a recurrent summarizing variable that aggregates the accumulated distribution variations of subsequences. We train PVGRU without relying on posterior knowledge, thus avoiding the training-inference inconsistency problem. PVGRU can perceive subtle semantic variability through summarizing variables that are optimized by two objectives we employ for training: distribution consistency and reconstruction. In addition, we build a Pseudo-Variational Hierarchical Dialogue (PVHD) model based on PVGRU. Experimental results demonstrate that PVGRU can broadly improve the diversity and relevance of responses on two benchmark datasets.
22.MPCHAT: Towards Multimodal Persona-Grounded Conversation
Jaewoo Ahn, Yeda Song, Sangdoo Yun, Gunhee Kim
Download URL
https://aclanthology.org/2023.acl-long.189/
abstract
In order to build self-consistent personalized dialogue agents, previous research has mostly focused on textual persona that delivers personal facts or personalities. However, to fully describe the multi-faceted nature of persona, image modality can help better reveal the speaker's personal characteristics and experiences in episodic memory (Rubin et al., 2003; Conway, 2009). In this work, we extend persona-based dialogue to the multimodal domain and make two main contributions. First, we present the first multimodal persona-based dialogue dataset named MPCHAT, which extends persona with both text and images to contain episodic memories. Second, we empirically show that incorporating multimodal persona, as measured by three proposed multimodal persona-grounded dialogue tasks (i.e., next response prediction, grounding persona prediction, and speaker identification), leads to statistically significant performance improvements across all tasks. Thus, our work highlights that multimodal persona is crucial for improving multimodal dialogue comprehension, and our MPCHAT serves as a high-quality resource for this research.
23.Towards Boosting the Open-Domain Chatbot with Human Feedback
Hua Lu, Siqi Bao, Huang He, Fan Wang, Hua Wu, Haifeng Wang
Download URL
https://aclanthology.org/2023.acl-long.224/
abstract
Many open-domain dialogue models pre-trained with social media comments can generate coherent replies but have difficulties producing engaging responses. This phenomenon might mainly result from the deficiency of annotated human-human conversations and the misalignment with human preference. In this paper, we propose a novel and efficient framework Diamante to boost the open-domain chatbot, where two kinds of human feedback (including explicit demonstration and implicit preference) are collected and leveraged. By asking annotators to select or amend the model-generated candidate responses, Diamante efficiently collects the human demonstrated responses and constructs a Chinese chit-chat dataset. To enhance the alignment with human preference, Diamante leverages the implicit preference in the data collection process and introduces the generation-evaluation joint training. Comprehensive experiments indicate that the Diamante dataset and joint training paradigm can significantly boost the performance of pre-trained dialogue models. The overall engagingness of the previous state-of-the-art model has been improved remarkably by 50% in Chinese open-domain conversations.
24.Knowledge-enhanced Mixed-initiative Dialogue System for Emotional Support Conversations
Yang Deng, Wenxuan Zhang, Yifei Yuan, Wai Lam
Download URL
https://aclanthology.org/2023.acl-long.225/
abstract
Unlike empathetic dialogues, the system in emotional support conversations (ESC) is expected to not only convey empathy for comforting the help-seeker, but also proactively assist in exploring and addressing their problems during the conversation. In this work, we study the problem of mixed-initiative ESC where the user and system can both take the initiative in leading the conversation. Specifically, we conduct a novel analysis on mixed-initiative ESC systems with a tailor-designed schema that divides utterances into different types with speaker roles and initiative types. Four emotional support metrics are proposed to evaluate the mixed-initiative interactions. The analysis reveals the necessity and challenges of building mixed-initiative ESC systems. In the light of this, we propose a knowledge-enhanced mixed-initiative framework (KEMI) for ESC, which retrieves actual case knowledge from a large-scale mental health knowledge graph for generating mixed-initiative responses. Experimental results on two ESC datasets show the superiority of KEMI in both content-preserving evaluation and mixed initiative related analyses.
25.ACCENT: An Automatic Event Commonsense Evaluation Metric for Open-Domain Dialogue Systems
Sarik Ghazarian, Yijia Shao, Rujun Han, Aram Galstyan, Nanyun Peng
Download URL
https://aclanthology.org/2023.acl-long.241/
abstract
Commonsense reasoning is omnipresent in human communications and thus is an important feature for open-domain dialogue systems. However, evaluating commonsense in dialogue systems is still an open challenge. We take the first step by focusing on event commonsense that considers events and their relations, and is crucial in both dialogues and general commonsense reasoning. We propose ACCENT, an event commonsense evaluation metric empowered by commonsense knowledge bases (CSKBs). ACCENT first extracts event-relation tuples from a dialogue, and then evaluates the response by scoring the tuples in terms of their compatibility with the CSKB. To evaluate ACCENT, we construct the first public event commonsense evaluation dataset for open-domain dialogues. Our experiments show that ACCENT is an efficient metric for event commonsense evaluation, which achieves higher correlations with human judgments than existing baselines.
26.Towards Faithful Dialogues via Focus Learning
Yifan Deng, Xingsheng Zhang, Heyan Huang, Yue Hu
Download URL
https://aclanthology.org/2023.acl-long.250/
abstract
Maintaining faithfulness between responses and knowledge is an important research topic for building reliable knowledge-grounded dialogue systems. Existing models heavily rely on elaborate data engineering or increasing the model's parameters ignoring to track the tokens that significantly influence losses, which is decisive for the optimization direction of the model in each iteration. To address this issue, we propose Focus Learning (FocusL), a novel learning approach that adjusts the contribution of each token to the optimization direction by directly scaling the corresponding objective loss. Specifically, we first introduce a positioning method by utilizing similarity distributions between knowledge and each response token to locate knowledge-aware tokens. Then, we further design a similarity-to-weight transformation to provide dynamic token-level weights for the cross-entropy loss. Finally, we use the weighted loss to encourage the model to pay special attention to the knowledge utilization. Experimental results demonstrate that our method achieves the new state-of-the-art results and generates more reliable responses while maintaining training stability.
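The token-level reweighting idea can be illustrated with a short weighted cross-entropy in PyTorch: tokens judged more knowledge-related get larger loss weights. The similarity-to-weight mapping below is a simple linear choice for illustration, not the transformation designed in the paper.

```python
import torch
import torch.nn.functional as F

def focus_weighted_loss(logits, targets, knowledge_similarity):
    """logits: (T, V); targets: (T,); knowledge_similarity: (T,) in [0, 1]."""
    per_token = F.cross_entropy(logits, targets, reduction="none")  # (T,)
    weights = 1.0 + knowledge_similarity    # uplift knowledge-aware tokens (assumption)
    weights = weights / weights.mean()      # keep the overall loss scale stable
    return (weights * per_token).mean()
```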
27.Prompter: Zero-shot Adaptive Prefixes for Dialogue State Tracking Domain Adaptation
Ibrahim Taha Aksu, Min-Yen Kan, Nancy Chen
Download URL
https://aclanthology.org/2023.acl-long.252/
abstract
A challenge in the Dialogue State Tracking (DST) field is adapting models to new domains without using any supervised data (zero-shot domain adaptation). Parameter-Efficient Transfer Learning (PETL) has the potential to address this problem due to its robustness. However, it has yet to be applied to the zero-shot scenarios, as it is not clear how to apply it unsupervisedly. Our method, Prompter, uses descriptions of target domain slots to generate dynamic prefixes that are concatenated to the key and values at each layer's self-attention mechanism. This allows for the use of prefix-tuning in zero-shot. Prompter outperforms previous methods on both the MultiWOZ and SGD benchmarks. In generating prefixes, our analyses find that Prompter not only utilizes the semantics of slot descriptions but also how often the slots appear together in conversation. Moreover, Prompter's gains are due to its improved ability to distinguish "none"-valued dialogue slots, compared against baselines.
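Prepending generated prefixes to the keys and values of self-attention looks roughly like the single-head sketch below. This is a generic prefix-attention illustration; how Prompter generates the prefixes from slot descriptions is not shown.

```python
import torch
import torch.nn.functional as F

def attention_with_prefix(q, k, v, prefix_k, prefix_v):
    """q: (T, d); k, v: (T, d); prefix_k, prefix_v: (P, d). Returns (T, d)."""
    k_ext = torch.cat([prefix_k, k], dim=0)        # (P + T, d)
    v_ext = torch.cat([prefix_v, v], dim=0)
    scores = q @ k_ext.T / (q.shape[-1] ** 0.5)    # scaled dot-product
    return F.softmax(scores, dim=-1) @ v_ext
```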
28.Enhancing Dialogue Generation via Dynamic Graph Knowledge Aggregation
Chen Tang, Hongbo Zhang, Tyler Loakman, Chenghua Lin, Frank Guerin
Download URL
https://aclanthology.org/2023.acl-long.253/
abstract
Incorporating external graph knowledge into neural chatbot models has been proven effective for enhancing dialogue generation. However, in conventional graph neural networks (GNNs), message passing on a graph is independent from text, resulting in the graph representation hidden space differing from that of the text. This training regime of existing models therefore leads to a semantic gap between graph knowledge and text. In this study, we propose a novel framework for knowledge graph enhanced dialogue generation. We dynamically construct a multi-hop knowledge graph with pseudo nodes to involve the language model in feature aggregation within the graph at all steps. To avoid the semantic biases caused by learning on vanilla subgraphs, the proposed framework applies hierarchical graph attention to aggregate graph features on pseudo nodes and then attains a global feature. Therefore, the framework can better utilise the heterogeneous features from both the post and external graph knowledge. Extensive experiments demonstrate that our framework outperforms state-of-the-art (SOTA) baselines on dialogue generation. Further analysis also shows that our representation learning framework can fill the semantic gap by coagulating representations of both text and graph knowledge. Moreover, the language model also learns how to better select knowledge triples for a more informative response via exploiting subgraph patterns within our feature aggregation process. Our code and resources are available at https://github.com/tangg555/SaBART.
29.Privacy-Preserving Domain Adaptation of Semantic Parsers
Fatemehsadat Mireshghallah, Yu Su, Tatsunori Hashimoto, Jason Eisner, Richard Shin
Download URL
https://aclanthology.org/2023.acl-long.271/
abstract
Task-oriented dialogue systems often assist users with personal or confidential matters. For this reason, the developers of such a system are generally prohibited from observing actual usage. So how can they know where the system is failing and needs more training data or new functionality? In this work, we study ways in which realistic user utterances can be generated synthetically, to help increase the linguistic and functional coverage of the system, without compromising the privacy of actual users. To this end, we propose a two-stage Differentially Private (DP) generation method which first generates latent semantic parses, and then generates utterances based on the parses. Our proposed approach improves MAUVE by 2.5X and parse tree function-type overlap by 1.3X relative to current approaches for private synthetic data generation, improving both on fluency and semantic coverage. We further validate our approach on a realistic domain adaptation task of adding new functionality from private user data to a semantic parser, and show overall gains of 8.5% points on its accuracy with the new feature.
30.VSTAR: A Video-grounded Dialogue Dataset for Situated Semantic Understanding with Scene and Topic Transitions
Yuxuan Wang, Zilong Zheng, Xueliang Zhao, Jinpeng Li, Yueqian Wang, Dongyan Zhao
Download URL
https://aclanthology.org/2023.acl-long.276/
abstract
Video-grounded dialogue understanding is a challenging problem that requires machine to perceive, parse and reason over situated semantics extracted from weakly aligned video and dialogues. Most existing benchmarks treat both modalities the same as a frame-independent visual understanding task, while neglecting the intrinsic attributes in multimodal dialogues, such as scene and topic transitions. In this paper, we present Video-grounded Scene&Topic AwaRe dialogue (VSTAR) dataset, a large scale video-grounded dialogue understanding dataset based on 395 TV series. Based on VSTAR, we propose two benchmarks for video-grounded dialogue understanding: scene segmentation and topic segmentation, and one benchmark for video-grounded dialogue generation. Comprehensive experiments are performed on these benchmarks to demonstrate the importance of multimodal information and segments in video-grounded dialogue understanding and generation.
31.Enhancing Personalized Dialogue Generation with Contrastive Latent Variables: Combining Sparse and Dense Persona
Yihong Tang, Bo Wang, Miao Fang, Dongming Zhao, Kun Huang, Ruifang He, Yuexian Hou
Download URL
https://aclanthology.org/2023.acl-long.299/
abstract
The personalized dialogue explores the consistent relationship between dialogue generation and personality. Existing personalized dialogue agents model persona profiles from three resources: sparse or dense persona descri