Mixed-Session Conversation with Egocentric Memory

UNIST AI
Findings of EMNLP 2024

Abstract

Recently introduced dialogue systems have demonstrated high usability. However, they still fall short of reflecting real-world conversation scenarios: current systems cannot replicate dynamic, continuous, long-term interactions involving multiple partners. This shortfall arises because few efforts have accounted for both aspects of real-world dialogues together: deeply layered interactions over long-term dialogue, and widely expanded conversation networks involving multiple participants. To combine these aspects, we introduce Mixed-Session Conversation, a dialogue system designed to construct conversations with various partners in a multi-session dialogue setup. We propose a new dataset called MiSC to implement this system. Each dialogue episode in MiSC consists of six consecutive sessions, with four speakers (one main speaker and three partners) appearing in each episode. We also propose a new dialogue model with a novel memory management mechanism, called Egocentric Memory Enhanced Mixed-Session Conversation Agent (EMMA). EMMA collects and retains memories from the main speaker's perspective during conversations with partners, enabling seamless continuity in subsequent interactions. Extensive human evaluations validate that the dialogues in MiSC demonstrate a seamless conversational flow, even when conversation partners change in each session. Evaluations also show that EMMA trained with MiSC maintains high memorability without contradiction throughout the entire conversation.

Teaser
A sample episode from MiSC. At the end of each session, the main speaker collects memories about each speaker from the main speaker's perspective and uses them to continue the conversation in the following session. Symbols mark the memory referenced when generating an utterance, and connected memories share the same symbol.

Motivation

Open-domain conversations have generally evolved in two directions:
Width: Expanding the number of participants involved (e.g., multi-party conversations)
Depth: Sustaining conversation continuity over time (e.g., multi-session conversations)
Despite these advancements, little research has addressed both width and depth together. Mixed-Session Conversation fills this gap by combining both aspects. In a mixed-session conversation, the main speaker engages with different partners across multiple sessions. This contrasts with traditional multi-session conversations, where the speaker interacts with the same set of partners throughout. Because the conversation partner changes from session to session, mixed-session conversations produce a dynamic, evolving interaction that enriches both the width and the depth of the conversation.

MiSC

We propose MiSC, the first dataset to implement mixed-session conversations. Each episode in MiSC features one main speaker interacting with three different partners across six consecutive dialogue sessions. This allows for dynamic interactions between the main speaker and changing conversation partners, which distinguishes MiSC from traditional multi-session or multi-party datasets, where speakers engage with the same fixed partners throughout.
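As a rough illustration of the episode shape described above, the following sketch models one episode as a main speaker plus six sessions drawn from three partners. The class names, field names, and the one-partner-per-session layout are assumptions made for this example; they are not the released MiSC schema.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Session:
    partner: str  # assumption: each session pairs the main speaker with one partner
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, utterance)


@dataclass
class Episode:
    main_speaker: str
    sessions: List[Session]

    def partners(self) -> Set[str]:
        """Distinct partners appearing anywhere in the episode."""
        return {session.partner for session in self.sessions}

    def has_misc_shape(self) -> bool:
        """Check the shape stated in the text: six sessions, three partners."""
        return len(self.sessions) == 6 and len(self.partners()) == 3


# Hypothetical partner order; the text does not say how partners are scheduled.
partner_order = ["A", "B", "A", "C", "B", "C"]
episode = Episode("Main", [Session(partner) for partner in partner_order])
print(episode.has_misc_shape())
```

A check like `has_misc_shape` is only a sanity test on structure; it says nothing about the dialogue content itself.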

Egocentric Memory

In a mixed-session conversation, the main speaker must retain context across multiple sessions with different partners. To achieve seamless interactions in each session, the main speaker requires a separate memory mechanism capable of handling these complexities, something that has not existed before.
To address this, we propose a novel memory management mechanism called Egocentric Memory, which summarizes the memory elements of both the main speaker and their conversation partners from the main speaker's perspective during each session. However, simply summarizing these memories becomes insufficient as the sessions progress. We therefore introduce the concept of 'memory links', which keep updates to the memory content interconnected across related memories and maintain coherence throughout the conversation.
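One way to picture memory links is as edges between memory entries, so that a later update stays attached to the earlier memory it relates to. The sketch below is a minimal illustration under that assumption; `MemoryItem`, `EgocentricMemory`, and the integer-id linking scheme are hypothetical names and choices, not the authors' implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Set


@dataclass
class MemoryItem:
    about: str    # whom the memory concerns (the main speaker or a partner)
    text: str     # summary written from the main speaker's perspective
    session: int  # session in which the memory was collected
    links: Set[int] = field(default_factory=set)  # ids of related memories


class EgocentricMemory:
    """Hypothetical store: memories plus undirected links between them."""

    def __init__(self) -> None:
        self.items: Dict[int, MemoryItem] = {}
        self._next_id = 0

    def add(self, about: str, text: str, session: int,
            linked_to: Iterable[int] = ()) -> int:
        mem_id = self._next_id
        self._next_id += 1
        self.items[mem_id] = MemoryItem(about, text, session)
        for other in linked_to:
            # Store the link on both sides so it can be followed either way.
            self.items[mem_id].links.add(other)
            self.items[other].links.add(mem_id)
        return mem_id

    def connected(self, mem_id: int) -> List[int]:
        """Ids of all memories reachable from mem_id through links."""
        seen: Set[int] = set()
        stack = [mem_id]
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(self.items[current].links - seen)
        return sorted(seen)


memory = EgocentricMemory()
m0 = memory.add("Partner A", "Partner A is training for a marathon.", session=1)
m1 = memory.add("Partner B", "Partner B runs a bakery.", session=2)
m2 = memory.add("Partner A", "Partner A finished the marathon.", session=3,
                linked_to=[m0])
print(memory.connected(m2))  # the session-3 update and the session-1 memory it links to
```

Following links from the session-3 memory recovers the session-1 memory it updates, while the unrelated memory about Partner B stays separate.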

EMMA

We introduce EMMA, a new dialogue model trained with MiSC and built on egocentric memory, which allows it to expand the conversation network in both width and depth.
EMMA collects memories of each conversation partner from its own perspective in every session, ensuring seamless continuity in subsequent sessions. It is composed of two modules: (1) a dialogue module and (2) a retrieval module, utilizing a total of 1B parameters (780M for the dialogue module and 220M for the retrieval module).
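The two-module split can be pictured as a retrieve-then-generate loop: a retrieval step picks relevant memories, and a dialogue step conditions on them. The sketch below only illustrates that control flow under stated assumptions. The word-overlap scorer is a toy stand-in for a learned retrieval module, `generate` is a placeholder for a dialogue model, and the `[memory]`/`[context]` prompt format is invented for this example.

```python
from typing import Callable, List


def retrieve(query: str, memories: List[str], top_k: int = 1) -> List[str]:
    """Toy retriever: rank memories by word overlap with the query."""
    query_words = set(query.lower().split())

    def overlap(memory: str) -> int:
        return len(query_words & set(memory.lower().split()))

    # sorted() is stable, so memories with equal scores keep their original order.
    return sorted(memories, key=overlap, reverse=True)[:top_k]


def respond(context: str, memories: List[str],
            generate: Callable[[str], str], top_k: int = 1) -> str:
    """Select memories, then pass them with the context to a generator."""
    selected = retrieve(context, memories, top_k)
    lines = ["[memory] " + m for m in selected] + ["[context] " + context]
    return generate("\n".join(lines))


memories = [
    "Partner A adopted a puppy named Max",
    "Partner B moved to Seoul last year",
]
# Identity function as the placeholder generator, so the prompt itself is printed.
print(respond("How is your puppy doing", memories, generate=lambda prompt: prompt))
```

Passing the generator in as a function keeps the two stages separable, which mirrors the modular description above without claiming anything about how the actual modules are trained or connected.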

BibTeX

@article{jang2024mixed,
  title={Mixed-Session Conversation with Egocentric Memory},
  author={Jang, Jihyoung and Kim, Taeyoung and Kim, Hyounghun},
  journal={arXiv preprint arXiv:2410.02503},
  year={2024}
}