
Lu Tan

Title: Integration of pre-trained large language model and knowledge graph embedding into topic modeling with latent Dirichlet allocation
Date: Tuesday, August 13th, 2024
Time: 10:30am
Location: LIB 7200
Supervised by: Dr. Liangliang Wang

Abstract: Probabilistic topic models are widely used to discover latent content-related clusters, called topics, in discrete data such as text corpora, and many extensions of them have been developed. One notable research direction is knowledge-based topic models, which incorporate knowledge graph embedding (KGE) to enhance topic coherence. By projecting a knowledge graph into a low-dimensional continuous vector space, we obtain embeddings that capture the relationships between entities and the structure of the graph, thereby providing additional information for generating high-quality topics. Inspired by the advanced capabilities of pre-trained large language models (LLMs) in natural language processing tasks, we propose integrating a pre-trained LLM and KGE into topic modeling with latent Dirichlet allocation (LDA). More specifically, conventional knowledge graph embedding methods are enhanced by utilizing the knowledge representations of entities and relations from a pre-trained LLM. This Pretrain-KGE is then combined with LDA in a joint statistical model to improve the interpretability of the generated topics. Experimental results indicate that the proposed method outperforms the baseline model across different settings.
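For readers unfamiliar with KGE, the idea of scoring how well embeddings capture a relation can be illustrated with TransE, a standard translational KGE method (chosen here only as an example; the abstract does not specify which KGE model is used). TransE represents a true triple (head, relation, tail) by requiring head + relation ≈ tail in the embedding space. The toy embeddings below are hand-crafted for illustration, not learned:

```python
import math

def transe_score(head, relation, tail):
    # TransE scores a triple (head, relation, tail) by how closely
    # head + relation approximates tail; higher (closer to 0) is better.
    diff = [h + r - t for h, r, t in zip(head, relation, tail)]
    return -math.sqrt(sum(d * d for d in diff))

# Hand-crafted 2-d toy embeddings in which (paris, capital_of, france) holds.
paris = [1.0, 0.0]
capital_of = [0.0, 1.0]
france = [1.0, 1.0]
germany = [3.0, 2.0]

true_score = transe_score(paris, capital_of, france)    # 0.0: paris + capital_of = france
false_score = transe_score(paris, capital_of, germany)  # negative: triple does not hold
```

In a learned KGE, such scores are what let the embeddings encode graph structure; the thesis builds on this by initializing or enriching the entity and relation representations with a pre-trained LLM before combining them with LDA.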