GDL4LLM Logo

GDL4LLM: Each graph is a new language: Graph Learning with LLMs

Huachi Zhou, Jiahe Du, Chuang Zhou, Chang Yang, Yilin Xiao, Yuxuan Xie, Xiao Huang

The Hong Kong Polytechnic University

🧷 Code 📜 Paper 🚩 Data

Sketched Overview

GDL4LLM introduces a novel approach to text-attributed graph learning with LLMs by treating graphs as a language rather than using natural language descriptions. Recognizing that natural language is too verbose and unstructured for modeling complex graph relationships, GDL4LLM translates graphs into a concise corpus on which LLMs can be pre-trained. This enables efficient representation of subgraphs with minimal tokens during fine-tuning. Experiments on real-world datasets show GDL4LLM outperforms description-based and embedding-based approaches by effectively modeling multi-hop neighborhoods.
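The corpus construction itself is not spelled out on this page. As a rough illustration, the snippet below sketches one plausible way to turn a graph into graph sentences, assuming each sentence is a fixed-length random walk and each node ID is a single token; the function name `build_graph_corpus` and its parameters are illustrative, not the released code's API.

```python
import random
from collections import defaultdict

def build_graph_corpus(edges, walk_length=5, walks_per_node=10, seed=0):
    """Illustrative sketch: serialize a graph into 'graph sentences' by
    sampling fixed-length random walks, one token per node ID."""
    rng = random.Random(seed)
    adj = defaultdict(list)
    for u, v in edges:            # build an undirected adjacency list
        adj[u].append(v)
        adj[v].append(u)

    corpus = []
    for node in adj:
        for _ in range(walks_per_node):
            walk = [node]
            while len(walk) < walk_length and adj[walk[-1]]:
                walk.append(rng.choice(adj[walk[-1]]))
            # each node ID becomes one token of the graph language
            corpus.append(" ".join(f"node_{n}" for n in walk))
    return corpus

# toy example
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
for sentence in build_graph_corpus(edges, walk_length=5, walks_per_node=2)[:4]:
    print(sentence)
```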

Contributions:

📌 We convert the problem of modeling graph structures for LLMs into a graph language learning problem. We justify this approach by proving that the graph language learning objective enables LLMs to learn graph structural information.

🔧 We introduce GDL4LLM, a simple yet effective framework. It generates a graph language corpus from the given graph and pre-trains LLMs on this corpus to understand the graph. The framework then samples from the graph language corpus to represent subgraphs centered around target nodes for fine-tuning on downstream tasks (see the sketch after this list).

📊 Through extensive experiments on three real-world datasets, we demonstrate that GDL4LLM outperforms competitive baselines. It surpasses both description-based and textual attribute embedding-based approaches by efficiently modeling different orders of neighbors with LLMs.
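As a hedged sketch of the pre-training stage described above, the snippet below registers node IDs as new vocabulary entries and trains a causal LLM on the graph-language corpus with the ordinary next-token objective. It assumes a Hugging Face-style model and reuses the toy corpus from the previous sketch; the helper name, model choice, and hyperparameters are placeholders rather than the paper's actual setup.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def pretrain_on_graph_language(corpus, model_name="gpt2", lr=1e-5, epochs=1):
    """Hypothetical helper: pre-train a causal LLM on graph sentences with the
    standard next-token objective. The paper fine-tunes Llama-3-8B; "gpt2" is
    used here only to keep the sketch cheap to run."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    tokenizer.pad_token = tokenizer.eos_token

    # Register one new vocabulary entry per node, so that each node in a
    # graph sentence costs exactly one token.
    node_tokens = sorted({tok for sent in corpus for tok in sent.split()})
    tokenizer.add_tokens(node_tokens)

    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.resize_token_embeddings(len(tokenizer))

    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for sentence in corpus:
            batch = tokenizer(sentence, return_tensors="pt")
            # Causal-LM loss: predict the next node token given the walk so far.
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model, tokenizer
```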

Node classification performance comparison among baselines w.r.t. micro classification accuracy across three datasets.
NLP Models        GNNs        ACM           Wiki          Amazon
                              Val.   Test   Val.   Test   Val.   Test

Fine-tuned LMs + GNNs
Bert              -           74.4   73.2   69.5   68.8   86.2   87.0
Bert              GCN         77.6   77.1   69.4   68.4   92.3   92.8
Bert              GAT         77.9   78.0   70.5   69.8   92.5   92.4
Bert              GraphSAGE   77.3   76.8   73.1   72.7   92.0   92.3
Roberta           -           78.1   76.6   67.8   68.1   84.9   85.9
Roberta           GCN         80.1   79.4   68.5   68.0   92.3   92.5
Roberta           GAT         79.7   78.9   70.1   71.0   92.5   92.4
Roberta           GraphSAGE   78.5   78.3   72.7   72.1   92.2   92.1
GraphSAGE                     80.9   79.5   73.2   70.4   94.3   94.1

Specialized Frameworks for Text-Attributed Graphs
MPAD                          80.1   78.9   68.8   68.0   93.1   92.8
GLEM                          81.4   79.8   72.6   71.2   92.5   93.3
GraphFormers                  75.3   75.1   66.8   67.5   85.6   86.4
LLAGA                         77.2   77.5   71.7   72.0   90.1   90.8
InstructGLM                   75.4   74.5   72.2   70.6   94.3   94.2
GDL4LLM                       81.9   81.4   74.3   73.2   94.6   94.6

Fine-tuned Large Language Models +/- GNNs
GraphAdapter      -           80.8   80.4   71.9   71.7   94.1   93.4
Llama3-8b         -           80.7   80.6   71.9   71.2   92.0   91.6
Llama3-8b         GraphSAGE   82.0   81.3   72.8   73.0   93.1   92.8
GDL4LLM w/ attr               83.9   82.8   74.0   73.4   95.8   95.5

Overall Framework and Pipeline

Framework Diagram

The figure compares mainstream methods with GDL4LLM on the node classification task. Figure (a) uses LLMs to embed node attributes and a GNN to aggregate the embeddings. Figure (b) presents natural-language descriptions of the graph structure centered around target nodes. Figure (c) illustrates how LLMs are pre-trained to capture graph structures through graph language learning, and how textual attributes are further integrated to enhance LLM fine-tuning.
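The exact fine-tuning input format is only shown schematically in the figure. Below is a speculative sketch of how the k sampled graph sentences and the target node's textual attributes might be assembled into a single classification prompt; the function and field names are invented for illustration.

```python
def build_finetune_prompt(target_node, graph_sentences, node_text, labels):
    """Hypothetical helper: combine k sampled graph sentences (structure) with
    the target node's textual attributes (semantics) into one prompt."""
    structure = "\n".join(graph_sentences)  # e.g. "node_0 node_1 node_3 ..."
    return (
        f"Graph context:\n{structure}\n\n"
        f"Text of node_{target_node}: {node_text}\n\n"
        f"Question: which category does node_{target_node} belong to? "
        f"Options: {', '.join(labels)}\nAnswer:"
    )

prompt = build_finetune_prompt(
    target_node=0,
    graph_sentences=[
        "node_0 node_1 node_3 node_2 node_0",
        "node_0 node_3 node_1 node_2 node_3",
    ],
    node_text="A paper on graph neural networks for recommendation.",
    labels=["Databases", "Data Mining", "Computer Vision"],
)
print(prompt)
```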

Observations and Findings

Performance Comparison of GDL4LLM Models

(i) Pre-training significantly enhances GDL4LLM performance, especially when combined with textual attributes in prompts, creating a synergistic effect where structural understanding complements semantic comprehension.

(ii) GDL4LLM performs better with Llama-3 than Llama-2 due to architectural improvements and better training data, with ablation studies confirming the effectiveness of the pre-training objective across different LLM architectures.

Hyperparameter Analysis


We examine two critical hyperparameters: the length of sampled graph sentences l and the number of sampled sentences k per target node. The figure shows optimal performance at l=5 and k=10, with only marginal gains as the values approach this optimum.

These results demonstrate our framework's effectiveness in modeling high-order structural information, such as inter-order dependencies. For instance, a graph sentence length of 5 captures fourth-order structural information, whereas GNNs, which often converge at around two layers, typically capture only second-order information.
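Under the same one-token-per-node assumption as the earlier sketches, the following back-of-the-envelope snippet makes the trade-off concrete: l=5 and k=10 expose up to fourth-order structure while costing only on the order of fifty node tokens per target node.

```python
def max_order(l: int) -> int:
    """A graph sentence with l node tokens spans at most l - 1 hops."""
    return l - 1

def subgraph_token_budget(l: int, k: int) -> int:
    """Approximate node-token cost of describing one target node with
    k sampled graph sentences of length l (one token per node)."""
    return l * k

print(max_order(5))                  # 4  -> fourth-order structural information
print(subgraph_token_budget(5, 10))  # 50 -> roughly 50 node tokens per target node
```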

Contact

For inquiries or contributions, please contact us at huachi.zhou@connect.polyu.hk.