This repository contains the code, data, and results for the paper titled "GraphRAG on technical documents - impact of knowledge graph schema" by Henri Scaffidi, Prof. Melinda Hodkiewicz, Dr. Caitlin Woods, and Nicole Roocke (2025).
The project assesses how 1) a domain-specific knowledge graph schema and 2) the selection of local or global GraphRAG search options impact the quality of GraphRAG responses to questions about technical documents. We use Microsoft's GraphRAG framework, which is available under an MIT license, for all experiments.
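As an illustration of the two search options, the sketch below runs the same question through both local and global search via GraphRAG's command-line query module. The invocation (`python -m graphrag.query` with `--root` and `--method`) follows Microsoft's GraphRAG getting-started documentation for earlier releases; exact flags vary between GraphRAG versions, and the pipeline directory and question shown here are placeholders, not values from this repository.

```python
# Sketch: issue the same competency question via GraphRAG's local and global search.
# Assumptions: the CLI form follows Microsoft's GraphRAG getting-started guide
# (flag names differ between GraphRAG versions); the root directory and question
# below are placeholders rather than values from this repository.
import subprocess

ROOT = "src/pipeline_baseline"  # placeholder: a pipeline directory with a built index
QUESTION = "What are the key findings of the report?"  # placeholder question

for method in ("local", "global"):
    result = subprocess.run(
        ["python", "-m", "graphrag.query",
         "--root", ROOT, "--method", method, QUESTION],
        capture_output=True, text=True, check=True,
    )
    print(f"--- {method} search ---")
    print(result.stdout)
```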
The `src` directory of the repository contains the following:
- Python code used to run the GraphRAG pipelines (adapted from Microsoft's GraphRAG notebooks)
- Four sub-directories containing the settings and data for each of our four GraphRAG pipelines, which differ in the knowledge graph schema specified via `entity_types` in `settings.yaml` (see the sketch below)
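Each pipeline's `settings.yaml` lists the entity types that GraphRAG extracts when building its knowledge graph. The sketch below prints the schema configured for each pipeline so the four variants can be compared side by side; it assumes hypothetical sub-directory names and the `entity_extraction.entity_types` key used in Microsoft's default GraphRAG settings template, which may differ between GraphRAG versions.

```python
# Sketch: compare the knowledge graph schema configured for each pipeline.
# Assumptions: the four sub-directory names below are placeholders, and entity
# types live under entity_extraction -> entity_types in settings.yaml (as in
# Microsoft's default GraphRAG settings template; key paths can vary by version).
from pathlib import Path

import yaml  # pip install pyyaml

PIPELINE_DIRS = [
    "src/pipeline_baseline",   # placeholder names, not the actual
    "src/pipeline_schema_1",   # sub-directory names in this repository
    "src/pipeline_schema_2",
    "src/pipeline_schema_3",
]

for pipeline in PIPELINE_DIRS:
    settings_path = Path(pipeline) / "settings.yaml"
    with settings_path.open() as f:
        settings = yaml.safe_load(f)
    entity_types = settings.get("entity_extraction", {}).get("entity_types", [])
    print(f"{pipeline}: {entity_types}")
```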
The `data` directory of the repository contains the following:
- `mriwa_report_subset_txt`: The .txt versions of the seven MRIWA technical reports analysed in this project. All MRIWA reports are publicly accessible as PDFs through MRIWA's Project Portfolio. We used PyPDF2 to extract the PDF text to .txt files (see the sketch below).
- `mriwa_cqa`: The set of MRIWA-defined competency questions and answers used to evaluate the GraphRAG pipelines in this project.
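The snippet below is a minimal sketch of the PDF-to-text step, assuming PyPDF2 3.x and its `PdfReader` interface; the input and output paths are placeholders rather than the actual file layout of this repository.

```python
# Sketch: extract text from the MRIWA report PDFs into .txt files.
# Assumes PyPDF2 3.x (PdfReader API); the directories below are placeholders,
# not the actual file layout used in this repository.
from pathlib import Path

from PyPDF2 import PdfReader

PDF_DIR = Path("mriwa_report_subset_pdf")    # placeholder input directory
TXT_DIR = Path("data/mriwa_report_subset_txt")
TXT_DIR.mkdir(parents=True, exist_ok=True)

for pdf_path in PDF_DIR.glob("*.pdf"):
    reader = PdfReader(pdf_path)
    # Concatenate the extracted text of every page in the report.
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    (TXT_DIR / f"{pdf_path.stem}.txt").write_text(text, encoding="utf-8")
```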
The `results` directory of the repository contains the GraphRAG pipelines' responses, using both local and global search, to all MRIWA-defined competency questions.
The `supplementary_materials` directory of the repository contains the following:
- GraphRAG performance analysis marking scheme and results
- Cost analysis
- Entity tagging example using a domain-specific knowledge graph schema on MRIWA report text
- Cross-validation of our performance analysis results using RAGAS
- Distribution of page count across MRIWA technical reports
- MRIWA report sample selection process