First Look at GraphRAG
GraphRAG
Links
- https://github.com/microsoft/graphrag
- https://microsoft.github.io/graphrag/
Installation
# Install from source; alternatively, install the Python package directly:
# pip install graphrag
conda create --name graphrag python=3.12
conda activate graphrag
git clone git@github.com:microsoft/graphrag.git
cd graphrag
# Install dependencies
poetry install
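Running
Indexing consumes a lot of tokens, so running it against a paid API can get expensive. If, like me, you are only experimenting, a small local model is enough, although the results will be somewhat worse.
Initialization
# I have already run the official demo once, so I use a different directory here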
mkdir -p ./ragdemo/input
curl https://www.gutenberg.org/cache/epub/24022/pg24022.txt -o ./ragdemo/input/book.txt
graphrag init --root ./ragdemo
# The commands above generate the prompt files and the config files
# The resulting directory layout:
(graphrag) ~/work/projects/githubProjects/graphrag/ragdemo git:[main]
tree -a
.
├── .env
├── input
│   └── book.txt
├── prompts
│   ├── basic_search_system_prompt.txt
│   ├── community_report_graph.txt
│   ├── community_report_text.txt
│   ├── drift_reduce_prompt.txt
│   ├── drift_search_system_prompt.txt
│   ├── extract_claims.txt
│   ├── extract_graph.txt
│   ├── global_search_knowledge_system_prompt.txt
│   ├── global_search_map_system_prompt.txt
│   ├── global_search_reduce_system_prompt.txt
│   ├── local_search_system_prompt.txt
│   ├── question_gen_system_prompt.txt
│   └── summarize_descriptions.txt
└── settings.yaml
2 directories, 16 files
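Configuration
Next, edit the config file. I will just paste my own configuration; the main change is pointing the models at my own endpoints.
An earlier run failed for me with an error along the lines of an embedding not being found. I forget the details and did not record the message at the time.
Adding
encoding_model: cl100k_base
fixed it.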
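Because the configuration mostly consists of pointing GraphRAG at OpenAI-compatible servers, it can save time to confirm that the embedding endpoint answers before starting a long indexing run. The sketch below is my own helper, not part of GraphRAG. It assumes the server implements the standard `POST {api_base}/embeddings` route with a bearer token, and any URL, key, or model name passed to it is a placeholder for your own values.

```python
import json
import urllib.request


def build_embedding_request(
    api_base: str, api_key: str, model: str, text: str
) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style embeddings request."""
    payload = json.dumps({"model": model, "input": text}).encode("utf-8")
    return urllib.request.Request(
        url=api_base.rstrip("/") + "/embeddings",
        data=payload,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


def embedding_dimension(request: urllib.request.Request, timeout: float = 30.0) -> int:
    """Send the request and return the length of the first embedding vector."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return len(body["data"][0]["embedding"])
```

With a server running, `embedding_dimension(build_embedding_request(api_base, api_key, model, "hello"))` should return the vector size. A connection error or an HTTP error at this point is much cheaper to debug than one raised in the middle of indexing.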
settings.yaml
models:
  default_chat_model:
    type: openai_chat
    api_base: http://cc:9004/v1 # replace with your own endpoint
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY} # read from the .env file
    model: deepSeek-r1-distill-qwen-7b # replace with your own model
    encoding_model: cl100k_base
    model_supports_json: true # enables JSON output; the model must support it
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1
    tokens_per_minute: 0
    requests_per_minute: 0
  default_embedding_model:
    type: openai_embedding
    api_base: http://localhost:8000/v1 # replace with your own endpoint
    auth_type: api_key
    api_key: ${GRAPHRAG_API_KEY}
    model: bge-large-zh # replace with your own model
    encoding_model: cl100k_base
    model_supports_json: true
    concurrent_requests: 25
    async_mode: threaded
    retry_strategy: native
    max_retries: -1 # set to -1 for dynamic retry logic (most optimal setting based on server response)
    tokens_per_minute: 0 # set to 0 to disable rate limiting
    requests_per_minute: 0 # set to 0 to disable rate limiting
# Mostly unchanged below
vector_store:
  default_vector_store:
    type: lancedb
    db_uri: output/lancedb
    container_name: default
    overwrite: True
embed_text:
  model_id: default_embedding_model
  vector_store_id: default_vector_store
  batch_max_tokens: 256
### Input settings ###
input:
  type: file # or blob
  file_type: text # or csv
  base_dir: "input"
  file_encoding: utf-8
  file_pattern: ".*\\.txt$$"
chunks:
  size: 1200
  overlap: 100
  group_by_columns: [id]
### Output settings ###
## If blob storage is specified in the following four sections,
## connection_string and container_name must be provided
cache:
  type: file # [file, blob, cosmosdb]
  base_dir: "cache"
reporting:
  type: file # [file, blob, cosmosdb]
  base_dir: "logs"
output:
  type: file # [file, blob, cosmosdb]
  base_dir: "output"
### Workflow settings ###
extract_graph:
  model_id: default_chat_model
  prompt: "prompts/extract_graph.txt"
  entity_types: [organization,person,geo,event]
  max_gleanings: 1
summarize_descriptions:
  model_id: default_chat_model
  prompt: "prompts/summarize_descriptions.txt"
  max_length: 500
extract_graph_nlp:
  text_analyzer:
    extractor_type: regex_english # [regex_english, syntactic_parser, cfg]
extract_claims:
  enabled: false
  model_id: default_chat_model
  prompt: "prompts/extract_claims.txt"
  description: "Any claims or facts that could be relevant to information discovery."
  max_gleanings: 1
community_reports:
  model_id: default_chat_model
  graph_prompt: "prompts/community_report_graph.txt"
  text_prompt: "prompts/community_report_text.txt"
  max_length: 2000
  max_input_length: 8000
cluster_graph:
  max_cluster_size: 10
embed_graph:
  enabled: false # if true, will generate node2vec embeddings for nodes
umap:
  enabled: false # if true, will generate UMAP embeddings for nodes (embed_graph must also be enabled)
snapshots:
  graphml: false
  embeddings: false
### Query settings ###
## The prompt locations are required here, but each search method has a number of optional knobs that can be tuned.
## See the config docs: https://microsoft.github.io/graphrag/config/yaml/#query
local_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/local_search_system_prompt.txt"
global_search:
  chat_model_id: default_chat_model
  map_prompt: "prompts/global_search_map_system_prompt.txt"
  reduce_prompt: "prompts/global_search_reduce_system_prompt.txt"
  knowledge_prompt: "prompts/global_search_knowledge_system_prompt.txt"
drift_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/drift_search_system_prompt.txt"
  reduce_prompt: "prompts/drift_search_reduce_prompt.txt"
basic_search:
  chat_model_id: default_chat_model
  embedding_model_id: default_embedding_model
  prompt: "prompts/basic_search_system_prompt.txt"
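The settings file refers to many prompt files by relative path, so a quick consistency check against the directory on disk can catch a mismatch early. The sketch below is my own helper, not a GraphRAG feature. It scans the file as plain text with a regular expression instead of parsing the YAML, which is a simplification: commented-out entries would be matched as well.

```python
import re
from pathlib import Path

# Matches quoted values such as "prompts/extract_graph.txt" in settings.yaml.
PROMPT_PATH = re.compile(r'"(prompts/[^"]+\.txt)"')


def missing_prompt_files(root: Path) -> list[str]:
    """Return prompt paths referenced in settings.yaml that do not exist under root."""
    settings_text = (root / "settings.yaml").read_text(encoding="utf-8")
    referenced = sorted(set(PROMPT_PATH.findall(settings_text)))
    return [rel for rel in referenced if not (root / rel).is_file()]
```

Calling `missing_prompt_files(Path("./ragdemo"))` returns the list of unresolved paths. For the files shown in this post it would flag one entry: `drift_search.reduce_prompt` points at `prompts/drift_search_reduce_prompt.txt`, while the directory tree lists `drift_reduce_prompt.txt`. I have not checked how GraphRAG behaves when that file is missing; it presumably only matters if DRIFT search is used.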
.env
GRAPHRAG_API_KEY=qingchen
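The `${GRAPHRAG_API_KEY}` placeholder in settings.yaml is filled from this `.env` file. The sketch below only illustrates that kind of substitution with Python's standard `string.Template`; it is not GraphRAG's actual loader, and `parse_dotenv` and `substitute` are names I made up for the example.

```python
from string import Template


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def substitute(config_line: str, env: dict[str, str]) -> str:
    """Replace ${NAME} placeholders; unknown names are left untouched."""
    return Template(config_line).safe_substitute(env)


env = parse_dotenv("GRAPHRAG_API_KEY=qingchen\n")
print(substitute("api_key: ${GRAPHRAG_API_KEY}", env))  # prints: api_key: qingchen
```

One side effect of this syntax is that `$$` is the escape for a literal `$` in `string.Template`, which is presumably why the `file_pattern` value in settings.yaml ends in `$$`.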
Indexing
# Command
graphrag index --root ./ragdemo
# Log output. I used an R1 reasoning model, so the run took close to 10 minutes
Logging enabled at /home/li/work/projects/githubProjects/graphrag/ragdemo/logs/indexing-engine.log
🚀 LLM Config Params Validated
🚀 Embedding LLM Config Params Validated
Running standard indexing.
🚀 create_base_text_units
id text document_ids n_tokens
0 86f2dd5aa96fe727ccca072d573c1a41243bf8cab458ff... The Project Gutenberg eBook of A Christmas Ca... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
1 0827478fb3afa0e1da102b716b2d47c62f3c15c86c545c... and thither in\n restless haste and moanin... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
2 f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698f... -fisted hand at the grindstone, Scrooge! a\nsq... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
3 ea40fa993bc68de06cf942686eddf15b52f9ac78e098be... 'Bah!' again; and followed it up with 'Humbug!... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
4 caac5bc87dcc730bdcec5b004f703a0dfe16e4b607da84... have no doubt his liberality is well represen... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
5 9f07c229334decfb6501f878b2cd1cc40db3b4f12a320a... ,\ngnawed and mumbled by the hungry cold as bo... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
6 959d2ece7722c77132385313994d2286f0c67771b5e358... would\nbe untrue. But he put his hand upon th... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
7 0022307a5a33bdaff1b8f7253abab9b629a9ddc3d8d91c... then\nremembered to have heard that ghosts in... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
8 e8a75d7e6b6d2775cf2a15fdb85fcf157b91e7d24a0504... ,\nto save himself from falling in a swoon. Bu... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
9 7c08f37daa6a7ca2d6fb45b017bf020a30dfc48e18d638... with my eyes turned down,\nand never raise th... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
10 7106cedfc5aa145d96c1702893d6eea308e9716ca96d35... ITS\n\n\nWhen Scrooge awoke it was so dark, th... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
11 13ef805b5cf79d075a8f4a8e32d8725e4182cfad830486... of light, by which all this was visible; and\... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
12 8f550d4a33ae24fd7714d2cbf916191a2f38d0858e9080... that the crisp air laughed to hear it.\n\n'Th... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
13 3bfe0728d0f3a1e8849ab6f674ee5a306fc56b0c62e5a7... was a boy singing a Christmas\ncarol at my do... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
14 0e0a282d7502845686a91ea6a1a291993603636a845022... in. At sight of an old gentleman in a Welsh w... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
15 3d84489c0be1064ccb4c6df8e3575d05744a0d22290a7c... you or I could have told\nit him!) struck up ... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
16 ab366da07d9f56b5f77c934da766c61839b5654cfba094... nothing on which it is so hard as poverty; and... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
17 8c6dec0c51e945fcceae8251235aa9d768c8d9c41c7f06... liked, I own, to have touched her lips; to ha... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
18 7a9311b7e329ae1fac52e6ac695fc4ed9d48febf2bcff4... that he turned\nuncomfortably cold when he be... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
19 a227efd3cc209ea9cfb69a943b753b820ecd2eb51347a9... in it, and the ancient sheath was\neaten up w... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
20 b759ca5379d5a531ef50ef76d61698acaee3036cf6cac1... scales descending on the counter made a merry... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
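21 bfcc6aef4e7efdd5c32b0157c5691d3d2274b6ebda3d23... and on the threshold of the door the Spirit s... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
22 0c9cb061f158d3dd856ae99580a5f242267108a7666da3... \n\nSuch a bustle ensued that you might ... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200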
23 47cbca4e8ee4879e2a76c00f5b0a9bf574a6cb38835471... tell\nme if Tiny Tim will live.'\n\n'I see a v... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
24 32031f6041976bcd6a4755f14c8cf6f5504de9b8ae1190... the blaze showed preparations for a\ncosy din... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
25 f2cfcaa0558776e37c69b7ad7d699be6a339877d52fe66... had a Christmas thought, or spoke below his br... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
26 a4ead601c850e33f98e9d8557c1f202e171b35a6341618... such a ridiculous fellow!'\n\nScrooge's nephe... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
27 a1a89e8413e412e2f2cfd4593f9a6ea8da1c518e7d1210... they were sharp girls too, as Topper could ha... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
28 42edb1421d271236b16218a24c4816128915b258ac6b0d... \nintently at the Spirit's robe, 'but I see so... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
29 65fdbee3de754a7c64bcfc64e8277e03359ac720c8dbac... \nprecious time to me, I know. Lead on, Spirit... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
30 dc85e50cb0aace572490a39a839e1e9aa2230ba016b1ec... saw his new-born resolutions carried out\nin ... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
31 b4eb7ad3a32605c8d8ab7974be2cb0cd8e03b2fe3b1bec... ,' replied the woman: 'and it\nshould have bee... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
32 457a565186babdf687cd01dce30b827a6a992931bbe03f... announced itself in awful language.\n\nThe roo... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
33 92dc0829fc2a9d47ac9811e67f75bb93d92da2370a514e... the money;\nand even though we were not, it w... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
34 04c8bb3d14e0ee6d2bf625cd54f57b074b81d107c5a96a... 't know.'\n\n'Knew what, my dear?'\n\n'Why, th... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
35 5bd541b658bb07ac08bbaf75e10f6d512ab7278687ba0f... his knees.\n\nThe finger pointed from the gra... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
36 39978da5debac56b66bb05401e2657988eac8f858ad8e9... said Scrooge.\n\n'To-day!' replied the boy. '... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
37 d2d2f89d4b28e7c9deb3e9a4f9103deb88cffdafe9eeb9... the gentleman, as if his breath were taken aw... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
38 9666894f76db0df7fee03437fb9427ddafd4694c5a5899... . 'A merrier Christmas,\nBob, my good fellow, ... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
39 227b5ec1d4d4dd9e677008e41a9eaa1fc2669f824e972d... .C. The Project Gutenberg Literary Archive Fou... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
40 9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a43... charge a reasonable fee for copies of or prov... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1200
41 ce7a46b0af4208721555b8e02d9645d4546ae01188f645... . The invalidity or\nunenforceability of any p... [bb49d2e2192b79850db11ace71aef4f82181eb998921c... 1055
🚀 create_final_documents
id human_readable_id title text text_unit_ids creation_date metadata
0 bb49d2e2192b79850db11ace71aef4f82181eb998921c9... 1 book.txt The Project Gutenberg eBook of A Christmas Ca... [86f2dd5aa96fe727ccca072d573c1a41243bf8cab458f... 2025-03-20 11:43:11 +0800 NaN
🚀 extract_graph
{'entities': title type text_unit_ids frequency description
0 COUNTING-HOUSE| ORGANIZATION| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The Counting-House is Scrooge's workplace in t...
1 FOG| WEATHER| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 Foggy weather was prevalent throughout the story
2 CITY| GEO| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The story takes place in a city with a freezin...
3 FREST| GEO| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The city was experiencing frequent fog and fro...
4 CHRISTMAS| EVENT| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The story occurs on Christmas Eve
.. ... ... ... ... ...
150 UNITED STATES GEO [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1 Country where Project Gutenberg is based
151 PROJECT GUTENBERG™ ORGANIZATION [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1 Trademark associated with Project Gutenberg
152 PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1
153 809 NORTH 1500 WEST, SALTLAKE CITY, UT 84116 GEO [ce7a46b0af4208721555b8e02d9645d4546ae01188f64... 1 The geographic location of the Project Gutenbe...
154 PROFESSOR MICHAEL S. HART PERSON [ce7a46b0af4208721555b8e02d9645d4546ae01188f64... 1 Professor Michael S. Hart is the originator of...
[155 rows x 5 columns], 'relationships': source target text_unit_ids weight
description
0 COUNTING-HOUSE| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 10.0 Scrooge is the employer of the Counting-House|
1 COUNTING-HOUSE| CITY| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 8.0 The Counting-House is located in the city|
2 FREST| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 5.0 Scrooge lives in a city experiencing frequent ...
3 CHRISTMAS| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 5.0 Scrooge celebrates Christmas|
4 EVE| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1.0 Scrooge celebrates Christmas on the Eve of the...
.. ... ... ... ... ...
125 DURKE BATAGLANI MEGGIE TAZBAH [65fdbee3de754a7c64bcfc64e8277e03359ac720c8dba... 10.0 Durke Bataglani and Meggie Tazbah are in the s...
126 PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 14.0 Project Gutenberg donates royalties to the Fou...
127 UNITED STATES PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 6.0 Project Gutenberg is based in the United States
128 PROJECT GUTENBERG™ PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1.0 PROJECT GUTENBERG™ is the trademark associated...
129 809 NORTH 1500 WEST, SALTLAKE CITY, UT 84116 PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION [ce7a46b0af4208721555b8e02d9645d4546ae01188f64... 1.0 The Foundation's business office is located at...
[130 rows x 5 columns]}
🚀 finalize_graph
{'entities': title type text_unit_ids frequency description
0 COUNTING-HOUSE| ORGANIZATION| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The Counting-House is Scrooge's workplace in t...
1 FOG| WEATHER| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 Foggy weather was prevalent throughout the story
2 CITY| GEO| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The story takes place in a city with a freezin...
3 FREST| GEO| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The city was experiencing frequent fog and fro...
4 CHRISTMAS| EVENT| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1 The story occurs on Christmas Eve
.. ... ... ... ... ...
150 UNITED STATES GEO [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1 Country where Project Gutenberg is based
151 PROJECT GUTENBERG™ ORGANIZATION [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1 Trademark associated with Project Gutenberg
152 PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1
153 809 NORTH 1500 WEST, SALTLAKE CITY, UT 84116 GEO [ce7a46b0af4208721555b8e02d9645d4546ae01188f64... 1 The geographic location of the Project Gutenbe...
154 PROFESSOR MICHAEL S. HART PERSON [ce7a46b0af4208721555b8e02d9645d4546ae01188f64... 1 Professor Michael S. Hart is the originator of...
[155 rows x 5 columns], 'relationships': source target text_unit_ids weight
description
0 COUNTING-HOUSE| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 10.0 Scrooge is the employer of the Counting-House|
1 COUNTING-HOUSE| CITY| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 8.0 The Counting-House is located in the city|
2 FREST| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 5.0 Scrooge lives in a city experiencing frequent ...
3 CHRISTMAS| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 5.0 Scrooge celebrates Christmas|
4 EVE| SCROOGEBE| [f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698... 1.0 Scrooge celebrates Christmas on the Eve of the...
.. ... ... ... ... ...
125 DURKE BATAGLANI MEGGIE TAZBAH [65fdbee3de754a7c64bcfc64e8277e03359ac720c8dba... 10.0 Durke Bataglani and Meggie Tazbah are in the s...
126 PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 14.0 Project Gutenberg donates royalties to the Fou...
127 UNITED STATES PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 6.0 Project Gutenberg is based in the United States
128 PROJECT GUTENBERG™ PROJECT GUTENBERG [9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a4... 1.0 PROJECT GUTENBERG™ is the trademark associated...
129 809 NORTH 1500 WEST, SALTLAKE CITY, UT 84116 PROJECT GUTENBERG LITERARY ARCHIVE FOUNDATION [ce7a46b0af4208721555b8e02d9645d4546ae01188f64... 1.0 The Foundation's business office is located at...
[130 rows x 5 columns]}
🚀 create_communities
id human_readable_id community level ... relationship_ids text_unit_ids period size
0 a56108da-164c-4b2e-afdb-399367ab3009 0 0 0 ... [0a4bb24a-dd98-4a7d-9dcc-1f14f603d451, 18c7142... [0c9cb061f158d3dd856ae99580a5f242267108a7666da... 2025-03-20 12
1 e84bbc2a-2744-490d-bfca-dfccc2f44bcc 1 1 0 ... [370bbb49-abd3-4686-82c5-aa7779638fe0, 4032e49... [0e0a282d7502845686a91ea6a1a291993603636a84502... 2025-03-20 6
2 da5089bd-984b-4d4d-8cae-ea8987932bde 2 2 0 ... [5c123508-f3c1-4dcb-854c-1af55a09ae21, 633f14c... [7a9311b7e329ae1fac52e6ac695fc4ed9d48febf2bcff... 2025-03-20 9
3 3ff297f0-5e37-4fd8-b620-0c4f35e54c79 3 3 0 ... [1fdcc946-9ba4-4df3-94a9-eb1f3370f5ff, 290b401... [3d84489c0be1064ccb4c6df8e3575d05744a0d22290a7... 2025-03-20 8
4 752cec27-e627-4c5d-baa2-d1320bd5beff 4 4 0 ... [1131b63c-8f1b-472f-97f5-c4d6bc74c0ac, 12464fb... [0c9cb061f158d3dd856ae99580a5f242267108a7666da... 2025-03-20 15
5 d9637720-5d64-4d59-887b-b1c7ede4f70b 5 5 0 ... [07cefa22-3848-46c5-a017-91f672ffd480, 22ef97f... [ab366da07d9f56b5f77c934da766c61839b5654cfba09... 2025-03-20 7
6 322ee6a2-4aae-420a-b9e2-f22e9dcb9b7b 6 6 1 ... [0a4bb24a-dd98-4a7d-9dcc-1f14f603d451, 756c5fd... [0c9cb061f158d3dd856ae99580a5f242267108a7666da... 2025-03-20 6
7 bebca51e-1d4a-4c0c-9627-3165d568a45c 7 7 1 ... [8f141d37-92f8-47a4-a4b2-7d63272196a7, 9300294... [32031f6041976bcd6a4755f14c8cf6f5504de9b8ae119... 2025-03-20 4
8 c79a63ea-06c7-425f-8873-8536bd93c0ac 8 8 1 ... [18c71422-98db-48e4-a0de-e1a2692b77d5] [a227efd3cc209ea9cfb69a943b753b820ecd2eb51347a... 2025-03-20 2
9 ea6a056f-59df-45ee-a31f-65661a4c94d9 9 9 1 ... [33ce26b8-53f3-425a-80c3-606823b08bda] [bfcc6aef4e7efdd5c32b0157c5691d3d2274b6ebda3d2... 2025-03-20 2
10 f2437218-4ecc-4795-ac4a-ebfbc57cafad 10 10 1 ... [1cee5c45-46bd-4b67-a162-b871f3aa6a2f, 8f080e4... [0c9cb061f158d3dd856ae99580a5f242267108a7666da... 2025-03-20 4
11 f8a46e6a-906f-401b-a41b-691e808c9e58 11 11 1 ... [290944a5-8447-41ff-958f-7526f6d017cd, 295d846... [0c9cb061f158d3dd856ae99580a5f242267108a7666da... 2025-03-20 5
12 c3ee7069-f6f8-46fa-9361-5005ab450105 12 12 1 ... [12464fb3-3cce-479e-9f69-d6024977a7e9, 43afcac... [0c9cb061f158d3dd856ae99580a5f242267108a7666da... 2025-03-20 4
[13 rows x 12 columns]
🚀 create_final_text_units
id human_readable_id ... relationship_ids covariate_ids
0 86f2dd5aa96fe727ccca072d573c1a41243bf8cab458ff... 1 ... None []
1 0827478fb3afa0e1da102b716b2d47c62f3c15c86c545c... 2 ... None []
2 f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698f... 3 ... [012cf16f-57a8-480a-81ed-800ae289a240, 803da95... []
3 ea40fa993bc68de06cf942686eddf15b52f9ac78e098be... 4 ... None []
4 caac5bc87dcc730bdcec5b004f703a0dfe16e4b607da84... 5 ... None []
5 9f07c229334decfb6501f878b2cd1cc40db3b4f12a320a... 6 ... None []
6 959d2ece7722c77132385313994d2286f0c67771b5e358... 7 ... [88ad4518-5a59-4902-8188-83a9325a0ad1] []
7 0022307a5a33bdaff1b8f7253abab9b629a9ddc3d8d91c... 8 ... None []
8 e8a75d7e6b6d2775cf2a15fdb85fcf157b91e7d24a0504... 9 ... None []
9 7c08f37daa6a7ca2d6fb45b017bf020a30dfc48e18d638... 10 ... None []
10 7106cedfc5aa145d96c1702893d6eea308e9716ca96d35... 11 ... None []
11 13ef805b5cf79d075a8f4a8e32d8725e4182cfad830486... 12 ... [81d382f0-a9f2-42e1-93ac-3ccfeed66ebf] []
12 8f550d4a33ae24fd7714d2cbf916191a2f38d0858e9080... 13 ... None []
13 3bfe0728d0f3a1e8849ab6f674ee5a306fc56b0c62e5a7... 14 ... None []
14 0e0a282d7502845686a91ea6a1a291993603636a845022... 15 ... [c28aeb97-98b4-4899-b241-24f6a52763d5, 885b5ba... []
15 3d84489c0be1064ccb4c6df8e3575d05744a0d22290a7c... 16 ... [72813675-177d-49ac-8a90-ef63198a7259, a946938... []
16 ab366da07d9f56b5f77c934da766c61839b5654cfba094... 17 ... [e3e39426-7dec-4fa3-aed8-d830adb1cc0c, c20f645... []
17 8c6dec0c51e945fcceae8251235aa9d768c8d9c41c7f06... 18 ... None []
18 7a9311b7e329ae1fac52e6ac695fc4ed9d48febf2bcff4... 19 ... [faffda5d-ce3f-40f6-9332-eb43c339b5f6, 7e2f4ea... []
19 a227efd3cc209ea9cfb69a943b753b820ecd2eb51347a9... 20 ... [e3e39426-7dec-4fa3-aed8-d830adb1cc0c, 18c7142... []
20 b759ca5379d5a531ef50ef76d61698acaee3036cf6cac1... 21 ... None []
21 bfcc6aef4e7efdd5c32b0157c5691d3d2274b6ebda3d23... 22 ... [61c02d60-bee7-4395-8153-41bcfcf68b30, 43afcac... []
22 0c9cb061f158d3dd856ae99580a5f242267108a7666da3... 23 ... [c5d806d0-b83c-44f4-8412-7e1e38b4813f, 756c5fd... []
23 47cbca4e8ee4879e2a76c00f5b0a9bf574a6cb38835471... 24 ... None []
24 32031f6041976bcd6a4755f14c8cf6f5504de9b8ae1190... 25 ... [f7ff7efe-fb45-438f-a128-f4dc49473fff, 78111ef... []
25 f2cfcaa0558776e37c69b7ad7d699be6a339877d52fe66... 26 ... None []
26 a4ead601c850e33f98e9d8557c1f202e171b35a6341618... 27 ... None []
27 a1a89e8413e412e2f2cfd4593f9a6ea8da1c518e7d1210... 28 ... [e3e39426-7dec-4fa3-aed8-d830adb1cc0c, 832fa52... []
28 42edb1421d271236b16218a24c4816128915b258ac6b0d... 29 ... None []
29 65fdbee3de754a7c64bcfc64e8277e03359ac720c8dbac... 30 ... [4afccac9-5dc4-4074-a242-f0d5048d78d4, 8f141d3... []
30 dc85e50cb0aace572490a39a839e1e9aa2230ba016b1ec... 31 ... None []
31 b4eb7ad3a32605c8d8ab7974be2cb0cd8e03b2fe3b1bec... 32 ... None []
32 457a565186babdf687cd01dce30b827a6a992931bbe03f... 33 ... None []
33 92dc0829fc2a9d47ac9811e67f75bb93d92da2370a514e... 34 ... None []
34 04c8bb3d14e0ee6d2bf625cd54f57b074b81d107c5a96a... 35 ... None []
35 5bd541b658bb07ac08bbaf75e10f6d512ab7278687ba0f... 36 ... None []
36 39978da5debac56b66bb05401e2657988eac8f858ad8e9... 37 ... None []
37 d2d2f89d4b28e7c9deb3e9a4f9103deb88cffdafe9eeb9... 38 ... None []
38 9666894f76db0df7fee03437fb9427ddafd4694c5a5899... 39 ... None []
39 227b5ec1d4d4dd9e677008e41a9eaa1fc2669f824e972d... 40 ... None []
40 9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a43... 41 ... [fcf6d3c3-5719-44e6-94de-605177fd8bf0, 5dd93af... []
41 ce7a46b0af4208721555b8e02d9645d4546ae01188f645... 42 ... []
[42 rows x 8 columns]
🚀 create_community_reports
id human_readable_id community level ... findings full_content_json period size
0 466d032308684f4eb43cdfcf61923833 6 6 1 ... [{'explanation': 'Scrooge McDuck is the centra... {\n "title": "Merry Christmas Community",\n... 2025-03-20 6
1 c6f1e4ede43542c1ba135436d79f44b0 7 7 1 ... [{'explanation': 'The community is centered ar... {\n "title": "Spectral Figures and Their Im... 2025-03-20 4
2 b236c7067efa454c9019e6af402a7071 8 8 1 ... [{'explanation': 'Phantom is a supernatural en... {\n "title": "Phantom and Spirit Community"... 2025-03-20 2
3 ea941310a2204fa3a30eef206837b802 9 9 1 ... [{'explanation': 'Bob Cratchit is the central ... {\n "title": "Bob Cratchit and Tiny Tim",\n... 2025-03-20 2
4 d03322a4ff3f470dbe86c52a3670b8ba 10 10 1 ... [{'explanation': 'Tiny Tim is the central enti... {\n "title": "Tiny Tim and Christmas Commun... 2025-03-20 4
5 3767297fb7c84a97a3ce3bfaab38cb76 11 11 1 ... [{'explanation': 'Christmas is a central event... {\n "title": "Christmas in the Park Communi... 2025-03-20 5
6 dd088106615e4cf082abd7ab9b470a6b 12 12 1 ... [{'explanation': 'Mrs. Cratchit is the central... {\n "title": "Cratchit Family Community",\n... 2025-03-20 4
7 1379244e9b234efe911a5367c5e81e1b 0 0 0 ... [{'explanation': 'The community is centered ar... {\n "title": "Spectral Figures Community",\... 2025-03-20 12
8 df029480069a406db0f18001f1629427 1 1 0 ... [{'explanation': 'The warehouse is the central... {\n "title": "The Warehouse Community",\n ... 2025-03-20 6
9 43a36b7781de496babae3129c797c30c 2 2 0 ... [{'explanation': 'Tiruzia's dual role as a cap... {\n "title": "Tiruzia and Scrooge's Journey... 2025-03-20 9
10 f23f279f5d76430c986faaf37a316020 3 3 0 ... [{'explanation': 'The GHOST OF CHRISTMAS PAST ... {\n "title": "GHOST OF CHRISTMAS PAST and D... 2025-03-20 8
11 489681dae8dd48629593d533d1eba920 4 4 0 ... [{'explanation': 'Christmas is the central eve... {\n "title": "Christmas Community",\n "s... 2025-03-20 15
12 4f1876e9a80e4f9fb3c19fb6f675ff79 5 5 0 ... [{'explanation': 'GHOST is a spectral or ghost... {\n "title": "GHOST and Scrooge McDuck Comm... 2025-03-20 7
[13 rows x 15 columns]
⠧ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
🚀 generate_text_embeddings
{'entity.description': id embedding
0 069001ed-250b-410c-b36f-a1d7bf79060f [0.013336181640625, -0.0291748046875, -0.05917...
1 04482ef7-60e5-4ac8-b717-7d47a226acfa [0.055450439453125, 0.040435791015625, -0.0041...
2 6d57e2eb-5319-4ee1-91f6-fb0cd2ce7e4f [0.02215576171875, 0.01611328125, -0.013130187...
3 58fb472b-7d3b-4aeb-9b44-51b8d4285775 [0.07794189453125, 0.0166168212890625, -0.0199...
4 a026563c-b71b-4659-8760-6e05d3845e02 [0.017730712890625, -0.00566864013671875, -0.0...
.. ... ...
135 50aaf6f0-234b-4c2e-b393-793130305207 [-0.004299163818359375, -0.0003888607025146484...
136 9210838a-e693-42b1-988e-bb9da8f7feb1 [0.0200653076171875, 0.01190948486328125, -0.0...
137 14a216a7-71a9-470c-8d11-0512ccc89c6c [0.0288848876953125, 0.01654052734375, -0.0383...
138 c47742bc-9876-42cc-ab5e-987752e889ac [0.0245819091796875, 0.0109100341796875, -0.04...
139 46c69667-1db5-4796-b6e4-d45c69973181 [0.0063934326171875, 0.0210113525390625, -0.01...
[140 rows x 2 columns], 'text_unit.text': id embedding
0 86f2dd5aa96fe727ccca072d573c1a41243bf8cab458ff... [0.00882244607670602, 0.013220256917784392, -0...
1 0827478fb3afa0e1da102b716b2d47c62f3c15c86c545c... [-0.004922754512354879, 0.034194676118360634, ...
2 f3c1a7d0720a6a2daf78645c772ca4baab4b84d39a698f... [0.032049642734957605, 0.010633019872349617, 0...
3 ea40fa993bc68de06cf942686eddf15b52f9ac78e098be... [0.03593359136069602, 0.008688902629535127, 0....
4 caac5bc87dcc730bdcec5b004f703a0dfe16e4b607da84... [0.026931099954841033, 0.015734939263579863, -...
5 9f07c229334decfb6501f878b2cd1cc40db3b4f12a320a... [0.01665174752754792, 0.011404684926670117, 0....
6 959d2ece7722c77132385313994d2286f0c67771b5e358... [0.011896647492005778, 0.02356717046968375, 0....
7 0022307a5a33bdaff1b8f7253abab9b629a9ddc3d8d91c... [0.01573561676304961, 0.013275094558892123, 0....
8 e8a75d7e6b6d2775cf2a15fdb85fcf157b91e7d24a0504... [0.017505201375519724, 0.01798905866795857, 0....
9 7c08f37daa6a7ca2d6fb45b017bf020a30dfc48e18d638... [0.024381526236313053, 0.018257783296891532, 0...
10 7106cedfc5aa145d96c1702893d6eea308e9716ca96d35... [0.024637063050040896, 0.017124824270163742, 0...
11 13ef805b5cf79d075a8f4a8e32d8725e4182cfad830486... [0.035329911768586394, 0.019519470982169024, 0...
12 8f550d4a33ae24fd7714d2cbf916191a2f38d0858e9080... [0.022302039026842035, 0.006083013423618261, -...
13 3bfe0728d0f3a1e8849ab6f674ee5a306fc56b0c62e5a7... [0.019330726888742707, 0.014563834865372452, 0...
14 0e0a282d7502845686a91ea6a1a291993603636a845022... [0.0039116497381819285, 0.009025392863138398, ...
15 3d84489c0be1064ccb4c6df8e3575d05744a0d22290a7c... [0.02540952568921682, 0.012900552926773582, 0....
16 ab366da07d9f56b5f77c934da766c61839b5654cfba094... [0.0261935284459428, 0.0067802014030263905, 0....
17 8c6dec0c51e945fcceae8251235aa9d768c8d9c41c7f06... [0.01339923395007285, 0.0061194008282607236, 0...
18 7a9311b7e329ae1fac52e6ac695fc4ed9d48febf2bcff4... [0.017741061833400358, 0.01252833259307375, 0....
19 a227efd3cc209ea9cfb69a943b753b820ecd2eb51347a9... [0.018120733963805022, 0.022549766877192835, 0...
20 b759ca5379d5a531ef50ef76d61698acaee3036cf6cac1... [0.027002034788384983, 0.012798369941850192, -...
21 bfcc6aef4e7efdd5c32b0157c5691d3d2274b6ebda3d23... [0.020610665498970563, 0.0021312374823002117, ...
22 0c9cb061f158d3dd856ae99580a5f242267108a7666da3... [0.028324790922061485, 0.0177569040856194, 0.0...
23 47cbca4e8ee4879e2a76c00f5b0a9bf574a6cb38835471... [0.026709724579148, 0.011145545637952756, 0.00...
24 32031f6041976bcd6a4755f14c8cf6f5504de9b8ae1190... [0.011826292757805818, 0.018347961680638077, 0...
25 f2cfcaa0558776e37c69b7ad7d699be6a339877d52fe66... [0.037505592067428414, 0.019856858512984164, 0...
26 a4ead601c850e33f98e9d8557c1f202e171b35a6341618... [0.024897250710127328, 0.015699422022807222, 0...
27 a1a89e8413e412e2f2cfd4593f9a6ea8da1c518e7d1210... [0.03220576987300127, 0.001417914125662899, 0....
28 42edb1421d271236b16218a24c4816128915b258ac6b0d... [0.028903241944403954, 0.01827391824344616, 0....
29 65fdbee3de754a7c64bcfc64e8277e03359ac720c8dbac... [0.02153422784865846, 0.027529376475587234, 0....
30 dc85e50cb0aace572490a39a839e1e9aa2230ba016b1ec... [0.02385180286729025, 0.03001220922049771, 0.0...
31 b4eb7ad3a32605c8d8ab7974be2cb0cd8e03b2fe3b1bec... [0.015606448916209131, 0.022281113216908188, 0...
32 457a565186babdf687cd01dce30b827a6a992931bbe03f... [0.029310604085387507, 0.018542481679589225, 0...
33 92dc0829fc2a9d47ac9811e67f75bb93d92da2370a514e... [0.0342599856413612, 0.007563569814795791, 0.0...
34 04c8bb3d14e0ee6d2bf625cd54f57b074b81d107c5a96a... [0.029037774777091584, 0.012568371798117112, 0...
35 5bd541b658bb07ac08bbaf75e10f6d512ab7278687ba0f... [0.02732498133835736, 0.020596984156224962, 0....
36 39978da5debac56b66bb05401e2657988eac8f858ad8e9... [0.0319149538871641, -0.0040861691373549275, -...
37 d2d2f89d4b28e7c9deb3e9a4f9103deb88cffdafe9eeb9... [0.04955422809465438, 0.019333490788860347, -0...
38 9666894f76db0df7fee03437fb9427ddafd4694c5a5899... [-0.005084390877702691, 0.028559282029695203, ...
39 227b5ec1d4d4dd9e677008e41a9eaa1fc2669f824e972d... [-0.020132940565389695, 0.02148138334542391, 0...
40 9f13c11e1922d18c9c67b4fa5ec706b669202d5ee67a43... [-0.0045029408825497785, 0.023006743340924122,...
41 ce7a46b0af4208721555b8e02d9645d4546ae01188f645... [-0.004663283405378878, 0.03216074275613732, -..., 'community.full_content': id embedding
0 466d032308684f4eb43cdfcf61923833 [0.033213690256378686, -0.012462437686045082, ...
1 c6f1e4ede43542c1ba135436d79f44b0 [0.02775466684868201, 0.0330957711219679, -0.0...
2 b236c7067efa454c9019e6af402a7071 [0.032684326171875, 0.003765106201171875, 0.00...
3 ea941310a2204fa3a30eef206837b802 [0.005465319509079031, -0.013447887870398797, ...
4 d03322a4ff3f470dbe86c52a3670b8ba [0.031185337560857687, -0.013620586190781785, ...
5 3767297fb7c84a97a3ce3bfaab38cb76 [0.04635439736271185, 0.012523019580724912, -0...
6 dd088106615e4cf082abd7ab9b470a6b [0.03840538982361507, 0.0012556088735436938, -...
7 1379244e9b234efe911a5367c5e81e1b [0.03982859064200358, 0.010862728954285656, 0....
8 df029480069a406db0f18001f1629427 [0.01582786997505617, 0.0038248518327457775, 0...
9 43a36b7781de496babae3129c797c30c [0.006010545554088743, 0.001259841801072193, -...
10 f23f279f5d76430c986faaf37a316020 [0.013455390253332124, 0.026490683796043916, -...
11 489681dae8dd48629593d533d1eba920 [0.03798627471688514, 0.005354767083495608, -0...
12 4f1876e9a80e4f9fb3c19fb6f675ff79 [0.049546505930166, -0.01678367082180875, -0.0...}
⠴ GraphRAG Indexer
├── Loading Input (InputFileType.text) - 1 files loaded (0 filtered) ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_base_text_units ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_documents ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── extract_graph ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── finalize_graph ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_communities ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_final_text_units ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── create_community_reports ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
├── generate_text_embeddings ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 100% 0:00:00 0:00:00
🚀 All workflows completed successfully.
Query
global
graphrag query \
--root ./ragdemo \
--method global \
--query "What are the top themes in this story?"
# Response
SUCCESS: Global Search Response:
Okay, so I need to figure out the top themes in this story based on the provided analyst reports. Let me start by reading through each report carefully.
First, Analyst 1 has five reports, all with an importance score of 70 or 60. The highest score is 70, and the content there talks about the interactions between Scrooge McDuck and the GHOST in various settings. It mentions the Spectral Figures and their impact on the community, along with some data points like reports 4, 6, 7, 8, 9, and 10. So, the first theme seems to be the dynamics between key characters and the Spectral Figures.
Next, looking at the other reports from Analyst 1, all of them (scores 60 and below) focus on the community centered around the Spectral Figures. They discuss their roles in shaping the narrative, their connections to miners' living conditions, and their distinct roles within the story. The data points here include reports 1, 3, 5, 7, 8. This reinforces the idea that the Spectral Figures are central to the community and narrative structure.
Another theme that comes up in multiple reports is the impact of spectral and supernatural elements. The reports mention how these elements influence the community, the narrative, and potential conflicts. Data points like 1, 3, 5, 7, 8 are repeated, so this is a consistent theme across different analyses.
Additionally, the stories explore the relationships between Scrooge McDuck, the GHOST, and the miners. The interactions and their roles in the community are highlighted, which ties into the first theme about dynamics between key entities.
Putting this all together, the top themes are:
1. The dynamics between key characters (Scrooge McDuck, GHOST) and the Spectral Figures.
2. The central role of the Spectral Figures in the community and narrative.
3. The influence of spectral and supernatural elements on the story's structure and implications.
I should make sure to list the top 5 record IDs without exceeding the limit. From the reports, the IDs mentioned are 1, 3, 5, 7, 8, and 4, 6, 9, 10. I'll list the top 5, which are 1, 3, 5, 7, 8, and note the rest with "+more".
</think>
The top themes in the story are centered around the interactions and roles of key characters and elements, with a strong emphasis on community dynamics and the influence of supernatural factors. Here's a structured summary:
1. **Key Character Dynamics**: The story emphasizes the interactions between Scrooge McDuck, the GHOST, and the Spectral Figures. These relationships are pivotal in shaping the narrative and community dynamics.
2. **Central Role of Spectral Figures**: The Spectral Figures are central to the community's structure and narrative. Their roles, connections to miners' living conditions, and distinct influences are recurrent themes.
3. **Influence of Supernatural Elements**: Spectral and supernatural elements play a significant role in the story's themes, contributing to the narrative's complexity and potential for conflict.
The supporting data references include records (1, 3, 5, 7, 8) and additional ones like (4, 6, 9, 10) noted as "+more". These elements collectively highlight the story's focus on community, character interactions, and supernatural impact.
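The response above contains the model's reasoning trace before the `</think>` marker, followed by the actual answer. Below is a minimal sketch of calling the same CLI from Python and keeping only the final answer. The helper names (`build_query_cmd`, `strip_reasoning`, `ask`) are hypothetical, and it assumes `graphrag` is on the PATH and that the model emits `</think>` the way this one does.

```python
import subprocess


def build_query_cmd(root: str, method: str, query: str) -> list[str]:
    """Build the argument list for the `graphrag query` command used above."""
    return ["graphrag", "query", "--root", root, "--method", method, "--query", query]


def strip_reasoning(output: str, marker: str = "</think>") -> str:
    """Return the text after the last reasoning marker, or all of it if the marker is absent."""
    _, found, answer = output.rpartition(marker)
    return answer.strip() if found else output.strip()


def ask(root: str, method: str, query: str) -> str:
    """Run the CLI (assumes graphrag is installed) and return only the final answer."""
    result = subprocess.run(
        build_query_cmd(root, method, query),
        capture_output=True,
        text=True,
        check=True,
    )
    return strip_reasoning(result.stdout)
```

For example, `ask("./ragdemo", "global", "What are the top themes in this story?")` would return only the part after `</think>`. If the model emits no marker, the full CLI output is returned unchanged.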
local
graphrag query \
--root ./ragdemo \
--method local \
--query "Who is Scrooge and what are his main relationships?"
# Response
INFO: Vector Store Args: {
"default_vector_store": {
"type": "lancedb",
"db_uri": "/home/li/work/projects/githubProjects/graphrag/ragdemo/output/lancedb",
"url": null,
"audience": null,
"container_name": "==== REDACTED ====",
"database_name": null,
"overwrite": true
}
}
SUCCESS: Local Search Response:
Okay, so the user is asking about Scrooge and his main relationships. Let me start by looking at the data provided. There are two stories here, one from "The Wicked Menace" and another from "A Night Without Rain." Both stories mention Scrooge McDuck, but they have different contexts.
In "The Wicked Menace," Scrooge is the main character, a character from "The Adventures of Scrooge McDuck." He's a central figure in the stories, so I should highlight his role there. The other story, "A Night Without Rain," seems to be a continuation or a different part of the same universe, so Scrooge is still a key character there.
Looking at the entities, Scrooge is associated with the character "THE SCROOGE," which is his name. He's also connected to "THE GHOST," another entity mentioned. The relationship between Scrooge and the Ghost is central in "A Night Without Rain," where the Ghost appears and interacts with Scrooge. In "The Wicked Menace," the focus is more on Scrooge's adventures and interactions with other characters like the Phantom and the Spectral Figure.
I should mention that Scrooge is a central character in both stories, highlighting his role in the plot and his relationships with other characters. In "A Night Without Rain," the Ghost is a significant figure, so I'll note that their relationship is a key part of the story. In "The Wicked Menace," Scrooge's interactions with other characters like the Phantom and the Spectral Figure are important, even though the Ghost isn't mentioned there.
I need to structure the response into sections, maybe one for each story, but since the user is asking about Scrooge and his relationships, I'll tie them together. I'll mention Scrooge's main relationships in both contexts, noting the key interactions and the significance of the Ghost in the second story.
I should also include the data references as specified, making sure to list the relevant record IDs and add "+more" where there are more references. I'll make sure not to include irrelevant information and keep the response concise but informative.
So, putting it all together, the response will outline Scrooge's role in both stories, his relationships with the Ghost and other characters, and highlight the key points supported by the data provided.
</think>
Scrooge McDuck, portrayed in the stories "The Wicked Menace" and "A Night Without Rain," is a central character whose relationships and interactions are pivotal to the narrative. In "The Wicked Menace," Scrooge is the protagonist, a character from "The Adventures of Scrooge McDuck," known for his eccentric behavior and his relationship with the Phantom and the Spectral Figure. His interactions with these characters drive his character development and the plot's progression.
In "A Night Without Rain," Scrooge's relationship with the Ghost becomes a central theme. The Ghost, a spectral figure, interacts with Scrooge, adding a supernatural element to the story. Their relationship is significant, as it highlights Scrooge's emotional vulnerability and his connection to the supernatural world. The Ghost's presence underscores Scrooge's internal struggles and his evolving understanding of his true self.
Key relationships for Scrooge include:
1. **The Phantom**: A character who appears briefly in "The Wicked Menace," offering guidance and insight into Scrooge's actions.
2. **The Spectral Figure**: Another character from "The Wicked Menace," who influences Scrooge's decisions and self-perception.
3. **The Ghost**: A spectral being introduced in "A Night Without Rain," highlighting Scrooge's emotional and psychological journey.
These relationships are supported by data references such as [Data: Sources (17), Stories (1, 2); Entities (36, 40, 90, 122, 123); Relationships (7, 81, 118, 119, 120, 121); Claims (2, 7, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258,
Hands-on
Let's run it on The Three-Body Problem and see how it does.
mkdir -p ./three-body/input
cp three-body.txt ./three-body/input
graphrag init --root ./three-body
graphrag index --root ./three-body
Query
graphrag query --root ./three-body --method local --query "汪淼最后怎么样了?"
INFO: Vector Store Args: {
"default_vector_store": {
"type": "lancedb",
"db_uri": "/home/li/work/projects/githubProjects/graphrag/three-body-en/output/lancedb",
"url": null,
"audience": null,
"container_name": "==== REDACTED ====",
"database_name": null,
"overwrite": true
}
}
SUCCESS: Local Search Response:
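Hmm, the user is asking "What happened to Wang Miao in the end?". First, I need to find the relevant information in the provided data. The data says Wang Miao is an astronomer who took part in several projects, such as the "Zhonghua-2" high-energy accelerator project, and was also involved in nanomaterials research. He discussed the nano project with Ye Wenjie, and the project ended up being halted.
Then, Wang Miao also took part in discussions at the Frontiers of Science, which may have involved some controversy. The data mentions a conflict between him and Shen Yufei that brought the project to a halt. Also, he took photographs together with Professor Zhang and discussed scientific questions.
In addition, Wang Miao has a daughter, Doudou, which may matter on the education side. The data mentions the way he educates his daughter, which may be somewhat extreme: teaching only theory, with no practical application.
Finally, Wang Miao was also involved in predicting the cosmic background radiation, which may be scientifically controversial. Putting this together, Wang Miao was probably affected in the end by the halted project and the scientific controversy, but the specific outcome is not stated explicitly, so I may need to infer that he ultimately did not continue these projects.
</think>
Wang Miao is a versatile individual involved in science, literature, engineering, and other fields. According to the data, he is an astronomer who took part in several projects, including the "Zhonghua-2" high-energy accelerator project, character design in "The Three-Body Problem", and the "nanomaterials" project he discussed with Ye Wenjie. He also discussed the observation of the cosmic background radiation at the Frontiers of Science society, and took photographs with Professor Zhang while discussing scientific questions.
Wang Miao's educational background is also worth mentioning: he is an ordinary laborer, but because of his daughter Doudou he took part in a range of educational and research activities. These activities came with some controversy, however, such as his conflict with Ye Wenjie and his discussions with Shen Yufei, all of which may have affected how his career eventually developed.
Overall, Wang Miao's final fate is not stated explicitly in the data, but it can be inferred that he may have been unable to continue these high-risk activities because of the halted project and the scientific controversy.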
graphrag query --root ./three-body --method global --query "叶文洁是谁?"
SUCCESS: Global Search Response:
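OK, I now need to answer the user's question: "Who is Ye Wenjie?" First, I need to look at the provided analyst reports and find the details about Ye Wenjie.
First, look at each analyst's reports. Analyst 3 has four reports, all of which say Ye Wenjie is a 66-year-old woman in Chinese society, one of the authors of 《天iverse》, in a long-term marriage with Ye Wenqiang, and involved in TV drama production and investment. This shows she is not only influential in TV drama but also involved in investment decisions.
Next, Analyst 2 also has two reports, which likewise say Ye Wenjie is a project lead at the Chinese Academy of Sciences space observation center, responsible for building and maintaining the radio antenna array, and has a scientific collaboration with Wang Miao. This shows she has also made important contributions in science, especially in radio astronomy.
Then, Analyst 4 has four reports, which say Ye Wenjie is a core figure of the Earth-Trisolaris Organization, with complicated relationships with different factions such as the Redemptionists, Ultimate War, and the Easterners, affecting the organization's stability and future generations. This shows she is not only in TV dramas but may also have organizational or political roles.
Other reports mention that Ye Wenjie's marriage to Ye Wenqiang is complicated, involving cultural differences and legal issues, as well as the impact of her investment behavior and decisions in TV dramas on the industry. This information fills out her personal life and social influence.
Overall, Ye Wenjie is a versatile figure involved in TV drama production, investment, scientific collaboration, and organizational activities. She is active not only in cultural works but also has significant influence at the scientific and organizational levels.
Now I need to organize this information into a coherent answer, making sure to cover her identity, fields of work, relationships, and influence, while citing the correct report numbers and not exceeding five records.
</think>
Ye Wenjie is a 66-year-old woman in Chinese society and a versatile figure involved in TV drama production, investment, scientific collaboration, and other fields. She has a long-term marriage with Ye Wenqiang and served as producer, executive producer, and investor on the TV drama 《天iverse》, where her investment decisions had a significant effect on the show's marketing. Ye Wenjie is also a [...]. Her work not only sets an example in the TV drama industry but also shows wide-ranging influence in radio astronomy and organizational activities.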
As expected, a 7B model is just too weak for this.
Visualization
After indexing, graphrag produces lancedb and parquet data files, which we can load into neo4j to inspect.
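Before importing anything, it can help to see which parquet tables the indexer actually wrote, since the file names differ between GraphRAG versions. This is a small sketch using only the standard library. `list_parquet_tables` is a hypothetical helper, and the output path is assumed from the runs in this post.

```python
from pathlib import Path


def list_parquet_tables(output_dir: str) -> list[str]:
    """Return the names (without extension) of the parquet files in output_dir."""
    return sorted(p.stem for p in Path(output_dir).glob("*.parquet"))


# Usage (path is an assumption, adjust to your own project root):
# print(list_parquet_tables("./three-body/output"))
```

The import script later in this post expects at least documents, text_units, entities, relationships, communities, and community_reports to be present.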
Installing Neo4j
Just change the mount paths to match your environment.
services:
  # neo4j
  neo4j:
    image: neo4j
    container_name: neo4j
    restart: always
    ports:
      - 7474:7474
      - 7687:7687
    environment:
      # Boolean values are quoted so the YAML parser passes them through as strings
      NEO4J_apoc_export_file_enabled: "true"
      NEO4J_apoc_import_file_enabled: "true"
      NEO4J_apoc_import_file_use__neo4j__config: "true"
      NEO4J_PLUGINS: '["apoc"]'
      NEO4J_AUTH: none
    networks:
      - qingchen-network
    volumes:
      - /$QINGCHEN_DOCKER/volume/neo4j/data:/data
      - /$QINGCHEN_DOCKER/volume/neo4j/logs:/logs
      # - /$QINGCHEN_DOCKER/volume/neo4j/conf:/conf
networks:
  qingchen-network:
    name: qingchen-network
    driver: bridge
docker-compose up -d
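The container takes a little while to start, and the import script fails right away if the Bolt port is not yet accepting connections. Here is a minimal sketch of a readiness check using only the standard library. `port_open` and `wait_for_port` are hypothetical helpers, and the host `cc` and port 7687 are taken from the configuration in this post. An open TCP port does not guarantee the database has finished initializing, so treat this as a rough check only.

```python
import socket
import time


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host: str, port: int, retries: int = 30, delay: float = 1.0) -> bool:
    """Poll until the port accepts connections or the retries run out."""
    for _ in range(retries):
        if port_open(host, port):
            return True
        time.sleep(delay)
    return False


# Usage (host name is an assumption from this post's setup):
# wait_for_port("cc", 7687)
```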
Importing the data
Just adjust the configuration and paths.
# @Author : ljl
# @Time : 2025/3/17 11:15 AM
import pandas as pd
from neo4j import GraphDatabase
import time
NEO4J_URI = "neo4j://cc:7687"
NEO4J_USERNAME = "neo4j"
NEO4J_PASSWORD = ""
NEO4J_DATABASE = "neo4j"
driver = GraphDatabase.driver(NEO4J_URI, auth=(NEO4J_USERNAME, NEO4J_PASSWORD))
GRAPHRAG_FOLDER = "./three-body/output"
# Constraint names must be unique: with "if not exists", a statement that reuses
# an existing name is silently skipped, so each constraint gets its own name here.
statements = """
create constraint chunk_id if not exists for (c:__Chunk__) require c.id is unique;
create constraint document_id if not exists for (d:__Document__) require d.id is unique;
create constraint community_id if not exists for (c:__Community__) require c.community is unique;
create constraint entity_id if not exists for (e:__Entity__) require e.id is unique;
create constraint entity_title if not exists for (e:__Entity__) require e.name is unique;
create constraint covariate_title if not exists for (e:__Covariate__) require e.title is unique;
create constraint related_id if not exists for ()-[rel:RELATED]->() require rel.id is unique;
""".split(";")
for statement in statements:
    if len((statement or "").strip()) > 0:
        print(statement)
        driver.execute_query(statement)
def batched_import(statement, df, batch_size=1000):
    """
    Import a dataframe into Neo4j using a batched approach.
    Parameters: statement is the Cypher query to execute, df is the dataframe to import, and batch_size is the number of rows to import in each batch.
    """
    total = len(df)
    start_s = time.time()
    for start in range(0, total, batch_size):
        batch = df.iloc[start: min(start + batch_size, total)]
        result = driver.execute_query("UNWIND $rows AS value " + statement,
                                      rows=batch.to_dict('records'),
                                      database_=NEO4J_DATABASE)
        print(result.summary.counters)
    print(f'{total} rows in {time.time() - start_s} s.')
    return total
doc_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/documents.parquet', columns=["id", "title"])
doc_df.head(2)
# import documents
statement = """
MERGE (d:__Document__ {id:value.id})
SET d += value {.title}
"""
batched_import(statement, doc_df)
text_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/text_units.parquet',
columns=["id", "text", "n_tokens", "document_ids"])
text_df.head(2)
statement = """
MERGE (c:__Chunk__ {id:value.id})
SET c += value {.text, .n_tokens}
WITH c, value
UNWIND value.document_ids AS document
MATCH (d:__Document__ {id:document})
MERGE (c)-[:PART_OF]->(d)
"""
batched_import(statement, text_df)
entity_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/entities.parquet', columns=["title", "type", "description", "human_readable_id", "id", "text_unit_ids"])
entity_df.head(2)
entity_statement = """
MERGE (e:__Entity__ {id:value.id})
SET e += value {.human_readable_id, .description, name:replace(value.title,'"','')}
WITH e, value
CALL apoc.create.addLabels(e, case when coalesce(value.type,"") = "" then [] else [apoc.text.upperCamelCase(replace(value.type,'"',''))] end) yield node
UNWIND value.text_unit_ids AS text_unit
MATCH (c:__Chunk__ {id:text_unit})
MERGE (c)-[:HAS_ENTITY]->(e)
"""
batched_import(entity_statement, entity_df)
rel_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/relationships.parquet',
columns=["source", "target", "id", "combined_degree", "weight", "human_readable_id", "description",
"text_unit_ids"])
rel_df.head(2)
rel_statement = """
MATCH (source:__Entity__ {name:replace(value.source,'"','')})
MATCH (target:__Entity__ {name:replace(value.target,'"','')})
// not necessary to merge on id as there is only one relationship per pair
MERGE (source)-[rel:RELATED {id: value.id}]->(target)
SET rel += value {.combined_degree, .weight, .human_readable_id, .description, .text_unit_ids}
RETURN count(*) as createdRels
"""
batched_import(rel_statement, rel_df)
# Key communities on the integer "community" column so that the community_reports
# import below (which matches on value.community) can find these nodes.
# This assumes the newer GraphRAG parquet schema, where "id" is a UUID.
community_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/communities.parquet',
                               columns=["id", "community", "level", "title", "text_unit_ids", "relationship_ids"])
community_df.head(2)
statement = """
MERGE (c:__Community__ {community:value.community})
SET c += value {.level, .title}
/*
UNWIND value.text_unit_ids as text_unit_id
MATCH (t:__Chunk__ {id:text_unit_id})
MERGE (c)-[:HAS_CHUNK]->(t)
WITH distinct c, value
*/
WITH *
UNWIND value.relationship_ids as rel_id
MATCH (start:__Entity__)-[:RELATED {id:rel_id}]->(end:__Entity__)
MERGE (start)-[:IN_COMMUNITY]->(c)
MERGE (end)-[:IN_COMMUNITY]->(c)
RETURN count(distinct c) as createdCommunities
"""
batched_import(statement, community_df)
community_report_df = pd.read_parquet(f'{GRAPHRAG_FOLDER}/community_reports.parquet',
columns=["id", "community", "level", "title", "summary", "findings", "rank",
"rating_explanation", "full_content"])
community_report_df.head(2)
# import communities
community_statement = """MATCH (c:__Community__ {community: value.community})
SET c += value {.level, .title, .rank, .rating_explanation, .full_content, .summary}
WITH c, value
UNWIND range(0, size(value.findings)-1) AS finding_idx
WITH c, value, finding_idx, value.findings[finding_idx] as finding
MERGE (c)-[:HAS_FINDING]->(f:Finding {id: finding_idx})
SET f += finding"""
batched_import(community_statement, community_report_df)
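To make the batching in `batched_import` concrete, the sketch below computes the row slices it would send for a given dataframe size, without touching Neo4j. `batch_bounds` is a hypothetical helper that mirrors the `range(0, total, batch_size)` loop in the script above.

```python
def batch_bounds(total: int, batch_size: int = 1000) -> list[tuple[int, int]]:
    """Return the (start, end) row slices that a batched import would send."""
    return [
        (start, min(start + batch_size, total))
        for start in range(0, total, batch_size)
    ]


# 2500 rows with the default batch size give three UNWIND calls.
print(batch_bounds(2500))  # [(0, 1000), (1000, 2000), (2000, 2500)]
```

Each tuple corresponds to one `driver.execute_query` call, so a smaller `batch_size` trades more round trips for smaller transactions.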
Result
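One quick way to confirm that the import landed is to count the nodes per label. The following is a sketch only: `count_query` and `print_counts` are hypothetical helpers, the labels are the ones used by the import script above, and the URI and empty password are assumed from this post's setup (authentication disabled). It needs the `neo4j` Python driver and a running server.

```python
LABELS = ["__Document__", "__Chunk__", "__Entity__", "__Community__", "Finding"]


def count_query(label: str) -> str:
    """Build a Cypher query that counts the nodes carrying the given label."""
    return f"MATCH (n:`{label}`) RETURN count(n) AS n"


def print_counts(uri: str = "neo4j://cc:7687", database: str = "neo4j") -> None:
    """Print one node count per label; requires a reachable Neo4j instance."""
    # Imported lazily so count_query can be used without the driver installed.
    from neo4j import GraphDatabase

    with GraphDatabase.driver(uri, auth=("neo4j", "")) as driver:
        for label in LABELS:
            records, _, _ = driver.execute_query(count_query(label), database_=database)
            print(label, records[0]["n"])
```

If `__Community__` shows nodes but `Finding` stays at zero, the community reports probably did not match any community node, which is worth checking against the parquet columns.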
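How it works
- Index building (index time)
  Index building is the data preprocessing stage. Its main purpose is to extract a knowledge graph from the provided document collection, partition that graph into a number of communities with a clustering algorithm (Leiden), and summarize what each community expresses (the community summary).
- Query (query time)
  The query stage builds on the index. This is where the end user of the GraphRAG system comes in and gives the system a query instruction. Roughly, GraphRAG matches the user's instruction against each community's community summary by similarity and passes the matches to the LLM as prompt context, to generate the final answer returned to the user.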
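The two stages above can be illustrated with a toy sketch. This is not GraphRAG's implementation: connected components stand in for Leiden clustering, the summaries are hand-written instead of LLM-generated, and bag-of-words cosine similarity stands in for embedding-based matching. All names and data are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def communities(edges):
    """Index time: group nodes into connected components (a simple stand-in for Leiden)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    groups = defaultdict(set)
    for node in list(parent):
        groups[find(node)].add(node)
    return sorted(sorted(group) for group in groups.values())


def vectorize(text):
    """Turn text into a bag-of-words count vector."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def rank_summaries(query, summaries):
    """Query time: order community ids by similarity between query and summary."""
    q = vectorize(query)
    return sorted(summaries, key=lambda cid: cosine(q, vectorize(summaries[cid])), reverse=True)


# Toy data, not taken from a real index.
edges = [("Scrooge", "Ghost"), ("Ghost", "Marley"), ("Cratchit", "Tiny Tim")]
summaries = {
    0: "The Cratchit family and Tiny Tim celebrate Christmas",
    1: "Scrooge is haunted by the Ghost of Marley",
}
print(communities(edges))
print(rank_summaries("Who haunts Scrooge?", summaries))
```

In the real system the top-ranked summaries would be placed into the LLM prompt as context. Here the ranking is simply printed.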