-# Spring-Boot-Neo4j-Movies
-An intelligent question-answering system over a movie knowledge graph, built by integrating Neo4j into Spring Boot and using Spark's naive Bayes classifier
-Blog: https://blog.csdn.net/appleyk
+# Spring-Boot-KBQA
 
+Built on the Spring Boot framework, this project integrates hanLP, neo4j, and spark-mllib to implement a simple question-answering system over a movie knowledge graph.
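 
-Upgraded the Spark dependency from 2.3 to 2.4, because a GitHub security alert warns that versions >= 1.0.0, <= 2.3.2 are vulnerable.
-Spark 2.4 works with Scala 2.11 and Scala 2.12.
+First start the Spring Boot application and open port 8080 in a browser, then type a question about movies into the web page. The front end sends the question to the back-end endpoint with an AJAX request. On receiving it, the back end first loads the question templates, dictionaries, classification model, and custom dictionary; it then segments the question into words and uses the classification model to match it to the corresponding question template; finally, depending on the question type, it queries the graph database neo4j for the answer and returns it.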
 
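The overview added in this commit says the back end segments each question and uses a classification model to map it to a question template, and the removed description names Spark's naive Bayes classifier as that model. The sketch below is a minimal, dependency-free multinomial naive Bayes with Laplace smoothing, written from scratch rather than with spark-mllib, so it only illustrates the technique. It assumes questions arrive already segmented (the job hanLP does in the project) with the movie title replaced by a placeholder token; the class name, template labels, and training tokens are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the project's spark-mllib NaiveBayes model:
// a from-scratch multinomial naive Bayes over pre-segmented question tokens.
class NaiveBayesQuestionClassifier {
    private final Map<Integer, Integer> docsPerLabel = new HashMap<>();             // training questions per template
    private final Map<Integer, Map<String, Integer>> tokenCounts = new HashMap<>(); // token frequency per template
    private final Map<Integer, Integer> tokensPerLabel = new HashMap<>();           // total tokens per template
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    /** Adds one segmented training question labelled with its template index. */
    void train(int label, List<String> tokens) {
        totalDocs++;
        docsPerLabel.merge(label, 1, Integer::sum);
        Map<String, Integer> counts = tokenCounts.computeIfAbsent(label, k -> new HashMap<>());
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
            tokensPerLabel.merge(label, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    /** Returns the template index with the highest log posterior for the question. */
    int predict(List<String> tokens) {
        int bestLabel = -1;
        double bestScore = Double.NEGATIVE_INFINITY;
        int vocabSize = vocabulary.size();
        for (Map.Entry<Integer, Integer> entry : docsPerLabel.entrySet()) {
            int label = entry.getKey();
            // log P(template): share of training questions carrying this label
            double score = Math.log(entry.getValue() / (double) totalDocs);
            Map<String, Integer> counts = tokenCounts.get(label);
            int total = tokensPerLabel.getOrDefault(label, 0);
            for (String token : tokens) {
                // Laplace-smoothed log P(token | template) = log((count + 1) / (total + |V|))
                score += Math.log((counts.getOrDefault(token, 0) + 1.0) / (total + vocabSize));
            }
            if (score > bestScore) {
                bestScore = score;
                bestLabel = label;
            }
        }
        return bestLabel;
    }

    public static void main(String[] args) {
        NaiveBayesQuestionClassifier clf = new NaiveBayesQuestionClassifier();
        // Hypothetical templates: 0 = "rating of a movie", 1 = "actors of a movie".
        clf.train(0, List.of("MOVIE", "rating", "what"));
        clf.train(0, List.of("MOVIE", "score", "is"));
        clf.train(1, List.of("MOVIE", "actors", "who"));
        clf.train(1, List.of("MOVIE", "starring", "is", "who"));
        System.out.println(clf.predict(List.of("MOVIE", "rating", "what")));  // prints 0
        System.out.println(clf.predict(List.of("who", "starring", "MOVIE"))); // prints 1
    }
}
```

Scoring in log space avoids underflow on long questions, and the +1 smoothing keeps a token never seen with a template from driving that template's probability to zero.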
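+# Data
+- mysql (/data/movie_data_import.sql)
+- neo4j (first export the mysql data to csv files, then import them into neo4j. This makes it easy to compare how the two databases handle relationships; a graph database is better suited to relationship-heavy data. Alternatively, import the files in the /data/import.rar archive into neo4j directly)
+ ```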
 
-<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-core -->
-<dependency>
-    <groupId>org.apache.spark</groupId>
-    <artifactId>spark-core_2.12</artifactId>
-    <version>2.4.0</version>
-</dependency>
-<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib -->
-<dependency>
-    <groupId>org.apache.spark</groupId>
-    <artifactId>spark-mllib_2.12</artifactId>
-    <version>2.4.0</version>
-    <scope>runtime</scope>
-</dependency>
+ Locate the neo4j installation path and create an import directory under D:\neo4j-community-3.4.0\
+ The full path is D:\neo4j-community-3.4.0\import
+ This is needed because neo4j imports csv files from its default import directory, ...\import
 
 
+ // Import nodes: movie genres (note the type conversion)
+ LOAD CSV WITH HEADERS FROM "file:///genre.csv" AS line
+ MERGE (p:Genre{gid:toInteger(line.gid),name:line.gname})
 
 
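-If the downloaded demo does not run locally, downgrade the versions yourself so that the version of your local Spark environment matches the version of the Spark dependency jars in the pom!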
+ // Import nodes: actor information
+ LOAD CSV WITH HEADERS FROM 'file:///person.csv' AS line
+ MERGE (p:Person { pid:toInteger(line.pid),birth:line.birth,
+ death:line.death,name:line.name,
+ biography:line.biography,
+ birthplace:line.birthplace})
+
+
+ // Import nodes: movie information
+ LOAD CSV WITH HEADERS FROM "file:///movie.csv" AS line
+ MERGE (p:Movie{mid:toInteger(line.mid),title:line.title,introduction:line.introduction,
+ rating:toFloat(line.rating),releasedate:line.releasedate})
+
+
+ // Import relationships: actedin, who acted in the movie (one-to-many)
+ LOAD CSV WITH HEADERS FROM "file:///person_to_movie.csv" AS line
+ match (from:Person{pid:toInteger(line.pid)}),(to:Movie{mid:toInteger(line.mid)})
+ merge (from)-[r:actedin{pid:toInteger(line.pid),mid:toInteger(line.mid)}]->(to)
+
+ // Import relationships: which genre a movie belongs to (one-to-many)
+ LOAD CSV WITH HEADERS FROM "file:///movie_to_genre.csv" AS line
+ match (from:Movie{mid:toInteger(line.mid)}),(to:Genre{gid:toInteger(line.gid)})
+ merge (from)-[r:is{mid:toInteger(line.mid),gid:toInteger(line.gid)}]->(to)
+ ```
+- Question templates (/data/question)
+- hanLP data (the new data package data-for-1.7.4.zip from https://github.com/hankcs/HanLP/releases)
+- Custom dictionary (unzip /data/自定义词典.zip into the corresponding hanLP directory; see /src/main/resources/application.properties for the exact path)
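The import statements rely on two behaviours worth seeing in isolation: LOAD CSV WITH HEADERS hands each row over as a map from header name to string, and the genre comment warns about type conversion because an id stored as the string "1" will not equal the integer 1 that a later toInteger(...) lookup produces. The Java sketch below models both in memory. It is not Neo4j code; the class and method names are hypothetical, the CSV parser assumes no quoted fields or embedded commas (so it would not cope with real biography text), and its toInteger only approximates Cypher's function for plain integer strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of LOAD CSV WITH HEADERS plus toInteger(); not a Neo4j API.
class CsvWithHeaders {
    /** Splits simple CSV text into one header->value map per data row; every value stays a String. */
    static List<Map<String, String>> parse(String csv) {
        String[] lines = csv.trim().split("\\R");
        String[] header = lines[0].split(",", -1);
        List<Map<String, String>> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            String[] fields = lines[i].split(",", -1);
            Map<String, String> row = new LinkedHashMap<>();
            for (int j = 0; j < header.length; j++) {
                // Missing trailing fields become null in this sketch.
                row.put(header[j], j < fields.length ? fields[j] : null);
            }
            rows.add(row);
        }
        return rows;
    }

    /** Rough analogue of Cypher's toInteger() for integer strings: null when parsing fails. */
    static Integer toInteger(String value) {
        if (value == null) return null;
        try {
            return Integer.valueOf(value.trim());
        } catch (NumberFormatException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        Map<String, String> line = parse("gid,gname\n1,Drama\n2,Comedy").get(0);
        Object raw = line.get("gid");                  // the String "1", as read from the file
        Object converted = toInteger(line.get("gid")); // the Integer 1
        System.out.println(raw.equals(1));       // prints false: a string id never equals an integer id
        System.out.println(converted.equals(1)); // prints true
    }
}
```

This is why the genre, person, and movie imports wrap every id in toInteger(...): the relationship imports MATCH on toInteger(line.pid), toInteger(line.mid), and toInteger(line.gid), so the node ids must have been stored as integers as well.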
+
+
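Both relationship imports follow a MATCH-then-MERGE pattern: MATCH finds the existing Person/Movie (or Movie/Genre) nodes by id, and MERGE creates the relationship only if an identical one is not already there. A minimal in-memory sketch of those semantics, assuming nodes are reduced to their integer ids and an actedin relationship to its (pid, mid) pair; the class and method names are hypothetical, and this models the behaviour rather than calling Neo4j.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical toy model of:
//   MATCH (from:Person{pid:..}),(to:Movie{mid:..})
//   MERGE (from)-[r:actedin{pid:..,mid:..}]->(to)
class RelationshipImport {
    /** Applies person_to_movie rows ({pid, mid}) to the graph; each relationship is a [pid, mid] pair. */
    static Set<List<Integer>> importActedIn(Set<Integer> personIds, Set<Integer> movieIds,
                                            List<int[]> rows, Set<List<Integer>> graph) {
        for (int[] row : rows) {
            int pid = row[0];
            int mid = row[1];
            // MATCH: if either endpoint node is missing, the row matches nothing and creates nothing.
            if (!personIds.contains(pid) || !movieIds.contains(mid)) continue;
            // MERGE: a set only adds the relationship if an identical one is not already present.
            graph.add(List.of(pid, mid));
        }
        return graph;
    }

    public static void main(String[] args) {
        Set<Integer> persons = Set.of(1, 2);
        Set<Integer> movies = Set.of(10);
        List<int[]> rows = List.of(new int[]{1, 10}, new int[]{2, 10}, new int[]{3, 10}, new int[]{1, 10});
        Set<List<Integer>> graph = new HashSet<>();
        importActedIn(persons, movies, rows, graph);
        System.out.println(graph.size()); // prints 2: pid 3 has no Person node, the duplicate row is merged
        importActedIn(persons, movies, rows, graph);
        System.out.println(graph.size()); // prints 2: re-running the import adds nothing
    }
}
```

Two consequences follow: a row whose pid or mid has no matching node silently creates nothing, so the node imports must run first, and re-running the relationship script does not duplicate relationships.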
+# Spark environment on Windows
+
+- [https://pan.baidu.com/s/1ZIsh5yRChR0zAJXnUui4jw](https://pan.baidu.com/s/1ZIsh5yRChR0zAJXnUui4jw)
+
+# References
+- [Intelligent question-answering system based on a movie knowledge graph (Part 8): The Finale](https://blog.csdn.net/appleyk/article/details/80422055)
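The notes removed in this commit carried two practical rules. First, the GitHub alert flags Spark versions >= 1.0.0 and <= 2.3.2. Second, the local Spark environment must match the Spark jar version in the pom, and the artifact suffix (for example `_2.12`) fixes the Scala version. A hypothetical pre-flight check that encodes those rules over Maven coordinates could look like the sketch below. The class, the method names, and the "groupId:artifactId:version" input format are assumptions, and it only handles plain numeric versions such as 2.4.0.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical pre-flight check for the Spark entries in the pom; not part of the project.
class SparkDependencyCheck {
    /** Compares plain numeric versions such as "2.3.2" and "2.4.0" component by component. */
    static int compareVersions(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0;
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) return Integer.compare(xi, yi);
        }
        return 0;
    }

    /** True for versions in the range quoted from the GitHub alert: >= 1.0.0 and <= 2.3.2. */
    static boolean isFlaggedVulnerable(String version) {
        return compareVersions(version, "1.0.0") >= 0 && compareVersions(version, "2.3.2") <= 0;
    }

    /** Scala binary version encoded in an artifactId, e.g. "2.12" for "spark-core_2.12"; null if absent. */
    static String scalaSuffix(String artifactId) {
        int underscore = artifactId.lastIndexOf('_');
        return underscore < 0 ? null : artifactId.substring(underscore + 1);
    }

    /** Checks "groupId:artifactId:version" coordinates; returns the problems found (empty if none). */
    static List<String> check(List<String> coordinates, String localSparkVersion) {
        List<String> problems = new ArrayList<>();
        Set<String> suffixes = new TreeSet<>();
        for (String coordinate : coordinates) {
            String[] parts = coordinate.split(":");
            if (!parts[0].equals("org.apache.spark")) continue; // only Spark artifacts matter here
            String artifactId = parts[1];
            String version = parts[2];
            String suffix = scalaSuffix(artifactId);
            suffixes.add(suffix == null ? "none" : suffix);
            if (isFlaggedVulnerable(version)) {
                problems.add(artifactId + " " + version + " is in the flagged range");
            }
            if (!version.equals(localSparkVersion)) {
                problems.add(artifactId + " " + version + " does not match local Spark " + localSparkVersion);
            }
        }
        if (suffixes.size() > 1) {
            // All spark-* jars must target one Scala version.
            problems.add("mixed Scala suffixes: " + suffixes);
        }
        return problems;
    }

    public static void main(String[] args) {
        List<String> pom = List.of(
                "org.apache.spark:spark-core_2.12:2.4.0",
                "org.apache.spark:spark-mllib_2.12:2.4.0");
        System.out.println(check(pom, "2.4.0")); // prints []
        System.out.println(check(pom, "2.3.2")); // reports two version mismatches
    }
}
```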