Commit 6c967de

Update documentation
1 parent dff4531 commit 6c967de

123 files changed: +7518 -94 lines changed

README.md

+20 -81
@@ -21,107 +21,46 @@ spark-assembly-1.5.2-hadoop2.6.0.jar (download: http://pan.baidu.com/s/1hrSxi
 The SparkLearning project includes data, so downloading it is slow. If you only want some of the folders, you can use svn. A project without data was also created on 20160810 for easier downloading: https://github.com/xubo245/SparkLearning_NoData

 # 3. Blog post index: #
-## (1). Spark basics: ##
-Spark Learning 1: running the examples: http://blog.csdn.net/xubo245/article/details/48548079
-Spark Learning 2: fixing the OutOfMemoryError: http://blog.csdn.net/xubo245/article/details/48548507
-Spark Learning 3: SparkPi from the examples: http://blog.csdn.net/xubo245/article/details/50596227
-Spark Learning 4: MissingRequirementError when compiling a .scala file directly with scalac on the cluster (resolved): http://blog.csdn.net/xubo245/article/details/50596822
-Spark Learning 5: sbt problems: http://blog.csdn.net/xubo245/article/details/50603502
-Spark Learning 6: problems caused by mismatched Scala versions: http://blog.csdn.net/xubo245/article/details/50609476
-Spark Learning 7: setting up a local Spark build environment in IDEA and uploading to the cluster to run: http://blog.csdn.net/xubo245/article/details/50789983
-Spark Learning 8: installing Scala 2.10 and a Spark build environment in Eclipse and uploading to the cluster to run: http://blog.csdn.net/xubo245/article/details/50790463
-Spark Learning 9: compiling and packaging the source on Windows: http://blog.csdn.net/xubo245/article/details/51386564
-Spark Learning 10: setting Spark's AppName to the current class name automatically: http://blog.csdn.net/xubo245/article/details/51428158
-Spark Learning 11: converting a Java project imported from Eclipse into a Maven project in IDEA: http://blog.csdn.net/xubo245/article/details/51428502
+## (1). Spark basics: ##
+[SparkBaseLearning](docs/spark/SparkBaseLearning)
+

 ## (2). Spark code: ##
-Spark Code 1: RDDparallelizeSaveAsFile: http://blog.csdn.net/xubo245/article/details/50791485
-Spark Code 2: Transformation: union, distinct, join: http://blog.csdn.net/xubo245/article/details/50792201
-Spark Code 3: Action: reduce, reduceByKey, sorted, lookup, take, saveAsTextFile: http://blog.csdn.net/xubo245/article/details/50800934
-Spark Code 4: the Spark file API and its use on Sogou data: http://blog.csdn.net/xubo245/article/details/50801827
+[SparkCodeLearning](docs/Spark/SparkCodeLearning)


 ## (3). Spark components: MLlib learning ##
-Spark MLlib Learning 1: fixing a KMeans error: http://blog.csdn.net/xubo245/article/details/51007690
-Spark MLlib Learning 2: MovieLensALS (run on the cluster with run-example): http://blog.csdn.net/xubo245/article/details/51264145
-Spark MLlib Learning 3: computing user similarity: http://blog.csdn.net/xubo245/article/details/51428175
-Spark MLlib Learning 4: modifying MovieLensALS from the examples to run locally: http://blog.csdn.net/xubo245/article/details/51429221
-Spark MLlib Learning 5: ALS test (Apache Spark): http://blog.csdn.net/xubo245/article/details/51429365
-Spark MLlib Learning 6: ALS test (Apache Spark, with implicit conversion): http://blog.csdn.net/xubo245/article/details/51429391
-Spark MLlib Learning 7: predicting data with a model trained by ALS with implicit conversion: http://blog.csdn.net/xubo245/article/details/51429490
-Spark MLlib Learning 8: predicting data with a model trained by ALS: http://blog.csdn.net/xubo245/article/details/51429503
-Spark MLlib Learning 9: studying the prediction accuracy of a model trained by ALS: http://blog.csdn.net/xubo245/article/details/51439208
-Spark MLlib Learning 10: modifying MovieLens to predict on the MovieLens 100k data: http://blog.csdn.net/xubo245/article/details/51439491
-Spark MLlib Learning 11: training ALS on the MovieLens one-million-rating (1M) dataset and recommending movies for newly entered user data: http://blog.csdn.net/xubo245/article/details/51439920
-For more, see: https://github.com/xubo245/SparkLearning/tree/master/docs/Spark%20MLlib%E5%AD%A6%E4%B9%A0
+[MLlibLearning](docs\Spark\MLlibLearning)

 ## (4). Spark components: SparkSQL learning ##
-SparkSQL Learning 1: the error "No TypeTag available for Person": http://blog.csdn.net/xubo245/article/details/51153243
-The code repository contains a good deal more SparkSQL material that was never written up as blog posts
+[SparkSQLLearning](docs\Spark\SparkSQLLearning)

 ## (5). Spark components: SparkR learning ##
-SparkR Learning 1: installation and testing: http://blog.csdn.net/xubo245/article/details/51195287
-SparkR Learning 2: submitting the R file dataframe.R to the cluster with spark-submit: http://blog.csdn.net/xubo245/article/details/51199216
-SparkR Learning 3: submitting the R file data-manipulation.R to the cluster with spark-submit: http://blog.csdn.net/xubo245/article/details/51199813
-SparkR Learning 4: setting up an R environment in Eclipse: http://blog.csdn.net/xubo245/article/details/51199918
-SparkR Learning 5: calling R functions (across files): http://blog.csdn.net/xubo245/article/details/51205276
+[SparkRLearning](docs\Spark\SparkRLearning)

 ## (6). Spark components: Spark Streaming learning ##
-Spark Streaming Learning 1: NetworkWordCount: http://blog.csdn.net/xubo245/article/details/51251970
-Spark Streaming Learning 2: StatefulNetworkWordCount: http://blog.csdn.net/xubo245/article/details/51252142
-Spark Streaming Learning 3: using it together with SparkSQL (wordCount): http://blog.csdn.net/xubo245/article/details/51252229
-Spark Streaming Learning 4: HdfsWordCount: http://blog.csdn.net/xubo245/article/details/51254412
+[SparkStreamingLearning](docs\Spark\SparkStreamingLearning)

 ## (7). Spark components: GraphX learning ##
-GraphX Learning 1: introductory example, Property Graph: http://blog.csdn.net/xubo245/article/details/51306975
-GraphX Learning 2: triplets in practice: http://blog.csdn.net/xubo245/article/details/51307037
-GraphX Learning 3: Structural Operators: subgraph: http://blog.csdn.net/xubo245/article/details/51307162
-GraphX Learning 4: Structural Operators: mask: http://blog.csdn.net/xubo245/article/details/51307237
-GraphX Learning 5: random graph generation, message sending with aggregateMessages, and mapreduce operations (with source code analysis): http://blog.csdn.net/xubo245/article/details/51307386
-GraphX Learning 6: random graph generation and displaying out-degree, in-degree, and related information: http://blog.csdn.net/xubo245/article/details/51307641
-GraphX Learning 7: random graph generation and using reduce to find the maximum or minimum out-degree/in-degree/degree: http://blog.csdn.net/xubo245/article/details/51307774
-GraphX Learning 8: random graph generation and the top-K largest in-degrees: http://blog.csdn.net/xubo245/article/details/51308278
-GraphX Learning 8: neighbor sets: http://blog.csdn.net/xubo245/article/details/51308337
-GraphX Learning 9: single-source shortest paths with the pregel function: http://blog.csdn.net/xubo245/article/details/51314928
-GraphX Learning 10: learning and using PageRank (from the examples): http://blog.csdn.net/xubo245/article/details/51315240
-GraphX Learning 11: a PageRank example (PageRankAboutBerkeleyWiki): http://blog.csdn.net/xubo245/article/details/51316151
-GraphX Learning 12: summary of common GraphX operations, SimpleGraphX: http://blog.csdn.net/xubo245/article/details/51316317
-GraphX Learning 13: the ConnectedComponents operation: http://blog.csdn.net/xubo245/article/details/51316654
-GraphX Learning 14: TriangleCount example and analysis: http://blog.csdn.net/xubo245/article/details/51317245
-GraphX Learning 15: large-graph analysis of we-Google.txt: http://blog.csdn.net/xubo245/article/details/51317594
-GraphX Learning 16: shortest paths with ShortestPaths: http://blog.csdn.net/xubo245/article/details/51317892
-GraphX Learning 20: topics still to be studied: http://blog.csdn.net/xubo245/article/details/51317710
+[GraphXLearning](docs\Spark\GraphXLearning)


 ## (8). Spark-Avro learning ##
-Spark-Avro Learning 1: reading AVRO files with SparkSQL: http://blog.csdn.net/xubo245/article/details/51295474
-Spark-Avro Learning 2: reading AVRO files with byDatabricksSparkAvroL: http://blog.csdn.net/xubo245/article/details/51295593
-Spark-Avro Learning 3: storing AVRO files with AvroCompression: http://blog.csdn.net/xubo245/article/details/51295604
-Spark-Avro Learning 4: partitioning when storing AVRO files with AvroWritePartitioned: http://blog.csdn.net/xubo245/article/details/51295627
-Spark-Avro Learning 5: specifying name and namespace when storing AVRO files with AvroReadSpecifyName: http://blog.csdn.net/xubo245/article/details/51295642
-Spark-Avro Learning 6: installation on Ubuntu: http://blog.csdn.net/xubo245/article/details/51295674
-Spark-Avro Learning 7: using Avro from Java (with code generation): http://blog.csdn.net/xubo245/article/details/51295843
-Spark-Avro Learning 8: using Avro from Java (without code generation): Spark-Avro Learning 8: using Avro from Java (without code generation)
-Spark-Avro Learning 9: using Avro in a Scala environment (without code generation): http://blog.csdn.net/xubo245/article/details/51296717
+[SparkAvroLearning](docs\Spark\SparkAvroLearning)

-## (9). Spark ecosystem: Tachyon learning ##
-Tachyon Learning 1: standalone setup and running (Alluxio): http://blog.csdn.net/xubo245/article/details/51318566
-Tachyon Learning 2: reading files from Tachyon in Spark (Alluxio): http://blog.csdn.net/xubo245/article/details/51318863
-Tachyon Learning 3: how the data storage location changes after a machine reboot: http://blog.csdn.net/xubo245/article/details/51322437
-Tachyon Learning 4: record of a failed installation from downloaded source with maven install: http://blog.csdn.net/xubo245/article/details/51322911
-Tachyon Learning 5: several Tachyon problems (unresolved): http://blog.csdn.net/xubo245/article/details/51323101
-Tachyon Learning 6: cluster setup and running (Alluxio): http://blog.csdn.net/xubo245/article/details/51324273
-Tachyon Learning 7: successful installation from downloaded source with Maven: http://blog.csdn.net/xubo245/article/details/51325776
-Tachyon Learning 6: cluster setup problem, the cluster fails to start completely: http://blog.csdn.net/xubo245/article/details/51325834
-Tachyon Learning 7: advantages of Tachyon: http://blog.csdn.net/xubo245/article/details/51326644
+## (9). Spark ecosystem: Alluxio (Tachyon) learning ##
+[AlluxioLearning](docs\Spark\AlluxioLearning)


 ## (10). Spark ecosystem: spark-csv: ##
-Spark-csv Learning 1: installation and simple examples: http://blog.csdn.net/xubo245/article/details/51184946
+[SparkCsvLearning](docs\Spark\SparkCsvLearning)

 ## (11). Spark questions ##
-Spark Question 1: how to find a SparkContext (sc) that has not been closed: http://blog.csdn.net/xubo245/article/details/51173463
-Spark Question 2: does Spark recover after losing an executor?: http://blog.csdn.net/xubo245/article/details/51173493
+[SparkQuestion](docs\Spark\SparkQuestion)
+
+## (12).MLLearning: ##
+
+[MLLearning](docs\Spark\MLLearning)

-## (12). Other: ##
-MLlib learning documents: https://github.com/xubo245/SparkLearning/tree/master/docs/Spark%20MLlibLearning
+## (13). Spark source code study
+[SparkSourceLearning](docs\SparkSourceLearning)

docs/Spark/AlluxioLearning/README.md

+10
@@ -0,0 +1,10 @@
## Spark ecosystem: Tachyon learning ##
Tachyon Learning 1: standalone setup and running (Alluxio): http://blog.csdn.net/xubo245/article/details/51318566
Tachyon Learning 2: reading files from Tachyon in Spark (Alluxio): http://blog.csdn.net/xubo245/article/details/51318863
Tachyon Learning 3: how the data storage location changes after a machine reboot: http://blog.csdn.net/xubo245/article/details/51322437
Tachyon Learning 4: record of a failed installation from downloaded source with maven install: http://blog.csdn.net/xubo245/article/details/51322911
Tachyon Learning 5: several Tachyon problems (unresolved): http://blog.csdn.net/xubo245/article/details/51323101
Tachyon Learning 6: cluster setup and running (Alluxio): http://blog.csdn.net/xubo245/article/details/51324273
Tachyon Learning 7: successful installation from downloaded source with Maven: http://blog.csdn.net/xubo245/article/details/51325776
Tachyon Learning 6: cluster setup problem, the cluster fails to start completely: http://blog.csdn.net/xubo245/article/details/51325834
Tachyon Learning 7: advantages of Tachyon: http://blog.csdn.net/xubo245/article/details/51326644
@@ -0,0 +1,151 @@
Environment
ubuntu 14.04
Spark-1.5.2
Tachyon-0.7.1

1. Spark-1.5.2, which I have been using recently, defaults to Tachyon 0.7.1; this can be checked in the pom.xml of Spark-core.
Tachyon has since been renamed Alluxio, but that does not affect the use of this version.

2. Standalone setup:
Download from:

    https://github.com/Alluxio/alluxio/releases/tag/v0.7.1

The file downloaded is tachyon-0.7.1-bin.tar.gz

3. Installation
Create the configuration file from the template:

    cp conf/tachyon-env.sh.template conf/tachyon-env.sh
Then format and start:

    ./bin/tachyon format
    ./bin/tachyon-start.sh local

Check the installation:

    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ ./tachyon format
    Connecting to localhost as xubo...
    Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
    Formatting Tachyon Worker @ xubo
    Connection to localhost closed.
    Formatting Tachyon Master @ localhost
    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ jps
    12576 SparkSubmit
    13760 Jps
    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ ls
    tachyon tachyon-mount.sh tachyon-start.sh tachyon-stop.sh tachyon-workers.sh
    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ ./tachyon-start.sh local
    Killed 0 processes on xubo
    Killed 0 processes on xubo
    Connecting to localhost as xubo...
    Killed 0 processes on xubo
    Connection to localhost closed.
    [sudo] password for xubo:
    Formatting RamFS: /mnt/ramdisk (1gb)
    Starting master @ localhost
    Starting worker @ xubo
    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ jps
    12576 SparkSubmit
    14088 Jps
    14058 TachyonWorker
    14030 TachyonMaster
    xubo@xubo:~/cloud/tachyon-0.7.1/bin$

The second jps listing shows that the master and the worker have started.
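
The same check can be scripted. The following is a minimal Python sketch, not something Tachyon ships: it looks for the `TachyonMaster` and `TachyonWorker` entries that appear in the `jps` listing above. The helper names (`running_jvm_names`, `missing_daemons`, `current_jps_output`) are made up for illustration, and the last one assumes `jps` from the JDK is on the PATH.

```python
import subprocess

# Daemon names as they appear in the jps listing above.
REQUIRED = ("TachyonMaster", "TachyonWorker")


def running_jvm_names(jps_output):
    """Return the set of process names found in `jps` output ("<pid> <name>")."""
    names = set()
    for line in jps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            names.add(parts[1].strip())
    return names


def missing_daemons(jps_output, required=REQUIRED):
    """Return the required daemon names that are absent from the output."""
    found = running_jvm_names(jps_output)
    return [name for name in required if name not in found]


def current_jps_output():
    """Run `jps` and return what it prints (assumes the JDK is on the PATH)."""
    return subprocess.run(
        ["jps"], capture_output=True, text=True, check=True
    ).stdout
```

Calling `missing_daemons(current_jps_output())` would return an empty list once both daemons are up, and the names of whichever daemons are absent otherwise.
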
You can also check from a browser:

    http://localhost:19999/home
Here localhost stands for your own IP address.

If the page loads, the installation succeeded.
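
Where no browser is available, the page can be probed from a script instead. This is a small sketch under the assumption that the web UI answers plain HTTP at `/home` on port 19999, as in the URL above; `web_ui_url` and `web_ui_is_up` are hypothetical helper names, not Tachyon APIs.

```python
import urllib.request


def web_ui_url(host="localhost", port=19999):
    """Build the web UI address used in the text above."""
    return "http://{}:{}/home".format(host, port)


def web_ui_is_up(host="localhost", port=19999, timeout=3.0):
    """Return True if the page answers with HTTP 200, False on any network error."""
    try:
        with urllib.request.urlopen(web_ui_url(host, port), timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # URLError, HTTPError, refused connections and timeouts are all OSError subclasses.
        return False
```

`web_ui_is_up()` probes the local machine; pass the machine's own IP as `host` when checking from elsewhere.
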

4. Usage
(1) Commands

    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ ./tachyon tfs -help
    Usage: java TFsShell
    [cat <path>]
    [count <path>]
    [ls <path>]
    [lsr <path>]
    [mkdir <path>]
    [rm <path>]
    [rmr <path>]
    [tail <path>]
    [touch <path>]
    [mv <src> <dst>]
    [copyFromLocal <src> <remoteDst>]
    [copyToLocal <src> <localDst>]
    [fileinfo <path>]
    [location <path>]
    [report <path>]
    [request <tachyonaddress> <dependencyId>]
    [pin <path>]
    [unpin <path>]
    [free <file path|folder path>]
    [getUsedBytes]
    [getCapacityBytes]
    [du <path>]

(2) Test case

    xubo@xubo:~/cloud/tachyon-0.7.1$ ./bin/tachyon runTest Basic CACHE_THROUGH
    /default_tests_files/BasicFile_CACHE_THROUGH has been removed
    2016-05-04 22:19:45,075 INFO (MasterClient.java:connect) - Tachyon client (version 0.7.1) is trying to connect with master @ localhost/127.0.0.1:19998
    2016-05-04 22:19:45,104 INFO (MasterClient.java:connect) - User registered with the master @ localhost/127.0.0.1:19998; got UserId 4
    2016-05-04 22:19:45,130 INFO (CommonUtils.java:printTimeTakenMs) - createFile with fileId 3 took 59 ms.
    2016-05-04 22:19:45,153 INFO (WorkerClient.java:connect) - Trying to get local worker host : xubo
    2016-05-04 22:19:45,166 INFO (WorkerClient.java:connect) - Connecting local worker @ xubo/219.219.220.222:29998
    2016-05-04 22:19:45,230 INFO (BlockOutStream.java:get) - Writing with local stream. tachyonFile: /default_tests_files/BasicFile_CACHE_THROUGH, blockIndex: 0, opType: CACHE_THROUGH
    2016-05-04 22:19:45,289 INFO (CommonUtils.java:createBlockPath) - Folder /mnt/ramdisk/tachyonworker/4 was created!
    2016-05-04 22:19:45,294 INFO (LocalBlockOutStream.java:<init>) - /mnt/ramdisk/tachyonworker/4/3221225472 was created! tachyonFile: /default_tests_files/BasicFile_CACHE_THROUGH, blockIndex: 0, blockId: 3221225472, blockCapacityByte: 536870912
    2016-05-04 22:19:45,370 INFO (CommonUtils.java:printTimeTakenMs) - writeFile to file /default_tests_files/BasicFile_CACHE_THROUGH took 239 ms.
    2016-05-04 22:19:45,420 INFO (CommonUtils.java:printTimeTakenMs) - readFile file /default_tests_files/BasicFile_CACHE_THROUGH took 50 ms.
    Passed the test!
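
The interesting parts of this log are the final `Passed the test!` line and the `took N ms` timings. As an illustration only, the sketch below pulls both out of captured `runTest` output; it assumes the log layout shown above, and `summarize_run_test` is a made-up helper name.

```python
import re

# Matches the tail of lines such as
# "... (CommonUtils.java:printTimeTakenMs) - createFile with fileId 3 took 59 ms."
TIMING = re.compile(r" - (?P<what>.+) took (?P<ms>\d+) ms\.?\s*$")


def summarize_run_test(output):
    """Return (passed, timings) where timings is a list of (operation, milliseconds)."""
    passed = False
    timings = []
    for line in output.splitlines():
        if line.strip() == "Passed the test!":
            passed = True
        match = TIMING.search(line)
        if match:
            timings.append((match.group("what"), int(match.group("ms"))))
    return passed, timings
```

Feeding it the log above would report a pass together with the createFile, writeFile and readFile timings.
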

(3) Upload a file from the local machine to the remote file system:

    xubo@xubo:~/cloud/tachyon-0.7.1$ ./bin/tachyon tfs copyFromLocal pom.xml /
    Copied pom.xml to /
The file can then be viewed in the browser.

(4) Display the contents of a file:

    xubo@xubo:~/cloud/test/tachyon$ ../../tachyon-0.7.1/bin/tachyon tfs copyFromLocal 1.txt /
    Copied 1.txt to /
    xubo@xubo:~/cloud/test/tachyon$ ../../tachyon-0.7.1/bin/tachyon tfs cat /1.txt
    hello tachyon
    1
    2
    3
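
The upload and read steps in (3) and (4) can be driven from a script as well. The sketch below simply shells out to the same `bin/tachyon tfs` commands shown above; `tfs_argv` and `upload_and_read` are hypothetical helpers, and the install directory passed as `tachyon_home` is an assumption about where the release was unpacked.

```python
import os
import subprocess


def tfs_argv(tachyon_home, command, *args):
    """Build the argv for `<tachyon_home>/bin/tachyon tfs <command> <args...>`."""
    return [os.path.join(tachyon_home, "bin", "tachyon"), "tfs", command, *args]


def upload_and_read(tachyon_home, local_path, remote_dir="/"):
    """Copy a local file into Tachyon, then return its contents via `tfs cat`."""
    subprocess.run(
        tfs_argv(tachyon_home, "copyFromLocal", local_path, remote_dir),
        check=True,
    )
    remote_path = remote_dir.rstrip("/") + "/" + os.path.basename(local_path)
    result = subprocess.run(
        tfs_argv(tachyon_home, "cat", remote_path),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

With Tachyon running, `upload_and_read(os.path.expanduser("~/cloud/tachyon-0.7.1"), "1.txt")` would mirror the two commands in (4).
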

(5) Reading Tachyon files from Spark: not finished yet. Spark ran into a problem; see the next blog post.
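
For orientation, this is roughly what such a read could look like. It is a PySpark sketch written in Python for brevity and was not run against this setup: it assumes the Tachyon client is on Spark's classpath, that the master listens on port 19998 as in the test log above, and that `/1.txt` was uploaded as in step (4). `tachyon_uri` and `count_lines` are illustrative names.

```python
def tachyon_uri(path, host="localhost", port=19998):
    """Build a tachyon:// URI for a path stored in Tachyon."""
    if not path.startswith("/"):
        path = "/" + path
    return "tachyon://{}:{}{}".format(host, port, path)


def count_lines(path, host="localhost", port=19998):
    """Count the lines of a Tachyon file through Spark (needs PySpark installed)."""
    # Imported here so that tachyon_uri works without PySpark installed.
    from pyspark import SparkContext

    sc = SparkContext(appName="TachyonReadSketch")
    try:
        return sc.textFile(tachyon_uri(path, host, port)).count()
    finally:
        sc.stop()
```

`tachyon_uri("/1.txt")` yields `tachyon://localhost:19998/1.txt`, the form of path that `sc.textFile` would be given.
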

(6) Shut down Tachyon

    hadoop@Master:~/cloud/testByXubo/spark/tachyon/tachyon-0.7.1$ ./bin/tachyon-stop.sh
    Killed 1 processes on Master
    Killed 1 processes on Master
    Connecting to localhost as hadoop...
    Killed 0 processes on Master
    Connection to localhost closed.


Appendix: copy the tachyon script into a bin directory under /usr (this requires a local build; just running start directly does not work):

    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ sudo cp tachyon /usr/
    bin/ games/ include/ lib/ lib64/ local/ sbin/ share/ src/
    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ sudo cp tachyon /usr/local/bin/
    xubo@xubo:~/cloud/tachyon-0.7.1/bin$ ta
    tabs tac tachyon tail tailf tali tangle tap2deb tap2rpm tapconvert tar tarcat targen taskset
This makes the command convenient to use.


References
[1] http://alluxio.org/documentation/v0.7.1/
[2] https://github.com/xubo245/SparkLearning
[3] http://alluxio.org/documentation/v0.7.1/Running-Tachyon-Locally.html
