Given the following DataFrame:
+----+-----+---+-----+
| uid|    k|  v|count|
+----+-----+---+-----+
|   a|pref1|  b|  168|
|   a|pref3|  h|  168|
|   a|pref3|  t|   63|
|   a|pref3|  k|   84|
|   a|pref1|  e|   84|
|   a|pref2|  z|  105|
+----+-----+---+-----+
How can I get the max count for each (uid, k) pair while keeping the v column? The expected result is:
+----+-----+---+----------+
| uid|    k|  v|max(count)|
+----+-----+---+----------+
|   a|pref1|  b|       168|
|   a|pref3|  h|       168|
|   a|pref2|  z|       105|
+----+-----+---+----------+
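To pin down the semantics of the expected result, here is a plain-Scala sketch with no Spark involved. The `Record` case class and the `MaxPerGroup.maxRows` helper are hypothetical names used only for illustration. It keeps every row whose count equals the maximum of its (uid, k) group, so rows tied at the maximum would all survive.

```scala
// Hypothetical record type mirroring the DataFrame columns (plain Scala, no Spark).
final case class Record(uid: String, k: String, v: String, count: Int)

object MaxPerGroup {
  // Keep every record whose count equals the max count of its (uid, k) group,
  // preserving the input order.
  def maxRows(rows: Seq[Record]): Seq[Record] = {
    // Max count per (uid, k) key.
    val maxByKey: Map[(String, String), Int] =
      rows.groupBy(r => (r.uid, r.k)).map { case (key, group) => key -> group.map(_.count).max }
    rows.filter(r => r.count == maxByKey((r.uid, r.k)))
  }

  // Same data as the question's DataFrame.
  val inventory: Seq[Record] = Seq(
    Record("a", "pref1", "b", 168),
    Record("a", "pref3", "h", 168),
    Record("a", "pref3", "t", 63),
    Record("a", "pref3", "k", 84),
    Record("a", "pref1", "e", 84),
    Record("a", "pref2", "z", 105))

  def main(args: Array[String]): Unit =
    maxRows(inventory).foreach(println)
}
```

Running `main` prints the `b`, `h` and `z` records, matching the expected table.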
I can do something like this, but it drops the column "v":
df.groupBy("uid", "k").max("count")
This is a classic use case for window functions (using the over function) or for a join.
Since you've already figured out how to use windows, I'll focus exclusively on the join.
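For comparison, the window variant mentioned above could look like the following minimal sketch. It assumes a local SparkSession, and the object and method names (`MaxWithWindow`, `maxRows`) are hypothetical. The window is partitioned by (uid, k) with no ordering, so `max(...).over(...)` sees the whole group and attaches its maximum to every row; rows whose count reaches it are kept.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, max}

object MaxWithWindow {
  // Attach each (uid, k) group's max count to every row, then keep rows that reach it.
  def maxRows(inventory: DataFrame): DataFrame = {
    val byKey = Window.partitionBy("uid", "k") // no orderBy: the frame is the whole group
    inventory
      .withColumn("max", max(col("count")).over(byKey))
      .where(col("count") === col("max"))
      .drop("max")
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession; adjust master/appName for a real cluster.
    val spark = SparkSession.builder().appName("max-with-window").master("local[*]").getOrCreate()
    import spark.implicits._

    val inventory = Seq(
      ("a", "pref1", "b", 168),
      ("a", "pref3", "h", 168),
      ("a", "pref3", "t", 63),
      ("a", "pref3", "k", 84),
      ("a", "pref1", "e", 84),
      ("a", "pref2", "z", 105)).toDF("uid", "k", "v", "count")

    maxRows(inventory).show()
    spark.stop()
  }
}
```

Unlike the join, this keeps the data in a single pass over one DataFrame, at the cost of a shuffle by (uid, k).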
scala> val inventory = Seq(
| ("a", "pref1", "b", 168),
| ("a", "pref3", "h", 168),
| ("a", "pref3", "t", 63)).toDF("uid", "k", "v", "count")
inventory: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 2 more fields]
scala> val maxCount = inventory.groupBy("uid", "k").max("count")
maxCount: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 1 more field]
scala> maxCount.show
+---+-----+----------+
|uid|    k|max(count)|
+---+-----+----------+
|  a|pref3|       168|
|  a|pref1|       168|
+---+-----+----------+
scala> val maxCount = inventory.groupBy("uid", "k").agg(max("count") as "max")
maxCount: org.apache.spark.sql.DataFrame = [uid: string, k: string ... 1 more field]
scala> maxCount.show
+---+-----+---+
|uid|    k|max|
+---+-----+---+
|  a|pref3|168|
|  a|pref1|168|
+---+-----+---+
scala> maxCount.join(inventory, Seq("uid", "k")).where($"max" === $"count").show
+---+-----+---+---+-----+
|uid|    k|max|  v|count|
+---+-----+---+---+-----+
|  a|pref3|168|  h|  168|
|  a|pref1|168|  b|  168|
+---+-----+---+---+-----+
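The REPL session relies on imports that spark-shell adds automatically (`spark.implicits._` for `toDF` and `$"..."`, and the SQL `functions` for `max`). Outside the shell they have to be written explicitly. A minimal standalone sketch of the same join approach, assuming a local SparkSession and using hypothetical names `MaxWithJoin` and `maxRows`:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, max}

object MaxWithJoin {
  // Same steps as the REPL session: aggregate the max per (uid, k),
  // join it back on the keys, and keep the rows whose count equals it.
  def maxRows(inventory: DataFrame): DataFrame = {
    val maxCount = inventory.groupBy("uid", "k").agg(max("count").as("max"))
    maxCount.join(inventory, Seq("uid", "k")).where(col("max") === col("count"))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local SparkSession; adjust master/appName for a real cluster.
    val spark = SparkSession.builder().appName("max-with-join").master("local[*]").getOrCreate()
    import spark.implicits._

    val inventory = Seq(
      ("a", "pref1", "b", 168),
      ("a", "pref3", "h", 168),
      ("a", "pref3", "t", 63)).toDF("uid", "k", "v", "count")

    maxRows(inventory).show()
    spark.stop()
  }
}
```

Because the filter compares each row against its group's max, every v tied at the maximum is kept, not just one of them.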