diff --git a/content/5_Plat/DC-SRQ.problems.json b/content/5_Plat/DC-SRQ.problems.json index d2fbe4581a..afe5fb2528 100644 --- a/content/5_Plat/DC-SRQ.problems.json +++ b/content/5_Plat/DC-SRQ.problems.json @@ -93,11 +93,11 @@ { "uniqueId": "cf-840D", "name": "Destiny", - "url": "https://codeforces.com/problemset/problem/840/D", + "url": "https://codeforces.com/contest/840/problem/D", "source": "CF", "difficulty": "Hard", "isStarred": false, - "tags": [], + "tags": ["Wavelet"], "solutionMetadata": { "kind": "autogen-label-from-site", "site": "CF" diff --git a/content/6_Advanced/Wavelet.mdx b/content/6_Advanced/Wavelet.mdx index a01b117038..292d955421 100644 --- a/content/6_Advanced/Wavelet.mdx +++ b/content/6_Advanced/Wavelet.mdx @@ -1,19 +1,17 @@ --- id: wavelet title: 'Wavelet Tree' -author: Benjamin Qi +author: Benjamin Qi, Omri Jerbi prerequisites: - RURQ -description: "?" +description: Wavelet trees are data structures that support efficient queries for the k-th minimum element in a range by maintaining a segment tree over values instead of indices" frequency: 0 --- -## Wavelet Tree +# Wavelet Tree +Wavelet trees are data structures that support efficient queries for the k-th minimum element in a range by maintaining a segment tree over values instead of indices. - -Like a segment tree on values rather than indices. - -### Solution - Range K-th Smallest +Suppose you want to support the following queries: + +- Given a range $l$, $r$ count the number of occurrences of value $x$. +- Given a range $l$, $r$ find the $k$ smallest element + +With wavelet tree, you can easily support those queries in $log(M)$ time. +when M is the maximum value in the array. + +## Wavelet tree structure + +A wavelet tree is a binary tree where each node represents a range of values. +The root node covers the entire range, and each subsequent level splits the +range into two halves. + +We are going to maintain a segment tree over values instead of indices. Each segment will contain indices whose values lie within the segment's range. We'll save those indices in a vector. Notice that an index can be in at most $log(M)$ segments + + + +Let's say our array is: $[3,5,3,1,2,2,3,4,5,5]$ +Each node has an array representing the indecies of every number between l and r + +![Wavelet Tree Visualization](./assets/diagram.png) + + +## Solving the first type of query +**Given a range l, r count the number of occurrences of value x.** + +To calculate the number of occurrences from $𝑙$ to $𝑟$, we can use the following +formula: + +$$ +\begin{aligned} +\texttt{occurrences}(l, r) = \texttt{occurrences}(r) - \texttt{occurrences}(l) +\end{aligned} +$$ + +This reduces the problem to counting the number of occurrences in a prefix. + +One way to solve the problem is to go to the leaf node +and perform a binary search for the number of indices less than $𝑟$ +However, let's explore a different approach that can also be extended to the +second type of query. + +**A different way to find the Index of $𝑟$ in the list of vertices** + +Instead of binary searching on the leaf, we update $𝑟$ as we recurse down the +tree. +If we can determine the position (index) of $𝑟$ in the left and right +children of a node. +We can recurse down the tree and determine its position in the leaf node. + +To find the position of $𝑟$ in a node's left and right children, we need to +determine how many indices are smaller than the middle value (mid) and precede +$𝑟$. +This can be done using a prefix sum. + +Let's define: +- $c[i]$ = as 1 if $index[i]$ is smaller than mid otherwise 0 +- $prefixB[i]$ as prefix sum of $c[i]$ + +Formally + +$$ +c[i] = index[i] < mid ? 1 : 0; +prefixB[i] = prefixB[i - 1] + c[i] +$$ + + +To update $r$ as we recurse down, we do the following: +- To know the value of 𝑟 if we recurse left, we use prefixB[r] +- If we recurse right, we use 𝑟 - prefixB[r] + +## Solving the second type of query +**Given a range l, r find the k smallest element** + +We will determine whether the answer for a given node is in the left or the +right segment. +We can calculate how many times the elements within the segments' ranges appear +in our range $(l, r)$ using our first type of query. +Note that this also works for non-leaf nodes using the following formula: + +$$ +\texttt{occurrences}(l, r) = r - l +$$ + +This is similar to counting how many times a value appears up to index 𝑅 in our previous query. We did this by using the new 𝑅 value at the leaf node. But now, we consider the difference between the updated 𝑅 and 𝐿 + + +Therefore, the occurrences of the left node is + +$$ +\texttt{left\_occurrences} = prefixB[r] - prefixB[l] +$$ + + + Note that $\texttt{left\_occurrences}$ is the number of indices between l and r whose value is less than mid + + + +- If $\texttt{left\_occurrences}$ is greater or equal to $k$, it means the $k$-th smallest element is in + the left subtree. Therefore, we update our range and recurse into the left + child +- If $\texttt{left\_occurrences}$ is less than $k$, it means the + $k$-th smallest element is in the right subtree. We adjust k by subtracting + $\texttt{left\_occurrences}$ from $k$, update our range, and recurse into the right child + + + Notice we still update $l, r$ accordingly when we go left or right + + +the answer then will be the value of the node we end up on (leaf) + +## Implemention +**Time Complexity:** $\mathcal{O}(Q \cdot \log(M))$ + + + + +```cpp +#include + +using namespace std; + +struct Segment { + Segment *left = nullptr, *right = nullptr; + int l, r, mid; + bool children = false; + vector> indices; // Index, Value + vector prefixB; + + Segment(int l, int r, const vector> &indices) + : l(l), r(r), mid((r + l) / 2), indices(indices) { + calculate_b(); + } + + // Sparse since values can go up to 1e9 + void update() { + if (children) { return; } + children = true; + if (r - l > 1) { + // Split the indices for left and right child + vector> leftIndices, rightIndices; + partition_copy(indices.begin(), indices.end(), leftIndices.begin(), + rightIndices.begin(), + [this](const pair &elem) { + return elem.second < mid; + }); + + left = new Segment(l, mid, leftIndices); + right = new Segment(mid, r, rightIndices); + } + } + + // Calculates the prefix B + void calculate_b() { + int i = 1; + int j = 0; + prefixB.resize(indices.size() + 1); + for (auto [ind, val] : indices) { + if (val < mid) j++; + prefixB[i++] = j; + } + } + + int find_k_smallest(int a, int b, int k) { + update(); + if (r - l <= 1) { return l; } + + int lb = prefixB[a]; + int lr = prefixB[b]; + int inLeft = + lr - lb; // Amount of values in range (a,b) that are less the mid + + if (k <= inLeft) { + return left->find_k_smallest(lb, lr, k); // Appears in left + } else { + return right->find_k_smallest(a - lb, b - lr, + k - inLeft); // Appears in right + } + } +}; + +int main() { + int n, q; + cin >> n >> q; + + vector> indices; + for (int i = 0; i < n; ++i) { + int v; + cin >> v; + indices.emplace_back(i, v); + } + Segment seg(0, 1e9 + 2, indices); + + for (int i = 0; i < q; ++i) { + int a, b, k; + cin >> a >> b >> k; + k++; + cout << seg.find_k_smallest(a, b, k) << " "; + } +} +``` + + + + + +## Supporting updates + +Let's support updates of type: + - change value at index $i$ to $y$ + +We can traverse down to the leaf to remove the old element and also traverse down to add the new element. + +So what do the updates change? + - + Our indices vector + Our prefix vector + +To change the indices vector, what we can do is, instead of storing a vector, use a set. +Then erasing and adding values becomes easy. - +On the other hand, To change the prefix vector, since each update could change our prefix vector a lot, we can't maintain just the normal vector. What we could do is use a sparse segment tree. + - erasing and inserting can be done by just setting the value to 0 or 1 at the specific index + - querying for a prefix can be done by querying the segment tree from 0 to $i$ +This approach is not memory efficient and requires a segment tree's implementation. +A more friendly approach would be using an order statistics tree. +Such that querying for a prefix would be equivalent to order_of_key($i$) ### Problems diff --git a/content/6_Advanced/Wavelet.problems.json b/content/6_Advanced/Wavelet.problems.json index f8b1fee726..9b7ffd82a1 100644 --- a/content/6_Advanced/Wavelet.problems.json +++ b/content/6_Advanced/Wavelet.problems.json @@ -16,6 +16,42 @@ } ], "wavelet": [ + { + "uniqueId": "cf-840D", + "name": "Destiny", + "url": "https://codeforces.com/contest/840/problem/D", + "source": "CF", + "difficulty": "Normal", + "isStarred": false, + "tags": ["Wavelet"], + "solutionMetadata": { + "kind": "none" + } + }, + { + "uniqueId": "spoj-ILKQUERY2", + "name": "ILKQUERY2", + "url": "https://www.spoj.com/problems/ILKQUERY2/", + "source": "SPOJ", + "difficulty": "Normal", + "isStarred": false, + "tags": ["Wavelet"], + "solutionMetadata": { + "kind": "none" + } + }, + { + "uniqueId": "coci-20-index", + "name": "2021 - Index", + "url": "https://evaluator.hsin.hr/tasks/HONI202167index/", + "source": "COCI", + "difficulty": "Normal", + "isStarred": false, + "tags": ["Wavelet, Persistent Segtree"], + "solutionMetadata": { + "kind": "none" + } + }, { "uniqueId": "kattis-easyquery", "name": "Easy Query", diff --git a/content/6_Advanced/assets/diagram.png b/content/6_Advanced/assets/diagram.png new file mode 100644 index 0000000000..50038d2646 Binary files /dev/null and b/content/6_Advanced/assets/diagram.png differ