
Commit 2954c63

Merge pull request #52 from stackql/feature/deploy-updates-min
Feature/deploy updates min
2 parents 2c88afa + 8cd28a1 commit 2954c63


44 files changed, +1584 −59 lines changed

.gitignore

+5 −1

```diff
@@ -4,12 +4,14 @@ stackql-azure-cloud-shell.sh
 stackql-google-cloud-shell.sh
 stackql
 /.stackql
-.env
+**/.env
 .pypirc
 stack/
 oss-activity-monitor/
 testcreds/
 *.log
+venv/
+.venv/
 
 # Byte-compiled / optimized / DLL files
 __pycache__/
@@ -80,3 +82,5 @@ instance/
 
 # Sphinx documentation
 docs/_build/
+
+.DS_Store
```

.vscode/settings.json

+5

```diff
@@ -0,0 +1,5 @@
+{
+    "files.associations": {
+        "*.iql": "sql"
+    }
+}
```

CHANGELOG.md

+14

```diff
@@ -1,5 +1,19 @@
 # Changelog
 
+## 1.8.3 (2025-02-08)
+
+- Added walkthrough for databricks bootstrap on aws.
+- Bugfix for export variables on dry run.
+
+## 1.8.2 (2025-01-16)
+
+- Added timing output for `build`, `test` and `teardown` operations
+
+## 1.8.1 (2025-01-11)
+
+- Added `uuid()` templating function
+- Exports evaluation optimization for teardown operations
+
 ## 1.8.0 (2024-11-09)
 
 - Added option for command specific authentication
```

README.md

+6 −1

````diff
@@ -241,8 +241,13 @@ stackql-deploy --help
 
 To get started with **stackql-deploy**, install it locally using pip:
 
-```
+```bash
+python3 -m venv venv
+source venv/bin/activate
 pip install -e .
+# ...
+deactivate
+rm -rf venv/
 ```
 
 ### To Remove the Locally Installed Package
````
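As an aside, after an editable install like the one in this hunk, a quick sanity check is simply to invoke the CLI; the `--help` flag is the same one that appears in the hunk context line above:

```bash
# inside the activated venv, confirm the editable install put the CLI on PATH
stackql-deploy --help
```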

cicd/setup/setup-env.sh

+18

```diff
@@ -0,0 +1,18 @@
+#!/usr/bin/env bash
+
+CURRENT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
+
+export REPOSITORY_ROOT="$(realpath ${CURRENT_DIR}/../..)"
+
+python -m venv ${REPOSITORY_ROOT}/.venv
+
+source ${REPOSITORY_ROOT}/.venv/bin/activate
+
+pip install -r ${REPOSITORY_ROOT}/requirements.txt
+
+cd ${REPOSITORY_ROOT} && python setup.py install
+
+chmod +x examples/databricks/all-purpose-cluster/sec/*.sh
+
+pip freeze
+
```
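Because this script creates and activates a virtual environment, it is presumably intended to be sourced rather than executed, so that the activation persists in the calling shell; a minimal usage sketch under that assumption:

```bash
# assumed invocation; the script resolves the repository root from its own location
source cicd/setup/setup-env.sh

# the .venv should now be active in this shell and the CLI installed
command -v stackql-deploy
```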
@@ -0,0 +1,252 @@
# `stackql-deploy` example project for `databricks`

This exercise bootstraps a databricks / aws tenancy using `stackql-deploy`. It is an important platform bootstrap use case and we are excited to perform it with the `stackql` toolchain. We hope you enjoy it and find it valuable; please drop us a note with your forthright opinion and check out our issues on github.

## A word of caution

Please take the greatest care in performing this exercise; it will incur expenses, as it involves creating (and destroying) resources which cost money. Please be aware that you **must** cancel your databricks subscription after completing this exercise, otherwise you will incur ongoing expenses. That is, do **not** skip the section [Cancel databricks subscription](#cancel-databricks-subscription). We strongly advise that you verify all resources are destroyed at the conclusion of this exercise. Web pages and certain behaviours may change, so please be thorough in your verification. We will keep this page up to date on a best effort basis only; ultimately the onus is on you, the resource owner.
## Manual Setup

Dependencies:

- aws account created.
- Required clickops to set up databricks on aws:
  - Turn on the aws Marketplace `databricks` offering, using [the aws manage subscriptions page](https://console.aws.amazon.com/marketplace/home#/subscriptions), per Figure S1.
  - Follow the suggested setup flow as directed from this page. These clickops steps are necessary at this time for initial account setup. When I followed this flow it created a workspace for me during setup, per Figure S3. We shall not use that workspace and will dispose of it later, because we do not trust auto-created resources out of hand. In the process of creating the databricks subscription, a second aws account is created.
  - Copy the databricks account id from basically any web page in the databricks console, by clicking on the user icon at the top right, where the UI provides a copy shortcut, per Figure U1. Save this locally for later use, expanded below.
  - We need the aws account id that was created for the databricks subscription. It is not exactly heralded by the web pages, nor is it actively hidden. It can be captured in a couple of places, including the databricks storage account created in the subscription flow, per Figure XA1. Copy and save this locally for later use, expanded below.
  - Create a service principal to use as a "CICD agent", using the page shown in Figure S4.
  - Grant the CICD agent the account admin role, using the page shown in Figure S5.
  - Create a secret for the CICD agent, using the page shown in Figure S6. At the time you create this, you will need to safely store the client secret and client id, as prompted by the web page. These will be used below.
Now, it is convenient to use environment variables for context. Note that in our example only one aws account is relevant; this is not always the case for an active professional, so while `DATABRICKS_AWS_ACCOUNT_ID` is the same as `AWS_ACCOUNT_ID` here, it need not always be the case. Create a file in the path `examples/databricks/all-purpose-cluster/sec/env.sh` (relative to the root of this repository) with contents of the form:

```bash
#!/usr/bin/env bash

export ASSETS_AWS_REGION='us-east-1' # or wherever you want
export AWS_ACCOUNT_ID='<your aws account ID>'
export DATABRICKS_ACCOUNT_ID='<your databricks account ID>'
export DATABRICKS_AWS_ACCOUNT_ID='<your databricks aws account ID>'

# These need to be created by clickops under [the account level user management page](https://accounts.cloud.databricks.com/user-management).
export DATABRICKS_CLIENT_ID='<your clickops created CICD agent client id>'
export DATABRICKS_CLIENT_SECRET='<your clickops created CICD agent client secret>'

# These can be skipped if you run on [aws cloud shell](https://docs.aws.amazon.com/cloudshell/latest/userguide/welcome.html).
export AWS_SECRET_ACCESS_KEY='<your aws secret per aws cli>'
export AWS_ACCESS_KEY_ID='<your aws access key id per aws cli>'
```
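Before moving on, it can be worth a quick check that the file loads cleanly and every variable is populated; a minimal sketch, assuming the `sec/env.sh` path and variable names above (the `convenience.sh` used later presumably sources this file for you):

```bash
# load the secrets file into the current shell and confirm each variable is set,
# without echoing any values
source examples/databricks/all-purpose-cluster/sec/env.sh
for v in ASSETS_AWS_REGION AWS_ACCOUNT_ID DATABRICKS_ACCOUNT_ID DATABRICKS_AWS_ACCOUNT_ID DATABRICKS_CLIENT_ID DATABRICKS_CLIENT_SECRET; do
  if [ -n "${!v}" ]; then echo "$v is set"; else echo "$v is MISSING"; fi
done
```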
## Optional step: sanity checks with stackql

Now, let us do some sanity checks and housekeeping with `stackql`. This is purely optional. From the root of this repository:

```bash
source examples/databricks/all-purpose-cluster/convenience.sh

stackql shell
```

This will start a `stackql` interactive shell. Here are some commands you can run (I will not place output here; that will be shared in a corresponding video):

```sql
registry pull databricks_account v24.12.00279;

registry pull databricks_workspace v24.12.00279;

-- This will fail if accounts, subscription, or credentials are in error.
select account_id FROM databricks_account.provisioning.credentials WHERE account_id = '<your databricks account id>';

select account_id, workspace_name, workspace_id, workspace_status from databricks_account.provisioning.workspaces where account_id = '<your databricks account id>';
```

For extra credit, you can (asynchronously) delete the unnecessary workspace with `delete from databricks_account.provisioning.workspaces where account_id = '<your databricks account id>' and workspace_id = '<workspace id (numeric)>';`, where you obtain the workspace id from the query above. I have noted that due to some response caching it takes a while to disappear from select queries (much longer than its disappearance from the web page), and you may want to bounce the `stackql` session to hurry things along. The caching is not happening on the `stackql` side, but bouncing the session forces a token refresh, which can help bust the cache.
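Spelled out as a pair of statements (these are the same queries quoted above, with the numeric workspace id taken from the select):

```sql
-- find the numeric id of the auto-created workspace to discard
select account_id, workspace_name, workspace_id, workspace_status
from databricks_account.provisioning.workspaces
where account_id = '<your databricks account id>';

-- then issue the (asynchronous) delete, and re-run the select until it disappears
delete from databricks_account.provisioning.workspaces
where account_id = '<your databricks account id>'
and workspace_id = '<workspace id (numeric)>';
```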
## Lifecycle management

Time to get down to business. From the root of this repository:

```bash
source examples/databricks/all-purpose-cluster/convenience.sh

source ./.venv/bin/activate
```

Then, do a dry run (good for catching **some** environmental issues):

```bash
stackql-deploy build \
  examples/databricks/all-purpose-cluster dev \
  -e AWS_REGION=${ASSETS_AWS_REGION} \
  -e AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID} \
  -e DATABRICKS_ACCOUNT_ID=${DATABRICKS_ACCOUNT_ID} \
  -e DATABRICKS_AWS_ACCOUNT_ID=${DATABRICKS_AWS_ACCOUNT_ID} \
  --dry-run
```

You will see a verbose rendition of what `stackql-deploy` intends to do.
Now, let us do it for real:

```bash
stackql-deploy build \
  examples/databricks/all-purpose-cluster dev \
  -e AWS_REGION=${ASSETS_AWS_REGION} \
  -e AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID} \
  -e DATABRICKS_ACCOUNT_ID=${DATABRICKS_ACCOUNT_ID} \
  -e DATABRICKS_AWS_ACCOUNT_ID=${DATABRICKS_AWS_ACCOUNT_ID} \
  --show-queries
```

The output is quite verbose and concludes with:

```
2025-02-08 12:51:25,914 - stackql-deploy - INFO - 📤 set [databricks_workspace_id] to [482604062392118] in exports
2025-02-08 12:51:25,915 - stackql-deploy - INFO - ✅ successfully deployed databricks_workspace
2025-02-08 12:51:25,915 - stackql-deploy - INFO - deployment completed in 0:04:09.603631
🚀 build complete
```

Success!!!
We can also use `stackql-deploy` to assess whether our infra is shipshape:

```bash
stackql-deploy test \
  examples/databricks/all-purpose-cluster dev \
  -e AWS_REGION=${ASSETS_AWS_REGION} \
  -e AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID} \
  -e DATABRICKS_ACCOUNT_ID=${DATABRICKS_ACCOUNT_ID} \
  -e DATABRICKS_AWS_ACCOUNT_ID=${DATABRICKS_AWS_ACCOUNT_ID} \
  --show-queries
```

Again, the output is quite verbose and concludes with:

```
2025-02-08 13:15:45,821 - stackql-deploy - INFO - 📤 set [databricks_workspace_id] to [482604062392118] in exports
2025-02-08 13:15:45,821 - stackql-deploy - INFO - ✅ test passed for databricks_workspace
2025-02-08 13:15:45,821 - stackql-deploy - INFO - deployment completed in 0:02:30.255860
🔍 tests complete (dry run: False)
```

Success!!!
Now, let us tear down our `stackql-deploy` managed infra:

```bash
stackql-deploy teardown \
  examples/databricks/all-purpose-cluster dev \
  -e AWS_REGION=${ASSETS_AWS_REGION} \
  -e AWS_ACCOUNT_ID=${AWS_ACCOUNT_ID} \
  -e DATABRICKS_ACCOUNT_ID=${DATABRICKS_ACCOUNT_ID} \
  -e DATABRICKS_AWS_ACCOUNT_ID=${DATABRICKS_AWS_ACCOUNT_ID} \
  --show-queries
```

This takes its time, is again verbose, and concludes with:

```
2025-02-08 13:24:17,941 - stackql-deploy - INFO - ✅ successfully deleted aws_iam_cross_account_role
2025-02-08 13:24:17,942 - stackql-deploy - INFO - deployment completed in 0:03:21.191788
🚧 teardown complete (dry run: False)
```

Success!!!
## Optional step: verify destruction with stackql

Now, let us do some final sanity checks with `stackql`. This is purely optional. From the root of this repository:

```bash
source examples/databricks/all-purpose-cluster/convenience.sh

stackql shell
```

This will start a `stackql` interactive shell. Here are some commands you can run (I will not place output here):

```sql
registry pull databricks_account v24.12.00279;

registry pull databricks_workspace v24.12.00279;

select account_id, workspace_name, workspace_id, workspace_status from databricks_account.provisioning.workspaces where account_id = '<your databricks account id>';
```
## Cancel databricks subscription

This is **very** important.

Go to [the aws Marketplace manage subscriptions page](https://console.aws.amazon.com/marketplace/home#/subscriptions), navigate to databricks, and cancel the subscription.
## Figures

![Create aws databricks subscription](/examples/databricks/all-purpose-cluster/assets/create-aws-databricks-subscription.png)

**Figure S1**: Create aws databricks subscription.

---

![Awaiting aws databricks subscription resources](/examples/databricks/all-purpose-cluster/assets/awaiting-subscription-resources.png)

**Figure S2**: Awaiting aws databricks subscription resources.

---

![Auto provisioned workspace](/examples/databricks/all-purpose-cluster/assets/auto-provisioned-worskpace.png)

**Figure S3**: Auto provisioned workspace.

---

![Capture databricks account id](/examples/databricks/all-purpose-cluster/assets/capture-databricks-account-id.png)

**Figure U1**: Capture databricks account id.

---

![Capture cross databricks aws account id](/examples/databricks/all-purpose-cluster/assets/capture-cross-databricks-aws-account-id.png)

**Figure XA1**: Capture cross databricks aws account id.

---

![Create CICD agent](/examples/databricks/all-purpose-cluster/assets/create-cicd-agent.png)

**Figure S4**: Create CICD agent.

---

![Grant account admin to CICD agent](/examples/databricks/all-purpose-cluster/assets/grant-account-admin-cicd-agent.png)

**Figure S5**: Grant account admin to CICD agent.

---

![Generate secret for CICD agent](/examples/databricks/all-purpose-cluster/assets/generate-secret-ui.png)

**Figure S6**: Generate secret for CICD agent.

---
