GH action to generate report #199

Draft
wants to merge 94 commits into
base: main
Changes from 67 commits
d16abb2
Inital commit to add GH action to generate report
asmacdo Sep 25, 2024
713d64c
Assume Jupyterhub Provisioning Role
asmacdo Sep 25, 2024
519360c
Fixup: indent
asmacdo Sep 25, 2024
e6f4814
Rename job
asmacdo Sep 25, 2024
72496f4
Add assumed role to update-kubeconfig
asmacdo Sep 25, 2024
8428d3a
No need to add ProvisioningRole to masters
asmacdo Sep 25, 2024
e170b59
Deploy a pod to the cluster, and schedule with Karpenter
asmacdo Sep 25, 2024
bfce046
Fixup: correct path to pod manifest
asmacdo Sep 25, 2024
0993129
Fixup again ugh, rename file
asmacdo Sep 25, 2024
87027d2
Delete Pod even if previous step times out
asmacdo Sep 25, 2024
686f686
Hack out initial du
asmacdo Oct 11, 2024
ff52971
tmp comment out job deployment, test dockerhub build
asmacdo Nov 8, 2024
ca6db89
Fixup hyphens for image name
asmacdo Nov 8, 2024
d228f9d
Write file to output location
asmacdo Nov 8, 2024
68f707f
use kubectl cp to retrieve report
asmacdo Nov 8, 2024
ad6b589
Combine run blocks to use vars
asmacdo Nov 8, 2024
f18e8b7
Mount efs and pass arg to du script
asmacdo Nov 8, 2024
387cfc1
Comment out repo pushing, lets see if the report runs
asmacdo Nov 8, 2024
04b4193
Restrict job to asmacdo for testing
asmacdo Nov 8, 2024
a443081
Sanity check. Just list the directories
asmacdo Nov 8, 2024
99ac264
Job was deployed, but never assigned to node, back to sanity check
asmacdo Nov 8, 2024
6ee89b2
change from job to pod
asmacdo Nov 8, 2024
a8f6ed3
deploy pod to same namespace as pvc
asmacdo Nov 8, 2024
664853b
Use ns in action
asmacdo Nov 8, 2024
e35c974
increase timeout to 60s
asmacdo Nov 8, 2024
a8af5f2
fixup: image name in manifest
asmacdo Nov 8, 2024
024cf6e
increase timeout to 150
asmacdo Nov 8, 2024
49c346e
override entrypoint so i can debug with exec
asmacdo Nov 8, 2024
0191c85
bound /home actually meant path was /home/home/asmacdo
asmacdo Nov 8, 2024
3eb9157
Create output dir prior to writing report
asmacdo Nov 8, 2024
676a00e
pod back to job
asmacdo Nov 11, 2024
c085751
Fixup use the correct job api
asmacdo Nov 11, 2024
3e18a37
Add namespace to pod retrieval
asmacdo Nov 11, 2024
0fa5ece
write directly to pv to test job
asmacdo Nov 11, 2024
e1ecbc3
fixup script fstring
asmacdo Nov 11, 2024
082d3cc
no retry on failure, we were spinning up 5 pods, lets just fail 1 time
asmacdo Nov 11, 2024
d46ea44
Fixup backup limit job not template
asmacdo Nov 11, 2024
965a81e
Initial report
asmacdo Nov 11, 2024
7366d2d
disable report
asmacdo Nov 11, 2024
747f0a4
deploy ec2 instance directly
asmacdo Dec 2, 2024
6156e21
Update AMI image
asmacdo Dec 2, 2024
588892c
update sg and subnet
asmacdo Dec 2, 2024
958630b
terminate even if job fails
asmacdo Dec 2, 2024
e24a666
debug: print public ip
asmacdo Dec 2, 2024
0e58f10
explicitly allocate public ip for ec2 instance
asmacdo Dec 2, 2024
5c28c0e
Add WIP scripts
asmacdo Dec 6, 2024
21811dd
rm old unused
asmacdo Dec 6, 2024
97de713
initial commit of scripts
asmacdo Dec 6, 2024
644f8c3
clean up launch script
asmacdo Dec 6, 2024
e176592
make scripe executable
asmacdo Dec 6, 2024
a101f18
fixup cleanup script
asmacdo Dec 6, 2024
615baf2
add a name to elastic ip (for easier manual cleanup)
asmacdo Dec 6, 2024
bb8f25a
Exit on fail
asmacdo Dec 6, 2024
a8a615a
Add permission for aws ec2 wait instance-status-ok
asmacdo Dec 6, 2024
8157a12
Upload scripts to instance
asmacdo Dec 6, 2024
d3f6f52
explicitly return
asmacdo Dec 6, 2024
f1f687f
output session variables to file
asmacdo Dec 11, 2024
a10bc2a
modify cleanup script to retrieve instance from temporary file
asmacdo Dec 11, 2024
7fd340d
All ec2 persmissions granted
asmacdo Dec 11, 2024
1649b35
Add EFS mount (hardcoded)
asmacdo Dec 11, 2024
30aa60c
No pager for termination
asmacdo Dec 11, 2024
cc845d4
force pseudo-terminal, otherwise hangs after yum install
asmacdo Dec 11, 2024
7854124
Add doublequotes to variable usage for proper expansion
asmacdo Dec 11, 2024
9fbad37
Fixup -t goes on ssh, not scp
asmacdo Dec 11, 2024
4fc9dde
Mount as a single command, since we dont have access to pty
asmacdo Dec 11, 2024
86e645e
add todos for manual steps
asmacdo Dec 11, 2024
c614004
Disable job for now
asmacdo Dec 11, 2024
5a207bc
Update AMI to ubuntu
asmacdo Dec 12, 2024
8ce97ee
Roll back to AL 2023
asmacdo Dec 12, 2024
f7fe412
drop gzip, just write json
asmacdo Dec 13, 2024
e9904c8
include target dir in relative paths
asmacdo Dec 13, 2024
7da2aae
Second script will not produce user report, but directory stats json
asmacdo Dec 13, 2024
41a65ed
inital algorithm hackout
asmacdo Dec 13, 2024
8eb0f06
Clean up and refactor for simplicity
asmacdo Dec 13, 2024
40947ef
Add basic tests
asmacdo Dec 13, 2024
0e9c065
test multiple directories in root
asmacdo Dec 13, 2024
e4794de
comment about [:-1]
asmacdo Dec 13, 2024
ee2c3b1
support abspaths
asmacdo Dec 14, 2024
e1dcd63
[DATALAD RUNCMD] blacken
asmacdo Dec 14, 2024
541f1f3
test propagation with files in all dirs
asmacdo Dec 14, 2024
ac364fb
Write files to disk as they are inspected
asmacdo Dec 15, 2024
05609a1
Comment out column headers in output
asmacdo Dec 15, 2024
7560db2
Write all fields for every file
asmacdo Dec 15, 2024
502ff76
Convert to reading tsv
asmacdo Dec 15, 2024
639f279
Fixup: update test to match tsv-read data
asmacdo Dec 15, 2024
96490e5
update for renamed script
asmacdo Dec 15, 2024
64d69a7
install pip
asmacdo Dec 15, 2024
6f3dae5
install parallel
asmacdo Dec 15, 2024
b38be7d
install dependencies in launch script
asmacdo Dec 15, 2024
f0b0709
Output to tmp, accept only 1 arg, target dir
asmacdo Dec 15, 2024
326bb55
add up sizes
asmacdo Dec 16, 2024
c881287
print useful info as index is created
asmacdo Dec 16, 2024
a3505f9
dont fail if output dir exists
asmacdo Dec 16, 2024
fcd9531
Create a report dict with only relevant stats
asmacdo Dec 16, 2024
64 changes: 1 addition & 63 deletions .aws/terraform-jupyterhub-provisioning-policies.json
@@ -4,69 +4,7 @@
     {
       "Effect": "Allow",
       "Action": [
-        "ec2:AllocateAddress",
-        "ec2:AssociateAddress",
-        "ec2:AssociateRouteTable",
-        "ec2:AssociateVpcCidrBlock",
-        "ec2:AttachInternetGateway",
-        "ec2:AttachNetworkInterface",
-        "ec2:AuthorizeSecurityGroupEgress",
-        "ec2:AuthorizeSecurityGroupIngress",
-        "ec2:CreateInternetGateway",
-        "ec2:CreateLaunchTemplate",
-        "ec2:CreateLaunchTemplateVersion",
-        "ec2:CreateNatGateway",
-        "ec2:CreateNetworkAcl",
-        "ec2:CreateNetworkAclEntry",
-        "ec2:CreateNetworkInterface",
-        "ec2:CreateNetworkInterfacePermission",
-        "ec2:CreateRoute",
-        "ec2:CreateRouteTable",
-        "ec2:CreateSecurityGroup",
-        "ec2:CreateSubnet",
-        "ec2:CreateTags",
-        "ec2:CreateVpc",
-        "ec2:DeleteInternetGateway",
-        "ec2:DeleteLaunchTemplate",
-        "ec2:DeleteLaunchTemplateVersions",
-        "ec2:DeleteNatGateway",
-        "ec2:DeleteNetworkAcl",
-        "ec2:DeleteNetworkAclEntry",
-        "ec2:DeleteNetworkInterface",
-        "ec2:DeleteRoute",
-        "ec2:DeleteRouteTable",
-        "ec2:DeleteSecurityGroup",
-        "ec2:DeleteSubnet",
-        "ec2:DeleteTags",
-        "ec2:DeleteVpc",
-        "ec2:DescribeAddresses",
-        "ec2:DescribeAddressesAttribute",
-        "ec2:DescribeAvailabilityZones",
-        "ec2:DescribeInternetGateways",
-        "ec2:DescribeLaunchTemplateVersions",
-        "ec2:DescribeLaunchTemplates",
-        "ec2:DescribeNatGateways",
-        "ec2:DescribeNetworkAcls",
-        "ec2:DescribeNetworkInterfacePermissions",
-        "ec2:DescribeNetworkInterfaces",
-        "ec2:DescribeRouteTables",
-        "ec2:DescribeSecurityGroupRules",
-        "ec2:DescribeSecurityGroups",
-        "ec2:DescribeSubnets",
-        "ec2:DescribeVpcAttribute",
-        "ec2:DescribeVpcs",
-        "ec2:DetachInternetGateway",
-        "ec2:DetachNetworkInterface",
-        "ec2:DisassociateAddress",
-        "ec2:DisassociateRouteTable",
-        "ec2:DisassociateVpcCidrBlock",
-        "ec2:ModifyNetworkInterfaceAttribute",
-        "ec2:ModifyVpcAttribute",
-        "ec2:ReleaseAddress",
-        "ec2:ReplaceRoute",
-        "ec2:RevokeSecurityGroupEgress",
-        "ec2:RevokeSecurityGroupIngress",
-        "ec2:RunInstances",
+        "ec2:*",
         "ecr-public:GetAuthorizationToken",
         "eks:*",
         "elasticfilesystem:CreateFileSystem",
35 changes: 35 additions & 0 deletions .github/manifests/disk-usage-report-job.yaml
@@ -0,0 +1,35 @@
apiVersion: batch/v1
kind: Job
metadata:
  name: disk-usage-report-job
  namespace: jupyterhub
spec:
  backoffLimit: 0  # No retry on failure
  template:
    metadata:
      labels:
        app: disk-usage-report
    spec:
      containers:
        - name: disk-usage-report
          image: dandiarchive/dandihub-report-generator:latest
          args:
            - "/home/"
          volumeMounts:
            - name: persistent-storage
              mountPath: "/home"
              subPath: "home"
      restartPolicy: Never
      nodeSelector:
        NodeGroupType: default
        NodePool: default
        hub.jupyter.org/node-purpose: user
      tolerations:
        - key: "hub.jupyter.org/dedicated"
          operator: "Equal"
          value: "user"
          effect: "NoSchedule"
      volumes:
        - name: persistent-storage
          persistentVolumeClaim:
            claimName: efs-persist
20 changes: 20 additions & 0 deletions .github/manifests/hello-world-pod.yaml
@@ -0,0 +1,20 @@
# manifests/hello-world-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: hello-world-pod
spec:
  containers:
    - name: hello
      image: busybox
      command: ['sh', '-c', 'echo Hello, World! && sleep 30']
  nodeSelector:
    NodeGroupType: default
    NodePool: default
    hub.jupyter.org/node-purpose: user
  tolerations:
    - key: "hub.jupyter.org/dedicated"
      operator: "Equal"
      value: "user"
      effect: "NoSchedule"

63 changes: 63 additions & 0 deletions .github/scripts/cleanup-ec2.sh
@@ -0,0 +1,63 @@
#!/usr/bin/env bash

set -e
Review comment (Member):
Suggested change
- set -e
+ set -eu


# Load environment variables from the file if they are not already set
ENV_FILE=".ec2-session.env"
Review comment (Member):
Suggested change
- ENV_FILE=".ec2-session.env"
+ ENV_FILE="/run/user/$(id -u)/ec2-session.env"

if [ -f "$ENV_FILE" ]; then
  echo "Loading environment variables from $ENV_FILE..."
  source "$ENV_FILE"
else
  echo "Warning: Environment file $ENV_FILE not found."
fi

# Ensure required environment variables are set
if [ -z "$INSTANCE_ID" ]; then
  echo "Error: INSTANCE_ID is not set. Cannot proceed with cleanup."
  exit 1
fi

if [ -z "$ALLOC_ID" ]; then
  echo "Error: ALLOC_ID is not set. Cannot proceed with cleanup."
  exit 1
fi

# Check for AWS CLI and credentials
if ! command -v aws &>/dev/null; then
  echo "Error: AWS CLI is not installed. Please install it and configure your credentials."
  exit 1
fi

if ! aws sts get-caller-identity &>/dev/null; then
  echo "Error: Unable to access AWS. Ensure your credentials are configured correctly."
  exit 1
fi

# Terminate EC2 instance
echo "Terminating EC2 instance with ID: $INSTANCE_ID..."
if aws ec2 terminate-instances --instance-ids "$INSTANCE_ID" --no-cli-pager; then
  echo "Instance termination initiated. Waiting for the instance to terminate..."
  if aws ec2 wait instance-terminated --instance-ids "$INSTANCE_ID"; then
    echo "Instance $INSTANCE_ID has been successfully terminated."
  else
    echo "Warning: Instance $INSTANCE_ID may not have terminated correctly."
  fi
else
  echo "Warning: Failed to terminate instance $INSTANCE_ID. It may already be terminated."
fi

# Release Elastic IP
echo "Releasing Elastic IP with Allocation ID: $ALLOC_ID..."
if aws ec2 release-address --allocation-id "$ALLOC_ID"; then
  echo "Elastic IP with Allocation ID $ALLOC_ID has been successfully released."
else
  echo "Warning: Failed to release Elastic IP with Allocation ID $ALLOC_ID. It may already be released."
fi

# Cleanup environment file
if [ -f "$ENV_FILE" ]; then
  echo "Removing environment file $ENV_FILE..."
  rm -f "$ENV_FILE"
fi

echo "Cleanup complete."
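The cleanup script above loads a file of `export KEY=VALUE` lines and then refuses to continue unless `INSTANCE_ID` and `ALLOC_ID` are set. A minimal Python sketch of the same load-and-validate step follows; it is not part of the PR. The function names `load_session_env` and `missing_variables` are hypothetical, and the instance and allocation IDs in the demo are made-up placeholders.

```python
import os
import tempfile

# Variables the cleanup script treats as mandatory.
REQUIRED = ("INSTANCE_ID", "ALLOC_ID")


def load_session_env(path):
    """Parse `export KEY=VALUE` lines, skipping blank lines and # comments."""
    values = {}
    with open(path) as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#"):
                continue
            if line.startswith("export "):
                line = line[len("export "):]
            key, sep, value = line.partition("=")
            if sep:  # ignore lines without an '='
                values[key.strip()] = value.strip()
    return values


def missing_variables(values, required=REQUIRED):
    """Return the required names that are absent or empty."""
    return [name for name in required if not values.get(name)]


if __name__ == "__main__":
    # Demo with placeholder IDs written to a throwaway directory.
    with tempfile.TemporaryDirectory() as tmp:
        env_path = os.path.join(tmp, "ec2-session.env")
        with open(env_path, "w") as fh:
            fh.write("# Environment variables for EC2 session\n")
            fh.write("export INSTANCE_ID=i-0123456789abcdef0\n")
            fh.write("export ALLOC_ID=eipalloc-0123456789abcdef0\n")
        print(missing_variables(load_session_env(env_path)))
```

Unlike `source`, this parser never executes the file's contents, which may be preferable if the session file could be edited by something other than the launch script.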
66 changes: 66 additions & 0 deletions .github/scripts/create-file-index.py
@@ -0,0 +1,66 @@
#!/usr/bin/env python3

import os
import time
import json
import sys
import gzip
from datetime import datetime

def list_files_with_metadata(directory, output_file):
Review comment (Member):
Write a simple test (it could probably even live in this file) that populates a directory with nested folders and symlinks, so the ground truth is known and the output can be compared against it.

    # Record the start time
    start_time = time.time()

    # Get the current date and time for indexing
    index_timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")

    files_metadata = []

    for root, dirs, files in os.walk(directory):
Review comment (Member):
FTR, os.walk already seems to do the desired (right) thing and does not follow symlinked folders. We get

In [14]: list(os.walk('/tmp/1234'))
Out[14]: 
[('/tmp/1234', ['infinitum', 'subdir', 'linkgood'], []),
 ('/tmp/1234/subdir', ['subdir2'], ['file']),
 ('/tmp/1234/subdir/subdir2', [], ['file2'])]

for

❯ tree /tmp/1234
/tmp/1234
├── infinitum -> /tmp/1234
├── linkgood -> subdir
└── subdir
    ├── file
    └── subdir2
        └── file2

note: we do not monitor empty folders below

        for name in files:
            filepath = os.path.join(root, name)
            relative_path = os.path.relpath(filepath, directory)

            try:
                metadata = {
                    "path": relative_path,
                    "size": os.path.getsize(filepath),
                    "modified": time.ctime(os.path.getmtime(filepath)),
                    # On Unix, getctime is the last metadata change, not the creation time
                    "created": time.ctime(os.path.getctime(filepath))
                }
                files_metadata.append(metadata)
            except (FileNotFoundError, PermissionError) as e:
                print(f"Skipping {filepath}: {e}")

    # Record the end time and calculate the duration
    end_time = time.time()
    duration = end_time - start_time

    # Prepare the output data with additional metadata
    output_data = {
        "index_timestamp": index_timestamp,
        "duration_seconds": duration,
        "files": files_metadata
    }

    # Compress and write the output data to a .json.gz file
    with gzip.open(output_file, "wt", encoding="utf-8") as gz_file:
        json.dump(output_data, gz_file, indent=4)

    print(f"Indexing completed. Compressed results written to {output_file}")

# Ensure the script is called with the required arguments
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python script.py <directory_to_index> <output_json_gz_file>")
        sys.exit(1)

    directory_to_index = sys.argv[1]
    output_json_gz_file = sys.argv[2]

    # Ensure the output filename ends with .json.gz for clarity
    if not output_json_gz_file.endswith(".json.gz"):
        output_json_gz_file += ".json.gz"

    list_files_with_metadata(directory_to_index, output_json_gz_file)
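
The review comments on this file ask for a test against a known directory layout and note that `os.walk` does not descend into symlinked folders. One possible shape for such a test is sketched below. It is not the test the PR eventually added: `index_relative_paths` is a hypothetical stand-in for the walk loop in `list_files_with_metadata`, and the fixture recreates the `/tmp/1234` tree from the review comment inside a temporary directory. It assumes a platform where unprivileged symlink creation works (for example, Linux or macOS).

```python
import os
import tempfile


def index_relative_paths(directory):
    """Stand-in for the PR's walk loop: collect file paths relative to `directory`."""
    paths = []
    # os.walk defaults to followlinks=False, so symlinked directories are
    # listed in the directory names but never descended into.
    for root, _dirs, files in os.walk(directory):
        for name in files:
            paths.append(os.path.relpath(os.path.join(root, name), directory))
    return sorted(paths)


def build_fixture(base):
    """Recreate the reviewer's layout: a nested subdir plus two directory symlinks."""
    os.makedirs(os.path.join(base, "subdir", "subdir2"))
    for rel in (
        os.path.join("subdir", "file"),
        os.path.join("subdir", "subdir2", "file2"),
    ):
        with open(os.path.join(base, rel), "w") as fh:
            fh.write("x")
    os.symlink(base, os.path.join(base, "infinitum"))     # self-referencing loop
    os.symlink("subdir", os.path.join(base, "linkgood"))  # link to a sibling directory


def test_symlinked_dirs_are_not_followed():
    with tempfile.TemporaryDirectory() as base:
        build_fixture(base)
        expected = sorted([
            os.path.join("subdir", "file"),
            os.path.join("subdir", "subdir2", "file2"),
        ])
        # Each real file appears once; nothing is duplicated through the links.
        assert index_relative_paths(base) == expected
```

The function is named so that pytest would collect it, and it can also be called directly. Because the fixture includes a self-referencing link, a regression that enabled `followlinks=True` would show up as extra paths or a walk that does not terminate.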

133 changes: 133 additions & 0 deletions .github/scripts/launch-ec2.sh
@@ -0,0 +1,133 @@
#!/usr/bin/env bash

set -e
Review comment (Member):
Suggested change
- set -e
+ set -eu


# Check for AWS CLI and credentials
if ! command -v aws &>/dev/null; then
  echo "Error: AWS CLI is not installed. Please install it and configure your credentials."
  exit 1
fi

if ! aws sts get-caller-identity &>/dev/null; then
  echo "Error: Unable to access AWS. Ensure your credentials are configured correctly."
  exit 1
fi

# Set variables
AWS_REGION="us-east-2"
# TODO document that this key needs to be created
KEY_NAME="dandihub-gh-actions"
# TODO create if DNE
# allow gh-actions to ssh into ec2 job instance from anywhere
SECURITY_GROUP_ID="sg-0bf2dc1c2ff9c122e"
# TODO retrieve subnet id (public, created by dandi-hub eks-dandihub-public-us-east-2a)
SUBNET_ID="subnet-0f544cca61ccd2804"
AMI_ID="ami-088d38b423bff245f"
EFS_ID="fs-02aac16c4c6c2dc27"
LOCAL_SCRIPTS_DIR=".github/scripts"
REMOTE_SCRIPTS_DIR="/home/ec2-user/scripts"
MOUNT_POINT="/mnt/efs"
ENV_FILE=".ec2-session.env"
Review comment (Member):
I dislike that this just dumps into a hidden file in my current directory.
Could we write it into some tmpdir instead, and maybe set an environment variable with its path so the cleanup script can pick it up from the environment?

Suggested change
- ENV_FILE=".ec2-session.env"
+ ENV_FILE="/run/user/$(id -u)/ec2-session.env"

But then we might want to add logic to react if the file already exists, since that would likely mean that cleanup did not remove it and the instance might still be running, etc.


# Ensure the environment file is writable
echo "# Environment variables for EC2 session" > "$ENV_FILE"
echo "# Auto-generated by launch script on $(date)" >> "$ENV_FILE"

# Run EC2 instance
echo "Launching EC2 instance..."
export INSTANCE_ID=$(aws ec2 run-instances \
  --image-id "$AMI_ID" \
  --count 1 \
  --instance-type t3.micro \
  --key-name "$KEY_NAME" \
  --security-group-ids "$SECURITY_GROUP_ID" \
  --subnet-id "$SUBNET_ID" \
  --tag-specifications "ResourceType=instance,Tags=[{Key=Name,Value=dandihub-gh-actions}]" \
  --query 'Instances[0].InstanceId' \
  --output text)

if [ -z "$INSTANCE_ID" ]; then
  echo "Error: Failed to launch EC2 instance."
  exit 1
fi
echo "Instance ID: $INSTANCE_ID"
echo "export INSTANCE_ID=$INSTANCE_ID" >> "$ENV_FILE"

# Wait for instance to initialize
echo "Waiting for instance to reach status OK..."
aws ec2 wait instance-status-ok --instance-ids "$INSTANCE_ID"

# Allocate Elastic IP
echo "Allocating Elastic IP..."
export ALLOC_ID=$(aws ec2 allocate-address \
  --tag-specifications "ResourceType=elastic-ip,Tags=[{Key=Name,Value=dandihub-gh-actions-eip}]" \
  --query 'AllocationId' \
  --output text)

if [ -z "$ALLOC_ID" ]; then
  echo "Error: Failed to allocate Elastic IP."
  exit 1
fi
echo "Elastic IP Allocation ID: $ALLOC_ID"
echo "export ALLOC_ID=$ALLOC_ID" >> "$ENV_FILE"

# Associate Elastic IP with instance
echo "Associating Elastic IP with instance..."
export EIP_ASSOC=$(aws ec2 associate-address \
  --instance-id "$INSTANCE_ID" \
  --allocation-id "$ALLOC_ID" \
  --query 'AssociationId' \
  --output text)

if [ -z "$EIP_ASSOC" ]; then
  echo "Error: Failed to associate Elastic IP."
  exit 1
fi

# Get Elastic IP address
export PUBLIC_IP=$(aws ec2 describe-addresses \
  --allocation-ids "$ALLOC_ID" \
  --query 'Addresses[0].PublicIp' \
  --output text)

echo "Elastic IP Address: $PUBLIC_IP"
echo "export PUBLIC_IP=$PUBLIC_IP" >> "$ENV_FILE"

# Upload scripts to EC2 instance
echo "Uploading scripts to EC2 instance..."
# Test scp directly: under `set -e`, a separate `$?` check would never see a failure
if scp -i "$EC2_SSH_KEY" -o "StrictHostKeyChecking=no" \
  "$LOCAL_SCRIPTS_DIR/produce-report.py" "$LOCAL_SCRIPTS_DIR/create-file-index.py" \
  ec2-user@"$PUBLIC_IP":"$REMOTE_SCRIPTS_DIR/"; then
  echo "Scripts uploaded successfully to $REMOTE_SCRIPTS_DIR on the instance."
else
  echo "Error: Failed to upload scripts to the instance."
  exit 1
fi

# TODO automate
# eks-dandihub-efs sg is created by dandi-hub install
# this sg needs to accept incoming 2049 from the sg created for this ec2
# sg-061d875722e569724 - eks-dandihub-efs
# aws ec2 authorize-security-group-ingress \
# --group-id sg-061d875722e569724 \
# --protocol tcp \
# --port 2049 \
# --source-group $SECURITY_GROUP_ID

# Mount EFS on the EC2 instance
echo "Mounting EFS on the EC2 instance..."
ssh -i "$EC2_SSH_KEY" -o "StrictHostKeyChecking=no" ec2-user@"$PUBLIC_IP" \
"sudo yum install -y amazon-efs-utils && \
sudo mkdir -p $MOUNT_POINT && \
sudo mount -t efs $EFS_ID:/ $MOUNT_POINT && \
echo '$EFS_ID:/ $MOUNT_POINT efs defaults,_netdev 0 0' | sudo tee -a /etc/fstab && \
echo 'EFS mounted at $MOUNT_POINT'"

# Output SSH command for convenience
echo "To connect to your instance, use:"
echo "ssh -i \$EC2_SSH_KEY ec2-user@$PUBLIC_IP"

echo "Environment variables saved to $ENV_FILE."
echo "Run 'source $ENV_FILE' to restore the environment variables."
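
The review comment on `ENV_FILE` raises two ideas: keep the session file in a per-user runtime or temporary directory, and react if a file from an earlier run is still there. The Python sketch below illustrates both under stated assumptions; it is not code from the PR. `session_env_path` and `create_session_file` are hypothetical names, and the sketch assumes `XDG_RUNTIME_DIR` points at the per-user runtime directory (typically `/run/user/<uid>` on systemd-based systems), falling back to the system temp directory otherwise.

```python
import os
import tempfile


def session_env_path(filename="ec2-session.env"):
    """Prefer the per-user runtime dir (XDG_RUNTIME_DIR), else the system temp dir."""
    base = os.environ.get("XDG_RUNTIME_DIR") or tempfile.gettempdir()
    return os.path.join(base, filename)


def create_session_file(path):
    """Create the session file exclusively; fail if a previous run left one behind."""
    try:
        # Mode "x" raises FileExistsError instead of truncating an existing file.
        with open(path, "x") as fh:
            fh.write("# Environment variables for EC2 session\n")
    except FileExistsError:
        raise RuntimeError(
            f"{path} already exists; a previous instance may still be running"
        ) from None
    return path
```

Exclusive creation makes a leftover file a hard error rather than something silently overwritten, which matches the reviewer's concern that a stale file probably means cleanup never ran.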