feat: add temporary result storage and smoke test options #39

peterfoldes · 2025-06-03T21:16:44Z

$ time uv run python tests/smoke.py \
  --api-key-file ~/prod.key --wide --debug --region aws-us-west-2 --api-endpoint api.cloud.wherobots.com --version latest -s --single --presigned-url \
  "SELECT timestamp, geometry FROM wherobots_pro_data.osm.osm_all_nodes LIMIT 100"

2025-09-04 19:41:43.207 INFO                 root: Query 57f7d53f-e70d-46b8-9371-5652ba0e2b01 succeeded; full message is {'kind': 'state_updated', 'execution_id': '57f7d53f-e70d-46b8-9371-5652ba0e2b01', 'state': 'succeeded', 'result_uri': 'https://wbts-temp-user-data-aws-us-west-2-prod.s3.amazonaws.com/dfqlwcrxlk/vit9eoc0upejpv/sql-session-queries/57f7d53f-e70d-46b8-9371-5652ba0e2b01/part-00000-105dd55e-be15-49e6-aa7d-4c7235e4b9e8-c000.snappy.parquet?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Credential=ASIAUZT33PSSQSDR7UJH%2F20250905%2Fus-west-2%2Fs3%2Faws4_request&X-Amz-Date=20250905T024143Z&X-Amz-Expires=14400&X-Amz-SignedHeaders=host&X-Amz-Security-Token=IQoJb3JpZ2luX2VjEAMaCXVzLXdlc3QtMiJGMEQCICxfgi59w3A%2Ff4VXttIugjySYZXr1LGy8AK25q9Z8TYhAiBl7nVYTcpZYZnqoEioaJVLX%2FvY%2FXU5U%2Bnk26OKaHjCryqUBQhsEAAaDDMyOTg5ODQ5MTA0NSIMXbLUBVLio2dduVgyKvEENgCTkoulZfix31oz%2FkdIDXzI9zGVs3L1bllkfcFEKdZwvsytjTwilFq06mbeVxXQKQ%2F8CL2bP8MZaPtrI6nAlCIvRNX8s6%2FL3Unr5ALXwoqCUtdH3F%2BI7NpFbGk3OJgfrqep4yg39A8ZjwygcyPlcGbjTEg8hK1iKKuJjk4n%2FIEchf7QU4lnIhIvAKUZRurM7qLfvmWYIQ4QmPHLuvKB8Ih8ufEwZdlZ%2B%2BOFQbXkbXgF1EswISR86ZbBWJ7TjdVE6r88b%2FmheYmH3Emaw7O7qZhZdI70kWOXVNKmAbEupLilsBhh5ieo8s2NwLyenPzwaz121CeNSN6GyYRGh32Oae2EaKvXgVI9lTd4NsTMAURwo3MkURKBA5PL49ehjoF90z5Ls9zMpZqlGe%2FV8G5gg%2FgUzLFsWM4uPiTp3NVjG9ivl9j2MAfOdJCrbakJzZSMFsbJ6CeBXRRr0eJgtYo3VvT1ufiN4VMhcWWQN5YZk1eKArOGCcn7jEoys8tDbXl5HSroM%2BjmLca5N%2F8fzx1fRs%2FmXxanA6JjGEUJ4SjD8GYrAesW%2FtQPJ6dBhtOe7Q2aPgFwW50LznQKgV26%2BUH27OnpC0IfhReEEDoKr%2F9vgyd0Dj5TPwuURqH6rU7ZgGbiyI4K6O%2BVS8zmeWQ5pIalPFpwhj6lIjpukWzv%2Bi4ikv7V2Vug520Qw94%2FpmEUpQIaUvnA1KiX2ZGkyIAQ8i9kOF7%2BIe%2BQk%2B5rZIVfaVsG71lbXcruHZM5%2Biw7E3KZBrrQ0%2F2I2ypv62NKFrrlwgBGHaxXKn7O3X0CZnORwVgStjAalmKLT5Z3lGvYJ4mu7fcZLzDnm%2BnFBjqcAaQWSFs6fQlOvKOOzfHNCGwusyNUmfRs%2BbeqE0h3UtBbtPaQxQMCB%2F98dekptGlR59O%2BDHB4gYH7q0jXpEd3qo9m%2Fw%2BEkqbmuyMM72wxasYHAtIqptnC5OquQ7eAtBgOau5XAst0yfRg%2BaLIRCMlcMhncH9Db7ufa%2BvKhLy3a%2FgnNDCtd43X1KdRJrtjjQgWDCOUFA8UK%2BAmMJ5aIA%3D%3D&X-Amz-Signature=1f7e7d97114bd658da86c2f04d768e3bd8f64402e3b62a3017eb26474bbd0664', 'size': 2899}

sfishel18 · 2025-06-05T20:50:22Z

i'm getting some errors running this sql statement: SHOW SCHEMAS IN wherobots_open_data

it fails if i try to use either geoparquet or geojson as the storage format (ex. poetry run python tests/smoke.py --api-endpoint api.staging.wherobots.com --api-key-file ./api_key.txt "SHOW SCHEMAS IN wherobots_open_data" --store --storage-format geoparquet)

sfishel18 · 2025-06-05T20:53:34Z

so iiuc this change doesn't provide a way to get access to the results URI from the sdk, correct? but you can opt into behind-the-scenes temp storage. which you would do because it increases the limit on the number of results you can get back? and/or you get more efficient cursoring through the results because sql-session can pull them from its s3 cache based on the execution id?

sfishel18 · 2025-06-06T03:25:52Z

another interesting behavior i discovered, when passing boolean values in the execute sql web socket message, they have to be stringified. i.e. this is valid:

{
    kind: "execute_sql"
    execution_id: "bf1ffba3-bc64-485d-8bbe-475264884676"
    statement: "..."
    store: {
      "format": "geoparquet",
      "single": "true",
      "generate_presigned_url": "true"
    }
}

but this is not (note no quotes around true in store.single and store.generate_presigned_url:

{
    kind: "execute_sql"
    execution_id: "bf1ffba3-bc64-485d-8bbe-475264884676"
    statement: "..."
    store: {
      "format": "geoparquet",
      "single": true,
      "generate_presigned_url": true
    }
}

wherobots/db/connection.py

mpetazzoni · 2025-09-05T02:55:51Z

wherobots/db/connection.py

@@ -134,6 +137,9 @@ def __listen(self) -> None:
                # On a state_updated event telling us the query succeeded,
                # ask for results.
                if kind == EventKind.STATE_UPDATED:
+                    logging.info(


Do you want to keep that?

Yes and no. Since result_uri and size is currently part of the state updated event (and not the retrieve result) we might want to have a way to surface that info programatically. Right now it's an easy way to just log dump it, but honestly it's a poor way to surface it.

We should talk about really making it part of a request result call even if it's another extra round or have a nicer way to handle it from the client side.

Indeed. My recommendation is to extend the interface of Connection beyond what the DB-API interface calls for, as that part is specifically for "DB driver"-like work where you get query results directly through the Cursor from the cursor() method.

Maybe we provide another, different type of handler:

def cursor(self) -> Cursor: return Cursor(self.__execute_sql, self.__cancel_query) def stored(self, format: ResultFormat, single: bool, presigned: bool) -> StoredResult: return StoredResult(self.__execute_sql, self.__cancel_query, format, single, presigned)

Which exposes:

class StoredResult: def __init__(self, exec_fn, cancel_fn, format: ResultFormat, single: bool, presigned: bool): # ... def execute(self, operation: str, parameters: dict[str, Any] = None): # ... call exec_fn with the correct store{} parameters def get_result_url(self) -> str: # ...

wherobots/db/connection.py

peterfoldes requested review from sfishel18, mpetazzoni and zongsizhang June 3, 2025 21:16

sfishel18 previously approved these changes Jun 5, 2025

View reviewed changes

sfishel18 reviewed Jun 9, 2025

View reviewed changes

wherobots/db/connection.py Show resolved Hide resolved

mpetazzoni requested changes Jul 1, 2025

View reviewed changes

wherobots/db/connection.py Outdated Show resolved Hide resolved

wherobots/db/connection.py Show resolved Hide resolved

peterfoldes added 3 commits September 3, 2025 09:29

feat: add temporary result storage and smoke test options

b391677

feat: make result store a dataclass

43899d6

chore: increment version

d31f4e7

peterfoldes dismissed sfishel18’s stale review via d31f4e7 September 5, 2025 02:51

peterfoldes force-pushed the peter/result_store branch from 63fdf92 to d31f4e7 Compare September 5, 2025 02:51

peterfoldes requested a review from mpetazzoni September 5, 2025 02:51

mpetazzoni reviewed Sep 5, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: add temporary result storage and smoke test options #39

feat: add temporary result storage and smoke test options #39

Uh oh!

peterfoldes commented Jun 3, 2025 •

edited

Loading

Uh oh!

sfishel18 commented Jun 5, 2025

Uh oh!

sfishel18 commented Jun 5, 2025

Uh oh!

sfishel18 commented Jun 6, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mpetazzoni Sep 5, 2025

Uh oh!

peterfoldes Sep 5, 2025

Uh oh!

mpetazzoni Sep 5, 2025

Uh oh!

Uh oh!

Uh oh!

feat: add temporary result storage and smoke test options #39

Are you sure you want to change the base?

feat: add temporary result storage and smoke test options #39

Uh oh!

Conversation

peterfoldes commented Jun 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sfishel18 commented Jun 5, 2025

Uh oh!

sfishel18 commented Jun 5, 2025

Uh oh!

sfishel18 commented Jun 6, 2025

Uh oh!

Uh oh!

Uh oh!

Uh oh!

mpetazzoni Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

peterfoldes Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

mpetazzoni Sep 5, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

peterfoldes commented Jun 3, 2025 •

edited

Loading