Replies: 1 comment
-
Might be worth trying -> a PR with a proposal is a better way to discuss such things, and by preparing a PR proposal you might find out more limitations/issues, or that it is actually easy.
-
Hello!
I've been trying to help my colleagues set up `SqlToS3Operator` recently, and together we've found that this operator can't handle big tables correctly. It uses the `get_pandas_df` method to read the full table into RAM first, then loads it to S3, optionally splitting it into multiple files if the `max_rows_per_file` argument is provided. The problem is that this logic is not suitable for big tables, but it can be fixed with relatively little effort.

Given that the `SqlToS3Operator._get_hook()` method is designed to return a `DbApiHook` instance, and that the latter has a `get_pandas_df_by_chunks` method, isn't it only natural to use that method instead of `get_pandas_df` when `max_rows_per_file` is specified for the `SqlToS3Operator`?
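To make the proposal concrete, here is a minimal sketch of what the chunked path could look like. It assumes the existing `DbApiHook.get_pandas_df_by_chunks` and `S3Hook.load_file` APIs; the helper name `copy_query_results_to_s3_in_chunks`, the CSV output, and the key naming scheme are illustrative and not the operator's actual implementation.

```python
from __future__ import annotations

from tempfile import NamedTemporaryFile

from airflow.providers.amazon.aws.hooks.s3 import S3Hook
from airflow.providers.common.sql.hooks.sql import DbApiHook


def copy_query_results_to_s3_in_chunks(
    sql_hook: DbApiHook,
    s3_hook: S3Hook,
    query: str,
    bucket: str,
    key_prefix: str,
    max_rows_per_file: int,
) -> None:
    """Stream query results to S3 chunk by chunk instead of reading the
    whole table into RAM with get_pandas_df()."""
    chunks = sql_hook.get_pandas_df_by_chunks(query, chunksize=max_rows_per_file)
    for file_no, df in enumerate(chunks):
        # Each chunk holds at most max_rows_per_file rows, so peak memory
        # stays bounded by the chunk size rather than the table size.
        with NamedTemporaryFile(suffix=".csv") as tmp_file:
            df.to_csv(tmp_file.name, index=False)
            s3_hook.load_file(
                filename=tmp_file.name,
                key=f"{key_prefix}_{file_no}.csv",  # illustrative key scheme
                bucket_name=bucket,
                replace=True,
            )
```

In the operator itself, this chunked path would presumably only kick in when `max_rows_per_file` is set, with the current `get_pandas_df` behaviour kept as the default otherwise.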