-
Notifications
You must be signed in to change notification settings - Fork 60
Add async support #169
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I think a separate client class make sense. As for everything else, I think an approach like that of https://github.com/zytedata/python-zyte-api would be best. |
asyncio.get_running_loop() should be fine, but I'm not sure about interop with Twisted in Scrapy. Because asyncio is still optional for Scrapy, many users would use Twisted, and there might be conflicts asyncio with Twisted. Still I'm not sure we need to worry too much about it because there are several use cases of python-scrapinghub and many are outside Scrapy. If async would be optional users can simply opt-out and use blocking version without worries. I guess main problem will be that async API will be breaking for many applications. You will have to call await before each method call, users would have to update their code. It would be nice to have just one way of doing things, and make library async by default, but for backward compatibility it might make more sense to have async and non async at the same time. This will add complexity and require more development effort to maintain. So maybe indeed separate client would be better? Then we could phase out this library and replace it with new client. |
That's right, my ideas is to keep offering the old interface and allow people to migrate to the new one if they want.
Do you think it would be better to create a new library instead? I was reviewing the code and I found lot of legacy code which I'm not sure if it's still used anywhere. That would make adding new features inside this package much more difficult. |
Using the asyncio reactor in Scrapy should not be an issue, and we are promoting the asyncio Twisted reactor in Scrapy at the moment (the I don’t think maintaining both interfaces, sync and async, in the same library, will add complexity. If anything, it will remove it. We can implement new features only in the async interface, to avoid duplicate work. Or we can port them to the sync interface but have the sync interface basically call the asyncio code. In any case, I would keep it in the same library. |
I think if the interface is the same or similar, it makes sense to have it a single library. If API is significantly different, then a separate library looks good to me. It also allows to have separate releases, etc., and it makes it easier to justify not supporting all the features from the day 1, which could be an expectation from the users. I'm thinking about an example of scrapy-autoextract library, where we provided 2 ways of integration, and it seems it caused nothing but confusion. |
Well, another point for the separate library is that it gives us a chance for rebranding, i.e. |
If we scope the lib to the APIs of Scrapy Cloud then it should be Also, given that we are still using the endpoints |
Maybe, you could consider Niquests as part of the solution? Its a drop-in replacement for Requests, with both sync and async interfaces (and more...). |
Scrapy now supports calling async methods from many components. It would be great to provide the option to call Scrapinghub API using async methods to avoid either blocking calls or having to work with twisted deferreds and threads.
In order to do that, there are several design decisions that needs to be taken:
Define Async interface
AsyncScrapinghubClient
that only provides support for async methods?How to implement async methods
Possible obstacles?
If we want to use python-scrapinghub in standalone mode, will it create their own task loop? or should it expect it to be already created?
If we run it inside Scrapy, could this cause conflict issues?
I don't have much experience designing async libs so It would be great if someone with a better understanding could provide some pointers here.
The text was updated successfully, but these errors were encountered: