Question: Passing endpoint as a string for curl_cffi.requests #75

Open
ericy1000 opened this issue Mar 24, 2025 · 5 comments
Labels
question Further information is requested

Comments

@ericy1000

I've been trying to figure this out for hours and am hitting a brick wall.
Wondering if you have an idea on a solution?

Situation:

  • I use a Python library for web scraping that relies on "curl_cffi.requests" instead of plain "requests".
  • Per my research, "curl_cffi.requests" cannot be passed an HTTPAdapter object (aka your APIGateway). Instead it needs a proxy_url passed as a dictionary (or as a string, since the library formats a proxy_url string into the appropriate dictionary).
  • I could rewrite the scraping library to utilize the "requests" library, but this is a situation I'd prefer to avoid.

Question: How can I get a URL that can be passed to "curl_cffi.requests" from the APIGateway object?

My Findings so far:

  • If I just want to use the endpoint in a web browser, I take gateway.endpoints[0] (the first item in the list), prepend "https://" and append "/ProxyStage/" to visit the website in question (https://xxxxxxxxxxx.execute-api.us-east-2.amazonaws.com/ProxyStage/).
  • However, using the endpoint this way doesn't work when I plug it into the proxy_url of curl_cffi.requests.
  • In addition, if I call it in a web browser, the IP it reports is my own (vs. the random IP of APIGateway). The random IP works just fine when I mount APIGateway on a requests session: when I call "https://api64.ipify.org" through the HTTPAdapter object mounted on the session, I get a different IP every time, as I should.
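One likely reason the endpoint fails as a proxy_url: the ProxyStage URL seems to act as a reverse proxy for a single site, not as a forward proxy that a client can tunnel through. Under that assumption, the request has to be addressed to the gateway itself, with the target path appended after /ProxyStage/. Below is a minimal sketch of that rewrite. The helper name `to_gateway_url` and the placeholder endpoint are hypothetical, not part of the library.

```python
from urllib.parse import urlsplit

STAGE = "ProxyStage"  # stage name as quoted in this thread


def to_gateway_url(endpoint: str, target_url: str) -> str:
    """Rewrite target_url so it is requested through the gateway endpoint.

    endpoint is a bare hostname, as in gateway.endpoints[0].
    Hypothetical helper for illustration only.
    """
    parts = urlsplit(target_url)
    # Keep the path and query of the target; the scheme and host are
    # replaced by the gateway endpoint.
    url = f"https://{endpoint}/{STAGE}/{parts.path.lstrip('/')}"
    if parts.query:
        url += "?" + parts.query
    return url


# Placeholder endpoint; a real one comes from gateway.endpoints
endpoint = "abc123.execute-api.us-east-2.amazonaws.com"
print(to_gateway_url(endpoint, "https://api64.ipify.org/?format=json"))
```

The resulting URL can be requested by any HTTP client, with no proxy setting involved.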

Thank you for any help or insight you can provide (even if it's simply that you can't get a proxy_url from the APIGateway and I need to rewrite the scraping library to use requests).
Love your library; hugely powerful and simplifies things immensely.

@Ge0rg3 Ge0rg3 added the question Further information is requested label Mar 24, 2025
@Ge0rg3
Owner

Ge0rg3 commented Mar 24, 2025

Hi @ericy1000, thanks for using the project.

I'll take a longer look at this later - but yes, visiting the ProxyStage URL is enough to work. When you are testing api64.ipify.org, you will only see rotated IPs if the ApiGateway instance you created was explicitly for "https://api64.ipify.org" - each ProxyStage URL is scoped to a single domain.

Hope this helps

@ericy1000
Author

Thank you!

Yes, I create the API gateway for specific domains that I'm trying to scrape (or in ipify.org's case, test library functionality).
Curiously, when I plug the ProxyStage URL into Chrome, it returns the same IP no matter how many times I refresh. I even tried it in Edge with the same result, so it appears that the ProxyStage URL rotates IPs when used through the HTTPAdapter, but not when called from a web browser.
I'm not sure what's happening behind the scenes. Suffice it to say, it's not core to the problem I'm trying to solve (i.e. getting this to work as a proxy_url).
I'll probably try again tomorrow when I'm fresh and see if anything changes.

@Ge0rg3
Owner

Ge0rg3 commented Mar 24, 2025

Hi @ericy1000, visiting the ProxyStage URLs directly should definitely rotate IPs - I have just verified this directly with webhook.site.

Please can you show how you are initialising the ApiGateway and getting the URL?

@ericy1000
Author

Here is the code implemented in VSCode and as mentioned, the IP rotates correctly.

Image

Here is how I generate an unclosed APIGateway (which I manually close later in AWS Console)

Image
When I use the API gateway URL and add /ProxyStage/ to the end to use in a web browser for testing purposes, this is what I get

Image
The IP matches my current one (I'm connected through a VPN so no privacy issues here on my IP being visible)

Image

I've refreshed the browser several times (not shown) and even used a different browser, and I get the same result.

I read somewhere that some VPNs will forward your real IP address so maybe that's what's happening when you use the endpoint in a web browser, but not when you use it as part of an HTTPAdapter for the requests library.

This line of questioning makes me wonder: if I somehow get the APIGateway to work as a proxy_url string in curl_cffi.requests, will it still forward my actual IP? I plan on testing this out and will report back, but I'm just thinking out loud at this point.

@ericy1000
Author

Updating, as I now realize why the API endpoint works fine when I call it with requests (random IP) but returns my own IP when I open it in a web browser.
Per the 3rd line of the readme: "X-Forwarded-For headers are automatically randomised and applied unless given. This is because otherwise, AWS will send the client's true IP address in this header."
Doh!

Still trying to figure out how to use this library with curl_cffi.requests, which does not accept an HTTPAdapter input.
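One possible direction, sketched under assumptions rather than confirmed by the maintainer: skip the HTTPAdapter and reproduce its two steps by hand. That means rewriting the URL to the ProxyStage endpoint and sending a randomised forwarded-for header yourself. The header name `X-My-X-Forwarded-For` is recalled from the library's adapter source and should be checked against the installed version. The helpers `build_gateway_request` and `fetch_via_gateway` are hypothetical names.

```python
import ipaddress
import random
from urllib.parse import urlsplit

# Assumption: the gateway maps this header back to X-Forwarded-For.
# Verify the name against the requests-ip-rotator version you have installed.
FORWARDED_HEADER = "X-My-X-Forwarded-For"


def random_ipv4() -> str:
    """Return a random IPv4 address as a dotted-quad string."""
    return str(ipaddress.IPv4Address(random.randint(0, 2**32 - 1)))


def build_gateway_request(endpoint: str, target_url: str):
    """Return (url, headers) for sending target_url through the gateway.

    endpoint is a bare hostname, as in gateway.endpoints[0].
    """
    parts = urlsplit(target_url)
    url = f"https://{endpoint}/ProxyStage/{parts.path.lstrip('/')}"
    if parts.query:
        url += "?" + parts.query
    return url, {FORWARDED_HEADER: random_ipv4()}


def fetch_via_gateway(endpoint: str, target_url: str, **kwargs):
    """Send a GET through the gateway with curl_cffi (hypothetical wrapper)."""
    # Imported lazily so the helpers above work without curl_cffi installed.
    from curl_cffi import requests as curl_requests

    url, headers = build_gateway_request(endpoint, target_url)
    # Caller-supplied headers take precedence over the generated one.
    headers.update(kwargs.pop("headers", None) or {})
    return curl_requests.get(url, headers=headers, **kwargs)
```

A call might look like `fetch_via_gateway(gateway.endpoints[0], "https://api64.ipify.org", impersonate="chrome")`. This only helps if the scraping library lets you control the URL and headers it sends. If it builds its requests internally, its request call would still need to be wrapped or patched.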


2 participants