HTTP Error 429 when fetching tweets #262

Open
erno98 opened this issue Mar 19, 2020 · 12 comments

Comments

@erno98

erno98 commented Mar 19, 2020

I have a long list of keywords (around 700). I want to fetch all tweets matching them since February, with no other criteria. I immediately get hit with "An error occured during an HTTP request: HTTP Error 429: Too Many Requests", yet when I open the given link in a browser, everything works fine.
I tried fetching one-day periods only (for example 01-02-2020 to 02-02-2020, and so on), but I still get the same error. Any ideas how to solve it? I tried sleeping the script after the error, but even an hour of waiting doesn't seem to make any difference.

After some waiting, the script runs for around 10% of the tweets and then hits the error again.
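
For reference, the "sleep after the error" idea is often written as a retry loop with growing pauses. The sketch below is only an illustration, not a fix confirmed in this thread: `fetch_with_backoff`, its parameters, and the `fetch` callable are hypothetical names, and it assumes the failure reaches your code as a `urllib.error.HTTPError`. If the scraper exits the process instead of raising, the wrapper would have to intercept that instead.

```python
import time
import urllib.error


def fetch_with_backoff(fetch, max_attempts=5, base_delay=60.0, sleep=time.sleep):
    """Call fetch() and retry with exponentially growing pauses on HTTP 429."""
    for attempt in range(max_attempts):
        try:
            return fetch()
        except urllib.error.HTTPError as err:
            # Only rate-limit responses are retried; the last attempt re-raises.
            if err.code != 429 or attempt == max_attempts - 1:
                raise
            # Pauses grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

The `sleep` parameter exists only so the pauses can be replaced in a dry run; in normal use the default `time.sleep` applies.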

@rbkhb

rbkhb commented Mar 20, 2020

I have the same problem with this and all other non-API tweet scrapers at the moment. You can collect about 14,000 tweets before hitting the request limit.

@tiaringhio

Same problem here. Do you happen to know how long it takes for that limit to reset? @rbkhb

@rbkhb

rbkhb commented Mar 20, 2020

Haven't figured that out, no

@valentin-pecten

I have the same problem and can confirm the 14,000-tweet limit. I was able to retry after a couple of minutes (5 or less); I still need to check the exact time.

@tiaringhio

I found a workaround. It's not ideal, but it works; maybe you can help me improve it:

import datetime
import json
import time

import GetOldTweets3 as got  # assumed: the package this issue was filed against

# count, tweet_list and add_to_list() come from the rest of the script (not shown here).

# First one-day window: 2020-02-29 (since) to 2020-03-01 (until)
date_start = datetime.datetime(2020, 2, 29)
date_until = datetime.datetime(2020, 3, 1)

for i in range(29):
    # Convert the current window to the string format the query expects
    start_string = date_start.strftime("%Y-%m-%d")
    until_string = date_until.strftime("%Y-%m-%d")
    # Create a custom search term and define the number of tweets
    tweetCriteria = got.manager.TweetCriteria().setQuerySearch(
        'Coronavirus').setSince(start_string).setUntil(until_string).setLang('it').setMaxTweets(count)
    # Call getTweets and save the result in tweets
    print('--- Starting query... ---')
    tweets = got.manager.TweetManager.getTweets(tweetCriteria)
    print('--- Adding to list... ---')
    add_to_list()
    print('--- Writing JSON... ---')
    # Save the list to a JSON file, closing the file handle afterwards
    with open('./JSON/saver_output.json', 'w') as f:
        json.dump(tweet_list, f)
    print('--- Going to sleep... ---\n\n')
    # Wait five minutes before the next day's query
    time.sleep(60*5)
    # Move the window forward by one day
    date_start += datetime.timedelta(days=1)
    date_until += datetime.timedelta(days=1)

Doing it this way, I was able to retrieve almost 120k tweets overnight without any hiccups. I know the code could be much shorter, but I wrote it just before going to bed.
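
One way the date bookkeeping could be shortened is a small generator that yields the since/until strings for each one-day window. This is a sketch only: `day_windows` is a hypothetical helper name, and the start date and the count of 29 days are taken from the snippet above.

```python
import datetime


def day_windows(first_day, days):
    """Yield (since, until) strings for consecutive one-day windows."""
    one_day = datetime.timedelta(days=1)
    for offset in range(days):
        since = first_day + offset * one_day
        # Each window ends exactly one day after it starts.
        yield since.strftime("%Y-%m-%d"), (since + one_day).strftime("%Y-%m-%d")


# The same 29 windows as the loop above, starting on 2020-02-29.
windows = list(day_windows(datetime.date(2020, 2, 29), 29))
print(windows[0], windows[-1])
```

Each pair can then be passed to `setSince` and `setUntil` inside the loop, with the five-minute sleep kept between iterations.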

@ajithex

ajithex commented Apr 7, 2020

Hi,
I think you searched for coronavirus and now have the data. Could you please send it to me?
Thanks in advance
ajithex@gmail.com

@p-dre

p-dre commented Apr 10, 2020

Hi,

I think I found a way to get more than 14,000 tweets per day with a small change in the package itself. You only have to add a sleep after every 14,000 tweets. Combined with a loop over the dates and proxy rotation, this works very well for me.
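
The proxy rotation mentioned here is not shown in the thread. A minimal sketch of the idea, under stated assumptions, is to pair each date window with the next proxy from a pool and wrap around when the pool runs out. `assign_proxies` is a hypothetical helper, the addresses are placeholders from a documentation-only IP range, and how the chosen proxy is handed to the scraper is left open.

```python
import itertools


def assign_proxies(windows, proxies):
    """Pair each date window with the next proxy, wrapping around the pool."""
    # itertools.cycle repeats the pool forever; zip stops when windows run out.
    return list(zip(windows, itertools.cycle(proxies)))


# Placeholder values for illustration only.
example_windows = ["2020-02-29", "2020-03-01", "2020-03-02"]
example_proxies = ["203.0.113.10:8080", "203.0.113.11:8080"]

for window, proxy in assign_proxies(example_windows, example_proxies):
    print(window, "via", proxy)
```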

@erno98
Author

erno98 commented Apr 10, 2020

Hey @p-dre, that's a nice solution. However, I've run into another problem: what if a given search query exceeds the 14k limit within a single day?

@asif-faizan

Hi,

I think I found a way to get more than 14,000 tweets per day with a small change in the package itself. You only have to add a sleep after every 14,000 tweets. Combined with a loop over the dates and proxy rotation, this works very well for me.

Because I am going to write my master's thesis about coronavirus using Twitter data, I am interested to know what your plan is, so feel free to contact me:
paul.drecker@stud.uni-due.de

Can you please share how you use proxies and which proxy provider you use?

@p-dre

p-dre commented Apr 20, 2020

@erno98 If you look inside the package, you will find a loop over the batches. I added a sleep there after every 14,000 tweets.
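
The package's internal loop is not shown in this thread, so the following is only a sketch of the idea under stated assumptions: `throttled` is a hypothetical wrapper, `batches` stands in for whatever the package iterates over, the 14,000 limit is the figure reported above, and the 300-second pause follows the roughly five-minute waits others mentioned.

```python
import time


def throttled(batches, limit=14000, pause=300, sleep=time.sleep):
    """Yield tweets from batches, pausing each time `limit` more have been yielded."""
    since_pause = 0
    for batch in batches:
        for tweet in batch:
            yield tweet
            since_pause += 1
        # The counter is checked once per batch, so a pause can start slightly past the limit.
        if since_pause >= limit:
            sleep(pause)
            since_pause = 0
```

Because the counter runs across batches rather than across days, the same pause would also apply when a single day's query returns more than 14,000 tweets.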

@roy601912008

Hi, could you share your code with me? I really want to know how to set up the sleep after 14,000 tweets. I have just started programming. Many thanks!

@luyang1210

luyang1210 commented Jul 10, 2020

Hi,
Could you share the program with me via luyang1210@gmail.com? I have come across the same issue and really want to solve it. Many thanks!

9 participants