Page crashes after 4,500 Quora answers downloaded #97

vttoth · 2018-10-14T16:47:05Z

I am trying to download my Quora answers (over 6,000). When the count gets to 4505, the page crashes (Chrome crash, the "aw snap" message.)

eloquence · 2018-10-14T21:15:09Z

Thanks for the report, trying to reproduce now.

eloquence · 2018-10-14T23:54:13Z

Hi Viktor, the browser crash was most likely due to running out of memory. I can see if I can improve memory management during the extension run, but in the meantime, ensuring sufficient free memory before attempting the download should fix the issue.

How much is sufficient? I was able to download 6,072 answers for you on a machine with 16GB RAM, but I did have to end all other applications before it would go all the way through (it did crash on the first run, with an out of memory error from the operating system).

I do have the 6,072 answers in JSON format if that would be helpful and would be happy to email them to you, just ping me at eloquence AT gmail DOT com.

vttoth · 2018-10-15T00:55:49Z

Thanks for the response. This machine (Windows 10 64-bit) actually has 32 GB of RAM, yet Chrome crashes (with plenty of free RAM remaining). But I was able to download my stuff just fine on a Linux machine (also 32 GB). In fact, I just came back to report that fact when I saw your message. Thanks for the quick support!

vttoth · 2018-12-11T02:43:33Z

I am now running into the same issue on Linux, too. After a little less than 4,000 answers, Chrome shows "Aw, snap". Chrome is up to date and the machine has 32 GB of RAM; the same issue occurs on a machine with a lot less memory, and on Windows 10 and Windows Server 2016. Suggestion (forgive me if it is just incompatible with the plugin architecture): would it be possible to break up the download into, say, 1,000-answer chunks?

vttoth · 2018-12-11T02:44:52Z

I should have added, Chrome is up-to-date and all other plugins were disabled.

eloquence · 2018-12-11T06:49:39Z

Thanks for the report, and sorry you're now experiencing this issue on all machines. Unfortunately I don't see an obvious way to split the download into chunks. We're basically pretending to keep paging through the content the way a user would, and there does not appear to be any support for offsets in Quora's internal APIs, at least not in a way that I can determine from the highly obfuscated nature of the network requests.
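To illustrate why chunking is hard with this kind of paging, here is a minimal TypeScript sketch. The `Page` type, the `fetchPage` callback and the cursor tokens are hypothetical stand-ins, not Quora's actual API. The point is only that when each response carries nothing but the cursor for the next one, reaching a later chunk means walking through everything before it.

```typescript
// Hypothetical model of cursor-only paging; none of these names come from Quora.
interface Page {
  items: string[];
  nextCursor: string | null; // null means there are no more pages
}

type FetchPage = (cursor: string | null) => Page;

// Walks pages from the start until `skip` items have been passed over,
// then collects up to `take` items. There is no way to jump ahead:
// every earlier page must be requested to obtain the next cursor.
function readChunk(
  fetchPage: FetchPage,
  skip: number,
  take: number,
): { items: string[]; requests: number } {
  const items: string[] = [];
  let cursor: string | null = null;
  let seen = 0;
  let requests = 0;
  do {
    const page: Page = fetchPage(cursor);
    requests++;
    for (const item of page.items) {
      if (seen >= skip && items.length < take) items.push(item);
      seen++;
    }
    cursor = page.nextCursor;
  } while (cursor !== null && items.length < take);
  return { items, requests };
}

// Fake in-memory source, for demonstration only.
function makeFakeSource(total: number, pageSize: number): FetchPage {
  return (cursor) => {
    const start = cursor === null ? 0 : Number(cursor);
    const items: string[] = [];
    for (let i = start; i < Math.min(start + pageSize, total); i++) {
      items.push(`answer-${i}`);
    }
    const next = start + pageSize;
    return { items, nextCursor: next < total ? String(next) : null };
  };
}

// Ask for items 6 and 7 from a 10-item source served 2 per page.
const chunk = readChunk(makeFakeSource(10, 2), 6, 2);
console.log(chunk.items, chunk.requests);
```

In this toy run, fetching a two-item chunk that starts at position 6 still takes four requests, because the first three pages have to be loaded just to get the cursor. With a real page, those earlier loads are also where the memory goes.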

There is one other technical avenue which could work, which is the https://www.quora.com/content set of pages, which is at least segmented by year. The downsides:

- It's only accessible to the logged-in user, which makes it hard for me, for example, to test with larger accounts (I only have a handful of answers on mine).
- Each answer is its own page, which could have its own problems in terms of performance and reliability.

Since that's a possible dead end, and hard for me to test, I'm not going down that road yet, but I would encourage others to try that approach as well.

As far as I can tell, the problem with our current approach is that memory usage keeps growing with each request, even though elements are removed from the DOM as we go. I suspect there are standard optimization techniques we can use to make sure the process frees up more memory as it goes, which would then dramatically reduce the likelihood of an "Aw, snap!" crash. That seems the most fruitful avenue to dig into further, but it would take a few hours of research, so it will take me a while to get to it.
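One technique of that kind, sketched in TypeScript under the assumption that some of the growth comes from results accumulating on the extension side: hand collected answers off in fixed-size batches and drop the references afterwards, rather than holding everything until the end. `BatchBuffer` and its `sink` callback are hypothetical names for illustration, not part of the extension, and this would not help with memory held by the page's own scripts.

```typescript
// Hypothetical batching helper; the names and the sink are illustrative only.
class BatchBuffer<T> {
  private pending: T[] = [];
  private readonly batchSize: number;
  private readonly sink: (batch: T[]) => void;

  constructor(batchSize: number, sink: (batch: T[]) => void) {
    this.batchSize = batchSize;
    this.sink = sink;
  }

  // Adds one item and hands a full batch to the sink as soon as it is ready.
  add(item: T): void {
    this.pending.push(item);
    if (this.pending.length >= this.batchSize) this.flush();
  }

  // Hands any remaining items to the sink and drops the buffer's references.
  flush(): void {
    if (this.pending.length === 0) return;
    const batch = this.pending;
    this.pending = []; // the buffer no longer references the flushed items
    this.sink(batch);
  }

  get size(): number {
    return this.pending.length;
  }
}

// Demonstration: 2,500 fake answers flushed in batches of 1,000.
const batchSizes: number[] = [];
const buffer = new BatchBuffer<string>(1000, (batch) => {
  batchSizes.push(batch.length); // a real sink might write the batch to a file
});
for (let i = 0; i < 2500; i++) buffer.add(`answer-${i}`);
buffer.flush(); // hand over the final partial batch
console.log(batchSizes);
```

The buffer never holds more than one batch, so whatever the sink does with a batch (save it, upload it) determines how long those answers stay in memory.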

If you yourself are interested in poking at the extension, and would like a code walkthrough, please do let me know, and I'd be happy to assist with that.

eloquence · 2019-01-22T06:15:24Z

I did a bit more poking today to see if I can do anything in the extension itself to improve memory usage.

Unfortunately, my preliminary investigation suggests that the increased memory usage as we load more and more answers is caused by the code that Quora itself runs. Beyond just rendering the answers, it holds references to them in memory, which the extension cannot clear out.

Your best bet right now is to use Quark, which is a Firefox extension. It doesn't let you publish your answers to FreeYourStuff.cc, but it does let you download them: https://addons.mozilla.org/en-US/firefox/addon/quark/

Quark does what I'm suggesting above, which is to spider https://www.quora.com/content year-by-year and answer-by-answer. That approach is much less prone to memory leaks. I've taken a quick look at the code, and it doesn't look like it's doing anything evil. :)
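For anyone who wants to experiment with that approach, here is a rough TypeScript sketch of the year-by-year, answer-by-answer loop. It is not taken from Quark's code. The `ContentSource` interface and both of its methods are hypothetical, and the actual fetching and parsing of the content pages is left out entirely and injected as callbacks.

```typescript
// Hypothetical interfaces: the real content pages would need to be
// fetched and parsed; here both steps are injected as callbacks.
interface ContentSource {
  listAnswerUrls(year: number): Promise<string[]>;
  fetchAnswer(url: string): Promise<string>;
}

interface YearResult {
  year: number;
  answers: string[];
  failedUrls: string[];
}

// Processes one year at a time and one answer at a time. Each year's
// results go to `onYear` and are not retained here, so a failure
// partway through loses at most the year in progress.
async function spiderByYear(
  source: ContentSource,
  years: number[],
  onYear: (result: YearResult) => void,
): Promise<void> {
  for (const year of years) {
    const result: YearResult = { year, answers: [], failedUrls: [] };
    const urls = await source.listAnswerUrls(year);
    for (const url of urls) {
      try {
        result.answers.push(await source.fetchAnswer(url));
      } catch {
        result.failedUrls.push(url); // record and continue with the next answer
      }
    }
    onYear(result);
  }
}

// Fake source with placeholder identifiers, for demonstration only.
const fakeSource: ContentSource = {
  async listAnswerUrls(year: number): Promise<string[]> {
    return year === 2017 ? ["a1", "a2"] : ["b1", "broken"];
  },
  async fetchAnswer(url: string): Promise<string> {
    if (url === "broken") throw new Error("simulated failure");
    return `text of ${url}`;
  },
};

const collected: YearResult[] = [];
const done = spiderByYear(fakeSource, [2017, 2018], (r) => {
  collected.push(r);
});
done.then(() => console.log(collected));
```

Because each year is handed off separately, this structure also gives the per-chunk download requested earlier in the thread more or less for free, with the year as the chunk.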

The biggest problem with this approach is still that I can't easily test it with accounts other than my own, as https://www.quora.com/content is per-user, whereas the "answers" URL is public. Since I only have a few answers, I'm worried that if I switch to that approach, I'll lose the ability to test. That said, it may be worth offering it as an experimental option at least.

In any case, if you haven't already done so, it would be useful if you could give Quark a spin and let me know if it works on your account.
