✨ Add shutdown event and save per page option (#102)
* ✨ Add shutdown event and save per page option
* Update documentation and tests
* Lint
* Add docstring
* Bump version
* Update feature list
* Add test for shutdown
-- Option to follow all links indefinitely (Crawler/Spider). WARNING: Do not use yet until https://github.com/roniemartinez/dude/pull/27 has been implemented.
+- Option to follow all links indefinitely (Crawler/Spider).
+- Events - attach functions to startup, pre-setup, post-setup and shutdown events.
+- Option to save data on every page.

 ## Supported Parser Backends

@@ -219,14 +221,6 @@ Here is the summary of features supported by each parser backend.
 Read the complete documentation at [https://roniemartinez.github.io/dude/](https://roniemartinez.github.io/dude/).
 All the advanced and useful features are documented there.

-## Support
-
-This project is at a very early stage. This dude needs some love! ❤️
-
-Contribute to this project by feature requests, idea discussions, reporting bugs, opening pull requests, or through Github Sponsors. Your help is highly appreciated.
+The option `--save-per-page` is best used with events to make sure that connections or file handles are opened
+and closed properly. Check the examples below.
+
 ## Examples

-A more extensive example can be found at [examples/custom_storage.py](https://github.com/roniemartinez/dude/tree/master/examples/custom_storage.py).
+A more extensive example can be found at [examples/custom_storage.py](https://github.com/roniemartinez/dude/tree/master/examples/custom_storage.py) and
 --pages PAGES Maximum number of pages to crawl before exiting (default=1). This is only valid when a navigate handler is defined.
 --output OUTPUT Output file. If not provided, prints into the terminal.
---format FORMAT Output file format. If not provided, uses the extension of the output file or defaults to "json". Supports "json", "yaml/yml", and "csv" but can be extended using the @save()
-decorator.
+--format FORMAT Output file format. If not provided, uses the extension of the output file or defaults to "json". Supports "json", "yaml/yml", and "csv" but can be extended using the @save() decorator.
 --proxy-server PROXY_SERVER
 Proxy server.
 --proxy-user PROXY_USER
 Proxy username.
 --proxy-pass PROXY_PASS
 Proxy password.
 --follow-urls Automatically follow URLs.
+--save-per-page Flag to save data on every page extraction or not. If not, saves all the data at the end. If --follow-urls is set to true, this variable will be automatically set to true.
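The `--save-per-page` help text describes two saving modes and one override. Below is a minimal Python sketch of that behavior under stated assumptions. `run`, `save_jsonl`, and the `StringIO` sink are hypothetical names used only for illustration; they are not dude's internals or API. By default the sketch buffers every record and saves once at the end. With per-page saving it flushes after each page. When URL following is enabled it forces per-page saving, as the help text states.

```python
import io
import json
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def run(
    pages: Iterable[List[Record]],
    save: Callable[[List[Record]], None],
    save_per_page: bool = False,
    follow_urls: bool = False,
) -> int:
    """Hypothetical scrape loop (not dude's code); returns how many times `save` ran."""
    # Documented rule: when --follow-urls is set, --save-per-page is forced on.
    if follow_urls:
        save_per_page = True

    buffered: List[Record] = []
    calls = 0
    for page_records in pages:
        if save_per_page:
            save(page_records)  # flush this page's results right away
            calls += 1
        else:
            buffered.extend(page_records)  # hold everything until the crawl ends
    if not save_per_page and buffered:
        save(buffered)  # single save at the end
        calls += 1
    return calls


if __name__ == "__main__":
    sink = io.StringIO()  # stands in for a file handle opened at startup

    def save_jsonl(records: List[Record]) -> None:
        # Appends one JSON object per line, so repeated calls never overwrite.
        for record in records:
            sink.write(json.dumps(record) + "\n")

    pages = [[{"title": "a"}], [{"title": "b"}, {"title": "c"}]]
    print(run(pages, save_jsonl, follow_urls=True))  # 2
    print(sink.getvalue().count("\n"))  # 3
    sink.close()  # stands in for closing the handle at shutdown
```

With per-page saving, the save function runs once per page, so it has to append rather than overwrite. This is why an output handle opened once at startup and closed at shutdown pairs well with the flag.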
-- Option to follow all links indefinitely (Crawler/Spider). WARNING: Do not use yet until https://github.com/roniemartinez/dude/pull/27 has been implemented.
+- Option to follow all links indefinitely (Crawler/Spider).
+- Events - attach functions to startup, pre-setup, post-setup and shutdown events.
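The startup, pre-setup, post-setup and shutdown events listed above can be pictured as a small hook registry. This is a minimal Python illustration and not dude's actual decorator API. The `Events` class, its `on`/`fire`/`run` methods, and the `visit` callback are hypothetical. Firing `pre_setup` and `post_setup` around each page visit is an assumption about where those hooks run. `shutdown` fires in a `finally` block so cleanup handlers still run when a page fails. That is a design choice of this sketch, not documented behavior.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[], None]
EVENTS = ("startup", "pre_setup", "post_setup", "shutdown")


class Events:
    """Hypothetical hook registry: event name -> handlers in registration order."""

    def __init__(self) -> None:
        self._handlers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        """Decorator that registers a zero-argument handler for one of EVENTS."""
        if event not in EVENTS:
            raise ValueError(f"unknown event: {event!r}")

        def register(func: Handler) -> Handler:
            self._handlers[event].append(func)
            return func

        return register

    def fire(self, event: str) -> None:
        """Call every handler registered for `event`, in registration order."""
        for handler in self._handlers[event]:
            handler()

    def run(self, urls: List[str], visit: Callable[[str], None]) -> None:
        """Assumed lifecycle: startup once, setup hooks around each page, shutdown once."""
        self.fire("startup")
        try:
            for url in urls:
                self.fire("pre_setup")  # assumption: fired before visiting each page
                visit(url)  # hypothetical page load and extraction
                self.fire("post_setup")  # assumption: fired after visiting each page
        finally:
            self.fire("shutdown")  # runs even if `visit` raises


if __name__ == "__main__":
    events = Events()
    log: List[str] = []

    @events.on("startup")
    def open_connection() -> None:
        log.append("open")

    @events.on("shutdown")
    def close_connection() -> None:
        log.append("close")

    events.run(["a", "b"], lambda url: log.append(f"visit {url}"))
    print(log)  # ['open', 'visit a', 'visit b', 'close']
```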