Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore: Add e2e tests for playwright templates and pip based templates #1136

Open
wants to merge 8 commits into
base: master
Choose a base branch
from

Conversation

Pijukatel
Copy link
Collaborator

Description

  • Add e2e tests for playwright templates
  • Add e2e tests for pip-based templates
  • Fix template issue with pip template that has additional dependencies
  • Separate all e2e tests in CI to run in it's own runner to allow selective re-triggers (e2e tests work on real network and it will always have some flakiness due to that)
  • Reduce max_requests_per_crawl to 10 in all templates to be aligned with doc examples and to make the tests faster.

Issues

@github-actions github-actions bot added this to the 111th sprint - Tooling team milestone Apr 3, 2025
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Apr 3, 2025
@Pijukatel
Copy link
Collaborator Author

@Pijukatel Pijukatel requested a review from Copilot April 3, 2025 15:16
Copy link

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR adds end-to-end tests for both Playwright and pip-based templates while addressing an issue with dependency installations and aligning the maximum allowed crawl requests to documentation examples.

  • Added new e2e test parametrizations and markers for various template types and package managers.
  • Refactored the function patching the crawlee version to support multiple package managers and updated related tests accordingly.
  • Adjusted Dockerfile template and crawler configuration files to use a lower max_requests_per_crawl value.

Reviewed Changes

Copilot reviewed 7 out of 10 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
tests/e2e/project_template/utils.py Updated function to conditionally patch projects based on package manager and modified Dockerfile/requirements handling.
tests/e2e/project_template/test_static_crawlers_templates.py Updated test parametrizations and function calls to reflect the new patching function and limits.
tests/e2e/conftest.py Added pytest markers configuration for new test parameters.
src/crawlee/project_template/templates/main_playwright.py Lowered max_requests_per_crawl to align with documentation and test expectations.
src/crawlee/project_template/templates/main_parsel.py Lowered max_requests_per_crawl to align with documentation and test expectations.
src/crawlee/project_template/templates/main_beautifulsoup.py Lowered max_requests_per_crawl to align with documentation and test expectations.
.github/workflows/templates_e2e_tests.yaml Updated the workflow matrix and commands to incorporate the new test parameters and optimize runner utilization.
Files not reviewed (3)
  • Makefile: Language not supported
  • src/crawlee/project_template/{{cookiecutter.project_name}}/Dockerfile: Language not supported
  • src/crawlee/project_template/{{cookiecutter.project_name}}/requirements.txt: Language not supported
Comments suppressed due to low confidence (1)

tests/e2e/project_template/utils.py:25

  • Using [0] on the result of re.findall may raise an IndexError if no match is found. Consider using a safer retrieval method such as using next(iter(...), '') or verifying that the list is non-empty.
        crawlee_extras = re.findall(r'crawlee(\[.*\])', requirements)[0] or ''

@Pijukatel Pijukatel added the adhoc Ad-hoc unplanned task added during the sprint. label Apr 3, 2025
@Pijukatel Pijukatel marked this pull request as ready for review April 3, 2025 15:50
@Pijukatel Pijukatel requested review from vdusek and janbuchar April 3, 2025 15:50
@Pijukatel
Copy link
Collaborator Author

More parallelized run with properly ordered tests to lower overall tests time: https://github.com/apify/crawlee-python/actions/runs/14260627276

6.5 minutes

Copy link
Collaborator

@vdusek vdusek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just a few minor comments...

Comment on lines -5 to -8
secrets:
APIFY_TEST_USER_API_TOKEN:
description: API token of the Python testing user on Apify
required: true
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this secret no longer required? The template tests are not running on Apify?

@@ -1,5 +1,5 @@
# % if cookiecutter.crawler_type == 'playwright-camoufox'
camoufox
camoufox[geoip]>=0.4.5
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather major version upper bound?

Comment on lines +3 to +4


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

3 empty lines? :)

Suggested change

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
adhoc Ad-hoc unplanned task added during the sprint. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants