test_indexing_item_not_front test is failing #258

Open
benoit74 opened this issue Apr 29, 2025 · 2 comments
Comments

@benoit74
Collaborator

I had to skip this test for now, since it fails with the most recent libzim:

def test_indexing_item_not_front(tmp_path: pathlib.Path, png_image: pathlib.Path):
    fpath = tmp_path / "test.zim"
    main_path = "welcome"
    with Creator(fpath, main_path).config_dev_metadata() as creator:
        creator.add_item(
            StaticItem(
                filepath=png_image,
                path="welcome",
                title="brain food",  # title used for suggestions
                index_data=IndexData(
                    title="screen", content="car"  # title and content used for search
                ),
                hints={libzim.writer.Hint.FRONT_ARTICLE: False},  # mark as not front
            )
        )
    assert fpath.exists()
    reader = Archive(fpath)
    # "brain" works as a suggestion but "food" doesn't: since no front article is
    # present in the ZIM file, libzim doesn't create a title Xapian index. So, when
    # searching for suggestions, libzim falls back to a binary search on the title
    # and returns only articles whose title starts with the query.
    # see https://github.com/openzim/libzim/issues/902#issuecomment-2223050129
    assert "welcome" in list(reader.get_suggestions("brain"))
    assert "welcome" not in list(reader.get_suggestions("food"))
    assert "welcome" not in list(reader.get_suggestions("screen"))
    assert "welcome" not in list(reader.get_suggestions("car"))
    assert reader.get_search_results_count("screen") >= 1
    assert reader.get_search_results_count("car") >= 1
    assert reader.get_search_results_count("brain") == 0
    assert reader.get_search_results_count("food") == 0

This looks like an upstream issue, hopefully only at read time: openzim/libzim#981
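
For readers unfamiliar with the fallback mentioned in the test comments, here is a minimal sketch of a prefix-only lookup done by binary search over sorted titles. It is not libzim's implementation: `prefix_suggestions` and the sample titles are hypothetical, and it only illustrates why "brain" would match the title "brain food" while "food" would not.

```python
import bisect


def prefix_suggestions(sorted_titles: list[str], query: str) -> list[str]:
    """Return the titles starting with `query` (hypothetical helper).

    `sorted_titles` must be sorted: every title sharing the prefix then
    sits in one contiguous run beginning at the bisect position.
    """
    start = bisect.bisect_left(sorted_titles, query)
    matches = []
    for title in sorted_titles[start:]:
        if not title.startswith(query):
            break  # left the contiguous run of prefix matches
        matches.append(title)
    return matches


# Made-up titles, for illustration only
titles = sorted(["brain food", "car wash", "screen saver"])
print(prefix_suggestions(titles, "brain"))  # ['brain food']
print(prefix_suggestions(titles, "food"))   # []
```

A word in the middle of a title can only be matched when a dedicated title index exists, which is what the comment in the test says is missing when the file has no front article.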

@benoit74
Collaborator Author

We've agreed upstream that this was more of a bug / unwanted behavior. I will therefore fix the test on the scraperlib side.

@benoit74
Collaborator Author

I will move this out of scraperlib 5.2.0:

  • it does not require any code change in the lib itself, only in the tests
  • we need to test / document / align on more things around all these searches, and this is going to take time; for instance, I would like to be sure about the full-text search behavior when a non-front text/html article is passed

@benoit74 benoit74 modified the milestones: 5.2.0, 5.3.0 May 13, 2025