Failure to recognize differently-coloured text within larger chunk of text #69
Thank you. I'm not sure if I can do anything about it soon, but it's an interesting insight.
@ngoomie MangaOcr preprocesses the image by first converting it to grayscale. This is what your image looks like when converted to grayscale. As you can see, it's barely readable. For white text with some red on a black background, it may be better to preprocess the image so that red turns white, and then run OCR on the result:

```python
import numpy as np
from PIL import Image
from manga_ocr import MangaOcr

mocr = MangaOcr()

img = Image.open('aceattorney.jpg')
# Make sure there is no alpha channel
img = img.convert('RGB')
# Convert the image to a NumPy array of shape (height, width, 3)
img = np.array(img)
# Take the maximum value of each pixel across the R, G, B channels
img = np.max(img, axis=2)
# Create a new grayscale image from the maximum values
img = Image.fromarray(img)
print(mocr(img))
```

And this is the recognized text:

This way of preprocessing the image also works if the text is green, blue, or colored in general. It behaves differently for black text on a white background, though (in that case you should probably take the minimum instead of the maximum). Hope it helps.
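A sketch that folds both cases from the suggestion above (maximum for light text on dark, minimum for dark text on light) into one helper, and uses a three-pixel test strip to show why a luma-style grayscale step washes out red text. The names `to_single_channel` and `dark_text` are hypothetical, introduced here for illustration. The comparison assumes Pillow's standard `convert("L")` (ITU-R 601-2 luma, roughly 0.299 R + 0.587 G + 0.114 B) is representative of the grayscale step MangaOcr applies internally.

```python
import numpy as np
from PIL import Image


def to_single_channel(img: Image.Image, dark_text: bool = False) -> Image.Image:
    """Collapse RGB to one channel so coloured text keeps its contrast.

    Hypothetical helper. Light text on a dark background (dark_text=False):
    take the per-pixel maximum over R, G, B, so any saturated colour ends up
    bright. Dark text on a light background (dark_text=True): take the
    minimum instead, so any saturated colour ends up dark.
    """
    arr = np.asarray(img.convert("RGB"))  # drop alpha, shape (H, W, 3)
    reduced = arr.min(axis=2) if dark_text else arr.max(axis=2)
    return Image.fromarray(reduced)  # 2-D uint8 array -> mode "L" image


# A 1x3 test strip: a white pixel, a pure red pixel, and a black pixel.
strip = Image.fromarray(
    np.array([[[255, 255, 255], [255, 0, 0], [0, 0, 0]]], dtype=np.uint8)
)

luma = list(strip.convert("L").getdata())       # weighted-sum grayscale
maxed = list(to_single_channel(strip).getdata())  # max-channel grayscale
print("luma:", luma)   # red becomes a dark grey, close to the black background
print("max: ", maxed)  # red becomes as bright as white
```

With a MangaOcr instance named `mocr` as in the comment above, usage would look like `mocr(to_single_channel(Image.open('aceattorney.jpg')))`. The max/min reduction is used rather than a weighted sum because it ignores hue entirely: a pixel counts as "bright" if any channel is bright, which is what separates coloured text from a black background.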
Hi! I'm using manga-ocr to help me play through the PC release of the Ace Attorney trilogy. The games often highlight certain keywords in a different colour for emphasis, and when this happens, manga-ocr usually fails to recognize the differently-coloured text properly. If the entire block of text is in a colour other than black or white, recognition works fine; it also works if you select just the word that manga-ocr failed to read on the first attempt.
An example:

On the first pass, this gets OCR'd like this:
On a second pass over just the red word, it gets OCR'd correctly as 証拠品.
This has happened rather consistently with any instances of text like this that I've found. I can probably provide more examples if needed.
I also understand that this is somewhat out of the scope of manga-ocr since, well, it's not manga! And it technically works well enough to be usable anyway. So I understand if this issue doesn't get addressed at all, but I figured it might be worth reporting just in case.