wikiHow uses several image formats throughout the website:
PNG and JPEG obviously
SVG
WebP
SVG
SVG is a text-based vector format for which we have no optimizer (yet) in scraperlib. Unlike bitmap formats, vector formats are expected to scale without quality loss. We can thus assume that SVG users expect an unaltered image at any size, whatever the dimensions declared in the source image are.
Currently, we treat them specially and simply upload them to S3 without change.
Options:
Use source, don't store on S3. I don't like this option because wikiHow websites are slow and our requests might be throttled. Not using our cache is high-risk.
Use source, uploaded to S3. This is what the current code does. It kind of defeats the purpose of an optimization cache.
Convert to WebP. While not direct, there are ways to convert SVG to bitmap (PNG) and then to WebP. Too risky, as the sizes declared in the SVG are frequently unrelated to the sizes actually used, precisely because SVG is scalable. This would decrease the rendered quality in many scenarios. Not sure it would make sense size-wise either.
Optimize SVG losslessly. There are some SVG optimizers around. They usually start by removing the verbose clutter many editors add to the source. Others also simplify the drawings themselves, either losslessly or destructively.
This last option, optimizing SVG with a lossless tool and uploading to S3, feels like the most appropriate. Even if the source SVGs are already optimized (I haven't checked), we'd benefit from the cache. It stays in line with our objectives and is generic enough to be replicated elsewhere.
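To give an idea of what "removing editor clutter" could look like, here is a minimal sketch using only the Python standard library. It is not an existing scraperlib feature: `strip_svg_clutter` and the `EDITOR_NAMESPACES` list are hypothetical names chosen for illustration, and a dedicated SVG optimizer would do far more. It drops comments, `<metadata>` elements, and Inkscape/Sodipodi-specific elements and attributes, and leaves drawing data untouched. Note that `<metadata>` may carry licensing information, so dropping it is a policy choice.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
# Assumption: an illustrative, non-exhaustive list of editor-only namespaces.
EDITOR_NAMESPACES = (
    "http://www.inkscape.org/namespaces/inkscape",
    "http://sodipodi.sourceforge.net/DTD/sodipodi-0.dtd",
)
_EDITOR_PREFIXES = tuple("{" + ns + "}" for ns in EDITOR_NAMESPACES)

# Serialize SVG elements without an "ns0:" prefix.
ET.register_namespace("", SVG_NS)
ET.register_namespace("xlink", "http://www.w3.org/1999/xlink")


def _is_editor_name(name):
    """True if a tag or attribute name belongs to an editor namespace."""
    return isinstance(name, str) and name.startswith(_EDITOR_PREFIXES)


def strip_svg_clutter(svg_text: str) -> str:
    """Hypothetical helper: remove non-rendering clutter from an SVG string."""
    # The default ElementTree parser already discards XML comments.
    root = ET.fromstring(svg_text)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == "{" + SVG_NS + "}metadata" or _is_editor_name(child.tag):
                parent.remove(child)
        for attr in list(parent.attrib):
            if _is_editor_name(attr):
                del parent.attrib[attr]
    return ET.tostring(root, encoding="unicode")
```

Whitespace is deliberately left alone, since it can be significant inside `<text>` elements.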
WebP
Our goal in using WebP is to optimize source bitmaps, as WebP is the better format in most cases. Having a source WebP might indicate that optimization was already a concern on the source website. It is important to note that wikiHow is not just serving WebP files; it is serving alternatives as well. A typical WebP image is represented as:
wikiHow is not trying to polyfill but relies on JS to detect WebP support and adjust accordingly, defaulting to JPEG for users without JS enabled. They thus maintain two copies of each of those images.
The current code looks for the WebP URL and passes that to our pipeline, which means it is re-optimized and uploaded.
What should we do with WebP?
Use the non-WebP alternative and re-encode/upload?
Use the WebP and re-encode/upload (current behavior)?
Use the WebP from the source URL (no upload)?
Use the WebP and upload without re-encoding?
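To make the four options easier to compare, here is a rough sketch that expresses each one as a small plan: which URL is fetched, whether it is re-encoded, and whether it is uploaded to S3. All names (`WebPStrategy`, `ImagePlan`, `plan_for`) are hypothetical and do not exist in the scraper; the URLs in the usage lines are placeholders.

```python
from dataclasses import dataclass
from enum import Enum, auto


class WebPStrategy(Enum):
    ALT_REENCODE_UPLOAD = auto()   # non-WebP alternative, re-encode, upload
    WEBP_REENCODE_UPLOAD = auto()  # current behavior
    WEBP_FROM_SOURCE = auto()      # no re-encode, no upload
    WEBP_UPLOAD_AS_IS = auto()     # upload without re-encoding


@dataclass(frozen=True)
class ImagePlan:
    fetch_url: str   # URL downloaded (or referenced) for this image
    reencode: bool   # run the image through our optimization pipeline
    upload: bool     # store the result in the S3 cache


def plan_for(strategy: WebPStrategy, webp_url: str, alt_url: str) -> ImagePlan:
    """Hypothetical helper mapping each option to a concrete plan."""
    if strategy is WebPStrategy.ALT_REENCODE_UPLOAD:
        return ImagePlan(alt_url, reencode=True, upload=True)
    if strategy is WebPStrategy.WEBP_REENCODE_UPLOAD:
        return ImagePlan(webp_url, reencode=True, upload=True)
    if strategy is WebPStrategy.WEBP_FROM_SOURCE:
        return ImagePlan(webp_url, reencode=False, upload=False)
    return ImagePlan(webp_url, reencode=False, upload=True)


current = plan_for(
    WebPStrategy.WEBP_REENCODE_UPLOAD,
    webp_url="https://example.org/image.webp",
    alt_url="https://example.org/image.jpg",
)
```

Only the first two options involve a lossy re-encode on our side; the last two keep wikiHow's own WebP bytes unchanged.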
FYI, here's an example of an image that was barely readable and is now unreadable after our re-encoding. Probably an edge case though
@Kelson, your input is requested on this