Genii Weblog
Defects on images found in the wild
Wed 16 Aug 2017, 05:01 PM
by Ben Langhinrichs
As part of our Defect Detection feature, we encounter and usually fix a number of different defects in images. The following list covers the different scenarios we detect that cause problems with extracting or rendering the images. Of these 31, at least 27 have been encountered in actual customer documents. The other four are left in because they might be encountered some day. While none of these problems are common, some are likely to be encountered in any large data repository, especially in mail which has been converted back and forth between MIME and rich text (as with replies or forwards).
A few of these problems are completely unfixable, but only a few. For example, with scenarios 1a and 17, we have no image information to work with at all.
Many of these are easily fixed. For example, the fixes for scenarios 25 and 26 are easily deduced from the data.
The rest can be partially fixed. Part of the image may have low resolution, or a section may be missing, but usually the image is at least intact enough to identify.
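To make the "easily deduced" case concrete, here is a minimal Python sketch of how scenario 25 (a graphic record that says resized but has 0 width or height) might be repaired. This is not the actual Defect Detection code: the `GraphicRecord` class, its field names, and the choice to fall back to the image's native dimensions are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GraphicRecord:
    # Hypothetical stand-in for a graphic record; field names are assumptions.
    resized: bool
    width: int
    height: int


def repair_resized_dimensions(record, native_width, native_height):
    """Return a record with usable dimensions (scenario 25).

    native_width and native_height are assumed to have been read from the
    image data itself.
    """
    if record.resized and (record.width <= 0 or record.height <= 0):
        # A resized flag with a zero dimension cannot be honored, so fall
        # back to the native size and clear the flag (an assumed policy).
        return GraphicRecord(resized=False, width=native_width, height=native_height)
    # Nothing to fix: hand back the original record untouched.
    return record
```

A record that is already consistent passes through unchanged, so a check like this should be safe to run over every image in a document.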
1) Image header lists n segments, but there are only m (where m < n)
1a) Image header lists n segments, but there are none
2) Image header lists n segments, but there are m (where m > n)
3) Image segments contain multiple GIF starts (often associated with #2)
4) Image segments occur with no image header
5) Multiple image headers
6) Image header lists data size other than actual (except where PNG)
7) Win meta header lists n segments, but there are only m (where m < n)
8) Win meta header lists n segments, but there are m (where m > n)
9) Win meta segments occur with no Win meta header
10) Multiple win meta headers
11) Win meta header lists data size other than actual
12) Bitmap header lists n segments, but there are only m (where m < n)
13) Bitmap header lists n segments, but there are m (where m > n)
14) Bitmap segments occur with no Bitmap header
15) Multiple bitmap headers
16) Bitmap header lists data size other than actual
17) No headers or segments
18) Both image segments (or Win meta) and a bitmap are present
19) Image resource that is missing
20) Pseudo-attached image that is missing
21) Image segments occur before image header
22) Win meta segments occur before win meta header
23) Bitmap segments occur before bitmap header
24) No graphic record but other graphic elements
25) Graphic record says resized, but has 0 width or height
26) PNG record missing
27) PNG header lists n segments, but there are only m (where m < n)
28) PNG header lists n segments, but there are m (where m > n)
29) Multiple PNG headers
30) Image segments contain multiple PNG starts (often associated with #28)
31) PNG header lists data size other than actual
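Several of the scenarios above come down to two checks: comparing the segment count a header declares against the segments actually present (scenarios 1, 1a, 2 and their Win meta, bitmap and PNG equivalents), and looking for more than one image start in the segment data (scenarios 3 and 30). The Python sketch below illustrates both under stated assumptions. The function names and return labels are hypothetical, and segments are modeled as plain byte strings rather than real rich text records. The GIF and PNG signatures are the standard ones from those file formats.

```python
# Standard file signatures: GIF files start with one of the two GIF
# strings, PNG files start with this eight-byte sequence.
GIF_STARTS = (b"GIF87a", b"GIF89a")
PNG_START = b"\x89PNG\r\n\x1a\n"


def classify_segment_count(declared, actual):
    """Compare the header's declared segment count (n) with the count found (m)."""
    if actual == declared:
        return "ok"
    if actual == 0:
        return "missing_all"  # like scenario 1a: header present, no segments
    if actual < declared:
        return "too_few"      # like scenarios 1, 7, 12, 27
    return "too_many"         # like scenarios 2, 8, 13, 28


def count_image_starts(segments, signatures):
    """Count signature occurrences across the concatenated segments.

    Joining first catches a signature split across a segment boundary.
    This is a heuristic: the same bytes could occur by chance inside
    compressed image data.
    """
    data = b"".join(segments)
    return sum(data.count(sig) for sig in signatures)
```

A result of `too_many` together with a start count above one would suggest that two images have been run together, which matches the note that scenarios 3 and 30 are often associated with 2 and 28.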
Phew! As you can see, there are a lot of potential issues, but with Defect Detection, we can fix the majority. That's why we do it.
Copyright © 2017 Genii Software Ltd.