As humans, we find it relatively easy to determine the contents of an image or video at a glance. Through years of learned knowledge, we can easily identify a skateboarder performing a trick in an image. But when we were young and first learning, we had to be told what objects were and build up that understanding over time. A computer is not much different in this sense: given enough examples of what a skateboard is, a model can be trained to identify this object using classification.

Some of the theories behind computer vision date back to the era of the first computers in the 1950s, but it wasn’t until a couple of decades ago that the computational ability of our technologies started to catch up with the mathematical theory. Now computer vision is seeing widespread use in fields such as security, finance, sociology, and business, and can be applied nearly anywhere. A common everyday use is the facial recognition in your cell phone camera, which adjusts zoom and focus depending on where faces are recognized within the frame.

Microsoft’s Azure Cognitive Services has some great examples of the capabilities of refined Computer Vision models.

Simple object detection & image interpretation:

man performing skateboard trick


[ { "rectangle": { "x": 238, "y": 299, "w": 177, "h": 117 },
    "object": "Skateboard", "confidence": 0.903 },
  { "rectangle": { "x": 118, "y": 63, "w": 305, "h": 321 },
    "object": "person", "confidence": 0.955 } ]

Here we can see Microsoft’s service recognized a person and a skateboard with over 90% confidence in its accuracy.

A bit more complex example:

adult and child with backpacks walking along subway


[ { "name": "train", "confidence": 0.9974923 },
  { "name": "platform", "confidence": 0.9955777 },
  { "name": "station", "confidence": 0.979665935 },
  { "name": "indoor", "confidence": 0.9272351 },
  { "name": "subway", "confidence": 0.838868737 },
  { "name": "clothing", "confidence": 0.5561282 },
  { "name": "person", "confidence": 0.505803 },
  { "name": "pulling", "confidence": 0.431911945 } ]

Even with a blurry image, machine learning is fast approaching the ability to interpret images as well as, or better than, a human would.

Reading handwritten text:

text on post it note

Analysis of retail/physical space layouts and optimization based on consumer habits:

shopper habits interpreted through video feed

Some additional use cases:

  • Counterfeit money detection
  • Signature matching
  • Video Security feed detection of possible concerns
  • Monitoring of hazardous environments or detection of hardware requiring maintenance in factories and other workplaces
  • Art authenticity verification

Understanding Data Mining and Image Processing

Image Processing is a subset of the greater field of Computer Vision. It encompasses a set of techniques and algorithms that process image and video data mathematically and programmatically toward a desired outcome: enhancing, altering, preparing, or extracting useful information from an image.

Data Mining, as a very general definition, is the process of discovering some kind of useful information from a large set of data. This is key for applying machine learning concepts to image processing, as an image is just a large set of data to a machine.

So, How Does It Work?

How does a computer ‘see’? We, as living organisms, utilize our visual cortex to process and interpret objects. How can a machine emulate this?

To a computer, an image is just an array of numbers. An array in this sense is a collection of blocks, each block containing numeric values that represent location, color, & intensity information for each pixel in the image.

These values consist of a vertical pixel location (Y axis), a horizontal pixel location (X axis), and RGB color values (known as channels).
For our examples below, X and Y will have a starting coordinate of [ 0, 0 ] at the top left of the image.

The last coordinate will be the last pixel in each direction, at the bottom right. The red, green, and blue channels use 8 bits each, with integer values from 0 to 255. The number represents the intensity of that particular color in the pixel, from no contribution of that color at all (0) to full color saturation (255).

That gives a total of 256 * 256 * 256 = 16,777,216 possible colors for each pixel.

Red   = rgb(255, 0, 0)
Green = rgb(0, 255, 0)
Blue  = rgb(0, 0, 255)
Black = rgb(0, 0, 0)
White = rgb(255, 255, 255)
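The coordinate-and-channel layout described above can be sketched with a small NumPy array (the values below are illustrative, not taken from any image in this post):

```python
import numpy as np

# A tiny 2x2 RGB image: shape is (rows/Y, columns/X, 3 color channels)
img = np.array([
    [[255, 0, 0], [0, 255, 0]],      # top row: a red pixel, a green pixel
    [[0, 0, 255], [255, 255, 255]],  # bottom row: a blue pixel, a white pixel
], dtype=np.uint8)

print(img.shape)   # (2, 2, 3)
print(img[0, 0])   # the red pixel at coordinate [0, 0]
```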

Below, we can see an example breakdown of a greyscale image of the number 8. Each ‘box’ is a pixel of X & Y coordinates and a 0 – 255 color value.

image broken down into numerical values

A computer will see the above image as:

[ [ [0, 0, 0], [0, 1, 2], [0, 2, 15], [0, 3, 4] ..... [0, 14, 0] ]    // (first line)
  [ [21, 0, 0], [21, 1, 0], .... [21, 14, 0] ] ]                      // (last line)

As a fun exercise, Mac computers have a built-in app, Digital Color Meter, which lets you hover over any part of your screen to see its RGB value. Windows does not have a similar app built in, but you can
discover more about RGB values and the color spectrum here:

Short coding demo – Comparing Images for Similarity

Introducing OpenCV – a library for reading, writing, and displaying images, and for performing manipulations and calculations on them. OpenCV is a library of programming functions mainly aimed at real-time computer vision and image processing. You can read more at

From the site:

OpenCV (Open Source Computer Vision Library) is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in the commercial products.

Using Python 3 and OpenCV, with the additional packages NumPy for array manipulation and Matplotlib for charting, we can quickly perform an image similarity comparison using two separate techniques. Image similarity is a measurement of how similar or dissimilar two images are relative to each other. Take the following images:

A human can very easily see that the two images are identical with the exception of a green star in the second image. For a computer, reaching this conclusion is a bit more difficult.

First, we will import our necessary packages

Each image is read into memory and assigned to variables, prepped for comparison.

Implementing a Subtraction Comparison

OpenCV then offers us a subtraction comparison function, which subtracts the value at each pixel location, yielding a measurable figure for how similar or different the two images are.

the sum of the difference of every single pixel value

As a demonstration, the first example, diff_1, takes the difference between img_sign and itself, which prints a resulting sum of 0. This should make sense: all values are completely identical, so they cancel each other out when finding the difference.

The second comparison, between the stop sign image and stop sign with star image, gave a total pixel sum difference of 4,485,132.

To gain perspective on what this value means, we must find the maximum possible difference for this image. To calculate that, we take 1200 (X pixels) x 1200 (Y pixels) x 3 (channels) x 255 (max pixel value) = 1,101,600,000.

We can apply these figures to find our percentage of similarity measure.
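The percentage calculation itself is simple arithmetic; a sketch (function and parameter names are my own):

```python
def similarity_percent(diff_sum, width, height, channels=3, max_value=255):
    """Percentage similarity relative to the maximum possible pixel difference."""
    max_diff = width * height * channels * max_value
    return (1 - diff_sum / max_diff) * 100

# Figures from the post: a 1200x1200 image with a difference sum of 4,485,132
print(round(similarity_percent(4_485_132, 1200, 1200), 2))   # 99.59
```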

As we can see, ss.jpg is 100% similar to itself, and 99.59% similar to the version with the green star. The subtract method has advantages and disadvantages: it is great for determining whether images are identical, but it is solely a pixel-to-pixel comparison. To take pixel location out of the equation, it is useful to compare color differences without regard to where they occur.

Implementing a Histogram Comparison

A Histogram is a representation of the distribution of numerical data. Removing the coordinates from the comparison, we can compare overall pixel color within the image. The data points (pixels) are arranged from 0 – 255 (pixel value), with a higher frequency at a given value spiking the chart higher.

We should be able to interpret the large spike in the 0 – 10 range as the lime green star in the second image. We can then compare histograms mathematically.

Using OpenCV’s compareHist function, we determine the two images are 99.65% similar.

For more than 17 years, gap intelligence has served manufacturers and sellers by providing world-class services monitoring, reporting, and analyzing the 4Ps: prices, promotions, placements, and products. Email us at or call us at 619-574-1100 to learn more.