An Intuition for Convolutional Neural Networks
Machine learning models are interesting pieces of math, and when I first learned about them, I was fascinated by how anybody came up with the ideas. It seemed entirely random. Some of these models are very complex, and for a long time it seemed to me that people tried random things until they got results. Even when professors said to use a ReLU or sigmoid activation function, I didn't quite understand how they knew which to use, or why. They seemed to suggest that they had figured it out through practice and experience, and not necessarily because they understood the math.
This assumption is likely false. But I have rarely heard ML teachers explain in class why a model works. They blurt out the model's math, but not much else. I think trying to understand what the math is doing is useful. Building an intuition for what a model is doing can help you understand any subject, and certainly ML.
Consider this diagram, similar to one I was shown during a class on machine learning.
There is a lot going on in this diagram, and if you are a beginner, it isn't very easy to understand. I always wondered how machine learning research scientists or mathematicians came up with the convolutional neural network (CNN). I don't know for a fact how the original researchers arrived at it, but I did arrive at an intuition. Let me explain with a story.
Sherlock Holmes and the Mona Lisa!
Once, the French police called on Sherlock Holmes to help them solve a crime at the Louvre. Jumping at the thought of another adventure, Holmes and his trusty friend Dr. Watson left their home in Baker Street and caught a train to France. Once they arrived at the Louvre, the Inspector took Holmes and Watson to the crime scene, where a grisly sight awaited them. A man lay dead next to the Mona Lisa.
"What happened here? This man has bled to death!" asked Watson.
"We were informed by the morning staff at 7 AM today. They found the museum curator, Dr. Osbourne, lying dead right here, next to the Mona Lisa. It is clear that he was murdered, but there is no note, and nothing was stolen. We are not aware of any enemies the late Doctor may have had. The people we talked to say he was kind and affable," replied the Inspector.
"Interesting," said Watson.
"We had heard of Mr. Holmes and his exploits from our friends at Scotland Yard and decided to call on him for assistance," said the Inspector. "We hope you can help us, Mr. Holmes."
Holmes did not acknowledge the Inspector. He walked over and knelt beside the curator. He saw that the curator's fingertips were stained red, but there were no open wounds on the hands. He looked intently at the Mona Lisa, then pulled out his magnifying glass.
Watson and the Inspector watched Holmes closely, but neither could figure out what he was up to. They saw him sniff the painting and scan it with his magnifying glass. After 20 minutes, Holmes returned to the Inspector.
"You'd better leave this investigation to me. You are not equipped to handle this. This is beyond you," said Holmes.
"What? Surely, we can assist. We are the French Police, after all! We have staff to support the case," replied the Inspector.
"No, no. It won't do. Watson and I will handle this," said Holmes. The Inspector was stumped.
"Let's go, Watson," said Holmes.
After they stepped out of the Louvre, Watson asked, "So you know who did it?"
"Yes. It’s our dear friend, Moriarty, up to his mischief again."
Elementary, My Dear Watson!
"When you have eliminated the Impossible, what remains, however improbable, must be the truth" - Sherlock Holmes.
How did Sherlock Holmes deduce that this was Moriarty's work?
Holmes observed that the curator's fingertips had some blood on them. It looked like the curator had left a message. Since Holmes didn't find one on the walls, he turned to the paintings. He scanned them closely, from left to right and top to bottom. When he sniffed, he noticed the smell of blood on the Mona Lisa.
Upon closer inspection, he saw handwritten letters on the Mona Lisa. Most were small, but some were bigger. He noted down the bigger letters.
I R M T O Y A R
He then tried different arrangements of these letters and arrived at:
M O R I A R T Y
Convolutional Neural Networks
Unlike Sherlock Holmes, computers don't have sight the way humans do. Digital images are made up of numbers, and that is what computers process. Consider a 32 x 32 pixel image, which is about the size of an icon. A color image has three channels: red, green, and blue. A grayscale image has just one channel. In a typical 8-bit image, each pixel value in a channel ranges from 0 to 255. In a grayscale image, the value goes from 0 (black) to 255 (white).
This is represented mathematically by a tensor (here, a 3-dimensional array of numbers). For the Mona Lisa image that is 32x32 pixels in size, we have a 32x32x3 tensor that represents the image.
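To make the idea of an image as a tensor concrete, here is a minimal sketch using NumPy. The image is an assumption on my part: random pixel values stand in for the Mona Lisa, and the names `image`, `red_channel`, and `gray` are only for illustration.

```python
import numpy as np

# Hypothetical stand-in for a 32x32 color image: random values from 0 to 255.
rng = np.random.default_rng(seed=0)
image = rng.integers(low=0, high=256, size=(32, 32, 3), dtype=np.uint8)

# The tensor has three axes: height, width, and channel.
print(image.shape)  # (32, 32, 3)

# Slicing out one channel leaves an ordinary 32x32 matrix.
red_channel = image[:, :, 0]
print(red_channel.shape)  # (32, 32)

# One simple way to get a single-channel (grayscale) image is to
# average the three color channels for every pixel.
gray = image.mean(axis=2)
print(gray.shape)  # (32, 32)
```

Every number in `image` is one channel value of one pixel, which is all the computer ever sees of the painting.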
Convolution
Going back to our story, Sherlock Holmes scans the image with his magnifying glass. He focuses the glass on a portion of the image and slowly scans the portrait, left-to-right, top-to-bottom. We can do the same mathematically, using a concept called convolution.
We choose a window size, for example 4x4 px, and step through the tensor for each channel. How large a step should we take to the right or toward the bottom? This is determined by a parameter called the stride. At each step, we take the 4x4 px sub-matrix, multiply it element by element with another 4x4 matrix called the kernel, and add up the results to get a single number. This is equivalent to Holmes looking through the image and spotting the anomaly: the blood marks that the museum curator painted on the Mona Lisa. He has identified and extracted something important (in this case, alphanumeric characters) from the image. In ML, the convolution process creates a new, smaller tensor. With a stride of 2 and one pixel of padding around the border, for example, its dimension is 16x16x3.
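The scanning step can be sketched in a few lines of NumPy. This is only an illustration under my own assumptions: a stride of 2 and one pixel of zero padding (which is one way to get from 32x32 to 16x16 with a 4x4 window), a fixed averaging kernel, and a hypothetical helper named `convolve_channel`. In a real CNN the kernel values are learned during training, and a kernel usually combines all input channels into one output map rather than treating each channel separately.

```python
import numpy as np

def convolve_channel(channel, kernel, stride=2, padding=1):
    """Slide a square kernel over one square 2-D channel."""
    padded = np.pad(channel, padding)  # surround the channel with zeros
    k = kernel.shape[0]
    out_size = (padded.shape[0] - k) // stride + 1
    out = np.zeros((out_size, out_size))
    for i in range(out_size):
        for j in range(out_size):
            top, left = i * stride, j * stride
            window = padded[top:top + k, left:left + k]
            # Multiply element by element, then add everything up.
            out[i, j] = np.sum(window * kernel)
    return out

rng = np.random.default_rng(seed=0)
image = rng.integers(0, 256, size=(32, 32, 3)).astype(float)

# Hypothetical kernel: averages the 16 pixels under the window.
kernel = np.full((4, 4), 1 / 16)

# Apply the same kernel to each channel, as in the description above.
features = np.stack(
    [convolve_channel(image[:, :, c], kernel) for c in range(3)], axis=2
)
print(features.shape)  # (16, 16, 3)
```

The output size follows from the arithmetic (32 + 2 x 1 - 4) / 2 + 1 = 16. Strictly speaking, this sliding operation is a cross-correlation, which is what deep learning libraries usually mean when they say convolution.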
Pooling
Next, Holmes paid attention to the text and observed that some letters were much larger than the rest. He picked out those letters and ignored the rest.
In a CNN, we take the tensor that was just convolved and perform pooling. There are different kinds of pooling, such as min pooling, max pooling, and average pooling. Since Holmes extracted the letters that were the largest in size, imagine you want the CNN model to pull out the largest number in each small region: that is max pooling.
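Here is a minimal sketch of max pooling in NumPy. The 2x2 block size and the helper name `max_pool` are my assumptions; the text above does not fix a pooling size.

```python
import numpy as np

def max_pool(feature_map, size=2):
    """Keep only the largest value in each size x size block, per channel."""
    h, w, c = feature_map.shape
    rows, cols = h // size, w // size
    # Drop any leftover edge rows or columns that do not fill a whole block.
    trimmed = feature_map[:rows * size, :cols * size, :]
    # Split each spatial axis into (block index, position inside the block).
    blocks = trimmed.reshape(rows, size, cols, size, c)
    return blocks.max(axis=(1, 3))

# A tiny single-channel example that is easy to check by hand.
small = np.array(
    [[1, 3, 2, 0],
     [4, 2, 1, 1],
     [0, 1, 5, 6],
     [2, 2, 7, 8]],
    dtype=float,
).reshape(4, 4, 1)
pooled_small = max_pool(small)
print(pooled_small[:, :, 0])  # the four block maxima: 4, 2, 2, 8

# On a 16x16x3 tensor, 2x2 pooling halves the height and width.
rng = np.random.default_rng(seed=0)
pooled = max_pool(rng.random((16, 16, 3)))
print(pooled.shape)  # (8, 8, 3)
```

Like Holmes keeping only the biggest letters, each 2x2 block contributes just its strongest value and the rest are discarded.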
Unwrap the Tensor into a Fully Connected Layer
Once pooling is complete, we take the 3-dimensional tensor and flatten it into a one-dimensional vector, which we feed to a fully connected layer. At that point it becomes similar to a standard ML problem. In theory, we could have done this at the very beginning and skipped the convolutions, but then the computation would have taken much, much longer. This is analogous to Holmes extracting the letters and writing them down in a sequence.
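The flattening step, and the reason it is cheaper to do it late, can be sketched as follows. The numbers are assumptions for illustration: an 8x8x3 pooled tensor (what 2x2 pooling would give from 16x16x3), 10 hypothetical output classes, and random weights in place of trained ones.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Stand-in for the pooled tensor; its values are random here.
pooled = rng.random((8, 8, 3))

# Unwrap the 3-dimensional tensor into one long vector.
flat = pooled.reshape(-1)
print(flat.shape)  # (192,)

# A fully connected layer is a matrix multiplication plus a bias.
num_classes = 10  # hypothetical number of possible answers
weights = rng.normal(scale=0.01, size=(num_classes, flat.size))
bias = np.zeros(num_classes)
scores = weights @ flat + bias
print(scores.shape)  # (10,)

# Compare the number of weights with flattening the raw image directly.
weights_after_pooling = num_classes * flat.size
weights_on_raw_image = num_classes * 32 * 32 * 3
print(weights_after_pooling, weights_on_raw_image)  # 1920 30720
```

In this toy setup, the layer that sees the pooled tensor needs 16 times fewer weights than one that sees the raw 32x32x3 image, which hints at why the convolution and pooling steps save computation.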
I R M T O Y A R
Solve for the Answer
Finally, he tries different arrangements of the letters to arrive at one that makes sense.
M O R I A R T Y
Once we have unwrapped the tensor, the problem becomes simpler to solve, and we can arrive at a solution.
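One common way to arrive at a solution from the final scores is a softmax, which turns them into probabilities so we can pick the most likely answer. This is a swapped-in illustration, not something the story above specifies: the suspect names and the score values are made up.

```python
import numpy as np

def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp = np.exp(shifted)
    return exp / exp.sum()

# Hypothetical suspects and hypothetical scores from the last layer.
labels = ["Moriarty", "Lestrade", "Watson"]
scores = np.array([2.0, 0.5, -1.0])

probs = softmax(scores)
best = labels[int(np.argmax(probs))]
print(best)  # Moriarty
```

The label with the highest score also gets the highest probability, so the model's answer is simply the most probable one.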
Conclusion
This is my intuition about convolutional neural networks. It helps me mentally recall what a CNN model is. Of course, real-world CNN models may involve more than one convolution, and we may try different kernel sizes, stride lengths, and other methods.
Did you find this helpful? Does it help you understand what a CNN is doing?