Light (photons) travels through a lens and is focused onto a surface.
That surface is made of materials that are sensitive to light and produce a small electrical signal when struck by it. The amount of signal you get depends on the brightness of the light and on its colour (wavelength). That value is digitally recorded, along with the values from every other signal, when you take the image.
If you place several million of these devices in a grid a few thousand wide and a few thousand tall, and make them small enough, they will in effect “capture” a representation of the image seen through the lens. If you scaled this grid up to be visible, it would look like a gigantic tic-tac-toe board with each square filled in with a single colour.
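As a rough illustration of that idea, here is a toy model in Python. Everything in it is made up for the example (the scene_brightness function, the grid size, the 8-bit scaling); it is not how any real sensor or camera firmware works, just a sketch of “one brightness reading per grid square”:

```python
import numpy as np

def scene_brightness(x, y):
    """Stand-in for the light the lens focuses onto the sensor:
    brightness (0.0 to 1.0) at a point on the surface. Purely illustrative."""
    return 0.5 + 0.5 * np.sin(10 * x) * np.cos(10 * y)

def capture(width=3000, height=2000):
    """Toy 'sensor': take one sample of the scene per grid square (pixel)
    and quantise each reading to an 8-bit number, the way a real sensor
    digitises its electrical signals."""
    ys, xs = np.mgrid[0:height, 0:width]             # one sample site per pixel
    signal = scene_brightness(xs / width, ys / height)
    return np.round(signal * 255).astype(np.uint8)   # brighter light -> bigger number

image = capture()
print(image.shape)   # (2000, 3000) -> 6,000,000 readings, i.e. a 6 MP image
```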
Resolution is determined by:
– The size of the individual light-sensitive devices; in modern cameras these are usually around 5 micrometres across (roughly 1/10 the width of a human hair).
– The size of the grid of devices you make. A modern smartphone sensor will be on the order of 5000×5000 pixels, give or take, which equates to a 25-megapixel (25 MP) image.
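To make that arithmetic concrete, a quick sketch using the example figures from the two points above (a 5000×5000 grid and 5 µm devices; these are just illustrative numbers, not the specs of any particular camera):

```python
width_px, height_px = 5000, 5000   # grid size from the example above
pixel_pitch_um = 5.0               # size of one light-sensitive device, in micrometres

megapixels = width_px * height_px / 1_000_000          # total number of devices
sensor_w_mm = width_px * pixel_pitch_um / 1000          # physical width of the grid
sensor_h_mm = height_px * pixel_pitch_um / 1000         # physical height of the grid

print(f"{megapixels:.0f} MP")                           # 25 MP
print(f"grid is ~{sensor_w_mm:.0f} x {sensor_h_mm:.0f} mm")
```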
This is a very simplified view of a camera sensor, and there are many other layers of complexity when discussing capturing RGB colour, filtering, denoising, pixel binning, and other more advanced aspects of what goes into capturing an accurate image.
This is in essence the exact same principle used in reverse to produce images on your phone's screen: a grid of several million tiny dots creates the illusion of a continuous image as long as you cannot resolve each individual dot (this is also how most printers work).