Repost: Practical differences between cameras and human vision

Synthetic moon rising. Why is it so difficult to get sunsets to appear ‘right’? Read on for the answer.

Many photographs do not work as we intended. Often, we later find out they do not work because there is a difference between what we saw and what our audience sees in the image. Sometimes this comes down to a lack of skill in translating an idea, but often it’s more subtle than that: the camera doesn’t see what we see, and we need to be highly aware of that and know how to compensate for it. For instance: it’s no big deal to make a monochrome image, but our eyes only perceive a lack of color under very exceptional circumstances. Yet it’s these differences that make some images stand out as exceptional, and others not really work.

There are a few important properties of the eye/brain combination to note: firstly, we have an incredibly wide dynamic range, but it isn’t linear. Highlights seldom clip to white or oversaturate, though blacks go black fairly quickly. Also, our vision switches to luminance-sensitive rods at lower brightness levels, meaning that the darker it gets, the more desaturated and flat things appear. A camera mostly maintains linear tonal response across the entire tonal range, and thus the final output will look different to both what we see and our idea of how a scene of that given brightness should look.
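To make the linear-versus-nonlinear point concrete, here is a minimal Python sketch of the CIE 1976 L* lightness function, a standard colorimetric model of perceived lightness. It is only an approximation of the eye/brain response under ordinary daylight viewing and does not model the rod/cone shift at low light; the function name is illustrative.

```python
def cie_lightness(y: float, y_white: float = 1.0) -> float:
    """CIE 1976 L* (0 to 100) for relative luminance y, with y_white as reference white."""
    t = y / y_white
    delta = 6 / 29
    if t > delta ** 3:
        f = t ** (1 / 3)  # cube-root region: covers most of the tonal range
    else:
        f = t / (3 * delta ** 2) + 4 / 29  # short linear segment near black
    return 116 * f - 16


# Each step halves linear luminance (one stop), as a sensor would record it.
for stop in range(6):
    y = 0.5 ** stop
    print(f"{stop} stops below white: Y = {y:.4f}, L* = {cie_lightness(y):.1f}")
```

Equal one-stop steps in linear luminance do not produce equal steps in L*, and 18% grey lands at roughly L* 49.5, near the middle of the perceptual scale rather than at 18% of it.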

This is a structural difference: a sensor’s photosites are identical and equally sensitive across the entire field of view. They are color filtered (assuming a Bayer sensor) and ‘actual’ color is interpolated later in software, but this interpolation is again spatially uniform. Our eyes, on the other hand, are not spatially uniform at all. There are two types of cells: one sensitive to color, but only with adequate amounts of light; the other sensitive to luminance but less so to color, and able to work across a much wider brightness range. The density of the luminance-sensitive rods (100-120 million in all) falls off gradually towards the edges, ending about 80 degrees or so off axis. There is a central ‘hot spot’ known as the macula, covering 3-4 degrees or thereabouts, where there are few rods but a very high density of color-sensitive cones (the retina holds about 6-7 million cones in total). These are responsible for both color and detail vision. Their density too falls off gradually, out to about 20 degrees off axis or so.
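The ‘spatially uniform’ point can be sketched in Python: an RGGB Bayer layout and the simplest possible (bilinear) estimate of the missing green value at a red or blue photosite. Real raw converters use far more sophisticated demosaicing; the function names and the RGGB ordering here are illustrative assumptions. What matters is that the same rule applies at every photosite, wherever it sits in the frame.

```python
def bayer_channel(row: int, col: int) -> str:
    """Channel sampled at (row, col) in an RGGB Bayer mosaic (assumed ordering)."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"


def interpolate_green(mosaic, row: int, col: int) -> float:
    """Bilinear green estimate: at a red or blue site, average the direct
    (up/down/left/right) neighbours, all of which are green in RGGB."""
    if bayer_channel(row, col) == "G":
        return mosaic[row][col]
    h, w = len(mosaic), len(mosaic[0])
    neighbours = [
        mosaic[r][c]
        for r, c in ((row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1))
        if 0 <= r < h and 0 <= c < w  # skip neighbours outside the frame
    ]
    return sum(neighbours) / len(neighbours)


mosaic = [[float(r * 4 + c) for c in range(4)] for r in range(4)]  # toy 4x4 raw values
green_sites = sum(bayer_channel(r, c) == "G" for r in range(4) for c in range(4))
print(green_sites, "of 16 photosites sample green")
print(interpolate_green(mosaic, 1, 1))  # blue site: mean of its four green neighbours
```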

This is the cause of the second difference: the visual acuity of our eyes varies across the visual field; the corners are not as sharp or well defined as the centre, though there are no real corners to begin with. We can perceive motion in these areas, but not much detail or color. This structure is the reason we are aware of an entire scene at a macro level, but can then ‘focus in’ on a detail whilst still retaining some awareness of the wider context. Finally, the image processing portion complicates things further: our brains correct for all sorts of optical deficiencies (such as distortion, chromatic aberration, noise, skewed horizons or minor keystoning), both from experience and automatically. Our eyes constantly scan a scene and build up a latent image, perceiving color and detail across a wider field of view than should be possible with a static eyeball and the cells in the macula alone. A photographic image obviously cannot do this. It has distinct ‘hard’ boundaries that make you aware of its edges, and your eyes scan an image just as they would the real scene, so you still have to maintain acuity across every part of the image you want your viewer to scan.

Dark areas must be less saturated to appear ‘natural’ – the higher-density, smaller, color-sensing cells in our retina grow less effective as light levels drop, leaving only monochrome vision with lower acuity. This is one of the reasons low key B&W images look much more natural than high key ones.
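One way to act on this in post-processing is to scale saturation down with luminance. The Python sketch below is a hypothetical illustration, not taken from any particular raw converter: it assumes linear RGB in [0, 1], uses the Rec. 709 luminance weights, and the 0.18 threshold and 0.3 saturation floor are arbitrary choices.

```python
def luminance(rgb) -> float:
    """Relative luminance of linear RGB using Rec. 709 weights (they sum to 1)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def desaturate_shadows(rgb, threshold=0.18, min_saturation=0.3):
    """Hypothetical: pull dark pixels toward grey while keeping their luminance.

    Below `threshold`, saturation ramps linearly from `min_saturation` (at black)
    up to 1.0 (at the threshold). Pixels at or above the threshold are untouched.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    y = luminance(rgb)
    if y >= threshold:
        return tuple(rgb)
    s = min_saturation + (1.0 - min_saturation) * (y / threshold)
    # Blending every channel toward y by the same factor leaves the weighted
    # luminance at y, because the weights sum to 1.
    return tuple(y + s * (c - y) for c in rgb)


print(desaturate_shadows((0.8, 0.4, 0.2)))    # bright pixel: above threshold
print(desaturate_shadows((0.05, 0.01, 0.01)))  # dark pixel: less saturated, same luminance
```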

The real differences in translation amount to what is probably best described as ‘strange tones’ and ‘subjects that looked a lot stronger in reality’. Tones look strange because, as mentioned before, a camera responds linearly to both luminance and color, but our eyes do not. Achieving this linearity has been the goal of engineers since the advent of the device. However, now that sensors have come close to matching the eye in both resolution and absolute dynamic range, more thought needs to be given to the output presentation.

Anybody who has a color-sensitive eye and has tried to use a spectrometer to profile a monitor, camera or printer will know that the results aren’t quite what you expect. Even on a wide-gamut monitor, you may still land up with images that don’t quite look like reality; this is simply because a spectrometer functions like a camera, but we interpret the output with our eyes. The calibration profile the spectrometer produces (in the form of brightness map instructions so that each RGB pixel matches an absolute color on output) is tuned for absolute output, not perceptual output. So you may still be able to achieve 100% saturation at 10% luminance, even though in reality our eyes cannot perceive this. This may at first seem odd: if we can’t perceive it, how would we know the calibration looks incorrect? Because we judge the output against our experience of how a scene of that brightness should look, and a dark area rendered at full saturation doesn’t match it. Remember too that the luminance of the output device isn’t linear, either, which introduces another complication into the equation. Blacks, for instance, are not truly black, even though they are input as RGB 0,0,0. This makes more of a difference to output than you might think, and it’s also why, after trying a large variety of calibration tools, I still find the best results are achieved by eye: it is the ultimate viewing/input device anyway.
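The display side of that complication can be sketched in Python. The sRGB decoding curve below is the standard one; the white and black luminance figures are assumed round numbers for an ordinary monitor, not measurements, and the linear mapping between them is a simplified, hypothetical display model. The point is twofold: a mid code value such as RGB 128 emits only around a fifth of white’s linear light, and RGB 0,0,0 still emits light.

```python
def srgb_to_linear(v: float) -> float:
    """Decode an sRGB-encoded value in [0, 1] to relative linear light (sRGB curve)."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4


def display_luminance(v: float, white_nits: float = 120.0, black_nits: float = 0.3) -> float:
    """Hypothetical display model: decoded linear light spread between the panel's
    black and white levels (both default figures are illustrative assumptions)."""
    return black_nits + (white_nits - black_nits) * srgb_to_linear(v)


for code in (0, 64, 128, 192, 255):
    v = code / 255
    print(f"RGB {code:3d}: {srgb_to_linear(v):.3f} of linear white, "
          f"{display_luminance(v):.2f} cd/m2 on the assumed display")
```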

You’ll notice I haven’t said much about dynamic range: this is a very difficult question to answer in absolute terms, but I have a feeling that the eyes are very highlight-biased. By that, I mean that we see shadows blocking up to black fairly quickly; it’s normal to see black. But it isn’t at all normal to see something fully oversaturated to white with no spatial detail whatsoever; it has to be a painfully bright object for this to happen. Of course, the response is again nonlinear: we have less ability to differentiate between luminance levels the brighter those levels become. No sensor behaves like this: most of the time, its input-output relationship is linear. The closest camera I’ve seen to matching this nonlinearity is the D810 in combination with ACR 8.x: shadow recoverability is traded for a very long highlight shoulder. This is not so good for working in low light because of the noise penalty, but it renders normal to bright scenes better than anything else I’ve seen.
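The linear input-output relationship has a practical consequence that a few lines of Python make visible. For an idealised linear raw encoding (this ignores noise, black-level offsets and any tone curve, all of which real files have), each stop down from clipping gets half as many code values as the stop above it, so the brightest stop alone holds half of all available levels. That is almost the opposite of the eye’s diminishing ability to separate bright tones.

```python
def levels_per_stop(bit_depth: int, stops: int) -> list[int]:
    """Code values falling in each stop below clipping for an ideal linear encoding.

    Index 0 is the brightest stop (half of full scale up to full scale).
    """
    upper = 2 ** bit_depth  # one past the largest code value
    counts = []
    for _ in range(stops):
        lower = upper // 2
        counts.append(upper - lower)
        upper = lower
    return counts


for i, n in enumerate(levels_per_stop(14, 8)):
    print(f"stop {i + 1} down from clipping: {n} levels")
```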

Smooth, natural highlight rolloff is not so much about not clipping as clipping smoothly: small sensors are very bad at this, as all channels tend to clip simultaneously; larger sensors may still retain information in one or more channels for longer, allowing for a more gradual transition.
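Why staggered clipping reads as a smoother transition can be shown with a toy Python model: push a hypothetical highlight up in half-stop steps and count how many channels have reached the clip level. The channel ratios, starting level and clip level are assumptions for illustration. The model says nothing about why sensors of different sizes behave differently; it only contrasts staggered clipping with all channels clipping at once.

```python
def clipped_channels(ratios, exposure_stops, base=0.25, clip=1.0):
    """Count channels at or above `clip` after raising exposure by `exposure_stops`.

    `ratios` are the highlight's relative linear R, G, B values and `base` is the
    starting level of the brightest channel (all values are assumptions).
    """
    gain = 2.0 ** exposure_stops
    return sum(1 for r in ratios if base * r * gain >= clip)


warm = (1.0, 0.6, 0.3)     # hypothetical sunset-coloured highlight
neutral = (1.0, 1.0, 1.0)  # hypothetical white highlight
for half_stops in range(9):
    ev = half_stops / 2
    print(f"+{ev:.1f} EV: warm {clipped_channels(warm, ev)}/3 clipped, "
          f"neutral {clipped_channels(neutral, ev)}/3 clipped")
```

The neutral highlight goes from no clipped channels to all three within a single half-stop step, while the warm one passes through intermediate states where one or two channels still carry information.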

The answer to the disconnect between perceived subject prominence and photographic prominence is related to the ability of our eyes to ‘lock in’ on one specific angle of view, letting the periphery fade out thanks to those less-sensitive and less-dense rods: the rest simply seems less prominent because we’re not taking in as much information in those areas. For reasons explained earlier, a camera doesn’t do this. Furthermore, when we view an image, we scan the entire image (and thus take in plenty of extraneous distractions) rather than instantly focus in on one specific area.
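The eye’s ‘lock in’ can be made concrete with a common simplification from vision science: the smallest resolvable detail grows roughly linearly with distance from the centre of gaze, so relative acuity behaves like e2 / (e2 + e). The Python sketch below uses that form with e2 = 2 degrees as an assumed round-number parameter (published values vary with the task); it illustrates the falloff and is not meant as an image-processing model.

```python
def relative_acuity(eccentricity_deg: float, e2_deg: float = 2.0) -> float:
    """Acuity relative to the centre of gaze, assuming the minimum resolvable
    angle scales as (1 + e / e2). e2_deg is an assumed parameter."""
    if eccentricity_deg < 0:
        raise ValueError("eccentricity must be non-negative")
    return e2_deg / (e2_deg + eccentricity_deg)


for e in (0, 2, 5, 10, 20, 40):
    print(f"{e:2d} degrees off axis: {relative_acuity(e):.2f} of central acuity")
```

A camera, by contrast, resolves the edge of the frame with roughly the same acuity as the centre (lens falloff aside), which is why the periphery of a photograph competes for attention in a way the periphery of our vision does not.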

Executionally, this means we have to pay extra attention to ensuring that the non-subject areas are distinctly non-prominent and don’t attract more attention (by breaking pattern) than the subject areas. Perhaps this explains why bokeh is so popular: it approximates the way our eyes work by smoothing out the areas that are not meant to be subject. It isn’t a perfect replication by any means, because of two more structural differences. Firstly, we don’t have the ability to produce shallow depth of field with our eyes, or even really control it; the irises are limited to about f4 equivalent and we have no conscious control over how wide they open (in that respect, they’re more for light control than resolution or depth of field). Secondly, there are two of them: stereoscopic, binocular vision means that our visual field is wider than it is tall, and that we can ascertain relative distances by interpreting the differences between the images from the left and right eyes.

In reality, our eyes are somewhat like a video camera in program or shutter priority mode with auto-ISO: we maintain smooth motion and increase the impression of detail by continuous scanning; to do that, the exposure time must remain relatively constant. The amount of light collected is regulated automatically by the iris – removing depth of field control – and further compensated for by retina and brain to manage situations where the iris cannot open any larger to admit more light, or close down any more to restrict it. The iris’ action is involuntary, and the only slight control we have over it is to squint, which sometimes helps us to resolve distant or bright objects by both controlling the amount of light and stopping down.

For something to stand out, it has to really stand out in a highly exaggerated way. But even with faithful tones and luminance, this image still appears unnatural and flat because of the perspective compression and elimination of greater context: our eyes simply don’t work this way. Does it make it any less interesting? Not necessarily.

And here we’re back to the difference between perceived and captured subject prominence: the eyes get a bit of a bonus thanks to an extra dimension. We need to imply spatial relationships in a two-dimensional image with depth of field cues and shadows; if a shadow falls on something as opposed to behind it, then you know that something must be behind the object casting the shadow. We can of course use these properties to create visual non sequiturs: removing depth of field cues, or creating false ones through careful lighting placement and perspective, allows a photograph to represent reality in a way that is not immediately perceivable to the naked eye. These images are interesting precisely because they are clearly reality, but at the same time do not fully agree with our personal visual experience of it.

Here is where we need to learn to see like a camera. The easiest way is to compose the image as close to the final presentation medium as possible; I think it’s why we see people able to compose just fine with iPads and iPhones but struggling with optical finders. The large LCDs are simply much closer to how the image will eventually be viewed, and of course they also preview focus and exposure, neither of which is accurately represented even with the best SLR finders. The advantage of optical finders, of course, remains immediacy and the ability to see every single nuance in the scene, both in tonality and detail; it requires some imagination and experience to translate that into a finished image.

Ironically, to arrive at a finished output image (let’s say a print) that represents the scene we’d see with our eyes, we have to do a lot of nonintuitive things. We perceive the output through the same ‘lens’ as we would the actual scene, so in effect we need to compensate for the limitations of both the capture and output mediums to restore transparency. It’s not as easy as it looks: remember, thanks to pattern recognition, we already have an expectation from experience of how reality ‘should’ appear. The more familiar the scene, the harder it becomes to reproduce it in a transparent, ‘ordinary’ way, simply because we have more personal experience with such scenes. We return to the sunset question posed at the very start of the article: every day, we add to our ‘experience database’ of what a sunset can and should look like. We perceive it as a dynamic thing that changes with time and with our physical position of observation. Colors are relative, especially if the main light source is heavily biased warm. Yet our image is static, there are extremes of dynamic range (especially with a moon in play), and we have no color reference point if nothing is actually or perceptually white. See the challenge? There is of course no right or wrong between camera and eye; we can use the limitations and properties of either to translate an idea into an image in an unexpected way, and create something memorable as a result. But we can’t do that without understanding both the technical differences and their impact on perception. MT
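The missing white reference is easy to demonstrate with the simplest textbook auto white balance, the white-patch (max-RGB) method, which assumes the brightest value in each channel ought to be white. This Python sketch is illustrative only: it is not how any particular camera or raw converter balances, and the pixel values are made-up linear RGB for a warm sunset.

```python
def white_patch_balance(pixels):
    """Scale each channel so its maximum over the image maps to 1.0.

    Assumes no channel is zero everywhere (otherwise the division fails).
    """
    max_r = max(p[0] for p in pixels)
    max_g = max(p[1] for p in pixels)
    max_b = max(p[2] for p in pixels)
    return [(r / max_r, g / max_g, b / max_b) for r, g, b in pixels]


sunset = [
    (0.90, 0.50, 0.20),  # hypothetical orange sky
    (1.00, 0.70, 0.35),  # hypothetical glow around the sun
    (0.40, 0.20, 0.10),  # hypothetical reflection in water
]
for before, after in zip(sunset, white_patch_balance(sunset)):
    print(before, "->", tuple(round(c, 2) for c in after))
```

With nothing in the frame that is actually white, the algorithm treats the warm glow as the white point: that patch comes out neutral and every other patch shifts toward neutral and cooler, draining the warmth that made the scene a sunset.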

We go into far more detail on good compositional practice and the underlying psychology behind it all in The Fundamentals and the Making Outstanding Images workshop video series – available here from the teaching store.

__________________

Visit the Teaching Store to up your photographic game – including workshop videos, and the individual Email School of Photography. You can also support the site by purchasing from B&H and Amazon – thanks!

We are also on Facebook and there is a curated reader Flickr pool.

Images and content copyright Ming Thein | mingthein.com 2012 onwards unless otherwise stated. All rights reserved

Comments

  1. Steve Gombosi says:

    This was a really enlightening (no pun intended) post – particularly for those of us who are belatedly transitioning from MF/LF film to digital. Thanks so much for reposting it!

  2. Brian F. says:

    “…I think it’s why we see people able to compose just fine with iPads and iPhones but struggling with optical finders…”

    My sentiments exactly. I have found the larger the optical finder, the more the eye has to “hunt” to nail down the compositional elements for a particular shot (i.e., is the horizon level? Are off-center elements too far over to one side, ruining balance but not immediately apparent when the shot is taken?, etc.).

    • I also think it has something to do with the crappiness of most optical finders these days – what was considered barely adequate in an economy film SLR of the 60s or 70s is still better than our pro cameras today for actually seeing nuance and judging focus…

    • Steve Gombosi says:

      “…I think it’s why we see people able to compose just fine with iPads and iPhones but struggling with optical finders…”

      I don’t understand how they do it, but then I’ve been using optical finders of one flavor or another for 5 decades or so. I don’t think I’ve ever taken even a barely acceptable photo with an iPhone. I find composing, etc. on an LCD to be an exercise in frustration.

      And yes, I found the optical finder in the Leica S to be really unsatisfactory (and difficult to focus, even with the microprism screen). I don’t understand why so many people rave about it – so I second the “crappiness” comment.

      The X1D, on the other hand, is an absolute delight to use with recent firmware.

  3. Samuel Jessop says:

    Really informative article, and not something I have seen discussed anywhere else. Two things jump out at me. The first is that to gain a more realistic portrayal of night scenes, desaturating the dark areas is worth trying. The second is that where you discuss the highlight bias of our eyesight, it makes more sense now how C41 films look more natural than E6 generally speaking. From this I wonder if to some extent we should work harder to protect highlights even if this risks some of the shadows, and further underlines the advantages of modern high DR sensors.

    • I’ve always desaturated my shadows a bit – it’s mentioned in most of the workflow videos. Otherwise the intended visual balance of the composition seems a bit off (i.e. not quite matching what I saw at the scene). C41 films have a smooth highlight roll off, but E6 does not – it isn’t the absolute dynamic range per se but the way transitions are handled. Any time there’s contrast we notice it specifically, which if unintended becomes compositionally distracting…

  4. Martin Fritter says:

    Exceptional piece of technical writing. Great guidance. Thanks so much.

  5. “I find modern sensors have to be ETTR’d …”

    I would love to read your commentary on both how and when one should ETTR.

    • Simple: all the time, until the image is *just* clipped, within what your usual workflow can recover. This of course requires some experience and experimentation to determine how much recovery is possible and how to tell on the back of your camera at the time of capture…

  6. Classic Ming Thein. I don’t think even long term readers would have the slightest objection to reposting more articles of this depth and detail. It’s more like a gift to those of us who don’t have the tenacity to read through all 1,400 posts/2.8 million words compiled thus far.

    • Thanks. Perhaps time to search the archives again then – there’s stuff in there to be honest I don’t even remember writing! 🙂

      • jean pierre (pete) guaron says:

        I’m trying, Ming – the archives are a wonderful source of information – I often dip into them.

        Thanks for posting yet another brilliant & highly informative/instructive article.

      • Terry B says:

        Ming, I thought that you must have covered ETTR at some point and went looking, but as you point out, it is a bit of a slog wading through your posts and whilst I did have a go nothing jumped out screaming “this is about ETTR” so I gave up.

        Frankly, I didn’t know what ETTR was until a few weeks back when it popped up on dpr in reference to the Sony R1. The Sony R1 still stands as my favourite all-time digital camera used within its now dated specification, but despite its near APS-C sensor it came out in an era where noise was far more problematic than today. So to come across a specific reference to it and ETTR aroused my interest. But the site looks at two cameras, the Sony DSC R1 and Olympus OM-D E-M5, so it could be of interest to some Olympus users, too.

        The noise reduction for the R1 at 1600 and 3200 ISO is impressive, but until I carry out my own tests, I can’t quite figure out if the images will be sharper (or less noisy) than had they been exposed natively at 400 and 800 ISO in the first place without ETTR.

        It can be found here: http://mattihartikainen.net/about-exposing-to-the-right-ettr-sony-dsc-r1-and-olympus-om-d-e-m5/

  7. Terry B says:

    An interesting article, Ming, and one that I’m sure will raise a lot of questions. I notice you didn’t refer to exposure per se, but IMO this must have an effect in colour work as meters are generally calibrated to 18% grey and left to their own devices the result will be to lighten darker areas and darken highlights. So, whilst we think we are giving correct exposure, we won’t be, unless we make specific allowances for the actual subject matter.

    In my own experience, not having colour to worry about, this was easier shooting b/w film as I didn’t have colour shifts to deal with and I could deploy colour filters where I needed separation of tones in interpreting colour into b/w prints. And for me, at least, this gave me the best “visual interpretation”. I found then, and today with digital, that my preferred exposure was to expose for the highlights and just leave the rest to drift off into shadow. Exposing this way the shadows would be deeper, it goes without saying, but for the reasons you mention, my eyes find this more natural. And I even find now that my digital street views at night in colour are better shot this way.

    I appreciate that this goes against the grain as technology tries to give us so much dynamic range that detail is clearly visible at both ends of the spectrum, so to speak.

    • I find modern sensors really need to be ETTR’d or you land up with some very strange color results after you make the tonal/luminance values faithful to the scene – a sensor tuned for night or low key/ low light subjects I suspect won’t work for bright subjects; the former requires dense (but not completely clipped) shadows

    • I find modern sensors have to be ETTR’d and toned to taste later because it seems the native tonal response for bright or low key/dark subject matter is simply very different. ETTR ensures you have the most information to work with, but a sensor tuned for one won’t work well for the other – bright subjects need more data in the midtones with smooth highlight rolloff, higher contrast in the lower shadows (for the relative *impression* of contrast/brightness) and higher saturation; darker subjects are both lower saturation and heavily lower midtone/shadow loaded, but with some small high-saturation areas in the highlights to serve as brightness references. The main thing our eyes do that cameras do not is change saturation with luminosity: a function of the cellular hardware in our retinas (rods vs cones), but for all the arguments about naturalistic reproduction, I’m not sure why this hasn’t yet been incorporated into any camera.

  8. I have to say – this is an impressive explanatory article. Maybe the best I’ve read on the topic yet. Great job!
