Back to basics: Rules of vision (or, things we can’t help seeing) – part I

Did you notice the sign above the man’s head? What about the house number? Or what appears to be a Cuban flag in the doorway? Or was the moving man the first ‘anchor’?

Judging from the correspondence and comments flying around recently, it’s about time we did a refresher course here on the fundamentals of composition and image-making. As usual, there’s far too much obsession over hardware and not enough thought about what it’s actually being used for – though the people most guilty of this are unlikely to read these posts anyway… This will be the first of several posts from the archives on this theme.

Regular readers will know that I hate arbitrary maxims labelled as ‘photographic’ rules, simply because there is no such thing as a ‘universal scene’ or a universal set of parameters for every image. Every composition is different, and every creative intention is different, which means the whole premise of a fixed set of laws that make a ‘good’ image or an ‘image that works’ can only be nonsense. However, I do think there are some fundamental principles of human vision – and consequently of our psychological response to elements in an image – that we cannot ignore, since they directly influence how our audience responds to the ideas we are trying to present. That is what I wish to address today: what are the autonomous/subconscious/reflex/automatic – pick your preferred term – visual responses that we should be aware of and seek to utilise when we compose an image? Think of this post as the predecessor to The Four Things: it’s the underlying reason why some of the Things have to be the way they are.

Here are the principles I’ve identified so far, and their implications for the underlying structure of an image – there may well be more that I’m not consciously aware of:

We are attracted to anything that breaks pattern
This is the first principle in many ways, because everything that comes after it is contingent upon it: a uniform surface that fills our entire field of vision does not attract any attention in itself, yet the slightest flaw, imperfection or break draws our eyes straight away. The same is true of a scene in the real world: anything that seems out of place, either logically or in terms of contrast, colour or tonal continuity, will jump out. The only time something uniform and homogeneous draws attention is when it’s a much smaller part of a whole – then it breaks pattern from the rest of the frame by being empty.
Photographic implication: Make sure your subject breaks pattern to stand out; think of camouflage and reverse camouflage.

We follow lines
Lines break pattern because they divide up the overall composition – our eyes then follow the break until we reach the end, presumably looking for the reunification of the two sides. Where a line begins and ends inside the frame, we mostly ignore it. Where it does not, or where it takes up a large enough portion of the frame, we cannot, and we follow it to the end. Since most cultures read from top left to bottom right, we tend to follow lines in this order of priority; however, we also follow lines from the bottom of the frame to the top, because that matches the general layout of the world – things at our feet are generally closer than things in the middle of our field of view, unless other circumstances break this.
Photographic implication: Use lines to lead the eye to the intended subject, or draw the audience through the frame past subjects in the intended causal direction for your story.

We look inside frames
Frames are closed lines that end up isolating a specific section of the composition from the rest of it; they form a boundary between the isolated element and the background. An image is stronger when structured this way, for the same reason we frame a picture to isolate it from the rest of the wall while still relying on the wall for situational context.
Photographic implication: Put your subject inside an area of uniform textural continuity; it doesn’t have to be an actual frame, but even a plain background running most of the way around will do the trick.

We prioritise more and bigger
A big object will stand out more than a small one within the same presentation space, as will a brighter one, a more colourful one, etc. Tension and balance are created when you have two objects of similar or identical visual weight competing for your attention. They do not have to be identical objects, just objects of identical prominence relative to the background.
Photographic implication: Consider the visual mass distribution of your composition: it shouldn’t tip to one side as though it’s trying to throw you out of the frame, and the heaviest part should be where you want your audience’s attention to settle. Note that some factors, such as pattern recognition, can significantly alter visual weight…

We pattern-recognise familiar objects and text
Humans have a sort of built-in visual vocabulary: objects we encounter regularly are stored in our subconscious so that they don’t require conscious, active effort to process every time we see them. This is directly related to the element of surprise. If we have seen something before, the brain has an expectation of both the object and any causality of events attached to it – and we only become surprised when actual fact does not match our experience. What this means is that we are drawn to certain elements in an image (e.g. human faces, text, animals) much more strongly than their visual prominence as abstract forms would suggest. It also means that we perhaps don’t evaluate them as closely as unfamiliar objects, because our brains fill in the gaps – we see what we expect to see.
Photographic implication: Familiar elements can be very small and still carry as much visual impact as much larger unfamiliar ones; if they are large, their impact is much higher still. However, if they appear in a composition and are not the main subject, they can end up being far more distracting than you might expect.

We draw causal inferences from proximity
Since the space of the ‘stage’ in an image is defined and restricted by the frame boundaries, we look at the relative spacing between elements to determine whether there’s any implied relationship between them; less distance implies closeness and familiarity. A composition is also somewhat misleading in that it can exclude surrounding elements; you can eliminate competition for proximity simply by leaving competing elements out of the frame.
Photographic implication: The impression of closeness can be created by removing external distractions, or by artificially creating the feeling of space by putting subjects in the foreground and using a wide lens to exploit perspective.

We draw spatial inferences from shadow overlap and contrast
If an object casts a shadow on another object, it is assumed to be in front of it; the ‘bottom’ is always assumed to be darker than the ‘top’, even if this may not physically be the case given the object’s actual orientation. More distant objects are assumed to have lower contrast – think misty mountains.
Photographic implication: If you’re lighting something physically impossible, you can shoot it in any orientation you wish, so long as the difference in brightness between ‘top’ and ‘bottom’ is maintained – even very subtle differences can be enough. On top of that, further structure in an image can be created by gradients in both luminance and contrast – this encourages progression through the frame and suggests varying distances, too.

This is a rather heavy article, so I’ll give it a day or so to digest and continue it in Part 2.


  1. I think of Saul Leiter’s street photographs when I read this – the way he has objects in the frame that obtrude between the ultimate subject and the camera, framing them as though by chance. And of course it is by chance, but the act of taking the photo at that moment is not by chance.

  2. This is really helpful, and nicely explained and described. Thank you!

  3. Ming, you added the necessary caveats, but your non-rule rules still contain the limiting concept of the image having a subject and attracting attention. For commercial and perhaps reportage photography those are good assumptions, but when aiming for art they might well be obstacles, and rightly so.

    Many Düsseldorf School photographers have produced very powerful images where your eye isn’t really led to any endpoint and there isn’t one subject. The whole frame is an almost equivalent field of interest. Your eye may follow structure but will likely keep moving.

    Perhaps Thomas Struth is the best example. Most of his images, even portraits, have this quality. A perhaps too obvious example would be his shot of Milan Cathedral.

    To my eye the people at the base aren’t the endpoint of viewing. They are just the slightest bump when scanning across the surface.

    • We can argue about what constitutes art ad infinitum, but to me – I’d have deleted that photo. But I know enough gallerists who’d sell it if it was the right name and the right amount of money, so what do I know?

      • That’s a hilarious, and rather telling, response! Do you know, as in have you seen exhibitions of, Thomas Struth, Axel Hütte or Thomas Ruff? They are obviously ‘old hat’, but I’ve actually never encountered anyone who is visually/art receptive and not taken by their work. Obviously, since the work is mostly made for large-format prints, exhibitions are a must to fully experience it, but I still find that it works in books etc.

        I realize this isn’t the forum to dive into this, but I’m very curious how you experience the above photo. Do you arrive at your quality assessment analytically or emotionally or through a combination? It’s a bit of a cliché, but I find it accurate, that much of the best work is slow to digest and difficult to pin down. Spectacular images often turn out to be one-liners: the image is over once you’ve seen it. Such an image will never be art unless it’s for some specific, unusual context where this quality is important.

        Hütte’s landscape work is interesting, as it could almost be Nat Geo-type crap, but the emotional response is miles from it. Articulating why it’s exceptional is very hard work; it’s just expected to be apparent.

        • I have – and I must be missing something, but with the exception of Axel, the vast majority of the images don’t do anything for me. I wondered if they were promoted as some kind of practical joke by the gallerists…if I posted anything like Struth or Ruff even here, I’d be laughed at.

          “I’m very curious how you experience the above photo. Do you arrive at your quality assessment analytically or emotionally or through a combination?”
          It just isn’t interesting to me. There isn’t anything that holds my attention or compels me to look longer/deeper, and it wasn’t a casual glance. If it works purely on the basis of quality alone, then there must be some magic going on, because I’ve probably got more experience than most with quality losses in translation between media… (Forest series, Over Australia, Verticality, all of my MF work that was intended to be printed large, etc.)

          Hütte is much more abstract than Nat Geo-style work, which is decidedly documentary. It’s this abstraction of something identifiable that suggests a sort of endless visual loop to get lost in, whereas anything documentary is very causal and has a distinct narrative flow – at least if it’s done right.

          • Thanks for taking the time for a thoughtful response.

            • pascaljappy says:

              Hi Sigurd, this is an interesting debate and an interesting photograph to highlight it.

              The photographers taught by the Bechers all share a deadpan aesthetic that has its role in many situations. For example, the typologies used very similar, low-contrast lighting. They used long focal length lenses to flatten perspective, as in portraits. And the near-identical framing/composition was there to produce a global uniformity that helped the differences between the objects stand out. It’s a very scientific approach, in a way: normalise everything, and anything that sticks out is different. I find the typologies fascinating.

              But I’m with Ming on this specific photograph. It’s conceptual. The composition highlights the scale of the building. It juxtaposes tourists with a place of worship. It is perfectly executed. It has sold for half a million. It has all the ingredients. But I too find the recipe lacking. While the Bechers (and some other work by Struth) are all about exactitude and objectivity, I can’t help feeling the crop is sloppy, and not in a way that feels intentional. The perfect symmetry of the building contrasts with the random sprinkling of humans. But still, it feels like there is no real connection between the two (unlike in his brilliant series on museum-goers). To me, the relationship between the two evokes neglect, but the pristine building denies that.

              But I guess the fact we are talking about it so much means it is a great photograph 🙂
              All the best,

  4. Kristian Wannebo says:

    Good that you are republishing some of your more basic articles on photography!
    Your archive has grown so large that perhaps you should add a subcategory for them, to make them easier for newer readers to find?

  5. jean pierre (pete) guaron says:

    I hope you don’t mind me keeping a copy of this article, Ming. I think it’s an essential addition to my personal library on photography! (I missed the Cuban flag – my “bad”!)

  6. Kathleen says:

    Excellent! Remembering the fundamentals makes for much better images! Thank you!


