Rules of vision (or, things we can’t help seeing) – part I

Did you notice the sign above the man’s head? What about the house number? Or what appears to be a Cuban flag in the doorway? Or was the moving man the first ‘anchor’?

Regular readers will know that I hate arbitrary maxims labelled as ‘photographic’ rules, simply because there is no such thing as a ‘universal scene’ or universal set of parameters for every image. Every composition is different, and every creative intention is different, which means the whole premise of there being a fixed set of laws that make a ‘good’ image or ‘image that works’ can only be nonsense. However, I do think there are some fundamental principles of human vision – and consequently psychological response to elements in an image – that we cannot ignore, since they directly influence the response of our audience to the ideas we are trying to present. That is what I wish to address today: what are the autonomous/subconscious/reflex/automatic – pick your preferred term – visual responses that we should be aware of and seek to utilise when we compose an image? Think of this post as the predecessor to The Four Things: it’s the underlying reason why some of the Things have to be the way they are.

Here’s what I’ve found so far, along with the implications for the underlying structure of an image – there may well be more that I’m not consciously aware of:

We are attracted to anything that breaks pattern
This is the first principle in many ways, because everything that comes after this is contingent upon it: a uniform surface that fills our entire field of vision does not attract any attention in itself, yet the slightest flaw or imperfection or break draws our eyes straight away. The same is true of a scene in the real world: anything that seems out of place, either logically or in a contrast/color/tone continuity sort of way, will jump out. The only time something that’s uniform and homogeneous draws attention is when it’s a much smaller part of a whole – then it breaks pattern from the rest of the frame by being empty.
Photographic implication: Make sure your subject breaks pattern to stand out; think of camouflage and reverse camouflage.

We follow lines
Lines break pattern because they divide up the overall composition – our eyes then follow the break until we reach the end, presumably looking for the reunification of the two sides of the break. Where a line begins and ends inside a frame, we mostly ignore it. Where it does not, or takes up a large enough portion of the frame, we cannot, and we follow it to the end. Since most cultures read top left to bottom right, we tend to follow lines in this order of priority; however, we also follow lines from bottom to top of the frame because that follows the general layout of the world – things at our feet are generally closer than things in the middle of our field of view, unless this is broken by other circumstances.
Photographic implication: Use lines to lead the eye to the intended subject, or draw the audience through the frame past subjects in the intended causal direction for your story.

We look inside frames
Frames are closed lines that end up isolating a single specific section of the composition from the rest of it. They are boundaries between the isolated element and the background. Just as we use a frame to isolate a picture from the rest of the wall, yet still need the rest of the wall for situational context, an image is stronger when structured this way.
Photographic implication: Put your subject inside an area of uniform textural continuity; it doesn’t have to be an actual frame, but even a plain background running most of the way around will do the trick.

We prioritise more and bigger
A big object will stand out more than a small one within the same presentation space, as will a brighter one, a more colourful one, etc. Tension and balance are created when you have two objects of similar or identical visual weight competing for your attention. They do not have to be identical objects, just objects of identical prominence relative to the background.
Photographic implication: Consider the visual mass distribution of your composition: it shouldn’t tip to one side as though it’s trying to throw you outside the frame, and the heaviest part should be where you want your audience’s attention to settle. Note that there are factors that can significantly alter visual weight such as pattern recognition…
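For the numerically inclined, the idea of visual mass distribution can be roughed out in a few lines of Python. This is only a sketch under some loud assumptions of my own: the image is a plain grid of grayscale values, the ‘background’ is taken to be the median brightness, and visual weight is simply how far each pixel departs from that background. The function name `visual_centroid` is hypothetical, and the sketch ignores colour, pattern recognition and everything else that shifts weight in a real photograph.

```python
def visual_centroid(gray):
    """Estimate where the 'visual mass' of a grayscale image sits.

    gray: list of rows, each a list of brightness values (e.g. 0-255).
    Returns (cx, cy) as fractions of width and height, so (0.5, 0.5)
    means the weight is centred in the frame.

    Assumption (a crude proxy, not a perceptual model): a pixel's weight
    is its absolute difference from the median brightness of the frame.
    """
    height = len(gray)
    width = len(gray[0])

    # Treat the median brightness as the background level.
    flat = sorted(value for row in gray for value in row)
    background = flat[len(flat) // 2]

    total = sum_x = sum_y = 0.0
    for y, row in enumerate(gray):
        for x, value in enumerate(row):
            weight = abs(value - background)
            total += weight
            # Use pixel centres (x + 0.5, y + 0.5) for the position.
            sum_x += weight * (x + 0.5)
            sum_y += weight * (y + 0.5)

    if total == 0:
        # A perfectly uniform frame has no preferred point.
        return (0.5, 0.5)
    return (sum_x / total / width, sum_y / total / height)


# A dark frame with one bright spot in the top-right corner:
frame = [
    [0, 0, 0, 255],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
cx, cy = visual_centroid(frame)
print(cx, cy)  # the centroid is pulled towards the top-right corner
```

A result far from the middle of the frame suggests the composition ‘tips’ in that direction; whether that is a problem still depends entirely on where you want attention to settle.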

We pattern recognise familiar objects and text
Humans have a sort of built-in visual vocabulary: objects we encounter regularly are stored in our subconscious so that they don’t require conscious and active effort to process every time we see them; this is directly related to the element of surprise. If we have seen it before, the brain has an expectation of both the object and any causality of events attached to it – and we only become surprised when actual fact does not match our experience. What this means is that we are drawn to certain elements in an image – e.g. human faces, text, animals etc. – and much more strongly than their normal visual prominence as an abstract form would suggest. It also means that we perhaps don’t evaluate them as closely as an unfamiliar object, because our brains fill in the gaps – we see what we expect to see.
Photographic implication: Familiar elements can be very small and still have the same visual impact; they can be large and have much higher visual impact; however, if they exist in a composition and are not the main subject, they can end up distracting much more than you might expect.

We draw causal inferences from proximity
Since the space of the ‘stage’ in an image is defined and restricted by the frame boundaries, we look at relative spacing between elements to determine if there’s any implied relationship between them; less distance implies closeness and familiarity. A composition is somewhat misleading in that it can exclude surrounding elements, too; you can eliminate competition for proximity simply by leaving it out of the frame.
Photographic implication: The impression of closeness can be created by removing external distractions; conversely, the feeling of space can be created artificially by putting subjects in the foreground and using a wide lens to exploit perspective.

We draw spatial inferences from shadow overlap and contrast
If an object projects a shadow on top of another object, it is assumed to be in front of it; the ‘bottom’ is always assumed to be darker than the ‘top’ – even if this may not physically be the case by the laws of gravitation. More distant objects are assumed to have lower contrast – think misty mountains.
Photographic implication: If you’re lighting something physically impossible, you can shoot it in any orientation you wish, so long as the ‘top’ and ‘bottom’ difference in brightness is maintained – even very subtle differences can be enough. On top of that, further structure in an image can be created by both gradients in luminance and contrast – this encourages progression through the frame and suggests varying distances, too.

This is a rather heavy article, so I’ll give it a day or so to digest and continue it in Part 2.



Images and content copyright Ming Thein | 2012 onwards. All rights reserved


  1. hi sifu, i could not quite follow the last point’s >>>Photographic implication: If you’re lighting something physically impossible, you can shoot it in any orientation you wish, so long as the ‘top’ and ‘bottom’ difference in brightness is maintained – even very subtle differences can be enough.<<< , if you don't mind, could you elaborate further? thanks, ken. happy diwali 🙂

  2. Fred Thomas says:

    A thought-provoking post that is easily tied to Gestalt Theory for Photographic Composition and the six or seven Gestalt Principles (whether you get six or seven depends on who you read), which are intended to define how the viewer responds to an image.


  3. Hi Ming
    I know this article is not intended to provide a full catalog of visual cue types and how they work; however, I think there may be one more worth some discussion, and that is boundaries (and possibly transitions). I think that this is often a more passive visual cue that is more related to background and subject context. By boundary I am thinking of separation and delineation, not something linear that implies direction, as per your description of lines. A fence or kerb could be graphically linear but also perceived as a boundary delineation. Changes in the materials of a surface such as a wall or paving can be pattern creating or pattern disruptive, or the change could demarcate a different environment, location or status.

    I know I am being a bit contradictory, do not have complete confidence in what I am writing, and could be overthinking it. Either way, I am interested to know what you think, and look forward to reading part two.

    • Good point: I considered that, too, but wasn’t sure if it would fall into pattern recognition instead – we recognise a fence as a physical divider first, rather than a metaphorical boundary, for instance.

  4. jean pierre (pete) guaron says:

    What could I possibly say, Ming? Except “thank you”!
    We all have to start somewhere – the “rule of thirds”, for example 🙂 But perhaps some of us have stayed “somewhere” for too long, and it is time for us all to move forward.
    Your article is very stimulating and thought provoking and hopefully will improve the quality of our photography.

  5. Interesting, of course. What would help a lot would be to illustrate each point with an image, which I am sure is just a matter of the time spent finding the images, which you may have just now. Eventually, this would make a good section in a book, with illustrations.

  6. Very instructional Ming. Looking forward to part 2.

  7. I didn’t read this as a rant against rules in the least but rather as a thoughtful exploration of what naturally attracts our attention, informs our perception and how, in some cases, it relates to the conventional rules of composition. As such it is anything but “redundant”.

    • Bingo. Not once do I mention ‘thirds’… 🙂

    • “I hate arbitrary maxims labelled as ‘photographic’ rules” Sounds like the start of a rant to me. An artfully put together rant, but a rant nonetheless. It belongs with the ten thousand other very similar such words on the rules we have online. Now having said this, I really appreciate Mr Thein’s blog, so please don’t take it personally. Just participating in an honest way. 🙂

  8. I get so tired of these rants against the rules. It is just getting to the point of redundant stupidity. The rules are there as a guide to methods that work. That in no way ever means that not using them is a freaking disaster to be avoided at all costs. They are good to learn, I think, for many people who just seem to have no inner talent for things like composing a good shot. However, many people just know it when they see it. Break the rules, have fun and create what makes you happy. For those maybe just learning, the rules can be a great tool as they move along the path to improving their craftsmanship. Peace

    • I think you’ve just inadvertently proven pattern recognition is a human trait…

      • In this case reading just a few words suggested a pattern sufficiently vivid to elicit an emotional reaction. The psychological need for pattern is so strong that the mind unburdened by logic will create one whether it is correct or not: pareidolia in action.

    • Kristian Wannebo says:

      There certainly is something in that – but a big problem too.
      It can take a huge effort to unlearn rules!
      Works by self-taught artists are often more interesting!


  1. […] an image is determined by somewhat immutable physiological underpinnings (rules of vision, part I and part II) that are common across all human observers with normal vision – the important […]

  2. […] to frame and compose them in a way that’s aesthetically pleasing (and/or different), uses the fundamentals of vision and The Four Things to crate the clearest distillation possible, and preferably with a little […]

  3. […] touched on the cliches, we’ve touched on the physiology (much more detail in this and this article) but we haven’t touched on some things that generally make sense; I use the […]

  4. […] Human visual physiology is almost hardwired by default to notice differences – presumably as a holdover from early days when we as a species still had natural predators (apart from fellow man). Being able to note differences to a normal, ‘safe’, defined, already-explored environment is precisely what might make the difference between being alive or not. The world today has however grown to a point where there’s so many differences at the micro-level that we tend to simply ignore anything that isn’t either related to something of interest to us, or act like gawking tourists. It’s almost necessary to survive and not get overwhelmed by stimuli – visual and otherwise; a good example is eventually how one tunes out advertising to the point you only really notice when there isn’t any (in Cuba or Myanmar a few years back, for example). Bottom line: we feel stimulated only when we notice something different. And we only notice what we have some personal interest in. […]

  5. […] stars align, and for an ideal audience, the photographer has done everything they can to compose according to the way most people’s visual cortexes work: the audience themselves may not have the necessary local knowledge or cultural/ local context to […]

  6. […] used. Nor are they hard rules in the sense that they are formulaic; they are guidelines based on the way our visual cortex processes information and the way our brains interpret […]

  7. […] Ideas to do with luminance are easy to express: bright and dark refer respectively to the presence or absence of light. We in turn make other conceptual and emotional associations with those quantitative properties: there’s a sense of clarity, intensity, purity, simplicity, happiness etc. associated with light scenes. Dark scenes can translate anywhere from mysterious to elegant to oppressive to dangerous. Where, exactly, an image lies on this continuum depends very much on other properties: color, subject matter, spatial arrangement and composition. When an image is fairly uniform in luminance across the entire frame, we do not really perceive one particular area as being more prominent than anywhere else – however, since this is almost never the case in reality, there always remains one or more isolated areas that attract view attention – they break pattern, are different in luminance, and thus higher in contrast. It’s the contrast that attracts our eyes, more than anything else – but I am at risk of digressing into the physiological nature of vision. […]

  8. […] the last two articles on rules of vision, it seemed very appropriate to finish the mini-series with this little reminder from 18 months […]

  9. […] from part I – hopefully the first part has had time to settle and digest; let us press […]