Gratuitous header; moment of enlightenment.
One of the unavoidable buzzwords of the last couple of years has been ‘computational photography’. Besides sounding slightly oxymoronic, and insulting to the ‘real’ photographer who presumably represents what they see and doesn’t attempt to manipulate objects into (or out of) existence, the reality is that it has been unavoidable since the start of the digital era. Everything that requires photons to be converted into electrical signals and back to photons again (whether off a display or reflected off a print) must be mathematically interpreted and altered in some form before output. It is not possible to avoid this: Bayer interpolation, in-camera JPEG conversion, saving to any file format, conversion to a print color space – a ‘computation’ has to be performed to translate the data. Hell, there’s already an implicit computation in the analog to digital stage (although arguably photons are already ‘digital’, since they represent discrete quanta of energy; but that’s another discussion for another time). However, what I’d like to discuss today* is something one step further down that road, following on from the previous posts on format illusions: in light of the broader possibilities of computational photography, what does ‘format’ even mean?
*I.e. excluding things like subject recognition for tracking, depth mapping and simulated shallow DOF transitions etc. for the time being; we’ll revisit that later.
Beyond the obvious erasure of memory cards, the concept of format refers to the size (and relative size) of the recording medium. Different sizes of recording medium come with different tradeoffs, strengths and weaknesses; but generally, the more the merrier. The reason this becomes largely academic (and possibly highly confusing from a characteristic-property standpoint) is that computational photography allows you to effectively change the size of the recording medium on the fly, without necessarily any accompanying physical changes. And precisely therein lies the first aspect of confusion: optical properties are hardwired to physical ones, meaning that no matter how much you pano-stitch on your iPhone, the lens still retains the physical and optical properties of a 4.25mm f/1.8 (or whatever it happens to be) – even if the angle of view is different, and the dynamic range/color/tonal results no longer match what you’d expect of a 1/2.3″ sensor. Panoramic stitching changes the effective overall angle of view but not the local properties of the recording medium in each small portion of the image; other techniques, such as HDR stacking or super resolution stacking, change the effective photosite properties but not the angle of view. And we haven’t even talked about depth mapping yet, which is a bit more complex and which we’ll leave for later.
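To make the panorama case concrete, the geometry fits in a few lines. Below is a minimal sketch in Python; the 4.25mm lens, ~6.17mm sensor width, two frames and 30% overlap are illustrative assumptions, not figures from any particular phone.

```python
import math

def horizontal_aov_deg(focal_mm, sensor_width_mm):
    # Horizontal angle of view of a rectilinear lens on a given sensor width
    return math.degrees(2 * math.atan(sensor_width_mm / (2 * focal_mm)))

# Hypothetical phone module: 4.25mm lens on a ~6.17mm-wide (1/2.3") sensor
single = horizontal_aov_deg(4.25, 6.17)                  # ~72 degrees per frame

# Sweep panorama: angles add, less the assumed overlap between frames
frames, overlap = 2, 0.3
stitched = single * (1 + (frames - 1) * (1 - overlap))   # ~122 degrees total

# Focal length a 36mm-wide 'full frame' sensor would need for the same view
equiv_ff = 18 / math.tan(math.radians(stitched / 2))     # ~10mm

print(f"per frame: {single:.0f} deg, stitched: {stitched:.0f} deg, "
      f"full-frame equivalent: {equiv_ff:.1f} mm")
```

The stitched file behaves as though it came from a much wider lens on a much larger piece of ‘film’, yet every local patch of it still carries the optical signature of the little 4.25mm.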
Bottom line: the ‘traditional’ expectation that this sensor size and this angle of view equals this kind of look (or maximum range of looks, given the latitude of the medium) is simply no longer true. Tonally, at any rate, there has been enough improvement that the capabilities of most sensors in most situations exceed what you actually need to represent both the scene and, more importantly, the photographer’s idea of how the scene should look. There’s more resolution than most output media can display. This means we have headroom to spare, and sufficient flexibility to manipulate the tonal map of the output into whatever we wish. In short: the imagination now has to catch up with the technological possibilities afforded by capably-used equipment.
Important diversion: resolution is NOT independent of tonality. More resolution doesn’t just mean more ability to count hairs, but the ability to create fine tonal gradations even across areas of the subject that don’t necessarily have any high frequency detail. Stacking images shot from slightly different points of view to create a super resolution image does just this – provided, of course, your subject doesn’t move between captures. But make the capture speed fast enough, and you effectively have a static situation most of the time – even if the camera is handheld. You just need enough difference to provide at least a sub-pixel shift between captures, and this is what most of the pixel shift and super resolution cameraphone modes do. Something similar (or even combined) can be achieved tonally by varying exposure between captures. In an ideal world, you’d be able to do both: shoot a lot of pictures very quickly at different exposures and then blend the whole in a natural way to create an image that’s both high in resolution and has input dynamic range precisely matching the scene.
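As a rough illustration of the stacking idea (a naive sketch, not any particular camera’s pipeline), the following Python/numpy snippet assumes the sub-pixel offsets between handheld frames have already been estimated by a registration step, and simply accumulates each frame onto a finer grid; exposure blending would be the same loop with a per-frame gain normalisation applied first.

```python
import numpy as np

def stack_superres(frames, offsets, scale=2):
    """Naive super resolution: drop each sub-pixel-shifted frame onto a finer
    grid at its estimated (dy, dx) offset, then average overlapping samples.
    frames  - list of 2D arrays of identical shape (exposure-normalised)
    offsets - list of (dy, dx) sub-pixel shifts in input pixels, from a
              registration step not shown here
    """
    h, w = frames[0].shape
    acc = np.zeros((h * scale, w * scale))
    hits = np.zeros_like(acc)
    ys, xs = np.mgrid[0:h, 0:w]
    for frame, (dy, dx) in zip(frames, offsets):
        # nearest-neighbour placement of every sample on the fine grid
        fy = np.clip(np.round((ys + dy) * scale).astype(int), 0, h * scale - 1)
        fx = np.clip(np.round((xs + dx) * scale).astype(int), 0, w * scale - 1)
        np.add.at(acc, (fy, fx), frame)
        np.add.at(hits, (fy, fx), 1)
    return acc / np.maximum(hits, 1)   # cells never hit stay at zero
```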
That takes care of tonality, dynamic range and resolution. Even the preserve of larger formats – focal plane transitions – is eroding. This is where depth mapping comes in: with cameras in two locations, it’s possible to calculate the distance of each subject element in the scene that’s visible to both cameras; it’s a matter of trigonometry once you know the focal length and the physical separation between the cameras (which presumably does not change). The greater the angular difference, the better this depth map will be – achieved by bringing the subject closer, increasing focal length, increasing camera separation, or all three. Or you could go a separate route and sample the light field at the sensor itself (Lytro’s approach). There are degrees to this implementation, of course – most of us are familiar with face detection and faux-bokeh modes on smartphones, but they’re limited by both physical size (camera separation and focal length) and the amount of time consumers are willing to wait for computation and rendering. The most flexible kind of depth map would require recording the whole scene in focus, with sufficient resolution and dynamic range, from two fairly different vantage points; from there we could change composition within fairly wide latitude after the fact, and we would be only one step away from ray tracing and the ability to change the light, too. Oddly enough, there doesn’t yet seem to be that much overlap between VR content generation and photography – this may be due to practical hardware limitations, or perhaps it’s just a step too far for most hobbyists; after all, we already see a sharp decline in interest as effort increases even in conventional photography.
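The trigonometry that turns two views into a depth map reduces to a single ratio once the disparity of a point has been measured; here’s a minimal sketch, where the focal length in pixels and the 12mm baseline are hypothetical phone-like numbers rather than figures from any specific device.

```python
def depth_from_disparity(focal_px, baseline_m, disparity_px):
    """Two-camera triangulation: Z = f * B / d.
    focal_px     - focal length expressed in pixels
    baseline_m   - physical separation between the two cameras, in metres
    disparity_px - horizontal shift of the same scene point between the views
    """
    return focal_px * baseline_m / disparity_px

# Hypothetical pair: ~3000px focal length, cameras 12mm apart.
# A point that shifts by 9 pixels between the two views sits about 4m away;
# double the baseline and a point at 4m would shift by 18px instead of 9 –
# finer depth resolution, which is why more separation (or a longer lens) helps.
print(depth_from_disparity(3000, 0.012, 9))   # -> 4.0
```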
I wouldn’t leave these after-the-fact choices up to the viewer, though: firstly, the amount of data required would be colossal, and secondly, we would lose the most important part of photography: the creative interpretation of the photographer. Without that conscious exclusion of the elements in the scene they do not find interesting or relevant to their vision, we might as well just be creating a VR tour. I’m also not sure that this much choice is a good thing for photographers: it means you are not forced to make a decision at the time of capture; later reflection on the scene may yield very different feelings, or no feelings at all, and the transience of that moment is lost. There’s also the distinct possibility of paralysis by choice: it’s already bad enough most of the time with a zoom lens.
I suspect the reason we haven’t really seen any of this take off yet isn’t a technological one, or a cost-related one. It’s more to do with imagination: it already requires a lot of practice and training to effectively visualise how three dimensions reduce to two for the scene in front of you; it’s a step further to then apply a perspective different from the one naturally afforded by your eyes; then compound this by attempting to imagine the scene from a vantage point other than the one you are physically at. And we still need to add post processing to that, which, if done properly, isn’t a ‘try the filters at random’ approach but has repercussions on how you need to shoot to capture the exposure data you need later for a smooth tonal manipulation. All of that is already difficult, and we haven’t even left traditional single-capture photography yet. The concept of variable depth of field never really took off – even with the post-focus correction ability that came with it, despite the latter being genuinely useful in fast situations where your camera might have missed focus slightly and that minor tweak can make the difference between a near miss and ‘nailed it’.
In essence, a transition to fully computational photography is one step away from asking people to become directors of animated films – you control camera angle, light, field of view, depth of field etc. after capturing the scene data. It shifts the creative process from being in a place, seeing an idea and capturing it, to capturing data and then trying to extract an idea afterwards; these are very different cognitive processes. On top of that, it complicates the process too much: we go even further down the road of being button pushers rather than simply seers. I know even the best of us have a fixed latitude of concentration: there’s only so much awareness of the scene you can have if you’ve also got complex machinery to operate. In a way, that’s perhaps why some of the modern cameraphones really deserve recognition: not because of what they can do, but because of what they do in the background without our being aware of it. I know that most of the time the result looks as I’d expect at the output sizes I’d use something like that for, and perhaps a bit beyond (don’t look too closely, it’s ugly at the pixel level; perhaps almost deliberately so, to trick the eyes into believing there’s more continuity of texture than was actually captured) – but I don’t know how it’s doing it for any given frame. You only notice it when the result isn’t as you expect, and even then, that happens less and less. Personally, I like this vision of computational photography because of the freedom it affords to focus on the picture, not the process. MT
__________________
Very well written as always, Ming. At the same time, it is getting harder to separate photorealistic computer generated imagery from actual photographs (you might have seen the computer generated faces at thispersondoesnotexist.com). I’m worried that digital/computational photography is heading towards a future where the pixels of a digital photograph have as little relation to reality as the pixels of a computer game. When that happens, photography is lost. What do you think?
Thanks. Yes, those faces are a bit surreal – but who’s to say they don’t exist vs haven’t been seen yet?
Photography has always been about interpretation of the visual – what that means is changing; whether we want to have a new term for it or not is something else, I guess.
Two years ago I accidentally discovered that the iPhone 7 was “painting over” details in the not-so-dark shadows that were visible with a much older 5MP real camera and very clear with a D810, of course. The iPhone (at its default settings) completely erased a thick written engraving and changed it to… contextual mud that had nothing to do with reality. I posted the pics and made a video about them. The iPhone lied. A camera is not supposed to lie. This year I also saw clearly that “Gigapixel AI” was… inventing its own roofs on buildings (!), completely different from what was there in real life, and was replacing window shadows with stucco and painting over windows (!!). My experience so far is that “computational photography” is a series of Photoshop-like presets and combos that alter reality far beyond what even aggressive upscaling would do, and God help us if “computational photography” photos were ever admitted in court as evidence. Photography is simple, and if you have good taste, a trained eye and good influences, photos will turn out nice. The rest is “500 different recipes for broccoli”.
I think it may even be 500 ways to make broccoli from cauliflower, parsley etc…
I could not possibly be less interested in the direction modern “photography” is moving. Long before this topic even appeared, I burnt out on the endless carrot that digital offered. Now I find loading up my Nikon FM2n with a fresh roll of HP5+ offers infinitely more joy, passion, and contentment than whatever digital has today and most likely will have tomorrow.
But everybody knows your photos will be rubbish without pet eye tracking, beauty retouch filters and handheld pixel shift superresolution!
Jokes aside, the only serious improvement to photographic UI that increases one’s creative capacity (by taking away unnecessary distractions of the ‘mechanical’ process) is probably the original iPhone – newer versions have too many gimmicks. Notably, they left in the two most important controls: focus and exposure compensation. Has this been implemented in any other camera with a ‘serious’ sensor? No; perhaps due to patent restrictions, perhaps because the marketeers have spent so long convincing consumers that if your spec sheet and manuals aren’t the length of War and Peace, the camera is worthless.
As a former 3D artist and a photographer, I find this subject certainly intriguing, but I’m still not convinced by the current state of computational photography.
The 3D world is a very different one from the photography world, and from much of the digital content creation world in general. I think that many photographers are interested in computational photography only as far as it is done behind the scenes. I don’t see many photographers getting to the point where they map out a scene with a light field and create renders of each scene to improve IQ and composition. Photographers are only interested in the tip of the iceberg as it pertains to what they do.
Sometimes I wonder if there is a limit to this “format independence” that we hear so much about. Sure, certain aspects of larger formats can be MIMICKED with computational photography, but to a well-trained eye or under close inspection it still is not even close to perfect (i.e. fine detail at 100%, the finer points of foreground and background blur, etc.), and it does not take into account things like lens projection, edges, distortion, etc. These are some of the subtle hallmarks of larger formats that I wonder if computational photography will ever be able to mimic.
At least where the technology stands now, I see it being only a quick substitute for a larger format in a few areas. That being said, it’s a hit. I’m sure there are many photographers, especially the ones on the fence about dedicated tech, who see this as the future. I will be sticking with my larger sensors… for now.
I’m with you on this one. It only has to be ‘good enough’ to stand up to scrutiny at typical output sizes and media, and these are almost always hugely downsampled (e.g. capture at 50MP is nothing unusual, but 4K monitors barely display 8MP – let’s not even go into 20MP phones vs 1MP social media). A full depth map is simply too much effort for the average consumer, who mostly doesn’t even know about or use exposure compensation! But I wouldn’t be surprised if we eventually have devices that simplify this process to the point that it’s mostly automated other than the initial aiming…
I have just placed my zone system books back in their place on the shelf. What I learned is that a more complete understanding of the workings of the tools at hand has always allowed for more control over the unavoidable translation of input to output. The knobs are different, but the onus placed on the artist remains the critical element.
As has always been the case… 🙂
Nice article, Ming. The obvious question is: when will a high-end smartphone camera replace the need for a dedicated camera for the amateur photographer?
For the majority of the public, the latest image output of a smartphone is enough for all their needs (social media, memories, the occasional small print).
To get a similar result to a smartphone, I’d need a decent-size camera (1-inch sensors don’t really cut it anymore), transfer the image and post-process for a few seconds for every image.
The absolute pixel-level result is better from a dedicated camera, of course. The dynamic range is better from a decent-size sensor, for now. The question is, how long can mid-range cameras stay better than phones for most situations?
I think the answer is still ‘not yet’. As a recording or impromptu device, we’ve already been at that point for some time; as a full-blown creative device, it’s worth remembering that everything that can be done with a smartphone image can also be done with a larger-sensor one. The difference comes in the implementation of the computing part. My guess is the changeover point is going to be when input overtakes output: i.e. when we can’t tell the difference at the usual output sizes and formats we view the results on. We can see this has already happened for consumers: you can’t tell the ‘fake’ bokeh is fake at IG sizes…
Was it a sad day when I realized that my iPhone is a better photographer than I am? Or just another day, one to be noted and seen as an inflection point for a course change?
I’d see it as a sort of liberation – the less mental capacity has to be devoted to the ‘mechanical’ processes, the more is left over for the creative ones. Yes, we see a huge quantity of technically okay but compositionally disastrous garbage made and shared thanks to social media, but there are also some gems in there that I suspect might never have happened had their creators been put off trying at all by the intimidation of learning the process…
Heady territory – and being a moron, I’m going to study this in depth later – can’t do it on the run.
I imagine ANY form of post processing would be “computational” photography?
But weren’t we doing exactly that, before, with things like tilting the table beneath the lens of the enlarger, to straighten verticals? Dodging & burning? Retouching, to get rid of those awkward spots caused by tiny air bubbles on either the negative or the print?
And although it apparently won’t happen in my lifetime, does this change with the “super sensor” already invented by the guy who invented the sensors in use today, which promises to deliver a single pixel for each individual photon?
Pretty much – though mechanical or physical image manipulation probably falls into the category of post processing rather than manipulation of data to create something otherwise physically impossible (which I think underpins the core of what computational photography actually is).
As for the super sensor – sure, we’d have an incredible amount of information, but what would be the point of going so far beyond the ability to display it? Would having so much information that infinite crops are possible eventually reduce most things to just being snippets of Google (super) street view – or some satellite image?