Still, more mundane search parameters yield results of about the same
match quality as microstock photo sites do. And users aren't flocking to Getty
with large quantities of marketing-laden words for which to find matching
images. This means that, for the time being, we are still working in
an economic model where all competitors are roughly equal. This is why
major stock sites are having a hard time competing with microstocks.
But the playing field won't remain level for long. In an economic cycle for an industry that's reached this level of maturation, we are currently seeing consolidation. Once that cycle nears its end, the remaining competitors carve out their niches in the market and define their various competitive advantages. Which brings us back again to image search results. Preparing for that future should be at the forefront of the minds of both photo companies and photographers alike.
Yet, the problem isn't as easy to solve as one might think. One reason, as most of this article discusses, is that optimizing search results requires more than just images with descriptive keywords. It requires a more intelligent approach that involves both the keyworder and the searching mechanism itself. The operative word here is "intelligent," and trying to get two large, independent groups of people (companies and photographers) to cooperate on such a thing requires an entity to lead the way. And that's the missing piece.
Setting aside who that leader might be, let's at least wrap our heads around what the problem is and a direction for solutions.
Keywords
As we all know, keywords are used to describe the elements of an image. They're part of a larger abstraction called metadata. Or, as the techies call it, "data about data." In the photo world, metadata includes information like location, camera settings, and the granddaddy of them all, keywords. Using the following photo as an example, let's look at how we might keyword it.
It's a picture of an old man (who happened to be a camel shepherd in Turkey) standing next to a woman. When assigning keywords to this photo, you may start with "woman, old, man, camel herder, turkey, sunglasses, hat, beard." These keywords would describe the general contents of the image and where it was taken. So, how does it do in an image search? If a travel company's photo editor is searching for tourists interacting with native countrymen for use in an ad, these keywords won't satisfy that search, even though the photo itself may fit the bill. Realizing this, you start adding more words to the mix: "tourist" makes sense. What about "native?" Well, sure, since that's really what the picture is about. You rationalize that your target buyer may be just such a travel company, so you add "native."
When the photo is actually online, you discover that a search came in for "native turkeys." Is this really a good candidate picture for that search request?
Turns out, keywording images is a surprisingly complicated problem. In fact, it's far more complicated than indexing traditional text-based web pages and documents because those are based on natural language, which has rules for grammar and syntax that can be used to derive meaning. Photos have nothing but a flat and tiny list of equally-weighted "words" that have no contextual relationship to one another. The best the photographer can do is describe the photo as thoroughly as possible. For example, there's no way to know that the man in the photo is a camel shepherd unless the image had that information embedded into its metadata.
What does this lead you to conclude about keywording? Most people conclude that they should enter as much information as possible. In fact, the industry is driving fast and furiously in this direction. But, as I'll get into shortly, there are downsides to this approach that will be exacerbated once the economics of the photo industry change as sites begin to compete in the area of search. The problem with this keywording approach is that it fails to address two key problems: false positives, and "contextualization." (Now there's a winning Scrabble word.) If one performs a search using the phrase, "old man," the sample photo above will come up in search results, but it will also come up for a search using the phrase "old woman," because it contains both those keywords too. This is a false positive because even though it positively met the search criteria, it isn't what the user was looking for. And under the current design of metadata or search technology, there's no way to assign a keyword's context.
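To make the false positive concrete, here is a minimal Python sketch of word-for-word matching against a flat keyword list. The function name `flat_match` is made up for this illustration, and the matching rule (every query word must appear somewhere in the list) is an assumption about how a simple search might behave, not a description of any particular site.

```python
def flat_match(query, keywords):
    """Return True when every word in the query appears in the flat keyword set."""
    return all(word in keywords for word in query.lower().split())


# Keywords from the sample photo, stored as an unordered, equally weighted set.
photo_keywords = {"woman", "old", "man", "camel herder", "turkey",
                  "sunglasses", "hat", "beard"}

print(flat_match("old man", photo_keywords))    # True: the intended match
print(flat_match("old woman", photo_keywords))  # True: a false positive
```

Nothing in the set records that "old" belongs with "man," so both searches succeed equally.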
Since the current trend is to populate keyword lists with as many words as possible that describe an image (so they'll come up in search results), the likelihood of false positives rises as more and more keywords are added. So, the more people embrace this approach, the worse the problem of false positives becomes. It also contributes to a related problem, that of "near positives." This is the "second-hand smoke" of the keywording problem, and it happens when you come up with synonyms for your keywords in an attempt to cast an even wider net to cover all potentially related search patterns.
For example, someone looking for a photo of "architectural ruins" might use the search term "ancient" as a colloquial part of American vernacular. So, the keyworder might add "ancient" along with "old" to that image's keyword list. And this inspires the keyworder to start thinking of other synonyms: "antiquated, antique, archaic..." These might all be possible terms used by searchers, making them "near positives" in search results. This is even worse than the original problem simply because so many more potentially irrelevant keywords are being added to an image, causing an even greater likelihood of false positives.
Because people envision only the successful search results from this keywording method, and do not pay a penalty for the undesirable side effects I outlined, the photographer community is lurching forward in full force behind this strategy. In other words, there is no perceived downside to your photos being found in a search. In fact, if they don't come up, you lose.
What's making the problem even worse still is the search for automated methods for keywording. For example, the alert reader will notice that the words in my synonym list above are in alphabetical order, which can lead you to surmise that I went to a thesaurus to find them. That's right, and that's what many people do when keywording images. And once the keyworder notices that he's got thousands of images ahead of him in line, that's when the palm-slap-to-the-forehead event happens, and he thinks to himself: "Ok, hold on. This is going to take forever. There must be an easier way." Indeed there is. (Note: I said "easier," not "better.")
One such solution is a body of work called "the controlled vocabulary." One website on the subject can be found here. In addition to the white-paper-like description of the problem of keywording, it offers a product that is somewhat self-described as a "keyword thesaurus," which uses a large (11,000 word) database of words to streamline the keywording process. It not only automates some of the process, but it helps avoid the common human errors of keywording, such as word inconsistencies and typographical (spelling) mistakes. The theory is that if you use a single source for your vocabulary, and if it can automate your keywording process, your images are more likely to come up in searches.
The logical steps that lead to the need for such a tool are understandable. And the group-think has caught on, which is what got the industry into the state it's in today. As the lemmings follow each other over this cliff, more and more companies enter the fray and offer products that do similar things at various degrees of completeness. And they all echo in similar ways the same principle, "use the most extensive list of words that describe your image in its keywords list."
There are also keywording services that will do this for you. A simple search on Google finds many companies charging between $1 and $4 per image, depending on how much you want done. AtoZ is a service that, at its premium level, will attach over 150 keywords to each image, including word variations, singulars and plurals, verb tenses, adjectives, adverbs, slang terms, and even different spellings for hyphenated words.
Photo District News, one of the most prominent industry magazines, has keywording contests each year. It's a fun-but-gimmicky event that is gaining a lot of copycats among other photo-related companies trying to attract members or customers. Interest in keywording is growing at an exceedingly fast pace, which can be measured simply by searching for sites that offer the service or software packages.
All this leads to a phenomenon I call "keyword pollution." This is a term I use to describe an image whose keywords are such that the number of false positives causes the searcher to get exasperated and quit. There's no defined line when this exasperation sets in, but it can be measured by looking at the conversion rate between image searching and buying. When site traffic statistics show lots of searches, but few buyers, this is a good sign that keywords aren't assigned well.
Keyword pollution isn't measured only by the quantity of keywords, but by their quality. The other side of the coin is that the fewer keywords an image has, the more likely it won't come up in search results where it should. But the real overriding factor is that over-keywording images is driven by an economic incentive for the photographer.
Stock sites, on the other hand, do lose, because the more unreliable their search results are, the less likely buyers will use them. Once they figure this out, they'll want to address this shortcoming (and hence, compete more effectively in the marketplace). That's when substantial changes will occur in the perception of how keywording is handled.
This might also explain why Flickr is waiting to enter into the licensing business: their search results for images are pretty weak. From a purely technical perspective, the searching is markedly unintelligent. A search for "mountain range" has a different set of results than a search for "mountain ranges" (the only difference being the plural form of "range" in the second case). This shouldn't happen; the searcher should get exactly the same results in both searches, but Flickr's search mechanism is simplistic word-for-word matching, which is the sort of thing that encourages bad keywording behavior by users: images that match for both searches are those that double-keyworded "range" (to use both the singular and plural forms). If Flickr is going to enter this market, they'll do well so long as the image search landscape doesn't change. If it does, they'll have to fix this.
A buyer will pay a higher price to find the right image in 10 minutes rather than spend hours upon hours at microstock sites, even though the price there may be only $1/image. Changing the guidelines for how photographers keyword their images will change the economic incentive for everyone, and we'll begin to see a shift in the keywording process in the opposite direction from where it is now.
How does one avoid keyword pollution, yet still have a thorough list of keywords? Funny you should ask.
A Two-Tiered Approach
Addressing the problem of search quality requires a two-tier approach,
involving both the photographer (or initial keywording process) and the
search mechanism itself. Simply put, all the automated and non-creative
work that the photographer is doing to populate his image's keyword list
needs to be moved over to the search engine. That is, rather than stuffing
synonyms, conjugations, word variations, plurals, and so on, into a
photo's keyword list, images' keywords need to remain as minimal and
simple as possible. I call these the "core keywords." Let the search engine
apply all those synonyms and other automated processes in real-time as
searches are being done on those core keywords. Shifting the work from
one side to the other has extremely broad implications that can not only
optimize search results, but put even more control in the hands of the
searcher, who, after all, is the one that needs to be satisfied.
Having this kind of intelligence on the search side of the communication offers flexibility that cannot be achieved if keywords were hard-coded into images. Not only can that flexibility be tuned by the site administrators, but it can also be personalized by each user to adjust to his tolerance for false positives. Allowing the user to control the use of plurals, synonyms and antonyms, conjugations and even language translation (or localization within a language) is where the bulk of the progress can be realized without a huge investment.
This not only minimizes keyword pollution, but it has many other benefits. One of the first is that it isn't as critical to use a "controlled vocabulary." Note that the searcher isn't going to know about a controlled vocabulary, so the search mechanism is going to have to deal with input patterns anyway. Just as the search engine will translate an image's keywords on the fly, it will also translate the user's input in the same way. Using these two sets of potential matching terms is how one builds the results list.
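A minimal Python sketch of that idea follows: the same expansion is applied to the image's core keywords and to the searcher's input, and the two expanded sets are compared. The synonym groups and the names `SYNONYM_GROUPS`, `expand`, and `overlap` are hypothetical, chosen only for illustration; a real site would load a much larger table and would likely apply stemming as well.

```python
# Hypothetical synonym groups; a real site would load a far larger table.
SYNONYM_GROUPS = [
    {"old", "ancient", "antique"},
    {"ruins", "remains"},
]


def expand(terms):
    """Return the terms plus every synonym that shares a group with one of them."""
    base = set(terms)
    expanded = set(base)
    for group in SYNONYM_GROUPS:
        if base & group:
            expanded |= group
    return expanded


def overlap(query_terms, core_keywords):
    """Expand both sides the same way and return the terms they have in common."""
    return expand(query_terms) & expand(core_keywords)


core = {"ancient", "ruins"}          # what the photographer typed
print(overlap({"old"}, core))        # non-empty: "old" reaches "ancient"
print(overlap({"car"}, core))        # empty set: no potential match
```

Because the expansion lives in one place, the image only ever carries its core keywords, and changing the synonym table changes every search at once.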
If an image contains "old" and the user types "ancient", a potential match may occur, but it could also be the other way around: the image has the keyword "ancient," but the user typed "old." In the old method where the photographer added each of these words to the keyword list, they would be "equal" in weighting, so the search engine would have to arbitrarily choose which gets the "higher" rating. In the new format, because the search engine itself is doing the keyword expansion into synonyms and the like, it can use that as a weighting for how it ranks the search results back to the user. If the image's "core" keyword list included "ancient" (because the photographer felt it was the most appropriate description of the architectural ruins) and no other synonym, then this would be a more likely intended match for the user who typed "ancient" as well. If the user searched for "old building," the photo of the architectural ruins wouldn't rank as highly because "old" wasn't in the core set of keywords.
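The ranking idea can be sketched in a few lines of Python. The weights (1.0 for a direct hit on a core keyword, 0.5 for a hit that only arrives through a synonym), the tiny synonym table, and the two sample images are all assumptions made up for this example; the point is only that a core match can outrank an expanded one.

```python
# Hypothetical synonym table and weights, for illustration only.
SYNONYMS = {"old": {"ancient"}, "ancient": {"old"}}

CORE_WEIGHT = 1.0     # the query word is literally one of the core keywords
SYNONYM_WEIGHT = 0.5  # the query word matches only through a synonym


def score(query_terms, core_keywords):
    """Sum a weight for each query term, favoring direct core-keyword hits."""
    total = 0.0
    for term in query_terms:
        if term in core_keywords:
            total += CORE_WEIGHT
        elif SYNONYMS.get(term, set()) & core_keywords:
            total += SYNONYM_WEIGHT
    return total


ruins = {"ancient", "ruins", "architecture"}
barn = {"old", "building", "barn"}

print(score(["ancient"], ruins), score(["ancient"], barn))                  # 1.0 0.5
print(score(["old", "building"], ruins), score(["old", "building"], barn))  # 0.5 2.0
```

A search for "ancient" puts the ruins first, while "old building" puts the barn first, even though both images are potential matches for both searches.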
Another advantage of having a minimal set of core keywords is translation and localization to other languages. Here, a translator only needs to translate a few words per image, resulting in faster turn-around and fewer mistakes.
Still another advantage to minimal keywording is the avoidance of having to correct for errors discovered later. When I was getting started in photography, I too was using extensive lists of words as others were. As conscientious as I was to strive for consistency, I found that I was still using different words on different days... "cityscape" one day, "skyline" another. I also added plurals and as many variations as I could to be thorough. Of course, this also introduces typos, which are frustratingly laborious to correct on a piecemeal basis. I had so much keyword pollution by the time I was putting my images into a practical search online, I actually found it better to purge all my images' keywords and start anew. Not coincidentally, this was also the time that I realized the value of having the smarts on the search side, not the image side.
Today, when I keyword my images, I apply anywhere from 3-10 words per image. Most are around 5 or 6. When I upload these images to my site, the user sees that images have anywhere from 10-35 keywords because I built a very rudimentary intelligence into my page-generation program. It presents a slightly expanded list of words derived from the basic ones I manually input, plus location information that may be embedded in the IPTC header of the image, the filename, path elements, and other translation mechanisms that provide added hints.
When the user does a search, my own input interpreter parses the user's typed input, first by breaking it into tokens and reducing each one to its "stem." A stem represents the base form of a word. For example, "walk" is the stem for "walks, walking, walked," and so on. Rather than add all those variations of the word into a given image's keyword list, I only need one, which will match the same stem as the word the user typed. Of course, I want to use the most accurate variation to describe the image. If it's a photo of people walking, I may use "walking", but if it's a photo of a "walk/don't walk" street light, I would use "walk." Stemming covers plurals, singulars, possessives, participles, conjugations, and other word derivatives. By combining the expanded forms of an image's embedded keywords with those of the user's input, I get search results similar to those I would get if I'd keyworded my images with 150 words under a more traditional "exact match" search mechanism like that currently found in Flickr.
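Here is a deliberately naive Python sketch of stem-based matching. It is not the code that runs on the site described here; the suffix list, the minimum stem length, and the sample image names are assumptions for illustration, and a production system would more likely use an established stemming algorithm such as Porter's.

```python
# Suffixes to strip, checked in order. This list is a simplification.
SUFFIXES = ("'s", "ing", "ed", "s")


def stem(word):
    """Reduce a word to a crude base form by stripping one common suffix."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def search(query, images):
    """Return names of images whose stemmed keywords cover every stemmed query word."""
    query_stems = {stem(word) for word in query.split()}
    return [
        name
        for name, keywords in images.items()
        if query_stems <= {stem(keyword) for keyword in keywords}
    ]


# Hypothetical images, each carrying only a couple of core keywords.
images = {
    "crosswalk.jpg": ["walk", "signal"],
    "hikers.jpg": ["walking", "people"],
    "alps.jpg": ["mountain", "range"],
}

print(search("walked", images))           # ['crosswalk.jpg', 'hikers.jpg']
print(search("mountain ranges", images))  # ['alps.jpg']
print(search("mountain range", images))   # ['alps.jpg']
```

The singular and plural searches return the same image without either form being duplicated in the keyword list. A stripper this simple will mishandle many English words (for example, those that double a consonant or drop a final "e"), which is why the suffix rules are labeled as a simplification.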
I also implemented a rudimentary translation table to test the viability of using an external synonym list, or thesaurus, and to further test the premise that a minimal, rudimentary keyword dataset can be extended. In this design, I add or translate words based on entries in the table. Because it's all real-time in the search engine (not in the images as keywords), I can experiment with different meanings for words before establishing "permanent" keyword settings, all without ever having to rekeyword images directly.
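One way such a table could work is sketched below in Python. The two tables, their entries, and the function name `apply_table` are hypothetical; the only thing taken from the description above is that some entries translate (replace) a word and others add related words, all at search time.

```python
# Hypothetical table entries, editable without touching any image.
TRANSLATE = {"cityscape": "skyline"}      # replace one word with another
ADD = {"skyline": ["city", "buildings"]}  # append related words


def apply_table(keywords):
    """Translate each keyword, then append any additions, keeping first-seen order."""
    result = []
    for word in keywords:
        word = TRANSLATE.get(word, word)
        for candidate in [word, *ADD.get(word, [])]:
            if candidate not in result:
                result.append(candidate)
    return result


print(apply_table(["cityscape", "night"]))
# ['skyline', 'city', 'buildings', 'night']
```

Swapping in a different table (a thesaurus, or a dictionary for another language) changes what every search sees while the stored keywords stay exactly as they were.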
I even experimented with a simplistic English-French dictionary and, lo and behold, all my keywords came up in French. (One can see having a different entry point or search parameter to use different languages, all based on the same small set of core keywords in an image.) (I removed this because it was hardly complete, and I had no time or inclination to target the French-language image buying market.)
An interesting test of how effectively this entire model worked came when I searched for the keyword "photographer," and results came up for images in my Czech Republic collection, which I had inadvertently neglected to ever keyword at all! (I was busy that year, and it slipped through the cracks.) But the keywords that my search engine interpolated for this series still yielded a good basis for matching many rudimentary search patterns. (In this case, it took the keyword from filenames.)
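Deriving fallback keywords from a file's name and directory can be sketched as follows. The example path, the three-letter minimum, and the function name `keywords_from_path` are assumptions for illustration; the article does not say how its filenames are laid out.

```python
import re
from pathlib import PurePosixPath


def keywords_from_path(path):
    """Collect lowercase alphabetic words from the directory names and file stem."""
    p = PurePosixPath(path)
    words = []
    for part in [*p.parent.parts, p.stem]:
        for token in re.split(r"[^a-z]+", part.lower()):
            # Skip empty strings and very short fragments such as "of".
            if len(token) > 2 and token not in words:
                words.append(token)
    return words


# Hypothetical path, invented for this example.
print(keywords_from_path("czech-republic/prague/photographer_bridge_01.jpg"))
# ['czech', 'republic', 'prague', 'photographer', 'bridge']
```

An image with no keywords at all still ends up with a handful of searchable terms, provided the files were named with some care.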
It turns out, I found another unexpected benefit to this design: implicit keyword hierarchies. For example, if an image has any of "man, woman, child" etc., then the keyword "people" is added. I can define hierarchies conceptually at multiple levels, without having to hard-code those words into the image itself.
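A small Python sketch of an implicit hierarchy follows. The mapping from "man," "woman," and "child" to "people" comes from the example above; the other entries and the names `PARENTS` and `with_ancestors` are hypothetical additions to show more than one level.

```python
# Child-to-parent map. Only man/woman/child -> people comes from the text;
# the remaining entries are invented to show a second level.
PARENTS = {
    "man": "people",
    "woman": "people",
    "child": "people",
    "camel": "animal",
    "people": "living thing",
    "animal": "living thing",
}


def with_ancestors(keywords):
    """Return the keywords plus every ancestor reachable through PARENTS."""
    expanded = set(keywords)
    for word in keywords:
        parent = PARENTS.get(word)
        while parent is not None and parent not in expanded:
            expanded.add(parent)
            parent = PARENTS.get(parent)
    return expanded


print(sorted(with_ancestors(["man", "camel"])))
# ['animal', 'camel', 'living thing', 'man', 'people']
```

The image itself still carries only "man" and "camel"; the broader terms exist only in the search engine's table.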
Once again, the important rationale for this is to avoid having a large, complicated set of potentially inaccurate keywords hard-coded into thousands of images that can result in large sets of false positives. As long as the base-level core keywords are accurate and minimal, even the most minimally "intelligent" search algorithm that I wrote performs much better than an overly-keyworded image set.
Now, before you go off thinking that I'm building the next Google, I make no claims to be even an amateur in the field of search technology. Far from it. My site's implementation of what I've described is the low rung on the ladder. Yet, it is exceedingly effective, and computationally acceptable. And I've done it using only several hundred lines of Perl code. So, it doesn't load down my system at all. My single-CPU Linux box supports 20-30K visitors a day, many of them doing searches, all of which involve real-time keyword translations, expansions, stemming, and matching. And for each incremental step in my home-made implementation, I can measure its effectiveness by seeing a proportional incremental step up in traffic, retained visitors, pageviews, and image sales.
The concept I'm proving is also not a technical one; it's an application of a very simple technology to solve an entirely business-oriented problem. And, as the saying goes, if I can do it, so can you. Or rather, so will the larger photo-sharing and stock agency sites. And as they do, the trickle-down effect will touch all those who are tasked with keywording images.
So, the counter-intuitive lesson is that images should have the fewest keywords possible, so long as they are sufficient to accurately describe the image. But there's a close cousin to having sufficient keywords. That's knowing how to choose intelligent ones. This is not an easy task to define, let alone do. Knowing to use "ancient" over "old" is one thing, but appealing to a buyer's true intentions requires knowing some more about the buyer.
Getty is probably the most aggressive in this area, having spent vast sums of money paying to have several hundred thousand images keyworded. But where they placed their real emphasis is on the creative part of keywording. A photo of a blue sky with clouds (a picture that just about anyone with a camera can take...and has) may also have the keyword "future." Simple and conceptual. It's this kind of creative thinking spread across a very large body of images, each of which has had conscientious attention given to it, that buyers are really paying for. Granted, "future" is a simple example, but applying this kind of knowledge and infrastructure for each industry segment (such as advertisers, or car companies) is what brings in the big bucks. The sheer volume of consistently keyworded images for many different industries is far more important than just having an accurate description of an image.
So, the art of keywording is not just about having words that describe your image; it's the "conceptual vision" that only the human mind can generate that really brings value to one's indexed images.
Now, you might be thinking that you could be just as creative, so you keyword your images with the same care and feeding that Getty did, and you submit them to a photo-sharing or a microstock site. Are you better off? Let's take a look at that search I described much earlier in this article, where I did a search for "future" on Flickr and... YIKES! What an awful array of bad photos! Sure, there are one or two "futuristic" themed photos submitted by creatives like yourself, but the vast majority are just home snapshots of people's babies with captions like "future president."
So, the fact that you're an intelligent and creative keyworder may be to your advantage, but it's like having the most expensive house in a really bad neighborhood. You're not going to sell your house for as much as it's worth. The good news is, though, that you bought into a neighborhood that's going to improve. And when the conventional wisdom about better keywording practices is eventually imposed upon photographers, you'll not only be ready, but your images will already be at the top of the list.
In fact, we can test this today by looking at side-by-side comparisons of what stock agencies would return for a given request. An example is a9.com, which is like a metasearch engine, but instead of it doing the searches (or querying search results of other search engines), it queries sites that register with them as responsive to search requests. That is, you type an input parameter, and a9 will send that query to the registered sites, allow them to do the search on their own datasets, and then return their results. a9 presents all the responses from each site on a single page in a side-by-side table. To see this in action, go to a9.com and choose the "image" tab. You'll be presented with about five stock image sites, including webshots.com, trove.net, iStockPhoto and smugmug. Using the keyword "future" again, the results make it pretty clear which company takes keywording seriously.
We can also see how early we are in the evolution of this approach, and of the photo industry as a whole, by noticing that there are only five sites registered with a9.com. In five years, there is likely to be a much larger set, each with much better results.
At this point, there's sufficient background for me to finally get to that winning Scrabble word, "contextualization." In this case, the term refers to the relationship between keywords in order to derive a more precise meaning about a photo. For example, the photo of the woman and an old man at the top of this article was given the keywords "old" and "man," but there is nothing associating the two together. (Hence, the false positive when searching for "old woman.")
The process of establishing the relationship between keywords is contextualization. No matter how intelligent the search side is, or how thoroughly the photographer keywords his images, there is no "standard" way to indicate a relationship between keywords. For this to even begin to work, the syntax of keywords themselves needs to evolve in the open market. An example might be "old:man, young:woman". Here, it's clear that the keyword "old" applies to "man" not "woman". But this is not a standard keyword syntax, and today's search sites won't parse these as conjoined keywords.
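To show how little machinery such a syntax would need, here is a minimal Python sketch that parses the "old:man, young:woman" form. The colon as the binding character comes from the example above; the function names and the rule that a two-word search must match a bound pair are assumptions, since no standard exists.

```python
def parse(keyword_string):
    """Split 'old:man, young:woman, hat' into (modifier, subject) pairs.

    Plain keywords with no binding character get a modifier of None.
    """
    pairs = []
    for item in keyword_string.lower().split(","):
        item = item.strip()
        if not item:
            continue
        modifier, separator, subject = item.partition(":")
        pairs.append((modifier, subject) if separator else (None, item))
    return pairs


def has_bound_pair(modifier, subject, keyword_string):
    """True only when the modifier is explicitly bound to the subject."""
    return (modifier, subject) in parse(keyword_string)


def has_word(word, keyword_string):
    """True when the word appears anywhere, bound or not."""
    return any(word in pair for pair in parse(keyword_string))


keywords = "old:man, young:woman, hat"
print(has_bound_pair("old", "man", keywords))    # True
print(has_bound_pair("old", "woman", keywords))  # False: no longer a false positive
print(has_word("old", keywords))                 # True: single-word searches still work
```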
There are a few proposals in the public realm at the moment, some of which are implemented in prototype applications. But a Google search on the topic doesn't yield much at all, which usually means that it doesn't have a lot of traction yet. (If it did, more sites would discuss it, which would show up in search results.) Such standards usually find their genesis somewhere in the bowels of XML (eXtensible Markup Language, a meta-language for defining markup languages), and a cursory search in that realm doesn't yield much there either.
I once heard of a photo keywording application that uses "." to bind keywords together in a hierarchical fashion, but I lost the reference and haven't been able to find it since. (I'm sure someone will send me email on this.) The "tree" of keywords might look like "people.men.old", to use our example. This method not only contextualizes the components with one another, but their hierarchical order is preserved as well. This creates an excellent foundation for searching for objects according to a class system.
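Searching by class against a dotted keyword could look like the Python sketch below. The "." separator and the "people.men.old" example come from the description above; the function name and the prefix-matching rule are assumptions, since the original application could not be identified.

```python
def matches_class(query_path, keyword_path):
    """True when the query names the keyword itself or one of its ancestors.

    Comparison is done segment by segment, so "peo" does not match "people".
    """
    query = query_path.lower().split(".")
    keyword = keyword_path.lower().split(".")
    return keyword[: len(query)] == query


print(matches_class("people", "people.men.old"))        # True
print(matches_class("people.men", "people.men.old"))    # True
print(matches_class("people.women", "people.men.old"))  # False
print(matches_class("peo", "people.men.old"))           # False
```

A search for the broad class "people" finds the image, a search for "people.men" narrows it, and "people.women" correctly misses.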
As with the intelligent searching prototype I wrote for my site, a new syntax for context-sensitive keywords only involves identifying a keyword-binding character. I've experimented with this on my own site using a small set of images prototyped with contextual keywording, and it's surprisingly simple to implement. (And if I can do it, surely the staff of much smarter engineers at a stock photo site can roll out a production version in just a few short weeks.)
The Bottom(less) Line
Despite poor search results and false positives for image searches, people will still go to Flickr and other sites to buy photos because, with the rare and unusual exception of some sites, most photo sites don't really offer much better results than one another. Even Getty doesn't yield results that differ substantially for less creative search terms, like "bridge" or "man and cigar." So, Flickr's still in the game, along with the others.
But the big stock sites have the clear advantage here. Flickr and the microstock sites suffer from the same problem: all images are thrown into the same pool without any sort of hierarchy or organizational structure (or keywording oversight). That is their Achilles' heel. And because their business model calls for $1/image, there's not much margin left to do value-added investment like this.
On the other hand...
Because of the "social" nature of photo-sharing sites, more and more users participate, and in so doing, build an intrinsic ranking of images according to popularity. When such sites mature to the point of doing serious photo licensing, their advantage over traditional stock sites is that image searches can take popularity into account. This is the counterweight to the keywording advantage that established photo agencies have.
Taking all the various issues into account, the photographer needs to think carefully about how he wants to sell his images before making decisions on his keywording methods, or to which sites he wants to submit his images. With masses of other people behind an unintelligent search mechanism, which is still the norm for most photo sites, you will be better off using more keywords because you have no other objective than simply to be seen. So, for the short-term, if you're keywording photos to be used on a microstock site, or a site like Flickr, more keywords will get your images seen by arbitrary searches. But consider this: there's currently not a lot of money to be had, even if your images are found. Flickr doesn't yet license images, and microstock sites license images for $1, which means that your potential income is pretty low.
Keywording for photo sites where there is little financial opportunity at the expense of longer term planning, is similar to following the old advice of using thumbnail-size images on websites 5-10 years ago, when most users used dial-up connections. Sure, you can serve up photos to people at home, but back in those days, they weren't buying photos. Image buyers were at larger companies who used high-speed broadband connections. While photo industry pundits were advocating web designs that discouraged higher-res images on sites because it took longer to download them (and you bore the risk of having images stolen), I was designing my site with larger images because larger images sell better than thumbnails, and I knew that the buyer was on a high-speed connection.
When planning your keywording strategy for microstock sites, you need to consider what their search mechanisms will be like when they actually start making money at licensing, which will probably be after their search engines have been upgraded to be more intelligent.
So what industry forces will push this forward? The current state of the industry is that the larger stock agencies already implement this in their own proprietary way, so they have no incentive to participate in a push for standards that could be adopted by their competitors. A search for "bridge" on Getty's site generates a prompt, asking me to choose between three contexts for which the term may apply. Thus, Getty has done exactly the contextualization of keywords that others must eventually adopt. But they aren't storing this information in the image's keyword list; they can't, or they'd open up their proprietary system to the open market. So, it's implemented entirely internally using databases.
Since neither Getty nor Corbis nor the other major players have any incentive to hop on a standardization bandwagon, it'll take someone else to push this forward. As we have learned many times over from Microsoft, true development and innovation in an industry can be stifled if the industry leader is unwilling to participate.
Smaller stock agencies are too focused on current business matters to be thinking about next generation issues like this, despite it being in their own best interests to do so. I'm sure they would adopt standards if someone were to push forward a proposal, but unless that happens, the worst of all worlds may be realized: each stock site defines its own methods, and requires users to submit images according to its criteria. Why is this bad? Because by doing so, users will have no practical way to submit images to multiple stock sites, locking photographers in by virtue of their having made such a significant investment in keywording images to those specifications. (This is such a huge investment, in fact, that re-keywording for another site would be impractical.) Who would benefit most from this scenario? Industry leaders such as Getty and Corbis. Yet more incentive for them not to participate.
Someone with clout needs to spearhead this effort and promote a syntax for keywording that can imply context-sensitivity. It could be a major search engine, or a major photo industry software application. I feel that Adobe is best positioned for this task because of their high profile and agnostic alliances with stock agencies and other photo businesses.
In the meantime, what people can best do for themselves is avoid keyword "pollution" and be sparing and judicious in choosing keywords. Borrowing from Occam's Razor, "the best keywords are usually the simplest." And to that, I must add Einstein's retort: "Make everything as simple as possible, but not simpler." The point is, keep it simple, but don't swing too far in that direction. Maintain pragmatism.
As for being "creative" (such as knowing when to use "future" or "tourist" as keywords), be conscious of the perils of being wrong. If you pollute your images with too much creativity, you not only harm yourself in ways that will not reveal themselves for some time to come, but backing out of years of keyworded images later is a task I wouldn't wish on my worst enemies.