In 2011, when the actor, comedian, and tech writer Stephen Fry was asked to compile his list of the greatest gadgets of all time for a television show, he raised more than a few eyebrows by including an apple peeler on the list. How could such a trivial thing stand alongside the huge strides in wireless and portable devices of the last few decades? Well, it's only in seeing the peeler in action that you begin to understand why: just by inserting an apple and turning a handle, the device cores, peels, and slices the apple in one wonderfully satisfying motion. Rather than taking on the single problem of designing an apple corer, or a separate device for peeling, or even one device with a separate built-in corer, peeler, and slicer, the designer chose a solution that combined all three into a single action.
Versatility is a highly underrated attribute of new inventions; the drive to solve a single problem in a novel way tends to focus the mind of the designer a little too much, meaning that the only thing that gets solved is indeed that single problem. For this reason, users of Micro Focus IDOL, our enterprise search and data analytics platform, have good reason to be grateful to the early designers of the product; the designers could have created a probabilistic search engine for searching documents, much like others out there, but with better retrieval algorithms. Instead, they chose to create a versatile and extensible framework that could store far more than documents within its index. At the core of this versatility is what are known in the IDOL world as Agents.
Prior to the creation of IDOL, most web searches consisted of a small number of keywords. Picking those keywords became an art form: pick too many or choose them poorly and you'd get no results, pick too few and you'd get millions. In most cases, a user was able to describe what she was trying to find, but just couldn't accurately distill it down to those few keywords. Or just as likely, she had an article she enjoyed reading and wanted to find more like it, but again struggled to describe it in very few search terms. IDOL was created to allow a user to describe what she was after in as many words as desired, or to point at a separate web page or document and say "this is the kind of thing that I'm looking for."
So, IDOL needs to be able to process a piece of text, a document, or even a set of documents and use them to match other documents. To do this it first creates an Agent. In its simplest form, an Agent is a sort of signature encapsulating the ideas and topics in the text used to create it. It consists of anywhere from a few to perhaps even a hundred or more words or concepts that have been chosen by IDOL as representative of that text, each with statistical weightings chosen to indicate the importance of each concept in the matching process.
Once the Agent has been created, it is then able to rapidly find relevant matches from IDOL's index, which are then returned to the user. Seem simple? Yes, but this is just the beginning, as the real power of Agents is their versatility.
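To make the idea of an Agent as a weighted signature a little more concrete, here is a minimal Python sketch. It is not IDOL's actual algorithm, which is not described here; it simply assumes that term frequency is a good enough stand-in for IDOL's statistical weightings. The names `create_agent`, `score`, `find_matches`, and the tiny stop-word list are all hypothetical and exist only for illustration.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a far richer one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is",
              "for", "on", "with", "at", "by", "from"}

def tokenize(text):
    """Lower-case the text and keep alphabetic words that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def create_agent(text, max_terms=10):
    """Build a toy 'Agent': the most frequent terms, with weights that sum to 1.

    Assumption: plain term frequency stands in for IDOL's real weighting.
    """
    counts = Counter(tokenize(text)).most_common(max_terms)
    total = sum(count for _, count in counts)
    return {term: count / total for term, count in counts}

def score(agent, document):
    """Sum the weights of the Agent's terms that appear in the document."""
    words = set(tokenize(document))
    return sum(weight for term, weight in agent.items() if term in words)

def find_matches(agent, index):
    """Rank the indexed documents by score, dropping those with no overlap."""
    scored = [(score(agent, text), name) for name, text in index.items()]
    return [name for s, name in sorted(scored, reverse=True) if s > 0]

# Usage: describe what you want in free text, then match it against an index.
agent = create_agent("The striker scored a goal and the goalkeeper missed the goal")
index = {
    "football": "A late goal from the striker won the match",
    "cooking": "Bake the apple tart for forty minutes",
}
matches = find_matches(agent, index)
```

In this toy version the Agent is just a dictionary of terms and weights, which is what makes it easy to store and reuse later.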
We've already described how Agents can be created from a piece of text of any length, or indeed from any document or set of documents. Perhaps these documents are in fact all documents created by a particular user, or all documents viewed by that user. In that case, the Agent or Agents created will come to represent the interests of that user. Or perhaps a sports website wishes to categorize news stories by sport; we can then train an Agent for each sport by finding a couple dozen news stories on that sport. In each of these cases, the ideas, topics, and concepts are encapsulated in the Agent, which can then be stored and used to find relevant matches on demand. But even now, this is only the first part of the power of Agents.
Traditional retrieval engines are designed to store and retrieve documents, be they web pages or files. IDOL of course does this, but a key part of its design was to abstract this storage process, which in turn allows Agents to be stored in its index. Suddenly the traditional retrieval process has been flipped on its head. Instead of storing documents and firing queries at the engine to find matching documents, we can, in fact, store the queries themselves in the engine and fire a document at it to see which queries match. This may seem odd, but the functionality that results turns out to have a number of uses. For one, if our Agents stored in IDOL are our sports categories, then firing a document at them to see which sport(s) match is nothing other than the process of Categorization, which allows that document to be tagged or sorted according to which sport it is about. Or if the Agents represent the research interests of a company's employees, then whenever a new research article is published it can be fired against those Agents to see whether it is of interest to anyone in the company, and if so, it can be made available to them!
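The flipped process can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions, not how IDOL implements Categorization: the stored Agents are hand-written dictionaries of invented terms and weights, and `categorize` and its `threshold` parameter are hypothetical names chosen for this example.

```python
import re

# Hypothetical stored Agents: one term-weight signature per sport.
# The terms and weights are invented for illustration only.
STORED_AGENTS = {
    "football": {"goal": 0.4, "striker": 0.3, "penalty": 0.3},
    "tennis": {"serve": 0.4, "racket": 0.3, "set": 0.3},
    "cricket": {"wicket": 0.5, "innings": 0.3, "bowler": 0.2},
}

def categorize(document, agents, threshold=0.3):
    """Fire a document at the stored Agents and return the categories that match.

    Each Agent scores the document by summing the weights of its terms that
    occur in the text; categories scoring at least `threshold` are returned,
    best match first.
    """
    words = set(re.findall(r"[a-z]+", document.lower()))
    scores = {
        name: sum(weight for term, weight in agent.items() if term in words)
        for name, agent in agents.items()
    }
    matching = [name for name, s in scores.items() if s >= threshold]
    return sorted(matching, key=lambda name: -scores[name])

# Usage: the document is the query, and the stored Agents are what it is matched against.
story = "The striker converted the penalty after a long serve rally"
tags = categorize(story, STORED_AGENTS)
```

The same function would serve the research-interest example: swap the sports Agents for one Agent per employee, and the returned names become the people to notify.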
But why limit this to just documents? Given Agents can also represent different criteria (i.e. keywords, natural language or Boolean expression), documents, or people, plus the fact that we can now both query WITH all of these and (given they can all be stored in IDOL's index) query AGAINST all of these, the simple Agent approach enables all the functionality in this table.
To stress, in all other retrieval engines in the market, only one of these entries is possible: standard document search. In the case of IDOL, the addition of new functionality of this sort is simple; versatility is its watchword. Now I think I'll go rustle up a quick apple tart for this evening.