(This article was originally published at Data Mining: Text Mining, Visualization and Social Media, and syndicated at StatsBlogs.)
The web search community has, in recent months and years, heard quite a bit about the 'knowledge graph'. The basic concept is reasonably straightforward - instead of a graph of pages, we have a graph of knowledge, where the nodes are atoms of information of some form and the links are relationships between those atoms. The knowledge graph concept has become established enough to be used as a point of comparison between Bing and Google.
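To make the idea concrete, here is a minimal Python sketch of one common way to represent such a graph: as (subject, predicate, object) triples, the model used by RDF among others. The predicate names (`is_a`, `part_of`) and the helper `outgoing_links` are hypothetical and chosen only for illustration. This is not how Bing or Google actually store their graphs.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) triples. Nodes are entities or
# literal values; each triple is one labeled link between two nodes.
TRIPLES = [
    ("Kodo", "is_a", "taiko group"),
    ("Meany Hall", "is_a", "venue"),
    ("Meany Hall", "part_of", "University of Washington"),
]

def outgoing_links(triples):
    """Map each subject node to the list of (predicate, object) links leaving it."""
    links = defaultdict(list)
    for subject, predicate, obj in triples:
        links[subject].append((predicate, obj))
    return links

graph = outgoing_links(TRIPLES)
print(graph["Meany Hall"])  # [('is_a', 'venue'), ('part_of', 'University of Washington')]
```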
Last night, I went to see a performance of Kodo - regarded internationally as the premier taiko group. I then ran a search on Bing for 'kodo'.
Bing showed good web and image results, as well as a knowledge-driven portion of the answer drawn from Wikipedia, with links to play some of the group's songs. Not bad - but no mention of the performance.
As Kodo were performing at Meany Hall on the University of Washington campus, I ran another search on Bing, this time for the venue.
Here we see something better - the venue is recognized as a venue and consequently joined with the events known to Bing, including the concert I was attending. As the event information included a link to the performer (the blue Kodo link in the screenshot), I followed it through and found that Bing now gave me event information for Kodo.
In these interactions, we can see part of the promise of the knowledge graph, but also many areas for improvement. The event node connects the performer to the venue. However, the venue information in this part of the graph is isolated from the venue information used to answer the query purely about the venue (note that the two addresses differ - a common problem with campuses and mall-like areas). This experience, I think, shows the true challenge of the knowledge graph proposition: correctly bringing the isolated data graphs together when nodes in different graphs actually represent the same real-world entities.
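The matching problem can be illustrated with a deliberately naive entity-resolution sketch in Python. It assumes each isolated graph exposes venue records with a name, city and address, and it treats two records as the same venue when their names and cities match after normalization, ignoring the address because of the campus problem described above. The record fields, the `same_entity` rule and the placeholder addresses are all hypothetical; a real system would need far more signals than this single rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class VenueRecord:
    source: str   # which isolated graph the record came from (hypothetical)
    name: str
    city: str
    address: str

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())

def same_entity(a: VenueRecord, b: VenueRecord) -> bool:
    # Naive rule, assumed for illustration: match on name and city and
    # deliberately ignore the street address, since campus and mall-like
    # venues often appear under several different addresses.
    return normalize(a.name) == normalize(b.name) and normalize(a.city) == normalize(b.city)

def resolve(records):
    """Group records into clusters assumed to describe the same real-world entity."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if same_entity(cluster[0], rec):
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

records = [
    # Placeholder addresses: the point is only that the two graphs disagree.
    VenueRecord("venue_graph", "Meany Hall", "Seattle", "placeholder address 1"),
    VenueRecord("event_graph", "Meany  hall", "Seattle", "placeholder address 2"),
]
clusters = resolve(records)
print(len(clusters))  # 1
```

Ignoring addresses fixes the Meany Hall case, but the same rule would wrongly merge two distinct venues that happen to share a name in the same city, which is exactly why this problem is hard.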
Note that in exploring this particular scenario, Bing appeared to be doing a little better than Google, though Google had partial event information associated with the Kodo entity.
Much of the knowledge we see returned for searches today is really isolated pockets of related information (the date and place of birth of a person, for example). The really interesting things start to happen when the graphs of information become unified across types, allowing the user - as this example suggests - to traverse from a performer to a venue to all the performers at that venue, and so on. Perhaps 'knowledge engineer' will become a popular résumé buzzword in the near future, as 'data scientist' has recently.
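The performer-to-venue-to-performers traversal can be sketched as a two-hop query over event records. In this Python example, each event is a hypothetical (event id, performer, venue) tuple; only Kodo at Meany Hall comes from the article, while 'Performer B', 'Performer C' and 'Other Venue' are placeholders. The function name `co_venue_performers` is likewise invented for illustration.

```python
from collections import defaultdict

# Hypothetical event records linking a performer to a venue.
EVENTS = [
    ("event-1", "Kodo", "Meany Hall"),
    ("event-2", "Performer B", "Meany Hall"),
    ("event-3", "Performer C", "Other Venue"),
]

def build_indexes(events):
    """Index the event links in both directions: performer->venues, venue->performers."""
    venues_by_performer = defaultdict(set)
    performers_by_venue = defaultdict(set)
    for _event_id, performer, venue in events:
        venues_by_performer[performer].add(venue)
        performers_by_venue[venue].add(performer)
    return venues_by_performer, performers_by_venue

def co_venue_performers(performer, events):
    """Hop from a performer to its venues, then to every other performer at those venues."""
    venues_by_performer, performers_by_venue = build_indexes(events)
    result = set()
    for venue in venues_by_performer.get(performer, set()):
        result |= performers_by_venue[venue]
    result.discard(performer)  # exclude the starting performer itself
    return sorted(result)

print(co_venue_performers("Kodo", EVENTS))  # ['Performer B']
```

The query only works if the venue nodes from different sources have already been resolved to one entity; otherwise the second hop silently misses performers listed under a duplicate venue node.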