Google is developing a system that will enable web publishers of any size to automatically submit new content to Google for indexing within seconds of that content being published. Search industry analyst Danny Sullivan told us today that this could be “the next chapter” for Google.
Last fall, Google’s Brett Slatkin, lead developer on the PubSubHubbub (PuSH) real-time syndication protocol, told us he hoped Google would someday use PuSH to index the web, instead of crawling links as search engines have done for years.
Google senior product manager Dylan Casey said yesterday at Sullivan’s Search Marketing Expo in Santa Clara, California, that the company plans to publish a standard way for site owners to participate in a program much like that soon.
How the system might work
PuSH is a syndication system based on the Atom format in which a Publisher declares a Hub that it will notify every time new content is published. Subscribers then tell the Hub, “When this Publisher posts new content, please deliver it to me right away.” So instead of checking back with the Publisher all the time to see if there’s new content, the Subscriber simply waits for the Hub to say so. The Publisher publishes something and tells the Hub that it’s available, then the Hub delivers it to all the Subscribers. This can take as little as a few seconds.
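To make the Publisher, Hub and Subscriber roles concrete, here is a minimal in-memory sketch of that flow. It is not the actual PubSubHubbub wire protocol, which runs over HTTP. The `Hub` class, its method names and the example URL are all hypothetical, chosen only for illustration.

```python
from collections import defaultdict


class Hub:
    """Toy in-memory hub (hypothetical): real hubs talk HTTP, this one calls functions."""

    def __init__(self):
        # Maps a topic (feed) URL to the callbacks of its subscribers.
        self._subscribers = defaultdict(list)

    def subscribe(self, topic_url, callback):
        """Subscriber asks: 'when this topic has new content, deliver it to me.'"""
        self._subscribers[topic_url].append(callback)

    def publish(self, topic_url, entry):
        """Publisher announces new content; the hub fans it out to every subscriber.

        Returns the number of subscribers the entry was delivered to.
        """
        callbacks = self._subscribers.get(topic_url, [])
        for callback in callbacks:
            callback(topic_url, entry)
        return len(callbacks)


# Usage: one subscriber waits, the publisher pings the hub, the hub delivers.
received = []
hub = Hub()
topic = "http://publisher.example/feed.atom"  # assumed example URL
hub.subscribe(topic, lambda t, e: received.append((t, e)))
delivered = hub.publish(topic, "New post: Hello world")
```

The point of the design is that the Subscriber never polls: it registers once and does nothing further until the Hub calls it.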
If Google implements an indexing-by-PuSH program, it would ask every website to implement the technology and declare which Hub it pushes to at the top of each document, just as sites declare where the RSS feeds they publish can be found. Then Google would subscribe to those PuSH feeds to discover new content when it’s published.
PuSH wouldn’t likely replace crawling. In fact, a crawl would be needed to discover PuSH feeds to subscribe to, but the real-time format would be used to augment Google’s existing index.
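As a rough illustration of how a subscriber might discover a declared Hub, the sketch below scans an Atom feed for `link` elements with `rel="hub"`, which is how PuSH-enabled feeds advertise their Hub. The sample feed, its URLs and the `find_hubs` helper are made up for this example, and nothing here reflects how Google would actually implement discovery.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"

# Hypothetical feed: the publisher declares its hub near the top of the document.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Blog</title>
  <link rel="self" href="http://publisher.example/feed.atom"/>
  <link rel="hub" href="http://hub.example/"/>
</feed>"""


def find_hubs(feed_xml):
    """Return the href of every top-level <link rel="hub"> in an Atom feed."""
    root = ET.fromstring(feed_xml)
    return [
        link.get("href")
        for link in root.findall(ATOM_NS + "link")
        if link.get("rel") == "hub"
    ]


hubs = find_hubs(SAMPLE_FEED)
```

A crawler that found this feed could then subscribe at each returned Hub URL instead of re-fetching the feed on a schedule.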
As Danny Sullivan told us today, Google would have to implement some sort of spam control and not just let content be pushed live to the index unvetted. That was what happened in the earliest days of search and it was a real mess, he told us.
[Thanks Haydndup]




