Building Real-Time Apps with PubSubHubbub (Live Blogging Google I/O 2010)
Posted by walter.roth on May 19, 2010 in Events, Live Blogging
Building Real-Time Apps with PubSubHubbub (a.k.a. hubbub)
Speaker: Brett Slatkin (Google), on the Google App Engine team (one of the originators of PubSubHubbub)
Notes: tinyurl.com/push-io2010
Follow him at onebigfluke.com
Agenda:
Intro, publishing, subscribing, hubs, special guests, progress & adoptions, future work….
- What is it? It turns RSS and Atom feeds into real-time streams: a single API for web-scale, low-latency messaging. Three roles: Publisher, Subscriber and Hub.
- Design goals: decentralized (no one in control), scale to the size of the whole web, make publishing and subscribing as easy as possible, push any complexity towards the hub, and be pragmatic (i.e., not theoretically perfect, but solve huge, known use cases with minimal effort)
- Demo (Brett’s test blog)
- Friend Feed, Google Reader, Blogger, Buzz, cliqset
- Publish a post in Blogger, and each hubbub-enabled service updates right away. Very quick.
- Real Time Web technologies
- Many vendors are publishers and subscribers
- Why another protocol?
- Almost every company already has its own internal system: TIBCO, WebSphere MQ, ActiveMQ, etc. Proprietary message payloads, topics, networks
- Existing attempts at a standard haven’t caught on. XMPP started in 1999 and still isn’t widely used for interop beyond IM (that may change with onesocialweb.org). Others are “overkill”: XEP, WS, AMQP, RestMS, the new REST-*
- There is a reason why many of these didn’t catch on …
- How-to for publishers: it’s so easy you can do it from a bookmarklet
- Publishers best practices
- Use URLs for server-side filtering
- Use URLs for authorization (you can give out URLs that only work for a given user; Google Reader will only let that user read those URLs)
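To make the "easy for publishers" point concrete, here is a minimal sketch of the ping a publisher sends to its hub after updating a feed. It assumes the form parameters `hub.mode=publish` and `hub.url` from the PubSubHubbub 0.3 draft as I understand it; the hub and feed URLs are placeholders, not anything shown in the talk.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_publish_ping(hub_url, topic_url):
    """Build the form-encoded POST telling a hub that topic_url has new content."""
    body = urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("utf-8")
    return Request(
        hub_url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_publish_ping(hub_url, topic_url, timeout=10):
    """Send the ping; any 2xx response is treated as 'accepted' in this sketch."""
    with urlopen(build_publish_ping(hub_url, topic_url), timeout=timeout) as resp:
        return 200 <= resp.status < 300


if __name__ == "__main__":
    # Placeholder URLs; the query string illustrates a per-user feed URL.
    req = build_publish_ping(
        "https://hub.example.com/", "https://blog.example.com/feed?user=alice"
    )
    print(req.get_method(), req.full_url, req.data)
```

Because the topic is just a URL, the filtering and authorization practices above need no extra protocol support: a different URL is simply a different topic.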
- How-to for subscribers: he showed code (a subscriber can receive only new or changed items)
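The subscriber code from the talk is not captured in these notes, so the following is only a sketch of the two pieces a subscriber typically needs: the subscription request and the answer to the hub's verification request. The parameter names (`hub.callback`, `hub.topic`, `hub.verify`, `hub.challenge`) are taken from the 0.3 draft as I understand it; the function names and the 404 rejection are my own choices.

```python
from urllib.parse import parse_qs, urlencode


def build_subscribe_body(callback_url, topic_url, verify="sync"):
    """Form body a subscriber POSTs to the hub to request a subscription."""
    return urlencode({
        "hub.mode": "subscribe",
        "hub.callback": callback_url,
        "hub.topic": topic_url,
        "hub.verify": verify,
    })


def handle_verification(query_string, expected_topics):
    """Answer the hub's verification GET on the callback URL.

    Returns (status, body). Echoing hub.challenge confirms that this
    subscriber really asked for the topic; anything else is rejected.
    """
    params = parse_qs(query_string)
    mode = params.get("hub.mode", [""])[0]
    topic = params.get("hub.topic", [""])[0]
    challenge = params.get("hub.challenge", [""])[0]
    if mode in ("subscribe", "unsubscribe") and topic in expected_topics and challenge:
        return 200, challenge
    return 404, ""


if __name__ == "__main__":
    print(build_subscribe_body("https://reader.example.com/push",
                               "https://blog.example.com/feed"))
    print(handle_verification(
        "hub.mode=subscribe&hub.topic=https%3A%2F%2Fblog.example.com%2Ffeed"
        "&hub.challenge=abc123",
        {"https://blog.example.com/feed"},
    ))
```

After verification, the hub POSTs feed documents containing only the new or changed entries to the callback URL.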
- The role of a hub: accepts, receives, extracts, sends, and provides DoS protection. It is also a logical component: publishers may be their own hub, and a combined hub/publisher gets a p2p speed-up. Hub quality is a matter of scalability and reliability.
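As a rough illustration of those hub duties, here is a toy in-memory hub. It reads "accepts" as accepting subscriptions, "receives" as receiving publish pings, "extracts" as picking out new or changed entries, and "sends" as delivering them to callbacks; that reading, the class name, and the `deliver` callable are assumptions for the sketch. Real hubs also need feed fetching, retries, and the DoS protection mentioned above.

```python
from collections import defaultdict


class TinyHub:
    """Toy hub: accept subscriptions, diff published entries, fan out the changes."""

    def __init__(self, deliver):
        self.deliver = deliver                  # callable(callback_url, entries)
        self.subscribers = defaultdict(set)     # topic -> callback URLs
        self.seen = defaultdict(dict)           # topic -> {entry_id: content}

    def subscribe(self, topic, callback_url):
        self.subscribers[topic].add(callback_url)

    def on_publish(self, topic, fetched_entries):
        """fetched_entries: list of (entry_id, content) from re-fetching the feed."""
        seen = self.seen[topic]
        # Keep only entries that are new or whose content changed.
        fresh = [
            (entry_id, content)
            for entry_id, content in fetched_entries
            if entry_id not in seen or seen[entry_id] != content
        ]
        for entry_id, content in fresh:
            seen[entry_id] = content
        if fresh:
            for callback_url in sorted(self.subscribers[topic]):
                self.deliver(callback_url, fresh)
        return fresh


if __name__ == "__main__":
    hub = TinyHub(lambda url, entries: print("push to", url, entries))
    hub.subscribe("https://blog.example.com/feed", "https://reader.example.com/push")
    hub.on_publish("https://blog.example.com/feed", [("1", "hello")])
    hub.on_publish("https://blog.example.com/feed", [("1", "hello"), ("2", "world")])
```

The second publish delivers only entry "2", which is the "new or changed items only" behavior subscribers rely on.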
- Julien from Superfeedr
- Default hub (works with any feed)
- Hosted hubs
- PubSubHubbub + Benefits
- Default Hub
- started last year, doing feed polling on behalf of other people
- avoid polling (smart scheduling, protocol mapping: rsscloud, sup, xml-rpc ping …)
- push to subscribers (xmpp too)
- schema mapping (we do the hard work in mapping everything into the same format)
- Use Cases
- iPhone notification: Urban Airship, Boxcar
- Feed reader: Webwag, Feedingo
- Desktop notification: Adobe Wave
- Semantic Search: guzzle.it, Twingly!
- Social Web: SixApart (social apps that want to aggregate social data from all the social services … we fetch all the data for them)
- Hosted Hubs
- Don’t reinvent the wheel
- Don’t run/maintain/debug the wheel
- Your hub, your data
- Analytics, callbacks and more
- References: tumblr, ping.fm, etc.
- Schema Mapping
- Tons of different formats: RSS X, Atom Y
- Tons of different namespaces: Digg vs. Mix vs. Yahoo Buzz. Same semantics
- Tons of invalid stuff (missing tags, data, unique id…)
- Location: Geo-RSS
- Social: ActivityStreams
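To show what "mapping everything into the same format" can look like, here is a small sketch that normalizes RSS 2.0 and Atom documents into one list of dictionaries using only the standard library. The output keys (`id`, `title`, `link`) and the fallback from a missing `guid` to the link are illustrative assumptions, not Superfeedr's actual schema.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def normalize(feed_xml):
    """Map an RSS 2.0 or Atom document onto a single entry format."""
    root = ET.fromstring(feed_xml)
    entries = []
    if root.tag == ATOM + "feed":
        for entry in root.findall(ATOM + "entry"):
            link = entry.find(ATOM + "link")
            entries.append({
                "id": entry.findtext(ATOM + "id"),
                "title": entry.findtext(ATOM + "title"),
                "link": link.get("href") if link is not None else None,
            })
    elif root.tag == "rss":
        for item in root.iter("item"):
            link = item.findtext("link")
            entries.append({
                # Feeds often omit the unique id; fall back to the link.
                "id": item.findtext("guid") or link,
                "title": item.findtext("title"),
                "link": link,
            })
    else:
        raise ValueError("unsupported feed format: %s" % root.tag)
    return entries


if __name__ == "__main__":
    rss = ('<rss version="2.0"><channel><item><title>A</title>'
           '<link>http://example.com/a</link></item></channel></rss>')
    print(normalize(rss))
```

A production mapper would also have to handle the namespaced extensions listed above (GeoRSS, ActivityStreams) and far messier input.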
- Extensions
- Digest notifications (heartbeat + Digest)
- Feed status (querying superfeedr)
- Subscription callback
- Virtual feeds (e.g., subscribe to a Craigslist feed, and we’ll combine feeds from different sources)
- Infrastructure (70k feeds a second, lots of caching for diffs)
- We are botnet!
- Independent XMPP workers with their own lifecycle
- Massive “ring” for scheduling
- Clustered cache for diff-ing
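The notes do not say what the scheduling "ring" actually is. One common reading is a consistent-hash ring that assigns each feed to a worker, so the sketch below shows that technique only as a guess at the idea; the class name, worker names, and the number of virtual points per worker are all hypothetical.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: each feed URL maps to the next worker point on the ring."""

    def __init__(self, workers, replicas=64):
        points = []
        for worker in workers:
            # Several virtual points per worker spread the load more evenly.
            for i in range(replicas):
                points.append((self._hash("%s#%d" % (worker, i)), worker))
        points.sort()
        self._points = points
        self._keys = [h for h, _ in points]

    @staticmethod
    def _hash(value):
        return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

    def worker_for(self, feed_url):
        if not self._points:
            raise ValueError("ring has no workers")
        # First point clockwise from the feed's hash, wrapping around at the end.
        idx = bisect.bisect(self._keys, self._hash(feed_url)) % len(self._points)
        return self._points[idx][1]


if __name__ == "__main__":
    ring = HashRing(["worker-1", "worker-2", "worker-3"])
    print(ring.worker_for("https://blog.example.com/feed"))
```

The appeal of this design is that adding or removing a worker only reassigns the feeds that worker owned, which suits independent workers with their own lifecycle.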
- A few numbers
- content pushed to 1.8M endpoints
- 20M+ Atom entries pushed daily
- around 50 hosted hubs
- 45 “dispatchers”
- 80 parsers
- 50 servers
- Growth: a hockey stick, about 3M by the end of June
- @julien51 on twitter
Back to Brett
- Adoption
- Over 100M feeds are enabled
- Companies: Superfeedr, Google, SixApart, LiveJournal, Myspace, Twitterfeed, Netvibes, cliqset, Gnip, PostRank, etc.
- Google products: Buzz, FeedBurner, Blogger, Reader shared items, Google Alerts, Fast Flip, …
- Fun numbers from the reference Hub
- 200+ feed fetches per second (peak av)
- 250+ items delivered per second
- includes item updates
- 70 million active subscribers
- 1.2 billion items seen since July 2009
- (pubsubhubbub.googlecode.com)
- Publisher clients in many languages, including C# (active mailing list)
- More publishers, subscribers, hubs on the way …
- Future Work
- Arbitrary content types (JSON, HTML, XML)
- Microformats folks want HTML push (more focus on human presentation)
- Google wants XML Sitemaps updates
- Plan to build on LRDD web linking
- Facebook uses half of PuSH for their new APIs (ActivityStreams wants this too)
In Progress
- Private Feeds
- Fully encrypted, authorized, authenticated
- Integration with OAuth, WebFinger
- Apply business policies (to enforce business model)
- Per-item privacy control. The hub needs to enforce plausible deniability: if you don’t have access to an item, you shouldn’t even know that it exists. Each subscriber gets a unique URL to subscribe to; the hub knows who you are and only sends the items you have permission to see. This puts the difficult parts in the hub, in line with the design principles. They are working on scale and the actual design now. The hope is to have fully authenticated, private feeds in a decentralized way. Join the mailing list if you are interested.
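Since this design was still in progress at the time of the talk, the following is only a sketch of the idea as described: each subscriber gets a unique, unguessable URL, and the hub filters items per subscriber so that unauthorized items are never mentioned at all. The class, the `token` query parameter, and the `allowed` field on items are hypothetical.

```python
import secrets


class PrivateFeedHub:
    """Toy hub that maps unique subscription URLs to users and filters per item."""

    def __init__(self):
        self._token_to_user = {}

    def issue_subscription_url(self, base_url, user):
        """Return (url, token); the token identifies the subscriber to the hub."""
        token = secrets.token_urlsafe(16)
        self._token_to_user[token] = user
        return "%s?token=%s" % (base_url, token), token

    def items_for(self, token, items):
        """Items the token's user may see; unknown tokens see nothing.

        Filtered-out items are dropped silently, so a subscriber cannot
        tell that they exist.
        """
        user = self._token_to_user.get(token)
        if user is None:
            return []
        return [item for item in items if user in item["allowed"]]


if __name__ == "__main__":
    hub = PrivateFeedHub()
    url, token = hub.issue_subscription_url("https://hub.example.com/feed", "alice")
    items = [
        {"id": 1, "allowed": {"alice", "bob"}},
        {"id": 2, "allowed": {"bob"}},
    ]
    print(url)
    print([item["id"] for item in hub.items_for(token, items)])
```

Keeping the permission check inside the hub matches the design goal of pushing complexity away from publishers and subscribers.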
Good reads:
Facebook real-time API
developers.facebook.com/docs/api/realtime
