Podcache RSS Namespace

Author: Garth T Kidd
Organisation: iPodder "Lemon Edition" Team
Version: 1.3
Date: 2004/11/24

This specification describes a simple standard by which friendly people can host a cache of podcasted content and re-feed it out either as direct downloads or (more sensibly) via BitTorrent.

Rather than having one output file per feed, this standard assumes a single file containing cache entries for multiple feeds. Podcatching software can simply download the cache feed first, look up wanted enclosures in the cache feed, and download the content from the cache, or directly from the podcaster if the cache is out of date or doesn't seem to be working.
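The lookup-then-fall-back flow described above might be sketched in Python roughly as follows. This is only an illustration: the function name, the shape of cache_index, and the three callables it takes (is_fresh, fetch_cached, fetch_direct) are all hypothetical and not part of this standard. A real podcatcher would plug in its own networking code.

```python
def fetch_enclosure(url, cache_index, is_fresh, fetch_cached, fetch_direct):
    """Fetch an enclosure via the podcache if possible, else directly.

    url          -- the enclosure URL exactly as it appears in the podcast feed
    cache_index  -- dict mapping original URLs to cache entries (hypothetical)
    is_fresh     -- callable(entry) -> bool, True if the entry looks up to date
    fetch_cached -- callable(entry) -> bytes, download via the podcache
    fetch_direct -- callable(url) -> bytes, download from the podcaster
    """
    entry = cache_index.get(url)
    if entry is not None and is_fresh(entry):
        try:
            return fetch_cached(entry)
        except OSError:
            # The cache doesn't seem to be working: fall back to the source.
            pass
    # No entry, a stale entry, or a broken cache all end up here.
    return fetch_direct(url)
```

Passing the fetchers in as callables keeps the decision logic separate from HTTP and BitTorrent details, which this sketch deliberately leaves out.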

This standard assumes and describes a balance of responsibility between the cache administrators, users, podcatcher developers, and podcasters. Everyone has a role to play, and the standard describes mechanisms by which more responsible parties can influence less responsible parties to behave. We feel this is a much more pragmatic approach than appointing some podcache standard nanny to yell at people who abuse the capability.

This standard is designed to ensure that:

The tailored summaries give more information for each of these key audiences.

If you have any questions about this specification, hassle Garth either privately or in public on the ipodder-dev mailing list.

Change Notes

Tailored Summaries

To save you the effort of running the protocol in your head to figure out what it means, this document provides tailored summaries for podcasters, podcatcher developers, and podcache providers.

If you don't fall into one of these categories, hassle Garth and he'll explain it from your choice of frame of reference.

... for Podcasters

One of the primary design goals of this standard was simplicity for podcasters:

If you do nothing at all, you'll get almost all of the benefits with none of the hassle.

If the user has configured a podcache that is caching your podcast, you shouldn't notice anything except a substantial saving on your bandwidth bill. A well-behaved podcache-enabled podcatcher will still hit your RSS feed so you know they're listening, will ignore the podcache if it has fallen out of date, and will download directly from you if the podcache happens to be broken.

There are some new RSS elements you can add to your feed to help prevent mistakes in caching your content or turn caching off altogether. For more details, see Elements for Podcast Feeds. You don't need to add these elements, though: they just make it easier to catch caching bugs, and those bugs will be detected pretty quickly if just a few popular feeds generate the elements correctly.

If you'd rather concentrate on getting your content right than fuss about how the technology works under the hood, and you're using some piece of software you didn't write to generate your RSS, you can leave all the detail to your software developer. If you hand-roll your RSS and can't be bothered learning about XML namespaces, you can do absolutely nothing and watch your bandwidth bill drop without lifting a finger.

... for Podcatcher developers

On behalf of the podcasters whose feeds you'd like to cache as a favour to their bandwidth bills, we'd like to ask your assistance in doing everything you can to cache responsibly, and stop caching when you can't cache responsibly.

In short, you need to make sure the cache gets it right and that you bypass the cache if the cache looks like it's out of date or wrong. If you don't do that, any bug in the caching software could cause chaos and both users and podcasters will lose trust in the idea of caching.

Don't let the sense of urgency make it seem difficult, though. All you need to do is look for the Elements for Podcast Feeds when you grab a feed's RSS and compare their contents against the Elements for Podcache Feeds contained in the cache feed's RSS, and against the length and MD5 digest of the enclosure you get via the podcache. It's more work than the podcasters have to do, but significantly less than the podcache providers have to do.

It's also your responsibility to help the podcache providers stay afloat by correctly handling the podcache:cacheannouncement attribute and inserting the announcements in the playlists full of enclosures downloaded from or via the podcache. Don't worry: users aren't being forced to do anything, here. If they don't want to hear announcements, they can stop using the podcache. It's up to you to enforce that arrangement.

... for Podcache providers

Oh, boy, do you have a lot of hard work to do. You can't escape reading this entire document and comprehending the detail, I'm afraid. Sorry. If that strikes you as unfair, consider this: running a podcache saves podcasters money but costs you money. Due to efficiencies of scale, it'll probably cost you less than it costs them, but it's still your money being spent. If you think writing the code is hard, wait until you try to figure out a way to recover your costs without pissing everyone off. Ouch.

Elements for Podcache Feeds

The difference between a normal feed and a podcache feed is special podcache: item attributes and enclosure attributes to let podcatchers know important information about what it is you're caching so they can

  1. use your cache, and
  2. use your cache responsibly.

If you don't specify podcache:originalurl, podcatchers won't even know to grab something from your cache, so we expect you'll be eager to insert that one. If you don't specify the others, podcatchers should really stop using your feed because it makes it too difficult for them to ensure they're not grabbing out-of-date items.

Namespace Definition

First, you should make XML parsers happy (and let everyone know how to find this document) by adding the podcache namespace definition to your rss tag:

<rss xmlns:podcache="http://ipodder.sourceforge.net/docs/podcache.html">
   ...
</rss>

iPodder and anything else using a permissive feed parser won't mind if you skip that, but I suspect other podcatchers using strict XML parsers won't use your cache and will either go elsewhere or download the content directly from the podcaster's site.
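To see what a strict parser makes of a missing declaration, here is a small Python sketch using the standard library's xml.etree.ElementTree, which is namespace-aware and rejects undeclared prefixes. The two feed fragments and the function name are made up for this example.

```python
import xml.etree.ElementTree as ET

PODCACHE_NS = "http://ipodder.sourceforge.net/docs/podcache.html"

# Hypothetical cache feed fragments, invented for this example.
WITHOUT_NS = (
    '<rss><channel>'
    '<item podcache:feedurl="http://example.com/rss.xml"/>'
    '</channel></rss>'
)
WITH_NS = (
    '<rss xmlns:podcache="%s"><channel>'
    '<item podcache:feedurl="http://example.com/rss.xml"/>'
    '</channel></rss>' % PODCACHE_NS
)

def parses_strictly(xml_text):
    """Return True if a strict, namespace-aware parser accepts the feed."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        # An undeclared "podcache:" prefix lands here (unbound prefix).
        return False
```

With this parser, parses_strictly(WITH_NS) succeeds and parses_strictly(WITHOUT_NS) fails, so a podcatcher built on it would never see the entries in a cache feed that skips the declaration.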

Item Attributes

There's only one new item attribute to generate:

podcache:feedurl

For each item you're caching, you must add podcache:feedurl to let compatible podcatchers know which feed the item came out of. If you're polling the recent 100 feed, you can get this from the source tag.

<item podcache:feedurl="http://originalfeed.com/rss.xml">
   ...
</item>

Enclosure Attributes

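There are a handful of enclosure attributes to generate: podcache:originalurl, podcache:length, podcache:contentlength, podcache:etag, and podcache:lastmodified.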

The latter three should be used by podcatchers to verify that the enclosure hasn't been updated since you fetched it.

Not at all compulsory, but interesting to know about, is a means to podcast short announcements to people using your cache feed: podcache:cacheannouncement.

podcache:originalurl

For each enclosure, you must add podcache:originalurl. This is the attribute by which podcatchers will identify that you have a cached version of an enclosure they want. You must not normalize the URL: it must be exactly what was in the original url attribute, or podcatchers won't match it.

<enclosure podcache:originalurl="http://.../" />
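On the podcatcher side, the matching could look something like the Python sketch below. It assumes the namespace URI shown under Namespace Definition, and the example feed, its URLs, and the function name are invented for illustration. Note the plain dictionary lookup: no normalization happens on either side.

```python
import xml.etree.ElementTree as ET

PODCACHE_NS = "http://ipodder.sourceforge.net/docs/podcache.html"

# Hypothetical cache feed, invented for this example.
EXAMPLE_CACHE_FEED = """<rss version="2.0"
     xmlns:podcache="http://ipodder.sourceforge.net/docs/podcache.html">
  <channel>
    <item podcache:feedurl="http://example.com/rss.xml">
      <enclosure url="http://cache.example.org/ep1.mp3.torrent"
                 podcache:originalurl="http://Example.com/ep1.mp3" />
    </item>
  </channel>
</rss>"""

def index_cache_feed(xml_text):
    """Map each podcache:originalurl to details of its cached enclosure."""
    index = {}
    for item in ET.fromstring(xml_text).iter("item"):
        for enclosure in item.findall("enclosure"):
            original = enclosure.get("{%s}originalurl" % PODCACHE_NS)
            if original is not None:
                # Keyed on the exact string: URLs are not normalized.
                index[original] = {
                    "feedurl": item.get("{%s}feedurl" % PODCACHE_NS),
                    "url": enclosure.get("url"),
                }
    return index
```

In this example, looking up "http://Example.com/ep1.mp3" finds the cached entry, while "http://example.com/ep1.mp3" does not, even though both point at the same host.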

podcache:length

As some podcasters don't put the right length in their feed, and the length in your cached item will be that of the BitTorrent response file you generated, you should add podcache:length to let podcatchers know how big the file they'll get will actually be. They'll also figure that out once they grab the .torrent file, but I'm sure they'll make good use of the information.

<enclosure podcache:length="193732" />

podcache:contentlength

So that podcatchers can verify that what you're seeding matches the enclosure in the feed you're caching, you should add podcache:contentlength to let them know the content of the Content-Length: HTTP header you received when you fetched the content. If there was no such header, include the attribute but leave it empty.

<enclosure podcache:contentlength="" />

podcache:etag

So that podcatchers can verify that what you're seeding matches the enclosure in the feed you're caching, you should add podcache:etag to let them know the content of the ETag: HTTP header you received when you fetched the content. If there was no such header, include the attribute but leave it empty.

<enclosure podcache:etag="b15e400c8dbd6697f26385216d32a40f" />

podcache:lastmodified

So that podcatchers can verify that what you're seeding matches the enclosure in the feed you're caching, you should add podcache:lastmodified to let them know the content of the Last-Modified: HTTP header you received when you fetched the content. If there was no such header, include the attribute but leave it empty.

<enclosure podcache:lastmodified="Wed, 15 Nov 1995 04:58:08 GMT" />
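One way a podcatcher might use these three attributes is sketched below in Python. This standard doesn't say how the podcatcher should obtain the current headers; the sketch assumes it already has them (from a HEAD request on the original URL, say) in a dict keyed by canonical header name. The function name is hypothetical, and treating an empty attribute as "the origin sent no such header" is just one reading of the rule above.

```python
def cache_entry_is_fresh(cached, origin_headers):
    """Compare a cache entry's recorded headers with the origin's current ones.

    cached         -- dict with keys "contentlength", "etag", "lastmodified"
                      taken from the podcache: enclosure attributes
    origin_headers -- dict of HTTP headers currently served for the original
                      URL, keyed "Content-Length", "ETag", "Last-Modified"
    """
    checks = (
        ("contentlength", "Content-Length"),
        ("etag", "ETag"),
        ("lastmodified", "Last-Modified"),
    )
    for attribute, header in checks:
        recorded = cached.get(attribute)
        if recorded is None:
            # Attribute missing altogether: freshness can't be verified.
            return False
        # An empty attribute means the header was absent when cached.
        if recorded != origin_headers.get(header, ""):
            return False
    return True
```

A stricter or looser policy is possible (for example, trusting a matching ETag on its own); this version simply requires all three to agree.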

Podcache Announcements

To make announcements to your cache feed's users, include a normal item with an enclosure but without any of the podcache attributes except for podcache:cacheannouncement. Podcatchers must insert your announcement into a playlist where the user will hear it.

Podcatchers must not provide a way to use a cache feed without downloading announcements and inserting them into playlists. Podcasters can talk to their users; podcache providers should also be given an opportunity to do so. Given the expense of hosting a cache, podcache providers might even need to advertise. If the podcatchers don't help, the caches and then the podcasters will be crushed under the weight of their bandwidth bill.

That said, there should be balance: podcatchers should make it easy for users to stop using your feed if they get sick of your announcements. We'd insist, but we don't need to: basic market selection will take care of it for us.

podcache:cacheannouncement

The podcache:cacheannouncement attribute should simply be set to true:

<item podcache:cacheannouncement="true">
   ...
   <enclosure ... />
</item>
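For podcatcher developers, picking the announcements out of a cache feed might look like the Python sketch below. It assumes the namespace URI from Namespace Definition; the function name and the idea of returning a list of enclosure URLs to queue into a playlist are illustrative choices, not requirements of this standard.

```python
import xml.etree.ElementTree as ET

PODCACHE_NS = "http://ipodder.sourceforge.net/docs/podcache.html"

def announcement_urls(xml_text):
    """Return enclosure URLs of items marked podcache:cacheannouncement="true"."""
    urls = []
    for item in ET.fromstring(xml_text).iter("item"):
        if item.get("{%s}cacheannouncement" % PODCACHE_NS) != "true":
            continue  # an ordinary cached item, not an announcement
        for enclosure in item.findall("enclosure"):
            url = enclosure.get("url")
            if url:
                urls.append(url)
    return urls
```

The podcatcher would then download each returned URL and insert it into a playlist alongside the enclosures fetched via the podcache.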

Elements for Podcast Feeds

One of the primary design goals of the podcache was simplicity for podcasters:

If you do nothing at all, you'll get most of the benefits with none of the hassle.

If the user has configured a podcache that is caching your podcast, you shouldn't notice anything except a substantial saving on your bandwidth bill. A well-behaved podcache-enabled podcatcher will still hit your RSS feed so you know they're listening, will try its best to ignore the podcache if it has fallen out of date, and will download directly from you if the podcache happens to be broken.

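If you want to make your feed podcache-aware, though, there are some elements you might find interesting: podcache:forbid and podcache:preferredcache for the channel, and podcache:expectmd5 and podcache:expectlength for each enclosure.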

Namespace Definition

First, you should make XML parsers happy (and let everyone know how to find this document) by adding the podcache namespace definition to your rss tag:

<rss version="2.0" xmlns:podcache="http://ipodder.sf.net/docs/podcache/">
   ...
</rss>

This is more urgent for podcasters than it is for podcachers. If someone can't use a podcache because the feed isn't strict XML, that saves the podcacher bandwidth. If someone can't download your feed because it isn't strict XML, they can't listen to you. Oops.

Channel Elements

The podcache:forbid and podcache:preferredcache elements on the channel control the behaviour of podcatchers.

Warning

We don't know whether to put these in as attributes or elements, hence the lack of any example XML. That in turn makes it pretty hard to implement either a compliant feed writer or software to read it. Sorry about that.

podcache:forbid

If you specify podcache:forbid for your channel or any item, iPodder and any other well-behaved podcatcher will ignore any cached entries for your feed. Well-behaved podcaches will stop caching your feed, though they might keep polling once a day to see if you've taken forbid off.

Podcatchers must obey podcache:forbid once we figure out where to put it.

podcache:preferredcache

You can specify a preferred podcache with podcache:preferredcache. This might be useful if you're in serious trouble with your bandwidth bill: just configure your feed so that only one cache can download your enclosures, and set it as your preferred cache.

Podcatchers may obey podcache:preferredcache; there might be network topology or security reasons why they can't access your preferred cache and might want to instead use some other cache (which might be caching your preferred cache).

Enclosure Elements

The podcache:expectmd5 and podcache:expectlength attributes on enclosure tags let caches make sure they got your enclosure accurately, and let podcatchers know the enclosure they got from the cache is exactly the same as the enclosure they would have fetched from you directly.

podcache:expectmd5

So that corruption of your enclosure can be detected even if the length is correct, put a hexified MD5 digest in the podcache:expectmd5 attribute. You'll know your implementation is correct if your hexified digest for the word "fnord" is the same as that given in the example below:

<enclosure podcache:expectmd5="b15e400c8dbd6697f26385216d32a40f" />

For what it's worth, the Python code to compute the digest is:

import hashlib

def hexified_digest(blocks):
    """Return a hexified MD5 digest.

    blocks -- a sequence of blocks of data (bytes)."""
    # hashlib replaces the old md5 module, which is gone in Python 3.
    engine = hashlib.md5()
    for block in blocks:
        engine.update(block)
    return engine.hexdigest()

Podcatchers should check downloaded enclosures against podcache:expectmd5 and tell users of any mismatches.

podcache:expectlength

Podcaches and podcatchers can't rely on your length attribute being correct because too many feeds either leave it out entirely or put the same (incorrect) length on every enclosure. To let them know that you're serious, put another copy in the podcache:expectlength attribute:

<enclosure podcache:expectlength="1875233" length="1875233" />
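If your feed is generated by software, both attributes can be computed from the file itself at publish time. The Python sketch below is one way to do it with the standard library. It assumes the namespace URI given under Namespace Definition for podcast feeds; the function name, the example URL, and the MIME type are invented for illustration.

```python
import hashlib
import xml.etree.ElementTree as ET

PODCACHE_NS = "http://ipodder.sf.net/docs/podcache/"

# Ask ElementTree to write the prefix as "podcache:" rather than "ns0:".
ET.register_namespace("podcache", PODCACHE_NS)

def build_enclosure(url, mime_type, data):
    """Build an <enclosure> element with podcache expect* attributes.

    data -- the complete enclosure content as bytes
    """
    length = str(len(data))
    return ET.Element("enclosure", {
        "url": url,
        "type": mime_type,
        "length": length,
        "{%s}expectlength" % PODCACHE_NS: length,
        "{%s}expectmd5" % PODCACHE_NS: hashlib.md5(data).hexdigest(),
    })

# Usage with a tiny stand-in for a real audio file:
element = build_enclosure("http://example.com/ep1.mp3", "audio/mpeg", b"fnord")
xml_text = ET.tostring(element, encoding="unicode")
```

Deriving length and podcache:expectlength from the same measurement guarantees the two always agree.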

Podcatchers should check downloaded enclosures against podcache:expectlength and tell users of any mismatches.
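A podcatcher-side check covering both attributes could be as simple as the following Python sketch. The function name and the wording of the messages are hypothetical; the point is only that every mismatch is collected so it can be reported to the user.

```python
import hashlib

def check_enclosure(data, expectmd5=None, expectlength=None):
    """Return a list of mismatch descriptions (empty if everything matches).

    data         -- the downloaded enclosure content as bytes
    expectmd5    -- value of podcache:expectmd5, or None if absent
    expectlength -- value of podcache:expectlength, or None if absent
    """
    problems = []
    if expectlength is not None and str(len(data)) != expectlength.strip():
        problems.append(
            "length mismatch: expected %s, got %d" % (expectlength, len(data)))
    if expectmd5 is not None:
        actual = hashlib.md5(data).hexdigest()
        # Hex digests are compared case-insensitively.
        if actual != expectmd5.strip().lower():
            problems.append(
                "MD5 mismatch: expected %s, got %s" % (expectmd5, actual))
    return problems
```

An empty list means the download matches whatever the podcaster declared; attributes the podcaster left out are simply skipped.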