======================
Podcache RSS Namespace
======================

:Author: Garth T Kidd
:Organisation: iPodder "Lemon Edition" Team
:Version: $Revision: 1.3 $
:Date: 2004/11/24

This specification describes a simple standard by which friendly people
can host a cache of podcasted content and re-feed it either as direct
downloads or (more sensibly) via BitTorrent.

Rather than having one output file per feed, this standard assumes a
single file containing cache entries for multiple feeds. Podcatching
software can simply download the cache feed first and look up wanted
enclosures in it. It can then download the content from the cache, or
directly from the podcaster if the cache is out of date or doesn't seem
to be working.

This standard assumes and describes a balance of responsibility between
the cache administrators, users, podcatcher developers, and podcasters.
Everyone has a role to play, and the standard describes mechanisms by
which more responsible parties can influence less responsible parties
to behave. We feel this is a much more pragmatic approach than
appointing some podcache standard nanny to yell at people who abuse the
capability.

This standard is designed to ensure that:

* Users get their podcasts;
* Podcasters can still see the users, and know that users got the
  podcast accurately; and
* Cache administrators have an opportunity to address the users, too.

The `tailored summaries`_ give more information for each of these key
audiences.

.. _hassle Garth:

If you have any questions about this specification, hassle Garth__
either privately or in public on the `ipodder-dev`__ mailing list.

__ mailto:garthk@gmail.com
__ http://groups.yahoo.com/group/ipodder-dev/

.. contents:: Table of Contents

Change Notes
============

* Nov 19: Renamed sections; reorganised content.
* Nov 19: Tidied up `Elements for Podcast Feeds`_, adding
  `podcache:expectmd5`_ and `podcache:expectlength`_.
* Nov 18: Initial version.

.. _tailored summaries:

Tailored Summaries
==================

To save you the effort of running the protocol in your head to figure
out what it means, this document provides tailored summaries for:

* `podcasters`_;
* `podcatcher developers`_; and
* `podcache providers`_.

If you don't fall into one of these categories, `hassle Garth`_ and
he'll explain it from your choice of frame of reference.

.. _podcasters:

... for Podcasters
------------------

One of the primary design goals of this standard was simplicity for
podcasters: if you do nothing at all, you'll get almost all of the
benefits with none of the hassle. If the user has configured a podcache
that is caching your podcast, you shouldn't notice anything except
substantial savings on your bandwidth bill. A well behaved
podcache-enabled podcatcher will still hit your RSS feed so you know
they're listening. It will ignore the podcache if the cache has fallen
out of date, and will download directly from you if the podcache
happens to be broken.

There are some new RSS elements you can add to your feed to help
prevent mistakes in caching your content, or to turn caching off
altogether. For more details, see `Elements for Podcast Feeds`_. You
don't *need* to add these elements, though. They just make it easier to
catch caching bugs, and those bugs will be detected pretty quickly if
just a few popular feeds generate the elements correctly.

If you'd rather concentrate on getting your content right than fuss
about how the technology works under the hood, and you're using some
piece of software you *didn't* write to generate your RSS, you can leave
all the detail to your software developer. If you hand-roll your RSS and
can't be bothered learning about XML namespaces, you can do absolutely
nothing and watch your bandwidth bill drop without lifting a finger.

.. _podcatcher developers:

... for Podcatcher developers
-----------------------------

On behalf of the podcasters whose feeds you'd like to cache as a favour
to their bandwidth bills, we'd like to ask your assistance in doing
everything you can to cache responsibly, and to stop caching when you
can't cache responsibly.

In short, you need to check that the cache got it right, and bypass the
cache if it looks out of date or wrong. If you *don't* do that, any bug
in the caching software could cause chaos, and both users and
podcasters will lose trust in the idea of caching.

Don't let the sense of urgency make it seem difficult, though. When you
grab a feed's RSS, look for the `Elements for Podcast Feeds`_. Compare
their contents against the `Elements for Podcache Feeds`_ in the cache
feed's RSS, and against the length and MD5 digest of the enclosure you
get via the podcache. That's all you need to do. It's more work than
the podcasters_ have to do, but significantly less than the
`podcache providers`_ have to do.

It's also your responsibility to help the podcache providers stay afloat
by correctly handling the `podcache:cacheannouncement`_ attribute and
inserting the announcements into the playlists full of enclosures
downloaded from or via the podcache.

Don't worry: users aren't being forced to do anything here. If they
don't want to hear announcements, they can stop using the podcache. It's
up to you to enforce that arrangement.

.. _podcache providers:

... for Podcache providers
--------------------------

Oh, boy, do *you* have a lot of hard work to do. You can't escape
reading this entire document and comprehending the detail, I'm afraid.
Sorry.

If that strikes you as unfair, consider this: running a podcache saves
podcasters money but costs you money. Due to economies of scale, it'll
probably cost you less than it costs them, but it's still your money
being spent.
If you think writing the code is hard, wait until you try to figure out
a way to recover your costs without pissing everyone off. Ouch.

Elements for Podcache Feeds
===========================

A podcache feed differs from a normal feed in its special ``podcache:``
`item attributes`_ and `enclosure attributes`_. These tell podcatchers
important information about what you're caching, so they can a) use
your cache, and b) use your cache responsibly.

If you don't specify `podcache:originalurl`_, podcatchers won't even
know to grab something from your cache, so we expect you'll be eager to
insert that one. If you don't specify the others, podcatchers should
really stop using your feed. Without them, it's too difficult for
podcatchers to ensure they're not grabbing out-of-date items.

Namespace Definition
--------------------

First, you *should* make XML parsers happy (and let everyone know how to
find this document) by adding the ``podcache`` namespace definition to
your ``rss`` tag::

    ...

iPodder and anything else using a permissive feed parser won't mind if
you skip that. However, I suspect other podcatchers using strict XML
parsers won't use your cache, and will either go elsewhere or download
the content directly from the podcaster's site.

.. _item attributes:

Item Attributes
---------------

There's only one new ``item`` attribute to generate:

* `podcache:feedurl`_

podcache:feedurl
~~~~~~~~~~~~~~~~

For each ``item`` you're caching, you *must* add ``podcache:feedurl`` to
let compatible podcatchers know which feed the item came out of. If
you're polling the `recent 100 feed`__, you can get this from the
``source`` tag. ::

    ...

__ http://audio.weblogs.com/top100.xml
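As a rough illustration of where ``podcache:feedurl`` comes from, the Python sketch below copies the ``url`` attribute of an item's RSS 2.0 ``source`` element onto the item itself. It assumes that ``podcache:feedurl`` is an XML attribute on ``item``, which is what the section heading suggests. It also uses a made-up placeholder namespace URI, ``http://example.com/podcache``, in place of the real one from the namespace definition. The function name ``tag_feedurl`` is hypothetical too.

```python
import xml.etree.ElementTree as ET

# Hypothetical placeholder: substitute the real podcache namespace URI
# from the namespace definition.
PODCACHE_NS = "http://example.com/podcache"
ET.register_namespace("podcache", PODCACHE_NS)


def tag_feedurl(item):
    """Copy an RSS 2.0 item's <source url="..."> into podcache:feedurl.

    Assumes podcache:feedurl is an attribute on the <item> element.
    Returns the feed URL, or None if the item has no usable <source>
    (in which case the item is left untouched)."""
    source = item.find("source")
    if source is None or not source.get("url"):
        return None
    feed_url = source.get("url")
    item.set("{%s}feedurl" % PODCACHE_NS, feed_url)
    return feed_url


if __name__ == "__main__":
    item = ET.fromstring(
        '<item><title>Episode 1</title>'
        '<source url="http://example.com/feed.xml">Example</source>'
        '</item>')
    tag_feedurl(item)
    # Serialises <item> with a podcache:feedurl attribute and an
    # xmlns:podcache declaration.
    print(ET.tostring(item, encoding="unicode"))
```

If an item has no ``source`` element, the sketch leaves it unchanged. This specification doesn't say how a cache should find the feed URL in that case.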
.. _enclosure attributes:

Enclosure Attributes
--------------------

There are a handful of ``enclosure`` attributes to generate:

* `podcache:originalurl`_
* `podcache:length`_
* `podcache:contentlength`_
* `podcache:etag`_
* `podcache:lastmodified`_

Podcatchers *should* use the last three to verify that the enclosure
hasn't been updated since you fetched it.

There is also an entirely optional way to podcast short announcements
to people using your cache feed:

* `podcache:cacheannouncement`_

podcache:originalurl
~~~~~~~~~~~~~~~~~~~~

For each enclosure, you *must* add ``podcache:originalurl``. This is the
attribute by which podcatchers will identify that you have a cached
version of an enclosure they want. You *must not* normalize the URL: it
must be *exactly* what was in the original ``url`` attribute, or
podcatchers won't match it. ::

    ...

podcache:length
~~~~~~~~~~~~~~~

Some podcasters don't put the right ``length`` in their feed. Also, the
``length`` in your cached ``item`` will be that of the BitTorrent
response file you generated. You *should* therefore add
``podcache:length`` to let podcatchers know how big the file they
actually receive will be. They'll also figure that out once they grab
the ``.torrent`` file, but I'm sure they'll make good use of the
information. ::

    ...

podcache:contentlength
~~~~~~~~~~~~~~~~~~~~~~

So that podcatchers can verify that what you're seeding matches the
enclosure in the feed you're caching, you should add
``podcache:contentlength``. It tells them the value of the
``Content-Length:`` HTTP header you received when you fetched the
content. If there was no such header, include the attribute but leave it
empty. ::

    ...

podcache:etag
~~~~~~~~~~~~~

So that podcatchers can verify that what you're seeding matches the
enclosure in the feed you're caching, you should add ``podcache:etag``.
It tells them the value of the ``ETag:`` HTTP header you received when
you fetched the content.
If there was no such header, include the attribute but leave it empty. ::

    ...

podcache:lastmodified
~~~~~~~~~~~~~~~~~~~~~

So that podcatchers can verify that what you're seeding matches the
enclosure in the feed you're caching, you should add
``podcache:lastmodified``. It tells them the value of the
``Last-Modified:`` HTTP header you received when you fetched the
content. If there was no such header, include the attribute but leave it
empty. ::

    ...

Podcache Announcements
----------------------

To make announcements to your cache feed's users, include a normal
``item`` with an ``enclosure``. Leave out all of the ``podcache``
attributes except `podcache:cacheannouncement`_.

Podcatchers *must* insert your announcement into a playlist where the
user will hear it. Podcatchers *must not* provide a way to use a cache
feed without downloading announcements and inserting them into
playlists.

Podcasters can talk to their users; podcache providers should also be
given an opportunity to do so. Given the expense of hosting a cache,
podcache providers might even need to advertise. If the podcatchers
don't help, the caches and then the podcasters will be crushed under the
weight of their bandwidth bills.

That said, there should be balance: podcatchers *should* make it easy
for users to stop using your feed if they get sick of your
announcements. We'd insist, but we don't need to: basic market
selection will take care of it for us.

podcache:cacheannouncement
~~~~~~~~~~~~~~~~~~~~~~~~~~

The ``podcache:cacheannouncement`` attribute should simply be set to
``true``::

    ...

Elements for Podcast Feeds
==========================

One of the primary design goals of the podcache was simplicity for
podcasters: if you do nothing at all, you'll get most of the benefits
with none of the hassle. If the user has configured a podcache that is
caching your podcast, you shouldn't notice anything except substantial
savings on your bandwidth bill.
A well behaved podcache-enabled podcatcher will still hit your RSS feed
so you know they're listening. It will try its best to ignore the
podcache if the cache has fallen out of date, and will download directly
from you if the podcache happens to be broken.

If you want to make your feed podcache-aware, though, there are some
elements you might find interesting:

* `podcache:forbid`_ stops podcatchers from using caches for your feed.
* `podcache:preferredcache`_ indicates your preferred cache.
* `podcache:expectmd5`_ and `podcache:expectlength`_ let caches and
  podcatchers make sure they didn't break your enclosure during
  handling.

Namespace Definition
--------------------

First, you *should* make XML parsers happy (and let everyone know how to
find this document) by adding the ``podcache`` namespace definition to
your ``rss`` tag::

    ...

This is more urgent for podcasters than it is for podcache providers. If
someone can't use a podcache because the cache feed isn't strict XML,
that just saves the podcache provider some bandwidth. If someone can't
read your feed because it isn't strict XML, they can't listen to you.
Oops.

Channel Elements
----------------

The `podcache:forbid`_ and `podcache:preferredcache`_ elements on the
channel control the behaviour of podcatchers.

.. warning::

   We don't yet know whether to express these as attributes or as
   elements, hence the lack of any example XML. That in turn makes it
   pretty hard to implement either a compliant feed writer or software
   to read it. Sorry about that.

podcache:forbid
~~~~~~~~~~~~~~~

If you specify ``podcache:forbid`` for your channel or any item, iPodder
and any other well behaved podcatcher will ignore any cached entries for
your feed. Well behaved podcaches will stop caching your feed, though
they might keep polling once a day to see if you've taken ``forbid``
off.

Podcatchers *must* obey ``podcache:forbid`` once we figure out where to
put it.

podcache:preferredcache
~~~~~~~~~~~~~~~~~~~~~~~

You can specify a preferred podcache with ``podcache:preferredcache``.
This might be useful if you're in serious trouble with your bandwidth
bill: just configure your feed so that only one cache can download your
enclosures, and set it as your preferred cache.

Podcatchers *may* obey ``podcache:preferredcache``. There might be
network topology or security reasons why they can't access your
preferred cache. In that case they might instead want to use some other
cache, which might itself be caching your preferred cache.

Enclosure Elements
------------------

The `podcache:expectmd5`_ and `podcache:expectlength`_ attributes on
``enclosure`` tags let caches make sure they got your enclosure
accurately. They also let podcatchers confirm that the enclosure they
got from the cache is exactly the same as the one they would have
fetched from you directly.

podcache:expectmd5
~~~~~~~~~~~~~~~~~~

So that corruption of your enclosure can be detected even if the length
is correct, put a hexified MD5 digest in the ``podcache:expectmd5``
attribute. You'll know your implementation is correct if your hexified
digest for the word "fnord" is the same as that given in the example
below::

    ...

For what it's worth, the Python code to compute the digest is::

    import hashlib

    def hexified_digest(blocks):
        """Return a hexified (lower-case hex) MD5 digest.

        blocks -- an iterable of byte strings, e.g. chunks read
        from the enclosure file."""
        engine = hashlib.md5()
        for block in blocks:
            engine.update(block)
        return engine.hexdigest()

Podcatchers *should* check downloaded enclosures against
``podcache:expectmd5`` and tell users of any mismatches.

podcache:expectlength
~~~~~~~~~~~~~~~~~~~~~

Podcaches and podcatchers can't rely on your ``length`` attribute being
correct. Too many feeds either leave it out entirely or put the same
(incorrect) length on every enclosure. To let them know that you're
serious, put another copy in the ``podcache:expectlength`` attribute::

    ...

Podcatchers *should* check downloaded enclosures against
``podcache:expectlength`` and tell users of any mismatches.
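The following minimal Python sketch pulls together the podcatcher-side checks described in this document. ``check_enclosure`` compares downloaded bytes against ``podcache:expectlength`` and ``podcache:expectmd5``. ``cache_entry_is_fresh`` compares the cache's recorded ``podcache:contentlength``, ``podcache:etag`` and ``podcache:lastmodified`` values with the headers of a fresh request to ``podcache:originalurl``. Several things here are assumptions for illustration, not part of this specification: both function names, their argument shapes (plain dicts of attribute values and lower-cased header names), the use of a HEAD request, and the policy choices noted in the comments.

```python
import hashlib

# Hypothetical helpers; names, arguments and policies are illustrative only.


def check_enclosure(data, expectlength="", expectmd5=""):
    """Compare downloaded enclosure bytes with podcache:expectlength and
    podcache:expectmd5 values taken from the podcaster's own feed.

    Empty (or missing) expectations are skipped. Returns a list of
    mismatch messages to show the user; an empty list means no mismatch."""
    problems = []
    if expectlength:
        if len(data) != int(expectlength):
            problems.append("length is %d bytes, feed expects %s"
                            % (len(data), expectlength))
    if expectmd5:
        actual = hashlib.md5(data).hexdigest()
        if actual != expectmd5.strip().lower():
            problems.append("MD5 is %s, feed expects %s"
                            % (actual, expectmd5))
    return problems


# podcache attribute local name -> HTTP header name (lower-cased)
_HEADER_FOR_ATTR = {
    "contentlength": "content-length",
    "etag": "etag",
    "lastmodified": "last-modified",
}


def cache_entry_is_fresh(cache_attrs, headers):
    """Decide whether a cached enclosure still matches the original.

    cache_attrs -- podcache:contentlength/etag/lastmodified values from
                   the cache feed, keyed by local name ("" when empty).
    headers     -- headers from a fresh (assumed HEAD) request to
                   podcache:originalurl, keyed by lower-cased name.

    Any recorded value that differs from the fresh header marks the entry
    stale. Assumption: if the cache recorded nothing comparable, the entry
    can't be verified, so it is treated as stale too."""
    checked = 0
    for attr, header in _HEADER_FOR_ATTR.items():
        recorded = cache_attrs.get(attr, "")
        if not recorded:
            continue  # empty or missing: nothing to compare
        checked += 1
        if headers.get(header, "") != recorded:
            return False
    return checked > 0
```

A podcatcher might call ``cache_entry_is_fresh`` before downloading via the cache, and fall back to the original URL when it returns ``False``. It would then run ``check_enclosure`` on whatever it downloaded and report any messages to the user.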