Jon Udell is tagging some of his del.icio.us links to podcasts with transcriptavailable; the transcripts have been generated manually. This could be a nice source of data for experiments with information retrieval from podcasts.
Sort of relatedly, I just discovered LibriVox, which hosts volunteer recordings of out-of-copyright literary works (e.g. Project Gutenberg books). I sampled War of the Worlds and the quality seems great. Worth a browse.