MogileFS
Looks interesting, but I would guess that for 95% of the problems it solves, Amazon S3 would be easier and probably cheaper.
Adding Dynamic Contents to IFrames | while($alive) LiveAndLearn();
Worth noting: you can avoid waiting for the iframe to load by calling document.open(); document.close(); on its document before manipulating the DOM.
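A minimal sketch of the trick, assuming a same-origin iframe (no `src`, or `about:blank`). The helper name `fillIframe` is hypothetical and not taken from the linked post:

```javascript
// Hypothetical helper: fill a same-origin iframe without waiting for its
// "load" event. Assumes the iframe is already attached to the page.
function fillIframe(iframe, html) {
  // contentDocument is the standard property; contentWindow.document is
  // the older fallback.
  const doc = iframe.contentDocument || iframe.contentWindow.document;

  // open() replaces the document with a fresh, empty one, and close() ends
  // parser input, so the parser creates the implied <html>, <head> and
  // <body> elements and doc.body is usable right away.
  doc.open();
  doc.close();

  // The DOM can now be manipulated immediately, with no onload handler.
  doc.body.innerHTML = html;
  return doc;
}
```

In a browser, create the iframe, append it to the page, then call e.g. `fillIframe(frame, "<p>Hello</p>")` in the same tick.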
boilerpipe - Project Hosting on Google Code
Java library for web page text extraction