boilerpipe - Project Hosting on Google Code

Java library for web page text extraction