Package Torello.HTML.Tools.NewsSite

Utilities for scraping news websites. Scraping is performed in two steps. The first step retrieves the article URLs from the main page and sub-sections of the newspaper site. The second step retrieves the articles themselves. Articles are saved to disk, encoded using Java's Serializable routines, unless a specialized ScrapedArticleReceiver is provided. A method is provided for converting these data-files to '.html' files, and for retrieving / 'localizing' the images encountered on the article pages.
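
The storage half of this workflow can be illustrated with a minimal sketch that uses only the standard java.io and java.nio classes. It is not this package's API: the names ArticleStoreSketch, Article, Receiver, diskReceiver, fileFor, load and toHtml are hypothetical stand-ins chosen for illustration, and the '.dat' extension and hash-based file names are assumptions. The sketch shows an article being handed to a receiver that serializes it to disk, and a saved data-file later being converted to an '.html' file.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch; none of these names come from Torello.HTML.Tools.NewsSite.
class ArticleStoreSketch {

    // Stand-in for a scraped article: its URL plus the raw page HTML.
    static class Article implements Serializable {
        private static final long serialVersionUID = 1L;
        final String url;
        final String html;

        Article(String url, String html) {
            this.url = url;
            this.html = html;
        }
    }

    // Stand-in for a receiver: decides what happens to each scraped article.
    interface Receiver {
        void receive(Article article) throws IOException;
    }

    // Assumed naming scheme: hex hash of the URL. Distinct URLs can collide,
    // so a real implementation would need something more robust.
    static Path fileFor(Path dir, String url) {
        return dir.resolve(Integer.toHexString(url.hashCode()) + ".dat");
    }

    // Default-style receiver: serializes each article into a file under 'dir'.
    static Receiver diskReceiver(Path dir) {
        return article -> {
            Files.createDirectories(dir);
            Path file = fileFor(dir, article.url);
            try (ObjectOutputStream out =
                     new ObjectOutputStream(Files.newOutputStream(file))) {
                out.writeObject(article);
            }
        };
    }

    // Reads one serialized article back from disk.
    static Article load(Path dataFile) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in =
                 new ObjectInputStream(Files.newInputStream(dataFile))) {
            return (Article) in.readObject();
        }
    }

    // Converts a data-file to a sibling '.html' file and returns its path.
    static Path toHtml(Path dataFile) throws IOException, ClassNotFoundException {
        Article article = load(dataFile);
        String name = dataFile.getFileName().toString().replaceAll("\\.dat$", ".html");
        Path htmlFile = dataFile.resolveSibling(name);
        Files.write(htmlFile, article.html.getBytes(StandardCharsets.UTF_8));
        return htmlFile;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Files.createTempDirectory("articles");
        Article article = new Article("https://example.com/story", "<html><body>Hello</body></html>");
        diskReceiver(dir).receive(article);
        Path htmlFile = toHtml(fileFor(dir, article.url));
        System.out.println("Wrote " + htmlFile);
    }
}
```

Passing the storage behaviour in as an interface is what allows a caller to swap disk storage for something else, such as a database or an in-memory list, without touching the scraping logic. Image retrieval and the scraping steps themselves are left out of the sketch.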