Class NewsSite

  • All Implemented Interfaces:
    java.io.Serializable

    public class NewsSite
    extends java.lang.Object
    implements java.io.Serializable
    A 'data flow' encapsulation class that holds most of the salient features of a news-oriented web-site.

    This class is intended to let a programmer store the entire list of object references necessary to download a day's news-content from a news web-site. Instances of this class may be serialized and saved to disk.
    See Also:
    Serialized Form


    • Constructor Detail

      • NewsSite

        public NewsSite​(java.lang.String siteName,
                        Country country,
                        java.lang.String siteURLAsStr,
                        LC languageCode,
                        java.lang.String description,
                        java.util.Vector<java.net.URL> sectionURLs,
                        URLFilter filter,
                        LinksGet linksGetter,
                        ArticleGet articleGetter,
                        StrFilter bannerAndAddFinder)
        Simple constructor for this data-class.
        Parameters:
        siteName - This site's name.
        country - The country-of-origin for this news web-site.
        siteURLAsStr - The primary URL for the news web-site, as a String. If it cannot be parsed as a URL, the constructor throws a NewsSiteException whose cause is the underlying MalformedURLException.
        languageCode - If the site is published in a language other than English, this parameter records which language.
        description - A brief description of the site.
        sectionURLs - The URLs of the primary news-sections on the web-site. Typical sections are "Life", "Health", "Business", "World News" and "Sports", but the list may include just about anything. The constructor stores a clone of this Vector, not the reference passed in.
        filter - Filters out non-article, non-news links found while scraping a section page. As explained in class ScrapeURLs, this is often a simple one-line lambda-expression that identifies which URLs match a regular-expression pattern.
        linksGetter - Retrieves the links from a section web-page. This, too, is often just a one-line regular-expression lambda.
        articleGetter - An implementation of the ArticleGet interface.
        bannerAndAddFinder - A filter that identifies repetitive ads or banners.
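The 'filter' description mentions a one-line lambda built on a regular expression. The sketch below shows that idea only. It uses java.util.function.Predicate<URL> as a stand-in, since the exact URLFilter contract is not shown on this page, and both the date-based pattern and the meaning of 'true' (here, "looks like an article") are assumptions made for illustration.

```java
import java.net.URL;
import java.util.function.Predicate;
import java.util.regex.Pattern;

class FilterSketch
{
    // Hypothetical pattern: paths that contain a /YYYY/MM/DD/ date segment.
    static final Pattern ARTICLE = Pattern.compile(".*/\\d{4}/\\d{2}/\\d{2}/.*");

    // Stand-in for a URLFilter (an assumption, not the library type).
    // In this sketch, 'true' means the URL looks like an article link.
    static final Predicate<URL> LOOKS_LIKE_ARTICLE =
        url -> ARTICLE.matcher(url.getPath()).matches();
}
```

Whether the real filter keeps or rejects the URLs it matches should be checked against the URLFilter documentation before adapting this.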
        Code:
        Exact Constructor Body:
         this.siteName           = siteName;
         this.country            = country;
         this.languageCode       = languageCode;
         this.description        = description;
         this.sectionURLs        = (Vector<URL>) sectionURLs.clone();
         this.filter             = filter;
         this.linksGetter        = linksGetter;
         this.articleGetter      = articleGetter;
         this.bannerAndAddFinder = bannerAndAddFinder;
        
         try
             { this.siteURL = new URL(siteURLAsStr); }
        
         catch (MalformedURLException e)
         {
             throw new NewsSiteException(
                 "Unable to instantiate the parameter 'siteURLAsStr'.  There was a Malformed URL " +
                 "Exception thrown.  Please see this Exceptions Throwable.getCause() for more " +
                 "details.", e
             );
         }
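
The try/catch above converts a checked MalformedURLException into an exception that callers are not forced to catch, while keeping the original as the cause. A minimal sketch of that pattern follows. SiteURLException and SiteURLSketch are hypothetical names used only here, standing in for NewsSiteException and the constructor.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical stand-in for NewsSiteException: unchecked, and it carries a cause.
class SiteURLException extends RuntimeException
{
    SiteURLException(String message, Throwable cause) { super(message, cause); }
}

class SiteURLSketch
{
    // Same shape as the constructor body: parse the String, or rethrow unchecked.
    static URL parse(String siteURLAsStr)
    {
        try
            { return new URL(siteURLAsStr); }
        catch (MalformedURLException e)
            { throw new SiteURLException("Malformed site URL: " + siteURLAsStr, e); }
    }

    public static void main(String[] args)
    {
        System.out.println(parse("https://example.com").getHost());

        try
            { parse("no-protocol.example.com"); }
        catch (SiteURLException e)
            // The original parsing failure is still reachable through getCause().
            { System.out.println(e.getCause().getClass().getSimpleName()); }
    }
}
```

A caller that needs the parsing details can inspect getCause(), as the exception message in the constructor body suggests.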
        
    • Method Detail

      • sectionURLsIter

        public java.util.Iterator<java.net.URL> sectionURLsIter()
        Retrieves the section URLs (life, comedy, sports, business, world) for this news-site.
        Returns:
        An Iterator<URL> over the different sections of this news-site. The iterator is wrapped in a RemoveUnsupportedIterator.
        Code:
        Exact Method Body:
         return new RemoveUnsupportedIterator<URL>(sectionURLs.iterator());
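
The name RemoveUnsupportedIterator suggests a wrapper that forwards hasNext() and next() but refuses remove(), so that callers cannot alter the stored section list through the iterator. That behaviour is an assumption based on the name, and the class below is a hypothetical stand-in written to illustrate it, not the library's implementation.

```java
import java.util.Iterator;

// Hypothetical stand-in; the library's RemoveUnsupportedIterator may differ in detail.
class ReadOnlyIterator<E> implements Iterator<E>
{
    private final Iterator<E> inner;

    ReadOnlyIterator(Iterator<E> inner) { this.inner = inner; }

    @Override public boolean hasNext() { return inner.hasNext(); }

    @Override public E next() { return inner.next(); }

    // Never forwarded, so the underlying collection cannot shrink through this iterator.
    @Override public void remove()
    { throw new UnsupportedOperationException("read-only iterator"); }
}
```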
        
      • sectionURLsVec

        public java.util.Vector<java.net.URL> sectionURLsVec()
        Retrieves the section URLs (life, comedy, sports, business, world) for this news-site.
        Returns:
        A Vector<URL> of the different sections of this news-site. The returned Vector is a clone, so changes made to it do not affect this NewsSite instance.
        Code:
        Exact Method Body:
         return (Vector<URL>) sectionURLs.clone();
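
Both the constructor and this method call clone() on the Vector, which means the internal list is copied on the way in and on the way out. A minimal sketch of why that matters is given below. SectionHolder is a hypothetical class, and it holds String elements instead of URL purely to keep the example free of checked exceptions.

```java
import java.util.Vector;

class SectionHolder
{
    private final Vector<String> sections;

    @SuppressWarnings("unchecked")
    SectionHolder(Vector<String> sections)
    { this.sections = (Vector<String>) sections.clone(); }   // copy on the way in

    @SuppressWarnings("unchecked")
    Vector<String> sectionsVec()
    { return (Vector<String>) sections.clone(); }            // copy on the way out

    int size() { return sections.size(); }
}
```

With this arrangement, clearing the Vector that was passed to the constructor, or adding to the Vector returned by sectionsVec(), leaves the holder's own list unchanged. Vector.clone() is a shallow copy: the elements themselves are shared, not duplicated.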