Package Torello.HTML.Tools.NewsSite
Class NewsSite
- java.lang.Object
  - Torello.HTML.Tools.NewsSite.NewsSite
- All Implemented Interfaces:
java.io.Serializable
public class NewsSite extends java.lang.Object implements java.io.Serializable
The 'data flow' encapsulation class that contains most of the salient features of a news oriented web-site.
This class is intended to let a programmer store every object reference needed to download a day's news content from a news web-site. This class may be serialized and saved to disk.
- See Also:
- Serialized Form
Hi-Lited Source-Code: Torello/HTML/Tools/NewsSite/NewsSite.java
File Size: 6,455 Bytes; Line Count: 177 ('\n' characters found)
-
Field Summary
Serializable ID
  Modifier and Type    Field
  static long          serialVersionUID
Primary NewsSite Data
  Modifier and Type    Field
  Country              country
  String               description
  LC                   languageCode
  String               siteName
  URL                  siteURL
Getters (@FunctionalInterface - Lambdas)
  Modifier and Type    Field
  ArticleGet           articleGetter
  StrFilter            bannerAndAddFinder
  URLFilter            filter
  LinksGet             linksGetter
-
Constructor Summary
Constructors
  NewsSite(String siteName, Country country, String siteURLAsStr, LC languageCode, String description, Vector<URL> sectionURLs, URLFilter filter, LinksGet linksGetter, ArticleGet articleGetter, StrFilter bannerAndAddFinder)
  NewsSite(String siteName, Country country, String siteURLAsStr, LC languageCode, String description, Vector<URL> sectionURLs, StrFilter filter, LinksGet linksGetter, ArticleGet articleGetter, StrFilter bannerAndAddFinder)
-
Method Summary
  Modifier and Type    Method
  Iterator<URL>        sectionURLsIter()
  Vector<URL>          sectionURLsVec()
-
Field Detail
-
serialVersionUID
public static final long serialVersionUID
This fulfils the serialVersionUID requirement for all classes that implement Java's interface java.io.Serializable. Java's built-in serialization is easy to use and can make saving program state during debugging much simpler. It can also stand in for more complicated persistence systems such as Hibernate.
- See Also:
- Constant Field Values
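Because the class implements java.io.Serializable and pins a serialVersionUID, an instance can be written to disk and read back with the standard object streams. The sketch below assumes a hypothetical stand-in class, SiteSnapshot, holding only plain data fields so that it runs without the Torello library; SnapshotIO is likewise a hypothetical helper. One caution when applying this to NewsSite itself: a lambda stored in a field is serializable only if its target interface is serializable, so it is worth checking how LinksGet, ArticleGet, URLFilter and StrFilter are declared.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.net.URL;
import java.util.Vector;

// Hypothetical stand-in for NewsSite: plain data fields only, so it
// serializes without needing the Torello library on the classpath.
final class SiteSnapshot implements Serializable {
    // Same role as NewsSite.serialVersionUID: identifies the stream format version.
    private static final long serialVersionUID = 1L;

    final String siteName;
    final URL siteURL;              // java.net.URL is Serializable
    final Vector<URL> sectionURLs;  // Vector is Serializable too

    SiteSnapshot(String siteName, URL siteURL, Vector<URL> sectionURLs) {
        this.siteName = siteName;
        this.siteURL = siteURL;
        this.sectionURLs = new Vector<>(sectionURLs);
    }
}

// Hypothetical helper for saving and restoring any Serializable object.
final class SnapshotIO {
    // Writes the object graph to the given file.
    static void save(Serializable obj, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(obj);
        }
    }

    // Reads the object graph back; the caller casts to the expected type.
    static Object load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return in.readObject();
        }
    }
}
```

Usage follows the obvious pattern: call SnapshotIO.save(snapshot, file) after building the object, then cast the result of SnapshotIO.load(file) back to SiteSnapshot in a later run.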
-
siteName
public final java.lang.String siteName
A simple name for the news-site.
-
country
public final Country country
The country-of-origin for this news web-site.
-
siteURL
public final java.net.URL siteURL
URL of the main page for the news web-site.
-
languageCode
public final LC languageCode
A Language Code instance for the web-site, if needed.
-
description
public final java.lang.String description
A simple text description of the news web-site.
-
filter
public final URLFilter filter
A URLFilter used, when scraping a section page, to filter out non-Article, non-news links.
- See Also:
ScrapeURLs
-
linksGetter
public final LinksGet linksGetter
An instance of LinksGet for retrieving Article-URL links from a section page.
- See Also:
ScrapeURLs
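The constructor notes describe the filter and linksGetter values as, typically, one-line regular-expression lambdas. The sketch below illustrates that style with plain java.util.function types: Predicate<URL> stands in for URLFilter and Function<String, List<String>> stands in for LinksGet. Both stand-ins, the class name SectionScrapeSketch, and the /YYYY/MM/DD/ article-path pattern are assumptions for illustration; the real Torello interfaces have their own signatures.

```java
import java.net.URL;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.regex.MatchResult;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class SectionScrapeSketch {
    // Hypothetical article-URL rule: paths shaped like /2024/01/15/story.html.
    static final Pattern ARTICLE_PATH = Pattern.compile("^/\\d{4}/\\d{2}/\\d{2}/.+");

    // Stand-in for a URLFilter: keeps only URLs whose path matches the rule.
    static final Predicate<URL> FILTER =
        url -> ARTICLE_PATH.matcher(url.getPath()).matches();

    // Matches href="..." attributes in raw HTML and captures the quoted value.
    static final Pattern HREF = Pattern.compile("href\\s*=\\s*\"([^\"]+)\"");

    // Stand-in for a LinksGet: a one-line regex lambda over the section page's HTML.
    static final Function<String, List<String>> LINKS_GETTER =
        html -> HREF.matcher(html).results().map(MatchResult::group1).collect(Collectors.toList());
}
```

A regular expression over raw HTML is fragile compared with a real parser, but it matches the "one-liner" usage the parameter notes describe.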
-
articleGetter
public final ArticleGet articleGetter
An instance of ArticleGet used to retrieve news-articles from this site.
- See Also:
ScrapeArticles
-
bannerAndAddFinder
public final StrFilter bannerAndAddFinder
An instance of StrFilter for finding banners or ads.
- See Also:
ScrapeArticles
-
Constructor Detail
-
NewsSite
public NewsSite(java.lang.String siteName, Country country, java.lang.String siteURLAsStr, LC languageCode, java.lang.String description, java.util.Vector<java.net.URL> sectionURLs, StrFilter filter, LinksGet linksGetter, ArticleGet articleGetter, StrFilter bannerAndAddFinder)
Convenience Constructor
Accepts a StrFilter for the filter parameter in place of a URLFilter; the StrFilter is converted with URLFilter.fromStrFilter(filter).
Invokes: NewsSite(String, Country, String, LC, String, Vector, URLFilter, LinksGet, ArticleGet, StrFilter)
- Code:
- Exact Constructor Body:
this(
    siteName, country, siteURLAsStr, languageCode, description, sectionURLs,
    URLFilter.fromStrFilter(filter), linksGetter, articleGetter, bannerAndAddFinder
);
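The convenience constructor delegates through URLFilter.fromStrFilter(filter), which adapts a String-based filter into a URL-based one. The sketch below shows one plausible shape for such an adapter: a URL passes when its String form passes the StrFilter. StrFilterSketch and URLFilterSketch are hypothetical stand-ins, and the toString() conversion is an assumption; the actual Torello implementation may convert the URL differently.

```java
import java.net.URL;

// Hypothetical stand-in for StrFilter: a predicate over Strings.
@FunctionalInterface
interface StrFilterSketch {
    boolean test(String s);
}

// Hypothetical stand-in for URLFilter: a predicate over URLs.
@FunctionalInterface
interface URLFilterSketch {
    boolean test(URL url);

    // Adapter: wraps a String filter so it can be applied to URLs by
    // testing each URL's String form (an assumed conversion).
    static URLFilterSketch fromStrFilter(StrFilterSketch sf) {
        return url -> sf.test(url.toString());
    }
}
```

The adapter keeps a single primary constructor: the StrFilter overload only converts its argument and forwards everything else unchanged.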
-
NewsSite
public NewsSite(java.lang.String siteName, Country country, java.lang.String siteURLAsStr, LC languageCode, java.lang.String description, java.util.Vector<java.net.URL> sectionURLs, URLFilter filter, LinksGet linksGetter, ArticleGet articleGetter, StrFilter bannerAndAddFinder)
Simple constructor for this data-class.
- Parameters:
siteName - This site's name.
country - The country-of-origin for this news web-site.
siteURLAsStr - The primary URL for the news web-site.
languageCode - If this site uses a non-English system, the 'languageCode' parameter can keep track of the language.
description - Brief description of the site.
sectionURLs - This should list the primary news-sections on the web-site. News sections include lists such as "Life", "Health", "Business", "World News" and "Sports", but this list could include just about anything.
filter - If, when scraping a section, there are URLs that need to be filtered, this parameter can help filter out non-Article, non-news links. As explained in class ScrapeURLs, this is often a simple one-line lambda-expression that identifies which URLs match a Regular-Expression Pattern.
linksGetter - A 'getter', which is also often just a one-line regular-expression lambda, for retrieving the links from a section web-page.
articleGetter - An implementation of the ArticleGet interface.
bannerAndAddFinder - Filter for finding repetitive ads or banners.
- Code:
- Exact Constructor Body:
this.siteName           = siteName;
this.country            = country;
this.languageCode       = languageCode;
this.description        = description;
this.sectionURLs        = (Vector<URL>) sectionURLs.clone();
this.filter             = filter;
this.linksGetter        = linksGetter;
this.articleGetter      = articleGetter;
this.bannerAndAddFinder = bannerAndAddFinder;

try
    { this.siteURL = new URL(siteURLAsStr); }

catch (MalformedURLException e)
{
    throw new NewsSiteException(
        "Unable to instantiate the parameter 'siteURLAsStr'. There was a Malformed URL " +
        "Exception thrown. Please see this Exceptions Throwable.getCause() for more " +
        "details.",
        e
    );
}
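The constructor turns siteURLAsStr into a URL and rethrows a MalformedURLException as a NewsSiteException carrying the original as its cause. Since the constructor signature declares no checked exceptions, NewsSiteException must be unchecked. The sketch below isolates that pattern; SiteConfigException and UrlParsing are hypothetical names used in place of the library's own classes.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical unchecked exception playing the role of NewsSiteException.
final class SiteConfigException extends RuntimeException {
    SiteConfigException(String message, Throwable cause) {
        super(message, cause);
    }
}

final class UrlParsing {
    // Parses the String, converting the checked MalformedURLException into an
    // unchecked exception while preserving it as the cause for debugging.
    static URL parseSiteURL(String siteURLAsStr) {
        try {
            return new URL(siteURLAsStr);
        } catch (MalformedURLException e) {
            throw new SiteConfigException(
                "Unable to instantiate the parameter 'siteURLAsStr'. "
                    + "See getCause() for details.", e);
        }
    }
}
```

Wrapping keeps call sites free of try/catch blocks for what is usually a configuration mistake, while getCause() still exposes the underlying parse error.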
-
Method Detail
-
sectionURLsIter
public java.util.Iterator<java.net.URL> sectionURLsIter()
Retrieves the section URLs (life, comedy, sports, business, world) for this news-site.
- Returns:
An Iterator<URL> of the different sections for a particular news-site.
- Code:
- Exact Method Body:
return new RemoveUnsupportedIterator<URL>(sectionURLs.iterator());
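The iterator is wrapped in RemoveUnsupportedIterator, whose name suggests that callers can walk the section list but cannot delete from it. The sketch below shows a wrapper with that behavior; ReadOnlyIterator is a hypothetical name, and this is not the library's source.

```java
import java.util.Iterator;

// Hypothetical read-only wrapper: iteration is delegated, removal is refused.
final class ReadOnlyIterator<E> implements Iterator<E> {
    private final Iterator<E> inner;

    ReadOnlyIterator(Iterator<E> inner) {
        this.inner = inner;
    }

    @Override
    public boolean hasNext() {
        return inner.hasNext();
    }

    @Override
    public E next() {
        return inner.next();
    }

    // Blocks removal so the backing collection cannot be changed via the iterator.
    @Override
    public void remove() {
        throw new UnsupportedOperationException("remove() is not supported");
    }
}
```

Overriding remove() explicitly, rather than relying on Iterator's default, makes the read-only contract visible in the class itself.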
-
sectionURLsVec
public java.util.Vector<java.net.URL> sectionURLsVec()
Retrieves the section URLs (life, comedy, sports, business, world) for this news-site.
- Returns:
A Vector<URL> of the different sections for a particular news-site.
- Code:
- Exact Method Body:
return (Vector<URL>) sectionURLs.clone();
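Both the constructor and sectionURLsVec() copy the Vector, so neither the caller's original list nor a returned list can alter the site's internal section list. The sketch below demonstrates that defensive-copy pattern with a hypothetical SectionHolder class. It uses Vector's copy constructor instead of clone(), which avoids the unchecked cast; because java.net.URL exposes no public mutators, a shallow copy of the elements is sufficient here.

```java
import java.net.URL;
import java.util.Vector;

// Hypothetical holder illustrating copy-in / copy-out of a section list.
final class SectionHolder {
    private final Vector<URL> sectionURLs;

    SectionHolder(Vector<URL> sectionURLs) {
        // Copy on the way in: later edits to the caller's Vector are not seen here.
        this.sectionURLs = new Vector<>(sectionURLs);
    }

    Vector<URL> sectionURLsVec() {
        // Copy on the way out: callers may modify the result freely.
        return new Vector<>(sectionURLs);
    }

    int size() {
        return sectionURLs.size();
    }
}
```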
-