java.lang.Object
- Torello.HTML.Tools.NewsSite.Article

All Implemented Interfaces:

java.io.Serializable
```
public class Article
extends java.lang.Object
implements java.io.Serializable
```
When a news article is downloaded from a URL, its contents are parsed, and the resulting HTML is stored in this class.

This class stores the results of downloading and scraping a news article from a news site. Instances of this class are produced by calls to the class ScrapeArticles. The results can be kept in a Vector or written to the file system for later use. Internally, an instance can hold both the original news-site article web page and the pared-down article-body web page.
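
Because Article implements java.io.Serializable, storing an instance for later use can rely on standard Java serialization. The following is a minimal sketch of such a round trip, not the library's own code. `ArticleStub` is a hypothetical stand-in with plain String fields in place of URL and HTMLNode, and an in-memory byte array stands in for a file so the example has no side effects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Vector;

class ArticleSerializationSketch {

    // Hypothetical stand-in for Article (assumption: NOT the library class).
    static class ArticleStub implements Serializable {
        private static final long serialVersionUID = 1;

        final String url;
        final String titleElement;
        final Vector<String> articleBody;

        ArticleStub(String url, String titleElement, Vector<String> articleBody) {
            this.url = url;
            this.titleElement = titleElement;
            this.articleBody = articleBody;
        }
    }

    // Serializes the object into a byte array.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds an ArticleStub from bytes produced by toBytes.
    static ArticleStub fromBytes(byte[] bytes) {
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (ArticleStub) in.readObject();
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("could not deserialize", e);
        }
    }

    public static void main(String[] args) {
        Vector<String> body = new Vector<>();
        body.add("<p>");
        body.add("Hello");
        body.add("</p>");

        ArticleStub original =
            new ArticleStub("https://example.com/news/1", "Example Title", body);
        ArticleStub restored = fromBytes(toBytes(original));

        System.out.println(restored.titleElement + " / "
            + restored.articleBody.size() + " nodes");
    }
}
```

To write to disk instead, the ByteArrayOutputStream could be swapped for a java.io.FileOutputStream, and the ByteArrayInputStream for a java.io.FileInputStream.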

See Also:

Serialized Form
Hi-Lited Source-Code:
- View Here: Torello/HTML/Tools/NewsSite/Article.java
File Size: 4,726 Bytes
Line Count: 123 '\n' Characters Found

Field Summary

Serializable ID

Modifier and Type	Field
`protected static long`	`serialVersionUID`

Primary Article Data
Modifier and Type	Field
`Vector<HTMLNode>`	`articleBody`
`Vector<HTMLNode>`	`originalPage`
`String`	`titleElement`
`URL`	`url`
`boolean`	`wasErrorDownload`

Article Image Data
Modifier and Type	Field
`int[]`	`imagePosArr`
`Vector<URL>`	`imageURLs`

Torello.HTML.PageStats
Modifier and Type	Field
`PageStats`	`originalPageStats`
`PageStats`	`processedArticleStats`

Constructor Summary

Constructors
Constructor	Description
`Article(URL url, String titleElement, Vector<HTMLNode> originalPage, Vector<HTMLNode> articleBody, Vector<URL> imageURLs, int[] imagePosArr)`	Builds an instance of this class.

Method Summary
- Methods inherited from class java.lang.Object
  clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Field Detail
  - serialVersionUID
    
    protected static final long serialVersionUID
    
    This fulfills the serialVersionUID requirement for classes that implement Java's interface java.io.Serializable. Java's built-in serialization is easy to use, and it can make saving program state while debugging much simpler. It can also be used in place of more complicated systems like "hibernate" to store data.
    
    See Also:
    
    Constant Field Values
    
    Code:
    
    Exact Field Declaration Expression:
    
    protected static final long serialVersionUID = 1;
  - wasErrorDownload
    
    public final boolean wasErrorDownload
    
    This informs the user that an error occurred when downloading an article. If this field is TRUE after instantiation, all other fields in this class should be treated as irrelevant.
  - url
    
    public final java.net.URL url
    
    This is the article's URL from the news website.
  - titleElement
    
    public final java.lang.String titleElement
    
    This is the title that was scraped from the main page. The title is the content of the <TITLE>...</TITLE> element on the article HTML page.
  - originalPage
    
    public final java.util.Vector<HTMLNode> originalPage
    
    This is the original, complete, vectorized-HTML page download. It contains the article exactly as downloaded, without modification.
  - articleBody
    
    public final java.util.Vector<HTMLNode> articleBody
    
    This is the pared-down article-body. It is what is retrieved from class ArticleGet.
  - imageURLs
    
    public final java.util.Vector<java.net.URL> imageURLs
    
    The image URLs that were found in the news article. The easiest way to think about this field is that the following instructions were called on the article-body after downloading the article:
    
    Vector<TagNode> imageNodes = TagNodeGet.all(article, TC.OpeningTags, "img");
    Vector<URL> imageURLs = Links.resolveSRCs(imageNodes, articleURL);
    // The results of the above call are stored in this field / Vector<URL>.
  - imagePosArr
    
    public final int[] imagePosArr
    
    This array contains the "Image Positions": the indices, inside the vectorized article, of each image that was found in the article. The easiest way to think about this field is that the following instruction was called on the article-body after downloading that article:
    
    int[] imagePosArr = TagNodeFind.all(page, TC.OpeningTags, "img");
  - originalPageStats
    
    public final PageStats originalPageStats
    
    This contains an instance of class PageStats that has been generated from the original Newspaper Article Page.
    
    Java Line of Code:
    
    this.originalPageStats = new PageStats(originalPage);
  - processedArticleStats
    
    public final PageStats processedArticleStats
    
    This contains an instance of class PageStats that has been generated from the post-processed Newspaper Article.
    
    Java Line of Code:
    
    this.processedArticleStats = new PageStats(articleBody);
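
The imageURLs and imagePosArr fields above are described in terms of two steps: locating the `<IMG>` nodes in the vectorized page, then resolving each SRC against the article's URL. The sketch below illustrates that idea using only the JDK. It does not call TagNodeFind, TagNodeGet, or Links. The `Node` class, the method names, and the use of java.net.URI instead of URL are all assumptions made for illustration, and whether the library keeps the two fields index-aligned as this sketch does is not stated in the documentation.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ImageFieldsSketch {

    // Hypothetical stand-in for an HTML node (assumption: NOT HTMLNode/TagNode).
    static final class Node {
        final String tag;  // element name, e.g. "p" or "img"
        final String src;  // raw SRC attribute value, or null if absent

        Node(String tag, String src) {
            this.tag = tag;
            this.src = src;
        }
    }

    // Collects the indices of every "img" node in the page.
    static int[] findImagePositions(List<Node> page) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < page.size(); i++) {
            if ("img".equalsIgnoreCase(page.get(i).tag)) hits.add(i);
        }
        int[] positions = new int[hits.size()];
        for (int i = 0; i < positions.length; i++) positions[i] = hits.get(i);
        return positions;
    }

    // Resolves each image's SRC against the article address.
    // Entry i of the result belongs to the node at positions[i].
    static List<URI> resolveSources(List<Node> page, int[] positions, URI articleUri) {
        List<URI> resolved = new ArrayList<>();
        for (int pos : positions) {
            String src = page.get(pos).src;
            resolved.add(src == null ? null : articleUri.resolve(src));
        }
        return resolved;
    }

    public static void main(String[] args) {
        List<Node> page = Arrays.asList(
            new Node("p", null),
            new Node("img", "/img/a.png"),
            new Node("p", null),
            new Node("img", "photo.jpg"));

        URI articleUri = URI.create("https://example.com/news/story.html");
        int[] positions = findImagePositions(page);
        List<URI> urls = resolveSources(page, positions, articleUri);

        for (int i = 0; i < positions.length; i++) {
            System.out.println(positions[i] + " -> " + urls.get(i));
        }
    }
}
```
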
- Constructor Detail
  - Article
    
    public Article(java.net.URL url, java.lang.String titleElement, java.util.Vector<HTMLNode> originalPage, java.util.Vector<HTMLNode> articleBody, java.util.Vector<java.net.URL> imageURLs, int[] imagePosArr)
    
    Builds an instance of this class.
    
    Parameters:
    
    url - The web-address from which this news-article was downloaded / retrieved.
    
    titleElement - The contents of the HTML <TITLE> tag, as a String.
    
    originalPage - Vectorized-HTML of the original article web-page, in its entirety.
    
    articleBody - Vectorized-HTML of the body of the article's page, as extracted by the ArticleGet function-pointer.
    
    imageURLs - The resolved URLs of all HTML <IMG> elements found inside the 'articleBody'.
    
    imagePosArr - The Vector-indices where the images (if any) were found in the article.
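
Since a TRUE value of wasErrorDownload marks every other field as irrelevant, code that consumes a batch of articles would normally test that flag before reading anything else. The sketch below shows one way to do that. `ArticleStub` and `usableTitles` are hypothetical names introduced here, with a simplified field set, and are not part of the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArticleConsumerSketch {

    // Hypothetical stand-in for Article (assumption: NOT the library class).
    static final class ArticleStub {
        final boolean wasErrorDownload;
        final String titleElement;

        ArticleStub(boolean wasErrorDownload, String titleElement) {
            this.wasErrorDownload = wasErrorDownload;
            this.titleElement = titleElement;
        }
    }

    // Returns the titles of articles that downloaded without error.
    // Null entries and failed downloads are skipped.
    static List<String> usableTitles(List<ArticleStub> articles) {
        List<String> titles = new ArrayList<>();
        for (ArticleStub article : articles) {
            if (article == null || article.wasErrorDownload) continue;
            titles.add(article.titleElement);
        }
        return titles;
    }

    public static void main(String[] args) {
        List<ArticleStub> batch = Arrays.asList(
            new ArticleStub(false, "First Story"),
            new ArticleStub(true, null),
            new ArticleStub(false, "Second Story"));

        System.out.println(usableTitles(batch));
    }
}
```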
