Enclosing class:

HTMLPage

Functional Interface:

This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.
```
@FunctionalInterface
public static interface HTMLPage.Parser
```
A lambda target (function pointer) that can be used to replace this library's current regular-expression-based parser with a faster or otherwise more suitable implementation.

This functional interface is identical to QuintFunction<A, B, C, D, E, X> in the 'Java.Additional' package, except that its method may throw an IOException. The ability to swap parsers matters only if one has found a way to outperform the current parser, or wants different parsing behavior altogether. The feature remains in place because it incurs essentially no overhead. To see the actual parser code used by this package, view the documentation for class HTMLPage and scroll to 'View Source Files'.
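
To make the idea of swapping parsers concrete, here is a minimal sketch. It does not use the library itself: `ParserSketch` and the `current` field are hypothetical stand-ins that only mirror the five-argument, IOException-throwing shape described above, and a plain `String` takes the place of `HTMLNode`. Whether the real `HTMLPage.parser` can be reassigned in this way is an assumption of the sketch.

```java
import java.io.IOException;
import java.util.Vector;

class SwapParserDemo {
    // Hypothetical stand-in for the documented shape: five inputs, one
    // result, and a checked IOException. Not the library's own type.
    @FunctionalInterface
    interface ParserSketch {
        Vector<String> parse(CharSequence html, boolean eliminateHTMLTags,
            String rawHTMLFile, String matchesFile, String justTextFile) throws IOException;
    }

    // Hypothetical holder for the "current" parser. This default returns
    // the whole input as a single element.
    static ParserSketch current = (html, elim, raw, matches, text) -> {
        Vector<String> v = new Vector<>();
        v.add(html.toString());
        return v;
    };

    public static void main(String[] args) throws IOException {
        System.out.println(current.parse("a b", false, null, null, null).size());

        // Swap in a different implementation with a lambda: split on whitespace.
        current = (html, elim, raw, matches, text) -> {
            Vector<String> v = new Vector<>();
            for (String w : html.toString().trim().split("\\s+")) {
                if (!w.isEmpty()) v.add(w);
            }
            return v;
        };
        System.out.println(current.parse("a b", false, null, null, null).size());
    }
}
```

Because the type has a single abstract method, any lambda or method reference with a matching parameter list can be assigned; no subclass is needed.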

NOTE: An implementation that does not need the debugging log-file feature may simply ignore the three file-name parameters. The same result is also available without a custom parser: invoke one of the HTMLPage methods that passes null for those log file-names.
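
One way to ignore the three file-name parameters is to adapt a simpler two-parameter parser to the five-parameter shape. The sketch below is illustrative only: `ParserSketch`, `CoreParser` and `ignoringLogs` are hypothetical names, and `String` again stands in for `HTMLNode`.

```java
import java.io.IOException;
import java.util.Vector;

class IgnoreLogsDemo {
    // Hypothetical stand-in for the documented five-parameter signature.
    @FunctionalInterface
    interface ParserSketch {
        Vector<String> parse(CharSequence html, boolean eliminateHTMLTags,
            String rawHTMLFile, String matchesFile, String justTextFile) throws IOException;
    }

    // Hypothetical two-parameter core that knows nothing about log files.
    @FunctionalInterface
    interface CoreParser {
        Vector<String> parse(CharSequence html, boolean eliminateHTMLTags) throws IOException;
    }

    // Adapts the core to the five-parameter shape. The three file names are
    // accepted and then dropped, so no file is ever opened.
    static ParserSketch ignoringLogs(CoreParser core) {
        return (html, eliminate, rawHTMLFile, matchesFile, justTextFile) ->
            core.parse(html, eliminate);
    }

    public static void main(String[] args) throws IOException {
        CoreParser core = (html, eliminate) -> {
            Vector<String> v = new Vector<>();
            v.add(html.toString());
            return v;
        };
        ParserSketch p = ignoringLogs(core);
        System.out.println(p.parse("<b>hi</b>", false, "raw.html", "matches.txt", "text.txt"));
    }
}
```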

See Also:

HTMLPage.parser

Hi-Lited Source-Code:

Torello/HTML/HTMLPage.java
File Size: 1,852 Bytes, Line Count: 35

Method Summary

@FunctionalInterface: (Lambda) Method

| Modifier and Type | Method |
|---|---|
| `Vector<HTMLNode>` | `parse(CharSequence html, boolean eliminateHTMLTags, String rawHTMLFile, String matchesFile, String justTextFile)` |

- Method Detail
  - parse
    
    java.util.Vector<HTMLNode> parse(java.lang.CharSequence html, boolean eliminateHTMLTags, java.lang.String rawHTMLFile, java.lang.String matchesFile, java.lang.String justTextFile) throws java.io.IOException
    
    Parse html source-text into a Vector<HTMLNode>.
    
    Parameters:
    
    html - Any java.lang.CharSequence; it will be converted into a String. It should contain the HTML to be parsed and vectorized.
    
    eliminateHTMLTags - When this parameter is TRUE, all TagNode and CommentNode elements are eliminated from the returned HTML Vector. A Vector having only the page-text (as instances of TextNode) is returned, instead.
    
    rawHTMLFile - If this parameter is non-null, an identical copy of the HTML that is retrieved will be saved (as a text-file) to the file named by parameter 'rawHTMLFile'. If this parameter is null, it will be ignored (and the raw-HTML discarded).
    
    NOTE: If you implement your own parser and wish to ignore this parameter (and not output such a file), you may skip this step.
    
    matchesFile - If this parameter is non-null, a parser-output file, consisting of the regular-expression matches obtained while parsing the HTML, will be saved to disk using this file-name. This is a legacy feature, which can be helpful when debugging and investigating the contents of output HTML Vectors. This parameter may be null; if it is, the regular-expression match data is simply discarded by the parser after use.
    
    NOTE: As above, you may skip implementing this.
    
    justTextFile - If this parameter is non-null, a copy of every character of text found on the downloaded web-page that is not inside an HTML TagNode or CommentNode will be saved to disk using this file-name. This is also a legacy feature. The generated text-file makes it easy to quickly scan the words that would be displayed on the page. If this parameter is null, it will be ignored.
    
    NOTE: As above, you may skip implementing this.
    
    Returns:
    
    A Vector of HTMLNode instances (called 'Vectorized HTML') that represents the parsed content of the input source.
    
    Throws:
    
    java.io.IOException - Thrown if there are any problems while processing the input-source HTML content (or while writing output, if any).
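
The parameter contract above can be sketched as a small, self-contained implementation. This is not the library's parser: the node classes are hypothetical stand-ins, the tokenizer is a toy that treats anything between `<` and `>` as a tag (comments, scripts and malformed markup are not handled), and `writeIfRequested` is an invented helper. It only illustrates how the `eliminateHTMLTags` flag, the nullable file names and the IOException fit together.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Vector;

class ContractSketch {
    // Hypothetical stand-ins for the library's node types.
    static class Node { final String str; Node(String s) { str = s; } }
    static class Tag extends Node { Tag(String s) { super(s); } }
    static class Text extends Node { Text(String s) { super(s); } }

    // Writes only when a file name was supplied; null means "skip".
    static void writeIfRequested(String fileName, CharSequence content) throws IOException {
        if (fileName == null) return;
        Files.write(Paths.get(fileName), content.toString().getBytes(StandardCharsets.UTF_8));
    }

    static Vector<Node> parse(CharSequence html, boolean eliminateHTMLTags,
            String rawHTMLFile, String matchesFile, String justTextFile) throws IOException {
        String s = html.toString();
        writeIfRequested(rawHTMLFile, s);           // identical copy of the input

        Vector<Node> out = new Vector<>();
        StringBuilder text = new StringBuilder();
        int pos = 0;
        while (pos < s.length()) {
            int open = s.indexOf('<', pos);
            int close = (open < 0) ? -1 : s.indexOf('>', open);
            if (close < 0) open = s.length();       // no complete tag left
            if (open > pos) {                       // text before the next tag
                out.add(new Text(s.substring(pos, open)));
                text.append(s, pos, open);
            }
            if (close < 0) break;
            if (!eliminateHTMLTags) out.add(new Tag(s.substring(open, close + 1)));
            pos = close + 1;
        }
        writeIfRequested(justTextFile, text);       // page text only
        // matchesFile is ignored: this sketch produces no regular-expression matches.
        return out;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(parse("<p>Hello</p>", false, null, null, null).size());
        System.out.println(parse("<p>Hello</p>", true, null, null, null).size());
    }
}
```

Any failure while writing a requested file surfaces as the IOException declared on the method, so a caller needs a single `try`/`catch` around the call regardless of which optional outputs were requested.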
