Package Torello.HTML
Class HTMLTags
- java.lang.Object
-
- Torello.HTML.HTMLTags
-
public class HTMLTags extends java.lang.Object
Primary "HTML-5 Tags" class. It keeps a list of all 122 tags in a TreeSet<String>, along with many accessor methods that are used by the HTML parser, or potentially by any class or function that may need this list.
The purpose of this class is to maintain the list of valid HTML tags in Java memory. There are fewer than 200 of these, and they help the HTMLParse class pick out valid HTML tags when scraping. This class also keeps in memory some "pre-instantiated" Java-HTML HTMLNode / TagNode instances. The class TagNode contains only final variables (it is immutable), because at least 80% of the HTML on any given page is just a tag / element instance that never needs to change in memory. Call the method public TagNode hasTag(String, TC) to obtain a valid instance of class TagNode.
-
-
Method Summary
Basic Methods
Modifier and Type                             Method
static String                                 getDescription(String tag)
static TagNode                                hasTag(String tag, TC openOrClosed)

List Known Tags
Modifier and Type                             Method
static Iterator<String>                       iterator()
static Iterator<String>                       iteratorAddedForHTML5()
static Iterator<String>                       iteratorBlockTags()
static Iterator<String>                       iteratorDeprecatedForHTML5()
static Iterator<Map.Entry<String, String>>    iteratorDescriptions()
static Iterator<String>                       iteratorInlineTags()
static Iterator<String>                       iteratorSingletonTags()

Check Tag Categories
Modifier and Type                             Method
static boolean                                deprecated(String tok)
static boolean                                isBlock(String tok)
static boolean                                isHTML5(String tok)
static boolean                                isInline(String tok)
static boolean                                isSingleton(String tok)
static boolean                                isTag(String tag)

Add or Remove Tags (to/from the Internal-List)
Modifier and Type                             Method
static boolean                                addSingleton(String htmlTagSingleton)
static boolean                                addTag(String htmlTag)
static boolean                                removeSingleton(String htmlTagSingleton)
static boolean                                removeTag(String htmlTag)

Print the Internal Tag List
Modifier and Type                             Method
static void                                   printAll(Appendable a, boolean printDescriptions)
static void                                   printAllToTerminal(boolean printDescriptions)

Utilities
Modifier and Type                             Method
static String                                 getTag_MEM_HEAP_CHECKOUT_COPY(String tag)
static void                                   loadDescriptions()
static void                                   main(String[] argv)
static byte                                   maxTokenLength()
-
-
-
Method Detail
-
main
public static void main(java.lang.String[] argv)
-
printAllToTerminal
public static void printAllToTerminal(boolean printDescriptions)
This simply prints all data that is stored in the JAR file to terminal output. It delegates to the similarly named method printAll(Appendable, boolean), passing 'System.out' as the Appendable instance. The Appendable interface declares the checked IOException, but 'System.out' does not actually throw it when printing, so the exception is caught (and ignored) here, for convenience.
- Parameters:
printDescriptions
- If this is set to TRUE, this method ensures that the JAR Descriptions-Data-File has been loaded into memory, loading the description-String's if they are not there yet. These String's contain a one-sentence-long text-description of each HTML Element listed in this class. If this parameter is FALSE, the data-file will not be visited, and the HTML Element descriptions will not be sent to the output stream.
- See Also:
printAll(Appendable, boolean)
- Code:
- Exact Method Body:
try { printAll(System.out, printDescriptions); } catch (IOException e) { }
-
printAll
public static void printAll(java.lang.Appendable a, boolean printDescriptions) throws java.io.IOException
This simply prints all data that is stored in the JAR data-file to a java.lang.Appendable.
- Parameters:
a
- This parameter provides an instance that will receive the text output. This parameter may not be null, or a NullPointerException will be thrown. It expects an implementation of Java's java.lang.Appendable interface, which allows for a wide range of options when logging intermediate messages.
Class or Interface Instance              Use & Purpose
'System.out'                             Sends text to the standard-out terminal
Torello.Java.StorageWriter               Sends text to System.out, and saves it, internally.
FileWriter, PrintWriter, StringWriter    General purpose java text-output classes
PrintStream                              More general-purpose java text-output (for instance, one wrapping a FileOutputStream)
Checked IOException:
The Appendable interface requires that the checked exception IOException be caught (or declared) when using its append(...) methods.
printDescriptions
- If this is set to TRUE, this method ensures that the JAR Descriptions-Data-File has been loaded into memory, loading the description-String's if they are not there yet. These String's contain a one-sentence-long text-description of each HTML Element listed in this class. If this parameter is FALSE, the data-file will not be visited, and the HTML Element descriptions will not be sent to the output stream.
- Throws:
java.io.IOException
- The general purpose interface java.lang.Appendable requires checking for an IOException when printing information. If the 'Appendable' provided to this method fails, this exception shall propagate out.
- Code:
- Exact Method Body:
    a.append("TAGS: ");
    for (String tag : tags) a.append(tag + ", ");

    a.append("\n\nDEPRECATED: ");
    for (String deprecatedTag : deprecated) a.append(deprecatedTag + ", ");

    a.append("\n\nHTML5: ");
    for (String html5Tag : html5Tags) a.append(html5Tag + ", ");

    a.append("\n\nSINGLETON-TAGS: ");
    for (String selfClosingTag : singletonTags) a.append(selfClosingTag + ", ");

    a.append("\n\nBLOCK-TAGS: ");
    for (String blockTag : blockTags) a.append(blockTag + ", ");

    a.append("\n\nINLINE-TAGS: ");
    for (String inlineTag : inlineTags) a.append(inlineTag + ", ");

    a.append("\n\ntagNodesOpening: ");
    for (String s : tagNodesOpening.keySet())
        a.append(tagNodesOpening.get(s).toString() + ", ");

    a.append("\n\ntagNodesClosing: ");
    for (String s : tagNodesClosing.keySet())
        a.append(tagNodesClosing.get(s).toString() + ", ");

    a.append("\n\ntagNodesOpeningUC: ");
    for (String s : tagNodesOpeningUC.keySet())
        a.append(tagNodesOpeningUC.get(s).toString() + ", ");

    a.append("\n\ntagNodesClosingUC: ");
    for (String s : tagNodesClosingUC.keySet())
        a.append(tagNodesClosingUC.get(s).toString() + ", ");

    if (printDescriptions)
    {
        loadDescriptions(); // Will only load if descriptions have not already been loaded.

        a.append("\n\n");

        for (String s : descriptions.keySet())
            a.append(s + ((s.length() >= 7) ? ":\t" : ":\t\t") + descriptions.get(s) + "\n");
    }
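The Appendable contract used by printAll can be tried out with standard Java classes alone. The following is a minimal sketch, not the library's code: the class AppendableSketch and its methods printTags and printTagsToString are hypothetical stand-ins, and the tag list is passed in by the caller instead of being read from HTMLTags.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical stand-in for a printAll-style method; it does not use HTMLTags.
final class AppendableSketch {

    // Writes a label followed by a comma-separated list to any Appendable.
    // The interface methods declare IOException, so it is declared here too.
    static void printTags(Appendable a, List<String> tags) throws IOException {
        a.append("TAGS: ");
        for (String tag : tags) a.append(tag).append(", ");
        a.append('\n');
    }

    // Convenience wrapper in the spirit of printAllToTerminal: the target is a
    // StringBuilder, which does not fail, but the checked exception still has
    // to be handled because printTags sees only the Appendable interface.
    static String printTagsToString(List<String> tags) {
        StringBuilder sb = new StringBuilder();
        try {
            printTags(sb, tags);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(printTagsToString(List.of("a", "br", "div")));
    }
}
```

The same printTags call would accept System.out, a StringWriter, or a FileWriter, since all of these implement java.lang.Appendable.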
-
loadDescriptions
public static void loadDescriptions()
The data-structure (a java.util.TreeMap) that holds the individual text-descriptions of each HTML tag is not loaded into memory from the JAR file automatically. When the class-loader loads this class, the class employs "Lazy Loading" to prevent unnecessary memory usage.
Instead, if a programmer decides to print information about HTML-Tags, and would like to include a short, one or two sentence description of the HTML Elements (using the method getDescription(String)), then and only then will this method 'loadDescriptions' be invoked to load those one-sentence HTML-Tag summaries.
As an aside, these sentences are kept in the JAR file because they are fairly long and are rarely used, unless you are interested in doing some reporting. By leaving them in the JAR file (unless requested), some amount of "over-head" resource usage is saved.
If the text-descriptions have already been loaded, this method will just return, rather than loading them a second time.
- See Also:
LFEC.readObjectFromFile_JAR(Class, String, boolean, Class)
- Code:
- Exact Method Body:
    if (descriptions.size() == 0)
        descriptions.putAll((TreeMap<String, String>) LFEC.readObjectFromFile_JAR
            (HTMLTags.class, "data-files/HTMLTagDescriptions.tmdat", true, TreeMap.class));
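The lazy-loading behavior described for loadDescriptions can be illustrated without the JAR data-file. This is a sketch under stated assumptions: LazyDescriptionsSketch is a hypothetical class, and a Supplier stands in for the LFEC.readObjectFromFile_JAR call, whose behavior is not reproduced here.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Supplier;

// Hypothetical illustration of "load on first request"; not the library's class.
final class LazyDescriptionsSketch {
    private final TreeMap<String, String> descriptions = new TreeMap<>();
    private final Supplier<Map<String, String>> loader; // stands in for the JAR read
    private int loadCount = 0;                          // only here to observe loads

    LazyDescriptionsSketch(Supplier<Map<String, String>> loader) {
        this.loader = loader;
    }

    // Loads the descriptions only if the map is still empty.
    void loadDescriptions() {
        if (descriptions.isEmpty()) {
            descriptions.putAll(loader.get());
            loadCount++;
        }
    }

    // Triggers the load on first use, then performs a case-insensitive lookup.
    String getDescription(String tag) {
        loadDescriptions();
        return descriptions.get(tag.toLowerCase());
    }

    int loadCount() { return loadCount; }

    public static void main(String[] args) {
        LazyDescriptionsSketch d =
            new LazyDescriptionsSketch(() -> Map.of("br", "Inserts a line break."));
        System.out.println(d.getDescription("BR"));
        System.out.println(d.getDescription("br"));
        System.out.println("loads performed: " + d.loadCount());
    }
}
```

Checking for an empty map, as the method body above does, keeps the guard simple; the trade-off is that a data source which yields no entries would be consulted again on every call.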
-
maxTokenLength
public static byte maxTokenLength()
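Returns the String-length of the longest HTML token saved in the internal TreeSet<String> of HTML tokens. The value is kept in a cached field that addTag(String) and removeTag(String) update, so it is not recomputed on each call.
- Returns: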
- The length of the longest HTML Token String.
- Code:
- Exact Method Body:
return MAX_TOKEN_LENGTH;
-
addTag
public static boolean addTag(java.lang.String htmlTag)
Adds a new HTML element to the list of elements that may be parsed, created and checked. This is not always advisable, as the complete list of HTML-5 tags is already stored internally, but if you would like to add or remove certain tags, this method and removeTag(String) are the two methods for doing so.
- Parameters:
htmlTag
- Any HTML tag that you would like to see parsed by the HTML page parser. If the parser encounters a construct such as: <YOUR_NEW_TAG ATTRIBUTES="..."> it will treat that as a new HTML element.
- Returns:
TRUE if the element was indeed a new element to the list, and FALSE if the HTML-tokens-list already contained this HTML element. In the latter case, this method call will just return gracefully, with no changes being made to the underlying list of acceptable HTML tokens.
- Throws:
HTMLTokException
- If the String parameter 'htmlTag' contains non-alpha-numeric characters, begins with a number, or is longer than 127 characters.
- Code:
- Exact Method Body:
    Matcher m = HTML_TAG_ALPHA_NUMERIC.matcher(htmlTag);

    if ((! m.find()) || (htmlTag.length() != m.group().length()))
        throw new HTMLTokException(
            "The HTML-Tag Parameter that was passed [" + htmlTag + "] doesn't conform to the " +
            "expected requirements for HTML-Tags. It may only contain alpha-numeric characters, " +
            "and it must not begin with a number."
        );

    String tag = htmlTag.trim().toLowerCase();

    if (tag.length() > 127)
        throw new HTMLTokException(
            "The (trimmed) HTML-Tag Parameter that was passed [" + tag + "] is longer than 127 " +
            "characters. This is not allowed here."
        );

    boolean ret = tags.add(tag);

    if (ret)
    {
        // NOTE: These four private, static fields are of type TreeMap<String, TagNode>
        //       tagNodesOpening, tagNodesOpeningUC, tagNodesClosing, tagNodesClosingUC
        //
        // They can provide a significant savings for the Garbage Collector. For any
        // HTML Element that does not have any attributes, and has a standard 'case'
        // (all upper-case, or all lower-case), the parser will "re-use" pre-existing
        // instances of class TagNode, rather than building a new one.
        //
        // FOR EXAMPLE: The parser will "re-use" the same instance of a "<BR>" TagNode, or
        //              any one, actually, as long as it does not have attributes. Since 40%
        //              to 50% of class TagNode are "TC.ClosingTags", this can be a significant
        //              improvement

        // Build a Lower-Case, Pre-Instantiated, Zero-Attribute version of the HTML Element
        // Uses specialized package-only visible TagNode constructor.
        // Not available to the general public
        tagNodesOpening.put(tag, new TagNode(tag, TC.OpeningTags));
        tagNodesClosing.put(tag, new TagNode(tag, TC.ClosingTags));

        // Build an Upper-Case, Pre-Instantiated, Zero-Attribute version of the HTML Element
        tag = tag.toUpperCase();
        tagNodesOpeningUC.put(tag, new TagNode("<" + tag + ">"));
        tagNodesClosingUC.put(tag, new TagNode("</" + tag + ">"));

        // Update the MAX_TOKEN_LENGTH - but only if necessary.
        if (tag.length() > MAX_TOKEN_LENGTH) MAX_TOKEN_LENGTH = (byte) tag.length();
    }

    return ret;
-
removeTag
public static boolean removeTag(java.lang.String htmlTag)
Removes an HTML element from the list of elements that may be parsed, created and checked. This is not always advisable, as the complete list of HTML-5 tags is already stored internally, but if you would like to add or remove certain tags, this method and addTag(String) are the two methods for doing so.
- Parameters:
htmlTag
- Any HTML tag that you no longer want to see parsed by the HTML page parser. HTML nodes that contain this tag as their element will cause the parser to ignore the node, and treat it like a TextNode.
- Returns:
TRUE if the element was removed, and FALSE if it was not, because it wasn't in the HTML-tokens-list in the first place.
- Code:
- Exact Method Body:
    String tag = htmlTag.trim().toLowerCase();

    boolean ret = tags.remove(tag);

    if (ret)
    {
        // "Lower-Case" and "Pre-Instantiated" (Zero-Attributes) version of TagNode
        tagNodesOpening.remove(tag);
        tagNodesClosing.remove(tag);

        tag = tag.toUpperCase();

        // "Upper-Case", Pre-Instantiated, Zero-Attribute version of TagNode
        tagNodesOpeningUC.remove(tag);
        tagNodesClosingUC.remove(tag);

        // After removal, there is a small chance the
        // MAX_TOKEN_LENGTH is, now, shorter
        if (tag.length() == MAX_TOKEN_LENGTH) setMaxTokenLength();
    }

    return ret;
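The bookkeeping that addTag and removeTag perform (validate, lower-case, add or remove, keep the cached maximum length current) can be sketched with plain collections. Several things here are assumptions: TagRegistrySketch is a hypothetical class, the regular expression is a guess at what HTML_TAG_ALPHA_NUMERIC accepts (that pattern is not shown in this page), IllegalArgumentException stands in for HTMLTokException, and the stream-based recomputation stands in for setMaxTokenLength(), whose body is also not shown.

```java
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical registry; it models the tag set only, not the TagNode maps.
final class TagRegistrySketch {
    // ASSUMPTION: letters and digits only, not starting with a digit.
    private static final Pattern TAG_NAME = Pattern.compile("[A-Za-z][A-Za-z0-9]*");

    private final TreeSet<String> tags = new TreeSet<>();
    private int maxTokenLength = 0; // cached, like MAX_TOKEN_LENGTH

    boolean addTag(String htmlTag) {
        if (!TAG_NAME.matcher(htmlTag).matches())
            throw new IllegalArgumentException("Invalid tag name: [" + htmlTag + "]");
        String tag = htmlTag.toLowerCase();
        if (tag.length() > 127)
            throw new IllegalArgumentException("Tag name is longer than 127 characters");
        boolean added = tags.add(tag);
        // Update the cached maximum only when a genuinely new, longer tag arrives.
        if (added && tag.length() > maxTokenLength) maxTokenLength = tag.length();
        return added;
    }

    boolean removeTag(String htmlTag) {
        String tag = htmlTag.trim().toLowerCase();
        boolean removed = tags.remove(tag);
        // Recompute only if the removed tag might have been the longest one.
        if (removed && tag.length() == maxTokenLength)
            maxTokenLength = tags.stream().mapToInt(String::length).max().orElse(0);
        return removed;
    }

    boolean isTag(String tag) { return tags.contains(tag.toLowerCase()); }

    int maxTokenLength() { return maxTokenLength; }

    public static void main(String[] args) {
        TagRegistrySketch r = new TagRegistrySketch();
        System.out.println(r.addTag("MyWidget")); // true: new entry
        System.out.println(r.addTag("mywidget")); // false: already present
        System.out.println(r.maxTokenLength());   // 8
    }
}
```

Caching the maximum makes a maxTokenLength() call constant-time, at the cost of an occasional full scan when the longest tag is removed.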
-
addSingleton
public static boolean addSingleton(java.lang.String htmlTagSingleton)
Adds an HTML-element to the list of singleton HTML-elements. A singleton may only have an "opening" tag, and may not have a closing-version tag. For instance, <IMG SRC="..."> is the classic singleton; its data is all stored internally as attribute values.
- Parameters:
htmlTagSingleton
- Any HTML tag that you would like to see listed as a singleton HTML-element.
- Returns:
TRUE if the element was indeed a new element to the list, and FALSE if the HTML-singleton tokens-list already contained this HTML element. In the latter case, this method call will just return gracefully, with no changes being made to the underlying list of singleton tokens.
- Throws:
java.lang.IllegalArgumentException
- If you have tried to "register" a singleton tag that isn't already in the internal list of HTML tags, then this method will throw an exception directing you to first add your token to that list with addTag(String).
- Code:
- Exact Method Body:
    String tag = htmlTagSingleton.trim().toLowerCase();

    if (! tags.contains(tag)) throw new IllegalArgumentException(
        "The HTML token you have attempted to add [" + tag + "] may not be added to the " +
        "singletons list, because it is not a known/registered HTML token, as of now. " +
        "First, make sure it is listed as one of the parser's tokens by calling " +
        "'addTag(token)', and then invoking this method with that token."
    );

    // Internally, there is a private & static TreeSet<String> which saves the names
    // of all HTML 'singleton' elements. Use Java's TreeSet.add(E) method
    return singletonTags.add(tag);
-
removeSingleton
public static boolean removeSingleton(java.lang.String htmlTagSingleton)
Removes an HTML-element from the list of singleton HTML-elements. A singleton may only have an "opening" tag, and may not have a closing-version tag. For instance, <IMG SRC="..."> is the classic singleton; its data is all stored internally as attribute values.
- Parameters:
htmlTagSingleton
- Any HTML tag that you no longer want to see in the HTML-singleton tokens-list.
- Returns:
TRUE if the element was removed, and FALSE if it was not, because it wasn't in the HTML-Singleton tokens-list in the first place.
- Code:
- Exact Method Body:
    String tag = htmlTagSingleton.trim().toLowerCase();

    // Internally, there is a private & static TreeSet<String> which saves the names
    // of all HTML 'singleton' elements. Use Java's TreeSet.remove(Object) method
    return singletonTags.remove(tag);
-
hasTag
public static TagNode hasTag(java.lang.String tag, TC openOrClosed)
The purpose of this function/method is to provide a little "optimization." Since all TagNode information is stored as constant/final, this class makes it possible to instantiate only one copy of each attribute-free node when building HTML page node-Vectors.
Internal to this class are pre-instantiated TagNode's for each and every HTML-Tag available, both in upper-case and in lower-case tag-versions. For each case there is an opening-version of the TagNode, and also a closing-version of the same TagNode.
This makes a total of four pre-instantiated TagNode's per tag, stored within the Java-HTML JAR File. They are held in java.util.TreeMap's of serialized TagNode instances. These TreeMap's have also been serialized and saved in the Java-HTML JAR, and they are loaded into memory by the Class-Loader as soon as an invocation to an HTML Method is made.
It is not mandatory to "reuse" instantiated HTML TagNode's, but for memory management, garbage-collection efficiency, and other optimizations, the classes in this package use the pre-instantiated versions of these objects whenever possible.
- Parameters:
tag
- Any valid HTML tag. If the String passed is not a valid HTML tag, then this method will return null.
openOrClosed
- If TC.OpeningTags is passed, then an "open" version of the HTML tag will be returned, and if TC.ClosingTags is passed, then a closing version will be returned. Passing TC.Both is not allowed: as the method body below shows, it causes an IllegalArgumentException, and passing null causes a NullPointerException.
- Returns:
- An opening (or closing) TagNode, or null if the passed String tag does not represent any valid HTML-Tag. Null is also returned when the closing version of a singleton tag is requested.
- Code:
- Exact Method Body:
    // FAIL-FAST: Check Input's immediately. Throw Exception for invalid input.
    if (openOrClosed == null) throw new NullPointerException
        ("Parameter 'openOrClosed' is null, but this is not allowed.");

    if (openOrClosed == TC.Both) throw new IllegalArgumentException
        ("Parameter 'openOrClosed' was specified as TC.Both, but this is not allowed here.");

    // IMPORTANT NOTE: For Singleton-Tags: There is no closing-version, so one SHOULD NOT be
    // requested. (There is no '</IMG>' tag!) However, this method DOES NOT throw
    // IllegalArgumentException in this case, but rather it just exits gracefully, and returns
    // null.
    String tagLC = tag.toLowerCase();
    if (singletonTags.contains(tagLC) && (openOrClosed == TC.ClosingTags)) return null;

    // First, Check if the 'tag' is all lower-case. If it is, the string would be identical to
    // the 'tagLC' variable we have just created.
    if (tagLC.equals(tag))
    {
        // Debugging Information, Debug-println. Un-comment to follow. DO NOTE DELETE THIS LINE.
        // System.out.println("Used a pre-instantiated TagNode, Lower-Case TreeMap");

        return (openOrClosed == TC.OpeningTags)
            ? tagNodesOpening.get(tag)
            : tagNodesClosing.get(tag);
    }

    // Now, here, the variable could not have been all-lower-case. NEXT, Check if it is
    // all-upper-case
    //
    // NOTE: There are pre-defined tables that include pre-instantiated TagNode's - both for
    //       lower-case tags and for upper-case tags.
    String tagUC = tag.toUpperCase();

    if (tagUC.equals(tag))
    {
        // Debugging Information, Debug-println. Un-comment to follow. DO NOTE DELETE THIS LINE.
        // System.out.println("Used a pre-instantiated TagNode, Upper-Case TreeMap");

        return (openOrClosed == TC.OpeningTags)
            ? tagNodesOpeningUC.get(tag)
            : tagNodesClosingUC.get(tag);
    }

    // SPECIAL CASE: (Very Rare / Unlikely, but possible) The user has created an HTML Element
    // that has some lower-case alphabet letters, and some upper-case as well. This does not
    // guarantee that it is a valid HTML Token, though, so check
    //
    // FOR EXAMPLE: If somebody typed <SeCtIoN>, we need to preserve the case, no matter how
    //              bizarre. In such a case, a pre-packaged TagNode cannot be used, and instead
    //              a new TagNode must be instantiated.
    if (openOrClosed == TC.OpeningTags)
        return (tagNodesOpening.get(tagLC) == null) ? null : new TagNode("<" + tag + ">");
    else
        return (tagNodesClosing.get(tagLC) == null) ? null : new TagNode("</" + tag + ">");
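The lookup order in the method body above (lower-case table, then upper-case table, then a freshly built node for mixed case) can be reproduced in a small, self-contained sketch. Everything named here is hypothetical: TagCacheSketch, its nested Node class (a stand-in for TagNode that only stores the tag text), and its own TC enum. Singleton handling is left out to keep the sketch short.

```java
import java.util.TreeMap;

// Hypothetical model of a "pre-instantiated node" cache keyed by tag name.
final class TagCacheSketch {
    enum TC { OpeningTags, ClosingTags, Both }

    // Immutable stand-in for TagNode; it stores only the tag's text.
    static final class Node {
        final String str;
        Node(String str) { this.str = str; }
    }

    private final TreeMap<String, Node> opening = new TreeMap<>();
    private final TreeMap<String, Node> closing = new TreeMap<>();
    private final TreeMap<String, Node> openingUC = new TreeMap<>();
    private final TreeMap<String, Node> closingUC = new TreeMap<>();

    // Builds the four shared instances for one tag name.
    void register(String tag) {
        String lc = tag.toLowerCase(), uc = tag.toUpperCase();
        opening.put(lc, new Node("<" + lc + ">"));
        closing.put(lc, new Node("</" + lc + ">"));
        openingUC.put(uc, new Node("<" + uc + ">"));
        closingUC.put(uc, new Node("</" + uc + ">"));
    }

    Node hasTag(String tag, TC openOrClosed) {
        if (openOrClosed == null) throw new NullPointerException("openOrClosed is null");
        if (openOrClosed == TC.Both)
            throw new IllegalArgumentException("TC.Both is not allowed here");
        boolean open = (openOrClosed == TC.OpeningTags);

        String lc = tag.toLowerCase();
        if (lc.equals(tag)) return (open ? opening : closing).get(tag);     // shared instance

        String uc = tag.toUpperCase();
        if (uc.equals(tag)) return (open ? openingUC : closingUC).get(tag); // shared instance

        // Mixed case: preserve the caller's spelling in a new node, if the tag is known.
        if (!opening.containsKey(lc)) return null;
        return new Node((open ? "<" : "</") + tag + ">");
    }

    public static void main(String[] args) {
        TagCacheSketch cache = new TagCacheSketch();
        cache.register("div");
        Node first = cache.hasTag("div", TC.OpeningTags);
        Node second = cache.hasTag("div", TC.OpeningTags);
        System.out.println(first == second);                          // true: same object
        System.out.println(cache.hasTag("DiV", TC.ClosingTags).str);  // </DiV>
    }
}
```

Sharing instances is safe here only because the node type is immutable, which is the same property the class description attributes to TagNode.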
-
getTag_MEM_HEAP_CHECKOUT_COPY
public static java.lang.String getTag_MEM_HEAP_CHECKOUT_COPY (java.lang.String tag)
This is an optimized, internal method that is used to prevent lots of duplicate HTML token-String's from being created by the parser.
Internally, there ought to be just one instance of String's like "img", "br", "div", etc. This method is used by the parser to reuse an already instantiated token String.
This method probably has relatively little use outside of the internal HTML parser code.
- Parameters:
tag
- This is an HTML token. A String identical to this 'token' String, but possibly with a different memory reference on the heap, shall be returned.
- Returns:
- The returned String shall satisfy the following:
- assert(tag.equals(returned_string)); // Identical String is returned
- assert(! (tag == returned_string)); // Probably a different memory allocation on the heap. PROBABLY!
Note that Java does not make any contracts regarding String references! (This can only help...)
IMPORTANT: If the tag passed is not a valid HTML-Tag, then this method shall return null.
- Code:
- Exact Method Body:
    // Obviously, for the 200 or so "pre-instantiated" (having-no-attributes) instances of
    // class TagNode that are kept, internally, in the data-structures of this class,
    // 'HTMLTags' We cannot retrieve a "pre-allocated" copy of the tag-as-a-string from
    // the heap, because we are building the data-file for the first time!
    if (BUILDING_DATA_FILE___SKIP_OPTIMIZATION_TEMPORARILY) return tag.toLowerCase();

    TagNode tn = tagNodesOpening.get(tag.toLowerCase());

    // If the tag isn't found, make sure not to throw NullPointerException!
    if (tn == null) return null;

    // This "version" (of the exact same html-element-name is already on the heap)
    // Obviously, because, variable 'tn' has already been instantiated and is in the TreeMap
    // If this EXACT SAME REFERENCE IS USED FOR ALL "TagNode.tok" instances, quite a bit of
    // wasted-space in the heap's lookup table will be eliminated as the same "token"
    // (which is the name of the HTML Element: "div," "img," "span," etc...) is reused over
    // and over and over again. Helps a little bit! Not that complicated!
    return tn.tok;
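The idea of handing out one shared String per tag name can be shown with a small map from each name to its canonical instance. This is only a sketch: CanonicalTokenSketch, register, and checkoutCopy are hypothetical names, and the map stands in for the lookup through tagNodesOpening and the TagNode.tok field used in the method body above.

```java
import java.util.TreeMap;

// Hypothetical canonical-String table: one shared instance per known tag name.
final class CanonicalTokenSketch {
    private final TreeMap<String, String> canonical = new TreeMap<>();

    // Stores the lower-case name as both key and value; the value is the shared copy.
    void register(String tag) {
        String lc = tag.toLowerCase();
        canonical.putIfAbsent(lc, lc);
    }

    // Returns the shared instance for a known tag, or null for an unknown one.
    String checkoutCopy(String tag) {
        return canonical.get(tag.toLowerCase());
    }

    public static void main(String[] args) {
        CanonicalTokenSketch tokens = new CanonicalTokenSketch();
        tokens.register("div");
        String a = tokens.checkoutCopy("DIV");
        String b = tokens.checkoutCopy("div");
        System.out.println(a.equals("div")); // true: same characters
        System.out.println(a == b);          // true: same object handed out twice
    }
}
```

Callers that keep the returned reference instead of their own copy let the duplicate String's become garbage, which is the saving the method description refers to.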
-
isTag
public static boolean isTag(java.lang.String tag)
Checks if a String is registered as a proper HTML tag according to the internally maintained lists.
View Tags List:
The HTML Elements listed in the link below indicate exactly what may be passed to this method's 'tag' parameter and result in a return value of TRUE. This is the complete list of HTML Element Names that are maintained, by default, in this class' internal Lookup Table of HTML Tags.
HTML Elements
Case Insensitive:
The test performed by this method shall ignore case.
Modifying this List:
The list of HTML Elements may, in fact, be altered. To add a new Element Name to the internal lookup table of valid HTML Elements, use addTag(String). To remove an HTML Element from the internal list, use removeTag(String).
- Returns:
TRUE if this is a valid HTML tag. NOTE: All HTML-5 Element-Tag Strings will return TRUE, as they are contained in the default internal list.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the standard HTML Tags. Just uses Java's TreeSet.contains(Object) method.
    return tags.contains(tag.toLowerCase());
-
isHTML5
public static boolean isHTML5(java.lang.String tok)
Checks if a String is a proper HTML-5 (only) tag. This list is rather short, and only contains HTML Elements which were added specifically for the release of HTML 5. Any HTML Element which is valid both in HTML Release 4 (or earlier) and in HTML 5 will not result in TRUE being returned by this method.
View Tags List:
The HTML Elements listed in the link below indicate exactly what may be passed to this method's 'tok' parameter and result in a return value of TRUE. This is the complete list of HTML-5 Element Names that are maintained, by default, in this class' internal Lookup Table of HTML-5 Tags.
Elements Added for HTML-5
Case Insensitive:
The test performed by this method shall ignore case.
- Parameters:
tok
- Any HTML-Tag as a String.
- Returns:
TRUE if this is a tag that was added for HTML-5, and not included in HTML 4, or earlier
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the HTML-5 Tags. Just uses Java's TreeSet.contains(Object) method.
    return html5Tags.contains(tok.toLowerCase());
-
deprecated
public static boolean deprecated(java.lang.String tok)
Checks if a String is listed as an HTML Element that was deprecated for HTML 5.
View Tags List:
The HTML Elements listed in the link below indicate exactly what may be passed to this method's 'tok' parameter and result in a return value of TRUE. This is the complete list of Deprecated HTML Element Names that are maintained, by default, in this class' internal Lookup Table of Deprecated HTML Tags.
Elements Deprecated for HTML-5
Case Insensitive:
The test performed by this method shall ignore case.
- Parameters:
tok
- Any HTML-Tag as a String.
- Returns:
TRUE if this tag was deprecated for HTML-5
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the deprecated-for-HTML-5 Tags. Just uses Java's TreeSet.contains(Object)
    // method.
    return deprecated.contains(tok.toLowerCase());
-
isSingleton
public static boolean isSingleton(java.lang.String tok)
This method checks whether a specific HTML element is an "opening and closing" element, such as P, DIV, SPAN, along with myriad others, OR whether it is one of the (very few) "singleton HTML elements", such as the HTML <IMG SRC="..."> element, which may not have a closing tag. Such tags are also called "Self-Closing" tags.
View Tags List:
The HTML Elements listed in the link below indicate exactly what may be passed to this method's 'tok' parameter and result in a return value of TRUE. This is the complete list of Singleton Element Names that are maintained, by default, in this class' internal Lookup Table of Singleton Tags.
Singleton Elements
Case Insensitive:
The test performed by this method shall ignore case.
Modifying this List:
The list of Singleton HTML Elements may, in fact, be altered. To add a new Singleton HTML Element Name to the internal lookup table of valid Singleton Elements, use addSingleton(String). To remove an HTML Element from the internal list, use removeSingleton(String).
- Parameters:
tok
- This is the HTML element name to be tested.
- Returns:
TRUE if this is a 'singleton' HTML Element, meaning that only OpeningTag versions of the element exist, because singleton HTML elements don't need / may not have a closing tag. Singleton examples include: IMG, HR, INPUT, etc. FALSE is returned if the tag is not a singleton.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the 'singleton' HTML Tags. Just uses Java's TreeSet.contains(Object) method.
    return singletonTags.contains(tok.toLowerCase());
-
isBlock
public static boolean isBlock(java.lang.String tok)
This method checks whether a specific HTML element is among the 'Block' Tag elements list. An explanation of what a 'block' or 'inline' tag is, is beyond the scope of this document.
View Tags List:
The HTML Elements listed in the link below indicate exactly what may be passed to this method's 'tok' parameter and result in a return value of TRUE. This is the complete list of Block Element Names that are maintained, by default, in this class' internal Lookup Table of Block Tags.
HTML Block Elements
Case Insensitive:
The test performed by this method shall ignore case.
- Parameters:
tok
- This is the HTML element name to be tested.
- Returns:
TRUE if this is a 'block' HTML Element, FALSE otherwise.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the HTML 'Block' Tags. Just uses Java's TreeSet.contains(Object) method.
    return blockTags.contains(tok.toLowerCase());
-
isInline
public static boolean isInline(java.lang.String tok)
This method checks whether a specific HTML element is among the 'Inline' Tag elements list. An explanation of what a 'block' or 'inline' tag is, is beyond the scope of this document.
View Tags List:
The HTML Elements listed in the link below indicate exactly what may be passed to this method's 'tok' parameter and result in a return value of TRUE. This is the complete list of Inline Element Names that are maintained, by default, in this class' internal Lookup Table of Inline Tags.
HTML Inline Elements
Case Insensitive:
The test performed by this method shall ignore case.
- Parameters:
tok
- This is the HTML element name to be tested.
- Returns:
TRUE if this is an 'inline' HTML Element, FALSE otherwise.
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the HTML 'Inline' Tags. Just uses Java's TreeSet.contains(Object) method.
    return inlineTags.contains(tok.toLowerCase());
-
getDescription
public static java.lang.String getDescription(java.lang.String tag)
Returns a brief, English-language description of an HTML Tag. These descriptions are stored in a small data-file inside the JAR.
Loading from JAR-File:
This method will attempt to load a particular data-file from the JAR-library into memory. This file contains a one-sentence description, stored as a java.lang.String, for each of the HTML Elements known to this class. Under normal operation, these String's remain on-disk only.
- Parameters:
tag
- Any valid HTML tag.
- Returns:
- A short English-Language description of the Tag in HTML, or null if this tag is unknown.
- See Also:
loadDescriptions()
- Code:
- Exact Method Body:
    // Loads the descriptions map, ONLY IF they have not already been loaded into memory from
    // the JAR data-files
    loadDescriptions();

    return descriptions.get(tag.toLowerCase());
-
iterator
public static java.util.Iterator<java.lang.String> iterator()
Internally, tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
Remove Unsupported:
In order to prevent accidental removal of HTML-Tags via the Iterator's 'remove()' method, the returned Iterator instance has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of this class' vital data-structures.
Data File Contents:
The contents of this Iterator may be viewed here:
HTML Elements
- Returns:
- an Iterator<String> that iterates over all the Tag-String's in alphabetical order.
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
    // Internally, this class has a private & static TreeSet<String> that stores a list
    // of all the standard HTML Tags. Just uses Java's TreeSet.iterator() method.
    //
    // NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
    //       TreeSet
    return new RemoveUnsupportedIterator<String>(tags.iterator());
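The "wrapped" iterator described above can be approximated in a few lines. This is a sketch of the general technique, not the source of RemoveUnsupportedIterator, which is not shown in this page: ReadOnlyIteratorSketch is a hypothetical class, and UnsupportedOperationException is assumed as the exception type.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical wrapper: forwards hasNext() and next(), refuses remove().
final class ReadOnlyIteratorSketch<E> implements Iterator<E> {
    private final Iterator<E> inner;

    ReadOnlyIteratorSketch(Iterator<E> inner) { this.inner = inner; }

    @Override public boolean hasNext() { return inner.hasNext(); }

    @Override public E next() { return inner.next(); }

    // ASSUMPTION: the real wrapper's exception type may differ.
    @Override public void remove() {
        throw new UnsupportedOperationException("remove() is not supported");
    }

    public static void main(String[] args) {
        TreeSet<String> tags = new TreeSet<>();
        tags.add("div");
        tags.add("a");

        Iterator<String> it = new ReadOnlyIteratorSketch<>(tags.iterator());
        System.out.println(it.next()); // a (TreeSet iterates in sorted order)
        try {
            it.remove();
        } catch (UnsupportedOperationException e) {
            System.out.println("remove() rejected; set still has " + tags.size() + " tags");
        }
    }
}
```

Wrapping the iterator protects the underlying set from iterator-based removal while still letting callers traverse it without a defensive copy.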
-
iteratorDescriptions
public static java.util.Iterator<java.util.Map.Entry<java.lang.String,java.lang.String>> iteratorDescriptions ()
Will build an Iterator that returns HTML tags together with their text-String descriptions.
Data File Contents:
The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. The load is only performed on request. The contents of this data-file (and the list of Map.Entry's returned by the Iterator) may be viewed by clicking the link below:
HTML Elements with Descriptions
Lazy Loading:
In this class, if the methods invoked do not require the Tag-Description String-Data, then this extensive text-data will not be loaded into memory from the JAR data-files.
- Returns:
- an Iterator that iterates the HTML-Tag / HTML-Tag-Description key-value pairs as instances of "Map.Entry<String, String>"
- See Also:
loadDescriptions(), RemoveUnsupportedIterator
- Code:
- Exact Method Body:
    loadDescriptions(); // Will only load if descriptions have not already been loaded.

    return new RemoveUnsupportedIterator<Map.Entry<String, String>>
        (descriptions.entrySet().iterator());
-
iteratorAddedForHTML5
public static java.util.Iterator<java.lang.String> iteratorAddedForHTML5()
Internally, HTML-5 tags are stored in a Javajava.util.TreeSet<String>
. This method invokes theiterator()
method on thatTreeSet
.
Remove Unsupported:
In order to prevent accidental removal of HTML-5-Tags via theIterator's 'remove()'
method, the returned-Iterator
instance has been overloaded - "wrapped" - in a simple class that throws an exception ifremove()
is invoked. The purpose is to prevent a user from accidentally destorying a member of the this class' vital data-structures.
Data File Contents:
The contents of thisIterator
are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. Load of this data is performed as soon as this class is loaded by the Class-Loader. The Data-File (Iterator
) contents may be viewed here, by clicking the link below:
Elements Added for HTML-5
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that were added in HTML-5.
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML-5 Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(html5Tags.iterator());
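As a rough illustration of the "wrapping" idea described under Remove Unsupported, the sketch below shows one way an Iterator wrapper can refuse remove(). The class names ReadOnlyIterator and ReadOnlyIteratorDemo are hypothetical, the three tags are sample data only, and the choice of UnsupportedOperationException is an assumption; the actual RemoveUnsupportedIterator may be implemented differently.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical wrapper: forwards hasNext() and next(), but refuses remove().
class ReadOnlyIterator<E> implements Iterator<E>
{
    private final Iterator<E> inner;

    ReadOnlyIterator(Iterator<E> inner) { this.inner = inner; }

    @Override
    public boolean hasNext() { return inner.hasNext(); }

    @Override
    public E next() { return inner.next(); }

    // Never reaches the wrapped Iterator, so the underlying set cannot shrink.
    @Override
    public void remove()
    { throw new UnsupportedOperationException("remove() is not supported"); }
}

class ReadOnlyIteratorDemo
{
    // Sample data only; a real tag list would come from a data-file.
    static final TreeSet<String> SAMPLE_HTML5_TAGS = new TreeSet<>();

    static
    {
        SAMPLE_HTML5_TAGS.add("section");
        SAMPLE_HTML5_TAGS.add("article");
        SAMPLE_HTML5_TAGS.add("nav");
    }

    static Iterator<String> iterator()
    { return new ReadOnlyIterator<>(SAMPLE_HTML5_TAGS.iterator()); }
}
```

A caller can still walk the whole list, but any attempt to call remove() fails before it reaches the TreeSet, so the internal data stays intact.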
-
iteratorDeprecatedForHTML5
public static java.util.Iterator<java.lang.String> iteratorDeprecatedForHTML5()
Internally, deprecated tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
Remove Unsupported:
In order to prevent accidental removal of Deprecated-Tags via the Iterator's 'remove()' method, the returned Iterator instance has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of this class' vital data-structures.
Data File Contents:
The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class-Loader. The Data-File (Iterator) contents may be viewed by clicking the link below:
Elements Deprecated for HTML-5
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that were deprecated for HTML-5.
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the deprecated Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(deprecated.iterator());
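The note above that the data is loaded "as soon as this class is loaded" suggests a static initializer. The sketch below shows that pattern under stated assumptions: the class name EagerTagList is hypothetical, the one-tag-per-line format is a guess, and an in-memory String with four sample tags stands in for the real data-file.

```java
import java.util.TreeSet;

// Hypothetical class that fills its tag list during class initialization.
class EagerTagList
{
    // Stand-in for the contents of a data-file, one tag per line.
    // These four obsolete elements are sample data only.
    private static final String SAMPLE_DATA = "font\ncenter\nbig\nframe\n";

    private static final TreeSet<String> deprecated = new TreeSet<>();

    // Runs exactly once, when the JVM initializes this class.
    static
    {
        for (String line : SAMPLE_DATA.split("\n"))
        {
            String tag = line.trim();
            if (! tag.isEmpty()) deprecated.add(tag);
        }
    }

    static boolean isDeprecated(String tag) { return deprecated.contains(tag); }

    static int size() { return deprecated.size(); }
}
```

Because the set is already populated when the first static method runs, lookups need no "has it been loaded yet" check, unlike the lazily loaded descriptions.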
-
iteratorSingletonTags
public static java.util.Iterator<java.lang.String> iteratorSingletonTags()
Internally, singleton / self-closing tags are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
Remove Unsupported:
In order to prevent accidental removal of Singleton-Tags via the Iterator's 'remove()' method, the returned Iterator instance has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of this class' vital data-structures.
Data File Contents:
The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class-Loader. The Data-File (Iterator) contents may be viewed by clicking the link below:
Singleton Elements
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as singleton elements, and may not have closing-tag versions.
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML 'Singleton' Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(singletonTags.iterator());
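Since the tags are held in a TreeSet<String>, an Iterator obtained from it yields the tags in ascending (alphabetical) order, whatever order they were added in. The sketch below shows this with a few sample singleton tags. The class name SortedTagIteration and its methods are hypothetical, and the four tags are illustrative rather than the library's actual list.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.TreeSet;

// Hypothetical helper that drains an Iterator<String> into a List.
class SortedTagIteration
{
    static List<String> collect(Iterator<String> iter)
    {
        List<String> ret = new ArrayList<>();
        while (iter.hasNext()) ret.add(iter.next());
        return ret;
    }

    static List<String> sampleSingletons()
    {
        // Sample data only, deliberately inserted out of order.
        TreeSet<String> singletonTags = new TreeSet<>();
        singletonTags.add("img");
        singletonTags.add("br");
        singletonTags.add("input");
        singletonTags.add("hr");

        // TreeSet.iterator() walks the elements in ascending order.
        return collect(singletonTags.iterator());
    }
}
```

Here sampleSingletons() returns the tags as br, hr, img, input. The same collect helper would work with any of the Iterator<String> instances described in this section.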
-
iteratorBlockTags
public static java.util.Iterator<java.lang.String> iteratorBlockTags()
Internally, "HTML Block Tags" are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
Remove Unsupported:
In order to prevent accidental removal of Block-Tags via the Iterator's 'remove()' method, the returned Iterator instance has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of this class' vital data-structures.
Data File Contents:
The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class-Loader. The Data-File (Iterator) contents may be viewed by clicking the link below:
HTML Block Elements
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as block elements.
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML 'Block' Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(blockTags.iterator());
-
iteratorInlineTags
public static java.util.Iterator<java.lang.String> iteratorInlineTags()
Internally, "HTML Inline Tags" are stored in a Java java.util.TreeSet<String>. This method invokes the iterator() method on that TreeSet.
Remove Unsupported:
In order to prevent accidental removal of Inline-Tags via the Iterator's 'remove()' method, the returned Iterator instance has been "wrapped" in a simple class that throws an exception if remove() is invoked. The purpose is to prevent a user from accidentally destroying a member of this class' vital data-structures.
Data File Contents:
The contents of this Iterator are loaded from a (small) internal data-file stored in the JAR Distribution for this Java HTML Package. This data is loaded as soon as this class is loaded by the Class-Loader. The Data-File (Iterator) contents may be viewed by clicking the link below:
HTML Inline Elements
- Returns:
- an Iterator<String> that cycles through the list of HTML Tag-String's that qualify as inline elements.
- See Also:
RemoveUnsupportedIterator
- Code:
- Exact Method Body:
// Internally, this class has a private & static TreeSet<String> that stores a list
// of all the HTML 'Inline' Tags. Just uses Java's TreeSet.iterator() method.
//
// NOTE: The 'RemoveUnsupportedIterator' wrapper class prohibits modifications to this
// TreeSet

return new RemoveUnsupportedIterator<String>(inlineTags.iterator());
-
-