Package Torello.HTML
Class Features
java.lang.Object
    Torello.HTML.Features
public class Features extends java.lang.Object
Tools to retrieve and insert tags into the <HEAD> of a web-page.
Simple Tools for Retrieving and Updating Page-Header Pieces
Note that if many updates are being performed, it is more efficient to extract a SubSection, update that SubSection, and re-insert it into the page afterwards.
Example:
// Scrape any Page
Vector<HTMLNode> page = HTMLPage.getPageTokens(new URL("http://some.url.com/page.html"), false);

// IMPORTANT: By extracting a "SubSection", the next several lines which insert HTML into the
// header section **DO NOT** require shifting hundreds of HTML nodes forward to
// perform these inserts. Only the nodes in the header are shifted forward during
// these insert operations.
SubSection header = TagNodePeekInclusive.first(page, "head");

// Add Some HTML / CSS Header Elements
Features.insertFavicon(header.html, "../../SiteLogo.png");
Features.insertCSSLink(header.html, "../../MyCSSPage.css");
Features.Meta.insertUTF8MetaTag(header.html);
Features.Meta.insertKeyWords(header.html, "Java", "HTML", "Parse");

// Re-Insert the header back into the main page. Shifting all of the nodes on the main-page
// is now done only once!
page = ReplaceNodes.r(page, header);
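The extract / update / re-insert pattern in the example above can be illustrated without this library. The following is only a minimal sketch, assuming the page is modelled as a plain `Vector<String>` rather than a `Vector<HTMLNode>`; the class `SubSectionSketch` and its helpers `extract` and `reinsert` are hypothetical names, not part of the `Torello.HTML` API.

```java
import java.util.List;
import java.util.Vector;

class SubSectionSketch
{
    // Copies the elements in the range [start, end) into a new, independent Vector.
    static Vector<String> extract(Vector<String> page, int start, int end)
    { return new Vector<>(page.subList(start, end)); }

    // Builds a new page in which the range [start, end) is replaced by 'section'.
    // The elements that follow the range are copied exactly once.
    static Vector<String> reinsert
        (Vector<String> page, int start, int end, List<String> section)
    {
        Vector<String> ret = new Vector<>(page.size() - (end - start) + section.size());

        ret.addAll(page.subList(0, start));
        ret.addAll(section);
        ret.addAll(page.subList(end, page.size()));

        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(List.of(
            "<html>", "<head>", "</head>", "<body>", "Hello", "</body>", "</html>"
        ));

        // Extract the (short) header, so inserts only shift header elements.
        Vector<String> header = extract(page, 1, 3);

        header.add(1, "<meta charset='utf-8' />");
        header.add(1, "<link rel='icon' href='logo.png' />");

        // One single pass over the full page puts the updated header back.
        page = reinsert(page, 1, 3, header);

        for (String node : page) System.out.println(node);
    }
}
```

Each `add` on the small header vector shifts only the header's own elements; the body elements are copied once, in `reinsert`.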
This class handles some of the extremely common features found in HTML web-pages. The collection of capabilities listed here is sometimes referred to as "SEO" or "Search Engine Optimization." Generally, these features do not actually work as well as Search Engine Companies would like you to believe. Sure, SEO tags can help companies that cater to very specialized "niche markets." If you have a website that sells cup-cakes in Dallas, and you specialize in cup-cakes, your SEO settings will probably work all-right - probably!
If you have decided to write a Java-Based HTML Search Engine, and would like your Java Libraries to be ranked at the top of search-engine requests any-time a user types the words "Java and HTML" into a browser, there is not a lot SEO will be able to do for you - not even using the features in this "Features" class!
Hi-Lited Source-Code: Torello/HTML/Features.java
Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 10 Method(s), 10 declared static
- 7 Field(s), 7 declared static, 7 declared final
-
-
Nested Class Summary
Nested Classes
Modifier and Type          Class
static class               Features.Meta
-
Field Summary
Fields
Modifier and Type          Field
static String              canonicalTag
static String              cssExternalSheet
static String              cssExternalSheetWithMediaAttribute
static String              favicon
static String              javaScriptExternalPage
protected static TextNode  NEWLINE
static String              NO_HEADER_MESSAGE
-
Method Summary
Retrieve HTML Header Elements
Modifier and Type          Method
static Vector<TagNode>     getAllCSSLinks(Vector<? extends HTMLNode> html)
static String[]            getAllExternalJSLinks(Vector<? extends HTMLNode> html)
static String              hasCanonicalURL(Vector<? extends HTMLNode> html)
static String              hasFavicon(Vector<? extends HTMLNode> html)
Insert HTML Elements into <HEAD>...</HEAD>
Modifier and Type          Method
static void                insertCanonicalURL(Vector<HTMLNode> html, String canonicalURLAsStr)
static void                insertCSSLink(Vector<HTMLNode> html, String externalCSSFileURLAsString)
static void                insertCSSLink(Vector<HTMLNode> html, String externalCSSFileURLAsString, String mediaInnerTagValue)
static void                insertExternalJavaScriptLink(Vector<HTMLNode> html, String externalJSFileURLAsString)
static void                insertFavicon(Vector<HTMLNode> html, String imageURLAsString)
Internal Methods
Modifier and Type          Method
protected static void      checkForSingleQuote(String s)
-
-
-
Field Detail
-
NO_HEADER_MESSAGE
public static final java.lang.String NO_HEADER_MESSAGE
- See Also:
- Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final String NO_HEADER_MESSAGE =
    "You are attempting to insert an HTML INSERT-STR, but such an element belongs in the " +
    "page's header. Unfortunately, the page or sub-page you have passed does not have a " +
    "<HEAD>...</HEAD> sub-section. Therefore, there is no place to insert the elements.";
-
favicon
public static final java.lang.String favicon
This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a "logo-image" at the top-left corner of the web-browser's tab for the page when it loads. This image is commonly called a 'favicon', and that is the name used here.
- See Also:
insertFavicon(Vector, String), hasFavicon(Vector), Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final String favicon = "<link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />";
-
cssExternalSheet
public static final java.lang.String cssExternalSheet
This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Cascading Style Sheet (a ".css" file) to your page. The web-browser that ultimately loads the HTML that you are exporting will apply the style rules to all the HTML elements in your page that match the CSS selectors. Rather than going into a diatribe about how CSS works, the String that is ultimately instantiated as a TagNode is simply provided here.
- See Also:
insertCSSLink(Vector, String), getAllCSSLinks(Vector), Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final String cssExternalSheet = "<link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' />";
-
cssExternalSheetWithMediaAttribute
public static final java.lang.String cssExternalSheetWithMediaAttribute
This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Cascading Style Sheet (a ".css" file) to your page. This string allows for the "media" inner-tag / attribute.
- See Also:
insertCSSLink(Vector, String), insertCSSLink(Vector, String, String), getAllCSSLinks(Vector), Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final String cssExternalSheetWithMediaAttribute =
    "<link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' " +
    "media='INSERT-MEDIA-ATTRIBUTE-VALUE-HERE' />";
-
javaScriptExternalPage
public static final java.lang.String javaScriptExternalPage
This String may be inserted in the HTML <HEAD> ... </HEAD> section to add a Java-Script '.js' file. The web-browser will download this Java-Script page from the URL that you ultimately provide, load all variable definitions, and dispatch to any methods that are invoked by the event-handlers when user or operating-system events are fired.
IMPORTANT NOTE: Inserting an external java-script page has one important difference vis-a-vis inserting an external CSS page. Inserting a link to a '.js' page requires both the opening and the closing HTML <SCRIPT ..></SCRIPT> tag-elements. This is expected and required even when (especially when) there is no actual java-script code being placed on the '.html' page itself. Effectively, regardless of whether you are putting actual java-script code on your HTML page, or just inserting a link knowing the browser will download the external '.js' file for you, you must still create both the opening and the closing HTML <SCRIPT SRC='...'></SCRIPT> elements and insert them into your vectorized-html web-page.
HTML Elements:
<!-- This is a short note about including the HTML SCRIPT element in your web-pages. -->
<HTML>
<HEAD>

<!-- Version #1 Inserting a java-script 'variables & functions' external-page -->
<SCRIPT TYPE='text/javascript' SRC='/script/javaScriptFiles/functions.js'>
</SCRIPT>

<!-- Right here (line above) we always need the closing Script-tag, even when there is no
     actual java-script present, and the methods/variables are going to be downloaded from
     the java-script file identified by the SRC="..." attribute! -->

<SCRIPT TYPE='text/javascript'>
var someVar1;
var someVar2;
function someFunction() { return; }
</SCRIPT>

<!-- Either way, the closing-script tag is expected. -->

</HEAD>
</HTML>
- See Also:
insertExternalJavaScriptLink(Vector, String), getAllExternalJSLinks(Vector), Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final String javaScriptExternalPage = "<script type='text/javascript' src='INSERT-URL-STRING-HERE'>";
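The field above holds only the opening tag, so the closing tag has to be supplied separately. A minimal sketch of that pairing, using plain Strings rather than this library's TagNode objects, might look like the following; the class `ScriptLinkSketch` and the method `externalScript` are hypothetical names used only for illustration.

```java
class ScriptLinkSketch
{
    // Same text as the 'javaScriptExternalPage' field: the OPENING tag only.
    static final String OPEN_TEMPLATE =
        "<script type='text/javascript' src='INSERT-URL-STRING-HERE'>";

    // Returns the opening tag (with the URL substituted) followed directly by the
    // closing tag.  The script body is intentionally left empty.
    static String externalScript(String url)
    { return OPEN_TEMPLATE.replace("INSERT-URL-STRING-HERE", url) + "</script>"; }

    public static void main(String[] args)
    { System.out.println(externalScript("/script/javaScriptFiles/functions.js")); }
}
```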
-
canonicalTag
public static final java.lang.String canonicalTag
If you have pages on your site that are almost identical, then you may need to inform search engines which one to prioritize. Or you might have syndicated content on your site which was republished elsewhere. You can do both of these things without incurring a duplicate content penalty – as long as you use a canonical tag.
Instead of confusing Google and missing your ranking on the SERP's, you are guiding the crawlers as to which URL counts as the “main” one. This places the emphasis on the right URL and prevents the others from cannibalizing your SEO.
Use canonical tags to avoid having problems with duplicate content that may affect your rankings.
NOTE: Content of this java-documentation description was copied from a page on web-domain 'http://searchenginewatch.com'. It was lifted on May 24th, 2019. See link below, if still valid:
https://searchenginewatch.com/2018/04/04/a-quick-and-easy-guide-to-meta-tags-in-seo/
- See Also:
insertCanonicalURL(Vector, String), hasCanonicalURL(Vector), Constant Field Values
- Code:
- Exact Field Declaration Expression:
public static final String canonicalTag = "<link rel='canonical' href='INSERT-URL-STRING-HERE' />";
-
-
Method Detail
-
checkForSingleQuote
protected static void checkForSingleQuote(java.lang.String s)
This method checks whether the parameter String contains a single-quote punctuation-mark anywhere in the String. If so, an exception is thrown. This is generally an internal-helper method.
- Parameters:
s - This is any java-string, but generally it is one that will be inserted into an HTML attribute-value.
- Throws:
QuotesException - If the passed parameter string contains any instance of a single-quote.
- Code:
- Exact Method Body:
int pos;

if ((pos = s.indexOf("'")) != -1) throw new QuotesException(
    "The passed string-parameter may not contain a single-quote punctuation mark. " +
    "Yours was: [" + s + "], and has a single-quotation mark at string-position " +
    "[" + pos + "]"
);
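The reason for this check is that every template String in this class wraps its attribute-values in single-quotes, so a single-quote inside a substituted value would end the attribute early. The sketch below illustrates the idea with plain Strings; it is not the library's code. The class `SingleQuoteSketch` and its methods are hypothetical, and `IllegalArgumentException` is used as an assumed stand-in for `QuotesException`.

```java
class SingleQuoteSketch
{
    // Same text as the 'canonicalTag' field documented above.
    static final String TEMPLATE = "<link rel='canonical' href='INSERT-URL-STRING-HERE' />";

    // Stand-in for checkForSingleQuote: rejects any value containing a single-quote.
    static void rejectSingleQuote(String s)
    {
        int pos = s.indexOf('\'');

        if (pos != -1) throw new IllegalArgumentException
            ("Single-quote found in [" + s + "] at string-position [" + pos + "]");
    }

    // Validates the URL first, and only then substitutes it into the template.
    static String build(String url)
    {
        rejectSingleQuote(url);
        return TEMPLATE.replace("INSERT-URL-STRING-HERE", url);
    }

    public static void main(String[] args)
    {
        System.out.println(build("https://example.com/page.html"));

        try
            { build("https://example.com/it's-here.html"); }
        catch (IllegalArgumentException e)
            { System.out.println("Rejected: " + e.getMessage()); }
    }
}
```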
-
insertFavicon
public static void insertFavicon(java.util.Vector<HTMLNode> html, java.lang.String imageURLAsString)
This inserts a favicon HTML link element into the right location so that a particular web-page will render a "browser icon image" in the left corner of the page's browser-tab when the page loads into a browser.
- Parameters:
html - The vectorized-html web-page. It is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will throw a 'NodeNotFoundException', because this element must be inserted in the page's HTML 'header' section.
imageURLAsString - This is the String that will be copied into the public static final String 'favicon' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
- Throws:
NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='icon' href='image_url'> would have to be inserted.
QuotesException - If the image URL uses a single-quote mark, anywhere in the URL-string.
- See Also:
favicon, checkForSingleQuote(String)
- Code:
- Exact Method Body:
// Insert the Favicon <LINK ...> element into the <HEAD> section of the input html page.
// <link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />

checkForSingleQuote(imageURLAsString);

// The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
DotPair header = TagNodeFindInclusive.first(html, "head");

if (header == null) throw new NodeNotFoundException
    (NO_HEADER_MESSAGE.replace("INSERT-STR", "favicon <LINK> element"));

String ext = IF.getGuess(imageURLAsString).extension;

if (ext == null) throw new IllegalArgumentException(
    "The Image-Type of the 'imageURLAsString' parameter could not be determined. " +
    "The method IF.getGuess(faviconURL) returned null. Please provide a favicon with " +
    "standard image file-type. This is required because the image-type is required " +
    "to be placed inside the HTML <LINK TYPE=... HREF=...> Element 'TYPE' Attribute."
);

// Build a new Favicon TagNode.
TagNode faviconTN = new TagNode(
    favicon
        .replace("INSERT-URL-STRING-HERE", imageURLAsString)
        .replace("INSERT-IMAGE-TYPE-HERE", ext)
);

// Insert the Favicon into the page. Put it at the top of the header, just after <HEAD>
Util.insertNodes(html, header.start + 1, NEWLINE, faviconTN, NEWLINE);
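The method body above fills the 'TYPE' attribute from the image URL's file extension. A stand-alone sketch of that step, independent of this library, could look like the following. The class `FaviconTypeSketch` and the helper `guessExtension` are hypothetical; the extension guess here is a simple scan for the last '.', which is only an assumption standing in for whatever `IF.getGuess` actually does.

```java
class FaviconTypeSketch
{
    // Same text as the 'favicon' field documented above.
    static final String TEMPLATE =
        "<link rel='icon' type='image/INSERT-IMAGE-TYPE-HERE' href='INSERT-URL-STRING-HERE' />";

    // Returns the lower-cased text after the last '.', or null when the final
    // path-segment has no extension.  Query-strings are not handled in this sketch.
    static String guessExtension(String url)
    {
        int dot   = url.lastIndexOf('.');
        int slash = url.lastIndexOf('/');

        if ((dot == -1) || (dot < slash) || (dot == url.length() - 1)) return null;

        return url.substring(dot + 1).toLowerCase();
    }

    // Substitutes both the URL and the guessed image-type into the template.
    static String build(String url)
    {
        String ext = guessExtension(url);

        if (ext == null) throw new IllegalArgumentException
            ("Could not determine an image-type for: " + url);

        return TEMPLATE
            .replace("INSERT-URL-STRING-HERE", url)
            .replace("INSERT-IMAGE-TYPE-HERE", ext);
    }

    public static void main(String[] args)
    { System.out.println(build("../../SiteLogo.png")); }
}
```

Note that an extension is not always a valid image sub-type (a '.jpg' file, for example, has the registered type 'image/jpeg'), so a production version would likely need a small lookup table.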
-
hasFavicon
public static java.lang.String hasFavicon (java.util.Vector<? extends HTMLNode> html)
This method will search for an HTML <LINK REL="icon" ...> element, specifically expecting the link element to contain an inner-tag / attribute named 'REL' whose value is 'icon'. If it finds one, it will return the value of that element's 'HREF=...' attribute.
- Parameters:
html - Any html page, but preferably one that contains a <LINK REL="icon" ...> element.
- Returns:
This method will return the String value of the 'HREF=...' attribute found inside the '<LINK>' element, if this page or sub-page has such an element, with such an attribute. If no such element is found, or if none of the matching elements has an 'HREF' attribute, then 'null' will be returned.
NOTE: In the event that multiple copies of the HTML <LINK> element are found, and more than one has a 'REL' attribute whose value equals the String 'icon', this method will just return the first 'HREF' value it finds.
- See Also:
InnerTagGet, favicon, TagNode.AV(String)
- Code:
- Exact Method Body:
// InnerTagGet.all: Returns a vector of TagNode's that resemble: <LINK rel="icon" ...>
//
// EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
//            String-Comparison.
//            Trim the 'rel' Attribute-Value String of possible leading & trailing
//            White-Space before performing the comparison.

Vector<TagNode> list = InnerTagGet.all
    (html, "link", "rel", TextComparitor.EQ_CI_TRM, "icon");

// If there were no HTML "<LINK ...>" elements with REL='ICON' attributes, then
// there was no favicon.
if (list.size() == 0) return null;

// Just in case there were multiple favicon <LINK ...> tags, just return the first
// one found. Inside of a <LINK REL="icon" HREF="..."> the 'HREF' Attribute contains
// the Image-URL. Use TagNode.AV("HREF") to retrieve that image url.
String s;

for (TagNode tn : list)
    if ((s = tn.AV("href")) != null)
        return s;

// If for some reason, none of these <LINK REL='ICON' ...> elements had an "HREF"
// attribute, then just return null.
return null;
-
insertCSSLink
public static void insertCSSLink (java.util.Vector<HTMLNode> html, java.lang.String externalCSSFileURLAsString)
This inserts an HTML '<LINK ...>' element into the right location for linking an externally-defined Cascading Style Sheet '.css' page.
- Parameters:
html - Any vectorized-html web-page. It is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will throw a 'NodeNotFoundException', because this element must be inserted in the page's HTML 'header' section.
externalCSSFileURLAsString - This is the String that will be copied into the public static final String 'cssExternalSheet' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
- Throws:
NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='stylesheet' type='text/css' href='local-url/someFile.css' /> would have to be inserted.
QuotesException - If the CSS-sheet URL uses a single-quote mark, anywhere in the URL-string.
- See Also:
cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair, TagNode
- Code:
- Exact Method Body:
// Inserts an external CSS Link into the <HEAD> section of this html page vector
// <link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE' />

checkForSingleQuote(externalCSSFileURLAsString);

// The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
DotPair header = TagNodeFindInclusive.first(html, "head");

if (header == null) throw new NodeNotFoundException(
    NO_HEADER_MESSAGE.replace
        ("INSERT-STR", "externally-linked CSS page <LINK> element")
);

TagNode cssTN = new TagNode
    (cssExternalSheet.replace("INSERT-URL-STRING-HERE", externalCSSFileURLAsString));

// Insert the Style-Sheet link into the page. Put it at the top of the header,
// just after <HEAD>
Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
-
insertCSSLink
public static void insertCSSLink (java.util.Vector<HTMLNode> html, java.lang.String externalCSSFileURLAsString, java.lang.String mediaInnerTagValue)
This inserts an HTML '<LINK ...>' element, including a 'media' attribute, into the right location for linking an externally-defined Cascading Style Sheet '.css' page.
- Parameters:
html - Any vectorized-html web-page. It is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will throw a 'NodeNotFoundException', because this element must be inserted in the page's HTML 'header' section.
externalCSSFileURLAsString - This is the String that will be copied into the public static final String 'cssExternalSheetWithMediaAttribute' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
mediaInnerTagValue - Externally linked CSS pages, which are included using the HTML <LINK ...> element, may explicitly request that a 'media' attribute be inserted into the link element. The media attribute in a link tag specifies when the CSS rules are to be applied.
Here are the most common values for attribute 'media':
Attribute Value    Intended CSS Meaning
screen             indicates for use on a computer screen.
projection         for projected presentations.
handheld           for handheld devices (typically with small screens).
print              to style printed web pages.
all                (default value) This is what most people choose. You can leave off the media attribute completely if you want your styles to be applied for all media types.
- Throws:
NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link ... type='text/css' rel="stylesheet" href="local-url/someFile.css" media="some-media"> node would have to be inserted.
QuotesException - If the CSS-sheet URL, or the media attribute-value, uses a single-quote mark anywhere inside the String.
- See Also:
cssExternalSheet, cssExternalSheetWithMediaAttribute, insertCSSLink(Vector, String), getAllCSSLinks(Vector), checkForSingleQuote(String), DotPair, TagNode
- Code:
- Exact Method Body:
// Inserts an external CSS Link (with 'media' attribute) into the <HEAD> section of
// this html page vector
// <link rel='stylesheet' type='text/css' href='INSERT-URL-STRING-HERE'
//      media='INSERT-MEDIA-ATTRIBUTE-VALUE-HERE' />

checkForSingleQuote(externalCSSFileURLAsString);
checkForSingleQuote(mediaInnerTagValue);

// The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
DotPair header = TagNodeFindInclusive.first(html, "head");

if (header == null) throw new NodeNotFoundException(
    NO_HEADER_MESSAGE.replace
        ("INSERT-STR", "externally-linked CSS page <LINK> element")
);

// Build the TagNode
TagNode cssTN = new TagNode(
    cssExternalSheetWithMediaAttribute
        .replace("INSERT-URL-STRING-HERE", externalCSSFileURLAsString)
        .replace("INSERT-MEDIA-ATTRIBUTE-VALUE-HERE", mediaInnerTagValue)
);

// Insert the Style-Sheet link into the page. Put it at the top of the header, just
// after <HEAD>
Util.insertNodes(html, header.start + 1, NEWLINE, cssTN, NEWLINE);
-
getAllCSSLinks
public static java.util.Vector<TagNode> getAllCSSLinks (java.util.Vector<? extends HTMLNode> html)
This will retrieve all linked CSS pages from a vectorized-html web-page.
- Parameters:
html - This may be any vectorized-html web-page.
- Returns:
This will return the links as a list of TagNode
- See Also:
insertCSSLink(Vector, String), insertCSSLink(Vector, String, String), InnerTagGet
- Code:
- Exact Method Body:
// InnerTagGet.all: Returns a vector of TagNode's that resemble:
//                  <LINK rel="stylesheet" ...>
//
// EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
//            String-Comparison
//            Trim the 'rel' Attribute-Value String of possible leading & trailing
//            White-Space before performing the comparison.

return InnerTagGet.all(html, "link", "rel", TextComparitor.EQ_CI_TRM, "stylesheet");
-
insertExternalJavaScriptLink
public static void insertExternalJavaScriptLink (java.util.Vector<HTMLNode> html, java.lang.String externalJSFileURLAsString)
This inserts an HTML '<SCRIPT ...> </SCRIPT>' element pair into the right location for linking an externally-defined java-script '.js' file-page.
- Parameters:
html - Any vectorized-html web-page. It is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will throw a 'NodeNotFoundException', because these elements must be inserted in the page's HTML 'header' section.
externalJSFileURLAsString - This is the String that will be copied into the public static final String 'javaScriptExternalPage' and converted into an HTML 'TagNode'. It will then be inserted as the first element of the html page header.
- Throws:
NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <SCRIPT ... SRC="local-url/someScriptFile.js"></SCRIPT> nodes would have to be inserted.
QuotesException - If the java-script page-URL uses a single-quote mark, anywhere in the url-string.
- See Also:
javaScriptExternalPage, getAllExternalJSLinks(Vector), checkForSingleQuote(String), TagNode, TextNode, DotPair, HTMLTags.hasTag(String, TC)
- Code:
- Exact Method Body:
// Builds an external Java-Script link, and inserts it into the header portion of
// this html page.
// <script type='text/javascript' src='INSERT-URL-STRING-HERE'>

checkForSingleQuote(externalJSFileURLAsString);

// Build an HTML <SCRIPT ...> node, and a </SCRIPT> node.
HTMLNode n = new TagNode(javaScriptExternalPage.replace
    ("INSERT-URL-STRING-HERE", externalJSFileURLAsString));

HTMLNode closeN = HTMLTags.hasTag("script", TC.ClosingTags);

// The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
DotPair header = TagNodeFindInclusive.first(html, "head");

if (header == null) throw new NodeNotFoundException(
    NO_HEADER_MESSAGE.replace(
        "INSERT-STR",
        "externally-linked Java-Script <SCRIPT> ... </SCRIPT> elements")
);

// Insert the Java-Script link into the page. Put it at the top of the header, just
// after <HEAD>
Util.insertNodes(html, header.start + 1, NEWLINE, n, closeN, NEWLINE);
-
getAllExternalJSLinks
public static java.lang.String[] getAllExternalJSLinks (java.util.Vector<? extends HTMLNode> html)
First, inserting java-script directly onto an HTML page and including an external link to a '.js' file page are extremely similar tasks. The construct is simply: <SCRIPT TYPE='text/javascript'> ... </SCRIPT> either way! When the actual functions and methods are pasted into the HTML page directly, they are pasted exactly where the ellipsis '...' is listed in the HTML code noted previously. When a link is made to an external page, both the open and close <SCRIPT> ... </SCRIPT> tag-elements must still be included, while the text-content in place of the ellipsis '...' should just be left blank. The URL to the java-script page is included in the src='js_file_url' inner-tag value.
This method will retrieve any and all script nodes that meet these criteria:
- The "script body" must be empty, meaning there is no java-script between the open and close script-tags
- The src='' attribute must contain some non-null, non-zero-length value
- Parameters:
html - This is any vectorized-html web-page.
- Returns:
This will return the URL's (the 'src' attribute-values) of externally linked java-script pages, as a String-array.
- See Also:
InnerTagGetInclusive, javaScriptExternalPage, insertExternalJavaScriptLink(Vector, String), TagNode, TextNode, TagNode.AV(String), HTMLNode.str
- Code:
- Exact Method Body:
// InnerTagGetInclusive.all: Returns a vector of TagNode's that resemble:
//                           <SCRIPT TYPE="javascript" ...>
//
// CN_CI: Check the 'type' Attribute-Value using a Case-Insensitive, "Contains"
//        String-Comparison
//        'contains' rather than 'equals' testing is done because this value may be
//        "javascript", but it may also be "text/javascript"
//
// Inclusive: This means that everything between the <SCRIPT type="javascript"> ... and
//            the closing </SCRIPT> tag are returned in a vector of vectors.

Vector<Vector<HTMLNode>> v = InnerTagGetInclusive.all
    (html, "script", "type", TextComparitor.CN_CI, "javascript");

Stream.Builder<String> b = Stream.builder();

TOP:
for (Vector<HTMLNode> scriptSection : v)
{
    String srcValue = null;

    for (HTMLNode n : scriptSection)
    {
        if (n.isTagNode())
            if ((srcValue = ((TagNode) n).AV("src")) != null)
                break;

        if (n.isTextNode())
            if (n.str.trim().length() > 0)
                break TOP;
    }

    b.add(srcValue);
}

return b.build().toArray(String[]::new);
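The two criteria listed in the description (an empty script body, and a non-empty 'src' value) can also be illustrated on raw HTML text. The following is a rough sketch using `java.util.regex`; it is not how this library works, since the library operates on parsed nodes. The class `ExternalScriptSketch` and the method `externalScriptURLs` are hypothetical, and the pattern only handles quoted 'src' values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExternalScriptSketch
{
    // An opening <script> tag having a quoted, non-empty 'src' value, followed by
    // nothing except optional white-space, followed by the closing </script> tag.
    private static final Pattern EXTERNAL_SCRIPT = Pattern.compile(
        "<script\\b[^>]*?\\bsrc\\s*=\\s*(['\"])([^'\">]+)\\1[^>]*>\\s*</script\\s*>",
        Pattern.CASE_INSENSITIVE
    );

    // Returns the 'src' value of each script element whose body is empty.
    static List<String> externalScriptURLs(String html)
    {
        List<String>    ret = new ArrayList<>();
        Matcher         m   = EXTERNAL_SCRIPT.matcher(html);

        while (m.find()) ret.add(m.group(2));

        return ret;
    }

    public static void main(String[] args)
    {
        String html =
            "<head>" +
            "<script type='text/javascript' src='/js/a.js'></script>" +
            "<script type='text/javascript'>var x = 1;</script>" +
            "<SCRIPT SRC=\"/js/b.js\"> </SCRIPT>" +
            "</head>";

        // The inline script (second element) has a body and no 'src', so it is skipped.
        for (String url : externalScriptURLs(html)) System.out.println(url);
    }
}
```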
-
insertCanonicalURL
public static void insertCanonicalURL(java.util.Vector<HTMLNode> html, java.lang.String canonicalURLAsStr)
This method will insert a "canonical url" into the web-page passed via the html-parameter. This canonical-url will be placed in an html <link rel='canonical' href='the_url'> element. This element must be placed in the head section of the passed html page, and if the vectorized-html page that was passed does not contain a 'header' section, a 'NodeNotFoundException' (a run-time / unchecked Exception) will be thrown.
- Parameters:
html - This is any vectorized-html web-page. It is important that it be one that contains an HTML <HEAD> ... </HEAD> sub-section, or this method will throw a 'NodeNotFoundException', because this element must be inserted in the page's HTML 'header' section.
canonicalURLAsStr - This text-String will be substituted for the 'href' attribute-value inside an HTML <LINK> element, and then inserted into the vectorized-page at the top of the <HEAD>...</HEAD> section.
- Throws:
NodeNotFoundException - This is thrown if there is no HTML <HEAD> ... </HEAD> section on the page where the <link rel='canonical' href='canonical_url'> would have to be inserted.
QuotesException - If the canonical-page URL uses a single-quote mark, anywhere in the url-string.
- See Also:
canonicalTag, hasCanonicalURL(Vector), checkForSingleQuote(String), TagNode, DotPair
- Code:
- Exact Method Body:
// Inserts a link element into the header of this page
// <link rel='canonical' href='INSERT-URL-STRING-HERE' />

checkForSingleQuote(canonicalURLAsStr);

// The HTML Page must have a <HEAD> ... </HEAD> section, or an exception shall throw.
DotPair header = TagNodeFindInclusive.first(html, "head");

if (header == null) throw new NodeNotFoundException
    (NO_HEADER_MESSAGE.replace("INSERT-STR", "Canonical-url <LINK> element"));

// Builds the canonical <LINK ...> element
TagNode linkTN = new TagNode
    (canonicalTag.replace("INSERT-URL-STRING-HERE", canonicalURLAsStr));

// Insert the canonical-url into the page. Put it at the top of the header, just
// after <HEAD>
Util.insertNodes(html, header.start + 1, NEWLINE, linkTN, NEWLINE);
-
hasCanonicalURL
public static java.lang.String hasCanonicalURL (java.util.Vector<? extends HTMLNode> html) throws MalformedHTMLException
This method will check whether a vectorized-html page has an HTML <LINK REL='canonical' ...> tag and, if so, return its URL. A "Canonical URL" tells search-engines which URL should be treated as the "main" one when a web-site has several pages with identical or nearly identical content, so that the duplicates are not each indexed separately.
- Parameters:
html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is that this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode> without throwing an exception or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
- Returns:
This will return whatever text was placed inside the canonical-url HREF='some_url' attribute/value pair of the HTML link tag. If there is no HTML <LINK REL='canonical' HREF='some_url'> tag, then this method will return null.
- Throws:
MalformedHTMLException - This exception will be thrown if there are multiple html tags that match the link, and rel='canonical' search criteria requirements. If an HTML element <link rel='canonical'> is found, but that element does not have an href='...' attribute, or that attribute is of zero length, then this is a situation that will also force this exception to throw.
- See Also:
InnerTagGet, canonicalTag, insertCanonicalURL(Vector, String), TagNode.AV(String)
- Code:
- Exact Method Body:
// InnerTagGet.all: Returns a vector of TagNode's that resemble:
//                  <LINK rel="canonical" ...>
//
// EQ_CI_TRM: Check the 'rel' Attribute-Value using a Case-Insensitive, Equality
//            String-Comparison
//            Trim the 'rel' Attribute-Value String of possible leading & trailing
//            White-Space before performing the comparison.

Vector<TagNode> v = InnerTagGet.all
    (html, "link", "rel", TextComparitor.EQ_CI_TRM, "canonical");

if (v.size() == 0) return null;

if (v.size() > 1) throw new MalformedHTMLException(
    "The web-page you have passed has precisely " + v.size() +
    " canonical-url link elements, but it may not have more than 1. This is " +
    "invalid HTML."
);

String s = v.elementAt(0).AV("href");

if (s == null) throw new MalformedHTMLException(
    "The HTML link element that was retrieved, contained a " +
    "rel='canonical' inner-tag / value pair, but did not have an href='...' " +
    "attribute. This is invalid HTML."
);

if (s.length() == 0) throw new MalformedHTMLException(
    "The HTML link element that was retrieved contained a zero-length " +
    "string as an attribute-value for the href='...' attribute. This is not " +
    "invalid, but poorly formatted HTML."
);

return s;
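The "none, exactly one, or too many" decision made by the method body above can be sketched in isolation. In the sketch below, the list of 'href' values is assumed to have already been collected from the matching link elements (a null entry meaning the attribute was missing). The class `CanonicalCheckSketch` and method `single` are hypothetical, and `IllegalStateException` is an assumed stand-in for `MalformedHTMLException`.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CanonicalCheckSketch
{
    // Returns null when there is no canonical link, the href when there is exactly
    // one well-formed link, and throws in every other case.
    static String single(List<String> hrefs)
    {
        if (hrefs.isEmpty()) return null;

        if (hrefs.size() > 1) throw new IllegalStateException
            ("Found " + hrefs.size() + " canonical links, but at most 1 is allowed.");

        String s = hrefs.get(0);

        if (s == null) throw new IllegalStateException
            ("The canonical link has no href attribute.");

        if (s.length() == 0) throw new IllegalStateException
            ("The canonical link has a zero-length href attribute.");

        return s;
    }

    public static void main(String[] args)
    {
        System.out.println(single(Collections.emptyList()));
        System.out.println(single(Arrays.asList("https://example.com/main.html")));

        try
            { single(Arrays.asList("https://example.com/a", "https://example.com/b")); }
        catch (IllegalStateException e)
            { System.out.println("Rejected: " + e.getMessage()); }
    }
}
```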
-
-