Package Torello.HTML

Class Listeners


  • public class Listeners
    extends java.lang.Object
    A basic tool for finding Java-Script Listener Attributes in the TagNode elements in a Vectorized-HTML Web-Page.

    This class allows a user to search for listeners in a page or sub-page. It uses the exact same hierarchy of programmer-call options to decide what to look for. Search criteria are expressed through overloaded methods that take differing argument lists.

    NOTE: Many large web-sites no longer place java-script directly in the page itself. Searching through a major hub for java-script listeners will usually return 0 results. There are often java-script files downloaded from the <HEAD>...<SCRIPT></SCRIPT> tags, but scripted content generally operates on the class=..., id=..., and data-SOME_TAG=... attributes of the HTML Elements. In this way, inserting script directly into the body-text of the HTML page is avoided. If you are scraping a page you have written yourself, and it does contain java-script, then by all means test it out. However, if these methods return 0 results, that is not unusual: for many of the large news web-sites and search-engines that were tested, listeners inside HTML Elements seemed uncommon.

    FIND, GET: 'Find' implies that int positions (indices) within the Vector will be returned as the search results. 'Get' implies that the actual TagNode's themselves shall be returned.

    • int sPos, int ePos: When these parameters are present, only HTMLNode's between these specified Vector indices will be considered for matching the search criteria.
    • String htmlTags: When this parameter is present, only HTML TagNode's whose "primary tag" matches this string will be considered.
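
The difference between the 'Find' and 'Get' flavors can be illustrated with a minimal, self-contained sketch. It does not use the Torello.HTML classes: plain Strings stand in for HTMLNode's, and the helper looksLikeListenerTag is a hypothetical simplification introduced only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Vector;

// Assumption: plain Strings play the role of HTMLNode's in this sketch.
class FindVersusGetSketch
{
    // Hypothetical test: an opening tag that carries an onclick attribute
    static boolean looksLikeListenerTag(String node)
    { return node.startsWith("<") && node.toLowerCase().contains(" onclick="); }

    // 'Find' flavor: returns the Vector-indices of the matching nodes
    static int[] find(Vector<String> html)
    {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < html.size(); i++)
            if (looksLikeListenerTag(html.elementAt(i))) hits.add(i);
        return hits.stream().mapToInt(Integer::intValue).toArray();
    }

    // 'Get' flavor: returns the matching nodes themselves
    static Vector<String> get(Vector<String> html)
    {
        Vector<String> ret = new Vector<>();
        for (String node : html) if (looksLikeListenerTag(node)) ret.add(node);
        return ret;
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<div>", "<a onclick=\"go()\">", "text", "</a>", "</div>"
        ));
        System.out.println(Arrays.toString(find(page)));   // prints [1]
        System.out.println(get(page));                     // prints [<a onclick="go()">]
    }
}
```

Indices are useful when the page itself will be modified in place; the nodes are useful when only their contents are of interest.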


Stateless Class: This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. The annotation is very similar to the Java-Bean @Stateless Annotation.
  • 1 Constructor(s), 1 declared private, zero-argument constructor
  • 19 Method(s), 19 declared static
  • 1 Field(s), 1 declared static, 1 declared final


    • Method Summary

       
      Basic Methods
      Modifier and Type Method
      static Properties extract​(TagNode tn)
      static Properties[] extractAll​(Vector<TagNode> list)
      static boolean hasListener​(TagNode tn)
       
      Find: Vector-indices having TagNode's with Listeners
      Modifier and Type Method
      static int[] find​(Vector<? extends HTMLNode> html)
      static int[] find​(Vector<? extends HTMLNode> html, int sPos, int ePos)
      static int[] find​(Vector<? extends HTMLNode> html, int sPos, int ePos, String... htmlTags)
      static int[] find​(Vector<? extends HTMLNode> html, String... htmlTags)
      static int[] find​(Vector<? extends HTMLNode> html, DotPair dp)
      static int[] find​(Vector<? extends HTMLNode> html, DotPair dp, String... htmlTags)
       
      Get: TagNode's that have Listeners
      Modifier and Type Method
      static Vector<TagNode> get​(Vector<? extends HTMLNode> html)
      static Vector<TagNode> get​(Vector<? extends HTMLNode> html, int sPos, int ePos)
      static Vector<TagNode> get​(Vector<? extends HTMLNode> html, int sPos, int ePos, String... htmlTags)
      static Vector<TagNode> get​(Vector<? extends HTMLNode> html, String... htmlTags)
      static Vector<TagNode> get​(Vector<? extends HTMLNode> html, DotPair dp)
      static Vector<TagNode> get​(Vector<? extends HTMLNode> html, DotPair dp, String... htmlTags)
       
      Modify & Review Internal-List of Listeners
      Modifier and Type Method
      static boolean addNewListenerName​(String listenerName)
      static Iterator<String> listAllAvailable()
       
      Protected Methods
      Modifier and Type Method
      protected static boolean HAS_TOK_MATCH​(String htmlTag, String... htmlTags)
      protected static String[] toLowerCase​(String[] tags)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • listAllAvailable

        public static java.util.Iterator<java.lang.String> listAllAvailable()
        Returns an Iterator over the names of the java-script listeners known to this class.
        Code:
        Exact Method Body:
         return new RemoveUnsupportedIterator<String>(l.iterator());
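
The name RemoveUnsupportedIterator suggests a read-only view, so that callers cannot delete listener names through the returned Iterator. The following sketch shows that general idea only. The class ReadOnlyIterator is a hypothetical stand-in written for this example, and the TreeSet is an assumed container; neither is taken from the library.

```java
import java.util.Iterator;
import java.util.TreeSet;

// Hypothetical wrapper: delegates iteration, but refuses remove()
class ReadOnlyIterator<E> implements Iterator<E>
{
    private final Iterator<E> inner;

    ReadOnlyIterator(Iterator<E> inner) { this.inner = inner; }

    @Override public boolean hasNext() { return inner.hasNext(); }

    @Override public E next() { return inner.next(); }

    @Override public void remove()
    { throw new UnsupportedOperationException("remove() is not supported"); }
}

class ReadOnlyIteratorDemo
{
    public static void main(String[] args)
    {
        // Assumption: a sorted set of names stands in for the internal listener set
        TreeSet<String> names = new TreeSet<>();
        names.add("onclick");
        names.add("onload");

        Iterator<String> it = new ReadOnlyIterator<>(names.iterator());
        while (it.hasNext()) System.out.println(it.next());
    }
}
```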
        
      • addNewListenerName

        public static boolean addNewListenerName​(java.lang.String listenerName)
        Allows the user to add the name of a new listener that is not already stored in the internal set of known java-script listeners. When searching a page for listeners, this class can only find those whose names are known.
        Parameters:
        listenerName - The name of a listener that is not already known to this class. The name is converted to lower-case before it is stored.
        Returns:
        TRUE if the listener name was not already stored in the set, FALSE if attempting to add a listener that is already in the set.
        Code:
        Exact Method Body:
         return l.add(listenerName.toLowerCase());
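
The return value follows directly from the behavior of java.util.Set.add. A minimal sketch, assuming the internal collection is a Set of lower-case Strings (the TreeSet and its initial contents here are assumptions made for illustration):

```java
import java.util.Set;
import java.util.TreeSet;

class ListenerNameSetSketch
{
    // Assumption: a sorted set stands in for the internal set 'l'
    private static final Set<String> names = new TreeSet<>();

    static
    {
        names.add("onclick");
        names.add("onload");
    }

    // Set.add returns false when the (lower-cased) name is already present
    static boolean addNewListenerName(String listenerName)
    { return names.add(listenerName.toLowerCase()); }

    public static void main(String[] args)
    {
        System.out.println(addNewListenerName("onFoo"));    // true: a new name
        System.out.println(addNewListenerName("ONFOO"));    // false: same name after lower-casing
        System.out.println(addNewListenerName("onclick"));  // false: already known
    }
}
```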
        
      • extract

        public static java.util.Properties extract​(TagNode tn)
        This will test whether listeners are present in the TagNode, and if so - return them.
        Examples:

        Input TagNode:      <frameset cols="20%,80%" title="Documentation frame" onload="top.loadFrames()">
        Output Properties:  onload: top.loadFrames()

        Input TagNode:      <a href="javascript:void(0);" onclick="return j2gb('http://www.gov.cn');">
        Output Properties:  onclick: return j2gb('http://www.gov.cn');
        Parameters:
        tn - This may be any TagNode, but it will be tested for JavaScript listeners.
        Returns:
        Returns a java.util.Properties object containing a key-value table of all listeners present in the TagNode. If there are no listeners, this method does not return null; it returns an empty Properties object.
        See Also:
        TagNode.AV(String), StrCmpr.containsIgnoreCase(String, String)
        Code:
        Exact Method Body:
         Properties  p = new Properties();
         String      s;
        
         for (String listener : l)
        
             if (StrCmpr.containsIgnoreCase(tn.str, listener))
        
                 if ((s = tn.AV(listener)) != null) 
        
                     // This **may** seem redundant, but it is not: the listener name might only
                     // appear inside other text, such as an "ALT=..." attribute value.
                     // The initial "StrCmpr.contains..." check is just an optimization.
        
                     p.put(listener, s);
        
         return p;
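
The two-step check above (a cheap substring test, followed by a stricter attribute look-up) can be sketched without the library. In this example a plain String stands in for the TagNode, and the helper attributeValue is a hypothetical, deliberately naive replacement for TagNode.AV that only understands double-quoted values.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractSketch
{
    // Assumption: a tiny, fixed list of listener names
    static final String[] LISTENERS = { "onclick", "onload" };

    // Hypothetical stand-in for TagNode.AV(String): returns the double-quoted
    // value of the named attribute, or null if no such attribute is found.
    static String attributeValue(String tag, String name)
    {
        Pattern p = Pattern.compile(
            "\\s" + Pattern.quote(name) + "\\s*=\\s*\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE
        );
        Matcher m = p.matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    static Properties extract(String tag)
    {
        Properties p = new Properties();

        for (String listener : LISTENERS)
            if (tag.toLowerCase().contains(listener))           // cheap pre-filter
            {
                String value = attributeValue(tag, listener);   // stricter check
                if (value != null) p.put(listener, value);
            }

        return p;
    }

    public static void main(String[] args)
    {
        Properties real = extract("<frameset cols=\"20%,80%\" onload=\"top.loadFrames()\">");
        System.out.println(real.getProperty("onload"));   // prints top.loadFrames()

        // The word "onclick" appears only inside the ALT text, so nothing is extracted
        Properties phony = extract("<img alt=\"the onclick attribute\">");
        System.out.println(phony.isEmpty());              // prints true
    }
}
```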
        
      • extractAll

        public static java.util.Properties[] extractAll​
                    (java.util.Vector<TagNode> list)
        
        If you have performed a Java-Script Listener 'Get', this method will cycle through the list that was returned and generate a Properties[] array of identical length, by calling extract(tn) for each element of the parameter 'list'.
        Parameters:
        list - A list of TagNode's that are expected to contain Java-Script listeners. If some of the TagNode's in this input Vector have no listeners, the return array will still remain a parallel (same-size) array; however, some of its elements will be Properties with no key/value pairs in them (zero-size).
        Returns:
        An array of Properties, one for each element in 'list'.
        See Also:
        extract(TagNode)
        Code:
        Exact Method Body:
         Properties[] ret = new Properties[list.size()];
        
         for (int i=0; i < list.size(); i++) ret[i] = extract(list.elementAt(i));
        
         return ret;
        
      • find

        public static int[] find​(java.util.Vector<? extends HTMLNode> html,
                                 int sPos,
                                 int ePos)
        Finds all HTML Elements (TagNode elements) that have listeners, limiting the search to the sub-list of the page between 'sPos' and 'ePos'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        A list of index-pointers into the underlying parameter 'html' where each node pointed to by the list contains a TagNode element with a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        hasListener(TagNode), LV
        Code:
        Exact Method Body:
         // Java Streams to keep lists of int's
         IntStream.Builder   b = IntStream.builder();
         LV                  l = new LV(html, sPos, ePos);
         TagNode             tn;
        
         for (int i=l.start; i < l.end; i++)
        
             // Only check opening TagNode's that are long enough to have attributes, and then
             // only retain TagNode's that have a listener attribute.
        
             if (((tn = html.elementAt(i).openTagPWA()) != null) && hasListener(tn)) b.add(i);
        
         return b.build().toArray();
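
The 'sPos' / 'ePos' rules documented above can be sketched as follows. This is not the LV class; the helper resolve is a hypothetical reconstruction of the documented rules only, and plain Strings stand in for HTMLNode's.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.stream.IntStream;

class RangeSketch
{
    // Hypothetical helper: returns { start (inclusive), end (exclusive) }
    static int[] resolve(int size, int sPos, int ePos)
    {
        if (ePos < 0) ePos = size;   // a negative ePos means "search to the end"

        if (sPos < 0 || sPos >= size) throw new IndexOutOfBoundsException("sPos = " + sPos);
        if (ePos == 0 || ePos > size) throw new IndexOutOfBoundsException("ePos = " + ePos);
        if (sPos > ePos)              throw new IndexOutOfBoundsException("sPos > ePos");

        return new int[] { sPos, ePos };
    }

    // Collects the indices in [sPos, ePos) whose element carries an onclick attribute
    static int[] find(Vector<String> html, int sPos, int ePos)
    {
        int[]             range = resolve(html.size(), sPos, ePos);
        IntStream.Builder b     = IntStream.builder();

        for (int i = range[0]; i < range[1]; i++)
            if (html.elementAt(i).contains(" onclick=")) b.add(i);

        return b.build().toArray();
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<a onclick=\"a()\">", "text", "<b onclick=\"b()\">", "</b>"
        ));

        System.out.println(Arrays.toString(find(page, 0, -1)));  // prints [0, 2]
        System.out.println(Arrays.toString(find(page, 1, -1)));  // prints [2]
        System.out.println(Arrays.toString(find(page, 0, 2)));   // prints [0]
    }
}
```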
        
      • find

        public static int[] find​(java.util.Vector<? extends HTMLNode> html,
                                 int sPos,
                                 int ePos,
                                 java.lang.String... htmlTags)
        Finds all HTML Elements (TagNode elements) that have listeners, limiting the search to the sub-list of the page between 'sPos' and 'ePos', and only allowing matches where the HTML Element is among the elements listed in parameter 'htmlTags'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        htmlTags - A list of HTML Elements, as a varargs String... Array, that constitute a match. Any HTML Element in the web-page that has a listener attribute, but whose HTML tag/token is not present in this list will not be considered a match, and will not be returned in this method's search results.
        Returns:
        A list of index-pointers into the underlying parameter 'html' where each node pointed to by the list contains a TagNode element with a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos, and also limited to the HTML Elements named in parameter 'htmlTags'.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        HAS_TOK_MATCH(String, String[]), hasListener(TagNode), LV
        Code:
        Exact Method Body:
         // Java Streams can keep lists of int's
         IntStream.Builder   b = IntStream.builder();
         LV                  l = new LV(html, sPos, ePos);   
         TagNode             tn;
        
         htmlTags = toLowerCase(htmlTags);
        
         for (int i=l.start; i < l.end; i++)
        
             if (
                 // Only Match Opening-Tags with internal-string's long enough to contain Attributes
                 ((tn = html.elementAt(i).openTagPWA()) != null)
        
                 // Make sure the HTML Element (.tok field) is among the user-requested 'htmlTags'
                 &&  HAS_TOK_MATCH(tn.tok, htmlTags)
        
                 // Check whether or not the TagNode has a listener attribute (if yes, save it)
                 &&  hasListener(tn)
             )
                 b.add(i);                                           // Save the array-index
        
         return b.build().toArray();
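
The effect of the 'htmlTags' filter can be sketched in isolation. Plain Strings stand in for HTMLNode's here, and the helper tok is a hypothetical, simplified substitute for the TagNode.tok field; only the onclick listener is considered.

```java
import java.util.Arrays;
import java.util.Vector;
import java.util.stream.IntStream;

class TagFilterSketch
{
    // Hypothetical stand-in for TagNode.tok: the lower-cased element name of an
    // opening tag, or null for closing tags, comments and text.
    static String tok(String node)
    {
        if (!node.startsWith("<") || node.startsWith("</")) return null;

        int end = 1;
        while (end < node.length() && Character.isLetterOrDigit(node.charAt(end))) end++;

        return (end > 1) ? node.substring(1, end).toLowerCase() : null;
    }

    static int[] find(Vector<String> html, String... htmlTags)
    {
        // Lower-case a copy of the requested tags, so that "SPAN" matches "<span ...>"
        String[] wanted = new String[htmlTags.length];
        for (int i = 0; i < htmlTags.length; i++) wanted[i] = htmlTags[i].toLowerCase();

        IntStream.Builder b = IntStream.builder();

        for (int i = 0; i < html.size(); i++)
        {
            String node = html.elementAt(i);
            String t    = tok(node);

            if (    (t != null)
                &&  Arrays.asList(wanted).contains(t)              // element-name filter
                &&  node.toLowerCase().contains(" onclick=")       // listener check
            )
                b.add(i);
        }

        return b.build().toArray();
    }

    public static void main(String[] args)
    {
        Vector<String> page = new Vector<>(Arrays.asList(
            "<A onclick=\"a()\">", "<span onclick=\"s()\">", "<span class=\"x\">"
        ));

        System.out.println(Arrays.toString(find(page, "a")));           // prints [0]
        System.out.println(Arrays.toString(find(page, "SPAN", "p")));   // prints [1]
    }
}
```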
        
      • get

        public static java.util.Vector<TagNode> get
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos)
        
        Retrieves all HTML Elements (TagNode elements) that have listeners, limiting the search to the sub-list of the page between 'sPos' and 'ePos'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        Returns:
        A list of TagNode elements that have a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        hasListener(TagNode), LV
        Code:
        Exact Method Body:
         Vector<TagNode> ret = new Vector<>();
         LV              l   = new LV(html, sPos, ePos);
         TagNode         tn;
        
         for (int i=l.start; i < l.end; i++)
        
             // Only check opening TagNode's that are long enough to have attributes, and then
             // only retain TagNode's that have a listener attribute.  If this TagNode does have
             // a listener, place it in the return vector.
        
             if (((tn = html.elementAt(i).openTagPWA()) != null) && hasListener(tn)) ret.add(tn);
        
         return ret;
        
      • get

        public static java.util.Vector<TagNode> get
                    (java.util.Vector<? extends HTMLNode> html,
                     int sPos,
                     int ePos,
                     java.lang.String... htmlTags)
        
        Retrieves all HTML Elements (TagNode elements) that have listeners, limiting the search to the sub-list of the page between 'sPos' and 'ePos', and only allowing matches where the HTML Element is among the elements listed in parameter 'htmlTags'.
        Parameters:
        html - This may be any vectorized-html web-page, or an html sub-section / partial-page. All that the variable-type wild-card '? extends HTMLNode' means is this method can receive a Vector<TagNode>, Vector<TextNode> or a Vector<CommentNode>, without throwing an exception, or producing erroneous results. These 'sub-type' Vectors are very often returned as search results from the classes in the 'NodeSearch' package. The most common vector-type used is Vector<HTMLNode>.
        sPos - This is the (integer) Vector-index that sets a limit for the left-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'inclusive' meaning that the HTMLNode at this Vector-index will be visited by this method.

        NOTE: If this value is negative, or greater than or equal to the size of the input Vector, an exception will be thrown.
        ePos - This is the (integer) Vector-index that sets a limit for the right-most Vector-position to inspect/search inside the input Vector-parameter. This value is considered 'exclusive' meaning that the 'HTMLNode' at this Vector-index will not be visited by this method.

        NOTE: If this value is larger than the size of the input Vector-parameter, an exception will be thrown.

        ALSO: Passing a negative value to this parameter, 'ePos', will cause its value to be reset to the size of the input Vector-parameter.
        htmlTags - A list of HTML Elements, as a varargs String Array, that constitute a match. Any HTML Element in the web-page that has a listener attribute, but whose HTML tag/token is not present in this list will not be considered a match, and will not be returned in this method's search results.
        Returns:
        A list of TagNode elements that have a listener attribute / inner-tag. Search results shall be limited to only considering elements between sPos ... ePos, and also limited to the HTML Elements named in parameter 'htmlTags'.
        Throws:
        java.lang.IndexOutOfBoundsException - This exception shall be thrown if any of the following are true:

        • If 'sPos' is negative, or if sPos is greater-than-or-equal-to the size of the Vector
        • If 'ePos' is zero, or greater than the size of the Vector
        • If the value of 'sPos' is a larger integer than 'ePos'. If 'ePos' was negative, it is first reset to Vector.size(), before this check is done.
        See Also:
        HAS_TOK_MATCH(String, String[]), hasListener(TagNode), LV
        Code:
        Exact Method Body:
         Vector<TagNode> ret = new Vector<>();
         LV              l   = new LV(html, sPos, ePos);
         TagNode         tn;
        
         htmlTags = toLowerCase(htmlTags);
        
         for (int i=l.start; i < l.end; i++)
        
             if (
                 // Only Match Opening-Tags with internal-string's long enough to contain Attributes
                 ((tn = html.elementAt(i).openTagPWA()) != null)
        
                 // Make sure the HTML Element (.tok field) is among the user-requested 'htmlTags'
                 &&  HAS_TOK_MATCH(tn.tok, htmlTags)
        
                 // Check whether or not the TagNode has a listener attribute (if yes, save it)
                 &&  hasListener(tn)
             )
        
                 // All requirements have been affirmed, save this node in the return vector.
                 ret.add(tn);
        
         return ret;
        
      • hasListener

        public static boolean hasListener​(TagNode tn)
        Checks whether a given TagNode has a listener inner-tag / attribute.
        Parameters:
        tn - Any HTML Element TagNode
        Returns:
        TRUE If this TagNode has a listener, and FALSE otherwise.
        See Also:
        StrCmpr.containsIgnoreCase(String, String)
        Code:
        Exact Method Body:
         Properties p = new Properties();
        
         for (String listener : l)
        
             // This is a simple string-comparison - with no reg-ex involved
             if (StrCmpr.containsIgnoreCase(tn.str, listener))
        
                 // Slightly slower, because TagNode.AV(attribute) uses a Regular-Expression
                 if (tn.AV(listener) != null)
        
                     // This **may** seem redundant, but it is not, because what if it was phony?
                     // What if the "listener" key-word was actually buried in some "ALT=..." text?
        
                     return true;
        
         return false;
        
      • toLowerCase

        protected static java.lang.String[] toLowerCase​(java.lang.String[] tags)
        Converts the varargs parameter to lower-case Strings.

        NOTE: This is varargs-safe, because a new String array is created, with new String-pointers. The caller's array is not modified.
        Parameters:
        tags - The varargs String parameter acquired from the search-methods in this class.
        Returns:
        a lower-case version of the input.
        Code:
        Exact Method Body:
         String[] ret = new String[tags.length];
        
         for (int i=0; i < tags.length; i++)
        
             if (tags[i] != null) ret[i] = tags[i].toLowerCase();
        
             else throw new HTMLTokException(
                 "One of the HTML tokens you have passed to the variable-length parameter " +
                 "'htmlTags' was null."
             );
        
         return ret;
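
Why the copy matters can be seen in a small sketch. A varargs parameter may be the caller's own array, so lower-casing it in place would silently change the caller's data. In this example IllegalArgumentException is a stand-in for HTMLTokException (an assumption made to keep the sketch self-contained).

```java
import java.util.Arrays;

class LowerCaseCopySketch
{
    // Returns a lower-cased copy; the input array is left untouched
    static String[] toLowerCase(String... tags)
    {
        String[] ret = new String[tags.length];

        for (int i = 0; i < tags.length; i++)
        {
            // Stand-in for HTMLTokException
            if (tags[i] == null) throw new IllegalArgumentException("tags[" + i + "] was null");

            ret[i] = tags[i].toLowerCase();
        }

        return ret;
    }

    public static void main(String[] args)
    {
        String[] original = { "TABLE", "Span" };
        String[] lower    = toLowerCase(original);

        System.out.println(Arrays.toString(original));  // prints [TABLE, Span]
        System.out.println(Arrays.toString(lower));     // prints [table, span]
    }
}
```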
        
      • HAS_TOK_MATCH

        protected static boolean HAS_TOK_MATCH​(java.lang.String htmlTag,
                                               java.lang.String... htmlTags)
        Checks whether a particular token is among the Strings in the var-args parameter String... htmlTags. The comparison is case-sensitive.
        Parameters:
        htmlTag - The token to be checked against the user's requested 'htmlTags' list parameter
        htmlTags - The list of acceptable HTML Tag Elements. This is a search specification parameter used by some of the search-methods in this class.
        Returns:
        TRUE if the tested token parameter 'htmlTag' is a member of the elements in list parameter 'htmlTags', and FALSE otherwise.
        Code:
        Exact Method Body:
         for (String s : htmlTags) if (s.equals(htmlTag)) return true;
         return false;