Package Torello.HTML

Interface URLFilter

  • All Superinterfaces:
    java.util.function.Predicate<java.net.URL>, java.io.Serializable
    Functional Interface:
    This is a functional interface and can therefore be used as the assignment target for a lambda expression or method reference.

    @FunctionalInterface
    public interface URLFilter
    extends java.util.function.Predicate<java.net.URL>, java.io.Serializable
    A simple lambda-target which extends Predicate<URL>.

    This filter selects which links to follow and which to skip. While crawling HTML from a News-Site, some of the links encountered will be irrelevant. Implement this FunctionalInterface, or use a Lambda-Expression, to create a URL-Filter that chooses which URLs to follow.

    Reusing class StrFilter:
    Building a URLFilter instance from a StrFilter can make testing and filtering the text of a URL much easier. That class has several Factory-Builder Methods, and its Exception-Messages are consistent.
    See Also:
    StrFilter
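As a minimal sketch of the lambda usage described above: to keep the example self-contained, it declares a hypothetical stand-in interface, UrlFilterSketch, with the same shape as URLFilter instead of importing the Torello library. The host name, the '/ads/' rule, and the url(...) helper are assumptions made for illustration only.

```java
import java.io.Serializable;
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.function.Predicate;

public class LambdaFilterDemo {
    // Hypothetical stand-in with the same shape as URLFilter, so the sketch compiles alone.
    @FunctionalInterface
    interface UrlFilterSketch extends Predicate<URL>, Serializable { }

    // Helper (assumption): builds a URL and converts the checked exception to an unchecked one.
    static URL url(String s) {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    // Keep only links on the (made-up) news host.
    static final UrlFilterSketch SAME_HOST =
        u -> "news.example.com".equalsIgnoreCase(u.getHost());

    // Keep only links whose path is not under /ads/.
    static final UrlFilterSketch NOT_ADS = u -> !u.getPath().startsWith("/ads/");

    // and() is inherited from Predicate, so the result is a plain Predicate<URL>.
    static final Predicate<URL> FOLLOW = SAME_HOST.and(NOT_ADS);

    public static void main(String[] args) {
        System.out.println(FOLLOW.test(url("https://news.example.com/world/story.html"))); // true
        System.out.println(FOLLOW.test(url("https://news.example.com/ads/banner.html")));  // false
        System.out.println(FOLLOW.test(url("https://other.example.org/world/story.html"))); // false
    }
}
```

Note that the inherited methods and, negate and or return java.util.function.Predicate<URL>, not the filter interface itself.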


    • Method Summary

       
      @FunctionalInterface: (Lambda) Method
      Modifier and Type Method
      boolean test​(URL url)
       
      Methods: Static Factory-Builder
      Modifier and Type Method
      static URLFilter fromStrFilter​(StrFilter sf)
       
      Default Methods: Remove Elements with Iterator.remove()
      Modifier and Type Method
      default int filter​(Iterable<URL> urls)
      • Methods inherited from interface java.util.function.Predicate

        and, negate, or
    • Field Detail

      • serialVersionUID

        static final long serialVersionUID
        This fulfils the serialVersionUID requirement for all classes that implement Java's interface java.io.Serializable. The Serializable implementation offered by Java is very easy to use, and can make saving program state when debugging a lot easier. It can also be used in place of more complicated systems like Hibernate to store data.

        Functional Interfaces are usually not thought of as Data Objects that need to be saved, stored and retrieved; however, having the ability to store intermediate results along with the lambda-functions that helped get those results can make debugging easier.
        See Also:
        Constant Field Values
        Code:
        Exact Field Declaration Expression:
         public static final long serialVersionUID = 1;
        
      • imagesKEEP

        static final URLFilter imagesKEEP
        This URLFilter will KEEP any URL that ends with one of the standard image file extensions.

        WARNING: There are occasions where an Image-URL is "handled" by a web-server internally, and the actual URL itself does not look like an image file-name at all. As a result, this Predicate might return erroneous results. An actual image whose URL does not end with an extension such as '.jpg' or '.bmp' could be rejected, and a URL that happens to end with one of these strings, but is not an image, might be kept.
        See Also:
        StrCmpr.endsWithXOR_CI(String, String[])
        Code:
        Exact Field Declaration Expression:
         public static final URLFilter imagesKEEP = (URL url) ->
             {
                 return StrCmpr.endsWithXOR_CI
                     (url.toString().trim(), ".jpg", ".jpeg", ".gif", ".png", ".bmp");
             };
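
The caveat in the warning above can be illustrated with a standard-library-only sketch. It does not call StrCmpr; endsWithImageExt is a hypothetical helper that approximates a case-insensitive suffix test, and all URLs are made up. Comparing against URL.getPath() instead of the full URL string is shown only as one possible workaround, not as something this library does.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Locale;

public class ImageSuffixDemo {
    static final String[] EXTS = { ".jpg", ".jpeg", ".gif", ".png", ".bmp" };

    // Helper (assumption): builds a URL and converts the checked exception to an unchecked one.
    static URL url(String s) {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    // Hypothetical helper: case-insensitive "ends with any of EXTS".
    static boolean endsWithImageExt(String s) {
        String lower = s.trim().toLowerCase(Locale.ROOT);
        for (String ext : EXTS) if (lower.endsWith(ext)) return true;
        return false;
    }

    public static void main(String[] args) {
        URL plain  = url("https://example.com/img/photo.JPG");
        URL query  = url("https://example.com/img/photo.jpg?width=200");
        URL served = url("https://example.com/image?id=42");

        System.out.println(endsWithImageExt(plain.toString()));  // true
        System.out.println(endsWithImageExt(query.toString()));  // false: the query string hides the suffix
        System.out.println(endsWithImageExt(query.getPath()));   // true: path-only comparison
        System.out.println(endsWithImageExt(served.toString())); // false, even if the server returns an image
    }
}
```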
        
      • imagesREJECT

        static final URLFilter imagesREJECT
        This URLFilter will REJECT any URL that ends with one of the standard image file extensions.

        WARNING: There are occasions where an Image-URL is "handled" by a web-server internally, and the actual URL itself does not look like an image file-name at all. As a result, this Predicate might return erroneous results. An actual image whose URL does not end with an extension such as '.jpg' or '.bmp' could be kept, and a URL that happens to end with one of these strings, but is not an image, could be rejected.
        See Also:
        StrCmpr.endsWithNAND_CI(String, String[])
        Code:
        Exact Field Declaration Expression:
         public static final URLFilter imagesREJECT = (URL url) ->
             {
                 return StrCmpr.endsWithNAND_CI
                     (url.toString().trim(), ".jpg", ".jpeg", ".gif", ".png", ".bmp");
             };
        
    • Method Detail

      • test

        boolean test​(java.net.URL url)
        FunctionalInterface Target-Method:
        This method corresponds to the @FunctionalInterface Annotation's method requirement. It is the only non-default, non-static method in this interface, and may be the target of a Lambda-Expression or a '::' (double-colon) Method-Reference.

        This method receives a URL and decides whether that URL should be kept. Its purpose is to provide an easy means of filtering unwanted URLs out of a list of URLs.

        PRECISE NOTE: This method should return FALSE if the passed URL should be skipped. A return value of TRUE implies that the URL is not to be ignored or passed over, but rather 'kept.'

        NOTE: This behavior is compatible with the Java Stream's method "filter(Predicate<...>)".
        Specified by:
        test in interface java.util.function.Predicate<java.net.URL>
        Parameters:
        url - This is a URL that will be checked against the constraints specified by 'this' filter.
        Returns:
        When implementing this method, returning TRUE must mean that the URL has passed the filter's test-requirements (and will subsequently be retained by whatever code is carrying out the filter operation).
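
The compatibility with Stream.filter noted above can be sketched as follows. A plain Predicate<URL> stands in for a URLFilter here so that the example runs without the Torello library; the "https only" rule, the keep(...) method and the url(...) helper are assumptions for illustration.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class StreamFilterDemo {
    // Helper (assumption): builds a URL and converts the checked exception to an unchecked one.
    static URL url(String s) {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    // Stand-in for a URLFilter: TRUE means "keep", FALSE means "skip".
    static final Predicate<URL> HTTPS_ONLY = u -> "https".equals(u.getProtocol());

    // Stream.filter retains exactly the elements for which test(...) returns TRUE.
    static List<URL> keep(List<URL> urls) {
        return urls.stream().filter(HTTPS_ONLY).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<URL> urls = List.of(
            url("https://example.com/a.html"),
            url("http://example.com/b.html"),
            url("https://example.com/c.html"));

        System.out.println(keep(urls).size()); // 2
    }
}
```

Unlike the default filter method of this interface, the stream version leaves the input list untouched and returns a new list.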
      • filter

        default int filter​(java.lang.Iterable<java.net.URL> urls)
        This is similar to the Java Streams method filter(Predicate<>), except that the input is modified in place. Each element of the input-parameter 'urls' for which the test method of this URLFilter evaluates to FALSE is removed using Iterator.remove().
        Parameters:
        urls - An Iterable of URL's which the user would like filtered using 'this' filter.
        Returns:
        The number of elements that were removed from parameter 'urls' based on the results of the URLFilter.test() of 'this' instance.
        Code:
        Exact Method Body:
         int             removeCount = 0;
         Iterator<URL>   iter        = urls.iterator();
        
         // If the filter test returns FALSE, then remove the URL from the collection.
         // Increment the removeCount Counter.
        
         while (iter.hasNext()) if (! test(iter.next())) { removeCount++; iter.remove(); }
        
         return removeCount;
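
A minimal usage sketch of the same loop: because the Torello library is not imported here, the predicate is passed explicitly to a hypothetical static filter(...) method, and the URLs and the '.html' rule are made up. The second call illustrates that the input must support Iterator.remove(); an unmodifiable list such as one created by List.of does not.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

public class InPlaceFilterDemo {
    // Helper (assumption): builds a URL and converts the checked exception to an unchecked one.
    static URL url(String s) {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    // Same loop shape as the method body above, with the predicate passed explicitly.
    static int filter(Predicate<URL> keep, Iterable<URL> urls) {
        int removeCount = 0;
        Iterator<URL> iter = urls.iterator();
        while (iter.hasNext())
            if (!keep.test(iter.next())) { removeCount++; iter.remove(); }
        return removeCount;
    }

    public static void main(String[] args) {
        // A mutable copy, so that Iterator.remove() is supported.
        List<URL> urls = new ArrayList<>(List.of(
            url("https://example.com/a.html"),
            url("https://example.com/logo.png"),
            url("https://example.com/b.html")));

        int removed = filter(u -> u.getPath().endsWith(".html"), urls);
        System.out.println(removed + " removed, " + urls.size() + " kept"); // 1 removed, 2 kept

        // An unmodifiable list rejects Iterator.remove().
        try {
            filter(u -> false, List.of(url("https://example.com/x.html")));
        } catch (UnsupportedOperationException e) {
            System.out.println("unmodifiable input rejected");
        }
    }
}
```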
        
      • fromStrFilter

        static URLFilter fromStrFilter​(StrFilter sf)
        This wraps a StrFilter inside of a URLFilter. The String-comparison is performed against the complete URL, as produced by the URL's toString() method.

        StrFilter NOTE: The class 'StrFilter' can be used in conjunction with the class-specific filters, for instance, this class 'URLFilter'.
        Parameters:
        sf - This is a String Predicate that has usually (though not necessarily) been built by one of the many String-Filter Factory-Build static-methods of class StrFilter. The Predicates that are constructed via the build methods of StrFilter call the standard method java.lang.Object.toString() on the objects they receive for testing.
        Returns:
        An instance of URLFilter that will test the URL as a String.
        See Also:
        StrFilter
        Code:
        Exact Method Body:
         if (sf == null) throw new NullPointerException(
             "The String-Filter Predicate Parameter 'sf' in static-factory builder method " +
             "'fromStrFilter' was passed a null value."
         );
        
         return (URL url) -> sf.test(url);
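
The wrapping idea can be sketched without StrFilter. In this example a plain java.util.function.Predicate<String> stands in for the StrFilter, and fromStringPredicate is a hypothetical analogue of fromStrFilter that calls toString() on the URL explicitly; the '/login' rule and the URLs are made up.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.Objects;
import java.util.function.Predicate;

public class WrapStringPredicateDemo {
    // Helper (assumption): builds a URL and converts the checked exception to an unchecked one.
    static URL url(String s) {
        try { return URI.create(s).toURL(); }
        catch (MalformedURLException e) { throw new IllegalArgumentException(e); }
    }

    // Hypothetical analogue of fromStrFilter: adapts a String test into a URL test.
    static Predicate<URL> fromStringPredicate(Predicate<String> sp) {
        Objects.requireNonNull(sp, "parameter 'sp' must not be null");
        return u -> sp.test(u.toString());
    }

    public static void main(String[] args) {
        // Keep every URL whose text does not contain "/login".
        Predicate<URL> noLogin = fromStringPredicate(s -> !s.contains("/login"));

        System.out.println(noLogin.test(url("https://example.com/news/today.html")));  // true
        System.out.println(noLogin.test(url("https://example.com/login?next=/news"))); // false
    }
}
```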