Package Torello.Java

Class RegExFiles


  • public class RegExFiles
    extends java.lang.Object
    A utility for saving Regular-Expressions in a text-file that may be lazily-loaded at run-time when needed. The added benefit is avoiding the "double-escaping" that Java programmers run into when writing regular expressions as String literals. Regular Expressions are an "escaped language", meaning that the '\' (backslash) character is constantly used to identify different types of characters. The RegExr.com web-site is a convenient place to experiment with regular-expressions directly. Users of the UNIX/BASH shell will recognize them from classic commands such as 'grep' and 'find', which use regular expressions frequently.

    Java includes the package java.util.regex.* to provide an interface for Java Programmers to utilize regular-expressions. Please review the class java.util.regex.Pattern to understand "Regular Expression Pattern Matching."

    This class provides a small framework that lets programmers save regular-expressions to text-files and load them into memory. Expressions stored this way do not need to be "doubly-escaped," which is required in Java source code because a String literal must escape every '\' (backslash) and " (double-quote) with a backslash.
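As a quick illustration (a sketch, not part of RegExFiles; the class name DoubleEscapeDemo is hypothetical), the program below shows the doubling that Java String literals require. The regex that matches one literal backslash is written \\ in a text file but "\\\\" in Java source, and a date expression written \d{4}-\d{2} in a file becomes "\\d{4}-\\d{2}" in Java source.

```java
import java.util.regex.Pattern;

public class DoubleEscapeDemo {
    // In a text file this regex is two characters:  \\   (it matches one literal backslash).
    // In Java source, each of those backslashes must itself be escaped.
    static final String BACKSLASH_REGEX = "\\\\";

    // In a text file this regex reads:  \d{4}-\d{2}
    static final String DATE_REGEX = "\\d{4}-\\d{2}";

    public static void main(String[] args) {
        // The in-memory Strings hold exactly the characters a text file would contain.
        System.out.println(BACKSLASH_REGEX); // prints \\
        System.out.println(DATE_REGEX);      // prints \d{4}-\d{2}

        // The Java literal "\\" is a single backslash character.
        boolean slash = Pattern.compile(BACKSLASH_REGEX).matcher("\\").matches();
        boolean date  = Pattern.compile(DATE_REGEX).matcher("2019-01").matches();
        System.out.println(slash + " " + date); // prints true true
    }
}
```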

    More Syntax Notes:
    • Any line in a regular expression text-file that begins with a single '#' (hash-tag) is considered a comment line and is ignored.
    • Any line that begins with a double-'#' ('##' - two hash-tags in a row) is a flag line. It may name one or more of the java.util.regex.Pattern flags, such as CASE_INSENSITIVE or DOTALL, and those flags are applied to the next regular expression in the file.
    • Blank lines, including lines that contain only white-space, are always ignored.
    • The regular expressions are compiled and returned to the programmer in a Vector<Pattern>.


    SAMPLE REG-EX TEXT-FILE:

    Regular Expression File:

    # Here are the regular expressions for the "Index.java" class in this package.
    # Currently there is only (1) available regular-expression
    # This retrieves the listed date of the file - according to GSUTIL
    # m.group(1) will retrieve the calendar year as a String-Integer (such as "2019")
    # m.group(2) will retrieve the calendar month as a String-Integer followed by its name
    # Such as: "01 - January" will be returned
    # If the String were: gs://spain.spanishnewsboard.com/ABC.ES/2019/01 - January/18/index.html
    # m.group(3) will retrieve the calendar day as a String-Integer (here it is "18")
    ^\s*gs:\/\/\w+?.spanishnewsboard.com/.+?/(\d\d\d\d)/(\d\d - \w+?)/(\d\d)/index.html\s*$
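The sketch below (hypothetical class SampleRegexDemo; the expression is copied from the sample file, with its backslashes doubled for a Java literal) applies the expression to the example GSUTIL path. Because group 2 is (\d\d - \w+?), it returns the month number together with its name, "01 - January", rather than just "01".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SampleRegexDemo {
    // The sample file's expression, written as a Java literal (every backslash doubled).
    static final Pattern GSUTIL_DATE = Pattern.compile(
        "^\\s*gs:\\/\\/\\w+?.spanishnewsboard.com/.+?/(\\d\\d\\d\\d)/(\\d\\d - \\w+?)/(\\d\\d)/index.html\\s*$");

    /** Returns {year, month, day} as matched by groups 1-3, or null if the line does not match. */
    static String[] extractDate(String line) {
        Matcher m = GSUTIL_DATE.matcher(line);
        if (!m.matches()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = extractDate(
            "gs://spain.spanishnewsboard.com/ABC.ES/2019/01 - January/18/index.html");
        System.out.println(String.join(" | ", parts)); // prints 2019 | 01 - January | 18
    }
}
```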


    java.util.regex.Pattern Flags:
    Pattern Flag                        Meaning & Use
    static int CANON_EQ                 Enables canonical equivalence.
    static int CASE_INSENSITIVE         Enables case-insensitive matching.
    static int DOTALL                   Enables dotall mode, in which '.' also matches line terminators such as '\n'.
    static int COMMENTS                 Permits whitespace and comments in pattern. This class is, in part, an alternative to this flag.
    static int LITERAL                  Enables literal parsing of the pattern.
    static int MULTILINE                Enables multiline mode, in which '^' and '$' also match just after and just before a line terminator, not only at the start and end of the input.
    static int UNICODE_CASE             Enables Unicode-aware case folding.
    static int UNICODE_CHARACTER_CLASS  Enables the Unicode version of Predefined character classes and POSIX character classes. (Not recognized by generateFlags.)
    static int UNIX_LINES               Enables Unix lines mode, in which only '\n' is treated as a line terminator by '.', '^' and '$'. (Not recognized by generateFlags.)
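Because each flag is an int bit-mask, several flags can be combined with a bitwise OR before calling Pattern.compile(String, int). The sketch below (hypothetical class FlagDemo) shows the DOTALL, CASE_INSENSITIVE and MULTILINE behavior described in the table.

```java
import java.util.regex.Pattern;

public class FlagDemo {
    // Flags are int bit-masks, so several can be combined with bitwise OR.
    static final int FLAGS = Pattern.CASE_INSENSITIVE | Pattern.DOTALL;

    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    static boolean finds(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        // DOTALL lets '.' match '\n'; CASE_INSENSITIVE ignores letter case.
        System.out.println(matches("start.end", FLAGS, "START\nEND")); // true
        System.out.println(matches("start.end", 0, "start\nend"));     // false

        // MULTILINE lets '^' and '$' match at line terminators, not only at the ends of the input.
        System.out.println(finds("^b$", Pattern.MULTILINE, "a\nb\nc")); // true
        System.out.println(finds("^b$", 0, "a\nb\nc"));                 // false
    }
}
```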


    Method-Name Acronym Note:
    In the method names of this class, the acronym LFEC stands for: Load File Exception Catch

    The purpose of these methods is to guarantee that:

    • A file is loaded, without error, into memory
    • On exception, an error message is printed and the entire program halts, because a critical data-file failed to load.


    Methods in this class whose names contain the three letters JAR load data from a different place than the standard file system. If a programmer has already saved his or her data to a jar-file, these methods read that data from the jar-file instead.



    Stateless Class:
    This class neither contains any program-state nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-style files, without any accessible constructors or non-static member fields. The concept is very similar to the Enterprise JavaBeans @Stateless Annotation.

    • 1 Constructor(s), 1 declared private, zero-argument constructor
    • 5 Method(s), 5 declared static
    • 0 Field(s)


    • Method Summary

       
      Methods: Load File, Exception-Catch
      Modifier and Type Method
      static Vector<Pattern> LFEC(String f)
      static Vector<Pattern> LFEC_JAR(Class<?> c, String f)
      static Vector<Pattern> LFEC_JAR_ZIP(Class<?> c, String f)
       
      Internal, Protected Methods
      Modifier and Type Method
      protected static int generateFlags(String line)
      protected static Vector<Pattern> parse(Vector<String> file, String name)
      • Methods inherited from class java.lang.Object

        clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
    • Method Detail

      • LFEC

        public static java.util.Vector<java.util.regex.Pattern> LFEC
                    (java.lang.String f)
        
        This loads a regular expression text file. Each line is interpreted as a new Regular Expression Pattern.

        This method expects each regular expression to fit on a single line; therefore, each line containing text-data (that is, one not starting with '#') is compiled into a new regular expression. Use '\n' within an expression to match a newline character.

        Some Syntax Rules:
        • Comment lines are lines beginning with the POUND ('#') sign.
        • Blank lines are ignored by the file-parse completely.
        • Lines with only white-space are considered blank.
        • Flag Lines are lines that begin with two successive POUND signs ('##'). The flags they name apply only to the next regular expression.
        • All non-comment, non-blank and non-flag lines are converted into Regular-Expression Patterns.
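A minimal sketch of these rules using only the JDK, assuming UTF-8 text and assuming exceptions should propagate rather than halt the program as LFEC does. The class RegexFileSketch and its methods are hypothetical, and flagsFrom recognizes only three flag names for brevity.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class RegexFileSketch {
    /** Hypothetical: applies the syntax rules to already-loaded lines (one String per line). */
    static List<Pattern> parseLines(List<String> lines) {
        List<Pattern> patterns = new ArrayList<>();
        int flags = 0;
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;       // blank or white-space-only line
            if (line.startsWith("##")) {                // flag line: applies to the next pattern
                flags = flagsFrom(line);
                continue;
            }
            if (line.startsWith("#")) continue;         // comment line
            patterns.add(Pattern.compile(line, flags)); // one expression per line
            flags = 0;                                  // flags do not carry over
        }
        return patterns;
    }

    // Recognizes only three flag names, for brevity.
    static int flagsFrom(String line) {
        int mask = 0;
        if (line.contains("CASE_INSENSITIVE")) mask |= Pattern.CASE_INSENSITIVE;
        if (line.contains("DOTALL"))           mask |= Pattern.DOTALL;
        if (line.contains("MULTILINE"))        mask |= Pattern.MULTILINE;
        return mask;
    }

    static List<Pattern> load(Path file) throws IOException {
        return parseLines(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("regex", ".txt");
        try {
            // Java literals are doubled here; the file itself holds  hello\s+world  and  \d{4}
            Files.write(tmp, List.of("# a comment", "", "## CASE_INSENSITIVE",
                                     "hello\\s+world", "\\d{4}"), StandardCharsets.UTF_8);
            List<Pattern> ps = load(tmp);
            System.out.println(ps.size());                                    // 2
            System.out.println(ps.get(0).matcher("HELLO   World").matches()); // true
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

As in this class, a flag line affects only the single expression that follows it.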


        LFEC Note:
        This method will halt program execution if any exceptions occur when loading a Regular-Expression text file! This is the primary-purpose of all 'LFEC' - Load File Exception Catch methods.
        Parameters:
        f - Filename of a Regular Expression text file
        Returns:
        A Vector containing one compiled regular expression per line. Comment lines & blank lines will all be ignored.
        See Also:
        Pattern, generateFlags(String), LFEC.ERROR_EXIT(Throwable, String)
        Code:
        Exact Method Body:
         try
             { return parse(FileRW.loadFileToVector(f, false), f); }
        
         catch (Throwable t)
         {
             LFEC.ERROR_EXIT(t, "Attempt to load Regular Expression file: [" + f + "], failed.\n");
         }
        
         return null; // Should NOT be possible to reach this statement...
        
      • LFEC_JAR

        public static java.util.Vector<java.util.regex.Pattern> LFEC_JAR
                    (java.lang.Class<?> c,
                     java.lang.String f)
        
        This does the exact same thing as LFEC, but reads the file into a Vector through Java's class-loader resource mechanism rather than opening f as an ordinary file-system path. In this case, parameter f names a resource stored inside a JAR file.

        Java's getResourceAsStream:
        The suffix JAR indicates that Java's "load resource as stream" feature (Class.getResourceAsStream) is used in place of standard file i/o routines. Specifically, this loads from a JAR file, as seen below:
         BufferedReader br =
             new BufferedReader(new InputStreamReader(c.getResourceAsStream(f)));
        
        Parameters:
        c - The class that is loading the file. It does not need to be any particular class: it is used only because Class.getResourceAsStream resolves a relative file name against that class's package, so the "Package Name" determines the directory / sub-directory where the data-file is looked up. This variable may not be null.

        EXAMPLE: If a file named "Regular Expressions.txt" sits in the same BASH/Debian/etc... directory as a class's '.class' file, passing that class as c and "Regular Expressions.txt" as f loads the text-file into memory. The primary benefit is that text files are much easier to read than 'double-escaped' Java Strings.

        NOTE: The Java documentation for the 'getResourceAsStream(String)' method explains how data stored in a JAR file, rather than in a UNIX/BASH/MS-DOS system file, is located. Oracle's Java 8 documentation is a good reference.

        NOTE: The symbols <?> appended to the (almost) 'raw-type' here are only there to prevent the java-compiler from issuing warnings regarding the use of "Raw Types." This warning is only issued when the command-line option -Xlint:all (or -Xlint:rawtypes) is used.
        f - The name of a file stored inside a Java JAR file. A name that does not start with '/' is resolved relative to the package of class c.
        Returns:
        A Vector containing one compiled regular expression per line. Comment lines & blank lines will all be ignored.
        See Also:
        LFEC(String), parse(Vector, String), LFEC.ERROR_EXIT(Throwable, String)
        Code:
        Exact Method Body:
         try (
             InputStream     is = c.getResourceAsStream(f);
             BufferedReader  br = new BufferedReader(new InputStreamReader(is));
         )
         {
             String          s       = "";
             StringBuilder   sb      = new StringBuilder();
             Vector<String>  file    = new Vector<String>();
        
             while ((s = br.readLine()) != null) file.addElement(s);
        
             return parse(file, f);
         }
        
         catch (Throwable t)
         { 
             LFEC.ERROR_EXIT(
                 t,
                 "Attempted to load Regular Expression file: [" + f + "]\n" +
                 "From jar-file using class: [" + c.getCanonicalName() + "]\n" +
                 "Did not load successfully."
             );
         }
        
         // Should NOT be possible to reach this statement...
         // Compiler does not recognize LFEC.ERROR_EXIT
        
         return null;
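The body above passes the stream returned by getResourceAsStream directly to an InputStreamReader, which uses the platform's default charset. A sketch of the same read with an explicit null check and an explicit UTF-8 charset might look like the following; the class ResourceLinesSketch, the method readResourceLines, and the choice of UTF-8 are assumptions, not part of RegExFiles.

```java
import java.io.BufferedReader;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class ResourceLinesSketch {
    /**
     * Hypothetical helper: reads a class-path resource line by line.
     * A relative name is resolved against c's package directory;
     * a name starting with '/' is resolved from the class-path root.
     */
    static List<String> readResourceLines(Class<?> c, String f) throws IOException {
        InputStream raw = c.getResourceAsStream(f);
        // getResourceAsStream returns null, rather than throwing, when nothing is found.
        if (raw == null) {
            throw new FileNotFoundException("Resource [" + f + "] not found relative to " + c.getName());
        }
        try (BufferedReader br = new BufferedReader(new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            for (String s; (s = br.readLine()) != null; ) lines.add(s);
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        try {
            readResourceLines(ResourceLinesSketch.class, "no-such-file.txt");
        } catch (FileNotFoundException e) {
            System.out.println("missing: " + e.getMessage());
        }
    }
}
```

For example, readResourceLines(MyLoader.class, "Regular Expressions.txt") would look for the file in the package directory of MyLoader, a hypothetical class.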
        
      • LFEC_JAR_ZIP

        public static java.util.Vector<java.util.regex.Pattern> LFEC_JAR_ZIP
                    (java.lang.Class<?> c,
                     java.lang.String f)
        
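        This is identical to LFEC_JAR, except that it expects the file to have been compressed before saving: the resource must be a GZIP-compressed, Java-serialized String holding the entire text-file.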
        Parameters:
        c - The class that is loading the file. It does not need to be any particular class: it is used only because Class.getResourceAsStream resolves a relative file name against that class's package, so the "Package Name" determines the directory / sub-directory where the data-file is looked up. This variable may not be null. Again, the class-loader looks in the directory of the package that contains this class!

        NOTE: The method LFEC_JAR(Class, String) has a more detailed description of this parameter. The easy way to understand it: just pass the class that is doing the actual loading of the regular-expressions (presuming the regex data-file is in the same directory as that class's '.class' file!)

        NOTE: The symbols <?> appended to the (almost) 'raw-type' here are only there to prevent the java-compiler from issuing warnings regarding the use of "Raw Types." This warning is only issued when the command-line option -Xlint:all (or -Xlint:rawtypes) is used.
        f - The name of a file stored inside a Java JAR file. A name that does not start with '/' is resolved relative to the package of class c.
        Returns:
        A Vector containing one compiled regular expression per line. Comment lines & blank lines will all be ignored.
        See Also:
        LFEC_JAR(Class, String), parse(Vector, String), LFEC.ERROR_EXIT(Throwable, String)
        Code:
        Exact Method Body:
         try (
             InputStream         is      = c.getResourceAsStream(f);
             GZIPInputStream     gzip    = new GZIPInputStream(is);
             ObjectInputStream   ois     = new ObjectInputStream(gzip);
         )
         {
             Object              ret         = ois.readObject();
             String              fileStr     = (String) ret;
             Vector<String>      file        = new Vector<>();
             int                 newLinePos  = 0;
        
             while ((newLinePos = fileStr.indexOf('\n')) != -1)
             {
                 file.addElement(fileStr.substring(0, newLinePos));
                 fileStr = fileStr.substring(newLinePos + 1);
             }
        
             return parse(file, f);
        
         }
        
         catch (Throwable t)
         {
             LFEC.ERROR_EXIT(t,
                 "Attempted to load Regular Expression file: [" + f + "]\n" +
                 "From jar-file using class: [" + c.getCanonicalName() + "]\n" +
                 "Content was zipped, but failed to load."
             );
         }
        
         return null; // Should NOT be possible to reach this statement...
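The body above implies the stored format: a String written with ObjectOutputStream and compressed with GZIPOutputStream. The sketch below (hypothetical class ZippedRegexFileSketch) writes and reads that format. Note that the loop in the body above only collects lines terminated by '\n', so any text after the last newline is not passed to parse; the sketch's unzipLines keeps that final line.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class ZippedRegexFileSketch {
    /** Serializes the whole file text as one String, then GZIP-compresses it. */
    static byte[] zip(String fileText) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(fileText);
        } // closing oos also finishes the GZIP stream
        return bytes.toByteArray();
    }

    /** Reverses zip(...) and splits on '\n', keeping a final line that has no '\n'. */
    static List<String> unzipLines(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            String text = (String) ois.readObject();
            List<String> lines = new ArrayList<>();
            int start = 0, nl;
            while ((nl = text.indexOf('\n', start)) != -1) {
                lines.add(text.substring(start, nl));
                start = nl + 1;
            }
            if (start < text.length()) lines.add(text.substring(start)); // unterminated last line
            return lines;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = zip("# comment\n## DOTALL\na.b\n\\d+");
        System.out.println(unzipLines(data)); // [# comment, ## DOTALL, a.b, \d+]
    }
}
```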
        
      • parse

        protected static java.util.Vector<java.util.regex.Pattern> parse
                    (java.util.Vector<java.lang.String> file,
                     java.lang.String name)
        
        This does the exact same thing as LFEC, but takes a "pre-loaded file" as a Vector. This is an internal method, used to ensure that the methods LFEC, LFEC_JAR and LFEC_JAR_ZIP all do exactly the same thing.
        Parameters:
        file - The regular-expression text-file, already loaded into a Vector<String> with one line per element and no trailing newline characters (i.e., loaded without the "include newlines" option!)
        name - The name of the file being parsed. It is used only to make error messages more informative.
        Returns:
        A Vector containing one compiled regular expression per line. Comment lines & blank lines will all be ignored.
        See Also:
        LFEC(String)
        Code:
        Exact Method Body:
         try
         {
             Vector<Pattern> ret     = new Vector<Pattern>();
             int             flags   = 0;
        
             for (String line : file)
             {
                 if (line.trim().length() == 0) continue;
        
                 if (line.charAt(0) == '#')
                 {
                     if (line.length() > 1) if (line.charAt(1) == '#') flags = generateFlags(line);
                     continue;
                 }
        
                 if (flags != 0)                 ret.add(Pattern.compile(line, flags));
                 else                            ret.add(Pattern.compile(line));
        
                 flags = 0;
             }
        
             return ret;
         }
        
         catch (Throwable t)
             { LFEC.ERROR_EXIT(t, "error parsing regular expression file: " + name); }
        
         return null; // Should NOT be possible to reach this statement...
        
      • generateFlags

        protected static int generateFlags(java.lang.String line)
        Helper function that converts the flag names written on a '##' flag line into their Pattern constants, so that a user may include these names in a regular expression file. The flag names were copied from Java's regular expression class Pattern. As the method body shows, seven names are recognized (CANON_EQ, CASE_INSENSITIVE, DOTALL, COMMENTS, LITERAL, MULTILINE, UNICODE_CASE), and each one found anywhere in the line is OR'ed into the returned mask.

        NOTE: The regular expression loader will only load regular expressions that fit on a single line of text. Other than comment lines, flag lines and blank lines, each line is interpreted as an independent Regular Expression.
        See Also:
        Pattern
        Code:
        Exact Method Body:
         int mask = 0;
        
         if (line.contains("CANON_EQ"))          mask |= Pattern.CANON_EQ;
         if (line.contains("CASE_INSENSITIVE"))  mask |= Pattern.CASE_INSENSITIVE;
         if (line.contains("DOTALL"))            mask |= Pattern.DOTALL;
         if (line.contains("COMMENTS"))          mask |= Pattern.COMMENTS;
         if (line.contains("LITERAL"))           mask |= Pattern.LITERAL;
         if (line.contains("MULTILINE"))         mask |= Pattern.MULTILINE;
         if (line.contains("UNICODE_CASE"))      mask |= Pattern.UNICODE_CASE;
        
         return mask;
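The body above matches flag names with String.contains and checks only seven of the nine flags listed in the class description. A stricter variant might split the flag line into words, also accept UNIX_LINES and UNICODE_CHARACTER_CLASS, and reject unknown names. The sketch below is hypothetical (FlagLineSketch, parseFlagLine) and is not how RegExFiles behaves.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class FlagLineSketch {
    // All nine Pattern flag names from the flag table.
    private static final Map<String, Integer> FLAGS = new HashMap<>();
    static {
        FLAGS.put("CANON_EQ", Pattern.CANON_EQ);
        FLAGS.put("CASE_INSENSITIVE", Pattern.CASE_INSENSITIVE);
        FLAGS.put("DOTALL", Pattern.DOTALL);
        FLAGS.put("COMMENTS", Pattern.COMMENTS);
        FLAGS.put("LITERAL", Pattern.LITERAL);
        FLAGS.put("MULTILINE", Pattern.MULTILINE);
        FLAGS.put("UNICODE_CASE", Pattern.UNICODE_CASE);
        FLAGS.put("UNICODE_CHARACTER_CLASS", Pattern.UNICODE_CHARACTER_CLASS);
        FLAGS.put("UNIX_LINES", Pattern.UNIX_LINES);
    }

    /** Hypothetical: ORs together the flags named on a "##" line, separated by spaces, commas or '|'. */
    static int parseFlagLine(String line) {
        if (!line.startsWith("##")) throw new IllegalArgumentException("Not a flag line: " + line);
        int mask = 0;
        for (String word : line.substring(2).trim().split("[\\s,|]+")) {
            if (word.isEmpty()) continue; // a bare "##" names no flags
            Integer bit = FLAGS.get(word);
            if (bit == null) throw new IllegalArgumentException("Unknown Pattern flag: " + word);
            mask |= bit;
        }
        return mask;
    }

    public static void main(String[] args) {
        int mask = parseFlagLine("## CASE_INSENSITIVE | DOTALL");
        System.out.println(Pattern.compile("a.b", mask).matcher("A\nB").matches()); // true
    }
}
```

Matching whole words avoids the substring matches that String.contains allows, and rejecting unknown names turns a misspelled flag into an error instead of a silently ignored one.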