Package Torello.Java
Class RegExFiles
- java.lang.Object
-
- Torello.Java.RegExFiles
-
public class RegExFiles extends java.lang.Object
A utility for saving regular expressions in a text file that may be lazily loaded at run time when needed. The added benefit is avoiding the "double-escaping" that Java source code otherwise requires. Regular expressions are an "escaped language", meaning that the '\' (backslash) character is used constantly to identify different types of characters. The RegExr.com web-site is a convenient place to experiment with regular expressions directly. Anyone who has used a UNIX/BASH shell will have seen that classic commands such as 'grep' and 'find' use regular expressions quite frequently.
Java includes the package java.util.regex.* to provide an interface for Java programmers to use regular expressions. Please review the class java.util.regex.Pattern to understand "Regular Expression Pattern Matching."
This class provides a small framework for saving regular expressions to text files and loading them into memory. These expressions do not need to be "doubly-escaped", as they must be in Java source code, where any '\' (backslash) or " (double-quote) inside a java.lang.String literal must itself be escaped by a backslash.
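To make the cost of double-escaping concrete, here is a minimal sketch that shows one pattern as it could appear in a regular-expression text-file and as it must be written in a Java String literal. The class name DoubleEscapeDemo and the pattern itself are illustrative assumptions, not part of this package.

```java
import java.util.regex.Pattern;

class DoubleEscapeDemo {
    // As it would appear in a regular-expression text-file:   \d+\.\d+
    // As a Java String literal, every backslash must be doubled:
    static final String JAVA_LITERAL = "\\d+\\.\\d+";

    // True when the whole input looks like digits, a dot, then digits.
    static boolean isDecimal(String s) {
        return Pattern.compile(JAVA_LITERAL).matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(JAVA_LITERAL);       // prints \d+\.\d+
        System.out.println(isDecimal("3.14"));  // prints true
        System.out.println(isDecimal("3x14"));  // prints false
    }
}
```

The text-file form is eight characters long; the Java literal needs eleven, and the gap grows with every additional escape.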
More Syntax Notes:
- Any line in a regular expression text-file that begins with a single '#' (hash-tag) is considered a comment line, and ignored.
- Any line that begins with a double '#' ('##', two hash-tags in a row) is expected to name java.util.regex.Pattern flags, such as CASE_INSENSITIVE, DOTALL, etc. Per the parse method body, those flags are applied to the next regular-expression line and then reset.
- Blank lines are always ignored.
- The regular expressions are loaded into a Vector<Pattern> and returned to the programmer.
SAMPLE REG-EX TEXT-FILE:
Regular Expression:
    # Here are the regular expressions for the "Index.java" class in this package.
    # Currently there is only (1) available regular-expression
    # This retrieves the listed date of the file - according to GSUTIL
    # m.group(1) will retrieve the calendar year as a String-Integer (such as "2019")
    # m.group(2) will retrieve the calendar month as a String-Integer
    # Such as: "01" will be returned (January)
    # If the String were: gs://spain.spanishnewsboard.com/ABC.ES/2019/01 - January/18/index.html
    # m.group(3) will retrieve the calendar day as a String-Integer (here it is "18")
    ^\s*gs:\/\/\w+?.spanishnewsboard.com/.+?/(\d\d\d\d)/(\d\d - \w+?)/(\d\d)/index.html\s*$

java.util.regex.Pattern Flags:
    Pattern.flag                         Meaning & Use
    static int CANON_EQ                  Enables canonical equivalence.
    static int CASE_INSENSITIVE          Enables case-insensitive matching.
    static int DOTALL                    Enables dotall mode, in which '.' also matches line terminators such as '\n'.
    static int COMMENTS                  Permits whitespace and comments in pattern. This class is, loosely, an alternative to this flag.
    static int LITERAL                   Enables literal parsing of the pattern.
    static int MULTILINE                 Enables multiline mode, in which '^' and '$' match at the start and end of each line rather than only of the whole input.
    static int UNICODE_CASE              Enables Unicode-aware case folding.
    static int UNICODE_CHARACTER_CLASS   Enables the Unicode version of Predefined character classes and POSIX character classes.
    static int UNIX_LINES                Enables Unix lines mode.
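As a hedged illustration of the sample file above, the sketch below compiles the same pattern from a Java String literal (so every backslash is doubled) and reads the three groups from the example GSUTIL line. The class name SamplePatternDemo and the helper dateParts are hypothetical. Note that, with the pattern exactly as written, group(2) captures the whole "01 - January" path component rather than only "01".

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SamplePatternDemo {
    // The sample pattern, double-escaped only because it lives in Java source here.
    static final Pattern INDEX_FILE = Pattern.compile(
        "^\\s*gs:\\/\\/\\w+?.spanishnewsboard.com/.+?/(\\d\\d\\d\\d)/(\\d\\d - \\w+?)/(\\d\\d)/index.html\\s*$");

    // Returns {group(1), group(2), group(3)}, or null when the line does not match.
    static String[] dateParts(String gsutilLine) {
        Matcher m = INDEX_FILE.matcher(gsutilLine);
        if (!m.find()) return null;
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        String[] parts = dateParts(
            "gs://spain.spanishnewsboard.com/ABC.ES/2019/01 - January/18/index.html");
        System.out.println(String.join(" | ", parts)); // prints 2019 | 01 - January | 18
    }
}
```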
Method-Name Acronym Note:
For all of these, the acronym LFEC stands for: Load File Exception Catch
The purpose of these methods is to guarantee that:
- A file is loaded, without error, into memory
- On exception, a message is printed and the entire program halts, because a critical data-file didn't load.
Methods in this class whose names contain the three letters JAR load data from a different place than the standard file system. If a programmer has already saved his or her data to a jar-file, these methods will read that data from the jar-file instead.
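The "load, or print a message and halt" guarantee described in the acronym note can be sketched as follows. This is not the library's LFEC.ERROR_EXIT; it assumes that "halts" means something like System.exit(1), and the names LoadOrExitSketch, loadOrExit and tempFileWith are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class LoadOrExitSketch {
    // Loads every line of a file; on any failure prints a message and halts.
    static List<String> loadOrExit(Path file) {
        try {
            return Files.readAllLines(file);
        } catch (Throwable t) {
            System.err.println("Attempt to load file: [" + file + "], failed.");
            t.printStackTrace();
            System.exit(1);   // assumption: "halts" means exiting the JVM
            return null;      // never reached at run time, but required by the compiler
        }
    }

    // Demo helper (hypothetical): writes text to a temporary file.
    static Path tempFileWith(String text) {
        try {
            Path p = Files.createTempFile("regex", ".txt");
            Files.write(p, text.getBytes(StandardCharsets.UTF_8));
            return p;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path p = tempFileWith("# comment\n^\\d+$\n");
        System.out.println(loadOrExit(p).size() + " lines loaded"); // prints 2 lines loaded
    }
}
```

Callers of such a method never need a try/catch of their own: either the data is available, or the program has already stopped with a diagnostic.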
Hi-Lited Source-Code: Torello/Java/RegExFiles.java
File Size: 11,462 Bytes; Line Count: 281 '\n' characters found
Stateless Class:
This class neither contains any program-state, nor can it be instantiated. The @StaticFunctional Annotation may also be called 'The Spaghetti Report'. Static-Functional classes are, essentially, C-Styled Files, without any constructors or non-static member fields. It is a concept very similar to the Java-Bean's @Stateless Annotation.
- 1 Constructor(s), 1 declared private, zero-argument constructor
- 5 Method(s), 5 declared static
- 0 Field(s)
-
-
Method Summary
Methods: Load File, Exception-Catch
    Modifier and Type                  Method
    static Vector<Pattern>             LFEC(String f)
    static Vector<Pattern>             LFEC_JAR(Class<?> c, String f)
    static Vector<Pattern>             LFEC_JAR_ZIP(Class<?> c, String f)
Internal, Protected Methods
    Modifier and Type                  Method
    protected static int               generateFlags(String line)
    protected static Vector<Pattern>   parse(Vector<String> file, String name)
-
-
-
Method Detail
-
LFEC
public static java.util.Vector<java.util.regex.Pattern> LFEC (java.lang.String f)
This loads a regular expression text file. Each line is interpreted as a new regular-expression Pattern.
This method expects an entire regular expression to fit on a single line; each line containing text-data (without a starting '#') will therefore be compiled into its own regular expression. Use '\n' within an expression to match a newline.
Some Syntax Rules:
- Comment lines are lines beginning with the POUND ('#') sign.
- Blank lines are ignored by the file-parse completely.
- Lines with only white-space are considered blank.
- Flag Lines are lines that begin with two, successive, POUND ('##') signs.
- All non-comment, non-blank and non-flag lines are converted into Regular-Expression Pattern's
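The syntax rules above can be sketched on an in-memory list of lines. This is a simplified stand-in, not the library's parse method: the names SyntaxRulesSketch and compileLines are hypothetical, and only two flag names are recognized here to keep the example short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

class SyntaxRulesSketch {
    static List<Pattern> compileLines(List<String> lines) {
        List<Pattern> out = new ArrayList<>();
        int flags = 0;                                   // flags pending for the next pattern
        for (String line : lines) {
            if (line.trim().isEmpty()) continue;         // blank, or white-space only
            if (line.startsWith("##")) {                 // flag line
                flags = 0;
                if (line.contains("CASE_INSENSITIVE")) flags |= Pattern.CASE_INSENSITIVE;
                if (line.contains("DOTALL"))           flags |= Pattern.DOTALL;
                continue;
            }
            if (line.startsWith("#")) continue;          // comment line
            out.add(Pattern.compile(line, flags));       // everything else is a regex
            flags = 0;                                   // flags cover one pattern only
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pattern> patterns = compileLines(Arrays.asList(
            "# A comment line",
            "",
            "   ",
            "## CASE_INSENSITIVE",
            "hello\\s+world",      // in a text-file this would read: hello\s+world
            "abc"));
        System.out.println(patterns.size());                                     // prints 2
        System.out.println(patterns.get(0).matcher("HELLO   World").matches());  // prints true
        System.out.println(patterns.get(1).matcher("ABC").matches());            // prints false
    }
}
```

The last line prints false because the flag line only affects the pattern that immediately follows it.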
LFEC Note:
This method will halt program execution if any exception occurs when loading a Regular-Expression text file! This is the primary purpose of all 'LFEC' (Load File Exception Catch) methods.
- Parameters:
f - Filename of a Regular Expression text-file
- Returns:
- A Vector containing one compiled regular expression per regular-expression line. Comment lines & blank lines will all be ignored.
- See Also:
Pattern, generateFlags(String), LFEC.ERROR_EXIT(Throwable, String)
- Code:
- Exact Method Body:
    try {
        return parse(FileRW.loadFileToVector(f, false), f);
    } catch (Throwable t) {
        LFEC.ERROR_EXIT(t, "Attempt to load Regular Expression file: [" + f + "], failed.\n");
    }

    return null; // Should NOT be possible to reach this statement...
-
LFEC_JAR
public static java.util.Vector<java.util.regex.Pattern> LFEC_JAR (java.lang.Class<?> c, java.lang.String f)
This does the exact same thing as LFEC, but loads the file into a Vector using the "JAR File" information included here. In this case, parameter f names a resource that is located through the class-loader of c. It will not load from the standard file-system.
Java's getResourceAsStream:
The JAR implies that Java's "load resource as stream" features are being used in place of standard file i/o routines. Specifically, this loads from a JAR file, as seen below:
    BufferedReader br = new BufferedReader(new InputStreamReader(c.getResourceAsStream(f)));
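A slightly more defensive version of the line above might look like the following sketch. It is an assumption-laden illustration rather than this class's implementation: the names ResourceLinesSketch, readLines and resourceLines are hypothetical, UTF-8 is assumed as the file encoding, and a missing resource is reported with an exception because Class.getResourceAsStream returns null when nothing is found.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Vector;

class ResourceLinesSketch {
    // Reads every line of a stream into a Vector, then closes the stream.
    static Vector<String> readLines(InputStream in) {
        Vector<String> lines = new Vector<>();
        try (BufferedReader br =
                 new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String s;
            while ((s = br.readLine()) != null) lines.addElement(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    // Resolves the resource relative to the package of c, as getResourceAsStream does.
    static Vector<String> resourceLines(Class<?> c, String name) {
        InputStream in = c.getResourceAsStream(name);
        if (in == null) throw new IllegalArgumentException("Resource not found: " + name);
        return readLines(in);
    }

    public static void main(String[] args) {
        // An in-memory stream stands in for a jar resource in this demo.
        byte[] data = "# comment\n^\\d+$\n".getBytes(StandardCharsets.UTF_8);
        System.out.println(readLines(new ByteArrayInputStream(data)).size()); // prints 2
    }
}
```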
- Parameters:
c - The class that is loading the file. It is not too important to use the "exact class": the class is only needed because the "Class Loader" uses the package name of that class to work out the directory / sub-directory where the data-file is stored. This variable may not be null.
EXAMPLE: If a "Regular Expressions.txt" file is stored in the same BASH/Debian/etc... directory as a class, passing that class to 'RegExFiles' loads the text-file "Regular Expressions.txt" into memory quickly. The primary purpose being that text files are much easier to read than 'double-escaped' Java String's.
NOTE: It might be important to read the Java Doc's about the 'getResourceAsStream(String)' method for retrieving data that was stored to a JAR file instead of a UNIX/BASH/MS-DOS system file. Oracle's Java 8 documentation would help.
NOTE: The symbols <?> appended to the (almost) 'raw-type' here are only there to prevent the java-compiler from issuing warnings regarding the use of "Raw Types." This warning is, actually, only issued if the command-line option -Xlint:all is used.
f - This is a file-pointer to a file stored inside a Java JAR file.
- Returns:
- A Vector containing one compiled regular expression per line. Comment lines & blank lines will all be ignored.
- See Also:
LFEC(String), parse(Vector, String), LFEC.ERROR_EXIT(Throwable, String)
- Code:
- Exact Method Body:
    try (
        InputStream     is = c.getResourceAsStream(f);
        BufferedReader  br = new BufferedReader(new InputStreamReader(is));
    ) {
        String          s    = "";
        StringBuilder   sb   = new StringBuilder();
        Vector<String>  file = new Vector<String>();

        while ((s = br.readLine()) != null) file.addElement(s);

        return parse(file, f);
    } catch (Throwable t) {
        LFEC.ERROR_EXIT(
            t,
            "Attempted to load Regular Expression file: [" + f + "]\n" +
            "From jar-file using class: [" + c.getCanonicalName() + "]\n" +
            "Did not load successfully."
        );
    }

    // Should NOT be possible to reach this statement...
    // Compiler does not recognize LFEC.ERROR_EXIT
    return null;
-
LFEC_JAR_ZIP
public static java.util.Vector<java.util.regex.Pattern> LFEC_JAR_ZIP (java.lang.Class<?> c, java.lang.String f)
This is identical to LFEC_JAR, except that it presumes the file was compressed before saving. Per the method body, the resource is expected to be a GZIP-compressed, serialized java.lang.String.
- Parameters:
c - The class that is loading the file. It is not too important to use the "exact class": the class is only needed because the "Class Loader" uses the package name of that class to work out the directory / sub-directory where the data-file is stored. This variable may not be null. Again, the class-loader looks in the directory of the package that contains this class!
NOTE: The method public static Vector<Pattern> LFEC_JAR(Class, String) has a more detailed look at the particular use of this parameter. The easy way to understand it is: just pass the class that is doing the actual loading of the regular-expression (presuming the regex.dat file is in the same directory as the '.class' file!)
NOTE: The symbols <?> appended to the (almost) 'raw-type' here are only there to prevent the java-compiler from issuing warnings regarding the use of "Raw Types." This warning is, actually, only issued if the command-line option -Xlint:all is used.
f - This is a file-pointer to a file stored inside a Java JAR file.
- Returns:
- A Vector containing one compiled regular expression per line. Comment lines & blank lines will all be ignored.
- See Also:
LFEC_JAR(Class, String), parse(Vector, String), LFEC.ERROR_EXIT(Throwable, String)
- Code:
- Exact Method Body:
    try (
        InputStream         is   = c.getResourceAsStream(f);
        GZIPInputStream     gzip = new GZIPInputStream(is);
        ObjectInputStream   ois  = new ObjectInputStream(gzip);
    ) {
        Object          ret        = ois.readObject();
        String          fileStr    = (String) ret;
        Vector<String>  file       = new Vector<>();
        int             newLinePos = 0;

        while ((newLinePos = fileStr.indexOf('\n')) != -1) {
            file.addElement(fileStr.substring(0, newLinePos));
            fileStr = fileStr.substring(newLinePos + 1);
        }

        return parse(file, f);
    } catch (Throwable t) {
        LFEC.ERROR_EXIT(t,
            "Attempted to load Regular Expression file: [" + f + "]\n" +
            "From jar-file using class: [" + c.getCanonicalName() + "]\n" +
            "Content was zipped, but failed to load."
        );
    }

    return null; // Should NOT be possible to reach this statement...
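The method body above reads a serialized String through a GZIP stream. As a rough sketch of how such content might be produced and read back, the following round-trips a String in memory. The names GzipStringSketch, pack and unpack are hypothetical; how this package actually writes its compressed files is not shown in this documentation and is an assumption here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Vector;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class GzipStringSketch {
    // Serializes the text as one String object inside a GZIP stream.
    static byte[] pack(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            oos.writeObject(text);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the String back and splits it at each '\n', like the loop above.
    static Vector<String> unpack(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(data)))) {
            String fileStr = (String) ois.readObject();
            Vector<String> lines = new Vector<>();
            int start = 0, nl;
            while ((nl = fileStr.indexOf('\n', start)) != -1) {
                lines.addElement(fileStr.substring(start, nl));
                start = nl + 1;
            }
            return lines;   // text after the last '\n' is not emitted
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unpack(pack("# comment\n^\\d+$\n")).size()); // prints 2
        System.out.println(unpack(pack("# comment\n^\\d+$")).size());   // prints 1
    }
}
```

As the second call shows, a splitting loop of this shape only emits text that is followed by '\n', so a final regular expression without a trailing newline would be dropped; ending the stored text with a newline avoids that.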
-
parse
protected static java.util.Vector<java.util.regex.Pattern> parse (java.util.Vector<java.lang.String> file, java.lang.String name)
This does the exact same thing as LFEC, but takes a "pre-loaded file" as a Vector. This is an internal method, used to ensure that the methods LFEC_JAR and LFEC do the exact same thing.
- Parameters:
file - This presumes that the regular-expression text-file has been loaded into a Vector<String> (w/out the "include newlines" option!)
name - The name of the file being loaded, required so that error messages are more informative.
- Returns:
- A Vector containing one compiled regular expression per line. Comment lines & blank lines will all be ignored.
- See Also:
LFEC(String)
- Code:
- Exact Method Body:
    try {
        Vector<Pattern> ret   = new Vector<Pattern>();
        int             flags = 0;

        for (String line : file) {
            if (line.trim().length() == 0) continue;

            if (line.charAt(0) == '#') {
                if (line.length() > 1)
                    if (line.charAt(1) == '#')
                        flags = generateFlags(line);
                continue;
            }

            if (flags != 0) ret.add(Pattern.compile(line, flags));
            else            ret.add(Pattern.compile(line));

            flags = 0;
        }

        return ret;
    } catch (Throwable t) {
        LFEC.ERROR_EXIT(t, "error parsing regular expression file: " + name);
    }

    return null; // Should NOT be possible to reach this statement...
-
generateFlags
protected static int generateFlags(java.lang.String line)
The flag names have been copied from Java's regular-expression class Pattern. This is a helper function that converts the text String's into their integer constants, so that a user may include these text String's in a regular expression file. Per the method body, seven names are recognized; UNICODE_CHARACTER_CLASS and UNIX_LINES are not checked.
NOTE: The regular expression loader will only load regular expressions that fit on a single line of text. Other than lines that begin with a comment, each line is interpreted as an independent Regular Expression.
- See Also:
Pattern
- Code:
- Exact Method Body:
    int mask = 0;

    if (line.contains("CANON_EQ"))          mask |= Pattern.CANON_EQ;
    if (line.contains("CASE_INSENSITIVE"))  mask |= Pattern.CASE_INSENSITIVE;
    if (line.contains("DOTALL"))            mask |= Pattern.DOTALL;
    if (line.contains("COMMENTS"))          mask |= Pattern.COMMENTS;
    if (line.contains("LITERAL"))           mask |= Pattern.LITERAL;
    if (line.contains("MULTILINE"))         mask |= Pattern.MULTILINE;
    if (line.contains("UNICODE_CASE"))      mask |= Pattern.UNICODE_CASE;

    return mask;
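To show what the resulting bit-mask does, here is a table-driven sketch of the same name-to-constant conversion, followed by a pattern compiled with the combined mask. The names FlagLineSketch and mask are hypothetical, and the map-based layout is a stylistic alternative, not the method body shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

class FlagLineSketch {
    // Flag names as they may appear on a '##' line, mapped to their Pattern constants.
    static final Map<String, Integer> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put("CANON_EQ",         Pattern.CANON_EQ);
        FLAGS.put("CASE_INSENSITIVE", Pattern.CASE_INSENSITIVE);
        FLAGS.put("DOTALL",           Pattern.DOTALL);
        FLAGS.put("COMMENTS",         Pattern.COMMENTS);
        FLAGS.put("LITERAL",          Pattern.LITERAL);
        FLAGS.put("MULTILINE",        Pattern.MULTILINE);
        FLAGS.put("UNICODE_CASE",     Pattern.UNICODE_CASE);
    }

    // ORs together the constant of every flag name found anywhere in the line.
    static int mask(String line) {
        int mask = 0;
        for (Map.Entry<String, Integer> e : FLAGS.entrySet())
            if (line.contains(e.getKey())) mask |= e.getValue();
        return mask;
    }

    public static void main(String[] args) {
        int m = mask("## CASE_INSENSITIVE, DOTALL");
        Pattern p = Pattern.compile("a.b", m);
        // Case is ignored, and '.' is allowed to match the newline.
        System.out.println(p.matcher("A\nB").matches()); // prints true
    }
}
```

Because the lookup is a substring test, any occurrence of a flag name on a '##' line switches that flag on, whatever separators are used between names.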
-
-