Package Torello.Languages
Class Helper
- java.lang.Object
  - Torello.Languages.Helper
public class Helper extends java.lang.Object
Source file: Torello/Languages/Helper.java
File Size: 2,927 Bytes. Line Count: 105 '\n' characters found.
Constructor Summary
Constructor    Description
Helper()
Method Summary
Modifier and Type        Method
static void              main(String[] argv)
static Vector<String>    splitOnWhiteSpace(String text, StringBuilder sbDOUT)
Constructor Detail
Helper
public Helper()
Method Detail
main
public static void main(java.lang.String[] argv) throws java.io.IOException
- Throws:
java.io.IOException
- Code:
- Exact Method Body:
    System.out.println(
        "WHITE_SPACE: " + WHITE_SPACE + '\n' +
        "PUNCTUATION: " + PUNCTUATION + '\n' +
        "Exiting..."
    );

    System.exit(1);

    // The File "Regular Expressions.txt" is missing. I still don't have time
    FileRW.writeObjectToFileNOCNFE(
        FileRW.loadFileToString("Torello/Languages/FNA/Regular Expressions.txt"),
        DATA_FILE, true
    );
-
splitOnWhiteSpace
public static java.util.Vector<java.lang.String> splitOnWhiteSpace (java.lang.String text, java.lang.StringBuilder sbDOUT)
Splits a sentence into words and removes the punctuation surrounding each word.
- Parameters:
text - A sentence, usually in a foreign language. Works on any String.
sbDOUT - A "developer notes" or "debug notes" output buffer. If null, the notes are simply discarded.
- Returns:
A list of words, each trimmed of white-space and of leading and trailing punctuation. Words that are empty after trimming are omitted.
- Code:
- Exact Method Body:
    Vector<String> ret = new Vector<String>();
    Matcher m1 = WHITE_SPACE.matcher(text);

    while (m1.find())
    {
        String word = m1.group(1).trim();
        DOUT(sbDOUT, "\nP1: [" + word + "], len=" + word.length() + " ");

        if (word.length() == 0)
        {
            DOUT(sbDOUT, "\tSkipping, zero length word.\n");
            continue;
        }

        Matcher m2 = PUNCTUATION.matcher(word);

        if (m2.find()) word = m2.group(2).trim();
        else
        {
            DOUT(sbDOUT, "\tSkipping, PUNCTUATION RegEx found no match.\n");
            continue;
        }

        DOUT(sbDOUT, "\nP2: [" + word + "], len=" + word.length() + " ");

        if (word.length() == 0)
        {
            DOUT(sbDOUT, "\tSkipping, zero-length, punctuation-stripped word\n");
            continue;
        }

        ret.addElement(word);
    }

    DOUT(sbDOUT, "\n");
    return ret;
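
The method body above relies on WHITE_SPACE, PUNCTUATION and DOUT, none of which are shown on this page. The following is a minimal, self-contained sketch of the same two-pass approach, for illustration only. The class name SplitSketch, both regular expressions, and the DOUT helper are assumptions chosen to fit how the method body uses them (group 1 of WHITE_SPACE is the raw token, group 2 of PUNCTUATION is the stripped word); the real definitions in Helper may differ.

```java
import java.util.Vector;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SplitSketch {
    // ASSUMPTION: stand-in for the real WHITE_SPACE pattern.
    // Group 1 captures one run of non-white-space characters.
    static final Pattern WHITE_SPACE = Pattern.compile("(\\S+)");

    // ASSUMPTION: stand-in for the real PUNCTUATION pattern.
    // Group 1 = leading punctuation, group 2 = the word, group 3 = trailing punctuation.
    static final Pattern PUNCTUATION = Pattern.compile("^(\\p{P}*)(.*?)(\\p{P}*)$");

    // ASSUMPTION: DOUT appends a note to the buffer and does nothing when it is null.
    static void DOUT(StringBuilder sb, String note) {
        if (sb != null) sb.append(note);
    }

    static Vector<String> splitOnWhiteSpace(String text, StringBuilder sbDOUT) {
        Vector<String> ret = new Vector<>();
        Matcher m1 = WHITE_SPACE.matcher(text);

        while (m1.find()) {
            // Pass 1: isolate a white-space delimited token.
            String word = m1.group(1).trim();
            DOUT(sbDOUT, "\nP1: [" + word + "], len=" + word.length() + " ");
            if (word.length() == 0) continue;

            // Pass 2: strip punctuation from both ends of the token.
            Matcher m2 = PUNCTUATION.matcher(word);
            if (!m2.find()) continue;
            word = m2.group(2).trim();
            DOUT(sbDOUT, "\nP2: [" + word + "], len=" + word.length() + " ");

            // Tokens made only of punctuation become empty and are dropped.
            if (word.length() == 0) continue;
            ret.addElement(word);
        }

        DOUT(sbDOUT, "\n");
        return ret;
    }

    public static void main(String[] args) {
        StringBuilder notes = new StringBuilder();
        // First line printed: [Hello, world, test, ok]
        System.out.println(splitOnWhiteSpace("Hello, world! (test) --- ok", notes));
        // The collected debug notes follow.
        System.out.println(notes);
    }
}
```

Under these assumed patterns, punctuation inside a word (for example the apostrophe in "don't") is kept, because only leading and trailing punctuation is stripped. Passing null for sbDOUT produces the same word list without collecting any notes.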