com.billpringle.utils
Class WrpCsv

java.lang.Object
  extended by com.billpringle.utils.WrpCsv

public class WrpCsv
extends java.lang.Object

This class encapsulates a parser for CSV files. It can parse a string directly, or read and parse CSV data line by line from a buffered input stream, such as a file.

The input string is separated into fields by any of the specified delimiters; the default delimiter is a comma.

This implementation is based on a similar routine that I wrote in C, which was in turn based loosely on Kernighan and Pike's "The Practice of Programming." The C version and this Java version are both available under a Creative Commons license (Attribution, Non-Commercial, and Share-Alike).

This class is normally called from other classes, but it also contains a main routine that serves as a test driver.

Unless noted otherwise, all materials available for download from my site are copyrighted by Bill Pringle, and are licensed under a Creative Commons License.

Author:
Bill Pringle

Field Summary
private static java.lang.String defaultDelims
          Default collection of delimiters.
private static int defaultMaxFields
          Default maximum number of fields.
private static int defaultMaxLine
          Default maximum size of the input line.
protected  java.lang.String delims
          Possible field separators.
protected  java.lang.String errorMessage
          Error message for last error.
protected  java.util.Vector<java.lang.String> fields
          List of fields parsed from input line
protected  java.lang.Integer maxFields
          Maximum number of fields expected.
protected  int maxLine
          Maximum input line size (not used)
protected  java.lang.Integer minFields
          Minimum number of fields expected (not used)
protected  java.lang.Integer numFields
          Number of fields found in the input line
protected  java.lang.String parseLine
          Input line to be split into fields
 
Constructor Summary
WrpCsv()
          Default constructor.
WrpCsv(java.lang.String str)
          Creates a new instance of WrpCsv and parses the specified string into fields.
WrpCsv(java.lang.String str, java.lang.String delstr)
          Creates a new instance of WrpCsv and parses the specified string into fields, using the specified delimiter(s).
 
Method Summary
 void clearErrorMessage()
          Clear the current error message.
private  java.lang.String copyField(java.lang.String str)
          Copy string, transforming consecutive double quotes into one.
 java.lang.String getDelims()
          Return current delimiters.
 java.lang.String getErrorMessage()
          Return the most recent error message.
 java.lang.String getField(int loc)
          Get the specified field.
 java.lang.String getLine()
          Return the original line.
 int getMaxFields()
          Return the current maximum number of fields.
 int getMaxLine()
          Get the maximum line size (not really used)
 int getNumFields()
          Return the current number of fields.
private  void init()
          Initializes class variables.
static void main(java.lang.String[] args)
          Test driver.
private  int nonquotedField(java.lang.String str, int sloc)
          Return the offset for the terminating delimiter for a non-quoted field.
 int parseString(java.lang.String str)
          Main parsing routine - parse string into fields.
 int parseString(java.lang.String str, java.lang.String delims)
          Parse string using specified delimiters.
private  int quotedField(java.lang.String str, int sloc)
          Return index location of the closing quote for the current quoted field.
 int readLine(java.io.BufferedReader inp)
          Read a line from an input stream and parse it.
 void setDelims(java.lang.String str)
          Set field delimiters for current CSV.
 void setMaxFields(int val)
          Set the maximum number of fields.
 void setMaxLine(int val)
          Set the maximum size of the input line
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Field Detail

defaultMaxFields

private static int defaultMaxFields
Default maximum number of fields. This is the largest number of fields that the CSV structure can handle. It can be overridden using the setMaxFields method, which will increase the maximum number of fields.


defaultMaxLine

private static int defaultMaxLine
Default maximum size of the input line. This is the longest input line that can be handled. It can be overridden by using the setMaxLine method.


defaultDelims

private static java.lang.String defaultDelims
Default collection of delimiters. Any character within this string will act as a delimiter. Normally this string consists of a single character, but it is possible to use more than one character as a delimiter. If more than one character appears in the delimiter string, then any of those characters will be treated as a delimiter. In other words, you cannot define a combination of characters to form a single delimiter; each character acts as a delimiter on its own.
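
The "any character is a delimiter" rule can be illustrated with a minimal sketch. DelimiterSetSketch and its split method are hypothetical names used only for this example; they are not part of WrpCsv, and quoting is ignored here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not the WrpCsv source.
final class DelimiterSetSketch {

    // Split the line at every character that appears in delims.
    static List<String> split(String line, String delims) {
        List<String> out = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            // indexOf >= 0 means this character is one of the delimiters
            if (delims.indexOf(line.charAt(i)) >= 0) {
                out.add(line.substring(start, i));
                start = i + 1;
            }
        }
        out.add(line.substring(start)); // last field, possibly empty
        return out;
    }

    public static void main(String[] args) {
        System.out.println(split("a|b/c", "|/")); // [a, b, c]
        System.out.println(split("a|/b", "|/"));  // [a, , b]
    }
}
```

Note that "|/" in the second call yields an empty field between the two delimiters; the pair is not treated as one combined separator.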


maxLine

protected int maxLine
Maximum input line size (not used)


parseLine

protected java.lang.String parseLine
Input line to be split into fields


fields

protected java.util.Vector<java.lang.String> fields
List of fields parsed from input line


numFields

protected java.lang.Integer numFields
Number of fields found in the input line


maxFields

protected java.lang.Integer maxFields
Maximum number of fields expected. Parsing will terminate when this many fields are encountered. If more fields are found, they will be contained within the last field parsed.
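
A comparable behavior exists in the standard library: String.split with a positive limit also leaves the unparsed remainder in the last element. The sketch below uses that as an analogy only. MaxFieldsSketch is a hypothetical name, and whether WrpCsv keeps the remainder in exactly this form is an assumption.

```java
import java.util.Arrays;

// Analogy using the standard library; not the WrpCsv implementation.
final class MaxFieldsSketch {

    static String[] splitWithLimit(String line, int maxFields) {
        // With a positive limit, split makes at most maxFields - 1 cuts,
        // so the last element holds everything that was not parsed.
        // The "," argument is a regular expression that matches a literal comma.
        return line.split(",", maxFields);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(splitWithLimit("a,b,c,d", 3))); // [a, b, c,d]
    }
}
```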


minFields

protected java.lang.Integer minFields
Minimum number of fields expected (not used)


delims

protected java.lang.String delims
Possible field separators. Any of the characters within this string will terminate a field.


errorMessage

protected java.lang.String errorMessage
Error message for last error. If an error status is returned, this value can be retrieved using the getErrorMessage method.

Constructor Detail

WrpCsv

public WrpCsv()
Default constructor.

This constructor must be used if the default values are not appropriate. The client should use this constructor, and then call the appropriate setter routines to modify the default values.

If this constructor is used, one of the parseString methods must be used to parse the string.


WrpCsv

public WrpCsv(java.lang.String str)
Creates a new instance of WrpCsv and parses the specified string into fields.

This method will use the default values and delimiters to parse the fields. If different delimiters are to be used, the client can use the WrpCsv(String, String) constructor.

Parameters:
str - string to be parsed

WrpCsv

public WrpCsv(java.lang.String str,
              java.lang.String delstr)
Creates a new instance of WrpCsv and parses the specified string into fields, using the specified delimiter(s).

This constructor creates a new instance of WrpCsv and parses the string into fields using the specified set of delimiters and the default values for all other settings.

Parameters:
str - input string to parse
delstr - field delimiters
Method Detail

getMaxLine

public int getMaxLine()
Get the maximum line size (not really used)


setMaxLine

public void setMaxLine(int val)
Set the maximum size of the input line


setDelims

public void setDelims(java.lang.String str)
Set field delimiters for current CSV. The default delimiter is a comma (","). More than one character is allowed (e.g., "|/" for bars or slashes); any of the characters will delimit the fields. If the argument is null or empty, this call is ignored.

Parameters:
str - string of delimiter characters
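
The "ignored if null or empty" rule amounts to a guard at the top of the setter. A minimal sketch follows, assuming a comma as the starting value. DelimsHolderSketch is a hypothetical class written for this example, not the WrpCsv source.

```java
// Hypothetical class for illustration only; not the WrpCsv source.
final class DelimsHolderSketch {

    private String delims = ","; // assumed default, per the text above

    void setDelims(String str) {
        if (str == null || str.isEmpty()) {
            return; // invalid argument: keep the current delimiters
        }
        delims = str;
    }

    String getDelims() {
        return delims;
    }

    public static void main(String[] args) {
        DelimsHolderSketch csv = new DelimsHolderSketch();
        csv.setDelims("|/");
        csv.setDelims("");   // ignored
        csv.setDelims(null); // ignored
        System.out.println(csv.getDelims()); // |/
    }
}
```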

getDelims

public java.lang.String getDelims()
Return current delimiters.

Returns:
string of delimiters

getErrorMessage

public java.lang.String getErrorMessage()
Return the most recent error message.

Returns:
the message

clearErrorMessage

public void clearErrorMessage()
Clear the current error message. This method can be called by the client after retrieving the most recent error message.


getMaxFields

public int getMaxFields()
Return the current maximum number of fields.

Returns:
max. number of fields

setMaxFields

public void setMaxFields(int val)
Set the maximum number of fields. If the input contains more than the maximum number of fields, parsing terminates and the last field contains all remaining fields from the string. If val is non-positive, this call is ignored.

Parameters:
val - new max. number of fields

parseString

public int parseString(java.lang.String str)
Main parsing routine - parse string into fields.

This method drives the parsing by scanning the input line and calling either quotedField or nonquotedField to extract each field.

Upon return, each field is stored in the fields vector, and can be accessed using the getField() method.

Parameters:
str - string to parse
Returns:
number of fields found
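
The driver structure described above can be sketched as follows. This is a reconstruction from the description, not the WrpCsv source: CsvLineSketch, closingQuote, nextDelimiter, and parse are hypothetical names, the field limit is left out, and skipping any text between a closing quote and the next delimiter is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for illustration only; not the WrpCsv source.
final class CsvLineSketch {

    // Index of the closing quote, treating "" as an escaped quote.
    // Returns s.length() when the quote is never closed.
    static int closingQuote(String s, int from) {
        int i = from;
        while (i < s.length()) {
            if (s.charAt(i) == '"') {
                if (i + 1 < s.length() && s.charAt(i + 1) == '"') {
                    i += 2; // escaped quote, keep scanning
                    continue;
                }
                return i;
            }
            i++;
        }
        return s.length();
    }

    // Index of the next delimiter at or after 'from', or s.length() if none.
    static int nextDelimiter(String s, int from, String delims) {
        for (int i = from; i < s.length(); i++) {
            if (delims.indexOf(s.charAt(i)) >= 0) {
                return i;
            }
        }
        return s.length();
    }

    // Driver: dispatch on whether the field starts with a double quote.
    static List<String> parse(String line, String delims) {
        List<String> fields = new ArrayList<>();
        int pos = 0;
        while (pos <= line.length()) {
            if (pos < line.length() && line.charAt(pos) == '"') {
                int close = closingQuote(line, pos + 1);
                // Collapse doubled quotes while copying the field.
                fields.add(line.substring(pos + 1, close).replace("\"\"", "\""));
                int afterQuote = Math.min(close + 1, line.length());
                pos = nextDelimiter(line, afterQuote, delims) + 1;
            } else {
                int end = nextDelimiter(line, pos, delims);
                fields.add(line.substring(pos, end));
                pos = end + 1;
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        List<String> fields = parse("1,\"Smith, Jane\",\"say \"\"hi\"\"\"", ",");
        for (int i = 0; i < fields.size(); i++) {
            System.out.println(i + ": " + fields.get(i));
        }
    }
}
```

In this sketch a trailing delimiter produces a final empty field, and an empty input line produces one empty field; the original class may count these cases differently.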

parseString

public int parseString(java.lang.String str,
                       java.lang.String delims)
Parse string using specified delimiters.

The delimiters are saved, and will be used for all future calls unless they are reset.

Parameters:
str - string to parse
delims - delimiter characters to use
Returns:
number of fields found

readLine

public int readLine(java.io.BufferedReader inp)
             throws java.io.IOException
Read a line from an input stream and parse it.

This method can be used to parse each line in a file. Once the CSV object has been created, each call to this method reads and parses the next line of the input file.

After making this call, the client can call the various get routines to retrieve the actual fields.

Parameters:
inp - BufferedReader input file
Returns:
number of fields parsed
Throws:
java.io.IOException
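
The read-then-parse loop looks roughly like the sketch below. It uses only standard library calls: BufferedReader.readLine returns null at end of input, and a plain comma split stands in for the real parser, so quoting is ignored. ReadLineSketch and readAll are hypothetical names. How WrpCsv.readLine itself reports end of file is not stated here, so the loop condition is an assumption about the general pattern only.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration only; not the WrpCsv source.
final class ReadLineSketch {

    // Read every line and split it on commas (quoting ignored for brevity).
    static List<List<String>> readAll(BufferedReader in) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) { // null marks end of input
            // A limit of -1 keeps trailing empty fields.
            rows.add(Arrays.asList(line.split(",", -1)));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // A StringReader stands in for a file so the example is self-contained.
        try (BufferedReader in = new BufferedReader(new StringReader("a,b,c\n1,2,3\n"))) {
            for (List<String> row : readAll(in)) {
                System.out.println(row.size() + " fields: " + row);
            }
        }
    }
}
```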

getNumFields

public int getNumFields()
Return the current number of fields.


getLine

public java.lang.String getLine()
Return the original line.


getField

public java.lang.String getField(int loc)
Get the specified field.


init

private void init()
Initializes class variables.

This method is called by the constructors and the parseString() methods to initialize all class variables.


quotedField

private int quotedField(java.lang.String str,
                        int sloc)
Return index location of the closing quote for the current quoted field. This method is called when a field starts with a double quote.

Internal double quotes are indicated by two double quotes in a row, which are treated as a single character. If no closing quote is found, the location past the end of the string is returned. This means that the value returned can be used as an argument for the substring method to extract the field.

Parameters:
str - string being parsed
sloc - offset to start searching (starting with zero)
Returns:
offset of closing quote (starting with zero)
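
The return-value contract can be shown with a short sketch. It assumes the starting offset points just past the opening quote, which the description does not state explicitly. QuotedFieldSketch and closingQuote are hypothetical names, not the private WrpCsv method.

```java
// Hypothetical helper for illustration only; not the WrpCsv source.
final class QuotedFieldSketch {

    // 'start' is assumed to be the index just past the opening quote.
    static int closingQuote(String str, int start) {
        int i = start;
        while (i < str.length()) {
            if (str.charAt(i) == '"') {
                boolean doubled = i + 1 < str.length() && str.charAt(i + 1) == '"';
                if (!doubled) {
                    return i; // a lone quote closes the field
                }
                i += 2; // two quotes in a row are one embedded quote
            } else {
                i++;
            }
        }
        return str.length(); // unterminated: one past the last character
    }

    public static void main(String[] args) {
        String closed = "\"x,y\",z";
        String open = "\"abc";
        // Both results are valid end arguments for substring.
        System.out.println(closed.substring(1, closingQuote(closed, 1))); // x,y
        System.out.println(open.substring(1, closingQuote(open, 1)));     // abc
    }
}
```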

nonquotedField

private int nonquotedField(java.lang.String str,
                           int sloc)
Return the offset for the terminating delimiter for a non-quoted field.

This method will scan for the first occurrence of any of the currently defined delimiter characters.

Parameters:
str - string being parsed
sloc - offset of where to start searching
Returns:
offset of delimiter

copyField

private java.lang.String copyField(java.lang.String str)
Copy string, transforming consecutive double quotes into one. This method is used to extract the actual field, allowing for consecutive double quotes, which are used to indicate that the field contains a double quote. Since this method is only called when dealing with quoted fields, any double quote we encounter should be followed by a second one. Only the first double quote will be copied.

Parameters:
str - field to copy
Returns:
copy of string
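
A minimal sketch of this copy step, assuming the input is the raw text between the outer quotes. CopyFieldSketch is a hypothetical class written for this example; the real method is private and its source is not shown here.

```java
// Hypothetical helper for illustration only; not the WrpCsv source.
final class CopyFieldSketch {

    static String copyField(String raw) {
        StringBuilder sb = new StringBuilder(raw.length());
        for (int i = 0; i < raw.length(); i++) {
            char c = raw.charAt(i);
            sb.append(c); // the first quote of a pair is copied
            if (c == '"' && i + 1 < raw.length() && raw.charAt(i + 1) == '"') {
                i++; // the second quote of the pair is skipped
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(copyField("say \"\"hi\"\"")); // say "hi"
    }
}
```

A lone double quote, which should not occur inside a well-formed quoted field, is copied unchanged in this sketch.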

main

public static void main(java.lang.String[] args)
Test driver.

Parameters:
args - not used