public class DataSets extends Object
Modifier and Type | Field and Description |
---|---|
static String | DELIMITER_COMMA |
static String | DELIMITER_SEMICOLON |
static String | DELIMITER_SPACE |
static String | DELIMITER_TAB |
Constructor and Description |
---|
DataSets() |
Modifier and Type | Method and Description |
---|---|
static CsvFormat | detectCsvFormat(String fileName) |
static float[] | oneHotEncode(String hotLabel, String[] allLabels) - Returns a one-hot encoded vector for the given label. |
static TabularDataSet | readCsv(File csvFile, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter) - Creates and returns a data set from the specified CSV file. |
static javax.visrec.ml.data.DataSet | readCsv(String fileName, int numInputs, int numOutputs) - Creates a data set from a CSV file, using a comma (,) as the default delimiter and no header (column names) in the first row. |
static TabularDataSet | readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames) |
static TabularDataSet | readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter) |
static TabularDataSet | readCsv(String fileName, int numInputs, int numOutputs, String delimiter) |
static MaxScaler | scaleToMax(javax.visrec.ml.data.DataSet dataSet) |
static MinMaxScaler | scaleToMinMax(javax.visrec.ml.data.DataSet dataSet) |
static TrainTestPair | trainTestSplit(javax.visrec.ml.data.DataSet<?> dataSet, double split) |
public static final String DELIMITER_SPACE
public static final String DELIMITER_COMMA
public static final String DELIMITER_SEMICOLON
public static final String DELIMITER_TAB
public static TabularDataSet readCsv(File csvFile, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter) throws FileNotFoundException, IOException
csvFile - CSV file
numInputs - number of input values in a row
numOutputs - number of output values in a row
hasColumnNames - true if the first row contains column names
delimiter - delimiter character used to separate values in a row
FileNotFoundException - if the file was not found
IOException - if there was an error reading the file
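The parameters above describe a row layout in which the first numInputs values are inputs and the next numOutputs values are outputs. A minimal sketch of that layout follows, for illustration only: the class `CsvRowSplitSketch`, its `Row` type, and the `parse` method are hypothetical names and not part of this API, and the handling of blank lines and malformed rows is an assumption rather than documented behavior.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of DataSets) that mimics the documented row layout. */
final class CsvRowSplitSketch {

    /** One parsed row: numInputs input values followed by numOutputs output values. */
    static final class Row {
        final float[] inputs;
        final float[] outputs;

        Row(float[] inputs, float[] outputs) {
            this.inputs = inputs;
            this.outputs = outputs;
        }
    }

    static List<Row> parse(Reader source, int numInputs, int numOutputs,
                           boolean hasColumnNames, String delimiter) throws IOException {
        List<Row> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            boolean firstLine = true;
            while ((line = reader.readLine()) != null) {
                if (firstLine) {
                    firstLine = false;
                    if (hasColumnNames) {
                        continue; // the first row holds column names, not data
                    }
                }
                if (line.trim().isEmpty()) {
                    continue; // assumption: blank lines are skipped
                }
                // Quote the delimiter so characters such as "|" are not read as regex syntax.
                String[] values = line.split(Pattern.quote(delimiter));
                if (values.length != numInputs + numOutputs) {
                    throw new IOException("Expected " + (numInputs + numOutputs)
                            + " values but found " + values.length + ": " + line);
                }
                float[] inputs = new float[numInputs];
                float[] outputs = new float[numOutputs];
                for (int i = 0; i < numInputs; i++) {
                    inputs[i] = Float.parseFloat(values[i].trim());
                }
                for (int i = 0; i < numOutputs; i++) {
                    outputs[i] = Float.parseFloat(values[numInputs + i].trim());
                }
                rows.add(new Row(inputs, outputs));
            }
        }
        return rows;
    }
}
```

Reading from a `Reader` rather than a `File` keeps the sketch easy to try with a `StringReader`; non-numeric values surface as an unchecked `NumberFormatException`.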
TODO: Detect whether the first line contains labels; if it does not, set class1, class2, class3 in classifier evaluation. Also detect the types of attributes. Move this method to a factory class, or make it a default method in the data set?
TODO: Autodetect the delimiter and column types.
public static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames, String delimiter) throws IOException
IOException
public static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, boolean hasColumnNames) throws IOException
IOException
public static TabularDataSet readCsv(String fileName, int numInputs, int numOutputs, String delimiter) throws IOException
IOException
public static javax.visrec.ml.data.DataSet readCsv(String fileName, int numInputs, int numOutputs) throws IOException
fileName - Name of the CSV file
numInputs - Number of input columns
numOutputs - Number of output columns
IOException
public static CsvFormat detectCsvFormat(String fileName) throws FileNotFoundException, IOException
FileNotFoundException
IOException
public static MaxScaler scaleToMax(javax.visrec.ml.data.DataSet dataSet)
public static MinMaxScaler scaleToMinMax(javax.visrec.ml.data.DataSet dataSet)
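This page gives no description for scaleToMax and scaleToMinMax. Going by the names alone, they presumably apply the two common normalizations sketched below on a single array of values. This is an assumption, not the documented behavior: the class `ScalingSketch` is hypothetical, and the real methods take a DataSet and return scaler objects whose details are not shown here.

```java
/** Hypothetical illustration of max and min-max scaling; not the library's implementation. */
final class ScalingSketch {

    /** Divides each value by the largest absolute value, mapping the array into [-1, 1]. */
    static float[] scaleToMax(float[] values) {
        float maxAbs = 0f;
        for (float v : values) {
            maxAbs = Math.max(maxAbs, Math.abs(v));
        }
        float[] scaled = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: an all-zero array is left as zeros to avoid dividing by zero.
            scaled[i] = (maxAbs == 0f) ? 0f : values[i] / maxAbs;
        }
        return scaled;
    }

    /** Maps each value to (v - min) / (max - min), so the result lies in [0, 1]. */
    static float[] scaleToMinMax(float[] values) {
        float min = Float.POSITIVE_INFINITY;
        float max = Float.NEGATIVE_INFINITY;
        for (float v : values) {
            min = Math.min(min, v);
            max = Math.max(max, v);
        }
        float range = max - min;
        float[] scaled = new float[values.length];
        for (int i = 0; i < values.length; i++) {
            // Assumption: a constant array maps to zeros to avoid dividing by zero.
            scaled[i] = (range == 0f) ? 0f : (values[i] - min) / range;
        }
        return scaled;
    }
}
```

In practice such scaling is usually computed per column, with the minimum and maximum taken from the training data and reused for test data.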
public static float[] oneHotEncode(String hotLabel, String[] allLabels)
hotLabel - one label to encode
allLabels - all labels (used to determine size and hot position of encoded vector)
public static TrainTestPair trainTestSplit(javax.visrec.ml.data.DataSet<?> dataSet, double split)
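For oneHotEncode, the parameter notes say that allLabels determines both the size of the encoded vector and the position of the hot entry. A minimal sketch of that idea is given below. The class `OneHotSketch` is hypothetical and not the library's implementation, and throwing an exception for an unknown label is an assumption, since this page does not say what happens in that case.

```java
/** Hypothetical illustration of one-hot encoding as described for oneHotEncode. */
final class OneHotSketch {

    static float[] oneHotEncode(String hotLabel, String[] allLabels) {
        // The vector has one slot per known label; Java initializes all slots to 0.
        float[] vector = new float[allLabels.length];
        for (int i = 0; i < allLabels.length; i++) {
            if (allLabels[i].equals(hotLabel)) {
                vector[i] = 1f; // the hot position is the label's index in allLabels
                return vector;
            }
        }
        // Assumption: an unknown label is treated as a caller error.
        throw new IllegalArgumentException("Unknown label: " + hotLabel);
    }
}
```

With labels {"cat", "dog", "bird"}, encoding "dog" sets only the second position of a three-element vector to 1.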
Copyright © 2022. All rights reserved.