
Dataset Parameters

The Dataset Parameters allow you to customize the dataset to your needs. All of the parameters below are supported for both TensorFlow and PyTorch.

1. datasetId

It is set by default when you link the dataset with a model. You don't need to re-enter this value.

2. totalDatasetSize

This parameter is set by default based on the selected dataset. It describes the total number of images in the dataset.

3. allClasses

This parameter is set by default based on the selected dataset. It describes the total number of images per class in the dataset.

4. trainingDatasetSize

This parameter is determined by the customisation you make with the trainingClasses parameter. It is the total number of images used for training and evaluating the model after that customisation. By default, it equals totalDatasetSize.

5. trainingClasses

This parameter is used to customise the dataset. It takes a dictionary as input: each key is a class name and each value is the number of images to select from that class. The dictionary must contain every class, and each value must be greater than one.

Example: the selected dataset contains two classes, 'car' and 'person', with 65 and 42 images respectively ({'car': 65, 'person': 42}). A sub-dataset can be created like this:

trainingObject.trainingClasses({'car': 30, 'person': 30})
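The rules for trainingClasses, and how they determine trainingDatasetSize, can be sketched as a small validation function. This is a minimal sketch, not the platform's code: `validate_training_classes` is a hypothetical helper, it assumes trainingDatasetSize is the sum of the selected per-class counts, and it also assumes (not stated above) that a class cannot supply more images than it contains.

```python
def validate_training_classes(all_classes: dict[str, int],
                              training_classes: dict[str, int]) -> int:
    """Hypothetical helper: check a trainingClasses dict and return the
    resulting training dataset size (assumed to be the sum of the values)."""
    missing = set(all_classes) - set(training_classes)
    if missing:
        raise ValueError(f"missing classes: {sorted(missing)}")
    unknown = set(training_classes) - set(all_classes)
    if unknown:
        raise ValueError(f"unknown classes: {sorted(unknown)}")
    for name, count in training_classes.items():
        # Documented rule: every value must be greater than one.
        if not isinstance(count, int) or count <= 1:
            raise ValueError(f"{name}: count must be an integer greater than one")
        # Assumption: a class cannot supply more images than it has.
        if count > all_classes[name]:
            raise ValueError(f"{name}: only {all_classes[name]} images available")
    return sum(training_classes.values())


all_classes = {'car': 65, 'person': 42}
print(validate_training_classes(all_classes, {'car': 30, 'person': 30}))  # 60
print(validate_training_classes(all_classes, all_classes))  # 107, the full dataset
```

Omitting a class, as in {'car': 30}, raises a ValueError in this sketch, which mirrors the requirement that the dictionary contain all classes.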

6. imageShape

This parameter specifies the image shape to be used for training. The value must be an integer between 48 and 224. The default value is 224. Set this parameter like this:

trainingObject.imageShape(124)
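The documented range for imageShape can be expressed as a simple check. This is a hypothetical helper (`validate_image_shape` is not part of the platform); since the default of 224 is allowed, the upper bound is inclusive, and the lower bound of 48 is assumed to be inclusive as well.

```python
MIN_IMAGE_SHAPE = 48
MAX_IMAGE_SHAPE = 224
DEFAULT_IMAGE_SHAPE = 224


def validate_image_shape(value: int = DEFAULT_IMAGE_SHAPE) -> int:
    """Hypothetical check for the documented rule: an integer from 48 to 224."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError("imageShape must be an integer")
    # Assumption: both bounds are inclusive.
    if not MIN_IMAGE_SHAPE <= value <= MAX_IMAGE_SHAPE:
        raise ValueError(
            f"imageShape must be between {MIN_IMAGE_SHAPE} and {MAX_IMAGE_SHAPE}")
    return value


print(validate_image_shape(124))  # 124
print(validate_image_shape())     # 224 (default)
```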

7. imageType

This parameter specifies the image type to be used for training. The supported formats are rgb and grayscale. The default value is rgb. Set this parameter like this:

trainingObject.imageType('rgb')
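imageType decides how many colour channels each image has: RGB images have three and grayscale images have one. The sketch below combines that with imageShape to show the per-image tensor shape a model would see. `input_shape` is a hypothetical helper, and it assumes images are resized to a square of side imageShape; the channel ordering follows the usual framework defaults (channels-last in TensorFlow/Keras, channels-first in PyTorch).

```python
CHANNELS = {'rgb': 3, 'grayscale': 1}  # standard channel counts


def input_shape(image_shape: int = 224, image_type: str = 'rgb',
                channels_last: bool = True) -> tuple[int, int, int]:
    """Hypothetical helper: per-image tensor shape, assuming square images."""
    if image_type not in CHANNELS:
        raise ValueError("imageType must be 'rgb' or 'grayscale'")
    c = CHANNELS[image_type]
    # TensorFlow/Keras default: (H, W, C). PyTorch convention: (C, H, W).
    if channels_last:
        return (image_shape, image_shape, c)
    return (c, image_shape, image_shape)


print(input_shape(124, 'rgb'))                             # (124, 124, 3)
print(input_shape(124, 'grayscale', channels_last=False))  # (1, 124, 124)
```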

8. seed

This parameter enables or disables a global random seed. The default value is False. Enable it like this:

trainingObject.seed(True)
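Seeding the global random number generators makes random choices (shuffling, initialisation, sampling) repeat across runs. The sketch below illustrates the idea with Python's standard `random` module only; `set_global_seed` and the seed value 42 are assumptions for illustration, not the platform's implementation.

```python
import random


def set_global_seed(enabled: bool, seed_value: int = 42) -> None:
    """Hypothetical sketch of what seed(True) might do; 42 is an assumed value."""
    if not enabled:
        return  # default (False): no fixed seed is set
    random.seed(seed_value)
    # A full training setup would typically also seed NumPy and the framework,
    # e.g. numpy.random.seed(...), torch.manual_seed(...) or tf.random.set_seed(...).


set_global_seed(True)
first = [random.random() for _ in range(3)]
set_global_seed(True)
second = [random.random() for _ in range(3)]
print(first == second)  # True: the same seed reproduces the same sequence
```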

Special Dataset Parameters

There are a few methods that are specific to the tabular/generic classification use case:

1. get_features

This method returns a list of all the features in the dataset and the interaction methods available for them.

trainingObject.get_features()

2. feature_interaction

This method lets you create additional features by combining existing ones with the interaction methods listed by get_features. For each method, specify the features to combine in a dictionary; get_features provides an example dictionary for every method. The method can be called multiple times to create several new features, and each call adds a new feature interaction only if it is unique. The default value is [].

trainingObject.feature_interaction({'feature1': 'feature1', 'feature2': 'feature2', 'method':'product'})
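The bookkeeping described above (start from [], add an interaction on each call only if it is unique) can be mimicked with a small class. This is a hypothetical stand-in, not the platform's API: `FeatureInteractions`, `apply_product`, and the feature names 'age' and 'income' are illustrative, and treating the 'product' method as multiplication of the two feature values is an assumption.

```python
class FeatureInteractions:
    """Hypothetical stand-in mimicking the documented behaviour: starts as [],
    and each call records an interaction only if it is not already present."""

    def __init__(self) -> None:
        self.interactions: list[dict[str, str]] = []  # default value is []

    def feature_interaction(self, spec: dict[str, str]) -> None:
        # Dict equality ignores key order, so reordered duplicates are skipped too.
        if spec not in self.interactions:
            self.interactions.append(dict(spec))


def apply_product(row: dict[str, float], spec: dict[str, str]) -> float:
    """Assumed meaning of method 'product': multiply the two feature values."""
    return row[spec['feature1']] * row[spec['feature2']]


fi = FeatureInteractions()
spec = {'feature1': 'age', 'feature2': 'income', 'method': 'product'}
fi.feature_interaction(spec)
fi.feature_interaction(spec)  # duplicate: ignored
print(len(fi.interactions))  # 1
print(apply_product({'age': 30.0, 'income': 2.5}, spec))  # 75.0
```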

3. feature_selection (Coming Soon)

This method lets you select which features from the feature list are used for training.

trainingObject.feature_selection(['feature1', 'feature2', 'feature3'])