Data

Is it possible to download the dataset?

No, downloading the dataset is not an option. Unlike other platforms, our goal is to connect data scientists with real-time data while maintaining privacy and security. Therefore, the dataset remains hidden.

However, we do provide sample datasets and additional information, which can be accessed through the Data icon on the experiments page.

How does rescaling work and can you modify the image input size?

Our federated learning infrastructure lets you rescale input image values with the rescale parameter, which is part of the training plan. By default, input pixel values range from 0 to 255. If your model expects a different range, set the rescale parameter accordingly.

For detailed instructions, refer to the Augmentation Parameters section. Although you don't have access to the underlying training code, this feature gives you control over preprocessing steps.
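As a rough illustration of what a rescale factor does, the sketch below multiplies every pixel value by a constant, which is how the rescale argument of Keras' ImageDataGenerator behaves. It assumes the platform's rescale parameter follows the same multiplicative convention; the helper name rescale_pixels is hypothetical and not part of the platform.

```python
# Minimal sketch, assuming rescale is a multiplicative factor
# (as in Keras' ImageDataGenerator). rescale_pixels is a hypothetical helper.

def rescale_pixels(pixels, rescale):
    """Return a copy of a 2D grid of pixel values multiplied by `rescale`."""
    return [[value * rescale for value in row] for row in pixels]


# A tiny 2x3 grayscale "image" with values in the default 0-255 range.
image = [[0, 51, 255], [102, 204, 153]]

# A factor of 1/255 maps the 0-255 range onto 0-1.
scaled = rescale_pixels(image, 1 / 255)

smallest = min(min(row) for row in scaled)
largest = max(max(row) for row in scaled)
print(smallest, largest)
```

With a factor of 1/255 the values end up between 0 and 1, a common input range for image models. Another factor gives a different range in the same way.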

Can I implement custom data augmentations?

Yes, you can include custom data augmentations, such as RandomResizedCrop, as a layer in the model file. If the model passes the checks during the upload process, these custom augmentations will be part of your training plan.

You can also select and use any of the augmentations available in the training plan. For a more detailed list, please refer to the Augmentation Parameters section.

What are the limitations of custom data augmentations included in my model file?

The main limitation is that your model must pass the provided checks. By following our model structure guidelines, you ensure that your custom augmentation layers are compatible with our federated learning infrastructure. If the checks are successful, your custom augmentations will be included in the training.

What libraries or packages are used for image augmentations?

We use the ImageDataGenerator API from TensorFlow's Keras for image augmentations in our federated learning infrastructure. You have control over the hyperparameters for augmentations, with detailed explanations provided in our documentation. For more information, you can also refer to the official TensorFlow or Keras documentation.

For PyTorch, we utilize the Albumentations library for augmentation. While the training code remains proprietary, the documentation offers transparency about how augmentations are applied.

Can I run a training on a small part of the dataset?

Yes, you can train on a subset of the dataset. To do this, pass a dictionary to the trainingClasses parameter that maps each class name to the number of images to use.

For example, if the dataset contains: {'car': 3000, 'person': 3000}

You can use a subset of 30 images from each class:

trainingObject.trainingClasses({'car': 30, 'person': 30})

Refer to the Dataset Parameters section for more details.

What are the conditions for a subdataset?

The conditions for a subdataset are as follows:

  • The subdataset should be specified as a dictionary, with the class name as the key and the desired number of images as the value.
  • Each class must be present in the subdataset.
  • The number of images per class should be between 20 and the maximum number of images available for that class.
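The conditions above can be checked locally before submitting a training plan. The following is a minimal sketch, not part of the platform: the function validate_training_classes and its arguments are hypothetical, and the available class counts are assumed to come from the dataset information shown on the experiments page.

```python
# Minimal sketch of a local pre-check for a subdataset dictionary.
# validate_training_classes is a hypothetical helper, not a platform API.

def validate_training_classes(subset, available, minimum=20):
    """Return a list of problems; an empty list means the subset looks valid."""
    if not isinstance(subset, dict):
        return ["subset must be a dictionary of class name -> image count"]

    errors = []

    # Every class in the dataset must be present in the subset.
    for name in sorted(set(available) - set(subset)):
        errors.append(f"missing class: {name}")

    for name, count in subset.items():
        if name not in available:
            errors.append(f"unknown class: {name}")
        elif not isinstance(count, int):
            errors.append(f"{name}: image count must be an integer")
        elif not minimum <= count <= available[name]:
            errors.append(
                f"{name}: count {count} is outside {minimum}-{available[name]}"
            )
    return errors


available = {"car": 3000, "person": 3000}

ok = validate_training_classes({"car": 30, "person": 30}, available)
too_small = validate_training_classes({"car": 10, "person": 30}, available)
missing = validate_training_classes({"car": 30}, available)
print(ok, too_small, missing)
```

A dictionary that passes this check can then be given to trainingClasses as in the earlier example. The platform's own validation is still the final authority.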