Data

Is it possible to download the dataset?

No, the dataset cannot be downloaded. Unlike platforms that offer direct access to their data, our platform is built to connect data scientists with real-time data. To preserve data privacy and security, the dataset itself remains hidden.

However, we do offer sample datasets that can be accessed through the Data icon on the experiments page, along with additional dataset information.

How does rescaling work and can you modify the image input size?

Our federated learning infrastructure lets you rescale the input image values using the rescale parameter, which is part of the training plan. By default, input images have pixel values in the range 0-255; set the rescale parameter to scale these values to whatever range your model expects. Detailed instructions can be found in our documentation under the Augmentation Parameters section. This way, even though you don't have access to the underlying training code, you still retain control over this important preprocessing step.
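
As a minimal sketch of what the rescale value does, assuming the platform applies it the same way as Keras' ImageDataGenerator rescale argument (the exact training plan syntax is covered in the Augmentation Parameters section):

# Sketch only: assumes the rescale parameter behaves like the rescale
# argument of Keras' ImageDataGenerator, i.e. every pixel value is
# multiplied by the given factor before it reaches the model.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(rescale=1.0 / 255)  # maps 0-255 pixels to 0-1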

Can I implement custom data augmentations?

Yes, you can include custom data augmentations, such as RandomResizedCrop, as layers in the model file itself. They become part of your training plan if the model checks pass during the model file upload process.
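
As an illustration, here is a minimal sketch of augmentation layers embedded directly in a Keras model file. RandomFlip and RandomCrop are used as stand-ins for RandomResizedCrop; the exact set of layers that passes the model checks depends on the model structure guidelines:

# Sketch only: augmentation layers placed inside the model itself, so they
# travel with the model file. These layers are active only during training.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    tf.keras.layers.RandomFlip("horizontal"),   # stand-in augmentation layer
    tf.keras.layers.RandomCrop(200, 200),       # stand-in augmentation layer
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10, activation="softmax"),
])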

What are the limitations of custom data augmentations included in my model file?

The only limitation on custom augmentations in your model file is that the model must pass the model checks. Following our model structure guidelines ensures that the custom augmentation layers are compatible with our federated learning infrastructure. If your model passes these checks, your custom augmentations will be included in the federated training.

What libraries or packages are used for image augmentations?

We utilize the ImageDataGenerator API from TensorFlow's Keras for image augmentations in our federated learning infrastructure. You have full control over the hyperparameters for these augmentations, and we provide detailed explanations for each parameter in our documentation. For even more comprehensive information, you can also refer to the official TensorFlow or Keras documentation. Similarly, for PyTorch we use the albumentations library for augmentation. We understand the desire for transparency in the training process; however, the training code itself remains proprietary. The documentation should nevertheless give you sufficient detail to understand how the image augmentations are applied.
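
For orientation, here is a minimal sketch of what configuring each backend can look like; the hyperparameter values are illustrative placeholders, not platform defaults, and the actual values are set through the training plan parameters described in the documentation:

# TensorFlow / Keras: ImageDataGenerator with a few common hyperparameters.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

keras_augment = ImageDataGenerator(
    rescale=1.0 / 255,        # scale pixel values from 0-255 to 0-1
    rotation_range=15,        # random rotation in degrees
    width_shift_range=0.1,    # horizontal shift as a fraction of width
    height_shift_range=0.1,   # vertical shift as a fraction of height
    horizontal_flip=True,
)

# PyTorch: an albumentations pipeline with comparable transforms.
import albumentations as A

torch_augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Rotate(limit=15, p=0.5),
    A.RandomBrightnessContrast(p=0.2),
])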

Can I run a training on a small part of the dataset?

Yes, you can train on a subdataset. To select one, pass the corresponding dictionary to the trainingClasses parameter. For example, if the dataset contains {'car': 3000, 'person': 3000}, you can train on 30 images per class like this:

trainingObject.trainingClasses({'car': 30, 'person': 30})

Check out the Dataset Parameters section for more information.

What are the conditions for a subdataset?

The conditions for a subdataset are as follows (a short validation sketch is shown after the list):

  • The subdataset should be specified as a dictionary with the image class name as its key and the desired number of images for that class as its value.
  • Each class should be present in the subdataset.
  • Each class should have a number of images between 20 and the maximum number of images available for that class.
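
As a minimal sketch (the helper below is purely illustrative and not part of the platform API), these conditions can be checked before passing the dictionary to trainingClasses:

# Illustrative helper, not a platform function: checks a subdataset
# dictionary against the conditions above. full_dataset maps each class
# to the maximum number of images available for it.
def is_valid_subdataset(subdataset, full_dataset, min_images=20):
    # Every class in the dataset must also appear in the subdataset.
    if set(subdataset) != set(full_dataset):
        return False
    # Each class count must lie between the minimum and the class maximum.
    return all(min_images <= count <= full_dataset[cls]
               for cls, count in subdataset.items())

full_dataset = {'car': 3000, 'person': 3000}
print(is_valid_subdataset({'car': 30, 'person': 30}, full_dataset))  # True
print(is_valid_subdataset({'car': 10, 'person': 30}, full_dataset))  # False, below the minimum of 20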