Training-Parameters
FAQ Related to Training Parameters
What is the difference between epochs and cycles?
In our federated learning infrastructure:
- An epoch refers to a complete forward and backward pass of all the training data.
- A cycle refers to a set of epochs completed across multiple nodes, after which the model weights from different nodes are averaged.
You specify cycles instead of epochs because our system averages weights and calculates metrics at the end of each cycle. This makes cycles a more meaningful unit in federated learning.
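To make the averaging step concrete, here is a minimal sketch of what "averaging the model weights from different nodes" can look like at the end of a cycle. It assumes a plain, unweighted element-wise mean over flat weight lists; the function name `average_weights` and the sample values are hypothetical, and the platform's actual aggregation may differ (for example, it could weight nodes by dataset size).

```python
from typing import List


def average_weights(node_weights: List[List[float]]) -> List[float]:
    """Element-wise mean of weight vectors, one vector per node.

    Illustration only: an unweighted mean, not the platform's own code.
    """
    if not node_weights:
        raise ValueError("need weights from at least one node")
    length = len(node_weights[0])
    if any(len(w) != length for w in node_weights):
        raise ValueError("all nodes must report the same number of weights")
    n = len(node_weights)
    # zip(*...) groups the i-th weight of every node together.
    return [sum(values) / n for values in zip(*node_weights)]


# Hypothetical weights reported by three nodes at the end of one cycle.
cycle_weights = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]
averaged = average_weights(cycle_weights)
print(averaged)  # [3.0, 4.0]
```

Each node trains locally for the configured number of epochs, and only the result of this averaging step is carried into the next cycle.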
Should I run more epochs or more cycles in my training?
We recommend running fewer epochs per cycle and more cycles. For example, if you plan to run 25 epochs in total, it is better to run 5 epochs per cycle for 5 cycles than 25 epochs in a single cycle.
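The arithmetic behind this recommendation can be sketched as follows. The helper `split_options` is hypothetical and not part of the platform; it lists every way to divide a total epoch budget into equal-length cycles, so you can pick a split with more cycles (more averaging steps).

```python
from typing import List, Tuple


def split_options(total_epochs: int) -> List[Tuple[int, int]]:
    """Return all (epochs_per_cycle, cycles) pairs whose product is total_epochs."""
    if total_epochs < 1:
        raise ValueError("total_epochs must be a positive integer")
    return [
        (epochs, total_epochs // epochs)
        for epochs in range(1, total_epochs + 1)
        if total_epochs % epochs == 0
    ]


options = split_options(25)
print(options)  # [(1, 25), (5, 5), (25, 1)]
```

For a budget of 25 epochs, the pair `(5, 5)` corresponds to the recommended setup, while `(25, 1)` averages the node weights only once.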
How can I implement callbacks?
Callbacks can be implemented via the training plan. For details, refer to the hyperparameters section.
Can I use a pre-trained model and specify which layers should not be retrained?
Yes, you can freeze specific layers using the layersFreeze parameter in the training plan. More details can be found in the hyperparameters section.
Can I access the source code used for training?
Due to the proprietary nature of our federated learning infrastructure, we do not provide direct access to the training code. However, our documentation and guides should provide enough transparency about the training process.
How can I set my training plan?
You can customize your model's training with different types of parameters, according to your needs.
For more information on training parameters, see the training plan section of our documentation.
Can I check the default values for each parameter?
Yes, you can check the default values of each parameter before setting any value by running:
trainingObject.getTrainingPlan()
You can use the same command to check the updated values for each parameter.
How can I reset all parameters to their default value?
Use the following command to reset all training plan parameters to their default values:
trainingObject.resetTrainingPlan()
Why do I get "AttributeError: 'NoneType' object has no attribute 'experimentName'" while submitting my training plan?
This error indicates that your Google Colab session has expired. Try restarting the notebook from the first step.
Why are my training epochs not showing up on the frontend?
It may take time for epochs to appear, especially if you are training on a large dataset. The training speed also depends on the batch size. To speed up training, consider selecting a smaller batch size or training on a subset of the dataset with the TrainingClasses parameter. More information is available in the dataset parameters section.
What happens if I pause an experiment?
Pausing an experiment stops training. When you resume, the experiment starts from the first epoch of the paused cycle.
For example, if you started an experiment with 10 epochs and paused it at epoch 5, training restarts from epoch 1 of that cycle when you resume.
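The resume rule can be sketched in a few lines. The function `resume_point` is a hypothetical illustration of the behavior described above, not a platform API: the cycle is kept, while progress inside that cycle is discarded.

```python
from typing import Tuple


def resume_point(paused_cycle: int, paused_epoch: int) -> Tuple[int, int]:
    """Return the (cycle, epoch) at which training restarts after a pause."""
    if paused_cycle < 1 or paused_epoch < 1:
        raise ValueError("cycle and epoch numbers start at 1")
    # Epochs already run inside the paused cycle are repeated from epoch 1.
    return (paused_cycle, 1)


# Hypothetical case: paused during cycle 3 at epoch 5.
print(resume_point(paused_cycle=3, paused_epoch=5))  # (3, 1)
```

Keeping the number of epochs per cycle small therefore also limits how much work is repeated after a pause.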
What does the "weights file truncated" error indicate?
This error indicates that the weights file is corrupted. Download the weights file again and rename it to match your model, for example modelname_weights.pkl for TensorFlow or modelname_weights.pth for PyTorch.
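A small helper can build the expected file name and rename a freshly downloaded file. This is a sketch under assumptions: the helper names and the `downloaded` path are hypothetical, and it treats `modelname` in the examples above as a placeholder for your model's name.

```python
from pathlib import Path

# Extensions taken from the naming examples above.
WEIGHT_EXTENSIONS = {"tensorflow": ".pkl", "pytorch": ".pth"}


def weights_filename(model_name: str, framework: str) -> str:
    """Build '<model_name>_weights<ext>' for the given framework."""
    key = framework.lower()
    if key not in WEIGHT_EXTENSIONS:
        raise ValueError(f"unsupported framework: {framework!r}")
    return f"{model_name}_weights{WEIGHT_EXTENSIONS[key]}"


def rename_weights(downloaded: Path, model_name: str, framework: str) -> Path:
    """Rename a downloaded weights file in place and return the new path."""
    target = downloaded.with_name(weights_filename(model_name, framework))
    downloaded.rename(target)
    return target


print(weights_filename("resnet", "PyTorch"))     # resnet_weights.pth
print(weights_filename("resnet", "tensorflow"))  # resnet_weights.pkl
```

Call `rename_weights(Path("downloads/weights.bin"), "resnet", "pytorch")` with your own download path once the new download has completed.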
Which values can I set for the validation and test dataset split?
The supported validation split range is (0, 0.5], that is, greater than 0 and at most 0.5.
Which values can I set for the batch size of my training?
The supported batch size range is (4, 128], that is, greater than 4 and at most 128.
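Both documented ranges are half-open intervals: the lower bound is excluded and the upper bound is included. The sketch below checks values locally before you submit a training plan; `check_plan_values` and its argument names are hypothetical and are not training plan parameters.

```python
from typing import List


def in_half_open(value: float, low: float, high: float) -> bool:
    """True when low < value <= high, i.e. value lies in (low, high]."""
    return low < value <= high


def check_plan_values(validation_split: float, batch_size: int) -> List[str]:
    """Return a list of problems; an empty list means both values are in range."""
    problems = []
    if not in_half_open(validation_split, 0, 0.5):
        problems.append("validation split must be in (0, 0.5]")
    if not in_half_open(batch_size, 4, 128):
        problems.append("batch size must be in (4, 128]")
    return problems


print(check_plan_values(validation_split=0.5, batch_size=128))  # []
print(check_plan_values(validation_split=0.0, batch_size=4))
```

The second call reports two problems, because 0 and 4 are the excluded lower bounds of their ranges.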