General
General FAQs
What is the tracebloc package?
The tracebloc_package is a Python package that helps you create and manage your machine learning projects. For more details, refer to the tracebloc package documentation.
How to update the tracebloc repository?
We will not conduct any updates without notifying you first. However, if you want to update your tracebloc directory manually, follow these steps:
Open Git Bash and set aside any local changes with:
git stash
Update your directory by running:
git pull
How are FLOPs calculated?
FLOPs utilized during model training are calculated using this formula:
utilizedTrainingFLOPs = machineFLOPs x CPUUtilizedTraining x totalTrainingTime
where:
utilizedTrainingFLOPs: Actual FLOPs used by the system during training.
machineFLOPs: Total FLOPs a machine can execute, based on its processor, CPU, and memory.
CPUUtilizedTraining: The percentage of CPU used during the training process.
totalTrainingTime: Total time taken to complete the training.
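As an illustration, here is a minimal Python sketch of the formula. It assumes machineFLOPs is the machine's peak FLOPs per second and that the training time is given in seconds; the numbers are made up for the example.

# Minimal sketch of the FLOPs formula above (illustrative numbers only).
# Assumes machineFLOPs is a per-second rate and training time is in seconds.
def utilized_training_flops(machine_flops, cpu_utilized_training, total_training_time):
    return machine_flops * cpu_utilized_training * total_training_time

# Example: a 100 GFLOP/s machine at 60% CPU utilisation for 30 minutes.
flops = utilized_training_flops(100e9, 0.60, 30 * 60)
print(f"{flops:.2e} FLOPs used")  # -> 1.08e+14 FLOPs used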
What are the various stages involved in the Training and Evaluation of my model?
The Training/Evaluation process is divided into four stages:
Model Selection: Choose the right architecture and model for your task. You can explore available models in our model zoo.
Parameter Selection: Fine-tune model parameters like epochs, learning rate, batch size, and augmentation techniques (see the sketch after this list).
Starting the Model Training: Run the experiment after setting up your training plan.
Model Inference: After the model is trained, you can evaluate its performance on a separate test dataset. This is done by submitting an experiment for a specific cycle. Common evaluation metrics are the model's accuracy, loss, precision, recall, and F1-score.
These four high-level stages guide you through the process of effectively training and assessing your machine learning models. For more in-depth details on each step, have a look at the model training guides for Google Colab or for running locally in a Jupyter Notebook, as well as the training steps.
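To illustrate the Parameter Selection stage, here is a minimal sketch of a training plan written as a plain Python dictionary. The field names (epochs, cycles, learning_rate, batch_size, augmentation) are generic machine learning terms used for illustration and are not necessarily the exact parameter names tracebloc expects; refer to the guides above for the real options.

# Illustrative training plan only; the field names are generic placeholders,
# not necessarily the exact parameters used in a tracebloc training plan.
training_plan = {
    "epochs": 5,                 # passes over the local data within each cycle
    "cycles": 5,                 # federated training rounds
    "learning_rate": 1e-3,
    "batch_size": 32,
    "augmentation": ["horizontal_flip", "random_crop"],
}

# Quick sanity checks before starting an experiment.
assert 0 < training_plan["learning_rate"] < 1
assert training_plan["batch_size"] > 0
print(training_plan)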
How can I evaluate the performance of my model?
There are two ways for you to evaluate your model's performance:
- Downloading the model and the corresponding trained weights from your experiment and evaluating the model locally on a sample dataset (see the sketch below).
- Submitting the experiment for a particular cycle.
After you submit your experiment, we run the model with your cycle-specific weights on a separate test dataset. The results will appear on the "Experiments" tab, but it might take a while.
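For the first option, evaluation is ordinary local inference. Below is a minimal sketch using Keras and scikit-learn, assuming the downloaded model is an image classifier saved as an .h5 file and the sample dataset is stored as labelled NumPy arrays; adapt paths, shapes, and framework to your own experiment.

# Minimal local-evaluation sketch (assumed file names and shapes; adjust to your setup).
import numpy as np
from tensorflow import keras
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

model = keras.models.load_model("downloaded_model.h5")   # model plus trained weights

x_test = np.load("sample_images.npy")     # e.g. shape (N, H, W, C)
y_test = np.load("sample_labels.npy")     # integer class labels, shape (N,)

y_pred = model.predict(x_test).argmax(axis=1)

accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_test, y_pred, average="macro")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")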
How are FLOPs assigned?
Every new user is provided with 2 PF (2 peta FLOPs) upon joining the tracebloc platform, and an additional 1 PF is provided every month. For collaborations, FLOPs are granted separately when you join the collaboration. You will not receive additional collaboration-specific FLOPs for the remainder of that collaboration, so use them wisely.
Where can I go if I have more specific questions?
For more specific questions, simply head over to our Discord, where our team will answer your questions personally ☺️.
What is federated learning?
Federated learning is a machine learning approach that allows a model to be trained across multiple decentralized edge devices or servers holding local data samples, without exchanging them. In traditional centralized machine learning, data is typically collected and sent to a central server for model training. In federated learning, the model is sent to the data, and the data stays on the edge devices or local servers.
How is federated learning implemented on tracebloc?
On tracebloc, data resides on client machines, and the training process occurs on these client machines. The trained model is then aggregated and averaged on a central server. Here is a short overview of the process:
- Model Initialisation: In this step the model provided by the user is uploaded to our platform via the tracebloc_package.
- Model Training: After the training plan for the model has been set and an experiment is started, the model is sent to the client where the actual data resides and the corresponding training starts.
- Transmission of updated weights: Upon completion of one training cycle, the weights for that cycle from each client are encoded and sent to the server along with all the training parameters.
- Weights Averaging: Once the weights from all clients are received, they are averaged and then saved on the server. If the training includes multiple cycles, the whole process is repeated with the updated weights.
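To make the Weights Averaging step concrete, here is a minimal NumPy sketch of Federated Averaging (FedAvg) over per-layer weight arrays. Equal client weighting is assumed for simplicity; a production aggregator would typically weight each client by the number of local samples it trained on.

# Minimal Federated Averaging sketch: average each layer's weights across clients.
# Equal client weighting is assumed here for simplicity.
import numpy as np

def fedavg(client_weights):
    # client_weights: one entry per client, each a list of per-layer weight arrays
    n_clients = len(client_weights)
    n_layers = len(client_weights[0])
    return [sum(client[layer] for client in client_weights) / n_clients
            for layer in range(n_layers)]

# Toy example: 3 clients, each returning "trained" weights for a two-layer model.
clients = [[np.ones((2, 2)) * c, np.ones(2) * c] for c in (1.0, 2.0, 3.0)]
averaged = fedavg(clients)
print(averaged[0])   # every entry is 2.0, the mean of 1, 2 and 3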
What is a "cycle" in my models training?
In our federated learning architecture, the training data is distributed across multiple clients. Each cycle represents a set of training results that are averaged after being processed on the client side. The averaged results for each cycle are displayed in the frontend.
For example, if your training plan specifies 5 epochs and 5 cycles, the data for each cycle contains averaged values from the 5 epochs.
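As a rough, purely illustrative sketch of how cycles and epochs nest, the loop below assumes each client runs all configured epochs locally within a cycle before the results are averaged; the actual orchestration happens on tracebloc's clients and server.

# Toy illustration only: how cycles and epochs relate in federated training.
import numpy as np

CYCLES, EPOCHS, N_CLIENTS = 5, 5, 3
rng = np.random.default_rng(0)
global_weights = np.zeros(4)                           # stand-in for the model's weights

for cycle in range(CYCLES):
    client_results = []
    for _ in range(N_CLIENTS):                         # each client trains on its own local data
        weights = global_weights.copy()
        for _ in range(EPOCHS):                        # all epochs of this cycle run on the client
            weights += rng.normal(scale=0.01, size=weights.shape)   # fake "training" update
        client_results.append(weights)
    global_weights = np.mean(client_results, axis=0)   # averaged result displayed for this cycle
    print(f"cycle {cycle + 1}: averaged weights = {np.round(global_weights, 3)}")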
Which types of Federated Learning Algorithms are supported?
Tracebloc currently supports Federated Averaging (FedAvg). Additional algorithms like Federated Stochastic Gradient Descent (FedSGD) and Federated Learning with Dynamic Regularization (FedDyn) are planned for future support. For more information, you can visit our guide on federated learning.