General FAQs
What is the tracebloc package?
The tracebloc_package is a Python package that helps you create your machine learning projects. For more details on what it can do, have a look at the tracebloc package documentation.
How to update the tracebloc repository?
Updates are never applied without notifying you first. If you want to update your tracebloc directory manually, open Git Bash and stash any local changes (this removes them from your working directory and stores them in the stash) with:
git stash
Then pull the latest version of the repository with:
git pull
How are FLOPs calculated?
To determine the FLOPs that are utilized during your model's training, we use the following equation:
utilizedTrainingFLOPs = machineFLOPs × CPUUtilizedTraining × totalTrainingTime
where
utilizedTrainingFLOPs : the actual FLOPs used by the system to complete the model training.
machineFLOPs : the total FLOPs that a machine with a given processor, CPU and memory can execute.
CPUUtilizedTraining : the percentage of the CPU utilized by the code during the training process on the edge.
totalTrainingTime : the total time taken by the code to complete the training on the edge.
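As a rough illustration, the equation above can be written as a small Python function. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, machineFLOPs is treated as a rate in FLOPs per second (so that multiplying by a time yields a FLOP count), and the CPU utilization is passed as a percentage and converted to a fraction.

```python
def utilized_training_flops(machine_flops_per_second,
                            cpu_utilization_percent,
                            training_time_seconds):
    """Estimate the FLOPs consumed by one training run.

    Hypothetical helper mirroring:
    utilizedTrainingFLOPs = machineFLOPs x CPUUtilizedTraining x totalTrainingTime
    """
    if not 0 <= cpu_utilization_percent <= 100:
        raise ValueError("cpu_utilization_percent must be between 0 and 100")
    # Convert the percentage into a fraction before multiplying.
    utilization_fraction = cpu_utilization_percent / 100
    return machine_flops_per_second * utilization_fraction * training_time_seconds


# Made-up numbers: a 1 TFLOP/s machine, 50 % CPU utilization, 1 hour of training.
example_flops = utilized_training_flops(1e12, 50, 3600)
print(f"{example_flops:.2e} FLOPs")  # 1.80e+15 FLOPs, i.e. 1.8 PF
```

With these illustrative numbers, a single one-hour run would use 1.8 PF.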
What are the various stages involved in the Training and Evaluation of my model?
The Training / Evaluation process can be divided into four stages:
Model Selection: This involves choosing the right architecture and machine learning model that suits your specific task. You can explore available sample models in our model zoo.
Parameter Selection: In this stage, you'll fine-tune various parameters of the model such as epochs, cycles, learning rate, batch size, number of layers, and augmentation techniques to enhance the model's performance.
Starting the Model Training: Once you've selected the model and its parameters, you can initiate the training process.
Model Inference: After the model is trained, you can evaluate its performance on a separate test dataset. This is done by submitting an experiment for a specific cycle. Common evaluation metrics are the model's accuracy, loss, precision, recall, and F1-score.
These four high-level stages guide you through the process of effectively training and assessing your machine learning models. For more in-depth details on each step, have a look at the model training guides for Google Colab or for running locally in a Jupyter Notebook, as well as the training steps.
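To make the evaluation metrics named in the Model Inference stage concrete, here is a minimal sketch of how accuracy, precision, recall, and F1-score are commonly computed for a binary classifier from its confusion-matrix counts. The function name and the example counts are hypothetical, and this is not necessarily how the platform computes them internally.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion counts.

    tp/fp/fn/tn: true positives, false positives, false negatives, true negatives.
    Ratios with an empty denominator are reported as 0.0.
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up counts for illustration only.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```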
How can I evaluate the performance of my model?
There are two ways for you to evaluate your model's performance:
- Downloading the model and the corresponding trained weights from your experiment and evaluating it on a sample dataset locally.
- Submitting the experiment for a particular cycle.
After you submit your experiment, we run the model with your cycle-specific weights on a separate test dataset. The results will appear on the "Experiments" tab, but it might take a while.
How are FLOPs assigned?
Every new user is provided with 2 PF (2 petaFLOPs) on joining the tracebloc platform. Every month, an additional 1 PF is provided to the user. For competitions, FLOPs are provided separately upon joining the competition. You will not be given additional competition-specific FLOPs for the remainder of that competition, so use them wisely.
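The allocation arithmetic can be sketched as follows. The function name is hypothetical, 1 PF is taken as 10^15 FLOPs, and the function only counts FLOPs granted outside of competitions; it makes no assumption about how many have already been spent or whether unused FLOPs carry over.

```python
PETA = 10**15  # 1 PF = 10^15 FLOPs


def total_granted_flops(monthly_topups_received, initial_pf=2, monthly_pf=1):
    """Total non-competition FLOPs granted so far.

    monthly_topups_received: how many monthly 1 PF grants the user has received.
    """
    if monthly_topups_received < 0:
        raise ValueError("monthly_topups_received must not be negative")
    return (initial_pf + monthly_pf * monthly_topups_received) * PETA


# A user who joined and has since received three monthly grants.
print(total_granted_flops(3) // PETA, "PF")  # 5 PF
```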
Where can I go if I have more specific questions?
For more specific questions simply head over to our Discord, where our team will be answering your questions personally ☺️.
What is federated learning?
Federated learning is a machine learning approach that allows a model to be trained across multiple decentralized edge devices or servers holding local data samples, without exchanging them. In traditional centralized machine learning, data is typically collected and sent to a central server for model training. In federated learning, the model is sent to the data, and the data stays on the edge devices or local servers.
How is federated learning implemented on tracebloc?
On tracebloc, data resides on client machines, and the training process occurs on these client machines. The trained weights from all clients are then averaged on a central server. Here is a short overview of the process:
- Model Initialisation: In this step the model provided by the user is uploaded to our platform via the tracebloc_package.
- Model Training: After the training plan for the model has been set and an experiment is started, the model is sent to the client where the actual data resides and the corresponding training starts.
- Transmission of updated weights: Upon completion of one training cycle, the weights for that cycle from each client are encoded and sent to the server along with all the training parameters.
- Weights Averaging: Once the weights from all clients are received, they are averaged and then saved on the server. If the training includes multiple cycles, the whole process is repeated with the updated weights.
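The steps above can be sketched in a few lines of plain Python. This is a toy illustration, not the platform's implementation: all function names are hypothetical, weights are flat lists of floats, local_training is a stand-in for real model training, and the server uses a simple unweighted mean as described above (the Federated Averaging algorithm as originally published weights each client by its number of samples).

```python
def average_weights(client_weights):
    """Server side: element-wise mean of the weight vectors from all clients."""
    if not client_weights:
        raise ValueError("need weights from at least one client")
    length = len(client_weights[0])
    if any(len(w) != length for w in client_weights):
        raise ValueError("all clients must send weight vectors of equal length")
    n_clients = len(client_weights)
    return [sum(values) / n_clients for values in zip(*client_weights)]


def local_training(weights, client_data, learning_rate=0.1):
    """Client side: hypothetical stand-in for training on local data.

    Nudges every weight toward the mean of the client's data, which never
    leaves the client; only the updated weights are returned.
    """
    target = sum(client_data) / len(client_data)
    return [w + learning_rate * (target - w) for w in weights]


def run_cycles(initial_weights, clients, cycles):
    """Repeat: send weights to clients, train locally, average on the server."""
    global_weights = list(initial_weights)
    for _ in range(cycles):
        updates = [local_training(global_weights, data) for data in clients]
        global_weights = average_weights(updates)
    return global_weights


# Two clients with made-up local data, three training cycles.
print(run_cycles([0.0, 0.0], clients=[[1.0, 1.0], [3.0, 3.0]], cycles=3))
```

Each pass through the loop in run_cycles corresponds to one cycle: the averaged weights from the previous cycle are the starting point for the next one.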
What is a "cycle" in my model's training?
In our federated learning architecture, the training data is distributed across different clients. In each cycle, the model is trained on every client, and the results from all clients are then averaged. The averaged value is visible in the frontend. This means that every cycle shows the averaged data for that cycle only.
Example: If I set up my training plan so that my model trains for 5 epochs and 5 cycles, the data for each finished cycle contains only the averaged values from the 5 training epochs of that cycle.
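The relationship between epochs and cycles in this example can be laid out with a short sketch. It assumes that "5 epochs and 5 cycles" means 5 local epochs per cycle on each client, with averaging after every cycle; the function name is hypothetical.

```python
def training_schedule(epochs_per_cycle, cycles):
    """Map each cycle number to the local epoch numbers a client runs in it."""
    schedule = {}
    for cycle in range(1, cycles + 1):
        first_epoch = (cycle - 1) * epochs_per_cycle + 1
        schedule[cycle] = list(range(first_epoch, first_epoch + epochs_per_cycle))
    return schedule


schedule = training_schedule(epochs_per_cycle=5, cycles=5)
for cycle, epochs in schedule.items():
    # Weights are averaged across clients once the last epoch of the cycle ends.
    print(f"cycle {cycle}: epochs {epochs[0]}-{epochs[-1]}, then averaging")
```

Under this assumption, each client runs 25 local epochs in total, and the frontend shows five averaged results, one per cycle.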
Which types of Federated Learning Algorithms are supported?
On tracebloc, the Federated Averaging (FedAvg) algorithm is supported. Two other federated learning algorithms, Federated Stochastic Gradient Descent (FedSGD) and Federated Learning with Dynamic Regularization (FedDyn), are not currently supported in our architecture but will be enabled soon. For more information on these algorithms, you can refer to the following link: https://www.v7labs.com/blog/federated-learning-guide#h3.