Based on the standard active learning process (and assuming the `_generate_training_examples` function is intended to select `questions_per_cycle` examples from the unlabeled set to add to the training set):
Changing the value of `questions_per_cycle` significantly impacts the active learning process and, consequently, the accuracy curve over time and potentially the final accuracy achieved. Here's how:
1. **Rate of Training Data Growth:** `questions_per_cycle` determines how many new labeled instances are added to the training set in each active learning cycle (see the loop sketch after this list).
* A **larger `questions_per_cycle`** means the training set grows faster per cycle.
* A **smaller `questions_per_cycle`** means the training set grows more slowly per cycle.
2. **Granularity of Active Learning:** The size of `questions_per_cycle` affects how frequently the model is retrained and how granular the sample selection process is.
* A **larger `questions_per_cycle`** leads to fewer active learning cycles for a given pool of unlabeled data. The model is retrained less often, with larger batches of new data each time, so the acquisition function is recomputed against an up-to-date model state less frequently.
* A **smaller `questions_per_cycle`** leads to more active learning cycles. The model is retrained more often with smaller batches of new data, so the acquisition function can repeatedly leverage the current model state to select the *most* informative samples available at that moment, which can make exploration of the uncertain regions of the data space more efficient.
3. **Impact on Accuracy:**
* **Speed of initial convergence:** A **larger `questions_per_cycle`** generally produces faster accuracy gains in the first few cycles, because each cycle injects a larger influx of new labeled data. The model may reach a moderate level of accuracy quickly.
* **Potential for peak accuracy:** A **smaller `questions_per_cycle`** *can* potentially lead to a higher *final* or *peak* accuracy, especially if the acquisition function is effective. By querying fewer, potentially more impactful samples in each step, the active learning process can guide the model's learning more strategically. This fine-grained approach might help the model learn critical distinctions or cover diverse uncertain regions more effectively over many cycles, compared to adding a larger, potentially less curated batch.
* **Computational cost:** Retraining the model is often the most expensive part of an active learning cycle. A **larger `questions_per_cycle`** means fewer retraining steps are needed to consume the unlabeled pool, which can be computationally more efficient overall, despite training on larger datasets in each step. A **smaller `questions_per_cycle`** requires more retraining steps, increasing computational cost but potentially improving the efficiency of data acquisition itself (getting more "bang for your buck" in terms of information gain per labeled sample).
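To make the mechanics above concrete, here is a minimal sketch of an uncertainty-sampling active learning loop. It is illustrative only: `run_active_learning` and its whole signature are invented for this answer (the original `_generate_training_examples` is not shown), and least-confidence sampling with a logistic regression stands in for whatever acquisition function and model the real code uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def run_active_learning(X_pool, y_pool, X_init, y_init, X_test, y_test,
                        questions_per_cycle, n_cycles):
    """Hypothetical loop: least-confidence sampling, retrain every cycle."""
    X_train, y_train = X_init.copy(), y_init.copy()
    pool_idx = np.arange(len(X_pool))
    accuracies = []
    for _ in range(n_cycles):
        # Retrain on everything labeled so far; the training set grows by
        # questions_per_cycle examples per cycle (point 1).
        model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
        accuracies.append(model.score(X_test, y_test))
        # Acquisition step (point 2): score the remaining pool with the
        # *current* model and take the questions_per_cycle least-confident points.
        proba = model.predict_proba(X_pool[pool_idx])
        uncertainty = 1.0 - proba.max(axis=1)
        picked = pool_idx[np.argsort(uncertainty)[-questions_per_cycle:]]
        # "Label" the picked examples (labels are already known in this toy setup).
        X_train = np.vstack([X_train, X_pool[picked]])
        y_train = np.concatenate([y_train, y_pool[picked]])
        pool_idx = np.setdiff1d(pool_idx, picked)
    # Final retrain so the last accuracy reflects the full labeling budget.
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    accuracies.append(model.score(X_test, y_test))
    return accuracies
```

A smaller `questions_per_cycle` simply means the `uncertainty` scores are refreshed against a newer model more often before each batch is chosen.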
In summary, `questions_per_cycle` represents a trade-off between the speed of reaching a certain accuracy level and the potential to achieve a higher maximum accuracy through more strategic, iterative sample selection.
* **Higher `questions_per_cycle`**: faster initial gains and fewer retraining steps, but less granular selection and potentially a lower peak accuracy.
* **Lower `questions_per_cycle`**: slower initial progress but more granular selection and potentially a higher peak accuracy (given an effective acquisition function), at a higher computational cost from more retraining steps (a rough cycle-count comparison follows below).
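As a rough illustration of the cost side of this trade-off, the sketch above can be run at an equal labeling budget with different batch sizes. The dataset and all numbers below are synthetic and purely illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic data: a 950-example pool, 50 seed labels, 200 held-out test points.
X, y = make_classification(n_samples=1200, n_features=20, random_state=0)
X_rest, X_test, y_rest, y_test = train_test_split(X, y, test_size=200, random_state=0)
X_init, y_init = X_rest[:50], y_rest[:50]
X_pool, y_pool = X_rest[50:], y_rest[50:]

budget = 200  # total number of labels we are willing to pay for
for k in (10, 50):
    # Same budget either way: k=10 takes 20 retraining cycles, k=50 only 4.
    acc = run_active_learning(X_pool, y_pool, X_init, y_init, X_test, y_test,
                              questions_per_cycle=k, n_cycles=budget // k)
    print(f"questions_per_cycle={k}: {budget // k} cycles, "
          f"final accuracy {acc[-1]:.3f}")
```

Whether the smaller batch size actually wins on final accuracy depends on the model, the data, and the acquisition function; the point of the comparison is that it pays for any such gain with five times as many retraining steps.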