Microsoft Azure AI Fundamentals (AI-900) Practice Exam


Prepare for the Microsoft Azure AI Fundamentals (AI-900) certification with flashcards and multiple-choice questions, each accompanied by hints and explanations.



When splitting data for machine learning training and evaluation, what is the recommended approach?

  1. Systematically segregate based on categories

  2. Use 80% of data for training and 20% for evaluation only

  3. Randomly split the data into rows for training and rows for evaluation

  4. Only use sequential data for evaluation

The correct answer is: Randomly split the data into rows for training and rows for evaluation

The recommended approach to splitting data for machine learning training and evaluation is to randomly split the data into rows for training and rows for evaluation. This method helps ensure that both subsets are representative of the overall dataset, which is crucial for the model's ability to generalize to unseen data.

When the data is randomly split, it reduces the likelihood of introducing bias into the training and evaluation sets. For instance, if the data is arranged in a specific order, such as by date or category, a systematic or sequential split could produce results that do not accurately reflect the model's performance across the entire dataset. This could skew evaluation metrics, making a model seem better or worse than it actually is when applied to new, unseen data.

Random splitting allows for a more robust assessment of the model's performance because it exposes the model to a variety of examples during training, helping it to learn more comprehensively. In classification tasks, a random sample also helps maintain a balanced representation of the different classes, so that minority classes are not inadvertently excluded from either the training or the evaluation set.

This approach contrasts with methods such as relying on a fixed ordering of the dataset or using only sequential data for evaluation, which may not adequately test the model's robustness and versatility on real-world data. Note that the commonly cited 80/20 proportion is compatible with random splitting; the key point is that the rows assigned to each subset are chosen at random rather than by position or category.
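The random split described above can be sketched in a few lines of Python. The snippet below is a minimal, standard-library-only illustration (the function name and the 80/20 default are choices made here for the example); in practice, libraries such as scikit-learn provide `sklearn.model_selection.train_test_split` for the same purpose.

```python
import random

def random_split(rows, eval_fraction=0.2, seed=42):
    """Randomly split rows into training and evaluation subsets.

    Shuffling first ensures the split is random with respect to any
    ordering in the source data (e.g. by date or category).
    """
    rows = list(rows)
    rng = random.Random(seed)   # fixed seed makes the split reproducible
    rng.shuffle(rows)           # randomize row order before cutting
    cut = int(len(rows) * (1 - eval_fraction))
    return rows[:cut], rows[cut:]

# Example: 10 rows split 80/20 -> 8 training rows, 2 evaluation rows
data = list(range(10))
train, evaluation = random_split(data)
print(len(train), len(evaluation))
```

Because the rows are shuffled before the cut, a dataset sorted by date or label still yields training and evaluation subsets drawn from across the whole dataset.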