1. What is the primary goal of Data Science?
ANS : (b) The goal of Data Science is to analyze and interpret complex data to make data-driven decisions.
2. Which of the following is NOT a part of Data Science?
ANS : (c) Data Science involves Machine Learning, Statistics, and Data Visualization, but Web Development is a separate field.
3. What is supervised learning in Machine Learning?
ANS : (b) Supervised learning uses labeled datasets to train models for prediction.
4. Which library is commonly used for data manipulation in Python?
ANS : (b) Pandas provides data structures like DataFrame and Series for easy manipulation.
5. What does ‘NaN’ stand for in Data Science?
ANS : (a) NaN stands for ‘Not a Number’ and represents missing or undefined values in datasets.
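A short Python sketch of how NaN behaves, using only the standard library. The `readings` list is a made-up example, not data from the quiz.

```python
import math

nan = float("nan")  # the IEEE 754 floating-point NaN value

# NaN is not equal to anything, including itself, so == cannot detect it.
nan_equals_itself = (nan == nan)

# math.isnan is the reliable check for a missing value stored as NaN.
readings = [21.5, nan, 19.0]  # hypothetical sensor readings
missing_count = sum(1 for r in readings if math.isnan(r))

print(nan_equals_itself)  # False
print(missing_count)      # 1
```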
6. Which of the following is NOT a type of Machine Learning?
ANS : (c) Associative Learning is a concept in psychology, not a Machine Learning type; the main types are supervised, unsupervised, and reinforcement learning.
7. What is the purpose of an activation function in a neural network?
ANS : (b) Activation functions help neural networks learn complex patterns by adding non-linearity.
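One way to see why the non-linearity matters is to compare two tiny networks. This is a sketch with made-up scalar weights `w1` and `w2`, with ReLU chosen as the example activation. Without an activation, stacked layers collapse into a single linear function. With ReLU, the result is no longer linear.

```python
def relu(x):
    # ReLU passes positive inputs through and clips negatives to zero.
    return max(0.0, x)

w1, w2 = 2.0, -1.0  # hypothetical weights of a two-layer scalar network

def linear_net(x):
    # Two layers with no activation: equivalent to one layer with weight w1 * w2.
    return w2 * (w1 * x)

def relu_net(x):
    # Same layers with ReLU in between.
    return w2 * relu(w1 * x)

# A linear function satisfies f(a + b) == f(a) + f(b); the ReLU network does not.
a, b = 1.0, -1.0
linear_is_additive = linear_net(a + b) == linear_net(a) + linear_net(b)
relu_is_additive = relu_net(a + b) == relu_net(a) + relu_net(b)

print(linear_is_additive)  # True
print(relu_is_additive)    # False
```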
8. Which technique is used to handle missing data in a dataset?
ANS : (c) Handling missing data can involve dropping rows or imputing values based on statistical methods.
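Both options can be sketched in plain Python. The `ages` column is a made-up example, `None` is assumed to mark a missing entry, and mean imputation is only one of several possible statistical fills.

```python
from statistics import mean

# Hypothetical column of ages; None marks a missing entry.
ages = [25, None, 31, 40, None]

observed = [a for a in ages if a is not None]

# Option 1: drop the rows with missing values.
dropped = observed

# Option 2: impute each missing value with the mean of the observed values.
fill_value = mean(observed)
imputed = [a if a is not None else fill_value for a in ages]

print(dropped)
print(imputed)
```

Dropping rows loses data, while imputing keeps every row at the cost of introducing estimated values.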
9. Which programming language is most commonly used in Data Science?
ANS : (c) Python is widely used due to its extensive libraries like Pandas, NumPy, and Scikit-learn.
10. What does ‘overfitting’ mean in Machine Learning?
ANS : (b) Overfitting happens when a model learns noise specific to the training data instead of general patterns, leading to poor generalization.
11. What is the purpose of cross-validation in Machine Learning?
ANS : (c) Cross-validation helps evaluate how a model performs on different subsets of data.
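A minimal sketch of how k-fold cross-validation divides the data, assuming contiguous folds with no shuffling. The function name `k_fold_indices` is illustrative and not taken from any library.

```python
def k_fold_indices(n_samples, k):
    """Return a list of (train_indices, test_indices) pairs, one per fold."""
    indices = list(range(n_samples))
    # The first (n_samples % k) folds receive one extra sample.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds = []
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        folds.append((train_idx, test_idx))
        start += size
    return folds

folds = k_fold_indices(n_samples=10, k=3)
for train_idx, test_idx in folds:
    print(len(train_idx), test_idx)
```

Each sample appears in exactly one test fold, so every data point is used for evaluation once and for training in the remaining folds.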
12. Which of the following is an unsupervised learning algorithm?
ANS : (c) K-Means is used for clustering unlabelled data into groups.
13. What does the term ‘Big Data’ refer to?
ANS : (b) Big Data consists of large datasets that require specialized tools for processing.
14. Which statistical measure is used to find the middle value in a dataset?
ANS : (b) The median is the middle value of a sorted dataset, with half of the values at or below it and half at or above it.
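A quick check with Python's standard `statistics` module, using made-up numbers. With an even count, the median is the average of the two middle values.

```python
from statistics import median

odd_count = [7, 1, 5]      # sorted: 1, 5, 7
even_count = [7, 1, 5, 3]  # sorted: 1, 3, 5, 7

print(median(odd_count))   # 5
print(median(even_count))  # 4.0
```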
15. What is the primary purpose of dimensionality reduction?
ANS : (b) Dimensionality reduction simplifies datasets while preserving important patterns.
16. Which of the following is a performance metric for classification models?
ANS : (b) Precision is the fraction of a classification model’s positive predictions that are actually positive.
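Precision can be computed by hand as true positives divided by all predicted positives. The labels below are made up for illustration, with 1 assumed to be the positive class.

```python
# Hypothetical labels: 1 = positive class, 0 = negative class.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0]

pairs = list(zip(y_true, y_pred))
true_pos = sum(1 for t, p in pairs if t == 1 and p == 1)
false_pos = sum(1 for t, p in pairs if t == 0 and p == 1)

# Precision = TP / (TP + FP): of everything predicted positive, how much was right.
precision = true_pos / (true_pos + false_pos)

print(true_pos, false_pos)  # 2 1
print(precision)
```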
17. In a normal distribution, what percentage of data falls within one standard deviation from the mean?
ANS : (b) According to the Empirical Rule, approximately 68% of data falls within one standard deviation of the mean.
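The figure can be checked with the standard library's `NormalDist`, assuming a standard normal distribution (mean 0, standard deviation 1). The proportion is the same for any normal distribution.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability mass between one standard deviation below and above the mean.
within_one_sd = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)

print(round(within_one_sd, 4))  # 0.6827
```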
18. What is the main purpose of feature scaling?
ANS : (b) Feature scaling puts numerical features on a comparable range so that features with large magnitudes do not dominate the model.
19. What is the key difference between classification and regression?
ANS : (a) Classification is for categorical outcomes, while regression is for continuous numerical predictions.
20. Which algorithm is best suited for text classification tasks?
ANS : (b) Naïve Bayes is commonly used for text classification because it is fast to train and copes well with high-dimensional word-count features.
21. What is overfitting in machine learning?
ANS : (a) Overfitting occurs when a model performs well on the training set but poorly on new, unseen data.
22. What is a confusion matrix in machine learning?
ANS : (a) A confusion matrix is a table used to describe the performance of a classification model on a set of data.
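A small sketch that builds a 2x2 confusion matrix by counting (actual, predicted) pairs. The labels are made up, and the layout (rows = actual class, columns = predicted class, both in the order 0, 1) is one common convention assumed here.

```python
from collections import Counter

# Hypothetical binary labels: 1 = positive, 0 = negative.
y_true = [1, 0, 1, 1, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 0, 1]

counts = Counter(zip(y_true, y_pred))
tn = counts[(0, 0)]  # actual 0, predicted 0
fp = counts[(0, 1)]  # actual 0, predicted 1
fn = counts[(1, 0)]  # actual 1, predicted 0
tp = counts[(1, 1)]  # actual 1, predicted 1

# Rows are actual classes, columns are predicted classes.
matrix = [[tn, fp],
          [fn, tp]]

for row in matrix:
    print(row)
```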
23. What is the purpose of the "learning rate" in machine learning?
ANS : (b) The learning rate determines how quickly a model updates its weights during training.
24. What is the difference between supervised and unsupervised learning?
ANS : (a) Supervised learning uses labeled data, while unsupervised learning works with unlabeled data.
25. What is a decision tree in machine learning?
ANS : (a) A decision tree is a predictive model that splits data based on feature values to predict an outcome.
26. What is cross-validation in machine learning?
ANS : (a) Cross-validation is used to assess how well a model generalizes to new data, helping to detect overfitting.
27. What is feature engineering?
ANS : (a) Feature engineering involves selecting, modifying, or creating features to improve model performance.
28. What is the purpose of normalization in machine learning?
ANS : (a) Normalization adjusts the values of numeric features to a common scale to improve model performance.
29. What is a neural network?
ANS : (a) A neural network is a computational model inspired by the way biological neural networks process information.
30. What is a random forest in machine learning?
ANS : (a) A random forest is an ensemble learning method that combines multiple decision trees to improve prediction accuracy.
31. What is the purpose of regularization in machine learning?
ANS : (a) Regularization reduces overfitting by adding a penalty to the loss function for large coefficients.
32. What is the "bias-variance tradeoff" in machine learning?
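ANS : (a) The bias-variance tradeoff is the balance between error from overly simple assumptions (bias) and error from sensitivity to the training data (variance), which together determine how well a model generalizes to new data.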
33. What is the purpose of the "activation function" in a neural network?
ANS : (a) Activation functions introduce non-linearity to the neural network, allowing it to model complex relationships.
34. What is "gradient descent" in machine learning?
ANS : (a) Gradient descent is an optimization algorithm that adjusts model parameters to minimize the loss function.
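A minimal sketch of the update rule on a one-parameter problem. The loss `(w - 3)^2`, the starting point, and the learning rate are all illustrative choices; real models apply the same step to many parameters at once.

```python
def gradient_descent(grad, start, learning_rate=0.1, steps=100):
    """Repeatedly step against the gradient, scaled by the learning rate."""
    w = start
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Example loss: (w - 3)^2, whose gradient is 2 * (w - 3) and whose minimum is at w = 3.
w_min = gradient_descent(lambda w: 2.0 * (w - 3.0), start=0.0)

print(round(w_min, 6))  # 3.0
```

The `learning_rate` argument controls the step size: too small and convergence is slow, too large and the updates can overshoot the minimum.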
35. What is the purpose of cross-entropy loss in classification problems?
ANS : (a) Cross-entropy loss is used to measure the difference between the predicted and true probabilities for classification tasks.
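As a worked example, cross-entropy for one sample is the negative sum of true probability times the log of predicted probability. The three-class probabilities below are made up, and the helper name `cross_entropy` is illustrative.

```python
import math

def cross_entropy(true_probs, predicted_probs):
    """Cross-entropy between a true and a predicted distribution (natural log)."""
    # Terms with a true probability of zero contribute nothing and are skipped.
    return -sum(t * math.log(p) for t, p in zip(true_probs, predicted_probs) if t > 0)

target = [0.0, 1.0, 0.0]        # one-hot: the true class is index 1
confident = [0.05, 0.90, 0.05]  # hypothetical prediction close to the target
unsure = [0.30, 0.40, 0.30]     # hypothetical prediction far from the target

loss_confident = cross_entropy(target, confident)  # equals -ln(0.90)
loss_unsure = cross_entropy(target, unsure)        # equals -ln(0.40)

print(loss_confident < loss_unsure)  # True
```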
36. What is the difference between L1 and L2 regularization?
ANS : (a) L1 regularization adds a penalty based on the absolute values of the coefficients, while L2 adds a penalty based on the squared values.
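The two penalty terms can be compared directly. In this sketch the coefficients and the strength `lam` are made-up values, and the function names are illustrative.

```python
def l1_penalty(coefs, lam):
    # L1: lam times the sum of absolute coefficient values.
    return lam * sum(abs(c) for c in coefs)

def l2_penalty(coefs, lam):
    # L2: lam times the sum of squared coefficient values.
    return lam * sum(c * c for c in coefs)

coefs = [0.5, -2.0, 0.0, 3.0]  # hypothetical model coefficients
lam = 0.1                      # hypothetical regularization strength

l1 = l1_penalty(coefs, lam)  # 0.1 * (0.5 + 2.0 + 0.0 + 3.0)
l2 = l2_penalty(coefs, lam)  # 0.1 * (0.25 + 4.0 + 0.0 + 9.0)

print(l1, l2)
```

Because L2 squares each coefficient, large coefficients are penalized much more heavily than small ones, whereas L1 penalizes all magnitudes at the same rate.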
37. What is the difference between bagging and boosting?
ANS : (a) Bagging builds models independently and then combines them, while boosting builds models sequentially to correct previous model errors.
38. What is dimensionality reduction?
ANS : (a) Dimensionality reduction reduces the number of input features in a dataset to simplify models and improve efficiency.
39. What is feature scaling?
ANS : (a) Feature scaling standardizes or normalizes data to ensure that all features have the same scale, improving model performance.
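Both approaches can be sketched with the standard library. The `heights_cm` values are made up; min-max scaling stands in for normalization and z-scores for standardization.

```python
from statistics import mean, pstdev

heights_cm = [150.0, 160.0, 170.0, 180.0, 190.0]  # hypothetical feature values

# Normalization (min-max): rescale values into the range [0, 1].
lo, hi = min(heights_cm), max(heights_cm)
min_max = [(x - lo) / (hi - lo) for x in heights_cm]

# Standardization (z-score): shift to mean 0 and scale to standard deviation 1.
mu, sigma = mean(heights_cm), pstdev(heights_cm)
z_scores = [(x - mu) / sigma for x in heights_cm]

print(min_max)  # [0.0, 0.25, 0.5, 0.75, 1.0]
print([round(z, 3) for z in z_scores])
```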
40. What is "ensemble learning" in machine learning?
ANS : (a) Ensemble learning combines multiple models to create a stronger overall model, improving accuracy.
41. What is the "curse of dimensionality" in machine learning?
ANS : (a) The curse of dimensionality refers to data becoming increasingly sparse as the number of features grows, which tends to degrade model performance unless the amount of data grows as well.
42. What is the difference between supervised and unsupervised learning?
ANS : (a) Supervised learning involves training models on labeled data, while unsupervised learning works with unlabeled data.
43. What is overfitting in machine learning?
ANS : (a) Overfitting occurs when a model becomes too complex and performs well on training data but poorly on unseen data.
44. What is the purpose of the "learning rate" in a machine learning model?
ANS : (a) The learning rate controls how much the model’s parameters are adjusted with respect to the loss gradient during training.
45. What is the purpose of dropout in neural networks?
ANS : (a) Dropout is used to prevent overfitting by randomly setting a fraction of input units to zero during training, forcing the model to generalize better.
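A minimal sketch of dropout applied to a list of activations during training. It assumes the common "inverted dropout" variant, which rescales the surviving units by 1 / (1 - rate). That rescaling is not part of the answer above, and the function name `dropout` is illustrative.

```python
import random

def dropout(activations, rate, rng):
    """Zero each unit with probability `rate`; rescale survivors (inverted dropout)."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)        # fixed seed so the run is repeatable
activations = [1.0] * 10_000  # hypothetical layer outputs
dropped = dropout(activations, rate=0.5, rng=rng)

zero_fraction = dropped.count(0.0) / len(dropped)
print(round(zero_fraction, 2))
```

At inference time dropout is switched off and all units are used.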
46. What is a confusion matrix in machine learning?
ANS : (a) A confusion matrix is a table that is used to evaluate the performance of a classification model by comparing the actual vs predicted values.
47. What is the purpose of feature engineering in machine learning?
ANS : (a) Feature engineering involves creating or selecting relevant features from raw data that will improve the performance of machine learning models.
48. What is the "bias-variance tradeoff" in machine learning?
ANS : (a) The bias-variance tradeoff refers to balancing model complexity to avoid underfitting (high bias) or overfitting (high variance).
49. What is the difference between bagging and boosting?
ANS : (a) Bagging trains models independently on different bootstrap samples of the data, while boosting trains models sequentially, giving more weight to misclassified points.
50. What is the purpose of the "exploration phase" in data science?
ANS : (a) The exploration phase involves analyzing and preparing data to better understand its structure and suitability for modeling.