1. What is data analysis?
ANS : (b) Data analysis is the process of examining data to discover useful information, draw conclusions, and support decision-making.
2. Which of the following is NOT a type of data analysis?
ANS : (d) The main types of data analysis are descriptive, diagnostic, predictive, and prescriptive. "Creative analysis" is not a recognized type.
3. What is the purpose of exploratory data analysis (EDA)?
ANS : (a) EDA helps in understanding patterns, trends, and relationships within the dataset.
4. What does SQL stand for?
ANS : (a) SQL stands for Structured Query Language. It is used to manage and manipulate relational databases.
5. What does "Big Data" refer to?
ANS : (b) Big Data involves high volume, velocity, and variety of data.
6. What is the function of a data warehouse?
ANS : (c) Data warehouses aggregate and store structured data from different sources for reporting and analysis.
7. What does a histogram represent?
ANS : (b) Histograms show how numerical values are distributed over intervals.
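As a rough illustration of how a histogram groups values into intervals, the sketch below counts values per bin in plain Python. The sample ages, the bin edges, and the helper name histogram_counts are assumptions made up for this example.

```python
def histogram_counts(values, bin_edges):
    """Count how many values fall in each bin [edge_i, edge_i+1)."""
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        for i in range(len(counts)):
            lo, hi = bin_edges[i], bin_edges[i + 1]
            # The last bin is closed on the right so the maximum is counted.
            is_last = i == len(counts) - 1
            if lo <= v < hi or (is_last and v == hi):
                counts[i] += 1
                break
    return counts


ages = [21, 25, 29, 33, 35, 38, 41, 47, 52, 60]  # hypothetical sample
edges = [20, 30, 40, 50, 60]                     # hypothetical bin edges
counts = histogram_counts(ages, edges)

# Print a simple text histogram, one row per interval.
for i, c in enumerate(counts):
    print(f"{edges[i]}-{edges[i + 1]}: {'#' * c}")
```

Each row of the printout corresponds to one bar of the histogram, and the bar length is the frequency of values in that interval.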
8. Which Python library is commonly used for data analysis?
ANS : (b) NumPy provides numerical computing tools, making it essential for data analysis.
9. What is the role of a data analyst?
ANS : (c) Data analysts work with data to uncover trends, insights, and solutions.
10. What does "data cleaning" involve?
ANS : (b) Data cleaning involves correcting or removing inaccurate, incomplete, or duplicate records to ensure accuracy and consistency for analysis.
11. What is a key characteristic of structured data?
ANS : (b) Structured data is organized in a predefined format, like databases.
12. What is the purpose of regression analysis?
ANS : (b) Regression analysis helps determine how variables relate to each other.
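One way to see how regression relates two variables is a simple least-squares line fit. The sketch below is a minimal version for one predictor only; the ad_spend and sales figures and the fit_line helper are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


ad_spend = [1, 2, 3, 4, 5]   # hypothetical predictor
sales = [3, 5, 7, 9, 11]     # hypothetical response, exactly 2 * x + 1
slope, intercept = fit_line(ad_spend, sales)
print(slope, intercept)
```

Because the sample data lie exactly on a line, the fit recovers a slope of 2 and an intercept of 1; real data would leave residuals around the fitted line.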
13. What is a KPI in data analysis?
ANS : (a) KPI stands for Key Performance Indicator. KPIs measure business success using specific metrics.
14. What is an outlier in data?
ANS : (b) An outlier is a data point that differs significantly from the rest of the data. Outliers can indicate errors or special cases.
15. What is the primary tool for creating pivot tables in Excel?
ANS : (d) Pivot Tables help summarize and analyze large datasets.
16. What is the significance of data visualization?
ANS : (b) Data visualization helps identify patterns, trends, and insights.
17. Which chart type is best for showing trends over time?
ANS : (c) Line charts are ideal for visualizing trends over time.
18. What does "ETL" stand for in data analysis?
ANS : (a) ETL stands for Extract, Transform, Load. It is a process for moving and preparing data for analysis.
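The three ETL stages can be sketched with the Python standard library alone. This is only an illustration under assumptions: the CSV text, the sales table, and the extract, transform, and load helpers are invented for the example, and an in-memory SQLite database stands in for a real target system.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with inconsistent spacing and casing.
RAW_CSV = "region,amount\nNorth, 100\nsouth,250\nnorth,50\n"


def extract(text):
    """Extract: read rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize region names and convert amounts to numbers."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in rows]


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# Once loaded, the data can be queried for analysis.
totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```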
19. Which of these tools is widely used for data visualization?
ANS : (a) Power BI is a Microsoft tool for interactive data visualization.
20. What is the main goal of hypothesis testing in data analysis?
ANS : (b) Hypothesis testing helps determine the validity of assumptions using statistical methods.
21. What is the difference between correlation and causation in data analysis?
ANS : (d) Correlation refers to a relationship between two variables, but it does not imply that one causes the other.
22. What is a p-value in hypothesis testing?
ANS : (c) The p-value represents the probability of obtaining a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.
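To make the definition concrete, the sketch below computes an exact one-sided p-value for a coin-flip experiment: the probability of seeing at least the observed number of heads if the coin is fair. The scenario (9 heads in 10 flips) and the binomial_p_value helper are assumptions chosen for illustration.

```python
from math import comb


def binomial_p_value(heads, flips, p_null=0.5):
    """P(X >= heads) when X ~ Binomial(flips, p_null), i.e. under the null."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )


# Hypothetical observation: 9 heads in 10 flips of a supposedly fair coin.
p_value = binomial_p_value(heads=9, flips=10)
print(round(p_value, 4))  # 11/1024, about 0.0107
```

A value this small means that results at least this extreme would be rare if the null hypothesis (a fair coin) were true.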
23. What is the purpose of data normalization?
ANS : (b) Data normalization scales the data to a specific range, typically between 0 and 1, so that features with larger magnitudes do not dominate statistical or machine learning models.
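A common form of this rescaling is min-max normalization, x' = (x - min) / (max - min). The sketch below applies it to a hypothetical list of incomes; the min_max_scale name and the choice to map a constant column to 0.0 are assumptions of this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: there is no range to scale, so return zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


incomes = [20_000, 35_000, 50_000, 80_000]  # hypothetical feature
scaled = min_max_scale(incomes)
print(scaled)  # [0.0, 0.25, 0.5, 1.0]
```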
24. Which of the following is a measure of central tendency?
ANS : (d) The mean, mode, and median are all measures of central tendency used to describe the center of a data distribution.
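The three measures can be compared on a small sample using Python's statistics module. The scores below are hypothetical and include one high value to show how the mean is pulled upward while the median and mode are not.

```python
import statistics

scores = [70, 75, 75, 80, 100]  # hypothetical exam scores

mean_score = statistics.mean(scores)      # arithmetic average: 400 / 5
median_score = statistics.median(scores)  # middle value of the sorted data
mode_score = statistics.mode(scores)      # most frequent value

print(mean_score, median_score, mode_score)
```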
25. What is the purpose of outlier detection in data analysis?
ANS : (b) Outlier detection helps identify data points that significantly differ from the rest, which could distort statistical models.
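One widely used rule of thumb flags values that lie more than 1.5 times the interquartile range (IQR) beyond the first or third quartile. The sketch below assumes that rule and the "inclusive" quartile convention of statistics.quantiles; other conventions give slightly different fences. The data and the iqr_outliers helper are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k * IQR, Q3 + k * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


delivery_days = [10, 12, 12, 13, 14, 15, 16, 95]  # hypothetical sample
outliers = iqr_outliers(delivery_days)
print(outliers)  # [95]
```

Whether a flagged point is an error or a genuine special case still has to be judged from context; the rule only identifies candidates.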
26. What is a box plot used for in data analysis?
ANS : (a) A box plot visualizes the distribution, including the median, quartiles, and potential outliers in a dataset.
27. What does the term "data wrangling" refer to?
ANS : (a) Data wrangling involves cleaning, transforming, and organizing data into a usable format for analysis.
28. Which of the following is an example of unstructured data?
ANS : (b) A text document is an example of unstructured data, which lacks a predefined data model.
29. What is the purpose of feature engineering in data analysis?
ANS : (a) Feature engineering involves creating new features or modifying existing ones to improve the performance of machine learning models.
30. Which of the following is the first step in the data analysis process?
ANS : (a) The first step is data collection, where data is gathered from various sources before cleaning and analyzing.
31. What is the purpose of data imputation?
ANS : (b) Data imputation refers to replacing missing values with estimated or predicted values to complete the dataset.
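The simplest estimate to fill in is the mean of the observed values. The sketch below shows mean imputation only, as one of several possible strategies; the temperature readings, the use of None for missing entries, and the impute_mean helper are assumptions of this example.

```python
def impute_mean(values):
    """Replace None entries with the mean of the observed (non-None) values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute: no observed values")
    fill = sum(observed) / len(observed)
    return [fill if v is None else v for v in values]


temps = [20.0, None, 22.0, 24.0, None]  # hypothetical readings with gaps
filled = impute_mean(temps)
print(filled)  # [20.0, 22.0, 22.0, 24.0, 22.0]
```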
32. What is multicollinearity in regression analysis?
ANS : (a) Multicollinearity occurs when two or more independent variables in a regression model are highly correlated, which can lead to unreliable estimates.
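For the special case of two predictors, multicollinearity can be checked with their Pearson correlation r, and the variance inflation factor reduces to VIF = 1 / (1 - r^2). The sketch below assumes that two-predictor case; the floor_area and room_count values are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


floor_area = [1, 2, 3, 4, 5]  # hypothetical predictor 1
room_count = [1, 2, 3, 4, 6]  # hypothetical predictor 2, nearly the same signal

r = pearson_r(floor_area, room_count)
vif = 1 / (1 - r**2)  # valid as written only when there are two predictors
print(round(r, 3), round(vif, 1))
```

A large VIF signals that the coefficient estimates for these predictors will have inflated variance, which is the unreliability the answer refers to.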
33. What is the purpose of cross-validation in machine learning?
ANS : (d) Cross-validation involves splitting the data into multiple subsets and repeatedly training on some while validating on the others, which gives a more reliable estimate of performance on unseen data and helps detect overfitting.
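The splitting step can be sketched as k-fold index generation: each fold serves once as the test set while the remaining folds form the training set. This minimal version does not shuffle the data and does not train a model; the k_fold_indices helper is hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

In practice a model would be fitted on each training set and scored on the matching test set, and the scores averaged.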
34. What does PCA (Principal Component Analysis) do in data analysis?
ANS : (a) PCA is a technique used to reduce the dimensionality of the data while retaining as much variance as possible.
35. What is a confusion matrix used for in classification models?
ANS : (b) A confusion matrix helps evaluate the performance of a classification model by comparing predicted results with actual values.
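For a binary classifier, the comparison of predicted and actual values comes down to four counts: true positives, false positives, true negatives, and false negatives. The sketch below tallies them for hypothetical labels; the confusion_counts helper and the choice of 1 as the positive class are assumptions.

```python
def confusion_counts(actual, predicted, positive=1):
    """Tally the four cells of a binary confusion matrix."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1  # predicted positive, actually negative
        elif a == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}


actual = [1, 0, 1, 1, 0, 0, 1, 0]     # hypothetical true labels
predicted = [1, 0, 0, 1, 0, 1, 1, 0]  # hypothetical model output

cm = confusion_counts(actual, predicted)
accuracy = (cm["tp"] + cm["tn"]) / len(actual)
precision = cm["tp"] / (cm["tp"] + cm["fp"])
recall = cm["tp"] / (cm["tp"] + cm["fn"])
print(cm, accuracy, precision, recall)
```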
36. What is the purpose of a ROC curve in binary classification?
ANS : (b) The ROC curve plots the true positive rate (sensitivity) against the false positive rate, helping assess the performance of binary classification models.
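The points of an ROC curve can be computed by sweeping a decision threshold over the model's scores and recording the false positive rate and true positive rate at each step. The sketch below assumes labels of 0 and 1 with at least one of each; the scores and the roc_points helper are hypothetical.

```python
def roc_points(actual, scores):
    """Return (false_positive_rate, true_positive_rate) pairs, one per threshold."""
    positives = sum(actual)
    negatives = len(actual) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 1)
        fp = sum(1 for a, s in zip(actual, scores) if s >= t and a == 0)
        points.append((fp / negatives, tp / positives))
    return points


actual = [1, 1, 0, 1, 0, 0]                 # hypothetical true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]     # hypothetical predicted scores

points = roc_points(actual, scores)

# Area under the curve by the trapezoidal rule.
auc = sum(
    (x1 - x0) * (y0 + y1) / 2
    for (x0, y0), (x1, y1) in zip(points, points[1:])
)
print(points)
print(round(auc, 3))
```

An area close to 1 indicates that positives tend to receive higher scores than negatives, while 0.5 corresponds to random ranking.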
37. What is a time series analysis used for?
ANS : (a) Time series analysis is used to analyze data points collected or recorded at specific time intervals and to forecast future values.
38. What is clustering in data analysis?
ANS : (b) Clustering is an unsupervised learning technique used to group similar data points into clusters based on their features.
39. What does feature scaling do in data analysis?
ANS : (a) Feature scaling is the process of standardizing the range of independent variables in the dataset, so that features measured on larger scales do not dominate distance-based or gradient-based models.
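One common scaling method is z-score standardization, z = (x - mean) / standard deviation, which gives a feature a mean of 0 and a standard deviation of 1. The sketch below assumes the population standard deviation; the heights and the standardize helper are hypothetical.

```python
import statistics


def standardize(values):
    """Return z-scores using the population standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        raise ValueError("cannot standardize a constant feature")
    return [(v - mean) / sd for v in values]


heights_cm = [150, 160, 170, 180, 190]  # hypothetical feature
z_scores = standardize(heights_cm)
print([round(z, 3) for z in z_scores])
```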
40. What is outlier detection in data analysis?
ANS : (a) Outlier detection is the process of identifying and handling extreme or erroneous data points that could distort statistical analysis.
41. What is a box plot used for in data analysis?
ANS : (a) A box plot shows the distribution of data based on five statistics: minimum, first quartile, median, third quartile, and maximum.
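The five statistics behind a box plot can be computed directly. The sketch below uses statistics.quantiles with the "inclusive" method, which is one of several quartile conventions, so plotting tools may report slightly different quartiles. The data and the five_number_summary helper are hypothetical.

```python
import statistics


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return min(values), q1, median, q3, max(values)


wait_minutes = [2, 4, 4, 5, 7, 8, 9, 10, 12]  # hypothetical sample
summary = five_number_summary(wait_minutes)
print(summary)  # (2, 4.0, 7.0, 9.0, 12)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend toward the minimum and maximum.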
42. What is a decision tree in machine learning?
ANS : (b) A decision tree is a supervised machine learning model used for classification and regression. It makes predictions by repeatedly splitting the data on feature values, forming a tree of decision rules.
43. What is the difference between supervised and unsupervised learning?
ANS : (a) Supervised learning uses labeled data to train models, while unsupervised learning works with unlabeled data to find hidden patterns or groupings.
44. What does the term 'overfitting' mean in machine learning?
ANS : (b) Overfitting occurs when a model learns the details and noise of the training data to the extent that it negatively impacts the performance on new data.
45. What is the purpose of feature engineering in data science?
ANS : (a) Feature engineering involves creating new features or modifying existing ones to improve the performance of machine learning models.
46. What is the purpose of normalization in data analysis?
ANS : (a) Normalization rescales the features to a specific range, often [0,1], to ensure that no single feature dominates the model.
47. What is a histogram used for in data analysis?
ANS : (a) A histogram is used to visualize the distribution of numerical data by splitting it into bins and displaying the frequency of data points in each bin.
48. What is the purpose of a scatter plot?
ANS : (a) A scatter plot is used to visualize the relationship or correlation between two continuous variables.
49. What is the significance of the p-value in hypothesis testing?
ANS : (c) The p-value helps determine the significance of the results in hypothesis testing, with a smaller p-value indicating stronger evidence against the null hypothesis.
50. What is the purpose of a confusion matrix in classification models?
ANS : (a) A confusion matrix is used to evaluate the performance of classification algorithms by comparing predicted labels with actual labels.