Commit 5ad3d496 authored by Bouyahya Zied

Delete Questions_EDA.md

parent 18ebc6f1
# Exploratory Data Analysis (EDA) Questions
These questions are designed to help students understand the concepts of EDA and apply them to the provided dataset. The questions cover general concepts, data cleaning, univariate analysis, outlier detection, visualization, critical thinking, application, and reflection.
---
## **General Questions**
1. What is the purpose of Exploratory Data Analysis (EDA)?
2. Why is it important to inspect the first few rows of a dataset before starting EDA?
3. What does the `df.info()` function tell us about the dataset?
4. Why is it useful to set a column as the index in a DataFrame?
5. What is the difference between `df.head()` and `df.info()`?
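
As a starting point for questions 3 to 5, the sketch below builds a tiny made-up DataFrame and shows what `head()`, `info()`, and `set_index()` each give back. The values and the `id` column are assumptions for illustration only, not the course dataset.

```python
import io
import pandas as pd

# Hypothetical toy data; the real dataset is not reproduced here.
df = pd.DataFrame({
    "id": [101, 102, 103],
    "neighbourhood_group": ["Brooklyn", "Manhattan", "Queens"],
    "price": [149, 225, 89],
})

# head() returns the first rows as a DataFrame, so you see actual values.
first_rows = df.head(2)

# info() prints a structural summary (dtypes, non-null counts, memory use)
# and returns None, so the text is captured through a buffer here.
buffer = io.StringIO()
df.info(buf=buffer)
summary_text = buffer.getvalue()

# Setting a column as the index allows label-based lookups with .loc.
indexed = df.set_index("id")
price_of_102 = indexed.loc[102, "price"]

print(first_rows)
print(summary_text)
print(price_of_102)
```
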
---
## **Data Cleaning Questions**
6. What does the `isnull().sum()` function do? Why is it important to check for missing values?
7. How do we handle missing values in the 'reviews_per_month' column? Why did we choose to fill them with 0?
8. Why might we drop columns like 'last_review' and 'host_name' from the dataset?
9. What does the `dropna()` function do? Why is it important to remove rows with missing values in certain columns?
10. What is the purpose of checking for duplicates in the dataset? How do we remove them?
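
The cleaning steps named in questions 6 to 10 can be sketched on a small made-up DataFrame. The rows below are invented, and the choice of `name` as the column that must not be missing is an assumption; only the column names `reviews_per_month`, `last_review`, and `host_name` come from the questions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data using column names mentioned in the questions.
df = pd.DataFrame({
    "name": ["Cozy loft", "Sunny room", None, "Sunny room"],
    "host_name": ["Ann", "Bob", "Cy", "Bob"],
    "last_review": ["2019-05-01", None, "2019-06-10", None],
    "reviews_per_month": [1.2, np.nan, 0.4, np.nan],
    "price": [150, 80, 200, 80],
})

# Count missing values per column.
missing_counts = df.isnull().sum()

# One common rationale for filling with 0: a listing with no reviews
# has no monthly review rate to report.
df["reviews_per_month"] = df["reviews_per_month"].fillna(0)

# Drop columns that are not needed for the analysis.
df = df.drop(columns=["last_review", "host_name"])

# Drop rows where a required column is missing ('name' is an assumed choice).
df = df.dropna(subset=["name"])

# Remove exact duplicate rows, keeping the first occurrence.
df = df.drop_duplicates()

print(missing_counts)
print(df)
```
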
---
## **Univariate Analysis Questions**
11. What is univariate analysis, and why is it important in EDA?
12. What are the numerical variables in this dataset? Why are they important to analyze?
13. What is the purpose of using histograms in EDA?
14. What are bins in a histogram, and why are they important?
15. What is Sturges' Rule, and how is it used to determine the number of bins in a histogram?
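
For question 15, Sturges' Rule is commonly written as k = 1 + log2(n), rounded up, where n is the number of observations. A minimal sketch follows; the helper name `sturges_bins` is hypothetical.

```python
import math


def sturges_bins(n: int) -> int:
    """Number of histogram bins suggested by Sturges' Rule: ceil(log2(n)) + 1."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return math.ceil(math.log2(n)) + 1


# The suggested bin count grows slowly as the sample size grows.
for n in (100, 1000, 10000):
    print(n, sturges_bins(n))
```
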
---
## **Outlier Detection Questions**
16. What are outliers, and why are they important to identify in a dataset?
17. How did we filter out extreme outliers in the 'price' column? Why did we choose $1000 as the threshold?
18. What does the `describe()` function tell us about the 'price' column?
19. How does removing outliers affect the distribution of data in a histogram?
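
To see the effect asked about in questions 17 to 19, compare `describe()` before and after filtering. The prices below are invented, and using a strict `< 1000` comparison is an assumption about how the threshold was applied.

```python
import pandas as pd

# Hypothetical prices with two extreme values.
prices = pd.DataFrame({"price": [60, 75, 90, 120, 150, 200, 350, 5000, 10000]})

# Summary statistics with the extreme values included.
before = prices["price"].describe()

# Keep only rows below the threshold (strict inequality is an assumption).
filtered = prices[prices["price"] < 1000]
after = filtered["price"].describe()

# Mean and max are pulled up by the outliers; the median moves far less.
print(before[["mean", "50%", "max"]])
print(after[["mean", "50%", "max"]])
```
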
---
## **Visualization Questions**
20. What does the `sns.histplot()` function do? How does it help us understand the distribution of the 'price' column?
21. What is the purpose of the `kde=True` parameter in the `sns.histplot()` function?
22. How do bin edges help us interpret the histogram?
23. What do the title and axis labels of a histogram tell us about the data?
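
For question 22, it can help to look at bin edges as numbers rather than bars. The sketch below uses `np.histogram` instead of `sns.histplot` (a deliberate swap, so no plot is drawn) on invented prices, and prints the kind of per-bin counts a histogram displays.

```python
import numpy as np

# Hypothetical prices.
prices = np.array([50, 60, 75, 80, 95, 110, 120, 150, 180, 240])

# Four equal-width bins need five edges; each bin spans [left, right),
# except the last one, which also includes its right edge.
counts, edges = np.histogram(prices, bins=4)

for left, right, count in zip(edges[:-1], edges[1:], counts):
    print(f"{left:.1f} to {right:.1f}: {count} listings")
```
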
---
## **Critical Thinking Questions**
24. Why is it important to clean and preprocess data before performing EDA?
25. What challenges might arise if we don't handle missing values or outliers in a dataset?
26. How does the choice of bin size affect the interpretation of a histogram?
27. What other visualizations (besides histograms) could be useful for univariate analysis?
28. If you were to analyze another column (e.g., 'number_of_reviews'), how would you approach it?
---
## **Application Questions**
29. Using the `describe()` function, what insights can you gather about the 'minimum_nights' column?
30. Create a histogram for the 'availability_365' column. What does the distribution tell you?
31. Filter the dataset to include only listings with prices below $500. How does this affect the distribution of prices?
32. Calculate the number of bins for the 'number_of_reviews' column using the Square Root Rule. How does it compare to Sturges' Rule?
33. What other columns in the dataset might benefit from outlier detection and filtering? Why?
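
As a starting point for question 32, the Square Root Rule takes the number of bins as the square root of the number of observations, rounded up. The sketch below compares it with Sturges' Rule on an invented `number_of_reviews` series; replace it with the real column to answer the question.

```python
import math
import pandas as pd

# Hypothetical review counts: 10 values repeated 10 times, so n = 100.
reviews = pd.Series([0, 1, 3, 5, 8, 12, 20, 35, 60, 150] * 10,
                    name="number_of_reviews")

n = len(reviews)

# Square Root Rule: k = ceil(sqrt(n)).
sqrt_bins = math.ceil(math.sqrt(n))

# Sturges' Rule: k = ceil(log2(n)) + 1.
sturges = math.ceil(math.log2(n)) + 1

print(f"n = {n}: square root rule -> {sqrt_bins}, Sturges -> {sturges}")
```
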
---
## **Reflection Questions**
34. What did you learn about the dataset from this EDA process?
35. What challenges did you face while performing EDA, and how did you overcome them?
36. How would you explain the importance of EDA to someone who has never worked with data before?
37. If you were to present your findings to a non-technical audience, how would you summarize the key insights?
38. What additional steps would you take to further analyze this dataset?
---
## **Bonus Questions**
39. What is the difference between univariate, bivariate, and multivariate analysis?
40. How would you use correlation analysis to explore relationships between numerical variables in this dataset?
41. What is the purpose of using a boxplot in EDA? Create a boxplot for the 'price' column and interpret the results.
42. How would you use `groupby` and aggregation functions to analyze categorical variables like 'neighbourhood_group'?
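
Questions 40 and 42 can be approached along these lines. The rows below are invented; only the column names come from the questions, and the choice of `count`, `mean`, and `median` as aggregations is an assumption.

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "neighbourhood_group": ["Brooklyn", "Brooklyn", "Manhattan", "Manhattan", "Queens"],
    "price": [100, 140, 200, 260, 80],
    "number_of_reviews": [30, 12, 8, 4, 45],
})

# Summarise a numerical column within each category.
summary = df.groupby("neighbourhood_group")["price"].agg(["count", "mean", "median"])

# Pairwise Pearson correlation between numerical columns.
corr = df[["price", "number_of_reviews"]].corr()

print(summary)
print(corr)
```
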
---