diff --git a/EDA Application/Instructions.md b/EDA Application/Instructions.md
new file mode 100644
index 0000000000000000000000000000000000000000..d463afe195da25b8b44f01b4403e623b80cb40b0
--- /dev/null
+++ b/EDA Application/Instructions.md
@@ -0,0 +1,42 @@
+# Practical Work statement: Exploratory Data Analysis (EDA) on the cars dataset
+
+**Objective:**
+The goal of this practical work is to further familiarize you with the key steps of Exploratory Data Analysis (EDA) using Python. You will manipulate a dataset, clean it, and analyze the data to extract useful insights.
+
+---
+
+## Steps to follow:
+
+1. **Importing required libraries for EDA**
+   - Import essential Python libraries for data analysis, such as `pandas`, `numpy`, `matplotlib`, `seaborn`, and `scipy`.
+
+2. **Loading the data into a dataframe**
+   - Load the provided dataset into a dataframe using `pandas`.
+   - Display the first few rows of the dataset to get an overview of the data.
+
+3. **Checking data types**
+   - Identify the data type of each column in the dataset (e.g., integer, float, object).
+   - Ensure that the data types are appropriate for the analysis.
+
+4. **Dropping irrelevant columns**
+   - Remove any columns that are not relevant to the analysis or do not contribute to the insights.
+
+5. **Renaming columns**
+   - Rename columns to make them more descriptive or easier to work with.
+
+6. **Dropping duplicate rows**
+   - Identify and remove any duplicate rows in the dataset to ensure data integrity.
+
+7. **Handling missing or null values**
+   - Detect missing or null values in the dataset.
+   - Decide on a strategy to handle them (e.g., imputation, removal).
+
+8. **Detecting outliers**
+   - Identify outliers in the dataset using statistical methods or visualization techniques.
+   - Decide whether to remove, transform, or keep the outliers based on the context.
+
+9. **Univariate, bivariate, and multivariate analysis**
+   - Perform univariate analysis to understand the distribution of individual variables.
+   - Conduct bivariate analysis to explore relationships between two variables.
+   - Perform multivariate analysis to understand interactions between multiple variables.
+   - Use visualizations (e.g., histograms, scatter plots, heatmaps) to support your analysis.
\ No newline at end of file
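
The steps listed in the instructions could be sketched roughly as follows. This is a minimal illustration, not the expected solution: the small in-memory table stands in for the provided cars dataset, and the column names (`Make`, `Engine HP`, `MSRP`, `Market Category`), the helper names `clean` and `iqr_outliers`, the choice to drop rows with missing values, and the 1.5 × IQR outlier rule are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Step 2: hypothetical stand-in for the cars dataset. With the real file you
# would load it with pd.read_csv(...) instead of building it by hand.
raw = pd.DataFrame(
    {
        "Make": ["A", "A", "B", "B", "C", "C", "C"],
        "Engine HP": [130.0, 130.0, 150.0, np.nan, 200.0, 210.0, 900.0],
        "MSRP": [20000, 20000, 25000, 27000, 40000, 42000, 43000],
        "Market Category": ["x", "x", "y", "y", "z", "z", "z"],
    }
)
print(raw.head())   # overview of the first rows
print(raw.dtypes)   # step 3: data type of each column


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Apply steps 4 to 7 and return a new, cleaned dataframe."""
    out = df.drop(columns=["Market Category"])  # step 4: assumed irrelevant
    out = out.rename(
        columns={"Make": "make", "Engine HP": "hp", "MSRP": "price"}
    )  # step 5: shorter names
    out = out.drop_duplicates()  # step 6: remove exact duplicate rows
    out = out.dropna()  # step 7: removal strategy (imputation is the alternative)
    return out.reset_index(drop=True)


def iqr_outliers(s: pd.Series) -> pd.Series:
    """Step 8: flag values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)


cars = clean(raw)
print(raw.isna().sum())            # step 7: missing values per column, before cleaning
print(cars[iqr_outliers(cars["hp"])])  # step 8: rows flagged as horsepower outliers

# Step 9: numeric summaries that the plots would visualize.
print(cars["hp"].describe())            # univariate
print(cars[["hp", "price"]].corr())     # bivariate / multivariate (correlation matrix)
```

For the visual part of step 9, the same dataframe can be passed to the plotting libraries named in step 1, for example a histogram of `hp`, a scatter plot of `hp` against `price`, and a heatmap of the correlation matrix.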