Commit 4192979a authored by Bouyahya Zied

Instructions

parent ed10c8be
# Practical Work statement: Exploratory Data Analysis (EDA) on the cars dataset
**Objective:**
The goal of this practical work is to further familiarize you with the key steps of Exploratory Data Analysis (EDA) using Python. You will manipulate a dataset, clean it, and analyze the data to extract useful insights.
---
## Steps to follow:
1. **Importing required libraries for EDA**
- Import essential Python libraries for data analysis, such as `pandas`, `numpy`, `matplotlib`, `seaborn`, and `scipy`.
2. **Loading the data into a dataframe**
- Load the provided dataset into a dataframe using `pandas`.
- Display the first few rows of the dataset to get an overview of the data.
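
As a minimal sketch of steps 1 and 2: the snippet below reads a small inline CSV in place of the provided file, so the column names and values are hypothetical placeholders and not the real cars dataset. The other libraries are conventionally imported as `import numpy as np`, `import matplotlib.pyplot as plt`, `import seaborn as sns`, and `from scipy import stats` once they are needed.

```python
import io

import pandas as pd

# Hypothetical stand-in for the provided file. With the real dataset you
# would pass its path instead, e.g. pd.read_csv("cars.csv") (name assumed).
csv_text = """make,model,year,engine_hp,price
Audi,A4,2015,220,35000
BMW,320i,2016,180,33000
Ford,Focus,2014,160,18000
Toyota,Corolla,2017,132,19000
Honda,Civic,2016,158,20000
Fiat,500,2013,101,16000
"""

df = pd.read_csv(io.StringIO(csv_text))

# head() shows the first five rows by default; shape is (rows, columns).
print(df.head())
print(df.shape)
```
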
3. **Checking data types**
- Identify the data types of each column in the dataset (e.g., integer, float, object).
- Ensure that the data types are appropriate for the analysis.
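
One possible way to carry out step 3 is sketched below on a hypothetical frame in which numeric columns were read as text. `pd.to_numeric` with `errors="coerce"` turns unparseable entries into `NaN` so they can be handled later with the other missing values.

```python
import pandas as pd

# Hypothetical data: numbers stored as text, with one unparseable price.
df = pd.DataFrame({
    "make": ["Audi", "BMW", "Ford"],
    "year": ["2015", "2016", "2014"],
    "price": ["35000", "33000", "n/a"],
})

# Inspect the type pandas inferred for each column.
print(df.dtypes)

# Convert text columns to numbers; invalid entries become NaN.
df["year"] = pd.to_numeric(df["year"], errors="coerce")
df["price"] = pd.to_numeric(df["price"], errors="coerce")

print(df.dtypes)
```
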
4. **Dropping irrelevant columns**
- Remove any columns that are not relevant to the analysis or do not contribute to the insights.
5. **Renaming columns**
- Rename columns to make them more descriptive or easier to work with.
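
Steps 4 and 5 could look like the sketch below. Which columns count as irrelevant depends on your analysis, so the column names and the choice to drop `Market Category` are assumptions made purely for illustration.

```python
import pandas as pd

# Hypothetical columns; the real dataset may use different names.
df = pd.DataFrame({
    "Make": ["Audi", "BMW", "Ford"],
    "Engine HP": [220, 180, 160],
    "MSRP": [35000, 33000, 18000],
    "Market Category": ["Luxury", "Luxury", "Compact"],
})

# Drop a column assumed to be irrelevant for this analysis.
df = df.drop(columns=["Market Category"])

# Rename the remaining columns to short, consistent names.
df = df.rename(columns={
    "Make": "make",
    "Engine HP": "engine_hp",
    "MSRP": "price",
})

print(df.columns.tolist())
```
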
6. **Dropping duplicate rows**
- Identify and remove any duplicate rows in the dataset to ensure data integrity.
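
A short sketch of step 6 on hypothetical data: `duplicated()` flags rows that repeat an earlier row, and `drop_duplicates()` keeps the first occurrence of each.

```python
import pandas as pd

# Hypothetical data in which the first row appears twice.
df = pd.DataFrame({
    "make": ["Audi", "BMW", "Audi", "Ford"],
    "model": ["A4", "320i", "A4", "Focus"],
    "price": [35000, 33000, 35000, 18000],
})

# Count rows that are exact copies of an earlier row.
n_dupes = int(df.duplicated().sum())
print(f"duplicate rows: {n_dupes}")

# Keep the first occurrence of each row and renumber the index.
deduped = df.drop_duplicates().reset_index(drop=True)
print(deduped)
```
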
7. **Handling missing or null values**
- Detect missing or null values in the dataset.
- Decide on a strategy to handle them (e.g., imputation, removal).
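
The two strategies named in step 7 can be compared on a small example. The sketch below uses hypothetical data, and median imputation is only one of several reasonable choices; the right strategy depends on how much data is missing and why.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing make and one missing engine_hp.
df = pd.DataFrame({
    "make": ["Audi", "BMW", None, "Ford"],
    "engine_hp": [220.0, np.nan, 160.0, 140.0],
    "price": [35000, 33000, 18000, 17000],
})

# Count missing values per column.
missing_counts = df.isna().sum()
print(missing_counts)

# Strategy A (removal): drop every row that has any missing value.
dropped = df.dropna()

# Strategy B (imputation): fill the numeric gap with the column median,
# then drop only the rows that still lack a make.
imputed = df.copy()
imputed["engine_hp"] = imputed["engine_hp"].fillna(imputed["engine_hp"].median())
imputed = imputed.dropna(subset=["make"])

print(len(dropped), len(imputed))
```
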
8. **Detecting outliers**
- Identify outliers in the dataset using statistical methods or visualization techniques.
- Decide whether to remove, transform, or keep the outliers based on the context.
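
One common statistical method for step 8 is the 1.5 × IQR rule, sketched below on hypothetical prices. The statement does not prescribe a method, so treat this as one option among others (z-scores and box plots are frequent alternatives).

```python
import pandas as pd

# Hypothetical prices with one extreme value.
prices = pd.Series(
    [16000, 18000, 19000, 20000, 33000, 35000, 250000], name="price"
)

# Interquartile range and the conventional 1.5 * IQR fences.
q1 = prices.quantile(0.25)
q3 = prices.quantile(0.75)
iqr = q3 - q1
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Values outside the fences are flagged, not removed automatically.
outliers = prices[(prices < lower) | (prices > upper)]
print(outliers)
```
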
9. **Univariate, bivariate, and multivariate analysis**
- Perform univariate analysis to understand the distribution of individual variables.
- Conduct bivariate analysis to explore relationships between two variables.
- Perform multivariate analysis to understand interactions between multiple variables.
- Use visualizations (e.g., histograms, scatter plots, heatmaps) to support your analysis.
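
The three levels of analysis in step 9 might be sketched as follows. The data is hypothetical, and plain `matplotlib` is used for every plot to keep the example self-contained; seaborn's `heatmap` is a common alternative for the correlation matrix.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this in a notebook

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric columns.
df = pd.DataFrame({
    "engine_hp": [101, 132, 158, 160, 180, 220],
    "year": [2013, 2017, 2016, 2014, 2016, 2015],
    "price": [16000, 19000, 20000, 18000, 33000, 35000],
})

# Univariate: summary statistics of a single variable.
summary = df["price"].describe()

# Bivariate: Pearson correlation between two variables.
hp_price_corr = df["engine_hp"].corr(df["price"])

# Multivariate: pairwise correlations across all numeric columns.
corr_matrix = df.corr()

fig, axes = plt.subplots(1, 3, figsize=(12, 4))

axes[0].hist(df["price"], bins=5)
axes[0].set_title("Price distribution")

axes[1].scatter(df["engine_hp"], df["price"])
axes[1].set_xlabel("engine_hp")
axes[1].set_ylabel("price")
axes[1].set_title("Price vs. engine_hp")

im = axes[2].imshow(corr_matrix.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
ticks = range(len(corr_matrix.columns))
axes[2].set_xticks(ticks)
axes[2].set_xticklabels(corr_matrix.columns, rotation=45)
axes[2].set_yticks(ticks)
axes[2].set_yticklabels(corr_matrix.columns)
axes[2].set_title("Correlation heatmap")
fig.colorbar(im, ax=axes[2])

fig.tight_layout()
```

In a notebook the figure renders at the end of the cell; in a script, call `plt.show()` or `fig.savefig(...)` to see it.
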