EDA in Machine Learning: An Overview


What is Exploratory Data Analysis (EDA)?

Exploratory data analysis (EDA) is a method for summarizing data, identifying patterns and relationships, and detecting outliers. It is most useful when a data set is large or complex, because it helps you understand the data before any modeling begins. There are many EDA techniques, but the most common are visual methods, such as plotting the data on a graph, and statistical methods, such as calculating summary statistics. EDA is an important step in data analysis and can be applied to both qualitative and quantitative data.


Steps Involved in Exploratory Data Analysis

Let us look at the steps involved in exploratory data analysis.

Identifying the Data Source(s) and Data Collection

To understand the data, identify the data source(s) and the data collection process first. It is possible to use primary or secondary data sources. If the data comes from a primary source, it was gathered by the study’s researcher(s). If the data is from a secondary source, it was collected by someone other than the researcher(s) and made available for use.

Once the data source(s) have been identified, the next step is to understand the data collection procedure. This means understanding how the data was gathered and what biases, if any, it may contain. Researchers who understand the collection process can interpret the data more accurately.

Machine Learning

Machine learning is a rapidly expanding field of data science with considerable potential in exploratory data analysis (EDA). EDA has traditionally been performed manually, by inspecting data sets for patterns and trends. Machine learning, on the other hand, makes it possible to automate parts of this process and have computers do the work. Several machine learning algorithms can be used to improve your EDA, each with its own benefits and drawbacks.

Exploratory Data Analysis (EDA)

Exploratory data analysis is a critical part of working with data. It is used to understand the data thoroughly and discover its characteristics, typically with visual techniques. This helps you find interesting patterns in the data before you analyze it further.

1. Load .csv files

A CSV (comma-separated values) file is a plain-text file that stores tabular data: each line is one row, and the values in a row are separated by commas. The first line usually holds the column names.
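As a minimal sketch of loading such a file, assuming pandas is installed: the CSV content below is hypothetical and is read from an in-memory string so the example runs on its own. With a real file you would pass its path to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would be a file on disk,
# loaded with something like pd.read_csv("people.csv").
csv_text = """name,age,gender
John,20,Male
Jane,21,Female
Dave,22,Male
Emily,23,Female
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Show the first rows to confirm the file was parsed as expected.
print(df.head())
```

The first line of the file becomes the column names, and pandas infers a data type for each column.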

2. Dataset Information

You must understand your dataset before you can perform an Exploratory Data Analysis (EDA). This includes knowing the data type of each column, what each column represents, and any other relevant information, such as the number of rows and where values are missing. This understanding is critical because it tells you what to look for and how to analyze the data.
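One way to inspect a dataset with pandas is sketched below. The small DataFrame is a hypothetical stand-in for your own data; `shape`, `dtypes`, `info()` and `head()` are standard pandas attributes and methods.

```python
import pandas as pd

# Hypothetical stand-in for a dataset you have already loaded.
df = pd.DataFrame({
    "name": ["John", "Jane", "Dave", "Emily"],
    "age": [20, 21, 22, 23],
    "gender": ["Male", "Female", "Male", "Female"],
})

print(df.shape)   # (number of rows, number of columns)
print(df.dtypes)  # data type of each column
df.info()         # column names, non-null counts, dtypes, memory usage
print(df.head())  # first rows, to see what each column holds
```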

3. Data Cleaning/Wrangling

To perform effective Exploratory Data Analysis (EDA), your data must first be cleaned and wrangled. The process of transforming raw data into a format suitable for analysis is known as data wrangling. This usually involves removing invalid or irrelevant data, dealing with missing values, and standardizing data types. You can begin EDA once your data is in good shape.
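The sketch below walks through those cleaning steps on a small, hypothetical raw table. The specific choices (dropping rows without a name, filling missing ages with the median) are assumptions made for illustration; the right handling depends on your data.

```python
import pandas as pd

# Hypothetical raw data with a duplicate row, a missing name,
# an invalid age, and inconsistent capitalization.
raw = pd.DataFrame({
    "name": ["John", "Jane", "Jane", "Dave", None],
    "age": ["20", "21", "21", "not available", "23"],
    "gender": ["male", "Female", "Female", "MALE", "Female"],
})

# Remove invalid or irrelevant data.
clean = raw.drop_duplicates()          # drop exact duplicate rows
clean = clean.dropna(subset=["name"])  # drop rows without a name

# Standardize data types and labels; invalid ages become NaN.
clean = clean.assign(
    age=pd.to_numeric(clean["age"], errors="coerce"),
    gender=clean["gender"].str.capitalize(),
)

# Deal with missing values: here, fill missing ages with the median.
clean["age"] = clean["age"].fillna(clean["age"].median())
clean = clean.reset_index(drop=True)

print(clean)
```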

4. Group by names

One of the first steps in Exploratory Data Analysis (EDA) is to group the data by one or more variables. This helps us understand the relationships between the variables and identify any trends or patterns. There are several approaches to grouping data, but one of the most common is to group by name, which the groupby() function in Pandas can do. To group by name, we first need a dataframe with a column for each variable. For this example, we'll use the following dataframe:

| name  | age | gender |
|-------|-----|--------|
| John  | 20  | Male   |
| Jane  | 21  | Female |
| Dave  | 22  | Male   |
| Emily | 23  | Female |
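A minimal sketch of this grouping, assuming pandas is installed. The dataframe reproduces the example table; the variable names `age_by_name` and `age_by_gender` are chosen here for illustration. Grouping by gender is added as a second example because every name in the table is unique, so grouping by name puts each row in its own group.

```python
import pandas as pd

# The example dataframe from the table above.
df = pd.DataFrame({
    "name": ["John", "Jane", "Dave", "Emily"],
    "age": [20, 21, 22, 23],
    "gender": ["Male", "Female", "Male", "Female"],
})

# Group by name: each name is unique here, so every group has one row.
age_by_name = df.groupby("name")["age"].mean()

# Group by gender: a column with repeated values gives larger groups.
age_by_gender = df.groupby("gender")["age"].agg(["count", "mean"])

print(age_by_name)
print(age_by_gender)
```

Grouping is most informative on columns whose values repeat, since the aggregate then summarizes several rows per group.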

5. Summary Statistics

Summary statistics condense your sample data into a few descriptive values. They describe the values in your data set, for example where the mean lies and whether or not the data is skewed.
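As a rough illustration, the sketch below computes summary statistics with pandas on a hypothetical list of ages. One deliberately large value is included so that the mean and median differ and the skewness is positive.

```python
import pandas as pd

# Hypothetical ages; the value 45 is much larger than the rest.
ages = pd.Series([20, 21, 22, 23, 45], name="age")

summary = ages.describe()   # count, mean, std, min, quartiles, max
mean_age = ages.mean()
median_age = ages.median()
skewness = ages.skew()      # above 0 suggests a longer right tail

print(summary)
print("mean:", mean_age, "median:", median_age, "skewness:", skewness)
```

A mean well above the median, together with positive skewness, suggests that a few large values are pulling the distribution to the right.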