
Exploratory Data Analysis

Updated: Jul 4, 2021

If I say you can do exploratory data analysis in just 2 lines of code, you will say that it's impossible, right? But hold on: it is possible, thanks to the Sweetviz Python library.


Exploratory Data Analysis (EDA) is a process in which we analyze a dataset and summarize its main characteristics, often using visual methods. EDA is really important because if you are not familiar with the dataset you are working on, you won’t be able to infer anything from that data. However, EDA generally takes a lot of time.

But what if I told you that Python can automate the process of EDA with the help of some libraries? Wouldn’t that make your work easier? So let’s start learning about automated EDA.


In this article, we will automate EDA using Sweetviz, a Python library that generates beautiful, high-density visualizations to kick-start your EDA. Let us explore Sweetviz in detail.

So let's start, step by step.


Import Libraries and Read Dataset


If you are unable to import the Sweetviz library, open a command prompt and run:

  • pip install sweetviz
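Once the install works, the import and read step could look roughly like the sketch below. The file name train.csv and the helper name load_train are only assumptions for illustration, so use your own path and names.

```python
from pathlib import Path


def load_train(csv_path="train.csv"):
    """Read the training data into a pandas DataFrame.

    "train.csv" is a placeholder file name, not a file from this post.
    """
    path = Path(csv_path)
    if not path.exists():
        raise FileNotFoundError(f"{path} not found; point csv_path at your own file")

    # pandas is imported only after we know the file exists.
    import pandas as pd

    return pd.read_csv(path)


# Typical notebook usage (assumes train.csv sits next to the notebook):
# import sweetviz as sv
# train = load_train("train.csv")
# train.head()
```

Checking the path first gives a clearer error message than a failed read when the notebook is started from a different folder.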



Here we are analysing only a single data frame, train.

Now you just have to write 2 lines of code. In the 1st line, you give Sweetviz your train data and the target variable and save the result into an object (my_report). In the 2nd line, you export this object to an HTML file and save it under a name of your choice (html_report).
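As a rough sketch, those 2 lines can be wrapped like this. The function name build_report and the column name "Survived" in the usage comment are hypothetical, and as far as I know Sweetviz expects the target to be a numerical or boolean column.

```python
def build_report(train, target, out_file="html_report.html"):
    """Analyze a single DataFrame with Sweetviz and save the report as HTML."""
    if target not in train.columns:
        raise KeyError(f"target column {target!r} is not in the data")

    import sweetviz as sv  # needs: pip install sweetviz

    my_report = sv.analyze(train, target_feat=target)  # line 1: analyze the data
    my_report.show_html(out_file)                      # line 2: write the HTML report
    return out_file


# Usage, assuming `train` is a pandas DataFrame with a column named "Survived"
# (a made-up example column, replace it with your own target):
# build_report(train, "Survived")
```

By default show_html should also open the finished report in your browser, so you may not even need to look for the file.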



Now, if you don't want to analyse just a single data frame, you can use both train and test.

Here, in the 1st line of code, you give Sweetviz your train data as well as your test data, along with the target variable, and save the result into an object (my_report). In the 2nd line, you export this object to an HTML file and save it under a name of your choice (html_report).
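A minimal sketch of the train vs test version is below. Again, build_comparison and "Survived" are made-up names, and the labels "Train" and "Test" are just the display names I chose for the report.

```python
def build_comparison(train, test, target, out_file="html_report.html"):
    """Compare two DataFrames with Sweetviz and save the report as HTML."""
    if target not in train.columns:
        raise KeyError(f"target column {target!r} is not in the train data")

    import sweetviz as sv  # needs: pip install sweetviz

    # Each dataset is passed as [DataFrame, "display name"].
    my_report = sv.compare([train, "Train"], [test, "Test"], target_feat=target)
    my_report.show_html(out_file)
    return out_file


# Usage, assuming `train` and `test` are pandas DataFrames and "Survived"
# is a hypothetical target column present in train:
# build_comparison(train, test, "Survived")
```

Only the 1st line changes compared with the single data frame case: sv.compare takes two datasets where sv.analyze takes one.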



Now it is time to look at your HTML file (html_report). It is saved in the notebook's working directory, so just search for it there and open it. Then you will see the power of the Sweetviz library.
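If you cannot find the file, a small helper like the one below can locate it and open it in your default browser. It uses only the Python standard library, and the names find_report and open_report and the file name html_report.html are assumptions for this sketch.

```python
from pathlib import Path
import webbrowser


def find_report(name="html_report.html", folder="."):
    """Return the absolute path of the report, or None if it is not there."""
    path = Path(folder) / name
    return path.resolve() if path.exists() else None


def open_report(name="html_report.html", folder="."):
    """Open the report in the default browser if the file exists."""
    path = find_report(name, folder)
    if path is None:
        print(f"{name} was not found in {Path(folder).resolve()}")
        return False
    return webbrowser.open(path.as_uri())


# Usage:
# open_report("html_report.html")
```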



In this image you can see a comparison between the train and test data. Now you just have to focus on what you want from the data: look at the diagram and information for each and every feature, and use them for your model prediction.



So what do you think about this beautiful library? Go ahead, try it, and share your experience in the comment section.


References and credit

Krish Naik: he is an amazing Data Science teacher. You can visit his YouTube channel to explore all of these concepts.


Your Feedback is Appreciated!

Did you find this blog helpful? Any suggestions for improvement? Please let me know by filling in the contact form or pinging me on LinkedIn.

Thanks!





