EDA 10X Faster with Mito Without Any Coding
Updated: Jun 23, 2021
Guys, just imagine if I told you that you don't need to write any code at all, because it will be auto-generated with just a few clicks in a Mito sheet. You would wonder whether that is even possible, and if it is, your coding becomes very easy, right?
So guys, you are on the right blog. Here I am going to tell you about the Mito library, which auto-generates Python code for you: you just make a few clicks, and it makes your exploratory data analysis (EDA) 10X faster.
Mito is a JupyterLab extension and Python library that makes it very easy to manipulate data in a spreadsheet environment with a GUI, and much more. In this article I will explain:
1. How to properly set up Mito
2. How to use the various features offered by Mito
3. How this library generates equivalent Python code for all the changes made to the dataset
Installing and Running Mito
Mito is a Python library and can be installed with the pip package manager. It requires Python 3.6 or above.
To install Mito, first install the installer from the command line: pip install mitoinstaller
Then run the installer, which installs Mito itself: python -m mitoinstaller install
Now start JupyterLab (run jupyter lab from the command line).
In JupyterLab, create a new notebook and initialize a Mito sheet by running import mitosheet followed by mitosheet.sheet() in a cell.
The first time, you will be prompted to enter your email address to sign up. Just do it, and the main interface appears.
Loading the Dataset
To load a dataset in Mito, simply click on Import. You then have two options:
1. Add a file from the current folder: this lists all the CSV files in the current directory, and you can select them from the dropdown menu.
2. Add a file by its path: this imports only that particular file.
Here I chose the second option and loaded two files, both of which are now visible in the Mito sheet. Notice that we haven't written any code so far, yet while the datasets were loading, Mito automatically wrote the code for it. That's the magic of the Mito library; you can see the auto-generated code below:
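Mito's generated code is ordinary pandas. As a rough sketch of what the import step amounts to, the snippet below reads two CSV files into DataFrames. The file contents and the names df1 and df2 are hypothetical, and io.StringIO stands in for real files so the example runs anywhere; Mito itself passes the file path to pd.read_csv, and its exact output may look different.

```python
import io

import pandas as pd

# Hypothetical toy data standing in for two CSV files on disk.
# With real files you would call pd.read_csv(r"path/to/file.csv") instead.
csv_1 = io.StringIO(
    "CustomerID,Gender,Age\n"
    "1,Male,19\n"
    "2,Female,21\n"
    "3,Female,35\n"
)
csv_2 = io.StringIO(
    "Age,Spending Score\n"
    "19,39\n"
    "21,81\n"
    "35,6\n"
)

# Each imported file becomes its own DataFrame.
df1 = pd.read_csv(csv_1)
df2 = pd.read_csv(csv_2)

print(df1.shape, df2.shape)
```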
Plot Summary Stats
To view summary stats, filter the data, or sort the values in a column, click the funnel icon on that column and choose Filter/Sort or Summary Stats. Here we are using both summary stats and sorting, and you can see in the figure that a plot has been created for the 'spending score' column.
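Filtering and sorting from the column menu boil down to short pandas statements. A minimal sketch of equivalent code, assuming a hypothetical df1 with a 'Spending Score' column; the filter threshold of 30 is arbitrary, and the code Mito actually emits may be formatted differently.

```python
import pandas as pd

# Hypothetical toy data; column names are assumptions for illustration.
df1 = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Age": [19, 21, 35, 23],
    "Spending Score": [39, 81, 6, 77],
})

# Filter: keep only rows whose spending score is above 30.
df1 = df1[df1["Spending Score"] > 30]

# Sort: order the remaining rows by spending score, highest first.
df1 = df1.sort_values(by="Spending Score", ascending=False)

print(df1)
```

Reassigning df1 after each step mirrors how a spreadsheet edit replaces the visible table.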
Merging Two Datasets
Merging datasets is an essential part of data science projects. Data is often split across several tables to make the information more accessible and readable. Merging in Mito is easy:
1. Click on “Merge” and select the data sources.
2. Specify the key columns on which the merge should be performed.
3. You can also select which columns from each data source to keep after the merge. By default, all the columns are retained in the merged dataset.
Here we merged both datasets on the 'Age' column, and you can see that a new DataFrame named df3 has been created.
Again, we merged the data without writing any code; Mito auto-generated the code for what we did, as you can see in this image:
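For reference, the merge steps above correspond roughly to a pandas merge. This is a minimal sketch with hypothetical toy tables that share an 'Age' column; the join type how="inner" is my choice for illustration, and the exact arguments Mito generates (join type, suffixes) may differ.

```python
import pandas as pd

# Hypothetical toy tables that share an 'Age' column.
df1 = pd.DataFrame({
    "CustomerID": [1, 2, 3],
    "Gender": ["Male", "Female", "Female"],
    "Age": [19, 21, 35],
})
df2 = pd.DataFrame({
    "Age": [19, 21, 40],
    "Spending Score": [39, 81, 6],
})

# Step 3: keep only the wanted columns from each side
# (the key column 'Age' must stay in both).
left = df1[["CustomerID", "Age"]]
right = df2[["Age", "Spending Score"]]

# Step 2: merge on the key column. how="inner" keeps only ages present
# in both tables; this join type is an assumption, not Mito's documented default.
df3 = left.merge(right, on="Age", how="inner")

print(df3)
```

With an inner join, age 35 (only in df1) and age 40 (only in df2) are dropped; how="left" would instead keep every row of df1.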
Create Pivot Tables
A pivot table is an important Excel feature that summarizes numeric variables based on one or more categorical features. To create such a table using Mito:
1. Click on “Pivot” and select the source dataset (by default, the loaded CSV).
2. Select the rows, columns, and values column for the pivot table. You can also select the aggregation function for the values column: sum, mean, median, min, max, count, and standard deviation are all available.
3. After you select all the necessary fields, Mito creates a separate table containing the pivot table.
You can see how we created our pivot table just by clicking, while Mito generated the Python code automatically.
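The pivot steps map onto pandas' pivot_table. A minimal sketch, assuming hypothetical toy data with a categorical 'Gender' column and a numeric 'Spending Score' column; Mito's generated code may name and structure things differently.

```python
import pandas as pd

# Hypothetical toy data with one categorical and two numeric columns.
df1 = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Age": [19, 21, 35, 23, 31],
    "Spending Score": [39, 81, 6, 77, 40],
})

# Rows = Gender, values = Spending Score, aggregated with the mean.
# "sum", "median", "min", "max", "count" or "std" could be used instead.
pivot = df1.pivot_table(index="Gender", values="Spending Score", aggfunc="mean")

# Turn the Gender index back into an ordinary column, like a flat table.
pivot = pivot.reset_index()

print(pivot)
```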
Graphs and Stats Generation
You can also generate graphs right in this extension without coding the plotting logic. By default, all plots generated by this extension are made using Plotly, which means they are interactive and can be modified on the fly. One thing I noticed is that the code for graphs does not appear in the generated code cell (maybe the developers will add this in a later update).
There are two ways to generate graphs using Mito:
1. By clicking the graphs button:
You will be presented with a sidebar menu where you select the type of graph and the corresponding axes.
2. By clicking the column name:
When you click a column name in the spreadsheet, the filter and sort options appear. If you navigate to “Summary Stats”, a line plot or bar plot (depending on the type of variable) is displayed along with a summary of the variable. This summary differs for text and non-text variables.
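This is not Mito's implementation, but pandas' describe method illustrates the same idea: the summary you get depends on the column's type. A minimal sketch with hypothetical toy data.

```python
import pandas as pd

# Hypothetical toy data: one numeric column and one text column.
df1 = pd.DataFrame({
    "Age": [19, 21, 35, 23],
    "Gender": ["Male", "Female", "Female", "Female"],
})

# Numeric column: count, mean, std, min, quartiles, max.
age_summary = df1["Age"].describe()

# Text column: count, number of unique values, most frequent value and its frequency.
gender_summary = df1["Gender"].describe()

print(age_summary)
print(gender_summary)
```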
Save and Replay
All the transformations done on a dataset can be saved and replayed on other, similar datasets. In Excel this takes the form of a macro (VBA); in Mito, the same can be done with its save and replay functions.
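Mito's save and replay works from its interface; the sketch below is not that feature but a plain-pandas analog of the same idea: wrapping a recorded sequence of steps in a function so it can be re-run on any dataset with the same columns. The function name replay_steps, the steps, and the data are all hypothetical.

```python
import pandas as pd


def replay_steps(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: re-apply a fixed sequence of transformations."""
    # Step 1: drop rows with a missing spending score.
    df = df.dropna(subset=["Spending Score"])
    # Step 2: keep customers aged 18 or older.
    df = df[df["Age"] >= 18]
    # Step 3: sort by spending score, highest first.
    return df.sort_values(by="Spending Score", ascending=False)


# Two hypothetical datasets with the same layout.
january = pd.DataFrame({"Age": [17, 25, 40], "Spending Score": [50, 60, None]})
february = pd.DataFrame({"Age": [30, 22, 16], "Spending Score": [20, 90, 70]})

print(replay_steps(january))
print(replay_steps(february))
```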
In this article, you were introduced to a new tool, “Mito”: a one-stop solution for implementing spreadsheet-like functions in a Python environment and getting the equivalent Python code for all the changes made.
References and credit
1. Krish Naik: an amazing Data Science teacher. Visit his YouTube channel to explore all of these concepts.
Your feedback is appreciated!
Did you find this blog helpful? Any suggestions for improvement? Please let me know by filling out the contact form or pinging me on LinkedIn.