Pandas Overview

So, what exactly is pandas?

Pandas is an open-source Python library that provides high-performance, easy-to-use data structures and data analysis tools for the Python programming language. Python with pandas is used in a wide range of fields, including academia, retail, finance, economics, statistics, analytics, and many others.

Python pandas is well suited for different kinds of data, such as:

  • Ordered and unordered time series data
  • Unlabeled data
  • Any other form of observational or statistical data sets
(Image source: Google)

Now, how do you install pandas on your system? Just run this command:

# On Anaconda
conda install pandas
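To check that the installation worked, you can import pandas and print its version. This is a minimal sanity check; `pd` is simply the conventional alias, not a requirement.

```python
# Import pandas under its conventional alias
import pandas as pd

# Print the installed version string to confirm the install succeeded
print(pd.__version__)
```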

Let’s learn more about the most integral parts of pandas: Series and DataFrames. The key data structures in pandas are:

1. Series: a one-dimensional labeled array that can hold any type of data. You can create a Series using the following constructor:

Syntax: pandas.Series(data, index, dtype, copy)

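A minimal sketch of the Series constructor in use, passing `data`, `index`, and `dtype`. The values and labels are purely illustrative.

```python
import pandas as pd

# Build a Series from a list, with custom labels and an explicit dtype
# (values and labels here are illustrative only)
s = pd.Series([10, 20, 30], index=["a", "b", "c"], dtype="int64")
print(s)

# Values can be looked up by their label
print(s["b"])  # 20
```

If `index` is omitted, pandas assigns a default integer index starting at 0.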

2. DataFrame: a two-dimensional labeled data structure in which data is arranged in rows and columns. You can create a DataFrame using the following constructor:

Syntax: pandas.DataFrame(data, index, columns, dtype, copy)

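A minimal sketch of the DataFrame constructor with explicit `index` and `columns`. The row data and labels are hypothetical, chosen only to show the shape of the call.

```python
import pandas as pd

# Each inner list is one row; row and column labels are passed explicitly
# (names and values here are illustrative only)
df = pd.DataFrame(
    [[1, "x"], [2, "y"], [3, "z"]],
    index=["r1", "r2", "r3"],
    columns=["number", "letter"],
)
print(df)
print(df.shape)  # (3, 2): 3 rows, 2 columns
```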

3. Panel: a heterogeneous, three-dimensional data structure that handles data in panels. Note that Panel was deprecated in pandas 0.20 and removed in pandas 0.25, so it is not available in current releases.

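Since current pandas releases no longer include Panel, one common way to hold three-dimensional data is a DataFrame with a two-level row MultiIndex, where the outer level plays the role of a Panel's "items" axis. This is a minimal sketch of that alternative; the item names, dates, and numbers are hypothetical.

```python
import pandas as pd

# Hypothetical 3-D data (item x date x field) flattened into a MultiIndex DataFrame
index = pd.MultiIndex.from_product(
    [["item1", "item2"], ["2021-01-01", "2021-01-02"]],
    names=["item", "date"],
)
df = pd.DataFrame(
    {"open": [1.0, 2.0, 3.0, 4.0], "close": [1.5, 2.5, 3.5, 4.5]},
    index=index,
)

# Selecting one outer label returns a 2-D slice, much like one Panel item
item1 = df.loc["item1"]
print(item1)
```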

Creating DataFrames:

There are many ways to create a DataFrame, but a great option is to use a simple dict. For example:

import pandas as pd
purchases = pd.DataFrame(data)  # data: a dict mapping column names to lists of values
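A complete, runnable version of the dict approach. The purchase counts below are hypothetical sample data: each dict key becomes a column name, and each list supplies that column's values, one per row.

```python
import pandas as pd

# Hypothetical purchase counts: keys become columns, list items become rows
data = {
    "apples": [3, 2, 0, 1],
    "oranges": [0, 3, 7, 2],
}

purchases = pd.DataFrame(data)
print(purchases)
```

Because no `index` is passed, the rows get the default labels 0 to 3; passing `index=[...]` with one label per row would replace them.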

DataFrame operations:

To view data:

The first thing to do when opening a new dataset is to print out a few rows to keep as a visual reference. We do this with .head():

  1. .head() outputs the first five rows of your DataFrame by default, but you can also pass a number: purchases.head(10), for example, would output the top ten rows.

To see the last five rows, use .tail().

2. .tail() also accepts a number; for example, purchases.tail(2) prints the bottom two rows.

3. .info() should be one of the very first commands you run after loading your data. It provides essential details about your dataset, such as the number of rows and columns, the number of non-null values, the data type of each column, and how much memory your DataFrame is using.
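The viewing methods above can be sketched on a small illustrative DataFrame with 12 rows, so that head() and tail() actually have rows to trim. The column names and values are hypothetical.

```python
import pandas as pd

# Illustrative DataFrame with 12 rows
df = pd.DataFrame({"id": range(12), "value": [x * 10 for x in range(12)]})

first_five = df.head()   # first 5 rows (the default)
first_ten = df.head(10)  # first 10 rows
last_two = df.tail(2)    # last 2 rows

print(first_ten)
print(last_two)

# info() prints row/column counts, non-null counts, dtypes and memory usage;
# it writes the summary to stdout and returns None
df.info()
```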

Thanks for your valuable time.
