Python 3x Pandas Django

Input & Output Tools


Creating Series and DataFrame's from scratch is nice but what you'll usually be doing is importing your data in the form of a .csv (comma separated value) or spreadsheet file or other types of files.

Pandas allows for easy importing of data like this through functions such as pd.read_csv() and pd.read_excel() (for Microsoft Excel files).

Say you wanted to get this information from excel Sheet document into a pandas DataFrame.

Importing Data

.read_excel()

let's create a sample input file Input_File.xlsx

Pandas DataFrame Anatomy

Using with pd.ExcelFile() will automatically close the file once read\write the data.

import pandas as pd

with pd.ExcelFile("E:\\Input_File.xlsx") as reader:
    Car_Detail_df = pd.read_excel(reader)

print(Car_Detail_df)

You can also read specific sheet in the excel using sheet_name keyword

Example 1

import pandas as pd

with pd.ExcelFile("E:\\Input_File.xlsx") as reader:
    Car_Detail_df = pd.read_excel(reader, sheet_name='Sheet1')

print(Car_Detail_df)

Output:

     Make  Color  Odometer (KM)  Doors  Price Date Launched
0  Toyota  White         150043      4   4000    28/02/2000
1  Toyota   Blue          32549      3   7000    28/02/2000
2  Nissan  White         213095      4   3500    22/09/2005
3   Honda   Blue          45698      4   7500    22/09/2005
4  Toyota  White          60000      4   6250    28/02/2000

To read the date column correctly, we can use the argument parse_dates to specify a list of date columns.

with pd.ExcelFile("E:\\Input_File.xlsx") as reader:
    Car_Detail_df = pd.read_excel(reader,  parse_dates=["Date Launched"])

Car_Detail_df.info()

Output:


RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Make           5 non-null      object
 1   Color          5 non-null      object
 2   Odometer (KM)  5 non-null      int64
 3   Doors          5 non-null      int64
 4   Price          5 non-null      int64
 5  Date Launched  5 non-null      datetime64[ns]
dtypes: datetime64[ns](1), int64(3), object(2)
memory usage: 368.0+ bytes

Sometimes date is split up into multiple columns, for example, year, month, and day.

excel input file

To combine them into a datetime, we can pass a nested list to parse_dates.

import pandas as pd

with pd.ExcelFile("E:\\Input_File.xlsx") as reader:
    Car_Detail_df = pd.read_excel(reader,
                           parse_dates={"Date Launched" : ['Year', 'Month', 'Day']})

Car_Detail_df.info()

Output:


RangeIndex: 5 entries, 0 to 4
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   Make           5 non-null      object
 1   Color          5 non-null      object
 2   Odometer (KM)  5 non-null      int64
 3   Doors          5 non-null      int64
 4   Price          5 non-null      int64
 5  Date Launched  5 non-null      datetime64[ns]
dtypes: datetime64[ns](1), int64(3), object(2)
memory usage: 368.0+ bytes

Customizing a date parser

By default, date columns are parsed using the Pandas built-in parser from dateutil.parser.parse. Sometimes, you might need to write your own parser to support a different date format, for example, YYYY-DD-MM HH:MM:SS

There are nubmer of ways available in python to customize the date but The easiest way is to write a lambda function which can read the data in this format, and pass the lambda function to the argument date_parser.

import pandas as pd
from datetime import datetime
custom_date_parser = lambda x: datetime.strptime(x, "%d/%m/%Y")

with pd.ExcelFile("E:\\Input_File.xlsx") as reader:
    Car_Detail_df = pd.read_excel(reader,
                           parse_dates=["Date Launched"],
                           date_parser=custom_date_parser)

print(Car_Detail_df)

Output:

  Make  Color  Odometer (KM)  Doors  Price Date Launched
0  Toyota  White         150043      4   4000    2000-02-28
1  Toyota   Blue          32549      3   7000    2000-02-28
2  Nissan  White         213095      4   3500    2005-09-22
3   Honda   Blue          45698      4   7500    2005-09-22
4  Toyota  White          60000      4   6250    2000-02-28

Exporting Data

After you've made a few changes to your data, you might want to export it and save it so someone else can access the changes.

pandas allows you to export DataFrame's to .csv format using .to_csv() or spreadsheet format using .to_excel() or other types of files using pandas functions.

We haven't made any changes yet to the Car_Detail_df DataFrame but let's try export it.

Car_Detail_df.to_csv("E:\\Output_File.xlsx")

Output:

excel output file
pandas dataframe export with index

You see some additional column append in the dataframe as it's a index column which created while importing the excel data. To get rid off this index columns use index=False

Car_Detail_df.to_csv("E:\\Output_File.xlsx", index=False)

excel output file

If you have any doubts or queries related to this chapter, get them clarified from our Python Team experts on ibmmainframer Community!