10

The problem is in line 22 :

if start_date <= data_entries.iloc[j, 1] <= end_date:

where I want to compare the start_date and end_date portion to data_entries.iloc[j, 1] which is accessing a column of the pandas dataframe. I converted the column to datetime using,

data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y")

But I am unsure how to convert it to date.

import pandas as pd import datetime entries_csv = "C:\\Users\\Pops\\Desktop\\Entries.csv" data_entries = pd.read_csv(entries_csv) data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y") start_date = datetime.date(2018, 4, 1) end_date = datetime.date(2018, 10, 30) for j in range(0, len(data_entries)): if start_date <= data_entries.iloc[j, 1] <= end_date: print('Hello') 
7
  • please provide mcve. Commented Jul 23, 2018 at 8:27
  • Check this. docs.python.org/2/library/… Commented Jul 23, 2018 at 8:32
  • @pratikmankar I was referring to this earlier actually. My problem is the pd.to_datetime() of pandas. I believe there is no pd.to_date() equivalent. The link below only converts to datetime, but based on my code, it does not work with datetime.date() when you try to compare them. stackoverflow.com/questions/16852911/… Commented Jul 23, 2018 at 8:41
  • 1
    you could data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y").dt.date Commented Jul 23, 2018 at 8:44
  • @MarcSantos - It is pandas version related problem, you can check this. Commented Jul 23, 2018 at 8:48

2 Answers 2

11

Just use pd.Timestamp objects without any conversion:

start_date = pd.Timestamp('2018-04-01') end_date = pd.Timestamp('2018-10-30') res = data_entries[data_entries['VOUCHER DATE'].between(start_date, end_date)] 

Explanation

Don't use datetime.datetime or datetime.date objects in Pandas series. This is inefficient because you lose vectorised functionality. The benefit of pd.Timestamp objects is you can utilize vectorised functionality for calculations. As described here:

numpy.datetime64 is essentially a thin wrapper an int64. It has almost no date/time specific functionality.

pd.Timestamp is a wrapper around a numpy.datetime64. It is backed by the same int64 value, but supports the entire datetime.datetime interface, along with useful pandas-specific functionality.

Sign up to request clarification or add additional context in comments.

1 Comment

Thanks for this as well. Really helpful
8

this converts it to date:

data_entries['VOUCHER DATE'] = pd.to_datetime(data_entries['VOUCHER DATE'], format="%m/%d/%Y").dt.date 

however i would not recommend filtering like this. this is much faster

data_entries[data_entries['VOUCHER DATE'].between(start_date, end_date)] 

read this article

Comments

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.