I am trying out the latest version of numpy 2.0 dev:

In [44]: np.__version__
Out[44]: '2.0.0.dev-aded70c'

I am trying to import CSV data that looks like this:

date,system,pumping,rgt,agt,sps,eskom_import,temperature,wind,pressure,weather
2007-01-01 00:30,481.9,,,,,481.9,15,SW,1040,Fine
2007-01-01 01:00,471.9,,,,,471.9,15,SW,1040,Fine
2007-01-01 01:30,455.9,,,,,455.9,,,,

etc.

by using the following code:

convertdict = {
    0: lambda s: np.datetime64(s, 'm'),
    1: lambda s: float(s or 0),
    2: lambda s: float(s or 0),
    3: lambda s: float(s or 0),
    4: lambda s: float(s or 0),
    5: lambda s: float(s or 0),
    6: lambda s: float(s or 0),
    7: lambda s: float(s or 0),
    8: str,
    9: str,
    10: str,
}
dt = [('date', np.datetime64), ('system', float), ('pumping', float),
      ('rgt', float), ('agt', float), ('sps', float),
      ('eskom_import', float), ('temperature', float), ('wind', str),
      ('pressure', float), ('weather', str)]
a = np.recfromcsv(fp, dtype=dt, converters=convertdict,
                  usecols=range(11), names=True)

The dtype it generates for a.date is 'object':

array([2007-01-01T00:30+0200, 2007-01-01T01:00+0200, 2007-01-01T01:30+0200, ..., 2007-12-31T23:00+0200, 2007-12-31T23:30+0200, 2008-01-01T00:00+0200], dtype=object) 

But I need it to be datetime64, like in this example (but including hrs and minutes):

array(['2011-07-11', '2011-07-12', '2011-07-13', '2011-07-14', '2011-07-15', '2011-07-16', '2011-07-17'], dtype='datetime64[D]') 

It seems that the CSV import stores 'date' as an array of embedded Python objects (dtype=object) rather than as a native datetime64 array. Any ideas on how to fix this?

Grové

  • What if you change np.datetime64(s, 'm') to np.datetime64(s, 'D')? From the docs "The most basic way to create datetimes is from strings in ISO 8601 date or datetime format. The unit for internal storage is automatically selected from the form of the string, and can be either a date unit or a time unit. The date units are years (‘Y’), months (‘M’), weeks (‘W’), and days (‘D’), while the time units are hours (‘h’), minutes (‘m’), seconds (‘s’), milliseconds (‘ms’), and some additional SI-prefix seconds-based units." It seems you're using a time unit instead of a date unit. Commented Sep 29, 2011 at 15:44
  • It may also help to change your lambda to truncate the hours portion of your date: lambda s: np.datetime64(s[:10], 'D') Commented Sep 29, 2011 at 15:48
  • Thanks, this determines whether it is imported in date or time units, but does not address the issue of whether it is imported as an embedded object (Dtype='object') or imported as native datetime64 (Dtype='datetime64[?]'). I need it to be native datetime64. Commented Oct 2, 2011 at 6:36
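The distinction drawn in the last comment can be sketched as follows. This is only an illustration on a recent NumPy release, not the asker's actual import: the array `as_objects` is a hypothetical stand-in for the `a.date` column, and the `astype` cast is one possible way to get a native array from it.

```python
import numpy as np

# The unit argument only sets the resolution of each scalar value.
minute = np.datetime64('2007-01-01T00:30', 'm')
later = np.datetime64('2007-01-01T01:00', 'm')

# Hypothetical stand-in for the column produced by the CSV import:
# the values are datetime64 scalars, but the container dtype is object.
as_objects = np.array([minute, later], dtype=object)

# Casting the object array yields a native datetime64 array.
as_native = as_objects.astype('datetime64[m]')

print(as_objects.dtype)
print(as_native.dtype)
```

The unit ('m' or 'D') and the container dtype are independent: changing the unit in the converter does not change whether the resulting array is native.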

1 Answer

I think the trick to avoiding the generic 'object' dtype is to avoid the recfromcsv function altogether. Reading the data file manually and parsing each field yields the requested dtype='datetime64[m]':

import numpy as np

# String field widths ('U8', 'U32') are assumed; size them to fit your data.
dt = np.dtype([
    ('date', '<M8[m]'), ('system', '<f8'), ('pumping', '<f8'),
    ('rgt', '<f8'), ('agt', '<f8'), ('sps', '<f8'),
    ('eskom_import', '<f8'), ('temperature', '<f8'),
    ('wind', 'U8'), ('pressure', '<f8'), ('weather', 'U32'),
])

def to_float(s):
    # Empty CSV fields become 0, as in the question's converters.
    return float(s or 0)

# Count the data rows (excluding the header) to preallocate the array.
with open('data.csv') as fid:
    numlines = sum(1 for _ in fid) - 1

data = np.zeros(numlines, dtype=dt)

with open('data.csv') as fid:
    fieldnames = fid.readline().strip().split(',')  # header
    for count, line in enumerate(fid):
        parsedline = line.strip().split(',')
        data['date'][count] = np.datetime64(parsedline[0], 'm')
        data['system'][count] = to_float(parsedline[1])
        data['pumping'][count] = to_float(parsedline[2])
        data['rgt'][count] = to_float(parsedline[3])
        data['agt'][count] = to_float(parsedline[4])
        data['sps'][count] = to_float(parsedline[5])
        data['eskom_import'][count] = to_float(parsedline[6])
        data['temperature'][count] = to_float(parsedline[7])
        data['wind'][count] = parsedline[8]
        data['pressure'][count] = to_float(parsedline[9])
        data['weather'][count] = parsedline[10]

>>> data['date']
array(['2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
       '2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
       '2007-01-01T00:30-0500', '2007-01-01T01:00-0500',
       '2007-01-01T00:30-0500', '2007-01-01T01:00-0500'],
      dtype='datetime64[m]')

You could definitely improve on this code by using your "convertdict" and iterating over parsedline, but the idea is the same.
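A minimal sketch of that converter-driven variant follows, assuming a recent NumPy release. The in-memory `sample`, the `to_float` helper, the `converters` mapping, and the string widths are illustrative assumptions rather than part of the original code. The sample rows are copied from the question.

```python
import io
import numpy as np

# Hypothetical in-memory file holding the rows shown in the question.
sample = io.StringIO(
    "date,system,pumping,rgt,agt,sps,eskom_import,temperature,wind,pressure,weather\n"
    "2007-01-01 00:30,481.9,,,,,481.9,15,SW,1040,Fine\n"
    "2007-01-01 01:00,471.9,,,,,471.9,15,SW,1040,Fine\n"
    "2007-01-01 01:30,455.9,,,,,455.9,,,,\n"
)

def to_float(s):
    # Empty fields become 0, mirroring the question's lambdas.
    return float(s or 0)

# String widths ('U8', 'U32') are assumed; adjust them to the real data.
dt = np.dtype([
    ('date', 'M8[m]'), ('system', 'f8'), ('pumping', 'f8'), ('rgt', 'f8'),
    ('agt', 'f8'), ('sps', 'f8'), ('eskom_import', 'f8'),
    ('temperature', 'f8'), ('wind', 'U8'), ('pressure', 'f8'),
    ('weather', 'U32'),
])

# Per-column converters; any column not listed falls back to to_float.
converters = {
    # Use the ISO 8601 'T' separator before parsing at minute resolution.
    0: lambda s: np.datetime64(s.replace(' ', 'T'), 'm'),
    8: str,
    10: str,
}

sample.readline()  # skip the header row
rows = []
for line in sample:
    fields = line.rstrip('\n').split(',')
    rows.append(tuple(converters.get(i, to_float)(f)
                      for i, f in enumerate(fields)))

# Building the structured array from tuples keeps 'date' a native datetime64.
data = np.array(rows, dtype=dt)
print(data['date'].dtype)
```

Collecting the rows in a list first avoids having to count the lines in advance, at the cost of holding the parsed tuples in memory until the array is built.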
