Pandas Arrays¶
For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.
For some data types, pandas extends NumPy’s type system.
| Kind of Data | Pandas Data Type | Scalar | Array |
|---|---|---|---|
| TZ-aware datetime | DatetimeTZDtype | Timestamp | Datetime Data |
| Timedeltas | (none) | Timedelta | Timedelta Data |
| Period (time spans) | PeriodDtype | Period | Timespan Data |
| Intervals | IntervalDtype | Interval | Interval Data |
| Nullable Integer | Int64Dtype, … | (none) | Nullable Integer |
| Categorical | CategoricalDtype | (none) | Categorical Data |
| Sparse | SparseDtype | (none) | Sparse Data |
Pandas and third-party libraries can extend NumPy’s type system (see Extension Types). The top-level array() function can be used to create a new array, which may be stored in a Series, in an Index, or as a column in a DataFrame.
array(data[, dtype, copy]) | Create an array. |
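As a minimal sketch of how array() might be used (assuming a reasonably recent pandas release; the variable names are illustrative only), the requested dtype determines which extension array is created:

```python
import pandas as pd

# The dtype argument selects the extension array type.
int_arr = pd.array([1, 2, None], dtype="Int64")
period_arr = pd.array(["2019-01", "2019-02"], dtype="period[M]")

# The resulting array can back a Series (or an Index / DataFrame column).
ser = pd.Series(int_arr)

print(type(int_arr).__name__)     # IntegerArray
print(type(period_arr).__name__)  # PeriodArray
print(ser.dtype)                  # Int64
```

The Series keeps the extension dtype rather than falling back to a NumPy dtype.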
Datetime Data¶
NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.
Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.
Timestamp | Pandas replacement for datetime.datetime |
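The following sketch illustrates the naive and aware flavours of Timestamp. The specific dates and the "America/New_York" zone are arbitrary choices for illustration and assume a time zone database is available.

```python
import datetime

import pandas as pd

# A timezone-naive and a timezone-aware Timestamp for the same wall time.
naive = pd.Timestamp("2019-01-01 12:00")
aware = pd.Timestamp("2019-01-01 12:00", tz="UTC")

# Timestamp subclasses datetime.datetime.
print(isinstance(naive, datetime.datetime))  # True
print(naive.tz)                              # None

# Converting an aware Timestamp keeps the instant but changes the wall time.
eastern = aware.tz_convert("America/New_York")
print(eastern.hour)                          # 7
```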
Properties¶
Timestamp.asm8 | |
Timestamp.day | |
Timestamp.dayofweek | |
Timestamp.dayofyear | |
Timestamp.days_in_month | |
Timestamp.daysinmonth | |
Timestamp.fold | |
Timestamp.hour | |
Timestamp.is_leap_year | |
Timestamp.is_month_end | |
Timestamp.is_month_start | |
Timestamp.is_quarter_end | |
Timestamp.is_quarter_start | |
Timestamp.is_year_end | |
Timestamp.is_year_start | |
Timestamp.max | |
Timestamp.microsecond | |
Timestamp.min | |
Timestamp.minute | |
Timestamp.month | |
Timestamp.nanosecond | |
Timestamp.quarter | |
Timestamp.resolution | Return the resolution describing the smallest difference between two times that can be represented by a Timestamp object. |
Timestamp.second | |
Timestamp.tz | Alias for tzinfo |
Timestamp.tzinfo | |
Timestamp.value | |
Timestamp.week | |
Timestamp.weekofyear | |
Timestamp.year | |
Methods¶
Timestamp.astimezone | Convert tz-aware Timestamp to another time zone. |
Timestamp.ceil | Return a new Timestamp ceiled to this resolution. |
Timestamp.combine(date, time) | date, time -> datetime with same date and time fields |
Timestamp.ctime | Return ctime() style string. |
Timestamp.date | Return date object with same year, month and day. |
Timestamp.day_name | Return the day name of the Timestamp with specified locale. |
Timestamp.dst | Return self.tzinfo.dst(self). |
Timestamp.floor | Return a new Timestamp floored to this resolution. |
Timestamp.freq | |
Timestamp.freqstr | |
Timestamp.fromordinal(ordinal[, freq, tz]) | Convert a proleptic Gregorian ordinal to a Timestamp. Note: by definition, the ordinal itself cannot carry any tz information. |
Timestamp.fromtimestamp(ts) | timestamp[, tz] -> tz’s local time from POSIX timestamp. |
Timestamp.isocalendar | Return a 3-tuple containing ISO year, week number, and weekday. |
Timestamp.isoformat | |
Timestamp.isoweekday | Return the day of the week represented by the date. |
Timestamp.month_name | Return the month name of the Timestamp with specified locale. |
Timestamp.normalize | Normalize Timestamp to midnight, preserving tz information. |
Timestamp.now([tz]) | Returns new Timestamp object representing current time local to tz. |
Timestamp.replace | implements datetime.replace, handles nanoseconds |
Timestamp.round | Round the Timestamp to the specified resolution |
Timestamp.strftime | format -> strftime() style string. |
Timestamp.strptime | string, format -> new datetime parsed from a string (like time.strptime()). |
Timestamp.time | Return time object with same time but with tzinfo=None. |
Timestamp.timestamp | Return POSIX timestamp as float. |
Timestamp.timetuple | Return time tuple, compatible with time.localtime(). |
Timestamp.timetz | Return time object with same time and tzinfo. |
Timestamp.to_datetime64 | Returns a numpy.datetime64 object with ‘ns’ precision |
Timestamp.to_julian_date | Convert Timestamp to a Julian Date. |
Timestamp.to_period | Return a period of which this timestamp is an observation. |
Timestamp.to_pydatetime | Convert a Timestamp object to a native Python datetime object. |
Timestamp.today(cls[, tz]) | Return the current time in the local timezone. |
Timestamp.toordinal | Return proleptic Gregorian ordinal. |
Timestamp.tz_convert | Convert tz-aware Timestamp to another time zone. |
Timestamp.tz_localize | Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp. |
Timestamp.tzname | Return self.tzinfo.tzname(self). |
Timestamp.utcfromtimestamp(ts) | Construct a naive UTC datetime from a POSIX timestamp. |
Timestamp.utcnow() | Return a new Timestamp representing UTC day and time. |
Timestamp.utcoffset | Return self.tzinfo.utcoffset(self). |
Timestamp.utctimetuple | Return UTC time tuple, compatible with time.localtime(). |
Timestamp.weekday | Return the day of the week represented by the date. |
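A few of the properties and methods listed above can be sketched as follows. The timestamp value is an arbitrary example, and the frequency strings "min" and "15min" are assumed to be accepted by the installed pandas version.

```python
import pandas as pd

ts = pd.Timestamp("2019-03-15 14:37:25")

# Calendar properties.
print(ts.dayofweek)     # 4 (Monday=0)
print(ts.day_name())    # Friday
print(ts.quarter)       # 1
print(ts.is_month_end)  # False

# Rounding methods return new Timestamp objects.
floored = ts.floor("min")    # 2019-03-15 14:37:00
ceiled = ts.ceil("min")      # 2019-03-15 14:38:00
rounded = ts.round("15min")  # 2019-03-15 14:30:00
midnight = ts.normalize()    # 2019-03-15 00:00:00
```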
A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.
If the data are tz-aware, then every value in the array must have the same timezone.
arrays.DatetimeArray(values[, dtype, freq, copy]) | Pandas ExtensionArray for tz-naive or tz-aware datetime data. |
DatetimeTZDtype([unit, tz]) | A np.dtype duck-typed class, suitable for holding a custom datetime with tz dtype. |
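The dtype difference between naive and aware data can be checked with a short sketch. It obtains the arrays through the .array attribute of a DatetimeIndex rather than calling the DatetimeArray constructor directly; that route and the "Europe/Berlin" zone are choices made here for illustration.

```python
import numpy as np
import pandas as pd

# .array exposes the DatetimeArray backing a DatetimeIndex.
naive = pd.date_range("2019-01-01", periods=2).array
aware = pd.date_range("2019-01-01", periods=2, tz="Europe/Berlin").array

print(type(aware).__name__)                       # DatetimeArray
# Naive data uses a plain NumPy datetime64 dtype.
print(isinstance(naive.dtype, np.dtype))          # True
# Aware data uses DatetimeTZDtype, with one time zone for the whole array.
print(isinstance(aware.dtype, pd.DatetimeTZDtype))  # True
print(aware.dtype.tz)
```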
Timedelta Data¶
NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.
Timedelta | Represents a duration, the difference between two dates or times. |
Properties¶
Timedelta.asm8 | Return a numpy timedelta64 array scalar view. |
Timedelta.components | Return a namedtuple-like Components object. |
Timedelta.days | Number of days. |
Timedelta.delta | Return the timedelta in nanoseconds (ns), for internal compatibility. |
Timedelta.freq | |
Timedelta.is_populated | |
Timedelta.max | |
Timedelta.microseconds | Number of microseconds (>= 0 and less than 1 second). |
Timedelta.min | |
Timedelta.nanoseconds | Return the number of nanoseconds (n), where 0 <= n < 1 microsecond. |
Timedelta.resolution | Return a string representing the lowest timedelta resolution. |
Timedelta.seconds | Number of seconds (>= 0 and less than 1 day). |
Timedelta.value | |
Timedelta.view | Array view compatibility. |
Methods¶
Timedelta.ceil | Return a new Timedelta ceiled to this resolution. |
Timedelta.floor | Return a new Timedelta floored to this resolution. |
Timedelta.isoformat | Format Timedelta as an ISO 8601 duration like P[n]Y[n]M[n]DT[n]H[n]M[n]S, where the [n]s are replaced by the values. |
Timedelta.round | Round the Timedelta to the specified resolution |
Timedelta.to_pytimedelta | Return a datetime.timedelta object. Note: nanosecond resolution, if any, is lost. |
Timedelta.to_timedelta64 | Returns a numpy.timedelta64 object with ‘ns’ precision |
Timedelta.total_seconds | Total duration of timedelta in seconds (to ns precision) |
A collection of timedeltas may be stored in a TimedeltaArray.
arrays.TimedeltaArray(values[, dtype, freq, …]) | Pandas ExtensionArray for timedelta data. |
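A minimal sketch of the Timedelta scalar and its array counterpart follows. The durations are arbitrary, and the array is obtained via pd.to_timedelta(...).array rather than the TimedeltaArray constructor, which is a choice made for this example.

```python
import pandas as pd

td = pd.Timedelta(days=1, hours=2, minutes=30)

# days and seconds split the duration; total_seconds() recombines it.
print(td.days)             # 1
print(td.seconds)          # 9000
print(td.total_seconds())  # 95400.0
print(td.components.hours) # 2

# A collection of timedeltas is backed by a TimedeltaArray.
tds = pd.to_timedelta(["1 days", "2 days 12:00:00"]).array
print(type(tds).__name__)  # TimedeltaArray
```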
Period¶
Period | Represents a period of time |
Properties¶
Period.day | Get day of the month that a Period falls on. |
Period.dayofweek | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.dayofyear | Return the day of the year. |
Period.days_in_month | Get the total number of days in the month that this period falls on. |
Period.daysinmonth | Get the total number of days of the month that the Period falls in. |
Period.end_time | |
Period.freq | |
Period.freqstr | |
Period.hour | Get the hour of the day component of the Period. |
Period.is_leap_year | |
Period.minute | Get minute of the hour component of the Period. |
Period.month | |
Period.ordinal | |
Period.quarter | |
Period.qyear | Fiscal year the Period lies in according to its starting-quarter. |
Period.second | Get the second component of the Period. |
Period.start_time | Get the Timestamp for the start of the period. |
Period.week | Get the week of the year on the given Period. |
Period.weekday | Day of the week the period lies in, with Monday=0 and Sunday=6. |
Period.weekofyear | |
Period.year | |
Methods¶
Period.asfreq | Convert Period to desired frequency, either at the start or end of the interval |
Period.now | |
Period.strftime | Returns the string representation of the Period, depending on the selected fmt. |
Period.to_timestamp | Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period |
A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.
arrays.PeriodArray(values[, freq, dtype, copy]) | Pandas ExtensionArray for storing Period data. |
PeriodDtype | A Period duck-typed class, suitable for holding a period with freq dtype. |
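The sketch below shows a monthly Period and a collection of periods sharing one frequency. It builds the array with period_range(...).array instead of the PeriodArray constructor, an assumption made to keep the example short.

```python
import pandas as pd

p = pd.Period("2019-01", freq="M")

print(p.days_in_month)  # 31
print(p.start_time)     # 2019-01-01 00:00:00

# Convert the monthly period to the daily period at its start.
first_day = p.asfreq("D", how="start")
print(first_day)        # 2019-01-01

# All periods in a PeriodArray share the same freq.
periods = pd.period_range("2019-01", periods=3, freq="M").array
print(type(periods).__name__)  # PeriodArray
print(periods.dtype)           # period[M]
```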
Interval Data¶
Arbitrary intervals can be represented as Interval objects.
Interval | Immutable object implementing an Interval, a bounded slice-like interval. |
Properties¶
Interval.closed | Whether the interval is closed on the left-side, right-side, both or neither |
Interval.closed_left | Check if the interval is closed on the left side. |
Interval.closed_right | Check if the interval is closed on the right side. |
Interval.left | Left bound for the interval |
Interval.length | Return the length of the Interval |
Interval.mid | Return the midpoint of the Interval |
Interval.open_left | Check if the interval is open on the left side. |
Interval.open_right | Check if the interval is open on the right side. |
Interval.overlaps | Check whether two Interval objects overlap. |
Interval.right | Right bound for the interval |
A collection of intervals may be stored in an IntervalArray.
IntervalArray | Pandas array for interval data that are closed on the same side. |
IntervalDtype | An Interval duck-typed class, suitable for holding an interval |
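The Interval properties above can be illustrated with a short sketch. The bounds are arbitrary, and pd.arrays.IntervalArray is assumed to be the location of IntervalArray in the installed pandas version.

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")  # the interval (0, 5]

print(iv.closed)       # right
print(iv.length)       # 5
print(iv.mid)          # 2.5
print(0 in iv)         # False: the left side is open
print(5 in iv)         # True: the right side is closed

# Intervals that share a closed endpoint overlap.
print(iv.overlaps(pd.Interval(5, 10, closed="left")))   # True
print(iv.overlaps(pd.Interval(5, 10, closed="right")))  # False

# Every interval in an IntervalArray is closed on the same side.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2])
print(arr.closed)      # right
```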
Nullable Integer¶
numpy.ndarray cannot natively represent integer-data with missing values. Pandas provides this through arrays.IntegerArray.
arrays.IntegerArray(values, mask[, copy]) | Array of integer (optional missing) values. |
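Int8Dtype | An ExtensionDtype for int8 integer data. |
Int16Dtype | An ExtensionDtype for int16 integer data. |
Int32Dtype | An ExtensionDtype for int32 integer data. |
Int64Dtype | An ExtensionDtype for int64 integer data. |
UInt8Dtype | An ExtensionDtype for uint8 integer data. |
UInt16Dtype | An ExtensionDtype for uint16 integer data. |
UInt32Dtype | An ExtensionDtype for uint32 integer data. |
UInt64Dtype | An ExtensionDtype for uint64 integer data. |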
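To make the difference from NumPy-backed integers concrete, the following sketch compares the default inference with an explicit nullable dtype. The values are arbitrary.

```python
import pandas as pd

# Without an explicit dtype, a missing value forces a float64 Series.
ser_float = pd.Series([1, 2, None])
print(ser_float.dtype)  # float64

# With a nullable integer dtype, the data stay integers.
ser_int = pd.Series([1, 2, None], dtype="Int64")
print(ser_int.dtype)                # Int64
print(type(ser_int.array).__name__) # IntegerArray

# Arithmetic preserves the nullable dtype; the missing value stays missing.
shifted = ser_int + 1
print(shifted.dtype)                # Int64
```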
Categorical Data¶
Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.
CategoricalDtype([categories, ordered]) | Type for categorical data with the categories and orderedness |
CategoricalDtype.categories | An Index containing the unique categories allowed. |
CategoricalDtype.ordered | Whether the categories have an ordered relationship. |
Categorical data can be stored in a pandas.Categorical:
Categorical(values[, categories, ordered, …]) | Represents a categorical variable in classic R / S-plus fashion |
The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:
Categorical.from_codes(codes[, categories, …]) | Make a Categorical type from codes and categories or dtype. |
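As a hedged illustration of from_codes(), the sketch below builds a Categorical from integer codes that index into the categories. The category labels are made up, and a code of -1 is used to mark a missing value.

```python
import pandas as pd

codes = [0, 1, 0, -1]          # -1 marks a missing value
categories = ["low", "high"]

cat = pd.Categorical.from_codes(codes, categories=categories, ordered=True)

print(list(cat.categories))  # ['low', 'high']
print(list(cat.codes))       # [0, 1, 0, -1]
print(cat.ordered)           # True
print(cat.isna().sum())      # 1
```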
The dtype information is available on the Categorical:
Categorical.dtype | The CategoricalDtype for this instance |
Categorical.categories | The categories of this categorical. |
Categorical.ordered | Whether the categories have an ordered relationship. |
Categorical.codes | The category codes of this categorical. |
np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so the categories and ordering information are not preserved!
Categorical.__array__([dtype]) | The numpy array interface. |
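The loss of category information can be seen in a small sketch. The values and the unused category "c" are illustrative only.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"])

# np.asarray goes through Categorical.__array__ and returns a plain ndarray.
arr = np.asarray(cat)

print(isinstance(arr, np.ndarray))  # True
print(list(arr))                    # ['b', 'a', 'b']

# The unused category "c" is known to the Categorical but not to the ndarray.
print("c" in cat.categories)        # True
print("c" in arr)                   # False
```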
A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype), where dtype is either
- the string 'category'
- an instance of CategoricalDtype.
If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical Accessor for more.
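Both ways of specifying the dtype can be sketched as follows. The data and the category order are arbitrary, and the variable names are illustrative.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "a"])

# Option 1: the string 'category' infers the categories from the data.
cat1 = s.astype("category")

# Option 2: a CategoricalDtype fixes the categories and their order.
dtype = CategoricalDtype(categories=["b", "a"], ordered=True)
cat2 = s.astype(dtype)

# The .cat accessor exposes the categorical-specific attributes.
print(list(cat1.cat.categories))  # ['a', 'b']
print(list(cat2.cat.categories))  # ['b', 'a']
print(list(cat2.cat.codes))       # [1, 0, 1]
print(cat2.cat.ordered)           # True
```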
Sparse Data¶
Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a SparseArray.
SparseArray(data[, sparse_index, index, …]) | An ExtensionArray for storing sparse data. |
SparseDtype([dtype, fill_value]) | Dtype for data stored in SparseArray. |
The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse Accessor for more.
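A minimal sketch of sparse storage and the Series.sparse accessor follows. It assumes SparseArray is reachable as pd.arrays.SparseArray, which is where newer pandas versions expose it; the data are arbitrary.

```python
import pandas as pd

# Only the values that differ from fill_value are stored explicitly.
arr = pd.arrays.SparseArray([0, 0, 1, 0, 2], fill_value=0)

print(list(arr.sp_values))  # [1, 2]
print(arr.density)          # 0.4

# A Series holding sparse values gains the .sparse accessor.
ser = pd.Series(arr)
print(ser.sparse.npoints)     # 2
print(ser.sparse.fill_value)  # 0
print(ser.sparse.density)     # 0.4
```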