Pandas Arrays¶

For most data types, pandas uses NumPy arrays as the concrete objects contained within an Index, Series, or DataFrame.

For some data types, pandas extends NumPy’s type system.

Kind of Data	Pandas Data Type	Scalar	Array
TZ-aware datetime	`DatetimeTZDtype`	`Timestamp`	Datetime Data
Timedeltas	(none)	`Timedelta`	Timedelta Data
Period (time spans)	`PeriodDtype`	`Period`	Timespan Data
Intervals	`IntervalDtype`	`Interval`	Interval Data
Nullable Integer	`Int64Dtype`, …	(none)	Nullable Integer
Categorical	`CategoricalDtype`	(none)	Categorical Data
Sparse	`SparseDtype`	(none)	Sparse Data

Pandas and third-party libraries can extend NumPy’s type system (see Extension Types). The top-level array() function creates a new array, which may be stored in a Series, an Index, or a column of a DataFrame.

`array`(data[, dtype, copy])	Create an array.
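As a minimal sketch of the array() function described above, the following builds a nullable integer array and stores it in a Series and a DataFrame. The variable names and the column name "a" are illustrative only.

```python
import pandas as pd

# "Int64" (capital I) selects the pandas nullable integer dtype,
# not NumPy's int64.
arr = pd.array([1, 2, None], dtype="Int64")
print(type(arr).__name__)  # IntegerArray
print(arr.dtype)           # Int64

# The same array can back a Series or a DataFrame column.
ser = pd.Series(arr)
df = pd.DataFrame({"a": arr})
print(df["a"].dtype)       # Int64
```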

Datetime Data¶

NumPy cannot natively represent timezone-aware datetimes. Pandas supports this with the arrays.DatetimeArray extension array, which can hold timezone-naive or timezone-aware values.

Timestamp, a subclass of datetime.datetime, is pandas’ scalar type for timezone-naive or timezone-aware datetime data.

`Timestamp`	Pandas replacement for datetime.datetime
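A short sketch of naive and aware Timestamp values follows. The dates and the "Europe/Berlin" zone are arbitrary choices for illustration, and the example assumes time zone data is available in the environment.

```python
import datetime
import pandas as pd

naive = pd.Timestamp("2024-03-10 12:30:00")
aware = pd.Timestamp("2024-03-10 12:30:00", tz="UTC")

# Timestamp is a datetime.datetime subclass.
print(isinstance(naive, datetime.datetime))  # True
print(naive.tz)                              # None

# tz_localize attaches a zone to a naive value;
# tz_convert moves an aware value to another zone.
localized = naive.tz_localize("UTC")
berlin = localized.tz_convert("Europe/Berlin")
print(localized == aware)  # True
print(berlin.hour)         # 13 (Berlin is UTC+1 on this date)
```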

Properties¶

`Timestamp.asm8`
`Timestamp.day`
`Timestamp.dayofweek`
`Timestamp.dayofyear`
`Timestamp.days_in_month`
`Timestamp.daysinmonth`
`Timestamp.fold`
`Timestamp.hour`
`Timestamp.is_leap_year`
`Timestamp.is_month_end`
`Timestamp.is_month_start`
`Timestamp.is_quarter_end`
`Timestamp.is_quarter_start`
`Timestamp.is_year_end`
`Timestamp.is_year_start`
`Timestamp.max`
`Timestamp.microsecond`
`Timestamp.min`
`Timestamp.minute`
`Timestamp.month`
`Timestamp.nanosecond`
`Timestamp.quarter`
`Timestamp.resolution`	Return the resolution, that is, the smallest difference between two times that can be represented by a Timestamp object.
`Timestamp.second`
`Timestamp.tz`	Alias for tzinfo
`Timestamp.tzinfo`
`Timestamp.value`
`Timestamp.week`
`Timestamp.weekofyear`
`Timestamp.year`

Methods¶

`Timestamp.astimezone`	Convert tz-aware Timestamp to another time zone.
`Timestamp.ceil`	Return a new Timestamp ceiled to this resolution.
`Timestamp.combine`(date, time)	date, time -> datetime with same date and time fields
`Timestamp.ctime`	Return ctime() style string.
`Timestamp.date`	Return date object with same year, month and day.
`Timestamp.day_name`	Return the day name of the Timestamp with specified locale.
`Timestamp.dst`	Return self.tzinfo.dst(self).
`Timestamp.floor`	Return a new Timestamp floored to this resolution.
`Timestamp.freq`
`Timestamp.freqstr`
`Timestamp.fromordinal`(ordinal[, freq, tz])	Convert a proleptic Gregorian ordinal to a Timestamp. By definition, the ordinal itself carries no tz information.
`Timestamp.fromtimestamp`(ts)	timestamp[, tz] -> tz’s local time from POSIX timestamp.
`Timestamp.isocalendar`	Return a 3-tuple containing ISO year, week number, and weekday.
`Timestamp.isoformat`
`Timestamp.isoweekday`	Return the day of the week represented by the date.
`Timestamp.month_name`	Return the month name of the Timestamp with specified locale.
`Timestamp.normalize`	Normalize Timestamp to midnight, preserving tz information.
`Timestamp.now`([tz])	Returns new Timestamp object representing current time local to tz.
`Timestamp.replace`	Implements datetime.replace and also handles nanoseconds.
`Timestamp.round`	Round the Timestamp to the specified resolution
`Timestamp.strftime`	format -> strftime() style string.
`Timestamp.strptime`	string, format -> new datetime parsed from a string (like time.strptime()).
`Timestamp.time`	Return time object with same time but with tzinfo=None.
`Timestamp.timestamp`	Return POSIX timestamp as float.
`Timestamp.timetuple`	Return time tuple, compatible with time.localtime().
`Timestamp.timetz`	Return time object with same time and tzinfo.
`Timestamp.to_datetime64`	Returns a numpy.datetime64 object with ‘ns’ precision
`Timestamp.to_julian_date`	Convert Timestamp to a Julian Date.
`Timestamp.to_period`	Return a period of which this timestamp is an observation.
`Timestamp.to_pydatetime`	Convert a Timestamp object to a native Python datetime object.
`Timestamp.today`([tz])	Return the current time in the local timezone.
`Timestamp.toordinal`	Return proleptic Gregorian ordinal.
`Timestamp.tz_convert`	Convert tz-aware Timestamp to another time zone.
`Timestamp.tz_localize`	Convert naive Timestamp to local time zone, or remove timezone from tz-aware Timestamp.
`Timestamp.tzname`	Return self.tzinfo.tzname(self).
`Timestamp.utcfromtimestamp`(ts)	Construct a naive UTC datetime from a POSIX timestamp.
`Timestamp.utcnow`()	Return a new Timestamp representing UTC day and time.
`Timestamp.utcoffset`	Return self.tzinfo.utcoffset(self).
`Timestamp.utctimetuple`	Return UTC time tuple, compatible with time.localtime().
`Timestamp.weekday`	Return the day of the week represented by the date.

A collection of timestamps may be stored in an arrays.DatetimeArray. For timezone-aware data, the .dtype of a DatetimeArray is a DatetimeTZDtype. For timezone-naive data, np.dtype("datetime64[ns]") is used.

If the data are tz-aware, then every value in the array must have the same timezone.

`arrays.DatetimeArray`(values[, dtype, freq, copy])	Pandas ExtensionArray for tz-naive or tz-aware datetime data.
`DatetimeTZDtype`([unit, tz])	A np.dtype duck-typed class, suitable for holding a custom datetime with tz dtype.
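The dtype distinction above can be checked with a small sketch. It obtains the arrays through Series.array, which is one convenient route and an assumption of this example rather than something this page prescribes.

```python
import numpy as np
import pandas as pd

# Series.array exposes the array that backs the Series.
aware = pd.Series(pd.date_range("2024-01-01", periods=3, tz="UTC")).array
naive = pd.Series(pd.date_range("2024-01-01", periods=3)).array

print(type(aware).__name__)                         # DatetimeArray
print(isinstance(aware.dtype, pd.DatetimeTZDtype))  # True
print(aware.dtype.tz)                               # UTC

# Timezone-naive data keeps a plain NumPy datetime64 dtype.
print(isinstance(naive.dtype, np.dtype))            # True
```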

Timedelta Data¶

NumPy can natively represent timedeltas. Pandas provides Timedelta for symmetry with Timestamp.

`Timedelta`	Represents a duration, the difference between two dates or times.
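A brief sketch of Timedelta as the difference between two Timestamps, using arbitrary example values:

```python
import datetime
import pandas as pd

td = pd.Timedelta(days=1, hours=2)
print(td.days)             # 1
print(td.seconds)          # 7200 (the part below one day)
print(td.total_seconds())  # 93600.0

# Subtracting two Timestamps yields a Timedelta.
diff = pd.Timestamp("2024-01-02") - pd.Timestamp("2024-01-01")
print(diff == pd.Timedelta(days=1))        # True

# Timedelta subclasses datetime.timedelta, mirroring Timestamp.
print(isinstance(td, datetime.timedelta))  # True
```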

Properties¶

`Timedelta.asm8`	Return a numpy timedelta64 array scalar view.
`Timedelta.components`	Return a namedtuple-like Components object.
`Timedelta.days`	Number of days.
`Timedelta.delta`	Return the timedelta in nanoseconds (ns), for internal compatibility.
`Timedelta.freq`
`Timedelta.is_populated`
`Timedelta.max`
`Timedelta.microseconds`	Number of microseconds (>= 0 and less than 1 second).
`Timedelta.min`
`Timedelta.nanoseconds`	Return the number of nanoseconds (n), where 0 <= n < 1 microsecond.
`Timedelta.resolution`	Return a string representing the lowest timedelta resolution.
`Timedelta.seconds`	Number of seconds (>= 0 and less than 1 day).
`Timedelta.value`
`Timedelta.view`	Array view compatibility.

Methods¶

`Timedelta.ceil`	Return a new Timedelta ceiled to this resolution.
`Timedelta.floor`	Return a new Timedelta floored to this resolution.
`Timedelta.isoformat`	Format Timedelta as ISO 8601 Duration like `P[n]Y[n]M[n]DT[n]H[n]M[n]S`, where the `[n]` s are replaced by the values.
`Timedelta.round`	Round the Timedelta to the specified resolution
`Timedelta.to_pytimedelta`	Return a datetime.timedelta object. Note: any nanosecond resolution is lost.
`Timedelta.to_timedelta64`	Returns a numpy.timedelta64 object with ‘ns’ precision
`Timedelta.total_seconds`	Total duration of timedelta in seconds (to ns precision)

A collection of timedeltas may be stored in a TimedeltaArray.

`arrays.TimedeltaArray`(values[, dtype, freq, …])	Pandas ExtensionArray for timedelta data.

Timespan Data¶

Pandas represents spans of times as Period objects.

Period¶

`Period`	Represents a period of time
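As an illustration, a monthly Period covers a whole span of time rather than a single instant. The month below is an arbitrary example.

```python
import pandas as pd

p = pd.Period("2024-02", freq="M")
print(p.start_time)     # 2024-02-01 00:00:00
print(p.days_in_month)  # 29
print(p.is_leap_year)   # True

# asfreq converts to another frequency, anchored at the
# start or the end of the original span.
last_day = p.asfreq("D", how="end")
print(last_day)         # 2024-02-29
```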

Properties¶

`Period.day`	Get day of the month that a Period falls on.
`Period.dayofweek`	Day of the week the period lies in, with Monday=0 and Sunday=6.
`Period.dayofyear`	Return the day of the year.
`Period.days_in_month`	Get the total number of days in the month that this period falls on.
`Period.daysinmonth`	Get the total number of days of the month that the Period falls in.
`Period.end_time`
`Period.freq`
`Period.freqstr`
`Period.hour`	Get the hour of the day component of the Period.
`Period.is_leap_year`
`Period.minute`	Get minute of the hour component of the Period.
`Period.month`
`Period.ordinal`
`Period.quarter`
`Period.qyear`	Fiscal year the Period lies in according to its starting-quarter.
`Period.second`	Get the second component of the Period.
`Period.start_time`	Get the Timestamp for the start of the period.
`Period.week`	Get the week of the year on the given Period.
`Period.weekday`	Day of the week the period lies in, with Monday=0 and Sunday=6.
`Period.weekofyear`
`Period.year`

Methods¶

`Period.asfreq`	Convert Period to desired frequency, either at the start or end of the interval
`Period.now`
`Period.strftime`	Returns the string representation of the `Period`, depending on the selected `fmt`.
`Period.to_timestamp`	Return the Timestamp representation of the Period at the target frequency at the specified end (how) of the Period

A collection of periods may be stored in an arrays.PeriodArray. Every period in a PeriodArray must have the same freq.

`arrays.PeriodArray`(values[, freq, dtype, copy])	Pandas ExtensionArray for storing Period data.
`PeriodDtype`	A Period duck-typed class, suitable for holding a period with freq dtype.
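A minimal sketch of a PeriodArray, built here with the top-level array() function from Period scalars that share one freq:

```python
import pandas as pd

parr = pd.array([
    pd.Period("2024-01", freq="M"),
    pd.Period("2024-02", freq="M"),
])
print(type(parr).__name__)  # PeriodArray
print(parr.dtype)           # period[M]

# The shared frequency is recorded on the dtype.
print(isinstance(parr.dtype, pd.PeriodDtype))  # True
```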

Interval Data¶

Arbitrary intervals can be represented as Interval objects.

`Interval`	Immutable object implementing an Interval, a bounded slice-like interval.
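The closed side determines which endpoints belong to an Interval, as this small sketch with arbitrary bounds shows:

```python
import pandas as pd

iv = pd.Interval(0, 5, closed="right")  # the interval (0, 5]
print(0 in iv)    # False: the left end is open
print(5 in iv)    # True: the right end is closed
print(iv.length)  # 5
print(iv.mid)     # 2.5

# Two intervals overlap if they share a point, including a
# shared endpoint that is closed on both sides.
other = pd.Interval(5, 8, closed="left")  # the interval [5, 8)
print(iv.overlaps(other))  # True
```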

Properties¶

`Interval.closed`	Whether the interval is closed on the left-side, right-side, both or neither
`Interval.closed_left`	Check if the interval is closed on the left side.
`Interval.closed_right`	Check if the interval is closed on the right side.
`Interval.left`	Left bound for the interval
`Interval.length`	Return the length of the Interval
`Interval.mid`	Return the midpoint of the Interval
`Interval.open_left`	Check if the interval is open on the left side.
`Interval.open_right`	Check if the interval is open on the right side.
`Interval.overlaps`	Check whether two Interval objects overlap.
`Interval.right`	Right bound for the interval

A collection of intervals may be stored in an IntervalArray.

`IntervalArray`	Pandas array for interval data that are closed on the same side.
`IntervalDtype`	An Interval duck-typed class, suitable for holding an interval
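One way to build an IntervalArray is from a sequence of break points. This sketch uses the from_breaks constructor, which is an assumption of the example since this page does not list the constructors.

```python
import pandas as pd

# Break points 0, 1, 2, 3 give three adjacent intervals.
arr = pd.arrays.IntervalArray.from_breaks([0, 1, 2, 3])
print(len(arr))    # 3
print(arr.closed)  # right (every element is closed on the same side)
print(arr[0])      # (0, 1]
print(isinstance(arr.dtype, pd.IntervalDtype))  # True
```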

Nullable Integer¶

numpy.ndarray cannot natively represent integer data with missing values. Pandas provides this through arrays.IntegerArray.

`arrays.IntegerArray`(values, mask[, copy])	Array of integer (optional missing) values.
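`Int8Dtype`	An ExtensionDtype for int8 integer data.
`Int16Dtype`	An ExtensionDtype for int16 integer data.
`Int32Dtype`	An ExtensionDtype for int32 integer data.
`Int64Dtype`	An ExtensionDtype for int64 integer data.
`UInt8Dtype`	An ExtensionDtype for uint8 integer data.
`UInt16Dtype`	An ExtensionDtype for uint16 integer data.
`UInt32Dtype`	An ExtensionDtype for uint32 integer data.
`UInt64Dtype`	An ExtensionDtype for uint64 integer data.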
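The sketch below contrasts the default NumPy-backed behaviour with the nullable integer dtype, and also builds an IntegerArray from explicit values and a mask. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Without an extension dtype, a missing value forces float64.
as_float = pd.Series([1, 2, None])
print(as_float.dtype)  # float64

# The nullable dtype keeps the data as integers.
nullable = pd.Series([1, 2, None], dtype="Int64")
print(nullable.dtype)  # Int64
print(nullable.sum())  # 3 (missing values are skipped)

# Explicit construction: True in the mask marks a missing slot,
# so the placeholder 0 in the last position is never read.
manual = pd.arrays.IntegerArray(
    np.array([1, 2, 0], dtype="int64"),
    np.array([False, False, True]),
)
print(manual.isna().tolist())  # [False, False, True]
```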

Categorical Data¶

Pandas defines a custom data type for representing data that can take only a limited, fixed set of values. The dtype of a Categorical can be described by a pandas.api.types.CategoricalDtype.

`CategoricalDtype`([categories, ordered])	Type for categorical data with the categories and orderedness

`CategoricalDtype.categories`	An `Index` containing the unique categories allowed.
`CategoricalDtype.ordered`	Whether the categories have an ordered relationship.

Categorical data can be stored in a pandas.Categorical.

`Categorical`(values[, categories, ordered, …])	Represents a categorical variable in classic R / S-plus fashion

The alternative Categorical.from_codes() constructor can be used when you have the categories and integer codes already:

`Categorical.from_codes`(codes[, categories, …])	Make a Categorical type from codes and categories or dtype.
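To illustrate the two constructors, the following sketch builds the same Categorical once from values and once from integer codes. The labels "low" and "high" are made up for the example.

```python
import pandas as pd

cats = ["low", "high"]

from_values = pd.Categorical(
    ["low", "high", "low"], categories=cats, ordered=True
)
# Each code is a position in the categories list.
from_codes = pd.Categorical.from_codes(
    [0, 1, 0], categories=cats, ordered=True
)

print(from_values.codes.tolist())               # [0, 1, 0]
print(list(from_values) == list(from_codes))    # True
print(from_values.dtype == from_codes.dtype)    # True
```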

The dtype information is available on the Categorical:

`Categorical.dtype`	The `CategoricalDtype` for this instance
`Categorical.categories`	The categories of this categorical.
`Categorical.ordered`	Whether the categories have an ordered relationship.
`Categorical.codes`	The category codes of this categorical.

np.asarray(categorical) works by implementing the array interface. Be aware that this converts the Categorical back to a NumPy array, so the categories and ordering information are not preserved.

`Categorical.__array__`([dtype])	The numpy array interface.
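A small sketch of that loss of information: the unused category "c" below exists only on the Categorical, not on the converted NumPy array.

```python
import numpy as np
import pandas as pd

cat = pd.Categorical(["b", "a", "b"], categories=["a", "b", "c"])
values = np.asarray(cat)

print(type(values).__name__)        # ndarray
print(values.tolist())              # ['b', 'a', 'b']
print(cat.categories.tolist())      # ['a', 'b', 'c']
print(hasattr(values, "categories"))  # False
```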

A Categorical can be stored in a Series or DataFrame. To create a Series of dtype category, use cat = s.astype(dtype) or Series(..., dtype=dtype), where dtype is either the string 'category' or an instance of CategoricalDtype.

If the Series is of dtype CategoricalDtype, Series.cat can be used to change the categorical data. See Categorical Accessor for more.
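Both ways of specifying the dtype, followed by a change made through Series.cat, might look like the sketch below. The data and the renaming are arbitrary examples.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype

s = pd.Series(["a", "b", "a"])

# The string 'category' infers the categories from the data.
by_string = s.astype("category")
print(by_string.cat.categories.tolist())  # ['a', 'b']

# A CategoricalDtype instance fixes categories and ordering.
dtype = CategoricalDtype(categories=["b", "a"], ordered=True)
by_dtype = s.astype(dtype)
print(by_dtype.cat.ordered)         # True
print(by_dtype.cat.codes.tolist())  # [1, 0, 1]

# Series.cat methods return a new Series with changed categories.
renamed = by_dtype.cat.rename_categories({"a": "A"})
print(renamed.tolist())             # ['A', 'b', 'A']
```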

Sparse Data¶

Data where a single value is repeated many times (e.g. 0 or NaN) may be stored efficiently as a SparseArray.

`SparseArray`(data[, sparse_index, index, …])	An ExtensionArray for storing sparse data.
`SparseDtype`([dtype, fill_value])	Dtype for data stored in `SparseArray`.

The Series.sparse accessor may be used to access sparse-specific attributes and methods if the Series contains sparse values. See Sparse Accessor for more.
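A minimal sketch of sparse storage and the Series.sparse accessor follows. It reaches the class as pd.arrays.SparseArray, which is an assumption about the installed pandas version, as the name listed above may be exposed elsewhere in other releases.

```python
import pandas as pd

# Only the values that differ from fill_value are stored.
sparse = pd.arrays.SparseArray([0, 0, 1, 0, 2], fill_value=0)
print(sparse.density)  # 0.4 (2 of 5 values are stored)

s = pd.Series(sparse)
print(s.sparse.npoints)     # 2
print(s.sparse.fill_value)  # 0

# to_dense() returns an ordinary Series with every value present.
dense = s.sparse.to_dense()
print(dense.tolist())       # [0, 0, 1, 0, 2]
```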
