pandas-dev · jankatins · Jul 16, 2014 · Jul 16, 2014 · Jul 16, 2014 · Jul 23, 2014
diff --git a/doc/source/10min.rst b/doc/source/10min.rst
@@ -66,7 +66,8 @@ Creating a ``DataFrame`` by passing a dict of objects that can be converted to s
  'B' : pd.Timestamp('20130102'),
  'C' : pd.Series(1,index=list(range(4)),dtype='float32'),
  'D' : np.array([3] * 4,dtype='int32'),
- 'E' : 'foo' })
+ 'E' : pd.Categorical(["test","train","test","train"]),
+ 'F' : 'foo' })
  df2
 
 Having specific :ref:`dtypes <basics.dtypes>`
@@ -635,6 +636,32 @@ the quarter end:
  ts.index = (prng.asfreq('M', 'e') + 1).asfreq('H', 's') + 9
  ts.head()
 
+Categoricals
+------------
+
+Since version 0.15, pandas can include categorical data in a `DataFrame`. For full docs, see the
+:ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.
+
+.. ipython:: python
+
+ df = pd.DataFrame({"id":[1,2,3,4,5,6], "raw_grade":['a', 'b', 'b', 'a', 'a', 'e']})
+
+ # convert the raw grades to a categorical
+ df["grade"] = pd.Categorical(df["raw_grade"])
+
+ # Alternative: df["grade"] = df["raw_grade"].astype("category")
+ df["grade"]
+
+ # Rename the levels
+ df["grade"].cat.levels = ["very good", "good", "very bad"]
+
+ # Reorder the levels and simultaneously add the missing levels
+ df["grade"].cat.reorder_levels(["very bad", "bad", "medium", "good", "very good"])
+ df["grade"]
+ df.sort("grade")
+ df.groupby("grade").size()
+
+
 
 Plotting
 --------

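For readers trying the ``10min.rst`` example above against a current pandas release, here is a hedged sketch of the same steps. It assumes the later rename of "levels" to "categories", so ``rename_categories``, ``set_categories``, ``sort_values`` and ``observed=False`` stand in for the ``cat.levels``, ``reorder_levels`` and ``df.sort`` calls in this diff. Those substitutions are assumptions about the modern equivalents, not part of this PR.

```python
import pandas as pd

df = pd.DataFrame({"id": [1, 2, 3, 4, 5, 6],
                   "raw_grade": ["a", "b", "b", "a", "a", "e"]})

# Convert the raw grades to a categorical column.
df["grade"] = df["raw_grade"].astype("category")

# Rename the categories (called "levels" in the diff above).
df["grade"] = df["grade"].cat.rename_categories(["very good", "good", "very bad"])

# Reorder the categories and add the missing ones in one step.
df["grade"] = df["grade"].cat.set_categories(
    ["very bad", "bad", "medium", "good", "very good"]
)

# Sorting follows the category order, not the lexical order of the labels.
sorted_df = df.sort_values("grade")

# observed=False keeps categories that have no rows in the result.
sizes = df.groupby("grade", observed=False).size()
```

The empty ``bad`` and ``medium`` groups still appear in ``sizes``, which is the behaviour the ``groupby`` line in the diff is meant to demonstrate.
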
diff --git a/doc/source/api.rst b/doc/source/api.rst
@@ -528,11 +528,17 @@ and has the following usable methods and properties (all available as
  :toctree: generated/
 
  Categorical
- Categorical.from_codes
  Categorical.levels
  Categorical.ordered
  Categorical.reorder_levels
  Categorical.remove_unused_levels
+
+The following methods are considered API when using ``Categorical`` directly:
+
+.. autosummary::
+ :toctree: generated/
+
+ Categorical.from_codes
  Categorical.min
  Categorical.max
  Categorical.mode
@@ -547,7 +553,7 @@ the Categorical back to a numpy array, so levels and order information is not pr
  Categorical.__array__
 
 To create compatibility with `pandas.Series` and `numpy` arrays, the following (non-API) methods
-are also introduced.
+are also introduced and available when ``Categorical`` is used directly.
 
 .. autosummary::
  :toctree: generated/
@@ -564,7 +570,6 @@ are also introduced.
  Categorical.argsort
  Categorical.fillna
 
-
 Plotting
 ~~~~~~~~
 .. currentmodule:: pandas

diff --git a/doc/source/categorical.rst b/doc/source/categorical.rst
@@ -90,6 +90,7 @@ By using some special functions:
  df['group'] = pd.cut(df.value, range(0, 105, 10), right=False, labels=labels)
  df.head(10)
 
+See the :ref:`documentation <reshaping.tile.cut>` for :func:`~pandas.cut`.
 
 `Categoricals` have a specific ``category`` :ref:`dtype <basics.dtypes>`:
 
@@ -331,6 +332,45 @@ Operations
 
 The following operations are possible with categorical data:
 
+Comparing `Categoricals` with other objects is possible in two cases:
+ * comparing a `Categorical` to another `Categorical`, when `levels` and `ordered` are the same, or
+ * comparing a `Categorical` to a scalar.
+All other comparisons will raise a `TypeError`.
+
+.. ipython:: python
+
+ cat = pd.Series(pd.Categorical([1,2,3], levels=[3,2,1]))
+ cat_base = pd.Series(pd.Categorical([2,2,2], levels=[3,2,1]))
+ cat_base2 = pd.Series(pd.Categorical([2,2,2]))
+
+ cat > cat_base
+
+ # This doesn't work because the levels are not the same
+ try:
+     cat > cat_base2
+ except TypeError as e:
+     print("TypeError: " + str(e))
+
+ cat > 2
+
+.. note::
+
+ Comparisons with `Series`, `np.array` or a `Categorical` with different levels or ordering
+ will raise a `TypeError` because custom level ordering could yield two valid results:
+ one that takes the ordering into account and one that ignores it. If you want to compare a
+ `Categorical` with such a type, you need to be explicit and convert the `Categorical` to values:
+
+.. ipython:: python
+
+ base = np.array([1,2,3])
+
+ try:
+     cat > base
+ except TypeError as e:
+     print("TypeError: " + str(e))
+
+ np.asarray(cat) > base
+
 Getting the minimum and maximum, if the categorical is ordered:
 
 .. ipython:: python
@@ -509,7 +549,8 @@ The same applies to ``df.append(df)``.
 Getting Data In/Out
 -------------------
 
-Writing data (`Series`, `Frames`) to a HDF store that contains a ``category`` dtype will currently raise ``NotImplementedError``.
+Writing data (`Series`, `Frames`) that contains a ``category`` dtype to an HDF store will currently
+raise ``NotImplementedError``.
 
 Writing to a CSV file will convert the data, effectively removing any information about the
 `Categorical` (levels and ordering). So if you read back the CSV file you have to convert the
@@ -579,7 +620,7 @@ object and not as a low level `numpy` array dtype. This leads to some problems.
  try:
  np.dtype("category")
  except TypeError as e:
-  print("TypeError: " + str(e))
+ print("TypeError: " + str(e))
 
  dtype = pd.Categorical(["a"]).dtype
  try:

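The comparison rules added to ``categorical.rst`` above can be checked with a small sketch. It assumes a current pandas release, where the ``levels`` keyword from this diff is spelled ``categories`` and ordering has to be requested with ``ordered=True``. The helper ``raises_type_error`` is a hypothetical name used only for this illustration.

```python
import numpy as np
import pandas as pd

# Same values as in the diff, with the categories ordered 3 < 2 < 1.
cat = pd.Series(pd.Categorical([1, 2, 3], categories=[3, 2, 1], ordered=True))
cat_base = pd.Series(pd.Categorical([2, 2, 2], categories=[3, 2, 1], ordered=True))
cat_base2 = pd.Series(pd.Categorical([2, 2, 2], ordered=True))

# Allowed: same categories and ordering, or a scalar that is a category.
same_categories = (cat > cat_base).tolist()
against_scalar = (cat > 2).tolist()


def raises_type_error(compare):
    """Return True if calling compare() raises TypeError (hypothetical helper)."""
    try:
        compare()
    except TypeError:
        return True
    return False


# Not allowed: different categories, or a plain NumPy array.
different_categories_fail = raises_type_error(lambda: cat > cat_base2)
plain_array_fails = raises_type_error(lambda: cat > np.array([1, 2, 3]))

# Being explicit: compare the underlying values and ignore the category order.
explicit_values = (np.asarray(cat) > np.array([1, 2, 3])).tolist()
```

With the custom order, ``1`` is the largest category, so only the first element compares greater than ``2``. The explicit value comparison ignores that order entirely.
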
diff --git a/doc/source/reshaping.rst b/doc/source/reshaping.rst
@@ -503,3 +503,10 @@ handling of NaN:
 
  pd.factorize(x, sort=True)
  np.unique(x, return_inverse=True)[::-1]
+
+.. note::
+ If you just want to handle one column as a categorical variable (like R's factor),
+ you can use ``df["cat_col"] = pd.Categorical(df["col"])`` or
+ ``df["cat_col"] = df["col"].astype("category")``. For full docs on :class:`~pandas.Categorical`,
+ see the :ref:`Categorical introduction <categorical>` and the
+ :ref:`API documentation <api.categorical>`. This feature was introduced in version 0.15.
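The two spellings in the note above, and their relation to ``pd.factorize``, can be compared in a short sketch. The column name ``col`` and the sample values are made up for illustration, and the snippet assumes a current pandas release.

```python
import pandas as pd

df = pd.DataFrame({"col": ["b", "a", "c", "a"]})

# Two ways to treat one column as a categorical variable.
via_constructor = pd.Series(pd.Categorical(df["col"]))
via_astype = df["col"].astype("category")

# factorize returns integer codes plus the unique values,
# without the surrounding Categorical object.
codes, uniques = pd.factorize(df["col"], sort=True)
```

In this example both conversions infer the same sorted categories, and the categorical's integer codes match what ``pd.factorize(..., sort=True)`` returns.
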
diff --git a/doc/source/v0.15.0.txt b/doc/source/v0.15.0.txt
@@ -225,7 +225,8 @@ Categoricals in Series/DataFrame
 methods to manipulate. Thanks to Jan Schultz for much of this API/implementation. (:issue:`3943`, :issue:`5313`, :issue:`5314`,
 :issue:`7444`, :issue:`7839`, :issue:`7848`, :issue:`7864`, :issue:`7914`).
 
-For full docs, see the :ref:`Categorical introduction <categorical>` and the :ref:`API documentation <api.categorical>`.
+For full docs, see the :ref:`Categorical introduction <categorical>` and the
+:ref:`API documentation <api.categorical>`.
 
 .. ipython:: python