@@ -960,6 +960,9 @@ importantly, these methods exclude missing/NA values automatically. These are
 accessed via the Series's ``str`` attribute and generally have names matching
 the equivalent (scalar) built-in string methods:
 
+Splitting and Replacing Strings
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 .. ipython:: python
 
    s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
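As a rough mental model of the ``str`` methods described above, the plain-Python sketch below applies a scalar string operation to each element and passes missing values through untouched. It is not pandas's implementation. The helper ``str_map`` is hypothetical, and ``None`` stands in for ``np.nan`` so the sketch needs only the standard library. The case-insensitive replacement imitates ``str.replace('^.a|dog', 'XX-XX ', case=False)``, assuming that ``case=False`` corresponds to the ``re.IGNORECASE`` flag.

```python
import re

def str_map(values, func):
    """Apply a scalar string function elementwise, skipping missing values.

    Hypothetical helper for illustration; None plays the role of np.nan.
    """
    return [None if v is None else func(v) for v in values]

values = ['A', 'B', 'C', 'Aaba', 'Baca', None, 'CABA', 'dog', 'cat']

# Counterpart of s.str.lower(): the built-in str.lower applied per element.
lowered = str_map(values, str.lower)

# Counterpart of a case-insensitive regex replace (assumed to map to re.IGNORECASE).
pattern = re.compile('^.a|dog', flags=re.IGNORECASE)
replaced = str_map(values, lambda v: pattern.sub('XX-XX ', v))

print(lowered)
print(replaced)
```

Keeping the missing-value check in one helper mirrors the documented behavior that these methods exclude NA values automatically, so each string function only ever sees real strings.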
@@ -990,11 +993,12 @@ Methods like ``replace`` and ``findall`` take regular expressions, too:
    s3
    s3.str.replace('^.a|dog', 'XX-XX ', case=False)
 
-The method ``match`` returns the groups in a regular expression in one tuple.
-Starting in pandas version 0.13.0, the method ``extract`` is available to
-accomplish this more conveniently.
+Extracting Substrings
+~~~~~~~~~~~~~~~~~~~~~
 
-Extracting a regular expression with one group returns a Series of strings.
+The method ``extract`` (introduced in version 0.13) accepts regular expressions
+with match groups. Extracting a regular expression with one group returns
+a Series of strings.
 
 .. ipython:: python
 
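To make the return shapes concrete, here is a standard-library sketch of the extraction rules in this section. A pattern with one group yields one string (or a missing value) per element. A pattern with several groups yields one record per element, keyed by group name. The helper ``extract_sketch`` is hypothetical. It returns lists and dicts rather than a Series or DataFrame and uses ``None`` for missing values. Two further details are assumptions: it labels unnamed groups by 0-based position, and it locates each match with ``re.search``.

```python
import re

def extract_sketch(values, pat):
    """Hypothetical stand-in for extract, built on the standard library.

    One capture group  -> a list with one string (or None) per element.
    Several groups     -> a list of dicts, one per element, keyed by group label.
    """
    regex = re.compile(pat)
    if regex.groups == 0:
        raise ValueError('pattern contains no capture groups')

    # Label each group by its name if it has one, else by 0-based position.
    names_by_index = {index: name for name, index in regex.groupindex.items()}
    labels = [names_by_index.get(i, i - 1) for i in range(1, regex.groups + 1)]

    results = []
    for value in values:
        # Missing (non-string) values never match.
        m = regex.search(value) if isinstance(value, str) else None
        if regex.groups == 1:
            results.append(m.group(1) if m else None)
        else:
            # Unmatched optional groups come back as None from m.groups().
            groups = m.groups() if m else (None,) * regex.groups
            results.append(dict(zip(labels, groups)))
    return results

print(extract_sketch(['a1', 'b2', 'c3'], r'[ab](\d)'))
print(extract_sketch(['a1', 'b2', '3'], r'(?P<letter>[ab])?(?P<digit>\d)'))
```

The second call shows why optional groups are useful: ``'3'`` still produces a record, with the missing ``letter`` left empty instead of dropping the whole row.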
@@ -1016,18 +1020,34 @@ Named groups like
 
 .. ipython:: python
 
-   Series(['a1', 'b2', 'c3']).str.match('(?P<letter>[ab])(?P<digit>\d)')
+   Series(['a1', 'b2', 'c3']).str.extract('(?P<letter>[ab])(?P<digit>\d)')
 
 and optional groups like
 
 .. ipython:: python
 
-   Series(['a1', 'b2', '3']).str.match('(?P<letter>[ab])?(?P<digit>\d)')
+   Series(['a1', 'b2', '3']).str.extract('(?P<letter>[ab])?(?P<digit>\d)')
 
 can also be used.
 
-Methods like ``contains``, ``startswith``, and ``endswith`` takes an extra
-``na`` arguement so missing values can be considered True or False:
+Testing for Strings that Match or Contain a Pattern
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+In previous versions, *extracting* match groups was accomplished by ``match``,
+which returned a not-so-convenient Series of tuples. Starting in version 0.14,
+the default behavior of ``match`` will change: it will return a boolean
+indexer, analogous to the method ``contains``.
+
+The distinction between ``match`` and ``contains`` is strictness: ``match``
+relies on ``re.match``, which only matches at the start of a string, while
+``contains`` relies on ``re.search``, which matches anywhere in the string.
+
+In version 0.13, ``match`` performs its old, deprecated behavior by default,
+but the new behavior is available through the keyword argument
+``as_indexer=True``.
+
+Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
+an extra ``na`` argument so missing values can be considered True or False:
 
 .. ipython:: python
 
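The difference between the strict ``re.match`` and the looser ``re.search`` described above, and the effect of an ``na`` fill value, can be sketched with the standard library alone. The helpers ``match_mask`` and ``contains_mask`` are hypothetical approximations of the boolean-indexer behavior (``as_indexer=True``), not pandas code. ``None`` stands in for a missing value, plain lists of booleans stand in for a boolean Series, and the default ``na=False`` is a choice made only for this sketch.

```python
import re

def match_mask(values, pat, na=False):
    """True where pat matches at the start of the string (re.match semantics).

    Hypothetical helper; missing values (None) are replaced by na.
    """
    return [na if v is None else re.match(pat, v) is not None for v in values]

def contains_mask(values, pat, na=False):
    """True where pat matches anywhere in the string (re.search semantics)."""
    return [na if v is None else re.search(pat, v) is not None for v in values]

values = ['a1', 'xa1', 'c3', None]
pattern = r'[ab]\d'

print(match_mask(values, pattern))               # strict: only 'a1' qualifies
print(contains_mask(values, pattern))            # loose: 'xa1' qualifies too
print(contains_mask(values, pattern, na=True))   # missing value treated as True
```

The element ``'xa1'`` is the one that separates the two behaviors: the pattern occurs inside it but not at its start. The missing value is never tested against the pattern at all; it simply takes whatever ``na`` says.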