2

I have a string template that looks like 'my_index-{year}'.
I do something like string_template.format(year=year) where year is some string. Result of this is some string that looks like my_index-2011.

Now, to my question: I have a string like my_index-2011 and my template 'my_index-{year}'. What might be a slick way to extract the {year} portion?

[Note: I know the parse library exists.]

0

5 Answers

2

There is a module called parse that provides the opposite of format():

Parse strings using a specification based on the Python format() syntax.

    >>> from parse import parse
    >>> s = "my_index-2011"
    >>> f = "my_index-{year}"
    >>> parse(f, s)['year']
    '2011'

An alternative, since you are extracting a year, would be to use the dateutil parser in fuzzy mode:

    >>> from dateutil.parser import parse
    >>> parse("my_index-2011", fuzzy=True).year
    2011

1 Comment

I didn't know about the dateutil parser. Nice note. I was a little iffy about installing a new package for just this one piece of functionality, and was wondering if there was a nifty re.match(..) way of doing this. But yeah, it seems like I'll just stick to using the parse module. +1
2

Use the split() string function to split the string into two parts around the dash, then grab just the second part.

    mystring = "my_index-2011"
    year = mystring.split("-")[1]

4 Comments

Good answer. I thought of that. Something like this would break as soon as the year became a time, or the formatting changed, or something of that sort. I wanted to focus on using just the template to get the year part out. If you check the link, the parse module does a good job at this. I just don't want to install a new library for just this, and was wondering if there was a slick way of achieving the same result. Regardless, a +1 for you. :)
@DebosmitRay "Something like this would break as soon as the year became a time" Why? As long as the basic pattern remains intact (i.e. the target portion of the string comes after a dash), split() will continue to work just fine. (If the target portion itself contains dashes, you can tell split() to only split on the first dash.)
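To illustrate the point about splitting only on the first dash, here is a minimal sketch. The sample value "my_index-2011-05" is made up for the example; the second argument to split() is the maximum number of splits, and partition() is an equivalent alternative.

```python
# Hypothetical value whose target portion itself contains a dash.
mystring = "my_index-2011-05"

# maxsplit=1: split on the first dash only, so the rest stays in one piece.
prefix, rest = mystring.split("-", 1)

# str.partition does the same job and also returns the separator.
before, sep, after = mystring.partition("-")

print(rest)   # 2011-05
print(after)  # 2011-05
```

This still hard-codes the dash rather than reading it from the template, so it only helps while the index part itself contains no dash.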
I apologize for the ambiguity. Say, I decide to replace this with 'my-index'. The template changed, but our function, having not used it, broke. Again, this is for a healthy debate to see if there is a good answer. I am pretty sure this cannot have a "correct" answer haha
@DebosmitRay If the string template can change, I'm not sure any solution would be foolproof...
2

I assume "year" is 4 digits and you have multiple indexes

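    import re

    # idx is the list of index names, data is the string to search
    res = ''
    patterns = ['%s-[0-9]{4}' % index for index in idx]
    for index, pattern in zip(idx, patterns):
        # find every "<index>-dddd" match, then strip the "<index>-" prefix
        res += ' '.join(re.findall(pattern, data)).replace(index + '-', '') + ' '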

---update---

    dummyString = 'adsf-1234 fsfdr lkjdfaif ln ewr-1234 adsferggs sfdgrsfgadsf-3456'
    dummyIdx = ['ewr','adsf']

output

1234 1234 3456 

3 Comments

Why do we need the for loop in this approach?
I think this is a decent idea, but it's overkill, to be honest. That loop with a += on a string is bad for performance, as is calling re.findall in a loop. +1 for the effort, though.
@DebosmitRay Agreed. If your indexes follow a pattern, that can also be a regex. For example, if the format is always 5 letters plus "-" plus 4 digits (like index-1234), the pattern is r'[A-Za-z]{5}-\d{4}', so there is no need to loop over all the indexes. My first assumption was that all the indexes are unique.
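A rough sketch of the loop-free idea from this comment thread, assuming the index names are known up front. The names data and idx and their values are taken from the dummy example in the answer; the single alternation pattern is my own illustration, not the answerer's code.

```python
import re

# Sample data mirroring the dummy values in the answer above.
data = 'adsf-1234 fsfdr lkjdfaif ln ewr-1234 adsferggs sfdgrsfgadsf-3456'
idx = ['ewr', 'adsf']

# One alternation of the escaped index names, then a captured 4-digit year.
pattern = re.compile(r'(?:%s)-(\d{4})' % '|'.join(re.escape(i) for i in idx))

# findall returns only the captured group, in order of appearance in data.
years = pattern.findall(data)
print(' '.join(years))  # 1234 1234 3456
```

The string is scanned once, and the results come back in the order they appear in the string rather than grouped per index.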
2

Yes, a regex would be helpful here.

    In [1]: import re
    In [2]: s = 'my_string-2014'
    In [3]: print( re.search(r'\d{4}', s).group(0) )
    2014

Edit: I should have mentioned your regex can be more sophisticated. You can haul out a subcomponent of a more specific string, for example:

    In [4]: print( re.search(r'my_string-(\d{4})$', s).group(1) )
    2014

Given the problem you presented, I think any "find the year" formula should be expressible in terms of a regular expression.
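Since the question asks about using the template itself, here is one possible standard-library sketch that turns a format() template into a regex. The helpers template_to_regex and extract are hypothetical names invented for this example; it assumes every placeholder is a simple named field like {year} and ignores format specs and conversions.

```python
import re
from string import Formatter


def template_to_regex(template):
    """Build a compiled regex from a str.format-style template (sketch)."""
    parts = []
    # Formatter().parse yields (literal_text, field_name, format_spec, conversion).
    for literal, field, _spec, _conv in Formatter().parse(template):
        parts.append(re.escape(literal))  # literal text must match exactly
        if field is not None:
            # Assumption: field is a valid identifier, so it can name a group.
            parts.append("(?P<%s>.+?)" % field)
    return re.compile("".join(parts))


def extract(template, text):
    """Return a dict of field values, or None if text does not fit the template."""
    match = template_to_regex(template).fullmatch(text)
    return match.groupdict() if match else None


print(extract("my_index-{year}", "my_index-2011"))  # {'year': '2011'}
print(extract("my-index-{year}", "my-index-2011"))  # {'year': '2011'}
```

Because the pattern is derived from the template, changing the template to 'my-index-{year}' keeps working without touching the extraction code. Templates with several adjacent fields can be ambiguous, which is the kind of case the parse module handles more carefully.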

1 Comment

I like the idea behind this one. It's kind of like using the length of the 'needle' to get the result. Always enjoyed going over this. +1
1

You are going to want to use the string method split to split on "-", and then catch the last element as your year:

year = "any_index-2016".split("-")[-1] 

Because you caught the last element (using -1 as the index), your index can contain hyphens and you will still extract the year correctly.

2 Comments

I have already addressed this in John Gordon's answer. There are multiple ways of breaking this.
For example, if the year changed to a time in the format hh-mm.
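A small sketch of that failure case, with a made-up hh-mm value, next to one way around it that reads the literal prefix from the template. It assumes the template has a single placeholder at the very end.

```python
template = "any_index-{year}"
text = "any_index-12-30"  # hypothetical value in hh-mm form

# Taking the last dash-separated piece keeps only the minutes.
last_piece = text.split("-")[-1]

# Literal text that precedes the first placeholder in the template.
prefix = template.partition("{")[0]

# Strip that prefix instead of splitting on dashes.
value = text[len(prefix):] if text.startswith(prefix) else None

print(last_piece)  # 30
print(value)       # 12-30
```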
