This cannot be done reliably, and that is not due to limitations in Python or any other programming language. A human being could not do this predictably either without guessing and following a few rules (usually called heuristics when used in this context).
- All the values are valid strings; we know that because it is the basis of our problem, so there is no point in checking for this at all. We should check everything else we can, and whatever falls through we can just leave as a string.
- Dates are the most obvious thing to check first. If they are formatted in a predictable manner such as `[YYYY]-[MM]-[DD]` (ISO 8601 date format), they are easy to distinguish from other bits of text that contain numbers. If the dates are in a format with just numbers, like `YYYYMMDD`, then we are stuck, as these dates will be indistinguishable from ordinary numbers.
- We will do integers next because all integers are valid floats but not all floats are valid integers. We could just check whether the text contains only digits (or digits and the letters A-F if hexadecimal numbers are possible) and in that case treat the value as an integer.
- Floats would be next as they are numbers with some formatting (the decimal point). It is easy to recognise `3.14159265` as a floating point number. However, `5.0`, which can be written simply as `5`, is also a valid float, but in that form it would have been caught in the previous step and not be recognised as a float even if it was intended to be.
- Any values that are left unconverted can be treated as strings.
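To see where the overlaps bite, here is a rough sketch of the checks in the list above written out by hand with regular expressions. The pattern names and the `guess_kind` function are made up for illustration, and the date pattern only checks the shape of the text, not whether the month and day are real.

```python
import re

# Hypothetical patterns, tried from most specific to least specific.
ISO_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")    # shape only, e.g. 2010-01-20
INTEGER = re.compile(r"[+-]?\d+")              # digits with an optional sign
FLOAT = re.compile(r"[+-]?(\d+\.\d*|\.\d+)")   # digits with a decimal point


def guess_kind(text):
    """Return a label for the first pattern that matches the whole string."""
    if ISO_DATE.fullmatch(text):
        return "date"
    if INTEGER.fullmatch(text):
        return "integer"
    if FLOAT.fullmatch(text):
        return "float"
    # Nothing matched, so leave it as a string.
    return "string"


samples = ["2010-01-20", "20100120", "5", "5.0", "3.14159265", "some words"]
for sample in samples:
    print(sample, "->", guess_kind(sample))
```

Note that `20100120` is reported as an integer even if it was meant as a date, and `5` is reported as an integer even if it was meant as a float. No ordering of the checks can fix that; the information is simply not in the text.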
Due to the possible overlaps mentioned above, such a scheme can never be 100% reliable. Also, any new data type that you need to support (complex numbers, perhaps) would need its own set of heuristics and would have to be placed in the most appropriate place in the chain of checks. The more likely a check is to match only the desired data type, the higher up the chain it should be.
Now let's make this real in Python. Most of the heuristics I mentioned above are taken care of for us by Python; we just need to decide on the order in which to apply them:

    from datetime import datetime

    # Ordered from most specific to least specific.
    heuristics = (
        lambda value: datetime.strptime(value, "%Y-%m-%d"),
        int,
        float,
    )

    def convert(value):
        for heuristic in heuristics:
            try:
                return heuristic(value)
            except ValueError:
                continue
        # All other heuristics failed, so it is a string
        return value

    values = ['3.14159265', '2010-01-20', '16', 'some words']

    for value in values:
        converted_value = convert(value)
        print(converted_value, type(converted_value))

This outputs the following:

    3.14159265 <class 'float'>
    2010-01-20 00:00:00 <class 'datetime.datetime'>
    16 <class 'int'>
    some words <class 'str'>
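
As for supporting a new data type, here is a minimal sketch of how complex numbers might be slotted into the chain. It assumes the built-in `complex()` is an acceptable heuristic; the names `parse_iso_date` and `HEURISTICS` are mine, not part of any library. Because `complex()` accepts everything `float()` accepts, it has to go after `float`, otherwise every number would come back as a complex value.

```python
from datetime import datetime


def parse_iso_date(value):
    """Parse a YYYY-MM-DD string, raising ValueError on anything else."""
    return datetime.strptime(value, "%Y-%m-%d")


# complex() is the least specific numeric check, so it goes last.
HEURISTICS = (parse_iso_date, int, float, complex)


def convert(value):
    for heuristic in HEURISTICS:
        try:
            return heuristic(value)
        except ValueError:
            continue
    # All heuristics failed, so it is a string.
    return value


for text in ["16", "3.5", "1+2j", "some words"]:
    result = convert(text)
    print(repr(result), type(result).__name__)
```

Moving `complex` ahead of `float` or `int` is a quick way to see the ordering rule in action: `"16"` would then be converted to `(16+0j)`.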