
I would like to try to deduce the type of data in a string.

Scenario:
I have a CSV file which contains rows of data, and I would like to store this data in a database.
I do not want to store all the fields as strings.
Since the fields in the CSV might change, I cannot assume anything about their types.

Example (CSV file):

    [Row 1 - column names] --> "name", "age" , "children"
    [Row 2 - data row    ] --> "John", "45.5", "3"
    ...
    [Row n - data row    ] --> ...

In this case, by looking at the data in the rows, I would like to deduce that name is a column of strings, age is a column of floats and children is a column of integers.

My attempt:
The simplest approach would be to try conversions, and decide upon the type when a certain conversion succeeds.
I wrote a method for this purpose which looks like this:

    def deduceType(str):
        try:
            # first try to convert to int:
            int(str)
            return 0  # integer
        except ValueError:
            try:
                # not integer, try float:
                float(str)
                return 1  # float
            except ValueError:
                # not float, so deduce string
                return 2  # string

My question:
The problem is that if I want to be able to deduce more data types (booleans, longs, unsigned numeric types, etc...), then this approach becomes cumbersome and inaccurate.

Is there a neater, more efficient and rigorous way to do this?

Answer (edit):
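Based on Martijn Pieters' answer, I'm doing this:

    import ast

    def deduceType(str):
        try:
            return type(ast.literal_eval(str))
        except (ValueError, SyntaxError):
            # literal_eval raises ValueError or SyntaxError for text
            # that is not a Python literal, so treat it as a string
            return type('')  # string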
  • Why don't you know the type of data that you have in the CSV file? If you know which fields the file will contain, you can keep a mapping between field names and data types, e.g. field_type = {'name': str, 'age': int} Commented Nov 27, 2012 at 10:29
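The mapping suggested in this comment could look roughly like the sketch below. It assumes the column names are known in advance; `FIELD_TYPES`, `convert_row` and the sample data are hypothetical names and values chosen for illustration (with `age` mapped to `float` because the example file contains "45.5").

```python
import csv
import io

# Hypothetical schema: column names and types are assumptions for illustration.
FIELD_TYPES = {"name": str, "age": float, "children": int}

def convert_row(row, field_types=FIELD_TYPES):
    """Convert each known field with its declared type; unknown fields stay strings."""
    return {key: field_types.get(key, str)(value) for key, value in row.items()}

# In-memory stand-in for the CSV file from the question.
sample = io.StringIO("name,age,children\nJohn,45.5,3\n")
rows = [convert_row(row) for row in csv.DictReader(sample)]
print(rows)
```

This only helps when the schema is fixed; it does not address the case in the question where the fields may change.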

1 Answer


Use ast.literal_eval() on the value; it will interpret it as a Python literal. If that fails (it raises ValueError, or SyntaxError for text that does not parse as a Python expression), you have a string instead.

    >>> import ast
    >>> ast.literal_eval("45.5")
    45.5
    >>> ast.literal_eval("3")
    3
    >>> ast.literal_eval("John")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/ast.py", line 68, in literal_eval
        return _convert(node_or_string)
      File "/opt/local/Library/Frameworks/Python.framework/Versions/2.6/lib/python2.6/ast.py", line 67, in _convert
        raise ValueError('malformed string')
    ValueError: malformed string
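To get from per-value results to a type per column, as the question asks, one possible sketch is shown here. The helper names `deduce_type` and `deduce_column_type`, the widening rule (int, then float, else string) and the sample rows are assumptions for illustration, not part of the answer itself.

```python
import ast

def deduce_type(text):
    """Return the type of the Python literal in text, or str if it is not a literal."""
    try:
        return type(ast.literal_eval(text.strip()))
    except (ValueError, SyntaxError):
        return str

def deduce_column_type(values):
    """Pick the narrowest type that fits every value: int, then float, else str."""
    types = {deduce_type(value) for value in values}
    if not types:
        return str  # empty column: nothing to infer from
    if types == {int}:
        return int
    if types <= {int, float}:
        return float
    return str  # mixed or non-numeric columns fall back to strings

# Hypothetical data rows matching the example CSV (header row removed).
rows = [["John", "45.5", "3"], ["Mary", "30", "0"]]
columns = list(zip(*rows))
column_types = [deduce_column_type(col).__name__ for col in columns]
print(column_types)  # ['str', 'float', 'int']
```

Under this rule a column containing both "30" and "45.5" becomes a float column, while anything else (including booleans or None) is stored as a string; a real importer would probably want explicit handling for those cases.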

4 Comments

@Satyajeet: no, datetime objects are not considered literals. literal_eval() only supports Python literals.
Hmm, makes sense; I just read the docs. Do you think there is any "magical" method that can deduce every data type from a string? I know I can write such a method myself (specific to the given requirements, of course); just asking out of curiosity.
@Satyajeet: if you need arbitrary object handling, use the pickle module to serialise and deserialise. There is no such magic method for 'source-like' text, sorry.
@Satyajeet: note that ast.literal_eval() is implemented as a walk over an AST parse tree, you can always extend this idea to support specific object types.
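The idea in the last comment, extending the AST walk to accept specific object types, might be sketched as follows. The `date(Y, M, D)` text form and the function name `eval_with_dates` are hypothetical choices for illustration; everything else is delegated to ast.literal_eval().

```python
import ast
import datetime

def eval_with_dates(text):
    """Evaluate a literal, additionally accepting the hypothetical form date(Y, M, D)."""
    node = ast.parse(text.strip(), mode="eval").body
    if (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == "date"
        and not node.keywords
    ):
        # Arguments are evaluated as plain literals, never executed as code.
        args = [ast.literal_eval(arg) for arg in node.args]
        return datetime.date(*args)
    # Anything else must be an ordinary Python literal.
    return ast.literal_eval(node)

print(eval_with_dates("date(2012, 11, 27)"))
print(eval_with_dates("45.5"))
```

Only the whitelisted call name is recognised, so other function calls are still rejected by literal_eval() with a ValueError.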
