I would like to try to deduce the type of data in a string.
Scenario:
I have a CSV file which contains rows of data, and I would like to store this data in a database.
I do not want to store all the fields as strings.
Since the fields in the CSV might change, I cannot assume anything about their types.
Example (CSV file):
```
[Row 1 - column names] --> "name", "age", "children"
[Row 2 - data row]     --> "John", "45.5", "3"
...
[Row n - data row]     --> ...
```
In this case, by looking at the data in the rows, I would like to deduce that name is a column of strings, age is a column of floats and children is a column of integers.
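Deciding a column's type usually needs more than one row: "38" looks like an integer, but if another row in the same column holds "45.5", the whole column has to be float. A minimal sketch using the standard `csv` module, assuming only int, float and str are of interest and that no row has more fields than the header; `value_type` and `column_types` are illustrative names, not from any library:

```python
import csv
import io

# Types ordered from most specific to most general; a column is
# "promoted" to the most general type that any of its values needs.
_ORDER = [int, float, str]

def value_type(text):
    """Return int, float or str for a single CSV field."""
    for candidate in (int, float):
        try:
            candidate(text)
            return candidate
        except ValueError:
            pass
    return str

def column_types(csv_text):
    """Infer one type per column by scanning every data row."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    header = next(reader)
    # Start each column at the most specific type and widen as needed.
    types = [int] * len(header)
    for row in reader:
        for i, field in enumerate(row):
            seen = value_type(field)
            if _ORDER.index(seen) > _ORDER.index(types[i]):
                types[i] = seen
    return dict(zip(header, types))

sample = '"name", "age", "children"\n"John", "45.5", "3"\n"Mary", "38", "0"\n'
print(column_types(sample))
```

Widening per column means a single row with an unusual value decides the type for everyone, which is what a database column needs.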
My attempt:
The simplest approach would be to try conversions, and decide upon the type when a certain conversion succeeds.
I wrote a function for this purpose which looks like this:
```python
def deduceType(value):
    try:
        # first try to convert to int
        int(value)
        return 0  # integer
    except ValueError:
        try:
            # not an integer, try float
            float(value)
            return 1  # float
        except ValueError:
            # not a float, so deduce string
            return 2  # string
```
My question:
The problem is that if I want to deduce more data types (booleans, longs, unsigned numeric types, etc.), this approach becomes cumbersome and inaccurate.
Is there a neater, more efficient and rigorous way to do this?
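One way to keep the try-conversions idea while making it easier to extend is to drive it from an ordered table of parsers, so supporting a new type means adding one entry instead of another nested `try`. A minimal sketch, assuming booleans are spelled `true`/`false` (case-insensitive); `parse_bool` and `CONVERTERS` are hypothetical names:

```python
def parse_bool(text):
    """Accept 'true'/'false' in any case; raise ValueError otherwise."""
    lowered = text.strip().lower()
    if lowered in ("true", "false"):
        return lowered == "true"
    raise ValueError(f"not a boolean: {text!r}")

# Ordered from most to least specific: the first parser that
# accepts the text decides the type. Each parser raises ValueError
# when the text does not fit.
CONVERTERS = [
    ("bool", parse_bool),
    ("int", int),
    ("float", float),
]

def deduce_type(text):
    """Return the name of the first converter that accepts `text`."""
    for name, convert in CONVERTERS:
        try:
            convert(text)
            return name
        except ValueError:
            continue
    return "str"  # nothing more specific matched

for sample in ["True", "3", "45.5", "John"]:
    print(sample, "->", deduce_type(sample))
```

The order of the table is the design decision: a more specific type has to be tried before any type that would also accept its text.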
Answer (edit):
Based on Martijn Pieters' answer, I'm doing this:
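```python
import ast

def deduceType(value):
    try:
        return type(ast.literal_eval(value))
    except (ValueError, SyntaxError):
        # not a Python literal (e.g. "John" or "John Smith"), so treat it as a string
        return str
```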
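`ast.literal_eval` also accepts lists, dicts, tuples and `None`, which probably should not become column types, and depending on the input it can raise `SyntaxError`, `TypeError`, `MemoryError` or `RecursionError` as well as `ValueError`. A sketch that keeps only a whitelist of scalar result types and falls back to `str` for everything else; the `SCALAR_TYPES` whitelist is an assumption about what belongs in a database column:

```python
import ast

# Only these result types are accepted; anything else (lists, dicts,
# tuples, None, ...) is treated as a plain string.
SCALAR_TYPES = (bool, int, float, complex)

def deduce_type(value):
    """Guess a Python type for a CSV field using ast.literal_eval."""
    try:
        # strip() because older Pythons reject leading whitespace here
        parsed = ast.literal_eval(value.strip())
    except (ValueError, SyntaxError, TypeError, MemoryError, RecursionError):
        # e.g. "John" raises ValueError, "John Smith" and "" raise SyntaxError
        return str
    return type(parsed) if isinstance(parsed, SCALAR_TYPES) else str

for sample in ["John", "45.5", "3", "True", "[1, 2]", "John Smith"]:
    print(repr(sample), "->", deduce_type(sample).__name__)
```

Note that `literal_eval` follows Python syntax, so inputs such as `0x1F` or `1_000` are read as integers, which may or may not be what the CSV data intends.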