Multiple OLS estimation TypeError

Question

I am trying to run an OLS regression with Newey-West standard errors in statsmodels to estimate my parameters. This is my code:

    from __future__ import print_function, division
    import xlrd as xl
    import numpy as np
    import scipy as sp
    import pandas as pd
    import statsmodels.formula.api as smf
    import statsmodels.api as sm

    file_loc = "/Python/dataset_3.xlsx"
    workbook = xl.open_workbook(file_loc)
    sheet = workbook.sheet_by_index(0)
    tot = sheet.nrows
    data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]

    rv1 = []
    rv5 = []
    rv22 = []
    rv1fcast = []
    T = []
    price = []
    time = []
    retnor = []

    for i in range(1, tot):
        t = data[i][0]
        ret = data[i][1]
        ret5 = data[i][2]
        ret22 = data[i][3]
        ret1_1 = data[i][4]
        retn = data[i][5]
        t = xl.xldate_as_tuple(t, 0)
        rv1.append(ret)
        rv5.append(ret5)
        rv22.append(ret22)
        rv1fcast.append(ret1_1)
        retnor.append(retn)
        T.append(t)

    df = pd.DataFrame({'RVFCAST':rv1fcast, 'RV1':rv1, 'RV5':rv5, 'RV22':rv22,})
    df = df[df.RV1.notnull()]
    model = smf.OLS(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)

Everything looks fine when I inspect the arrays or my dataframe, but the last line just raises: TypeError: __init__() takes at least 2 arguments (1 given)

I have tried a bunch of different methods and I cannot see what I am missing.

When I run it, the following error message is shown:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    /Python/harrv.py in <module>()
         41 df = df[df.RV1.notnull()]
         42
    ---> 43 model = smf.OLS(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)
         44
         45 #mdl = model.get_robustcov_results(cov_type='HAC',maxlags=1)

    TypeError: __init__() takes at least 2 arguments (1 given)

Printing rv1 gives:

Out[318]: [0.015538008996147568, 0.008881670570720125, 0.010421778063375802, ..... 0.003151044550868834, 0.0029676428110974166, 0.005236329928710288, 0.004838460533164701, '']

The other rv lists contain similar floating-point numbers. The df just assembles them the way pd.DataFrame does, which according to the documentation is supported (http://statsmodels.sourceforge.net/devel/example_formulas.html).

It would help if you posted the actual error message, including the stack trace. From your current description it is not clear which call crashes.
– Dov Grobgeld, Commented Apr 20, 2015 at 10:22
There it is. But I cannot see what other arguments could be needed, since I took this method from another example, which worked for that person. Could it be because of the single quotes in the lists?
– Niklas Lindeke, Commented Apr 20, 2015 at 11:11
Using the df. prefix inside the formula string is clearly wrong (compare with statsmodels.sourceforge.net/devel/example_formulas.html). But I admit that I don't see a connection between the error message and your arguments. It would help if you could make the example self-contained, e.g. add import statements and example data.
– Dov Grobgeld, Commented Apr 20, 2015 at 13:06
Fixed. I hope the rv1 example will suffice, because the others are assembled in the same manner and are just floats as well.
– Niklas Lindeke, Commented Apr 20, 2015 at 13:29
After removing the df. prefix from the formula, the same error message still shows.
– Niklas Lindeke, Commented Apr 20, 2015 at 13:37

Josef · Accepted Answer · 2015-04-21 21:16:36Z

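The problem is that the formula function in statsmodels.formula.api is lower case: ols, not OLS. The upper-case OLS in that namespace is the same class as in the main statsmodels.api. Its constructor expects the data arrays (endog and exog) rather than a formula, which is why calling it with only formula and data raises a TypeError about missing arguments. The upper-case models will be removed from the formula.api namespace in the future to avoid exactly this confusion.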

That means, you need to use lower case ols, as in

model = smf.ols(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)

Note that the lower-case formula functions are just aliases for the from_formula methods of the models.

For example, smf.ols is a shortcut for sm.OLS.from_formula.

3 Comments

Niklas Lindeke Over a year ago

That worked! But when I try to print a summary of the fitted model, it returns an error about the axis needing to be specified when shapes of a and weights differ. Do you have any idea about that?

Josef Over a year ago

I have no guess for that without the traceback or a full example. You could ask a new question or ask at the pystatsmodels Google group. One thing: drop the df. inside the formula string. We (statsmodels developers) never considered this pattern, and I'm not sure it works correctly. It is likely that it messes up patsy's creation of the design matrix, or the post-estimation handling.

Niklas Lindeke Over a year ago

I posted a second question with more detail here: stackoverflow.com/questions/29799161/…
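The thread does not establish what caused the follow-up error. One thing visible in the question, though, is that the rv1 printout ends with an empty string '', which notnull() does not filter out. The sketch below is only an assumption about how such data could be cleaned before fitting with Newey-West (HAC) standard errors. The values are made up, and maxlags=1 is copied from the commented-out line in the traceback.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 60
cols = ["RVFCAST", "RV1", "RV5", "RV22"]

# Invented positive numbers, plus a trailing '' like the one in the rv1 printout.
raw = pd.DataFrame({c: list(rng.uniform(0.002, 0.02, size=n)) + [""] for c in cols})

# notnull() keeps the '' row because an empty string is not a missing value.
kept_by_notnull = int(raw["RV1"].notnull().sum())

# Coerce every column to float ('' becomes NaN), then drop incomplete rows.
clean = raw.apply(pd.to_numeric, errors="coerce").dropna()

# OLS point estimates with HAC (Newey-West) standard errors using one lag.
res = smf.ols("RVFCAST ~ RV1 + RV5 + RV22", data=clean).fit(
    cov_type="HAC", cov_kwds={"maxlags": 1}
)
print(res.bse)
```

Passing cov_type to fit leaves the coefficient estimates unchanged and only replaces the covariance matrix used for standard errors and tests.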
