
I am trying to fit an OLS regression with Newey-West standard errors in statsmodels to estimate my parameters. This is my code:

from __future__ import print_function, division
import xlrd as xl
import numpy as np
import scipy as sp
import pandas as pd
import statsmodels.formula.api as smf
import statsmodels.api as sm

file_loc = "/Python/dataset_3.xlsx"
workbook = xl.open_workbook(file_loc)
sheet = workbook.sheet_by_index(0)
tot = sheet.nrows
data = [[sheet.cell_value(r, c) for c in range(sheet.ncols)] for r in range(sheet.nrows)]

rv1 = []
rv5 = []
rv22 = []
rv1fcast = []
T = []
price = []
time = []
retnor = []

for i in range(1, tot):
    t = data[i][0]
    ret = data[i][1]
    ret5 = data[i][2]
    ret22 = data[i][3]
    ret1_1 = data[i][4]
    retn = data[i][5]
    t = xl.xldate_as_tuple(t, 0)
    rv1.append(ret)
    rv5.append(ret5)
    rv22.append(ret22)
    rv1fcast.append(ret1_1)
    retnor.append(retn)
    T.append(t)

df = pd.DataFrame({'RVFCAST':rv1fcast, 'RV1':rv1, 'RV5':rv5, 'RV22':rv22,})
df = df[df.RV1.notnull()]

model = smf.OLS(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)

Everything looks fine when I inspect the arrays or my DataFrame, but the call just returns: TypeError: __init__() takes at least 2 arguments (1 given)

I have tried a bunch of different methods and I cannot see what I am missing.

When I run it, the following error message is shown:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/Python/harrv.py in <module>()
     41 df = df[df.RV1.notnull()]
     42 
---> 43 model = smf.OLS(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)
     44 
     45 #mdl = model.get_robustcov_results(cov_type='HAC',maxlags=1)

TypeError: __init__() takes at least 2 arguments (1 given)

Printing rv1 gives:

Out[318]: [0.015538008996147568, 0.008881670570720125, 0.010421778063375802, ..... 0.003151044550868834, 0.0029676428110974166, 0.005236329928710288, 0.004838460533164701, ''] 

The other rv lists contain similar floating-point numbers. The df just assembles them the way pd.DataFrame does, which according to the documentation is supported (http://statsmodels.sourceforge.net/devel/example_formulas.html).

  • It would help if you would write the actual error message including the stack trace. From your current description it is not clear what call crashes. Commented Apr 20, 2015 at 10:22
  • There it is. But I cannot see what more kinds of arguments could be needed, since I took this method from another example, which worked for that person. I was thinking, could it be because of the existence of single-quotes in the lists? Commented Apr 20, 2015 at 11:11
  • Using df. for the formula argument is clearly wrong. (Compare with statsmodels.sourceforge.net/devel/example_formulas.html). But I admit that I don't see a correlation between the error message and your arguments. It would help if you could make the example self contained. E.g. add import statements and example data. Commented Apr 20, 2015 at 13:06
  • Fixed. I hope the rv1 example will suffice, because the others are assembled in the same manner and are just floats as well. Commented Apr 20, 2015 at 13:29
  • After changing the df. part of the formula the same error message still shows. Commented Apr 20, 2015 at 13:37

1 Answer


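The problem is that the formula interface in statsmodels.formula.api uses lower-case names. The upper-case OLS there is the same class as in the main statsmodels.api, and its constructor expects the data arrays (endog first) rather than formula and data keywords, which is why __init__ complains about a missing argument. The upper-case models will be removed from the formula.api namespace in the future to avoid exactly this confusion.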

That means, you need to use lower case ols, as in

model = smf.ols(formula = 'df.RVFCAST ~ df.RV1 + df.RV5 + df.RV22', data = df)

Note, the lower case formula functions are just aliases to the from_formula methods of the models.
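To illustrate the aliasing, here is a minimal pure-Python sketch. The OLS class below is a toy stand-in written for this example, not statsmodels code; it only mirrors the general shape of a constructor that takes arrays plus a from_formula classmethod. The naive formula parsing is likewise an assumption made for illustration.

```python
class OLS:
    """Toy stand-in for a model class (hypothetical, not statsmodels)."""

    def __init__(self, endog, exog=None, **kwargs):
        # The constructor wants the data itself; endog is required.
        self.endog = endog
        self.exog = exog

    @classmethod
    def from_formula(cls, formula, data):
        # Naive parsing of 'y ~ x1 + x2' using plain column names.
        lhs, rhs = formula.split("~")
        endog = data[lhs.strip()]
        exog = [data[name.strip()] for name in rhs.split("+")]
        return cls(endog, exog)


# The lower-case name is nothing more than an alias for the classmethod.
ols = OLS.from_formula

data = {"y": [1.0, 2.0, 3.0], "x1": [0.1, 0.2, 0.3], "x2": [1.0, 0.0, 1.0]}

# Works: the alias parses the formula and then calls the constructor.
model = ols(formula="y ~ x1 + x2", data=data)

# Fails: formula and data are swallowed by **kwargs, so endog is missing.
try:
    OLS(formula="y ~ x1 + x2", data=data)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

On Python 3 the message names the missing endog argument; on Python 2, which the question uses, the same mistake is reported as "takes at least 2 arguments (1 given)".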

smf.ols is a shortcut for sm.OLS.from_formula


3 Comments

That worked! But when I try to print a summary of the fitted model, it returns an error about the axis needing to be specified when shapes of a and weights differ. Do you have any idea about that?
I have no guess for that without the traceback or a full example. You could ask a new question or ask at the pystatsmodels Google group. One thing: drop the df. inside the formula string. We (statsmodels developers) never considered this pattern, and I'm not sure it works correctly. It is likely that it messes up patsy's creation of the design matrix, or the post-estimation handling.
I posted a second question with more detail here: stackoverflow.com/questions/29799161/…
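One thing that may be worth checking, going only by the rv1 output printed in the question: the list ends with an empty string ''. Whether that is what triggers the summary error is an assumption, but the sketch below shows that notnull() does not filter such a value and that the column stays non-numeric until it is coerced. The miniature lists are hypothetical values chosen for illustration.

```python
import pandas as pd

# Hypothetical miniature of the question's lists: floats plus a trailing ''.
rv1 = [0.0155, 0.0089, 0.0104, '']
rv5 = [0.0120, 0.0110, 0.0100, '']

df = pd.DataFrame({'RV1': rv1, 'RV5': rv5})

# '' is not null, so this filter keeps every row and the dtype stays object.
still_object = df[df.RV1.notnull()]
print(still_object.dtypes)

# Coerce to numbers ('' becomes NaN), then drop the incomplete rows.
clean = df.apply(pd.to_numeric, errors='coerce').dropna()
print(clean.dtypes)
```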
