I'm trying to use cython to improve the performance of a loop, but I'm running into some issues declaring the types of the inputs.
How do I include a field in my typed struct that is a string which can be either 'front' or 'back'?
I have a np.recarray that looks like the following (note that the length of the recarray is unknown at compile time):

    import numpy as np

    weights = np.recarray(4, dtype=[('a', np.int64), ('b', np.str_, 5),
                                    ('c', np.float64)])
    weights[0] = (0, "front", 0.5)
    weights[1] = (0, "back", 0.5)
    weights[2] = (1, "front", 1.0)
    weights[3] = (1, "back", 0.0)

as well as inputs of a list of strings and a pandas.Timestamp:

    import pandas as pd

    ts = pd.Timestamp("2015-01-01")
    contracts = ["CLX16", "CLZ16"]

I am trying to cythonize the following loop:

    def ploop(weights, contracts, timestamp):
        cwts = []
        for gen_num, position, weighting in weights:
            if weighting != 0:
                if position == "front":
                    cntrct_idx = gen_num
                elif position == "back":
                    cntrct_idx = gen_num + 1
                else:
                    raise ValueError("transition.columns must contain "
                                     "'front' or 'back'")
                cwts.append((gen_num, contracts[cntrct_idx], weighting,
                             timestamp))
        return cwts

My attempt involved typing the weights input as a struct in Cython, in a file struct_test.pyx, as follows:

    import numpy as np
    cimport numpy as np

    cdef packed struct tstruct:
        np.int64_t gen_num
        char[5] position
        np.float64_t weighting

    def cloop(tstruct[:] weights_array, contracts, timestamp):
        cdef tstruct weights
        cdef int i
        cdef int cntrct_idx

        cwts = []
        for k in xrange(len(weights_array)):
            w = weights_array[k]
            if w.weighting != 0:
                if w.position == "front":
                    cntrct_idx = w.gen_num
                elif w.position == "back":
                    cntrct_idx = w.gen_num + 1
                else:
                    raise ValueError("transition.columns must contain "
                                     "'front' or 'back'")
                cwts.append((w.gen_num, contracts[cntrct_idx], w.weighting,
                             timestamp))
        return cwts

But I am receiving runtime errors, which I believe are related to the char[5] position.

    import pyximport
    pyximport.install()
    import struct_test

    struct_test.cloop(weights, contracts, ts)

    ValueError: Does not understand character buffer dtype format string ('w')

In addition, I am a bit unclear on how I would go about typing contracts as well as timestamp.
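The 'w' in the error message can be traced with plain NumPy. A field declared with np.str_ is stored as fixed-width unicode, which NumPy exports through the buffer protocol with the format code 'w', whereas a C char[5] lines up with a 5-byte string field. The following is a minimal sketch of that check; the names unicode_dtype and bytes_dtype are illustrative only, and the np.bytes_ variant is an assumed alternative layout rather than the one used in the question.

```python
import numpy as np

# Same layout as the recarray in the question: 'b' is declared with np.str_.
unicode_dtype = np.dtype([('a', np.int64), ('b', np.str_, 5),
                          ('c', np.float64)])
# Assumed alternative: declare 'b' as a 5-byte string instead.
bytes_dtype = np.dtype([('a', np.int64), ('b', np.bytes_, 5),
                        ('c', np.float64)])

# np.str_ is unicode (kind 'U'), stored as 4 bytes per character.
print(unicode_dtype['b'].kind, unicode_dtype['b'].itemsize)
# np.bytes_ is a byte string (kind 'S'), 1 byte per character, like char[5].
print(bytes_dtype['b'].kind, bytes_dtype['b'].itemsize)

# Buffer-protocol format strings, which a typed memoryview has to parse.
unicode_format = memoryview(np.zeros(1, dtype=unicode_dtype)).format
bytes_format = memoryview(np.zeros(1, dtype=bytes_dtype)).format
print(unicode_format)
print(bytes_format)
```

With the byte-string layout the whole record is 8 + 5 + 8 = 21 bytes, which is the size of the packed tstruct, so the two descriptions can match field by field.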
cython and provide limited speed improvement. But at first glance it looks like your ploop could be written with numpy array methods, operating on all weights at once. I may try that later.
np.str_ as unicode. If you use np.bytes_ instead then a simplified version of your code works for me. (I'm not posting this as an answer since I don't really want to get into the second part of your question about contracts and timestamp)
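One way the loop might be expressed with NumPy array operations is sketched below. This is not the commenter's code: the function name ploop_vectorized is hypothetical, the position field is assumed to be stored as np.bytes_ (so it is compared against b'front' and b'back'), and a plain string stands in for the pandas.Timestamp to keep the example free of extra dependencies. Unlike ploop, it returns Python int and float values rather than NumPy scalars.

```python
import numpy as np


def ploop_vectorized(weights, contracts, timestamp):
    """Hypothetical vectorized variant of ploop (assumes a bytes 'b' field)."""
    weights = np.asarray(weights)
    # Keep only rows with a nonzero weighting, as the loop does.
    active = weights[weights['c'] != 0]
    is_front = active['b'] == b'front'
    is_back = active['b'] == b'back'
    if not np.all(is_front | is_back):
        raise ValueError("transition.columns must contain "
                         "'front' or 'back'")
    # 'front' maps to gen_num, 'back' to gen_num + 1 (bool adds as 0 or 1).
    idx = active['a'] + is_back
    names = np.asarray(contracts, dtype=object)[idx]
    return [(int(g), n, float(w), timestamp)
            for g, n, w in zip(active['a'], names, active['c'])]


# Same data as in the question, but with 'b' declared as np.bytes_.
weights = np.array(
    [(0, b"front", 0.5), (0, b"back", 0.5),
     (1, b"front", 1.0), (1, b"back", 0.0)],
    dtype=[('a', np.int64), ('b', np.bytes_, 5), ('c', np.float64)])
contracts = ["CLX16", "CLZ16"]
timestamp = "2015-01-01"  # stand-in for pd.Timestamp("2015-01-01")

result = ploop_vectorized(weights, contracts, timestamp)
print(result)
```

Whether this beats a typed Cython loop depends on the number of rows; for a handful of records the list construction at the end is likely to dominate either way.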