
I'm trying to use cython to improve the performance of a loop, but I'm running into some issues declaring the types of the inputs.

How do I include a field in my typed struct that is a string which can be either 'front' or 'back'?

I have a np.recarray that looks like the following (note that the length of the recarray is unknown at compile time):

    import numpy as np

    weights = np.recarray(4, dtype=[('a', np.int64), ('b', np.str_, 5), ('c', np.float64)])
    weights[0] = (0, "front", 0.5)
    weights[1] = (0, "back", 0.5)
    weights[2] = (1, "front", 1.0)
    weights[3] = (1, "back", 0.0)

as well as inputs of a list of strings and a pandas.Timestamp

    import pandas as pd

    ts = pd.Timestamp("2015-01-01")
    contracts = ["CLX16", "CLZ16"]

I am trying to cythonize the following loop

    def ploop(weights, contracts, timestamp):
        cwts = []
        for gen_num, position, weighting in weights:
            if weighting != 0:
                if position == "front":
                    cntrct_idx = gen_num
                elif position == "back":
                    cntrct_idx = gen_num + 1
                else:
                    raise ValueError("transition.columns must contain "
                                     "'front' or 'back'")
                cwts.append((gen_num, contracts[cntrct_idx], weighting, timestamp))
        return cwts

My attempt involved typing the weights input as a struct in Cython, in a file struct_test.pyx, as follows:

    import numpy as np
    cimport numpy as np

    cdef packed struct tstruct:
        np.int64_t gen_num
        char[5] position
        np.float64_t weighting

    def cloop(tstruct[:] weights_array, contracts, timestamp):
        cdef tstruct weights
        cdef int i
        cdef int cntrct_idx

        cwts = []
        for k in xrange(len(weights_array)):
            w = weights_array[k]
            if w.weighting != 0:
                if w.position == "front":
                    cntrct_idx = w.gen_num
                elif w.position == "back":
                    cntrct_idx = w.gen_num + 1
                else:
                    raise ValueError("transition.columns must contain "
                                     "'front' or 'back'")
                cwts.append((w.gen_num, contracts[cntrct_idx], w.weighting, timestamp))
        return cwts

But I am receiving runtime errors, which I believe are related to the char[5] position.

    import pyximport
    pyximport.install()
    import struct_test

    struct_test.cloop(weights, contracts, ts)

    ValueError: Does not understand character buffer dtype format string ('w')

In addition, I am a bit unclear on how I would go about typing contracts as well as timestamp.

  • My limited experience is that compound dtypes and strings are hard to use in Cython and provide limited speed improvement. But at first glance it looks like your ploop could be written with numpy array methods, operating on all weights at once. I may try that later. Commented Jul 5, 2017 at 17:11
  • I think you're using Python 3, and it's interpreting np.str_ as unicode. If you use np.bytes_ instead then a simplified version of your code works for me. (I'm not posting this as an answer since I don't really want to get into the second part of your question about contracts and timestamp.) Commented Jul 5, 2017 at 17:58
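To illustrate the np.str_ versus np.bytes_ point from the comment above, here is a minimal sketch in plain Python (no Cython involved). The names `unicode_dtype`, `bytes_dtype` and `weights_b` are made up for the illustration. It only shows how the two dtypes differ in memory; whether the bytes version is enough to make the original cloop run is what the commenter reports, not something this snippet verifies.

```python
import numpy as np

# Same layout as in the question: on Python 3, np.str_ gives a unicode field.
unicode_dtype = np.dtype([('a', np.int64), ('b', np.str_, 5), ('c', np.float64)])

# Hypothetical alternative: a fixed-width bytes field, one byte per character,
# which is what a C `char[5]` member expects.
bytes_dtype = np.dtype([('a', np.int64), ('b', np.bytes_, 5), ('c', np.float64)])

# The unicode field stores 4 bytes per character, the bytes field stores 1.
print(unicode_dtype['b'].kind, unicode_dtype['b'].itemsize)
print(bytes_dtype['b'].kind, bytes_dtype['b'].itemsize)

# Without align=True the record is unpadded: 8 + 5 + 8 bytes,
# the same size as the packed struct in the question.
print(bytes_dtype.itemsize)

weights_b = np.recarray(2, dtype=bytes_dtype)
weights_b[0] = (0, b"front", 0.5)
weights_b[1] = (0, b"back", 0.5)
```

With a bytes field, comparisons on the Python side need bytes literals (b"front", b"back") rather than str.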

1 Answer


Your ploop (without the timestamp variable) produces:

    In [226]: ploop(weights, contracts)
    Out[226]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]

Equivalent function without a loop:

    def ploopless(weights, contracts):
        arr_contracts = np.array(contracts)  # to allow array indexing
        wgts1 = weights[weights['c']!=0]
        mask = wgts1['b']=='front'
        wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]]
        mask = wgts1['b']=='back'
        wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]+1]
        return wgts1.tolist()

    In [250]: ploopless(weights, contracts)
    Out[250]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]

I'm taking advantage of the fact that the returned list of tuples has the same (int, str, float) layout as the input weights array. So I'm just making a copy of weights (the boolean selection on 'c' already produces one) and replacing selected values of the b field.
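As a sanity check on that equivalence, here is a self-contained sketch that runs a loop version and the vectorized version on the sample data from the question and compares the results. The function names `loop_version` and `vectorized_version` are placeholders, the timestamp argument is dropped as above, and the loop iterates over `weights.tolist()` so that it yields plain Python values; treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np

# Sample data from the question, built as a plain structured array.
weights = np.array(
    [(0, "front", 0.5), (0, "back", 0.5), (1, "front", 1.0), (1, "back", 0.0)],
    dtype=[('a', np.int64), ('b', np.str_, 5), ('c', np.float64)],
)
contracts = ["CLX16", "CLZ16"]


def loop_version(weights, contracts):
    """Row-by-row version, mirroring ploop without the timestamp."""
    out = []
    for gen_num, position, weighting in weights.tolist():
        if weighting != 0:
            if position == "front":
                idx = gen_num
            elif position == "back":
                idx = gen_num + 1
            else:
                raise ValueError("position must be 'front' or 'back'")
            out.append((gen_num, contracts[idx], weighting))
    return out


def vectorized_version(weights, contracts):
    """Masked-assignment version, mirroring ploopless."""
    arr_contracts = np.array(contracts)
    kept = weights[weights['c'] != 0]      # boolean indexing returns a copy
    front = kept['b'] == 'front'
    back = kept['b'] == 'back'             # both masks computed before overwriting 'b'
    kept['b'][front] = arr_contracts[kept['a'][front]]
    kept['b'][back] = arr_contracts[kept['a'][back] + 1]
    return kept.tolist()


loop_result = loop_version(weights, contracts)
vec_result = vectorized_version(weights, contracts)
print(loop_result == vec_result)
```

One caveat with reusing the b field: it is only 5 characters wide here, so a contract name longer than that would be silently truncated on assignment. Also, unlike the loop, the vectorized sketch does not raise on an unexpected position value.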

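Note that I apply the field selection index before the mask. Boolean mask indexing produces a copy, so we have to be careful about the indexing order: assigning to `wgts1[mask]['b']` would modify a temporary copy, while assigning to `wgts1['b'][mask]` modifies wgts1 itself.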
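A small sketch of why the order matters, using a throwaway array `arr` that is not part of the original code:

```python
import numpy as np

arr = np.zeros(3, dtype=[('a', np.int64), ('b', 'U5')])
arr['b'] = ['front', 'back', 'front']
mask = arr['b'] == 'front'

# Mask first, field second: arr[mask] is a temporary copy,
# so the assignment is lost and arr is unchanged.
arr[mask]['b'] = 'XXXXX'
after_mask_first = arr['b'].tolist()

# Field first, mask second: arr['b'] is a view into arr,
# so the masked assignment writes into arr itself.
arr['b'][mask] = 'XXXXX'
after_field_first = arr['b'].tolist()

print(after_mask_first)
print(after_field_first)
```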

I'm guessing that the loop-less array version will be competitive in time with cloop (on realistic arrays). The string and list operations in cloop probably limit its speedup.
