Timeline for Apply a function to each element of a dataframe according to other elements values in the corresponding row in pandas and python

Current License: CC BY-SA 4.0

10 events

when	what		by	license	comment
Mar 4, 2022 at 8:57	comment	added	diedro		Let us continue this discussion in chat.
Mar 3, 2022 at 19:07	comment	added	Lodinn		This use case is actually fairly common. Here is an IDW function in MetPy that solves the same problem, along with its implementation. Looks reasonably pythonic to me. Also, on the subject of parallelization: of course you can, at the very least because your dataframe could be split into chunks and the processing of those is independent.
Mar 3, 2022 at 18:49	comment	added	Lodinn		Sure. Take a look at scikit-learn GPR then. NaN handling is a bit tricky though, and kriging does take a while with large datasets, so consider simply using RBFs. I've found an SO answer with code examples for both. That would be a more "normal" way to use pandas/scipy/sklearn.
Mar 3, 2022 at 17:26	comment	added	diedro		The idea could be to add a check that skips a point and time step when there are no nearby stations.
Mar 3, 2022 at 17:21	comment	added	diedro		You totally get the point. Are you familiar with kriging and with the concept of cross-validation?
Mar 3, 2022 at 16:04	comment	added	Lodinn		Do take a look at meteostat: the logic implemented there is pretty reasonable (select stations within a 30 km circle by default and interpolate data from those stations only). I mean... you do you, having to loop through both columns and rows is the sole solution to the problem you are posing, but the main reason people are bothered when their code doesn't look clean is that it is usually an indication of a deeper problem, resulting in poor maintainability of the system as a whole. I would not recommend selecting the three closest stations on a per-row basis even in a one-off research project.
Mar 3, 2022 at 15:58	comment	added	Lodinn		Yes, sorry, my bad. `header=0, skiprows=range(1,3)`. If you want to select the three closest stations on a row-by-row basis, ignoring those giving NaNs, that is slightly different from what you described, and there probably is no good way of doing this indeed. But it also raises more serious concerns about the overall logic: what happens if 5 out of 6 stations have NaNs in a given row and you attempt to select 3 of them not having NaNs? Arguably worse yet: suppose you have hundreds of stations and, for whatever reason, you end up interpolating data from hundreds of kilometers away.
Mar 3, 2022 at 14:34	comment	added	diedro		Thanks. skiprows=3 is equal to skiprows=[0,1,2]. Am I wrong? How can I create a dictionary mapping column names to the three closest stations without NaN?
S Mar 3, 2022 at 12:26	review	First answers
Mar 3, 2022 at 12:45
S Mar 3, 2022 at 12:26	history	answered	Lodinn	CC BY-SA 4.0
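
The `skiprows` exchange in the comments above can be checked with a small sketch. The CSV layout used here (one header line followed by two metadata lines, then data) and the column names `time`, `st1`, `st2` are assumptions for illustration only, since the original file is not shown in this timeline.

```python
import io

import pandas as pd

# Hypothetical file: line 0 is the header, lines 1-2 are metadata, then data.
raw = (
    "time,st1,st2\n"
    "units,mm,mm\n"
    "lat,45.1,45.3\n"
    "2020-01-01,1.0,2.0\n"
    "2020-01-02,,3.0\n"
)

# skiprows=3 and skiprows=[0, 1, 2] both drop the first three lines,
# so the real header is lost and the first data row becomes the header.
a = pd.read_csv(io.StringIO(raw), skiprows=3)
b = pd.read_csv(io.StringIO(raw), skiprows=[0, 1, 2])

# header=0 with skiprows=range(1, 3) keeps line 0 as the header
# and skips only the two metadata lines (1 and 2).
c = pd.read_csv(io.StringIO(raw), header=0, skiprows=range(1, 3))

print(a.equals(b))
print(list(c.columns), len(c))
```

Under this assumed layout, the two `skiprows` forms in the question behave identically, while the `header=0, skiprows=range(1,3)` form from the reply is the one that preserves the station names as column labels.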