Timeline for Apply a function to each element of a dataframe according to other elements values in the corresponding row in pandas and python
Current License: CC BY-SA 4.0
10 events
| when | what | by | license | comment | |
|---|---|---|---|---|---|
| Mar 4, 2022 at 8:57 | comment | added | diedro | Let us continue this discussion in chat. | |
| Mar 3, 2022 at 19:07 | comment | added | Lodinn | This use case is actually fairly common. MetPy has an IDW (inverse distance weighting) implementation that solves the same problem, and its source looks reasonably pythonic to me. Also, on the subject of parallelization: of course you can, at the very least because your dataframe can be split into chunks and the processing of those is independent. | |
| Mar 3, 2022 at 18:49 | comment | added | Lodinn | Sure. Take a look at scikit-learn GPR then. NaN handling is a bit tricky though, and kriging does take a while with large datasets, so consider simply using RBFs. I've found an SO answer with code examples for both. That would be a more "normal" way to use pandas/scipy/sklearn. | |
| Mar 3, 2022 at 17:26 | comment | added | diedro | The idea could be to add a check that skips a point and time step when there are no nearby stations. | |
| Mar 3, 2022 at 17:21 | comment | added | diedro | You totally get the point. Are you familiar with kriging and with the concept of cross-validation? | |
| Mar 3, 2022 at 16:04 | comment | added | Lodinn | Do take a look at meteostat: the logic implemented there is pretty reasonable (select stations within a 30 km circle by default and interpolate data from those stations only). I mean... you do you; looping through both columns and rows is the only solution to the problem you are posing. But the main reason people are bothered when their code doesn't look clean is that it usually indicates a deeper problem, one that results in poor maintainability of the system as a whole. I would not recommend selecting the three closest stations on a per-row basis even in a one-off research project. | |
| Mar 3, 2022 at 15:58 | comment | added | Lodinn | Yes, sorry, my bad. header=0, skiprows=range(1,3). If you want to select three closest stations on a row-per-row basis, ignoring those giving NaNs, that is slightly different from what you described and there probably is no good way of doing this indeed. But it also raises more serious concerns about the overall logic: what happens if 5 out of 6 stations have NaNs in a given row and you attempt to select 3 of them not having NaNs? Arguably worse yet - suppose you have hundreds of stations and for whatever reason, you end up interpolating data from hundreds of kilometers away. | |
| Mar 3, 2022 at 14:34 | comment | added | diedro | Thanks. skiprows=3 is equal to skiprows=[0,1,2], or am I wrong? How can I create a dictionary mapping column names to the three closest stations without NaN? | |
| S Mar 3, 2022 at 12:26 | review | First answers | |||
| Mar 3, 2022 at 12:45 | |||||
| S Mar 3, 2022 at 12:26 | history | answered | Lodinn | CC BY-SA 4.0 |
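
The `skiprows` exchange in the comments can be checked on a small in-memory file. This is only a sketch: the file layout (one header line followed by two metadata lines) and the column names are assumptions made for illustration, not the asker's actual data.

```python
import io

import pandas as pd

# Hypothetical layout: header on line 0, two metadata lines, then the data.
raw = (
    "time,st1,st2\n"
    "units,mm,mm\n"
    "elev,120,340\n"
    "2022-01-01,1.0,2.0\n"
    "2022-01-02,3.0,4.0\n"
)

# skiprows=3 drops lines 0, 1 and 2, so the first data line is read as the header.
df_int = pd.read_csv(io.StringIO(raw), skiprows=3)

# skiprows=[0, 1, 2] names the same three lines explicitly.
df_list = pd.read_csv(io.StringIO(raw), skiprows=[0, 1, 2])

# header=0 with skiprows=range(1, 3) keeps line 0 as the header
# and drops only the two metadata lines.
df_keep = pd.read_csv(io.StringIO(raw), header=0, skiprows=range(1, 3))

print(list(df_int.columns))
print(list(df_keep.columns))
```

So the two `skiprows` forms in the question do agree with each other, but both discard the real header line, which is what the corrected `header=0, skiprows=range(1,3)` avoids.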
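
The per-row station selection discussed in the comments could be sketched roughly as follows. This is not MetPy's or meteostat's code; the function name `idw_row`, the station names, the distances, and the defaults (three stations, a 30 km cutoff, power 2) are assumptions for illustration. The cutoff is what implements the suggested check: a row with no valid station in range yields NaN instead of an estimate drawn from far away.

```python
import numpy as np
import pandas as pd


def idw_row(values, dists, k=3, max_dist=30.0, power=2.0):
    """Inverse-distance-weighted estimate from up to k nearest valid stations.

    Hypothetical helper: stations with NaN readings or farther than
    max_dist are ignored; NaN is returned when none remain.
    """
    values = np.asarray(values, dtype=float)
    dists = np.asarray(dists, dtype=float)
    valid = ~np.isnan(values) & (dists <= max_dist)
    if not valid.any():
        return np.nan
    v, d = values[valid], dists[valid]
    nearest = np.argsort(d)[:k]          # indices of the k closest valid stations
    v, d = v[nearest], d[nearest]
    if np.any(d == 0):                   # target coincides with a station
        return float(v[d == 0][0])
    w = 1.0 / d ** power
    return float(np.sum(w * v) / np.sum(w))


# Hypothetical distances (km) from the target point to each station.
dists = pd.Series({"st1": 5.0, "st2": 10.0, "st3": 20.0, "st4": 80.0})

# Hypothetical observations: one row per time step, one column per station.
obs = pd.DataFrame(
    {
        "st1": [1.0, np.nan, np.nan],
        "st2": [2.0, 4.0, np.nan],
        "st3": [3.0, 8.0, np.nan],
        "st4": [9.0, 9.0, 9.0],
    }
)

station_dists = dists[obs.columns].to_numpy()
estimate = obs.apply(lambda row: idw_row(row.to_numpy(), station_dists), axis=1)
print(estimate)
```

Because each row is handled independently, the frame can also be split into chunks and the chunks processed in parallel, as noted in the comments.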