3,942 questions
Advice
0 votes
3 replies
43 views
String manipulation: extract words under brackets
I'm not yet very familiar with the patterns in Lua's string.gsub function. If I have a string like this: Fishing Lure(+100 Fishing Skill)(1 hour) and I want to extract only the string "1 hour"...
2 votes
2 answers
186 views
Unnest a complex and inconsistent dataset using data.table
I have a dataset that was originally a JSON file and was converted to a data.table with data.table(jsonlite::fromJSON(data)). The resulting data.table is complex with nested data containing not only ...
3 votes
4 answers
237 views
Fast unnest complex column with data.table
I have a dataset where the column to unnest contains data with unequal rows and columns rather than data with equal dimensions. I'm looking for a fast approach to unnest this dataset using data.table. ...
1 vote
2 answers
185 views
How do I filter my data with a looped "if" statement while retaining data from both current, past and an average of current + past loops?
I'm currently trying to filter a dataset containing audio data on bird species. The data looks like this: head(audiomoth_sample) id park park_abbr am_no sci_name com_name start_s end_s conf date_time ...
3 votes
3 answers
214 views
Mutating detection data into binary
Currently I have a dataframe of bear detections that I want to convert into a binary detection history (14 columns of day1, day2, day3, etc. where: actual_date_out = the date the camera was deployed, ...
2 votes
4 answers
197 views
Get the number of days since last event
I have data where, for each individual, the event dates are recorded. Here is an example: id Date 1001 2025-06-20 1002 2025-06-24 1002 2025-06-20 1002 2025-06-19 What I would like to ...
0 votes
0 answers
16 views
Find conditions from multiple databases to have in a single database
I am currently working on a project where multiple databases are available to check for specific conditions of a patient. Specifically, I have a "master" database in wide format, with one row ...
0 votes
1 answer
111 views
Formatting csv file format in pyspark
I have a | delimited csv file with data as shown below. AccountID|BounceSubcategory|BounceTypeID|BounceType|SMTPBounceReason|SMTPMessage|SMTPCode|TriggererSendDefinitionObjectID|...
0 votes
1 answer
77 views
Setting a row number for each row in PySpark Dataframe
Currently I'm working with a large database using PySpark and am stuck on how to correctly set row numbers depending on a condition. My dataframe is: id_company id_client id_loan date c1 ...
0 votes
0 answers
52 views
Execution of complex filtering procedures in PySpark
Currently I'm trying to execute some filtering procedures in PySpark (for educational purposes). I'm new to PySpark, so I decided to ask for help. My dataframe looks like this: ID ApplicationDate ...
0 votes
3 answers
244 views
Update Object_construct nested in an Array_construct in Snowflake
Can anyone please help me with this scenario where I might have multiple OBJECT_CONSTRUCT values nested within an ARRAY_CONSTRUCT? I am not able to update one value of an element within it. I am using ...
-1 votes
3 answers
144 views
Graphically reorganizing columns in DataFrame
I'm currently in the process of cleaning up a large questionnaire database. I wanted to know whether, in R or pandas, there is a way to graphically change the order of columns. I mostly used RStudio and ...
0 votes
2 answers
170 views
Merge dataframes with conditions using PySpark
Currently I'm making calculations using PySpark and trying to match data from multiple dataframes on specific conditions. I'm new to PySpark and decided to ask for help. My first dataframe ...
1 vote
1 answer
110 views
Advanced Filtering Operations in PySpark
Currently, I'm making calculations using PySpark on a dataframe where information on how loans are paid by borrowers is shown. I'm new to PySpark and decided to ask for help while trying to execute ...
0 votes
0 answers
78 views
Dropping rows whose row sum = zero keeping the original structure same
I have a dataframe containing incalculable rows and columns. The df is structured such that up to the 6th row and 2nd column I have strings as input, and the rest are numbers (floating point). I want to ...