
Apologies in advance if this has already been asked elsewhere, but I've tried several approaches and nothing has worked so far.

In my data frame Mesure, I would like to split the values of the column Row.names into two new columns named Sample_type and Locality. I tried a tidyverse solution, but R reports that the column name must not be duplicated. How can I fix it? Also, is it possible to remove the "<"?

> head(Mesure)
                     Row.names mean_Mesure max_Mesure min_Mesure
1 Aquatic_moss.Paris.AG-110m.<         100        110         90
2     Aquatic_moss.Paris.BE-7.         123        177         53
3   Aquatic_moss.Paris.CO-57.<          40         60         20
4   Aquatic_moss.Paris.CO-58.<          40         50         30
5   Aquatic_moss.Paris.CO-60.<          50         70         30
6  Aquatic_moss.Paris.CS-134.<         200        300        100
>
> library(tidyverse)
> new_df <- Mesure %>%
+   rownames_to_column(var = "Row.names") %>%
+   separate(Row.names,sep = ".",into = c("Sample_type","Locality"))
Error: Column name `Row.names` must not be duplicated.
Run `rlang::last_error()` to see where the error occurred.
  • When you create Mesure, wouldn't as_tibble(..., rownames="Row.names") give you what you want? Commented Jun 24, 2020 at 14:12
  • Well, Mesure is the result of merging several lists that come from splitting several different data frames. While splitting these initial data frames, I renamed the data frames. Commented Jun 24, 2020 at 14:17

2 Answers


To split the column at the first dot, you can use:

Mesure %>% separate(Row.names, sep = "\\.", into = c("Sample_type", "Locality"), extra = "merge") 

Explanation:

  • You don't need to call rownames_to_column(), because "Row.names" is already a column. That call is what triggers the "must not be duplicated" error.
  • sep = "." is not enough, because sep is interpreted as a regular expression, where . matches any character. Escape it as "\\." to match a literal dot.
  • There are several . in each value, so you need to specify extra = "merge" to split only at the first occurrence. If you would like to keep only "Paris" without AG-110m etc., specify extra = "drop" instead.
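The point about the unescaped dot can be checked in base R, because strsplit() also treats its split argument as a regular expression by default. This is only a sketch on one string written in the style of the Row.names values; the variable names are made up for illustration.

```r
# One value in the style of the Row.names column (illustrative only).
x <- "Aquatic_moss.Paris"

# Unescaped: "." is a regex wildcard, so every character acts as a
# separator and only empty strings are left.
unescaped <- strsplit(x, ".")[[1]]

# Escaped: "\\." matches a literal dot.
escaped <- strsplit(x, "\\.")[[1]]

# Alternative in strsplit(): fixed = TRUE takes "." literally.
literal <- strsplit(x, ".", fixed = TRUE)[[1]]

print(unescaped)
print(escaped)
print(literal)
```

The escaped and fixed variants both return the two pieces "Aquatic_moss" and "Paris", which is the behaviour separate() needs from sep = "\\.".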

Result with extra = "merge":

   Sample_type        Locality mean_Mesure max_Mesure min_Mesure
1 Aquatic_moss Paris.AG-110m.<         100        110         90
2 Aquatic_moss     Paris.BE-7.         123        177         53
3 Aquatic_moss   Paris.CO-57.<          40         60         20
4 Aquatic_moss   Paris.CO-58.<          40         50         30
5 Aquatic_moss   Paris.CO-60.<          50         70         30
6 Aquatic_moss  Paris.CS-134.<         200        300        100

Result with extra = "drop":

   Sample_type Locality mean_Mesure max_Mesure min_Mesure
1 Aquatic_moss    Paris         100        110         90
2 Aquatic_moss    Paris         123        177         53
3 Aquatic_moss    Paris          40         60         20
4 Aquatic_moss    Paris          40         50         30
5 Aquatic_moss    Paris          50         70         30
6 Aquatic_moss    Paris         200        300        100

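If you need to drop the "<" at the end of the Locality column, first store the result of separate() back in Mesure (Mesure <- Mesure %>% separate(...)), then run something like: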

Mesure$Locality <- gsub("<$", "", Mesure$Locality) 

where "<$" means "< at the end of the string".
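Putting the steps together, a minimal reproducible sketch might look like the following. The data frame is a hand-typed reconstruction of the first three rows shown in the question, not the asker's real data, and the result is stored in new_df (the name used in the question) rather than overwriting Mesure. It assumes the tidyr package is installed.

```r
library(tidyr)

# Hypothetical reconstruction of the first three rows from the question.
Mesure <- data.frame(
  Row.names   = c("Aquatic_moss.Paris.AG-110m.<",
                  "Aquatic_moss.Paris.BE-7.",
                  "Aquatic_moss.Paris.CO-57.<"),
  mean_Mesure = c(100, 123, 40),
  max_Mesure  = c(110, 177, 60),
  min_Mesure  = c(90, 53, 20),
  stringsAsFactors = FALSE
)

# Split at the first literal dot only; everything after it stays in Locality.
new_df <- separate(Mesure, Row.names,
                   into = c("Sample_type", "Locality"),
                   sep = "\\.", extra = "merge")

# Remove a "<" that sits at the very end of the string.
new_df$Locality <- gsub("<$", "", new_df$Locality)

print(new_df)
```

Note that the trailing dot before the "<" is left in place by this pattern; only the "<" itself is removed.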


2 Comments

To drop the "<" in the same command, it is also possible to split into three columns and use extra = "drop": new_df <- Mesure %>% separate(Row.names, sep = "\\.", into = c("Sample_type", "Locality", "Chemicals"), extra = "drop")
@Sylvain Oh yes, this is even better! I was only thinking about two columns as you specified in the question, however separating the chemicals and dropping the rest is definitely a better idea.

Apologies, I should have read your question properly. The answer to the second part of your question would be:

d %>% separate(Row.names, into=c("Sample_type","Locality"), extra="drop")
# A tibble: 6 x 6
  Sample_type Locality mean_Mesure max_Mesure min_Mesure
  <chr>       <chr>          <dbl>      <dbl>      <dbl>
1 Aquatic     moss             100        110         90
2 Aquatic     moss             123        177         53
3 Aquatic     moss              40         60         20
4 Aquatic     moss              40         50         30
5 Aquatic     moss              50         70         30
6 Aquatic     moss             200        300        100

I can't help you with the first part because I don't know how you create the input data frame.
