My Data

The data are from a large survey in a set of developing countries. The data include, among other things, variables on each respondent's country and local region (within the country).

The only problem is that, instead of coding local region as strings (such as "New York" or "Westchester County"), it is coded numerically, and the numbers correspond to a list of regions in the codebook.

My Question

What I would like to know is whether there's a way to automate the process of renaming the factor levels using the code list from the codebook. In that list, each region is preceded by a numeric value and an equals sign, and is followed immediately by a comma.

This list takes this form:

1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston, ..., 230=Tblisi 

Is there some R code that would let me quickly relabel all the levels of this variable using this list?

  • Does your "list" just exist in a text file? Or is it an R object? Commented Jan 9, 2016 at 3:23
  • Hi @jbaums, thank you for your feedback. I wanted to be sure to provide enough information to give context to the question. To answer your question, the list exists in a text file. Commented Jan 9, 2016 at 4:24

2 Answers

If you have a text file with a vector like

 1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston, ..., 230=Tblisi 

you're going to need some regular expressions to separate the numbers from the city names. For example, you could do:

    library(stringr)
    List <- c("1=New York", "2=Paris", "3=London", "4=Moscow", "5=Boston")
    Cities <- data.frame(Orig = List, stringsAsFactors = FALSE)
    # Match the leading number (one or more digits) and store it as an integer
    Cities$CityNum <- as.integer(str_extract(Cities$Orig, "[0-9]+"))
    # Take everything from the first capital letter to the end of the string
    Cities$City <- str_sub(Cities$Orig,
                           start = str_locate(Cities$Orig, "[A-Z]")[, 1],
                           end = str_length(Cities$Orig))
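A base R variant of the same extraction, offered only as a sketch: instead of locating the first capital letter, it takes the digits before the `=` as the code and everything after the `=` as the name, so region names that begin with a lowercase letter or a digit are still captured. It assumes every entry has exactly the `number=name` form shown in the question.

```r
List <- c("1=New York", "2=Paris", "3=London", "4=Moscow", "5=Boston")

Cities <- data.frame(
  # Keep only the digits before the "=" and store them as integers
  CityNum = as.integer(sub("^[[:space:]]*([0-9]+)[[:space:]]*=.*$", "\\1", List)),
  # Drop everything up to and including the first "=", then trim spaces
  City = trimws(sub("^[^=]*=", "", List)),
  stringsAsFactors = FALSE
)
Cities
```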

Assuming that you have a column in MyData called "CityNum" that holds the region code, you can then merge the two:

    MyData <- merge(MyData, Cities, by = "CityNum")
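If the original row order of MyData matters, note that `merge()` sorts the result by the key column by default and drops respondents whose code is missing from the codebook (unless `all.x = TRUE` is given). A sketch of a lookup with base R's `match()` that keeps every row in place; the small `MyData` below is made-up example data, and the `CityNum`/`City` column names simply follow the answer's assumption.

```r
# Lookup table in the shape built above (hard-coded so the sketch is self-contained)
Cities <- data.frame(CityNum = 1:5,
                     City = c("New York", "Paris", "London", "Moscow", "Boston"),
                     stringsAsFactors = FALSE)

# Hypothetical survey data: one region code per respondent
MyData <- data.frame(id = 1:4, CityNum = c(3, 1, 5, 3))

# match() gives, for each respondent, the row of Cities with the same code;
# codes absent from the codebook would become NA instead of dropping the row
MyData$City <- Cities$City[match(MyData$CityNum, Cities$CityNum)]
MyData
```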

And I must agree with jbaums about being concise. :-)

Dear @LauraS, thank you for your response. The list is in the form that I specified in the original post. I am hoping to scrape the data without retyping every value or manually adding quotation marks.

You could use strsplit on the codelist and then use the result as the levels and labels for your factor.

    citylist <- c("1=New York", "2=Paris", "3=London", "4=Moscow", "5=Boston")
    # Split and bind the result into a data frame
    codes <- data.frame(do.call(rbind, strsplit(citylist, "=")))
    # Generate some dummy data
    set.seed(85)
    mycities <- ceiling(runif(10, 0, 5))
    mycities <- factor(mycities, levels = codes$X1, labels = codes$X2)

Which gives:

     [1] London   New York Paris    Moscow   London   Boston   New York New York New York
    [10] Boston
    Levels: New York Paris London Moscow Boston
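Because the code list lives in a text file (per the comments on the question), `citylist` does not have to be typed by hand. This is a sketch under two assumptions: the codebook text looks like the single line in the question (entries separated by commas, each of the form `number=name`), and no region name itself contains a comma or an `=`. The file name is hypothetical; a temporary file stands in for it so the example runs as-is.

```r
# Hypothetical stand-in for the real codebook file (e.g. "codebook.txt")
path <- tempfile(fileext = ".txt")
writeLines("1=New York, 2=Paris, 3=London, 4=Moscow, 5=Boston", path)

# Read the file and glue its lines together, in case the list wraps
codebook_text <- paste(readLines(path), collapse = ", ")

# Split into "number=name" entries and drop empty pieces left by stray commas
citylist <- trimws(strsplit(codebook_text, ",", fixed = TRUE)[[1]])
citylist <- citylist[citylist != ""]

# Split each entry on "=" into a two-column character matrix
parts <- do.call(rbind, strsplit(citylist, "=", fixed = TRUE))
codes <- data.frame(code = as.integer(parts[, 1]),
                    label = trimws(parts[, 2]),
                    stringsAsFactors = FALSE)

# Relabel a numeric region variable (made-up values) with the codebook names
region <- factor(c(3, 1, 5), levels = codes$code, labels = codes$label)
region
```

With real data, `path` would point at the codebook file and `c(3, 1, 5)` would be replaced by the survey's region column.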

Hi @Jay, thank you very much for your feedback. I will try that now.
