I need to split string variables into different columns. My data looks like this:
test.data <- data.frame(
  id = c(101, 101, 101, 101, 101),
  level = c(
    "levels p3 trunk slide.level",
    "levels p3 shark.level",
    "levels p3 wedge.level",
    "levels p3 tricky.level",
    "levels p4 annoying lever.level"
  ),
  badge = c(
    "springboard badge s",
    "lever badge s",
    "lever badge s",
    "ramp badge s",
    "lever badge s"
  )
)

> test.data
   id                          level               badge
1 101    levels p3 trunk slide.level springboard badge s
2 101          levels p3 shark.level       lever badge s
3 101          levels p3 wedge.level       lever badge s
4 101         levels p3 tricky.level        ramp badge s
5 101 levels p4 annoying lever.level       lever badge s
I need to split the "level" variable into 2 variables [pp, level], and the "badge" variable into 2 variables [item, badge].
My data should look like this:

> test.data
   id        pp                level        item   badge
1 101 levels p3    trunk slide.level springboard badge s
2 101 levels p3          shark.level       lever badge s
3 101 levels p3          wedge.level       lever badge s
4 101 levels p3         tricky.level        ramp badge s
5 101 levels p4 annoying lever.level       lever badge s
Please note that the test.data$level variable starts with a space. I tried the strsplit() function but could not solve it. Any help on this?
Best.
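We can use extract from tidyr twice. For the 'level' column, we match a word (\\w+) followed by one or more whitespace characters (\\s+), followed by another word (\\w+), and wrap that whole part in parentheses ((...)) so it becomes the first capture group. It is followed by one or more spaces (\\s+), and then we capture the rest of the characters ((.*)) as the second group. Similarly, we can separate the other column into two with a regex: the first word is captured as 'item' and everything after it as 'badge'.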
library(tidyr)

# Split 'level' into 'pp' (first two words) and 'level' (the rest),
# then split 'badge' into 'item' (first word) and 'badge' (the rest).
extract(test.data, level, into = c('pp', 'level'), '(\\w+\\s+\\w+)\\s+(.*)') %>%
  extract(badge, into = c('item', 'badge'), '(\\w+)\\s*(.*)')
#   id        pp                level        item   badge
#1 101 levels p3    trunk slide.level springboard badge s
#2 101 levels p3          shark.level       lever badge s
#3 101 levels p3          wedge.level       lever badge s
#4 101 levels p3         tricky.level        ramp badge s
#5 101 levels p4 annoying lever.level       lever badge s
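If tidyr is not available, the same split can be sketched in base R with trimws() and sub(). This is only an illustration, not part of the answer above: the two-row sample data is made up for brevity, and the leading space in the first 'level' value is an assumption based on the note in the question. The patterns are the same as above, but anchored with ^ and $ so that sub() replaces the whole string.

```r
# Hypothetical two-row sample; the leading space in the first 'level'
# value is an assumption taken from the note in the question.
test.data <- data.frame(
  id    = c(101, 101),
  level = c(" levels p3 trunk slide.level", "levels p4 annoying lever.level"),
  badge = c("springboard badge s", "lever badge s"),
  stringsAsFactors = FALSE
)

# Strip leading/trailing whitespace before matching.
level <- trimws(test.data$level)
badge <- trimws(test.data$badge)

# First two words go to 'pp', the remainder to 'level'.
level_pattern <- "^(\\w+\\s+\\w+)\\s+(.*)$"
test.data$pp    <- sub(level_pattern, "\\1", level)
test.data$level <- sub(level_pattern, "\\2", level)

# First word goes to 'item', the remainder to 'badge'.
badge_pattern <- "^(\\w+)\\s+(.*)$"
test.data$item  <- sub(badge_pattern, "\\1", badge)
test.data$badge <- sub(badge_pattern, "\\2", badge)

# Reorder the columns to match the desired layout.
test.data <- test.data[, c("id", "pp", "level", "item", "badge")]
print(test.data)
```

Note that sub() returns the input unchanged when the pattern does not match, so rows with fewer words than expected would need a separate check.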