I have a function that counts the zeros in each column of a large data frame. Now I want to count the zeros in each column after grouping by a category. Here is an example:
library(dplyr)

# For every column: number of zeros, number of values, and zero rate
zero_rate <- function(df) {
  z_rate_list <- sapply(df, function(x) {
    data.frame(
      n_zero = length(which(x == 0)),
      n = length(x),
      z_rate = length(which(x == 0)) / length(x)
    )
  })
  d <- data.frame(z_rate_list)
  d <- sapply(d, unlist)
  d <- as.data.frame(d)
  return(d)
}

df = data.frame(var1 = c(1, 0, NA, 4, NA, 6, 7, 0, 0, 10),
                var2 = c(11, NA, NA, 0, NA, 16, 0, NA, 19, NA))
df1 = data.frame(cat = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2), df)
zero_rate_df = df1 %>% group_by(cat) %>% do(zero_rate(.))
Here zero_rate(df) works as expected. But when I group the data by cat and calculate zero_rate for each column within each category, the result is not what I expect. I expect something like this:
cat          var1  var2
1    n_zero  1     1
     n       5     5
     z_rate  0.2   0.2
2    n_zero  2     1
     n       5     5
     z_rate  0.4   0.2
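As a sanity check on the numbers above, the zero counts per category can be reproduced with a small base R sketch. It rebuilds the sample data with NA for the missing values and assumes that NA should not be counted as a zero; the name n_zero for the result matrix is only illustrative.

```r
# Sample data from the question (NA marks missing values).
df1 <- data.frame(
  cat  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1 = c(1, 0, NA, 4, NA, 6, 7, 0, 0, 10),
  var2 = c(11, NA, NA, 0, NA, 16, 0, NA, 19, NA)
)

# Count zeros per category for each variable column.
# Assumption: NA is ignored rather than treated as zero.
n_zero <- sapply(df1[c("var1", "var2")], function(x) {
  tapply(x, df1$cat, function(v) sum(v == 0, na.rm = TRUE))
})

# One row per category, one column per variable.
n_zero

# Each category has 5 rows, so the zero rate is the count divided by 5.
n_zero / 5
```

Dividing by the group size of 5 gives the z_rate rows of the expected table.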
Any suggestions? Thank you.
I came up with the following code. .[-1] is used to remove the grouping column:
zero_rate <- function(df){
  res <- lapply(df, function(x){
    # number of zeros (ignoring NA) and number of values
    y <- c(sum(x == 0, na.rm = TRUE), length(x))
    # append the zero rate
    c(y, y[1]/y[2])
  })
  res <- do.call(cbind.data.frame, res)
  res$vars <- c('n_zero', 'n', 'z_rate')
  res
}

df1 %>% group_by(cat) %>% do(zero_rate(.[-1]))
#     cat  var1  var2   vars
#   <dbl> <dbl> <dbl>  <chr>
# 1     1   1.0   1.0 n_zero
# 2     1   5.0   5.0      n
# 3     1   0.2   0.2 z_rate
# 4     2   2.0   1.0 n_zero
# 5     2   5.0   5.0      n
# 6     2   0.4   0.2 z_rate
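A possible variation, offered only as a sketch: in more recent dplyr versions do() is marked as superseded, and group_modify() passes each group to the function without the grouping column, so the .[-1] step should not be needed. The block below repeats the sample data and the zero_rate function from this answer so that it runs on its own; the name out is just illustrative, and it assumes a dplyr version that provides group_modify().

```r
library(dplyr)

# Sample data from the question (NA marks missing values).
df1 <- data.frame(
  cat  = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
  var1 = c(1, 0, NA, 4, NA, 6, 7, 0, 0, 10),
  var2 = c(11, NA, NA, 0, NA, 16, 0, NA, 19, NA)
)

# Same summary function as in the answer above.
zero_rate <- function(df) {
  res <- lapply(df, function(x) {
    y <- c(sum(x == 0, na.rm = TRUE), length(x))
    c(y, y[1] / y[2])
  })
  res <- do.call(cbind.data.frame, res)
  res$vars <- c("n_zero", "n", "z_rate")
  res
}

# .x is the data for one group, without the grouping column `cat`;
# group_modify() adds `cat` back to the combined result.
out <- df1 %>%
  group_by(cat) %>%
  group_modify(~ zero_rate(.x)) %>%
  ungroup()

out
```

The result should have the same four columns (cat, var1, var2, vars) and three rows per category as the do() version.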