i'm trying write r code read data mess of old spreadsheets. exact location of data varies sheet sheet: constant first column date , second column has "monthly return" header. in example, data starts in cell b5:
how automate search of excel cells "monthly return" string using r?
at moment, best idea can come upload in r starting @ cell a1 , sort out mess in resulting (huge) matrices. i'm hoping more elegant solution
i haven't found way elegantly, i'm familiar problem (getting data factset pa reports -> excel -> r, right?). understand different reports have different formats, , can pain.
for different version of annoyingly formatted spreadsheets, following. it's not elegant (it requires 2 reads of file) works. reading file twice, make sure columns of correct type, , headers. it's easy mess column imports, i'd rather have code read file twice go through , clean columns myself, , read_excel defaults, if start @ right row, pretty good.
also, it's worth noting of today (2017-04-20), readxl had update. installed new version see if make easy, don't believe that's case, although mistaken.
library(readxl) library(stringr) library(dplyr) f_path <- file.path("whatever.xlsx") if (!file.exists(f_path)) { f_path <- file.choose() } # read twice, temp_read figure out data starts... # maybe need - # excel_sheets <- readxl::excel_sheets(f_path) # desired_sheet <- which(stringr::str_detect(excel_sheets,"2 factor brinson attribution")) desired_sheet <- 1 temp_read <- readxl::read_excel(f_path,sheet = desired_sheet) skip_rows <- null col_skip <- 0 search_string <- "monthly returns" max_cols_to_search <- 10 max_rows_to_search <- 10 # note, - 0, may need add/subtract row if end skipping far later. while (length(skip_rows) == 0) { col_skip <- col_skip + 1 if (col_skip == max_cols_to_search) break skip_rows <- which(stringr::str_detect(temp_read[1:max_rows_to_search,col_skip][[1]],search_string)) - 0 } # ... re-read known starting point. real_data <- readxl::read_excel( f_path, sheet = desired_sheet, skip = skip_rows ) # don't need if start @ right row # given weird spreadsheets weird in own way # may want operate on col_skip, maybe so: # real_data <- real_data %>% # select(-(1:col_skip))
Comments
Post a Comment