iOS - Swift version of StreamReader only streams the entire file (not in chunks, which is desired)
I have been having issues processing large files. At first I blamed the parser, so I wrote my own, but the issues remained. If I use the code below to scan a 1,000,000-record file (250 MB), it takes 4 GB of memory while processing. I would expect it to stay under 50 MB, considering it is supposed to take one line at a time:
func sample(fileURL: URL) {
    if let aStreamReader = StreamReader(path: fileURL.path) {
        defer {
            aStreamReader.close()
        }
        while let line = aStreamReader.nextLine() {
            // insert industrious code here... (a function call)
        }
    }
}
(Note that it doesn't do anything except read the file and discard the results.)

**Why is the entire file being processed rather than one line at a time?**

The files I need to process range into many GBs. I did not write StreamReader; I have found the same code in a number of places with minor variations, and it appears to be based on a C# class. The StreamReader code I am using:
// StreamReader.swift

import Foundation

class StreamReader {

    let encoding: String.Encoding
    let chunkSize: Int
    var fileHandle: FileHandle!
    let delimData: Data
    var buffer: Data
    var atEof: Bool

    init?(path: String, delimiter: String = "\n", encoding: String.Encoding = .utf8,
          chunkSize: Int = 4096) {

        guard let fileHandle = FileHandle(forReadingAtPath: path),
            let delimData = delimiter.data(using: encoding) else {
                return nil
        }
        self.encoding = encoding
        self.chunkSize = chunkSize
        self.fileHandle = fileHandle
        self.delimData = delimData
        self.buffer = Data(capacity: chunkSize)
        self.atEof = false
    }

    deinit {
        self.close()
    }

    /// Return next line, or nil on EOF.
    func nextLine() -> String? {
        precondition(fileHandle != nil, "Attempt to read from closed file")

        // Read data chunks from file until a line delimiter is found:
        while !atEof {
            if let range = buffer.range(of: delimData) {
                // Convert complete line (excluding the delimiter) to a string:
                let line = String(data: buffer.subdata(in: 0..<range.lowerBound), encoding: encoding)
                // Remove line (and the delimiter) from the buffer:
                buffer.removeSubrange(0..<range.upperBound)
                return line
            }
            let tmpData = fileHandle.readData(ofLength: chunkSize)
            if !tmpData.isEmpty {
                buffer.append(tmpData)
            } else {
                // EOF or read error.
                atEof = true
                if !buffer.isEmpty {
                    // Buffer contains last line in file (not terminated by delimiter).
                    let line = String(data: buffer as Data, encoding: encoding)
                    buffer.count = 0
                    return line
                }
            }
        }
        return nil
    }

    /// Start reading from the beginning of the file.
    func rewind() {
        fileHandle.seek(toFileOffset: 0)
        buffer.count = 0
        atEof = false
    }

    /// Close the underlying file. No reading must be done after calling this method.
    func close() {
        fileHandle?.closeFile()
        fileHandle = nil
    }
}

extension StreamReader : Sequence {
    func makeIterator() -> AnyIterator<String> {
        return AnyIterator {
            return self.nextLine()
        }
    }
}
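For reference, since the class conforms to Sequence, the reader can also be driven with a for-in loop instead of calling nextLine() by hand. A minimal usage sketch, assuming the StreamReader above; the file path is a placeholder:

import Foundation

// Usage sketch for the StreamReader above.
// "/path/to/records.txt" is a placeholder path.
if let reader = StreamReader(path: "/path/to/records.txt") {
    defer { reader.close() }
    for line in reader {
        print(line)  // one line at a time, delimiter already stripped
    }
}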
Got it! 1,000,003 records in under 35 MB (originally: 4 GB). Although the code I was using might have been usable once I solved the encoding issue, I found code I'm more comfortable with. (He provided a gist.)
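The answer doesn't show the final loop, but a frequently cited cause of this kind of memory growth is that FileHandle.readData(ofLength:) returns autoreleased data that is only freed when the enclosing autorelease pool drains, which in a tight loop can be never. A minimal sketch, assuming the sample() driver from the question, that drains a pool on every iteration:

import Foundation

func sample(fileURL: URL) {
    if let aStreamReader = StreamReader(path: fileURL.path) {
        defer { aStreamReader.close() }
        var reading = true
        while reading {
            // Drain the autoreleased Data returned by readData(ofLength:)
            // after every line instead of at the end of the whole scan.
            autoreleasepool {
                if let line = aStreamReader.nextLine() {
                    _ = line  // insert industrious code here...
                } else {
                    reading = false
                }
            }
        }
    }
}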
See the question Read a file/URL line-by-line in Swift. The answer by Andy C is a modernization of the answer by @algal. Click through the link to see other excellent, thoughtful ideas by Matt and Martin R (both of whom have higher-rated solutions).
Finally, for the lazy ones:
import Foundation

/// Reads a text file line by line.
class LineReader {
    let path: String

    fileprivate let file: UnsafeMutablePointer<FILE>!

    init?(path: String) {
        self.path = path
        file = fopen(path, "r")
        guard file != nil else { return nil }
    }

    var nextLine: String? {
        var line: UnsafeMutablePointer<CChar>? = nil
        var linecap: Int = 0
        defer { free(line) }
        return getline(&line, &linecap, file) > 0 ? String(cString: line!) : nil
    }

    deinit {
        fclose(file)
    }
}

extension LineReader: Sequence {
    func makeIterator() -> AnyIterator<String> {
        return AnyIterator<String> {
            return self.nextLine
        }
    }
}
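One caveat worth noting: getline keeps the trailing newline in each returned string. A minimal usage sketch (the path is a placeholder) that trims it:

import Foundation

// Usage sketch for the LineReader above.
// "/path/to/records.txt" is a placeholder path.
if let reader = LineReader(path: "/path/to/records.txt") {
    for line in reader {
        // getline keeps the "\n", so trim it before use.
        print(line.trimmingCharacters(in: .whitespacesAndNewlines))
    }
}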