I am using the Spark 1.6.0 MLlib Pipeline API to build a NaiveBayes model:
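import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

// Split the raw text into words
val trainTokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
// Hash the words into term-frequency vectors
val hashingTF = new HashingTF().setNumFeatures(400000).setInputCol("words").setOutputCol("rawFeatures")
// Rescale the term frequencies by inverse document frequency
val idf = new IDF().setInputCol("rawFeatures").setOutputCol("features")
val bayes = new NaiveBayes()
  .setSmoothing(1.0)
  .setModelType("multinomial")
  .setLabelCol("category")
  .setFeaturesCol("features")
val pipeline = new Pipeline()
  .setStages(Array(trainTokenizer, hashingTF, idf, bayes))
val model = pipeline.fit(trainDF)
model.write.overwrite().save("xxx")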
This saves the trained model to HDFS. But when I load it with PipelineModel:
val spanMsgModel = PipelineModel.load("xxxx")
I get the following error:
assertion failed: No predefined schema found, and no Parquet data files or summary files found under hdfs://nameservice1:8020/xxxx/xx/pipelinemodel/stages/3_nb_7a05f3476457/data
However, when I use Spark 2.0 to train, save, and load the model, everything works. Is this a bug in Spark 1.6? How can I solve it?