Spark 1.6 MLlib pipeline error

I use the Spark 1.6.0 MLlib pipeline API to build a NaiveBayes model:

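import org.apache.spark.ml.{Pipeline, PipelineModel}
import org.apache.spark.ml.classification.NaiveBayes
import org.apache.spark.ml.feature.{HashingTF, IDF, Tokenizer}

// Split the text into words, hash them into term frequencies, then weight by IDF.
val trainTokenizer = new Tokenizer().setInputCol("text").setOutputCol("words")
val hashingTF = new HashingTF().setNumFeatures(400000).setInputCol("words").setOutputCol("rawfeatures")
val idf = new IDF().setInputCol("rawfeatures").setOutputCol("features")
val bayes = new NaiveBayes().setSmoothing(1.0).setModelType("multinomial")
  .setLabelCol("category")
  .setFeaturesCol("features")
val pipeline = new Pipeline()
  .setStages(Array(trainTokenizer, hashingTF, idf, bayes))
// Fit on the training DataFrame and persist the fitted pipeline.
val model = pipeline.fit(trainDF)
model.write.overwrite().save("xxx")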

This saves the trained model to HDFS. When I then load it with PipelineModel:

val spanMsgModel = PipelineModel.load("xxxx")

it fails with the following error:

 assertion failed: No predefined schema found, and no Parquet data files or summary files found under hdfs://nameservice1:8020/xxxx/xx/pipelinemodel/stages/3_nb_7a05f3476457/data

However, when I train, save, and load the model with Spark 2.0, everything works. Is this a bug in Spark 1.6? How can I solve it?
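
One way to narrow this down might be to check whether the NaiveBayes stage's data directory named in the error actually contains any Parquet files after saving. The sketch below is only an illustration: `ParquetDirCheck` and `hasParquetFiles` are hypothetical names, not part of Spark, and the check simply mirrors what the assertion message describes (no data files and no summary files). The commented lines show how the file names could be collected in spark-shell, assuming `sc` is available.

```scala
// Hypothetical helper, not part of Spark: decides whether a directory listing
// contains anything a Parquet reader could take a schema from.
object ParquetDirCheck {
  // Summary files that Parquet writers may place next to the data files.
  private val summaryFiles = Set("_metadata", "_common_metadata")

  // True if at least one name is a Parquet data file or a summary file.
  def hasParquetFiles(fileNames: Seq[String]): Boolean =
    fileNames.exists(n => n.endsWith(".parquet") || summaryFiles.contains(n))
}

// In spark-shell the names could be collected like this (assumes `sc` exists):
//   import org.apache.hadoop.fs.Path
//   val dir = new Path("hdfs://nameservice1:8020/xxxx/xx/pipelinemodel/stages/3_nb_7a05f3476457/data")
//   val names = dir.getFileSystem(sc.hadoopConfiguration)
//     .listStatus(dir).map(_.getPath.getName).toSeq
//   ParquetDirCheck.hasParquetFiles(names)
```

If the check returns false for that directory, the stage data was probably never written to the location being read, which would point at the save step rather than the load step.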
