I need to write an encoder for the Row type in a Dataset, for a map operation that I am doing. Essentially, I do not understand how to write encoders.
Below is an example of a map operation.
In the example below, instead of returning Dataset<String>, I would like to return Dataset<Row>:
    Dataset<String> output = dataset1.flatMap(new FlatMapFunction<Row, String>() {
        @Override
        public Iterator<String> call(Row row) throws Exception {
            ArrayList<String> obj = //some map operation
            return obj.iterator();
        }
    }, Encoders.STRING());
I understand that instead of the String encoder, an encoder for Row needs to be written as follows:
    Encoder<Row> encoder = new Encoder<Row>() {
        @Override
        public StructType schema() {
            return join.schema();
            //return null;
        }

        @Override
        public ClassTag<Row> clsTag() {
            return null;
        }
    };
However, I do not understand clsTag() in the encoder, and I am trying to find a running example that demonstrates something similar (i.e. an encoder for the Row type).
Edit - This is not a copy of the question mentioned: "Encoder error while trying to map dataframe row to updated row". That answer talks about using Spark 1.x in Spark 2.x (I am not doing so), and I am looking for an encoder for the Row class rather than trying to resolve an error. Finally, I am looking for a solution in Java, not in Scala.
The answer is to use a RowEncoder, built from the schema of the dataset expressed as a StructType.
Below is a working example of a flatMap operation with Datasets:
    // Schema of the output rows: two non-nullable long columns.
    StructType structType = new StructType();
    structType = structType.add("id1", DataTypes.LongType, false);
    structType = structType.add("id2", DataTypes.LongType, false);

    ExpressionEncoder<Row> encoder = RowEncoder.apply(structType);

    Dataset<Row> output = join.flatMap(new FlatMapFunction<Row, Row>() {
        @Override
        public Iterator<Row> call(Row row) throws Exception {
            // a static map operation to demonstrate
            List<Object> data = new ArrayList<>();
            data.add(1L);
            data.add(2L);
            ArrayList<Row> list = new ArrayList<>();
            list.add(RowFactory.create(data.toArray()));
            return list.iterator();
        }
    }, encoder);
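To see what flatMap does with the iterator returned by call(), here is a minimal plain-Java sketch with no Spark dependency. Everything in it is an assumption made for illustration: FlatMapSketch, SCHEMA, sampleInput and staticMapping are hypothetical names, and Object[] stands in for Row. The width check only illustrates why the values passed to RowFactory.create should line up with the fields declared in the schema; it is not a description of how Spark validates rows.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Plain-Java stand-in for the flatMap pattern above; no Spark classes are used.
class FlatMapSketch {

    // Hypothetical stand-in for the schema: just the ordered column names.
    static final List<String> SCHEMA = Arrays.asList("id1", "id2");

    // Applies fn to every input row and concatenates the iterators it returns.
    // Rejects any produced row whose number of values differs from the schema.
    static List<Object[]> flatMap(List<Object[]> input,
                                  Function<Object[], Iterator<Object[]>> fn) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] row : input) {
            Iterator<Object[]> it = fn.apply(row);
            while (it.hasNext()) {
                Object[] produced = it.next();
                if (produced.length != SCHEMA.size()) {
                    throw new IllegalArgumentException(
                        "row has " + produced.length + " values, schema has "
                            + SCHEMA.size() + " fields");
                }
                out.add(produced);
            }
        }
        return out;
    }

    // Same static mapping as in the answer: each input row yields one row (1L, 2L).
    static Iterator<Object[]> staticMapping(Object[] row) {
        List<Object> data = new ArrayList<>();
        data.add(1L);
        data.add(2L);
        List<Object[]> list = new ArrayList<>();
        list.add(data.toArray());
        return list.iterator();
    }

    // Three single-column input rows, made up for the demonstration.
    static List<Object[]> sampleInput() {
        List<Object[]> input = new ArrayList<>();
        input.add(new Object[] {"a"});
        input.add(new Object[] {"b"});
        input.add(new Object[] {"c"});
        return input;
    }

    public static void main(String[] args) {
        List<Object[]> output = flatMap(sampleInput(), FlatMapSketch::staticMapping);
        for (Object[] row : output) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

Because the mapping returns an iterator rather than a single value, one input row can produce zero, one, or many output rows, which is why the encoder passed to flatMap describes the output rows and not the input.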