java - Encoder for Row Type Spark Datasets


I would like to write an encoder for the Row type in a Dataset, for a map operation that I am doing. Essentially, I do not understand how to write encoders.

Below is an example of a map operation:

In the example below, instead of returning Dataset<String>, I would like to return Dataset<Row>.

    Dataset<String> output = dataset1.flatMap(new FlatMapFunction<Row, String>() {
        @Override
        public Iterator<String> call(Row row) throws Exception {
            ArrayList<String> obj = new ArrayList<>(); // filled by some map operation
            return obj.iterator();
        }
    }, Encoders.STRING());

I understand that, instead of the String encoder, the encoder needs to be written as follows:

    Encoder<Row> encoder = new Encoder<Row>() {
        @Override
        public StructType schema() {
            return join.schema();
            //return null;
        }

        @Override
        public ClassTag<Row> clsTag() {
            return null;
        }
    };

However, I do not understand what clsTag() in the encoder is for, and I am trying to find a running example that demonstrates something similar (i.e. an encoder for the Row type).
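In Java, the `ClassTag` that `clsTag()` is expected to return can be obtained through the Scala companion object `ClassTag$.MODULE$`. The sketch below assumes Spark and the Scala library are on the classpath; the class and method names (`RowClassTagDemo`, `rowClassTag`) are hypothetical. Note that, as far as I know, Spark 2.x only accepts `ExpressionEncoder` instances in Dataset operations (other encoders fail at runtime with "Only expression encoders are supported today"), so filling in `clsTag()` on a hand-written `Encoder<Row>` is unlikely to be enough on its own.

```java
import org.apache.spark.sql.Row;
import scala.reflect.ClassTag;
import scala.reflect.ClassTag$;

public class RowClassTagDemo {
    // Hypothetical helper: builds the ClassTag a hand-written Encoder<Row>.clsTag() would return.
    static ClassTag<Row> rowClassTag() {
        // Java infers the type parameter T = Row from the return type.
        return ClassTag$.MODULE$.apply(Row.class);
    }

    public static void main(String[] args) {
        // Shows which runtime class the tag describes.
        System.out.println(rowClassTag().runtimeClass().getName()); // prints org.apache.spark.sql.Row
    }
}
```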

Edit: This is not a copy of the question mentioned (Encoder error while trying to map dataframe row to updated row). That answer talks about using Spark 1.x code in Spark 2.x (I am not doing so), and I am looking for an encoder for the Row class rather than a way to resolve an error. Finally, I am looking for a solution in Java, not in Scala.

The answer is to use RowEncoder, together with the schema of the dataset expressed as a StructType.
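When the output rows keep all input columns and only add new ones, the encoder's schema can be derived from the dataset's own `schema()` instead of being built from scratch. This is a minimal sketch using `map` rather than `flatMap`, assuming Spark 2.x to 3.4 (where `RowEncoder.apply(StructType)` is available) and assuming the first input column is a non-null long; the names `AppendColumnDemo`, `appendDoubled`, `withDoubled` and the column `doubled` are hypothetical.

```java
import org.apache.spark.api.java.function.MapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class AppendColumnDemo {
    // Hypothetical helper: copies every input value and appends twice the first (long) column.
    static Row appendDoubled(Row row) {
        Object[] values = new Object[row.length() + 1];
        for (int i = 0; i < row.length(); i++) {
            values[i] = row.get(i);
        }
        values[row.length()] = row.getLong(0) * 2; // autoboxed to java.lang.Long for LongType
        return RowFactory.create(values);
    }

    static Dataset<Row> withDoubled(Dataset<Row> input) {
        // Reuse the input schema and add one non-nullable long column (StructType.add returns a new StructType).
        StructType outSchema = input.schema().add("doubled", DataTypes.LongType, false);
        ExpressionEncoder<Row> encoder = RowEncoder.apply(outSchema);
        // The cast selects the Java MapFunction overload of Dataset.map.
        return input.map((MapFunction<Row, Row>) AppendColumnDemo::appendDoubled, encoder);
    }
}
```

Every Row returned by the function must match `outSchema` in column count, order and types; Spark does not infer this from the rows themselves.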

Below is a working example of a flatMap operation on Datasets:

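    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.spark.api.java.function.FlatMapFunction;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;
    import org.apache.spark.sql.RowFactory;
    import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;
    import org.apache.spark.sql.catalyst.encoders.RowEncoder;
    import org.apache.spark.sql.types.DataTypes;
    import org.apache.spark.sql.types.StructType;

    // Output schema: two non-nullable long columns
    StructType structType = new StructType();
    structType = structType.add("id1", DataTypes.LongType, false);
    structType = structType.add("id2", DataTypes.LongType, false);

    // RowEncoder builds an ExpressionEncoder<Row> from the schema
    ExpressionEncoder<Row> encoder = RowEncoder.apply(structType);

    Dataset<Row> output = join.flatMap(new FlatMapFunction<Row, Row>() {
        @Override
        public Iterator<Row> call(Row row) throws Exception {
            // static map operation to demonstrate
            List<Object> data = new ArrayList<>();
            data.add(1L);
            data.add(2L);
            ArrayList<Row> list = new ArrayList<>();
            list.add(RowFactory.create(data.toArray()));
            return list.iterator();
        }
    }, encoder);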
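For anyone who wants to run the same pattern end to end, here is a self-contained local sketch. It assumes a `spark-sql` dependency from Spark 2.x to 3.4 (where `RowEncoder.apply` exists; newer versions offer `Encoders.row(schema)` instead, as far as I know) and uses `spark.range(...)` as a stand-in for the `join` dataset; the class name `RowEncoderFlatMapDemo` and the helpers `expand` and `run` are hypothetical. Each input id is expanded into two output rows with a different schema than the input.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder;
import org.apache.spark.sql.catalyst.encoders.RowEncoder;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class RowEncoderFlatMapDemo {
    // Output schema: every emitted Row must match these types and nullability.
    static final StructType SCHEMA = new StructType()
            .add("id1", DataTypes.LongType, false)
            .add("id2", DataTypes.LongType, false);

    // Hypothetical flatMap body: one input row (id) becomes (id, id*10) and (id, id*100).
    static Iterator<Row> expand(Row row) {
        long id = row.getLong(0);
        List<Row> out = new ArrayList<>();
        out.add(RowFactory.create(id, id * 10));
        out.add(RowFactory.create(id, id * 100));
        return out.iterator();
    }

    static List<Row> run(SparkSession spark) {
        // Stand-in input: a single non-null long column named "id" with values 1, 2, 3.
        Dataset<Row> input = spark.range(1, 4).toDF("id");
        ExpressionEncoder<Row> encoder = RowEncoder.apply(SCHEMA);
        Dataset<Row> output = input.flatMap(
                (FlatMapFunction<Row, Row>) RowEncoderFlatMapDemo::expand, encoder);
        return output.collectAsList();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[1]")
                .appName("row-encoder-demo")
                .getOrCreate();
        try {
            for (Row r : run(spark)) {
                System.out.println(r);
            }
        } finally {
            spark.stop();
        }
    }
}
```

The cast to `FlatMapFunction<Row, Row>` selects the Java overload of `flatMap`; without it, Spark builds on Scala 2.12 can report a lambda or method reference as ambiguous.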
