One of the JSON fields (age below) is meant to be a number but is always null, and it comes out as a string in the DataFrame's printSchema
Input JSON file:

    {"age":null,"name":"abc","batch":190}
    {"age":null,"name":"abc","batch":190}

Spark code and output:

    val df = spark.read.json("/home/white/tmp/a.json")
    df.printSchema()
    df.show()

    ********************* output *********************

    root
     |-- batch: long (nullable = true)
     |-- age: string (nullable = true)
     |-- name: string (nullable = true)

    +-----+----+----+
    |batch| age|name|
    +-----+----+----+
    |  190|null| abc|
    |  190|null| abc|
    +-----+----+----+

I want age to be long. At the moment I achieve this by creating a new StructType with the age field as long and recreating the DataFrame with df.sqlContext.createDataFrame(df.rdd, newSchema). Can this be done with the spark.read.json API directly?
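For reference, a minimal sketch of the StructType workaround described above, assuming a SparkSession named spark and the same input file (field order follows the inferred schema printed above):

    import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

    // Same fields as the inferred schema, but with age declared as LongType
    val newSchema = StructType(Seq(
      StructField("batch", LongType, nullable = true),
      StructField("age", LongType, nullable = true),
      StructField("name", StringType, nullable = true)
    ))

    // Rebuild the DataFrame over the same rows with the corrected schema.
    // This only works here because every age value is null; a non-null
    // string value would likely fail at run time when the rows are evaluated.
    val fixedDf = df.sqlContext.createDataFrame(df.rdd, newSchema)
    fixedDf.printSchema()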
I think the easiest way is as follows:
    spark.read.json("/home/white/tmp/a.json").withColumn("age", 'age.cast(LongType))

This produces the following schema:
    root
     |-- age: long (nullable = true)
     |-- batch: long (nullable = true)
     |-- name: string (nullable = true)

Spark makes its best guess at the types, and it makes sense that it sees null in the JSON and thinks "string", since String lies on the nullable AnyRef side of the Scala object hierarchy while Long lies on the non-nullable AnyVal side. You need to cast the column to make Spark treat the data as you see fit.
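As a self-contained sketch of the same cast, here it is with the imports it needs (the 'age symbol syntax comes from spark.implicits._; col("age") works just as well), assuming a local SparkSession for illustration:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.col
    import org.apache.spark.sql.types.LongType

    val spark = SparkSession.builder()
      .appName("age-cast")
      .master("local[*]")   // local run assumed for illustration
      .getOrCreate()
    import spark.implicits._

    // Read the JSON, then cast the inferred string column to long
    val df = spark.read
      .json("/home/white/tmp/a.json")
      .withColumn("age", 'age.cast(LongType))   // or col("age").cast("long")

    df.printSchema()
    df.show()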
Incidentally, why are you using long rather than int for ages? People must eat very healthy.