I am attempting to run a Spark job that accesses DynamoDB. The old way of instantiating a DynamoDB client has been deprecated, and it is now recommended to use the client builder.
This works fine locally, but when I deploy to EMR I get this error:
Exception in thread "main" java.lang.IllegalAccessError: tried to access class com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientConfigurationFactory from class com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClientBuilder
The code that causes it is:
val dynamoDBClient = AmazonDynamoDBAsyncClientBuilder
  .standard()
  .withRegion(Regions.US_EAST_1)
  .build()
My build.sbt contains:
libraryDependencies += "com.amazonaws" % "aws-java-sdk" % "1.11.114"
and my spark-submit command looks like this:
spark-submit --conf spark.eventLog.enabled=false --packages com.typesafe.play:play-json_2.11:2.5.9,com.github.traviscrawford:spark-dynamodb:0.0.6,com.amazonaws:aws-java-sdk:1.11.114 --master yarn --deploy-mode cluster --class main application.jar
Does anyone have any ideas? Am I overlooking something basic?
Update
I noticed that EMR was running OpenJDK 1.8 while my local system was running Oracle Java 1.8. I changed the EMR cluster to match the Java I am running locally, but there is still no change.
I don't have a perfect answer here, as I'm struggling with a similar problem with a fat JAR build of a Spark driver running on EMR. Here is what I have tried recently.
- Try running spark-submit with the option -v, which logs the class paths and so forth. You can see that EMR is loading aws-java-sdk as well. It is not clear to me which version of aws-java-sdk EMR is running. The notes for EMR release 4.7.0 state that the AWS SDK for Java was upgraded to 1.10.75 (http://docs.aws.amazon.com/emr/latest/releaseguide/emr-whatsnew.html).
- Then add the argument --conf spark.driver.userClassPathFirst=true to load the aws-java-sdk version that the driver specifies.
Unfortunately, the last step raises YARN errors like: Unable to load YARN support ...
(There is some discussion of that here: https://community.cloudera.com/t5/advanced-analytics-apache-spark/spark-submit-fails-after-setting-userclasspathfirst-to-true/td-p/46778)
There is also some discussion in the aws-java-sdk GitHub repo: https://github.com/aws/aws-sdk-java/issues/1094
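To narrow down which aws-java-sdk JAR the driver actually picks up (the open question in the first step above), one option is to print where a class was loaded from. This is only a minimal sketch using plain JVM reflection; the object name WhichJar and the helper locationOf are hypothetical names chosen for illustration, and the output depends entirely on the cluster's class path.

```scala
object WhichJar {
  // Returns the location (typically a JAR URL) that a class was loaded from.
  // Yields None if the class cannot be loaded or has no code source
  // (for example, classes loaded by the bootstrap class loader).
  def locationOf(className: String): Option[String] =
    try {
      val cls = Class.forName(className)
      for {
        domain <- Option(cls.getProtectionDomain)
        source <- Option(domain.getCodeSource)
        url    <- Option(source.getLocation)
      } yield url.toString
    } catch {
      case _: ClassNotFoundException | _: LinkageError => None
    }

  def main(args: Array[String]): Unit = {
    // The two classes named in the IllegalAccessError from the question.
    val names = Seq(
      "com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClientBuilder",
      "com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientConfigurationFactory"
    )
    names.foreach { n =>
      println(s"$n -> ${locationOf(n).getOrElse("not found / no code source")}")
    }
  }
}
```

If the two classes resolve to different JARs, that would be consistent with two SDK versions being mixed on the class path, which is the kind of conflict that can produce an IllegalAccessError between classes of the same package.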
Conclusion: use only the APIs available in aws-java-sdk version 1.10.75, so the code works with the SDK that EMR loads.
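As a rough illustration of that conclusion, the client from the question could be created with the older constructor-based API instead of the builder. This is a sketch under the assumption that the 1.10.x SDK is the one on the class path; the object name LegacyClientSketch is hypothetical, and the constructor used here is the one that 1.11.x marks as deprecated.

```scala
import com.amazonaws.regions.{Region, Regions}
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBAsyncClient

object LegacyClientSketch {
  def main(args: Array[String]): Unit = {
    // Pre-builder style: construct the client, then set the region on it.
    // The no-argument constructor uses the default credentials provider chain.
    val dynamoDBClient = new AmazonDynamoDBAsyncClient()
    dynamoDBClient.setRegion(Region.getRegion(Regions.US_EAST_1))

    // ... use dynamoDBClient here ...

    dynamoDBClient.shutdown()
  }
}
```

Compiling against the same SDK version in build.sbt (1.10.75 rather than 1.11.114) would help catch accidental use of newer APIs at build time.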