
How to fix InvalidAccessKeyId when reading from AWS S3 in PySpark?

A developer hit an InvalidAccessKeyId (403 Forbidden) error while reading from AWS S3 in PySpark. The setup, the full stack trace, and some hedged troubleshooting suggestions are collected below.

I am trying to read from and write to AWS S3 using PySpark, running on Ubuntu 20.04 with Spark 3.1.2, Hadoop 3.2, and Java 11.

I launch PySpark with these extra Java packages:

pyspark --packages=com.amazonaws:aws-java-sdk:1.11.563,org.apache.hadoop:hadoop-aws:3.2.2
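
Editor's note, a hedged aside not in the original question: hadoop-aws is compiled against a specific aws-java-sdk-bundle release, and pulling in the unbundled com.amazonaws:aws-java-sdk artifact instead is a known source of confusing S3A errors. If the classpath is suspect, check the hadoop-aws 3.2.2 POM for its exact bundle version and launch along these lines (the 1.11.563 number simply reuses the version already listed above):

pyspark --packages=org.apache.hadoop:hadoop-aws:3.2.2,com.amazonaws:aws-java-sdk-bundle:1.11.563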

Then I configure s3a as follows:

# get the Hadoop configuration through the underlying Java SparkContext
hadoopConf = sc._jsc.hadoopConfiguration()
hadoopConf.set('fs.s3a.access.key', 'my_access_key')
hadoopConf.set('fs.s3a.secret.key', 'my_secret_key')
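
Editor's note: a minimal alternative sketch, assuming a fresh session. Passing the keys as spark.hadoop.* options when the SparkSession is built puts them into the Hadoop configuration before any S3A filesystem object is instantiated, which avoids ordering problems that can make later hadoopConfiguration() edits appear to be ignored. The key strings below are the same placeholders as above.

from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName('s3a-read-test')
    # spark.hadoop.* options are copied into the Hadoop configuration
    # before any filesystem object is created
    .config('spark.hadoop.fs.s3a.access.key', 'my_access_key')
    .config('spark.hadoop.fs.s3a.secret.key', 'my_secret_key')
    .getOrCreate()
)

df = spark.read.csv('s3a://sample_bucket_name/sample.csv')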

However, when I try to read with df = spark.read.csv('s3a://sample_bucket_name/sample.csv'), I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/spark/python/pyspark/sql/readwriter.py", line 737, in csv
    return self._df(self._jreader.csv(self._spark._sc._jvm.PythonUtils.toSeq(path)))
  File "/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py", line 1304, in __call__
  File "/usr/local/spark/python/pyspark/sql/utils.py", line 111, in deco
    return f(*a, **kw)
  File "/usr/local/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling o38.csv.
: java.nio.file.AccessDeniedException: s3a://********/sample.csv: getFileStatus on s3a://*********/sample.csv: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: *********; S3 Extended Request ID: ****************), S3 Extended Request ID: ****************:403 Forbidden
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:230)
    at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:151)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2275)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2226)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2160)
    at org.apache.hadoop.fs.FileSystem.isDirectory(FileSystem.java:1700)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.isDirectory(S3AFileSystem.java:3044)
    at org.apache.spark.sql.execution.streaming.FileStreamSink$.hasMetadata(FileStreamSink.scala:47)
    at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:377)
    at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:325)
    at org.apache.spark.sql.DataFrameReader.$anonfun$load$3(DataFrameReader.scala:307)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:307)
    at org.apache.spark.sql.DataFrameReader.csv(DataFrameReader.scala:795)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:282)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:238)
    at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: Forbidden (Service: Amazon S3; Status Code: 403; Error Code: 403 Forbidden; Request ID: ****************; S3 Extended Request ID: ****************), S3 Extended Request ID: ****************
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleErrorResponse(AmazonHttpClient.java:1712)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1367)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1113)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:770)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:744)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:726)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:686)
    at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:668)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:532)
    at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:512)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4920)
    at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:4866)
    at com.amazonaws.services.s3.AmazonS3Client.getObjectMetadata(AmazonS3Client.java:1320)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getObjectMetadata$4(S3AFileSystem.java:1307)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:322)
    at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:285)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.getObjectMetadata(S3AFileSystem.java:1304)
    at org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2264)
    ... 22 more

I have confirmed that the IAM credentials I am using are valid and have permission to access the target bucket on S3. Any ideas what could be going wrong?

Solution

No verified fix for this question had been found at the time of writing.

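Editor's note: some hedged suggestions rather than a confirmed fix. A 403 or InvalidAccessKeyId despite valid keys often means a different credential source is winning in the S3A provider chain (for example, stale AWS_ACCESS_KEY_ID/AWS_SECRET_ACCESS_KEY environment variables or an old profile in ~/.aws/credentials), or that a secret key containing characters such as '/' or '+' was mangled when it was copied. A quick sanity check outside Spark, reusing the same placeholder keys, isolates whether the key pair itself is at fault:

import boto3

# If this call also fails with 403, the problem is the key pair or the
# bucket policy, not the Spark/Hadoop configuration.
s3 = boto3.client(
    's3',
    aws_access_key_id='my_access_key',
    aws_secret_access_key='my_secret_key',
)
s3.head_object(Bucket='sample_bucket_name', Key='sample.csv')

If this succeeds while the Spark read still fails, try unsetting the AWS_* environment variables in the shell that launches pyspark and retrying, so the keys set in the Hadoop configuration are the only candidates left in the provider chain.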

