
Please suggest: Where do we need to create the --proxy-user? Is it required to be created only in the Ranger UI, or under Hadoop as well? #26

Open
RamSinha opened this issue Jul 1, 2019 · 10 comments

Comments

RamSinha (Author) commented Jul 2, 2019

So I just created a user on the Hadoop side, and a user with the same name was created in Ranger. But the policies enforced in Ranger aren't being applied.
I am kind of confused about how the proxy user is recognized by Ranger. What's the logical mapping between the two users?

yaooqinn (Owner) commented Jul 2, 2019

You can use either a proxy user or the login user. If you specify --proxy-user UserA, the runtime sparkUser will be UserA; otherwise it will use the user part of the spark.yarn.principal configuration. If you are using another authentication method, just pay attention to the value of SparkContext.sparkUser.
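
For example, the identity handed to the authorizer can be confirmed straight from the shell (a minimal check; UserA is a hypothetical name, and the output shown assumes the shell was launched with --proxy-user UserA):

scala> spark.sparkContext.sparkUser  // the effective runtime user Spark reports
res0: String = UserA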

RamSinha (Author) commented Jul 2, 2019

Thanks for the reply.
In our case we are running Spark on AWS EMR; the spark.yarn.principal configuration is not set anywhere.

scala> spark.conf.get("spark.yarn.principal")
java.util.NoSuchElementException: spark.yarn.principal
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1992)
  at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
  ... 54 elided
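
As an aside, the same probe can be written without triggering the exception, using RuntimeConfig.getOption (a sketch assuming nothing beyond the standard shell session):

scala> spark.conf.getOption("spark.yarn.principal")  // None when the key is unset
res0: Option[String] = None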

But this setup is not able to enforce Ranger policies for this user.
We have enabled all the XML settings as mentioned in the blog,
and are using the command below to start the shell.

spark-shell \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://127.0.0.1:10000/default" \
  --driver-java-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dlog4j.configuration=file:/home/hadoop/optimus-log4j.properties" \
  --jars /home/hadoop/spark-authorizer-2.2.0.jar \
  --proxy-user ram

Any pointer would be really helpful.

yaooqinn (Owner) commented Jul 2, 2019

Please follow this doc to set up: https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html

RamSinha (Author) commented Jul 2, 2019

Thanks for the pointer. BTW, we are already following the installation guidelines above.
Just one update: we are on the versions below.
Spark 2.4, Hive 2.3.4, Ranger 0.7.1

Would that cause any problems?

yaooqinn (Owner) commented Jul 3, 2019

For Spark 2.4, see #14.
For Hive 2.3.4, ensure that Spark's built-in Hive metastore client has no incompatibility issues with it (a probe for this is sketched below).
For Ranger 0.7.1, you may need to fix incompatibility issues in the ranger-hive-plugin module.
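
A quick way to check which metastore client version Spark is configured to use, from the shell (a sketch; the second argument is only a fallback label for the unset case, in which Spark 2.x falls back to its builtin 1.2.1 client):

scala> spark.conf.get("spark.sql.hive.metastore.version", "unset (builtin client)")
res0: String = unset (builtin client)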

RamSinha (Author) commented Jul 3, 2019

Thanks for the pointers; I am looking at the ranger-hive-plugin module now.
One question: in the setup document mentioned above there is no mention of installing the ranger-hive-plugin. Does that mean we don't need to install the ranger-hive-plugin module separately?

yaooqinn (Owner) commented Jul 3, 2019

Just follow section 2 of the doc, "Applying Plugin to Apache Spark".

RamSinha (Author) commented Jul 3, 2019

Still no luck.
I built a new EMR cluster for Spark 2.3 now and followed all the instructions. Though I don't see any error, the policies still aren't being applied. Also, on the Ranger UI under the Audit section I don't see anything.

Also: when I tried installing the ranger-hive-plugin on a different EMR cluster using the link below (Ranger+Installation+Guide), policies are being enforced on Hive queries (started from the Hive CLI).

RamSinha (Author) commented Jul 5, 2019

When I try to build locally I get the warning below.

Authorizable.scala:57: fruitless type test: a value of type org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener cannot also be a org.apache.spark.sql.hive.HiveExternalCatalog
[WARNING] case _: HiveExternalCatalog =>

Also, when I try to run the same code from the spark-shell, I get the error below.

error: not found: type HiveExternalCatalog
       method.invoke(externalCatalog).asInstanceOf[HiveExternalCatalog]

It seems that from the spark-shell it's not able to access the package-private class HiveExternalCatalog.
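
For reference, HiveExternalCatalog is declared private[spark], so shell code can only reach the instance reflectively rather than by naming the type. A minimal sketch assuming Spark 2.4, where SharedState.externalCatalog is an ExternalCatalogWithListener wrapping the Hive catalog and unwrapped exposes the delegate (the res values show the expected class names, not captured output):

scala> spark.sharedState.externalCatalog.getClass.getName  // the wrapper, per the warning above
res0: String = org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener

scala> spark.sharedState.externalCatalog.unwrapped.getClass.getName  // the underlying catalog
res1: String = org.apache.spark.sql.hive.HiveExternalCatalog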
