
Please suggest: Where do we need to create the --proxy-user? Is it required to be created only in the Ranger UI, or under Hadoop as well? #26

Open
RamSinha opened this issue Jul 1, 2019 · 10 comments

Comments

RamSinha (Author) commented Jul 2, 2019

So I just created a user on the Hadoop side, and a user with the same name was created in Ranger. But the policies enforced in Ranger aren't being applied.
I am kind of confused about how the proxy user is recognized by Ranger. What's the logical mapping between the two users?

yaooqinn (Owner) commented Jul 2, 2019

You can use either a proxy user or the login user. If you specify --proxy-user UserA, the runtime sparkUser will be UserA; otherwise it will use the user part of the spark.yarn.principal configuration. If you are using another authentication method, just pay attention to the value of SparkContext.sparkUser.
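
For example, the identity handed to the authorizer can be confirmed straight from the shell (a minimal check; UserA is a hypothetical name, and the output shown assumes the shell was launched with --proxy-user UserA):

scala> spark.sparkContext.sparkUser  // the effective runtime user Spark reports
res0: String = UserA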

RamSinha (Author) commented Jul 2, 2019

Thanks for the reply.
In our case we are running Spark on AWS EMR; the spark.yarn.principal configuration is not set anywhere.

scala> spark.conf.get("spark.yarn.principal")
java.util.NoSuchElementException: spark.yarn.principal
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at org.apache.spark.sql.internal.SQLConf$$anonfun$getConfString$2.apply(SQLConf.scala:1992)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.sql.internal.SQLConf.getConfString(SQLConf.scala:1992)
  at org.apache.spark.sql.RuntimeConfig.get(RuntimeConfig.scala:74)
  ... 54 elided
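
As an aside, the same probe can be written without triggering the exception, using RuntimeConfig.getOption (a sketch assuming nothing beyond the standard shell session):

scala> spark.conf.getOption("spark.yarn.principal")  // None when the key is unset
res0: Option[String] = None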

But this setup is not able to enforce Ranger policies for this user.
We have enabled all the XML settings as mentioned in the blog,
and are using the command below to start the shell.

spark-shell \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://127.0.0.1:10000/default" \
  --driver-java-options="-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005 -Dlog4j.configuration=file:/home/hadoop/optimus-log4j.properties" \
  --jars /home/hadoop/spark-authorizer-2.2.0.jar \
  --proxy-user ram

Any pointer would be really helpful.

yaooqinn (Owner) commented Jul 2, 2019

Please follow this doc to set up: https://yaooqinn.github.io/spark-authorizer/docs/install_plugin.html

RamSinha (Author) commented Jul 2, 2019

Thanks for the pointer. BTW, we are already following the installation guidelines above.
Just one update: we are on the versions below.
Spark 2.4, Hive 2.3.4, Ranger 0.7.1

Would that cause any problems?

yaooqinn (Owner) commented Jul 3, 2019

For Spark 2.4, see #14.
For Hive 2.3.4, ensure that Spark's built-in Hive metastore client has no incompatibility issues with it (a probe for this is sketched below).
For Ranger 0.7.1, you may need to fix incompatibility issues in the ranger-hive-plugin module.
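
A quick way to check which metastore client version Spark is configured to use, from the shell (a sketch; the second argument is only a fallback label for the unset case, in which Spark 2.x falls back to its builtin 1.2.1 client):

scala> spark.conf.get("spark.sql.hive.metastore.version", "unset (builtin client)")
res0: String = unset (builtin client)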

RamSinha (Author) commented Jul 3, 2019

Thanks for the pointers; I am looking at the ranger-hive-plugin module now.
One question: in the setup document mentioned above there is no mention of installing the ranger-hive-plugin. Does that mean we don't need to install the ranger-hive-plugin module separately?

yaooqinn (Owner) commented Jul 3, 2019

Just follow section 2 of the doc, "Applying Plugin to Apache Spark".

RamSinha (Author) commented Jul 3, 2019

Still no luck.
I built a new EMR cluster for Spark 2.3 now and followed all the instructions. Though I don't see any error, the policies still aren't being applied. Also, on the Ranger UI under the Audit section I don't see anything.

Also: when I tried installing the ranger-hive-plugin on a different EMR cluster using the link below (Ranger+Installation+Guide), policies are being enforced on Hive queries (started from the Hive CLI).

RamSinha (Author) commented Jul 5, 2019

When I try to build locally I get the warning below.

Authorizable.scala:57: fruitless type test: a value of type org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener cannot also be a org.apache.spark.sql.hive.HiveExternalCatalog
[WARNING] case _: HiveExternalCatalog =>

Also, when I try to run the same code from the spark-shell, I get the error below.

error: not found: type HiveExternalCatalog
       method.invoke(externalCatalog).asInstanceOf[HiveExternalCatalog]

It seems that from the spark-shell it's not able to access the package-private class HiveExternalCatalog.
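
For reference, HiveExternalCatalog is declared private[spark], so shell code can only reach the instance reflectively rather than by naming the type. A minimal sketch assuming Spark 2.4, where SharedState.externalCatalog is an ExternalCatalogWithListener wrapping the Hive catalog and unwrapped exposes the delegate (the res values show the expected class names, not captured output):

scala> spark.sharedState.externalCatalog.getClass.getName  // the wrapper, per the warning above
res0: String = org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener

scala> spark.sharedState.externalCatalog.unwrapped.getClass.getName  // the underlying catalog
res1: String = org.apache.spark.sql.hive.HiveExternalCatalog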
