
Spark cannot read a file from Hadoop on an overlay network (swarm mode) #58

Open
veerapatyok opened this issue Sep 12, 2018 · 2 comments
veerapatyok commented Sep 12, 2018

I have:
namenode (10.0.5.x) on machine-1
spark master (10.0.5.x) on machine-1
network endpoint (10.0.5.3) on machine-2
spark worker (10.0.5.x) on machine-2
datanode (10.0.5.x) on machine-2

My code runs on the Spark master (using PySpark):

text = sc.textFile("hdfs://namenode:9000/path/file")
text.collect()

I created a swarm with Spark (gettyimages) and your Hadoop images, and Spark cannot read data from Hadoop. The worker log says: Failed to connect to /10.0.5.3:50010 for block BP-1439091006-10.0.5.76-1536712157279:blk_1073741825_1001, add to deadNodes and continue.

10.0.5.3 is the network endpoint. Why does the client try to reach the datanode through the endpoint IP?

But I can still access the data from the namenode itself. Why does that work, when it is the endpoint IP?
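One common workaround for this class of Docker/HDFS problem (hedged: I'm assuming the failure is the datanode registering an address the client can't use, which matches the error above) is to make HDFS clients connect to datanodes by hostname rather than by the IP the namenode hands back. With the bde2020 images, hdfs-site.xml properties can be set through hadoop.env using the HDFS_CONF_ prefix, with dots replaced by underscores. A minimal sketch of the two relevant lines:

```
# hadoop.env -- force hostname-based datanode addressing
HDFS_CONF_dfs_client_use_datanode_hostname=true
HDFS_CONF_dfs_datanode_use_datanode_hostname=true
```

With these set, the client resolves the datanode's hostname itself via the overlay network's DNS instead of dialing the IP the datanode registered with the namenode.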

docker-compose-hadoop.yml

version: '3'
services:
  namenode:
    image: bde2020/hadoop-namenode:2.0.0-hadoop2.7.4-java8
    ports:
      - 50070:50070
    volumes:
      - ./namenode:/hadoop/dfs/name
      - ./hadoop-data:/hadoop-data
    environment:
      - CLUSTER_NAME=test
    env_file:
      - ./hadoop.env
    deploy:
      mode: replicated
      replicas: 1
      placement:
        constraints: [node.role == manager]
      restart_policy:
        condition: on-failure

  datanode:
    image: bde2020/hadoop-datanode:2.0.0-hadoop2.7.4-java8
    volumes:
      - ./datanode:/hadoop/dfs/data
    env_file:
      - ./hadoop.env
    environment:
      SERVICE_PRECONDITION: "namenode:50070"
    deploy:
      mode: global
      placement:
        constraints: [node.role == worker]
      restart_policy:
        condition: on-failure

networks:
    default:
        external:
            name: hadoop-spark-swarm-network
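Another possible mitigation (a sketch, not tested against this exact setup): swarm services default to endpoint_mode: vip, so service DNS returns a virtual IP rather than the task's own address, which may be where 10.0.5.3 comes from. Compose file format 3.3+ lets you switch the datanode service to DNS round-robin so lookups return task IPs directly:

```
  datanode:
    deploy:
      endpoint_mode: dnsrr   # requires compose file format version 3.3 or later
```

Note that dnsrr services cannot publish ports in ingress mode, so this only suits services reached internally on the overlay network.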

docker-compose-spark.yml

version: '3'
services:
  master:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.master.Master
    hostname: master
    environment:
      MASTER: spark://master:7077
      SPARK_CONF_DIR: /conf
      SPARK_PUBLIC_DNS: localhost
      SPARK_MASTER_HOST: 0.0.0.0
    env_file:
      - ./hadoop.env
    ports:
      - 4040:4040
      - 6066:6066
      - 7077:7077
      - 8001:8080
      - 8888:8888
    volumes:
      - ./data:/tmp/data
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == manager]

  worker:
    image: gettyimages/spark
    command: bin/spark-class org.apache.spark.deploy.worker.Worker spark://master:7077
    hostname: worker
    environment:
      SPARK_CONF_DIR: /conf
      SPARK_WORKER_CORES: 4
      SPARK_WORKER_MEMORY: 6g
      SPARK_WORKER_PORT: 8881
      SPARK_WORKER_WEBUI_PORT: 8081
      SPARK_PUBLIC_DNS: localhost
    env_file:
      - ./hadoop.env
    depends_on:
      - master
    links:
      - master
    ports:
      - 8081:8081
    volumes:
      - ./data:/tmp/data
    deploy:
      replicas: 1
      restart_policy:
        condition: on-failure
      placement:
        constraints: [node.role == worker]

networks:
    default:
        external:
            name: hadoop-spark-swarm-network

I created my own network with: docker network create -d overlay hadoop-spark-swarm-network
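To see which addresses the service names actually resolve to on the overlay network, a quick check from inside the Spark worker container can help (a diagnostic sketch; "namenode" and "datanode" are the service names from docker-compose-hadoop.yml and will only resolve inside the swarm):

```python
import socket

def resolve(host):
    """Return the sorted set of IPv4 addresses a hostname resolves to,
    or an empty list if the name does not resolve."""
    try:
        return sorted({info[4][0] for info in
                       socket.getaddrinfo(host, None, socket.AF_INET)})
    except socket.gaierror:
        return []

if __name__ == "__main__":
    # Compare these against the 10.0.5.3 address in the worker's error:
    # if a service name resolves to the VIP rather than the task IP,
    # that explains the "Failed to connect to /10.0.5.3:50010" message.
    for host in ("namenode", "datanode"):
        print(host, "->", resolve(host))
```

Running this inside the worker and comparing the output with `docker network inspect hadoop-spark-swarm-network` on a manager node shows whether DNS is handing back the service VIP or a task IP.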

@SuperElectron

active?

@hans153

hans153 commented May 20, 2020

Same problem here. I've decided to deploy the HDFS cluster without Docker.
