IB+TF+DOCKER #12

Open
fychao opened this issue Feb 23, 2018 · 4 comments

fychao commented Feb 23, 2018

root@7b8cb7323ac8:/# ib_write_bw -a -F 172.16.130.2 -d mlx5_1 --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x11 QPN 0x0105 PSN 0xf34dc6 RKey 0x006b01 VAddr 0x007ff701e41000
 remote address: LID 0x12 QPN 0x0106 PSN 0x9fc3f6 RKey 0x004de5 VAddr 0x007f43ce6e9000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          5000           0.054671            0.054299            3.393695
 4          5000             0.16               0.16               4.848040
 8          5000             0.32               0.31               4.903680
 16         5000             0.64               0.62               4.861019
 32         5000             1.28               1.25               4.899429
 64         5000             2.56               2.52               4.925522
 128        5000             5.12               4.97               4.851018
 256        5000             10.28              9.97               4.867019
 512        5000             23.11              22.80              5.565797
 1024       5000             50.07              46.70              5.700940
 2048       5000             76.47              74.62              4.554210
 4096       5000             86.03              85.82              2.619163
 8192       5000             93.48              93.46              1.426129
 16384      5000             94.40              94.36              0.719899
 32768      5000             95.17              95.17              0.363046
 65536      5000             95.25              95.25              0.181671
 131072     5000             95.91              95.91              0.091469
 262144     5000             95.68              95.67              0.045619
 524288     5000             95.70              95.70              0.022818
 1048576    5000             95.70              95.02              0.011328
 2097152    5000             95.65              95.65              0.005701
 4194304    5000             95.67              95.66              0.002851
 8388608    5000             95.30              95.30              0.001420
---------------------------------------------------------------------------------------
root@7b8cb7323ac8:/#
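
The bandwidth and message-rate columns in the table above are two views of the same measurement: bandwidth is message size times message rate. The sketch below checks that relationship on three rows copied from the table. The function name `bandwidth_gbits` is made up for illustration, and it assumes "Gb" means 10^9 bits and "Mpps" means 10^6 messages per second, which is what the numbers are consistent with.

```python
def bandwidth_gbits(msg_bytes: int, msg_rate_mpps: float) -> float:
    """Bandwidth in Gb/s implied by a message size and a message rate in Mpps."""
    bits_per_second = msg_bytes * 8 * msg_rate_mpps * 1e6
    return bits_per_second / 1e9


# (message size in bytes, reported average Gb/s, reported MsgRate in Mpps),
# copied from the ib_write_bw table above.
rows = [
    (4096, 85.82, 2.619163),
    (65536, 95.25, 0.181671),
    (8388608, 95.30, 0.001420),
]

for size, reported, rate in rows:
    derived = bandwidth_gbits(size, rate)
    print(f"{size:>8} B  reported {reported:6.2f}  derived {derived:6.2f} Gb/s")
```

Small differences in the last digit are expected, because the message rate is printed with only six decimal places.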

fychao commented Feb 27, 2018

InfiniBand test

| Role   | Command |
|--------|---------|
| Server | `ib_write_bw -a -d mlx5_1 &` |
| Client | `ib_write_bw -a -F $server_IP -d mlx5_1 --report_gbits` |
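
If the two commands need to be generated for several host pairs, something like the following could work. It is only a sketch: `ib_write_bw_commands` is a hypothetical helper, it uses just the flags that appear in this thread, and it assumes Python 3.8+ for `shlex.join` (the container shown later in the thread runs Python 2.7).

```python
import shlex


def ib_write_bw_commands(server_ip, device="mlx5_1"):
    """Return (server_argv, client_argv) using only the flags from this thread."""
    # The server side just listens on the given device.
    server = ["ib_write_bw", "-a", "-d", device]
    # The client connects to the server and reports in Gb/s.
    client = ["ib_write_bw", "-a", "-F", server_ip, "-d", device, "--report_gbits"]
    return server, client


server, client = ib_write_bw_commands("172.16.130.2")
print("server:", shlex.join(server))
print("client:", shlex.join(client))
```

The server still has to be started first, in the background or in a separate shell, before the client command is run.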

fychao commented Feb 27, 2018

root@83d01dcb874c:/#
************************************
* Waiting for client to connect... *
************************************
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x11 QPN 0x0108 PSN 0xa7f05e RKey 0x0056ec VAddr 0x007fe3411c0000
 remote address: LID 0x12 QPN 0x0109 PSN 0xa7f05e RKey 0x008b5e VAddr 0x007f37d094e000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[MB/sec]    BW average[MB/sec]   MsgRate[Mpps]
 8388608    5000             95.31              95.31              0.001420
---------------------------------------------------------------------------------------

[1]+  Done                    ib_write_bw -a -d mlx5_1

root@e280959c3867:/# ib_write_bw -a -F 172.16.129.2 -d mlx5_1 --report_gbits
---------------------------------------------------------------------------------------
                    RDMA_Write BW Test
 Dual-port       : OFF          Device         : mlx5_1
 Number of qps   : 1            Transport type : IB
 Connection type : RC           Using SRQ      : OFF
 TX depth        : 128
 CQ Moderation   : 100
 Mtu             : 4096[B]
 Link type       : IB
 Max inline data : 0[B]
 rdma_cm QPs     : OFF
 Data ex. method : Ethernet
---------------------------------------------------------------------------------------
 local address: LID 0x12 QPN 0x0109 PSN 0xa7f05e RKey 0x008b5e VAddr 0x007f37d094e000
 remote address: LID 0x11 QPN 0x0108 PSN 0xa7f05e RKey 0x0056ec VAddr 0x007fe3411c0000
---------------------------------------------------------------------------------------
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 2          5000           0.056738            0.055943            3.496431
 4          5000             0.17               0.16               5.140075
 8          5000             0.34               0.33               5.176208
 16         5000             0.69               0.66               5.150361
 32         5000             1.47               1.45               5.673912
 64         5000             3.54               3.44               6.717482
 128        5000             6.92               6.25               6.107578
 256        5000             14.19              14.16              6.915320
 512        5000             14.46              14.20              3.466014
 1024       5000             41.03              39.79              4.857483
 2048       5000             71.03              70.80              4.321457
 4096       5000             82.61              81.94              2.500763
 8192       5000             92.40              92.09              1.405127
 16384      5000             93.99              93.82              0.715753
 32768      5000             94.59              94.58              0.360785
 65536      5000             94.55              94.55              0.180338
 131072     5000             94.72              94.70              0.090315
 262144     5000             95.11              95.11              0.045353
 524288     5000             95.06              95.06              0.022663
 1048576    5000             95.15              95.14              0.011341
 2097152    5000             95.24              95.19              0.005674
 4194304    5000             95.00              95.00              0.002831
 8388608    5000             95.31              95.31              0.001420
---------------------------------------------------------------------------------------
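
To compare runs like the ones above without reading the tables by eye, the result rows can be parsed from the captured output. The sketch below is an assumption about how one might do it, not part of perftest: `parse_rows` and `saturation_size` are hypothetical helpers, the 95% threshold is arbitrary, and the sample text is a subset of the client output above.

```python
def parse_rows(text):
    """Extract result rows (five numeric columns) from ib_write_bw output."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 5:
            continue  # headers, separators and address lines
        try:
            size, iterations = int(fields[0]), int(fields[1])
            peak, average, mpps = (float(f) for f in fields[2:])
        except ValueError:
            continue  # five fields, but not a result row
        rows.append({"bytes": size, "iterations": iterations,
                     "peak": peak, "average": average, "mpps": mpps})
    return rows


def saturation_size(rows, fraction=0.95):
    """Smallest message size whose average reaches `fraction` of the best average."""
    best = max(row["average"] for row in rows)
    return min(row["bytes"] for row in rows if row["average"] >= fraction * best)


sample = """
 #bytes     #iterations    BW peak[Gb/sec]    BW average[Gb/sec]   MsgRate[Mpps]
 512        5000             14.46              14.20              3.466014
 1024       5000             41.03              39.79              4.857483
 4096       5000             82.61              81.94              2.500763
 8192       5000             92.40              92.09              1.405127
 8388608    5000             95.31              95.31              0.001420
"""

rows = parse_rows(sample)
print(len(rows), "rows parsed")
print("saturates at", saturation_size(rows), "bytes")
```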

fychao commented Feb 27, 2018

root@83d01dcb874c:/# ipython
Python 2.7.12 (default, Dec  4 2017, 14:50:18)
Type "copyright", "credits" or "license" for more information.

IPython 5.5.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

In [1]: import tensorflow as tf

In [2]: tf.__version__
Out[2]: '1.4.0'

In [3]:

fychao commented Feb 27, 2018

# Run the following commands on host_0 (172.16.129.2):
python tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=8 \
--batch_size=64 --model=inception3 --variable_update=distributed_replicated \
--tf_random_seed=6813 --mkl=true --num_epochs=1 --all_reduce_spec=nccl/xring \
--job_name=worker --ps_hosts=172.16.129.2:50000,172.16.130.2:50000 \
--worker_hosts=172.16.129.2:50001,172.16.130.2:50001 --task_index=0

CUDA_VISIBLE_DEVICES=7 python tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=8 \
--batch_size=64 --model=inception3 --variable_update=distributed_replicated \
--tf_random_seed=6813 --mkl=true --num_epochs=1 --all_reduce_spec=nccl/xring \
--job_name=ps --ps_hosts=172.16.129.2:50000,172.16.130.2:50000 \
--worker_hosts=172.16.129.2:50001,172.16.130.2:50001 --task_index=0

# Run the following commands on host_1 (172.16.130.2):
python tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=8 \
--batch_size=64 --model=inception3 --variable_update=distributed_replicated \
--tf_random_seed=6813 --mkl=true --num_epochs=1 --all_reduce_spec=nccl/xring \
--job_name=worker --ps_hosts=172.16.129.2:50000,172.16.130.2:50000 \
--worker_hosts=172.16.129.2:50001,172.16.130.2:50001 --task_index=1

CUDA_VISIBLE_DEVICES=7 python tf_cnn_benchmarks.py --local_parameter_device=gpu --num_gpus=8 \
--batch_size=64 --model=inception3 --variable_update=distributed_replicated \
--tf_random_seed=6813 --mkl=true --num_epochs=1 --all_reduce_spec=nccl/xring \
--job_name=ps --ps_hosts=172.16.129.2:50000,172.16.130.2:50000 \
--worker_hosts=172.16.129.2:50001,172.16.130.2:50001 --task_index=1
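
The four commands above differ only in `--job_name` and `--task_index`, so they can be generated from the host list instead of being edited by hand. The following is a minimal sketch under that assumption: `benchmark_command` and `COMMON_FLAGS` are hypothetical names, the flags and ports are copied from the commands above, and it needs Python 3.8+ for `shlex.join`. It does not set any environment variables.

```python
import shlex

# Flags shared by every job, copied from the commands above.
COMMON_FLAGS = [
    "--local_parameter_device=gpu", "--num_gpus=8", "--batch_size=64",
    "--model=inception3", "--variable_update=distributed_replicated",
    "--tf_random_seed=6813", "--mkl=true", "--num_epochs=1",
    "--all_reduce_spec=nccl/xring",
]


def benchmark_command(job_name, task_index, hosts, ps_port=50000, worker_port=50001):
    """Build one tf_cnn_benchmarks.py command line for the given job and task."""
    ps_hosts = ",".join(f"{host}:{ps_port}" for host in hosts)
    worker_hosts = ",".join(f"{host}:{worker_port}" for host in hosts)
    argv = [
        "python", "tf_cnn_benchmarks.py", *COMMON_FLAGS,
        f"--job_name={job_name}",
        f"--ps_hosts={ps_hosts}",
        f"--worker_hosts={worker_hosts}",
        f"--task_index={task_index}",
    ]
    return shlex.join(argv)


hosts = ["172.16.129.2", "172.16.130.2"]
for index, host in enumerate(hosts):
    print(f"# on host_{index} ({host}):")
    for job in ("worker", "ps"):
        print(benchmark_command(job, index, hosts))
```

Keeping the host list in one place makes it harder for `--ps_hosts` and `--worker_hosts` to drift apart between the two machines.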
