Merge pull request #130 from threedworld-mit/req_freeze
Req freeze
alters-mit authored Feb 17, 2021
2 parents 3c6b84d + 713ef37 commit 61d8afd
Showing 12 changed files with 113 additions and 93 deletions.
33 changes: 33 additions & 0 deletions Documentation/Changelog.md
@@ -4,6 +4,39 @@

To upgrade from TDW v1.7 to v1.8, read [this guide](Documentation/upgrade_guides/v1.7_to_v1.8).

## v1.8.1

### Command API

#### New Commands

| Command | Description |
| -------------------- | ------------------------------------------------------------ |
| `set_socket_timeout` | Set the timeout duration for the socket used to communicate with the controller. |

### `tdw` module

#### `Controller`

- **Fixed: The connection to the build will occasionally fail, causing the controller to hang indefinitely.** Now, the build will close its socket, open a new socket, and alert the controller that it should re-send the previous message. This won't advance the simulation in any way but you might notice a few-second hiccup between messages.

### Benchmarking

- Increased the default number of trials in `benchmarker.py` from 10,000 to 50,000.
- Fixed: `build_simulator.py` doesn't terminate automatically.
- Fixed: `controller_simulator.py` doesn't work.
- Removed: `req_test_controller.py` because ReqTest isn't supported in TDW anymore.
- Removed: `req_test_creator.py`

### Documentation

#### Modified Documentation

| Document | Modification |
| --------------- | ------------------------------------------------------------ |
| `unity_loop.md` | Removed test results that involve ReqTest because they aren't actually that meaningful. |
| `debug_tdw.md` | Added some information about what to do if the network connection hangs. |

## v1.8.0

### New Features
20 changes: 20 additions & 0 deletions Documentation/api/command_api.md
@@ -28,6 +28,7 @@
| [`set_screen_size`](#set_screen_size) | Set the screen size. Any images the build creates will also be this size. |
| [`set_shadow_strength`](#set_shadow_strength) | Set the shadow strength of all lights in the scene. This only works if you already sent load_scene or add_scene. |
| [`set_sleep_threshold`](#set_sleep_threshold) | Set the global Rigidbody "sleep threshold", the mass-normalized energy threshold below which objects start going to sleep. A "sleeping" object is completely still until moved again by a force (object impact, force command, etc.) |
| [`set_socket_timeout`](#set_socket_timeout) | Set the timeout duration for the socket used to communicate with the controller. Occasionally, the build's socket will stop receiving messages from the controller. This is an inevitable consequence of how synchronous receive-response sockets work. When this happens, the build waits until the socket times out, closes the socket, and alerts the controller that it needs to re-send its message. The timeout duration shouldn't be less than the time required to send/receive commands, or the build will never receive anything! You should only send this command if it takes longer than the default timeout to send/receive commands. |
| [`set_target_framerate`](#set_target_framerate) | Set the target render framerate of the build. For more information: <ulink url="https://docs.unity3d.com/ScriptReference/Application-targetFrameRate.html">https://docs.unity3d.com/ScriptReference/Application-targetFrameRate.html</ulink> |
| [`set_time_step`](#set_time_step) | Set Time.fixedDeltaTime (Unity's physics step, as opposed to render time step). NOTE: Doubling the time_step is NOT equivalent to advancing two physics steps. For more information, see: <ulink url="https://docs.unity3d.com/Manual/TimeFrameManagement.html">https://docs.unity3d.com/Manual/TimeFrameManagement.html</ulink> |
| [`step_physics`](#step_physics) | Step through the physics without triggering new avatar output, or new commands. |
@@ -909,6 +910,25 @@ Set the global Rigidbody "sleep threshold", the mass-normalized energy threshold

***

## **`set_socket_timeout`**

Set the timeout duration for the socket used to communicate with the controller. Occasionally, the build's socket will stop receiving messages from the controller. This is an inevitable consequence of how synchronous receive-response sockets work. When this happens, the build waits until the socket times out, closes the socket, and alerts the controller that it needs to re-send its message. The timeout duration shouldn't be less than the time required to send/receive commands, or the build will never receive anything! You should only send this command if it takes longer than the default timeout to send/receive commands.


```python
{"$type": "set_socket_timeout"}
```

```python
{"$type": "set_socket_timeout", "timeout": 5}
```

| Parameter | Type | Description | Default |
| --- | --- | --- | --- |
| `"timeout"` | int | The socket will time out after this many seconds. The default value listed here is the default value for the build. This must be an integer. | 5 |
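
As a rough illustration of the parameter above, the command is an ordinary dictionary. The helper below is a minimal sketch and is not part of the `tdw` module: the name `socket_timeout_command` and its positive-value check are assumptions made for this example. Only the `"$type"` and `"timeout"` keys, the default of 5, and the integer requirement come from the table.

```python
def socket_timeout_command(timeout: int = 5) -> dict:
    """Build a set_socket_timeout command (hypothetical helper, not in tdw)."""
    # The documentation requires an integer number of seconds.
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout, bool) or not isinstance(timeout, int):
        raise TypeError("timeout must be an integer number of seconds")
    # Assumption for this sketch: a non-positive timeout is never useful.
    if timeout <= 0:
        raise ValueError("timeout must be greater than zero")
    return {"$type": "set_socket_timeout", "timeout": timeout}


# Raise the timeout to 10 seconds, e.g. when messages take a long time to send.
command = socket_timeout_command(10)
print(command)
```

With a running controller `c`, the resulting dictionary would be sent as `c.communicate(socket_timeout_command(10))`.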

***

## **`set_target_framerate`**

Set the target render framerate of the build. For more information: <ulink url="https://docs.unity3d.com/ScriptReference/Application-targetFrameRate.html">https://docs.unity3d.com/ScriptReference/Application-targetFrameRate.html</ulink>
49 changes: 6 additions & 43 deletions Documentation/benchmark/unity_loop.md
@@ -14,23 +14,14 @@ The main loop of Unity Engine in the build _innately slows down the build_. This

By using a minimal controller, a compiled build, and the `send_junk` command, we can compare the build's performance to `build_simulator.py`.

### ReqTest

ReqTest is a minimal C# application built in Unity. It has the same network code and pattern as the build and `build_simulator.py`. Unlike the build, ReqTest doesn't wait for Unity's `Update()` callback to send and receive messages.

### Implications

- `build_simulator.py` vs. ReqTest informs us of the innate slowdown of a Unity application and/or NetMQ (the C# implementation of ZMQ).
- ReqTest vs. the build informs us of the innate slowdown caused by the Unity `Update()` callback.

### Results

| Output data size (bytes) | `build_simulator.py` FPS | ReqTest FPS | Compiled build FPS |
| ---------------- | --------------------------- | ----------------------- | ----------------------- |
| 1 | 3171 | 1034 | 934 |
| 1000 | 3511 | 1124 | 917 |
| 10000 | 2346 | 505 | 515 |
| 700000 | 123 | 252 | 153 |
| Output data size (bytes) | `build_simulator.py` FPS |
| ---------------- | --------------------------- |
| 1 | 3171 |
| 1000 | 3511 |
| 10000 | 2346 |
| 700000 | 123 |
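
The FPS figures are trials per second of wall-clock time: `controller_simulator.py` computes each value as `num_trials / (time() - t0)`. The sketch below shows the same arithmetic in isolation. It is an illustration under assumptions: `measure_fps` and `fake_step` are hypothetical names, `fake_step` stands in for one `communicate()` round trip, and `time.perf_counter()` is used here instead of `time()`.

```python
from time import perf_counter
from typing import Callable


def measure_fps(step: Callable[[], None], num_trials: int) -> float:
    """Return trials per second of wall-clock time (hypothetical helper)."""
    t0 = perf_counter()
    for _ in range(num_trials):
        step()
    elapsed = perf_counter() - t0
    # Guard against a zero reading on very coarse clocks.
    if elapsed <= 0:
        return float("inf")
    return num_trials / elapsed


def fake_step() -> None:
    """Stand-in for one send/receive round trip (assumption for this sketch)."""
    sum(range(100))


fps = measure_fps(fake_step, num_trials=1000)
print(round(fps))
```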

### How to run this test

@@ -46,31 +37,3 @@ cd <root>/Python/benchmarking
python3 controller_simulator.py
```

##### With ReqTest

```bash
cd <root>/Python/benchmarking
python3 req_test_creator.py
```

```bash
cd <root>/Python/benchmarking
python3 req_test_controller.py
```

```bash
cd <root>/dist/TDW_vTEST/TDW_vTEST_Windows/UtilityApplications/ReqTest
./ReqTest --length <length>
```

##### With a compiled build

```bash
cd <root>/Python/benchmarking
python3 build_simulator.py
```

```bash
<run build>
```

12 changes: 12 additions & 0 deletions Documentation/misc_frontend/debug_tdw.md
@@ -117,6 +117,18 @@ One or more of the commands you sent isn't formatted correctly. This message sho

There is another process (probably another controller) currently running and using the same port that your controller is trying to use. Kill that process in order to run your controller.

### The controller pauses while sending a message to the build

Occasionally, the synchronous send-receive socket pattern between the controller and the build will fail. This seems to be inevitable on Linux servers; we've never seen this problem on Windows.

Should the connection fail, the build will automatically reconnect after a certain timeout duration and ask the controller to re-send the last message. This won't advance the simulation's physics or rendering state, but there will be a several-second pause between messages.

To adjust the timeout duration, send [`set_socket_timeout`](../api/command_api#set_socket_timeout).
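
The re-send behavior can be sketched without a build. The example below is a minimal illustration, not the `tdw` implementation: `FlakySocket` and `get_data_type_id` are hypothetical stand-ins, and it assumes the build's "failed to receive" reply is a two-frame message whose first frame carries the ID `ftre`.

```python
import json
from typing import List, Union


class FlakySocket:
    """Stand-in for the controller's socket (hypothetical, illustration only)."""

    def __init__(self, failures: int):
        self.failures = failures          # How many times to fake a failure.
        self.sent: List[List[bytes]] = []  # Every message the controller sent.

    def send_multipart(self, msg: List[bytes]) -> None:
        self.sent.append(msg)

    def recv_multipart(self) -> List[bytes]:
        if self.failures > 0:
            self.failures -= 1
            # Assumed shape of the dummy reply: [ftre, 0].
            return [b"ftre", b"0"]
        # A normal reply with a single frame (e.g. just the frame count).
        return [b"\x01\x00\x00\x00"]


def get_data_type_id(frame: bytes) -> str:
    """Hypothetical ID lookup: treat the first four bytes as the ID."""
    return frame[:4].decode("utf-8", errors="replace")


def communicate(sock: FlakySocket, commands: Union[dict, List[dict]]) -> List[bytes]:
    """Send commands and re-send them for as long as the reply is the dummy object."""
    if not isinstance(commands, list):
        commands = [commands]
    msg = [json.dumps(commands).encode("utf-8")]
    sock.send_multipart(msg)
    resp = sock.recv_multipart()
    while len(resp) > 1 and get_data_type_id(resp[0]) == "ftre":
        sock.send_multipart(msg)
        resp = sock.recv_multipart()
    return resp


sock = FlakySocket(failures=2)
resp = communicate(sock, {"$type": "do_nothing"})
print(len(sock.sent), resp)
```

Because the same message is re-sent until a normal reply arrives, the caller sees one call to `communicate` and only notices the extra delay.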

### The controller hangs indefinitely

This usually happens because another process (such as another instance of a TDW build) is using the same port and received a message intended for the TDW build. Either stop all TDW build processes before launching TDW, or launch TDW on a unique port (see [Getting Started](../getting_started.md)).

## Common OS X Problems

See [OS X documentation](osx.md).
2 changes: 1 addition & 1 deletion Python/benchmarking/benchmarker.py
@@ -108,7 +108,7 @@ def run(self, boxes=False, hi_res=False, passes="none", png=False, transforms=Fa
         # Run the trials.
         num_trials = 0
         t0 = time()
-        while num_trials < 10000:
+        while num_trials < 50000:
             self.communicate([])
             num_trials += 1

7 changes: 5 additions & 2 deletions Python/benchmarking/build_simulator.py
@@ -25,8 +25,11 @@
 sock.connect("tcp://localhost:1071")
 
 key = 1
-while True:
+done = False
+while not done:
     sock.send_multipart([outputs[key]])
     resp = loads(sock.recv_multipart()[0])[0]
-    if resp["$type"] == "send_junk":
+    if "stop" in resp:
+        done = True
+    elif resp["$type"] == "send_junk":
         key = resp["length"]
16 changes: 11 additions & 5 deletions Python/benchmarking/controller_simulator.py
@@ -7,14 +7,20 @@
 """
 
 if __name__ == "__main__":
-    num_trials = 1000
+    num_trials = 50000
     sizes = [1, 1000, 10000, 700000]
-    c = Controller()
+    c = Controller(launch_build=False, check_version=False)
+    results = dict()
     for size in sizes:
         c.communicate({"$type": "send_junk", "length": size, "frequency": "always"})
         t0 = time()
         for i in range(num_trials):
+            if i % 200 == 0:
+                print('num_trials=%d' % i)
             c.communicate({"$type": "do_nothing"})
-        fps = (num_trials / (time() - t0))
-
-        print(round(fps))
+        results[size] = (num_trials / (time() - t0))
+    c.communicate({"stop": True})
+    c.socket.close()
+    print("")
+    for size in results:
+        print(size, round(results[size]))
22 changes: 0 additions & 22 deletions Python/benchmarking/req_test_controller.py

This file was deleted.

11 changes: 0 additions & 11 deletions Python/benchmarking/req_test_creator.py

This file was deleted.

2 changes: 1 addition & 1 deletion Python/setup.py
@@ -1,7 +1,7 @@
 from setuptools import setup, find_packages
 from pathlib import Path
 
-__version__ = "1.8.0.0"
+__version__ = "1.8.1.0"
 readme_path = Path('../README.md')
 if readme_path.exists():
     long_description = readme_path.read_text(encoding='utf-8')
30 changes: 23 additions & 7 deletions Python/tdw/controller.py
@@ -5,7 +5,7 @@
 from typing import List, Union, Optional, Tuple, Dict
 from tdw.librarian import ModelLibrarian, SceneLibrarian, MaterialLibrarian, HDRISkyboxLibrarian, \
     HumanoidAnimationLibrarian, HumanoidLibrarian, HumanoidAnimationRecord, RobotLibrarian
-from tdw.output_data import Version
+from tdw.output_data import OutputData, Version
 from tdw.release.build import Build
 from tdw.release.pypi import PyPi
 from tdw.version import __version__
@@ -70,12 +70,28 @@ def communicate(self, commands: Union[dict, List[dict]]) -> list:
         :return The output data from the build.
         """
 
-        if not isinstance(commands, list):
-            commands = [commands]
-
-        self.socket.send_multipart([json.dumps(commands).encode('utf-8')])
-
-        return self.socket.recv_multipart()
+        if isinstance(commands, list):
+            msg = [json.dumps(commands).encode('utf-8')]
+        else:
+            msg = [json.dumps([commands]).encode('utf-8')]
+
+        # Send the commands.
+        self.socket.send_multipart(msg)
+        # Receive output data.
+        resp = self.socket.recv_multipart()
+
+        # Occasionally, the build's socket will stop receiving messages.
+        # If that happens, it will close the socket, create a new socket, and send a dummy output data object.
+        # The ID of the dummy object is "ftre" (FailedToReceive).
+        # If the controller receives the dummy object, it should re-send its commands.
+        # The dummy object is always in an array: [ftre, 0]
+        # This way, the controller can easily differentiate it from a response that just has the frame count.
+        while len(resp) > 1 and OutputData.get_data_type_id(resp[0]) == "ftre":
+            self.socket.send_multipart(msg)
+            resp = self.socket.recv_multipart()
+
+        # Return the output data from the build.
+        return resp
 
     def start(self, scene="ProcGenScene") -> None:
         """
2 changes: 1 addition & 1 deletion Python/tdw/version.py
@@ -1 +1 @@
-__version__ = "1.8.0"
+__version__ = "1.8.1"
