
Performance degradation over time with ETS backend #71

Open
ruslandoga opened this issue Nov 22, 2023 · 11 comments

ruslandoga commented Nov 22, 2023

Describe the bug
I've been benchmarking hot paths in https://github.com/plausible/analytics and noticed that the Hammer check there becomes slower over time: from 11µs at the beginning of a k6 benchmark (it hits the same site every time, so :hammer_ets_buckets has a single key throughout the benchmark) to over 100ms at the end. In some requests (when user-agents are cached), that check_rate call becomes the largest slice in the request flamechart.

[Screenshot 2023-11-22 at 19:28: request flamechart]

**Provide the following details**

  • Elixir version (elixir -v): 1.15.7
  • Erlang version (erl -v): 26.1.2
  • Operating system: macOS Sonoma

ruslandoga commented Nov 22, 2023

I've also tried profiling the ETS backend and noticed that the Hammer.Backend.ETS workers don't actually perform any work, so poolboy seems to be pure overhead. Profiling the poolboy process during a k6 run, I get the following from eprof:

****** Process <0.462.0>    -- 100.00 % of profiled time ***
FUNCTION                             CALLS        %    TIME  [uS / CALLS]
--------                             -----  -------    ----  [----------]
gen_server:try_handle_info/3          1596     0.12     465  [      0.29]
queue:filter/2                        1590     0.16     657  [      0.41]
queue:filter_f/2                      3322     0.17     664  [      0.20]
lists:split/2                         3187     0.17     692  [      0.22]
poolboy:'-handle_info/2-fun-0-'/2     6425     0.20     800  [      0.12]
poolboy:handle_info/2                 1596     0.23     934  [      0.59]
lists:reverse/2                       6374     0.27    1065  [      0.17]
lists:split/3                        11967     0.32    1291  [      0.11]
queue:filter_r/2                      6283     0.45    1806  [      0.29]
queue:in/2                           16622     0.58    2323  [      0.14]
gen_server:reply/2                   44368     1.17    4687  [      0.11]
gen_server:try_dispatch/3            45958     1.21    4841  [      0.11]
queue:out/1                          44368     1.46    5834  [      0.13]
gen_server:handle_common_reply/8     62580     1.60    6420  [      0.10]
gen_server:try_handle_call/4         44372     1.86    7459  [      0.17]
ets:match/2                           1596     1.99    7965  [      4.99]
gen_server:try_handle_cast/3         44362     2.15    8615  [      0.19]
poolboy:handle_checkin/2             44368     2.82   11300  [      0.25]
ets:lookup/2                         44362     3.21   12871  [      0.29]
poolboy:handle_call/3                44372     3.26   13034  [      0.29]
ets:delete/2                         44368     3.50   14031  [      0.32]
gen_server:decode_msg/9              90330     3.60   14419  [      0.16]
gen_server:handle_msg/6              90330     4.27   17103  [      0.19]
poolboy:handle_cast/2                44362     4.70   18809  [      0.42]
ets:insert/2                         44368     5.95   23828  [      0.54]
erlang:monitor/2                     44372     7.51   30054  [      0.68]
erlang:demonitor/1                   44362    14.33   57358  [      1.29]
gen:reply/2                          44368    16.02   64132  [      1.45]
gen_server:loop/7                    90330    16.72   66924  [      0.74]
---------------------------------  -------  -------  ------  [----------]
Total:                             1016858  100.00%  400381  [      0.39]

Note that the time spent in :ets calls is a small fraction of the total, most of which I assume goes to :poolboy / :gen internals.
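
For reference, a profile like this can be captured from an attached iex session with :eprof; here is a minimal sketch, assuming the poolboy pool is the single child of Hammer.Supervisor (as in the which_children output further down):

# grab the pool-manager pid
[{_id, pool, :worker, _modules}] = Supervisor.which_children(Hammer.Supervisor)

:eprof.start()
:eprof.start_profiling([pool])

# ... drive load against Hammer.check_rate/3 (k6 or the script below) ...
Process.sleep(10_000)

:eprof.stop_profiling()
:eprof.analyze()  # prints a per-function breakdown like the table above
:eprof.stop()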

@ruslandoga (Contributor, Author) commented:

Would there be any interest in optimizing the ETS backend? I'd be happy to work on it. The first step, I think, would be to add benchmarks and then compare the performance with and without poolboy to check whether my guesses are right (something like the sketch below).
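
A rough sketch of that comparison (the bench table and setup here are made up for illustration): raw :ets counter updates on a single key versus the same key going through the pooled backend. A sequential Benchee run like this won't show the queue buildup described below, but it should expose the per-call pool overhead.

table = :ets.new(:bench_bucket, [:public, write_concurrency: true])
key = "ingest:site:loadtest.site"

Benchee.run(%{
  ":ets.update_counter directly" => fn ->
    :ets.update_counter(table, key, 1, {key, 0})
  end,
  "Hammer.check_rate through poolboy" => fn ->
    Hammer.check_rate(key, 60_000, 1_000_000)
  end
})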

ruslandoga changed the title from "performance degradation over time" to "performance degradation over time with ETS backend" on Nov 22, 2023

ruslandoga commented Nov 22, 2023

The increasing wait time might point to a growing checkout queue in poolboy (a small helper for tracking the queue length follows the dumps below):
iex(1)> Supervisor.which_children Hammer.Supervisor
[{:hammer_backend_single_pool, #PID<0.462.0>, :worker, [:poolboy]}]

# :queue is empty ( {[], []})
iex(2)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>,
 [#PID<0.469.0>, #PID<0.468.0>, #PID<0.467.0>, #PID<0.464.0>], {[], []},
 #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
 
 # start benchmark
 iex(4)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>,
 [#PID<0.469.0>, #PID<0.468.0>, #PID<0.467.0>, #PID<0.464.0>], {[], []},
 #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
iex(5)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>, [#PID<0.464.0>, #PID<0.467.0>], {[], []},
 #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
iex(6)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>, [#PID<0.469.0>, #PID<0.464.0>], {[], []},
 #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
iex(7)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>, [],
 {[
    {{#PID<0.20229.0>,
      [:alias | #Reference<0.0.2589315.192931417.1724973064.35897>]},
     #Reference<0.192931417.1724907528.35896>,
     #Reference<0.192931417.1724907522.62923>},
    {{#PID<0.20236.0>,
      [:alias | #Reference<0.0.2590211.192931417.1724973060.51696>]},
     #Reference<0.192931417.1724907524.51695>,
     #Reference<0.192931417.1724907522.62922>}
  ],
  [
    {{#PID<0.20235.0>,
      [:alias | #Reference<0.0.2590083.192931417.1724973063.38309>]},
     #Reference<0.192931417.1724907527.38308>,
     #Reference<0.192931417.1724907522.62906>},
    {{#PID<0.20238.0>,
      [:alias | #Reference<0.0.2590467.192931417.1724973064.35881>]},
     #Reference<0.192931417.1724907528.35880>,
     #Reference<0.192931417.1724907522.62907>},
    {{#PID<0.20237.0>,
      [:alias | #Reference<0.0.2590339.192931417.1724973061.47663>]},
     #Reference<0.192931417.1724907525.47662>,
     #Reference<0.192931417.1724907522.62921>}
  ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
iex(8)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>, [],
 {[
    {{#PID<0.30693.0>,
      [:alias | #Reference<0.0.3928707.192931417.1724973061.85287>]},
     #Reference<0.192931417.1724907525.85286>,
     #Reference<0.192931417.1724907527.76715>},
    {{#PID<0.30691.0>,
      [:alias | #Reference<0.0.3928451.192931417.1724973057.98127>]},
     #Reference<0.192931417.1724907521.98126>,
     #Reference<0.192931417.1724907527.76712>},
    {{#PID<0.30692.0>,
      [:alias | #Reference<0.0.3928579.192931417.1724973062.86576>]},
     #Reference<0.192931417.1724907526.86575>,
     #Reference<0.192931417.1724907527.76711>},
    {{#PID<0.30690.0>,
      [:alias | #Reference<0.0.3928323.192931417.1724973061.85267>]},
     #Reference<0.192931417.1724907525.85266>,
     #Reference<0.192931417.1724907527.76708>},
    {{#PID<0.30664.0>,
      [:alias | #Reference<0.0.3924995.192931417.1724973063.76705>]},
     #Reference<0.192931417.1724907527.76704>,
     #Reference<0.192931417.1724907527.76707>},
    {{#PID<0.30688.0>,
      [:alias | #Reference<0.0.3928067.192931417.1724973060.89037>]},
     #Reference<0.192931417.1724907524.89036>,
     #Reference<0.192931417.1724907527.76706>},
    {{#PID<0.30687.0>,
      [:alias | #Reference<0.0.3927939.192931417.1724973061.85254>]},
     #Reference<0.192931417.1724907525.85253>,
     #Reference<0.192931417.1724907528.73877>},
    {{#PID<0.30689.0>,
      [:alias | #Reference<0.0.3928195.192931417.1724973062.86550>]},
     #Reference<0.192931417.1724907526.86549>,
     #Reference<0.192931417.1724907521.98095>},
    {{#PID<0.30686.0>,
      [:alias | #Reference<0.0.3927811.192931417.1724973061.85235>]},
     #Reference<0.192931417.1724907525.85234>,
     #Reference<0.192931417.1724907521.98094>},
    {{#PID<0.30685.0>,
      [:alias | #Reference<0.0.3927683.192931417.1724973062.86522>]},
     #Reference<0.192931417.1724907526.86521>,
     #Reference<0.192931417.1724907523.85084>}
  ],
  [
    {{#PID<0.30672.0>,
      [:alias | #Reference<0.0.3926019.192931417.1724973059.85063>]},
     #Reference<0.192931417.1724907523.85062>,
     #Reference<0.192931417.1724907523.85083>}
  ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
iex(9)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>, [],
 {[
    {{#PID<0.40217.0>,
      [:alias | #Reference<0.0.953491.192931417.1724973060.124990>]},
     #Reference<0.192931417.1724907524.124989>,
     #Reference<0.192931417.1724907524.125000>},
    {{#PID<0.40218.0>,
      [:alias | #Reference<0.0.953619.192931417.1724973060.124974>]},
     #Reference<0.192931417.1724907524.124973>,
     #Reference<0.192931417.1724907524.124997>},
    {{#PID<0.40216.0>,
      [:alias | #Reference<0.0.953363.192931417.1724973057.131528>]},
     #Reference<0.192931417.1724907521.131527>,
     #Reference<0.192931417.1724907524.124996>},
    {{#PID<0.40215.0>,
      [:alias | #Reference<0.0.953235.192931417.1724973059.119283>]},
     #Reference<0.192931417.1724907523.119282>,
     #Reference<0.192931417.1724907524.124995>},
    {{#PID<0.40202.0>,
      [:alias | #Reference<0.0.951571.192931417.1724973061.121546>]},
     #Reference<0.192931417.1724907525.121545>,
     #Reference<0.192931417.1724907524.124994>},
    {{#PID<0.40200.0>,
      [:alias | #Reference<0.0.951315.192931417.1724973061.121533>]},
     #Reference<0.192931417.1724907525.121532>,
     #Reference<0.192931417.1724907524.124993>},
    {{#PID<0.40204.0>,
      [:alias | #Reference<0.0.951827.192931417.1724973060.124951>]},
     #Reference<0.192931417.1724907524.124950>,
     #Reference<0.192931417.1724907524.124992>},
    {{#PID<0.40214.0>,
      [:alias | #Reference<0.0.953107.192931417.1724973062.120287>]},
     #Reference<0.192931417.1724907526.120286>,
     #Reference<0.192931417.1724907528.108633>},
    {{#PID<0.40213.0>,
      [:alias | #Reference<0.0.952979.192931417.1724973059.119251>]},
     #Reference<0.192931417.1724907523.119250>,
     #Reference<0.192931417.1724907528.108632>},
    {{#PID<0.40212.0>,
      [:alias | #Reference<0.0.952851.192931417.1724973057.131489>]},
     #Reference<0.192931417.1724907521.131488>,
     #Reference<0.192931417.1724907528.108631>},
    {{#PID<0.40211.0>,
      [:alias | #Reference<0.0.952723.192931417.1724973063.111480>]},
     #Reference<0.192931417.1724907527.111479>,
     #Reference<0.192931417.1724907528.108621>},
    {{#PID<0.40210.0>,
      [:alias | #Reference<0.0.952595.192931417.1724973064.108613>]},
     #Reference<0.192931417.1724907528.108612>,
     #Reference<0.192931417.1724907528.108614>},
    {{#PID<0.40209.0>,
      [:alias | #Reference<0.0.952467.192931417.1724973057.131474>]},
     #Reference<0.192931417.1724907521.131473>,
     #Reference<0.192931417.1724907521.131476>}
  ],
  [
    {{#PID<0.40193.0>,
      [:alias | #Reference<0.0.950419.192931417.1724973058.133850>]},
     #Reference<0.192931417.1724907522.133849>,
     #Reference<0.192931417.1724907521.131423>},
    {{#PID<0.40201.0>,
      [:alias | #Reference<0.0.951443.192931417.1724973064.108585>]},
     #Reference<0.192931417.1724907528.108584>,
     #Reference<0.192931417.1724907521.131444>},
    {{#PID<0.40203.0>,
      [:alias | #Reference<0.0.951699.192931417.1724973057.131443>]},
     #Reference<0.192931417.1724907521.131442>,
     #Reference<0.192931417.1724907521.131445>},
    {{#PID<0.40205.0>,
      [:alias | #Reference<0.0.951955.192931417.1724973063.111448>]},
     #Reference<0.192931417.1724907527.111447>,
     #Reference<0.192931417.1724907521.131460>},
    {{#PID<0.40207.0>,
      [:alias | #Reference<0.0.952211.192931417.1724973060.124916>]},
     #Reference<0.192931417.1724907524.124915>,
     #Reference<0.192931417.1724907521.131461>},
    {{#PID<0.40208.0>,
      [:alias | #Reference<0.0.952339.192931417.1724973063.111464>]},
     #Reference<0.192931417.1724907527.111463>,
     #Reference<0.192931417.1724907521.131475>}
  ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
iex(10)> :sys.get_state(pid(0,462,0))
{:state, #PID<0.463.0>, [],
 {[
    {{#PID<0.53151.0>,
      [:alias | #Reference<0.0.2609043.192931417.1724973062.167102>]},
     #Reference<0.192931417.1724907526.167101>,
     #Reference<0.192931417.1724907522.180080>},
    {{#PID<0.53150.0>,
      [:alias | #Reference<0.0.2608915.192931417.1724973058.180073>]},
     #Reference<0.192931417.1724907522.180072>,
     #Reference<0.192931417.1724907522.180079>},
    {{#PID<0.53147.0>,
      [:alias | #Reference<0.0.2608531.192931417.1724973062.167086>]},
     #Reference<0.192931417.1724907526.167085>,
     #Reference<0.192931417.1724907522.180078>},
    {{#PID<0.53130.0>,
      [:alias | #Reference<0.0.2606355.192931417.1724973058.180059>]},
     #Reference<0.192931417.1724907522.180058>,
     #Reference<0.192931417.1724907522.180077>},
    {{#PID<0.53146.0>,
      [:alias | #Reference<0.0.2608403.192931417.1724973062.167060>]},
     #Reference<0.192931417.1724907526.167059>,
     #Reference<0.192931417.1724907522.180076>},
    {{#PID<0.53149.0>,
      [:alias | #Reference<0.0.2608787.192931417.1724973064.157754>]},
     #Reference<0.192931417.1724907528.157753>,
     #Reference<0.192931417.1724907522.180075>},
    {{#PID<0.53148.0>,
      [:alias | #Reference<0.0.2608659.192931417.1724973057.178486>]},
     #Reference<0.192931417.1724907521.178485>,
     #Reference<0.192931417.1724907522.180074>},
    {{#PID<0.53145.0>,
      [:alias | #Reference<0.0.2608275.192931417.1724973063.159113>]},
     #Reference<0.192931417.1724907527.159112>,
     #Reference<0.192931417.1724907527.159132>},
    {{#PID<0.53144.0>,
      [:alias | #Reference<0.0.2608147.192931417.1724973064.157735>]},
     #Reference<0.192931417.1724907528.157734>,
     #Reference<0.192931417.1724907527.159131>},
    {{#PID<0.53143.0>,
      [:alias | #Reference<0.0.2608019.192931417.1724973057.178467>]},
     #Reference<0.192931417.1724907521.178466>,
     #Reference<0.192931417.1724907527.159130>},
    {{#PID<0.53140.0>,
      [:alias | #Reference<0.0.2607635.192931417.1724973062.167033>]},
     #Reference<0.192931417.1724907526.167032>,
     #Reference<0.192931417.1724907527.159129>},
    {{#PID<0.53138.0>,
      [:alias | #Reference<0.0.2607379.192931417.1724973063.159094>]},
     #Reference<0.192931417.1724907527.159093>,
     #Reference<0.192931417.1724907527.159128>},
    {{#PID<0.53133.0>,
      [:alias | #Reference<0.0.2606739.192931417.1724973062.167020>]},
     #Reference<0.192931417.1724907526.167019>,
     #Reference<0.192931417.1724907527.159127>},
    {{#PID<0.53137.0>,
      [:alias | #Reference<0.0.2607251.192931417.1724973063.159081>]},
     #Reference<0.192931417.1724907527.159080>,
     #Reference<0.192931417.1724907527.159126>},
    {{#PID<0.53142.0>,
      [:alias | #Reference<0.0.2607891.192931417.1724973057.178448>]},
     #Reference<0.192931417.1724907521.178447>,
     #Reference<0.192931417.1724907527.159125>},
    {{#PID<0.53139.0>,
      [:alias | #Reference<0.0.2607507.192931417.1724973064.157715>]},
     #Reference<0.192931417.1724907528.157714>,
     #Reference<0.192931417.1724907527.159124>},
    {{#PID<0.53141.0>,
      [:alias | #Reference<0.0.2607763.192931417.1724973060.172514>]},
     #Reference<0.192931417.1724907524.172513>,
     #Reference<0.192931417.1724907527.159123>},
    {{#PID<0.53136.0>,
      [:alias | #Reference<0.0.2607123.192931417.1724973063.159068>]},
     #Reference<0.192931417.1724907527.159067>,
     #Reference<0.192931417.1724907527.159122>},
    {{#PID<0.53135.0>,
      [:alias | #Reference<0.0.2606995.192931417.1724973063.159055>]},
     #Reference<0.192931417.1724907527.159054>,
     #Reference<0.192931417.1724907527.159121>},
    {{#PID<0.53134.0>,
      [:alias | #Reference<0.0.2606867.192931417.1724973063.159018>]},
     #Reference<0.192931417.1724907527.159017>,
     #Reference<0.192931417.1724907527.159118>},
    {{#PID<0.53132.0>,
      [:alias | #Reference<0.0.2606611.192931417.1724973063.158998>]},
     #Reference<0.192931417.1724907527.158997>,
     #Reference<0.192931417.1724907527.159116>},
    {{#PID<0.53131.0>,
      [:alias | #Reference<0.0.2606483.192931417.1724973057.178415>]},
     #Reference<0.192931417.1724907521.178414>,
     #Reference<0.192931417.1724907527.159114>},
    {{#PID<0.53129.0>,
      [:alias | #Reference<0.0.2606227.192931417.1724973061.166048>]},
     #Reference<0.192931417.1724907525.166047>,
     #Reference<0.192931417.1724907525.166050>},
    {{#PID<0.53128.0>,
      [:alias | #Reference<0.0.2606099.192931417.1724973062.166969>]},
     #Reference<0.192931417.1724907526.166968>,
     #Reference<0.192931417.1724907525.166049>},
    {{#PID<0.53127.0>,
      [:alias | #Reference<0.0.2605971.192931417.1724973061.166034>]},
     #Reference<0.192931417.1724907525.166033>,
     #Reference<0.192931417.1724907525.166035>},
    {{#PID<0.53125.0>,
      [:alias | #Reference<0.0.2605715.192931417.1724973057.178391>]},
     #Reference<0.192931417.1724907521.178390>,
     #Reference<0.192931417.1724907525.166009>},
    {{#PID<0.53126.0>,
      [:alias | #Reference<0.0.2605843.192931417.1724973057.178378>]},
     #Reference<0.192931417.1724907521.178377>,
     #Reference<0.192931417.1724907525.166008>},
    {{#PID<0.53121.0>,
      [:alias | #Reference<0.0.2605203.192931417.1724973057.178359>]},
     #Reference<0.192931417.1724907521.178358>,
     #Reference<0.192931417.1724907525.166007>},
    {{#PID<0.53120.0>,
      [:alias | #Reference<0.0.2605075.192931417.1724973057.178346>]},
     #Reference<0.192931417.1724907521.178345>,
     #Reference<0.192931417.1724907525.166006>},
    {{#PID<0.53124.0>,
      [:alias | #Reference<0.0.2605587.192931417.1724973059.165475>]},
     #Reference<0.192931417.1724907523.165474>,
     #Reference<0.192931417.1724907525.166005>},
    {{#PID<0.53123.0>,
      [:alias | #Reference<0.0.2605459.192931417.1724973059.165462>]},
     #Reference<0.192931417.1724907523.165461>,
     #Reference<0.192931417.1724907526.166938>},
    {{#PID<0.53122.0>,
      [:alias | #Reference<0.0.2605331.192931417.1724973062.166932>]},
     #Reference<0.192931417.1724907526.166931>,
     #Reference<0.192931417.1724907526.166933>},
    {{#PID<0.53119.0>,
      [:alias | #Reference<0.0.2604947.192931417.1724973059.165428>]},
     #Reference<0.192931417.1724907523.165427>,
     #Reference<0.192931417.1724907526.166914>},
    {{#PID<0.53118.0>,
      [:alias | #Reference<0.0.2604819.192931417.1724973058.180000>]},
     #Reference<0.192931417.1724907522.179999>,
     #Reference<0.192931417.1724907526.166908>},
    {{#PID<0.53117.0>,
      [:alias | #Reference<0.0.2604691.192931417.1724973064.157672>]},
     #Reference<0.192931417.1724907528.157671>,
     #Reference<0.192931417.1724907526.166907>},
    {{#PID<0.53116.0>,
      [:alias | #Reference<0.0.2604563.192931417.1724973062.166900>]},
     #Reference<0.192931417.1724907526.166899>,
     #Reference<0.192931417.1724907526.166901>},
    {{#PID<0.53115.0>,
      [:alias | #Reference<0.0.2604435.192931417.1724973062.166879>]},
     #Reference<0.192931417.1724907526.166878>,
     #Reference<0.192931417.1724907526.166881>},
    {{#PID<0.53113.0>,
      [:alias | #Reference<0.0.2604179.192931417.1724973062.166858>]},
     #Reference<0.192931417.1724907526.166857>,
     #Reference<0.192931417.1724907526.166860>},
    {{#PID<0.53114.0>,
      [:alias | #Reference<0.0.2604307.192931417.1724973058.179975>]},
     #Reference<0.192931417.1724907522.179974>,
     #Reference<0.192931417.1724907526.166859>},
    {{#PID<0.53112.0>,
      [:alias | #Reference<0.0.2604051.192931417.1724973063.158948>]},
     #Reference<0.192931417.1724907527.158947>,
     #Reference<0.192931417.1724907527.158949>},
    {{#PID<0.53111.0>,
      [:alias | #Reference<0.0.2603923.192931417.1724973059.165382>]},
     #Reference<0.192931417.1724907523.165381>,
     #Reference<0.192931417.1724907523.165384>},
    {{#PID<0.53110.0>, [...]}, #Reference<0.192931417.1724907527.158928>,
     #Reference<0.192931417.1724907526.166834>},
    {{#PID<0.53109.0>, ...}, #Reference<0.192931417.1724907522.179945>, ...}
  ],
  [
    {{#PID<0.53099.0>,
      [:alias | #Reference<0.0.2602387.192931417.1724973063.158896>]},
     #Reference<0.192931417.1724907527.158895>,
     #Reference<0.192931417.1724907528.157625>},
    {{#PID<0.53096.0>,
      [:alias | #Reference<0.0.2602003.192931417.1724973058.179914>]},
     #Reference<0.192931417.1724907522.179913>,
     #Reference<0.192931417.1724907528.157626>},
    {{#PID<0.53102.0>,
      [:alias | #Reference<0.0.2602771.192931417.1724973060.172418>]},
     #Reference<0.192931417.1724907524.172417>,
     #Reference<0.192931417.1724907528.157627>},
    {{#PID<0.53098.0>,
      [:alias | #Reference<0.0.2602259.192931417.1724973064.157624>]},
     #Reference<0.192931417.1724907528.157623>,
     #Reference<0.192931417.1724907528.157628>},
    {{#PID<0.53107.0>,
      [:alias | #Reference<0.0.2603411.192931417.1724973063.158915>]},
     #Reference<0.192931417.1724907527.158914>,
     #Reference<0.192931417.1724907528.157635>},
    {{#PID<0.53108.0>,
      [:alias | #Reference<0.0.2603539.192931417.1724973058.179933>]},
     #Reference<0.192931417.1724907522.179932>,
     #Reference<0.192931417.1724907527.158916>}
  ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo}
  
 # queue keeps growing
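
Instead of eyeballing the state dumps, the queue length can be pulled out of that tuple directly. A rough helper, relying on poolboy's internal record layout as printed above (not a public API, so it may differ between poolboy versions):

defmodule PoolQueueProbe do
  # the element at index 3 is the `waiting` queue in the state tuples above
  def queue_len(pool) do
    pool |> :sys.get_state() |> elem(3) |> :queue.len()
  end
end

[{_id, pool, :worker, _}] = Supervisor.which_children(Hammer.Supervisor)
PoolQueueProbe.queue_len(pool)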


ruslandoga commented Nov 22, 2023

I've been able to reproduce the problem outside of Plausible with this script:

1..1_000_000
|> Task.async_stream(
  fn i ->
    if rem(i, 666) == 0 do
      # sample only every 666th call: printing on every call would be too slow
      # and would give poolboy enough slack to clear its queue
      {t, r} =
        :timer.tc(fn ->
          Hammer.check_rate("ingest:site:loadtest.site", 60_000, 1_000_000)
        end)

      IO.inspect(t, label: i)
      r
    else
      # the "actual overload" happens in this "fast" branch
      Hammer.check_rate("ingest:site:loadtest.site", 60_000, 1_000_000)
    end
  end,
  ordered: false,
  max_concurrency: 500
)
|> Stream.run()

You might need to play with the max_concurrency: 500 and rem(i, 666) values to make poolboy overload gradually, like in the k6 benchmark. On my system, these numbers recreate the Plausible/k6 situation:

$ MIX_ENV=bench mix run bench/ets.exs
666: 10640
1332: 9843
1998: 10329
2664: 9764
3330: 9601
3996: 9598
4662: 9668
5328: 9592
5994: 9869
6660: 9605
7326: 9553
7992: 10193
8658: 9490
9324: 9851
9990: 9688
10656: 9733
11322: 9929
11988: 9842
12654: 9832
13320: 9521
13986: 9608
14652: 9387
15318: 9618
15984: 9402
16650: 9579
17316: 9761
17982: 9547
18648: 9439
19314: 9503
19980: 9675
20646: 9821
21312: 9391
21978: 9632
22644: 9607
23310: 9634
23976: 9400
24642: 9635
25308: 9517
25974: 10034
26640: 9520
27306: 9399
27972: 9095
28638: 9272
29304: 9435
29970: 9335
30636: 9372
31302: 9611
31968: 9550
32634: 9505
33300: 9482
33966: 9739
34632: 9615
35298: 9393
35964: 9445
36630: 9696
37296: 9831
37962: 9673
38628: 9651
39294: 9485
39960: 9099
40626: 9274
41292: 9237
41958: 10153
42624: 9428
43290: 9677
43956: 10819
44622: 9662
45288: 9622
45954: 9602
46620: 10600
47286: 9745
47952: 10368
48618: 9316
49284: 9982
49950: 10506
50616: 10758
51282: 10616
51948: 9924
52614: 10896
53280: 11025
53946: 11228
54612: 11000
55278: 10154
55944: 10853
56610: 10635
57276: 10953
57942: 11043
58608: 11334
59274: 10712
59940: 11052
60606: 10729
61272: 11144
61938: 11044
62604: 11352
63270: 11367
63936: 11474
64602: 11299
65268: 11428
65934: 11314
66600: 11238
67266: 11725
67932: 13145
68598: 13287
69264: 15812
69930: 15726
70596: 18159
71262: 24556
71928: 21276
72594: 22620
73260: 22422
73926: 26000
74592: 18282
75258: 17352
75924: 18440
76590: 18035
77256: 18177
77922: 14600
78588: 14917
79254: 14584
79920: 12567
80586: 12771
81252: 12515
81918: 12365
82584: 12992
83250: 12484
83916: 12385
84582: 12167
85248: 12868
85914: 12587
86580: 12619
87246: 12797
87912: 12460
88578: 14704
89244: 14084
89910: 13837
90576: 12096
91242: 13144
91908: 12912
92574: 12784
93240: 14351
93906: 14362
94572: 13985
95238: 16149
95904: 12922
96570: 13199
97236: 14150
97902: 12705
98568: 12364
99234: 12875
99900: 12469
100566: 12425
101232: 12790
101898: 12651
102564: 12480
103230: 13137
103896: 12540
104562: 12674
105228: 12750
105894: 12849
106560: 13044
107226: 12368
107892: 12484
108558: 12637
109224: 12599
109890: 12431
110556: 12481
111222: 12363
111888: 12622
112554: 12736
113220: 11264
113886: 12778
114552: 11255
115218: 10674
115884: 12293
116550: 11177
117216: 10788
117882: 10859
118548: 11043
119214: 10980
119880: 10863
120546: 10707
121212: 10598
121878: 10976
122544: 10641
123210: 11008
123876: 11317
124542: 11247
125208: 11058
125874: 12143
126540: 11282
127206: 10902
127872: 11642
128538: 10766
129204: 11731
^C

Note that this can't be reproduced in Benchee, since it waits for each call to complete, which allows poolboy to keep its queue empty.

@epinault (Contributor) commented:

Hello!

Thanks for the detailed report. I took over the project and there is a bunch of work I want to find time for eventually. I have not run into this issue yet, but yeah, the perf cost is either poolboy or ETS, because we bypass the GenServer in the latest version and talk to ETS directly.

That being said, poolboy is configurable. Have you tried tweaking it and making the pool bigger, just to see how it behaves?

Removing poolboy might be tricky because we do need it for solutions like Redis and other backends, so having a pool in front is helpful. We could consider removing the pool and seeing what happens. One other thing:

ETS is not shared across VMs, so I'm not sure how many people use it, because you won't be able to do rate limiting correctly across multiple pods (or it will have to be approximate, with a limit that depends on the number of pods).

I'll see if I can spend some time on this in the coming weeks, but if you have an MR or some ideas, feel free to submit them as well.


ruslandoga commented Nov 22, 2023

👋

That being said, poolboy is configurable. Have you tried tweaking it and making the pool bigger, just to see how it behaves?

That would just delay the inevitable, I think. As #71 (comment) shows, the overload happens anyway once poolboy stops keeping up; the pool size largely doesn't matter unless it matches the number of clients, since every checkout/checkin still goes through the single pool-manager gen_server, which caps throughput no matter how many workers sit behind it.

Removing poolboy might be tricky because we do need it for solutions like Redis and other backends, so having a pool in front is helpful. We could consider removing the pool and seeing what happens.

The pooling strategy could be moved to the adapter, the Ecto way.
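
Purely as an illustration of what "the Ecto way" could mean (hypothetical option names, not Hammer's current config): each backend would own its pooling decision, so the ETS backend could skip the pool entirely while a networked backend keeps one.

# ETS backend: no pool options, every call goes straight to the table
config :hammer,
  backend: {Hammer.Backend.ETS, expiry_ms: 60_000 * 60 * 4, cleanup_interval_ms: 60_000 * 10}

# Redis (or another networked) backend: the backend starts and sizes its own pool
# config :hammer,
#   backend: {Hammer.Backend.Redis, redis_url: "redis://localhost:6379", pool_size: 4}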

ETS is not shared across VMs, so I'm not sure how many people use it

According to hex.pm's recent downloads, 119,339 (~one third) of them are probably for the ETS backend.

[Screenshot 2023-11-23 at 08:08: hex.pm recent download stats]

you won't be able to do rate limiting correctly across multiple pods

That could actually be worked around with sticky sessions or Phoenix PubSub.

I'll see if I can spend some time on this in the coming weeks, but if you have an MR or some ideas, feel free to submit them as well.

I'm happy to work on this since it directly affects Plausible; I just wanted to check with the project maintainers first :)


ruslandoga commented Nov 23, 2023

I've started exploring various approaches, and the current fastest one is PlugAttack's: https://github.com/ruslandoga/rate_limit
Atomics seem promising as well. Maybe :hammer can provide two backends by default: ETS (based off of PlugAttack) and atomics.
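
To make the atomics idea concrete, the counting primitive itself is tiny. A minimal sketch (names made up; it leaves out bucket expiry/reset, which is the part the backend would still have to manage):

defmodule AtomicsCounter do
  # one unsigned 64-bit counter, shared between processes by reference
  def new, do: :atomics.new(1, signed: false)

  # increment and compare against the limit in a single atomic operation
  def hit(ref, limit) do
    if :atomics.add_get(ref, 1, 1) <= limit, do: :allow, else: :deny
  end
end

ref = AtomicsCounter.new()
AtomicsCounter.hit(ref, 1_000_000)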

@epinault (Contributor) commented:

I am not too familiar with atomics yet, but that sounds like a good idea.

I am not against moving the pooling strategy; we will need to do a proper major version bump for it and all of the backends then.

@epinault (Contributor) commented:

@ruslandoga are you still working on splitting the MR? There are changes I saw you had that I was going to tackle, but if you are not, then I'll start working on them.

@ruslandoga (Contributor, Author) commented:

Hi @epinault! I'm currently not working on this. I do plan to return to it this month, though.


ruslandoga commented Apr 29, 2024

Sorry for disappearing and not keeping my promise :) I'm going to continue working on my PRs later this week… in August… later this year!

ruslandoga changed the title from "performance degradation over time with ETS backend" to "Зerformance degradation over time with ETS backend" on Sep 2, 2024
ruslandoga changed the title from "Зerformance degradation over time with ETS backend" to "Performance degradation over time with ETS backend" on Sep 2, 2024