
Week 30: Finishing the naive Load Balancer #5

Open
mrdude opened this issue Mar 22, 2017 · 2 comments
mrdude commented Mar 22, 2017

@twood02

What have I been doing?

I've mostly been working on fixing bugs in my naive load balancer implementation. This "load balancer" just forwards all connections to the first backend. As of commit 0d7865, this is mostly done: Athena will happily accept and forward Vegeta's connections for a while, but after roughly 20k packets Vegeta starts reporting that its connections are being rejected. I don't know what's going on yet; I'm still scanning the pcaps in Wireshark.
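For illustration, the selection logic in this policy boils down to something like the sketch below (the struct and function names are placeholders, not Athena's actual code):

```c
#include <stdint.h>

#define MAX_BACKENDS 16

/* Placeholder types; the real Athena structures likely look different. */
struct backend {
    uint32_t ip;    /* backend IPv4 address */
    uint16_t port;  /* backend TCP port */
};

static struct backend backends[MAX_BACKENDS];

/* "Naive" policy: ignore load entirely and map every new connection
 * (i.e. every incoming SYN) to the first backend. */
static struct backend *
naive_select_backend(void)
{
    return &backends[0];
}
```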

Milestones

  • TCP replay
  • add naive LB - need to finish debugging
    • add a SELECTING_BACKEND state to the main state machine
    • allow SELECTING_BACKEND to queue up packets from the client (see the sketch after this list)
  • pluggable LBs - allow the load balancer to be specified on the command line
    • round robin
    • least connections
    • smart queue time algorithm
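Rough sketch of how the SELECTING_BACKEND state could queue client packets -- the type names and the fixed-size pending queue are assumptions, not the real state machine:

```c
#include <stdint.h>

#define MAX_PENDING 32

struct backend;   /* as in the earlier sketch */
struct packet;    /* opaque handle for whatever packet type Athena uses */

/* Hypothetical connection states; SELECTING_BACKEND buffers client
 * packets until a backend has been chosen, then flushes them. */
enum conn_state {
    CONN_CLOSED,
    CONN_SELECTING_BACKEND,
    CONN_ESTABLISHED,
};

struct conn {
    enum conn_state state;
    struct backend *backend;              /* set once a backend is chosen */
    struct packet *pending[MAX_PENDING];  /* queued while selecting */
    int num_pending;
};

/* Buffer a client packet while no backend has been selected yet. */
static int
conn_queue_pending(struct conn *c, struct packet *pkt)
{
    if (c->num_pending >= MAX_PENDING)
        return -1;   /* queue full; caller has to drop or back-pressure */
    c->pending[c->num_pending++] = pkt;
    return 0;
}
```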

What am I doing this week?

Once the naive LB implementation works, I'm going to move on to implementing pluggable load balancers. By the end of this week, I want to have implementations for Naive, Round Robin, and Least Connections done.
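One way this could be structured -- a sketch under my own assumptions, not a final design -- is a function-pointer table keyed by the name passed on the command line (the flag name, struct fields, and function names below are all made up):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

struct backend {
    uint32_t ip;
    uint16_t port;
    int active_conns;   /* open connections, used by least-connections */
};

struct lb_policy {
    const char *name;
    struct backend *(*select)(struct backend *b, int n);
};

static struct backend *select_naive(struct backend *b, int n)
{
    (void)n;
    return &b[0];                /* always the first backend */
}

static struct backend *select_round_robin(struct backend *b, int n)
{
    static int next;
    struct backend *pick = &b[next];
    next = (next + 1) % n;       /* cycle through backends in order */
    return pick;
}

static struct backend *select_least_conns(struct backend *b, int n)
{
    struct backend *best = &b[0];
    for (int i = 1; i < n; i++)
        if (b[i].active_conns < best->active_conns)
            best = &b[i];        /* fewest open connections wins */
    return best;
}

static const struct lb_policy policies[] = {
    { "naive",       select_naive },
    { "round-robin", select_round_robin },
    { "least-conns", select_least_conns },
};

/* Look up a policy by the name given on the command line,
 * e.g. something like --lb=round-robin. */
static const struct lb_policy *lb_policy_by_name(const char *name)
{
    for (size_t i = 0; i < sizeof(policies) / sizeof(policies[0]); i++)
        if (strcmp(policies[i].name, name) == 0)
            return &policies[i];
    return NULL;
}
```

With a table like this, adding another algorithm later is just another row.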

Once I have all of these algorithms implemented, I can compare their performance using Vegeta's reported stats. This will be good to have in my presentation; I can create a graph comparing 95th-percentile latency for each algorithm.

Other Misc TODO items

  • figure out how to get ATH_ASSERT() working
  • clean up code so it follows the style guide
  • (maybe) add a TIME_WAIT state to the state machine -- keep forwarding connections for 4 minutes after a RST?

Potential Roadblocks

In the current implementation, Athena assumes that it has the same IP as its load balancer backends. Because of this, Athena doesn't have to understand ARP; it just blindly forwards any non-TCP packets it gets, and the networking stack in the backend's kernel handles everything else.
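Concretely, the pass-through behavior amounts to something like this (constants inlined, names illustrative):

```c
#include <stdint.h>

enum action { FORWARD_UNCHANGED, RUN_LB_STATE_MACHINE };

/* Only TCP-over-IPv4 goes through the load-balancing state machine;
 * everything else (ARP, ICMP, UDP, ...) is forwarded untouched. */
static enum action
classify(uint16_t ethertype, uint8_t ip_proto)
{
    /* 0x0800 = IPv4 ethertype, 6 = TCP protocol number */
    if (ethertype != 0x0800 || ip_proto != 6)
        return FORWARD_UNCHANGED;
    return RUN_LB_STATE_MACHINE;
}
```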

Ideally, one would be able to run Athena on a server as a reverse proxy for a cluster of web servers. This would require Athena to: 1) respond to ARP requests for its IP, and 2) read incoming ARP packets so that Athena can patch the ethernet headers (as well as the TCP and IP headers) while routing.

I'd really rather not have to write code to understand ARP right now; considering how long it took me to iron out the bugs in TCP replay, I don't have time to get Athena to understand another protocol (even one as relatively straightforward as ARP). For the time being, Athena is going to assume that it shares an IP with its backends. If I still have time after implementing the rest of my intended milestone features, I'll add ARP support to Athena.
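For whenever I do get to it, requirement 1) above would look roughly like the sketch below (field layout per RFC 826 for Ethernet/IPv4; the function itself and how the frame gets sent back out are assumptions, not existing Athena code):

```c
#include <arpa/inet.h>   /* htons, ntohs */
#include <stdint.h>
#include <string.h>

/* Minimal ARP body for Ethernet/IPv4 (RFC 826). */
struct arp_eth_ipv4 {
    uint16_t htype;       /* 1 = Ethernet */
    uint16_t ptype;       /* 0x0800 = IPv4 */
    uint8_t  hlen, plen;  /* 6, 4 */
    uint16_t oper;        /* 1 = request, 2 = reply */
    uint8_t  sha[6];      /* sender MAC */
    uint8_t  spa[4];      /* sender IP */
    uint8_t  tha[6];      /* target MAC */
    uint8_t  tpa[4];      /* target IP */
} __attribute__((packed));

/* Rewrite a request for our IP into a reply in place.
 * Returns 0 if the frame should be sent back out, -1 if it should be
 * ignored. The caller would also need to fix up the Ethernet header. */
static int
arp_answer_request(struct arp_eth_ipv4 *arp,
                   const uint8_t my_mac[6], const uint8_t my_ip[4])
{
    if (ntohs(arp->oper) != 1 || memcmp(arp->tpa, my_ip, 4) != 0)
        return -1;                    /* not a request for our IP */
    arp->oper = htons(2);             /* request -> reply */
    memcpy(arp->tha, arp->sha, 6);    /* reply targets the requester */
    memcpy(arp->tpa, arp->spa, 4);
    memcpy(arp->sha, my_mac, 6);      /* we are the sender now */
    memcpy(arp->spa, my_ip, 4);
    return 0;
}
```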

mrdude self-assigned this Mar 22, 2017
mrdude added a commit that referenced this issue Mar 22, 2017

mrdude commented Mar 22, 2017

Here are the pcaps I've been looking at: nn27, nn29. Nimbnode27 has an IP of 11.0.0.27 and hosts the backend webservers at 11.0.0.27:81, :85, :90, :95, and :100. Nimbnode29 has an IP of 11.0.0.29 and sends the client requests. Athena runs on Nimbnode28 and routes all packets that pass between nn27 and nn29.

I've been looking at them in Wireshark. Everything seems to be working nicely until this point:

Wireshark screencap (link to larger image)

Wireshark notes that TCP port numbers are beginning to be reused; I suspect that this is causing Athena to get confused about connection states.


twood02 commented Mar 22, 2017

Yes, I was going to ask about this -- at some point ports will be reused, so if you aren't clearing out old connections that is likely to be an issue. Detecting the close of a connection can be a bit tricky (ordering of FIN/RST isn't always consistent). For your purposes it may be fine to just detect when a port is being reused (new SYN) and recognize that means you need to reset your state machine for that connection.
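Roughly (names made up, just to illustrate the check):

```c
#include <stdbool.h>
#include <stdint.h>

/* A SYN arriving on a 4-tuple that is already tracked (and not closed)
 * means the client reused the port, so the old per-connection state
 * should be discarded and the state machine restarted from scratch. */
struct flow_key {
    uint32_t client_ip, server_ip;
    uint16_t client_port, server_port;
};

enum conn_state { CONN_CLOSED, CONN_SELECTING_BACKEND, CONN_ESTABLISHED };

struct conn {
    struct flow_key key;
    enum conn_state state;
};

static bool
syn_means_port_reuse(const struct conn *existing, bool is_syn)
{
    /* existing == NULL means this 4-tuple has never been seen */
    return is_syn && existing != NULL && existing->state != CONN_CLOSED;
}
```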
