spread cluster start fails to start up cluster #63

Open
hharnisc opened this issue May 24, 2016 · 7 comments

Comments

@hharnisc

It's unclear how I got in this state, but I'm not able to start up a localkube cluster.

I've tried stopping and starting the cluster, removing all images and containers, re-creating the docker machine, and even re-installing docker.

The container seems to restart continuously:

CONTAINER ID        IMAGE                            COMMAND             CREATED             STATUS                         PORTS               NAMES
5a1e299c3124        redspreadapps/localkube:latest   "start.sh"          15 minutes ago      Restarting (0) 2 minutes ago                       localkube

When I grab the container logs (docker logs 5a1e299c3124) I get the following:

0bb5c03101f0f473218733b67258b04c07176225413651703e62295686adc014
1ef0f1618a97621cf0cca908d428cf466d3dc4b5f8ac4c1112d8829bb31dc147
10.32.0.1
Starting LocalKube...
Starting etcd...
2016-05-24 14:27:53.477939 I | etcdserver: recovered store from snapshot at index 460046
2016-05-24 14:27:53.478088 I | etcdserver: name = kubeetcd
2016-05-24 14:27:53.478126 I | etcdserver: data dir = /var/localkube/data
2016-05-24 14:27:53.478152 I | etcdserver: member dir = /var/localkube/data/member
2016-05-24 14:27:53.478175 I | etcdserver: heartbeat = 100ms
2016-05-24 14:27:53.478197 I | etcdserver: election = 1000ms
2016-05-24 14:27:53.478218 I | etcdserver: snapshot count = 10000
2016-05-24 14:27:53.478245 I | etcdserver: advertise client URLs = http://localhost:2379
2016-05-24 14:27:53.478289 I | etcdserver: loaded cluster information from store: <nil>
2016-05-24 14:27:54.295145 C | etcdserver: read wal error (walpb: crc mismatch) and cannot be repaired
Plugin is not running.
@mfburnett
Member

Hey @hharnisc, try to stop localkube and remove all containers with spread cluster stop -r and then restart with spread cluster start - let me know if that fixes it.

@hharnisc
Author

@mfburnett still no luck

$ spread cluster stop -r
Stopping container '5a1e299c3124b361b895f1279f612f1174f7c5e2e9b5287a8ae077b12708f803'
Removing container '5a1e299c3124b361b895f1279f612f1174f7c5e2e9b5287a8ae077b12708f803'

then starting it

$ spread cluster start
Creating localkube container...
Starting localkube container...

then checking the cluster

$ kubectl cluster-info
The connection to the server 192.168.99.100:8080 was refused - did you specify the right host or port?

@hharnisc
Author

Looking at that log, it looks like etcd is failing while reading its write-ahead log (the "walpb: crc mismatch" error). It is potentially blowing up here: https://github.com/coreos/etcd/blob/master/wal/wal.go#L271
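
A quick way to confirm this from the container logs is to search for the fatal WAL line. This is a minimal sketch, assuming the container is named localkube as in the docker ps output above; the has_wal_corruption helper is a hypothetical name and is not part of spread or etcd.

```shell
# Hypothetical helper: succeeds (exit status 0) if the log text on stdin
# contains the etcd WAL checksum error seen in this issue.
has_wal_corruption() {
  grep -q 'walpb: crc mismatch'
}

# Usage (assumes a container named "localkube"):
#   docker logs localkube 2>&1 | has_wal_corruption && echo "etcd WAL is corrupted"
```

If the check succeeds, restarting the container alone is unlikely to help, since the log above says the WAL "cannot be repaired".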

@hharnisc
Author

@mfburnett @ethernetdan does localkube cache anything on the host filesystem?

@hharnisc
Author

rm -rf ~/.localkube seems to have got me unstuck. I wish I had thought to keep a copy of the data in there so you could use it to debug. If it happens again I'll be sure to include it.
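
For anyone who hits this later and wants to keep the data for debugging, moving the directory aside works as well as deleting it. This is a rough sketch, assuming ~/.localkube is where the etcd data ends up on the host, as the fix above suggests; backup_localkube_data is a hypothetical helper name, not a spread command.

```shell
# Hypothetical helper: move a localkube data directory aside instead of
# deleting it, so the corrupted etcd files stay available for debugging.
# Takes an optional directory argument (defaults to ~/.localkube) and
# prints the backup path on success.
backup_localkube_data() {
  data_dir="${1:-$HOME/.localkube}"
  backup_dir="${data_dir}.bak.$(date +%Y%m%d%H%M%S)"
  [ -d "$data_dir" ] || { echo "no such directory: $data_dir" >&2; return 1; }
  mv "$data_dir" "$backup_dir" && echo "$backup_dir"
}

# Usage, with the spread commands from this thread:
#   spread cluster stop -r
#   backup_localkube_data
#   spread cluster start
```

The backup directory can then be attached to the issue or inspected later, and removed once it is no longer needed.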

@mfburnett
Member

@hharnisc hm glad you got unstuck, thanks for documenting it!

@ibmendoza

It also happened to me under Turnkey Linux 14.1, but fortunately the commands below worked.
Thanks @mfburnett

spread cluster stop -r

spread cluster start
