Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new 'runtime extension' system to MicroCluster #86

Merged
merged 15 commits into from
Apr 26, 2024

Conversation

gabrielmougard
Copy link
Contributor

@gabrielmougard gabrielmougard commented Feb 12, 2024

Introducing an 'extensions' endpoint in MicroCluster would help in scenarios where service A (backed by MicroCluster) needs to check that service B (also backed by MicroCluster) has a certain feature (e.g, does the installed version of MicroOVN supports custom encapsulation ip that has been entered by a user in a MicroCloud setup)

Required by canonical/microovn#121 and canonical/microovn#113 and canonical/microcloud#245

@gabrielmougard gabrielmougard changed the title Resources: Add a new featuresCmd to the public cluster API Add a new 'runtime extension' system to MicroCluster Feb 16, 2024
Copy link
Contributor

@masnax masnax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, just a couple things to change:

Firstly, we can't rely on the database to store the schema extensions for two reasons:

  • If there's an API extension mismatch, we we would have to shut down the database which is expensive and slow.
  • More importantly, when joining a cluster, you don't have an open database yet, so when you set up the join operation you can't tell the existing cluster what API extensions you're able to support.

Instead I think we should follow an approach similar to LXD where we have a []string of API extensions defined. In microcluster's case, this should be a function rather than a global variable, like we have for schema extensions here:

func NewSchema() *SchemaUpdateManager {
return &SchemaUpdateManager{
updates: map[int]schema.Update{
1: updateFromV0,
},
}
}

If the extensions are stored in memory, they will update immediately when we restart the daemon on an update, or start it for the first time.

Additionally, I think we should not expose the concept of vendor and just have one big set of extensions, and error out if there's a conflict. Internally, we should have 2 lists for internal and external but when a user consumes from the API we should show them just 1 unified list.

FInally, to handle the API extensions on the joining side, we need to pass the lists with rest of the joining information. When we ask to join an existing cluster, we send that cluster a JoinConfig payload. We should include the list of schema extensions so the existing cluster can see if we are compatible, and only then allow us to open the database:

// Prepare the cluster for the incoming dqlite request by creating a database entry.
newClusterMember := internalTypes.ClusterMember{
ClusterMemberLocal: internalTypes.ClusterMemberLocal{
Name: localClusterMember.Name,
Address: localClusterMember.Address,
Certificate: localClusterMember.Certificate,
},
SchemaVersion: state.Database.Schema().Version(),
Secret: token.Secret,
}

The existing cluster should then record the size in the internal_cluster_members table. Here I think we should have 2 new columns for internal_api_extensions and external_api_extensions which store the size of the respective extension sets, which will represent the "api version". We already do this for the schema version:

err = s.Database.Transaction(s.Context, func(ctx context.Context, tx *sql.Tx) error {
dbClusterMember := cluster.InternalClusterMember{
Name: req.Name,
Address: req.Address.String(),
Certificate: req.Certificate.String(),
Schema: req.SchemaVersion,
Heartbeat: time.Time{},
Role: cluster.Pending,
}

cluster/runtime_extensions.go Outdated Show resolved Hide resolved
internal/daemon/daemon.go Outdated Show resolved Hide resolved
@tomponline
Copy link
Member

Additionally, I think we should not expose the concept of vendor and just have one big set of extensions, and error out if there's a conflict.

Agreed.

@gabrielmougard
Copy link
Contributor Author

@masnax @tomponline let me know what you think about this new take. It's interesting to see this PR with canonical/microovn#121 as it shows how an integration can be easily done with a MicroCluster-based service.

Copy link
Contributor

@masnax masnax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good, just one more nit about the naming structure and one last point to add:

What happens on an upgrade? If you have a cluster of microovns currently sitting at 10 API extensions, and you snap refresh one of them so it has 11 API extensions, how should the cluster behave?

It looks like right now you're registering the API extensions on start, and comparing them to other nodes when a join occurs. That means the cluster will ignore extension changes when a node is upgraded. So we also need to compare extensions if we've restarted our node (which would happen on snap refresh).

We do this for schema versions here:

func (db *DB) Open(bootstrap bool, project string) error {
ctx, cancel := context.WithTimeout(db.ctx, 30*time.Second)
defer cancel()
err := db.dqlite.Ready(ctx)
if err != nil {
return err
}
db.db, err = db.dqlite.Open(db.ctx, db.dbName)
if err != nil {
return err
}
otherNodesBehind := false
newSchema := db.Schema()
if !bootstrap {
checkVersions := func(ctx context.Context, current int, tx *sql.Tx) error {
schemaVersion := newSchema.Version()
err = cluster.UpdateClusterMemberSchemaVersion(tx, schemaVersion, db.listenAddr.URL.Host)
if err != nil {
return fmt.Errorf("Failed to update schema version when joining cluster: %w", err)
}
versions, err := cluster.GetClusterMemberSchemaVersions(ctx, tx)
if err != nil {
return fmt.Errorf("Failed to get other members' schema versions: %w", err)
}
for _, version := range versions {
if schemaVersion == version {
// Versions are equal, there's hope for the
// update. Let's check the next node.
continue
}
if schemaVersion > version {
// Our version is bigger, we should stop here
// and wait for other nodes to be upgraded and
// restarted.
otherNodesBehind = true
return schema.ErrGracefulAbort
}
// Another node has a version greater than ours
// and presumeably is waiting for other nodes
// to upgrade. Let's error out and shutdown
// since we need a greater version.
return fmt.Errorf("this node's version is behind, please upgrade")
}
return nil
}
newSchema.Check(checkVersions)
}
err = db.retry(func() error {
_, err = newSchema.Ensure(db.db)
return err
})
// Check if other nodes are behind before checking the error.
if otherNodesBehind {
// If we are not bootstrapping, wait for an upgrade notification, or wait a minute before checking again.
if !bootstrap {
logger.Warn("Waiting for other cluster members to upgrade their versions", logger.Ctx{"address": db.listenAddr.String()})
select {
case <-db.upgradeCh:
case <-time.After(time.Minute):
}
}
return err
}
if err != nil {
return err
}
err = cluster.PrepareStmts(db.db, project, false)
if err != nil {
return err
}
db.openCanceller.Cancel()
return nil
}

How this works is, if we open the database for any reason other than a bootstrap (so a join or a restart), we compare the schema version in the database to what we have registered locally. First, we update our node's version in internal_cluster_members so other nodes can compare themselves to us as well. The comparison works like this:

  • If our locally registered schema version matches the all other nodes' versions recorded in internal_cluster_members then we open the database and proceed as normal.
  • If our locally registered schema version is smaller than another node, that means we need to upgrade locally so we error out.
  • If our locally registered schema version is larger than another node, that means we need to wait until the other nodes upgrade. So we block on db.upgradeCh:
    if otherNodesBehind {
    // If we are not bootstrapping, wait for an upgrade notification, or wait a minute before checking again.
    if !bootstrap {
    logger.Warn("Waiting for other cluster members to upgrade their versions", logger.Ctx{"address": db.listenAddr.String()})
    select {
    case <-db.upgradeCh:
    case <-time.After(time.Minute):
    }
    }
    This channel closes when we make this PATCH request to dqlite:
    // Send notification that this node is upgraded to all other cluster members.
    err = cluster.Query(d.ShutdownCtx, true, func(ctx context.Context, c *client.Client) error {
    path := c.URL()
    parts := strings.Split(string(internalClient.InternalEndpoint), "/")
    parts = append(parts, "database")
    path = *path.Path(parts...)
    upgradeRequest, err := http.NewRequest("PATCH", path.String(), nil)
    if err != nil {
    return err
    }
    upgradeRequest.Header.Set("X-Dqlite-Version", fmt.Sprintf("%d", 1))
    upgradeRequest = upgradeRequest.WithContext(ctx)
    resp, err := c.Client.Do(upgradeRequest)
    Any node that successfully opens its database (detects that its version matches all other records in internal_cluster_members) will make the PATCH requests to all nodes excluding itself, unblocking them.

So what I think you'll need to do here is modify UpdateClusterMemberSchemaVersion and GetClusterMemberSchemaVersion to update/return both the schema and internal and external API versions, and make sure all 3 are checked and updated in the checkVersions function inside db.Open.

internal/db/extensions.go Outdated Show resolved Hide resolved
internal/db/extensions.go Outdated Show resolved Hide resolved
internal/db/extensions.go Outdated Show resolved Hide resolved
@tomponline
Copy link
Member

@masnax does microcloud request the other members do a snap refresh when one gets upgraded like LXD does?

@masnax
Copy link
Contributor

masnax commented Feb 21, 2024

@masnax does microcloud request the other members do a snap refresh when one gets upgraded like LXD does?

I didn't know about this, sounds interesting.

@tomponline
Copy link
Member

@masnax does microcloud request the other members do a snap refresh when one gets upgraded like LXD does?

I didn't know about this, sounds interesting.

I think its this:

https://github.com/search?q=repo%3Acanonical%2Flxd%20LXD_CLUSTER_UPDATE&type=code
https://github.com/canonical/lxd-pkg-snap/blob/e896009ab5a74aa6ca9372d220015cbbf9919d7f/snapcraft/commands/daemon.start#L27

@sabaini
Copy link

sabaini commented Feb 21, 2024

Thanks, this looks useful! As a general comment, it would be great to have the example app updated to make use of the extension by way of documentation

@gabrielmougard gabrielmougard force-pushed the feat/feature-api branch 3 times, most recently from 23e6d1e to 1de2151 Compare February 23, 2024 10:55
Copy link
Contributor

@masnax masnax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few more things and I think we should be good to go :)

internal/db/db.go Outdated Show resolved Hide resolved
internal/extensions/extensions.go Outdated Show resolved Hide resolved
cluster/cluster_members.go Outdated Show resolved Hide resolved
internal/extensions/extensions.go Outdated Show resolved Hide resolved
@gabrielmougard
Copy link
Contributor Author

gabrielmougard commented Feb 27, 2024

@masnax I have an open question: it'd be nice to integrate the testing of this extension system in, let say, the MicroCloud integration test suite. Is it the best place for testing such thing? I was thinking about these test scenarios (correct me if I'm missing a test case):

(If the items below are checked, it means that I successfully managed to make the scenario work manually)

Bootstrap a single node

  • Bootstrap a node (with extensions)
$ microovn cluster bootstrap
$ microovn cluster sql "SELECT * FROM internal_cluster_members;"
+----+-------------+-------------------+------------------------------------------------------------------+--------+-------------------------------------+-------+-------------------------------+-------------------------+
| id |    name     |      address      |                           certificate                            | schema |              heartbeat              | role  |    internal_api_extensions    | external_api_extensions |
+----+-------------+-------------------+------------------------------------------------------------------+--------+-------------------------------------+-------+-------------------------------+-------------------------+
| 1  | gab-desktop | 192.168.1.38:6443 | -----BEGIN CERTIFICATE-----                                      | 4      | 2024-02-27T15:50:46.148257347+01:00 | voter | internal:runtime_extension_v1 | custom_encapsulation_ip |
|    |             |                   | MIIB8jCCAXigAwIBAgIQHZLPsKL7yExQMe4nsh6jcTAKBggqhkjOPQQDAzApMQww |        |                                     |       |                               |                         |
|    |             |                   | CgYDVQQKEwNMWEQxGTAXBgNVBAMMEHJvb3RAZ2FiLWRlc2t0b3AwHhcNMjQwMjI3 |        |                                     |       |                               |                         |
|    |             |                   | MTQ1MDE0WhcNMzQwMjI0MTQ1MDE0WjApMQwwCgYDVQQKEwNMWEQxGTAXBgNVBAMM |        |                                     |       |                               |                         |
|    |             |                   | EHJvb3RAZ2FiLWRlc2t0b3AwdjAQBgcqhkjOPQIBBgUrgQQAIgNiAASk9o+DEQ08 |        |                                     |       |                               |                         |
|    |             |                   | aRlr5AR+zgv6LOHaSb29T21tn4G18vnCF9sApZ/KmxHhM+5W4qHKACyJN2dHanz5 |        |                                     |       |                               |                         |
|    |             |                   | WW43hLQad6w3fnB8SFQ53uLiDKtZb68KeMkgWk3oDJxeBNDtHAXi98SjZTBjMA4G |        |                                     |       |                               |                         |
|    |             |                   | A1UdDwEB/wQEAwIFoDATBgNVHSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAA |        |                                     |       |                               |                         |
|    |             |                   | MC4GA1UdEQQnMCWCC2dhYi1kZXNrdG9whwR/AAABhxAAAAAAAAAAAAAAAAAAAAAB |        |                                     |       |                               |                         |
|    |             |                   | MAoGCCqGSM49BAMDA2gAMGUCMQCMjiPVww/dPoRCiRbjQ5iE3axQaLadYtbWeVtx |        |                                     |       |                               |                         |
|    |             |                   | LnUfGtEND3rBioT4m+BRdvdkiR8CMDdiPfemuHl7vyJtE2YzhNXwYqUhrjZbcdDT |        |                                     |       |                               |                         |
|    |             |                   | TCO+zbs9KVSnJ/h8J6udd8sqJEnjVw==                                 |        |                                     |       |                               |                         |
|    |             |                   | -----END CERTIFICATE-----                                        |        |                                     |       |                               |                         |
|    |             |                   |                                                                  |        |                                     |       |                               |                         |
+----+-------------+-------------------+------------------------------------------------------------------+--------+-------------------------------------+-------+-------------------------------+-------------------------+

Bootstrap a single node + Join other nodes

  • Bootstrap a node (latest API extensions) + Join nodes (all with latest API extensions)
# microovn cluster sql "SELECT * FROM internal_cluster_members;"                                                                                                                                                                                                                 
+----+------+--------------------+------------------------------------------------------------------+--------+--------------------------------+---------+-------------------------------+-------------------------+                                                                       
| id | name |      address       |                           certificate                            | schema |           heartbeat            |  role   |    internal_api_extensions    | external_api_extensions |                                                                       
+----+------+--------------------+------------------------------------------------------------------+--------+--------------------------------+---------+-------------------------------+-------------------------+                                                                       
| 1  | m1   | 10.12.134.172:6443 | -----BEGIN CERTIFICATE-----                                      | 4      | 2024-02-27T15:22:52.674445162Z | voter   | internal:runtime_extension_v1 | custom_encapsulation_ip |                                                                       
|    |      |                    | MIIB2TCCAV6gAwIBAgIRAKPkwsOwp7CRhHkPTIMXBtcwCgYIKoZIzj0EAwMwIDEM |        |                                |         |                               |                         |                                                                       
|    |      |                    | MAoGA1UEChMDTFhEMRAwDgYDVQQDDAdyb290QG0xMB4XDTI0MDIyNzE1MjAwOFoX |        |                                |         |                               |                         |                                                                       
|    |      |                    | DTM0MDIyNDE1MjAwOFowIDEMMAoGA1UEChMDTFhEMRAwDgYDVQQDDAdyb290QG0x |        |                                |         |                               |                         |                                                                       
|    |      |                    | MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEOYmP6WD3zIkoCJzKsUfwaUKjcWvog5qn |        |                                |         |                               |                         |                                                                       
|    |      |                    | V+Sm8dOsZIN2lvMQ3CoQJHpP4WJzHrovGotQsml+lfwLkBAwjLHuyUprXgETsluQ |        |                                |         |                               |                         |                                                                       
|    |      |                    | rYQ2kfK0/F62RuEphJDoI6JEQrp4Cmoao1wwWjAOBgNVHQ8BAf8EBAMCBaAwEwYD |        |                                |         |                               |                         |                                                                       
|    |      |                    | VR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAlBgNVHREEHjAcggJtMYcE |        |                                |         |                               |                         |                                                                       
|    |      |                    | fwAAAYcQAAAAAAAAAAAAAAAAAAAAATAKBggqhkjOPQQDAwNpADBmAjEAw5P5NOWw |        |                                |         |                               |                         |                                                                       
|    |      |                    | qdYvQo/owHQ6IyjMRVwQJAAaCQM+rjqiruPJqbnk6DN9adjnbR+PtUL5AjEA2JrG |        |                                |         |                               |                         |                                                                       
|    |      |                    | /SnDJUDtHpFpBum8frHmEdFZRWg+0EHjfhiJXOwPvpUl2NbLUu/5EIpm1XQz     |        |                                |         |                               |                         |                                                                       
|    |      |                    | -----END CERTIFICATE-----                                        |        |                                |         |                               |                         |                                                                       
|    |      |                    |                                                                  |        |                                |         |                               |                         |                                                                       
| 2  | m2   | 10.12.134.224:6443 | -----BEGIN CERTIFICATE-----                                      | 4      | 2024-02-27T15:22:52.72024209Z  | spare   | internal:runtime_extension_v1 | custom_encapsulation_ip |                                                                       
|    |      |                    | MIIB2DCCAV2gAwIBAgIQFlAERncV7B4f9kAALIgaFzAKBggqhkjOPQQDAzAgMQww |        |                                |         |                               |                         |                                                                       
|    |      |                    | CgYDVQQKEwNMWEQxEDAOBgNVBAMMB3Jvb3RAbTIwHhcNMjQwMjI3MTUyMTAwWhcN |        |                                |         |                               |                         |                                                                       
|    |      |                    | MzQwMjI0MTUyMTAwWjAgMQwwCgYDVQQKEwNMWEQxEDAOBgNVBAMMB3Jvb3RAbTIw |        |                                |         |                               |                         |                                                                       
|    |      |                    | djAQBgcqhkjOPQIBBgUrgQQAIgNiAATdXazF/1TloC8Mj3BbRTrUkGajC/ntu5Hz |        |                                |         |                               |                         |                                                                       
|    |      |                    | EKlnHVyxf04ELGECf/4ts0ZV5OZKRaeVY0aL25JalNI5LiNQtFVjk50tGEUf4Tq4 |        |                                |         |                               |                         |                                                                       
|    |      |                    | 4vtqfbO+hTChsQj2JcGTTSLRgcgM9YKjXDBaMA4GA1UdDwEB/wQEAwIFoDATBgNV |        |                                |         |                               |                         |                                                                       
|    |      |                    | HSUEDDAKBggrBgEFBQcDATAMBgNVHRMBAf8EAjAAMCUGA1UdEQQeMByCAm0yhwR/ |        |                                |         |                               |                         |                                                                       
|    |      |                    | AAABhxAAAAAAAAAAAAAAAAAAAAABMAoGCCqGSM49BAMDA2kAMGYCMQDbL60SLoIB |        |                                |         |                               |                         |                                                                       
|    |      |                    | My/vYjrVKCfrbfF9CLc6AOXPPydDelOPWOgWg3gZoTAt7tvJAyNQiQsCMQDXf98D |        |                                |         |                               |                         |
|    |      |                    | 8ILcZd8msxkb5mrPlR/83AQqUTk5EUP0GOifcKgq2vNqb3t94iI7AaZHDi8=     |        |                                |         |                               |                         |
|    |      |                    | -----END CERTIFICATE-----                                        |        |                                |         |                               |                         |
|    |      |                    |                                                                  |        |                                |         |                               |                         |
| 3  | m3   | 10.12.134.18:6443  | -----BEGIN CERTIFICATE-----                                      | 4      | 0001-01-01T00:00:00Z           | PENDING | internal:runtime_extension_v1 | custom_encapsulation_ip |
|    |      |                    | MIIB2TCCAV6gAwIBAgIRALrfP+aLkHo91BWQBg3IR44wCgYIKoZIzj0EAwMwIDEM |        |                                |         |                               |                         |
|    |      |                    | MAoGA1UEChMDTFhEMRAwDgYDVQQDDAdyb290QG0zMB4XDTI0MDIyNzE1MjEyNloX |        |                                |         |                               |                         |
|    |      |                    | DTM0MDIyNDE1MjEyNlowIDEMMAoGA1UEChMDTFhEMRAwDgYDVQQDDAdyb290QG0z |        |                                |         |                               |                         |
|    |      |                    | MHYwEAYHKoZIzj0CAQYFK4EEACIDYgAEBMOUTApb0ob3dXBaQ8EYk9HvRd6w1sHg |        |                                |         |                               |                         |
|    |      |                    | o3IQ+WMZzZLEaAn8jBITXlGybi3fbOPXJMe5R7vHXe+yztOXT2I/TGZqKNUYltsE |        |                                |         |                               |                         |
|    |      |                    | u0G5whdCnMLeXCSQHTd/Q7hdpDpLmKFwo1wwWjAOBgNVHQ8BAf8EBAMCBaAwEwYD |        |                                |         |                               |                         |
|    |      |                    | VR0lBAwwCgYIKwYBBQUHAwEwDAYDVR0TAQH/BAIwADAlBgNVHREEHjAcggJtM4cE |        |                                |         |                               |                         |
|    |      |                    | fwAAAYcQAAAAAAAAAAAAAAAAAAAAATAKBggqhkjOPQQDAwNpADBmAjEAuVnEl+Gw |        |                                |         |                               |                         |
|    |      |                    | pWMiz11Zdi0lvP5eAfKQ4LRj/wRDSkamWMzg0UphA4BacGERnD+k5jJ0AjEA2uUV |        |                                |         |                               |                         |
|    |      |                    | 5vyswMPwv0QcpfjmC6ZrH/haFbB0GQyX5eUBEkRfhFOcSgRSjv75yBlDhW+F     |        |                                |         |                               |                         |
|    |      |                    | -----END CERTIFICATE-----                                        |        |                                |         |                               |                         |
|    |      |                    |                                                                  |        |                                |         |                               |                         |
+----+------+--------------------+------------------------------------------------------------------+--------+--------------------------------+---------+-------------------------------+-------------------------+
  • Bootstrap a node (latest API extensions) + Join a node (without extensions system) : the join operation should error out.
microovn cluster join eyJuYW1lIjoibTIiLCJzZWNyZXQiOiIyNDhlNzExZDcwZDk3MDFkNzRhZTU0ZTUzM2M2NWVmNzcwMzRmNDM0N2RkZjBiYmM2ZWI3MjhkNGVjZmNkOGUyIiwiZmluZ2VycHJpbnQiOiI0ZTJkZTVjNTI1MTVkMTI3ZmNhZmNkMGY2OGNkNDNlY2Y4NDdjOWM0YzVhZGMzZGUyNzkzNjE1NTljMDU1NjFmIiwiam9pbl9hZGRyZXNzZXMiOlsiMTAuMTIuMTM0LjE3Mjo2NDQzIl19
Error: Failed to join cluster with the given join token 
  • [WIP] Bootstrap a node (latest API extensions) + Join a node (with more API extensions than the bootstrapped node) : the join operation should error out.
  • [WIP] Bootstrap a node (no extension system) + Join a node (with latest API extensions) : should error out.

Existing cluster + Upgrade of a node

  • Bootstrap a cluster (all nodes have no extension system) + Upgrade a node (with the latest extensions)
    @masnax I have a Unable to start daemon: Daemon failed to start: Failed to re-establish cluster connection: Failed to update schema version when joining cluster: no such column: internal_api_extensions. I thought that during a snap refresh (or in my case a snap install <local_snap_file>.snap --dangerous), the DB schema was re-applied. How to handle that case properly?

  • [WIP] Bootstrap a cluster (all nodes have an extension system but these are not the latest) + Upgrade a node (with the latest extensions) (@masnax how can you refresh a snap with a new .snap file ? We can't use snap refresh right?)


If we were to do that kind of testing, how would the setup be done in the test suite? Will we have to manually build different versions of MicroCloud with snapcraft prior to launching the test suite to be able to create these heterogeneous cluster? This approach seems quite complex and very time consuming... I wondered if you had a better idea.

@masnax
Copy link
Contributor

masnax commented Feb 28, 2024

* [ ]  Bootstrap a cluster (all nodes have no extension system) + Upgrade a node (with the latest extensions)
  @masnax I have a `Unable to start daemon: Daemon failed to start: Failed to re-establish cluster connection: Failed to update schema version when joining cluster: no such column: internal_api_extensions`. I thought that during a `snap refresh` (or in my case a `snap install <local_snap_file>.snap --dangerous`), the DB schema was re-applied. How to handle that case properly?

So what you're modifying here is the implementation of the checkVersion function, which is passed to newSchema.Check here: https://github.com/canonical/microcluster/pull/86/files#diff-7e4ba9264afd9eae1dce0c185a9141edf9c02bb44b0b76d19877389b71624006L77

And this basically just assigns the function to a field in newSchema:

// Check instructs the schema to invoke the given function whenever Ensure is
// invoked, before applying any due update. It can be used for aborting the
// operation.
func (s *SchemaUpdate) Check(check schema.Check) {
s.check = check
}

Finally, we call newSchema.Ensure here: https://github.com/canonical/microcluster/pull/86/files#diff-7e4ba9264afd9eae1dce0c185a9141edf9c02bb44b0b76d19877389b71624006R104-R107

But Ensure runs checkVersion before we apply the schema updates:

if s.check != nil {
err := s.check(ctx, current, tx)
if err == schema.ErrGracefulAbort {
// Abort the update gracefully, committing what
// we've done so far.
aborted = true
return nil
}
if err != nil {
return err
}
}

The schema updates come afterward, here:

// When creating the schema from scratch, use the fresh dump if
// available. Otherwise just apply all relevant updates.
if current == 0 && s.fresh != "" {
_, err = tx.ExecContext(ctx, s.fresh)
if err != nil {
return fmt.Errorf("cannot apply fresh schema: %w", err)
}
} else {
err = ensureUpdatesAreApplied(ctx, tx, current, s.updates, s.hook)
if err != nil {
return err
}
}

This is an unfortunate consequence of changing the schema update logic. In this case we need to update the relevant tables before do any checks. Sadly, I think we'll need to add a special case before we run checkVersion, rather than relying on an updateFromV{n}functions.

@masnax
Copy link
Contributor

masnax commented Feb 28, 2024

So in this case you'll have to grab the updateFromV1 function and call it before running checkVersion.

Looking at this function, it seems like it can break if it runs more than once, so we should probably change it to create a second table, then move the rows over from the old table, then delete the old table and rename the new table.

We occasionally do this in LXD like this: https://github.com/canonical/lxd/blob/898d3e02ff6b190e199d2dda4324cb0248a987c5/lxd/db/cluster/update.go#L221-L244

@masnax
Copy link
Contributor

masnax commented Feb 28, 2024

* [ ]  **[WIP]** Bootstrap a cluster (all nodes have an extension system but these are not the latest) + Upgrade a node (with the latest extensions) (@masnax how can you refresh a snap with a new `.snap` file ? We can't use `snap refresh` right?)

You can't use a new .snap file from my understanding, but you can leverage microcloudd.debug as a debug binary:

# Install the snap.
snap install --dangerous ./microcloud.snap

# Make a change to the codebase and build
make

# Copy the binary to the snap state directory (don't forget the double-d for the daemon binary).
cp "${GOPATH}/bin/microcloudd" /var/snap/microcloud/common/microcloudd.debug
systemctl restart snap.microcloud.daemon

This should be a close emulation of snap refresh for this purpose.

roosterfish
roosterfish previously approved these changes Apr 23, 2024
Copy link
Contributor

@roosterfish roosterfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reducing the complexity on the NewExtensionRegistry*() functions and fixing the regex. Looks good to me!

@tomponline
Copy link
Member

@markylaing please can you approve and merge if you're happy with this.

Copy link
Contributor

@markylaing markylaing left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry it's taken a while to get to this.

In general I don't think we need to pass around a *extensions.Extensions as a slice is already a pointer.

The comment I've added about the schema update will need to be addressed before merging but other comments can be addressed in a follow up PR if you choose. Thanks.

internal/extensions/extensions.go Outdated Show resolved Hide resolved
internal/db/update/update.go Outdated Show resolved Hide resolved
internal/extensions/extensions.go Outdated Show resolved Hide resolved
The `Extensions` registry struct holds both `internal` (MicroCluster related API extensions) and `external` (MicroCluster-based service API extensions) extensions
as in memory lookup tables.

The registry comes with some handy functions:

* Initializing logic (from 'empty' configuration (only internal extensions) or with an initial set of external extensions)
* Validation logic (regexp rules, no duplicates, not empty)
* Registration logic (update the external extensions)
* Serialization logic (API view and DB view of the registry)
* Compatibility logic (it is possible to compare two registry - it is useful during a cluster join where we'd want to assert that the joining node is compatible with the leader's extensions)

Signed-off-by: Gabriel Mougard <[email protected]>
…ernal` and `external`) in DB per cluster member

Signed-off-by: Gabriel Mougard <[email protected]>
…ly contains internal extensions) registry

Signed-off-by: Gabriel Mougard <[email protected]>
…ainst the leader's

During a cluster join, the joining node communicate its extensions as seen earlier.
When received (by the leader), we make sure that the target extensions (the joining node ones) and the source
extensions (the leader ones) by comparing them. If they the target ones are supported, we add it to the `internal_cluster_members` table.

Signed-off-by: Gabriel Mougard <[email protected]>
…I extensions

* In the bootstrap case, the `updateFromV2` function will be called in the end.
Because it is a bootstrap, there is no `s.check()` so this works fine.

* In the case of a node upgrade, we let the schema check happen normally, we
ensure all the shema updates are executed (`updateFromV2` should be executed in the Ensure() function)
and only after Ensure() has returned no error, we check the API extensions.

Signed-off-by: Gabriel Mougard <[email protected]>
…` parameter

In order to compare this node's extensions to the other nodes, we need to pass the daemon's extension
to the `db.Open` function so that the extensions check happens after the DB schema check.

Signed-off-by: Gabriel Mougard <[email protected]>
internal/db/db.go Outdated Show resolved Hide resolved
…n` function

Here, we are checking for the compatibility between this node's current extensions and the other node's.
In some cases, when this node's extensions are 'bigger', we allow the other nodes to be upgraded
in the hope to make all the system's extensions align properly. If the extensions sets are equal, we continue the checking process
as usual.

There is a case that is not recoverable like the current node lacking extensions compared to the other one. In this case, we indicate what are the
lacking extensions in the error messages.

Signed-off-by: Gabriel Mougard <[email protected]>
…aped internal cluster member

We want to store the API extensions for the initial internal cluster member.

Signed-off-by: Gabriel Mougard <[email protected]>
* Show how we can add API extensions at daemon startup (`example/api/extensions.go`)
* Show how we can manipulate API extensions within hooks.

Signed-off-by: Gabriel Mougard <[email protected]>
@masnax masnax dismissed markylaing’s stale review April 26, 2024 17:37

Changes addressed

@masnax
Copy link
Contributor

masnax commented Apr 26, 2024

I've tested that this works with new clusters and pre-existing clusters, so we can finally merge this one :)

@masnax masnax merged commit f877c78 into canonical:main Apr 26, 2024
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants