Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Manifests scanning performance improvements #68

Merged
merged 3 commits into from
Dec 12, 2023

Conversation

f1ames
Copy link
Contributor

@f1ames f1ames commented Dec 7, 2023

This PR fixes https://github.com/kubeshop/monokle-saas/issues/2258.

TL;DR: There are not many changes here, since the main issue was solved by recent PR. See Analysis below.

Changes

  • Added datetime to logs to see better what takes time.
  • Manifests scanning now ignores node_modules dirs by default. Since adding proper support for .gitignores is not trivial, I think this is a good enough solution for now (especially that it wasn't the cause here, at least not main one). For gitignores, I extracted separate issue, see reasoning behind this in the issue itself - Support .gitignore when scanning workspace for manifests #71.

Fixes

  • Deferred initial validation instead of keeping it a part of activation logic (cf85d68). This way activation time is shorter (you don't see Activating extensions status in VSC for so long) - especially that reading yamls from workspace (which is a part of validation) takes, like 10 - 20 seconds (see below).

Analysis

A bit on files finding

What is used now for scanning for manifests is native workspace.findeFiles. It takes into account files ignored in VSC via files.exclude but does not look on .gitignore.

Now measuring times for monokle-saas repo, it seems not finding files but parsing them takes most time:
image
and interestingly we got 50 resources from 837 files, which means most files are not K8s resources (I quickly check and just by ignoring node_modules we get to 137 files and less than 5 seconds).

Still the initial issue is not about a time, but resources. And finding yamls is the only place we do any find with globbing so this is a solid candidate for triggering resource intensive rg calls (and first thing to check is if excluding large dirs, like node_modules, makes it less CPU demanding).

On rg

I tested how rg behaves when switching branches with monokle-saas repo (from some old feature branch to main). But what I noticed is that it mostly happen when there are lots changes to pull, when switching to new branch for the first time. So the procedure for me was:

  1. Install specific extension version (from marketplace or local build).
  2. Open local test repo in VSCode.
  3. Switch to main branch (with latest changes) in local test repo.
  4. Restart VSCode.
  5. Switch to new branch/tag (old one, which is not checkouted locally).
  6. Run Monokle: Validate.

0.6.5

First, with current 0.6.5 version and results are similar as mentioned in initial issue, it goes wild a bit (output from atop with 1s interval, each screenshot is next snapshot):

image
image
image
image

With file watcher changes

There were significant changes how files are processed in #62. So I also tested with those changes (not released yet). Especially, we got rid of inefficient file finding logic - notice line 122 and 142 below:

return workspaceFolders.map((folder) => {
const pattern = new RelativePattern(folder.uri.fsPath, '**/*.{yaml,yml}');
const watcher = workspace.createFileSystemWatcher(pattern);
const resultFile = getValidationResultPath(folder.id);
const revalidateFolder = async () => {
logger.log('revalidateFolder', folder);
context.isValidating = true;
const uri = await validateFolder(folder);
if (uri) {
await context.sarifWatcher.add(Uri.file(resultFile));
} else {
await context.sarifWatcher.remove(Uri.file(resultFile));
}
context.isValidating = false;
};
const resetResourceCache = async (filePath: string) => {
const resourceId = await getResourceIdFromPath(folder, filePath);

☝️ So for every workspace there was a watcher using globbing (L122). And one thing which could happen is when you switch branches it was triggered for multiple files. And for each file it will get resource ids (L142).

async function getResourceIdFromPath(folder: Folder, path: string) {
const files = await findYamlFiles(folder.uri.fsPath);
const file = files.find(file => normalize(file.path) === normalize(path));
const resources = file ? await convertFilesToK8sResources([file]) : [];
return resources.pop()?.id ?? null;
}

☝️ Now getting resource id uses findYamlFiles for each file.

async function findYamlFiles(folderPath: string): Promise<File[]> {
const files = await workspace.findFiles(new RelativePattern(folderPath, '**/*.{yaml,yml}'));
return files
.map(file => {
const fullPath = file.fsPath;
return {
id: generateId(fullPath),
name: basename(file.fsPath),
path: fullPath
};
});
}

☝️ And findYamlFiles does scanning entire workspace again 😓 🙈 Sound like a good party...

Anyways, as mentioned this logic was reworked entirely. The results with new logic:

First run:

image

Second run:

image

This looks really good. Unfortunately, found a related regression too - #70. And even though as part of the test procedure Monokle: Validate is run, with the regression I also checked for rg processes during Monokle extension initialization to also check how first repo scanning behaves:

image
image
image
image

Still no rg party which is good 👍

Checklist

  • tested locally
  • added new dependencies
  • updated the docs
  • added a test

@f1ames f1ames changed the title F1ames/fix/performance improvements [WIP] Manifests scanning performance improvements Dec 11, 2023
@f1ames f1ames marked this pull request as ready for review December 11, 2023 13:36
@f1ames f1ames force-pushed the f1ames/fix/performance-improvements branch from cf85d68 to c41afe5 Compare December 11, 2023 13:44
@WitoDelnat
Copy link

Thank you for this thorough write-out. Great to see that our previous refactor solved the performance problem as well. Let's merge this deference of validation during initialisation, it's a neat little improvement and these things add up over time.

@f1ames f1ames merged commit 842925f into main Dec 12, 2023
3 checks passed
@f1ames f1ames deleted the f1ames/fix/performance-improvements branch December 12, 2023 14:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants