Benchmark registration shouldn't be coupled to main.cpp #39

Open
travisdowns opened this issue Mar 12, 2018 · 9 comments

@travisdowns
Owner

Currently every file that defines benchmarks needs to declare a benchmark registration method that is called explicitly in make_benches in main.cpp, which looks like:

template <typename TIMER>
GroupList make_benches() {

    GroupList groupList;

    register_default<TIMER>(groupList);
    register_loadstore<TIMER>(groupList);
    register_mem<TIMER>(groupList);
    register_misc<TIMER>(groupList);
    register_cpp<TIMER>(groupList);
    register_vector<TIMER>(groupList);
    register_call<TIMER>(groupList);
    register_oneshot<TIMER>(groupList);

    return groupList;
}

This is unfortunate since it means that otherwise independent lists of benchmarks have to be registered in a common place (increasing merge conflicts for independent code) and it also adds another step to adding a new benchmark file.

We should allow independent registration of benchmarks: ideally simply dropping in a new .cpp file that has benchmarks would be enough for it to be picked up. That probably means registration should use some kind of global constructor to register tests from the implementing .cpp file directly. Since the order of such calls isn't defined across translation units, we need to sort on benchmark name or something so that we have a consistent order in the benchmark list regardless of actual registration order.
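
A minimal sketch of the push-style registration described above. Everything here is an assumption made for illustration: `Bench`, `Registrar`, `registry()` and `sorted_benches()` are hypothetical names, not uarch-bench's actual types. Each benchmark file defines a global `Registrar` whose constructor adds an entry, and the list is sorted by name before use so the order doesn't depend on which translation unit was initialized first.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a benchmark entry (not uarch-bench's real type).
struct Bench {
    std::string name;
    long (*func)(unsigned long long iters);
};

// A function-local static is constructed on first use, so registration does
// not depend on the initialization order of globals across translation units.
inline std::vector<Bench>& registry() {
    static std::vector<Bench> benches;
    return benches;
}

// Defining a global Registrar pushes one benchmark into the registry.
struct Registrar {
    Registrar(const char* name, long (*func)(unsigned long long)) {
        assert(func != nullptr);
        registry().push_back({name, func});
    }
};

// Returns a copy of the registry sorted by name, giving a stable listing
// regardless of the order in which the registrars actually ran.
inline std::vector<Bench> sorted_benches() {
    std::vector<Bench> out = registry();
    std::sort(out.begin(), out.end(),
              [](const Bench& a, const Bench& b) { return a.name < b.name; });
    return out;
}

// What a benchmark .cpp file might contain (toy benchmark bodies).
static long bench_store(unsigned long long iters) { return static_cast<long>(iters); }
static long bench_add(unsigned long long iters) { return static_cast<long>(iters * 2); }
static Registrar reg_store("mem/store", bench_store);
static Registrar reg_add("default/add", bench_add);
```

With something like this, main.cpp would only call `sorted_benches()` and would not need to know which files define benchmarks.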

@nemequ
Collaborator

nemequ commented Mar 14, 2018

You could look at GLib's Constructors for an idea of how to do this somewhat portably, but AFAIK there is just no guaranteed way to do this.

I think a better solution would be to just have each group compiled into a separate library which could then be dlopen()ed by uarch-bench. In addition to the ability to just drop a C file into uarch-bench and have it just work, it would also make it easy to load modules from other locations, allowing projects to keep uarch-bench tests in their trees. It would also make it easier to test different compilers since only the module in question would need to be recompiled.

@travisdowns
Owner Author

travisdowns commented Mar 14, 2018

@nemequ - I'm not following the problem here. I've used this pattern many times (note I'm talking about C++ here not C). The portable approach is just to use straightforward C++, isn't it? An object with a constructor that registers the benchmarks (push instead of the current pull approach).

Perhaps I wasn't clear in the description, but I don't anticipate any issue here really, this is just a task so I remember to do the work.

I also like the idea of loading additional benchmarks at runtime from a shared object, but that's really a different, bigger feature (and using C++ in "plugin" interfaces is kind of a mess last time I looked). I added issue #43 to track that.

@nemequ
Collaborator

nemequ commented Mar 15, 2018

Ah right, C++. Sorry, not used to thinking in C++. I don't know about it being a "portable" approach (IIRC there can be some issues if you use them in shared libraries), but yeah it should be safe for what you're talking about here.

@travisdowns
Owner Author

travisdowns commented Mar 15, 2018

@nemequ - yeah there are definitely gotchas in C++, the big ones being that (a) the order of initialization between different compilation units isn't defined, so you can easily blow up if your globals refer to each other during initialization and (b) destruction is similarly messy, especially with shared libraries, and it's easy to accidentally access objects after they have been destroyed (the "easy out" here is often just to ensure the global object destructors never run at all: just "leak" them).
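
To illustrate how both gotchas are commonly sidestepped, here is a small sketch of the "construct on first use, then leak" idiom. The names `bench_names()` and `AutoRegister` are made up for this example. The registry is created the first time anyone asks for it, which covers (a), and it is allocated with `new` and never deleted, so no destructor runs at exit, which covers (b).

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical registry type used only for this illustration.
using NameList = std::vector<std::string>;

// Constructed on first call, so any global that calls this during its own
// initialization sees a fully constructed list. The list is deliberately
// never deleted: no destructor runs at exit, so code that runs late during
// shutdown cannot touch an already-destroyed object.
NameList& bench_names() {
    static NameList* names = new NameList();
    return *names;
}

// A global of this type registers a name during static initialization.
struct AutoRegister {
    explicit AutoRegister(const std::string& name) {
        assert(!name.empty());
        bench_names().push_back(name);
    }
};

// Within one translation unit these run in definition order; across
// translation units the relative order is unspecified.
static AutoRegister first("loadstore");
static AutoRegister second("vector");
```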

What I'm thinking of here is simple though and shouldn't run into those issues.

I didn't think much about C specifically, but yeah now that C benchmarks are supported it would be nice to come up with an approach there. One option would be to have a build step that looks for .c files and picks out the benchmark registration method and adds it to an autogenerated file that registers all the C benchmarks, or ... dunno.

@nemequ
Collaborator

nemequ commented Mar 15, 2018

> I didn't think much about C specifically, but yeah now that C benchmarks are supported it would be nice to come up with an approach there. One option would be to have a build step that looks for .c files and picks out the benchmark registration method and adds it to an autogenerated file that registers all the C benchmarks, or ... dunno.

AFAICT the registration code currently needs to be C++ anyways, even if the tests themselves are in C, so that doesn't really matter right now.

An API to register stuff in C would be very nice for #43, but if you're dlopen()ing a module you can just run a function too, so you don't need a constructor.

At some point it wouldn't be too difficult to add a macro which would support constructors on Windows, most GCC/clang/icc configurations, and suncc (would actually be a good idea for portable-snippets), plus C++, but if #43 happens I don't see why the tests built into uarch-bench couldn't use the same mechanism; then there is no need to maintain multiple paths.
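
A rough sketch of what such a macro could look like, covering only two of the cases mentioned: GCC-compatible compilers via `__attribute__((constructor))`, and plain C++ via a static object. The macro name `UARCH_CONSTRUCTOR` and the helpers `registered()` and `add_group()` are invented for this example, and the Windows and suncc branches are left out.

```cpp
#include <cassert>
#include <string>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
// GCC-compatible compilers run functions marked with
// __attribute__((constructor)) before main().
#define UARCH_CONSTRUCTOR(name)                           \
    static void name(void) __attribute__((constructor));  \
    static void name(void)
#elif defined(__cplusplus)
// C++ fallback: a static object whose constructor calls the function.
#define UARCH_CONSTRUCTOR(name)                                   \
    static void name(void);                                       \
    namespace {                                                   \
    struct name##_runner { name##_runner() { name(); } } name##_instance; \
    }                                                             \
    static void name(void)
#else
#error "no constructor mechanism sketched for this compiler"
#endif

// Function-local static, so the list exists whenever a constructor runs.
std::vector<std::string>& registered() {
    static std::vector<std::string> groups;
    return groups;
}

void add_group(const char* name) {
    assert(name != nullptr);
    registered().push_back(name);
}

// Usage: each block runs once before main(); relative order is not relied on.
UARCH_CONSTRUCTOR(register_misc) { add_group("misc"); }
UARCH_CONSTRUCTOR(register_call) { add_group("call"); }
```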

@travisdowns
Owner Author

> An API to register stuff in C would be very nice for #43, but if you're dlopen()ing a module you can just run a function too, so you don't need a constructor... but if #43 happens I don't see why the tests built into uarch-bench couldn't use the same mechanism; then there is no need to maintain multiple paths.

Right, but you could only run one function or perhaps a fixed list of functions after dlopen, right? I mean it solves the problem if you put each group of benchmarks into its own shared library, but if you don't do that you still have the same problem within the library of how to have N separate .cpp or .c files defining benchmarks without registering them in a separate spot. If you could enumerate the available functions in a shared object, and perhaps call all that met some pattern, it would also solve this "inner" independence problem, but AFAIK there's no portable way to do that.

Just want to check that we're on the same page here. Of course, having different modules already makes the problem a lot better since you have at least one way to decouple things.

@nemequ
Collaborator

nemequ commented Mar 16, 2018

I've been assuming one group per shared library, but if you'd prefer it would be fairly easy to do something like

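#include <stdint.h> /* uint64_t */

/* Opaque handles; the struct definitions stay private to uarch-bench. */
typedef struct UarchCtx_ UarchCtx;
typedef struct UarchGroup_ UarchGroup;

/* Group creation, benchmark registration, and attaching a group to a context. */
void uarch_ctx_add_group(UarchCtx* ctx, UarchGroup* group);
UarchGroup* uarch_group_new(UarchCtx* ctx, const char* id, const char* description);
void uarch_group_add_bench(UarchGroup* group, const char* id, const char* description, long(* func)(uint64_t), int ops_per_iter);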

I don't think it's too much to ask the modules to have a registration function which knows about all functions within that module.
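
For illustration, a module's registration function against that API might look roughly like the sketch below. The three `uarch_*` signatures are the ones proposed above; everything else is an assumption: the struct layouts, the toy implementations, the entry-point name `uarch_module_register`, and the meaning of the benchmark return value. Groups are simply leaked here to keep the sketch short.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical backing types for the opaque handles (assumed for this demo).
struct UarchBench {
    std::string id, description;
    long (*func)(uint64_t);
    int ops_per_iter;
};
struct UarchGroup_ {
    std::string id, description;
    std::vector<UarchBench> benches;
};
struct UarchCtx_ {
    std::vector<UarchGroup_*> groups;
};
typedef struct UarchCtx_ UarchCtx;
typedef struct UarchGroup_ UarchGroup;

// Toy implementations of the proposed API, only so the sketch is runnable.
extern "C" {
void uarch_ctx_add_group(UarchCtx* ctx, UarchGroup* group) {
    assert(ctx != nullptr && group != nullptr);
    ctx->groups.push_back(group);
}
UarchGroup* uarch_group_new(UarchCtx* ctx, const char* id, const char* description) {
    (void)ctx;  // unused in this toy implementation
    return new UarchGroup{id, description, {}};
}
void uarch_group_add_bench(UarchGroup* group, const char* id, const char* description,
                           long (*func)(uint64_t), int ops_per_iter) {
    group->benches.push_back({id, description, func, ops_per_iter});
}
}

// Two toy benchmarks; what the return value means is an assumption.
static long bench_nop(uint64_t iters) { return static_cast<long>(iters); }
static long bench_sum(uint64_t iters) {
    long s = 0;
    for (uint64_t i = 0; i < iters; ++i) s += static_cast<long>(i);
    return s;
}

// The single entry point the module exposes; it knows every benchmark in the
// module. The name is hypothetical.
extern "C" void uarch_module_register(UarchCtx* ctx) {
    UarchGroup* g = uarch_group_new(ctx, "example", "Example benchmarks");
    uarch_group_add_bench(g, "nop", "Returns the iteration count", bench_nop, 1);
    uarch_group_add_bench(g, "sum", "Sums 0..iters-1", bench_sum, 1);
    uarch_ctx_add_group(ctx, g);
}
```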

If you dlopen() a module odds are decent a constructor won't be run automatically anyways, so the only thing I can really think of would be some build system magic to automatically generate a registration function using the file names to generate symbols (e.g., foo-bar.c would trigger a foo_bar_register() call to be generated).
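
As a sketch of that build-system idea, the helper below maps a source file name to a registration symbol (foo-bar.c becomes foo_bar_register) and emits the text of a generated function that calls each one. The function names `register_symbol_for` and `generate_register_all`, and the generated `register_all`, are hypothetical. File names that start with a digit, or that collide after mangling, are not handled.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Derive a registration symbol from a source file name: drop the directory
// and extension, replace anything outside [A-Za-z0-9_] with '_', and append
// "_register".
std::string register_symbol_for(const std::string& path) {
    assert(!path.empty());
    std::size_t slash = path.find_last_of("/\\");
    std::string base = (slash == std::string::npos) ? path : path.substr(slash + 1);
    std::size_t dot = base.find_last_of('.');
    if (dot != std::string::npos && dot != 0) base.erase(dot);
    for (char& c : base) {
        if (!std::isalnum(static_cast<unsigned char>(c)) && c != '_') c = '_';
    }
    return base + "_register";
}

// Produce the source text of a generated C function that calls every
// per-file registration function, preceded by their declarations.
std::string generate_register_all(const std::vector<std::string>& files) {
    std::string out;
    for (const std::string& f : files) {
        out += "void " + register_symbol_for(f) + "(void);\n";
    }
    out += "void register_all(void) {\n";
    for (const std::string& f : files) {
        out += "    " + register_symbol_for(f) + "();\n";
    }
    out += "}\n";
    return out;
}
```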

Besides, I think the idea of just dropping a C/C++ file into a directory and having uarch-bench pick it up automatically is more attractive if you're modifying uarch-bench (as you currently have to) than if you're building something in your own source tree, especially when you consider that any build system magic would need to be rewritten, or at least customized, for your project.

@travisdowns
Owner Author

travisdowns commented Mar 16, 2018

> If you dlopen() a module odds are decent a constructor won't be run automatically anyways

Really, why? That would basically break any code that relies on this standard and widespread feature, which is probably most C++ code. It seems unlikely to me that this is a problem on almost any modern platform.

In any case, I really see the two things as orthogonal: there should be a way to load benchmark shared modules at runtime (or perhaps a way to embed uarch-bench as a component inside your own application/module) and there should be a way to make the registration of benchmarks from C++ and C within uarch-bench or a module a bit more decoupled from having a master list.

I think each feature can more or less live or die on its own. I agree with you that for the modules approach it makes the most sense to call one known function after dlopen which registers all the benchmarks in a module. If there happens to be a constructor-based way to generate this list from separate .cpp files then it could be used to generate the list that this method returns, or it could be just an explicit list like we have today in main.cpp.

> I don't think it's too much to ask the modules to have a registration function which knows about all functions within that module.

Yeah, that's fair.

@nemequ
Collaborator

nemequ commented Mar 19, 2018

> Really, why? That would basically break any code that relies on this standard and widespread feature, which is probably most C++ code. It seems unlikely to me that this is a problem on almost any modern platform.

You're probably right for this project. AFAIK ELF and PE both support constructors in the runtime linker. My understanding is that there are still some platforms where it isn't supported, but you're not likely to run into them on x86 anyways.

> In any case, I really see the two things as orthogonal: there should be a way to load benchmark shared modules at runtime (or perhaps a way to embed uarch-bench as a component inside your own application/module) and there should be a way to make the registration of benchmarks from C++ and C within uarch-bench or a module a bit more decoupled from having a master list.

I would think you would want a single code path since it's easier to maintain, but if you're comfortable with two I don't mind.

That said, I believe there is still an issue with static libraries you should watch out for. Last time I checked, the linker would skip any object in a static library from which you don't actually use any symbols, and a constructor doesn't count as usage (https://ofekshilon.com/2013/04/06/forcing-construction-of-global-objects-in-static-libraries/ comes up from a quick search). So, if you're thinking about creating static libraries for each module to keep code duplication minimal while still embedding the built-in tests, that may cause problems.

> I think each feature can more or less live or die on its own. I agree with you that for the modules approach it makes the most sense to call one known function after dlopen which registers all the benchmarks in a module. If there happens to be a constructor-based way to generate this list from separate .cpp files then it could be used to generate the list that this method returns, or it could be just an explicit list like we have today in main.cpp.

I don't see any reason to complicate it by using a global constructor, but I guess that would be up to each module.
