IDP demo journey #3470
Comments
Shouldn't it be a Kratix resource request? I thought the Kratix Promise is the template/definition of what should be done, and a request then basically instantiates that once - or am I mistaken? |
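(For illustration, a minimal sketch of that distinction: a Promise publishes an API on the platform, and a resource request is a single instance of that API. The group, kind, and field names below are hypothetical placeholders, not the demo's actual promise.)

```yaml
# A resource request against a hypothetical "redis" promise: the Promise
# defines the redis API; this object instantiates it once.
apiVersion: marketplace.kratix.io/v1alpha1
kind: redis
metadata:
  name: my-team-cache
spec:
  size: small   # whatever options the promise's API exposes
```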
Updated to add high level architectural flow |
Nice, I guess the "deploy app" step also includes creating a namespace for you? I would honestly pull that step out and put it next to "create cluster", as for platform teams the creation of a new project/team namespace is quite an important step that includes OIDC/RBAC setup, quotas, etc., so having it more or less as a "module" would make a lot of sense. |
That's an interesting thought - I think my main question here would be "what would be the delivery mechanism". I can see two possibilities here: we provide a pre-packaged Helm chart that can template the namespace, roles, quotas, etc. and can be delivered via the app platform, or we use Flux to deliver these via a kustomize base path. I'd probably lean towards Flux here for the sake of simplicity and extensibility. @piontec What are your thoughts on this? |
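(A minimal sketch of the Flux option, assuming a hypothetical GitRepository named platform-config and a kustomize base path per team namespace - both names are placeholders.)

```yaml
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: team-namespace-base
  namespace: flux-system
spec:
  interval: 10m
  sourceRef:
    kind: GitRepository
    name: platform-config        # hypothetical source repository
  path: ./bases/team-namespace   # base containing namespace, roles, quotas
  prune: true
```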
I've seen different scenarios for development environments vs. production environments. You want to make developers as productive as possible, while production environments are stricter and more controlled. Which use case are we targeting here exactly - development or production? I do understand showing infrastructure management, but I feel we should be able to highlight the value of what we are delivering. In an Integrated Development Environment, I would want to show a flow that allows a developer to quickly go through code / deploy / feedback / fix cycles. |
Do you care to elaborate on how this is not highlighting the value in what we are delivering? On the contrary, I am firmly of the belief this showcases the skills the team has to offer, by bringing together a number of disparate tools into a single cohesive journey, in a way that customers are already requesting capability towards. Having the ability to demonstrate that is IMHO an incredibly powerful tool, and one we do not have in our arsenal today. Perhaps I should clarify: an IDP is nothing to do with what happens in an engineer's local development environment. IDP in this instance relates to an Internal Developer Platform, an interaction point between engineers and the clusters, specifically on the portal side (Backstage). By understanding what is going on inside the cluster, engineers are empowered towards the products they themselves manage. The IDP would be a place they can construct deployments from off-the-shelf products, be that applications delivered as community-driven helm charts, or infrastructure delivered as Crossplane compositions.
This is covered in that engineers can use the platform to quickly bootstrap new applications into the cluster.
We do not care what environment the engineers are rolling to - all clusters are equal. We care only that the journey is the same irrespective of the target environment and, in doing so, we take away some of the pain of managing cross-environment deployments.
I do not get the relevance here. We are not interested in what happens during the application development lifecycle. IDEs are not a topic I plan to support and are certainly out of scope for this journey. |
This might be the wrong place for this discussion... I do think your demo is a valuable tool that we don't have today. I'm just looking at the overall story and am wondering if we need another demo as well. Our current iteration of the story & the value is:
I've seen many platform demos, and I don't deny there is value in doing them. It is a lot of work to build these, though the demo can be a bit unimpressive in the end (it should be). It often comes down to pushing a button to commit a change and then automation kicking in and delivering a new/changed cluster. Value to the business is in developers iterating faster, delivering better software quicker. Our story of "freeing up the platform engineers to be able to enable developers" is what we can show with this demo. This feels like a bit of a stretch to me, like we should try and do more - as if we free you up to do important things but will not be able to help you with these more important tasks. Would it not be even better if we could show that devs can actually iterate faster with our IdP? A comparison of the situation/work without the IdP and the situation with the IdP. This could be a combination of slideware and demo (demo with the IdP). I know that this is not where we are today. As I said in the beginning, this might not be the right place. |
Architecture diagram updated to include the separation of delivery of components such as namespace, quotas, permissions, etc. These will be delivered to the cluster using the |
We might need to talk to Big Mac here, as they already have some RBAC helper app IIRC and this gets close to their access management ownership. Maybe that app could be provided by Big Mac and deployed by whatever means Honeybadger feels most adequate. |
As for @LutzLange's comments, I feel this is beyond the scope of this PoC/demo. This here is just about the "getting started quickly" step, i.e. setting up a new project with all the bells and whistles (we could over time add additional templates, e.g. for security or o11y, to this). This is a big value driver that many current and most potential new customers have been asking for or even working on themselves. Fast iteration cycles once the project is set up might be influenced by this, as everything is set up right and we try to have all environments similarly configured. But there might be other things to show there, which would be based on other features we might work on at some point in the future, e.g. Flagger for canary deployments, automatic branch deployments, o11y setup and validation feedback for these, ... |
@mproffitt, as the solution architect for the IDP demo, and I sat together to summarise where we are. Intermediate status:

Backstage:
Some complexity got moved to Crossplane, but should not be in Crossplane. From the user's perspective this should be outside of Crossplane - mainly everything regarding Kratix.

Crossplane

Demo App (app that gets deployed into the demo cluster)
The question that we need to ask ourselves: do we need to write something of our own in Go, or can we be happy with the app being written in Python, which already exists? The goal is to show the platform and how it works; the goal should not be to show that we can write apps in Go - which is also clear because we have plenty of apps in Go.

Completeness of our components (approximately):
Tasks:
After these tasks are done, the final task is to put everything together and test the whole platform end-to-end.

Nice-to-have extension points (can also be discussed):
|
I'd like to phase this work a bit more so we can focus on getting a minimal thing out that is demoable soon. To me that would mean:

phase 1:
phase 2:
And at the same time, once @piontec is back, we can discuss direction with Kratix and whether what we are intending to do with it (aka API to Git) would work or not, but pull it out of this demo for now and continue in a Kratix-specific epic. |
As for:
WC creation is out of scope. Could be a separate demo where we aim for WC creation from Backstage. Needs issue.
Out of scope here, BUT to me this is the next separate demo/feature we should work on. Deploying a ready "environment" for a dev team to a cluster is a super common use case that a lot of current customers also have. This we should then do at least in cooperation with Big Mac, as they do have some early work towards at least the RBAC part of it. Also needs an issue.
If possible I'd like to keep this out of phase 1. Could be part of phase 2. For the demo in phase 1 I would rely on two instantiations: one where we have run the demo in advance and everything works, and one where we show the creation, switching from one to the other when we want to show things working. |
I would have very much liked to avoid the transit gateway - it's the one network component I have always struggled with - but unfortunately yes, it is needed. The issue with requiring it relates to being able to get traffic back to the MC. It could be argued "just set up an additional peering connection to the MC", but the second part of this is purpose: TGWs and peering connections serve different purposes, with peering connections being best for high-throughput / low-latency traffic and TGWs used for everything else. On a far more positive note, despite it melting my brain, the core of the TGW is now done - I built a basic version this afternoon and I'm fairly confident that this will just work.
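(A rough sketch of what the core TGW could look like as a Crossplane managed resource, assuming the Upbound AWS provider is in use; names, region, and tags are placeholders, not the actual build.)

```yaml
apiVersion: ec2.aws.upbound.io/v1beta1
kind: TransitGateway
metadata:
  name: demo-tgw
spec:
  forProvider:
    region: eu-west-1                          # placeholder region
    description: Gateway between MC and workload VPCs
    tags:
      purpose: idp-demo
  providerConfigRef:
    name: default
```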
I was less clear on your meaning on this point. If this relates to the composition wrapper that first looks up a cluster, retrieves region and availability zone data, then feeds that to the next composition wrapper: the outer wrapper has not been tested and does not require any downstream changes; it is simply a passthrough that looks up some additional details. That's probably best visualised as in the diagram below - any coloured box is a separate composition, white boxes are either endpoint compositions or specific MRs (or simply a reference to what data comes from where to start the inner wrapper). |
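(Illustrative only: how the outer passthrough wrapper could copy looked-up cluster details onto the inner composite via patches. All type and field names here are hypothetical placeholders, not the actual compositions.)

```yaml
apiVersion: apiextensions.crossplane.io/v1
kind: Composition
metadata:
  name: cluster-network-wrapper
spec:
  compositeTypeRef:
    apiVersion: demo.example.org/v1alpha1
    kind: XClusterNetwork
  resources:
    - name: inner-network
      base:
        apiVersion: demo.example.org/v1alpha1
        kind: XTransitGatewayNetwork
      patches:
        # pass the looked-up cluster details through to the inner composition
        - type: FromCompositeFieldPath
          fromFieldPath: status.cluster.region
          toFieldPath: spec.region
        - type: FromCompositeFieldPath
          fromFieldPath: status.cluster.availabilityZones
          toFieldPath: spec.availabilityZones
```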
Hey all!
|
Here is a high-level process diagram I'd like us to maintain, so it represents what we are building. (Not finished yet) |
The decision was made to follow the path with the Go app, which still needs to be modified. |
This is the list of apps deployed as part of the release:
|
Regarding Backstage UI wording: Card title
should be changed to:
Once the catalog entity exists, we show this:
|
We are changing the demo flow as follows:
Rationale:
|
In my opinion this removes the capability of showing a very important aspect of the journey: that an app can be deployed along with any and all infrastructure required for its operation, should that infrastructure not already exist. One of the arguments about VPC CIDRs was "where does that information come from?", with the answer being "the platform team" - this argument was not suitable as, in the opinion of the team, it led to a lack of self-service for application teams who may need to spin up and tear down infrastructure without interaction with the platform team. Now the argument is that "the platform team should provide the RDS database", which contradicts the earlier argument. If we're to argue that the platform team should handle all infrastructure builds, then the purpose of the demo (deploying a new service) becomes moot, as it does not demonstrate that a service can be deployed with all required infrastructure. Whilst the argument made here does carry a lot of merit, it detracts from the capability. Additionally, the arguments only consider RDS, ignoring entirely the Elasticache part of the service, which would not work for an additional application as there are no application-specific credentials attached. In fact, to add credentials for a second application to Elasticache would require a modification to the replication group built by the original deployment and a restart of all replicated clusters. This capability does not exist today and, due to how Elasticache works, is not something that can be built separately, as in the case of provisioning users inside RDS. |
I do see both your points, @marians and @mproffitt. The big question here is: who is the audience of the demo? I think it is targeting developers, and as such it should focus more on speed than on creating an environment that is ready to run production workloads. Setting up a full RDS database feels more like a "getting ready for production workloads" task. Developers are used to using virtual or lightweight DBs for testing & QA. |
I would slightly refine this: our target audience are platform teams. The end user we impersonate for the demo flow is a developer. |
Talked with @mproffitt and @piontec about Elasticache. To keep things simple, we are not going to provision anything new per new service created. All services/apps will use the same Elasticache Redis server. Redis supports multiple databases, but the identifier is numeric, and we wouldn't have an easy way to map databases to services/projects. With everything writing to the same database there is a theoretical risk of key collision, but we accept this for now, as we don't run many demos concurrently and we can set the key lifetime very short in our demo application. |
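(A sketch of what the per-service demo app configuration could look like under this decision - the key names and values are assumptions for illustration, not the demo app's actual config.)

```yaml
redis:
  host: shared-redis.default.svc.cluster.local   # the one shared server
  port: 6379
  keyPrefix: "my-service:"   # namespacing keys per service limits collisions
  keyTTLSeconds: 60          # short key lifetime bounds any collision damage
```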
Interesting feedback around the VPC CIDRs that we got from a potential customer when we showed them our IDP demo architecture: in their case, there's a team that has VPC provisioning as their main service, so they would basically separate our demo into several use cases that play into each other. That still does not invalidate our demo; it just shows that different companies might dissect the use cases or services differently. Similar, I'd say, to how we now dissect the "creation of an RDS cluster" from the "creation and provisioning of a DB in said cluster". I think it is good to cut the demo into something rather small for now, and then be able to show the extended use cases and the complexity that @mproffitt mentioned separately, because customers will not get around the complexity; it will just move somewhere else - in this customer's case, to a team that will for now not use our stuff to automate their processes, but that we could maybe convince at some point, which would then make it easier for them to chain and integrate platform services into a coherent user experience. |
More ideas
|
@marians On the first point I'm agnostic, but I wonder if that's overcomplicating things a little. The second point, no. This would automate too much and detract from showing a) what requires or should have human input, and b) it creates a failure point, such as selecting the wrong ProviderConfig or (in future) assigning permissions to users or roles that are incorrect. Even though a lot of this is automated, I feel automating the PR approval is a step too far and introduces entropy into the system. |
Automating the PR approval was meant as a fake thing that would simulate what would otherwise, of course, be done by a human. |
We used to just say: "And if you want, you can require PR reviews to merge your requests." And then we merged them ourselves in the demos that I did for Weaveworks. We should be fine addressing this with the audio track. |
Yeah, most companies I've spoken to have some kind of approval process. That said, if automated validation could be done in the PR, at least some would enable auto-merge functionality. Usually this would be some kind of PR bot checking access control (i.e. is the user allowed to request said resource) and approving, and then, if all validation tests are green, auto-merge goes through. |
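(A sketch of such a flow as a GitHub Actions workflow: run validation, then enable auto-merge via the gh CLI. The validation script and workflow name are hypothetical.)

```yaml
name: auto-merge-validated-requests
on:
  pull_request:
    types: [opened, synchronize]
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Validate requested resource
        run: ./hack/validate-request.sh   # hypothetical access/schema checks
      - name: Enable auto-merge once all checks are green
        env:
          GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        run: gh pr merge --auto --squash "${{ github.event.pull_request.html_url }}"
```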
In the GoReleaser step of the release workflow I see this log message:
|
Is there a technical reason for all workloads landing in the |
@marians The main driver for the demo at this stage was simplicity, also see the comment from @puja108 here #3470 (comment)
As for technical reasons: in fact the Crossplane compositions support a different namespace for delivering the secrets, and we can use any namespace on the workload cluster for application deployment. The only thing that needs to happen is that the namespace must pre-exist for ESO to send secrets to, and as per Puja's response, we had moved this out of phase 1 delivery. |
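(For context, a minimal ExternalSecret sketch showing how ESO targets a namespace - which is why the namespace must pre-exist. Namespace, store, and key names are placeholders.)

```yaml
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: demo-app-db
  namespace: team-demo              # must already exist on the workload cluster
spec:
  refreshInterval: 1h
  secretStoreRef:
    kind: ClusterSecretStore
    name: cluster-secret-store      # placeholder store
  target:
    name: demo-app-db-credentials   # the Secret that ESO creates here
  dataFrom:
    - extract:
        key: demo-app/db            # placeholder remote key
```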
Just putting it here as a sidenote: namespace creation for a new project is a thing most companies have as a service, and it could be a cool module by itself. It could provision a namespace (with RBAC/OIDC, quota, and security/network policy setup) for those use cases where there's no golden path (yet), and it could be chained with a golden path like in this demo, to remove the need for a two-step request. The good thing is that such a namespace provisioning service could be basically just a Helm chart that takes values like project name, team name, and OIDC group, and auto-maps things. It can then be extended with things like o11y multi-tenancy or a network policy base by other teams like Atlas and Cabbage. That said, I'd see that as a complementary thing that we can and should build, as it's straightforward and used by many customers, but we should make that a separate project in area platform. cc @teemow this might be a nice project for Q4 or Q1 that aligns different capabilities of different teams and can generate value directly without the need for complex customer customization. We could talk to adidas and some others that already have such a thing about what features they would expect from it. |
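(A sketch of what such a chart's values could look like - the value names are assumptions for illustration; the point is that everything else gets auto-mapped from these few inputs.)

```yaml
project: payments
team: checkout
oidcGroup: checkout-admins   # mapped to a RoleBinding in the new namespace
resourceQuota:
  cpu: "8"
  memory: 16Gi
networkPolicy:
  defaultDeny: true          # base policy; extendable later by other teams
```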
Thanks @puja108! I've put this in a separate issue: https://github.com/giantswarm/giantswarm/issues/31767 |
There is a lot of value in these basic templates. Another template that I have seen in the wild is: "Create a Git repo". They need to be set up in the right way to keep things in order. There are naming conventions and security settings to take into consideration. Those should not be left open for developers to choose if you want to keep chaos at bay. We already have this implemented as part of the IDP demo. It would make sense to pull this out as a separate template as well. |
Franz wanted us to have some governance aspects in the demo as well. Governance has two parts: A: how do we make sure things are secure? B: compliance. I think we can cover good parts of this without changing the technical part of the demo, by addressing these in the audio track. |
@LutzLange We were planning on addressing some of this with @giantswarm/team-shield next week and have already included Trivy scan integration in the list of potential improvements provided in the description of this issue. For the moment though, for the audio track we can already highlight how we ensure some security, split into two topics:
We should be careful on the cloud security side though, as this is not a topic we traditionally cover; it would normally be the responsibility of the customer's cloud security team. I would be hesitant to get bogged down here as it's a whole topic unto itself. However, as we're showing building infrastructure, we can anticipate some questions on the topic. |
AFAIK we already have SBOMs and signatures in the build process and store them in the OCI registry. Not sure if we are already checking for those in-cluster, but that might be an easy next step (enabled only for the app namespace so as not to break the whole cluster). We also already have PSS enforcement in-cluster; not sure if we also have network policies, but those could be added. On this level we could mention that you need a combination of in-cluster enforcement and "adding the actual security rules and exceptions to the app". As in this case we are creating an app from a template, this means the template needs to include those things and be "secure by default", which I would guess it is, if it runs smoothly in our clusters. CVE scans and reporting would be a good next feature for platform in Q4, but we need to discuss that on a general level, and I don't think it makes sense to just smash it into the demo right now, as there the process is more important than just showing CVEs. |
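(On the network policy point, a minimal default-deny sketch scoped to just the app namespace - the namespace name is a placeholder.)

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: demo-app    # placeholder; only the app namespace, not the cluster
spec:
  podSelector: {}        # selects all pods in the namespace
  policyTypes:
    - Ingress            # no ingress rules listed, so all ingress is denied
```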
There is a lot of value in simpler templates. You could also call them building blocks. They are valuable PE services on their own:

A) Create a Git Repository (ready to use with security & policy)

The self-service aspect of these templates provides a lot of value. And if we can find a way to combine these building blocks into more complex templates easily, we would have a set of common building blocks and provide a lot of value to possible customers. I know these last points need further thought, investigation and discussion, but we could and should start with these simpler templates first. |
Whilst I definitely agree with there being a lot of value in simpler templates, this goes far beyond the scope of the demo journey and more towards turning the demo into a fully fledged, ready-to-use platform. My opinion on the current IDP demo is that it should attempt to answer some of the hardest questions facing the industry today. Moving the demo towards becoming a more rounded and evolved product should not be in scope for the demo platform, but should be scoped separately from this current journey, as it involves considerable additional thought, planning and implementation that significantly impacts the delivery of key features not yet given hard consideration. This will definitely be an iterative process; however, trying to implement simpler templates at this stage would have significant impacts on key questions that we've already been asked. I would propose that discussions on simple templates be moved to a separate "platform progression" epic, except where otherwise in scope for phase 2. Effectively this leaves B, and potentially A, still in scope, but C moves out. |
Along those lines, I think we should start closing the first demo issue and create follow-ups, for which we can then discuss priorities also with regard to the many other things Honeybadger should/could do in the next months. |
I just created a separate ticket with my suggestions for improvements. I've scheduled a call for 6-Nov with the Honeybadger team to discuss. |
User Story
Details, Background
In order to take the user on a journey through the IDP, we have an overall story of creating infrastructure components via crossplane and deploying an app that then consumes that infrastructure.
To accomplish the story and really showcase the capabilities of all components in the pipeline, the journey is as follows:
redis as a backend

Flow diagrams: https://miro.com/app/board/uXjVKnjQei8=/
Architecture
Blocked by / depends on
Tasks
create random password promise #3583

Improvements

dev-platform-kratix-promises repo #3608
githubrepo promise #3607

Research/discovery