Added GPU support for Google Cloud #2
Commits on Feb 14, 2024
Commit 77b44eb
Commit 2ae143e
Commit 5b0286b
Commit 24f44e0
Commit 930c1e7
Commit d74306f
Commits on Mar 6, 2024
- check_machine_type_availability() verifies that the machine type is available in the zone
- check_gpu_model_support() verifies that the machine type and GPU model are compatible
- check_gpu_enabled() verifies that GPU_ENABLED=true if GPU_MODEL is populated
Commit f778776
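As a rough illustration of the last check, here is a minimal sketch. It assumes both values arrive as strings from the environment; the signature and the use of ValueError are assumptions, not the code from this PR. The GPU model string in the usage lines is only an example value.

```python
def check_gpu_enabled(gpu_enabled: str, gpu_model: str) -> None:
    """Raise ValueError if a GPU model is set while GPU support is disabled.

    Hypothetical signature: the function in the PR may take different
    arguments or report the failure differently.
    """
    model_set = bool(gpu_model.strip())
    enabled = gpu_enabled.strip().lower() == "true"
    if model_set and not enabled:
        raise ValueError("GPU_MODEL is set, so GPU_ENABLED must be true")


# Passes silently: a model is set and GPU support is enabled.
check_gpu_enabled("true", "nvidia-tesla-t4")
# Passes silently: no model is set, so GPU_ENABLED is not constrained.
check_gpu_enabled("false", "")
```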
Commits on Mar 8, 2024
Moved some gke.sh functions to Python
- Created src/infractl/deploy/gcp/main.py
- Moved check_* functions to src/infractl/deploy/gcp/main.py
- Made icl/jupyterhub module support 'intel' or 'nvidia'
Commit 7aad5b2
Commits on Mar 12, 2024
Python migration and new nvidia base image
- Check/validate functions from gke.sh moved to src/infractl/deploy/gcp/main.py
- Built new multi-stage image for the GPU profile, using nvidia/cuda:12.2.2-base-ubuntu22.04 as the base and adding the pbchekin/icl-jupyterhub:0.0.21 changes
- Changed GKE_GPU_DRIVER_VERSION environment variable default to "LATEST"
- Added outputs to terraform modules for visibility and ease of future debugging
- Modified terraform/icl module to dynamically set the selected GPU image with jupyterhub_gpu_profile_image
Commit 05a479a
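The new GKE_GPU_DRIVER_VERSION default could be read on the Python side roughly as follows. This is a sketch only: the helper name is hypothetical, and treating an empty value the same as an unset one is an assumption rather than something the commit states.

```python
import os
from typing import Mapping, Optional


def gpu_driver_version(env: Optional[Mapping[str, str]] = None) -> str:
    """Return GKE_GPU_DRIVER_VERSION, or "LATEST" when it is unset or empty.

    Hypothetical helper; the empty-string fallback is an assumption.
    """
    env = os.environ if env is None else env
    return env.get("GKE_GPU_DRIVER_VERSION") or "LATEST"
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise without touching the real environment.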
Commits on Mar 14, 2024
Enable shared gpu and extra_resource_limits fix
- Added shared_gpu variable to terraform/gcp and terraform/gcp/icl-cluster
- Created new conditional module in terraform/gcp/icl-cluster dependent on the shared_gpu variable value
- Modified pool names to reflect exclusive vs shared GPU modes
- Added node_count and gpu_count variables to ease future addition of multi-node deployments
- Changed var.jupyterhub_extra_resource_limits from map(string) to string
- Removed default value for var.jupyterhub_extra_resource_limits
Commit da15670
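To make the pool-naming idea concrete, a small sketch of deriving a name from the sharing mode is shown below. The change itself lives in Terraform; this Python version and its "<cluster>-gpu-<mode>" pattern are purely hypothetical and only illustrate the exclusive vs shared distinction.

```python
def gpu_pool_name(cluster_name: str, shared_gpu: bool) -> str:
    """Build a node pool name that records the GPU sharing mode.

    The "<cluster>-gpu-<mode>" pattern is hypothetical; the Terraform code
    in the PR may use a different naming scheme.
    """
    mode = "shared" if shared_gpu else "exclusive"
    return f"{cluster_name}-gpu-{mode}"
```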
Commits on Mar 20, 2024
- Added default value for jupyterhub_extra_resource_limits
- Fixed subprocess.run calls using shell
- Removed unused $GPU_ENABLED parameter from gke.sh call to infractl.deploy.gcp.main
- isort, black, and pylint changes
Commit 70d6ab2
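One common reading of the subprocess.run fix is replacing shell-string invocations with an argument list, which avoids shell quoting and injection problems. The sketch below assumes that reading; run_command is a hypothetical helper, and the current Python interpreter stands in for whatever CLI the real code calls.

```python
import subprocess
import sys
from typing import Sequence


def run_command(args: Sequence[str]) -> str:
    """Run a command without a shell and return its stripped stdout.

    check=True raises subprocess.CalledProcessError on a non-zero exit.
    """
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout.strip()


# The interpreter is used here only as a portable stand-in for a real CLI.
print(run_command([sys.executable, "-c", "print('ok')"]))
```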
Commits on Mar 21, 2024
Commit 70dfbf0
Updated the conditional logic in the xpumanager module instantiation to enhance flexibility in specifying the type of GPU
Commit ffc9d31
Commit 8c5450f
Formatting and review suggestions
- Trailing lines added
- Unneeded whitespace trimmed
- subprocess import in main.py changed and reordered
- Unintentional ray downgrade reverted
- Duplicate variable declaration removed from gke.sh
- Added GKE_GPU_DRIVER_VERSION description to gke.sh help output
- Removed print lines from subprocess.CalledProcessError exceptions
Commit 0de90d5
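Dropping the print lines from the CalledProcessError handlers might look something like the sketch below, where the handler just signals failure and leaves reporting to the caller. command_succeeds is a hypothetical name, and returning a boolean is an assumption about how the check functions communicate results.

```python
import subprocess
import sys
from typing import Sequence


def command_succeeds(args: Sequence[str]) -> bool:
    """Return True if the command exits with status 0, False otherwise."""
    try:
        subprocess.run(args, check=True, capture_output=True, text=True)
    except subprocess.CalledProcessError:
        # No print here: the caller decides how to report the failure.
        return False
    return True
```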
Co-authored-by: Pavel Chekin <[email protected]>
Commit 76847c0
Update terraform/gcp/modules/icl-cluster/main.tf
Co-authored-by: Pavel Chekin <[email protected]>
Commit 553f533
Commits on Mar 22, 2024
Commit 9346369
Commits on Mar 26, 2024
Apply suggestions from code review
Co-authored-by: Vadim Musin <[email protected]>
Commit 98c3f9f
Commits on Jun 11, 2024
- Added bastion-host terraform module
- Added firewall-rule-bastion-ports module
- Added additional variables to /terraform/gcp/variables.tf for bastion-host
- Added generate_bastion_key function to create a public SSH key when CREATE_BASTION="true"
- Added two new environment variables related to bastion creation
- Added function to check that BASTION_SOURCE_RANGES exists and is not empty if CREATE_BASTION=true
Commit d034480
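The BASTION_SOURCE_RANGES check described above can be sketched as follows. The commit message suggests the real check is written in shell, so this Python rendering, its function name, and the ValueError are all assumptions made for illustration; the CIDR in the usage line is a documentation-range example.

```python
def check_bastion_source_ranges(create_bastion: str, source_ranges: str) -> None:
    """Require non-empty BASTION_SOURCE_RANGES when CREATE_BASTION is true.

    Hypothetical Python rendering of the check; names and error type are
    illustrative only.
    """
    wants_bastion = create_bastion.strip().lower() == "true"
    if wants_bastion and not source_ranges.strip():
        raise ValueError(
            "BASTION_SOURCE_RANGES must be set when CREATE_BASTION=true"
        )


# Passes silently: a bastion is requested and a source range is supplied.
check_bastion_source_ranges("true", "203.0.113.0/24")
```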
Commit 6c0e8a5
Commits on Jun 12, 2024
- Additional variables added for user/bastion/cluster specific resource names and tags
Commit afc6b37
Added deployment_type variable to control execution of GPU modules since installation methods will vary across environments.
Commit 4bd7883
Commits on Jun 27, 2024
- Changed "deployment_type" variable to more generic boolean "enable_nvidia_operator"
- Changed some ENV default values from string to boolean to better align with what TF is expecting
- Fixed bastion_name variable
- Updated terraform/icl/main.tf to use updated "enable_nvidia_operator" variable
Commit 39cb829
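For the string-to-boolean alignment, a sketch of normalizing an environment value into the lowercase true/false literal that Terraform accepts for bool variables is given below. to_tf_bool is a hypothetical helper, and rejecting anything other than "true" or "false" is an assumption about how strict the real scripts are.

```python
def to_tf_bool(value: str) -> str:
    """Normalize an environment value to a Terraform boolean literal.

    Accepts "true"/"false" in any letter case and returns the lowercase
    form; anything else raises ValueError. Hypothetical helper.
    """
    normalized = value.strip().lower()
    if normalized not in ("true", "false"):
        raise ValueError(f"expected 'true' or 'false', got {value!r}")
    return normalized
```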