parsed mds by for chatbot parser script for account.md, connecting.md, compiling_your_software.md (FOR REVIEW ONLY, NO MERGE!) #664

Draft
wants to merge 148 commits into
base: main

Conversation

Contributor

EwDa291 commented Aug 8, 2024

The scripts are still very much a messy work in progress. They need some extra features and need to be cleaned before they can be merged.

Contributor Author

EwDa291 commented Aug 8, 2024

This last commit made the number of changed files very large, since I also included the output of the script. The important scripts can be found in main.py and jinja_parser.py. I will make some more changes to the file structure in the future to clean it up a bit.

EwDa291 and others added 26 commits August 9, 2024 10:39
This file is just used to test some things locally and not part of the parser
and "permanently".
Click "Confirm"
You will now be taken to the authentication page of your institute.
You will now have to log in with CAS using your UGent account.
Member

with CAS remove it? or explain it

Member

CAS -> CAS (Central Authentication Service)?

You will now have to log in with CAS using your UGent account.
You either have a login name of maximum 8 characters, or a (non-UGent)
email address if you are an external user. In case of problems with your
UGent password, please visit: <https://password.ugent.be/>. After
Member

@EwDa291 remove the < and > . not sure what the : is for
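A minimal sketch of how the angle brackets around autolinked URLs could be stripped during parsing. The regular expression and the function name `strip_autolink_brackets` are assumptions made for illustration; this is not the code in main.py or jinja_parser.py.

```python
import re

# Matches a Markdown autolink such as <https://example.org/>.
# The pattern is an assumption for illustration, not the parser's actual rule.
AUTOLINK = re.compile(r"<(https?://[^<>\s]+)>")


def strip_autolink_brackets(text: str) -> str:
    """Replace <URL> with URL, leaving any other angle brackets untouched."""
    return AUTOLINK.sub(r"\1", text)


line = "UGent password, please visit: <https://password.ugent.be/>. After"
print(strip_autolink_brackets(line))
```

Only brackets that directly enclose an http(s) URL are removed, so comparisons or shell redirections elsewhere in the text are left alone.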

@@ -0,0 +1,10 @@
Compiling and testing your software on the HPC
All nodes in the HPC cluster are running the "RHEL 8.8 (accelgor, doduo, donphan, gallade, joltik, skitty)"
Member

what happened with the double quotes?, not needed to include the clusters i think

@@ -0,0 +1,10 @@
Compiling and testing your software on the HPC
All nodes in the HPC cluster are running the "RHEL 8.8 (accelgor, doduo, donphan, gallade, joltik, skitty)"
Operating system, which is a specific version of Red Hat Enterprise Linux. This means that all the
Member

no capital O i guess (or add one to System)?

All nodes in the HPC cluster are running the "RHEL 8.8 (accelgor, doduo, donphan, gallade, joltik, skitty)"
Operating system, which is a specific version of Red Hat Enterprise Linux. This means that all the
software programs
(executable) that the end-user wants to run on the HPC first must be
Member

executables

Operating system, which is a specific version of Red Hat Enterprise Linux. This means that all the
software programs
(executable) that the end-user wants to run on the HPC first must be
compiled for RHEL 8.8 (accelgor, doduo, donphan, gallade, joltik, skitty). It also means that you first have to install all the
Member

is there a macro for rhel without the clusters? or have value RHEL 8.8 (on clusters ...)

Note: The Intel Parallel Studio Cluster Edition contains equivalent
compilers for all GNU compilers. Hereafter the overview for C, C++ and
Fortran compilers.
| | Sequential Program | | **Parallel Program (with MPI)** | |
Member

this table needs some explanation if you want to keep it around

Connecting to the HPC infrastructure
Before you can really start using the HPC clusters, there are several things
you need to do or know:
1. You need to log on to the cluster using an SSH client to one of
Member

put the webportal before the ssh client.

that you need from your desktop computer to the cluster. At the end
of a job, you might want to transfer some files back.
3. Optionally, if you wish to use programs with a **graphical user
interface**, you will need an X-server on your client system and log
Member

wait, do we still recommend this for GUI usage? @boegel @hajgato

3. Optionally, if you wish to use programs with a **graphical user
interface**, you will need an X-server on your client system and log
in to the login nodes with X-forwarding enabled.
4. Often several versions of software packages and libraries are
Member

Extra starting line to explain that this is about the software you want to use

$ exit
logout
Connection to login.hpc.ugent.be closed.
tip "tip: Setting your Language right"
Member

tip tip? maybe add the word locale here? this is not only about language. and add one or two lines of introduction on what a locale is

@@ -0,0 +1,7 @@
Fast file transfer for large datasets
See the section on rsync in chapter 5 of the Linux intro manual.
Member

chapter 5 -> "file uploading" chapter

Member

what is the "intro manual"?

Fast file transfer for large datasets
See the section on rsync in chapter 5 of the Linux intro manual.
Changing login nodes
It can be useful to have control over which login node you are on. However, when you connect to the HPC (High-Performance Computing) system, you are directed to a random login node, which might not be the one where you already have an active session. To address this, there is a way to manually switch your active login node.
Member

HPC system? consistent terminology please. also not sure you still need to explain what HPC means at this point in the manual

$ hostname
gligar08.gastly.os
Rather than always starting a new session on the HPC, you can also use a terminal multiplexer like screen or tmux.
These can make sessions that 'survives' across disconnects.
Member

maybe we should also promote or recommend the webportal equivalents?

@@ -0,0 +1,18 @@
Connection restrictions
Since March 20th 2020, restrictions are in place that limit from where
Member

this can be removed i guess? not sure you make footnotes, but imho this is not relevant anymore

All other IP domains are blocked by default. If you are connecting from
an IP address that is not allowed direct access, you have the following
options to get access to VSC login nodes:
- Use an VPN connection to connect to UGent the network (recommended). See <https://helpdesk.ugent.be/vpn/en/> for more information.
Member

@EwDa291 more <>

an IP address that is not allowed direct access, you have the following
options to get access to VSC login nodes:
- Use an VPN connection to connect to UGent the network (recommended). See <https://helpdesk.ugent.be/vpn/en/> for more information.
- Whitelist your IP address automatically by accessing
Member

mention that this is streamlined when the webportal is used

connect, with an error message like:
ssh_exchange_identification: read: Connection reset by peer
First Time connection to the HPC infrastructure
The remaining content in this chapter is primarily focused for people utilizing a terminal with SSH.
Member

for consistency, and if this is truly only for ssh, make an ssh chapter like there is one for the webportal.

cd examples
tip
Typing cd ex followed by tab (the Tab-key) will generate the cd examples
command. Command-line completion (also tab completion) is a common feature of the bash command
Member

also tab -> also known as tab

The first action is to copy the contents of the HPC examples directory to
your home directory, so that you have your own personal copy and that
you can start using the examples. The "-r" option of the copy command
will also copy the contents of the sub-directories "recursively".
Member

add (which is not done by default)

@@ -0,0 +1,19 @@
cp -r /apps/gent/tutorials/Intro-HPC/examples ~/
Go to your home directory, check your own private examples directory, ... and start working.
cd
Member

missing a cd experiment1 or so?

accelgor 4 3 2 9 18 1
donphan 0 0 16 16 16 13
gallade 2 0 5 16 19 136
For a full view of the current loads and queues see:
Member

queues are not shown here. they also don't exist anymore ;)

Contributor Author

EwDa291 commented Aug 30, 2024

@EwDa291 counting tokens is with something like

import tiktoken

def num_tokens_from_string(string: str, encoding_name: str) -> int:
    """Return the number of tokens in `string` under the named tiktoken encoding."""
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(string))

print(num_tokens_from_string("Hello world, let's test tiktoken.", "cl100k_base"))

it's not clear what the dependencies of tiktoken are (it's odd there are so few). when you do a pip install, capture the output and maybe paste a list here.

The output i got when doing pip install is this:

Collecting tiktoken
  Downloading tiktoken-0.7.0-cp312-cp312-win_amd64.whl.metadata (6.8 kB)
Collecting regex>=2022.1.18 (from tiktoken)
  Downloading regex-2024.7.24-cp312-cp312-win_amd64.whl.metadata (41 kB)
Collecting requests>=2.26.0 (from tiktoken)
  Using cached requests-2.32.3-py3-none-any.whl.metadata (4.6 kB)
Collecting charset-normalizer<4,>=2 (from requests>=2.26.0->tiktoken)
  Using cached charset_normalizer-3.3.2-cp312-cp312-win_amd64.whl.metadata (34 kB)
Collecting idna<4,>=2.5 (from requests>=2.26.0->tiktoken)
  Using cached idna-3.8-py3-none-any.whl.metadata (9.9 kB)
Collecting urllib3<3,>=1.21.1 (from requests>=2.26.0->tiktoken)
  Using cached urllib3-2.2.2-py3-none-any.whl.metadata (6.4 kB)
Collecting certifi>=2017.4.17 (from requests>=2.26.0->tiktoken)
  Downloading certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Downloading tiktoken-0.7.0-cp312-cp312-win_amd64.whl (799 kB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 799.3/799.3 kB 1.9 MB/s eta 0:00:00
Downloading regex-2024.7.24-cp312-cp312-win_amd64.whl (269 kB)
Using cached requests-2.32.3-py3-none-any.whl (64 kB)
Downloading certifi-2024.8.30-py3-none-any.whl (167 kB)
Using cached charset_normalizer-3.3.2-cp312-cp312-win_amd64.whl (100 kB)
Using cached idna-3.8-py3-none-any.whl (66 kB)
Using cached urllib3-2.2.2-py3-none-any.whl (121 kB)
Installing collected packages: urllib3, regex, idna, charset-normalizer, certifi, requests, tiktoken
Successfully installed certifi-2024.8.30 charset-normalizer-3.3.2 idna-3.8 regex-2024.7.24 requests-2.32.3 tiktoken-0.7.0 urllib3-2.2.2
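To turn the pip output above into the requested dependency list, the "Collecting ... (from ...)" lines can be grouped by the package that required them. The sketch below is only an illustration: the function name `dependency_tree` and the regular expression are assumptions, and `PIP_LOG` is a shortened excerpt of the log quoted above.

```python
import re

# Shortened excerpt of the pip output quoted above (only the "Collecting" lines).
PIP_LOG = """\
Collecting tiktoken
Collecting regex>=2022.1.18 (from tiktoken)
Collecting requests>=2.26.0 (from tiktoken)
Collecting charset-normalizer<4,>=2 (from requests>=2.26.0->tiktoken)
Collecting idna<4,>=2.5 (from requests>=2.26.0->tiktoken)
Collecting urllib3<3,>=1.21.1 (from requests>=2.26.0->tiktoken)
Collecting certifi>=2017.4.17 (from requests>=2.26.0->tiktoken)
"""

# Package name, optional version specifier, optional "(from a->b)" chain.
COLLECTING = re.compile(
    r"^Collecting\s+([A-Za-z0-9_.\-]+)\S*(?:\s+\(from\s+(.+)\))?$"
)


def dependency_tree(log: str) -> dict[str, list[str]]:
    """Map each requiring package to the packages pip collected because of it."""
    tree: dict[str, list[str]] = {}
    for line in log.splitlines():
        match = COLLECTING.match(line.strip())
        if not match or match.group(2) is None:
            continue  # not a "Collecting" line, or the top-level package itself
        name, chain = match.groups()
        # The chain reads "requirer->...->root"; its first element is the direct requirer.
        requirer = re.split(r"[<>=!~ ]", chain.split("->")[0], maxsplit=1)[0]
        tree.setdefault(requirer, []).append(name)
    return tree


for requirer, packages in dependency_tree(PIP_LOG).items():
    print(f"{requirer}: {', '.join(packages)}")
```

On this excerpt, tiktoken directly requires regex and requests, and the remaining four packages (charset-normalizer, idna, urllib3, certifi) are pulled in by requests.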

boegel marked this pull request as draft August 30, 2024 08:12
boegel changed the title from "Parser script for the chatbot input" to "Parser script for the chatbot input (DO NOT REVIEW/MERGE, SEE PR #675)" Aug 30, 2024
boegel changed the title from "Parser script for the chatbot input (DO NOT REVIEW/MERGE, SEE PR #675)" to "parsed mds by for chatbot parser script for account.md, connecting.md, compiling_your_software.md (FOR REVIEW ONLY, NO MERGE!)" Aug 30, 2024
4 participants