
How to Use HiPerGator

Posted: Sun Oct 10, 2021 6:04 am
by jstenner
BASICS

Alright, we've worked through basic principles of AI using P5.js and ML5.js. Now let's work with AI using tools that let us interact with the outstanding resources available via UF's HiPerGator AI.

Python is commonly used in AI work and many resources are available online. A common technique is to prototype and develop an AI application using a development tool like Google Colab or Jupyter. Once the app is designed, tested, and optimized, it is typically run as a Python program in production. In the case of HiPerGator, it might be part of a SLURM batch process, for example. You can think of SLURM as something similar to our own Thinkbox Deadline renderfarm submission system, but more commonly available in the world of high-performance computing systems. HiPerGator provides resources for using Jupyter, so that's what we'll focus on below. You should recognize that you can install and run Jupyter on your personal computer as well; you just won't have ready access to all the CPUs and GPUs available via HiPerGator.

A good starting point for us is this set of tutorials at pythonprogramming.net:
https://pythonprogramming.net/introduct ... low-keras/

We'll work through the first several tutorials only...we won't spend time on cryptocurrency (though that could be cool, too).



REFERENCE

Pre-recorded Training
Jupyter Notebooks
AI HELP
TensorFlow
Modules (see also Basic Usage)
Locally available Datasets (i.e. on the HiPerGator system)

CONFIG

So, let's log into HPC via SSH using our previously created alias, "hgator" (assuming that's what you called yours...check your .zshrc or run "alias" to see all of your aliases if unsure):

Code: Select all

hgator
Next, we're going to create a symbolic link (like an alias) to our class directory on the HiPerGator "blue" drive. Recall, that's where we want to do most of our daily work. A symbolic link is created using the "ln -s" command followed by the path to the directory of interest, followed by the name of the link you want to create:

Code: Select all

ln -s /blue/art4612/ blue_art4612

After you've done that, list your home directory (ls -l) and see how it represents a symbolic link...see the arrow?

Code: Select all

blue_art4612 -> /blue/art4612/

I have created a Demo working directory in blue_art4612/share/Deep_Learning_Basics. To work with these notebooks, you will want to create your own working directory inside your blue_art4612/<username> folder so you're not overwriting the demo files. Call it whatever you like. Once you've created your working directory, navigate to it. So, here's an example:

from within your home directory (~):

Code: Select all

cd blue_art4612/<your_username>
mkdir my_notebooks
cd my_notebooks

Now, you want to copy the demo Jupyter notebook(s) to your current working directory. You must be inside the directory in which you want to work before issuing the following command. Note the "." at the end of the command: it means "copy the file to the current directory," which, based on the previous commands, is now "my_notebooks".

Code: Select all

cp ~/blue_art4612/share/Deep_Learning_Basics/01.Deep_Learning_with_Tensorflow.ipynb .

Repeat that for each of the Jupyter notebooks in the Deep_Learning_Basics directory. You'll also want to grab the "datasets" directory (copying a directory requires cp's -r flag).

WORK

The next thing we want to do is make sure we're using the proper version of Python in our project directory. We can do that with Python 3's built-in "venv" module. "venv" stands for virtual environment. It gives this particular project its own Python interpreter and its own set of libraries (modules) without overriding the system settings used elsewhere. I called my virtual Python environment dl_env. You can call it whatever you want:

Code: Select all

python3 -m venv <whatevername>
source <whatevername>/bin/activate

Now, the steps above are less critical on HPC because HiPerGator uses what is called a "module" system: you can load pre-installed modules rather than install them into your working directory environment. Since we're using Jupyter Lab, they have also created task-specific "kernels" that are designed for various types of work.

From on-campus or via your VPN from home, open JupyterHub in your web browser:
https://jupyterhub.rc.ufl.edu
or better, use Open on Demand, which provides more options:
https://ood.rc.ufl.edu

For SLURM account and QoS enter our class group name: "art4612" (or whichever it happens to be)
Using the Jupyter Notebook web server, navigate to your "my_notebooks" directory (or whatever you named it) and find your notebooks. Most of these rely on TensorFlow, so be sure to select the TensorFlow kernel.
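If you want to double-check what a given kernel provides, run a quick sanity check in a notebook cell. This is just a sketch; the versions and GPU list you see will depend on the kernel you selected and the resources you requested:

Code: Select all

# confirm which Python and TensorFlow this kernel provides
import sys
import tensorflow as tf

print("Python:", sys.version)
print("TensorFlow:", tf.__version__)
print("GPUs visible:", tf.config.list_physical_devices("GPU"))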

** FYI, via Jupyter you can install modules inline like this:

Code: Select all

!pip install --user <modulename>
and it will install in your user directory.
Be careful about version mismatches, though.

02.Loading Data.ipynb uses the library/module OpenCV. If you were working on your own computer, you'd install it into your "venv" using pip:

Code: Select all

python3 -m pip install opencv-python

Some other libraries you'd need to install for these demos on your OWN computer are:

Code: Select all

python3 -m pip install matplotlib
python3 -m pip install tqdm
BUT, on HiPerGator you use their module system to load OpenCV:

Code: Select all

module load opencv/4.5.2

You can search the available modules and list the versions of a particular piece of software with:

Code: Select all

module spider <software-name>

You can list all currently loaded modules with:

Code: Select all

module list

If you get more involved with this kind of work, you'll want to familiarize yourself with "lmod" (the basis of the module system) and how to use it:
https://help.rc.ufl.edu/doc/Modules

It appears "matplotlib" and "tzdm", needed for these demos, are automatically available via the Jupyter TensorFlow kernel we are using. I didn't need to install them individually.
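As a quick test that OpenCV and the copied "datasets" directory are wired up correctly, you can run something like the following in a notebook cell. This is only a sketch; the file path is hypothetical and should point at an actual image inside your own datasets directory:

Code: Select all

# sketch: read one image as grayscale and shrink it, roughly the kind of call
# the Loading Data notebook makes for each training image (path is hypothetical)
import cv2

img = cv2.imread("datasets/PetImages/Dog/0.jpg", cv2.IMREAD_GRAYSCALE)
if img is None:
    raise FileNotFoundError("adjust the path to an image in your datasets directory")
img = cv2.resize(img, (50, 50))
print(img.shape)  # a 50x50 array of pixel values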

You should be able to run the demos now! The only thing left to configure is TensorBoard. TensorBoard is used to analyze the results of your training by providing a realtime (or after-the-fact) graphing system you can monitor in a web browser. It looks at the log files generated by your application/notebook and provides a web service to view them. On your own computer, after configuring your app to include TensorBoard (see the sketch below), you'd run the following command from within your project directory to launch the TensorBoard web service, assuming your log directory is named "logs":

Code: Select all

tensorboard --logdir=logs/

The command will print a localhost URL that you open in your web browser to analyze your logs.
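"Configuring your app to include TensorBoard" generally means attaching a TensorBoard callback when you call model.fit() so that Keras writes the log files TensorBoard reads. Here's a minimal sketch; the model, dataset, and log-directory naming below are placeholders, not the exact code from the demo notebooks:

Code: Select all

# sketch: attach a TensorBoard callback so training writes logs TensorBoard can read
import time
import tensorflow as tf

log_dir = f"logs/run-{int(time.time())}"               # one subdirectory per training run
tensorboard_cb = tf.keras.callbacks.TensorBoard(log_dir=log_dir)

# placeholder model/dataset; substitute whatever your notebook builds and loads
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()  # downloads MNIST on first run
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train / 255.0, y_train, epochs=3, callbacks=[tensorboard_cb])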

On HiPerGator, since you're working remotely, this has to be handled differently. Here are their instructions:
https://help.rc.ufl.edu/doc/Tensorboard

Essentially it's the same, except you need to initiate a separate work session that provides a "desktop" (e.g., via Open OnDemand), then run the following from the terminal in that session:

Code: Select all

# in one terminal inside the desktop session, start TensorBoard:
module load tensorflow
tensorboard --logdir ./pathtolog    # point --logdir at your actual log directory

# in a second terminal, open a browser to the localhost URL TensorBoard reports:
module load ubuntu
firefox

HAPPY AI'ing!