Science Score: 44.0%
This score indicates how likely this project is to be science-related based on various indicators:
-
✓CITATION.cff file
Found CITATION.cff file -
✓codemeta.json file
Found codemeta.json file -
✓.zenodo.json file
Found .zenodo.json file -
○DOI references
-
○Academic publication links
-
○Academic email domains
-
○Institutional organization owner
-
○JOSS paper metadata
-
○Scientific vocabulary similarity
Low similarity (13.1%) to scientific vocabulary
Repository
AI-Lab
Basic Info
- Host: GitHub
- Owner: MarcusGrum
- License: agpl-3.0
- Default Branch: main
- Size: 905 KB
Statistics
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
- Releases: 0
Metadata Files
Readme.md
Welcome to the AI-Lab Repository
This repository supports the standardized node configuration, so that Artificial-Intelligence-(AI)-laboratory components work in harmony.
The lab configuration tool was originally developed by Dr.-Ing. Marcus Grum.
This configuration has been tested on Ubuntu-20.04 (using ubuntu-20.04-server-amd64.iso)
as well as on Ubuntu22.04 (using ubuntu-22.04-server-amd64.iso).
Prepare your BIOS to detect all storage devices.
If you use ASUS workstation motherboard called Pro WS WRX80E-SAGE SE WIFI,
you urgently need to realize these
Update to most current BIOS version. Here, the easiest way is as follows: Format an USB stick with
fat32andEnable
BIOS SR-IOVto activate single root IO virtualization support. You can find it at menuadvanced/PCI subsystem settings/SR-IOV_Support.Enable
BIOS CSMand enable all sub-features inUEFI mode(exceptVGA/Graphic).Test, if you can see all storage devices at BIOS. If you can see all storage devices, such as
M2,SSDs,HDDs, you are ready to go to install OS.
Download and install newest Ubuntu
Download the recent Ubuntu (server version) from
https://ubuntu.com/downloadCreate a bootable USB stick with the help of Etcher. Detailed steps can be found at
https://ubuntu.com/tutorials/create-a-usb-stick-on-macos#4-install-and-run-etcherInstall Ubuntu and follow Ubuntu installation instructions. Hence, during the installation process, please consider the following decisions:
* Computer name follows naming convention `AILabNode02`.
* OS is installed on SSDs on a `RAID1`. This consumes SSD1 and SSD2.
This can not be specified via BIOS before Ubuntu installation.
This can not be specified via Ubuntu-desktop installation.
This can not be specified after Ubuntu-desktop installation.
This only can be specified via Ubuntu-server installation.
* Data tank is on HDDs on a `RAID5`. This consumes HDD1, HDD2 and HDD3.
This can not be specified via BIOS before Ubuntu installation.
This can not be specified via Ubuntu-desktop installation.
This can either be specified after Ubuntu-desktop installation, or
this only can be specified via Ubuntu-server installation.
* A `softraid`on `btfs` is based on SSD3 being the `cache` and Data tank being the `data`.
So, cached data is backed up at runtime when processing time is available.
This is specified via OS after Ubuntu installation.
* `RAID1` is based on three partitions.
First, 200MB for `EFI` (or 1.000MB `Bios grub spacer`).
Second, `EXT4` being mounted at `/`.
Third, `SWAP` partition having the size of RAM.
* Use `LVM` with the new Ubuntu installation, so that you easily can deal with partitions, later.
* Install `OpenSSH server`, to enable secure remote access to your server.
Hence, the raid configuration looks as follows:
<img src="./documentation/RaidConfiguration.png" />
Why are we doing this?
Linux OS is mirrored on ultra-fast M2-SSDs, so that operations can be realized fast.
Since linux partitions are mirrored as `RAID1`, it is backed up efficiently.
Hence, one SSD failure can be captured.
Further, huge storage is provided by HDDs, which are backed up by `RAID5`.
Ultra-fast M2-SSD3 is considered as cache for huge storage raid,
so that reading operations can be realized fast (in case `writethrough`)
and reading and writing operations can be realized fast (in case `writeback`).
Since reliability is prioritized, `writethrough` is preferred.
Here,
(1) writing is realized based on HDD speed,
(2) reading is realized in SSD speed,
(3) one SSD failure can be captured because of mirroring relevant data from `RAID1` and
(4) one HDD failure can be captured because of mirroring data in `RAID1`.
- At Ubuntu server installation, setup partitions according to
https://alexskra.com/blog/ubuntu-20-04-with-software-raid1-and-uefi/. Only then,RAID1can be setup as required.
Reformat both drives if they’re not empty.
Mark both drives as a boot device. Doing so will create an ESP(EFI system partition) on both drives.
Add an unformatted GPT partition to both drives. They need to have the same size. We’re going to use those partitions for the RAID that contains the OS.
Create a software RAID(md) by selecting the two partitions you just created for the OS.
Congratulations, you now have a new RAID device. Let’s add at least one GPT partition to it.
Create a partitions for Ubuntu on the RAID device. Faced with our specification from above, we add two partitions: (a)
EXT4being mounted at/. (b)SWAPpartition specified before because we want the ability to swap, create a swap partition on the RAID device. Typically, you will set the size to the same as your RAM, or half if you have 64 GB or more RAM.
After installation, update and upgrade your system
sudo apt-get update sudo apt-get upgrade sudo apt-get updateInstall latest ubuntu drivers, which includes
nvidiadriver, for correctly displaying GUI e.g. For instance, by this, gamma issue (super white filter at standard GUI) vanishes.sudo ubuntu-drivers installInstall the desktop environment:
sudo apt install ubuntu-desktopInstall and set up a display manager to manage users and load up the desktop environment sessions. At installation process, select
GDM3because it refers to the default display manager of GNOME. Alternatively, you can chooseLightDM.sudo apt install lightdmRun this command to start the LightDM service with systemctl:
sudo systemctl start lightdm.serviceRun this command to start the LightDM service using the service utility:
sudo service lightdm startReboot your system with the reboot
sudo systemctl reboot -iLogin at desktop GUI and test GUI.
Set up software RAID
After Ubuntu installation, set up the softraid specified before.
Detailled steps can be found at https://ahelpme.com/linux/lvm/ssd-cache-device-to-a-software-raid5-using-lvm2/.
Install
gepartedfor dealing with storage in a plug-n-play manner.Use
gpartedto create a partition manually on eachsda. Considerunformatedas file system, here. Further, set up a partition in the NVME SSD device to occupy only 91% of the space to have a better SSD endurance and in many cases performance.Install lvm2 and enable the lvm2 service. But first, have a look on the individual storage devices and verify its partitions:
sudo su parted /dev/sda (parted) p (parted) q parted /dev/sdb (parted) p (parted) q parted /dev/sdc (parted) p (parted) q parted /dev/nvme2n1 (parted) p (parted) qAdd partitions to the LVM2 (as physical volumes):
sudo pvcreate /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/nvme2n1p1Create the LVM Volume Group device. The four physical volumes must be in the same group.
sudo vgcreate VG_storage /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/nvme2n1p1Create the RAID5 device using the three slow hard drive disks and their partitions
/dev/sda1,/dev/sdb1and/dev/sdc1. We want to use all the available space on our slow disks in one logical storage device we use100%FREE. The name of the logical device islv_slowhinting it consists of slow disks.sudo lvcreate --type raid5 -I 512 -l 100%FREE -n lv_slow VG_storage /dev/sda1 /dev/sdb1 /dev/sdc1Create the cache pool logical volume with the name
lv_cache(to show it’s a fast SSD device). Again, we use 100% available space on the physical volume (100% from the partition we’ve used).lvcreate --type cache-pool -l 100%FREE -c 4M --cachemode writethrough -n lv_cache VG_storage /dev/nvme2n1p1Consider
writebackif you focus on performance. This mode delays writing data blocks from the cache back to the origin LV. But this mode will increase performance, but the loss of a cache device can result in lost data. Hence, this mode should only be used if the fast cache is realized as individualRAID1.Consider
writethroughinstead if you focus on security. This mode ensures that any data written will be stored both in the cache and on the origin LV. The loss of a device associated with the cache in this case would not mean the loss of any data.Convert the cache device – the slow device (logical volume
lv_slow) will have a cache device (logical volumelv_cache):lvconvert --type cache --cachemode writethrough --cachepool VG_storage/lv_cache VG_storage/lv_slowFormat and do not miss to include it in the /etc/fstab to mount it automatically on boot.
mkfs.ext4 /dev/VG_storage/lv_slowGet to know
UUIDoflv_slowbyblkid |grep lv_slowAdd
UUIDoflv_slowto the/etc/fstab, so that it is mounted automatically on boot. E.g., the entry looks like this:UUID=cbf0e33c-8b89-4b7b-b7dd-1a9429db3987 /mnt/storage ext4 defaults,discard,noatime 1 3You are ready to use this cached
RAID1by mounting it:/mnt/storage
Set-Up Ubuntu AILab configuration
Install NVIDIA driver
Select most current NVIDIA driver from Ubuntu setting menu.
Test nvlink of graphic card
0:nvidia-smi nvlink --status -i 0Test nvlink capabilities of graphic card
0:nvidia-smi nvlink --capabilities -i 0Activate
SLImodus:TBD?
Install further tools
Install net-tools, so that you e.g. can run
ifconfigto get to know your current IP address.sudo apt install net-toolsInstall CLI gpu monitor:
sudo apt install nvtopTest this monitoring tool by running it:
nvtopInstall Internet browser called
firefox:sudo apt install firefoxWhy don't you install
Tree Style Tabadd-on manually right now?Install usefull libraries, such as numpy, matplotlib, etc.
pip3 install numpy pip3 install matplotlibInstall visual studio code as open-source editor:
Update package entries to install the editor first.
sudo apt-get install wget gpg wget -qO- https://packages.microsoft.com/keys/microsoft.asc | gpg --dearmor > packages.microsoft.gpg sudo install -D -o root -g root -m 644 packages.microsoft.gpg /etc/apt/keyrings/packages.microsoft.gpg sudo sh -c 'echo "deb [arch=amd64,arm64,armhf signed-by=/etc/apt/keyrings/packages.microsoft.gpg] https://packages.microsoft.com/repos/code stable main" > /etc/apt/sources.list.d/vscode.list' rm -f packages.microsoft.gpgThen, install the editor.
sudo apt install apt-transport-https sudo apt update sudo apt install code
Install docker engine
The docker engine suits to deal with programs over-the-air. Detailed installation steps can be found at https://docs.docker.com/engine/install/ubuntu.
a) Set up the package index to install docker
Before you install Docker Engine for the first time on a new host machine, you need to set up the Docker repository. Afterward, you can install and update Docker from the repository.
Update the apt package index and install packages to allow apt to use a repository over HTTPS:
``` sudo apt-get update
sudo apt-get install \ ca-certificates \ curl \ gnupg \ lsb-release ```
Add Docker’s official GPG key:
``` sudo mkdir -p /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg ```
Use the following command to set up the repository:
echo \ "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \ $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
b) Install Docker Engine
Update the apt package index, and install the latest version of Docker Engine, containerd, and Docker Compose, or go to the next step to install a specific version:
``` sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-compose-plugin ```
Test docker engine installation:
sudo docker run hello-worldInstall docker-compose:
sudo apt install docker-composeTest docker-compose by showing its current version:
docker-compose --version
c) Move docker's data-root to huge data storage
If you intend to deal with docker volumes and expect huge data (that exceeds SSD memory), move docker data directory and you won’t risk any more to run out of space in your root partition. Be aware, that this slows operation about a factor of 4.0!
Stop the docker daemon:
sudo service docker stopAdd a configuration file to tell the docker daemon what is the location of the data directory. Using your preferred text editor, add a file named
daemon.jsonunder the directory/etc/docker. The file should have at least this content:{ "data-root": "/mnt/storage/docker-data-root" }Copy the current data directory to the new one
sudo rsync -aP /var/lib/docker/ /mnt/storage/docker-data-rootRename the old docker directory
sudo mv /var/lib/docker /var/lib/docker.oldThis is just a sanity check to see that everything is ok and docker daemon will effectively use the new location for its data.
Create a symbolic link from old docker directory
/var/lib/dockerto new docker directory/mnt/storage/docker-data-root. So, old containers having/var/lib/dockerpaths configured in them can be restarted and everything works as before moving docker'sdata-rootpath.sudo ln -s /mnt/storage/docker-data-root /var/lib/dockerNote, the destination folder comes as the first argument to the
lncommand.Restart the docker daemon
sudo service docker startTest
If everything is ok you should see no differences in using your docker containers. When you are sure that the new directory is being used correctly by docker daemon you can delete the old data directory.
sudo rm -rf /var/lib/docker.old
Setting up NVIDIA Container Toolkit
Setting up NVIDIA Container Toolkit so that it functions as driver for docking docker containers to GPUs. Detailed steps can be found at https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html#docker.
Setup the package repository and the GPG key:
distribution=$(. /etc/os-release;echo $ID$VERSION_ID) \ && curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \ && curl -s -L https://nvidia.github.io/libnvidia-container/$distribution/libnvidia-container.list | \ sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \ sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.listInstall the nvidia-docker2 package (and dependencies) after updating the package listing:
``` sudo apt-get update
sudo apt-get install -y nvidia-docker2 ```
Restart the Docker daemon to complete the installation after setting the default runtime:
sudo systemctl restart dockerAt this point, a working setup can be tested by running a base CUDA container:
sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smiThis should result in a console output shown below:
``` +-----------------------------------------------------------------------------+ | NVIDIA-SMI 515.48.07 Driver Version: 515.48.07 CUDA Version: 11.7 | |-------------------------------+----------------------+----------------------+ | GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |===============================+======================+======================| | 0 NVIDIA RTX A6000 Off | 00000000:30:00.0 Off | Off | | 30% 49C P8 20W / 300W | 5MiB / 49140MiB | 0% Default | | | | N/A | +-------------------------------+----------------------+----------------------+ | 1 NVIDIA RTX A6000 Off | 00000000:41:00.0 On | Off | | 30% 59C P8 30W / 300W | 299MiB / 49140MiB | 1% Default | | | | N/A | +-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=============================================================================| +-----------------------------------------------------------------------------+ ```
Install git tools
Git tools suite for dealing with different kinds of repositories.
Install git (detailed steps can be found at https://github.com/git-guides/install-git)
``` sudo apt-get update
sudo apt-get install git-all ```
Test git installation by showing current git version:
git --version
Install AILab repositories
Prepare joint repository at cached
RAID5storage created before:``` mkdir /mnt/storage/repositories
cd /mnt/storage/repositories ```
Clone relevant repositories, such as the following:
git clone https://github.com/MarcusGrum/AI-CPS.git
Owner
- Login: MarcusGrum
- Kind: user
- Repositories: 3
- Profile: https://github.com/MarcusGrum
Citation (CITATION.cff)
cff-version: 1.2.0
title: AI-Lab Repository
message: >-
If you use this software, please cite it using the
metadata from this file.
type: software
authors:
- family-names: Grum
orcid: 'https://orcid.org/0000-0002-3815-4238'
given-names: Marcus
version: 2.0.4
doi: TBD
date-released: 2022-07-21
url: "https://github.com/MarcusGrum/AI-Lab.git"