Refactor structure for essential steps

This commit is contained in:
Varun Vasudeva
2024-08-21 01:18:14 -05:00
parent 3a98385e0c
commit 2d252cbfc8


@@ -11,6 +11,15 @@ _TL;DR_: A guide to setting up a fully local and private language model server a
- [System Requirements](#system-requirements)
- [Prerequisites](#prerequisites)
- [Essential Setup](#essential-setup)
- [General](#general)
- [Drivers](#drivers)
- [Nvidia GPUs](#nvidia-gpus)
- [AMD GPUs](#amd-gpus)
- [Ollama](#ollama)
- [Startup Script](#startup-script)
- [Scheduling Startup Script](#scheduling-startup-script)
- [Configuring Script Permissions](#configuring-script-permissions)
- [Configuring Auto-Login](#configuring-auto-login)
- [Additional Setup](#additional-setup)
- [SSH](#ssh)
- [Firewall](#firewall)
@@ -20,19 +29,19 @@ _TL;DR_: A guide to setting up a fully local and private language model server a
- [Open WebUI Integration](#open-webui-integration)
- [Downloading Voices](#downloading-voices)
- [Verifying](#verifying)
- [Ollama](#ollama-1)
- [Open WebUI](#open-webui-1)
- [OpenedAI Speech](#openedai-speech-1)
- [Updating](#updating)
- [General](#general-1)
- [Nvidia Drivers \& CUDA](#nvidia-drivers--cuda)
- [Ollama](#ollama-2)
- [Open WebUI](#open-webui-2)
- [OpenedAI Speech](#openedai-speech-2)
- [Troubleshooting](#troubleshooting)
- [`ssh`](#ssh-1)
- [Nvidia Drivers](#nvidia-drivers)
- [Ollama](#ollama-3)
- [Open WebUI](#open-webui-3)
- [OpenedAI Speech](#openedai-speech-3)
- [Monitoring](#monitoring)
@@ -91,15 +100,18 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
## Essential Setup

### General

Update the system by running the following commands:
```
sudo apt update
sudo apt upgrade
```

### Drivers

Now, we'll install the required GPU drivers that allow programs to utilize their compute capabilities.

#### Nvidia GPUs

- Follow Nvidia's [guide on downloading CUDA Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian). The instructions are specific to your machine and the website will lead you to them interactively.
- Run the following commands:
```
@@ -112,7 +124,7 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
nvidia-smi
```

#### AMD GPUs

- Run the following commands:
```
deb http://deb.debian.org/debian bookworm main contrib non-free-firmware
@@ -120,7 +132,7 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
```
- Reboot the server.

### Ollama
Ollama, a wrapper around `llama.cpp`, serves the inference engine and enables inference from the language models you will download. It'll be installed as a service, so it runs automatically at boot.
@@ -148,11 +160,11 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
```
> [!TIP]
> If you installed `ollama` manually or don't use it as a service, remember to run `ollama serve` to properly start the server. Refer to [Ollama's troubleshooting steps](#ollama-3) if you encounter an error.
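Once the server is up, a quick sanity check might look like the following. The model name here is only an example, not one prescribed by this guide:

```
# Download a model (name is an example; pick any model from the Ollama library):
ollama pull llama3
# Run a one-off prompt against it:
ollama run llama3 "Say hello"
# List locally available models:
ollama list
```

These commands require the `ollama` daemon to be running, so they double as a check that the service started correctly.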
### Startup Script

In this step, we'll create a script called `init.bash`. This script will be run at boot to set the GPU power limit and start the server using `ollama`. We set the GPU power limit lower because testing has shown only a 5-15% performance decrease for a 30% reduction in power consumption. This is especially important for servers that run 24/7.

- Run the following commands:
```
@@ -178,7 +190,7 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
chmod +x init.bash
```
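As a rough sketch of the power-target arithmetic described above, a 30% reduction works out like this (the 350 W figure is a hypothetical default power limit, not a value from this guide):

```shell
# Hypothetical default power limit, in watts (illustrative only):
TDP=350
# A 30% reduction keeps 70% of the default:
LIMIT=$((TDP * 70 / 100))
echo "$LIMIT"   # prints 245
# The value could then be applied with, e.g.:
#   sudo nvidia-smi -pl $LIMIT
```

Check your card's actual default and minimum/maximum power limits with `nvidia-smi -q -d POWER` before choosing a target.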
### Scheduling Startup Script

Adding the `init.bash` script to the crontab will schedule it to run at boot.
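The resulting crontab entry could look like the following sketch. The home-directory path is an assumption; adjust it to wherever you saved the script:

```
@reboot bash /home/(username)/init.bash
```

`@reboot` is cron's special string for "run once at startup", so no time fields are needed.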
@@ -198,9 +210,9 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
```
- Save and exit the file.

### Configuring Script Permissions

We want `init.bash` to run the `nvidia-smi` commands without having to enter a password. This is done by giving `nvidia-persistenced` and `nvidia-smi` passwordless `sudo` permissions by editing the `sudoers` file.

AMD users can skip this step, as power limiting is not supported on AMD GPUs.
@@ -214,12 +226,12 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
(username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi
```
> Replace `(username)` with your username.

> [!IMPORTANT]
> Ensure that you add these lines AFTER `%sudo ALL=(ALL:ALL) ALL`. The order of the lines in the file matters: the last matching line wins, so if you add these lines before `%sudo ALL=(ALL:ALL) ALL`, they will be ignored.
- Save and exit the file.
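Put together, the relevant portion of the `sudoers` file would be ordered like this (a sketch; `(username)` stands in for your actual user):

```
%sudo      ALL=(ALL:ALL) ALL
(username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-persistenced
(username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi
```

Because the `NOPASSWD` rules come after the general `%sudo` rule, they are the last match for those two commands and take effect.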
### Configuring Auto-Login

When the server boots up, we want it to automatically log in to a user account and run the `init.bash` script. This is done by configuring the `lightdm` display manager.