mirror of
https://github.com/varunvasudeva1/llm-server-docs.git
synced 2025-12-16 19:47:42 +01:00
Refactor structure for essential steps
122
README.md
@@ -11,6 +11,15 @@ _TL;DR_: A guide to setting up a fully local and private language model server a
|
|||||||
- [System Requirements](#system-requirements)
|
- [System Requirements](#system-requirements)
|
||||||
- [Prerequisites](#prerequisites)
|
- [Prerequisites](#prerequisites)
|
||||||
- [Essential Setup](#essential-setup)
|
- [Essential Setup](#essential-setup)
|
||||||
|
- [General](#general)
|
||||||
|
- [Drivers](#drivers)
|
||||||
|
- [Nvidia GPUs](#nvidia-gpus)
|
||||||
|
- [AMD GPUs](#amd-gpus)
|
||||||
|
- [Ollama](#ollama)
|
||||||
|
- [Startup Script](#startup-script)
|
||||||
|
- [Scheduling Startup Script](#scheduling-startup-script)
|
||||||
|
- [Configuring Script Permissions](#configuring-script-permissions)
|
||||||
|
- [Configuring Auto-Login](#configuring-auto-login)
|
||||||
- [Additional Setup](#additional-setup)
|
- [Additional Setup](#additional-setup)
|
||||||
- [SSH](#ssh)
|
- [SSH](#ssh)
|
||||||
- [Firewall](#firewall)
|
- [Firewall](#firewall)
|
||||||
@@ -20,19 +29,19 @@ _TL;DR_: A guide to setting up a fully local and private language model server a
|
|||||||
- [Open WebUI Integration](#open-webui-integration)
|
- [Open WebUI Integration](#open-webui-integration)
|
||||||
- [Downloading Voices](#downloading-voices)
|
- [Downloading Voices](#downloading-voices)
|
||||||
- [Verifying](#verifying)
|
- [Verifying](#verifying)
|
||||||
- [Ollama](#ollama)
|
- [Ollama](#ollama-1)
|
||||||
- [Open WebUI](#open-webui-1)
|
- [Open WebUI](#open-webui-1)
|
||||||
- [OpenedAI Speech](#openedai-speech-1)
|
- [OpenedAI Speech](#openedai-speech-1)
|
||||||
- [Updating](#updating)
|
- [Updating](#updating)
|
||||||
- [General](#general)
|
- [General](#general-1)
|
||||||
- [Nvidia Drivers \& CUDA](#nvidia-drivers--cuda)
|
- [Nvidia Drivers \& CUDA](#nvidia-drivers--cuda)
|
||||||
- [Ollama](#ollama-1)
|
- [Ollama](#ollama-2)
|
||||||
- [Open WebUI](#open-webui-2)
|
- [Open WebUI](#open-webui-2)
|
||||||
- [OpenedAI Speech](#openedai-speech-2)
|
- [OpenedAI Speech](#openedai-speech-2)
|
||||||
- [Troubleshooting](#troubleshooting)
|
- [Troubleshooting](#troubleshooting)
|
||||||
- [`ssh`](#ssh-1)
|
- [`ssh`](#ssh-1)
|
||||||
- [Nvidia Drivers](#nvidia-drivers)
|
- [Nvidia Drivers](#nvidia-drivers)
|
||||||
- [Ollama](#ollama-2)
|
- [Ollama](#ollama-3)
|
||||||
- [Open WebUI](#open-webui-3)
|
- [Open WebUI](#open-webui-3)
|
||||||
- [OpenedAI Speech](#openedai-speech-3)
|
- [OpenedAI Speech](#openedai-speech-3)
|
||||||
- [Monitoring](#monitoring)
|
- [Monitoring](#monitoring)
|
||||||
@@ -91,75 +100,78 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
|
|||||||
|
|
||||||
## Essential Setup
|
## Essential Setup
|
||||||
|
|
||||||
1. ### Update the system
|
### General
|
||||||
- Run the following commands:
|
Update the system by running the following commands:
|
||||||
```
|
```
|
||||||
sudo apt update
|
sudo apt update
|
||||||
sudo apt upgrade
|
sudo apt upgrade
|
||||||
```
|
```
|
||||||
|
|
||||||
2. ### Install drivers
|
### Drivers
|
||||||
- #### Nvidia
|
|
||||||
- Follow Nvidia's [guide on downloading CUDA Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian). The instructions are specific to your machine and the website will lead you to them interactively.
|
Next, we'll install the GPU drivers that allow programs to use the GPU's compute capabilities.
|
||||||
- Run the following commands:
|
|
||||||
|
#### Nvidia GPUs
|
||||||
|
- Follow Nvidia's [guide on downloading CUDA Toolkit](https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Debian). The instructions are specific to your machine and the website will lead you to them interactively.
|
||||||
|
- Run the following commands:
|
||||||
```
|
```
|
||||||
sudo apt install linux-headers-amd64
|
sudo apt install linux-headers-amd64
|
||||||
sudo apt install nvidia-driver firmware-misc-nonfree
|
sudo apt install nvidia-driver firmware-misc-nonfree
|
||||||
```
|
```
|
||||||
- Reboot the server.
|
- Reboot the server.
|
||||||
- Run the following command to verify the installation:
|
- Run the following command to verify the installation:
|
||||||
```
|
```
|
||||||
nvidia-smi
|
nvidia-smi
|
||||||
```
|
```
|
||||||
|
|
||||||
- #### AMD
|
#### AMD GPUs
|
||||||
- Run the following commands:
|
- Run the following commands:
|
||||||
```
|
```
|
||||||
deb http://deb.debian.org/debian bookworm main contrib non-free-firmware
|
deb http://deb.debian.org/debian bookworm main contrib non-free-firmware
|
||||||
apt-get install firmware-amd-graphics libgl1-mesa-dri libglx-mesa0 mesa-vulkan-drivers xserver-xorg-video-all
|
apt-get install firmware-amd-graphics libgl1-mesa-dri libglx-mesa0 mesa-vulkan-drivers xserver-xorg-video-all
|
||||||
```
|
```
|
||||||
- Reboot the server.
|
- Reboot the server.
|
||||||
|
|
||||||
3. ### Install `ollama`
|
### Ollama
|
||||||
|
|
||||||
Ollama, a wrapper around `llama.cpp`, provides the inference engine that runs the language models you will download. It'll be installed as a service, so it runs automatically at boot.
|
Ollama, a wrapper around `llama.cpp`, provides the inference engine that runs the language models you will download. It'll be installed as a service, so it runs automatically at boot.
|
||||||
|
|
||||||
- Download `ollama` from the official repository:
|
- Download `ollama` from the official repository:
|
||||||
```
|
```
|
||||||
curl -fsSL https://ollama.com/install.sh | sh
|
curl -fsSL https://ollama.com/install.sh | sh
|
||||||
```
|
```
|
||||||
|
|
||||||
We want our API endpoint to be reachable by the rest of the LAN. For `ollama`, this means setting `OLLAMA_HOST=0.0.0.0` in the `ollama.service`.
|
We want our API endpoint to be reachable by the rest of the LAN. For `ollama`, this means setting `OLLAMA_HOST=0.0.0.0` in the `ollama.service`.
|
||||||
|
|
||||||
- Run the following command to edit the service:
|
- Run the following command to edit the service:
|
||||||
```
|
```
|
||||||
systemctl edit ollama.service
|
systemctl edit ollama.service
|
||||||
```
|
```
|
||||||
- Find the `[Service]` section and add `Environment="OLLAMA_HOST=0.0.0.0"` under it. It should look like this:
|
- Find the `[Service]` section and add `Environment="OLLAMA_HOST=0.0.0.0"` under it. It should look like this:
|
||||||
```
|
```
|
||||||
[Service]
|
[Service]
|
||||||
Environment="OLLAMA_HOST=0.0.0.0"
|
Environment="OLLAMA_HOST=0.0.0.0"
|
||||||
```
|
```
|
||||||
- Save and exit.
|
- Save and exit.
|
||||||
- Reload the environment.
|
- Reload the environment.
|
||||||
```
|
```
|
||||||
systemctl daemon-reload
|
systemctl daemon-reload
|
||||||
systemctl restart ollama
|
systemctl restart ollama
|
||||||
```
|
```
|
||||||
|
|
||||||
> [!TIP]
|
> [!TIP]
|
||||||
> If you installed `ollama` manually or don't use it as a service, remember to run `ollama serve` to properly start the server. Refer to [Ollama's troubleshooting steps](#ollama-2) if you encounter an error.
|
> If you installed `ollama` manually or don't use it as a service, remember to run `ollama serve` to properly start the server. Refer to [Ollama's troubleshooting steps](#ollama-3) if you encounter an error.
|
||||||
|
|
||||||
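To confirm that the `OLLAMA_HOST` change took effect, the endpoint can be queried from another machine on the LAN. The following is a minimal sketch, assuming Ollama's default port `11434` and its `/api/tags` model-listing route; the helper name `ollama_tags_url` and the address `192.168.1.50` are placeholders and not part of this guide.

```shell
#!/bin/sh
# Hypothetical helper: build the URL of Ollama's model-listing endpoint.
# 11434 is Ollama's default port; pass a second argument to override it.
ollama_tags_url() {
    host="$1"
    port="${2:-11434}"
    printf 'http://%s:%s/api/tags\n' "$host" "$port"
}

# Usage, from another machine on the LAN (replace the address with the server's):
#   curl -fsS "$(ollama_tags_url 192.168.1.50)"
```

If the request succeeds from a second machine, the service is listening on all interfaces. If it only works on the server itself, the override was likely not applied or a firewall is blocking the port.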
4. ### Create the `init.bash` script
|
### Startup Script
|
||||||
|
|
||||||
This script will be run at boot to set the GPU power limit and start the server using `ollama`. We set the GPU power limit lower because it has been seen in testing and inference that there is only a 5-15% performance decrease for a 30% reduction in power consumption. This is especially important for servers that are running 24/7.
|
In this step, we'll create a script called `init.bash`. This script runs at boot to set the GPU power limit and start the server using `ollama`. We lower the GPU power limit because testing showed only a 5-15% performance decrease for a 30% reduction in power consumption, which matters most for servers that run 24/7.
|
||||||
|
|
||||||
- Run the following commands:
|
- Run the following commands:
|
||||||
```
|
```
|
||||||
touch init.bash
|
touch init.bash
|
||||||
nano init.bash
|
nano init.bash
|
||||||
```
|
```
|
||||||
- Add the following lines to the script:
|
- Add the following lines to the script:
|
||||||
```
|
```
|
||||||
#!/bin/bash
|
#!/bin/bash
|
||||||
sudo nvidia-smi -pm 1
|
sudo nvidia-smi -pm 1
|
||||||
@@ -172,71 +184,71 @@ I also recommend installing a lightweight desktop environment like XFCE for ease
|
|||||||
sudo nvidia-smi -i 0 -pl (power_limit)
|
sudo nvidia-smi -i 0 -pl (power_limit)
|
||||||
sudo nvidia-smi -i 1 -pl (power_limit)
|
sudo nvidia-smi -i 1 -pl (power_limit)
|
||||||
```
|
```
|
||||||
- Save and exit the script.
|
- Save and exit the script.
|
||||||
- Make the script executable:
|
- Make the script executable:
|
||||||
```
|
```
|
||||||
chmod +x init.bash
|
chmod +x init.bash
|
||||||
```
|
```
|
||||||
|
|
||||||
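The `(power_limit)` placeholder in `init.bash` is a wattage. One way to pick a value is to scale the card's default limit, for example to 70% for the 30% reduction mentioned above. The sketch below is only an illustration: the helper name `scaled_power_limit` is hypothetical, and it assumes the default limit is read with `nvidia-smi --query-gpu=power.default_limit`.

```shell
#!/bin/sh
# Hypothetical helper: scale a wattage by a percentage, rounding down.
scaled_power_limit() {
    default_watts="$1"   # whole watts, e.g. 350
    percent="$2"         # e.g. 70 for a 30% reduction
    echo $(( default_watts * percent / 100 ))
}

# Usage on the server (assumes GPU index 0; the query prints a value like 350.00,
# so the fractional part is stripped before doing integer arithmetic):
#   default=$(nvidia-smi -i 0 --query-gpu=power.default_limit --format=csv,noheader,nounits)
#   sudo nvidia-smi -i 0 -pl "$(scaled_power_limit "${default%.*}" 70)"
```

The resulting number must still fall within the range the card accepts, which `nvidia-smi -q -d POWER` reports as the minimum and maximum power limits.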
5. ### Add `init.bash` to the crontab
|
### Scheduling Startup Script
|
||||||
|
|
||||||
Adding the `init.bash` script to the crontab will schedule it to run at boot.
|
Adding the `init.bash` script to the crontab will schedule it to run at boot.
|
||||||
|
|
||||||
- Run the following command:
|
- Run the following command:
|
||||||
```
|
```
|
||||||
crontab -e
|
crontab -e
|
||||||
```
|
```
|
||||||
- Add the following line to the file:
|
- Add the following line to the file:
|
||||||
```
|
```
|
||||||
@reboot /path/to/init.bash
|
@reboot /path/to/init.bash
|
||||||
```
|
```
|
||||||
> Replace `/path/to/init.bash` with the path to the `init.bash` script.
|
> Replace `/path/to/init.bash` with the path to the `init.bash` script.
|
||||||
|
|
||||||
- (Optional) Add the following line to shut down the server at midnight:
|
- (Optional) Add the following line to shut down the server at midnight:
|
||||||
```
|
```
|
||||||
0 0 * * * /sbin/shutdown -h now
|
0 0 * * * /sbin/shutdown -h now
|
||||||
```
|
```
|
||||||
- Save and exit the file.
|
- Save and exit the file.
|
||||||
|
|
||||||
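The crontab entries above can also be added without opening an editor, which is handy when setting up the server over a script. This is a sketch under the assumption that piping to `crontab -` is acceptable on your system; the helper name `with_cron_entry` is hypothetical. It appends an entry only if an identical line is not already present, so running it twice does not create duplicates.

```shell
#!/bin/sh
# Hypothetical helper: read a crontab on stdin and print it with the
# given entry appended, unless an identical line is already present.
with_cron_entry() {
    entry="$1"
    awk -v e="$entry" '
        { print }                      # pass existing lines through unchanged
        $0 == e { found = 1 }          # remember if the entry already exists
        END { if (!found) print e }    # append it only when missing
    '
}

# Usage (replace the path with the real location of init.bash):
#   crontab -l 2>/dev/null | with_cron_entry "@reboot /path/to/init.bash" | crontab -
```

Afterwards, `crontab -l` should list the `@reboot` line exactly once.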
6. ### Give `nvidia-persistenced` and `nvidia-smi` passwordless `sudo` permissions
|
### Configuring Script Permissions
|
||||||
|
|
||||||
We want `init.bash` to run the `nvidia-smi` commands without having to enter a password. This is done by editing the `sudoers` file.
|
We want `init.bash` to run the `nvidia-smi` commands without having to enter a password. To do this, give `nvidia-persistenced` and `nvidia-smi` passwordless `sudo` permissions by editing the `sudoers` file.
|
||||||
|
|
||||||
AMD users can skip this step as power limiting is not supported on AMD GPUs.
|
AMD users can skip this step as power limiting is not supported on AMD GPUs.
|
||||||
|
|
||||||
- Run the following command:
|
- Run the following command:
|
||||||
```
|
```
|
||||||
sudo visudo
|
sudo visudo
|
||||||
```
|
```
|
||||||
- Add the following lines to the file:
|
- Add the following lines to the file:
|
||||||
```
|
```
|
||||||
(username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-persistenced
|
(username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-persistenced
|
||||||
(username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi
|
(username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi
|
||||||
```
|
```
|
||||||
> Replace `(username)` with your username.
|
> Replace `(username)` with your username.
|
||||||
|
- Save and exit the file.
|
||||||
|
|
||||||
> [!IMPORTANT]
|
> [!IMPORTANT]
|
||||||
> Ensure that you add these lines AFTER `%sudo ALL=(ALL:ALL) ALL`. The order of the lines in the file matters - the last matching line will be used so if you add these lines before `%sudo ALL=(ALL:ALL) ALL`, they will be ignored.
|
> Ensure that you add these lines AFTER `%sudo ALL=(ALL:ALL) ALL`. The order of the lines in the file matters - the last matching line will be used so if you add these lines before `%sudo ALL=(ALL:ALL) ALL`, they will be ignored.
|
||||||
- Save and exit the file.
|
|
||||||
|
|
||||||
7. ### Configure auto-login
|
### Configuring Auto-Login
|
||||||
|
|
||||||
When the server boots up, we want it to automatically log in to a user account and run the `init.bash` script. This is done by configuring the `lightdm` display manager.
|
When the server boots up, we want it to automatically log in to a user account and run the `init.bash` script. This is done by configuring the `lightdm` display manager.
|
||||||
|
|
||||||
- Run the following command:
|
- Run the following command:
|
||||||
```
|
```
|
||||||
sudo nano /etc/lightdm/lightdm.conf
|
sudo nano /etc/lightdm/lightdm.conf
|
||||||
```
|
```
|
||||||
- Find the following commented line. It should be in the `[Seat:*]` section.
|
- Find the following commented line. It should be in the `[Seat:*]` section.
|
||||||
```
|
```
|
||||||
# autologin-user=
|
# autologin-user=
|
||||||
```
|
```
|
||||||
- Uncomment the line and add your username:
|
- Uncomment the line and add your username:
|
||||||
```
|
```
|
||||||
autologin-user=(username)
|
autologin-user=(username)
|
||||||
```
|
```
|
||||||
> Replace `(username)` with your username.
|
> Replace `(username)` with your username.
|
||||||
- Save and exit the file.
|
- Save and exit the file.
|
||||||
|
|
||||||
## Additional Setup
|
## Additional Setup
|
||||||
|
|
||||||