commit 40b28b6461b82d6b76c1a7c682cab42d69e7994b
Author: Varun Vasudeva
Date:   Tue Mar 26 14:36:34 2024 -0500

    first commit

diff --git a/README.md b/README.md
new file mode 100644
index 0000000..a0f85dc
--- /dev/null
+++ b/README.md
@@ -0,0 +1,250 @@
+# Local LLaMA Server Setup Documentation
+
+_TL;DR_: A guide to setting up a server for running local language models using `ollama`.
+
+## About
+
+This repository outlines the steps to set up a server for running local language models. It uses Debian specifically, but the process should be very similar on most Linux distros. It aims to be a guide for Linux beginners like me who are setting up a server for the first time.
+
+The process involves installing the NVIDIA drivers, setting the GPU power limit, and configuring the server to run `ollama` at boot. It also includes setting up auto-login and scheduling the `init.bash` script to run at boot. These settings reflect my ideal setup for a language model server that runs most of the day, but a lot can be customized to suit your needs. For example, you can use any OpenAI-compatible server like `llama.cpp` or LM Studio instead of `ollama`.
+
+## System Requirements
+
+This guide assumes a system with one or more Nvidia GPUs and an Intel CPU. The process should be identical for an AMD CPU, but I haven't verified this.
+
+The same cannot be said for AMD GPUs. Specifically for `ollama`, [there's work being done](https://github.com/ollama/ollama/issues/738#issuecomment-1974874171) to support ROCm, but it isn't yet stable enough for a reliable server. However, used Nvidia cards are a good deal nowadays (as of March 2024).
+
+This guide was built around the following system:
+- CPU: Intel Core i5-12600KF
+- Memory: 32GB 6000 MHz DDR5 RAM
+- Storage: 1TB M.2 NVMe SSD
+- GPU: Nvidia RTX 3090 24GB
+
+## Steps
+
+1. #### Install Debian on the server
+   - Download the [Debian ISO](https://www.debian.org/distrib/) from the official website.
+   - Create a bootable USB using a tool like [Rufus](https://rufus.ie/en/) for Windows or [Balena Etcher](https://etcher.balena.io) for macOS.
+   - Boot into the USB and install Debian.
+
+2. #### Update the system
+   - Run the following commands (`apt update` only refreshes the package lists; `apt upgrade` actually installs the updates):
+     ```
+     sudo apt update
+     sudo apt upgrade
+     ```
+
+3. #### Install the NVIDIA drivers
+   - Ensure the `contrib`, `non-free`, and `non-free-firmware` components are enabled in `/etc/apt/sources.list`; the `nvidia-driver` package is not in the default `main` repository (see the [Debian wiki](https://wiki.debian.org/NvidiaGraphicsDrivers) for details).
+   - Run the following commands:
+     ```
+     sudo apt install linux-headers-amd64
+     sudo apt install nvidia-driver firmware-misc-nonfree
+     ```
+   - Reboot the server.
+   - Run the following command to verify the installation:
+     ```
+     nvidia-smi
+     ```
+
+4. #### Install `ollama`
+   - Download and install `ollama` using the official install script:
+     ```
+     curl -fsSL https://ollama.com/install.sh | sh
+     ```
+   - (Recommended) We want our LLM API endpoint to be reachable by the rest of the LAN. For `ollama`, this means setting `OLLAMA_HOST=0.0.0.0` in `ollama.service`.
+   - Run the following command to edit the service:
+     ```
+     sudo systemctl edit ollama.service
+     ```
+   - Add `Environment="OLLAMA_HOST=0.0.0.0"` under a `[Service]` section, so the override looks like this:
+     ```
+     [Service]
+     Environment="OLLAMA_HOST=0.0.0.0"
+     ```
+   - Save and exit.
+   - Reload systemd and restart `ollama`:
+     ```
+     sudo systemctl daemon-reload
+     sudo systemctl restart ollama
+     ```
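+   - To confirm the endpoint is reachable from elsewhere on the LAN, you can hit the API's model-listing route from another machine (a quick sanity check, not part of the original steps; replace `(ip address)` with your server's address):
+     ```
+     curl http://(ip address):11434/api/tags
+     ```
+     > Any JSON response, even an empty model list, means `ollama` is listening on the network interface.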
+5. #### Create the `init.bash` script
+
+   This script will be run at boot to set the GPU power limit and prepare the model server (`ollama` here, but the same idea applies to `llama.cpp`). We set the GPU power limit lower than stock because testing has shown only a 5-15% decrease in inference performance for a 50% reduction in power consumption. This is especially important for servers that run 24/7.
+
+   - Run the following commands:
+     ```
+     touch init.bash
+     nano init.bash
+     ```
+   - Add the following lines to the script:
+     ```
+     #!/bin/bash
+     # Enable persistence mode so the power limit sticks
+     sudo nvidia-smi -pm 1
+     # Cap the GPU's power draw
+     sudo nvidia-smi -pl (power limit)
+     # Preload the model; the empty prompt makes `ollama run` exit immediately
+     # instead of opening an interactive session
+     ollama run (model) ""
+     ```
+     > Replace `(power limit)` with the desired power limit in watts. For example, `sudo nvidia-smi -pl 250`.
+
+     > Replace `(model)` with the name of the model you want to run. For example, `ollama run mistral:latest ""`.
+
+     > Note that `ollama serve` is not needed here: the installer in step 4 set up a systemd service that already starts the server at boot, and a second instance would fail because the port is taken.
+   - Save and exit the script.
+   - Make the script executable:
+     ```
+     chmod +x init.bash
+     ```
+
+6. #### Give `nvidia-persistenced` and `nvidia-smi` passwordless sudo permissions
+
+   We want `init.bash` to run the `nvidia-smi` commands without having to enter a password. This is done by editing the `sudoers` file.
+
+   - Run the following command:
+     ```
+     sudo visudo
+     ```
+   - Add the following lines to the file:
+     ```
+     (username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-persistenced
+     (username) ALL=(ALL) NOPASSWD: /usr/bin/nvidia-smi
+     ```
+     > Replace `(username)` with your username.
+
+     > **IMPORTANT**: Ensure that you add these lines AFTER `%sudo ALL=(ALL:ALL) ALL`. The order of the lines in the file matters: the last matching line wins, so if you add these lines before `%sudo ALL=(ALL:ALL) ALL`, they will be overridden.
+   - Save and exit the file.
+
+7. #### Configure auto-login
+
+   When the server boots up, we want it to automatically log in to a user account and run the `init.bash` script. This is done by configuring the `lightdm` display manager.
+
+   - Run the following command:
+     ```
+     sudo nano /etc/lightdm/lightdm.conf
+     ```
+   - Find the following commented line. It should be in the `[Seat:*]` section.
+     ```
+     # autologin-user=
+     ```
+   - Uncomment the line and add your username:
+     ```
+     autologin-user=(username)
+     ```
+     > Replace `(username)` with your username.
+   - Save and exit the file.
+
+8. #### Add `init.bash` to the crontab
+
+   Adding the `init.bash` script to the crontab will schedule it to run at boot.
+
+   - Run the following command:
+     ```
+     crontab -e
+     ```
+   - Add the following line to the file:
+     ```
+     @reboot /path/to/init.bash
+     ```
+     > Replace `/path/to/init.bash` with the path to the `init.bash` script.
+   - (Optional) Add the following line to shut down the server at 12am:
+     ```
+     0 0 * * * /sbin/shutdown -h now
+     ```
+     > `shutdown` requires root privileges, so put this line in root's crontab (`sudo crontab -e`) rather than your user's.
+   - Save and exit the file.
+
+9. #### (Optional) Enable SSH
+
+   Enabling SSH allows you to connect to the server remotely.
+
+   On the server:
+   - Install the OpenSSH server:
+     ```
+     sudo apt install openssh-server
+     ```
+   - Start the SSH service:
+     ```
+     sudo systemctl start ssh
+     ```
+   - Enable the SSH service to start at boot:
+     ```
+     sudo systemctl enable ssh
+     ```
+   - Find the server's IP address:
+     ```
+     ip a
+     ```
+
+   On the client:
+   - Connect to the server using SSH:
+     ```
+     ssh (username)@(ip address)
+     ```
+     > Replace `(username)` with your username and `(ip address)` with the server's IP address.
+
+   If you expect to tunnel into your server often, I recommend following [this guide](https://www.raspberrypi.com/documentation/computers/remote-access.html#configure-ssh-without-a-password) to enable passwordless SSH using `ssh-keygen` and `ssh-copy-id` (the core commands are sketched below). It worked perfectly on my Debian system despite having been written for Raspberry Pi OS.
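+
+   For reference, the heart of that guide is two commands run on the client; this is a sketch using default key settings, with the same placeholders as above:
+   ```
+   # Generate a key pair (accept the default file location; a passphrase is optional)
+   ssh-keygen
+   # Append the public key to the server's ~/.ssh/authorized_keys;
+   # subsequent logins skip the password prompt
+   ssh-copy-id (username)@(ip address)
+   ```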
+
+## Accessing Ollama
+
+Accessing `ollama` on the server itself is trivial. Simply run:
+```
+curl http://localhost:11434/api/generate -d '{
+  "model": "llama2",
+  "prompt": "Why is the sky blue?"
+}'
+```
+> Replace `llama2` with your preferred model.
+
+Assuming the `OLLAMA_HOST` environment variable has been set to `0.0.0.0`, accessing `ollama` from anywhere on the network is still trivial! Simply replace `localhost` with your server's IP.
+
+Refer to [Ollama's REST API docs](https://github.com/ollama/ollama/blob/main/docs/api.md) for more information on the entire API.
+
+## Troubleshooting
+
+- Disable Secure Boot in the BIOS if the Nvidia drivers aren't working. In my case, all packages were at their latest versions and `nvidia-detect` found my GPU correctly, but `nvidia-smi` kept returning the `NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver` error. [Disabling Secure Boot](https://askubuntu.com/a/927470) fixed this for me. Better practice than disabling Secure Boot is to sign the Nvidia drivers yourself, but I didn't want to go through that process for a non-critical server that can afford to have Secure Boot disabled.
+- If you receive the `Failed to open "/etc/systemd/system/ollama.service.d/.#override.confb927ee3c846beff8": Permission denied` error from Ollama after running `systemctl edit ollama.service`, creating the override file manually eliminates it. Use the following steps to edit the file.
+  - Run:
+    ```
+    sudo mkdir -p /etc/systemd/system/ollama.service.d
+    sudo nano /etc/systemd/system/ollama.service.d/override.conf
+    ```
+  - Retry the remaining steps.
+- If you still can't connect to your API endpoint, check your firewall settings. [This guide to UFW (Uncomplicated Firewall) on Debian](https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-with-ufw-on-debian-10) is a good resource.
+
+## Notes
+
+- This is my first foray into setting up a server and working with Linux, so there may be better ways to do some of these steps. I will update this repository as I learn more.
+- I chose Debian because it is, apparently, one of the most stable Linux distros. I also went with an XFCE desktop environment because it is lightweight and I wasn't yet comfortable going full command line.
+- The power draw of my EVGA FTW3 Ultra RTX 3090 was 350W at stock settings. I set the power limit to 250W and the performance decrease was negligible for my use case, which is primarily code completion in VS Code and Q&A via chat (see the monitoring snippet below to measure this yourself).
+- Use a regular user account for auto-login; don't log in as root.
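+
+To watch the effect of the power cap yourself, `nvidia-smi` can poll the relevant sensors (a minimal sketch; the available fields are listed by `nvidia-smi --help-query-gpu`):
+```
+# Print power draw, GPU utilization, and temperature every 5 seconds
+nvidia-smi --query-gpu=power.draw,utilization.gpu,temperature.gpu --format=csv -l 5
+```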
+
+## References
+
+Downloading Nvidia drivers:
+- https://wiki.debian.org/NvidiaGraphicsDrivers
+
+Secure Boot:
+- https://askubuntu.com/a/927470
+
+Monitoring GPU usage, power draw:
+- https://unix.stackexchange.com/questions/38560/gpu-usage-monitoring-cuda/78203#78203
+
+Passwordless `sudo`:
+- https://stackoverflow.com/questions/25215604/use-sudo-without-password-inside-a-script
+- https://www.reddit.com/r/Fedora/comments/11lh9nn/set_nvidia_gpu_power_and_temp_limit_on_boot/
+- https://askubuntu.com/questions/100051/why-is-sudoers-nopasswd-option-not-working
+
+Auto-login:
+- https://forums.debian.net/viewtopic.php?t=149849
+- https://wiki.archlinux.org/title/LightDM#Enabling_autologin
+
+Expose Ollama to LAN:
+- https://github.com/ollama/ollama/blob/main/docs/faq.md#setting-environment-variables-on-linux
+- https://github.com/ollama/ollama/issues/703
+
+Firewall:
+- https://www.digitalocean.com/community/tutorials/how-to-set-up-a-firewall-with-ufw-on-debian-10
+
+Passwordless `ssh`:
+- https://www.raspberrypi.com/documentation/computers/remote-access.html#configure-ssh-without-a-password
+
+Docs:
+- https://github.com/ollama/ollama/blob/main/docs/api.md
\ No newline at end of file