Private · on-device · open source
Run your own AI, entirely on your Mac
A beginner-friendly guide to setting up an offline AI chatbot that runs on your own hardware — no cloud, no subscription, and nothing you type ever leaves your computer.
01
Using this guide with an AI assistant
This document was designed to work as a context file you hand to any AI assistant before starting setup. Paste the prompt below along with this file and it'll guide you through the whole process:
"I want to set up a local, private AI on my Mac following the guide I've attached. Please read the full document first and review it for accuracy and logical consistency — flag anything that seems off, outdated, or contradictory. Then ask me a few questions about my setup — specifically my Mac model, chip, and how much RAM I have — before we start. Guide me through each step one at a time, flag any gotchas before we reach them, and confirm with me that each step worked before moving on."
Having this context upfront saves significant time. A lot of the back-and-forth in a typical setup session comes from an AI assistant not knowing what decisions have already been made, what hardware is involved, or what pitfalls exist. This document answers most of those questions in advance.
02
Who this is for
You have an Apple Silicon Mac (any M-series chip — M1 through M4) and you'd like to run an AI chatbot entirely on your own machine. No cloud, no accounts, no monthly subscription, and — most importantly — no data ever leaving your computer.
You don't need to be a developer. This guide assumes you're comfortable clicking around your Mac but maybe a little nervous about the Terminal. We'll walk through that together.
Reasons you might want this:
- Privacy. Process sensitive material — personal messages, private documents, work files — without sending it to anyone's servers.
- No cost. Free forever after setup. No per-message charges, no subscriptions.
- No limits. Use it as much as you want.
- Works offline. Once models are downloaded, no internet required.
- Curiosity. It's a genuinely satisfying way to understand how this technology actually works.
The tradeoff: models that run on your own computer are smaller and less capable than the big cloud models (like ChatGPT or Claude). They're still remarkably good — just not frontier-level. And you do the setup yourself, which is what this guide is for.
I built this on a Mac Mini M4 with 16GB of RAM. You don't need this exact machine — any Apple Silicon Mac works. The main thing that matters is how much RAM you have, because that determines how large a model you can run. More RAM = bigger, smarter models.
03
Why run AI locally?
When you use a cloud AI service, your words travel to a company's servers, get processed there, and the response comes back. That's fine for plenty of things — but for anything sensitive, that data has left your control.
A local AI flips this completely. The "brain" lives on your machine. Nothing you type ever leaves. That's the whole appeal: total privacy, total control, zero ongoing cost.
04
A quick Terminal primer
The Terminal is a text-based way to control your Mac by typing commands instead of clicking. It looks intimidating but it's just a different door into the same house.
To open it: press Cmd + Space (Spotlight), type "Terminal," and press Enter. A window opens with a line ending in % — that's the prompt, waiting for you to type.
A few concepts that come up in this guide:
- You type a command, then press Enter to run it. Nothing happens until you press Enter.
cdmeans "change directory" — it moves you to a different folder.cd ~always takes you back to your home folder, a safe default starting point. (~is shorthand for "home.")>>means "add this line to the end of a file." (A single>would overwrite the whole file —>>is the safe version.)- A
#symbol starts a comment — a note for humans that the computer ignores. If you copy a command that has a# notestuck on the end, the Terminal can get confused. More on this in the Gotchas. - A running program can look "frozen." If you start something that keeps running (like a server), the Terminal sits there quietly. That's normal — it's working, just not printing anything until there's activity.
That's enough to follow everything below.
05
How much RAM you need
RAM (your computer's working memory) is the single biggest factor in what you can run. AI models get loaded entirely into RAM to work, so bigger, smarter models require more of it.
On a 16GB Mac, your realistic budget looks like this:
| What's using memory | Typical amount |
|---|---|
| macOS + background services | ~2–3 GB |
| Apps you have open (browser, etc.) | ~2–3 GB |
| Left over for the AI model | ~10–12 GB |
A handy rule of thumb: a model takes up roughly 0.55 GB for every billion "parameters" it has (parameters are a rough measure of size and capability). So:
- A 4-billion-parameter model needs ~3.5 GB — fast and light.
- A 9-billion-parameter model needs ~7 GB — a great all-rounder on 16GB.
- A 14-billion-parameter model needs ~9.5 GB — highest quality, but close other apps first.
If you have more than 16GB, you can run larger models. If you have 8GB, stick to smaller (3–4B) models.
06
The big picture: three layers
This is the single most useful idea for understanding how everything fits together. A local AI setup has three distinct layers, each doing one job — and they all live inside the boundary of your own machine:
- The model is the AI itself — a large file containing the trained brain.
- The engine (Ollama) loads that file into memory and does the thinking.
- The interface is how you chat with it — a chat window or a browser page.
Why this matters: the layers are interchangeable. Swap models without touching the interface. Switch interfaces without re-downloading models. They all communicate through a standard local connection on your own machine.
07
The tool decisions
Here's why this guide uses the tools it does — understanding the reasoning helps you make your own choices later.
The engine: Ollama vs LM Studio vs GPT4All
All three run AI models locally and share the same underlying technology. The differences are in packaging and openness.
| Tool | What it is | Open source? | Best for |
|---|---|---|---|
| Ollama | Behind-the-scenes engine other apps connect to | ✅ Yes (MIT) | Flexibility; being the foundation |
| LM Studio | All-in-one app (engine + chat window) | ❌ No (closed) | Easiest point-and-click start |
| GPT4All | The original beginner-friendly local AI app | ✅ Yes | Simplicity — but now dated |
- GPT4All was great a few years ago but has fallen behind — it lacks features modern setups rely on and has become increasingly Windows-focused. Skip it.
- LM Studio is the most immediately beginner-friendly, with a native Mac app and a model browser. The catch: it's closed source (you can't inspect what it does) and collects anonymous analytics by default. Perfectly legitimate, but not ideal for a fully open, verifiable stack.
- Ollama is what we chose. Open source, the community standard backend that everything else plugs into.
Why "open source" matters here
A recurring theme in this setup was preferring open-source tools. The reasoning: when the whole goal is privacy, you want software the community can actually verify does what it claims. Closed-source tools are trustworthy companies — but you're taking their word for it. Open-source tools let you check.
The stack we landed on is open source throughout: Ollama + Open WebUI.
The interface: Ollama app vs Open WebUI
Installing the Ollama app gives you a built-in chat window immediately — model switching, conversation history, and file uploads included. That's a fully working setup on its own.
Open WebUI is a richer, browser-based interface that connects to the same Ollama engine underneath. The meaningful extras it adds are:
- Workspaces and folders — organize your chats into projects, similar to how Claude has Projects.
- Knowledge bases — build persistent, searchable document collections the AI can always reference.
- Saved prompts, skills, and tools — store and reuse your best instructions without retyping.
- Multi-model comparison — send the same prompt to multiple models simultaneously and compare responses side by side.
You can use either interface, or both — they don't conflict since both just talk to the same Ollama engine running underneath.
08
What is Homebrew?
A few of the commands below use Homebrew, so here's a quick explanation.
Homebrew is a package manager for the Mac — think of it as an app store you control from the Terminal. Instead of hunting down download links, you type brew install <name> and it fetches and installs the software for you. It's free, open source, and the standard tool for this on a Mac.
If you don't have it yet, install it by pasting this into Terminal and pressing Enter (it'll ask for your Mac password and take a few minutes):
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Homebrew installs two kinds of things — and the distinction matters:
- Formulae — command-line tools and background software (installed with
brew install). - Casks — full GUI apps with a normal window, like the ones in your Applications folder (installed with
brew install --cask).
You'll see both below.
09
What is uv?
uv is a fast, modern helper for running Python-based software (Open WebUI is built with Python). It keeps everything self-contained so it doesn't clutter or break the rest of your system.
You don't interact with it directly — install it once, and it works quietly in the background. The command we use, uvx, runs a program in a clean temporary environment and tidies up afterward. No manual Python management required.
10
The setup guide
Install Ollama via the cask (brew install --cask ollama), not the plain formula. There's a real reason — see Gotcha #1 below. Following these steps in order avoids that problem entirely.
Prerequisite: Homebrew
Make sure Homebrew is installed (see the section above). Open Terminal with Cmd + Space → "Terminal."
Path A: the simple setup — Ollama + its built-in chat
This gets you a working local AI with the fewest moving parts.
Step 1 — Install the Ollama app:
brew install --cask ollama
open /Applications/Ollama.app
A llama icon appears in your menu bar. Ollama is now running quietly in the background and will start automatically at every login. Verify it's working:
curl http://localhost:11434
# → should say: Ollama is running
Step 2 — Download a model:
ollama pull qwen3.5:4b
# downloads ~3.5 GB — a fast, light model to start with
Models save to ~/.ollama/models/ automatically. You never manage the files yourself.
Step 3 — Start chatting. Open the Ollama app from the menu bar. You now have a working, private, offline AI with a built-in chat window, conversation history, and file upload support. That's a complete setup. If this is all you need, you're done.
Path B: the richer setup — add Open WebUI
Open WebUI adds workspaces, knowledge bases, saved prompts, and multi-model comparison on top of the same Ollama engine. It runs in your browser.
They don't conflict — both talk to the same underlying Ollama engine. Running both is fine; use whichever suits the moment.
Step 4 — Install uv:
brew install uv
Step 5 — Start Open WebUI:
cd ~
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve
Wait until you see Application startup complete, then open http://localhost:8080 in your browser. You can bookmark this address — it never changes.
You'll be asked to create a local admin account. This is stored entirely on your machine. A made-up email is fine; nothing is sent anywhere.
Step 6 — Lock down privacy settings. In Open WebUI, go to Settings → Connections:
- Turn OFF the OpenAI API toggle — you're not using the cloud.
- Leave the Ollama API toggle ON — this is your local engine.
Step 7 — (Optional) Create a shortcut command:
echo 'alias startwebui="cd ~ && DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve"' >> ~/.zshrc
source ~/.zshrc
# now you can just type: startwebui
Which should you use?
| Ollama app (Path A) | Open WebUI (Path B) | |
|---|---|---|
| Simple chat with history | ✅ | ✅ |
| File uploads | ✅ | ✅ |
| Workspaces / folders (like Claude Projects) | ❌ | ✅ |
| Knowledge bases | ❌ | ✅ |
| Saved prompts, skills, tools | ❌ | ✅ |
| Compare multiple models side by side | ❌ | ✅ |
| Setup effort | Minimal | A bit more |
11
Choosing (and changing) models
We started with qwen3.5:4b — small and fast, good for testing. When you want more quality, step up:
ollama pull qwen3.5:9b # the recommended daily-driver on 16GB
ollama pull qwen3:14b # maximum quality; close other apps first
Solid choices for a 16GB Mac (speeds as of early June 2026):
| Model | Download command | RAM | Speed | Good for |
|---|---|---|---|---|
| Qwen3.5 4B | ollama pull qwen3.5:4b | ~3.5 GB | 38–48 wps | Fast tasks, testing |
| Qwen3.5 9B | ollama pull qwen3.5:9b | ~7 GB | 22–28 wps | Best all-rounder |
| Qwen3 14B | ollama pull qwen3:14b | ~9.5 GB | 10–14 wps | Highest quality |
| Gemma 4 E4B | ollama pull gemma4:e4b | ~4 GB | 35–45 wps | Efficient, handles images |
Where to explore and compare models:
- ollama.com/library — the full catalog of available models with sizes and details. Your first stop for finding something new.
- lmsys.org/chat — the Chatbot Arena, a crowd-sourced leaderboard where real users rate models head-to-head. Good for understanding how models compare on quality.
- reddit.com/r/LocalLLaMA — the most active community following local AI. New model releases, benchmarks, and real-world opinions show up here first.
The "best" local model changes constantly. New models come out monthly, and they leapfrog each other on quality and speed. The recommendations here are accurate as of early June 2026 but will date quickly. Don't treat any model as permanent — experiment. Download a few, try the same prompt in each, and keep the ones that work for you. Switching is cheap: models sit on disk, and only one loads into memory at a time.
12
Command cheat sheet
Managing models
ollama list # show all downloaded models
ollama pull <model> # download a model from the library
ollama run <model> # chat with a model directly in Terminal
ollama rm <model> # delete a model you no longer want
ollama ps # show which model is currently loaded in memory
du -sh ~/.ollama/models/ # check total disk space used by models
curl http://localhost:11434 # verify Ollama is running
Starting Open WebUI
# If you set up the alias (Step 7 above):
startwebui
# If you haven't set up the alias yet, use the full command:
cd ~
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve
# Then open your browser and go to:
# http://localhost:8080 (bookmark this — the address never changes)
Tips for chatting with local models
/no_think # append to end of any message to skip slow reasoning mode
/set parameter think false # turn off thinking mode for the whole session
/bye # exit the chat
/? # show all available commands
/show info # show details about the currently loaded model
A few things worth knowing as you use your local AI day to day:
- The first reply after switching models is slow (~10–30 seconds) — the model is loading into memory. Replies after that are much faster. This is normal.
- Only one model runs at a time. Switching models unloads the current one before loading the next.
- More specific prompts get better results. Local models are smaller than cloud models — clear, detailed instructions help them more.
13
Gotchas — the mistakes worth knowing about
These are things that actually went wrong during setup. They're the most useful part of this guide.
Gotcha #1 — Installing the wrong version of Ollama
The first attempt used the plain formula: brew install ollama. This is the standard, normal way to install command-line tools with Homebrew — it's what most guides suggest, and it installed without any complaint or warning.
But when it came time to actually chat, it failed with: llama-server binary not found. The Homebrew formula for Ollama version 0.30.x shipped with a key file missing — a known packaging bug, not a user error.
The fix was to uninstall the formula and use the cask instead, which installs the full official Ollama app with all binaries included:
brew services stop ollama # stop the broken background version
brew uninstall ollama # remove the formula
brew install --cask ollama # install the full app
All downloaded models were safe throughout — they live in ~/.ollama/models/, which is completely separate from the Ollama binary and wasn't touched by the reinstall.
The lesson: on a Mac, always use brew install --cask ollama. The cask is the proper, official release. The plain formula was the trap — and its failure wasn't obvious until inference was attempted.
This guide starts you with the cask from the beginning, so you won't hit this.
Gotcha #2 — A secret file landed in a project folder
When Open WebUI starts, it creates a small secret key file used to secure your login session. The problem: it saves this file into whatever folder you're in when you run the start command.
If you happen to run it from inside a project folder that syncs to the internet (like a GitHub repo), that private key file could get uploaded publicly. Two safeguards were set up to prevent this.
Understanding what the fix commands actually do: the >> operator in both commands means "append this line to the end of a file." Two separate mechanisms are at work:
Safeguard 1 — prevent the file from being created at all:
echo 'export WEBUI_SECRET_KEY="'$(openssl rand -hex 32)'"' >> ~/.zshrc
source ~/.zshrc
~/.zshrcis your Terminal's startup settings file — anything written there runs automatically every time you open a new Terminal.- This appends a line that permanently sets
WEBUI_SECRET_KEYas an environment variable. Theopenssl rand -hex 32part generates a random secure value for it. - Because the key is now always available as a setting, Open WebUI reads it from there and never needs to create a file. Problem solved at the source.
Safeguard 2 — a backup net in case a key file ever appears anyway:
echo '.webui_secret_key' >> .gitignore
echo '*.secret_key' >> .gitignore
.gitignoreis a list that Git reads to know which files to never track or upload to the internet.- These lines add secret-key filename patterns to that list.
- So even if such a file got created in a synced project, Git would silently ignore it and it would never be committed or pushed.
This isn't the Terminal "hiding" files — it's two distinct mechanisms working together. The environment variable (Safeguard 1) stops the file from being created at all. The .gitignore entries (Safeguard 2) tell Git to ignore it as a fallback. Belt and suspenders.
The simple takeaway: always run cd ~ before starting Open WebUI. Starting from your home folder means any stray file lands somewhere harmless, not inside a project.
Gotcha #3 — A copied command had a # comment that broke it
At one point a suggested command had an annotation tacked on the end:
some-command # new
Running it produced: Got unexpected extra arguments (# new). The Terminal tried to treat the annotation as part of the command.
Worth noting: this was an AI assistant (Claude) adding a # comment to its own suggested code as a helpful label. The comment was for the human — the Terminal didn't appreciate it.
The lesson: when copying a command from anywhere — a guide, a tutorial, or an AI assistant — strip off everything after the # before running it. The actual command is only what comes before.
Gotcha #4 — A simple question took almost two minutes
The first test question took 105 seconds to answer. The model wasn't broken — it was in "thinking mode," reasoning step-by-step through a question that didn't need it. (Qwen3.5 models default to extended thinking.)
The fix: append /no_think to your message. Response times drop to a few seconds for simple questions. See the cheat sheet above. Leave thinking mode on for genuinely hard problems where careful reasoning matters.
Gotcha #5 — The Terminal must stay open
When Open WebUI runs via the startwebui command, the Terminal window hosting it must stay open. Close it and the server stops; the browser interface goes offline.
That Terminal will look "frozen" — that's normal. It's only quiet because nothing is happening; it prints when requests come in.
The fix: open a second Terminal tab with Cmd + T for any other work, and leave the first tab running the server.
14
Privacy & security notes
A summary of who can see what in this stack:
| Layer | Who makes it | Can they see your chats? | Action |
|---|---|---|---|
| Model (Qwen, Gemma, etc.) | Open source | No — runs on your machine | — |
| Ollama (engine) | Open source (MIT) | No | — |
| Open WebUI (interface) | Open source (MIT) | No | — |
| LM Studio (if you use it) | Closed source | No (chats stay local) | Disable analytics in Settings → Privacy |
A few extra steps worth taking if you're processing sensitive material:
- Enable FileVault (System Settings → Privacy & Security) — encrypts your disk so data is protected even if the machine is lost or stolen.
- Keep sensitive files out of cloud-synced folders (iCloud Drive, Dropbox, etc.) while working on them.
- Only download models from official sources — ollama.com/library or well-known, verified publishers.
- Ollama binds to
localhostonly by default — your AI is not accessible from other devices on your network unless you explicitly change that.
15
Where this goes next
With the foundation in place, the interesting work begins:
- Ask questions about your own documents — upload files directly in the Ollama app or Open WebUI and ask the AI about them.
- Organize with knowledge bases — in Open WebUI, build persistent document collections the AI can always reference across sessions.
- Automate repetitive text work — summarizing, reformatting, classifying, redacting.
- Build small scripts — Ollama exposes a standard developer API at
localhost:11434, so any code written for the OpenAI SDK works locally with just a URL swap.
The setup is the hard part. Everything from here is exploration.