# Running a Private, Local AI on Your Mac

*A practical, beginner-friendly guide to setting up an offline, open-source AI chatbot that runs entirely on your own computer — including the real-world mistakes that the polished tutorials leave out.*

---

## 📎 Using This Guide With an AI Assistant

This document was designed to work as a **context file** you hand to any AI assistant before starting setup. Paste the following prompt along with this file and it'll guide you through the whole process:

> *"I want to set up a local, private AI on my Mac following the guide I've attached. Please read the full document first and review it for accuracy and logical consistency — flag anything that seems off, outdated, or contradictory. Then ask me a few questions about my setup — specifically my Mac model, chip, and how much RAM I have — before we start. Guide me through each step one at a time, flag any gotchas before we reach them, and confirm with me that each step worked before moving on."*

Having this context upfront saves significant time. A lot of the back-and-forth in a typical setup session comes from an AI assistant not knowing what decisions have already been made, what hardware is involved, or what pitfalls exist. This document answers most of those questions in advance.

---

## Who This Is For

You have an Apple Silicon Mac (any M-series chip, M1 or newer) and you'd like to run an AI chatbot **entirely on your own machine**. No cloud, no accounts, no monthly subscription, and, most importantly, **no data ever leaving your computer**.

You don't need to be a developer. This guide assumes you're comfortable clicking around your Mac but maybe a little nervous about the Terminal. We'll walk through that together.

Reasons you might want this:

- **Privacy.** Process sensitive material — personal messages, private documents, work files — without sending it to anyone's servers.
- **No cost.** Free forever after setup. No per-message charges, no subscriptions.
- **No limits.** Use it as much as you want.
- **Works offline.** Once models are downloaded, no internet required.
- **Curiosity.** It's a genuinely satisfying way to understand how this technology actually works.

The tradeoff: models that run on your own computer are smaller and less capable than the big cloud models (like ChatGPT or Claude). They're still remarkably good — just not frontier-level. And you do the setup yourself, which is what this guide is for.

> **My setup, for reference:** I built this on a **Mac Mini M4 with 16GB of RAM**. You don't need this exact machine — any Apple Silicon Mac works. The main thing that matters is **how much RAM you have**, because that determines how large a model you can run. More RAM = bigger, smarter models.

---

## Why Run AI Locally?

When you use a cloud AI service, your words travel to a company's servers, get processed there, and the response comes back. That's fine for plenty of things — but for anything sensitive, that data has left your control.

A local AI flips this completely. The "brain" lives on your machine. Nothing you type ever leaves. That's the whole appeal: total privacy, total control, zero ongoing cost.

---

## A Quick Terminal Primer

The **Terminal** is a text-based way to control your Mac by typing commands instead of clicking. It looks intimidating but it's just a different door into the same house.

**To open it:** press `Cmd + Space` (Spotlight), type "Terminal," and press Enter. A window opens with a line ending in `%` — that's the **prompt**, waiting for you to type.

A few concepts that come up in this guide:

- **You type a command, then press Enter to run it.** Nothing happens until you press Enter.
- **`cd` means "change directory"** — it moves you to a different folder. `cd ~` always takes you back to your home folder, a safe default starting point. (`~` is shorthand for "home.")
- **`>>` means "add this line to the end of a file."** (A single `>` would overwrite the whole file — `>>` is the safe version.)
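- **A `#` symbol starts a comment**, a note for humans that the computer is meant to ignore. The catch: the Mac's default shell (zsh) ignores comments only inside script files. When you type or paste a command directly, a `# note` stuck on the end gets handed to the program as if it were part of the command. More on this in the Gotchas.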
- **A running program can look "frozen."** If you start something that keeps running (like a server), the Terminal sits there quietly. That's normal — it's working, just not printing anything until there's activity.
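
To see the difference between `>` and `>>` without risking a real file, here is a small sketch that works on a throwaway temporary file. The name `demo_file` is made up for this example. Because it contains comments, save it as a script (for instance `demo.sh`) and run it with `bash demo.sh` rather than pasting it line by line.

```bash
# Create a throwaway file so nothing real is touched.
demo_file="$(mktemp)"

# '>' creates the file, or overwrites it if it already exists.
echo "first line" > "$demo_file"

# '>>' adds to the end and leaves the existing line alone.
echo "second line" >> "$demo_file"
lines_after_append="$(grep -c '' "$demo_file")"

# '>' again: both earlier lines are replaced by this one.
echo "replacement" > "$demo_file"
lines_after_overwrite="$(grep -c '' "$demo_file")"

echo "Lines after append: $lines_after_append"
echo "Lines after overwrite: $lines_after_overwrite"
rm -f "$demo_file"
```

The first count is 2 and the second is 1, which is why this guide always uses `>>` when adding to your settings files.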

That's enough to follow everything below.

---

## How Much RAM Do You Need?

RAM (your computer's working memory) is the single biggest factor in what you can run. AI models get loaded entirely into RAM to work, so bigger, smarter models require more of it.

On a 16GB Mac, your realistic budget looks like this:

| What's using memory | Typical amount |
|---|---|
| macOS + background services | ~2–3 GB |
| Apps you have open (browser, etc.) | ~2–3 GB |
| **Left over for the AI model** | **~10–12 GB** |

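**A handy rule of thumb:** the model file itself takes up roughly **0.55 GB for every billion "parameters"** it has, at the compressed 4-bit sizes Ollama typically downloads (parameters are a rough measure of size and capability). Add another 1–2 GB of working memory for the conversation itself. So: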

- A **4-billion-parameter** model needs ~3.5 GB — fast and light.
- A **9-billion-parameter** model needs ~7 GB — a great all-rounder on 16GB.
- A **14-billion-parameter** model needs ~9.5 GB — highest quality, but close other apps first.

If you have more than 16GB, you can run larger models. If you have 8GB, stick to smaller (3–4B) models.
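
The arithmetic behind those figures can be sketched in a few lines of shell. The listed sizes are a little higher than 0.55 GB per billion parameters alone, so this sketch adds a flat 1.5 GB allowance for working memory. That allowance, and the names `estimate_model_gb` and `OVERHEAD_CENTI_GB`, are assumptions for illustration, not measured values. Save it as a script and run it with `bash`.

```bash
# Rough memory estimate, done in hundredths of a GB so the shell's
# whole-number arithmetic is enough.
# 55 stands for the guide's 0.55 GB per billion parameters.
# OVERHEAD_CENTI_GB is an ASSUMED allowance (1.5 GB) for working memory.
OVERHEAD_CENTI_GB=150

estimate_model_gb() {
  params_billion="$1"
  total=$(( params_billion * 55 + OVERHEAD_CENTI_GB ))
  printf '%d.%02d\n' $(( total / 100 )) $(( total % 100 ))
}

for size in 4 9 14; do
  echo "${size}B model: about $(estimate_model_gb "$size") GB"
done
```

The results land within about half a gigabyte of the figures in the list above. Treat either set of numbers as a planning estimate and check real usage with `ollama ps` once a model is loaded.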

---

## The Big Picture: Three Layers

This is the single most useful idea for understanding how everything fits together. A local AI setup has **three distinct layers**, each doing one job:

```
┌──────────────────────────────────────────────────────┐
│                    YOU (the user)                     │
├───────────────────────┬──────────────────────────────┤
│  INTERFACE LAYER      │  Ollama app (built-in chat)  │  ← how you talk to the AI
│                       │  Open WebUI (browser-based)  │
├───────────────────────┴──────────────────────────────┤
│  ENGINE LAYER         │         Ollama               │  ← runs the model
├──────────────────────────────────────────────────────┤
│  MODEL LAYER          │  The "brain" file (.gguf)    │  ← the actual AI
└──────────────────────────────────────────────────────┘
```

- **The model** is the AI itself — a large file containing the trained brain.
- **The engine** (Ollama) loads that file into memory and does the inference.
- **The interface** is how you chat with it — a chat window or a browser page.

Why this matters: the layers are **interchangeable**. Swap models without touching the interface. Switch interfaces without re-downloading models. They all communicate through a standard local connection on your own machine.
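
One way to see that "standard local connection" for yourself is to ask the engine directly which models it holds, with no interface involved. The sketch below uses Ollama's model-listing endpoint, `/api/tags`, on its default port. The names `OLLAMA_URL` and `list_models_json` are made up for this example. Save it as a script and run it with `bash` once Ollama is installed.

```bash
# Talk to the engine layer directly, bypassing any chat interface.
OLLAMA_URL="http://localhost:11434"

list_models_json() {
  # -s keeps curl quiet; --max-time gives up quickly if nothing answers.
  curl -s --max-time 5 "$OLLAMA_URL/api/tags"
}

if models="$(list_models_json)" && [ -n "$models" ]; then
  echo "$models"
else
  echo "No reply from $OLLAMA_URL (is Ollama running?)"
fi
```

The reply is raw JSON, the same data the Ollama app and Open WebUI read to build their model menus.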

---

## The Tool Decisions

Here's *why* this guide uses the tools it does — understanding the reasoning helps you make your own choices later.

### The engine: Ollama vs LM Studio vs GPT4All

All three run AI models locally and share the same underlying technology. The differences are in packaging and openness.

| Tool | What it is | Open source? | Best for |
|---|---|---|---|
| **Ollama** | Behind-the-scenes engine other apps connect to | ✅ Yes (MIT) | Flexibility; being the foundation |
| **LM Studio** | All-in-one app (engine + chat window) | ❌ No (closed) | Easiest point-and-click start |
| **GPT4All** | The original beginner-friendly local AI app | ✅ Yes | Simplicity — but now dated |

- **GPT4All** was great a few years ago but has fallen behind — it lacks features modern setups rely on and has become increasingly Windows-focused. Skip it.
- **LM Studio** is the most immediately beginner-friendly, with a native Mac app and a model browser. The catch: it's **closed source** (you can't inspect what it does) and collects anonymous analytics by default. Perfectly legitimate, but not ideal for a fully open, verifiable stack.
- **Ollama** is what we chose. Open source, the community standard backend that everything else plugs into.

### Why "open source" matters here

A recurring theme in this setup was preferring open-source tools. The reasoning: when the whole goal is privacy, you want software the community can actually verify does what it claims. Closed-source tools may well come from trustworthy companies, but you're taking their word for it. Open-source tools let you check.

The stack we landed on is open source throughout: **Ollama + Open WebUI**.

### The interface: Ollama app vs Open WebUI

Installing the Ollama app gives you a **built-in chat window** immediately — model switching, conversation history, and file uploads included. That's a fully working setup on its own.

**Open WebUI** is a richer, browser-based interface that connects to the same Ollama engine underneath. The meaningful extras it adds are:

- **Workspaces and folders** — organize your chats into projects, similar to how Claude has Projects.
- **Knowledge bases** — build persistent, searchable document collections the AI can always reference.
- **Saved prompts, skills, and tools** — store and reuse your best instructions without retyping.
- **Multi-model comparison** — send the same prompt to multiple models simultaneously and compare responses side by side.

You can use either interface, or both — they don't conflict since both just talk to the same Ollama engine running underneath.

---

## What Is Homebrew?

A few of the commands below use **Homebrew**, so here's a quick explanation.

Homebrew is a **package manager** for the Mac — think of it as an app store you control from the Terminal. Instead of hunting down download links, you type `brew install <name>` and it fetches and installs the software for you. It's free, open source, and the standard tool for this on a Mac.

**If you don't have it yet,** install it by pasting this into Terminal and pressing Enter. It'll ask for your Mac password and take a few minutes; when it finishes, run the "Next steps" commands it prints so that Terminal can find `brew`:

```bash
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Homebrew installs two kinds of things — and the distinction matters:

- **Formulae** — command-line tools and background software (installed with `brew install`).
- **Casks** — full GUI apps with a normal window, like the ones in your Applications folder (installed with `brew install --cask`).

You'll see both below.
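
If you'd like to see the two kinds side by side on your own machine, this short sketch lists each separately. It only reads information and installs nothing. The variable `brew_status` is made up for this example. Save it as a script and run it with `bash`.

```bash
# Show what Homebrew has installed, split by kind.
if command -v brew >/dev/null 2>&1; then
  echo "Formulae (command-line tools):"
  brew list --formula
  echo "Casks (full apps):"
  brew list --cask
  brew_status="installed"
else
  echo "Homebrew is not installed yet (or Terminal cannot find it)."
  brew_status="missing"
fi
```

After finishing the setup steps in this guide, `ollama` should appear under casks and `uv` under formulae.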

---

## What Is `uv`?

**`uv`** is a fast, modern helper for running Python-based software (Open WebUI is built with Python). It keeps everything self-contained so it doesn't clutter or break the rest of your system.

You don't interact with it directly — install it once, and it works quietly in the background. The command we use, **`uvx`**, runs a program in a clean temporary environment and tidies up afterward. No manual Python management required.

---

## The Setup Guide

> ⚠️ **Important:** Install Ollama via the **cask** (`brew install --cask ollama`), not the plain formula. There's a real reason — see **Gotcha #1** below. Following these steps in order avoids that problem entirely.

### Prerequisite: Homebrew

Make sure Homebrew is installed (see the section above). Open Terminal with `Cmd + Space` → "Terminal."

---

### Path A: The Simple Setup — Ollama + its built-in chat

This gets you a working local AI with the fewest moving parts.

**Step 1 — Install the Ollama app:**

```bash
brew install --cask ollama
open /Applications/Ollama.app
```

A llama icon appears in your menu bar. Ollama is now running quietly in the background and will start automatically at every login.

Verify it's working:

```bash
curl http://localhost:11434
```

The reply should be `Ollama is running`.

**Step 2 — Download a model:**

```bash
ollama pull qwen3.5:4b
```

This downloads about 3.5 GB: a fast, light model to start with.

Models save to `~/.ollama/models/` automatically. You never manage the files yourself.

**Step 3 — Start chatting.** Open the Ollama app from the menu bar. You now have a working, private, offline AI with a built-in chat window, conversation history, and file upload support. **That's a complete setup.** If this is all you need, you're done.

---

### Path B: The Richer Setup — add Open WebUI

Open WebUI adds workspaces, knowledge bases, saved prompts, and multi-model comparison on top of the same Ollama engine. It runs in your browser.

> **You can use Path A, Path B, or both.** They don't conflict — both talk to the same underlying Ollama engine. Running both is fine; use whichever suits the moment.

**Step 4 — Install `uv`:**

```bash
brew install uv
```

**Step 5 — Start Open WebUI:**

```bash
cd ~
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve
```

Wait until you see `Application startup complete`, then open **http://localhost:8080** in your browser. You can bookmark this address — it never changes.

You'll be asked to create a local admin account. This is stored entirely on your machine. A made-up email is fine; nothing is sent anywhere.

**Step 6 — Lock down privacy settings.** In Open WebUI, click your profile icon and go to **Admin Panel → Settings → Connections**:

- Turn **OFF** the *OpenAI API* toggle — you're not using the cloud.
- Leave the *Ollama API* toggle **ON** — this is your local engine.

**Step 7 — (Optional) Create a shortcut command:**

```bash
echo 'alias startwebui="cd ~ && DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve"' >> ~/.zshrc
source ~/.zshrc
```

From now on you can just type `startwebui`.

---

### Which should you use?

| | Ollama app (Path A) | Open WebUI (Path B) |
|---|---|---|
| Simple chat with history | ✅ | ✅ |
| File uploads | ✅ | ✅ |
| Workspaces / folders (like Claude Projects) | ❌ | ✅ |
| Knowledge bases | ❌ | ✅ |
| Saved prompts, skills, tools | ❌ | ✅ |
| Compare multiple models side by side | ❌ | ✅ |
| Setup effort | Minimal | A bit more |

---

## Choosing (and Changing) Models

We started with `qwen3.5:4b`: small and fast, good for testing. When you want more quality, step up to `qwen3.5:9b` (the recommended daily driver on 16GB) or `qwen3:14b` (maximum quality; close other apps first):

```bash
ollama pull qwen3.5:9b
ollama pull qwen3:14b
```

**Solid choices for a 16GB Mac** *(speeds as of early June 2026)*:

| Model | Download command | RAM | Speed | Good for |
|---|---|---|---|---|
| Qwen3.5 4B | `ollama pull qwen3.5:4b` | ~3.5 GB | 38–48 words/sec | Fast tasks, testing |
| Qwen3.5 9B | `ollama pull qwen3.5:9b` | ~7 GB | 22–28 words/sec | Best all-rounder |
| Qwen3 14B | `ollama pull qwen3:14b` | ~9.5 GB | 10–14 words/sec | Highest quality |
| Gemma 4 E4B | `ollama pull gemma4:e4b` | ~4 GB | 35–45 words/sec | Efficient, handles images |

**Where to explore and compare models:**

- **[ollama.com/library](https://ollama.com/library)** — the full catalog of available models with sizes and details. Your first stop for finding something new.
- **[lmarena.ai](https://lmarena.ai)**: the Chatbot Arena (now called LMArena), a crowd-sourced leaderboard where real users rate models head-to-head. Good for understanding how models compare on quality.
- **[reddit.com/r/LocalLLaMA](https://reddit.com/r/LocalLLaMA)** — the most active community following local AI. New model releases, benchmarks, and real-world opinions show up here first.

> 🔄 **A reality worth internalizing:** the "best" local model changes *constantly*. New models come out monthly, and they leapfrog each other on quality and speed. The recommendations here are accurate as of **early June 2026** but will date quickly. Don't treat any model as permanent — **experiment**. Download a few, try the same prompt in each, and keep the ones that work for you. Switching is cheap: models sit on disk, and only one loads into memory at a time.
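
The "same prompt in each" experiment can be sketched as a small helper. It relies on `ollama run MODEL "PROMPT"`, which answers once and exits instead of opening a chat. The function name `compare_models` is made up for this example, and every model you pass in must already be downloaded. Save it in a script, add a call at the bottom, and run it with `bash`.

```bash
# Send one prompt to several models and print each reply under a heading.
# Usage: compare_models "your prompt" model1 model2 ...
compare_models() {
  prompt="$1"
  shift
  for model in "$@"; do
    echo "=== $model ==="
    # One-shot mode: ollama prints the answer and returns to the shell.
    ollama run "$model" "$prompt"
    echo
  done
}
```

For example, `compare_models "Explain RAM in one sentence." qwen3.5:4b qwen3.5:9b` prints the two answers one after the other. Expect a pause before each reply while that model loads.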

---

## Command Cheat Sheet

### Managing models

```bash
ollama list                   # show all downloaded models
ollama pull <model>           # download a model from the library
ollama run <model>            # chat with a model directly in Terminal
ollama rm <model>             # delete a model you no longer want
ollama ps                     # show which model is currently loaded in memory
du -sh ~/.ollama/models/      # check total disk space used by models
curl http://localhost:11434   # verify Ollama is running
```

### Starting Open WebUI

```bash
# If you set up the alias (Step 7 above):
startwebui

# If you haven't set up the alias yet, use the full command:
cd ~
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve

# Then open your browser and go to:
# http://localhost:8080
# (bookmark this — the address never changes)
```

### Tips for chatting with local models

```bash
/no_think                    # append to end of any message to skip slow reasoning mode
/set nothink                 # turn off thinking mode for the whole session
/bye                         # exit the chat
/?                           # show all available commands
/show info                   # show details about the currently loaded model
```

A few things worth knowing as you use your local AI day to day:

- **The first reply after switching models is slow** (~10–30 seconds) — the model is loading into memory. Replies after that are much faster. This is normal.
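- **Expect one model at a time on 16GB.** Ollama can keep several models loaded when there is enough free memory, but on a 16GB Mac switching models usually means unloading the current one before loading the next. An idle model is also unloaded after about five minutes by default, which is why the first reply after a break can be slow again.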
- **More specific prompts get better results.** Local models are smaller than cloud models — clear, detailed instructions help them more.

---

## Gotchas (The Mistakes Worth Knowing About)

These are things that actually went wrong during setup. They're the most useful part of this guide.

### Gotcha #1 — Installing the wrong version of Ollama

The first attempt used the plain formula: `brew install ollama`. This is the **standard, normal way** to install command-line tools with Homebrew — it's what most guides suggest, and it installed without any complaint or warning.

But when it came time to actually chat, it failed with: `llama-server binary not found`. The Homebrew formula for Ollama version 0.30.x shipped with a key file missing — a known packaging bug, not a user error.

The fix was to stop the broken background service, uninstall the formula, and install the **cask** instead, which installs the full official Ollama app with all binaries included:

```bash
brew services stop ollama
brew uninstall ollama
brew install --cask ollama
```

All downloaded models were safe throughout — they live in `~/.ollama/models/`, which is completely separate from the Ollama binary and wasn't touched by the reinstall.

**The lesson:** on a Mac, always use `brew install --cask ollama`. The cask is the proper, official release. The plain formula was the trap — and its failure wasn't obvious until inference was attempted.

> This guide starts you with the cask from the beginning, so you won't hit this.

### Gotcha #2 — A secret file landed in a project folder

When Open WebUI starts, it creates a small **secret key file** used to secure your login session. The problem: it saves this file into *whatever folder you're in* when you run the start command.

If you happen to run it from inside a project folder that syncs to the internet (like a GitHub repo), that private key file could get uploaded publicly. Two safeguards were set up to prevent this.

**Understanding what the fix commands actually do:**

The `>>` operator in both commands means "append this line to the end of a file." Two separate mechanisms are at work:

**Safeguard 1 — prevent the file from being created at all:**

```bash
echo 'export WEBUI_SECRET_KEY="'$(openssl rand -hex 32)'"' >> ~/.zshrc
source ~/.zshrc
```

- `~/.zshrc` is your Terminal's **startup settings file** — anything written there runs automatically every time you open a new Terminal.
- This appends a line that permanently sets `WEBUI_SECRET_KEY` as an environment variable. The `openssl rand -hex 32` part generates a random secure value for it.
- Because the key is now always available as a setting, **Open WebUI reads it from there and never needs to create a file**. Problem solved at the source.

**Safeguard 2 — a backup net in case a key file ever appears anyway:**

```bash
echo '.webui_secret_key' >> .gitignore
echo '*.secret_key' >> .gitignore
```

- `.gitignore` is a list that **Git** reads to know which files to *never* track or upload to the internet.
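- These lines add secret-key filename patterns to that list. Run them from inside the project folder, since each `.gitignore` only covers the repository it sits in.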
- So even if such a file got created in a synced project, **Git would silently ignore it** and it would never be committed or pushed.

> **To be precise:** this isn't the Terminal "hiding" files — it's two distinct mechanisms working together. The environment variable (Safeguard 1) stops the file from being created at all. The `.gitignore` entries (Safeguard 2) tell Git to ignore it as a fallback. Belt and suspenders.

**The simple takeaway:** always run `cd ~` before starting Open WebUI. Starting from your home folder means any stray file lands somewhere harmless, not inside a project.
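
If you want a safety check on top of the habit, the sketch below reports whether a folder sits inside a Git repository before you start the server there. The function name `safe_to_start_here` is made up for this example, and it assumes Git is installed. Save it in a script and run it with `bash`, or add it to `~/.zshrc`, where comments are fine.

```bash
# Return 0 (safe) if the given folder is NOT inside a Git repository.
# With no argument it checks the current folder.
safe_to_start_here() {
  dir="${1:-.}"
  if git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "Warning: $dir is inside a Git repository; run 'cd ~' first."
    return 1
  fi
  echo "OK: $dir is not inside a Git repository."
  return 0
}
```

Calling `safe_to_start_here` just before the start command gives a clear warning instead of a silently misplaced key file.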

### Gotcha #3 — A copied command had a `#` comment that broke it

At one point a suggested command had an annotation tacked on the end:

```bash
some-command   # new
```

Running it produced: `Got unexpected extra arguments (# new)`. By default the Mac's shell (zsh) ignores `#` comments only inside script files; at the interactive prompt it passed `#` and `new` to the program as extra arguments.

> **Worth noting:** this was an AI assistant (Claude) adding a `# comment` to its own suggested code as a helpful label. The comment was for the human — the Terminal didn't appreciate it.

**The lesson:** when copying a command from anywhere (a guide, a tutorial, an AI assistant, or the cheat sheet in this document), strip off the `#` and everything after it before running it. The actual command is only what comes before.
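
The stripping step can be sketched as a tiny helper. It is deliberately naive: it assumes the command itself contains no `#` (for example inside quotes or a URL). The function name `strip_trailing_comment` is made up for this example. Save it as a script and run it with `bash`.

```bash
# Naive helper: drop a trailing "# comment" from a copied command line.
# ASSUMPTION: the command itself contains no '#' character.
strip_trailing_comment() {
  printf '%s\n' "$1" | sed 's/[[:space:]]*#.*$//'
}

# Prints: ollama list
strip_trailing_comment 'ollama list   # show all downloaded models'
```

A different route is to make zsh accept comments at the prompt by running `setopt interactivecomments`. Adding that line to `~/.zshrc` makes it permanent.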

### Gotcha #4 — A simple question took almost two minutes

The first test question took **105 seconds** to answer. The model wasn't broken — it was in "thinking mode," reasoning step-by-step through a question that didn't need it. (Qwen3.5 models default to extended thinking.)

**The fix:** append `/no_think` to your message. Response times drop to a few seconds for simple questions. See the cheat sheet above. Leave thinking mode *on* for genuinely hard problems where careful reasoning matters.
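
To measure the difference yourself, here is a sketch that asks one question with thinking switched off and reports the elapsed time. It assumes your Ollama version supports the `--think` flag of `ollama run` and that the model honors it, so treat it as something to try rather than a guarantee. The function name `quick_ask` is made up for this example. Save it in a script, add a call at the bottom, and run it with `bash`.

```bash
# Ask one question with thinking turned off and report how long it took.
# ASSUMPTION: the installed Ollama supports --think for this model.
quick_ask() {
  model="$1"
  prompt="$2"
  start=$(date +%s)
  ollama run "$model" --think=false "$prompt"
  end=$(date +%s)
  echo "Took $(( end - start )) seconds."
}
```

For example: `quick_ask qwen3.5:4b "What is the capital of France?"`. Run it twice, because the first call also includes the time needed to load the model.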

### Gotcha #5 — The Terminal must stay open

When Open WebUI runs via the `startwebui` command, **the Terminal window hosting it must stay open**. Close it and the server stops; the browser interface goes offline.

That Terminal will look "frozen" — that's normal. It's only quiet because nothing is happening; it prints when requests come in.

**The fix:** open a second Terminal tab with `Cmd + T` for any other work, and leave the first tab running the server.

---

## Privacy & Security Notes

A summary of who can see what in this stack:

| Layer | Who makes it | Can they see your chats? | Action |
|---|---|---|---|
| Model (Qwen, Gemma, etc.) | Open source | No — runs on your machine | — |
| Ollama (engine) | Open source (MIT) | No | — |
| Open WebUI (interface) | Open source code (BSD-3-based Open WebUI License, with a branding clause) | No | — |
| LM Studio *(if you use it)* | Closed source | No (chats stay local) | Disable analytics: Settings → Privacy |

A few extra steps worth taking if you're processing sensitive material:

- **Enable FileVault** (System Settings → Privacy & Security) — encrypts your disk so data is protected even if the machine is lost or stolen.
- **Keep sensitive files out of cloud-synced folders** (iCloud Drive, Dropbox, etc.) while working on them.
- **Only download models from official sources** — [ollama.com/library](https://ollama.com/library) or well-known, verified publishers.
- Ollama binds to `localhost` only by default — your AI is not accessible from other devices on your network unless you explicitly change that.
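
You can check the last point yourself. Running `lsof -nP -iTCP:11434 -sTCP:LISTEN` in Terminal shows the address Ollama is listening on in its NAME column. The sketch below shows how to read that value. The function name `classify_bind` is made up for this example. Save it as a script and run it with `bash`.

```bash
# Classify a listening address as local-only or network-visible.
# Addresses look like the NAME column of:
#   lsof -nP -iTCP:11434 -sTCP:LISTEN
classify_bind() {
  case "$1" in
    127.0.0.1:*|localhost:*|"[::1]:"*) echo "local only" ;;
    *) echo "reachable from the network" ;;
  esac
}

# Prints: local only
classify_bind "127.0.0.1:11434"
# Prints: reachable from the network
classify_bind "*:11434"
```

An address starting with `*` or with your Mac's network IP means other devices on the network may be able to connect.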

---

## Where This Goes Next

With the foundation in place, the interesting work begins:

- **Ask questions about your own documents** — upload files directly in the Ollama app or Open WebUI and ask the AI about them.
- **Organize with knowledge bases** — in Open WebUI, build persistent document collections the AI can always reference across sessions.
- **Automate repetitive text work** — summarizing, reformatting, classifying, redacting.
- **Build small scripts**: Ollama exposes an OpenAI-compatible developer API at `http://localhost:11434/v1`, so most code written for the OpenAI SDK works locally once you point it at that URL.
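
As a starting point for that last item, here is a minimal sketch of one request to Ollama's OpenAI-compatible chat endpoint using `curl`. The function name `chat_once` is made up for this example, the model must already be downloaded, and the message must not contain double quotes because this simple version does not escape them. Save it in a script, add a call at the bottom, and run it with `bash`.

```bash
# Send one chat message to Ollama's OpenAI-compatible endpoint.
# ASSUMPTION: the named model has already been pulled.
chat_once() {
  model="$1"
  message="$2"
  # Limitation: quotes inside the message are not escaped.
  curl -s http://localhost:11434/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d "{\"model\": \"$model\", \"messages\": [{\"role\": \"user\", \"content\": \"$message\"}]}"
}
```

For example: `chat_once qwen3.5:4b "Say hello in five words."`. The reply comes back as JSON in the same shape the OpenAI API uses, with the text under `choices[0].message.content`.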

The setup is the hard part. Everything from here is exploration.

---

*Built and tested on a Mac Mini M4 (16GB) using Ollama and Open WebUI — both free and open source. Models used: Qwen3.5 4B and 9B. Notes current as of early June 2026; the local-AI landscape moves fast, so expect specifics to evolve.*
