Run a Private, Local AI on Your Mac

01

Using this guide with an AI assistant

This document was designed to work as a context file you hand to any AI assistant before starting setup. Paste the prompt below along with this file and it'll guide you through the whole process:

Starter prompt

"I want to set up a local, private AI on my Mac following the guide I've attached. Please read the full document first and review it for accuracy and logical consistency — flag anything that seems off, outdated, or contradictory. Then ask me a few questions about my setup — specifically my Mac model, chip, and how much RAM I have — before we start. Guide me through each step one at a time, flag any gotchas before we reach them, and confirm with me that each step worked before moving on."

Having this context upfront saves significant time. A lot of the back-and-forth in a typical setup session comes from an AI assistant not knowing what decisions have already been made, what hardware is involved, or what pitfalls exist. This document answers most of those questions in advance.

02

Who this is for

You have an Apple Silicon Mac (any M-series chip — M1 through M4) and you'd like to run an AI chatbot entirely on your own machine. No cloud, no accounts, no monthly subscription, and — most importantly — no data ever leaving your computer.

You don't need to be a developer. This guide assumes you're comfortable clicking around your Mac but maybe a little nervous about the Terminal. We'll walk through that together.

Reasons you might want this:

Privacy. Process sensitive material — personal messages, private documents, work files — without sending it to anyone's servers.
No cost. Free forever after setup. No per-message charges, no subscriptions.
No limits. Use it as much as you want.
Works offline. Once models are downloaded, no internet required.
Curiosity. It's a genuinely satisfying way to understand how this technology actually works.

The tradeoff: models that run on your own computer are smaller and less capable than the big cloud models (like ChatGPT or Claude). They're still remarkably good — just not frontier-level. And you do the setup yourself, which is what this guide is for.

My setup, for reference

I built this on a Mac Mini M4 with 16GB of RAM. You don't need this exact machine — any Apple Silicon Mac works. The main thing that matters is how much RAM you have, because that determines how large a model you can run. More RAM = bigger, smarter models.

03

Why run AI locally?

When you use a cloud AI service, your words travel to a company's servers, get processed there, and the response comes back. That's fine for plenty of things — but for anything sensitive, that data has left your control.

A local AI flips this completely. The "brain" lives on your machine. Nothing you type ever leaves. That's the whole appeal: total privacy, total control, zero ongoing cost.

04

A quick Terminal primer

The Terminal is a text-based way to control your Mac by typing commands instead of clicking. It looks intimidating but it's just a different door into the same house.

To open it: press Cmd + Space (Spotlight), type "Terminal," and press Enter. A window opens with a line ending in % — that's the prompt, waiting for you to type.

A few concepts that come up in this guide:

You type a command, then press Enter to run it. Nothing happens until you press Enter.
cd means "change directory" — it moves you to a different folder. cd ~ always takes you back to your home folder, a safe default starting point. (~ is shorthand for "home.")
>> means "add this line to the end of a file." (A single > would overwrite the whole file — >> is the safe version.)
A # symbol starts a comment — a note for humans that the computer ignores. If you copy a command that has a # note stuck on the end, the Terminal can get confused. More on this in the Gotchas.
A running program can look "frozen." If you start something that keeps running (like a server), the Terminal sits there quietly. That's normal — it's working, just not printing anything until there's activity.

That's enough to follow everything below.

05

How much RAM you need

RAM (your computer's working memory) is the single biggest factor in what you can run. AI models get loaded entirely into RAM to work, so bigger, smarter models require more of it.

On a 16GB Mac, your realistic budget looks like this:

What's using memory	Typical amount
macOS + background services	~2–3 GB
Apps you have open (browser, etc.)	~2–3 GB
Left over for the AI model	~10–12 GB

A handy rule of thumb: a model takes up roughly 0.55 GB for every billion "parameters" it has (parameters are a rough measure of size and capability). So:

A 4-billion-parameter model needs ~3.5 GB — fast and light.
A 9-billion-parameter model needs ~7 GB — a great all-rounder on 16GB.
A 14-billion-parameter model needs ~9.5 GB — highest quality, but close other apps first.

If you have more than 16GB, you can run larger models. If you have 8GB, stick to smaller (3–4B) models.

06

The big picture: three layers

This is the single most useful idea for understanding how everything fits together. A local AI setup has three distinct layers, each doing one job — and they all live inside the boundary of your own machine:

Your Mac · nothing leaves

Interface layer How you chat with it — a window or browser page

Engine layer · Ollama Loads the model into memory and does the work

Model layer The AI itself — a large trained "brain" file

The model is the AI itself — a large file containing the trained brain.
The engine (Ollama) loads that file into memory and does the thinking.
The interface is how you chat with it — a chat window or a browser page.

Why this matters: the layers are interchangeable. Swap models without touching the interface. Switch interfaces without re-downloading models. They all communicate through a standard local connection on your own machine.

07

The tool decisions

Here's why this guide uses the tools it does — understanding the reasoning helps you make your own choices later.

The engine: Ollama vs LM Studio vs GPT4All

All three run AI models locally and share the same underlying technology. The differences are in packaging and openness.

Tool	What it is	Open source?	Best for
Ollama	Behind-the-scenes engine other apps connect to	✅ Yes (MIT)	Flexibility; being the foundation
LM Studio	All-in-one app (engine + chat window)	❌ No (closed)	Easiest point-and-click start
GPT4All	The original beginner-friendly local AI app	✅ Yes	Simplicity — but now dated

GPT4All was great a few years ago but has fallen behind — it lacks features modern setups rely on and has become increasingly Windows-focused. Skip it.
LM Studio is the most immediately beginner-friendly, with a native Mac app and a model browser. The catch: it's closed source (you can't inspect what it does) and collects anonymous analytics by default. Perfectly legitimate, but not ideal for a fully open, verifiable stack.
Ollama is what we chose. Open source, the community standard backend that everything else plugs into.

Why "open source" matters here

A recurring theme in this setup was preferring open-source tools. The reasoning: when the whole goal is privacy, you want software the community can actually verify does what it claims. Closed-source tools are trustworthy companies — but you're taking their word for it. Open-source tools let you check.

The stack we landed on is open source throughout: Ollama + Open WebUI.

The interface: Ollama app vs Open WebUI

Installing the Ollama app gives you a built-in chat window immediately — model switching, conversation history, and file uploads included. That's a fully working setup on its own.

Open WebUI is a richer, browser-based interface that connects to the same Ollama engine underneath. The meaningful extras it adds are:

Workspaces and folders — organize your chats into projects, similar to how Claude has Projects.
Knowledge bases — build persistent, searchable document collections the AI can always reference.
Saved prompts, skills, and tools — store and reuse your best instructions without retyping.
Multi-model comparison — send the same prompt to multiple models simultaneously and compare responses side by side.

You can use either interface, or both — they don't conflict since both just talk to the same Ollama engine running underneath.

08

What is Homebrew?

A few of the commands below use Homebrew, so here's a quick explanation.

Homebrew is a package manager for the Mac — think of it as an app store you control from the Terminal. Instead of hunting down download links, you type brew install <name> and it fetches and installs the software for you. It's free, open source, and the standard tool for this on a Mac.

If you don't have it yet, install it by pasting this into Terminal and pressing Enter (it'll ask for your Mac password and take a few minutes):

bash

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Homebrew installs two kinds of things — and the distinction matters:

Formulae — command-line tools and background software (installed with brew install).
Casks — full GUI apps with a normal window, like the ones in your Applications folder (installed with brew install --cask).

You'll see both below.

09

What is uv?

uv is a fast, modern helper for running Python-based software (Open WebUI is built with Python). It keeps everything self-contained so it doesn't clutter or break the rest of your system.

You don't interact with it directly — install it once, and it works quietly in the background. The command we use, uvx, runs a program in a clean temporary environment and tidies up afterward. No manual Python management required.

10

The setup guide

Important

Install Ollama via the cask (brew install --cask ollama), not the plain formula. There's a real reason — see Gotcha #1 below. Following these steps in order avoids that problem entirely.

Prerequisite: Homebrew

Make sure Homebrew is installed (see the section above). Open Terminal with Cmd + Space → "Terminal."

Path A: the simple setup — Ollama + its built-in chat

This gets you a working local AI with the fewest moving parts.

Step 1 — Install the Ollama app:

bash

brew install --cask ollama
open /Applications/Ollama.app

A llama icon appears in your menu bar. Ollama is now running quietly in the background and will start automatically at every login. Verify it's working:

bash

curl http://localhost:11434
# → should say: Ollama is running

Step 2 — Download a model:

bash

ollama pull qwen3.5:4b
# downloads ~3.5 GB — a fast, light model to start with

Models save to ~/.ollama/models/ automatically. You never manage the files yourself.

Step 3 — Start chatting. Open the Ollama app from the menu bar. You now have a working, private, offline AI with a built-in chat window, conversation history, and file upload support. That's a complete setup. If this is all you need, you're done.

Path B: the richer setup — add Open WebUI

Open WebUI adds workspaces, knowledge bases, saved prompts, and multi-model comparison on top of the same Ollama engine. It runs in your browser.

You can use Path A, Path B, or both

They don't conflict — both talk to the same underlying Ollama engine. Running both is fine; use whichever suits the moment.

Step 4 — Install uv:

bash

brew install uv

Step 5 — Start Open WebUI:

bash

cd ~
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve

Wait until you see Application startup complete, then open http://localhost:8080 in your browser. You can bookmark this address — it never changes.

You'll be asked to create a local admin account. This is stored entirely on your machine. A made-up email is fine; nothing is sent anywhere.

Step 6 — Lock down privacy settings. In Open WebUI, go to Settings → Connections:

Turn OFF the OpenAI API toggle — you're not using the cloud.
Leave the Ollama API toggle ON — this is your local engine.

Step 7 — (Optional) Create a shortcut command:

bash

echo 'alias startwebui="cd ~ && DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve"' >> ~/.zshrc
source ~/.zshrc
# now you can just type: startwebui

Which should you use?

	Ollama app (Path A)	Open WebUI (Path B)
Simple chat with history	✅	✅
File uploads	✅	✅
Workspaces / folders (like Claude Projects)	❌	✅
Knowledge bases	❌	✅
Saved prompts, skills, tools	❌	✅
Compare multiple models side by side	❌	✅
Setup effort	Minimal	A bit more

11

Choosing (and changing) models

We started with qwen3.5:4b — small and fast, good for testing. When you want more quality, step up:

bash

ollama pull qwen3.5:9b     # the recommended daily-driver on 16GB
ollama pull qwen3:14b      # maximum quality; close other apps first

Solid choices for a 16GB Mac (speeds as of early June 2026):

Model	Download command	RAM	Speed	Good for
Qwen3.5 4B	`ollama pull qwen3.5:4b`	~3.5 GB	38–48 wps	Fast tasks, testing
Qwen3.5 9B	`ollama pull qwen3.5:9b`	~7 GB	22–28 wps	Best all-rounder
Qwen3 14B	`ollama pull qwen3:14b`	~9.5 GB	10–14 wps	Highest quality
Gemma 4 E4B	`ollama pull gemma4:e4b`	~4 GB	35–45 wps	Efficient, handles images

Where to explore and compare models:

ollama.com/library — the full catalog of available models with sizes and details. Your first stop for finding something new.
lmsys.org/chat — the Chatbot Arena, a crowd-sourced leaderboard where real users rate models head-to-head. Good for understanding how models compare on quality.
reddit.com/r/LocalLLaMA — the most active community following local AI. New model releases, benchmarks, and real-world opinions show up here first.

A reality worth internalizing

The "best" local model changes constantly. New models come out monthly, and they leapfrog each other on quality and speed. The recommendations here are accurate as of early June 2026 but will date quickly. Don't treat any model as permanent — experiment. Download a few, try the same prompt in each, and keep the ones that work for you. Switching is cheap: models sit on disk, and only one loads into memory at a time.

12

Command cheat sheet

Managing models

bash

ollama list                   # show all downloaded models
ollama pull <model>           # download a model from the library
ollama run <model>            # chat with a model directly in Terminal
ollama rm <model>             # delete a model you no longer want
ollama ps                     # show which model is currently loaded in memory
du -sh ~/.ollama/models/      # check total disk space used by models
curl http://localhost:11434   # verify Ollama is running

Starting Open WebUI

bash

# If you set up the alias (Step 7 above):
startwebui

# If you haven't set up the alias yet, use the full command:
cd ~
DATA_DIR=~/.open-webui uvx --python 3.11 open-webui@latest serve

# Then open your browser and go to:
# http://localhost:8080  (bookmark this — the address never changes)

Tips for chatting with local models

bash

/no_think                    # append to end of any message to skip slow reasoning mode
/set parameter think false   # turn off thinking mode for the whole session
/bye                         # exit the chat
/?                           # show all available commands
/show info                   # show details about the currently loaded model

A few things worth knowing as you use your local AI day to day:

The first reply after switching models is slow (~10–30 seconds) — the model is loading into memory. Replies after that are much faster. This is normal.
Only one model runs at a time. Switching models unloads the current one before loading the next.
More specific prompts get better results. Local models are smaller than cloud models — clear, detailed instructions help them more.

13

Gotchas — the mistakes worth knowing about

These are things that actually went wrong during setup. They're the most useful part of this guide.

Gotcha #1 — Installing the wrong version of Ollama

The first attempt used the plain formula: brew install ollama. This is the standard, normal way to install command-line tools with Homebrew — it's what most guides suggest, and it installed without any complaint or warning.

But when it came time to actually chat, it failed with: llama-server binary not found. The Homebrew formula for Ollama version 0.30.x shipped with a key file missing — a known packaging bug, not a user error.

The fix was to uninstall the formula and use the cask instead, which installs the full official Ollama app with all binaries included:

bash

brew services stop ollama        # stop the broken background version
brew uninstall ollama            # remove the formula
brew install --cask ollama       # install the full app

All downloaded models were safe throughout — they live in ~/.ollama/models/, which is completely separate from the Ollama binary and wasn't touched by the reinstall.

The lesson: on a Mac, always use brew install --cask ollama. The cask is the proper, official release. The plain formula was the trap — and its failure wasn't obvious until inference was attempted.

This guide starts you with the cask from the beginning, so you won't hit this.

Gotcha #2 — A secret file landed in a project folder

When Open WebUI starts, it creates a small secret key file used to secure your login session. The problem: it saves this file into whatever folder you're in when you run the start command.

If you happen to run it from inside a project folder that syncs to the internet (like a GitHub repo), that private key file could get uploaded publicly. Two safeguards were set up to prevent this.

Understanding what the fix commands actually do: the >> operator in both commands means "append this line to the end of a file." Two separate mechanisms are at work:

Safeguard 1 — prevent the file from being created at all:

bash

echo 'export WEBUI_SECRET_KEY="'$(openssl rand -hex 32)'"' >> ~/.zshrc
source ~/.zshrc

~/.zshrc is your Terminal's startup settings file — anything written there runs automatically every time you open a new Terminal.
This appends a line that permanently sets WEBUI_SECRET_KEY as an environment variable. The openssl rand -hex 32 part generates a random secure value for it.
Because the key is now always available as a setting, Open WebUI reads it from there and never needs to create a file. Problem solved at the source.

Safeguard 2 — a backup net in case a key file ever appears anyway:

bash

echo '.webui_secret_key' >> .gitignore
echo '*.secret_key' >> .gitignore

.gitignore is a list that Git reads to know which files to never track or upload to the internet.
These lines add secret-key filename patterns to that list.
So even if such a file got created in a synced project, Git would silently ignore it and it would never be committed or pushed.

To be precise

This isn't the Terminal "hiding" files — it's two distinct mechanisms working together. The environment variable (Safeguard 1) stops the file from being created at all. The .gitignore entries (Safeguard 2) tell Git to ignore it as a fallback. Belt and suspenders.

The simple takeaway: always run cd ~ before starting Open WebUI. Starting from your home folder means any stray file lands somewhere harmless, not inside a project.

Gotcha #3 — A copied command had a # comment that broke it

At one point a suggested command had an annotation tacked on the end:

bash

some-command   # new

Running it produced: Got unexpected extra arguments (# new). The Terminal tried to treat the annotation as part of the command.

Worth noting: this was an AI assistant (Claude) adding a # comment to its own suggested code as a helpful label. The comment was for the human — the Terminal didn't appreciate it.

The lesson: when copying a command from anywhere — a guide, a tutorial, or an AI assistant — strip off everything after the # before running it. The actual command is only what comes before.

Gotcha #4 — A simple question took almost two minutes

The first test question took 105 seconds to answer. The model wasn't broken — it was in "thinking mode," reasoning step-by-step through a question that didn't need it. (Qwen3.5 models default to extended thinking.)

The fix: append /no_think to your message. Response times drop to a few seconds for simple questions. See the cheat sheet above. Leave thinking mode on for genuinely hard problems where careful reasoning matters.

Gotcha #5 — The Terminal must stay open

When Open WebUI runs via the startwebui command, the Terminal window hosting it must stay open. Close it and the server stops; the browser interface goes offline.

That Terminal will look "frozen" — that's normal. It's only quiet because nothing is happening; it prints when requests come in.

The fix: open a second Terminal tab with Cmd + T for any other work, and leave the first tab running the server.

14

Privacy & security notes

A summary of who can see what in this stack:

Layer	Who makes it	Can they see your chats?	Action
Model (Qwen, Gemma, etc.)	Open source	No — runs on your machine	—
Ollama (engine)	Open source (MIT)	No	—
Open WebUI (interface)	Open source (MIT)	No	—
LM Studio (if you use it)	Closed source	No (chats stay local)	Disable analytics in Settings → Privacy

A few extra steps worth taking if you're processing sensitive material:

Enable FileVault (System Settings → Privacy & Security) — encrypts your disk so data is protected even if the machine is lost or stolen.
Keep sensitive files out of cloud-synced folders (iCloud Drive, Dropbox, etc.) while working on them.
Only download models from official sources — ollama.com/library or well-known, verified publishers.
Ollama binds to localhost only by default — your AI is not accessible from other devices on your network unless you explicitly change that.

15

Where this goes next

With the foundation in place, the interesting work begins:

Ask questions about your own documents — upload files directly in the Ollama app or Open WebUI and ask the AI about them.
Organize with knowledge bases — in Open WebUI, build persistent document collections the AI can always reference across sessions.
Automate repetitive text work — summarizing, reformatting, classifying, redacting.
Build small scripts — Ollama exposes a standard developer API at localhost:11434, so any code written for the OpenAI SDK works locally with just a URL swap.

The setup is the hard part. Everything from here is exploration.

Run your own AI, entirely on your Mac

Using this guide with an AI assistant

Who this is for

Why run AI locally?

A quick Terminal primer

How much RAM you need

The big picture: three layers

The tool decisions

The engine: Ollama vs LM Studio vs GPT4All

Why "open source" matters here

The interface: Ollama app vs Open WebUI

What is Homebrew?

What is uv?

The setup guide

Prerequisite: Homebrew

Path A: the simple setup — Ollama + its built-in chat

Path B: the richer setup — add Open WebUI

Which should you use?

Choosing (and changing) models

Command cheat sheet

Managing models

Starting Open WebUI

Tips for chatting with local models

Gotchas — the mistakes worth knowing about

Gotcha #1 — Installing the wrong version of Ollama

Gotcha #2 — A secret file landed in a project folder

Gotcha #3 — A copied command had a # comment that broke it

Gotcha #4 — A simple question took almost two minutes

Gotcha #5 — The Terminal must stay open

Privacy & security notes

Where this goes next