A Llama 3.1 70B in Q4_K_M GGUF lands on disk at 42 GB. The same model in Q8_0 is 75 GB. The full FP16 weights cross 141 GB. None of those numbers is a typo, and only one of them is a sane choice for a 1 TB MacBook that already runs Xcode. The audit a developer published at brtkwr.com after freeing 200 GB on a full disk lists "Unused GPT4All / LLM model files, 7 GB" as one of the smaller culprits, which sounds modest until you remember the 7 GB was leftovers, not the working set. The working set is where this comparison starts.
How big is a single LLM model in 2026?
A weight file is two numbers multiplied: parameter count and bits per parameter. An 8B at FP16 carries roughly two bytes per parameter, so 16 GB. Drop to Q4_K_M, the GGUF community's default, and average bits fall to about 4.8, putting the same 8B at 4.9 GB. The math scales linearly with parameter count, so a single table covers it.
| Model size | Q2_K | Q4_K_M | Q5_K_M | Q8_0 | FP16 |
|---|---|---|---|---|---|
| 8B | 3.2 GB | 4.9 GB | 5.7 GB | 8.5 GB | 16 GB |
| 13B | 5.4 GB | 7.9 GB | 9.2 GB | 13.8 GB | 26 GB |
| 34B | 12.8 GB | 20.2 GB | 24.0 GB | 36.1 GB | 68 GB |
| 70B | 26.4 GB | 42.5 GB | 49.5 GB | 74.6 GB | 141 GB |
| 405B | 150 GB | 243 GB | 287 GB | 432 GB | 810 GB |
Two reads matter on a Mac. Every column past Q5_K_M, meaning Q8_0 and FP16, crosses the threshold where one model rivals a full iOS DeviceSupport folder, and the 405B row is not a real choice on consumer hardware. The interesting LLM model disk space band on a 1 TB MacBook is 8B to 70B at Q4 to Q5, roughly 5 GB to 50 GB per model.
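The parameters-times-bits arithmetic behind the table can be sketched in a few lines of shell. This is a rough estimator, not a sizing tool: `estimate_gb` is a hypothetical helper name, it assumes decimal gigabytes, and it takes the nominal parameter count, so real files can come out slightly larger than the result.

```shell
# estimate_gb PARAMS_IN_BILLIONS BITS_PER_WEIGHT
# Hypothetical helper: size in decimal GB = params (billions) * bits / 8.
estimate_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

estimate_gb 8 16    # FP16, two bytes per parameter: prints 16.0
estimate_gb 8 4.8   # Q4_K_M at about 4.8 bits per weight: prints 4.8
```

The second result lands just under the 4.9 GB in the table, which is the kind of gap to expect when working from a nominal parameter count.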
How do GGUF, MLX, and safetensors compare on disk?
Format matters as much as quantisation. GGUF is a quantisation-first binary container the llama.cpp ecosystem standardised on. MLX is Apple's array framework format with its own quantised variants. safetensors is a Hugging Face format that almost always ships at native precision because it is built for training, not edge inference.
For the same Llama 3.1 8B model, the on-disk numbers look like this.
| Format | Precision | Size on disk | Typical home |
|---|---|---|---|
| GGUF | Q4_K_M | 4.9 GB | Ollama, LM Studio, llama.cpp |
| GGUF | Q8_0 | 8.5 GB | Ollama, LM Studio, llama.cpp |
| GGUF | FP16 | 16 GB | rare, conversion artefacts |
| MLX | 4-bit | 4.5 GB | mlx-lm, MLX Swift apps |
| MLX | 8-bit | 8.5 GB | mlx-lm |
| safetensors | BF16 | 16 GB | Hugging Face Transformers |
| safetensors | FP32 | 32 GB | training pipelines |
If you are running inference, you almost never want full-precision safetensors sitting in a cache. They arrive when a transformers or diffusers notebook pulls a model once, and they stay because nothing purges the Hugging Face hub cache. The fastest way to halve LLM model disk space on a Mac that has been used for both inference and one-off ML notebooks is to delete the safetensors snapshot once the quantised equivalent exists.
Where does each tool keep its models on a Mac?
Three tools account for almost all of the local LLM bytes on a developer Mac in 2026. None of them dedupes against the others, and none of them surfaces a per-model size in the Storage bar. macOS folds the whole pile into System Data.
| Tool | Default path | Format | Notes |
|---|---|---|---|
| Ollama | ~/.ollama/models/ | GGUF blobs + manifests | content-addressable, override with OLLAMA_MODELS |
| LM Studio | ~/.lmstudio/models/ | GGUF, MLX | mirrors Hugging Face publisher paths |
| Hugging Face hub | ~/.cache/huggingface/hub/ | safetensors, GGUF | per-revision snapshots, blobs symlinked into snapshots/ |
| llama.cpp (manual) | wherever you saved them | GGUF | typically ~/models/ by convention |
| mlx-lm | ~/.cache/huggingface/hub/ | MLX | re-uses the Hugging Face cache |
That last row is the trap. mlx-lm piggybacks on the Hugging Face hub cache, so an MLX 4-bit Llama 3.1 8B sits inside ~/.cache/huggingface/hub/models--mlx-community--Meta-Llama-3.1-8B-Instruct-4bit/. Audit only Hugging Face and you miss every Ollama duplicate. Audit only Ollama and you miss every safetensors file.
For per-tool layouts, see where Ollama stores models on a Mac and the Hugging Face cache location on Mac.
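Because mlx-lm and transformers share one cache, a per-repo breakdown of the hub directory shows MLX and safetensors copies side by side. The sketch below assumes the `models--<publisher>--<name>` folder layout seen in the path above. `hub_totals` is a hypothetical helper name, and sizes are reported in KiB as `du -sk` prints them.

```shell
# hub_totals [HUB_DIR]
# Hypothetical helper: prints "KiB<TAB>repo-folder" for each cached model,
# largest first. Defaults to the Hugging Face hub cache path from the table.
hub_totals() {
  hub=${1:-$HOME/.cache/huggingface/hub}
  for repo in "$hub"/models--*; do
    [ -d "$repo" ] || continue   # skip the unexpanded glob when the cache is empty
    printf '%s\t%s\n' "$(du -sk "$repo" | cut -f1)" "$(basename "$repo")"
  done | sort -nr
}

# Usage: hub_totals
#        hub_totals /Volumes/External/hf-hub   # a relocated cache (example path)
```

A `models--mlx-community--...` row sitting next to a `models--meta-llama--...` row for the same model is the overlap worth a closer look.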
What does an honest LLM disk audit return?
The audit takes one paste. Read-only, no sudo, no installer.
# Per-tool totals, biggest first
du -sh ~/.ollama/models 2>/dev/null
du -sh ~/.lmstudio/models 2>/dev/null
du -sh ~/.cache/huggingface/hub 2>/dev/null
# Ollama blob-by-blob (the actual weight files)
du -sh ~/.ollama/models/blobs/* 2>/dev/null | sort -hr | head -10
# Every .gguf, .safetensors, or .mlx file over 4 GB anywhere in $HOME (matched by extension)
find ~ -type f \( -name "*.gguf" -o -name "*.safetensors" -o -name "*.mlx" \) \
-size +4G \
-exec stat -f "%Sm %z %N" -t "%Y-%m-%d" {} \; 2>/dev/null \
| sort
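That last command is the receipt. Each row is a date, a byte size, and a full path. Because it matches by extension, it lists files that keep their original names, such as LM Studio downloads and manually saved GGUFs. Ollama and Hugging Face hub blobs are stored under hash names, so read those from the per-tool totals and the blob listing. On a Mac that has been running mixed Ollama, LM Studio, and Hugging Face workloads for a year, expect 20 to 60 entries totalling 80 to 250 GB. The duplicates are the easy wins. If meta-llama-3.1-70b-instruct-q4_k_m.gguf sits in ~/.lmstudio/models/ and a blob of the same size sits in ~/.ollama/models/blobs/, you are paying the 42 GB twice.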
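The find above matches by file extension. A complementary sketch lists every large file under the tool directories regardless of name, then keeps only the rows whose byte size occurs more than once, which is a cheap first signal for duplicates. `big_files` and `same_size_groups` are hypothetical helper names, and an identical size is only a hint, not proof that two files hold the same bytes.

```shell
# big_files MIN_BYTES DIR...
# Hypothetical helper: prints "bytes<TAB>path" for every regular file of at
# least MIN_BYTES under the given directories, largest first.
# Paths containing newlines are not handled.
big_files() {
  min=$1; shift
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    find "$dir" -type f -size +"$((min - 1))"c 2>/dev/null
  done | while IFS= read -r f; do
    printf '%s\t%s\n' "$(wc -c < "$f" | tr -d ' ')" "$f"
  done | sort -nr
}

# same_size_groups: reads big_files output on stdin and keeps only the rows
# whose byte size appears more than once.
same_size_groups() {
  awk -F '\t' '
    { count[$1]++; rows[$1] = rows[$1] $0 "\n" }
    END { for (s in count) if (count[s] > 1) printf "%s", rows[s] }'
}

# Usage (4 GB threshold, paths from the table above):
#   big_files 4000000000 ~/.ollama/models ~/.lmstudio/models \
#     ~/.cache/huggingface/hub | same_size_groups
```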
Which combination is right for a 1 TB MacBook?
A 1 TB MacBook with Xcode, simulators, Docker, and a normal browser cache has roughly 250 to 400 GB free for LLM weights before disk pressure starts costing you in failed OS updates and slow swapping.
| Use case | Footprint | Models that fit |
|---|---|---|
| Code completion | 5 to 10 GB | one 8B Q4_K_M |
| Local agent loop, RAG | 20 to 40 GB | one 8B plus one 34B Q4 |
| Chat replacement | 50 to 80 GB | one 70B Q4_K_M |
| Multi-model bench | 100 to 200 GB | three to four 70B variants |
| Fine-tune prep | 200 GB and up | safetensors plus GGUF copies |
The pattern that fails is the bench setup nobody cleans after the benchmark ships. A Q4, Q5, Q8, and FP16 sweep of the same 70B is 308 GB. After the sweep, only one stays. The rest are exactly what you can move to Trash, or archive first if you want no redownload risk.
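The sweep total is a plain sum of the 70B row, and it is worth comparing against what the volume actually has free before pulling anything. The sketch below assumes decimal gigabytes and the POSIX `df -Pk` column layout. `sum_gb` and `free_gb` are hypothetical helper names, and the figure `df` reports on APFS may differ from what the Storage bar shows.

```shell
# sum_gb SIZE...
# Hypothetical helper: adds model sizes given in GB, one decimal place.
sum_gb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# free_gb [PATH]
# Hypothetical helper: decimal GB available on the volume holding PATH,
# read from the fourth column of the second line of df -Pk.
free_gb() {
  df -Pk "${1:-$HOME}" | awk 'NR == 2 { printf "%.1f\n", $4 * 1024 / 1e9 }'
}

sum_gb 42.5 49.5 74.6 141   # Q4_K_M, Q5_K_M, Q8_0, FP16 at 70B: prints 307.6
free_gb                     # space left on the volume that holds $HOME
```

The 307.6 rounds to the 308 GB quoted above.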
Are there hidden duplicates between Ollama, LM Studio, and Hugging Face?
Yes, and they are larger than most people guess. Two patterns recur.
The first is the same quantisation in two tools. You pulled Llama 3.1 70B Q4_K_M into Ollama for CLI work, then loaded the same model in LM Studio for the chat UI. Ollama stores it under ~/.ollama/models/blobs/sha256-<hash>. LM Studio stores it as a GGUF inside ~/.lmstudio/models/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/. Both are 42 GB. Neither dedupes.
The second is safetensors plus GGUF for the same model. A transformers import of meta-llama/Meta-Llama-3.1-8B-Instruct pulls 16 GB of safetensors into the Hugging Face hub. Later you also pull the GGUF conversion at 4.9 GB. The safetensors copy then sits there silently, because nothing surfaces a "not loaded in 60 days" hint.
The fix is one audit that lists every weight file across all three locations with size and last-modified, then a decision per row.
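For the first pattern, the sha256-<hash> blob names suggest a direct check: hash the LM Studio file and look for a blob carrying that digest. The sketch below assumes the blob name is the SHA-256 of the file contents, which is what "content-addressable" implies here. `sha256_of` and `ollama_twin` are hypothetical helper names. A miss does not rule out a duplicate, since two tools may have fetched different builds of the same quantisation.

```shell
# sha256_of FILE
# Hypothetical helper: hex digest via sha256sum if present, else shasum.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | awk '{ print $1 }'
  else
    shasum -a 256 "$1" | awk '{ print $1 }'
  fi
}

# ollama_twin FILE [BLOBS_DIR]
# Hypothetical helper: prints the Ollama blob whose name carries FILE's
# digest, or returns non-zero when no such blob exists.
ollama_twin() {
  blobs=${2:-$HOME/.ollama/models/blobs}
  candidate="$blobs/sha256-$(sha256_of "$1")"
  [ -f "$candidate" ] && printf '%s\n' "$candidate"
}

# Usage: ollama_twin /path/to/model.gguf || echo "no byte-identical blob"
```

Hashing a file of 42 GB reads every byte, so expect it to take minutes rather than seconds.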
What is the safest way to delete an LLM model on Mac?
Move to Trash, never rm -rf. The reasoning is the same as for any state-bearing dev cache. A model file took an hour of bandwidth to fetch. The Hugging Face repo it came from might be private, gated, or pulled by the time you regret the delete. The Finder Trash gives you a rollback window for free: files stay there until you empty it, or for 30 days if Finder's automatic emptying is switched on. A week is usually enough to know.
# Stamp a Trash subfolder so today's purge is reversible and labeled
STAMP=$(date +%Y%m%d-%H%M%S)
mkdir -p ~/.Trash/llm-purge-$STAMP
# Move (not delete) a specific Ollama blob; its manifest entry is left in place.
# Replace PASTE-HASH with the digest from the audit before running.
mv ~/.ollama/models/blobs/sha256-PASTE-HASH ~/.Trash/llm-purge-$STAMP/
# Move a stale LM Studio model folder
mv ~/.lmstudio/models/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF \
~/.Trash/llm-purge-$STAMP/
# Move a Hugging Face hub snapshot you have not run in 90 days
mv ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-8B-Instruct \
~/.Trash/llm-purge-$STAMP/
The stamped Trash folder is the trick. Six days later, if a project still loads cleanly with the new layout, empty the Trash. If anything broke, drag the folder back to its original path and you are exactly where you started, minus an hour.
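Dragging a folder back only works if you remember where it came from. One way to make the rollback mechanical is to record each original path at move time. The sketch below is one possible shape for that, not part of any tool: `stash`, `unstash`, and the MANIFEST.tsv file name are all hypothetical, and it assumes absolute paths without tabs or newlines.

```shell
# stash SRC STASH_DIR
# Hypothetical helper: moves SRC into STASH_DIR and appends
# "original<TAB>stashed" to STASH_DIR/MANIFEST.tsv.
stash() {
  src=$1; stash_dir=$2
  mkdir -p "$stash_dir" || return 1
  mv "$src" "$stash_dir/" || return 1
  printf '%s\t%s\n' "$src" "$stash_dir/$(basename "$src")" \
    >> "$stash_dir/MANIFEST.tsv"
}

# unstash STASH_DIR
# Hypothetical helper: moves every recorded item back to its original path.
unstash() {
  while IFS="$(printf '\t')" read -r orig stashed; do
    mkdir -p "$(dirname "$orig")" && mv "$stashed" "$orig"
  done < "$1/MANIFEST.tsv"
}

# Usage:
#   stash ~/.lmstudio/models/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF \
#     ~/.Trash/llm-purge-$STAMP
#   unstash ~/.Trash/llm-purge-$STAMP   # only if something broke
```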
What CleanMyDev does with this comparison
The tables above are the static answer. The dynamic answer is the actual list of models on your Mac, with size, last-modified, owning tool, and a risk label per row. CleanMyDev runs the audit under the hood, normalises paths across Ollama, LM Studio, Hugging Face, and standalone GGUFs, then routes anything you tick to the Finder Trash. No background daemon, no telemetry, no subscription. One-time $9.99 at the CleanMyDev pricing page. If your Storage bar shows a System Data band you cannot account for, at least 30 GB of it is likely one of the rows above.