How big is a typical LLM model on disk?

The LLM model disk space you should plan for depends on parameter count and quantisation. An 8B at Q4_K_M is roughly 4.9 GB, a 13B at Q4_K_M is around 7.9 GB, a 70B at Q4_K_M lands near 42 GB, and a 405B at Q4_K_M is about 243 GB. Full FP16 weights for those same models scale to 16, 26, 141, and 810 GB respectively.

What is the smallest useful quantisation for a 70B model?

For a 70B, Q4_K_M is the practical floor that keeps reasoning intact while keeping the LLM model disk space near 42 GB. Q3 and Q2_K save another 12 to 16 GB but visibly degrade longer-context tasks, so most people who actually run 70B locally settle at Q4 or Q5 and leave Q2 for experiments.

Does GGUF take the same disk as safetensors?

No. Safetensors typically ship at BF16 or FP16 precision and weigh roughly two bytes per parameter, so an 8B safetensors file is around 16 GB. GGUF Q4_K_M for the same 8B is around 4.9 GB. The LLM model disk space gap between formats is real because GGUF is built around quantised weights while safetensors usually carries the full-precision originals.

Why do my Ollama and LM Studio sizes not match for the same model?

Ollama stores weights in a content-addressable blob store under `~/.ollama/models/blobs/`, while LM Studio mirrors the publisher path under `~/.lmstudio/models/`. The bytes are the same but the path layouts and manifests differ, and neither tool dedupes against the other. Running the same 70B in both costs you the LLM model disk space twice.

How do I see all the LLM models on my Mac in one list?

Run a `find` across `~` for files larger than 4 GB ending in `.gguf` or `.safetensors`, then sort by size. That catches LM Studio folders and standalone llama.cpp downloads. Ollama and the Hugging Face hub store weights as hash-named blobs with no extension, so list `~/.ollama/models/blobs/` and `~/.cache/huggingface/hub/` by size as well. CleanMyDev does the same audit with a path, last-modified date, and owning tool printed next to every row.

Is it safe to delete a large LLM model file?

Yes, as long as you move the file to the Finder Trash instead of running `rm -rf`. The Trash keeps the weights until you empty it (or for 30 days if Finder's automatic Trash removal is switched on), so you can restore them if you discover the model was hard to redownload or the Hugging Face repo was pulled. Empty Trash only after you have confirmed no project needs the model, since that is the point at which the LLM model disk space actually comes back.

How does LLM model disk space compare across sizes and quantisations?

A Llama 3.1 70B in Q4_K_M GGUF lands on disk at 42 GB. The same model in Q8_0 is 75 GB. The full FP16 weights cross 141 GB. None of those numbers is a typo, and only one of them is a sane choice for a 1 TB MacBook that already runs Xcode. The audit a developer published at brtkwr.com after freeing 200 GB on a full disk lists "Unused GPT4All / LLM model files, 7 GB" as one of the smaller culprits, which sounds modest until you remember the 7 GB was leftovers, not the working set. The working set is where this comparison starts.

TL;DR

This LLM model disk space comparison maps 8B, 13B, 34B, 70B, and 405B against Q2_K, Q4_K_M, Q5_K_M, Q8_0, and FP16 across GGUF, MLX, and safetensors, then folds in per-tool footprints for Ollama, LM Studio, and Hugging Face. A Q4 70B is 42 GB, a full-precision 405B passes 800 GB, and the same weights duplicate silently across three caches. CleanMyDev shows every model on your Mac with size, last-modified, and owning tool, so you choose a quantisation from a receipt instead of a guess.

How big is a single LLM model in 2026?

A weight file is two numbers multiplied: parameter count and bits per parameter. An 8B at FP16 carries roughly two bytes per parameter, so 16 GB. Drop to Q4_K_M, the GGUF community's default, and average bits fall to about 4.8, putting the same 8B at 4.9 GB. The math scales linearly with parameter count, so a single table covers it.

Model size	Q2_K	Q4_K_M	Q5_K_M	Q8_0	FP16
8B	3.2 GB	4.9 GB	5.7 GB	8.5 GB	16 GB
13B	5.4 GB	7.9 GB	9.2 GB	13.8 GB	26 GB
34B	12.8 GB	20.2 GB	24.0 GB	36.1 GB	68 GB
70B	26.4 GB	42.5 GB	49.5 GB	74.6 GB	141 GB
405B	150 GB	243 GB	287 GB	432 GB	810 GB

Two reads matter on a Mac. In the Q8_0 and FP16 columns, a single model rivals a full iOS DeviceSupport folder, and the 405B row is not a real choice on consumer hardware. The interesting LLM model disk space band on a 1 TB MacBook is 8B to 70B at Q4 to Q5, roughly 5 GB to 50 GB per model.
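The multiplication behind the table can be sketched as a small shell function. This is a rough estimate only: `est_gb` is a hypothetical helper name, and the 4.8 bits per weight figure is the approximate Q4_K_M average quoted above, not an exact constant.

```shell
# Estimate on-disk size in GB: parameters (in billions) x bits per weight / 8.
# est_gb is a hypothetical helper, not part of any tool mentioned here.
est_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

est_gb 8 16     # 8B at FP16
est_gb 8 4.8    # 8B at roughly Q4_K_M
est_gb 70 4.8   # 70B at roughly Q4_K_M
```

Real files tend to land slightly above these estimates, because nominal parameter counts are rounded and the container adds metadata.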

How do GGUF, MLX, and safetensors compare on disk?

Format matters as much as quantisation. GGUF is a quantisation-first binary container the llama.cpp ecosystem standardised on. MLX is Apple's array framework, and mlx-lm publishes its own quantised variants for it, normally saved as safetensors files. safetensors itself is Hugging Face's tensor serialisation format, and upstream checkpoints in it almost always ship at native precision because they target training and server inference, not edge inference.

For the same Llama 3.1 8B model, the on-disk numbers look like this.

Format	Precision	Size on disk	Typical home
GGUF	Q4_K_M	4.9 GB	Ollama, LM Studio, llama.cpp
GGUF	Q8_0	8.5 GB	Ollama, LM Studio, llama.cpp
GGUF	FP16	16 GB	rare, conversion artefacts
MLX	4-bit	4.5 GB	mlx-lm, MLX Swift apps
MLX	8-bit	8.5 GB	mlx-lm
safetensors	BF16	16 GB	Hugging Face Transformers
safetensors	FP32	32 GB	training pipelines

If you are running inference, you almost never want full-precision safetensors sitting in a cache. They arrive when a transformers or diffusers notebook pulls a model once and they stay because nothing purges the Hugging Face hub. The fastest way to halve LLM model disk space on a Mac that has been used for both inference and one-off ML notebooks is to delete the safetensors snapshot once the quantised equivalent exists.

Where does each tool keep its models on a Mac?

Three tools account for almost all of the local LLM bytes on a developer Mac in 2026. None of them dedupe across each other, and none of them surface a per-model size in the Storage bar. macOS folds the whole pile into System Data.

Tool	Default path	Format	Notes
Ollama	`~/.ollama/models/`	GGUF blobs + manifests	content-addressable, override with `OLLAMA_MODELS`
LM Studio	`~/.lmstudio/models/`	GGUF, MLX	mirrors Hugging Face publisher paths
Hugging Face hub	`~/.cache/huggingface/hub/`	safetensors, GGUF	files stored in `blobs/`, symlinked into per-revision `snapshots/`
llama.cpp (manual)	wherever you saved them	GGUF	typically `~/models/` by convention
mlx-lm	`~/.cache/huggingface/hub/`	MLX	re-uses the Hugging Face cache

That last line is the trap. mlx-lm piggybacks on the Hugging Face hub cache, so an MLX 4-bit Llama 3.1 8B sits inside `~/.cache/huggingface/hub/models--mlx-community--Meta-Llama-3.1-8B-Instruct-4bit/`. Audit only Hugging Face and you miss every Ollama duplicate. Audit only Ollama and you miss every safetensors file.
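The defaults in the table can move. A minimal sketch of resolving the real locations before measuring them, assuming the `OLLAMA_MODELS` override from the table and the `HF_HOME` / `HF_HUB_CACHE` variables that the Hugging Face hub library reads. `model_dirs` is a hypothetical helper name, and LM Studio's folder is assumed to be at its default because its location is changed in the app rather than through an environment variable.

```shell
# Print each tool's model directory, honouring the common overrides.
# model_dirs is a hypothetical helper name.
model_dirs() {
  printf '%s\n' "${OLLAMA_MODELS:-$HOME/.ollama/models}"
  printf '%s\n' "$HOME/.lmstudio/models"   # assumed default location
  printf '%s\n' "${HF_HUB_CACHE:-${HF_HOME:-$HOME/.cache/huggingface}/hub}"
}

# Size of each directory that exists, largest first (du -k reports KiB).
model_dirs | while IFS= read -r dir; do
  [ -d "$dir" ] && du -sk "$dir" 2>/dev/null
done | sort -nr | awk -F '\t' '{ printf "%.1f GB\t%s\n", $1 / 1048576, $2 }'
```

Resolving the paths first means an audit does not silently report zero for a cache that was relocated to an external drive.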

For per-tool layouts, see where Ollama stores models on a Mac and the Hugging Face cache location on Mac.

What does an honest LLM disk audit return?

The audit takes one paste. Read-only, no sudo, no installer.

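# Per-tool totals, biggest first
du -sh ~/.ollama/models ~/.lmstudio/models ~/.cache/huggingface/hub 2>/dev/null | sort -hr

# Ollama blob-by-blob (the actual weight files)
du -sh ~/.ollama/models/blobs/* 2>/dev/null | sort -hr | head -10

# Every GGUF or safetensors file over 4 GB anywhere in $HOME, largest first.
# MLX weights are normally saved as .safetensors, so they match here too.
find ~ -type f \( -name "*.gguf" -o -name "*.safetensors" \) \
  -size +4G \
  -exec stat -f "%Sm %z %N" -t "%Y-%m-%d" {} \; 2>/dev/null \
  | sort -k2,2nr

# Hash-named blobs (no extension) in the Ollama store and Hugging Face hub
find ~/.ollama/models/blobs ~/.cache/huggingface/hub -type f -size +4G \
  -exec stat -f "%Sm %z %N" -t "%Y-%m-%d" {} \; 2>/dev/null \
  | sort -k2,2nr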

The `find` output is the receipt. Each row is a date, a byte size, and a full path. On a Mac that has been running mixed Ollama, LM Studio, and Hugging Face workloads for a year, expect 20 to 60 entries totalling 80 to 250 GB. The duplicates are the easy wins. Ollama names its blobs by hash, so a duplicate will not share a filename: if `meta-llama-3.1-70b-instruct-q4_k_m.gguf` under `~/.lmstudio/models/` has a same-sized `sha256-<hash>` twin in `~/.ollama/models/blobs/`, you are paying the 42 GB twice.

Which combination is right for a 1 TB MacBook?

A 1 TB MacBook with Xcode, simulators, Docker, and a normal browser cache has roughly 250 to 400 GB free for LLM weights before disk pressure starts costing you in failed OS updates and slow swapping.

Use case	Footprint	Models that fit
Code completion	5 to 10 GB	one 8B Q4_K_M
Local agent loop, RAG	20 to 40 GB	one 8B plus one 34B Q4
Chat replacement	40 to 80 GB	one 70B at Q4_K_M to Q8_0
Multi-model bench	100 to 200 GB	three to four 70B variants
Fine-tune prep	200 GB and up	safetensors plus GGUF copies

The pattern that fails is the bench setup nobody cleans after the benchmark ships. A Q4, Q5, Q8, and FP16 sweep of the same 70B is about 308 GB. After the sweep, only one variant stays in use. The rest are exactly the files to archive to an external drive, which removes any redownload risk, or to move to Trash.
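The sweep arithmetic, and whether it fits, can be checked before anything is downloaded. The helpers below are a sketch with hypothetical names, and the 100 GB default headroom is an assumption for illustration, not a figure from any tool. `df -Pk` reports binary units, so the free-space number will read slightly lower than the decimal GB Finder shows.

```shell
# Sum a list of model sizes in GB (the values below are the 70B table row).
sum_gb() {
  printf '%s\n' "$@" | awk '{ total += $1 } END { printf "%.1f\n", total }'
}

# Free space on the volume holding $HOME, in whole GiB (POSIX df, 1 KiB blocks).
free_gb() {
  df -Pk "${HOME:-/}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Succeeds if NEEDED_GB fits while leaving HEADROOM_GB free.
# The default headroom of 100 is an illustrative assumption.
fits_on_disk() {
  needed="$1"; headroom="${2:-100}"
  awk -v free="$(free_gb)" -v need="$needed" -v room="$headroom" \
    'BEGIN { exit !(free - need >= room) }'
}

sweep=$(sum_gb 42.5 49.5 74.6 141)
echo "70B sweep: ${sweep} GB, free: $(free_gb) GB"
fits_on_disk "$sweep" || echo "sweep would not leave 100 GB of headroom"
```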

Are there hidden duplicates between Ollama, LM Studio, and Hugging Face?

Yes, and they are larger than most people guess. Two patterns recur.

The first is the same quantisation in two tools. You pulled Llama 3.1 70B Q4_K_M into Ollama for CLI work, then loaded the same model in LM Studio for the chat UI. Ollama stores it under `~/.ollama/models/blobs/sha256-<hash>`. LM Studio stores it as a GGUF inside `~/.lmstudio/models/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF/`. Both are 42 GB. Neither dedupes.

The second is safetensors plus GGUF for the same model. A transformers import of `meta-llama/Meta-Llama-3.1-8B-Instruct` pulls 16 GB of safetensors into the Hugging Face hub. Later you also pull the GGUF conversion at 4.9 GB. The safetensors stays silent because nothing surfaces a "not loaded in 60 days" hint.

The fix is one audit that lists every weight file across all three locations with size and last-modified, then a decision per row.
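The first pattern can be confirmed by hashing rather than by comparing names. The sketch below uses hypothetical helpers (`sha256_of`, `find_dupes`, and the `MIN_SIZE` knob are illustrative names) and only catches byte-identical copies. Two quantisations of the same model from different publishers will not match, and neither will the safetensors-plus-GGUF pattern.

```shell
# Hash one or more files with whichever SHA-256 tool is installed.
sha256_of() {
  if command -v shasum >/dev/null 2>&1; then
    shasum -a 256 "$@"
  else
    sha256sum "$@"
  fi
}

# Print every file that shares its SHA-256 with another file under the
# given directories. MIN_SIZE is a hypothetical knob in find's -size syntax.
find_dupes() {
  min_size="${MIN_SIZE:-+4G}"
  find "$@" -type f -size "$min_size" 2>/dev/null \
    | while IFS= read -r file; do sha256_of "$file"; done \
    | sort \
    | awk '{
        if ($1 == prev_hash) {
          if (!printed) print prev_line
          print
          printed = 1
        } else {
          printed = 0
        }
        prev_hash = $1
        prev_line = $0
      }'
}

# Usage (reads every byte of every large file, so expect it to take minutes):
# find_dupes ~/.ollama/models ~/.lmstudio/models ~/.cache/huggingface/hub
```

Each output row is a hash followed by a path, with copies of the same file on adjacent rows.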

What is the safest way to delete an LLM model on Mac?

Move to Trash, never `rm -rf`. The reasoning is the same as for any state-bearing dev cache. A model file took an hour of bandwidth to fetch. The Hugging Face repo it came from might be private, gated, or pulled by the time you regret the delete. The Finder Trash gives you a rollback window for free: files stay there until you empty it, or for 30 days if Finder's automatic Trash removal is enabled.

# Stamp a Trash subfolder so today's purge is reversible and labeled
# (Terminal may need Full Disk Access to write inside ~/.Trash)
STAMP=$(date +%Y%m%d-%H%M%S)
mkdir -p ~/.Trash/llm-purge-$STAMP

# Move (not delete) a specific Ollama blob. Its manifest under
# ~/.ollama/models/manifests/ is left in place and still references the blob;
# remove that entry with `ollama rm <model>` once you are sure.
mv ~/.ollama/models/blobs/sha256-<paste-hash> ~/.Trash/llm-purge-$STAMP/

# Move a stale LM Studio model folder
mv ~/.lmstudio/models/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF \
   ~/.Trash/llm-purge-$STAMP/

# Move a Hugging Face hub model folder (all its snapshots) you have not run in 90 days
mv ~/.cache/huggingface/hub/models--meta-llama--Meta-Llama-3.1-8B-Instruct \
   ~/.Trash/llm-purge-$STAMP/

The stamped Trash folder is the trick. Six days later, if every project still loads cleanly without those files, empty the Trash to actually reclaim the space. If anything broke, move the folder's contents back to their original paths and you are exactly where you started, minus the time spent moving files.
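Putting things back is easier when the purge records where each item came from. A minimal sketch, assuming absolute source paths: `stash_model`, `restore_models`, `PURGE_DIR`, and `manifest.tsv` are all hypothetical names introduced here, not features of Finder or of any tool above.

```shell
# Hypothetical helpers: stash_model moves a path into a purge folder and
# logs its original location; restore_models replays that log.
PURGE_DIR="${PURGE_DIR:-$HOME/.Trash/llm-purge-$(date +%Y%m%d-%H%M%S)}"

stash_model() {
  mkdir -p "$PURGE_DIR" || return 1
  for src in "$@"; do
    dest="$PURGE_DIR/$(basename "$src")"
    if [ -e "$dest" ]; then
      echo "skip: $dest already exists" >&2
      continue
    fi
    # Log "original<TAB>stashed" only if the move succeeded.
    mv "$src" "$dest" && printf '%s\t%s\n' "$src" "$dest" >> "$PURGE_DIR/manifest.tsv"
  done
}

restore_models() {
  while IFS="$(printf '\t')" read -r src dest; do
    [ -e "$dest" ] && mkdir -p "$(dirname "$src")" && mv "$dest" "$src"
  done < "$PURGE_DIR/manifest.tsv"
}

# Usage (pass absolute paths so the log can be replayed from any directory):
# stash_model "$HOME/.lmstudio/models/bartowski/Meta-Llama-3.1-70B-Instruct-GGUF"
# restore_models
```

The log matters most for Ollama blobs, whose hash names give no hint of which folder they belong in.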

What CleanMyDev does with this comparison

The tables above are the static answer. The dynamic answer is the actual list of models on your Mac, with size, last-modified, owning tool, and a risk label per row. CleanMyDev runs the audit under the hood, normalises paths across Ollama, LM Studio, Hugging Face, and standalone GGUFs, then routes anything you tick to the Finder Trash. No background daemon, no telemetry, no subscription. One-time $9.99 at the CleanMyDev pricing page. If your Storage bar shows a System Data band you cannot account for, at least 30 GB of it is likely one of the rows above.