Cool thing! A couple of suggestions:

1. I have an M3 Ultra with 256GB of memory, but the options list only goes up to 192GB. The M3 Ultra supports up to 512GB.
2. It'd be great if I could flip this around and choose a model, then see the performance for all the different processors. That would help with making buying decisions!
GrayShade · 2026-03-14 01:08
This feels a bit pessimistic. Qwen 3.5 35B-A3B runs at 38 t/s tg with llama.cpp (mmap enabled) on my Radeon 6800 XT.
phelm · 2026-03-14 01:13
This is awesome. It would be great to cross-reference some intelligence benchmarks so that I can understand the trade-off between RAM consumption, token rate, and how good the model is.
S4phyre · 2026-03-14 01:16
Oh how cool. Always wanted to have a tool like this.
embedding-shape · 2026-03-14 01:21
Yeah, that's weird: it seems to have later models, and earlier ones, but specifically not the Pro 6000? Also, based on my experience, the given numbers seem to be at least an order of magnitude off, which seems like a lot, when I use the approximate values for a Pro 6000 (96GB VRAM + 1792 GB/s).
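For a rough sanity check on numbers like these: a common back-of-envelope estimate is that decode speed is memory-bound, so the tokens/s ceiling is roughly memory bandwidth divided by the active weight size streamed per token. A minimal sketch in Python, using the 1792 GB/s figure from the comment above; the model footprints are illustrative assumptions, not figures from the tool:

```python
def max_decode_tps(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Rough upper bound on decode tokens/s: generation is memory-bound,
    so each generated token must stream the active weights once."""
    return bandwidth_gbs / model_size_gb

# Bandwidth quoted above for a Pro 6000-class card:
bw = 1792.0

# Illustrative (assumed) active-weight footprints:
for name, size_gb in [("~40 GB (70B at ~4.5 bpw)", 40.0),
                      ("~8.5 GB (8B at ~8 bpw)", 8.5)]:
    print(f"{name}: ceiling ~= {max_decode_tps(bw, size_gb):.0f} t/s")
```

For example, 1792 / 40 gives a ceiling of about 45 t/s; real throughput lands below that, but if a calculator's number is 10x away from this bound in either direction, something is off.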
adithyassekhar · 2026-03-14 01:21
This reminded me of https://www.systemrequirementslab.com/cyri. Not sure if it still works.
twampss · 2026-03-14 01:24
Is this just a web version of llmfit? https://github.com/AlexsJones/llmfit
mrdependable · 2026-03-14 01:25
This is great; I've been trying to figure this stuff out recently. One thing I do wonder is what sort of solutions there are for running your own model but using it from a different machine. I don't necessarily want to run the model on the machine I'm working from.
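One common setup (sketched here assuming llama.cpp, whose `llama-server` exposes an OpenAI-compatible HTTP API when started with `--host 0.0.0.0 --port 8080`; Ollama and vLLM work similarly) is to run the model on the strong machine and query it over the LAN from wherever you work. The LAN address below is a placeholder:

```python
# Minimal client sketch for using a model served on a different machine.
# Assumes an OpenAI-compatible /v1/chat/completions endpoint.
import json
import urllib.request

def chat_request(host: str, prompt: str, port: int = 8080) -> urllib.request.Request:
    """Build the HTTP request for the remote chat-completions endpoint."""
    body = json.dumps({"messages": [{"role": "user", "content": prompt}]})
    return urllib.request.Request(
        f"http://{host}:{port}/v1/chat/completions",
        data=body.encode(),
        headers={"Content-Type": "application/json"},
    )

def chat(host: str, prompt: str, port: int = 8080) -> str:
    """Send the request and extract the assistant's reply."""
    with urllib.request.urlopen(chat_request(host, prompt, port)) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# From the machine you work on (placeholder IP of the model server):
# print(chat("192.168.1.50", "hello"))
```

Since the server is bound to all interfaces, keep it on a trusted network or tunnel the port over SSH instead of exposing it directly.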
Comments: 10