Just want to chime in that I've seen TabbyML used a fair bit at work. Tabby in particular can run locally on M1/M2/M3 with Metal GPU acceleration on Apple Silicon. The performance hit isn't noticeable, and it excels at what we mostly use it for: large autocompletions in serialized formats.
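For anyone who wants to poke at it outside the editor plugin, below is roughly what hitting a local Tabby server from a script looks like. The port, endpoint path, and payload shape are from memory and may have drifted between versions, so treat them as assumptions and double-check the current Tabby docs:

    # Minimal sketch of requesting a completion from a locally running Tabby server.
    # Assumptions: the server is already running (started with something like
    # `tabby serve --device metal`), listens on localhost:8080, and exposes a
    # /v1/completions endpoint that takes a language hint plus prefix/suffix segments.
    import requests

    TABBY_URL = "http://localhost:8080/v1/completions"  # assumed default host/port

    payload = {
        "language": "python",  # language hint for the completion model
        "segments": {
            "prefix": "def parse_config(path):\n    ",  # code before the cursor
            "suffix": "\n",                             # code after the cursor
        },
    }

    resp = requests.post(TABBY_URL, json=payload, timeout=30)
    resp.raise_for_status()

    # Expecting a list of completion choices in the response; print the first one.
    choices = resp.json().get("choices", [])
    if choices:
        print(choices[0].get("text", ""))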
Yeah, this along with Neoseeker has been my go-to for decades now.