Local AI Models in .NET, Wired Up by Aspire

Calling a hosted model during development has a quiet cost: every experimental request burns tokens, needs an API key, and assumes a network connection. Run the model locally and that friction disappears, but now you’ve got a second runtime to install, start, and wire into your app.

.NET Aspire takes that second runtime and treats it like any other dependency. You declare it, Aspire runs it, your app references it. Ollama becomes just another resource in the graph.

This builds on the IChatClient abstraction from .NET AI Essentials — if that one is unfamiliar, read it first. Here we point the same client at a model running on your own box.

Declare Ollama in the AppHost

One chained statement declares the Ollama server, a persistent volume so models survive restarts, and Open WebUI for poking at the model in a browser.

var ollama = builder.AddOllama("ollama")
                    .WithDataVolume()
                    .WithOpenWebUI();

Pull a model

AddModel names a model and tells Aspire to download it on startup. First run pulls the weights; after that it’s cached in the volume.

var chat = ollama.AddModel("chat", "llama3.2");

Tight on RAM? Pin a smaller tag:

var chat = ollama.AddModel("chat", "llama3.2:1b");

Then hand the model to your API project. WaitFor keeps the app from starting until the model is actually ready — no race against a half-downloaded model.

builder.AddProject<Projects.MyApi>("api")
       .WithReference(chat)
       .WaitFor(chat);
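
Assembled into one file, the AppHost's Program.cs might look like the sketch below. The `builder` variable and the closing `Build().Run()` call do not appear in the fragments above; they are filled in here on the assumption that this is a standard Aspire AppHost project with the Community Toolkit Ollama hosting package (CommunityToolkit.Aspire.Hosting.Ollama) installed and a project reference to MyApi.

```csharp
// AppHost/Program.cs: a sketch under the assumptions stated above,
// not a definitive setup.
var builder = DistributedApplication.CreateBuilder(args);

// Ollama server, persistent model storage, and a browser UI.
var ollama = builder.AddOllama("ollama")
                    .WithDataVolume()
                    .WithOpenWebUI();

// Model resource named "chat"; the API looks it up by this name later.
var chat = ollama.AddModel("chat", "llama3.2");

builder.AddProject<Projects.MyApi>("api")
       .WithReference(chat)   // hands the model's connection info to the API
       .WaitFor(chat);        // holds the API back until the model is ready

builder.Build().Run();
```

The resource name "chat" is the thread that ties the two projects together: it is the same string the API passes when it registers its client.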

Consume it as an IChatClient

Inside the API, AddKeyedOllamaSharpChatClient registers an OllamaSharp-backed IChatClient under the key "chat", and AddChatClient wraps it in the same middleware pipeline you’d use for any provider: function calling, telemetry, logging.

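// Register the OllamaSharp-backed client under the key "chat" (the AppHost model name).
builder.AddKeyedOllamaSharpChatClient("chat");

// Expose it as the default IChatClient, wrapped in the shared middleware pipeline.
builder.Services.AddChatClient(sp => sp.GetRequiredKeyedService<IChatClient>("chat"))
                .UseFunctionInvocation()
                // EnableSensitiveData includes message content in telemetry.
                .UseOpenTelemetry(configure: t => t.EnableSensitiveData = true)
                .UseLogging();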

The route handler never learns it’s talking to a local model. It asks for an IChatClient and calls it:

app.MapPost("/chat", async (IChatClient chatClient, string question) =>
{
    var response = await chatClient.GetResponseAsync(question);
    return response.Text;
});
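
Because `question` is a plain string parameter, minimal APIs bind it from the query string rather than the request body, so a caller sends it as `?question=...`. A minimal client sketch follows, assuming the API is reachable at a hypothetical http://localhost:5000 (Aspire assigns the real address when it launches the project):

```csharp
using System;
using System.Net.Http;

// Hypothetical base address for illustration; substitute the endpoint
// shown for the "api" resource in the Aspire dashboard.
using var http = new HttpClient { BaseAddress = new Uri("http://localhost:5000") };

// The handler reads "question" from the query string, so the POST body stays empty.
var question = Uri.EscapeDataString("Why is the sky blue?");
using var reply = await http.PostAsync($"/chat?question={question}", content: null);

reply.EnsureSuccessStatusCode();
Console.WriteLine(await reply.Content.ReadAsStringAsync());
```

The caller is equally unaware of where the model runs; it only sees an HTTP endpoint that returns text.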

Local in dev, cloud in prod — same handler

Here’s the payoff. The abstraction means the registration changes by environment; nothing downstream does. Ollama on your laptop, Azure OpenAI in production.

if (builder.Environment.IsDevelopment())
{
    builder.AddKeyedOllamaSharpChatClient("chat");
}
else
{
    builder.AddKeyedAzureOpenAIClient("chat");
}

builder.Services.AddChatClient(sp => sp.GetRequiredKeyedService<IChatClient>("chat"))
                .UseFunctionInvocation()
                .UseOpenTelemetry(configure: t => t.EnableSensitiveData = true)
                .UseLogging();

Same handler, same middleware, same IChatClient. The only thing that moved is one if.

Final thought

Local AI usually means yak-shaving — install the runtime, manage the daemon, hardcode an endpoint. Aspire folds all of that into the resource graph, and Microsoft.Extensions.AI keeps your code blind to where the model actually lives. Experiment for free on your own hardware, ship to the cloud by flipping an environment check.

Adapted from Aaron Powell’s Using Local AI Models with Aspire on the .NET Blog.
