Introduction to Ollama
Ollama is an open-source framework that allows you to run large language models (LLMs) and vision models locally on your own hardware. It provides a simple interface to download, manage, and interact with these models, offering a range of powerful AI capabilities without requiring cloud services or complex setups.
With Ollama, you can:
- Run powerful AI models completely offline and locally
- Choose from a wide variety of text and vision models
- Customize models with your own requirements
- Access models through a simple command-line interface or REST API
- Integrate AI capabilities into your applications using provided libraries
Whether you're a developer looking to integrate AI into your applications, a researcher experimenting with different models, or just someone interested in exploring the capabilities of large language models without sending data to third-party services, Ollama provides an accessible solution.
Installation
Ollama is available for macOS, Windows, and Linux, with Docker support as well. Here are the installation instructions for each platform:
macOS
Ollama supports macOS 11 Big Sur or later.
- Visit ollama.com/download and click on the macOS download link
- Open the downloaded zip file
- Drag Ollama to your Applications folder
- Launch Ollama from your Applications folder
Alternatively, if you use Homebrew:
brew install ollama
Windows
Ollama provides a Windows installer that simplifies the setup process:
- Visit ollama.com/download and download the Windows installer
- Run the downloaded OllamaSetup.exe file
- Follow the installation wizard
- After installation, Ollama will be available from the Start menu
Linux
For Linux, you can use the provided installation script:
curl -fsSL https://ollama.com/install.sh | sh
This script will install Ollama on your Linux system. For manual installation options, refer to the Linux installation documentation (https://github.com/ollama/ollama/blob/main/docs/linux.md).
Docker
The official Ollama Docker image is available on Docker Hub:
docker pull ollama/ollama
To run Ollama using Docker:
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
This starts the Ollama server on port 11434. The -v flag stores downloaded models in a named volume so they persist across container restarts; for GPU acceleration, see the image's documentation on Docker Hub.
Note: After installing Ollama, the service will run in the background. You can interact with it using the Ollama CLI or the REST API.
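If you have the official Python client installed (pip install ollama; see the Integration Ecosystem section below), a quick way to confirm the background service is reachable is to ask it for its installed models. A minimal sketch:
import ollama

# Queries the local Ollama server; this call fails if the service isn't running
print(ollama.list())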
Supported Models
Ollama provides access to a diverse library of models available at ollama.com/library. Here's a selection of key models you can download and run locally:
| Model | Parameters | Size | Download Command |
|---|---|---|---|
| Gemma 3 | 1B | 815MB | ollama run gemma3:1b |
| Gemma 3 | 4B | 3.3GB | ollama run gemma3 |
| Gemma 3 | 12B | 8.1GB | ollama run gemma3:12b |
| Gemma 3 | 27B | 17GB | ollama run gemma3:27b |
| Llama 3.2 | 3B | 2.0GB | ollama run llama3.2 |
| Llama 3.2 | 1B | 1.3GB | ollama run llama3.2:1b |
| Llama 3.2 Vision | 11B | 7.9GB | ollama run llama3.2-vision |
| Llama 3.1 | 8B | 4.7GB | ollama run llama3.1 |
| Phi 4 | 14B | 9.1GB | ollama run phi4 |
| Mistral | 7B | 4.1GB | ollama run mistral |
| Moondream 2 | 1.4B | 829MB | ollama run moondream |
| Neural Chat | 7B | 4.1GB | ollama run neural-chat |
| Code Llama | 7B | 3.8GB | ollama run codellama |
| LLaVA | 7B | 4.5GB | ollama run llava |
Hardware Requirements: You should have at least 8 GB of RAM available to run 7B models, 16 GB for 13B models, and 32 GB for 33B models.
Model Categories
Ollama supports several types of models, including:
- Text-only models: For chat, content generation, and text completion
- Vision models: For processing both text and images
- Code models: Specialized for programming and development tasks
- Multimodal models: Supporting multiple types of input data
Basic Usage
Running a Model
Once Ollama is installed, you can run models with a simple command:
ollama run llama3.2
This will download the model if you don't already have it, then start an interactive chat session.
Chat with a Model
After running a model, you'll see a prompt where you can enter text to converse with the model:
>>> Why is the sky blue?
The sky appears blue due to a phenomenon called Rayleigh scattering. When sunlight travels through the atmosphere, it interacts with air molecules and other tiny particles. These particles scatter the sunlight in all directions.
Blue light has a shorter wavelength compared to other colors in the visible spectrum, which causes it to scatter more easily when it collides with air molecules. This scattered blue light then reaches our eyes from all directions in the sky, giving it the blue appearance we observe.
During sunrise and sunset, the sky often appears red or orange because the blue light gets scattered away from our line of sight as sunlight has to travel through more of the atmosphere to reach us, allowing the longer wavelength red and orange light to dominate what we see.
Basic CLI Commands
Here are some fundamental commands for working with Ollama:
- ollama run [model] - Run a model in chat mode
- ollama list - List all available models on your system
- ollama pull [model] - Download a model without running it
- ollama rm [model] - Remove a model from your system
- ollama cp [source] [destination] - Copy a model
- ollama serve - Start the Ollama server without the desktop application
Advanced Features
Customizing Models with Modelfiles
You can create custom models using a Modelfile, which allows you to modify parameters, set system messages, and more.
Create a file named Modelfile with the following content:
FROM llama3.2
# Set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# Set the system message
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
Then create and run the model:
ollama create mario -f ./Modelfile
ollama run mario
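The custom model is then available to every interface, not just the CLI. As a sketch using the Python client, assuming the mario model created above exists locally:
import ollama

# Chat with the custom model exactly as you would with a stock one
reply = ollama.chat(
    model="mario",
    messages=[{"role": "user", "content": "Who are you?"}],
)
print(reply["message"]["content"])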
Using the REST API
Ollama provides a REST API for programmatically interacting with models:
Generate a response:
curl http://localhost:11434/api/generate -d '{
"model": "llama3.2",
"prompt":"Why is the sky blue?"
}'
Chat with a model:
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2",
"messages": [
{ "role": "user", "content": "why is the sky blue?" }
]
}'
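The official ollama-python library wraps both endpoints, so you can make the same calls from code. A minimal sketch, assuming the package is installed with pip install ollama and llama3.2 has been pulled:
import ollama

# Equivalent of POST /api/generate: one-shot completion
result = ollama.generate(model="llama3.2", prompt="Why is the sky blue?")
print(result["response"])

# Equivalent of POST /api/chat: multi-turn conversation
reply = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
)
print(reply["message"]["content"])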
Structured Outputs
Ollama supports structured outputs that allow you to define the format of responses using JSON schema:
curl -X POST http://localhost:11434/api/chat -H "Content-Type: application/json" -d '{
"model": "llama3.1",
"messages": [{"role": "user", "content": "Tell me about Canada."}],
"stream": false,
"format": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"capital": {
"type": "string"
},
"languages": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"name",
"capital",
"languages"
]
}
}'
This produces a structured response like:
{
"capital": "Ottawa",
"languages": [
"English",
"French"
],
"name": "Canada"
}
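From Python, you can generate the schema from a Pydantic model instead of writing the JSON by hand. A sketch, assuming pydantic is installed; the Country class is purely illustrative:
import ollama
from pydantic import BaseModel

# Illustrative schema mirroring the curl example above
class Country(BaseModel):
    name: str
    capital: str
    languages: list[str]

response = ollama.chat(
    model="llama3.1",
    messages=[{"role": "user", "content": "Tell me about Canada."}],
    format=Country.model_json_schema(),  # pass the JSON schema as the response format
)
country = Country.model_validate_json(response["message"]["content"])
print(country.capital)  # e.g. "Ottawa"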
Working with Multimodal Models
Vision models such as LLaVA and Llama 3.2 Vision can process images along with text. From the CLI, you can include an image path directly in the prompt:
ollama run llava "What's in this image? /path/to/image.png"
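The same works over the API by attaching images to a message. A minimal sketch with the Python client; the image path is a placeholder:
import ollama

response = ollama.chat(
    model="llava",
    messages=[{
        "role": "user",
        "content": "What's in this image?",
        "images": ["/path/to/image.png"],  # local file path (raw bytes are also accepted)
    }],
)
print(response["message"]["content"])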
Use Cases
Ollama enables a wide range of local AI applications. Here are some popular use cases:
🤖 Chatbots & Assistants
Create personalized AI assistants for various tasks, from answering questions to providing recommendations, all while keeping your conversations private and running locally.
💻 Coding Assistant
Use Code Llama or other code-specialized models to help with programming tasks, debug code, explain algorithms, or generate code snippets without sending your proprietary code to external services.
📊 Data Analysis & Extraction
Extract structured information from documents, summarize reports, or analyze data trends with models equipped with structured output capabilities.
🔍 Local Search & RAG
Implement Retrieval-Augmented Generation (RAG) systems that can search through and reason about your local documents, providing accurate answers based on your private data.
👁️ Image Analysis
Use vision models to analyze images, extract information, generate descriptions, or identify objects without sending potentially sensitive visual data to cloud services.
🎮 Gaming & Interactive Fiction
Create dynamic game characters or interactive storytelling experiences with customized models that can maintain context and generate creative responses.
🧠 Learning & Education
Develop personalized tutoring systems or educational tools that can explain concepts, answer questions, and adapt to individual learning styles.
📝 Content Creation
Generate blog posts, marketing copy, creative writing, or other content while maintaining full control over the generation process.
🔄 Workflow Automation
Automate repetitive tasks by integrating Ollama with scripts and tools to process and transform data, generate reports, or respond to events.
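As a concrete illustration of the automation use case, here is a small sketch that summarizes a local text file; the file name and prompt wording are arbitrary:
import ollama

# Read a local document and ask a local model to condense it
with open("report.txt") as f:
    text = f.read()

summary = ollama.generate(
    model="llama3.2",
    prompt=f"Summarize the following document in three bullet points:\n\n{text}",
)
print(summary["response"])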
Real-world Examples
Here are some specific examples of how people are using Ollama in real-world scenarios:
- Privacy-focused document analysis: Processing sensitive documents locally without exposing data to third parties
- Offline AI access: Using AI capabilities in environments with limited or no internet connectivity
- Enterprise tool integration: Embedding AI capabilities into internal company tools while keeping data on-premises
- Personal knowledge management: Creating systems that can query and analyze personal notes and documents
- Custom customer service bots: Building specialized support agents with domain-specific knowledge
- Research and development: Experimenting with AI models in academic or industrial research settings
Integration Ecosystem
Ollama has a rich ecosystem of integrations and libraries that extend its functionality:
Libraries for Developers
- Python: ollama-python
- JavaScript: ollama-js
- Go: Various client libraries available
- C#/.NET: Client libraries for .NET applications
- Java: Java client libraries
- Swift: Libraries for iOS and macOS development
- Rust: Rust client implementations
Framework Integrations
- LangChain: Python and JavaScript integrations (see the sketch after this list)
- LlamaIndex: For building RAG applications
- Spring AI: Java framework integration
- Firebase Genkit: For Firebase applications
- Semantic Kernel: Microsoft's AI orchestration framework
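As one example, the langchain-ollama package exposes local Ollama models as LangChain chat models. A minimal sketch, assuming pip install langchain-ollama and a pulled llama3.2:
from langchain_ollama import ChatOllama

llm = ChatOllama(model="llama3.2", temperature=0.2)

# invoke() returns an AIMessage; its .content attribute holds the reply text
message = llm.invoke("Name three uses for a local LLM.")
print(message.content)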
User Interfaces
Many community-built UIs are available for Ollama, including:
- Web interfaces like Open WebUI
- Desktop applications
- Mobile apps for iOS and Android
- IDE plugins for VS Code and other editors
Tips and Best Practices
Performance Optimization
- Match model size to your hardware capabilities
- Close unnecessary applications to free up memory
- Consider GPU acceleration for faster inference
- Use smaller models for faster response times when absolute quality isn't critical
Prompt Engineering
- Be clear and specific in your instructions
- Provide context to help the model understand what you need
- Use system messages to define the model's role and behavior
- Experiment with different temperature settings (lower for more deterministic outputs, higher for creativity); see the sketch below
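With the Python client, sampling parameters such as temperature are passed in the options dictionary; the values here are illustrative:
import ollama

response = ollama.chat(
    model="llama3.2",
    messages=[{"role": "user", "content": "Write a tagline for a coffee shop."}],
    options={"temperature": 0.9},  # higher values produce more varied output
)
print(response["message"]["content"])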
Security Considerations
- By default, Ollama's API server listens only on localhost (127.0.0.1)
- Be careful when exposing the API to other machines (for example, by setting the OLLAMA_HOST environment variable to bind to a non-local interface); the API has no built-in authentication
- Consider network isolation for sensitive applications
- Models run locally, so your data stays on your machine unless explicitly shared
Conclusion
Ollama represents a significant step forward in making powerful AI models accessible to everyone. By enabling local execution of LLMs, it addresses privacy concerns, reduces dependency on cloud services, and opens up new possibilities for AI integration in various applications.
Whether you're a developer looking to build AI-powered applications, a researcher experimenting with language models, or just someone interested in exploring what these models can do, Ollama provides an easy-to-use platform that puts the power of state-of-the-art AI in your hands.
As the field of AI continues to evolve rapidly, tools like Ollama will play an increasingly important role in democratizing access to these technologies and enabling innovation across industries.
Resources
- Official Website: https://ollama.com
- GitHub Repository: https://github.com/ollama/ollama
- Model Library: https://ollama.com/library
- API Documentation: https://github.com/ollama/ollama/blob/main/docs/api.md
- Modelfile Documentation: https://github.com/ollama/ollama/blob/main/docs/modelfile.md
- Community: Discord (https://discord.gg/ollama) and Reddit (r/ollama)