Switches – Full List (Categorised)
All commands accept a common set of global flags plus command‑specific options. The tables below group them by category for easy lookup.
Global / Daemon Switches (apply to any command)

| Switch | Argument | Default | Description |
|---|---|---|---|
| `--verbose` | — | off | Print extra debugging information (useful for logs). |
| `--json` | — | off | Emit output as JSON objects (compatible with jq pipelines). |
| `--quiet` | — | off | Suppress non‑essential status messages. |
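As an illustration, the global switches can be combined with a jq pipeline. This is a sketch, not a verbatim transcript: it assumes `--json` emits a JSON array of objects with a `name` field, and that the flags behave as tabulated above (check `ollama --help` on your version, since flag names vary between releases).

```shell
# Hypothetical: list locally installed models as JSON, suppress status
# chatter, and extract just the model names with jq.
ollama list --json --quiet | jq -r '.[].name'
```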
Server / Daemon Switches (used with ollama serve)

| Switch | Argument | Default | Description |
|---|---|---|---|
| `--port` | `<int>` | 11434 | Port on which the HTTP API will listen. |
| `--host` | `<ip\|hostname>` | 0.0.0.0 | Network interface to bind. |
| `--gpu` | `<int>` | 0 (auto‑detect) | Force a specific GPU device ID (useful on multi‑GPU machines). |
| `--cpu` | — | off | Run inference on CPU only, ignoring GPUs. |
| `--keep-alive` | `<seconds>` | 300 | Idle time after which the server will unload unused models. |
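Putting the daemon switches together, a typical hardened launch might look like the sketch below. It assumes the switches behave exactly as tabulated; on your installation, confirm them with `ollama serve --help` before relying on this.

```shell
# Sketch: bind the API to localhost only (not all interfaces), pin
# inference to GPU 1, and unload idle models after 10 minutes.
ollama serve --host 127.0.0.1 --port 11434 --gpu 1 --keep-alive 600
```

Binding to 127.0.0.1 instead of the default 0.0.0.0 keeps the API off the network, which is usually what you want on a shared machine.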
Model Management Switches

| Switch | Argument | Default | Description |
|---|---|---|---|
| `--model` | `<name>` | — | Explicitly set the model for commands that would otherwise pick it up implicitly (e.g., ollama run). |
| `--modelfile` | `<path>` | Modelfile | Path to a custom Modelfile when using ollama create. |
| `--pull` | — | off | Force a fresh pull of the model even if it exists locally. |
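A short sketch of the model‑management flow, assuming the switches above. The model name `my-assistant` and the path `./Modelfile.custom` are placeholders for illustration only.

```shell
# Hypothetical: build a model from a custom Modelfile, then run it,
# forcing a fresh pull of the base layers even if cached locally.
ollama create my-assistant --modelfile ./Modelfile.custom
ollama run --model my-assistant --pull
```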
Inference / Generation Switches (used with ollama run and ollama create)

| Switch | Argument | Default | Description |
|---|---|---|---|
| `--num-predict` | `<int>` | 128 | Maximum number of tokens to generate for a single request. |
| `--temperature` | `<float>` | 0.8 | Controls randomness; 0 = deterministic, 1 = very random. |
| `--top-p` | `<float>` | 0.9 | Nucleus sampling – keep the smallest set of tokens with cumulative probability ≥ top‑p. |
| `--top-k` | `<int>` | 40 | Keep only the top‑k most likely tokens at each step. |
| `--repeat-penalty` | `<float>` | 1.1 | Penalty for repeating the same token; higher values discourage repetition. |
| `--presence-penalty` | `<float>` | 0.0 | Penalty for tokens that have already appeared in the generated text. |
| `--frequency-penalty` | `<float>` | 0.0 | Penalty proportional to how often a token has already been generated. |
| `--seed` | `<int>` | — | Set a deterministic random seed for reproducible outputs. |
| `--stream` | — | on | Stream tokens as they are generated (useful for UIs). Use --no-stream to wait for the full answer. |
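The sampling switches are most useful in combination. The sketch below shows a conservative, reproducible configuration, assuming the flags behave as tabulated above; the model name `llama3` and the prompt are illustrative placeholders.

```shell
# Sketch: low-randomness, reproducible generation capped at 256 tokens.
# A fixed --seed plus low --temperature makes reruns comparable.
ollama run llama3 "Summarise the trade-offs of nucleus sampling." \
  --temperature 0.2 --top-p 0.9 --top-k 40 \
  --repeat-penalty 1.1 --seed 42 --num-predict 256 --no-stream
```

Note that `--seed` only makes outputs reproducible on the same model, version, and hardware; it does not guarantee identical results across machines.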
Output Formatting Switches

| Switch | Argument | Default | Description |
|---|---|---|---|
| `--format` | `text\|json\|yaml` | text | Force a specific output format (overrides the global --json flag when set to json). |
| `--no-color` | — | off | Disable ANSI colour codes – handy when piping to files. |
| `--log-file` | `<path>` | — | Write verbose logs to the given file instead of stdout. |
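For machine consumption, the formatting switches are typically combined as below. This is a sketch assuming the switches behave as tabulated; the log path is a placeholder.

```shell
# Sketch: machine-readable JSON output, no ANSI colour codes, and
# verbose logs diverted to a file so stdout stays clean for piping.
ollama run llama3 "List three primary colours." \
  --format json --no-color --log-file /tmp/ollama-run.log
```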
Advanced / Debug Switches

| Switch | Argument | Default | Description |
|---|---|---|---|
| `--profile` | — | off | Collect simple timing statistics and print them after the request. |
| `--benchmark` | `<iterations>` | 1 | Run the same prompt repeatedly to benchmark throughput. |
| `--trace` | — | off | Enable low‑level tracing (useful for developers of custom back‑ends). |
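A quick throughput check can be sketched from the debug switches above. Again, this assumes the flags exist as tabulated on your build; `--no-stream` is used so per-run timings are not skewed by streaming overhead.

```shell
# Sketch: run the same prompt 10 times and print timing statistics,
# giving a rough tokens-per-second figure for this model and machine.
ollama run llama3 "Write one haiku about caching." \
  --benchmark 10 --profile --no-stream
```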
