How Prometheus Scrapes 10,000 Targets Concurrently And How You Can Use the Same Mechanism
The two characters that make Prometheus fast (and how to use them yourself)
The math that should not work
Suppose a Prometheus server scrapes 500 endpoints every 15 seconds. If it scraped them one after another, and each HTTP call took 200 ms, one cycle would take 500 × 200 ms = 100 seconds. The scrape interval is 15 seconds. The sequential version simply does not fit.
But Prometheus works. You have seen it yourself. Every service reporting on time, every 15 seconds, reliably. When you set up your first Prometheus instance and watched Grafana dashboards populate in real time, something was making hundreds of simultaneous HTTP calls without you thinking about it. That something is goroutines, and they are not specific to Prometheus. They are a standard feature of every Go program. This article shows you exactly what they are and how to build the same pattern yourself.
Hi, this is Pushpit from CloudOdyssey. Each week, I write deep dives on Cloud, DevOps, and Systems Design, along with community updates around them.
What a goroutine actually is
A goroutine is not a thread. That distinction matters more than it sounds.
An OS thread typically reserves 1 to 8 megabytes of stack memory, depending on the platform defaults. If you try to run 10,000 threads on a single machine, you will exhaust memory before you exhaust your patience. OS threads are expensive to create, expensive to context-switch, and managed by the kernel with no knowledge of your application’s intentions.
A goroutine starts with approximately 2 kilobytes of stack memory, grows as needed, and is managed entirely by the Go runtime. The runtime runs what is called an M:N scheduler: M goroutines are multiplexed onto N OS threads. When a goroutine blocks on I/O, the runtime parks it and runs another goroutine on that thread, without an OS-level thread switch. The programmer does not manage any of this. You launch a goroutine and the runtime handles placement, scheduling, and teardown.
The programmer’s interface is two characters:
go checkEndpoint(url)
That call returns immediately. The function runs concurrently in its own goroutine. You can run 100,000 goroutines on a laptop. You cannot run 100,000 OS threads. This is why Prometheus can scrape 10,000 targets without straining the machine.
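You do not have to take the "cheap goroutines" claim on faith. The sketch below parks 10,000 goroutines and asks the runtime how many exist at that moment. The helper name parkGoroutines is hypothetical. It uses a channel only as a gate to hold the goroutines in place, and a sync.WaitGroup (explained later in this article) to wait for them to exit.

```go
package main

import (
	"fmt"
	"runtime"
	"sync"
)

// parkGoroutines launches n goroutines that block until release is closed,
// and reports how many goroutines exist while they are parked.
// (Hypothetical helper for this article.)
func parkGoroutines(n int) int {
	release := make(chan struct{})
	var wg sync.WaitGroup

	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			<-release // block here, much like a goroutine waiting on I/O
		}()
	}

	alive := runtime.NumGoroutine() // counts main plus every parked goroutine

	close(release) // wake them all
	wg.Wait()      // wait for every goroutine to exit
	return alive
}

func main() {
	fmt.Println("goroutines alive:", parkGoroutines(10000))
	// GOMAXPROCS(0) reads the current setting without changing it.
	fmt.Println("CPUs allowed to run Go code at once:", runtime.GOMAXPROCS(0))
}
```

The second number is usually your core count. Thousands of goroutines, a handful of threads doing the work: that is the M:N scheduler in practice. On a typical laptop you should be able to raise the count to 100,000 and watch memory grow by a few hundred megabytes at most.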
The first concurrent program and the bug inside it
Take the health checker from Article 6. Sequential version, checking a list of service URLs one by one. Converting it to concurrent looks like this:
func main() {
	urls := []string{
		"https://api.example.com/health",
		"https://auth.example.com/health",
		"https://billing.example.com/health",
	}
	for _, url := range urls {
		go checkEndpoint(url) // launch each check as a goroutine
	}
	// main returns here, before the goroutines finish
}
Run this. The output is partial, or completely empty. The program exits before the goroutines complete their work. main() does not wait for goroutines it launches. When main returns, the program terminates and takes every running goroutine with it.
This is the first goroutine bug almost every Go beginner writes. The fix is sync.WaitGroup:
package main

import (
	"fmt"
	"net/http"
	"sync"
	"time"
)

type Result struct {
	URL    string
	Status string
}

func checkEndpoint(url string, wg *sync.WaitGroup, results *[]Result, mu *sync.Mutex) {
	defer wg.Done() // signal this goroutine is finished; always runs on exit

	client := &http.Client{Timeout: 5 * time.Second}
	resp, err := client.Get(url)
	status := "DOWN"
	if err == nil {
		if resp.StatusCode == http.StatusOK {
			status = "UP"
		}
		resp.Body.Close() // close the body on every response, not only on 200
	}

	mu.Lock() // acquire lock before touching shared slice
	*results = append(*results, Result{URL: url, Status: status})
	mu.Unlock() // release lock immediately after
}

func main() {
	urls := []string{
		"https://api.example.com/health",
		"https://auth.example.com/health",
		"https://billing.example.com/health",
	}

	var wg sync.WaitGroup
	var mu sync.Mutex
	var results []Result

	for _, url := range urls {
		wg.Add(1)                                 // increment counter before launch
		go checkEndpoint(url, &wg, &results, &mu) // launch goroutine
	}
	wg.Wait() // block until all goroutines call wg.Done()

	for _, r := range results {
		fmt.Printf("%s: %s\n", r.URL, r.Status)
	}
}
wg.Add(1) increments a counter before each goroutine launches. defer wg.Done() decrements it when the goroutine exits. wg.Wait() blocks main until the counter reaches zero. Most launch-a-batch-and-wait code in Go follows this shape.
The shared data problem
Notice the mutex in the example above. That is not decoration. Multiple goroutines writing to the same slice concurrently is a data race, and data races in Go produce corrupted results or crashes that appear randomly and are extremely hard to debug.
The Go toolchain ships with a race detector. Run your program with:
go run -race main.go
If you append to a shared slice from multiple goroutines without synchronisation, the race detector will report it as soon as the conflicting accesses actually run:
WARNING: DATA RACE
Write at 0x00c0001b4000 by goroutine 8:
  main.checkEndpoint(...)
append is not atomic. Two goroutines can both read the slice header, both decide there is room, and both write to the same memory address. The mutex in the example prevents this by ensuring only one goroutine can modify results at a time.
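If you want to trigger the race detector on purpose, a small reproducer helps. The sketch below is an assumption-laden toy: appendAll and the "racy" command-line argument are invented for this article. By default it takes the lock; pass racy to skip it.

```go
package main

import (
	"fmt"
	"os"
	"sync"
)

// appendAll launches n goroutines that each append one value to a shared
// slice. With locked=false the appends are unsynchronised, which is a
// deliberate data race. (Hypothetical helper for this article.)
func appendAll(n int, locked bool) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		results []int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(v int) {
			defer wg.Done()
			if locked {
				mu.Lock()
				defer mu.Unlock()
			}
			results = append(results, v) // racy unless the mutex is held
		}(i)
	}
	wg.Wait()
	return len(results)
}

func main() {
	// Pass "racy" as the first argument to run the unsynchronised variant.
	racy := len(os.Args) > 1 && os.Args[1] == "racy"
	fmt.Println("appended:", appendAll(1000, !racy))
}
```

Run it with go run -race main.go racy and you should see a DATA RACE report. Run the racy variant without the detector and the count will often come out below 1000, because some appends overwrite each other. With the lock in place it should always print 1000.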
The mutex approach is correct, but it requires discipline. Every read and write of shared state must be protected. Miss one and the race condition returns. Article 15 introduces channels, which replace the mutex entirely by removing the need for shared memory in the first place. Goroutines send results through a channel; one goroutine collects them. No shared slice, no lock needed. That is the idiomatic Go pattern, and channels appear throughout the Prometheus codebase.
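As a preview, here is roughly what "send results through a channel, collect in one place" can look like. This is a sketch under my own assumptions, not the code from Article 15 and not Prometheus source: collect is a hypothetical helper, and check is a stand-in function so the example runs without a network.

```go
package main

import (
	"fmt"
	"sync"
)

type Result struct {
	URL    string
	Status string
}

// collect runs check for every URL in its own goroutine and gathers the
// results through a channel. check is a stand-in for a real health check.
func collect(urls []string, check func(url string) string) []Result {
	ch := make(chan Result)
	var wg sync.WaitGroup

	for _, url := range urls {
		wg.Add(1)
		go func(u string) {
			defer wg.Done()
			ch <- Result{URL: u, Status: check(u)} // send the value, do not share it
		}(url)
	}

	// Close the channel once every sender has finished so the range below ends.
	go func() {
		wg.Wait()
		close(ch)
	}()

	var results []Result // only this goroutine ever touches the slice
	for r := range ch {
		results = append(results, r)
	}
	return results
}

func main() {
	urls := []string{"https://api.example.com/health", "https://auth.example.com/health"}
	alwaysUp := func(string) string { return "UP" } // fake check for the demo
	for _, r := range collect(urls, alwaysUp) {
		fmt.Printf("%s: %s\n", r.URL, r.Status)
	}
}
```

Notice there is no mutex anywhere. The slice has exactly one writer, so there is nothing to protect. The order of results is not guaranteed, just as with the mutex version.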
In the cloud wild
Prometheus scrape manager
Open prometheus/prometheus on GitHub and look at scrape/manager.go. Each scrape target gets a goroutine managed through a scrapePool. The pool runs goroutines that fire on a ticker, execute the HTTP scrape, and cancel via context if the target is removed from the configuration.
The goroutine loop inside looks structurally identical to what you wrote above. wg.Add, launch, wg.Wait. The main difference is that Prometheus wraps it with a context for cancellation, which lets the scrape manager shut down targets cleanly when you reload configuration.
Your running Prometheus instance is managing thousands of goroutines right now. It is not doing anything special. It is using the same go keyword, the same sync.WaitGroup, and the same patterns you just wrote. The scale is different. The mechanism is identical.
This same pattern appears in Kubernetes controllers (one goroutine per reconcile loop), in kubectl parallel operations, and in every Go-based load balancer or API gateway you have used.
Goroutines in one paragraph
A goroutine is a lightweight concurrent function managed by the Go runtime, not the OS. You launch one with go. You wait for a group of them with sync.WaitGroup. You protect shared state with sync.Mutex, or better, you eliminate shared state with channels. The entire Prometheus scraping engine, the Kubernetes controller reconciliation loop, and every parallel CLI operation you have used in the cloud ecosystem are built on this three-part foundation.
What is next
The mutex works, but it is the cautious solution. The Go-idiomatic solution is channels: a mechanism for goroutines to communicate by passing values rather than sharing memory.



