Loki
This, the previous "article" (on Grafana), and the next one (on Vector) are happening mostly in parallel, so that I can validate these services as I go.
Loki is... there's a whole lot going on there.
Log Retention Configuration
I enabled a retention policy so that my logs wouldn't grow without bound until the end of time. This coincided with me noticing that my /var/log/journal directories had gotten up to about 4GB, which led me to perform a similar change in the journald configuration.
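For the journald side, that's just a couple of lines in /etc/systemd/journald.conf. A minimal sketch, where the 500M cap and one-week retention are illustrative values rather than necessarily what I settled on:

[Journal]
# Cap total disk usage of the persistent journal
SystemMaxUse=500M
# Drop entries older than a week
MaxRetentionSec=7day

A systemctl restart systemd-journald picks up the new limits.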
Retention Policy Configuration:
limits_config:
  retention_period: 168h  # 7 days

compactor:
  working_directory: /tmp/retention
  compaction_interval: 10m
  retention_enabled: true
  retention_delete_delay: 2h
  retention_delete_worker_count: 5
  delete_request_store: filesystem
I reduced the retention_delete_worker_count from 150 to 5 🙂 This optimization:
- Reduces resource usage: Less CPU overhead on the Raspberry Pi nodes
- Maintains efficiency: Five workers are plenty for a 7-day retention window
- Prevents overload: Avoids overwhelming the Pis' limited resources
Consul Integration for Ring Management
I also configured Loki to use Consul as its ring kvstore, which involved sketching out an ACL policy and generating a token, but nothing too weird. (Assuming that it works.)
Ring Configuration:
common:
  ring:
    kvstore:
      store: consul
      consul:
        acl_token: {{ loki_consul_token }}
        host: {{ ipv4_address }}:8500
Consul ACL Policy (loki.policy.hcl):
key_prefix "collectors/" {
policy = "write"
}
key_prefix "loki/" {
policy = "write"
}
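Getting that policy into Consul and minting a token for it is standard ACL work; roughly (the policy name and description are whatever you like):

# Register the policy, then create a token bound to it
consul acl policy create -name loki -rules @loki.policy.hcl
consul acl token create -description "loki ring" -policy-name loki

The SecretID printed by the second command is what ends up in loki_consul_token.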
This integration provides:
- Service discovery: Automatic discovery of Loki components
- Consistent hashing: Proper ring distribution for ingester scaling
- High availability: Shared state management across cluster nodes
- Security: ACL-based access control to Consul KV store
Comprehensive TLS Configuration
The next several hours involved cleanup after I rashly configured Loki to use TLS. I didn't know that I'd then need to configure Loki to talk to itself via TLS, that I'd have to do so in several different places, and that those places would each use different syntax for the same core ideas (CA cert, TLS cert, TLS key).
Server TLS Configuration
GRPC and HTTP Server:
server:
  grpc_listen_address: {{ ipv4_address }}
  grpc_listen_port: 9096
  grpc_tls_config: &http_tls_config
    cert_file: "{{ loki.cert_path }}"
    key_file: "{{ loki.key_path }}"
    client_ca_file: "{{ step_ca.root_cert_path }}"
    client_auth_type: "VerifyClientCertIfGiven"
  http_listen_address: {{ ipv4_address }}
  http_listen_port: 3100
  http_tls_config: *http_tls_config
TLS Features:
- Mutual TLS: Client certificate verification when provided
- Step-CA Integration: Uses cluster certificate authority
- YAML Anchors: Reuses TLS config across components to reduce duplication
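A quick way to confirm the HTTP listener is actually serving certificates signed by the cluster CA is to hit Loki's /ready endpoint; the hostname and paths below are illustrative:

# Should print "ready" once all of Loki's modules are up
curl --cacert /etc/step/certs/root_ca.crt https://inchfield:3100/ready

# client_auth_type is VerifyClientCertIfGiven, so a client cert is optional,
# but presenting one exercises the mutual-TLS path as well
curl --cacert /etc/step/certs/root_ca.crt \
  --cert /etc/loki/loki.crt --key /etc/loki/loki.key \
  https://inchfield:3100/ready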
Component-Level TLS Configuration
Frontend Configuration:
frontend:
  grpc_client_config: &grpc_client_config
    tls_enabled: true
    tls_cert_path: "{{ loki.cert_path }}"
    tls_key_path: "{{ loki.key_path }}"
    tls_ca_path: "{{ step_ca.root_cert_path }}"
  tail_tls_config:
    tls_cert_path: "{{ loki.cert_path }}"
    tls_key_path: "{{ loki.key_path }}"
    tls_ca_path: "{{ step_ca.root_cert_path }}"
Pattern Ingester TLS:
pattern_ingester:
  metric_aggregation:
    loki_address: {{ ipv4_address }}:3100
    use_tls: true
    http_client_config:
      tls_config:
        ca_file: "{{ step_ca.root_cert_path }}"
        cert_file: "{{ loki.cert_path }}"
        key_file: "{{ loki.key_path }}"
Internal Component Communication
The configuration ensures TLS across all internal communications:
- Ingester Client: grpc_client_config: *grpc_client_config
- Frontend Worker: grpc_client_config: *grpc_client_config
- Query Scheduler: grpc_client_config: *grpc_client_config
- Ruler: Uses a separate alertmanager client TLS config
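Spelled out, that reuse is just the same anchor referenced from each client block in the one config file (relying on &grpc_client_config being defined under frontend above):

ingester_client:
  grpc_client_config: *grpc_client_config

frontend_worker:
  grpc_client_config: *grpc_client_config

query_scheduler:
  grpc_client_config: *grpc_client_config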
And holy crap, the Loki documentation site is absolutely awful for figuring out where a given piece of configuration needs to live.
Advanced Configuration Features
Pattern Recognition and Analytics
Pattern Ingester:
pattern_ingester:
  enabled: true
  metric_aggregation:
    loki_address: {{ ipv4_address }}:3100
    use_tls: true
This enables:
- Log pattern detection: Automatic recognition of log patterns
- Metric generation: Convert log patterns to Prometheus metrics
- Performance insights: Understanding log volume and patterns
Schema and Storage Configuration
TSDB Schema (v13):
schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h
Storage Paths:
common:
  path_prefix: /tmp/loki
  storage:
    filesystem:
      chunks_directory: /tmp/loki/chunks
      rules_directory: /tmp/loki/rules
Query Performance Optimization
Caching Configuration:
query_range:
  results_cache:
    cache:
      embedded_cache:
        enabled: true
        max_size_mb: 20
Performance Features:
- Embedded cache: 20MB query result cache for faster repeated queries
- Protobuf encoding: Efficient data serialization for frontend communication
- Concurrent streams: 1000 max concurrent GRPC streams
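To see the cache actually doing something, re-running the same metric query over the same window a couple of times is enough; a sketch with logcli, where the label selector and cert paths are illustrative:

# Repeat runs over the same range should be served at least partially
# from the embedded results cache
logcli --addr=https://inchfield:3100 \
  --ca-cert=/etc/step/certs/root_ca.crt \
  --cert=/etc/loki/loki.crt --key=/etc/loki/loki.key \
  query --since=1h 'sum(rate({job="systemd-journal"}[5m]))'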
Certificate Management Integration
Automatic Certificate Renewal:
[Service]
Environment=CERT_LOCATION={{ loki.cert_path }} \
KEY_LOCATION={{ loki.key_path }}
# Restart Loki service after certificate renewal
ExecStartPost=/usr/bin/env sh -c "! systemctl --quiet is-active loki.service || systemctl try-reload-or-restart loki.service"
Certificate Lifecycle:
- 24-hour validity: Short-lived certificates for enhanced security
- Automatic renewal: cert-renewer@loki.timer handles renewal
- Service restart: Seamless certificate updates with service reload
- Step-CA integration: Consistent with cluster-wide PKI infrastructure
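The initial certificate comes from the cluster's step-ca the same way as for the other services; roughly, with the subject name and output paths being illustrative:

# Issue a short-lived certificate for Loki against the cluster CA
step ca certificate loki.inchfield.lan \
  /etc/loki/loki.crt /etc/loki/loki.key \
  --not-after 24h

From there, cert-renewer@loki.timer keeps renewing the same files before they expire.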
Monitoring and Alerting Integration
Ruler Configuration:
ruler:
  alertmanager_url: http://{{ ipv4_address }}:9093
  alertmanager_client:
    tls_cert_path: "{{ loki.cert_path }}"
    tls_key_path: "{{ loki.key_path }}"
    tls_ca_path: "{{ step_ca.root_cert_path }}"
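For completeness, the rules the ruler evaluates are ordinary Prometheus-style rule groups with LogQL expressions, dropped under the rules directory (for a single-tenant setup that typically means a fake/ subdirectory). This one is purely illustrative:

groups:
  - name: loki-smoke-test
    rules:
      - alert: HighErrorRate
        # Fires if "error" lines across the journal exceed 10/s for 10 minutes
        expr: sum(rate({job="systemd-journal"} |= "error" [5m])) > 10
        for: 10m
        labels:
          severity: warning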
Observability Features:
- Structured logging: JSON format for better parsing
- Debug logging: Detailed logging for troubleshooting
- Request logging: Log requests at info level for monitoring
- Grafana integration: Primary storage for alert state history
Deployment Architecture
Single-Node Deployment: Currently deployed on the inchfield node
Replication Factor: 1 (appropriate for single-node setup)
Resource Optimization: Configured for Raspberry Pi resource constraints
Integration Points:
- Vector: Log shipping from all cluster nodes
- Grafana: Log visualization and alerting
- Prometheus: Metrics scraping from Loki endpoints
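On the shipping side, the relevant piece of each node's Vector config is a loki sink pointed at this endpoint. A sketch, with the source name, label, and cert paths being illustrative (more on Vector in the next article):

sinks:
  loki:
    type: loki
    inputs: ["journald_logs"]
    endpoint: https://inchfield:3100
    encoding:
      codec: json
    labels:
      job: systemd-journal
    tls:
      ca_file: /etc/step/certs/root_ca.crt
      crt_file: /etc/vector/vector.crt
      key_file: /etc/vector/vector.key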
This configuration gives me a log aggregation platform with TLS throughout, bounded retention, and the integration points I need, even if getting all of the TLS settings properly aligned across Loki's many internal components was a slog.