Optimizing an ELK Logging Platform That Handles Millions of Log Events per Second

Author: root007 Category: EFK, Uncategorized Published: 2019-08-20 17:46

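Yes, I admit the title is clickbait!

Kibana shows logs with a delay, Kafka messages pile up, and Elasticsearch goes OOM at the slightest provocation: before vs. after optimization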


Marking url as dead. Last error: [LogStash::Outputs::ElasticSearch::HttpClient::Pool::HostUnreachableError

org.elasticsearch.cluster.metadata.ProcessClusterEventTimeoutException: failed to process cluster event (put-mapping) within 30s

1. Run multiple Logstash instances consuming from Kafka, which also makes Logstash highly available

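input {
  kafka {
    bootstrap_servers => "xx.xxx.xx.xx:9091,xxx.xx.140:9092,xxx.xx.x.140:9093"
    topics => ["ali-k8s-logs"]
    # consumer_threads summed over all Logstash instances should equal the topic's PartitionCount (20 here)
    consumer_threads => 10
    # Use a different client_id on each Logstash instance
    client_id => "logstash-A"
    # Keep the same group_id on every instance so they share the partitions
    group_id => "logstash"
    decorate_events => true
    codec => json
    auto_offset_reset => "latest"
  }
}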
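The partition arithmetic behind consumer_threads can be sketched as below. This is a minimal illustration, not part of the original setup: the function name consumer_threads_per_instance is hypothetical, and it assumes you want one consumer thread per partition, spread as evenly as possible across instances.

```python
def consumer_threads_per_instance(partition_count, instance_count):
    """Split Kafka partitions evenly across Logstash instances.

    Returns a list with one consumer_threads value per instance.
    Threads beyond the partition count would sit idle, so the total
    is kept equal to partition_count.
    """
    if partition_count <= 0 or instance_count <= 0:
        raise ValueError("partition_count and instance_count must be positive")
    base, extra = divmod(partition_count, instance_count)
    # The first `extra` instances take one additional thread each.
    return [base + 1 if i < extra else base for i in range(instance_count)]


if __name__ == "__main__":
    # 20 partitions, 2 Logstash instances: 10 threads each, as in the config above.
    print(consumer_threads_per_instance(20, 2))
    # 20 partitions, 3 instances: an uneven but complete split.
    print(consumer_threads_per_instance(20, 3))
```

With 20 partitions and two instances this gives 10 threads each, matching the consumer_threads => 10 setting used here.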

2. Elasticsearch tuning; adjust the values to your own server's specs

# Note: Elasticsearch 6.3 and later call this thread pool "write" instead of "bulk"
thread_pool:
    bulk:
        size: 3
        queue_size: 5000
indices.fielddata.cache.size: 10%
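To judge whether the queue_size is large enough, one option is to look at the rejected counter reported by `GET _cat/thread_pool?v&h=node_name,name,active,queue,rejected`. The sketch below only parses that text output; the sample rows and node names are made up for illustration, and it assumes node names contain no spaces.

```python
def parse_cat_thread_pool(text):
    """Parse whitespace-separated _cat/thread_pool output that has a header row."""
    lines = [ln for ln in text.strip().splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for ln in lines[1:]:
        row = dict(zip(header, ln.split()))
        # Convert the numeric columns so they can be compared.
        for key in ("active", "queue", "rejected"):
            row[key] = int(row[key])
        rows.append(row)
    return rows


def pools_with_rejections(rows):
    """Return the rows whose thread pool has rejected at least one request."""
    return [r for r in rows if r["rejected"] > 0]


# Hypothetical sample output, not taken from a real cluster.
SAMPLE = """
node_name name active queue rejected
es-data-1 bulk 3 5000 120
es-data-2 bulk 1 0 0
"""

if __name__ == "__main__":
    for row in pools_with_rejections(parse_cat_thread_pool(SAMPLE)):
        print(row["node_name"], row["name"], row["rejected"])
```

A rejected count that keeps growing suggests the nodes cannot keep up with indexing, in which case a larger queue only delays the rejections.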

3. Logstash output

output {
  elasticsearch {
    hosts => [""]
    index => "logstash-k8s-%{pod_name}-%{+YYYY.MM.dd}"
    pool_max => 5000
    pool_max_per_route => 500
  }
}

4. If Elasticsearch still goes OOM frequently after JVM tuning, switch the garbage collector to G1GC

## GC configuration
#-XX:+UseConcMarkSweepGC
##-XX:CMSInitiatingOccupancyFraction=75
#-XX:+UseCMSInitiatingOccupancyOnly
-XX:+UseG1GC
-XX:MaxGCPauseMillis=50

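Then there is sharding. Imagine the difference between a 16 GB server running a single database instance and a 16 GB server running thousands of database instances. Every shard carries its own overhead, so too many shards on one node has the same effect.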
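A rough way to see how fast shards add up with one index per pod per day (the index pattern used in the Logstash output) is the estimate below. It is a sketch under stated assumptions: the function name total_shards and all input numbers are hypothetical, and 5 primaries with 1 replica is the default only in Elasticsearch versions before 7.0.

```python
def total_shards(distinct_pods, retention_days, primary_shards=5, replicas=1):
    """Estimate the shard count for one index per pod per day.

    Each index holds `primary_shards` primaries, and each primary
    has `replicas` additional copies.
    """
    indices = distinct_pods * retention_days
    return indices * primary_shards * (1 + replicas)


if __name__ == "__main__":
    # Hypothetical cluster: 100 distinct pod names, logs kept for 7 days.
    print(total_shards(100, 7))
    # Same workload with 1 primary shard and 1 replica per index.
    print(total_shards(100, 7, primary_shards=1))
```

Under these made-up numbers the default settings yield 7000 shards, while one primary per index yields 1400, which shows why the per-index shard count matters when indices are this fine-grained.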
