Fluent Bit & Fluentd

Jeff Yen

9 min read · Jan 22, 2021

Getting started with Fluent Bit and Fluentd

Fluentd vs Fluent Bit

Both projects come from the same company, so they have a lot in common, but the official docs position them differently.

Fluent Bit is a fast Log Processor and Forwarder for Linux, Embedded Linux, MacOS and BSD family operating systems.

Fluentd is an open source data collector for unified logging layer.

https://docs.fluentbit.io/manual/about/fluentd-and-fluent-bit

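From a certain angle, Fluent Bit and Fluentd are like Beats and Logstash: Fluent Bit is the lighter-weight service for collecting and shipping data. In Kubernetes, Fluent Bit is deployed as a DaemonSet on every node to collect data. In the figure below, Fluentd plays the aggregator role: it gathers the data and then routes it to different destinations based on tags.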

https://logz.io/blog/fluentd-vs-fluent-bit/
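To make the tag-based routing idea more concrete, here is a minimal Python sketch. It is not Fluentd's implementation: the wildcard rules are a simplified take on Fluentd-style patterns (`*` for one tag part, `**` for zero or more parts), and the tag names and destinations are hypothetical.

```python
def _match(pattern_parts, tag_parts):
    """Recursively match dot-separated pattern parts against tag parts."""
    if not pattern_parts:
        return not tag_parts
    head, rest = pattern_parts[0], pattern_parts[1:]
    if head == "**":
        # "**" may swallow zero or more tag parts.
        return any(_match(rest, tag_parts[i:]) for i in range(len(tag_parts) + 1))
    if not tag_parts:
        return False
    if head == "*" or head == tag_parts[0]:
        # "*" matches exactly one tag part.
        return _match(rest, tag_parts[1:])
    return False


def tag_matches(pattern, tag):
    return _match(pattern.split("."), tag.split("."))


# Hypothetical routing table: the first matching pattern wins.
ROUTES = [
    ("kube.api.*", "elasticsearch"),
    ("kube.**", "s3-archive"),
    ("**", "stdout"),
]


def route(tag):
    """Return the destination of the first route whose pattern matches the tag."""
    for pattern, destination in ROUTES:
        if tag_matches(pattern, tag):
            return destination
    return None


print(route("kube.api.access"))
print(route("kube.web.nginx.error"))
print(route("system.cron"))
```

Because the first match wins, the more specific patterns have to come before the catch-all one.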

Install Fluent Bit

The official docs offer two mainstream installation methods. The example this time is container log -> Fluent Bit -> Elasticsearch.

Kustomize
Helm

Kustomize

The first step is to create the namespace

kubectl create namespace logging

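Next, set up the service account and role

Prepare an initial ConfigMap for the Fluent Bit DaemonSet. Its contents will later be overridden by the Kustomize overlays for each environment.

If you want to validate your INPUT, FILTER, and OUTPUT settings, you can use the official site (https://config.calyptia.com)

The git directory layout looks roughly like the figure below. Kustomize then merges project/prod/1-fluent-bit-config.yaml to replace the [INPUT] and [OUTPUT] settings.

It also contains a service.yaml, which is there so that a ServiceMonitor can scrape it for monitoring

Then add the ServiceMonitor in prometheus-operator

Then build the graphs with the example dashboard provided on the official site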

Helm Chart

As simple as ever: first add the fluent Helm repo

helm repo add fluent https://fluent.github.io/helm-charts

Create the namespace

kubectl create namespace logging

Once fluentbit-value.yaml is ready, run helm install

helm install --values=fluentbit-value.yaml -n logging fluent-bit fluent/fluent-bit

Fluent Bit config

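For most inputs I use the tail plugin to read log files; it behaves like tail -f. The raw data differs depending on the container runtime, such as Docker or containerd, and because the source format differs, the parser configured for [INPUT] differs too.

With Docker (the original log source is a JSON map string), the JSON parser can be used directly.

containerd adds an extra prefix such as stdout F, so the parser has to be customized, using a regex to pull out the fields you want.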

2021-01-22T03:27:45.253547129Z stdout F {"level":"info","app_id":"jeff-api","environment":"","hostname":"jys-api-server-749d876f57-6qxwh","grpc.start_time":1611286065,"grpc.service":"api.apiService","grpc.method":"EthereumCreateAddress","request_id":"56b69dd5-1234-12ab-987a-30f2311e2359","grpc.time_ms":26.612,"grpc.req":{},"grpc.resp":{"address_index":18,"address_hex":"0x4F97DAEC923BBB11a123d0123428984E522Cf750","created_at":"2021-01-22T03:27:45.227885267Z"},"access_log":true,"timestamp":1611286065,"graylog_level":6,"graylog_timestamp":1611286065.253,"message":"no message"}

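Use the Ruby regular expression tester recommended by the official docs to check whether the regex captures the information you want.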
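The same check can be sketched offline in Python. The pattern below is an assumption modeled on the usual CRI log layout (timestamp, stream, tag, message), rewritten with Python's `(?P<name>...)` group syntax; it is not necessarily the parser used in this post, and the sample line is an abbreviated version of the log line above.

```python
import json
import re

# Assumed CRI layout: "<time> <stdout|stderr> <logtag> <message>".
CRI_LINE = re.compile(
    r"^(?P<time>[^ ]+) (?P<stream>stdout|stderr) (?P<logtag>[^ ]*) (?P<message>.*)$"
)


def parse_cri_line(line):
    """Split a containerd (CRI) log line into fields; return None if it does not match."""
    match = CRI_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    try:
        # The application payload in this example is JSON, so decode it as well.
        record["message"] = json.loads(record["message"])
    except json.JSONDecodeError:
        pass  # Keep the raw string when the payload is not JSON.
    return record


# Abbreviated sample in the same shape as the log line shown in the post.
sample = (
    '2021-01-22T03:27:45.253547129Z stdout F '
    '{"level":"info","app_id":"jeff-api","message":"no message"}'
)

record = parse_cri_line(sample)
print(record["stream"], record["logtag"], record["message"]["app_id"])
```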

Once the custom containerd parser is ready, the filtered log {} comes out in JSON format but contains escaped strings, so we need Decoders here.

https://stackoverflow.com/questions/40406928/kubernetes-save-json-logs-to-file-with-escaped-quotes-why
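The escaped-string problem is easy to reproduce outside Fluent Bit. The sketch below is plain Python, not Fluent Bit's decoder: it takes a hypothetical Docker-style line whose `log` field holds JSON as an escaped string, then decodes that field a second time. The helper name `decode_field_as_json` is made up for illustration.

```python
import json

# Hypothetical Docker json-file style line: the "log" value is JSON stored as an escaped string.
docker_line = r'{"log":"{\"level\":\"info\",\"app_id\":\"jeff-api\"}\n","stream":"stdout","time":"2021-01-22T03:27:45.253547129Z"}'


def decode_field_as_json(record, key="log"):
    """Return a copy of record where record[key] is parsed as JSON when possible."""
    decoded = dict(record)
    try:
        decoded[key] = json.loads(record[key])
    except (KeyError, TypeError, json.JSONDecodeError):
        pass  # Leave the field untouched if it is missing or not valid JSON.
    return decoded


outer = json.loads(docker_line)        # first pass: "log" is still a string
decoded = decode_field_as_json(outer)  # second pass: "log" becomes a dict

print(type(outer["log"]).__name__)
print(decoded["log"]["app_id"])
```

After the first pass the payload is still one opaque string, which is why the extra decode step is needed before the fields become searchable.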

The complete config is in the figure below. This time the output destination is Graylog, so the gelf [OUTPUT] is used.

Fluentd

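As I mentioned at the beginning, Fluentd is used for collecting and forwarding data. For high-traffic sites, the official docs recommend a high-availability Fluentd setup to avoid losing data, and they describe three possible delivery semantics:

At most once: each message is delivered at most once, so a failure can cause messages to be lost.
At least once: each message is delivered at least once, so a failure can cause the same message to be delivered twice.
Exactly once: each message is delivered exactly once. This is the ideal state.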
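The difference between these semantics can be seen in a small, deterministic Python simulation. Everything here is hypothetical (the `FlakyReceiver` class, the failure plan, the helper names) and it does not model Fluentd's real protocol. The last step, dropping duplicates by message ID on the receiving side, is only one common way to approximate exactly-once on top of at-least-once.

```python
class FlakyReceiver:
    """Receiver whose network fails on chosen attempts (1-based attempt numbers)."""

    def __init__(self, failures):
        self.failures = failures  # e.g. {1: "drop_request"} or {1: "drop_ack"}
        self.attempts = 0
        self.stored = []

    def deliver(self, msg_id, body):
        """Return True only if the sender receives an acknowledgement."""
        self.attempts += 1
        failure = self.failures.get(self.attempts)
        if failure == "drop_request":
            return False  # message never arrived
        self.stored.append((msg_id, body))
        return failure != "drop_ack"  # message arrived, but the ack may be lost


def send_at_most_once(receiver, messages):
    for msg_id, body in messages:
        receiver.deliver(msg_id, body)  # fire and forget, never retry


def send_at_least_once(receiver, messages, max_attempts=5):
    for msg_id, body in messages:
        for _ in range(max_attempts):
            if receiver.deliver(msg_id, body):
                break  # stop retrying once an ack is seen


def dedupe(stored):
    """Keep the first copy of each message ID, preserving order."""
    return list(dict(stored).items())


messages = [("a", "x"), ("b", "y")]

lossy = FlakyReceiver({1: "drop_request"})
send_at_most_once(lossy, messages)
print(lossy.stored)          # the first message is lost

retrying = FlakyReceiver({1: "drop_ack"})
send_at_least_once(retrying, messages)
print(retrying.stored)       # the first message is stored twice
print(dedupe(retrying.stored))
```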

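At first I wanted to take an official manifest file, modify it, and apply it, but the settings are baked into the image. For example, if I want to use the official Elasticsearch-output DaemonSet YAML, I still have to go into the pod, read the Fluentd config file, and look at include conf.d/*.conf before deciding which path to mount my config under. It is not very flexible.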

fluentd.conf in fluent/fluentd-kubernetes-daemonset:v1-debian-elasticsearch

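In this situation I recommend the bitnami/fluentd Helm chart. In values.yaml, Bitnami clearly splits Fluentd into two roles: log forwarders and log aggregators. If you look closely, the forwarder is a DaemonSet, which means it is responsible for tailing logs on every node (and can be replaced by Fluent Bit), while the aggregator uses a StatefulSet.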

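Because I use Fluent Bit as the log forwarder, I set forwarder.enabled: false.
The rough architecture is shown in the figure below: deploy a Fluent Bit Deployment in the service's namespace, deploy the Fluentd StatefulSet in the logging namespace, and finally output to Elasticsearch.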

Fluent Bit settings

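Suppose Fluent Bit tails /var/log/jeff-api-*.log. If the physical files live on a separately mounted SSD or other volume, remember to add the mount to the Deployment (as shown below)

The ConfigMap outputs to fluentd-aggregator.logging.svc.cluster.local

Fluent Bit uses an internal binary representation while passing data along. In other words, when data reaches [OUTPUT], the plugin may build its own representation in a new memory buffer for processing. If the application's raw output is JSON, remember to use the json parser; otherwise the log content ends up wrapped in an extra log => {……}.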
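The wrapping effect can be illustrated with a toy model in Python. This is only a sketch of the behavior described above, not Fluent Bit's internals: `to_record` and its `parser` argument are hypothetical names, and the assumption is that an unparsed line is stored as a single string under a `log` key.

```python
import json


def to_record(raw_line, parser=None):
    """Toy model of turning one tailed line into a structured record."""
    if parser == "json":
        try:
            parsed = json.loads(raw_line)
            if isinstance(parsed, dict):
                return parsed  # fields become top-level keys
        except json.JSONDecodeError:
            pass
    # No parser (or parsing failed): the whole line is wrapped under "log".
    return {"log": raw_line}


raw = '{"level":"info","app_id":"jeff-api"}'

print(to_record(raw))                 # wrapped: one opaque string under "log"
print(to_record(raw, parser="json"))  # parsed: level and app_id are real fields
```

With the wrapped form, a downstream store such as Elasticsearch only sees one string field, so filtering on level or app_id is not possible without an extra decode step.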