GROK Pattern in Streamsets Log Parser

Streamsets Log Parser allows you to parse and ingest log files from a server.
There are multiple predefined "Log Formats" to choose from, such as Common Log Format or Combined Log Format for Apache access logs.
However, if you have defined your own log format, then "GROK" patterns are a great way to configure the Log Parser to consume it.

The real challenge, however, is working out how to define your GROK pattern.
Test Grok Patterns (https://grokconstructor.appspot.com/do/match) is a great website where you can enter your GROK pattern and a log line and check whether the pattern matches.

It also provides an "Automatic" mode (https://grokconstructor.appspot.com/do/automatic),
which generates a GROK pattern for you based on the log line that you provide.

However, if you are using a customized version of the Apache access log, you can build your pattern from the standard GROK patterns to match your log line.

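For example, the GROK pattern for my access log line is given below. It follows the standard Apache combined log layout, with one extra number (the response time) at the end of the line.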

Log Line
103.107.92.250 - - [21/Apr/2019:17:34:35 +0530] "GET /form/track-shipment/ HTTP/1.1" 200 8324 "http://onlinexpress.co.in/form/track-shipment/" "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36" 400

Grok Pattern

%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" %{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:responseTime}
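If you would rather sanity-check a pattern locally than on the website, the grok syntax can be approximated in Python: each %{NAME:field} reference is replaced by the regex behind NAME and wrapped in a named group. The sketch below does that for the pattern above and matches it against the log line. The definitions in BASE_PATTERNS are my own simplified stand-ins, not the ones StreamSets or the grok libraries ship with, so treat a match here only as a rough check.

```python
import re

# Simplified stand-ins for the standard grok base patterns used below.
# Assumption: the real library definitions are longer and stricter.
BASE_PATTERNS = {
    "IPORHOST": r"(?:\d{1,3}(?:\.\d{1,3}){3}|[\w.-]+)",
    "USER": r"[a-zA-Z0-9._-]+",
    "HTTPDATE": r"\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}",
    "WORD": r"\b\w+\b",
    "NOTSPACE": r"\S+",
    "NUMBER": r"(?:[+-]?(?:\d+(?:\.\d+)?))",
    "DATA": r".*?",
    "QS": r'"(?:[^"\\]|\\.)*"',  # quoted string, quotes included
}

# Matches %{NAME} or %{NAME:field}
GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def grok_to_regex(pattern, definitions):
    """Recursively replace %{NAME:field} references with Python regex groups."""
    def expand(match):
        name, field = match.group(1), match.group(2)
        body = grok_to_regex(definitions[name], definitions)
        # %{NAME:field} becomes a named group; a bare %{NAME} stays non-capturing.
        return f"(?P<{field}>{body})" if field else f"(?:{body})"
    return GROK_REF.sub(expand, pattern)


GROK_PATTERN = (
    r'%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] '
    r'"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})" '
    r'%{NUMBER:response} (?:%{NUMBER:bytes}|-) %{QS:referrer} %{QS:agent} %{NUMBER:responseTime}'
)

LOG_LINE = (
    '103.107.92.250 - - [21/Apr/2019:17:34:35 +0530] '
    '"GET /form/track-shipment/ HTTP/1.1" 200 8324 '
    '"http://onlinexpress.co.in/form/track-shipment/" '
    '"Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 '
    '(KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36" 400'
)

regex = re.compile(grok_to_regex(GROK_PATTERN, BASE_PATTERNS))
match = regex.fullmatch(LOG_LINE)  # the whole line must match, not just a prefix
fields = match.groupdict() if match else {}

if __name__ == "__main__":
    for key, value in fields.items():
        print(f"{key}: {value}")
```

Each captured value ends up under its field name, for example responseTime holds the trailing 400. Note that referrer and agent keep their surrounding quotes, because QS matches the quotes as well.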


Streamsets Log Parser Configuration


In the screenshot, MYPATTERN is the custom name that I gave my pattern in the "GROK Pattern Definition" field.
The first word of a definition is always the pattern name, and that name is what you enter in the "GROK Pattern" field.
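To make the "first word is the name" rule concrete, here is a small Python sketch that treats a hypothetical "GROK Pattern Definition" value as one definition per line, splits each line on its first whitespace into a name and a pattern, and then resolves a reference to MYPATTERN. The shortened MYPATTERN, the simplified NUMBER and WORD stand-ins, and writing the reference as %{MYPATTERN} are assumptions for illustration; check the StreamSets documentation for the exact syntax the stage expects.

```python
import re

# Simplified stand-ins for built-in grok patterns (assumption: the real
# ones available to the Log Parser are stricter).
BUILTIN = {
    "NUMBER": r"[+-]?\d+(?:\.\d+)?",
    "WORD": r"\b\w+\b",
}

# Hypothetical "GROK Pattern Definition" value: pattern name first, then the
# pattern. MYPATTERN is shortened here to keep the example small.
DEFINITION_FIELD = """
MYPATTERN %{WORD:verb} %{NUMBER:response} %{NUMBER:responseTime}
"""

# Hypothetical "GROK Pattern" value: a reference to the custom name.
GROK_PATTERN_FIELD = "%{MYPATTERN}"

GROK_REF = re.compile(r"%\{(\w+)(?::(\w+))?\}")


def parse_definitions(text):
    """Split each non-empty line into (name, pattern) on the first whitespace."""
    definitions = {}
    for line in text.strip().splitlines():
        name, pattern = line.split(None, 1)  # first word is the pattern name
        definitions[name] = pattern
    return definitions


def expand(pattern, definitions):
    """Recursively replace %{NAME} / %{NAME:field} with regex groups."""
    def repl(m):
        body = expand(definitions[m.group(1)], definitions)
        return f"(?P<{m.group(2)}>{body})" if m.group(2) else f"(?:{body})"
    return GROK_REF.sub(repl, pattern)


definitions = {**BUILTIN, **parse_definitions(DEFINITION_FIELD)}
regex = re.compile(expand(GROK_PATTERN_FIELD, definitions))
match = regex.fullmatch("GET 200 400")

if __name__ == "__main__":
    print(match.groupdict() if match else "no match")
```

Keeping the long pattern under a short name in the definition field keeps the "GROK Pattern" field itself readable, and lets one definition build on another.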
