Syslog header - custom parsing

Has anyone here done custom parsing of the syslog header? Could you advise me on how to do it?

One source is sending me syslog messages with a malformed header (the timestamp and hostname are in swapped order), so I would like to parse the header myself, but I don't know how to achieve this.

Thank you for your ideas.

Regards,

Jan Sevela

  • 0  

    You'd have to edit current/user/agent/agent.properties and write a regex for the syslog header that matches the events you're receiving. agent.defaults.properties contains the definitions of the syslog header patterns, which you can look at and adapt to what you need:

    # Regular Expressions used by syslog parser during the phase of preprocessing

    # syslog.header.timestamp.ip looks for and parses out the following fields
    # if they are present:
    # timestamp (MMM dd HH:mm:ss)
    # timestamp with year (described in Network Working Group RFC 5424)
    # solaris style ip ([1.1.1.1.1.1])
    # device ip address (v4 or v6)
    # These fields are removed if present, and the final capture
    # group (the remainder of the message) is left for further parsing
    #syslog.header.timestamp.ip=(?s)^\\s*((?:[A-Z][a-z]{2}\\s+\\d+\\s+\\d{1,2}:\\d{2}:\\d{2}))?(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{1,6})?(?:(?:Z)|(?:(?:\\-|\\+)\\d{2}:\\d{2})))?\\s+(?:\\[(\\d+\\.\\d+\\.\\d+\\.\\d+)\\.\\d+\\.\\d+\\]\\s+)?(?:(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s+)?(?:([a-fA-F0-9:\\.]+\\:[a-fA-F0-9:\\.]+)\\s+)?(.*)
    syslog.header.timestamp.ip=(?s)^\\s*((?:[A-Z][a-z]{2}\\s+\\d+\\s+\\d{1,2}:\\d{2}:\\d{2}))?(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{1,6})?(?:(?:Z)|(?:(?:\\-|\\+)\\d{2}:\\d{2})))?\\s+(?:\\[(\\d+\\.\\d+\\.\\d+\\.\\d+)\\.\\d+\\.\\d+\\]\\s+)?(?:(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s+)?(?:([a-fA-F0-9:\\.]+\\:[a-fA-F0-9:\\.]+)\\s+)?(.*)
    # syslog.header.hostname looks for the hostname pattern at the beginning of the message
    # only if a solaris style or regular ip has not already been found. The second capture
    # group contains the rest of the message which is then sent to the various parsers
    # Note: The hostname pattern specifically excludes :,[,],= in addition to space because
    # certain devices send a timestamp followed by the message with no hostname (the code
    # expects to see an IP or hostname if the timestamp is present). It so happens that the
    # first word of the message contains some of these chars, that helps us distinguish it
    # from a genuine hostname. See bugs 50489 & 23466.
    # The restriction to have at least one character as alphabet comes from the catos logs
    # that starts with a year, like "2010"
    # (VNA: used to be that the first character needed to be alphabet,
    # but now it can be placed anywhere in the hostname, as digits are allowed at the start now
    # and would let the regex match with IP addresses. That was somehow messing with the parsing
    # despite the statement above that this regex is only used when IPs didn't match...)
    syslog.header.hostname=(?s)^(?:([^ :\\[\\]=]*[0-9a-zA-Z][^ :\\[\\]=]+)\\s+)?(.*)
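
To see which tokens the default pattern captures, it can help to run it outside the connector. The following is a minimal sketch in Python, assuming Python's `re` module behaves closely enough to Java's regex engine for this pattern. The sample log lines and the helper name `header_groups` are made up for illustration. The group order in the comment is read off the regex and the comments quoted above; which event fields the connector assigns them to is not stated in the file.

```python
import re

# Value as it appears in agent.defaults.properties. Backslashes are doubled
# there because of .properties escaping, so they are kept doubled here.
PROPERTIES_VALUE = (
    r"(?s)^\\s*((?:[A-Z][a-z]{2}\\s+\\d+\\s+\\d{1,2}:\\d{2}:\\d{2}))?"
    r"(\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}(?:\\.\\d{1,6})?"
    r"(?:(?:Z)|(?:(?:\\-|\\+)\\d{2}:\\d{2})))?\\s+"
    r"(?:\\[(\\d+\\.\\d+\\.\\d+\\.\\d+)\\.\\d+\\.\\d+\\]\\s+)?"
    r"(?:(\\d+\\.\\d+\\.\\d+\\.\\d+)\\s+)?"
    r"(?:([a-fA-F0-9:\\.]+\\:[a-fA-F0-9:\\.]+)\\s+)?(.*)"
)

# Undo the .properties escaping: every doubled backslash becomes a single one.
HEADER = re.compile(PROPERTIES_VALUE.replace("\\\\", "\\"))


def header_groups(line):
    """Return the six capture groups, or None if the pattern does not match.

    Group order as read from the regex:
    (timestamp, RFC 5424 timestamp, Solaris-style IP, IPv4, IPv6, remainder)
    """
    match = HEADER.match(line)
    return match.groups() if match else None


# Hypothetical sample lines, not taken from a real device.
normal = "Oct 11 22:14:15 192.168.1.10 sshd[123]: Accepted password"
swapped = "myhost Oct 11 22:14:15 sshd[123]: Accepted password"

print(header_groups(normal))
print(header_groups(swapped))
```

With these samples, the first line yields the timestamp, the IPv4 address and the remainder, while the line that starts with a hostname does not match the pattern at all under Python's `re`, because the pattern requires whitespace right after the optional timestamp. How the connector itself reacts to a non-matching header is not covered by this sketch.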

  • 0 in reply to   

    Hello,

    Thanks for the reply. The problem is that you can't tell which tokens the regex parses or which fields they are assigned to, and you can't change their positions.

    An empty regex seems to work, so I'll parse the header in the flexagent instead.

    "syslog.header.timestamp.ip="

    Regards,

    Jan Sevela
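
For the approach of parsing the swapped header yourself, the regex could look roughly like the sketch below. It is written in Python only to make the pattern testable. The actual flexagent configuration syntax is not shown here, and the header layout (hostname, then a `MMM dd HH:mm:ss` timestamp, then the message), the sample line and the name `parse_swapped_header` are assumptions for illustration. The hostname character class borrows the exclusions (`:`, `[`, `]`, `=`, whitespace) from the `syslog.header.hostname` pattern quoted above.

```python
import re

# Assumed layout: hostname first, then a BSD-style timestamp, then the message.
SWAPPED_HEADER = re.compile(
    r"(?s)^\s*(?P<hostname>[^\s:\[\]=]+)\s+"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{1,2}:\d{2}:\d{2})\s+"
    r"(?P<message>.*)"
)


def parse_swapped_header(line):
    """Split a swapped-order header into named parts, or return None."""
    match = SWAPPED_HEADER.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line, not taken from a real device.
print(parse_swapped_header("myhost Oct 11 22:14:15 sshd[123]: Accepted password"))
```

Named groups make it explicit which token ends up where, which is the part that the built-in header pattern does not expose.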