SIEM @Home

In my previous article about Traffic Analysis in Qubes OS, I mentioned that IDS alerts and logs should be passed to a log management system, where we can correlate them with other logs and alerts. That kind of system is called a SIEM (Security Information and Event Management). However, a real SIEM only makes sense in an enterprise environment, because it requires 7x24 monitoring and special knowledge and experience to analyze the results. Meanwhile, we have more and more IT equipment at home, and we have no clue what it really does. I admit that most users don't care about this at all. The intended audience of this article is the remaining minority of home users who would like to know something about the behavior of their Smart TV, IP camera, NAS, gaming console, or smart home server. Of course, we don't need 7x24 monitoring at home, and we don't want a whole rack of IT equipment in the wardrobe. But we can use what we already have, like a NAS. That's why the SIEM software we use has to adapt to the very limited resources of these home devices. This rules out the existing SIEM solutions; we need something far less resource-hungry. Let's see what we really need:
  • NAS - as the central log collector.

A NAS is very useful at home anyway. For now, I simply assume that you have one with free space for the logs, and that you are able to install additional software components on it if needed.

  • Router - as the main log source.

I'm pretty sure you have one, although in most cases it is owned by your ISP. Again, I assume you can configure it as you like and that it is capable of remote logging.

Log Collecting

The very first step is to have something that can collect the logs from all your other devices. As I mentioned earlier, the most obvious choice is your NAS, especially if you already have one. Of course, it needs appropriate software. One of the best tools for this task is syslog-ng. It should already be installed on Synology products, so you only need to configure it properly. I will use the plain, vendor-independent format of the configuration file (/etc/syslog-ng/syslog-ng.conf), and I hope vendors don't hide all the good features of this software behind their fancy GUIs. The following section is responsible for receiving the local logs:
source s_local {
 system();
 internal();
};
This one handles the remote log sources:
source s_network {
 syslog(transport(udp));
 syslog(transport(tcp));
};
Of course these are just the "must have" parts, you probably need other things in your full config file. See the official product manuals for a complete guide.
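Before pointing the router at the NAS, it can help to check that the s_network source really accepts remote messages. The Python sketch below builds an RFC 5424 formatted message (the format the syslog() driver is documented to parse) and sends it as a single UDP datagram. The collector address 192.168.1.10:514 and the function names are hypothetical; use your NAS's address and the port your source actually listens on.

```python
import socket
from datetime import datetime, timezone

# Hypothetical collector address: replace with your NAS and its syslog() UDP port.
COLLECTOR = ("192.168.1.10", 514)


def build_rfc5424(facility, severity, hostname, app, msg, ts):
    """Return an RFC 5424 message: <PRI>1 TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG."""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    pri = facility * 8 + severity  # e.g. local0 (16) and informational (6) give 134
    stamp = ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # "-" is the NILVALUE, used here for PROCID, MSGID and STRUCTURED-DATA
    return f"<{pri}>1 {stamp} {hostname} {app} - - - {msg}".encode("utf-8")


def send_udp(payload, addr=COLLECTOR):
    """Send the message as one UDP datagram; UDP gives no delivery confirmation."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, addr)
```

For example, send_udp(build_rfc5424(16, 6, "laptop", "siem-test", "hello", datetime.now(timezone.utc))) should end up as a row with the program siem-test in the messages table, once the log paths at the end of the config are in place.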

Log Processing

Log processing takes place after the logs are received, but before they are stored. The main goal here is to parse the logs into a format we can use later without much effort or resources. This is the tricky part, because the message part of the syslog format is not structured, and its content depends on the application that created the log. I will walk you through specific examples to show how it can be done. The first step is to separate the packet filter and the addrwatch logs from the rest, using simple filters:
filter f_iptables { match("IN=" value("MESSAGE")) and match("OUT=" value("MESSAGE")); };
filter f_addrwatch { program("addrwatch"); };
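Both filters are regular-expression matches: match() searches the MESSAGE value, and program() searches the PROGRAM field, so program("addrwatch") also matches a program name that merely contains "addrwatch". A small Python sketch of the same decisions can be handy for testing sample lines before editing the config; re.search is only an approximation of syslog-ng's own regex matching, and the function names are made up for illustration.

```python
import re


def f_iptables(message):
    """Approximates: match("IN=" value("MESSAGE")) and match("OUT=" value("MESSAGE"))."""
    return re.search("IN=", message) is not None and re.search("OUT=", message) is not None


def f_addrwatch(program):
    """Approximates: program("addrwatch"), a regex search, not an exact comparison."""
    return re.search("addrwatch", program) is not None
```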
Then we parse these messages into their specific fields:
parser p_iptables {
 kv-parser (prefix("fw."));
};

parser p_arp {
 csv-parser(
 prefix("arp.")
 columns("facility", "timestamp", "interface", "vlan", "MAC", "IP", "type")
 delimiters(" ")
 );
};
The first parser uses kv-parser, which extracts the key=value pairs of the iptables log on its own, so we only add a custom prefix. The second one uses csv-parser, where we define the columns ourselves, according to the addrwatch log format.
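To see what the two parsers produce, here is a rough Python approximation of them. It is a sketch, not syslog-ng's implementation: kv_parse only handles unquoted key=value tokens separated by whitespace, and csv_parse simply drops any surplus values. The function names and sample lines are illustrative, not captured from a real router or from addrwatch.

```python
def kv_parse(message, prefix="fw."):
    """Rough stand-in for kv-parser(prefix("fw.")): whitespace-separated key=value pairs."""
    fields = {}
    for token in message.split():
        key, sep, value = token.partition("=")
        if sep and key:  # tokens without "=" (e.g. the DF or SYN flags) are skipped
            fields[prefix + key] = value  # "OUT=" yields an empty string
    return fields


def csv_parse(message, columns, prefix="arp.", delimiter=" "):
    """Rough stand-in for csv-parser(prefix("arp.") columns(...) delimiters(" "))."""
    values = message.split(delimiter)
    # map values to column names in order; this sketch drops surplus values
    return {prefix + col: val for col, val in zip(columns, values)}
```

For instance, kv_parse("A=DROP C=inet IN=eth1 OUT= SRC=203.0.113.5 DPT=22 SYN") yields fw.A = DROP, fw.OUT as an empty string and fw.DPT = 22, while the SYN flag is skipped. These prefixed names are exactly what the SQL destinations below refer to.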

Log Storage

After the custom parsing, we can finally store the logs. syslog-ng offers many storage options. I chose the MySQL backend because:
  • I need a structured format to make searches fast and resource-friendly,
  • I already use it on the NAS, so it was pre-installed,
  • It can be tuned to the limited performance of a NAS.
The next part configures the MySQL storage:
destination mysql_messages {
 sql(type(mysql)
 host("localhost") username("syslog") password("*****")
 database("syslog")
 table("messages")
 columns("timestamp", "host", "program", "pid", "severity", "message")
 values("${YEAR}-${MONTH}-${DAY} ${HOUR}:${MIN}:${SEC}", "${HOST}", "${PROGRAM}", "${PID}", "${LEVEL_NUM}","${MESSAGE}"));
};

destination mysql_packetfilter {
 sql(type(mysql)
 host("localhost") username("syslog") password("*****")
 database("syslog")
 table("packetfilter")
 columns("timestamp", "host", "IN_Interface", "MAC", "SRC_IP", "SRC_Port", "OUT_Interface", "DST_IP", "DST_Port", "Protocol", "Type", "Action", "Chain")
 values("${YEAR}-${MONTH}-${DAY} ${HOUR}:${MIN}:${SEC}", "${HOST}", "${fw.IN}", "${fw.MAC}", "${fw.SRC}", "${fw.SPT}", "${fw.OUT}", "${fw.DST}", "${fw.DPT}", "${fw.PROTO}", "${fw.TYPE}", "${fw.A}", "${fw.C}"));
};

destination mysql_addrwatch {
 sql(type(mysql)
 host("localhost") username("syslog") password("*****")
 database("syslog")
 table("addrwatch")
 columns("timestamp", "vlan", "MAC", "IP")
 values("${YEAR}-${MONTH}-${DAY} ${HOUR}:${MIN}:${SEC}", "${arp.vlan}", "${arp.MAC}", "${arp.IP}"));
};
As you can see, I use three separate tables: one for iptables, one for addrwatch, and one for the rest of the logs. I hope these settings are self-explanatory, and the earlier parser settings now make sense too: the fw.* and arp.* names come from the parser prefixes, so they are easy to tell apart from syslog-ng's built-in macros such as ${HOST} or ${PROGRAM}. Of course, this requires some SQL knowledge, which I don't want to cover here. syslog-ng can create these tables on the fly, but it is wise to create them in advance, so that you can choose appropriate field types. The last section puts the previous ones together, and probably makes the whole solution clearer:
log { source(s_network); filter(f_iptables); parser(p_iptables); destination(mysql_packetfilter);};
log { source(s_network); filter(f_addrwatch); parser(p_arp); destination(mysql_addrwatch);};
log { source(s_local); source(s_network); destination(mysql_messages);};
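To illustrate why preparing the tables yourself pays off, here is a sketch of the packetfilter table with explicit types, plus an insert that follows the columns() and values() order of the mysql_packetfilter destination. It uses Python's built-in sqlite3 as a stand-in for MySQL so it runs anywhere; in MySQL you would pick types such as DATETIME, VARCHAR(n) and SMALLINT UNSIGNED instead. Empty values (for example the source port of an ICMP packet) are turned into NULL here; the sql() destination has a null() option intended for the same purpose, but check the documentation of your syslog-ng version. The helper names are hypothetical.

```python
import sqlite3

# Typed version of the mysql_packetfilter table (SQLite stand-in for MySQL).
SCHEMA = """
CREATE TABLE packetfilter (
    "timestamp"   TEXT NOT NULL,  -- "YYYY-MM-DD HH:MM:SS" from the values() template
    host          TEXT,
    IN_Interface  TEXT,
    MAC           TEXT,
    SRC_IP        TEXT,
    SRC_Port      INTEGER,
    OUT_Interface TEXT,
    DST_IP        TEXT,
    DST_Port      INTEGER,
    Protocol      TEXT,
    Type          TEXT,
    "Action"      TEXT,
    Chain         TEXT
);
CREATE INDEX idx_pf_time ON packetfilter("timestamp");
"""

COLUMNS = ["timestamp", "host", "IN_Interface", "MAC", "SRC_IP", "SRC_Port",
           "OUT_Interface", "DST_IP", "DST_Port", "Protocol", "Type", "Action", "Chain"]
# fw.* keys in the same order as the values() list after timestamp and host
FW_KEYS = ["IN", "MAC", "SRC", "SPT", "OUT", "DST", "DPT", "PROTO", "TYPE", "A", "C"]


def open_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store(conn, ts, host, fw):
    """Insert one parsed firewall event; missing or empty fw.* values become NULL."""
    row = [ts, host] + [fw.get("fw." + k) or None for k in FW_KEYS]
    cols = ", ".join(f'"{c}"' for c in COLUMNS)
    marks = ", ".join("?" for _ in COLUMNS)
    conn.execute(f"INSERT INTO packetfilter ({cols}) VALUES ({marks})", row)
```

Because the port columns are typed, the string "22" coming from the parser is stored as the number 22, which keeps later range queries and sorting cheap.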

Log Sources

It is obvious that without logs there is no point in talking about SIEM, and the more logs we collect, the more information we have. In this example config, I used the following log sources:

NAS

Most vendors build their NAS solutions on Linux, so the applications running on them are, at the very least, syslog-aware. Hopefully, you will find all the application logs in the database.

Router

Most routers out there run Linux as well, so they are also able to send logs. But again, what you can or cannot do on them depends on your router vendor. It's best if you use a custom firmware like OpenWRT, LEDE, or DD-WRT. With these, you are not limited by any vendor, and you can certainly send the logs to your log collector. Moreover, you can install more apps and/or customize the default ones to match your needs:

packetfilter

In most cases, this means iptables. However, the default packet filter rules rarely log anything. To make them produce meaningful logs, I used custom log targets like:

 -j LOG --log-prefix "A=DROP C=inet "

Without these prefixes, the "Action" and "Chain" fields will certainly stay empty.
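The LOG target does not stop rule processing, so in practice a LOG rule is placed right before the rule that takes the actual action. The Python sketch below only builds the two iptables argument lists for such a pair and executes nothing; the function name and the way the prefix is assembled are my own illustration. It enforces the 29-character limit iptables puts on --log-prefix, which matters because the prefix has to carry both the A= and C= keys.

```python
# iptables documents --log-prefix as at most 29 characters long.
MAX_PREFIX = 29


def log_then_act(chain, action, tag, match):
    """Build (not run) a LOG rule and the verdict rule that follows it.

    LOG is non-terminating: a logged packet continues to the next rule,
    which applies the actual action, e.g. DROP. `tag` fills the C= key.
    """
    # The trailing space keeps "C=..." separate from the kernel's "IN=..." text.
    prefix = f"A={action} C={tag} "
    if len(prefix) > MAX_PREFIX:
        raise ValueError(f"log prefix longer than {MAX_PREFIX} chars: {prefix!r}")
    log_rule = ["iptables", "-A", chain, *match, "-j", "LOG", "--log-prefix", prefix]
    verdict = ["iptables", "-A", chain, *match, "-j", action]
    return [log_rule, verdict]
```

Since most routers don't run Python, the practical use is to print each list with shlex.join(rule) and paste the result into the router's firewall script.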

addrwatch

This is a very useful application that continuously watches your network and logs every device it detects. It lays the groundwork for device management at home. I'm pretty sure you will be surprised how many devices you already have on your network!
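Once rows accumulate in the addrwatch table, a simple device inventory is just a grouping by MAC address. A Python sketch, assuming the rows are fetched as (timestamp, vlan, MAC, IP) tuples in the column order of the mysql_addrwatch destination, and that timestamps use the "YYYY-MM-DD HH:MM:SS" format written by the values() template, so plain string comparison orders them correctly. The function name is hypothetical.

```python
def inventory(rows):
    """Group addrwatch rows (timestamp, vlan, MAC, IP) into one entry per device.

    Timestamps are "YYYY-MM-DD HH:MM:SS" strings, so min()/max() order them correctly.
    """
    devices = {}
    for ts, vlan, mac, ip in rows:
        mac = mac.lower()  # normalise so the same device is not counted twice
        dev = devices.setdefault(
            mac, {"first_seen": ts, "last_seen": ts, "ips": set(), "vlans": set()}
        )
        dev["first_seen"] = min(dev["first_seen"], ts)
        dev["last_seen"] = max(dev["last_seen"], ts)
        dev["ips"].add(ip)
        dev["vlans"].add(vlan)
    return devices
```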

Of course, you can add more applications and devices to the log parsing, which makes log analysis much easier. Thanks to the correlation features of syslog-ng, you may be able to correlate these logs to get more precise results.
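A simple kind of correlation can also be done after storage. This is plain Python over the stored rows, not syslog-ng's own correlation engine: for each packetfilter event, look up the MAC address that addrwatch most recently saw with the event's source IP, at or before the event time. This helps when DHCP hands the same IP to different devices over time. The dictionary keys follow the packetfilter column names; the function names are hypothetical.

```python
import bisect
from collections import defaultdict


def build_ip_index(arp_rows):
    """Map IP -> (timestamps, MACs), both sorted by time, from (ts, vlan, MAC, IP) rows."""
    index = defaultdict(lambda: ([], []))
    for ts, _vlan, mac, ip in sorted(arp_rows):
        times, macs = index[ip]
        times.append(ts)
        macs.append(mac)
    return index


def correlate(fw_events, arp_rows):
    """Return copies of the events with SRC_MAC set to the latest earlier ARP sighting."""
    index = build_ip_index(arp_rows)
    out = []
    for ev in fw_events:
        times, macs = index.get(ev["SRC_IP"], ([], []))  # .get() adds no new keys
        i = bisect.bisect_right(times, ev["timestamp"])
        mac = macs[i - 1] if i else None  # None: IP never seen before this event
        out.append({**ev, "SRC_MAC": mac})
    return out
```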


As I work with enterprise-grade SIEM systems on a daily basis, I know that this can't be called a real SIEM. However, I believe the most important part is good parsing of the collected logs. Then you "only" need some queries to generate nice diagrams from your data :) Since we are still limited by the available resources, I can't use the already published solutions, because they need much more resources than we have at home.
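As an example of such a query, the sketch below counts dropped packets per hour and destination port, a typical input for a bar chart. It runs against SQLite so it is self-contained; the table and column names follow the packetfilter table above, and in MySQL the hour bucket would be something like DATE_FORMAT(timestamp, '%Y-%m-%d %H:00') instead of substr(). The query and function names are hypothetical.

```python
import sqlite3

# Hour bucket = first 13 characters of "YYYY-MM-DD HH:MM:SS", i.e. "YYYY-MM-DD HH".
HOURLY_DROPS = """
SELECT substr("timestamp", 1, 13) AS hour, DST_Port, COUNT(*) AS hits
FROM packetfilter
WHERE "Action" = 'DROP'
GROUP BY hour, DST_Port
ORDER BY hour, hits DESC
"""


def hourly_drops(conn):
    """Return (hour, DST_Port, hits) rows, ready to feed into a charting tool."""
    return conn.execute(HOURLY_DROPS).fetchall()
```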