Tools and scripts

Extra scripts and tools can be found in the tools subfolder of the repository. They usually require extra dependencies or cater to special use cases and are thus not integrated into the main CLI tool. The tools usually lack a sophisticated user interface, and error handling is kept to a minimum.

timeseries2csv.py

Note

In earlier releases, the tool was called histogram2csv.py due to a naming error. The tool does not handle histogram data but time series data.

This tool extracts time series data from the device. It supports the same resolutions as the official app and outputs CSV data. To operate, it requires click and pytz to be installed, as well as the rctclient module.

It has one required parameter, DAY_BEFORE_TODAY, which shifts the latest point to query to the last minute of the day that was DAY_BEFORE_TODAY days in the past. This is most useful for the highest-resolution "minutes" sampling rate: setting it to 1 queries the entire previous day, suitable for example for exporting yesterday's data from a cronjob during the night. For the other resolutions, it should be set to 0 for most use cases to avoid unexpected results such as shifting to the previous month.
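
For illustration only (this is not the tool's code), the end of the query window described above could be computed like this, assuming the device time zone passed via --time-zone:

from datetime import date, datetime, time, timedelta
import pytz

def query_end(day_before_today: int, tz_name: str = 'Europe/Berlin') -> datetime:
    # last second of the day that was `day_before_today` days in the past,
    # localized to the device's time zone
    target_day = date.today() - timedelta(days=day_before_today)
    return pytz.timezone(tz_name).localize(datetime.combine(target_day, time(23, 59, 59)))

print(query_end(1))  # end of yesterday; query_end(0) covers the incomplete current day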

Usage: timeseries2csv.py [OPTIONS] DAY_BEFORE_TODAY

  Extract time series data from an RCT device. The tool works similarly to
  the official app, but can be run independently; it is designed to be run
  from a cronjob or as part of a script.

  The output format is CSV.  If --output is not given, then a name is
  constructed from the resolution and the current date.  Specify "-" to have
  the tool print the table to standard output, for use with other tools.

  Use --header-format to select the format of the first lines. If "none", no
  headers will be included. With "simple", a single line will be included
  naming all the columns. This is the preferred format for "csv2influx.py"
  as well as for importing into spreadsheet applications. Specify "influx2"
  to have a set of headers added that are meant for use with InfluxDB 2.x
  "influx write" command. See the documentation for details.

  Data is queried into the past, by specifying the latest point in time for
  which data should be queried.  Thus, DAY_BEFORE_TODAY selects the last
  second of the day that is the given amount in the past.  0 therefore is the
  incomplete current day, 1 is the end of yesterday, etc.

  The device has multiple sampling memories at varying sampling intervals.
  The resolution can be selected using --resolution, which supports
  "minutes" (which is at 5 minute intervals), day, month and year.  The
  amount of time to cover (back from the end of DAY_BEFORE_TODAY) can be
  selected using --count:

  * For --resolution=minutes, if DAY_BEFORE_TODAY is 0 it selects the last
  --count hours up to the current time.

  * For --resolution=minutes, if DAY_BEFORE_TODAY is greater than 0, it
  selects --count days back.

  * For all the other resolutions, --count selects the amount of days,
  months and years to go back, respectively.

  Note that the tool does not remove extra information: If the device sends
  more data than was requested, that extra data is included.

  Examples:

  * The previous 3 hours at finest resolution: --resolution=minutes
  --count=3 0

  * A whole day, 3 days ago, at finest resolution: --resolution=minutes
  --count=24 3

  * 4 Months back, at 1 month resolution: --resolution=month --count=4 0

Options:
  -h, --host TEXT                 Host to query  [required]
  -p, --port INTEGER              Port on the host to query [8899]
  -o, --output FILE               Output file (use "-" for standard output),
                                  omit for "data_<resolution>_<date>.csv"

  -H, --header-format [simple|influx2|none]
                                  Header format [simple]
  --time-zone TEXT                Timezone of the device (not the host running
                                  the script) [Europe/Berlin].

  -q, --quiet                     Suppress output.
  -r, --resolution [minutes|day|month|year]
                                  Resolution to query [minutes].
  -c, --count INTEGER             Amount of time to go back, depends on
                                  --resolution, see --help.

  --help                          Show this message and exit.

The amount of data to query can be given using the --count option; it defines how much "time" to go back. The actual amount depends on the --resolution:

  • For "minutes", it operates on one-hour intervals, so a count of 5 goes back 5 hours.

  • "day", "month" and "year" go back in day, month and year intervals, respectively.

The output file name is either constructed from the resolution and the date of the latest (that is, highest) timestamp using the schema data_<resolution>_<date>.csv, or it is whatever is specified via the --output option. If - is specified, the tool writes to standard output, suitable for piping into other programs.

Note

The time zone is assumed to be Europe/Berlin; it can be overridden using the --time-zone parameter.

The script prints all log/error information to standard error, so that the tool's output can still be read from standard output when it is instructed to write there.

Output file

The --output parameter can be omitted, which causes the tool to write to a file using the pattern data_<resolution>_<date>.csv, where <date> is the ISO-formatted date and time of the day of the highest (most recent) timestamp in the output data. So, when called on 2020-11-08 with DAY_BEFORE_TODAY set to 1 and day resolution, the file will be named data_day_2020-11-07T00:00:00.csv.
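
Expressed as a short sketch of the pattern (not the tool's actual code):

from datetime import datetime

resolution = 'day'                        # value of --resolution
latest = datetime(2020, 11, 7, 0, 0, 0)   # day of the highest timestamp in the data
print(f'data_{resolution}_{latest.isoformat()}.csv')  # data_day_2020-11-07T00:00:00.csv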

If - (a dash) is passed, the CSV table will be written to standard output for use by another tool via a pipe.

Finally, if a filename is passed, this file will be used.

Files are written atomically, to prevent incomplete files from being present while the tool works.

Specifying --header-format=none causes the headers to be omitted from the output, while the default --header-format=simple includes a standard header line suitable for csv2influxdb.py or generic spreadsheet applications. Use --header-format=influx2 to directly import the data into InfluxDB 2.x using the influx write command.

Handling of incomplete data

The script will try to get a complete dataset, but because the device returns a random amount of data (it takes an average of seven queries to receive one complete day for a single metric), it can only jump over holes no longer than a few hours; for longer holes it will request the same portion over and over again.

Holes in the device's data can occur:

  • If the battery ran empty (power_mng.soc reached power_mng.soc_min or power_mng.soc_min_island) during the night (during the day, the device powers itself from the strings).

  • If the time of the device was changed forward by more than a few hours.

  • If the device was switched off for some hours.

If the device sends invalid data (incomplete dataset with valid CRC or data with invalid CRC), the query is retried until valid data is received. Likewise, if the device sends frames that are not of interest (as may occur when another client such as the app communicates with it at the same time), the OID of that frame is logged and ignored.

Importing into InfluxDB 2.x

InfluxDB 2.x includes built-in support for reading CSV files. For earlier versions, see the tool csv2influxdb.py below.

There are two ways to import the data:

  • Let timeseries2csv.py specify the most important header information by calling it with --header-format=influx2

  • Use any header format and override the headers on the command line.

If the former option is chosen, the first line of the CSV file specifies the measurement name based on the --resolution. The second line tells influx write how to interpret the rows: the first column is a dateTime, all the others are fields.

The latter option requires specifying the headers on the command line. It also allows overriding headers that exist in the file, e.g. the measurement specification.

Connection options have been omitted from the commands, as there are a number of ways to specify them.

In order to import a CSV file with --header-format=none:

influx write -b <bucketname> -o <orgname> -f <input file.csv> \
    --header "#constant measurement,minutes" \
    --header "#datatype dateTime,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field,field"

If importing a CSV file with --header-format=simple (default), then add --skipHeader 1 to the command above.

To overwrite the measurement name in --header-format=influx2, skip the first line and specify the #constant header like so:

influx write -b <bucketname> -o <orgname> -f <input file.csv> \
    --skipHeader 1 --header "#constant measurement,inverter_1_minutes"

csv2influxdb.py

This tool takes the output CSV of the aforementioned tool timeseries2csv.py and sends it to an InfluxDB database. The tool trusts both the timestamps and the header lines and does not validate the data in any way. If a column is missing, it will be missing from the InfluxDB measurement; if rows are missing, they will be missing as well.

Note

The tool was written with InfluxDB v1.x in mind. InfluxDB v2.x supports reading CSV natively using Flux or via the influx write command. See Write CSV data to InfluxDB.

Usage: csv2influxdb.py [OPTIONS]

  Reads a CSV file produced by `timeseries2csv.py` (requires headers) and
  pushes it to an InfluxDB v1.x database. This tool is intended to get you
  started and not a complete solution. It blindly trusts the timestamps and
  headers in the file. InfluxDB v2.x supports reading CSV natively using
  Flux and via the `influx write` command.

  The `--resolution` flag defines the name of the table/measurement into
  which the results are written. The schema is `history_${resolution}`.

Options:
  -i, --input FILE                Input CSV file (with headers). Supply "-" to
                                  read from standard input  [required]

  -n, --device-name TEXT          Name of the device [rct1]
  -h, --influx-host TEXT          InfluxDB hostname [localhost]
  -p, --influx-port INTEGER       InfluxDB port [8086]
  -d, --influx-db TEXT            InfluxDB database name [rct]
  -u, --influx-user TEXT          InfluxDB user name [rct]
  -P, --influx-pass TEXT          InfluxDB password [rct]
  -r, --resolution [minutes|day|month|year]
                                  Resolution of the input data
  --help                          Show this message and exit.

Influx

The script assumes that the database in the InfluxDB instance already exists. It will write to a measurement called history_<resolution>. The --device-name is used as the value for the rct tag, and the fields are all floats. The field names are read from the first (header) line of the CSV; in a CSV produced by timeseries2csv.py, they are the middle portion of the OID name, i.e. logger.minutes_<name>_log_ts becomes <name>. Thus, logger.minutes_ea_log_ts can be found in the ea field.
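
The naming rule can be illustrated with a small, hypothetical helper (for the "minutes" resolution):

def field_name(oid_name: str) -> str:
    # strip the 'logger.minutes_' prefix and the '_log_ts' suffix,
    # e.g. 'logger.minutes_ea_log_ts' -> 'ea'
    return oid_name[len('logger.minutes_'):-len('_log_ts')]

print(field_name('logger.minutes_ea_log_ts'))  # ea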

Input

Input can be read from a file, or from standard input when called with the filename -. This allows data to be piped from another program, such as timeseries2csv.py, without hitting the disk.
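
The following sketch shows the rough idea of what the tool does with the CSV, using the InfluxDB v1.x Python client. The time column name "timestamp" and the input file name are assumptions made for the example, not guaranteed details of the actual CSV:

import csv
from influxdb import InfluxDBClient  # InfluxDB v1.x client library

client = InfluxDBClient(host='localhost', port=8086, username='rct',
                        password='rct', database='rct')
points = []
with open('data_minutes_2020-11-07T00:00:00.csv') as fh:
    for row in csv.DictReader(fh):
        ts = row.pop('timestamp')  # assumed name of the time column
        points.append({
            'measurement': 'history_minutes',    # history_<resolution>
            'tags': {'rct': 'rct1'},              # --device-name
            'time': ts,
            'fields': {name: float(value) for name, value in row.items() if value},
        })
client.write_points(points)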

read_pcap.py

This tool requires scapy to be installed. It reads a pcap file and displays the requests and responses to or from the device. This is most useful for debugging rctclient, as it allows one to take a look at the requests that the official smartphone app performs. The tool assumes that all traffic in the capture file is protocol traffic.

Warning

This is a tool intended for debugging; knowledge of both Python and binary data representation is required.

The tool does some tricks to try to work around communication errors that appear when requests from multiple devices have to be processed, which commonly happens when the app is used on two phones at the same time or the device is communicating with the vendor. Further, it removes frames whose content is either AT+\r or 0x2b3ce1. The former is used by the vendor's server at the beginning of each communication session (or as a keep-alive), the latter is used by the app, which refers to the sequence as "switching to COM protocol". Despite the two protocols mentioned, both sides speak the same protocol after these initial bytes, so the tool simply slices them off.
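
A minimal sketch of that slicing (purely illustrative, not the tool's code):

# known session prefixes: 'AT+\r' from the vendor's server, 0x2b3ce1 from the app
PREFIXES = (b'AT+\r', bytes.fromhex('2b3ce1'))

def strip_session_prefix(data: bytes) -> bytes:
    for prefix in PREFIXES:
        if data.startswith(prefix):
            return data[len(prefix):]
    return data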

An example of how to work with the resulting data is provided at the end.

Preparation

The first thing to do is to capture network traffic. This is most easily done at the router or another central point. The most commonly used tool for the task is TCPDUMP(1), which is available for all commonly used operating systems. Assuming that the device under test has IP address 192.168.0.1, a command like the following should be all that's needed for a first try:

tcpdump -w rct-dump-$(date +%s).pcap host 192.168.0.1

This command writes a new file with a unique enough name each time it is invoked, allowing for quick jumps between captures. The host filter makes sure that only traffic to or from the device under test is captured.

Notice that the above command does not differentiate between protocols or TCP ports. This could easily be added to the capture filter, but for demonstration purposes we'll use tshark from the Wireshark project to further filter the traffic:

tshark -r rct-dump-<timestamp>.pcap -Y 'ip.addr == 192.168.0.1 and tcp.port == 8899' -w rct-dump-<timestamp>.filtered.pcap

The command reads the source capture file, applies the filter for TCP port 8899 and writes a new file. The new file will be the input to the read_pcap.py tool.

In order for the tool to work, scapy needs to be installed, either system-wide or in a virtualenv (pip install -U scapy).

Invocation

The tool expects the input file name as only parameter: ./read_pcap.py rct-dump-<timestamp>.filtered.pcap.

Warning

Reading the capture file with scapy is extremely slow and very resource-intensive (mostly RAM). Avoid big files. A 35MB pcap file may take well over a minute to load.

The tool first prints an overview of the TCP sessions found inside the file. This is not to be confused with the Follow TCP stream feature in Wireshark, which follows the packets in both directions, whereas Scapy splits the sent and received packets into two streams. This has an important implication: the tool does not show the responses next to their requests, but reads one stream after the other. The result is a long list of requests, then a long list of answers.

An example for the streams looks like this:

Stream    0 TCP 192.168.0.10:52730 > 192.168.0.1:8899 <PacketList: TCP:187 UDP:0 ICMP:0 Other:0> 6840 bytes
Stream    1 TCP 192.168.0.1:8899 > 192.168.0.10:52730 <PacketList: TCP:167 UDP:0 ICMP:0 Other:0> 30281 bytes
Stream    2 TCP 192.168.0.1:3580 > 192.168.0.11:8899 <PacketList: TCP:159 UDP:0 ICMP:0 Other:0> 30281 bytes
Stream    3 TCP 192.168.0.11:8899 > 192.168.0.1:3580 <PacketList: TCP:159 UDP:0 ICMP:0 Other:0> 0 bytes

There are four streams, belonging to two devices (192.168.0.10 and 192.168.0.11) communicating with the device under test.

After the streams have been listed, the parsing process begins, stream by stream. Each stream may contain multiple packets; they are parsed one by one in segments. One such segment is shown below:

NEW INPUT: 2021-05-07 06:36:44.530490 | 2b0104b403a7e6b9c72b0104663f1452e0692b01041ac87aa06c942b0104db2d2d69ae55ab2b010491617c58480f2b0104db11855b0f0a2b01040cb5d21b4894
frame consumed 9 bytes, 55 remaining
Frame complete: <ReceiveFrame(cmd=READ, id=b403a7e6, address=0, data=)>
Received read : battery_placeholder[0].soc_update_since

frame consumed 9 bytes, 46 remaining
Frame complete: <ReceiveFrame(cmd=READ, id=663f1452, address=0, data=)>
Received read : power_mng.n_batteries

frame consumed 9 bytes, 37 remaining
Frame complete: <ReceiveFrame(cmd=READ, id=1ac87aa0, address=0, data=)>
Received read : g_sync.p_ac_load_sum_lp

frame consumed 10 bytes, 27 remaining
Frame complete: <ReceiveFrame(cmd=READ, id=db2d69ae, address=0, data=)>
Received read : g_sync.p_ac_sum_lp

frame consumed 9 bytes, 18 remaining
Frame complete: <ReceiveFrame(cmd=READ, id=91617c58, address=0, data=)>
Received read : g_sync.p_ac_grid_sum_lp

frame consumed 9 bytes, 9 remaining
Frame complete: <ReceiveFrame(cmd=READ, id=db11855b, address=0, data=)>
Received read : dc_conv.dc_conv_struct[0].p_dc_lp

frame consumed 9 bytes, 0 remaining
Frame complete: <ReceiveFrame(cmd=READ, id=cb5d21b, address=0, data=)>
Received read : dc_conv.dc_conv_struct[1].p_dc_lp

END OF INPUT-SEGMENT

The input segment is printed first, with the time stamp recorded in the dump and the hexadecimal representation of its contents. The data is then fed to the frame parser ReceiveFrame. The first frame shows that the parser consumed 9 bytes, leaving 55 bytes in the buffer. It is a READ command, requesting ID 0xb403a7e6. Read requests do not carry a payload. The response is usually in another stream (at least for pcap files created with tcpdump), so it should appear further down the output. Other frames follow until the end of the segment is reached, and the next segment is fetched from the stream (or from the next stream).
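
The same parsing can be reproduced interactively with rctclient's ReceiveFrame. A minimal sketch, assuming the consume()/complete() API from the library's usage examples and a get_by_id() lookup on the registry:

from rctclient.frame import ReceiveFrame
from rctclient.registry import REGISTRY as R

# first request frame from the segment above: start byte, READ command, length, OID, CRC
raw = bytes.fromhex('2b0104b403a7e6b9c7')

frame = ReceiveFrame()
consumed = frame.consume(raw)      # number of bytes consumed, 9 for this frame
if frame.complete():
    oid = R.get_by_id(frame.id)    # fails if the OID is not in the registry
    print(consumed, oid.name)      # 9 battery_placeholder[0].soc_update_since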

Sometimes, data can have an invalid checksum. For example:

CRC mismatch, got 0xBB9B but calculated 0x6E18. Buffer: 2b050597e203f955bb9b
Attempting to decode while ignoring checksum
frame consumed 11 bytes, 36 remaining
Frame complete: <ReceiveFrame(cmd=RESPONSE, id=97e203f9, address=0, data=55)>
Received reply : power_mng.is_grid                        type: BOOL              value: True

As can be seen, the tool makes a second attempt at decoding the frame, this time ignoring the CRC check. As it is a tool meant for debugging, this approach is okay; it is not suitable anywhere but in debugging! In this example, the frame was actually valid, but the device probably got confused by requests from multiple apps at once. Other times, the data is completely unusable.

There are a number of other quirks that the tool tries to work around. One is the assumption that a frame does not span multiple packets. The protocol documentation makes no such statement, but at least for the devices observed it seems to be that way. Thus, if a frame is not complete when a segment ends and the next segment starts with the sequence 0x002b (the typical start sequence of a device), the current frame is discarded and a new one starts consuming data. This catches cases where the previous frame has an invalid length value, which would otherwise cause the parser to consume frame after frame, sometimes hundreds at once. A side effect is that if there is more than one frame after such a broken frame in the segment, these are lost.

Decoding unknown data

Suppose we have a frame that is valid, but the OID is not known yet. In this example the OID is actually in the registry, but let's pretend it is not and thus neither its name nor data type is known:

frame consumed 14 bytes, 223 remaining
Frame complete: <ReceiveFrame(cmd=RESPONSE, id=b403a7e6, address=0, data=47000000)>
Could not find ID in registry

The above OID 0xB403A7E6 got a response payload of 0x47000000. Let's try to make sense of the data.

To work with the data, it needs to be converted to a byte stream first. The easiest way is to use bytearray.fromhex:

>>> b = bytearray.fromhex('47000000')
>>> b
bytearray(b'G\x00\x00\x00')

With the byte stream in the variable b, let's try to convert it into something usable. For this, struct.unpack is used with a set of format strings. First, try a 32-bit unsigned integer, as is commonly used for Unix timestamps:

>>> import struct
>>> struct.unpack('>I', b)[0]
1191182336
>>> from datetime import datetime
>>> datetime.fromtimestamp(1191182336)
datetime.datetime(2007, 9, 30, 21, 58, 56)

This could very well be a timestamp, albeit one representing a point in time quite long ago, in 2007. Although it looks like a false lead, it might still be worth checking the app for a timestamp in that range; sometimes, timestamps in the past are set for settings that have not been updated. Assuming nothing was found, let's try converting it to a floating point number:

>>> struct.unpack('>f', b)[0]
32768.0

This looks like a power of two. Search the app again for values that have such a number.
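
When probing unknown payloads it can save time to print several interpretations at once. A small, purely illustrative helper:

import struct

def interpretations(hex_payload: str) -> None:
    # print common big-endian interpretations of a 4-byte payload
    b = bytes.fromhex(hex_payload)
    for fmt, desc in (('>I', 'uint32'), ('>i', 'int32'), ('>f', 'float32')):
        if len(b) == struct.calcsize(fmt):
            print(f'{desc:8s} {struct.unpack(fmt, b)[0]}')

interpretations('47000000')
# uint32   1191182336
# int32    1191182336
# float32  32768.0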

In this example, the data type looks like a number. This is not always the case: for example, a sequence of data that ends with a large number of 00 bytes typically contains a string (C uses NULL bytes to terminate strings). Some OIDs carry additional garbage data after the NULL byte, too, so this is something to look out for.
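
For such string payloads, cutting at the first NULL byte works well. A hypothetical example payload:

>>> raw = bytearray.fromhex('52435420506f7765720000000000ab')
>>> raw.split(b'\x00', 1)[0].decode('ascii')
'RCT Power'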

When looking up the OID in the registry, we find that it is battery_placeholder[0].soc_update_since, which has a data type of float, so the last try was correct and 32768.0 is the correct result.