
csv-calc

csv-calc is an application to calculate statistics (such as mean, median, size, standard deviation...) on multiple fields of an input file. Input records can be grouped by id, block, or both.

One drawback of csv-calc is that it only outputs the statistics for each id and block. The input records themselves are not preserved. This means that you cannot use csv-calc as part of a pipeline.

csv-calc --append

The --append option to csv-calc passes through the input stream, adding to every record the relevant statistics for its id and block.

For example:

> echo -e "1,0\n2,0\n3,1\n4,1" | csv-calc mean --fields=a,id
Output (mean, id):
1.5,0
3.5,1
 
> echo -e "1,0\n2,0\n3,1\n4,1" | csv-calc mean --fields=a,id --append
Output (a, id, mean):
1,0,1.5
2,0,1.5
3,1,3.5
4,1,3.5

keeping track of fields and formats

Another challenge for csv-calc users is the large number of fields that it generates (it applies every operation to every indicated field).

There are now --output-fields and --output-format options to show what kind of output a given csv-calc command will produce.

Examples:

> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-fields
t/mean,a/mean,t/diameter,a/diameter,id,block
 
> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-format
t,d,d,d,ui,ui

With --append, these fields are appended to the input fields; note that id and block are not repeated:
> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-fields --append
t/mean,a/mean,t/diameter,a/diameter

> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-format --append
t,d,d,d

points-to-ros and points-from-ros are utilities for publishing and receiving PointCloud2 messages on ROS.

Setup

To build them you need to set "snark_build_ros" to ON in snark cmake.
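For example, re-configuring an existing snark build from the shell (the build and source paths below are hypothetical):

cd ~/build/snark
cmake -Dsnark_build_ros=ON ~/src/snark
make -j$(nproc)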

We use snark-graphics-test-pattern to generate some sample points in a cube:

snark-graphics-test-pattern cube 100000 0.1 0.01 >cube.csv

Here is the output: cube.csv

To run ROS, you need to set up the environment and run roscore:

source /opt/ros/kinetic/setup.bash
roscore

 

points-from-ros

This utility subscribes to the specified topic and receives PointCloud2 messages, then writes the point data to stdout as csv or binary.

Either the --binary or the --format option must be specified; it sets the output to binary or ascii csv, respectively.

The field names and message format are embedded in the message; the format is used for conversion.

You can use --output-fields or --output-format to get the field names and message format from the message (the publisher must be running).

source /opt/ros/kinetic/setup.bash
points-from-ros --topic "/points1" --output-fields
points-from-ros --topic "/points1" --output-format
#ascii
points-from-ros --topic "/points1" --fields x,y,z,r,g,b --format 3d,3ub | view-points --fields x,y,z,r,g,b
#binary
points-from-ros --topic "/points1" --fields x,y,z,r,g,b --binary 3d,3ub | view-points --fields x,y,z,r,g,b --binary 3d,3ub

 

points-to-ros

This utility reads binary or ascii csv data from stdin and publishes it as PointCloud2 messages on ROS.

Either the --binary or the --format option must be specified; it indicates whether the input is binary or ascii.

The --fields option specifies the field names for one point in the message.

If a field named block is present, it will be used for breaking records into separate messages: records with the same block number will be grouped into one message. When no such field is present, points-to-ros will read stdin until EOF and then send a single message.
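For example, a minimal sketch with made-up data, where the fourth column is the block id, so the first two records go into one PointCloud2 message and the last two into another:

#hypothetical data: a block field splits the stream into one message per block
echo -e "0,0,0,0\n1,1,1,0\n2,2,2,1\n3,3,3,1" | points-to-ros --topic "/points1" --fields x,y,z,block --format 3d,ui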

The --hang-on option delays the exit of points-to-ros, so that clients can receive all the data in the last message.

#ascii
cat cube.csv | points-to-ros --topic "/points1" --fields x,y,z,r,g,b,a --format 3d,3ub,ub --hang-on
#binary
cat cube.csv | csv-to-bin 3d,3ub,ub | points-to-ros --topic "/points1" --fields x,y,z,r,g,b,a --binary 3d,3ub,ub --hang-on

 

 

The problem

You are working with a data pipeline, and on a certain record, you want to end processing and exit the pipeline.

But to break on some condition in the input, you need an application that parses each input record.

Worse, the condition you want could be a combination of multiple fields, or use fields unrelated to the data you want to process.

Introducing csv-eval --exit-if!

Previously csv-eval had a --select option that passed through any records that matched the select condition.

csv-eval --exit-if also passes through input records unchanged, but it exits on the first record matching the exit condition.

Like csv-eval --select, you can use any expression on the input fields that evaluates to a bool.

Comparing the two features:

$ echo -e "1,1\n2,2\n3,0\n1,3\n1,2" | csv-eval --fields=a,b --select="a+b<>3"
Output:
1,1
2,2
1,3

$ echo -e "1,1\n2,2\n3,0\n1,3\n1,2" | csv-eval --fields=a,b --exit-if="a+b==3"
Output:
1,1
2,2

This post outlines how to run a bash function in parallel using xargs. (Note, you could optionally use "parallel" instead of xargs, but I see no advantages/disadvantages at this stage)

It may not be the best way, or the right way, and it may have unforeseen consequences, so I'd welcome any feedback on better practice.

Rationale

We often run scripts with for or while loops. In the simplest case, if the operation within the loop is self-contained, it's very easy to make it parallel.

E.g.

# written in confluence, might not actually run
for file in *.csv ; do
  cat $file | csv-slow-thin > ${file%.csv}.processed.csv
done

Becomes:

# written in confluence, might not actually run
echo 'file=$1; cat $file | csv-slow-thin > ${file%.csv}.processed.csv' > do-slow-thing-impl
chmod 777 do-slow-thing-impl
ls *.csv | xargs -n1 -P8 ./do-slow-thing-impl
rm do-slow-thing-impl

But it's clunky to write a script file like that.

Better to make a function as follows, but the specific method in the code block below doesn't work

# written in confluence, might not actually run
function do-slow-thing
{
	file=$1
	cat $file | csv-slow-thin > ${file%.csv}.processed.csv
}
ls *.csv | xargs -n1 -P8 do-slow-thing # but this doesn't work

The following is the current best solution I'm aware of:

Note: set -a could be used to automatically export all subsequently declared vars, but it has caused problems with my bigger scripts

Note: set -a might have platform specific functionality. On Dmitry's machine it exports vars and functions, whereas on James' machine it exports vars only

Note: the use of declare -f means you don't need to work out a priori which nested functions may be called (e.g. like errcho in this example)

#!/bin/bash
export readonly name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

export readonly global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world

errcho "run parallel with xargs"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

Note: if using comma_path_to_var, you can use --export to export all of the parsed command line options

No need to read beyond this point, unless you want to see the workings that lead up to this, including options that don't work.

 

The problem exposed and the solution

The following code is tested, try it, by copying into a script and running the script

#!/bin/bash

name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world

Single process works fine, output:

xargs_from_func: first run single threaded

xargs_from_func: example_function: global var is hello and passed var is world

Let's try multiple processes with xargs. Add the following lines to the end of the script:

errcho "run parallel with xargs, attempt 1"
(echo oranges; echo apples) | xargs -n1 -P2 example_function

The problem is that example_function is not an executable:

xargs_from_func: run parallel with xargs, attempt 1

xargs: example_functionxargs: example_function: No such file or directory

: No such file or directory

Instead, let's run "bash" which is an executable:

errcho "run parallel with xargs, attempt 2"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "example_function {}"

The new bash process doesn't know the function:

xargs_from_func: run parallel with xargs, attempt 2

bash: example_function: command not found

bash: example_function: command not found

So let's declare it:

errcho "run parallel with xargs, attempt 3"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; example_function {}"

Getting close, but our example_function refers to another of our functions, which also needs to be declared:

xargs_from_func: run parallel with xargs, attempt 3

bash: line 3: errcho: command not found

bash: line 3: errcho: command not found

We can do that one by one, or declare all our functions in one go:

errcho "run parallel with xargs, attempt 4"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; $(declare -f errcho) ; example_function {}"

errcho "run parallel with xargs, attempt 5"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

The function itself is now working, but all the global variables are lost (including "global_var" and also the script name):

xargs_from_func: run parallel with xargs, attempt 4

: example_function: global var is  and passed var is oranges

: example_function: global var is  and passed var is apples

xargs_from_func: run parallel with xargs, attempt 5

: example_function: global var is  and passed var is oranges

: example_function: global var is  and passed var is apples

We can add these explicitly, one by one, e.g.:

errcho "run parallel with xargs, attempt 6"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; global_var=$global_var ; example_function {}"

...but it's extremely hard to work out which functions call which other functions, and which global variables all of those functions use. This leads to very hard-to-trace bugs in real-world examples.

xargs_from_func: run parallel with xargs, attempt 6

: example_function: global var is hello and passed var is oranges

: example_function: global var is hello and passed var is apples

So the final solution I've arrived at is to pass everything through by using "set":

errcho "run parallel with xargs, attempt 7"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(set) ; example_function {}"

This spits out a lot of extra garbage because it includes an attempt to reassign readonly variables:

xargs_from_func: run parallel with xargs, attempt 7

bash: line 1: BASHOPTS: readonly variable

bash: line 1: BASHOPTS: readonly variable

bash: line 8: BASH_VERSINFO: readonly variable

bash: line 8: BASH_VERSINFO: readonly variable

bash: line 38: EUID: readonly variable

bash: line 38: EUID: readonly variable

bash: line 68: PPID: readonly variable

bash: line 79: SHELLOPTS: readonly variable

bash: line 87: UID: readonly variable

bash: line 68: PPID: readonly variable

bash: line 79: SHELLOPTS: readonly variable

xargs_from_func: example_function: global var is hello and passed var is oranges

bash: line 87: UID: readonly variable

xargs_from_func: example_function: global var is hello and passed var is apples

...but notice that it did work.

For some reason the following doesn't hide the readonly errors:

errcho "run parallel with xargs, attempt 7"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(set) > /dev/null ; example_function {}"

...and I've tried various combos of putting the /dev/null inside the $(), and redirecting stderr.

I think, therefore, the best approach is to explicitly declare each global using export, and to either explicitly export each function, or use the "declare -f" statement at the xargs call.

That looks like this:

#!/bin/bash
export readonly name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

export readonly global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world

errcho "run parallel with xargs"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

Note: the readonly is not strictly necessary for this example, but is good practice if it is a readonly variable.

The whole script together:

#!/bin/bash

name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world


errcho "run parallel with xargs, attempt 1"
(echo oranges; echo apples) | xargs -n1 -P2 example_function

errcho "run parallel with xargs, attempt 2"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "example_function {}"

errcho "run parallel with xargs, attempt 3"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; example_function {}"

errcho "run parallel with xargs, attempt 4"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; $(declare -f errcho) ; example_function {}"

errcho "run parallel with xargs, attempt 5"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

errcho "run parallel with xargs, attempt 6"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; global_var=$global_var ; example_function {}"

errcho "run parallel with xargs, attempt 7"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(set) ; example_function {}"


 

 

 

 

 

Some external references:

http://stackoverflow.com/questions/1305237/how-to-list-variables-declared-in-script-in-bash

http://stackoverflow.com/questions/11003418/calling-functions-with-xargs-within-a-bash-scrip

Flush when you (sig)pipe

This blog entry describes a pretty subtle bug that leads to unexpected behaviour when handling the PIPE signal in C++.

First, a brief reminder of how and when the PIPE signal is used. Assume we have a pipeline of commands:

pipeline
command | head -n 2

The commands generate and process textual output, but we take only the first 2 lines. Once head has received two lines of output, it terminates. On the next write, the standard output of command has no recipient, so the operating system sends a PIPE signal to command, which terminates it. The point to note is that the signal is sent only when a command attempts to write something to standard output. No signal is sent and the command keeps running as long as it stays silent, e.g.

no output, no signal
time { sleep 10 | sleep 1; }

runs for 10 seconds, even if the "recipient" of the output exits after only 1 second.

Puzzle

With this pattern in mind, consider the following snippet of code:

Loop with signal handling
std::string line;
line.reserve( 4000 );
try
{
    signal_flag is_shutdown;
    command_line_options options( ac, av, usage );
    char delimiter = options.value( "--delimiter", ',' );
    bool flush = options.exists( "--flush" );
    comma::csv::format format( av[1] );
    while( std::cin.good() && !std::cin.eof() )
    {
        if( is_shutdown ) { std::cerr << "csv-to-bin: interrupted by signal" << std::endl; return -1; }
        std::getline( std::cin, line );
        if( !line.empty() && *line.rbegin() == '\r' ) { line = line.substr( 0, line.length() - 1 ); } // windows... sigh...
        if( !line.empty() ) { format.csv_to_bin( std::cout, line, delimiter, flush ); }
    }
    return 0;
}

The code is copied from the csv-to-bin utility at git revision c2521b3d83ee5f77cb1edf3fe7d42b767b4a392b. The exact details of the signal_flag class are not relevant; it suffices to say that on receipt of INT, TERM, and PIPE signals it would evaluate to logical "true" and then return to normal execution from the place where the signal was received. If you want to follow the problem hands-on, check out the code with git checkout c2521b3. To return the code to the current (HEAD) revision, run git checkout master.

Now consider the following script using csv-to-bin:

script
#!/bin/bash

for n in {0..9}; do
    sleep 2
    echo "$0: output $n" >&2
    echo ">>>",$n | csv-to-bin s[3],ui || { echo "output failed, $?" >&2; exit 1; }
done

Let us invoke the script in the following pattern:

usage pattern
./count-bin.sh | csv-from-bin s[3],ui --delimiter=' ' | head -n 2

The expected sequence of events is:

  1. initially we see lines "./count-bin.sh: output 0" from the script itself (on the standard error) and ">>> 0" from csv-from-bin on standard output
  2. after two iterations (two lines on standard output), head terminates
  3. when csv-from-bin attempts to write its output on the next iteration (counter n is 2), the pipe is closed and there is no recipient; therefore, csv-from-bin receives a PIPE signal and terminates; we shall see output from the script itself (on standard error) but no line ">>> 2" on standard output
  4. finally, on the next iteration there is no recipient for the output from the script itself, and therefore, csv-to-bin shall receive a PIPE signal and terminate with the "interrupted by signal" message, the script shall write the "output failed" message and exit

So far so good. The actual output, however, is:

wrong output
./count-bin.sh: output 0
>>> 0
./count-bin.sh: output 1
>>> 1
./count-bin.sh: output 2
./count-bin.sh: output 3
./count-bin.sh: output 4
./count-bin.sh: output 5
./count-bin.sh: output 6
./count-bin.sh: output 7
./count-bin.sh: output 8
./count-bin.sh: output 9

The script keeps running and csv-to-bin apparently never receives SIGPIPE, although the head and csv-from-bin processes are gone (which can be confirmed by looking at the process tree from a separate terminal).

So, what went wrong?

Explanation

The standard output is (by default) buffered. Therefore, no actual write is made in the main loop of csv-to-bin (unless the '--flush' option is used or the buffer is full, which does not happen in our example): nothing is written to standard output within the loop itself, and no signal is sent.

Once all the input is processed, the main loop terminates and proceeds to the "return 0" line. Again, nothing is written yet and no signal sent.

Finally, the main function exits. At this point, C++ invokes the destructors of all the global objects, including the output streams, and the output is finally written. This is the time when csv-to-bin encounters the lack of an output recipient and gets a PIPE signal. However, by this time we are well out of the userland code. The signal is received but no action can be taken on it. For the end-user it looks like csv-to-bin receives a signal and ignores it, exiting with the status of 0, which was already set by "return 0" before receiving the signal.

From the point of view of the count-bin.sh script, the csv-to-bin call was a success, and therefore, the script keeps running, contrary to what we expected to achieve by using "head -n 2".

Solution

Depending on your requirements, any of the following approaches can be used:

Do not handle PIPE signal

This is the simplest way and it has been implemented in the current version of csv-to-bin and other comma applications. If no user handler is set for SIGPIPE, the default behaviour applies and on receipt of SIGPIPE a program terminates with exit status of 141. Unless the user must do something really special on receiving the signal, e.g., write a log file, sync a database, and so on, there is no need to handle PIPE (or any other signal for that matter) explicitly.
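A quick way to see the default behaviour from the shell (a sketch, not specific to csv-to-bin): yes is killed by SIGPIPE as soon as head closes the pipe, and the shell reports exit status 141 (128 + 13, the SIGPIPE signal number).

sigpipe exit status
yes | head -n 1 > /dev/null
echo ${PIPESTATUS[0]}    # prints 141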

Flush after yourself

Nuff said. If you do need to handle SIGPIPE, make sure that every output is flushed (or not buffered in the first place). The flush will trigger a PIPE signal if no-one reads your output. Note that performance may be badly affected by this approach.

Kill yourself

Change the signal handler to perform the necessary last-minute action after receiving SIGPIPE, then re-send the signal to itself. In this case, the utility will also terminate with exit status of 141.
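For reference, a sketch of the same re-raise idiom in a bash script:

re-raising the signal (bash sketch)
function on_pipe()
{
    # ... necessary last-minute actions (write a log, sync a database, etc.) ...
    trap - SIGPIPE        # restore the default handler
    kill -s SIGPIPE $$    # re-send the signal to ourselves; exit status becomes 141
}
trap on_pipe SIGPIPE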

Restore the default signal handler

The custom signal handler is instantiated in the constructor of signal_flag object. Once the object is out of scope, it shall restore the default handler. This shall be the default implementation but has not been done yet. This approach is more appropriate for longer-running applications that must handle signals during some special sections of the code. Once out of the special section, the default handler shall apply. The special handler shall perform the necessary last-minute actions and then re-send the signal to the application.

If you have csv records with multiple keys and would like to assign unique ids to those records, you could use csv-enumerate. (In particular, it would help to overcome the current limitation of csv-calc, which cannot handle multiple id fields.)

csv-enumerate appends an id to each input record.

For example:

> ( echo 20170101T000000,hello ; echo 20170101T000000,world ; echo 20170101T000001,hello ; echo 20170101T000000,world ) | csv-enumerate --fields ,greeting
20170101T000000,hello,0
20170101T000000,world,1
20170101T000001,hello,0
20170101T000000,world,1
 
> ( echo 20170101T000000,hello ; echo 20170101T000000,world ; echo 20170101T000001,hello ; echo 20170101T000000,world ) | csv-enumerate --fields t,greeting
20170101T000000,hello,0
20170101T000000,world,1
20170101T000001,hello,2
20170101T000000,world,1

You can also output a list of all values, their ids, and the number of entries with each id, e.g.:

> ( echo 20170101T000000,hello ; echo 20170101T000000,world ; echo 20170101T000001,hello ; echo 20170101T000000,world ; echo 20170101T000005,world ) | csv-enumerate --fields ,greeting --map
"world",1,3
"hello",0,2

Binary mode is supported as usual.
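For example, a minimal sketch of the binary equivalent of the first example above, assuming the usual comma conventions (--binary=<format> for the input format, with the enumeration id appended as a 4-byte unsigned integer):

> # assumption: the appended id is output as ui
> ( echo 20170101T000000,hello ; echo 20170101T000000,world ) | csv-to-bin t,s[5] | csv-enumerate --fields ,greeting --binary t,s[5] | csv-from-bin t,s[5],ui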

Introduction

Just another example of how comma and snark utilities can be combined to cobble together something that works and could easily be further polished, all in a matter of minutes.

Assume you have information about the terrain and would like to find a path from A to B.

The following shows how the first cut of it could be done in a few command lines. It uses the graph-search utility with distance as the objective function, but it is easy to parametrize graph-search to use a different objective function, e.g. one based on gradient. It is also easy to add simple post-processing for better obstacle avoidance, path smoothing, etc.

As usual, convert the pipelines to binary to improve their performance.

Sample dataset

> # download dataset
> curl http://perception.acfr.usyd.edu.au/data/samples/riegl/rose.st/rose.st.ground.csv.gz | gunzip > ground.csv
> curl http://perception.acfr.usyd.edu.au/data/samples/riegl/rose.st/rose.st.nonground.csv.gz | gunzip > non-ground.csv
> # make sense of data
> view-points "ground.csv;fields=x,y,z,r,g,b" "non-ground.csv;fields=x,y,z,r,g,b"

Make search graph

> # assign graph vertex ids
> cat ground.csv | cut -d, -f1,2,3 | csv-paste - line-number > nodes.csv
> # make graph edges (simply densely mesh points within a given radius)
> cat nodes.csv | points-join nodes.csv --radius 0.12 --all | csv-eval --fields=,,,a,,,,b --output-if "a != b" > edges.csv

> # view graph
> view-points "nodes.csv;fields=x,y,z;colour=white" "edges.csv;shape=line;colour=grey;fields=first,,second"

 

Our search graph is very simplistic, but we got it with no effort. One can easily add more on top: filter out no-go zones, add or remove edges, etc.

Search for a path

> # search for a path from the node with id 5000 to the node with id 100000 (remember how we numbered the graph nodes using csv-paste above)
> graph-search --from 5000 --to 100000 --nodes "nodes.csv;fields=x,y,z,id" --edges "edges.csv;fields=,,,source,,,,target" > path.csv
> # view the result
> from=5000 ; to=100000 ; view-points "nodes.csv;fields=x,y,z;colour=grey;weight=1" "edges.csv;shape=line;colour=grey;fields=first,,second" <( cat nodes.csv | egrep ",$from$|,$to$" )";colour=yellow;weight=10;fields=x,y,z,label" <( cat path.csv | csv-eval --fields ,,z "z=z+0.25" )";shape=lines;colour=yellow" <( cat path.csv | csv-eval --fields ,,z "z=z+0.25" )";weight=3;colour=yellow"

Our path is fairly jagged. There are lots of smoothing methods that are relatively easy to implement. As a quick fix you could simply use a higher --radius value for points-join, at the price of higher computation time. Try e.g. points-join ... --radius 1.5; it takes longer, but the path is much smoother:

Adding cost to edges

The path we got is based on minimum distance. We could add a cost to each edge in edges.csv; then graph-search will use the cost instead of the distance.

Assume it is expensive for us to drive on the grass (because the gardener will charge us for damages).

> # quick and dirty: add to each vertex the amount of green in it (the formula for colour is tuned for demonstration only)
> cat ground.csv | csv-eval --fields ,,,r,g,b "t=(-1.3*r+2*g-1.3*b)*1.1+255" | cut -d, -f1,2,3,7 | csv-paste - line-number > nodes.with-cost.csv
 
> # get edges with amount of green as their cost
> time cat nodes.with-cost.csv | points-join nodes.with-cost.csv --radius 0.2 --all -v | csv-eval --fields=,,,,a,,,,,b --output-if "a != b" > edges.with-cost.csv
 
> # search path with cost
> graph-search --from 5000 --to 100000 --nodes "nodes.with-cost.csv;fields=,,,,id" --edges "edges.with-cost.csv;fields=,,,,source,,,,cost,target" > path.avoid-grass.csv
 
> # search path by distance
> graph-search --from 5000 --to 100000 --nodes "nodes.with-cost.csv;fields=x,y,z,,id" --edges "edges.with-cost.csv;fields=,,,,source,,,,,target" > path.by-distance.csv
 
> # view results
> from=5000 ; to=100000 ; view-points "nodes.with-cost.csv;fields=x,y,z,scalar;color=290:360,jet" <( cat nodes.with-cost.csv | egrep ",$from$|,$to$" )";colour=yellow;weight=10;fields=x,y,z,,label" <( cat path.avoid-grass.csv | csv-eval --fields ,,z "z=z+0.25" )";shape=lines;colour=yellow" <( cat path.avoid-grass.csv | csv-eval --fields ,,z "z=z+0.25" )";weight=3;colour=yellow" <( cat path.by-distance.csv | csv-eval --fields ,,z "z=z+0.25" )";shape=lines;colour=green;title=by-distance" <( cat path.by-distance.csv | csv-eval --fields ,,z "z=z+0.25" )";weight=3;colour=magenta"

As you see, the path by distance (coloured magenta) is almost a straight line, while path for avoiding grass (coloured yellow) tries to avoid the green areas, albeit not completely. If in the formula above "t=(-1.3*r+2*g-1.3*b)*1.1+255" you use a greater multiplier instead of 1.1 (e.g. 1.5), it will make driving on grass so prohibitive that you will see the path going around the lawn and avoiding greens completely.

This example does not demonstrate anything novel; these are all well-known, decades-old algorithms. Instead, it demonstrates how, in just three command lines, you can build a reasonable drivable path on terrain represented by a fairly arbitrary point cloud.

 

When processing binary fixed-width data, comma and snark utilities use the byte order of the computer on which they run. E.g. most desktops (with x86 architectures) are little endian, while some other platforms (e.g. big-endian ARM or PowerPC configurations) use big endian byte order.

If you have fixed-width data (e.g. from an external party or a device) whose endianness (byte order) differs from your computer's, there are a number of ways to deal with it at various levels (e.g. using htonl()-style functions, python serialization, comma::packed classes, etc).

If you just want to quickly modify your data to the desired endianness, you can now use csv-bin-reverse, e.g.:

> # reverse all the fields in the data
> cat big-endian-data.bin | csv-bin-reverse t,3d,2ui
> # reverse some fields in the data
> cat big-endian-data.bin | csv-bin-reverse t,3d,2ui --fields 2-4

If you need to make sense of how it works, you could run something like:

> echo 0,1,2,3,4,5,6,7 | csv-to-bin 8ub | csv-bin-reverse uw,ui,uw | csv-from-bin 8ub
1,0,5,4,3,2,7,6
 
> echo 0,1,2,3,4,5,6,7 | csv-to-bin 8ub | csv-bin-reverse uw,ui,uw --fields 2 | csv-from-bin 8ub
0,1,5,4,3,2,6,7
> # etc...

If you have a single image or an image stream and would like to apply to each image a mask generated from the image itself, you could use the mask filter.

The mask filter allows you to run a pipeline of almost any filters available in cv-cat on an image to generate the mask and then apply it to the original image.

The benefits of the mask filter are more obvious when it is used on an image stream rather than a single image, since it allows you to quickly prototype and deploy an image processing pipeline with fairly complex filtering and masking.

The example below demonstrates how to (very crudely) extract the vegetation from an image. To try it, right-click on the original image and save it on your computer as rippa.png.

> # mask image
> cv-cat --file rippa.png "mask=linear-combination:-r+2g-b|threshold:50|convert-to:ub;encode=png" --output no-header > masked.png
> # view
> eog masked.png
> # or for a quick view run
> cv-cat --file rippa.png "mask=linear-combination:-r+2g-b|threshold:50|convert-to:ub;view;null" --stay

In the cv-cat command above, the mask filter is given the pipeline linear-combination:-r+2g-b|threshold:50|convert-to:ub as its parameter. (The mask, which is very crude, says: take pixels with lots of green and not so much red and blue; the mask must have CV_8U depth, while the output of linear-combination is always in floats, hence the explicit convert-to operation.) cv-cat runs this pipeline on rippa.png and applies the result to rippa.png itself.

The syntax of the mask pipeline is the same as for normal cv-cat filter pipelines, except that in the mask filter pipeline the separator between filters is '|' and the equals sign is replaced by ':'. (To improve the syntax, in future, we may implement separator escaping.)

I.e. if instead of applying the mask, you just want to save it in a file, you could run:

> cv-cat --file rippa.png "linear-combination=-r+2g-b;threshold=50;convert-to=ub;encode=png" --output no-header > mask.png
> eog mask.png 

You may want to apply a constant pre-computed mask to images. The example below will produce the same result as the first example.

> cv-cat --file rippa.png "mask=load:mask.png;view;null" --stay

A more elaborate example of applying filters to masked images

Masking can be the first step of a feature extraction process. The next example demonstrates how a vegetation index can be applied on top of the mask operation.

We start with the following rainforest photo:

Now apply a mask and then, in the same line, apply a filter emulating one of the possible vegetation index filters (note: this is only a crude approximation used in this demo as the image lacks near-infrared data that are typically used by realistic vegetation indexes).

"Vegetation index"
> cv-cat --file rainforest.jpg --output=no-header "mask=linear-combination:-r+2g-b|threshold:50|convert-to:ub;ratio=r/g;convert-to=ub,256;encode=png" > filtered-rainforest.png
> eog filtered-rainforest.png

The ratio=r/g operation is the in-place "vegetation index" filter. Its output has floating-point precision with the expected range of output values around 1; therefore, we use convert-to before writing an output file.

You could also visualise intermediate results on the fly:

> cv-cat --file rainforest.jpg "view;mask=linear-combination:-r+2g-b|threshold:50|convert-to:ub;view;ratio=r/g;view;null" --stay

 

The black areas of the output image are not green as defined by the mask. The subsequent ratio filter distinguishes between the shades of green.

In a more general case, the ratio filter may use multiple channels, as in the (contrived) ratio=(100 + 2g + b)/(1.5*r + g + a). Note that r, g, b and a here are short-hands for channels 0, 1, 2 and 3 of the input image, respectively. The data in those channels do not have to correspond to red, green, blue or alpha and can be arbitrary false colours. According to the ratio syntax, multi-term expressions must be surrounded by brackets, constants can be integer or floating point values, and multiplication signs are optional.

view-points can now be used to display triangles, which may be useful if you would like to quickly visualize a triangulated surface without converting it into a CAD model.

Draw triangles:

> ( echo 0,0,0,1,1,1,1,0,0,0 ; echo 0,0,0,1,1,1,0,1,0,1 ; echo 0,0,0,1,1,1,0,0,1,2 ) | view-points "-;shape=triangle;fields=corners,id"

Draw filled triangles:

> ( echo 0,0,0,1,1,1,1,0,0,0 ; echo 0,0,0,1,1,1,0,1,0,1 ; echo 0,0,0,1,1,1,0,0,1,2 ) | view-points "-;shape=triangle;fields=corners,id;fill"

Draw both:

> view-points <( echo 0,0,0,1,1,1,1,0,0,0 ; echo 0,0,0,1,1,1,0,1,0,1 ; echo 0,0,0,1,1,1,0,0,1,2 )";shape=triangle;fields=corners,id;title=unfilled" <( echo 2,0,0,3,1,1,3,0,0,0 ; echo 2,0,0,3,1,1,2,1,0,1 ; echo 2,0,0,3,1,1,2,0,1,2 )";shape=triangle;fields=corners,id;fill;title=filled"

It can also be used if the triangulation is produced dynamically in a stream, just as for any other shape supported by view-points.

An application publishes data, and one or more clients are listening. How do you solve these issues?

  1. The application publishes infrequently, but a client would like the data more often;
  2. The application is waiting for more input, but a client (which perhaps has just connected and thus might have missed the previous output) would like to know the last output line published.

Enter csv-repeat.

csv-repeat will pass stdin to stdout, repeating the last record after a period of inactivity.

For example:

{ echo -e "1\n2\n3"; sleep 10; } | csv-repeat --timeout=3 --period=1

It might be useful to know if the data is from the original application, or being repeated. csv-repeat can decorate the output with additional fields:

{ echo -e "1\n2\n3"; sleep 10; } | csv-repeat --timeout=3 --period=1 --append=repeating

Or perhaps you'd like the lines to be timestamped also:

{ echo -e "1\n2\n3"; sleep 10; } | csv-repeat --timeout=3 --period=1 --append=repeating,time

And of course csv-repeat supports binary data with --binary=<format>.
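For example, a sketch of the first pipeline above in binary, assuming 4-byte unsigned integer records:

{ echo -e "1\n2\n3"; sleep 10; } | csv-to-bin ui | csv-repeat --timeout=3 --period=1 --binary=ui | csv-from-bin ui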

csv-repeat can also act as a watchdog on a data stream. In the following example, it will exit if the input stream fails to send an update on time:

> csv-repeat --timeout 3
csv-repeat: input data timed out

You could use it to raise an alarm without disconnecting from the data producer. E.g. if you want to exit after 3 timeouts, you could write:

> for i in {1..3} ; do csv-repeat --timeout 1 ; echo TIMED OUT >&2 ; done
csv-repeat: input data timed out
TIMED OUT
csv-repeat: input data timed out
TIMED OUT
csv-repeat: input data timed out
TIMED OUT

New functionality has recently been added to the bash-related comma utilities that allows specifying default values of command line options where the options are defined.

The following code uses command line option --filename. The script defines a bash variable called filename, whose value is specified by the option on the command line. However, if the option is not given on the command line, the default value will be used. 

#!/bin/bash
 
function options_description
{
    cat <<END
--filename=[<filename>]; default=example.txt; example file
END
}
 
source $( type -p comma-application-util )
eval "$( options_description | comma-options-to-name-value "$@" | comma_path_value_mangle )"
 
echo "filename is $filename"

Save the code above in a file called script and make it runnable with 'chmod +x ./script'. Then execute the following commands:

$ ./script
filename is example.txt
$ ./script --filename another.txt
filename is another.txt

The first command uses the default value (example.txt), while the second command uses the given value (another.txt).

The default values can be enclosed in double or single quotes if necessary. For instance:

function options_description
{
    cat <<END
--command=[<command>]; default="cat /log/file"; command to execute
END
}

Default values of variables used in the formulas evaluated by csv-eval can now be specified by the --default-values option. For instance,

$ ( echo a,10 ; echo b,20 ) | csv-eval --fields=,y "a=x+y" --default-values="x=1;y=2"
a,10,11
b,20,21

assigns default values to x and y. Since y is present in the input stream as specified by --fields, its default is ignored. On the other hand, x is not in the input stream and, therefore, its default value is used in the formula.

This capability is useful as it allows one to easily omit some fields without having to change the formulas where they are used, provided that the omitted fields have default values. For example, assume you want to write a script that multiplies 3d vectors by a scalar, where the scalar may be either different for each vector or the same for all of them. Your implementation may look like the following:

#!/bin/bash
if [[ -n "$1" ]] ; then fields="x,y,z" ; defaults="--default-values=scalar=$1"
else fields="x,y,z,scalar" ; fi
csv-eval --fields "$fields" $defaults "x1=x*scalar; y1=y*scalar; z1=z*scalar"
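A hypothetical invocation of such a script (saved, say, as scale-vectors) might then look like this; in both cases csv-eval appends x1,y1,z1 to each input record:

$ # scalar given on the command line: input records carry x,y,z only
$ echo 1,2,3 | ./scale-vectors 10
$ # no argument: each input record carries its own scalar in the fourth field
$ echo 1,2,3,10 | ./scale-vectors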

The existing points-grep utility has been given a major facelift. Given a shape as a set of planes, e.g. a bounding box or a hull of a moving vehicle, it greps points that belong to that shape from a streamed point cloud.

Let's make a dataset: just fill a cube with points:

> for i in $( seq -5 0.5 5 ) ; do for j in $( seq -5 0.5 5 ) ; do for k in $( seq -5 0.5 5 ) ; do echo $i,$j,$k ; done ; done ; done > cube.csv

Let's cut a slice from the cube with a plane given as 1,1,1,0.5, where 1,1,1 are the coordinates of a normal to the plane and 0.5 is its distance from 0,0,0. (Normals do not need to be normalized.)

> cat cube.csv | points-grep planes --normals <( echo 1,1,1,0.5 ) > filtered.csv

View it:

> view-points "cube.csv;colour=grey;hide" "filtered.csv;colour=red"

Specify multiple planes, e.g. to grep an octahedron:

> cat cube.csv | points-grep planes --normals <( echo -1,-1,-1,2; echo -1,-1,1,2; echo -1,1,-1,2; echo -1,1,1,2; echo 1,-1,-1,2; echo 1,-1,1,2; echo 1,1,-1,2; echo 1,1,1,2; ) > filtered.csv

Try to view the results as above.

Now, suppose we have a bounding box for a moving vehicle.

Let us prepare the dataset: the same cube, but we will merge it with the vehicle trajectory, which, for simplicity's sake, will have only 3 vehicle positions given as x,y,z,roll,pitch,yaw. For each point of the cube, we will specify the corresponding position of the vehicle at the time when the point was seen. (For demonstration's sake, I omit the usual timestamp manipulations and simply append the vehicle position to each point.)

( cat cube.csv | csv-paste - value '-1,-1,-1,0.1,0.2,0.1,0' ; cat cube.csv | csv-paste - value '0,0,0,0.2,0.1,0.3,1' ; cat cube.csv | csv-paste - value '1,2,1,-0.1,-0.1,0.1,2' ) > merged.csv

Filter the data:

> cat merged.csv | points-grep box --size=1,2,3 --fields=x,y,z,filter > filtered.csv

and view it, colouring the points by the vehicle position number:

> view-points "cube.csv;colour=100,100,100,100" "filtered.csv;fields=x,y,z,,,,,,,id;weight=5" --orthographic

If you would like to make an extensive test suite more structured, rather than having all the checks in one flat expected file (i.e. you want to run the test once, but check many well-structured suites of test cases), you can now have a directory called expected instead of a file.

For example, suppose you have a test called big-test:

> tree big-test/
big-test/
├── expected
│   ├── address
│   │   └── check_city
│   └── check_temperature
└── test

with the files looking like this:

> cat big-test/test
echo city=sydney
echo temperature=25
 
> cat big-test/expected/address/check_city 
city="melbourne"
 
> cat big-test/expected/check_temperature 
temperature=22

When you run the test, the conditions in check_city and check_temperature will be checked:

> cd big-test
> comma-test-run
comma-test-run: 1 test[s] in subdirectories of /home/seva/src/comma/big-test: running...
comma-test-run: test 1 of 1: .: started...
comma-test-run: test 1 of 1: .: running...
Test output does not match expected:
expected output:
./expected/address/check_city:city/expected="melbourne"
./expected/address/check_city:city/actual="sydney"
./expected/check_temperature:temperature/expected="22"
./expected/check_temperature:temperature/actual="25"
comma-test-run: .: failed
comma-test-run: 1 test[s] in subdirectories of /home/seva/src/comma/big-test: 1 test[s] out of 1 failed

If you find yourself with a csv file with an uneven number of fields in each line, a new option for csv-fields may help.

csv-fields make-fixed will make every line have the same number of fields by adding fields to short lines or, with the --force option, stripping fields from long lines.

For example:

$ { echo "a,b,c,d"; echo "x,y,z"; } | csv-fields make-fixed --count=6
a,b,c,d,,
x,y,z,,,
$ { echo "a,b,c,d"; echo "x,y,z"; } | csv-fields make-fixed --count=3 --force
a,b,c
x,y,z

If you try to crop a line without the --force option, the application will fail.

Bash Trap Gotchas

Traps (signal handlers) are useful for cleaning up resources, but have some unexpected quirks in bash.

Multiple EXIT Traps

You would hope that the following would call f1 then f2 on exit:

function f1() { echo "one" >&2; }
function f2() { echo "two" >&2; }

trap f1 EXIT  # nope!
trap f2 EXIT

... but only f2 is called, since a new EXIT trap replaces an existing one.

This is a particular problem when two EXIT traps are widely separated; for example, if one trap is inside a script sourced by another script.
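One possible workaround is to read back any existing EXIT trap and append to it rather than replacing it. A sketch (it only handles simple trap commands without embedded quotes):

# append a command to the existing EXIT trap, if any
function add_exit_trap()
{
    local existing
    existing=$( trap -p EXIT )                 # e.g. prints: trap -- 'f1' EXIT
    existing=${existing#"trap -- '"}
    existing=${existing%"' EXIT"}
    trap "${existing:+$existing; }$1" EXIT
}

add_exit_trap f1
add_exit_trap f2    # now both f1 and f2 run on exit, in the order they were added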

EXIT Trap in a function

An EXIT trap defined in a function is called on exit from the script using that function. For instance:

#!/bin/bash
 
function f()
{
	trap 'echo trap defined in f >&2' EXIT
	echo "end of f" >&2
}
 
f
echo "after f" >&2 

will print

end of f
after f
trap defined in f

This trap will also be called if the function exits explicitly via an exit statement or after receiving a signal such as INT or TERM. If the function is called as bash -c f, however, the order of messages is reversed:

#!/bin/bash

function f()
{
    trap 'echo trap defined in f >&2' EXIT
	echo "end of f" >&2
}
export -f f

bash -c f
echo "after f" >&2

produces

end of f
trap defined in f
after f

In particular, this happens when a function is invoked under the comma_execute_and_wait wrapper.

Signal Traps

Traps can catch signals such as SIGTERM (from a kill command) or SIGINT (sent by Ctrl+C).

But be aware that EXIT traps are still called when a signal is received, so bye will be called twice in the following script when Ctrl+C is pressed:

function bye() { echo "Bye!" >&2; exit 1; }

trap bye EXIT SIGINT
sleep 1; sleep 1; sleep 1; sleep 1   # pressing Ctrl+C here calls bye() twice
echo "Reached the end" >&2

The EXIT trap was called a second time because of the exit command inside bye(). The solution is not to just remove the exit, since then bye() won't terminate the script on SIGINT.

It is safer to just trap EXIT (and also to ignore signals inside the trap function):

function bye()
{
    trap '' SIGINT SIGHUP SIGTERM SIGQUIT   # ignore signals
    # ... clean up resources ...
    echo "Bye!" >&2
    # no need for "exit 1"
}

trap bye EXIT
sleep 1; sleep 1; sleep 1; sleep 1   # pressing Ctrl+C here just calls bye() once now
echo "Reached the end" >&2

In this case bye() doesn't need to contain an exit command:

  • If a script runs to completion, the exit status is 0.
  • If it is terminated by exit n, it exits with status n after calling the trap function.
  • If it is terminated by a signal, the exit status is 128 plus the signal code, even if there is an exit command inside the trap function. (Use trap -l to see a list of signal codes).

RETURN Traps

RETURN traps are not called if the script terminates (via a signal or exit command), so they are best avoided in general.

An alternative method of cleaning up on function return is to call the function inside a subshell with an EXIT trap. This has the normal limitations of subshells, however (e.g. a variable set in the subshell won't be set in the parent).

  • There is an extra gotcha here: using the normal subshell bracket syntax ( ... ) doesn't seem to call the EXIT trap. Instead, create the subshell using bash -c.
  • See comma_process_kill() in comma/bash/process/comma-process-util for an example.
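A minimal sketch of this approach, using bash -c as suggested above (the function and path names here are made up):

# clean up a scratch directory when the "function" returns, via a subshell EXIT trap
function with_scratch_dir()
{
    bash -c '
        scratch=$( mktemp -d )
        trap "rm -rf \"$scratch\"" EXIT    # fires when this subshell exits, i.e. on return
        echo "working in $scratch" >&2
        # ... do the actual work here ...
    '
}

with_scratch_dir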

If you do end up using a RETURN trap, there is one last gotcha: the trap needs to be unset (using trap - RETURN), otherwise it can remain active. (This behaviour is inconsistent in bash: sometimes it happens and sometimes it doesn't).

DEBUG Traps

DEBUG traps are called for every command in a script:

function debug_trap() { echo "Line $1: $2" >&2 ; }
trap 'debug_trap $LINENO "$BASH_COMMAND"' DEBUG

The main gotcha is that DEBUG traps are not inherited by functions unless they have the "trace" attribute (declare -t).
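For example, continuing the snippet above (a sketch):

function debug_trap() { echo "Line $1: $2" >&2 ; }
trap 'debug_trap $LINENO "$BASH_COMMAND"' DEBUG

function f() { echo "inside f" >&2 ; }
declare -ft f    # give f the trace attribute, so the DEBUG trap also fires inside it
f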

 

csv-eval can now be used to update csv stream values in place. Simply assign new values to input stream fields. For example,

$ ( echo 1,0.1,cat; echo 2,0.01,dog )
1,0.1,cat
2,0.01,dog
$ ( echo 1,0.1,cat; echo 2,0.01,dog ) | csv-eval --fields=,x --format=,d 'x = x**2'
1,0.01,cat
2,0.0001,dog
$ ( echo 1,0.1,cat; echo 2,0.01,dog ) | csv-to-bin ui,d,s[3] | csv-eval --fields=,x --binary=ui,d,s[3] 'x = x**2' | csv-from-bin ui,d,s[3]
1,0.01,cat
2,0.0001,dog

It is also possible to update the input stream and append new values simultaneously. For example,

$ ( echo 1,0.1,cat; echo 2,0.01,dog ) | csv-eval --fields=,x --format=,d 'x = x**2; flag = (x < 0.001)' --output-fields=flag --output-format=ub
1,0.01,cat,0
2,0.0001,dog,1

Note that --output-fields and --output-format apply to appended fields only. Input fields and input format remain fixed.

We have just added support for the Velodyne Puck (VLP-16) to the velodyne-to-csv utility.

Simply run velodyne-to-csv --puck with all other options specified as usual.

For example, suppose your VLP-16 publishes its data over UDP on port 2368.

Then you could get the individual points, e.g. as:

> udp-client 2368 --timestamp | velodyne-to-csv --puck | head
20160609T072201.492662,1,1275,0,0,0,7.40179190377,-3.05843657996,0.139793775563,1.96263183445,8.01,1,1
20160609T072201.492664,2,3060,0,0,0,1.02118837253,-0.421968133996,-0.255094495626,1.96264092504,1.134,1,1
20160609T072201.492666,3,2040,0,0,0,7.02538984034,-2.90305596431,-0.398381298921,1.96265001562,7.612,1,1
20160609T072201.492669,4,5355,0,0,0,1.017899836,-0.420630935204,-0.214087692812,1.96265910621,1.122,1,1
20160609T072201.492671,5,2295,0,0,0,7.03951111896,-2.90904104991,0.666392809049,1.96266819679,7.646,1,1
20160609T072201.492673,6,5610,0,0,0,1.04425917459,-0.431545741181,-0.178961028006,1.96267728738,1.144,1,1
20160609T072201.492676,7,2805,0,0,0,7.01734553007,-2.90003060883,0.932300477049,1.96268637796,7.65,1,1
20160609T072201.492678,8,6885,0,0,0,1.01819925165,-0.420798011051,-0.13527497118,1.96269546855,1.11,1,1
20160609T072201.492680,9,2550,0,0,0,7.05049559242,-2.91388048404,1.20829980797,1.96270455914,7.724,1,1
20160609T072201.492683,10,7905,0,0,0,1.02009309536,-0.421602406843,-0.0965685629644,1.96271364972,1.108,1,1

...or view data as:

> udp-client 2368 --timestamp | velodyne-to-csv --puck --fields x,y,z,id,scan | view-points --fields x,y,z,id,block

with the output like:

Of course, ASCII CSV is too slow; thus, as before, use binary data to keep up with the Velodyne data in real time, e.g.:

> udp-client 2368 --timestamp | velodyne-to-csv --puck --fields x,y,z,id,scan --binary | view-points --fields x,y,z,id,block --binary 3d,2ui

In your bash scripts, when you are inside a loop, do not declare local array variables.

Try running the following script and observe the time per iteration grow linearly:

#!/bin/bash
num=${1:-1000}  # How many iterations before report elapsed time
A=$(date +%s%N);    # Timestamp to nanoseconds
iteration=0
function do_something()
{
    while true;
    do
        (( ++iteration ))
        sleep 0.001             # Pretend to do some work
        local my_array=( 1 )    # create a local array
        # Report elapsed time
        if (( ! ( iteration % num ) )) ;then
            B=$( date '+%s%N' )
            echo "$iteration $(( ($B - $A)/1000000 ))"   
            A=$(date +%s%N);
        fi
    done
}

do_something

There is a performance penalty in the line:

local my_array=( 1 )	# Array of one item

The line above creates a local array on every single iteration, which happens to slow bash down. The reported duration (output every 1000 iterations) increases roughly linearly, which means quadratic performance degradation with respect to the total number of iterations.

For example, a common trap is to declare a local array to store the PIPESTATUS array.

while true; do
	funcA | funcB | funcC
	local status=("${PIPESTATUS[@]}")	# For checking on all return codes in the above pipe
done

Corrected script:

declare -a status 		# Use 'local -a status' (outside the loop) if inside a function
while true; do
	funcA | funcB | funcC
	status=("${PIPESTATUS[@]}")
done

Exposing classes and functions defined in C++ libraries to Python is now possible in comma by creating C++/Python bindings with Boost.Python.

Example

To illustrate this new capability, bindings for the C++ class format and its member function size() declared in csv/format.h have been defined in python/comma/cpp_bindings/csv.cpp:

// python/comma/cpp_bindings/csv.cpp
#include <boost/python.hpp>
#include <comma/csv/format.h>
BOOST_PYTHON_MODULE( csv )
{
    boost::python::class_< comma::csv::format >( "format", boost::python::init< const std::string& >() )
        .def( "size", &comma::csv::format::size );
	// add other csv bindings here
}
 

and added to cmake:

# fragment of python/comma/cpp_bindings/CMakeLists.txt

add_cpp_module( csv csv.cpp comma_csv )
# add other modules here

Build comma with BUILD_SHARED_LIBS=ON and BUILD_CPP_PYTHON_BINDINGS=ON, then open a python interpreter and enter the following commands:

>>> import comma.cpp_bindings.csv as csv
>>> f = csv.format('d,2ub,s[5]')
>>> f.size()
15

The function size() outputs the binary size corresponding to the format string that was passed to the format object f on construction.

Under the hood

The bindings are placed inside a shared library and saved in the file csv.so, which is then installed in comma/cpp_bindings along with the other Python modules. On Ubuntu, it will usually be /usr/local/lib/python2.7/site-packages/comma/cpp_bindings/csv.so. Note that the name of the module used as a parameter for the BOOST_PYTHON_MODULE macro has to match the name of the shared library declared in the cmake file, e.g. csv in the above example.

Limitations

The bindings are exposed as a shared library and hence one is limited to building comma with shared libraries or ensuring that all required static libraries have been compiled with -fPIC. Attempting to link with static libraries without position independent code may cause linking to fail or link with shared libraries instead.

 

view-points --pass-through

view-points has gained a new option --pass-through (or --pass for short) that allows it to become part of a processing pipeline.

The basic usage is:

$ cat data.csv | some-operation | view-points --pass | some-other-operation | view-points --pass > output.csv

or alternatively:

$ cat data.csv | some-operation | view-points "-;pass" "otherdata.csv" | some-other-operation | view-points "-;pass" > output.csv

When multiple data sources are viewed, only one can be given the pass option. pass will also disable --output-camera-config and the ability to output the point under the mouse with a double right click.

For a more complete example try:

$ cat cube.bin | view-points "-;binary=3d;pass" \
      | csv-eval --fields=x,y,z --binary=3d "a = abs(x) < 0.2" | view-points "-;fields=x,y,z,id;binary=4d;pass" \
      | points-to-voxels --fields x,y,z --binary=4d --resolution=0.2 | view-points "-;fields=,,,x,y,z;binary=3i,3d,ui;weight=5"

using the attached cube.bin input file.

You should see three concurrent windows, showing the three stages of the processing pipeline.

A finite-state machine can be implemented in a few minutes on the command line or in a bash script using csv-join.

Assume we have the following state machine:

moore-finite-state-machine

It has the following events and states:

events:

  1. close
  2. open
  3. sensor closed
  4. sensor opened

states:

  1. opened
  2. closing
  3. closed
  4. opening

The state transition table can be expressed in a csv file state-transition.csv:

# event,state,next_state
$ cat state-transition.csv
close,opened,closing
close,opening,closing
open,closing,opening
open,closed,opening
sensor_closed,closing,closed
sensor_opened,opening,opened

With the state transition table, csv-join can read in events, output the next state and keep track of this new state. Here is an example usage (input is marked '<', output '>'):

$ csv-join --fields event "state-transition.csv;fields=event,state,next_state" --string --initial-state "closed"
< open
> open,open,closed,opening
< sensor_opened
> sensor_opened,sensor_opened,opening,opened
< close
> close,close,opened,closing
< sensor_closed
> sensor_closed,sensor_closed,closing,closed
< open
> open,open,closed,opening
< close
> close,close,opening,closing
< sensor_closed
> sensor_closed,sensor_closed,closing,closed

The input field and joining key in this case is a single field, event. As usual with csv-join, any number of fields can be used to represent an event. The following example has the event represented by two fields: operation and result.

csv-join --fields operation,result "state-transition.csv;fields=operation,result,state,next_state" --string --initial-state 1
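A hypothetical transition table for this two-field case (numeric states, matching --initial-state 1) might look like:

# operation,result,state,next_state (made-up values)
$ cat state-transition.csv
open,success,1,2
open,failure,1,1
close,success,2,1
close,failure,2,2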

csv-join expects the state transition table to contain unique matches only (as per csv-join --unique).

The finite-state machine is only activated when the file/stream fields contain both 'state' and 'next_state'.


Related articles

 Finite-state machine

This blog is mostly driven by the ACFR software team. We plan to post on the new features that we continuously roll out in comma, snark, and other ACFR open source repositories (https://github.com/acfr), and occasionally on more general software topics.

 

 
