Blog

csv-split can also stream the data associated with multiple ids, to specific sockets, named pipes or files. See csv-split --help for more details about the semantics.

You can mix publishing to tcp sockets, local sockets, files, or pipes, if you need.

> # publish data corresponding to ids 2 and 3 to tcp:7777, 0 to tcp:8888 and the rest to stdout
>  csv-paste line-number "line-number;size=5;index" \
     | csv-split --fields ,id "2,3;tcp:7777" "0;tcp:8888" "...;-" > other_ids.csv

> # in separate terminal
> socat tcp:localhost:7777 - | head
6635893,3
6635897,2
6635898,3
6635902,2
6635903,3
6635907,2
6635908,3
6635912,2
6635913,3
6635917,2

> socat tcp:localhost:8888 - | head
28950290,0
28950295,0
28950300,0
28950305,0
28950310,0
28950315,0
28950320,0
28950325,0
28950330,0
28950335,0

> head other_ids.csv
1,1
4,4
6,1
9,4
11,1
14,4
16,1
19,4
21,1
24,4
 

csv-split works as before (splitting data into files) if no streams are specified. Furthermore, if streams are assigned to some ids and none to "...", then the data relating to the remaining ids is discarded.
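
The routing rule described above can be sketched in a few lines of Python. This is only an illustration of the semantics, not how csv-split is implemented; the function name split_by_id and the in-memory streams are made up for the example.

```python
import io


def split_by_id(records, routes, default=None):
    """Write each (line, id) record to the stream registered for its id.

    routes:  dict mapping id -> writable stream
    default: stream for all other ids (the "..." entry); None means discard
    """
    for line, record_id in records:
        out = routes.get(record_id, default)
        if out is not None:
            out.write(line + "\n")


# ids 2 and 3 share one stream, id 0 gets its own, everything else goes to 'rest'
shared, zero, rest = io.StringIO(), io.StringIO(), io.StringIO()
records = [(f"{n},{n % 5}", n % 5) for n in range(10)]
split_by_id(records, {2: shared, 3: shared, 0: zero}, default=rest)
print(rest.getvalue(), end="")  # prints 1,1 then 4,4 then 6,1 then 9,4, one per line
```

Leaving out the default argument mirrors the case where no stream is given for "...": records with unlisted ids are silently dropped.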

Besides headers, cv-calc roi and draw operations can now read the shapes from csv files. To incorporate this, usual csv options have also been added to the shape attributes.

If the shapes have (reverse) index fields, all the shapes in a block in each file are drawn on (or applied to) a single image; otherwise only one shape per file is taken.

If the shapes file has field t (timestamp), then the shapes in the block are drawn only to the image with corresponding timestamp, otherwise each block of shapes is drawn on the next available image (see cv-calc --help for details).

To try the following examples, download this image.

> cv-cat --file 20180101T000000.jpg \
    | cv-calc draw --rectangles=<( echo "800,500,1600,1700" )";fields=min/x,min/y,max/x,max/y;color/r=255;weight=3" \
                   --circles=<( echo "2800,1225,300" )";fields=centre/x,centre/y,radius;color/b=255;weight=3" \
                   --labels=<( echo "1,1200,1100,cat01"; echo "0,2800,1225,cat02" )";fields=index,position/x,position/y,text;color/g=255;weight=2" \
    | cv-cat "encode=jpg" --output no-header \
    > drawn.jpg

> eog drawn.jpg

 

name-value-apply is a thin wrapper around the name-value-convert --take-last functionality. It takes multiple configuration files on the command line and keeps only the value of the last occurrence of each name across all the files.

This allows you to combine a bunch of configuration files consistently; for example, you may have a default configuration file for your device, then a file with some settings customised, etc.
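
The "last occurrence wins" rule is easy to picture with plain dictionaries. The sketch below only illustrates the merge semantics and the idea behind --source; apply_configs is a made-up name, and parsing of xml/json files is left out.

```python
def apply_configs(configs):
    """Merge (source_name, settings) pairs; later sources override earlier ones.

    Returns two dicts: the merged values and, per name, the source it came from.
    """
    values, sources = {}, {}
    for source_name, settings in configs:
        for name, value in settings.items():
            values[name] = value
            sources[name] = source_name
    return values, sources


values, sources = apply_configs([
    ("default_config.xml", {"a": "5", "b": "7"}),
    ("customised.json", {"a": "6", "c": "8"}),
])
print(values)   # {'a': '6', 'b': '7', 'c': '8'}
print(sources)  # {'a': 'customised.json', 'b': 'default_config.xml', 'c': 'customised.json'}
```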

Examples:

> # create input files
> ( echo a=5; echo b=7 ) | name-value-convert --to=xml > default_config.xml
> ( echo a=6; echo c=8 ) | name-value-convert --to=json > customised.json

> # combine configs
> name-value-apply default_config.xml customised.json
a="6"
b="7"
c="8"
 
> # check where each path-value pair came from
> name-value-apply default_config.xml customised.json --source
a="customised.json"
b="default_config.xml"
c="customised.json"
 
> # output as json
> name-value-apply default_config.xml customised.json | name-value-convert --to json
{
    "a": "6",
    "b": "7",
    "c": "8"
}
 
> # as usual, you can do it on the fly, e.g. if you would like to override parameters with command-line options
> name-value-apply default_config.xml customised.json <( echo c=10 )
a="6"
b="7"
c="10"

The math-array utility in snark is a trivial wrapper for a range of numpy array operations. The main purpose of math-array is to easily run array operations on streams of data compatible with the csv-style utilities in comma and snark.

math-array does not attempt to substitute numpy functionality. If you need something customised, just write your own python code as usual.

Currently, it exposes three operations:

  • split
  • transpose
  • (relatively) arbitrary numpy array operation

Examples

Split:

> ( echo some_other_stuff,0,1,2,3,4,5; echo more_other_stuff,6,7,8,9,10,11 ) | csv-to-bin s[32],6f | math-array split --shape 3,2 --header-size 32 | csv-from-bin s[32],2f
some_other_stuff,0,1
some_other_stuff,2,3
some_other_stuff,4,5
more_other_stuff,6,7
more_other_stuff,8,9
more_other_stuff,10,11

Transpose:

> # transpose
> ( echo 0,1,2,3,4,5; echo 6,7,8,9,10,11 ) | csv-to-bin 6f | math-array transpose --to-axes 1,0 --shape 3,2 | csv-from-bin 6f
0,2,4,1,3,5
6,8,10,7,9,11
 
> # the record has not only the array, but also other fields
> ( echo some_other_stuff,0,1,2,3,4,5; echo more_other_stuff,6,7,8,9,10,11 ) | csv-to-bin s[32],6f | math-array transpose --to-axes 1,0 --shape 3,2 --header-size 32 | csv-from-bin s[32],6f
some_other_stuff,0,2,4,1,3,5
more_other_stuff,6,8,10,7,9,11

(Relatively) arbitrary numpy array operation

> # swapaxes
> ( echo some_other_stuff,0,1,2,3,4,5; echo more_other_stuff,6,7,8,9,10,11 ) | csv-to-bin s[32],6f | math-array "np.swapaxes, axis1 = 0, axis2 = 1" --shape 3,2 --header-size 32 | csv-from-bin s[32],6f
some_other_stuff,0,2,4,1,3,5
more_other_stuff,6,8,10,7,9,11

See math-array --help for more details.
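
For comparison, here is what the split, transpose and swapaxes examples above compute for a single record, written directly in numpy. It does not show how math-array reads records or skips the header bytes; it only reproduces the array arithmetic.

```python
import numpy as np

record = np.arange(6, dtype=np.float32)  # one record: 0,1,2,3,4,5
array = record.reshape(3, 2)             # same as --shape 3,2

# split: each row of the 3x2 array becomes its own output record
rows = array.tolist()

# transpose with --to-axes 1,0, flattened back into a single record
transposed = np.transpose(array, axes=(1, 0)).flatten()

# the swapaxes call from the last example gives the same result for a 2D array
swapped = np.swapaxes(array, axis1=0, axis2=1).flatten()

print(rows)                 # [[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]]
print(transposed.tolist())  # [0.0, 2.0, 4.0, 1.0, 3.0, 5.0]
```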

 

Among other things, csv-paste can number the lines of its output. Now, if line-number appears several times on the command line, each instance can take its own parameters. Examples:

> # append single line number
> seq 0 11 | csv-paste - line-number
0,0
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9
10,10
11,11
 
> # number blocks of records
> seq 0 12 | csv-paste - line-number --size 3
0,0
1,0
2,0
3,1
4,1
5,1
6,2
7,2
8,2
9,3
10,3
11,3
12,4
 
> # create multiple indices (e.g. if you need to express multidimensional array indices)
> seq 0 11 | csv-paste - "line-number;size=4" "line-number;size=4;index"
0,0,0
1,0,1
2,0,2
3,0,3
4,1,0
5,1,1
6,1,2
7,1,3
8,2,0
9,2,1
10,2,2
11,2,3
 
> # reverse indices (e.g. to use with csv-blocks down your pipeline)
> seq 0 11 | csv-paste - "line-number;size=4" "line-number;size=4;index;reverse"
0,0,3
1,0,2
2,0,1
3,0,0
4,1,3
5,1,2
6,1,1
7,1,0
8,2,3
9,2,2
10,2,1
11,2,0

As with other comma utilities, all csv-paste operations work on ascii or binary data. See csv-paste --help for more configuration possibilities.
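
The block number, index and reverse index in the examples above follow simple integer arithmetic on the zero-based record number. A small sketch (line_numbers is a made-up helper, not part of comma):

```python
def line_numbers(n, size):
    """Return (block, index, reverse_index) for zero-based record number n."""
    block, index = divmod(n, size)
    return block, index, size - 1 - index


for n in range(6):
    print(n, *line_numbers(n, 4), sep=",")
# 0,0,0,3
# 1,0,1,2
# 2,0,2,1
# 3,0,3,0
# 4,1,0,3
# 5,1,1,2
```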

csv-thin thins down high bandwidth data by a given rate.

A new option, --period, allows you to specify the period of output, regardless of the rate of the input data (assuming that it's at least as fast as the desired output rate).

Using csv-paste as a high-rate input source, you can try it with:

csv-paste line-number | csv-time-stamp | csv-thin --period 0.1

By default it uses wall-clock time for clocking the data. Alternately, and useful with pre-captured data, you can use a time field in the data:

csv-paste line-number | csv-time-stamp | head -200000 > data.csv
cat data.csv | csv-thin --period 0.1 --fields t
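
To make the idea of thinning on a time field concrete, here is a minimal Python sketch. It assumes the simplest possible policy (output a record once at least one period has passed since the previous output); the exact policy used by csv-thin may differ, and thin_by_period is a made-up name.

```python
def thin_by_period(records, period):
    """Yield (t, value) records spaced at least `period` seconds apart.

    records: iterable of (timestamp_in_seconds, value), sorted by timestamp
    """
    next_due = None
    for t, value in records:
        if next_due is None or t >= next_due:
            yield t, value
            next_due = t + period


# input at 4 Hz, output at most once per second
records = [(i * 0.25, i) for i in range(10)]
print(list(thin_by_period(records, 1.0)))  # [(0.0, 0), (1.0, 4), (2.0, 8)]
```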

Multiple rectangular regions can be specified in the roi operation of cv-calc (as in the draw operation), so that:

  • everything outside these regions in the input images is set to zero, or
  • these regions are cropped out of input images into separate images (the arguments prefixed to input will be removed).

All images in the input stream must have the same number of regions. Any region with zero width or height (e.g. 0,0,0,0) will be ignored and, if needed, can be used as padding so that all images have the same number of regions.

If all the bounding boxes for an image have zero area, then the whole image will be set to zero.
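
The masking rules above can be illustrated with a few lines of numpy. This is a sketch of the described behaviour only, not cv-calc code: mask_outside is a made-up function, and treating the max corner as exclusive is an assumption.

```python
import numpy as np


def mask_outside(image, rectangles):
    """Zero everything outside the given (min_x, min_y, max_x, max_y) rectangles."""
    masked = np.zeros_like(image)
    for min_x, min_y, max_x, max_y in rectangles:
        if max_x <= min_x or max_y <= min_y:
            continue  # zero width or height: the region is ignored
        masked[min_y:max_y, min_x:max_x] = image[min_y:max_y, min_x:max_x]
    return masked


image = np.ones((4, 6), dtype=np.uint8)
print(mask_outside(image, [(1, 1, 3, 3), (0, 0, 0, 0)]).sum())  # 4: only the 2x2 region survives
print(mask_outside(image, [(0, 0, 0, 0)]).sum())                # 0: all regions empty, image zeroed
```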

To try the following examples, download this image.

> # mask in 2 rectangles
> cv-cat --file 20180101T000000.jpg \
    | csv-paste "value=800,500,1600,1700,2500,750,3100,1700;binary=8ui" "-;binary=t,3ui,s[21723870]" \
    | cv-calc roi --fields=rectangles,t,rows,cols,type --binary=8ui,t,3ui --rectangles="2,weight=5" \
    | csv-bin-cut --binary=8ui,t,3ui,s[21723870] --fields 9-13 \
    > masked.bin

> # crop out 2 rectangles ( csv-bin-cut not needed in this case )
> cv-cat --file 20180101T000000.jpg \
    | csv-paste "value=800,500,1600,1700,2500,750,3100,1700;binary=8ui" "-;binary=t,3ui,s[21723870]" \
    | cv-calc roi --crop --fields=rectangles,t,rows,cols,type --binary=8ui,t,3ui --rectangles="2,weight=5" \
    > cropped.bin

 

If you are putting together a training dataset for classification or object detection, you may need to create a uniformly distributed random selection of image crops from your image data.

The following pipeline helps you to do it. It picks random images, cuts 4 random patches of size 300x200 from each of them, and saves them as png files in the current directory.

(Note: the index parameter in file=png,,index is required, because otherwise the patches cut out of the same image would all have the same filename.)

> cat your-image-data.bin | cv-calc thin --rate 0.01 | cv-calc random-crop --width 300 --height 200 --count 4 | cv-cat "file=png,index"

See cv-calc --help for more configuration options.
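
If you want to see what "uniformly distributed" means for the patch positions, the sketch below picks top-left corners uniformly over all positions where the patch fits inside the image. It is an illustration under that assumption, not the cv-calc random-crop implementation; random_crops is a made-up name.

```python
import random


def random_crops(image_width, image_height, width, height, count, rng=random):
    """Return `count` (x, y) top-left corners, uniform over all positions that fit."""
    if width > image_width or height > image_height:
        raise ValueError("patch is larger than the image")
    return [
        (rng.randint(0, image_width - width), rng.randint(0, image_height - height))
        for _ in range(count)
    ]


# 4 patches of 300x200 from a 1000x800 image; seeded for repeatability
for x, y in random_crops(1000, 800, 300, 200, 4, random.Random(0)):
    print(x, y)
```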

 

io-cat: now can wait

To recap: io-cat is a utility extending cat functionality towards merging live streams. io-cat semantics is the same as cat on files, but it can merge streams, too, e.g. merge three streams:

> cat some-file.csv | io-cat - tcp:localhost:12345 local:some/socket > merged.csv

It supports a couple of simple merge policies: first come first serve by default, or round robin: e.g. try:

> yes STDIN | io-cat -  <( yes ANOTHER-STREAM ) <( yes THIRD-STREAM ) --round-robin 1 | head
STDIN
ANOTHER-STREAM
THIRD-STREAM
STDIN
ANOTHER-STREAM
THIRD-STREAM
STDIN
ANOTHER-STREAM
THIRD-STREAM
STDIN
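
The round-robin policy itself is simple to sketch over in-memory iterators. The code below assumes that the number given to --round-robin is the number of records taken from each stream per turn; it ignores everything that makes io-cat useful on live streams (sockets, blocking reads, and so on), and round_robin is a made-up name.

```python
from itertools import islice, repeat


def round_robin(streams, batch=1):
    """Take `batch` records from each stream in turn until all are exhausted."""
    active = [iter(s) for s in streams]
    while active:
        for stream in list(active):
            chunk = list(islice(stream, batch))
            if len(chunk) < batch:
                active.remove(stream)  # stream ran dry
            yield from chunk


merged = round_robin([repeat("STDIN"), repeat("ANOTHER-STREAM"), repeat("THIRD-STREAM")])
print(list(islice(merged, 4)))  # ['STDIN', 'ANOTHER-STREAM', 'THIRD-STREAM', 'STDIN']
```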

Now, io-cat can also wait for publishing servers to start, using the io-cat --connect-attempts option, e.g.:

> io-cat tcp:localhost:8888 --connect-attempts unlimited -v
io-cat: stream 0 (tcp:localhost:8888): connecting, attempt 1 of unlimited...
io-cat: stream 0 (tcp:localhost:8888): failed to connect
io-cat: stream 0 (tcp:localhost:8888): connecting, attempt 2 of unlimited...
io-cat: stream 0 (tcp:localhost:8888): failed to connect
...

See io-cat --help for more configuration options.

Last but not least, broadly, the right approach to persistent clients would be using a publish/subscribe middleware, of your liking. ZeroMQ is a light-weight choice (and comma zero-cat supports a core subset of it). However, if you just want to quickly cobble together simple merging of multiple streams, potentially from heterogeneous sources, io-cat is there for you.

 

The control-speed utility sets the speed of each waypoint in the path based on its position in a curve.

The turn operation calculates the angle at each waypoint with respect to its adjacent waypoints and assigns the speed according to a given maximum lateral acceleration. When given --stop-on-sharp-turn or --pivot, control-speed can implement a spot turn by outputting an extra waypoint with relative heading and no speed for each sharp turn in the trajectory.
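
To give a feel for how a lateral acceleration limit translates into speed, here is a textbook sketch: take the circle through three consecutive waypoints and apply a = v^2 / R. This is an assumption made for illustration; the formula control-speed actually uses is not given here and may well differ. The name turn_speed is made up.

```python
import math


def turn_speed(prev, current, nxt, max_acceleration, max_speed):
    """Speed limit at `current`, from the circle through three consecutive waypoints."""
    a = math.dist(prev, current)
    b = math.dist(current, nxt)
    c = math.dist(prev, nxt)
    # twice the triangle area, via the 2D cross product
    cross = abs((current[0] - prev[0]) * (nxt[1] - prev[1])
                - (current[1] - prev[1]) * (nxt[0] - prev[0]))
    if cross == 0:
        return max_speed  # collinear waypoints: treated as straight (a U-turn is not handled)
    radius = a * b * c / (2 * cross)  # circumradius R = abc / (4 * area)
    return min(max_speed, math.sqrt(max_acceleration * radius))


print(turn_speed((0.0, 0.0), (0.3, 0.3), (0.6, 0.6), 0.5, 1.0))  # straight segment: 1.0
print(round(turn_speed((0, 0), (1, 0), (1, 1), 0.5, 1.0), 3))    # right-angle turn: 0.595
```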

$ ( echo '0.0,0.0'; echo '0.3,0.3'; echo '0.6,0.6'; echo '0.6,0.9'; echo '0.6,1.2'; echo '0.9,1.2'; echo '1.2,1.2'; echo '1.5,0.9'; echo '1.8,0.6' ) > trajectory.csv

# moderate speed
$ control-speed turn --max-acceleration=0.5 --approach-speed=0.2 --fields=x,y --speed=1 < trajectory.csv > speed-turn.csv

# stop on sharp turns
$ control-speed turn --max-acceleration=0.5 --approach-speed=0.2 --fields=x,y --speed=1 --pivot < trajectory.csv > speed-pivot.csv

# visualise with trajectory as blue and speed as z axis in yellow
$ view-points "trajectory.csv;fields=x,y;shape=lines;title=trajectory" <( echo 0,0,begin )";fields=x,y,label;weight=8;color=red;title=origin" "speed-pivot.csv;fields=x,y,z;shape=lines;color=yellow;title=turn"

 

The control-speed decelerate operation moderates sudden decreases in speed along the trajectory, limiting them to a given deceleration.
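
One common way to do this kind of smoothing is a backward pass over the waypoints using the constant-deceleration relation v^2 = v_next^2 + 2*a*d. The sketch below assumes that approach; it is not taken from the control-speed source, and the function name is made up.

```python
import math


def decelerate(waypoints, deceleration):
    """Lower speeds so that every drop in speed is reachable at the given deceleration.

    waypoints: list of (x, y, speed); returns a new list of the same shape.
    """
    result = [list(w) for w in waypoints]
    # walk backwards, so that each limit propagates to all earlier waypoints
    for i in range(len(result) - 2, -1, -1):
        distance = math.dist(result[i][:2], result[i + 1][:2])
        reachable = math.sqrt(result[i + 1][2] ** 2 + 2 * deceleration * distance)
        result[i][2] = min(result[i][2], reachable)
    return [tuple(w) for w in result]


# full stop at the last waypoint: earlier speeds ramp down towards it
for x, y, speed in decelerate([(0, 0, 1.0), (1, 0, 1.0), (2, 0, 0.0)], 0.125):
    print(x, y, round(speed, 3))
# 0 0 0.707
# 1 0 0.5
# 2 0 0.0
```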

$ control-speed decelerate --fields=x,y,speed --deceleration=0.5 < speed-pivot.csv > speed-decelerate.csv

# visualise with speed as z-axis and orange color as the decelerated speed
$ view-points "trajectory.csv;fields=x,y;shape=lines;title=trajectory" <( echo 0,0,begin )";fields=x,y,label;weight=8;color=red;title=origin" \
    "speed-pivot.csv;fields=x,y,z;shape=lines;color=yellow;title=turn" "speed-decelerate.csv;fields=x,y,z;shape=lines;color=orange;title=decelerate"

If you need to quickly deploy a bunch of services for line-based or fixed-width data over TCP, local sockets, ZeroMQ, etc, you can now use io-topics, a utility in comma. You can deploy services that run continuously or that start only if there is at least one client (e.g. if they are too resource greedy).

It is perhaps not a replacement for more proper middleware like ROS, or simply systemd, but the advantages of io-topics are its light weight, its ad-hoc nature, and its ability to run a mix of transport protocols.

Try the following toy example of io-topics publish:

> # run publisher with topics a and b, with b on demand
> io-topics publish --config <( echo "a/command=csv-paste line-number"; echo "a/port=8888"; echo "b/command=csv-paste line-number"; echo "b/port=9999"; echo "b/on_demand=1" )
io-topics: publish: will run 'comma_execute_and_wait --group' with commands:
io-topics: publish:    io-publish tcp:8888   -- csv-paste line-number
io-topics: publish:    io-publish tcp:9999  --on-demand -- csv-paste line-number
    
> # in a different shell, observe that topic a keeps running even if no-one is listening,
> # whereas topic b runs only if at least one client is connected:
> socat tcp:localhost:8888 - | head -n5 # will output something like the following, since the service keeps running even if there are no clients connected:
16648534
16648535
16648536
16648537
16648538
        
> socat tcp:localhost:9999 - | head -n5 # whenever the first client connects, will start from 0, since it runs only if at least one client is connected
0
1
2
3
4

You can also create a light-weight subscriber (on the fly, if you want), as in the example below. Run publishing as in the example above and then run io-topics cat:

> io-topics cat --config <( echo "a/command=head -n5 > a.csv"; echo "a/address=tcp:localhost:8888"; echo "b/command=head -n5 > b.csv"; echo "b/address=tcp:localhost:9999" )
io-topics: cat: will run 'comma_execute_and_wait --group' with commands:
io-topics: cat:     bash -c io-cat tcp:localhost:8888   | head -n5 > a.csv
io-topics: cat:     bash -c io-cat tcp:localhost:9999   | head -n5 > b.csv
> # check output            
> cat a.csv 
203740462
203740463
203740464
203740465
203740466
> cat b.csv 
0
1
2
3
4

If you would like to suspend your log playback (e.g. for a demo, while visualising a point cloud stream or any other kind of CSV data, or while browsing your data), you can now use csv-play --interactive or csv-play -i, pressing <whitespace> to pause and resume. Try running the example below:

> echo 0 | csv-repeat --period 0.1 --yes | csv-time-stamp | csv-play --interactive
csv-play: running in interactive mode; press <whitespace> to pause or resume
20180503T032156.234658,0
20180503T032156.334336,0
20180503T032156.434497,0
20180503T032156.534721,0
20180503T032156.635077,0
20180503T032156.735428,0
20180503T032156.835511,0
20180503T032156.935653,0
20180503T032157.035926,0
csv-play: paused
csv-play: resumed
20180503T032157.136239,0
20180503T032157.236530,0

Press left or down arrow keys to output one record at a time. (Keys for outputting one block at a time: todo.)

Sometimes you may need to repeat the same record, just as linux yes does. The problem with yes is that you cannot tell it to repeat at a given time interval.

Now csv-repeat --ignore-eof can do it for you, which is useful, for example, if you need to quickly fudge a sort of heartbeat stream, a simulated data stream, or the like:

> echo hello | csv-repeat --period 0.1 --ignore-eof | head -n5
hello
hello
hello
hello
hello
> echo hello | csv-repeat --period 0.1 --ignore-eof | csv-time-stamp | head -n5
20180420T034741.498771,hello
20180420T034741.600020,hello
20180420T034741.700202,hello
20180420T034741.800367,hello
20180420T034741.900539,hello

Binary mode is supported as usual.

points-calc nearest-(min/max) and percentile operations search within a given radius around each input point. This can take a lot of time for large amounts of input data.

One way to speed things up is to, instead of finding the nearest min to each point in a given radius, find the minimum for the 27 voxels in the neighbourhood of the voxel containing the point. That computed value is assigned to each point in that voxel.

This optimization is used when points-calc nearest-(min/max) or points-calc percentile is given the --fast command line argument. For example:

> points-calc nearest-min --full --fast --fields x,y,scalar --radius 1
> points-calc percentile --percentile=0.03 --fast --fields x,y,scalar --radius 1

On a large point cloud, like that of rose street (http://perception.acfr.usyd.edu.au/data/samples/riegl/rose.st/rose.st.*.csv.gz ), the optimized operations were found to be 20 times faster for extrema and more than 100 times faster for percentiles.
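
The voxel shortcut described above can be sketched in pure Python as follows. The sketch assumes a voxel size equal to the radius and works on x,y,z,scalar points; both are assumptions for illustration, and fast_nearest_min is a made-up name, not the points-calc implementation.

```python
import math
from itertools import product


def fast_nearest_min(points, radius):
    """For each (x, y, z, scalar) point, return the minimum scalar over the
    27 voxels around the voxel that contains the point."""
    def voxel(p):
        return tuple(math.floor(c / radius) for c in p[:3])

    # first pass: minimum scalar per occupied voxel
    voxel_min = {}
    for p in points:
        key = voxel(p)
        voxel_min[key] = min(voxel_min.get(key, math.inf), p[3])

    # second pass: minimum over each voxel's 3x3x3 neighbourhood, computed once per voxel
    offsets = list(product((-1, 0, 1), repeat=3))
    neighbourhood_min = {
        key: min(
            voxel_min.get((key[0] + dx, key[1] + dy, key[2] + dz), math.inf)
            for dx, dy, dz in offsets
        )
        for key in voxel_min
    }
    return [neighbourhood_min[voxel(p)] for p in points]


points = [(0.5, 0.5, 0.5, 5), (1.5, 0.5, 0.5, 3), (5.5, 0.5, 0.5, 1)]
print(fast_nearest_min(points, 1.0))  # [3, 3, 1]
```

The speed-up comes from doing the neighbourhood search once per occupied voxel instead of once per point, at the cost of a coarser answer.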

Assume you would like to quickly find additive changes in the scene. For example, you have a static point cloud of an empty car park and would like to extract the parked cars from a stream of lidar data. If the extraction does not have to be perfect, a quick way of doing it would be using points-join --not-matching. A simple example:

> # make sample point clouds
> for i in {20..30}; do for j in {0..50}; do for k in {0..50}; do echo $i,$j,$k; done; done; done > minuend.csv
> for i in {0..50}; do for j in {20..30}; do for k in {20..30}; do echo $i,$j,$k; done; done; done > subtrahend.csv
> cat minuend.csv | points-join subtrahend.csv --radius 0.51 --not-matching | view-points "minuend.csv;colour=red;hide" "subtrahend.csv;colour=yellow;hide" "-;colour=white;title=difference"

The described car park scenario would look like:

> cat carpark-with-cars.csv | points-join --fields x,y,z "empty-carpark.csv;fields=x,y,z" --radius 0.1 --not-matching > cars-only.csv

The crude part is of course in choosing --radius value: it should be such that the spheres of a given radius around the subtrahend point cloud sufficiently overlap to capture all the points belonging to it. But then the points that are closer than the radius to the subtrahend point cloud will be filtered out, too. E.g. in the car park example above, the wheels of the cars will be chopped off at 10cm above the ground. To avoid this problem, you could for example erode somehow the subtrahend point cloud by the radius.

The described approach may be crude, but it is quick and suitable for many practical purposes.
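
The filtering rule, including the side effect on points close to the subtrahend cloud, can be shown with a brute-force Python sketch. It only illustrates the rule "keep a point if nothing in the reference cloud is within the radius"; it makes no claim about how points-join does the search, and not_matching is a made-up name.

```python
import math


def not_matching(points, reference, radius):
    """Keep the points that have no reference point within `radius`."""
    return [p for p in points if all(math.dist(p, r) > radius for r in reference)]


ground = [(0.0, 0.0, 0.0)]
scan = [
    (0.0, 0.0, 0.0),   # ground point: removed
    (0.0, 0.0, 0.05),  # bottom of a wheel, 5cm above ground: removed as well
    (0.0, 0.0, 1.0),   # car body: kept
]
print(not_matching(scan, ground, 0.1))  # [(0.0, 0.0, 1.0)]
```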

Of course, for more sophisticated change detection in point clouds, which is more accurate and takes into account view points, occlusions, additions and deletions of objects in the scene, etc, you could use points-detect-change.