Blog from June, 2018

name-value-apply is a thin wrapper around the --take-last functionality of name-value-convert. It takes multiple configuration files on the command line and, for each name, keeps only the value of its last occurrence across all the files.

This allows you to combine a bunch of configuration files consistently; for example, you may have a default configuration file for your device, then a file with some settings customised, etc.

Examples:

> # create input files
> ( echo a=5; echo b=7 ) | name-value-convert --to=xml > default_config.xml
> ( echo a=6; echo c=8 ) | name-value-convert --to=json > customised.json

> # combine configs
> name-value-apply default_config.xml customised.json
a="6"
b="7"
c="8"
 
> # check where each path-value pair came from
> name-value-apply default_config.xml customised.json --source
a="customised.json"
b="default_config.xml"
c="customised.json"
 
> # output as json
> name-value-apply default_config.xml customised.json | name-value-convert --to json
{
    "a": "6",
    "b": "7",
    "c": "8"
}
 
> # as usual, you can do it on the fly, e.g. if you would like to override parameters with command-line options
> name-value-apply default_config.xml customised.json <( echo c=10 )
a="6"
b="7"
c="10"
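
The take-last behaviour shown above can be sketched in a few lines of Python. This is only an illustration of the semantics, not how name-value-apply is implemented: the function apply_configs and the in-memory dictionaries standing in for the parsed files are assumptions made for the example.

```python
def apply_configs(named_configs):
    """Merge name-value pairs, keeping the last occurrence of each name.

    named_configs: list of (source_name, dict) pairs in command-line order.
    Returns (values, sources): the merged values and, for each name,
    the source that supplied the winning value.
    """
    values, sources = {}, {}
    for source, config in named_configs:
        for name, value in config.items():
            values[name] = value      # later files override earlier ones
            sources[name] = source    # remember where the value came from
    return values, sources


# hypothetical stand-ins for the parsed contents of the two files
default_config = {"a": "5", "b": "7"}
customised = {"a": "6", "c": "8"}

values, sources = apply_configs(
    [("default_config.xml", default_config), ("customised.json", customised)]
)
for name in sorted(values):
    print(f'{name}="{values[name]}" (from {sources[name]})')
```

Because the files are processed in command-line order, appending one more source (such as the process substitution in the last example) simply overrides whatever came before it.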

The math-array utility in snark is a trivial wrapper for a range of numpy array operations. The main purpose of math-array is to make it easy to run array operations on streams of data compatible with the csv-style utilities in comma and snark.

math-array does not attempt to substitute numpy functionality. If you need something customised, just write your own python code as usual.

Currently, it exposes three operations:

  • split
  • transpose
  • (relatively) arbitrary numpy array operation

Examples

Split:

> ( echo some_other_stuff,0,1,2,3,4,5; echo more_other_stuff,6,7,8,9,10,11 ) | csv-to-bin s[32],6f | math-array split --shape 3,2 --header-size 32 | csv-from-bin s[32],2f
some_other_stuff,0,1
some_other_stuff,2,3
some_other_stuff,4,5
more_other_stuff,6,7
more_other_stuff,8,9
more_other_stuff,10,11

Transpose:

> # transpose
> ( echo 0,1,2,3,4,5; echo 6,7,8,9,10,11 ) | csv-to-bin 6f | math-array transpose --to-axes 1,0 --shape 3,2 | csv-from-bin 6f
0,2,4,1,3,5
6,8,10,7,9,11
 
> # the record has not only the array, but also other fields
> ( echo some_other_stuff,0,1,2,3,4,5; echo more_other_stuff,6,7,8,9,10,11 ) | csv-to-bin s[32],6f | math-array transpose --to-axes 1,0 --shape 3,2 --header-size 32 | csv-from-bin s[32],6f
some_other_stuff,0,2,4,1,3,5
more_other_stuff,6,8,10,7,9,11

(Relatively) arbitrary numpy array operation:

> # swapaxes
> ( echo some_other_stuff,0,1,2,3,4,5; echo more_other_stuff,6,7,8,9,10,11 ) | csv-to-bin s[32],6f | math-array "np.swapaxes, axis1 = 0, axis2 = 1" --shape 3,2 --header-size 32 | csv-from-bin s[32],6f
some_other_stuff,0,2,4,1,3,5
more_other_stuff,6,8,10,7,9,11

See math-array --help for more details.
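
To see what split and transpose do to a single record, here is a minimal sketch in plain Python (deliberately without numpy, so it runs anywhere). The helper names reshape, transpose and flatten are made up for the illustration; math-array itself delegates to numpy.

```python
def reshape(flat, rows, cols):
    """Interpret a flat record as a rows x cols array (row-major)."""
    return [flat[r * cols:(r + 1) * cols] for r in range(rows)]


def transpose(matrix):
    """Swap the two axes of a 2D array."""
    return [list(column) for column in zip(*matrix)]


def flatten(matrix):
    """Turn a 2D array back into a flat record."""
    return [value for row in matrix for value in row]


header = "some_other_stuff"
record = [0, 1, 2, 3, 4, 5]
array = reshape(record, 3, 2)              # [[0, 1], [2, 3], [4, 5]]

# split: one output record per array row, with the header repeated
split_records = [(header, *row) for row in array]

# transpose: same number of values per record, axes swapped
transposed = flatten(transpose(array))

for r in split_records:
    print(",".join(str(v) for v in r))
print(",".join(str(v) for v in [header] + transposed))
```

The printed lines match the first three lines of the split example and the first line of the transpose example with a header.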

 

Among other things, csv-paste can number the lines of its output. Now, if line-number appears several times on the command line, each instance can take its own parameters. Examples:

> # append single line number
> seq 0 11 | csv-paste - line-number
0,0
1,1
2,2
3,3
4,4
5,5
6,6
7,7
8,8
9,9
10,10
11,11
 
> # number blocks of records
> seq 0 12 | csv-paste - line-number --size 3
0,0
1,0
2,0
3,1
4,1
5,1
6,2
7,2
8,2
9,3
10,3
11,3
12,4
 
> # create multiple indices (e.g. if you need to express multidimensional array indices)
> seq 0 11 | csv-paste - "line-number;size=4" "line-number;size=4;index"
0,0,0
1,0,1
2,0,2
3,0,3
4,1,0
5,1,1
6,1,2
7,1,3
8,2,0
9,2,1
10,2,2
11,2,3
 
> # reverse indices (e.g. to use with csv-blocks down your pipeline)
> seq 0 11 | csv-paste - "line-number;size=4" "line-number;size=4;index;reverse"
0,0,3
1,0,2
2,0,1
3,0,0
4,1,3
5,1,2
6,1,1
7,1,0
8,2,3
9,2,2
10,2,1
11,2,0

Like other comma utilities, csv-paste can perform all these operations on ascii or binary data. See csv-paste --help for more configuration possibilities.
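
The numbering schemes in the examples above boil down to integer division and remainder. The sketch below reproduces them in Python; the function line_numbers and its keyword arguments are invented for the illustration (they merely mirror the size, index and reverse parameters), and it assumes that all blocks are complete.

```python
def line_numbers(count, size=1, index=False, reverse=False):
    """Yield one number per line.

    size:    number of lines per block
    index:   if False, yield the block number; if True, the position in the block
    reverse: with index, count down within each block instead of up
    """
    for line in range(count):
        block, offset = divmod(line, size)
        if not index:
            yield block
        elif reverse:
            yield size - 1 - offset
        else:
            yield offset


rows = list(zip(
    range(12),
    line_numbers(12, size=4),
    line_numbers(12, size=4, index=True, reverse=True),
))
for row in rows:
    print(",".join(str(v) for v in row))
```

With size=1 and no index, the block number is simply the line number, which corresponds to the first example.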

csv-thin thins down high bandwidth data by a given rate.

A new option, --period, allows you to specify the period of output, regardless of the rate of the input data (assuming that it's at least as fast as the desired output rate).

Using csv-paste as a high-rate input source, you can try it with:

csv-paste line-number | csv-time-stamp | csv-thin --period 0.1

By default, it uses wall-clock time for clocking the data. Alternatively, which is useful with pre-captured data, you can use a time field in the data:

csv-paste line-number | csv-time-stamp | head -200000 > data.csv
cat data.csv | csv-thin --period 0.1 --fields t
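
For intuition, thinning by period on a time field could look like the Python sketch below. It rests on an assumption about the policy: a record is emitted if at least one period has passed since the previously emitted record. The real csv-thin may choose records differently, so treat this as an illustration only; thin_by_period is a made-up name, and timestamps are integer milliseconds to keep the arithmetic exact.

```python
def thin_by_period(records, period):
    """Yield (timestamp, payload) records no more often than once per period.

    records: iterable of (timestamp, payload), timestamps non-decreasing
    period:  minimum time between emitted records, in the same units
    """
    next_due = None
    for timestamp, payload in records:
        if next_due is None or timestamp >= next_due:
            yield timestamp, payload
            next_due = timestamp + period   # assumed policy, see lead-in


# one second of synthetic 1 kHz data, timestamps in milliseconds
data = [(t, f"record-{t}") for t in range(1000)]
thinned = list(thin_by_period(data, period=100))
print(len(thinned), [t for t, _ in thinned])
```

The output rate depends only on the period, not on the input rate, as long as the input is at least as fast as the desired output.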

Multiple rectangular regions can be specified in the roi operation of cv-calc (as in the draw operation), so that:

  • everything outside these regions in the input images is set to zero, or
  • these regions are cropped out of input images into separate images (the arguments prefixed to input will be removed).

All images in the input stream must have the same number of regions. Any region with zero width or height (e.g. 0,0,0,0) will be ignored and, if needed, can be used as padding so that all images have the same number of regions.

If all the bounding boxes for an image have zero area, then the whole image will be set to zero.
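
The masking rules above can be illustrated on a tiny image in plain Python. The rectangle layout (min_x, min_y, max_x, max_y, with the maximum corner exclusive) is an assumption made for this sketch, and mask_outside is a hypothetical helper rather than anything cv-calc exposes.

```python
def mask_outside(image, rectangles):
    """Set to zero every pixel that lies outside all the given rectangles.

    image:      list of rows, each a list of pixel values
    rectangles: (min_x, min_y, max_x, max_y) tuples; layout is an assumption
    """
    rows, cols = len(image), len(image[0])
    keep = [[False] * cols for _ in range(rows)]
    for min_x, min_y, max_x, max_y in rectangles:
        if max_x <= min_x or max_y <= min_y:
            continue  # zero width or height: the region is ignored
        for y in range(max(min_y, 0), min(max_y, rows)):
            for x in range(max(min_x, 0), min(max_x, cols)):
                keep[y][x] = True
    return [
        [value if keep[y][x] else 0 for x, value in enumerate(row)]
        for y, row in enumerate(image)
    ]


image = [[9] * 6 for _ in range(4)]
masked = mask_outside(image, [(0, 0, 2, 2), (4, 1, 6, 3), (0, 0, 0, 0)])
for row in masked:
    print(row)

# all regions have zero area: the whole image is zeroed
blank = mask_outside(image, [(0, 0, 0, 0), (0, 0, 0, 0)])
```

The third rectangle has zero area, so it only pads the region count and does not affect the result.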

To try the following examples, download this image.

> # mask in 2 rectangles
> cv-cat --file 20180101T000000.jpg \
    | csv-paste "value=800,500,1600,1700,2500,750,3100,1700;binary=8ui" "-;binary=t,3ui,s[21723870]" \
    | cv-calc roi --fields=rectangles,t,rows,cols,type --binary=8ui,t,3ui --rectangles="2,weight=5" \
    | csv-bin-cut --binary=8ui,t,3ui,s[21723870] --fields 9-13 \
    > masked.bin

> # crop out 2 rectangles ( csv-bin-cut not needed in this case )
> cv-cat --file 20180101T000000.jpg \
    | csv-paste "value=800,500,1600,1700,2500,750,3100,1700;binary=8ui" "-;binary=t,3ui,s[21723870]" \
    | cv-calc roi --crop --fields=rectangles,t,rows,cols,type --binary=8ui,t,3ui --rectangles="2,weight=5" \
    > cropped.bin

 

If you are putting together a training dataset for classification or object detection, you may need to create a uniformly distributed random selection of image crops from your image data.

The following pipeline helps you to do it. It picks random images, cuts 4 random patches of size 300x200 from each of them, and saves them as png files in the current directory.

(Note: the index parameter in file=png,,index is required, because otherwise the patches cut out of the same image would have the same filename.)

> cat your-image-data.bin | cv-calc thin --rate 0.01 | cv-calc random-crop --width 300 --height 200 --count 4 | cv-cat "file=png,index"

See cv-calc --help for more configuration options.
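
If you wonder what picking uniformly distributed crops involves, here is a minimal Python sketch of choosing the crop positions. The function random_crop_origins and the 1920x1080 image size are assumptions made for the example; how cv-calc random-crop actually draws its positions is not described here.

```python
import random


def random_crop_origins(image_width, image_height, width, height, count, rng=random):
    """Return `count` top-left corners of width x height crops.

    Each corner is drawn uniformly from the positions at which the crop
    still fits entirely inside the image.
    """
    if width > image_width or height > image_height:
        raise ValueError("crop is larger than the image")
    return [
        (rng.randint(0, image_width - width), rng.randint(0, image_height - height))
        for _ in range(count)
    ]


rng = random.Random(0)  # seeded only to make the example repeatable
origins = random_crop_origins(1920, 1080, 300, 200, 4, rng)
for x, y in origins:
    print(f"crop at x={x}, y={y}, size 300x200")
```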

 

io-cat: now can wait

To recap: io-cat is a utility extending cat functionality towards merging live streams. io-cat semantics is the same as cat on files, but it can merge streams, too, e.g. merge three streams:

> cat some-file.csv | io-cat - tcp:localhost:12345 local:some/socket > merged.csv

It supports a couple of simple merge policies: first come, first served by default, or round robin. For example, try:

> yes STDIN | io-cat -  <( yes ANOTHER-STREAM ) <( yes THIRD-STREAM ) --round-robin 1 | head
STDIN
ANOTHER-STREAM
THIRD-STREAM
STDIN
ANOTHER-STREAM
THIRD-STREAM
STDIN
ANOTHER-STREAM
THIRD-STREAM
STDIN
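
The round-robin policy in the example above can be modelled in Python as follows. The sketch assumes that the number given to --round-robin is how many records are taken from each stream per turn; the generator round_robin is written for this illustration and says nothing about how io-cat is implemented.

```python
from itertools import islice, repeat


def round_robin(streams, chunk=1):
    """Take `chunk` records from each stream in turn until all are exhausted."""
    iterators = [iter(stream) for stream in streams]
    while iterators:
        for iterator in list(iterators):      # copy: we may drop finished streams
            taken = list(islice(iterator, chunk))
            yield from taken
            if len(taken) < chunk:            # this stream has ended
                iterators.remove(iterator)


streams = [repeat("STDIN"), repeat("ANOTHER-STREAM"), repeat("THIRD-STREAM")]
merged = list(islice(round_robin(streams, chunk=1), 10))  # like `| head`
print("\n".join(merged))
```

Streams that end early are simply dropped from the rotation, so the remaining ones keep taking turns.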

Now, io-cat can also wait for publishing servers to start, using the --connect-attempts option, e.g.:

> io-cat tcp:localhost:8888 --connect-attempts unlimited -v
io-cat: stream 0 (tcp:localhost:8888): connecting, attempt 1 of unlimited...
io-cat: stream 0 (tcp:localhost:8888): failed to connect
io-cat: stream 0 (tcp:localhost:8888): connecting, attempt 2 of unlimited...
io-cat: stream 0 (tcp:localhost:8888): failed to connect
...

See io-cat --help for more configuration options.
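
If you need the same waiting behaviour inside your own client code, a retry loop along the following lines would do. It is a sketch under stated assumptions, not a description of io-cat internals: connect_with_attempts, its delay parameter and the flaky_connect stand-in are all invented for the example.

```python
import time


def connect_with_attempts(connect, attempts=None, delay=1.0, sleep=time.sleep):
    """Call connect() until it succeeds.

    attempts: maximum number of attempts; None means unlimited
    delay:    seconds to wait between failed attempts
    """
    if attempts is not None and attempts < 1:
        raise ValueError("attempts must be at least 1 or None")
    attempt = 0
    while True:
        attempt += 1
        try:
            return connect()
        except OSError:                      # includes ConnectionRefusedError
            if attempts is not None and attempt >= attempts:
                raise                        # out of attempts: give up
            sleep(delay)


# stand-in for a server that comes up after two failed attempts
outcomes = iter([False, False, True])


def flaky_connect():
    if not next(outcomes):
        raise ConnectionRefusedError("server not up yet")
    return "connected"


result = connect_with_attempts(flaky_connect, attempts=None, delay=0.0)
print(result)

# with a real server, connect could be, for example:
#   lambda: socket.create_connection(("localhost", 8888))
```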

Last but not least: broadly, the right approach to persistent clients would be to use a publish/subscribe middleware of your liking. ZeroMQ is a light-weight choice (and comma zero-cat supports a core subset of it). However, if you just want to quickly cobble together simple merging of multiple streams, potentially from heterogeneous sources, io-cat is there for you.