Blog from February, 2017

If you have csv records with multiple keys and would like to assign unique ids to those records, you could use csv-enumerate. (In particular, it would help to overcome the current limitation of csv-calc, which cannot handle multiple id fields.)

csv-enumerate appends an id to each input record.

For example:

> ( echo 20170101T000000,hello ; echo 20170101T000000,world ; echo 20170101T000001,hello ; echo 20170101T000000,world ) | csv-enumerate --fields ,greeting
> ( echo 20170101T000000,hello ; echo 20170101T000000,world ; echo 20170101T000001,hello ; echo 20170101T000000,world ) | csv-enumerate --fields t,greeting

You can also output a list of all values, their ids, and the number of entries with each id, e.g.:

> ( echo 20170101T000000,hello ; echo 20170101T000000,world ; echo 20170101T000001,hello ; echo 20170101T000000,world ; echo 20170101T000005,world ) | csv-enumerate --fields ,greeting --map
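If csv-enumerate is not at hand, the idea is easy to sketch in plain Python. The snippet below mirrors the first example above and assumes ids are assigned sequentially in order of first appearance (a plausible convention, not necessarily csv-enumerate's exact one); it also prints a --map-style summary of value, id, and count:

```python
import csv, io, collections

# toy input mirroring the example above: timestamp,greeting records,
# enumerated by the "greeting" field only (the empty first field name)
data = """\
20170101T000000,hello
20170101T000000,world
20170101T000001,hello
20170101T000000,world"""

ids = {}                        # value -> id, in order of first appearance
counts = collections.Counter()  # number of records carrying each value
enumerated = []
for row in csv.reader(io.StringIO(data)):
    key = row[1]
    ids.setdefault(key, len(ids))
    counts[key] += 1
    enumerated.append(",".join(row + [str(ids[key])]))

print("\n".join(enumerated))
# --map-style summary: value, id, count
for value, i in ids.items():
    print(f"{value},{i},{counts[value]}")
```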

Binary mode is supported as usual.


Just another example of how comma and snark utilities can be combined to cobble together something that works and can easily be polished further, all in a matter of minutes.

Assume you have information about the terrain and would like to find a path from A to B.

The following shows how a first cut could be done in a few command lines. It uses the graph-search utility with distance as the objective function, but it is easy to parametrize graph-search to use a different objective function, e.g. one based on gradient. It is also easy to add simple post-processing for better obstacle avoidance, path smoothing, etc.

As usual, convert the pipelines to binary to improve their performance.

Sample dataset

> # download dataset
> curl | gunzip > ground.csv
> curl | gunzip > non-ground.csv
> # make sense of data
> view-points "ground.csv;fields=x,y,z,r,g,b" "non-ground.csv;fields=x,y,z,r,g,b"

Make search graph

> # assign graph vertex ids
> cat ground.csv | cut -d, -f1,2,3 | csv-paste - line-number > nodes.csv
> # make graph edges (simply densely mesh points within a given radius)
> cat nodes.csv | points-join nodes.csv --radius 0.12 --all | csv-eval --fields=,,,a,,,,b --output-if "a != b" > edges.csv
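The meshing step can be sketched in plain Python. The snippet below uses made-up 2D points instead of the real point cloud, assigns ids like csv-paste - line-number, and connects every pair of distinct points within a radius, which is roughly what the points-join / csv-eval pipeline above does:

```python
import math

# toy 2D points standing in for the x,y(,z) columns of ground.csv
points = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (1.0, 1.0)]

# assign vertex ids, like csv-paste - line-number
nodes = [(x, y, i) for i, (x, y) in enumerate(points)]

# densely mesh: connect every pair within a radius, dropping self-edges
# like the csv-eval --output-if "a != b" filter
radius = 0.12
edges = [(a[2], b[2])
         for a in nodes for b in nodes
         if a[2] != b[2] and math.dist(a[:2], b[:2]) <= radius]

print(edges)  # the last point is isolated: it is beyond the radius
```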

> # view graph
> view-points "nodes.csv;fields=x,y,z;colour=white" "edges.csv;shape=line;colour=grey;fields=first,,second"


Our search graph is very simplistic, but we got it with no effort. One can easily add more on top: filter out no-go zones, add or remove edges, etc.

Search for a path

> # search for a path from the node with id 5000 to the node with id 100000 (remember how we numbered the graph nodes with csv-paste above)
> graph-search --from 5000 --to 100000 --nodes "nodes.csv;fields=x,y,z,id" --edges "edges.csv;fields=,,,source,,,,target" > path.csv
> # view the result
> from=5000 ; to=100000 ; view-points "nodes.csv;fields=x,y,z;colour=grey;weight=1" "edges.csv;shape=line;colour=grey;fields=first,,second" <( cat nodes.csv | egrep ",$from$|,$to$" )";colour=yellow;weight=10;fields=x,y,z,label" <( cat path.csv | csv-eval --fields ,,z "z=z+0.25" )";shape=lines;colour=yellow" <( cat path.csv | csv-eval --fields ,,z "z=z+0.25" )";weight=3;colour=yellow"
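Under the hood, a shortest-path search over such an edge list is the classic Dijkstra algorithm. The sketch below is not graph-search itself, just the idea, run on a made-up toy graph with per-edge costs:

```python
import heapq, collections

def shortest_path(edges, source, target):
    # edges: (from, to, cost) triples; plain Dijkstra
    graph = collections.defaultdict(list)
    for a, b, w in edges:
        graph[a].append((b, w))
    queue = [(0.0, source, [source])]  # (cost so far, node, path so far)
    seen = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, w in graph[node]:
            if nxt not in seen:
                heapq.heappush(queue, (cost + w, nxt, path + [nxt]))
    return float("inf"), []

# toy graph: going 0 -> 1 -> 3 is cheaper than 0 -> 2 -> 3
edges = [(0, 1, 1.0), (1, 3, 1.0), (0, 2, 1.0), (2, 3, 5.0)]
print(shortest_path(edges, 0, 3))  # (2.0, [0, 1, 3])
```

With distance as the edge cost this gives the shortest path; swapping in a different cost (as done with the green-ness cost below) changes which path wins without changing the algorithm.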

Our path is fairly jagged. There are lots of smoothing methods that are relatively easy to implement. As a quick fix you could simply use a higher --radius value for points-join, at the price of higher computation time. Try e.g. points-join ... --radius 1.5; it takes longer, but the path is much smoother:

Adding cost to edges

The path we got is based on minimum distance. We could add a cost to each edge in edges.csv; graph-search will then use the cost instead of the distance.

Assume it is expensive for us to drive on the grass (because the gardener will charge us for the damage).

> # quick and dirty: add to each vertex the amount of green in it (the formula for colour is tuned for demonstration only)
> cat ground.csv | csv-eval --fields ,,,r,g,b "t=(-1.3*r+2*g-1.3*b)*1.1+255" | cut -d, -f1,2,3,7 | csv-paste - line-number > nodes.with-cost.csv
> # get edges with amount of green as their cost
> time cat nodes.with-cost.csv | points-join nodes.with-cost.csv --radius 0.2 --all -v | csv-eval --fields=,,,,a,,,,,b --output-if "a != b" > edges.with-cost.csv
> # search path with cost
> graph-search --from 5000 --to 100000 --nodes "nodes.with-cost.csv;fields=,,,,id" --edges "edges.with-cost.csv;fields=,,,,source,,,,cost,target" > path.avoid-grass.csv
> # search path by distance
> graph-search --from 5000 --to 100000 --nodes "nodes.with-cost.csv;fields=x,y,z,,id" --edges "edges.with-cost.csv;fields=,,,,source,,,,,target" >
> # view results
> from=5000 ; to=100000 ; view-points "nodes.with-cost.csv;fields=x,y,z,scalar;color=290:360,jet" <( cat nodes.with-cost.csv | egrep ",$from$|,$to$" )";colour=yellow;weight=10;fields=x,y,z,,label" <( cat path.avoid-grass.csv | csv-eval --fields ,,z "z=z+0.25" )";shape=lines;colour=yellow" <( cat path.avoid-grass.csv | csv-eval --fields ,,z "z=z+0.25" )";weight=3;colour=yellow" <( cat | csv-eval --fields ,,z "z=z+0.25" )";shape=lines;colour=green;title=by-distance" <( cat | csv-eval --fields ,,z "z=z+0.25" )";weight=3;colour=magenta"

As you can see, the path by distance (coloured magenta) is almost a straight line, while the grass-avoiding path (coloured yellow) tries to avoid the green areas, albeit not completely. If in the formula "t=(-1.3*r+2*g-1.3*b)*1.1+255" above you use a greater multiplier than 1.1 (e.g. 1.5), driving on grass becomes so prohibitive that the path goes around the lawn and avoids the green completely.
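The effect of the formula and of the multiplier is easy to check by plugging in sample colours. The RGB values below are made up for illustration: a greener point gets a higher cost, and a larger multiplier widens the gap:

```python
def green_cost(r, g, b, k=1.1):
    # the quick-and-dirty formula from the csv-eval call above;
    # k is the multiplier discussed in the text (1.1 in the example)
    return (-1.3 * r + 2 * g - 1.3 * b) * k + 255

grass = green_cost(60, 180, 60)       # grass-like colour
pavement = green_cost(128, 128, 128)  # grey, pavement-like colour
print(grass, pavement)                # grass costs more than pavement
```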

This example does not demonstrate anything novel: these are all well-known, decades-old algorithms. Rather, it demonstrates how in just three command lines you can build a reasonable drivable path on terrain represented by a fairly arbitrary point cloud.


When processing binary fixed-width data, comma and snark utilities use the byte order of the computer on which they run. E.g. most desktops (x86 architecture) are little endian, while some other platforms (e.g. certain ARM configurations) may be big endian.
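Byte order is easy to demonstrate with Python's struct module: the same 32-bit integer serialized little endian and big endian yields the same bytes in reverse order:

```python
import struct

n = 0x01020304
little = struct.pack("<I", n)  # little endian: least significant byte first
big = struct.pack(">I", n)     # big endian: most significant byte first
print(little.hex())  # 04030201
print(big.hex())     # 01020304
```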

If you have fixed-width data (e.g. from an external party or a device) whose endianness (byte order) differs from your computer's, there are a number of ways to deal with it at various levels (e.g. using htonl()-style functions, python serialization, comma::packed classes, etc).

If you just want to quickly convert your data to the desired endianness, you can now use csv-bin-reverse, e.g.:

> # reverse all the fields in the data
> cat big-endian-data.bin | csv-bin-reverse t,3d,2ui
> # reverse some fields in the data
> cat big-endian-data.bin | csv-bin-reverse t,3d,2ui --fields 2-4

To see how it works, you could run something like:

> echo 0,1,2,3,4,5,6,7 | csv-to-bin 8ub | csv-bin-reverse uw,ui,uw | csv-from-bin 8ub
> echo 0,1,2,3,4,5,6,7 | csv-to-bin 8ub | csv-bin-reverse uw,ui,uw --fields 2 | csv-from-bin 8ub
> # etc...
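What csv-bin-reverse does to the first example can be mimicked in a few lines of Python, assuming it reverses the bytes within each field (which is what an endianness swap amounts to). The field widths below follow the uw,ui,uw format of the example, i.e. 2, 4, and 2 bytes:

```python
record = bytes(range(8))  # the 8 bytes produced by csv-to-bin 8ub
widths = [2, 4, 2]        # field widths for the format uw,ui,uw

out = b""
offset = 0
for w in widths:
    out += record[offset:offset + w][::-1]  # reverse bytes within each field
    offset += w

print(list(out))  # [1, 0, 5, 4, 3, 2, 7, 6]
```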