Blog

cv-cat is now able to perform pixel clustering by color using the k-means algorithm.

For example (convert-to=f,0.0039 converts the image to floats and scales 8-bit pixel values by roughly 1/255, i.e. into the range [0,1]):

> cv-cat --file rippa.png "convert-to=f,0.0039;kmeans=4;view;null" --stay

input image:

output image (4 clusters):

A new convenience utility ros-from-csv is now available in snark. It reads CSV records and converts them into ROS messages with the usual conveniences of csv streams (customised fields, binary format, stream buffering/flushing, etc).

Disclaimer: ros-from-csv is a python application and therefore may not perform well on streams that require high bandwidth or low latency.

You could try it out, using the ROS tutorial Understanding Topics (http://wiki.ros.org/ROS/Tutorials/UnderstandingTopics):

Run ROS Tutorial nodes:

> # in a new shell
> roscore
> # in a new shell
> rosrun turtlesim turtle_teleop_key

 

Send your own messages on the topic, using ros-from-csv:

> echo 1,2,3,4,5,6 | ros-from-csv /turtle1/cmd_vel

Or do a dry run:

> echo 1,2,3,4,5,6 | ros-from-csv /turtle1/cmd_vel --dry
linear: 
  x: 1.0
  y: 2.0
  z: 3.0
angular: 
  x: 4.0
  y: 5.0
  z: 6.0

You can also explicitly specify the message type:

> # dry run
> echo 1,2,3 | ros-from-csv --type geometry_msgs.msg.Point --dry
x: 1.0
y: 2.0
z: 3.0
 
> # send to a topic
> echo 1,2,3 | ros-from-csv --type geometry_msgs.msg.Point some-topic

A new convenience utility ros-to-csv is now available in snark. It outputs ROS messages, from rosbags or from live topics, as CSV.

You could try it out, using the ROS tutorial Understanding Topics (http://wiki.ros.org/ROS/Tutorials/UnderstandingTopics):

Run ROS Tutorial nodes:

> # in a new shell
> roscore
> # in a new shell
> rosrun turtlesim turtlesim_node
> # in a new shell
> rosrun turtlesim turtle_teleop_key

Run ros-to-csv; then, in the shell where you run turtle_teleop_key, press the arrow keys to observe something like:

> # in a new shell
> ros-to-csv /turtle1/cmd_vel --verbose
ros-to-csv: listening to topic '/turtle1/cmd_vel'...
0,0,0,0,0,-2
0,0,0,0,0,2
-2,0,0,0,0,0
-2,0,0,0,0,0
0,0,0,0,0,2
0,0,0,0,0,-2
2,0,0,0,0,0
0,0,0,0,0,2

If you log some data in a rosbag:

> # in a new shell
> rosbag record /turtle1/cmd_vel

You could convert it to csv with a command like:

> ros-to-csv /turtle1/cmd_vel --bag 2017-11-06-14-43-34.bag
2,0,0,0,0,0
0,0,0,0,0,-2
-2,0,0,0,0,0
0,0,0,0,0,2
0,0,0,0,0,2
0,0,0,0,0,-2
2,0,0,0,0,0

Sometimes, you have a large file or input stream that is mostly sorted, which you would like to fully sort (e.g. in ascending order).

More formally, suppose you know that for any records Rn and Rm in your stream such that m - n > N, Rn < Rm, where N is a constant; in other words, each record is at most N positions away from its sorted place.

Now, you can sort such a stream using csv-sort --sliding-window=<N>:

 

> ( echo 3; echo 1; echo 2; echo 5; echo 4 ) | csv-sort --sliding-window 3 --fields a
1
2
3
4
5
> ( echo 4; echo 5; echo 2; echo 1; echo 3 ) | csv-sort --sliding-window 3 --fields a --reverse
5
4
3
2
1

As usual, you can sort by multiple key fields (e.g. csv-sort --sliding-window=10 --fields=a,b,c), sort block by block (e.g. csv-sort --sliding-window=10 --fields=t,block), etc.

Sometimes, you have a large file or input stream that is mostly sorted by some fields, with just a few records out of order now and then. You may not care about those few outliers; all you want is most of your data sorted.

Now, you can discard the out-of-order records, using csv-sort, e.g.:

> ( echo 0; echo 1; echo 2; echo 1; echo 3 ) | csv-sort --discard-out-of-order --fields a
0
1
2
3
> ( echo 3; echo 2; echo 1; echo 2; echo 0 ) | csv-sort --discard-out-of-order --fields a --reverse
3
2
1
0

As usual, you can sort by multiple key fields (e.g. csv-sort --discard-out-of-order --fields=a,b,c), sort block by block (e.g. csv-sort --discard-out-of-order --fields=t,block), etc.

The ratio and linear-combination operations of cv-cat have been extended to support assignment to multiple channels. Previously, these operations would take up to 4 input channels (symbolically always named r, g, b, and a, regardless of the actual contents of the data) and produce a single-channel, grey-scale output. Now you can assign up to four channels:

ratio syntax
... | cv-cat "ratio=(r-b)/(r+b),(r-g)/(r+g),r+b,r+g"

The right-hand side of the ratio / linear-combination operations contains comma-separated expressions defining each of the output channels through the input channels. The number of output channels is the number of comma-separated fields; it may differ from the number of input channels. As a shortcut, an empty field, such as in

ratio syntax shortcut
... | cv-cat "ratio=,r+g+b,"

is interpreted as channel pass-through. In the example above, the output has three channels: channels 0 and 2 are assigned verbatim from input channels 0 and 2 (r and b, symbolically), and channel 1 (symbolic g) is assigned the sum of all three input channels.

As yet another shortcut, cv-cat provides a shuffle operation that re-arranges the input channels without changing their values:

shuffle syntax
... | cv-cat "shuffle=b,g,r,r"

In this case, the order of the first 3 channels is reversed, while the former channel r is also duplicated into channel 3 (alpha). Internally, shuffling is implemented as a restricted case of linear combination, and therefore, other usual rules apply: the number of output channels is up to 4, it does not depend on the number of input channels, and an empty field in the right-hand side is interpreted as channel pass-through.
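
For instance, to swap the red and blue channels while passing green through via an empty field (a quick sketch based on the syntax above):

shuffle with pass-through
... | cv-cat "shuffle=b,,r"

Here the output has three channels: channel 0 gets the input b, channel 1 is passed through unchanged (symbolic g), and channel 2 gets the input r.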

When using view-points, there often is a need to quickly visualise or hide several point clouds or other graphic primitives.

Now, you can group data in view-points, using the groups keyword. A source can be assigned to one or more groups via the groups argument. Basic usage is:

view-points "...;groups=g1,g2"

For example, if we have two graphs as follows:

$ cat <<EOF > edges01.csv
1,1,0,4,4,0
4,4,0,4,8,0
4,4,0,8,4,0
EOF

$ cat <<EOF > nodes01.csv
1,1,0,Node00
4,4,0,Node01
4,8,0,Node02
8,4,0,Node03
EOF

$ cat <<EOF > edges02.csv
4,9,1,4,12,1
4,12,1,0,9,1
4,9,1,0,4,1
0,4,1,0,9,1
EOF

$ cat <<EOF > nodes02.csv
0,4,1,Node20
0,9,1,Node21
4,9,1,Node22
4,12,1,Node23
EOF

We can separate the graphs as well as group together nodes and edges of different graphs as follows:

$ view-points "nodes01.csv;fields=x,y,z,label;colour=yellow;weight=5;groups=graph01,nodes,all" \
	"edges01.csv;fields=first/x,first/y,first/z,second/x,second/y,second/z;shape=line;colour=yellow;groups=graph01,edges,all" \
	"nodes02.csv;fields=x,y,z,label;colour=green;weight=5;groups=graph02,nodes,all" \
	"edges02.csv;fields=first/x,first/y,first/z,second/x,second/y,second/z;shape=line;colour=green;groups=graph02,edges,all"

Try switching the checkboxes for the various groups (e.g. "graph01", "nodes", etc.) on and off and observe the effect.

A quick note on new operations in the cv-calc utility. Time does not permit proper examples, but hopefully cv-calc --help is sufficient to give you an idea.

cv-calc grep

Output only those input images that conform to a certain condition. Currently, only a min/max number or ratio of non-zero pixels is supported, but the condition can be any set of filters applied to the input image (see cv-cat --help --verbose for the list of available filters).

Example: Output only images that have at least 60% of pixels darker than a given threshold:

> cat images.bin | cv-calc grep --filters="convert-to=f,0.0039;invert;threshold=0.555" --non-zero=ratio,0.6

cv-calc stride

Stride over the input image with a given kernel (just like a convolution stride) and output the resulting images.
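
A hypothetical invocation (the option names below are assumptions, not the verified interface; see cv-calc --help):

# a sketch: slide a 400x400 window over each input image in 300-pixel steps,
# outputting one image per window position
cat images.bin | cv-calc stride --shape=400,400 --strides=300,300 > strided.bin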

cv-calc thin

Thin the image stream by a given rate or a desired frames-per-second number.
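
A hypothetical invocation (the option name below is an assumption; see cv-calc --help):

# a sketch: keep roughly one image in three
cat images.bin | cv-calc thin --rate=0.3 > thinned.bin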

csv-shape is a new utility for reshaping csv data.

For now, only one operation is implemented: concatenate:

Concatenate by Grouping Input Records

> ( echo 1,a; echo 2,b; echo 3,c; echo 4,d; ) | csv-shape concatenate -n 2
1,a,2,b
3,c,4,d

Note: For ascii text inputs the records do not have to be regular or even have the same number of fields.
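
For instance, a sketch of what this permits (the output shown is an assumption, not a verified run):

> ( echo 1; echo 2,b; echo 3,c,x; echo 4 ) | csv-shape concatenate -n 2
1,2,b
3,c,x,4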

Concatenate by Sliding Window

ASCII:

> ( echo 1,a; echo 2,b; echo 3,c; echo 4,d; ) | csv-shape concatenate -n 2 --sliding-window
1,a,2,b
2,b,3,c
3,c,4,d

Binary:

> ( echo 1,a; echo 2,b; echo 3,c; echo 4,d; ) | csv-to-bin ui,c | csv-shape concatenate -n 2 --sliding-window --binary ui,c | csv-from-bin ui,c,ui,c
1,a,2,b
2,b,3,c
3,c,4,d

This is a brief introduction to new cv-cat filters:

 

Filter: Accumulated

This filter calculates the pixel-wise (and channel-wise) average over a sequential series of input images.

As it relies on accumulating sequential input images, this filter runs in serial mode in cv-cat. This has implications when it is used with 'forked' image processing.

However, parallel processing is still utilised across the image-rows dimension.

Please download the following file, images.bin, which contains 8 images showing movement. To view the images:

cat images.bin | cv-cat "view=250;null"


Average:

The average is calculated using all accumulated input images; the output is also 8 images.

cat images.bin | cv-cat "accumulated=average;view=250;null"

The 6th output image is the average of the first 6 input images, the 7th is the average of the first 7, and so on.

 

Exponential Moving Average (EMA):

The average is calculated over a sliding window of images; here a sliding window of 3 images is used.

cat images.bin | cv-cat "accumulated=average,3;view=250;null"

The output is 8 images; the 6th output image is the exponentially weighted accumulation of images 1 to 6, with the most recent images weighted most heavily. The formula is sketched below.
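
For reference, a common form of the exponential moving average is (a sketch; cv-cat's exact weighting may differ, see cv-cat --help --verbose):

$\mathrm{EMA}_1 = x_1, \qquad \mathrm{EMA}_t = \alpha\,x_t + (1 - \alpha)\,\mathrm{EMA}_{t-1}, \qquad \alpha = \frac{2}{N + 1}$

where $x_t$ is the t-th input image, N is the window size (3 in the example above), and the update is applied pixel-wise and channel-wise.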

 

Forked Arithmetic Filters: Multiply, Divide, Add and Subtract

This group of filters works similarly to the mask filter (see 'Masking images with cv-cat'): both use sub-filters to generate a mask or operand image.

A mask contains values that are either 0 or greater than 0; a pixel whose corresponding mask value is 0 is masked out. The arithmetic filters, by contrast, work on operand images, where the actual pixel values matter.

Multiply:

This filter performs pixel-wise multiplication of the operand image and the input image; it wraps the cv::multiply function.

Please download this simple mask file: scaled-3f.bin

# viewing the mask
cat scaled-3f.bin | cv-calc header
cat scaled-3f.bin | cv-cat "view=1000;null"

Applying a single scaled image to the input images:

cat images.bin | cv-cat "multiply=load:scaled-3f.bin;view=250;null"

You should see images similar to those below. scaled-3f.bin has values in the range 0 to 1.0, so the command above darkens the images.

In the example above, cv-cat's multiply runs in parallel: the scaled-3f.bin operand is applied to multiple input images concurrently.

This is because all the sub-filters can run in parallel mode; in this case there is only one sub-filter, 'load'.

The example below also shows multiply running in parallel mode, as both load and threshold are parallelisable filters.

cat images.bin | cv-cat "multiply=load:scaled-3f.bin|threshold:0.7,1;view=250;null"


Subtract:

This filter simply subtracts the operand image from each input image; it is a wrapper for cv::subtract.

The operand image is derived from the sub-filters. In this example we shall use the accumulated filter mentioned earlier. This is a simple method for detecting moving objects in the image.

cat images.bin | cv-cat "subtract=accumulated:average,3;brightness=5;view=250;null"

The EMA (with a window of 3) is subtracted from each input image.

You should see similar images shown below:

In the example above, the subtract filter runs in serial mode, because one of its sub-filters ('accumulated' in this case) can only run serially.

If you have a webcam handy or it is built into the laptop, try this command:

cv-cat --camera "subtract=accumulated:average,10;view;null"



Add:

This is a wrapper for cv::add.

This filter is the opposite of subtract: adding the EMA average (the "background") to the input images makes any moving object transparent.

cat images.bin | cv-cat "add=accumulated:average,3;view=250;null"

This is the result:

Of course you can always try this pipeline with a physical camera:

cv-cat --camera "add=accumulated:average,10;view;null"

 

Divide:

This filter wraps cv::divide; it divides the input images by the operand.

The file scaled-3f.bin has values in the range 0 to 1.0, hence dividing the images by scaled-3f.bin will brighten them.

Note: for all the arithmetic filters, the output image type is the same as the input image type.

cat images.bin | cv-cat "divide=load:scaled-3f.bin;view=250;null"

 

 

A brief note on the latest additions to cv-cat (and all other camera applications linking in the same filters).

As of today, the application provides access to all the morphology operations available in OpenCV:

  1. erosion
  2. dilation
  3. opening
  4. closing
  5. morphological gradient
  6. top-hat
  7. black-hat

See OpenCV documentation for more details. In addition, a skeleton (a.k.a. thinning) filter is implemented on top of the basic morphological operations; the implementation follows this demo. However, this is neither the fastest nor the best implementation of thinning. Possibly the optimal approach is the one proposed in the paper "A fast parallel algorithm for thinning digital patterns" by T.Y. Zhang and C.Y. Suen. See this demo for a comparative evaluation of several thinning algorithms (highly recommended!).

Some examples of usage are given below.

Erosion

Input image

Processing

erosion
cv-cat --file spots.png "erode=circle,9,;encode=png" --output=no-header > eroded.png

Result

Multiple Iterations

OpenCV allows multiple iterations of the same morphology operation; the default number of iterations is 1. Below, the same erosion operation is applied twice (please see cv-cat's help):

cv-cat --file spots.png "erode=circle,9,,2;encode=png" --output=no-header > eroded-twice.png

Result
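
Dilation

The other morphology operations listed above follow the same pattern; for instance, dilation (a sketch, assuming the dilate filter mirrors erode's parameters):

cv-cat --file spots.png "dilate=circle,9,;encode=png" --output=no-header > dilated.png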

Thinning

Input image

Processing

thinning
cv-cat --file opencv-1024x341.png "channels-to-cols;cols-to-channels=0,repeat:3;skeleton=circle,3,;encode=png" --output=no-header > skeleton.png

Result

csv-calc

csv-calc is an application to calculate statistics (such as mean, median, size, standard deviation...) on multiple fields of an input file. Input records can be grouped by id, block, or both.

One drawback of csv-calc is that it only outputs the statistics for each id and block; the input records themselves are not preserved. This means you cannot use csv-calc in the middle of a pipeline that also needs the original records.

csv-calc --append

The --append option to csv-calc passes through the input stream, adding to every record the relevant statistics for its id and block.

For example:

> echo -e "1,0\n2,0\n3,1\n4,1" | csv-calc mean --fields=a,id
Output (mean, id):
1.5,0
3.5,1
 
> echo -e "1,0\n2,0\n3,1\n4,1" | csv-calc mean --fields=a,id --append
Output (a, id, mean):
1,0,1.5
2,0,1.5
3,1,3.5
4,1,3.5

keeping track of fields and formats

Another challenge for csv-calc users is the large number of fields that it generates (it applies every operation to every indicated field).

There are now --output-fields and --output-format options to show what kind of output a given csv-calc command will produce.

Examples:

> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-fields
t/mean,a/mean,t/diameter,a/diameter,id,block
 
> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-format
t,d,d,d,ui,ui

With --append, these fields are appended to the input fields; id and block are not repeated:
> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-fields --append
t/mean,a/mean,t/diameter,a/diameter

> csv-calc mean,diameter --fields=t,a,id,block --binary=t,d,ui,ui --output-format --append
t,d,d,d

points-to-ros and points-from-ros are utilities for publishing and receiving PointCloud2 messages on ROS.

setup

To build them, you need to set "snark_build_ros" to ON in snark cmake.
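
For example (a minimal sketch; the build directory and source path are illustrative):

# enable the ROS utilities when configuring snark, then rebuild
cd build
cmake -Dsnark_build_ros=ON /path/to/snark
make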

We use snark-graphics-test-pattern to generate some sample points in a cube:

snark-graphics-test-pattern cube 100000 0.1 0.01 >cube.csv

Here is the output: cube.csv

To run ROS, you need to setup the environment and run roscore:

source /opt/ros/kinetic/setup.bash
roscore

 

points-from-ros

This utility subscribes to the specified topic, receives PointCloud2 messages and writes the point data as csv or binary to stdout.

Either the --binary or the --format option must be specified, setting the output to binary or ascii csv, respectively.

The field names and message format are embedded in the message; the format is used for conversion.

You can use --output-fields or --output-format to get the field names and message format from the message (the publisher must be running).

source /opt/ros/kinetic/setup.bash
points-from-ros --topic "/points1" --output-fields
points-from-ros --topic "/points1" --output-format
#ascii
points-from-ros --topic "/points1" --fields x,y,z,r,g,b --format 3d,3ub | view-points --fields x,y,z,r,g,b
#binary
points-from-ros --topic "/points1" --fields x,y,z,r,g,b --binary 3d,3ub | view-points --fields x,y,z,r,g,b --binary 3d,3ub

 

points-to-ros

This utility reads binary or ascii csv data from stdin and publishes it as PointCloud2 message on ROS.

Either the --binary or the --format option must be specified, indicating whether the input is binary or ascii.

The --fields option specifies the field names for one point in the message.

If a field named block is present, it is used for breaking records into separate messages: records with the same block number are grouped into one message. When no such field is present, stdin is read until EOF and a single message is sent.

The --hang-on option delays the exit of points-to-ros, so that clients can receive all the data in the last message.

#ascii
cat cube.csv | points-to-ros --topic "/points1" --fields x,y,z,r,g,b,a --format 3d,3ub,ub --hang-on
#binary
cat cube.csv | csv-to-bin 3d,3ub,ub | points-to-ros --topic "/points1" --fields x,y,z,r,g,b,a --binary 3d,3ub,ub --hang-on
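
To publish the input as multiple messages, append a block field, e.g. (a sketch; grouping 1000 points per message and the ui format for the block column are assumptions):

# append a block column: every 1000 consecutive points share a block number,
# so points-to-ros publishes each group as a separate PointCloud2 message
cat cube.csv | awk -F, -v OFS=, '{ print $0, int( ( NR - 1 ) / 1000 ) }' \
    | points-to-ros --topic "/points1" --fields x,y,z,r,g,b,a,block --format 3d,3ub,ub,ui --hang-on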

 

ros-bag-to-bin

This utility can cat binary data directly from a ros bag file:

ros-bag-to-bin -h
ros-bag-to-bin [-h] [--timestamp] [--block] file topic size
e.g. 
ros-bag-to-bin "pointscloud.bag" "/velodyne_points" $(csv-size 4f) --timestamp --block | csv-from-bin t,ui,4f | head

 

 

The problem

You are working with a data pipeline, and on a certain record, you want to end processing and exit the pipeline.

But to break on some condition in the input, you need an application that parses each input record.

Worse, the condition you want could be a combination of multiple fields, or use fields unrelated to the data you want to process.

Introducing csv-eval --exit-if!

Previously, csv-eval had a --select option that passed through any records matching the select condition.

csv-eval --exit-if also passes input records through unchanged, but it exits on the first record that matches the exit condition.

Like csv-eval --select, --exit-if accepts any expression on the input fields that evaluates to bool.

Comparing the two features:

$ echo -e "1,1\n2,2\n3,0\n1,3\n1,2" | csv-eval --fields=a,b --select="a+b<>3"
Output:
1,1
2,2
1,3

$ echo -e "1,1\n2,2\n3,0\n1,3\n1,2" | csv-eval --fields=a,b --exit-if="a+b==3"
Output:
1,1
2,2

This post outlines how to run a bash function in parallel using xargs. (Note: you could use GNU parallel instead of xargs, but I see no clear advantages or disadvantages at this stage.)

It may not be the best way, or the right way, and it may have unforeseen consequences, so I'd welcome any feedback on better practice.

Rationale

We often run scripts with for or while loops. In the simplest case, if the operation within the loop is self-contained, it's very easy to make it parallel.

E.g.

# written in confluence, might not actually run
for file in *.csv ; do
  cat $file | csv-slow-thin > ${file%.csv}.processed.csv
done

Becomes:

# written in confluence, might not actually run
{ echo '#!/bin/bash'; echo 'file=$1; cat $file | csv-slow-thin > ${file%.csv}.processed.csv'; } > do-slow-thing-imp
chmod 777 do-slow-thing-imp
ls *.csv | xargs -n1 -P8 ./do-slow-thing-imp
rm do-slow-thing-imp

But it's clunky to write a script file like that.

It would be better to use a function, as follows, but the specific method in the code block below doesn't work:

# written in confluence, might not actually run
function do-slow-thing
{
	file=$1
	cat $file | csv-slow-thin > ${file%.csv}.processed.csv
}
cat *.csv | xargs -n1 -P8 do-slow-thing #but this doesn't work

The following is the current best solution I'm aware of:

Note: set -a could be used to automatically export all subsequently declared vars, but it has caused problems with my bigger scripts

Note: set -a might behave differently across platforms. On Dmitry's machine it exports vars and functions, whereas on James' machine it exports vars only.

Note: the use of declare -f means you don't need to work out a priori which nested functions may be called (e.g. like errcho in this example)

#!/bin/bash
export readonly name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

export readonly global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world

errcho "run parallel with xargs"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

Note: if using comma_path_to_var, you can use --export to export all of the parsed command line options

No need to read beyond this point, unless you want to see the workings that lead up to this, including options that don't work.

 

The problem exposed and the solution

The following code is tested; try it by copying it into a script and running it:

#!/bin/bash

name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world

Single process works fine, output:

xargs_from_func: first run as a single process
xargs_from_func: example_function: global var is hello and passed var is world

Let's try multiple processes with xargs. Add the following lines to the end of the script:

errcho "run parallel with xargs, attempt 1"
(echo oranges; echo apples) | xargs -n1 -P2 example_function

The problem is that example_function is not an executable:

xargs_from_func: run parallel with xargs, attempt 1
xargs: example_function: No such file or directory
xargs: example_function: No such file or directory

Instead, let's run "bash" which is an executable:

errcho "run parallel with xargs, attempt 2"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "example_function {}"

The new bash process doesn't know the function:

xargs_from_func: run parallel with xargs, attempt 2
bash: example_function: command not found
bash: example_function: command not found

So let's declare it:

errcho "run parallel with xargs, attempt 3"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; example_function {}"

Getting close, but our example_function refers to another of our functions, which also needs to be declared:

xargs_from_func: run parallel with xargs, attempt 3
bash: line 3: errcho: command not found
bash: line 3: errcho: command not found

We can do that one by one, or declare all our functions in one go:

errcho "run parallel with xargs, attempt 4"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; $(declare -f errcho) ; example_function {}"

errcho "run parallel with xargs, attempt 5"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

The function itself is now working, but all the global variables are lost (including global_var and also the script name):

xargs_from_func: run parallel with xargs, attempt 4
: example_function: global var is  and passed var is oranges
: example_function: global var is  and passed var is apples
xargs_from_func: run parallel with xargs, attempt 5
: example_function: global var is  and passed var is oranges
: example_function: global var is  and passed var is apples

We can add these explicitly, one by one, e.g.:

errcho "run parallel with xargs, attempt 6"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; global_var=$global_var ; example_function {}"

...but it's extremely hard to work out which functions call which other functions, and which global variables each of the called functions uses. This leads to very hard-to-trace bugs in real-world examples.

xargs_from_func: run parallel with xargs, attempt 6
: example_function: global var is hello and passed var is oranges
: example_function: global var is hello and passed var is apples

So the final solution I've arrived at is to pass everything through by using "set":

errcho "run parallel with xargs, attempt 7"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(set) ; example_function {}"

This spits out a lot of extra garbage, because it includes attempts to reassign readonly variables:

xargs_from_func: run parallel with xargs, attempt 7
bash: line 1: BASHOPTS: readonly variable
bash: line 1: BASHOPTS: readonly variable
bash: line 8: BASH_VERSINFO: readonly variable
bash: line 8: BASH_VERSINFO: readonly variable
bash: line 38: EUID: readonly variable
bash: line 38: EUID: readonly variable
bash: line 68: PPID: readonly variable
bash: line 79: SHELLOPTS: readonly variable
bash: line 87: UID: readonly variable
bash: line 68: PPID: readonly variable
bash: line 79: SHELLOPTS: readonly variable
xargs_from_func: example_function: global var is hello and passed var is oranges
bash: line 87: UID: readonly variable
xargs_from_func: example_function: global var is hello and passed var is apples

...but notice that it did work.

For some reason the following doesn't hide the readonly errors:

errcho "run parallel with xargs, attempt 7"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(set) > /dev/null ; example_function {}"

...and I've tried various combos of putting the /dev/null inside the $(), and redirecting stderr.

I think, therefore, the best approach is to explicitly declare each global using export, and to either explicitly export each function, or use the "declare -f" statement at the xargs call.

That looks like this:

#!/bin/bash
export readonly name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

export readonly global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world

errcho "run parallel with xargs"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

Note: the readonly is not strictly necessary for this example, but is good practice if it is a readonly variable.

The whole script together:

#!/bin/bash

name=$( basename $0 )
function errcho { (>&2 echo "$name: $1") }

global_var=hello
function example_function
{
        passed_var=$1
        errcho "example_function: global var is $global_var and passed var is $passed_var"
}

errcho "first run as a single process"
example_function world


errcho "run parallel with xargs, attempt 1"
(echo oranges; echo apples) | xargs -n1 -P2 example_function

errcho "run parallel with xargs, attempt 2"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "example_function {}"

errcho "run parallel with xargs, attempt 3"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; example_function {}"

errcho "run parallel with xargs, attempt 4"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f example_function) ; $(declare -f errcho) ; example_function {}"

errcho "run parallel with xargs, attempt 5"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; example_function {}"

errcho "run parallel with xargs, attempt 6"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(declare -f) ; global_var=$global_var ; example_function {}"

errcho "run parallel with xargs, attempt 7"
(echo oranges; echo apples) | xargs -n1 -P2 -i bash -c "$(set) ; example_function {}"



Some external references:

http://stackoverflow.com/questions/1305237/how-to-list-variables-declared-in-script-in-bash

http://stackoverflow.com/questions/11003418/calling-functions-with-xargs-within-a-bash-scrip