
If you are putting together a training dataset for classification or object detection, you may need to create a uniformly distributed random selection of image crops from your image data.

The following pipeline does this: it picks random images, cuts 4 random patches of size 300x200 from each of them, and saves them as PNG files in the current directory.

(Note: the index parameter in file=png,index is required, because otherwise all the patches cut out of the same image would get the same filename and overwrite each other.)

> cat your-image-data.bin | cv-calc thin --rate 0.01 | cv-calc random-crop --width 300 --height 200 --count 4 | cv-cat "file=png,index"

See cv-calc --help for more configuration options.

 


5 Comments

  1. very useful! Is png,,index a typo? For me it required "png,index"

    something our own bespoke version does is to label the image file as "timestamp_i230_j440.png" as a way to retain a trace back to the exact original image and snip location (e.g. for a top left subimage corner of 230,440). I guess this isn't possible with the pipeline approach above? Maybe it's not important - I don't think I ever used the traceback... 

    1. outputting pixel coordinates is a very good point

      unfortunately, squeezing it in semantically would make the usage of cv-calc random-crop | cv-cat "file=png,index" really clunky

      i have added the item to the generic backlog, but there is a simple workaround for now:

      if you need the box coordinates, then

      • generate random box coordinates for each image as csv
      • use them to crop (with cv-calc roi)

      it can easily be done in a single pipeline; if keeping box coordinates is ever required, i am happy to wrap it into a convenience script at short notice
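
The first step of the workaround above (random box coordinates as csv) could look roughly like the Python sketch below. It is only an illustration: the function names, the image size, the csv column order and the `_i<x>_j<y>` naming (borrowed from the "timestamp_i230_j440.png" idea in the parent comment) are all assumptions, and the columns would need to be rearranged to whatever cv-calc roi actually expects (see cv-calc --help).

```python
import csv
import io
import random


def random_boxes(image_width, image_height, crop_width, crop_height, count, rng):
    """Return `count` top-left corners (x, y), drawn uniformly so each crop fits in the image."""
    if crop_width > image_width or crop_height > image_height:
        raise ValueError("crop is larger than the image")
    return [
        (rng.randint(0, image_width - crop_width), rng.randint(0, image_height - crop_height))
        for _ in range(count)
    ]


def boxes_as_csv(timestamps, image_width, image_height,
                 crop_width=300, crop_height=200, count=4, seed=None):
    """Build csv lines: timestamp,min_x,min_y,max_x,max_y,filename (hypothetical layout)."""
    rng = random.Random(seed)  # seed=None means a different result on every run
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for t in timestamps:
        for x, y in random_boxes(image_width, image_height, crop_width, crop_height, count, rng):
            # The filename keeps a trace back to the source image and crop position.
            name = f"{t}_i{x}_j{y}.png"
            writer.writerow([t, x, y, x + crop_width, y + crop_height, name])
    return out.getvalue()


if __name__ == "__main__":
    # Made-up timestamps and image size, for illustration only.
    print(boxes_as_csv(["20170101T000000", "20170101T000001"], 1920, 1080, seed=5), end="")
```

Since the box coordinates are generated before cropping, they can be kept alongside the images or encoded in the filenames, which is what the pipeline in the article cannot do.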

  2. warning: cat images | cv-calc random .... behaviour as expected

    whereas: for t in list ; do echo $t | log-index-get | cv-calc random ; done ..... not random, because it restarts the cv-calc app every time

    better, if you had to iterate at all: for t in list ; do echo $t | log-index-get ; done | cv-calc random .... random, as it loads cv-calc once... this is better practice anyway.

    best: cat list | log-index-get | cv-calc random

     

    1. oops, thanks a lot, James P. Underwood; it's fixed: by default, cv-calc will now output different results on each run (on the other hand, if one needs determinism, use a fixed seed: cv-calc random-crop --seed=5)
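
The effect discussed in this thread can be reproduced with any seeded random generator. The sketch below uses Python's `random` module and is not cv-calc's implementation; the function names and the offset ranges are made up for illustration. It shows why restarting a process with the same seed for every image gives the same "random" crops each time, while a single long-lived process gives different crops per image.

```python
import random


def crop_offsets(rng, count, max_x, max_y):
    """Draw `count` top-left crop corners from the given generator."""
    return [(rng.randint(0, max_x), rng.randint(0, max_y)) for _ in range(count)]


def restart_per_image(images, seed, count=4, max_x=1000, max_y=800):
    """Mimic relaunching the tool inside a loop: a fresh generator with the same seed per image."""
    return {image: crop_offsets(random.Random(seed), count, max_x, max_y) for image in images}


def single_process(images, seed, count=4, max_x=1000, max_y=800):
    """Mimic one process reading the whole stream: one generator shared by all images."""
    rng = random.Random(seed)
    return {image: crop_offsets(rng, count, max_x, max_y) for image in images}


if __name__ == "__main__":
    images = ["a", "b", "c"]
    print("restarted:", restart_per_image(images, seed=5))
    print("single:   ", single_process(images, seed=5))
```

With a fixed seed, the single-process variant is still reproducible from run to run, which is the determinism a fixed seed is meant to provide, but the crops differ between images.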

  3. "better, if you had to iterate at all: for t in list ; do echo $t | log-index-get ; done | cv-calc random .... random, as it loads cv-calc once... this is better practice anyway."

    this is a much better practice indeed, also because it is much faster