Table of contents
  1. Overview
  2. Constructor
  3. Create Single Plane Blue Noise Mask
  4. Create Multi Plane Blue Noise Mask
  5. Utility Function 1
    Save and Load Instance Data
  6. Utility Function 2
    Perform Ordered Dithering
  7. Utility Function 3
    Perform Floyd-Steinberg Dithering
  8. Matrix Creation Optimization
    Perform Faster Version

Overview

Matrix Generation Class for Organized Dither using blue noise method based on Ulichney's paper "The void-and-cluster method for dither array generation".

The algorithm flow is as follows.
  1. First, generate a white noise pattern.
  2. Convert the white noise pattern to a blue noise pattern using the void-and-cluster method.
    The following processes are repeated. It is a heavy process that takes about 10 minutes when it reaches about 256 × 256.
    A Gaussian filter is used for the low-pass filter.
    1. Obtain the cyclic convolution of the low-frequency component using an appropriate low-pass filter.
      It must be "cyclic" to avoid issues with the boundary. In general, the cyclic convolution is processed using fft→ifft, but our library uses a regular integral process. There is no problem because a fast algorithm is used for the second and subsequent cyclic convolutions.
    2. Get and remove the coordinates of "1" = cluster at the maximum value of the low-frequency component.
    3. Obtain the cyclic convolution of the low-frequency component using an appropriate low-pass filter based on the inverted image.
    4. Get the coordinates of "0" = void at the maximum value of the low-frequency component and change it to "1".
    5. Repeat until there are no cluster/void pairs that can be changed.
  3. Generate a matrix for organized dither from the blue noise pattern. This is a heavy process that takes more time than the void-and-cluster method.
Since it takes a long time for a large matrix, utility functions are provided to save/load the results.
In addition, for convenience, debugging functions are provided to actually execute the dithering method.

Constructor

No arguments.
CBNMask* pbn = new CBNMask();       // Constructor
.....
delete pbn;                         // Destructor

Create Single Plane Blue Noise Mask

Create a blue noise matrix of width × height using the makebluenoise method
// Create a blue noise matrix
// Input
//  int     width;      Mask width (in pixels)
//  int     height;     Mask height (in pixels)
//  int     level;      Number of tones in the original image plane (default value 256)
//  int     ws;         Gaussian filter window size (default value 7 (7x7 window))
//  double  sigma;      Standard deviation of the Gaussian filter (default value 1.5)
// Return
//  0....Normal completion
//  Negative...Error (defined in errcode.h, only MEMORY_SHORTAGE(-1000) error)
int makebluenoisematrix(int width, int height, int level = 256, int ws = 7, double sigma = 1.5);
The matrix is returned as a buffer of size sizeof(unsigned int) * mwidth * mheight
The number of columns, number of rows (equivalent to the arguments of makebluenoisematrix), and the starting address of the buffer can be obtained using the accessor.
Refer to
Execution of Ordered Dither for usage of the matrix.
// Accessor
int mgetwidth() {return mwidth;}
int mgetheight() {return mheight;}
unsigned char* mgetpmask(int no = 0)
{
    if(mcomponentnum == 1)
        return mpmask;
    else
        return mppchild[no]->mgetpmask(0);
}   // Return the mwidth × mheight mask
    // Not required for ordered dither, accessor for debugging and testing
unsigned int* mgetpmatrix(int no = 0)
{
    if(mcomponentnum == 1)
        return mpmatrix;
    else
        return mppchild[no]->mgetpmatrix(0);
}       // Return the matrix for ordered dither

Create Multiplane Blue Noise Mask

Create a multiplane blue noise mask based on a singleplane blue noise mask. Therefore, call after makebluenoisematrix.
Use the same parameters as the singleplane. Create a multiplane blue noise mask. It is necessary for the blue noise matrix to have a sufficient size (64x64 or larger is desirable).
The processing time is equivalent to makebluenoisematrix() x the number of planes. The number of planes can be up to 8.
If it is 9 or more, it is impossible to make sure that the blue noise masks for each plane do not overlap because there is no room. If you absolutely need to use 9 or more planes, you can only create another instance (with a different seed for white noise random number) with some overlap tolerated.
// Input
//  int      componentnum;     Number of planes (3 for RGB, 4 for CMYK. Up to 9 possible.
//                             For a larger number of planes, it is recommended to execute with 128x128 or 256x256 for improved quality)
// Return value
//  0....Normal termination
//	Negative...Error
int				mseparatebluenoisematrix(int componentnum);

Save and Load Instance Data

After makebluenoisematrix, you can save the instance data. Call mloadbnm after creating the instance to restore the state just after makebluenoisematrix.
For large blue noise matrices, it is better to save them because it takes a long time to create the matrix (about 1 hour for 256×256 with Pentium-M/1.6GHz).
Executing mclearwork() releases all non-matrix work areas, allowing you to execute dithering with minimal memory usage.

// Intermediate state save/load
// Inputs
//  char*       filename;   File name
int             msavebnm(char* filename);
int             mloadbnm(char* filename);
void			mclearwork();
Sample program
// Creating a 256×256 matrix
CBNMask* pbn = new CBNMask();       // Constructor
int i1 = pbn->makebluenoisematrix(256,256);
if(i1 < 0) {
    delete pbn;
    Error handling;
}
i1 = pbn->msavebnm("c:\tmp\256x256.bnm");
if(i1 < 0) {
    delete pbn;
    Error handling;
}
delete pbn;                         // Destructor
// Later use
CBNMask* pbn = new CBNMask();       // Constructor
int i1 = pbn->mloadbnm("c:\\tmp\\256x256.bnm");
if(i1 < 0) {
    delete pbn;
    Error handling;
}
.....
delete pbn;                         // Destructor

Execution of Ordered Dithering

Debug utility.
// 1...Ordered Dither Utility
// Performs ordered dithering using the generated matrix
//     (Typically, the matrix is obtained by the application and used for dithering)
// Input
// The image is a 256-level image for one plane
//  int             no;             Specifies the plane number of the mask to be used (0 to plane number - 1)
//                                  Specifying the same plane number results in dot-on-dot blue noise matrix processing
//                                  For RGB, it does not have to be R=0, G=1, B=2. Any combination is acceptable.
//  int             width;          Width of the image (in pixels)
//  int             height;         Height of the image (in pixels)
//  int             scanlinesize1;  Number of bytes in one line of the image
//  unsigned char*  pdata1;         256-level image
//  int             scanlinesize2;  Number of bytes in one line of the output 2/4-level image
// Output
//  unsigned char*  pdata2;         2/4-level image
//                                  Returns (0/255) for 2-level, and (0/85/170/255) for 4-level.
//                                  The library is not limited to one plane with 256 levels, but this is a debug utility, so it takes shortcuts
void            mordereddither2(int no, int width, int height, int scanlinesize1, unsigned char* pdata1,
                        int scanlinesize2, unsigned char* pdata2);
void            mordereddither4(int no, int width, int height, int scanlinesize1, unsigned char* pdata1,
                        int scanlinesize2, unsigned char* pdata2);
// 2...Ordered Dither Utility
//     Version that processes one line at a time
// Input
//   int            y;               Line number
//   Other arguments are the same as mordereddither2/mordereddither4
void			morderedditherline2(int no, int width, int y, unsigned char* pdata1, unsigned char* pdata2);
void			morderedditherline4(int no, int width, int y, unsigned char* pdata1, unsigned char* pdata2);
Source code for 2-level dithering
void CBNMask::mordereddither2(int no, int width, int height, int scanlinesize1, unsigned char* pdata1,
						int scanlinesize2, unsigned char* pdata2)
{
    if(mcomponentnum == 1) {
        int            total = mwidth * mheight;
        for(int i = 0 ; i < height ; i++) {
            unsigned char*  ps1 = pdata1 + scanlinesize1 * i;
            unsigned char*  ps2 = pdata2 + scanlinesize2 * i;
            int             y = i % mheight;
            unsigned int*   ps3 = mpmatrix + mwidth * y;
            for(int j = 0 ; j < width ; j++, ps1++, ps2++) {
                int            x = j % mwidth;
                // Greater than or equal to the matrix value (threshold or above)
                if(ps3[x] <= *ps1) {
                    *ps2 = 255;
                }
                else {
                    *ps2 = 0;
                }
            }
        }
    }
    else {
        mppchild[no]->mordereddither2(0, width, height, scanlinesize1, pdata1, scanlinesize2, pdata2);
    }
}

Execution of Error Diffusion Method

Error diffusion program for comparing image quality and speed.
// Error diffusion program for quality and speed comparison
// Floyd & Steinberg method
// Input
// The image is a single plane image (for 256-shade images)
// For RGB, the process is performed three times for each plane
//  int             width;          width of the image (in pixels)
//  int             height;         height of the image (in pixels)
//  int             scanlinesize1;  byte size of one line of the image
//  unsigned char*  pdata1;         256-shade image
//  float*          pdata2;         work area for calculations, requires sizeof(float) * width * height bytes
// Output
//  unsigned char*  pdata3;         2/4-shade image
void            mfloydsteinberg2(int width,int height,int scanlinesize1,unsigned char* pdata1,float* pdata2,unsigned char* pdata3);
void            mfloydsteinberg4(int width,int height,int scanlinesize1,unsigned char* pdata1,float* pdata2,unsigned char* pdata3);

Fast Execution

To execute the faster version of matrix creation, you only need to change the following points:
CBNMask* pbn = new CBNMask();       // Constructor
↓
CBNMaskFast* pbn = new CBNMaskFast();       // Constructor
Compared to the conventional version, the faster version creates matrices at speeds ranging from 10 to 1000 times faster. The improvement ratio in speed becomes more pronounced as the matrix size increases.
The maximum matrix sizes that can be created are as follows:
  • For 8 planes, approximately 8192×8192 on a machine with 12GB memory. 2^26
  • For 4 planes, approximately 8192×8192 on a machine with 8GB memory. 2^26
  • For grayscale, approximately 8192×8192 on a machine with 6GB memory. 2^26
  • For grayscale, approximately 16384×16384 on a machine with 16GB memory. 2^28
  • For 3 planes, approximately 16384×16384 on a machine with 24GB memory.
The matrix size does not necessarily need to be a power of 2, so adjust the matrix size according to the available memory as needed. Also, be aware that even with the faster version, creating a 1-plane matrix of size 8192x8192 may take several hours.
The multithreading of matrix creation for multiple planes was abandoned in this case because there is no room to obtain different workspace for each plane.

To save the matrix, 4 bytes of space per element is required. Although it is not random, the sequence has low regularity, so the effect of file compression is small.
The file size of a 8192x8192 matrix is 256MB. The file size of a 16384x16384 matrix is 1GB.

Due to the immense amount of memory required for faster processing, if you need to perform plane separation, freeing the work areas other than the results using the mclearwork API before the makebluenoisematrix API and after the mseparatebluenoisematrix API will create some extra room. Internally, the work areas are repeatedly released and obtained to minimize memory usage as much as possible.