Became Hot Network Question
Bubbler
Tweeted twitter.com/StackCodeGolf/status/1275987159447724032

Is this stack of CNN layers valid?

Background

This challenge is about convolutional neural networks and their two main building blocks: the convolutional layer and the pooling layer.

For simplicity, we ignore the "depth" of the images and intermediate tensors, and just look at the width and height.

Convolutional layer

A convolutional layer works like a kernel in image processing. It is defined by the kernel width and height and by the kernel mode (min, mid, or max). A min kernel extracts values at positions where the entire kernel overlaps with the original image. For a mid kernel, the center of the kernel is placed over each pixel of the image; for a max kernel, all positions where any pixel overlaps with the kernel are considered.

One pixel per positioning of the kernel is produced, resulting in a 2D array which can be smaller than (min), equal to (mid), or larger than (max) the input image.

    Kernel (C is the center)
    ###
    #C#
    ###

    Image
    *****
    *****
    *****
    *****
    *****

    Min kernel convolution (results in 3x3)
    ###**     **###
    #C#**     **#C#
    ###** ... **###
    *****     *****
    *****     *****
     ...       ...
    *****     *****
    *****     *****
    ###** ... **###
    #C#**     **#C#
    ###**     **###

    Mid kernel convolution (results in 5x5)
    ###             ###
    #C#***       ***#C#
    ###***       ***###
     *****  ...  *****
     *****       *****
     *****       *****
      ...         ...
     *****       *****
     *****       *****
     *****  ...  *****
    ###***       ***###
    #C#***       ***#C#
    ###             ###

    Max kernel convolution (results in 7x7)
    ###               ###
    #C#               #C#
    ###****       ****###
      *****       *****
      *****  ...  *****
      *****       *****
      *****       *****
       ...         ...
      *****       *****
      *****       *****
      *****  ...  *****
      *****       *****
    ###****       ****###
    #C#               #C#
    ###               ###

If the input image has IR rows and IC columns, and the kernel has KR rows and KC columns, the output dimensions are defined as follows:

  • Min kernel: IR - KR + 1 rows, IC - KC + 1 columns; invalid if the resulting rows or columns are zero or negative
  • Mid kernel: IR rows, IC columns; error if either KR or KC is even
  • Max kernel: IR + KR - 1 rows, IC + KC - 1 columns
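As a minimal sketch of the three rules above, the function below computes the output size of a single convolutional layer. The name `conv_output`, the string mode labels, and the use of `None` to signal an error are illustrative choices and not part of the challenge.

```python
def conv_output(ir, ic, kr, kc, mode):
    """Return (rows, cols) after one convolution, or None if the layer is an error."""
    if mode == "min":
        rows, cols = ir - kr + 1, ic - kc + 1
        # The kernel must fit entirely inside the image.
        return (rows, cols) if rows > 0 and cols > 0 else None
    if mode == "mid":
        # A centered kernel needs an odd number of rows and columns.
        return (ir, ic) if kr % 2 == 1 and kc % 2 == 1 else None
    if mode == "max":
        # Positive kernel sizes can never shrink the image, so no error case.
        return (ir + kr - 1, ic + kc - 1)
    raise ValueError(f"unknown mode: {mode!r}")


print(conv_output(5, 5, 3, 3, "min"))  # (3, 3)
print(conv_output(5, 5, 3, 3, "mid"))  # (5, 5)
print(conv_output(5, 5, 3, 3, "max"))  # (7, 7)
```

The three calls reproduce the 3x3, 5x5 and 7x7 results from the illustration of a 3x3 kernel on a 5x5 image.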

Pooling layer

A pooling layer is defined by window width and height, and the horizontal and vertical stride size (how many units to move at once in either direction). See the following illustration:

    3x3 window, 2x2 stride pooling on a 7x7 image

    ###****  **###**  ****###
    ###****  **###**  ****###
    ###****  **###**  ****###
    *******  *******  *******
    *******  *******  *******
    *******  *******  *******
    *******  *******  *******

    *******  *******  *******
    *******  *******  *******
    ###****  **###**  ****###
    ###****  **###**  ****###
    ###****  **###**  ****###
    *******  *******  *******
    *******  *******  *******

    *******  *******  *******
    *******  *******  *******
    *******  *******  *******
    *******  *******  *******
    ###****  **###**  ****###
    ###****  **###**  ****###
    ###****  **###**  ****###

If the input image has IR rows and IC columns, and the pooling layer has a window of WR rows and WC columns with horizontal stride SH and vertical stride SV, the output dimensions are defined as follows:

  • Rows: (IR - WR)/SV + 1, error if (IR - WR) % SV != 0 or WR < SV
  • Cols: (IC - WC)/SH + 1, error if (IC - WC) % SH != 0 or WC < SH
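The pooling rule can be sketched the same way, checking each axis against its own stride (rows against SV, columns against SH). The name `pool_output` and the `None` error value are hypothetical. The sketch also rejects a window larger than the image; that check is an assumption added here, since the rules above only name the divisibility and stride conditions.

```python
def pool_output(ir, ic, wr, wc, sv, sh):
    """Return (rows, cols) after one pooling layer, or None if the layer is an error.

    wr/wc are the window rows/columns; sv/sh are the vertical/horizontal strides.
    """
    def axis(size, window, stride):
        if window < stride:           # stride larger than window
            return None
        if size < window:             # assumption: the window must fit in the image
            return None
        if (size - window) % stride:  # windows do not cover the image exactly
            return None
        return (size - window) // stride + 1

    rows, cols = axis(ir, wr, sv), axis(ic, wc, sh)
    return None if rows is None or cols is None else (rows, cols)


print(pool_output(7, 7, 3, 3, 2, 2))  # (3, 3)
print(pool_output(7, 7, 3, 3, 4, 4))  # None
```

The first call matches the illustration: a 3x3 window with a 2x2 stride visits three positions along each axis of a 7x7 image.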

Stacking multiple layers

The convolutional and pooling layers can be stacked in any arbitrary way, so that the output of the previous layer becomes the input of the next layer. The dimensions of the input image to the entire stack are provided, and the dimensions of each intermediate image should be calculated sequentially. A stack of layers is valid if no error occurs at any layer. The final output size does not matter, as long as it can be calculated without error.

The following stack is valid:

    Input image 25x25
    1. Min Convolution 3x3         => Intermediate image 23x23
    2. Pooling 3x3 with stride 2x2 => Intermediate image 11x11
    3. Max Convolution 3x3         => Intermediate image 13x13
    4. Max Convolution 4x4         => Intermediate image 16x16
    5. Pooling 2x2 with stride 2x2 => Intermediate image 8x8
    6. Min Convolution 5x5         => Intermediate image 4x4
    7. Pooling 4x4 with stride 3x3 => Output image 1x1

Any contiguous subsequence of the stack, given the respective (intermediate) image as its input, is also valid (e.g. steps 2, 3, 4, 5 with input image 23x23).
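The intermediate sizes of the 7-layer stack can be reproduced with a short trace. This is only a sketch: the tuple encoding of layers and the names `step`, `trace` and `STACK` are hypothetical, and rows and columns are processed independently because neither rule mixes the two axes. As an added assumption, a pooling window larger than the image is treated as an error.

```python
def step(size, layer, axis):
    """Apply one layer to a single dimension; return the new size or None on error.

    axis is 0 for rows and 1 for columns. Hypothetical layer encoding:
      ("conv", mode, (kr, kc))  or  ("pool", (wr, wc), (sv, sh))
    """
    if layer[0] == "conv":
        _, mode, kernel = layer
        k = kernel[axis]
        if mode == "min":
            return size - k + 1 if size - k + 1 > 0 else None
        if mode == "mid":
            return size if k % 2 == 1 else None
        return size + k - 1  # "max"
    _, window, stride = layer
    w, s = window[axis], stride[axis]
    # size < w is an assumption beyond the stated rules (window must fit).
    if w < s or size < w or (size - w) % s:
        return None
    return (size - w) // s + 1


def trace(rows, cols, stack):
    """List the image sizes through the stack; a trailing None marks the first error."""
    sizes = [(rows, cols)]
    for layer in stack:
        rows, cols = step(rows, layer, 0), step(cols, layer, 1)
        if rows is None or cols is None:
            sizes.append(None)
            break
        sizes.append((rows, cols))
    return sizes


STACK = [
    ("conv", "min", (3, 3)),
    ("pool", (3, 3), (2, 2)),
    ("conv", "max", (3, 3)),
    ("conv", "max", (4, 4)),
    ("pool", (2, 2), (2, 2)),
    ("conv", "min", (5, 5)),
    ("pool", (4, 4), (3, 3)),
]

print(trace(25, 25, STACK))
# [(25, 25), (23, 23), (11, 11), (13, 13), (16, 16), (8, 8), (4, 4), (1, 1)]
print(trace(23, 23, STACK[1:5]))  # steps 2 to 5 with a 23x23 input
# [(23, 23), (11, 11), (13, 13), (16, 16), (8, 8)]
```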

Any of the following modifications to the 7-layer stack above will result in an invalid stack:

  • Replace step 2 with stride 4x4 or 2x4: stride is larger than window in at least one dimension
  • Replace step 3 with mid convolution: image size becomes too small at step 7
  • Replace step 4 with mid convolution: mid convolution with even kernel dimension is an error
  • Replace step 6 with kernel size 9x5 or larger: kernel does not fit in the image (IR-KR+1 is zero or negative, which is an error)
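The modifications listed above can be checked mechanically. The sketch below is one possible validator, not a reference solution: the `Conv` and `Pool` tuples, `is_valid`, `BASE` and `replaced` are hypothetical names, the order of the two numbers in "2x4" and "9x5" is an assumption (either order fails here), and a pooling window larger than the image is rejected as an additional assumption.

```python
from typing import List, NamedTuple, Union


class Conv(NamedTuple):
    kr: int
    kc: int
    mode: str  # "min", "mid" or "max"


class Pool(NamedTuple):
    wr: int
    wc: int
    sv: int  # vertical stride (rows)
    sh: int  # horizontal stride (columns)


Layer = Union[Conv, Pool]


def is_valid(ir: int, ic: int, stack: List[Layer]) -> bool:
    """Return True if every layer can be applied without an error."""
    for layer in stack:
        if isinstance(layer, Conv):
            if layer.mode == "min":
                ir, ic = ir - layer.kr + 1, ic - layer.kc + 1
                if ir <= 0 or ic <= 0:
                    return False
            elif layer.mode == "mid":
                if layer.kr % 2 == 0 or layer.kc % 2 == 0:
                    return False
            else:  # "max"
                ir, ic = ir + layer.kr - 1, ic + layer.kc - 1
        else:
            if layer.wr < layer.sv or layer.wc < layer.sh:
                return False
            # Assumption: a window larger than the image is also an error.
            if ir < layer.wr or ic < layer.wc:
                return False
            if (ir - layer.wr) % layer.sv or (ic - layer.wc) % layer.sh:
                return False
            ir = (ir - layer.wr) // layer.sv + 1
            ic = (ic - layer.wc) // layer.sh + 1
    return True


BASE = [
    Conv(3, 3, "min"), Pool(3, 3, 2, 2), Conv(3, 3, "max"), Conv(4, 4, "max"),
    Pool(2, 2, 2, 2), Conv(5, 5, "min"), Pool(4, 4, 3, 3),
]


def replaced(step: int, layer: Layer) -> List[Layer]:
    """Copy of BASE with the 1-indexed step swapped for another layer."""
    return BASE[:step - 1] + [layer] + BASE[step:]


print(is_valid(25, 25, BASE))                               # True
print(is_valid(25, 25, replaced(2, Pool(3, 3, 4, 4))))      # False
print(is_valid(25, 25, replaced(2, Pool(3, 3, 2, 4))))      # False
print(is_valid(25, 25, replaced(3, Conv(3, 3, "mid"))))     # False
print(is_valid(25, 25, replaced(4, Conv(4, 4, "mid"))))     # False
print(is_valid(25, 25, replaced(6, Conv(9, 5, "min"))))     # False
```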

Challenge

Given the input dimensions and the description of a stack of convolutional/pooling layers, determine if it is a valid configuration, i.e. not an error.

The description of the stack can be taken in any reasonable way that represents:

  • a list (sequence) of two kinds of layers
  • for a convolutional layer, the kernel size (width/height; two numbers) and mode (min/mid/max)
  • for a pooling layer, the window size (width/height) and stride (horizontal/vertical; four numbers in total)

All numbers (kernel size, window size, stride) are guaranteed to be positive integers.

You may output truthy/falsy by following your language's convention or selecting two distinct values for true/false respectively.

Standard rules apply. The shortest code in bytes wins.