
Commit 2983ec3

Enhance Conv2d section with code and explanations (#695)

* Enhance Conv2d section with code and explanations

  Added an explanation and a code snippet for the nn.Conv2d parameters (in_channels, out_channels, kernel_size, stride, and padding). Included the output size formula for Conv2d.

* Correct markdown formatting in conv-nets documentation

  Fixed formatting for the explanation and the output size equation.

Co-authored-by: Alexey Grigorev <alexeygrigorev@users.noreply.github.com>
1 parent d88dd5e commit 2983ec3

File tree

1 file changed: +29 −0 lines


08-deep-learning/04-conv-neural-nets.md

Lines changed: 29 additions & 0 deletions
@@ -26,6 +26,35 @@ This is the first step in the process of extracting valuable features from an im
Consider a black-and-white image of size 5x5 whose pixel values are either 0 or 1, and a filter matrix with a dimension of 3x3. Slide the filter matrix over the image and compute the dot product at each position to get the convolved feature matrix.

```python
nn.Conv2d(
    in_channels,   # number of channels in the input image
    out_channels,  # number of filters to learn
    kernel_size,   # size of each filter (int or tuple)
    stride=1,      # step size for moving the filter
    padding=0,     # zero-padding around the input
)
```
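As a quick sketch of how these parameters come together (assuming PyTorch is installed), the 5x5 image with a 3x3 filter described above can be reproduced with a single-channel layer; the input tensor here is random dummy data, not real image values:

```python
import torch
import torch.nn as nn

# One input channel (black-and-white image), one 3x3 filter,
# stride 1 and no padding -- the defaults shown above.
conv = nn.Conv2d(in_channels=1, out_channels=1, kernel_size=3, stride=1, padding=0)

# Dummy 5x5 input; Conv2d expects shape (batch, channels, height, width).
x = torch.randn(1, 1, 5, 5)
y = conv(x)

print(y.shape)  # torch.Size([1, 1, 3, 3]) -- a 3x3 convolved feature map
```

Sliding a 3x3 filter over a 5x5 input leaves 3 valid positions along each dimension, which is why the feature map comes out 3x3.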
Explanation:

* `in_channels`: input depth (e.g., 3 for RGB images).
* `out_channels`: number of filters the layer will learn. Each filter produces one output feature map.
* `kernel_size`: size of the convolutional filter. Can be a single number (square filter) or a tuple (height, width).
* `stride`: how many pixels the filter moves at each step. Default is 1.

<img width="1500" height="614" alt="image" src="https://github.com/user-attachments/assets/3cfca38d-56bd-4a51-a3ce-70d8c071d4c8" />

* `padding`: number of pixels added around the input to control the output size. Default is 0.

<img width="600" height="400" alt="image" src="https://github.com/user-attachments/assets/5465dc2e-402d-41c9-a6fb-3ecfdc384796" /> <img width="234" height="216" alt="image" src="https://github.com/user-attachments/assets/c8a57bb4-c454-4169-b18c-41b79449bbe6" />
Output size after Conv2d:

$$\text{Output size} = \frac{W - K + 2P}{S} + 1$$

Where:

* $W$ = input size (height or width)
* $K$ = kernel size
* $P$ = padding
* $S$ = stride
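To make the formula concrete, here is a small plain-Python helper (the function name is ours, not a PyTorch API; integer division mirrors the floor that Conv2d applies when the division is not exact), checked against the 5x5 example:

```python
def conv2d_output_size(w, k, p=0, s=1):
    """Output height/width of a Conv2d layer: floor((W - K + 2P) / S) + 1."""
    return (w - k + 2 * p) // s + 1

# 5x5 input, 3x3 kernel, no padding, stride 1: (5 - 3 + 0)/1 + 1 = 3
print(conv2d_output_size(5, 3))       # 3
# padding=1 with a 3x3 kernel preserves the size: (5 - 3 + 2)/1 + 1 = 5
print(conv2d_output_size(5, 3, p=1))  # 5
# stride=2 roughly halves the output: (5 - 3 + 0)/2 + 1 = 2
print(conv2d_output_size(5, 3, s=2))  # 2
```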
**ReLU layer**

Once the feature maps are extracted, the next step is to pass them through a ReLU layer. ReLU (Rectified Linear Unit) is an activation function that performs an element-wise operation, setting all negative pixel values to 0. It introduces non-linearity into the network, and the generated output is a rectified feature map. The ReLU function is `f(x) = max(0, x)`.
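A minimal illustration of this element-wise rectification on a toy feature map (the values below are made up for the example):

```python
def relu(x):
    """ReLU activation: f(x) = max(0, x)."""
    return max(0.0, x)

# Hypothetical 2x2 feature map with some negative values.
feature_map = [[-1.0, 2.0],
               [0.5, -3.0]]

# Apply ReLU element-wise: negatives become 0, positives pass through.
rectified = [[relu(v) for v in row] for row in feature_map]
print(rectified)  # [[0.0, 2.0], [0.5, 0.0]]
```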
