16 events
when what by license comment
Apr 19, 2024 at 13:15 comment added Amit Levy "To approximate the memory for this, calculate the memory required to store the weights and biases and multiply that by 3 (i.e. "by 3" because we're saying the amount of memory needed to store the weights and biases is (roughly) equal to that needed for the gradients and for the momentum variables)" - wouldn't that be times 4? Because for each parameter it is the weight + grad + first moment + second moment (e.g. Adam)
S Apr 15, 2023 at 7:48 history edited Lynn CC BY-SA 4.0
To get the memory metric in GBs, it should be divided by 1024^3, not 1024^2.
S Apr 15, 2023 at 7:48 history suggested CommunityBot CC BY-SA 4.0
To get the memory metric in GBs, it should be divided by 1024^3, not 1024^2.
Apr 12, 2023 at 23:46 review Suggested edits
S Apr 15, 2023 at 7:48
Aug 12, 2021 at 17:18 comment added Gabriel L. "You can divide by 1024^2 to get the answer in GB." You mean in MB.
Jul 22, 2020 at 10:33 vote accept barbolo
Jul 22, 2020 at 10:32 vote accept barbolo
Jul 22, 2020 at 10:33
Jun 15, 2020 at 6:22 review Suggested edits
Jun 15, 2020 at 8:57
Feb 19, 2019 at 23:08 comment added user3731622 Why do you say "we don't use batches in prediction"? If a user needs to make predictions on a large number of images, then it can make sense to use batches in predictions.
May 24, 2018 at 23:34 history edited Adam Hendry CC BY-SA 4.0
deleted 4 characters in body
May 24, 2018 at 23:27 history edited Adam Hendry CC BY-SA 4.0
deleted 4 characters in body
May 24, 2018 at 1:31 review Late answers
May 24, 2018 at 1:32
May 24, 2018 at 1:25 history edited Adam Hendry CC BY-SA 4.0
added 314 characters in body
May 24, 2018 at 1:19 history edited Adam Hendry CC BY-SA 4.0
added 314 characters in body
May 24, 2018 at 1:15 review First posts
May 24, 2018 at 1:49
May 24, 2018 at 1:13 history answered Adam Hendry CC BY-SA 4.0