Answer Two: Identifying [Some Of] What's Right
Here, I try to identify some concepts that sound right, and some that don't. For concepts that don't sound right, I might or might not elaborate on what is right.
The initial question poster clearly has some good understanding of the details. (That's good!) However, as noted, it's "some" good understanding: the same poster also seemed to be just a bit off on some details. (Asking these questions was a great idea, so those details could get clarified more quickly.)
(I'm quoting from both the question, and some of the pre-existing comments by the same poster as the question.)
I'm mainly responding to aspects that seem wrong to me, so I'm calling them out. One reason I might not offer more clarifying text on some topics here is to reduce redundancy: many of the "correct" details are already covered in my other answer to this same question, Answer 1: Overviewing How Some Things Work, which focused more on how things work. (In contrast, this answer focuses more on identifying which parts of the questions/comments seemed to reveal some confusion. So quoted text containing details that seemed inaccurate was more likely to end up in this answer.)
(I do suggest reading over that other posted long answer first, before going through this posted long answer that focuses on some of the other questions/content. Doing so may help make some of this text make a bit more sense, perhaps by filling in some gaps earlier on...)
ARP and NDP handling, and Layer 2's Usefulness
If the source host, hostA is in a different physical network than its target host, hostB and the MAC of hostB is unknown,
Correct, the MAC-48 address of "hostB" is unknown to "hostA". (For the moment, I'm just quickly confirming this piece of a comment. This answer elaborates on the idea when looking at a closely related question, just ahead.)
default gateway which then rout[e]s it through the same procedure to hostB
That part sounds right.
My question: why does hostA need to know the MAC of hostB?
I sure hope it doesn't, because "host A" never gets the MAC-48 address of a device that is not on the same local subnet (as part of the typical standard networking/routing process). (Since "host A" doesn't get that address, I sure hope that "host A" doesn't need it!) That's not the case if hostB is in the same local subnet as hostA. But the question stated that "hostA is in a different physical network than its target host, hostB".
aka ARP or NDP to its default gateway which then rout[e]s it through the same procedure to hostB which answers to hostA with its MAC address.
Wrong. ARP or NDP will be used to find the Layer 2 address matching a Layer 3 address, if the local device's neighbor cache doesn't already have that mapping in memory. As you say, that lookup may be for the "default gateway": hostA asks for the gateway's MAC-48 address, not for hostB's.
However, what happens next is that a new Layer 2 frame (containing an IP packet whose "destination IP address" is still hostB's address) goes to the default gateway. Note that the destination of this frame is the "default gateway", so the frame's destination MAC-48 address may point somewhere different from the embedded IP packet's destination IP address, and it's perfectly fine that these two destination addresses point to different locations. The Layer 2 frame is just meant to get the packet to the device that will handle the "next hop", so that short-range trip has a different destination than the potentially longer-range trip that eventually gets the IP packet to the desired (potentially distant) IP address.
Then, "default gateway which then rout[e]s it through the same procedure to hostB" is mostly right, as long as you understand that the routed "it" is a version of the desired "IP packet". (The "default gateway" is not routing/extending an ARP/NDP request!) The "default gateway" follows a similar process: it looks up the next hop toward the destination IP address (hostB itself, if hostB is on one of the gateway's directly connected subnets, or otherwise another router), sends an ARP/NDP request if it doesn't already know that next hop's MAC-48 address, and builds a new "Layer 2" frame (containing the still-mostly-unaltered "IP packet") to get the traffic one more hop closer.
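As a toy illustration of this per-hop behavior, the Python sketch below models each link of a routed path as a fresh Layer 2 frame wrapped around the same IP packet. It is an assumption-laden model rather than real packet handling: the class and function names are hypothetical, the addresses come from documentation ranges, and each router is given a single MAC address for brevity (a real router uses the MAC of whichever interface it sends out of).

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class IPPacket:
    src_ip: str
    dst_ip: str
    ttl: int
    payload: bytes

@dataclass(frozen=True)
class Frame:
    src_mac: str
    dst_mac: str
    packet: IPPacket

def frames_along_path(links, packet):
    """Return the Layer 2 frame used on each link of a routed path.

    `links` is a list of (sender_mac, next_hop_mac) pairs, one per link.
    Every link gets a brand-new frame; the IP packet inside keeps its source
    and destination IP addresses, and each router only decrements the TTL.
    """
    frames = []
    for index, (sender_mac, next_hop_mac) in enumerate(links):
        if index > 0:
            # The sender on every link after the first is a router forwarding the packet.
            packet = replace(packet, ttl=packet.ttl - 1)
        frames.append(Frame(sender_mac, next_hop_mac, packet))
    return frames

# Hypothetical path: hostA -> default gateway -> router2 -> hostB.
HOST_A, GATEWAY, ROUTER2, HOST_B = (
    "02-00-00-00-00-0A", "02-00-00-00-00-01", "02-00-00-00-00-02", "02-00-00-00-00-0B")
links = [(HOST_A, GATEWAY), (GATEWAY, ROUTER2), (ROUTER2, HOST_B)]
packet = IPPacket("192.0.2.10", "198.51.100.20", ttl=64, payload=b"hello")
frames = frames_along_path(links, packet)
for f in frames:
    print(f.dst_mac, f.packet.dst_ip, f.packet.ttl)
```

The destination MAC-48 address changes on every link, while the destination IP address never does; hostA only ever needed the gateway's MAC.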
[traffic is routed] to hostB which answers to hostA with its MAC address.
No. ARP/NDP requests only travel one hop. If hostB gets an ARP/NDP request, which is very possible, that request will come from whatever router is located one hop before hostB, and so the ARP/NDP response will only be sent to that router. None of this will cause hostB to send a response to hostA. (hostB might choose to respond to hostA for other reasons: e.g., if the payload inside the "IP packet" contains an ICMP request (such as an ICMPv4 or ICMPv6 echo request, as used by ping), the response supports that received request; or if the payload contains a "TCP segment", the response is sent in reply to the received TCP details.)
However, to recap: any ARP/NDP response from a "hostB" destination device will stay on the same subnet as that device, and so will not go back to a "hostA" on a different subnet. Also, merely receiving an IP packet doesn't trigger a response from "hostB" back to "hostA". (Processing the "IP packet" payload, however, might trigger some sort of response, depending on what is actually in that payload.)
Now, going back to the first part of the question:
a MAC address is needed to transmit a message to a node within the same physical network
Yes.
or to a switch/router.
This isn't as common.
You may send a "Layer 2" network frame through a "Layer 2" switch (or through a [Layer 3] router, since that device can also forward [Layer 2] network frames as needed). (Note: Similar to how a Layer 3 router can handle the task of a Layer 2 switch, a Layer 4 firewall can handle the task of a Layer 3 router, including the task of a Layer 2 switch.) When the "Layer 2" switch receives a "network frame", it looks at the frame's destination MAC-48 address and forwards that traffic out of the appropriate port (or out of every other port, if it doesn't yet know where that address is).
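A minimal sketch of how a learning switch might decide where to send a frame, assuming a plain dictionary as its MAC address table (what vendors often call a CAM table). `LearningSwitch` and its methods are hypothetical names, and real switches also age out table entries and handle VLANs, which this sketch ignores.

```python
class LearningSwitch:
    """Toy Layer 2 switch that learns which port each source MAC-48 address is on."""

    BROADCAST = "FF-FF-FF-FF-FF-FF"

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # MAC-48 address -> port number

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: the sender is reachable through the port the frame arrived on.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if dst_mac == self.BROADCAST or out_port is None:
            # Broadcast or not-yet-learned destination: flood to every other port.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            # Destination is on the segment the frame came from: filter (drop) it.
            return []
        return [out_port]

sw = LearningSwitch([1, 2, 3, 4])
flooded = sw.receive(1, "02-00-00-00-00-0A", "FF-FF-FF-FF-FF-FF")  # e.g. an ARP request
reply_ports = sw.receive(3, "02-00-00-00-00-0B", "02-00-00-00-00-0A")
later = sw.receive(1, "02-00-00-00-00-0A", "02-00-00-00-00-0B")
```

Notice that the switch never looks at an IP address: all of its decisions come from the MAC-48 addresses in the frame header.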
However, actually sending traffic "to a switch/router", rather than just "through a switch/router", suggests that the switch/router itself is the desired end destination. That's probably not what you were asking about, though. (Sometimes people "manage" the configuration of a device like a "managed switch" or a [Layer 3] router. This is most commonly done using protocols, like HTTP, which use an IP address on that device. A device doesn't necessarily need an IP address at all to effectively serve as a "Layer 2" switch.)
There might be some exceptions to this. "MikroTik RouterBOARD" devices often support a ["Layer 2"-based] protocol called "MAC-Telnet", which is different from the standard "Telnet" protocol that runs over TCP/IP. The biggest limitation keeping MAC-Telnet from being more useful may just be that software supporting that protocol is not pre-installed on many computers/devices. (The next-biggest limitation is that its "Layer 2" nature tends to limit the traffic's range to devices within a single subnet.)
Traffic not reaching higher layers
The only difference I see is that[,] by using a MAC address instead of an IP address[,] all other hosts in the same network discard the frame on L1 because they use a different MAC address.
No, not Layer 1 (as quoted above, where it says "on L1"). Instead, "all other hosts in the same network discard the frame on" Layer 2 "because they use a different MAC address." (The word "frame" is a "Layer 2" word, e.g., "Ethernet frame", "Wi-Fi frame".)
Since you said "discard the frame", the word "frame" means we are now talking about Layer 2. Also, the phrase "MAC address" was in the above-quoted text, so that also indicates "Layer 2".
An example of "Layer 1" actually losing traffic would be a cut cable: Layer 1 can no longer carry signals, so the traffic gets lost. Another would be a microwave oven emitting radiation that interferes with the radio frequencies used by Wi-Fi, so Wi-Fi traffic gets garbled. (Hopefully Layer 2 will notice either of these problems. Wi-Fi acknowledges unicast frames at Layer 2 and re-transmits frames that go unacknowledged, so the Wi-Fi issue may resolve itself once the interference stops (when the microwave oven stops cooking). Ethernet, by contrast, simply discards a frame that fails its CRC check, leaving any recovery to higher layers such as TCP. Either way, re-transmitting over a broken cable won't work until the cable gets repaired/replaced.)
discard the frame [...] So they do not process the frame on higher layers.
Since it's unclear whether you meant the frame is discarded at Layer 1 (as stated) or at Layer 2 (where frames get handled), let me address both by going over some cases where information doesn't get processed by a higher layer.
If there is a Layer 1 issue on a direct-connection link, such as a console port, then the traffic is lost. You may get some jumbled data (basically looking like random bits), and the data is not re-transmitted. Examples are the "rollover cable" connection and the "phone line" connection that I mention in my other answer.
An "Ethernet hub" is basically a Layer 1 device: it repeats (regenerates) the electrical signal arriving on one port out of all of its other ports, without looking at frames or MAC addresses at all.
As for the quoted conclusion, that's correct: if a frame isn't getting past a lower layer, then the frame's contents don't get processed. If Layer 2 doesn't pull apart a frame, notice an ARP packet, and respond to ARP, then no ARP response will be given. If Layer 2 doesn't pull apart a frame, notice an IPv4 packet, and send that packet to the IPv4-handling software (which implements the lower portion of the "TCP/IP stack"), then the computer doesn't do any further processing of the IP packet.
(Likewise, if a received IP packet is meant for TCP port 80 on a different computer, then a router may help by trying to route the traffic, and a non-routing "host" may just drop the packet. So a local web server listening on port 80 would never get that TCP traffic, because the IP layer didn't extract the TCP segment and hand it to the local machine's TCP-handling software.)
Some protocols use buffers so that data can be re-sent if a transmission fails. For instance, an "Ethernet card" uses the Ethernet protocol and has "buffer" memory built into its hardware. A "network switch" may use a "CAM table" in its memory to map Ethernet (MAC-48) addresses to individual switch ports, and it also buffers frames. TCP has buffers that store data temporarily until confirmation is received, which is part of how TCP is able to act in a way that people often refer to as "reliable" (since lost data can get re-transmitted).
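Before transmitting, a half-duplex Ethernet device (one using CSMA/CD) checks the Layer 1 medium and notices if the line is busy because another device is transmitting. Once the device notices the medium is idle (no other carrier detected), it may transmit. (Modern full-duplex, switched Ethernet links don't need this check, since each direction has its own path.)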
If there is a Layer 1 problem, such as multiple devices transmitting at once, you get a "collision". The traffic on Layer 1 is jumbled, with all of its information lost. The device supporting Layer 2 may realize this and follow the Layer 2 protocol rules: the Ethernet devices send out a "jamming signal", and each then waits for a random "back-off period" (chosen by Ethernet's "truncated binary exponential backoff" rule) before re-transmitting the frame, which is still in its buffer.
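The random back-off that classic half-duplex Ethernet uses after a collision can be sketched in Python as below. It follows the commonly described "truncated binary exponential backoff" rule: wait a random number of slot times, with the range doubling after each collision, capped at 2^10 choices, and give up after 16 attempts. The function names are illustrative and not taken from any real driver.

```python
import random

SLOT_TIME_BITS = 512  # one slot time = 512 bit times in classic 10/100 Mb/s Ethernet
BACKOFF_LIMIT = 10    # the random range stops doubling after 10 collisions
ATTEMPT_LIMIT = 16    # after 16 collisions the frame is given up on

def backoff_slots(collisions, rng=random):
    """Truncated binary exponential backoff for half-duplex (CSMA/CD) Ethernet.

    After the n-th collision of the same frame, wait k slot times, where k is
    chosen uniformly from 0 .. 2**min(n, 10) - 1.
    """
    if collisions < 1:
        raise ValueError("backoff only applies after at least one collision")
    if collisions >= ATTEMPT_LIMIT:
        raise RuntimeError("excessive collisions: frame dropped")
    exponent = min(collisions, BACKOFF_LIMIT)
    return rng.randint(0, 2 ** exponent - 1)

def backoff_seconds(slots, bits_per_second=10_000_000):
    """Convert a number of slot times into seconds at the given link speed."""
    return slots * SLOT_TIME_BITS / bits_per_second

rng = random.Random(0)  # seeded only so the example is repeatable
waits = [backoff_slots(n, rng) for n in range(1, ATTEMPT_LIMIT)]
print(waits)
```

Because each colliding device picks its own random wait, they are unlikely to collide again on the retry, and the growing range spreads retries out further when the medium is heavily contended.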
When a device supports Layer 3 using IPv6 or IPv4, your data may just be sent and forgotten. The same is true if you rely on UDP as a "Layer 4" protocol, and so sending a Layer 4 "datagram" using UDP is called "unreliable" just because UDP doesn't provide any confirmation of delivery. However, if you use TCP as the "Layer 4" protocol, you get a protocol that is "reliable" by design, because the sending device stores the outgoing traffic in a (Layer 4, TCP) buffer before it sends the traffic out via Layer 3 (e.g., in an "IP packet"). A copy of that "TCP segment" remains in the sender's outgoing buffer until the sender receives an acknowledgment from the TCP recipient. If the TCP sender never gets such an acknowledgment, then it re-sends that "TCP segment". (If segments arrive out of order, the receiver uses their sequence numbers to put the data back in the right order before handing it to the application.)
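To make the buffering and reordering ideas concrete, here is a toy Python model of a TCP-style receiver that reassembles segments by byte sequence number, plus a sender-side helper that drops buffered segments once they are acknowledged. It is a simplified sketch with hypothetical names: sequence-number wraparound, windows, partially overlapping segments, and retransmission timers are all ignored.

```python
class ReassemblyBuffer:
    """Toy TCP-style receiver that puts segments back in order by sequence number.

    Sequence numbers count bytes, as in TCP. Wraparound, receive windows,
    and partially overlapping segments are deliberately ignored.
    """

    def __init__(self, initial_seq):
        self.next_seq = initial_seq   # next byte expected; also the cumulative ACK value
        self.pending = {}             # seq -> data, for segments that arrived early
        self.delivered = bytearray()  # bytes handed to the application, in order

    def receive(self, seq, data):
        """Accept one segment and return the ACK number to send back."""
        if seq >= self.next_seq:
            # Duplicates of already-delivered data (seq < next_seq) are ignored.
            self.pending.setdefault(seq, data)
        # Deliver every segment that is now contiguous with what we already have.
        while self.next_seq in self.pending:
            chunk = self.pending.pop(self.next_seq)
            self.delivered += chunk
            self.next_seq += len(chunk)
        return self.next_seq


def release_acked(unacked, ack):
    """Sender side: keep only buffered segments not yet fully covered by `ack`.

    Anything still in the returned dict would be re-sent if no ACK arrives in time.
    """
    return {seq: data for seq, data in unacked.items() if seq + len(data) > ack}


rx = ReassemblyBuffer(initial_seq=1000)
ack_after_late = rx.receive(1005, b"world")   # arrives first, out of order
ack_after_first = rx.receive(1000, b"hello")  # fills the gap
still_unacked = release_acked({1000: b"hello", 1005: b"world"}, ack_after_late)
```

The early segment waits in `pending` until the gap is filled, and because the first ACK only covers up to byte 1000, the sender keeps both segments buffered.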
Details on Using MAC-48
(Back to more details from the main question...)
So: ARP and NDP are used to get the MAC address of a host within the same (!) physical network by using its IP address.
Yes. (If you have a MAC address and want the IPv4 address, RARP, Reverse-ARP, may be used. That isn't done quite so commonly.)
NDP/IPv6 works similarly to ARP/IPv4. I'm used to referring to ARP/IPv4, but do consider NDP to act similar in regards to what I'm about to say.
Switches flood broadcast frames, including ARP requests, out of every port in the same Layer 2 broadcast domain, so the request reaches all of that domain's segments. A device running IPv4 may respond to ARP requests sent to the Ethernet broadcast address (MAC-48 address FF-FF-FF-FF-FF-FF).
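For illustration, the Python sketch below builds the bytes of an Ethernet frame carrying an ARP request, following the field layout of RFC 826 (without the trailing CRC). The helper names and addresses are hypothetical, and it only constructs the bytes: actually putting them on the wire would need a raw socket, which is OS-specific and usually requires elevated privileges. Real network cards also pad such a frame to Ethernet's 60-byte minimum before adding the 4-byte CRC.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806

def mac_to_bytes(mac):
    """Convert 'AA-BB-CC-DD-EE-FF' (or colon-separated) text to 6 raw bytes."""
    return bytes.fromhex(mac.replace("-", "").replace(":", ""))

def build_arp_request(sender_mac, sender_ip, target_ip):
    """Return an Ethernet II header plus an ARP request (no padding, no CRC)."""
    eth_header = struct.pack("!6s6sH", BROADCAST_MAC, mac_to_bytes(sender_mac), ETHERTYPE_ARP)
    arp_body = struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length (MAC-48)
        4,                            # protocol address length (IPv4)
        1,                            # opcode: 1 = request, 2 = reply
        mac_to_bytes(sender_mac),
        socket.inet_aton(sender_ip),
        bytes(6),                     # target MAC is unknown, so all zeros
        socket.inet_aton(target_ip),  # "who has this IP address?"
    )
    return eth_header + arp_body

frame = build_arp_request("02-00-00-00-00-0A", "192.0.2.10", "192.0.2.20")
print(len(frame), frame.hex())
```

The frame's destination is the broadcast MAC, and the IP address being asked about sits inside the ARP body, not in the Ethernet header.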
Note that ARP and NDP are unroutable, by design. (Routers won't take an ARP request and send it to another IP network segment. Although, a router might act like a switch, so information may flow through a Layer 3+ router, but that is just because of how the router is acting like a switch at the layer 2 level, not because of how the router is applying Layer 3 logic.)
So what happens if the device is on the same subnet? If a Layer 3+ device uses ARP or NDP, it may pay attention to some Layer 2 MAC-48 addresses.
Quick side note: As you might expect, the network driver may notice incoming traffic on the device's own Layer 2 MAC-48 address. Most Layer 3 devices will then process an "IP packet" to see if the destination address belongs to the same device, and otherwise discard the IP packet. Routers might process an IP packet further, in order to figure out what routing needs to occur.
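The network driver might also pay attention to the MAC-48 broadcast address of FF-FF-FF-FF-FF-FF. If the frame carries an ARP request, then the device will check whether the request is for one of its Layer 3 addresses. If so, then an ARP response may be given back (to whatever MAC address broadcast the request). (NDP works similarly, but since IPv6 has no broadcast, its Neighbor Solicitations are sent to a "solicited-node" multicast address, and the device listens on the Layer 2 multicast addresses that correspond to its own IPv6 addresses.)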
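The receive-side decision for ARP (the IPv4 case, not NDP) might look roughly like this toy model. It uses simple dataclasses instead of raw bytes, and `Host`, `ArpMessage`, and `handle_arp` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

BROADCAST = "FF-FF-FF-FF-FF-FF"

@dataclass(frozen=True)
class ArpMessage:
    op: str          # "request" or "reply"
    sender_mac: str
    sender_ip: str
    target_mac: str
    target_ip: str

@dataclass(frozen=True)
class Host:
    mac: str
    ips: frozenset

    def accepts(self, dst_mac: str) -> bool:
        """Layer 2 filter: keep frames sent to our own MAC or to the broadcast address."""
        return dst_mac.upper() in (self.mac.upper(), BROADCAST)

    def handle_arp(self, dst_mac: str, msg: ArpMessage) -> Optional[Tuple[str, ArpMessage]]:
        """Return (reply_frame_dst_mac, reply) if we should answer, else None."""
        if not self.accepts(dst_mac):
            return None  # frame not addressed to us: discarded at Layer 2
        if msg.op != "request" or msg.target_ip not in self.ips:
            return None  # not asking about one of our IP addresses
        reply = ArpMessage("reply", self.mac, msg.target_ip, msg.sender_mac, msg.sender_ip)
        # The reply is unicast straight back to the MAC that broadcast the request.
        return msg.sender_mac, reply

host_b = Host("02-00-00-00-00-0B", frozenset({"192.0.2.20"}))
request = ArpMessage("request", "02-00-00-00-00-0A", "192.0.2.10", "00-00-00-00-00-00", "192.0.2.20")
answer = host_b.handle_arp(BROADCAST, request)
```

Every host on the segment receives the broadcast, but only the one owning the requested IP address answers, and it answers with a unicast frame.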
Why Use MAC-48
So: ARP and NDP are used to get the MAC address of a host within the same (!) physical network by using its IP address. But what is the benefit of this?
The benefit is... that's how things work.
You need the MAC-48 address to send out a proper Layer 2 frame (e.g. an Ethernet frame or a Wi-Fi frame).
Let's consider if this wasn't done. (Let's analyze what would happen in a non-working way.) Actually, you probably can't even do this. If a network card supports only the Layer 2 protocol of Ethernet, the "network driver" will only support using Ethernet frames, so the "operating system" doesn't have a way to write an "IP packet" directly onto the wire. But even if a device (like a full-blown computer) somehow theoretically did manage to get its communications components (e.g., the RJ-45 port) to send the bits of a bare "IP packet" directly onto a wire, what would happen if we didn't bother sending a proper Layer 2 frame?
The destination computer wouldn't care because its Layer 3 stack is probably not listening to the Layer 1 communication. The remote Layer 2 device would likely just consider the IP packet to be unsupported noise, most likely to be ignored (or maybe treated like noise seen in a collision, and responding that way).
So the benefit of using ARP and NDP to get the MAC address is that you need to communicate using the Layer 2 frames, just because that is what the receiving computer is going to be listening for.
Now, this may lead to the question: Why does the computer only listen to Ethernet frames, instead of listening for IP packets? Could we just skip the entire Layer 2 process?
Well, the theoretical reason we use Layer 2 is that it provides some benefit. For instance, if there is a Layer 1 problem like a "collision", or a "random"-seeming disruption caused by electrical interference, Layer 2 may detect it (e.g., with a CRC check) and, depending on the protocol, re-transmit from its buffer (as Ethernet does after a collision, or as Wi-Fi does when a frame goes unacknowledged), so the data successfully gets sent.
Now, in theory one might think you could just incorporate such functionality into the Layer 3 software. But then you'd be basically merging the functionality of Layers 2 and 3. In theory, functionality of layers could be mixed together a bit in software. Actually, that is quite commonly done with Layers 5-7. There might also frequently be some merged code in a TCP/IP stack, mixing layers 3 and 4 a bit.
But there's usually a pretty strict boundary separating Layers 2 and 3. That way, you can have separate software (e.g., an "Ethernet network driver" or a "Wi-Fi network driver") handling Layer 2, and separate options (e.g., IPv6 or IPv4) for Layer 3. If some detail is specific to "Layer 2" (e.g., Wi-Fi cards need an SSID and possibly login credentials before they can connect), the "Layer 2" network driver can handle it. A different "Layer 2" network driver, for an Ethernet card, might not need that complexity at all. And either way, your IPv6 and IPv4 drivers won't be affected, because IPv6 and IPv4 are at Layer 3.
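One way to picture that boundary in code is an interface that the Layer 3 code depends on, with interchangeable Layer 2 drivers behind it. The sketch below is purely illustrative: a `typing.Protocol` plus hypothetical fake drivers that just record what they would send. Real operating systems structure their network stacks very differently in detail.

```python
from typing import Protocol

class LinkLayer(Protocol):
    """What the (hypothetical) IP layer needs from any Layer 2 driver."""
    def send(self, dst_mac: str, ethertype: int, payload: bytes) -> None: ...

class FakeEthernetDriver:
    def __init__(self) -> None:
        self.sent = []  # records frames instead of touching hardware

    def send(self, dst_mac: str, ethertype: int, payload: bytes) -> None:
        self.sent.append(("ethernet", dst_mac, ethertype, payload))

class FakeWifiDriver:
    def __init__(self, ssid: str) -> None:
        self.ssid = ssid  # a Wi-Fi-only detail that the IP layer never sees
        self.sent = []

    def send(self, dst_mac: str, ethertype: int, payload: bytes) -> None:
        self.sent.append(("wifi:" + self.ssid, dst_mac, ethertype, payload))

class IPv4Layer:
    ETHERTYPE_IPV4 = 0x0800

    def __init__(self, link: LinkLayer) -> None:
        self.link = link

    def send_packet(self, next_hop_mac: str, packet: bytes) -> None:
        # Identical code no matter which Layer 2 driver is plugged in underneath.
        self.link.send(next_hop_mac, self.ETHERTYPE_IPV4, packet)

eth, wifi = FakeEthernetDriver(), FakeWifiDriver("ExampleNet")
for driver in (eth, wifi):
    IPv4Layer(driver).send_packet("02-00-00-00-00-01", b"ip-packet-bytes")
```

The SSID lives entirely inside the Wi-Fi driver; `IPv4Layer` neither knows nor cares which kind of link it is using.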
When learning networking, it helps to know which layer a tool works at: a web browser communicates using Layer 4 TCP ports, ping uses ICMP at Layer 3, and a more rarely implemented protocol called "MAC-Telnet" communicates using just Layer 2. ARP may expose communication troubles at Layer 2, in which case you know that Layers 3 and 4 won't work until the ARP table successfully shows the needed Layer 2 address.
For communications at higher layers to work, the communications will need to go up the network stack.
Layer Interaction
Some of the next paragraphs might not show new concepts as much as they simply show some details on how theory gets implemented. I figure some readers might not need these details, while others might find this provides a bit of clarity that helps solidify the concepts. (Especially for those who might not have needed it, this next little bit may feel a bit redundant with the previous paragraphs which discussed the theory a bit more.)
Since different software may communicate using these different layers, it is going to be helpful to familiarize yourself with the differences between Layers 2, 3, and 4. (I show some specific examples in my first answer to this same "question" post, in the section called "Layer-Based Terminology".)
When learning these networking details, it will be helpful to think of Layers 1 through 4 as pretty separate, with each layer communicating only with the layer directly above and below it. (Granted, that might not be entirely true, since some software merges the implementation of multiple layers a bit, probably to be more efficient. However, such software should act in a way that is compatible with what would happen if the layers were more separated, so you are likely to learn faster if you keep things simple in your mind by keeping the layers separate, at least mentally. That can help you focus on only the most applicable layers.)
The upper half of the TCP/IP stack, Layer 4, handles TCP and UDP. These protocols communicate only with Layer 3 protocols (like IPv6 and IPv4) and with the layer(s) above. (Layers 5 through 7 are often implemented together, so although they have different purposes, they are often not as separated as other layers.) TCP and UDP never communicate with Layer 2 protocols like Ethernet or Wi-Fi, and they certainly never deal with Layer 1 issues like cable congestion or airwave collisions. When software opens a UDP socket to send datagrams, that is a Layer 4 request. When software asks the computer to "listen" on a specific TCP port and pass along any traffic received on that port, that is a Layer 4 request.
In the lower half of the TCP/IP stack, Layer 3 protocols like IPv6 and IPv4 happily communicate with Layer 4 protocols such as TCP and UDP, as well as with Layer 2 protocols like Ethernet or Wi-Fi. However, the TCP/IP stack software typically ignores Layer 1. So if raw signals come in over Layer 1 (an Ethernet cable, or the airwaves for Wi-Fi) without forming a valid Layer 2 frame, the TCP/IP software never even sees them.
Because of the details from this previous paragraph, if you send an IP packet by itself over the wire, the receiving computer will typically ignore it.
The network driver (often together with the network card's hardware) implements Layer 2 communications. It happily communicates with Layer 3 (the IP-handling portion of the TCP/IP stack), and it also pays attention to Layer 1 details (is the connection "up", or does the device report "no carrier"?). It listens for Layer 2 frames sent to a supported MAC-48 address, and if it receives such a frame, it investigates what is in that frame. For example, it performs data verification (checking whether the frame's Cyclic Redundancy Check ("CRC") value matches). If the frame is an ARP request sent to its own MAC-48 address or to the Ethernet broadcast address (typically just FF-FF-FF-FF-FF-FF), or an NDP message sent to a multicast address it listens on, then it gets handled as noted earlier. On the other hand, if the frame carries an IP packet, then the driver pulls the IP packet out of the frame, sends it up to the lower half of the TCP/IP stack, and disregards the rest of the frame.
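The driver's receive steps described above can be approximated with the Python sketch below. It assumes the common convention that Ethernet's frame check sequence is the same CRC-32 that `zlib.crc32` computes, stored least-significant byte first. The function names are hypothetical, and in practice much of this checking happens in the network card's hardware.

```python
import struct
import zlib

BROADCAST = b"\xff" * 6
ETHERTYPE_IPV4 = 0x0800
ETHERTYPE_ARP = 0x0806

def add_fcs(frame: bytes) -> bytes:
    """Append a CRC-32 frame check sequence (zlib.crc32, least-significant byte first)."""
    return frame + struct.pack("<I", zlib.crc32(frame))

def receive(own_mac: bytes, wire: bytes):
    """Toy version of a NIC/driver handling one incoming frame.

    Returns (where_it_goes, payload); dropped frames return a reason and b"".
    """
    frame, fcs = wire[:-4], wire[-4:]
    if struct.pack("<I", zlib.crc32(frame)) != fcs:
        return "dropped: bad CRC", b""
    dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if dst_mac not in (own_mac, BROADCAST):
        return "dropped: not our MAC", b""
    payload = frame[14:]  # everything after the 14-byte Ethernet header
    if ethertype == ETHERTYPE_IPV4:
        return "ipv4", payload  # handed up to the IP layer of the TCP/IP stack
    if ethertype == ETHERTYPE_ARP:
        return "arp", payload   # handled by the ARP code
    return "dropped: unsupported EtherType", b""

me = bytes.fromhex("02000000000b")
header = me + bytes.fromhex("02000000000a") + struct.pack("!H", ETHERTYPE_IPV4)
good = add_fcs(header + b"fake ip packet")
corrupted = bytearray(good)
corrupted[20] ^= 0x01  # flip one bit inside the payload
```

Only a frame that passes the CRC check and is addressed to this host ever reaches the IP layer; a single flipped bit or a different destination MAC stops it at Layer 2.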
If the local network is connected to a switch or router then I could use only the IP address of the target even if it is in the same network.
Not quite right. Here is what is right:
- If you are connected through a "layer 2"-only switch which isn't doing Layer 3 handling, that device is not going to pay any attention to the IP address in the embedded IP packet. You're relying on Layer 2 at this stage.
- When using a switch, you are using "Layer 2" frames, so you would use MAC-48 addresses. A "layer 2" switch won't care about the IP address at all.
- You can also likely use an "IP address" of any device on the same network, because trying to communicate with IP will result in an IP packet being inserted into a "Layer 2" "network frame" that the switch can handle nicely.
- When using a router, you can use the IP address of any device that is reachable via routing, potentially on the other side of the city/nation/planet.
If you're on the same subnet, then you're likely on the same Ethernet broadcast domain, so you can use the Ethernet broadcast address (FF-FF-FF-FF-FF-FF is commonly supported by hardware/drivers) to use ARP so that you can convert a local device's IP address to a local device's MAC-48 address. You'll need that destination device's MAC-48 address to be able to communicate using IP for a device on the local subnet.
Although not just asked about, what happens if the remote device is not on the same local subnet? The local computer initiating that IP packet never figures out the MAC-48 address of the device with the destination IP address, and that's fine. The initiating computer notices that the destination IP address is on a different subnet. It then takes the IP packet, which still shows the destination IP address (of the very remote device), and sends it in a newly created Layer 2 network frame. To make that frame, the initiating computer needs the MAC-48 address of a "gateway device" (e.g., its default gateway) on the same subnet, which it gets from its neighbor cache or via ARP/NDP.
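The "same subnet or gateway?" decision can be sketched with Python's standard `ipaddress` module. This simplified `next_hop` helper (a hypothetical name) assumes a single interface with one default gateway; real hosts consult a full routing table using longest-prefix matching.

```python
import ipaddress

def next_hop(interface: str, dst_ip: str, default_gateway: str) -> str:
    """Return the IP address whose MAC-48 address must be resolved (ARP/NDP).

    `interface` is this host's address with prefix length, e.g. "192.0.2.10/24".
    """
    local_net = ipaddress.ip_interface(interface).network
    if ipaddress.ip_address(dst_ip) in local_net:
        return dst_ip            # same subnet: resolve the destination itself
    return default_gateway       # other subnet: resolve the gateway's MAC instead

print(next_hop("192.0.2.10/24", "192.0.2.20", "192.0.2.1"))     # same subnet
print(next_hop("192.0.2.10/24", "198.51.100.20", "192.0.2.1"))  # remote
```

In the remote case the IP packet still carries 198.51.100.20 as its destination; only the ARP/NDP lookup (and therefore the frame's destination MAC) targets the gateway.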
The host would send the frame with the target IP address to the switch/router, which knows the MAC address of the target corresponding to the IP address.
No, no, no. Two things wrong here:
The host would send the frame with the target IP address to the switch
A frame doesn't have a field for an IP address. (The "IP address" you see in a frame is simply part of an "IP packet", and that "IP packet" gets inserted into the section of the frame which we call the frame's "payload".)
Also, the host may be sending "through" the switch, expecting the switch will "switch"-forward the traffic. (That is different than sending "to the switch", which sounds like the switch is the ultimate destination. That's not what you were trying to portray.)
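To show where the IP address actually lives, the sketch below reads the destination IP address out of the raw bytes of an Ethernet frame carrying IPv4. The frame is hand-built for illustration (hypothetical addresses, IPv4 header checksum left as zero), and the function name is hypothetical.

```python
import socket
import struct

def ipv4_dst_from_frame(frame: bytes) -> str:
    """Read the destination IP address from an Ethernet II frame carrying IPv4.

    The Ethernet header holds only destination MAC, source MAC, and EtherType
    (14 bytes); the IP addresses sit inside the payload, in the IPv4 header.
    """
    _dst_mac, _src_mac, ethertype = struct.unpack("!6s6sH", frame[:14])
    if ethertype != 0x0800:
        raise ValueError("frame does not carry IPv4")
    ip_header = frame[14:]
    return socket.inet_ntoa(ip_header[16:20])  # IPv4 destination: header bytes 16-19

# Minimal 20-byte IPv4 header: version/IHL, TOS, total length, ID, flags/fragment,
# TTL, protocol (6 = TCP), checksum (left as 0 here), source IP, destination IP.
ip_header = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 6, 0,
                        socket.inet_aton("192.0.2.10"), socket.inet_aton("198.51.100.20"))
gateway_mac = bytes.fromhex("020000000001")  # the frame goes to the next hop...
host_a_mac = bytes.fromhex("02000000000a")
frame = gateway_mac + host_a_mac + struct.pack("!H", 0x0800) + ip_header
print(ipv4_dst_from_frame(frame))  # ...while the packet is for a distant host
```

The frame's only addressing is the pair of MAC-48 addresses in its first 12 bytes; the IP addresses are just bytes in the payload that the switch never needs to read.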
The host would send the [...] target IP address [through] the [...] router, which knows the MAC address of the target corresponding to the IP address.
The near router is only expected to know (or be able to figure out, and then know) the MAC address of the device on the next hop. If multiple hops are needed, then the device on the next hop won't be the "target corresponding to the IP address".