Serverless (peer-to-peer) networking Paul Krzyzanowski [email_address] [email_address] Distributed Systems Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
Peer-to-peer models Client-server computing servers provide special services to clients clients request service from a server Pure peer-peer computing all systems have equivalent capability and responsibility symmetric communication Hybrid peer-to-peer where servers facilitate interaction between peers
Evolution of the Internet (services) First generation multiple smaller webs telnet, ftp, gopher, WAIS Second generation Mosaic browser retrieval process hidden from user merge all webs into a world-wide-web Third generation peer-to-peer (?) distributed services; distribution hidden from user
Clients are generally untapped Large business client layer might have: 2000 clients × 50 GB/client = 100 TB spare storage 2000 clients × 300 MHz/client × 9 ops/cycle = 5.4 trillion ops/second spare computing
Current peer-to-peer models
Distributed file caching Akamai Buy thousands of servers and distribute them around the world Cache pages that don’t change a lot Users annotate content on their web sites to point to akamai servers Advantages Higher availability Better performance Most references in the same network as yours. Rapid expansion is easy for an organization
Directory server mediated file sharing Users register files in a directory for sharing Search in the directory to find files to copy Central directory, distributed contents Napster Started by 19-year-old college dropout Shawn Fanning Stirred up legal battles with $15B recording industry Before it was shut down: 2.2M users/day, 28 TB data, 122 servers Access to contents could be slow or unreliable
Peer-to-peer file sharing Users register files with network neighbors Search across the network to find files to copy Does not require a centralized directory server Use time-to-live to limit hop count Gnutella Created by author of WinAMP (AOL shut down the project) Anonymous: you don’t know if the request you’re getting is from the originator or the forwarder KaZaA Supernodes: maintain partial uploaded directories and lists of other supernodes
Peer-to-peer file sharing BitTorrent To distribute a file: .torrent file : name, size, hash of each block, address of a tracker server. Start a seed node ( seeder ): initial copy of the full file To get a file: Get a .torrent file Contact tracker – tracker manages uploading & downloading of the archive: get list of nodes with portions of the file Tracker will also announce you Contact a random node for a list of block numbers request a random block of the file
Example: The Pirate Bay Torrent tracker (indexing site) > 12 million peers About 50% seeders, 50% leechers Risk: indexing sites can be shut down
Cycle sharing aka Grid Computing aggregate autonomous computing resources dynamically based on availability, capability, performance, cost. Example: Intel NetBatch >70% workstations idle, 50% servers idle Developed NetBatch c.1990 Stopped buying mainframes in 1992 1990: 100 machines 2000: >10K machines across ~20 sites 2.7 million jobs/month
Cycle sharing Example: [email_address] Scan radio telescope images Chunks of data sent to client in suspend mode (runs as screensaver) Data processed by clients when not in use and results returned to server
Cycle sharing Example: distributed.net code breaking RC5: 72 bits total keys tested: 2.315×10 19 (19.35 quintillion) total to search: 4.722×10 21 overall rate: 1.36×10 11 keys per second % complete: 0.490% 1,973 days RC5-64 challenge: total keys tested: 15.27×10 18 total to search: 18.45×10 18 overall rate: 1.024×10 11 keys per second % complete: 82.77 1,726 days
Tons of distributed efforts Berkeley Open Infrastructure for Network Computing (BOINC): boinc.berkeley.edu Choose projects Download software BOINC Manager coordinates projects on your PC When to run: location, battery/AC power, in use, range of hours, max % CPU http://boinc.netsoft-online.com/
Tons of distributed efforts [email_address] Climateprediction.net [email_address] [email_address] [email_address] BBC Climate Change Experiment [email_address] World Community Grid SIMAP SZTAKI Desktop Grid PrimeGrid uFluids MalariaControl and lots more… http://boinc.netsoft-online.com/
File servers Central servers Point of congestion, single point of failure Alleviate somewhat with replication and client caching E.g., Coda Limited replication can lead to congestion Separate set of machines to administer But … user systems have LOTS of disk space 350 GB is common on most systems 500 GB 7200 RPM Samsung SpinPoint T Series: $99 Berkeley xFS serverless file system
Amazon S3 (Simple Storage Service) Web services interface for storing & retrieving data Read, write, delete objects (1 byte – 5 GB each) Unlimited number of objects REST & SOAP interfaces Download data via HTTP or BitTorrent Fees $0.15 per GB/month $0.13 - $0.18 per GB transfer out $0.01 per 1,000 PUT/LIST requests $0.01 per 10,000 GET requests
Google File System Component failures are the norm Thousands of storage machines Some are not functional at any given time Built from inexpensive commodity components Datasets of many terabytes with billions of objects GFS cluster Multiple chunkservers Data storage: fixed-size chunks Chunks replicated on several systems (3 replicas) One master File system metadata Mapping of files to chunks
Ad hoc networking and service discovery
Ad-hoc networking and auto-discovery Device/service discovery and control Sun’s JINI Microsoft, Intel: UPnP Universal Plug and Play architecture http://www.upnp.org Networking Unreliable: nodes added/removed unpredictably Programs need to talk to programs (services)
UPnP strategy Send data only over network No executables Use standard protocols Leverage standards HTTP, XML Basic IP network connectivity
Communication Between… Control points Controller usually client Device controlled Usually server Device may take on both functions Control Point Device
Step 0 Control point and device get addresses DHCP Or AutoIP IETF draft: automatically choose IP address in ad-hoc IPv4 network Pick address in 169.256/16 range – see if it’s used DHCP request DHCP request DHCP server address address
Step 1 Control point finds device Devices advertise (broadcast) when added Periodic refresh Control points search as needed Devices respond Search for types of service Guarantee minimal capabilities advertise Detect device
Step 2 Control point learns about device capabilities SSDP: Simple Service Discovery Protocol IETF draft Administratively scoped multicast Unicast responses Get URL for description Actions, state variables expressed in XML Response Discover Protocol
Step 3 Control point invokes actions on device Send request, get result SOAP messages Get command Invoke action
Step 4 Control point listens to state changes of device Push model GENA: General Event Notification Architecture IETF draft Event Detect event
Step 5 Control point controls device and/or views device status with HTML Get request http://192.168.1.12/status
Bonjour (Rendezvous ) Apple et al. allocate addresses without a DHCP server Use 169.254/16 zeroconf range translate between names and IP addresses without a DNS server Use IP multicast locate or advertise services without using a directory server Use DNS Structured Instance Names
Mesh Networking Mobile Ad-hoc networks, Sensor networks, … Hop node-to-node until the destination is reached Nodes can act as repeaters to nearby peers Robust connectivity: find alternate routes Dynamic routing Table-based: maintain fresh lists of destinations/routes Reactive: find route on demand Hierarchical Geographical Power-aware Multicast See http://en.wikipedia.org/wiki/Ad_hoc_routing_protocol_list
Mesh Networking Examples: ZigBee (IEEE 802.15.4) 192 kbps 100-1000 ft. range ZenSys Z-Wave See http://en.wikipedia.org/wiki/Ad_hoc_routing_protocol_list
Peer-to-peer usage models Universal file sharing Collaboration Secure file sharing Distributed storage sharing Alleviate need for servers Distributed (GRID) computing Alleviate need for compute servers Intelligent agents Cooperative search engine, others… Location-aware services Ad hoc networks
Issues Security Protection of content Protection against worms, viruses Privacy Predictable connectivity Routing Fault tolerance Naming, resource discovery Standards, interoperability
The End

Serverless (Distributed computing)

  • 1.
    Serverless (peer-to-peer) networkingPaul Krzyzanowski [email_address] [email_address] Distributed Systems Except as otherwise noted, the content of this presentation is licensed under the Creative Commons Attribution 2.5 License.
  • 2.
    Peer-to-peer models Client-servercomputing servers provide special services to clients clients request service from a server Pure peer-peer computing all systems have equivalent capability and responsibility symmetric communication Hybrid peer-to-peer where servers facilitate interaction between peers
  • 3.
    Evolution of theInternet (services) First generation multiple smaller webs telnet, ftp, gopher, WAIS Second generation Mosaic browser retrieval process hidden from user merge all webs into a world-wide-web Third generation peer-to-peer (?) distributed services; distribution hidden from user
  • 4.
    Clients are generallyuntapped Large business client layer might have: 2000 clients × 50 GB/client = 100 TB spare storage 2000 clients × 300 MHz/client × 9 ops/cycle = 5.4 trillion ops/second spare computing
  • 5.
  • 6.
    Distributed file cachingAkamai Buy thousands of servers and distribute them around the world Cache pages that don’t change a lot Users annotate content on their web sites to point to akamai servers Advantages Higher availability Better performance Most references in the same network as yours. Rapid expansion is easy for an organization
  • 7.
    Directory server mediatedfile sharing Users register files in a directory for sharing Search in the directory to find files to copy Central directory, distributed contents Napster Started by 19-year-old college dropout Shawn Fanning Stirred up legal battles with $15B recording industry Before it was shut down: 2.2M users/day, 28 TB data, 122 servers Access to contents could be slow or unreliable
  • 8.
    Peer-to-peer file sharingUsers register files with network neighbors Search across the network to find files to copy Does not require a centralized directory server Use time-to-live to limit hop count Gnutella Created by author of WinAMP (AOL shut down the project) Anonymous: you don’t know if the request you’re getting is from the originator or the forwarder KaZaA Supernodes: maintain partial uploaded directories and lists of other supernodes
  • 9.
    Peer-to-peer file sharingBitTorrent To distribute a file: .torrent file : name, size, hash of each block, address of a tracker server. Start a seed node ( seeder ): initial copy of the full file To get a file: Get a .torrent file Contact tracker – tracker manages uploading & downloading of the archive: get list of nodes with portions of the file Tracker will also announce you Contact a random node for a list of block numbers request a random block of the file
  • 10.
    Example: The PirateBay Torrent tracker (indexing site) > 12 million peers About 50% seeders, 50% leechers Risk: indexing sites can be shut down
  • 11.
    Cycle sharing aka Grid Computing aggregate autonomous computing resources dynamically based on availability, capability, performance, cost. Example: Intel NetBatch >70% workstations idle, 50% servers idle Developed NetBatch c.1990 Stopped buying mainframes in 1992 1990: 100 machines 2000: >10K machines across ~20 sites 2.7 million jobs/month
  • 12.
    Cycle sharing Example: [email_address] Scan radio telescope images Chunks of data sent to client in suspend mode (runs as screensaver) Data processed by clients when not in use and results returned to server
  • 13.
    Cycle sharing Example: distributed.net code breaking RC5: 72 bits total keys tested: 2.315×10 19 (19.35 quintillion) total to search: 4.722×10 21 overall rate: 1.36×10 11 keys per second % complete: 0.490% 1,973 days RC5-64 challenge: total keys tested: 15.27×10 18 total to search: 18.45×10 18 overall rate: 1.024×10 11 keys per second % complete: 82.77 1,726 days
  • 14.
    Tons of distributedefforts Berkeley Open Infrastructure for Network Computing (BOINC): boinc.berkeley.edu Choose projects Download software BOINC Manager coordinates projects on your PC When to run: location, battery/AC power, in use, range of hours, max % CPU http://boinc.netsoft-online.com/
  • 15.
    Tons of distributedefforts [email_address] Climateprediction.net [email_address] [email_address] [email_address] BBC Climate Change Experiment [email_address] World Community Grid SIMAP SZTAKI Desktop Grid PrimeGrid uFluids MalariaControl and lots more… http://boinc.netsoft-online.com/
  • 16.
    File servers Centralservers Point of congestion, single point of failure Alleviate somewhat with replication and client caching E.g., Coda Limited replication can lead to congestion Separate set of machines to administer But … user systems have LOTS of disk space 350 GB is common on most systems 500 GB 7200 RPM Samsung SpinPoint T Series: $99 Berkeley xFS serverless file system
  • 17.
    Amazon S3 (Simple Storage Service) Web services interface for storing & retrieving data Read, write, delete objects (1 byte – 5 GB each) Unlimited number of objects REST & SOAP interfaces Download data via HTTP or BitTorrent Fees $0.15 per GB/month $0.13 - $0.18 per GB transfer out $0.01 per 1,000 PUT/LIST requests $0.01 per 10,000 GET requests
  • 18.
    Google File SystemComponent failures are the norm Thousands of storage machines Some are not functional at any given time Built from inexpensive commodity components Datasets of many terabytes with billions of objects GFS cluster Multiple chunkservers Data storage: fixed-size chunks Chunks replicated on several systems (3 replicas) One master File system metadata Mapping of files to chunks
  • 19.
    Ad hoc networkingand service discovery
  • 20.
    Ad-hoc networking andauto-discovery Device/service discovery and control Sun’s JINI Microsoft, Intel: UPnP Universal Plug and Play architecture http://www.upnp.org Networking Unreliable: nodes added/removed unpredictably Programs need to talk to programs (services)
  • 21.
    UPnP strategy Senddata only over network No executables Use standard protocols Leverage standards HTTP, XML Basic IP network connectivity
  • 22.
    Communication Between… Controlpoints Controller usually client Device controlled Usually server Device may take on both functions Control Point Device
  • 23.
    Step 0 Controlpoint and device get addresses DHCP Or AutoIP IETF draft: automatically choose IP address in ad-hoc IPv4 network Pick address in 169.256/16 range – see if it’s used DHCP request DHCP request DHCP server address address
  • 24.
    Step 1 Controlpoint finds device Devices advertise (broadcast) when added Periodic refresh Control points search as needed Devices respond Search for types of service Guarantee minimal capabilities advertise Detect device
  • 25.
    Step 2 Controlpoint learns about device capabilities SSDP: Simple Service Discovery Protocol IETF draft Administratively scoped multicast Unicast responses Get URL for description Actions, state variables expressed in XML Response Discover Protocol
  • 26.
    Step 3 Controlpoint invokes actions on device Send request, get result SOAP messages Get command Invoke action
  • 27.
    Step 4 Controlpoint listens to state changes of device Push model GENA: General Event Notification Architecture IETF draft Event Detect event
  • 28.
    Step 5 Controlpoint controls device and/or views device status with HTML Get request http://192.168.1.12/status
  • 29.
    Bonjour (Rendezvous )Apple et al. allocate addresses without a DHCP server Use 169.254/16 zeroconf range translate between names and IP addresses without a DNS server Use IP multicast locate or advertise services without using a directory server Use DNS Structured Instance Names
  • 30.
    Mesh Networking MobileAd-hoc networks, Sensor networks, … Hop node-to-node until the destination is reached Nodes can act as repeaters to nearby peers Robust connectivity: find alternate routes Dynamic routing Table-based: maintain fresh lists of destinations/routes Reactive: find route on demand Hierarchical Geographical Power-aware Multicast See http://en.wikipedia.org/wiki/Ad_hoc_routing_protocol_list
  • 31.
    Mesh Networking Examples:ZigBee (IEEE 802.15.4) 192 kbps 100-1000 ft. range ZenSys Z-Wave See http://en.wikipedia.org/wiki/Ad_hoc_routing_protocol_list
  • 32.
    Peer-to-peer usage modelsUniversal file sharing Collaboration Secure file sharing Distributed storage sharing Alleviate need for servers Distributed (GRID) computing Alleviate need for compute servers Intelligent agents Cooperative search engine, others… Location-aware services Ad hoc networks
  • 33.
    Issues Security Protectionof content Protection against worms, viruses Privacy Predictable connectivity Routing Fault tolerance Naming, resource discovery Standards, interoperability
  • 34.