Final Presentation, ELET 4780
Michael Vistine, Katy Rodriguez, Ralph Walker
Team: Michael Vistine (Engineer), Katy Rodriguez (Engineer), Ralph Walker (Engineer)
 Motivation
 Hardware
 Design Description
 Software
 Design Improvements
 Timeline
 Current Status
 Immediate Tasks
 Conclusion
 Sources
 Questions
Design
• Cluster computing
• Compact
• Active cooling
Raspberry Pi
• Low-cost, powerful device
• Open source code
Modifying Existing Design
• Nodes vs. performance
• Wireless vs. wired performance
• Add components for usability
                Pi 1 B+          Pi 2 B           BeagleBone
Processor       700 MHz          900-1000 MHz     1 GHz
Cores           1                4                1
RAM             512 MB           1 GB             512 MB
Peripherals     4 USB ports      4 USB ports      2 USB ports
GPIO            40 pins          40 pins          65 pins
Memory          Micro SD slot    Micro SD slot    2 GB on board & micro SD
Price           ~$30             ~$35             ~$55
Photos courtesy of pcworld.com, ti.com, and adafruit.com
Insignia 7-Port USB 2.0 Hub (photo courtesy of amazon)
• 5V per hub
• Plug and play
TP-Link TL-WR841N Wireless Router (photo courtesy of amazon)
• 300 Mbps wireless connection
• Adjustable DHCP settings
• Wireless on/off switch
• 4 LAN ports
[Cluster diagram: RPI0 (master node) and slaves RPI1, RPI2, and RPI3 connected through the router on a shared power supply; Open MPI distributes test.c across the nodes]
Final Design
• 3D printed, modeled in SolidWorks
• Plexiglass
• Wired/wireless router
• Heat sinks and PC fans
• Power hub
 OPERATING SYSTEM – Raspbian Jessie (with NOOBS)
◦ Easy to use
◦ Lightweight OS
◦ Open source
◦ Bash terminal interface
◦ Linux/Unix kernel
 Bash terminal, used to:
◦ Edit and create files to configure the OS and ports (e.g. setting up the host names and mounting drives)
◦ Install software packages (e.g. OpenMPI, nfs-kernel-server)
◦ View IP addresses, node settings, and network connections
 Style of syntax used in the terminal:
◦ $ sudo apt-get install <package>, used to install packages
◦ $ sudo nano <file>, used to edit files
 OpenMPI:
◦ An open-source implementation of the Message Passing Interface (MPI), used here to implement parallel computing
◦ Breaks the data into smaller chunks and distributes them to the nodes to run simultaneously
◦ This method increases processing speed and efficiency
◦ Can compile and execute programs in C, C++, and Fortran
◦ The GCC compiler is used to build the program for parallel execution
 First, all packages were updated and installed:
◦ gfortran
◦ nfs-common & nfs-kernel-server
◦ build-essential, manpages-dev
◦ openmpi-bin, openmpi-doc, libopenmpi-dev
◦ etc.
 Enter the configuration menu using sudo raspi-config
 Settings for the master were the same as for the slave nodes:
◦ Set the hostname (rpi0 for the master)
◦ Enable SSH
◦ Overclock to the “Pi 2” setting
◦ Set the GPU memory split to 16
 Install all the same packages as on the master node
 Run sudo raspi-config to set all the same system preferences as the master node
Photo courtesy of www.raspberrypi.org
#include <stdio.h>  // standard input/output library
#include <mpi.h>

int main(int argc, char** argv)
{
    // MPI variables
    int num_processes;
    int curr_rank;
    char proc_name[MPI_MAX_PROCESSOR_NAME];
    int proc_name_len;

    // initialize MPI
    MPI_Init(&argc, &argv);

    // get the number of processes
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);

    // get the rank of the current process
    MPI_Comm_rank(MPI_COMM_WORLD, &curr_rank);

    // get the processor name for the current process
    MPI_Get_processor_name(proc_name, &proc_name_len);

    // check that we're running this process
    printf("Calling process %d out of %d on %s\n", curr_rank, num_processes, proc_name);

    // wait for all processes to finish and shut down MPI
    MPI_Finalize();
    return 0;
}

• Creates user-specified dummy processes of equal size
• Allocates the processes dynamically to each node
• Displays the process number upon completion
#include <stdio.h>
#include <math.h>
#include <mpi.h>

#define TOTAL_ITERATIONS 10000

int main(int argc, char *argv[])
{
    // MPI variables
    int num_processes;
    int curr_rank;

    // keep track of the current for-loop iterations
    int total_iter;
    int step_iter;

    // variables used to calculate pi
    double pi;                  // the final value
    double curr_pi, h, sum, x;  // step variables

    // start up MPI
    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &num_processes);
    MPI_Comm_rank(MPI_COMM_WORLD, &curr_rank);

    // iterate up to TOTAL_ITERATIONS to calculate pi within a certain error margin
    for (total_iter = 2; total_iter < TOTAL_ITERATIONS; total_iter++)
    {
        // init sum
        sum = 0.0;

        // determine step size
        h = 1.0 / (double) total_iter;

        // the current process performs the iterations matching its rank,
        // advanced by multiples of the total number of processes
        for (step_iter = curr_rank + 1; step_iter <= total_iter; step_iter += num_processes)
        {
            // determine the current step
            x = h * ((double) step_iter - 0.5);
            // add the current step's value
            sum += 4.0 / (1.0 + x * x);
        }

        // resolve the sum into this process's share of pi
        curr_pi = h * sum;

        // reduce all processes' partial values to one value on rank 0
        MPI_Reduce(&curr_pi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    }

    // print the final value and error (pi is only valid on rank 0 after the reduce)
    if (curr_rank == 0)
    {
        printf("calculated Pi = %.16f\n", pi);
        printf("Relative Error = %.16f\n", fabs(pi - M_PI));
    }

    // wrap up MPI
    MPI_Finalize();
    return 0;
}
 Set all node IP addresses as static:
◦ sudo nano /etc/network/interfaces (edit on all nodes)
◦ This step differs between wired and wireless
◦ For wired, enter a static address on eth0
◦ For wireless, enter a static address on wlan0
 Map all hostnames to the new static IPs:
◦ sudo nano /etc/hosts (edit on all nodes)
◦ Add in the addresses and hostnames, for example:
◦ 192.168.0._ rpi0
◦ 192.168.0._ rpi1
 Now we can ssh from one Pi to another without having to type IP addresses
 Setting up the wireless connection was essentially the same as setting up the wired connection
 We assigned the IP addresses on the wireless router
 Then we went into /etc/hosts and added the new IPs with hostnames
 Added at the bottom of /etc/network/interfaces (for the TP-LINK_7236 network):
iface wlan0 inet static
◦ address 192._._._
◦ netmask 255.255.255.0
◦ gateway 192._._._
 Next, a common user was created on all nodes to allow the nodes to communicate without the need for repeated password entry:
◦ sudo useradd -m -u 2345 mpiu
 Next, the shared directory was created on the master node for the slaves to mount:
◦ sudo mkdir /mirror               // makes the directory
◦ sudo chown mpiu:mpiu /mirror/    // changes ownership
◦ sudo service rpcbind start
◦ sudo update-rc.d rpcbind enable
 sudo nano /etc/exports
◦ Line added at the bottom of the file:
◦ /mirror 192.168.0.0/24(rw,sync)
◦ This line allows all IP addresses from 192.168.0.0 to 192.168.0.255 to mount this directory
◦ This is a possible point of concern when it comes to wireless communication
 Next, the NFS server was restarted and we ssh'd from rpi0 to rpi1
 Then “$ sudo mount rpi0:/mirror /mirror” actually mounts the share on the slave
 These steps were repeated for all slave nodes
 SSH keys generated using:
◦ ssh-keygen -t rsa
◦ A passphrase is recommended
◦ A randomart image of the key is then displayed
 Next, the key is copied to the slave nodes using:
◦ ssh-copy-id mpiu@rpi1
 “keychain” logic added to the .bashrc file
Photo courtesy of visualgdb.com
 Log in as mpiu on the master node using:
 su - mpiu
 Switch to the /mirror/code/ directory, which holds the MPI test programs, using cd
 mpicc calc_pi.c -o calc_pi        // compiles the program
 time mpiexec -n 4 -H rpi0,rpi1,rpi2,rpi3 calc_pi   // executes the program on the master node and distributes it to the nodes via the mounts
 The output is the solution and the time it took to execute
 Here you can see the .c files and the executables in the directory
 You can see the execution of the program with mpiexec
 Initially we assumed the code wasn’t working correctly, but this proved to be an incorrect diagnosis
 The times we were seeing did not make much sense
 We ran the MPI tests on wired and wireless connections and found the processing times to be inconsistent
 This led us to determine we had an issue with the mounts on the nodes
 The main issue was that the nodes wouldn’t read the mirrored programs off the master
 We are still in the process of improving the design and graphically interpreting the data
 Wired vs. wireless performance
◦ Test the processing performance of the cluster when:
 Hard-wired to the router
 Using dongles for each node to communicate wirelessly
 Use Wireshark to observe packet latency between nodes
 Computational benchmark tests
◦ Use benchmark software to observe total processing power across all Pis
◦ Use a complicated program as test material to solve with the cluster
 Graphical performance info
 Implementation of practical applications
 Active cooling for the Pis
◦ Adding fans to the final case design
Research → Investigate Improvements → Build Prototype → Implement Improvements → Build Final Design
Part                 Price per item   Quantity   Total    Link
Micro SDs            3.28             6          19.68    http://www.newegg.com/Product/Product.a
Micro USBs           4.69             4          18.76    http://www.amazon.com/AmazonBasics-Mic
Ethernet cables      0.82             4          3.28     http://www.newegg.com/Product/Product.a
WiFi dongles         7.99             4          31.96    http://www.amazon.com/Kootek-Raspberry
Router (4-8 ports)   33.99            1          33.99    http://www.newegg.com/Product/Product.a
Raspberry Pis        41.60            4          166.40   http://www.amazon.com/Raspberry-Pi-Mod
Heat sinks           2.41             4          9.64     http://www.amazon.com/Cooling-Aluminium
Dual router          19.99            1          19.99    http://www.frys.com/product/8445718?site=
Fans                 3.95             2          7.90     http://www.tannerelectronics.com
Makerspace           35.00            1          35.00    https://dallasmakerspace.org
Power USB hub        29.99            1          29.99    http://www.bestbuy.com/site/insignia-7-po
Total of all parts                               376.59
 Diagnosing the mounting issue
 Wireless and wired communication working
 Final equipment list acquired
 Measuring and sketching the layout of the case structures for the laser cutter
 Compare wired vs. wireless performance
◦ Detailed documenting and graphing of test results
 Continue debugging and improving the system
◦ Finish debugging the mounting issue
 Finish the first prototype of the final case design
◦ Measuring and cutting the structure of the case
 Wired and wireless connection is complete
 Debugging NFS and mounting issues
◦ Continuously running performance tests
 Final case design blueprint is complete
 http://www.python.org/doc/current/tut/tut.html
 http://likemagicappears.com/projects/raspberry-pi-cluster/
 http://www.zdnet.com/article/build-your-own-supercomputer-out-of-raspberry-pi-boards/
 https://Youtu.be/R0Uglgcb5g
 http://www.newegg.com/
 http://www.amazon.com
 http://anllyquinte.blogspot.com/
 http://www.slideshare.net/calcpage2011/mpi4pypdf

Senior Design: Raspberry Pi Cluster Computing
