Operating Systems for Parallel and Distributed Architectures

Homework #2

Test an MPI program of your choice and benchmark it on your cluster using different numbers of compute nodes. Run it both manually and through the job scheduler available on your Rocks cluster instance. Deadline: Week 14 (January 13th).
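If you have no MPI program at hand, a small benchmark along the lines of the sketch below is enough. It is only an illustration, assuming the MPI compiler wrapper mpicc from the MPI roll installed on your cluster is available; the program name, file name and step count are placeholders:

    /* pi_bench.c - minimal MPI benchmark sketch: approximates pi by numerical
     * integration of 4/(1+x^2) over [0,1]. The interval is split across ranks,
     * so the wall-clock time reported by rank 0 should drop as compute nodes
     * (and therefore MPI ranks) are added.
     * Build with:  mpicc -O2 -o pi_bench pi_bench.c
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const long n = 200000000L;      /* number of integration steps; adjust to taste */
        long i;
        int rank, size;
        double h, x, local = 0.0, pi = 0.0, t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        MPI_Barrier(MPI_COMM_WORLD);    /* start timing with all ranks synchronized */
        t0 = MPI_Wtime();

        h = 1.0 / (double)n;
        for (i = rank; i < n; i += size) {   /* each rank handles every size-th step */
            x = h * ((double)i + 0.5);
            local += 4.0 / (1.0 + x * x);
        }
        local *= h;

        /* combine the partial sums on rank 0 */
        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("ranks=%d  pi=%.12f  time=%.3f s\n", size, pi, t1 - t0);

        MPI_Finalize();
        return 0;
    }

For the manual runs, something like mpirun -np 4 -machinefile hosts ./pi_bench (the exact flags depend on the MPI implementation installed) lets you vary the rank count and the hosts used. For the scheduled runs, wrap the same mpirun line in a submission script for whatever scheduler your Rocks rolls provide (typically SGE, submitted with qsub) and compare the reported times across node counts.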


Dear All, some announcements from my side (Nov. 23rd):

  • Homework Deadline Extension:
    Since installing Rocks version 7 has proven to be more challenging, the deadline for the first homework assignment has been extended to Week 11 (December 9th). I recommend completing the homework using Rocks version 6 and Oracle VirtualBox version 5 (my working cluster with Rocks version 6 runs on Oracle VirtualBox 5.2.44). Please note that other virtualization software might work as well. For those determined enough to stick with Rocks version 7, extra points will be awarded for a fully functioning virtualized cluster 😊.
  • Exam Dates:
After consulting with some of you, I have filled in the exam dates in Academic Info. The dates are:

    • Regular session: February 5, 2025, at 17:00, Room C310 (Campus);
    • Retake session: February 19, 2025, at 17:00, Room C310 (Campus);
    • Additionally, there will be an extra date for the regular session that I will schedule.
  • Clustering Presentation:
    Regarding the clustering-related presentation you need to prepare:

    • Please select an available time slot from this file. (You cannot edit the file yourself.)
    • Send me an email with your chosen time slot and the title of your presentation. Any clustering-related topic is acceptable, but I need to approve it.
    • Presentations with “catchier” topics and engaging delivery will be graded higher.
    • Available time slots are in Week 12, Week 14, and on one Monday during the regular session at 17:00 when your group has no other exams scheduled. If you identify such a date in your schedule, please email me so I can coordinate accordingly.

Homework #1

Homework 1: Install a virtualized cluster containing a head node and at least two compute nodes in a virtual environment such as Oracle VirtualBox. The virtual cluster should run ROCKS Cluster Distribution 6.2 or 7.0. The number of compute nodes will depend on the available memory of your physical host. Deadline: Week 9 (November 25, 2024).
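On the VirtualBox side, one possible setup (a rough sketch only; VM names, memory and disk sizes, the internal-network label and the ISO file name are placeholders to adapt to your host) is to give the frontend two NICs, the first on a private internal network shared with the compute nodes and the second NAT'd for outside access, and to make each compute node boot from that private network so the frontend can install it:

    # Frontend / head node: NIC 1 on the private cluster network, NIC 2 on NAT
    VBoxManage createvm --name rocks-frontend --ostype RedHat_64 --register
    VBoxManage modifyvm rocks-frontend --memory 4096 --cpus 2 \
        --nic1 intnet --intnet1 rocks-private --nic2 nat
    VBoxManage createhd --filename rocks-frontend.vdi --size 40960
    VBoxManage storagectl rocks-frontend --name SATA --add sata
    VBoxManage storageattach rocks-frontend --storagectl SATA --port 0 --device 0 \
        --type hdd --medium rocks-frontend.vdi
    # Attach the Rocks installation DVD (ISO file name is a placeholder)
    VBoxManage storagectl rocks-frontend --name IDE --add ide
    VBoxManage storageattach rocks-frontend --storagectl IDE --port 0 --device 0 \
        --type dvddrive --medium rocks-6.2.iso

    # Compute node: single NIC on the same internal network, network (PXE) boot first
    VBoxManage createvm --name compute-0-0 --ostype RedHat_64 --register
    VBoxManage modifyvm compute-0-0 --memory 1024 --cpus 1 \
        --nic1 intnet --intnet1 rocks-private --boot1 net --boot2 disk
    VBoxManage createhd --filename compute-0-0.vdi --size 20480
    VBoxManage storagectl compute-0-0 --name SATA --add sata
    VBoxManage storageattach compute-0-0 --storagectl SATA --port 0 --device 0 \
        --type hdd --medium compute-0-0.vdi

Repeat the compute-node block for each additional node. Depending on your VirtualBox version, PXE booting the compute nodes may require a PCnet NIC type (--nictype1 Am79C973) or the Extension Pack; you can equally well create the VMs in the GUI. The essential part is the network layout: the frontend's first interface and all compute nodes on the same internal network, and the frontend's second interface reaching the outside world.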

Information regarding the discipline

Name of the discipline: Operating Systems for Parallel and Distributed Architectures
Course coordinator: Assoc. prof. Darius Bufnea, darius.bufnea at ubbcluj dot ro

Prerequisites

Curriculum: Operating Systems, Distributed Operating Systems, Computer Networks
Competencies: Average-level system administration and programming skills

Objectives of the discipline

General objective of the discipline: Know the key concepts of parallel cluster architectures
Specific objective of the discipline: By the end of the course, students will know how to build, deploy, configure, maintain, monitor, and debug a Linux parallel cluster.

Content

  1. Introduction to Operating systems for parallel architectures
  2. Parallel Cluster architecture: Cluster Head Nodes, Compute Nodes, Clustering Middleware
  3. Parallel Cluster Paradigms: Single system image, Centralized system management, High processing capacity, Resource consolidation, Optimal use of resources, High-availability, Redundancy, Single points of failure, Failover protection and disaster recovery, Horizontal and vertical scalability, Load-balancing, Elasticity, Run jobs anytime, anywhere
  4. Design and configuration. Network prerequisites for a parallel cluster: LAN, bandwidth, latency, interfaces, security aspects. Automatic node configuration and deployment
  5. Virtualization of hardware, operating systems, storage devices, and computer network resources
  6. Beowulf cluster deployment and administration
  7. Linux Cluster Distributions: Mosix, ClusterKnoppix. Automated operating system and software provisioning for a Linux Cluster: Open Source Cluster Application Resources (OSCAR)
  8. Cluster resources: distributed memory architecture and distributed shared memory, distributed file systems (examples: IBM General Parallel File System, Microsoft’s Cluster Shared Volumes, Oracle Cluster File System)
  9. Nodes and head node management, Cluster system management, Debugging and monitoring a parallel cluster, Node failure management
  10. Data sharing and communication, Message passing and communication, Parallel processing libraries: Parallel Virtual Machine toolkit and the Message Passing Interface library
  11. Software and development environment, Parallel application development and execution (Parallel Environment – PE), Job scheduling & management

Bibliography

  1. Gregory Pfister: In Search of Clusters, Prentice Hall; 2nd edition (December 22, 1997), ISBN-10: 0138997098, ISBN-13: 978-0138997090;
  2. George F. Coulouris, Jean Dollimore, Tim Kindberg: Distributed Systems: Concepts and Design, Addison-Wesley; 5th edition (May 7, 2011), ISBN-10: 0132143011, ISBN-13: 978-0132143011;
  3. Joseph D. Sloan: High Performance Linux Clusters with OSCAR, Rocks, OpenMosix, and MPI, O’Reilly Media (November 23, 2004), ISBN-10: 0596005709, ISBN-13: 978-0596005702;
  4. Daniel F. Savarese, Donald J. Becker, John Salmon, Thomas Sterling: How to Build a Beowulf: A Guide to the Implementation and Application of PC Clusters, The MIT Press (May 28, 1999), ISBN-10: 026269218X, ISBN-13: 978-0262692182;
  5. Gordon Bell, Thomas Sterling: Beowulf Cluster Computing with Linux, The MIT Press; 1st edition (October 1, 2001), ISBN-10: 0262692740, ISBN-13: 978-0262692748;
  6. Charles Bookman: Linux Clustering: Building and Maintaining Linux Clusters, Sams Publishing; 1st edition (June 29, 2002), ISBN-10: 1578702747, ISBN-13: 978-1578702749.

Evaluation

Each activity is evaluated as follows (type of activity – evaluation criteria – evaluation methods – share in the grade):

  • Course – Know the key theoretical concepts of parallel cluster architectures – Written exam – 30%
  • Seminar/lab activities – Know how to deploy, maintain, debug and monitor a parallel cluster:
    • Homework assignments – 30%
    • Presentation on clustering-related topics – 30%
  • Default – 10%