This course focuses on the issues encountered in building Internet and Web systems, such as scalability, interoperability, consistency, replication, fault tolerance, and security. We will examine how services like Google or Amazon handle billions of requests from all over the world each day (almost) without failing or becoming unreachable. We will study how to collect massive-scale data sets, how to process them, and how to extract useful information from them. We will also look at the large, heavily distributed infrastructure that is used to run these services (and similar cloud-based services) today.
An important feature of the course is that we will not just discuss issues and solutions but also provide hands-on experience, using web search as our case study. There will be several substantial implementation projects throughout the semester, each of which will focus on a particular component of the search engine, such as the frontend, storage, crawler, or indexer. The final project will be to build a Google-style search engine and to deploy and run it in the cloud.
Notice that this is NOT a course on web design or on web application development! Instead of learning how to use a web server such as Apache or a scalable analytics system such as Spark, we will actually build our own little web server and a mini-“Spark” from scratch. As a side effect, you will learn about some aspects of large-scale software development, such as working with APIs and specifications, thinking about modularity, reading other people’s code, managing versions, and debugging.
Prerequisites
CIT 5950 Computer Systems Programming. Suggested: CIS 5470 Software Analysis, CIS 5490 Wireless Communications for Mobile Networks and Internet of Things, CIS 5510 Computer & Network Security, CIS 5530 Networked Systems, or CIT 5820 Blockchains & Cryptography (or any other course in which students write a substantial program).