CIS 545 Big Data Analytics

Short Description

This course focuses on the fundamentals of scaling computation to handle common data analytics tasks. You will learn about basic tasks in collecting, wrangling, and structuring data; programming models for performing certain kinds of computation in a scalable way across many compute nodes; common approaches to converting algorithms to such programming models; standard toolkits for data analysis consisting of a wide variety of primitives; and popular distributed frameworks for analytics tasks such as filtering, graph analysis, clustering, and classification.

Portfolio Building Course

No

Pre-Requisites

CIT 591 (Introduction to Software Development) or equivalent programming experience; Broad familiarity with probability and statistics, as well as programming in Python; Additional background in statistics, data analysis (e.g., in Matlab or R), and machine learning is helpful (example: ESE 542)

Content

In the new era of big data, we are increasingly faced with the challenges of processing vast volumes of data. Given the limits of individual machines (compute power, memory, bandwidth), increasingly the solution is to process the data in parallel on many machines. This course focuses on the fundamentals of scaling computation to handle common data analytics tasks. You will learn about basic tasks in collecting, wrangling, and structuring data; programming models for performing certain kinds of computation in a scalable way across many compute nodes; common approaches to converting algorithms to such programming models; standard toolkits for data analysis consisting of a wide variety of primitives; and popular distributed frameworks for analytics tasks such as filtering, graph analysis, clustering, and classification.

Course Offerings
  • Fall 2021 Zachary G. Ives
Course Creators
  • Zachary G. Ives