Geometric Foundations of Data Analysis I

Winter Semester 2024

There is no true interpretation of anything; interpretation is a vehicle in the service of human comprehension. The value of interpretation is in enabling others to fruitfully think about an idea.

Andreas Buja

An illustration of a highly complicated graph © Martin Grandjean (CC BY-SA 3.0)

At its core, the goal of data analysis is to make sense of data, which can come in the form of measurements, survey results, behavior patterns, and so on. Often this data is “high-dimensional”: it has so many variables that it cannot be visualized directly. Even in low dimensions, it may not be clear what conclusions one can reasonably draw.

We will explore four key methods of data analysis in this module:

  1. Least Squares Fitting,
  2. Principal Component Analysis,
  3. Clustering and Hierarchical Clustering,
  4. Nearest Neighbors and the Johnson–Lindenstrauss Theorem.

We will split our lectures into “theoretical” and “practical” components: the morning lectures cover the theory, and the afternoon lectures are practical sessions. In the practical sessions we will get our hands dirty using Python and standard data analysis packages such as pandas.
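
To give a flavor of the practical sessions, here is a minimal sketch of the first method in the list, least squares fitting, using pandas and NumPy. The toy data and the column names `x` and `y` are made up for illustration and are not taken from the course material; the points are chosen to lie exactly on the line y = 2x + 1, so the fit should recover that line.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the course): points on the line y = 2x + 1.
df = pd.DataFrame({"x": [0.0, 1.0, 2.0, 3.0], "y": [1.0, 3.0, 5.0, 7.0]})

# Design matrix: one column for x and a column of ones for the intercept.
A = np.column_stack([df["x"].to_numpy(), np.ones(len(df))])

# Find the coefficients minimizing the squared error ||A @ coef - y||^2.
coef, residuals, rank, _ = np.linalg.lstsq(A, df["y"].to_numpy(), rcond=None)
slope, intercept = coef

print(f"slope = {slope:.3f}, intercept = {intercept:.3f}")
```

With real, noisy measurements the points would not lie exactly on a line, and the same call would return the line that is closest in the least squares sense.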


Important Module Information: