Geometric Foundations of Data Analysis I

There is no true interpretation of anything; interpretation is a vehicle in the service of human comprehension. The value of interpretation is in enabling others to fruitfully think about an idea.

Andreas Buja

© Martin Grandjean (CC BY-SA 3.0)

At its core, the goal of data analysis is to make sense of data, which can come in the form of measurements, survey results, behaviour patterns, etc. Often this data is “high-dimensional”: it has so many variables that it is impossible to visualize directly. Even in low dimensions, it may not be clear what kinds of conclusions one can reasonably draw.

We will explore four key methods of data analysis in this module:

  1. Least Squares Fitting,
  2. Principal Component Analysis,
  3. Clustering and hierarchical clustering,
  4. Nearest Neighbours and the Johnson–Lindenstrauss Theorem.

We will also split our lectures into “theoretical” and “practical” components: the morning lectures cover the theory, and the afternoon lectures are practical sessions in which we will get our hands dirty using Python and standard data analysis packages like pandas.


Important Module Information: