Causal representation learning aims to disentangle the latent generative factors that give rise to high-dimensional observations by acting in the world and observing the resulting changes, much as humans learn world models. A robotics example: the robot's state is physical, but we observe only an image of the robot from a particular camera view. In general, we observe only a high-dimensional transformation (images, etc.) of the true causal variables that matter for a downstream task. The central problem in causal representation learning is to invert the unknown transformation between the true causal variables and the observations, up to coordinate-wise scaling and permutation. The ICA literature addressed this problem when the variables are independent or conditionally independent; the goal here is to generalize to causally interacting variables using interventional datasets. We show that this is possible, given enough interventional diversity, by exploiting two key ideas: (a) represent interventional distributions in terms of their scores (gradients of log-likelihoods); (b) the encoder-decoder pair that minimizes reconstruction loss while sparsifying the score difference in the latent space is the optimal pair. We establish versions of these results for linear and general transformations, assuming mild regularity on the diversity of interventions. We will also discuss recent empirical results on scaling these ideas into a learning algorithm for robot pose estimation.

Joint work with Burak Varici (CMU), Emre Acarturk (RPI), Abhishek Kumar (Amazon, ex-GDM), and Ali Tajer (RPI). The talk is based on the papers https://arxiv.org/abs/2510.20884 and https://arxiv.org/abs/2402.00849.
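To make the score-difference idea concrete, here is a minimal toy sketch (not the papers' algorithm) for the linear case. It assumes a 2-D latent with independent Gaussian components, an intervention that changes the variance of one coordinate, and a known mixing matrix G; the score difference between the observational and interventional distributions is dense in observation coordinates but becomes sparse, supported only on the intervened coordinate, after mapping through the true unmixing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (illustrative assumptions, not from the papers): 2-D latent z
# with independent zero-mean Gaussian components; an intervention changes
# the variance of z[0] from 1.0 to 4.0. Observations are x = G @ z for a
# fixed invertible mixing matrix G.
G = np.array([[2.0, 1.0],
              [1.0, 1.0]])
G_inv = np.linalg.inv(G)

def latent_score(z, variances):
    # Score (gradient of the log-density) of independent zero-mean Gaussians.
    return -z / variances

def obs_score(x, variances):
    # Change of variables for a linear map: s_x(x) = G^{-T} s_z(G^{-1} x).
    return G_inv.T @ latent_score(G_inv @ x, variances)

x = G @ rng.normal(size=2)

# Score difference between the observational and interventional models.
diff_obs = obs_score(x, np.array([1.0, 1.0])) - obs_score(x, np.array([4.0, 1.0]))
print(diff_obs)  # dense: both observation coordinates are affected

# Mapping back with the true unmixing makes the difference sparse:
# only the intervened latent coordinate is nonzero.
diff_latent = G.T @ diff_obs
print(diff_latent)
```

The identifiability results build on this property in the reverse direction: among encoder-decoder pairs with low reconstruction loss, the one whose latent-space score differences are sparsest across diverse interventions recovers the true latents up to scaling and permutation.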