Knowledge Guided Representation Learning and Causal Structure Learning

Conference paper

Sharma, S., Sharma, S., Liu, L., Tushir, R., Neal, A. L., Ness, R., Kumar, V., Crawford, J. W., Kiciman, E. and Chandra, R. 2023. Knowledge Guided Representation Learning and Causal Structure Learning. 29th Association for Computing Machinery SIGKDD conference on knowledge discovery and data mining. Long Beach, CA, 06 Aug 2023. Association for Computing Machinery SIGKDD.

Authors: Sharma, S., Sharma, S., Liu, L., Tushir, R., Neal, A. L., Ness, R., Kumar, V., Crawford, J. W., Kiciman, E. and Chandra, R.
Type: Conference paper
Abstract

In the physical sciences, process-based models are used extensively to study the behavior of natural or engineered systems. These simulations can help solve downstream tasks and support real-life decision making, especially when observational data is sparse. However, such simulators are approximations of reality, introducing bias in generated data and limiting their generalization in out-of-distribution use cases. In this paper we propose a framework, Knowledge-Guided Representation learning and Causal Structure Learning (KGRCL), that enables causal structure learning from both simulated and observed data representing the same underlying real-world process. Simulated data can often be used to complement observed data, providing unobserved or difficult-to-observe system characteristics. We propose to use conditional distribution matching to obtain better representations of simulated data. We present a case study on soil organic carbon modeling using data from three farms located in the UK and USA. First, we evaluate the proposed causal structure learning approach. Experiments show that KGRCL outperforms other popular causal discovery methods. Next, we highlight that downstream tasks are improved by using the learned causal graph. We show empirically that soil organic carbon prediction accuracy exceeds that of other ML methods in out-of-distribution scenarios.

Keywords: Causal discovery; Distribution matching; Process models; Simulators; Simulator calibration; Out-of-distribution prediction; Soil organic carbon prediction; Soil health; Robustness
Year of Publication: 2023
Conference title: 29th Association for Computing Machinery SIGKDD conference on knowledge discovery and data mining
Conference location: Long Beach, CA
Event date: Aug 2023
Web address (URL): https://kdd.org/kdd2023/
Open access: Published as ‘gold’ (paid) open access
Publisher: Association for Computing Machinery SIGKDD
Output status: Submitted

Permalink - https://repository.rothamsted.ac.uk/item/98v47/knowledge-guided-representation-learning-and-causal-structure-learning

Restricted files

Accepted author manuscript

Under embargo indefinitely
