
Moral Machine Project using Postgres, Dbt, AWS S3 and Streamlit

  • Writer: Christoph Nguyen
  • Jan 6
  • 3 min read

Updated: Feb 17

#artificial intelligence #ai ethics

December 2024 | Christopher Nguyen


(app may take a few minutes to wake up)


This project presents a data pipeline designed to align artificial intelligence (AI) systems with human values. The data used for this project comes from the Moral Machine Experiment, a platform for gathering human perspectives on moral decisions made by autonomous vehicles (AVs). Participants can visit the Moral Machine website to take the experiment, which presents 13 different ethical dilemma scenarios in which an AV has to make a decision (e.g., hitting a barrier and killing everyone in the car, or hitting an elderly person).


Goal: Predict which characters an AV will save across various moral dilemmas


Data source

The data can be accessed and downloaded from the OSF homepage as a .csv file. The specific file used is SharedResponseSurvey.csv, which contains responses from all participants who took the experiment and filled out the survey at the end.


OSF homepage: https://osf.io/3hvt2/


Tools:

  • Postgres for local data storage

  • dbt to transform data with SQL

  • Python and SQL for data handling, feature engineering, and ML modeling

  • AWS S3 for cloud storage

  • Data visualization and deployment using Streamlit, pandas, and Plotly


Data Migration

This project starts with building the data engineering pipeline locally, then adds an ML model to Streamlit for prediction and visualization.


Transformation and data destinations

  1. Loaded the raw Moral Machine Experiment dataset into a local Postgres database via pgAdmin (a minimal loading sketch follows this list)

  2. Connected the Postgres database to dbt

  3. Transformed the data in dbt staging models to clean columns, remove illegal characters and NAs, and change data types

  4. Created marts to prepare the dataset for ML classification of AV survival outcomes (survival_predictions.sql)

  5. Trained ML models (trained_model.py) on the survival_predictions table in marts, comparing logistic regression, random forest, and XGBoost classifiers with scikit-learn (see the training sketch after this list) -- chose the logistic regression model as it had the highest accuracy (0.7012) and F1 score (0.69), with the caveat that the model could be improved with additional feature engineering

  6. Uploaded the logistic_regression.pkl model to AWS S3 so it can be loaded by Streamlit for prediction (see the upload sketch after this list)

  7. Chose Streamlit Cloud for public deployment for its simplicity, UI design, and cost effectiveness
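
For step 1, here is a minimal sketch of how the raw survey CSV could be loaded into a local Postgres database with pandas and SQLAlchemy. The connection string, table name, and chunk size are placeholders, not the project's actual configuration.

```python
# Hypothetical sketch of step 1: load the raw SharedResponseSurvey.csv into a
# local Postgres database. Connection string, table name, and chunk size are
# placeholders, not the project's actual configuration.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://postgres:postgres@localhost:5432/moral_machine"
)

# The survey file is large, so read and write it in chunks
for i, chunk in enumerate(pd.read_csv("SharedResponseSurvey.csv", chunksize=100_000)):
    chunk.to_sql(
        "raw_shared_responses_survey",  # raw table the dbt staging models read from
        engine,
        schema="public",
        if_exists="replace" if i == 0 else "append",
        index=False,
    )
```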
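
For step 5, here is a rough sketch of the model comparison, assuming the survival_predictions mart exposes numeric, model-ready features and a binary "saved" target column. The schema, table, and column names are assumptions for illustration rather than the project's actual code.

```python
# Hypothetical sketch of step 5: compare logistic regression, random forest,
# and XGBoost classifiers on the survival_predictions mart, then pickle the
# best model. Schema/table/column names are placeholders.
import pickle
import pandas as pd
from sqlalchemy import create_engine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

engine = create_engine(
    "postgresql+psycopg2://postgres:postgres@localhost:5432/moral_machine"
)
df = pd.read_sql("SELECT * FROM marts.survival_predictions", engine)

X = df.drop(columns=["saved"])  # "saved" assumed: 1 if the character was spared
y = df["saved"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "xgboost": XGBClassifier(eval_metric="logloss"),
}

best_name, best_model, best_acc = None, None, 0.0
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(f"{name}: accuracy={acc:.4f}, F1={f1_score(y_test, preds):.2f}")
    if acc > best_acc:
        best_name, best_model, best_acc = name, model, acc

# Persist the winning model (logistic regression in this write-up) for the S3 upload step
with open(f"{best_name}.pkl", "wb") as f:
    pickle.dump(best_model, f)
```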
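
For step 6, a minimal sketch of the S3 upload with boto3; the bucket name and object key are placeholders.

```python
# Hypothetical sketch of step 6: upload the pickled model to S3 with boto3 so
# the Streamlit app can fetch it. Bucket name and object key are placeholders.
import boto3

s3 = boto3.client("s3")  # relies on AWS credentials already configured locally
s3.upload_file(
    Filename="logistic_regression.pkl",
    Bucket="moral-machine-models",         # placeholder bucket
    Key="models/logistic_regression.pkl",  # placeholder key
)
```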


The final product provides predictions of which attributes are saved under different scenarios, along with data visualizations of saved outcomes by country.



Here we have a demo of the Streamlit app predicting the survival rate for a human (hoomans) with a pedestrian present, no crossing signal or barrier present, country set to USA, and political and religious views left null. The results show the model predicting roughly a 75% survival rate in this type of scenario.
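
Below is a rough sketch of what this prediction step could look like on the Streamlit side: fetch the pickled model from S3, build a one-row feature frame from the scenario selections, and display the predicted survival rate. The bucket, key, and feature names are assumptions for illustration, and null political/religious views are simply encoded as 0 here.

```python
# Hypothetical sketch of the Streamlit prediction step. Bucket, key, and
# feature names are assumptions; nulls are encoded as 0 in this sketch.
import pickle
import boto3
import pandas as pd
import streamlit as st

@st.cache_resource
def load_model():
    s3 = boto3.client("s3")
    obj = s3.get_object(
        Bucket="moral-machine-models", Key="models/logistic_regression.pkl"
    )
    return pickle.loads(obj["Body"].read())

model = load_model()

# Scenario roughly matching the demo: a human, pedestrian present,
# no crossing signal or barrier, country USA, views left null (0)
scenario = pd.DataFrame([{
    "is_human": 1,
    "pedestrian_present": 1,
    "crossing_signal": 0,
    "barrier": 0,
    "country_usa": 1,
    "political_view": 0,
    "religious_view": 0,
}])

prob_saved = model.predict_proba(scenario)[0, 1]
st.metric("Predicted survival rate", f"{prob_saved:.0%}")
```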




Future research would include data visualizations on the ethical implications of designing an AV to prioritize certain groups over others.


Pipeline Workflow

This ETL pipeline semi-automates the workflow for transforming raw data from the Moral Machine Experiment into machine-learning-ready analysis. The aspects that can be considered automated are using dbt to generate new tables and training various machine learning models on those tables (a sketch of chaining these two steps follows below). The dbt models and Python ML scripts can also be easily reconfigured for different research needs. Additionally, the ML model is stored in AWS S3 so it can be used by the Streamlit app. There is no orchestration in the workflow, and most of the data processing is manually triggered across the different tools. For these reasons, this is a semi-automated ETL workflow.
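
As an illustration of the semi-automated nature of the workflow, here is a hypothetical sketch of triggering the two automated pieces together: rebuild the dbt tables, then retrain on the refreshed mart. The project paths are assumptions, and in the current setup these steps are run by hand across the different tools.

```python
# Hypothetical sketch: chain the dbt rebuild and model retraining into one
# trigger. Paths are placeholders; the write-up's steps are run manually.
import subprocess

# Rebuild staging and mart tables in Postgres
subprocess.run(["dbt", "run", "--project-dir", "moral_machine_dbt"], check=True)

# Retrain the classifiers on the refreshed survival_predictions mart
subprocess.run(["python", "trained_model.py"], check=True)
```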



Get in Touch

If you would like to chat, please reach out!
