
Moral Machine Project using Postgres, Dbt, AWS S3 and Streamlit

  • Writer: Christoph Nguyen
  • Jan 6
  • 3 min read

Updated: Feb 17

#artificial intelligence #ai ethics

December 2024 | Christopher Nguyen


(app may take a few minutes to wake up)


This project presents a data pipeline designed to align artificial intelligence (AI) systems with human values. The data used for this project comes from the Moral Machine Experiment, a platform for gathering human perspectives on moral decisions made by autonomous vehicles (AVs). Participants can visit the Moral Machine website to take the experiment, which presents 13 different ethical dilemma scenarios in which an AV has to make a decision (e.g., hitting a barrier and killing everyone in the car, or hitting an elderly person).


Goal: Predict which characters an AV will save across various moral dilemmas


Data source

The data can be accessed and downloaded from the OSF homepage as a .csv file. The specific file used is SharedResponseSurvey.csv, which contains responses from all participants who took the experiment and filled out the survey at the end.


OSF homepage: https://osf.io/3hvt2/


Tools:

  • Postgres for local data storage

  • dbt to transform data with SQL

  • Python and SQL for data handling, feature engineering, and ML modeling

  • AWS S3 for cloud storage

  • Data visualization and deployment using Streamlit, pandas, and Plotly


Data Migration

This project starts with building the data engineering pipeline locally, then adds an ML model to Streamlit for prediction and visualization.


Transformation and data destinations

  1. Loaded the raw Moral Machine Experiment dataset into a local Postgres database via pgAdmin (a minimal loading sketch follows this list)

  2. Connected the Postgres database to dbt

  3. Transformed the data in dbt staging models to clean columns, remove illegal characters and NAs, and change data types

  4. Created marts to prepare the dataset for ML classification of AV survival outcomes (survival_predictions.sql)

  5. Trained ML models (trained_model.py) on the survival_predictions table in marts, comparing logistic regression, random forest, and XGBoost classifiers with scikit-learn (see the training sketch after this list) -- chose the logistic regression model as it had the highest accuracy (0.7012) and F1 score (0.69), with the caveat that the model could be improved with additional feature engineering

  6. Uploaded the logistic_regression.pkl model to AWS S3 so it can be loaded by Streamlit for prediction (see the upload sketch after this list)

  7. Chose Streamlit Cloud for public deployment for its simplicity, UI design, and cost effectiveness
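
For step 1, here is a minimal sketch of how the raw survey CSV could be loaded into a local Postgres database with pandas and SQLAlchemy. The connection string, table name, and chunk size are placeholders, not the project's actual configuration.

```python
# Hypothetical sketch of step 1: load the raw SharedResponseSurvey.csv into a
# local Postgres database. Connection string, table name, and chunk size are
# placeholders, not the project's actual configuration.
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+psycopg2://postgres:postgres@localhost:5432/moral_machine"
)

# The survey file is large, so read and write it in chunks
for i, chunk in enumerate(pd.read_csv("SharedResponseSurvey.csv", chunksize=100_000)):
    chunk.to_sql(
        "raw_shared_responses_survey",  # raw table the dbt staging models read from
        engine,
        schema="public",
        if_exists="replace" if i == 0 else "append",
        index=False,
    )
```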
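
For step 5, here is a rough sketch of the model comparison, assuming the survival_predictions mart exposes numeric, model-ready features and a binary "saved" target column. The schema, table, and column names are assumptions for illustration rather than the project's actual code.

```python
# Hypothetical sketch of step 5: compare logistic regression, random forest,
# and XGBoost classifiers on the survival_predictions mart, then pickle the
# best model. Schema/table/column names are placeholders.
import pickle
import pandas as pd
from sqlalchemy import create_engine
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, f1_score
from xgboost import XGBClassifier

engine = create_engine(
    "postgresql+psycopg2://postgres:postgres@localhost:5432/moral_machine"
)
df = pd.read_sql("SELECT * FROM marts.survival_predictions", engine)

X = df.drop(columns=["saved"])  # "saved" assumed: 1 if the character was spared
y = df["saved"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "xgboost": XGBClassifier(eval_metric="logloss"),
}

best_name, best_model, best_acc = None, None, 0.0
for name, model in models.items():
    model.fit(X_train, y_train)
    preds = model.predict(X_test)
    acc = accuracy_score(y_test, preds)
    print(f"{name}: accuracy={acc:.4f}, F1={f1_score(y_test, preds):.2f}")
    if acc > best_acc:
        best_name, best_model, best_acc = name, model, acc

# Persist the winning model (logistic regression in this write-up) for the S3 upload step
with open(f"{best_name}.pkl", "wb") as f:
    pickle.dump(best_model, f)
```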
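
For step 6, a minimal sketch of the S3 upload with boto3; the bucket name and object key are placeholders.

```python
# Hypothetical sketch of step 6: upload the pickled model to S3 with boto3 so
# the Streamlit app can fetch it. Bucket name and object key are placeholders.
import boto3

s3 = boto3.client("s3")  # relies on AWS credentials already configured locally
s3.upload_file(
    Filename="logistic_regression.pkl",
    Bucket="moral-machine-models",         # placeholder bucket
    Key="models/logistic_regression.pkl",  # placeholder key
)
```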


The final product provides predictions of which attributes are saved under different scenarios, along with data visualizations of saved outcomes by country.



Here we have a demo of the Streamlit app predicting the survival rate for a human (hoomans) with a pedestrian present, no crossing signal or barrier present, country set to USA, and political and religious views left null. The results show the model predicting roughly a 75% survival rate in this type of scenario.
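
Below is a rough sketch of what this prediction step could look like on the Streamlit side: fetch the pickled model from S3, build a one-row feature frame from the scenario selections, and display the predicted survival rate. The bucket, key, and feature names are assumptions for illustration, and null political/religious views are simply encoded as 0 here.

```python
# Hypothetical sketch of the Streamlit prediction step. Bucket, key, and
# feature names are assumptions; nulls are encoded as 0 in this sketch.
import pickle
import boto3
import pandas as pd
import streamlit as st

@st.cache_resource
def load_model():
    s3 = boto3.client("s3")
    obj = s3.get_object(
        Bucket="moral-machine-models", Key="models/logistic_regression.pkl"
    )
    return pickle.loads(obj["Body"].read())

model = load_model()

# Scenario roughly matching the demo: a human, pedestrian present,
# no crossing signal or barrier, country USA, views left null (0)
scenario = pd.DataFrame([{
    "is_human": 1,
    "pedestrian_present": 1,
    "crossing_signal": 0,
    "barrier": 0,
    "country_usa": 1,
    "political_view": 0,
    "religious_view": 0,
}])

prob_saved = model.predict_proba(scenario)[0, 1]
st.metric("Predicted survival rate", f"{prob_saved:.0%}")
```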




Future research would include data visualizations on the ethical implications of designing an AV to prioritize certain groups over others.


Pipeline Workflow

This ETL pipeline semi-automates the workflow for transforming raw data from the Moral Machine Experiment into machine-learning-ready analysis. The aspects that can be considered automated are using dbt to generate new tables and training various machine learning models on those tables (a sketch of chaining these two steps follows below). The dbt models and Python ML scripts can also be easily reconfigured for different research needs. Additionally, the ML model is stored in AWS S3 so it can be used by the Streamlit app. There is no orchestration in the workflow, and most of the data processing is manually triggered across the different tools. For these reasons, this is a semi-automated ETL workflow.
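
As an illustration of the semi-automated nature of the workflow, here is a hypothetical sketch of triggering the two automated pieces together: rebuild the dbt tables, then retrain on the refreshed mart. The project paths are assumptions, and in the current setup these steps are run by hand across the different tools.

```python
# Hypothetical sketch: chain the dbt rebuild and model retraining into one
# trigger. Paths are placeholders; the write-up's steps are run manually.
import subprocess

# Rebuild staging and mart tables in Postgres
subprocess.run(["dbt", "run", "--project-dir", "moral_machine_dbt"], check=True)

# Retrain the classifiers on the refreshed survival_predictions mart
subprocess.run(["python", "trained_model.py"], check=True)
```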



Get in Touch

If you would like to chat, please reach out!
