
Understanding how users use ChatGPT for language learning

  • Writer: Christoph Nguyen
  • Jan 27, 2025
  • 5 min read

Updated: Jan 30


Contents

  1. Introduction

  2. Data Collection

  3. Data Wrangling and Cleaning

  4. Zero-Shot and Sentiment Analysis

  5. Data Visualization

  6. Conclusion


Introduction

This research investigates whether YouTube comments provide valuable insights into user needs, with the purpose of improving the user experience of AI systems such as ChatGPT. We use language learning as a practical example to test hypothesized use cases for ChatGPT, then apply pre-trained NLP models to classify comments into language learning categories and score their overall sentiment. The research seeks to improve methods for evaluating human-AI interactions and to examine the implications of using YouTube as a data source for AI research.


Research Questions:

  1. How are users using ChatGPT to learn new languages?

  2. What is the sentiment on using ChatGPT for language learning?

  3. What are the implications of using YouTube for human-AI development?


YouTube API Setup

YouTube API Endpoints:

Data was requested from the YouTube search, videos, and commentThreads endpoints.


We then set parameters on the search endpoint to find YouTube videos matching the keywords "ChatGPT language learning". We also set English as the relevance language and ordered results by 'relevance' to ensure the videos align with the search query.
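The request described above can be sketched roughly as follows. The endpoint URL and parameter names come from the YouTube Data API v3; the function names and API key placeholder are illustrative assumptions, not the exact code used in this project.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"

def build_search_params(api_key, query="ChatGPT language learning"):
    """Assemble the query parameters used for the video search."""
    return {
        "part": "snippet",
        "q": query,                  # keyword search
        "type": "video",             # videos only, no channels/playlists
        "relevanceLanguage": "en",   # prefer English results
        "order": "relevance",        # rank results by relevance to the query
        "maxResults": 50,            # API maximum per page
        "key": api_key,
    }

def search_videos(api_key):
    # Requires a valid API key and network access.
    url = SEARCH_URL + "?" + urlencode(build_search_params(api_key))
    with urlopen(url) as resp:
        return json.load(resp)
```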


Request Video Info and Statistics

We then pulled video content such as the title, description, and publish info and added it to a dataframe for later use. Video statistics were also requested, but we ended up not using them: we did not need video-level statistics and were more interested in video titles and comment-level statistics.

      
  View Count Like Count Comment Count  
0     765290      39252           954  
1      65972       2979           163  
2     376024      12085           398  
3      28942       1398            25  
4      40323       1584           162  
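A dataframe like the one above could be built by flattening the videos endpoint response. The sample field names ("snippet", "statistics", "viewCount", etc.) follow the YouTube Data API v3 schema; the helper name and column names are assumptions for illustration.

```python
import pandas as pd

def videos_to_frame(response):
    """Extract title, description, publish date and statistics per video."""
    rows = []
    for item in response.get("items", []):
        snippet = item.get("snippet", {})
        stats = item.get("statistics", {})
        rows.append({
            "video_id": item.get("id"),
            "title": snippet.get("title"),
            "description": snippet.get("description"),
            "published_at": snippet.get("publishedAt"),
            # The API returns statistics as strings, so cast to int
            "view_count": int(stats.get("viewCount", 0)),
            "like_count": int(stats.get("likeCount", 0)),
            "comment_count": int(stats.get("commentCount", 0)),
        })
    return pd.DataFrame(rows)
```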

Collect comments from YouTube videos

The commentThreads endpoint was used to request all comments from the relevant YouTube video list. Comment parameters included the video id, a plain text format, and 'relevance' ordering. We chose 'relevance' over 'time' because we wanted not only relevant comments but also a view of what people are interested in. Additionally, we pulled comment-level statistics such as the comment like count.


      Comment_like_count  
0                   1413  
1                   2758  
2                     17  
3                    293  
4                    656  
...                  ...  
9049                   1  
9050                   0  
9051                   0  
9052                   0  
9053                   0  

[9054 rows x 5 columns]
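Collecting all comments means paging through the commentThreads responses. A minimal sketch, where fetch_page stands in for the actual API call and the nested keys follow the commentThreads response schema:

```python
def collect_comments(fetch_page):
    """Walk commentThreads pages, keeping comment text and like count.

    fetch_page(page_token) -> one commentThreads response dict.
    """
    comments, token = [], None
    while True:
        page = fetch_page(token)
        for item in page.get("items", []):
            top = item["snippet"]["topLevelComment"]["snippet"]
            comments.append({
                "text": top["textDisplay"],
                "like_count": top.get("likeCount", 0),
            })
        # nextPageToken is absent on the last page
        token = page.get("nextPageToken")
        if not token:
            break
    return comments
```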

We re-ran the API request and received a lower comment count than in previous requests. For this reason, we use a previously requested dataset with more comments, yt_comments_rel.csv, which contains 9,322 comments.

Data Wrangling and Cleaning

Most of the data cleaning involved removing NAs, converting data to strings, and removing invalid characters. After cleaning, the YouTube comment dataset contains 9,317 comments for the analysis.
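The cleaning steps above can be sketched in pandas as follows. The column name "comment" and the exact character filter are assumptions; the project's actual cleaning rules may differ.

```python
import pandas as pd

def clean_comments(df, col="comment"):
    """Drop NAs, cast comments to strings, and strip invalid characters."""
    out = df.dropna(subset=[col]).copy()
    out[col] = out[col].astype(str)
    # Remove characters outside printable ASCII (emojis, control chars, etc.)
    out[col] = out[col].str.replace(r"[^\x20-\x7E]", "", regex=True)
    # Drop comments that became empty after cleaning
    out = out[out[col].str.strip() != ""]
    return out.reset_index(drop=True)
```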


Machine Learning Workflow

The ML workflow is described in three steps, using zero-shot classification and sentiment analysis with Hugging Face pre-trained models.


Step 1: Use Zero Shot to classify comments (generate columns: comments, classification, scores, comment like count)

Step 2: Extract top scores, keeping classifications with scores above 0.50 (generate columns: comments, classification, scores (> 0.50), comment like count)

Step 3: Use sentiment analysis to uncover which categories have positive, negative, and neutral comments


Zero-Shot Classification

Zero-shot classification was used with the facebook/bart-large-mnli model to classify comments as 'conversation practice', 'pronunciation assistance', 'vocabulary', 'language translation', 'writing', 'cultural understanding', 'dialects', or 'video content feedback'. Categories were chosen based on a Microsoft article on using ChatGPT for learning a foreign language, which mentioned vocabulary, language translation, and pronunciation.
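A minimal sketch of this step, assuming the transformers library. The pipeline call downloads the facebook/bart-large-mnli weights, so the import is kept local; the small helper that keeps only confident top labels mirrors the 0.50 threshold described in Step 2.

```python
CATEGORIES = [
    "conversation practice", "pronunciation assistance", "vocabulary",
    "language translation", "writing", "cultural understanding",
    "dialects", "video content feedback",
]

def top_label(result, threshold=0.50):
    """Return (label, score) for one zero-shot result, or None if the top
    score does not clear the confidence threshold. The pipeline returns
    labels and scores sorted by score in descending order."""
    label, score = result["labels"][0], result["scores"][0]
    return (label, score) if score > threshold else None

def classify(comments):
    # Heavy import kept local so the helper above stays dependency-free.
    from transformers import pipeline
    clf = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")
    return [clf(c, candidate_labels=CATEGORIES) for c in comments]
```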



                   Top Label  Top Score  
0                    writing   0.600024  
1       language translation   0.751136  
2      conversation practice   0.759139  
4                    writing   0.679662  
5       language translation   0.664833  
...                      ...        ...  
9281    language translation   0.513560  
9288  cultural understanding   0.506968  
9292    language translation   0.611566  
9303                dialects   0.510719  
9315                dialects   0.542964  

[2048 rows x 3 columns]


Aside from video content feedback, zero-shot classification revealed that language translation, conversation practice, and writing were the most common categories of comments about using ChatGPT for language learning. This is after filtering to categories with zero-shot confidence scores above 50%, leaving a sample of 2,048 comments, or 1,293 when video content feedback is removed. Surprisingly, these findings suggest that conversation practice is among the top use cases, ranking above writing.


Sentiment Analysis

Sentiment analysis was performed with the Hugging Face pre-trained model cardiffnlp/twitter-roberta-base-sentiment-latest to classify which categories received positive, neutral, and negative comments.


Sentiment Summary (Counts):
Sentiment Label           negative  neutral  positive
Top Label                                            
conversation practice           60      154       177
cultural understanding           4       30        42
dialects                         9       15         5
language translation           125      240       111
pronunciation assistance        29       44        39
vocabulary                      11       25        26
writing                         41       58        48

Sentiment Summary (Percentages):
Sentiment Label            negative    neutral   positive
Top Label                                                
conversation practice     15.345269  39.386189  45.268542
cultural understanding     5.263158  39.473684  55.263158
dialects                  31.034483  51.724138  17.241379
language translation      26.260504  50.420168  23.319328
pronunciation assistance  25.892857  39.285714  34.821429
vocabulary                17.741935  40.322581  41.935484
writing                   27.891156  39.455782  32.653061
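Summary tables like those above can be derived with a pandas crosstab, assuming one row per comment with its zero-shot category ("Top Label") and sentiment ("Sentiment Label"):

```python
import pandas as pd

def sentiment_summary(df):
    """Return (counts, percentages) of sentiment per category."""
    counts = pd.crosstab(df["Top Label"], df["Sentiment Label"])
    # Row-wise percentages: each category's sentiments sum to 100
    pcts = counts.div(counts.sum(axis=1), axis=0) * 100
    return counts, pcts
```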

Data Visualization


Our sentiment analysis suggests users have a generally positive sentiment toward using ChatGPT for learning a language across language translation, pronunciation, writing, dialects, cultural understanding, and vocabulary.


Additionally, for category-specific use cases, conversation practice, vocabulary, and cultural understanding showed mostly positive sentiment, while language translation, writing, pronunciation assistance, and dialects had mixed sentiment. Keep in mind that cultural understanding, vocabulary, and dialects had very small sample sizes.
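The per-category breakdown could be visualized as a stacked bar chart of the percentage table. The snippet below is an illustrative sketch using a small subset of the percentages above, not the figure originally published with this post.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen, no display needed
import matplotlib.pyplot as plt
import pandas as pd

# Two rows from the percentage table, rounded for illustration
pcts = pd.DataFrame(
    {"negative": [15.3, 26.3], "neutral": [39.4, 50.4], "positive": [45.3, 23.3]},
    index=["conversation practice", "language translation"],
)

def plot_sentiment(pcts, path="sentiment_by_category.png"):
    """Save a horizontal stacked bar chart of sentiment percentages."""
    ax = pcts.plot(kind="barh", stacked=True, figsize=(8, 4))
    ax.set_xlabel("Share of comments (%)")
    ax.set_title("Sentiment by language-learning category")
    plt.tight_layout()
    plt.savefig(path)
    plt.close()
    return path
```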


Conclusion

Based on the zero-shot classification, language translation, conversation practice, and writing had the greatest proportion of user comments. The sentiment analysis further revealed an overall positive sentiment towards using ChatGPT for language learning. Specifically, conversation practice received the most positive sentiment while also being one of the most frequent categories, whereas language translation and writing had mixed sentiment. Although this research demonstrated promise as an exploratory research tool, using YouTube comments as a data source for understanding human-AI interaction presents both advantages and challenges.

Implications of using YouTube data


Pros/cons of the 'relevance' parameter: Using the YouTube API's 'relevance' ordering for comments presents the risk of a sample biased towards more positive sentiment. For example, YouTube's algorithm may recommend positive comments over negative ones to promote positivity on the platform. One benefit of using relevance, however, is being able to identify user interests based on the knowledge-sharing nature of YouTube comments.


Indirect sampling: Another issue is the sparse sample sizes, seen in the imbalance of comment categories (a high proportion of video content feedback). Video content feedback was expected to be frequent, as users tend to comment specifically about the YouTube creator.


Zero-shot categories: Moreover, selecting language learning categories for zero-shot classification was a challenge. For example, there is overlap between different use cases (vocabulary can be a subset of writing, etc.).

Overall, using YouTube data along with pre-trained NLP models to uncover patterns in human-AI interaction presents itself as a foundational tool for quantifying categories of user interest. Future research would need more direct sampling (e.g. analyzing actual user interactions with ChatGPT, balanced between positive and negative interactions, and using more advanced ML techniques for extracting keywords from text data).

Get in Touch

If you would like to chat, please reach out!
