What kinds of ChatGPT-generated articles are most similar to human-written ones?
Flesch Score:
The Flesch score is an indicator of readability in written content. Our study found that, in general, AI-generated articles
had lower readability scores than human-generated ones. For blog posts and news articles, human-generated content showed
a wider distribution of Flesch scores and contained more paragraphs than AI-generated articles. However, for language exam articles,
we found that Flesch score metrics were similar between AI- and human-generated articles, as humans are required to write under
specific formats and topics during language exams. Interestingly, we observed an unusual standard deviation of the blog post
Flesch scores for AI-generated articles, which could be due to the presence of AI-generated content in those blog posts.
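For reference, the sketch below shows one way a Flesch score and its spread per category could be computed. It assumes the
Flesch Reading Ease variant (206.835 - 1.015 x words per sentence - 84.6 x syllables per word); the study does not state which
variant or tool it used. The syllable counter is a rough vowel-group heuristic, so values will differ somewhat from those of
established readability libraries, and all function names are illustrative.

```python
import re
import statistics


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption, not a dictionary lookup):
    count runs of consecutive vowels, with a minimum of one."""
    lowered = word.lower()
    count = len(re.findall(r"[aeiouy]+", lowered))
    # Treat a trailing silent 'e' as non-syllabic ("make"),
    # but keep "-le" endings such as "table".
    if lowered.endswith("e") and not lowered.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher values mean easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


def summarize_scores(texts: list[str]) -> tuple[float, float]:
    """Mean and sample standard deviation of the scores for one category.
    Requires at least two texts."""
    scores = [flesch_reading_ease(t) for t in texts]
    return statistics.mean(scores), statistics.stdev(scores)
```

Calling summarize_scores once per category (for example, AI blog posts versus human blog posts) gives the kind of mean and
standard deviation comparison discussed above.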
BERT Embeddings:
We also used BERT embeddings to analyze the homogeneity of the first and last paragraphs across AI- and human-generated articles.
Our study found that first paragraphs were more homogeneous across AI-generated articles than across human-generated articles.
Even for news, the most diverse writing prompt, AI-generated articles were still more homogeneous than human-generated ones.
In the last paragraph, we found that AI-generated articles were more similar to each other, while human-generated articles had
more variation. We identified two outliers in the human category, one from a blog post and one from a news article, which had
extremely different last paragraphs.
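As an illustration of how homogeneity could be quantified, the sketch below uses mean pairwise cosine similarity between paragraph
embeddings. This metric is an assumption; the study does not specify how homogeneity was measured. The embeddings are assumed to
have been computed already (one vector per paragraph, for example from a BERT model), and the function names are hypothetical.

```python
import math
from itertools import combinations


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def mean_pairwise_similarity(embeddings: list[list[float]]) -> float:
    """Average similarity over all pairs; higher means more homogeneous."""
    pairs = list(combinations(embeddings, 2))
    if not pairs:
        raise ValueError("need at least two embeddings")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


def least_similar_index(embeddings: list[list[float]]) -> int:
    """Index of the paragraph with the lowest average similarity to
    all others, a simple candidate for an outlier."""
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")

    def mean_to_others(i: int) -> float:
        others = [e for j, e in enumerate(embeddings) if j != i]
        return sum(cosine_similarity(embeddings[i], e) for e in others) / len(others)

    return min(range(len(embeddings)), key=mean_to_others)


# Toy 2-D vectors standing in for real embeddings (illustrative only).
toy = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
homogeneity = mean_pairwise_similarity(toy)
outlier = least_similar_index(toy)
```

Under this assumed metric, a category whose paragraphs all resemble each other scores close to 1, and a paragraph such as the
third toy vector stands out as the least similar item.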
Limitations:
Our study has some limitations, including a small number of writing prompts and articles. Additionally,
the outliers identified in our analysis warrant further investigation, which could give deeper insight into the differences between
AI- and human-generated articles.