
CONCLUSION
The discussion of misinformation, and of the framing around its regulation, revealed nuances in how digital content can shape public perception. Reviewing articles across multiple outlets and subjects, it quickly became apparent that how information is constructed, worded, and emotionally framed plays a significant part in how it is received and how widely it spreads. This project approached the topic through a combination of classification and exploratory methods that helped surface trends within the data.
From this analysis, one of the clearest observations was that misinformation tends to be punctuated with emotional language, whether positive or negative, or with polarizing language. Articles, or groups of articles, that express opposition to regulation are often marked by more assertive and repetitive phrasing, which is easier to identify in a pattern search. Articles that take a neutral or balanced approach tend to be linguistically and structurally blander in comparison, and thus harder to isolate. This variance in framing creates a useful reference point for understanding the undercurrents of influence embedded in text.


Using a variety of methods to group and interpret the material, including clustering and supervised sentiment classification, the project was able to identify both first- and second-order signals. The findings suggested that recurring themes could be identified within clusters of articles, such as control of technology, public policy, and digital rights. There was a recognizable structure across the dataset, suggesting these themes were not random but reflected predictable lines of inquiry, demonstrating how the features of a theme are reinforced through repetition and framing.

The inclusion of neural networks added further depth to the analysis. Without being given preconceived rules about what sentiment is, the models learned what different sentiments look like when expressed in text. Accuracy varied between categories, with some classified more reliably than others, but the results indicated that natural language contains cues, however small, that can be used effectively to evaluate a text's tone or position on an issue.
Automated procedures were also consistently well-suited to finding content with an obvious opinion, which suggests strong potential for classification at scale.

Visualization played a central role in this project as well. When the findings were presented as topic clusters, confusion matrices, and sentiment heatmaps, the analysis became much clearer and easier to communicate. The visualizations turned the analysis into more definitive observations and surfaced patterns that might otherwise not be evident in raw text. This visual layer made the data more accessible and enhanced the overall effect of the analysis.

In summary, this project demonstrated how combining structured data preparation with modern analysis tools can yield rich insights into the narratives that exist in today's digital spaces. From the tone of individual articles to the broader patterns that emerge across a dataset, the findings underscore the significance of text analytics in the ever-evolving media landscape. In an environment dominated by digital misinformation and disinformation, this project represents a starting point for more transparent, scalable, and proactive readings of the online flow of information.
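To illustrate the confusion-matrix visualization mentioned above, the snippet below builds one with scikit-learn; the gold labels and predictions are hypothetical examples, not the project's actual results:

```python
from sklearn.metrics import confusion_matrix

labels = ["negative", "neutral", "positive"]

# Hypothetical gold labels and classifier predictions for six articles
y_true = ["negative", "negative", "neutral", "positive", "positive", "neutral"]
y_pred = ["negative", "neutral", "neutral", "positive", "negative", "neutral"]

# Rows are true classes, columns are predicted classes
cm = confusion_matrix(y_true, y_pred, labels=labels)
print(cm)

# A row-normalized matrix (per-class recall) is what a sentiment
# heatmap typically displays, e.g. via matplotlib's imshow or
# seaborn's heatmap
cm_norm = cm / cm.sum(axis=1, keepdims=True)
```

Off-diagonal cells show which sentiment categories the classifier confuses, which is often more informative than a single accuracy number when some categories (such as neutral articles) are harder to isolate.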
Github Repo (Code and Data): https://github.com/saketh-saridena/TextMining_Project