Emoji Retrieval from Gibberish or Garbled Social Media Text: A Novel Methodology and a Case Study

  • Conference paper
HCI International 2024 – Late Breaking Papers (HCII 2024)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 15375)

Abstract

Emojis are an integral part of social media conversations and are widely used on almost all social media platforms. However, social media data may be noisy and may include gibberish or garbled text that is difficult to detect and work with. Most naïve data preprocessing approaches recommend removing such text from social media posts before performing any data analysis or passing the data to a machine learning model. However, such gibberish or garbled text may have been one or more emojis in the original post, and failing to retrieve them may cause a loss of contextual meaning in the analyzed data. The work presented in this paper addresses this challenge by proposing a novel three-step, reverse engineering-based methodology for retrieving emojis from garbled or gibberish text in social media posts. Developing this methodology also helped to uncover the reasons that can lead to the generation of gibberish or garbled text during data mining of social media posts. To evaluate its effectiveness, the methodology was applied to a dataset of 509,248 Tweets about the Mpox outbreak. This dataset has been used in about 30 prior works in this field, none of which were able to retrieve the emojis in the original Tweets from the gibberish text present in the dataset. Using our methodology, we retrieved a total of 157,748 emojis present in 76,914 Tweets in this dataset by processing the gibberish or garbled text.
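One common way an emoji turns into garbled text is an encoding mismatch: the UTF-8 bytes of the emoji are read back with a single-byte codec. The Python sketch below illustrates that one mechanism and its reversal. It is not the three-step methodology of the paper, whose details are not given in this abstract. The function names `garble` and `recover` and the choice of Windows-1252 (`cp1252`) as the wrong codec are assumptions made for illustration only.

```python
def garble(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate the corruption: UTF-8 bytes decoded with the wrong codec.

    Assumption for illustration: the wrong codec is Windows-1252.
    """
    return text.encode("utf-8").decode(wrong_codec)


def recover(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Reverse the corruption by re-encoding and decoding as UTF-8.

    If the text does not round-trip (it was not produced by this kind of
    mismatch), the input is returned unchanged.
    """
    try:
        return garbled.encode(wrong_codec).decode("utf-8")
    except UnicodeError:
        return garbled


original = "Stay safe \U0001F602"   # text ending in U+1F602 (face with tears of joy)
damaged = garble(original)          # the emoji becomes four unrelated characters
restored = recover(damaged)         # the emoji is back
```

A real pipeline would also need to handle bytes that have no mapping in the wrong codec and text that was corrupted more than once; this sketch covers neither case.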
The paper discusses the effectiveness of this methodology through multiple metrics related to text readability and coherence, computed for the Tweets before and after the methodology was applied: the Flesch Reading Ease, Flesch-Kincaid Grade score, Coleman-Liau index, Automated Readability Index, Dale-Chall Readability Score, Text Standard, and Reading Time. The results showed that applying the methodology improved the readability and coherence scores of the Tweets. Finally, as a case study, the frequency of emoji usage in these Tweets about the Mpox outbreak was analyzed, and the results are presented.
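To make one of the metrics above concrete, the following Python sketch computes the Flesch Reading Ease score from its standard formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words). The tooling used in the paper is not described in this abstract, so this is only an illustration: the helper names are hypothetical, and the syllable counter is a rough vowel-run heuristic rather than a dictionary-based count.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic (an assumption): one syllable per run of vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease; higher scores indicate easier text."""
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Treat runs of ASCII letters and apostrophes as words.
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


easy = flesch_reading_ease("The cat sat.")
hard = flesch_reading_ease(
    "Epidemiological surveillance necessitates interdisciplinary collaboration."
)
```

Short sentences built from short words score high, while long, polysyllabic sentences score low, which is why the example yields `easy > hard`.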

Corresponding author

Correspondence to Nirmalya Thakur.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

Cite this paper

Cui, S., Thakur, N., Poon, A. (2025). Emoji Retrieval from Gibberish or Garbled Social Media Text: A Novel Methodology and a Case Study. In: Coman, A., Vasilache, S., Fui-Hoon Nah, F., Siau, K.L., Wei, J., Margetis, G. (eds) HCI International 2024 – Late Breaking Papers. HCII 2024. Lecture Notes in Computer Science, vol 15375. Springer, Cham. https://doi.org/10.1007/978-3-031-76806-4_14

  • DOI: https://doi.org/10.1007/978-3-031-76806-4_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-76805-7

  • Online ISBN: 978-3-031-76806-4

  • eBook Packages: Computer Science (R0)
