{"status":"ok","message-type":"work","message-version":"1.0.0","message":{"indexed":{"date-parts":[[2025,7,26]],"date-time":"2025-07-26T08:39:49Z","timestamp":1753519189934,"version":"3.41.0"},"reference-count":42,"publisher":"Association for Computing Machinery (ACM)","issue":"1","license":[{"start":{"date-parts":[[2014,1,1]],"date-time":"2014-01-01T00:00:00Z","timestamp":1388534400000},"content-version":"vor","delay-in-days":0,"URL":"https:\/\/www.acm.org\/publications\/policies\/copyright_policy#Background"}],"content-domain":{"domain":["dl.acm.org"],"crossmark-restriction":true},"short-container-title":["Commun. ACM"],"published-print":{"date-parts":[[2014,1]]},"abstract":"
What do we know now that we did not know 40 years ago?","DOI":"10.1145\/2500887","type":"journal-article","created":{"date-parts":[[2014,1,2]],"date-time":"2014-01-02T13:09:43Z","timestamp":1388668183000},"page":"94-103","update-policy":"https:\/\/doi.org\/10.1145\/crossmark-policy","source":"Crossref","is-referenced-by-count":100,"title":["A historical perspective of speech recognition"],"prefix":"10.1145","volume":"57","author":[{"given":"Xuedong","family":"Huang","sequence":"first","affiliation":[{"name":"Microsoft Corp., Redmond, WA"}]},{"given":"James","family":"Baker","sequence":"additional","affiliation":[{"name":"Dragon Systems in Newton, MA"}]},{"given":"Raj","family":"Reddy","sequence":"additional","affiliation":[{"name":"Moza Bint Nasser University"}]}],"member":"320","published-online":{"date-parts":[[2014,1]]},"reference":[{"key":"e_1_2_1_1_1","volume-title":"Proceedings of ICASSP","author":"Bahl L.","year":"1986","unstructured":"Bahl , L. et al. Maximum mutual information estimation of HMM parameters . In Proceedings of ICASSP ( 1986 ), 49--52. Bahl, L. et al. Maximum mutual information estimation of HMM parameters. In Proceedings of ICASSP (1986), 49--52."},{"key":"e_1_2_1_2_1","volume-title":"Stochastic modeling for ASR","author":"Baker J.","year":"1975","unstructured":"Baker , J. Stochastic modeling for ASR . Speech Recognition. D.R. Reddy, ed. Academic Press , 1975 . Baker, J. Stochastic modeling for ASR. Speech Recognition. D.R. Reddy, ed. Academic Press, 1975."},{"key":"e_1_2_1_3_1","first-page":"1","article-title":"Statistical Estimation for Probabilistic Functions of a Markov Process","author":"Baum L","year":"1972","unstructured":"Baum , L . Statistical Estimation for Probabilistic Functions of a Markov Process . Inequalities III , ( 1972 ), 1 -- 8 . Baum, L. Statistical Estimation for Probabilistic Functions of a Markov Process. Inequalities III, (1972), 1--8.","journal-title":"Inequalities"},{"key":"e_1_2_1_4_1","volume-title":"Proceedings of Interspeech","author":"Chen X.","year":"2012","unstructured":"Chen , X. , Pipelined back-propagation for context-dependent deep neural networks . In Proceedings of Interspeech , 2012 . Chen, X., et al. Pipelined back-propagation for context-dependent deep neural networks. In Proceedings of Interspeech, 2012."},{"key":"e_1_2_1_5_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASL.2011.2134090"},{"key":"e_1_2_1_6_1","doi-asserted-by":"publisher","DOI":"10.1109\/TASSP.1980.1163420"},{"key":"e_1_2_1_7_1","volume-title":"Proceedings of NIPS","author":"Dean J.","year":"2012","unstructured":"Dean , J. et al. Large scale distributed deep networks . In Proceedings of NIPS ( Lake Tahoe, NV , 2012 ). Dean, J. et al. Large scale distributed deep networks. In Proceedings of NIPS (Lake Tahoe, NV, 2012)."},{"key":"e_1_2_1_8_1","first-page":"1","article-title":"Maximum likelihood from incomplete data via the EM algorithm","volume":"39","author":"Dempster","year":"1977","unstructured":"Dempster , et al. Maximum likelihood from incomplete data via the EM algorithm . JRSS 39 , 1 ( 1977 ), 1--38. Dempster, et al. Maximum likelihood from incomplete data via the EM algorithm. JRSS 39, 1 (1977), 1--38.","journal-title":"JRSS"},{"key":"e_1_2_1_9_1","volume-title":"Spoken Dialogue with Computers","author":"De Mori R.","year":"1998","unstructured":"De Mori , R. Spoken Dialogue with Computers . Academic Press , 1998 . De Mori, R. Spoken Dialogue with Computers. Academic Press, 1998."},{"key":"e_1_2_1_10_1","doi-asserted-by":"publisher","DOI":"10.1145\/962081.962108"},{"key":"e_1_2_1_11_1","volume-title":"Proceedings of Interspeech","author":"Deng L.","year":"2010","unstructured":"Deng , L. et al. Binary coding of speech spectrograms using a deep auto-encoder . In Proceedings of Interspeech , 2010 . Deng, L. et al. Binary coding of speech spectrograms using a deep auto-encoder. In Proceedings of Interspeech, 2010."},{"key":"e_1_2_1_12_1","volume-title":"Proceedings of IEEE ASRU Workshop","author":"Fiscus J.","year":"1997","unstructured":"Fiscus , J. Recognizer output voting error reduction (ROVER) . In Proceedings of IEEE ASRU Workshop ( 1997 ), 347--354. Fiscus, J. Recognizer output voting error reduction (ROVER). In Proceedings of IEEE ASRU Workshop (1997), 347--354."},{"key":"e_1_2_1_13_1","first-page":"5","article-title":"Discriminative learning in sequential pattern recognition","volume":"25","author":"He X.","year":"2008","unstructured":"He , X. , Discriminative learning in sequential pattern recognition . IEEE Signal Processing 25 , 5 ( 2008 ), 14--36. He, X., et al. Discriminative learning in sequential pattern recognition. IEEE Signal Processing 25, 5 (2008), 14--36.","journal-title":"IEEE Signal Processing"},{"key":"e_1_2_1_14_1","first-page":"11","article-title":"Deep neural networks for acoustic modeling","volume":"29","author":"Hinton G.","year":"2012","unstructured":"Hinton , G. , Deep neural networks for acoustic modeling in SR. IEEE Signal Processing 29 , 11 ( 2012 ). Hinton, G., et al. Deep neural networks for acoustic modeling in SR. IEEE Signal Processing 29, 11 (2012).","journal-title":"SR. IEEE Signal Processing"},{"key":"e_1_2_1_15_1","volume-title":"Spoken Language Processing","author":"Huang X.","year":"2001","unstructured":"Huang , X. , Acero , A. , and Hon , H . Spoken Language Processing . Prentice Hall , Upper Saddle River, NJ, 2001 . Huang, X., Acero, A., and Hon, H. Spoken Language Processing. Prentice Hall, Upper Saddle River, NJ, 2001."},{"key":"e_1_2_1_16_1","volume-title":"Proceedings of ICASSP","author":"Huang X.","year":"2001","unstructured":"Huang , X. et al. MiPad: A multimodal interaction prototype . In Proceedings of ICASSP ( Salt Lake City, UT , 2001 ). Huang, X. et al. MiPad: A multimodal interaction prototype. In Proceedings of ICASSP (Salt Lake City, UT, 2001)."},{"key":"e_1_2_1_17_1","volume-title":"Proceedings of ICASSP","author":"Huang J.","year":"2013","unstructured":"Huang , J. et al. Cross-language knowledge transfer using multilingual DNN . In Proceedings of ICASSP ( 2013 ), 7304--7308. Huang, J. et al. Cross-language knowledge transfer using multilingual DNN. In Proceedings of ICASSP (2013), 7304--7308."},{"key":"e_1_2_1_18_1","first-page":"4","article-title":"Shared-distribution HMMs for speech","volume":"1","author":"Hwang M.","year":"1993","unstructured":"Hwang , M. , and Huang , X . Shared-distribution HMMs for speech . IEEE Trans S&AP 1 , 4 ( 1993 ), 414--420. Hwang, M., and Huang, X. Shared-distribution HMMs for speech. IEEE Trans S&AP 1, 4 (1993), 414--420.","journal-title":"IEEE Trans S&AP"},{"key":"e_1_2_1_19_1","volume-title":"Statistical Methods for Speech Recognition","author":"Jelinek F.","year":"1997","unstructured":"Jelinek , F. Statistical Methods for Speech Recognition . MIT Press , Cambridge, MA , 1997 . Jelinek, F. Statistical Methods for Speech Recognition. MIT Press, Cambridge, MA, 1997."},{"key":"e_1_2_1_20_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1976.10159"},{"key":"e_1_2_1_21_1","doi-asserted-by":"publisher","DOI":"10.1109\/5.726793"},{"key":"e_1_2_1_22_1","volume-title":"Proceedings of Interspeech","author":"Kingsbury B.","year":"2012","unstructured":"Kingsbury , B. et al. Scalable minimum Bayes risk training of deep neural network acoustic models . In Proceedings of Interspeech 2012 . Kingsbury, B. et al. Scalable minimum Bayes risk training of deep neural network acoustic models. In Proceedings of Interspeech 2012."},{"key":"e_1_2_1_23_1","doi-asserted-by":"publisher","DOI":"10.1121\/1.381666"},{"key":"e_1_2_1_24_1","first-page":"8","article-title":"On adaptive decision rules and decision parameters adaption for ASR","volume":"88","author":"Lee C.","year":"2000","unstructured":"Lee , C. and Huo , Q . On adaptive decision rules and decision parameters adaption for ASR . In Proceedings of the IEEE 88 , 8 ( 2000 ), 1241--1269. Lee, C. and Huo, Q. On adaptive decision rules and decision parameters adaption for ASR. In Proceedings of the IEEE 88, 8 (2000), 1241--1269.","journal-title":"Proceedings of the IEEE"},{"key":"e_1_2_1_25_1","volume-title":"ASR: The Development of the Sphinx Recognition System","author":"Lee K.","year":"1988","unstructured":"Lee , K. ASR: The Development of the Sphinx Recognition System . Springer-Verlag , 1988 . Lee, K. ASR: The Development of the Sphinx Recognition System. Springer-Verlag, 1988."},{"key":"e_1_2_1_27_1","volume-title":"Proceedings of ICASSP","author":"Mikolov T.","year":"2011","unstructured":"Mikolov , T. et al. Extensions of recurrent neural network language model . In Proceedings of ICASSP ( 2011 ), 5528--5531. Mikolov, T. et al. Extensions of recurrent neural network language model. In Proceedings of ICASSP (2011), 5528--5531."},{"key":"e_1_2_1_28_1","doi-asserted-by":"publisher","DOI":"10.1006\/csla.2001.0184"},{"key":"e_1_2_1_29_1","doi-asserted-by":"publisher","DOI":"10.1109\/ICASSP.1990.115720"},{"key":"e_1_2_1_30_1","doi-asserted-by":"publisher","DOI":"10.5555\/1895550.1895604"},{"key":"e_1_2_1_31_1","volume-title":"Visible Speech","author":"Potter R.","year":"1947","unstructured":"Potter , R. , Kopp , G. and Green , H . Visible Speech . Van Nostrand , New York, NY , 1947 . Potter, R., Kopp, G. and Green, H. Visible Speech. Van Nostrand, New York, NY, 1947."},{"key":"e_1_2_1_32_1","doi-asserted-by":"publisher","DOI":"10.3115\/116580.116612"},{"key":"e_1_2_1_33_1","volume-title":"Fundamentals of Speech Recognition","author":"Rabiner L.","year":"1993","unstructured":"Rabiner L. and Juang , B . Fundamentals of Speech Recognition , Prentice Hall , Englewood Cliffs, NJ , 1993 . Rabiner L. and Juang, B. Fundamentals of Speech Recognition, Prentice Hall, Englewood Cliffs, NJ, 1993."},{"key":"e_1_2_1_34_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1976.10158"},{"key":"e_1_2_1_35_1","first-page":"1","article-title":"A NL system for spoken language application","volume":"18","author":"Seneff S.","year":"1992","unstructured":"Seneff S. Tina : A NL system for spoken language application . Computational Linguistics 18 , 1 ( 1992 ), 61--86. Seneff S. Tina: A NL system for spoken language application. Computational Linguistics 18, 1 (1992), 61--86.","journal-title":"Computational Linguistics"},{"key":"e_1_2_1_36_1","volume-title":"R","author":"Tur G.","year":"2011","unstructured":"Tur , G. , and De Mori , R . SLU : Systems for Extracting Semantic Information from Speech. Wiley , U.K., 2011 . Tur, G., and De Mori, R. SLU: Systems for Extracting Semantic Information from Speech. Wiley, U.K., 2011."},{"key":"e_1_2_1_37_1","volume-title":"Proceedings of Interspeech","author":"Yan Z.","year":"2013","unstructured":"Yan , Z. , Huo , Q. , and Xu , J . A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR . In Proceedings of Interspeech ( 2013 ). Yan, Z., Huo, Q., and Xu, J. A scalable approach to using DNN-derived features in GMM-HMM based acoustic modeling for LVCSR. In Proceedings of Interspeech (2013)."},{"key":"e_1_2_1_38_1","first-page":"104","article-title":"Recurrent neural networks for language understanding","author":"Yao K.","year":"2013","unstructured":"Yao , K. . Recurrent neural networks for language understanding . In Proceedings of Interspeech ( 2013 ), 104 -- 108 . Yao, K. et al. Recurrent neural networks for language understanding. In Proceedings of Interspeech (2013), 104--108.","journal-title":"Proceedings of Interspeech ("},{"key":"e_1_2_1_39_1","volume-title":"et al. Feature learning in DNN---Studies on speech recognition tasks. ICLR","author":"Yu D.","year":"2013","unstructured":"Yu , D. et al. Feature learning in DNN---Studies on speech recognition tasks. ICLR ( 2013 ). Yu, D. et al. Feature learning in DNN---Studies on speech recognition tasks. ICLR (2013)."},{"key":"e_1_2_1_40_1","doi-asserted-by":"publisher","DOI":"10.1109\/29.21701"},{"key":"e_1_2_1_41_1","doi-asserted-by":"publisher","DOI":"10.3115\/1075812.1075857"},{"key":"e_1_2_1_42_1","doi-asserted-by":"publisher","DOI":"10.1016\/j.csl.2006.06.008"},{"key":"e_1_2_1_43_1","doi-asserted-by":"publisher","DOI":"10.1109\/PROC.1985.13342"}],"container-title":["Communications of the ACM"],"original-title":[],"language":"en","link":[{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2500887","content-type":"unspecified","content-version":"vor","intended-application":"text-mining"},{"URL":"https:\/\/dl.acm.org\/doi\/pdf\/10.1145\/2500887","content-type":"unspecified","content-version":"vor","intended-application":"similarity-checking"}],"deposited":{"date-parts":[[2025,6,18]],"date-time":"2025-06-18T07:34:27Z","timestamp":1750232067000},"score":1,"resource":{"primary":{"URL":"https:\/\/dl.acm.org\/doi\/10.1145\/2500887"}},"subtitle":[],"short-title":[],"issued":{"date-parts":[[2014,1]]},"references-count":42,"journal-issue":{"issue":"1","published-print":{"date-parts":[[2014,1]]}},"alternative-id":["10.1145\/2500887"],"URL":"https:\/\/doi.org\/10.1145\/2500887","relation":{},"ISSN":["0001-0782","1557-7317"],"issn-type":[{"type":"print","value":"0001-0782"},{"type":"electronic","value":"1557-7317"}],"subject":[],"published":{"date-parts":[[2014,1]]},"assertion":[{"value":"2014-01-01","order":2,"name":"published","label":"Published","group":{"name":"publication_history","label":"Publication History"}}]}}