Speech coding and recognition Essay

Abstract-This paper investigates the performance of a speech recognizer in an interactive voice response system for various coded speech signals, coded by using a vector quantization technique, namely the Multi Switched Split Vector Quantization technique. The process of recognizing the coded output can be used in voice banking applications. The recognition technique used for the coded speech signals is the Hidden Markov Model technique. The spectral distortion performance, computational complexity, and memory requirements of the Multi Switched Split Vector Quantization technique, and the performance of the speech recognizer at various bit rates, have been computed. The results show that the speech recognizer performs best at 24 bits/frame and that the percentage of recognition varies from 100% to 93.33% across the bit rates.

Keywords-Linear predictive coding, Speech Recognition, Voice banking, Multi Switched Split Vector Quantization, Hidden Markov Model, Linear Predictive Coefficients.

Introduction

This paper takes advantage of the voice banking application and examines the performance of a speech recognizer in an interactive voice response system for the coded output obtained by using the Multi Switched Split Vector Quantization (MSSVQ) technique at various bit rates. MSSVQ has already been shown to have better spectral distortion performance, lower computational complexity, and lower memory requirements than other product code vector quantization techniques. This paper therefore uses MSSVQ as the vector quantization technique for coding.

Voice banking is a telephone banking service that keeps the user in touch with account information and other banking services 24 hours a day, 365 days a year, through a simple phone call. In voice banking, customers can speak their choices or use a touch-tone keypad to enter selections.

The speech techniques involved in voice banking are speech coding, speech enhancement, and speech recognition. This paper investigates the performance of a speech recognizer using the hidden Markov model (HMM) technique ([1], [2], [3]) for the coded outputs obtained by using a hybrid vector quantization technique. The hybrid vector quantization technique used for coding is the Multi Switched Split Vector Quantization (MSSVQ) technique ([4], [5], [6], [7]). The speech parameters used for coding are the line spectral frequencies (LSF) ([8], [9], [10]), so as to ensure filter stability. The codebooks used for coding are generated by using the Linde-Buzo-Gray (LBG) algorithm [14]. Generating the codebooks is a tedious and time-consuming process that requires large amounts of memory for generation and storage; the memory required increases with the number of training vectors, the number of samples per vector, and the number of bits used for codebook generation.

The speech recognition technique used is the hidden Markov model technique. An HMM is a collection of various statistical modeling techniques, in which the transition probability matrix is estimated by using the Baum-Welch algorithm ([1], [2]); the emission matrix is generated by using the K-means clustering algorithm and is then estimated by using the Baum-Welch algorithm. The Viterbi algorithm can also be used for the estimation of the transition and emission matrices. For a given sequence, the most likely state path is estimated by using the Viterbi algorithm ([1], [2]), and the probability of a particular sequence is estimated by using the forward algorithm or the backward algorithm.

The aim of this article is to investigate the performance of the speech recognizer using HMM for a coded output obtained by using the multi switched split vector quantization technique at different bit rates. The speech parameters that can be used for recognition are the linear predictive coefficients (LPC) and Mel cepstrum coefficients (MFCC). In this paper LPC coefficients were used for recognition and line spectral frequencies were used for coding. To improve recognition performance, energy, delta, and acceleration coefficients should be used, but they were not used in this paper because including them would complicate the generation of codebooks during coding.
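The forward and Viterbi computations mentioned above can be illustrated with a minimal sketch for a discrete-observation HMM. The function names, the two-state model, and the three-symbol observation alphabet below are illustrative assumptions and are not taken from the paper, which works with vector-quantized LPC features in Matlab.

```python
def forward_probability(start, trans, emit, obs):
    """Probability of the observation sequence under the model (forward algorithm)."""
    n = len(start)
    # alpha[s] = P(observations so far, current state = s)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[p] * trans[p][s] for p in range(n)) * emit[s][o]
                 for s in range(n)]
    return sum(alpha)


def viterbi_path(start, trans, emit, obs):
    """Most likely state sequence for the observation sequence (Viterbi algorithm)."""
    n = len(start)
    delta = [start[s] * emit[s][obs[0]] for s in range(n)]
    backpointers = []
    for o in obs[1:]:
        step, new_delta = [], []
        for s in range(n):
            # Best predecessor state for reaching state s at this time step.
            best = max(range(n), key=lambda p: delta[p] * trans[p][s])
            step.append(best)
            new_delta.append(delta[best] * trans[best][s] * emit[s][o])
        backpointers.append(step)
        delta = new_delta
    # Trace back from the most probable final state.
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for step in reversed(backpointers):
        state = step[state]
        path.append(state)
    return path[::-1]


# Hypothetical two-state model with three observation symbols (0, 1, 2).
start = [0.6, 0.4]
trans = [[0.7, 0.3], [0.4, 0.6]]
emit = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
obs = [0, 1, 2]
```

In an isolated-word recognizer of the kind described here, one such model would typically be trained per utterance, and the utterance whose model gives the highest forward probability would be selected.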

Speech coding and recognition

This paper is intended for the voice banking application, so it requires the technology of speech coding and recognition. The enhancement technique used is the spectral subtraction technique ([11], [12], [13]). The coding technique used is the Multi Switched Split Vector Quantization (MSSVQ) technique. The recognition technique used is the Hidden Markov model technique. The steps involved in speech coding and recognition intended for voice banking are:

  • First, the silence part of the speech signal is removed by using the voice activity detection technique, and the channel noise included in the speech signal is then removed by using an enhancement technique.
  • Second, the speech signal is coded by using the MSSVQ technique.
  • Third, the coded output with added channel noise is enhanced by using the spectral subtraction technique.
  • Next, the enhanced speech signal is given to a voice bank recognizer so as to recognize the coded output.
  • Finally, the percentage of recognition is computed as a measure of the recognition accuracy.

By using these speech techniques, it is found that the recognition accuracy varies from 100% to 93.33% for the coded outputs at different bit rates.

MULTI SWITCHED SPLIT VECTOR QUANTIZATION

In MSSVQ, for a particular switch, the generation of codebooks at different stages is shown in Fig 1.

  • Initially, the codebook at the first stage is generated by using the Linde, Buzo and Gray (LBG) [14] algorithm with the training vector set as input.
  • Second, the training difference vectors are extracted from the input training vector set and the quantized training vectors of the first stage.
  • Finally, the training difference vectors are used to generate the codebook of the second stage.

This procedure is continued for the required number of stages, and the number of codebooks to be generated is equal to the number of stages used for quantization.

A P x m x s MSSVQ is shown in Fig 2, where P corresponds to the number of stages, m corresponds to the number of switches, and s corresponds to the number of splits.
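The stage-wise codebook generation described above can be sketched as follows. This is a simplified illustration under stated assumptions: it covers a single switch with no splits, it uses a basic LBG-style procedure (binary splitting with an additive perturbation followed by a fixed number of refinement passes), and all function names and parameter values are hypothetical rather than taken from the paper.

```python
def nearest(codebook, v):
    """Index of the codeword closest to v in squared Euclidean distance."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def lbg_codebook(training, bits, eps=0.01, passes=20):
    """LBG-style design: start from the global centroid, then split and refine."""
    codebook = [centroid(training)]
    for _ in range(bits):
        # Split every codeword into two slightly perturbed copies.
        codebook = [[c + s for c in cw] for cw in codebook for s in (eps, -eps)]
        for _ in range(passes):
            cells = [[] for _ in codebook]
            for v in training:
                cells[nearest(codebook, v)].append(v)
            # Move each codeword to the centroid of its cell; keep it if the cell is empty.
            codebook = [centroid(cell) if cell else cw
                        for cell, cw in zip(cells, codebook)]
    return codebook


def train_stage_codebooks(training, bits_per_stage):
    """One codebook per stage; each later stage is trained on the previous stage's residuals."""
    codebooks = []
    residual = training
    for bits in bits_per_stage:
        cb = lbg_codebook(residual, bits)
        codebooks.append(cb)
        # Training difference vectors: input minus its quantized version.
        residual = [[a - b for a, b in zip(v, cb[nearest(cb, v)])]
                    for v in residual]
    return codebooks
```

A call such as `train_stage_codebooks(training_vectors, [1, 1])` would return two codebooks of two codewords each. A full MSSVQ design would repeat this for every switch and every split of the vector.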

  • Each input vector $x$ that is to be quantized is applied to the SSVQ at the first stage so as to obtain the approximate vectors at each codebook of the first stage.
  • Extract the approximate vector with minimum distortion from the set of approximate vectors at the first stage, i.e. $\hat{x}_1 = Q[x_1]$.
  • Compute the error vector resulting at the first stage of quantization and let the error vector be $e_1 = x_1 - \hat{x}_1$.
  • The error vector at the first stage is given as an input to the second stage so as to obtain the quantized version of the error vector.

This process is continued for the required number of stages. Finally, the decoder takes the indices, $I_i$, from each stage and adds the quantized vectors at each stage so as to obtain the reconstructed vector $\hat{x}$ given by $\hat{x} = Q[x_1] + Q[e_1] + Q[e_2] + \dots$, where $Q[x_1]$ is the quantized input vector at the first stage, $Q[e_1]$ is the quantized error vector at the second stage, $Q[e_2]$ is the quantized error vector at the third stage, and so on. As this process involves the quantization of the error vectors and the summing of the quantized error vectors with the approximate vector at the first stage, the spectral distortion performance can be greatly improved when compared to SSVQ and SVQ.
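The stage-by-stage quantization and the decoder's summation can be illustrated with a short sketch. It assumes a single switch and no splits, and the function names and the two tiny codebooks are made up for illustration only.

```python
def nearest_index(codebook, v):
    """Index of the codeword with minimum squared Euclidean distortion."""
    return min(range(len(codebook)),
               key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def encode(x, codebooks):
    """Quantize x at the first stage, then quantize each stage's error vector in turn."""
    indices = []
    residual = list(x)
    for cb in codebooks:
        i = nearest_index(cb, residual)
        indices.append(i)
        # Error vector passed on to the next stage.
        residual = [r - c for r, c in zip(residual, cb[i])]
    return indices


def decode(indices, codebooks):
    """Sum the selected codewords of all stages to rebuild the vector."""
    x_hat = [0.0] * len(codebooks[0][0])
    for i, cb in zip(indices, codebooks):
        x_hat = [a + b for a, b in zip(x_hat, cb[i])]
    return x_hat


# Hypothetical two-stage codebooks for 2-dimensional vectors.
codebooks = [
    [[0.0, 0.0], [1.0, 1.0]],      # first stage: coarse approximation
    [[-0.25, 0.0], [0.25, 0.0]],   # second stage: quantized error vectors
]
indices = encode([1.2, 0.9], codebooks)
reconstructed = decode(indices, codebooks)
```

Only the indices need to be transmitted; the decoder holds the same codebooks and rebuilds the vector by addition.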

SPECTRAL DISTORTION

In order to objectively measure the distortion between a coded and an uncoded LPC parameter vector, the spectral distortion is often used in narrowband speech coding. For the ith frame, the spectral distortion (in dB), $SD_i$ [5], is defined as

$$SD_i = \sqrt{\frac{1}{f_2 - f_1}\int_{f_1}^{f_2}\left[10\log_{10} S_i(f) - 10\log_{10}\hat{S}_i(f)\right]^2 df}\ \text{dB}$$

where $S_i(f)$ and $\hat{S}_i(f)$ are the LPC power spectra of the uncoded and coded ith frame, respectively, $f$ is the frequency in Hz, and the frequency range is given by $f_1$ and $f_2$, with $f_2$ at most $F_s/2$, where $F_s$ is the sampling frequency. The frequency range used in practice is 0-4000 Hz. The average spectral distortion $SD$ is given by

$$SD = \frac{1}{N}\sum_{i=1}^{N} SD_i$$

where $N$ is the number of frames. The conditions for transparent speech from narrowband LPC parameter quantization are:

  • The average spectral distortion (SD) must be less than or equal to 1 dB.
  • There must be no outlier frames having a spectral distortion greater than 4 dB.
  • The number of outlier frames between 2 and 4 dB must be less than 2%.
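As a rough numerical illustration of the spectral distortion measure and the transparency conditions, the sketch below approximates the integral with a midpoint sum. It makes several assumptions that are not stated in the paper: the LPC power spectrum is taken as $1/|A(e^{j\omega})|^2$ with $A(z) = 1 + \sum_k a_k z^{-k}$ and unit gain, the sampling frequency defaults to 8000 Hz, and the function names and grid size are hypothetical.

```python
import cmath
import math


def lpc_power_spectrum(a, freq_hz, fs):
    """Power spectrum 1/|A|^2 at freq_hz for A(z) = 1 + a[0] z^-1 + a[1] z^-2 + ..."""
    w = 2 * math.pi * freq_hz / fs
    A = 1 + sum(ak * cmath.exp(-1j * w * (k + 1)) for k, ak in enumerate(a))
    return 1.0 / abs(A) ** 2


def spectral_distortion(a_ref, a_coded, fs=8000.0, f1=0.0, f2=4000.0, points=256):
    """Spectral distortion in dB between two LPC vectors over [f1, f2]."""
    total = 0.0
    for n in range(points):
        f = f1 + (f2 - f1) * (n + 0.5) / points  # midpoint of each sub-band
        diff = (10 * math.log10(lpc_power_spectrum(a_ref, f, fs))
                - 10 * math.log10(lpc_power_spectrum(a_coded, f, fs)))
        total += diff * diff
    # The mean over equally spaced points approximates (1/(f2-f1)) * integral.
    return math.sqrt(total / points)


def is_transparent(sd_frames):
    """Check the three transparency conditions on per-frame SD values (in dB)."""
    n = len(sd_frames)
    average = sum(sd_frames) / n
    above_4 = sum(1 for s in sd_frames if s > 4.0)
    between_2_and_4 = sum(1 for s in sd_frames if 2.0 < s <= 4.0)
    return average <= 1.0 and above_4 == 0 and between_2_and_4 / n < 0.02
```

For instance, `spectral_distortion(a, a)` is zero for any stable LPC vector `a`, and a list of per-frame values containing a single frame above 4 dB fails the transparency check regardless of the average.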

Results

Tables 1 to 4 give the probability of recognizing the utterance ONE at bit rates of 24, 23, 22, and 21 bits/frame. From the tables it is observed that the recognition accuracy varies from 100% to 93.33% for different bit rates, and that the recognition accuracy is good at 24 and 23 bits/frame. The reason for choosing the multi switched split vector quantization technique is that it has better spectral distortion performance, lower computational complexity, and lower memory requirements than other product code vector quantization techniques, which can be observed from Tables 5 to 8. As a result, the cost of the product will be lower when using MSSVQ, and it can have better marketability. The decrease in spectral distortion, complexity, and memory requirements for MSSVQ can also be observed from Figs 3 to 5. The spectral distortion is measured in decibels (dB), computational complexity is measured in kflops/frame, and memory requirements are measured in floats.

Conclusion

The speech recognizer using HMM performs well for the coded output obtained by using MSSVQ. It has been observed that the percentage of recognition varies from 100% to 93.33% for different bit rates. Another advantage of MSSVQ is that it provides a better trade-off between bit rate and spectral distortion performance, computational complexity, and memory requirements when compared to other product code vector quantization schemes such as split vector quantization (SVQ), multi stage vector quantization (MSVQ), and switched split vector quantization (SSVQ). MSSVQ is therefore shown to be the better LPC coding technique among the product code vector quantization techniques for the voice banking application.

The performance can be further improved by increasing the number of training vectors and bits for codebook generation, by increasing the number of states of an utterance, by using an efficient algorithm for generating the emission matrix that takes into account the entire training set, unlike K-means clustering, which randomly selects vectors from the training set, and by using software with a greater degree of precision. With Matlab it is difficult to obtain a greater degree of precision when a large number of states are taken for a particular utterance.

Acknowledgments

The authors place on record their grateful thanks to the authorities of Chalapathi Institute of Technology, Mothadaka, Guntur, AP, INDIA, R.V.R & J.C. College of Engineering, Guntur, A.P, INDIA, K L College of Engineering, Guntur, A.P, INDIA, and Jawaharlal Nehru Technological University, College of Engineering, Hyderabad, INDIA for providing the facilities.

References

  1. Rabiner Lawrence, Juang Bing-Hwang, Fundamentals of Speech Recognition, Prentice Hall, New Jersey, 1993, ISBN 0-13-015157-2.
  2. Lawrence R. Rabiner, A tutorial on Hidden Markov Models and selected applications in speech recognition, Proceedings of the IEEE, Vol 77, no.2, Feb 1989, pp.154-161.
  3. Rabiner L.R, Levinson S.E., Rosenberg A.E. & Wilpon J.G, Speaker independent recognition of isolated words using clustering techniques, IEEE Trans. Acoustics, Speech, Signal Proc., 1979, pp.336-349.
  4. M.Satya Sai Ram, P.Siddaiah, & M.MadhaviLatha, Multi Switched Split Vector Quantization of Narrow Band Speech Signals, Proceedings World Academy of Science, Engineering and Technology, WASET, Vol.27, Feb 2008, pp.236-239.
  5. M.Satya Sai Ram, P.Siddaiah, & M.MadhaviLatha, Multi Switched Split Vector Quantizer, International Journal of Computer, Information, and Systems Science, and Engineering, IJCISSE, WASET, Vol.2, no.1, May 2008, pp.1-6.
  6. Paliwal K.K, Atal B.S, Efficient vector quantization of LPC Parameters at 24 bits/frame, IEEE Trans. Speech Audio Process, 1993, pp. 3-14.
  7. Stephen So, & Paliwal K.K, Efficient product code vector quantization using switched split vector quantizer, Digital Signal Processing journal, Elsevier, Vol 17, Jan 2007, pp.138-171.
  8. Bastiaan Kleijn W, Tom Backstrom, & Paavo Alku, On Line Spectral Frequencies, IEEE Signal Processing Letters, Vol.10, no.3, 2003.
  9. Soong F, Juang B, Line spectrum pair (LSP) and speech data compression, IEEE Conference on Acoustics, Speech Signal Processing, vol 9, no.1, Mar 1984, pp. 37-40.
  10. P. Kabal, & P. Rama Chandran, The Computation of Line Spectral Frequencies Using Chebyshev polynomials, IEEE Trans. on Acoustics, Speech Signal Processing, Vol 34, no.6, 1986, pp. 1419-1426.
  11. P. Lockwood and J. Boudy, Experiments with a Nonlinear Spectral Subtraction (NSS), Hidden Markov Models and the Projection, for Robust Speech Recognition in Cars, Speech Communication, vol. 11, 1992, pp. 215-228.
  12. S.F. Boll, Suppression of Acoustic Noise in Speech using Spectral Subtraction, IEEE Trans. on ASSP, vol. 27 (2), 1979, pp. 113-120.
  13. M. Berouti, R. Schwartz, and J. Makhoul, Enhancement of Speech Corrupted by Acoustic Noise, in Proc. ICASSP, 1979, pp. 208-211.
  14. Linde Y, Buzo A, & Gray R.M, An Algorithm for Vector Quantizer Design, IEEE Trans. Commun, 28, Jan. 1980, pp. 84-95.

M.Satya Sai Ram obtained a B.Tech degree in Electronics and Communication Engineering from Nagarjuna University, Guntur in 2003. He received his M.Tech degree from Nagarjuna University, Guntur in 2005. He started his career as a lecturer at R.V.R & J.C. College of Engineering, Guntur, AP, INDIA in 2005 and was promoted to Sr. Lecturer in 2007. At present M.Satya Sai Ram is working as an Associate Professor in the Department of Electronics and Communication Engineering at Chalapathi Institute of Technology, Mothadaka, Guntur, AP, INDIA. He is actively involved in research and in guiding projects for postgraduate students in the area of Speech & Signal Processing. He has taught a wide variety of courses for UG students and guided several projects. He has published more than six papers in international conferences and journals.

Dr. P.Siddaiah obtained a B.Tech degree in Electronics and Communication Engineering from JNTU College of Engineering in 1988. He received his M.Tech degree from SV University, Tirupathi. He completed his Ph.D. program at JNTU, Hyderabad. He is the chief investigator for several outsourcing projects sponsored by Defense organizations and AICTE. He started his career as a lecturer at SV University in 1993. At present Dr P. Siddaiah is working as Professor & HOD in the Department of Electronics and Communication Engineering, KL College of Engineering, and is actively involved in research and in guiding students in the area of Antennas, Speech & Signal Processing. He has taught a wide variety of courses for UG & PG students and guided several projects. Several members are pursuing their PhD degrees under his guidance. He has published several papers in national and international journals and conferences. He is a life member of FIETE, IE, and MISTE.

M. Madhavi Latha graduated with a B.Tech from NU in 1986, completed her post graduation in M.Tech from JNTU in 1993, and received her Ph.D from JNTU in 2002. She has been actively involved in research and in guiding students in the area of Signal & Image Processing, VLSI (Mixed Signal design) and hardware implementation of Speech CODECs. She has published more than 30 papers in national/international conferences and journals. Currently, she is working as Professor in ECE, JNTU College of Engineering, Hyderabad, Andhra Pradesh. She is a life member of FIETE, MISTE, MIEEE.