Chunks have to be sent with at least one per 5 seconds frequency. How do I solve this? #155

Open
PavelNen opened this issue Jan 18, 2024 · 1 comment

@PavelNen

How can I use stt in the SDK to transcribe audio of any length?

Error
Chunks have to be sent with at least one per 5 seconds frequency

Task
I'm building a service that takes an audio file in WAV format and transcribes it using Yandex Cloud.
Overall, the business task is to take an audio file from a bucket, transcribe it, and save the text to a database. So real-time processing is not needed.

Question
What quick ways are there to solve this problem within the SpeechKit service and the SDK?

import { Injectable } from '@nestjs/common';
import { serviceClients, Session } from '@yandex-cloud/nodejs-sdk';
import {
  RecognitionSpec_AudioEncoding,
  StreamingRecognitionRequest,
} from '@yandex-cloud/nodejs-sdk/dist/generated/yandex/cloud/ai/stt/v2/stt_service';
import * as wav from 'wav';
import { PassThrough, Readable } from 'stream';
import { getEnv } from '../utils/get-env';
import { log } from '../utils/logger';

@Injectable()
export class SpeechService {
  async streamToText(
    audioStream: Readable,
    responseStream: PassThrough,
  ): Promise<void> {
    const reader = new wav.Reader({});
    const writer = new wav.Writer({
      sampleRate: 16000,
      channels: 1,
      bitDepth: 16,
    });
    const data = new PassThrough();
    const authToken = getEnv('YC_OAUTH_TOKEN');
    const folderId = getEnv('YC_FOLDER_ID');
    const session = new Session({ oauthToken: authToken });
    const client = session.client(serviceClients.SttServiceClient);

    const formatPromise = new Promise<wav.Format>((resolve) => {
      reader.on('format', (format: wav.Format) => {
        resolve(format);
      });
    });

    audioStream.pipe(writer).pipe(data);

    async function* createRequest(): AsyncIterable<StreamingRecognitionRequest> {
      const format = await formatPromise;

      yield StreamingRecognitionRequest.fromPartial({
        config: {
          specification: {
            audioEncoding: RecognitionSpec_AudioEncoding.LINEAR16_PCM,
            sampleRateHertz: format.sampleRate,
            audioChannelCount: format.channels,
          },
          folderId,
        },
      });
      for await (const chunk of writer) {
        yield StreamingRecognitionRequest.fromPartial({
          audioContent: chunk,
        });
      }
    }

    try {
      for await (const response of client.streamingRecognize(createRequest())) {
        const text = JSON.stringify(response, null, 2);
        responseStream.write(text);
      }
      responseStream.end();
    } catch (error) {
      log(error);
      responseStream.destroy(error);
    }
  }
}

@nikolaymatrosov
Contributor

Try asynchronous recognition.
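
For a file that already sits in a bucket, asynchronous recognition could look roughly like the sketch below. This is not taken from the thread or from the SDK: the endpoint URLs and JSON field names reflect my understanding of the SpeechKit v2 REST API and should be checked against the current documentation. `SendJson`, `buildRequest`, `extractText` and `transcribe` are hypothetical names introduced here, and the caller is assumed to supply the HTTP transport together with authorization.

```typescript
// Assumed endpoints of the SpeechKit v2 async API and the Operation service; verify before use.
const RECOGNIZE_URL =
  'https://transcribe.api.cloud.yandex.net/speech/stt/v2/longRunningRecognize';
const OPERATION_URL = 'https://operation.api.cloud.yandex.net/operations/';

interface RecognitionRequest {
  config: {
    specification: {
      languageCode: string;
      audioEncoding: string;
      sampleRateHertz: number;
      audioChannelCount: number;
    };
  };
  audio: { uri: string };
}

interface Operation {
  id: string;
  done: boolean;
  error?: { message: string };
  response?: { chunks?: { alternatives: { text: string }[] }[] };
}

// Hypothetical transport: the caller performs the HTTP call, adds the
// Authorization header, and returns the parsed JSON body.
type SendJson = (url: string, method: 'GET' | 'POST', body?: unknown) => Promise<Operation>;

// Builds the request body for a mono 16-bit PCM file stored in Object Storage.
// Whether a WAV header is accepted for LINEAR16_PCM is an assumption to verify.
function buildRequest(uri: string, sampleRateHertz = 16000): RecognitionRequest {
  return {
    config: {
      specification: {
        languageCode: 'ru-RU',
        audioEncoding: 'LINEAR16_PCM',
        sampleRateHertz,
        audioChannelCount: 1,
      },
    },
    audio: { uri },
  };
}

// Joins the first alternative of every recognized chunk into one string.
function extractText(op: Operation): string {
  const chunks = op.response?.chunks ?? [];
  return chunks
    .map((chunk) => chunk.alternatives[0]?.text ?? '')
    .filter((text) => text.length > 0)
    .join(' ');
}

// Starts recognition, then polls the operation at a fixed interval until it is done.
async function transcribe(send: SendJson, uri: string, pollMs = 10000): Promise<string> {
  let op = await send(RECOGNIZE_URL, 'POST', buildRequest(uri));
  while (!op.done) {
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
    op = await send(OPERATION_URL + op.id, 'GET');
  }
  if (op.error) {
    throw new Error(op.error.message);
  }
  return extractText(op);
}
```

Because the audio is referenced by URI and the result is fetched by polling, no chunks are streamed, so the 5-second chunk frequency requirement should not come into play. The returned string can then be saved to the database.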
