scraibe package

Subpackages

Submodules

scraibe.audio module

Audio Processor Module

This module provides the AudioProcessor class, utilizing torchaudio for handling audio files. It includes functionalities to load, cut, and manage audio waveforms, offering efficient and flexible audio processing.

Available Classes: - AudioProcessor: Processes audio waveforms and provides methods for loading, cutting, and handling audio.

Usage:

from .audio import AudioProcessor

processor = AudioProcessor.from_file("path/to/audiofile.wav")
cut_waveform = processor.cut(start=1.0, end=5.0)

Constants: - SAMPLE_RATE (int): Default sample rate for processing. - NORMALIZATION_FACTOR (float): Normalization factor for audio waveform.
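The two constants work together when decoding raw audio: 16-bit PCM samples are divided by the normalization factor to map them into the [-1.0, 1.0) float range. A minimal sketch of that step in plain Python (the value 32768.0 is an assumption based on the int16 range; check the module source for the exact constant):

```python
# Assumed values; the package defines the authoritative constants.
SAMPLE_RATE = 16000
NORMALIZATION_FACTOR = 32768.0


def normalize(samples):
    """Map raw int16 sample values into the [-1.0, 1.0) float range."""
    return [s / NORMALIZATION_FACTOR for s in samples]


print(normalize([0, 16384, -32768]))  # [0.0, 0.5, -1.0]
```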

class scraibe.audio.AudioProcessor(waveform: Tensor, sr: int = 16000, *args, **kwargs)[source]

Bases: object

Audio Processor class that leverages torchaudio to provide functionalities for loading, cutting, and handling audio waveforms.

waveform

torch.Tensor The audio waveform tensor.

sr

int The sample rate of the audio.

__init__(waveform: Tensor, sr: int = 16000, *args, **kwargs) None[source]

Initialize the AudioProcessor object.

Parameters:
  • waveform (torch.Tensor) – The audio waveform tensor.

  • sr (int, optional) – The sample rate of the audio. Defaults to SAMPLE_RATE.

  • args – Additional arguments.

  • kwargs – Additional keyword arguments, e.g., the device to use for processing. If CUDA is available, it defaults to CUDA.

Raises:

ValueError – If the provided sample rate is not of type int.

cut(start: float, end: float) Tensor[source]

Cut a segment from the audio waveform between the specified start and end times.

Parameters:
  • start (float) – Start time in seconds.

  • end (float) – End time in seconds.

Returns:

The cut waveform segment.

Return type:

torch.Tensor
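Cutting by time reduces to converting seconds into sample indices via the sample rate and slicing the waveform. A sketch of that arithmetic, using a plain list as a stand-in for the torch.Tensor the real method slices:

```python
def cut_indices(start, end, sr=16000):
    """Convert start/end times in seconds into waveform sample indices."""
    return int(start * sr), int(end * sr)


def cut(waveform, start, end, sr=16000):
    """Slice a waveform (a plain list here, standing in for torch.Tensor)."""
    i, j = cut_indices(start, end, sr)
    return waveform[i:j]


wave = list(range(16000 * 6))        # six seconds of fake samples
segment = cut(wave, start=1.0, end=5.0)
print(len(segment) / 16000)          # 4.0 seconds
```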

classmethod from_file(file: str, *args, **kwargs) AudioProcessor[source]

Create an AudioProcessor instance from an audio file.

Parameters:

file (str) – The audio file path.

Returns:

An instance of the AudioProcessor class containing the loaded audio.

Return type:

AudioProcessor

static load_audio(file: str, sr: int = 16000)[source]

Open an audio file and read it as a mono waveform, resampling if necessary. This method ensures compatibility with pyannote.audio and requires the ffmpeg CLI in PATH.

Parameters:
  • file (str) – The audio file to open.

  • sr (int, optional) – The desired sample rate. Defaults to SAMPLE_RATE.

Returns:

A NumPy array containing the audio waveform in float32 dtype, and the sample rate.

Return type:

tuple

Raises:

RuntimeError – If failed to load audio.
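load_audio shells out to the ffmpeg CLI to decode the file into a raw mono PCM stream. The core of such a decode step, turning little-endian int16 PCM bytes into normalized floats, can be sketched with the standard library alone (the actual method returns a NumPy float32 array; the byte layout here assumes ffmpeg's s16le output format):

```python
import struct


def pcm16_to_float(raw: bytes):
    """Decode little-endian int16 PCM bytes into floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw)
    return [s / 32768.0 for s in ints]


# Two samples: 0x4000 = 16384 -> 0.5, and 0x0000 -> 0.0
print(pcm16_to_float(b"\x00\x40\x00\x00"))  # [0.5, 0.0]
```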

scraibe.autotranscript module

Scraibe Class

This class serves as the core of the transcription system, responsible for handling transcription and diarization of audio files. It leverages pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), providing an accessible interface for audio processing tasks such as transcription, speaker separation, and timestamping.

By encapsulating the complexities of underlying models, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.

Available Classes: - Scraibe: Main class for performing transcription and diarization. Includes methods for loading models, processing audio files, and formatting the transcription output.

Usage:

from scraibe import Scraibe

model = Scraibe()
transcript = model.autotranscribe("path/to/audiofile.wav")

class scraibe.autotranscript.Scraibe(whisper_model: bool | str | whisper | None = None, dia_model: bool | str | DiarisationType | None = None, **kwargs)[source]

Bases: object

Scraibe is a class responsible for managing the transcription and diarization of audio files. It serves as the core of the transcription system, incorporating pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), allowing for comprehensive audio processing.

transcriber

The transcriber object to handle transcription.

Type:

Transcriber

diariser

The diariser object to handle diarization.

Type:

Diariser

__init__()[source]

Initializes the Scraibe class with appropriate models.

autotranscribe()[source]

Transcribes an audio file using the whisper model and pyannote diarization model.

remove_audio_file()[source]

Removes the original audio file to avoid disk space issues or ensure data privacy.

get_audio_file()[source]

Gets an audio file as an AudioProcessor object.

__init__(whisper_model: bool | str | whisper | None = None, dia_model: bool | str | DiarisationType | None = None, **kwargs) None[source]

Initializes the Scraibe class.

Parameters:
  • whisper_model (Union[bool, str, whisper], optional) – Path to whisper model or whisper model itself.

  • dia_model (Union[bool, str, DiarisationType], optional) – Path to pyannote diarization model or the model itself.

  • **kwargs – Additional keyword arguments for whisper and pyannote diarization models.

autotranscribe(audio_file: str | Tensor | ndarray, remove_original: bool = False, **kwargs) Transcript[source]

Transcribes an audio file using the whisper model and pyannote diarization model.

Parameters:
  • audio_file (Union[str, torch.Tensor, ndarray]) – Path to audio file or a tensor representing the audio.

  • remove_original (bool, optional) – If True, the original audio file will be removed after transcription.

  • **kwargs – Additional keyword arguments for diarization and transcription.

Returns:

A Transcript object containing the transcription, which can be exported to different formats.

Return type:

Transcript

diarization(audio_file: str | Tensor | ndarray, **kwargs) dict[source]

Perform diarization on an audio file using the pyannote diarization model.

Parameters:
  • audio_file (Union[str, torch.Tensor, ndarray]) – The audio source which can either be a path to the audio file or a tensor representation.

  • **kwargs – Additional keyword arguments for diarization.

Returns:

A dictionary containing the results of the diarization process.

Return type:

dict

static get_audio_file(audio_file: str | Tensor | ndarray, *args, **kwargs) AudioProcessor[source]

Gets an audio file as an AudioProcessor object.

Parameters:
  • audio_file (Union[str, torch.Tensor, ndarray]) – Path to the audio file or a tensor representing the audio.

  • *args – Additional positional arguments.

  • **kwargs – Additional keyword arguments.

Returns:

An object containing the waveform and sample rate in torch.Tensor format.

Return type:

AudioProcessor

static remove_audio_file(audio_file: str, shred: bool = False) None[source]

Removes the original audio file to avoid disk space issues or ensure data privacy.

Parameters:
  • audio_file (str) – Path to the audio file.

  • shred (bool, optional) – If True, the audio file will be shredded, not just removed.

transcribe(audio_file: str | Tensor | ndarray, **kwargs)[source]

Transcribe the provided audio file.

Parameters:
  • audio_file (Union[str, torch.Tensor, ndarray]) – The audio source, which can either be a path or a tensor representation.

  • **kwargs – Additional keyword arguments for transcription.

Returns:

The transcribed text from the audio source.

Return type:

str

scraibe.cli module

Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription.

scraibe.cli.cli()[source]

Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription.

This function can be executed from the command line to perform transcription tasks, providing a user-friendly way to access the Scraibe class functionalities.

scraibe.diarisation module

Diarisation Class

This class serves as the heart of the speaker diarization system, responsible for identifying and segmenting individual speakers from a given audio file. It leverages a pretrained model from pyannote.audio, providing an accessible interface for audio processing tasks such as speaker separation, and timestamping.

By encapsulating the complexities of the underlying model, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.

Available Classes: - Diariser: Main class for performing speaker diarization. Includes methods for loading models, processing audio files, and formatting the diarization output.

Constants: - TOKEN_PATH (str): Path to the Pyannote token. - PYANNOTE_DEFAULT_PATH (str): Default path to Pyannote models. - PYANNOTE_DEFAULT_CONFIG (str): Default configuration for Pyannote models.

Usage:

from .diarisation import Diariser

model = Diariser.load_model(model="path/to/model/config.yaml")
diarisation_output = model.diarization("path/to/audiofile.wav")

class scraibe.diarisation.Diariser(model)[source]

Bases: object

Handles the diarization process of an audio file using a pretrained model from pyannote.audio. Diarization is the task of determining “who spoke when.”

Parameters:

model – The pretrained model to use for diarization.

static _get_diarisation_kwargs(**kwargs) dict[source]

Validates and extracts the keyword arguments for the pyannote diarization model.

Ensures that the provided keyword arguments match the expected parameters, filtering out any invalid or unnecessary arguments.

Returns:

A dictionary containing the validated keyword arguments.

Return type:

dict

static _get_token()[source]

Retrieves the Huggingface token from a local file. This token is required for accessing certain online resources.

Raises:

ValueError – If the token is not found.

Returns:

The Huggingface token.

Return type:

str
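Reading a cached Huggingface token back is a small file read plus validation. A sketch assuming a plain-text token file (the actual location is the module's TOKEN_PATH constant; the file layout is an assumption):

```python
from pathlib import Path


def get_token(token_path):
    """Read a cached Huggingface token; raise ValueError if it is missing."""
    p = Path(token_path)
    if not p.is_file():
        raise ValueError("no Huggingface token found at %s" % token_path)
    token = p.read_text().strip()
    if not token:
        raise ValueError("token file at %s is empty" % token_path)
    return token
```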

static _save_token(token)[source]

Saves the provided Huggingface token to a local file. This facilitates future access to online resources without needing to repeatedly authenticate.

Parameters:

token – The Huggingface token to save.

diarization(audiofile: str | Tensor | dict, *args, **kwargs) Annotation[source]

Perform speaker diarization on the provided audio file, effectively separating different speakers and providing a timestamp for each segment.

Parameters:
  • audiofile – The path to the audio file or a torch.Tensor containing the audio data.

  • args – Additional arguments for the diarization model.

  • kwargs – Additional keyword arguments for the diarization model.

Returns:

A dictionary containing speaker names, segments, and other information related to the diarization process.

Return type:

dict

static format_diarization_output(dia: Annotation) dict[source]

Formats the raw diarization output into a more usable structure for this project.

Parameters:

dia – Raw diarization output.

Returns:

A structured representation of the diarization, with speaker names as keys and a list of tuples representing segments as values.

Return type:

dict
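The structure described above can be illustrated with plain data: the raw diarization yields timed turns, and the formatter groups the segment boundaries under each speaker. A sketch under the assumption that pyannote labels speakers SPEAKER_00, SPEAKER_01, and so on:

```python
def format_diarization(turns):
    """Group (start, end, speaker) turns into {speaker: [(start, end), ...]}."""
    out = {}
    for start, end, speaker in turns:
        out.setdefault(speaker, []).append((start, end))
    return out


raw = [(0.0, 2.5, "SPEAKER_00"), (2.5, 4.0, "SPEAKER_01"), (4.0, 6.0, "SPEAKER_00")]
print(format_diarization(raw))
# {'SPEAKER_00': [(0.0, 2.5), (4.0, 6.0)], 'SPEAKER_01': [(2.5, 4.0)]}
```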

classmethod load_model(model: str = '/home/runner/.cache/torch/models/pyannote/config.yaml', use_auth_token: str | None = None, cache_token: bool = True, cache_dir: Path | str = '/home/runner/.cache/torch/models/pyannote', hparams_file: str | Path | None = None, *args, **kwargs) Pipeline[source]

Loads a pretrained model from pyannote.audio, either from a local cache or online repository.

Parameters:
  • model – Path or identifier for the pyannote model. default: /models/pyannote/speaker_diarization/config.yaml

  • use_auth_token – Optional HUGGINGFACE_TOKEN for authenticated access.

  • cache_token – Whether to cache the token locally for future use.

  • cache_dir – Directory for caching models.

  • hparams_file – Path to a YAML file containing hyperparameters.

  • args – Additional arguments only to avoid errors.

  • kwargs – Additional keyword arguments only to avoid errors.

Returns:

A pyannote.audio Pipeline object, encapsulating the loaded model.

Return type:

Pipeline

scraibe.misc module

scraibe.misc.config_diarization_yaml(file_path: str, path_to_segmentation: str | None = None) None[source]

Configure diarization pipeline from a YAML file.

This function updates the YAML file to use the given segmentation model offline, and avoids manual file manipulation.

Parameters:
  • file_path (str) – Path to the YAML file.

  • path_to_segmentation (str, optional) – Optional path to the segmentation model.

Raises:

FileNotFoundError – If the segmentation model file is not found.
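The update this function performs amounts to rewriting the segmentation entry inside the pipeline's config.yaml so it points at a local checkpoint. A stdlib-only sketch of that rewrite (the `segmentation:` key name and config layout are assumptions based on pyannote's pipeline configs, not taken from the package source):

```python
def point_segmentation_offline(yaml_text, local_path):
    """Rewrite the `segmentation:` line of a pipeline config to a local path."""
    lines = []
    for line in yaml_text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("segmentation:"):
            indent = line[: len(line) - len(stripped)]
            lines.append(f"{indent}segmentation: {local_path}")
        else:
            lines.append(line)
    return "\n".join(lines)


cfg = "pipeline:\n  params:\n    segmentation: pyannote/segmentation\n"
print(point_segmentation_offline(cfg, "/models/segmentation.bin"))
```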

scraibe.transcriber module

Transcriber Module

This module provides the Transcriber class, a comprehensive tool for working with Whisper models. The Transcriber class offers functionalities such as loading different Whisper models, transcribing audio files, and saving transcriptions to text files. It acts as an interface between various Whisper models and the user, simplifying the process of audio transcription.

Main Features:
  • Loading different sizes and versions of Whisper models.

  • Transcribing audio in various formats including str, Tensor, and ndarray.

  • Saving the transcriptions to the specified paths.

  • Adaptable to various language specifications.

  • Options to control the verbosity of the transcription process.

Constants:

WHISPER_DEFAULT_PATH: Default path for downloading and loading Whisper models.

Usage:
>>> from scraibe.transcriber import Transcriber
>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")
class scraibe.transcriber.Transcriber(model: whisper)[source]

Bases: object

Transcriber Class

The Transcriber class serves as a wrapper around Whisper models for efficient audio transcription. By encapsulating the intricacies of loading models, processing audio, and saving transcripts, it offers an easy-to-use interface for users to transcribe audio files.

model

The Whisper model used for transcription.

Type:

whisper

transcribe()[source]

Transcribes the given audio file.

save_transcript()[source]

Saves the transcript to a file.

load_model()[source]

Loads a specific Whisper model.

_get_whisper_kwargs()[source]

Private method to get valid keyword arguments for the whisper model.

Examples

>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")

Note

The class supports various sizes and versions of Whisper models. Please refer to the load_model method for available options.

__init__(model: whisper) None[source]

Initialize the Transcriber class with a Whisper model.

Parameters:

model (whisper) – The Whisper model to use for transcription.

static _get_whisper_kwargs(**kwargs) dict[source]

Get kwargs for whisper model. Ensure that kwargs are valid.

Returns:

Keyword arguments for whisper model.

Return type:

dict
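Filtering arbitrary **kwargs down to only those a callee accepts is commonly done with inspect.signature. A sketch of that pattern (the real method validates against whisper's transcription options; `fake_transcribe` here is a hypothetical stand-in):

```python
import inspect


def filter_kwargs(func, **kwargs):
    """Keep only the keyword arguments that `func` actually accepts."""
    valid = inspect.signature(func).parameters
    return {k: v for k, v in kwargs.items() if k in valid}


def fake_transcribe(audio, language="en", verbose=False):
    """Hypothetical stand-in for whisper's transcribe call."""
    return language


kept = filter_kwargs(fake_transcribe, language="de", bogus_option=1)
print(kept)  # {'language': 'de'}
```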

classmethod load_model(model: str = 'medium', download_root: str = '/home/runner/.cache/torch/models/whisper', device: str | device | None = None, in_memory: bool = False, *args, **kwargs) Transcriber[source]

Load whisper model.

Parameters:
  • model (str) – Whisper model. Available models include: - ‘tiny.en’ - ‘tiny’ - ‘base.en’ - ‘base’ - ‘small.en’ - ‘small’ - ‘medium.en’ - ‘medium’ - ‘large-v1’ - ‘large-v2’ - ‘large’

  • download_root (str, optional) – Path to download the model. Defaults to WHISPER_DEFAULT_PATH.

  • device (Optional[Union[str, torch.device]], optional) – Device to load model on. Defaults to None.

  • in_memory (bool, optional) – Whether to load model in memory. Defaults to False.

  • args – Additional arguments only to avoid errors.

  • kwargs – Additional keyword arguments only to avoid errors.

Returns:

A Transcriber object initialized with the specified model.

Return type:

Transcriber

static save_transcript(transcript: str, save_path: str) None[source]

Save a transcript to a file.

Parameters:
  • transcript (str) – The transcript as a string.

  • save_path (str) – The path to save the transcript.

Returns:

None

transcribe(audio: str | Tensor | ndarray, *args, **kwargs) str[source]

Transcribe an audio file.

Parameters:
  • audio (Union[str, Tensor, ndarray]) – The audio file to transcribe.

  • *args – Additional arguments.

  • **kwargs – Additional keyword arguments, such as the language of the audio file.

Returns:

The transcript as a string.

Return type:

str

scraibe.transcript_exporter module

class scraibe.transcript_exporter.Transcript(transcript: dict)[source]

Bases: object

Class for storing transcript data, including speaker information and text segments, and exporting it to various file formats such as JSON, HTML, and LaTeX.

__init__(transcript: dict) None[source]

Initializes the Transcript object with the given transcript data.

Parameters:

transcript (dict) – A dictionary containing the formatted transcript string. Keys should correspond to segment IDs, and values should contain speaker and segment information.

__repr__() str[source]

Return a string representation of the Transcript object.

Returns:

A string that provides an informative description of the object.

Return type:

str

__str__() str[source]

Converts the transcript to a string representation.

Returns:

String representation of the transcript, including speaker names and time stamps for each segment.

Return type:

str

_extract_segments() list[source]

Extracts all the text segments from the transcript.

Returns:

List of segments, where each segment is represented by the starting and ending times.

Return type:

list

_extract_speakers() list[source]

Extracts the unique speaker names from the transcript.

Returns:

List of unique speaker names in the transcript.

Return type:

list

annotate(*args, **kwargs) dict[source]

Annotates the transcript to associate specific names with speakers.

Parameters:
  • args (list) – List of speaker names. These will be mapped sequentially to the speakers.

  • kwargs (dict) – Dictionary with speaker names as keys and list of segments as values.

Returns:

Dictionary with speaker names as keys and list of segments as values.

Return type:

dict

Raises:

ValueError – If the number of speaker names does not match the number of speakers, or if an unknown speaker is found.
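The positional form of annotate() can be sketched with plain dictionaries: names replace the default speaker keys in order, and a count mismatch raises ValueError. The SPEAKER_XX labels below are assumptions about the diariser's default naming:

```python
def annotate(transcript, *names):
    """Map positional names onto the transcript's speaker keys, in order."""
    speakers = list(transcript)
    if len(names) != len(speakers):
        raise ValueError("number of names does not match number of speakers")
    return {name: transcript[spk] for name, spk in zip(names, speakers)}


t = {"SPEAKER_00": [(0.0, 2.5)], "SPEAKER_01": [(2.5, 4.0)]}
print(annotate(t, "Alice", "Bob"))
# {'Alice': [(0.0, 2.5)], 'Bob': [(2.5, 4.0)]}
```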

classmethod from_json(json: dict | str) Transcript[source]

Load a transcript from a JSON file or dictionary.

Parameters:

json (Union[dict, str]) – A transcript dictionary or the path to a JSON file.

Returns:

Transcript object

Return type:

Transcript

get_dict() dict[source]

Get transcript as dict

Returns:

transcript as dict

Return type:

dict

get_html() str[source]

Get transcript as html string

Returns:

transcript as html string

Return type:

str

get_json(*args, use_annotation: bool = True, **kwargs) str[source]

Get transcript as a JSON string.

Returns:

Transcript as a JSON string.

Return type:

str

get_md() str[source]

Get transcript as Markdown string, using HTML formatting.

Returns:

Transcript as a Markdown string.

Return type:

str

get_tex() str[source]

Get transcript as LaTeX string. If no annotations are present, the speakers will be annotated with the first letters of the alphabet.

Returns:

Transcript as LaTeX string.

Return type:

str

save(path: str, *args, **kwargs) None[source]

Save transcript to file with the given path and file format.

This method can save the transcript in various formats including JSON, TXT, MD, HTML, TEX, and PDF. The file format is determined by the extension of the path.

Parameters:
  • path (str) – Path to save the file, including the desired file extension.

  • *args – Additional positional arguments to be passed to the specific save methods.

  • **kwargs – Additional keyword arguments to be passed to the specific save methods.

Raises:

ValueError – If the file format specified in the path is unknown.
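Dispatching on the file extension, as save() does, typically reduces to os.path.splitext plus a lookup of supported formats. A sketch of that dispatch (the supported-extension set mirrors the formats listed above; the helper name is hypothetical):

```python
import os

SUPPORTED = {".json", ".txt", ".md", ".html", ".tex", ".pdf"}


def choose_format(path):
    """Pick an export format from the path's extension; raise on unknown ones."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED:
        raise ValueError(f"unknown file format: {ext}")
    return ext.lstrip(".")


print(choose_format("meeting.html"))  # html
```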

to_html(path: str) None[source]

Save transcript as an HTML file.

Parameters:

path (str) – path to save file

to_json(path, *args, **kwargs) None[source]

Save transcript as a JSON file.

Parameters:

path (str) – path to save file

to_md(path: str) None[source]

Save transcript as a Markdown file, using HTML formatting.

Parameters:

path (str) – Path to save the Markdown file.

to_pdf(path: str) None[source]

Save transcript as a PDF file (placeholder function, implementation needed).

Parameters:

path (str) – Path to save the PDF file.

to_tex(path: str) None[source]

Save transcript as a LaTeX file (placeholder function, implementation needed).

Parameters:

path (str) – Path to save the LaTeX file.

to_txt(path: str) None[source]

Save transcript as a plain text file.

Parameters:

path (str) – Path to save the text file.

scraibe.version module

scraibe.version.get_version(build_version=False)[source]
scraibe.version.git_version()[source]

Module contents