scraibe package
Subpackages
Submodules
scraibe.audio module
Audio Processor Module
This module provides the AudioProcessor class, which uses torchaudio to handle audio files. It includes functionality to load, cut, and manage audio waveforms.
Available Classes:
- AudioProcessor: Processes audio waveforms and provides methods for loading, cutting, and handling audio.
- Usage:
from .audio import AudioProcessor
processor = AudioProcessor.from_file("path/to/audiofile.wav")
cut_waveform = processor.cut(start=1.0, end=5.0)
Constants:
- SAMPLE_RATE (int): Default sample rate for processing.
- NORMALIZATION_FACTOR (float): Normalization factor for audio waveform.
- class scraibe.audio.AudioProcessor(waveform: Tensor, sr: int = 16000, *args, **kwargs)[source]
Bases:
object
Audio processor class that uses torchaudio to load, cut, and handle audio waveforms.
- waveform
torch.Tensor The audio waveform tensor.
- sr
int The sample rate of the audio.
- __init__(waveform: Tensor, sr: int = 16000, *args, **kwargs) None [source]
Initialize the AudioProcessor object.
- Parameters:
waveform (torch.Tensor) – The audio waveform tensor.
sr (int, optional) – The sample rate of the audio. Defaults to SAMPLE_RATE.
args – Additional arguments.
kwargs – Additional keyword arguments, e.g., device to use for processing. If CUDA is available, it defaults to CUDA.
- Raises:
ValueError – If the provided sample rate is not of type int.
- cut(start: float, end: float) Tensor [source]
Cut a segment from the audio waveform between the specified start and end times.
- Parameters:
start (float) – Start time in seconds.
end (float) – End time in seconds.
- Returns:
The cut waveform segment.
- Return type:
torch.Tensor
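The mapping from seconds to samples that cut implies can be sketched as follows. This is a minimal illustration, assuming the segment boundaries are obtained by multiplying the times by the sample rate. cut_samples is a hypothetical helper that works on a plain list; it is not the library's implementation, which operates on a torch.Tensor.

```python
def cut_samples(samples, sr, start, end):
    """Return the samples between start and end (both in seconds).

    Hypothetical helper: assumes times map to indices via time * sr.
    """
    if end < start:
        raise ValueError("end must not be earlier than start")
    first = int(start * sr)
    last = int(end * sr)
    return samples[first:last]


# One second of dummy audio at 16 kHz, then cut 0.25 s to 0.75 s.
sr = 16000
samples = list(range(sr))
segment = cut_samples(samples, sr, start=0.25, end=0.75)
print(len(segment))  # 8000 samples, i.e. 0.5 s at 16 kHz
```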
- classmethod from_file(file: str, *args, **kwargs) AudioProcessor [source]
Create an AudioProcessor instance from an audio file.
- Parameters:
file (str) – The audio file path.
- Returns:
An instance of the AudioProcessor class containing the loaded audio.
- Return type:
AudioProcessor
- static load_audio(file: str, sr: int = 16000)[source]
Open an audio file and read it as a mono waveform, resampling if necessary. This method ensures compatibility with pyannote.audio and requires the ffmpeg CLI in PATH.
- Parameters:
file (str) – The audio file to open.
sr (int, optional) – The desired sample rate. Defaults to SAMPLE_RATE.
- Returns:
A NumPy array containing the audio waveform in float32 dtype and the sample rate.
- Return type:
tuple
- Raises:
RuntimeError – If failed to load audio.
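As a rough sketch of what decoding through the ffmpeg CLI can look like, the example below builds a command that emits mono 16-bit PCM at the requested rate and converts the raw bytes to floats. The helper names and the value 32768.0 for NORMALIZATION_FACTOR are assumptions made for illustration; the actual method may build its command differently and returns a NumPy array rather than a list.

```python
import struct

SAMPLE_RATE = 16000              # assumed to match the documented default
NORMALIZATION_FACTOR = 32768.0   # assumed value: full scale of 16-bit PCM


def build_ffmpeg_command(file, sr=SAMPLE_RATE):
    """Build an ffmpeg call that writes mono 16-bit PCM to stdout."""
    return [
        "ffmpeg", "-nostdin",
        "-i", file,
        "-f", "s16le",          # raw signed 16-bit little-endian output
        "-ac", "1",             # downmix to one channel
        "-acodec", "pcm_s16le",
        "-ar", str(sr),         # resample to the requested rate
        "-",                    # write to stdout
    ]


def pcm16_to_float(raw):
    """Convert raw s16le bytes to floats in [-1.0, 1.0)."""
    count = len(raw) // 2
    ints = struct.unpack("<%dh" % count, raw[: count * 2])
    return [value / NORMALIZATION_FACTOR for value in ints]


cmd = build_ffmpeg_command("path/to/audiofile.wav")
# Running it would look like:
#   raw = subprocess.run(cmd, capture_output=True, check=True).stdout
# with any failure wrapped in a RuntimeError by the caller.
floats = pcm16_to_float(struct.pack("<3h", 0, 16384, -32768))
print(floats)  # [0.0, 0.5, -1.0]
```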
scraibe.autotranscript module
Scraibe Class
This class serves as the core of the transcription system, responsible for handling transcription and diarization of audio files. It leverages pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), providing an accessible interface for audio processing tasks such as transcription, speaker separation, and timestamping.
By encapsulating the complexities of underlying models, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.
Available Classes:
- Scraibe: Main class for performing transcription and diarization. Includes methods for loading models, processing audio files, and formatting the transcription output.
- Usage:
from scraibe import Scraibe
model = Scraibe()
transcript = model.autotranscribe("path/to/audiofile.wav")
- class scraibe.autotranscript.Scraibe(whisper_model: bool | str | whisper | None = None, dia_model: bool | str | DiarisationType | None = None, **kwargs)[source]
Bases:
object
Scraibe is a class responsible for managing the transcription and diarization of audio files. It serves as the core of the transcription system, incorporating pretrained models for speech-to-text (such as Whisper) and speaker diarization (such as pyannote.audio), allowing for comprehensive audio processing.
- transcriber
The transcriber object to handle transcription.
- Type:
Transcriber
- transcribe()[source]
Transcribes an audio file using the whisper model and pyannote diarization model.
- remove_audio_file()[source]
Removes the original audio file to avoid disk space issues or ensure data privacy.
- __init__(whisper_model: bool | str | whisper | None = None, dia_model: bool | str | DiarisationType | None = None, **kwargs) None [source]
Initializes the Scraibe class.
- Parameters:
whisper_model (Union[bool, str, whisper], optional) – Path to whisper model or whisper model itself.
dia_model (Union[bool, str, DiarisationType], optional) – Path to pyannote diarization model or model itself.
**kwargs – Additional keyword arguments for whisper and pyannote diarization models.
- autotranscribe(audio_file: str | Tensor | ndarray, remove_original: bool = False, **kwargs) Transcript [source]
Transcribes an audio file using the whisper model and pyannote diarization model.
- Parameters:
audio_file (Union[str, torch.Tensor, ndarray]) – Path to audio file or a tensor representing the audio.
remove_original (bool, optional) – If True, the original audio file will be removed after transcription.
*args – Additional positional arguments for diarization and transcription.
**kwargs – Additional keyword arguments for diarization and transcription.
- Returns:
A Transcript object containing the transcription, which can be exported to different formats.
- Return type:
Transcript
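The way diarization and transcription can be combined is sketched below with stand-in functions, so the flow runs without any models. Everything here is hypothetical: the function names, the (speaker, start, end) tuples, and the layout of the resulting dictionary are assumptions for illustration, not the structure that autotranscribe actually produces.

```python
def autotranscribe_sketch(audio, diarise, cut, transcribe):
    """Combine diarization and transcription into one segment table.

    All callables are hypothetical stand-ins:
      diarise(audio)      -> list of (speaker, start, end)
      cut(audio, s, e)    -> the audio between s and e
      transcribe(segment) -> text for that segment
    """
    result = {}
    for index, (speaker, start, end) in enumerate(diarise(audio)):
        text = transcribe(cut(audio, start, end))
        result[index] = {
            "speaker": speaker,
            "segment": (start, end),
            "text": text,
        }
    return result


# Toy stand-ins: the "audio" is a list of words indexed by position.
def fake_diarise(audio):
    return [("SPEAKER_00", 0, 2), ("SPEAKER_01", 2, 4)]


def fake_cut(audio, start, end):
    return audio[start:end]


def fake_transcribe(segment):
    return " ".join(segment)


words = ["hello", "there", "general", "remarks"]
table = autotranscribe_sketch(words, fake_diarise, fake_cut, fake_transcribe)
print(table[1]["text"])  # general remarks
```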
- diarization(audio_file: str | Tensor | ndarray, **kwargs) dict [source]
Perform diarization on an audio file using the pyannote diarization model.
- Parameters:
audio_file (Union[str, torch.Tensor, ndarray]) – The audio source which can either be a path to the audio file or a tensor representation.
**kwargs – Additional keyword arguments for diarization.
- Returns:
A dictionary containing the results of the diarization process.
- Return type:
dict
- static get_audio_file(audio_file: str | Tensor | ndarray, *args, **kwargs) AudioProcessor [source]
Gets an audio file as an AudioProcessor.
- Parameters:
audio_file (Union[str, torch.Tensor, ndarray]) – Path to the audio file or a tensor representing the audio.
*args – Additional positional arguments.
**kwargs – Additional keyword arguments.
- Returns:
An object containing the waveform and sample rate in torch.Tensor format.
- Return type:
AudioProcessor
- static remove_audio_file(audio_file: str, shred: bool = False) None [source]
Removes the original audio file to avoid disk space issues or ensure data privacy.
- Parameters:
audio_file (str) – Path to the audio file.
shred (bool, optional) – If True, the audio file will be shredded, not just removed.
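The difference between removing and shredding a file can be illustrated with the sketch below. It assumes that shredding means overwriting the file's bytes before deleting it. remove_file is a hypothetical helper, and the library may instead rely on a system tool or use several passes.

```python
import os
import tempfile


def remove_file(path, shred=False):
    """Delete a file; with shred=True, overwrite its bytes first.

    Hypothetical helper. A single overwrite pass is shown for illustration
    and is not a guarantee of secure erasure on every filesystem.
    """
    if shred:
        size = os.path.getsize(path)
        with open(path, "r+b") as handle:
            handle.write(os.urandom(size))   # replace contents with noise
            handle.flush()
            os.fsync(handle.fileno())        # push the overwrite to disk
    os.remove(path)


# Demonstrate on a throwaway temporary file.
fd, tmp_path = tempfile.mkstemp(suffix=".wav")
os.write(fd, b"pretend this is audio")
os.close(fd)
remove_file(tmp_path, shred=True)
print(os.path.exists(tmp_path))  # False
```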
- transcribe(audio_file: str | Tensor | ndarray, **kwargs)[source]
Transcribe the provided audio file.
- Parameters:
audio_file (Union[str, torch.Tensor, ndarray]) – The audio source, which can either be a path or a tensor representation.
**kwargs – Additional keyword arguments for transcription.
- Returns:
The transcribed text from the audio source.
- Return type:
str
scraibe.cli module
Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription.
- scraibe.cli.cli()[source]
Command-Line Interface (CLI) for the Scraibe class, allowing for user interaction to transcribe and diarize audio files. The function includes arguments for specifying the audio files, model paths, output formats, and other options necessary for transcription.
This function can be executed from the command line to perform transcription tasks, providing a user-friendly way to access the Scraibe class functionalities.
scraibe.diarisation module
Diarisation Class
This class is the core of the speaker diarization system, responsible for identifying and segmenting individual speakers in a given audio file. It uses a pretrained model from pyannote.audio and provides an accessible interface for audio processing tasks such as speaker separation and timestamping.
By encapsulating the complexities of the underlying model, it allows for straightforward integration into various applications, ranging from transcription services to voice assistants.
Available Classes:
- Diariser: Main class for performing speaker diarization. Includes methods for loading models, processing audio files, and formatting the diarization output.
Constants:
- TOKEN_PATH (str): Path to the Pyannote token.
- PYANNOTE_DEFAULT_PATH (str): Default path to Pyannote models.
- PYANNOTE_DEFAULT_CONFIG (str): Default configuration for Pyannote models.
- Usage:
from .diarisation import Diariser
model = Diariser.load_model(model="path/to/model/config.yaml")
diarisation_output = model.diarization("path/to/audiofile.wav")
- class scraibe.diarisation.Diariser(model)[source]
Bases:
object
Handles the diarization process of an audio file using a pretrained model from pyannote.audio. Diarization is the task of determining “who spoke when.”
- Parameters:
model – The pretrained model to use for diarization.
- static _get_diarisation_kwargs(**kwargs) dict [source]
Validates and extracts the keyword arguments for the pyannote diarization model.
Ensures that the provided keyword arguments match the expected parameters, filtering out any invalid or unnecessary arguments.
- Returns:
A dictionary containing the validated keyword arguments.
- Return type:
dict
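One common way to filter keyword arguments is to compare them against the signature of the callable that will receive them, as in this sketch. filter_kwargs and fake_pipeline are hypothetical, and the parameter names on fake_pipeline are assumptions chosen for illustration; the actual method may use a fixed list of accepted names instead.

```python
import inspect


def filter_kwargs(func, **kwargs):
    """Keep only the keyword arguments that func declares by name."""
    accepted = inspect.signature(func).parameters
    return {key: value for key, value in kwargs.items() if key in accepted}


def fake_pipeline(audio, num_speakers=None, min_speakers=None, max_speakers=None):
    """Stand-in for a diarization call; the parameter names are assumptions."""
    return audio, num_speakers, min_speakers, max_speakers


mixed = {"num_speakers": 2, "language": "en", "verbose": True}
clean = filter_kwargs(fake_pipeline, **mixed)
print(clean)  # {'num_speakers': 2}
```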
- static _get_token()[source]
Retrieves the Huggingface token from a local file. This token is required for accessing certain online resources.
- Raises:
ValueError – If the token is not found.
- Returns:
The Huggingface token.
- Return type:
str
- static _save_token(token)[source]
Saves the provided Huggingface token to a local file. This facilitates future access to online resources without needing to repeatedly authenticate.
- Parameters:
token – The Huggingface token to save.
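Caching a token in a local file and reading it back can be sketched as below. The helper names and the file name are hypothetical, and the real methods read and write the location given by TOKEN_PATH; only the ValueError on a missing token is taken from the description above.

```python
import tempfile
from pathlib import Path


def save_token(token, token_path):
    """Write the token to token_path, creating parent folders if needed."""
    token_path = Path(token_path)
    token_path.parent.mkdir(parents=True, exist_ok=True)
    token_path.write_text(token)


def get_token(token_path):
    """Read the cached token, raising ValueError when none is stored."""
    token_path = Path(token_path)
    if not token_path.is_file():
        raise ValueError("No token found at %s" % token_path)
    return token_path.read_text().strip()


# Round trip in a temporary directory; the file name is hypothetical.
with tempfile.TemporaryDirectory() as folder:
    path = Path(folder) / "cache" / ".pyannotetoken"
    save_token("hf_example_token", path)
    restored = get_token(path)
print(restored)  # hf_example_token
```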
- diarization(audiofile: str | Tensor | dict, *args, **kwargs) Annotation [source]
Perform speaker diarization on the provided audio file, effectively separating different speakers and providing a timestamp for each segment.
- Parameters:
audiofile – The path to the audio file or a torch.Tensor containing the audio data.
args – Additional arguments for the diarization model.
kwargs – Additional keyword arguments for the diarization model.
- Returns:
A dictionary containing speaker names, segments, and other information related to the diarization process.
- Return type:
dict
- static format_diarization_output(dia: Annotation) dict [source]
Formats the raw diarization output into a more usable structure for this project.
- Parameters:
dia – Raw diarization output.
- Returns:
A structured representation of the diarization, with speaker names as keys and a list of tuples representing segments as values.
- Return type:
dict
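The target structure described above, speaker names as keys and lists of segment tuples as values, can be illustrated with plain tuples. In this sketch the (start, end, speaker) input is a hypothetical stand-in for the raw diarization output, and format_tracks is not the library's function.

```python
def format_tracks(tracks):
    """Group (start, end, speaker) tracks by speaker.

    tracks is a hypothetical stand-in for the raw diarization output.
    """
    grouped = {}
    for start, end, speaker in tracks:
        grouped.setdefault(speaker, []).append((start, end))
    return grouped


raw = [
    (0.0, 3.2, "SPEAKER_00"),
    (3.2, 7.5, "SPEAKER_01"),
    (7.5, 9.0, "SPEAKER_00"),
]
formatted = format_tracks(raw)
print(formatted)
# {'SPEAKER_00': [(0.0, 3.2), (7.5, 9.0)], 'SPEAKER_01': [(3.2, 7.5)]}
```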
- classmethod load_model(model: str = '/home/runner/.cache/torch/models/pyannote/config.yaml', use_auth_token: str | None = None, cache_token: bool = True, cache_dir: Path | str = '/home/runner/.cache/torch/models/pyannote', hparams_file: str | Path | None = None, *args, **kwargs) Pipeline [source]
Loads a pretrained model from pyannote.audio, either from a local cache or online repository.
- Parameters:
model – Path or identifier for the pyannote model. default: /models/pyannote/speaker_diarization/config.yaml
use_auth_token – Optional HUGGINGFACE_TOKEN for authenticated access.
cache_token – Whether to cache the token locally for future use.
cache_dir – Directory for caching models.
hparams_file – Path to a YAML file containing hyperparameters.
args – Additional arguments only to avoid errors.
kwargs – Additional keyword arguments only to avoid errors.
- Returns:
A pyannote.audio Pipeline object, encapsulating the loaded model.
- Return type:
Pipeline
scraibe.misc module
- scraibe.misc.config_diarization_yaml(file_path: str, path_to_segmentation: str | None = None) None [source]
Configure diarization pipeline from a YAML file.
This function updates the YAML file to use the given segmentation model offline, and avoids manual file manipulation.
- Parameters:
file_path (str) – Path to the YAML file.
path_to_segmentation (str, optional) – Optional path to the segmentation model.
- Raises:
FileNotFoundError – If the segmentation model file is not found.
scraibe.transcriber module
Transcriber Module
This module provides the Transcriber class, a comprehensive tool for working with Whisper models. The Transcriber class offers functionalities such as loading different Whisper models, transcribing audio files, and saving transcriptions to text files. It acts as an interface between various Whisper models and the user, simplifying the process of audio transcription.
- Main Features:
Loading different sizes and versions of Whisper models.
Transcribing audio supplied as a file path (str), a Tensor, or an ndarray.
Saving the transcriptions to the specified paths.
Adaptable to various language specifications.
Options to control the verbosity of the transcription process.
- Constants:
WHISPER_DEFAULT_PATH: Default path for downloading and loading Whisper models.
- Usage:
>>> from scraibe.transcriber import Transcriber
>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")
- class scraibe.transcriber.Transcriber(model: whisper)[source]
Bases:
object
Transcriber Class
The Transcriber class serves as a wrapper around Whisper models for efficient audio transcription. By encapsulating the intricacies of loading models, processing audio, and saving transcripts, it offers an easy-to-use interface for users to transcribe audio files.
- model
The Whisper model used for transcription.
- Type:
whisper
Examples
>>> transcriber = Transcriber.load_model(model="medium")
>>> transcript = transcriber.transcribe(audio="path/to/audio.wav")
>>> transcriber.save_transcript(transcript, "path/to/save.txt")
Note
The class supports various sizes and versions of Whisper models. Please refer to the load_model method for available options.
- __init__(model: whisper) None [source]
Initialize the Transcriber class with a Whisper model.
- Parameters:
model (whisper) – The Whisper model to use for transcription.
- static _get_whisper_kwargs(**kwargs) dict [source]
Get kwargs for whisper model. Ensure that kwargs are valid.
- Returns:
Keyword arguments for whisper model.
- Return type:
dict
- classmethod load_model(model: str = 'medium', download_root: str = '/home/runner/.cache/torch/models/whisper', device: str | device | None = None, in_memory: bool = False, *args, **kwargs) Transcriber [source]
Load whisper model.
- Parameters:
model (str) – Whisper model. Available models include:
- 'tiny.en'
- 'tiny'
- 'base.en'
- 'base'
- 'small.en'
- 'small'
- 'medium.en'
- 'medium'
- 'large-v1'
- 'large-v2'
- 'large'
download_root (str, optional) – Path to download the model. Defaults to WHISPER_DEFAULT_PATH.
device (Optional[Union[str, torch.device]], optional) – Device to load model on. Defaults to None.
in_memory (bool, optional) – Whether to load model in memory. Defaults to False.
args – Additional arguments only to avoid errors.
kwargs – Additional keyword arguments only to avoid errors.
- Returns:
A Transcriber object initialized with the specified model.
- Return type:
Transcriber
- static save_transcript(transcript: str, save_path: str) None [source]
Save a transcript to a file.
- Parameters:
transcript (str) – The transcript as a string.
save_path (str) – The path to save the transcript.
- Returns:
None
- transcribe(audio: str | Tensor | ndarray, *args, **kwargs) str [source]
Transcribe an audio file.
- Parameters:
audio (Union[str, Tensor, ndarray]) – The audio file to transcribe.
*args – Additional arguments.
**kwargs – Additional keyword arguments, such as the language of the audio file.
- Returns:
The transcript as a string.
- Return type:
str
scraibe.transcript_exporter module
- class scraibe.transcript_exporter.Transcript(transcript: dict)[source]
Bases:
object
Class for storing transcript data, including speaker information and text segments, and exporting it to various file formats such as JSON, HTML, and LaTeX.
- __init__(transcript: dict) None [source]
Initializes the Transcript object with the given transcript data.
- Parameters:
transcript (dict) – A dictionary containing the formatted transcript string. Keys should correspond to segment IDs, and values should contain speaker and segment information.
- __repr__() str [source]
Return a string representation of the Transcript object.
- Returns:
A string that provides an informative description of the object.
- Return type:
str
- __str__() str [source]
Converts the transcript to a string representation.
- Returns:
String representation of the transcript, including speaker names and time stamps for each segment.
- Return type:
str
- _extract_segments() list [source]
Extracts all the text segments from the transcript.
- Returns:
List of segments, where each segment is represented by the starting and ending times.
- Return type:
list
- _extract_speakers() list [source]
Extracts the unique speaker names from the transcript.
- Returns:
List of unique speaker names in the transcript.
- Return type:
list
- annotate(*args, **kwargs) dict [source]
Annotates the transcript to associate specific names with speakers.
- Parameters:
args (list) – List of speaker names. These will be mapped sequentially to the speakers.
kwargs (dict) – Dictionary with speaker names as keys and list of segments as values.
- Returns:
Dictionary with speaker names as keys and list of segments as values.
- Return type:
dict
- Raises:
ValueError – If the number of speaker names does not match the number of speakers, or if an unknown speaker is found.
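The sequential mapping of names to speakers, and the two error cases listed above, can be sketched as follows. annotate_speakers is a hypothetical helper. In particular, the keyword form shown here, which maps a detected label directly to a display name, is only one plausible reading of the kwargs parameter and may not match how annotate treats keyword arguments.

```python
def annotate_speakers(speakers, *names, **mapping):
    """Return a {detected label: display name} mapping.

    Positional names are assigned in order; keyword arguments map a
    detected label to a name directly. Hypothetical helper.
    """
    if names and mapping:
        raise ValueError("Pass either positional names or keywords, not both")
    if names:
        if len(names) != len(speakers):
            raise ValueError(
                "Expected %d names, got %d" % (len(speakers), len(names))
            )
        return dict(zip(speakers, names))
    unknown = [label for label in mapping if label not in speakers]
    if unknown:
        raise ValueError("Unknown speaker(s): %s" % ", ".join(unknown))
    return dict(mapping)


detected = ["SPEAKER_00", "SPEAKER_01"]
named = annotate_speakers(detected, "Alice", "Bob")
print(named)  # {'SPEAKER_00': 'Alice', 'SPEAKER_01': 'Bob'}
```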
- classmethod from_json(json: dict | str) Transcript [source]
Load a transcript from JSON.
- Parameters:
json (dict | str) – Transcript data as a dictionary, or the path to a JSON file.
- Returns:
Transcript object
- Return type:
Transcript
- get_html() str [source]
Get transcript as html string
- Returns:
transcript as html string
- Return type:
str
- get_json(*args, use_annotation: bool = True, **kwargs) str [source]
Get transcript as json string
- Returns:
transcript as json string
- Return type:
str
- get_md() str [source]
Get transcript as Markdown string, using HTML formatting.
- Returns:
Transcript as a Markdown string.
- Return type:
str
- get_tex() str [source]
Get transcript as LaTeX string. If no annotations are present, the speakers will be annotated with the first letters of the alphabet.
- Returns:
Transcript as LaTeX string.
- Return type:
str
- save(path: str, *args, **kwargs) None [source]
Save transcript to file with the given path and file format.
This method can save the transcript in various formats including JSON, TXT, MD, HTML, TEX, and PDF. The file format is determined by the extension of the path.
- Parameters:
path (str) – Path to save the file, including the desired file extension.
*args – Additional positional arguments to be passed to the specific save methods.
**kwargs – Additional keyword arguments to be passed to the specific save methods.
- Raises:
ValueError – If the file format specified in the path is unknown.
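Choosing the output format from the file extension can be sketched with a small dispatch table. save_by_extension and its two writers are hypothetical and cover only JSON and TXT; the actual save method supports more formats and delegates to the class's own export methods.

```python
import json
import os
import tempfile
from pathlib import Path


def save_by_extension(data, path):
    """Pick a writer from the file extension and save data with it."""
    writers = {
        ".json": lambda d: json.dumps(d, indent=2),
        ".txt": lambda d: "\n".join("%s: %s" % (k, v) for k, v in d.items()),
    }
    suffix = Path(path).suffix.lower()
    if suffix not in writers:
        raise ValueError("Unknown file format: %r" % suffix)
    Path(path).write_text(writers[suffix](data))


# Save to a temporary folder and read the result back.
with tempfile.TemporaryDirectory() as folder:
    target = os.path.join(folder, "transcript.json")
    save_by_extension({"SPEAKER_00": "hello"}, target)
    loaded = json.loads(Path(target).read_text())
print(loaded)  # {'SPEAKER_00': 'hello'}
```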
- to_html(path: str) None [source]
Save transcript as html file
- Parameters:
path (str) – path to save file
- to_json(path, *args, **kwargs) None [source]
Save transcript as json file
- Parameters:
path (str) – path to save file
- to_md(path: str) None [source]
Save transcript as a Markdown file, using HTML formatting.
- Parameters:
path (str) – path to save file
- to_pdf(path: str) None [source]
Save transcript as a PDF file (placeholder function, implementation needed).
- Parameters:
path (str) – Path to save the PDF file.