speaker_identification.proto

path: phonexia/grpc/technologies/speaker_identification/v1/speaker_identification.proto

package: phonexia.grpc.technologies.speaker_identification.v1

Phonexia Speaker Identification gRPC API.


Messages

CompareRequest

The message sent by the client for the Compare method.

Name | Type | Description
voiceprints_a | repeated phonexia.grpc.common.Voiceprint | First list of voiceprints to be compared.
voiceprints_b | repeated phonexia.grpc.common.Voiceprint | Second list of voiceprints to be compared.

CompareResponse

The top-level message returned to the client by the Compare method. It contains a comparison matrix of log-likelihood ratio (LLR) scores.

Comparison matrix containing similarity scores between voiceprints_a and voiceprints_b. The element at row i and column j corresponds to the comparison result between voiceprints_a[i] and voiceprints_b[j].

Name | Type | Description
scores | phonexia.grpc.common.Matrix | The similarity scores are expressed as log-likelihood ratio (LLR) values, which can take any value in the interval (-inf, +inf). Higher values indicate higher similarity.
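
The row and column convention above can be illustrated with a minimal sketch. The layout of phonexia.grpc.common.Matrix is not described in this file, so the example assumes a hypothetical row-major flat list of values plus a known column count; score_at, best_match_per_row, and the sample scores are illustrative names and made-up data, not part of the API.

```python
from typing import List, Sequence


def score_at(values: Sequence[float], n_cols: int, i: int, j: int) -> float:
    """Return the score for voiceprints_a[i] compared with voiceprints_b[j].

    Assumption: `values` is a row-major flattening of the comparison matrix.
    """
    return values[i * n_cols + j]


def best_match_per_row(values: Sequence[float], n_rows: int, n_cols: int) -> List[int]:
    """For each voiceprint in list a, return the index of the highest-scoring
    voiceprint in list b (higher LLR means higher similarity)."""
    best = []
    for i in range(n_rows):
        row = values[i * n_cols:(i + 1) * n_cols]
        best.append(max(range(n_cols), key=lambda j: row[j]))
    return best


# Two voiceprints in list a, three in list b: a 2 x 3 matrix of made-up LLR scores.
flat_scores = [-4.2, 7.1, 0.3,
               5.5, -9.0, -1.2]

print(score_at(flat_scores, 3, 0, 1))
print(best_match_per_row(flat_scores, 2, 3))
```

Because LLR scores are unbounded, any decision threshold applied to them is an application-level choice and is not defined by this file.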

ConvertRequest

The message sent by the client for the Convert method.

Name | Type | Description
voiceprints | repeated phonexia.grpc.common.Voiceprint | Voiceprints to be converted to vector database format.

ConvertResponse

The top-level message returned to the client by the Convert method. It contains the converted vectors.

Name | Type | Description
vector_voiceprints | repeated phonexia.grpc.common.Vector | Voiceprints converted to vector format for vector databases. Use cosine distance to compare vector voiceprints in vector databases. Note: Individual vectors will be empty for input voiceprints that contain no speech.
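
Vector databases normally compute cosine distance internally, but a small sketch makes the metric and the empty-vector case concrete. It assumes the vector values have already been read into plain Python lists of floats; cosine_distance is a hypothetical helper, and returning None for empty input is a design choice of this sketch, not behavior defined by the API.

```python
import math
from typing import Optional, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> Optional[float]:
    """Return 1 - cosine similarity, or None if either vector is empty or zero.

    Empty vectors correspond to voiceprints that contain no speech, so there
    is nothing meaningful to compare.
    """
    if not u or not v:
        return None
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    return 1.0 - dot / (norm_u * norm_v)


print(cosine_distance([1.0, 0.0], [1.0, 0.0]))  # identical direction
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
print(cosine_distance([], [1.0, 0.0]))          # no speech in the first input
```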

ExtractConfig

Name | Type | Description
speech_length | google.protobuf.Duration | Specifies the maximum speech length from which the voiceprint will be extracted. If there is less speech in the audio than the specified duration, the voiceprint will be extracted from the entire audio.
enable_vector_voiceprint | bool | If set to true, the extraction result will contain both voiceprint and vector_voiceprint fields. If false (default), only voiceprint is returned.

ExtractRequest

The top-level message sent by the client for the Extract method.

Name | Type | Description
audio | phonexia.grpc.common.Audio | Audio to extract the voiceprints from. If the audio is in a raw format and config.speech_length is set, the result can be returned before the whole audio has been transferred, as soon as the speech length requirement is met. There is no minimum audio length limit.
config | ExtractConfig | Voiceprint extraction configuration.
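
Because Extract accepts a stream of ExtractRequest messages, audio is typically sent in pieces. The sketch below covers only the chunking step on plain bytes. How each chunk is wrapped into an ExtractRequest depends on the generated stubs and on the fields of phonexia.grpc.common.Audio, which this file does not describe. iter_audio_chunks and the 32 KiB default are assumptions made for illustration.

```python
from typing import Iterator


def iter_audio_chunks(data: bytes, chunk_size: int = 32 * 1024) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes long.

    Each yielded chunk would be placed into one ExtractRequest by the caller
    (the wrapping step is intentionally left out of this sketch).
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# Stand-in for audio bytes read from a file.
audio_bytes = b"abcdefghij"
for chunk in iter_audio_chunks(audio_bytes, chunk_size=4):
    print(len(chunk))
```

Using a generator keeps memory use flat and lets the caller stop sending early, which fits the case where a result arrives before the whole audio has been transferred.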

ExtractResponse

The top-level message returned to the client by the Extract method. It contains the result as zero or more ExtractResult messages.

Name | Type | Description
result | ExtractResult | Result containing the extracted voiceprint.
processed_audio_length | google.protobuf.Duration | When available, total length of the processed audio. Set only if this is the last response in the stream.

ExtractResult

A voiceprint extraction result.

Name | Type | Description
speech_length | google.protobuf.Duration | Speech length from which the voiceprint was extracted.
voiceprint | phonexia.grpc.common.Voiceprint | Extracted voiceprint is always included in the result.
vector_voiceprint | phonexia.grpc.common.Vector | Vector voiceprint optimized for vector databases. Included only if enable_vector_voiceprint is set to true in ExtractConfig. Use cosine distance to compare vector voiceprints in vector databases. Note: Will be empty if the input audio contains no speech.

MergeRequest

The message sent by the client for the Merge method.

Name | Type | Description
voiceprints | repeated phonexia.grpc.common.Voiceprint | List of voiceprints to be merged.

MergeResponse

The top-level message returned to the client by the Merge method. It contains the merged voiceprint.

Name | Type | Description
speech_length | google.protobuf.Duration | Speech length of the merged voiceprint. It is the sum of the speech lengths of all input voiceprints.
voiceprint | phonexia.grpc.common.Voiceprint | Merged voiceprint.


Services

VoiceprintComparison

Service that implements voiceprint comparison.

Compare

Method | Compare
Request | CompareRequest stream
Response | CompareResponse stream
Description | Performs synchronous comparison of two voiceprint lists. Each voiceprint from one list is compared with each voiceprint from the other list. Returns a message containing a matrix of comparison scores (the results of the individual voiceprint-to-voiceprint comparisons).

VoiceprintConversion

Service that implements voiceprint conversion to vector database format.

Convert

Method | Convert
Request | ConvertRequest stream
Response | ConvertResponse stream
Description | Converts a voiceprint to a vector optimized for vector database usage. This enables efficient similarity search and storage in vector database systems.

VoiceprintExtraction

Service that implements voiceprint extraction.

Extract

Method | Extract
Request | ExtractRequest stream
Response | ExtractResponse
Description | Performs synchronous voiceprint extraction from audio. Returns the result after the audio has been completely sent and processed.

VoiceprintMerging

Service that implements voiceprint merging.

Merge

Method | Merge
Request | MergeRequest
Response | MergeResponse
Description | Merges a list of voiceprints into a single voiceprint. The new voiceprint is a weighted arithmetic mean of all voiceprints in the list, with each voiceprint weighted by its speech length. This method is intended for merging voiceprints of a single speaker, not for merging voiceprints from multiple different speakers. Note that the resulting voiceprint is very similar to, but not the same as, a voiceprint extracted from a file created by concatenating all individual audio files into one.
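
The weighting described for Merge can be illustrated on plain float vectors. This is only a sketch of the arithmetic: real voiceprints are opaque messages that the service merges itself, and their internal representation is not described in this file. merge_sketch and the sample values are hypothetical.

```python
from typing import List, Sequence, Tuple


def merge_sketch(items: Sequence[Tuple[Sequence[float], float]]) -> Tuple[List[float], float]:
    """Combine (vector, speech_length_seconds) pairs.

    Returns the speech-length-weighted arithmetic mean of the vectors and the
    total speech length (the sum of the input lengths).
    """
    if not items:
        raise ValueError("at least one item is required")
    total = sum(length for _, length in items)
    if total <= 0:
        raise ValueError("total speech length must be positive")
    dim = len(items[0][0])
    merged = [0.0] * dim
    for vec, length in items:
        if len(vec) != dim:
            raise ValueError("all vectors must have the same dimension")
        for k, x in enumerate(vec):
            # Each vector contributes in proportion to its speech length.
            merged[k] += x * length / total
    return merged, total


# 30 s of speech for the first vector, 10 s for the second (made-up values).
merged, total_length = merge_sketch([([1.0, 0.0], 30.0), ([0.0, 1.0], 10.0)])
print(merged, total_length)
```

The longer recording dominates the result, which matches the intent that inputs with more speech carry more weight in the merged voiceprint.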