speaker_identification.proto
path phonexia/grpc/technologies/speaker_identification/v1/speaker_identification.proto
package phonexia.grpc.technologies.speaker_identification.v1
Phonexia Speaker Identification gRPC API.
Messages
CompareRequest
The message sent by the client for the Compare method.
| Name | Type | Description |
|---|---|---|
| voiceprints_a | repeated phonexia.grpc.common.Voiceprint | First list of voiceprints to be compared. |
| voiceprints_b | repeated phonexia.grpc.common.Voiceprint | Second list of voiceprints to be compared. |
CompareResponse
The top-level message returned to the client by the Compare
method. It contains a comparison matrix of log-likelihood ratio (LLR) scores.
The comparison matrix contains similarity scores between voiceprints_a and voiceprints_b. The element at row i and column j is the result of comparing voiceprints_a[i] with voiceprints_b[j].
| Name | Type | Description |
|---|---|---|
| scores | phonexia.grpc.common.Matrix | The similarity scores are expressed as log-likelihood ratio (LLR) values, which fall within the interval of |
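The row and column convention described above can be illustrated with a short sketch. It assumes the scores have already been unpacked into a plain nested list (`scores[i][j]`); the actual layout of `phonexia.grpc.common.Matrix` is not shown in this section, and `best_matches` and the example values are hypothetical.

```python
from typing import List, Tuple


def best_matches(scores: List[List[float]]) -> List[Tuple[int, float]]:
    """For each row i (voiceprints_a[i]), return (j, score) of the
    highest-scoring column j (voiceprints_b[j])."""
    result = []
    for row in scores:
        if not row:
            raise ValueError("each row needs at least one score")
        # Index of the largest LLR score in this row.
        j = max(range(len(row)), key=row.__getitem__)
        result.append((j, row[j]))
    return result


# Hypothetical LLR scores: 2 voiceprints in list A, 3 in list B.
example_scores = [
    [-4.2, 3.1, -0.5],
    [1.7, -2.0, 0.9],
]
matches = best_matches(example_scores)
```

Here `matches[0]` describes the best candidate in voiceprints_b for voiceprints_a[0], and so on for each row.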
ConvertRequest
The message sent by the client for the Convert method.
| Name | Type | Description |
|---|---|---|
| voiceprints | repeated phonexia.grpc.common.Voiceprint | Voiceprints to be converted to vector database format. |
ConvertResponse
The top-level message returned to the client by the Convert
method. It contains the converted vectors.
| Name | Type | Description |
|---|---|---|
| vector_voiceprints | repeated phonexia.grpc.common.Vector | Voiceprints converted to vector format for vector databases. Use cosine distance to compare vector voiceprints in vector databases. Note: Individual vectors will be empty for input voiceprints that contain no speech. |
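As an illustration of the cosine distance mentioned above, here is a minimal sketch in plain Python. It assumes the vector values have been read into ordinary float sequences; `cosine_distance` is a hypothetical helper, not part of the API. In practice, a vector database usually computes this metric itself. The helper returns `None` for empty vectors, which the table above says can occur when a voiceprint contains no speech.

```python
import math
from typing import Optional, Sequence


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> Optional[float]:
    """Return 1 - cosine similarity, or None if either vector is empty
    or has zero length (for example, a voiceprint with no speech)."""
    if not u or not v:
        return None
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return None
    return 1.0 - dot / (norm_u * norm_v)
```

A distance near 0 means the vectors point in nearly the same direction, and larger values mean they are less similar.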
ExtractConfig
| Name | Type | Description |
|---|---|---|
| speech_length | google.protobuf.Duration | Specifies the maximum speech length from which the voiceprint will be extracted. If there is less speech in the audio than the specified duration, the voiceprint will be extracted from the entire audio. |
| enable_vector_voiceprint | bool | If set to true, the extraction result will contain both voiceprint and vector_voiceprint fields. If false (default), only voiceprint is returned. |
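The effect of the speech_length limit can be pictured as a simple cap. The sketch below models only that arithmetic with `datetime.timedelta`; `effective_speech_length` is a hypothetical helper, and the service's actual behavior, including what happens when the field is unset, is not specified in this section.

```python
from datetime import timedelta


def effective_speech_length(available: timedelta, limit: timedelta) -> timedelta:
    """Speech length used for extraction: the configured limit, or all
    available speech if the audio contains less than the limit."""
    return min(available, limit)


# 12 s of speech with a 30 s limit: all of the speech is used.
short_audio = effective_speech_length(timedelta(seconds=12), timedelta(seconds=30))
# 95 s of speech with a 30 s limit: only 30 s is used.
long_audio = effective_speech_length(timedelta(seconds=95), timedelta(seconds=30))
```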
ExtractRequest
The top-level message sent by the client for the Extract method.
| Name | Type | Description |
|---|---|---|
| audio | phonexia.grpc.common.Audio | Audio to extract the voiceprints from. If the audio is in a raw format and the |
| config | ExtractConfig | Voiceprint extraction configuration. |
ExtractResponse
The top-level message returned to the client by the Extract
method. It contains the result as zero or more ExtractResult
messages.
| Name | Type | Description |
|---|---|---|
| result | ExtractResult | Result containing the extracted voiceprint. |
| processed_audio_length | google.protobuf.Duration | When available, total length of the processed audio. Set only if this is the last response in the stream. |
ExtractResult
A voiceprint extraction result.
| Name | Type | Description |
|---|---|---|
| speech_length | google.protobuf.Duration | Speech length from which the voiceprint was extracted. |
| voiceprint | phonexia.grpc.common.Voiceprint | Extracted voiceprint. Always included in the result. |
| vector_voiceprint | phonexia.grpc.common.Vector | Vector voiceprint optimized for vector databases. Included only if enable_vector_voiceprint is set to true in ExtractConfig. Use cosine distance to compare vector voiceprints in vector databases. Note: Will be empty if the input audio contains no speech. |
MergeRequest
The message sent by the client for the Merge method.
| Name | Type | Description |
|---|---|---|
| voiceprints | repeated phonexia.grpc.common.Voiceprint | List of voiceprints to be merged. |
MergeResponse
The top-level message returned to the client by the Merge
method. It contains the merged voiceprint.
| Name | Type | Description |
|---|---|---|
| speech_length | google.protobuf.Duration | Speech length of the merged voiceprint. It is the sum of the speech lengths of all input voiceprints. |
| voiceprint | phonexia.grpc.common.Voiceprint | Merged voiceprint. |
Services
VoiceprintComparison
Service that implements voiceprint comparison.
Compare
| Method | Compare |
|---|---|
| Request | CompareRequest stream |
| Response | CompareResponse stream |
| Description | Performs synchronous comparison of two voiceprint lists. Each voiceprint from one list is compared with each voiceprint from the other list. Returns a message containing a matrix of comparison scores (the results of the individual voiceprint-to-voiceprint comparisons). |
VoiceprintConversion
Service that implements voiceprint conversion to vector database format.
Convert
| Method | Convert |
|---|---|
| Request | ConvertRequest stream |
| Response | ConvertResponse stream |
| Description | Converts a voiceprint to a vector optimized for vector database usage. This enables efficient similarity search and storage in vector database systems. |
VoiceprintExtraction
Service that implements voiceprint extraction.
Extract
| Method | Extract |
|---|---|
| Request | ExtractRequest stream |
| Response | ExtractResponse |
| Description | Performs synchronous voiceprint extraction from audio. Returns the result after the audio has been completely sent and processed. |
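Because the request side of Extract is a stream, a client would typically send the audio in pieces. The sketch below shows only the chunking step, as a generator over raw bytes. Wrapping each chunk in an ExtractRequest is omitted, because the generated Python classes and the fields of `phonexia.grpc.common.Audio` are not shown in this section. `iter_chunks` and the chunk size are hypothetical.

```python
from typing import Iterator


def iter_chunks(data: bytes, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield consecutive slices of `data`, each at most `chunk_size` bytes."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


# Each yielded chunk would become the payload of one streamed request.
chunks = list(iter_chunks(b"abcdefg", chunk_size=3))
```

A generator like this can be passed to a client-streaming gRPC call once each chunk has been mapped to a request message.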
VoiceprintMerging
Service that implements voiceprint merging.
Merge
| Method | Merge |
|---|---|
| Request | MergeRequest |
| Response | MergeResponse |
| Description | Merges a list of voiceprints into a single voiceprint. The new voiceprint is a combination of all voiceprints in the list, computed as a weighted arithmetic mean in which each voiceprint is weighted by its speech length. This method is intended for merging voiceprints of a single speaker, not for merging voiceprints from multiple different speakers. Note that the resulting voiceprint is very similar, but not identical, to a voiceprint extracted from a file created by concatenating all the individual audio files. |
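The weighting described above can be sketched on plain float vectors. This illustrates the arithmetic only: real voiceprints are opaque messages and the merge is performed by the service. `merge_embeddings` and the stand-in vectors are hypothetical.

```python
from datetime import timedelta
from typing import List, Sequence, Tuple


def merge_embeddings(
    items: Sequence[Tuple[Sequence[float], timedelta]],
) -> Tuple[List[float], timedelta]:
    """Weighted arithmetic mean of vectors, each weighted by its speech
    length. Returns the merged vector and the summed speech length."""
    if not items:
        raise ValueError("at least one item is required")
    total = sum((length for _, length in items), timedelta())
    total_seconds = total.total_seconds()
    if total_seconds <= 0:
        raise ValueError("total speech length must be positive")
    dim = len(items[0][0])
    merged = [0.0] * dim
    for vector, length in items:
        if len(vector) != dim:
            raise ValueError("all vectors must have the same dimension")
        weight = length.total_seconds() / total_seconds
        for k, value in enumerate(vector):
            merged[k] += weight * value
    return merged, total


# Two stand-in vectors with 10 s and 30 s of speech.
merged, total = merge_embeddings([
    ([1.0, 0.0], timedelta(seconds=10)),
    ([0.0, 1.0], timedelta(seconds=30)),
])
```

The input with more speech pulls the result toward itself, and the returned total mirrors the summed speech_length reported in MergeResponse.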