Sound Buddy Principles | Shanghai Rave Index

Short answer

Technically credible

The page uses standard browser APIs to request microphone input, route it through a Web Audio graph, and read real-time frequency/time-domain data. This is normal Web Audio territory, not speculative AI.

Scientifically grounded

Music information retrieval has long used timbral, rhythmic, spectral, temporal, and tonal descriptors for classification, similarity, tempo, and annotation tasks. Sound Buddy uses a small, transparent subset of that family of ideas.

Not final truth

The project should not claim exact subgenre identification. It should use hedged terms such as "leans", "likeness", "candidate", and "learning cue", because genre classification is not purely objective and the page does not use a trained reference dataset.

The pipeline

1. Permission

The browser asks for microphone access through `getUserMedia`. If the user denies permission, Sound Buddy cannot listen live and falls back to practice mode.

Basis: W3C Media Capture defines APIs for requesting access to microphones and cameras; MDN documents explicit user permission requirements.
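A minimal sketch of how a rejected `getUserMedia` call might be turned into a page state. The error names (`NotAllowedError`, `NotFoundError`, `NotReadableError`) are standard rejection names for this API; the `ListenState` type, the function name, and the messages are hypothetical and not taken from Sound Buddy's code.

```typescript
// Hypothetical page states; not the app's actual state names.
type ListenState =
  | { mode: "live" }
  | { mode: "practice"; note: string };

// Map the `name` of a getUserMedia rejection to a calm fallback state.
// A denial is treated as an ordinary outcome, never something to work around.
function stateForMicError(errorName: string): ListenState {
  switch (errorName) {
    case "NotAllowedError": // the user or a policy denied microphone access
      return { mode: "practice", note: "Microphone not allowed. Practice mode is on." };
    case "NotFoundError": // no audio input device is available
      return { mode: "practice", note: "No microphone found. Practice mode is on." };
    case "NotReadableError": // the device is busy or failed at the hardware level
      return { mode: "practice", note: "Microphone is busy. Practice mode is on." };
    default:
      return { mode: "practice", note: "Live listening unavailable. Practice mode is on." };
  }
}
```

In a browser this would typically be called from the rejection handler of `navigator.mediaDevices.getUserMedia({ audio: true })`, passing the caught error's `name`.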

2. Local analysis

The microphone stream is connected to an `AnalyserNode`. The analyzer exposes time-domain and frequency-domain data without needing to save or upload audio.

Basis: Web Audio defines an audio routing graph; `AnalyserNode` is designed for real-time frequency and time-domain analysis.

3. Feature extraction

The code reduces raw bins into broad bands: sub, kick/bass, low-mid body, mids, top, and air. It also estimates balance, brightness, pulse stability, and rough BPM.

Basis: MIR systems commonly describe audio with spectral, temporal, rhythmic, and high-level descriptors.
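The band reduction could look roughly like the sketch below. The band edges in Hz are assumptions chosen for illustration; the page does not publish its real boundaries, and `bandLevels` is a hypothetical name.

```typescript
// Hypothetical band edges in Hz; the app's real boundaries are not stated.
const BAND_EDGES: { name: string; lo: number; hi: number }[] = [
  { name: "sub", lo: 20, hi: 60 },
  { name: "kickBass", lo: 60, hi: 150 },
  { name: "lowMid", lo: 150, hi: 500 },
  { name: "mids", lo: 500, hi: 2000 },
  { name: "top", lo: 2000, hi: 8000 },
  { name: "air", lo: 8000, hi: 20000 },
];

// Average the byte magnitudes (0..255) of the bins inside each band.
// `bins` is the array that getByteFrequencyData fills; bin i is assumed
// to sit at i * sampleRate / fftSize Hz.
function bandLevels(bins: Uint8Array, sampleRate: number): { [name: string]: number } {
  const fftSize = bins.length * 2; // frequencyBinCount is fftSize / 2
  const hzPerBin = sampleRate / fftSize;
  const levels: { [name: string]: number } = {};
  for (let b = 0; b < BAND_EDGES.length; b++) {
    const band = BAND_EDGES[b];
    let sum = 0;
    let count = 0;
    for (let i = 0; i < bins.length; i++) {
      const hz = i * hzPerBin;
      if (hz >= band.lo && hz < band.hi) {
        sum += bins[i];
        count++;
      }
    }
    // Normalise to 0..1; a band with no bins reports 0.
    levels[band.name] = count > 0 ? sum / count / 255 : 0;
  }
  return levels;
}
```

Averaging rather than summing keeps wide bands such as "air" from dominating simply because they contain more bins.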

4. Feedback layer

The app maps features to prompts: "low end is leading", "top texture opened", "pulse is steady", "genre mix reads breaks plus electro plus techno".

Basis: the translation from features to advice is editorial and educational; it is deliberately framed as guidance, not identification.
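As an illustration of that editorial mapping, a rule table can be as small as the sketch below. The `Features` shape and every threshold are hypothetical; only the prompt strings come from the text above.

```typescript
// Hypothetical feature shape and thresholds, for illustration only.
interface Features {
  lowEndWeight: number; // 0..1 share of energy in sub + kick/bass
  brightness: number; // 0..1 share of energy in top + air
  pulseStability: number; // 0..1, higher means a steadier pulse
}

// Return every prompt whose condition holds for the current features.
function promptsFor(f: Features): string[] {
  const prompts: string[] = [];
  if (f.lowEndWeight > 0.5) prompts.push("low end is leading");
  if (f.brightness > 0.3) prompts.push("top texture opened");
  if (f.pulseStability > 0.7) prompts.push("pulse is steady");
  return prompts;
}
```

Keeping the rules as plain comparisons makes the guidance easy to audit, which suits a layer that is framed as advice rather than identification.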

What is theoretically strong

1. Real-time frequency data

The browser can supply frequency-domain values from live audio. MDN documents that `getByteFrequencyData()` copies the current frequency data into a `Uint8Array`, with bins spread linearly from 0 Hz to half the sample rate. That supports meters, spectral balance, and texture prompts.
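To make the bin layout concrete, here is a small sketch that converts between bin indices and frequencies. It assumes the usual FFT spacing of `sampleRate / fftSize` Hz per bin; the helper names are hypothetical.

```typescript
// Frequency in Hz at the start of bin `index`, assuming linear FFT bin spacing.
function binToHz(index: number, sampleRate: number, fftSize: number): number {
  return (index * sampleRate) / fftSize;
}

// Nearest bin index for a frequency, clamped to the valid range 0..fftSize/2 - 1.
function hzToBin(hz: number, sampleRate: number, fftSize: number): number {
  const binCount = fftSize / 2; // matches AnalyserNode.frequencyBinCount
  const raw = Math.round((hz * fftSize) / sampleRate);
  return Math.max(0, Math.min(binCount - 1, raw));
}
```

With a 48000 Hz sample rate and an `fftSize` of 2048, each bin is about 23.4 Hz wide, so the sub and kick region is covered by only a handful of bins. That coarse low-frequency resolution is one reason to keep low-end comments broad.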

2. Spectral balance

Audio analysis libraries commonly treat each frame of a magnitude spectrum as a distribution over frequency bins. Spectral centroid is one standard example: it estimates where spectral energy is centered. Sound Buddy uses simpler band ratios and brightness, but the concept is aligned.
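For reference, a spectral centroid over one analyser frame can be sketched as below. Note the assumption: `getByteFrequencyData()` returns decibel-scaled bytes rather than linear magnitudes, so this gives a rough brightness indicator and will not match a library centroid computed on a linear spectrum. The function name is hypothetical.

```typescript
// Spectral centroid in Hz: the value-weighted mean frequency of one frame.
// Returns 0 for a silent frame to avoid dividing by zero.
function spectralCentroidHz(bins: Uint8Array, sampleRate: number): number {
  const hzPerBin = sampleRate / (bins.length * 2); // bins.length is fftSize / 2
  let weighted = 0;
  let total = 0;
  for (let i = 0; i < bins.length; i++) {
    weighted += i * hzPerBin * bins[i];
    total += bins[i];
  }
  return total > 0 ? weighted / total : 0;
}
```

A frame whose energy sits equally at 2 kHz and 4 kHz yields a centroid of 3 kHz; adding hats or hiss pulls the value upward.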

3. Rhythm and tempo cues

Beat tracking research commonly works from onset strength, tempo estimation, and beat peak selection. Sound Buddy does a lighter, browser-friendly pulse estimate. It is enough for "rough BPM / stable pulse" comments, not enough for definitive beat tracking.
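One lightweight way to get a rough BPM is to autocorrelate a low-band energy envelope and pick the best-repeating lag. This is an illustrative technique, not a description of Sound Buddy's actual estimator; the function name, the default 70 to 180 BPM search range, and the envelope frame rate are all assumptions.

```typescript
// Estimate tempo from an energy envelope sampled `framesPerSecond` times a second.
// Autocorrelation picks the lag, within a BPM range, at which the envelope best repeats.
function roughBpm(
  envelope: number[],
  framesPerSecond: number,
  minBpm: number = 70,
  maxBpm: number = 180
): number | null {
  const minLag = Math.max(1, Math.round((60 * framesPerSecond) / maxBpm));
  const maxLag = Math.round((60 * framesPerSecond) / minBpm);
  if (envelope.length < maxLag * 2) return null; // too little audio to judge

  // Remove the mean so a constant level does not look periodic.
  let mean = 0;
  for (let i = 0; i < envelope.length; i++) mean += envelope[i];
  mean /= envelope.length;

  let bestLag = 0;
  let bestScore = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let score = 0;
    for (let i = 0; i + lag < envelope.length; i++) {
      score += (envelope[i] - mean) * (envelope[i + lag] - mean);
    }
    score /= envelope.length - lag; // normalise by the number of overlapping terms
    if (score > bestScore) {
      bestScore = score;
      bestLag = lag;
    }
  }
  // No lag with positive correlation means no usable pulse was found.
  return bestLag > 0 ? (60 * framesPerSecond) / bestLag : null;
}
```

Restricting the lag range sidesteps some half-time and double-time confusion but does not remove it, which is one reason the result should be presented as rough.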

4. Genre-relevant features

Tzanetakis and Cook's classic genre-classification work proposed timbral texture, rhythmic content, and pitch content feature sets. Electronic styles often differ strongly in rhythm, low-end weight, and timbre, so those cues are useful for education.

What is not strong enough to overclaim

No track ID

The project does not fingerprint audio, compare against a catalog, or recognize artists. It cannot tell you the track title.

No authoritative subgenre ruling

"Hard techno 71%" or "UKG / garage 56%" means similarity to a cue profile inside this app. It is not a database-backed ground truth label.
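To show what a percentage of this kind can mean, the sketch below scores a feature vector against a cue profile using cosine similarity. The page does not state which similarity measure it uses, so cosine similarity, the function name, and the vector layout are assumptions made purely for illustration.

```typescript
// Cosine similarity between a measured feature vector and a cue profile, as 0..100.
// Both vectors must list the same features in the same order.
function profileLikeness(features: number[], profile: number[]): number {
  if (features.length !== profile.length) {
    throw new Error("feature and profile vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < features.length; i++) {
    dot += features[i] * profile[i];
    normA += features[i] * features[i];
    normB += profile[i] * profile[i];
  }
  if (normA === 0 || normB === 0) return 0; // silence or an empty profile: no likeness
  const cosine = dot / Math.sqrt(normA * normB);
  return Math.round(Math.max(0, cosine) * 100); // negative similarity is shown as 0
}
```

A score produced this way only says how closely the current features point in the same direction as a hand-written profile. Changing the profile changes the number, which is why it cannot be read as a ground-truth label.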

No room correction

Phone speakers, laptop mics, club PA bleed, distance from the source, echo, and browser audio processing can shift the spectrum. The advice is best used as a listening aid.

Evidence map

Input level

Measures whether the microphone signal is loud enough to analyze. Below a threshold, the page waits.

Supported by standard time-domain audio analysis; useful for avoiding false notes on silence.
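A level gate of this kind can be sketched with an RMS over one time-domain frame. The byte format (values 0 to 255 with 128 as silence) is what `getByteTimeDomainData()` produces; the threshold value and function names are hypothetical.

```typescript
// RMS level (0..1) of a time-domain frame as filled by getByteTimeDomainData,
// where 128 represents silence and values span 0..255.
function rmsLevel(samples: Uint8Array): number {
  if (samples.length === 0) return 0;
  let sumSquares = 0;
  for (let i = 0; i < samples.length; i++) {
    const centred = (samples[i] - 128) / 128; // roughly -1..1
    sumSquares += centred * centred;
  }
  return Math.sqrt(sumSquares / samples.length);
}

// Hypothetical gate threshold; the app's real value is a product heuristic.
const SILENCE_THRESHOLD = 0.02;

// True when the frame is loud enough to be worth analysing.
function isLoudEnough(samples: Uint8Array): boolean {
  return rmsLevel(samples) >= SILENCE_THRESHOLD;
}
```

Gating on RMS rather than peak value makes the check less sensitive to a single click or pop in an otherwise silent room.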

Low-end weight

Compares the sub and bass bands against the total spectrum. This helps teach kick pressure, bass tail, and physical weight.

Reasonable for techno education; vulnerable to playback system and microphone response.
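Expressed as a ratio, the low-end weight might be computed as in this sketch. The `BandLevels` field names are hypothetical and simply mirror the six bands named earlier on this page.

```typescript
// Hypothetical container for the six band levels (non-negative numbers).
interface BandLevels {
  sub: number;
  kickBass: number;
  lowMid: number;
  mids: number;
  top: number;
  air: number;
}

// Share of total band energy carried by the sub and kick/bass bands (0..1).
function lowEndWeight(bands: BandLevels): number {
  const total =
    bands.sub + bands.kickBass + bands.lowMid + bands.mids + bands.top + bands.air;
  return total > 0 ? (bands.sub + bands.kickBass) / total : 0;
}
```

Because it is a ratio, the value is fairly robust to overall volume but not to frequency response: a phone microphone that rolls off below 100 Hz will under-report low-end weight regardless of what the room sounds like.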

Top texture

Compares the top and air bands. This helps identify hats, metallic noise, hiss, resonance, and perceived intensity.

Grounded in spectral feature thinking; not a direct instrument detector.

Pulse / BPM

Looks for repeated low-end movement and estimates a rough pulse. It can say "fast pulse" or "steady loop".

Beat tracking is a serious MIR task; this browser heuristic is intentionally lighter than academic beat trackers.

Genre mix

Ranks broad lanes such as techno, electro, house, breaks, acid, trance, garage/UKG, jungle/DNB, bass/dubstep, downtempo/ambient, hard dance/hardcore, and industrial by feature similarity over a recent audio-stream window, not one isolated frame.

The current window is about 12 seconds: long enough to cover a short groove phrase at common club tempos, short enough to react when the arrangement changes.
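The arithmetic behind that window length, plus a simple way to average a feature over it, can be sketched as follows. At 120 BPM a 12-second window holds 24 beats, or 6 bars in 4/4. The `RollingMean` class and its method names are hypothetical, not the app's implementation.

```typescript
// Number of 4/4 bars that fit in a window of `seconds` at a given tempo.
function barsInWindow(bpm: number, seconds: number): number {
  const beats = (bpm / 60) * seconds;
  return beats / 4;
}

// Rolling mean of one feature over the most recent `windowMs` milliseconds.
class RollingMean {
  private samples: { t: number; value: number }[] = [];
  private windowMs: number;

  constructor(windowMs: number) {
    this.windowMs = windowMs;
  }

  // Add a reading taken at time `t` (ms) and drop readings older than the window.
  push(t: number, value: number): void {
    this.samples.push({ t: t, value: value });
    while (this.samples.length > 0 && t - this.samples[0].t > this.windowMs) {
      this.samples.shift();
    }
  }

  // Mean of the readings currently inside the window, or 0 when empty.
  mean(): number {
    if (this.samples.length === 0) return 0;
    let sum = 0;
    for (let i = 0; i < this.samples.length; i++) sum += this.samples[i].value;
    return sum / this.samples.length;
  }
}
```

Averaging over the window means a single loud fill or breakdown shifts the ranking gradually instead of flipping it on one frame.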

Subgenre candidates

Maps the recent stream cues to detailed labels such as hard techno, dub techno, electro funk, UKG, jungle/DNB, dubstep, footwork, psytrance, hardstyle, hardcore/gabber, ambient, and hard trance.

The weakest layer scientifically. It is editorial taxonomy plus signal heuristics, so the UI must keep confidence language cautious.
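One way to keep the wording cautious is to route every score through a single function that can only produce hedged phrases. The thresholds and exact phrasings below are hypothetical; the vocabulary ("leans", "likeness", "candidate") follows the terms this page recommends.

```typescript
// Turn an in-app similarity score (0..100) into hedged wording.
// Thresholds and phrases are hypothetical; none of them assert identification.
function cautiousLabel(style: string, score: number): string {
  if (score >= 70) return "leans " + style;
  if (score >= 45) return style + " likeness";
  if (score >= 25) return "possible " + style + " candidate";
  return "no clear lean";
}
```

Centralising the phrasing makes it harder for a stray UI string to say "this is hard techno" when the underlying signal only supports "leans hard techno".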

Privacy position

Local by design

The current implementation reads browser analyzer arrays and produces comments in the page. It does not need server audio processing.

Permission-bound

Microphone access depends on browser permission. A denial is handled as a normal state, not an error to bypass.

Minimal claim

The safe public claim is: no audio is intentionally hosted, stored, uploaded, or fingerprinted by this page.

Sources

These references support the technical basis and the project's caution around genre naming. They do not certify the app's exact thresholds; those thresholds remain product heuristics.

W3C Web Audio API

Defines the high-level browser audio graph model used for processing and analyzing audio in web applications.

w3.org/TR/webaudio-1.1

MDN AnalyserNode

Documents the browser node that provides real-time frequency and time-domain analysis information.

MDN AnalyserNode

MDN getByteFrequencyData

Documents how frequency data is copied into a byte array and how frequency bins map up to half the sample rate.

MDN frequency data

W3C Media Capture

Defines browser APIs for requesting access to local media devices such as microphones.

w3.org/TR/mediacapture-streams

MDN getUserMedia

Documents the user-permission requirement for opening microphone or camera input.

MDN getUserMedia

W3C Permissions

Defines common infrastructure for permission states around powerful web-platform features.

w3.org/TR/permissions

Tzanetakis and Cook, 2002

Classic IEEE paper on musical genre classification using timbral texture, rhythmic content, and pitch content features.

Musical genre classification of audio signals

Tzanetakis, Essl, and Cook, ISMIR 2001

Early genre-classification work noting perceptual criteria related to texture, instrumentation, and rhythmic structure.

Automatic musical genre classification

Lippens, Martens, De Mulder, and Tzanetakis, 2004

Compares human and automatic genre classification and supports the project's caution that genre labels are inherently subjective.

Human and automatic genre classification

librosa spectral centroid

Documents a standard spectral descriptor that treats a magnitude spectrum as a frequency-bin distribution.

librosa spectral centroid

librosa beat tracking

Documents a beat-tracking pipeline based on onset strength, tempo estimation, and beat peak selection.

librosa beat track

Essentia

Reference MIR library with spectral, temporal, tonal, rhythm, and high-level descriptors for audio analysis.

essentia.upf.edu

TISMIR scope

Shows MIR as an interdisciplinary research field covering rhythm, beat, tempo, timbre, style, genre, and classification.

Transactions of ISMIR

Tempo estimation review

Peer-reviewed discussion of tempo-estimation evaluation, applications, metrics, and datasets.

Music Tempo Estimation: Are We Done Yet?

Operational conclusion

Sound Buddy is viable as an always-on educational listening companion. It should remain transparent, local, and cautious: strong on feature-based feedback, modest on genre certainty, explicit about privacy, and clear that style detection is a learning cue rather than proof.