dictia-public/src/models/user.py
Allison aa269c5bc0 feat(auth): B-2.5 TOTP MFA + recovery codes (Fernet-encrypted secret)
Adds TOTP-based two-factor authentication (RFC 6238) with 10 single-use
recovery codes. Secret is encrypted at rest with a Fernet key derived
deterministically from app SECRET_KEY (SHA-256 -> urlsafe-base64); the raw
base32 secret never lives in the database. Recovery codes are bcrypt-hashed
and consumed atomically (single-use, removed from the JSON list on match).
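
A minimal sketch of the key derivation described above, assuming the SHA-256 digest of SECRET_KEY is urlsafe-base64 encoded directly into a Fernet key. The function names (`derive_fernet_key`, `encrypt_totp_secret`, `decrypt_totp_secret`) are hypothetical and are not taken from src/auth/totp.py. The Fernet calls need the third-party `cryptography` package, so it is imported lazily.

```python
import base64
import hashlib


def derive_fernet_key(secret_key: str) -> bytes:
    """Derive a deterministic Fernet-compatible key from the app SECRET_KEY.

    Fernet expects 32 bytes, urlsafe-base64 encoded. A SHA-256 digest is
    exactly 32 bytes, so the encoded key is always 44 characters long.
    """
    digest = hashlib.sha256(secret_key.encode("utf-8")).digest()
    return base64.urlsafe_b64encode(digest)


def encrypt_totp_secret(secret_key: str, base32_secret: str) -> str:
    """Return the Fernet token to store in totp_secret_encrypted (hypothetical helper)."""
    from cryptography.fernet import Fernet  # third-party, imported lazily

    fernet = Fernet(derive_fernet_key(secret_key))
    return fernet.encrypt(base32_secret.encode("ascii")).decode("ascii")


def decrypt_totp_secret(secret_key: str, token: str) -> str:
    """Inverse of encrypt_totp_secret; Fernet raises InvalidToken on a key mismatch."""
    from cryptography.fernet import Fernet

    fernet = Fernet(derive_fernet_key(secret_key))
    return fernet.decrypt(token.encode("ascii")).decode("ascii")
```

Because the key is a pure function of SECRET_KEY, rotating SECRET_KEY would make every stored token undecryptable under this scheme; that is a property of the sketch, and presumably of the real service as well.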

Routes:
- GET /2fa/setup: generate fresh secret + QR + 10 recovery codes; cache
  pending state in session, render auth/totp_setup.html with inline QR
  data URL and the 10 codes shown ONCE.
- POST /2fa/setup: verify the user-submitted 6-digit code against the
  pending secret; on success persist encrypted secret + hashes and flip
  totp_enabled=True. On an invalid code, re-render the same QR (do not
  rotate the secret) so the user's authenticator scan stays valid.
- GET /2fa/verify: second factor during login; reads pending_totp_user_id
  from session and renders auth/totp_verify.html (TOTP code input +
  collapsed recovery code form, with an "X codes restants" notice, i.e.
  "X codes remaining").
- POST /2fa/verify: accepts EITHER a 6-digit TOTP code OR a recovery code;
  on success finalises login_user (preserving remember-me intent + next
  URL captured at the password step), audits success/failure.
- POST /2fa/disable: requires password re-auth; nullifies the 3 TOTP fields.
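
The 6-digit check performed by POST /2fa/setup and POST /2fa/verify can be sketched with the standard library alone. This is an illustration of RFC 6238 (HMAC-SHA1, 30-second step), not the code in src/auth/totp.py, which may well delegate to a library; `totp_at`, `verify_totp`, and the one-step drift window are assumptions.

```python
import base64
import hashlib
import hmac
import struct
import time


def totp_at(base32_secret: str, unix_time: float, step: int = 30, digits: int = 6) -> str:
    """Compute the RFC 6238 TOTP code for a base32 secret at a given Unix time."""
    key = base64.b32decode(base32_secret, casefold=True)
    counter = int(unix_time // step)
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = mac[-1] & 0x0F
    truncated = struct.unpack(">I", mac[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(truncated % 10 ** digits).zfill(digits)


def verify_totp(base32_secret: str, submitted: str, now=None, window: int = 1) -> bool:
    """Accept the current code plus `window` steps of clock drift either side."""
    # Reject non-digit input before any comparison.
    if not (submitted.isascii() and submitted.isdigit() and len(submitted) == 6):
        return False
    now = time.time() if now is None else now
    return any(
        hmac.compare_digest(totp_at(base32_secret, now + drift * 30), submitted)
        for drift in range(-window, window + 1)
    )
```

With the RFC 6238 test secret `12345678901234567890` (base32-encoded), `totp_at(secret, 59)` yields `287082`, the last six digits of the published eight-digit vector.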

Login gate (src/api/auth.py /login): after the password and
email-verification checks but BEFORE login_user, if user.totp_enabled is
true, set session['pending_totp_user_id'] / pending_totp_remember /
pending_totp_next and 302 -> /2fa/verify. OAuth/SSO/magic-link paths are
intentionally NOT gated in B-2.5 (deferred; the IdP handles its own MFA).
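
The gate's control flow can be sketched without Flask, using a plain dict in place of the session. `UserStub`, `after_password_check`, and the `logged_in_user_id` key are hypothetical stand-ins; the real route calls login_user and returns a redirect response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserStub:
    """Stand-in for the User model; only the fields the gate reads."""
    id: int
    totp_enabled: bool


def after_password_check(user: UserStub, session: dict, remember: bool,
                         next_url: Optional[str]) -> str:
    """Return the redirect target once password + email-verification checks pass."""
    if user.totp_enabled:
        # Park the login intent; the user is NOT logged in yet.
        session["pending_totp_user_id"] = user.id
        session["pending_totp_remember"] = remember
        session["pending_totp_next"] = next_url
        return "/2fa/verify"
    # Stand-in for login_user(user, remember=remember) in the real route.
    session["logged_in_user_id"] = user.id
    return next_url or "/"
```

Keeping remember and next in the session is what lets POST /2fa/verify finish the login with the same intent the user expressed at the password step.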

Schema:
- New JSON column User.totp_recovery_codes (nullable) added via
  add_column_if_not_exists in src/init_db.py (no Alembic, follows existing
  pattern).
- Re-uses B-2.1 columns totp_secret_encrypted (VARCHAR 255) and
  totp_enabled (BOOLEAN); both already migrated.
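
For readers unfamiliar with the no-Alembic pattern, an idempotent column add might look like the following. The real add_column_if_not_exists in src/init_db.py is not shown here, so the signature, the use of sqlite3, and the PRAGMA-based check are all assumptions made for illustration.

```python
import sqlite3


def add_column_if_not_exists(conn: sqlite3.Connection, table: str,
                             column: str, ddl_type: str) -> bool:
    """Add `column` to `table` unless it is already present.

    Returns True when the column was added, False when it already existed.
    Table and column names are interpolated directly, so they must be
    trusted constants, never user input.
    """
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    existing = {row[1] for row in conn.execute(f"PRAGMA table_info({table})")}
    if column in existing:
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True
```

Running the helper at every startup is safe because the second and later calls are no-ops.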

Compatibility audit overrides honoured:
- Service layer at src/auth/totp.py (NOT a new src/auth_extended/ pkg).
- Templates at templates/auth/totp_setup.html and templates/auth/totp_verify.html
  extending marketing/base.html with brand tokens + WCAG patterns
  (focus-visible, role=alert, aria-required, autocomplete=one-time-code,
  inputmode=numeric).
- account.html integration deferred to a polish task — admins access
  /2fa/setup directly for now.

Tests (21, all green via Windows manual driver):
- Service layer: encrypt/decrypt round-trip, key-mismatch rejection, secret
  validity, code verification (current/wrong/non-digit), recovery codes
  (10 pairs, 1:1 bcrypt mapping, single-use consumption, unknown rejection),
  set/disable user TOTP fields.
- Routes: login redirect-to-/2fa/verify when totp_enabled, direct login
  when disabled, /2fa/verify with correct/wrong TOTP, recovery code consume,
  redirect-to-login when no pending session, /2fa/setup GET creates pending,
  POST with valid code enables MFA, POST with invalid code keeps pending +
  returns 400, /2fa/disable wrong/correct password.
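
The recovery-code behaviour those service-layer tests describe (10 pairs, one hash per code, single-use consumption, unknown rejection) can be sketched as below. The commit uses bcrypt; this sketch swaps in salted PBKDF2 from the standard library so that it runs without third-party packages, and every function name is hypothetical.

```python
import hashlib
import hmac
import secrets


def _hash_code(code: str, salt: bytes) -> str:
    """Stand-in for bcrypt: salted PBKDF2-HMAC-SHA256, stored as 'salt$hash' hex."""
    derived = hashlib.pbkdf2_hmac("sha256", code.encode("utf-8"), salt, 10_000)
    return salt.hex() + "$" + derived.hex()


def generate_recovery_codes(count: int = 10):
    """Return (plaintext codes to show ONCE, hashes to store in the JSON column)."""
    codes = [secrets.token_hex(5) for _ in range(count)]
    hashes = [_hash_code(code, secrets.token_bytes(16)) for code in codes]
    return codes, hashes


def consume_recovery_code(stored_hashes, submitted: str):
    """Return (matched, remaining_hashes); a matched hash is removed from the list."""
    for index, stored in enumerate(stored_hashes):
        salt_hex, _ = stored.split("$", 1)
        candidate = _hash_code(submitted, bytes.fromhex(salt_hex))
        if hmac.compare_digest(candidate, stored):
            return True, stored_hashes[:index] + stored_hashes[index + 1:]
    return False, list(stored_hashes)
```

The caller would need to persist the returned list in the same transaction as the login for the consumption to be atomic; that part depends on the database layer and is not shown.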

Regression check: prior 21 OAuth+magic-link, 16 email-service, and 9
signup-Loi-25 tests all still pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-28 00:08:40 -04:00

"""
User and Speaker database models.
This module defines the User model for authentication and user profiles,
and the Speaker model for tracking speaker profiles used in diarization.
"""
from datetime import datetime
from flask_login import UserMixin
from src.database import db
# ConsentLog backref defined in src/models/consent.py — accessible as User.consent_logs
class User(db.Model, UserMixin):
    """User model: authentication, profile, MFA enrollment, and subscription state.

    Post-B-2.1 columns include MFA (totp_secret_encrypted, totp_enabled,
    webauthn_credentials), Stripe billing (stripe_customer_id, subscription_status),
    and ordre professionnel context (ordre_pro, cabinet) used at signup (B-2.2).
    Consent audit trail in src/models/consent.py via User.consent_logs backref.
    """
    id = db.Column(db.Integer, primary_key=True)
    username = db.Column(db.String(20), unique=True, nullable=False)
    email = db.Column(db.String(120), unique=True, nullable=False)
    password = db.Column(db.String(60), nullable=True)
    sso_provider = db.Column(db.String(100), nullable=True)
    sso_subject = db.Column(db.String(255), unique=True, nullable=True)
    is_admin = db.Column(db.Boolean, default=False)
    can_share_publicly = db.Column(db.Boolean, default=True)  # Permission to create public share links
    recordings = db.relationship('Recording', backref='owner', lazy=True)
    transcription_language = db.Column(db.String(10), nullable=True)  # For ISO 639-1 codes
    output_language = db.Column(db.String(50), nullable=True)  # For full language names like "Spanish"
    ui_language = db.Column(db.String(10), nullable=True, default='en')  # For UI language preference (en, es, fr, zh)
    summary_prompt = db.Column(db.Text, nullable=True)
    extract_events = db.Column(db.Boolean, default=False)  # Enable event extraction from transcripts
    name = db.Column(db.String(100), nullable=True)
    job_title = db.Column(db.String(100), nullable=True)
    company = db.Column(db.String(100), nullable=True)
    diarize = db.Column(db.Boolean, default=False)

    # Default naming template for title generation
    default_naming_template_id = db.Column(db.Integer, db.ForeignKey('naming_template.id', ondelete='SET NULL'), nullable=True)
    default_naming_template = db.relationship('NamingTemplate', foreign_keys=[default_naming_template_id])

    # Token budget (None = unlimited)
    monthly_token_budget = db.Column(db.Integer, nullable=True)
    # Transcription budget in seconds (None = unlimited)
    monthly_transcription_budget = db.Column(db.Integer, nullable=True)

    # Email verification fields
    email_verified = db.Column(db.Boolean, default=False)
    email_verification_token = db.Column(db.String(200), nullable=True, index=True)
    email_verification_sent_at = db.Column(db.DateTime, nullable=True)

    # Password reset fields
    password_reset_token = db.Column(db.String(200), nullable=True, index=True)
    password_reset_sent_at = db.Column(db.DateTime, nullable=True)

    # Auto speaker labelling settings
    auto_speaker_labelling = db.Column(db.Boolean, default=False)  # Enable auto-labelling when voice confidence exceeds threshold
    auto_speaker_labelling_threshold = db.Column(db.String(10), nullable=True, default='medium')  # 'low', 'medium', 'high'

    # Auto summarization setting (user can disable if admin hasn't globally disabled)
    auto_summarization = db.Column(db.Boolean, default=True)

    # Transcription hints (hotwords and initial prompt for improving ASR accuracy)
    transcription_hotwords = db.Column(db.Text, nullable=True)
    transcription_initial_prompt = db.Column(db.Text, nullable=True)

    # === B-2.1: MFA / WebAuthn / Stripe / Loi 25 fields (Phase 2 backend) ===
    # B-2.5 service layer encrypts the base32 secret with SECRET_KEY before storing.
    # The encrypted blob (Fernet token) is what lives in this column. NEVER assign a
    # raw base32 secret to this attribute; use the service-layer setter.
    totp_secret_encrypted = db.Column(db.String(255), nullable=True)
    totp_enabled = db.Column(db.Boolean, default=False, nullable=False)
    # WebAuthn / Passkey credentials (B-2.6): list of credential dicts:
    # [{'id': str, 'public_key': str, 'sign_count': int, 'transports': list[str]}]
    webauthn_credentials = db.Column(db.JSON, nullable=True)
    # B-2.5: 10 single-use recovery codes (bcrypt-hashed). Cleared when MFA disabled.
    totp_recovery_codes = db.Column(db.JSON, nullable=True)

    # Loi 25 + ordre professionnel context (used at signup B-2.2)
    ordre_pro = db.Column(db.String(50), nullable=True)  # 'barreau', 'cpa', 'chad', etc.
    cabinet = db.Column(db.String(255), nullable=True)

    # Stripe billing (B-2.7 / B-2.8)
    stripe_customer_id = db.Column(db.String(120), nullable=True, index=True)
    # 'trialing' | 'active' | 'past_due' | 'canceled' | 'incomplete' | None
    subscription_status = db.Column(db.String(20), nullable=True, index=True)

    def __repr__(self):
        return f"User('{self.username}', '{self.email}')"


class Speaker(db.Model):
    """Speaker model for tracking voice profiles used in diarization."""
    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(100), nullable=False)
    user_id = db.Column(db.Integer, db.ForeignKey('user.id'), nullable=False)
    created_at = db.Column(db.DateTime, default=datetime.utcnow)
    last_used = db.Column(db.DateTime, default=datetime.utcnow)
    use_count = db.Column(db.Integer, default=1)

    # Voice embedding fields (256 dimensions from WhisperX)
    average_embedding = db.Column(db.LargeBinary, nullable=True)  # Binary numpy array (256 × 4 bytes = 1024 bytes)
    embeddings_history = db.Column(db.JSON, nullable=True)  # List of metadata: [{recording_id, timestamp, similarity}, ...]
    embedding_count = db.Column(db.Integer, default=0)  # Number of embeddings collected
    confidence_score = db.Column(db.Float, nullable=True)  # 0-1 score based on embedding consistency

    # Relationship to user
    user = db.relationship('User', backref=db.backref('speakers', lazy=True, cascade='all, delete-orphan'))

    def to_dict(self):
        """Convert model to dictionary representation."""
        return {
            'id': self.id,
            'name': self.name,
            'created_at': self.created_at,
            'last_used': self.last_used,
            'use_count': self.use_count,
            'embedding_count': self.embedding_count,
            'confidence_score': self.confidence_score
        }