MVAT Mirror — Technical White Paper

1. Abstract

MVAT Mirror is a mobile application that builds personality profiles from communication patterns without requiring users to answer a single questionnaire item. By analyzing structural properties of writing (sentence length, question frequency, vocabulary diversity, response timing, and other statistical features), Mirror produces Big Five (OCEAN) personality scores with quantified confidence intervals.

All analysis runs on-device. Raw message content is processed in-memory, converted into numerical feature vectors, and immediately discarded. Only derived personality scores are stored or synced. This architecture makes it structurally impossible for Mirror to read, store, or transmit the content of user communications.

This paper describes the scientific foundations underpinning Mirror's analysis, the feature extraction pipeline, the confidence scoring model, the privacy architecture, and the supported personality frameworks.

2. The Problem with Personality Questionnaires

Traditional personality assessment relies on self-report questionnaires — instruments like the NEO-PI-R, the BFI-44, or the MBTI Form M. While these instruments have significant empirical backing, they share structural limitations that reduce their practical utility for most people:

Social desirability bias. Respondents systematically answer in ways they perceive as socially favorable. Studies show this effect accounts for 10-25% of variance in self-report measures (Paulhus, 1991).
Reference group effects. "I am organized" is answered relative to an internal reference group that varies by culture, profession, and social context. Two equally organized individuals may give different answers.
Temporal instability. Questionnaire results fluctuate with mood, context, and recency effects. Test-retest reliability for short-form instruments typically ranges from r = 0.70 to 0.85 (Gosling et al., 2003).
Completion burden. The gold-standard NEO-PI-R contains 240 items. Even short-form instruments require 10-15 minutes of focused attention. Most people never complete one outside of an academic or clinical setting.
One-time snapshots. Questionnaires capture a single moment. They cannot track personality expression changes over weeks, months, or contexts.

Computer-based behavioral analysis addresses each of these limitations. Writing style is difficult to consciously manipulate, is measured against absolute scales rather than subjective reference groups, can be aggregated across hundreds of samples to reduce noise, imposes zero burden on the user, and naturally tracks longitudinal patterns.

Key finding: Youyou, Kosinski, and Stillwell (2015) demonstrated that computer models predicted personality from digital behavior more accurately than most human judges. With 300+ behavioral signals, the computer model achieved r = 0.56 against self-reports — surpassing the accuracy of coworkers (r = 0.27), friends (r = 0.45), and family members (r = 0.49). Only spouses performed comparably (r = 0.58).

3. Mirror's Approach: Behavioral Signals

Mirror takes a fundamentally different approach to personality assessment. Rather than asking users to describe themselves, Mirror observes how they naturally communicate — and derives personality indicators from the structural patterns in that communication.

Communication sources → In-memory text processing → Feature vector extraction → Trait score computation → Personality profile

The key distinction is between content and style. Mirror never interprets what a message says — it measures how the message is structured. Consider two messages:

"Hey, quick question — do you think we should go with option A? I was thinking maybe B could work too but I'm not sure. What do you think?"
"Go with A."

Mirror does not evaluate the topic being discussed. Instead, it extracts structural features: message 1 has 28 words, 2 questions, 2 hedging markers ("maybe", "I'm not sure"), high first-person pronoun usage, and a collaborative closing. Message 2 has 3 words, 0 questions, 0 hedging markers, and a directive structure. These structural differences correlate with measurable personality trait differences in Agreeableness, Openness, and Extraversion.
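As a rough illustration of style-only measurement, the sketch below counts a few surface features for the two example messages. The function name, the regex tokenizer, and the two-entry hedging list are assumptions made for this example, not Mirror's actual implementation, and exact counts depend on the tokenization rules chosen.

```python
import re

# Hypothetical hedging markers; Mirror's actual marker list is not published here.
HEDGES = ("maybe", "i'm not sure")

def structural_summary(message: str) -> dict:
    """Count surface features only; nothing here looks at the topic."""
    text = message.lower()
    words = re.findall(r"[a-z0-9']+", text)
    return {
        "words": len(words),
        "question_marks": text.count("?"),
        "hedges": sum(text.count(h) for h in HEDGES),
    }

msg1 = ("Hey, quick question - do you think we should go with option A? "
        "I was thinking maybe B could work too but I'm not sure. What do you think?")
msg2 = "Go with A."

for label, msg in (("message 1", msg1), ("message 2", msg2)):
    print(label, structural_summary(msg))
```

The function never inspects which option is being discussed; it only counts tokens and punctuation, which is the content/style distinction described above.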

Why Style Beats Content

The insight that function words (pronouns, prepositions, articles) reveal more about personality than content words (nouns, verbs) was established by James Pennebaker's pioneering work at the University of Texas at Austin. His Linguistic Inquiry and Word Count (LIWC) framework demonstrated that the small, easily-overlooked words people use — "I" vs. "we", "but" vs. "and", "maybe" vs. "definitely" — are more predictive of personality traits than topic choice or vocabulary sophistication (Pennebaker, 2011).

Mirror builds on this research with a feature extraction approach inspired by LIWC's lexicon-based methodology, adapted for mobile communication patterns including email, SMS, and messaging metadata.

4. The Big Five (OCEAN) Model

Mirror's primary personality framework is the Big Five model, the most empirically validated framework in personality psychology. Unlike categorical systems (e.g., MBTI's 16 types), the Big Five describes personality as positions on five continuous spectra. This continuous measurement allows for nuanced, reproducible assessment and enables statistical confidence intervals.

Openness to Experience

Curiosity, imagination, aesthetic sensitivity

Correlated signals: Vocabulary diversity, question frequency, use of abstract language, varied sentence structures, exploration of tangential topics

Conscientiousness

Organization, dependability, self-discipline

Correlated signals: Response timing consistency, message completeness, punctuation correctness, structured formatting, follow-through in threaded conversations

Extraversion

Sociability, assertiveness, positive affect

Correlated signals: Message length and volume, exclamation frequency, emoji usage, positive emotion words, conversation initiation rate

Agreeableness

Warmth, cooperation, empathy

Correlated signals: Social and cooperative words, question-to-statement ratio, hedging language, first-person plural pronouns ("we"), collaborative phrasing

Emotional Stability

Resilience, composure, stress management

Correlated signals: Variance in message timing, negative emotion word frequency, sentiment consistency across messages, cognitive complexity markers

Display convention: Mirror displays the fifth dimension as "Emotional Stability" rather than "Neuroticism." The underlying score is computed as neuroticism and inverted for display, so a high Emotional Stability score corresponds to low neuroticism. This reframing follows positive psychology conventions and avoids pathologizing language.

Each trait is scored on a continuous scale from 0 to 100, where 50 represents the population average. Scores are accompanied by confidence intervals that narrow as more communication data is analyzed. A score of 72 in Openness with a confidence interval of [65, 79] means Mirror estimates the user's Openness is above average, and is 95% confident the true value falls within that range.

5. Feature Extraction Pipeline

Mirror's analysis pipeline extracts 12 numerical features from each communication event. These features are the only output of raw text processing — once extracted, the source text is discarded from memory. The feature vector is the atomic unit of Mirror's analysis.

Feature	Type	Description
wordCount	Integer	Total number of words in the message
questionCount	Integer	Number of question marks or interrogative constructions
exclamationCount	Integer	Number of exclamation marks
firstPersonPronounRatio	Float [0, 1]	Proportion of words that are first-person pronouns (I, me, my, mine)
positiveEmotionWords	Integer	Count of words from positive emotion lexicon (joy, love, great, excited)
negativeEmotionWords	Integer	Count of words from negative emotion lexicon (hate, angry, sad, worried)
socialWords	Integer	Count of social and cooperative terms (we, together, team, help)
cognitiveComplexityWords	Integer	Count of causal and reasoning markers (because, therefore, however, although)
avgSentenceLength	Float	Mean number of words per sentence
vocabularyDiversity	Float [0, 1]	Type-token ratio: unique words divided by total words
emojiCount	Integer	Number of emoji characters used
isReply	Boolean	Whether the message is a response to another message (contextual estimate)
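The table can be read as a record type. The following is a minimal sketch of such a record and of how the non-lexicon features might be computed; the field names come from the table, while the tokenizer, the sentence splitter, the emoji ranges, and the `extract_features` function itself are illustrative assumptions.

```python
import re
from dataclasses import dataclass

FIRST_PERSON = {"i", "me", "my", "mine"}
# Rough emoji ranges; an approximation, not a complete Unicode emoji definition.
EMOJI = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")

@dataclass(frozen=True)
class FeatureVector:
    wordCount: int
    questionCount: int
    exclamationCount: int
    firstPersonPronounRatio: float
    positiveEmotionWords: int
    negativeEmotionWords: int
    socialWords: int
    cognitiveComplexityWords: int
    avgSentenceLength: float
    vocabularyDiversity: float
    emojiCount: int
    isReply: bool

def extract_features(text: str, is_reply: bool,
                     lexicon_counts: tuple = (0, 0, 0, 0)) -> FeatureVector:
    """Compute structural features; lexicon counts are supplied by a separate matcher."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n = len(words)
    positive, negative, social, cognitive = lexicon_counts
    return FeatureVector(
        wordCount=n,
        questionCount=text.count("?"),
        exclamationCount=text.count("!"),
        firstPersonPronounRatio=sum(w in FIRST_PERSON for w in words) / n if n else 0.0,
        positiveEmotionWords=positive,
        negativeEmotionWords=negative,
        socialWords=social,
        cognitiveComplexityWords=cognitive,
        avgSentenceLength=n / len(sentences) if sentences else 0.0,
        vocabularyDiversity=len(set(words)) / n if n else 0.0,
        emojiCount=len(EMOJI.findall(text)),
        isReply=is_reply,
    )
```

Making the record frozen means a vector cannot be altered after extraction, and none of its fields can carry a substring of the original text.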

Lexicon-Based Feature Detection

Four of the twelve features — positiveEmotionWords, negativeEmotionWords, socialWords, and cognitiveComplexityWords — rely on curated word lists (lexicons). Mirror's approach is inspired by the LIWC (Linguistic Inquiry and Word Count) framework developed by Pennebaker and colleagues (Tausczik & Pennebaker, 2010). Each lexicon category contains carefully selected marker words and their common variations.

The lexicon approach has a critical advantage for privacy: it requires only word-level matching against predefined lists, not sentence-level comprehension. Mirror counts how many words from each category appear — it does not parse grammar, resolve references, or interpret meaning. The lexicons are static assets bundled with the application; no network lookup is required.
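Word-level matching of this kind can be sketched in a few lines. The lexicons below contain only the example words from the feature table and are placeholders; Mirror's real word lists, and its handling of word variations, are not described in enough detail here to reproduce.

```python
import re

# Tiny illustrative lexicons built from the example words in the feature table.
LEXICONS = {
    "positiveEmotionWords": {"joy", "love", "great", "excited"},
    "negativeEmotionWords": {"hate", "angry", "sad", "worried"},
    "socialWords": {"we", "together", "team", "help"},
    "cognitiveComplexityWords": {"because", "therefore", "however", "although"},
}

def count_lexicon_hits(text: str) -> dict:
    """Count tokens that appear in each static word list; no parsing, no meaning."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        name: sum(1 for token in tokens if token in words)
        for name, words in LEXICONS.items()
    }

print(count_lexicon_hits("We love the team because we help."))
```

Each token is checked for set membership and then forgotten, so the output is four integers regardless of what the sentence was about.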

Feature Aggregation

Individual message features are aggregated into running statistical distributions. Mirror tracks the mean, variance, and trend (slope over time) for each numerical feature across all analyzed messages. This aggregation serves two purposes: it smooths out noise from individual messages, and it enables trend detection — personality expression is not static, and Mirror can observe shifts in communication patterns over time.
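One way to maintain mean, variance, and trend without keeping per-message values is an online update in the style of Welford's algorithm, extended with a running co-moment against time to obtain a least-squares slope. This is a sketch of that general technique under the assumption that each event carries a numeric time value; the class name and fields are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class RunningStat:
    n: int = 0
    mean: float = 0.0
    m2: float = 0.0       # sum of squared deviations of the feature value
    t_mean: float = 0.0
    t_m2: float = 0.0     # sum of squared deviations of the time value
    co: float = 0.0       # co-moment between time and feature value

    def update(self, t: float, value: float) -> None:
        """Fold one observation into the running statistics."""
        self.n += 1
        dt = t - self.t_mean
        dv = value - self.mean
        self.t_mean += dt / self.n
        self.mean += dv / self.n
        self.t_m2 += dt * (t - self.t_mean)
        self.m2 += dv * (value - self.mean)
        self.co += dt * (value - self.mean)

    @property
    def variance(self) -> float:
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    @property
    def slope(self) -> float:
        """Least-squares slope of value over time."""
        return self.co / self.t_m2 if self.t_m2 > 0 else 0.0

stat = RunningStat()
for day, word_count in enumerate([12, 15, 11, 18, 20]):
    stat.update(float(day), float(word_count))
print(stat.mean, stat.variance, stat.slope)
```

Only six numbers per feature are retained, which fits the description of storing aggregate distributions rather than individual messages.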

6. Mapping Features to Traits

Mirror maps extracted features to Big Five trait scores using weighted scoring functions derived from published correlation matrices. Each trait draws signal from multiple features, and each feature may contribute to multiple traits. The mapping reflects empirical findings from computational linguistics research.

Feature-Trait Correlation Matrix

The following summarizes the primary directional relationships between features and Big Five dimensions, based on research by Yarkoni (2010), Schwartz et al. (2013), and Pennebaker (2011):

Openness draws positive signal from vocabularyDiversity, questionCount, cognitiveComplexityWords, and avgSentenceLength. High Openness correlates with linguistically complex, question-rich communication.
Conscientiousness draws signal from temporal consistency (low variance in response timing), avgSentenceLength stability, and low variance in wordCount. Consistent communicators tend to score higher.
Extraversion draws positive signal from wordCount, exclamationCount, emojiCount, and positiveEmotionWords. Extraverts write more, with more expressive markers.
Agreeableness draws positive signal from socialWords, questionCount (especially collaborative questions), and positiveEmotionWords. It draws negative signal from high firstPersonPronounRatio — agreeable communicators use more "we" and less "I."
Emotional Stability draws inverse signal from negativeEmotionWords and from high variance in sentiment across messages. Emotionally stable communicators show more consistent emotional tone.

Scoring Mechanism

For each Big Five dimension, Mirror computes a weighted sum of normalized feature values. Features are normalized against population baselines derived from published research on email and messaging corpora. The weighted sum is passed through a sigmoid function and scaled to produce a score between 0 and 100. A weighted sum of zero, which occurs when every feature sits at its population baseline, maps to 50 (the population mean).
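The scoring step can be sketched as follows. The baselines (mean and standard deviation per feature) and the weights are invented for illustration; the paper states that the real values come from published research but does not list them.

```python
import math

# Hypothetical population baselines: feature -> (mean, standard deviation).
BASELINES = {
    "wordCount": (20.0, 10.0),
    "exclamationCount": (0.5, 1.0),
    "emojiCount": (0.3, 0.8),
    "positiveEmotionWords": (1.0, 1.5),
}
# Hypothetical weights for Extraversion; not taken from any published matrix.
EXTRAVERSION_WEIGHTS = {
    "wordCount": 0.4,
    "exclamationCount": 0.3,
    "emojiCount": 0.2,
    "positiveEmotionWords": 0.3,
}

def trait_score(features: dict, weights: dict, baselines: dict) -> float:
    """Weighted sum of z-scored features, squashed by a sigmoid onto 0-100."""
    z = 0.0
    for name, weight in weights.items():
        mean, std = baselines[name]
        z += weight * (features[name] - mean) / std
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp within float range
    return 100.0 / (1.0 + math.exp(-z))

expressive = {"wordCount": 35.0, "exclamationCount": 2.0,
              "emojiCount": 1.0, "positiveEmotionWords": 3.0}
print(round(trait_score(expressive, EXTRAVERSION_WEIGHTS, BASELINES), 1))
```

Because the sigmoid is symmetric around zero, a user whose features all match the baselines lands at exactly 50, and scores approach 0 or 100 only asymptotically.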

Weights are not learned from Mirror user data — they are derived from published correlation coefficients in the personality-language literature. This design choice means Mirror does not require a training dataset of users who have taken both questionnaires and had their writing analyzed. It also means Mirror never builds behavioral models from user data that could be subject to data mining concerns.

7. Confidence & Maturity Model

Personality estimation accuracy improves with the volume of analyzed communication. Mirror quantifies this through two complementary systems: per-trait confidence intervals and an overall profile maturity indicator.

Per-Trait Confidence

Each Big Five trait score is accompanied by a confidence value between 0.0 and 1.0, representing Mirror's certainty in the estimate. Confidence is computed from the number of events analyzed for that trait and the consistency (low variance) of the underlying feature distributions. A trait with high event count but high feature variance will have lower confidence than a trait with fewer events but highly consistent signals.

Confident (confidence ≥ 0.70): The trait score is reliable and the confidence interval is narrow. Displayed with full visual emphasis.
Building (0.40 ≤ confidence < 0.70): The trait score is directionally meaningful but may shift as more data is analyzed. Displayed with a "building" indicator.
Maturing (confidence < 0.40): Insufficient data for a reliable estimate. The trait score is hidden or shown as "gathering data."
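The paper does not give the confidence formula, so the sketch below is only one plausible shape: a volume term that saturates with event count multiplied by a consistency term that shrinks as relative variance grows. The function names, the `half_point` parameter, and the formula are assumptions; only the band thresholds come from the text above.

```python
def trait_confidence(event_count: int, relative_variance: float,
                     half_point: int = 100) -> float:
    """Hypothetical confidence in [0, 1): more events and lower variance raise it."""
    volume = event_count / (event_count + half_point)
    consistency = 1.0 / (1.0 + relative_variance)
    return volume * consistency

def confidence_band(confidence: float) -> str:
    """Map a confidence value onto the three display bands."""
    if confidence >= 0.70:
        return "Confident"
    if confidence >= 0.40:
        return "Building"
    return "Maturing"

noisy = trait_confidence(event_count=1000, relative_variance=2.0)
steady = trait_confidence(event_count=300, relative_variance=0.1)
print(confidence_band(noisy), confidence_band(steady))
```

With these assumed terms, the trait with 1000 noisy events scores lower than the trait with 300 consistent events, which matches the behavior described for high-variance traits.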

Profile Maturity

The overall profile maturity reflects the total number of communication events processed across all connected data sources. This provides users with a simple, intuitive sense of how complete their profile is.

Maturing (0 – 49 events processed): Profile is forming. Early signals may appear.
Building (50 – 499 events processed): Trait estimates emerging. Confidence intervals wide.
Confident (500+ events processed): Reliable profile. Scores stabilized and intervals narrow.

Mirror also provides an estimated time-to-confident calculation based on the user's current data accumulation rate. If a user is generating 15 events per day and has processed 200, Mirror will display "~20 more days of your normal messaging" as a progress indicator.
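The maturity tiers and the time-to-confident arithmetic can be written out directly. The thresholds are the ones stated in this section; the function names and the rounding-up choice are assumptions for this sketch.

```python
import math
from typing import Optional

BUILDING_THRESHOLD = 50
CONFIDENT_THRESHOLD = 500

def maturity_level(events: int) -> str:
    """Map a total event count onto the three maturity tiers."""
    if events >= CONFIDENT_THRESHOLD:
        return "Confident"
    if events >= BUILDING_THRESHOLD:
        return "Building"
    return "Maturing"

def days_to_confident(events: int, events_per_day: float) -> Optional[int]:
    """Estimate whole days remaining at the current rate; None if no data is arriving."""
    if events >= CONFIDENT_THRESHOLD:
        return 0
    if events_per_day <= 0:
        return None
    return math.ceil((CONFIDENT_THRESHOLD - events) / events_per_day)

print(maturity_level(200), days_to_confident(200, 15))
```

For the example in the text, (500 - 200) / 15 = 20, so a user at 200 events and 15 events per day is shown roughly 20 more days.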

8. Early Signals System

Before a full personality profile forms, users want meaningful feedback. Mirror's early signals system surfaces notable patterns as soon as 50 communication events have been processed — well before the 500-event threshold for a confident profile.

Early signals are observations about communication patterns that hint at personality traits without committing to a full score. Each early signal includes:

Trait association: Which Big Five dimension the pattern relates to
Observation: A plain-language description of the detected pattern (e.g., "You ask more questions than average — this is linked to Openness")
Detection timestamp: When the pattern was first identified
Event count at detection: How many events had been analyzed when the signal appeared

Early signals are generated by detecting statistical outliers in feature distributions relative to population baselines. If a user's question frequency is consistently above the 75th percentile after 50 messages, Mirror surfaces this as an early signal for Openness — without assigning a numerical score.
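A minimal sketch of this check is shown below, operating on aggregate statistics only. The population 75th-percentile value is a made-up placeholder, "consistently above" is simplified to a comparison of the user's running mean, and the `EarlySignal` record and function name are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

MIN_EVENTS = 50
POPULATION_P75_QUESTIONS = 1.0  # hypothetical baseline: questions per message

@dataclass(frozen=True)
class EarlySignal:
    trait: str
    observation: str
    detected_at: datetime
    event_count: int

def detect_question_signal(event_count: int, mean_question_count: float,
                           population_p75: float = POPULATION_P75_QUESTIONS
                           ) -> Optional[EarlySignal]:
    """Return a qualitative Openness signal, or None if the evidence is too thin."""
    if event_count < MIN_EVENTS or mean_question_count <= population_p75:
        return None
    return EarlySignal(
        trait="Openness",
        observation="You ask more questions than average",
        detected_at=datetime.now(timezone.utc),
        event_count=event_count,
    )

print(detect_question_signal(event_count=60, mean_question_count=1.8))
```

The returned record carries the four fields listed for early signals and deliberately contains no numerical trait score.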

Design principle: Early signals use qualitative language ("more than average", "notably consistent") rather than numerical scores. This prevents users from anchoring on premature estimates that may shift significantly as more data arrives.

9. Supported Personality Frameworks

While the Big Five model serves as Mirror's foundation, the app supports seven personality frameworks total. Six additional frameworks are available to Pro subscribers. All derived frameworks are computed from the Big Five scores using published mapping functions — they do not require separate data analysis.

Big Five (OCEAN) [FREE]: Core personality dimensions on continuous scales. The foundation for all other framework mappings.
Enneagram [PRO]: Nine interconnected personality types revealing core motivations and growth paths.
MBTI Analysis [PRO]: 16 cognitive preference types mapped from Big Five scores without the unreliable questionnaire.
DISC Assessment [PRO]: Workplace behavior and communication style profiling: Dominance, Influence, Steadiness, Conscientiousness.
Attachment Style [PRO]: Relationship patterns and emotional bonding style: secure, anxious, avoidant, disorganized.
Love Languages [PRO]: How you express and receive affection: words, acts, gifts, time, touch.
Values in Action [PRO]: Character strengths and virtues classification based on positive psychology research.

Cross-Framework Derivation

The Big Five model serves as a universal foundation because it has well-documented statistical relationships with most other personality frameworks. Research has established reliable mapping functions between Big Five scores and MBTI preferences (McCrae & Costa, 1989), Enneagram types (Dris & Noftle, 2011), DISC dimensions, and attachment style classifications. Mirror uses these published mapping functions to derive secondary framework scores from the primary Big Five analysis, rather than performing separate language analysis for each framework.
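To make the idea of derivation concrete, here is a deliberately simplified sketch for one framework. It uses the directional correspondences reported by McCrae & Costa (1989), in which each MBTI preference pair tracks one Big Five dimension, and assumes a hard cut at the 50-point midpoint. The midpoint rule and the function name are assumptions; Mirror's actual mapping functions are not specified in this paper and may be probabilistic or weighted.

```python
def mbti_from_big_five(openness: float, conscientiousness: float,
                       extraversion: float, agreeableness: float,
                       midpoint: float = 50.0) -> str:
    """Derive a four-letter type label by thresholding four Big Five scores."""
    return "".join([
        "E" if extraversion >= midpoint else "I",       # Extraversion / Introversion
        "N" if openness >= midpoint else "S",           # Intuition / Sensing
        "F" if agreeableness >= midpoint else "T",      # Feeling / Thinking
        "J" if conscientiousness >= midpoint else "P",  # Judging / Perceiving
    ])

print(mbti_from_big_five(openness=72, conscientiousness=40,
                         extraversion=30, agreeableness=60))
```

The function takes only trait scores as input, which illustrates why derived frameworks need no additional pass over communication data. A hard threshold discards the continuous information near the midpoint, so a real mapping would likely report how close each preference is to the boundary.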

10. Data Sources & Connectivity

Mirror connects to three categories of communication data. Each source undergoes the same feature extraction pipeline. Free accounts can connect up to 3 sources; Pro accounts have no limit.

Gmail (OAuth 2.0)

Mirror requests read access to sent mail only — it does not access received emails, drafts, or other mailbox contents. Each sent email is processed in-memory to extract a feature vector. The raw email body is never written to disk, stored in a database, or transmitted to any server. OAuth tokens are stored securely on-device and can be revoked at any time through Mirror's settings.

SMS / Messages

On Android, Mirror accesses SMS content with user permission to extract writing pattern features. On iOS, platform privacy restrictions prevent Mirror from accessing iMessage or SMS content directly, so it falls back to contact frequency and timing metadata only. This fallback provides Conscientiousness and Extraversion signals but limited Openness and Agreeableness data.

Call Logs

Mirror analyzes call duration and frequency patterns — it does not record or analyze call audio. Call log data contributes primarily to Extraversion signals (call frequency, average duration, breadth of contacts) and Conscientiousness signals (consistency of calling patterns, time-of-day regularity).

Historical Import

Users can optionally import historical communications to accelerate profile maturity. Historical import requires explicit opt-in and processes messages in batches with a progress indicator. This is the fastest path to a confident profile — a user with 500+ sent emails in their Gmail account can reach confident profile status immediately after the initial import completes.

11. Privacy Architecture

Privacy in Mirror is not a policy decision — it is a structural property of the system architecture. The service interfaces are designed so that raw text physically cannot be transmitted beyond the feature extraction boundary.

The Privacy Boundary

Mirror's code architecture enforces a strict boundary between text processing and personality analysis. The data ingestion service processes raw text and outputs only feature vectors (the 12 numerical features described in Section 5). The personality analysis service accepts only feature vectors as input — it has no API for receiving raw text. This structural API design means that even a code bug or misconfiguration cannot cause raw text to reach the cloud sync layer, because the types do not permit it.
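The boundary can be illustrated with a small sketch. Python does not enforce types at compile time, so explicit runtime checks stand in here for what a statically typed mobile codebase would reject before the app is built. The class, the 12-value tuple layout, and the `analyze` function are hypothetical.

```python
from dataclasses import dataclass
from numbers import Real
from typing import Sequence

FEATURE_COUNT = 12

@dataclass(frozen=True)
class FeatureVector:
    values: tuple

    def __post_init__(self) -> None:
        # A vector may hold numbers (including the boolean isReply) and nothing else.
        if len(self.values) != FEATURE_COUNT:
            raise ValueError("expected exactly 12 features")
        if not all(isinstance(v, Real) for v in self.values):
            raise TypeError("feature vectors may hold numbers only")

def analyze(vectors: Sequence[FeatureVector]) -> int:
    """Analysis-side entry point: accepts feature vectors, never strings."""
    for vector in vectors:
        if not isinstance(vector, FeatureVector):
            raise TypeError("analysis accepts FeatureVector inputs only")
    return len(vectors)  # placeholder for aggregation and scoring

ok = FeatureVector((28, 2, 0, 0.07, 0, 0, 1, 0, 9.3, 0.9, 0, True))
print(analyze([ok]))
```

Passing a raw string to `analyze`, or trying to pack text into a `FeatureVector`, raises an error instead of flowing downstream. The design point is that the sync layer sits behind an interface whose only input type has no field capable of holding text.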

1. Raw text enters memory. Communication content is loaded into device memory for processing. It is never written to disk.
2. Feature extraction. 12 numerical features are computed from the text. Word lists are matched against static lexicons. No semantic analysis occurs.
3. Text discarded. The raw text is released from memory. Only the numerical feature vector (a few hundred bytes) remains.
4. Trait scoring. Feature vectors are aggregated and mapped to Big Five scores with confidence intervals. All computation is on-device.
5. Personality vectors stored. Only the final personality scores, confidence values, and event counts are persisted, locally and optionally to cloud backup.

What Is Never Stored or Transmitted

Raw email, SMS, or message content
Individual words, phrases, or sentences from communications
Recipient identities, email addresses, or phone numbers
Individual message timestamps (only aggregated temporal patterns are kept)
Subject lines or conversation topics

What Is Stored

On device: Personality trait scores and confidence intervals, aggregate feature distributions (mean, variance per feature), event counts per source, source connection status
Cloud synced (optional): Personality trait scores (for backup/restore), subscription tier, profile maturity level, account metadata

Data Export and Deletion

Users can export their complete data at any time through Mirror's privacy controls. The export is a JSON file containing only personality vectors and metadata — it explicitly declares that no raw communications are included. Account deletion triggers a full purge of all data from both the device and cloud storage, completing within 30 seconds.
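An export of this kind might look like the sketch below. Every key name, including the `containsRawCommunications` declaration, is a hypothetical stand-in; the paper does not publish the export schema.

```python
import json
from datetime import datetime, timezone

def build_export(scores: dict, confidence: dict, event_counts: dict) -> str:
    """Serialize personality vectors and metadata; no message text is accepted as input."""
    payload = {
        "exportedAt": datetime.now(timezone.utc).isoformat(),
        "containsRawCommunications": False,  # explicit declaration in the file
        "traits": {
            name: {"score": scores[name], "confidence": confidence[name]}
            for name in scores
        },
        "eventCounts": event_counts,
    }
    return json.dumps(payload, indent=2)

print(build_export({"openness": 72}, {"openness": 0.81}, {"gmail": 510}))
```

Since the function's parameters are limited to scores, confidence values, and counts, there is no path by which message content could end up in the file.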

12. Security & Data Protection

Authentication

Mirror supports sign-in via Google OAuth and Apple Sign-In. Authentication is handled through Firebase Authentication, with identity tokens stored securely in the device keychain (iOS) or encrypted shared preferences (Android). Mirror does not implement custom password authentication.

Data at Rest

On-device personality vectors and feature distributions are stored in encrypted local storage. Cloud-synced data is stored in Firestore with Firebase security rules that restrict read and write access to the authenticated user's own documents.

Data in Transit

All network communication uses TLS 1.2 or higher. OAuth token exchanges follow the Authorization Code flow with PKCE (Proof Key for Code Exchange), the industry standard for mobile applications. The critical point is that very little data is in transit — the on-device architecture minimizes network exposure by design.
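For readers unfamiliar with PKCE, the client-side values can be generated as sketched below, following the S256 method defined in RFC 7636: the code challenge is the unpadded base64url encoding of the SHA-256 hash of the code verifier. The function names are illustrative, and this shows the standard mechanism rather than Mirror's own code.

```python
import base64
import hashlib
import secrets

def make_code_verifier(n_bytes: int = 32) -> str:
    """Random verifier; 32 bytes encode to 43 unpadded base64url characters."""
    raw = secrets.token_bytes(n_bytes)
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")

def code_challenge_s256(verifier: str) -> str:
    """S256 challenge: BASE64URL(SHA256(verifier)) without padding."""
    digest = hashlib.sha256(verifier.encode("ascii")).digest()
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")

verifier = make_code_verifier()
challenge = code_challenge_s256(verifier)
print(len(verifier), len(challenge))
```

The app sends the challenge with the authorization request and the verifier with the token request, so an intercepted authorization code cannot be redeemed without the verifier, which never leaves the device until that final exchange.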

OAuth Token Management

Gmail OAuth tokens are stored on-device and used only for fetching sent mail during active sync operations. Tokens are never transmitted to Mirror servers. When a user disconnects a data source, access tokens are revoked and deleted from the device. Derived personality vectors from previously analyzed communications are retained unless the user explicitly requests deletion.

Third-Party Data Sharing

Mirror does not share personality data, behavioral data, usage data, or any other user data with third parties, advertisers, data brokers, or analytics providers. The only external services Mirror communicates with are Firebase (for authentication and optional cloud backup) and the respective OAuth providers during sign-in.

13. Limitations & Ethical Considerations

Mirror provides personality estimates, not clinical assessments. Users and any downstream consumers of Mirror data should understand the following limitations:

Accuracy Limitations

Not a diagnostic tool. Mirror's personality estimates should not be used for clinical diagnosis, employment screening, or high-stakes decision making. They are informational and intended for self-reflection.
Context dependency. Writing style varies across contexts. A user who writes formally at work and casually with friends may produce different profiles depending on which communications are analyzed. Multi-source analysis helps mitigate this, but does not eliminate it.
Cultural and linguistic factors. The underlying research was primarily conducted on English-language corpora from Western populations. Mirror's accuracy may be reduced for non-English text or communication styles from underrepresented cultural contexts.
Sample size effects. Profiles based on fewer than 500 events have wider confidence intervals and are more susceptible to noise. Mirror communicates this through the maturity model, but users may overinterpret early scores.
Temporal snapshots. While Mirror tracks trends, personality expression can shift due to life events, mental health changes, or context shifts that may not be captured by the model.

Ethical Principles

Informed consent. Mirror explains what it analyzes, how it works, and what data it accesses before requesting any permissions. The onboarding flow includes a plain-language explanation of the analysis methodology.
User control. Users can disconnect sources, delete all data, and export their personality profiles at any time. There are no lock-in mechanisms.
No dark patterns. Mirror does not use personality scores to manipulate user behavior, surface targeted content, or optimize for engagement. The data exists for the user's self-understanding only.
Transparency. This white paper, the in-app "How It Works" explanation, and the source code architecture are designed to make Mirror's methodology inspectable and understandable.

14. References

Pennebaker, J.W. (2011). The Secret Life of Pronouns: What Our Words Say About Us. Bloomsbury Press.
Tausczik, Y.R. & Pennebaker, J.W. (2010). The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, 29(1), 24-54.
Yarkoni, T. (2010). Personality in 100,000 Words: A Large-Scale Analysis of Personality and Word Use among Bloggers. Journal of Research in Personality, 44(3), 363-373.
Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., et al. (2013). Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach. PLoS ONE, 8(9), e73791.
Youyou, W., Kosinski, M., & Stillwell, D. (2015). Computer-Based Personality Judgments Are More Accurate Than Those Made by Humans. Proceedings of the National Academy of Sciences, 112(4), 1036-1040.
Stachl, C., Au, Q., Schoedel, R., et al. (2020). Predicting Personality from Patterns of Behavior Collected with Smartphones. Proceedings of the National Academy of Sciences, 117(30), 17680-17687.
Golbeck, J., Robles, C., Edmondson, M., & Turner, K. (2011). Predicting Personality from Twitter. IEEE International Conference on Social Computing, 149-156.
Gosling, S.D., Rentfrow, P.J., & Swann, W.B. Jr. (2003). A Very Brief Measure of the Big-Five Personality Domains. Journal of Research in Personality, 37(6), 504-528.
Paulhus, D.L. (1991). Measurement and Control of Response Bias. In J.P. Robinson, P.R. Shaver, & L.S. Wrightsman (Eds.), Measures of Personality and Social Psychological Attitudes (pp. 17-59). Academic Press.
McCrae, R.R. & Costa, P.T. Jr. (1989). Reinterpreting the Myers-Briggs Type Indicator from the Perspective of the Five-Factor Model of Personality. Journal of Personality, 57(1), 17-40.
John, O.P., Naumann, L.P., & Soto, C.J. (2008). Paradigm Shift to the Integrative Big Five Trait Taxonomy. In O.P. John, R.W. Robins, & L.A. Pervin (Eds.), Handbook of Personality: Theory and Research (3rd ed., pp. 114-158). Guilford Press.
Park, G., Schwartz, H.A., Eichstaedt, J.C., et al. (2015). Automatic Personality Assessment through Social Media Language. Journal of Personality and Social Psychology, 108(6), 934-952.
