WhatYouSay - Method

WhatYouSay uses a two-level analysis pipeline for exported text conversations.
Level A produces deterministic, reproducible metrics from raw text.
Level B generates a narrative interpretation via a large language model, grounded in Level A metrics.
The levels are separated by design: Level A is independently verifiable; Level B is interpretive and should be read as pattern description, not diagnosis.

Input: a raw .txt export, parsed into individual messages, each attributed to a speaker.
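The parsing step can be illustrated with a minimal sketch. The export format is not specified here, so the line shape "Speaker: message text" is an assumption, as are the names Message, LINE_PATTERN, and parse_messages; real exports usually carry timestamps and other prefixes that a production parser would need to handle.

```python
import re
from typing import NamedTuple


class Message(NamedTuple):
    speaker: str
    text: str


# Assumed line shape: "Speaker name: message text". Real exports differ.
LINE_PATTERN = re.compile(r"^(?P<speaker>[^:\n]{1,40}): (?P<text>.*)$")


def parse_messages(raw: str) -> list[Message]:
    """Split raw export text into speaker-attributed messages."""
    messages: list[Message] = []
    for line in raw.splitlines():
        match = LINE_PATTERN.match(line)
        if match:
            messages.append(Message(match["speaker"].strip(), match["text"]))
        elif messages and line.strip():
            # An unmatched, non-empty line is treated as a continuation
            # of the previous message (a multi-line message).
            last = messages[-1]
            messages[-1] = Message(last.speaker, last.text + "\n" + line)
    return messages


sample = "Ana: Are we still on for Friday?\nBen: Yes.\nSee you at eight.\nAna: Great"
parsed = parse_messages(sample)
```

In this sketch the third input line has no speaker prefix, so it is appended to Ben's message rather than dropped.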

Metrics produced:

Message counts, character counts, and estimated word counts per speaker.
Speaker participation distribution (each speaker's percentage of the corpus).
Average message length (words per message).
Question rate (percentage of messages containing at least one question mark).
Emoji usage rate.
Overall tone valence (composite score).
Emotion distribution via NRC Word-Emotion Association Lexicon (Mohammad and Turney, 2013).
Moral/value framing via LibertyMFD (Araque et al., 2022) and MoralStrength (Araque et al., 2019).
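Several of the metrics listed above can be sketched in a few lines. This is an illustration under stated assumptions, not the tool's implementation: words are estimated by whitespace splitting, participation is measured by message count, and TOY_LEXICON is a hypothetical placeholder with made-up entries standing in for a word-emotion lexicon such as NRC. The function names are likewise hypothetical.

```python
import re
from collections import Counter, defaultdict

# Toy stand-in for a word-emotion lexicon. These entries are illustrative
# placeholders, not values copied from the NRC lexicon.
TOY_LEXICON = {
    "great": {"joy"},
    "thanks": {"joy", "trust"},
    "worried": {"fear", "sadness"},
}


def basic_metrics(messages):
    """Per-speaker counts and rates from (speaker, text) pairs."""
    counts = defaultdict(int)
    chars = defaultdict(int)
    words = defaultdict(int)
    questions = defaultdict(int)
    for speaker, text in messages:
        counts[speaker] += 1
        chars[speaker] += len(text)
        # Estimated word count: whitespace-separated tokens.
        words[speaker] += len(text.split())
        if "?" in text:
            questions[speaker] += 1
    total = sum(counts.values())
    # Iterate speakers in sorted order so the output layout is stable.
    return {
        s: {
            "messages": counts[s],
            "characters": chars[s],
            "words": words[s],
            "participation_pct": 100 * counts[s] / total,
            "avg_words_per_message": words[s] / counts[s],
            "question_rate_pct": 100 * questions[s] / counts[s],
        }
        for s in sorted(counts)
    }


def emotion_distribution(texts, lexicon):
    """Share of lexicon hits per emotion across all texts."""
    tally = Counter()
    for text in texts:
        for token in re.findall(r"[a-z']+", text.lower()):
            tally.update(lexicon.get(token, ()))
    total = sum(tally.values())
    return {emotion: tally[emotion] / total for emotion in sorted(tally)} if total else {}


chat = [
    ("Ana", "Are we on for Friday?"),
    ("Ben", "Yes"),
    ("Ana", "Great, see you"),
    ("Ana", "Bring snacks"),
]
stats = basic_metrics(chat)
emotions = emotion_distribution(["Great, thanks!", "I am worried"], TOY_LEXICON)
```

Because both functions are pure and iterate in a fixed order, the same input always yields the same output, which is the property Level A depends on.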

Reliability thresholds:

Individual analysis is most reliable with at least 250 messages from the selected user.
Group comparisons require a speaker to contribute at least 5% of the total corpus.
Below these thresholds, outputs are flagged as potentially noisy or incomplete.
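A minimal sketch of how these thresholds might be checked follows. The two constants come from the text above; the function name, the flag wording, and the choice to measure corpus share by message count are assumptions made for illustration.

```python
MIN_USER_MESSAGES = 250  # individual-analysis threshold stated above
MIN_CORPUS_SHARE = 0.05  # group-comparison threshold stated above


def reliability_flags(user_messages: int, total_messages: int) -> list[str]:
    """Return warnings for outputs that fall below the stated thresholds."""
    flags = []
    if user_messages < MIN_USER_MESSAGES:
        flags.append("individual analysis may be noisy: fewer than 250 messages")
    # Assumption: share of corpus is measured in messages, not words.
    share = user_messages / total_messages if total_messages else 0.0
    if share < MIN_CORPUS_SHARE:
        flags.append("group comparison may be incomplete: under 5% of corpus")
    return flags


ok = reliability_flags(300, 1000)
both = reliability_flags(100, 5000)
```

Here `ok` is empty, while `both` carries two warnings because 100 messages is under 250 and 100 of 5000 is a 2% share.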

All Level A processing is deterministic: identical input yields identical Level A metrics.
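One way a reader could check this property independently is to fingerprint the metrics from two runs and compare the digests. The sketch below is an assumption about how such a check might look, not part of the tool; metrics_fingerprint and the sample dictionaries are hypothetical.

```python
import hashlib
import json


def metrics_fingerprint(metrics: dict) -> str:
    """Hash a metrics dictionary in a key-order-independent way."""
    # sort_keys and fixed separators give one canonical serialisation,
    # so equal metrics always hash to the same digest.
    canonical = json.dumps(metrics, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


run_1 = {"messages": 3, "question_rate_pct": 33.3}
run_2 = {"question_rate_pct": 33.3, "messages": 3}
changed = {"messages": 4, "question_rate_pct": 33.3}

same = metrics_fingerprint(run_1) == metrics_fingerprint(run_2)
different = metrics_fingerprint(run_1) != metrics_fingerprint(changed)
```

Matching digests across runs on the same file indicate identical Level A output; any difference points to a changed input or a non-deterministic step.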

Level B uses a large language model to generate a reflective narrative summary of the selected user's communication patterns in group context.

Input to Level B includes Level A metrics plus anonymised excerpts from the conversation for context.

Output sections include:

Level B is interpretive, not deterministic. Re-running it may produce different wording, but the core patterns it reports remain anchored to Level A data.

All speaker names except the user handle are anonymised in Level B output.
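The speaker relabelling can be sketched as follows. The "Speaker N" labelling scheme and the function name are assumptions for illustration; the sketch only replaces the attribution label and does not scrub names that appear inside message text, which a real pipeline would also need to address.

```python
def anonymise_speakers(messages, user_handle):
    """Replace every speaker label except user_handle with a stable alias."""
    aliases = {}
    result = []
    for speaker, text in messages:
        if speaker == user_handle:
            label = speaker
        else:
            # Aliases are assigned in order of first appearance, so the
            # same speaker always maps to the same label within a run.
            label = aliases.setdefault(speaker, f"Speaker {len(aliases) + 1}")
        result.append((label, text))
    return result


chat = [("Ana", "hi"), ("Ben", "yo"), ("Cy", "hey"), ("Ben", "ok")]
anonymised = anonymise_speakers(chat, user_handle="Ana")
```

Assigning aliases by first appearance keeps the mapping reproducible for a given input file.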

WhatYouSay is not a clinical, psychiatric, or diagnostic tool.
It does not assess personality, mental health, intelligence, or character.
Outputs describe language-use patterns and do not evaluate individuals.
The Practical upgrades section offers observational suggestions, not therapeutic advice.

Mohammad, S. M., and Turney, P. D. (2013). Crowdsourcing a Word-Emotion Association Lexicon. NRC Emotion Lexicon resource.
Mohammad, S. M., and Turney, P. D. (2010). Emotions Evoked by Common Words and Phrases. Resource page and references.
Araque, O., Gatti, L., and Kalimeri, K. (2022). LibertyMFD: A Lexicon to Assess the Moral Foundation of Liberty. DOI: 10.1145/3524458.3547264.
Araque, O., Gatti, L., and Kalimeri, K. (2019). MoralStrength. arXiv:1904.08314.
Vasse, BR. (2026). Measuring Within-Person Variation in Written Communication Patterns Across Social Contexts. Zenodo:18890804.