TLDR

(Too Long; Didn't Read)

How might we engage new editors in contributing to Wikipedia to achieve knowledge equity? This research evaluated positive-reinforcement designs intended to encourage new editors to keep editing. I worked closely with an experienced design researcher, navigated the challenges of gamification research and cross-cultural evaluative research alongside global research partners, and presented findings to cross-functional stakeholders and senior leadership at the Wikimedia Foundation, the nonprofit that hosts Wikipedia. The research shaped features now available to new Wikipedia editors.


The Growth Team at the Wikimedia Foundation builds features that encourage newcomers to make edits on Wikipedia. The team is driven by the Foundation’s commitment to knowledge equity—enabling an inclusive community where people of all backgrounds can contribute and benefit from knowledge.

Following generative research and discussion with the community of editors about positive reinforcement for new editors, this study evaluated users' reactions to static designs across three Wikipedia language communities: Arabic, Spanish, and English.

Overview


Timeline

July-August 2022


Role

Design research intern responsible for analysis of English sessions, synthesis of findings from all sessions (Arabic, English, and Spanish), and presentation of findings to Wikimedia Foundation cross-functional stakeholders and leadership.


Research Questions

1. What do users expect from the current designs?

2. How do users feel about the current designs?

3. What kinds of information about their own editing activity do users want to be able to access on their homepage?


Process

I had the chance to comment on the discussion guide before testing began, but I picked up leading the project after data collection. The one-hour sessions covered each participant's relationship with Wikipedia, their engagement with other online or offline communities, and a review and discussion of static design images.

Analyzing English Sessions: The cross-functional team wanted feedback on each design image. To organize the data by design across the seven participants, I developed a spreadsheet template structured by participant, design, and theme, in which I recorded both observations and interpretations. For each feature area (Impact, Leveling Up, and Personalized Praise), I added a synthesis of key findings.

Synthesizing Cross-cultural Research: We worked with two external research agencies, in Argentina and Jordan, for this study. The agencies followed the same discussion guide and returned summary reports of their findings. I reviewed the reports, followed up with clarifying questions, and integrated the findings from my own report on the English sessions with the Arabic and Spanish reports, highlighting where reactions converged or diverged along lines of shared behavior (for example, power editors vs. readers) or Wikipedia community (for example, Spanish vs. Arabic).

Presenting Findings: I prepared three presentations of the report: a preview of the English Wikipedia study presented first to the lead UX designer and product manager on the project, a presentation of the completed study for the entire cross-functional Growth Team (Community Relations Specialist, Product Managers, Designers, Engineers, etc.), and a brief high-level overview I presented in a Growth Team update to the VP of Product and VP of Product Design.


Findings

I presented specific findings by design and summarized the following key insights and considerations. In short, the findings highlighted a tension between incentivizing editing through fun and games and incentivizing it through building credibility and skill, reflecting the seriousness that readers and editors ascribe to Wikipedia.

Make impact data actionable: Impact data was a compelling feature for participants with more editing experience, several of whom related this to their interest in data. For those new to editing, impact data beyond views and basic editing activity may be more interesting if linked to goal-setting and ways to optimize impact.

Evaluate the ideal editing interval: Across features, daily intervals seemed overly ambitious for new and casual editors. Consider consulting usage analytics to identify “natural” editing intervals for new and casual editors to make goals more attainable.

Ensure credibility of assessments: Novice editor participants were interested in validation of their skills through these features. Some hoped that badges could lend credibility to their work when it is reviewed by more experienced editors. Given that potential, it could be valuable to evaluate how badges might be leveraged to build community trust in newcomers.

Reward quality and collaboration over quantity: Both editor and reader participants from Spanish Wikipedia were more interested in recognition of their knowledge or expertise (quality) than in the number of edits they had made (quantity). Similarly, some Arabic and English editors are motivated to edit by professional interests and skill development. Orienting goals and rewards toward other indicators of skilled editing, such as adding references or topical contributions, and toward collaboration or community involvement may also help mitigate concerns about competition overtaking collaboration.

Prioritize human recognition: While scores and badges are potentially valued, recognition from other editors garnered the strongest reactions. Reinforcing previous studies, features that promote giving, receiving, and revisiting thanks seemed most engaging.

Experiment with playfulness of designs: Some participants (primarily from Spanish Wikipedia) felt that simple, fun designs were too childish or playful for the seriousness of Wikipedia. Consider experimenting with visual designs that vary in their level of playfulness to evaluate broader reactions to “fun” on Wikipedia.

It looks like a little certificate, and it could be used as proof that you know how to edit online. It would be useful to show that even though I’m a disaster with computers, I can still do something with it. It could give me some credibility.
— Novice Spanish and English Wikipedia Editor

Impact


The Growth Team collaborates closely with ambassadors to various Wiki communities and regularly consults these community advocates throughout the product development process. The designs tested were inspired by community feedback, and the Growth Team shared the research findings back with the community. This collaborative relationship between the team and the community is fundamental to effectively growing and maintaining Wikipedia editing communities.

The research contributed to iterations on the Impact Module and Leveling Up feature designs now available to new editors, including revisions to the top of the Impact Module to emphasize thanks and make streaks more noticeable. The research also helped inform subsequent hypotheses and A/B testing of the redesigned features.

The progression of one of the tested features, from the test designs to the redesign released into production

Thank you for assembling this research! There are definitely some actionable findings plus some confirmation of assumptions we had. The team will have a good conversation based on this valuable research.
— Group Product Manager

Learnings

  • This study was an opportunity to explore the limits of spoken feedback on gamification features. Engagement with gamification features often stems from intuitive reactions that participants may not be able to anticipate or articulate when reviewing static designs, and I had to remind myself not to over-index on in-the-moment reactions. While the test sessions elicited concrete usability feedback (designs that did not meet participants’ expectations), I learned to emphasize the limits of the thematic findings’ predictive value and to frame suggestions as considerations that could inspire upcoming experimentation rather than firm recommendations. The qualitative feedback could add a lot of value in informing hypotheses, but not in predicting editor behavior.

  • Collaborating with research partners around the world was exciting, interesting, and also challenging. I struggled with the tension between the efficiency of taking other researchers’ findings at face value and the rigor of understanding how those findings emerged from their data. In the future, I would spend more time with research partners at the start to align on and build trust in how we conduct sessions and structure analysis, so that we could synthesize findings at the end with more confidence and efficiency.

The full research report is available on Wikimedia Commons.