TLDR

(Too Long Didn’t Read)

Why do more Japanese Wikipedia editors edit without logging in than editors in other language communities? How should the Foundation accommodate them as IP editing (the current mode of editing without logging in) is phased out? I immersed myself in grounded coding to make sense of qualitative data at scale: 885 open-ended survey responses from 427 Japanese Wikipedia editors. Japanese Wikipedia is unusual in the prominence of quality edits by editors who do not log in, and this survey data offered a rare deep dive into why editors don’t log in, helping inform the Wikimedia Foundation’s evolving approach to editing Wikipedia without an account.


Overview

In 2021, the Wikimedia Foundation (WMF) Design Research team conducted a mixed-methods study to understand why some editors don’t log in to edit and to explore ways that the Foundation might encourage them to do so. At the time, editors who didn’t log in saw their edits attributed to their IP address, which remained publicly visible in the edited article’s page history. The WMF recognized that the practice of “IP editing” differs substantially across language Wikipedias, and so the research explored the practice in English, Spanish, Japanese, and Arabic.

This project—through user analytics, interviews, and surveys—established that the Japanese Wikipedia community both sees a much higher rate of contribution from IP editors and is much less likely to revert (i.e., delete or reverse) changes made by IP editors. The Japanese community was also much more responsive than other communities surveyed, and a public survey on Japanese Wikipedia received a large number of responses, including a large volume of open-ended text entry responses. 

In total, 427 Japanese Wikipedia editors responded to the survey and together submitted 885 open-ended responses, which I was asked to analyze. I applied grounded coding to those responses. The analysis highlighted opportunities to enhance usability and surfaced the complex relationship between IP editing, managing vandalism, and enabling anonymity.


Timeline

July 2022


Role

Design research intern responsible for analysis of open-ended survey responses, synthesis, and presentation of findings


Research Questions

On Japanese Wikipedia…

1. Why do editors choose to edit without logging in?

2. How might editors be encouraged to create accounts and edit while logged in?

3. Why and how might editors be better accommodated as “anonymous” editors?


Process

Background Research: I reviewed the preceding research report to better understand the anomalies of IP editing on Japanese Wikipedia and to contextualize the survey questions I would be analyzing. I also surveyed existing Foundation research on IP editing and the state of IP editing reform.

Machine Translation: With a limited budget for this exploratory analysis and given the brevity of responses, we decided to machine translate the Japanese responses. I integrated three machine translations for each open-ended response: Google, Yandex, and DeepL. DeepL had been specifically recommended by Japanese editors and proved to be the most consistently coherent, a practical takeaway for future Japanese machine translation use. Agreement among the three translations increased confidence in the translation, while differences flagged the limits of machine translation. Overall, there were very few unintelligible responses.
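The comparison across translations was done by reading, but a rough automated first pass could flag the responses where the three engines diverge most. The sketch below is an illustration only: it assumes each response is stored as a dictionary keyed by engine name, uses character-level similarity from Python's standard `difflib` as a crude proxy for agreement, and the function names and the 0.6 threshold are hypothetical choices rather than anything used in the study.

```python
from difflib import SequenceMatcher
from itertools import combinations


def min_pairwise_similarity(translations):
    """Return the lowest similarity ratio (0.0 to 1.0) between any two translations.

    `translations` maps an engine name to its English output for one response.
    Character-level similarity is only a rough proxy for agreement in meaning.
    """
    texts = list(translations.values())
    if len(texts) < 2:
        raise ValueError("need at least two translations to compare")
    ratios = [
        SequenceMatcher(None, a.lower(), b.lower()).ratio()
        for a, b in combinations(texts, 2)
    ]
    return min(ratios)


def needs_review(translations, threshold=0.6):
    """Flag a response for closer reading when any pair of translations diverges.

    The threshold is an arbitrary assumption and would need tuning by hand.
    """
    return min_pairwise_similarity(translations) < threshold


# Hypothetical example response, not taken from the survey data.
example = {
    "google": "I do not log in because it takes time.",
    "yandex": "I don't log in because it takes time.",
    "deepl": "I do not log in because it is time-consuming.",
}
print(needs_review(example))
```

A flag from a check like this would only prioritize which responses to read more carefully; it cannot say which translation is correct.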

Grounded Coding: I conducted two cycles of grounded coding of each response and applied two to three levels of codes to each. I organized the analysis in spreadsheets to ease organization and synthesis, developing a codebook for each of the three questions analyzed in the process. Given the reliance on machine translation, it was critical in the analysis to focus on topics surfaced and not weigh the style or tone of the response. While I was working independently, I attempted to validate internal consistency in coding by re-coding 10% of responses after sufficient time had intervened that I no longer recalled my original choice and then comparing between the two rounds of coding. Over 90% of responses had consistent codes for each question, which provided increased confidence in relying on the codes as an effective summary of the substance. While the luxury of time for this level of rigor is rarely available in industry, this thorough analysis and validation was an ideal opportunity to hone my aptitude for analysis of qualitative data.
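The consistency check described above amounts to a simple percent-agreement calculation between two rounds of coding. A minimal sketch follows, assuming the re-coded subset is held as two parallel lists of top-level codes. The code labels are hypothetical, and Cohen's kappa is included only as an optional chance-corrected complement; the analysis itself reported the agreement rate, not kappa.

```python
from collections import Counter


def percent_agreement(first_pass, second_pass):
    """Share of responses that received the same code in both rounds."""
    if len(first_pass) != len(second_pass) or not first_pass:
        raise ValueError("both rounds must code the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(first_pass, second_pass))
    return matches / len(first_pass)


def cohens_kappa(first_pass, second_pass):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(first_pass)
    observed = percent_agreement(first_pass, second_pass)
    counts_1 = Counter(first_pass)
    counts_2 = Counter(second_pass)
    # Chance agreement: probability both rounds pick the same label independently.
    expected = sum(
        counts_1[label] * counts_2[label] for label in counts_1.keys() | counts_2.keys()
    ) / (n * n)
    if expected == 1.0:
        return 1.0  # only one label was ever used, so agreement is trivially perfect
    return (observed - expected) / (1 - expected)


# Hypothetical codes for a ten-response re-coding sample.
round_1 = ["convenience", "privacy", "convenience", "vandalism", "privacy",
           "convenience", "other", "privacy", "convenience", "vandalism"]
round_2 = ["convenience", "privacy", "convenience", "vandalism", "privacy",
           "convenience", "convenience", "privacy", "convenience", "vandalism"]

print(f"agreement: {percent_agreement(round_1, round_2):.0%}")
print(f"kappa: {cohens_kappa(round_1, round_2):.2f}")
```

Percent agreement is easy to communicate, while kappa guards against a high rate that arises only because one code dominates the sample.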

Synthesis: I started the synthesis with descriptive statistics for the highest level of codes (themes) and used secondary and tertiary codes to synthesize emerging themes. The insights surfaced were qualitative in nature, aimed at surfacing depth of perspective and experience, not quantitative (i.e., inferential or causal). However, the quantity of responses lent robustness to emergent themes, and I noted quantities for context in the report.
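The descriptive statistics for top-level themes can be produced with a simple tally. The sketch below assumes each coded response is a tuple of (theme, secondary code, tertiary code), mirroring the two to three levels of codes described earlier; the labels and the function name are hypothetical, and the analysis itself was organized in spreadsheets rather than code.

```python
from collections import Counter


def theme_summary(coded_responses):
    """Count responses per top-level theme and report each theme's share.

    Returns a list of (theme, count, percent) tuples, most frequent first.
    """
    counts = Counter(theme for theme, _secondary, _tertiary in coded_responses)
    total = sum(counts.values())
    if total == 0:
        return []
    return [
        (theme, n, round(100 * n / total, 1))
        for theme, n in counts.most_common()
    ]


# Hypothetical coded responses: (theme, secondary code, optional tertiary code).
coded = [
    ("convenience", "time and effort", None),
    ("convenience", "awareness", "did not know how"),
    ("privacy", "safety from aggression", None),
    ("vandalism", "variable IPs", None),
]

for theme, n, pct in theme_summary(coded):
    print(f"{theme}: {n} responses ({pct}%)")
```

Counts like these give context for how common a theme was, without implying inferential or causal claims.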

Presentation: The original research project was driven by the Foundation’s Growth Team, but IP editing is a topic of interest across product functions at the Foundation. I was given the opportunity to present the research to a cross-functional meeting of teams from both the US-based Wikimedia Foundation and the Design Research team at Wikimedia Deutschland.


Findings

In addition to more nuanced findings for each question analyzed, the following key themes emerged.

Convenient IP Editing: Respondents emphasized barriers related to efficiency (time and effort) and learnability (awareness) of logging in—indicating that some editors simply do not log in because it is more convenient not to. Respondents suggested enhancing and communicating the benefits of logging in to increase the draw of logging in.

Vandalism Concerns on Both Sides: Concerns about managing vandalism were raised in arguments both for and against IP editing. Many respondents attributed vandalism to IP editors and flagged issues with variable IPs. Others raised concerns that an alternative to IP editing could eliminate a critical patrolling tool. Offering alternative mechanisms for vandalism management and communicating those mechanisms to communities could help mitigate concerns on both sides.

Well-Intentioned IP Editing: There is evidence of good-faith, significant, and intentionally anonymous (i.e., dissociated from a profile) engagement from IP editors on Japanese Wikipedia. For some, anonymity is key to their safety from aggression and matches their preferences for operating online. However, enabling alternative forms of unregistered editing or letting editors configure the visibility of their account history may provide suitable or even preferable alternatives to IP editing for some.

I have been editing for over 15 years without logging in. I realize that the eyes of logged-in users can be harsh, but that is why I am always careful to edit without violating the policy. I don’t understand the significance of encouraging people to log in at all.
— DeepL translation from Japanese

Impact


Due to regulatory and technical factors beyond the Foundation’s control, IP editing as it has functioned on Wikipedia cannot continue. The Foundation has been working closely with editing communities to find an alternative approach. However, logged-in editors tend to be more involved in collaboration with the Foundation than IP editors. Because IP editing is relatively more significant on Japanese Wikipedia (it is more common and more likely to result in edits that the community keeps), this research surfaced perspectives from a subset of Wikipedians who are difficult to reach. This study was one of many that have contributed to the Foundation’s approach to the IP Editing: Privacy Enhancement and Abuse Mitigation project, but it offered rare value in sharing the experience of editing on a Wikipedia where IP editing is prominent, including perspectives from many dedicated IP editors themselves. While IP editing is often associated with vandalism on Wikipedia, this research provided additional context on quality IP editing that the Foundation can seek to preserve in its alternative solution, which is currently evolving toward masking IPs and providing a temporary account name.

I feel like I’m in a toy factory.
— UX Designer who works on anonymity and security following the research presentation

Learnings

  • The bandwidth to conduct this level of analysis was a gift of an internship and an opportunity for me to refine my aptitude for analysis of qualitative data. With practice, coding became more efficient, and I could see the value of the time spent for high-stakes research questions for which statistically significant but open-ended responses would be critical. Overall, I was surprised to find how much color emerged from short text responses. For example, some responses poignantly showed that respondents did not agree with the framing of the questions, a nuance that is lost in closed-ended survey questions.

  • For future research, I would be cautious about relying on machine translation alone and would prefer to use it to expedite review by a fluent speaker. I had to constantly remind myself not to over-index on style or tone in my analysis because of the reliance on machine translation.

  • Because many teams’ work touches on IP editing from different angles, this research was broadly appealing, but it was difficult to determine what was still unknown and how the insights from this data could be used. It was unclear whether the resulting research was more than just interesting. In the future, I would connect with stakeholders earlier to better understand what unknowns they had in this domain and how answering those unknowns could inform decision making.

  • The value of this approach in part motivated me to further study quantitative research methods in the second year of my graduate program following my internship, so I could more readily apply tests of statistical and practical significance and effective data visualizations to analyses like these.

The full research report is available on Wikimedia Commons.