ANN ARBOR—Despite the massive amount of information created by social media, social scientists have not been able to reliably access the underlying data, undermining attempts to understand the impact of social media on society.
As someone who studies how people use social media to organize, discuss, and enact social change, University of Michigan professor Libby Hemphill knows this struggle firsthand.
"Some researchers studying social media have to do everything themselves," Hemphill said, "from getting approval to accessing data to writing programs to storing and analyzing data on their own."
And that's not all.
"They might need to know how to work with data in different formats. Or, they may need to pay for a service geared toward market research, which carries its own limitations on how they can download and use data," she said.
That is a daunting undertaking for anyone trying to access and understand social media data, Hemphill said.
For several years, Hemphill has been working to establish a social media archive at ICPSR. She says her work to date on the archive was made possible by a 2021 Propelling Original Data Science grant from the Michigan Institute for Data Science, called "Ensuring FAIRness in Social Media Archives."
Now, Meta and the University of Michigan have partnered to support the Social Media Archive (SOMAR).
"From the emotional well-being of local youth to the outcomes of global political processes, social media play a critical but poorly understood role," said ISR Director Kathleen Cagney. "At ISR and ICPSR, it is our imperative to shed light on these processes."
Led by Hemphill and housed in the Inter-university Consortium for Political and Social Research (ICPSR) at the U-M Institute for Social Research (ISR), the SOMAR project will democratize access to some of the most consequential information in contemporary society.
The $1.3M gift from Meta is an investment to support the vision of SOMAR and to help build it so that it continues to exist and support research for years to come.
“In order to help advance the world’s understanding of key social issues, we have provided a gift to support ICPSR's creation of a social media archive at the University of Michigan. This effort is part of our longstanding commitment to find the right ways to share data for the purpose of academic research,” said Pratiti Raychoudhury, Vice President and Head of Research at Meta.
"SOMAR will provide the foundation to address thousands of important research questions," said ISR Development Director Henry Jewell. "Once the data are made available to the research community, insights will begin to emerge immediately. Within a decade, we anticipate SOMAR will yield findings on election integrity in the time of social media, the way advertisers leverage social media data to influence consumers, and other critical issues. This is just the beginning," Jewell said.
SOMAR's home, ICPSR, has a long history of handling data with the utmost confidentiality and privacy. Stringent protections are in place for securing and distributing sensitive data. This attention to ethical data use is essential when it comes to the data of millions of social media users.
"In an increasingly data-driven world, ICPSR seeks to make data more accessible, more useful, and more understandable," said ICPSR Director Margaret Levenstein.
While the new social media archive is still in its early stages, existing social media data held at ICPSR will be cross-listed when SOMAR is up and running. The datasets include:
- "#MeToo Tweet IDs, October 15-28, 2017 (ICPSR 37447)," a collection of tweet IDs pertaining to the first two weeks of the #MeToo hashtag campaign in October 2017.
- "Appealing to the Base or to the Moveable Middle? Incumbents' Partisan Messaging Before the 2016 U.S. Congressional Elections," which contains weekly measures of partisanship for verified official U.S. Congress Twitter accounts for September-November 2016.
- "What Social Media Platforms Miss About White Supremacist Speech," which includes 274,668 posts scraped from Stormfront and 509,982 comments collected from the Reddit API.
Students and scholars around the world will use SOMAR data to conduct research about the phenomenon of social media use; its impacts on social, political, and psychological processes; and the views and behaviors of social media users. With services to support the analysis of this new kind of data, SOMAR will catalyze a new field of research, spurring potentially transformative discoveries.
In addition to removing data-access hurdles, SOMAR will offer training and outreach to help researchers and community members learn how to leverage social media data to form usable insights.
"A resource like SOMAR will lower persistent barriers to data access for researchers and is desperately needed," Levenstein said. "The future of our society depends on it."
What is the goal of the Social Media Archive (SOMAR)?
SOMAR will provide access to social media data and develop a robust set of wraparound services, including training in social media data use and learning opportunities for the community.
Who will be able to access the data, and how?
The SOMAR project will democratize access to some of the most consequential information in contemporary society. By providing a reliable, unbiased resource to researchers everywhere, ICPSR and SOMAR foster clarity and transparency during a time in which these qualities seem ever scarcer. Much of SOMAR's data will be available through restricted use applications and the data will be accessed through a virtual data enclave. Applications for data will be accepted starting in the summer of 2022.
Are there any publications yet from SOMAR data analysis, or any early findings?
While SOMAR is still in its early stages, there are already social media data held at ICPSR, which will be cross-listed when SOMAR is up and running. The datasets include:
- "#MeToo Tweet IDs, October 15-28, 2017 (ICPSR 37447)," a collection of tweet IDs pertaining to the first two weeks of the #MeToo hashtag campaign in October 2017.
- "Appealing to the Base or to the Moveable Middle? Incumbents' Partisan Messaging Before the 2016 U.S. Congressional Elections," which contains weekly measures of partisanship for verified official U.S. Congress Twitter accounts for September-November 2016.
- "What Social Media Platforms Miss About White Supremacist Speech," which includes 274,668 posts scraped from Stormfront and 509,982 comments collected from the Reddit API.
How is data confidentiality being handled in SOMAR?
ICPSR has extensive experience handling data with the utmost confidentiality and privacy. Stringent protections are in place for securing and accessing sensitive data and for ensuring that analyses of SOMAR data do not reveal sensitive information about individuals. This attention to ethical data use is essential when it comes to the data of millions of social media users.
How is SOMAR funded?
Existing work on SOMAR has been made possible by a $41K Propelling Original Data Science grant from the Michigan Institute for Data Science, called "Ensuring FAIRness in Social Media Archives." The $1.3M gift from Meta is an investment to support the vision of SOMAR and to help build it so that it continues to exist and support research for years to come. This is an opportunity for other funders to get involved, and potential supporters are encouraged to reach out to the ISR Development team.
Where are SOMAR data archived?
Users should assume all data in SOMAR are "restricted use" because of their sensitive nature. Some of the data will be accessible through ICPSR's Virtual Data Enclave. Data that platforms can't deposit, whether because the data are too large or are subject to government regulation, will be hosted and disseminated elsewhere. Access will depend on the dataset.
I'm a journalist. Who do I contact for SOMAR interview requests?
SOMAR project lead Libby Hemphill directs the Resource Center for Minority Data at ICPSR and holds a joint appointment as an associate professor at the U-M School of Information. You may also contact the SOMAR team at email@example.com.
I have social media data to share. How can I get involved?
SOMAR accepts data deposits from researchers and will build data-sharing partnerships with social media companies, thereby engaging key stakeholders in this complex and urgently important discussion. By lowering these longstanding barriers to rich datasets, SOMAR will enable the most effective social media research of our time. Institutions are encouraged to contact ISR Development Director Henry Jewell to join the movement to democratize social media data. Individual PIs are encouraged to email the SOMAR team at firstname.lastname@example.org.
- Meta builds technologies that help people connect, find communities and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps like Messenger, Instagram, and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology.
- One of the nation’s top public universities, the University of Michigan has been a leader in research, learning, and teaching for more than 200 years. With the highest research volume of all public universities in the country, U-M is advancing new solutions and knowledge in areas ranging from the COVID-19 pandemic to driverless vehicle technology, social justice, and carbon neutrality. Its main campus in Ann Arbor comprises 19 schools and colleges; there are also regional campuses in Dearborn and Flint, and a nationally ranked health system, Michigan Medicine. The university also boasts a world-renowned intercollegiate athletics program and has been the site of many important events in U.S. history, including JFK’s announcement of the Peace Corps, LBJ’s “Great Society” speech, and the clinical trials of the Salk polio vaccine. U-M’s alumni body is one of the largest in the world and includes a U.S. president, scientists, actors, astronauts, and inventors.
- An international consortium of more than 780 academic institutions and research organizations, the Inter-university Consortium for Political and Social Research (ICPSR) provides leadership and training in data access, curation, and methods of analysis for the social science research community. ICPSR is a unit within the Institute for Social Research at the University of Michigan.
Contact: Dory Knight-Ingram, email@example.com