Downloadable Data / Resources: Kalev H. Leetaru
This page lists several datasets that I receive a lot of requests for and have made available for open research.
- Chicago Tribune Study
- Drudge Report Study
- Soundbite University: 60 Years of University News Coverage
Chicago Tribune: Content Velocity Analysis
This study, conducted for the Center for Research Libraries on behalf of the Library of Congress, was designed to answer key questions about the Chicago Tribune's website over a one-month period from September to October 2010: the volume of new content added, the overall rate of change, the linking structure and ease of traversal for archival crawlers, and broader considerations of site structure, linking, and content characterization. Crawlers archived all 105 gateway pages every 30 minutes, producing a total of 136,605 snapshots of the site's content.
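As a quick sanity check on those figures, the snippet below works backward from the totals. It assumes, which the study does not state, that every gateway page was captured on every 30-minute cycle with no missed or duplicate captures. Under that assumption, 136,605 / 105 = 1,301 captures per page, which span 1,300 intervals, or about 27 days, consistent with the roughly one-month window. The function name `implied_crawl_span` is illustrative only.

```python
# Hypothetical consistency check: ASSUMES every one of the 105 gateway pages
# was captured on every 30-minute crawl cycle, with no missed or extra captures.
GATEWAY_PAGES = 105
TOTAL_SNAPSHOTS = 136_605
INTERVAL_MINUTES = 30


def implied_crawl_span(total_snapshots: int, pages: int, interval_minutes: int):
    """Return (captures per page, span in hours, span in days) under uniform capture."""
    captures_per_page, remainder = divmod(total_snapshots, pages)
    if remainder:
        raise ValueError("snapshot total is not a whole multiple of the page count")
    # N captures taken at a fixed interval span N - 1 intervals.
    span_hours = (captures_per_page - 1) * interval_minutes / 60
    return captures_per_page, span_hours, span_hours / 24


captures, hours, days = implied_crawl_span(TOTAL_SNAPSHOTS, GATEWAY_PAGES, INTERVAL_MINUTES)
print(f"{captures} captures per page, spanning {hours:.1f} hours (~{days:.1f} days)")
# prints "1301 captures per page, spanning 650.0 hours (~27.1 days)"
```

If the crawler skipped or repeated captures for some pages, the per-page count would vary and this back-calculation would only give an average.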
This study was conducted as part of a larger study by the Center for Research Libraries for the Library of Congress on the future of news in the digital era, presented in the report titled Preserving News in the Digital Environment: Mapping the Newspaper Industry in Transition.
Links
- Full Report (PDF)
- Library of Congress Report (PDF)
- Library of Congress Summary
- Domains By Links (XLS): Excel spreadsheet of all domains linked by the Chicago Tribune.
- Linked URLs by Lifespan (XLS): Excel spreadsheet of all URLs linked by the Chicago Tribune, with the number of hours each link was active.
- Gateway Pages (XLS): Excel spreadsheet of Chicago Tribune "gateway" pages.
- Statistics by Section (XLS): Excel spreadsheet of aggregate statistics for the Chicago Tribune by section.
- Leetaru, Kalev. (2010). Chicago Tribune: Content Velocity Analysis. In Preserving News in the Digital Environment: Mapping the Newspaper Industry in Transition. Alverson, Jessica; Leetaru, Kalev; McCargar, Victoria; Ondracek, Kayla; Simon, James; Reilly, Bernard. (2011). Center for Research Libraries on behalf of the Library of Congress. Data downloaded from http://contentanalysis.ichass.illinois.edu/data/
Drudge Report
The Drudge Report is one of the founding flag bearers of "new media": a U.S.-based news aggregator founded in the late 1990s that has developed a reputation for breaking tomorrow's news today. The site has become a powerful force in the U.S. media sphere, and its founder was named one of Time Magazine's most influential people in 2006. In existence for more than a decade, the Drudge Report makes an ideal case study for examining the "new media versus old media" argument. How dependent is such a "new media" aggregator on the "old media" it draws from, and how does it find its breaking stories? The study uses a cross-section of analytical techniques to demonstrate how to profile a news Web site, and finds that the Drudge Report relies heavily on wire services and obscure news outlets to find small stories that will break big tomorrow, making it highly dependent on mainstream "old media" sites.
Original Study
- New media vs. old media: A portrait of the Drudge Report 2002-2008. Leetaru, Kalev. (July 6, 2009). First Monday. Vol. 14, Issue 7. (Issue Headlining Article).
- Master-List-All-Drudge-Report-Links.txt. Master list of all URLs linked to by the Drudge Report 2002-2008, with the URL, link text, date the link first appeared (in YYYYMM format), and number of snapshots in which the link was present (multiply by 2 minutes to obtain the total length of time the link was live). Where the link text changed over time while the link URL stayed the same, the final link text is used.
- Link-Extract-Drudge-Report-Breitbart.txt. Custom extract of the master link list - includes only links to Breitbart.
- Link-Extract-Drudge-Report-Myway.txt. Custom extract of the master link list - includes only links to Myway.
- Link-Extract-Drudge-Report-Reuters.txt. Custom extract of the master link list - includes only links to Reuters.
- Link-Extract-Drudge-Report-Washington-Post.txt. Custom extract of the master link list - includes only links to Washington Post.
- Link-Extract-Drudge-Report-Yahoo.txt. Custom extract of the master link list - includes only links to Yahoo.
- Master-List-All-Drudge-Report-Link-Texts.txt. Master list of all link text used for links on the Drudge Report 2002-2008, with the link text and the number of snapshots in which that text was used for any link (multiply by 2 minutes to obtain the total length of time the text was displayed). Where the same link text was used for multiple links, or where the underlying link URL changed over time while the link text remained the same, the text is counted only once.
- Master-List-All-Drudge-Report-Link-Words.txt. Master list of all link text used for links on the Drudge Report 2002-2008, broken down by word.
- Master-List-All-Link-Words-By-Year.txt. Same as above, but further broken down by year.
- Master-List-All-Domains.txt. Master list of all domains linked to by the Drudge Report 2002-2008.
- Master-Geocoded-Domain-List.txt. Same as above, but with automatically generated information on the physical location of each domain's owner, including an approximate city-centroid latitude/longitude.
- Master-Geocoded-Domain-List-By-Year.txt. Same as above, but further broken down by year.
- Timeline-Updates-By-Month.txt. Total number of updates by month 2002-2008.
- Leetaru, Kalev. (2009). New media vs. old media: A portrait of the Drudge Report 2002-2008. First Monday. Vol. 14, Issue 7. Data downloaded from http://contentanalysis.ichass.illinois.edu/data/
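For readers working with the master link list, the following is a minimal Python sketch for loading records and converting snapshot counts into time on the page. The description above specifies the field order (URL, link text, YYYYMM first-seen date, snapshot count) and the 2-minute snapshot interval, but not the file layout; the tab delimiter, UTF-8 encoding, absence of a header row, and the names `DrudgeLink`, `parse_link_line`, and `read_links` are all assumptions, so check the actual file before relying on this.

```python
from dataclasses import dataclass
from typing import Iterator

# From the file description: each snapshot corresponds to 2 minutes.
MINUTES_PER_SNAPSHOT = 2


@dataclass(frozen=True)
class DrudgeLink:
    url: str
    link_text: str
    first_seen_year: int
    first_seen_month: int
    snapshots: int

    @property
    def minutes_alive(self) -> int:
        # Total time the link stayed on the page = snapshots x 2 minutes.
        return self.snapshots * MINUTES_PER_SNAPSHOT


def parse_link_line(line: str) -> DrudgeLink:
    """Parse one record, ASSUMING tab-separated columns: URL, link text, YYYYMM, snapshots."""
    # Split the URL off the left and the two numeric fields off the right,
    # so a stray tab inside the link text cannot shift the columns.
    url, rest = line.rstrip("\r\n").split("\t", 1)
    link_text, yyyymm, snapshots = rest.rsplit("\t", 2)
    if len(yyyymm) != 6 or not yyyymm.isdigit():
        raise ValueError(f"expected YYYYMM, got {yyyymm!r}")
    return DrudgeLink(url, link_text, int(yyyymm[:4]), int(yyyymm[4:]), int(snapshots))


def read_links(path: str) -> Iterator[DrudgeLink]:
    """Yield one DrudgeLink per non-blank line of the (assumed) tab-delimited file."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for line in handle:
            if line.strip():
                yield parse_link_line(line)


# Hypothetical record, for illustration only (not taken from the dataset).
sample = parse_link_line("https://example.com/story\tEXAMPLE HEADLINE\t200801\t30\n")
print(sample.first_seen_year, sample.first_seen_month, sample.minutes_alive)  # prints "2008 1 60"
```

Splitting the URL from the left and the two numeric fields from the right keeps the parser correct even if a link text happens to contain a tab. If the file turns out to have a header row or a different delimiter, `read_links` would need to skip or adjust for it.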
Soundbite University: 60 Years of University News Coverage
Soundbite University is a large-scale study conducted by Kalev Leetaru and Paul Magelli exploring the broader changes in how higher education has been covered in the national press over the last 60 years. More than 18 million documents, comprising the entire run of the New York Times from 1945 to 2005, were examined for all references to United States research universities. These references were then compared against spatial, temporal, and a variety of institutional indicators to examine how coverage changed over this period and which characteristics are most commonly associated with elevated national press visibility. One of the most surprising findings is the transition of the research university from newsmaker to news commentator, suggesting a need for universities to profoundly change the ways in which they interact with the press, especially as we enter a new era in media.
While individual institutional trends are available as interactive graphs on the project website, we have made the entire set of timelines available here as an Excel spreadsheet to assist in further research.
Links
- Original Study Website
- Download Complete Set of University Attributes (Format: MS Excel 2010 .xlsx / Size: 59K)
- Download Complete Set of University Timelines (Format: MS Excel 2010 .xlsx / Size: 518K)
- Leetaru, Kalev & Magelli, Paul. (2010). The Soundbite University: 60 Years of University News Coverage. The American Council on Education's The Presidency. http://www.ichass.illinois.edu/SoundbiteUniversity/. Data downloaded from http://contentanalysis.ichass.illinois.edu/data/
