Thursday, August 21, 2014

Using Twitter for Perceiving Unemployment in Real Time

The official unemployment rate estimates, released early each month, are based on a monthly survey. It's a good survey, even an excellent survey, but the data is inevitably a month old. In addition, any survey is somewhat constrained by the specific wording of its questions and definitions. Would it be possible to get a faster and reasonably accurate view of labor market conditions by looking at mentions of certain key terms on Twitter and other social media? The University of Michigan Economic Indicators from Social Media project has started a research program on this topic. The first research paper up at the site is "Using Social Media to Measure Labor Market Flows," by Dolan Antenucci, Michael Cafarella, Margaret C. Levenstein, Christopher Ré, and Matthew D. Shapiro, which is based on data from 19.3 billion Twitter messages sent between July 2011 and November 2013--which is about 10% of all the tweets sent in that time.

For those who want detail on how the official unemployment rate is calculated, the Bureau of Labor Statistics published a short memo on "How the Government Measures Unemployment" in June 2014. Basically, the government has been conducting the Current Population Survey (CPS) every month since 1940, although the survey has evolved into its current form over time. As the BLS describes it:

There are about 60,000 eligible households in the sample for this survey. This translates into approximately 110,000 individuals each month, a large sample compared to public opinion surveys, which usually cover fewer than 2,000 people. The CPS sample is selected so as to be representative of the entire population of the United States ... Every month, one-fourth of the households in the sample are changed, so that no household is interviewed for more than 4 consecutive months. After a household is interviewed for 4 consecutive months, it leaves the sample for 8 months, and then is again interviewed for the same 4 calendar months a year later, before leaving the sample for good. As a result, approximately 75 percent of the sample remains the same from month to month and 50 percent remains the same from year to year. This procedure strengthens the reliability of estimates of month-to-month and year-to-year change in the data.

Each month, highly trained and experienced Census Bureau employees contact the 60,000 eligible sample households and ask about the labor force activities (jobholding and job seeking) or non-labor force status of the members of these households during the survey reference week (usually the week that includes the 12th of the month).
Although the headline unemployment rate and total jobs number get most of the attention, the survey also tries to explore whether those not looking for jobs are "discouraged" workers who would actually like a job, but have given up looking, or whether they are part-time workers who would prefer a full-time job.
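The 75 percent and 50 percent overlap figures in the BLS quote follow directly from the 4-8-4 rotation scheme. As a small illustration of the arithmetic (my own sketch, not code from the BLS), one can model a household entering in month c as being in the sample during months c through c+3 and again c+12 through c+15, and then count how much of the sample two months share:

```python
# Each household is in the CPS sample at these offsets (months since
# entry): 4 consecutive months, 8 months out, then 4 more months.
OFFSETS = {0, 1, 2, 3, 12, 13, 14, 15}

def sample(month):
    """Entry-month cohorts that are in the sample during a given month."""
    return {month - o for o in OFFSETS}

def overlap(months_apart, month=100):
    """Fraction of one month's sample still present months_apart later."""
    a, b = sample(month), sample(month + months_apart)
    return len(a & b) / len(a)

print(overlap(1))   # month-to-month overlap: 0.75
print(overlap(12))  # year-to-year overlap:   0.5
```

The month-to-month overlap is 6 of the 8 rotation groups (75 percent), and the year-to-year overlap is 4 of 8 (50 percent), matching the figures the BLS reports.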

At present, perhaps the main source of data on labor markets that comes out more frequently than the unemployment rate itself is the data on initial claims for unemployment insurance, which comes out weekly (for example, here). However, this data can be an imperfect indicator--or as economists would say, a "noisy" indicator--of the actual state of the labor market. Not everyone who becomes unemployed applies for unemployment insurance or is eligible for it, and many of the long-term unemployed are no longer eligible for unemployment insurance. So the practical question about using Twitter or other social media to look at labor markets is not whether they offer a perfect picture, but whether the information from such estimates is less "noisy" and more useful than the data from the initial claims for unemployment insurance.

The University of Michigan researchers searched the 19.3 billion tweets for terms of four words or less related to job loss. Some examples would include four-word blocks of text that include the words axed, canned, downsized, outsourced, pink slip, lost job, fired job, been fired, laid off, and unemployment. Some experimentation and analysis is involved in choosing terms. For example, it turned out that "let go" was used much more frequently than any other term on this list, presumably because there were many four-word blocks of text that used "let" and "go" but weren't related to labor market issues.
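To make the matching approach concrete, here is a simplified sketch of what scanning four-word blocks of a tweet for job-loss phrases might look like. The term list is abridged from the examples above, and the actual classifier in the paper is more involved than this:

```python
import re

# Abridged signal phrases; a phrase matches if all of its words
# appear together inside a four-word window of the tweet.
SIGNALS = {("lost", "job"), ("laid", "off"), ("pink", "slip"),
           ("axed",), ("canned",), ("downsized",), ("outsourced",)}

def mentions_job_loss(tweet):
    words = re.findall(r"[a-z']+", tweet.lower())
    # slide a four-word window across the tweet
    for i in range(max(1, len(words) - 3)):
        window = words[i:i + 4]
        for signal in SIGNALS:
            if all(term in window for term in signal):
                return True
    return False

print(mentions_job_loss("just got laid off from the plant"))  # True
print(mentions_job_loss("meeting friends for dinner tonight"))  # False
```

This toy version also shows why "let go" is troublesome: a window-based match on "let" and "go" would fire on tweets like "let's go to the game" that have nothing to do with losing a job, which is why such terms need extra scrutiny.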

Each week, the Michigan group plans to publish a comparison between the official unemployment insurance claims data and a prediction based purely on its Twitter-based methodology. Here's the current figure:


As you can see, the patterns are similar, which is somewhat remarkable: social media content tracks the official statistics reasonably well. The patterns are not identical, which is unsurprising, because they are after all measuring different things. The interesting question then becomes: is there some additional information or value to be gained about the state of the labor market from looking at the social-media-based index?

In certain specific cases, the answer seems clearly to be "yes." For example, the authors explain that the official data on unemployment insurance claims showed a big drop in September 2013 that occurred because of a data processing issue in California--that is, it wasn't a real effect. The social media prediction shows no such decline. More broadly, the authors look at the predictions from market experts a few days before the unemployment insurance claims data comes out, and they find that the social media measure would improve these predictions.

The researchers are looking at how social media might reflect various other measures of labor markets, including job search, job postings, and how labor markets react to short-term events like Hurricane Sandy. The goal, of course, is to develop methods that give a reasonably reliable real-time sense of how the economy is evolving based on immediately available data.

For those interested in doing their own research project based on collecting publicly available data from the web, a useful overall starting point is the article by Benjamin Edelman, "Using Internet Data for Economic Research," in the Spring 2012 issue of the Journal of Economic Perspectives, where I have worked as Managing Editor since the first issue back in 1987. As with all JEP articles, it is freely available on-line compliments of the American Economic Association. Social science researchers are busily writing programs that collect data on search queries, on how prices change in a wide variety of databases, and much more.