A colleague at the University Archives first told me about the Marie Augusta Neal Papers, which were donated to the University of Notre Dame in 1996. The finding aid shows the comprehensiveness and scale of the collection. The one significant piece of work that underpins Sister Neal’s scholarly activity, however, is the Sister Survey of 1967. With 649 variables and responses from over 130,000 Catholic sisters, it is believed to be “the largest, single, data gathering event ever performed with regard to women religious”.
Sister Neal was a sociologist at Emmanuel College in Boston, Massachusetts. She received a PhD from Harvard University in 1963. Her research dealt with renewal in religious life, using a combined method of sociology and the prophetic tradition of biblical religion.
The 1967 Sister Survey was a population attitude survey designed to assess American sisters’ readiness for renewal. It was sent to nearly every active sister in the United States: 157,917 surveys were mailed out in April 1967, asking questions about theology, authority, governance and lifestyle. The survey’s critics regarded it as an indoctrination tool, designed to introduce widely radical renewal concepts [see Sisters in Crisis: The Tragic Unraveling of Women’s Religious Communities].
According to the notes by the Harvard programmer who initially assembled and processed the survey, data was stored on “seven reels of 800 dpi tapes, lrecl 120, blocksize 12,000, approximately 810,000 records in all”. The data was extracted from the original EBCDIC-format tapes and converted to newer formats in 1996, before being deposited with the University Archives. Under the custodianship of Notre Dame, the survey data was transferred to CDs and then to a computer hard disk in 1999. Fortunately, the 1967 survey data has survived these format migrations. Some other data in the collection, however, was lost: at least three tape reels could not be read during the 1996 migration exercise, and at least one file could not be copied in 1999.
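To make the tape description a little more concrete, here is a minimal Python sketch of how fixed-length EBCDIC records can be split and decoded. It is not the 1996 migration procedure, which is not documented here. The record length of 120 is taken from the programmer's notes; the choice of code page `cp037` is an assumption (the notes do not say which EBCDIC variant was used), and the function name and sample bytes are made up for illustration.

```python
RECORD_LENGTH = 120  # record length as given in the programmer's notes


def decode_records(raw, record_length=RECORD_LENGTH, encoding="cp037"):
    """Split raw bytes into fixed-length records and decode each one.

    The default "cp037" is one common EBCDIC code page; it is an
    assumption here, not something stated in the notes.
    """
    if len(raw) % record_length != 0:
        raise ValueError("data length is not a whole number of records")
    return [
        raw[start:start + record_length].decode(encoding)
        for start in range(0, len(raw), record_length)
    ]


# Made-up sample: two 120-character records, encoded as EBCDIC bytes.
sample = ("1" * RECORD_LENGTH + "2" * RECORD_LENGTH).encode("cp037")
records = decode_records(sample)
```

With a record length of 120, a block of 12,000 bytes would hold 100 records, which is consistent with the figures in the notes.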
The survey data had not been used in the 18 years since 1996, a span nicely and appropriately described by the colleague as “a lifetime in the digital world”.
When I first received the data, the files came without any file extension. Running them through DROID did not deliver any useful information. Knowing that we were looking at cross-tabular data, possibly produced with statistical software, I then tried a few obvious extensions. I could open the files, even with Excel. The files, however, seemed to contain endless lines of numbers without much meaning, so the first challenge was knowing how to truncate the data, that is, where each record begins and ends. According to one set of notes from 1993, a “/” denotes the beginning of each record, so I looked for “/”s for hours, in vain: they were simply not there.
The truncation problem was later resolved by another colleague, our Economics Librarian, who has a much better understanding of survey data – “I just saw the pattern”, he explained.
Things went smoothly from this point onwards. The dataset has now been reformatted and stored in .dta and .csv formats. We also recreated the “codebook”: our librarian wrote scripts that extracted all the questions and pre-defined responses (a bit like a drop-down list of possible answers) from PDFs and pulled them together in one document. The dataset is now in the best possible format for re-use. We are just dotting the i’s and crossing the t’s before releasing it publicly.
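For readers curious what this kind of reformatting involves, the following is a rough Python sketch of slicing fixed-width records into named columns and writing them out as CSV. It is not the script our librarian wrote. The column names, offsets and sample lines are entirely hypothetical; in practice the positions would come from the codebook.

```python
import csv
import io

# Hypothetical layout: (column name, start offset, end offset).
# These positions are made up; the real ones would come from the codebook.
LAYOUT = [("respondent_id", 0, 6), ("q1", 6, 7), ("q2", 7, 8)]


def parse_record(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of column name -> value."""
    return {name: line[start:end].strip() for name, start, end in layout}


def records_to_csv(lines, layout=LAYOUT):
    """Return CSV text with a header row and one row per record."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=[name for name, _, _ in layout],
        lineterminator="\n",
    )
    writer.writeheader()
    for line in lines:
        writer.writerow(parse_record(line, layout))
    return buffer.getvalue()


# Made-up sample records: a six-digit id followed by two one-digit answers.
sample_lines = ["00000135", "00000241"]
csv_text = records_to_csv(sample_lines)
```

Values are kept as strings so that leading zeros in identifiers survive the conversion; turning answers into numbers or labelled categories is a separate step that depends on the codebook.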
This dataset is only one example of the digital collection items that require intervention or preservation actions. A few takeaways:
- Active use seems to be the best way to monitor and detect digital obsolescence.
- Metadata really is essential. Without the notes, finding aid and scanned codebook, we would not have been able to make sense of the dataset.
- Do not wait a lifetime to think about digital preservation. The longer you wait, the more difficult it gets.