The Bureau of Justice Statistics (BJS) has a ton of data regarding law-enforcement and incarceration. For a class research project, I am looking into examining data on the pre-incarceration incomes of incarcerated parents. For this topic, I am looking to examine three datasets: the Survey of Inmates in Federal Correctional Facilities (SIFCF), the Survey of Inmates in State Correctional Facilities (SISCF), and the Survey of Inmates in Local Jails (SILJ). I will be downloading the most recent datasets from each, which includes 2004 data from the SIFCF and SISCF as well as 2002 data from the SILJ.
This data, for the most part, is formatted to be accessible in closed-source programs that I would need to purchase a license to use. There are two problems with that: federally funded data should not require the public to purchase software to access it, and I don't want to use that software. Here's how to circumvent that and import the data from ASCII + SAS format into R:
#download datasets
#jails: http://www.icpsr.umich.edu/icpsrweb/ICPSR/studies/4359
#prisons: http://www.icpsr.umich.edu/icpsrweb/NACJD/studies/4572
#I am not allowed to re-distribute this original dataset
#use "ASCII + SAS setup" links
#unzip and concatenate datasets to same two folders (jails: "ICPSR_04359", prisons: "ICPSR_04572")
#set dir to dir with both data folders
setwd("YOURDIRECTORYHERE")
#convert data to R, save in repsective folders
library(SAScii)
#SAScii package allows R to follow .sas input directions for .txt ASCII files, so long as the user specifies line in the .sas file at which "INPUT" starts manually
#Use a text editor to open .sas files and find "INPUT" lines
#define input lines
input04359_DS0001 <- 3681
input04359_DS0002 <- 50
input04572_DS0001 <- 3904
input04572_DS0002 <- 3898
input04572_DS0003 <- 424
input04572_DS0004 <- 424
#define lrcel, also manually from .sas files
lrcel04359_DS0001 <- 4679
lrcel04359_DS0002 <- 14326
lrcel04572_DS0001 <- 8845
lrcel04572_DS0002 <- 8845
lrcel04572_DS0003 <- 2580
lrcel04572_DS0004 <- 2580
#load data. takes a long time.
data04359_DS0001 <- read.SAScii("ICPSR_04359/DS0001/04359-0001-Data.txt", "ICPSR_04359/DS0001/04359-0001-Setup.sas", input04359_DS0001, lrecl = lrcel04359_DS0001)
data04359_DS0002 <- read.SAScii("ICPSR_04359/DS0002/04359-0002-Data.txt", "ICPSR_04359/DS0002/04359-0002-Setup.sas", input04359_DS0002, lrecl = lrcel04359_DS0002)
data04572_DS0001 <- read.SAScii("ICPSR_04572/DS0001/04572-0001-Data.txt", "ICPSR_04572/DS0001/04572-0001-Setup.sas", input04572_DS0001, lrecl = lrcel04572_DS0001)
data04572_DS0002 <- read.SAScii("ICPSR_04572/DS0002/04572-0002-Data.txt", "ICPSR_04572/DS0002/04572-0002-Setup.sas", input04572_DS0002, lrecl = lrcel04572_DS0002)
data04572_DS0003 <- read.SAScii("ICPSR_04572/DS0003/04572-0003-Data.txt", "ICPSR_04572/DS0003/04572-0003-Setup.sas", input04572_DS0003, lrecl = lrcel04572_DS0003)
data04572_DS0004 <- read.SAScii("ICPSR_04572/DS0004/04572-0004-Data.txt", "ICPSR_04572/DS0004/04572-0004-Setup.sas", input04572_DS0004, lrecl = lrcel04572_DS0004)
#save in "data" folder
dir.create("data")
setwd("data/")
save(data04359_DS0001,file="data04359_DS0001.Rda")
save(data04359_DS0002,file="data04359_DS0002.Rda")
save(data04572_DS0001,file="data04572_DS0001.Rda")
save(data04572_DS0002,file="data04572_DS0002.Rda")
save(data04572_DS0003,file="data04572_DS0003.Rda")
save(data04572_DS0004,file="data04572_DS0004.Rda")
4572_DS0002$V1825)
And that's it. Analyze away. This script is on GitHub.
Nice. Is that original code or did you grab it from somewhere? It's like ripping YouTube videos to .mp4 files, but a more hardcore version of it, lol
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Yep! Thanks for reminding me actually, I forgot to include a link to the GitHub repository I have for this.
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit
Clueless here, but I appreciate your point!
Downvoting a post can decrease pending rewards and make it less visible. Common reasons:
Submit