Use the read.csv function to read a CSV file.
See the example below:
data <- read.csv('dat/data.csv', sep=",", header = T, stringsAsFactors=FALSE)
If your file uses a different separator, change the sep argument, for example sep=";" for a semicolon-separated file.
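As a minimal sketch of the sep argument, the snippet below writes a small semicolon-separated file to a temporary location and reads it back. The file contents and the names path and fines are made up for illustration.

```r
# Hypothetical sample data, written to a temporary file for this example.
path <- tempfile(fileext = ".csv")
writeLines(c("LUGAR;MES;PUNTOS",
             "PO ALABARDEROS 24;4;0",
             "AV GLORIETAS-RDA SUR;4;3"), path)

# sep = ";" tells read.csv that the fields are separated by semicolons.
fines <- read.csv(path, sep = ";", header = TRUE, stringsAsFactors = FALSE)
print(fines)
```

With the default sep="," the same file would be read as a single column, because no commas separate the fields.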
After reading the CSV file, check the dimensions of your data frame and the classes of its columns. dim gives the dimensions, class gives the type of a column (lapply applies it to every column), and str gives a summary of the structure of the object.
dim(data)
lapply(data, class)
str(data)
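To try these three calls without a file, here is a small sketch on a hand-built data frame. The values are invented and only stand in for the result of read.csv; sapply is used as a compact alternative to lapply that returns a named vector instead of a list.

```r
# Hypothetical data frame standing in for the result of read.csv.
data <- data.frame(LUGAR = c("PO ALABARDEROS 24", "AV GLORIETAS-RDA SUR"),
                   MES = c(4L, 4L),
                   IMP_BOL = c(200, 90),
                   stringsAsFactors = FALSE)

dim(data)             # number of rows, then number of columns
sapply(data, class)   # class of each column, as a named character vector
str(data)             # compact summary of the structure
```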
The file is available at the URL used in the command below.
First try to read it:
multas <- read.csv('http://datos.madrid.es/datosabiertos/MULTAS/2015/04/201504_detalle.csv', sep=";", header = T, stringsAsFactors=FALSE)
Note that we are reading directly from the website and not from a local file.
> dim(multas)
[1] 186619 12
> lapply(multas, class)
$CALIFICACION
[1] "character"
$LUGAR
[1] "character"
$MES
[1] "integer"
$ANIO
[1] "integer"
$HORA
[1] "numeric"
$IMP_BOL
[1] "numeric"
$DESCUENTO
[1] "character"
$PUNTOS
[1] "integer"
$DENUNCIANTE
[1] "character"
$HECHO.BOL
[1] "character"
$VEL_LIMITE
[1] "integer"
$VEL_CIRCULA
[1] "integer"
> str(multas)
'data.frame': 186619 obs. of 12 variables:
$ CALIFICACION: chr "GRAVE " "LEVE " "GRAVE " "GRAVE " ...
$ LUGAR : chr "M 30 KM 29 CALZADA 1 " "AV GLORIETAS-RDA SUR " "PO ALABARDEROS 24 " "KM 12, M-30 CALZADA 1 " ...
$ MES : int 4 4 4 4 4 4 4 4 4 4 ...
$ ANIO : int 2015 2015 2015 2015 2015 2015 2015 2015 2015 2015 ...
$ HORA : num 9.45 13.35 13.5 16.35 12.05 ...
$ IMP_BOL : num 200 100 200 100 90 200 200 90 200 90 ...
$ DESCUENTO : chr "SI" "SI" "SI" "SI" ...
$ PUNTOS : int 0 0 0 0 0 4 0 0 3 0 ...
$ DENUNCIANTE : chr "POLICIA MUNICIPAL " "POLICIA MUNICIPAL " "POLICIA MUNICIPAL " "POLICIA MUNICIPAL " ...
$ HECHO.BOL : chr "CONDUCCION NEGLIGENTE: CIRCULAR POR ENCIMA DE VELOCIDAD REBASANDO VHOS " "CIRCULAR POR ZONA RESERVADA AL USO EXCLUSIVO DE PEATONES. " "ESTACIONAR EN ZONA SE\xd1ALIZADA PARA USO EXCLUSIVO DE PERSONAS CON MOVILIDAD REDUCIDA. "| __truncated__ "SOBREPASAR LA VELOCIDADM\xc1XIMA EN V\xcdAS LIMITADAS EN 60 km/h O M\xc1S. "| __truncated__ ...
$ VEL_LIMITE : int NA NA NA 80 NA NA NA NA NA NA ...
$ VEL_CIRCULA : int NA NA NA 84 NA NA NA NA NA NA ...
Sometimes you will need to fix problems that appear when reading a file.
If reading a CSV file fails with the error below, you can add row.names=NULL to the read.csv call.
The error:
multas <- read.csv('~/git/github.com/it4dgroup/datosabiertosespana/dat/ayuntamiento-madrid/multas/201504_detalle.csv', sep=",", header = T, stringsAsFactors=FALSE)
Error in read.table(file = file, header = header, sep = sep, quote = quote, :
duplicate 'row.names' are not allowed
The solution:
multas <- read.csv('~/git/github.com/it4dgroup/datosabiertosespana/dat/ayuntamiento-madrid/multas/201504_detalle.csv', sep=",", header = T, stringsAsFactors=FALSE, row.names=NULL)
In the case above, however, the real problem was the separator. Change sep="," to sep=";" and the file is read correctly.
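One way to spot a wrong separator before calling read.csv is to look at the raw lines and count the fields on each line. The sketch below uses readLines and count.fields on a made-up temporary file whose first column contains a comma. It is an illustration of the idea, not a reproduction of the exact error above.

```r
# Hypothetical semicolon-separated file; one LUGAR value contains a comma.
path <- tempfile(fileext = ".csv")
writeLines(c("LUGAR;MES;PUNTOS",
             "KM 12, M-30 CALZADA 1;4;0",
             "PO ALABARDEROS 24;4;3"), path)

# Look at the first raw lines to see which separator the file really uses.
readLines(path, n = 2)

# Count the fields per line for each candidate separator.
fields_comma <- count.fields(path, sep = ",")
fields_semicolon <- count.fields(path, sep = ";")

# The right separator gives the same number of fields on every line.
print(fields_comma)
print(fields_semicolon)
```

When the counts differ from line to line, as they do here for the comma, the separator is probably wrong.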
To skip lines at the beginning of the file, add the skip argument with the number of lines to skip, for example skip=1:
multas <- read.csv('~/git/github.com/it4dgroup/datosabiertosespana/dat/ayuntamiento-madrid/multas/201504_detalle.csv', sep=";", header = T, stringsAsFactors=FALSE, row.names=NULL, skip=1)
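The effect of skip can be sketched with a small temporary file that has an extra line before the header. The file contents and the names path and fines are hypothetical.

```r
# Hypothetical file with one extra line before the real header.
path <- tempfile(fileext = ".csv")
writeLines(c("Exported report, April 2015",
             "LUGAR;MES;PUNTOS",
             "PO ALABARDEROS 24;4;0"), path)

# skip = 1 drops the first line, so the second line is used as the header.
fines <- read.csv(path, sep = ";", header = TRUE,
                  stringsAsFactors = FALSE, skip = 1)
names(fines)
```

Note that skip counts lines from the top of the file, so with header = T the header is taken from the first line that is not skipped.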
If the file has no header row, change header = T to header = F.
T is shorthand for TRUE (the first line contains the column names) and F is shorthand for FALSE (the first line is data).
See the example below with header = F:
multas <- read.csv('~/git/github.com/it4dgroup/datosabiertosespana/dat/ayuntamiento-madrid/multas/201504_detalle.csv', sep=";", header = F, stringsAsFactors=FALSE)
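As a small sketch of what header = F does to the column names, the snippet below reads a made-up file without a header row, first with the default names and then with names supplied through col.names. The column names chosen here are assumptions for the example.

```r
# Hypothetical file that contains data only, with no header row.
path <- tempfile(fileext = ".csv")
writeLines(c("PO ALABARDEROS 24;4;0",
             "AV GLORIETAS-RDA SUR;4;3"), path)

# Without a header, R names the columns V1, V2, V3.
no_names <- read.csv(path, sep = ";", header = FALSE, stringsAsFactors = FALSE)
names(no_names)

# col.names supplies the names when the file does not provide them.
with_names <- read.csv(path, sep = ";", header = FALSE, stringsAsFactors = FALSE,
                       col.names = c("LUGAR", "MES", "PUNTOS"))
names(with_names)
```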
Other links: http://www.statmethods.net/input/missingdata.html
http://science.nature.nps.gov/im/datamgmt/statistics/r/fundamentals/manipulation.cfm