R tip: save time and space by compressing data files

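Starting with version 2.10, R can read GZIP-compressed data files directly. The benefit is obvious: compressed files take up much less disk space, often 50% or more less. But does reading a compressed file make R slower? Let's find out.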

First, generate 10 million random numbers, arrange them in a matrix with 1000 columns, and write it to a text file:

# generate the data: 10,000 rows x 1000 columns
x <- matrix(rnorm(1e7), ncol=1000)

# write it to a comma-separated text file
write.table(x, file="bigdata.txt", sep=",", row.names=FALSE, col.names=FALSE)

Next, make a copy of the file and compress it with gzip:

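# copy the file, so the uncompressed original is kept
system("cp bigdata.txt bigdata-compressed.txt")

# compress the copy (this produces bigdata-compressed.txt.gz)
system("gzip bigdata-compressed.txt")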
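The two system() calls above assume a Unix-like shell with cp and gzip available. A portable alternative is to write the compressed file from R itself through a gzfile() connection, which is part of base R. The sketch below does this at a small scale; write_gz_csv and the file name are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: write a matrix as gzip-compressed CSV from within R,
# without calling the external cp/gzip tools through system().
write_gz_csv <- function(x, path) {
  con <- gzfile(path, open = "wt")   # "wt" = open for writing, text mode
  on.exit(close(con))                # always close the connection
  write.table(x, file = con, sep = ",", row.names = FALSE, col.names = FALSE)
  invisible(path)
}

# Small-scale usage (the post itself uses 1e7 values)
x_small <- matrix(rnorm(1e4), ncol = 100)
gz_path <- file.path(tempdir(), "small-compressed.txt.gz")
write_gz_csv(x_small, gz_path)
```

The resulting file can be read back with read.table(gz_path, sep = ",") exactly like the one produced by the gzip command.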

Now compare the file sizes:

compr <- file.info("bigdata-compressed.txt.gz")$size
big <- file.info("bigdata.txt")$size
print(c(big, compr))
print(1-compr/big)

> print(c(big, compr))
[1] 181596432  83666283
> print(1-compr/big)
[1] 0.5392735
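The figures quoted below can be checked from the printed byte counts. The helper name size_report is hypothetical, and it assumes that "MB" means 2^20 bytes, which is how 181596432 bytes works out to roughly 173.

```r
# Hypothetical helper: convert byte counts to MiB and compute the saving
size_report <- function(big, compr) {
  mib <- 2^20                          # 1 MiB = 1,048,576 bytes
  c(original_MiB   = big / mib,
    compressed_MiB = compr / mib,
    saving         = 1 - compr / big)  # fraction of space saved
}

# Byte counts taken from the output above
r <- size_report(181596432, 83666283)
round(r, 2)
```

The result gives about 173.2 MiB for the original, about 79.8 MiB for the compressed copy, and a saving of about 0.54, matching print(1-compr/big).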

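So the original file is about 173 MB and the compressed one just under 80 MB, a saving of about 54%. Now the main question: how quickly does R read each of them?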

> system.time(read.table("bigdata.txt", sep=","))
   user  system elapsed 
292.880   2.404 432.667 
> system.time(read.table("bigdata-compressed.txt.gz", sep=","))
   user  system elapsed 
188.616   1.592 240.393 

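The compressed file was read almost twice as fast: about 240 seconds elapsed versus about 433 seconds for the uncompressed one. That may look surprising, since the data also has to be decompressed. A plausible explanation is that reading roughly half as much data from disk saves more time than decompression costs, so the smaller file wins overall.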
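read.table() recognizes the gzip file by itself, but the same read can also be written with an explicit gzfile() connection. The small-scale sketch below (the file name is hypothetical) does this and additionally passes colClasses and nrows, which the ?read.table help page recommends for large inputs to reduce memory use; it then checks that both calls return the same data. Timings at this size will not reproduce the figures above.

```r
# Small demo file; the name is hypothetical
path <- file.path(tempdir(), "demo.txt.gz")
m <- matrix(rnorm(5000), ncol = 50)
con <- gzfile(path, "wt")
write.table(m, con, sep = ",", row.names = FALSE, col.names = FALSE)
close(con)

# Default call: R detects the gzip compression on its own
d1 <- read.table(path, sep = ",")

# Explicit connection, plus column type and row count hints
d2 <- read.table(gzfile(path), sep = ",",
                 colClasses = "numeric", nrows = nrow(m))

all.equal(d1, d2)   # both reads should yield the same data frame
```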

However, read.table() is considerably slower than scan(). Let's compare:

> system.time(scan("bigdata-compressed.txt.gz", sep=",", what=rep(0,1000)))
Read 10000000 items
   user  system elapsed 
 30.072   0.324  67.200 
> system.time(scan("bigdata.txt", sep=",", what=rep(0,1000)))
Read 10000000 items
   user  system elapsed 
 28.476   0.492  63.743 
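With scan(), both files take roughly the same time (about 67 versus 64 seconds elapsed), which is between about 3.5 and 7 times faster than the read.table() runs; the ?read.table help page itself suggests scan() for large numeric matrices. Note, though, that scan() returns one flat numeric vector, with values in the order they appear in the file, row by row, so the matrix has to be rebuilt with byrow = TRUE. The sketch below shows this on a small hypothetical file; what = numeric() has the same effect as the what = rep(0,1000) used above, since only the type of what matters here.

```r
# Small demo file; the name is hypothetical
path <- file.path(tempdir(), "scan-demo.txt.gz")
m <- matrix(rnorm(2000), ncol = 20)
con <- gzfile(path, "wt")
write.table(m, con, sep = ",", row.names = FALSE, col.names = FALSE)
close(con)

# scan() reads every value into a single numeric vector
v <- scan(path, sep = ",", what = numeric(), quiet = TRUE)

# Values come row by row, so refill the matrix by rows
m2 <- matrix(v, ncol = ncol(m), byrow = TRUE)
```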

Source: http://blog.revolutionanalytics.com/2009/12/r-tip-save-time-and-space-by-compressing-data-files.html
