How to read CSV data with unknown encoding in R

May 15, 2010

I have a .csv file that I can view on a web page, but when I read it into R, some of the data cannot be displayed. The data is available here: home.ustc.edu.cn/~lanrr/data.csv

    mydata = read.csv("http://home.ustc.edu.cn/~lanrr/data.csv", header = TRUE)
    View(mydata)
    # It shows this:
    # 9:39:37   665 600160  �޻��ɷ�  ����    ����    8.050   100 805.00  ��ȯ �ɽ�           ��ȯ����   e004017669  665
    # 2 9:39:38 697 930 ��������    ����    ����    4.360   283 1233.88       ����  �ɽ� ����Ʒ����   680001369   697

The data contains Chinese words. I don't know whether I need to change the encoding or something else. Has anyone met this problem before?

    mydata = read.csv("http://home.ustc.edu.cn/~lanrr/data.csv",
                      encoding = "UTF-8", header = TRUE, stringsAsFactors = FALSE)
    View(mydata)
    # 9:39:37   665 600160  <U+00BE><U+07BB><U+00AF><U+00B9><U+0277><dd>    <c2><f4>     <U+00B3><f6>  <c2><f2><c2><f4>    8.050   100 805.00  <c8><da><U+022F>        <U+00B3><U+027D><U+00BB>  <c8><da><U+022F><c2><f4><U+00B3><f6>    e004017669  665
    # 2 9:39:38 697 930 <d6><d0><U+0078><c9><fa><U+00BB><U+00AF>    <c2><f4>   <U+00B3><f6>  <c2><f2><c2><f4>    4.360   283 1233.88 <d0><c5><d3><c3>       <U+00B3><U+027D><U+00BB>  <U+00B5><U+00A3><U+00B1><U+00A3><U+01B7><c2><f4><U+00B3>    <f6>  680001369   697

    sessionInfo()
    # R version 2.15.2 (2012-10-26)
    # Platform: x86_64-redhat-linux-gnu (64-bit)
    #
    # locale:
    #  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
    #  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
    #  [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
    #
    # attached base packages:
    # [1] compiler  stats     graphics  grDevices utils     datasets  methods   base
    #
    # other attached packages:
    # [1] data.table_1.8.8 TTR_0.22-0       xts_0.9-3        zoo_1.7-9        timeDate_2160.97 Matrix_1.0-9     lattice_0.20-10
    #
    # loaded via namespace (and not attached):
    # [1] grid_2.15.2  tools_2.15.2

I finally solved it this way:

    Sys.setlocale("LC_COLLATE", "Chinese")
    Sys.setlocale("LC_CTYPE", "Chinese")
    Sys.setlocale("LC_MONETARY", "Chinese")
    Sys.setlocale("LC_TIME", "Chinese")
    Sys.setlocale("LC_MESSAGES", "Chinese")
    Sys.setlocale("LC_MEASUREMENT", "Chinese")
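As an alternative to switching the whole session locale, read.csv can be told the file's encoding through its fileEncoding argument, which re-encodes the input while reading. The sketch below is only an illustration under assumptions: the file is guessed to be in a GB-family encoding (GB18030 is used here), the session runs in a UTF-8 locale like the one in the sessionInfo() output above, and the sample file, its column names, and its contents are made up so that the example is self-contained.

```r
# Hypothetical sample data: one row whose second field is Chinese text,
# written with \u escapes so the script itself stays ASCII.
sample_text <- "time,side\n9:39:37,\u5356\u51fa\n"

# Write the sample to a temporary file as GB18030 bytes (assumed encoding).
tmp <- tempfile(fileext = ".csv")
gb_bytes <- iconv(sample_text, from = "UTF-8", to = "GB18030", toRaw = TRUE)[[1]]
writeBin(gb_bytes, tmp)

# fileEncoding tells R how the file is encoded, so the text is converted
# to the session's native encoding while it is read.
df <- read.csv(tmp, fileEncoding = "GB18030", stringsAsFactors = FALSE)
unlink(tmp)

print(df)
```

If the guess about the encoding holds for the real file, the same fileEncoding argument could be passed when reading data.csv. Which encoding names are available depends on the platform's iconv; iconvlist() lists them.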

You can use read.csv with the encoding set to UTF-8:

    df <- read.csv("data.csv", encoding = "UTF-8", stringsAsFactors = FALSE)

Setting stringsAsFactors = FALSE keeps the Chinese text as character strings rather than factors.
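One caveat may be worth checking here. As the read.table documentation describes it, the encoding argument only marks the strings as being in the given encoding and does not re-encode them, so it helps only when the file really is UTF-8. The sketch below illustrates the difference on four bytes that appear as <c2><f2><c2><f4> in the output quoted in the question. Treating them as GB18030 is a guess, not something the source confirms.

```r
# Four bytes copied from the <c2><f2><c2><f4> run in the question's output.
raw_bytes <- as.raw(c(0xc2, 0xf2, 0xc2, 0xf4))
s <- rawToChar(raw_bytes)

# Marking: roughly what encoding = "UTF-8" does. The bytes are untouched.
marked <- s
Encoding(marked) <- "UTF-8"
same_bytes <- identical(charToRaw(marked), raw_bytes)
marked_ok <- validUTF8(marked)

# Converting: re-encode from the assumed source encoding to UTF-8.
converted <- iconv(s, from = "GB18030", to = "UTF-8")
converted_ok <- validUTF8(converted)

cat("bytes unchanged by marking:", same_bytes, "\n")
cat("marked string is valid UTF-8:", marked_ok, "\n")
cat("converted string is valid UTF-8:", converted_ok, "\n")
```

In read.csv terms, marking corresponds to encoding and converting corresponds to fileEncoding.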

Note: I don't have the Chinese language pack installed in my environment, so I cannot determine whether the garbled characters in the .csv you provided are corrupted or simply unrecognized.
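One way to narrow down "corrupted or unrecognized" without any language pack is to test whether the raw bytes decode cleanly under a few candidate encodings. The sketch below is a rough heuristic, not a definitive detector: the helper name decodes_cleanly and the candidate list are made up for illustration, and the sample bytes are the <c2><f2><c2><f4> run from the question's output.

```r
# Hypothetical helper: for each candidate encoding, report whether iconv
# can convert the bytes to UTF-8 without hitting an invalid sequence.
decodes_cleanly <- function(bytes, candidates) {
  vapply(candidates, function(enc) {
    out <- iconv(list(bytes), from = enc, to = "UTF-8")
    !is.na(out)  # iconv returns NA when the input is invalid for `enc`
  }, logical(1))
}

sample_bytes <- as.raw(c(0xc2, 0xf2, 0xc2, 0xf4))
candidates <- c("UTF-8", "GB18030")  # assumed candidates, extend as needed

result <- decodes_cleanly(sample_bytes, candidates)
print(result)
```

A clean decode is necessary but not sufficient: single-byte encodings such as latin1 accept almost any byte sequence, so the decoded text still needs to be checked by someone who can read it. For a whole file, the bytes could be obtained with readBin(path, what = "raw", n = file.size(path)).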
