How to read CSV data with unknown encoding in R


I have some .csv data that I can view on the web page, but when I read it into R, some of the data cannot be displayed. The data is available here: home.ustc.edu.cn/~lanrr/data.csv

mydata = read.csv("http://home.ustc.edu.cn/~lanrr/data.csv", header = TRUE)
View(mydata)
# It shows this:
# 9:39:37   665 600160  �޻��ɷ�  ����    ����    8.050   100 805.00  ��ȯ �ɽ�           ��ȯ����   e004017669  665
# 2 9:39:38 697 930 ��������    ����    ����    4.360   283 1233.88       ����  �ɽ� ����Ʒ����   680001369   697

The data contains Chinese words. I don't know whether I need to change the encoding or do something else. Has anyone met this problem before?

mydata = read.csv("http://home.ustc.edu.cn/~lanrr/data.csv",
                  encoding = "UTF-8", header = TRUE, stringsAsFactors = FALSE)
View(mydata)
# 9:39:37   665 600160  <U+00BE><U+07BB><U+00AF><U+00B9><U+0277><dd>    <c2><f4>     <U+00B3><f6>  <c2><f2><c2><f4>    8.050   100 805.00  <c8><da><U+022F>        <U+00B3><U+027D><U+00BB>  <c8><da><U+022F><c2><f4><U+00B3><f6>    e004017669  665
# 2 9:39:38 697 930 <d6><d0><U+0078><c9><fa><U+00BB><U+00AF>    <c2><f4>   <U+00B3><f6>  <c2><f2><c2><f4>    4.360   283 1233.88 <d0><c5><d3><c3>       <U+00B3><U+027D><U+00BB>  <U+00B5><U+00A3><U+00B1><U+00A3><U+01B7><c2><f4><U+00B3>    <f6>  680001369   697
sessionInfo()
# R version 2.15.2 (2012-10-26)
# Platform: x86_64-redhat-linux-gnu (64-bit)
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8    LC_PAPER=C                 LC_NAME=C
# [9] LC_ADDRESS=C               LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] compiler  stats     graphics  grDevices utils     datasets  methods   base
#
# other attached packages:
# [1] data.table_1.8.8 TTR_0.22-0       xts_0.9-3        zoo_1.7-9        timeDate_2160.97 Matrix_1.0-9     lattice_0.20-10
#
# loaded via namespace (and not attached):
# [1] grid_2.15.2  tools_2.15.2
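
One way to narrow down an unknown encoding is to test the raw bytes against a few candidates. The sketch below is only a heuristic: the helper name guess_encoding and the candidate list are assumptions made for illustration and are not part of the original post. The sample uses the byte values C2 F4 B3 F6, which match escapes visible in the output above. They are not valid UTF-8 but do decode under GBK, a common encoding for simplified Chinese text.

```r
# Hypothetical helper: report "UTF-8" if the text is already valid UTF-8,
# otherwise return the first candidate encoding that iconv() can convert
# without failure (iconv() yields NA for input it cannot convert).
guess_encoding <- function(x, candidates = c("GBK", "Big5")) {
  if (all(validUTF8(x))) return("UTF-8")
  for (enc in candidates) {
    converted <- iconv(x, from = enc, to = "UTF-8")
    if (!anyNA(converted)) return(enc)
  }
  NA_character_
}

# Sample bytes C2 F4 B3 F6, assumed here to come from the problematic file.
sample_text <- rawToChar(as.raw(c(0xc2, 0xf4, 0xb3, 0xf6)))
detected <- guess_encoding(sample_text)
print(detected)
```

Single-byte encodings such as latin1 accept any byte sequence, so they are left out of the candidate list. A match here means "decodes without error", not "is certainly the right encoding", so the decoded text still needs a visual check.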

I finally solved it this way:

Sys.setlocale("LC_COLLATE", "chinese")
Sys.setlocale("LC_CTYPE", "chinese")
Sys.setlocale("LC_MONETARY", "chinese")
Sys.setlocale("LC_TIME", "chinese")
Sys.setlocale("LC_MESSAGES", "chinese")
Sys.setlocale("LC_MEASUREMENT", "chinese")

You can use read.csv with the encoding set to UTF-8:

df <- read.csv("data.csv", encoding = "UTF-8", stringsAsFactors = FALSE)

Setting stringsAsFactors to FALSE keeps the Chinese text as character strings, not factors.

Note: I don't have a Chinese language pack installed in my environment, so I cannot determine whether the garbled characters in the .csv you provided are corrupted or simply unrecognized.
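
It may help to see why declaring an encoding is different from converting to it. In read.csv, the encoding argument only marks the strings that are read as Latin-1 or UTF-8 and does not re-encode the input, while fileEncoding re-encodes the file as it is read. The sketch below mimics the two behaviours on a single string so it does not depend on the session locale. The sample bytes and the assumption that the file is GBK are illustrative, not confirmed by the original post.

```r
# Sample bytes C2 F4 B3 F6, assumed here to be GBK-encoded text.
x <- rawToChar(as.raw(c(0xc2, 0xf4, 0xb3, 0xf6)))

# Marking: declares the string to be UTF-8 but leaves the bytes untouched,
# which is what encoding = "UTF-8" does to strings read by read.csv().
marked <- x
Encoding(marked) <- "UTF-8"

# Converting: actually translates the bytes from GBK to UTF-8.
converted <- iconv(x, from = "GBK", to = "UTF-8")

print(validUTF8(marked))
print(validUTF8(converted))

# Under the same GBK assumption, the file could be re-encoded while reading:
# df <- read.csv("data.csv", fileEncoding = "GBK", stringsAsFactors = FALSE)
```

The fileEncoding call is left as a comment because it converts to the session's native encoding, so it only gives readable Chinese text when the locale can represent those characters (for example a UTF-8 locale).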

