How to read CSV data with unknown encoding in R
I have some .csv data that I can view correctly on the web page, but when I read it into R, some of the data cannot be displayed. The data is available here: home.ustc.edu.cn/~lanrr/data.csv
mydata = read.csv("http://home.ustc.edu.cn/~lanrr/data.csv", header = TRUE)
View(mydata)
# shows this:
# 9:39:37 665 600160 ���ɷ� ���� ���� 8.050 100 805.00 ��ȯ �ɽ� ��ȯ���� e004017669 665
# 2 9:39:38 697 930 �������� ���� ���� 4.360 283 1233.88 ���� �ɽ� ����Ʒ���� 680001369 697

The data contains Chinese words. I don't know whether I need to change the encoding or something else. Has anyone met this problem before?
mydata = read.csv("http://home.ustc.edu.cn/~lanrr/data.csv", encoding = "UTF-8", header = TRUE, stringsAsFactors = FALSE)
View(mydata)
# 9:39:37 665 600160 <U+00BE><U+07BB><U+00AF><U+00B9><U+0277><dd> <c2><f4> <U+00B3><f6> <c2><f2><c2><f4> 8.050 100 805.00 <c8><da><U+022F> <U+00B3><U+027D><U+00BB> <c8><da><U+022F><c2><f4><U+00B3><f6> e004017669 665
# 2 9:39:38 697 930 <d6><d0><U+0078><c9><fa><U+00BB><U+00AF> <c2><f4> <U+00B3><f6> <c2><f2><c2><f4> 4.360 283 1233.88 <d0><c5><d3><c3> <U+00B3><U+027D><U+00BB> <U+00B5><U+00A3><U+00B1><U+00A3><U+01B7><c2><f4><U+00B3> <f6> 680001369 697

sessionInfo()
# R version 2.15.2 (2012-10-26)
# Platform: x86_64-redhat-linux-gnu (64-bit)
#
# locale:
# [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
# [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8 LC_PAPER=C LC_NAME=C
# [9] LC_ADDRESS=C LC_TELEPHONE=C LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
#
# attached base packages:
# [1] compiler stats graphics grDevices utils datasets methods base
#
# other attached packages:
# [1] data.table_1.8.8 TTR_0.22-0 xts_0.9-3 zoo_1.7-9 timeDate_2160.97 Matrix_1.0-9 lattice_0.20-10
#
# loaded via namespace (and not attached):
# [1] grid_2.15.2 tools_2.15.2

I finally solved it this way:
Sys.setlocale("LC_COLLATE", "Chinese")
Sys.setlocale("LC_CTYPE", "Chinese")
Sys.setlocale("LC_MONETARY", "Chinese")
Sys.setlocale("LC_TIME", "Chinese")
Sys.setlocale("LC_MESSAGES", "Chinese")
Sys.setlocale("LC_MEASUREMENT", "Chinese")
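The calls above change six locale categories for the rest of the session. A narrower sketch, assuming only character handling matters for display, sets just LC_CTYPE, checks whether the request succeeded, and restores the previous value afterwards. The helper name with_ctype and the locale name "zh_CN.UTF-8" are illustrative assumptions; locale names are platform-dependent, so the name your system accepts may differ.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE temporarily set to `locale`.
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  # Restore the previous setting when the function exits, even on error.
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request cannot be honoured.
  result <- suppressWarnings(Sys.setlocale("LC_CTYPE", locale))
  if (identical(result, "")) {
    message("Locale '", locale, "' is not available; keeping '", old, "'")
  }
  # `expr` is evaluated lazily, so it runs under the new setting.
  force(expr)
}

# The locale name here is an assumption and may not be installed on your system.
current <- with_ctype("zh_CN.UTF-8", Sys.getlocale("LC_CTYPE"))
print(current)
```

Restoring the old value keeps sorting, date formatting and messages in the rest of the session unaffected.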
You can use read.csv with the encoding set to UTF-8:
df <- read.csv("data.csv", encoding = "UTF-8", stringsAsFactors = FALSE)
Setting stringsAsFactors = FALSE keeps the Chinese text as character strings rather than factors.
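Note that in read.csv the encoding argument only marks the strings that were read as being in that encoding; it does not convert them. The fileEncoding argument is the one that re-encodes the file while it is read. The byte pairs in the output above (such as <c2><f4>) look more like a double-byte Chinese encoding such as GBK than UTF-8, so a sketch along these lines may be worth trying. The GBK guess and the tiny sample file are assumptions, not something confirmed for the real data.csv, and the example assumes a UTF-8 session locale and an iconv build that knows the name "GBK".

```r
# Build a tiny GBK-encoded CSV: header "id,side" and one row whose second
# field consists of the byte pairs 0xC2F4 0xB3F6 seen in the output above.
gbk_bytes <- as.raw(c(
  0x69, 0x64, 0x2c, 0x73, 0x69, 0x64, 0x65, 0x0a,  # "id,side\n"
  0x31, 0x2c, 0xc2, 0xf4, 0xb3, 0xf6, 0x0a         # "1,<c2><f4><b3><f6>\n"
))
tmp <- tempfile(fileext = ".csv")
writeBin(gbk_bytes, tmp)

# fileEncoding converts from GBK to the session's native encoding on read;
# encoding = "UTF-8" alone would leave the GBK bytes untouched.
mydata <- read.csv(tmp, fileEncoding = "GBK", stringsAsFactors = FALSE)
print(mydata)
unlink(tmp)
```

For the real file, the same call would take the file path or URL in place of tmp.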
Note: I don't have a Chinese language pack installed in my environment, so I cannot determine whether the garbled characters in the .csv provided are corrupted or simply unrecognized.
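Whether the bytes are corrupted or merely in an unrecognized encoding can be checked without a language pack, because iconv() returns NA for input that is not valid in the from encoding. The following sketch tries a few candidates; the helper name valid_encodings, the candidate list and the sample bytes are illustrative assumptions. Passing the check only means the bytes are valid in that encoding, not that it is the right one, so the result narrows the choice rather than settling it.

```r
# Hypothetical helper: which candidate encodings can decode every line?
valid_encodings <- function(lines, candidates = c("UTF-8", "GBK", "BIG5")) {
  ok <- vapply(candidates, function(enc) {
    converted <- tryCatch(
      iconv(lines, from = enc, to = "UTF-8"),
      # An encoding name unknown to this iconv build counts as a failure.
      error = function(e) NA_character_
    )
    !anyNA(converted)
  }, logical(1))
  candidates[ok]
}

# Sample line built from byte pairs that appear in the garbled output.
sample_line <- rawToChar(as.raw(c(0x31, 0x2c, 0xc2, 0xf4, 0xb3, 0xf6)))
print(valid_encodings(sample_line))

# For a real file (the path is a placeholder), read the lines first:
# valid_encodings(readLines("data.csv", warn = FALSE))
```

An encoding that survives this check can then be passed to read.csv through fileEncoding.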