csvlook is part of the csvkit package.
-bash: csvlook: command not found

# Install via pip
sudo pip install csvkit

# Debian
apt-get install csvkit

# Ubuntu
apt-get install csvkit

# Kali Linux
apt-get install csvkit

# Fedora
dnf install python3-csvkit

# OS X
brew install csvkit

# Raspbian
apt-get install python3-csvkit
csvlook renders a CSV file on the command line in a Markdown-compatible, fixed-width format.
csvlook [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b] [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-L LOCALE] [-S] [--blanks] [--date-format DATE_FORMAT] [--datetime-format DATETIME_FORMAT] [-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [--max-rows MAX_ROWS] [--max-columns MAX_COLUMNS] [--max-column-width MAX_COLUMN_WIDTH] [-y SNIFF_LIMIT] [-I] [FILE]
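The csvkit tools share a set of common command-line arguments. Not every tool supports every argument, so run a tool with the --help flag to check which arguments it supports: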
-d DELIMITER, --delimiter DELIMITER
    Delimiting character of the input CSV file.
-t, --tabs
    Specify that the input CSV file is delimited with tabs. Overrides "-d".
-q QUOTECHAR, --quotechar QUOTECHAR
    Character used to quote strings in the input CSV file.
-u {0,1,2,3}, --quoting {0,1,2,3}
    Quoting style used in the input CSV file. 0 = Quote Minimal, 1 = Quote All, 2 = Quote Non-numeric, 3 = Quote None.
-b, --no-doublequote
    Whether or not double quotes are doubled in the input CSV file.
-p ESCAPECHAR, --escapechar ESCAPECHAR
    Character used to escape the delimiter if --quoting 3 ("Quote None") is specified and to escape the QUOTECHAR if --no-doublequote is specified.
-z FIELD_SIZE_LIMIT, --maxfieldsize FIELD_SIZE_LIMIT
    Maximum length of a single field in the input CSV file.
-e ENCODING, --encoding ENCODING
    Specify the encoding of the input CSV file.
-L LOCALE, --locale LOCALE
    Specify the locale (en_US) of any formatted numbers.
-S, --skipinitialspace
    Ignore whitespace immediately following the delimiter.
--blanks
    Do not coerce empty, "na", "n/a", "none", "null", "." strings to NULL values.
--date-format DATE_FORMAT
    Specify a strptime date format string like "%m/%d/%Y".
--datetime-format DATETIME_FORMAT
    Specify a strptime datetime format string like "%m/%d/%Y %I:%M %p".
-H, --no-header-row
    Specify that the input CSV file has no header row. Will create default headers (a,b,c,...).
-K SKIP_LINES, --skip-lines SKIP_LINES
    Specify the number of initial lines to skip before the header row (e.g. comments, copyright notices, empty rows).
-v, --verbose
    Print detailed tracebacks when errors occur.
-l, --linenumbers
    Insert a column of line numbers at the front of the output. Useful when piping to grep or as a simple primary key.
--zero
    When interpreting or displaying column numbers, use zero-based numbering instead of the default 1-based numbering.
-V, --version
    Display version information and exit.
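As a rough illustration of how a few of these shared options combine with csvlook, the sketch below writes two small throwaway files and renders them. The file names and their contents are hypothetical and exist only for this example; the flags are the ones listed in the synopsis and option list above. It assumes csvkit is installed and otherwise only prints a hint.

```shell
# Hypothetical sample data; the file names and values are made up for this sketch.
workdir=$(mktemp -d)
printf 'state;county;quantity\nNE;ADAMS;1\nNE;BUFFALO;8\n' > "$workdir/sample.csv"
printf 'state\tcounty\tquantity\nNE\tADAMS\t1\n' > "$workdir/sample.tsv"

if command -v csvlook >/dev/null 2>&1; then
  # -d: semicolon-delimited input; -l: prepend a column of line numbers
  csvlook -d ';' -l "$workdir/sample.csv"

  # -t: tab-delimited input (overrides -d)
  csvlook -t "$workdir/sample.tsv"

  # --max-rows / --max-columns: limit how much of the table is displayed
  csvlook -d ';' --max-rows 1 --max-columns 2 "$workdir/sample.csv"
else
  echo "csvlook not found; install csvkit first" >&2
fi
```

The exact rendering of the table can differ between csvkit versions, so no expected output is shown here.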
CSV files
Use csvlook to view the data in a CSV file:
csvlook data.csv
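The csvlook output may not look very clean: you may see a jumble of pipe characters and dashes. This happens because the dataset has many columns and they cannot all fit in the terminal at once. There are two ways to deal with this: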
# 1. Pipe the output to less -S so that lines are shown without wrapping, and scroll left and right with the arrow keys:
csvlook data.csv | less -S

# 2. Reduce the number of columns with csvcut before viewing the dataset:
$ csvcut -n data.csv
  1: state
  2: county
  3: fips
  4: nsn
  5: item_name
  6: quantity
  7: ui
  8: acquisition_cost
  9: total_cost
 10: ship_date
 11: federal_supply_category
 12: federal_supply_category_name
 13: federal_supply_class
 14: federal_supply_class_name

# As shown above, the data has 14 columns. To keep only columns 2, 5 and 6, run:
$ csvcut -c 2,5,6 data.csv

# The CSV output now has only 3 columns. Columns can also be referenced by name:
$ csvcut -c county,item_name,quantity data.csv
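The two approaches can also be chained, since csvcut writes CSV to standard output and csvlook reads from standard input when no FILE is given. The following is a minimal sketch under stated assumptions: the sample file and its values are hypothetical and reuse only a few of the column names listed by csvcut -n, and csvkit is assumed to be installed.

```shell
# Hypothetical sample file with a subset of the columns; the values are made up.
sample=$(mktemp)
printf 'state,county,item_name,quantity\nNE,ADAMS,widget,1\nNE,BUFFALO,gadget,8\n' > "$sample"

if command -v csvcut >/dev/null 2>&1 && command -v csvlook >/dev/null 2>&1; then
  # Keep three columns by name, then render the reduced CSV as a table.
  csvcut -c county,item_name,quantity "$sample" | csvlook

  # If the result is still too wide, add the pager so long lines scroll
  # sideways instead of wrapping (interactive, so left commented out here):
  # csvcut -c county,item_name,quantity "$sample" | csvlook | less -S
else
  echo "csvcut/csvlook not found; install csvkit first" >&2
fi
```

Selecting columns first keeps the rendered table narrow, so the pager is needed only when the remaining columns still exceed the terminal width.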