csvstat Command Explained

Installing csvstat:


-bash/zsh: csvstat command not found

# Windows (WSL2)
sudo apt-get update && sudo apt-get install csvkit

# Debian
apt-get install csvkit

# Ubuntu
apt-get install csvkit

# Kali Linux
apt-get install csvkit

# Fedora
dnf install python3-csvkit

# OS X
brew install csvkit

# Raspbian
apt-get install python3-csvkit

# Dockerfile
dockerfile.run/csvstat

csvstat description:


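csvstat prints descriptive statistics for every column in a CSV file. It infers the type of each column, then prints the analysis relevant to that type (the range for dates, the mean and median for integers, and so on).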
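
To make the "infer the type, then pick the statistics" idea concrete, here is a minimal sketch using only the Python standard library. It is not csvkit's implementation: the helper names (infer_column, describe), the three supported types, and the sample data are all assumptions chosen for illustration.

```python
import csv
import io
import statistics
from datetime import date


def infer_column(values):
    """Guess a column type from its non-empty strings.

    Tries int first, then ISO dates, and falls back to text.
    Returns (type_name, converted_values).
    """
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "text", []
    for name, convert in (("int", int), ("date", date.fromisoformat)):
        try:
            return name, [convert(v) for v in non_empty]
        except ValueError:
            continue  # at least one value did not fit this type
    return "text", non_empty


def describe(csv_text):
    """Return {column: stats}, with stats chosen according to the inferred type."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    report = {}
    for column in reader.fieldnames or []:
        kind, values = infer_column([row[column] for row in rows])
        stats = {"type": kind}
        if kind == "int":
            stats["mean"] = statistics.mean(values)
            stats["median"] = statistics.median(values)
        elif kind == "date":
            stats["min"] = min(values)
            stats["max"] = max(values)
        else:
            stats["longest"] = max((len(v) for v in values), default=0)
        report[column] = stats
    return report


# Hypothetical sample data, not taken from the csvkit examples.
SAMPLE = """name,joined,score
Ann,2020-01-15,10
Bob,2021-06-30,20
Cy,2019-11-02,60
"""

if __name__ == "__main__":
    for column, stats in describe(SAMPLE).items():
        print(column, stats)
```

The real tool recognizes more types and reports more statistics; the point here is only that the set of statistics depends on the inferred type.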

csvstat syntax:


csvstat [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
               [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-L LOCALE]
               [-S] [--blanks] [--null-value NULL_VALUES [NULL_VALUES ...]]
               [--date-format DATE_FORMAT] [--datetime-format DATETIME_FORMAT]
               [-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [--csv] [--json]
               [-i INDENT] [-n] [-c COLUMNS] [--type] [--nulls] [--non-nulls]
               [--unique] [--min] [--max] [--sum] [--mean] [--median]
               [--stdev] [--len] [--max-precision] [--freq]
               [--freq-count FREQ_COUNT] [--count]
               [--decimal-format DECIMAL_FORMAT] [-G] [-y SNIFF_LIMIT] [-I]
               [FILE]

csvstat arguments:


FILE            The CSV file to operate on. 
                If omitted, will accept input as piped data via STDIN.
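
Because FILE is optional, csvstat can be driven from another program by writing CSV text to its standard input. The following is a sketch of that pattern with Python's subprocess module; the function name csvstat_from_stdin and the sample data are hypothetical, and the function returns None when csvstat is not on PATH so the script still runs on a machine without csvkit.

```python
import shutil
import subprocess

# Hypothetical sample data used only for this illustration.
CSV_TEXT = "id,amount\n1,10\n2,25\n3,40\n"


def csvstat_from_stdin(csv_text, *options):
    """Run csvstat without a FILE argument, feeding csv_text through STDIN.

    Returns csvstat's standard output as a string, or None when the
    csvstat executable cannot be found.
    """
    if shutil.which("csvstat") is None:
        return None
    completed = subprocess.run(
        ["csvstat", *options],
        input=csv_text,       # sent to the child process's STDIN
        capture_output=True,
        text=True,
        check=True,           # raise if csvstat exits with an error
    )
    return completed.stdout


if __name__ == "__main__":
    output = csvstat_from_stdin(CSV_TEXT, "--sum")
    print(output if output is not None else "csvstat is not installed")
```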

csvstat options:


-h, --help            show this help message and exit
--csv                 Output results as a CSV table, rather than plain text.
--json                Output results as JSON text, rather than plain text.
-i INDENT, --indent INDENT
                      Indent the output JSON this many spaces. Disabled by
                      default.
-n, --names           Display column names and indices from the input CSV
                      and exit.
-c COLUMNS, --columns COLUMNS
                      A comma-separated list of column indices, names or
                      ranges to be examined, e.g. "1,id,3-5". Defaults to
                      all columns.
--type                Only output data type.
--nulls               Only output whether columns contain nulls.
--non-nulls           Only output counts of non-null values.
--unique              Only output counts of unique values.
--min                 Only output smallest values.
--max                 Only output largest values.
--sum                 Only output sums.
--mean                Only output means.
--median              Only output medians.
--stdev               Only output standard deviations.
--len                 Only output the length of the longest values.
--max-precision       Only output the most decimal places.
--freq                Only output lists of frequent values.
--freq-count FREQ_COUNT
                      The maximum number of frequent values to display.
--count               Only output total row count.
--decimal-format DECIMAL_FORMAT
                      %-format specification for printing decimal numbers.
                      Defaults to locale-specific formatting with "%.3f".
-G, --no-grouping-separator
                      Do not use grouping separators in decimal numbers.
-y SNIFF_LIMIT, --snifflimit SNIFF_LIMIT
                      Limit CSV dialect sniffing to the specified number of
                      bytes. Specify "0" to disable sniffing entirely, or
                      "-1" to sniff the entire file.
-I, --no-inference    Disable type inference when parsing the input. Disable
                      reformatting of values.
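
The --json, --indent and --columns options listed above combine naturally when the statistics are consumed by a script instead of a person. Below is a hedged sketch: build_csvstat_command and csvstat_json are hypothetical helper names, and the code makes no assumption about which keys appear in the JSON that csvstat emits; it only parses whatever comes back.

```python
import json
import shutil
import subprocess


def build_csvstat_command(path, columns=None, indent=None):
    """Assemble the argument list for a JSON-producing csvstat call.

    `columns` may mix indices and names, mirroring the "1,id,3-5" form
    accepted by --columns.
    """
    command = ["csvstat", "--json"]
    if indent is not None:
        command += ["--indent", str(indent)]
    if columns:
        command += ["--columns", ",".join(str(c) for c in columns)]
    command.append(path)
    return command


def csvstat_json(path, columns=None):
    """Run csvstat and return its parsed JSON output, or None if csvstat is missing."""
    if shutil.which("csvstat") is None:
        return None
    completed = subprocess.run(
        build_csvstat_command(path, columns),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(completed.stdout)


if __name__ == "__main__":
    # Show the command that would be executed for a hypothetical data.csv.
    print(build_csvstat_command("data.csv", columns=[1, "id"], indent=2))
```

Keeping the command construction separate from the execution makes it possible to inspect or log the exact invocation before running it.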

csvstat examples:


Show all statistics for all columns:

csvstat data.csv

When the name of a statistic is passed to csvstat, only that statistic is printed:

csvstat --min examples/realdata/FY09_EDU_Recipients_by_State.csv

  1. State Name: None
  2. State Abbreviate: None
  3. Code: 1
  4. Montgomery GI Bill-Active Duty: 435
  5. Montgomery GI Bill- Selective Reserve: 48
  6. Dependents' Educational Assistance: 118
  7. Reserve Educational Assistance Program: 60
  8. Post-Vietnam Era Veteran's Educational Assistance Program: 1
  9. TOTAL: 768
 10. j: None

If a single statistic and a single column are requested, csvstat returns only a single value:

csvstat -c 4 --mean examples/realdata/FY09_EDU_Recipients_by_State.csv
6,263.904
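
A single value like the one above is convenient for scripting, but the grouping separator prevents direct numeric parsing. The -G option removes it at the source; alternatively, the value can be cleaned afterwards. This minimal sketch assumes a comma as the grouping separator and a dot as the decimal point, which depends on the locale in use. The helper name parse_grouped_number is hypothetical.

```python
from decimal import Decimal


def parse_grouped_number(text, grouping=","):
    """Convert text such as '6,263.904' to a Decimal.

    Assumes `grouping` is the thousands separator and '.' is the decimal point.
    """
    return Decimal(text.strip().replace(grouping, ""))


if __name__ == "__main__":
    print(parse_grouped_number("6,263.904"))  # prints 6263.904
```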

Show all statistics for columns 2 and 4:

csvstat -c 2,4 data.csv

Show the sum of every column:

csvstat --sum data.csv

Show the length of the longest value in column 3:

csvstat -c 3 --len data.csv

Show the number of unique values in the name column:

csvstat -c name --unique data.csv
