csvstat Command Reference

Installing csvstat:

  # Install csvkit if the shell reports:
  #   -bash/zsh: csvstat: command not found

  # Windows (WSL2)
  sudo apt-get update
  sudo apt-get install csvkit

  # Debian
  apt-get install csvkit

  # Ubuntu
  apt-get install csvkit

  # Kali Linux
  apt-get install csvkit

  # Fedora
  dnf install python3-csvkit

  # OS X
  brew install csvkit

  # Raspbian
  apt-get install python3-csvkit

  # Dockerfile
  dockerfile.run/csvstat

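About csvstat:

csvstat prints descriptive statistics for every column in a CSV file. It infers the type of each column and then prints the analysis relevant to that type (the range for dates, the mean and median for integers, and so on).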
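
As a rough illustration of the type inference described above, the sketch below builds a small hypothetical file and asks csvstat for the inferred type of each column. The file name sample.csv and its contents are made up for this example. With data like this, csvstat would typically be expected to treat id as numeric, joined as a date, and name as text.

```shell
# Create a tiny hypothetical CSV (file name and values are made up for this sketch).
cat > sample.csv <<'EOF'
id,joined,name
1,2020-01-15,alice
2,2021-06-30,bob
3,2022-11-02,carol
EOF

# Print only the inferred data type of each column.
csvstat --type sample.csv

# Print the full per-column analysis for comparison.
csvstat sample.csv
```

Running the second command after the first makes it easy to see which statistics appear only for certain column types.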

csvstat syntax:

  csvstat [-h] [-d DELIMITER] [-t] [-q QUOTECHAR] [-u {0,1,2,3}] [-b]
          [-p ESCAPECHAR] [-z FIELD_SIZE_LIMIT] [-e ENCODING] [-L LOCALE]
          [-S] [--blanks] [--null-value NULL_VALUES [NULL_VALUES ...]]
          [--date-format DATE_FORMAT] [--datetime-format DATETIME_FORMAT]
          [-H] [-K SKIP_LINES] [-v] [-l] [--zero] [-V] [--csv] [--json]
          [-i INDENT] [-n] [-c COLUMNS] [--type] [--nulls] [--non-nulls]
          [--unique] [--min] [--max] [--sum] [--mean] [--median]
          [--stdev] [--len] [--max-precision] [--freq]
          [--freq-count FREQ_COUNT] [--count]
          [--decimal-format DECIMAL_FORMAT] [-G] [-y SNIFF_LIMIT] [-I]
          [FILE]

csvstat arguments:

  FILE                  The CSV file to operate on.
                        If omitted, will accept input as piped data via STDIN.
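
A minimal sketch of the STDIN behavior described above: when FILE is omitted, the CSV data can be piped in from another command. The two-column data here is made up for illustration.

```shell
# Pipe a small, made-up CSV into csvstat instead of naming a file.
# -c qty selects the column by name; --sum restricts output to the sum.
printf 'item,qty\napple,3\npear,5\n' | csvstat -c qty --sum
```

Because a single column and a single statistic are requested, only one value should be printed.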

csvstat options:

  -h, --help            show this help message and exit
  --csv                 Output results as a CSV table, rather than plain text.
  --json                Output results as JSON text, rather than plain text.
  -i INDENT, --indent INDENT
                        Indent the output JSON this many spaces. Disabled by
                        default.
  -n, --names           Display column names and indices from the input CSV
                        and exit.
  -c COLUMNS, --columns COLUMNS
                        A comma-separated list of column indices, names or
                        ranges to be examined, e.g. "1,id,3-5". Defaults to
                        all columns.
  --type                Only output data type.
  --nulls               Only output whether columns contain nulls.
  --non-nulls           Only output counts of non-null values.
  --unique              Only output counts of unique values.
  --min                 Only output smallest values.
  --max                 Only output largest values.
  --sum                 Only output sums.
  --mean                Only output means.
  --median              Only output medians.
  --stdev               Only output standard deviations.
  --len                 Only output the length of the longest values.
  --max-precision       Only output the most decimal places.
  --freq                Only output lists of frequent values.
  --freq-count FREQ_COUNT
                        The maximum number of frequent values to display.
  --count               Only output total row count.
  --decimal-format DECIMAL_FORMAT
                        %-format specification for printing decimal numbers.
                        Defaults to locale-specific formatting with "%.3f".
  -G, --no-grouping-separator
                        Do not use grouping separators in decimal numbers.
  -y SNIFF_LIMIT, --snifflimit SNIFF_LIMIT
                        Limit CSV dialect sniffing to the specified number of
                        bytes. Specify "0" to disable sniffing entirely, or
                        "-1" to sniff the entire file.
  -I, --no-inference    Disable type inference when parsing the input. Disable
                        reformatting of values.
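
The options above can be combined in a few common ways, sketched below. The file cities.csv and its contents are hypothetical. The flags are taken from the option list above, and their availability may depend on the installed csvkit version.

```shell
# Create a small hypothetical CSV (file name and values are made up).
cat > cities.csv <<'EOF'
city,country,population
Lyon,FR,522000
Nice,FR,342000
Bonn,DE,336000
Kiel,DE,247000
EOF

# Emit the statistics as a CSV table rather than plain text.
csvstat --csv cities.csv

# Emit the statistics as JSON, indented by 2 spaces.
csvstat --json -i 2 cities.csv

# List at most the 2 most frequent values of the country column.
csvstat -c country --freq --freq-count 2 cities.csv

# Disable type inference, then report the data type of each column.
csvstat -I --type cities.csv
```

The --csv and --json forms are the ones to reach for when the statistics feed another program, since plain-text output is meant for reading.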

csvstat examples:

Show all statistics for all columns:

  csvstat data.csv

When the name of a statistic is passed to csvstat, only that statistic is printed:

  csvstat --min examples/realdata/FY09_EDU_Recipients_by_State.csv

    1. State Name: None
    2. State Abbreviate: None
    3. Code: 1
    4. Montgomery GI Bill-Active Duty: 435
    5. Montgomery GI Bill- Selective Reserve: 48
    6. Dependents' Educational Assistance: 118
    7. Reserve Educational Assistance Program: 60
    8. Post-Vietnam Era Veteran's Educational Assistance Program: 1
    9. TOTAL: 768
   10. j: None

If a single statistic and a single column are requested, csvstat returns only one value:

  csvstat -c 4 --mean examples/realdata/FY09_EDU_Recipients_by_State.csv
  6,263.904

  # Show all stats for columns 2 and 4:
  csvstat -c 2,4 data.csv
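
Columns can also be selected by name or by range, as the -c option description notes. A minimal sketch, assuming a hypothetical scores.csv with made-up columns:

```shell
# Create a small hypothetical CSV (file name and values are made up).
cat > scores.csv <<'EOF'
id,name,math,physics,art
1,alice,90,85,70
2,bob,75,80,95
EOF

# Select one column by name and print only its mean.
csvstat -c math --mean scores.csv

# Mix a positional range (columns 3 to 4) with a column name.
csvstat -c 3-4,art --max scores.csv
```

Selecting by name tends to be more robust than selecting by position when the column order of the input may change.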

Show the sum of every column:

  csvstat --sum data.csv

Show the length of the longest value in column 3:

  csvstat -c 3 --len data.csv

Show the number of unique values in the name column:

  csvstat -c name --unique data.csv
