linux – How do I find the words shared across multiple files?

I want to count the words that appear in several files, and then show which files they appear in.

file1:

This is so beautiful

file2:

There are so beautiful

file3:

so beautiful
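To reproduce the example, the three sample files can be created in the current directory. This is a minimal sketch. The names file1, file2 and file3 are assumed from the labels above, and they also match the file* glob used in the answer below.

```shell
#!/usr/bin/env bash
# Create the three sample files from the question in the current directory
printf '%s\n' 'This is so beautiful'   > file1
printf '%s\n' 'There are so beautiful' > file2
printf '%s\n' 'so beautiful'           > file3
```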

Desired output 1:

so:3
beautiful:3

Desired output 2:

so:
file1:1
file2:1
file3:1

beautiful:
file1:1
file2:1
file3:1
Best answer
Try this:

#!/usr/bin/env bash
# Requires bash (process substitution) and GNU grep (for \w).

# Declare the files you want to include
files=( file* )

# Function to find the words common to any number of files (case-insensitive)
wcomm() {
    # If no files are provided, exit the function.
    [ $# -lt 1 ] && return 1
    local common_words next_words
    # Extract the distinct words from the first file, lowercased BEFORE sorting,
    # because comm needs its input sorted in the same order it compares
    common_words=$(grep -o '\w\+' "$1" | tr '[:upper:]' '[:lower:]' | sort -u)
    while [ $# -gt 1 ]; do
        # Shift $1 to the next file
        shift
        # Extract the distinct lowercase words from the next file
        next_words=$(grep -o '\w\+' "$1" | tr '[:upper:]' '[:lower:]' | sort -u)
        # Keep only the words present in both $common_words and $next_words
        common_words=$(comm -12 <(echo "$common_words") <(echo "$next_words"))
    done
    # Output the words common to all input files
    echo "$common_words"
}

# Output the number of matches for each common word, in total and per file
for w in $(wcomm "${files[@]}"); do
    echo "$w:$(grep -oiw "$w" "${files[@]}" | wc -l)"
    for f in "${files[@]}"; do
        echo "$f:$(grep -oiw "$w" "$f" | wc -l)"
    done
    echo
done

Output:

beautiful:3
file1:1
file2:1
file3:1

so:3
file1:1
file2:1
file3:1
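The wcomm function intersects the word lists pairwise, running comm once per file. An alternative way to find the common words is to print each file's distinct lowercase words exactly once, count them all with a single sort | uniq -c, and keep only the words whose count equals the number of files. This is a sketch under the same assumptions (bash, GNU grep for \w). The name common_words_uniq is hypothetical and not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical alternative to wcomm: print the words that occur in EVERY given file.
common_words_uniq() {
    # If no files are provided, exit the function.
    [ $# -lt 1 ] && return 1
    local n=$# f
    for f in "$@"; do
        # One lowercase copy of each distinct word per file
        grep -o '\w\+' "$f" | tr '[:upper:]' '[:lower:]' | sort -u
    done | sort | uniq -c | awk -v n="$n" '$1 == n { print $2 }'
    # uniq -c prefixes each word with the number of files containing it;
    # awk keeps the words whose count equals the number of files.
}
```

It can replace wcomm without changing the counting loop, e.g. for w in $(common_words_uniq "${files[@]}"); do ... done.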

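Explanation:

The comments in the script explain each step.

Features:

>Works with as many files as ARG_MAX allows (the whole file list is passed to grep on one command line).
>Finds every word delimited by what grep treats as word separators (anything not matched by \w).
>Ignores case, so "beautiful" and "Beautiful" count as the same word.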
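The counting loop uses three grep flags. -o prints each match on its own line, so wc -l counts matches rather than matching lines. -i ignores case. -w matches whole words only. The short demo below, run on an illustrative input string, shows the combined effect: the three spellings of "so" are counted, but "soon" is not.

```shell
#!/usr/bin/env bash
# -o: one match per line, -i: ignore case, -w: whole words only
count=$(printf '%s\n' 'So so SO soon' | grep -oiw 'so' | wc -l | tr -d ' ')
echo "$count"   # prints 3: "So", "so" and "SO" match; "soon" does not
```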
