Bigrams in SOWPODS


"Bigrams" = two-letter combinations
"SOWPODS" = a Scrabble word list

(SOWPODS is the list I keep on hand for experimenting with word algorithms and solving puzzles, because I could get it in electronic form. I most likely, though not certainly, downloaded it from https://github.com/jesstess/Scrabble/blob/master/scrabble/sowpods.txt and would be happy to hear about other word lists that are available for use.)

Visualized here: the number of words that contain both of the specified letters or, for a doubled letter, that letter twice. The two letters can appear anywhere in the word; they need not be adjacent. For example, 36,788 of the words in the list contain two A's, while 1,420 contain both B and Z. My version of SOWPODS has 267,751 words in total.
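These counts can be gathered in a single pass over the word list. The original `twoLetters()` helper that the plotting code calls is not shown, so the sketch below is a hypothetical reconstruction: `count_letter_pairs` and the default path `sowpods.txt` are assumptions, the file is assumed to hold one word per line, and "contains A twice" is read as "contains at least two A's". Each word is counted once per letter pair, however many times the pair occurs in it.

```python
from collections import Counter
from itertools import combinations
import string

LETTERS = string.ascii_uppercase


def count_letter_pairs(words):
    """Hypothetical helper: return sorted (pair, count) tuples.

    For distinct letters X < Y, "XY" counts the words containing both
    X and Y anywhere. For a doubled pair "XX", it counts the words
    containing X at least twice (an assumption about "doubled").
    """
    counts = Counter()
    for word in words:
        letter_counts = Counter(word.strip().upper())
        # Distinct letters in this word, in alphabetical order.
        present = sorted(c for c in letter_counts if c in LETTERS)
        # Each unordered pair of distinct letters counts once per word.
        for x, y in combinations(present, 2):
            counts[x + y] += 1
        # A doubled letter counts only if it occurs at least twice.
        for c in present:
            if letter_counts[c] >= 2:
                counts[c + c] += 1
    return sorted(counts.items())


def twoLetters(path="sowpods.txt"):
    """Hypothetical stand-in for the helper called by plotBigrams()."""
    with open(path) as f:
        return count_letter_pairs(f)
```

Because `combinations` is applied to the sorted letters, every pair comes out in alphabetical order ("BZ", never "ZB"), which is what lets the plot fill only one triangle of the grid.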


[Figure: sowpods-bigrams.png, heatmap of word counts for each letter pair]

Code fragment for the visualization in pyplot, adapted from this heatmap tutorial and this StackOverflow answer on plotting only the upper or lower triangle of a matrix.

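```python
import string

import matplotlib.pyplot as plt
import numpy as np

# Axis alphabet (assumed; the original definition of `letters` is not shown).
letters = string.ascii_uppercase

def plotBigrams():
    # twoLetters() (not shown) returns a sequence of (bigram, count) pairs,
    # assumed to have the two letters of each bigram in alphabetical order.
    tlCounts = twoLetters()

    a = np.zeros((26, 26), int)
    # Hide the strictly-upper triangle; the lower triangle holds the data.
    mask = np.tri(26, 26, k=-1).transpose()
    for (bigram, count) in tlCounts:
        i = letters.index(bigram[0].upper())
        j = letters.index(bigram[1].upper())
        a[j, i] = count

    aa = np.ma.array(a, mask=mask)

    fig, ax = plt.subplots()
    # cm.get_cmap was removed in Matplotlib 3.9. Copy the colormap so that
    # set_bad does not modify the shared, registered "viridis".
    c = plt.get_cmap("viridis").copy()
    c.set_bad("w")  # masked cells are drawn white
    ax.imshow(aa, cmap=c, aspect=1.0 / 3.0)
    ax.set_xticks(np.arange(26))
    ax.set_yticks(np.arange(26))
    ax.set_xticklabels(list(letters))
    ax.set_yticklabels(list(letters))

    # Label only the visible cells (row j >= column i) with their counts.
    for i in range(26):
        for j in range(i, 26):
            ax.text(i, j, str(aa[j, i]), ha="center", va="center", color="w")

    return fig, ax
```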
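The masking trick from the StackOverflow answer can be checked on a small matrix. `np.tri(n, n, k=-1)` is 1 strictly below the diagonal; transposing it gives 1 strictly above the diagonal, so those cells are hidden while the diagonal and lower triangle (where `a[j, i]` is written with `i <= j`) stay visible. A 4×4 illustration with made-up values:

```python
import numpy as np

n = 4
# 1.0 strictly below the diagonal; after transposing, strictly above it.
mask = np.tri(n, n, k=-1).transpose()

values = np.arange(n * n).reshape(n, n)
masked = np.ma.array(values, mask=mask)

# Masked entries print as "--"; the diagonal and lower triangle remain.
print(masked)

# The same boolean mask, built with np.triu (k=1 selects above the diagonal).
expected = np.triu(np.ones((n, n), dtype=bool), k=1)
print(np.array_equal(np.ma.getmaskarray(masked), expected))
```

Any cell left at zero in the lower triangle would still be shown as a count of 0, whereas the masked cells are drawn with the colormap's "bad" color.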

Hm, "bigram" usually means contiguous letters, and that's not what I meant here (though that would be interesting too). What I actually did was count whenever the two letters appeared in the same word.
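For comparison, the usual contiguous sense of "bigram" (two letters adjacent in a word) could be counted in the same style. A minimal sketch with a hypothetical function name; here pairs are ordered ("AB" and "BA" are different) and each word is counted once per distinct adjacent pair:

```python
from collections import Counter


def count_adjacent_bigrams(words):
    """Hypothetical helper: number of words containing each adjacent letter pair."""
    counts = Counter()
    for word in words:
        w = word.strip().upper()
        # zip(w, w[1:]) walks over each pair of neighbouring characters;
        # the set keeps a pair from being counted twice in one word.
        pairs = {a + b for a, b in zip(w, w[1:]) if a.isalpha() and b.isalpha()}
        counts.update(pairs)
    return counts
```

Because the pairs are ordered, the result fills a full 26×26 grid rather than a triangle, so the upper-triangle mask from the plotting code would not apply.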
