Analysis Scripts

On this page, I have collected links to a handful of scripts that I have written that might make your life easier. Many of these scripts are linked elsewhere on this website, but I wanted to make them all easier to find. All scripts on this page are completely free for you to use however you want. Below are links to, and brief descriptions of, each script.


If you are doing the Meaning Extraction Method, you can use the “binary”, “verbose”, or “raw count” outputs for your Principal Component Analysis. I have written an R script for running the PCA, which you are free to use — the script is located here.
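If you are curious about what this looks like in practice, here is a minimal sketch of a varimax-rotated PCA in R. It is an illustration only, not a copy of my script: the file name, the column handling, and the choice of five components are all hypothetical stand-ins.

```r
# Minimal PCA sketch for an MEH document-term matrix (illustrative only).
# Assumes a CSV where each row is a document and each numeric column
# is a word; "meh_output_binary.csv" is a hypothetical file name.
library(psych)

dtm <- read.csv("meh_output_binary.csv")
dtm <- dtm[, sapply(dtm, is.numeric)]   # drop any ID/text columns

# PCA with varimax rotation; pick nfactors from a scree plot or
# parallel analysis rather than the arbitrary 5 used here.
pca <- principal(dtm, nfactors = 5, rotate = "varimax")
print(pca$loadings, cutoff = 0.20)      # show only loadings above .20
```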

Once you have completed your PCA, you may find that you would like to format your results/loadings in a somewhat more readable way: one that shows you the words corresponding to each component rather than the loadings themselves. This view of a PCA result is analogous to how classic Latent Dirichlet Allocation output tables are typically formatted. If this approach is more up your alley, you might want to check out the “PCA Results Churner” scripts, kindly authored and shared here by John Henry Cruz. These scripts are available for download as plain Python scripts, as well as in Jupyter Notebook format.
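To give a rough feel for what that kind of reformatting does (this is a hypothetical sketch of the general idea, not the Churner scripts themselves, which are Python), the following R snippet pulls the ten highest-loading words for each component:

```r
# Illustrative only: show top-loading words per component, in the
# style of a classic LDA topic table. File name is hypothetical.
library(psych)

dtm <- read.csv("meh_output_binary.csv")
dtm <- dtm[, sapply(dtm, is.numeric)]
pca <- principal(dtm, nfactors = 5, rotate = "varimax")

# For each component, keep the names of the 10 words with the
# largest (positive) loadings instead of the loadings themselves.
loadings_mat <- unclass(pca$loadings)
top_words <- apply(loadings_mat, 2, function(x) {
  names(sort(x, decreasing = TRUE))[1:10]
})
print(top_words)   # one column of words per component
```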


The “raw count” document term matrix is something that you can use to run Latent Dirichlet Allocation, in addition to other types of analyses. If you are new to LDA, or you simply need an R script that makes LDA easy to use with MEH’s raw count output, I have written one that you may freely use. It can be downloaded here.
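To illustrate the general approach (this is not necessarily how my script works internally), here is a minimal LDA sketch in R using the topicmodels package; the file name and number of topics are hypothetical:

```r
# Illustrative LDA sketch on an MEH raw count matrix.
# "meh_output_rawcounts.csv" is a hypothetical file name.
library(topicmodels)

counts <- read.csv("meh_output_rawcounts.csv")
counts <- as.matrix(counts[, sapply(counts, is.numeric)])
counts <- counts[rowSums(counts) > 0, ]   # LDA needs non-empty documents

# Fit a 10-topic model; in practice, compare several values of k.
lda_fit <- LDA(counts, k = 10, control = list(seed = 1234))
terms(lda_fit, 10)   # top 10 words per topic
```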


If you would like to get texts from your spreadsheets into separate .txt files prior to analysis with MEH, I have made a few different scripts available at https://github.com/ryanboyd/CSVtoTXTscripts. With some modifications, these scripts can also aggregate your texts into specific files however you like prior to analysis with MEH.
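The scripts in that repository are written in Python, but the core idea is simple enough to sketch in R as well. In this hypothetical sketch, the spreadsheet is assumed to have an “id” column and a “text” column; file, folder, and column names are all stand-ins:

```r
# Illustrative only: write one .txt file per spreadsheet row.
# File, folder, and column names are hypothetical assumptions.
texts <- read.csv("my_texts.csv", stringsAsFactors = FALSE)

dir.create("txt_output", showWarnings = FALSE)
for (i in seq_len(nrow(texts))) {
  writeLines(texts$text[i],
             file.path("txt_output", paste0(texts$id[i], ".txt")))
}
```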


If you want to compare word frequencies / likelihoods from two different corpora, I have written a script that calculates all of the same indices that are found on Paul Rayson’s extremely helpful page. Essentially, you can get two separate frequency lists from MEH (one for each corpus), then apply this script.
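As a sketch of the central statistic involved, here is the log-likelihood (G2) calculation described on Rayson’s page, written as a small R function (the example frequencies and corpus sizes at the end are made up):

```r
# Log-likelihood (G2) for one word, following Rayson & Garside:
# a = word's frequency in corpus 1, b = frequency in corpus 2,
# c = total size of corpus 1, d = total size of corpus 2.
log_likelihood <- function(a, b, c, d) {
  e1 <- c * (a + b) / (c + d)   # expected frequency, corpus 1
  e2 <- d * (a + b) / (c + d)   # expected frequency, corpus 2
  ll <- 0
  if (a > 0) ll <- ll + a * log(a / e1)
  if (b > 0) ll <- ll + b * log(b / e2)
  2 * ll
}

log_likelihood(a = 120, b = 40, c = 100000, d = 80000)   # made-up numbers
```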