A Quick Guide on Writing Your Own tl;dr

Photo by Aaron Burden on Unsplash

Project Gutenberg offers over 60,000 full-length books. Wikipedia contains over 55 million unique articles. Wattpad has over 400 million short stories. In the age of the internet, there is no shortage of literature to read.

These numbers, however, are completely overwhelming. A person could spend a lifetime attempting to read the entirety of the internet and barely scratch the surface.

The ocean of written material creates a paradoxical problem: because there’s an overabundance of information, finding relevant information becomes more difficult.

Automatically generating text summaries may help with this problem. Instead of leaving users to skim through large walls of text, a brief summary gives them the key pieces of information, so they can make a more informed decision about whether to continue reading without spending time parsing the text themselves. …


An introduction with a Python example

Photo by Jon Tyson on Unsplash

K-Nearest Neighbor Algorithm

K-Nearest Neighbor (KNN) is an easy-to-understand but essential and broadly applicable supervised machine learning technique. To understand the intuition behind KNN, examine the scatterplot below. The plot shows the relationship between two arbitrary dimensions, x and y. The blue points represent members of group A and the orange points represent members of group B. Together, these points are the training data for KNN.


Now suppose a new, unclassified data point is presented and plotted on the graph. KNN would find the K nearest points (its nearest neighbors), take a majority vote among their groups, and classify the new point accordingly. …
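The majority vote described above can be sketched in a few lines of plain Python. This is only an illustration, not the article's own implementation: the function name knn_classify, the training points, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_classify(train, new_point, k=3):
    """Classify new_point by majority vote among its k nearest neighbors.

    train is a list of ((x, y), label) pairs (a hypothetical format).
    """
    # Rank the training points by Euclidean distance to the new point
    by_distance = sorted(train, key=lambda item: math.dist(item[0], new_point))
    # Count the labels of the k closest points and return the most common one
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Made-up training data: group A sits near (1, 1), group B near (4.5, 4.5)
train = [
    ((1.0, 1.2), "A"), ((1.5, 0.8), "A"), ((0.9, 1.9), "A"),
    ((4.0, 4.2), "B"), ((4.5, 3.8), "B"), ((5.1, 4.9), "B"),
]

print(knn_classify(train, (1.2, 1.0), k=3))
print(knn_classify(train, (4.4, 4.1), k=3))
```

Using an odd K with two groups avoids tied votes; with an even K or more groups, a tie-breaking rule would need to be chosen.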


A Primer with a Real World Demonstration

Photo by Pineapple Supply Co. on Unsplash

Retailers have access to an unprecedented amount of shopper transactions. As shopping habits have become more electronic, records of every purchase are neatly stored in databases, ready to be read and analyzed. With such an arsenal of data at their disposal, they can uncover patterns of consumer behavior.

What is Market Basket Analysis?

A market basket analysis is a set of affinity calculations meant to determine which items sell together. For example, a grocery store may use market basket analysis to determine that consumers typically buy hot dogs and hot dog buns together.
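As a rough sketch of what such affinity calculations can look like, the snippet below computes three commonly used measures (support, confidence, and lift) for a pair of items. The transactions are invented for illustration, the function name affinity is hypothetical, and the sketch assumes both items appear in at least one transaction.

```python
# Made-up transactions: each set is one shopper's basket
transactions = [
    {"hot dogs", "hot dog buns", "ketchup"},
    {"hot dogs", "hot dog buns"},
    {"hot dogs", "hot dog buns", "soda"},
    {"soda", "chips"},
    {"chips", "ketchup"},
]


def affinity(transactions, a, b):
    """Return (support, confidence, lift) for the rule "a is bought with b"."""
    n = len(transactions)
    count_a = sum(1 for basket in transactions if a in basket)
    count_b = sum(1 for basket in transactions if b in basket)
    count_ab = sum(1 for basket in transactions if a in basket and b in basket)

    support = count_ab / n            # share of baskets containing both items
    confidence = count_ab / count_a   # share of baskets with a that also have b
    lift = confidence / (count_b / n) # how much a raises the odds of buying b
    return support, confidence, lift


support, confidence, lift = affinity(transactions, "hot dogs", "hot dog buns")
print(support, confidence, lift)
```

A lift above 1 suggests the two items appear together more often than they would if purchases were independent.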

If you’ve ever visited an online retailer’s website, you’ve probably seen a recommendation on a product’s page phrased as “Customers who bought this item also bought” or “Customers buy these together”. More than likely, the online retailer performed some sort of market basket analysis to link the products together. …


How to Create an Electron-Like App with Python

Photo by Carl Heyerdahl on Unsplash

From web development to data science, Python offers an incredibly diverse set of tools. Its easy-to-read syntax and gentle learning curve make it a popular language, but it lacks the diverse and beautiful GUI support of web technologies. Anyone who’s used Flask, a popular and lightweight web framework, has probably wondered if they could take the same principles and apply them to desktop app development. The temptation of combining an HTML and CSS frontend with a Python backend is alluring.

A few libraries attempt to achieve this, but they lack customization options and don’t have large communities. …


Writing a News Article Recommender to Reduce Polarization and Radicalization

Photo by Matthew Guay on Unsplash

The popular Netflix documentary, The Social Dilemma, outlines many of the fundamental problems of social media. The film explores how the technology used by many platforms doesn’t necessarily align with user interests.

In theory, social media companies can create data-driven systems to deliver interesting and engaging content to users. Normally, this is a feature meant to filter irrelevant content, but in practice, it ensures nobody is confronted with opposing views if they don’t want to see them.

As people become increasingly polarized by the content they see online, it’s imperative that social media find a balance between delivering engaging content and actively driving users away from reality. …


A Python library for making high quality static maps

Photo by Capturing the human heart. on Unsplash

With the age of big data comes the age of big geospatial data. Researchers increasingly have access to geographic information that can be used for anything from tracking the migration patterns of endangered species to mapping every donut shop in the country.

To help visualize this information, the Python library Cartopy can create professional and publishable maps with only a few lines of code. Built with Matplotlib in mind, its syntax is familiar and easy to understand.

Simple Maps

To start, we’ll create the simplest possible world map. Before writing any code, make sure the Cartopy and Matplotlib libraries are installed.

import cartopy.crs as crs
import cartopy.feature as cfeature
import matplotlib.pyplot as…


A Practical Application of Clustering in Creating Recommendations

Photo by Malte Wingen on Unsplash

Spotify has no shortage of playlists to offer. On my home page right now, I see Rap Caviar, Hot Country, Pump Pop, and many others that span all sorts of musical textures.

While many users enjoy going through songs and creating their own playlists based on their own tastes, I wanted to do something different. I used an unsupervised learning technique to find closely related music and let the algorithm create playlists of its own.

The algorithm doesn’t need to classify every song nor does every playlist need to be perfect. …


How to Avoid Problems and Fix Your Data Analysis

Photo by Tra Nguyen on Unsplash

Linear regressions are among the most common and most powerful tools for data analysis. While other, more advanced forms of statistics have been developed over the years, linear regressions remain incredibly popular because they’re easy to understand, interpret, and perform.

You can find regression implementations in nearly any programming language, analytical software, and even the standard TI-84 calculator. Their ubiquity allows math teachers to introduce them as early as middle school, meaning most people are at least familiar with them.

With the linear regression’s success, however, comes its misuse. …


Inventory Planning with Machine Learning

Photo by chuttersnap on Unsplash

What is ABC Analysis?

ABC analysis assumes that revenue-generating items in an inventory follow a Pareto distribution, where a small percentage of items generates most of the revenue. Using the following conventions, each item in inventory is assigned a letter based on its importance:

  • A items are 20% of items, but contribute 70% of revenue
  • B items are 30% of items, but contribute 25% of revenue
  • C items are 50% of items, but contribute 5% of revenue

Keep in mind these numbers are rough and will vary significantly based on the actual distribution of sales. The key takeaway is that A items are a small percent of inventory, but contribute the most to revenue, C items are a large percent of inventory, but contribute the least to revenue, and B items fall somewhere in the middle. …
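One simple way to turn these conventions into labels is to rank items by revenue and cut on cumulative revenue share. The sketch below is an assumption-laden illustration rather than the article's method: the function name abc_classify, the item names, the revenue figures, and the cutoffs (70% for A, 95% for A and B combined, taken from the rough split above) are all chosen for the example.

```python
def abc_classify(revenue_by_item, a_cutoff=0.70, b_cutoff=0.95):
    """Label each item A, B, or C by its position in cumulative revenue share."""
    total = sum(revenue_by_item.values())
    # Highest-revenue items first
    ranked = sorted(revenue_by_item.items(), key=lambda kv: kv[1], reverse=True)

    labels = {}
    cumulative = 0.0
    for item, revenue in ranked:
        cumulative += revenue
        share = cumulative / total  # revenue share covered up to and including this item
        if share <= a_cutoff:
            labels[item] = "A"
        elif share <= b_cutoff:
            labels[item] = "B"
        else:
            labels[item] = "C"
    return labels


# Made-up revenue figures for six hypothetical items
revenue = {
    "widget": 450, "gadget": 200, "gizmo": 150,
    "doohickey": 100, "thingamajig": 60, "whatsit": 40,
}
labels = abc_classify(revenue)
print(labels)
```

With real sales data the share of items landing in each class will differ from the 20/30/50 split, which is why the cutoffs are left as parameters.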


A basic tutorial using real life data

A person entering their credit card details for an online sale
Photo by rupixen.com on Unsplash

The Data Set

To demonstrate a random forest regression, a data set of e-commerce sales from the popular online retailer Wish will be used. The data comes from Kaggle and only features sales information on summer clothing. The attributes include product descriptions, rating, whether ad boosts were used, whether urgency text was added to the product listing, and the number of units sold, among others.

To show the power of the random forest regression, the number of units sold will be predicted. Good, accurate predictions would be invaluable not only to inventory planners, who need to estimate how much product to order or produce, but also to sales teams, who need to understand how product moves in an e-commerce setting. …

About

Andrew Udell

Data Science Enthusiast | Code Junkie | Lifelong Student https://www.linkedin.com/in/andrew-udell-108802140/
