Photo by vadim kaipov on Unsplash

One advantage of a decorator in Python is that it can extend what a function does without modifying the original function.

Today, let’s walk through an example step by step.

Let’s say I have defined several original functions, as below:

The problem: After several months, I want all of the calculation functions (add, minus, and multiply) to print a message before and after the calculation, to make the process clearer. What should I do?

Method 1: Change the original functions: add the printed words to the original…
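For contrast with editing every function by hand, here is a minimal sketch of the decorator approach. The function names add, minus, and multiply come from the text; their bodies, the decorator name `announce`, and the printed messages are assumptions made for illustration.

```python
import functools


def announce(func):
    """Wrap func so that it prints a message before and after it runs."""
    @functools.wraps(func)  # keep the original name and docstring
    def wrapper(*args, **kwargs):
        print(f"Start {func.__name__} with arguments {args}")
        result = func(*args, **kwargs)
        print(f"Finished {func.__name__}, result = {result}")
        return result
    return wrapper


# The bodies below are assumed; only the decorator line is new.
@announce
def add(a, b):
    return a + b


@announce
def minus(a, b):
    return a - b


@announce
def multiply(a, b):
    return a * b


total = add(2, 3)
```

The calculation code itself stays untouched; removing the `@announce` line restores the original behavior.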


Understand basic FunkSVD and focus on validation cases

Photo by Solé Bicycles on Unsplash

Today I would like to cover the following parts:

  1. FunkSVD: math, process, and code
  2. Predictions and validation based on part 1, with emphasis on the different cases, to help explain the validation process

Let’s get started.

FunkSVD: a brief intro

  1. FunkSVD is a kind of matrix factorization. The original algorithm, proposed by Simon Funk in his blog post [3], factorized the user-item rating matrix as the product of two lower-dimensional matrices [1].
  2. The predicted ratings can be computed as R=HW, where R is the user-item rating matrix, H contains the user’s latent factors and W is the…
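A minimal sketch of the idea in item 2, assuming plain stochastic gradient descent over the observed ratings only, with no regularization or bias terms. The function name `funk_svd`, the learning rate, the number of epochs, and the small rating matrix are all illustrative assumptions, not values from the original story.

```python
import numpy as np


def funk_svd(R, k=2, lr=0.01, epochs=1000, seed=0):
    """Factorize R (users x items, NaN = missing) into H (users x k) and W (k x items)."""
    rng = np.random.default_rng(seed)
    n_users, n_items = R.shape
    H = rng.normal(scale=0.1, size=(n_users, k))
    W = rng.normal(scale=0.1, size=(k, n_items))
    # Train only on ratings that actually exist.
    observed = [(u, i) for u in range(n_users) for i in range(n_items)
                if not np.isnan(R[u, i])]
    for _ in range(epochs):
        for u, i in observed:
            err = R[u, i] - H[u] @ W[:, i]
            h_old = H[u].copy()  # use the pre-update value for W's step
            H[u] += lr * err * W[:, i]
            W[:, i] += lr * err * h_old
    return H, W


def observed_sse(R, pred):
    """Sum of squared errors over the non-missing entries of R."""
    mask = ~np.isnan(R)
    return float(np.sum((R[mask] - pred[mask]) ** 2))


R = np.array([
    [5.0, 3.0, np.nan, 1.0],
    [4.0, np.nan, np.nan, 1.0],
    [1.0, 1.0, np.nan, 5.0],
    [1.0, np.nan, np.nan, 4.0],
    [np.nan, 1.0, 5.0, 4.0],
])
H, W = funk_svd(R)
pred = H @ W  # predicted ratings, including the missing cells
```

Because the updates skip missing cells, the matrix can contain NaN values, and `pred` still gives a number for every user-item pair.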

How to reduce SVD to k dimensions and make predictions

Photo by Georg-Johann from Wiki

We know that SVD has drawbacks when used for prediction in practice; for example, it cannot make predictions if the dataset contains even a single NaN. But because SVD is the starting point of collaborative filtering, I want to reproduce the procedure with it to see how it works, with some compromises on the dataset.

This story focuses on implementing SVD in code, does no offline testing (no train/test split), and includes some basic linear algebra terminology.

Linear algebra basics

SVD stands for singular value decomposition. A detailed explanation can be found on Wikipedia. Here…
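As a rough sketch of reducing SVD to k dimensions, the snippet below uses `np.linalg.svd` and keeps only the k largest singular values. The helper name `truncated_svd_predict` and the small, fully filled rating matrix are assumptions for illustration; the matrix has no missing values because SVD cannot handle NaN.

```python
import numpy as np


def truncated_svd_predict(ratings, k):
    """Rebuild the rating matrix from its k largest singular values."""
    U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
    U_k = U[:, :k]        # users x k
    S_k = np.diag(s[:k])  # k x k, singular values on the diagonal
    Vt_k = Vt[:k, :]      # k x items
    return U_k @ S_k @ Vt_k


# Hypothetical ratings: rows are users, columns are movies.
ratings = np.array([
    [5.0, 4.0, 1.0],
    [4.0, 5.0, 1.0],
    [1.0, 1.0, 5.0],
    [2.0, 1.0, 4.0],
])
approx = truncated_svd_predict(ratings, k=2)
full = truncated_svd_predict(ratings, k=3)
```

With k equal to the number of singular values, the product reproduces the original matrix; a smaller k gives a lower-rank approximation whose entries serve as the predictions.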


Content-based, weighted content-based, NumPy functions

Photo from Michal Matlon on Unsplash

Today I would like to discuss two examples of content-based recommendation systems and some efficient array functions I learned from them. The two examples are:

1: Item content-based recommendation

2: Weighted content-based recommendation

I use a simple movie dataset as an example and would like to focus on the main process, ignoring other processes and special cases. Let’s get started.

Preparing the datasets:

Use the code below to generate two datasets: movie_df and review_df

The two tables look like this:


np.setdiff1d, np.where and unstack etc.

Photo from Michal Matlon on Unsplash

In my previous story, I used some NumPy functions for recommendation system data processing. Because the emphasis there was on the content-based recommendation system, I did not show how to use these functions efficiently. In this story, I would like to explain them in detail.

For functions 3, 4, and 5, you might want to check the DataFrame I used. It is here:

  1. np.setdiff1d(a1, a2, assume_unique=False)

This function finds the difference of two arrays and returns the unique values in a1 that are not in a2. It can be used to compare lists and…
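A short sketch of `np.setdiff1d` in use. The variable names and values are made up for illustration.

```python
import numpy as np

# Hypothetical IDs: everything available vs. what was already seen.
all_ids = np.array([1, 2, 3, 2, 4, 1])
seen_ids = np.array([3, 4, 5, 6])

# Sorted, unique values of all_ids that do not appear in seen_ids.
unseen_ids = np.setdiff1d(all_ids, seen_ids)

# Plain Python lists work as well; the result is still a NumPy array.
unseen_titles = np.setdiff1d(["a", "b", "c"], ["b"])
```

Here `unseen_ids` holds 1 and 2 (each once, even though they repeat in the input), and `unseen_titles` holds "a" and "c".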


squeeze(), unsqueeze(), tensor[None], max(), argmax(), and view()

Pic from github

When I started to learn PyTorch, I found several functions hard to understand. Today I would like to summarize them with examples, which I think helps a lot.

The functions are:

  • squeeze()
  • unsqueeze() and a[None]
  • max()
  • argmax()
  • view()

You can also check the explanations from the official website one by one, but summarizing them together helps me.

1. squeeze()

squeeze(i) is a kind of dimension reduction: if dimension i has size 1, it can be removed. Let’s check an example:
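The snippet below is a minimal sketch rather than the original example. It assumes PyTorch is installed, and the tensor shapes are chosen only for illustration.

```python
import torch

x = torch.zeros(2, 1, 3)      # dimension 1 has size 1

a = x.squeeze(1)              # size-1 dimension removed: shape (2, 3)
b = x.squeeze(0)              # dimension 0 has size 2, so nothing changes
c = torch.zeros(1, 2, 1).squeeze()  # no argument: drop every size-1 dimension

print(a.shape, b.shape, c.shape)
```

Calling squeeze on a dimension whose size is not 1 does not raise an error; the tensor simply keeps its shape.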


Focus on the backend, without frontend and authentication

Flask is a tool for creating an API server. It is a micro-framework, which means that its core functionality is kept simple, while numerous extensions allow developers to add other functionality (such as authentication and database support).

Heroku is a cloud platform where developers host applications, databases, and other services in several languages. Developers can use Heroku to deploy, manage, and scale applications. At the time of writing, Heroku was free to start with and offered paid plans, and most services, such as databases, had a free tier.

Remark:

This story will focus on application deployment and database interaction without…


Photo designed by vectorjuice / Freepik

Now that big data has become a reality, people focus on how to use the data to realize commercial value. One of the more mature areas is profiling potential customers or predicting customer behavior, in order to target the market or the customer more precisely.

Problem statement

Bertelsmann Arvato Financial Solutions provided a real-world challenge through Udacity. Arvato provided four demographics datasets. They are:

  1. Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns), named azdias.
  2. Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features…

Based on the 2017 and 2020 surveys

Picture made by the author

I am in a career transition phase: I have worked in a coding-related industry but have no formal education in computer science. So I am wondering whether I can find some hints in the Stack Overflow survey, which comes from the largest developer community.

After reviewing several years of the survey, I found that the 2017 survey includes the questionnaire, but the following years do not. But sometimes the hints stay the same, so I chose the 2017 survey. …


With two methods: For loop and Counter container

Photo by Jon Tyson on Unsplash

The problem and the target:

To simplify the problem, I take two columns from my working file, which is a Stack Overflow yearly survey file, as below:

Xue Wang

passionate about data analysis and data science
