Movie recommendation system

Updated: Jul 4, 2021

Data is being collected every day on a very large scale, and to handle it we use data science and machine learning algorithms to build models. In today's world almost everything revolves around recommendations, from finding a good hotel based on reviews to selecting a movie based on user reviews and interests. Many recommendation systems have been developed and deployed in the market, for example Netflix and Amazon Prime for movies and Amazon for books. A recommendation system generates recommendations that match a user's preferences, and it also takes feedback from the user and stores it in the recommendation database so that it can generate recommendations for new users. There are basically three approaches to building a recommendation system.

In content based filtering, we select items based on the interests of the user and, on that basis, recommend similar items to that user. In collaborative based filtering, we group users with the same type of behaviour into clusters and recommend based on the cluster. Hybrid based filtering is the combination of content based filtering and collaborative based filtering.

2. Related work - Approaches to recommendation systems -

1. Content based filtering

2. Collaborative based filtering

3. Hybrid based filtering

2.1 Content based filtering - In content based filtering, recommendations are tailored to a specific user by learning their likes and dislikes. In other words, content based filtering relies on the content of the items the user has rated or explored in the past. This type of filtering is accurate and easy to build because item information can be extracted from existing resources, two of the best known being TMDB and IMDb. Let's take an example. Suppose movies are described by characteristics such as action, romantic, adventure and comedy. If user 1 watches movie 1 (action) and rates it 5, and watches movie 2 (adventure) and rates it 4, then the recommendation system learns from this past experience and recommends movie 3 (adventure or action), the movie closest to movie 1 and movie 2.
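The user 1 example can be sketched in a few lines of Python. This is only an illustration, not the implementation behind this post: the genre flags, the extra "movie 4" and the function names `build_profile` and `recommend` are assumptions made up for the example.

```python
# Hypothetical genre flags per movie: [action, romantic, adventure, comedy]
MOVIES = {
    "movie 1": [1, 0, 0, 0],  # action
    "movie 2": [0, 0, 1, 0],  # adventure
    "movie 3": [1, 0, 1, 0],  # action + adventure
    "movie 4": [0, 1, 0, 1],  # romantic comedy (added only for contrast)
}


def build_profile(ratings, movies):
    """Sum the genre vectors of the rated movies, weighted by the user's rating."""
    size = len(next(iter(movies.values())))
    profile = [0.0] * size
    for title, rating in ratings.items():
        for i, flag in enumerate(movies[title]):
            profile[i] += rating * flag
    return profile


def recommend(ratings, movies):
    """Rank the movies the user has not rated by dot product with the profile."""
    profile = build_profile(ratings, movies)
    scores = {
        title: sum(p * g for p, g in zip(profile, genres))
        for title, genres in movies.items()
        if title not in ratings
    }
    return sorted(scores, key=scores.get, reverse=True)


user_1 = {"movie 1": 5, "movie 2": 4}
print(recommend(user_1, MOVIES))  # ['movie 3', 'movie 4']
```

Movie 3 comes first because it shares genres with both movies that user 1 rated highly, while movie 4 shares none.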

The distance between two vectors (two movies) can be computed with one of the following measures -

1. Cosine similarity

2. Euclidean distance

3. Pearson's correlation
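As a rough sketch, the three measures can be written in plain Python as follows. The two genre vectors at the bottom are made-up examples, and the functions assume equal-length vectors that are neither all zeros nor (for Pearson's correlation) constant.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between a and b (both must be non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    """Straight-line distance between a and b; 0 means identical vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def pearson_correlation(a, b):
    """Cosine similarity of the mean-centred vectors (neither may be constant)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    centred_a = [x - mean_a for x in a]
    centred_b = [y - mean_b for y in b]
    return cosine_similarity(centred_a, centred_b)


# Made-up genre vectors: [action, romantic, adventure, comedy]
movie_a = [1, 0, 1, 0]
movie_b = [1, 0, 0, 0]

print(round(cosine_similarity(movie_a, movie_b), 4))
print(round(euclidean_distance(movie_a, movie_b), 4))
print(round(pearson_correlation(movie_a, movie_b), 4))
```

Note that cosine similarity and Pearson's correlation grow as the movies become more alike, while Euclidean distance shrinks.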

2.2 Collaborative based filtering - In collaborative based filtering, recommendations are based on the behaviour of users. In other words, the system tries to learn the behaviour of different types of users and then uses it to recommend items to a new user. Profiling is done by clustering users with the same type of behaviour. Let's take an example with 2 users. User 1 watches movie 1 and rates it 5, and watches movie 2 and rates it 3. User 2 watches movie 1 and rates it 4, and watches movie 2 and rates it 3. The recommendation system puts both users in the same cluster because they have the same type of taste. When a new user with this type of behaviour arrives, that user is added to the cluster and receives recommendations similar to those of user 1 and user 2. Put another way, if user 1 and user 2 both want to see a third movie, the recommended movie will be the same for both because they have the same taste and behaviour.
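A small sketch of this idea follows. Instead of real clustering it uses a simpler nearest-neighbour stand-in: it finds the users most similar to the new user and averages their ratings. The movie 3 ratings, "user 3" and the function names are hypothetical and exist only to make the example run.

```python
import math

# Ratings from the example above, plus hypothetical movie 3 ratings and a
# hypothetical user 3 with a different taste.
RATINGS = {
    "user 1": {"movie 1": 5, "movie 2": 3, "movie 3": 4},
    "user 2": {"movie 1": 4, "movie 2": 3, "movie 3": 5},
    "user 3": {"movie 1": 1, "movie 2": 5, "movie 3": 1},
}


def user_similarity(a, b):
    """Cosine similarity over the movies that both users have rated."""
    shared = [m for m in a if m in b]
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    norm_a = math.sqrt(sum(a[m] ** 2 for m in shared))
    norm_b = math.sqrt(sum(b[m] ** 2 for m in shared))
    return dot / (norm_a * norm_b)


def predict_rating(new_user, movie, ratings, k=2):
    """Similarity-weighted average of the movie's rating among the k nearest users."""
    neighbours = sorted(
        (
            (user_similarity(new_user, other), other[movie])
            for other in ratings.values()
            if movie in other
        ),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in neighbours)
    if total == 0:
        return None  # no usable neighbours: the cold start case
    return sum(sim * rating for sim, rating in neighbours) / total


new_user = {"movie 1": 5, "movie 2": 3}
print(predict_rating(new_user, "movie 3", RATINGS))
```

The new user behaves like user 1 and user 2, so the predicted rating for movie 3 lands between their ratings (4 and 5) and ignores user 3.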

Some of the problems that this type of filtering faces are -

1. Cold start - this problem affects new users, because there is not enough data about them to generate recommendations.

2. Scalability - some users give only a few ratings, so recommendations for those users are less accurate (this is often described as the data sparsity problem).

2.3 Hybrid based filtering - In hybrid based filtering, the recommendation system is neither purely content based nor purely collaborative; it is a combination of both. It can be implemented in several ways - by building collaborative and content based models separately and then combining them, or by adding collaborative capabilities to a content based approach (or vice versa). It is one of the most widely used approaches for recommendation systems because it overcomes the problems of collaborative based filtering.
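The first option, building the two models separately and then combining them, could look roughly like this. The function name `hybrid_score`, the 0.5 weight and the assumption that both scores are already scaled to the range 0 to 1 are illustrative choices, not details taken from the system described in this post.

```python
def hybrid_score(content_score, collab_score, weight=0.5):
    """Blend a content based and a collaborative score, both assumed in [0, 1].

    If the collaborative score is missing (for example, a new user with no
    ratings yet), fall back to the content based score alone.
    """
    if collab_score is None:
        return content_score
    return weight * content_score + (1 - weight) * collab_score


print(hybrid_score(0.8, 0.4))   # both models available
print(hybrid_score(0.8, None))  # cold start: content based only
```

The fallback branch is where the hybrid approach helps with the cold start problem: the system still returns a score when collaborative data is missing.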

Most recommendation systems are based on either content based or collaborative based filtering. In content based filtering, recommendations are based on the user alone. In collaborative based filtering, recommendations are based on the genres and ratings provided by users, so it is difficult to give recommendations to a new user (who has not yet been added to a cluster of users with the same behaviour and taste). To make our results more accurate and effective, we overcome this drawback by using hybrid based filtering, which combines both of the approaches above. Some previous approaches lack accuracy because they depend entirely on the ratings and genres given by users. Because we combine both kinds of filtering, if the cold start or scalability problem occurs, content based filtering handles it, and vice versa. Our approach (recommendation system) gives accurate results both when searching for a movie and when recommending movies similar to it.

Cosine similarity explained -

To measure how similar two points (movies) are to each other on the basis of their properties, I have used cosine similarity. It measures the similarity between two non-zero vectors in a multidimensional space, using the angle between them. It is very helpful and important in a recommendation system because it finds the similarity among items. In general cosine similarity ranges from -1 to 1, but here it always ranges from 0 to 1, because the frequencies (TF-IDF) that make up the vectors cannot be negative. For the formula that gives the cosine of the angle theta between two non-zero vectors, see Fig 2.
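For reference, the standard definition of cosine similarity for two non-zero vectors A and B with n components is given below. The symbols A, B and n are the usual textbook notation and may differ from the notation used in the figure.

```latex
\cos\theta
  = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
  = \frac{\sum_{i=1}^{n} A_i B_i}
         {\sqrt{\sum_{i=1}^{n} A_i^{2}} \; \sqrt{\sum_{i=1}^{n} B_i^{2}}}
```

When every component of A and B is non-negative, the numerator is non-negative, so the value stays between 0 and 1.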

Now let's look at how cosine similarity ranges between 0 and 1 (see Fig 1). If theta is 45 degrees, the cosine similarity is about 0.71, which means the angle between the two vectors is small, so they are fairly similar to each other. If theta is 90 degrees, the formula gives a cosine similarity of 0, which means there is no similarity between them. If theta is 0 degrees, the cosine similarity is 1, which means the two vectors point in the same direction, so they are highly similar to each other. After computing cosine similarity, I normalized the values to scale them down and then applied a Naive Bayes classifier, which classifies the clusters on the basis of their behaviour as well as ratings and genres and produces its output accurately and efficiently.
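The three angles are easy to check with Python's standard `math` module. The helper name `cosine_for_angle` is just a label chosen for this sketch.

```python
import math


def cosine_for_angle(degrees):
    """Cosine similarity of two vectors separated by the given angle in degrees."""
    return math.cos(math.radians(degrees))


for angle in (0, 45, 90):
    # Rounded to two decimals, matching the values quoted in the text
    print(angle, round(cosine_for_angle(angle), 2))
```

The rounding also hides the tiny floating point error at 90 degrees, where the computed value is extremely close to 0 rather than exactly 0.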

Fig 1.

Fig 2.

I have demonstrated the modelling of a movie recommendation system based on the principle of cosine similarity, which I consider more accurate than the other distance metrics, and I also used hybrid based filtering. I implemented the Naive Bayes algorithm for recommendation and accuracy.

Your feedback is appreciated!

Did you find this blog helpful? Any suggestions for improvement? Please let me know by filling in the contact us form or pinging me on LinkedIn.

