Scheduling data feed processing jobs

Berg, Maarten van den

dc.rights.license	CC-BY-NC-ND
dc.contributor.advisor	Liu, Alison
dc.contributor.author	Berg, Maarten van den
dc.date.accessioned	2023-05-23T00:00:51Z
dc.date.available	2023-05-23T00:00:51Z
dc.date.issued	2023
dc.identifier.uri	https://studenttheses.uu.nl/handle/20.500.12932/43909
dc.description.abstract	Online advertising is an important way for companies that sell products on the internet to find customers. These companies often use an e-commerce system, which stores data on the products that the company sells. If a company's inventory changes often, it may be desirable to automate the creation and updating of advertisements, reducing the work needed to keep them up to date. Channable is a company that provides a tool for automatically creating advertisements based on the data in a company's e-commerce system. Channable offers a product feed processing system that connects to a company's e-commerce system and regularly downloads information on the company's inventory. Once this data has been downloaded, the system can apply customer-defined processing rules to it and convert it to a format suitable for submission to one or more advertising platforms or marketplaces. The heavy computational lifting in this system is performed by rule processing servers, which accept inventory data and customer-defined processing rules and produce a data stream processed according to those rules. Channable runs several of these rule processing servers for redundancy and performance, so it must balance the workload between them. The current method for assigning work to the servers uses a distributed scheduler. This scheduler has limitations that cause it to distribute work unevenly, so that some servers regularly become overloaded while others sit idle. In this thesis we implement a better method for assigning work to the rule processing servers by using a centralised scheduler. We compare the current approach against two methods for detecting overloaded servers and one alternative algorithm for assigning work to servers, running experiments on real-world data to determine which scheduling approach performs best.
We show that our scheduler can significantly improve the performance of the rule processing system: our best-performing scheduling algorithm reduces the average duration of low-priority jobs by a factor of 2.2 and their average waiting time by a factor of 3.5. We are also able to significantly reduce the variance in waiting time between the different rule processing servers, making the product feed processing system's performance more predictable.
dc.description.sponsorship	Utrecht University
dc.language.iso	EN
dc.subject	This thesis describes a project carried out for an external company, Channable. The project improves the performance of one of the company's distributed systems by changing the method by which work is assigned to the servers in the system. The new work assignment method is evaluated by running experiments on a partial copy of the company's production environment.
dc.title	Scheduling data feed processing jobs
dc.type.content	Master Thesis
dc.rights.accessrights	Open Access
dc.subject.courseuu	Computing Science
dc.thesis.id	16855

Files in this item

Name: Scheduling data feed processing ...
Size: 425.1Kb
Format: PDF

This item appears in the following Collection(s)

Theses