Okxe Data Analyst 2022-09-01

Đề bài vòng 1 cho vị trí data analyst ở okxe. JD.

Tóm tắt yêu cầu


Hướng giải

Based on the sample data, I made these assumptions with the ER diagram as follows.

  • "users" table is connected to "sales" table by matching "id" column and "user_id" column from both table, respectively
  • a user can have 0 or many orders
  • "users" table is connected to "locations" table by matching "location_id" column and "id" column from both table, respectively
  • a user must be located at one location. A location can be registered by one or many users.
erDiagram USERS ||--o{ SALES : realized USERS { string first_name string last_name datetime registered_at integer location_id } SALES { integer user_id date sales_date integer product_id integer item_count decimal value } USERS }|..|| LOCATIONS : located_at LOCATIONS { string province string region }

Copy this gist into db-fiddle "Schema SQL" pane to create the sample data tables. Then choose "Database" as "MySQL v8.0". Start to fiddle the SQL.

Q1 (Basic)


Write a query to calculate the number of new registered users each month. Please order the result, so we can see the most recent months first.



  MONTH(users.registered_at) AS month, 
  COUNT(*) AS nb_registered_user
FROM  users
GROUP BY month

Reasoning: The COUNT(*) function is used with a GROUP BY clause to return the number of elements in each group (month).

Q2 (Intermediate)


Write a query to calculate the number of registered users who have the last name of “Nguyen” or “Pham” respectively. The expected output is as follows.



    UPPER(SUBSTRING(users.last_name FROM 1 FOR 1)), 
    LOWER(SUBSTRING(users.last_name FROM 2))
  ) AS last_name, 
  COUNT(*) AS user_count
FROM  users
  users.last_name LIKE 'nguyen' OR 
  users.last_name LIKE 'pham'
GROUP BY last_name
ORDER BY last_name ASC;


  • in mysql, lookup and name comparisons are not case-sensitive
  • We then just need to convert the "last_name" string to proper case.
  • By changing to uppercase the 1st substring derived from the "last_name" string, which contains only the 1st character. Then apply lowercase to the 2nd derived substring containing the rest. Finally, joined these 2 substrings back together and print out the results.

Q3 (Intermediate)


Write a query to calculate the number of transactions and the sales revenue (based on transaction value) that we obtained in the first quarter of 2022 from each province of Vietnam. The expected output is as follows.



  locations.province AS province_name,
  COUNT(sales.id) AS transaction_count,
  SUM(sales.value) AS sales_revenue
FROM sales
INNER JOIN users ON users.id = sales.user_id
INNER JOIN locations ON locations.id = users.location_id
  YEAR(sales.sales_date) = 2022 AND
  QUARTER(sales.sales_date) = 1
GROUP BY province_name;


  • the relationship between 3 tables (users, sales, locations) is visualized in the ER diagram
  • first, we join all 3 tables by inner join type
    • the inner join clause includes only matching rows from both tables "users" and "sales", using the value in the "user_id" column of "sales" table and the "id" column of "users" table to match
    • similarly for both tables "users" and "locations"
  • then we specify a search condition for the rows returned by the query, by limiting only the sales order made in the 1st quarter of 2022
  • lastly we define the 3 required columns as province_name, transaction_count and sales_revenue

Q4 (Advanced)


Write a query to find the product_id which has the second highest sales revenue in May 2022. The expected output is the product_id itself only.



  DISTINCT product_id
    SUM(sales.value) OVER (
      PARTITION BY sales.product_id
      ) totalSalesByProduct
  FROM sales
    YEAR(sales.sales_date) = 2022 AND
    MONTH(sales.sales_date) = 5
  ORDER BY totalSalesByProduct DESC
  ) SalesByProduct2022May


  • the general idea is creating a derived table 1 called "SalesByProduct2022May" that gets the sales revenue in May 2022 of each "product_id". Sort this "SalesByProduct2022May" table by DESC order of "totalSalesByProduct". Then print out the expected "product_id" using LIMIT clause.
  • in details
    • a subquery 2 that search for a sorted table, consisting of: product_id and "totalSalesByProduct" realized only in May 2022
        SUM(sales.value) OVER (
          PARTITION BY sales.product_id
          ) totalSalesByProduct
      FROM sales
        YEAR(sales.sales_date) = 2022 AND
        MONTH(sales.sales_date) = 5
      ORDER BY totalSalesByProduct DESC
    • The SUM() works as a window function 3 that operates on a set of rows defined by the contents of the OVER clause. The window function reports the total sales by "product_id" and output the result in each row, which made multiple duplicated rows for each "product_id" subquery-result
    • The table resulting from subquery is already sorted by "totalSalesByProduct". After having removed the duplicates (via the DISTINCT clause 4 in the main query), we use LIMIT clause 5 with offset value to take only the 1st row of the constrained rows. This is the "product_id" with 2nd-highest-sales in May 2022.


  1. https://www.mysqltutorial.org/mysql-derived-table/˄

  2. https://www.mysqltutorial.org/mysql-subquery/˄

  3. https://www.mysqltutorial.org/mysql-window-functions/˄

  4. https://www.mysqltutorial.org/mysql-distinct.aspx˄

  5. https://www.mysqltutorial.org/mysql-limit.aspx˄

Q5 (Advanced)


Given that data above, write a query to calculate the number of sessions that each user has and the average number of events (actions) per session by that user. Expected output looks like the following.

Q6 Business Question


Please propose your ideas on how to increase the MAU for OKXE. Your ideas should not be limited to data-related initiatives, but can also involve business programs or product changes.


To increase the MAU, we need to understand

  • why your customers use your app
  • what they’re looking to achieve

Paraphrasing into the marketing language, we need to know

  • What goals are my users achieving within my app and web properties
  • What steps are they taking to achieve those goals
  • Where do those steps fall in terms of a conversion funnel or customer journey
  • What are different ways of measuring active usage that are important to your marketing goals
    • why do we need different ways of measuring?
    • Because it’s impossible to create a single one-size-fits all definition of active, as several groups of users will have varying levels and types of engagement with your app.
    • Example of some common stats to track
      • Response to offers, deals, and promotions
      • Willingness to share feedback
      • Frequency of log-ins, by type of device and/or platform
      • Engagement with content (reads, reads to the end of a page, clicking links within text, pageviews per session, total time reading)
      • Sharing of content

Before proposing ideas on how to change, let's see what okxe is offering to its customer.

okxe is a dedicated marketplace platform for motorbike reselling. Current implemented features by okxe:

  • search by bike category, brand
  • have authorized/verified reselling partner program
  • have a whistle-blower and escrow service called "Trạm Dịch Vụ OKXE"
  • each product has clarified tags which describe the released year and the distance traveled by the vehicle
  • cross-sell financial services: installment plan, bike insurance
  • act as a market maker (via the collection program of used bikes), which create the liquidity, increase the transparency and reduce the hindrance of bike reselling market
  • integrated chat feature in the main app.

In my opinion, okxe is doing great with their current offer on business side. However, I see some feasible enhancements on its product's UX, which possibly help to retain users.

  • As a frequent user of many marketplace websites, what do I expect to see on the front page of this kind of website?
    • a big button (visually differentiated color palette) where I can click to propose my offer to sell
    • a big search box to buy, in which I can quickly
      • choose the category of bike
      • select the location of offers
      • set a dynamic price range
      • and see the total number of available product coming from my search query
    • see what are my most 3 recent search queries
  • On desktop version of website, there are so many wasted displaying spaces
    • a top ad-banner occupied 50% of the 1st loading frame
    • a news section listing article from blog is not necessary, and should not be promoted on the main website
  • as a privacy-focused user, I'm not comfortable sharing my current location with any apps. The search box should at least has a filter option for location. The current way of hiding all search filter options under a separate modal which requires one more click to open is not good enough.
  • I guess that the main website shares the code base with its mobile app. That may be the reason why the interactions become sluggish on both platforms (desktop and tablet/mobile). Long loading time when transitioning between products. Long loading time for product's photos. Scrolling is randomly lagged on mobile web version.

Within the data-related perspective, in order to increase the MAU, we should

  • audit how customers have used our products via reporting by sessions and frequency of events
  • know which are the risk of churning, for example via the trend line of number of actions per user per week
  • know which leads have higher probability of converting into customers, via defining a Lead Scores
  • know who are the power users of our product, again via the sessionized event-related data
  • increase the gross merchandise volume, which is the total sales of merchandise transacting through our marketplace in a specific period

  1. Q1
  2. Q2
  3. Q3
  4. Q4
  5. Q5
  6. Q6


  1. topic.data
  2. cat.tut