Improved Regret Bounds of Bilinear Bandits using Action Space Analysis

We consider the bilinear bandit problem, in which the learner chooses a pair of arms, one from each of two action spaces of dimensions d1 and d2, respectively. The learner then receives a reward whose expectation is a bilinear function of the two chosen arms with an unknown matrix parameter Θ* ∈ R^{d1×d2} of rank r. Despite abundant applications such as drug discovery, the optimal regret rate for this problem is unknown, though it was conjectured to be Õ(√(d1 d2 (d1 + d2) r T)) by Jun et al. (2019), where Õ ignores polylogarithmic factors in T. In this paper, we make progress towards closing the gap between the upper and lower bounds on the optimal regret. First, we reject the conjecture above by proposing algorithms that achieve the regret O(√(d1 d2 (d1 + d2) T)), using the fact that the action space dimension O(d1 + d2) is significantly lower than the matrix parameter dimension O(d1 d2). Second, we additionally devise an algorithm with better empirical performance than previous algorithms.
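The reward model described above can be sketched in a few lines. This is a minimal simulation, not the paper's algorithm: the dimensions, the noise level, and the low-rank construction of Θ* are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative problem sizes (not taken from the paper).
d1, d2, r = 5, 4, 2

# Unknown parameter Theta* in R^{d1 x d2}. Writing it as a product of
# two thin factors guarantees rank(Theta*) <= r.
U = rng.normal(size=(d1, r))
V = rng.normal(size=(d2, r))
theta_star = U @ V.T

def pull(x, z, noise_std=0.1):
    """One round of the bilinear bandit: the learner picks arms
    x in R^{d1} and z in R^{d2}, then observes the bilinear mean
    reward x^T Theta* z corrupted by Gaussian noise."""
    return float(x @ theta_star @ z) + rng.normal(scale=noise_std)

# Example round with unit-norm arms from each action space.
x = rng.normal(size=d1); x /= np.linalg.norm(x)
z = rng.normal(size=d2); z /= np.linalg.norm(z)
reward = pull(x, z)
```

The key structural fact the paper exploits is visible here: the learner's choice each round lives in only d1 + d2 coordinates, even though the unknown parameter has d1 · d2 entries.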
International Conference on Machine Learning, pp.4744 - 4754

Appears in Collection
RIMS Conference Papers; MA Conference Papers (Conference Papers)
